3-21-2016 Banded Results



• Name: the name of the matrix (test)
• N: the dimension of the matrix (number of rows and columns)
• NNZ: number of non-zeros
• SPD: whether the matrix is specified by the user to be symmetric positive definite (values: 0 or 1)
• DB: indicates whether DB reordering is performed. Values: 0 or 1.
• K-DB: the half-bandwidth after DB reordering (without any drop-off). If DB reordering is disabled, this reports the original half-bandwidth
• KnoDrp: the half-bandwidth after DB and CM reordering but before drop-off
• K: the half-bandwidth after reordering and drop-off
• FRate: fill-in rate. See NOTES below
• nuKf: non-uniform K factor. Indicates whether the K changes a lot from row to row. Values are between 0 and 1, with 0 indicating a perfectly uniform bandwidth over the entire matrix. See NOTES below
• Solves: indicates whether we managed to solve the problem or not. OK means solved fine, otherwise a reason is provided for failure
• Bstng: indicates whether we enable diagonal boosting when doing factorization. Values: 0 or 1
• SolAcc: infinity norm of the array storing the relative errors
• T-DB: time to run DB reordering for the matrix on the CPU
• T-CM: time to run CM reordering for the matrix on the CPU
• T-Drop: time to drop off off-diagonal elements to decrease bandwidth. Done on the CPU.
• T-Dtransf: data transfer from CPU to GPU
• T-Asmbl: after reordering and drop-off, copy the sparse matrix to banded matrix stored in GPU memory
• LU-M: LU method (complete, ILUT or ILUULT)
• Fill-in: the fill-in factor of ILUT (-1 indicates complete LU)
• NPrtns: the number of partitions used to solve the problem
• T-BC: time required to get off-diagonal right hand sides (Bs and Cs) from the banded matrix - done on the GPU
• T-LU: LU time
• GFlps-LU: LU GFLOPs
• T-SPK: time to solve for the spikes Vs and Ws - done on the GPU
• T-LUrdcd: time required to factorize the reduced matrices - done on the GPU
• T-PreP: the sum of all preprocessing times, see NOTES
• Kry-M: the method used in the Krylov solve stage. Values: BiCGStab2 (0), BiCGStab (1), or CG (2)
• nItrs: the number of Krylov-solve iterations to solve the problem
• T-Kry: time spent in the Krylov solver (on the GPU)
• Total: total time to solve the problem, the sum of T-PreP and T-Kry
• Pardiso: the time for the commercial tool "Pardiso" to solve the problem
• SlwD: the slowdown ratio of our solver compared to Pardiso (a value less than one means that we are faster than Pardiso. The value is shown in green if we run faster and shown in red if we run more than 5 times slower.)
• Fastest: the time of the historically fastest SaP run
• SpdUp: the speedup of this run compared to the historical fastest run (the value is shown in green if the speedup is more than 5% and shown in red if the slowdown is more than 5%)
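
The SlwD and SpdUp entries and their color coding can be sketched as follows. The legend does not spell out the formulas, so this sketch assumes SlwD = Total / Pardiso and SpdUp = Fastest / Total, and reads "slowdown of more than 5%" as Total exceeding Fastest by more than 5%. All function names are hypothetical.

```python
def slowdown_vs_pardiso(total_ms, pardiso_ms):
    """Assumed definition: SlwD = Total / Pardiso (below 1 means SaP is faster)."""
    return total_ms / pardiso_ms


def slwd_color(slwd):
    """Green if faster than Pardiso, red if more than 5 times slower."""
    if slwd < 1.0:
        return "green"
    if slwd > 5.0:
        return "red"
    return None  # no highlighting


def speedup_vs_fastest(total_ms, fastest_ms):
    """Assumed definition: SpdUp = Fastest / Total (above 1 means a new best)."""
    return fastest_ms / total_ms


def spdup_color(spdup):
    """Green for a speedup above 5%, red for a slowdown above 5%."""
    if spdup > 1.05:
        return "green"
    if spdup < 1.0 / 1.05:  # Total is more than 5% above Fastest
        return "red"
    return None  # within the 5% tolerance
```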


NOTES:
1) nuKf = 1/(2KN)*[sum over i from 1 to N of (2K - K_{iLeft} - K_{iRight})], where K_{iLeft} is the row half-bandwidth to the left of the diagonal while K_{iRight} is the row half-bandwidth to the right of the diagonal.
2) FRate = NNZ / ((2K+1)N), i.e., the actual number of non-zeros divided by the number of entries in a full band of half-bandwidth K.
3) All times reported are in milliseconds (1E-3 second).
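
The two formulas in notes 1 and 2 can be evaluated directly from the sparsity pattern. The sketch below is only an illustration: it assumes the matrix is given as a collection of 0-based (row, column) positions of its non-zeros, takes K as the largest row half-bandwidth on either side of the diagonal, and returns nuKf = 0 when K = 0. The function name `band_stats` is hypothetical.

```python
def band_stats(n, entries):
    """Return (K, FRate, nuKf) for an n-by-n sparsity pattern.

    entries: iterable of 0-based (row, col) positions of the non-zeros.
    """
    entries = set(entries)            # ignore duplicate positions
    left = [0] * n                    # K_iLeft for each row i
    right = [0] * n                   # K_iRight for each row i
    for i, j in entries:
        if j < i:
            left[i] = max(left[i], i - j)
        else:
            right[i] = max(right[i], j - i)

    k = max(max(left), max(right))    # overall half-bandwidth K
    nnz = len(entries)
    frate = nnz / ((2 * k + 1) * n)   # note 2
    if k == 0:
        nukf = 0.0                    # diagonal matrix: treated as uniform
    else:
        # note 1: 1/(2KN) * sum_i (2K - K_iLeft - K_iRight)
        nukf = sum(2 * k - l - r for l, r in zip(left, right)) / (2 * k * n)
    return k, frate, nukf


# Example: a 4x4 tridiagonal pattern
tri = [(i, i) for i in range(4)]
tri += [(i, i + 1) for i in range(3)] + [(i + 1, i) for i in range(3)]
k, frate, nukf = band_stats(4, tri)   # k = 1, frate = 10/12, nukf = 0.25
```

Even this perfectly regular tridiagonal pattern gives a non-zero nuKf, because the first and last rows have a one-sided band.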


N       K    d    NPrtns  Solves  RelR         T-LU     T-SwDef  T-MMDef  T-PreP   nItrs  T-SwInf  T-MVInf  T-Kry    T-KryPIt  T-Total
200000  50   0.6  1       OK      6.4545e-16   1189.13  -        -        1205.01  0.25   -        -        1152.16  1152.16   2357.17
200000  50   0.6  1       OK      6.4545e-16   1186.88  -        -        1203.42  0.25   -        -        1146.08  1146.08   2349.5
200000  50   0.6  10      OK      6.44883e-16  243.297  -        -        269.83   0.25   -        -        252.198  252.198   522.028
200000  50   0.6  10      OK      6.71695e-10  122.067  -        -        137.972  2.25   -        -        601.578  267.368   739.55
200000  50   0.6  20      OK      6.44352e-16  148.985  -        -        174.86   0.25   -        -        134.866  134.866   309.726
200000  50   0.6  20      OK      4.26468e-11  76.0203  -        -        91.385   2.25   -        -        381.257  169.448   472.642
200000  50   0.6  30      OK      6.44213e-16  101.113  -        -        128.639  0.25   -        -        97.194   97.194    225.833
200000  50   0.6  30      OK      9.29296e-10  52.7541  -        -        70.062   2.25   -        -        258.128  114.724   328.19
200000  50   0.6  40      OK      6.44208e-16  95.4677  -        -        122.064  0.25   -        -        81.285   81.285    203.349
200000  50   0.6  40      OK      5.34755e-10  50.4793  -        -        65.713   2.25   -        -        197.755  87.8911   263.468
200000  50   0.6  50      OK      6.42889e-16  112.642  -        -        140.915  0.25   -        -        67.86    67.86     208.775
200000  50   0.6  50      OK      7.00518e-10  77.4867  -        -        92.625   2.25   -        -        170.839  75.9284   263.464
200000  50   0.6  60      OK      6.43575e-16  107.314  -        -        136.306  0.25   -        -        59.662   59.662    195.968
200000  50   0.6  60      OK      9.17581e-10  65.5236  -        -        81.541   2.25   -        -        151.587  67.372    233.128
200000  50   0.6  70      OK      6.42548e-16  121.131  -        -        150.118  0.25   -        -        56.808   56.808    206.926
200000  50   0.6  70      OK      5.69506e-11  56.1736  -        -        72.087   2.25   -        -        160.939  71.5284   233.026
200000  50   0.6  80      OK      6.4345e-16   107.81   -        -        140.221  0.25   -        -        53.734   53.734    193.955
200000  50   0.6  80      OK      7.94136e-10  50.2298  -        -        66.122   2.25   -        -        154.162  68.5164   220.284
200000  50   0.6  90      OK      6.41755e-16  101.061  -        -        132.167  0.25   -        -        49.922   49.922    182.089
200000  50   0.6  90      OK      9.74425e-10  45.7136  -        -        60.801   2.25   -        -        112.779  50.124    173.58
200000  50   0.6  100     OK      6.41188e-16  106.184  -        -        139.324  0.25   -        -        50.407   50.407    189.731
200000  50   0.6  100     OK      6.36279e-10  56.2221  -        -        72.225   2.25   -        -        111.552  49.5787   183.777
200000  200  0.6  1       OK      1.20882e-15  1429.83  -        -        1487.25  0.25   -        -        1245.91  1245.91   2733.16
200000  200  0.6  1       OK      1.20882e-15  1430.01  -        -        1489.69  0.25   -        -        1244.1   1244.1    2733.79
200000  200  0.6  10      OK      1.20797e-15  1182.69  -        -        1286.57  0.25   -        -        290.593  290.593   1577.16
200000  200  0.6  10      OK      2.15342e-11  481.025  -        -        538.965  1.75   -        -        579.843  331.339   1118.81
200000  200  0.6  20      OK      1.20444e-15  1071.23  -        -        1200.6   0.25   -        -        181.525  181.525   1382.13
200000  200  0.6  20      OK      5.16992e-11  418.404  -        -        477.008  1.75   -        -        354.264  202.437   831.272
200000  200  0.6  30      OK      1.19999e-15  1051.87  -        -        1202.39  0.25   -        -        141.801  141.801   1344.19
200000  200  0.6  30      OK      3.67543e-11  408.432  -        -        466.182  1.75   -        -        278.112  158.921   744.294
200000  200  0.6  40      OK      1.19773e-15  1010.78  -        -        1186.68  0.25   -        -        123.1    123.1     1309.78
200000  200  0.6  40      OK      5.07585e-11  377.196  -        -        436.006  1.75   -        -        239.855  137.06    675.861
200000  200  0.6  50      OK      1.19337e-15  1000.74  -        -        1202.86  0.25   -        -        117.898  117.898   1320.76
200000  200  0.6  50      OK      2.11406e-10  371.367  -        -        429.212  1.75   -        -        217.531  124.303   646.743
200000  200  0.6  60      OK      1.19352e-15  1008.96  -        -        1234.46  0.25   -        -        110.139  110.139   1344.6
200000  200  0.6  60      OK      2.50188e-12  382.431  -        -        441.038  2.25   -        -        239.433  106.415   680.471
200000  200  0.6  70      OK      1.18627e-15  1002.62  -        -        1248.92  0.25   -        -        111.195  111.195   1360.12
200000  200  0.6  70      OK      4.114e-11    380.035  -        -        438.839  1.75   -        -        198.461  113.406   637.3
200000  200  0.6  80      OK      1.18456e-15  978.278  -        -        1249.44  0.25   -        -        111.339  111.339   1360.78
200000  200  0.6  80      OK      2.67732e-12  362.993  -        -        421.299  2.25   -        -        226.784  100.793   648.083
200000  200  0.6  90      OK      1.18216e-15  983.443  -        -        1271.31  0.25   -        -        109.658  109.658   1380.97
200000  200  0.6  90      OK      3.56348e-11  369.916  -        -        428.316  1.75   -        -        187.08   106.903   615.396
200000  200  0.6  100     OK      1.17747e-15  970.309  -        -        1277.8   0.25   -        -        109.211  109.211   1387.01
200000  200  0.6  100     OK      3.18728e-11  358.97   -        -        416.788  1.75   -        -        180.852  103.344   597.64
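
The timing columns can be cross-checked against the legend's definition of the total time. The sketch below assumes that the six timing values reported per row correspond, in order, to T-LU, T-PreP, nItrs, T-Kry, T-KryPIt and T-Total. It also assumes that T-KryPIt equals T-Kry divided by nItrs, with runs of fewer than one iteration reporting T-Kry unchanged; that relation is inferred from the numbers, not stated in the legend. The helper names are hypothetical.

```python
import math

# (T-PreP, nItrs, T-Kry, T-KryPIt, T-Total) copied from three rows of the table
rows = [
    (1205.01, 0.25, 1152.16, 1152.16, 2357.17),  # K = 50,  NPrtns = 1
    (137.972, 2.25, 601.578, 267.368, 739.55),   # K = 50,  NPrtns = 10
    (538.965, 1.75, 579.843, 331.339, 1118.81),  # K = 200, NPrtns = 10
]


def total_matches(t_prep, t_kry, t_total, tol=1e-2):
    """Check T-Total = T-PreP + T-Kry up to the rounding of the printed values."""
    return math.isclose(t_prep + t_kry, t_total, abs_tol=tol)


def per_iteration(t_kry, n_itrs):
    """Assumed relation: T-KryPIt = T-Kry / max(nItrs, 1)."""
    return t_kry / max(n_itrs, 1.0)


for t_prep, n_itrs, t_kry, t_kry_pit, t_total in rows:
    ok_total = total_matches(t_prep, t_kry, t_total)
    ok_pit = math.isclose(per_iteration(t_kry, n_itrs), t_kry_pit, abs_tol=1e-3)
    print(ok_total, ok_pit)
```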