2-6-2016 Banded Results BCR Test



• Name: the name of the matrix (test)
• N: the dimension of the matrix (number of rows and columns)
• NNZ: number of non-zeros
• SPD: whether the matrix is specified by the user to be symmetric positive definite (values: 0 or 1)
• DB: indicates whether DB reordering is performed. Values: 0 or 1.
• K-DB: the half-bandwidth after DB reordering (before any drop-off). If DB reordering is disabled, this reports the original half-bandwidth
• KnoDrp: the half-bandwidth after DB and CM reordering but before drop-off
• K: the half-bandwidth after reordering and drop-off
• FRate: fill-in rate. See NOTES below
• nuKf: non-uniform K factor. Measures how much the half-bandwidth varies from row to row. Values lie between 0 and 1, with 0 indicating a perfectly uniform bandwidth over the entire matrix. See NOTES below
• Solves: indicates whether the problem was solved. OK means it was solved successfully; otherwise, the reason for failure is given
• Bstng: indicates whether diagonal boosting is enabled during factorization. Values: 0 or 1
• SolAcc: infinity norm of the vector of relative errors
• T-DB: time to run DB reordering for the matrix on the CPU
• T-CM: time to run CM reordering for the matrix on the CPU
• T-Drop: time to drop off-diagonal elements to reduce the bandwidth (on the CPU)
• T-Dtransf: time to transfer data from the CPU to the GPU
• T-Asmbl: time to copy the sparse matrix, after reordering and drop-off, into a banded matrix stored in GPU memory
• LU-M: LU method (complete, ILUT or ILUULT)
• Fill-in: the fill-in factor of ILUT (-1 indicates complete LU)
• NPrtns: the number of partitions used to solve the problem
• T-BC: time to extract the off-diagonal right-hand sides (Bs and Cs) from the banded matrix (on the GPU)
• T-LU: time for the LU factorization
• GFlps-LU: LU GFLOPs
• T-SPK: time to solve for the spikes Vs and Ws (on the GPU)
• T-LUrdcd: time to factorize the reduced matrices (on the GPU)
• T-PreP: the sum of all preprocessing times
• Kry-M: the method used in the Krylov solve stage: BiCGStab2 (0), BiCGStab (1), or CG (2)
• nItrs: the number of Krylov-solve iterations to solve the problem
• T-Kry: time spent in the Krylov solver (on the GPU)
• Total: total time to solve the problem, i.e. T-PreP + T-Kry
• Pardiso: the time taken by the commercial solver Pardiso on the same problem
• SlwD: the ratio of our solver's time to Pardiso's (a value less than 1 means we are faster than Pardiso; the value is shown in green if we are faster and in red if we are more than 5 times slower)
• Fastest: the fastest time SaP has achieved historically on this problem
• SpdUp: the speedup of this run compared to the historical fastest run (the value is shown in green if the speedup is more than 5% and shown in red if the slowdown is more than 5%)
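K-DB, KnoDrp and K are the same measurement, the half-bandwidth, taken at different points in the pipeline, and reordering aims to make it smaller. A minimal Python sketch (the helper names and the example matrix are hypothetical) that computes K from a list of non-zero positions and applies a symmetric permutation; the hand-picked ordering only illustrates the effect and is not the DB or CM algorithm:

```python
from typing import Iterable, List, Tuple

Entry = Tuple[int, int]  # (row, column) position of a non-zero, 0-based


def half_bandwidth(entries: Iterable[Entry]) -> int:
    """Half-bandwidth K = max |i - j| over all non-zero positions (0 if empty)."""
    return max((abs(i - j) for i, j in entries), default=0)


def permute_symmetric(entries: Iterable[Entry], order: List[int]) -> List[Entry]:
    """Apply the symmetric permutation P A P^T, where order[new] = old index."""
    new_index = [0] * len(order)
    for new, old in enumerate(order):
        new_index[old] = new
    return [(new_index[i], new_index[j]) for i, j in entries]


if __name__ == "__main__":
    # Hypothetical example: a path graph 0-2-1-3 stored in a poor order.
    a = [(i, i) for i in range(4)]
    a += [(0, 2), (2, 0), (2, 1), (1, 2), (1, 3), (3, 1)]
    print(half_bandwidth(a))                                   # 2
    print(half_bandwidth(permute_symmetric(a, [0, 2, 1, 3])))  # 1
```

Relabeling the path 0-2-1-3 as 0-1-2-3 moves every off-diagonal entry next to the diagonal, so K drops from 2 to 1.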


NOTES:
1) nuKf = 1/(2KN)*[sum over i from 1 to N of (2K - K_{iLeft} - K_{iRight})], where K_{iLeft} is the row half-bandwidth to the left of the diagonal while K_{iRight} is the row half-bandwidth to the right of the diagonal.
2) FRate = the actual number of NNZ / ((2K+1)N).
3) All times reported are in milliseconds (1E-3 second)
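The formulas in notes 1) and 2) can be evaluated directly from the non-zero pattern. A minimal Python sketch, assuming 0-based indices and taking K_{iLeft} and K_{iRight} as the distance from the diagonal to the farthest non-zero on each side of row i (the function names are hypothetical):

```python
from typing import List, Tuple

Entry = Tuple[int, int]  # (row, column) position of a non-zero, 0-based


def row_half_bandwidths(n: int, entries: List[Entry]) -> Tuple[List[int], List[int]]:
    """Per-row K_iLeft and K_iRight: farthest non-zero left/right of the diagonal."""
    left, right = [0] * n, [0] * n
    for i, j in entries:
        if j < i:
            left[i] = max(left[i], i - j)
        elif j > i:
            right[i] = max(right[i], j - i)
    return left, right


def nukf_and_frate(n: int, entries: List[Entry]) -> Tuple[float, float]:
    """Return (nuKf, FRate) following notes 1) and 2)."""
    left, right = row_half_bandwidths(n, entries)
    k = max(left + right)                          # overall half-bandwidth K
    nnz = len(set(entries))                        # ignore duplicate positions
    frate = nnz / ((2 * k + 1) * n)                # FRate = NNZ / ((2K+1)N)
    if k == 0:
        return 0.0, frate  # nuKf is undefined for K = 0; report 0 (assumption)
    nukf = sum(2 * k - l - r for l, r in zip(left, right)) / (2 * k * n)
    return nukf, frate


if __name__ == "__main__":
    tridiagonal = [(0, 0), (0, 1), (1, 0), (1, 1), (1, 2), (2, 1), (2, 2)]
    print(nukf_and_frate(3, tridiagonal))
```

For a 3x3 tridiagonal pattern this gives nuKf = 1/3 (the first and last rows each have only one off-diagonal neighbor) and FRate = 7/9. A 4x4 diagonal matrix with one extra entry at (0, 3) gives K = 3 and nuKf = 0.875, since only one row actually reaches the full band.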


Results (all times in ms; a blank cell means no value was reported for that run):

N        K    d      NPrtns  Solves  T-LU     T-SwDef  T-MMDef  T-PreP   nItrs  T-SwInf  T-MVInf  T-Kry    T-KryPIt  T-Total
-------  ---  -----  ------  ------  -------  -------  -------  -------  -----  -------  -------  -------  --------  -------
1000000  10   0.008  1       OK      99.9803  111.812  118.953  371.053  0.75   46.5699  149.601  231.687  231.687   602.74
1000000  10   0.008  1       OK      1518.75                    1536.19  1.75                     10588.2  6050.41   12124.4
1000000  10   0.006  1       OK      80.6893  113.343  118.609  353.046  0.75   47.1849  151.977  234.771  234.771   587.817
1000000  10   0.006  1       OK      1517.3                     1533.23  0.75                     5895.31  5895.31   7428.54
1000000  10   0.004  1       OK      79.4316  111.782  116.856  348.133  0.75   46.5026  149.437  231.241  231.241   579.374
1000000  10   0.004  1       OK      1514.86                    1530.83  1.75                     10577    6043.99   12107.8
1000000  10   0.002  1       OK      79.4082  111.82   117.413  348.709  0.75   46.5114  149.489  231.212  231.212   579.921
1000000  10   0.002  1       OK      1514.78                    1530.67  1.75                     10590.5  6051.7    12121.2
1000000  10   0.001  1       OK      79.7871  111.824  117.543  349.255  0.75   46.5275  149.504  231.223  231.223   580.478
1000000  10   0.001  1       OK      1516.53                    1532.67  0.75                     5886.24  5886.24   7418.91
1000000  20   0.008  1       OK      118.605  193.51   157.667  514.961  0.75   42.8459  163.57   252.923  252.923   767.884
1000000  20   0.008  1       OK      1654.01                    1683.33  1.75                     11015.7  6294.7    12699.1
1000000  20   0.006  1       OK      140.664  193.099  166.552  545.647  0.75   42.9932  163.805  254.516  254.516   800.163
1000000  20   0.006  1       OK      1654.27                    1683.59  0.75                     6126.79  6126.79   7810.38
1000000  20   0.004  1       OK      119.923  195.225  159.322  519.978  0.75   43.2899  165.845  256.004  256.004   775.982
1000000  20   0.004  1       OK      1654.8                     1684     2.75                     15902.9  5782.88   17586.9
1000000  20   0.002  1       OK      140.926  193.436  158.107  537.974  0.75   42.8047  163.551  252.788  252.788   790.762
1000000  20   0.002  1       OK      1654.76                    1684.11  0.75                     6120.32  6120.32   7804.43
1000000  20   0.001  1       OK      141.351  193.471  158.812  539.241  0.75   42.9621  163.789  253.451  253.451   792.692
1000000  20   0.001  1       OK      1678.46                    1707.81  1.75                     11138.1  6364.61   12845.9
1000000  50   0.008  1       OK      268.243  290.868  368.827  1018.61  0.75   49.9164  166.155  282.23   282.23    1300.84
1000000  50   0.008  1       OK      5924.53                    5992.77  0.75                     7128.53  7128.53   13121.3
1000000  50   0.006  1       OK      289.034  290.571  368.995  1039.43  0.75   50.0369  166.346  282.657  282.657   1322.09
1000000  50   0.006  1       OK      5925.15                    5993.6   0.75                     7125.4   7125.4    13119
1000000  50   0.004  1       OK      268.11   290.706  368.302  1017.89  0.75   49.9056  166.141  282.281  282.281   1300.17
1000000  50   0.004  1       OK      5925.19                    5993.42  0.75                     7123.58  7123.58   13117
1000000  50   0.002  1       OK      269.338  290.823  370.418  1024.79  0.75   49.9228  166.146  281.986  281.986   1306.78
1000000  50   0.002  1       OK      5922.28                    5990.73  0.75                     7126.72  7126.72   13117.5
1000000  50   0.001  1       OK      271.003  293.042  373.129  1028.01  0.75   50.3509  168.453  285.43   285.43    1313.44
1000000  50   0.001  1       OK      5929.16                    5997.65  0.75                     7140.04  7140.04   13137.7
1000000  100  0.008  1       OK      560.529  1023.66  849.364  2659.95  0.75   71.4263  205.894  387.399  387.399   3047.35
1000000  100  0.008  1       OK      5233.76                    5368.74  0.75                     7318.91  7318.91   12687.6
1000000  100  0.006  1       OK      561.079  1024.11  849.34   2661.67  0.75   71.5038  206.021  387.492  387.492   3049.16
1000000  100  0.006  1       OK      5229.79                    5364.43  0.75                     7316     7316      12680.4
1000000  100  0.004  1       OK      580.23   1024.63  849.518  2681.79  0.75   71.4836  206.1    387.493  387.493   3069.28
1000000  100  0.004  1       OK      5282.01                    5416.68  0.75                     7356.29  7356.29   12773
1000000  100  0.002  1       OK      566.728  1030.41  859.264  2683.49  0.75   71.778   207.769  390.811  390.811   3074.31
1000000  100  0.002  1       OK      5230.42                    5364.9   0.75                     7315.91  7315.91   12680.8
1000000  100  0.001  1       OK      560.601  1023.61  849.445  2659.8   0.75   71.4096  205.888  387.196  387.196   3046.99
1000000  100  0.001  1       OK      5233.21                    5367.79  0.75                     7320.89  7320.89   12688.7
1000000  200  0.008  1       OoM (in setup stage)
1000000  200  0.008  1       OoM (in setup stage)
1000000  200  0.006  1       OoM (in setup stage)
1000000  200  0.006  1       OoM (in setup stage)
1000000  200  0.004  1       OoM (in setup stage)
1000000  200  0.004  1       OoM (in setup stage)
1000000  200  0.002  1       OoM (in setup stage)
1000000  200  0.002  1       OoM (in setup stage)
1000000  200  0.001  1       OoM (in setup stage)
1000000  200  0.001  1       OoM (in setup stage)
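The legend states that the total time is T-PreP + T-Kry, which gives a quick way to sanity-check a transcribed row. A minimal Python sketch (the helper name `check_row` and the choice of rows are illustrative) that tests this relation on four rows copied from the table. It also tests a pattern the numbers appear to follow but which the legend does not define: T-KryPIt equals T-Kry divided by nItrs, with nItrs treated as 1 when it is below 1. Treat that second check as an inference from the data, not a definition.

```python
import math
from typing import Tuple

# (label, T-PreP, nItrs, T-Kry, T-KryPIt, T-Total), values in ms copied from the table
ROWS = [
    ("K=10, d=0.008 (first of pair)", 371.053, 0.75, 231.687, 231.687, 602.74),
    ("K=10, d=0.008 (second of pair)", 1536.19, 1.75, 10588.2, 6050.41, 12124.4),
    ("K=20, d=0.004 (second of pair)", 1684.0, 2.75, 15902.9, 5782.88, 17586.9),
    ("K=100, d=0.001 (first of pair)", 2659.8, 0.75, 387.196, 387.196, 3046.99),
]


def check_row(t_prep: float, n_itrs: float, t_kry: float,
              t_kry_pit: float, t_total: float,
              rel_tol: float = 1e-4) -> Tuple[bool, bool]:
    """Return (total_ok, per_iteration_ok) for one table row.

    total_ok: documented relation T-Total = T-PreP + T-Kry.
    per_iteration_ok: observed (undocumented) pattern
    T-KryPIt = T-Kry / max(nItrs, 1).
    The tolerance absorbs rounding in the reported values.
    """
    total_ok = math.isclose(t_prep + t_kry, t_total, rel_tol=rel_tol)
    pit_ok = math.isclose(t_kry / max(n_itrs, 1.0), t_kry_pit, rel_tol=rel_tol)
    return total_ok, pit_ok


if __name__ == "__main__":
    for label, *values in ROWS:
        print(label, check_row(*values))
```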