2-5-2016 Banded Results BCR Test



• Name: the name of the matrix (test)
• N: the dimension of the matrix (number of rows and columns)
• NNZ: number of non-zeros
• SPD: whether the matrix is specified by the user to be symmetric positive definite (values: 0 or 1)
• DB: indicates whether DB reordering is performed. Values: 0 or 1.
• K-DB: the half-bandwidth after DB reordering (without any drop-off). If DB reordering is not performed, this reports the original half-bandwidth
• KnoDrp: the half-bandwidth after DB and CM reordering but before drop-off
• K: the half-bandwidth after reordering and drop-off
• FRate: fill-in rate. See NOTES below
• nuKf: non-uniform K factor, indicating how much the half-bandwidth varies from row to row. Values lie between 0 and 1, with 0 indicating a perfectly uniform bandwidth over the entire matrix. See NOTES below
• Solves: whether the problem was solved. OK means it was solved successfully; otherwise, the reason for the failure is given
• Bstng: whether diagonal boosting is enabled during factorization. Values: 0 or 1
• SolAcc: infinity norm of the array storing the relative errors
• T-DB: time to run DB reordering for the matrix on the CPU
• T-CM: time to run CM reordering for the matrix on the CPU
• T-Drop: time to drop off-diagonal elements in order to reduce the bandwidth, on the CPU
• T-Dtransf: time to transfer data from the CPU to the GPU
• T-Asmbl: time to copy the sparse matrix, after reordering and drop-off, into a banded matrix stored in GPU memory
• LU-M: LU method (complete, ILUT or ILUULT)
• Fill-in: the fill-in factor of ILUT (-1 indicates complete LU)
• NPrtns: the number of partitions used to solve the problem
• T-BC: time to extract the off-diagonal right-hand sides (Bs and Cs) from the banded matrix, on the GPU
• T-LU: time for the LU factorization
• GFlps-LU: LU GFLOPs
• T-SPK: time to solve for the spikes Vs and Ws, on the GPU
• T-LUrdcd: time to factorize the reduced matrices, on the GPU
• T-PreP: the sum of all preprocessing times
• Kry-M: the method used in the Krylov solve stage: BiCGStab2 (0), BiCGStab (1), or CG (2)
• nItrs: the number of Krylov iterations needed to solve the problem
• T-Kry: time spent in the Krylov solver (on the GPU)
• Total: total time to solve the problem, i.e., T-PreP + T-Kry
• Pardiso: the time for the commercial tool "Pardiso" to solve the problem
• SlwD: the slowdown of our solver relative to Pardiso (a value less than 1 means that we are faster than Pardiso). The value is shown in green if we run faster and in red if we run more than 5 times slower.
• Fastest: the historically fastest time achieved by SaP
• SpdUp: the speedup of this run compared to the historical fastest run (the value is shown in green if the speedup is more than 5% and shown in red if the slowdown is more than 5%)
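The SlwD and SpdUp columns and their color coding reduce to two ratios and a few thresholds. The sketch below is a minimal illustration, assuming SlwD = SaP time / Pardiso time and SpdUp = historical fastest time / this run's time, and reading "slowdown of more than 5%" as SpdUp < 0.95 (an assumption). The function names are hypothetical and not part of SaP.

```python
from typing import Optional


def slowdown(t_sap: float, t_pardiso: float) -> float:
    """SlwD: SaP time relative to Pardiso; below 1 means SaP is faster."""
    return t_sap / t_pardiso


def slowdown_color(slwd: float) -> Optional[str]:
    if slwd < 1.0:
        return "green"  # faster than Pardiso
    if slwd > 5.0:
        return "red"    # more than 5 times slower than Pardiso
    return None         # no highlighting


def speedup(t_fastest: float, t_run: float) -> float:
    """SpdUp: historical fastest time divided by this run's time (assumed definition)."""
    return t_fastest / t_run


def speedup_color(spdup: float) -> Optional[str]:
    if spdup > 1.05:
        return "green"  # more than 5% faster than the historical best
    if spdup < 0.95:
        return "red"    # more than 5% slower (assumed threshold)
    return None


# Usage: SaP at 50 ms vs Pardiso at 100 ms, and a run 10% faster than the record.
print(slowdown_color(slowdown(50.0, 100.0)), speedup_color(speedup(110.0, 100.0)))
```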


NOTES:
1) nuKf = 1/(2KN)*[sum over i from 1 to N of (2K - K_{iLeft} - K_{iRight})], where K_{iLeft} is the row half-bandwidth to the left of the diagonal while K_{iRight} is the row half-bandwidth to the right of the diagonal.
2) FRate = NNZ / ((2K+1)N), i.e., the actual number of non-zeros divided by the (2K+1)N entries used to store the band.
3) All times reported are in milliseconds (1E-3 second)
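Both formulas in NOTES can be evaluated directly from a sparsity pattern. The sketch below assumes the non-zeros are given as distinct 0-based (row, column) pairs and that K is the largest distance of any non-zero from the diagonal; `band_stats` is a hypothetical helper, not part of SaP, and the handling of a purely diagonal matrix (K = 0) is an assumption because the nuKf formula divides by K.

```python
from typing import List, Sequence, Tuple


def band_stats(n: int, entries: Sequence[Tuple[int, int]]) -> Tuple[int, float, float]:
    """Return (K, nuKf, FRate) for an n x n matrix whose non-zeros sit at the
    given 0-based (row, col) positions (assumed distinct, n >= 1)."""
    k_left: List[int] = [0] * n   # K_{iLeft}: furthest non-zero left of the diagonal in row i
    k_right: List[int] = [0] * n  # K_{iRight}: furthest non-zero right of the diagonal in row i
    for i, j in entries:
        if j < i:
            k_left[i] = max(k_left[i], i - j)
        elif j > i:
            k_right[i] = max(k_right[i], j - i)

    k = max(max(k_left), max(k_right))  # half-bandwidth K of the whole matrix

    if k == 0:
        nukf = 0.0  # diagonal matrix: formula undefined, treated as uniform (assumption)
    else:
        # nuKf = 1/(2KN) * sum_i (2K - K_{iLeft} - K_{iRight})
        nukf = sum(2 * k - kl - kr for kl, kr in zip(k_left, k_right)) / (2 * k * n)

    # FRate = NNZ / ((2K + 1) N)
    frate = len(entries) / ((2 * k + 1) * n)
    return k, nukf, frate


# Usage: a 4 x 4 tridiagonal pattern.
tri = [(i, j) for i in range(4) for j in range(4) if abs(i - j) <= 1]
print(band_stats(4, tri))
```

For the 4 x 4 tridiagonal pattern, K = 1, nuKf = 0.25 (only the first and last rows are narrower than the band) and FRate = 10/12.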


N       K    d     NPrtns  Solves  T-LU     T-SwDef  T-MMDef  T-PreP   nItrs  T-SwInf  T-MVInf  T-Kry    T-KryPIt  T-Total
500000  10   0.08  1       OK      54.1894  56.6664  64.0629  196.799  0.75   24.503   76.4015  122.434  122.434   319.233
500000  10   0.08  1       OK      759.047  -        -        767.825  2.75   -        -        7659.41  2785.24   8427.23
500000  10   0.06  1       OK      48.5217  56.6569  63.3497  190.101  0.75   24.4693  76.34    122.181  122.181   312.282
500000  10   0.06  1       OK      758.436  -        -        767.225  0.75   -        -        2943.98  2943.98   3711.21
500000  10   0.04  1       OK      43.2565  56.6946  63.0661  184.568  0.75   24.4433  76.2391  122.037  122.037   306.605
500000  10   0.04  1       OK      767.719  -        -        776.505  2.75   -        -        7735.94  2813.07   8512.45
500000  10   0.02  1       OK      54.0404  56.6479  63.1539  195.638  0.75   24.4269  76.2225  121.884  121.884   317.522
500000  10   0.02  1       OK      757.779  -        -        766.568  2.75   -        -        7654     2783.27   8420.56
500000  10   0.01  1       OK      53.8457  56.6606  64.0978  196.498  0.75   24.5131  76.4177  122.463  122.463   318.961
500000  10   0.01  1       OK      758.641  -        -        767.397  1.25   -        -        4124.73  3299.79   4892.13
500000  20   0.08  1       OK      64.0145  98.5108  90.2292  277.324  0.75   22.9954  83.2662  133.501  133.501   410.825
500000  20   0.08  1       OK      827.916  -        -        843.659  0.75   -        -        3061.75  3061.75   3905.41
500000  20   0.06  1       OK      74.6267  98.374   91.0998  288.936  0.75   23.0767  83.4032  133.693  133.693   422.629
500000  20   0.06  1       OK      827.265  -        -        842.884  1.75   -        -        5510.03  3148.59   6352.91
500000  20   0.04  1       OK      74.3864  98.3311  90.593   288.067  0.75   22.9724  83.2039  133.174  133.174   421.241
500000  20   0.04  1       OK      828.05   -        -        843.763  2.75   -        -        7958.91  2894.15   8802.68
500000  20   0.02  1       OK      74.2546  98.3352  91.2356  288.741  1.75   41.5055  150.064  238.415  136.237   527.156
500000  20   0.02  1       OK      827.491  -        -        843.115  1.75   -        -        5514.52  3151.15   6357.63
500000  20   0.01  1       OK      74.1594  98.3124  90.6977  288.02   0.75   22.9782  83.1988  133.21   133.21    421.23
500000  20   0.01  1       OK      828.008  -        -        843.706  1.75   -        -        5510.32  3148.76   6354.03
500000  50   0.08  1       OK      137.873  149.205  193.217  526.97   0.75   27.6907  84.3658  147.291  147.291   674.261
500000  50   0.08  1       OK      2961.3   -        -        2996.32  1.75   -        -        6413.63  3664.93   9409.94
500000  50   0.06  1       OK      149.083  150.44   195.297  541.586  0.75   27.9293  85.497   148.21   148.21    689.796
500000  50   0.06  1       OK      2962.88  -        -        2997.8   0.75   -        -        3564.95  3564.95   6562.75
500000  50   0.04  1       OK      147.821  149.187  193.759  537.511  0.75   27.8263  84.5668  147.435  147.435   684.946
500000  50   0.04  1       OK      3006.74  -        -        3041.7   0.75   -        -        3607.97  3607.97   6649.67
500000  50   0.02  1       OK      137.846  149.35   193.132  526.919  0.75   27.7545  84.409   146.818  146.818   673.737
500000  50   0.02  1       OK      2960.81  -        -        2995.79  2.75   -        -        9262.72  3368.26   12258.5
500000  50   0.01  1       OK      147.499  149.381  193.269  536.833  0.75   27.7221  84.3614  146.734  146.734   683.567
500000  50   0.01  1       OK      2960.4   -        -        2995.34  0.75   -        -        3563.1   3563.1    6558.43
500000  100  0.08  1       OK      293.743  525.132  432.228  1365.42  0.75   40.1337  104.378  198.705  198.705   1564.12
500000  100  0.08  1       OK      2633.58  -        -        2701.6   1.75   -        -        6585.72  3763.27   9287.31
500000  100  0.06  1       OK      289.14   529.322  437.447  1370.51  0.75   40.3706  105.418  199.745  199.745   1570.25
500000  100  0.06  1       OK      2634.03  -        -        2702.27  0.75   -        -        3657.35  3657.35   6359.62
500000  100  0.04  1       OK      285.153  524.89   432.509  1356.9   0.75   40.1117  104.342  198.596  198.596   1555.5
500000  100  0.04  1       OK      2617.75  -        -        2685.77  1.75   -        -        6585.44  3763.11   9271.21
500000  100  0.02  1       OK      288.561  528.613  437.389  1369.38  0.75   40.374   105.307  199.999  199.999   1569.38
500000  100  0.02  1       OK      2616.67  -        -        2684.98  1.75   -        -        6587.27  3764.15   9272.25
500000  100  0.01  1       OK      293.819  524.95   432.174  1366.23  0.75   40.1003  104.364  198.682  198.682   1564.91
500000  100  0.01  1       OK      2617.67  -        -        2685.95  0.75   -        -        3661.48  3661.48   6347.43
500000  200  0.08  1       OK      917.847  1431.74  1162.34  3813.98  0.25   45.9485  88.7145  191.841  191.841   4005.82
500000  200  0.08  1       OK      3455.31  -        -        3593.43  0.25   -        -        2321.6   2321.6    5915.03
500000  200  0.06  1       OK      918.295  1430.52  1161.03  3812.25  0.25   46.039   88.8082  191.562  191.562   4003.81
500000  200  0.06  1       OK      3449.96  -        -        3587.78  0.25   -        -        2319.42  2319.42   5907.21
500000  200  0.04  1       OK      918.062  1432.36  1160.45  3812.6   0.75   76.52    147.863  318.565  318.565   4131.17
500000  200  0.04  1       OK      3453.43  -        -        3591.42  1.75   -        -        6962.81  3978.75   10554.2
500000  200  0.02  1       OK      910.569  1431.03  1160.91  3804.46  0.75   76.6964  148.04   319.124  319.124   4123.58
500000  200  0.02  1       OK      3454.14  -        -        3592.12  0.75   -        -        3870.53  3870.53   7462.65
500000  200  0.01  1       OK      919.269  1432.81  1161.11  3813.94  1.25   107.393  207.228  444.778  355.822   4258.72
500000  200  0.01  1       OK      3451.91  -        -        3590.02  0.75   -        -        3866.92  3866.92   7456.94
500000  500  0.08  1       OoM (in setup stage)
500000  500  0.08  1       OoM (in setup stage)
500000  500  0.06  1       OoM (in setup stage)
500000  500  0.06  1       OoM (in setup stage)
500000  500  0.04  1       OoM (in setup stage)
500000  500  0.04  1       OoM (in setup stage)
500000  500  0.02  1       OoM (in setup stage)
500000  500  0.02  1       OoM (in setup stage)
500000  500  0.01  1       OoM (in setup stage)
500000  500  0.01  1       OoM (in setup stage)
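The timing columns in the table above can be cross-checked against each other. T-Total = T-PreP + T-Kry follows the glossary definition of Total. The second relation, T-KryPIt = T-Kry / max(nItrs, 1), is only inferred from the reported numbers (rows with nItrs below 1 report T-KryPIt equal to T-Kry) and is not a documented definition. The sketch below checks both on a handful of rows copied from the table, with a tolerance that allows for the roughly six significant digits printed; `row_is_consistent` is a hypothetical helper.

```python
import math

# (T-PreP, nItrs, T-Kry, T-KryPIt, T-Total) for a few rows of the table above.
ROWS = [
    (196.799, 0.75, 122.434, 122.434, 319.233),  # K=10, d=0.08, first row
    (767.825, 2.75, 7659.41, 2785.24, 8427.23),  # K=10, d=0.08, second row
    (767.397, 1.25, 4124.73, 3299.79, 4892.13),  # K=10, d=0.01, second row
    (288.741, 1.75, 238.415, 136.237, 527.156),  # K=20, d=0.02, first row
    (3813.94, 1.25, 444.778, 355.822, 4258.72),  # K=200, d=0.01, first row
    (3593.43, 0.25, 2321.6, 2321.6, 5915.03),    # K=200, d=0.08, second row
]

REL_TOL = 1e-4  # the table prints about six significant digits


def row_is_consistent(t_prep, n_itrs, t_kry, t_kry_pit, t_total):
    # Documented in the glossary: Total = T-PreP + T-Kry.
    total_ok = math.isclose(t_prep + t_kry, t_total, rel_tol=REL_TOL)
    # Inferred from the numbers only: runs with fewer than one full iteration
    # report T-KryPIt equal to T-Kry, i.e. the divisor is max(nItrs, 1).
    per_it_ok = math.isclose(t_kry / max(n_itrs, 1.0), t_kry_pit, rel_tol=REL_TOL)
    return total_ok and per_it_ok


for row in ROWS:
    print(row, row_is_consistent(*row))
```

Dividing by the raw fractional nItrs would not reproduce the table: for the first row, 122.434 / 0.75 is about 163.2, not the reported 122.434.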