3-23-2016 Banded Results BCR Test



• Name: the name of the matrix (test)
• N: the dimension of the matrix (number of rows and columns)
• NNZ: number of non-zeros
• SPD: whether the matrix is specified by the user to be symmetric positive definite (values: 0 or 1)
• DB: indicates whether DB reordering is performed (values: 0 or 1)
• K-DB: the half-bandwidth after DB reordering (without any drop-off). If DB is not executed, this reports the original half-bandwidth
• KnoDrp: the half-bandwidth after DB and CM reordering but before drop-off
• K: the half-bandwidth after reordering and drop-off
• FRate: fill-in rate. See NOTES below
• nuKf: non-uniform K factor. Indicates whether the K changes a lot from row to row. Values are between 0 and 1, with 0 indicating a perfectly uniform bandwidth over the entire matrix. See NOTES below
• Solves: indicates whether the problem was solved. OK means it was solved; otherwise the reason for the failure is given
• Bstng: indicates whether diagonal boosting is enabled during factorization (values: 0 or 1)
• SolAcc: infinity norm of the array storing the relative errors
• T-DB: time to run DB reordering for the matrix on the CPU
• T-CM: time to run CM reordering for the matrix on the CPU
• T-Drop: time to drop off-diagonal elements in order to decrease the bandwidth (done on the CPU)
• T-Dtransf: data transfer from CPU to GPU
• T-Asmbl: after reordering and drop-off, copy the sparse matrix to banded matrix stored in GPU memory
• LU-M: LU method (complete, ILUT or ILUULT)
• Fill-in: the fill-in factor of ILUT (-1 indicates complete LU)
• NPrtns: the number of partitions used to solve the problem
• T-BC: time required to get off-diagonal right hand sides (Bs and Cs) from the banded matrix - done on the GPU
• T-LU: LU time
• GFlps-LU: LU GFLOPs
• T-SPK: time to solve for the spikes Vs and Ws - done on the GPU
• T-LUrdcd: time required to factorize the reduced matrices - done on the GPU
• T-PreP: the sum of all preprocessing times, see NOTES
• Kry-M: the method used in Krylov solving stage (can be BiCGStab2 (0), BiCGStab (1), or CG(2))
• nItrs: the number of Krylov-solve iterations to solve the problem
• T-Kry: time spent in the Krylov solver (on the GPU)
• Total: total time to solve the problem, the sum of T-PreP and T-Kry
• Pardiso: the time for the commercial tool "Pardiso" to solve the problem
• SlwD: the slowdown ratio of our solver relative to Pardiso. A value less than one means that we are faster than Pardiso. The value is shown in green if we run faster and in red if we run more than 5 times slower.
• Fastest: the fastest time historically recorded for SaP
• SpdUp: the speedup of this run compared to the historical fastest run (the value is shown in green if the speedup is more than 5% and shown in red if the slowdown is more than 5%)
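
As an illustration of the SlwD entry, here is a minimal sketch of how the ratio and its colour could be derived. The function names and the sample timings are hypothetical and are not taken from the SaP code or from any measurement; the sketch assumes SlwD is our total time divided by the Pardiso time, which is what "a value less than one means that we are faster" implies.

```python
def slowdown_ratio(total_ms, pardiso_ms):
    """SlwD as read from the legend: our total time over the Pardiso time."""
    if pardiso_ms <= 0:
        raise ValueError("Pardiso time must be positive")
    return total_ms / pardiso_ms


def slowdown_colour(slwd):
    """Colour rule from the legend: green if faster, red if more than 5x slower."""
    if slwd < 1.0:
        return "green"
    if slwd > 5.0:
        return "red"
    return "plain"  # hypothetical label for values that are not highlighted


# Made-up timings in milliseconds, for illustration only.
example_ratio = slowdown_ratio(50.0, 100.0)
example_colour = slowdown_colour(example_ratio)
```

Values between 1 and 5 get no highlight under this reading of the legend.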


NOTES:
1) nuKf = 1/(2KN)*[sum over i from 1 to N of (2K - K_{iLeft} - K_{iRight})], where K_{iLeft} is the row half-bandwidth to the left of the diagonal while K_{iRight} is the row half-bandwidth to the right of the diagonal.
2) FRate = NNZ / ((2K+1)N), i.e. the actual number of non-zeros divided by the number of entries in the band.
3) All times reported are in milliseconds (1E-3 second).
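
The formulas in notes 1) and 2) can be sketched in a few lines of Python. This is only an illustration: the function name and the input format (a list of (row, column) positions of the non-zeros) are assumptions, and K is taken here to be the largest row half-bandwidth on either side of the diagonal.

```python
def band_metrics(n, entries):
    """Return (K, nuKf, FRate) for an n x n matrix with n >= 1.

    entries: iterable of (row, col) index pairs, one per non-zero.
    """
    left = [0] * n   # K_iLeft: row half-bandwidth to the left of the diagonal
    right = [0] * n  # K_iRight: row half-bandwidth to the right of the diagonal
    nnz = 0
    for i, j in entries:
        nnz += 1
        if j < i:
            left[i] = max(left[i], i - j)
        else:
            right[i] = max(right[i], j - i)

    k = max(max(left), max(right))  # assumed definition of K
    if k == 0:
        nu_kf = 0.0  # diagonal matrix: the formula would divide by zero
    else:
        nu_kf = sum(2 * k - l - r for l, r in zip(left, right)) / (2 * k * n)
    f_rate = nnz / ((2 * k + 1) * n)
    return k, nu_kf, f_rate


# 4 x 4 tridiagonal matrix: diagonal, superdiagonal and subdiagonal.
tridiag = [(i, i) for i in range(4)]
tridiag += [(i, i + 1) for i in range(3)]
tridiag += [(i + 1, i) for i in range(3)]
k, nu_kf, f_rate = band_metrics(4, tridiag)
```

For this small tridiagonal example the first and last rows each miss one side of the band, so the formula as written gives a nuKf slightly above zero even though the interior rows are perfectly uniform.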


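A dash (-) marks a cell for which no value is reported.

N | K | d | NPrtns | Solves | RelR | T-LU | T-SwDef | T-MMDef | T-PreP | nItrs | T-SwInf | T-MVInf | T-Kry | T-KryPIt | T-Total
200000 | 50 | 1.2 | 50 | - | - | - | - | - | - | - | - | - | - | - | -
200000 | 50 | 1.2 | 80 | OK | 1.81046e-11 | 50.2802 | - | - | 84.441 | 1.75 | - | - | 102.791 | 58.7377 | 187.232
200000 | 200 | 1.2 | 50 | - | - | - | - | - | - | - | - | - | - | - | -
200000 | 200 | 1.2 | 80 | OK | 2.50688e-12 | 362.847 | - | - | 492.936 | 1.75 | - | - | 174.186 | 99.5349 | 667.122
200000 | 50 | 1 | 50 | - | - | - | - | - | - | - | - | - | - | - | -
200000 | 50 | 1 | 80 | OK | 7.84293e-11 | 50.1698 | - | - | 84.389 | 1.75 | - | - | 102.642 | 58.6526 | 187.031
200000 | 200 | 1 | 50 | - | - | - | - | - | - | - | - | - | - | - | -
200000 | 200 | 1 | 80 | OK | 1.52629e-11 | 366.935 | - | - | 496.966 | 1.75 | - | - | 175.067 | 100.038 | 672.033
200000 | 50 | 0.8 | 50 | - | - | - | - | - | - | - | - | - | - | - | -
200000 | 50 | 0.8 | 80 | OK | 4.73113e-10 | 50.1831 | - | - | 84.28 | 1.75 | - | - | 102.606 | 58.632 | 186.886
200000 | 200 | 0.8 | 50 | - | - | - | - | - | - | - | - | - | - | - | -
200000 | 200 | 0.8 | 80 | OK | 2.38516e-10 | 361.958 | - | - | 492 | 1.75 | - | - | 174.179 | 99.5309 | 666.179
200000 | 50 | 0.6 | 50 | - | - | - | - | - | - | - | - | - | - | - | -
200000 | 50 | 0.6 | 80 | OK | 7.94136e-10 | 50.81 | - | - | 84.909 | 2.25 | - | - | 139.633 | 62.0591 | 224.542
200000 | 200 | 0.6 | 50 | - | - | - | - | - | - | - | - | - | - | - | -
200000 | 200 | 0.6 | 80 | OK | 2.67732e-12 | 365.838 | - | - | 495.831 | 2.25 | - | - | 207.711 | 92.316 | 703.542
200000 | 50 | 0.4 | 50 | - | - | - | - | - | - | - | - | - | - | - | -
200000 | 50 | 0.4 | 80 | OK | 1.49434e-10 | 49.9666 | - | - | 84.225 | 2.75 | - | - | 143.263 | 52.0956 | 227.488
200000 | 200 | 0.4 | 50 | - | - | - | - | - | - | - | - | - | - | - | -
200000 | 200 | 0.4 | 80 | OK | 1.01774e-10 | 361.901 | - | - | 492.194 | 2.25 | - | - | 207.81 | 92.36 | 700.004
200000 | 50 | 0.2 | 50 | - | - | - | - | - | - | - | - | - | - | - | -
200000 | 50 | 0.2 | 80 | OK | 6.36112e-10 | 50.5178 | - | - | 84.732 | 4.75 | - | - | 226.077 | 47.5952 | 310.809
200000 | 200 | 0.2 | 50 | - | - | - | - | - | - | - | - | - | - | - | -
200000 | 200 | 0.2 | 80 | OK | 5.10344e-10 | 361.904 | - | - | 492.064 | 2.75 | - | - | 244.255 | 88.82 | 736.319
200000 | 50 | 0.1 | 50 | - | - | - | - | - | - | - | - | - | - | - | -
200000 | 50 | 0.1 | 80 | NConv | 1 | 50.0296 | - | - | 84.182 | 101.25 | - | - | 4095.96 | 40.4539 | 4180.14
200000 | 200 | 0.1 | 50 | - | - | - | - | - | - | - | - | - | - | - | -
200000 | 200 | 0.1 | 80 | OK | 5.96008e-10 | 361.517 | - | - | 491.804 | 4.75 | - | - | 382.736 | 80.576 | 874.54
200000 | 50 | 0.08 | 50 | - | - | - | - | - | - | - | - | - | - | - | -
200000 | 50 | 0.08 | 80 | NConv | 1 | 50.0118 | - | - | 84.157 | 101.25 | - | - | 4105.11 | 40.5443 | 4189.26
200000 | 200 | 0.08 | 50 | - | - | - | - | - | - | - | - | - | - | - | -
200000 | 200 | 0.08 | 80 | OK | 5.41432e-10 | 363.035 | - | - | 494.794 | 6.75 | - | - | 534.329 | 79.1599 | 1029.12
200000 | 50 | 0.06 | 50 | - | - | - | - | - | - | - | - | - | - | - | -
200000 | 50 | 0.06 | 80 | NConv | 1 | 50.6959 | - | - | 84.986 | 101.25 | - | - | 4092.15 | 40.4163 | 4177.14
200000 | 200 | 0.06 | 50 | - | - | - | - | - | - | - | - | - | - | - | -
200000 | 200 | 0.06 | 80 | NConv | 1.7933e-09 | 366.599 | - | - | 498.535 | 101.25 | - | - | 7183.61 | 70.9492 | 7682.14
200000 | 50 | 0.04 | 50 | - | - | - | - | - | - | - | - | - | - | - | -
200000 | 50 | 0.04 | 80 | NConv | 1 | 51.6889 | - | - | 86.704 | 101.25 | - | - | 4103.86 | 40.532 | 4190.56
200000 | 200 | 0.04 | 50 | - | - | - | - | - | - | - | - | - | - | - | -
200000 | 200 | 0.04 | 80 | NConv | 1 | 362.274 | - | - | 493.78 | 101.25 | - | - | 7157.07 | 70.6872 | 7650.85
200000 | 50 | 0.02 | 50 | - | - | - | - | - | - | - | - | - | - | - | -
200000 | 50 | 0.02 | 80 | NConv | 1 | 51.0954 | - | - | 85.974 | 101.25 | - | - | 4323.79 | 42.7041 | 4409.77
200000 | 200 | 0.02 | 50 | - | - | - | - | - | - | - | - | - | - | - | -
200000 | 200 | 0.02 | 80 | NConv | 1 | 362.189 | - | - | 493.54 | 101.25 | - | - | 7309.94 | 72.1969 | 7803.48
200000 | 50 | 0.01 | 50 | - | - | - | - | - | - | - | - | - | - | - | -
200000 | 50 | 0.01 | 80 | NConv | 1 | 50.7688 | - | - | 85.664 | 101.25 | - | - | 4107.17 | 40.5647 | 4192.84
200000 | 200 | 0.01 | 50 | - | - | - | - | - | - | - | - | - | - | - | -
200000 | 200 | 0.01 | 80 | NConv | 1 | 362.276 | - | - | 493.956 | 101.25 | - | - | 7142.68 | 70.545 | 7636.64
1000 | 10 | 0.6 | 50 | - | - | - | - | - | - | - | - | - | - | - | -
1000 | 10 | 0.6 | 80 | OK | 2.36381e-10 | 1.08246 | - | - | 4.67 | 4.75 | - | - | 15.626 | 3.28968 | 20.296
2000 | 10 | 0.6 | 50 | - | - | - | - | - | - | - | - | - | - | - | -
2000 | 10 | 0.6 | 80 | OK | 2.97092e-10 | 1.09888 | - | - | 4.529 | 4.25 | - | - | 14.885 | 3.50235 | 19.414
5000 | 10 | 0.6 | 50 | - | - | - | - | - | - | - | - | - | - | - | -
5000 | 10 | 0.6 | 80 | OK | 6.41246e-10 | 1.18355 | - | - | 5.632 | 3.75 | - | - | 14.924 | 3.97973 | 20.556
10000 | 10 | 0.6 | 50 | - | - | - | - | - | - | - | - | - | - | - | -
10000 | 10 | 0.6 | 80 | OK | 9.39939e-10 | 1.28915 | - | - | 6.402 | 3.75 | - | - | 18.092 | 4.82453 | 24.494
20000 | 10 | 0.6 | 50 | - | - | - | - | - | - | - | - | - | - | - | -
20000 | 10 | 0.6 | 80 | OK | 5.87459e-10 | 1.28026 | - | - | 6.074 | 3.75 | - | - | 20.091 | 5.3576 | 26.165
50000 | 10 | 0.6 | 50 | - | - | - | - | - | - | - | - | - | - | - | -
50000 | 10 | 0.6 | 80 | OK | 7.00301e-10 | 3.69251 | - | - | 9.889 | 3.75 | - | - | 50.011 | 13.3363 | 59.9
100000 | 10 | 0.6 | 50 | - | - | - | - | - | - | - | - | - | - | - | -
100000 | 10 | 0.6 | 80 | OK | 2.63041e-10 | 4.8257 | - | - | 12.292 | 3.75 | - | - | 79.961 | 21.3229 | 92.253
200000 | 10 | 0.6 | 50 | - | - | - | - | - | - | - | - | - | - | - | -
200000 | 10 | 0.6 | 80 | OK | 3.00918e-10 | 7.23958 | - | - | 17.494 | 3.75 | - | - | 162.392 | 43.3045 | 179.886
500000 | 10 | 0.6 | 50 | - | - | - | - | - | - | - | - | - | - | - | -
500000 | 10 | 0.6 | 80 | OK | 2.25904e-10 | 14.0125 | - | - | 33.683 | 3.75 | - | - | 332.127 | 88.5672 | 365.81
1000000 | 10 | 0.6 | 50 | - | - | - | - | - | - | - | - | - | - | - | -
1000000 | 10 | 0.6 | 80 | OK | 9.93146e-10 | 25.4663 | - | - | 61.937 | 3.25 | - | - | 610.381 | 187.81 | 672.318
1000 | 20 | 0.6 | 50 | - | - | - | - | - | - | - | - | - | - | - | -
1000 | 20 | 0.6 | 80 | OK | 1.16167e-10 | 1.11613 | - | - | 4.554 | 3.25 | - | - | 12.392 | 3.81292 | 16.946
2000 | 20 | 0.6 | 50 | - | - | - | - | - | - | - | - | - | - | - | -
2000 | 20 | 0.6 | 80 | OK | 7.681e-10 | 1.20019 | - | - | 5.626 | 3.25 | - | - | 11.555 | 3.55538 | 17.181
5000 | 20 | 0.6 | 50 | - | - | - | - | - | - | - | - | - | - | - | -
5000 | 20 | 0.6 | 80 | OK | 8.67865e-10 | 1.30838 | - | - | 6.424 | 2.75 | - | - | 11.867 | 4.31527 | 18.291
10000 | 20 | 0.6 | 50 | - | - | - | - | - | - | - | - | - | - | - | -
10000 | 20 | 0.6 | 80 | OK | 9.73068e-10 | 1.60493 | - | - | 6.768 | 2.75 | - | - | 14.71 | 5.34909 | 21.478
20000 | 20 | 0.6 | 50 | - | - | - | - | - | - | - | - | - | - | - | -
20000 | 20 | 0.6 | 80 | OK | 6.92831e-10 | 3.62118 | - | - | 9.235 | 2.75 | - | - | 20.074 | 7.29964 | 29.309
50000 | 20 | 0.6 | 50 | - | - | - | - | - | - | - | - | - | - | - | -
50000 | 20 | 0.6 | 80 | OK | 6.23817e-10 | 5.22384 | - | - | 12.294 | 2.75 | - | - | 41.121 | 14.9531 | 53.415
100000 | 20 | 0.6 | 50 | - | - | - | - | - | - | - | - | - | - | - | -
100000 | 20 | 0.6 | 80 | OK | 8.57747e-10 | 7.38781 | - | - | 16.26 | 2.75 | - | - | 61.515 | 22.3691 | 77.775
200000 | 20 | 0.6 | 50 | - | - | - | - | - | - | - | - | - | - | - | -
200000 | 20 | 0.6 | 80 | OK | 2.98509e-10 | 13.0572 | - | - | 28.349 | 2.75 | - | - | 126.137 | 45.868 | 154.486
500000 | 20 | 0.6 | 50 | - | - | - | - | - | - | - | - | - | - | - | -
500000 | 20 | 0.6 | 80 | OK | 3.42927e-10 | 29.5594 | - | - | 64.516 | 2.75 | - | - | 269.718 | 98.0793 | 334.234
1000000 | 20 | 0.6 | 50 | - | - | - | - | - | - | - | - | - | - | - | -
1000000 | 20 | 0.6 | 80 | OK | 1.40223e-10 | 56.8884 | - | - | 124.432 | 2.75 | - | - | 512.457 | 186.348 | 636.889
1000 | 50 | 0.6 | 50 | - | - | - | - | - | - | - | - | - | - | - | -
1000 | 50 | 0.6 | 80 | OK | 5.73815e-10 | 1.24813 | - | - | 4.456 | 2.25 | - | - | 8.75 | 3.88889 | 13.206
2000 | 50 | 0.6 | 50 | - | - | - | - | - | - | - | - | - | - | - | -
2000 | 50 | 0.6 | 80 | OK | 3.33326e-10 | 1.34909 | - | - | 5.008 | 2.25 | - | - | 8.967 | 3.98533 | 13.975
5000 | 50 | 0.6 | 50 | - | - | - | - | - | - | - | - | - | - | - | -
5000 | 50 | 0.6 | 80 | OK | 9.14694e-10 | 3.19242 | - | - | 7.277 | 2.25 | - | - | 9.557 | 4.24756 | 16.834
10000 | 50 | 0.6 | 50 | - | - | - | - | - | - | - | - | - | - | - | -
10000 | 50 | 0.6 | 80 | OK | 3.29765e-10 | 3.78806 | - | - | 8.231 | 2.25 | - | - | 11.571 | 5.14267 | 19.802
20000 | 50 | 0.6 | 50 | - | - | - | - | - | - | - | - | - | - | - | -
20000 | 50 | 0.6 | 80 | OK | 2.65793e-10 | 6.27494 | - | - | 11.824 | 2.25 | - | - | 17.662 | 7.84978 | 29.486
50000 | 50 | 0.6 | 50 | - | - | - | - | - | - | - | - | - | - | - | -
50000 | 50 | 0.6 | 80 | OK | 3.55509e-10 | 13.7994 | - | - | 23.909 | 2.25 | - | - | 39.665 | 17.6289 | 63.574
100000 | 50 | 0.6 | 50 | - | - | - | - | - | - | - | - | - | - | - | -
100000 | 50 | 0.6 | 80 | OK | 9.59934e-10 | 25.7476 | - | - | 43.761 | 2.25 | - | - | 59.805 | 26.58 | 103.566
200000 | 50 | 0.6 | 50 | - | - | - | - | - | - | - | - | - | - | - | -
200000 | 50 | 0.6 | 80 | OK | 7.94136e-10 | 50.1252 | - | - | 84.413 | 2.25 | - | - | 139.371 | 61.9427 | 223.784
500000 | 50 | 0.6 | 50 | - | - | - | - | - | - | - | - | - | - | - | -
500000 | 50 | 0.6 | 80 | OK | 6.80824e-10 | 124.481 | - | - | 206.927 | 2.25 | - | - | 268.081 | 119.147 | 475.008
1000000 | 50 | 0.6 | 50 | - | - | - | - | - | - | - | - | - | - | - | -
1000000 | 50 | 0.6 | 80 | OK | 4.20445e-10 | 249.261 | - | - | 411.402 | 2.25 | - | - | 525.895 | 233.731 | 937.297
1000 | 100 | 0.6 | 50 | - | - | - | - | - | - | - | - | - | - | - | -
1000 | 100 | 0.6 | 80 | OK | 3.93268e-10 | 2.3777 | - | - | 6.147 | 2.25 | - | - | 10.733 | 4.77022 | 16.88
2000 | 100 | 0.6 | 50 | - | - | - | - | - | - | - | - | - | - | - | -
2000 | 100 | 0.6 | 80 | OK | 2.4168e-10 | 3.34659 | - | - | 7.332 | 2.25 | - | - | 11.146 | 4.95378 | 18.478
5000 | 100 | 0.6 | 50 | - | - | - | - | - | - | - | - | - | - | - | -
5000 | 100 | 0.6 | 80 | OK | 3.96534e-10 | 5.2968 | - | - | 9.826 | 2.25 | - | - | 11.921 | 5.29822 | 21.747
10000 | 100 | 0.6 | 50 | - | - | - | - | - | - | - | - | - | - | - | -
10000 | 100 | 0.6 | 80 | OK | 1.30014e-10 | 7.6535 | - | - | 13.088 | 2.25 | - | - | 13.515 | 6.00667 | 26.603
20000 | 100 | 0.6 | 50 | - | - | - | - | - | - | - | - | - | - | - | -
20000 | 100 | 0.6 | 80 | OK | 2.88389e-10 | 13.2968 | - | - | 21.61 | 2.25 | - | - | 17.776 | 7.90044 | 39.386
50000 | 100 | 0.6 | 50 | - | - | - | - | - | - | - | - | - | - | - | -
50000 | 100 | 0.6 | 80 | OK | 6.54399e-10 | 34.4213 | - | - | 52.425 | 1.75 | - | - | 34.372 | 19.6411 | 86.797
100000 | 100 | 0.6 | 50 | - | - | - | - | - | - | - | - | - | - | - | -
100000 | 100 | 0.6 | 80 | OK | 4.57002e-10 | 70.1992 | - | - | 104.177 | 1.75 | - | - | 62.243 | 35.5674 | 166.42
200000 | 100 | 0.6 | 50 | - | - | - | - | - | - | - | - | - | - | - | -
200000 | 100 | 0.6 | 80 | OK | 3.2852e-10 | 137.104 | - | - | 202.93 | 1.75 | - | - | 123.614 | 70.6366 | 326.544
500000 | 100 | 0.6 | 50 | - | - | - | - | - | - | - | - | - | - | - | -
500000 | 100 | 0.6 | 80 | OK | 3.68961e-10 | 349.822 | - | - | 512.12 | 1.75 | - | - | 288.372 | 164.784 | 800.492
1000000 | 100 | 0.6 | 50 | - | - | - | - | - | - | - | - | - | - | - | -
1000000 | 100 | 0.6 | 80 | OK | 1.9837e-10 | 682.797 | - | - | 1005 | 1.75 | - | - | 574.459 | 328.262 | 1579.46
1000 | 200 | 0.6 | 50 | - | - | - | - | - | - | - | - | - | - | - | -
1000 | 200 | 0.6 | 80 | OK | 6.5029e-11 | 5.03626 | - | - | 7.964 | 1.75 | - | - | 16.745 | 9.56857 | 24.709
2000 | 200 | 0.6 | 50 | - | - | - | - | - | - | - | - | - | - | - | -
2000 | 200 | 0.6 | 80 | OK | 1.19787e-10 | 6.86211 | - | - | 11.171 | 1.75 | - | - | 14.828 | 8.47314 | 25.999
5000 | 200 | 0.6 | 50 | - | - | - | - | - | - | - | - | - | - | - | -
5000 | 200 | 0.6 | 80 | OK | 8.07397e-11 | 11.3491 | - | - | 16.892 | 1.75 | - | - | 16.357 | 9.34686 | 33.249
10000 | 200 | 0.6 | 50 | - | - | - | - | - | - | - | - | - | - | - | -
10000 | 200 | 0.6 | 80 | OK | 8.76396e-10 | 17.6957 | - | - | 25.883 | 1.75 | - | - | 18.267 | 10.4383 | 44.15
20000 | 200 | 0.6 | 50 | - | - | - | - | - | - | - | - | - | - | - | -
20000 | 200 | 0.6 | 80 | OK | 2.72961e-10 | 31.3579 | - | - | 45.89 | 1.75 | - | - | 22.385 | 12.7914 | 68.275
50000 | 200 | 0.6 | 50 | - | - | - | - | - | - | - | - | - | - | - | -
50000 | 200 | 0.6 | 80 | OK | 2.01488e-10 | 85.7212 | - | - | 119.87 | 1.75 | - | - | 45.943 | 26.2531 | 165.813
100000 | 200 | 0.6 | 50 | - | - | - | - | - | - | - | - | - | - | - | -
100000 | 200 | 0.6 | 80 | OK | 4.92705e-11 | 180.329 | - | - | 246.629 | 1.75 | - | - | 86.169 | 49.2394 | 332.798
200000 | 200 | 0.6 | 50 | - | - | - | - | - | - | - | - | - | - | - | -
200000 | 200 | 0.6 | 80 | OK | 2.67732e-12 | 365.664 | - | - | 496.132 | 2.25 | - | - | 207.868 | 92.3858 | 704
500000 | 200 | 0.6 | 50 | - | - | - | - | - | - | - | - | - | - | - | -
500000 | 200 | 0.6 | 80 | OK | 3.64618e-11 | 945.387 | - | - | 1270.05 | 1.75 | - | - | 424.954 | 242.831 | 1695
1000000 | 200 | 0.6 | 50 | OoM (in setup stage) | - | - | - | - | - | - | - | - | - | - | -
1000000 | 200 | 0.6 | 80 | OoM (in setup stage) | - | - | - | - | - | - | - | - | - | - | -
1000 | 500 | 0.6 | 50 | OK | 1.59476e-15 | 12.6832 | - | - | 15.02 | 0.25 | - | - | 10.935 | 10.935 | 25.955
1000 | 500 | 0.6 | 80 | OK | 1.59476e-15 | 12.6831 | - | - | 14.977 | 0.25 | - | - | 10.899 | 10.899 | 25.876
2000 | 500 | 0.6 | 50 | - | - | - | - | - | - | - | - | - | - | - | -
2000 | 500 | 0.6 | 80 | OK | 7.36592e-12 | 20.9175 | - | - | 25.077 | 1.75 | - | - | 33.086 | 18.9063 | 58.163
5000 | 500 | 0.6 | 50 | - | - | - | - | - | - | - | - | - | - | - | -
5000 | 500 | 0.6 | 80 | OK | 1.65699e-12 | 42.3631 | - | - | 51.542 | 1.75 | - | - | 35.378 | 20.216 | 86.92
10000 | 500 | 0.6 | 50 | - | - | - | - | - | - | - | - | - | - | - | -
10000 | 500 | 0.6 | 80 | OK | 1.90938e-11 | 77.0392 | - | - | 94.218 | 1.75 | - | - | 39.492 | 22.5669 | 133.71
20000 | 500 | 0.6 | 50 | - | - | - | - | - | - | - | - | - | - | - | -
20000 | 500 | 0.6 | 80 | OK | 2.8749e-11 | 146.465 | - | - | 181.155 | 1.75 | - | - | 49.762 | 28.4354 | 230.917
50000 | 500 | 0.6 | 50 | - | - | - | - | - | - | - | - | - | - | - | -
50000 | 500 | 0.6 | 80 | OK | 4.23415e-12 | 358.246 | - | - | 441.376 | 1.75 | - | - | 88.285 | 50.4486 | 529.661
100000 | 500 | 0.6 | 50 | - | - | - | - | - | - | - | - | - | - | - | -
100000 | 500 | 0.6 | 80 | OK | 5.5514e-12 | 774.164 | - | - | 941.487 | 1.75 | - | - | 195.706 | 111.832 | 1137.19
200000 | 500 | 0.6 | 50 | - | - | - | - | - | - | - | - | - | - | - | -
200000 | 500 | 0.6 | 80 | OK | 4.12283e-12 | 1649.83 | - | - | 1983.08 | 1.75 | - | - | 411.425 | 235.1 | 2394.51
500000 | 500 | 0.6 | 50 | OoM (in setup stage) | - | - | - | - | - | - | - | - | - | - | -
500000 | 500 | 0.6 | 80 | OoM (in setup stage) | - | - | - | - | - | - | - | - | - | - | -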
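
The timing columns can be cross-checked against each other. The sketch below assumes that T-Total is T-PreP plus T-Kry, as the legend states for Total, and that T-KryPIt is T-Kry divided by nItrs. The second reading is an assumption, since T-KryPIt is not defined in the legend. The function name is hypothetical; the two sample rows are the 80-partition results for N = 200000, K = 50, d = 1.2 and for N = 1000000, K = 100, d = 0.6.

```python
import math


def check_row(t_prep, n_itrs, t_kry, t_kry_per_it, t_total, rel_tol=1e-4):
    """Return (total_ok, per_iteration_ok) for one row of reported timings.

    The tolerance allows for the six significant digits printed in the table.
    """
    total_ok = math.isclose(t_prep + t_kry, t_total, rel_tol=rel_tol)
    per_it_ok = math.isclose(t_kry / n_itrs, t_kry_per_it, rel_tol=rel_tol)
    return total_ok, per_it_ok


# (T-PreP, nItrs, T-Kry, T-KryPIt, T-Total), all times in milliseconds.
rows = [
    (84.441, 1.75, 102.791, 58.7377, 187.232),
    (1005, 1.75, 574.459, 328.262, 1579.46),
]
results = [check_row(*row) for row in rows]
```

Both sample rows pass both checks. The two N = 1000, K = 500 rows report nItrs = 0.25 with T-KryPIt equal to T-Kry, so the per-iteration relation does not hold for them.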