3-24-2016 Banded Results BCR Test



• Name: the name of the matrix (test)
• N: the dimension of the matrix (number of rows and columns)
• NNZ: number of non-zeros
• SPD: whether the matrix is specified by the user to be symmetric positive definite (values: 0 or 1)
• DB: indicates whether DB reordering is performed (values: 0 or 1)
• K-DB: the half-bandwidth after DB reordering (without any drop-off). If DB reordering is not executed, this reports the original half-bandwidth
• KnoDrp: the half-bandwidth after DB and CM reordering but before drop-off
• K: the half-bandwidth after reordering and drop-off
• FRate: fill-in rate. See NOTES below
• nuKf: non-uniform K factor. Indicates whether the K changes a lot from row to row. Values are between 0 and 1, with 0 indicating a perfectly uniform bandwidth over the entire matrix. See NOTES below
• Solves: indicates whether we managed to solve the problem or not. OK means solved fine, otherwise a reason is provided for failure
• Bstng: indicates whether we enable diagonal boosting when doing factorization. Values: 0 or 1
• SolAcc: infinity norm of the array storing the relative errors
• T-DB: time to run DB reordering for the matrix on the CPU
• T-CM: time to run CM reordering for the matrix on the CPU
• T-Drop: time to drop off off-diagonal elements to decrease bandwidth. Done on the CPU.
• T-Dtransf: time to transfer data from the CPU to the GPU
• T-Asmbl: time to copy the sparse matrix, after reordering and drop-off, into a banded matrix stored in GPU memory
• LU-M: LU method (complete, ILUT or ILUULT)
• Fill-in: the fill-in factor of ILUT (-1 indicates complete LU)
• NPrtns: the number of partitions used to solve the problem
• T-BC: time required to get off-diagonal right hand sides (Bs and Cs) from the banded matrix - done on the GPU
• T-LU: LU time
• GFlps-LU: LU GFLOPs
• T-SPK: time to solve for the spikes Vs and Ws - done on the GPU
• T-LUrdcd: time required to factorize the reduced matrices - done on the GPU
• T-PreP: the sum of all preprocessing times, see NOTES
• Kry-M: the method used in the Krylov solve stage: BiCGStab2 (0), BiCGStab (1), or CG (2)
• nItrs: the number of Krylov-solve iterations to solve the problem
• T-Kry: time spent in the Krylov solver (on the GPU)
• Total: total time to solve the problem, the sum of T-PreP and T-Kry
• Pardiso: the time for the commercial tool "Pardiso" to solve the problem
• SlwD: the slowdown ratio of our solver compared to Pardiso. A value less than one means that we are faster than Pardiso. The value is shown in green if we run faster and in red if we run more than 5 times slower.
• Fastest: the fastest time SaP has historically achieved
• SpdUp: the speedup of this run compared to the historical fastest run (the value is shown in green if the speedup is more than 5% and shown in red if the slowdown is more than 5%)
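
The SlwD and SpdUp entries above are ratios with color thresholds. The following is a minimal Python sketch of how they might be computed. It assumes SlwD = Total / Pardiso and SpdUp = Fastest / Total, since the legend does not spell out either formula, and it assumes that "slowdown of more than 5%" means SpdUp below 0.95. All function names and the sample timings are hypothetical.

```python
def slowdown(total_ms, pardiso_ms):
    """SlwD: how many times slower this run is than Pardiso (< 1 means faster)."""
    return total_ms / pardiso_ms


def slowdown_color(slwd):
    """Green when faster than Pardiso, red when more than 5x slower, else None."""
    if slwd < 1.0:
        return "green"
    if slwd > 5.0:
        return "red"
    return None


def speedup(total_ms, fastest_ms):
    """SpdUp: speedup of this run relative to the historical fastest run."""
    return fastest_ms / total_ms


def speedup_color(spdup, tol=0.05):
    """Green for more than 5% speedup, red for more than 5% slowdown (assumed)."""
    if spdup > 1.0 + tol:
        return "green"
    if spdup < 1.0 - tol:
        return "red"
    return None


# Hypothetical timings in milliseconds, not taken from the report.
print(slowdown_color(slowdown(50.0, 100.0)))   # green
print(speedup_color(speedup(120.0, 100.0)))    # red
```

Values between the two thresholds return None, which matches the legend's description of uncolored entries.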


NOTES:
1) nuKf = 1/(2KN)*[sum over i from 1 to N of (2K - K_{iLeft} - K_{iRight})], where K_{iLeft} is the row half-bandwidth to the left of the diagonal while K_{iRight} is the row half-bandwidth to the right of the diagonal.
2) FRate = NNZ / ((2K+1)N), where NNZ is the actual number of non-zeros.
3) All times reported are in milliseconds (1E-3 second)
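
The two formulas in notes 1 and 2 can be sketched in Python as follows. This is only an illustration under stated assumptions: the matrix is given as COO row and column index lists with 0-based indices, and the helper names (`row_half_bandwidths`, `nukf`, `frate`) are hypothetical, not part of the solver.

```python
def row_half_bandwidths(rows, cols, n):
    """Return per-row (K_left, K_right) lists from COO row/column indices."""
    k_left = [0] * n
    k_right = [0] * n
    for i, j in zip(rows, cols):
        if j < i:
            k_left[i] = max(k_left[i], i - j)
        elif j > i:
            k_right[i] = max(k_right[i], j - i)
    return k_left, k_right


def nukf(k_left, k_right, k):
    """Note 1: 1/(2KN) * sum_i (2K - K_iLeft - K_iRight); 0.0 when K or N is 0."""
    n = len(k_left)
    if k == 0 or n == 0:
        return 0.0
    return sum(2 * k - l - r for l, r in zip(k_left, k_right)) / (2 * k * n)


def frate(nnz, k, n):
    """Note 2: NNZ / ((2K+1) N)."""
    return nnz / ((2 * k + 1) * n)


# 4x4 tridiagonal matrix as a small example.
rows = [0, 0, 1, 1, 1, 2, 2, 2, 3, 3]
cols = [0, 1, 0, 1, 2, 1, 2, 3, 2, 3]
k_left, k_right = row_half_bandwidths(rows, cols, 4)
k = max(k_left + k_right)          # half-bandwidth K of the whole matrix
print(nukf(k_left, k_right, k))    # 0.25
print(frate(len(rows), k, 4))      # 10/12
```

Even this fully populated band gives a nonzero nuKf, because the first and last rows are truncated by the matrix boundary. For a tridiagonal matrix the value is 1/N, so the effect vanishes as N grows.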


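| N | K | d | NPrtns | Solves | RelR | T-LU | T-SwDef | T-MMDef | T-PreP | nItrs | T-SwInf | T-MVInf | T-Kry | T-KryPIt | T-Total |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 200000 | 50 | 1.2 | 50 | | | | | | | | | | | | |
| 200000 | 50 | 1.2 | 80 | OK | 1.81046e-11 | 50.2092 | | | 75.317 | 1.75 | | | 102.488 | 58.5646 | 177.805 |
| 200000 | 200 | 1.2 | 50 | | | | | | | | | | | | |
| 200000 | 200 | 1.2 | 80 | OK | 2.50688e-12 | 365.058 | | | 459.969 | 1.75 | | | 174.535 | 99.7343 | 634.504 |
| 200000 | 50 | 1 | 50 | | | | | | | | | | | | |
| 200000 | 50 | 1 | 80 | OK | 7.84293e-11 | 50.4963 | | | 75.659 | 1.75 | | | 103.211 | 58.9777 | 178.87 |
| 200000 | 200 | 1 | 50 | | | | | | | | | | | | |
| 200000 | 200 | 1 | 80 | OK | 1.52629e-11 | 361.142 | | | 455.926 | 1.75 | | | 173.574 | 99.1851 | 629.5 |
| 200000 | 50 | 0.8 | 50 | | | | | | | | | | | | |
| 200000 | 50 | 0.8 | 80 | OK | 4.73113e-10 | 49.4991 | | | 74.61 | 1.75 | | | 103.215 | 58.98 | 177.825 |
| 200000 | 200 | 0.8 | 50 | | | | | | | | | | | | |
| 200000 | 200 | 0.8 | 80 | OK | 2.38516e-10 | 364.081 | | | 459.039 | 1.75 | | | 174.578 | 99.7589 | 633.617 |
| 200000 | 50 | 0.6 | 50 | | | | | | | | | | | | |
| 200000 | 50 | 0.6 | 80 | OK | 7.94136e-10 | 49.7497 | | | 74.823 | 2.25 | | | 139.15 | 61.8444 | 213.973 |
| 200000 | 200 | 0.6 | 50 | | | | | | | | | | | | |
| 200000 | 200 | 0.6 | 80 | OK | 2.67732e-12 | 360.532 | | | 455.324 | 2.25 | | | 206.897 | 91.9542 | 662.221 |
| 200000 | 50 | 0.4 | 50 | | | | | | | | | | | | |
| 200000 | 50 | 0.4 | 80 | OK | 1.49434e-10 | 48.9224 | | | 74.001 | 2.75 | | | 142.445 | 51.7982 | 216.446 |
| 200000 | 200 | 0.4 | 50 | | | | | | | | | | | | |
| 200000 | 200 | 0.4 | 80 | OK | 1.01774e-10 | 360.301 | | | 455.233 | 2.25 | | | 206.896 | 91.9538 | 662.129 |
| 200000 | 50 | 0.2 | 50 | | | | | | | | | | | | |
| 200000 | 50 | 0.2 | 80 | OK | 6.36112e-10 | 48.9258 | | | 73.944 | 4.75 | | | 223.227 | 46.9952 | 297.171 |
| 200000 | 200 | 0.2 | 50 | | | | | | | | | | | | |
| 200000 | 200 | 0.2 | 80 | OK | 5.10344e-10 | 360.278 | | | 455.033 | 2.75 | | | 242.912 | 88.3316 | 697.945 |
| 200000 | 50 | 0.1 | 50 | | | | | | | | | | | | |
| 200000 | 50 | 0.1 | 80 | NConv | 1 | 49.5281 | | | 74.615 | 101.25 | | | 4102.84 | 40.5219 | 4177.46 |
| 200000 | 200 | 0.1 | 50 | | | | | | | | | | | | |
| 200000 | 200 | 0.1 | 80 | OK | 5.96008e-10 | 360.334 | | | 454.907 | 4.75 | | | 381.204 | 80.2535 | 836.111 |
| 200000 | 50 | 0.08 | 50 | | | | | | | | | | | | |
| 200000 | 50 | 0.08 | 80 | NConv | 1 | 49.4476 | | | 74.646 | 101.25 | | | 4067.63 | 40.1741 | 4142.28 |
| 200000 | 200 | 0.08 | 50 | | | | | | | | | | | | |
| 200000 | 200 | 0.08 | 80 | OK | 5.41432e-10 | 360.271 | | | 455.048 | 6.75 | | | 523.173 | 77.5071 | 978.221 |
| 200000 | 50 | 0.06 | 50 | | | | | | | | | | | | |
| 200000 | 50 | 0.06 | 80 | NConv | 1 | 49.5775 | | | 74.669 | 101.25 | | | 4053.44 | 40.034 | 4128.11 |
| 200000 | 200 | 0.06 | 50 | | | | | | | | | | | | |
| 200000 | 200 | 0.06 | 80 | NConv | 1.7933e-09 | 364.141 | | | 458.907 | 101.25 | | | 7083.48 | 69.9603 | 7542.39 |
| 200000 | 50 | 0.04 | 50 | | | | | | | | | | | | |
| 200000 | 50 | 0.04 | 80 | NConv | 1 | 49.0918 | | | 74.308 | 101.25 | | | 4107.62 | 40.5691 | 4181.93 |
| 200000 | 200 | 0.04 | 50 | | | | | | | | | | | | |
| 200000 | 200 | 0.04 | 80 | NConv | 1 | 364.292 | | | 459.133 | 101.25 | | | 7077.79 | 69.9041 | 7536.92 |
| 200000 | 50 | 0.02 | 50 | | | | | | | | | | | | |
| 200000 | 50 | 0.02 | 80 | NConv | 1 | 49.3556 | | | 74.468 | 101.25 | | | 4058.14 | 40.0804 | 4132.61 |
| 200000 | 200 | 0.02 | 50 | | | | | | | | | | | | |
| 200000 | 200 | 0.02 | 80 | NConv | 1 | 360.557 | | | 455.4 | 101.25 | | | 7051.35 | 69.643 | 7506.75 |
| 200000 | 50 | 0.01 | 50 | | | | | | | | | | | | |
| 200000 | 50 | 0.01 | 80 | NConv | 1 | 49.5586 | | | 74.761 | 101.25 | | | 4109.92 | 40.5918 | 4184.68 |
| 200000 | 200 | 0.01 | 50 | | | | | | | | | | | | |
| 200000 | 200 | 0.01 | 80 | NConv | 1 | 360.307 | | | 454.854 | 101.25 | | | 7046.93 | 69.5993 | 7501.78 |
| 1000 | 10 | 0.6 | 50 | | | | | | | | | | | | |
| 1000 | 10 | 0.6 | 80 | OK | 2.36381e-10 | 0.37696 | | | 1.996 | 4.75 | | | 10.59 | 2.22947 | 12.586 |
| 2000 | 10 | 0.6 | 50 | | | | | | | | | | | | |
| 2000 | 10 | 0.6 | 80 | OK | 2.97092e-10 | 0.398368 | | | 2.018 | 4.25 | | | 9.857 | 2.31929 | 11.875 |
| 5000 | 10 | 0.6 | 50 | | | | | | | | | | | | |
| 5000 | 10 | 0.6 | 80 | OK | 6.41246e-10 | 0.461856 | | | 2.607 | 3.75 | | | 11.104 | 2.96107 | 13.711 |
| 10000 | 10 | 0.6 | 50 | | | | | | | | | | | | |
| 10000 | 10 | 0.6 | 80 | OK | 9.39939e-10 | 0.582976 | | | 3.476 | 3.75 | | | 14.274 | 3.8064 | 17.75 |
| 20000 | 10 | 0.6 | 50 | | | | | | | | | | | | |
| 20000 | 10 | 0.6 | 80 | OK | 5.87459e-10 | 1.31008 | | | 4.383 | 3.75 | | | 20.059 | 5.34907 | 24.442 |
| 50000 | 10 | 0.6 | 50 | | | | | | | | | | | | |
| 50000 | 10 | 0.6 | 80 | OK | 7.00301e-10 | 2.01242 | | | 5.701 | 3.75 | | | 43.12 | 11.4987 | 48.821 |
| 100000 | 10 | 0.6 | 50 | | | | | | | | | | | | |
| 100000 | 10 | 0.6 | 80 | OK | 2.63041e-10 | 3.15699 | | | 7.903 | 3.75 | | | 73.255 | 19.5347 | 81.158 |
| 200000 | 10 | 0.6 | 50 | | | | | | | | | | | | |
| 200000 | 10 | 0.6 | 80 | OK | 3.00918e-10 | 5.42224 | | | 12.354 | 3.75 | | | 148.942 | 39.7179 | 161.296 |
| 500000 | 10 | 0.6 | 50 | | | | | | | | | | | | |
| 500000 | 10 | 0.6 | 80 | OK | 2.25904e-10 | 12.2209 | | | 26.284 | 3.75 | | | 318.186 | 84.8496 | 344.47 |
| 1000000 | 10 | 0.6 | 50 | | | | | | | | | | | | |
| 1000000 | 10 | 0.6 | 80 | OK | 9.93146e-10 | 23.866 | | | 50.156 | 3.25 | | | 606.852 | 186.724 | 657.008 |
| 1000 | 20 | 0.6 | 50 | | | | | | | | | | | | |
| 1000 | 20 | 0.6 | 80 | OK | 1.16167e-10 | 0.421216 | | | 2.047 | 3.25 | | | 8.464 | 2.60431 | 10.511 |
| 2000 | 20 | 0.6 | 50 | | | | | | | | | | | | |
| 2000 | 20 | 0.6 | 80 | OK | 7.681e-10 | 0.503392 | | | 2.739 | 3.25 | | | 8.307 | 2.556 | 11.046 |
| 5000 | 20 | 0.6 | 50 | | | | | | | | | | | | |
| 5000 | 20 | 0.6 | 80 | OK | 8.67865e-10 | 0.605856 | | | 3.482 | 2.75 | | | 8.624 | 3.136 | 12.106 |
| 10000 | 20 | 0.6 | 50 | | | | | | | | | | | | |
| 10000 | 20 | 0.6 | 80 | OK | 9.73068e-10 | 1.36909 | | | 4.347 | 2.75 | | | 11.414 | 4.15055 | 15.761 |
| 20000 | 20 | 0.6 | 50 | | | | | | | | | | | | |
| 20000 | 20 | 0.6 | 80 | OK | 6.92831e-10 | 1.93187 | | | 5.232 | 2.75 | | | 16.58 | 6.02909 | 21.812 |
| 50000 | 20 | 0.6 | 50 | | | | | | | | | | | | |
| 50000 | 20 | 0.6 | 80 | OK | 6.23817e-10 | 3.5897 | | | 8.072 | 2.75 | | | 35.935 | 13.0673 | 44.007 |
| 100000 | 20 | 0.6 | 50 | | | | | | | | | | | | |
| 100000 | 20 | 0.6 | 80 | OK | 8.57747e-10 | 6.30336 | | | 13.118 | 2.75 | | | 61.45 | 22.3455 | 74.568 |
| 200000 | 20 | 0.6 | 50 | | | | | | | | | | | | |
| 200000 | 20 | 0.6 | 80 | OK | 2.98509e-10 | 11.7571 | | | 23.221 | 2.75 | | | 124.181 | 45.1567 | 147.402 |
| 500000 | 20 | 0.6 | 50 | | | | | | | | | | | | |
| 500000 | 20 | 0.6 | 80 | OK | 3.42927e-10 | 28.2484 | | | 53.967 | 2.75 | | | 269.14 | 97.8691 | 323.107 |
| 1000000 | 20 | 0.6 | 50 | | | | | | | | | | | | |
| 1000000 | 20 | 0.6 | 80 | OK | 1.40223e-10 | 55.5419 | | | 105.171 | 2.75 | | | 512.285 | 186.285 | 617.456 |
| 1000 | 50 | 0.6 | 50 | | | | | | | | | | | | |
| 1000 | 50 | 0.6 | 80 | OK | 5.73815e-10 | 0.812992 | | | 2.962 | 2.25 | | | 8.914 | 3.96178 | 11.876 |
| 2000 | 50 | 0.6 | 50 | | | | | | | | | | | | |
| 2000 | 50 | 0.6 | 80 | OK | 3.33326e-10 | 0.906304 | | | 3.755 | 2.25 | | | 8.829 | 3.924 | 12.584 |
| 5000 | 50 | 0.6 | 50 | | | | | | | | | | | | |
| 5000 | 50 | 0.6 | 80 | OK | 9.14694e-10 | 2.07766 | | | 5.167 | 2.25 | | | 9.61 | 4.27111 | 14.777 |
| 10000 | 50 | 0.6 | 50 | | | | | | | | | | | | |
| 10000 | 50 | 0.6 | 80 | OK | 3.29765e-10 | 2.74666 | | | 6.13 | 2.25 | | | 11.321 | 5.03156 | 17.451 |
| 20000 | 50 | 0.6 | 50 | | | | | | | | | | | | |
| 20000 | 50 | 0.6 | 80 | OK | 2.65793e-10 | 5.27446 | | | 9.407 | 2.25 | | | 17.265 | 7.67333 | 26.672 |
| 50000 | 50 | 0.6 | 50 | | | | | | | | | | | | |
| 50000 | 50 | 0.6 | 80 | OK | 3.55509e-10 | 12.5552 | | | 20.134 | 2.25 | | | 39.25 | 17.4444 | 59.384 |
| 100000 | 50 | 0.6 | 50 | | | | | | | | | | | | |
| 100000 | 50 | 0.6 | 80 | OK | 9.59934e-10 | 24.6866 | | | 38.108 | 2.25 | | | 60.379 | 26.8351 | 98.487 |
| 200000 | 50 | 0.6 | 50 | | | | | | | | | | | | |
| 200000 | 50 | 0.6 | 80 | OK | 7.94136e-10 | 49.4063 | | | 74.487 | 2.25 | | | 139.35 | 61.9333 | 213.837 |
| 500000 | 50 | 0.6 | 50 | | | | | | | | | | | | |
| 500000 | 50 | 0.6 | 80 | OK | 6.80824e-10 | 122.637 | | | 182.153 | 2.25 | | | 267.422 | 118.854 | 449.575 |
| 1000000 | 50 | 0.6 | 50 | | | | | | | | | | | | |
| 1000000 | 50 | 0.6 | 80 | OK | 4.20445e-10 | 245.401 | | | 362.877 | 2.25 | | | 526.061 | 233.805 | 888.938 |
| 1000 | 100 | 0.6 | 50 | | | | | | | | | | | | |
| 1000 | 100 | 0.6 | 80 | OK | 3.93268e-10 | 1.9431 | | | 4.816 | 2.25 | | | 10.545 | 4.68667 | 15.361 |
| 2000 | 100 | 0.6 | 50 | | | | | | | | | | | | |
| 2000 | 100 | 0.6 | 80 | OK | 2.4168e-10 | 2.85869 | | | 5.829 | 2.25 | | | 10.974 | 4.87733 | 16.803 |
| 5000 | 100 | 0.6 | 50 | | | | | | | | | | | | |
| 5000 | 100 | 0.6 | 80 | OK | 3.96534e-10 | 4.26432 | | | 7.626 | 2.25 | | | 11.72 | 5.20889 | 19.346 |
| 10000 | 100 | 0.6 | 50 | | | | | | | | | | | | |
| 10000 | 100 | 0.6 | 80 | OK | 1.30014e-10 | 6.62435 | | | 10.754 | 2.25 | | | 13.338 | 5.928 | 24.092 |
| 20000 | 100 | 0.6 | 50 | | | | | | | | | | | | |
| 20000 | 100 | 0.6 | 80 | OK | 2.88389e-10 | 12.2232 | | | 18.515 | 2.25 | | | 17.621 | 7.83156 | 36.136 |
| 50000 | 100 | 0.6 | 50 | | | | | | | | | | | | |
| 50000 | 100 | 0.6 | 80 | OK | 6.54399e-10 | 33.5072 | | | 46.643 | 1.75 | | | 34.346 | 19.6263 | 80.989 |
| 100000 | 100 | 0.6 | 50 | | | | | | | | | | | | |
| 100000 | 100 | 0.6 | 80 | OK | 4.57002e-10 | 68.6159 | | | 93.692 | 1.75 | | | 62.235 | 35.5629 | 155.927 |
| 200000 | 100 | 0.6 | 50 | | | | | | | | | | | | |
| 200000 | 100 | 0.6 | 80 | OK | 3.2852e-10 | 135.548 | | | 183.428 | 1.75 | | | 123.484 | 70.5623 | 306.912 |
| 500000 | 100 | 0.6 | 50 | | | | | | | | | | | | |
| 500000 | 100 | 0.6 | 80 | OK | 3.68961e-10 | 348.102 | | | 465.286 | 1.75 | | | 287.964 | 164.551 | 753.25 |
| 1000000 | 100 | 0.6 | 50 | | | | | | | | | | | | |
| 1000000 | 100 | 0.6 | 80 | OK | 1.9837e-10 | 681.629 | | | 913.349 | 1.75 | | | 574.49 | 328.28 | 1487.84 |
| 1000 | 200 | 0.6 | 50 | | | | | | | | | | | | |
| 1000 | 200 | 0.6 | 80 | OK | 6.5029e-11 | 4.9681 | | | 7.719 | 1.75 | | | 16.366 | 9.352 | 24.085 |
| 2000 | 200 | 0.6 | 50 | | | | | | | | | | | | |
| 2000 | 200 | 0.6 | 80 | OK | 1.19787e-10 | 5.86141 | | | 9.057 | 1.75 | | | 14.701 | 8.40057 | 23.758 |
| 5000 | 200 | 0.6 | 50 | | | | | | | | | | | | |
| 5000 | 200 | 0.6 | 80 | OK | 8.07397e-11 | 10.3042 | | | 14.449 | 1.75 | | | 16.245 | 9.28286 | 30.694 |
| 10000 | 200 | 0.6 | 50 | | | | | | | | | | | | |
| 10000 | 200 | 0.6 | 80 | OK | 8.76396e-10 | 16.6419 | | | 22.857 | 1.75 | | | 18.143 | 10.3674 | 41 |
| 20000 | 200 | 0.6 | 50 | | | | | | | | | | | | |
| 20000 | 200 | 0.6 | 80 | OK | 2.72961e-10 | 30.2272 | | | 41.116 | 1.75 | | | 22.505 | 12.86 | 63.621 |
| 50000 | 200 | 0.6 | 50 | | | | | | | | | | | | |
| 50000 | 200 | 0.6 | 80 | OK | 2.01488e-10 | 84.2294 | | | 109.345 | 1.75 | | | 45.745 | 26.14 | 155.09 |
| 100000 | 200 | 0.6 | 50 | | | | | | | | | | | | |
| 100000 | 200 | 0.6 | 80 | OK | 4.92705e-11 | 178.937 | | | 227.433 | 1.75 | | | 85.642 | 48.9383 | 313.075 |
| 200000 | 200 | 0.6 | 50 | | | | | | | | | | | | |
| 200000 | 200 | 0.6 | 80 | OK | 2.67732e-12 | 360.258 | | | 454.877 | 2.25 | | | 206.438 | 91.7502 | 661.315 |
| 500000 | 200 | 0.6 | 50 | | | | | | | | | | | | |
| 500000 | 200 | 0.6 | 80 | OK | 3.64618e-11 | 944.355 | | | 1180.55 | 1.75 | | | 422.92 | 241.669 | 1603.47 |
| 1000000 | 200 | 0.6 | 50 | OoM (in setup stage) | | | | | | | | | | | |
| 1000000 | 200 | 0.6 | 80 | OoM (in setup stage) | | | | | | | | | | | |
| 1000 | 500 | 0.6 | 50 | OK | 1.59476e-15 | 12.6954 | | | 14.948 | 0.25 | | | 10.904 | 10.904 | 25.852 |
| 1000 | 500 | 0.6 | 80 | OK | 1.59476e-15 | 12.7108 | | | 14.929 | 0.25 | | | 10.831 | 10.831 | 25.76 |
| 2000 | 500 | 0.6 | 50 | | | | | | | | | | | | |
| 2000 | 500 | 0.6 | 80 | OK | 7.36592e-12 | 20.9102 | | | 25.015 | 1.75 | | | 32.879 | 18.788 | 57.894 |
| 5000 | 500 | 0.6 | 50 | | | | | | | | | | | | |
| 5000 | 500 | 0.6 | 80 | OK | 1.65699e-12 | 41.3375 | | | 48.552 | 1.75 | | | 35.31 | 20.1771 | 83.862 |
| 10000 | 500 | 0.6 | 50 | | | | | | | | | | | | |
| 10000 | 500 | 0.6 | 80 | OK | 1.90938e-11 | 75.4357 | | | 88.778 | 1.75 | | | 39.307 | 22.4611 | 128.085 |
| 20000 | 500 | 0.6 | 50 | | | | | | | | | | | | |
| 20000 | 500 | 0.6 | 80 | OK | 2.8749e-11 | 144.904 | | | 170.502 | 1.75 | | | 49.473 | 28.2703 | 219.975 |
| 50000 | 500 | 0.6 | 50 | | | | | | | | | | | | |
| 50000 | 500 | 0.6 | 80 | OK | 4.23415e-12 | 356.862 | | | 418.533 | 1.75 | | | 88.508 | 50.576 | 507.041 |
| 100000 | 500 | 0.6 | 50 | | | | | | | | | | | | |
| 100000 | 500 | 0.6 | 80 | OK | 5.5514e-12 | 762.99 | | | 885.513 | 1.75 | | | 194.045 | 110.883 | 1079.56 |
| 200000 | 500 | 0.6 | 50 | | | | | | | | | | | | |
| 200000 | 500 | 0.6 | 80 | OK | 4.12283e-12 | 1646.43 | | | 1891.87 | 1.75 | | | 410.938 | 234.822 | 2302.8 |
| 500000 | 500 | 0.6 | 50 | OoM (in setup stage) | | | | | | | | | | | |
| 500000 | 500 | 0.6 | 80 | OoM (in setup stage) | | | | | | | | | | | |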
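
The legend defines the total time as the sum of the preprocessing and Krylov times. The sketch below checks that relation on three rows taken from the results above. It also checks T-KryPIt = T-Kry / nItrs, which is an assumption inferred from the numbers (the legend does not define T-KryPIt) and appears to hold for the sample rows shown. The helper names and tolerances are hypothetical.

```python
import math

# (T-PreP, nItrs, T-Kry, T-KryPIt, T-Total), all times in milliseconds,
# copied from three NPrtns = 80 rows of the results.
sample_rows = [
    (75.317, 1.75, 102.488, 58.5646, 177.805),     # N=200000, K=50,  d=1.2
    (459.969, 1.75, 174.535, 99.7343, 634.504),    # N=200000, K=200, d=1.2
    (74.615, 101.25, 4102.84, 40.5219, 4177.46),   # N=200000, K=50,  d=0.1 (NConv)
]


def total_is_consistent(t_prep, t_kry, t_total, rel_tol=1e-5):
    """T-Total should equal T-PreP + T-Kry up to the printed precision."""
    return math.isclose(t_prep + t_kry, t_total, rel_tol=rel_tol)


def per_iteration_is_consistent(n_itrs, t_kry, t_kry_pit, rel_tol=1e-4):
    """Assumed relation: T-KryPIt = T-Kry / nItrs."""
    return math.isclose(t_kry / n_itrs, t_kry_pit, rel_tol=rel_tol)


for t_prep, n_itrs, t_kry, t_kry_pit, t_total in sample_rows:
    print(total_is_consistent(t_prep, t_kry, t_total),
          per_iteration_is_consistent(n_itrs, t_kry, t_kry_pit))
```

The tolerances are loose on purpose, since the report prints about six significant digits.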