1-14-2016 Banded Results BCR Test



• Name: the name of the matrix (test)
• N: the dimension of the matrix (number of rows and columns)
• NNZ: number of non-zeros
• SPD: whether the matrix is specified by the user to be symmetric positive definite (values: 0 or 1)
• DB: indicates whether DB reordering is performed. Values: 0 or 1.
• K-DB: the half-bandwidth after DB reordering (without any drop-off). If DB reordering is disabled, this reports the original half-bandwidth
• KnoDrp: the half-bandwidth after DB and CM reordering but before drop-off
• K: the half-bandwidth after reordering and drop-off
• FRate: fill-in rate. See NOTES below
• nuKf: non-uniform K factor. Indicates whether the K changes a lot from row to row. Values are between 0 and 1, with 0 indicating a perfectly uniform bandwidth over the entire matrix. See NOTES below
• Solves: indicates whether the problem was solved. OK means the solve succeeded; otherwise the reason for the failure is given
• Bstng: indicates whether we enable diagonal boosting when doing factorization. Values: 0 or 1
• SolAcc: infinity norm of the array storing the relative errors
• T-DB: time to run DB reordering for the matrix on the CPU
• T-CM: time to run CM reordering for the matrix on the CPU
• T-Drop: time to drop off off-diagonal elements to decrease bandwidth. Done on the CPU.
• T-Dtransf: data transfer from CPU to GPU
• T-Asmbl: after reordering and drop-off, copy the sparse matrix to banded matrix stored in GPU memory
• LU-M: LU method (complete, ILUT or ILUULT)
• Fill-in: the fill-in factor of ILUT (-1 indicates complete LU)
• NPrtns: the number of partitions used to solve the problem
• T-BC: time required to get off-diagonal right hand sides (Bs and Cs) from the banded matrix - done on the GPU
• T-LU: LU time
• GFlps-LU: LU GFLOPs
• T-SPK: time to solve for the spikes Vs and Ws - done on the GPU
• T-LUrdcd: time required to factorize the reduced matrices - done on the GPU
• T-PreP: the sum of all preprocessing times, see NOTES
• Kry-M: the method used in Krylov solving stage (can be BiCGStab2 (0), BiCGStab (1), or CG(2))
• nItrs: the number of Krylov-solve iterations to solve the problem
• T-Kry: time spent in the Krylov solver (on the GPU)
• Total: total time to solve the problem, the sum of T-PreP and T-Kry
• Pardiso: the time for the commercial tool "Pardiso" to solve the problem
• SlwD: the slowdown ratio of our solver compared to Pardiso (a value less than one means that we are faster than Pardiso. The value is shown in green if we run faster and shown in red if we run more than 5 times slower.)
• Fastest: the fastest time SaP has historically recorded for this problem
• SpdUp: the speedup of this run compared to the historical fastest run (the value is shown in green if the speedup is more than 5% and shown in red if the slowdown is more than 5%)
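
A minimal sketch of how the SlwD and SpdUp values and their colors could be derived from the definitions above. The function names are hypothetical. SpdUp is assumed to be the historical fastest time divided by the time of this run, and "slowdown more than 5%" is assumed to mean a SpdUp below 0.95; neither detail is stated in this report.

```python
def slowdown(sap_ms: float, pardiso_ms: float) -> float:
    """SlwD: SaP time divided by Pardiso time (below 1 means SaP is faster)."""
    return sap_ms / pardiso_ms


def slowdown_color(slwd: float) -> str:
    """Green if SaP is faster than Pardiso, red if more than 5 times slower."""
    if slwd < 1.0:
        return "green"
    if slwd > 5.0:
        return "red"
    return "none"


def speedup(fastest_ms: float, current_ms: float) -> float:
    """SpdUp (assumed): historical fastest time divided by this run's time."""
    return fastest_ms / current_ms


def speedup_color(spdup: float) -> str:
    """Green above a 5% speedup, red below a 5% slowdown (assumed thresholds)."""
    if spdup > 1.05:
        return "green"
    if spdup < 0.95:
        return "red"
    return "none"


if __name__ == "__main__":
    # Made-up timings in milliseconds, for illustration only.
    s = slowdown(sap_ms=50.0, pardiso_ms=100.0)
    print(s, slowdown_color(s))
    u = speedup(fastest_ms=100.0, current_ms=120.0)
    print(round(u, 3), speedup_color(u))
```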


NOTES:
1) nuKf = 1/(2KN)*[sum over i from 1 to N of (2K - K_{iLeft} - K_{iRight})], where K_{iLeft} is the row half-bandwidth to the left of the diagonal while K_{iRight} is the row half-bandwidth to the right of the diagonal.
2) FRate = the actual number of non-zeros (NNZ) / ((2K+1)N).
3) All times reported are in milliseconds (1E-3 second).
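
The two formulas in notes 1) and 2) can be illustrated with a short sketch. It assumes a small dense matrix given as a list of lists and takes K as the largest row half-bandwidth found in that matrix; the function names are hypothetical and this is not the code SaP itself uses.

```python
def row_half_bandwidths(a):
    """Return per-row (left, right) half-bandwidths of a dense square matrix."""
    left, right = [], []
    for i, row in enumerate(a):
        cols = [j for j, v in enumerate(row) if v != 0]
        # Distance from the diagonal to the farthest non-zero on each side.
        left.append(max((i - j for j in cols if j <= i), default=0))
        right.append(max((j - i for j in cols if j >= i), default=0))
    return left, right


def nukf_and_frate(a):
    """Compute nuKf and FRate as defined in notes 1) and 2)."""
    n = len(a)
    left, right = row_half_bandwidths(a)
    k = max(max(left), max(right))  # assumption: K is the largest row half-bandwidth
    nnz = sum(1 for row in a for v in row if v != 0)
    # nuKf = 1/(2KN) * sum_i (2K - K_iLeft - K_iRight); defined as 0 for a diagonal matrix.
    nukf = sum(2 * k - l - r for l, r in zip(left, right)) / (2 * k * n) if k > 0 else 0.0
    # FRate = NNZ / ((2K+1)N)
    frate = nnz / ((2 * k + 1) * n)
    return nukf, frate


if __name__ == "__main__":
    # 4x4 tridiagonal example: K = 1, NNZ = 10.
    tridiag = [
        [2, 1, 0, 0],
        [1, 2, 1, 0],
        [0, 1, 2, 1],
        [0, 0, 1, 2],
    ]
    print(nukf_and_frate(tridiag))
```

For this tridiagonal example only the first and last rows fall short of the full band, so the sum in note 1) is 2 and nuKf = 2/(2*1*4) = 0.25, while FRate = 10/12.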


T-PreP

N       K    d  NPrtns  Solves  T-BC     T-LU     T-SwDef  T-MMDef  nItrs    T-SwInf  T-MVInf  T-Kry    T-KryPIt  T-Total
200000  10   1  1       OK      19.994   12.2603  35.0052  77.117   0.75     10.855   30.1868  53.8     53.8      130.917
200000  10   1  4       OK      20.6869  12.002   34.1924  76.707   2.75     27.2799  77.3639  132.605  48.22     209.312
200000  10   1  8       OK      19.7689  11.8649  33.455   74.96    2.75     26.8425  76.9584  131.463  47.8047   206.423
200000  10   1  16      OK      18.9606  11.7364  33.2324  73.759   2.75     26.3887  76.5673  130.236  47.3585   203.995
200000  10   1  20      OK      18.0649  11.654   32.5622  72.077   3.25     29.8926  87.7209  148.004  45.5397   220.081
200000  10   1  32      OK      18.026   11.5984  32.4997  71.9     3.25     33.9327  99.7615  167.386  51.5034   239.286
200000  10   1  50      OK      17.243   11.5128  32.0771  70.675   3.25     33.3769  99.2008  165.6    50.9538   236.275
200000  20   1  1       OK      30.6358  20.7853  42.7134  104.682  0.75     10.8124  32.3976  58.167   58.167    162.849
200000  20   1  4       OK      28.87    20.1192  41.827   101.406  2.75     26.816   83.1321  143.673  52.2447   245.079
200000  20   1  8       OK      27.8828  19.7548  40.9505  99.116   2.75     26.1479  82.6748  142.131  51.684    241.247
200000  20   1  16      OK      27.0002  19.407   40.3906  97.246   2.75     25.5097  82.4035  140.841  51.2149   238.087
200000  20   1  20      OK      26.0976  19.2266  39.846   95.684   2.75     24.8182  81.5794  139.093  50.5793   234.777
200000  20   1  32      OK      26.1823  19.0738  39.8965  95.639   2.75     24.9699  82.003   139.651  50.7822   235.29
200000  20   1  50      OK      25.2008  18.8557  34.3754  88.954   2.75     24.2058  81.6233  138.064  50.2051   227.018
200000  50   1  1       OK      59.7688  40.8844  72.8934  191.544  0.75     13.3369  32.8317  63.884   63.884    255.428
200000  50   1  4       OK      57.7388  38.9221  72.3354  186.844  2.25     27.2431  71.1521  132.995  59.1089   319.839
200000  50   1  8       OK      56.646   37.9291  71.3072  183.687  2.25     26.1652  70.5893  131.132  58.2809   314.819
200000  50   1  16      OK      55.464   36.7152  63.8788  173.804  2.25     25.0469  69.8229  129.272  57.4542   303.076
200000  50   1  20      OK      55.6604  36.8564  63.9624  174.312  2.25     29.6172  82.9376  152.275  67.6778   326.587
200000  50   1  32      OK      54.3924  35.2342  60.9498  168.377  2.25     28.4269  81.4321  149.382  66.392    317.759
200000  50   1  50      OK      55.1335  34.9096  61.572   169.344  2.25     24.0786  69.4871  127.62   56.72     296.964
200000  100  1  1       OK      111.034  153.621  144.639  442.424  0.75     17.6839  38.0391  79.238   79.238    521.662
200000  100  1  4       OK      108.506  144.736  141.923  428.053  1.75     28.6729  67.2429  135.846  77.6263   563.899
200000  100  1  8       OK      107.274  139.87   132.584  412.743  1.75     27.1111  66.4245  133.598  76.3417   546.341
200000  100  1  16      OK      106.02   134.497  128.687  402.144  1.75     25.5929  65.254   130.543  74.596    532.687
200000  100  1  20      OK      106.881  134.277  129.472  403.469  1.75     25.4716  65.8118  130.876  74.7863   534.345
200000  100  1  32      OK      105.571  94.5609  123.104  356.326  1.75     24.3204  63.4971  127.342  72.7669   483.668
200000  100  1  50      OK      107.368  125.333  122.543  387.986  2.25     29.436   78.2668  155.206  68.9804   543.192
200000  200  1  1       OK      315.596  416.764  369.591  1195.13  0.75     27.9574  54.4809  120.219  120.219   1315.35
200000  200  1  4       OK      312.219  388.529  349.699  1143.44  1.75     44.1978  95.6043  205.527  117.444   1348.97
200000  200  1  8       OK      310.4    371.932  338.606  1113.6   1.75     41.3329  93.3072  200.608  114.633   1314.21
200000  200  1  16      OK      316.944  232.015  329.364  971.926  1.75     39.2627  91.1022  195.841  111.909   1167.77
200000  200  1  20      OK      312.564  352.243  323.028  1079.98  1.75     38.3334  91.2733  195.198  111.542   1275.18
200000  200  1  32      OK      313.918  209.528  296.167  912.782  1.75     36.0635  85.0138  186.525  106.586   1099.31
200000  200  1  50      OK      318.256  315.688  289.854  1014.67  1.75     38.5344  84.7201  188.401  107.658   1203.07
200000  500  1  1       OK      1688.59  1533.71  1858.04  5394.5   0.75     66.3298  111.835  259.605  259.605   5654.1
200000  500  1  4       OK      1691.14  1430.45  1753.92  5187.74  1.75     104.531  193.955  442.892  253.081   5630.63
200000  500  1  8       OK      1696.14  1340.77  1658.88  5004.7   1.75     97.3678  186.582  428.058  244.605   5432.76
200000  500  1  16      OK      1700.96  1216.47  1491.89  4715.09  1.75     89.5093  174.146  407.43   232.817   5122.52
200000  500  1  20      OK      1727.48  1184.04  1466.72  4681.07  1.75     90.5271  170.966  405.479  231.702   5086.55
200000  500  1  32      NConv   1781.28  975.877  1363.33  4438.7   1000.25  41343.1  72109.2  176336   176.292   180774
200000  500  1  50      OK      1813.47  958.658  1169.22  4231.72  1.75     82.4797  146.284  372.738  212.993   4604.46
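
T-KryPIt is not defined in the legend. The reported values look consistent with T-Kry divided by nItrs, with the iteration count clamped to at least one. This is an inference from the numbers above, not a documented definition. A small sketch that checks the relation on three rows copied from the results (the helper name is hypothetical):

```python
import math

# (nItrs, T-Kry, T-KryPIt) copied from three result rows; times in milliseconds.
SAMPLE_ROWS = [
    (0.75, 53.8, 53.8),           # K = 10,  NPrtns = 1
    (2.75, 132.605, 48.22),       # K = 10,  NPrtns = 4
    (1000.25, 176336.0, 176.292), # K = 500, NPrtns = 32 (NConv)
]


def time_per_iteration(t_kry: float, n_itrs: float) -> float:
    """Assumed relation: Krylov time per iteration, counting at least one iteration."""
    return t_kry / max(1.0, n_itrs)


def matches_reported(n_itrs: float, t_kry: float, t_kry_pit: float) -> bool:
    """Compare against the reported value, allowing for rounding in the report."""
    return math.isclose(time_per_iteration(t_kry, n_itrs), t_kry_pit, rel_tol=1e-4)


if __name__ == "__main__":
    for n_itrs, t_kry, t_kry_pit in SAMPLE_ROWS:
        print(n_itrs, t_kry, t_kry_pit, matches_reported(n_itrs, t_kry, t_kry_pit))
```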