3-19-2016 Banded Results Patch



• Name: the name of the matrix (test)
• N: the dimension of the matrix (number of rows and columns)
• NNZ: number of non-zeros
• SPD: whether the matrix is specified by the user to be symmetric positive definite (values: 0 or 1)
• DB: indicates whether DB reordering is performed. Values: 0 or 1.
• K-DB: the half-bandwidth after DB reordering (without any drop-off). If DB is disabled, this reports the original half-bandwidth
• KnoDrp: the half-bandwidth after DB and CM reordering but before drop-off
• K: the half-bandwidth after reordering and drop-off
• FRate: fill-in rate. See NOTES below
• nuKf: non-uniform K factor. Indicates whether the K changes a lot from row to row. Values are between 0 and 1, with 0 indicating a perfectly uniform bandwidth over the entire matrix. See NOTES below
• Solves: indicates whether we managed to solve the problem or not. OK means solved fine, otherwise a reason is provided for failure
• Bstng: indicates whether we enable diagonal boosting when doing factorization. Values: 0 or 1
• SolAcc: infinity norm of the array storing the relative errors
• T-DB: time to run DB reordering for the matrix on the CPU
• T-CM: time to run CM reordering for the matrix on the CPU
• T-Drop: time to drop off off-diagonal elements to decrease bandwidth. Done on the CPU.
• T-Dtransf: time to transfer data from the CPU to the GPU
• T-Asmbl: time to copy the sparse matrix, after reordering and drop-off, into a banded matrix stored in GPU memory
• LU-M: LU method (complete, ILUT or ILUULT)
• Fill-in: the fill-in factor of ILUT (-1 indicates complete LU)
• NPrtns: the number of partitions used to solve the problem
• T-BC: time required to get off-diagonal right hand sides (Bs and Cs) from the banded matrix - done on the GPU
• T-LU: LU time
• GFlps-LU: LU GFLOPs
• T-SPK: time to solve for the spikes Vs and Ws - done on the GPU
• T-LUrdcd: time required to factorize the reduced matrices - done on the GPU
• T-PreP: the sum of all preprocessing times, see NOTES
• Kry-M: the method used in the Krylov solve stage. Values: BiCGStab2 (0), BiCGStab (1), or CG (2)
• nItrs: the number of Krylov-solve iterations to solve the problem
• T-Kry: time spent in the Krylov solver (on the GPU)
• Total: total time to solve the problem, the sum of T-PreP and T-Kry
• Pardiso: the time for the commercial tool "Pardiso" to solve the problem
• SlwD: the slowdown ratio of our solver compared to Pardiso. A value less than one means that we are faster than Pardiso. The value is shown in green if we run faster and in red if we run more than 5 times slower.
• Fastest: the fastest time SaP has historically achieved on this problem
• SpdUp: the speedup of this run compared to the historical fastest run (the value is shown in green if the speedup is more than 5% and shown in red if the slowdown is more than 5%)
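
The SlwD rule above can be sketched in a few lines of Python. This is only an illustration: it assumes SlwD is computed as the SaP Total time divided by the Pardiso time, and the function name `slowdown` and its return shape are hypothetical, not part of the reporting tool.

```python
def slowdown(sap_total_ms, pardiso_ms):
    """Return (SlwD, color) where color is 'green', 'red', or None.

    Assumption: SlwD = SaP total time / Pardiso time, both in milliseconds.
    """
    slwd = sap_total_ms / pardiso_ms
    if slwd < 1.0:
        color = "green"   # SaP ran faster than Pardiso
    elif slwd > 5.0:
        color = "red"     # SaP ran more than 5 times slower
    else:
        color = None      # no highlighting
    return slwd, color
```

For example, a SaP run of 50 ms against a Pardiso run of 100 ms gives SlwD = 0.5 and is highlighted in green.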


NOTES:
1) nuKf = 1/(2KN)*[sum over i from 1 to N of (2K - K_{iLeft} - K_{iRight})], where K_{iLeft} is the half-bandwidth of row i to the left of the diagonal and K_{iRight} is the half-bandwidth of row i to the right of the diagonal.
2) FRate = NNZ / ((2K+1)N), i.e. the fraction of the (2K+1)N band entries that are actual non-zeros.
3) All times reported are in milliseconds (1E-3 second)
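
As a rough illustration of notes 1) and 2), the following Python sketch computes K, nuKf and FRate from the coordinates of the non-zeros. It assumes 0-based (row, col) pairs with no duplicates, and it takes K to be the largest row half-bandwidth on either side of the diagonal; the function name `band_metrics` is hypothetical and this is not the solver's own code.

```python
def band_metrics(n, entries):
    """Compute (K, nuKf, FRate) for an n x n matrix.

    entries: iterable of unique (row, col) index pairs of the non-zeros,
    0-based. K is assumed to be the maximum row half-bandwidth.
    """
    left = [0] * n    # K_{iLeft}: reach of row i to the left of the diagonal
    right = [0] * n   # K_{iRight}: reach of row i to the right of the diagonal
    nnz = 0
    for i, j in entries:
        nnz += 1
        if j < i:
            left[i] = max(left[i], i - j)
        else:
            right[i] = max(right[i], j - i)

    k = max(max(left), max(right))
    frate = nnz / ((2 * k + 1) * n)
    if k == 0:
        # Diagonal matrix: the nuKf formula would divide by zero.
        return k, 0.0, frate
    nukf = sum(2 * k - l - r for l, r in zip(left, right)) / (2 * k * n)
    return k, nukf, frate
```

Note that even a full tridiagonal matrix does not reach nuKf = 0 under this formula, because the first and last rows have no entries on one side of the diagonal.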


N       K    d     NPrtns  Solves  RelR         T-LU     T-SwDef   T-MMDef   T-PreP   nItrs   T-SwInf  T-MVInf  T-Kry    T-KryPIt  T-Total
200000  200  1.2   50      OK      1.1807e-15   1004.51  -         -         1200.83  0.25    -        -        111.827  111.827   1312.65
200000  200  1.2   80      OK      2.50688e-12  365.549  -         -         421.434  1.75    -        -        174.77   99.8686   596.204
200000  200  1.2   1       OK      8.5935e-16   370.207  611.733   469.966   1573.61  0.25    29.0383  48.5761  109.31   109.31    1682.92
200000  200  1     50      OK      1.19498e-15  1003.36  -         -         1200     0.25    -        -        112.09   112.09    1312.09
200000  200  1     80      OK      1.52629e-11  361.706  -         -         417.604  1.75    -        -        174.462  99.6926   592.066
200000  200  1     1       OK      8.74646e-16  370.091  611.717   469.462   1572.66  0.25    29.0225  48.5773  109.215  109.215   1681.88
200000  200  0.8   50      OK      1.16234e-15  994.478  -         -         1189.79  0.25    -        -        111.443  111.443   1301.23
200000  200  0.8   80      OK      2.38516e-10  361.687  -         -         417.421  1.75    -        -        174.086  99.4777   591.507
200000  200  0.8   1       OK      8.48387e-16  370.314  611.667   468.991   1572.61  0.25    29.0538  48.5612  109.145  109.145   1681.76
200000  200  0.6   50      OK      1.19337e-15  994.333  -         -         1189.6   0.25    -        -        111.363  111.363   1300.96
200000  200  0.6   80      OK      2.67732e-12  361.759  -         -         417.621  2.25    -        -        207.049  92.0218   624.67
200000  200  0.6   1       OK      8.65722e-16  370.105  611.955   469.14    1572.52  0.25    29.0494  48.5844  109.224  109.224   1681.74
200000  200  0.4   50      OK      1.16559e-15  994.512  -         -         1189.56  0.25    -        -        111.597  111.597   1301.16
200000  200  0.4   80      OK      1.01774e-10  361.743  -         -         417.568  2.25    -        -        206.841  91.9293   624.409
200000  200  0.4   1       OK      8.51966e-16  370.173  612.078   469.536   1573.3   0.25    29.0391  48.5876  109.273  109.273   1682.57
200000  200  0.2   50      OK      1.17864e-15  994.488  -         -         1189.6   0.25    -        -        111.358  111.358   1300.96
200000  200  0.2   80      OK      5.10344e-10  361.673  -         -         417.514  2.75    -        -        243.508  88.5484   661.022
200000  200  0.2   1       OK      8.61372e-16  368.743  611.962   469.212   1571.78  0.25    29.0382  48.5448  109.206  109.206   1680.99
200000  200  0.1   50      OK      6.88359e-14  994.396  -         -         1189.54  0.25    -        -        111.616  111.616   1301.16
200000  200  0.1   80      OK      5.96008e-10  361.694  -         -         417.409  4.75    -        -        381.986  80.4181   799.395
200000  200  0.1   1       OK      1.01022e-15  370.239  611.846   469.238   1573.19  0.25    29.0826  48.582   109.227  109.227   1682.42
200000  200  0.08  50      OK      1.43986e-10  994.467  -         -         1189.65  0.25    -        -        111.578  111.578   1301.22
200000  200  0.08  80      OK      5.41432e-10  361.757  -         -         417.403  6.75    -        -        520.604  77.1265   938.007
200000  200  0.08  1       OK      1.28597e-15  370.344  611.688   469.508   1572.87  0.25    29.0543  48.5752  109.22   109.22    1682.09
200000  200  0.06  50      OK      2.33386e-10  994.592  -         -         1189.88  1.75    -        -        281.015  160.58    1470.89
200000  200  0.06  80      NConv   1.7933e-09   361.771  -         -         417.576  101.25  -        -        7072.4   69.8509   7489.98
200000  200  0.06  1       OK      8.60068e-14  368.704  611.722   469.661   1571.61  0.25    29.009   48.5564  109.014  109.014   1680.62
200000  200  0.04  50      NConv   1            994.537  -         -         1189.83  101.25  -        -        11505.7  113.636   12695.5
200000  200  0.04  80      NConv   1            362.038  -         -         417.749  101.25  -        -        7112.5   70.2469   7530.25
200000  200  0.04  1       OK      4.98003e-10  370.417  612.271   469.502   1573.55  1.25    65.4234  109.389  243.325  194.66    1816.87
200000  200  0.02  50      NConv   1            994.508  -         -         1189.74  101.25  -        -        11474.7  113.33    12664.4
200000  200  0.02  80      NConv   1            361.794  -         -         417.713  101.25  -        -        7069.29  69.8202   7487
200000  200  0.02  1       NConv   1.2879e-09   370.294  611.887   469.013   1573.03  2.5     159.607  266.957  597.337  238.935   2170.37
200000  200  0.01  50      NConv   1            994.843  -         -         1190.24  101.25  -        -        11593.6  114.504   12783.8
200000  200  0.01  80      NConv   1            361.77   -         -         417.513  101.25  -        -        7127.19  70.392    7544.71
200000  200  0.01  1       NConv   8.20921e-09  370.937  612.157   469.575   1574.33  2.5     159.793  267.181  597.715  239.086   2172.04
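
The reported values can be cross-checked with a short Python sketch. The first relation, T-Total = T-PreP + T-Kry, follows the legend's definition of Total. The second, T-KryPIt = T-Kry / max(1, nItrs), is not stated anywhere in the legend; it is only an observation that happens to match the rows sampled here and should be treated as an assumption. The helper name `check_row` is hypothetical.

```python
import math

# (T-PreP, nItrs, T-Kry, T-KryPIt, T-Total) copied from three reported rows
rows = [
    (1200.83, 0.25, 111.827, 111.827, 1312.65),  # d = 1.2,  NPrtns = 50
    (421.434, 1.75, 174.77, 99.8686, 596.204),   # d = 1.2,  NPrtns = 80
    (1573.55, 1.25, 243.325, 194.66, 1816.87),   # d = 0.04, NPrtns = 1
]

def check_row(t_prep, n_itrs, t_kry, t_kry_pit, t_total, rel_tol=1e-4):
    """Return True if a row satisfies both relations within rounding error."""
    # Total time is preprocessing time plus Krylov time.
    total_ok = math.isclose(t_prep + t_kry, t_total, rel_tol=rel_tol)
    # Assumed: per-iteration time divides by nItrs, but never by less than 1.
    per_it_ok = math.isclose(t_kry / max(1.0, n_itrs), t_kry_pit, rel_tol=rel_tol)
    return total_ok and per_it_ok

all_ok = all(check_row(*row) for row in rows)
```

The tolerance is loose because the reported values are rounded to about six significant digits.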