4-30-2016 New Banded Results



• Name: the name of the matrix (test)
• N: the dimension of the matrix (number of rows and columns)
• NNZ: number of non-zeros
• SPD: whether the matrix is specified by the user to be symmetric positive definite (values: 0 or 1)
• DB: indicates whether DB reordering is performed (values: 0 or 1)
• K-DB: the half-bandwidth after DB reordering (without any drop-off). If DB reordering is disabled, this reports the original half-bandwidth
• KnoDrp: the half-bandwidth after DB and CM reordering but before drop-off
• K: the half-bandwidth after reordering and drop-off
• FRate: fill-in rate. See NOTES below
• nuKf: non-uniform K factor. Indicates how much the row half-bandwidth varies from row to row. Values are between 0 and 1, with 0 indicating a perfectly uniform bandwidth over the entire matrix. See NOTES below
• Solves: indicates whether the problem was solved. OK means it was solved; otherwise the reason for the failure is reported
• Bstng: indicates whether diagonal boosting is enabled during factorization (values: 0 or 1)
• SolAcc: infinity norm of the array storing the relative errors
• T-DB: time to run DB reordering for the matrix on the CPU
• T-CM: time to run CM reordering for the matrix on the CPU
• T-Drop: time to drop off-diagonal elements in order to decrease the bandwidth - done on the CPU
• T-Dtransf: time to transfer data from the CPU to the GPU
• T-Asmbl: time to copy the sparse matrix, after reordering and drop-off, into a banded matrix stored in GPU memory
• LU-M: LU method (complete, ILUT or ILUULT)
• Fill-in: the fill-in factor of ILUT (-1 indicates complete LU)
• NPrtns: the number of partitions used to solve the problem
• T-BC: time required to get off-diagonal right hand sides (Bs and Cs) from the banded matrix - done on the GPU
• T-LU: LU time
• GFlps-LU: LU GFLOPs
• T-SPK: time to solve for the spikes Vs and Ws - done on the GPU
• T-LUrdcd: time required to factorize the reduced matrices - done on the GPU
• T-PreP: the sum of all preprocessing times, see NOTES
• Kry-M: the method used in Krylov solving stage (can be BiCGStab2 (0), BiCGStab (1), or CG(2))
• nItrs: the number of Krylov-solve iterations to solve the problem
• T-Kry: time spent in the Krylov solver (on the GPU)
• Total: total time to solve the problem, the sum of T-PreP and T-Kry
• Pardiso: the time for the commercial tool "Pardiso" to solve the problem
• SlwD: the slowdown ratio of our solver relative to Pardiso. A value less than one means that we are faster than Pardiso. The value is shown in green if we run faster and in red if we run more than 5 times slower
• Fastest: the fastest time SaP has historically achieved on the problem
• SpdUp: the speedup of this run compared to the historical fastest run (the value is shown in green if the speedup is more than 5% and shown in red if the slowdown is more than 5%)
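
The SlwD and SpdUp entries above can be illustrated with a minimal sketch. It assumes SlwD = Total / Pardiso and SpdUp = Fastest / Total, and it reads "slowdown is more than 5%" as SpdUp < 0.95; the function names are hypothetical and the reading of the thresholds is an assumption, not taken from the tool that produced these results.

```python
def slowdown(total_ms, pardiso_ms):
    """SlwD: our total time divided by Pardiso's time (assumed definition)."""
    return total_ms / pardiso_ms


def slowdown_color(slwd):
    """Green if we are faster than Pardiso, red if more than 5 times slower."""
    if slwd < 1.0:
        return "green"
    if slwd > 5.0:
        return "red"
    return None  # no highlighting


def speedup(fastest_ms, total_ms):
    """SpdUp: historical fastest time divided by this run's time (assumed definition)."""
    return fastest_ms / total_ms


def speedup_color(spdup):
    """Green above a 5% speedup, red below a 5% slowdown (assumed thresholds)."""
    if spdup > 1.05:
        return "green"
    if spdup < 0.95:
        return "red"
    return None  # within 5% of the historical fastest run


# Illustrative values only, not measurements from this page.
print(slowdown_color(slowdown(200.0, 400.0)))
print(speedup_color(speedup(100.0, 125.0)))
```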


NOTES:
1) nuKf = 1/(2KN)*[sum over i from 1 to N of (2K - K_{iLeft} - K_{iRight})], where K_{iLeft} is the row half-bandwidth to the left of the diagonal while K_{iRight} is the row half-bandwidth to the right of the diagonal.
2) FRate = NNZ / ((2K+1)N), where NNZ is the actual number of non-zeros.
3) All times reported are in milliseconds (1E-3 second)
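
As a rough illustration of notes 1) and 2), the sketch below computes K, nuKf and FRate from the positions of the non-zeros. It assumes K is the largest row half-bandwidth on either side of the diagonal and that each non-zero position is listed exactly once; the function name band_metrics and the input format are hypothetical.

```python
def band_metrics(n, entries):
    """Return (K, nuKf, FRate) for an n x n matrix.

    entries: iterable of distinct (row, col) positions of the non-zeros.
    """
    left = [0] * n   # K_iLeft: reach of row i to the left of the diagonal
    right = [0] * n  # K_iRight: reach of row i to the right of the diagonal
    nnz = 0
    for i, j in entries:
        nnz += 1
        if j < i:
            left[i] = max(left[i], i - j)
        else:
            right[i] = max(right[i], j - i)

    # Assumption: K is the maximum row half-bandwidth over both sides.
    k = max(max(left), max(right))

    if k == 0:
        nukf = 0.0  # diagonal matrix: the formula would divide by zero
    else:
        nukf = sum(2 * k - l - r for l, r in zip(left, right)) / (2 * k * n)

    frate = nnz / ((2 * k + 1) * n)
    return k, nukf, frate


# 4 x 4 tridiagonal matrix: K = 1, 10 non-zeros.
tridiag = [(i, j) for i in range(4) for j in range(4) if abs(i - j) <= 1]
print(band_metrics(4, tridiag))
```

Note that, under this reading of the formula, the first and last rows of a band contribute to nuKf because they have no elements on one side of the diagonal.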


N  K  d  NPrtns  Solves  RelR  T-LU  T-SwDef  T-MMDef  T-PreP  nItrs  T-SwInf  T-MVInf  T-Kry  T-KryPIt  T-Total

200000  50  0.2  80  OK  6.36112e-10  0  51.7984  0  0  58.155  4.75  162.209  34.1493  220.364
200000  200  0.2  80  OK  5.10344e-10  0  375.029  0  0  398.316  2.75  213.662  77.6953  611.978
200000  50  0.19  80  OK  1.72395e-10  0  51.7148  0  0  58.073  5.75  191.792  33.3551  249.865
200000  200  0.19  80  OK  9.59482e-10  0  370.998  0  0  393.993  2.75  212.865  77.4055  606.858
200000  50  0.18  80  OK  9.41806e-10  0  52.6795  0  0  59.124  5.75  192.826  33.535  251.95
200000  200  0.18  80  OK  1.66894e-10  0  371.116  0  0  394.142  3.25  241.504  74.3089  635.646
200000  50  0.17  80  OK  4.55752e-10  0  50.7046  0  0  57.067  6.75  221.294  32.7843  278.361
200000  200  0.17  80  OK  3.58456e-10  0  370.178  0  0  393.18  3.25  241.742  74.3822  634.922
200000  50  0.16  80  OK  5.85404e-10  0  52.1065  0  0  58.479  7.75  250.454  32.3166  308.933
200000  200  0.16  80  OK  8.11541e-10  0  370.184  0  0  393.152  3.25  241.739  74.3812  634.891
200000  50  0.15  80  OK  4.64543e-10  0  51.5259  0  0  57.88  9.25  292.756  31.6493  350.636
200000  200  0.15  80  OK  4.33092e-10  0  374.119  0  0  397.313  3.25  270.979  83.3782  668.292
200000  50  0.14  80  OK  9.95609e-10  0  51.24  0  0  57.631  11.75  368.416  31.3546  426.047
200000  200  0.14  80  OK  6.87913e-11  0  370.125  0  0  393.112  3.75  273.618  72.9648  666.73
200000  50  0.13  80  OK  3.02539e-10  0  51.4644  0  0  57.815  26.25  792.237  30.1805  850.052
200000  200  0.13  80  OK  2.44978e-10  0  370.204  0  0  393.198  3.75  273.307  72.8819  666.505
200000  50  0.12  80  NConv  2.00164e-09  0  51.5974  0  0  57.959  183.75  5429.71  29.5494  5487.67
200000  200  0.12  80  OK  9.9876e-10  0  370.406  0  0  394.63  3.75  276.724  73.7931  671.354
200000  50  0.11  80  NConv  1  0  52.0471  0  0  58.47  501.25  14537.1  29.0018  14595.6
200000  200  0.11  80  OK  6.2028e-11  0  370.327  0  0  393.294  4.75  333.763  70.2659  727.057
200000  50  0.1  80  NConv  1  0  52.264  0  0  58.687  501.25  14621.4  29.1699  14680.1
200000  200  0.1  80  OK  5.96008e-10  0  370.125  0  0  393.107  4.75  334.121  70.3413  727.228
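
The timing columns can be cross-checked with a short sketch. It assumes that the last five values of each result row are T-PreP, nItrs, T-Kry, T-KryPIt and T-Total, that T-Total = T-PreP + T-Kry, and that T-KryPIt = T-Kry / nItrs. This mapping is an interpretation that agrees with the arithmetic, not something stated in the legend. The three rows below are copied from the results (d = 0.2 with K = 50 and K = 200, and d = 0.1 with K = 50); the helper name consistent is hypothetical.

```python
import math

# (K, d, T-PreP, nItrs, T-Kry, T-KryPIt, T-Total), times in milliseconds
rows = [
    (50, 0.2, 58.155, 4.75, 162.209, 34.1493, 220.364),
    (200, 0.2, 398.316, 2.75, 213.662, 77.6953, 611.978),
    (50, 0.1, 58.687, 501.25, 14621.4, 29.1699, 14680.1),
]


def consistent(row, rel_tol=1e-3):
    """Check the two assumed relations, allowing for rounding in the report."""
    _k, _d, t_prep, n_itrs, t_kry, t_kry_pit, t_total = row
    total_ok = math.isclose(t_prep + t_kry, t_total, rel_tol=rel_tol)
    per_iter_ok = math.isclose(t_kry / n_itrs, t_kry_pit, rel_tol=rel_tol)
    return total_ok and per_iter_ok


for row in rows:
    print(row[0], row[1], consistent(row))
```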