Characterizing the Relationship between ILU-type Preconditioners and the Storage Hierarchy

Diego Rivera (1), David Kaeli (1) and Misha Kilmer (2)
(1) Department of Electrical and Computer Engineering, Northeastern University, Boston, MA, {drivera, kaeli}@ece.neu.edu
(2) Department of Mathematics, Tufts University, Medford, MA, misha.kilmer@tufts.edu
www.ece.neu.edu/students/drivera/tlg/tunlib.html
ICSS, Institute for Complex Scientific Software

Objective
- To improve the performance of preconditioners targeting sparse matrices
- To accelerate the memory accesses associated with these codes

Motivation
- Prior work targeted Krylov subspace methods
- However, little has been done in the case of preconditioners

"Nothing will be more central to computational science in the next century than the art of transforming a problem that appears intractable into another whose solution can be approximated rapidly. For Krylov subspace matrix iterations, this is preconditioning." From Numerical Linear Algebra by Trefethen and Bau (1997).

- Incomplete LU factorization type preconditioners are used to accelerate the convergence of Krylov subspace methods
- A drawback of these approaches is that it is difficult to choose the best values for their tuning parameters
- Choosing good values depends heavily on the structure of the non-zero elements of the coefficient matrix
- In our work we have found that it also depends on the memory hierarchy of the machine used to carry out the computation
- The parameter values that give the fastest execution time, given an acceptable final error, may be different for different memory hierarchies

Multilevel preconditioners based on ILU factorization

Common target applications
- Weather simulations
- Turbulence problems in airplanes
- DNA models

Ax = b, with A (m x m), x (m), b (m)
Ax = b -> Preconditioner -> M^-1 Ax = M^-1 b -> Iterative method -> Solution to the linear system

Target preconditioners

Preconditioner  Description of parameters
ILU( )          level-of-fill
ILUT            level-of-fill, drop tolerance
ILUTP           level-of-fill, drop tolerance and tolerance ratio (permtol)
ILUD            drop tolerance, diagonal compensation parameter
ILUDP           drop tolerance, diagonal compensation parameter and tolerance ratio (permtol)

Evaluation environment

         Intel XEON 3.06 GHz    Ultra Sparc-III 750 MHz
Level 1  8 KB 4-way for data    64 KB 4-way for data
Level 2  512 KB 8-way           8 MB 2-way
Level 3  1 MB 8-way             N/A
RAM      2 GB                   1 GB

- The difference in performance on different memory hierarchies becomes significant when the problem's conditions make it more difficult to solve
- These conditions are related to the dropping strategy adopted in the preconditioner algorithm
- We use the PIN tool to capture cache events
- Our results show a high correlation between the execution time, memory accesses and cache misses

[Figure: Correlation of load accesses and execution time. Panels: Ultra Sparc-III (DTLB, DL1, L2); Intel XEON (DTLB, DL1, L2, L3).]

Our algorithm
Approach to:
1) Extract the problem's conditions related to the dropping strategies adopted in the preconditioner
2) Detect whether the computation of a solution depends upon the relationship between the preconditioner's parameters and the memory hierarchy of the machine used
3) Suggest values for the preconditioner's parameters which can help to reduce the time required to compute the preconditioner and the solution for matrices with similar characteristics

Our experimental results show that 78.4% of the time, the suggested values of the preconditioner's parameters were appropriate in reducing the overall execution time.

Error norm vs. the first 13 duples (level of fill-in, drop tol.), sorted in increasing order of execution time of ILUT and GMRES
Row legend: same duple (level of fill-in, drop tol.) on both machines; different duple (level of fill-in, drop tol.) on both machines.

TORSO

Xeon                                                       Ultra
Level of fill-in  Drop tol.  Iterations  Residual error    Level of fill-in  Drop tol.  Iterations  Residual error
20                4.0E-02    10          2.1274E-08        30                1.0E-02    7           2.0350E-08
17                4.0E-02    10          2.1105E-08        30                2.5E-02    9           2.1215E-08
13                4.0E-02    10          4.1135E-08        30                1.5E-02    8           8.3086E-09
15                4.0E-02    10          2.9366E-08        13                4.0E-02    10          4.1135E-08
30                4.0E-02    10          2.1951E-08        30                4.0E-02    10          2.1951E-08
30                2.5E-02    9           2.1215E-08        17                4.0E-02    10          2.1105E-08
17                3.5E-02    10          1.4079E-08        15                4.0E-02    10          2.9366E-08
30                3.5E-02    10          1.1416E-08        20                4.0E-02    10          2.1274E-08
20                3.5E-02    10          1.1082E-08        30                3.5E-02    10          1.1416E-08
20                6.0E-02    11          3.1833E-08        20                3.5E-02    10          1.1082E-08
13                6.0E-02    11          3.7707E-08        17                3.5E-02    10          1.4079E-08
15                6.0E-02    11          3.3234E-08        30                2.0E-02    9           1.0323E-08
30                6.0E-02    11          3.1831E-08        20                3.0E-02    10          7.4593E-09

CAGE14

Xeon                                                       Ultra
Level of fill-in  Drop tol.  Iterations  Residual error    Level of fill-in  Drop tol.  Iterations  Residual error
1                 5.0E-01    8           1.4892E-02        13                2.5E-01    7           2.6528E-02
40                5.0E-01    8           2.3926E-02        15                2.5E-01    7           2.8517E-02
2                 5.0E-01    8           1.5111E-02        1                 2.5E-01    8           1.4892E-02
20                5.0E-01    8           2.3926E-02        2                 5.0E-01    8           1.5111E-02
17                5.0E-01    8           2.3882E-02        1                 1.0E-01    8           2.6387E-02
3                 5.0E-01    8           1.7360E-02        1                 5.0E-01    8           1.4892E-02
15                5.0E-01    8           2.3695E-02        30                5.0E-01    8           2.3926E-02
5                 5.0E-01    8           1.7308E-02        40                5.0E-01    8           2.3926E-02
30                5.0E-01    8           2.3926E-02        50                5.0E-01    8           2.3926E-02
9                 5.0E-01    8           2.2126E-02        20                5.0E-01    8           2.3926E-02
50                5.0E-01    8           2.3926E-02        13                5.0E-01    8           2.3546E-02
11                5.0E-01    8           2.3420E-02        3                 5.0E-01    8           1.7360E-02
13                5.0E-01    8           2.3546E-02        11                5.0E-01    8           2.3420E-02

RAEFSKY3

Xeon                                                       Ultra
Level of fill-in  Drop tol.  Iterations  Residual error    Level of fill-in  Drop tol.  Iterations  Residual error
30                1.0E-03    23          7.5754E-07        30                1.0E-03    23          7.5754E-07
30                8.0E-04    23          5.8466E-07        30                8.0E-04    23          5.8466E-07
32                1.0E-03    23          5.3237E-07        32                1.0E-03    23          5.3237E-07
34                1.0E-03    22          6.5689E-07        34                1.0E-03    22          6.5689E-07
32                8.0E-04    22          4.8701E-07        32                8.0E-04    22          4.8701E-07
30                6.0E-04    23          7.5087E-07        30                6.0E-04    23          7.5087E-07
34                8.0E-04    22          6.3722E-07        34                8.0E-04    22          6.3722E-07
36                1.0E-03    22          6.9058E-07        36                1.0E-03    22          6.9058E-07
32                6.0E-04    22          6.7871E-07        32                6.0E-04    22          6.7871E-07
30                4.0E-04    23          6.6286E-07        30                4.0E-04    23          6.6286E-07
38                1.0E-03    22          3.7978E-07        38                1.0E-03    22          3.7978E-07
34                6.0E-04    22          5.0248E-07        34                6.0E-04    22          5.0248E-07
36                8.0E-04    22          4.4528E-07        36                8.0E-04    22          4.4528E-07

LDOOR

Xeon                                                       Ultra
Level of fill-in  Drop tol.  Iterations  Residual error    Level of fill-in  Drop tol.  Iterations  Residual error
50                1.0E-02    3           4.5742E-02        50                1.0E-02    3           4.5742E-02
50                1.0E-03    3           4.5558E-02        50                1.0E-03    3           4.5558E-02
50                1.0E-04    3           4.5558E-02        50                1.0E-04    3           4.5558E-02
50                1.0E-07    3           4.5558E-02        50                1.0E-07    3           4.5558E-02
50                1.0E-06    3           4.5558E-02        50                1.0E-06    3           4.5558E-02
50                1.0E-10    3           4.5558E-02        50                1.0E-10    3           4.5558E-02
50                1.0E-05    3           4.5558E-02        50                1.0E-05    3           4.5558E-02
50                1.0E-01    4           3.7216E-03        50                1.0E-01    4           3.7216E-03
40                1.0E-01    4           1.0742E-02        50                5.0E-02    4           5.0180E-04
50                5.0E-02    4           5.0180E-04        50                2.5E-02    4           2.4878E-04
50                2.5E-02    4           2.4878E-04        40                1.0E-01    4           1.0742E-02
50                2.5E-01    5           3.8002E-03        40                5.0E-02    4           6.2204E-03
40                5.0E-02    4           6.2204E-03        40                2.5E-02    4           5.7514E-03

Plans and future work
- Explore more sophisticated heuristics for our algorithmic approach
- Increase the percentage of suggested values that are appropriate for reducing the overall execution time
- Extend our study to multilevel preconditioners based on ILU factorization

Acknowledgement
This project is supported by a grant from the NSF Advanced Computational Research Division, award No. CCF-0342555, and the Institute for Complex Scientific Software at Northeastern University.

Matrices

Name      Non-zero elements  Rows       NS    B       NS/B
Raefsky3  1,488,768          21,200     48%   0.0596  8.05
Ldoor     42,493,817         952,203    100%  0.7215  1.39
Cage14    27,130,349         1,505,785  21%   0.4490  0.47
Torso3    4,429,042          259,156    0%    0.8181  0

NS: numerical symmetry. B: matrix's bandwidth.
The NS/B ratio decreases in the order Raefsky3, Ldoor, Cage14, Torso3.
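The NS/B column of the Matrices table can be checked with a few lines of Python. This is a minimal sketch, assuming that NS enters the ratio as a fraction (48% becomes 0.48) and that B is the value listed in the table; the poster does not state how NS or B are computed, and the names `matrices` and `ns_over_b` are illustrative only.

```python
# Values copied from the Matrices table; NS is listed there as a percentage.
matrices = {
    "Raefsky3": {"ns_percent": 48.0, "b": 0.0596},
    "Ldoor": {"ns_percent": 100.0, "b": 0.7215},
    "Cage14": {"ns_percent": 21.0, "b": 0.4490},
    "Torso3": {"ns_percent": 0.0, "b": 0.8181},
}


def ns_over_b(ns_percent, b):
    """Return NS/B, with NS converted from a percentage to a fraction (assumption)."""
    return (ns_percent / 100.0) / b


# Ratio per matrix, then the matrix names ordered from largest to smallest NS/B.
ratios = {name: ns_over_b(v["ns_percent"], v["b"]) for name, v in matrices.items()}
ranking = sorted(ratios, key=ratios.get, reverse=True)

for name in ranking:
    print(f"{name:9s} NS/B = {ratios[name]:.2f}")
```

Under that assumption the computed ratios round to the tabulated values, and sorting them gives the order Raefsky3, Ldoor, Cage14, Torso3, the direction in which the poster says NS/B decreases.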