Characterizing the Relationship between ILU-type Preconditioners and the Storage Hierarchy


Diego Rivera (1), David Kaeli (1) and Misha Kilmer (2)
(1) Department of Electrical and Computer Engineering, Northeastern University, Boston, MA {drivera, ...}
(2) Department of Mathematics, Tufts University, Medford, MA
ICSS (Institute for Complex Scientific Software)

Objective
- To improve the performance of preconditioners targeting sparse matrices
- To accelerate the memory accesses associated with these codes

Motivation
Prior work has targeted Krylov subspace methods. However, little has been done for preconditioners.

"Nothing will be more central to computational science in the next century than the art of transforming a problem that appears intractable into another whose solution can be approximated rapidly. For Krylov subspace matrix iterations, this is preconditioning." (Trefethen and Bau, Numerical Linear Algebra, 1997)

Incomplete LU (ILU) factorization-type preconditioners are used to accelerate the convergence of Krylov subspace methods. A drawback of these approaches is that it is difficult to choose the best values for their tuning parameters. Good values depend heavily on the structure of the nonzero elements of the coefficient matrix. In our work we have found that they also depend on the memory hierarchy of the machine used to carry out the computation. The parameter values that give the fastest execution time for an acceptable final error may therefore differ from one memory hierarchy to another.

Common target applications: weather simulations, turbulence problems in airplanes, DNA models.

[Figure: the linear system $Ax = b$, where $A$ is $m \times m$ and $x$, $b$ have length $m$, is transformed by the preconditioner $M$ into $M^{-1}Ax = M^{-1}b$, which an iterative method then solves.]

Target preconditioners
Preconditioner   Parameters
ILU(·)           level-of-fill
ILUT             level-of-fill, drop tolerance
ILUTP            level-of-fill, drop tolerance, tolerance ratio (permtol)
ILUD             drop tolerance, diagonal compensation parameter
ILUDP            drop tolerance, diagonal compensation parameter, tolerance ratio (permtol)

Evaluation environment
           Intel Xeon 3.06 GHz    UltraSPARC-III 750 MHz
Level 1    8 KB 4-way (data)      64 KB 4-way (data)
Level 2    512 KB 8-way           8 MB 2-way
Level 3    1 MB 8-way             N/A
RAM        2 GB                   1 GB

Matrices
Name       Non-zero elements   Rows        NS
Raefsky3   1,488,768           21,200      48%
Ldoor      42,493,817          952,203     100%
Cage14     27,130,349          1,505,785   21%
Torso3     4,429,042           259,156     0%

NS: numerical symmetry. B: matrix bandwidth.
[Figure: Raefsky3, Ldoor, Cage14 and Torso3, arranged so that the ratio NS/B decreases in the indicated direction.]

Results
[Figure: error norm vs. the first 13 duples (level of fill-in, drop tolerance), sorted in increasing order of execution time, for ILUT with GMRES.]
[Figure: correlation of load accesses and execution time. Panels: DTLB, DL1 and L2 for the UltraSPARC-III; DTLB, DL1, L2 and L3 for the Intel Xeon.]
[Tables for TORSO, CAGE14, RAEFSKY3 and LDOOR: level of fill-in, drop tolerance, iterations and residual error on the Xeon and on the UltraSPARC-III. The tables distinguish cases where the same duple (level of fill-in, drop tolerance) was chosen on both machines from cases where different duples were chosen.]

The performance difference between memory hierarchies becomes significant when the problem's conditions make it more difficult to solve. These conditions are related to the dropping strategy adopted in the preconditioner algorithm. We use the Pin instrumentation tool to capture cache events. Our results show a high correlation between execution time, memory accesses and cache misses.

Our algorithmic approach is designed to:
1) Extract the problem's conditions related to the dropping strategies adopted in the preconditioner.
2) Detect whether the computation of a solution depends on the relationship between the preconditioner's parameters and the memory hierarchy of the machine used.
3) Suggest values for the preconditioner's parameters that can reduce the time required to compute the preconditioner and the solution for matrices with similar characteristics.

Our experimental results show that in 78.4% of the cases, the suggested values of the preconditioner's parameters were appropriate for reducing the overall execution time.

Plans and future work
- Explore more sophisticated heuristics for our algorithmic approach
- Increase the percentage of suggested values that reduce the overall execution time
- Extend our study to multilevel preconditioners based on ILU factorization

Acknowledgement
This project is supported by a grant from the NSF Advanced Computational Research Division, award No. CCF, and by the Institute for Complex Scientific Software at Northeastern University.
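ILUT, one of the target preconditioners, is controlled by two parameters: a level-of-fill and a drop tolerance. The sketch below shows how they interact. It is a minimal pure-Python version of the dual-threshold dropping rule that Saad describes for ILUT, and it is not the authors' implementation. It makes the following assumptions:
- Rows are stored as dictionaries {column: value}.
- The names `ilut`, `lu_product`, `p` and `tau` are hypothetical. `p` is the maximum number of entries kept per row in each of L and U, and `tau` is the drop tolerance relative to the row's 2-norm.
- There is no pivoting, so the sketch corresponds to ILUT rather than ILUTP.

```python
"""Minimal ILUT(p, tau) sketch: dual-threshold dropping, no pivoting.

Hypothetical helper names; rows are dicts {column: value}. Illustration only.
"""
from math import sqrt


def ilut(rows, p, tau):
    """Incomplete LU with level-of-fill p and relative drop tolerance tau.

    Returns (L, U) as lists of row dicts. L holds the strictly lower
    multipliers (its unit diagonal is implicit); U holds the diagonal
    and the strictly upper entries.
    """
    n = len(rows)
    L = [dict() for _ in range(n)]
    U = [dict() for _ in range(n)]
    for i in range(n):
        w = dict(rows[i])  # working copy of row i
        # Relative threshold: tau times the 2-norm of the original row.
        tau_i = tau * sqrt(sum(v * v for v in rows[i].values()))
        done = set()
        while True:
            # Next unprocessed lower entry in increasing column order;
            # fill created by earlier updates is picked up here too.
            pending = [c for c in w if c < i and c not in done]
            if not pending:
                break
            k = min(pending)
            done.add(k)
            w[k] /= U[k][k]  # multiplier l_ik
            if abs(w[k]) < tau_i:  # first dropping rule
                del w[k]
                continue
            for j, ukj in U[k].items():
                if j != k:  # update the remaining entries of row i
                    w[j] = w.get(j, 0.0) - w[k] * ukj
        # Second dropping rule: discard small entries, then keep at most
        # the p largest in magnitude in each of the L part and the U part.
        lower = [(j, v) for j, v in w.items() if j < i and abs(v) >= tau_i]
        upper = [(j, v) for j, v in w.items() if j > i and abs(v) >= tau_i]
        L[i] = dict(sorted(lower, key=lambda t: -abs(t[1]))[:p])
        U[i] = dict(sorted(upper, key=lambda t: -abs(t[1]))[:p])
        U[i][i] = w.get(i, 0.0)  # the diagonal is always kept
        if U[i][i] == 0.0:
            raise ZeroDivisionError(f"zero pivot in row {i}")
    return L, U


def lu_product(L, U):
    """Dense product (I + L) U, used to check the factors."""
    n = len(U)
    M = [[0.0] * n for _ in range(n)]
    for i in range(n):
        coeffs = dict(L[i])
        coeffs[i] = 1.0  # implicit unit diagonal of L
        for k, lik in coeffs.items():
            for j, ukj in U[k].items():
                M[i][j] += lik * ukj
    return M


# Small 4x4 example: eliminating column 0 creates fill-in.
A = [[4.0, -1.0, 0.0, -1.0],
     [-1.0, 4.0, -1.0, 0.0],
     [0.0, -1.0, 4.0, -1.0],
     [-1.0, 0.0, -1.0, 4.0]]
rows = [{j: v for j, v in enumerate(r) if v != 0.0} for r in A]

L_full, U_full = ilut(rows, p=4, tau=0.0)  # no dropping: complete LU
L_drop, U_drop = ilut(rows, p=4, tau=0.1)  # small multipliers are dropped
print(sum(len(r) for r in L_full), sum(len(r) for r in L_drop))  # prints: 5 0
```

With `tau=0` and a large enough `p`, nothing is dropped and the factors reproduce $A$ exactly. Raising `tau` or lowering `p` shrinks L and U, so the preconditioner occupies less memory and each application touches less data, usually at the cost of more iterations. The sketch only illustrates the dropping logic. It does not model a cache-friendly storage format such as compressed sparse row.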