Sparse Matrix Algorithms
CS 524 – High-Performance Computing

Sparse Matrices

- Sparse matrices have the majority of their elements equal to zero.
- More significantly, a matrix is considered sparse if a computation involving it can exploit the number and location of its nonzero elements to reduce the run time over the same computation on a dense matrix of the same size.
- The finite difference and finite element methods for solving partial differential equations, which arise from mathematical models of continuous domains, require sparse matrix algorithms.
- For many problems it is not necessary to assemble the sparse matrix explicitly; instead, operations are performed using specialized data structures.

Storage Schemes for Sparse Matrices

- Store only the nonzero elements and their locations in the matrix.
  - Saves memory: storage used can be much less than n².
  - May improve performance.
  - Several storage schemes and data structures are available; some suit a given algorithm better than others.
- Common storage schemes:
  - Coordinate format
  - Compressed sparse row (CSR) format
  - Diagonal storage format
  - Ellpack-Itpack format
  - Jagged-diagonal format

Coordinate Format

- VAL is a q × 1 array of the nonzero elements (in any order), where q is the number of nonzeros.
- I is a q × 1 array of the row index of each element.
- J is a q × 1 array of the column index of each element.
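As a concrete illustration (mine, not from the slides), the sketch below stores a small hypothetical 4 × 4 matrix in coordinate format and multiplies it by a vector. The array names VAL, I, J follow the slide's notation; the 0-based indexing is an assumption.

#include <stdio.h>

/* Hypothetical example: the 4x4 matrix
 *   [ 5 0 0 1 ]
 *   [ 0 2 0 0 ]
 *   [ 0 0 3 0 ]
 *   [ 4 0 0 6 ]
 * has q = 6 nonzeros, which may be stored in any order. */
#define Q 6
#define N 4

static const double VAL[Q] = {5, 1, 2, 3, 4, 6};
static const int    I[Q]   = {0, 0, 1, 2, 3, 3};  /* row index of each nonzero    */
static const int    J[Q]   = {0, 3, 1, 2, 0, 3};  /* column index of each nonzero */

int main(void) {
    const double x[N] = {1, 1, 1, 1};
    double y[N] = {0};

    /* y = A*x: one pass over the nonzeros; correctness does not
       depend on the order in which they are stored */
    for (int k = 0; k < Q; k++)
        y[I[k]] += VAL[k] * x[J[k]];

    for (int i = 0; i < N; i++)
        printf("y[%d] = %g\n", i, y[i]);
    return 0;
}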

Compressed Sparse Row (CSR) Format

- VAL is a q × 1 array of the nonzero elements, stored in the order of their rows (within a row they may be stored in any order).
- J is a q × 1 array of the column index of each nonzero element.
- I is an n × 1 array that points to the first entry of the ith row in VAL and J.
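A minimal CSR matrix-vector multiply, again illustrative rather than from the slides. It stores the same hypothetical 4 × 4 matrix as the coordinate example; I is given n + 1 entries with the common sentinel convention I[n] = q, so row i occupies positions I[i] … I[i+1]−1.

#include <stdio.h>

#define N 4
#define Q 6

/* Same hypothetical 4x4 matrix as before, now in CSR:
 * nonzeros stored row by row, plus a row-pointer array. */
static const double VAL[Q]   = {5, 1, 2, 3, 4, 6};
static const int    J[Q]     = {0, 3, 1, 2, 0, 3};  /* column of each nonzero */
static const int    I[N + 1] = {0, 2, 3, 4, 6};     /* row i is VAL[I[i]..I[i+1]-1] */

int main(void) {
    const double x[N] = {1, 1, 1, 1};
    double y[N];

    /* y = A*x: each row is a short dot product over its nonzeros */
    for (int i = 0; i < N; i++) {
        double sum = 0.0;
        for (int k = I[i]; k < I[i + 1]; k++)
            sum += VAL[k] * x[J[k]];
        y[i] = sum;
    }

    for (int i = 0; i < N; i++)
        printf("y[%d] = %g\n", i, y[i]);
    return 0;
}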

Diagonal Storage Format

- VAL is an n × d array of the nonzero elements stored by diagonal, where d is the number of nonzero diagonals (the order of the diagonals in VAL is not important).
- OFFSET is a d × 1 array giving each diagonal's offset from the principal diagonal.
- Banded format: uses VAL plus parameters for the thickness of the band and its lower (or upper) limit.
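A sketch of the matrix-vector multiply in diagonal storage, under the assumption (not spelled out in the slides) that VAL[i][k] holds A(i, i + OFFSET[k]) and that positions whose column falls outside the matrix are zero-padded:

/* Illustrative DIA-format y = A*x for an n x n matrix with d stored
 * diagonals. VAL[i][k] = A(i, i + OFFSET[k]); out-of-range columns
 * are assumed zero-padded and are skipped here. */
void dia_matvec(int n, int d, const double VAL[n][d],
                const int OFFSET[d], const double x[n], double y[n]) {
    for (int i = 0; i < n; i++) {
        double sum = 0.0;
        for (int k = 0; k < d; k++) {
            int j = i + OFFSET[k];        /* column index on this diagonal */
            if (j >= 0 && j < n)
                sum += VAL[i][k] * x[j];
        }
        y[i] = sum;
    }
}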

Ellpack-Itpack Format

- VAL is an n × m array of the nonzero elements stored by row, where m is the maximum number of nonzero elements in any row.
- J is an n × m array of the column index of the corresponding entry in VAL. An indicator value is used to mark the end of a row.
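An illustrative Ellpack-Itpack matrix-vector multiply. The slides do not fix a particular end-of-row indicator; a column index of -1 is assumed here.

/* Illustrative ELL-format y = A*x. Each of the n rows stores up to
 * m nonzeros; unused slots in J are assumed to hold the sentinel -1. */
void ell_matvec(int n, int m, const double VAL[n][m],
                const int J[n][m], const double x[n], double y[n]) {
    for (int i = 0; i < n; i++) {
        double sum = 0.0;
        for (int k = 0; k < m && J[i][k] != -1; k++)
            sum += VAL[i][k] * x[J[i][k]];
        y[i] = sum;
    }
}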

Jagged-Diagonal Format (1)

Jagged-Diagonal Format (2)

- Rows of the matrix are ordered by decreasing number of nonzero elements.
- VAL is a q × 1 array of the nonzero elements of each row, in increasing column order. That is, the first nonzero element of every row is stored contiguously, then the second, and so on.
- J is a q × 1 array of the column index of the corresponding entry in VAL.
- I is an m × 1 array of pointers to the beginning of each jagged diagonal, where m is the maximum number of nonzeros in any row (the number of jagged diagonals).
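An illustrative matrix-vector multiply in jagged-diagonal storage. Beyond the arrays the slide names, two assumptions are made: PERM records the row reordering (PERM[i] is the original index of permuted row i), and I carries one extra sentinel entry I[m] marking the end of the last jagged diagonal.

/* Illustrative JDS y = A*x, after rows have been permuted so nonzero
 * counts are non-increasing. Jagged diagonal k holds the k-th nonzero
 * of every row that has one; it starts at VAL[I[k]] and has
 * I[k+1] - I[k] entries, one per leading row. */
void jds_matvec(int n, int m, const double VAL[], const int J[],
                const int I[], const int PERM[],
                const double x[], double y[]) {
    for (int i = 0; i < n; i++)
        y[PERM[i]] = 0.0;
    for (int k = 0; k < m; k++) {
        int len = I[k + 1] - I[k];       /* rows that have a k-th nonzero */
        for (int i = 0; i < len; i++)
            y[PERM[i]] += VAL[I[k] + i] * x[J[I[k] + i]];
    }
}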

Vector Inner Product – p <= n

- y = x^T x, where x is an n × 1 vector and y is a scalar
- Data partitioning: each processor has n/p elements of x
- Communication: global reduction (sum) of the scalar y
- Computation: n/p multiplications + additions
- Parallel run time: T_p = 2 t_c n/p + (t_s + t_w)
- The algorithm is cost-optimal, as the sequential and parallel costs are both O(n) when p = O(n)
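A minimal MPI sketch of this algorithm (my illustration, not from the slides): each rank computes a partial dot product over its n/p elements, and a single MPI_Allreduce performs the global reduction. The problem size and test data are assumptions.

#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int rank, p;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &p);

    const int n = 1 << 20;            /* assumed size, divisible by p */
    int local_n = n / p;              /* each rank owns n/p elements of x */
    double *x = malloc(local_n * sizeof *x);
    for (int i = 0; i < local_n; i++)
        x[i] = 1.0;                   /* any test data; here x = (1,...,1) */

    /* local part: n/p multiplications + additions */
    double local = 0.0, y = 0.0;
    for (int i = 0; i < local_n; i++)
        local += x[i] * x[i];

    /* global reduction sum of the scalar */
    MPI_Allreduce(&local, &y, 1, MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD);

    if (rank == 0)
        printf("x^T x = %g (expected %d)\n", y, n);
    free(x);
    MPI_Finalize();
    return 0;
}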

Sparse Matrix-Vector Multiplication

- y = Ax, where A is a sparse n × n matrix and x and y are vectors of dimension n
- Sparse MVM algorithms depend on the structure of the sparse matrix and the storage scheme used
- In many MVMs arising in science and engineering (e.g., from FDM solution of PDEs) the matrix has a block-tridiagonal structure
- Other common sparse matrix structures include banded sparse matrices and general unstructured matrices

Block-Tridiagonal Matrix

Striped Partitioning – p <= n (1)

- Data partitioning: each processor has n/p rows of A and n/p elements of x
- Three components of the multiplication:
  - Multiplication by the principal diagonal. This requires no communication.
  - Multiplication by the adjacent off-diagonals. These require exchange of border elements of vector x.
  - Multiplication by the outer block diagonals. These require communication of vector x that depends on n and p.

Striped Partitioning – p <= n (2)

Striped Partitioning – p <= n (3)

- Communication (see the sketch below)
  - Exchange of border elements of vector x
  - When p ≤ √n, exchange of √n elements of vector x between neighboring processors
  - When p > √n, processor P_i exchanges n/p elements of vector x with the processors with index i ± p/√n
- Computation: except for the first and last √n rows, each row has 5 nonzero elements. Thus, at most 5 multiplications + additions are needed per row.
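A sketch of the border exchange for the p ≤ √n case (illustrative; the buffer layout and function name are assumptions): each rank swaps its first and last b = √n owned elements with its two neighbors using MPI_Sendrecv, filling ghost regions at either end of its local slice of x.

#include <mpi.h>

/* xloc holds this rank's nloc = n/p elements plus ghost space:
 * xloc[0..b-1] receives from the rank above, xloc[b..b+nloc-1] is
 * owned, xloc[b+nloc..b+nloc+b-1] receives from the rank below. */
void exchange_borders(double *xloc, int nloc, int b, int rank, int p) {
    int up   = (rank > 0)     ? rank - 1 : MPI_PROC_NULL;
    int down = (rank < p - 1) ? rank + 1 : MPI_PROC_NULL;

    /* send my top b owned elements up; receive ghosts from below */
    MPI_Sendrecv(&xloc[b],        b, MPI_DOUBLE, up,   0,
                 &xloc[b + nloc], b, MPI_DOUBLE, down, 0,
                 MPI_COMM_WORLD, MPI_STATUS_IGNORE);
    /* send my bottom b owned elements down; receive ghosts from above */
    MPI_Sendrecv(&xloc[nloc],     b, MPI_DOUBLE, down, 1,
                 &xloc[0],        b, MPI_DOUBLE, up,   1,
                 MPI_COMM_WORLD, MPI_STATUS_IGNORE);
}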

Striped Partitioning – p <= n (4)

- Parallel run time when p ≤ √n:
  T_p = 10 t_c n/p + 2[t_s + t_w √n]
- Parallel run time when p > √n:
  T_p = 10 t_c n/p + 2[t_s + t_w] + 2[t_s + t_w n/p]

Partitioning the Grid (1)

Partitioning the Grid (2)

- Data partitioning: each processor has a √(n/p) × √(n/p) block of the grid, i.e., the rows of matrix A and the elements of vector x corresponding to those grid points
- Communication: exchange of the vector elements corresponding to the √(n/p) grid points along each border with the neighboring processors
- Parallel run time: T_p = 10 t_c n/p + 4[t_s + t_w √(n/p)]
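A quick worked comparison under these formulas (the numbers are mine, not from the slides): take n = 10⁶ grid points and p = 100 processors, so √n = 1000 and √(n/p) = 100. Striped partitioning (here p ≤ √n) spends 2[t_s + 1000 t_w] on communication per multiplication, while grid partitioning spends 4[t_s + 100 t_w]: twice as many messages, but a tenth of the data volume per message. This is why the 2-D partitioning wins whenever the t_w term dominates.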

Iterative Methods for Sparse Linear Systems

- Popular methods for solving sparse linear systems Ax = b
  - Generate a sequence of approximations to the vector x that converges to the solution of the system
- Characteristics
  - The number of iterations required depends on the problem data
  - Each iteration requires a matrix-vector multiplication
  - Generally faster than direct methods
  - Appropriate for sparse coefficient matrices
- Methods
  - Jacobi iteration
  - Gauss-Seidel and SOR
  - Conjugate gradient (CG) and preconditioned CG

Jacobi Iterative Method

- Simple iterative procedure that is guaranteed to converge for diagonally dominant systems:

  x_k(i) = [b(i) – Σ_{j≠i} A(i,j) x_{k-1}(j)] / A(i,i)

  or, equivalently,

  x_k(i) = r_{k-1}(i)/A(i,i) + x_{k-1}(i)

- r_{k-1} is the residual after iteration k-1:

  r_{k-1}(i) = b(i) – Σ_j A(i,j) x_{k-1}(j)

Finite Difference Discretization

[Figure: an (N+2) × (N+2) finite difference grid with indices i, j = 0, 1, …, N+1; the unknowns at the N × N interior points are numbered in linearized order.]

- Only internal grid points are unknown

Jacobi Iterative Method – Sequential

for (iter = 0; iter < maxiter; iter++) {
    /* Jacobi sweep: update every interior point from the old iterate */
    for (i = 1; i <= N; i++) {
        for (j = 1; j <= N; j++) {
            xnew[i][j] = b[i][j] - odiag*(xold[i-1][j] + xold[i+1][j]
                                        + xold[i][j-1] + xold[i][j+1]);
            xnew[i][j] = xnew[i][j]/diag;
        }
    }
    /* Residual r = b - A*xnew, available for a convergence test */
    for (i = 1; i <= N; i++) {
        for (j = 1; j <= N; j++)
            resid[i][j] = b[i][j] - diag*xnew[i][j]
                        - odiag*(xnew[i-1][j] + xnew[i+1][j]
                               + xnew[i][j-1] + xnew[i][j+1]);
    }
    /* Copy xnew into xold for the next iteration */
    for (i = 1; i <= N; i++) {
        for (j = 1; j <= N; j++)
            xold[i][j] = xnew[i][j];
    }
}

Conjugate Gradient Method

- Powerful iterative method for systems in which A is symmetric positive definite, i.e., x^T A x > 0 for any nonzero vector x
- Formulates the problem as a minimization problem
  - The solution of the system is found by minimizing q(x) = (1/2) x^T A x – x^T b
- Iteration
  - x_k = x_{k-1} + α_k p_k
  - r_k = r_{k-1} – α_k A p_k
  - p_1 = r_0 = b (with initial guess x_0 = 0);  p_{k+1} = r_k + (||r_k||² / ||r_{k-1}||²) p_k
  - α_k = ||r_{k-1}||² / (p_k^T A p_k)

CG Method – Sequential (1)

for (iter = 1; iter <= maxiter; iter++) {
    // Form v = A*p using the 5-point stencil
    for (i = 1; i <= N; i++) {
        for (j = 1; j <= N; j++)
            v[i][j] = odiag*(p[i-1][j] + p[i+1][j] + p[i][j-1] + p[i][j+1])
                    + diag*p[i][j];
    }
    // Form (r dot r), (p dot v), and the step length a = alpha_k
    rdotr = 0.0;
    pdotv = 0.0;
    for (i = 1; i <= N; i++) {
        for (j = 1; j <= N; j++) {
            rdotr = rdotr + r[i][j]*r[i][j];
            pdotv = pdotv + p[i][j]*v[i][j];
        }
    }
    a = rdotr/pdotv;

CG Method – Sequential (2)

    // Update x and compute the new residual rnew
    for (i = 1; i <= N; i++) {
        for (j = 1; j <= N; j++) {
            x[i][j] = x[i][j] + a*p[i][j];
            rnew[i][j] = r[i][j] - a*v[i][j];
        }
    }
    // Compute the new residual norm and g = beta_k
    rndotrn = 0.0;
    for (i = 1; i <= N; i++) {
        for (j = 1; j <= N; j++)
            rndotrn = rndotrn + rnew[i][j]*rnew[i][j];
    }
    rhonew = sqrt(rndotrn);   // residual norm, available for a convergence test
    g = rndotrn/rdotr;
    // Compute the new search direction p, and move rnew into r
    for (i = 1; i <= N; i++) {
        for (j = 1; j <= N; j++) {
            p[i][j] = rnew[i][j] + g*p[i][j];
            r[i][j] = rnew[i][j];
        }
    }
}