Direct and Iterative Methods for Sparse Linear Systems


Direct and Iterative Methods for Sparse Linear Systems. Shirley Moore, svmoore@utep.edu, CPS5401, Fall 2015, svmoore.pbworks.com, December 1, 2015

Learning Objectives
- Describe advantages and disadvantages of direct and iterative methods for solving sparse linear systems.
- Apply an appropriate method from a solver library to solve a particular sparse linear system, including both symmetric positive definite and nonsymmetric matrices.
- Find and make use of documentation on sparse solver libraries.

Direct vs. Iterative Methods
In a direct method, the matrix of the linear system is transformed or factorized into a simpler form that can be solved easily. The exact solution is obtained in a finite number of arithmetic operations, ignoring numerical rounding errors. Iterative methods compute a sequence of approximate solutions that converges to the exact solution in the limit; in practice, the iteration is stopped once a desired accuracy is reached.

Direct vs. Iterative Methods (cont.)
Direct methods have long been preferred to iterative methods for solving linear systems, mainly because of their simplicity and robustness. However, the emergence of conjugate gradient methods and Krylov subspace iterations has provided an efficient alternative to direct solvers. Nowadays, iterative methods are almost mandatory in complex applications, mainly because memory and computational requirements prohibit the use of direct methods. Iterative methods usually rely on a matrix-vector multiplication, which is cheap to compute on modern computer architectures. When the matrix A is very large and sparse, the LU factorization typically contains many more nonzero coefficients than A itself (fill-in). Nonetheless, some applications produce very ill-conditioned matrices that may require a direct method to solve the problem at hand.

Direct Solvers for Sparse Linear Systems
Direct solvers for sparse matrices involve much more complicated algorithms than for dense matrices. The main complication is the need to handle fill-in in the factors L and U efficiently. A typical sparse solver consists of four distinct steps, as opposed to two in the dense case (see the sketch below):
1. An ordering step that reorders the rows and columns so that the factors suffer little fill, or so that the matrix has special structure such as block triangular form.
2. An analysis step (symbolic factorization) that determines the nonzero structures of the factors and creates suitable data structures for them.
3. A numerical factorization step that computes the L and U factors.
4. A solve step that performs forward and back substitution using the factors.
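The same four phases are carried out internally when a sparse direct solver is called from code. Below is a minimal sketch, assuming SciPy's SuperLU wrapper `scipy.sparse.linalg.splu` and an illustrative 1-D Poisson test matrix (both are assumptions, not part of the original slides):

```python
# Minimal sketch: sparse LU factorization and solve via SciPy's SuperLU wrapper.
import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla

n = 1000
# Illustrative sparse matrix: 1-D Poisson (tridiagonal), CSC format for splu.
A = sp.diags([-1.0, 2.0, -1.0], [-1, 0, 1], shape=(n, n), format="csc")
b = np.ones(n)

lu = spla.splu(A)   # ordering, symbolic analysis, and numeric factorization
x = lu.solve(b)     # forward and back substitution with the stored factors

print(np.linalg.norm(A @ x - b))   # residual should be near machine precision
```

Factoring once and reusing `lu.solve` for many right-hand sides is the main practical payoff of separating the factorization and solve steps.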

Direct Solver Packages
SuperLU: http://crd-legacy.lbl.gov/~xiaoye/SuperLU/
- SuperLU for sequential machines
- SuperLU_MT for shared-memory parallel machines
- SuperLU_DIST for distributed-memory parallel machines
See the survey of direct solvers by Xiaoye Li: http://crd-legacy.lbl.gov/~xiaoye/SuperLU/SparseDirectSurvey.pdf
See also research by Tim Davis: http://www.cise.ufl.edu/~davis/welcome.html

Iterative Methods
Iterative methods use successive approximations to obtain more accurate solutions to a linear system, and are suitable for large sparse linear systems.
- Stationary methods are older and simple to understand and implement, but not as effective. They perform the same operations at each iteration.
- Nonstationary methods are more recent and harder to understand, but highly effective. Their coefficients are iteration-dependent, and they typically use a transformation matrix called a preconditioner that improves convergence of the method.
Reference: the Templates book, http://www.netlib.org/linalg/html_templates/report.html

Stationary Methods
- Jacobi: based on solving for every variable locally with respect to the other variables; one iteration of the method corresponds to solving for every variable once. The resulting method is easy to understand and implement, but convergence is slow.
- Gauss-Seidel: like the Jacobi method, except that it uses updated values as soon as they are available. In general, if the Jacobi method converges, the Gauss-Seidel method converges faster, though still relatively slowly.
- Successive Overrelaxation (SOR): can be derived from the Gauss-Seidel method by introducing an extrapolation parameter ω. For the optimal choice of ω, SOR may converge faster than Gauss-Seidel by an order of magnitude.
- Symmetric Successive Overrelaxation (SSOR): no advantage over SOR as a stand-alone iterative method, but useful as a preconditioner for nonstationary methods.
A Jacobi sketch follows below.
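As a concrete illustration of the stationary idea, here is a minimal sketch of the Jacobi iteration (an illustrative implementation, not a library routine), based on the splitting A = D + R with D the diagonal of A; the tridiagonal test matrix is an assumption:

```python
# Minimal sketch of the Jacobi iteration: x_{k+1} = D^{-1} (b - R x_k), R = A - D.
import numpy as np
import scipy.sparse as sp

def jacobi(A, b, x0=None, tol=1e-8, maxiter=20000):
    x = np.zeros_like(b) if x0 is None else x0.copy()
    D = A.diagonal()              # diagonal of A
    R = A - sp.diags(D)           # off-diagonal part
    for k in range(maxiter):
        x = (b - R @ x) / D       # update every variable using the old values
        if np.linalg.norm(b - A @ x) < tol * np.linalg.norm(b):
            return x, k + 1
    return x, maxiter

n = 50
A = sp.diags([-1.0, 2.0, -1.0], [-1, 0, 1], shape=(n, n), format="csr")
b = np.ones(n)
x, iters = jacobi(A, b)
print(iters)   # thousands of iterations even for this small model problem
```

Gauss-Seidel differs only in using each updated entry immediately; SOR adds the extrapolation parameter ω on top of that.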

Nonstationary Methods
- Conjugate Gradient (CG): derives its name from the fact that it generates a sequence of conjugate (or orthogonal) vectors. These vectors are the residuals of the iterates; they are also the gradients of a quadratic functional whose minimization is equivalent to solving the linear system. CG is an extremely effective method when the coefficient matrix is symmetric positive definite, since storage for only a limited number of vectors is required.
- Minimum Residual (MINRES) and Symmetric LQ (SYMMLQ): computational alternatives to CG for coefficient matrices that are symmetric but possibly indefinite. SYMMLQ generates the same solution iterates as CG if the coefficient matrix is symmetric positive definite.
- Conjugate Gradient on the Normal Equations (CGNE and CGNR): based on applying the CG method to one of the two forms of the normal equations. When the coefficient matrix is nonsymmetric and nonsingular, the normal-equations matrices are symmetric positive definite, and hence CG can be applied. Convergence may be slow, since the spectrum of the normal-equations matrices is less favorable.
A calling sketch for CG and MINRES follows below.
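A minimal calling sketch, assuming SciPy's `cg` and `minres` routines; the SPD test matrix and the diagonal shift used to make it indefinite are illustrative assumptions:

```python
# Minimal sketch: CG for an SPD system, MINRES for a symmetric indefinite one.
import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla

n = 500
A = sp.diags([-1.0, 2.0, -1.0], [-1, 0, 1], shape=(n, n), format="csr")  # SPD
b = np.ones(n)

x_cg, info = spla.cg(A, b, maxiter=5000)       # info == 0 means converged
print(info, np.linalg.norm(A @ x_cg - b))

A_ind = (A - 2.5 * sp.eye(n)).tocsr()          # shifted matrix is symmetric indefinite
x_mr, info = spla.minres(A_ind, b, maxiter=20000)  # indefinite systems converge more slowly
print(info, np.linalg.norm(A_ind @ x_mr - b))
```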

Nonstationary Methods (cont.)
- Generalized Minimal Residual (GMRES): computes a sequence of orthogonal vectors (like MINRES) and combines them through a least-squares solve and update. However, unlike MINRES (and CG), it requires storing the whole sequence, so a large amount of storage is needed. For this reason, restarted versions are used, in which computation and storage costs are limited by specifying a fixed number of vectors to be generated. GMRES is useful for general nonsymmetric matrices (see the restarted-GMRES sketch below).
- BiConjugate Gradient (BiCG): generates two CG-like sequences of vectors, one based on a system with the original coefficient matrix A, and one based on A^T. Instead of orthogonalizing each sequence, they are made mutually orthogonal, or "bi-orthogonal". This method, like CG, uses limited storage. It is useful when the matrix is nonsymmetric and nonsingular; however, convergence may be irregular, and there is a possibility that the method will break down. BiCG requires a multiplication with the coefficient matrix and with its transpose at each iteration.
- Quasi-Minimal Residual (QMR): applies a least-squares solve and update to the BiCG residuals, thereby smoothing out the irregular convergence behavior of BiCG, which may lead to more reliable approximations. In its full form, it has a look-ahead strategy built in that avoids the BiCG breakdown; even without look-ahead, QMR largely avoids the breakdowns that can occur in BiCG. On the other hand, it does not effect a true minimization of either the error or the residual, and while it converges smoothly, it often does not improve on BiCG in terms of the number of iteration steps.
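A minimal sketch of restarted GMRES, assuming SciPy's `gmres`; the nonsymmetric tridiagonal test matrix is an illustrative assumption:

```python
# Minimal sketch: restarted GMRES on a nonsymmetric system; `restart` bounds
# how many Krylov basis vectors are stored before the method restarts.
import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla

n = 500
A = sp.diags([-1.3, 2.0, -0.7], [-1, 0, 1], shape=(n, n), format="csr")  # nonsymmetric
b = np.ones(n)

x, info = spla.gmres(A, b, restart=30, maxiter=1000)   # info == 0 on convergence
print(info, np.linalg.norm(A @ x - b))
```

Choosing the restart length trades storage and orthogonalization cost against the risk of slower (or stalled) convergence.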

Nonstationary Methods (cont.)
- Conjugate Gradient Squared (CGS): a variant of BiCG that applies the updating operations for the A-sequence and the A^T-sequence both to the same vectors. Ideally this would double the convergence rate, but in practice convergence may be much more irregular than for BiCG, which may sometimes lead to unreliable results. A practical advantage is that the method does not need multiplications with the transpose of the coefficient matrix.
- Biconjugate Gradient Stabilized (Bi-CGSTAB): a variant of BiCG, like CGS, but using different updates for the A^T-sequence in order to obtain smoother convergence than CGS (see the sketch below).
- Chebyshev Iteration: recursively determines polynomials with coefficients chosen to minimize the norm of the residual in a min-max sense. The coefficient matrix must be positive definite, and knowledge of the extremal eigenvalues is required. This method has the advantage of requiring no inner products.
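Bi-CGSTAB follows the same calling pattern and, like CGS, needs no products with A^T; a minimal sketch assuming SciPy's `bicgstab` and the same illustrative nonsymmetric matrix:

```python
# Minimal sketch: Bi-CGSTAB on a nonsymmetric system; no transpose products needed.
import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla

n = 500
A = sp.diags([-1.3, 2.0, -0.7], [-1, 0, 1], shape=(n, n), format="csr")
b = np.ones(n)

x, info = spla.bicgstab(A, b, maxiter=1000)
print(info, np.linalg.norm(A @ x - b))
```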

Conjugate Gradient Method
A popular iterative method for solving large systems of sparse linear equations Ax = b, where A is known, square, symmetric, and positive definite. Reference: Shewchuk.

Quadratic Form
If A is symmetric (A = A^T) and positive definite, the quadratic form f(x) is minimized by the solution to Ax = b.
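In symbols (the standard quadratic form and its gradient, following Shewchuk):

```latex
f(x) = \tfrac{1}{2}\, x^{\mathsf T} A x \;-\; b^{\mathsf T} x \;+\; c,
\qquad
f'(x) = \tfrac{1}{2} A^{\mathsf T} x + \tfrac{1}{2} A x - b .
```

For symmetric A the gradient reduces to f'(x) = Ax - b, so f'(x) = 0 exactly when Ax = b; positive definiteness guarantees that this critical point is the minimum.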

Method of Steepest Descent
Choose the direction in which f decreases the most: the direction opposite to f'(x_(i)). The error e_(i) = x_(i) - x is a vector that indicates how far we are from the solution. The residual r_(i) = b - Ax_(i) indicates how far we are from the correct value of b. Think of the residual as the error transformed by A into the same space as b, and as the direction of steepest descent.
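In Shewchuk's notation:

```latex
e_{(i)} = x_{(i)} - x,
\qquad
r_{(i)} = b - A x_{(i)} = -A\, e_{(i)} = -f'\!\big(x_{(i)}\big),
\qquad
x_{(i+1)} = x_{(i)} + \alpha_{(i)}\, r_{(i)} .
```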

Line Search
A line search chooses α to minimize f along the line x_(1) = x_(0) + α r_(0). α minimizes f when the directional derivative df(x_(1))/dα equals zero; setting it to zero shows that α should be chosen so that r_(0) and f'(x_(1)) are orthogonal. Geometrically, f is minimized where the projection of the gradient onto the search line is zero.
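Carrying out the standard derivation (with the search direction equal to r_(0)):

```latex
\frac{d}{d\alpha} f\big(x_{(1)}\big)
  = f'\big(x_{(1)}\big)^{\mathsf T} r_{(0)}
  = -\,r_{(1)}^{\mathsf T} r_{(0)} = 0,
\qquad
r_{(1)} = r_{(0)} - \alpha A r_{(0)}
\;\Longrightarrow\;
\alpha = \frac{r_{(0)}^{\mathsf T} r_{(0)}}{r_{(0)}^{\mathsf T} A\, r_{(0)}} .
```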

Summary of Steepest Descent
The method computes r_(i) = b - Ax_(i), steps to x_(i+1) = x_(i) + α_(i) r_(i), and repeats. In the worked example (Shewchuk), starting at [-2, -2] the iterates converge to the solution [2, -2]; each gradient is orthogonal to the previous gradient. Premultiplying the update by -A and adding b gives the recurrence r_(i+1) = r_(i) - α_(i) A r_(i), which eliminates one matrix-vector multiply per iteration.
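Putting the pieces together, a minimal illustrative implementation of steepest descent (not a library routine; the test matrix is an assumption). The residual recurrence reuses the product A r_(i), so each iteration needs only one matrix-vector multiply:

```python
# Minimal sketch of steepest descent with the recurrent residual update.
import numpy as np
import scipy.sparse as sp

def steepest_descent(A, b, x0=None, tol=1e-8, maxiter=40000):
    x = np.zeros_like(b) if x0 is None else x0.copy()
    r = b - A @ x                     # residual = direction of steepest descent
    for k in range(maxiter):
        Ar = A @ r                    # the single matrix-vector product per iteration
        alpha = (r @ r) / (r @ Ar)    # exact line search along r
        x = x + alpha * r
        r = r - alpha * Ar            # "premultiply by -A and add b", done recursively
        if np.linalg.norm(r) < tol * np.linalg.norm(b):
            return x, k + 1
    return x, maxiter

n = 50
A = sp.diags([-1.0, 2.0, -1.0], [-1, 0, 1], shape=(n, n), format="csr")
b = np.ones(n)
x, iters = steepest_descent(A, b)
print(iters, np.linalg.norm(A @ x - b))   # converges, but slowly compared with CG
```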

Convergence Analysis
There are two cases where one-step convergence is possible: when e_(i) is an eigenvector of A, and when all eigenvalues of A are equal.

Convergence Analysis (cont.)
In general, the convergence rate depends on the condition number of A and on the slope of e_(0) (relative to the coordinate system of eigenvectors) at the starting point.

Method of Conjugate Directions
Idea: pick a set of n orthogonal directions, take exactly one step of the right length in each direction, and after n steps we are done. (Steepest descent, by contrast, can take steps in the same direction as earlier steps.) The catch is that computing the right step length α_(i) requires knowing the error, in which case we would already know the solution. Solution: make the search directions A-orthogonal (also called A-conjugate) instead of orthogonal, and require that d_(i) and e_(i+1) be A-orthogonal; this requirement can be met without knowing the error.

Conjugate Directions
Definition: two vectors d_(i) and d_(j) are A-orthogonal (conjugate) if d_(i)^T A d_(j) = 0. Make the search directions {d_(i)} mutually A-orthogonal, and require that d_(i) be A-orthogonal to e_(i+1). This requirement is equivalent to finding the minimum point along the search direction d_(i); to see this, set the directional derivative of f along d_(i) equal to zero.

Conjugate Directions (cont.)
The A-orthogonality condition determines the step size, and together with the residual update it gives the complete algorithm, summarized below.
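The resulting step size and updates (standard conjugate-directions formulas):

```latex
d_{(i)}^{\mathsf T} A\, e_{(i+1)} = 0
\;\Longrightarrow\;
\alpha_{(i)} = \frac{d_{(i)}^{\mathsf T} r_{(i)}}{d_{(i)}^{\mathsf T} A\, d_{(i)}},
\qquad
x_{(i+1)} = x_{(i)} + \alpha_{(i)} d_{(i)},
\qquad
r_{(i+1)} = r_{(i)} - \alpha_{(i)} A\, d_{(i)} .
```

Because r_(i) = -A e_(i), the numerator d_(i)^T r_(i) is computable even though the error itself is unknown.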

Proof
Can this really solve for x in n steps, now that orthogonality has been replaced by A-orthogonality? Intuition: each step of conjugate directions eliminates one A-orthogonal component of the error.

Represent the error in the basis {d_(i)}: the initial error e_(0) can be expressed as a sum of A-orthogonal components. Each step of conjugate directions eliminates one of these components; after n iterations, every component is cut away and e_(n) = 0.

Conjugate Gram-Schmidt Process
To find A-orthogonal directions, apply the conjugate Gram-Schmidt process to a set of n linearly independent vectors u_0, ..., u_{n-1}. To construct d_(i), take u_i and subtract out any components that are not A-orthogonal to the previous d vectors. For example, u_1 is composed of two components: u*, which is A-orthogonal to d_(0), and u+, which is parallel to d_(0); after conjugation, only the A-orthogonal portion remains. Difficulty: all the old search directions must be kept to construct the new one.
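In formula form (the conjugate Gram-Schmidt construction, as in Shewchuk):

```latex
d_{(i)} = u_i + \sum_{k=0}^{i-1} \beta_{ik}\, d_{(k)},
\qquad
\beta_{ik} = -\,\frac{u_i^{\mathsf T} A\, d_{(k)}}{d_{(k)}^{\mathsf T} A\, d_{(k)}},
```

which makes d_(i)^T A d_(j) = 0 for all j < i.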

Conjugate Gradients
The method of conjugate gradients is simply the method of conjugate directions in which the search directions are constructed by conjugation of the residuals, i.e., by setting u_i = r_(i). Reasons: the residuals are linearly independent, orthogonal to the previous search directions, ...

Method of Conjugate Gradients
Each new residual is orthogonal to all the previous residuals and search directions, and each new search direction is constructed to be A-orthogonal to all the previous residuals and search directions. For example, d_(2) is a linear combination of r_(2) and d_(1). The search space is a Krylov subspace: a subspace formed by repeatedly applying a matrix to a vector.
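In symbols, the span of the first i search directions (equivalently, of the first i residuals) is the Krylov subspace generated by A from the initial residual:

```latex
\mathcal{D}_i
  = \operatorname{span}\{\, d_{(0)}, d_{(1)}, \ldots, d_{(i-1)} \,\}
  = \operatorname{span}\{\, r_{(0)}, A r_{(0)}, A^{2} r_{(0)}, \ldots, A^{i-1} r_{(0)} \,\} .
```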

Reduction of Space Complexity
"Conjugate Gradients" is a misnomer: the gradients are not conjugate, and the conjugate directions are not all gradients; "conjugated gradients" would be more accurate. Because each new residual is orthogonal to all previous residuals, the Gram-Schmidt coefficients β_{ij} vanish for j < i - 1, so only the most recent search direction and residual need to be kept.
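The short recurrence makes the whole method fit in a few lines. A minimal illustrative implementation (not a library routine; the test matrix is an assumption), keeping only x, r, d, and the product Ad, with one matrix-vector multiplication and two dot products per iteration:

```python
# Minimal sketch of the (unpreconditioned) conjugate gradient method.
import numpy as np
import scipy.sparse as sp

def conjugate_gradient(A, b, x0=None, tol=1e-8, maxiter=None):
    x = np.zeros_like(b) if x0 is None else x0.copy()
    r = b - A @ x                     # initial residual
    d = r.copy()                      # first search direction is the residual
    rs_old = r @ r
    maxiter = 10 * len(b) if maxiter is None else maxiter
    for k in range(maxiter):
        Ad = A @ d                            # one matrix-vector product
        alpha = rs_old / (d @ Ad)             # step length along d
        x = x + alpha * d
        r = r - alpha * Ad                    # updated residual
        rs_new = r @ r
        if np.sqrt(rs_new) < tol * np.linalg.norm(b):
            return x, k + 1
        d = r + (rs_new / rs_old) * d         # conjugate against the previous d only
        rs_old = rs_new
    return x, maxiter

n = 400
A = sp.diags([-1.0, 2.0, -1.0], [-1, 0, 1], shape=(n, n), format="csr")
b = np.ones(n)
x, iters = conjugate_gradient(A, b)
print(iters, np.linalg.norm(A @ x - b))
```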

Convergence Analysis of Conjugate Gradients
With exact arithmetic, CG is complete after n iterations. Accumulated roundoff error causes the residual to gradually lose accuracy, and cancellation error causes the search vectors to lose A-orthogonality. CG is useful for problems so large that it is not feasible to run even n iterations. CG is quicker if there are duplicated eigenvalues, and converges more quickly when the eigenvalues are clustered together.

Convergence Analysis of Conjugate Gradients (cont.) See Appendix C3 of Shewchuk

Time and Space Complexity
See Shewchuk, p. 38. How does this compare to the time and space complexity of Gaussian elimination?

Preconditioning

Transformed Preconditioned Conjugate Gradient Method

Untransformed Preconditioned Conjugate Gradient Method
The effectiveness of a preconditioner M is determined by the condition number of M^(-1)A. For large-scale problems, CG should almost always be used with a preconditioner.
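A minimal calling sketch, assuming SciPy's `cg` with its `M` argument (M is applied as an approximation of A^{-1}). A simple Jacobi (diagonal) preconditioner is shown only to illustrate the calling pattern; for this constant-diagonal model problem it changes little, and a real application would use something stronger (incomplete factorization, multigrid, etc.):

```python
# Minimal sketch: preconditioned CG, with M applied as an approximate inverse of A.
import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla

n = 2000
A = sp.diags([-1.0, 2.0, -1.0], [-1, 0, 1], shape=(n, n), format="csr")  # SPD
b = np.ones(n)

d = A.diagonal()
M = spla.LinearOperator((n, n), matvec=lambda v: v / d)   # Jacobi: apply D^{-1}

x, info = spla.cg(A, b, M=M, maxiter=10000)
print(info, np.linalg.norm(A @ x - b))
```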

History of Conjugate Gradient Method Conjugate Direction methods were probably first presented by Schmidt [14] in 1908, and were independently reinvented by Fox, Huskey, and Wilkinson [7] in 1948. In the early fifties, the method of Conjugate Gradients was discovered independently by Hestenes [10] and Stiefel [15]; shortly thereafter, they jointly published what is considered the seminal reference on CG [11]. Convergence bounds for CG in terms of Chebyshev polynomials were developed by Kaniel [12]. A more thorough analysis of CG convergence is provided by van der Sluis and van der Vorst [16]. CG was popularized as an iterative method for large, sparse matrices by Reid [13] in 1971. CG was generalized to nonlinear problems in 1964 by Fletcher and Reeves [6], based on work by Davidon [4] and Fletcher and Powell [5]. Convergence of nonlinear CG with inexact line searches was analyzed by Daniel [3]. A history and extensive annotated bibliography of CG to the mid-seventies is provided by Golub and O’Leary [9]. Most research since that time has focused on nonsymmetric systems.

Templates section on Conjugate Gradient Method http://www.netlib.org/linalg/html_templates/node20.html

MATLAB Preconditioned Conjugate Gradients Method http://www.mathworks.com/help/matlab/ref/pcg.html

Generalized Minimal Residual (GMRES) Method
Generalizes MINRES to nonsymmetric systems. Templates book: http://www.netlib.org/linalg/html_templates/node29.html

MATLAB documentation on GMRES http://www.mathworks.com/help/matlab/ref/gmres.html

Iterative Method Packages
- Included in PETSc: http://www.mcs.anl.gov/petsc/ ; sparse linear solvers: http://www.mcs.anl.gov/petsc/documentation/linearsolvertable.html
- ITSOL and pARMS: http://www-users.cs.umn.edu/~saad/software/
- Dongarra's survey of freely available linear algebra software: http://www.netlib.org/utk/people/JackDongarra/la-sw.html
- Vendor libraries: IBM ESSL/PESSL (http://publib.boulder.ibm.com/infocenter/clresctr/vxrx/index.jsp?topic=%2Fcom.ibm.cluster.essl.v5r1.essl100.doc%2Fam501_smsubs.htm), the Cray-supported version of PETSc, and the Numerical Algorithms Group (NAG) library (http://www.nag.com/numeric/FL/decisiontrees.asp)