Symmetric-pattern multifrontal factorization

Presentation transcript:

Symmetric-pattern multifrontal factorization
[Figure: elimination tree T(A) and graph G(A)]

Symmetric-pattern multifrontal factorization
For each node of T(A), from leaves to root:
  Sum own row/col of A with children's Update matrices into Frontal matrix
  Eliminate current variable from Frontal matrix, to get Update matrix
  Pass Update matrix to parent
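
The loop above can be sketched in a few lines of Python. This is a minimal illustration, not the scheme used by any particular solver: it works on a small dense list-of-lists matrix, eliminates one variable per front (no supernodes), does no numerical pivoting, and assumes the nonzero pattern is symmetric. The function name `multifrontal_lu` and the list `pending` are made up for this example.

```python
def multifrontal_lu(A):
    """LU without pivoting, organized as a multifrontal sweep.

    Assumptions of this sketch: A is a dense list of lists with a symmetric
    nonzero pattern, and every pivot met along the way is safely nonzero.
    """
    n = len(A)
    # Symbolic phase: struct[j] = row indices below the diagonal in column j of L.
    struct = [set(i for i in range(j + 1, n) if A[i][j] != 0 or A[j][i] != 0)
              for j in range(n)]
    parent = [None] * n                      # elimination tree T(A)
    for j in range(n):
        if struct[j]:
            p = min(struct[j])
            parent[j] = p
            struct[p] |= struct[j] - {p}     # child structure flows to parent

    L = [[float(i == j) for j in range(n)] for i in range(n)]
    U = [[0.0] * n for _ in range(n)]
    pending = [[] for _ in range(n)]         # update matrices waiting at each node

    # Natural order visits children before parents because parent[j] > j.
    for j in range(n):
        idx = [j] + sorted(struct[j])        # global indices of this front
        pos = {g: k for k, g in enumerate(idx)}
        m = len(idx)
        F = [[0.0] * m for _ in range(m)]
        # Own row and column of A.
        for k, g in enumerate(idx):
            F[k][0] = A[g][j]
            F[0][k] = A[j][g]
        # Extend-add the children's update matrices.
        for cidx, Uc in pending[j]:
            for a, ga in enumerate(cidx):
                for b, gb in enumerate(cidx):
                    F[pos[ga]][pos[gb]] += Uc[a][b]
        # Eliminate variable j from the frontal matrix.
        pivot = F[0][0]
        for k, g in enumerate(idx):
            U[j][g] = F[0][k]
            if k > 0:
                L[g][j] = F[k][0] / pivot
        update = [[F[a][b] - F[a][0] / pivot * F[0][b] for b in range(1, m)]
                  for a in range(1, m)]
        # Pass the update matrix to the parent.
        if parent[j] is not None:
            pending[parent[j]].append((idx[1:], update))
    return L, U, parent


A = [[4, 0, 1, 0],
     [0, 5, 2, 0],
     [2, 1, 6, 1],
     [0, 0, 3, 7]]
L, U, parent = multifrontal_lu(A)
print(parent)   # nodes 0 and 1 are leaves that share parent 2
```

In this small example nodes 0 and 1 are independent leaves, which is the tree-level parallelism the later slides mention; a real code would keep the pending update matrices on a stack driven by a postorder of T(A).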

Symmetric-pattern multifrontal factorization
[Figure: elimination tree T(A) and graph G(A), with the frontal matrices formed at the first three nodes]
F_1 = A_1 => U_1
F_2 = A_2 => U_2
F_3 = A_3 + U_1 + U_2 => U_3

Symmetric-pattern multifrontal factorization
[Figure: elimination tree T(A) and filled graph G^+(A)]

Symmetric-pattern multifrontal factorization
Really uses supernodes, not nodes
All arithmetic happens on dense square matrices
Needs extra memory for a stack of pending update matrices
Potential parallelism:
  1. between independent tree branches
  2. parallel dense ops on frontal matrix

MUMPS: distributed-memory multifrontal [Amestoy, Duff, L'Excellent, Koster, Tuma]
Symmetric-pattern multifrontal factorization
Parallelism both from tree and by sharing dense ops
Dynamic scheduling of dense op sharing
Symmetric preordering
For nonsymmetric matrices:
  optional weighted matching for heavy diagonal
  expand nonzero pattern to be symmetric
  numerical pivoting only within supernodes if possible (doesn't change pattern)
  failed pivots are passed up the tree in the update matrix

SuperLU-dist: GE with static pivoting [Li, Demmel]
Target: Distributed-memory multiprocessors
Goal: No pivoting during numeric factorization

SuperLU-dist: Distributed static data structure
Process(or) mesh
Block cyclic matrix layout
[Figure: L and U in block cyclic layout on the process(or) mesh]

GESP: Gaussian elimination with static pivoting
PA = LU
Sparse, nonsymmetric A
P is chosen numerically in advance, not by partial pivoting!
After choosing P, can permute PA symmetrically for sparsity: Q(PA)Q^T = LU

SuperLU-dist: GE with static pivoting [Li, Demmel]
Target: Distributed-memory multiprocessors
Goal: No pivoting during numeric factorization
  1. Permute A unsymmetrically to have large elements on the diagonal (using weighted bipartite matching)
  2. Scale rows and columns to equilibrate
  3. Permute A symmetrically for sparsity
  4. Factor A = LU with no pivoting, fixing up small pivots: if |a_ii| < ε · ||A|| then replace a_ii by ε^{1/2} · ||A||
  5. Solve for x using the triangular factors: Ly = b, Ux = y
  6. Improve solution by iterative refinement
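
Steps 4 and 5 can be illustrated on a small dense matrix. The sketch below is not SuperLU-dist code: it is a plain dense elimination with no row exchanges, and it assumes the infinity norm for ||A|| and double-precision machine epsilon for ε (the slide does not say which norm is meant). The names `lu_static_pivot` and `lu_solve` are invented for this example.

```python
import sys


def lu_static_pivot(A):
    """In-place style LU with no pivoting; tiny pivots are replaced.

    Returns the packed factors (unit L below the diagonal, U on and above it)
    and the list of pivot positions that had to be fixed up.
    """
    n = len(A)
    eps = sys.float_info.epsilon
    norm_a = max(sum(abs(v) for v in row) for row in A)   # assumed: infinity norm
    LU = [[float(v) for v in row] for row in A]
    fixed = []
    for k in range(n):
        if abs(LU[k][k]) < eps * norm_a:
            LU[k][k] = (eps ** 0.5) * norm_a              # static pivot fix-up
            fixed.append(k)
        for i in range(k + 1, n):
            LU[i][k] /= LU[k][k]
            for j in range(k + 1, n):
                LU[i][j] -= LU[i][k] * LU[k][j]
    return LU, fixed


def lu_solve(LU, b):
    """Solve Ly = b, then Ux = y, using the packed factors."""
    n = len(LU)
    y = [float(v) for v in b]
    for i in range(n):
        for k in range(i):
            y[i] -= LU[i][k] * y[k]
    x = y[:]
    for i in reversed(range(n)):
        for k in range(i + 1, n):
            x[i] -= LU[i][k] * x[k]
        x[i] /= LU[i][i]
    return x


LU, fixed = lu_static_pivot([[0.0, 1.0], [1.0, 1.0]])
print(fixed)   # position 0 held a zero pivot and was perturbed
```

When a pivot is replaced, the computed factors belong to a slightly perturbed matrix, which is why step 6 (iterative refinement) follows the triangular solves.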


Row permutation for heavy diagonal [Duff, Koster]
Represent A as a weighted, undirected bipartite graph (one node for each row and one node for each column)
Find matching (set of independent edges) with maximum product of weights
Permute rows to place matching on diagonal
Matching algorithm also gives a row and column scaling to make all diag elts = 1 and all off-diag elts <= 1 in absolute value
[Figure: A and PA]
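
To make the objective concrete, the sketch below finds the row permutation with maximum product of diagonal magnitudes by brute-force enumeration. This only illustrates what is being maximized; it is not the Duff-Koster algorithm, which solves a weighted bipartite matching problem and scales to large sparse matrices. The function names are hypothetical, and the scaling step is not shown.

```python
from itertools import permutations


def heavy_diagonal_rows(A):
    """Return rows, where rows[j] is the original row moved to position j,
    chosen to maximize the product of |diagonal| entries of the permuted matrix.

    Brute force over all n! permutations: usable only for tiny examples.
    """
    n = len(A)
    best, best_prod = None, 0.0
    for rows in permutations(range(n)):
        prod = 1.0
        for j in range(n):
            prod *= abs(A[rows[j]][j])
        if prod > best_prod:
            best, best_prod = rows, prod
    if best is None:
        raise ValueError("no permutation gives a zero-free diagonal")
    return list(best)


def permute_rows(A, rows):
    """Form PA by reordering the rows of A."""
    return [list(A[r]) for r in rows]


A = [[0.0, 2.0, 0.0],
     [3.0, 0.0, 1.0],
     [0.1, 0.0, 4.0]]
rows = heavy_diagonal_rows(A)
PA = permute_rows(A, rows)
print(rows, [PA[i][i] for i in range(3)])
```

Here the original diagonal contains zeros, while the permuted matrix has the large entries 3, 2, and 4 on its diagonal, so elimination without pivoting becomes far less risky.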





Iterative refinement to improve solution
Iterate:
  r = b - A*x
  backerr = max_i ( |r_i| / (|A|*|x| + |b|)_i )
  if backerr < ε or backerr > lasterr/2 then stop iterating
  solve L*U*dx = r
  x = x + dx
  lasterr = backerr
  repeat
Usually 0 to 3 steps are enough
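
The loop can be written out as a short Python function. This is a sketch under stated assumptions: `solve` stands for any routine that applies the (possibly inexact) factors, ε is taken to be double-precision machine epsilon, and `max_steps` is a safety cap added for the example. In the demonstration the "inexact factors" are simulated by exactly inverting a slightly perturbed 2x2 matrix, which is purely an illustrative device.

```python
import sys


def matvec(A, x):
    return [sum(a * v for a, v in zip(row, x)) for row in A]


def backward_error(A, x, b):
    """Residual r = b - A x and componentwise backward error."""
    r = [bi - yi for bi, yi in zip(b, matvec(A, x))]
    denom = matvec([[abs(a) for a in row] for row in A], [abs(v) for v in x])
    errs = [abs(ri) / (di + abs(bi))
            for ri, di, bi in zip(r, denom, b) if di + abs(bi) > 0]
    return r, max(errs, default=0.0)


def refine(A, b, solve, max_steps=10):
    """Iterative refinement; solve(v) applies the approximate (LU)^-1 to v."""
    eps = sys.float_info.epsilon
    x = solve(b)
    lasterr = float("inf")
    steps = 0
    while steps < max_steps:
        r, backerr = backward_error(A, x, b)
        if backerr < eps or backerr > lasterr / 2:
            break                      # converged, or no longer halving the error
        dx = solve(r)
        x = [xi + di for xi, di in zip(x, dx)]
        lasterr = backerr
        steps += 1
    return x, steps


A = [[4.0, 1.0], [2.0, 3.0]]
b = [5.0, 5.0]                         # exact solution is [1, 1]


def approx_solve(v):
    # Exact inverse of the perturbed matrix [[4.01, 1], [2, 3.01]].
    det = 4.01 * 3.01 - 1.0 * 2.0
    return [(3.01 * v[0] - 1.0 * v[1]) / det,
            (-2.0 * v[0] + 4.01 * v[1]) / det]


x, steps = refine(A, b, approx_solve)
print(x, steps)
```

The residual is always computed with the original A, so errors introduced by the perturbed factors are corrected step by step as long as each step at least halves the backward error.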

Convergence analysis of iterative refinement
Let C = I - A(LU)^{-1}   [ so A = (I - C)·(LU) ]
x_1 = (LU)^{-1} b
r_1 = b - A x_1 = (I - A(LU)^{-1}) b = C b
dx_1 = (LU)^{-1} r_1 = (LU)^{-1} C b
x_2 = x_1 + dx_1 = (LU)^{-1} (I + C) b
r_2 = b - A x_2 = (I - (I - C)·(I + C)) b = C^2 b
...
In general, r_k = b - A x_k = C^k b
Thus r_k → 0 if |largest eigenvalue of C| < 1.


Directed graph
A is square, unsymmetric, nonzero diagonal
Edges from rows to columns
Symmetric permutations PAP^T
[Figure: matrix A and directed graph G(A)]

Undirected graph, ignoring edge directions
Overestimates the nonzero structure of A
Sparse GESP can use symmetric permutations (min degree, nested dissection) of this graph
[Figure: matrix A+A^T and undirected graph G(A+A^T)]

Symbolic factorization of undirected graph
Overestimates the nonzero structure of L+U
[Figure: chol(A+A^T) and filled graph G^+(A+A^T)]

Symbolic factorization of directed graph
Add fill edge a -> b if there is a path from a to b through lower-numbered vertices.
Sparser than G^+(A+A^T) in general.
But what's a good ordering for G^+(A)?
[Figure: matrix A, filled graph G^+(A), and L+U]
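
The fill rule can be checked on a tiny pattern. The sketch below represents the nonzero pattern as a set of (row, column) pairs, which is an assumption made for brevity, and applies the equivalent elimination form of the path rule: when vertex k is eliminated, every higher-numbered vertex with an edge into k gets an edge to every higher-numbered vertex that k points to. The function names are illustrative only.

```python
def symbolic_lu(pattern):
    """Return the filled pattern G^+(A) for elimination in natural order.

    pattern: set of (i, j) pairs with A[i][j] != 0, diagonal included.
    """
    n = 1 + max(max(i, j) for i, j in pattern)
    filled = set(pattern)
    for k in range(n):
        preds = [i for i in range(k + 1, n) if (i, k) in filled]   # edges i -> k
        succs = [j for j in range(k + 1, n) if (k, j) in filled]   # edges k -> j
        for i in preds:
            for j in succs:
                filled.add((i, j))                                 # fill edge i -> j
    return filled


def symmetrize(pattern):
    """Pattern of A + A^T."""
    return pattern | {(j, i) for i, j in pattern}


# Lower triangular example: diagonal plus entries (1,0) and (2,0).
A = {(0, 0), (1, 1), (2, 2), (1, 0), (2, 0)}
directed = symbolic_lu(A)
undirected = symbolic_lu(symmetrize(A))
print(len(directed), len(undirected))
```

For this pattern the directed model predicts no fill at all, whereas the symmetrized pattern fills in completely, which is a small instance of G^+(A) being sparser than G^+(A+A^T).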

Question: Preordering for GESP
Use directed graph model, less well understood than symmetric factorization
Symmetric: bottom-up, top-down, hybrids
Nonsymmetric: mostly bottom-up
Symmetric: best ordering is NP-complete, but approximation theory is based on graph partitioning (separators)
Nonsymmetric: no approximation theory is known; partitioning is not the whole story
Good approximations and efficient algorithms both remain to be discovered