Support-Graph Preconditioning. John R. Gilbert, MIT and U.C. Santa Barbara. Coauthors: Marshall Bern, Bruce Hendrickson, Nhat Nguyen, Sivan Toledo. Authors of other work surveyed: Erik Boman, Doron Chen, Keith Gremban, Bruce Hendrickson, Gary Miller, Sivan Toledo, Pravin Vaidya, Marco Zagha.

Outline The problem: Preconditioning linear systems Support graphs: Vaidya’s algorithm Analysis of two preconditioners New directions and open problems

Systems of Linear Equations: Ax = b. A is large and sparse, say n = 10^5 to 10^8, with # nonzeros = O(n). The physical setting sometimes implies G(A) has good separators: O(n^{1/2}) in 2D, O(n^{2/3}) in 3D. Here: A is symmetric and positive (semi)definite. Direct methods: A = LU. Iterative methods: y^{(k+1)} = A y^{(k)}.
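As a concrete, purely illustrative instance of such a system, the sketch below builds the standard 2D model Poisson matrix with SciPy; the helper name poisson2d and the grid size are my own choices, not from the slides.

```python
import numpy as np
import scipy.sparse as sp

def poisson2d(k):
    """5-point finite-difference Laplacian on a k-by-k grid (n = k^2 unknowns).
    Symmetric positive definite, with O(n) nonzeros -- the usual model problem."""
    T = sp.diags([-1.0, 2.0, -1.0], [-1, 0, 1], shape=(k, k))
    I = sp.identity(k)
    return (sp.kron(I, T) + sp.kron(T, I)).tocsr()

A = poisson2d(100)            # n = 10,000 unknowns
print(A.shape, A.nnz)         # (10000, 10000), about 5n nonzeros
```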

Graphs and Sparse Matrices [Parter, …]. G(A) and its filled graph G+(A) [chordal]. Symmetric Gaussian elimination: for j = 1 to n, add edges between j's higher-numbered neighbors. Fill: new nonzeros in the factor.

Fill-reducing matrix permutations Theory: approx optimal separators => approx optimal fill and flop count Orderings: nested dissection, minimum degree, hybrids Graph partitioning: spectral, geometric, multilevel

Conjugate Gradients: an iterative method. Each step does one matrix-vector multiplication, O(#nonzeros) work. In exact arithmetic, CG converges in n steps (completely unrealistic). Condition number κ(A) = ||A||_2 ||A^{-1}||_2 = λ_max(A) / λ_min(A). CG needs O(κ(A)^{1/2}) iterations to "solve" Ax = b. Preconditioner B: solve B^{-1}Ax = B^{-1}b instead of Ax = b. Want κ(B^{-1}A) to be small, and want By = c to be easy to solve.

Conjugate gradient iteration. One matrix-vector multiplication per iteration, two vector dot products per iteration, four n-vectors of working storage.

x_0 = 0,  r_0 = b,  p_0 = r_0
for k = 1, 2, 3, ...
    α_k = (r_{k-1}^T r_{k-1}) / (p_{k-1}^T A p_{k-1})    (step length)
    x_k = x_{k-1} + α_k p_{k-1}                          (approximate solution)
    r_k = r_{k-1} − α_k A p_{k-1}                        (residual)
    β_k = (r_k^T r_k) / (r_{k-1}^T r_{k-1})              (improvement this step)
    p_k = r_k + β_k p_{k-1}                              (search direction)
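The same recurrence written out as runnable NumPy code. This is a minimal sketch of plain (unpreconditioned) CG; the function name and the stopping test are my own additions.

```python
import numpy as np

def conjugate_gradient(A, b, tol=1e-8, maxiter=None):
    """Plain CG for SPD A, following the recurrence on the slide.
    A can be a dense array or a scipy.sparse matrix (only A @ p is used)."""
    n = b.shape[0]
    maxiter = maxiter if maxiter is not None else n
    x = np.zeros(n)
    r = b.astype(float).copy()        # r_0 = b - A x_0 with x_0 = 0
    p = r.copy()
    rr = r @ r
    for k in range(maxiter):
        Ap = A @ p                    # the one matrix-vector product per step
        alpha = rr / (p @ Ap)         # step length
        x += alpha * p                # approximate solution
        r -= alpha * Ap               # residual
        rr_new = r @ r
        if np.sqrt(rr_new) <= tol * np.linalg.norm(b):
            break
        beta = rr_new / rr            # improvement this step
        p = r + beta * p              # next search direction
        rr = rr_new
    return x

# e.g. x = conjugate_gradient(poisson2d(100), np.ones(10000))
```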

Conjugate gradient: convergence. In exact arithmetic, CG converges in n steps (completely unrealistic!). Accuracy after k steps of CG is related to the following question: consider polynomials of degree k that are equal to 1 at 0; how small can such a polynomial be at all the eigenvalues of A? Thus, eigenvalues close together are good. Condition number: κ(A) = ||A||_2 ||A^{-1}||_2 = λ_max(A) / λ_min(A). The residual is reduced by a constant factor by O(κ(A)^{1/2}) iterations of CG.
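A back-of-the-envelope check of that last claim, using the standard CG error bound ||e_k||_A ≤ 2((√κ − 1)/(√κ + 1))^k ||e_0||_A (the bound is textbook CG theory, not stated explicitly on the slide; the tolerances below are arbitrary).

```python
import numpy as np

def cg_iterations_for(kappa, reduction=1e-6):
    """Smallest k with 2 * ((sqrt(kappa)-1)/(sqrt(kappa)+1))**k <= reduction."""
    rho = (np.sqrt(kappa) - 1.0) / (np.sqrt(kappa) + 1.0)
    return int(np.ceil(np.log(reduction / 2.0) / np.log(rho)))

for kappa in (1e2, 1e4, 1e6):
    print(int(kappa), cg_iterations_for(kappa))
# iteration counts grow roughly like sqrt(kappa), matching O(kappa^{1/2})
```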

Preconditioners. Suppose you had a matrix B such that (1) the condition number κ(B^{-1}A) is small, and (2) By = z is easy to solve. Then you could solve (B^{-1}A)x = B^{-1}b instead of Ax = b (actually (B^{-1/2} A B^{-1/2}) (B^{1/2} x) = B^{-1/2} b, but never mind). B = A is great for (1), not for (2); B = I is great for (2), not for (1).

Incomplete Cholesky factorization (IC, ILU). Compute factors of A by Gaussian elimination, but ignore fill. Preconditioner B = R^T R ≈ A, not formed explicitly. Compute B^{-1}z by triangular solves (in time nnz(A)). Total storage is O(nnz(A)), static data structure. Either symmetric (IC) or nonsymmetric (ILU). [figure: A ≈ R^T × R]
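For illustration, here is how an incomplete factorization could be used as a CG preconditioner in SciPy. The spilu call is a drop-tolerance ILU (not exactly the IC/ILU variants on these slides), and the drop_tol value, grid size, and the poisson2d helper from the earlier sketch are my own assumptions.

```python
import numpy as np
import scipy.sparse.linalg as spla

A = poisson2d(100)                     # model problem from the earlier sketch
b = np.ones(A.shape[0])

# Incomplete factorization B = L*U ~ A, with fill limited by a drop tolerance
ilu = spla.spilu(A.tocsc(), drop_tol=1e-3)

# Applying the preconditioner means solving B y = z by two triangular solves
M = spla.LinearOperator(A.shape, matvec=ilu.solve)

x, info = spla.cg(A, b, M=M)
print(info)                            # 0 means the iteration converged
```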

Incomplete Cholesky and ILU: Variants. Allow one or more "levels of fill" (unpredictable storage requirements). Allow fill whose magnitude exceeds a "drop tolerance" (may give better approximate factors than levels of fill; unpredictable storage requirements; choice of tolerance is ad hoc). Partial pivoting (for nonsymmetric A). "Modified ILU" (MIC): add dropped fill to the diagonal of U or R, so that A and R^T R have the same row sums; good in some PDE contexts.

Incomplete Cholesky and ILU: Issues. Choice of parameters: good, a smooth transition from iterative to direct methods; bad, very ad hoc and problem-dependent; the tradeoff is time per iteration (more fill => more time) versus number of iterations (more fill => fewer iterations). Effectiveness: the condition number usually improves (only) by a constant factor (except MIC for some problems from PDEs); still, often good when tuned for a particular class of problems. Parallelism: triangular solves are not very parallel; reordering for parallel triangular solve by graph coloring.

Sparse approximate inverses. Compute B^{-1} ≈ A^{-1} explicitly. Minimize ||B^{-1}A − I||_F (in parallel, by columns). Variants: factored form of B^{-1}, more fill, … Good: very parallel. Bad: effectiveness varies widely. [figure: A and B^{-1}]
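A minimal sketch of the column-by-column idea, written for the form ||A·M − I||_F that decouples over the columns of M: each column m_j is found by a small least-squares problem. The pattern choice (the pattern of A's own column) and the dense solve are simplifications of my own; real sparse-approximate-inverse codes adapt the pattern and work with small submatrices only.

```python
import numpy as np
import scipy.sparse as sp

def spai_sketch(A):
    """For each column j, minimize ||A m_j - e_j||_2 over the nonzero pattern of
    A's column j.  Columns are independent, hence 'in parallel, by columns'."""
    A = sp.csc_matrix(A)
    n = A.shape[0]
    cols = []
    for j in range(n):
        pattern = A[:, j].indices                 # allowed nonzero rows of m_j
        Asub = A[:, pattern].toarray()            # n x |pattern| least-squares matrix
        e = np.zeros(n); e[j] = 1.0
        m, *_ = np.linalg.lstsq(Asub, e, rcond=None)
        cols.append(sp.csc_matrix((m, (pattern, np.zeros(len(pattern), dtype=int))),
                                  shape=(n, 1)))
    return sp.hstack(cols).tocsr()                # M ~ A^{-1}, same pattern as A

# M = spai_sketch(poisson2d(10))   # then apply M explicitly as the preconditioner
```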

Support Graph Preconditioning +: New analytic tools, some new preconditioners +: Can use existing direct-methods software -: Current theory and techniques limited Define a preconditioner B for matrix A Explicitly compute the factorization B = LU Choose nonzero structure of B to make factoring cheap (using combinatorial tools from direct methods) Prove bounds on condition number using both algebraic and combinatorial tools

Spanning Tree Preconditioner [Vaidya]. A is symmetric positive definite with negative off-diagonal nonzeros. B is a maximum-weight spanning tree for A (with diagonal modified to preserve row sums). Factor B in O(n) space and O(n) time; applying the preconditioner costs O(n) time per iteration. [figures: G(A) and G(B)]

Spanning Tree Preconditioner [Vaidya], continued. Support each edge of A by a path in B: dilation(A edge) = length of its supporting path in B; congestion(B edge) = number of A edges it supports. Let p = max congestion and q = max dilation; then the condition number κ(B^{-1}A) is bounded by p·q (at most O(n^2)). [figures: G(A) and G(B)]
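Congestion and dilation are easy to measure for a concrete graph/tree pair. The sketch below uses networkx and simply counts unweighted path lengths and edge loads; it illustrates the definitions, not Vaidya's construction, and the grid example is my own.

```python
import networkx as nx
from collections import Counter

def congestion_dilation(G, T):
    """G: undirected graph; T: a spanning tree of G on the same nodes.
    Returns (p, q) = (max congestion over tree edges, max dilation over G's edges)."""
    load = Counter()                              # graph edges routed over each tree edge
    q = 0
    for u, v in G.edges():
        path = nx.shortest_path(T, u, v)          # the unique tree path supporting (u, v)
        q = max(q, len(path) - 1)                 # dilation of this edge
        for a, b in zip(path, path[1:]):
            load[frozenset((a, b))] += 1
    return max(load.values()), q

G = nx.grid_2d_graph(4, 4)                        # small 2D mesh
T = nx.minimum_spanning_tree(G)                   # some spanning tree of it
p, q = congestion_dilation(G, T)
print(p, q, "bound on condition number: p*q =", p * q)
```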

Spanning Tree Preconditioner [Vaidya], continued. Congestion and dilation can be improved by adding a few strategically chosen edges to B. The cost of factor + solve is O(n^{1.75}), or O(n^{1.2}) if A is planar. In experiments by Chen & Toledo, often better than drop-tolerance MIC for 2D problems, but not for 3D. [figures: G(A) and G(B)]
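A sketch of the basic (unaugmented) construction: keep a maximum-weight spanning tree of G(A) and restore A's row sums on the diagonal. The use of scipy.sparse.csgraph, the weight flip, and the poisson2d helper are my own reading of the slide, offered only as an illustration.

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.csgraph import minimum_spanning_tree

def spanning_tree_preconditioner(A):
    """A: symmetric, SPD, negative off-diagonals.  Returns B whose graph is a
    maximum-weight spanning tree of G(A), with diagonal set so B and A have
    the same row sums."""
    A = sp.csr_matrix(A)
    W = (-sp.triu(A, k=1)).tocsr()                # positive weights of G(A)'s edges
    # csgraph computes a MINIMUM spanning tree, so flip the (positive) weights
    flipped = W.copy()
    flipped.data = W.data.max() + 1.0 - W.data
    T = minimum_spanning_tree(flipped).tocoo()    # edges of the max-weight tree
    vals = np.asarray(A[T.row, T.col]).ravel()    # original (negative) entries on those edges
    tree = sp.coo_matrix((vals, (T.row, T.col)), shape=A.shape)
    tree = tree + tree.T                          # symmetrize the off-diagonal part
    diag = np.asarray(A.sum(axis=1)).ravel() - np.asarray(tree.sum(axis=1)).ravel()
    return (tree + sp.diags(diag)).tocsr()

B = spanning_tree_preconditioner(poisson2d(30))
print(B.nnz)   # about 3n nonzeros; with a leaf-first ordering B factors with no fill
```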

Support Graphs [after Gremban/Miller/Zagha]. Intuition from resistive networks: how much must you amplify B to provide the conductivity of A? The support of B for A is σ(A, B) = min{ τ : x^T (tB − A) x ≥ 0 for all x, all t ≥ τ }. In the SPD case, σ(A, B) = max{ λ : Ax = λBx } = λ_max(A, B). Theorem: if A and B are SPD, then κ(B^{-1}A) = σ(A, B) · σ(B, A).
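A quick numerical check of this theorem on small random SPD matrices, using SciPy's generalized symmetric eigensolver (the test matrices and the comparison are my own).

```python
import numpy as np
from scipy.linalg import eigvalsh, solve

def support(A, B):
    """sigma(A, B) = largest generalized eigenvalue of A x = lambda B x (SPD case)."""
    return eigvalsh(A, B).max()

rng = np.random.default_rng(0)
def random_spd(n):
    M = rng.standard_normal((n, n))
    return M @ M.T + n * np.eye(n)

A, B = random_spd(6), random_spd(6)

# kappa(B^{-1}A) measured directly as the spread of B^{-1}A's (real, positive) eigenvalues
ev = np.linalg.eigvals(solve(B, A)).real
print(support(A, B) * support(B, A), ev.max() / ev.min())   # the two values agree
```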

Splitting and Congestion/Dilation Lemmas. Split A = A_1 + A_2 + ··· + A_k and B = B_1 + B_2 + ··· + B_k, where the A_i and B_i are positive semidefinite; typically they correspond to pieces of the graphs of A and B (an edge, a path, a small subgraph). Lemma: σ(A, B) ≤ max_i { σ(A_i, B_i) }. Lemma: σ(edge, path) ≤ (worst weight ratio) · (path length). In the MST case: A_i is an edge and B_i is a path, which gives σ(A, B) ≤ p·q; B_i is an edge and A_i is the same edge, which gives σ(B, A) ≤ 1.
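The splitting lemma is also easy to sanity-check numerically. The sketch below uses full-rank SPD pieces (so every support number is finite); the random pieces are my own and merely stand in for the edge/path pieces used in the actual analysis.

```python
import numpy as np
from scipy.linalg import eigvalsh

def support(A, B):                    # sigma(A, B), as in the previous sketch
    return eigvalsh(A, B).max()

rng = np.random.default_rng(1)
def random_spd(n):
    M = rng.standard_normal((n, n))
    return M @ M.T + n * np.eye(n)

A1, A2, B1, B2 = (random_spd(5) for _ in range(4))
lhs = support(A1 + A2, B1 + B2)
rhs = max(support(A1, B1), support(A2, B2))
print(lhs <= rhs + 1e-12, lhs, rhs)   # sigma(A, B) <= max_i sigma(A_i, B_i)
```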

Support-graph analysis of modified incomplete Cholesky. B has positive (dotted) edges that cancel fill, and B has the same row sums as A. Strategy: use the negative edges of B to support both the negative edges of A and the positive edges of B. [figures: A = 2D model Poisson problem; B = MIC preconditioner for A]

Supporting positive edges of B. Every dotted (positive) edge in B is supported by two paths in B. Each solid edge of B supports one or two dotted edges. Tune the fractions to support each dotted edge exactly; 1/(2√n − 2) of each solid edge is left over to support an edge of A.

Analysis of MIC: Summary. Each edge of A is supported by the leftover 1/(2√n − 2) fraction of the same edge of B, so σ(A, B) ≤ 2√n − 2. It is easy to show σ(B, A) ≤ 1. For this 2D model problem, the condition number is O(n^{1/2}). A similar argument in 3D gives condition number O(n^{1/3}) or O(n^{2/3}) (depending on boundary conditions).

Support-graph analysis for better preconditioners? For model problems, the preconditioners analyzed so far are not as efficient as multigrid, domain decomposition, etc. Gremban/Miller: a hierarchical support-graph preconditioner, but condition number still not polylogarithmic. We analyze a multilevel-diagonal-scaling-like preconditioner in 1D but we haven’t proved tight bounds in higher dimensions.

Hierarchical preconditioner (1D model problem). Good support in both directions: σ(A, B) = σ(B, A) = O(1). But B is a mesh, so it is expensive to factor and to apply. [figures: A and B]

Hierarchical preconditioner, continued. Drop fill in the factor of B (i.e., add positive edges), so that factoring and preconditioning are cheap. But then B cannot support both A and its own positive edges; σ(A, B) is infinite. [figures: A and B]

Hierarchical preconditioner, continued. Solution: add a coarse mesh to support the positive edges. Now σ(A, B) ≤ log_2(n+1). An elimination/splitting analysis gives σ(B, A) = 1, so the condition number is O(log n). [figures: A and B]

Generalization to higher dimensions? Idea: mimic the 1D construction. Generate a coarsening hierarchy with overlapping subdomains, and use σ(B, A) = 1 as a constraint to choose weights. This defines a preconditioner for regular or irregular problems. Sublinear bounds on σ(A, B) for some 2D model problems. (Very preliminary) experiments show slow growth in κ(B^{-1}A). [table: model problem on a triangular mesh; iteration counts for ICC(0) vs. the new preconditioner as n grows]

Algebraic framework. The support of B for A is σ(A, B) = min{ τ : x^T (tB − A) x ≥ 0 for all x, all t ≥ τ }. In the SPD case, σ(A, B) = max{ λ : Ax = λBx } = λ_max(A, B). If A and B are SPD, then κ(B^{-1}A) = σ(A, B) · σ(B, A). [Boman/Hendrickson] If V·W = U, then σ(U·U^T, V·V^T) ≤ ||W||_2^2.

Algebraic framework [Boman/Hendrickson]. Lemma: if V·W = U, then σ(U·U^T, V·V^T) ≤ ||W||_2^2. Proof: take t ≥ ||W||_2^2 = λ_max(W·W^T) = max_{x≠0} { x^T W·W^T x / x^T x }; then x^T (tI − W·W^T) x ≥ 0 for all x; letting x = V^T y gives y^T (tV·V^T − U·U^T) y ≥ 0 for all y; recalling σ(A, B) = min{ τ : x^T (tB − A) x ≥ 0 for all x, all t ≥ τ }, we conclude σ(U·U^T, V·V^T) ≤ t.
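And a random-matrix check of the lemma itself (V is chosen with full row rank so that V·V^T is positive definite; the dimensions are arbitrary and my own).

```python
import numpy as np
from scipy.linalg import eigvalsh

rng = np.random.default_rng(2)
V = rng.standard_normal((8, 12))      # full row rank, so V V^T is SPD
W = rng.standard_normal((12, 10))
U = V @ W                             # the factorization V * W = U

sigma = eigvalsh(U @ U.T, V @ V.T).max()     # sigma(U U^T, V V^T)
bound = np.linalg.norm(W, 2) ** 2            # ||W||_2^2
print(sigma <= bound + 1e-10, sigma, bound)
```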

Example: a three-node graph A (edges with weights a, b, c) and its two-edge spanning tree B (edges a and b). Write A = U·U^T and B = V·V^T, with one column of U (or V) per edge:

U = [  a   b   0 ]        V = [  a   b ]
    [ -a   0   c ]            [ -a   0 ]
    [  0  -b  -c ]            [  0  -b ]

A = [ a²+b²   -a²    -b²  ]   B = [ a²+b²   -a²   -b² ]
    [  -a²   a²+c²   -c²  ]       [  -a²    a²     0  ]
    [  -b²    -c²   b²+c² ]       [  -b²     0    b²  ]

V·W = U with  W = [ 1   0  -c/a ]
                  [ 0   1   c/b ]

σ(A, B) ≤ ||W||_2^2 ≤ ||W||_∞ × ||W||_1 = (max row sum) × (max col sum) ≤ (max congestion) × (max dilation)

Open problems I. Other subgraph constructions for better bounds on ||W||_2^2? For example [Boman], ||W||_2^2 ≤ ||W||_F^2 = Σ w_ij^2 = sum of (weighted) dilations, and [Alon, Karp, Peleg, West] show there exists a spanning tree with average weighted dilation exp(O((log n log log n)^{1/2})) = o(n^ε); this gives condition number O(n^{1+ε}) and solution time O(n^{1.5+ε}), compared to Vaidya's O(n^{1.75}) with an augmented spanning tree. Is there a construction that minimizes ||W||_2^2 directly?

Open problems II. Make spanning tree methods more effective in 3D? Vaidya gives O(n^{1.75}) in general, O(n^{1.2}) in 2D; the issue is that the 2D result uses bounded excluded minors, not just separators. Spanning tree methods for more general matrices? All SPD matrices? ([Boman, Chen, Hendrickson, Toledo]: a different matroid for all diagonally dominant SPD matrices.) Finite element problems? ([Boman]: an element-by-element preconditioner for bilinear quadrilateral elements.) Analyze a multilevel method in general?

Complexity of linear solvers. Time to solve the model problem (Poisson's equation) on a regular mesh with n unknowns (mesh side n^{1/2} in 2D, n^{1/3} in 3D):

                         2D           3D
Sparse Cholesky:         O(n^{1.5})   O(n^2)
CG, exact arithmetic:    O(n^2)       O(n^2)
CG, no precond:          O(n^{1.5})   O(n^{1.33})
CG, modified IC:         O(n^{1.25})  O(n^{1.17})
CG, support trees:       O(n^{1.20})  O(n^{1.5+})
Multigrid:               O(n)         O(n)

References
M. Bern, J. Gilbert, B. Hendrickson, N. Nguyen, S. Toledo. Support-graph preconditioners. Submitted for publication, 2001. ftp://parcftp.xerox.com/gilbert/support-graph.ps
K. Gremban, G. Miller, M. Zagha. Performance evaluation of a parallel preconditioner. IPPS.
D. Chen, S. Toledo. Implementation and evaluation of Vaidya's preconditioners. Preconditioning. Submitted for publication, 2001.
E. Boman, B. Hendrickson. Support theory for preconditioning. Submitted for publication, 2001.
E. Boman, D. Chen, B. Hendrickson, S. Toledo. Maximum-weight-basis preconditioners. Submitted for publication, 2001.
Bruce Hendrickson's support theory web page.