Support-Graph Preconditioning
John R. Gilbert, MIT and U.C. Santa Barbara
Coauthors: Marshall Bern, Bruce Hendrickson, Nhat Nguyen, Sivan Toledo
Authors of other work surveyed: Erik Boman, Doron Chen, Keith Gremban, Bruce Hendrickson, Gary Miller, Sivan Toledo, Pravin Vaidya, Marco Zagha
Outline
The problem: preconditioning linear systems
Support graphs: Vaidya's algorithm
Analysis of two preconditioners
New directions and open problems
Systems of Linear Equations: Ax = b
A is large and sparse: say n = 10^5 to 10^8, with O(n) nonzeros.
The physical setting sometimes implies G(A) has small separators: O(n^(1/2)) in 2D, O(n^(2/3)) in 3D.
Here A is symmetric and positive (semi)definite.
Direct methods: factor A = LU.
Iterative methods: y^(k+1) = A y^(k).
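The model problem referred to throughout the talk can be built explicitly. A minimal sketch, in dense form for clarity only (the function name `poisson_2d` and the dense storage are illustrative choices; a real code would use sparse storage):

```python
import numpy as np

def poisson_2d(k):
    """5-point Laplacian on a k-by-k grid with Dirichlet boundary:
    the 2D model Poisson problem, n = k^2 unknowns."""
    n = k * k
    A = np.zeros((n, n))
    for i in range(k):
        for j in range(k):
            r = i * k + j
            A[r, r] = 4.0
            # couple to the four grid neighbors that exist
            for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                ii, jj = i + di, j + dj
                if 0 <= ii < k and 0 <= jj < k:
                    A[r, ii * k + jj] = -1.0
    return A
```

Each row has at most 5 nonzeros, so the matrix has O(n) nonzeros as stated above, and it is symmetric positive definite.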
Graphs and Sparse Matrices [Parter, … ]
[Figure: the graph G(A) and its filled graph G+(A), which is chordal]
Symmetric Gaussian elimination:
for j = 1 to n, add edges between j's higher-numbered neighbors
Fill: new nonzeros in the factor.
Fill-Reducing Matrix Permutations
Theory: approximately optimal separators => approximately optimal fill and flop count.
Orderings: nested dissection, minimum degree, hybrids.
Graph partitioning: spectral, geometric, multilevel.
Conjugate Gradients: An Iterative Method
Each step does one matrix-vector multiplication: O(#nonzeros) work.
In exact arithmetic, CG converges in n steps (completely unrealistic).
Condition number: κ(A) = ||A||_2 ||A^-1||_2 = λmax(A) / λmin(A).
CG needs O(κ(A)^(1/2)) iterations to "solve" Ax = b.
Preconditioner B: solve B^-1 A x = B^-1 b instead of Ax = b.
Want κ(B^-1 A) to be small, and By = c to be easily solved.
Conjugate Gradient Iteration

x_0 = 0, r_0 = b, p_0 = r_0
for k = 1, 2, 3, ...
    α_k = (r_{k-1}^T r_{k-1}) / (p_{k-1}^T A p_{k-1})   [step length]
    x_k = x_{k-1} + α_k p_{k-1}                          [approximate solution]
    r_k = r_{k-1} - α_k A p_{k-1}                        [residual]
    β_k = (r_k^T r_k) / (r_{k-1}^T r_{k-1})              [improvement]
    p_k = r_k + β_k p_{k-1}                              [search direction]

One matrix-vector multiplication per iteration; two vector dot products per iteration; four n-vectors of working storage.
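The iteration above translates almost line for line into code. A sketch in NumPy (the function name, tolerance, and stopping rule are illustrative choices, not from the talk); A must be symmetric positive definite:

```python
import numpy as np

def conjugate_gradient(A, b, tol=1e-10, max_iter=None):
    """Plain CG, following the iteration on the slide."""
    n = len(b)
    max_iter = n if max_iter is None else max_iter
    x = np.zeros(n)        # x_0 = 0
    r = b.astype(float)    # r_0 = b
    p = r.copy()           # p_0 = r_0
    rs_old = r @ r
    for _ in range(max_iter):
        Ap = A @ p                      # the one matvec per iteration
        alpha = rs_old / (p @ Ap)       # step length
        x = x + alpha * p               # approximate solution
        r = r - alpha * Ap              # residual
        rs_new = r @ r
        if np.sqrt(rs_new) < tol:
            break
        beta = rs_new / rs_old          # improvement
        p = r + beta * p                # search direction
        rs_old = rs_new
    return x
```

Note that A enters only through the product A·p, so only a matvec routine is needed, never the matrix entries themselves.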
Conjugate Gradient: Convergence
In exact arithmetic, CG converges in n steps (completely unrealistic!).
Accuracy after k steps of CG is related to the following question: among polynomials of degree k that are equal to 1 at 0, how small can such a polynomial be at all the eigenvalues of A? Thus, eigenvalues close together are good.
Condition number: κ(A) = ||A||_2 ||A^-1||_2 = λmax(A) / λmin(A).
The residual is reduced by a constant factor by O(κ(A)^(1/2)) iterations of CG.
Preconditioners
Suppose you had a matrix B such that:
(1) the condition number κ(B^-1 A) is small
(2) By = z is easy to solve
Then you could solve (B^-1 A)x = B^-1 b instead of Ax = b.
(Actually (B^-1/2 A B^-1/2)(B^1/2 x) = B^-1/2 b, but never mind.)
B = A is great for (1), not for (2); B = I is great for (2), not for (1).
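In practice one never forms B^-1 A; the preconditioner enters CG only through solves with B. A sketch of preconditioned CG, where `solve_B` is a caller-supplied routine returning B^-1 z (requirement (2) on the slide); names and the stopping rule are illustrative:

```python
import numpy as np

def preconditioned_cg(A, b, solve_B, tol=1e-10, max_iter=None):
    """CG on the preconditioned system, with B applied only
    through solve_B(z) = B^{-1} z."""
    n = len(b)
    max_iter = n if max_iter is None else max_iter
    x = np.zeros(n)
    r = b.astype(float)
    z = solve_B(r)
    p = z.copy()
    rz_old = r @ z
    for _ in range(max_iter):
        Ap = A @ p
        alpha = rz_old / (p @ Ap)
        x = x + alpha * p
        r = r - alpha * Ap
        if np.linalg.norm(r) < tol:
            break
        z = solve_B(r)                 # one easy solve with B per iteration
        rz_new = r @ z
        p = z + (rz_new / rz_old) * p
        rz_old = rz_new
    return x
```

With `solve_B = lambda r: r` this reduces to plain CG (B = I); with a spanning-tree or incomplete-factor B, each `solve_B` call is a pair of cheap triangular solves.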
Incomplete Cholesky Factorization (IC, ILU)
Compute factors of A by Gaussian elimination, but ignore fill.
Preconditioner B = R^T R ≈ A, not formed explicitly.
Compute B^-1 z by triangular solves, in time nnz(A).
Total storage is O(nnz(A)), with a static data structure.
Either symmetric (IC) or nonsymmetric (ILU).
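The zero-fill variant IC(0) is just Cholesky with every update restricted to the sparsity pattern of A. A minimal dense sketch (a real code uses sparse storage; `ichol0` is an illustrative name, and SPD input is assumed):

```python
import numpy as np

def ichol0(A):
    """IC(0): Cholesky restricted to the nonzero pattern of A,
    producing R with A ~ R^T R and no fill."""
    n = A.shape[0]
    R = np.triu(A).astype(float)
    pattern = A != 0
    for k in range(n):
        R[k, k] = np.sqrt(R[k, k])
        R[k, k + 1:] /= R[k, k]
        for i in range(k + 1, n):
            for j in range(i, n):
                if pattern[i, j]:      # drop updates outside A's pattern
                    R[i, j] -= R[k, i] * R[k, j]
    return R
```

For a tridiagonal matrix no fill ever arises, so IC(0) coincides with the exact Cholesky factor; on matrices with fill, R^T R only approximates A.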
Incomplete Cholesky and ILU: Variants
Allow one or more "levels of fill": unpredictable storage requirements.
Allow fill whose magnitude exceeds a "drop tolerance": may give better approximate factors than levels of fill, but storage is again unpredictable and the choice of tolerance is ad hoc.
Partial pivoting (for nonsymmetric A).
"Modified ILU" (MIC): add dropped fill to the diagonal of U or R, so that A and R^T R have the same row sums; good in some PDE contexts.
Incomplete Cholesky and ILU: Issues
Choice of parameters. Good: smooth transition from iterative to direct methods. Bad: very ad hoc and problem-dependent. Tradeoff: time per iteration (more fill => more time) vs. number of iterations (more fill => fewer iterations).
Effectiveness. The condition number usually improves only by a constant factor (except MIC for some problems from PDEs); still, often good when tuned for a particular class of problems.
Parallelism. Triangular solves are not very parallel; reordering for parallel triangular solve by graph coloring.
Sparse Approximate Inverses
Compute an approximation B^-1 ≈ A^-1 explicitly.
Minimize ||B^-1 A - I||_F (in parallel, by columns).
Variants: factored form of B^-1, more fill, ...
Good: very parallel. Bad: effectiveness varies widely.
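The column-by-column minimization can be sketched directly; each column is an independent small least-squares problem, which is why the method parallelizes so well. This sketch uses the common right-inverse convention, minimizing ||A·M - I||_F rather than the slide's ||B^-1 A - I||_F; the function name `spai` and the `pattern` argument (list of allowed nonzero rows per column) are illustrative assumptions:

```python
import numpy as np

def spai(A, pattern):
    """Sparse approximate inverse M ~ A^{-1}: minimize ||A M - I||_F
    one column at a time, restricted to the given sparsity pattern."""
    n = A.shape[0]
    M = np.zeros((n, n))
    for j in range(n):
        rows = pattern[j]              # allowed nonzero rows of column j
        e = np.zeros(n)
        e[j] = 1.0
        # least-squares fit of column j against the j-th unit vector
        m, *_ = np.linalg.lstsq(A[:, rows], e, rcond=None)
        M[rows, j] = m
    return M
```

With a full pattern the columns solve A m = e_j exactly, so M recovers A^-1; a sparse pattern trades accuracy for cheap storage and application.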
Support-Graph Preconditioning
+: new analytic tools, some new preconditioners
+: can use existing direct-methods software
-: current theory and techniques are limited
The approach: define a preconditioner B for matrix A; explicitly compute the factorization B = LU; choose the nonzero structure of B to make factoring cheap (using combinatorial tools from direct methods); prove bounds on the condition number using both algebraic and combinatorial tools.
Spanning Tree Preconditioner [Vaidya]
A is symmetric positive definite with negative off-diagonal nonzeros.
B is a maximum-weight spanning tree for A (with the diagonal modified to preserve row sums).
Factor B in O(n) space and O(n) time; applying the preconditioner costs O(n) time per iteration.
[Figure: G(A) and its spanning tree G(B)]
Spanning Tree Preconditioner [Vaidya], continued
Support each edge of A by a path in B.
dilation(A edge) = length of its supporting path in B
congestion(B edge) = number of A edges whose supporting paths use it
p = max congestion, q = max dilation
The condition number κ(B^-1 A) is bounded by p·q (at most O(n^2)).
[Figure: an edge of G(A) and its supporting path in G(B)]
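Congestion and dilation can be computed by routing every edge of A along its unique path in the tree B. A small unweighted sketch (the slide's bound also involves edge-weight ratios; here all weights are taken equal, and the function names are illustrative):

```python
from collections import defaultdict, deque

def tree_path(adj, u, v):
    """Edges of the unique u-v path in a tree, found by BFS."""
    prev = {u: None}
    queue = deque([u])
    while queue:
        x = queue.popleft()
        if x == v:
            break
        for y in adj[x]:
            if y not in prev:
                prev[y] = x
                queue.append(y)
    path, x = [], v
    while prev[x] is not None:
        path.append((min(x, prev[x]), max(x, prev[x])))
        x = prev[x]
    return path

def support_bound(edges, tree_edges):
    """p*q bound on kappa(B^-1 A): p = max congestion over tree
    edges, q = max dilation over supporting paths."""
    adj = defaultdict(list)
    for u, v in tree_edges:
        adj[u].append(v)
        adj[v].append(u)
    congestion = defaultdict(int)
    q = 0
    for u, v in edges:
        path = tree_path(adj, u, v)    # supporting path for this A edge
        q = max(q, len(path))          # dilation
        for e in path:
            congestion[e] += 1         # one more A edge routed through e
    p = max(congestion.values())
    return p * q
```

For a 4-cycle supported by the path 0-1-2-3, the chord (0,3) is routed along all three tree edges, giving p = 2, q = 3, and bound 6.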
Spanning Tree Preconditioner [Vaidya], continued
Congestion and dilation can be improved by adding a few strategically chosen edges to B.
Cost of factor + solve is O(n^1.75), or O(n^1.2) if A is planar.
In experiments by Chen & Toledo, often better than drop-tolerance MIC for 2D problems, but not for 3D.
Support Graphs [after Gremban/Miller/Zagha]
Intuition from resistive networks: how much must you amplify B to provide the conductivity of A?
The support of B for A is σ(A, B) = min { τ : x^T (tB - A) x ≥ 0 for all x and all t ≥ τ }.
In the SPD case, σ(A, B) = max { λ : Ax = λBx } = λmax(A, B).
Theorem: if A and B are SPD, then κ(B^-1 A) = σ(A, B) · σ(B, A).
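For small dense SPD matrices the support and the theorem can be checked numerically: σ(A, B) is the largest generalized eigenvalue of the pencil (A, B), computable via the symmetric reduction B^(-1/2) A B^(-1/2). A sketch (the function name `support` is an illustrative choice):

```python
import numpy as np

def support(A, B):
    """sigma(A, B) = lambda_max of the pencil Ax = lambda Bx,
    for SPD A and B (dense demo via B^{-1/2} A B^{-1/2})."""
    w, Q = np.linalg.eigh(B)
    Bmh = Q @ np.diag(1.0 / np.sqrt(w)) @ Q.T   # B^{-1/2}
    return np.linalg.eigvalsh(Bmh @ A @ Bmh).max()
```

Since σ(B, A) is the reciprocal of the smallest generalized eigenvalue of (A, B), the product σ(A, B)·σ(B, A) is exactly λmax/λmin, the condition number κ(B^-1 A) of the theorem.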
Splitting and Congestion/Dilation Lemmas
Split A = A_1 + A_2 + ··· + A_k and B = B_1 + B_2 + ··· + B_k, where the A_i and B_i are positive semidefinite. Typically they correspond to pieces of the graphs of A and B (an edge, a path, a small subgraph).
Lemma: σ(A, B) ≤ max_i { σ(A_i, B_i) }
Lemma: σ(edge, path) ≤ (worst weight ratio) · (path length)
In the MST case:
A_i is an edge and B_i is a path, giving σ(A, B) ≤ p·q;
B_i is an edge and A_i is the same edge, giving σ(B, A) ≤ 1.
Support-Graph Analysis of Modified Incomplete Cholesky
A = 2D model Poisson problem; B = MIC preconditioner for A.
B has positive (dotted) edges that cancel fill, and B has the same row sums as A.
Strategy: use the negative edges of B to support both the negative edges of A and the positive edges of B.
[Figure: the grid graphs of A (edge weights -1) and B (edge weights -1 and +0.5)]
Supporting Positive Edges of B
Every dotted (positive) edge in B is supported by two paths in B.
Each solid edge of B supports one or two dotted edges.
Tune fractions to support each dotted edge exactly; 1/(2√n - 2) of each solid edge is left over to support an edge of A.
Analysis of MIC: Summary
Each edge of A is supported by the leftover 1/(2√n - 2) fraction of the same edge of B; therefore σ(A, B) ≤ 2√n - 2.
It is easy to show σ(B, A) ≤ 1.
For this 2D model problem, the condition number is O(n^(1/2)).
A similar argument in 3D gives condition number O(n^(1/3)) or O(n^(2/3)), depending on boundary conditions.
Support-Graph Analysis for Better Preconditioners?
For model problems, the preconditioners analyzed so far are not as efficient as multigrid, domain decomposition, etc.
Gremban/Miller: a hierarchical support-graph preconditioner, but the condition number is still not polylogarithmic.
We analyze a multilevel-diagonal-scaling-like preconditioner in 1D... but we haven't proved tight bounds in higher dimensions.
Hierarchical Preconditioner (1D Model Problem)
Good support in both directions: σ(A, B) = σ(B, A) = O(1).
But B is a mesh => expensive to factor and to apply.
[Figure: the 1D graphs of A and B, with edge weights -1 and -2]
Hierarchical Preconditioner, continued
Drop fill in the factor of B (i.e., add positive edges) => factoring and preconditioning are cheap.
But B cannot support both A and its own positive edges; σ(A, B) is infinite.
[Figure: graphs of A and B after dropping fill, with positive edges of weight 0.5]
Hierarchical Preconditioner, continued
Solution: add a coarse mesh to support the positive edges.
Now σ(A, B) ≤ log_2(n+1).
An elimination/splitting analysis gives σ(B, A) = 1.
Therefore the condition number is O(log n).
[Figure: graphs of A and B with the added coarse mesh]
Generalization to Higher Dimensions?
Idea: mimic the 1D construction.
Generate a coarsening hierarchy with overlapping subdomains.
Use σ(B, A) = 1 as a constraint to choose weights.
This defines a preconditioner for regular or irregular problems.
Sublinear bounds on σ(A, B) for some 2D model problems.
(Very preliminary) experiments show slow growth in κ(B^-1 A).

Model problem on a triangular mesh:

    n:                  45    153    561    2145    4753
    ICC(0) iterations:   8     12     19      33      45
    New iterations:     14     15            17      20
Algebraic Framework
The support of B for A is σ(A, B) = min { τ : x^T (tB - A) x ≥ 0 for all x and all t ≥ τ }.
In the SPD case, σ(A, B) = max { λ : Ax = λBx } = λmax(A, B).
If A and B are SPD, then κ(B^-1 A) = σ(A, B) · σ(B, A).
[Boman/Hendrickson] If V·W = U, then σ(U·U^T, V·V^T) ≤ ||W||_2^2.
Algebraic Framework [Boman/Hendrickson]
Lemma: if V·W = U, then σ(U·U^T, V·V^T) ≤ ||W||_2^2.
Proof: take t ≥ ||W||_2^2 = λmax(W·W^T) = max_{x ≠ 0} { x^T W·W^T x / x^T x }.
Then x^T (tI - W·W^T) x ≥ 0 for all x.
Letting x = V^T y gives y^T (t V·V^T - U·U^T) y ≥ 0 for all y.
Recall σ(A, B) = min { τ : x^T (tB - A) x ≥ 0 for all x and all t ≥ τ }.
Thus σ(U·U^T, V·V^T) ≤ t.
Example
A = U·U^T and B = V·V^T, where

    U = [  a   b   0 ]      V = [  a   b ]      W = [ 1   0   -c/a ]
        [ -a   0   c ]          [ -a   0 ]          [ 0   1    c/b ]
        [  0  -b  -c ]          [  0  -b ]

so that V·W = U. Explicitly,

    A = [ a^2+b^2   -a^2      -b^2    ]      B = [ a^2+b^2   -a^2    -b^2 ]
        [ -a^2      a^2+c^2   -c^2    ]          [ -a^2       a^2      0  ]
        [ -b^2      -c^2      b^2+c^2 ]          [ -b^2        0      b^2 ]

Then

    σ(A, B) ≤ ||W||_2^2 ≤ ||W||_∞ × ||W||_1
            = (max row sum) × (max col sum)
            ≤ (max congestion) × (max dilation)
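The identities in this example can be checked numerically. A short sketch (the numeric values of a, b, c are arbitrary choices for the check, not from the talk):

```python
import numpy as np

# Triangle with edge weights a^2, b^2, c^2; the tree B keeps edges a, b.
a, b, c = 2.0, 3.0, 1.5
U = np.array([[ a,   b,   0.0],
              [-a,   0.0, c  ],
              [ 0.0, -b,  -c ]])
V = np.array([[ a,   b  ],
              [-a,   0.0],
              [ 0.0, -b ]])
W = np.array([[1.0, 0.0, -c / a],
              [0.0, 1.0,  c / b]])

assert np.allclose(V @ W, U)            # V·W = U

s2  = np.linalg.norm(W, 2) ** 2         # ||W||_2^2
row = np.abs(W).sum(axis=1).max()       # ||W||_inf: max row sum
col = np.abs(W).sum(axis=0).max()       # ||W||_1:  max col sum
assert s2 <= row * col + 1e-12          # ||W||_2^2 <= ||W||_inf ||W||_1
```

The last inequality is the standard norm bound ||W||_2^2 ≤ ||W||_∞ ||W||_1, which is what connects the algebraic lemma to congestion and dilation.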
Open Problems I
Other subgraph constructions for better bounds on ||W||_2^2?
For example [Boman], ||W||_2^2 ≤ ||W||_F^2 = sum(w_ij^2) = sum of (weighted) dilations, and [Alon, Karp, Peleg, West] show there exists a spanning tree with average weighted dilation exp(O((log n loglog n)^(1/2))) = o(n^ε); this gives condition number O(n^(1+ε)) and solution time O(n^(1.5+ε)), compared to Vaidya's O(n^1.75) with an augmented spanning tree.
Is there a construction that minimizes ||W||_2^2 directly?
Open Problems II
Make spanning tree methods more effective in 3D? Vaidya gives O(n^1.75) in general, O(n^1.2) in 2D; the issue is that the 2D bound uses bounded excluded minors, not just separators.
Spanning tree methods for more general matrices? All SPD matrices? ([Boman, Chen, Hendrickson, Toledo]: a different matroid for all diagonally dominant SPD matrices.) Finite element problems? ([Boman]: an element-by-element preconditioner for bilinear quadrilateral elements.)
Analyze a multilevel method in general?
Complexity of Linear Solvers
Time to solve the model problem (Poisson's equation) on a regular mesh
(n^(1/2) × n^(1/2) in 2D, n^(1/3) × n^(1/3) × n^(1/3) in 3D):

                            2D           3D
    Sparse Cholesky:        O(n^1.5)     O(n^2)
    CG, exact arithmetic:   O(n^2)       O(n^2)
    CG, no precond.:        O(n^1.5)     O(n^1.33)
    CG, modified IC:        O(n^1.25)    O(n^1.17)
    CG, support trees:      O(n^1.20)    O(n^(1.5+ε))
    Multigrid:              O(n)         O(n)
References
M. Bern, J. Gilbert, B. Hendrickson, N. Nguyen, S. Toledo. Support-graph preconditioners. Submitted for publication, 2001. ftp://parcftp.xerox.com/gilbert/support-graph.ps
K. Gremban, G. Miller, M. Zagha. Performance evaluation of a parallel preconditioner. IPPS 1995.
D. Chen, S. Toledo. Implementation and evaluation of Vaidya's preconditioners. Preconditioning 2001; submitted for publication, 2001.
E. Boman, B. Hendrickson. Support theory for preconditioning. Submitted for publication, 2001.
E. Boman, D. Chen, B. Hendrickson, S. Toledo. Maximum-weight-basis preconditioners. Submitted for publication, 2001.
Bruce Hendrickson's support theory web page: http://www.cs.sandia.gov/~bahendr/support.html