Iterative Methods and Combinatorial Preconditioners


This talk is not about…

Credits
Solving Symmetric Diagonally-Dominant Systems By Preconditioning. Bruce Maggs, Gary Miller, Ojas Parekh, R. Ravi, Maverick Woo.
Combinatorial Preconditioners for Large, Sparse, Symmetric, Diagonally-Dominant Linear Systems. Keith Gremban (CMU PhD 1996).

Linear Systems
Ax = b, where A is a known n by n matrix, b is a known n by 1 vector, and x is an unknown n by 1 vector. A useful way to do matrix algebra in your head: matrix-vector multiplication is a linear combination of the matrix columns.

Matrix-Vector Multiplication
Writing the columns of A as A1, A2, A3, we have A (a, b, c)ᵀ = a·A1 + b·A2 + c·A3. Using BᵀAᵀ = (AB)ᵀ, xᵀA should also be interpreted as a linear combination of the rows of A.
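A quick NumPy sketch of this view; the 3 by 3 matrix and the coefficients (a, b, c) below are made-up values, not anything from the talk:

```python
import numpy as np

# Hypothetical 3 by 3 matrix A and coefficient vector x = (a, b, c).
A = np.array([[1.0, 4.0, 7.0],
              [2.0, 5.0, 8.0],
              [3.0, 6.0, 9.0]])
x = np.array([2.0, -1.0, 3.0])

# A @ x equals the linear combination a*A[:,0] + b*A[:,1] + c*A[:,2].
assert np.allclose(A @ x, x[0] * A[:, 0] + x[1] * A[:, 1] + x[2] * A[:, 2])

# Likewise, x^T A is the same linear combination of the rows of A.
assert np.allclose(x @ A, x[0] * A[0] + x[1] * A[1] + x[2] * A[2])
```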

How to Solve?
Ax = b. Options:
- Find A⁻¹
- Guess x repeatedly until we guess a solution
- Gaussian Elimination
Strassen had a faster method to find A⁻¹.

Large Linear Systems in The Real World

Circuit Voltage Problem
Given a resistive network and the net current flow at each terminal, find the voltage at each node. A node with an external connection is a terminal; otherwise it is a junction. Conductance is the reciprocal of resistance; its unit is the siemens (S). (Figure: a four-node example with conductances 3S, 2S, 1S, 1S and terminal currents 2A, 2A, and 4A.)

Kirchhoff's Law of Current
At each node, the net current flow is 0. Using I = VC on each incident edge, consider v1: we have
2 + 3(v2 − v1) + ⋯ = 0,
which after regrouping yields
3(v1 − v2) + ⋯ = 2.
(Figure: the example network with its conductances and terminal currents.)

Summing Up
Writing Kirchhoff's equation at every node gives one linear equation per node, i.e., a small linear system in v1 through v4. (Figure: the example network alongside its system of equations.)

Solving
Solving that system gives the node voltages; in the example they come out to 2V, 2V, 3V, and 0V. (Figure: the solved example network.)

Did I say "LARGE"?
Imagine this being the power grid of America.

Laplacians
Given a weighted, undirected graph G = (V, E), we can represent it as a Laplacian matrix. (Figure: the example network viewed as a graph with edge weights 3, 2, 1, 1.)

Laplacians
Laplacians have many interesting properties, such as:
- Diagonal entries ≥ 0: each is the total weight incident to that vertex
- Off-diagonal entries ≤ 0: each nonzero off-diagonal is the negated weight of an individual edge
- Every row sums to 0
- Symmetric
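A small sketch of building a Laplacian from a weighted edge list and checking these properties; it assumes NumPy, and the four edges below are illustrative rather than an exact copy of the figure:

```python
import numpy as np

def laplacian(n, weighted_edges):
    """Laplacian of an undirected graph on n vertices.

    weighted_edges is a list of (u, v, w) triples with 0-based vertex ids.
    """
    L = np.zeros((n, n))
    for u, v, w in weighted_edges:
        L[u, u] += w           # diagonal: total weight incident to u
        L[v, v] += w
        L[u, v] -= w           # off-diagonal: negated edge weight
        L[v, u] -= w
    return L

edges = [(0, 1, 3.0), (0, 3, 2.0), (1, 2, 1.0), (2, 3, 1.0)]
L = laplacian(4, edges)
assert np.allclose(L, L.T)                # symmetric
assert np.allclose(L.sum(axis=1), 0.0)    # every row sums to 0
assert (np.diag(L) >= 0).all()            # nonnegative diagonal
```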

Net Current Flow Lemma Suppose an n by n matrix A is the Laplacian of a resistive network G with n nodes. If y is the n-vector specifying the voltage at each node of G, then Ay is the n-vector representing the net current flow at each node.

Power Dissipation Lemma Suppose an n by n matrix A is the Laplacian of a resistive network G with n nodes. If y is the n-vector specifying the voltage at each node of G, then yTAy is the total power dissipated by G.

Sparsity
Laplacians arising in practice are usually sparse. The i-th row has (d+1) nonzeros if vi has d neighbors.

Sparse Matrix
An n by n matrix is sparse when it has O(n) nonzeros. A reasonably-sized power grid has far more junctions than this example, and each junction has only a couple of neighbors.

Had Gauss owned a supercomputer… (Would he really work on Gaussian Elimination?)

A Model Problem
Let G(x, y) and g(x, y) be continuous functions defined in R and S respectively, where R is the region and S is the boundary of the unit square (as in the figure).

A Model Problem
We seek a function u(x, y) that satisfies Poisson's equation in R and the boundary condition on S:
∂²u/∂x² + ∂²u/∂y² = G(x, y) in R,
u(x, y) = g(x, y) on S.
If g(x, y) = 0, this is called a Dirichlet boundary condition.

Discretization
Imagine a uniform grid over the unit square with a small spacing h.

Five-Point Difference
Replace the partial derivatives by difference quotients:
∂²u/∂x² ≈ [u(x + h, y) − 2u(x, y) + u(x − h, y)] / h²
∂²u/∂y² ≈ [u(x, y + h) − 2u(x, y) + u(x, y − h)] / h²
Poisson's equation now becomes
4u(x, y) − u(x + h, y) − u(x − h, y) − u(x, y + h) − u(x, y − h) = −h² G(x, y).
Exercise: derive the five-point difference equation from first principles (as a limit).

For each point in R
we get one such equation, so the total number of equations is (1/h − 1)². Writing them in matrix form, we've got one BIG linear system to solve!

An Example
Consider u3,1. We have
4u3,1 − u2,1 − u4,1 − u3,2 − u3,0 = −h² G3,1,
which can be rearranged, moving the known boundary value u3,0 to the right-hand side, into
4u3,1 − u2,1 − u4,1 − u3,2 = −h² G3,1 + u3,0.

An Example
Each row and column of the resulting matrix can have a maximum of 5 nonzeros.
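A sketch of assembling this system for an m by m interior grid (dense for clarity; the function name and grid size are made up, and boundary values are assumed to have been moved to the right-hand side as in the u3,1 example above):

```python
import numpy as np

def poisson_5pt(m):
    """5-point difference matrix for the m*m interior points of the unit square."""
    n = m * m
    A = np.zeros((n, n))

    def idx(i, j):
        return i * m + j      # row-major numbering of interior grid points

    for i in range(m):
        for j in range(m):
            k = idx(i, j)
            A[k, k] = 4.0
            for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                ni, nj = i + di, j + dj
                if 0 <= ni < m and 0 <= nj < m:
                    A[k, idx(ni, nj)] = -1.0
    return A

A = poisson_5pt(3)            # the 3 by 3 example gives a 9 by 9 system
assert (np.count_nonzero(A, axis=1) <= 5).all()   # at most 5 nonzeros per row
```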

Sparse Matrix Again Really, it’s rare to see large dense matrices arising from applications.

Laplacian??? I showed you a system that is not quite Laplacian. We’ve got way too many boundary points in a 3x3 example.

Making It Laplacian We add a dummy variable and force it to zero. (How to force? Well, look at the rank of this matrix first…)

Sparse Matrix Representation
A simple scheme: an array of columns, where each column Aj is a linked list of tuples (i, x).
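A minimal sketch of that scheme in Python (plain lists standing in for linked lists; the function names are made up):

```python
def from_dense(A):
    """Array of columns; column j is a list of (i, x) pairs with x != 0."""
    n = len(A)
    return [[(i, A[i][j]) for i in range(n) if A[i][j] != 0] for j in range(n)]

def matvec(cols, x):
    """Compute y = A x by accumulating x[j] times the j-th sparse column."""
    y = [0.0] * len(cols)
    for j, col in enumerate(cols):
        for i, a_ij in col:
            y[i] += a_ij * x[j]
    return y

cols = from_dense([[4, -1, 0], [-1, 4, -1], [0, -1, 4]])
print(matvec(cols, [1.0, 2.0, 3.0]))   # [2.0, 4.0, 10.0]
```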

Solving Sparse Systems Gaussian Elimination again? Let’s look at one elimination step.

Solving Sparse Systems Gaussian Elimination introduces fill.


Fill
Of course it depends on the elimination order. Finding an elimination order with minimum fill is hopeless: it is NP-hard (Garey and Johnson, problem GT46; Yannakakis, SIAM J. Alg. Disc. Meth., 1981). There is an O(log n) approximation: Sudipto Guha, "Nested Graph Dissection and Approximation Algorithms," FOCS 2000. There is also an Ω(n log n) lower bound on fill (Maverick still has not dug up the paper…).

When Fill Matters…
Ax = b. Options:
- Find A⁻¹
- Guess x repeatedly until we guess a solution
- Gaussian Elimination

Inverses of Sparse Matrices
…are not necessarily sparse either! (Figure: a sparse matrix B whose inverse B⁻¹ is dense.)

And the winner is…
Ax = b. With finding A⁻¹ and Gaussian Elimination ruled out, that leaves: guess x repeatedly until we guess a solution. Can we be so lucky?

Iterative Methods
Checkpoint:
- How large linear systems actually arise in practice
- Why Gaussian Elimination may not be the way to go

The Basic Idea
Start off with a guess x(0). Use x(i) to compute x(i+1) until convergence. We hope that:
- the process converges in a small number of iterations
- each iteration is efficient

The RF Method [Richardson, 1910]
x(i+1) = x(i) + (b − A x(i)); the quantity b − A x(i) is the residual. (Figure: A maps the domain to the range; each iteration tests the current guess by computing A x(i), compares it with b, and corrects x(i) accordingly.)
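A sketch of this iteration, assuming it is the plain update x(i+1) = x(i) + (b − A x(i)); the test matrix, iteration cap, and tolerance are illustrative:

```python
import numpy as np

def richardson(A, b, iters=100, tol=1e-8):
    """Plain RF (Richardson) iteration: x <- x + (b - A x)."""
    x = np.zeros_like(b)
    for _ in range(iters):
        r = b - A @ x          # the residual: how far A x is from b
        if np.linalg.norm(r) < tol:
            break
        x = x + r              # correct x by the residual
    return x

# Converges here because rho(I - A) < 1 for this made-up matrix.
A = np.array([[1.0, 0.3], [0.3, 1.0]])
b = np.array([1.0, 2.0])
print(richardson(A, b), np.linalg.solve(A, b))
```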

Why should it converge at all?
x(i+1) = x(i) + (b − A x(i))

It only converges when…
Theorem. A first-order stationary iterative method x(i+1) = G x(i) + k converges iff ρ(G) < 1, where ρ(G) is the maximum absolute eigenvalue (spectral radius) of G. For the RF method, G = I − A and k = b.

Fate?
Ax = b. Once we are given the system, we have no control over A and b. How can we even guarantee convergence?

Preconditioning
B⁻¹A x = B⁻¹b. Instead of dealing with A and b, we now deal with B⁻¹A and B⁻¹b. The word "preconditioning" originated with Turing in 1948, but his idea was slightly different.

Preconditioned RF
x(i+1) = x(i) + B⁻¹b − B⁻¹A x(i). Since we may precompute B⁻¹b by solving By = b, each iteration is dominated by computing B⁻¹A x(i), which is a multiplication step A x(i) followed by a direct-solve step Bz = A x(i). Hence a preconditioned iterative method is in fact a hybrid of an iterative and a direct method.

The Art of Preconditioning
We have a lot of flexibility in choosing B, but:
- Solving Bz = A x(i) must be fast
- B should approximate A well for a low iteration count
(Figure: a spectrum of choices from B = I, which is trivial to solve but no help, to B = A, which approximates perfectly but then what's the point?)

Classics
x(i+1) = x(i) + B⁻¹b − B⁻¹A x(i).
Jacobi: let D be the diagonal sub-matrix of A; pick B = D.
Gauss-Seidel: let L be the strictly lower triangular part of A (zero diagonal); pick B = L + D.
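A sketch of the Jacobi choice plugged into preconditioned RF (assuming NumPy; the function name, test system, and tolerances are made up). With B = D, the direct-solve step Bz = A x(i) is just a division by the diagonal:

```python
import numpy as np

def jacobi_rf(A, b, iters=200, tol=1e-10):
    """Preconditioned RF with B = diag(A): x <- x + B^{-1}(b - A x)."""
    d = np.diag(A)             # B is the diagonal of A
    x = np.zeros_like(b)
    for _ in range(iters):
        r = b - A @ x
        if np.linalg.norm(r) < tol:
            break
        x = x + r / d          # the "direct-solve" step B z = r is trivial here
    return x

# A diagonally dominant, made-up system on which Jacobi converges.
A = np.array([[4.0, -1.0, 0.0], [-1.0, 4.0, -1.0], [0.0, -1.0, 4.0]])
b = np.array([2.0, 4.0, 10.0])
print(jacobi_rf(A, b), np.linalg.solve(A, b))
```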

“Combinatorial” We choose to measure how well B approximates A by comparing combinatorial properties of (the graphs represented by) A and B. Hence the term “Combinatorial Preconditioner”.

Questions?

Graphs as Matrices

Edge-Vertex Incidence Matrix
Given an undirected graph G = (V, E), let Γ be an |E| × |V| matrix with entries in {−1, 0, 1}. For each edge e = (u, v), set Γe,u to −1 and Γe,v to 1. All other entries are zero.

Edge-Vertex Incidence Matrix
(Figure: the incidence matrix of the running example graph on vertices a through j.)

Weighted Graphs
Let W be an |E| × |E| diagonal matrix where We,e is the weight of edge e. (Figure: the example graph with edge weights 3, 6, 2, 1, 8, 9, 4, 5.)

Laplacian
The Laplacian of G is defined to be ΓᵀWΓ.

Properties of Laplacians
Let L = ΓᵀWΓ. "Prove by example" on the weighted example graph.


Properties of Laplacians
A matrix A is positive semidefinite if xᵀAx ≥ 0 for all x. Since L = ΓᵀWΓ, it's easy to see that for all x,
xᵀ(ΓᵀWΓ)x = ‖W^{1/2} Γ x‖₂² ≥ 0,
so Laplacians are positive semidefinite.
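A small numeric check of L = ΓᵀWΓ and of this quadratic form, assuming NumPy; the edge list and weights are illustrative, not the exact figure:

```python
import numpy as np

edges = [(0, 1, 3.0), (0, 3, 2.0), (1, 2, 1.0), (2, 3, 1.0)]   # (u, v, weight)
n, m = 4, len(edges)

Gamma = np.zeros((m, n))                    # edge-vertex incidence matrix
for e, (u, v, w) in enumerate(edges):
    Gamma[e, u], Gamma[e, v] = -1.0, 1.0    # -1 at u, +1 at v
W = np.diag([w for _, _, w in edges])       # diagonal edge-weight matrix

L = Gamma.T @ W @ Gamma
assert np.allclose(L, L.T) and np.allclose(L.sum(axis=1), 0.0)

# x^T L x = sum over edges of w * (x_u - x_v)^2 >= 0, so L is PSD.
x = np.array([2.0, -1.0, 0.5, 3.0])
assert np.isclose(x @ L @ x, sum(w * (x[u] - x[v]) ** 2 for u, v, w in edges))
```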

Laplacian
The Laplacian of G is defined to be ΓᵀWΓ. (Figure: the full Laplacian matrix of the example graph on vertices a through j.)

Graph Embedding A Primer

Graph Embedding
A vertex in G maps to a vertex in H; an edge in G maps to a path in H. G is the guest and H is the host. (Figure: an embedding of the example guest graph into a host graph on the same vertices a through j.)

Dilation
For each edge e in G, define dil(e) to be the number of edges in its corresponding path in H.

Congestion
For each edge e in H, define cong(e) to be the number of embedding paths that use e.
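A tiny sketch of computing both quantities from an embedding; the dictionary below, mapping each guest edge to its path of host edges, is hypothetical:

```python
from collections import Counter

# Hypothetical embedding: guest edge -> list of host edges on its path.
embedding = {
    ("a", "b"): [("a", "e"), ("e", "b")],
    ("c", "d"): [("c", "f"), ("f", "g"), ("g", "d")],
}

# dil(e): number of host edges on the path for guest edge e.
dilation = {e: len(path) for e, path in embedding.items()}

# cong(e): number of embedding paths that use host edge e.
congestion = Counter(h for path in embedding.values() for h in path)

print(max(dilation.values()), max(congestion.values()))
```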

Support Theory

Disclaimer The presentation to follow is only “essentially correct”.

Support
Definition. The support required by a matrix B for a matrix A, both n by n in size, is defined as
σ(A/B) := min { τ ∈ ℝ : xᵀ(τB − A)x ≥ 0 for all x }.
Think of B supporting A at the bottom.

Support With Laplacians Life is Good when the matrices are Laplacians. Remember the resistive circuit analogy?

Power Dissipation Lemma Suppose an n by n matrix A is the Laplacian of a resistive network G with n nodes. If y is the n-vector specifying the voltage at each node of G, then yTAy is the total power dissipated by G.

Circuit-Speak
Read σ(A/B) := min { τ ∈ ℝ : xᵀ(τB − A)x ≥ 0 for all x } aloud in circuit-speak: "The support for A by B is the minimum number τ so that, for all possible voltage settings, τ copies of B burn at least as much energy as one copy of A."

Congestion-Dilation Lemma
Given an embedding from G to H, σ(G/H) ≤ (max over e of cong(e)) · (max over e of dil(e)).

Transitivity
σ(A/C) ≤ σ(A/B) · σ(B/C).
Pop Quiz: for Laplacians, prove this in circuit-speak.

Generalized Condition Number
Definition. The generalized condition number of a pair of PSD matrices A and B is κ(A, B) = σ(A/B) · σ(B/A). (A is positive semidefinite iff xᵀAx ≥ 0 for all x.)

Preconditioned Conjugate Gradient
Solving the system Ax = b using PCG with preconditioner B requires at most O(√κ(A, B) · log(1/ε)) iterations to find a solution x̃ such that ‖x̃ − x*‖_A ≤ ε ‖x*‖_A, where x* is the exact solution. The convergence rate depends on the actual iterative method used.
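A minimal PCG sketch with a Jacobi preconditioner; this is not the talk's implementation, and the solve_B callback, test matrix, and tolerances are assumptions for illustration:

```python
import numpy as np

def pcg(A, b, solve_B, iters=200, tol=1e-10):
    """Preconditioned conjugate gradient; solve_B(r) applies B^{-1} to r."""
    x = np.zeros_like(b)
    r = b - A @ x              # residual
    z = solve_B(r)             # preconditioned residual
    p = z.copy()               # search direction
    rz = r @ z
    for _ in range(iters):
        Ap = A @ p
        alpha = rz / (p @ Ap)
        x = x + alpha * p
        r = r - alpha * Ap
        if np.linalg.norm(r) < tol:
            break
        z = solve_B(r)
        rz_new = r @ z
        p = z + (rz_new / rz) * p
        rz = rz_new
    return x

# Made-up SPD system with B = diag(A) as the preconditioner.
A = np.array([[4.0, 1.0, 0.0], [1.0, 3.0, 1.0], [0.0, 1.0, 2.0]])
b = np.array([1.0, 2.0, 3.0])
d = np.diag(A)
print(pcg(A, b, lambda r: r / d), np.linalg.solve(A, b))
```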

Support Trees

Information Flow
x(i+1) = x(i) + b − A x(i). In many iterative methods, the only operation using A directly is computing A x(i) in each iteration. Imagine each node is an agent maintaining its value. The update formula specifies how each agent should update its value for round (i+1) given all the values in round i.

The Problem With Multiplication
Only neighbors can "communicate" in a multiplication, which happens once per iteration.

Diameter As A Natural Lower Bound
In general, for a node to settle on its final value, it needs to "know" at least the initial values of the other nodes. The diameter is the maximum shortest-path distance between any pair of nodes.

Preconditioning As Shortcutting
x(i+1) = x(i) + B⁻¹b − B⁻¹A x(i). By picking B carefully, we can introduce shortcuts for faster communication. But is it easy to find shortcuts in a sparse graph to reduce its diameter?

Square Mesh
Let's pick the complete graph induced on all the mesh points. Mesh: O(n) edges. Complete graph: O(n²) edges.

Bang! So exactly how do we propose to solve a dense n by n system faster than a sparse one? B can have at most O(n) edges, i.e., sparse…

Support Tree
Build a Steiner tree to introduce shortcuts! If we pick a balanced tree, no node will be farther than O(log n) hops from any other. We still need to specify weights on the tree edges.

Mixing Speed
The speed of communication is proportional to the corresponding coefficients on the paths between nodes.

Setting Weights
The nodes should be able to talk at least as fast as they could without the shortcuts. How about setting all the tree weights to 1? (Figure: the support tree with its edge weights marked as unknowns.)

Recall PCG's Convergence Rate
Solving the system Ax = b using PCG with preconditioner B requires at most O(√κ(A, B) · log(1/ε)) iterations to find a solution x̃ such that ‖x̃ − x*‖_A ≤ ε ‖x*‖_A.

Size Matters
How big is the preconditioner matrix B? It is (2n − 1) by (2n − 1). The major contribution (among others) of our paper is dealing with this disparity in size.

The Search Is Hard
Finding the "right" preconditioner is really a tradeoff:
- Solving Bz = A x(i) must be fast
- B should approximate A well for a low iteration count
It could very well be harder than we think.

How to Deal With Steiner Nodes?

The Trouble of Steiner Nodes
Both the computation, e.g., x(i+1) = x(i) + B⁻¹b − B⁻¹A x(i), and the definitions, e.g., σ(B/A) := min { τ ∈ ℝ : xᵀ(τA − B)x ≥ 0 for all x }, assume that A and B have the same dimensions.

Generalized Support
Let B = [T, U; Uᵀ, W], a block matrix whose lower-right block W is n by n. Then σ(B/A) is defined to be
min { τ ∈ ℝ : τ xᵀAx ≥ (y, x)ᵀ B (y, x) for all x }, where y = −T⁻¹Ux.

Circuit-Speak
Read σ(B/A) := min { τ ∈ ℝ : τ xᵀAx ≥ (y, x)ᵀ B (y, x) for all x } aloud in circuit-speak: "The support for B by A is the minimum number τ so that, for all possible voltage settings at the terminals, τ copies of A burn at least as much energy as one copy of B."

Thomson's Principle
σ(B/A) := min { τ ∈ ℝ : τ xᵀAx ≥ (y, x)ᵀ B (y, x) for all x }. Fix the voltages at the terminals; the voltages at the junctions will then settle so that the total power dissipation in the circuit is minimized. (Figure: an example circuit with terminal voltages 2V, 5V, 1V, and 3V.)

Racke’s Decomposition Tree

Laminar Decomposition
A laminar decomposition naturally defines a tree. (Figure: a laminar decomposition of the example graph on vertices a through j and the corresponding tree.)

Racke, FOCS 2002 Given a graph G of n nodes, there exists a laminar decomposition tree T with all the “right” properties as a preconditioner for G. Except his advisor didn’t tell him about this…

For More Details
Our paper is available online at http://www.cs.cmu.edu/~maverick/. Our contributions:
- Analysis of Racke's decomposition tree as a preconditioner
- Tools for reasoning about support between Laplacians of different dimensions

The Search Is NOT Over
Future directions:
- Racke's tree can take exponential time to find; many recent improvements, but not quite there yet
- Insisting on a balanced tree can hurt in many easy cases

Q&A
