Sparsified Matrix Algorithms for Graph Laplacians Richard Peng Georgia Tech.

OUTLINE: (Structured) Linear Systems · Iterative and Direct Methods · (Graph) Sparsification · Sparsified Squaring · Speeding up Gaussian Elimination

GRAPH LAPLACIANS Matrices that correspond to undirected graphs: coordinates ↔ vertices, non-zeros ↔ edges. This talk: weighted, undirected graphs, and symmetric PSD matrices.
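As a quick illustration (not from the talk), a minimal numpy sketch of this correspondence; the example graph, a 4-cycle with unit weights, is made up:

import numpy as np

def graph_laplacian(n, edges):
    # edges: list of (u, v, weight); the Laplacian is L = D - A for the weighted graph
    L = np.zeros((n, n))
    for u, v, w in edges:
        L[u, u] += w
        L[v, v] += w
        L[u, v] -= w
        L[v, u] -= w
    return L

# one row/column per vertex, one symmetric pair of off-diagonal non-zeros per edge
L = graph_laplacian(4, [(0, 1, 1.0), (1, 2, 1.0), (2, 3, 1.0), (3, 0, 1.0)])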

THIS TALK Provably efficient algorithms for graph Laplacians, with a focus on solving linear systems. Why linear systems? They are a primitive in many graph algorithms, they are the simplest convex optimization problem, and algorithms for them often generalize.

THE LAPLACIAN PARADIGM Directly related: elliptic systems. Few iterations: eigenvectors, heat kernels. Many iterations / modified algorithms: graph problems, image processing.

SCIENTIFIC COMPUTING Reducible to SDD systems and M-matrices: [BHV `04, DS `07]: PDEs, trusses; [CFMNPW `14]: Helmholtz on meshes.

DATA ANALYSIS [ZGL `03][ZHS `05][CCLPT `15]: inference / sampling on graphical models [KMST `09, CMMP `13]: image segmentation / denoising

GRAPHS [Tutte `62]: planar graph embeddings in 2 solves. [KMP `09][MST `15]: random spanning trees, Õ(m^{4/3}). [DS `08, LS `13]: mincost / lossy flows, Õ(mn^{1/2}). (Õ hides factors of log^c n.)

GRAPHS, FASTER [CKMST `11][Sherman `13][KLOS `13][P `16]: approx. undirected maxflow, Õ(m^{4/3}) → Õ(m^{1+ε}) → Õ(m). [OSV `12]: balanced cuts, heat kernel walks, Õ(m). [Madry `13]: bipartite matching in Õ(m^{10/7}). [CMSV `16]: mincost matching and negative-length shortest paths in Õ(m^{10/7}).

WHY WORST CASE ANALYSIS? The Laplacian paradigm of designing graph algorithms: Optimization Problem → Linear System Solver, invoked on a sequence of (adaptively) generated linear systems. Main difficulties: widely varying weights, multi-scale behavior.

INSTANCE: ISOTONIC REGRESSION [Kyng-Rao-Sachdeva `15]: https://github.com/sachdevasushant/Isotonic/blob/master/README.md : "…we suggest rerunning the program a few times and/or using a different solver. An alternate solver based on incomplete Cholesky factorization is provided with the code." Numbers thanks to Kevin Deweese (UCSB).

OUTLINE: (Structured) Linear Systems · Iterative and Direct Methods · (Graph) Sparsification · Sparsified Squaring · Speeding up Gaussian Elimination

LINEAR SYSTEM SOLVERS [~0] Gaussian elimination: O(n^3). [Strassen `69]: O(n^{2.8}). [Coppersmith-Winograd `90]: O(n^{2.376}). [Stothers `10]: O(n^{2.3737}). [Vassilevska Williams `11]: O(n^{2.3727}). [Hestenes-Stiefel `52] Conjugate gradient: O(nm) (?)

APPROACHES
                 Direct             Iterative
Unit step        Modifying entry    Matrix-vector multiply
Main goal        Simplify system    Explore rank space
Cost per step    O(1)               O(m)
#Steps           O(n^3)             O(n)
Total            O(n^3)             O(nm)
Performances are comparable on medium sized instances: m = 10^5 takes ~1 second.

EXTREME INSTANCES Highly connected graphs need global steps; long paths / trees need many steps. Each is easy on its own (iterative method vs. direct method), but solvers must handle both simultaneously.

SIMPLIFICATION Adjust/rescale so the diagonal = I, and add to the diagonal to make the matrix full rank: L = I – A, where A is a (scaled) random walk matrix.
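A minimal numpy sketch (not from the talk) of this rescaling; the example matrix and the small shift added to the diagonal are arbitrary choices for illustration:

import numpy as np

# Laplacian of a weighted path 0-1-2 (weights 1 and 2), shifted slightly to make it full rank
M = np.array([[ 1.0, -1.0,  0.0],
              [-1.0,  3.0, -2.0],
              [ 0.0, -2.0,  2.0]]) + 1e-3 * np.eye(3)
d = np.sqrt(np.diag(M))
A = np.eye(3) - M / np.outer(d, d)   # rescaled system D^{-1/2} M D^{-1/2} = I - A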

ITERATIVE METHODS Division via multiplication: (1 – a)^{-1} = 1 + a + a^2 + a^3 + … If |a| ≤ ρ, then κ = (1 – ρ)^{-1} terms give a good approximation to (1 – a)^{-1}. Matrix version (by the spectral theorem this works for symmetric PSD matrices): L^{-1} = I + A + A^2 + A^3 + … Matrices well-approximated by their diagonal blocks are easy to solve.
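A minimal sketch (not from the talk) of the truncated series as an iteration using only matrix-vector products; the number of terms is a parameter chosen by the caller:

import numpy as np

def series_solve(A, b, num_terms):
    # approximate (I - A)^{-1} b by b + A b + A^2 b + ... + A^(num_terms - 1) b
    x = np.zeros_like(b)
    term = b.copy()
    for _ in range(num_terms):
        x += term
        term = A @ term
    return x

With ‖A‖ ≤ ρ < 1, on the order of (1 – ρ)^{-1} log(1/ε) terms give relative error ε, matching the κ terms mentioned above.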

LOWER BOUND FOR ITERATIVE METHODS There exist graphs G (e.g. the cycle) that require Ω(n) steps. Graph-theoretic interpretation: each term is one more step of the walk (b, Ab, A^2b, …), so information spreads one hop per multiplication and must cover the diameter. Closely related to the smoothness^{1/2} lower bound on the number of gradient steps.

DEGREE n IN O(log n) OPERATIONS? (I – A)^{-1} = I + A + A^2 + A^3 + … = (I + A)(I + A^2)(I + A^4)… Repeated squaring: A^{16} = (((A^2)^2)^2)^2, 4 operations, so O(log n) terms suffice. Combinatorial view: A is one step of the random walk, and I – A^2 is the Laplacian of the 2-step random walk: still a graph, but a dense matrix! Similar to multi-level methods.
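A small numerical check (not from the talk) of this factorization on a random symmetric matrix with ‖A‖ < 1; six squarings already match the true inverse to machine precision:

import numpy as np

rng = np.random.default_rng(0)
B = rng.standard_normal((5, 5))
A = 0.4 * (B + B.T) / np.linalg.norm(B + B.T, 2)     # symmetric, spectral norm 0.4

# (I + A)(I + A^2)(I + A^4)... telescopes into the power series for (I - A)^{-1}
approx = np.eye(5)
P = A.copy()
for _ in range(6):                                    # degree 2^6 - 1 polynomial in A
    approx = approx @ (np.eye(5) + P)
    P = P @ P
print(np.linalg.norm(approx - np.linalg.inv(np.eye(5) - A)))   # tiny (machine precision)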

OUTLINE: (Structured) Linear Systems · Iterative and Direct Methods · (Graph) Sparsification · Sparsified Squaring · Speeding up Gaussian Elimination

GRAPH SPARSIFICATION Any undirected graph can be approximated by a sparse undirected graph with: [ST `04]: O(n log^{O(1)} n) edges; [BSS `09]: O(n) edges.

NOTION OF APPROXIMATION A ≈_ε B if both exp(ε)A – B and exp(ε)B – A are PSD. Same as a small relative condition number; reflexive, and composes naturally. Necessary condition: all cuts are similar.

HOW? Simplest explanation (so far): [SS `08] importance sampling on the edges. Keep edge e with probability p_e; if kept, rescale its weight by 1/p_e to maintain expectation.
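A minimal sketch (not from the talk) of this rule on a weighted edge list; the probabilities p_e are taken as given here:

import random

def sample_edges(edges, probs, rng=random):
    # edges: list of (u, v, w); keep edge e w.p. probs[e] and rescale kept weights by 1/p_e,
    # so each edge's expected weight in the sample equals its original weight
    kept = []
    for (u, v, w), p in zip(edges, probs):
        if rng.random() < p:
            kept.append((u, v, w / p))
    return kept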

HOW TO SAMPLE? Widely used: uniform sampling. Works well when the data is uniform, e.g. the complete graph. Problem: a long path, where removing any edge changes connectivity (one graph can contain both).

THE `RIGHT’ PROBABILITIES Path + clique example: the probability should be ≈ 1 on the path edges and ≈ 1/n on the clique edges. τ: ℓ_2 statistical leverage scores, τ_e = trace(L^+ L_e). Interpretation: effective resistance. [Rudelson-Vershynin `07], [Tropp `12]: p_e ≥ τ_e · O(log n) gives a good sparsifier.
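A minimal sketch (not from the talk) of these scores via the pseudoinverse, practical only for small graphs; τ_e equals the edge weight times the effective resistance between its endpoints:

import numpy as np

def leverage_scores(n, edges):
    # tau_e = trace(L^+ L_e) = w_e * (L^+_uu + L^+_vv - 2 L^+_uv)
    L = np.zeros((n, n))
    for u, v, w in edges:
        L[u, u] += w; L[v, v] += w; L[u, v] -= w; L[v, u] -= w
    Lp = np.linalg.pinv(L)
    return [w * (Lp[u, u] + Lp[v, v] - 2 * Lp[u, v]) for u, v, w in edges]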

COMPUTING SAMPLING PROBABILITIES τ: leverage scores / effective resistances, τ_e = trace(M^+ M_e). [BSS `09][LS `15]: potential functions. [ST `04][OV `11]: spectral partitioning. [SS `08][CLMMPS `15]: Gaussian projections. [Koutis `14]: spanners / low diameter partitions.

OUTLINE: (Structured) Linear Systems · Iterative and Direct Methods · (Graph) Sparsification · Sparsified Squaring · Speeding up Gaussian Elimination

SQUARING Sparsifiers (plus a few tricks) give, for any A, a sparse A’ s.t. I – A’ ≈ I – A^2. Plan: build algorithms around sparsifiers and identities involving I – A and I – A^2.

SIMILAR TO Logspace connectivity [Reingold `02] and deterministic squaring [RV `05]:
                    Connectivity        Parallel solver
Iteration           A_{i+1} ≈ A_i^2, until ‖A_d‖ small (both)
Size reduction      Low degree          Sparse graph
Method              Derandomized        Randomized
Solution transfer   Connectivity        (I – A_i) x_i = b_i
Also similar to multiscale methods and the NC algorithm for shortest path.

APPROXIMATE INVERSE CHAIN I – A_1 ≈_ε I – A_0^2, I – A_2 ≈_ε I – A_1^2, …, I – A_i ≈_ε I – A_{i-1}^2, ending with I – A_d ≈ I. Convergence: I – A_{i+1} ≈_ε I – A_i^2 implies ‖A_{i+1}‖ < ‖A_i‖^{1.5}, so starting from ‖A_0‖ ≤ 1 – 1/κ we reach ‖A_d‖ < 0.8 and can stop at d = O(log κ).

ISSUE: ERROR AT EACH STEP We need to invoke (1 – a)^{-1} = (1 + a)(1 + a^2)(1 + a^4)…, but only have 1 – a_{i+1} ≈ 1 – a_i^2. Solution: apply one term at a time: (1 – a_i)^{-1} = (1 + a_i)(1 – a_i^2)^{-1} ≈ (1 + a_i)(1 – a_{i+1})^{-1}. Induction: given z_{i+1} ≈ (1 – a_{i+1})^{-1}, set z_i = (1 + a_i) z_{i+1} ≈ (1 + a_i)(1 – a_{i+1})^{-1} ≈ (1 – a_i)^{-1}; base case z_d = (1 – a_d)^{-1} ≈ 1.

ISSUE: MATRIX COMPOSITION In the matrix setting, replacements by approximations need to be symmetric: Z ≈ Z’ ⇒ U^T Z U ≈ U^T Z’ U, so the terms around Z’ need to be symmetric, and (I – A_i) Z is not symmetric. Solution 1 ([PS `14]): (1 – a)^{-1} = ½ (1 + (1 + a)(1 – a^2)^{-1}(1 + a)).

ALGORITHM Identity: (I – A)^{-1} = ½ [I + (I + A)(I – A^2)^{-1}(I + A)]. Chain: (I – A’)^{-1} ≈ (I – A^2)^{-1}. Induction: Z’ ≈ (I – A’)^{-1}, so Z’ ≈ (I – A^2)^{-1}. Composition: Z ← ½ [I + (I + A) Z’ (I + A)] gives Z ≈ (I – A)^{-1}. Total error = dε = O(ε log κ).

PSEUDOCODE x = Solve(I, A_0, …, A_d, b): 1. For i from 1 to d, set b_i = (I + A_{i-1}) b_{i-1} (with b_0 = b). 2. Set x_d = b_d. 3. For i from d – 1 downto 0, set x_i = ½ [b_i + (I + A_i) x_{i+1}].
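A runnable numpy sketch (not from the talk) of this routine; for simplicity the chain here is built exactly as A_{i+1} = A_i^2 rather than by sparsification, just to check the recurrences:

import numpy as np

def solve_chain(As, b):
    # As = [A_0, ..., A_d] with I - A_{i+1} ≈ I - A_i^2 and ||A_d|| small; returns x ≈ (I - A_0)^{-1} b
    d = len(As) - 1
    bs = [b]
    for i in range(1, d + 1):                      # b_i = (I + A_{i-1}) b_{i-1}
        bs.append(bs[i - 1] + As[i - 1] @ bs[i - 1])
    x = bs[d]                                      # x_d = b_d, since I - A_d ≈ I
    for i in range(d - 1, -1, -1):                 # x_i = 1/2 [ b_i + (I + A_i) x_{i+1} ]
        x = 0.5 * (bs[i] + x + As[i] @ x)
    return x

# sanity check against a dense solve
rng = np.random.default_rng(1)
B = rng.standard_normal((6, 6)); B = (B + B.T) / 2
A0 = 0.8 * B / np.linalg.norm(B, 2)
As = [A0]
for _ in range(5):
    As.append(As[-1] @ As[-1])
b = rng.standard_normal(6)
print(np.linalg.norm(solve_chain(As, b) - np.linalg.solve(np.eye(6) - A0, b)))   # small residual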

FACTORIZATION INTO PRODUCT [CCLPT `15]: alternate step for computing matrix roots (I – A)^p for some |p| < 1: (I – A)^{-1} = (I + A/2)(I – ¾ A^2 – ¼ A^3)^{-1}(I + A/2). Hard part: sparsifying I – ¾ A^2 – ¼ A^3. ¾ (I – A^2): same as before. ¼ (I – A^3): cubic power.

WHAT IS I – A^3 A: one step of the random walk; A^3: 3 steps of the random walk. (Part of) an edge uv in I – A^3 comes from a length-3 path u–y–z–v in A, with weight A_uy A_yz A_zv.

PSEUDOCODE Repeat O(c m log n ε^{-2}) times: 1. Pick an integer 1 ≤ k ≤ c and an edge e = uv, both uniformly at random. 2. Perform a (k – 1)-step random walk from u. 3. Perform a (c – k)-step random walk from v. 4. Add a scaled copy of the corresponding edge to the sparsifier. Resembles: local clustering, and approximate triangle counting (c = 3).
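A simplified sketch (not the actual routine, and with the rescaling of kept edges left out) of this loop for an unweighted graph stored as adjacency lists:

import random

def sample_power_edges(adj, c, num_samples, rng=random):
    # adj: dict vertex -> list of neighbours; samples edges of the c-step walk graph I - A^c
    edges = [(u, v) for u in adj for v in adj[u] if u < v]
    kept = []
    for _ in range(num_samples):
        k = rng.randint(1, c)                      # which of the c walk steps is the picked edge
        u, v = rng.choice(edges)
        x = u
        for _ in range(k - 1):                     # (k - 1)-step random walk from u
            x = rng.choice(adj[x])
        y = v
        for _ in range(c - k):                     # (c - k)-step random walk from v
            y = rng.choice(adj[y])
        kept.append((x, y))                        # endpoints of a length-c walk through edge uv
    return kept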

OUTLINE: (Structured) Linear Systems · Iterative and Direct Methods · (Graph) Sparsification · Sparsified Squaring · Speeding up Gaussian Elimination

DIRECT METHODS Row reduction: eliminate a variable by subtracting equations from each other. Sparse case? Each reduction creates more non-zeros (fill) in the matrix, so we quickly get dense matrices. Runtime: n steps, each O(degree^2), O(n^3) total.
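A tiny numpy illustration (not from the talk) of fill: eliminating the center of a star graph connects all of its leaves to each other:

import numpy as np

# Laplacian of a star with center 0 and leaves 1..4
n = 5
L = np.zeros((n, n))
for v in range(1, n):
    L[0, 0] += 1; L[v, v] += 1; L[0, v] -= 1; L[v, 0] -= 1

# one step of elimination on vertex 0: the Schur complement onto the leaves
S = L[1:, 1:] - np.outer(L[1:, 0], L[0, 1:]) / L[0, 0]
print(np.count_nonzero(S))   # 16: the 4 leaves now form a weighted complete graph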

SPARSE GAUSSIAN ELIMINATION Goal: keep the intermediate matrices (Schur complements) sparse. [George `73][LRT `79], nested dissection: O(n log n)-size inverse representations for planar graphs.

KEY QUESTION Ways of controlling fill: eliminate in the right order (minimum degree heuristic, elimination / separator trees); drop entries (incomplete Cholesky). The Schur complement is still a graph, so it can also be sparsified.

SPARSE BLOCK CHOLESKY A linear system solve reduces to: 2 solves involving the top-left block, and 1 solve on the Schur complement. [KLPRS `16]: repeatedly pivot out a constant fraction of the variables, similar to matrix inversion via matrix multiplication.
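For reference (standard linear algebra rather than anything specific to [KLPRS `16]), the block factorization behind this reduction, with F the pivoted-out block and C the remaining vertices:

\[
M = \begin{pmatrix} M_{FF} & M_{FC} \\ M_{CF} & M_{CC} \end{pmatrix}
  = \begin{pmatrix} I & 0 \\ M_{CF} M_{FF}^{-1} & I \end{pmatrix}
    \begin{pmatrix} M_{FF} & 0 \\ 0 & \mathrm{Sc}(M,F) \end{pmatrix}
    \begin{pmatrix} I & M_{FF}^{-1} M_{FC} \\ 0 & I \end{pmatrix},
\qquad \mathrm{Sc}(M,F) = M_{CC} - M_{CF} M_{FF}^{-1} M_{FC}.
\]

Applying the inverses of the two outer triangular factors costs two solves with M_{FF}; the middle factor costs one solve with the Schur complement.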

TAIL RECURSION Choose the partition so that the top-left block is easy to invert using iterative methods. Recurrence: T(n) = T(0.99n) + O(nnz).

CHOOSING SET TO ELIMINATE α-block diagonally dominant (α-BDD) subset F: each vertex has ≥ 0.1 of its total (weighted) degree going to V \ F = C. Intuition: an approximate independent set (best case: an independent set). Matches the AMG view: C is the coarse grid, F is the fine grid minus the coarse grid.

ITERATIVE METHOD ON M_FF Division via multiplication: (1 – a)^{-1} = 1 + a + a^2 + a^3 + … M_FF = I – A with every row/column sum of A < 0.9, so ‖A^{10t}‖ < e^{-t}: the series converges quickly. We had to be very careful with operators when addressing this. OPEN: a random walk based view.

FINDING α-BDD SUBSETS Pick F randomly: each u with probability ½. Trim F: only keep good blocks; removing blocks from F can only decrease the inner degree of the remaining blocks. Markov's inequality: with probability ≥ ½, at least half of u's neighbors are not picked, so u is picked and good with probability ≥ ¼. Linearity of expectation: ¼ of all blocks are kept in expectation.
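A schematic sketch (not the paper's exact routine) of this pick-and-trim step for an unweighted graph, where a picked vertex is kept only if at least half of its neighbors were left outside F:

import random

def find_bdd_subset(adj, rng=random):
    # adj: dict vertex -> list of neighbours
    F = {u for u in adj if rng.random() < 0.5}        # pick each vertex w.p. 1/2
    # trim: goodness is checked against the initial F, since removals only lower inner degree
    return {u for u in F
            if sum(1 for v in adj[u] if v not in F) >= len(adj[u]) / 2}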

OVERALL CALL ROUTINE Cost with O(n)-sized sparse approximations: 2 solves involving the top-left block: O(nnz); 1 solve on the Schur complement: T(0.99n). So T(n) = T(0.99n) + O(n) = O(n).

KYNG-SACHDEVA `16 Per-entry pivoting, almost identical to incomplete LU.

ONGOING WORK Connections to multigrid / multiscale methods? Other low factor-width matrices: multi-commodity flows? Linear elasticity problems? General PSD linear systems? Extensions to convex optimization?