An Efficient Parallel Solver for SDD Linear Systems Richard Peng M.I.T. Joint work with Dan Spielman (Yale)


Efficient Parallel Solvers for SDD Linear Systems Richard Peng M.I.T. Work in progress with Dehua Cheng (USC), Yu Cheng (USC), Yin Tat Lee (MIT), Yan Liu (USC), Dan Spielman (Yale), and Shanghua Teng (USC)

OUTLINE: L_G x = b | Why is it hard? | Key Tool | Parallel Solver | Other Forms

LARGE GRAPHS: images, meshes, roads, social networks. Algorithmic challenges: How to store? How to analyze? How to optimize?

GRAPH LAPLACIAN (n vertices, m edges): row/column ↔ vertex; off-diagonal entry ↔ negated edge weight; diagonal entry ↔ weighted degree. Input: graph Laplacian L, vector b. Output: vector x s.t. Lx ≈ b.

THE LAPLACIAN PARADIGM. Directly related: elliptic systems. Few iterations: eigenvectors, heat kernels. Many iterations / modified algorithms: graph problems, image processing.

SOLVERS (n x n matrix, m non-zeros). Direct methods: O(n^3) → O(n^ω). Iterative methods: O(nm), O(m κ^{1/2}). Combinatorial preconditioning: [Vaidya `91]: O(m^{7/4}); [Boman-Hendrickson `01]: O(mn); [Spielman-Teng `03, `04]: O(m^{1.31}) → O(m log^c n); [KMP `10][KMP `11][KOSZ `13][LS `13][CKMPPRX `14]: O(m log^2 n) → O(m log^{1/2} n).

PARALLEL SPEEDUPS. Speedups by splitting work. Time (depth): max # of dependent steps. Work: # of operations. Common architectures: multicore, MapReduce. Nearly-linear-work parallel Laplacian solvers: [KM `07]: O(n^{1/6+a}) for planar graphs; [BGKMPT `11]: O(m^{1/3+a}).

OUR RESULT. Input: graph Laplacian L_G with condition number κ. Output: access to an operator Z s.t. Z ≈_ε L_G^{-1}. Cost: O(log^{c_1} m log^{c_2} κ log(1/ε)) depth, O(m log^{c_1} m log^{c_2} κ log(1/ε)) work. Note: L_G is rank-deficient; we omit pseudoinverses. Logarithmic dependence on error; κ ≤ O(n^2 w_max / w_min). Extension: sparse approximation of L_G^p for any -1 ≤ p ≤ 1, with poly(1/ε) dependence.

SUMMARY. Would like to solve L_G x = b. Goal: polylog depth, nearly-linear work.

OUTLINE: L_G x = b | Why is it hard? | Key Tool | Parallel Solver | Other Forms

EXTREME INSTANCES. Highly connected graphs need global steps; long paths / trees need many steps. Each is easy on its own (iterative methods for the former, Gaussian elimination for the latter), but a solver must handle both simultaneously.

PREVIOUS FAST ALGORITHMS: combinatorial preconditioning. Spectral sparsification (via local partitioning): reduce G to a sparser G'. Low-stretch spanning trees (tree routing / tree contraction): terminate at a spanning tree T. Iterative methods as the `driver': a polynomial in L_G L_T^{-1}; need L_G^{-1} L_T = (L_G L_T^{-1})^{-1}. Horner's method: degree d → O(d log n) depth. [Spielman-Teng `04]: d ≈ n^{1/2}. Fast due to sparser graphs, the focus of subsequent improvements.

POLYNOMIAL APPROXIMATIONS. Division via multiplication: (1 - a)^{-1} = 1 + a + a^2 + a^3 + a^4 + a^5 + … If |a| ≤ ρ, then κ = (1 - ρ)^{-1} terms give a good approximation to (1 - a)^{-1}. Spectral theorem: this works for matrices! Better: Chebyshev / heavy ball: d = O(κ^{1/2}) suffices, and this is optimal ([OSV `12]). There exist G (e.g. the cycle) where κ(L_G L_T^{-1}) must be Ω(n). An Ω(n^{1/2}) lower bound on depth?
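The term count on this slide can be checked numerically. A minimal scalar sketch (mine, not from the talk), assuming |a| ≤ ρ = 0.99, i.e. κ = 100: truncating the geometric series after O(κ) terms leaves a remainder of a^k.

```python
# Truncated geometric series for (1 - a)^{-1}: with |a| <= rho,
# k = O(kappa) terms suffice since the remainder is a^k.
rho = 0.99
kappa = 1.0 / (1.0 - rho)        # condition number, here 100
a = rho                          # worst case |a| = rho

exact = 1.0 / (1.0 - a)
k = int(5 * kappa)               # O(kappa) terms
approx = sum(a**i for i in range(k))

rel_err = abs(approx - exact) / exact   # equals a^k
```

The point of the slide is that this sequential dependence (each term multiplies the last) is what the κ^{1/2} and depth lower bounds latch onto.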

LOWER BOUND FOR LOWER BOUND. [BGKMPT `11]: O(m^{1/3+a}) via the (pseudo)inverse. Preprocess: O(log^2 n) depth, O(n^ω) work. Solve: O(log n) depth, O(n^2) work. Multiplying by L_G^{-1} is highly parallel! But the inverse is dense and expensive to use, so it is only applied to O(n^{1/3})-sized instances. Possible improvement: can we make L_G^{-1} sparse? [George `73][LRT `79]: yes for planar graphs.

SUMMARY. Would like to solve L_G x = b. Goal: polylog depth, nearly-linear work. `Standard' numerical methods have high depth. Equivalent: sparse inverse representations. Aside: the cut approximation / oblivious routing schemes of [Madry `10][Sherman `13][KLOS `13] are parallel, and can be viewed as asynchronous iterative methods.

OUTLINE: L_G x = b | Why is it hard? | Key Tool | Parallel Solver | Other Forms

DEGREE d POLYNOMIAL ⇒ DEPTH d? Apply repeated squaring to the power series: (1 - a)^{-1} = 1 + a + a^2 + a^3 + a^4 + a^5 + a^6 + a^7 + … = (1 + a)(1 + a^2)(1 + a^4)…, just as a^16 = (((a^2)^2)^2)^2. Repeated squaring sidesteps the assumption in the lower bound! Matrix version: I + A^{2^i}.
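A scalar sanity check of the product form (my own sketch, not the talk's code): d squaring steps capture the first 2^d terms of the geometric series, so O(log κ) sequential steps replace O(κ).

```python
# (1 + a)(1 + a^2)(1 + a^4)... with d factors equals the first 2^d
# terms of the geometric series for (1 - a)^{-1}.
a = 0.99
exact = 1.0 / (1.0 - a)

prod, power = 1.0, a
d = 10                           # captures 2^10 = 1024 terms
for _ in range(d):
    prod *= 1.0 + power
    power *= power               # repeated squaring: a, a^2, a^4, ...

rel_err = abs(prod - exact) / exact   # equals a^(2^d)
```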

REDUCTION TO (I - A)^{-1}. Add to diag(L) to make it full rank; adjust/rescale so the diagonal = I. Then A has weighted degree < 1: a random walk with |A| < 1.
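A small numeric sketch of the rescaling (the standard normalization; the weights here are my own example): with A = D^{-1/2} W D^{-1/2}, the rescaled Laplacian D^{-1/2} L D^{-1/2} is exactly I - A.

```python
import numpy as np

# Rescale a graph Laplacian L = D - W so its diagonal becomes I:
# D^{-1/2} L D^{-1/2} = I - A with A = D^{-1/2} W D^{-1/2}.
W = np.array([[0., 2., 1.],
              [2., 0., 3.],
              [1., 3., 0.]])     # weighted adjacency matrix
deg = W.sum(axis=1)              # weighted degrees
L = np.diag(deg) - W             # graph Laplacian

Dinv_half = np.diag(1.0 / np.sqrt(deg))
A = Dinv_half @ W @ Dinv_half    # symmetrized random-walk matrix

diff = np.linalg.norm(Dinv_half @ L @ Dinv_half - (np.eye(3) - A))
```

(This L is still singular; the slide's extra diagonal shift is what pushes |A| strictly below 1.)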

INTERPRETATION. A: one-step transition of a random walk; A^{2^i}: the 2^i-step transition. (I - A)^{-1} = (I + A)(I + A^2)…(I + A^{2^i})…: one step of the walk on each A_i = A^{2^i}. This takes only O(log κ) matrix multiplications, but O(n^ω log κ log n) work. Need: size reductions, until A^{2^i} becomes an `expander'.
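The matrix identity can be verified densely. A sketch (mine; dense NumPy with no sparsification, which is exactly the O(n^ω)-work issue the slide flags): for symmetric A with ||A|| < 1, O(log κ) squaring steps recover (I - A)^{-1}.

```python
import numpy as np

# (I - A)^{-1} = (I + A)(I + A^2)(I + A^4)... for symmetric ||A|| < 1:
# each iteration is one multiplication by (I + A^(2^i)) plus one squaring.
rng = np.random.default_rng(0)
n = 30
B = rng.standard_normal((n, n))
A = (B + B.T) / 2
A *= 0.9 / np.abs(np.linalg.eigvalsh(A)).max()   # scale so ||A|| = 0.9

I = np.eye(n)
exact = np.linalg.inv(I - A)

approx, P = I.copy(), A.copy()
for _ in range(12):              # O(log kappa) iterations
    approx = approx @ (I + P)
    P = P @ P                    # P = A^(2^i)

rel_err = np.linalg.norm(approx - exact) / np.linalg.norm(exact)
```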

SIMILAR TO multiscale methods, the NC algorithm for shortest paths, logspace connectivity [Reingold `02], and deterministic squaring [Rozenman-Vadhan `05].
Connectivity vs. parallel solver:
Iteration: A_{i+1} ≈ A_i^2 until |A_d| is small (both).
Size reduction: low degree vs. sparse graph.
Method: derandomized vs. randomized.
Solution transfer: connectivity vs. solving (I - A_i) x_i = b_i.

SUMMARY. Would like to solve L_G x = b. Goal: polylog depth, nearly-linear work. `Standard' numerical methods have high depth. Equivalent: sparse inverse representations. Squaring gets around the lower bound.

OUTLINE: L_G x = b | Why is it hard? | Key Tool | Parallel Solver | Other Forms

b  x: linear operator, Z Algorithm  matrix Z ≈ ε ( I – A ) -1 WHAT IS AN ALGORITHM b x Goal: Z = sum/product of a few matrices InputOutput Z ≈ ε :, spectral similarity with relative error ε Symmetric, invertible, composable (additive)

SQUARING. [BSS `09]: there exists I - A' ≈_ε I - A^2 with O(n ε^{-2}) entries. [ST `04][SS `08][OV `11] + some modifications: O(n log^c n ε^{-2}) entries, efficient, parallel. [Koutis `14]: a faster algorithm based on spanners / low-diameter decompositions.

APPROXIMATE INVERSE CHAIN. I - A_1 ≈_ε I - A_0^2, I - A_2 ≈_ε I - A_1^2, …, I - A_i ≈_ε I - A_{i-1}^2, and finally I - A_d ≈ I. Convergence: exact squaring gives |A_{i+1}| < |A_i|/2; with the approximation I - A_{i+1} ≈_ε I - A_i^2 we still get |A_{i+1}| < |A_i|/1.5, so d = O(log κ).

ISSUE 1. We only have 1 - a_{i+1} ≈ 1 - a_i^2, so we cannot invoke (1 - a)^{-1} = (1 + a)(1 + a^2)(1 + a^4)… all at once. Solution: apply one factor at a time: (1 - a_i)^{-1} = (1 + a_i)(1 - a_i^2)^{-1} ≈ (1 + a_i)(1 - a_{i+1})^{-1}. Induction: z_d = (1 - a_d)^{-1} ≈ 1, and z_i = (1 + a_i) z_{i+1} ≈ (1 + a_i)(1 - a_{i+1})^{-1} ≈ (1 - a_i)^{-1}.

ISSUE 2. In the matrix setting, replacement by approximations must happen symmetrically: Z ≈ Z' ⇒ U^T Z U ≈ U^T Z' U. So the terms around (I - A_i^2)^{-1} ≈ Z_{i+1} in Z_i need to be symmetric, but (I - A_i) Z_{i+1} is not symmetric around Z_{i+1}. Solution 1 ([PS `14]): use the identity (1 - a)^{-1} = ½ [1 + (1 + a)(1 - a^2)^{-1}(1 + a)].

ALGORITHM. Identity: (I - A_i)^{-1} = ½ [I + (I + A_i)(I - A_i^2)^{-1}(I + A_i)]. Chain: (I - A_{i+1})^{-1} ≈_ε (I - A_i^2)^{-1}. Set Z_i = ½ [I + (I + A_i) Z_{i+1} (I + A_i)]. Induction: Z_{i+1} ≈_α (I - A_{i+1})^{-1} gives Z_{i+1} ≈_{α+ε} (I - A_i^2)^{-1}; composition: Z_i ≈_{α+ε} (I - A_i)^{-1}. Total error = dε = O(ε log κ).

PSEUDOCODE. x = Solve(A_0, …, A_d, b):
1. Set b_0 = b. For i from 1 to d, set b_i = (I + A_{i-1}) b_{i-1}.
2. Set x_d = b_d.
3. For i from d - 1 downto 0, set x_i = ½ [b_i + (I + A_i) x_{i+1}].
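A runnable version of the pseudocode (my own translation, not the authors' code): it builds the chain by exact dense squaring A_{i+1} = A_i^2, whereas the actual solver sparsifies each square, and checks the result against a direct solve.

```python
import numpy as np

def solve(As, b):
    """Forward pass builds b_i; backward pass unwinds
    x_i = 1/2 [ b_i + (I + A_i) x_{i+1} ], starting from x_d = b_d."""
    d = len(As) - 1
    I = np.eye(len(b))
    bs = [b]
    for i in range(1, d + 1):
        bs.append((I + As[i - 1]) @ bs[-1])   # b_i = (I + A_{i-1}) b_{i-1}
    x = bs[d]                                  # x_d = b_d
    for i in range(d - 1, -1, -1):
        x = 0.5 * (bs[i] + (I + As[i]) @ x)
    return x

# Demo: random symmetric A_0 with ||A_0|| = 0.9, exact squaring chain.
rng = np.random.default_rng(1)
n = 25
B = rng.standard_normal((n, n))
A0 = (B + B.T) / 2
A0 *= 0.9 / np.abs(np.linalg.eigvalsh(A0)).max()

As = [A0]
for _ in range(8):                 # d = O(log kappa) levels
    As.append(As[-1] @ As[-1])     # exact square; real solver sparsifies

b = rng.standard_normal(n)
x = solve(As, b)
exact = np.linalg.solve(np.eye(n) - A0, b)
rel_err = np.linalg.norm(x - exact) / np.linalg.norm(exact)
```

Each level makes one call to the next, matching the V-cycle-like structure described on the following slide.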

TOTAL COST. d = O(log κ), ε = 1/d, nnz(A_i) = O(n log^c n log^2 κ). Total: O(log^c n log κ) depth, O(n log^c n log^3 κ) work. Multigrid V-cycle-like call structure: each level makes one call to the next. The answer comes from d = O(log κ) matrix-vector multiplications.

SUMMARY. Would like to solve L_G x = b. Goal: polylog depth, nearly-linear work. `Standard' numerical methods have high depth. Equivalent: sparse inverse representations. Squaring gets around the lower bound. Can keep the squares sparse. The operator view of algorithms can drive their design.

OUTLINE: L_G x = b | Why is it hard? | Key Tool | Parallel Solver | Other Forms

REPRESENTATION OF (I - A)^{-1}. The algorithm from [PS `14] gives (I - A)^{-1} ≈ ½ [I + (I + A_0)[I + (I + A_1)(I - A_2)^{-1}(I + A_1)](I + A_0)]: a sum and product of O(log κ) matrices. Sometimes we need just a product. Sampling from Gaussian graphical models: to sample from a Gaussian with precision matrix I - A (covariance (I - A)^{-1}), we need C s.t. C^T C ≈ (I - A)^{-1}.

SOLUTION 2. (I - A)^{-1} = (I + A)^{1/2}(I - A^2)^{-1}(I + A)^{1/2} ≈ (I + A)^{1/2}(I - A_1)^{-1}(I + A)^{1/2}. Repeat on A_1: (I - A)^{-1} ≈ C^T C where C = (I + A_0)^{1/2}(I + A_1)^{1/2}…(I + A_d)^{1/2}. How to evaluate (I + A_i)^{1/2}? For i ≥ 1, A_i ≈ A_{i-1}^2 has eigenvalues in [0,1], so I + A_i has eigenvalues in [1,2]: a well-conditioned matrix, where a Maclaurin series expansion = a low-degree polynomial. But what about (I + A_0)^{1/2}?
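A scalar shadow of this factorization (my own check, assuming the exact chain a_{i+1} = a_i^2): c = prod_i sqrt(1 + a_i) satisfies c·c → (1 - a_0)^{-1}, mirroring C^T C ≈ (I - A)^{-1}.

```python
import math

# Scalar version of C = (1+a_0)^{1/2} (1+a_1)^{1/2} ... with a_{i+1} = a_i^2:
# c*c telescopes to the geometric-series product, hence to (1 - a_0)^{-1}.
a0 = 0.9
a, c = a0, 1.0
for _ in range(30):
    c *= math.sqrt(1.0 + a)       # multiply in (1 + a_i)^{1/2}
    a *= a                        # a_{i+1} = a_i^2

rel_err = abs(c * c - 1.0 / (1.0 - a0)) * (1.0 - a0)
```

In the matrix case the factors no longer commute, which is why C^T C (not C^2) appears on the slide.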

SOLUTION 3 ([CCLPT `14]). (I - A)^{-1} = (I + A/2)^{1/2}(I - A/2 - A^2/2)^{-1}(I + A/2)^{1/2}. Modified chain: I - A_{i+1} ≈ I - A_i/2 - A_i^2/2. Now I + A_i/2 has eigenvalues in [1/2, 3/2], so its square root can be replaced by an O(log log κ)-degree polynomial / Maclaurin series T_{1/2}. Then C = T_{1/2}(I + A_0/2) T_{1/2}(I + A_1/2)…T_{1/2}(I + A_d/2) gives (I - A)^{-1} ≈ C^T C. Generalization to (I - A)^p for -1 < p < 1: T_{-p/2}(I + A_0) T_{-p/2}(I + A_1)…T_{-p/2}(I + A_d).

SUMMARY. Would like to solve L_G x = b. Goal: polylog depth, nearly-linear work. `Standard' numerical methods have high depth. Equivalent: sparse inverse representations. Squaring gets around the lower bound. Can keep the squares sparse. The operator view of algorithms can drive their design. An entire class of algorithms / factorizations. Can approximate a wider class of functions.

OPEN QUESTIONS. Generalizations: (sparse) squaring as an iterative method? Connections to multigrid / multiscale methods? Other functions, e.g. log(I - A) or rational functions? Other structured systems? Different notions of sparsification? More efficient: how fast can an O(n)-sized sparsifier be computed? Better sparsifiers for I - A^2? How to represent resistances? An O(n)-time solver (after O(m log^c n) preprocessing)? Applications / implementations: how fast can spectral sparsifiers run? What does L^p give for -1 < p < 1? Trees (from sparsifiers) as a stand-alone tool?

THANK YOU! Questions? Manuscripts on arXiv: