Fast, Randomized Algorithms for Partitioning, Sparsification, and the Solution of Linear Systems Daniel A. Spielman MIT Joint work with Shang-Hua Teng (Boston University)
Papers:
“Nearly-Linear Time Algorithms for Graph Partitioning, Graph Sparsification, and Solving Linear Systems,” S-Teng ’04
“The Mixing Rate of Markov Chains, An Isoperimetric Inequality, and Computing the Volume,” Lovász-Simonovits ’93
“The Eigenvalues of Random Symmetric Matrices,” Füredi-Komlós ’81
Overview. Preconditioning to solve a linear system: find B that approximates A, such that solving By = c is easy. Three techniques: augmented spanning trees [Vaidya ’90]; sparsification: approximate a graph by a sparse graph; partitioning: run time proportional to the number of nodes removed.
Outline: linear system solvers; combinatorial and spectral graph theory; sparsification using partitioning and random sampling; graph partitioning by truncated random walks; combinatorial and spectral graph theory; eigenvalues of random graphs.
Diagonally Dominant Matrices. Solve Ax = b where each diagonal entry of A is at least the sum of the absolute values of the other entries in its row: A_{ii} ≥ Σ_{j≠i} |A_{ij}|. Example: Laplacian matrices of weighted graphs: A_{ij} = −(weight of edge from i to j), and the diagonal A_{ii} = weighted degree of node i. It suffices to solve the Laplacian case.
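As a concrete aside (not from the talk), here is a minimal Python sketch of building such a Laplacian; the 3-node triangle and its weights are invented for illustration.

```python
import numpy as np

# Laplacian of a weighted graph: off-diagonal entries are minus the edge
# weights, diagonal entries are the weighted degrees.
def laplacian(n, weighted_edges):
    L = np.zeros((n, n))
    for i, j, w in weighted_edges:
        L[i, j] -= w
        L[j, i] -= w
        L[i, i] += w
        L[j, j] += w
    return L

# Hypothetical example: a triangle with edge weights 1, 2, 3.
L = laplacian(3, [(0, 1, 1.0), (1, 2, 2.0), (0, 2, 3.0)])
print(L)                      # each row sums to zero
print(np.linalg.eigvalsh(L))  # eigenvalues are >= 0; the smallest is 0
```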
Complexity of Solving Ax = b, A positive semi-definite. (n is the dimension, m the number of non-zeros.) General direct methods: Gaussian elimination (Cholesky), O(n³); fast matrix inversion, O(n^{2.376}); Conjugate Gradient, O(mn).
Complexity of Solving Ax = b, A positive semi-definite and structured. Express A = LLᵀ (Cholesky). Path: forward and backward elimination, O(n). Tree: like a path, working up from the leaves, O(n). Planar: nested dissection, O(n^{1.5}) [Lipton-Rose-Tarjan ’79].
Iterative Methods: Preconditioned Conjugate Gradient. Find an easy B that approximates A; solve Ax = b in about √κ(A,B) iterations, times a log(1/ε) factor we omit for the rest of the talk. κ(A,B) measures the quality of approximation; the cost per iteration is dominated by the time to solve By = c.
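To make the iteration concrete, here is a textbook PCG sketch (an illustration, not the paper’s code); `solve_B` stands for the solve By = c with the preconditioner, and the diagonal (Jacobi) preconditioner in the example is just a stand-in.

```python
import numpy as np

def pcg(A, b, solve_B, tol=1e-8, max_iter=1000):
    """Preconditioned conjugate gradient; solve_B(r) returns B^{-1} r."""
    x = np.zeros_like(b)
    r = b - A @ x
    z = solve_B(r)
    p = z.copy()
    rz = r @ z
    for _ in range(max_iter):
        Ap = A @ p
        alpha = rz / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        if np.linalg.norm(r) < tol * np.linalg.norm(b):
            break
        z = solve_B(r)
        rz_new = r @ z
        p = z + (rz_new / rz) * p
        rz = rz_new
    return x

# Small SDD example with a Jacobi (diagonal) preconditioner as a stand-in.
A = np.array([[3., -1., -1.], [-1., 3., -1.], [-1., -1., 3.]])
b = np.array([1.0, 0.0, -1.0])
x = pcg(A, b, lambda r: r / np.diag(A))
print(np.linalg.norm(A @ x - b))   # tiny residual: converged
```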
Main Result. For symmetric, diagonally dominant A: iteratively solve Ax = b in nearly-linear time, m log^{O(1)} n in general, with a smaller polylogarithmic factor in the planar case.
Vaidya’s Subgraph Preconditioners: precondition A by the Laplacian of a subgraph, B. [Figure: a 4-vertex graph A and a subgraph B, with their Laplacians.]
History.
Vaidya ’90: relates κ(A,B) to graph embeddings (congestion/dilation); uses the MST, then an augmented MST; planar case as well.
Gremban, Miller ’96: Steiner vertices, many details.
Joshi ’97, Reif ’98: recursive, multi-level approach; planar.
Bern, Boman, Chen, Gilbert, Hendrickson, Nguyen, Toledo: all the details, algebraic approach.
History (continued).
Maggs, Miller, Parekh, Ravi, Woo ’02: better bounds after preprocessing.
Boman, Hendrickson ’01: low-stretch spanning trees.
S-Teng ’03: augmented low-stretch trees.
This talk: clustering, partitioning, sparsification, recursion; lower-stretch spanning trees.
The relative condition number: κ(A,B) = λ_max(A,B)/λ_min(A,B). B ≼ A if A − B is positive semi-definite, that is, xᵀBx ≤ xᵀAx for all x. For A the Laplacian of a graph with edge set E: xᵀAx = Σ_{(i,j)∈E} w_{ij}(x_i − x_j)².
Bounding κ(A,B): for B a subgraph of A (with the same weights), xᵀBx ≤ xᵀAx for all x, so λ_min(A,B) ≥ 1 and κ(A,B) ≤ λ_max(A,B).
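A quick numeric sanity check (my illustration): take A to be the triangle Laplacian and B its path subgraph. Every ratio xᵀAx / xᵀBx lies between 1 and λ_max(A,B), which equals 3 for this pair.

```python
import numpy as np

rng = np.random.default_rng(0)

# A: Laplacian of the triangle; B: Laplacian of the path 0-1-2 (a subgraph).
A = np.array([[ 2., -1., -1.], [-1.,  2., -1.], [-1., -1.,  2.]])
B = np.array([[ 1., -1.,  0.], [-1.,  2., -1.], [ 0., -1.,  1.]])

ratios = []
for _ in range(10000):
    x = rng.standard_normal(3)
    x -= x.mean()               # project away the all-ones nullspace
    ratios.append((x @ A @ x) / (x @ B @ x))
print(min(ratios), max(ratios)) # min >= 1; max approaches lambda_max(A,B) = 3
```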
Fundamental Inequality. For a path on vertices 1, 2, …, 8: (x₁ − x₈)² ≤ 7 · [ (x₁ − x₂)² + (x₂ − x₃)² + ⋯ + (x₇ − x₈)² ]. In general, the squared difference across a path with ℓ edges is at most ℓ times the sum of the squared differences along its edges (Cauchy-Schwarz).
Application to eigenvalues of graphs. For Laplacian matrices, λ₁ = 0, with the all-ones eigenvector. Example: for the complete graph on n nodes, all non-zero eigenvalues equal n. For the path: the test vector x = (−7, −5, −3, −1, 1, 3, 5, 7) shows λ₂ is small.
Lower bound on λ₂ [Guattery-Leighton-Miller ’97]. Apply the fundamental inequality to each pair (i, j): xᵀL_{K_n}x = Σ_{i<j} (x_i − x_j)² ≤ Σ_{i<j} |i − j| · Σ_{k=i}^{j−1} (x_k − x_{k+1})² = O(n³) · xᵀL_{path}x. So L_{K_n} ≼ O(n³) L_{path}, and λ₂(path) ≥ n / O(n³) = Ω(1/n²).
Preconditioning with a Spanning Tree B: every edge of A not in B has a unique path in B.
When B is a tree: By = c can be solved in O(n) time by elimination, and applying the fundamental inequality along each tree path bounds λ_max(A,B).
Low-Stretch Spanning Trees. Theorem (Boman-Hendrickson ’01): κ(A,B) ≤ Σ_{(i,j)∈E(A)} stretch_B(i,j), the total stretch of A with respect to the tree B. Theorem (Alon-Karp-Peleg-West ’91): every graph has a spanning tree of average stretch 2^{O(√(log n · log log n))}. Theorem (Elkin-Emek-S-Teng ’04): every graph has a spanning tree of average stretch O(log² n · log log n).
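To make stretch concrete, a small Python sketch (my illustration, unweighted case): the stretch of an edge is the length of the tree path joining its endpoints, and the total stretch sums this over all edges of the graph.

```python
from collections import deque

def tree_path_len(tree_adj, u, v):
    # BFS in the tree from u to v (fine for small examples).
    prev = {u: None}
    q = deque([u])
    while q:
        x = q.popleft()
        if x == v:
            break
        for y in tree_adj[x]:
            if y not in prev:
                prev[y] = x
                q.append(y)
    d, x = 0, v
    while prev[x] is not None:
        d, x = d + 1, prev[x]
    return d

def total_stretch(edges, tree_edges, n):
    tree_adj = {i: [] for i in range(n)}
    for u, v in tree_edges:
        tree_adj[u].append(v)
        tree_adj[v].append(u)
    return sum(tree_path_len(tree_adj, u, v) for u, v in edges)

# 4-cycle with spanning path 0-1-2-3: edge (0,3) has stretch 3,
# each tree edge has stretch 1, so the total stretch is 6.
print(total_stretch([(0,1),(1,2),(2,3),(0,3)], [(0,1),(1,2),(2,3)], 4))
```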
Vaidya’s Augmented Spanning Trees: B is a spanning tree plus s edges, n − 1 + s edges in total.
Adding Edges to a Tree
Adding Edges to a Tree. Partition the tree into t sub-trees, balancing stretch. For pairs of sub-trees connected in A, add one bridge edge between them, chosen carefully.
Adding Edges to a Tree: t sub-trees. Theorem: this improves the condition number in general, and further if the graph is planar.
Sparsification: all graphs can be well-approximated by a sparse graph. Feder-Motwani ’91; Benczúr-Karger ’96; Eppstein, Galil, Italiano, Spencer ’93; Eppstein, Galil, Italiano, Nissenzweig ’97.
Sparsification. Benczúr-Karger ’96: can find a sparse subgraph H such that every cut of H has weight within (1 ± ε) of the corresponding cut of G (the edges of H are a subset of G’s, but with different weights); H has O(n log n / ε²) edges. We need the stronger spectral guarantee: xᵀL_Hx ≈ xᵀL_Gx for all x.
Example: Complete Graph. If A is the Laplacian of K_n, all non-zero eigenvalues equal n. If B is the Laplacian of a d-regular Ramanujan expander, scaled by n/d, all non-zero eigenvalues satisfy |λ − n| ≤ (2√(d−1)/d) n. And so κ(A,B) ≤ (1 + 2√(d−1)/d) / (1 − 2√(d−1)/d) = 1 + O(1/√d).
Example: Dumbbell. Two copies of K_n joined by a middle edge. If B does not contain the middle edge, then B is disconnected while A is not, and κ(A,B) is unbounded.
Example: Grid plus edge. [Figure: a grid graph with one extra edge; all weights 1.] Random sampling is not sufficient; cut approximation is not sufficient.
Conductance. A cut is a partition of the vertices (S, V − S). Conductance of S: Φ(S) = w(E(S, V − S)) / min(d(S), d(V − S)), where d(S) is the total degree of S. Conductance of G: Φ_G = min_S Φ(S).
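A small Python sketch of this definition (my illustration; the two-triangles example is invented):

```python
import numpy as np

# Phi(S) = |E(S, V-S)| / min(d(S), d(V-S)) for an unweighted graph.
def conductance(adj, S):
    mask = np.zeros(adj.shape[0], dtype=bool)
    mask[list(S)] = True
    cut = adj[mask][:, ~mask].sum()     # edges leaving S
    d = adj.sum(axis=1)                 # degrees
    return cut / min(d[mask].sum(), d[~mask].sum())

# Two triangles joined by a single edge: the middle cut has conductance 1/7.
adj = np.zeros((6, 6))
for u, v in [(0,1),(1,2),(0,2),(3,4),(4,5),(3,5),(2,3)]:
    adj[u, v] = adj[v, u] = 1
print(conductance(adj, [0, 1, 2]))      # 1/7 ~= 0.143
```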
Conductance and Sparsification. If conductance is high (an expander), we can precondition by random sampling. If conductance is low, we can partition the graph by removing few edges. Decomposition: a partition of the vertex set that removes few edges, such that the graph on each part has high conductance.
Graph Decomposition. Lemma: there exists a partition of the vertices V = V₁ ∪ ⋯ ∪ V_k such that each V_i induces a graph of large conductance, and at most half the edges cross the partition. Alg: sample the edges within the parts; recurse on the crossing edges.
Graph Decomposition Exists. Proof: let S be the largest set with d(S) ≤ d(V)/2 whose conductance is small. If some set inside V − S had small conductance, it could be merged into S, contradicting maximality; so V − S induces a graph of high conductance. If S is big, then V − S is not too big; if S is small, we only recurse inside S. Either way: bounded recursion depth.
Sparsification from Graph Decomposition. Alg: sample the edges within the parts; recurse on the crossing edges. Theorem: there exists B with n log^{O(1)} n edges approximating A. Remaining problem: find the partition in nearly-linear time.
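A schematic Python sketch of the decompose-sample-recurse control flow (my illustration, not the paper’s algorithm); `decompose` and `sample` are stand-ins a real implementation must supply, and the toy versions below exist only so the sketch runs.

```python
# Sketch: sparsify by decomposing into high-conductance parts, sampling
# inside the parts, and recursing on the crossing edges.
def sparsify(edges, n, decompose, sample, depth=0):
    if len(edges) <= n or depth > 30:    # depth guard for the toy stand-ins
        return list(edges)
    parts = decompose(edges, n)          # part id per vertex
    inside = [e for e in edges if parts[e[0]] == parts[e[1]]]
    crossing = [e for e in edges if parts[e[0]] != parts[e[1]]]
    # With a proper decomposition, at most half the edges cross, so the
    # recursion depth is logarithmic.
    return sample(inside) + sparsify(crossing, n, decompose, sample, depth + 1)

# Toy stand-ins: split by vertex parity; "sample" keeps every other edge.
edges = [(i, j) for i in range(8) for j in range(i + 1, 8)]   # K8
out = sparsify(edges, 8, lambda e, n: [v % 2 for v in range(n)],
               lambda es: es[::2])
print(len(edges), "->", len(out))
```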
Sparsification. Thm: given G, find a subgraph H with n log^{O(1)} n edges approximating G, in nearly-linear time. Thm: for all A, find a preconditioner B with n log^{O(1)} n edges. General solve time: m log^{O(1)} n.
Cheeger’s Inequality [Sinclair-Jerrum ’89]: for the Laplacian L, and D the diagonal matrix with D_{ii} = degree of node i, Φ_G²/2 ≤ λ₂(D^{−1}L) ≤ 2 Φ_G.
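A numeric check of the inequality on a 6-cycle (my illustration; this follows the normalization stated above, which is one of several in common use):

```python
import numpy as np
from itertools import combinations

n = 6
adj = np.zeros((n, n))
for i in range(n):
    adj[i, (i + 1) % n] = adj[(i + 1) % n, i] = 1
d = adj.sum(axis=1)
L = np.diag(d) - adj
# lambda_2 of D^{-1/2} L D^{-1/2}, which equals lambda_2 of D^{-1}L
lam2 = np.linalg.eigvalsh(np.diag(1/np.sqrt(d)) @ L @ np.diag(1/np.sqrt(d)))[1]

def phi(S):
    m = np.zeros(n, dtype=bool); m[list(S)] = True
    return adj[m][:, ~m].sum() / min(d[m].sum(), d[~m].sum())

# Brute-force the graph conductance over all small vertex subsets.
best = min(phi(S) for k in range(1, n // 2 + 1)
                  for S in combinations(range(n), k))
print(best**2 / 2, "<=", lam2, "<=", 2 * best)   # 0.056 <= 0.5 <= 0.667
```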
Random Sampling. Given a graph A, randomly sample to get Ã so that ‖D^{−1/2}(A − Ã)D^{−1/2}‖ is small, where D is the diagonal matrix of degrees in A. Useful if λ₂ is big: if ‖D^{−1/2}(A − Ã)D^{−1/2}‖ ≤ ε and λ₂(D^{−1}L) ≥ λ, then for all x orthogonal to the (mutual) nullspace, xᵀL̃x is within a 1 ± ε/λ factor of xᵀLx.
But we don’t want D in there. D has no impact: the same D appears on both sides, so the D-normalized bound already implies xᵀL̃x ≈ xᵀLx. Work with the adjacency matrix: Ã_{ij} = weight of the edge from i to j, and 0 if i = j. No difference, because L = D − A and the change in the diagonal is controlled by the same sampling bound.
Random Sampling Rule. Choose a parameter λ governing the sparsity of Ã. Keep edge (i,j) with probability p_{ij} = min(1, λ/min(d_i, d_j)); if the edge is kept, raise its weight by a factor 1/p_{ij}. The reweighting guarantees E[Ã] = A. And the rule guarantees we expect at most λn edges in Ã, since Σ_{(i,j)∈E} λ/min(d_i, d_j) ≤ λ Σ_{(i,j)∈E} (1/d_i + 1/d_j) = λn.
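The rule in code (my sketch for the unweighted case, following the min-degree probability reconstructed above):

```python
import numpy as np

# Keep edge (i,j) with probability min(1, lam / min(d_i, d_j)); if kept,
# scale its weight by 1/p so the sampled graph equals the original
# in expectation.
def sample_graph(edges, deg, lam, rng):
    kept = []
    for i, j in edges:
        p = min(1.0, lam / min(deg[i], deg[j]))
        if rng.random() < p:
            kept.append((i, j, 1.0 / p))
    return kept

rng = np.random.default_rng(0)
edges = [(i, j) for i in range(4) for j in range(i + 1, 4)]   # K4
print(sample_graph(edges, [3, 3, 3, 3], 1.5, rng))  # each edge kept w.p. 1/2
```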
Random Sampling Theorem: with high probability, ‖D^{−1/2}(A − Ã)D^{−1/2}‖ is small (shrinking as λ grows). Useful for graphs of high conductance. Will prove it in the unweighted case; from now on, all edges have weight 1.
Analysis by Trace. Trace = sum of diagonal entries = sum of eigenvalues. For even k: ‖M‖^k ≤ Tr(M^k) = Σ_i λ_i^k.
Analysis by Trace. Main Lemma: a bound on E[Tr(M^k)], for M = D^{−1/2}(A − Ã)D^{−1/2}. Proof of Theorem: apply Markov’s inequality to Tr(M^k), then take the k-th root.
Expected Trace. E[Tr(M^k)] = Σ over closed walks v₀, v₁, …, v_k = v₀ of E[ Π_i M_{v_i v_{i+1}} ]. [Figure: a closed walk of length 8 on vertices 1, 2, 3, 4.]
Expected Trace. Most terms are zero, because each entry of M has expectation zero. So the sum is zero unless each edge appears at least twice. We will code such walks in order to count them.
Coding walks. For a walk v₀, v₁, …, v_k, let S = {i : the edge (v_{i−1}, v_i) was not used before, in either direction}. For each i ∉ S, record τ(i) = the index of the step at which the edge was used before, in either direction.
Coding walks: Example. step: 0, 1, 2, 3, 4, 5, 6, 7, 8; vert: 1, 2, 3, 4, 2, 3, 4, 2, 1. New edges appear at steps 1, 2, 3, 4, so S = {1, 2, 3, 4}, with edges (1,2), (2,3), (3,4), (4,2). The remaining steps reuse edges: τ(5) = 2, τ(6) = 3, τ(7) = 4, τ(8) = 1.
Valid codes. For each i ∈ S: v_i must be a neighbor of v_{i−1} whose probability of being chosen is < 1 (that is, M_{v_{i−1} v_i} can be non-zero). For each i ∉ S: the walk can take the edge indicated by τ(i) (that is, v_i is determined by the code).
Expected Trace from Code. Every valid code determines its walk, so E[Tr(M^k)] is bounded by summing, over valid codes, the contribution of the corresponding walk; counting the ways to choose v_i given the code bounds the sum.
Random Sampling Theorem (recap): useful when conductance is high. In short: random sampling sparsifies graphs of high conductance.
Graph Partitioning Algorithms. SDP/LP: too slow. Spectral: one cut quickly, but it can be unbalanced, requiring many runs. Multilevel (Chaco/Metis): can’t analyze, and may miss small sparse cuts. New algorithm: based on truncated random walks; approximates the optimal balance; can be used to decompose; run time proportional to the number of nodes removed.
Lazy Random Walk. At each step: stay put with probability ½; otherwise, move to a neighbor with probability proportional to the edge weight. Diffusing probability mass: keep ½ for self, distribute the rest among the neighbors.
Lazy Random Walk. Or, with self-loops, distribute the mass evenly over the edges. [Figure: diffusion on a 4-vertex example with degrees 2, 1, 3, 2; successive mass distributions 1/2, 1/2 → 1/12, 1/3, 1/2, 1/12 → 7/48, 1/4, 11/24, 7/48 → … → 2/8, 1/8, 3/8, 2/8.] In the limit, probability mass ∝ degree.
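One diffusion step in code (my sketch, unweighted case; the 3-vertex path is an arbitrary example). The step is p ← ½(p + AD⁻¹p), and iterating converges to mass proportional to degree.

```python
import numpy as np

def lazy_step(adj, p):
    d = adj.sum(axis=0)
    return 0.5 * (p + adj @ (p / d))   # keep half, diffuse half

# Path on 3 nodes, all mass starting at an endpoint.
adj = np.array([[0., 1., 0.], [1., 0., 1.], [0., 1., 0.]])
p = np.array([1.0, 0.0, 0.0])
for _ in range(100):
    p = lazy_step(adj, p)
print(p)   # -> degrees / total = (0.25, 0.5, 0.25)
```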
Why lazy? Otherwise, there might be no limit: on a single edge, all the mass oscillates between the two endpoints forever.
Why so lazy as to keep ½ of the mass? The walk matrix is W = ½(I + AD⁻¹), with A_{ij} = weight of the edge from i to j and D_{ii} = degree of node i, so W_{ij} = probability of going from j to i. Keeping ½ keeps W positive semi-definite, matching the diagonally dominant matrices we solve.
Rate of convergence. Cheeger’s inequality: the convergence rate is governed by conductance. If conductance is low, convergence is slow: starting uniform on S, the probability of leaving at each step is at most Φ(S)/2, so after 1/(2Φ(S)) steps, at least ¾ of the mass is still in S.
Lovász-Simonovits Theorem. If convergence is slow, then conductance is low. And the cut can be found from the highest-probability nodes. [Figure: node probabilities such as .137, .135, .134, .129, .112, .094, with the cut separating high from low.]
Lovász-Simonovits Theorem. From now on, every node has the same degree, d. For all vertex sets S and all t ≤ T, the probability mass on S at step t exceeds its stationary share by an amount controlled by the conductance of the sets of k vertices with the most probability at step t.
Lovász-Simonovits Theorem. If we start from a node in a set S of small conductance, the algorithm will output a set of small conductance. Extension: the output is mostly contained in S.
Speed of Lovász-Simonovits. If we want a cut of conductance φ, we can run for roughly 1/φ² steps. But we want run-time proportional to the number of vertices removed: local clustering on a massive graph. Problem: most vertices can have non-zero mass by the time the cut is found.
Speeding up Lovász-Simonovits: round all small entries of the distribution to zero. Algorithm Nibble. Input: start vertex v, target set size k. Start: all mass on v. Step: one lazy walk step, then map all values below a threshold (set according to k) to zero. Theorem: if v lies in a set of conductance < φ and size k, Nibble outputs a set of small conductance mostly overlapping it.
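A Python sketch in the spirit of this truncated walk (my illustration: the fixed threshold, the ranking by probability/degree, and the stopping rule are simplified stand-ins for the paper’s choices).

```python
import numpy as np

def nibble_sketch(adj, v, eps, steps):
    d = adj.sum(axis=0)
    n = adj.shape[0]
    p = np.zeros(n)
    p[v] = 1.0
    best_set, best_phi = None, np.inf
    for _ in range(steps):
        p = 0.5 * (p + adj @ (p / d))    # one lazy walk step
        p[p < eps] = 0.0                 # truncate small entries to zero
        order = np.argsort(-p / d)       # rank vertices by probability/degree
        for k in range(1, n):
            S = order[:k]
            if p[S].sum() == 0.0:
                break                    # past the support; stop scanning
            m = np.zeros(n, dtype=bool)
            m[S] = True
            phi = adj[m][:, ~m].sum() / min(d[m].sum(), d[~m].sum())
            if phi < best_phi:
                best_set, best_phi = S.copy(), phi
    return best_set, best_phi

# Two triangles joined by an edge, started inside one triangle:
adj = np.zeros((6, 6))
for u, w in [(0,1),(1,2),(0,2),(3,4),(4,5),(3,5),(2,3)]:
    adj[u, w] = adj[w, u] = 1
print(nibble_sketch(adj, 0, 1e-3, 20))   # finds {0,1,2}, conductance 1/7
```

The truncation is what keeps the work local: only vertices carrying non-negligible mass are ever touched.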
The Potential Function. For integer k: I_t(k) = the total probability mass on the k vertices with the most probability at step t. Linear in between these points.
Concave: the slopes decrease, because the sorted probabilities are non-increasing; for larger k, each additional vertex adds less mass.
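The curve in code (my sketch; the probability vector is arbitrary): sorted cumulative mass, extended piecewise-linearly between integer points.

```python
import numpy as np

# I_t(k): total mass on the k highest-probability vertices, linear in between.
def potential(p, k):
    s = np.sort(p)[::-1]                  # probabilities, decreasing
    c = np.concatenate([[0.0], np.cumsum(s)])
    lo = int(np.floor(k))
    frac = k - lo
    return c[lo] + frac * (s[lo] if lo < len(s) else 0.0)

p = np.array([0.4, 0.3, 0.2, 0.1])
print(potential(p, 2), potential(p, 2.5))   # 0.7 and 0.8; concave in k
```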
Easy Inequality: I_t(k) ≤ I_{t−1}(k); the curve only falls as the walk proceeds.
Fancy Inequality: if every prefix set encountered has conductance at least φ, then I_t(k) ≤ ½ [ I_{t−1}(k − φ k̂) + I_{t−1}(k + φ k̂) ], where k̂ = min(k, n − k).
Chords with little progress
Dominating curve, makes progress
Proof of Easy Inequality. Order the vertices by probability mass at times t−1 and t. If the top k at time t−1 only connect to the top k at time t, we get equality. Otherwise, some mass leaves, and we get strict inequality.
External Edges from Self-Loops. Lemma: for every set S and every set R of the same size, at least a Φ(S) fraction of the edges from S do not hit R. [Tight example: Φ(S) = 1/2.] Proof: when R = S, this holds by the definition of Φ(S). Each vertex in R − S can absorb d edges from S, but each vertex of S − R has d self-loops that do not go to R.
Proof of Fancy Inequality. At time t−1, split the mass reaching the top-k set at time t into top, in, and out parts. Then I_t(k) = top + in ≤ top + (in + out)/2 = top/2 + (top + in + out)/2, using in ≤ out from the self-loop lemma.
Local Clustering. Theorem: if S is a set of conductance < φ, and v is a random vertex of S, then we output a set of small conductance, mostly in S, in time proportional to the size of the output. Can it be done for all conductances???
Experimental code: ClusTree 1. Cluster, crudely 2. Make trees in clusters 3. Add edges between trees, optimally No recursion on reduced matrices
Implementation. ClusTree: in Java; its timing is not included, which could increase total time by 20%. Compared against: PCG; Cholesky; drop-tolerance incomplete Cholesky; Vaidya (TAUCS: Chen, Rotkin, Toledo). Orderings: AMD, GENMMD, Metis, RCM. Machine: Intel Xeon 3.06 GHz, 512 KB L2 cache, 1 MB L3 cache.
2D grid, Neumann boundary; run to residual error 10⁻⁸. [Plot.]
Impact of boundary conditions. [Plots: 2D grid, Dirichlet vs. 2D grid, Neumann.]
2D Unstructured Delaunay Mesh. [Plots: Dirichlet and Neumann.]
Future Work. Practical local clustering. More sparsification. Other physical problems. Cheeger’s inequality: implications for combinatorics, and spectral hypergraph theory.
To learn more: “Nearly-Linear Time Algorithms for Graph Partitioning, Graph Sparsification, and Solving Linear Systems” (STOC ’04, arXiv), which will be split into two papers, one numerical and one combinatorial; and my lecture notes for Spectral Graph Theory and its Applications.