
1 Fast, Randomized Algorithms for Partitioning, Sparsification, and the Solution of Linear Systems
Daniel A. Spielman (MIT). Joint work with Shang-Hua Teng (Boston University).

2 Papers
Nearly-Linear Time Algorithms for Graph Partitioning, Graph Sparsification, and Solving Linear Systems (S-Teng '04)
The Mixing Rate of Markov Chains, an Isoperimetric Inequality, and Computing the Volume (Lovász-Simonovits '93)
The Eigenvalues of Random Symmetric Matrices (Füredi-Komlós '81)

3 Overview
Preconditioning to solve a linear system Ax = b: find B that approximates A, such that solving By = c is easy.
Three techniques:
Augmented spanning trees [Vaidya '90]
Sparsification: approximate a graph by a sparse graph
Partitioning: run time proportional to the number of nodes removed

4 Outline
Linear system solvers
Combinatorial and spectral graph theory
Sparsification using partitioning and random sampling
Graph partitioning by truncated random walks
Eigenvalues of random graphs

5 Diagonally Dominant Matrices
Solve Ax = b, where each diagonal entry of A is at least the sum of the absolute values of the other entries in its row: A(i,i) >= sum_{j != i} |A(i,j)|.
Example: Laplacian matrices of weighted graphs. A(i,j) = negative of the weight of the edge from i to j, and each diagonal entry is the weighted degree of its node. The all-ones vector is in the nullspace, so just solve for x orthogonal to it.
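To make the definitions concrete, here is a small sketch (ours, not from the talk) that builds the Laplacian of a weighted graph and checks diagonal dominance:

```python
import numpy as np

def laplacian(n, weighted_edges):
    """Laplacian of a weighted graph: L[i, j] = -w(i, j) for an edge,
    and L[i, i] = weighted degree of node i."""
    L = np.zeros((n, n))
    for i, j, w in weighted_edges:
        L[i, j] -= w
        L[j, i] -= w
        L[i, i] += w
        L[j, j] += w
    return L

L = laplacian(3, [(0, 1, 1.0), (1, 2, 2.0)])
offdiag = np.abs(L - np.diag(np.diag(L))).sum(axis=1)
assert np.all(np.diag(L) >= offdiag)       # diagonally dominant
assert np.allclose(L @ np.ones(3), 0)      # all-ones vector is in the nullspace
```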

6 Complexity of Solving Ax = b, A positive semi-definite
General direct methods: Gaussian elimination (Cholesky), O(n^3); fast matrix inversion, O(n^2.376); conjugate gradient, O(mn).
Here n is the dimension and m is the number of non-zeros.

7 Complexity of Solving Ax = b, A positive semi-definite and structured
Express A = L L^T (Cholesky factorization).
Path: forward and backward elimination.
Tree: like a path, working up from the leaves.
Planar: nested dissection [Lipton-Rose-Tarjan '79].

8 Iterative Methods: Preconditioned Conjugate Gradient
Find an easy B that approximates A. Solve Ax = b in time proportional to sqrt(κ(A,B)) times the cost per iteration, times log(1/eps) (omit this factor for the rest of the talk).
Two costs: the quality of the approximation, κ(A,B), and the time to solve By = c.
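A minimal PCG sketch using SciPy, with B factored once and applied as the preconditioner; the matrices here are illustrative stand-ins, not the talk's construction:

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import cg, splu, LinearOperator

n = 200
main = 2.0 * np.ones(n)
off = -np.ones(n - 1)
A = sp.diags([off, main, off], [-1, 0, 1], format="csc")  # the system
B = sp.diags([main], [0], format="csc")                   # an "easy" approximation of A

B_factor = splu(B)                                 # factor B once: solving By = c is cheap
M = LinearOperator((n, n), matvec=B_factor.solve)  # preconditioner as an operator

b = np.random.default_rng(0).standard_normal(n)
x, info = cg(A, b, M=M)
assert info == 0
assert np.linalg.norm(A @ x - b) <= 1e-4 * np.linalg.norm(b)
```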

9 Main Result
For symmetric, diagonally dominant A, iteratively solve Ax = b in nearly-linear time, with a further improvement for planar graphs.

10 Vaidya's Subgraph Preconditioners
Precondition A by the Laplacian of a subgraph, B. (Figure: a 4-node graph A and a subgraph B on the same vertices.)

11 History
Vaidya '90: relate κ(A,B) to graph embeddings (congestion/dilation); use the MST; augmented MST; planar bounds.
Gremban, Miller '96: Steiner vertices, many details.
Joshi '97, Reif '98: recursive, multi-level approach; planar.
Bern, Boman, Chen, Gilbert, Hendrickson, Nguyen, Toledo: all the details, algebraic approach.

12 History
Maggs, Miller, Parekh, Ravi, Woo '02: bounds after preprocessing.
Boman, Hendrickson '01: low-stretch spanning trees.
S-Teng '03: augmented low-stretch trees.
S-Teng '04: clustering, partitioning, sparsification, recursion; lower-stretch spanning trees.

13 The relative condition number
κ(A,B) = λ_max(A,B) / λ_min(A,B), the ratio of the extreme generalized eigenvalues of the pair (A, B).
B ≼ A if A - B is positive semi-definite, that is, x^T B x ≤ x^T A x for all x.
For A the Laplacian of a graph with edge set E: x^T A x = sum over (i,j) in E of w_ij (x_i - x_j)^2.
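A sketch (ours) of computing κ(A,B) for two connected-graph Laplacians, restricted to the complement of the shared all-ones nullspace:

```python
import numpy as np
from scipy.linalg import eigh, null_space

def kappa(A, B):
    """kappa(A, B) = lambda_max / lambda_min of the pencil (A, B),
    restricted to vectors orthogonal to the all-ones nullspace."""
    Q = null_space(np.ones((1, A.shape[0])))      # orthonormal basis of 1-perp
    lam = eigh(Q.T @ A @ Q, Q.T @ B @ Q, eigvals_only=True)
    return lam.max() / lam.min()

# A: Laplacian of a triangle; B: Laplacian of its spanning path.
A = np.array([[ 2, -1, -1], [-1,  2, -1], [-1, -1,  2]], float)
B = np.array([[ 1, -1,  0], [-1,  2, -1], [ 0, -1,  1]], float)
print(kappa(A, B))   # 3.0; B is a subgraph of A, so lambda_min >= 1
```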

14 Bounding κ(A,B)
For B a subgraph of A with the same weights, x^T B x ≤ x^T A x for all x, so λ_min(A,B) ≥ 1 and κ(A,B) ≤ λ_max(A,B).

15 Fundamental Inequality
For the path 1, 2, ..., 8: (x_1 - x_8)^2 ≤ 7 [(x_1 - x_2)^2 + (x_2 - x_3)^2 + ... + (x_7 - x_8)^2].
In general, for a path with k edges, (x_1 - x_{k+1})^2 ≤ k sum_i (x_i - x_{i+1})^2, by Cauchy-Schwarz.

17 Application to eigenvalues of graphs
λ_1 = 0 for Laplacian matrices, and λ_2 = min over x orthogonal to the all-ones vector of (x^T L x)/(x^T x).
Example: for the complete graph on n nodes, all non-zero eigenvalues equal n. For the path on 8 nodes, the test vector x = (-7, -5, -3, -1, 1, 3, 5, 7) certifies that λ_2 is small.

18 Lower bound on λ_2 [Guattery-Leighton-Miller '97]

19 Lower bound on λ_2
So λ_2 can also be bounded from below, by embedding edges as paths and applying the fundamental inequality.

20 Preconditioning with a Spanning Tree B
Every edge of A not in B has a unique path in B.

21 When B is a Tree
Applying the fundamental inequality to each edge of A along its tree path bounds x^T A x by a multiple of x^T B x, giving λ_max(A,B) at most the total stretch of A's edges over B.

22 Low-Stretch Spanning Trees
Theorem (Boman-Hendrickson '01): λ_max(A,B) ≤ st_B(A), the total stretch of A's edges over the tree B.
Theorem (Alon-Karp-Peleg-West '91): every graph has a spanning tree of average stretch 2^{O(sqrt(log n log log n))}.
Theorem (Elkin-Emek-S-Teng '04): average stretch O(log^2 n log log n).
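A small sketch (ours) of total stretch in the unweighted case: each edge's stretch is the length of its tree path; the helper names are ours.

```python
from collections import deque

def tree_path_length(tree_adj, u, v):
    """Number of tree edges on the unique u-v path (BFS on the tree)."""
    parent = {u: None}
    q = deque([u])
    while q:
        x = q.popleft()
        if x == v:
            break
        for y in tree_adj[x]:
            if y not in parent:
                parent[y] = x
                q.append(y)
    length, x = 0, v
    while parent[x] is not None:
        x, length = parent[x], length + 1
    return length

# Unweighted example: the 6-cycle, with the path 0-1-2-3-4-5 as spanning tree.
edges = [(i, (i + 1) % 6) for i in range(6)]
tree_adj = {i: [] for i in range(6)}
for u, v in [(i, i + 1) for i in range(5)]:
    tree_adj[u].append(v)
    tree_adj[v].append(u)

total_stretch = sum(tree_path_length(tree_adj, u, v) for u, v in edges)
print(total_stretch)  # 10: five tree edges stretch 1 each, the closing edge stretches 5
```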

23 Vaidya's Augmented Spanning Trees
B is a spanning tree plus s extra edges, n - 1 + s edges in total.

24 Adding Edges to a Tree

25 Adding Edges to a Tree Partition tree into t sub-trees, balancing stretch

26 Adding Edges to a Tree
Partition the tree into t sub-trees, balancing stretch. For sub-trees connected in A, add one bridge edge between them, chosen carefully.

27 Adding Edges to a Tree
With t sub-trees, the theorem bounds κ(A,B) in terms of m and t, with an improved bound in general and a further improvement for planar graphs.

28 Sparsification
All graphs can be well-approximated by a sparse graph.
Feder-Motwani '91; Benczur-Karger '96; Eppstein, Galil, Italiano, Spencer '93; Eppstein, Galil, Italiano, Nissenzweig '97.

29 Sparsification
Benczur-Karger '96: can find a sparse subgraph H whose cuts all approximate those of G (the edges are a subset, but with different weights), with O(n log n / eps^2) edges.
We need the stronger spectral guarantee that κ(A, B) be small; the H we construct has a nearly-linear number of edges.

30 Example: Complete Graph
If A is the Laplacian of K_n, all non-zero eigenvalues equal n. If B is the Laplacian of a d-regular Ramanujan expander, scaled by n/d, all non-zero eigenvalues satisfy |λ - n| ≤ (2 sqrt(d-1)/d) n.
And so κ(A,B) ≤ (d + 2 sqrt(d-1)) / (d - 2 sqrt(d-1)) = 1 + O(1/sqrt(d)).
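A quick numerical illustration (ours, not from the talk), with a random regular graph standing in for a Ramanujan expander; networkx is assumed to be available:

```python
import numpy as np
import networkx as nx

n, d = 500, 10
H = nx.random_regular_graph(d, n, seed=1)        # a near-expander with high probability
L = nx.laplacian_matrix(H).toarray().astype(float)
lam = np.sort(np.linalg.eigvalsh((n / d) * L))   # scale so eigenvalues compare to K_n's
print(lam[1] / n, lam[-1] / n)                   # both close to 1, so kappa(K_n, H) is small
```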

31 Example: Dumbbell
Two copies of K_n joined by a single middle edge. If B does not contain the middle edge, no multiple of B dominates A, and κ(A,B) is unbounded.

32 Example: Grid plus edge
A grid of weight-1 edges plus one extra edge. Random sampling is not sufficient, and cut approximation is not sufficient: the one extra edge must be kept.

33 Conductance
Cut = partition of the vertices into S and V - S.
Conductance of S: Φ(S) = (weight of edges crossing the cut) / min(vol(S), vol(V - S)), where vol is the total degree.
Conductance of G: Φ_G = min over S of Φ(S).
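The definition in code (ours), for a weighted adjacency matrix:

```python
import numpy as np

def conductance(adj, S):
    """Phi(S) = w(S, V-S) / min(vol(S), vol(V-S)) for a weighted
    adjacency matrix `adj` and a set S of vertex indices."""
    n = adj.shape[0]
    mask = np.zeros(n, dtype=bool)
    mask[list(S)] = True
    cut = adj[mask][:, ~mask].sum()          # weight crossing the cut
    vol = adj.sum(axis=1)                    # weighted degrees
    return cut / min(vol[mask].sum(), vol[~mask].sum())

# Example: two triangles joined by one edge have a low-conductance cut.
A = np.zeros((6, 6))
for u, v in [(0,1),(1,2),(0,2),(3,4),(4,5),(3,5),(2,3)]:
    A[u, v] = A[v, u] = 1.0
print(conductance(A, {0, 1, 2}))  # 1/7
```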

34 Conductance
(Figure: example cuts S with high and low conductance.)

35 Conductance and Sparsification
If conductance is high (an expander), we can precondition by random sampling. If conductance is low, we can partition the graph by removing few edges.
Decomposition: a partition of the vertex set that removes few edges, with the graph on each part having high conductance.

36 Graph Decomposition
Lemma: there exists a partition of the vertices such that each V_i has large conductance and at most half the edges cross the partition.
Alg: sample the edges inside each part; recurse on the edges crossing the partition.

37 Graph Decomposition Exists
Lemma: there exists a partition of the vertices such that each V_i has high conductance and at most half the edges cross.
Proof: let S be the largest low-conductance set, and consider whether S is large or small.

39 Graph Decomposition Exists
Proof (continued): if S is big, V - S is not too big; if S is small, we only recurse inside S. Either way, the recursion depth is bounded.

40 Sparsification from Graph Decomposition
Alg: sample the edges inside each part; recurse on the edges crossing the partition.
Theorem: there exists a sparsifier B with a nearly-linear number of edges. We still need to find the partition in nearly-linear time.

41 Sparsification
Thm: given G, find a subgraph H with a nearly-linear number of edges, in nearly-linear time.
Thm: for all A, find such a B. This yields the general solve time.

42 Cheeger's Inequality
[Sinclair-Jerrum '89]: for the Laplacian L, and D the diagonal matrix with D(i,i) = degree of node i:
Φ_G^2 / 2 ≤ λ_2(D^{-1/2} L D^{-1/2}) ≤ 2 Φ_G.
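A brute-force check of the inequality on a tiny graph (ours); min_conductance enumerates all cuts, so it is for demonstration only:

```python
import numpy as np
from itertools import combinations

def normalized_lambda2(A):
    """lambda_2 of D^{-1/2} L D^{-1/2} for adjacency matrix A."""
    d = A.sum(axis=1)
    Dinv = np.diag(1.0 / np.sqrt(d))
    N = Dinv @ (np.diag(d) - A) @ Dinv
    return np.sort(np.linalg.eigvalsh(N))[1]

def min_conductance(A):
    """Brute-force Phi_G over all cuts (tiny graphs only)."""
    n = A.shape[0]
    d = A.sum(axis=1)
    best = np.inf
    for r in range(1, n // 2 + 1):
        for S in combinations(range(n), r):
            m = np.zeros(n, bool)
            m[list(S)] = True
            cut = A[m][:, ~m].sum()
            best = min(best, cut / min(d[m].sum(), d[~m].sum()))
    return best

A = np.zeros((6, 6))
for u, v in [(0,1),(1,2),(0,2),(3,4),(4,5),(3,5),(2,3)]:
    A[u, v] = A[v, u] = 1.0
phi, lam2 = min_conductance(A), normalized_lambda2(A)
assert phi**2 / 2 <= lam2 <= 2 * phi
```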

43 Random Sampling
Given a graph, randomly sample to get a graph whose Laplacian L~ satisfies that ||D^{-1/2} (L - L~) D^{-1/2}|| is small, where D is the diagonal matrix of degrees in G.
Useful if Φ_G is big.

44 Useful if Φ_G is big
If ||D^{-1/2} (L - L~) D^{-1/2}|| is small and Φ_G is big, then for all x orthogonal to the (mutual) nullspace, x^T L~ x approximates x^T L x up to a small relative error.

45 But, don't want D in there
D has no impact: the degrees of the sampled graph approximate those of G. So the bound stated with D implies the bound we want.

46 Work with adjacency matrix
A(i,j) = weight of the edge from i to j, and 0 if i = j. No difference, because L = D - A.

47 Random Sampling Rule
Choose a parameter k governing the sparsity of the sampled graph. Keep edge (i,j) with probability p_ij = min(1, k / min(d_i, d_j)). If the edge is kept, raise its weight by a factor 1/p_ij.

48 Random Sampling Rule
Reweighting by 1/p_ij guarantees that the sampled graph equals the original in expectation. And the choice of p_ij guarantees that we expect at most about kn edges in the sample.

49 Random Sampling Theorem
With high probability, the sampled adjacency matrix is close to the original in the scaled norm. Useful for graphs of high conductance.
Will prove in the unweighted case: from now on, all edges have weight 1.

50 Analysis by Trace
Trace = sum of diagonal entries = sum of eigenvalues.
For even k, ||M||^k ≤ Tr(M^k), since the eigenvalues of M^k are the k-th powers of those of M and all are non-negative.
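A numeric sanity check (ours) of the even-power trace bound:

```python
import numpy as np

rng = np.random.default_rng(0)
M = rng.standard_normal((50, 50))
M = (M + M.T) / 2                    # symmetric test matrix
k = 6                                # any even power works
norm_k = np.linalg.norm(M, 2) ** k   # spectral norm ||M||^k
trace_k = np.trace(np.linalg.matrix_power(M, k))
assert norm_k <= trace_k + 1e-9      # ||M||^k <= Tr(M^k) for even k
```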

51 Analysis by Trace
Main Lemma: a bound on the expected trace E[Tr(M^k)].
Proof of the Theorem: apply Markov's inequality to Tr(M^k) and take the k-th root.

52 Expected Trace
(Figure: Tr(M^k) expands into a sum over closed walks of length k; the example walk uses edges (1,2), (2,3), (3,4), (4,2), (2,1).)

53 Expected Trace
Most terms are zero, because each entry of the sampled-minus-original matrix has expectation zero, and entries for distinct edges are independent.
So the sum is zero unless each edge appears at least twice. We will code such walks to count them.

54 Coding walks
For a sequence of steps, let S = {i : the edge used at step i was not used before, in either direction}. For each step not in S, record the index of the step where its edge was used before, in either direction.

55 Coding walks: Example
step: 0, 1, 2, 3, 4, 5, 6, 7, 8
vert: 1, 2, 3, 4, 2, 3, 4, 2, 1
S = {1, 2, 3, 4}: the steps using a new edge (1-2, 2-3, 3-4, 4-2).
The remaining steps reuse earlier edges: step 5 reuses the edge of step 2, step 6 of step 3, step 7 of step 4, and step 8 of step 1.

60 Valid s For each i in S is a neighbor of
with a probability of being chosen < 1 That is, can be non-zero For each i not in S can take edge indicated by That is

61 Expected Trace from Code

62 Expected Trace from Code
Counting the number of ways to choose the walk given its code bounds the expected trace.

64 Random Sampling Theorem
Useful for graphs of high conductance: random sampling sparsifies graphs of high conductance.

65 Graph Partitioning Algorithms
SDP/LP: too slow. Spectral: finds one cut quickly, but it can be unbalanced, requiring many runs. Multilevel (Chaco/Metis): can't be analyzed, and misses small sparse cuts.
New Alg: based on truncated random walks. Approximates the optimal balance, can be used to decompose, nearly-linear run time.

66 Lazy Random Walk
At each step: stay put with probability 1/2; otherwise, move to a neighbor chosen according to the edge weights.
Diffusing probability mass: keep 1/2 for self, distribute the rest among the neighbors.
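A sketch of the diffusion (ours): the lazy walk matrix W = (I + A D^{-1}) / 2 applied repeatedly to a point mass:

```python
import numpy as np

def lazy_walk_matrix(A):
    """W = (I + A D^{-1}) / 2: column-stochastic lazy walk on a
    weighted adjacency matrix A (keep half, spread half by weight)."""
    d = A.sum(axis=0)
    return 0.5 * (np.eye(A.shape[0]) + A / d)

A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], float)
W = lazy_walk_matrix(A)
p = np.array([1.0, 0, 0, 0])          # all mass on vertex 0
for _ in range(100):
    p = W @ p
assert np.allclose(p, A.sum(axis=0) / A.sum(), atol=1e-6)  # limit ~ degree
```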

67 Lazy Random Walk
Diffusing probability mass: keep 1/2 for self, distribute the rest among the neighbors. Or, with self-loops, distribute evenly over the edges.
(Figures: successive diffusion steps on a small example graph.) In the limit, the probability mass at a node is proportional to its degree.

72 Why lazy?
Otherwise, there might be no limit: on a single edge, the mass oscillates between the two endpoints forever.

73 Why so lazy as to keep 1/2 the mass?
It matches diagonally dominant matrices: entry (i,j) is the weight of the edge from i to j, entry (i,i) is the degree of node i; after normalizing, entry (i,j) is the probability of going from i to j.

74 Rate of convergence
Cheeger's inequality: the convergence rate is related to conductance. If conductance is low, convergence is slow.
If we start uniform on S, the probability mass leaving at each step is at most Φ(S), so after about 1/(4 Φ(S)) steps, at least 3/4 of the mass is still in S.

75 Lovász-Simonovits Theorem
If convergence is slow, then conductance is low. And we can find the cut from the highest-probability nodes.
(Figure: node probabilities .137, .135, .134, .129, .112, .094; sorting them reveals the cut.)

76 Lovász-Simonovits Theorem
From now on, every node has the same degree d. For all vertex sets S and all t ≤ T, the curve I_t bounds the progress of the walk, where I_t(k) is the probability mass on the k vertices with the most probability at step t.

77 Lovász-Simonovits Theorem
I_t(k) = the mass on the k vertices with the most probability at step t.
If we start from a node in a set S with small conductance, the walk outputs a set of small conductance. Extension: the output is mostly contained in S.

78 Speed of Lovász-Simonovits
To find a cut of conductance φ, we can run for roughly 1/φ^2 steps. But we want run-time proportional to the number of vertices removed: local clustering on a massive graph.
Problem: most vertices can have non-zero mass by the time the cut is found.

79 Speeding up Lovász-Simonovits
Round all small entries of the probability vector to zero.
Algorithm Nibble. Input: start vertex v, target set size k. Start: all mass on v. Step: take a lazy walk step, then map all values below a threshold to zero.
Theorem: if v is in a set of conductance below the bound and of size k, Nibble outputs a set of small conductance, mostly overlapping that set.
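A sketch of the truncated walk at Nibble's core (ours); the threshold rule is a simplification of the paper's, with eps as a tuning parameter:

```python
import numpy as np

def truncated_walk(A, v, steps, eps):
    """Truncated lazy walk: after each step, round entries below
    eps * degree to zero, so only a small neighborhood of the start
    vertex v ever holds mass."""
    d = A.sum(axis=0)
    p = np.zeros(A.shape[0])
    p[v] = 1.0
    for _ in range(steps):
        p = 0.5 * (p + A @ (p / d))        # lazy step: W = (I + A D^{-1}) / 2
        p[p < eps * d] = 0.0               # truncation keeps the support small
    return p

# Sweep cut: sort by p / degree and take the best-conductance prefix (not shown).
```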

80 The Potential Function
I_t(k) = the probability mass on the k vertices with the most probability at step t, for integer k; linear in between these points.
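The curve in code (ours), reusing the sample probabilities from the earlier slide:

```python
import numpy as np

def ls_curve(p):
    """Lovász-Simonovits potential: I(k) = sum of the k largest
    entries of p, extended piecewise-linearly between integers."""
    return np.concatenate([[0.0], np.cumsum(np.sort(p)[::-1])])

p = np.array([0.137, 0.135, 0.094, 0.134, 0.112, 0.129])
I = ls_curve(p)
# Concavity: the slopes I(k+1) - I(k) are the sorted entries, which decrease.
assert np.all(np.diff(I, 2) <= 1e-12)
```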

81 Concave: slopes decrease
For j < k, the slope of I_t between j and j+1 is at least its slope between k and k+1: the sorted probabilities decrease.

82 Easy Inequality

83 Fancy Inequality

85 Chords with little progress

86 Dominating curve, makes progress

88 Proof of Easy Inequality
Order the vertices by probability mass at time t-1 and at time t. If the top k at time t-1 only connect to the top k at time t, we get equality. Otherwise, some mass leaves the top k, and we get strict inequality.

90 External edges from Self Loops
Lemma: for every set S, and every set R of the same size, at least a Φ(S) fraction of the edges from S (self-loops included) do not hit R.
(Tight example: degree-3 nodes, with Φ(S) = 1/2.)

91 External edges from Self Loops
Lemma: for every set S, and every set R of the same size, at least a Φ(S) fraction of the edges from S do not hit R.
Proof: when R = S, this holds by definition. Each vertex in R - S can absorb d edges from S, but each vertex of S - R has d self-loops that do not go to R.

92 Proof of Fancy Inequality
At time t-1, split the edges from the top k into "top", "in", and "out".
Then I_t(k) = top + in ≤ top + (in + out)/2 = top/2 + (top + in + out)/2, using the lemma to bound "in" by "out".

93 Local Clustering
Theorem: if S is a set of conductance below the bound and v is a random vertex of S, then the algorithm outputs a set of small conductance, mostly in S, in time proportional to the size of the output.
Can it be done for all conductances?

95 Experimental code: ClusTree
1. Cluster, crudely. 2. Make trees in the clusters. 3. Add edges between the trees, optimally.
No recursion on reduced matrices.

96 Implementation
ClusTree: in Java; timing not included, which could increase the total time by 20%.
Compared against PCG with: Cholesky, drop-tolerance incomplete Cholesky, and Vaidya preconditioners (TAUCS, by Chen, Rotkin, Toledo).
Orderings: amd, genmmd, Metis, RCM.
Machine: Intel Xeon 3.06GHz, 512K L2 cache, 1M L3 cache.

97 2D grid, Neumann boundary; run to residual error 10^-8

98 Impact of boundary conditions
(Figures: 2D grid with Dirichlet vs. Neumann boundary conditions.)

99 2D Unstructured Delaunay Mesh
(Figures: Dirichlet and Neumann boundary conditions.)

100 Future Work
Practical local clustering. More sparsification. Other physical problems: Cheeger's inequality. Implications for combinatorics, and spectral hypergraph theory.

101 To learn more
"Nearly-Linear Time Algorithms for Graph Partitioning, Sparsification, and Solving Linear Systems" (STOC '04, arXiv), which will be split into two papers: numerical and combinatorial.
My lecture notes for Spectral Graph Theory and its Applications.

