CIS 700: “algorithms for Big Data”

CIS 700: “algorithms for Big Data”
Lecture 6: Graph Sketching Slides at Grigory Yaroslavtsev

Sketching Graphs? We know how to sketch vectors: 𝑣→𝑀𝑣
How about sketching graphs? 𝐺 𝑉,𝐸 ≡ 𝐴 𝐺 (adjacency matrix): 𝐴 𝐺 →𝑀 𝐴 𝐺 Sketch columns of 𝐴 𝐺 𝑛= 𝑉 , 𝑚=|𝐸| 𝑂(𝑝𝑜𝑙𝑦( log 𝑛 )) sketch per vertex / 𝑂 (𝑛) total Check connectivity Check bipartiteness As always, space rather than dimension. Why?

Graph Streams Semi-streaming model: [Muthukrishnan ’05; Feigenbaum, Kannan, McGregor, Suri, Zhang’05] Graph defined by the stream of edges 𝑒 1 ,…, 𝑒 𝑚 Space 𝑂 𝑛 , edges processed in order Connectivity is easy on 𝑂 (𝑛) space for insertion-only Dynamic graphs: Stream of insertion/deletion updates + 𝑒 𝑖 1 , − 𝑒 𝑖 2 , …, − 𝑒 𝑖 𝑡 (assume sequence is correct) Resulting graph has edge 𝑒 𝑖 if it wasn’t deleted after the last insertion Linear sketching dynamic graphs: 𝑀 𝐴 𝐺∖𝑒 =𝑀 𝐴 𝐺 − MA e

Distributed Computing
Linear sketches for distributed processing 𝑆 servers with o(m) memory: Send 𝑚/𝑆 edges ( 𝐸 1 ,…, 𝐸 𝑠 ) to each server Compute sketches 𝑀 𝐸 1 ,…, 𝑀 𝐸 𝑠 locally Send sketches to a central server Compute 𝑀 𝐴 𝐺 = 𝑖 𝑠 𝑀 𝐸 𝑖 𝑀 has to have a small representation (same issue as in streaming)

Connectivity Thm. Connectivity is sketchable in 𝑂 (𝑛) space Framework:
Take existing connectivity algorithm (Boruvka) Sketch 𝐴 𝐺 →𝑀 𝐴 𝐺 Run Boruvka on 𝑀 𝐴 𝐺 Important that the sketch is homomorphic w.r.t the algorithm

Part 1: Parallel Connectivity (Boruvka)
Repeat until no edges left: For each vertex, select any incident edge Contract selected edges Lemma: process converges in 𝑂( log 𝑛 ) steps

Part 2: Graph Representation
For a vertex 𝑖 let 𝑎 𝑖 be a vector in ℝ 𝑛 2 Non-zero entries for edges 𝑖,𝑗 𝑎 𝑖 𝑖,𝑗 =1 if 𝑗>𝑖 𝑎 𝑖 𝑖,𝑗 =−1 if j <𝑖 Example: 𝑎 1 = 1, 1, 1, 1, 0, …,0 𝑎 2 = −1, 0, 0, 0, 0, 0, 1, 0, 1, …, 0 Lem: For any 𝑆⊆𝑉 supp 𝑖∈𝑆 𝑎 𝑖 =𝐸(𝑆,𝑉∖𝑆) 7 3 2 6 1 4 5 1,2 , 1,3 , 1,4 , 1,5 , 1,6 , 1,7 , 2,3 , ,5 , …

Part 3: 𝐿 0 -Sampling There is a distribution over 𝑀∈ ℝ 𝑑×𝑚 with 𝑑=𝑂( log 2 𝑚 ) such w.p. 9/10 that ∀𝑎∈ ℝ 𝑚 : 𝐶𝑎→𝑒∈𝑠𝑢𝑝𝑝(𝑎) [Cormode, Muthukrishnan, Rozenbaum’05; Jowhari, Saglam, Tardos ‘11] Constant probability suffices − still 𝑂 log 𝑛 Boruvka iterations

→𝑒∈𝑠𝑢𝑝𝑝 𝑗∈𝑆 𝑎 𝑗 =𝐸(𝑆,𝑉∖𝑆)
Final Algorithm Construct log 𝑛 ℓ 0 -samplers for each 𝑎 𝑖 Run Boruvka on sketches: Use 𝐶 1 𝑎 𝑗 to get an edge incident on a node 𝑗 For 𝑖=2 to 𝑡: To get incident edge on a component 𝑆⊆𝑉 use: 𝑗∈𝑆 𝐶 𝑖 𝑎 𝑗 = 𝐶 𝑖 𝑗∈𝑆 𝑎 𝑗 → →𝑒∈𝑠𝑢𝑝𝑝 𝑗∈𝑆 𝑎 𝑗 =𝐸(𝑆,𝑉∖𝑆)

K-Connectivity Graph is 𝑘-connected is every cut has size ≥𝑘
Thm: There is a 𝑂(𝑛𝑘 log 3 𝑛 )-size linear sketch for k-connectivity Generalization: There is an 𝑂(𝑛 log 5 𝑛 / 𝜖 2 )-size linear sketch which allows to approximate all cuts in a graph up to error (1±𝜖)

K-connectivity Algorithm
Algorithm for 𝑘-connectivity: Let 𝐹 1 be a spanning forest of 𝐺(𝑉,𝐸) For 𝑖=2, …, 𝑘 Let 𝐹 𝑖 be a spanning forest of 𝐺(𝑉, 𝐸∖ 𝐹 1 ∖…∖ 𝐹 𝑖−1 ) Lem: 𝐺(𝑉, 𝐹 1 +…+ 𝐹 𝑘 ) is k-connected iff G(V,E) is. ⇒ Trivial ⇐ Consider a cut in 𝐺(𝑉, 𝑖=1 𝑘 𝐹 𝑖 ) of size <𝑘 ⇒∃ 𝑖 ∗ : this cut didn’t grow in step 𝑖 ∗ ⇒ there is a cut in 𝐺(𝑉,𝐸) of size <𝑘 ⇒ contradiction

K-connectivity Algorithm
Construct 𝑘 independent linear sketches { 𝑀 1 𝐴 𝐺 , 𝑀 2 𝐴 𝐺 …, 𝑀 𝑘 𝐴 𝐺 } for connectivity Run 𝑘-connectivity algorithm on sketches: Use 𝑀 1 𝐴 𝐺 to get a spanning forest 𝐹 1 of 𝐺 Use 𝑀 2 𝐴 𝐺 − 𝑀 2 𝐴 𝐹 1 = 𝑀 2 ( 𝐴 𝐺− 𝐹 1 ) to find 𝐹 2 Use 𝑀 3 𝐴 𝐺 − 𝑀 3 𝐴 𝐹 1 − 𝑀 3 𝐴 𝐹 2 = 𝑀 3 ( 𝐴 𝐺− 𝐹 1 − 𝐹 2 ) to find 𝐹 3 …

Bipartiteness Reduction: Given 𝐺 define 𝐺′ where vertices 𝑣→( 𝑣 1 , 𝑣 2 ); edges 𝑢,𝑣 →( 𝑢 1 , 𝑣 2 ) & ( 𝑢 2 , 𝑣 1 ) Lem: # connected components doubles iff the graph is bipartite. Thm: 𝑂(𝑛 log 3 𝑛 )-size linear sketch for k-connectivity (sketch 𝐺′ (implicitly).) → →

𝑤 𝑀𝑆𝑇 ≤𝑛− 1+𝜖 𝑟 + 𝑖=0…𝑟−1 𝜆 𝑖 𝑛 𝑖 ≤ 1+𝜖 𝑤 𝑀𝑆𝑇
Minimum Spanning Tree If 𝑛 𝑖 =# connected components in a subgraph induced by edges of weight ≤ 1+𝜖 𝑖 : 𝑤 𝑀𝑆𝑇 ≤𝑛− 1+𝜖 𝑟 + 𝑖=0…𝑟−1 𝜆 𝑖 𝑛 𝑖 ≤ 1+𝜖 𝑤 𝑀𝑆𝑇 where 𝜆 𝑖 = ( 1+𝜖 𝑖+1 − 1+𝜖 𝑖 cc(G) = #connected components of 𝐺 Round weights up to the nearest power of 1+𝜖 𝐺 𝑖 ≡ subgraph with edges of weight ≤ 1+𝜖 𝑖 Edges taken by the Kruskal’s algorithm: n – cc( 𝐺 0 ) edges of weight 1 𝑐𝑐 𝐺 0 −𝑐𝑐( 𝐺 1 ) edges of weight (1+𝜖) … cc 𝐺 𝑖−1 − cc 𝐺 𝑖 edges of weight 1+𝜖 𝑖

Minimum Spanning Tree Let 𝑟= log 1+𝜖 𝑊 where 𝑊= max edge weight
Overall weight: 𝑛 −𝑐𝑐 𝐺 𝑟 1+𝜖 𝑖 (𝑐𝑐 𝐺 𝑖−1 −𝑐𝑐( 𝐺 𝑖 )) =𝑛− 1+𝜖 𝑟 + 0 𝑟−1 ( 1+𝜖 𝑖+1 − 1+𝜖 𝑖 ) 𝑐𝑐( 𝐺 𝑖 ) Thm: 1+𝜖 -approx. MST weight can be computed with 𝑂 𝑛 linear sketch for 𝑊=𝑝𝑜𝑙𝑦(𝑛)

MST: Single Linkage Clustering
[Zahn’71] Clustering via MST (Single-linkage): k clusters: remove 𝒌−𝟏 longest edges from MST Maximizes minimum intercluster distance “” [Kleinberg, Tardos]

Cut Sparsification Two problems: General cut sparsification framework:
Approximating Min-Cut in the graph (up to 1±𝜖) Preserving all cuts in the graph (up to 1±𝜖) General cut sparsification framework: Sample each edge 𝑒 with probability 𝑝 𝑒 Assign sampled edges weights 1/ 𝑝 𝑒 Expected weight of each cut is preserved, but too many cuts − can’t take union bound

Cut Sparsification For an edge 𝑒 let 𝜆 𝑒 = weight of the minimum cut that contains 𝑒 𝜆= size of the Min-Cut in G Thm [Fung et al.]: If 𝐺 is an undirected weighted graph then if 𝑝 𝑒 ≥ min 𝐶 log 2 𝑛 𝜆 𝑒 𝜖 2 ,1 then the cut sparsification alg. preserves weights of all cuts up to (1±𝜖) Thm [Karger]: 𝑝 𝑒 ≥ min 𝐶 log 𝑛 𝜆 𝜖 2 ,1 preserves Min-Cut up to (1±𝜖)

Minimum Cut Algorithm: For 𝑖 = 0,1,…, 2 log 𝑛 :
Let 𝐺 𝑖 be the subgraph of 𝐺 where each edge is sampled with probability 1/ 2 𝑖 Let 𝐻 𝑖 = F 1 ,…, 𝐹 𝑘 where 𝑘=𝑂 1 𝜖 2 ⋅ log 𝑛 and 𝐹 𝑖 are forests constructed by the k-connectivity alg. Return 2 𝑗 𝜆( 𝐻 𝑗 ) where 𝑗=min⁡{𝑖 :𝜆 𝐻 𝑖 <𝑘} Space: 𝑂 𝑛 log 4 𝑛 𝜖 2 , works for dynamic graph streams

Minimum Cut: Analysis Key property: If 𝐺 𝑖 has ≤𝑘 edges across a cut then 𝐻 𝑖 contains all such edges 𝑖 ∗ = log max 1, 𝜆 𝜖 2 6 log 𝑛 𝑖≤ 𝑖 ∗ ⇒ 𝑝 𝑒 ≥ min 6 log 𝑛 𝜆 𝜖 2 ,1 ⇒ min cut in 𝐺 𝑖 is approximating min-cut in 𝐺 up to (1±𝜖) 𝑖= 𝑖 ∗ : By Chernoff bound # edges in 𝐺 𝑖 ∗ that crosses min-cut in 𝐺 is 𝑂 1 𝜖 2 log 𝑛 ≤𝑘 w.h.p.

Cut Sparsification Algorithm: For 𝑖 = 0,1,…, 2 log 𝑛 :
Let 𝐺 𝑖 be the subgraph of 𝐺 where each edge is sampled with probability 1/ 2 𝑖 Let 𝐻 𝑖 = F 1 ,…, 𝐹 𝑘 where 𝑘=𝑂 1 𝜖 2 ⋅ log 2 𝑛 and 𝐹 𝑖 are forests constructed by the k-connectivity alg. For each edge 𝑒 let 𝑗 𝑒 =min i: 𝜆 𝑒 𝐻 𝑖 <𝑘 . If 𝑒∈ 𝐻 𝑗 𝑒 then add e to the sparsifier with weight 2 𝑗 𝑒 Space: 𝑂 𝑛 log 5 𝑛 𝜖 2 , works for dynamic graph streams Analysis similar to the Min-Cut using [Fung et al.]

CIS 700: “algorithms for Big Data”

Similar presentations

Presentation on theme: "CIS 700: “algorithms for Big Data”"— Presentation transcript:

Similar presentations

About project

Feedback

Log in

Auth with social network:

CIS 700: “algorithms for Big Data”

Similar presentations

Presentation on theme: "CIS 700: “algorithms for Big Data”"— Presentation transcript:

Similar presentations

About project

Feedback