Presentation is loading. Please wait.

Presentation is loading. Please wait.

CIS 700: β€œalgorithms for Big Data”

Similar presentations


Presentation on theme: "CIS 700: β€œalgorithms for Big Data”"β€” Presentation transcript:

1 CIS 700: β€œalgorithms for Big Data”
Lecture 6: Graph Sketching Slides at Grigory Yaroslavtsev

2 Sketching Graphs? We know how to sketch vectors: 𝑣→𝑀𝑣
How about sketching graphs? 𝐺 𝑉,𝐸 ≑ 𝐴 𝐺 (adjacency matrix): 𝐴 𝐺 →𝑀 𝐴 𝐺 Sketch columns of 𝐴 𝐺 𝑛= 𝑉 , π‘š=|𝐸| 𝑂(π‘π‘œπ‘™π‘¦( log 𝑛 )) sketch per vertex / 𝑂 (𝑛) total Check connectivity Check bipartiteness As always, space rather than dimension. Why?

3 Graph Streams Semi-streaming model: [Muthukrishnan ’05; Feigenbaum, Kannan, McGregor, Suri, Zhang’05] Graph defined by the stream of edges 𝑒 1 ,…, 𝑒 π‘š Space 𝑂 𝑛 , edges processed in order Connectivity is easy on 𝑂 (𝑛) space for insertion-only Dynamic graphs: Stream of insertion/deletion updates + 𝑒 𝑖 1 , βˆ’ 𝑒 𝑖 2 , …, βˆ’ 𝑒 𝑖 𝑑 (assume sequence is correct) Resulting graph has edge 𝑒 𝑖 if it wasn’t deleted after the last insertion Linear sketching dynamic graphs: 𝑀 𝐴 πΊβˆ–π‘’ =𝑀 𝐴 𝐺 βˆ’ MA e

4 Distributed Computing
Linear sketches for distributed processing 𝑆 servers with o(m) memory: Send π‘š/𝑆 edges ( 𝐸 1 ,…, 𝐸 𝑠 ) to each server Compute sketches 𝑀 𝐸 1 ,…, 𝑀 𝐸 𝑠 locally Send sketches to a central server Compute 𝑀 𝐴 𝐺 = 𝑖 𝑠 𝑀 𝐸 𝑖 𝑀 has to have a small representation (same issue as in streaming)

5 Connectivity Thm. Connectivity is sketchable in 𝑂 (𝑛) space Framework:
Take existing connectivity algorithm (Boruvka) Sketch 𝐴 𝐺 →𝑀 𝐴 𝐺 Run Boruvka on 𝑀 𝐴 𝐺 Important that the sketch is homomorphic w.r.t the algorithm

6 Part 1: Parallel Connectivity (Boruvka)
Repeat until no edges left: For each vertex, select any incident edge Contract selected edges Lemma: process converges in 𝑂( log 𝑛 ) steps

7 Part 2: Graph Representation
For a vertex 𝑖 let π‘Ž 𝑖 be a vector in ℝ 𝑛 2 Non-zero entries for edges 𝑖,𝑗 π‘Ž 𝑖 𝑖,𝑗 =1 if 𝑗>𝑖 π‘Ž 𝑖 𝑖,𝑗 =βˆ’1 if j <𝑖 Example: π‘Ž 1 = 1, 1, 1, 1, 0, …,0 π‘Ž 2 = βˆ’1, 0, 0, 0, 0, 0, 1, 0, 1, …, 0 Lem: For any π‘†βŠ†π‘‰ supp π‘–βˆˆπ‘† π‘Ž 𝑖 =𝐸(𝑆,π‘‰βˆ–π‘†) 7 3 2 6 1 4 5 1,2 , 1,3 , 1,4 , 1,5 , 1,6 , 1,7 , 2,3 , ,5 , …

8 Part 3: 𝐿 0 -Sampling There is a distribution over π‘€βˆˆ ℝ π‘‘Γ—π‘š with 𝑑=𝑂( log 2 π‘š ) such w.p. 9/10 that βˆ€π‘Žβˆˆ ℝ π‘š : πΆπ‘Žβ†’π‘’βˆˆπ‘ π‘’π‘π‘(π‘Ž) [Cormode, Muthukrishnan, Rozenbaum’05; Jowhari, Saglam, Tardos β€˜11] Constant probability suffices βˆ’ still 𝑂 log 𝑛 Boruvka iterations

9 β†’π‘’βˆˆπ‘ π‘’π‘π‘ π‘—βˆˆπ‘† π‘Ž 𝑗 =𝐸(𝑆,π‘‰βˆ–π‘†)
Final Algorithm Construct log 𝑛 β„“ 0 -samplers for each π‘Ž 𝑖 Run Boruvka on sketches: Use 𝐢 1 π‘Ž 𝑗 to get an edge incident on a node 𝑗 For 𝑖=2 to 𝑑: To get incident edge on a component π‘†βŠ†π‘‰ use: π‘—βˆˆπ‘† 𝐢 𝑖 π‘Ž 𝑗 = 𝐢 𝑖 π‘—βˆˆπ‘† π‘Ž 𝑗 β†’ β†’π‘’βˆˆπ‘ π‘’π‘π‘ π‘—βˆˆπ‘† π‘Ž 𝑗 =𝐸(𝑆,π‘‰βˆ–π‘†)

10 K-Connectivity Graph is π‘˜-connected is every cut has size β‰₯π‘˜
Thm: There is a 𝑂(π‘›π‘˜ log 3 𝑛 )-size linear sketch for k-connectivity Generalization: There is an 𝑂(𝑛 log 5 𝑛 / πœ– 2 )-size linear sketch which allows to approximate all cuts in a graph up to error (1Β±πœ–)

11 K-connectivity Algorithm
Algorithm for π‘˜-connectivity: Let 𝐹 1 be a spanning forest of 𝐺(𝑉,𝐸) For 𝑖=2, …, π‘˜ Let 𝐹 𝑖 be a spanning forest of 𝐺(𝑉, πΈβˆ– 𝐹 1 βˆ–β€¦βˆ– 𝐹 π‘–βˆ’1 ) Lem: 𝐺(𝑉, 𝐹 1 +…+ 𝐹 π‘˜ ) is k-connected iff G(V,E) is. β‡’ Trivial ⇐ Consider a cut in 𝐺(𝑉, 𝑖=1 π‘˜ 𝐹 𝑖 ) of size <π‘˜ β‡’βˆƒ 𝑖 βˆ— : this cut didn’t grow in step 𝑖 βˆ— β‡’ there is a cut in 𝐺(𝑉,𝐸) of size <π‘˜ β‡’ contradiction

12 K-connectivity Algorithm
Construct π‘˜ independent linear sketches { 𝑀 1 𝐴 𝐺 , 𝑀 2 𝐴 𝐺 …, 𝑀 π‘˜ 𝐴 𝐺 } for connectivity Run π‘˜-connectivity algorithm on sketches: Use 𝑀 1 𝐴 𝐺 to get a spanning forest 𝐹 1 of 𝐺 Use 𝑀 2 𝐴 𝐺 βˆ’ 𝑀 2 𝐴 𝐹 1 = 𝑀 2 ( 𝐴 πΊβˆ’ 𝐹 1 ) to find 𝐹 2 Use 𝑀 3 𝐴 𝐺 βˆ’ 𝑀 3 𝐴 𝐹 1 βˆ’ 𝑀 3 𝐴 𝐹 2 = 𝑀 3 ( 𝐴 πΊβˆ’ 𝐹 1 βˆ’ 𝐹 2 ) to find 𝐹 3 …

13 Bipartiteness Reduction: Given 𝐺 define 𝐺′ where vertices 𝑣→( 𝑣 1 , 𝑣 2 ); edges 𝑒,𝑣 β†’( 𝑒 1 , 𝑣 2 ) & ( 𝑒 2 , 𝑣 1 ) Lem: # connected components doubles iff the graph is bipartite. Thm: 𝑂(𝑛 log 3 𝑛 )-size linear sketch for k-connectivity (sketch 𝐺′ (implicitly).) β†’ β†’

14 𝑀 𝑀𝑆𝑇 β‰€π‘›βˆ’ 1+πœ– π‘Ÿ + 𝑖=0β€¦π‘Ÿβˆ’1 πœ† 𝑖 𝑛 𝑖 ≀ 1+πœ– 𝑀 𝑀𝑆𝑇
Minimum Spanning Tree If 𝑛 𝑖 =# connected components in a subgraph induced by edges of weight ≀ 1+πœ– 𝑖 : 𝑀 𝑀𝑆𝑇 β‰€π‘›βˆ’ 1+πœ– π‘Ÿ + 𝑖=0β€¦π‘Ÿβˆ’1 πœ† 𝑖 𝑛 𝑖 ≀ 1+πœ– 𝑀 𝑀𝑆𝑇 where πœ† 𝑖 = ( 1+πœ– 𝑖+1 βˆ’ 1+πœ– 𝑖 cc(G) = #connected components of 𝐺 Round weights up to the nearest power of 1+πœ– 𝐺 𝑖 ≑ subgraph with edges of weight ≀ 1+πœ– 𝑖 Edges taken by the Kruskal’s algorithm: n – cc( 𝐺 0 ) edges of weight 1 𝑐𝑐 𝐺 0 βˆ’π‘π‘( 𝐺 1 ) edges of weight (1+πœ–) … cc 𝐺 π‘–βˆ’1 βˆ’ cc 𝐺 𝑖 edges of weight 1+πœ– 𝑖

15 Minimum Spanning Tree Let π‘Ÿ= log 1+πœ– π‘Š where π‘Š= max edge weight
Overall weight: 𝑛 βˆ’π‘π‘ 𝐺 π‘Ÿ 1+πœ– 𝑖 (𝑐𝑐 𝐺 π‘–βˆ’1 βˆ’π‘π‘( 𝐺 𝑖 )) =π‘›βˆ’ 1+πœ– π‘Ÿ + 0 π‘Ÿβˆ’1 ( 1+πœ– 𝑖+1 βˆ’ 1+πœ– 𝑖 ) 𝑐𝑐( 𝐺 𝑖 ) Thm: 1+πœ– -approx. MST weight can be computed with 𝑂 𝑛 linear sketch for π‘Š=π‘π‘œπ‘™π‘¦(𝑛)

16 MST: Single Linkage Clustering
[Zahn’71] Clustering via MST (Single-linkage): k clusters: remove π’Œβˆ’πŸ longest edges from MST Maximizes minimum intercluster distance β€œβ€ [Kleinberg, Tardos]

17 Cut Sparsification Two problems: General cut sparsification framework:
Approximating Min-Cut in the graph (up to 1Β±πœ–) Preserving all cuts in the graph (up to 1Β±πœ–) General cut sparsification framework: Sample each edge 𝑒 with probability 𝑝 𝑒 Assign sampled edges weights 1/ 𝑝 𝑒 Expected weight of each cut is preserved, but too many cuts βˆ’ can’t take union bound

18 Cut Sparsification For an edge 𝑒 let πœ† 𝑒 = weight of the minimum cut that contains 𝑒 πœ†= size of the Min-Cut in G Thm [Fung et al.]: If 𝐺 is an undirected weighted graph then if 𝑝 𝑒 β‰₯ min 𝐢 log 2 𝑛 πœ† 𝑒 πœ– 2 ,1 then the cut sparsification alg. preserves weights of all cuts up to (1Β±πœ–) Thm [Karger]: 𝑝 𝑒 β‰₯ min 𝐢 log 𝑛 πœ† πœ– 2 ,1 preserves Min-Cut up to (1Β±πœ–)

19 Minimum Cut Algorithm: For 𝑖 = 0,1,…, 2 log 𝑛 :
Let 𝐺 𝑖 be the subgraph of 𝐺 where each edge is sampled with probability 1/ 2 𝑖 Let 𝐻 𝑖 = F 1 ,…, 𝐹 π‘˜ where π‘˜=𝑂 1 πœ– 2 β‹… log 𝑛 and 𝐹 𝑖 are forests constructed by the k-connectivity alg. Return 2 𝑗 πœ†( 𝐻 𝑗 ) where 𝑗=min⁑{𝑖 :πœ† 𝐻 𝑖 <π‘˜} Space: 𝑂 𝑛 log 4 𝑛 πœ– 2 , works for dynamic graph streams

20 Minimum Cut: Analysis Key property: If 𝐺 𝑖 has β‰€π‘˜ edges across a cut then 𝐻 𝑖 contains all such edges 𝑖 βˆ— = log max 1, πœ† πœ– 2 6 log 𝑛 𝑖≀ 𝑖 βˆ— β‡’ 𝑝 𝑒 β‰₯ min 6 log 𝑛 πœ† πœ– 2 ,1 β‡’ min cut in 𝐺 𝑖 is approximating min-cut in 𝐺 up to (1Β±πœ–) 𝑖= 𝑖 βˆ— : By Chernoff bound # edges in 𝐺 𝑖 βˆ— that crosses min-cut in 𝐺 is 𝑂 1 πœ– 2 log 𝑛 β‰€π‘˜ w.h.p.

21 Cut Sparsification Algorithm: For 𝑖 = 0,1,…, 2 log 𝑛 :
Let 𝐺 𝑖 be the subgraph of 𝐺 where each edge is sampled with probability 1/ 2 𝑖 Let 𝐻 𝑖 = F 1 ,…, 𝐹 π‘˜ where π‘˜=𝑂 1 πœ– 2 β‹… log 2 𝑛 and 𝐹 𝑖 are forests constructed by the k-connectivity alg. For each edge 𝑒 let 𝑗 𝑒 =min i: πœ† 𝑒 𝐻 𝑖 <π‘˜ . If π‘’βˆˆ 𝐻 𝑗 𝑒 then add e to the sparsifier with weight 2 𝑗 𝑒 Space: 𝑂 𝑛 log 5 𝑛 πœ– 2 , works for dynamic graph streams Analysis similar to the Min-Cut using [Fung et al.]


Download ppt "CIS 700: β€œalgorithms for Big Data”"

Similar presentations


Ads by Google