Parallel Algorithms for Geometric Graph Problems


1 Parallel Algorithms for Geometric Graph Problems
Grigory Yaroslavtsev, STOC 2014. Joint work with Alexandr Andoni, Krzysztof Onak, and Aleksandar Nikolov.

2 Approximating Geometric Problems in Parallel Models
Geometric graph (implicit): Euclidean distances between n points in ℝ^d.
Already have solutions for old NP-hard problems (Traveling Salesman, Steiner Tree, etc.).
Minimum Spanning Tree (clustering, vision).
Minimum Cost Bichromatic Matching (vision).
Question: do we really have an O(1)-round algorithm for TSP and Steiner Tree?

3 Geometric Graph Problems
Combinatorial problems on graphs in ℝ^d. Need new theory!
Polynomial time ("easy"): Minimum Spanning Tree; Earth-Mover Distance = Min-Weight Bi-chromatic Matching.
NP-hard ("hard"): Steiner Tree; Traveling Salesman; Clustering (k-medians, facility location, etc.).
Arora-Mitchell-style "Divide and Conquer" is easy to implement in Massively Parallel Computational Models.

4 MST: Single Linkage Clustering
[Zahn '71] Clustering via MST (single linkage): for k clusters, remove the k-1 longest edges from the MST. This maximizes the minimum intercluster distance [Kleinberg, Tardos].
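A minimal sequential sketch of this clustering rule (my illustration only, not the parallel algorithm of the talk): build the MST of the complete Euclidean graph with Kruskal's algorithm, then delete the k-1 heaviest MST edges and read off the connected components.

```python
# Single-linkage clustering via MST (sequential toy sketch of slide 4).
from itertools import combinations
from math import dist

def single_linkage(points, k):
    n = len(points)
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Kruskal: scan all pairs by increasing distance, keep edges that join new components.
    edges = sorted(combinations(range(n), 2),
                   key=lambda e: dist(points[e[0]], points[e[1]]))
    mst = []
    for u, v in edges:
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[ru] = rv
            mst.append((dist(points[u], points[v]), u, v))

    # Single linkage: drop the k-1 longest MST edges, then read off the components.
    mst.sort()
    kept = mst[: max(0, len(mst) - (k - 1))]
    parent = list(range(n))
    for _, u, v in kept:
        parent[find(u)] = find(v)
    clusters = {}
    for i in range(n):
        clusters.setdefault(find(i), []).append(i)
    return list(clusters.values())

print(single_linkage([(0, 0), (0, 1), (5, 5), (5, 6), (20, 20)], k=3))
```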

5 Earth-Mover Distance
Computer vision: compare two pictures of moving objects (stars, MRI scans).

6 Computational Model
Input: n points in a d-dimensional space (d constant).
M machines, space S on each (S = n^α, 0 < α < 1).
Constant overhead in total space: M·S = O(n).
Output: solution to a geometric problem (size O(n)); the output doesn't fit on a single machine (S ≪ n).
(Slide diagram: Input of n points ⇒ M machines with space S each ⇒ Output of size O(n).)

7 Computational Model
Computation/communication in R rounds:
Every machine performs a near-linear time computation ⇒ total running time O(n^{1+o(1)} · R).
Every machine sends/receives at most S bits of information ⇒ total communication O(nR).
Goal: minimize R. Our work: R = constant.
(Slide diagram: M machines, S space each, O(S^{1+o(1)}) time per machine, ≤ S bits communicated per machine.)
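To make the parameters concrete, here is a small back-of-the-envelope check (my own example values, not from the talk) of how the machine count M and the round count R = log_S n behave when S = n^α:

```python
# Illustrative parameter check for the model on slides 6-7 (assumed example values).
from math import log

n = 10**12                      # number of input points (hypothetical)
for alpha in (0.25, 0.5, 0.75):
    S = n ** alpha              # space per machine
    M = n / S                   # machines needed so that M * S = O(n)
    R = log(n) / log(S)         # rounds of a log_S(n)-depth algorithm, i.e. 1/alpha
    print(f"alpha={alpha}: S={S:.3g}, M={M:.3g}, rounds={R:.2f}")
```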

8 MapReduce-style computations
What I won't discuss today:
PRAMs (shared memory, multiple processors) (see e.g. [Karloff, Suri, Vassilvitskii '10]): computing XOR requires Ω(log n) rounds in CRCW PRAM, but can be done in O(log_S n) rounds of MapReduce.
Pregel-style systems, Distributed Hash Tables (see e.g. Ashish Goel's class notes and papers).
Lower-level implementation details (see e.g. the Rajaraman-Leskovec-Ullman book).

9 Models of parallel computation
Bulk-Synchronous Parallel Model (BSP) [Valiant '90]:
Pro: most general, generalizes all other models.
Con: many parameters, hard to design algorithms.
Massive Parallel Computation [Feldman-Muthukrishnan-Sidiropoulos-Stein-Svitkina '07; Karloff-Suri-Vassilvitskii '10; Goodrich-Sitchinava-Zhang '11; ...; Beame-Koutris-Suciu '13]:
Pros: inspired by modern systems (Hadoop, MapReduce, Dryad, ...); few parameters, simple to design algorithms; new algorithmic ideas, robust to the exact model specification; # rounds is an information-theoretic measure ⇒ can prove unconditional lower bounds; between linear sketching and streaming with sorting.

10 Previous work: dense graphs vs. sparse graphs
Dense: S ≫ n (or S ≫ solution size). "Filtering" (the output fits on a single machine) [Karloff, Suri, Vassilvitskii, SODA '10; Ene, Im, Moseley, KDD '11; Lattanzi, Moseley, Suri, Vassilvitskii, SPAA '11; Suri, Vassilvitskii, WWW '11].
Sparse: S ≪ n (or S ≪ solution size). Sparse graph problems appear hard (big open question: (s,t)-connectivity in o(log n) rounds?).

11 Large geometric graphs
Graph algorithms: dense graphs vs. sparse graphs. Dense: S ≫ n. Sparse: S ≪ n.
Our setting: dense graphs, sparsely represented: O(n) space; the output doesn't fit on one machine (S ≪ n).
Today: (1+ε)-approximate MST; d = 2 (easy to generalize); R = log_S n = O(1) rounds (S = n^{Ω(1)}).

12 O(log n)-MST in R = O(log n) rounds
Assume points have integer coordinates 0, ..., Δ, where Δ = O(n²).
Impose an O(log n)-depth quadtree.
Bottom-up: for each cell in the quadtree, compute optimum MSTs in subcells; use only one representative from each cell on the next level.
Wrong representative: O(1)-approximation per level.
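A toy sequential sketch of this bottom-up pass (my illustration of the slide's idea, not the actual distributed implementation): at each quadtree level, solve MSTs inside the cells on the points seen so far, then keep a single representative per cell for the next, coarser level.

```python
# Toy bottom-up quadtree pass for slide 12 (sequential illustration only).
from math import dist

def cell(p, L):
    # Index of the side-length-L quadtree cell containing point p.
    return (int(p[0] // L), int(p[1] // L))

def mst_cost(points):
    # Prim's algorithm on the complete Euclidean graph of one (small) cell.
    if len(points) <= 1:
        return 0.0
    in_tree, cost = {0}, 0.0
    while len(in_tree) < len(points):
        d, j = min((dist(points[i], points[k]), k)
                   for i in in_tree
                   for k in range(len(points)) if k not in in_tree)
        in_tree.add(j)
        cost += d
    return cost

def bottom_up_mst(points, delta):
    total, reps, L = 0.0, list(points), 1
    while L < 2 * delta:
        groups = {}
        for p in reps:
            groups.setdefault(cell(p, L), []).append(p)
        # Solve inside each cell, then keep one representative per cell.
        total += sum(mst_cost(g) for g in groups.values())
        reps = [g[0] for g in groups.values()]
        L *= 2
    return total  # O(1)-approximation per level => O(log n)-approximation overall

print(bottom_up_mst([(0.2, 0.3), (0.7, 0.1), (3.4, 3.9), (3.1, 3.2)], delta=4))
```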

13 εL-nets
εL-net for a cell C with side length L: a collection S of vertices in C such that every vertex is at distance ≤ εL from some vertex in S. (Fact: can efficiently compute an ε-net of size O(1/ε²).)
Bottom-up: for each cell in the quadtree, compute optimum MSTs in subcells; use an εL-net from each cell on the next level.
Idea: pay only O(εL) for an edge cut by a cell with side L.
Randomly shift the quadtree: Pr[cut edge of length ℓ by cell of side L] ∼ ℓ/L, and charge the errors.
Wrong representative: O(1)-approximation per level.
(Slide figure: a cell of side L with its εL-net points.)
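A simple greedy construction of an εL-net for one cell (a sketch under my own assumptions; the slide only states that a net of size O(1/ε²) can be computed efficiently): scan the points and keep a point as a net point whenever it is farther than εL from every net point chosen so far.

```python
# Greedy epsilon-net for the points inside one cell of side length L (illustrative sketch).
from math import dist

def eps_net(points_in_cell, eps, L):
    net = []
    for p in points_in_cell:
        # Keep p only if it is not already covered by an existing net point.
        if all(dist(p, q) > eps * L for q in net):
            net.append(p)
    return net  # every point is within eps*L of a net point; size is O(1/eps^2) in the plane

pts = [(x / 10, y / 10) for x in range(10) for y in range(10)]  # points in a unit cell
print(len(eps_net(pts, eps=0.25, L=1.0)))
```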

14 Randomly shifted quadtree
Top cell shifted by a random vector in [0, L]².
Impose a randomly shifted quadtree (top cell length 2Δ).
Bottom-up: for each cell in the quadtree, compute optimum MSTs in subcells; use an εL-net from each cell on the next level.
(Slide figure: a "Bad Cut" across a cell boundary; Pr[Bad Cut] = Ω(1); pay 5 instead of 4.)
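The key probability fact behind the random shift (an edge of length ℓ is cut by a randomly shifted grid of side L with probability about ℓ/L per coordinate) can be checked numerically; a quick Monte Carlo sketch with made-up edge and grid sizes:

```python
# Monte Carlo check: a randomly shifted grid of cell side L separates the endpoints
# of an axis-parallel segment of length ell with probability ell / L (per coordinate).
import random

def cut_probability(ell, L, trials=200_000):
    cuts = 0
    for _ in range(trials):
        shift = random.uniform(0, L)   # random grid shift along this axis
        a = (0.0 - shift) // L         # grid cell index of the left endpoint
        b = (ell - shift) // L         # grid cell index of the right endpoint
        cuts += (a != b)
    return cuts / trials

L, ell = 8.0, 1.0
print(cut_probability(ell, L), "vs. ell/L =", ell / L)
```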

15 (1+ε)-MST in R = O(log n) rounds
Idea: only use short edges inside the cells.
Impose a randomly shifted quadtree (top cell length 2Δ/ε).
Bottom-up: for each node (cell) in the quadtree, compute optimum Minimum Spanning Forests in subcells, using edges of length ≤ εL; use only an ε²L-net from each cell on the next level.
Sketch of analysis (T* = optimum MST):
E[extra cost] = E[ Σ_{e ∈ T*} Pr[e is cut by a cell with side L] · εL ] ≤ ε log n · Σ_{e ∈ T*} d_e = ε log n · cost(T*).
(Slide figure: with L = Ω(1/ε), Pr[Bad Cut] = O(ε); pay 2 instead of 1.)
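Writing out the expectation bound on this slide in full (my own expansion of the step; d_e denotes the length of edge e and the inner sum runs over the O(log n) quadtree levels):

```latex
\mathbb{E}[\text{extra cost}]
  = \sum_{e \in T^*} \sum_{L} \Pr[\,e \text{ cut by a cell of side } L\,]\cdot O(\epsilon L)
  \le \sum_{e \in T^*} \sum_{L} O\!\left(\tfrac{d_e}{L}\right)\cdot \epsilon L
  = O(\epsilon \log n) \sum_{e \in T^*} d_e
  = O(\epsilon \log n)\cdot \mathrm{cost}(T^*).
```

Rescaling ε to absorb the O(log n) factor gives a (1+ε)-approximation at this stage; the constant-round version below instead reduces the number of levels to O(1), so the loss per edge stays O(ε).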

16 (1+ε)-MST in R = O(1) rounds
O(log n) rounds ⇒ O(log_S n) = O(1) rounds.
Flatten the tree: (m × m)-grids instead of (2 × 2)-grids at each level, with m = n^{Ω(1)}.
Impose a randomly shifted (m × m)-tree.
Bottom-up: for each node (cell) in the tree, compute optimum MSTs in subcells via edges of length ≤ εL; use only an ε²L-net from each cell on the next level.
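A quick numeric check of the flattening step (hypothetical parameter values): with branching factor m = n^{Ω(1)}, an (m × m)-tree over a coordinate range Δ = O(n²) has only a constant number of levels, independent of n.

```python
# Depth of the flattened (m x m)-tree for assumed example parameters.
from math import ceil, log

n = 10**9
delta = n ** 2             # coordinate range Delta = O(n^2), as on slide 12
for beta in (0.1, 0.25, 0.5):
    m = n ** beta          # branching factor m = n^Omega(1)
    depth = ceil(log(delta) / log(m))   # number of levels = log_m(Delta) = 2/beta
    print(f"m = n^{beta}: depth = {depth}")
```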

17 (1+ε)-MST in R = O(1) rounds
Theorem: let ℓ = # levels in a random tree P. Then E_P[ALG] ≤ (1 + O(εℓd)) · OPT.
Proof (sketch): Δ_P(u,v) = the cell side length at which P first partitions the pair (u, v).
New weights: w_P(u,v) = ‖u − v‖_2 + ε Δ_P(u,v), so that
‖u − v‖_2 ≤ E_P[w_P(u,v)] ≤ (1 + O(εℓd)) ‖u − v‖_2.
Our algorithm implements Kruskal for the weights w_P.
(Slide figure: points u, v and the cell of side Δ_P(u,v) that first separates them.)
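The middle inequality uses the standard cut-probability bound for a randomly shifted partition; a short reconstruction of the step the slide elides (my own write-up, with ℓ levels and dimension d as in the theorem):

```latex
\Pr_P\!\big[\Delta_P(u,v) = L\big]
  \le \Pr_P\!\big[(u,v)\ \text{cut at side } L\big]
  \le O\!\left(\tfrac{d\,\|u-v\|_2}{L}\right)
\;\Longrightarrow\;
\mathbb{E}_P\!\big[\epsilon\,\Delta_P(u,v)\big]
  \le \epsilon \sum_{L} L \cdot O\!\left(\tfrac{d\,\|u-v\|_2}{L}\right)
  = O(\epsilon\,\ell\,d)\,\|u-v\|_2 .
```

Hence ‖u − v‖_2 ≤ E_P[w_P(u,v)] ≤ (1 + O(εℓd)) ‖u − v‖_2, and running Kruskal on the weights w_P yields the stated bound against OPT.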

18 "Solve-And-Sketch" Framework
(1+ε)-MST:
"Load balancing": partition the tree into parts of the same size.
Almost linear time: Approximate Nearest Neighbor data structure [Indyk '99].
Dependence on dimension d (size of an ε-net is O(d/ε)^d).
Generalizes to bounded doubling dimension.
Basic version is teachable (Jelani Nelson's "Big Data" class at Harvard).
Implementation in progress...

19 "Solve-And-Sketch" Framework
(1+ε)-Earth-Mover Distance, Transportation Cost:
No simple "divide-and-conquer" Arora-Mitchell-style algorithm (unlike for general matching).
Only recently a sequential (1+ε)-approximation in O_ε(n log^{O(1)} n) time [Sharathkumar, Agarwal '12].
Our approach (convex sketching):
Switch to the flow-based version.
In every cell, send the flow to the closest net point until we can connect the net points.

20 "Solve-And-Sketch" Framework
Convex sketching of the cost function for τ net points:
F: ℝ^(τ-1) → ℝ is the cost of routing fixed amounts of flow through the net points.
The function F' = F + "normalization" is monotone, convex and Lipschitz, and (1+ε)-approximates F.
We can (1+ε)-sketch it using a lower convex hull.
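The last line says the sketch is a lower convex hull. A 1-D toy of that idea (my own illustration with a hypothetical cost function; the talk's F lives over (τ-1)-dimensional flow vectors): sample the convex cost on a grid and keep only the vertices of the lower convex hull, a compact piecewise-linear representation that can be evaluated anywhere.

```python
# 1-D toy of "sketch a convex function by its lower convex hull" (illustration only).
def cross(o, a, b):
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

def lower_hull(points):
    """Lower convex hull of 2-D points (Andrew's monotone chain, lower part only)."""
    hull = []
    for p in sorted(points):
        # Pop points making a clockwise or straight turn: they are not hull vertices.
        while len(hull) >= 2 and cross(hull[-2], hull[-1], p) <= 0:
            hull.pop()
        hull.append(p)
    return hull

# Sample a convex, Lipschitz stand-in for the routing-cost function on a grid and
# keep only the hull vertices.
F = lambda x: abs(x - 3.0)                               # hypothetical cost function
samples = [(x / 4.0, F(x / 4.0)) for x in range(41)]     # grid on [0, 10]
sketch = lower_hull(samples)
print(f"{len(samples)} samples -> {len(sketch)} hull vertices: {sketch}")
```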

21 Thank you! http://grigory.us
Open problems:
Extension to high dimensions? Probably not: reducing from connectivity gives a conditional lower bound of Ω(log n) rounds for MST in ℓ_∞^n. The difficult setting is d = Θ(log n) (can do Johnson-Lindenstrauss).
Streaming algorithms for EMD and Transportation Cost? Our work gives the first near-linear time algorithm for Transportation Cost.
Is it possible to reconstruct the solution itself?

