1
Parallel Algorithms for Geometric Graph Problems
Grigory Yaroslavtsev (http://grigory.us/blog/), STOC 2014; joint work with Alexandr Andoni, Krzysztof Onak, and Aleksandar Nikolov.
2
Approximating Geometric Problems in Parallel Models
Geometric graph (implicit): Euclidean distances between n points in ℝ^d.
Combinatorial problems on such graphs:
Minimum Spanning Tree (clustering, vision)
Minimum Cost Bichromatic Matching (vision)
Already have solutions for classic NP-hard problems (Traveling Salesman, Steiner Tree, etc.).
Question: do we really have an O(1)-round algorithm for TSP and Steiner Tree?
3
Geometric Graph Problems
Combinatorial problems on graphs in ℝ^d: need new theory!
Polynomial time ("easy"):
Minimum Spanning Tree
Earth-Mover Distance = Min Weight Bi-chromatic Matching
NP-hard ("hard"):
Steiner Tree
Traveling Salesman
Clustering (k-medians, facility location, etc.)
Arora-Mitchell-style "Divide and Conquer" is easy to implement in massively parallel computational models.
4
MST: Single Linkage Clustering
[Zahn'71] Clustering via MST (single linkage): for k clusters, remove the k−1 longest edges from the MST. Maximizes the minimum intercluster distance [Kleinberg, Tardos].
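The single-linkage rule above can be sketched in a few lines of Python; this is an illustrative toy (the function names and the sample points are mine, not from the talk):

```python
import math

def mst_edges(points):
    """Prim's algorithm on the complete Euclidean graph; returns MST edges."""
    n = len(points)
    dist = lambda a, b: math.dist(points[a], points[b])
    in_tree = {0}
    edges = []
    while len(in_tree) < n:
        u, v = min(((u, v) for u in in_tree for v in range(n) if v not in in_tree),
                   key=lambda e: dist(*e))
        edges.append((dist(u, v), u, v))
        in_tree.add(v)
    return edges

def single_linkage(points, k):
    """k clusters: drop the k-1 longest MST edges; return component labels."""
    kept = sorted(mst_edges(points))[:len(points) - k]  # keep the n-k shortest
    parent = list(range(len(points)))
    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x
    for _, u, v in kept:
        parent[find(u)] = find(v)
    return [find(i) for i in range(len(points))]

pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (20, 0)]
labels = single_linkage(pts, 3)  # three well-separated groups
```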
5
Earth-Mover Distance
Computer vision: compare two pictures of moving objects (stars, MRI scans).
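For intuition: on tiny equal-size point sets, the Earth-Mover Distance coincides with the minimum-cost bichromatic matching, which can be brute-forced; a hypothetical toy sketch (names are mine):

```python
import itertools, math

def emd_matching(red, blue):
    """Earth-Mover Distance between two equal-size point sets =
    minimum-cost perfect bichromatic matching (brute force, tiny inputs only)."""
    return min(sum(math.dist(r, b) for r, b in zip(red, perm))
               for perm in itertools.permutations(blue))

red = [(0, 0), (1, 0)]
blue = [(0, 1), (1, 1)]
cost = emd_matching(red, blue)  # each point moves straight up by 1
```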
6
Computational Model
Input: n points in a d-dimensional space (d constant).
M machines, space S on each (S = n^α, 0 < α < 1).
Constant overhead in total space: M · S = O(n).
Output: solution to a geometric problem (size O(n)).
The output doesn't fit on a single machine (S ≪ n).
7
Computational Model
Computation/communication proceeds in R rounds:
Every machine performs a near-linear-time computation (O(S^{1+o(1)}) time per round) ⇒ total running time O(n^{1+o(1)} · R).
Every machine sends/receives at most S bits of information ⇒ total communication O(nR).
Goal: minimize R. Our work: R = constant.
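A minimal sketch of the space constraints, assuming a simple block partition of the input (the helper `mpc_setup` is an illustrative name, not part of the model's definition):

```python
def mpc_setup(points, alpha=0.5):
    """Split n inputs across machines with local space S = n**alpha each,
    so that (#machines) * S = O(n): constant overhead in total space."""
    n = len(points)
    S = max(1, int(n ** alpha))
    machines = [points[i:i + S] for i in range(0, n, S)]
    return machines, S

machines, S = mpc_setup(list(range(100)), alpha=0.5)
M = len(machines)  # here M = S = sqrt(n) = 10
```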
8
MapReduce-style computations
What I won't discuss today:
PRAMs (shared memory, multiple processors; see e.g. [Karloff, Suri, Vassilvitskii'10]): computing XOR requires Ω(log n) rounds in CRCW PRAM, but can be done in O(log_S n) rounds of MapReduce.
Pregel-style systems, Distributed Hash Tables (see e.g. Ashish Goel's class notes and papers).
Lower-level implementation details (see e.g. the Rajaraman-Leskovec-Ullman book).
9
Models of parallel computation
Bulk-Synchronous Parallel model (BSP) [Valiant'90]
Pro: most general, generalizes all other models.
Con: many parameters, hard to design algorithms.
Massive Parallel Computation [Feldman-Muthukrishnan-Sidiropoulos-Stein-Svitkina'07, Karloff-Suri-Vassilvitskii'10, Goodrich-Sitchinava-Zhang'11, ..., Beame-Koutris-Suciu'13]
Pros: inspired by modern systems (Hadoop, MapReduce, Dryad, ...).
Few parameters, simple to design algorithms.
New algorithmic ideas, robust to the exact model specification.
The number of rounds is an information-theoretic measure ⇒ can prove unconditional lower bounds.
Sits between linear sketching and streaming with sorting.
10
Previous work Dense graphs vs. sparse graphs
Dense: S ≫ n (or S ≫ solution size): "filtering" (the output fits on a single machine) [Karloff, Suri, Vassilvitskii, SODA'10; Ene, Im, Moseley, KDD'11; Lattanzi, Moseley, Suri, Vassilvitskii, SPAA'11; Suri, Vassilvitskii, WWW'11].
Sparse: S ≪ n (or S ≪ solution size): sparse graph problems appear hard. (Big open question: (s,t)-connectivity in o(log n) rounds?)
11
Large geometric graphs
Graph algorithms distinguish dense graphs (S ≫ n) from sparse graphs (S ≪ n).
Our setting: dense graphs, sparsely represented: O(n) space.
The output doesn't fit on one machine (S ≪ n).
Today: (1+ε)-approximate MST.
d = 2 (easy to generalize).
R = log_S n = O(1) rounds (S = n^{Ω(1)}).
12
O(log n)-MST in R = O(log n) rounds
Assume points have integer coordinates {0, …, Δ}, where Δ = n^{O(1)}.
Impose an O(log n)-depth quadtree. Bottom-up, for each cell in the quadtree:
compute optimum MSTs in subcells;
use only one representative from each cell on the next level.
Wrong representative: O(1)-approximation per level.
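The bottom-up representative idea can be illustrated as follows; `cell_id` and `representatives_per_level` are hypothetical names for a toy sketch, not the paper's implementation:

```python
from collections import defaultdict

def cell_id(p, level, delta):
    """Index of the quadtree cell containing point p at a given level
    (level 0 is the top cell of side delta; each level halves the side)."""
    side = delta / (2 ** level)
    return (int(p[0] // side), int(p[1] // side))

def representatives_per_level(points, delta, depth):
    """Bottom-up pass of the O(log n)-approximation: at each level keep a
    single (arbitrary) representative per occupied cell."""
    reps = list(points)
    levels = []
    for level in range(depth, 0, -1):  # finest level first
        cells = defaultdict(list)
        for p in reps:
            cells[cell_id(p, level, delta)].append(p)
        reps = [group[0] for group in cells.values()]  # arbitrary representative
        levels.append(reps)
    return levels

levels = representatives_per_level(
    [(0.5, 0.5), (1.5, 0.5), (4.5, 4.5), (7.5, 7.5)], delta=8, depth=3)
```

Merging cells coarsens the point set level by level, which is exactly why an arbitrary ("wrong") representative costs an O(1) factor per level.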
13
εL-nets
An εL-net for a cell C with side length L: a collection S of vertices in C such that every vertex of C is at distance ≤ εL from some vertex in S. (Fact: one can efficiently compute an εL-net of size O(1/ε²).)
Bottom-up, for each cell in the quadtree:
compute optimum MSTs in subcells;
use an εL-net from each cell on the next level.
Idea: pay only O(εL) for an edge cut by a cell with side L.
Randomly shift the quadtree: Pr[cut edge of length ℓ at cell side L] ∼ ℓ/L ⇒ charge the errors.
Wrong representative: O(1)-approximation per level.
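A simple greedy pass already produces a valid εL-net (this sketch is illustrative; the talk's construction is the one guaranteeing the O(1/ε²) size bound):

```python
import math

def eps_net(points, eps, L):
    """Greedy (eps*L)-net: keep a point only if it is farther than eps*L
    from every point kept so far. Every input point then lies within
    eps*L of some net point."""
    net = []
    for p in points:
        if all(math.dist(p, q) > eps * L for q in net):
            net.append(p)
    return net

pts = [(0, 0), (0.1, 0), (1, 0), (1.05, 0)]
net = eps_net(pts, eps=0.5, L=1.0)  # two well-separated representatives
```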
14
Randomly shifted quadtree
Top cell shifted by a random vector in [0, L]².
Impose a randomly shifted quadtree (top cell side length 2Δ).
Bottom-up, for each cell in the quadtree:
compute optimum MSTs in subcells;
use an εL-net from each cell on the next level.
Pay 5 instead of 4: Pr[Bad Cut] = Θ(1).
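The ℓ/L cut probability behind the random shift can be checked empirically in one dimension; a seeded toy experiment (names are mine):

```python
import random

def cut_probability(length, cell_side, trials=100_000, seed=0):
    """Estimate the probability that a segment of the given length crosses
    a grid line of spacing cell_side under a uniformly random shift.
    The claim on the slide: this probability is ~ length / cell_side."""
    rng = random.Random(seed)
    cuts = 0
    for _ in range(trials):
        shift = rng.uniform(0, cell_side)
        # the segment occupies [shift, shift + length); it is cut iff it
        # straddles a multiple of cell_side
        cuts += (shift + length) // cell_side > shift // cell_side
    return cuts / trials

p = cut_probability(length=1.0, cell_side=4.0)  # expect about 1/4
```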
15
(1+ε)-MST in R = O(log n) rounds
Idea: only use short edges inside the cells.
Impose a randomly shifted quadtree (top cell side length 2Δ/ε).
Bottom-up, for each node (cell) in the quadtree:
compute optimum minimum spanning forests in subcells, using only edges of length ≤ εL;
use only an ε²L-net from each cell on the next level.
Sketch of analysis (T* = optimum MST):
E[extra cost] = E[ Σ_{e ∈ T*} Pr[e is cut by a cell with side L] · εL ]
≤ ε log n · Σ_{e ∈ T*} ℓ_e = ε log n · cost(T*).
Pr[Bad Cut] = O(ε): an edge used inside a cell with side L has length ℓ ≤ εL, so Pr[it is cut] ∼ ℓ/L = O(ε).
(Rescaling ε to ε / log n turns this into a true (1+ε)-approximation.)
16
(1+ε)-MST in R = O(1) rounds (S = n^{Ω(1)})
O(log n) rounds ⇒ O(log_S n) = O(1) rounds.
Flatten the tree: (√S × √S)-grids instead of (2 × 2) grids at each level.
Impose a randomly shifted (√S × √S)-tree. Bottom-up, for each node (cell) in the tree:
compute optimum MSTs in subcells via edges of length ≤ εL;
use only an ε²L-net from each cell on the next level.
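The round count log_S n = O(1/α) can be sanity-checked numerically; `num_rounds` is an illustrative helper, not from the talk:

```python
import math

def num_rounds(n, alpha):
    """Levels of a (sqrt(S) x sqrt(S))-tree when S = n**alpha:
    O(log_S n) = O(1/alpha), a constant independent of n."""
    S = n ** alpha
    return math.ceil(math.log(n) / math.log(S))

r1 = num_rounds(10**6, alpha=0.4)
r2 = num_rounds(10**12, alpha=0.4)  # a million times more points, same rounds
```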
17
(1+ε)-MST in R = O(1) rounds
Theorem: let ℓ = # levels in a random tree P. Then E_P[ALG] ≤ (1 + O(εℓ)) · OPT.
Proof (sketch): let Δ_P(u,v) = side length of the cell that first partitions (u, v).
Define new weights w_P(u,v) = ‖u − v‖₂ + ε · Δ_P(u,v). Then
‖u − v‖₂ ≤ E_P[w_P(u,v)] ≤ (1 + O(εℓ)) · ‖u − v‖₂.
Our algorithm implements Kruskal for the weights w_P(u,v).
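The weights w_P can be made concrete with a toy Kruskal run over a randomly shifted quadtree. This sketch (with hypothetical helpers `delta_P` and `kruskal_cost`) only checks the deterministic sandwich ‖u−v‖₂ ≤ w_P(u,v) ≤ ‖u−v‖₂ + ε·2Δ per edge, not the expectation bound of the theorem:

```python
import itertools, math, random

def delta_P(u, v, delta, shift):
    """Side length of the (randomly shifted) quadtree cell that first
    partitions u from v; the top cell has side 2*delta so the shifted
    point set always fits inside it."""
    side = 2.0 * delta
    while True:
        half = side / 2
        cu = (int((u[0] + shift[0]) // half), int((u[1] + shift[1]) // half))
        cv = (int((v[0] + shift[0]) // half), int((v[1] + shift[1]) // half))
        if cu != cv or half < 1e-9:
            return side  # smallest cell still containing both endpoints
        side = half

def kruskal_cost(points, weight):
    """Total weight of the minimum spanning tree under the given weights."""
    n = len(points)
    parent = list(range(n))
    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x
    total = 0.0
    for w, i, j in sorted((weight(points[i], points[j]), i, j)
                          for i, j in itertools.combinations(range(n), 2)):
        if find(i) != find(j):
            parent[find(i)] = find(j)
            total += w
    return total

rng = random.Random(1)
delta, eps = 16.0, 0.1
shift = (rng.uniform(0, delta), rng.uniform(0, delta))
pts = [(1, 1), (2, 1), (9, 9), (10, 9)]
exact = kruskal_cost(pts, math.dist)
perturbed = kruskal_cost(
    pts, lambda u, v: math.dist(u, v) + eps * delta_P(u, v, delta, shift))
```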
18
"Solve-and-Sketch" Framework
(1+ε)-MST:
"Load balancing": partition the tree into parts of the same size.
Almost linear time: approximate nearest neighbor data structure [Indyk'99].
Dependence on the dimension d: the size of an ε-net is (1/ε)^{O(d)}.
Generalizes to bounded doubling dimension.
The basic version is teachable (Jelani Nelson's "Big Data" class at Harvard).
Implementation in progress…
19
"Solve-and-Sketch" Framework
(1+ε)-Earth-Mover Distance, Transportation Cost:
No simple Arora-Mitchell-style "divide and conquer" algorithm (unlike for general matching).
Only recently a sequential (1+ε)-approximation in near-linear (n · log^{O(1)} n for fixed ε) time [Sharathkumar, Agarwal'12].
Our approach (convex sketching):
Switch to the flow-based version.
In every cell, send the flow to the closest net point until the net points can be connected.
20
"Solve-and-Sketch" Framework
Convex sketching of the cost function for k net points:
F: ℝ^{k−1} → ℝ gives the cost of routing fixed amounts of flow through the net points.
The function F' = F + "normalization" is monotone, convex and Lipschitz, and (1+ε)-approximates F.
We can (1+ε)-sketch it using a lower convex hull.
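The lower-convex-hull compression step can be illustrated in two dimensions (a single flow coordinate); the sample function values below are made up for the sketch:

```python
def cross(o, a, b):
    """2-d cross product (a - o) x (b - o); positive for a left turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

def lower_hull(points):
    """Lower convex hull via Andrew's monotone chain: the compressed,
    convex underestimate that the sketch keeps."""
    hull = []
    for p in sorted(points):
        while len(hull) >= 2 and cross(hull[-2], hull[-1], p) <= 0:
            hull.pop()  # drop points that lie above the convex envelope
        hull.append(p)
    return hull

# samples of a cost function; (1.5, 5) lies above the convex envelope
samples = [(0, 0), (1, 1), (2, 4), (3, 9), (1.5, 5)]
sketch = lower_hull(samples)
```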
21
Thank you! http://grigory.us
Open problems:
Extension to high dimensions? Probably not: a reduction from connectivity gives a conditional lower bound of Ω(log n) rounds for MST in ℓ∞^n. The difficult setting is d = Θ(log n) (for larger d one can apply the Johnson-Lindenstrauss transform).
Streaming algorithms for EMD and Transportation Cost? Our work gives the first near-linear-time algorithm for Transportation Cost.
Is it possible to reconstruct the solution itself?