1
Grigory Yaroslavtsev http://grigory.us
Clustering on Clusters 2049: Massively Parallel Algorithms for Clustering Graphs and Vectors
2
Clustering on Clusters: Overview
Algorithm design for massively parallel computing
Blog:
MPC algorithms for graphs: Connectivity, Correlation Clustering
MPC algorithms for vectors: K-means, Single-Linkage Clustering
Open problems and directions
3
Clustering on Clusters 2049: Overview
         | Graphs                       | Vectors
Basic    | Connectivity, Connectivity++ | K-means
Advanced | Correlation Clustering       | Single-Linkage Clustering, MST
4
Cluster Computation (a la BSP)
Input: size n (e.g. n = billions of edges in a graph)
M machines, S space (RAM) each
Constant overhead in RAM: M · S = O(n)
S = n^(1−ε), e.g. ε = 0.1 or ε = 0.5 (M = S = O(√n))
Output: solution to a problem (often of size O(n)); doesn't fit in local RAM (S ≪ n)
(Figure: input of size n distributed across M machines with S space each.)
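A toy back-of-the-envelope calculation of these parameters (values and variable names are assumptions for illustration, not from the talk):

```python
# Toy MPC parameter check: with local space S = n^(1 - eps), using
# M = n / S machines keeps the total memory M * S linear in the input size n.
n = 10**10  # e.g. ten billion edges
for eps in (0.1, 0.5):
    S = int(n ** (1 - eps))  # local space per machine
    M = n // S               # number of machines
    print(f"eps={eps}: S={S:.3e}, M={M}, M*S={M * S:.3e}")
```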
5
Cluster Computation (a la BSP)
Computation/communication proceeds in R rounds:
Every machine performs a near-linear time computation ⇒ total user time O(S^(1+o(1)) · R)
Every machine sends/receives at most S bits of information ⇒ total communication O(nR)
Goal: minimize R. Ideally: R = constant.
(Figure: M machines with S space each; ≤ S bits communicated and O(S^(1+o(1))) time spent per round.)
6
MapReduce-style computations
What I won't discuss today:
PRAMs (shared memory, multiple processors), see e.g. [Karloff, Suri, Vassilvitskii'10]: computing XOR requires Ω(log n) rounds in CRCW PRAM, but can be done in O(log_S n) rounds of MapReduce
Pregel-style systems, Distributed Hash Tables (see e.g. Ashish Goel's class notes and papers)
Lower-level implementation details (see e.g. the Rajaraman-Leskovec-Ullman book)
7
Models of parallel computation
Bulk-Synchronous Parallel Model (BSP) [Valiant'90]
Pro: most general, generalizes all other models
Con: many parameters, hard to design algorithms
Massively Parallel Computation (MPC) [Andoni, Onak, Nikolov, Y.'14]
[Feldman, Muthukrishnan, Sidiropoulos, Stein, Svitkina'07; Karloff, Suri, Vassilvitskii'10; Goodrich, Sitchinava, Zhang'11; ...; Beame, Koutris, Suciu'13]
Pros:
Inspired by modern systems (Hadoop, MapReduce, Dryad, Spark, Giraph, ...)
Few parameters, simple to design algorithms
New algorithmic ideas, robust to the exact model specification
# of rounds is an information-theoretic measure ⇒ can prove unconditional results
Con: sometimes not enough to model more complex behavior
8
Business perspective
Pricing: https://cloud.google.com/pricing/
Cost is roughly linear in space and time usage:
100 machines: ~$5K/year
10,000 machines: ~$0.5M/year
You pay a lot more for using provided algorithms.
9
Part 1: Clustering Graphs
Applications:
Community detection
Fake account detection
Deduplication
Storage localization
...
10
Problem 1: Connectivity
Input: n edges of a graph (arbitrarily partitioned between machines)
Output: is the graph connected? (or: the # of connected components)
Question: how many rounds does it take?
O(1)? O(log^α n)? O(n^α)? O(2^(αn))? Impossible?
11
Algorithm for Connectivity
Version of Boruvka's algorithm:
Start with all vertices assigned to different components.
Repeat O(log |V|) times:
Each component chooses a neighboring component
All pairs of chosen components get merged
How to avoid chaining? If the graph of components is bipartite and only one side gets to choose, then no chaining occurs. Randomly assign components to the sides (a sketch follows below).
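A minimal sequential sketch of this random head/tail merging rule (vertex ids, helper names, and the round budget are illustrative assumptions, not the talk's implementation):

```python
import random

def connected_components(n, edges, rounds=64, seed=0):
    """Sequential sketch of the Boruvka-style MPC algorithm: each round,
    every component flips a coin, and 'tail' components hook onto adjacent
    'head' components, so merge chains never form."""
    rng = random.Random(seed)
    parent = list(range(n))  # parent pointers; roots are component ids

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path compression
            v = parent[v]
        return v

    for _ in range(rounds):  # O(log n) rounds suffice with high probability
        heads = {r: rng.random() < 0.5 for r in {find(v) for v in range(n)}}
        for u, v in edges:
            ru, rv = find(u), find(v)
            if ru == rv:
                continue
            if heads[ru] and not heads[rv]:
                parent[rv] = ru  # tail rv hooks onto head ru
            elif heads[rv] and not heads[ru]:
                parent[ru] = rv  # tail ru hooks onto head rv
    return [find(v) for v in range(n)]

# Example: edges form components {0, 1, 2} and {3, 4}.
print(connected_components(5, [(0, 1), (1, 2), (3, 4)]))
```

Since only tail components hook onto head components, and heads never move within a round, merge chains cannot form.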
12
Algorithms for Graphs: Graph Density
Dense: S ≫ |V|, e.g. S ≥ |V|^(3/2)
Semi-dense: S = Θ(|V|)
Sparse: S ≪ |V|, e.g. S ≤ |V|^(1/2)
13
Algorithms for Graphs: Graph Density
Dense: S ≫ |V|, e.g. S ≥ |V|^(3/2)
Linear sketching: one round, see [McGregor'14] (workshop at Berkeley tomorrow)
"Filtering" [Karloff, Suri, Vassilvitskii, SODA'10; Ene, Im, Moseley, KDD'11; Lattanzi, Moseley, Suri, Vassilvitskii, SPAA'11; Suri, Vassilvitskii, WWW'11], ...
14
Algorithms for Graphs: Graph Density
Semi-dense graphs: S = Θ(|V|) [Avdyukhin, Y.]
Run Boruvka's algorithm for O(log log |V|) rounds: the # of vertices drops to |V|/log |V| (each round halves the number of components)
Repeat O(log |V|) times:
Compute a spanning tree of locally stored edges
Put log |V| such trees per machine
15
Algorithms for Graphs: Graph Density
Sparse: S ≪ |V|, e.g. S ≤ |V|^(1/2)
Sparse graph problems appear hard
Big open question: connectivity in o(log |V|) rounds?
Probably not: [Roughgarden, Vassilvitskii, Wang'16]
"One Cycle vs. Two Cycles" problem: distinguish one cycle from two in o(log |V|) rounds?
(Figure: one long cycle vs. two disjoint cycles.)
16
Other Connectivity Algorithms
[Rastogi, Machanavajjhala, Chitnis, Das Sarma'13], where D = graph diameter (a toy simulation of the simplest rule, Hash-Min, follows below):

Algorithm           | MR rounds            | Communication per round
Hash-Min            | D                    | O(|V| + |E|)
Hash-to-All         | log D                | O(|V|² + |E|)
Hash-to-Min         | Õ(log |V|) for paths | Õ(|V| + |E|)
Hash-Greater-to-Min | O(log n)             | O(|V| + |E|)
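To make the table concrete, here is a toy sequential simulation of the Hash-Min rule (the adjacency-list representation and function name are assumptions for illustration): every vertex repeatedly adopts the minimum label in its closed neighborhood, converging in O(D) rounds.

```python
def hash_min(adj):
    """Hash-Min simulation: each round, every vertex adopts the smallest
    label among itself and its neighbors; labels stabilize to the minimum
    vertex id of each connected component within O(diameter) rounds."""
    label = {v: v for v in adj}
    rounds = 0
    while True:
        new_label = {v: min([label[v]] + [label[u] for u in adj[v]])
                     for v in adj}
        if new_label == label:
            return label, rounds
        label, rounds = new_label, rounds + 1

# Example: a path 0-1-2 and a separate edge 3-4.
print(hash_min({0: [1], 1: [0, 2], 2: [1], 3: [4], 4: [3]}))
# -> ({0: 0, 1: 0, 2: 0, 3: 3, 4: 3}, 2)
```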
17
Graph-Dependent Connectivity Algs?
Big question: connectivity in O(log D) rounds with Õ(|V| + |E|) communication per round?
[Rastogi et al.'13] conjectured that Hash-to-Min can achieve this
[Avdyukhin, Y.'17]: Hash-to-Min takes Ω(D) rounds
Open problem: better connectivity algorithms if we parametrize by graph expansion?
Other work: [Kiveris et al.'14]
18
What about clustering? The same ideas work for Single-Linkage Clustering
Using connectivity as a primitive, one can preserve cuts in graphs [Benczur, Karger'98]:
Construct a graph with O(n log n) edges
All cut sizes are preserved up to a factor of 2
This allows running clustering algorithms whose objectives depend on cuts over the sparse graph instead
19
Single Linkage Clustering
[Zahn'71] Clustering via Minimum Spanning Tree:
For k clusters: remove the k−1 longest edges from the MST
Maximizes the minimum intercluster distance [Kleinberg, Tardos]
(A small sketch of this recipe follows below.)
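A minimal in-memory sketch of Zahn's recipe (Prim's MST on a small point set; names and structure are illustrative, not the massively parallel version discussed later):

```python
import heapq

def single_linkage(points, k):
    """Single-linkage clustering via MST: build an MST of the complete
    distance graph (Prim), drop the k-1 heaviest MST edges, and read off
    the connected components of what remains."""
    n = len(points)
    dist = lambda a, b: sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5
    in_tree = [False] * n
    in_tree[0] = True
    heap = [(dist(points[0], points[j]), 0, j) for j in range(1, n)]
    heapq.heapify(heap)
    mst = []
    while len(mst) < n - 1:
        d, u, v = heapq.heappop(heap)
        if in_tree[v]:
            continue
        in_tree[v] = True
        mst.append((d, u, v))
        for j in range(n):
            if not in_tree[j]:
                heapq.heappush(heap, (dist(points[v], points[j]), v, j))
    # Keep all but the k-1 longest MST edges; label components via union-find.
    mst.sort()
    parent = list(range(n))
    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v
    for _, u, v in mst[: n - k]:
        parent[find(u)] = find(v)
    return [find(v) for v in range(n)]

# Example: three well-separated groups on a line, k = 3.
print(single_linkage([(0,), (1,), (10,), (11,), (20,)], k=3))
```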
20
Part 2: Clustering Vectors
Input: v₁, …, vₙ ∈ ℝ^d
Feature vectors in ML, word embeddings in NLP, etc.
(Implicitly) a weighted graph of pairwise distances
Applications: same as before + data visualization
21
Large geometric graphs
Graph algorithms: dense graphs vs. sparse graphs
Dense: S ≫ |V|. Sparse: S ≪ |V|.
Our setting: dense graphs, sparsely represented: O(n) space
Output doesn't fit on one machine (S ≪ n)
Today: (1+ε)-approximate MST [Andoni, Onak, Nikolov, Y.]
d = 2 (easy to generalize)
R = log_S n = O(1) rounds (S = n^(Ω(1)))
22
O(log n)-approximate MST in R = O(log n) rounds
Assume points have integer coordinates in {0, …, Δ}, where Δ = poly(n).
Impose an O(log n)-depth quadtree. Bottom-up, for each cell of the quadtree:
Compute optimum MSTs in subcells
Use only one representative from each cell on the next level
Wrong representative: O(1)-approximation per level
23
εL-nets
An εL-net for a cell C with side length L:
A collection S of vertices in C such that every vertex of C is at distance ≤ εL from some vertex in S
(Fact: one can efficiently compute an εL-net of size O(1/ε²); a grid-snapping sketch follows below)
Bottom-up, for each cell of the quadtree:
Compute optimum MSTs in subcells
Use an εL-net from each cell on the next level
Idea: pay only O(εL) per edge cut by a cell with side L
Randomly shift the quadtree: Pr[cut edge of length ℓ by a cell with side L] ∼ ℓ/L ⇒ charge errors
Wrong representative: O(1)-approximation per level
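One standard way to realize the fact above is grid snapping (an assumed construction, written here for 2D; the function name is illustrative):

```python
def eps_net(cell_points, L, eps):
    """Sketch of an (eps*L)-net for points in a cell of side L: snap each
    point to a grid of pitch eps*L and keep one representative per grid
    cell, leaving at most O(1/eps^2) representatives."""
    reps = {}
    pitch = eps * L
    for (x, y) in cell_points:
        key = (int(x / pitch), int(y / pitch))
        reps.setdefault(key, (x, y))  # first point seen in each grid cell
    return list(reps.values())
```

Each point is within √2 · εL of its representative in 2D, so rescaling ε by a constant gives the stated guarantee.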
24
Randomly shifted quadtree
Top cell shifted by a random vector in [0, L]²
Impose a randomly shifted quadtree (top cell length 2Δ). Bottom-up, for each cell of the quadtree:
Compute optimum MSTs in subcells
Use an εL-net from each cell on the next level
(Figure: an edge of length 1 cut by a cell boundary pays 5 instead of 4; Pr[Bad Cut] = Ω(1).)
25
(1+ε)-MST in R = O(log n) rounds
Idea: only use short edges inside the cells.
Impose a randomly shifted quadtree (top cell length 2Δ/ε). Bottom-up, for each node (cell) of the quadtree:
Compute optimum Minimum Spanning Forests in subcells, using only edges of length ≤ εL
Use only an ε²L-net from each cell on the next level
Sketch of analysis (T* = optimum MST):
E[extra cost] = E[Σ_{e ∈ T*} Pr[e is cut by a cell with side L] · εL] ≤ ε log n · Σ_{e ∈ T*} ||e||₂ = ε log n · cost(T*)
(Rescaling ε to ε/log n turns this into a (1+ε) factor.)
(Figure: an edge cut by a cell with side L = Ω(1/ε); Pr[Bad Cut] = O(ε).)
26
(1+ε)-MST in R = O(1) rounds (M = n^(Ω(1)))
O(log n) rounds ⇒ O(log_S n) = O(1) rounds
Flatten the tree: (√M × √M)-grids instead of (2 × 2) grids at each level
Impose a randomly shifted (√M × √M)-tree. Bottom-up, for each node (cell) of the tree:
Compute optimum MSTs in subcells, using only edges of length ≤ εL
Use only an ε²L-net from each cell on the next level
27
Single-Linkage Clustering [Y., Vadapalli]
Q: Can we get single-linkage clustering from a (1+ε)-MST?
A: No, a fixed edge can be arbitrarily distorted.
Idea: run the (1+ε)-MST algorithm O(log n) times and collect all edges
Compute the MST of the collected edges using Boruvka
Use this MST for k-single-linkage clustering, for all k simultaneously
Overall: O(log n) rounds of MPC instead of O(1)
Q: Is this actually necessary?
A: Most likely yes, assuming sparse connectivity is hard.
28
Single-Linkage Clustering [Y., Vadapalli]
Conjecture 1: sparse connectivity requires Ω(log |V|) rounds
Conjecture 2: "1 cycle vs. 2 cycles" requires Ω(log |V|) rounds
Under ℓ_p-distances:

Distance | Approximation | Hardness under Conjecture 1* | Hardness under Conjecture 2*
Hamming  | Exact         | 2                            | 3
ℓ₁, ℓ₂   | (1+ε)         | 1.41 − ε                     | 1.84 − ε
ℓ∞       |               |                              |
29
Thanks! Questions? Slides will be available on http://grigory.us
More about algorithms for massive data:
More in the classes I teach:
30
(1+ε)-MST in R = O(1) rounds: analysis
Theorem: let ℓ = # of levels in the random tree P; then E_P[ALG] ≤ (1+ε) · OPT.
Proof (sketch):
Δ_P(u, v) = side length of the cell that first partitions (u, v)
New weights: w_P(u, v) = ||u − v||₂ + ε · Δ_P(u, v)
||u − v||₂ ≤ E_P[w_P(u, v)] ≤ (1+ε) · ||u − v||₂
Our algorithm implements Kruskal's algorithm for the weights w_P
(Figure: points u, v and the cell of side Δ_P(u, v) that separates them.)
31
Technical Details: (1+ε)-MST
"Load balancing": partition the tree into parts of the same size
Almost linear time locally: approximate nearest neighbor data structure [Indyk'99]
Dependence on the dimension d: the size of an ε-net is (1/ε)^(O(d))
Generalizes to bounded doubling dimension
32
Algorithm for Connectivity: Setup
Data: n edges of an undirected graph.
Notation: π(v) ≡ unique id of v; Γ(S) ≡ set of neighbors of a subset S of vertices.
Labels: the algorithm assigns a label ℓ(v) to each vertex v; L_v ≡ the set of vertices with label ℓ(v) (invariant: L_v is a subset of the connected component containing v).
Active vertices: some vertices are called active (exactly one per L_v).
33
Algorithm for Connectivity
Mark every vertex as active and let ℓ(v) = π(v).
For phases i = 1, 2, …, O(log n) do:
Call each active vertex a leader with probability 1/2; if v is a leader, mark all vertices in L_v as leaders.
For every active non-leader vertex w, find the smallest (by π) leader vertex w* in Γ(L_w).
Mark w passive; relabel each vertex with label ℓ(w) by ℓ(w*).
Output: the set of connected components based on ℓ (a sequential sketch follows below).
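A sequential sketch of these phases (vertex ids double as labels; the phase budget and names are assumptions for illustration):

```python
import random

def mpc_connectivity(n, edges, seed=0):
    """Leader-election connectivity sketch: per phase, each current label
    class is a leader with prob. 1/2; every non-leader class adjacent to a
    leader class relabels itself to the smallest such leader label."""
    rng = random.Random(seed)
    label = list(range(n))
    for _ in range(4 * max(1, n.bit_length())):  # O(log n) phases suffice whp
        classes = set(label)
        is_leader = {c: rng.random() < 0.5 for c in classes}
        best = {}  # non-leader class -> smallest adjacent leader class
        for u, v in edges:
            for a, b in ((u, v), (v, u)):
                la, lb = label[a], label[b]
                if la != lb and not is_leader[la] and is_leader[lb]:
                    best[la] = min(best.get(la, lb), lb)
        if best:
            label = [best.get(l, l) for l in label]
    return label  # vertices in the same component share a label (whp)

# Example: two components {0, 1, 2} and {3, 4}.
print(mpc_connectivity(5, [(0, 1), (1, 2), (3, 4)]))
# e.g. [0, 0, 0, 3, 3], depending on the coin flips
```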
34
Algorithm for Connectivity: Analysis
If ℓ(u) = ℓ(v) then u and v are in the same connected component.
Claim: labels are unique (one per component) with high probability after O(log n) phases.
In every phase, the number of active vertices in each connected component reduces by a constant factor:
Half of the active vertices are declared non-leaders (in expectation).
Fix an active non-leader vertex v. If there are at least two different labels in the connected component of v, then there is an edge (u′, u) such that ℓ(u) = ℓ(v) and ℓ(u′) ≠ ℓ(v).
u′ is marked as a leader with probability 1/2 ⇒ half of the active non-leader vertices change their label (in expectation).
Overall, we expect 1/4 of the labels to disappear in each phase.
After O(log n) phases, the number of active labels in every connected component drops to one with high probability.
35
Algorithm for Connectivity: Implementation Details
Distributed data structure of size O(|V|) to maintain labels, ids, leader/non-leader status, etc.
O(1) rounds per phase to update the data structure
Edges are stored locally together with all auxiliary info
Between phases: use the distributed data structure to update the local info on edges
For every active non-leader vertex w, find the smallest leader (w.r.t. π) vertex w* ∈ Γ(L_w): each (non-leader, leader) edge sends an update to the distributed data structure
Much faster with a Distributed Hash Table service (DHT) [Kiveris, Lattanzi, Mirrokni, Rastogi, Vassilvitskii'14]
36
Problem 3: K-means
Input: v₁, …, vₙ ∈ ℝ^d
Find k centers c₁, …, c_k
Minimize the sum of squared distances to the closest center:
Σ_{i=1}^{n} min_{j=1,…,k} ||v_i − c_j||₂²
where ||v_i − c_j||₂² = Σ_{t=1}^{d} (v_{it} − c_{jt})²
NP-hard (a small sketch of the objective follows below)
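The objective written out as code (a tiny illustrative helper; the name is assumed):

```python
def kmeans_cost(points, centers):
    """The k-means objective: for each point, the squared Euclidean
    distance to its closest center, summed over all points."""
    return sum(min(sum((x - c) ** 2 for x, c in zip(v, ctr))
                   for ctr in centers)
               for v in points)

print(kmeans_cost([(0, 0), (1, 0), (10, 0)], [(0, 0), (10, 0)]))  # -> 1
```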
37
K-means++ [Arthur, Vassilvitskii'07]
C = {c₁, …, c_t} (current collection of centers)
d²(v, C) = min_{i=1,…,t} ||v − c_i||₂²
K-means++ algorithm (gives an O(log k)-approximation):
Pick c₁ uniformly at random from the data
Pick centers c₂, …, c_k sequentially from the distribution where point v has probability d²(v, C) / Σ_{i=1}^{n} d²(v_i, C)
(A sampling sketch follows below.)
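A minimal sketch of the seeding step (assumes at least k distinct points; names are illustrative):

```python
import random

def kmeans_pp(points, k, seed=0):
    """k-means++ seeding: the first center is uniform; each later center is
    drawn with probability proportional to the squared distance d^2(v, C)
    to the current set of centers C."""
    rng = random.Random(seed)
    d2 = lambda a, b: sum((x - y) ** 2 for x, y in zip(a, b))
    centers = [rng.choice(points)]
    while len(centers) < k:
        weights = [min(d2(v, c) for c in centers) for v in points]
        centers.append(rng.choices(points, weights=weights)[0])
    return centers

print(kmeans_pp([(0.0,), (0.1,), (5.0,), (5.1,)], k=2))
```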
38
K-means|| [Bahmani et al.'12]
Pick C = {c₁} uniformly at random from the data
Initial cost: ψ = Σ_{i=1}^{n} d²(v_i, c₁)
Do O(log ψ) times:
Add O(k) centers sampled from the distribution where point v has probability d²(v, C) / Σ_{i=1}^{n} d²(v_i, C)
Solve k-means for these O(k log ψ) points locally
Thm: if the final step gives an α-approximation ⇒ O(α)-approximation overall
(An oversampling sketch follows below.)
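A sketch of the oversampling rounds (the oversampling factor of k per round and all names are assumptions for illustration):

```python
import random

def kmeans_parallel_seeding(points, k, rounds, seed=0):
    """k-means|| style oversampling: each round keeps point v independently
    with probability ~ k * d^2(v, C) / cost(C), producing O(k * rounds)
    candidates to be reduced to k centers by a local k-means (not shown)."""
    rng = random.Random(seed)
    d2 = lambda a, b: sum((x - y) ** 2 for x, y in zip(a, b))
    C = [rng.choice(points)]
    for _ in range(rounds):
        dists = [min(d2(v, c) for c in C) for v in points]
        cost = sum(dists)
        if cost == 0:  # every point is already a center
            break
        C += [v for v, d in zip(points, dists)
              if rng.random() < min(1.0, k * d / cost)]
    return C  # cluster these candidates locally to pick the final k centers

pts = [(0.0,), (0.1,), (5.0,), (5.1,), (9.9,)]
print(kmeans_parallel_seeding(pts, k=2, rounds=3))
```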
39
Problem 2: Correlation Clustering
Inspired by machine learning:
Practice: [Cohen, McCallum'01; Cohen, Richman'02]
Theory: [Bansal, Blum, Chawla'04]
40
Correlation Clustering: Example
Minimize the # of incorrectly classified pairs: # covered non-edges + # non-covered edges
(Figure: 4 incorrectly classified = 1 covered non-edge + 3 non-covered edges.)
41
Approximating Correlation Clustering
Minimize the # of incorrectly classified pairs:
≈20000-approximation [Bansal, Blum, Chawla'04]
[Demaine, Emmanuel, Fiat, Immorlica'04], [Charikar, Guruswami, Wirth'05], [Ailon, Charikar, Newman'05], [Williamson, van Zuylen'07], [Ailon, Liberty'08], ...
≈2-approximation [Chawla, Makarychev, Schramm, Y.'15]
Maximize the # of correctly classified pairs:
(1−ε)-approximation [Bansal, Blum, Chawla'04]
42
Correlation Clustering
One of the most successful clustering methods:
Only uses qualitative information about similarities
# of clusters is unspecified (selected to best fit the data)
Applications: document/image deduplication (data from crowds or black-box machine learning)
NP-hard [Bansal, Blum, Chawla'04], but admits simple approximation algorithms with good provable guarantees
43
Correlation Clustering
More:
Survey [Wirth]
KDD'14 tutorial: "Correlation Clustering: From Theory to Practice" [Bonchi, Garcia-Soriano, Liberty]
Wikipedia article:
44
Data-Based Randomized Pivoting
3-approximation (in expectation) [Ailon, Charikar, Newman]
Algorithm (a sketch follows below):
Pick a random pivot vertex p
Make a cluster {p} ∪ N(p), where N(p) is the set of neighbors of p
Remove the cluster from the graph and repeat
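A minimal sketch of this pivoting loop (edge-list input and names are assumptions for illustration):

```python
import random

def pivot_correlation_clustering(vertices, plus_edges, seed=0):
    """Randomized pivot algorithm: pick a uniformly random remaining vertex,
    cluster it with its remaining '+' neighbors, remove the cluster, and
    repeat; gives a 3-approximation in expectation."""
    rng = random.Random(seed)
    adj = {v: set() for v in vertices}
    for u, v in plus_edges:  # '+' (similar) pairs; all other pairs are '-'
        adj[u].add(v)
        adj[v].add(u)
    remaining, clusters = set(vertices), []
    while remaining:
        pivot = rng.choice(sorted(remaining))  # sorted only for determinism
        cluster = {pivot} | (adj[pivot] & remaining)
        clusters.append(cluster)
        remaining -= cluster
    return clusters

print(pivot_correlation_clustering([0, 1, 2, 3], [(0, 1), (1, 2), (2, 3)]))
```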
45
Data-Based Randomized Pivoting
Pick a random pivot vertex p
Make a cluster {p} ∪ N(p), where N(p) is the set of neighbors of p
Remove the cluster from the graph and repeat
(Figure: 8 incorrectly classified = 2 covered non-edges + 6 non-covered edges.)
46
Parallel Pivot Algorithm
(3+ε)-approximation in O(log² n / ε) rounds [Chierichetti, Dalvi, Kumar, KDD'14]
Algorithm (one round is sketched below): while the graph is not empty:
Δ = current maximum degree
Activate each node independently with probability ε/Δ
Deactivate nodes connected to other active nodes
The remaining active nodes are pivots
Create a cluster around each pivot as before
Remove the clusters
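One round of this process as a sketch (the tie-breaking between pivots sharing a neighbor and all names are assumptions for illustration):

```python
import random

def parallel_pivot_round(adj, remaining, eps, rng):
    """One parallel pivot round: activate nodes with probability eps/Delta,
    drop activated nodes adjacent to other activated nodes, then carve out
    clusters around the surviving pivots simultaneously."""
    delta = max(len(adj[v] & remaining) for v in remaining)
    active = {v for v in remaining if rng.random() < eps / max(delta, 1)}
    pivots = {v for v in active if not (adj[v] & active)}
    claimed, clusters = set(), []
    for p in sorted(pivots):  # pivots may share neighbors; claim greedily
        cluster = ({p} | (adj[p] & remaining)) - claimed
        clusters.append(cluster)
        claimed |= cluster
    return clusters, remaining - claimed

# Driving loop (sketch): repeat rounds until every vertex is clustered.
adj = {0: {1, 2}, 1: {0}, 2: {0}, 3: {4}, 4: {3}}
remaining, rng, out = set(adj), random.Random(0), []
while remaining:
    cs, remaining = parallel_pivot_round(adj, remaining, eps=0.5, rng=rng)
    out += cs
print(out)
```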
47
Parallel Pivot Algorithm: Analysis
Fact: the maximum degree halves after O(log n / ε) rounds ⇒ the algorithm terminates in O(log² n / ε) rounds
Fact: the activation process induces a close-to-uniform marginal distribution over pivots ⇒ an analysis similar to the sequential pivot algorithm gives a (3+ε)-approximation