Published by Lorin Sibyl Atkins. Modified over 8 years ago.
1
Clustering – Definition and Basic Algorithms Seminar on Geometric Approximation Algorithms, spring 11/12
2
motivation
4
Metric spaces
A metric space is a pair (χ, d), where χ is a set and d : χ × χ → [0, ∞) satisfies:
– d(x, y) = 0 iff x = y
– d(x, y) = d(y, x)
– d(x, y) + d(y, z) ≥ d(x, z) (triangle inequality)
Example – R² with the regular Euclidean distance
5
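Norm
L_p-norm: $\|x\|_p = \left(\sum_{i=1}^{m} |x_i|^p\right)^{1/p}$ for $x = (x_1, \ldots, x_m) \in \mathbb{R}^m$ and $p \ge 1$
L_1-norm: $\|x\|_1 = \sum_{i=1}^{m} |x_i|$
L_∞-norm: $\|x\|_\infty = \max_{i} |x_i|$
L_2-norm: $\|x\|_2 = \sqrt{\sum_{i=1}^{m} x_i^2}$, the regular Euclidean distance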
6
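Norms
For any point $x \in \mathbb{R}^m$ and s > t: $\|x\|_s \le \|x\|_t$
Intuition: as the exponent grows, the largest coordinate dominates the norm.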
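A minimal sketch of these norms in Python. The helper name `lp_norm` and the sample vector are illustrative assumptions, not part of the slides. The example also shows that the norm does not increase as the exponent grows.

```python
import math

def lp_norm(x, p):
    """L_p norm of a vector x; p = math.inf gives the L_inf norm."""
    if p == math.inf:
        return max(abs(v) for v in x)
    return sum(abs(v) ** p for v in x) ** (1.0 / p)

x = [3.0, -4.0]                # hypothetical sample point in R^2
l1 = lp_norm(x, 1)             # sum of absolute coordinates
l2 = lp_norm(x, 2)             # regular Euclidean length
linf = lp_norm(x, math.inf)    # largest absolute coordinate
```

For this vector the three values are ordered as linf ≤ l2 ≤ l1, matching the monotonicity of the L_p norm in p.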
7
The clustering problem
8
Cont’d
Metric space (χ, d)
$P \subseteq \chi$ – a set of n points
C – a set of centers
Every point of P is assigned to its nearest neighbor in C.
All the points of P that are assigned to a center $c \in C$ form the cluster of c: $\{p \in P : d(p, c) \le d(p, c') \text{ for all } c' \in C\}$
9
Cont’d
The center set C partitions P into clusters; this partition is known as a Voronoi partition.
Let $d(p, C) = \min_{c \in C} d(p, c)$ be the distance from a point p to its nearest center.
10
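K-center clustering
|C| = k, k is the input
We want to minimize the largest distance from a point of P to its nearest center: $\max_{p \in P} d(p, C)$
The optimal price is $\mathrm{opt}_\infty(P, k) = \min_{C, |C| = k} \max_{p \in P} d(p, C)$
An optimal solution is a set C realizing this minimum
This problem is NP-hard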
11
The greedy clustering algorithm The first iteration
12
The greedy clustering algorithm The second iteration
13
The greedy clustering algorithm The end (for k = 3)
14
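The greedy clustering algorithm
Pick an arbitrary point $c_1 \in P$ and set $C_1 = \{c_1\}$
Do (k – 1) times (i = 1 to k – 1):
– $r_i = \max_{p \in P} d(p, C_i)$
– $c_{i+1}$ is a point of P realizing this maximum
– $C_{i+1} = C_i \cup \{c_{i+1}\}$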
15
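Cont’d
To make this algorithm slightly faster, store for every point $p \in P$ its distance to the nearest center chosen so far, and update it in O(1) time per point whenever a new center is added.
Each iteration then takes O(n) time, so the algorithm runs in O(nk) time.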
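The greedy algorithm with the distance-array speedup can be sketched as follows. This is a minimal illustration that assumes points are coordinate tuples under the Euclidean metric; the function name `greedy_k_center` and the sample points are hypothetical, not from the slides.

```python
import math

def greedy_k_center(points, k, dist=math.dist):
    """Greedy k-center sketch: returns (centers, radii).

    radii[i] is the largest distance from a point to the first i + 1 centers.
    """
    centers = [points[0]]                      # arbitrary first center
    d = [dist(p, centers[0]) for p in points]  # distance to nearest center so far
    radii = []
    for _ in range(k - 1):
        far = max(range(len(points)), key=d.__getitem__)  # farthest point
        radii.append(d[far])
        centers.append(points[far])
        # O(n) update: a nearest-center distance can only shrink
        d = [min(d[j], dist(points[j], points[far])) for j in range(len(points))]
    radii.append(max(d))                       # price of the final center set
    return centers, radii

pts = [(0, 0), (1, 0), (10, 0), (11, 0), (5, 8)]  # hypothetical input
centers, radii = greedy_k_center(pts, 3)
```

Each of the k – 1 rounds scans the array once to find the farthest point and once to update it, which gives the O(nk) bound.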
16
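2-approximation
We have C, a set of k centers.
2-approximation means that $\max_{p \in P} d(p, C) \le 2 \cdot \mathrm{opt}_\infty(P, k)$, where $\mathrm{opt}_\infty(P, k)$ is the price of the optimal k-center clustering.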
17
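The proof
Let $r_k = \max_{p \in P} d(p, C)$ be the price of the greedy solution, let $c_{k+1}$ be a point realizing it, and let $D = C \cup \{c_{k+1}\}$.
The distance between any pair of points in D is at least $r_k$, since every point was picked at distance at least $r_k$ from all the centers picked before it.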
18
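Cont’d
Assume for the sake of contradiction that $r_k > 2 \cdot \mathrm{opt}_\infty(P, k)$, where $r_k$ is the price of the greedy solution.
The optimal solution covers P by k balls with radius $\mathrm{opt}_\infty(P, k)$.
None of these balls can cover two points of D, since two points in the same ball are at distance at most $2 \cdot \mathrm{opt}_\infty(P, k) < r_k$ from each other.
The optimal solution can’t cover D because |D| = k + 1 > k.
Contradiction, so $r_k \le 2 \cdot \mathrm{opt}_\infty(P, k)$.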
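The 2-approximation bound can be sanity-checked on a tiny instance by comparing the greedy price with a brute-force optimum. This is a sketch under stated assumptions: Euclidean points, centers restricted to points of P, and hypothetical helper names and input. Restricting the optimum to P can only raise its price, so the factor-2 bound must still hold against it.

```python
import math
from itertools import combinations

def k_center_price(P, C):
    """Largest distance from a point of P to its nearest center in C."""
    return max(min(math.dist(p, c) for c in C) for p in P)

def greedy_centers(P, k):
    """Plain greedy: repeatedly add the point farthest from the current centers."""
    C = [P[0]]
    while len(C) < k:
        C.append(max(P, key=lambda p: min(math.dist(p, c) for c in C)))
    return C

P = [(0, 0), (2, 0), (4, 0), (9, 1), (10, 5), (3, 7)]  # hypothetical input
k = 2
greedy_price = k_center_price(P, greedy_centers(P, k))
# Brute force over all k-subsets of P; only feasible for very small inputs.
opt_price = min(k_center_price(P, C) for C in combinations(P, k))
```

The brute-force search takes exponential time in k and is only meant as a check; it is not part of the algorithm on the slides.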
19
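The greedy permutation
If we run the algorithm with k = n, the order $\langle c_1, \ldots, c_n \rangle$ in which the centers are picked is a permutation of P.
If we take the radii $\langle r_1, \ldots, r_n \rangle$, we can say that for every i all points in P are within a distance at most $r_i$ from $C_i = \{c_1, \ldots, c_i\}$.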
20
r-packing
21
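A set $S \subseteq P$ is an r-packing for P if
– Covering: every point of P is within distance at most r from S
– Separation: every two distinct points of S are at distance at least r from each other
At every iteration i of the greedy permutation (k = n), $C_i = \{c_1, \ldots, c_i\}$ is an $r_i$-packing for P
Proof – covering holds by the definition of $r_i$; separation holds because each center was picked at distance at least $r_i$ from all the centers picked before it.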
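The packing property of the greedy permutation can be checked numerically. The sketch below assumes distinct Euclidean points; `is_r_packing`, `greedy_permutation`, and the sample input are illustrative names and data, not from the slides.

```python
import math

def is_r_packing(S, P, r, dist=math.dist, eps=1e-9):
    """Check the covering and separation properties of S with respect to P."""
    covering = all(min(dist(p, s) for s in S) <= r + eps for p in P)
    separation = all(dist(S[a], S[b]) >= r - eps
                     for a in range(len(S)) for b in range(a + 1, len(S)))
    return covering and separation

def greedy_permutation(P, dist=math.dist):
    """Return (order, radii) with radii[i] = max over p of d(p, order[:i + 1])."""
    order = [P[0]]
    d = [dist(p, P[0]) for p in P]
    radii = []
    while len(order) < len(P):
        far = max(range(len(P)), key=d.__getitem__)
        radii.append(d[far])
        order.append(P[far])
        d = [min(d[j], dist(P[j], P[far])) for j in range(len(P))]
    radii.append(max(d))  # 0 once every point is a center
    return order, radii

P = [(0, 0), (1, 0), (10, 0), (11, 0), (5, 8)]  # hypothetical distinct points
order, radii = greedy_permutation(P)
# Every prefix of the permutation should be a packing with its own radius.
checks = [is_r_packing(order[:i + 1], P, radii[i]) for i in range(len(P))]
```

The small tolerance `eps` only guards against floating-point rounding in the distance comparisons.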
22
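K-median clustering
|C| = k, k is the input
We want to minimize the sum of the distances from the points of P to their nearest centers: $\sum_{p \in P} d(p, C)$
The optimal price is $\mathrm{opt}_1(P, k) = \min_{C, |C| = k} \sum_{p \in P} d(p, C)$
An optimal solution is a set C realizing this minimum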
23
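Claim
For any set P of n points and parameter k: $\mathrm{opt}_\infty(P, k) \le \mathrm{opt}_1(P, k) \le n \cdot \mathrm{opt}_\infty(P, k)$, where $\mathrm{opt}_\infty$ and $\mathrm{opt}_1$ are the optimal k-center and k-median prices.
Proof – For any $x \in \mathbb{R}^n$: $\|x\|_\infty \le \|x\|_1 \le n \cdot \|x\|_\infty$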
24
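Proof – cont’d
Let |C| = k, realizing $\mathrm{opt}_1(P, k)$: $\mathrm{opt}_\infty(P, k) \le \max_{p \in P} d(p, C) \le \sum_{p \in P} d(p, C) = \mathrm{opt}_1(P, k)$
Let |D| = k, realizing $\mathrm{opt}_\infty(P, k)$: $\mathrm{opt}_1(P, k) \le \sum_{p \in P} d(p, D) \le n \cdot \max_{p \in P} d(p, D) = n \cdot \mathrm{opt}_\infty(P, k)$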
25
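2n-approximation
The greedy k-center algorithm, which computes a set L of k centers, is a 2n-approximation to this problem
Proof: $\sum_{p \in P} d(p, L) \le n \cdot \max_{p \in P} d(p, L) \le 2n \cdot \mathrm{opt}_\infty(P, k) \le 2n \cdot \mathrm{opt}_1(P, k)$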
26
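Improving it – algLocalSearchKMed
Let 0 < τ < 1
After the previous alg’ we have a set L of k centers; set $L_{curr} = L$
– We check if the current solution can be improved by replacing one of the centers $c \in L_{curr}$ by a center $o \in P \setminus L_{curr}$ from the outside (swap): $K = (L_{curr} \setminus \{c\}) \cup \{o\}$
– If $\sum_{p \in P} d(p, K) \le (1 - \tau) \sum_{p \in P} d(p, L_{curr})$ then $L_{curr} = K$
– Stop if there is no such swap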
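The swap-based local search can be sketched as follows. This is an illustration under assumptions, not the slides' own code: Euclidean points, a naive recomputation of the price for every candidate swap, and hypothetical names (`price`, `local_search_k_median`) and input. It takes the first swap that lowers the price by a factor of at least (1 – τ).

```python
import math

def price(P, C, dist=math.dist):
    """k-median price: sum of distances from each point to its nearest center."""
    return sum(min(dist(p, c) for c in C) for p in P)

def local_search_k_median(P, L, tau, dist=math.dist):
    """Swap-based local search sketch; L is an initial list of k centers from P."""
    current = list(L)
    improved = True
    while improved:
        improved = False
        cur_price = price(P, current, dist)
        for i in range(len(current)):          # center to swap out
            for o in P:                        # candidate to swap in
                if o in current:
                    continue
                K = current[:i] + [o] + current[i + 1:]
                # accept only swaps that lower the price by a (1 - tau) factor
                if price(P, K, dist) <= (1 - tau) * cur_price:
                    current, improved = K, True
                    break
            if improved:
                break
    return current

P = [(0, 0), (1, 0), (2, 0), (10, 0), (11, 0), (12, 0)]  # hypothetical input
start = [(0, 0), (2, 0)]                                  # poor initial centers
result = local_search_k_median(P, start, tau=0.1)
```

Because every accepted swap shrinks the price by a constant factor, the loop terminates for any τ > 0; a smaller τ accepts more swaps and gets closer to a true local optimum at the cost of more iterations.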
27
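Running time
The previous alg’ is O(nk)
An iteration requires checking O(nk) swaps. Computing the price of every swap takes O(nk) time, so an iteration takes $O((nk)^2)$ time.
Every successful swap reduces the price by a factor of at least (1 – τ), and the initial solution is a 2n-approximation, so we have at most $\log_{1/(1-\tau)} 2n = O((\log n)/\tau)$ iterations
The total time is $O((nk)^2 (\log n)/\tau)$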
28
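The constant approximation
Define nn(p, X) to be the nearest neighbor to p in X
For a point $p \in P$, let $o_p = nn(p, C_{opt})$ be its optimal center, where $C_{opt}$ is the optimal set of centers, and let $\alpha(p) = nn(o_p, L)$
Let $\Pi_{mod}$ be the modified partition of P by the function $\alpha$
Let $\delta_p = d(p, \alpha(p)) - d(p, L)$ be the price of this reassignment for the point p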
29
Lemma 1-
30
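Some definitions
We map every center of the optimal solution $C_{opt}$ to its nearest neighbor in L. For $c \in L$, deg(c) is the number of centers of $C_{opt}$ mapped to c.
If deg(c) = 0 then c is called a drifter.
If deg(c) = 1 then c is called an anchor.
If deg(c) > 1 then c is called a tyrant.
For $c \in L$ we define $\mathrm{ransom}(c) = \sum_{p} \delta_p$, summing over the points p of P served by c in L, where $\delta_p$ is the price of reassigning p.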
31
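Cont’d
For a center o of the optimal solution $C_{opt}$
– Optimal price: $\mathrm{opt}(o) = \sum_{p} d(p, C_{opt})$, summing over the points p in the cluster of o in the optimal solution
– Local price: $\mathrm{local}(o) = \sum_{p} d(p, L)$, summing over the same points
Let $O_T$ (respectively $O_A$) be the set of all centers of $C_{opt}$ that are assigned to tyrants (respectively anchors) by nn(·, L)
Let D be the set of all drifters in L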
32
Lemma 2
If $c \in L$ is a drifter and o is any center of the optimal solution $C_{opt}$, then
33
Lemma 3 proof
34
Lemma 4 We have that Proof c – with the lowest ransom(c)
35
Lemma 5 We have that
36
Constant approximation Proof
37
Conclusion
Let P be a set of n points in a metric space. For 0 < ε < 1, one can compute a (5 + ε)-approximation to the optimal k-median clustering of P. The running time of this algorithm is $O(n^2 k^2 (\log n)/\varepsilon)$.
38
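K-means clustering
Same as before, but the price to minimize is the sum of squared distances $\sum_{p \in P} d(p, C)^2$. The algorithm is the same as before and computes a (25 + ε)-approximation. Its running time is $O(n^2 k^2 (\log n)/\varepsilon)$.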