1
Clustering with Spectral Norm and the k-means algorithm Ravi Kannan Microsoft Research Bangalore joint work with Amit Kumar (Indian Institute of Technology, Delhi)
2
Mixture of Distributions Given k distributions F_1, …, F_k with relative weights w_1, …, w_k. A point is sampled by first choosing F_i with probability w_i and then sampling a point from F_i.
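A minimal sketch of this two-stage sampling, assuming spherical Gaussian components and illustrative weights (neither is prescribed by the talk):

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative mixture: k = 3 spherical Gaussian components in d = 50 dimensions.
k, d, n = 3, 50, 1000
weights = np.array([0.5, 0.3, 0.2])        # w_1, ..., w_k
means = 10.0 * rng.normal(size=(k, d))     # component means (assumed)
sigma = 1.0                                # common std. dev. (assumed)

# Two-stage sampling: pick component i with probability w_i, then sample from F_i.
labels = rng.choice(k, size=n, p=weights)
points = means[labels] + sigma * rng.normal(size=(n, d))
```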
3
Mixture of Distributions The mixture density is p(x) = Σ_i w_i f_i(x), where f_i is the density of F_i.
4
Learning mixtures of distributions Goal : Given samples in d-dimensional space from a mixture of k components (d >> k), learn the mixture and classify the samples. Distributions : Gaussians, heavy-tailed, … Many applications : data mining, vision, …
5
Notation σ_i : maximum variance of the projection of F_i along any line. σ = max_i σ_i.
6
Gaussian Components How many standard deviations apart do the means need to be? Early results: a function of d and k, but d >> k. Later: a function of k alone for spherical Gaussians, and some non-spherical ones too [Vempala, Wang, …].
7
Other Distributions Product distributions : |μ_i − μ_j| ≥ σ · poly(k/ε) [Dasgupta et al. '05; Chaudhuri, Rao '08]. Planted partition model [McSherry '01]. These algorithms are quite different from each other and often quite non-trivial. Gaussians and planted partitions have exponentially falling tails. What about heavier tails, say with only the mean and variance bounded?
8
Our Approach A deterministic separation condition on the samples that guarantees correct clustering. A : a set of n points, clustered into k clusters C_1, …, C_k. μ_i : mean of cluster C_i.
9
… Our Approach How to model σ? Let A be the n × d matrix whose rows are the points x_1, x_2, …, x_n, and let C be the n × d matrix whose i-th row is the mean of the cluster containing x_i. Recall : ||T|| = max_{|x|=1} |Tx|. Then ||A−C||²/n = max_u (mean squared distance to the centroid in direction u), which plays the role of σ².
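A small numeric sketch of this quantity; the clustering that defines C is assumed to be given, and the function and variable names are illustrative:

```python
import numpy as np

def spectral_sigma(A, labels):
    """Return ||A - C|| / sqrt(n), the data-driven analogue of sigma.

    A      : (n, d) array whose rows are the points x_1, ..., x_n.
    labels : length-n array; labels[i] is the cluster of x_i.
    """
    n = A.shape[0]
    C = np.empty_like(A, dtype=float)
    for r in np.unique(labels):
        C[labels == r] = A[labels == r].mean(axis=0)  # row i of C = cluster mean of x_i
    # Spectral norm = largest singular value of A - C; its square over n is the
    # max over unit directions u of the mean squared distance to centroids along u.
    return np.linalg.norm(A - C, ord=2) / np.sqrt(n)
```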
10
Proximity Condition [Figure: means μ_r and μ_s, a point x, and margins D_r, D_s along the line through μ_r and μ_s; the margins contain k and 1/w factors.] A point x ∈ C_r, projected onto the line through μ_r and μ_s, should be closer to μ_r than to μ_s by this margin. x ∈ C_r satisfies the proximity condition if the above holds for all s ≠ r.
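A rough sketch of checking such a condition for one pair of centers; the exact margin Δ_rs from the paper is not reproduced here, so `margin` is a placeholder supplied by the caller:

```python
import numpy as np

def satisfies_proximity(x, mu_r, mu_s, margin):
    """Check whether x, projected onto the mu_r--mu_s line, is closer to
    mu_r than to mu_s by at least `margin` (a stand-in for the k, 1/w term)."""
    direction = (mu_s - mu_r) / np.linalg.norm(mu_s - mu_r)
    dist_to_r = abs(np.dot(x - mu_r, direction))
    dist_to_s = abs(np.dot(x - mu_s, direction))
    return dist_to_s - dist_to_r >= margin
```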
11
Proximity Condition [Figure: μ_r, μ_s, x, D_r, D_s.] Proximity in the projection is a much weaker requirement than proximity in the whole space.
12
Our Result Thm : If all points satisfy the proximity condition, then we can correctly classify all the points. Answers an open question posed in [Kannan, Vempala '09] with a much weaker condition.
13
Our Result (Approx. version) Thm : If all but an ε fraction of the points satisfy the proximity condition, then we can correctly classify all but an O(ε) fraction of the points.
14
Our Result Applies to many settings of learning mixtures. Algorithm : spectral clustering + Lloyd's k-means; shows that Lloyd's algorithm converges if the initial seed is chosen carefully.
15
Applications : Gaussians [Dasgupta et al. '07] [Figure: μ_r, a point x, and margin D_r.] D_r ≤ O(σ) with high probability, so an inter-center separation of Ω(σ) implies proximity.
16
Applications : Planted Models [Figure: vertex classes V_1, V_2, V_3 with edge probabilities between them.] V = V_1 ∪ … ∪ V_k, with a k × k matrix P of probabilities. An edge is placed between x ∈ V_i and y ∈ V_j with probability P_ij. Goal : Given an instance G, recover the partitions V_1, …, V_k.
17
Applications : Planted Models A : the n × n adjacency matrix; its rows are points in n dimensions. Each entry of A is an independent Bernoulli r.v. with standard deviation σ. ||A − C|| = O(σ n^{1/2}) [Wigner], so an inter-row separation (in C) of Ω(σ) implies proximity. Our algorithm matches the best-known results [McSherry '01].
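A quick numeric check of this Wigner-type bound on a random planted-partition instance; the block sizes and probability matrix below are illustrative, and entries are sampled independently rather than symmetrized, for simplicity:

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustrative planted partition: k = 3 blocks of 200 vertices each.
sizes = [200, 200, 200]
P = np.array([[0.6, 0.1, 0.1],
              [0.1, 0.5, 0.1],
              [0.1, 0.1, 0.4]])            # k x k edge-probability matrix

labels = np.repeat(np.arange(len(sizes)), sizes)
C = P[np.ix_(labels, labels)]              # expected adjacency matrix
n = C.shape[0]
A = (rng.random((n, n)) < C).astype(float) # independent Bernoulli entries

sigma = np.sqrt(np.max(P * (1 - P)))       # largest entry-wise std. dev.
print(np.linalg.norm(A - C, ord=2), "vs", sigma * np.sqrt(n))
```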
18
Applications : distributions with bounded variance Assuming |μ_s − μ_r| ≥ σ/ε, all but an O(ε) fraction of the points satisfy the proximity condition. Do not need product distributions.
19
Algorithm Two key steps. 1. Compute the best rank-k approximation to the points, then run any constant-factor approximation algorithm for the k-means problem on these projected points. This yields a set of initial candidate means (starting points). Message to practitioners in ML, statistics, and elsewhere who knew long before theory that k-means works: you are right (this time), but do start with the "natural" step 1.
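A minimal sketch of step 1; scikit-learn's KMeans (k-means++ seeding) stands in for "any constant-factor approximation algorithm", which is an assumption rather than the talk's prescription:

```python
import numpy as np
from sklearn.cluster import KMeans

def initial_centers(A, k, random_state=0):
    """Step 1: project onto the top-k singular subspace, then run a
    k-means approximation on the projected points."""
    # Best rank-k approximation of A via the truncated SVD.
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    A_k = (U[:, :k] * s[:k]) @ Vt[:k, :]   # rows = projected points

    # Stand-in for a constant-factor k-means approximation.
    km = KMeans(n_clusters=k, n_init=10, random_state=random_state).fit(A_k)
    return km.cluster_centers_, km.labels_
```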
20
Algorithm Theorem : For each original center μ_r, step 1 (SVD) gives an estimated center ν_r close to it.
21
Algorithm Step 2 : 1. Let ν_1, …, ν_k be the current centers. 2. Assign each point to the closest center; this partitions the points into S_1, …, S_k. 3. Update the centers to be the means of S_1, …, S_k. Repeat.
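A compact sketch of these Lloyd iterations; the convergence tolerance and iteration cap are illustrative choices, and the initial centers can come from the step-1 sketch above:

```python
import numpy as np

def lloyd(A, centers, max_iter=100, tol=1e-8):
    """Step 2: alternate between assigning points to the closest center
    and recomputing each center as the mean of its assigned points."""
    centers = centers.copy()
    for _ in range(max_iter):
        # Assign each point to the closest current center.
        d2 = ((A[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        # Update each center to the mean of its cluster S_r.
        new_centers = np.array([
            A[labels == r].mean(axis=0) if np.any(labels == r) else centers[r]
            for r in range(len(centers))
        ])
        if np.linalg.norm(new_centers - centers) < tol:
            break
        centers = new_centers
    return centers, labels
```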
22
Step 2 : Example
23
Step 2 : Example
24
Step 2 : Example
25
Key Technical Lemma 1. Let ν_1, …, ν_k be the current centers. 2. Assign each point to the closest center; this partitions the points into S_1, …, S_k. 3. Update the centers to be the means of S_1, …, S_k; let η_1, …, η_k be the new centers.
26
Misclassified Point [Figure: true means μ_1, μ_2 and current centers ν_1, ν_2, with distances 4σ, tσ, δσ and a threshold u marked.] How many such points can there be? Otherwise, (A−C)·y (for a suitable unit vector y) would have at least that many coordinates with value at least u; but then |(A−C)·y| would be more than σ n^{1/2}.
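The counting step behind this, written out as a short sketch: if m coordinates of (A−C)·y each have magnitude at least u (u as marked in the figure, y a unit vector), then

$$
m\,u^{2} \;\le\; |(A-C)\,y|^{2} \;\le\; \|A-C\|^{2}
\qquad\Longrightarrow\qquad
m \;\le\; \frac{\|A-C\|^{2}}{u^{2}},
$$

so a bound of the form ||A−C|| = O(σ n^{1/2}) forces m to be only a small fraction of n once u is a large multiple of σ.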
27
Misclassified Point Number of misclassified points
28
Misclassified Point Number of misclassified points. Removing these points from C_1 shifts the mean by at most δσ/t; similarly for the addition of misclassified points from other clusters.
29
Misclassified Points The mean of the misclassified points is at distance at most … from the overall mean of the cluster. This will shift the mean of the remaining points by M.
30
Open Problems Weaker proximity conditions that yield a separation between the means of C_r and C_s depending only on σ_r and σ_s? Better dependence on w_min? Distributions with unbounded variance? How to capture other separation conditions (e.g., [Brubaker, Vempala '08])?