
1 K-Means and variants Rahul K Mishra Guide: Prof. G. Ramakrishnan

2 Papers
1. A. K. Jain, M. N. Murty, and P. J. Flynn. Data clustering: A review. ACM Computing Surveys, 31(3):264–323, 1999.
2. Y. Zhao and G. Karypis. Criterion functions for document clustering: Experiments and analysis. Technical Report TR#01-40, Department of Computer Science, University of Minnesota, Minneapolis, MN, 2001.
3. J. Kleinberg. An impossibility theorem for clustering. In Proc. 2002 Conf. Advances in Neural Information Processing Systems, vol. 15, 2002, pp. 463–470.
4. Bradley, P. S., Bennett, K. P., & Demiriz, A. (2000). Constrained k-means clustering (Technical Report MSR-TR-2000-65). Microsoft Research, Redmond, WA.
5. K. Wagstaff, S. Rogers, and S. Schroedl. Constrained K-means clustering with background knowledge. In Proc. 18th Int. Conf. Machine Learning (ICML-2001), 2001, pp. 577–584.

3 Papers continued…
6. Sugato Basu. Semi-supervised Clustering: Probabilistic Models, Algorithms and Experiments. Ph.D. thesis, Department of Computer Sciences, University of Texas at Austin, 2005.
7. Klein, D., Kamvar, S. D., & Manning, C. (2002). From instance-level constraints to space-level constraints: Making the most of prior knowledge in data clustering. In Proceedings of the Nineteenth International Conference on Machine Learning (ICML-2002), pp. 307–314, Sydney, Australia.
8. Wagstaff, K., & Cardie, C. (2000). Clustering with instance-level constraints. In Proceedings of the Seventeenth International Conference on Machine Learning, pp. 1103–1110.
9. Kulis et al. Semi-supervised graph clustering: A kernel approach.

4 Presentation references
http://www.stanford.edu/~sdkamvar/talks/constrained-clustering.ppt
http://www.cs.utexas.edu/~mooney/talks/semisup-clustering.ppt
www.cs.virginia.edu/~ils9d/theory/impossclustering.ppt

5 Tools/Programming
gCLUTO and CLUTO [vCluster and sCluster]
Weka
RapidMiner

6 Impossibility theorem on clustering
Motivation: each clustering criterion optimizes different properties of the data. Is there one clustering criterion with phenomenal cosmic powers?

7 Method
Give three intuitive axioms that any clustering criterion should satisfy. Surprise: it is not possible to satisfy all three. Reminiscent of Arrow’s impossibility theorem for aggregating preference rankings.

8 Axiom 1 – Scale-Invariance
For any distance function d and any β > 0, f(S, d) = f(S, βd).

9 Axiom 2 – Richness
Range(f) equals the set of all partitions of S, i.e. every possible clustering can be produced by choosing a suitable distance function.

10 Axiom 3 – Consistency
Let d and d′ be two distance functions and suppose f(d) = Γ. If d′ shrinks distances within the clusters of Γ (d′(i,j) ≤ d(i,j) whenever i and j are in the same cluster) and stretches distances between clusters (d′(i,j) ≥ d(i,j) whenever i and j are in different clusters), then f(d′) = Γ.

11 Definition
Anti-chain: a collection of partitions is an anti-chain if it does not contain two distinct partitions such that one is a refinement of the other. A clustering function whose range is an anti-chain cannot satisfy Richness (for |S| ≥ 2).

12 Main Result
For each n ≥ 2, there is no clustering function f that satisfies Scale-Invariance, Richness and Consistency. This follows from the stronger result that if f satisfies Scale-Invariance and Consistency, then Range(f) is an anti-chain.

13 Understanding Criterion Functions

14 Formulae

15 Semi-Supervised Learning
Combines labeled and unlabeled data during training to improve performance:
– Semi-supervised classification: training on labeled data exploits additional unlabeled data, frequently resulting in a more accurate classifier.
– Semi-supervised clustering: uses a small amount of labeled data to aid and bias the clustering of unlabeled data.

16 Semi-Supervised Clustering Example (figure)

17 (figure, continued)

18 Second Semi-Supervised Clustering Example (figure)

19 (figure, continued)

20 Semi-Supervised Clustering Can group data using the categories in the initial labeled data. Can also extend and modify the existing set of categories as needed to reflect other regularities in the data. Can cluster a disjoint set of unlabeled data using the labeled data as a “guide” to the type of clusters desired.

21 Unsupervised KMeans Clustering
KMeans is a partitional clustering algorithm based on iterative relocation that partitions a dataset into K clusters.
Algorithm: Initialize K cluster centers randomly. Repeat until convergence:
– Cluster Assignment Step: assign each data point x to the cluster X_l whose center μ_l is nearest to x in L2 distance.
– Center Re-estimation Step: re-estimate each cluster center μ_l as the mean of the points currently in X_l.
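A minimal NumPy sketch of this assignment/re-estimation loop (not from the slides; the initialization by sampling data points, the tolerance, and the empty-cluster handling are illustrative choices):

```python
import numpy as np

def kmeans(X, k, max_iter=100, tol=1e-6, rng=None):
    """Plain (unsupervised) KMeans by iterative relocation."""
    rng = np.random.default_rng(rng)
    # Initialize K cluster centers by picking K distinct data points at random.
    centers = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    for _ in range(max_iter):
        # Cluster assignment step: each point goes to the nearest center (L2).
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Center re-estimation step: mean of the points now in each cluster.
        new_centers = np.array([
            X[labels == h].mean(axis=0) if np.any(labels == h) else centers[h]
            for h in range(k)
        ])
        if np.linalg.norm(new_centers - centers) < tol:  # converged
            centers = new_centers
            break
        centers = new_centers
    return labels, centers
```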

22 KMeans Objective Function
Locally minimizes the sum of squared distances between the data points and their corresponding cluster centers:
J = Σ_{l=1..K} Σ_{x ∈ X_l} ‖x − μ_l‖²
Initialization of the K cluster centers:
– Totally random
– Random perturbation from the global mean
– Multiple runs with different random initializations
– Heuristics to ensure well-separated centers, etc.
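For reference, a small helper that evaluates this objective for a given assignment, so the monotone decrease across iterations can be checked (the function name is my own):

```python
import numpy as np

def kmeans_objective(X, labels, centers):
    """Sum of squared L2 distances from each point to its assigned cluster center."""
    return float(((X - centers[labels]) ** 2).sum())
```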

23 K Means Example

24 K Means Example: Randomly Initialize Means

25 K Means Example: Assign Points to Clusters

26 K Means Example: Re-estimate Means

27 K Means Example: Re-assign Points to Clusters

28 K Means Example: Re-estimate Means

29 K Means Example: Re-assign Points to Clusters

30 Semi-Supervised KMeans
Seeded KMeans:
– Labeled data provided by the user are used for initialization: the initial center for cluster i is the mean of the seed points having label i.
– Seed points are only used for initialization, not in subsequent steps.
Constrained KMeans:
– Labeled data provided by the user are used to initialize the KMeans algorithm.
– Cluster labels of seed data are kept unchanged in the cluster assignment steps; only the labels of the non-seed data are re-estimated.
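A sketch of how the two variants differ, in the same NumPy style as above (the seeds interface and the clamping of seed labels are illustrative, not the thesis code):

```python
import numpy as np

def semi_supervised_kmeans(X, k, seed_idx, seed_labels, constrained=False,
                           max_iter=100):
    """Seeded KMeans: seeds fix only the initial centers.
    Constrained KMeans (constrained=True): seed labels are also clamped in
    every cluster-assignment step.
    Assumes every cluster 0..k-1 has at least one seed point."""
    seed_idx = np.asarray(seed_idx)
    seed_labels = np.asarray(seed_labels)
    # Initialization: center of cluster h = mean of the seed points labeled h.
    centers = np.array([X[seed_idx[seed_labels == h]].mean(axis=0)
                        for h in range(k)])
    labels = np.zeros(len(X), dtype=int)
    for _ in range(max_iter):
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        if constrained:
            labels[seed_idx] = seed_labels   # keep seed memberships unchanged
        new_centers = np.array([X[labels == h].mean(axis=0)
                                if np.any(labels == h) else centers[h]
                                for h in range(k)])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers
```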

31 Semi-Supervised K Means Example: Initialize Means Using Labeled Data

32 Semi-Supervised K Means Example: Assign Points to Clusters

33 Semi-Supervised K Means Example: Re-estimate Means and Converge

34 COP-KMeans
COP-KMeans [Wagstaff et al., ICML 2001] is KMeans with must-link (must be in same cluster) and cannot-link (cannot be in same cluster) constraints on data points.
Initialization: cluster centers are chosen randomly, but as each center is chosen any must-link constraints it participates in are enforced (so that the linked points cannot later seed a different cluster).
Algorithm: during the cluster assignment step, each point is assigned to its nearest cluster that does not violate any of its constraints. If no such assignment exists, abort.
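A sketch of the constraint-checking assignment step (a simplified reading of the violate-constraints test in Wagstaff et al.; the data structures and early-abort behavior are illustrative):

```python
import numpy as np

def violates_constraints(i, cluster, labels, must_link, cannot_link):
    """True if putting point i into `cluster` breaks a constraint, given the
    (possibly partial) current assignment in `labels` (-1 = not yet assigned)."""
    for a, b in must_link:
        if i == a and labels[b] not in (-1, cluster):
            return True
        if i == b and labels[a] not in (-1, cluster):
            return True
    for a, b in cannot_link:
        if i == a and labels[b] == cluster:
            return True
        if i == b and labels[a] == cluster:
            return True
    return False

def cop_assign(X, centers, must_link, cannot_link):
    """One COP-KMeans assignment pass; returns None if some point cannot be placed."""
    labels = np.full(len(X), -1, dtype=int)
    for i, x in enumerate(X):
        order = np.argsort(np.linalg.norm(centers - x, axis=1))  # nearest first
        for h in order:
            if not violates_constraints(i, h, labels, must_link, cannot_link):
                labels[i] = h
                break
        else:
            return None  # abort: no legal cluster exists for point i
    return labels
```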


36 Constrained K-means: constraints on the number of instances per cluster

37 Algorithm for K-means with cluster-size constraints

38 Cluster Assignment as a Minimum-Cost Flow Problem
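This assignment step corresponds to the constrained k-means of Bradley, Bennett & Demiriz (paper 4), where each cluster h must receive at least a minimum number of points. A hedged sketch using networkx min-cost flow; the demand encoding, integer cost scaling, and node naming are my assumptions about one standard way to build the network, not code from the paper:

```python
import numpy as np
import networkx as nx

def constrained_assignment(X, centers, min_sizes, cost_scale=1000):
    """Assign each point to a cluster, minimizing total squared distance,
    subject to cluster h receiving at least min_sizes[h] points (integers)."""
    n, k = len(X), len(centers)
    G = nx.DiGraph()
    # Each data point supplies one unit of flow.
    for i in range(n):
        G.add_node(("pt", i), demand=-1)
    # Each cluster must absorb at least min_sizes[h] units...
    for h in range(k):
        G.add_node(("cl", h), demand=int(min_sizes[h]))
    # ...and a sink absorbs whatever flow remains.
    G.add_node("sink", demand=n - int(sum(min_sizes)))
    for i in range(n):
        for h in range(k):
            cost = np.sum((X[i] - centers[h]) ** 2)
            G.add_edge(("pt", i), ("cl", h), capacity=1,
                       weight=int(round(cost_scale * cost)))  # integer costs
    for h in range(k):
        G.add_edge(("cl", h), "sink", capacity=n, weight=0)
    flow = nx.min_cost_flow(G)
    labels = np.empty(n, dtype=int)
    for i in range(n):
        labels[i] = next(h for h in range(k) if flow[("pt", i)][("cl", h)] == 1)
    return labels
```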

39 Constrained K-means with Background Knowledge
Constrained K-means [Wagstaff and Cardie 2000]:
– Do K-means, but instead of assigning each point to the closest mean, assign it to the closest mean that results in a legal assignment.
– Satisfies constraints like a CSP.
– Results in instance-level bias.

40 Pairwise Constraints
Are x and y in the same cluster? Weaker than partial labeling; more compatible with data exploration.
– YES: must-link (x and y belong in the same cluster)
– NO: cannot-link (x and y belong in different clusters)

41 When can we use constraints? Patterns in feature space:
– TOO EASY: don’t need constraints
– JUST RIGHT: constraints effective
– TOO HARD: can’t use constraints

42 In the first case, where the features are picked perfectly, constraints won’t be of any use because any reasonable clustering method should find the right clusters without them. In the third case, where the features have no correlation with the desired classes, the clusters are unlikely to be detected even with constraints. Constraints are useful when the feature space is not too bad, but not too good either: more specifically, when the feature space gives good local similarity measures but poor global similarity measures. So, as you see in the middle picture, points are usually in the same cluster as their nearest neighbors, but points that are very far from each other can also be in the same cluster, while medium-distance points can be in different clusters. Luckily for us, most data looks more like the middle picture than the other two.

43 Demo http://nlp.stanford.edu/~danklein/demos/constrained-clustering-demo.shtml

44 Implications of Constraints
What do pairwise constraints imply?
– Instance-level bias (COP K-means): constrained points should alter their behavior.
– Space-level bias: regions near constrained points should alter their behavior.

45 A Spatial Interpretation
Imposing constraints:
– Must-linked pairs should be close.
– Cannot-linked pairs should be far.
Propagating constraints:
– If A is close to B, and B is must-linked to C, then A should be close to C.
– If A is close to B, and B is cannot-linked to C, then A should be far from C.

46 Imposing Constraints Must-linked pairs should be close. Cannot-linked pairs should be far.

47 Propagating Must-Links: All-Pairs Shortest Paths

48 Propagating Cannot-Links
Complete-link clustering:
– Begin: each point is its own cluster.
– At each stage, merge the closest pair of clusters.
– Distance between clusters = distance between the furthest pair of points in the clusters.

49 Complete-Link Clustering Propagates Cannot-Links

50 Constrained CL
Input: data and constraints.
– Calculate proximities (distances) in feature space.
Must-links:
– Must-linked arcs have their distances set to zero.
– To propagate must-links, run all-pairs shortest paths.
Cannot-links:
– Cannot-linked arcs have max(M) added to their distances.
– The complete-link algorithm implicitly propagates cannot-links.
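A sketch of the constrained complete-link pipeline just described (Klein et al., paper 7); the hand-rolled Floyd–Warshall update and SciPy's complete linkage are my choice of tooling, with M being the pairwise Euclidean distance matrix:

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform
from scipy.cluster.hierarchy import linkage, fcluster

def constrained_complete_link(X, must_links, cannot_links, k):
    """Complete-link clustering with imposed and propagated pairwise constraints."""
    M = squareform(pdist(X))                  # proximities in feature space
    # Impose must-links: linked arcs get distance zero.
    for a, b in must_links:
        M[a, b] = M[b, a] = 0.0
    # Propagate must-links: all-pairs shortest paths (Floyd-Warshall) over M.
    for m in range(len(M)):
        M = np.minimum(M, M[:, m:m + 1] + M[m:m + 1, :])
    # Impose cannot-links: add max(M) to the linked arcs.
    penalty = M.max()
    for a, b in cannot_links:
        M[a, b] = M[b, a] = M[a, b] + penalty
    # Complete-link agglomeration implicitly propagates the cannot-links.
    Z = linkage(squareform(M, checks=False), method='complete')
    return fcluster(Z, t=k, criterion='maxclust')
```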

51 Active Learning
Again, based on the principle that local similarities are correct.
Algorithm (sketched in code below):
– Choose some threshold distance d.
– Cluster using CL until you are about to merge two clusters whose distance is greater than d.
– Ask whether the root nodes of the two clusters are must-linked or cannot-linked.
– Impose and propagate the constraint accordingly.
– Continue to cluster using CL.
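A compact sketch of this active-query loop on top of a naive complete-link agglomeration; the oracle interface, the use of each cluster's first point as its "root", the asked-pair bookkeeping, and the target_k stopping condition are illustrative assumptions rather than the authors' implementation:

```python
import numpy as np

def active_complete_link(M, d_threshold, oracle, target_k):
    """Complete-link agglomeration with active constraint queries: before any
    merge whose complete-link distance exceeds d_threshold, ask
    oracle(i, j) -> 'must' or 'cannot' about the two cluster roots, impose the
    answer on the distance matrix M, propagate it, and keep clustering."""
    M = M.astype(float).copy()
    clusters = [[i] for i in range(len(M))]   # every point starts as a cluster
    asked = set()

    def cl_dist(a, b):  # complete link: furthest pair across the two clusters
        return max(M[i, j] for i in a for j in b)

    def propagate_must_links():  # all-pairs shortest paths over the edited M
        nonlocal M
        for m in range(len(M)):
            M = np.minimum(M, M[:, m:m + 1] + M[m:m + 1, :])

    while len(clusters) > target_k:
        dist, ia, ib = min((cl_dist(a, b), ia, ib)
                           for ia, a in enumerate(clusters)
                           for ib, b in enumerate(clusters) if ia < ib)
        i, j = clusters[ia][0], clusters[ib][0]   # cluster "roots" to query
        if dist > d_threshold and (i, j) not in asked:
            asked.add((i, j))
            if oracle(i, j) == 'must':
                M[i, j] = M[j, i] = 0.0           # impose must-link...
                propagate_must_links()            # ...and propagate it
            else:
                # Impose cannot-link; complete linkage propagates it implicitly.
                M[i, j] = M[j, i] = M[i, j] + M.max()
            continue                              # re-evaluate the merge order
        clusters[ia] = clusters[ia] + clusters.pop(ib)
    return clusters
```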

52 Active Learning (example)

53 Further topics to be explored
– Relational clustering
– Distance metric learning, with application to clustering with side-information
– Semi-supervised graph-based clustering using kernels
– Max-margin clustering

