1 CHAPTER 7: Clustering (K-Means and EM)
Eick: K-Means and EM (modified Alpaydin transparencies and new transparencies added)
Last updated: February 25, 2014
2 k-Means Clustering
Problem: Find k reference vectors M = {m_1, …, m_k} (representatives/centroids) and a 1-of-k coding scheme (X → M), represented by b_i^t, that minimize the "reconstruction error":

$E(M, b \mid X) = \sum_{t=1}^{N} \sum_{i=1}^{k} b_i^t \,\lVert x^t - m_i \rVert^2$

Both b_i^t and M have to be found so that E is minimized:
Step 1: find the best b_i^t for the current M: $b_i^t = 1$ if $i = \arg\min_j \lVert x^t - m_j \rVert$, and $b_i^t = 0$ otherwise.
Step 2: find M for the current b_i^t: $m_i = \dfrac{\sum_t b_i^t x^t}{\sum_t b_i^t}$
Problem: b_i^t depends on M, so only an iterative solution can be found!
Significantly different from the book!
3 k-means Clustering
(k-means algorithm pseudocode figure from Alpaydin, Introduction to Machine Learning, 2004)
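Since the pseudocode figure for this slide did not survive extraction, here is a minimal NumPy sketch of the two alternating steps from the previous slide; the function name kmeans, the random-sample initialization, and the convergence test are illustrative choices, not taken from the slides.

```python
import numpy as np

def kmeans(X, k, n_iter=100, seed=0):
    """Minimal k-means sketch: alternate the two steps from the previous slide.
    X: (N, d) data matrix; returns centroids M (k, d) and hard labels (N,)."""
    rng = np.random.default_rng(seed)
    # initialize the k reference vectors with randomly chosen data points (one common heuristic)
    M = X[rng.choice(len(X), size=k, replace=False)]
    labels = np.zeros(len(X), dtype=int)
    for _ in range(n_iter):
        # Step 1: b_i^t = 1 for the nearest centroid, 0 otherwise (stored as an index per object)
        dists = np.linalg.norm(X[:, None, :] - M[None, :, :], axis=2)  # (N, k)
        labels = dists.argmin(axis=1)
        # Step 2: m_i = sum_t b_i^t x^t / sum_t b_i^t (keep the old centroid if a cluster is empty)
        new_M = np.array([X[labels == i].mean(axis=0) if np.any(labels == i) else M[i]
                          for i in range(k)])
        if np.allclose(new_M, M):
            break
        M = new_M
    return M, labels
```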
4 k-Means Clustering Reformulated
Given: a dataset X, k (and a random seed).
Problem: Find k vectors M = {m_1, …, m_k} ⊂ R^d and a 1-of-k coding scheme b_i^t (b: X → M) that minimize the error

$E(M, b \mid X) = \sum_{t=1}^{N} \sum_{i=1}^{k} b_i^t \,\lVert x^t - m_i \rVert^2$

subject to: $b_i^t \in \{0,1\}$ and $\sum_{i=1}^{k} b_i^t = 1$ for t = 1, …, N (the "coding scheme" constraint).
5 k-Means Clustering Generalized
Given: a dataset X, k, and a distance function d.
Problem: Find k vectors M = {m_1, …, m_k} ⊂ R^d and a 1-of-k coding scheme b_i^t (b: X → M) that minimize the error

$E(M, b \mid X) = \sum_{t=1}^{N} \sum_{i=1}^{k} b_i^t \, d(x^t, m_i)$

subject to: $b_i^t \in \{0,1\}$ and $\sum_{i=1}^{k} b_i^t = 1$ for t = 1, …, N (the "coding scheme" constraint).
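To illustrate the generalized formulation, a sketch that takes the distance function d as a parameter; the prototype update rule also has to be supplied, since for a general distance the mean is no longer the optimal prototype. The k-medians pairing shown in the comment is one common choice, not something prescribed by the slide.

```python
import numpy as np

def generalized_kmeans(X, k, dist, update, n_iter=100, seed=0):
    """Sketch of the generalized formulation: dist(x, m) replaces the squared
    Euclidean distance, and `update` recomputes a prototype from its members."""
    rng = np.random.default_rng(seed)
    M = X[rng.choice(len(X), size=k, replace=False)]
    labels = np.zeros(len(X), dtype=int)
    for _ in range(n_iter):
        # assign each x^t to the prototype minimizing d(x^t, m_i)
        D = np.array([[dist(x, m) for m in M] for x in X])
        labels = D.argmin(axis=1)
        # recompute each prototype from the objects assigned to it
        M = np.array([update(X[labels == i]) if np.any(labels == i) else M[i]
                      for i in range(k)])
    return M, labels

# e.g. k-medians: Manhattan distance with the component-wise median as prototype
# M, labels = generalized_kmeans(X, 3, dist=lambda x, m: np.abs(x - m).sum(),
#                                update=lambda pts: np.median(pts, axis=0))
```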
6 K-Means' Complexity: O(t·k·n·d)
K-means performs t iterations of E/M steps; in each iteration the following computations have to be done:
E-step:
1. Find the nearest centroid for each object; because there are n objects and k centroids, n·k distances have to be computed and compared, and computing one distance costs O(d), yielding O(k·n·d).
2. Assign objects to clusters / update clusters: O(n).
M-step: Compute the centroid of each cluster: O(n).
We obtain: O(t·(k·n·d + n + n)) = O(t·k·n·d).
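A small illustration of where the k·n·d term comes from: the vectorized E-step below materializes exactly n·k distance computations, each over d coordinates. The helper names are illustrative.

```python
import numpy as np

def estep_cost(n, k, d):
    """Back-of-the-envelope count for one E-step: n*k distances, O(d) arithmetic each."""
    return n * k * d

def all_pair_distances(X, M):
    """X: (n, d) objects, M: (k, d) centroids -> (n, k) Euclidean distance matrix."""
    return np.linalg.norm(X[:, None, :] - M[None, :, :], axis=2)
```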
7 Semiparametric Density Estimation
Parametric: assume a single model for p(x | C_i) (Chapters 4 and 5).
Semiparametric: p(x | C_i) is a mixture of densities; there are multiple possible explanations/prototypes.
Nonparametric: no model; the data speaks for itself (Chapter 8).
8 Mixture Densities

$p(x) = \sum_{i=1}^{k} p(x \mid G_i)\, P(G_i)$

where the G_i are the components/groups/clusters, P(G_i) are the mixture proportions (priors), and p(x | G_i) are the component densities.
Gaussian mixture: $p(x \mid G_i) \sim \mathcal{N}(\mu_i, \Sigma_i)$ with parameters $\Phi = \{P(G_i), \mu_i, \Sigma_i\}_{i=1}^{k}$, estimated from an unlabeled sample $X = \{x^t\}_t$ (unsupervised learning).
Idea: use a density function for each cluster, or use multiple density functions for a single class.
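A short sketch of evaluating such a mixture density with SciPy, assuming scipy is available; the function name mixture_density and the example parameters in the comment are illustrative.

```python
import numpy as np
from scipy.stats import multivariate_normal

def mixture_density(x, priors, means, covs):
    """p(x) = sum_i P(G_i) p(x | G_i) for a Gaussian mixture.
    priors: length-k list of P(G_i); means/covs: the component parameters."""
    return sum(P * multivariate_normal(mean=mu, cov=S).pdf(x)
               for P, mu, S in zip(priors, means, covs))

# Example call with some parameters Phi = {P(G_i), mu_i, Sigma_i}:
# mixture_density(np.array([0.0, 0.0]),
#                 priors=[0.5, 0.5],
#                 means=[np.zeros(2), np.ones(2)],
#                 covs=[np.eye(2), np.eye(2)])
```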
9 Expectation-Maximization (EM) Clustering
Assumption: the dataset X is generated by a mixture of k multivariate Gaussians G_1, …, G_k with priors P(G_1), …, P(G_k).
Example (2-D dataset):
p(x | G_1) ~ N((2,3), Σ_1), P(G_1) = 0.5
p(x | G_2) ~ N((-4,1), Σ_2), P(G_2) = 0.4
p(x | G_3) ~ N((0,-9), Σ_3), P(G_3) = 0.1
EM's task: given X, reconstruct the k Gaussians and their priors; each Gaussian is a model of one cluster in the dataset.
Forming clusters: assign x_i to the cluster G_r that has the maximum value of p(x_i | G_r) P(G_r); alternatively, give x_i a membership in each G_r proportional to p(x_i | G_r) P(G_r) (soft EM).
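To make the example concrete, a sketch that samples a dataset from the three-component mixture above and forms clusters by the hard rule argmax_r p(x_i | G_r) P(G_r). The covariance matrices Σ_1, Σ_2, Σ_3 are not given on the slide, so identity covariances are assumed purely for illustration.

```python
import numpy as np
from scipy.stats import multivariate_normal

# the slide's 2-D example; identity covariances are an assumption for illustration
priors = [0.5, 0.4, 0.1]
means  = [np.array([2.0, 3.0]), np.array([-4.0, 1.0]), np.array([0.0, -9.0])]
covs   = [np.eye(2)] * 3
rng = np.random.default_rng(0)

def sample(n):
    """Generate X from the mixture: pick a component by its prior, then sample from it."""
    comps = rng.choice(3, size=n, p=priors)
    return np.array([rng.multivariate_normal(means[c], covs[c]) for c in comps])

def hard_assign(X):
    """Assign each x to the cluster G_r maximizing p(x | G_r) * P(G_r)."""
    scores = np.column_stack([P * multivariate_normal(m, S).pdf(X)
                              for P, m, S in zip(priors, means, covs)])
    return scores.argmax(axis=1)

X = sample(500)
labels = hard_assign(X)
```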
10 Performance Measure of EM
EM maximizes the log likelihood of the mixture model:

$\mathcal{L}(\Phi \mid X) = \log \prod_t p(x^t \mid \Phi) = \sum_t \log \sum_{i=1}^{k} p(x^t \mid G_i)\, P(G_i)$

We choose the Φ which makes X most likely.
Assume hidden variables z which, when known, make the optimization much simpler: z_i^t indicates whether object x^t belongs to cluster i; z_i^t is not known and has to be estimated, and h_i^t denotes its estimator.
EM maximizes the log likelihood of the sample, but instead of maximizing L(Φ | X) it maximizes the complete-data likelihood L(Φ | X, Z), expressed in terms of x and z.
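A sketch of computing the incomplete-data log likelihood L(Φ | X) for a Gaussian mixture, using logsumexp for numerical stability; the function name is illustrative and SciPy is assumed to be available.

```python
import numpy as np
from scipy.special import logsumexp
from scipy.stats import multivariate_normal

def log_likelihood(X, priors, means, covs):
    """L(Phi | X) = sum_t log sum_i P(G_i) p(x^t | G_i)."""
    # log of P(G_i) * p(x^t | G_i) for every sample t and component i -> (N, k)
    log_joint = np.column_stack([np.log(P) + multivariate_normal(m, S).logpdf(X)
                                 for P, m, S in zip(priors, means, covs)])
    return logsumexp(log_joint, axis=1).sum()
```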
11 E- and M-Steps
Iterate the two steps:
1. E-step: estimate z given X and the current Φ (z: assignments of samples to clusters; Φ: current model):
$Q(\Phi \mid \Phi^l) = E[\mathcal{L}_C(\Phi \mid X, Z) \mid X, \Phi^l]$, where $\mathcal{L}_C$ is the complete-data log likelihood.
2. M-step: find the new Φ given z, X, and the old Φ:
$\Phi^{l+1} = \arg\max_{\Phi} Q(\Phi \mid \Phi^l)$
An increase in Q increases the incomplete likelihood: $\mathcal{L}(\Phi^{l+1} \mid X) \ge \mathcal{L}(\Phi^{l} \mid X)$.
Remark: z_i^t denotes the unknown membership of the t-th example.
12 EM in Gaussian Mixtures
z_i^t indicates whether x^t was generated by component G_i; assume $p(x \mid G_i) \sim \mathcal{N}(\mu_i, \Sigma_i)$.
E-step: estimate the membership probabilities for $X = \{x^1, …, x^n\}$:

$h_i^t = \dfrac{P(G_i)\, p(x^t \mid G_i, \Phi^l)}{\sum_{j} P(G_j)\, p(x^t \mid G_j, \Phi^l)}$

M-step: obtain $\Phi^{l+1}$, namely $(P(G_i), m_i, S_i)^{l+1}$ for i = 1, …, k:

$P(G_i) = \dfrac{\sum_t h_i^t}{N}, \quad m_i = \dfrac{\sum_t h_i^t x^t}{\sum_t h_i^t}, \quad S_i = \dfrac{\sum_t h_i^t (x^t - m_i)(x^t - m_i)^T}{\sum_t h_i^t}$

Use the estimated labels in place of the unknown labels.
Remarks: h_i^t plays the role of b_i^t in k-means; h_i^t acts as an estimator for the unknown label z_i^t.
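Putting the E- and M-steps together, a compact NumPy/SciPy sketch of EM for Gaussian mixtures; the initialization strategy and the small ridge added to the covariances are implementation choices for the sketch, not part of the slides.

```python
import numpy as np
from scipy.stats import multivariate_normal

def em_gmm(X, k, n_iter=100, seed=0):
    """Sketch of EM for a Gaussian mixture: h_i^t plays the role of b_i^t in
    k-means, but is a soft (posterior) membership rather than a 0/1 indicator."""
    n, d = X.shape
    rng = np.random.default_rng(seed)
    priors = np.full(k, 1.0 / k)
    means = X[rng.choice(n, size=k, replace=False)]
    covs = np.array([np.cov(X.T) + 1e-6 * np.eye(d) for _ in range(k)])
    for _ in range(n_iter):
        # E-step: h_i^t = P(G_i) p(x^t|G_i) / sum_j P(G_j) p(x^t|G_j)
        h = np.column_stack([priors[i] * multivariate_normal(means[i], covs[i]).pdf(X)
                             for i in range(k)])
        h /= h.sum(axis=1, keepdims=True)
        # M-step: re-estimate (P(G_i), m_i, S_i) from the soft memberships
        Nk = h.sum(axis=0)                       # effective cluster sizes
        priors = Nk / n
        means = (h.T @ X) / Nk[:, None]
        for i in range(k):
            diff = X - means[i]
            covs[i] = (h[:, i, None] * diff).T @ diff / Nk[i] + 1e-6 * np.eye(d)
    return priors, means, covs, h
```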
13 EM Means Expectation-Maximization
1. E-step (Expectation): determine the objects' memberships in clusters (from the current model).
2. M-step (Maximization): create the model by maximum likelihood estimation (from the cluster memberships).
Computations in the two steps (k-means / EM):
1. Compute b_i^t / h_i^t.
2. Compute the centroid / the (prior, mean, covariance matrix).
Remark: k-means uses a simple form of the EM procedure!
14 Commonalities between K-Means and EM
1. Both start with random clusters and rely on a two-step approach to optimize the objective function, using the EM procedure (see the previous transparency).
2. Both use the same optimization procedure for an objective function f(a_1, …, a_m, b_1, …, b_k): we basically optimize over the a-values (keeping the b-values fixed) and then over the b-values (keeping the a-values fixed) until convergence is reached. Consequently, both algorithms only find a local optimum of the objective function and are sensitive to initialization.
3. Both assume that the number of clusters k is known.
15 Differences between K-Means and EM I
1. K-means is distance-based and relies on 1-NN queries to form clusters. EM is density-based/probabilistic; EM usually works with multivariate Gaussians but can be generalized to work with other probability distributions.
2. K-means minimizes the squared distance of an object to its cluster prototype (usually the centroid). EM maximizes the log-likelihood of the sample given the model, p(X | Φ); the models are assumed to be mixtures of k Gaussians and their priors.
3. K-means is a hard clustering algorithm, EM is a soft clustering algorithm: h_i^t ∈ [0, 1].
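Difference 3 can be seen directly in library output; a short sketch using scikit-learn (assumed to be installed): KMeans returns one hard label per object, while GaussianMixture returns a row of soft memberships h_i^t that sums to 1.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.mixture import GaussianMixture

X = np.random.default_rng(0).normal(size=(300, 2))   # any 2-D dataset for illustration

# hard clustering: each object gets exactly one label (b_i^t in {0, 1})
hard_labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)

# soft clustering: each object gets a membership vector (h_i^t in [0, 1], rows sum to 1)
gm = GaussianMixture(n_components=3, random_state=0).fit(X)
soft_memberships = gm.predict_proba(X)                # shape (300, 3)
```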
16 Differences between K-Means and EM II
4. K-means cluster models are just the k centroids; EM models are k triples of (prior, mean, covariance matrix).
5. EM directly deals with dependencies between attributes in its density estimation approach: the degree to which an object x belongs to a cluster c depends on c's prior and on the Mahalanobis distance between x and c's mean; therefore, EM clusters do not depend on the units of measurement or the orientation of the attributes in space.
6. The distance metric can be viewed as an input parameter when using k-means, and generalizations of k-means have been proposed that use different distance functions. EM implicitly relies on the Mahalanobis distance function, which is part of its density estimation approach.
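A small sketch contrasting the Euclidean and Mahalanobis distances for one point and one cluster, using SciPy's mahalanobis (which takes the inverse covariance); the numbers are made up for illustration.

```python
import numpy as np
from scipy.spatial.distance import mahalanobis

# squared Euclidean distance treats all attributes alike; the Mahalanobis distance
# used implicitly by EM rescales by the cluster covariance S, so it is unaffected
# by the units and orientation of the attributes
x = np.array([2.0, 0.5])
m = np.array([0.0, 0.0])                        # cluster mean
S = np.array([[4.0, 1.5], [1.5, 1.0]])          # cluster covariance
d_euclid = np.linalg.norm(x - m)
d_mahal  = mahalanobis(x, m, np.linalg.inv(S))  # sqrt((x - m)^T S^-1 (x - m))
```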
17 Mixture of Mixtures
In classification, the input comes from a mixture of classes (supervised). If each class is also a mixture, e.g., of Gaussians (unsupervised), we have a mixture of mixtures:

$p(x \mid C_i) = \sum_{j=1}^{k_i} p(x \mid G_{ij})\, P(G_{ij})$
$p(x) = \sum_{i} p(x \mid C_i)\, P(C_i)$
18 Choosing k
Defined by the application, e.g., image quantization.
Plot the data (after PCA) and check for clusters visually.
Incremental (leader-cluster) algorithm: add one cluster at a time until an "elbow" appears (in the reconstruction error, log likelihood, or intergroup distances).
Manually check the clusters for meaning.
Run the algorithm with multiple k values and compare the results.
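One common way to look for the elbow is to plot the k-means reconstruction error against k; a sketch with scikit-learn (assumed available), where inertia_ is the within-cluster sum of squared distances.

```python
import numpy as np
from sklearn.cluster import KMeans

X = np.random.default_rng(0).normal(size=(300, 2))   # any dataset for illustration

# reconstruction error for a range of k values; look for the "elbow"
# where adding one more cluster stops paying off
errors = {k: KMeans(n_clusters=k, n_init=10, random_state=0).fit(X).inertia_
          for k in range(1, 11)}
for k, e in sorted(errors.items()):
    print(k, round(e, 1))
```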