
1 MACHINE LEARNING 8. Clustering
(based on E. Alpaydın, 2004, Introduction to Machine Learning, © The MIT Press, V1.1)

2 Motivation
- Classification problem:
  - Need P(C|x)
  - Bayes: can be computed from p(x|C)
  - Need to estimate p(x|C) from data
  - Assume a model (e.g. normal distribution) known up to its parameters
  - Compute estimators (ML, MAP) for the parameters from data
- Regression:
  - Need p(r|x)
  - Bayes: can be computed from the joint p(x, r)
  - Assume a model known up to its parameters (e.g. linear)
  - Compute the parameters from data (e.g. least squares)

3 Motivation
- Cannot always assume that the data came from a single distribution/model
- Nonparametric methods: don't assume any model; compute the probability of new data directly from the old data
- Semi-parametric/mixture models: assume the data came from an unknown mixture of known models

4 Motivation
- Optical Character Recognition
  - Two ways to write 7 (with or without the horizontal bar)
  - Can't assume a single distribution
  - Mixture of an unknown number of templates
- Compared to classification:
  - The number of classes is known
  - Each training sample has a class label
  - Supervised learning

5 Mixture Densities
- p(x) = Σ_{i=1}^{k} p(x|G_i) P(G_i)
  where G_i are the components/groups/clusters, P(G_i) the mixture proportions (priors), and p(x|G_i) the component densities
- Gaussian mixture: p(x|G_i) ~ N(μ_i, Σ_i), with parameters Φ = {P(G_i), μ_i, Σ_i}_{i=1}^{k}
- Unlabeled sample X = {x^t}_t (unsupervised learning)
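To make the mixture-density definition concrete, here is a small illustrative sketch (not from the slides) that evaluates and samples a two-component Gaussian mixture with NumPy/SciPy; all component parameters are made-up example values.

```python
import numpy as np
from scipy.stats import multivariate_normal

# Hypothetical two-component Gaussian mixture in 2-D (example parameters only).
priors = [0.3, 0.7]                                    # P(G_i), mixture proportions
means  = [np.array([0.0, 0.0]), np.array([3.0, 3.0])]  # mu_i
covs   = [np.eye(2), 2.0 * np.eye(2)]                  # Sigma_i

def mixture_density(x):
    """p(x) = sum_i p(x | G_i) P(G_i)."""
    return sum(P * multivariate_normal(mean=m, cov=S).pdf(x)
               for P, m, S in zip(priors, means, covs))

def sample(n, rng=np.random.default_rng(0)):
    """Draw n points: pick a component by its prior, then sample from that Gaussian."""
    comps = rng.choice(len(priors), size=n, p=priors)
    return np.array([rng.multivariate_normal(means[c], covs[c]) for c in comps])

X = sample(500)                                  # unlabeled sample {x^t}
print(mixture_density(np.array([1.0, 1.0])))     # density of the mixture at one point
```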

6 Example: Color Quantization
- Image: each pixel is represented by a 24-bit color
- Colors come from different distributions (e.g. sky, grass)
- We have no label telling us whether a pixel is sky or grass
- Want to use only 256 palette colors to represent the image as closely as possible to the original
- Quantizing uniformly (assigning a single color to each of the 2^24/256 intervals) wastes palette entries on rarely occurring intervals

7 Quantization
- Sample (pixels): X = {x^t}_{t=1}^{N}
- k reference vectors (palette): m_1, ..., m_k
- Select the reference vector for each pixel: b_i^t = 1 if ||x^t − m_i|| = min_j ||x^t − m_j||, else b_i^t = 0
- Reference vectors are also called codebook vectors or code words
- Compress the image by storing, for each pixel, only the index of its reference vector
- Reconstruction error: E({m_i}_{i=1}^{k} | X) = Σ_t Σ_i b_i^t ||x^t − m_i||²
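A minimal sketch of this quantization step, assuming the pixels are already in a NumPy array and a palette of k code words has been chosen; it encodes each pixel by the index of its nearest code word, decodes by looking the code word up, and evaluates the reconstruction error defined above.

```python
import numpy as np

def encode(X, M):
    """For each x^t pick the i minimizing ||x^t - m_i|| (the b_i^t = 1 assignment)."""
    # X: (N, d) pixels, M: (k, d) reference vectors / code words
    dists = np.linalg.norm(X[:, None, :] - M[None, :, :], axis=2)  # (N, k)
    return np.argmin(dists, axis=1)                                # code-word index per pixel

def decode(idx, M):
    """Reconstruct each pixel by its code word m_i."""
    return M[idx]

def reconstruction_error(X, M):
    """E = sum_t sum_i b_i^t ||x^t - m_i||^2."""
    idx = encode(X, M)
    return np.sum((X - decode(idx, M)) ** 2)

# Toy usage with random "pixels" and a random 4-color palette (example values).
rng = np.random.default_rng(0)
X = rng.random((1000, 3))     # N pixels, RGB in [0, 1]
M = rng.random((4, 3))        # k = 4 code words
print(reconstruction_error(X, M))
```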

8 Encoding/Decoding
(figure: the encoder maps each pixel x^t to the index i of its nearest code word m_i; the decoder reconstructs the pixel as m_i)

9 K-means Clustering
- Minimize the reconstruction error E({m_i}_{i=1}^{k} | X) = Σ_t Σ_i b_i^t ||x^t − m_i||²
- Take derivatives with respect to m_i and set them to zero
- Each reference vector is the mean of all the instances it represents: m_i = Σ_t b_i^t x^t / Σ_t b_i^t

10 K-Means Clustering
- Iterative procedure for finding the reference vectors:
  - Start with random reference vectors
  - Estimate the labels b_i^t (assign each instance to its nearest reference vector)
  - Re-compute the reference vectors as the means of their assigned instances
  - Continue until convergence
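The iterative procedure can be sketched in a few lines of NumPy (illustrative only; initialization and empty-cluster handling are simplified):

```python
import numpy as np

def k_means(X, k, n_iter=100, rng=np.random.default_rng(0)):
    """Alternate between assigning points to the nearest reference vector
    and recomputing each reference vector as the mean of its points."""
    M = X[rng.choice(len(X), size=k, replace=False)]      # random initial reference vectors
    for _ in range(n_iter):
        # Assignment step: labels pick the nearest reference vector (b_i^t = 1)
        labels = np.argmin(np.linalg.norm(X[:, None] - M[None, :], axis=2), axis=1)
        # Update step: each m_i becomes the mean of the instances assigned to it
        new_M = np.array([X[labels == i].mean(axis=0) if np.any(labels == i) else M[i]
                          for i in range(k)])
        if np.allclose(new_M, M):                         # converged
            break
        M = new_M
    return M, labels

# Example: cluster 2-D points into k = 3 groups.
X = np.random.default_rng(1).normal(size=(300, 2))
M, labels = k_means(X, k=3)
```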

11 k-means Clustering

12 (figure)

13 Expectation Maximization (EM): Motivation
- Data came from several distributions
- Assume each distribution is known up to its parameters
- If we knew which distribution each data instance came from, we could use parametric estimation
- Introduce unobservable (latent) variables that indicate the source distribution
- Run an iterative process:
  - Estimate the latent variables from the data and the current estimate of the distribution parameters
  - Use the current values of the latent variables to refine the parameter estimates

14 EM
- Log likelihood: L(Φ|X) = log Π_t p(x^t|Φ) = Σ_t log Σ_i p(x^t|G_i) P(G_i)
- Assume hidden variables Z which, when known, make the optimization much simpler
- Complete likelihood, L_c(Φ|X, Z), in terms of X and Z
- Incomplete likelihood, L(Φ|X), in terms of X

15 Latent Variables
- Z is unknown
- Can't compute the complete likelihood L_c(Φ|X, Z)
- Can compute its expected value: Q(Φ|Φ^l) = E[log L_c(Φ|X, Z) | X, Φ^l]

16 E- and M-steps
- Iterate the two steps:
  1. E-step: Estimate Z given X and the current Φ: Q(Φ|Φ^l) = E[log L_c(Φ|X, Z) | X, Φ^l]
  2. M-step: Find the new Φ given Z, X, and the old Φ: Φ^{l+1} = argmax_Φ Q(Φ|Φ^l)

17 Example
- Data came from a mixture of Gaussians
- Maximize the likelihood assuming we know the latent "indicator variables" z_i^t (z_i^t = 1 if x^t came from component G_i)
- E-step: expected value of the indicator variables:
  h_i^t = E[z_i^t | X, Φ^l] = P(G_i | x^t, Φ^l) = p(x^t|G_i) P(G_i) / Σ_j p(x^t|G_j) P(G_j)
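For completeness (stated here in their usual form, not transcribed from the slide's images), the M-step updates that follow from maximizing Q with these soft indicators are:

  P(G_i) ← Σ_t h_i^t / N
  m_i ← Σ_t h_i^t x^t / Σ_t h_i^t
  S_i ← Σ_t h_i^t (x^t − m_i)(x^t − m_i)^T / Σ_t h_i^t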

18 (figure: P(G_1|x) = h_1 = 0.5)

19 EM for Gaussian Mixtures
- Assume all groups/clusters are Gaussians: multivariate, uncorrelated, with the same variance
- Harden the indicators:
  - EM: expected values h_i^t are between 0 and 1
  - k-means: assignments b_i^t are 0 or 1
- With these restrictions, EM reduces to k-means
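A compact illustrative EM sketch for the simplified setting described above (shared spherical covariances of fixed variance); it is the soft counterpart of the k-means sketch earlier, not a full implementation of the book's algorithm.

```python
import numpy as np
from scipy.stats import multivariate_normal

def em_gmm(X, k, n_iter=50, var=1.0, rng=np.random.default_rng(0)):
    """EM for a Gaussian mixture with a fixed, shared spherical covariance var*I."""
    N, d = X.shape
    means = X[rng.choice(N, size=k, replace=False)]      # initial component means
    priors = np.full(k, 1.0 / k)                         # initial P(G_i)
    for _ in range(n_iter):
        # E-step: h_i^t = P(G_i | x^t), soft assignments in [0, 1]
        dens = np.stack([priors[i] * multivariate_normal(means[i], var * np.eye(d)).pdf(X)
                         for i in range(k)], axis=1)      # (N, k)
        h = dens / dens.sum(axis=1, keepdims=True)
        # M-step: re-estimate priors and means from the soft assignments
        priors = h.mean(axis=0)
        means = (h.T @ X) / h.sum(axis=0)[:, None]
    return priors, means, h

# Hardening h to 0/1 (argmax over components) recovers the k-means assignments.
X = np.random.default_rng(1).normal(size=(300, 2))
priors, means, h = em_gmm(X, k=3)
labels = h.argmax(axis=1)
```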

20 Dimensionality Reduction vs. Clustering
- Dimensionality reduction methods find correlations between features and group features
  - e.g. Age and Income are correlated
- Clustering methods find similarities between instances and group instances
  - e.g. Customers A and B are in the same cluster

21 Clustering: Use for Supervised Learning
- Describe the data in terms of clusters:
  - Represent all data in a cluster by the cluster mean
  - Range of attributes
- Map the data into a new space (preprocessing):
  - d = dimension of the original space, k = number of clusters
  - Use the indicator variables as the data representation
  - k may be larger than d
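As an illustration of this preprocessing idea (with hypothetical cluster centers; in practice they would come from k-means or EM), each instance can be mapped to a k-dimensional vector of cluster indicators:

```python
import numpy as np

def cluster_features(X, M):
    """Map d-dimensional instances to k-dimensional one-hot indicator vectors
    (1 for the nearest cluster center, 0 elsewhere); note k may exceed d."""
    labels = np.argmin(np.linalg.norm(X[:, None] - M[None, :], axis=2), axis=1)
    Z = np.zeros((len(X), len(M)))
    Z[np.arange(len(X)), labels] = 1.0
    return Z

# Example: 2-D data mapped to a 5-dimensional cluster-indicator representation.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
M = X[rng.choice(200, size=5, replace=False)]   # stand-in for learned cluster centers
Z = cluster_features(X, M)                      # new features for a supervised learner
```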

22 Mixture of Mixtures
- In classification, the input comes from a mixture of classes (supervised).
- If each class is also a mixture, e.g. of Gaussians (unsupervised), we have a mixture of mixtures:
  p(x|C_i) = Σ_{j=1}^{k_i} p(x|G_{ij}) P(G_{ij})
  p(x) = Σ_{i=1}^{K} p(x|C_i) P(C_i)

23 Hierarchical Clustering
- Probabilistic view:
  - Fit a mixture model to the data
  - Find the code words minimizing the reconstruction error
- Hierarchical clustering:
  - Group similar items together
  - No specific model/distribution
  - Items within a group are more similar to each other than to items in different groups

24 Hierarchical Clustering
- Minkowski (L_p) distance (Euclidean for p = 2): d_m(x^r, x^s) = [Σ_{j=1}^{d} |x_j^r − x_j^s|^p]^{1/p}
- City-block distance: d_cb(x^r, x^s) = Σ_{j=1}^{d} |x_j^r − x_j^s|

25 Agglomerative Clustering
- Start with clusters each containing a single point
- At each step, merge the most similar clusters
- Measures of similarity:
  - Minimal distance (single link): distance between the closest points in the two groups
  - Maximal distance (complete link): distance between the most distant points in the two groups
  - Average distance: distance between the group centers
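One common way to realize this procedure is SciPy's hierarchical-clustering routines; the sketch below uses single-link merging on toy data (data and parameters are illustrative):

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster, dendrogram

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 0.3, (20, 2)),       # toy data: two loose groups
               rng.normal(3, 0.3, (20, 2))])

# Single link: the distance between two clusters is the minimal pairwise distance.
Z = linkage(X, method='single', metric='euclidean')

# Cut the merge tree into 2 flat clusters; Z encodes the full dendrogram (next slide).
labels = fcluster(Z, t=2, criterion='maxclust')
# dendrogram(Z)  # would draw the merge tree (requires matplotlib)
```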

26 Example: Single-Link Clustering
(figure: dendrogram)

27 Choosing k
- Defined by the application, e.g. image quantization
- Incremental (leader-cluster) algorithm: add clusters one at a time until an "elbow" appears (in the reconstruction error, log likelihood, or intergroup distances)
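A small illustrative way to look for the elbow is to run k-means for a range of k and inspect the reconstruction error (scikit-learn's KMeans reports it as inertia_); the data below is a toy example.

```python
import numpy as np
from sklearn.cluster import KMeans

X = np.random.default_rng(2).normal(size=(300, 2))   # toy data

# Reconstruction error (within-cluster sum of squares) as a function of k.
errors = []
for k in range(1, 11):
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    errors.append(km.inertia_)

# Inspect or plot errors vs. k and pick the k where the curve stops dropping
# sharply (the elbow).
```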


Download ppt "MACHINE LEARNING 8. Clustering. Motivation Based on E ALPAYDIN 2004 Introduction to Machine Learning © The MIT Press (V1.1) 2  Classification problem:"

Similar presentations


Ads by Google