MACHINE LEARNING
8. Clustering
(Based on E. Alpaydın, Introduction to Machine Learning, The MIT Press, 2004.)
Motivation
- Classification: we need P(C | x). By Bayes' rule it can be computed from p(x | C), so we must estimate p(x | C) from the data.
  - Assume a model (e.g. a normal distribution) known up to its parameters.
  - Compute estimators (ML, MAP) for the parameters from the data.
- Regression: we need to estimate the joint p(x, r); by Bayes' rule it can be computed from p(r | x).
  - Assume a model known up to its parameters (e.g. linear).
  - Compute the parameters from the data (e.g. least squares).
Motivation (cont.)
- We cannot always assume that the data came from a single distribution/model.
- Nonparametric methods: assume no model; compute the probability of new data directly from the old data.
- Semi-parametric / mixture models: assume the data came from an unknown mixture of known models.
Motivation (cont.)
- Optical character recognition: there are two ways to write a 7 (with or without the horizontal bar), so we cannot assume a single distribution; the data is a mixture of an unknown number of templates.
- Compare with classification, where the number of classes is known and every training sample carries a class label (supervised learning).
Mixture Densities
\[ p(x) = \sum_{i=1}^{k} p(x \mid G_i)\, P(G_i) \]
where the G_i are the components/groups/clusters, P(G_i) the mixture proportions (priors), and p(x | G_i) the component densities.
Gaussian mixture: p(x | G_i) ~ N(\mu_i, \Sigma_i), with parameters \Phi = \{ P(G_i), \mu_i, \Sigma_i \}_{i=1}^{k}.
The sample X = \{ x^t \}_t is unlabeled (unsupervised learning).
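A minimal NumPy/SciPy sketch (not from the slides) of evaluating such a Gaussian mixture density; the component parameters below are made up for illustration.

    import numpy as np
    from scipy.stats import multivariate_normal

    # Hypothetical 2-component Gaussian mixture in 2-D (illustrative parameters).
    priors = np.array([0.3, 0.7])                            # P(G_i), sums to 1
    means = [np.array([0.0, 0.0]), np.array([3.0, 3.0])]     # mu_i
    covs = [np.eye(2), 2.0 * np.eye(2)]                      # Sigma_i

    def mixture_density(x):
        """p(x) = sum_i P(G_i) * p(x | G_i)."""
        return sum(P * multivariate_normal.pdf(x, mean=m, cov=S)
                   for P, m, S in zip(priors, means, covs))

    print(mixture_density(np.array([1.0, 1.0])))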
Example: Color Quantization
- Image: each pixel is represented by a 24-bit color.
- The colors come from different distributions (e.g. sky, grass), but there is no label telling us whether a pixel is sky or grass.
- We want a palette of only 256 colors that represents the image as closely as possible to the original.
- Quantizing uniformly (one palette color per interval of size 2^24/256) wastes palette entries on rarely occurring intervals.
Quantization
- Sample (pixels): X = \{ x^t \}_t; k reference vectors (the palette): m_1, ..., m_k.
- Select the nearest reference vector for each pixel; the reference vectors are also called codebook vectors or code words.
- This compresses the image at the cost of a reconstruction error.
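The slide shows the assignment rule and the reconstruction error as images; in standard vector-quantization notation they are:

\[ b_i^t = \begin{cases} 1 & \text{if } \lVert x^t - m_i \rVert = \min_j \lVert x^t - m_j \rVert \\ 0 & \text{otherwise} \end{cases} \qquad E\big(\{m_i\}_{i=1}^{k} \mid X\big) = \sum_t \sum_i b_i^t \, \lVert x^t - m_i \rVert^2 \]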
Encoding/Decoding
(Figure.)
K-means Clustering
- Minimize the reconstruction error with respect to the reference vectors.
- Taking derivatives and setting them to zero shows that each reference vector is the mean of all the instances it represents.
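Setting the derivative of the reconstruction error to zero gives the standard centroid update (the slide shows it as an image):

\[ \frac{\partial E}{\partial m_i} = -2 \sum_t b_i^t \, (x^t - m_i) = 0 \quad\Longrightarrow\quad m_i = \frac{\sum_t b_i^t x^t}{\sum_t b_i^t} \]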
K-means Clustering (algorithm)
An iterative procedure for finding the reference vectors (see the sketch below):
1. Start with random reference vectors.
2. Estimate the labels b_i^t (assign each instance to its nearest reference vector).
3. Re-compute the reference vectors as the means of their assigned instances.
4. Repeat steps 2-3 until convergence.
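A minimal NumPy sketch of this procedure (illustrative only; the function and variable names are my own, and real use would add an empty-cluster check and a smarter initialization such as k-means++):

    import numpy as np

    def k_means(X, k, n_iter=100, seed=0):
        """Plain k-means: X is an (n, d) array; returns (codebook, labels)."""
        rng = np.random.default_rng(seed)
        m = X[rng.choice(len(X), size=k, replace=False)]   # random reference vectors
        for _ in range(n_iter):
            # Assignment step: b_i^t -- index of the nearest reference vector.
            labels = np.argmin(((X[:, None, :] - m[None, :, :]) ** 2).sum(-1), axis=1)
            # Update step: each reference vector becomes the mean of its instances.
            new_m = np.array([X[labels == i].mean(axis=0) if np.any(labels == i) else m[i]
                              for i in range(k)])
            if np.allclose(new_m, m):                       # converged
                break
            m = new_m
        return m, labels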
k-means Clustering
(Figure.)
Expectation Maximization (EM): Motivation
- The data came from several distributions; assume each distribution is known up to its parameters.
- If we knew which distribution each data instance came from, we could use parametric estimation.
- So introduce unobservable (latent) variables that indicate the source distribution, and run an iterative process:
  - Estimate the latent variables from the data and the current estimates of the distribution parameters.
  - Use the current values of the latent variables to refine the parameter estimates.
EM
- The log likelihood of the mixture, L(\Phi \mid X) = \sum_t \log \sum_i p(x^t \mid G_i) P(G_i), is hard to maximize directly.
- Assume hidden variables Z which, when known, make the optimization much simpler.
- Complete likelihood, L_c(\Phi \mid X, Z), in terms of X and Z.
- Incomplete likelihood, L(\Phi \mid X), in terms of X only.
Latent Variables
- Z is unknown, so we cannot compute the complete likelihood L_c(\Phi \mid X, Z) itself.
- We can, however, compute its expected value.
E- and M-Steps
Iterate the two steps:
1. E-step: estimate Z given X and the current \Phi.
2. M-step: find the new \Phi' given Z, X, and the old \Phi.
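Written out in the standard EM notation (the slide's formulas are images), the two steps are:

\[ \text{E-step:}\quad Q(\Phi \mid \Phi^{l}) = E\big[\log L_c(\Phi \mid X, Z) \,\big|\, X, \Phi^{l}\big] \qquad \text{M-step:}\quad \Phi^{l+1} = \arg\max_{\Phi} Q(\Phi \mid \Phi^{l}) \]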
Example: Mixture of Gaussians
- The data came from a mixture of Gaussians.
- Maximize the likelihood as if we knew the latent "indicator variables" z_i^t (which component generated x^t).
- E-step: compute the expected values of the indicator variables.
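Concretely, for a Gaussian mixture the expected indicator is the posterior probability of each component, and the M-step re-estimates the parameters with these soft weights (standard results; shown as images on the slide):

\[ h_i^t \equiv E\big[z_i^t \mid X, \Phi^{l}\big] = P(G_i \mid x^t, \Phi^{l}) = \frac{P(G_i)\, p(x^t \mid G_i, \Phi^{l})}{\sum_j P(G_j)\, p(x^t \mid G_j, \Phi^{l})} \]
\[ P(G_i) = \frac{\sum_t h_i^t}{N}, \qquad m_i = \frac{\sum_t h_i^t x^t}{\sum_t h_i^t}, \qquad S_i = \frac{\sum_t h_i^t (x^t - m_i)(x^t - m_i)^T}{\sum_t h_i^t} \]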
(Figure: a two-component Gaussian mixture fitted by EM; the labeled contour is where P(G_1 | x) = h_1 = 0.5.)
EM for Gaussian Mixtures
- Assume all groups/clusters are multivariate Gaussians, uncorrelated and with the same variance.
- Harden the indicators: in EM the expected values h_i^t lie between 0 and 1; in k-means they are 0 or 1.
- Under these assumptions EM becomes the same as k-means (a code sketch follows).
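A bare-bones NumPy/SciPy sketch of EM for a Gaussian mixture (my own illustrative code, not from the slides; it fits full covariances and adds a small ridge term to keep them invertible):

    import numpy as np
    from scipy.stats import multivariate_normal

    def em_gmm(X, k, n_iter=50, seed=0):
        """EM for a Gaussian mixture; returns proportions, means, covariances, soft labels h."""
        n, d = X.shape
        rng = np.random.default_rng(seed)
        pi = np.full(k, 1.0 / k)                        # P(G_i)
        mu = X[rng.choice(n, size=k, replace=False)]    # initial means
        S = np.array([np.cov(X.T) + 1e-6 * np.eye(d) for _ in range(k)])
        for _ in range(n_iter):
            # E-step: responsibilities h_i^t = P(G_i | x^t).
            dens = np.column_stack([pi[i] * multivariate_normal.pdf(X, mu[i], S[i])
                                    for i in range(k)])
            h = dens / dens.sum(axis=1, keepdims=True)
            # M-step: re-estimate proportions, means, covariances with the soft weights.
            Nk = h.sum(axis=0)
            pi = Nk / n
            mu = (h.T @ X) / Nk[:, None]
            for i in range(k):
                diff = X - mu[i]
                S[i] = (h[:, i, None] * diff).T @ diff / Nk[i] + 1e-6 * np.eye(d)
        return pi, mu, S, h

    # Hardening the soft labels h (values in [0, 1]) to 0/1 gives k-means-style assignments:
    # labels = h.argmax(axis=1)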
Dimensionality Reduction vs. Clustering
- Dimensionality reduction methods find correlations between features and group features (e.g. age and income are correlated).
- Clustering methods find similarities between instances and group instances (e.g. customers A and B belong to the same cluster).
Clustering: Use for Supervised Learning
- Describe the data in terms of clusters: represent all the data in a cluster by the cluster mean, or by the range of its attributes.
- Map the data into a new space (preprocessing): with d the dimension of the original space and k the number of clusters, use the cluster indicator variables as the new data representation; k may be larger than d (a small sketch follows).
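A minimal sketch of this preprocessing idea, reusing the hypothetical em_gmm function from the earlier sketch; the soft indicators h_i^t become the new k-dimensional representation:

    import numpy as np

    rng = np.random.default_rng(1)
    X = rng.normal(size=(200, 2))        # original space: d = 2 features

    pi, mu, S, h = em_gmm(X, k=5)        # em_gmm: the sketch defined earlier
    X_new = h                            # new representation: shape (200, 5), here k > d

    # A hard, one-hot version, closer to plain indicator variables:
    X_hard = np.eye(5)[h.argmax(axis=1)]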
Mixture of Mixtures
- In classification, the input comes from a mixture of classes (supervised).
- If each class is also a mixture, e.g. of Gaussians (unsupervised), we have a mixture of mixtures:
\[ p(x \mid C_i) = \sum_{j=1}^{k_i} p(x \mid G_{ij})\, P(G_{ij}), \qquad p(x) = \sum_{i=1}^{K} p(x \mid C_i)\, P(C_i) \]
Hierarchical Clustering
- Probabilistic view so far: fit a mixture model to the data, or find codewords minimizing the reconstruction error.
- Hierarchical clustering instead groups similar items together with no specific model/distribution: items within a group are more similar to each other than to instances in different groups.
Hierarchical Clustering: Distance Measures
- Minkowski (L_p) distance (Euclidean for p = 2).
- City-block (L_1) distance.
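The corresponding formulas (standard definitions; shown as images on the slide):

\[ d_{\text{Minkowski}}(x^r, x^s) = \Big[ \sum_{j=1}^{d} \big| x_j^r - x_j^s \big|^{p} \Big]^{1/p}, \qquad d_{\text{city-block}}(x^r, x^s) = \sum_{j=1}^{d} \big| x_j^r - x_j^s \big| \]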
Agglomerative Clustering
- Start with clusters each containing a single point; at each step merge the most similar clusters (a SciPy sketch follows the single-link example below).
- Measures of similarity between two groups:
  - Minimal distance (single link): distance between the closest points of the two groups.
  - Maximal distance (complete link): distance between the most distant points of the two groups.
  - Average distance, or the distance between the group centers.
Example: Single-Link Clustering
(Figure: sample points and the resulting dendrogram.)
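A small SciPy sketch of single-link agglomerative clustering and its dendrogram (illustrative; the data is made up):

    import numpy as np
    import matplotlib.pyplot as plt
    from scipy.cluster.hierarchy import linkage, dendrogram, fcluster

    rng = np.random.default_rng(0)
    X = np.vstack([rng.normal(0, 0.3, size=(10, 2)),   # two loose blobs
                   rng.normal(3, 0.3, size=(10, 2))])

    Z = linkage(X, method='single')                    # also: 'complete', 'average', 'centroid'
    labels = fcluster(Z, t=2, criterion='maxclust')    # cut the merge tree into 2 clusters

    dendrogram(Z)                                      # the merge tree shown on the slide
    plt.show()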
Choosing k
- k may be defined by the application, e.g. 256 colors in image quantization.
- Incremental (leader-cluster) algorithm: add clusters one at a time.
- Plot the reconstruction error, log likelihood, or intergroup distances against k and stop at the "elbow" (see the sketch below).
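A small sketch of the elbow idea, reusing the hypothetical k_means function from the earlier sketch to plot reconstruction error against k:

    import numpy as np
    import matplotlib.pyplot as plt

    rng = np.random.default_rng(2)
    X = np.vstack([rng.normal(c, 0.5, size=(100, 2)) for c in (0, 4, 8)])  # 3 true blobs

    ks = range(1, 9)
    errors = []
    for k in ks:
        m, labels = k_means(X, k)                      # k_means: the sketch defined earlier
        errors.append(((X - m[labels]) ** 2).sum())    # reconstruction error E

    plt.plot(list(ks), errors, marker='o')
    plt.xlabel('k'); plt.ylabel('reconstruction error')   # look for the elbow near k = 3
    plt.show()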