ECE 5984: Introduction to Machine Learning
Dhruv Batra, Virginia Tech
Topics:
–Unsupervised Learning: K-means, GMM, EM
Readings: Barber 20.1-20.3
Midsem Presentations
Graded: mean 8/10 = 80%
–Min: 3
–Max: 10
Tasks
Supervised Learning:
–Classification: x → y, y discrete
–Regression: x → y, y continuous
Unsupervised Learning:
–Clustering: x → c, c a discrete cluster ID
–Dimensionality Reduction: x → z, z continuous
Learning only with X
–Y not present in training data
Some example unsupervised learning problems:
–Clustering / Factor Analysis
–Dimensionality Reduction / Embeddings
–Density Estimation with Mixture Models
New Topic: Clustering (Slide Credit: Carlos Guestrin)
Synonyms: Clustering, Vector Quantization, Latent Variable Models, Hidden Variable Models, Mixture Models
Algorithms:
–K-means
–Expectation Maximization (EM)
Some Data (Slide Credit: Carlos Guestrin)
K-means (Slide Credit: Carlos Guestrin)
1. Ask the user how many clusters they'd like (e.g. k = 5).
2. Randomly guess k cluster center locations.
3. Each datapoint finds out which center it's closest to (thus each center "owns" a set of datapoints).
4. Each center finds the centroid of the points it owns...
5. ...and jumps there.
6. ...Repeat until terminated!
K-means (Slide Credit: Carlos Guestrin)
Randomly initialize k centers: $\mu^{(0)} = \mu_1^{(0)}, \ldots, \mu_k^{(0)}$
Assign: assign each point $i \in \{1, \ldots, n\}$ to the nearest center:
–$C(i) \leftarrow \arg\min_j \| x_i - \mu_j \|^2$
Recenter: $\mu_j$ becomes the centroid of its points:
–$\mu_j \leftarrow \frac{1}{|\{i : C(i) = j\}|} \sum_{i : C(i) = j} x_i$
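A minimal NumPy sketch of this assign/recenter loop (not from the slides; the function name kmeans and the data matrix X of shape n x d are my choices):

import numpy as np

def kmeans(X, k, n_iters=100, seed=0):
    """Plain K-means: alternate the assignment and recentering steps."""
    rng = np.random.default_rng(seed)
    # Randomly initialize k centers by picking k distinct data points
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iters):
        # Assign: each point goes to its nearest center (squared Euclidean distance)
        dists = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        assign = dists.argmin(axis=1)
        # Recenter: each center jumps to the centroid of the points it owns
        new_centers = np.array([
            X[assign == j].mean(axis=0) if np.any(assign == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break  # converged: assignments will no longer change
        centers = new_centers
    return centers, assign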
K-means Demo
–http://www.kovan.ceng.metu.edu.tr/~maya/kmeans/
–http://home.deib.polimi.it/matteucc/Clustering/tutorial_html/AppletKM.html
What is K-means optimizing?
Objective $F(\mu, C)$: a function of the centers $\mu$ and the point allocations $C$:
–$F(\mu, C) = \sum_{i=1}^{n} \| x_i - \mu_{C(i)} \|^2$
–with a 1-of-k encoding $a_{ij} \in \{0, 1\}$: $F(\mu, a) = \sum_{i=1}^{n} \sum_{j=1}^{k} a_{ij} \| x_i - \mu_j \|^2$
Optimal K-means:
–$\min_\mu \min_a F(\mu, a)$
Coordinate descent algorithms (Slide Credit: Carlos Guestrin)
Want: $\min_a \min_b F(a, b)$
Coordinate descent:
–fix a, minimize over b
–fix b, minimize over a
–repeat
Converges!!!
–if F is bounded
–to a (often good) local optimum, as we saw in the applet (play with it!)
K-means is a coordinate descent algorithm!
K-means as Coordinate Descent (Slide Credit: Carlos Guestrin)
Optimize the objective function by alternating:
–fix $\mu$, optimize $a$ (or $C$)
–fix $a$ (or $C$), optimize $\mu$
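For the objective $F(\mu, a)$ above, both coordinate steps have standard closed-form minimizers; a sketch in the same notation:

Fix $\mu$, optimize $a$: $a_{ij} = 1$ if $j = \arg\min_{j'} \| x_i - \mu_{j'} \|^2$, else $a_{ij} = 0$ (assign each point to its nearest center).
Fix $a$, optimize $\mu$: $\mu_j = \dfrac{\sum_i a_{ij} x_i}{\sum_i a_{ij}}$ (each center moves to the mean of its assigned points).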
One important use of K-means: bag-of-words models in computer vision.
Bag of Words model (Slide Credit: Carlos Guestrin)
Example word-count vector for a document: aardvark 0, about 2, all 2, Africa 1, apple 0, anxious 0, ..., gas 1, ..., oil 1, ..., Zaire 0
Object → bag of 'words' (Fei-Fei Li)
Interest Point Features (Slide Credit: Josef Sivic)
–Detect patches [Mikolajczyk and Schmid '02] [Matas et al. '02] [Sivic et al. '03]
–Normalize patches
–Compute SIFT descriptors [Lowe '99]
Patch Features (Slide Credit: Josef Sivic)
Dictionary formation (Slide Credit: Josef Sivic)
Clustering (usually k-means) for vector quantization (Slide Credit: Josef Sivic)
Clustered Image Patches (Fei-Fei et al. 2005)
Visual synonyms and polysemy (Andrew Zisserman)
–Visual Polysemy: a single visual word occurring on different (but locally similar) parts of different object categories.
–Visual Synonyms: two different visual words representing a similar part of an object (e.g. the wheel of a motorbike).
Image representation: histogram of codeword frequencies (Fei-Fei Li)
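A minimal sketch of this image representation, assuming descriptors is an (n_patches x d) array of local patch descriptors and centers is the k x d dictionary learned by k-means (both variable names are hypothetical):

import numpy as np

def bag_of_visual_words(descriptors, centers):
    """Quantize each local descriptor to its nearest dictionary word
    and return a normalized codeword-frequency histogram."""
    # Vector quantization: nearest center for each descriptor
    dists = ((descriptors[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    words = dists.argmin(axis=1)
    # Histogram over the k codewords
    hist = np.bincount(words, minlength=len(centers)).astype(float)
    return hist / hist.sum()  # normalize so images with different patch counts are comparable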
(One) bad case for k-means (Slide Credit: Carlos Guestrin)
–Clusters may overlap
–Some clusters may be "wider" than others
–GMM to the rescue!
GMM (Figure Credit: Kevin Murphy)
Recall Multivariate Gaussians
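For reference, the multivariate Gaussian density the slide recalls (standard definition, stated here since the slide's equations did not survive extraction):

$\mathcal{N}(x \mid \mu, \Sigma) = \dfrac{1}{(2\pi)^{d/2} |\Sigma|^{1/2}} \exp\!\left( -\tfrac{1}{2} (x - \mu)^\top \Sigma^{-1} (x - \mu) \right)$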
GMM (Figure Credit: Kevin Murphy)
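The mixture density behind these figures, in standard form (the mixing weights $\pi_k$ are my notation, not from the slide text):

$p(x) = \sum_{k=1}^{K} \pi_k \, \mathcal{N}(x \mid \mu_k, \Sigma_k), \qquad \pi_k \ge 0, \quad \sum_{k=1}^{K} \pi_k = 1$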
Hidden Data Causes Problems #1
–Fully observed (log) likelihood factorizes
–Marginal (log) likelihood doesn't factorize
–All parameters coupled!
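A sketch of the contrast in that notation, with $z_i$ the hidden component label of $x_i$ and $\theta$ the collection of mixture parameters (my symbols, matching the standard GMM form above):

Fully observed: $\log p(x, z \mid \theta) = \sum_i \left[ \log \pi_{z_i} + \log \mathcal{N}(x_i \mid \mu_{z_i}, \Sigma_{z_i}) \right]$, a sum of per-component terms.
Marginal: $\log p(x \mid \theta) = \sum_i \log \sum_{k=1}^{K} \pi_k \, \mathcal{N}(x_i \mid \mu_k, \Sigma_k)$, where the sum over $k$ sits inside the log, so the parameters no longer decouple.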
GMM vs Gaussian Joint Bayes Classifier (on board)
–Observed Y vs unobserved Z
–Likelihood vs marginal likelihood
Hidden Data Causes Problems #2: Identifiability (Figure Credit: Kevin Murphy)
–Permuting the component labels leaves the likelihood unchanged, so the mixture parameters are not uniquely identifiable.
Hidden Data Causes Problems #3
–The likelihood has singularities if one Gaussian "collapses"
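A sketch of why, under the mixture density above (assuming a spherical covariance $\sigma^2 I$ for the collapsing component):

$\text{if } \mu_k = x_i, \text{ then } \mathcal{N}(x_i \mid \mu_k, \sigma^2 I) = (2\pi\sigma^2)^{-d/2} \to \infty \text{ as } \sigma \to 0,$

so that one term of the marginal likelihood grows without bound while the remaining terms stay positive, and the likelihood has no finite maximizer.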
Special case: spherical Gaussians and hard assignments (Slide Credit: Carlos Guestrin)
If $P(X \mid Z = k)$ is spherical, with the same $\sigma$ for all classes:
–$P(x_i \mid z_i = k) \propto \exp\!\left( -\frac{1}{2\sigma^2} \| x_i - \mu_k \|^2 \right)$
If each $x_i$ belongs to exactly one class $C(i)$ (hard assignment), the marginal likelihood is:
–$\prod_i P(x_i, z_i = C(i)) \propto \prod_i \exp\!\left( -\frac{1}{2\sigma^2} \| x_i - \mu_{C(i)} \|^2 \right)$
M(M)LE is the same as K-means!!!
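Taking logs makes the equivalence explicit, reusing the objective $F(\mu, C)$ defined earlier (a standard step, not spelled out on the slide):

$\log \prod_i P(x_i, z_i = C(i)) = \text{const} - \frac{1}{2\sigma^2} \sum_i \| x_i - \mu_{C(i)} \|^2 = \text{const} - \frac{1}{2\sigma^2} F(\mu, C),$

so maximizing this hard-assignment likelihood over $\mu$ and $C$ is exactly minimizing the K-means objective.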
The K-means GMM assumption (Slide Credit: Carlos Guestrin)
–There are k components
–Component i has an associated mean vector $\mu_i$
–Each component generates data from a Gaussian with mean $\mu_i$ and covariance matrix $\sigma^2 I$
Each data point is generated according to the following recipe:
1. Pick a component at random: choose component i with probability P(y = i)
2. Draw the datapoint $x \sim \mathcal{N}(\mu_i, \sigma^2 I)$
The General GMM assumption (Slide Credit: Carlos Guestrin)
–There are k components
–Component i has an associated mean vector $\mu_i$
–Each component generates data from a Gaussian with mean $\mu_i$ and covariance matrix $\Sigma_i$
Each data point is generated according to the following recipe:
1. Pick a component at random: choose component i with probability P(y = i)
2. Draw the datapoint $x \sim \mathcal{N}(\mu_i, \Sigma_i)$
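A minimal sketch of this generative recipe (the parameter names weights, means, and covs are hypothetical):

import numpy as np

def sample_gmm(n, weights, means, covs, seed=0):
    """Draw n points from a GMM by following the two-step recipe:
    pick a component, then sample from that component's Gaussian."""
    rng = np.random.default_rng(seed)
    k = len(weights)
    # Step 1: pick a component for each point with probability P(y = i)
    comps = rng.choice(k, size=n, p=weights)
    # Step 2: sample each point from its component's Gaussian
    X = np.array([rng.multivariate_normal(means[c], covs[c]) for c in comps])
    return X, comps

# Example: two 2-D components with different shapes
X, z = sample_gmm(
    n=500,
    weights=[0.3, 0.7],
    means=[np.array([0.0, 0.0]), np.array([4.0, 4.0])],
    covs=[np.eye(2), np.array([[2.0, 0.5], [0.5, 1.0]])],
)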
K-means vs GMM
K-means demo:
–http://home.deib.polimi.it/matteucc/Clustering/tutorial_html/AppletKM.html
GMM demo:
–http://www.socr.ucla.edu/applets.dir/mixtureem.html