
1 ECE 5984: Introduction to Machine Learning. Dhruv Batra, Virginia Tech. Topics: Unsupervised Learning: K-means, GMM, EM. Readings: Barber 20.1-20.3

2 Midsem Presentations: graded. Mean: 8/10 = 80%. Min: 3. Max: 10. (C) Dhruv Batra

3 Tasks. Supervised Learning: Classification (x -> y, y discrete), Regression (x -> y, y continuous). Unsupervised Learning: Clustering (x -> c, c a discrete cluster ID), Dimensionality Reduction (x -> z, z continuous). (C) Dhruv Batra

4 Learning only with X: Y is not present in the training data. Some example unsupervised learning problems: Clustering / Factor Analysis; Dimensionality Reduction / Embeddings; Density Estimation with Mixture Models. (C) Dhruv Batra

5 New Topic: Clustering. (Slide credit: Carlos Guestrin)

6 Synonyms: Clustering, Vector Quantization, Latent Variable Models, Hidden Variable Models, Mixture Models. Algorithms: K-means, Expectation Maximization (EM). (C) Dhruv Batra

7 Some Data. (Slide credit: Carlos Guestrin)

8 K-means. 1. Ask the user how many clusters they'd like (e.g. k=5). (Slide credit: Carlos Guestrin)

9 K-means. 1. Ask the user how many clusters they'd like (e.g. k=5). 2. Randomly guess k cluster center locations. (Slide credit: Carlos Guestrin)

10 K-means. 1. Ask the user how many clusters they'd like (e.g. k=5). 2. Randomly guess k cluster center locations. 3. Each datapoint finds out which center it's closest to (thus each center "owns" a set of datapoints). (Slide credit: Carlos Guestrin)

11 K-means. 1. Ask the user how many clusters they'd like (e.g. k=5). 2. Randomly guess k cluster center locations. 3. Each datapoint finds out which center it's closest to. 4. Each center finds the centroid of the points it owns. (Slide credit: Carlos Guestrin)

12 K-means. 1. Ask the user how many clusters they'd like (e.g. k=5). 2. Randomly guess k cluster center locations. 3. Each datapoint finds out which center it's closest to. 4. Each center finds the centroid of the points it owns... 5. ...and jumps there. 6. ...Repeat until terminated! (Slide credit: Carlos Guestrin)

13 K-means. Randomly initialize k centers: $\mu^{(0)} = \mu_1^{(0)}, \ldots, \mu_k^{(0)}$. Assign: assign each point $i \in \{1,\ldots,n\}$ to its nearest center, $a_i \leftarrow \arg\min_j \|x_i - \mu_j\|^2$. Recenter: $\mu_j$ becomes the centroid of its points, $\mu_j \leftarrow \frac{1}{|\{i : a_i = j\}|}\sum_{i : a_i = j} x_i$. Repeat until convergence. (Slide credit: Carlos Guestrin)
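A minimal sketch of these steps in Python/NumPy, assuming X is an (n, d) array of points and k is the user-chosen number of clusters; initializing the centers at k random data points and the convergence test are illustrative choices, not the slide's exact recipe:

    import numpy as np

    def kmeans(X, k, n_iters=100, seed=0):
        """Alternate assign / recenter until the centers stop moving."""
        rng = np.random.default_rng(seed)
        # Randomly initialize k centers by picking k distinct data points.
        centers = X[rng.choice(len(X), size=k, replace=False)]
        for _ in range(n_iters):
            # Assign: each point goes to its nearest center (squared Euclidean distance).
            dists = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)  # (n, k)
            assign = dists.argmin(axis=1)                                     # (n,)
            # Recenter: each center becomes the centroid of the points it owns.
            new_centers = np.array([
                X[assign == j].mean(axis=0) if np.any(assign == j) else centers[j]
                for j in range(k)
            ])
            if np.allclose(new_centers, centers):  # centers stopped moving
                break
            centers = new_centers
        return centers, assign

For example, centers, assign = kmeans(X, k=5) mirrors steps 1-6 on the preceding slides.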

14 K-means Demo: http://www.kovan.ceng.metu.edu.tr/~maya/kmeans/ and http://home.deib.polimi.it/matteucc/Clustering/tutorial_html/AppletKM.html (C) Dhruv Batra

15 What is K-means optimizing? Objective $F(\mu, C)$: a function of the centers $\mu$ and the point allocations $C$, $F(\mu, C) = \sum_{i=1}^{n}\sum_{j=1}^{k} C_{ij}\,\|x_i - \mu_j\|^2$, where $C$ is a 1-of-k encoding of the assignments (equivalently, with assignment indices $a_i$, $F(\mu, a) = \sum_i \|x_i - \mu_{a_i}\|^2$). Optimal K-means: $\min_\mu \min_a F(\mu, a)$. (C) Dhruv Batra

16 Coordinate descent algorithms. Want: $\min_a \min_b F(a, b)$. Coordinate descent: fix a, minimize over b; fix b, minimize over a; repeat. Converges (if F is bounded) to a, often good, local optimum, as we saw in the applet (play with it!). K-means is a coordinate descent algorithm! (Slide credit: Carlos Guestrin)
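A toy illustration of the alternating idea on a made-up bounded, convex F(a, b) (not a function from the slides); each coordinate update is the exact minimizer obtained by setting the corresponding partial derivative to zero:

    # F(a, b) = a^2 + b^2 + a*b - 4a - 6b  (convex, bounded below)
    def F(a, b):
        return a**2 + b**2 + a*b - 4*a - 6*b

    a, b = 0.0, 0.0
    for _ in range(20):
        a = (4 - b) / 2.0   # fix b, minimize over a:  dF/da = 2a + b - 4 = 0
        b = (6 - a) / 2.0   # fix a, minimize over b:  dF/db = 2b + a - 6 = 0

    print(a, b, F(a, b))    # approaches (2/3, 8/3); global optimum here only because F is convex

For a non-convex objective like the K-means F, the same scheme still converges, but only to a local optimum.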

17 K-means as Coordinate Descent. Optimize the objective function with $\mu$ fixed, optimizing a (or C): each point is assigned to its nearest center, $a_i \leftarrow \arg\min_j \|x_i - \mu_j\|^2$. (Slide credit: Carlos Guestrin)

18 K-means as Coordinate Descent. Optimize the objective function with a (or C) fixed, optimizing $\mu$: each center moves to the centroid of its assigned points, $\mu_j \leftarrow \frac{1}{|\{i : a_i = j\}|}\sum_{i : a_i = j} x_i$. (Slide credit: Carlos Guestrin)

19 One important use of K-means: bag-of-words models in computer vision. (C) Dhruv Batra

20 Bag of Words model. A document is represented by its word counts, e.g.: aardvark 0, about 2, all 2, Africa 1, apple 0, anxious 0, ..., gas 1, ..., oil 1, ..., Zaire 0. (Slide credit: Carlos Guestrin)
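A tiny illustration of the idea using Python's collections.Counter on a made-up snippet (the counts on the slide come from some other document):

    from collections import Counter

    text = "oil prices rose as gas exports from Zaire slowed; oil futures climbed"
    counts = Counter(text.lower().replace(";", "").split())
    print(counts["oil"], counts["gas"], counts["aardvark"])  # 2 1 0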

21 Object as a bag of 'words'. (Slide credit: Fei-Fei Li)

22

23 Interest Point Features: detect patches [Mikolajczyk and Schmid '02] [Matas et al. '02] [Sivic et al. '03], normalize each patch, compute a SIFT descriptor [Lowe '99]. (Slide credit: Josef Sivic)

24 Patch Features. (Slide credit: Josef Sivic)

25 Dictionary formation. (Slide credit: Josef Sivic)

26 Clustering (usually k-means) for vector quantization. (Slide credit: Josef Sivic)

27 Clustered Image Patches Fei-Fei et al. 2005

28 Visual synonyms and polysemy. Visual Polysemy: a single visual word occurring on different (but locally similar) parts of different object categories. Visual Synonyms: two different visual words representing a similar part of an object (e.g. the wheel of a motorbike). (Slide credit: Andrew Zisserman)

29 Image representation: a histogram of codeword frequencies. (Slide credit: Fei-Fei Li)
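A sketch of the pipeline these slides describe, assuming local descriptors (e.g. SIFT) have already been extracted as rows of a matrix; it reuses the kmeans sketch from earlier, and names like build_vocabulary and encode are illustrative, not from the slides:

    import numpy as np

    def build_vocabulary(all_descriptors, k):
        """Dictionary formation: cluster training descriptors into k visual words."""
        codebook, _ = kmeans(all_descriptors, k)   # kmeans sketch from the earlier slide
        return codebook                            # (k, d) cluster centers = codewords

    def encode(image_descriptors, codebook):
        """Vector quantization + histogram: map each descriptor to its nearest
        codeword and count codeword frequencies for the image."""
        dists = ((image_descriptors[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
        words = dists.argmin(axis=1)
        hist = np.bincount(words, minlength=len(codebook)).astype(float)
        return hist / hist.sum()                   # normalized codeword-frequency histogram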

30 (One) bad case for k-means: clusters may overlap, and some clusters may be "wider" than others. GMM to the rescue! (Slide credit: Carlos Guestrin)

31 GMM. (Figure credit: Kevin Murphy)

32 Recall Multivariate Gaussians. (C) Dhruv Batra
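For reference, the multivariate Gaussian density the slide recalls, for x in R^d with mean mu and covariance Sigma:

    \mathcal{N}(x \mid \mu, \Sigma)
      = \frac{1}{(2\pi)^{d/2}\,|\Sigma|^{1/2}}
        \exp\!\left( -\tfrac{1}{2}\,(x - \mu)^\top \Sigma^{-1} (x - \mu) \right)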

33 GMM. (Figure credit: Kevin Murphy)
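The density such a figure depicts is a mixture of K Gaussians with mixing weights pi_k (the standard definition, not copied verbatim from the slide):

    p(x) = \sum_{k=1}^{K} \pi_k \, \mathcal{N}(x \mid \mu_k, \Sigma_k),
    \qquad \pi_k \ge 0, \quad \sum_{k=1}^{K} \pi_k = 1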

34 Hidden Data Causes Problems #1. The fully observed (log) likelihood factorizes; the marginal (log) likelihood does not factorize, so all parameters are coupled! (C) Dhruv Batra
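Concretely, in a standard formulation with z_i the hidden component of point i and theta the mixture parameters: when z is observed the log-likelihood splits into per-point, per-component terms, but when z is hidden the log of a sum appears and the parameters no longer separate:

    \text{observed } z:\quad
      \log p(x, z \mid \theta)
        = \sum_{i=1}^{n} \log \pi_{z_i}
          + \sum_{i=1}^{n} \log \mathcal{N}(x_i \mid \mu_{z_i}, \Sigma_{z_i})

    \text{hidden } z:\quad
      \log p(x \mid \theta)
        = \sum_{i=1}^{n} \log \sum_{k=1}^{K} \pi_k \,\mathcal{N}(x_i \mid \mu_k, \Sigma_k)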

35 GMM vs Gaussian Joint Bayes Classifier (on board): observed Y vs unobserved Z; likelihood vs marginal likelihood. (C) Dhruv Batra

36 Hidden Data Causes Problems #2. (Figure credit: Kevin Murphy)

37 Hidden Data Causes Problems #2: identifiability. (Figure credit: Kevin Murphy)

38 Hidden Data Causes Problems #3: the likelihood has singularities if one Gaussian "collapses" onto a single data point. (C) Dhruv Batra
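For example (the standard argument): if one component's mean sits exactly on a data point, that component's contribution to the likelihood blows up as its variance shrinks:

    \mu_k = x_i \;\Rightarrow\;
    \mathcal{N}(x_i \mid \mu_k, \sigma_k^2 I)
      = \frac{1}{(2\pi\sigma_k^2)^{d/2}}
      \;\to\; \infty \quad \text{as } \sigma_k \to 0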

39 Special case: spherical Gaussians and hard assignments. If $P(X \mid Z=k)$ is spherical, with the same $\sigma$ for all classes, then $P(x_i \mid z=k) \propto \exp\!\left(-\tfrac{1}{2\sigma^2}\|x_i - \mu_k\|^2\right)$. If each $x_i$ belongs to one class $C(i)$ (hard assignment), the marginal likelihood is $\prod_{i=1}^{n} P(x_i, z=C(i)) \propto \prod_{i=1}^{n} \exp\!\left(-\tfrac{1}{2\sigma^2}\|x_i - \mu_{C(i)}\|^2\right)$. The M(M)LE is the same as K-means!!! (Slide credit: Carlos Guestrin)

40 The K-means GMM assumption. There are k components; component i has an associated mean vector $\mu_i$. (Slide credit: Carlos Guestrin)

41 The K-means GMM assumption. There are k components; component i has an associated mean vector $\mu_i$. Each component generates data from a Gaussian with mean $\mu_i$ and covariance matrix $\sigma^2 I$. Each data point is generated according to the following recipe: (Slide credit: Carlos Guestrin)

42 The K-means GMM assumption. There are k components; component i has an associated mean vector $\mu_i$. Each component generates data from a Gaussian with mean $\mu_i$ and covariance matrix $\sigma^2 I$. Each data point is generated according to the following recipe: 1. Pick a component at random: choose component i with probability $P(y=i)$. (Slide credit: Carlos Guestrin)

43 The K-means GMM assumption. There are k components; component i has an associated mean vector $\mu_i$. Each component generates data from a Gaussian with mean $\mu_i$ and covariance matrix $\sigma^2 I$. Each data point is generated according to the following recipe: 1. Pick a component at random: choose component i with probability $P(y=i)$. 2. Datapoint $\sim N(\mu_i, \sigma^2 I)$. (Slide credit: Carlos Guestrin)

44 The General GMM assumption. There are k components; component i has an associated mean vector $\mu_i$. Each component generates data from a Gaussian with mean $\mu_i$ and covariance matrix $\Sigma_i$. Each data point is generated according to the following recipe: 1. Pick a component at random: choose component i with probability $P(y=i)$. 2. Datapoint $\sim N(\mu_i, \Sigma_i)$. (Slide credit: Carlos Guestrin)
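A minimal sketch of this generative recipe in Python/NumPy, with made-up mixing weights, means, and covariances purely for illustration:

    import numpy as np

    rng = np.random.default_rng(0)

    # Illustrative 2-D mixture with k = 3 components (made-up parameters).
    weights = np.array([0.5, 0.3, 0.2])                       # P(y = i)
    means   = np.array([[0.0, 0.0], [4.0, 4.0], [-3.0, 5.0]])
    covs    = np.array([np.eye(2), 0.5 * np.eye(2), 2.0 * np.eye(2)])

    def sample_gmm(n):
        # 1. Pick a component at random: choose component i with probability P(y=i).
        comps = rng.choice(len(weights), size=n, p=weights)
        # 2. Datapoint ~ N(mu_i, Sigma_i) for the chosen component.
        points = np.array([rng.multivariate_normal(means[i], covs[i]) for i in comps])
        return points, comps

    X, y = sample_gmm(500)   # 500 points and their (normally hidden) component labels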

45 K-means vs GMM. K-means demo: http://home.deib.polimi.it/matteucc/Clustering/tutorial_html/AppletKM.html GMM demo: http://www.socr.ucla.edu/applets.dir/mixtureem.html (C) Dhruv Batra

