
1 ECE 5984: Introduction to Machine Learning. Dhruv Batra, Virginia Tech. Topics: Unsupervised Learning: K-means, GMM, EM. Readings: Barber 20.1-20.3

2 Midsem Presentations: graded. Mean: 8/10 = 80%. Min: 3. Max: 10. (C) Dhruv Batra

3 Tasks. Supervised Learning: Classification (x -> y, y discrete), Regression (x -> y, y continuous). Unsupervised Learning: Clustering (x -> c, c a discrete cluster ID), Dimensionality Reduction (x -> z, z continuous). (C) Dhruv Batra

4 Learning only with X: Y is not present in the training data. Some example unsupervised learning problems: Clustering / Factor Analysis; Dimensionality Reduction / Embeddings; Density Estimation with Mixture Models. (C) Dhruv Batra

5 New Topic: Clustering. (Slide credit: Carlos Guestrin)

6 Synonyms: Clustering, Vector Quantization, Latent Variable Models, Hidden Variable Models, Mixture Models. Algorithms: K-means, Expectation Maximization (EM). (C) Dhruv Batra

7 Some Data. (Slide credit: Carlos Guestrin)

8 K-means. 1. Ask the user how many clusters they'd like (e.g. k=5). (Slide credit: Carlos Guestrin)

9 K-means. 1. Ask the user how many clusters they'd like (e.g. k=5). 2. Randomly guess k cluster center locations. (Slide credit: Carlos Guestrin)

10 K-means. 1. Ask the user how many clusters they'd like (e.g. k=5). 2. Randomly guess k cluster center locations. 3. Each datapoint finds out which center it's closest to (thus each center "owns" a set of datapoints). (Slide credit: Carlos Guestrin)

11 K-means. 1. Ask the user how many clusters they'd like (e.g. k=5). 2. Randomly guess k cluster center locations. 3. Each datapoint finds out which center it's closest to. 4. Each center finds the centroid of the points it owns. (Slide credit: Carlos Guestrin)

12 K-means. 1. Ask the user how many clusters they'd like (e.g. k=5). 2. Randomly guess k cluster center locations. 3. Each datapoint finds out which center it's closest to. 4. Each center finds the centroid of the points it owns... 5. ...and jumps there. 6. ...Repeat until terminated! (Slide credit: Carlos Guestrin)

13 K-means. Randomly initialize k centers: $\mu^{(0)} = \mu_1^{(0)}, \ldots, \mu_k^{(0)}$. Assign: assign each point $i \in \{1,\ldots,n\}$ to its nearest center, $a_i \leftarrow \arg\min_j \|x_i - \mu_j\|^2$. Recenter: $\mu_j$ becomes the centroid of its points, $\mu_j \leftarrow \frac{1}{|\{i : a_i = j\}|}\sum_{i : a_i = j} x_i$. Repeat until convergence. (Slide credit: Carlos Guestrin)
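A minimal sketch of these steps in Python/NumPy, assuming X is an (n, d) array of points and k is the user-chosen number of clusters; initializing the centers at k random data points and the convergence test are illustrative choices, not the slide's exact recipe:

    import numpy as np

    def kmeans(X, k, n_iters=100, seed=0):
        """Alternate assign / recenter until the centers stop moving."""
        rng = np.random.default_rng(seed)
        # Randomly initialize k centers by picking k distinct data points.
        centers = X[rng.choice(len(X), size=k, replace=False)]
        for _ in range(n_iters):
            # Assign: each point goes to its nearest center (squared Euclidean distance).
            dists = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)  # (n, k)
            assign = dists.argmin(axis=1)                                     # (n,)
            # Recenter: each center becomes the centroid of the points it owns.
            new_centers = np.array([
                X[assign == j].mean(axis=0) if np.any(assign == j) else centers[j]
                for j in range(k)
            ])
            if np.allclose(new_centers, centers):  # centers stopped moving
                break
            centers = new_centers
        return centers, assign

For example, centers, assign = kmeans(X, k=5) mirrors steps 1-6 on the preceding slides.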

14 K-means Demo: http://www.kovan.ceng.metu.edu.tr/~maya/kmeans/ and http://home.deib.polimi.it/matteucc/Clustering/tutorial_html/AppletKM.html (C) Dhruv Batra

15 What is K-means optimizing? Objective $F(\mu, C)$: a function of the centers $\mu$ and the point allocations $C$, $F(\mu, C) = \sum_{i=1}^{n}\sum_{j=1}^{k} C_{ij}\,\|x_i - \mu_j\|^2$, where $C$ is a 1-of-k encoding of the assignments (equivalently, with assignment indices $a_i$, $F(\mu, a) = \sum_i \|x_i - \mu_{a_i}\|^2$). Optimal K-means: $\min_\mu \min_a F(\mu, a)$. (C) Dhruv Batra

16 Coordinate descent algorithms. Want: $\min_a \min_b F(a, b)$. Coordinate descent: fix a, minimize over b; fix b, minimize over a; repeat. Converges (if F is bounded) to a, often good, local optimum, as we saw in the applet (play with it!). K-means is a coordinate descent algorithm! (Slide credit: Carlos Guestrin)
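A toy illustration of the alternating idea on a made-up bounded, convex F(a, b) (not a function from the slides); each coordinate update is the exact minimizer obtained by setting the corresponding partial derivative to zero:

    # F(a, b) = a^2 + b^2 + a*b - 4a - 6b  (convex, bounded below)
    def F(a, b):
        return a**2 + b**2 + a*b - 4*a - 6*b

    a, b = 0.0, 0.0
    for _ in range(20):
        a = (4 - b) / 2.0   # fix b, minimize over a:  dF/da = 2a + b - 4 = 0
        b = (6 - a) / 2.0   # fix a, minimize over b:  dF/db = 2b + a - 6 = 0

    print(a, b, F(a, b))    # approaches (2/3, 8/3); global optimum here only because F is convex

For a non-convex objective like the K-means F, the same scheme still converges, but only to a local optimum.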

17 K-means as Coordinate Descent. Optimize the objective function with $\mu$ fixed, optimizing a (or C): each point is assigned to its nearest center, $a_i \leftarrow \arg\min_j \|x_i - \mu_j\|^2$. (Slide credit: Carlos Guestrin)

18 K-means as Coordinate Descent. Optimize the objective function with a (or C) fixed, optimizing $\mu$: each center moves to the centroid of its assigned points, $\mu_j \leftarrow \frac{1}{|\{i : a_i = j\}|}\sum_{i : a_i = j} x_i$. (Slide credit: Carlos Guestrin)

19 One important use of K-means: bag-of-words models in computer vision. (C) Dhruv Batra

20 Bag of Words model. A document is represented by its word counts, e.g.: aardvark 0, about 2, all 2, Africa 1, apple 0, anxious 0, ..., gas 1, ..., oil 1, ..., Zaire 0. (Slide credit: Carlos Guestrin)
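A tiny illustration of the idea using Python's collections.Counter on a made-up snippet (the counts on the slide come from some other document):

    from collections import Counter

    text = "oil prices rose as gas exports from Zaire slowed; oil futures climbed"
    counts = Counter(text.lower().replace(";", "").split())
    print(counts["oil"], counts["gas"], counts["aardvark"])  # 2 1 0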

21 Object as a bag of 'words'. (Slide credit: Fei-Fei Li)

22

23 Interest Point Features: detect patches [Mikolajczyk and Schmid '02] [Matas et al. '02] [Sivic et al. '03], normalize each patch, compute a SIFT descriptor [Lowe '99]. (Slide credit: Josef Sivic)

24 Patch Features. (Slide credit: Josef Sivic)

25 Dictionary formation. (Slide credit: Josef Sivic)

26 Clustering (usually k-means) for vector quantization. (Slide credit: Josef Sivic)

27 Clustered Image Patches Fei-Fei et al. 2005

28 Visual synonyms and polysemy. Visual Polysemy: a single visual word occurring on different (but locally similar) parts of different object categories. Visual Synonyms: two different visual words representing a similar part of an object (e.g. the wheel of a motorbike). (Slide credit: Andrew Zisserman)

29 Image representation: a histogram of codeword frequencies. (Slide credit: Fei-Fei Li)
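A sketch of the pipeline these slides describe, assuming local descriptors (e.g. SIFT) have already been extracted as rows of a matrix; it reuses the kmeans sketch from earlier, and names like build_vocabulary and encode are illustrative, not from the slides:

    import numpy as np

    def build_vocabulary(all_descriptors, k):
        """Dictionary formation: cluster training descriptors into k visual words."""
        codebook, _ = kmeans(all_descriptors, k)   # kmeans sketch from the earlier slide
        return codebook                            # (k, d) cluster centers = codewords

    def encode(image_descriptors, codebook):
        """Vector quantization + histogram: map each descriptor to its nearest
        codeword and count codeword frequencies for the image."""
        dists = ((image_descriptors[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
        words = dists.argmin(axis=1)
        hist = np.bincount(words, minlength=len(codebook)).astype(float)
        return hist / hist.sum()                   # normalized codeword-frequency histogram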

30 (One) bad case for k-means: clusters may overlap, and some clusters may be "wider" than others. GMM to the rescue! (Slide credit: Carlos Guestrin)

31 GMM. (Figure credit: Kevin Murphy)

32 Recall Multivariate Gaussians. (C) Dhruv Batra
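For reference, the multivariate Gaussian density the slide recalls, for x in R^d with mean mu and covariance Sigma:

    \mathcal{N}(x \mid \mu, \Sigma)
      = \frac{1}{(2\pi)^{d/2}\,|\Sigma|^{1/2}}
        \exp\!\left( -\tfrac{1}{2}\,(x - \mu)^\top \Sigma^{-1} (x - \mu) \right)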

33 GMM. (Figure credit: Kevin Murphy)
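The density such a figure depicts is a mixture of K Gaussians with mixing weights pi_k (the standard definition, not copied verbatim from the slide):

    p(x) = \sum_{k=1}^{K} \pi_k \, \mathcal{N}(x \mid \mu_k, \Sigma_k),
    \qquad \pi_k \ge 0, \quad \sum_{k=1}^{K} \pi_k = 1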

34 Hidden Data Causes Problems #1. The fully observed (log) likelihood factorizes; the marginal (log) likelihood does not factorize, so all parameters are coupled! (C) Dhruv Batra
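Concretely, in a standard formulation with z_i the hidden component of point i and theta the mixture parameters: when z is observed the log-likelihood splits into per-point, per-component terms, but when z is hidden the log of a sum appears and the parameters no longer separate:

    \text{observed } z:\quad
      \log p(x, z \mid \theta)
        = \sum_{i=1}^{n} \log \pi_{z_i}
          + \sum_{i=1}^{n} \log \mathcal{N}(x_i \mid \mu_{z_i}, \Sigma_{z_i})

    \text{hidden } z:\quad
      \log p(x \mid \theta)
        = \sum_{i=1}^{n} \log \sum_{k=1}^{K} \pi_k \,\mathcal{N}(x_i \mid \mu_k, \Sigma_k)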

35 GMM vs Gaussian Joint Bayes Classifier (on board): observed Y vs unobserved Z; likelihood vs marginal likelihood. (C) Dhruv Batra

36 Hidden Data Causes Problems #2. (Figure credit: Kevin Murphy)

37 Hidden Data Causes Problems #2: identifiability. (Figure credit: Kevin Murphy)

38 Hidden Data Causes Problems #3: the likelihood has singularities if one Gaussian "collapses" onto a single data point. (C) Dhruv Batra
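For example (the standard argument): if one component's mean sits exactly on a data point, that component's contribution to the likelihood blows up as its variance shrinks:

    \mu_k = x_i \;\Rightarrow\;
    \mathcal{N}(x_i \mid \mu_k, \sigma_k^2 I)
      = \frac{1}{(2\pi\sigma_k^2)^{d/2}}
      \;\to\; \infty \quad \text{as } \sigma_k \to 0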

39 Special case: spherical Gaussians and hard assignments. If $P(X \mid Z=k)$ is spherical, with the same $\sigma$ for all classes, then $P(x_i \mid z=k) \propto \exp\!\left(-\tfrac{1}{2\sigma^2}\|x_i - \mu_k\|^2\right)$. If each $x_i$ belongs to one class $C(i)$ (hard assignment), the marginal likelihood is $\prod_{i=1}^{n} P(x_i, z=C(i)) \propto \prod_{i=1}^{n} \exp\!\left(-\tfrac{1}{2\sigma^2}\|x_i - \mu_{C(i)}\|^2\right)$. The M(M)LE is the same as K-means!!! (Slide credit: Carlos Guestrin)

40 The K-means GMM assumption. There are k components; component i has an associated mean vector $\mu_i$. (Slide credit: Carlos Guestrin)

41 The K-means GMM assumption. There are k components; component i has an associated mean vector $\mu_i$. Each component generates data from a Gaussian with mean $\mu_i$ and covariance matrix $\sigma^2 I$. Each data point is generated according to the following recipe: (Slide credit: Carlos Guestrin)

42 The K-means GMM assumption. There are k components; component i has an associated mean vector $\mu_i$. Each component generates data from a Gaussian with mean $\mu_i$ and covariance matrix $\sigma^2 I$. Each data point is generated according to the following recipe: 1. Pick a component at random: choose component i with probability $P(y=i)$. (Slide credit: Carlos Guestrin)

43 The K-means GMM assumption. There are k components; component i has an associated mean vector $\mu_i$. Each component generates data from a Gaussian with mean $\mu_i$ and covariance matrix $\sigma^2 I$. Each data point is generated according to the following recipe: 1. Pick a component at random: choose component i with probability $P(y=i)$. 2. Datapoint $\sim N(\mu_i, \sigma^2 I)$. (Slide credit: Carlos Guestrin)

44 The General GMM assumption. There are k components; component i has an associated mean vector $\mu_i$. Each component generates data from a Gaussian with mean $\mu_i$ and covariance matrix $\Sigma_i$. Each data point is generated according to the following recipe: 1. Pick a component at random: choose component i with probability $P(y=i)$. 2. Datapoint $\sim N(\mu_i, \Sigma_i)$. (Slide credit: Carlos Guestrin)
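A minimal sketch of this generative recipe in Python/NumPy, with made-up mixing weights, means, and covariances purely for illustration:

    import numpy as np

    rng = np.random.default_rng(0)

    # Illustrative 2-D mixture with k = 3 components (made-up parameters).
    weights = np.array([0.5, 0.3, 0.2])                       # P(y = i)
    means   = np.array([[0.0, 0.0], [4.0, 4.0], [-3.0, 5.0]])
    covs    = np.array([np.eye(2), 0.5 * np.eye(2), 2.0 * np.eye(2)])

    def sample_gmm(n):
        # 1. Pick a component at random: choose component i with probability P(y=i).
        comps = rng.choice(len(weights), size=n, p=weights)
        # 2. Datapoint ~ N(mu_i, Sigma_i) for the chosen component.
        points = np.array([rng.multivariate_normal(means[i], covs[i]) for i in comps])
        return points, comps

    X, y = sample_gmm(500)   # 500 points and their (normally hidden) component labels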

45 K-means vs GMM. K-means demo: http://home.deib.polimi.it/matteucc/Clustering/tutorial_html/AppletKM.html GMM demo: http://www.socr.ucla.edu/applets.dir/mixtureem.html (C) Dhruv Batra

