ECE 5984: Introduction to Machine Learning
Dhruv Batra, Virginia Tech
Topics:
– Unsupervised Learning: K-means, GMM, EM
Readings: Barber
Midsem Presentations
Graded: mean 8/10 = 80%
– Min: 3
– Max: 10
Tasks
Supervised Learning:
– Classification: x → y (y discrete)
– Regression: x → y (y continuous)
Unsupervised Learning:
– Clustering: x → c (c discrete cluster ID)
– Dimensionality Reduction: x → z (z continuous)
Learning only with X
– Y is not present in the training data
Some example unsupervised learning problems:
– Clustering / Factor Analysis
– Dimensionality Reduction / Embeddings
– Density Estimation with Mixture Models
New Topic: Clustering (Slide credit: Carlos Guestrin)
Synonyms: Clustering, Vector Quantization, Latent Variable Models, Hidden Variable Models, Mixture Models
Algorithms:
– K-means
– Expectation Maximization (EM)
Some Data (Slide credit: Carlos Guestrin)
K-means, illustrated on the data above (Slide credit: Carlos Guestrin):
1. Ask user how many clusters they'd like (e.g. k = 5)
2. Randomly guess k cluster center locations
3. Each datapoint finds out which center it's closest to (thus each center "owns" a set of datapoints)
4. Each center finds the centroid of the points it owns...
5. ...and jumps there
6. ...Repeat until terminated!
K-means (Slide credit: Carlos Guestrin)
Randomly initialize k centers:
– \mu^{(0)} = \mu_1^{(0)}, \ldots, \mu_k^{(0)}
Assign:
– Assign each point i \in \{1, \ldots, n\} to the nearest center:
– C(i) \leftarrow \arg\min_j \|x_i - \mu_j\|^2
Recenter:
– \mu_j becomes the centroid of its points:
– \mu_j \leftarrow \frac{1}{|\{i : C(i) = j\}|} \sum_{i : C(i) = j} x_i
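A minimal NumPy sketch of these two alternating steps. This is an illustration, not the course's reference code; the initialization by sampling data points and the convergence check are my own choices.

import numpy as np

def kmeans(X, k, n_iters=100, seed=0):
    # Lloyd's algorithm: alternate the Assign and Recenter steps above.
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]  # random init
    for _ in range(n_iters):
        # Assign: each point goes to its nearest center (squared Euclidean distance)
        dists = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1)   # (n, k)
        assign = dists.argmin(axis=1)
        # Recenter: each center jumps to the centroid of the points it owns
        new_centers = np.array([
            X[assign == j].mean(axis=0) if np.any(assign == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):   # no center moved: converged
            break
        centers = new_centers
    return centers, assign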
K-means Demo
– http://home.deib.polimi.it/matteucc/Clustering/tutorial_html/AppletKM.html
What is K-means optimizing?
Objective F(\mu, C): a function of the centers \mu and the point allocations C
– F(\mu, a) = \sum_{i=1}^{n} \sum_{j=1}^{k} a_{ij} \|x_i - \mu_j\|^2
– a_i is a 1-of-k encoding of point i's cluster assignment
Optimal K-means:
– \min_{\mu} \min_{a} F(\mu, a)
Coordinate Descent Algorithms (Slide credit: Carlos Guestrin)
Want: \min_a \min_b F(a, b)
Coordinate descent:
– fix a, minimize over b
– fix b, minimize over a
– repeat
Converges
– if F is bounded below
– to a (often good) local optimum, as we saw in the applet (play with it!)
K-means is a coordinate descent algorithm!
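A one-line argument for why the alternation converges (my addition, not spelled out on the slide): each half-step can only decrease the objective,
F(a^{(t+1)}, b^{(t)}) = \min_a F(a, b^{(t)}) \le F(a^{(t)}, b^{(t)}) \quad \text{and} \quad F(a^{(t+1)}, b^{(t+1)}) = \min_b F(a^{(t+1)}, b) \le F(a^{(t+1)}, b^{(t)}),
so the sequence F(a^{(t)}, b^{(t)}) is non-increasing and, if F is bounded below, it converges (though not necessarily to the global minimum).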
K-means as Coordinate Descent (Slide credit: Carlos Guestrin)
Optimize the objective function: fix \mu, optimize a (or C)
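For fixed centers, the minimizer of F over the assignments (filling in the standard solution to this subproblem, which is not in the extracted text) puts each point with its nearest center:
a_{ij} = \begin{cases} 1 & \text{if } j = \arg\min_{j'} \|x_i - \mu_{j'}\|^2 \\ 0 & \text{otherwise} \end{cases}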
K-means as Coordinate Descent (Slide credit: Carlos Guestrin)
Optimize the objective function: fix a (or C), optimize \mu
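For fixed assignments, setting \nabla_{\mu_j} F(\mu, a) = 0 (again filling in the standard derivation for this subproblem) gives
\mu_j = \frac{\sum_i a_{ij} x_i}{\sum_i a_{ij}},
i.e. each center moves to the centroid of the points assigned to it, which is exactly the Recenter step.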
One important use of K-means: bag-of-words models in computer vision
Bag of Words model: represent a document by its word counts, e.g.
aardvark 0, about 2, all 2, Africa 1, apple 0, anxious 0, ..., gas 1, ..., oil 1, ..., Zaire 0
(Slide credit: Carlos Guestrin)
Object → Bag of 'words' (Slide credit: Fei-Fei Li)
Interest Point Features (Slide credit: Josef Sivic)
1. Detect patches [Mikolajczyk and Schmid '02] [Matas et al. '02] [Sivic et al. '03]
2. Normalize patch
3. Compute SIFT descriptor [Lowe '99]
Patch Features (Slide credit: Josef Sivic)
Dictionary Formation (Slide credit: Josef Sivic)
Clustering (usually k-means) → Vector quantization (Slide credit: Josef Sivic)
Clustered Image Patches (Figure credit: Fei-Fei et al. 2005)
Visual Synonyms and Polysemy (Slide credit: Andrew Zisserman)
– Visual Polysemy: a single visual word occurring on different (but locally similar) parts of different object categories
– Visual Synonyms: two different visual words representing a similar part of an object (the wheel of a motorbike)
Image representation: histogram of codeword frequencies (figure axes: frequency vs. codewords; Slide credit: Fei-Fei Li)
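A minimal sketch of the bag-of-visual-words pipeline sketched on these slides, assuming local descriptors (e.g. SIFT) have already been extracted. The function names are mine, and it reuses the kmeans sketch from earlier.

import numpy as np

def build_vocabulary(all_descriptors, k):
    # Dictionary formation: cluster training descriptors; centers become codewords
    codewords, _ = kmeans(all_descriptors, k)   # kmeans sketch from above
    return codewords

def bow_histogram(image_descriptors, codewords):
    # Vector quantization: map each descriptor to its nearest codeword ...
    d = ((image_descriptors[:, None, :] - codewords[None, :, :]) ** 2).sum(-1)
    nearest = d.argmin(axis=1)
    # ... then represent the image as a normalized codeword-frequency histogram
    hist = np.bincount(nearest, minlength=len(codewords)).astype(float)
    return hist / hist.sum()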
(One) bad case for K-means
– Clusters may overlap
– Some clusters may be "wider" than others
GMM to the rescue!
(Slide credit: Carlos Guestrin)
GMM (Figure credit: Kevin Murphy)
Recall: Multivariate Gaussians
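For reference, the multivariate Gaussian density being recalled here is the standard form
N(x \mid \mu, \Sigma) = \frac{1}{(2\pi)^{d/2} |\Sigma|^{1/2}} \exp\!\left( -\tfrac{1}{2} (x - \mu)^\top \Sigma^{-1} (x - \mu) \right).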
GMM (Figure credit: Kevin Murphy)
Hidden Data Causes Problems #1
– Fully observed (log) likelihood factorizes
– Marginal (log) likelihood doesn't factorize
– All parameters coupled!
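To make the contrast concrete (my notation, with mixing weights \pi_k): if the component labels z_i were observed, the joint log-likelihood
\log p(x, z) = \sum_i \left[ \log \pi_{z_i} + \log N(x_i \mid \mu_{z_i}, \Sigma_{z_i}) \right]
splits into separate terms per component, but after marginalizing out z,
\log p(x) = \sum_i \log \sum_{k=1}^{K} \pi_k \, N(x_i \mid \mu_k, \Sigma_k),
the log of a sum no longer splits, so all (\pi_k, \mu_k, \Sigma_k) are coupled.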
GMM vs. Gaussian Joint Bayes Classifier (on board)
– Observed Y vs. unobserved Z
– Likelihood vs. marginal likelihood
Hidden Data Causes Problems #2: Identifiability (Figure credit: Kevin Murphy)
– The likelihood is unchanged under permutations of the component labels, so the parameters are not uniquely identified
Hidden Data Causes Problems #3
– Likelihood has singularities if one Gaussian "collapses"
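Why the collapse is a problem (a standard argument, not spelled out on the slide): if one component with covariance \sigma_k^2 I centers exactly on a data point, that point's contribution
\pi_k \, N(x_i \mid x_i, \sigma_k^2 I) = \frac{\pi_k}{(2\pi\sigma_k^2)^{d/2}} \to \infty \quad \text{as } \sigma_k \to 0,
so the marginal likelihood is unbounded above and the MLE is degenerate.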
Special case: spherical Gaussians and hard assignments (Slide credit: Carlos Guestrin)
If P(x \mid z = k) is spherical, with the same variance \sigma^2 for all classes:
– P(x_i \mid z_i = k) \propto \exp\!\left( -\tfrac{1}{2\sigma^2} \|x_i - \mu_k\|^2 \right)
If each x_i belongs to exactly one class C(i) (hard assignment), the marginal likelihood is:
– P(x_1, \ldots, x_n \mid \mu_1, \ldots, \mu_k) \propto \prod_i \exp\!\left( -\tfrac{1}{2\sigma^2} \|x_i - \mu_{C(i)}\|^2 \right)
M(M)LE same as K-means!!!
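Taking logs makes the K-means connection explicit (a one-line step added here):
\log P(x_{1:n} \mid \mu_{1:k}) = -\frac{1}{2\sigma^2} \sum_i \|x_i - \mu_{C(i)}\|^2 + \text{const},
so maximizing over the centers \mu and the hard assignments C is exactly minimizing the K-means objective F(\mu, C).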
The K-means GMM assumption (Slide credit: Carlos Guestrin)
– There are k components
– Component i has an associated mean vector \mu_i
– Each component generates data from a Gaussian with mean \mu_i and covariance matrix \sigma^2 I
Each data point is generated according to the following recipe:
1. Pick a component at random: choose component i with probability P(y = i)
2. Datapoint \sim N(\mu_i, \sigma^2 I)
The General GMM assumption (Slide credit: Carlos Guestrin)
– There are k components
– Component i has an associated mean vector \mu_i
– Each component generates data from a Gaussian with mean \mu_i and covariance matrix \Sigma_i
Each data point is generated according to the following recipe:
1. Pick a component at random: choose component i with probability P(y = i)
2. Datapoint \sim N(\mu_i, \Sigma_i)
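A minimal NumPy sketch of this two-step generative recipe; the 2-D weights, means, and covariances below are made-up illustrative values, not from the slides.

import numpy as np

rng = np.random.default_rng(0)

# Illustrative 2-D GMM parameters
weights = np.array([0.5, 0.3, 0.2])                        # P(y = i)
means = np.array([[0.0, 0.0], [4.0, 4.0], [-3.0, 3.0]])    # mu_i
covs = np.array([np.eye(2), 0.5 * np.eye(2),
                 [[1.0, 0.8], [0.8, 1.0]]])                # Sigma_i

def sample_gmm(n):
    # 1. Pick a component at random with probability P(y = i)
    comps = rng.choice(len(weights), size=n, p=weights)
    # 2. Draw each datapoint from N(mu_i, Sigma_i) of its chosen component
    X = np.array([rng.multivariate_normal(means[i], covs[i]) for i in comps])
    return X, comps

X, y = sample_gmm(500)   # 500 points plus their (normally hidden) labels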
K-means vs. GMM
K-means demo:
– http://home.deib.polimi.it/matteucc/Clustering/tutorial_html/AppletKM.html
GMM demo:
–