ECE 5424: Introduction to Machine Learning. Topics: Expectation Maximization, for GMMs and for general latent model learning. Readings: Barber 20.1-20.3. Stefan Lee, Virginia Tech
Project Poster. Poster Presentation: Dec 6th 1:30-3:30pm, Goodwin Hall Atrium. Print your poster (or a bunch of slides): FedEx, Library, ECE support, CS support. Format: Portrait, 2 feet (24 inches) wide x 36 inches tall. See https://filebox.ece.vt.edu/~f16ece5424/project.html. Submit poster as PDF by Dec 6th 1:30pm. Makes up the final portion of your project grade. Best Project Prize!
How to Style a Scientific Poster. Lay out content consistently; top-to-bottom, left-to-right in columns is common, usually with numbered headings.
How to Style a Scientific Poster. DON'T
How to Style a Scientific Poster Be cautious with colors
How to Style a Scientific Poster Less text, more pictures. Bullets are your friend!
How to Style a Scientific Poster. Avoid giant tables. Don't use more than 2 fonts. Make sure everything is readable. Make sure each figure has an easy-to-find take-away message (write it nearby, maybe in an associated box).
How to Style a Scientific Poster
How to Present a Scientific Poster. Typical poster session at a conference: you stand by your poster and people stop to check it out. You will need: a 30-second summary that gives the visitor an overview and gauges interest; if they are not really interested, they will leave after this bit. A 3-5 minute full walkthrough: if someone stuck around past your 30-second summary or asked some follow-up questions, walk them through your poster in more detail. DON'T READ IT TO THEM! A bottle of water is typically useful.
Again: Poster Presentation, Dec 6th 1:30-3:30pm, Goodwin Hall Atrium. Print your poster (or a bunch of slides): FedEx, Library, ECE support, CS support. Format: Portrait, 2 feet (24 inches) wide x 36 inches tall. Submit poster as PDF by Dec 6th 1:30pm. Makes up the final portion of your project grade. If you are worried about your project, talk to me soon. Best Project Prize!
Homework & Grading. HW3 & HW4 should be graded this week. Will release solutions this week as well.
Final Exam. Dec 14th in class (DURH 261), 2:05 - 4:05 pm. Content: almost completely about material since the midterm: SVM, Neural Networks, Decision Trees, Ensemble Techniques, K-means, EM (today), Factor Analysis (Thursday). Question types: True/False (explain your choice, like last time), Multiple Choice, some "Prove this", some "What would happen with algorithm A on this dataset".
One last thing: SPOT surveys
Recap of Last Time (C) Dhruv Batra
Some Data (C) Dhruv Batra Slide Credit: Carlos Guestrin
K-means. Ask user how many clusters they'd like (e.g., k=5). (C) Dhruv Batra Slide Credit: Carlos Guestrin
K-means Ask user how many clusters they’d like. (e.g. k=5) Randomly guess k cluster Center locations (C) Dhruv Batra Slide Credit: Carlos Guestrin
K-means Ask user how many clusters they’d like. (e.g. k=5) Randomly guess k cluster Center locations Each datapoint finds out which Center it’s closest to. (Thus each Center “owns” a set of datapoints) (C) Dhruv Batra Slide Credit: Carlos Guestrin
K-means Ask user how many clusters they’d like. (e.g. k=5) Randomly guess k cluster Center locations Each datapoint finds out which Center it’s closest to. Each Center finds the centroid of the points it owns (C) Dhruv Batra Slide Credit: Carlos Guestrin
K-means Ask user how many clusters they’d like. (e.g. k=5) Randomly guess k cluster Center locations Each datapoint finds out which Center it’s closest to. Each Center finds the centroid of the points it owns… …and jumps there …Repeat until terminated! (C) Dhruv Batra Slide Credit: Carlos Guestrin
K-means. Randomly initialize k centers μ_1, …, μ_k. Assign: assign each point i ∈ {1,…,n} to its nearest center: C(i) ← argmin_j ||x_i − μ_j||². Recenter: μ_j becomes the centroid of the points it owns: μ_j ← mean of {x_i : C(i) = j}. (C) Dhruv Batra Slide Credit: Carlos Guestrin
K-means as Co-ordinate Descent. Optimize objective function: min over μ and C of F(μ, C) = Σ_i ||x_i − μ_{C(i)}||². Step 1: Fix μ, optimize a (or C). (C) Dhruv Batra Slide Credit: Carlos Guestrin
K-means as Co-ordinate Descent. Optimize objective function: min over μ and C of F(μ, C) = Σ_i ||x_i − μ_{C(i)}||². Step 2: Fix a (or C), optimize μ. (C) Dhruv Batra Slide Credit: Carlos Guestrin
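To make the assign/recenter rules and the coordinate-descent view concrete, here is a minimal K-means sketch (not from the slides; NumPy only, with illustrative names like `kmeans` and `max_iters`):

```python
# Minimal K-means sketch: alternate the assign step (fix mu, optimize C)
# and the recenter step (fix C, optimize mu) until the centers stop moving.
import numpy as np

def kmeans(X, k, max_iters=100, seed=0):
    rng = np.random.default_rng(seed)
    # Randomly initialize k centers by picking k data points
    centers = X[rng.choice(len(X), size=k, replace=False)]
    assign = np.zeros(len(X), dtype=int)
    for _ in range(max_iters):
        # Assign: each point goes to its nearest center
        dists = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        assign = dists.argmin(axis=1)
        # Recenter: each center moves to the centroid of the points it owns
        new_centers = np.array([X[assign == j].mean(axis=0) if np.any(assign == j)
                                else centers[j] for j in range(k)])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return centers, assign
```

Each step can only decrease the objective F(μ, C), which is why this coordinate descent terminates at a local optimum.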
Object Bag of ‘words’ Fei-Fei Li
Clustered Image Patches Fei-Fei et al. 2005
(One) bad case for k-means Clusters may overlap Some clusters may be “wider” than others GMM to the rescue! (C) Dhruv Batra Slide Credit: Carlos Guestrin
GMM (C) Dhruv Batra Figure Credit: Kevin Murphy
GMM (C) Dhruv Batra Figure Credit: Kevin Murphy
K-means vs GMM. K-Means applet: http://home.deib.polimi.it/matteucc/Clustering/tutorial_html/AppletKM.html GMM applet: http://www.socr.ucla.edu/applets.dir/mixtureem.html (C) Dhruv Batra
Hidden Data Causes Problems #1. Fully observed (log) likelihood factorizes. Marginal (log) likelihood doesn't factorize: all parameters coupled! (C) Dhruv Batra
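As a sketch of problem #1 in standard GMM notation (reconstructed, not copied from the slide): with observed z the log-likelihood splits into separate terms, while marginalizing out z puts a sum inside the log and couples all parameters.

```latex
% Fully observed case (z_i known): the complete-data log-likelihood factorizes
\log p(x, z \mid \theta) = \sum_{i=1}^{N} \Big[ \log \pi_{z_i} + \log \mathcal{N}(x_i \mid \mu_{z_i}, \Sigma_{z_i}) \Big]

% Hidden z_i: the marginal log-likelihood has a sum inside the log,
% so it no longer factorizes and all parameters are coupled
\log p(x \mid \theta) = \sum_{i=1}^{N} \log \sum_{j=1}^{k} \pi_j \, \mathcal{N}(x_i \mid \mu_j, \Sigma_j)
```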
Hidden Data Causes Problems #2 Identifiability (C) Dhruv Batra Figure Credit: Kevin Murphy
Hidden Data Causes Problems #3 Likelihood has singularities if one Gaussian “collapses” (C) Dhruv Batra
Special case: spherical Gaussians and hard assignments. If P(X|Z=k) is spherical, with the same σ² for all classes, and each x_i belongs to exactly one class C(i) (hard assignment), then the marginal likelihood is maximized exactly where the K-means objective is minimized: M(M)LE same as K-means!!! (C) Dhruv Batra Slide Credit: Carlos Guestrin
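A sketch of why this special case is K-means, using the standard derivation (assumed notation, not copied from the slide):

```latex
% Spherical Gaussian, same \sigma^2 for every class:
P(x_i \mid z_i = k) \propto \exp\!\Big( -\tfrac{1}{2\sigma^2}\, \lVert x_i - \mu_k \rVert^2 \Big)

% Hard assignments z_i = C(i): the likelihood of the data becomes
\prod_{i=1}^{N} P(x_i, z_i = C(i)) \propto \exp\!\Big( -\tfrac{1}{2\sigma^2} \sum_{i=1}^{N} \lVert x_i - \mu_{C(i)} \rVert^2 \Big)

% Maximizing over \mu and C is therefore exactly minimizing the K-means objective.
```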
The K-means GMM assumption. There are k components. Component i has an associated mean vector μi. [Figure: cluster means μ1, μ2, μ3] (C) Dhruv Batra Slide Credit: Carlos Guestrin
The K-means GMM assumption. There are k components. Component i has an associated mean vector μi. Each component generates data from a Gaussian with mean μi and covariance matrix σ²I. Each data point is generated according to the following recipe: (C) Dhruv Batra Slide Credit: Carlos Guestrin
The K-means GMM assumption. There are k components. Component i has an associated mean vector μi. Each component generates data from a Gaussian with mean μi and covariance matrix σ²I. Each data point is generated according to the following recipe: Pick a component at random: choose component i with probability P(y=i). (C) Dhruv Batra Slide Credit: Carlos Guestrin
The K-means GMM assumption. There are k components. Component i has an associated mean vector μi. Each component generates data from a Gaussian with mean μi and covariance matrix σ²I. Each data point is generated according to the following recipe: Pick a component at random: choose component i with probability P(y=i). Datapoint ~ N(μi, σ²I). (C) Dhruv Batra Slide Credit: Carlos Guestrin
The General GMM assumption. There are k components. Component i has an associated mean vector μi. Each component generates data from a Gaussian with mean μi and covariance matrix Σi. Each data point is generated according to the following recipe: Pick a component at random: choose component i with probability P(y=i). Datapoint ~ N(μi, Σi). (C) Dhruv Batra Slide Credit: Carlos Guestrin
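A minimal sketch of this generative recipe in code; the mixture weights, means, and covariances below are made-up example values, not taken from the slides.

```python
# Sample from a general GMM: pick a component with probability P(y=i),
# then draw the point from that component's Gaussian N(mu_i, Sigma_i).
import numpy as np

def sample_gmm(n, weights, means, covs, seed=0):
    rng = np.random.default_rng(seed)
    k = len(weights)
    # Pick a component at random for each of the n points
    y = rng.choice(k, size=n, p=weights)
    # Draw each datapoint from the Gaussian of its chosen component
    X = np.array([rng.multivariate_normal(means[i], covs[i]) for i in y])
    return X, y

# Example usage with 3 components in 2-D (illustrative parameters)
weights = [0.5, 0.3, 0.2]
means = [np.array([0.0, 0.0]), np.array([3.0, 3.0]), np.array([-3.0, 2.0])]
covs = [np.eye(2), 0.5 * np.eye(2), np.diag([1.0, 0.2])]
X, y = sample_gmm(500, weights, means, covs)
```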
K-means vs GMM. K-Means applet: http://home.deib.polimi.it/matteucc/Clustering/tutorial_html/AppletKM.html GMM applet: http://www.socr.ucla.edu/applets.dir/mixtureem.html (C) Dhruv Batra
EM: Expectation Maximization [Dempster '77]. Often looks like "soft" K-means. Extremely general, extremely useful algorithm: essentially THE go-to algorithm for unsupervised learning. Plan: EM for learning GMM parameters, then EM for general unsupervised learning problems. (C) Dhruv Batra
EM for Learning GMMs: Simple Update Rules. E-Step: estimate the posterior P(zi = j | xi) for each point under the current parameters. M-Step: maximize the full (complete-data) likelihood, weighted by those posteriors. (C) Dhruv Batra
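A minimal EM-for-GMM sketch implementing these two steps (assumes full covariances; function and variable names are illustrative, and this is one common formulation rather than the exact slide derivation):

```python
# EM for a Gaussian mixture: alternate the E-step (responsibilities
# r[i, j] = P(z_i = j | x_i)) and the M-step (responsibility-weighted MLE).
import numpy as np
from scipy.stats import multivariate_normal

def em_gmm(X, k, n_iters=100, seed=0):
    n, d = X.shape
    rng = np.random.default_rng(seed)
    pi = np.full(k, 1.0 / k)                      # mixing weights P(z = j)
    mu = X[rng.choice(n, size=k, replace=False)]  # initial means
    Sigma = np.array([np.cov(X.T) + 1e-6 * np.eye(d) for _ in range(k)])
    for _ in range(n_iters):
        # E-step: posterior responsibility of each component for each point
        r = np.column_stack([pi[j] * multivariate_normal.pdf(X, mu[j], Sigma[j])
                             for j in range(k)])
        r /= r.sum(axis=1, keepdims=True)
        # M-step: weighted maximum-likelihood updates for pi, mu, Sigma
        Nj = r.sum(axis=0)
        pi = Nj / n
        mu = (r.T @ X) / Nj[:, None]
        for j in range(k):
            diff = X - mu[j]
            Sigma[j] = (r[:, j, None] * diff).T @ diff / Nj[j] + 1e-6 * np.eye(d)
    return pi, mu, Sigma, r
```

With hard (0/1) responsibilities and shared spherical covariances, these updates collapse back to the K-means assign/recenter steps, which is the "soft K-means" intuition above.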
Gaussian Mixture Example: Start (C) Dhruv Batra Slide Credit: Carlos Guestrin
After 1st iteration (C) Dhruv Batra Slide Credit: Carlos Guestrin
After 2nd iteration (C) Dhruv Batra Slide Credit: Carlos Guestrin
After 3rd iteration (C) Dhruv Batra Slide Credit: Carlos Guestrin
After 4th iteration (C) Dhruv Batra Slide Credit: Carlos Guestrin
After 5th iteration (C) Dhruv Batra Slide Credit: Carlos Guestrin
After 6th iteration (C) Dhruv Batra Slide Credit: Carlos Guestrin
After 20th iteration (C) Dhruv Batra Slide Credit: Carlos Guestrin