ECE 5424: Introduction to Machine Learning

ECE 5424: Introduction to Machine Learning
Topics: Expectation Maximization
For GMMs
For General Latent Model Learning
Readings: Barber 20.1-20.3
Stefan Lee, Virginia Tech

Project Poster
Poster Presentation: Dec 6th, 1:30-3:30pm, Goodwin Hall Atrium
Print your poster (or a bunch of slides): FedEx, Library, ECE support, CS support
Format: Portrait, 2 feet (width) x 36 inches (height)
See https://filebox.ece.vt.edu/~f16ece5424/project.html
Submit poster as PDF by Dec 6th, 1:30pm
Makes up the final portion of your project grade
Best Project Prize!

How to Style a Scientific Poster
Lay out content consistently: top to bottom, left to right; columns are common, usually with numbered headings.

How to Style a Scientific Poster
DON'T

How to Style a Scientific Poster Be cautious with colors

How to Style a Scientific Poster Be cautious with colors

How to Style a Scientific Poster Less text, more pictures. Bullets are your friend!

How to Style a Scientific Poster
Avoid giant tables
Don't use more than 2 fonts
Make sure everything is readable
Make sure each figure has an easy-to-find take-away message (write it nearby, maybe in an associated box)

How to Style a Scientific Poster

How to Present a Scientific Poster
Typical poster session at a conference: you stand by your poster and people stop to check it out.
You will need:
A 30-second summary: gives the visitor an overview and gauges interest; if they are not really interested, they will leave after this bit.
A 3-5 minute full walkthrough: if someone stuck around past your 30-second summary or asked some follow-up questions, walk them through your poster in more detail. DON'T READ IT TO THEM!
A bottle of water is typically useful.

Again: Poster Presentation
Dec 6th, 1:30-3:30pm, Goodwin Hall Atrium
Print your poster (or a bunch of slides): FedEx, Library, ECE support, CS support
Format: Portrait, 2 feet (width) x 36 inches (height)
Submit poster as PDF by Dec 6th, 1:30pm
Makes up the final portion of your project grade
If you are worried about your project, talk to me soon.
Best Project Prize!

Homework & Grading
HW3 & HW4 should be graded this week.
Will release solutions this week as well.

Final Exam
Dec 14th in class (DURH 261); 2:05 - 4:05 pm
Content: almost completely about material since the midterm: SVM, Neural Networks, Decision Trees, Ensemble Techniques, K-means, EM (today), Factor Analysis (Thursday)
True/False (explain your choice like last time)
Multiple Choice
Some 'Prove this'
Some 'What would happen with algorithm A on this dataset?'

One last thing SPOT surveys

Recap of Last Time (C) Dhruv Batra

Some Data (C) Dhruv Batra Slide Credit: Carlos Guestrin

K-means
Ask user how many clusters they'd like. (e.g. k=5)
(C) Dhruv Batra Slide Credit: Carlos Guestrin

K-means Ask user how many clusters they’d like. (e.g. k=5) Randomly guess k cluster Center locations (C) Dhruv Batra Slide Credit: Carlos Guestrin

K-means Ask user how many clusters they’d like. (e.g. k=5) Randomly guess k cluster Center locations Each datapoint finds out which Center it’s closest to. (Thus each Center “owns” a set of datapoints) (C) Dhruv Batra Slide Credit: Carlos Guestrin

K-means Ask user how many clusters they’d like. (e.g. k=5) Randomly guess k cluster Center locations Each datapoint finds out which Center it’s closest to. Each Center finds the centroid of the points it owns (C) Dhruv Batra Slide Credit: Carlos Guestrin

K-means Ask user how many clusters they’d like. (e.g. k=5) Randomly guess k cluster Center locations Each datapoint finds out which Center it’s closest to. Each Center finds the centroid of the points it owns… …and jumps there …Repeat until terminated! (C) Dhruv Batra Slide Credit: Carlos Guestrin

K-means
Randomly initialize k centers: $\mu^{(0)} = \mu_1^{(0)}, \dots, \mu_k^{(0)}$
Assign: assign each point $i \in \{1, \dots, n\}$ to the nearest center:
$C(i) \leftarrow \arg\min_j \| x_i - \mu_j \|^2$
Recenter: $\mu_j$ becomes the centroid of the points it owns:
$\mu_j \leftarrow \frac{1}{|\{i : C(i) = j\}|} \sum_{i : C(i) = j} x_i$
(C) Dhruv Batra Slide Credit: Carlos Guestrin
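To make the assign/recenter loop concrete, here is a minimal NumPy sketch of plain K-means (variable names such as `X`, `k`, and `centers` are illustrative and not from the slides):

```python
import numpy as np

def kmeans(X, k, n_iters=100, seed=0):
    """Plain K-means: alternate Assign (nearest center) and Recenter (centroid of owned points)."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]  # random initial centers
    assign = np.zeros(len(X), dtype=int)
    for _ in range(n_iters):
        # Assign: each point goes to its closest center
        dists = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
        assign = dists.argmin(axis=1)
        # Recenter: each center jumps to the centroid of the points it owns
        new_centers = np.array([X[assign == j].mean(axis=0) if np.any(assign == j) else centers[j]
                                for j in range(k)])
        if np.allclose(new_centers, centers):
            break  # terminated: centers stopped moving
        centers = new_centers
    return centers, assign
```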

K-means as Co-ordinate Descent
Optimize the objective function:
$\min_{\mu} \min_{a} F(\mu, a) = \min_{\mu} \min_{a} \sum_{i=1}^{n} \sum_{j=1}^{k} a_{ij} \| x_i - \mu_j \|^2$
Fix $\mu$, optimize $a$ (or $C$)
(C) Dhruv Batra Slide Credit: Carlos Guestrin

K-means as Co-ordinate Descent
Optimize the objective function:
$\min_{\mu} \min_{a} F(\mu, a) = \min_{\mu} \min_{a} \sum_{i=1}^{n} \sum_{j=1}^{k} a_{ij} \| x_i - \mu_j \|^2$
Fix $a$ (or $C$), optimize $\mu$
(C) Dhruv Batra Slide Credit: Carlos Guestrin

Object Bag of ‘words’ Fei-Fei Li

Clustered Image Patches Fei-Fei et al. 2005

(One) bad case for k-means
Clusters may overlap
Some clusters may be "wider" than others
GMM to the rescue!
(C) Dhruv Batra Slide Credit: Carlos Guestrin

GMM (C) Dhruv Batra Figure Credit: Kevin Murphy

GMM (C) Dhruv Batra Figure Credit: Kevin Murphy

K-means vs GMM
K-Means applet: http://home.deib.polimi.it/matteucc/Clustering/tutorial_html/AppletKM.html
GMM applet: http://www.socr.ucla.edu/applets.dir/mixtureem.html
(C) Dhruv Batra

Hidden Data Causes Problems #1
Fully observed (log) likelihood factorizes
Marginal (log) likelihood doesn't factorize
All parameters coupled!
(C) Dhruv Batra
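To spell out why (a brief sketch; the notation $z_i$ for the hidden component of point $x_i$ and $\theta$ for all parameters is assumed, not taken from the slides): with the $z_i$ observed, the log-likelihood splits into independent per-point terms,

$$\log p(\mathbf{x}, \mathbf{z} \mid \theta) = \sum_{i=1}^{n} \Big[ \log p(z_i \mid \theta) + \log p(x_i \mid z_i, \theta) \Big],$$

whereas marginalizing out the $z_i$ puts a sum inside the log,

$$\log p(\mathbf{x} \mid \theta) = \sum_{i=1}^{n} \log \sum_{z_i = 1}^{k} p(z_i \mid \theta)\, p(x_i \mid z_i, \theta),$$

and that inner sum ties the parameters of all components together.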

Hidden Data Causes Problems #2 Identifiability (C) Dhruv Batra Figure Credit: Kevin Murphy

Hidden Data Causes Problems #3 Likelihood has singularities if one Gaussian “collapses” (C) Dhruv Batra
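Concretely (a sketch with assumed notation): if one component's mean sits exactly on a data point, $\mu_j = x_m$, then that point's density $\mathcal{N}(x_m \mid \mu_j, \sigma_j^2 I) = (2\pi\sigma_j^2)^{-d/2}$ grows without bound as $\sigma_j \to 0$, so the mixture likelihood has no finite maximum unless the covariances are constrained or regularized.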

Special case: spherical Gaussians and hard assignments
If $P(X \mid Z = k)$ is spherical, with the same $\sigma$ for all classes:
$P(x_i \mid z = k) \propto \exp\!\left( -\tfrac{1}{2\sigma^2} \| x_i - \mu_k \|^2 \right)$
If each $x_i$ belongs to one class $C(i)$ (hard assignment), the likelihood is:
$\prod_i P(x_i, z = C(i)) \propto \prod_i \exp\!\left( -\tfrac{1}{2\sigma^2} \| x_i - \mu_{C(i)} \|^2 \right)$
Maximum (marginal) likelihood estimation is the same as K-means!!!
(C) Dhruv Batra Slide Credit: Carlos Guestrin
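Taking logs makes the equivalence explicit (a one-line sketch in the notation above):

$$\log \prod_i \exp\!\left( -\tfrac{1}{2\sigma^2} \| x_i - \mu_{C(i)} \|^2 \right) = -\tfrac{1}{2\sigma^2} \sum_i \| x_i - \mu_{C(i)} \|^2 + \text{const},$$

so maximizing over the hard assignments $C$ and the means $\mu$ is exactly minimizing the K-means objective $\sum_i \| x_i - \mu_{C(i)} \|^2$.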

The K-means GMM assumption
There are k components
Component i has an associated mean vector $\mu_i$
[Figure: data with component means $\mu_1$, $\mu_2$, $\mu_3$]
(C) Dhruv Batra Slide Credit: Carlos Guestrin

The K-means GMM assumption
There are k components
Component i has an associated mean vector $\mu_i$
Each component generates data from a Gaussian with mean $\mu_i$ and covariance matrix $\sigma^2 I$
Each data point is generated according to the following recipe:
[Figure: component means $\mu_1$, $\mu_2$, $\mu_3$]
(C) Dhruv Batra Slide Credit: Carlos Guestrin

The K-means GMM assumption
There are k components
Component i has an associated mean vector $\mu_i$
Each component generates data from a Gaussian with mean $\mu_i$ and covariance matrix $\sigma^2 I$
Each data point is generated according to the following recipe:
Pick a component at random: choose component i with probability $P(y = i)$
[Figure: chosen component mean $\mu_2$]
(C) Dhruv Batra Slide Credit: Carlos Guestrin

The K-means GMM assumption
There are k components
Component i has an associated mean vector $\mu_i$
Each component generates data from a Gaussian with mean $\mu_i$ and covariance matrix $\sigma^2 I$
Each data point is generated according to the following recipe:
Pick a component at random: choose component i with probability $P(y = i)$
Datapoint $\sim N(\mu_i, \sigma^2 I)$
[Figure: point x drawn around $\mu_2$]
(C) Dhruv Batra Slide Credit: Carlos Guestrin

The General GMM assumption
There are k components
Component i has an associated mean vector $\mu_i$
Each component generates data from a Gaussian with mean $\mu_i$ and covariance matrix $\Sigma_i$
Each data point is generated according to the following recipe:
Pick a component at random: choose component i with probability $P(y = i)$
Datapoint $\sim N(\mu_i, \Sigma_i)$
[Figure: component means $\mu_1$, $\mu_2$, $\mu_3$ with different covariances]
(C) Dhruv Batra Slide Credit: Carlos Guestrin
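A small NumPy sketch of this generative recipe (the mixture weights, means, and covariances below are made-up illustrations, not values from the lecture):

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative GMM parameters: 3 components in 2-D
weights = np.array([0.5, 0.3, 0.2])                       # P(y = i)
means = np.array([[0.0, 0.0], [4.0, 4.0], [-3.0, 3.0]])   # mu_i
covs = np.array([np.eye(2),
                 [[1.0, 0.6], [0.6, 2.0]],
                 0.5 * np.eye(2)])                         # Sigma_i (general, not spherical)

def sample_gmm(n):
    """Pick a component with probability P(y=i), then draw the point from N(mu_i, Sigma_i)."""
    ys = rng.choice(len(weights), size=n, p=weights)
    X = np.array([rng.multivariate_normal(means[y], covs[y]) for y in ys])
    return X, ys

X, ys = sample_gmm(500)
```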

K-means vs GMM
K-Means applet: http://home.deib.polimi.it/matteucc/Clustering/tutorial_html/AppletKM.html
GMM applet: http://www.socr.ucla.edu/applets.dir/mixtureem.html
(C) Dhruv Batra

EM
Expectation Maximization [Dempster '77]
Often looks like "soft" K-means
Extremely general, extremely useful algorithm
Essentially THE go-to algorithm for unsupervised learning
Plan:
EM for learning GMM parameters
EM for general unsupervised learning problems
(C) Dhruv Batra

EM for Learning GMMs
Simple update rules:
E-Step: estimate $P(z_i = j \mid x_i)$
M-Step: maximize the full likelihood weighted by the posterior
(C) Dhruv Batra
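A compact NumPy/SciPy sketch of these two steps for a GMM with full covariances (a minimal illustration under assumed variable names, not the exact code behind the slides; `X` is an n x d data matrix):

```python
import numpy as np
from scipy.stats import multivariate_normal

def em_gmm(X, k, n_iters=50, seed=0):
    """EM for a Gaussian mixture: E-step computes P(z_i=j | x_i), M-step refits pi, mu, Sigma."""
    n, d = X.shape
    rng = np.random.default_rng(seed)
    pi = np.full(k, 1.0 / k)                         # mixture weights
    mu = X[rng.choice(n, size=k, replace=False)]     # means initialized at random data points
    Sigma = np.array([np.cov(X.T) + 1e-6 * np.eye(d) for _ in range(k)])

    for _ in range(n_iters):
        # E-step: responsibilities r[i, j] = P(z_i = j | x_i), via Bayes rule
        r = np.column_stack([pi[j] * multivariate_normal.pdf(X, mu[j], Sigma[j]) for j in range(k)])
        r /= r.sum(axis=1, keepdims=True)

        # M-step: maximize the full likelihood weighted by the posterior
        Nj = r.sum(axis=0)
        pi = Nj / n
        mu = (r.T @ X) / Nj[:, None]
        for j in range(k):
            diff = X - mu[j]
            Sigma[j] = (r[:, j, None] * diff).T @ diff / Nj[j] + 1e-6 * np.eye(d)
    return pi, mu, Sigma, r
```

Each iteration is guaranteed not to decrease the marginal log-likelihood, which is why the fits in the following slides steadily improve.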

Gaussian Mixture Example: Start (C) Dhruv Batra Slide Credit: Carlos Guestrin

After 1st iteration (C) Dhruv Batra Slide Credit: Carlos Guestrin

After 2nd iteration (C) Dhruv Batra Slide Credit: Carlos Guestrin

After 3rd iteration (C) Dhruv Batra Slide Credit: Carlos Guestrin

After 4th iteration (C) Dhruv Batra Slide Credit: Carlos Guestrin

After 5th iteration (C) Dhruv Batra Slide Credit: Carlos Guestrin

After 6th iteration (C) Dhruv Batra Slide Credit: Carlos Guestrin

After 20th iteration (C) Dhruv Batra Slide Credit: Carlos Guestrin