Gaussian Mixture Models and Expectation Maximization.

The Problem
You have data that you believe is drawn from n populations.
You want to identify the parameters of each population.
You don't know anything about the populations a priori – except that you believe they're Gaussian…
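As a concrete illustration (not from the original slides), here is a minimal numpy sketch of this setup: data sampled from two hypothetical Gaussian populations whose parameters and component labels we then pretend not to know.

```python
# Synthetic data from two hypothetical Gaussian populations.
import numpy as np

rng = np.random.default_rng(0)

# "True" parameters we pretend not to know.
mu_true = [np.array([-2.0, 0.0]), np.array([3.0, 1.0])]
cov_true = [np.eye(2), np.array([[1.0, 0.5], [0.5, 1.5]])]

X = np.vstack([
    rng.multivariate_normal(mu, cov, size=200)
    for mu, cov in zip(mu_true, cov_true)
])
rng.shuffle(X)  # the component labels are unobserved
print(X.shape)  # (400, 2)
```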

Gaussian Mixture Models
Rather than identifying clusters by "nearest" centroids, fit a set of k Gaussians to the data.
Maximum likelihood over a mixture model.
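In practice a library call does this directly; a hedged sketch using scikit-learn (assuming it is installed, and that X is an (N, d) data array such as the one generated above):

```python
# Fit k Gaussians to X by maximum likelihood over a mixture model.
from sklearn.mixture import GaussianMixture

gmm = GaussianMixture(n_components=2, covariance_type="full", random_state=0)
gmm.fit(X)                # X: (N, d) numpy array of observations

print(gmm.weights_)       # mixing coefficients
print(gmm.means_)         # component means
print(gmm.covariances_)   # component covariances
```

The rest of the lecture develops what a fit like this is doing under the hood.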

GMM example

Mixture Models
Formally, a mixture model is a weighted sum of a number of pdfs, where the weights are determined by a distribution over the components:
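The formula itself appears as an image in the original slides; the standard mixture density it refers to is:

```latex
p(x) = \sum_{k=1}^{K} \pi_k \, p_k(x),
\qquad \pi_k \ge 0, \qquad \sum_{k=1}^{K} \pi_k = 1
```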

Gaussian Mixture Models
GMM: a weighted sum of a number of Gaussians, where the weights are determined by a distribution over the components:
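Again the equation is an image in the original; the standard Gaussian mixture density is:

```latex
p(x) = \sum_{k=1}^{K} \pi_k \, \mathcal{N}(x \mid \mu_k, \Sigma_k)
```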

Graphical Models with Unobserved Variables
What if you have variables in a graphical model that are never observed? – Latent variables.
Training latent variable models is an unsupervised learning application.
(Slide figure: an example graphical model with variables labeled laughing, amused, sweating, and uncomfortable.)

Latent Variable HMMs
We can cluster sequences using an HMM with unobserved state variables.
We will train latent variable models using Expectation Maximization.

Expectation Maximization
The training of GMMs with latent variables can be accomplished using Expectation Maximization.
– Step 1: Expectation (E-step) – evaluate the "responsibilities" of each cluster with the current parameters.
– Step 2: Maximization (M-step) – re-estimate the parameters using the existing "responsibilities".
Similar to k-means training.

Latent Variable Representation
We can represent a GMM in terms of a latent variable z indicating which component generated each point.
What does this give us?
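The representation shown on the slide (as an image) is the usual one: a one-of-K latent indicator z with

```latex
p(z_k = 1) = \pi_k, \qquad
p(x \mid z_k = 1) = \mathcal{N}(x \mid \mu_k, \Sigma_k), \qquad
p(x) = \sum_{z} p(z)\, p(x \mid z) = \sum_{k=1}^{K} \pi_k \, \mathcal{N}(x \mid \mu_k, \Sigma_k)
```

Recovering the familiar GMM density as a marginal is exactly what lets us reason about p(z|x) below.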

GMM data and Latent variables

One last bit
We have representations of the joint p(x,z) and the marginal p(x)…
The conditional p(z|x) can be derived using Bayes' rule.
– This is the responsibility that a mixture component takes for explaining an observation x.
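Written out (the slide shows it as an image), the standard Bayes'-rule expression for the responsibility of component k for point x_n is:

```latex
\gamma(z_{nk}) = p(z_k = 1 \mid x_n)
= \frac{\pi_k \, \mathcal{N}(x_n \mid \mu_k, \Sigma_k)}
       {\sum_{j=1}^{K} \pi_j \, \mathcal{N}(x_n \mid \mu_j, \Sigma_j)}
```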

Maximum Likelihood over a GMM
As usual: identify a likelihood function and set its partial derivatives to zero…
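The likelihood in question (an image on the slide) is the standard GMM log likelihood:

```latex
\ln p(X \mid \pi, \mu, \Sigma) = \sum_{n=1}^{N} \ln \left( \sum_{k=1}^{K} \pi_k \, \mathcal{N}(x_n \mid \mu_k, \Sigma_k) \right)
```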

Maximum Likelihood of a GMM
Optimization of the means.
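Setting the derivative with respect to μ_k to zero gives the standard result (reproduced here since the slide's equation is an image):

```latex
\mu_k = \frac{1}{N_k} \sum_{n=1}^{N} \gamma(z_{nk})\, x_n,
\qquad N_k = \sum_{n=1}^{N} \gamma(z_{nk})
```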

Maximum Likelihood of a GMM
Optimization of the covariances.
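Similarly, for the covariances:

```latex
\Sigma_k = \frac{1}{N_k} \sum_{n=1}^{N} \gamma(z_{nk})\, (x_n - \mu_k)(x_n - \mu_k)^{\mathsf{T}}
```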

Maximum Likelihood of a GMM
Optimization of the mixing term.
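And maximizing with respect to the mixing coefficients, under the constraint that they sum to one (enforced with a Lagrange multiplier):

```latex
\pi_k = \frac{N_k}{N}
```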

MLE of a GMM

EM for GMMs
Initialize the parameters.
– Evaluate the log likelihood.
Expectation step: evaluate the responsibilities.
Maximization step: re-estimate the parameters.
– Evaluate the log likelihood.
– Check for convergence.
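A minimal numpy sketch of this loop (my own illustration, not the lecture's code; it assumes an (N, d) data array X such as the one generated earlier, and omits any covariance regularization):

```python
# A minimal EM loop for a Gaussian mixture, following the steps above.
import numpy as np
from scipy.stats import multivariate_normal


def em_gmm(X, K, n_iters=100, tol=1e-6, seed=0):
    rng = np.random.default_rng(seed)
    N, d = X.shape

    # Initialization: random data points as means, identity covariances,
    # uniform mixing coefficients.
    mu = X[rng.choice(N, size=K, replace=False)]
    Sigma = np.stack([np.eye(d) for _ in range(K)])
    pi = np.full(K, 1.0 / K)

    prev_ll = -np.inf
    for _ in range(n_iters):
        # E-step: responsibilities gamma[n, k] = p(z_k = 1 | x_n).
        dens = np.stack(
            [pi[k] * multivariate_normal.pdf(X, mean=mu[k], cov=Sigma[k])
             for k in range(K)],
            axis=1,
        )                                   # shape (N, K)
        gamma = dens / dens.sum(axis=1, keepdims=True)

        # M-step: re-estimate parameters from the responsibilities.
        Nk = gamma.sum(axis=0)              # effective counts, shape (K,)
        mu = (gamma.T @ X) / Nk[:, None]
        for k in range(K):
            diff = X - mu[k]
            Sigma[k] = (gamma[:, k, None] * diff).T @ diff / Nk[k]
        pi = Nk / N

        # Log likelihood (under the parameters used in this E-step)
        # and convergence check.
        ll = np.log(dens.sum(axis=1)).sum()
        if abs(ll - prev_ll) < tol:
            break
        prev_ll = ll

    return pi, mu, Sigma, gamma
```

On the synthetic data generated earlier, em_gmm(X, K=2) should recover means close to the hypothetical values used to generate it, up to permutation of the components.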

EM for GMMs
E-step: evaluate the responsibilities.

EM for GMMs
M-step: re-estimate the parameters.

Visual example of EM

Potential Problems
– Incorrect number of mixture components.
– Singularities: a component can collapse onto a single data point, sending its likelihood to infinity.
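One common guard against the singularity problem (not shown on the slides) is to add a small ridge to each estimated covariance; scikit-learn's GaussianMixture exposes this as a reg_covar parameter. A standalone sketch:

```python
import numpy as np

def regularize_covariances(Sigma, reg_covar=1e-6):
    # Add a small diagonal term to each (d, d) covariance in the
    # (K, d, d) stack Sigma so no component collapses to a singular matrix.
    d = Sigma.shape[-1]
    return Sigma + reg_covar * np.eye(d)
```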

Incorrect Number of Gaussians

Relationship to K-means
K-means makes hard decisions.
– Each data point gets assigned to a single cluster.
GMM/EM makes soft decisions.
– Each data point yields a posterior p(z|x).
Soft K-means is a special case of EM.
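In code terms (an illustrative sketch; gamma here plays the role of the (N, K) responsibility matrix computed in the EM sketch above):

```python
import numpy as np

# Toy responsibilities for two points and two components.
gamma = np.array([[0.9, 0.1],
                  [0.4, 0.6]])

soft_assignment = gamma                  # GMM/EM: a full posterior p(z | x) per point
hard_assignment = gamma.argmax(axis=1)   # K-means-style: one cluster per point
print(hard_assignment)                   # [0 1]
```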

General form of EM
Given a joint distribution p(X, Z | θ) over observed and latent variables, we want to maximize the likelihood p(X | θ).
1. Initialize the parameters θ_old.
2. E-step: evaluate p(Z | X, θ_old).
3. M-step: re-estimate the parameters by maximizing the expectation of the complete-data log likelihood.
4. Check for convergence of the parameters or the likelihood; if not converged, set θ_old ← θ_new and return to step 2.
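Writing out the quantities that appear as images on the slide, in the standard notation:

```latex
\text{E-step:}\quad \text{evaluate } p(Z \mid X, \theta^{\text{old}})
\qquad
\text{M-step:}\quad \theta^{\text{new}} = \arg\max_{\theta}\; \mathcal{Q}(\theta, \theta^{\text{old}}),
\quad
\mathcal{Q}(\theta, \theta^{\text{old}}) = \sum_{Z} p(Z \mid X, \theta^{\text{old}}) \, \ln p(X, Z \mid \theta)
```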