Expectation-Maximization (EM) Chapter 3 (Duda et al.) – Section 3.9

Expectation-Maximization (EM) Chapter 3 (Duda et al.) – Section 3.9 CS479/679 Pattern Recognition Dr. George Bebis

Expectation-Maximization (EM) EM is an iterative method for performing ML estimation: it starts with an initial estimate of θ and iteratively refines the current estimate so as to increase the likelihood of the observed data, p(D | θ).

Expectation-Maximization (EM) EM represents a general framework – it works best in situations where the data is incomplete (or can be thought of as incomplete). Some creativity is required to recognize where the EM algorithm can be used. EM is the standard method for estimating the parameters of Mixtures of Gaussians (MoG).

Incomplete Data Many times, it is impossible to apply ML estimation because certain features cannot be measured directly. The EM algorithm is ideal for problems with unobserved (missing) data.

Example (Moon, 1996) Assume a trinomial distribution over counts (x1, x2, x3) with x1 + x2 + x3 = k.
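For reference, the general trinomial pmf implied by this constraint is shown below; the cell probabilities p1, p2, p3 stand in for whatever parameterization Moon actually uses:

```latex
P(x_1, x_2, x_3 \mid p_1, p_2, p_3)
  = \frac{k!}{x_1!\,x_2!\,x_3!}\; p_1^{x_1} p_2^{x_2} p_3^{x_3},
\qquad x_1 + x_2 + x_3 = k,\quad p_1 + p_2 + p_3 = 1.
```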

Example (Moon, 1996) (cont’d)

EM: Main Idea If the complete data x were available, we could estimate θ directly with ML. Since x is not available: maximize the expectation of ln p(Dx | θ) with respect to the unknown variables, given Dy and an estimate of θ.
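Written out explicitly (a standard formulation; the symbol Q and the superscript t for the current estimate are my notation, not necessarily the slide's):

```latex
Q(\theta;\,\theta^{t}) \;=\; E\big[\ln p(D_x \mid \theta)\,\big|\, D_y,\ \theta^{t}\big],
\qquad \theta^{t+1} = \arg\max_{\theta}\, Q(\theta;\,\theta^{t}).
```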

EM Steps (1) Initialization (2) Expectation (3) Maximization (4) Test for convergence

EM Steps (cont’d) (1) Initialization Step: initialize the algorithm with a guess θ0. (2) Expectation Step: performed with respect to the unobserved variables, using the current estimate of the parameters and conditioned upon the observations. When ln p(Dx | θ) is a linear function of the unobserved variables, the expectation step is equivalent to computing the expected values of the unobserved variables and substituting them into ln p(Dx | θ).

EM Steps (cont’d) (3) Maximization Step: provides a new estimate of the parameters by maximizing the expectation computed in the E-step. (4) Test for Convergence: if the change in the estimates (e.g., ||θt+1 − θt||) falls below a threshold, stop; otherwise, go to Step 2.
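A minimal sketch of this four-step loop in Python (the callables e_step and m_step, and the tolerance eps, are placeholders for the problem-specific pieces rather than names from the slides):

```python
import numpy as np

def em(observed, init_theta, e_step, m_step, eps=1e-6, max_iter=200):
    """Generic EM skeleton: (1) initialize, (2) E-step, (3) M-step, (4) convergence test."""
    theta = np.asarray(init_theta, dtype=float)          # (1) initialization with a guess theta_0
    for _ in range(max_iter):
        expectations = e_step(observed, theta)           # (2) expected values of the unobserved variables
        new_theta = np.asarray(m_step(observed, expectations), dtype=float)  # (3) new estimate
        if np.linalg.norm(new_theta - theta) < eps:      # (4) test for convergence
            return new_theta
        theta = new_theta
    return theta
```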

Example (Moon, 1996) (cont’d)

Example (Moon, 1996) (cont’d) Take the expected value of the log-likelihood. Let’s look at the M-step for a minute before completing the E-step …

Example (Moon, 1996) (cont’d) We only need to estimate the expectations that appear in this update. Let’s go back and complete the E-step now …

Example (Moon, 1996) (cont’d) (see Moon’s paper, page 53, for a proof)

Example (Moon, 1996) (cont’d) Initialization: θ0. Expectation Step. Maximization Step. Convergence Step.

Example (Moon, 1996) (cont’d)

Convergence properties of EM The solution depends on the initial estimate θ0. At each iteration, a value of θ is computed so that the likelihood function does not decrease. There is no guarantee that it will converge to a global maximum. The algorithm is guaranteed to be stable, i.e., there is no chance of "overshooting" or diverging from the maximum.
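Stated compactly, the guarantee described above is the standard EM monotonicity property:

```latex
\ln p(D \mid \theta^{t+1}) \;\ge\; \ln p(D \mid \theta^{t}) \qquad \text{for every iteration } t.
```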

Mixture of 2D Gaussians - Example

Mixture Model A weighted combination of component densities with mixing weights π1, π2, π3, …, πk.

Mixture of 1D Gaussians - Example (mixing weights π1 = 0.3, π2 = 0.2, π3 = 0.5)

Mixture Parameters
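For reference, the mixture density and its parameters can be written in the standard form (using the πk, θk notation of the surrounding slides):

```latex
p(x \mid \Theta) \;=\; \sum_{k=1}^{K} \pi_k\, p(x \mid \theta_k),
\qquad \sum_{k=1}^{K} \pi_k = 1,\quad \pi_k \ge 0,
\qquad \Theta = \{\pi_k, \theta_k\}_{k=1}^{K}.
```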

Fitting a Mixture Model to a set of observations Dx Two fundamental problems: (1) Estimate the number of mixture components K (2) Estimate mixture parameters (πk , θk), k=1,2,…,K

Mixtures of Gaussians (see Chapter 10) where each component density is a multivariate Gaussian, $p(x \mid \theta_k) = \frac{1}{(2\pi)^{d/2}\,|\Sigma_k|^{1/2}}\exp\!\big(-\tfrac{1}{2}(x-\mu_k)^{T}\Sigma_k^{-1}(x-\mu_k)\big)$; the parameters θk are (μk, Σk).

Mixtures of Gaussians (cont’d) Gaussian components weighted by π1, π2, π3, …, πk.

Estimating Mixture Parameters Using ML – not easy! (the log-likelihood contains the log of a sum over components, so the parameters do not decouple)

Estimating Mixture Parameters Using EM: Case of Unknown Means Assumptions: only the component means are unknown; the remaining parameters are assumed known. Observation: if we knew which component had generated each sample, estimating the means would be easy … but we don’t!

Estimating Mixture Parameters Using EM: Case of Unknown Means (cont’d) Introduce hidden or unobserved variables zi indicating which component generated each sample xi.

Estimating Mixture Parameters Using EM: Case of Unknown Means (cont’d) Main steps using EM

Estimating Mixture Parameters Using EM: Case of Unknown Means (cont’d) Expectation Step E(zik) is just the probability that xi was generated by the k-th component:
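The formula itself is not in the transcript; a standard form, assuming (as in this unknown-means case) spherical components with a common known variance σ² and known mixing weights πk, is:

```latex
E[z_{ik}] \;=\;
\frac{\pi_k \,\exp\!\big(-\lVert x_i-\mu_k\rVert^2 / 2\sigma^2\big)}
     {\sum_{j=1}^{K} \pi_j \,\exp\!\big(-\lVert x_i-\mu_j\rVert^2 / 2\sigma^2\big)} .
```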

Estimating Mixture Parameters Using EM: Case of Unknown Means (cont’d) Maximization Step
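The resulting update re-estimates each mean as a responsibility-weighted average of the samples (a standard M-step for this case, shown here since the slide's equation is not in the transcript):

```latex
\mu_k^{\text{new}} \;=\; \frac{\sum_{i=1}^{n} E[z_{ik}]\, x_i}{\sum_{i=1}^{n} E[z_{ik}]} .
```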

Estimating Mixture Parameters Using EM: Case of Unknown Means (cont’d) Summary
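Putting the E- and M-steps above together, a minimal sketch in Python (assuming, as above, spherical components with a known shared variance sigma2 and known mixing weights pi; all names are illustrative, not from the slides):

```python
import numpy as np

def em_unknown_means(X, mu, pi, sigma2, n_iter=100, tol=1e-6):
    """EM for a Gaussian mixture in which only the component means are unknown.
    X: (n, d) data, mu: (K, d) initial means, pi: (K,) known mixing weights,
    sigma2: known common variance of the spherical components."""
    for _ in range(n_iter):
        # E-step: responsibilities E[z_ik] proportional to pi_k * exp(-||x_i - mu_k||^2 / (2*sigma2))
        sq_dist = ((X[:, None, :] - mu[None, :, :]) ** 2).sum(axis=2)     # (n, K)
        log_r = np.log(pi)[None, :] - sq_dist / (2.0 * sigma2)
        log_r -= log_r.max(axis=1, keepdims=True)                         # numerical stability
        r = np.exp(log_r)
        r /= r.sum(axis=1, keepdims=True)                                 # (n, K) responsibilities

        # M-step: each mean is a responsibility-weighted average of the samples
        new_mu = (r.T @ X) / r.sum(axis=0)[:, None]

        if np.linalg.norm(new_mu - mu) < tol:                             # convergence test
            return new_mu
        mu = new_mu
    return mu
```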

Estimating Mixture Parameters Using EM: General Case Need to review Lagrange Optimization first …

Lagrange Optimization To maximize f(x) subject to the constraint g(x) = 0, form the Lagrangian L(x, λ) = f(x) + λ g(x), set its partial derivatives to zero, and solve for x and λ: n+1 equations in n+1 unknowns.

Lagrange Optimization (cont’d) Example Maximize f(x1,x2)=x1x2 subject to the constraint g(x1,x2)=x1+x2-1=0 3 equations / 3 unknowns
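Working the example through with the Lagrangian L(x1, x2, λ) = x1x2 + λ(x1 + x2 − 1) (the slide's derivation is not shown, but the arithmetic is standard):

```latex
\frac{\partial L}{\partial x_1} = x_2 + \lambda = 0,\qquad
\frac{\partial L}{\partial x_2} = x_1 + \lambda = 0,\qquad
\frac{\partial L}{\partial \lambda} = x_1 + x_2 - 1 = 0
\;\;\Longrightarrow\;\;
x_1 = x_2 = \tfrac{1}{2},\quad \lambda = -\tfrac{1}{2},\quad f = \tfrac{1}{4}.
```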

Estimating Mixture Parameters Using EM: General Case Introduce hidden or unobserved variables zi

Estimating Mixture Parameters Using EM: General Case (cont’d) Expectation Step

Estimating Mixture Parameters Using EM: General Case (cont’d) Expectation Step (cont’d)
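The expectation computed here is the responsibility of component k for sample xi; a standard form for the general Gaussian case (the symbol γik is my notation) is:

```latex
\gamma_{ik} \;=\; E[z_{ik}] \;=\;
\frac{\pi_k\, \mathcal{N}(x_i \mid \mu_k, \Sigma_k)}
     {\sum_{j=1}^{K} \pi_j\, \mathcal{N}(x_i \mid \mu_j, \Sigma_j)} .
```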

Estimating Mixture Parameters Using EM: General Case (cont’d) Maximization Step: use Lagrange optimization to handle the constraint Σk πk = 1 on the mixing weights.

Estimating Mixture Parameters Using EM: General Case (cont’d) Maximization Step (cont’d)
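The resulting updates take the standard GMM form (a reconstruction consistent with the responsibilities γik defined above, not copied from the slide):

```latex
\pi_k^{\text{new}} = \frac{1}{n}\sum_{i=1}^{n}\gamma_{ik},\qquad
\mu_k^{\text{new}} = \frac{\sum_{i=1}^{n}\gamma_{ik}\, x_i}{\sum_{i=1}^{n}\gamma_{ik}},\qquad
\Sigma_k^{\text{new}} = \frac{\sum_{i=1}^{n}\gamma_{ik}\,(x_i-\mu_k^{\text{new}})(x_i-\mu_k^{\text{new}})^{T}}{\sum_{i=1}^{n}\gamma_{ik}} .
```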

Estimating Mixture Parameters Using EM: General Case (cont’d) Summary
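A compact sketch of the full procedure in Python (illustrative only: the initialization scheme, variable names, and the log-likelihood-based convergence test are assumptions, not taken from the slides):

```python
import numpy as np

def gaussian_pdf(X, mu, Sigma):
    """Multivariate normal density evaluated at each row of X."""
    d = X.shape[1]
    diff = X - mu
    inv = np.linalg.inv(Sigma)
    norm = 1.0 / np.sqrt(((2 * np.pi) ** d) * np.linalg.det(Sigma))
    return norm * np.exp(-0.5 * np.einsum('ij,jk,ik->i', diff, inv, diff))

def em_gmm(X, K, n_iter=100, tol=1e-6, seed=0):
    """EM for a K-component Gaussian mixture: estimates pi_k, mu_k, Sigma_k."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    pi = np.full(K, 1.0 / K)                        # start from equal mixing weights
    mu = X[rng.choice(n, K, replace=False)]         # initialize means at random samples
    Sigma = np.stack([np.cov(X.T) + 1e-6 * np.eye(d)] * K)
    prev_ll = -np.inf
    for _ in range(n_iter):
        # E-step: responsibilities gamma_ik
        dens = np.stack([pi[k] * gaussian_pdf(X, mu[k], Sigma[k]) for k in range(K)], axis=1)
        gamma = dens / dens.sum(axis=1, keepdims=True)
        # M-step: re-estimate weights, means, and covariances
        Nk = gamma.sum(axis=0)
        pi = Nk / n
        mu = (gamma.T @ X) / Nk[:, None]
        for k in range(K):
            diff = X - mu[k]
            Sigma[k] = (gamma[:, k, None] * diff).T @ diff / Nk[k] + 1e-6 * np.eye(d)
        # Convergence test on the observed-data log-likelihood
        ll = np.log(dens.sum(axis=1)).sum()
        if abs(ll - prev_ll) < tol:
            break
        prev_ll = ll
    return pi, mu, Sigma
```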

Estimating the Number of Components K