EM Algorithm
Presenter: 虞台文
Contents
Introduction
Example: Missing Data
Example: Mixed Attributes
Example: Mixture
Main Body
Mixture Model
EM-Algorithm on GMM
EM Algorithm Introduction
Introduction
EM is typically used to compute maximum-likelihood estimates from incomplete samples. The EM algorithm estimates the parameters of a model iteratively, starting from some initial guess. Each iteration consists of:
an E-step (Expectation step)
an M-step (Maximization step)
Applications
Filling in missing data in samples
Discovering the values of latent variables
Estimating the parameters of HMMs
Estimating the parameters of finite mixtures
Unsupervised learning of clusters
…
EM Algorithm Example: Missing Data
Univariate Normal Sample
Sampling: draw n i.i.d. samples x = (x1, …, xn), each xi ~ N(μ, σ²).
Maximum Likelihood
The sample x = (x1, …, xn) gives the likelihood
L(μ, σ² | x) = ∏_{i=1}^{n} p(xi | μ, σ²) = ∏_{i=1}^{n} (2πσ²)^{−1/2} exp(−(xi − μ)² / (2σ²)).
Given x, it is a function of μ and σ². We want to maximize it.
Log-Likelihood Function
ℓ(μ, σ²) = log L(μ, σ² | x) = −(n/2) log(2πσ²) − (1/(2σ²)) Σ_{i=1}^{n} (xi − μ)².
Maximize this instead, by setting ∂ℓ/∂μ = 0 and ∂ℓ/∂σ² = 0.
Maximizing the Log-Likelihood Function
∂ℓ/∂μ = (1/σ²) Σ_i (xi − μ) = 0  ⇒  μ̂ = (1/n) Σ_i xi
∂ℓ/∂σ² = −n/(2σ²) + (1/(2σ⁴)) Σ_i (xi − μ)² = 0  ⇒  σ̂² = (1/n) Σ_i (xi − μ̂)²
The ML estimates are the sample mean and the (biased) sample variance.
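As a quick numerical check (a minimal sketch, not part of the original slides; the sample values are arbitrary), the closed-form ML estimates coincide with the sample mean and the biased sample variance:

import numpy as np

x = np.array([1.2, 0.7, 2.3, 1.9, 0.4])    # arbitrary sample, for illustration only
mu_hat = x.mean()                           # mu-hat = (1/n) * sum(x_i)
var_hat = ((x - mu_hat) ** 2).mean()        # sigma^2-hat = (1/n) * sum((x_i - mu_hat)^2)
print(mu_hat, var_hat)                      # var_hat matches np.var(x) (ddof=0)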
Missing Data
Sampling: x = (x1, …, xm, xm+1, …, xn), where only x1, …, xm are observed and xm+1, …, xn are missing.
E-Step
Let θ(t) = (μ(t), σ²(t)) be the parameters estimated at the start of the t-th iteration. For each missing xj, compute the expectations under θ(t):
E[xj | θ(t)] = μ(t)
E[xj² | θ(t)] = μ(t)² + σ²(t)
so that the expected sufficient statistics are
E[Σ_i xi | x_obs, θ(t)] = Σ_{i=1}^{m} xi + (n − m) μ(t)
E[Σ_i xi² | x_obs, θ(t)] = Σ_{i=1}^{m} xi² + (n − m)(μ(t)² + σ²(t)).
M-Step
Re-estimate the parameters from the expected sufficient statistics:
μ(t+1) = (1/n) E[Σ_i xi | x_obs, θ(t)]
σ²(t+1) = (1/n) E[Σ_i xi² | x_obs, θ(t)] − μ(t+1)².
Exercise
n = 40 (10 data missing). Estimate μ and σ² using different initial conditions.
Observed data (30 values):
375.081556 362.275902 332.612068 351.383048 304.823174 386.438672
430.079689 395.317406 369.029845 365.343938 243.548664 382.789939
374.419161 337.289831 418.928822 364.086502 343.854855 371.279406
439.241736 338.281616 454.981077 479.685107 336.634962 407.030453
297.821512 311.267105 528.267783 419.841982 392.684770 301.910093
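A minimal sketch of this EM loop in Python, using the 30 observed values above; the initial guess and the stopping tolerance are arbitrary choices, not from the slides:

import numpy as np

# The 30 observed values from the exercise above.
x_obs = np.array([
    375.081556, 362.275902, 332.612068, 351.383048, 304.823174, 386.438672,
    430.079689, 395.317406, 369.029845, 365.343938, 243.548664, 382.789939,
    374.419161, 337.289831, 418.928822, 364.086502, 343.854855, 371.279406,
    439.241736, 338.281616, 454.981077, 479.685107, 336.634962, 407.030453,
    297.821512, 311.267105, 528.267783, 419.841982, 392.684770, 301.910093])
n, m = 40, len(x_obs)            # total sample size; n - m = 10 values are missing

mu, var = 300.0, 1000.0          # arbitrary initial guess
for t in range(1000):
    # E-step: expected sufficient statistics, filling in the missing values
    s1 = x_obs.sum() + (n - m) * mu                      # E[sum x_i | theta(t)]
    s2 = (x_obs ** 2).sum() + (n - m) * (mu ** 2 + var)  # E[sum x_i^2 | theta(t)]
    # M-step: re-estimate the parameters from the expected statistics
    mu_new, var_new = s1 / n, s2 / n - (s1 / n) ** 2
    if abs(mu_new - mu) < 1e-9 and abs(var_new - var) < 1e-9:
        break
    mu, var = mu_new, var_new
print(mu_new, var_new)

Trying several initial (mu, var) pairs shows how the fixed point depends on the starting guess, which is the point of the exercise.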
EM Algorithm Example: Mixed Attributes
Multinomial Population
Sampling N samples from a multinomial population with k categories: x = (x1, …, xk), Σ_j xj = N, where category j has probability pj(θ).
Maximum Likelihood
The likelihood of the N samples is
L(θ | x) = N! / (x1! ⋯ xk!) ∏_j pj(θ)^{xj}.
We want to maximize it.
Log-Likelihood
ℓ(θ) = Σ_j xj log pj(θ) + const, where the constant absorbs the multinomial coefficient.
Mixed Attributes
Sampling N samples as above, but the count x3 is not available.
E-Step
Given θ(t), what can you say about x3? Replace the unobserved count by its conditional expectation given the observed counts:
x3(t) = E[x3 | observed data, θ(t)].
M-Step
With x3 replaced by x3(t), maximize the complete-data log-likelihood Σ_j xj log pj(θ) to obtain θ(t+1).
Exercise
Estimate θ using different initial conditions.
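The slide's concrete numbers did not survive extraction, so as an illustration here is EM on the classic genetic-linkage multinomial (Rao's example, also used by Dempster et al.); this is an assumed stand-in, not the deck's original data. The four cell probabilities are (1/2 + θ/4, (1−θ)/4, (1−θ)/4, θ/4), and the first observed count secretly pools two latent cells with probabilities 1/2 and θ/4:

import numpy as np

y = np.array([125, 18, 20, 34])   # observed counts (Rao's classic data, illustrative)
theta = 0.5                        # arbitrary initial guess
for t in range(100):
    # E-step: expected latent split of y[0] into its two sub-cells
    x12 = y[0] * (theta / 4) / (1 / 2 + theta / 4)   # expected count in the theta/4 cell
    # M-step: closed-form maximizer of the expected complete-data log-likelihood
    theta_new = (x12 + y[3]) / (x12 + y[1] + y[2] + y[3])
    if abs(theta_new - theta) < 1e-10:
        theta = theta_new
        break
    theta = theta_new
print(theta)   # converges to about 0.6268 regardless of the starting point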
EM Algorithm Example: Mixture
Binomial/Poisson Mixture
M: whether an obasong is married ("obasong" is Taiwanese for a middle-aged woman); X: number of children.
Unmarried obasongs have no children; a married obasong has X ~ Poisson(λ) children.
Observed data: the counts n0, n1, …, n6 of obasongs with 0, 1, …, 6 children; the married and unmarried groups are not distinguished in the data.
Binomial/Poisson Mixture
The zero-children count n0 mixes the two groups. Unobserved data:
nA: # unmarried obasongs (all have no children)
nB: # married obasongs with no children
so n0 = nA + nB.
Binomial/Poisson Mixture
Complete data: nA, nB, n1, …, n6, with cell probabilities (ξ = probability of being unmarried):
pA = P(unmarried) = ξ
pB = P(married, no children) = (1 − ξ) e^{−λ}
pi = P(married, i children) = (1 − ξ) e^{−λ} λ^i / i!,  i = 1, …, 6.
Complete-Data Likelihood
L(ξ, λ | nA, nB, n1, …, n6) ∝ pA^{nA} pB^{nB} ∏_{i=1}^{6} pi^{ni}.
Log-Likelihood
ℓ(ξ, λ) = nA log ξ + (n − nA) log(1 − ξ) − (n − nA) λ + (Σ_{i=1}^{6} i·ni) log λ + const,
where n = nA + nB + Σ_{i=1}^{6} ni is the total number of obasongs.
Maximization
Setting ∂ℓ/∂ξ = 0 and ∂ℓ/∂λ = 0 gives the complete-data ML estimates
ξ̂ = nA / n
λ̂ = Σ_{i=1}^{6} i·ni / (nB + Σ_{i=1}^{6} ni).
E-Step
nA and nB are unobserved; only n0 = nA + nB is known. Given (ξ(t), λ(t)), replace them by their conditional expectations:
nA(t) = n0 · ξ(t) / (ξ(t) + (1 − ξ(t)) e^{−λ(t)})
nB(t) = n0 − nA(t).
M-Step
ξ(t+1) = nA(t) / n
λ(t+1) = Σ_{i=1}^{6} i·ni / (nB(t) + Σ_{i=1}^{6} ni).
Example
# Children:    0     1    2    3   4  5  6
# Obasongs: 3,062  587  284  103  33  4  2

t    ξ(t)      λ(t)      nA        nB
0   0.750000  0.400000  2502.779  559.221
1   0.614179  1.035478  2503.591  558.409
2   0.614378  1.036013  2504.219  557.781
3   0.614532  1.036427  2504.705  557.295
4   0.614652  1.036748  2505.081  556.919
5   0.614744  1.036996  2505.371  556.629
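A short script that reproduces the iteration table above (a sketch; the variable names are mine, and ξ denotes the unmarried proportion as reconstructed from the iterates):

import numpy as np

n_i = np.array([3062, 587, 284, 103, 33, 4, 2])   # obasongs with 0..6 children
n = n_i.sum()                                      # 4075 obasongs in total
s = sum(i * c for i, c in enumerate(n_i))          # total number of children (1628)

xi, lam = 0.75, 0.40                               # initial guess from the slide
for t in range(6):
    # E-step: split n0 into expected unmarried (nA) and married-childless (nB)
    nA = n_i[0] * xi / (xi + (1 - xi) * np.exp(-lam))
    nB = n_i[0] - nA
    print(t, round(xi, 6), round(lam, 6), round(nA, 3), round(nB, 3))
    # M-step: closed-form updates
    xi = nA / n
    lam = s / (nB + n_i[1:].sum())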
EM Algorithm Main Body
Maximum Likelihood
Given observed data X = (x1, …, xN) and a model p(X | θ), find the parameter maximizing the (log-)likelihood:
θ̂ = argmax_θ log p(X | θ).
Latent Variables
Incomplete data: the observations X.
Complete data: Z = (X, Y), where Y denotes the latent (unobserved) variables.
Complete-Data Likelihood
L(θ | Z) = p(X, Y | θ): a function of the latent variable Y and the parameter θ.
Since Y is unobserved, for fixed X the complete-data likelihood is itself a random quantity: the result is in terms of the random variable Y. If we are given a current estimate of θ, its expectation over Y becomes computable.
Expectation Step
Let θ(i−1) be the parameter vector obtained at the (i−1)-th step. Define
Q(θ, θ(i−1)) = E[ log p(X, Y | θ) | X, θ(i−1) ] = Σ_y log p(X, y | θ) · p(y | X, θ(i−1)).
Maximization Step
Let θ(i−1) be the parameter vector obtained at the (i−1)-th step. Define
θ(i) = argmax_θ Q(θ, θ(i−1)).
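In code, every EM instance in this deck fits the same two-function skeleton (a sketch; e_step and m_step are placeholders to be supplied per model, and theta is assumed to be a sequence of floats):

def em(theta, e_step, m_step, max_iter=100, tol=1e-8):
    """Generic EM loop.

    e_step(theta) -> expected sufficient statistics of the complete data
    m_step(stats) -> parameters maximizing the expected complete-data log-likelihood
    """
    for _ in range(max_iter):
        stats = e_step(theta)         # E-step: expectation under current parameters
        theta_new = m_step(stats)     # M-step: maximize Q(theta, theta_old)
        if all(abs(a - b) < tol for a, b in zip(theta_new, theta)):
            return theta_new          # converged
        theta = theta_new
    return theta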
EM Algorithm Mixture Model
Mixture Models
If there is reason to believe that a data set comprises several distinct populations, a mixture model can be used. It has the following form:
p(x | Θ) = Σ_{l=1}^{M} αl pl(x | θl),  with αl ≥ 0 and Σ_{l=1}^{M} αl = 1,
where Θ = (α1, …, αM, θ1, …, θM).
Mixture Models
Let yi ∈ {1, …, M} denote the (latent) source that generates sample xi.
Mixture Models
Then P(yi = l | Θ) = αl and p(xi | yi = l, Θ) = pl(xi | θl), so the joint density of a sample and its source is
p(xi, yi | Θ) = α_{yi} p_{yi}(xi | θ_{yi}).
Mixture Models
Given x and Θ, the conditional density of y can be computed:
p(y | x, Θ) = αy py(x | θy) / Σ_{l=1}^{M} αl pl(x | θl).
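For instance, the posterior source probabilities ("responsibilities") for a two-component 1-D Gaussian mixture; the weights, means, and standard deviations below are illustrative assumptions, not values from the slides:

import numpy as np
from scipy.stats import norm

alpha = np.array([0.3, 0.7])                  # mixing weights (assumed)
mu, sd = np.array([0.0, 4.0]), np.array([1.0, 1.5])

x = 2.0                                        # a single observation
joint = alpha * norm.pdf(x, mu, sd)            # alpha_l * p_l(x | theta_l)
posterior = joint / joint.sum()                # p(y = l | x, Theta)
print(posterior)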
Complete-Data Likelihood Function
For the complete data (X, Y) = ((x1, y1), …, (xN, yN)),
log L(Θ | X, Y) = Σ_{i=1}^{N} log( α_{yi} p_{yi}(xi | θ_{yi}) ).
Expectation
Let Θg denote the current guess of the parameters. The E-step evaluates
Q(Θ, Θg) = Σ_y log L(Θ | X, y) · p(y | X, Θg),
where the sum runs over all assignments y = (y1, …, yN) of samples to sources.
Expectation
Writing log(α_{yi} p_{yi}(xi | θ_{yi})) = Σ_{l=1}^{M} δ_{l,yi} log(αl pl(xi | θl)), where the Kronecker delta δ_{l,yi} is zero when yi ≠ l, gives
Q(Θ, Θg) = Σ_{l=1}^{M} Σ_{i=1}^{N} log(αl pl(xi | θl)) Σ_y δ_{l,yi} ∏_{j=1}^{N} p(yj | xj, Θg).
Expectation
In the inner sum, every factor except the i-th sums to 1 when marginalized, so Σ_y δ_{l,yi} ∏_j p(yj | xj, Θg) = p(l | xi, Θg). Therefore
Q(Θ, Θg) = Σ_{l=1}^{M} Σ_{i=1}^{N} log(αl pl(xi | θl)) · p(l | xi, Θg).
Maximization
Given the initial guess Θg, we want to find the αl and θl that maximize the above expectation. In fact, the maximization is applied iteratively: the maximizer becomes the next guess.
The GMM (Gaussian Mixture Model)
Gaussian model of a d-dimensional source, say source j:
pj(x | μj, Σj) = (2π)^{−d/2} |Σj|^{−1/2} exp( −(1/2)(x − μj)ᵀ Σj^{−1} (x − μj) ).
GMM with M sources:
p(x | Θ) = Σ_{j=1}^{M} αj pj(x | μj, Σj),  Θ = (α1, …, αM, μ1, …, μM, Σ1, …, ΣM).
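A direct transcription of these two densities into Python (a sketch; it trades numerical robustness, e.g. log-space evaluation, for readability):

import numpy as np

def gaussian_pdf(x, mu, cov):
    """Density of N(mu, cov) at a d-dimensional point x."""
    d = len(mu)
    diff = x - mu
    quad = diff @ np.linalg.solve(cov, diff)     # (x - mu)^T cov^{-1} (x - mu)
    norm_const = (2 * np.pi) ** (-d / 2) * np.linalg.det(cov) ** (-0.5)
    return norm_const * np.exp(-0.5 * quad)

def gmm_pdf(x, alphas, mus, covs):
    """Mixture density: sum_j alpha_j * p_j(x | mu_j, Sigma_j)."""
    return sum(a * gaussian_pdf(x, m, c) for a, m, c in zip(alphas, mus, covs))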
EM Algorithm EM-Algorithm on GMM
Goal
To maximize
Q(Θ, Θg) = Σ_{l=1}^{M} Σ_{i=1}^{N} log(αl) p(l | xi, Θg) + Σ_{l=1}^{M} Σ_{i=1}^{N} log(pl(xi | θl)) p(l | xi, Θg),
subject to Σ_{l=1}^{M} αl = 1.
The first term is correlated with the αl only and the second with the θl = (μl, Σl) only, so the two can be maximized separately.
Finding αl
Due to the constraint on the αl's, we introduce a Lagrange multiplier λ and solve
∂/∂αl [ Σ_{l} Σ_{i} log(αl) p(l | xi, Θg) + λ( Σ_{l} αl − 1 ) ] = 0.
Finding αl
This gives (1/αl) Σ_{i=1}^{N} p(l | xi, Θg) + λ = 0 for each l. Summing over l and using Σ_l αl = 1 and Σ_l p(l | xi, Θg) = 1 yields λ = −N. Therefore
αl = (1/N) Σ_{i=1}^{N} p(l | xi, Θg).
Finding μl and Σl
Only the second term of Q needs to be maximized; the αl term is unrelated to θl. Consider the GMM:
log pl(xi | μl, Σl) = −(1/2) log|Σl| − (1/2)(xi − μl)ᵀ Σl^{−1} (xi − μl) + const.
Therefore, we want to maximize
Σ_{i=1}^{N} [ −(1/2) log|Σl| − (1/2)(xi − μl)ᵀ Σl^{−1} (xi − μl) ] p(l | xi, Θg).
How? Some knowledge of matrix algebra is needed, e.g. ∂(xᵀAx)/∂x = (A + Aᵀ)x and ∂ log|A| / ∂A = (A^{−1})ᵀ. Setting the derivatives with respect to μl and Σl to zero gives
μl = Σ_i xi p(l | xi, Θg) / Σ_i p(l | xi, Θg)
Σl = Σ_i p(l | xi, Θg)(xi − μl)(xi − μl)ᵀ / Σ_i p(l | xi, Θg).
Summary: EM algorithm for GMM
Given an initial guess Θg, find the new parameters as follows.
E-step:
p(l | xi, Θg) = αl pl(xi | μl, Σl) / Σ_{k=1}^{M} αk pk(xi | μk, Σk)   (evaluated at Θg)
M-step:
αl(new) = (1/N) Σ_{i=1}^{N} p(l | xi, Θg)
μl(new) = Σ_i xi p(l | xi, Θg) / Σ_i p(l | xi, Θg)
Σl(new) = Σ_i p(l | xi, Θg)(xi − μl(new))(xi − μl(new))ᵀ / Σ_i p(l | xi, Θg)
If not converged, set Θg ← Θ(new) and repeat.
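Putting the summary into code, a compact EM-for-GMM sketch; initializing the means at random data points, the small covariance regularizer, and the fixed iteration count are my choices (a production version would add a log-likelihood stopping test):

import numpy as np

def gaussian_pdf(X, mu, cov):
    """N(mu, cov) density evaluated at each row of X."""
    d = X.shape[1]
    diff = X - mu
    quad = np.einsum('ij,ij->i', diff @ np.linalg.inv(cov), diff)
    return np.exp(-0.5 * quad) / np.sqrt((2 * np.pi) ** d * np.linalg.det(cov))

def em_gmm(X, M, n_iter=100, seed=0):
    N, d = X.shape
    rng = np.random.default_rng(seed)
    alphas = np.full(M, 1.0 / M)                 # uniform initial mixing weights
    mus = X[rng.choice(N, M, replace=False)]     # random data points as initial means
    covs = np.array([np.cov(X.T) + 1e-6 * np.eye(d) for _ in range(M)])
    for _ in range(n_iter):
        # E-step: responsibilities p(l | x_i, Theta_g), one column per source
        r = np.stack([a * gaussian_pdf(X, m, c)
                      for a, m, c in zip(alphas, mus, covs)], axis=1)
        r /= r.sum(axis=1, keepdims=True)
        # M-step: closed-form updates from the summary slide
        Nl = r.sum(axis=0)
        alphas = Nl / N
        mus = (r.T @ X) / Nl[:, None]
        covs = np.array([
            ((r[:, l, None] * (X - mus[l])).T @ (X - mus[l])) / Nl[l] + 1e-6 * np.eye(d)
            for l in range(M)])
    return alphas, mus, covs

Usage: alphas, mus, covs = em_gmm(data, M=3) for an (N, d) data array.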
Demonstration: EM algorithm for mixture models
Exercises
Write a program to generate a multidimensional Gaussian distribution, and draw the distribution for 2-dim data (see the sketch after this list).
Write a program to generate GMM data.
Write an EM algorithm to analyze GMM data.
Study more EM algorithms for mixtures.
Find applications for the EM algorithm.
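As a starting point for the first two exercises, a minimal GMM data generator with a 2-D scatter plot; the component weights, means, and covariances are arbitrary illustrative values:

import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)

# Illustrative 2-D GMM: three components with assumed parameters
alphas = [0.5, 0.3, 0.2]
mus = [np.array([0, 0]), np.array([4, 4]), np.array([-3, 5])]
covs = [np.eye(2), np.array([[2.0, 0.8], [0.8, 1.0]]), 0.5 * np.eye(2)]

N = 1000
labels = rng.choice(len(alphas), size=N, p=alphas)            # latent sources y_i
X = np.array([rng.multivariate_normal(mus[y], covs[y]) for y in labels])

plt.scatter(X[:, 0], X[:, 1], s=5, c=labels)                  # color by true source
plt.title('Samples from a 3-component GMM')
plt.show()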
References
Jeff A. Bilmes, "A Gentle Tutorial of the EM Algorithm and its Application to Parameter Estimation for Gaussian Mixture and Hidden Markov Models," 1998.
Sean Borman, "The Expectation Maximization Algorithm: A Short Tutorial."