Expectation-Maximization (EM) Chapter 3 (Duda et al.) – Section 3.9 CS479/679 Pattern Recognition Dr. George Bebis
Expectation-Maximization (EM) EM is an iterative method to perform ML estimation: starts with an initial estimate for θ, then iteratively refines the current estimate to increase the likelihood of the observed data: p(D/θ)
Expectation-Maximization (EM) EM represents a general framework – works best in situations where the data is incomplete (or can be thought of as incomplete). Some creativity is required to recognize where the EM algorithm can be used. Standard method for estimating the parameters of Mixtures of Gaussians (MoG).
Incomplete Data Often, ML estimation cannot be applied directly because certain features cannot be measured. The EM algorithm is ideal for problems with unobserved (missing) data.
Example (Moon, 1996) Assume a trinomial distribution over counts (x1, x2, x3) with x1+x2+x3 = k: p(x1,x2,x3/θ) = k!/(x1! x2! x3!) p1^x1 p2^x2 p3^x3, where the cell probabilities p1, p2, p3 depend on a single unknown parameter θ.
Example (Moon, 1996) (cont’d)
EM: Main Idea If x were available, we could use ML to estimate θ by maximizing ln p(Dx/θ). Since x is not available: maximize instead the expectation of ln p(Dx/θ) with respect to the unknown variables, given the observed data Dy and the current estimate of θ.
EM Steps (1) Initialization (2) Expectation (3) Maximization (4) Test for convergence
EM Steps (cont’d) (1) Initialization Step: initialize the algorithm with a guess θ0. (2) Expectation Step: performed with respect to the unobserved variables, using the current parameter estimate and conditioned upon the observations: Q(θ; θt) = E[ln p(Dx/θ) | Dy; θt]. When ln p(Dx/θ) is a linear function of the unobserved variables, the expectation step is equivalent to computing the expected values of the unobserved variables and substituting them into ln p(Dx/θ).
EM Steps (cont’d) (3) Maximization Step: provides a new estimate of the parameters: θt+1 = arg maxθ Q(θ; θt). (4) Test for Convergence: if ||θt+1 − θt|| < ε, stop; otherwise, go to Step 2.
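The four steps above translate directly into a small driver loop. Below is a minimal sketch in Python; the callbacks e_step and m_step are hypothetical placeholders that a concrete model (such as the examples later in these slides) would supply.

```python
import numpy as np

def em(theta0, e_step, m_step, max_iter=100, tol=1e-6):
    """Generic EM loop; the model-specific work lives in the two callbacks.

    e_step(theta) -> expected sufficient statistics of the unobserved data
    m_step(stats) -> new parameter estimate maximizing Q(theta; theta_t)
    """
    theta = theta0                              # (1) initialization
    for _ in range(max_iter):
        stats = e_step(theta)                   # (2) Expectation step
        theta_new = m_step(stats)               # (3) Maximization step
        # (4) test for convergence: ||theta_{t+1} - theta_t|| < tol
        if np.linalg.norm(np.atleast_1d(theta_new) - np.atleast_1d(theta)) < tol:
            return theta_new
        theta = theta_new
    return theta
```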
Example (Moon, 1996) (cont’d) Complete-data likelihood: p(x/θ) = k!/(x1! x2! x3!) p1^x1 p2^x2 p3^x3. Suppose the data are incomplete: some of the counts are observed only in aggregate, so the likelihood of the observed data involves a sum over the values the unobserved counts could take.
Example (Moon, 1996) (cont’d) Take the expected value of the complete-data log-likelihood given the observations and the current estimate θt: Q(θ; θt) = E[ln p(x/θ) | Dy; θt]. Let’s look at the M-step for a minute before completing the E-step …
Example (Moon, 1996) (cont’d) Setting ∂Q/∂θ = 0 shows that θt+1 depends only on the expected values of the unobserved counts, so in the E-step we only need to estimate those expectations. Let’s go back and complete the E-step now …
Example (Moon, 1996) (cont’d) (see Moon’s paper, page 53, for a proof)
Example (Moon, 1996) (cont’d) Initialization: pick θ0. Expectation Step: compute the expected unobserved counts given the observations and θt. Maximization Step: update θt+1 from these expected counts. Convergence Step: stop when |θt+1 − θt| < ε; otherwise iterate.
Example (Moon, 1996) (cont’d)
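To make the trinomial example concrete, here is a minimal sketch in Python. The cell probabilities p = (1/2, θ/2, (1−θ)/2) and the observed counts are illustrative assumptions, not necessarily the exact values used in Moon's paper; the merged observation y1 = x1 + x2 plays the role of the incomplete data.

```python
# Illustrative (NOT Moon's exact) trinomial: cell probabilities
#   p1 = 1/2,  p2 = theta/2,  p3 = (1 - theta)/2,   0 < theta < 1.
# Complete data: counts (x1, x2, x3); observed (incomplete) data:
#   y1 = x1 + x2 (x1 and x2 are merged), y2 = x3.
y1, y2 = 63, 37          # hypothetical observed counts, k = 100
theta = 0.5              # (1) initial guess theta_0

for _ in range(50):
    # (2) E-step: split y1 into its expected x2 part given current theta.
    #     P(cell 2 | cell 1 or 2) = (theta/2) / (1/2 + theta/2)
    x2 = y1 * theta / (1.0 + theta)
    # (3) M-step: setting dQ/dtheta = E[x2]/theta - x3/(1-theta) = 0 gives
    #     theta = E[x2] / (E[x2] + x3)
    theta_new = x2 / (x2 + y2)
    # (4) convergence test
    if abs(theta_new - theta) < 1e-10:
        break
    theta = theta_new

print(theta)             # -> 0.26
```

As a sanity check, the fixed point agrees with the direct ML estimate available from the observed counts alone, which for this model is θ = 1 − 2·y2/(y1 + y2) = 0.26.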
Convergence properties of EM The solution depends on the initial estimate θ0. At each iteration, a value of θ is computed so that the likelihood function does not decrease. There is no guarantee that the algorithm will converge to a global maximum. The algorithm is guaranteed to be stable, i.e., there is no chance of "overshooting" or diverging from the maximum.
Mixture of 2D Gaussians - Example
Mixture Model A mixture density combines K component densities with mixing weights π1, π2, π3, …, πk: p(x) = Σk=1..K πk p(x/θk), with πk ≥ 0 and Σk πk = 1.
Mixture of 1D Gaussians - Example A three-component mixture with weights π1 = 0.3, π2 = 0.2, π3 = 0.5.
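A mixture like the one on this slide is easy to sample from by ancestral sampling: first draw the (hidden) component label with probability πk, then draw from that component. The weights below are the slide's; the means and standard deviations are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
pi = np.array([0.3, 0.2, 0.5])        # mixing weights from the slide
mu = np.array([-2.0, 0.0, 3.0])       # hypothetical component means
sigma = np.array([0.5, 1.0, 0.8])     # hypothetical component std devs

# Ancestral sampling: pick a component, then draw from its Gaussian.
z = rng.choice(3, size=1000, p=pi)    # hidden component labels
x = rng.normal(mu[z], sigma[z])       # observed samples
```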
Mixture Parameters The parameters of a mixture are the mixing weights πk and the component parameters θk, k = 1, 2, …, K.
Fitting a Mixture Model to a set of observations Dx Two fundamental problems: (1) Estimate the number of mixture components K (2) Estimate mixture parameters (πk , θk), k=1,2,…,K
Mixtures of Gaussians (see Chapter 10) p(x) = Σk=1..K πk p(x/θk), where each component is a Gaussian: p(x/θk) = N(x; μk, Σk) = 1/((2π)^(d/2) |Σk|^(1/2)) exp(−½ (x−μk)T Σk^(−1) (x−μk)). The parameters θk are (μk, Σk).
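A direct transcription of these two formulas, as a sketch (numpy only, single query point, no attempt at numerical robustness):

```python
import numpy as np

def gaussian_pdf(x, mu, Sigma):
    """Multivariate Gaussian density N(x; mu, Sigma) at a single point x."""
    d = len(mu)
    diff = x - mu
    norm = (2 * np.pi) ** (d / 2) * np.sqrt(np.linalg.det(Sigma))
    return np.exp(-0.5 * diff @ np.linalg.solve(Sigma, diff)) / norm

def mixture_pdf(x, pis, mus, Sigmas):
    """p(x) = sum_k pi_k N(x; mu_k, Sigma_k)."""
    return sum(pi * gaussian_pdf(x, mu, S)
               for pi, mu, S in zip(pis, mus, Sigmas))
```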
Mixtures of Gaussians (cont’d) (illustration: the mixture as a weighted combination of Gaussian components with weights π1, π2, π3, …, πk)
Estimating Mixture Parameters Using ML directly is not easy: the log-likelihood contains the logarithm of a sum over components, so setting its derivatives to zero yields coupled equations with no closed-form solution.
Estimating Mixture Parameters Using EM: Case of Unknown Means Assumptions: the number of components K and all parameters other than the means are known; only the means μk must be estimated. Observation: if we knew which component had generated each sample, we could estimate the means by simple averaging … but we don’t!
Estimating Mixture Parameters Using EM: Case of Unknown Means (cont’d) Introduce hidden or unobserved variables zi, whose components zik indicate which mixture component generated sample xi (zik = 1 if xi came from component k, 0 otherwise).
Estimating Mixture Parameters Using EM: Case of Unknown Means (cont’d) Main steps using EM
Estimating Mixture Parameters Using EM: Case of Unknown Means (cont’d) Expectation Step: form Q(μ; μt) = E[ln p(Dx/μ) | Dy; μt]. Since ln p(Dx/μ) is linear in the unobserved zik, this reduces to computing the expected values E(zik) and substituting them in.
Estimating Mixture Parameters Using EM: Case of Unknown Means (cont’d) Expectation Step E(zik) is just the probability that xi was generated by the k-th component: E(zik) = πk p(xi/μkt) / Σj=1..K πj p(xi/μjt).
Estimating Mixture Parameters Using EM: Case of Unknown Means (cont’d) Maximization Step: maximizing Q with respect to the means gives the weighted sample means: μkt+1 = Σi E(zik) xi / Σi E(zik).
Estimating Mixture Parameters Using EM: Case of Unknown Means (cont’d) Summary: (1) initialize the means μk0; (2) E-step: compute E(zik) for every sample; (3) M-step: update each mean as the E(zik)-weighted average of the samples; (4) repeat until the means stop changing.
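Putting the unknown-means case together, here is a sketch assuming, per this setup, known equal spherical covariances σ²I and equal priors; the random-sample initialization is my choice, not the slides'.

```python
import numpy as np

def em_unknown_means(X, K, sigma=1.0, n_iter=100):
    """EM when only the K component means are unknown
    (known equal spherical covariances sigma^2 I, equal priors)."""
    n, d = X.shape
    rng = np.random.default_rng(0)
    mu = X[rng.choice(n, K, replace=False)]          # initial guess mu_0
    for _ in range(n_iter):
        # E-step: E[z_ik] proportional to exp(-||x_i - mu_k||^2 / 2 sigma^2)
        sq = ((X[:, None, :] - mu[None, :, :]) ** 2).sum(-1)   # (n, K)
        sq -= sq.min(axis=1, keepdims=True)          # stabilize the exp
        w = np.exp(-sq / (2 * sigma**2))
        w /= w.sum(axis=1, keepdims=True)            # responsibilities
        # M-step: mu_k = sum_i E[z_ik] x_i / sum_i E[z_ik]
        mu = (w.T @ X) / w.sum(axis=0)[:, None]
    return mu
```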
Estimating Mixture Parameters Using EM: General Case Need to review Lagrange Optimization first …
Lagrange Optimization To maximize f(x) subject to the constraint g(x) = 0, form the Lagrangian L(x, λ) = f(x) + λ g(x), set ∇x L = 0 and ∂L/∂λ = g(x) = 0, and solve for x and λ: n+1 equations / n+1 unknowns.
Lagrange Optimization (cont’d) Example Maximize f(x1,x2) = x1x2 subject to the constraint g(x1,x2) = x1+x2-1 = 0. The Lagrangian is L = x1x2 + λ(x1+x2-1); setting ∂L/∂x1 = x2+λ = 0, ∂L/∂x2 = x1+λ = 0, and g = 0 gives 3 equations / 3 unknowns, with solution x1 = x2 = 1/2 (and λ = -1/2).
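The example can be checked symbolically; a short sympy sketch (the variable names mirror the slide):

```python
import sympy as sp

x1, x2, lam = sp.symbols('x1 x2 lambda')
L = x1 * x2 + lam * (x1 + x2 - 1)          # Lagrangian f + lambda*g

# 3 equations / 3 unknowns: dL/dx1 = 0, dL/dx2 = 0, dL/dlambda = g = 0
eqs = [sp.diff(L, v) for v in (x1, x2, lam)]
print(sp.solve(eqs, [x1, x2, lam]))        # {x1: 1/2, x2: 1/2, lambda: -1/2}
```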
Estimating Mixture Parameters Using EM: General Case Introduce hidden or unobserved variables zi
Estimating Mixture Parameters Using EM: General Case (cont’d) Expectation Step: as before, form Q(θ; θt) = E[ln p(Dx/θ) | Dy; θt], which reduces to computing the responsibilities E(zik) = πkt p(xi/θkt) / Σj=1..K πjt p(xi/θjt).
Estimating Mixture Parameters Using EM: General Case (cont’d) Maximization Step: maximize Q with respect to (πk, θk) subject to the constraint Σk πk = 1; this is where Lagrange optimization is used.
Estimating Mixture Parameters Using EM: General Case (cont’d) Maximization Step (cont’d): the updates have closed form: πkt+1 = (1/n) Σi E(zik), μkt+1 = Σi E(zik) xi / Σi E(zik), Σkt+1 = Σi E(zik)(xi − μkt+1)(xi − μkt+1)T / Σi E(zik).
Estimating Mixture Parameters Using EM: General Case (cont’d) Summary: (1) initialize (πk0, μk0, Σk0); (2) E-step: compute the responsibilities E(zik); (3) M-step: update πk, μk, Σk using the formulas above; (4) repeat until the parameters converge.
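The general-case summary as a sketch in numpy. The initialization (random samples for the means, pooled covariance plus a small diagonal jitter) and the log-domain E-step are implementation choices of mine, not prescribed by the slides.

```python
import numpy as np

def em_gmm(X, K, n_iter=100, seed=0):
    """EM for a Gaussian mixture: estimates (pi_k, mu_k, Sigma_k)."""
    n, d = X.shape
    rng = np.random.default_rng(seed)
    pi = np.full(K, 1.0 / K)
    mu = X[rng.choice(n, K, replace=False)]
    Sigma = np.stack([np.cov(X.T) + 1e-6 * np.eye(d)] * K)

    for _ in range(n_iter):
        # E-step: responsibilities E[z_ik] via Bayes' rule, in log domain
        logp = np.empty((n, K))
        for k in range(K):
            diff = X - mu[k]
            _, logdet = np.linalg.slogdet(Sigma[k])
            maha = np.einsum('ij,ij->i', diff @ np.linalg.inv(Sigma[k]), diff)
            logp[:, k] = np.log(pi[k]) - 0.5 * (d * np.log(2 * np.pi)
                                                + logdet + maha)
        logp -= logp.max(axis=1, keepdims=True)        # stabilize the exp
        w = np.exp(logp)
        w /= w.sum(axis=1, keepdims=True)

        # M-step: closed-form updates; the pi update is where the
        # Lagrange constraint sum_k pi_k = 1 enters.
        Nk = w.sum(axis=0)                             # effective counts
        pi = Nk / n
        mu = (w.T @ X) / Nk[:, None]
        for k in range(K):
            diff = X - mu[k]
            Sigma[k] = ((w[:, k, None] * diff).T @ diff / Nk[k]
                        + 1e-6 * np.eye(d))
    return pi, mu, Sigma
```

Note that K is fixed in advance here; choosing K itself is the subject of the next slide.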
Estimating the Number of Components K