A Gentle Tutorial of the EM Algorithm and its Application to Parameter Estimation for Gaussian Mixture and Hidden Markov Models
Jeff A. Bilmes
International Computer Science Institute, Berkeley CA, and Computer Science Division, Department of Electrical Engineering and Computer Science, U.C. Berkeley, April 1998
Presenter: Hsu Ting-Wei

Outline
- Abstract
- Maximum-likelihood
- Basic EM
- Finding Maximum Likelihood Mixture Densities Parameters via EM
- Learning the parameters of an HMM, EM, and the Baum-Welch algorithm

Abstract
- Uses the Expectation-Maximization (EM) algorithm to solve the maximum-likelihood (ML) parameter estimation problem
- Gives the EM parameter estimation procedure for two applications:
  - Finding the parameters of a mixture of Gaussian densities
  - Finding the parameters of a hidden Markov model (HMM), i.e., the Baum-Welch algorithm, for both discrete and Gaussian mixture observation models
- Emphasizes intuition rather than mathematical rigor

Maximum-likelihood
- Given a density function $p(\mathbf{x}|\Theta)$ and an i.i.d. data set $\mathcal{X} = \{\mathbf{x}_1, \ldots, \mathbf{x}_N\}$ drawn from it (the incomplete data), the likelihood function is
  $\mathcal{L}(\Theta|\mathcal{X}) = p(\mathcal{X}|\Theta) = \prod_{i=1}^{N} p(\mathbf{x}_i|\Theta)$
- The ML estimate is the parameter value that maximizes the (log-)likelihood: $\Theta^{*} = \arg\max_{\Theta} \mathcal{L}(\Theta|\mathcal{X})$
- We can set the derivative of $\log \mathcal{L}(\Theta|\mathcal{X})$ to zero to find the maximum value. But if the resulting equations cannot be solved by this method, we turn to the EM algorithm.
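
To make the "set the derivative to zero" case concrete, here is a minimal sketch (not from the original slides) for a univariate Gaussian, whose ML estimates have the familiar closed form; the data and function name are illustrative assumptions.

    import numpy as np

    def gaussian_ml_estimate(x):
        """Closed-form ML estimates for a univariate Gaussian.

        Setting the derivative of the log-likelihood to zero gives the
        sample mean and the (biased) sample variance.
        """
        mu = x.mean()                    # d/d(mu)  log L = 0  ->  mu  = (1/N) sum x_i
        var = ((x - mu) ** 2).mean()     # d/d(var) log L = 0  ->  var = (1/N) sum (x_i - mu)^2
        return mu, var

    # Illustrative usage with synthetic data
    x = np.random.default_rng(0).normal(loc=2.0, scale=1.5, size=1000)
    mu_hat, var_hat = gaussian_ml_estimate(x)
    print(mu_hat, var_hat)

For mixture densities and HMMs, no such closed-form solution exists for the incomplete-data likelihood, which is exactly why the EM algorithm below is needed.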

Basic EM
- Two main applications of the EM algorithm:
  - The first occurs when the data indeed has missing values, due to problems with or limitations of the observation process.
  - The second occurs when optimizing the likelihood function is analytically intractable, but the likelihood function can be simplified by assuming the existence of, and values for, additional but missing (or hidden) parameters. This second case is more common in the computational pattern recognition community.

Basic EM (cont.)
- Assume a complete data set $Z = (\mathcal{X}, \mathcal{Y})$, where $\mathcal{X}$ are the observed values and $\mathcal{Y}$ are the missing, unknown values
- Joint density function: $p(\mathbf{z}|\Theta) = p(\mathbf{x}, \mathbf{y}|\Theta) = p(\mathbf{y}|\mathbf{x}, \Theta)\, p(\mathbf{x}|\Theta)$
- Complete-data likelihood function: $\mathcal{L}(\Theta|Z) = \mathcal{L}(\Theta|\mathcal{X}, \mathcal{Y}) = p(\mathcal{X}, \mathcal{Y}|\Theta)$; here $\mathcal{X}$ is a constant and $\mathcal{Y}$ is a random variable
- Incomplete-data likelihood function: $\mathcal{L}(\Theta|\mathcal{X}) = p(\mathcal{X}|\Theta)$

Basic EM (cont.)
- The EM algorithm first finds the expected value of the complete-data log-likelihood function
- E-step: evaluation of the expectation
  $Q(\Theta, \Theta^{(i-1)}) = E\big[\log p(\mathcal{X}, \mathcal{Y}|\Theta) \,\big|\, \mathcal{X}, \Theta^{(i-1)}\big] = \int_{\mathbf{y} \in \Upsilon} \log p(\mathcal{X}, \mathbf{y}|\Theta)\; f(\mathbf{y}|\mathcal{X}, \Theta^{(i-1)})\, d\mathbf{y}$
  where $\Theta$ are the new parameters to be optimized and $\Theta^{(i-1)}$ are the current parameter estimates
- $\mathcal{X}$ and $\Theta^{(i-1)}$ are constants; $\mathcal{Y}$ is a random variable governed by the distribution $f(\mathbf{y}|\mathcal{X}, \Theta^{(i-1)})$
- Recall: $\Upsilon$ is the space of values $\mathbf{y}$ can take

Basic EM (cont.)
- M-step: maximize the expectation computed in the E-step,
  $\Theta^{(i)} = \arg\max_{\Theta} Q(\Theta, \Theta^{(i-1)})$
- These two steps are repeated as necessary. Each iteration is guaranteed to increase the log-likelihood, and the algorithm is guaranteed to converge to a local maximum of the likelihood function.
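
The alternation just described fits a short generic skeleton. This is a minimal sketch, not from the slides; e_step, m_step, and log_likelihood are hypothetical placeholders for the model-specific computations (they are filled in for Gaussian mixtures and HMMs later in the deck).

    import numpy as np

    def em(x, theta, e_step, m_step, log_likelihood, tol=1e-6, max_iter=200):
        """Generic EM loop: alternate E- and M-steps until the
        incomplete-data log-likelihood stops improving (it can only go up)."""
        prev_ll = -np.inf
        for _ in range(max_iter):
            stats = e_step(x, theta)        # expected complete-data statistics
            theta = m_step(x, stats)        # parameters maximizing Q(theta, theta_old)
            ll = log_likelihood(x, theta)   # incomplete-data log-likelihood
            if ll - prev_ll < tol:          # monotone convergence to a local maximum
                break
            prev_ll = ll
        return theta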

Maximizing the objective function
(Figure: the objective function, with the current model parameters marked.)

Finding the general form of the auxiliary function
(Figure: the objective function together with an auxiliary function lying below it.)

Finding the "best" auxiliary function at the current parameters
(Figure: the objective function and the "best" auxiliary function, which touches it at the current parameters.)

Finding the global maximum of the auxiliary function
(Figure: the objective function and the maximizer of the auxiliary function.)

Finding the global maximum of the auxiliary function (cont.)
(Figure: the objective function evaluated at the newly found parameters.)

Repeating the previous steps
(Figure: the objective function as the procedure is iterated from the new parameters.)

Finding the "best" auxiliary function at the new parameters
(Figure: the objective function and a new "best" auxiliary function touching it at the updated parameters.)

Finding the global maximum of the auxiliary function
(Figure: the objective function after another maximization step, approaching a local maximum.)
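
The figures above illustrate the standard auxiliary-function (lower-bound) view of EM. As a worked statement of that idea (reconstructed here, not taken verbatim from the slides), Jensen's inequality gives, for any distribution $q(\mathbf{y})$ over the hidden values,

    \log p(\mathcal{X}|\Theta)
      = \log \sum_{\mathbf{y}} p(\mathcal{X}, \mathbf{y}|\Theta)
      = \log \sum_{\mathbf{y}} q(\mathbf{y}) \frac{p(\mathcal{X}, \mathbf{y}|\Theta)}{q(\mathbf{y})}
      \;\ge\; \sum_{\mathbf{y}} q(\mathbf{y}) \log \frac{p(\mathcal{X}, \mathbf{y}|\Theta)}{q(\mathbf{y})}
      \;=\; F(q, \Theta).

Choosing $q(\mathbf{y}) = f(\mathbf{y}|\mathcal{X}, \Theta^{(i-1)})$ makes the bound touch the objective function at the current parameters (the "best" auxiliary function), and maximizing the bound over $\Theta$ is exactly the M-step, since $F(q, \Theta)$ then equals $Q(\Theta, \Theta^{(i-1)})$ plus a term that does not depend on $\Theta$.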

Finding Maximum Likelihood Mixture Densities Parameters via EM
- Probabilistic model (mixture density): $p(\mathbf{x}|\Theta) = \sum_{i=1}^{M} \alpha_i\, p_i(\mathbf{x}|\theta_i)$, where the $p_i$ are the $M$ component densities mixed together with $M$ mixing coefficients $\alpha_i$ satisfying $\sum_{i=1}^{M} \alpha_i = 1$
- Incomplete-data log-likelihood expression:
  $\log \mathcal{L}(\Theta|\mathcal{X}) = \sum_{i=1}^{N} \log p(\mathbf{x}_i|\Theta) = \sum_{i=1}^{N} \log \Big( \sum_{j=1}^{M} \alpha_j\, p_j(\mathbf{x}_i|\theta_j) \Big)$
  which is difficult to optimize because it contains the log of a sum.

Finding Maximum Likelihood Mixture Densities Parameters via EM (cont.)
- If we know the values of $\mathbf{y}$ (i.e., $y_i \in \{1, \ldots, M\}$ identifies which component generated $\mathbf{x}_i$), the likelihood becomes
  $\log \mathcal{L}(\Theta|\mathcal{X}, \mathbf{y}) = \sum_{i=1}^{N} \log\big( \alpha_{y_i}\, p_{y_i}(\mathbf{x}_i|\theta_{y_i}) \big)$
- If we do not know the values of $\mathbf{y}$, we treat $\mathbf{y}$ as a random vector and compute its posterior under a current parameter guess $\Theta^{g}$:
  $p(y_i = l \,|\, \mathbf{x}_i, \Theta^{g}) = \dfrac{\alpha_l^{g}\, p_l(\mathbf{x}_i|\theta_l^{g})}{\sum_{k=1}^{M} \alpha_k^{g}\, p_k(\mathbf{x}_i|\theta_k^{g})}$

Finding Maximum Likelihood Mixture Densities Parameters via EM (cont.)
- E-step: the expected complete-data log-likelihood simplifies to
  $Q(\Theta, \Theta^{g}) = \sum_{l=1}^{M} \sum_{i=1}^{N} \log\big(\alpha_l\, p_l(\mathbf{x}_i|\theta_l)\big)\, p(l \,|\, \mathbf{x}_i, \Theta^{g})$
  using the fact that the posterior component probabilities sum to one, $\sum_{l=1}^{M} p(l \,|\, \mathbf{x}_i, \Theta^{g}) = 1$

Finding Maximum Likelihood Mixture Densities Parameters via EM (cont.)
- E-step: compute the posteriors $p(l \,|\, \mathbf{x}_i, \Theta^{g})$ as above
- M-step for the mixing coefficients: add a Lagrange multiplier $\lambda$ enforcing the constraint $\sum_{l} \alpha_l = 1$ and maximize, which yields
  $\alpha_l^{\text{new}} = \dfrac{1}{N} \sum_{i=1}^{N} p(l \,|\, \mathbf{x}_i, \Theta^{g})$
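
As a worked version of the Lagrange-multiplier step (reconstructed here, since the slide's equations did not survive extraction), differentiate the constrained objective with respect to $\alpha_l$ and set it to zero:

    \frac{\partial}{\partial \alpha_l}
      \left[ \sum_{l=1}^{M} \sum_{i=1}^{N} \log(\alpha_l)\, p(l|\mathbf{x}_i, \Theta^{g})
           + \lambda \Big( \sum_{l=1}^{M} \alpha_l - 1 \Big) \right]
      = \frac{1}{\alpha_l} \sum_{i=1}^{N} p(l|\mathbf{x}_i, \Theta^{g}) + \lambda = 0 .

Summing this over $l$ and using $\sum_l \alpha_l = 1$ together with $\sum_l p(l|\mathbf{x}_i, \Theta^{g}) = 1$ gives $\lambda = -N$, and therefore $\alpha_l^{\text{new}} = \frac{1}{N}\sum_{i=1}^{N} p(l|\mathbf{x}_i, \Theta^{g})$.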

Finding Maximum Likelihood Mixture Densities Parameters via EM (cont.)
- E-step: compute the posteriors $p(l \,|\, \mathbf{x}_i, \Theta^{g})$ as above
- M-step for the means of Gaussian components: recall the Gaussian component density (*); take the derivative of $Q$ with respect to $\mu_l$, set it to zero, substitute back into (*), and rearrange to obtain
  $\mu_l^{\text{new}} = \dfrac{\sum_{i=1}^{N} \mathbf{x}_i\, p(l \,|\, \mathbf{x}_i, \Theta^{g})}{\sum_{i=1}^{N} p(l \,|\, \mathbf{x}_i, \Theta^{g})}$

Finding Maximum Likelihood Mixture Densities Parameters via EM (cont.)
- E-step: compute the posteriors $p(l \,|\, \mathbf{x}_i, \Theta^{g})$ as above
- M-step for the covariances: differentiating (*) with respect to $\Sigma_l$ and setting the result to zero gives
  $\Sigma_l^{\text{new}} = \dfrac{\sum_{i=1}^{N} p(l \,|\, \mathbf{x}_i, \Theta^{g})\, (\mathbf{x}_i - \mu_l^{\text{new}})(\mathbf{x}_i - \mu_l^{\text{new}})^{T}}{\sum_{i=1}^{N} p(l \,|\, \mathbf{x}_i, \Theta^{g})}$
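
Putting the three update formulas together, below is a minimal NumPy sketch of EM for a Gaussian mixture (my own illustration under the formulas above, not code from the slides); variable names such as resp for the posteriors $p(l|\mathbf{x}_i, \Theta^{g})$ are assumptions.

    import numpy as np
    from scipy.stats import multivariate_normal

    def gmm_em(X, M, n_iter=100, seed=0):
        """EM for a mixture of M Gaussians on data X of shape (N, d)."""
        rng = np.random.default_rng(seed)
        N, d = X.shape
        alpha = np.full(M, 1.0 / M)                    # mixing coefficients
        mu = X[rng.choice(N, M, replace=False)]        # initial means: random data points
        Sigma = np.array([np.cov(X.T) + 1e-6 * np.eye(d) for _ in range(M)])

        for _ in range(n_iter):
            # E-step: posterior responsibility of component l for point i
            dens = np.stack([alpha[l] * multivariate_normal.pdf(X, mu[l], Sigma[l])
                             for l in range(M)], axis=1)         # shape (N, M)
            resp = dens / dens.sum(axis=1, keepdims=True)

            # M-step: reestimate alpha, mu, Sigma from the responsibilities
            Nl = resp.sum(axis=0)                                 # effective counts per component
            alpha = Nl / N
            mu = (resp.T @ X) / Nl[:, None]
            for l in range(M):
                diff = X - mu[l]
                Sigma[l] = (resp[:, l, None] * diff).T @ diff / Nl[l] + 1e-6 * np.eye(d)
        return alpha, mu, Sigma

Each iteration of this loop increases the incomplete-data log-likelihood, as guaranteed by the EM theory above; on well-separated clusters the recovered means land near the cluster centers.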

Learning the parameters of an HMM, EM, and the Baum-Welch algorithm
- A Hidden Markov Model is a probabilistic model of the joint probability of a collection of random variables $\{O_1, \ldots, O_T, Q_1, \ldots, Q_T\}$: the observations $O_t$ (continuous or discrete) and the states $Q_t$, which are "hidden" and discrete
- The model is $\lambda = (A, B, \pi)$: the state transition probabilities $A$, the observation (emission) probabilities $B$, and the initial state distribution $\pi$
- Two assumptions:
  - First-order assumption: $P(Q_t \,|\, Q_{t-1}, \ldots, Q_1) = P(Q_t \,|\, Q_{t-1})$
  - Output independence assumption: the observation at time $t$ depends only on the state at time $t$
- Three basic problems: evaluation, decoding, and learning; the learning problem is what EM (Baum-Welch) solves

Learning the parameters of an HMM, EM, and the Baum-Welch algorithm (cont.)
- Estimation formulas are derived using the Q function
- Incomplete-data likelihood function: $P(\mathbf{O} \,|\, \lambda)$
- Complete-data likelihood function: $P(\mathbf{O}, \mathbf{q} \,|\, \lambda)$, where $\mathbf{q} = (q_1, \ldots, q_T)$ is the hidden state sequence
- E-step: since the hidden states are discrete, the Q function is a sum rather than an integral,
  $Q(\lambda, \lambda') = \sum_{\mathbf{q} \in \mathcal{Q}} \log P(\mathbf{O}, \mathbf{q} \,|\, \lambda)\; P(\mathbf{O}, \mathbf{q} \,|\, \lambda')$
- We know $P(\mathbf{O}, \mathbf{q} \,|\, \lambda) = \pi_{q_0} \prod_{t=1}^{T} a_{q_{t-1} q_t}\, b_{q_t}(o_t)$
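
Substituting this factored form of $P(\mathbf{O}, \mathbf{q}|\lambda)$ into the Q function splits it into three independent terms; this is the key step the reestimation formulas on the next slides rely on (reconstructed here from the standard derivation):

    Q(\lambda, \lambda')
      = \sum_{\mathbf{q}} \log \pi_{q_0}\; P(\mathbf{O}, \mathbf{q}|\lambda')
      + \sum_{\mathbf{q}} \Big( \sum_{t=1}^{T} \log a_{q_{t-1} q_t} \Big) P(\mathbf{O}, \mathbf{q}|\lambda')
      + \sum_{\mathbf{q}} \Big( \sum_{t=1}^{T} \log b_{q_t}(o_t) \Big) P(\mathbf{O}, \mathbf{q}|\lambda') .

Each term involves a different subset of the parameters ($\pi$, $A$, $B$), so each can be maximized separately subject to its own sum-to-one constraint, which is where the Lagrange multipliers in the following slides come in.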

Learning the parameters of an HMM, EM, and the Baum-Welch algorithm (cont.)
- M-step in the discrete case, initial state probabilities: add a Lagrange multiplier for the constraint $\sum_{i} \pi_i = 1$ and maximize the first term of $Q$, giving
  $\pi_i = \dfrac{P(\mathbf{O}, q_0 = i \,|\, \lambda')}{P(\mathbf{O} \,|\, \lambda')}$

Learning the parameters of an HMM, EM, and the Baum-Welch algorithm (cont.)
- M-step in the discrete case, transition probabilities: add a Lagrange multiplier for the constraint $\sum_{j} a_{ij} = 1$ and maximize the second term of $Q$, giving
  $a_{ij} = \dfrac{\sum_{t=1}^{T} P(\mathbf{O}, q_{t-1} = i, q_t = j \,|\, \lambda')}{\sum_{t=1}^{T} P(\mathbf{O}, q_{t-1} = i \,|\, \lambda')}$

Learning the parameters of an HMM, EM, and the Baum-Welch algorithm (cont.)
- M-step in the discrete case, observation probabilities: add a Lagrange multiplier for the constraint $\sum_{k} b_i(k) = 1$ and maximize the third term of $Q$, giving
  $b_i(k) = \dfrac{\sum_{t=1}^{T} P(\mathbf{O}, q_t = i \,|\, \lambda')\, \delta_{o_t, v_k}}{\sum_{t=1}^{T} P(\mathbf{O}, q_t = i \,|\, \lambda')}$
  where $\delta_{o_t, v_k} = 1$ if observation $o_t$ equals symbol $v_k$ and $0$ otherwise

Learning the parameters of an HMM, EM, and the Baum-Welch algorithm (cont.)
- M-step in the continuous case: the observation densities are Gaussian mixtures, $b_j(\mathbf{o}) = \sum_{l=1}^{M} c_{jl}\, \mathcal{N}(\mathbf{o};\, \mu_{jl}, \Sigma_{jl})$, so the mixture component used by each state at each time is treated as an additional hidden variable
- Writing $\gamma_{jl}(t)$ for the probability of being in state $j$ at time $t$ with component $l$ accounting for $\mathbf{o}_t$, the reestimation formulas are
  $c_{jl} = \dfrac{\sum_{t=1}^{T} \gamma_{jl}(t)}{\sum_{t=1}^{T} \sum_{l=1}^{M} \gamma_{jl}(t)}$,
  $\mu_{jl} = \dfrac{\sum_{t=1}^{T} \gamma_{jl}(t)\, \mathbf{o}_t}{\sum_{t=1}^{T} \gamma_{jl}(t)}$,
  $\Sigma_{jl} = \dfrac{\sum_{t=1}^{T} \gamma_{jl}(t)\, (\mathbf{o}_t - \mu_{jl})(\mathbf{o}_t - \mu_{jl})^{T}}{\sum_{t=1}^{T} \gamma_{jl}(t)}$
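
To tie the discrete-case formulas to something executable, here is a minimal Baum-Welch sketch for a discrete-observation HMM (my own illustration under the usual forward-backward formulation, not code from the slides). It uses the scaled posteriors gamma and xi in place of the unnormalized $P(\mathbf{O}, \cdot \,|\, \lambda')$ quantities, which yields the same ratios.

    import numpy as np

    def baum_welch(obs, n_states, n_symbols, n_iter=50, seed=0):
        """Baum-Welch (EM) for a discrete HMM. obs: 1-D sequence of symbol indices."""
        rng = np.random.default_rng(seed)
        obs = np.asarray(obs)
        T = len(obs)
        pi = rng.dirichlet(np.ones(n_states))
        A = rng.dirichlet(np.ones(n_states), size=n_states)      # A[i, j] = P(state j | state i)
        B = rng.dirichlet(np.ones(n_symbols), size=n_states)     # B[i, k] = P(symbol v_k | state i)

        for _ in range(n_iter):
            # Forward pass (alpha) and backward pass (beta), with per-step scaling
            alpha = np.zeros((T, n_states)); scale = np.zeros(T)
            alpha[0] = pi * B[:, obs[0]]
            scale[0] = alpha[0].sum(); alpha[0] /= scale[0]
            for t in range(1, T):
                alpha[t] = (alpha[t - 1] @ A) * B[:, obs[t]]
                scale[t] = alpha[t].sum(); alpha[t] /= scale[t]
            beta = np.zeros((T, n_states))
            beta[-1] = 1.0
            for t in range(T - 2, -1, -1):
                beta[t] = (A @ (B[:, obs[t + 1]] * beta[t + 1])) / scale[t + 1]

            # E-step: state posteriors gamma and pairwise posteriors xi
            gamma = alpha * beta
            gamma /= gamma.sum(axis=1, keepdims=True)
            xi = np.zeros((T - 1, n_states, n_states))
            for t in range(T - 1):
                xi[t] = (alpha[t][:, None] * A * B[:, obs[t + 1]] * beta[t + 1]) / scale[t + 1]
                xi[t] /= xi[t].sum()

            # M-step: the discrete reestimation formulas from the slides above
            pi = gamma[0]
            A = xi.sum(axis=0) / gamma[:-1].sum(axis=0)[:, None]
            B = np.zeros_like(B)
            for k in range(n_symbols):
                B[:, k] = gamma[obs == k].sum(axis=0)
            B /= gamma.sum(axis=0)[:, None]
        return pi, A, B

For the continuous (Gaussian mixture) case, the same E-step posteriors are further split per mixture component to obtain $\gamma_{jl}(t)$, and the M-step applies the mixture reestimation formulas given on this slide.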