EM Algorithm
Lecturer: 虞台文

Contents
- Introduction
- Example: Missing Data
- Example: Mixed Attributes
- Example: Mixture
- Main Body
- Mixture Model
- EM-Algorithm on GMM

EM Algorithm Introduction

Introduction EM is typically used to compute maximum likelihood estimates given incomplete samples. The EM algorithm estimates the parameters of a model iteratively. Starting from some initial guess, each iteration consists of an E step (Expectation step) and an M step (Maximization step).

Applications
- Filling in missing data in samples
- Discovering the values of latent variables
- Estimating the parameters of HMMs
- Estimating the parameters of finite mixtures
- Unsupervised learning of clusters
- …

EM Algorithm Example: Missing Data

Univariate Normal Sample. Sampling: $x_1, x_2, \dots, x_n \sim N(\mu, \sigma^2)$, with $\mu$ and $\sigma^2$ unknown.

Maximum Likelihood. Sampling $x_1, \dots, x_n \sim N(\mu, \sigma^2)$; the likelihood is $L(\mu, \sigma^2 \mid \mathbf{x}) = \prod_{i=1}^{n} \frac{1}{\sqrt{2\pi}\,\sigma} \exp\!\Big(-\frac{(x_i-\mu)^2}{2\sigma^2}\Big)$. Given $\mathbf{x}$, it is a function of $\mu$ and $\sigma^2$; we want to maximize it.

Log-Likelihood Function. Maximize this instead: $\ell(\mu, \sigma^2) = \log L = -\frac{n}{2}\log(2\pi\sigma^2) - \frac{1}{2\sigma^2}\sum_{i=1}^{n}(x_i-\mu)^2$, by setting $\partial\ell/\partial\mu = 0$ and $\partial\ell/\partial\sigma^2 = 0$.

Max. the Log-Likelihood Function. Setting $\partial\ell/\partial\mu = 0$ gives $\hat{\mu} = \frac{1}{n}\sum_{i=1}^{n} x_i$.

Max. the Log-Likelihood Function. Setting $\partial\ell/\partial\sigma^2 = 0$ gives $\hat{\sigma}^2 = \frac{1}{n}\sum_{i=1}^{n}(x_i - \hat{\mu})^2$.
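As a quick numerical check of these two closed forms, here is a small numpy sketch; the synthetic sample and its true parameters are arbitrary and only for illustration.

```python
import numpy as np

# Arbitrary synthetic sample from N(mu=5, sigma=2).
x = np.random.normal(loc=5.0, scale=2.0, size=1000)

mu_hat = x.mean()                        # ML estimate of mu
sigma2_hat = ((x - mu_hat) ** 2).mean()  # ML estimate of sigma^2 (divides by n, not n-1)
print(mu_hat, sigma2_hat)
```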

Missing Data. Sampling: $x_1, \dots, x_m$ observed and $x_{m+1}, \dots, x_n$ missing, all drawn from $N(\mu, \sigma^2)$.

E-Step. Let $\theta^{(t)} = (\mu^{(t)}, \sigma^{2(t)})$ be the estimated parameters at the start of the $t$-th iteration.

E-Step. For each missing value $x_j$, compute its expected sufficient statistics under $\theta^{(t)}$: $E[x_j \mid \theta^{(t)}] = \mu^{(t)}$ and $E[x_j^2 \mid \theta^{(t)}] = \mu^{(t)2} + \sigma^{2(t)}$.

M-Step. Re-estimate the parameters from the observed data plus the expected statistics of the missing data: $\mu^{(t+1)} = \frac{1}{n}\Big(\sum_{i=1}^{m} x_i + (n-m)\,\mu^{(t)}\Big)$ and $\sigma^{2(t+1)} = \frac{1}{n}\Big(\sum_{i=1}^{m} x_i^2 + (n-m)\big(\mu^{(t)2} + \sigma^{2(t)}\big)\Big) - \mu^{(t+1)2}$.

Exercise. $n = 40$ (10 data missing). Estimate $\mu$ and $\sigma^2$ using different initial conditions. Observed data:
375.081556 362.275902 332.612068 351.383048 304.823174 386.438672
430.079689 395.317406 369.029845 365.343938 243.548664 382.789939
374.419161 337.289831 418.928822 364.086502 343.854855 371.279406
439.241736 338.281616 454.981077 479.685107 336.634962 407.030453
297.821512 311.267105 528.267783 419.841982 392.684770 301.910093
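Below is a minimal Python sketch of the E-step/M-step updates above applied to this exercise; the initial guess is arbitrary, so try different ones as the exercise asks.

```python
import numpy as np

# Observed data from the exercise (30 of the 40 samples; 10 are missing).
x_obs = np.array([
    375.081556, 362.275902, 332.612068, 351.383048, 304.823174, 386.438672,
    430.079689, 395.317406, 369.029845, 365.343938, 243.548664, 382.789939,
    374.419161, 337.289831, 418.928822, 364.086502, 343.854855, 371.279406,
    439.241736, 338.281616, 454.981077, 479.685107, 336.634962, 407.030453,
    297.821512, 311.267105, 528.267783, 419.841982, 392.684770, 301.910093,
])

n, m = 40, len(x_obs)          # total samples, observed samples
mu, sigma2 = 300.0, 100.0      # initial guess (try different ones)

for t in range(100):
    # E-step: expected sufficient statistics of the missing values.
    s1 = x_obs.sum() + (n - m) * mu                        # E[sum of x_i]
    s2 = (x_obs ** 2).sum() + (n - m) * (mu**2 + sigma2)   # E[sum of x_i^2]
    # M-step: re-estimate the parameters from the completed statistics.
    mu = s1 / n
    sigma2 = s2 / n - mu**2

print(mu, sigma2)
```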

EM Algorithm Example: Mixed Attributes

Multinomial Population. Sampling: N samples from a multinomial population whose category probabilities are functions of a parameter $\theta$.

Maximum Likelihood. Sampling: N samples.

Maximum Likelihood. Sampling: N samples. Given the data, the likelihood is a function of $\theta$; we want to maximize it.

Log-Likelihood

Mixed Attributes. Sampling: N samples, but the count $x_3$ is not available (missing).

E-Step. Sampling: N samples; $x_3$ is not available. Given $\theta^{(t)}$, what can you say about $x_3$? Replace it by its conditional expectation given the observed counts and $\theta^{(t)}$.

M-Step

Exercise. Estimate $\theta$ using different initial conditions.

EM Algorithm Example: Mixture

Binomial/Poisson Mixture. M: married obasong (auntie); X: # children.
# Children:   0    1    2    3    4    5    6
# Obasongs:   n0   n1   n2   n3   n4   n5   n6
The population is a mixture of married obasongs, whose numbers of children follow a Poisson distribution, and unmarried obasongs, who have no children.

Binomial/Poisson Mixture. M: married obasong; X: # children.
# Children:   0    1    2    3    4    5    6
# Obasongs:   n0   n1   n2   n3   n4   n5   n6
Unobserved data: nA = # married obasongs and nB = # unmarried obasongs among the n0 obasongs with no children (nA + nB = n0).

Binomial/Poisson Mixture. M: married obasong; X: # children.
# Children:     0    1    2    3    4    5    6
# Obasongs:     n0   n1   n2   n3   n4   n5   n6
Complete data:  nA   nB   n1   n2   n3   n4   n5   n6
Probability:    pA   pB   p1   p2   p3   p4   p5   p6

Binomial/Poisson Mixture.
# Children:     0    1    2    3    4    5    6
# Obasongs:     n0   n1   n2   n3   n4   n5   n6
Complete data:  nA   nB   n1   n2   n3   n4   n5   n6
Probability:    pA   pB   p1   p2   p3   p4   p5   p6

Complete Data Likelihood.
Complete data:  nA   nB   n1   n2   n3   n4   n5   n6
Probability:    pA   pB   p1   p2   p3   p4   p5   p6
The complete-data likelihood is multinomial: $L \propto p_A^{\,n_A}\, p_B^{\,n_B} \prod_{k=1}^{6} p_k^{\,n_k}$.

Complete Data Likelihood.
Complete data:  nA   nB   n1   n2   n3   n4   n5   n6
Probability:    pA   pB   p1   p2   p3   p4   p5   p6
Maximizing this likelihood requires nA and nB, which are unobserved; EM supplies their expected values.

Log-Likelihood

Maximization

Maximization

E-Step. Given $\pi^{(t)}$ (probability of being married) and $\lambda^{(t)}$ (Poisson rate for a married obasong's number of children), split the zero-children cell into its expected parts: $n_A^{(t)} = n_0\, \dfrac{\pi^{(t)} e^{-\lambda^{(t)}}}{\pi^{(t)} e^{-\lambda^{(t)}} + \big(1 - \pi^{(t)}\big)}$, $n_B^{(t)} = n_0 - n_A^{(t)}$.

M-Step. $\pi^{(t+1)} = \dfrac{n_A^{(t)} + \sum_{k\ge 1} n_k}{N}$, $\lambda^{(t+1)} = \dfrac{\sum_{k\ge 1} k\, n_k}{n_A^{(t)} + \sum_{k\ge 1} n_k}$, where $N = n_0 + \sum_{k\ge 1} n_k$.

Example.
# Children:   0      1    2    3    4   5  6
# Obasongs:   3,062  587  284  103  33  4  2
t    weight     lambda      nA          nB
0    0.750000   0.400000    2502.779    559.221
1    0.614179   1.035478    2503.591    558.409
2    0.614378   1.036013    2504.219    557.781
3    0.614532   1.036427    2504.705    557.295
4    0.614652   1.036748    2505.081    556.919
5    0.614744   1.036996    2505.371    556.629
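Below is a minimal Python sketch of the E-step/M-step above applied to the obasong counts; the parameterization (pi = probability of being married, lam = Poisson rate for married obasongs' children) and the initial guess are illustrative assumptions.

```python
import numpy as np

counts = np.array([3062, 587, 284, 103, 33, 4, 2])  # n0..n6 from the table above
k = np.arange(len(counts))
N = counts.sum()                     # total number of obasongs
n0 = counts[0]                       # obasongs with no children
n_pos = counts[1:].sum()             # obasongs with at least one child
total_children = (k * counts).sum()

pi, lam = 0.5, 1.0                   # illustrative initial guess

for t in range(20):
    # E-step: split the zero cell into expected married / unmarried parts.
    pA = pi * np.exp(-lam)               # P(married and 0 children)
    nA = n0 * pA / (pA + (1.0 - pi))     # expected married obasongs with no children
    nB = n0 - nA                         # expected unmarried obasongs
    # M-step: re-estimate the mixing weight and the Poisson rate.
    pi = (nA + n_pos) / N
    lam = total_children / (nA + n_pos)
    print(t, pi, lam, nA, nB)
```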

EM Algorithm Main Body

Maximum Likelihood. Given i.i.d. data $\mathcal{X} = \{x_1, \dots, x_N\}$ drawn from a density $p(x \mid \Theta)$, the ML estimate is $\Theta^{*} = \arg\max_{\Theta} \log p(\mathcal{X} \mid \Theta) = \arg\max_{\Theta} \sum_{i=1}^{N} \log p(x_i \mid \Theta)$.

Latent Variables. Incomplete data $\mathcal{X}$ plus latent variables $\mathcal{Y}$ form the complete data $\mathcal{Z} = (\mathcal{X}, \mathcal{Y})$.

Complete Data Likelihood. $L(\Theta \mid \mathcal{X}, \mathcal{Y}) = p(\mathcal{X}, \mathcal{Y} \mid \Theta)$.

Complete Data Likelihood. $L(\Theta \mid \mathcal{X}, \mathcal{Y})$ is a function of the latent variables $\mathcal{Y}$ and the parameter $\Theta$. Viewed as a function of $\Theta$, it is the quantity we want to maximize; viewed as a function of the random variable $\mathcal{Y}$, the result is itself random. If we are given a current estimate of $\Theta$, its expectation over $\mathcal{Y}$ is computable.

Expectation Step. Let $\Theta^{(i-1)}$ be the parameter vector obtained at the $(i-1)$-th step. Define $Q\big(\Theta, \Theta^{(i-1)}\big) = E\big[\log p(\mathcal{X}, \mathcal{Y} \mid \Theta) \,\big|\, \mathcal{X}, \Theta^{(i-1)}\big]$.

Maximization Step. Let $\Theta^{(i-1)}$ be the parameter vector obtained at the $(i-1)$-th step. Define $\Theta^{(i)} = \arg\max_{\Theta} Q\big(\Theta, \Theta^{(i-1)}\big)$.
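Schematically, the two steps form a simple loop. The sketch below is a generic Python skeleton in which e_step and m_step are hypothetical placeholders for the model-specific computations, and the parameter vector is represented as a tuple of scalars.

```python
def em(x, theta, e_step, m_step, n_iter=100, tol=1e-8):
    """Generic EM loop: alternate the E-step and M-step until the parameters stop moving.

    e_step(x, theta) returns the expected complete-data statistics that define Q(., theta);
    m_step(x, stats) returns the theta that maximizes that expectation.
    """
    for _ in range(n_iter):
        stats = e_step(x, theta)        # Expectation step
        new_theta = m_step(x, stats)    # Maximization step
        if max(abs(a - b) for a, b in zip(new_theta, theta)) < tol:
            theta = new_theta
            break
        theta = new_theta
    return theta
```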

EM Algorithm Mixture Model

Mixture Models. If there is a reason to believe that a data set is composed of several distinct populations, a mixture model can be used. It has the following form: $p(x \mid \Theta) = \sum_{j=1}^{M} \alpha_j\, p_j(x \mid \theta_j)$, with $\sum_{j=1}^{M} \alpha_j = 1$.

Mixture Models. Let $y_i \in \{1, \dots, M\}$ represent the source that generates the data point $x_i$.

Mixture Models. Let $y_i \in \{1, \dots, M\}$ represent the source that generates $x_i$, so that $P(y_i = j) = \alpha_j$ and $p(x_i \mid y_i = j, \Theta) = p_j(x_i \mid \theta_j)$.

Mixture Models 

Mixture Models

Given x and , the conditional density of y can be computed. Mixture Models Given x and , the conditional density of y can be computed.

Complete-Data Likelihood Function. $\log L(\Theta \mid \mathcal{X}, \mathcal{Y}) = \sum_{i=1}^{N} \log\big(\alpha_{y_i}\, p_{y_i}(x_i \mid \theta_{y_i})\big)$.

Expectation g: Guess

Expectation g: Guess

Expectation. Introducing the indicator $\delta_{l, y_i}$, which is zero when $y_i \neq l$, lets the sum over sources be rewritten as a sum over $l = 1, \dots, M$.

Expectation

Expectation

Expectation. The remaining sum of posterior probabilities equals 1, leaving $Q(\Theta, \Theta^g) = \sum_{l=1}^{M} \sum_{i=1}^{N} \log\big(\alpha_l\, p_l(x_i \mid \theta_l)\big)\; p(l \mid x_i, \Theta^g)$.

Maximization. Given the initial guess $\Theta^g$, we want to find $\Theta$ that maximizes the above expectation $Q(\Theta, \Theta^g)$. In fact, this is done iteratively, each new estimate becoming the guess for the next round.

The GMM (Gaussian Mixture Model). Gaussian model of a $d$-dimensional source, say $j$: $p_j(\mathbf{x} \mid \boldsymbol{\mu}_j, \Sigma_j) = \frac{1}{(2\pi)^{d/2} |\Sigma_j|^{1/2}} \exp\!\Big(-\tfrac{1}{2}(\mathbf{x} - \boldsymbol{\mu}_j)^{T} \Sigma_j^{-1} (\mathbf{x} - \boldsymbol{\mu}_j)\Big)$. GMM with $M$ sources: $p(\mathbf{x} \mid \Theta) = \sum_{j=1}^{M} \alpha_j\, p_j(\mathbf{x} \mid \boldsymbol{\mu}_j, \Sigma_j)$.
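As a concrete illustration of these two formulas, here is a small Python sketch using numpy and scipy; the two-component parameters are made-up values, not part of the slides.

```python
import numpy as np
from scipy.stats import multivariate_normal

# Hypothetical 2-component GMM in d = 2 dimensions.
alphas = [0.3, 0.7]                                     # mixing weights, sum to 1
mus    = [np.array([0.0, 0.0]), np.array([3.0, 3.0])]   # component means
sigmas = [np.eye(2), 0.5 * np.eye(2)]                   # component covariances

def gmm_pdf(x):
    """p(x | Theta) = sum_j alpha_j * N(x; mu_j, Sigma_j)."""
    return sum(a * multivariate_normal.pdf(x, mean=m, cov=S)
               for a, m, S in zip(alphas, mus, sigmas))

print(gmm_pdf(np.array([1.0, 1.0])))
```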

EM Algorithm EM-Algorithm on GMM

Goal. Mixture model: to maximize $Q(\Theta, \Theta^g) = \sum_{l=1}^{M}\sum_{i=1}^{N} \log\big(\alpha_l\, p_l(x_i \mid \theta_l)\big)\; p(l \mid x_i, \Theta^g)$ subject to $\sum_{l=1}^{M} \alpha_l = 1$.

Goal. Splitting the logarithm, $Q(\Theta, \Theta^g) = \sum_{l=1}^{M}\sum_{i=1}^{N} \log \alpha_l\; p(l \mid x_i, \Theta^g) + \sum_{l=1}^{M}\sum_{i=1}^{N} \log p_l(x_i \mid \theta_l)\; p(l \mid x_i, \Theta^g)$. The first term involves the $\alpha_l$ only; the second term involves the $\theta_l$ only. Maximize each separately, subject to $\sum_{l=1}^{M} \alpha_l = 1$.

Finding l Due to the constraint on l’s, we introduce Lagrange Multiplier , and solve the following equation.

Finding l 1 N 1

Finding l

Finding $\mu_l$ and $\Sigma_l$. Considering the GMM, only the second term of $Q$ needs to be maximized; the parts of $\log p_l(x_i \mid \mu_l, \Sigma_l)$ unrelated to $\mu_l$ and $\Sigma_l$ can be ignored.

Finding l How? Therefore, we want to maximize: Only need to maximize this term Finding l Therefore, we want to maximize: How? knowledge on matrix algebra is needed. unrelated

Finding l Therefore, we want to maximize:

Summary: EM algorithm for GMM. Given an initial guess $\Theta^g$, find $\Theta^{\text{new}}$ as follows, and repeat with $\Theta^g \leftarrow \Theta^{\text{new}}$ while not converged: $p(l \mid x_i, \Theta^g) = \dfrac{\alpha_l^g\, p_l(x_i \mid \mu_l^g, \Sigma_l^g)}{\sum_{k=1}^{M} \alpha_k^g\, p_k(x_i \mid \mu_k^g, \Sigma_k^g)}$, $\alpha_l^{\text{new}} = \frac{1}{N}\sum_{i=1}^{N} p(l \mid x_i, \Theta^g)$, $\mu_l^{\text{new}} = \dfrac{\sum_{i=1}^{N} x_i\, p(l \mid x_i, \Theta^g)}{\sum_{i=1}^{N} p(l \mid x_i, \Theta^g)}$, $\Sigma_l^{\text{new}} = \dfrac{\sum_{i=1}^{N} p(l \mid x_i, \Theta^g)\,(x_i - \mu_l^{\text{new}})(x_i - \mu_l^{\text{new}})^{T}}{\sum_{i=1}^{N} p(l \mid x_i, \Theta^g)}$.
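A minimal numpy sketch of these update rules follows; the data matrix X and the initial guesses are placeholders, and a real implementation would also guard against degenerate covariances and monitor the log-likelihood for convergence.

```python
import numpy as np
from scipy.stats import multivariate_normal

def em_gmm(X, alphas, mus, sigmas, n_iter=100):
    """EM for a Gaussian mixture: X is (N, d); alphas, mus, sigmas are initial guesses."""
    N, d = X.shape
    M = len(alphas)
    for _ in range(n_iter):
        # E-step: responsibilities p(l | x_i, Theta^g), shape (N, M).
        r = np.column_stack([
            alphas[l] * multivariate_normal.pdf(X, mean=mus[l], cov=sigmas[l])
            for l in range(M)
        ])
        r /= r.sum(axis=1, keepdims=True)
        # M-step: re-estimate weights, means, and covariances.
        Nl = r.sum(axis=0)                      # effective counts per component
        alphas = Nl / N
        mus = [(r[:, l] @ X) / Nl[l] for l in range(M)]
        sigmas = [((X - mus[l]).T * r[:, l]) @ (X - mus[l]) / Nl[l] for l in range(M)]
    return alphas, mus, sigmas
```

In practice the initial guess matters; a common choice is to run k-means first and use its centroids as the initial means.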

Demonstration EM algorithm for Mixture models

Exercises
- Write a program to generate a multidimensional Gaussian distribution; draw the distribution for 2-dimensional data.
- Write a program to generate GMM data.
- Write the EM algorithm to analyze GMM data.
- Study more about the EM algorithm for mixtures.
- Find applications for the EM algorithm.
A starting-point sketch for the first two items follows.
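Here is a short numpy sketch for the first two exercises; the component parameters are arbitrary illustrations, and plotting is left to the reader.

```python
import numpy as np

rng = np.random.default_rng(0)

# Exercise 1: multidimensional Gaussian samples (2-D here).
mu = np.array([1.0, -1.0])
Sigma = np.array([[2.0, 0.6],
                  [0.6, 1.0]])
gauss_samples = rng.multivariate_normal(mu, Sigma, size=500)

# Exercise 2: GMM samples -- pick a component, then sample from it.
alphas = np.array([0.4, 0.6])
mus    = [np.array([0.0, 0.0]), np.array([4.0, 4.0])]
Sigmas = [np.eye(2), 0.5 * np.eye(2)]
labels = rng.choice(len(alphas), size=1000, p=alphas)
gmm_samples = np.array([rng.multivariate_normal(mus[l], Sigmas[l]) for l in labels])
```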

References
- Jeff A. Bilmes, A Gentle Tutorial of the EM Algorithm and its Application to Parameter Estimation for Gaussian Mixture and Hidden Markov Models, 1998.
- Sean Borman, The Expectation Maximization Algorithm: A Short Tutorial.