Expectation Maximization: A “Gentle” Introduction
Scott Morris, Department of Computer Science

Basic Premise
Given a set of observed data, X, what is the underlying model that produced X?
– Example: distributions such as Gaussian, Poisson, Uniform
Assume we know (or can intuit) what type of model produced the data.
The model has m parameters (Θ1..Θm).
– The parameters are unknown; we would like to estimate them.

Maximum Likelihood Estimators (MLE)
P(Θ|X) = the probability that a given set of parameters is “correct”??
Instead, define the “likelihood” of the parameters given the data, L(Θ|X).
What if the data are continuous?
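For concreteness, the standard definition (stated here as background, not reproduced from the slide images): for N independent, identically distributed observations x_1..x_N,

L(\Theta \mid X) = p(X \mid \Theta) = \prod_{i=1}^{N} p(x_i \mid \Theta)

For continuous data, p(x_i \mid \Theta) is a probability density rather than a probability mass, but the definition of the likelihood is unchanged.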

MLE continued
We are solving an optimization problem.
We often maximize the log of the likelihood instead.
– Why is this the same?
Any method that maximizes the likelihood function is called a Maximum Likelihood Estimator.
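To answer the question above: the logarithm is strictly increasing, so taking it does not move the location of the maximum, and for i.i.d. data it turns the product into a sum that is far easier to differentiate:

\log L(\Theta \mid X) = \sum_{i=1}^{N} \log p(x_i \mid \Theta),
\qquad
\arg\max_{\Theta} \log L(\Theta \mid X) = \arg\max_{\Theta} L(\Theta \mid X)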

Simple Example: Least Squares Fit
Input: N points in R^2
Model: a single line, y = ax + b
– Parameters: a, b
Origin? It is the Maximum Likelihood Estimator (see the sketch below).
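A minimal sketch of that connection, assuming i.i.d. Gaussian noise on y (the noise model, variable names, and use of scipy are illustrative choices, not from the slides): maximizing the Gaussian likelihood of the residuals recovers the same line as ordinary least squares.

```python
# A sketch of least squares as a Maximum Likelihood Estimator.
# Assumed model (not from the slides): y_i = a*x_i + b + eps_i, eps_i i.i.d. N(0, sigma^2).
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
x = np.linspace(0.0, 10.0, 50)
y = 2.0 * x + 1.0 + rng.normal(scale=1.5, size=x.shape)   # synthetic data with a = 2, b = 1

def neg_log_likelihood(params):
    a, b, log_sigma = params
    sigma = np.exp(log_sigma)            # parameterize by log(sigma) to keep sigma > 0
    resid = y - (a * x + b)
    # Gaussian negative log-likelihood, dropping the constant 0.5*N*log(2*pi)
    return 0.5 * np.sum(resid**2) / sigma**2 + x.size * np.log(sigma)

a_mle, b_mle = minimize(neg_log_likelihood, x0=[0.0, 0.0, 0.0]).x[:2]
a_lsq, b_lsq = np.polyfit(x, y, 1)       # ordinary least-squares fit of the same line

print("MLE (Gaussian noise):", a_mle, b_mle)
print("Least squares:       ", a_lsq, b_lsq)   # the two estimates agree up to optimizer tolerance
```

The point of the example is only that, under Gaussian noise, the MLE for (a, b) and the least-squares fit coincide.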

Expectation Maximization
An elaborate technique for maximizing the likelihood function.
Often used when the observed data are incomplete:
– due to problems in the observation process
– due to unknown or difficult distribution function(s)
It is an iterative process.
It is still a local technique (it converges to a local maximum of the likelihood).

EM likelihood function
Observed data X; assume missing data Y.
Let Z = (X, Y) be the complete data.
– Joint density function: p(z|Θ) = p(x,y|Θ) = p(y|x,Θ) p(x|Θ)
Define a new likelihood function L(Θ|Z) = p(X,Y|Θ), the complete-data likelihood.
X and Θ are constants, so L(Θ|Z) is a random variable depending on the random variable Y.
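By contrast, the likelihood of the observed data alone requires marginalizing out the missing data, which is what makes it awkward to maximize directly (a standard identity, written here in LaTeX):

L(\Theta \mid X) = p(X \mid \Theta) = \int p(X, y \mid \Theta)\, dy

(with the integral replaced by a sum when Y is discrete). EM sidesteps this by working with the complete-data likelihood above.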

“E” Step of EM Algorithm
Since L(Θ|Z) is itself a random variable, we can compute its expected value.
This can be thought of as computing the expected value of Y given the current estimate of Θ.
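The expectation in question is presumably the standard EM Q-function: the expected complete-data log-likelihood, averaged over the missing data Y given the observed data X and the current parameter estimate Θ^(i-1):

Q(\Theta, \Theta^{(i-1)}) = \mathrm{E}\big[\log p(X, Y \mid \Theta) \,\big|\, X, \Theta^{(i-1)}\big]
= \int \log p(X, y \mid \Theta)\; p(y \mid X, \Theta^{(i-1)})\, dy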

“M” Step of EM Algorithm
Once the expectation has been computed, maximize it with respect to Θ, just as in MLE.
Convergence
– Various results proving convergence are cited in the literature.
Generalized EM
– Instead of finding the optimal Θ, choose any Θ that increases the expected log-likelihood.
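In symbols (the standard form of the two variants):

\Theta^{(i)} = \arg\max_{\Theta} Q(\Theta, \Theta^{(i-1)}) \quad \text{(EM)}

Q(\Theta^{(i)}, \Theta^{(i-1)}) \ge Q(\Theta^{(i-1)}, \Theta^{(i-1)}) \quad \text{(GEM: any improvement suffices)}

Either choice guarantees that the observed-data likelihood does not decrease from one iteration to the next.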

Mixture Models
Assume the data come from a “mixture” of probability distributions.
The log-likelihood function is difficult to optimize directly, so use a trick:
– Assume unobserved data items Y whose values tell us which distribution generated each item in X.
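The mixture density and its log-likelihood (the standard forms, filling in for the slide's missing equations):

p(x \mid \Theta) = \sum_{j=1}^{M} \alpha_j\, p_j(x \mid \theta_j), \qquad \sum_{j=1}^{M} \alpha_j = 1

\log L(\Theta \mid X) = \sum_{i=1}^{N} \log \sum_{j=1}^{M} \alpha_j\, p_j(x_i \mid \theta_j)

The log of a sum over components does not decouple across parameters, which is why the hidden labels Y are introduced.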

Update Equations
After much derivation, we obtain estimates for the new parameters in terms of the old ones; see the sketch below.
– Θ = (μ, Σ), where μ is the mean and Σ is the covariance matrix of a d-dimensional normal distribution.
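A compact sketch of those updates for a Gaussian mixture, assuming the standard EM equations (responsibilities in the E step; weighted mixing proportions, means, and covariances in the M step). The function name, initialization, and the small regularization term are illustrative choices, not from the slides.

```python
# A sketch of the EM updates for a mixture of d-dimensional Gaussians
# (illustrative code, not from the slides). Standard update equations:
#   E step:  w[i, j] = alpha[j] * N(x_i | mu[j], Sigma[j]) / sum_k alpha[k] * N(x_i | mu[k], Sigma[k])
#   M step:  alpha[j] = mean_i w[i, j]
#            mu[j]    = sum_i w[i, j] * x_i / sum_i w[i, j]
#            Sigma[j] = sum_i w[i, j] * (x_i - mu[j])(x_i - mu[j])^T / sum_i w[i, j]
import numpy as np
from scipy.stats import multivariate_normal

def em_gmm(X, k, n_iter=100, seed=0):
    """Fit a k-component Gaussian mixture to the (N, d) array X by EM."""
    n, d = X.shape
    rng = np.random.default_rng(seed)
    alpha = np.full(k, 1.0 / k)                          # mixing weights
    mu = X[rng.choice(n, size=k, replace=False)].copy()  # means: k random data points
    Sigma = np.array([np.cov(X, rowvar=False) + 1e-6 * np.eye(d) for _ in range(k)])

    for _ in range(n_iter):
        # E step: responsibilities w[i, j] = P(component j generated x_i | current parameters)
        dens = np.column_stack([
            alpha[j] * multivariate_normal.pdf(X, mean=mu[j], cov=Sigma[j])
            for j in range(k)
        ])
        w = dens / dens.sum(axis=1, keepdims=True)

        # M step: re-estimate parameters from the responsibility-weighted data
        Nj = w.sum(axis=0)                               # effective number of points per component
        alpha = Nj / n
        mu = (w.T @ X) / Nj[:, None]
        for j in range(k):
            diff = X - mu[j]
            Sigma[j] = (w[:, j, None] * diff).T @ diff / Nj[j] + 1e-6 * np.eye(d)

    return alpha, mu, Sigma
```

Usage would be, for example, alpha, mu, Sigma = em_gmm(X, k=3) for an (N, d) data array X.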