Machine Learning, Saarland University, SS 2007
Holger Bast, Max-Planck-Institut für Informatik, Saarbrücken, Germany
Lecture 8, Friday, June 8th, 2007 (introduction to the EM algorithm)

Overview of this Lecture

Example application of the EM algorithm
– concept-based text search (from Lecture 1)

The maximum likelihood method
– Example 1: heads and tails
– Example 2: single Gaussian
– Example 3: mixture of two Gaussians

Idea of the EM algorithm
– analogy to k-means
– outline of the algorithm
– demo

Example: Concept-Based Search via EM

Model
– each document is generated from one of k concepts
– a probability distribution over the concepts: p_1, …, p_k with p_1 + … + p_k = 1
– for each concept i, a probability distribution over the m words: q_i1, …, q_im with q_i1 + … + q_im = 1

Goal
– compute the p_1, …, p_k and q_11, …, q_1m, …, q_k1, …, q_km that are “most likely” for the given data
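To make this generative model concrete, here is a minimal sketch (my own illustration, not part of the lecture; the function name sample_document and all numbers are made up) of how a document would be sampled under it:

```python
import random

def sample_document(p, q, doc_length=20):
    """Pick a concept i with probability p[i], then draw doc_length words
    (as word indices) from that concept's word distribution q[i]."""
    concept = random.choices(range(len(p)), weights=p, k=1)[0]
    words = random.choices(range(len(q[concept])), weights=q[concept], k=doc_length)
    return concept, words

# k = 2 concepts over m = 3 words (numbers purely illustrative)
p = [0.6, 0.4]                # p_1 + p_2 = 1
q = [[0.7, 0.2, 0.1],         # q_11, q_12, q_13
     [0.1, 0.3, 0.6]]         # q_21, q_22, q_23
print(sample_document(p, q))
```

EM will be used to go in the opposite direction: given only the words of the documents (the concept of each document is hidden), estimate the p_i and q_ij.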

Maximum Likelihood: Example 1

Sequence of coin flips, e.g. HHTTTTTTHTTTTTHTTHHT
– say 5 times H and 15 times T
– which Prob(H) and Prob(T) are most likely? (it looks like Prob(H) = ¼ and Prob(T) = ¾)

Formalization
– Data X = (x_1, …, x_n), x_i in {H, T}
– Parameters Θ = (p_H, p_T), p_H + p_T = 1
– Likelihood L(X, Θ) = p_H^h · p_T^t, where h = #{i : x_i = H} and t = #{i : x_i = T}
– Log-likelihood Q(X, Θ) = log L(X, Θ) = h · log p_H + t · log p_T
– find Θ* = argmax_Θ L(X, Θ) = argmax_Θ Q(X, Θ)

Solution
– here p_H = h / (h + t) and p_T = t / (h + t), by simple calculus (see blackboard)
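The calculus step itself is left to the blackboard in the slides; one standard way to carry it out (my reconstruction, not necessarily the one shown in the lecture) is to substitute p_T = 1 − p_H and set the derivative of Q to zero:

```latex
Q(X,\Theta) = h \log p_H + t \log(1 - p_H), \qquad
\frac{\partial Q}{\partial p_H} = \frac{h}{p_H} - \frac{t}{1 - p_H} = 0
\quad\Longrightarrow\quad p_H = \frac{h}{h + t}, \quad p_T = \frac{t}{h + t}.
```

With h = 5 and t = 15 this gives p_H = 1/4 and p_T = 3/4, matching the intuition above.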

Maximum Likelihood: Example 2

Sequence of reals drawn from N(μ, σ), the normal distribution with mean μ and standard deviation σ
– which μ and σ are most likely?

Formalization
– Data X = (x_1, …, x_n), x_i a real number
– Parameters Θ = (μ, σ)
– Likelihood L(X, Θ) = Π_i 1/(sqrt(2π) σ) · exp(−(x_i − μ)² / (2σ²))
– Log-likelihood Q(X, Θ) = −n/2 · log(2π) − n · log σ − Σ_i (x_i − μ)² / (2σ²)
– find Θ* = argmax_Θ L(X, Θ) = argmax_Θ Q(X, Θ)

Solution
– here μ = 1/n · Σ_i x_i and σ² = 1/n · Σ_i (x_i − μ)², by simple calculus (see blackboard)
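As a quick illustration (a sketch of my own, not lecture code), the closed-form ML estimates are just the sample mean and the biased sample variance:

```python
import math

def gaussian_mle(xs):
    """Return (mu, sigma) maximizing the likelihood of xs under N(mu, sigma)."""
    n = len(xs)
    mu = sum(xs) / n                          # mu = 1/n * sum_i x_i
    var = sum((x - mu) ** 2 for x in xs) / n  # sigma^2 = 1/n * sum_i (x_i - mu)^2
    return mu, math.sqrt(var)

print(gaussian_mle([2.1, 1.9, 2.4, 1.6, 2.0]))  # roughly (2.0, 0.26)
```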

Maximum Likelihood: Example 3

Sequence of real numbers
– each drawn from either N_1(μ_1, σ_1) or N_2(μ_2, σ_2)
– from N_1 with probability p_1, and from N_2 with probability p_2
– which μ_1, σ_1, μ_2, σ_2, p_1, p_2 are most likely?

Formalization
– Data X = (x_1, …, x_n), x_i a real number
– Hidden data Z = (z_1, …, z_n), z_i = j iff x_i was drawn from N_j
– Parameters Θ = (μ_1, σ_1, μ_2, σ_2, p_1, p_2), p_1 + p_2 = 1
– Likelihood L(X, Θ) = [blackboard]
– Log-likelihood Q(X, Θ) = [blackboard]
– find Θ* = argmax_Θ L(X, Θ) = argmax_Θ Q(X, Θ)
– standard calculus fails (derivative of a sum of logs of sums)
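To see where the difficulty comes from, here is the log-likelihood of the two-Gaussian mixture written out as code (my reconstruction under the notation above, since the slide defers L and Q to the blackboard): each data point contributes the log of a sum over the two components, so the parameters no longer separate and setting the derivatives to zero yields no closed-form solution.

```python
import math

def normal_pdf(x, mu, sigma):
    return math.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / (math.sqrt(2 * math.pi) * sigma)

def mixture_log_likelihood(xs, p1, mu1, sigma1, mu2, sigma2):
    """Q(X, Theta) = sum_i log( p_1 N_1(x_i) + p_2 N_2(x_i) ), with p_2 = 1 - p_1."""
    p2 = 1.0 - p1
    return sum(math.log(p1 * normal_pdf(x, mu1, sigma1) +
                        p2 * normal_pdf(x, mu2, sigma2)) for x in xs)

print(mixture_log_likelihood([0.1, -0.3, 4.9, 5.2], 0.5, 0.0, 1.0, 5.0, 1.0))
```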

The EM algorithm

EM = Expectation / Maximization
– a generic method for solving maximum likelihood problems complicated by hidden variables … like in Example 3

Original paper
– Maximum Likelihood from Incomplete Data via the EM Algorithm
– Journal of the Royal Statistical Society, Series B, 39(1):1–38, 1977
– Arthur Dempster, Nan Laird, Donald Rubin
– one of the most cited papers in computer science

The EM algorithm

High-level idea: the k-means algorithm
– in Example 3, assume we only want to find the means μ_1, …, μ_k
– start with a guess of the μ_1, …, μ_k
– “assign” each data point x_i to the closest of the means
– set each μ_j to the average of the points assigned to it
– and so on …  DEMO

High-level idea: EM
– start with a guess of the parameters Θ
– find “plausible” values for the hidden variables
– compute maximum likelihood estimates as in Examples 1 and 2
– and so on …  DEMO
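To make the alternation concrete, here is a minimal EM sketch for the two-Gaussian mixture of Example 3 (my own code, not the lecture's demo, and without numerical safeguards): the E-step fills in “plausible” soft values for the hidden z_i, and the M-step redoes the maximum-likelihood estimates of Examples 1 and 2 with those soft counts.

```python
import math

def normal_pdf(x, mu, sigma):
    return math.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / (math.sqrt(2 * math.pi) * sigma)

def em_two_gaussians(xs, iters=50):
    # start with a guess of the parameters Theta
    p1, mu1, s1, mu2, s2 = 0.5, min(xs), 1.0, max(xs), 1.0
    for _ in range(iters):
        # E-step: posterior probability that each x_i came from component 1
        r = [p1 * normal_pdf(x, mu1, s1) /
             (p1 * normal_pdf(x, mu1, s1) + (1 - p1) * normal_pdf(x, mu2, s2))
             for x in xs]
        # M-step: maximum-likelihood estimates with soft counts (cf. Examples 1 and 2)
        n1 = sum(r)
        n2 = len(xs) - n1
        p1 = n1 / len(xs)
        mu1 = sum(ri * x for ri, x in zip(r, xs)) / n1
        mu2 = sum((1 - ri) * x for ri, x in zip(r, xs)) / n2
        s1 = math.sqrt(sum(ri * (x - mu1) ** 2 for ri, x in zip(r, xs)) / n1)
        s2 = math.sqrt(sum((1 - ri) * (x - mu2) ** 2 for ri, x in zip(r, xs)) / n2)
    return p1, mu1, s1, mu2, s2

print(em_two_gaussians([0.1, -0.3, 0.2, 4.9, 5.2, 5.1]))
```

Replacing the soft assignments with hard ones (each x_i fully assigned to its most likely component) essentially recovers the k-means-style procedure sketched above.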

Literature

The original paper
– via JSTOR; I will make it accessible via the Wiki

Wikipedia

Tutorials
– Gentle Tutorial by Jeff Bilmes
– Explanation by Frank Dellaert

Demos
– k-means algorithm for clustering points
– EM algorithm for a mixture of Gaussians in 2D