
1 Machine Learning
Saarland University, SS 2007
Holger Bast, Max-Planck-Institut für Informatik, Saarbrücken, Germany
Lecture 8, Friday June 8th, 2007 (introduction to the EM algorithm)

2 Overview of this Lecture
Example application of the EM algorithm
– concept-based text search (from Lecture 1)
The maximum likelihood method
– Example 1: heads and tails
– Example 2: single Gaussian
– Example 3: mixture of two Gaussians
Idea of the EM algorithm
– analogy to k-means
– outline of the algorithm
– demo

3 Example: Concept-Based Search via EM
Model
– each document is generated from one of k concepts
– a probability distribution over the concepts: p_1, ..., p_k with p_1 + ... + p_k = 1
– for each concept i, a probability distribution over the m words: q_i1, ..., q_im with q_i1 + ... + q_im = 1
Goal
– compute the p_1, ..., p_k, q_11, ..., q_1m, ..., q_k1, ..., q_km which are "most likely" for the given data (see the sketch below)
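The slide only states the model, so here is a minimal sketch (my own illustration, not from the lecture) of the quantity being maximized: the likelihood of one document is a sum over the k possible generating concepts, weighted by the p_i and the per-concept word probabilities q_iw. All numbers and the function name doc_log_likelihood are assumptions chosen for illustration.

```python
import numpy as np

# Hypothetical example: k = 2 concepts over a vocabulary of m = 4 words.
p = np.array([0.6, 0.4])                    # concept probabilities, sum to 1
q = np.array([[0.7, 0.1, 0.1, 0.1],         # word distribution of concept 1
              [0.1, 0.1, 0.4, 0.4]])        # word distribution of concept 2

def doc_log_likelihood(word_counts):
    """Log-likelihood of one document under the mixture model:
    log sum_i  p_i * prod_w q_iw ^ count_w  (computed in log space)."""
    log_per_concept = np.log(p) + word_counts @ np.log(q).T
    m = log_per_concept.max()
    return m + np.log(np.exp(log_per_concept - m).sum())

print(doc_log_likelihood(np.array([3, 0, 1, 0])))   # a document with 4 word occurrences
```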

4 Maximum Likelihood: Example 1
Sequence of coin flips: HHTTTTTTHTTTTTHTTHHT
– say 5 times H and 15 times T
– which Prob(H) and Prob(T) are most likely?
Formalization
– Data X = (x_1, ..., x_n), x_i in {H, T}
– Parameters Θ = (p_H, p_T), p_H + p_T = 1
– Likelihood L(X, Θ) = p_H^h · p_T^t, where h = #{i : x_i = H}, t = #{i : x_i = T}
– Log likelihood Q(X, Θ) = log L(X, Θ) = h · log p_H + t · log p_T
– find Θ* = argmax_Θ L(X, Θ) = argmax_Θ Q(X, Θ)
Solution
– here p_H = h / (h + t) and p_T = t / (h + t), so Prob(H) = 1/4 and Prob(T) = 3/4 (simple calculus, see blackboard)
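A small sketch to make the closed-form solution concrete, using the counts the slide assumes: compute p_H = h/(h+t) directly and compare it with a brute-force grid maximization of the log-likelihood. The grid check is only a sanity test I added; it is not part of the slide.

```python
import numpy as np

# The slide assumes h = 5 heads and t = 15 tails
h, t = 5, 15

# Closed-form maximum likelihood estimates
p_H, p_T = h / (h + t), t / (h + t)           # 0.25 and 0.75

# Sanity check: the log-likelihood h*log(p) + t*log(1-p) peaks at p = 0.25
grid = np.linspace(0.01, 0.99, 99)
log_lik = h * np.log(grid) + t * np.log(1 - grid)
print(p_H, grid[np.argmax(log_lik)])          # both 0.25
```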

5 Maximum Likelihood: Example 2
Sequence of reals drawn from N(μ, σ), the normal distribution with mean μ and standard deviation σ
– which μ and σ are most likely?
Formalization
– Data X = (x_1, ..., x_n), x_i a real number
– Parameters Θ = (μ, σ)
– Likelihood L(X, Θ) = Π_i 1/(sqrt(2π) σ) · exp(−(x_i − μ)^2 / (2σ^2))
– Log likelihood Q(X, Θ) = −n/2 · log(2π) − n · log σ − Σ_i (x_i − μ)^2 / (2σ^2)
– find Θ* = argmax_Θ L(X, Θ) = argmax_Θ Q(X, Θ)
Solution
– here μ = 1/n · Σ_i x_i and σ^2 = 1/n · Σ_i (x_i − μ)^2 (simple calculus, see blackboard)
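The same idea for the single Gaussian, as a hedged sketch: draw toy data from a normal distribution with made-up true parameters (2.0 and 1.5 are just illustrative choices) and recover them with the closed-form maximum likelihood estimates from the slide. Note the 1/n in the variance estimate, not 1/(n-1).

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(loc=2.0, scale=1.5, size=1000)    # toy data, true mu = 2.0, sigma = 1.5

# Closed-form maximum likelihood estimates from the slide
mu_hat = x.mean()                                # 1/n * sum_i x_i
sigma_hat = np.sqrt(((x - mu_hat) ** 2).mean())  # sqrt of 1/n * sum_i (x_i - mu)^2

print(mu_hat, sigma_hat)                         # close to 2.0 and 1.5
```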

6 Maximum Likelihood: Example 3
Sequence of real numbers
– each drawn from either N_1(μ_1, σ_1) or N_2(μ_2, σ_2)
– from N_1 with probability p_1, and from N_2 with probability p_2
– which μ_1, σ_1, μ_2, σ_2, p_1, p_2 are most likely?
Formalization
– Data X = (x_1, ..., x_n), x_i a real number
– Hidden data Z = (z_1, ..., z_n), z_i = j iff x_i was drawn from N_j
– Parameters Θ = (μ_1, σ_1, μ_2, σ_2, p_1, p_2), p_1 + p_2 = 1
– Likelihood L(X, Θ) = [blackboard]
– Log likelihood Q(X, Θ) = [blackboard]
– find Θ* = argmax_Θ L(X, Θ) = argmax_Θ Q(X, Θ)
– standard calculus fails (the derivative of a sum of logs of sums)
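The slide defers the mixture likelihood to the blackboard; the sketch below uses the standard two-component mixture form, Σ_i log(p_1 · N(x_i; μ_1, σ_1) + p_2 · N(x_i; μ_2, σ_2)), to show where the difficulty comes from: the log of a sum does not split into separate terms, so setting derivatives to zero gives no closed-form solution. The data points and parameter values are made up for illustration.

```python
import numpy as np

def normal_pdf(x, mu, sigma):
    return np.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / (np.sqrt(2 * np.pi) * sigma)

def mixture_log_likelihood(x, mu1, s1, mu2, s2, p1):
    """Q(X, Theta) = sum_i log( p1*N(x_i; mu1,s1) + (1-p1)*N(x_i; mu2,s2) ).
    The log of a sum is what makes closed-form maximization fail here."""
    mix = p1 * normal_pdf(x, mu1, s1) + (1 - p1) * normal_pdf(x, mu2, s2)
    return np.log(mix).sum()

x = np.array([1.0, 1.2, 0.8, 5.1, 4.9, 5.3])     # made-up data from two clusters
print(mixture_log_likelihood(x, mu1=1.0, s1=0.5, mu2=5.0, s2=0.5, p1=0.5))
```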

7 The EM algorithm
EM = Expectation / Maximization
– a generic method for solving maximum likelihood problems complicated by hidden variables, like in Example 3
Original paper
– Maximum Likelihood from Incomplete Data via the EM Algorithm
– Journal of the Royal Statistical Society, Series B, 39(1):1–38, 1977
– Arthur Dempster, Nan Laird, Donald Rubin
– one of the most cited papers in computer science

8 The EM algorithm
High-level idea: the k-means algorithm
– in Example 3, assume we only want to find the means μ_1, ..., μ_k
– start with a guess of the μ_1, ..., μ_k
– "assign" each data point x_i to the closest μ_j
– set each μ_j to the average of the points assigned to it
– and so on ... DEMO
High-level idea: EM
– start with a guess of the parameters Θ
– find "plausible" values for the hidden variables
– compute maximum likelihood estimates as in Examples 1 and 2
– and so on ... DEMO (a minimal code sketch of these two steps follows below)
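To make the alternation concrete, here is a minimal EM sketch for a mixture of two 1D Gaussians (my own illustration of the high-level idea, not the lecture's demo code). The E-step computes soft responsibilities for the hidden z_i given the current guess of Θ; the M-step plugs them into weighted versions of the closed-form estimates from Examples 1 and 2. Initialization and iteration count are arbitrary choices.

```python
import numpy as np

def em_two_gaussians(x, iters=50):
    """Sketch of EM for a mixture of two 1D Gaussians."""
    # initial guess of the parameters Theta
    mu = np.array([x.min(), x.max()])
    sigma = np.array([x.std(), x.std()])
    p = np.array([0.5, 0.5])
    for _ in range(iters):
        # E-step: responsibility r[i, j] ~ Prob(z_i = j | x_i, Theta)
        dens = (p * np.exp(-(x[:, None] - mu) ** 2 / (2 * sigma ** 2))
                / (np.sqrt(2 * np.pi) * sigma))
        r = dens / dens.sum(axis=1, keepdims=True)
        # M-step: weighted versions of the closed-form estimates
        n_j = r.sum(axis=0)
        mu = (r * x[:, None]).sum(axis=0) / n_j
        sigma = np.sqrt((r * (x[:, None] - mu) ** 2).sum(axis=0) / n_j)
        p = n_j / len(x)
    return mu, sigma, p

rng = np.random.default_rng(1)
x = np.concatenate([rng.normal(0, 1, 200), rng.normal(5, 0.5, 100)])  # toy data
print(em_two_gaussians(x))   # recovers means near 0 and 5, mixing weights near 2/3 and 1/3
```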

9 Literature
The original paper
– via JSTOR + I will make it accessible via the Wiki
Wikipedia
– http://en.wikipedia.org/wiki/EM_Algorithm
Tutorials
– Gentle Tutorial by Jeff Bilmes
– Explanation by Frank Dellaert
Demos
– k-means algorithm for clustering points
– EM algorithm for mixture of Gaussians in 2D

