
1 Machine Learning
Saarland University, SS 2007
Holger Bast, Max-Planck-Institut für Informatik, Saarbrücken, Germany
Lecture 8, Friday June 8th, 2007 (introduction to the EM algorithm)

2 Overview of this Lecture
Example application of the EM algorithm
– concept-based text search (from Lecture 1)
The maximum likelihood method
– Example 1: heads and tails
– Example 2: single Gaussian
– Example 3: mixture of two Gaussians
Idea of the EM algorithm
– analogy to k-means
– outline of the algorithm
– demo

3 Example: Concept-Based Search via EM
Model
– each document is generated from one of k concepts
– a probability distribution over the concepts: p_1, ..., p_k with p_1 + ... + p_k = 1
– for each concept i, a probability distribution over the m words: q_i1, ..., q_im with q_i1 + ... + q_im = 1
Goal
– compute the p_1, ..., p_k, q_11, ..., q_1m, ..., q_k1, ..., q_km which are "most likely" for the given data (see the sketch below)
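The slide only states the model, so here is a minimal sketch (my own illustration, not from the lecture) of the quantity being maximized: the likelihood of one document is a sum over the k possible generating concepts, weighted by the p_i and the per-concept word probabilities q_iw. All numbers and the function name doc_log_likelihood are assumptions chosen for illustration.

```python
import numpy as np

# Hypothetical example: k = 2 concepts over a vocabulary of m = 4 words.
p = np.array([0.6, 0.4])                    # concept probabilities, sum to 1
q = np.array([[0.7, 0.1, 0.1, 0.1],         # word distribution of concept 1
              [0.1, 0.1, 0.4, 0.4]])        # word distribution of concept 2

def doc_log_likelihood(word_counts):
    """Log-likelihood of one document under the mixture model:
    log sum_i  p_i * prod_w q_iw ^ count_w  (computed in log space)."""
    log_per_concept = np.log(p) + word_counts @ np.log(q).T
    m = log_per_concept.max()
    return m + np.log(np.exp(log_per_concept - m).sum())

print(doc_log_likelihood(np.array([3, 0, 1, 0])))   # a document with 4 word occurrences
```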

4 Maximum Likelihood: Example 1
Sequence of coin flips: HHTTTTTTHTTTTTHTTHHT
– say 5 times H and 15 times T
– which Prob(H) and Prob(T) are most likely?
Formalization
– Data X = (x_1, ..., x_n), x_i in {H, T}
– Parameters Θ = (p_H, p_T), p_H + p_T = 1
– Likelihood L(X, Θ) = p_H^h · p_T^t, where h = #{i : x_i = H}, t = #{i : x_i = T}
– Log likelihood Q(X, Θ) = log L(X, Θ) = h · log p_H + t · log p_T
– find Θ* = argmax_Θ L(X, Θ) = argmax_Θ Q(X, Θ)
Solution
– here p_H = h / (h + t) and p_T = t / (h + t), so Prob(H) = 1/4 and Prob(T) = 3/4 (simple calculus, see blackboard)
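A small sketch to make the closed-form solution concrete, using the counts the slide assumes: compute p_H = h/(h+t) directly and compare it with a brute-force grid maximization of the log-likelihood. The grid check is only a sanity test I added; it is not part of the slide.

```python
import numpy as np

# The slide assumes h = 5 heads and t = 15 tails
h, t = 5, 15

# Closed-form maximum likelihood estimates
p_H, p_T = h / (h + t), t / (h + t)           # 0.25 and 0.75

# Sanity check: the log-likelihood h*log(p) + t*log(1-p) peaks at p = 0.25
grid = np.linspace(0.01, 0.99, 99)
log_lik = h * np.log(grid) + t * np.log(1 - grid)
print(p_H, grid[np.argmax(log_lik)])          # both 0.25
```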

5 Maximum Likelihood: Example 2
Sequence of reals drawn from N(μ, σ), the normal distribution with mean μ and standard deviation σ
– which μ and σ are most likely?
Formalization
– Data X = (x_1, ..., x_n), x_i a real number
– Parameters Θ = (μ, σ)
– Likelihood L(X, Θ) = Π_i 1/(sqrt(2π) σ) · exp(−(x_i − μ)^2 / (2σ^2))
– Log likelihood Q(X, Θ) = −n/2 · log(2π) − n · log σ − Σ_i (x_i − μ)^2 / (2σ^2)
– find Θ* = argmax_Θ L(X, Θ) = argmax_Θ Q(X, Θ)
Solution
– here μ = 1/n · Σ_i x_i and σ^2 = 1/n · Σ_i (x_i − μ)^2 (simple calculus, see blackboard)
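The same idea for the single Gaussian, as a hedged sketch: draw toy data from a normal distribution with made-up true parameters (2.0 and 1.5 are just illustrative choices) and recover them with the closed-form maximum likelihood estimates from the slide. Note the 1/n in the variance estimate, not 1/(n-1).

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(loc=2.0, scale=1.5, size=1000)    # toy data, true mu = 2.0, sigma = 1.5

# Closed-form maximum likelihood estimates from the slide
mu_hat = x.mean()                                # 1/n * sum_i x_i
sigma_hat = np.sqrt(((x - mu_hat) ** 2).mean())  # sqrt of 1/n * sum_i (x_i - mu)^2

print(mu_hat, sigma_hat)                         # close to 2.0 and 1.5
```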

6 Maximum Likelihood: Example 3
Sequence of real numbers
– each drawn from either N_1(μ_1, σ_1) or N_2(μ_2, σ_2)
– from N_1 with probability p_1, and from N_2 with probability p_2
– which μ_1, σ_1, μ_2, σ_2, p_1, p_2 are most likely?
Formalization
– Data X = (x_1, ..., x_n), x_i a real number
– Hidden data Z = (z_1, ..., z_n), z_i = j iff x_i was drawn from N_j
– Parameters Θ = (μ_1, σ_1, μ_2, σ_2, p_1, p_2), p_1 + p_2 = 1
– Likelihood L(X, Θ) = [blackboard]
– Log likelihood Q(X, Θ) = [blackboard]
– find Θ* = argmax_Θ L(X, Θ) = argmax_Θ Q(X, Θ)
– standard calculus fails (the derivative of a sum of logs of sums)
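The slide defers the mixture likelihood to the blackboard; the sketch below uses the standard two-component mixture form, Σ_i log(p_1 · N(x_i; μ_1, σ_1) + p_2 · N(x_i; μ_2, σ_2)), to show where the difficulty comes from: the log of a sum does not split into separate terms, so setting derivatives to zero gives no closed-form solution. The data points and parameter values are made up for illustration.

```python
import numpy as np

def normal_pdf(x, mu, sigma):
    return np.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / (np.sqrt(2 * np.pi) * sigma)

def mixture_log_likelihood(x, mu1, s1, mu2, s2, p1):
    """Q(X, Theta) = sum_i log( p1*N(x_i; mu1,s1) + (1-p1)*N(x_i; mu2,s2) ).
    The log of a sum is what makes closed-form maximization fail here."""
    mix = p1 * normal_pdf(x, mu1, s1) + (1 - p1) * normal_pdf(x, mu2, s2)
    return np.log(mix).sum()

x = np.array([1.0, 1.2, 0.8, 5.1, 4.9, 5.3])     # made-up data from two clusters
print(mixture_log_likelihood(x, mu1=1.0, s1=0.5, mu2=5.0, s2=0.5, p1=0.5))
```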

7 The EM algorithm
EM = Expectation / Maximization
– a generic method for solving maximum likelihood problems complicated by hidden variables, like in Example 3
Original paper
– Maximum Likelihood from Incomplete Data via the EM Algorithm
– Journal of the Royal Statistical Society, Series B, 39(1):1–38, 1977
– Arthur Dempster, Nan Laird, Donald Rubin
– one of the most cited papers in computer science

8 The EM algorithm
High-level idea: the k-means algorithm
– in Example 3, assume we only want to find the means μ_1, ..., μ_k
– start with a guess of the μ_1, ..., μ_k
– "assign" each data point x_i to the closest μ_j
– set each μ_j to the average of the points assigned to it
– and so on ... DEMO
High-level idea: EM
– start with a guess of the parameters Θ
– find "plausible" values for the hidden variables
– compute maximum likelihood estimates as in Examples 1 and 2
– and so on ... DEMO (a minimal code sketch of these two steps follows below)
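To make the alternation concrete, here is a minimal EM sketch for a mixture of two 1D Gaussians (my own illustration of the high-level idea, not the lecture's demo code). The E-step computes soft responsibilities for the hidden z_i given the current guess of Θ; the M-step plugs them into weighted versions of the closed-form estimates from Examples 1 and 2. Initialization and iteration count are arbitrary choices.

```python
import numpy as np

def em_two_gaussians(x, iters=50):
    """Sketch of EM for a mixture of two 1D Gaussians."""
    # initial guess of the parameters Theta
    mu = np.array([x.min(), x.max()])
    sigma = np.array([x.std(), x.std()])
    p = np.array([0.5, 0.5])
    for _ in range(iters):
        # E-step: responsibility r[i, j] ~ Prob(z_i = j | x_i, Theta)
        dens = (p * np.exp(-(x[:, None] - mu) ** 2 / (2 * sigma ** 2))
                / (np.sqrt(2 * np.pi) * sigma))
        r = dens / dens.sum(axis=1, keepdims=True)
        # M-step: weighted versions of the closed-form estimates
        n_j = r.sum(axis=0)
        mu = (r * x[:, None]).sum(axis=0) / n_j
        sigma = np.sqrt((r * (x[:, None] - mu) ** 2).sum(axis=0) / n_j)
        p = n_j / len(x)
    return mu, sigma, p

rng = np.random.default_rng(1)
x = np.concatenate([rng.normal(0, 1, 200), rng.normal(5, 0.5, 100)])  # toy data
print(em_two_gaussians(x))   # recovers means near 0 and 5, mixing weights near 2/3 and 1/3
```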

9 Literature
The original paper
– via JSTOR + I will make it accessible via the Wiki
Wikipedia
– http://en.wikipedia.org/wiki/EM_Algorithm
Tutorials
– Gentle Tutorial by Jeff Bilmes
– Explanation by Frank Dellaert
Demos
– k-means algorithm for clustering points
– EM algorithm for mixture of Gaussians in 2D

