1 CS 2750: Machine Learning Expectation Maximization
Prof. Adriana Kovashka University of Pittsburgh April 11, 2017

2 Plan for this lecture
EM for Hidden Markov Models (last lecture)
EM for Gaussian Mixture Models
EM in general
K-means ~ GMM
EM for Naïve Bayes

3 The Gaussian Distribution
Chris Bishop
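For reference, the D-dimensional Gaussian density in Bishop's notation is:
N(x | μ, Σ) = (2π)^(−D/2) |Σ|^(−1/2) exp( −(1/2) (x − μ)ᵀ Σ⁻¹ (x − μ) )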

4 Mixture of two Gaussians
[Figure: Old Faithful data set, fit with a single Gaussian vs. a mixture of two Gaussians] Chris Bishop

5 Mixtures of Gaussians Combine simple models into a complex model:
p(x) = Σk=1..K πk N(x | μk, Σk), shown with K = 3; N(x | μk, Σk) is a component and πk its mixing coefficient. Chris Bishop

6 Mixtures of Gaussians Introducing latent variables z, one for each x
1-of-K representation: z ∈ {0,1}^K with exactly one zk = 1, and p(zk = 1) = πk. Also: p(x | zk = 1) = N(x | μk, Σk). Then p(x) = Σz p(x, z) = Σz p(z) p(x|z) = Σk πk N(x | μk, Σk). Figures from Chris Bishop

7 Mixtures of Gaussians Responsibility of component k for explaining x:
γ(zk) = p(zk = 1 | x) = πk N(x | μk, Σk) / Σj πj N(x | μj, Σj) Figures from Chris Bishop

8 Generating samples Sample a value z* from p(z) then sample a value for x from p(x|z*) Color generated samples using z* (left) Color samples using responsibilities (right) Figures from Chris Bishop

9 Finding parameters of mixture
Want to maximize: ln p(X | π, μ, Σ) = Σn ln { Σk πk N(xn | μk, Σk) }
Set derivative with respect to means to 0: μk = (1/Nk) Σn γ(znk) xn, where Nk = Σn γ(znk)
Figures from Chris Bishop

10 Finding parameters of mixture
Set derivative wrt covariances to 0: Σk = (1/Nk) Σn γ(znk) (xn − μk)(xn − μk)ᵀ
Set derivative wrt mixing coefficients to 0 (with a Lagrange multiplier enforcing Σk πk = 1): πk = Nk / N
Figures from Chris Bishop

11 Reminder Responsibilities: γ(znk) = πk N(xn | μk, Σk) / Σj πj N(xn | μj, Σj)
So parameters of GMM depend on responsibilities and vice versa… What can we do? Figures from Chris Bishop

12 Reminder: K-means clustering
Basic idea: randomly initialize the k cluster centers, and iterate (a sketch in code follows below):
1. Randomly initialize the cluster centers, c1, ..., cK
2. Given cluster centers, determine the points in each cluster: for each point p, find the closest ci and put p into cluster i
3. Given the points in each cluster, solve for ci: set ci to be the mean of the points in cluster i
4. If the ci have changed, repeat from Step 2
Steve Seitz
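A minimal NumPy sketch of this procedure (the function and its defaults are my own, not the course's reference code):

```python
import numpy as np

def kmeans(X, k, n_iters=100, seed=0):
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]   # step 1: random init
    for _ in range(n_iters):
        # Step 2: assign each point to its closest center
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Step 3: move each center to the mean of its assigned points
        new_centers = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                                else centers[j] for j in range(k)])
        if np.allclose(new_centers, centers):   # step 4: stop when centers no longer change
            break
        centers = new_centers
    return centers, labels
```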

13 Iterative algorithm Figures from Chris Bishop

14 Figures from Chris Bishop

15 Initialization E step M step Figures from Chris Bishop
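Putting the E step (responsibilities) and M step (parameter updates) from the previous slides together, a compact sketch of EM for a GMM; the function and variable names are my own, and SciPy's multivariate_normal is assumed for the component densities:

```python
import numpy as np
from scipy.stats import multivariate_normal

def em_gmm(X, K, n_iters=50, seed=0):
    rng = np.random.default_rng(seed)
    N, D = X.shape
    pi = np.full(K, 1.0 / K)                                   # mixing coefficients
    mu = X[rng.choice(N, size=K, replace=False)]               # initialize means at data points
    Sigma = np.array([np.cov(X.T) + 1e-6 * np.eye(D)] * K)     # initialize covariances

    for _ in range(n_iters):
        # E step: responsibilities gamma[n, k] = p(z_k = 1 | x_n)
        dens = np.column_stack([multivariate_normal.pdf(X, mu[k], Sigma[k]) for k in range(K)])
        num = pi * dens
        gamma = num / num.sum(axis=1, keepdims=True)

        # M step: re-estimate parameters using the current responsibilities
        Nk = gamma.sum(axis=0)
        mu = (gamma.T @ X) / Nk[:, None]
        for k in range(K):
            diff = X - mu[k]
            Sigma[k] = (gamma[:, k, None] * diff).T @ diff / Nk[k] + 1e-6 * np.eye(D)
        pi = Nk / N
    return pi, mu, Sigma, gamma
```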

16 EM in general For models with hidden variables, the probability of the data X = {xn} depends on the hidden variables Z = {zn}
Complete data set: {X, Z}
Incomplete (observed) data set: X, for which we have p(X | θ) = ΣZ p(X, Z | θ)
We want to maximize: ln p(X | θ) = ln { ΣZ p(X, Z | θ) }
Figures from Chris Bishop

17 EM in general We don’t know the values for the hidden variables Z
We also don’t know the model parameters θ
Set one, infer the other:
E step: keep the model parameters θ fixed, infer the hidden variables Z (compute the conditional probability)
M step: keep Z fixed, infer the model parameters θ (maximize the expectation)
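In symbols (standard EM, with θold the current parameter estimate):
E step: evaluate p(Z | X, θold)
M step: θnew = argmaxθ Q(θ, θold), where Q(θ, θold) = ΣZ p(Z | X, θold) ln p(X, Z | θ)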

18 EM in general Figures from Chris Bishop

19 GMM in the context of general EM
Complete-data log likelihood: ln p(X, Z | μ, Σ, π) = Σn Σk znk { ln πk + ln N(xn | μk, Σk) }
Expectation of this likelihood: EZ[ln p(X, Z | μ, Σ, π)] = Σn Σk γ(znk) { ln πk + ln N(xn | μk, Σk) }
E step: choose initial values μold, Σold, πold, evaluate responsibilities γ(znk)
M step: fix the responsibilities, maximize the expectation wrt μ, Σ, π
Figures from Chris Bishop

20 K-means ~ GMM Consider a GMM where all components have covariance matrix εI (I = the identity matrix). Responsibilities are then: γ(znk) = πk exp(−||xn − μk||² / 2ε) / Σj πj exp(−||xn − μj||² / 2ε). As ε → 0, most responsibilities → 0, except γ(znj) → 1 for whichever j has the smallest ||xn − μj||². Figures from Chris Bishop
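A tiny NumPy illustration of this limit (the point, means, and mixing weights below are made up): as ε shrinks, the responsibilities harden into a one-hot assignment to the nearest mean.

```python
import numpy as np

def soft_assign(x, mus, pis, eps):
    # Responsibilities for a GMM whose components all have covariance eps * I
    logits = np.log(pis) - np.sum((x - mus) ** 2, axis=1) / (2 * eps)
    logits -= logits.max()                 # subtract max for numerical stability
    w = np.exp(logits)
    return w / w.sum()

x = np.array([0.9, 1.1])
mus = np.array([[0.0, 0.0], [1.0, 1.0]])
pis = np.array([0.5, 0.5])
for eps in (1.0, 0.1, 0.01):
    print(eps, soft_assign(x, mus, pis, eps))   # responsibilities approach [0, 1]
```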

21 K-means ~ GMM Expectation of the complete-data log likelihood, which we want to maximize: in the ε → 0 limit this reduces (up to an additive constant and a positive scale factor) to −Σn Σk rnk ||xn − μk||². Recall in K-means we wanted to minimize the distortion J = Σn Σk rnk ||xn − μk||², where rnk = 1 if instance n belongs to cluster k, 0 otherwise — so in this limit maximizing the expected complete-data log likelihood is equivalent to K-means. Figures from Chris Bishop

22 Example: EM for Naïve Bayes
We have a graphical model: flavor = class (hidden); shape + color = features (observed); edges flavor → shape and flavor → color.
Observed data (1000 candies total):
P(round, brown) 0.282
P(round, red) 0.139
P(square, brown) 0.141
P(square, red) 0.438
Log likelihood P(X|Θ) = 282*log(0.282) + … =
Example from Rebecca Hwa
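A small Python helper (my own, not from the slides) that evaluates this log likelihood from the candy counts, assuming natural log:

```python
import math

# Observed counts out of 1000 candies (slide 22)
counts = {("round", "brown"): 282, ("round", "red"): 139,
          ("square", "brown"): 141, ("square", "red"): 438}

def log_likelihood(p_xy):
    # P(X|Theta) in log form: sum over observations of log P(shape, color)
    return sum(n * math.log(p_xy[sc]) for sc, n in counts.items())

# Evaluated at the empirical distribution shown on this slide
p_emp = {sc: n / 1000 for sc, n in counts.items()}
print(log_likelihood(p_emp))
```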

23 Example: EM for Naïve Bayes
Model parameters = conditional probability table, initialized randomly:
P(straw) 0.59           P(choc) 0.41
P(round|straw) 0.96     P(round|choc) 0.66
P(square|straw) 0.04    P(square|choc) 0.34
P(brown|straw) 0.08     P(brown|choc) 0.72
P(red|straw) 0.92       P(red|choc) 0.28
Example from Rebecca Hwa

24 Example: EM for Naïve Bayes
Expectation step
Keeping the model (CPT) fixed, estimate the joint probability distribution by multiplying P(flavor) * P(shape|flavor) * P(color|flavor)
Then use these to compute the conditional probability of the hidden variable flavor:
P(straw|shape,color) = P(straw,shape,color) / P(shape,color) = P(straw,shape,color) / Σflavor P(flavor,shape,color)
Example from Rebecca Hwa
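A Python sketch of this E step using the randomly initialized CPT from slide 23 (the dictionaries and helper name are mine):

```python
# Randomly initialized CPT from slide 23
p_flavor = {"straw": 0.59, "choc": 0.41}
p_shape = {("round", "straw"): 0.96, ("square", "straw"): 0.04,
           ("round", "choc"): 0.66, ("square", "choc"): 0.34}
p_color = {("brown", "straw"): 0.08, ("red", "straw"): 0.92,
           ("brown", "choc"): 0.72, ("red", "choc"): 0.28}

def e_step(shape, color):
    # Joint p(flavor, shape, color) = P(flavor) P(shape|flavor) P(color|flavor)
    joint = {f: p_flavor[f] * p_shape[(shape, f)] * p_color[(color, f)]
             for f in p_flavor}
    total = sum(joint.values())                    # = P(shape, color)
    return {f: joint[f] / total for f in joint}    # posterior P(flavor | shape, color)

print(e_step("round", "brown"))   # ≈ {'straw': 0.19, 'choc': 0.81}, as on slide 25
```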

25 Example: EM for Naïve Bayes
Expectation step
Joint probabilities:
p(straw,round,brown) 0.05    p(choc,round,brown) 0.19
p(straw,square,brown) 0.00   p(choc,square,brown) 0.10
p(straw,round,red) 0.52      p(choc,round,red) 0.08
p(straw,square,red) 0.02     p(choc,square,red) 0.04
Posteriors over flavor:
p(straw | round, brown) 0.19    p(choc | round, brown) 0.81
p(straw | square, brown) 0.02   p(choc | square, brown) 0.98
p(straw | round, red) 0.87      p(choc | round, red) 0.13
p(straw | square, red) 0.36     p(choc | square, red) 0.64
Marginals:
P(round, brown) 0.24
P(round, red) 0.60
P(square, brown) 0.10
P(square, red) 0.06
Log likelihood P(X|Θ) = 282*log(0.24) + … =
Example from Rebecca Hwa

26 Example: EM for Naïve Bayes
Maximization step
Expected # “strawberry, round, brown” candies = P(strawberry | round, brown) [from the E step] * #(round, brown) = 0.19 * 282 = 53.6
The other expected counts needed for the CPT are found the same way (see the sketch below)
Example from Rebecca Hwa
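Continuing the hypothetical counts and e_step helper from the earlier sketches, the full M step might look like this:

```python
def m_step():
    # Accumulate expected counts of flavor, (shape, flavor), and (color, flavor)
    exp_flavor = {"straw": 0.0, "choc": 0.0}
    exp_shape, exp_color = {}, {}
    for (shape, color), n in counts.items():
        post = e_step(shape, color)                  # P(flavor | shape, color) from the E step
        for f, p in post.items():
            exp_flavor[f] += p * n
            exp_shape[(shape, f)] = exp_shape.get((shape, f), 0.0) + p * n
            exp_color[(color, f)] = exp_color.get((color, f), 0.0) + p * n
    # Normalize the expected counts into a new CPT (approximately reproduces slide 27)
    new_p_flavor = {f: exp_flavor[f] / sum(counts.values()) for f in exp_flavor}
    new_p_shape = {sf: c / exp_flavor[sf[1]] for sf, c in exp_shape.items()}
    new_p_color = {cf: c / exp_flavor[cf[1]] for cf, c in exp_color.items()}
    return new_p_flavor, new_p_shape, new_p_color
```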

27 Example: EM for Naïve Bayes
Maximization step
Expected counts:
exp # straw, round 174.56     exp # choc, round 246.44
exp # straw, square 159.16    exp # choc, square 419.84
exp # straw, red 277.91       exp # choc, red 299.09
exp # straw, brown 55.81      exp # choc, brown 367.19
exp # straw 333.72            exp # choc 666.28
New CPT:
P(straw) 0.33           P(choc) 0.67
P(round|straw) 0.52     P(round|choc) 0.37
P(square|straw) 0.48    P(square|choc) 0.63
P(brown|straw) 0.17     P(brown|choc) 0.55
P(red|straw) 0.83       P(red|choc) 0.45
Example from Rebecca Hwa

28 Example: EM for Naïve Bayes
Maximization step
Compute with the new parameters (CPT): P(round, brown), P(round, red), P(square, brown), P(square, red)
Compute the likelihood P(X|Θ)
Repeat until convergence (a sketch of this outer loop follows below)
Example from Rebecca Hwa
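A hypothetical outer loop tying the e_step, m_step, and log_likelihood sketches above together, stopping when the log likelihood stops improving:

```python
prev = -float("inf")
while True:
    # M step: re-estimate the CPT from expected counts (rebinds the globals used by e_step)
    p_flavor, p_shape, p_color = m_step()
    # Recompute the modeled marginals P(shape, color) under the new CPT
    p_xy = {(s, c): sum(p_flavor[f] * p_shape[(s, f)] * p_color[(c, f)] for f in p_flavor)
            for (s, c) in counts}
    ll = log_likelihood(p_xy)
    if ll - prev < 1e-6:      # converged: likelihood no longer improves
        break
    prev = ll
```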

