1
CS 2750: Machine Learning Expectation Maximization
Prof. Adriana Kovashka University of Pittsburgh April 11, 2017
2
Plan for this lecture
EM for Hidden Markov Models (last lecture)
EM for Gaussian Mixture Models
EM in general
K-means ~ GMM
EM for Naïve Bayes
3
The Gaussian Distribution
Chris Bishop
4
Mixtures of Gaussians
Old Faithful data set: single Gaussian vs. mixture of two Gaussians
Chris Bishop
5
Mixtures of Gaussians Combine simple models into a complex model:
(Figure: example mixture density with K=3; labels mark a component and a mixing coefficient.) Chris Bishop
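The mixture density shown in this figure, written out in the usual notation (K components, mixing coefficients πk):

```latex
\[
p(\mathbf{x}) = \sum_{k=1}^{K} \pi_k \, \mathcal{N}(\mathbf{x} \mid \boldsymbol{\mu}_k, \boldsymbol{\Sigma}_k),
\qquad 0 \le \pi_k \le 1, \qquad \sum_{k=1}^{K} \pi_k = 1
\]
```

Here each N(x | μk, Σk) is a component and each πk is its mixing coefficient.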
6
Mixtures of Gaussians Introducing latent variables z, one for each x
1-of-K representation: z = (z1, …, zK) with zk ∈ {0, 1} and Σk zk = 1, where p(zk = 1) = πk
Also: p(x | zk = 1) = N(x | μk, Σk)
Then p(x) = Σz p(x, z) = Σz p(z) p(x|z) = Σk πk N(x | μk, Σk)
Figures from Chris Bishop
7
Mixtures of Gaussians Responsibility of component k for explaining x:
Figures from Chris Bishop
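The responsibility shown in the slide figure, written out; it is Bayes' rule applied to the latent variable:

```latex
\[
\gamma(z_k) \equiv p(z_k = 1 \mid \mathbf{x})
= \frac{\pi_k \, \mathcal{N}(\mathbf{x} \mid \boldsymbol{\mu}_k, \boldsymbol{\Sigma}_k)}
       {\sum_{j=1}^{K} \pi_j \, \mathcal{N}(\mathbf{x} \mid \boldsymbol{\mu}_j, \boldsymbol{\Sigma}_j)}
\]
```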
8
Generating samples
Sample a value z* from p(z), then sample a value for x from p(x|z*)
Color generated samples using z* (left); color samples using responsibilities (right)
Figures from Chris Bishop
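A minimal sketch of this sampling procedure in Python/NumPy (function and parameter names here are illustrative, not from the lecture):

```python
import numpy as np

def sample_gmm(pi, mus, Sigmas, n, rng=np.random.default_rng(0)):
    """Ancestral sampling: draw z* from p(z), then x from p(x | z*)."""
    K = len(pi)
    zs = rng.choice(K, size=n, p=pi)      # z* ~ Categorical(pi)
    xs = np.array([rng.multivariate_normal(mus[z], Sigmas[z]) for z in zs])
    return xs, zs                          # zs can be used to color the samples

# Example: a 2-component mixture in 2D
pi = np.array([0.3, 0.7])
mus = [np.zeros(2), np.array([3.0, 3.0])]
Sigmas = [np.eye(2), 0.5 * np.eye(2)]
X, Z = sample_gmm(pi, mus, Sigmas, n=500)
```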
9
Finding parameters of mixture
Want to maximize: Set derivative with respect to means to 0: Figures from Chris Bishop
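The two quantities referenced on this slide, reconstructed in standard form (the data log likelihood, and the resulting mean update, where Nk is the effective number of points assigned to component k):

```latex
\[
\ln p(\mathbf{X} \mid \boldsymbol{\pi}, \boldsymbol{\mu}, \boldsymbol{\Sigma})
  = \sum_{n=1}^{N} \ln \Big\{ \sum_{k=1}^{K} \pi_k \, \mathcal{N}(\mathbf{x}_n \mid \boldsymbol{\mu}_k, \boldsymbol{\Sigma}_k) \Big\}
\]
\[
\boldsymbol{\mu}_k = \frac{1}{N_k} \sum_{n=1}^{N} \gamma(z_{nk}) \, \mathbf{x}_n,
\qquad N_k = \sum_{n=1}^{N} \gamma(z_{nk})
\]
```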
10
Finding parameters of mixture
Set derivative wrt covariances to 0: Set derivative wrt mixing coefficients to 0: Figures from Chris Bishop
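The resulting updates, in standard form:

```latex
\[
\boldsymbol{\Sigma}_k = \frac{1}{N_k} \sum_{n=1}^{N} \gamma(z_{nk}) \,
  (\mathbf{x}_n - \boldsymbol{\mu}_k)(\mathbf{x}_n - \boldsymbol{\mu}_k)^{\mathsf{T}},
\qquad
\pi_k = \frac{N_k}{N}
\]
```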
11
Reminder Responsibilities:
So parameters of GMM depend on responsibilities and vice versa… What can we do? Figures from Chris Bishop
12
Reminder: K-means clustering
Basic idea: randomly initialize the k cluster centers, and iterate:
1. Randomly initialize the cluster centers, c1, ..., cK
2. Given cluster centers, determine points in each cluster: for each point p, find the closest ci; put p into cluster i
3. Given points in each cluster, solve for ci: set ci to be the mean of points in cluster i
4. If the ci have changed, repeat Step 2
Steve Seitz, CS 376 Lecture 7: Segmentation
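A minimal K-means sketch in Python/NumPy following the steps above (illustrative code, not the lecture's implementation; X is an (N, d) data array):

```python
import numpy as np

def kmeans(X, k, n_iters=100, rng=np.random.default_rng(0)):
    # Step 1: randomly initialize the cluster centers c_1, ..., c_K from the data
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iters):
        # Step 2: for each point, find the closest center and assign it to that cluster
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Step 3: set each center to the mean of the points in its cluster
        new_centers = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                                else centers[j] for j in range(k)])
        # Step 4: stop if the centers did not change, otherwise repeat from Step 2
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return centers, labels
```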
13
Iterative algorithm Figures from Chris Bishop
14
Figures from Chris Bishop
15
Initialization E step M step Figures from Chris Bishop
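A compact sketch of this whole iteration for a GMM in Python (NumPy plus SciPy for the Gaussian density); the initialization choices and names here are illustrative assumptions:

```python
import numpy as np
from scipy.stats import multivariate_normal

def em_gmm(X, K, n_iters=100, rng=np.random.default_rng(0)):
    N, d = X.shape
    # Initialization: means from random data points, unit covariances, uniform mixing weights
    mus = X[rng.choice(N, size=K, replace=False)].copy()
    Sigmas = np.array([np.eye(d) for _ in range(K)])
    pis = np.full(K, 1.0 / K)
    for _ in range(n_iters):
        # E step: responsibilities gamma[n, k] = p(z_k = 1 | x_n, current parameters)
        dens = np.column_stack([pis[k] * multivariate_normal.pdf(X, mus[k], Sigmas[k])
                                for k in range(K)])
        gamma = dens / dens.sum(axis=1, keepdims=True)
        # M step: re-estimate parameters using the current responsibilities
        Nk = gamma.sum(axis=0)
        mus = (gamma.T @ X) / Nk[:, None]
        for k in range(K):
            diff = X - mus[k]
            Sigmas[k] = (gamma[:, k, None] * diff).T @ diff / Nk[k]
        pis = Nk / N
    return pis, mus, Sigmas, gamma
```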
16
EM in general For models with hidden variables, the probability of the data X = {xn} depends on the hidden variables Z = {zn}
Complete data set: {X, Z}
Incomplete data set (observed): X, for which we have p(X | θ) = ΣZ p(X, Z | θ)
We want to maximize: ln p(X | θ) = ln { ΣZ p(X, Z | θ) }
Figures from Chris Bishop
17
EM in general We don’t know the values for the hidden variables Z
We also don’t know the model parameters θ
Set one, infer the other:
Keep model parameters θ fixed, infer the hidden variables Z (compute conditional probability)
Keep Z fixed, infer the model parameters θ (maximize expectation)
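In equations (the standard general EM formulation, with θold the current parameter estimate):

```latex
\[
\text{E step: evaluate } p(\mathbf{Z} \mid \mathbf{X}, \boldsymbol{\theta}^{\text{old}})
\]
\[
\text{M step: } \boldsymbol{\theta}^{\text{new}}
  = \arg\max_{\boldsymbol{\theta}} \mathcal{Q}(\boldsymbol{\theta}, \boldsymbol{\theta}^{\text{old}}),
\quad \text{where } \mathcal{Q}(\boldsymbol{\theta}, \boldsymbol{\theta}^{\text{old}})
  = \sum_{\mathbf{Z}} p(\mathbf{Z} \mid \mathbf{X}, \boldsymbol{\theta}^{\text{old}}) \, \ln p(\mathbf{X}, \mathbf{Z} \mid \boldsymbol{\theta})
\]
```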
18
EM in general Figures from Chris Bishop
19
GMM in the context of general EM
Complete data log likelihood: (equation below)
Expectation of this likelihood: (equation below)
Choose initial values μold, Σold, πold; evaluate responsibilities γ(znk)
Fix responsibilities, maximize expectation wrt μ, Σ, π
Figures from Chris Bishop
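The two quantities named on this slide, written out in the standard form:

```latex
\[
\ln p(\mathbf{X}, \mathbf{Z} \mid \boldsymbol{\mu}, \boldsymbol{\Sigma}, \boldsymbol{\pi})
  = \sum_{n=1}^{N} \sum_{k=1}^{K} z_{nk} \big\{ \ln \pi_k
    + \ln \mathcal{N}(\mathbf{x}_n \mid \boldsymbol{\mu}_k, \boldsymbol{\Sigma}_k) \big\}
\]
\[
\mathbb{E}_{\mathbf{Z}}\big[ \ln p(\mathbf{X}, \mathbf{Z} \mid \boldsymbol{\mu}, \boldsymbol{\Sigma}, \boldsymbol{\pi}) \big]
  = \sum_{n=1}^{N} \sum_{k=1}^{K} \gamma(z_{nk}) \big\{ \ln \pi_k
    + \ln \mathcal{N}(\mathbf{x}_n \mid \boldsymbol{\mu}_k, \boldsymbol{\Sigma}_k) \big\}
\]
```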
20
K-means ~ GMM Consider a GMM where all components have covariance matrix εI (I = the identity matrix)
Responsibilities are then: γ(znk) = πk exp(−‖xn − μk‖²/2ε) / Σj πj exp(−‖xn − μj‖²/2ε)
As ε → 0, most responsibilities → 0, except γ(znj) → 1 for whichever j has the smallest ‖xn − μj‖²
Figures from Chris Bishop
21
K-means ~ GMM
Expectation of the complete-data log likelihood, which we want to maximize: (equation below)
Recall in K-means we wanted to minimize: (equation below)
where rnk = 1 if instance n belongs to cluster k, 0 otherwise
Figures from Chris Bishop
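The two objectives referenced on this slide, written out. In the ε → 0 limit the expected complete-data log likelihood reduces (up to a constant and a factor) to the negative of the K-means distortion measure J:

```latex
\[
\mathbb{E}_{\mathbf{Z}}\big[ \ln p(\mathbf{X}, \mathbf{Z} \mid \boldsymbol{\mu}, \boldsymbol{\Sigma}, \boldsymbol{\pi}) \big]
  \;\rightarrow\;
  -\tfrac{1}{2} \sum_{n=1}^{N} \sum_{k=1}^{K} r_{nk} \, \lVert \mathbf{x}_n - \boldsymbol{\mu}_k \rVert^2 + \text{const}
\]
\[
J = \sum_{n=1}^{N} \sum_{k=1}^{K} r_{nk} \, \lVert \mathbf{x}_n - \boldsymbol{\mu}_k \rVert^2
\]
```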
22
Example: EM for Naïve Bayes
We have a graphical model: Flavor = class (hidden); Shape + color = features (observed)
(Graphical model nodes: flavor → shape, flavor → color)
Observed data:
P(round, brown) 0.282
P(round, red) 0.139
P(square, brown) 0.141
P(square, red) 0.438
Total candies: 1000
Log likelihood P(X|Θ) = 282*log(0.282) + … =
Example from Rebecca Hwa
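A quick sketch of this log-likelihood computation in Python (the counts are inferred from the table above and the total of 1000 candies; names are illustrative):

```python
import math

# Observed counts of each (shape, color) combination, out of 1000 candies
counts = {("round", "brown"): 282, ("round", "red"): 139,
          ("square", "brown"): 141, ("square", "red"): 438}

def log_likelihood(model_probs):
    # Sum over observations of log P(shape, color), i.e. 282*log(0.282) + ... as on the slide
    return sum(n * math.log(model_probs[sc]) for sc, n in counts.items())

# Here the model P(shape, color) equals the empirical distribution
empirical = {sc: n / 1000 for sc, n in counts.items()}
print(log_likelihood(empirical))
```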
23
Example: EM for Naïve Bayes
Model parameters = conditional probability table, initialize it randomly:
P(straw) 0.59    P(choc) 0.41
P(round|straw) 0.96    P(round|choc) 0.66
P(square|straw) 0.04    P(square|choc) 0.34
P(brown|straw) 0.08    P(brown|choc) 0.72
P(red|straw) 0.92    P(red|choc) 0.28
Example from Rebecca Hwa
24
Example: EM for Naïve Bayes
Expectation step
Keeping model (CPT) fixed, estimate values for joint probability distribution by multiplying P(flavor) * P(shape|flavor) * P(color|flavor)
Then use these to compute conditional probability of the hidden variable flavor:
P(straw|shape,color) = P(straw,shape,color) / P(shape,color) = P(straw,shape,color) / Σflavor P(flavor,shape,color)
Example from Rebecca Hwa
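A sketch of this E step in Python, using the randomly initialized CPT from the earlier slide (dictionary layout and names are illustrative):

```python
flavors, shapes, colors = ["straw", "choc"], ["round", "square"], ["brown", "red"]

# Current model (CPT), e.g. the random initialization shown earlier
p_flavor = {"straw": 0.59, "choc": 0.41}
p_shape = {("round", "straw"): 0.96, ("square", "straw"): 0.04,
           ("round", "choc"): 0.66, ("square", "choc"): 0.34}
p_color = {("brown", "straw"): 0.08, ("red", "straw"): 0.92,
           ("brown", "choc"): 0.72, ("red", "choc"): 0.28}

# Joint: P(flavor, shape, color) = P(flavor) * P(shape | flavor) * P(color | flavor)
joint = {(f, s, c): p_flavor[f] * p_shape[(s, f)] * p_color[(c, f)]
         for f in flavors for s in shapes for c in colors}

# Posterior: P(flavor | shape, color) = P(flavor, shape, color) / sum over flavors of the joint
posterior = {(f, s, c): joint[(f, s, c)] / sum(joint[(g, s, c)] for g in flavors)
             for f in flavors for s in shapes for c in colors}
```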
25
Example: EM for Naïve Bayes
Expectation step
Joint P(flavor, shape, color):
p(straw,round,brown) 0.05    p(choc,round,brown) 0.19
p(straw,square,brown) 0.00    p(choc,square,brown) 0.10
p(straw,round,red) 0.52    p(choc,round,red) 0.08
p(straw,square,red) 0.02    p(choc,square,red) 0.04
Posterior P(flavor | shape, color):
p(straw | round, brown) 0.19    p(choc | round, brown) 0.81
p(straw | square, brown) 0.02    p(choc | square, brown) 0.98
p(straw | round, red) 0.87    p(choc | round, red) 0.13
p(straw | square, red) 0.36    p(choc | square, red) 0.64
Marginal P(shape, color):
P(round, brown) 0.24    P(round, red) 0.60
P(square, brown) 0.10    P(square, red) 0.06
Log likelihood P(X|Θ) = 282*log(0.24) + … =
Example from Rebecca Hwa
26
Example: EM for Naïve Bayes
Maximization step
Expected # “strawberry, round, brown” candies
= P(strawberry | round, brown)   // from E step
* # (round, brown)
= 0.19 * 282 = 53.6
Can find other expected counts needed for CPT
Example from Rebecca Hwa
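A sketch of this M step in Python, using the posteriors from the E-step slide and the observed counts (values copied from the slides; names are illustrative):

```python
flavors, shapes, colors = ["straw", "choc"], ["round", "square"], ["brown", "red"]

# Posterior P(flavor | shape, color) from the E step, and observed counts per (shape, color)
posterior = {("straw", "round", "brown"): 0.19, ("straw", "square", "brown"): 0.02,
             ("straw", "round", "red"): 0.87, ("straw", "square", "red"): 0.36,
             ("choc", "round", "brown"): 0.81, ("choc", "square", "brown"): 0.98,
             ("choc", "round", "red"): 0.13, ("choc", "square", "red"): 0.64}
counts = {("round", "brown"): 282, ("round", "red"): 139,
          ("square", "brown"): 141, ("square", "red"): 438}

# Expected count of (flavor, shape, color) = P(flavor | shape, color) * #(shape, color)
exp_count = {(f, s, c): posterior[(f, s, c)] * counts[(s, c)]
             for f in flavors for s in shapes for c in colors}

# Re-estimate the CPT by normalizing the expected counts
exp_flavor = {f: sum(exp_count[(f, s, c)] for s in shapes for c in colors) for f in flavors}
total = sum(exp_flavor.values())
p_flavor = {f: exp_flavor[f] / total for f in flavors}
p_shape = {(s, f): sum(exp_count[(f, s, c)] for c in colors) / exp_flavor[f]
           for s in shapes for f in flavors}
p_color = {(c, f): sum(exp_count[(f, s, c)] for s in shapes) / exp_flavor[f]
           for c in colors for f in flavors}
```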
27
Example: EM for Naïve Bayes
Maximization step
Expected counts:
exp # straw, round 174.56    exp # choc, round 246.44
exp # straw, square 159.16    exp # choc, square 419.84
exp # straw, red 277.91    exp # choc, red 299.09
exp # straw, brown 55.81    exp # choc, brown 367.19
exp # straw 333.72    exp # choc 666.28
Updated CPT:
P(straw) 0.33    P(choc) 0.67
P(round|straw) 0.52    P(round|choc) 0.37
P(square|straw) 0.48    P(square|choc) 0.63
P(brown|straw) 0.17    P(brown|choc) 0.55
P(red|straw) 0.83    P(red|choc) 0.45
Example from Rebecca Hwa
28
Example: EM for Naïve Bayes
Maximization step
Compute with new parameters (CPT): P(round, brown), P(round, red), P(square, brown), P(square, red)
Compute likelihood P(X|Θ)
Repeat until convergence
Example from Rebecca Hwa