Clustering (2) & EM algorithm


1 Clustering (2) & EM algorithm
Outline: model-based clustering; the EM algorithm.
References: Data Clustering, by Gan et al.; Machine Learning: A Probabilistic Perspective; The Expectation Maximization Algorithm: A Short Tutorial, by Borman.

2 Model-based clustering
Impose certain model assumptions on potential clusters; try to optimize the fit between data and model. The data is viewed as coming from a mixture of probability distributions; each of the distributions represents a cluster.
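In symbols, the assumed model is a finite mixture density; a sketch in standard notation (k components, mixing proportions πj, component densities fj; these symbols are mine, not taken from the slide):

p(x_i \mid \theta) = \sum_{j=1}^{k} \pi_j \, f_j(x_i \mid \theta_j), \qquad \sum_{j=1}^{k} \pi_j = 1, \; \pi_j \ge 0.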

3 Model-based clustering
For example, if we believe the data come from a mixture of several Gaussian densities, the likelihood that data point i comes from cluster j is the Gaussian density of cluster j evaluated at that point. Classification likelihood approach: find the cluster assignments and the parameters that maximize the product of these likelihoods over all data points.
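Written out in standard notation (assumed, not taken from the slides), with μj and σj² the mean and variance of cluster j and γi the cluster assigned to point i:

f_j(x_i \mid \mu_j, \sigma_j^2) = \frac{1}{\sqrt{2\pi}\,\sigma_j} \exp\!\left(-\frac{(x_i - \mu_j)^2}{2\sigma_j^2}\right),
\qquad
L_C = \prod_{i=1}^{n} f_{\gamma_i}(x_i \mid \theta_{\gamma_i}).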

4 Model-based clustering
Mixture likelihood approach: find the mixing proportions and component parameters that maximize the mixture likelihood. The most commonly used method is the EM algorithm, which iterates between soft cluster assignment and parameter estimation.
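In the same assumed notation, the mixture likelihood to be maximized over the mixing proportions πj and the parameters θj is

L_M = \prod_{i=1}^{n} \sum_{j=1}^{k} \pi_j \, f_j(x_i \mid \theta_j).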

5 EM algorithm In maximum likelihood estimation, the likelihood function L(θ) is a function of the parameter θ given the data X. The EM algorithm is an iterative procedure for maximizing L(θ). After the nth iteration, the current estimate for θ is θn; we want an update θn+1 that increases the likelihood as much as possible, i.e. maximizes L(θ) − L(θn). In many problems there are unobserved variables, collected in a hidden random vector z, and the likelihood is obtained by summing over z. In clustering, z is the soft cluster assignment.
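Working with the log-likelihood L(θ) = ln P(X | θ), as in Borman's tutorial (an assumed convention, since the slide's formulas are not in the transcript), the hidden variables enter by marginalization:

L(\theta) = \ln P(X \mid \theta) = \ln \sum_{z} P(X, z \mid \theta) = \ln \sum_{z} P(X \mid z, \theta)\, P(z \mid \theta).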

6 EM algorithm

7 EM algorithm −ln(·) is convex, so Jensen's inequality can be used to bound the log-likelihood from below.
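Since −ln(·) is convex (equivalently, ln is concave), Jensen's inequality yields a lower bound on the improvement in log-likelihood; a sketch following Borman's derivation (notation assumed, not from the slide):

L(\theta) - L(\theta_n)
= \ln \sum_{z} P(z \mid X, \theta_n)\, \frac{P(X \mid z, \theta)\, P(z \mid \theta)}{P(z \mid X, \theta_n)} - \ln P(X \mid \theta_n)
\;\ge\; \sum_{z} P(z \mid X, \theta_n) \ln \frac{P(X \mid z, \theta)\, P(z \mid \theta)}{P(z \mid X, \theta_n)\, P(X \mid \theta_n)} \;=:\; \Delta(\theta \mid \theta_n).

Defining l(\theta \mid \theta_n) = L(\theta_n) + \Delta(\theta \mid \theta_n) gives L(\theta) \ge l(\theta \mid \theta_n), with equality at \theta = \theta_n.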

8 EM algorithm Up to terms that do not depend on θ, this lower bound equals the expectation of ln P(X, z | θ) over the distribution of z | X, θn.
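In symbols (same assumed notation), dropping from l(θ | θn) the terms that do not involve θ leaves exactly this expectation, so

\theta_{n+1} = \arg\max_{\theta}\, l(\theta \mid \theta_n)
= \arg\max_{\theta}\, \sum_{z} P(z \mid X, \theta_n)\, \ln P(X, z \mid \theta)
= \arg\max_{\theta}\, E_{z \mid X, \theta_n}\!\big[\ln P(X, z \mid \theta)\big].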

9 EM algorithm Thus at every θn we first find the conditional distribution of the hidden variables z given X and θn (the E step), then take the expectation of ln P(X, z | θ) over this distribution and maximize it with respect to θ to obtain θn+1 (the M step); this choice of θn+1 is guaranteed not to decrease the likelihood.

10 EM algorithm Convergence of the EM algorithm. At every step,
θn+1 is the maximizer of the lower bound l(θ | θn), which touches L(θ) at θn. So L(θn+1) ≥ L(θn), and the likelihood L(θ) is non-decreasing over the iterations. Most of the time, EM will converge to a local maximum, but it can jump out of the local maximum closest to the starting value.
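Spelled out with the lower bound l(θ | θn) defined above:

L(\theta_{n+1}) \;\ge\; l(\theta_{n+1} \mid \theta_n) \;\ge\; l(\theta_n \mid \theta_n) \;=\; L(\theta_n).

The first inequality holds because l(· | θn) is a lower bound on L, the second because θn+1 maximizes l(· | θn).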

11 EM algorithm Reference: C. B. Do and S. Batzoglou, "What is the expectation maximization algorithm?", Nature Biotechnology, volume 26, pages 897–899 (2008).

12 EM algorithm Example: the two-coin problem.
Scenario 1: no missing data (we know which coin produced each set of tosses). Nature Biotechnology volume 26, pages 897–899 (2008)
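With complete data the maximum likelihood estimates are simply the observed head fractions for each coin:

\hat{\theta}_A = \frac{\text{number of heads in tosses of coin } A}{\text{number of tosses of coin } A},
\qquad
\hat{\theta}_B = \frac{\text{number of heads in tosses of coin } B}{\text{number of tosses of coin } B}.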

13 EM algorithm Scenario 2: which coin is tossed is missing (hidden):
Nature Biotechnology volume 26, pages 897–899 (2008)
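A minimal runnable sketch of EM for this scenario. The head counts, starting values, and the equal-prior assumption on the coin choice are illustrative, not taken from the article.

import numpy as np

# EM for the two-coin problem (Scenario 2): for each set of tosses we observe
# only the number of heads, not which coin was used.
# Illustrative data and starting values; each set is assumed to come from
# coin A or coin B with equal prior probability.
heads = np.array([4, 9, 8, 3, 7])   # heads observed in each set of tosses
n = 10                              # tosses per set
theta_A, theta_B = 0.6, 0.4         # initial guesses for each coin's P(heads)

for _ in range(20):
    # E step: posterior probability that each set came from coin A.
    lik_A = theta_A**heads * (1 - theta_A)**(n - heads)
    lik_B = theta_B**heads * (1 - theta_B)**(n - heads)
    w_A = lik_A / (lik_A + lik_B)
    w_B = 1.0 - w_A

    # M step: re-estimate each coin's head probability from expected counts.
    theta_A = np.sum(w_A * heads) / np.sum(w_A * n)
    theta_B = np.sum(w_B * heads) / np.sum(w_B * n)

print(theta_A, theta_B)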

14 Model-based clustering
EM algorithm in the simplest case: a two-component Gaussian mixture in one dimension.
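A minimal runnable sketch of this simplest case. The synthetic data, starting values, and variable names are illustrative, not taken from the slides.

import numpy as np

# EM for a two-component Gaussian mixture in 1D.
rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(-2, 1, 200), rng.normal(3, 1.5, 300)])

pi = 0.5                      # mixing proportion of component 1
mu = np.array([-1.0, 1.0])    # initial means
sd = np.array([1.0, 1.0])     # initial standard deviations

def normal_pdf(x, mu, sd):
    return np.exp(-0.5 * ((x - mu) / sd) ** 2) / (sd * np.sqrt(2 * np.pi))

for _ in range(100):
    # E step: responsibility of component 1 for each point.
    p1 = pi * normal_pdf(x, mu[0], sd[0])
    p2 = (1 - pi) * normal_pdf(x, mu[1], sd[1])
    gamma = p1 / (p1 + p2)

    # M step: responsibility-weighted mixing proportion, means, variances.
    pi = gamma.mean()
    mu[0] = np.sum(gamma * x) / np.sum(gamma)
    mu[1] = np.sum((1 - gamma) * x) / np.sum(1 - gamma)
    sd[0] = np.sqrt(np.sum(gamma * (x - mu[0]) ** 2) / np.sum(gamma))
    sd[1] = np.sqrt(np.sum((1 - gamma) * (x - mu[1]) ** 2) / np.sum(1 - gamma))

print(pi, mu, sd)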

15 Model-based clustering

16 Model-based clustering

17 Model-based clustering

18 Model-based clustering
Gaussian cluster models. E step: compute each cluster's responsibility for each data point. M step: re-estimate the mixing proportions, means, and covariance matrices using the responsibilities as weights.
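The standard updates for a Gaussian mixture, with γij the responsibility of cluster j for point i (notation assumed, not taken from the slide):

E step: \gamma_{ij} = \frac{\pi_j\, N(x_i \mid \mu_j, \Sigma_j)}{\sum_{l=1}^{k} \pi_l\, N(x_i \mid \mu_l, \Sigma_l)}

M step: \pi_j = \frac{1}{n}\sum_{i=1}^{n} \gamma_{ij}, \qquad
\mu_j = \frac{\sum_{i} \gamma_{ij}\, x_i}{\sum_{i} \gamma_{ij}}, \qquad
\Sigma_j = \frac{\sum_{i} \gamma_{ij}\, (x_i - \mu_j)(x_i - \mu_j)^{T}}{\sum_{i} \gamma_{ij}}.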

19 Model-based clustering
Common assumptions: from assumption 1 to assumption 4, the model becomes more flexible, yet more parameters need to be estimated, and the fit may become less stable.

20 Model-based clustering
Example: Mixture of multinoullis
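A sketch of this model in the usual notation (assumed; D categorical features with C categories each, μjdc the probability that feature d takes value c in cluster j):

p(x_i \mid z_i = j, \theta) = \prod_{d=1}^{D} \prod_{c=1}^{C} \mu_{jdc}^{\,I(x_{id} = c)},
\qquad
p(x_i \mid \theta) = \sum_{j=1}^{k} \pi_j\, p(x_i \mid z_i = j, \theta).

EM works as in the Gaussian case: the E step computes responsibilities from these component densities, and the M step re-estimates πj and μjdc from the responsibility-weighted counts.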

