
1 Incomplete Graphical Models Nan Hu

2 Outline: Motivation; K-means clustering; Coordinate descent algorithm; Density estimation (EM on unconditional mixtures); Regression and classification (EM on conditional mixtures); A general formulation of the EM algorithm.

3 K-means clustering. Problem: given a set of observations, how do we group them into K clusters, assuming the value of K is given? First phase: holding the cluster means fixed, assign each observation to its nearest mean. Second phase: holding the assignments fixed, recompute each mean as the average of the observations assigned to it.

4 K-means clustering. (Figure: the original data set, followed by the cluster assignments after the first, second, and third iterations.)

5 K-means clustering as a coordinate descent algorithm. The algorithm minimizes the distortion measure $J = \sum_{n=1}^{N}\sum_{k=1}^{K} r_{nk}\,\lVert x_n - \mu_k \rVert^2$ by alternately setting the partial derivatives with respect to the assignments $r_{nk}$ and the means $\mu_k$ to zero.
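
The two phases can be written down directly. Below is a minimal NumPy sketch of this coordinate descent, not taken from the slides; the function name kmeans, the random initialization, and the fixed iteration cap are illustrative choices.

```python
import numpy as np

def kmeans(X, K, n_iters=100, seed=0):
    """Two-phase coordinate descent on J = sum_n sum_k r_nk ||x_n - mu_k||^2."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    mu = X[rng.choice(len(X), size=K, replace=False)].copy()  # initial means: K random points
    assign = np.zeros(len(X), dtype=int)
    for _ in range(n_iters):
        # First phase: holding the means fixed, assign each point to its nearest mean.
        dists = ((X[:, None, :] - mu[None, :, :]) ** 2).sum(axis=2)  # (N, K) squared distances
        assign = dists.argmin(axis=1)
        # Second phase: holding the assignments fixed, recompute each mean.
        new_mu = np.array([X[assign == k].mean(axis=0) if np.any(assign == k) else mu[k]
                           for k in range(K)])
        if np.allclose(new_mu, mu):  # stop once the means no longer move
            break
        mu = new_mu
    return mu, assign
```

For example, `mu, assign = kmeans(data, K=3)` returns the K means and the final cluster assignment of each point.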

6 Unconditional Mixture. Problem: if the sample data exhibit a multimodal density, how do we estimate the true density? Fitting a single density to a bimodal data set converges, but the result bears little relationship to the truth.

7 Unconditional Mixture. A "divide-and-conquer" way to solve this problem: introduce a latent variable Z, a multinomial node taking on one of K values, with an arrow from Z to X in the graphical model. Assigning a density model to each subpopulation, the overall density is the mixture $p(x \mid \theta) = \sum_{k=1}^{K} \pi_k\, p(x \mid \theta_k)$.

8 Unconditional Mixture: Gaussian Mixture Models. In this model the mixture components are Gaussian distributions with parameters $(\mu_k, \Sigma_k)$. The probability model for a Gaussian mixture is $p(x \mid \theta) = \sum_{k=1}^{K} \pi_k\, \mathcal{N}(x \mid \mu_k, \Sigma_k)$.

9 Unconditional Mixture. Posterior probability of the latent variable Z: $\tau_k(x) \equiv p(z^k = 1 \mid x, \theta) = \dfrac{\pi_k\, \mathcal{N}(x \mid \mu_k, \Sigma_k)}{\sum_{j=1}^{K} \pi_j\, \mathcal{N}(x \mid \mu_j, \Sigma_j)}$. Log likelihood: $\ell(\theta; D) = \sum_{n=1}^{N} \log \sum_{k=1}^{K} \pi_k\, \mathcal{N}(x_n \mid \mu_k, \Sigma_k)$.

10 Unconditional Mixture. Taking the partial derivative of $\ell$ with respect to $\pi_k$, with a Lagrange multiplier enforcing $\sum_k \pi_k = 1$, and solving, we have $\pi_k = \frac{1}{N}\sum_{n=1}^{N} \tau_k(x_n)$.

11 Unconditional Mixture. Taking the partial derivative of $\ell$ with respect to $\mu_k$ and setting it to zero, we have $\mu_k = \dfrac{\sum_{n} \tau_k(x_n)\, x_n}{\sum_{n} \tau_k(x_n)}$.

12 Unconditional Mixture. Taking the partial derivative of $\ell$ with respect to $\Sigma_k$ and setting it to zero, we have $\Sigma_k = \dfrac{\sum_{n} \tau_k(x_n)\,(x_n - \mu_k)(x_n - \mu_k)^{\mathsf T}}{\sum_{n} \tau_k(x_n)}$.

13 Unconditional Mixture: the EM algorithm. First phase (E step): with the parameters fixed, compute the posterior probabilities $\tau_k(x_n)$. Second phase (M step): with the posteriors fixed, update $\pi_k$, $\mu_k$, and $\Sigma_k$ using the formulas above. A sketch of both phases follows below.
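
As an illustration of these two phases, here is a compact NumPy/SciPy sketch of EM for a Gaussian mixture using the closed-form updates derived on the previous slides; the function name gmm_em, the initialization, and the small covariance regularizer are assumptions of this sketch rather than part of the slides.

```python
import numpy as np
from scipy.stats import multivariate_normal

def gmm_em(X, K, n_iters=50, seed=0):
    """EM for a K-component Gaussian mixture: the E step computes posteriors tau,
    the M step applies the closed-form updates for pi, mu, Sigma."""
    X = np.asarray(X, dtype=float)
    N, D = X.shape
    rng = np.random.default_rng(seed)
    pi = np.full(K, 1.0 / K)
    mu = X[rng.choice(N, size=K, replace=False)].copy()
    Sigma = np.array([np.cov(X.T) + 1e-6 * np.eye(D) for _ in range(K)])
    for _ in range(n_iters):
        # E step: tau[n, k] = pi_k N(x_n | mu_k, Sigma_k) / sum_j pi_j N(x_n | mu_j, Sigma_j)
        tau = np.column_stack([pi[k] * multivariate_normal.pdf(X, mu[k], Sigma[k])
                               for k in range(K)])
        tau /= tau.sum(axis=1, keepdims=True)
        # M step: maximizers of the expected complete log likelihood.
        Nk = tau.sum(axis=0)                              # effective counts per component
        pi = Nk / N                                       # mixing proportions
        mu = (tau.T @ X) / Nk[:, None]                    # component means
        for k in range(K):
            diff = X - mu[k]
            Sigma[k] = (tau[:, k, None] * diff).T @ diff / Nk[k] + 1e-6 * np.eye(D)
    return pi, mu, Sigma, tau
```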

14 Unconditional Mixture: the EM algorithm from the expected complete log likelihood point of view. Suppose we observed the latent variables; the data set would then be completely observed, and the resulting likelihood, $\ell_c(\theta; x, z) = \sum_{n}\sum_{k} z_n^k \left[\log \pi_k + \log \mathcal{N}(x_n \mid \mu_k, \Sigma_k)\right]$, is called the complete log likelihood.

15 Unconditional Mixture. We treat the $z_n^k$ as random variables and take expectations conditioned on $X$ and the current parameters $\theta^{(t)}$. Note that the $z_n^k$ are binary random variables, so $E[z_n^k \mid x_n, \theta^{(t)}] = p(z_n^k = 1 \mid x_n, \theta^{(t)}) = \tau_k(x_n)$. Using this as the "best guess" for $z_n^k$, we have the expected complete log likelihood $\langle \ell_c \rangle = \sum_{n}\sum_{k} \tau_k(x_n)\left[\log \pi_k + \log \mathcal{N}(x_n \mid \mu_k, \Sigma_k)\right]$.

16 Unconditional Mixture. Maximizing the expected complete log likelihood by setting its derivatives to zero recovers exactly the same update equations for $\pi_k$, $\mu_k$, and $\Sigma_k$ as before.

17 Conditional Mixture. Graphical model: arrows from X to Z, from X to Y, and from Z to Y. The latent variable Z is a multinomial node taking on one of K values. The model is used for regression and classification. The relationship between X and Z can be modeled in a discriminative classification way, e.g. with a softmax function.

18 Conditional Mixture. Marginalizing over Z (X is taken to be always observed) gives $p(y \mid x, \theta) = \sum_{k=1}^{K} p(z^k = 1 \mid x, \xi)\, p(y \mid z^k = 1, x, \theta_k)$. The posterior probability is defined as $\tau^k(x, y) = \dfrac{p(z^k = 1 \mid x, \xi)\, p(y \mid z^k = 1, x, \theta_k)}{\sum_{j} p(z^j = 1 \mid x, \xi)\, p(y \mid z^j = 1, x, \theta_j)}$.

19 Conditional Mixture. Some specific choices of mixture components. Gaussian components (regression): $p(y \mid z^k = 1, x, \theta_k) = \mathcal{N}(y \mid \beta_k^{\mathsf T} x, \sigma_k^2)$. Logistic components (classification): $p(y \mid z^k = 1, x, \theta_k) = \mu(\beta_k^{\mathsf T} x)^{y}\,\bigl(1 - \mu(\beta_k^{\mathsf T} x)\bigr)^{1 - y}$, where $\mu(z) = 1/(1 + e^{-z})$ is the logistic function.

20 Conditional Mixture. Parameter estimation via EM. Complete log likelihood: $\ell_c(\theta; D) = \sum_{n}\sum_{k} z_n^k \left[\log p(z_n^k = 1 \mid x_n, \xi) + \log p(y_n \mid z_n^k = 1, x_n, \theta_k)\right]$. Using the posterior expectation $E[z_n^k \mid x_n, y_n, \theta] = \tau_n^k$ as the "best guess" for $z_n^k$, we replace $z_n^k$ by $\tau_n^k$.

21 Conditional Mixture. The expected complete log likelihood can then be written as $\langle \ell_c \rangle = \sum_{n}\sum_{k} \tau_n^k \left[\log p(z_n^k = 1 \mid x_n, \xi) + \log p(y_n \mid z_n^k = 1, x_n, \theta_k)\right]$. Taking partial derivatives and setting them to zero yields the update formulas for EM.

22 Conditional Mixture. Summary of the EM algorithm for the conditional mixture (a sketch follows below). (E step): calculate the posterior probabilities $\tau_n^k$. (M step): use the IRLS algorithm to update the gating parameters $\xi$, based on the data pairs $(x_n, \tau_n)$. (M step): use the weighted IRLS algorithm to update the component parameters $\theta_k$, based on the data pairs $(x_n, y_n)$, with weights $\tau_n^k$.
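
A rough sketch of this EM loop, for a mixture of linear-Gaussian experts with a softmax gate, is shown below. It simplifies what the slide describes: the gating update uses a few gradient steps in place of IRLS, and the expert update uses weighted least squares (the Gaussian-component analogue of weighted IRLS). All names (moe_em, gate_lr, gate_steps) are illustrative.

```python
import numpy as np

def moe_em(X, y, K, n_iters=50, gate_lr=0.1, gate_steps=20, seed=0):
    """EM for a mixture of linear-Gaussian experts with a softmax gate."""
    X = np.asarray(X, dtype=float); y = np.asarray(y, dtype=float)
    N, D = X.shape
    rng = np.random.default_rng(seed)
    V = rng.normal(scale=0.1, size=(K, D))      # gating parameters (softmax over V @ x)
    beta = rng.normal(scale=0.1, size=(K, D))   # expert regression weights
    sigma2 = np.ones(K)                         # expert noise variances
    for _ in range(n_iters):
        # Gate probabilities p(z = k | x) via softmax, and expert likelihoods p(y | x, z = k).
        logits = X @ V.T
        gate = np.exp(logits - logits.max(axis=1, keepdims=True))
        gate /= gate.sum(axis=1, keepdims=True)
        lik = np.column_stack([
            np.exp(-(y - X @ beta[k]) ** 2 / (2 * sigma2[k])) / np.sqrt(2 * np.pi * sigma2[k])
            for k in range(K)])
        # E step: posterior responsibilities tau[n, k] proportional to gate * likelihood.
        tau = gate * lik
        tau /= tau.sum(axis=1, keepdims=True)
        # M step (experts): weighted least squares with weights tau[:, k].
        for k in range(K):
            W = tau[:, k]
            A = X.T @ (W[:, None] * X) + 1e-6 * np.eye(D)
            beta[k] = np.linalg.solve(A, X.T @ (W * y))
            sigma2[k] = (W * (y - X @ beta[k]) ** 2).sum() / W.sum()
        # M step (gate): gradient ascent on sum_n sum_k tau log gate (stand-in for IRLS).
        for _step in range(gate_steps):
            logits = X @ V.T
            gate = np.exp(logits - logits.max(axis=1, keepdims=True))
            gate /= gate.sum(axis=1, keepdims=True)
            V += gate_lr * (tau - gate).T @ X / N
    return V, beta, sigma2
```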

23 General Formulation. Let $X$ denote all observable variables, $Z$ all latent variables, and $\theta$ all parameters. If $z$ were observed, the ML estimate would be $\hat{\theta} = \arg\max_\theta \log p(x, z \mid \theta)$. However, $z$ is in fact not observed. Complete log likelihood: $\ell_c(\theta; x, z) = \log p(x, z \mid \theta)$. Incomplete log likelihood: $\ell(\theta; x) = \log p(x \mid \theta) = \log \sum_z p(x, z \mid \theta)$.

24 General Formulation. Suppose $p(x, z \mid \theta)$ factors in some way; the complete log likelihood then decomposes into terms that are easy to maximize. Since $z$ is unknown, it is not clear how to carry out this ML estimation directly. However, we can average over the randomness of $z$ using a distribution $q(z \mid x)$.

25 General Formulation. Using $q(z \mid x)$ as an estimate of the distribution of $z$, the complete log likelihood becomes the expected complete log likelihood $\langle \ell_c(\theta; x, z) \rangle_q = \sum_z q(z \mid x) \log p(x, z \mid \theta)$. This expected complete log likelihood is solvable, and, hopefully, improving it also improves the incomplete log likelihood in some way. (This is the basic idea behind EM.)

26 General Formulation. EM maximizes the incomplete log likelihood via Jensen's inequality: $\ell(\theta; x) = \log \sum_z p(x, z \mid \theta) = \log \sum_z q(z \mid x)\,\dfrac{p(x, z \mid \theta)}{q(z \mid x)} \ge \sum_z q(z \mid x) \log \dfrac{p(x, z \mid \theta)}{q(z \mid x)} \equiv \mathcal{L}(q, \theta)$. The lower bound $\mathcal{L}(q, \theta)$ is called the auxiliary function; a small numerical check follows below.
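
As a quick sanity check of this bound, the snippet below evaluates the incomplete log likelihood and the auxiliary function for a single observation under a toy two-component 1-D Gaussian mixture and several arbitrary choices of q; all numbers are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
x = 1.3                                    # a single observation
pi = np.array([0.4, 0.6])                  # mixing proportions
mu = np.array([-1.0, 2.0]); var = np.array([1.0, 0.5])

def normal_pdf(x, m, v):
    return np.exp(-(x - m) ** 2 / (2 * v)) / np.sqrt(2 * np.pi * v)

joint = pi * normal_pdf(x, mu, var)        # p(x, z = k | theta) for k = 1, 2
incomplete = np.log(joint.sum())           # l(theta; x) = log sum_z p(x, z | theta)

for _ in range(5):
    q = rng.dirichlet(np.ones(2))          # an arbitrary distribution q(z | x)
    auxiliary = (q * np.log(joint / q)).sum()   # L(q, theta)
    assert auxiliary <= incomplete + 1e-12      # Jensen: auxiliary never exceeds l

# Equality holds when q is the posterior p(z | x, theta):
posterior = joint / joint.sum()
assert np.isclose((posterior * np.log(joint / posterior)).sum(), incomplete)
```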

27 General Formulation. Given $q$, maximizing $\mathcal{L}(q, \theta)$ over $\theta$ is equal to maximizing the expected complete log likelihood, since $\mathcal{L}(q, \theta) = \sum_z q(z \mid x) \log p(x, z \mid \theta) - \sum_z q(z \mid x) \log q(z \mid x)$ and the second (entropy) term does not depend on $\theta$.

28 General Formulation. Given $\theta^{(t)}$, the choice $q^{(t+1)}(z \mid x) = p(z \mid x, \theta^{(t)})$ yields the maximum of $\mathcal{L}(q, \theta^{(t)})$. Note: $\ell(\theta^{(t)}; x)$ is the upper bound of $\mathcal{L}(q, \theta^{(t)})$.

29 General Formulation. From the above, at every step of EM we maximize $\mathcal{L}(q, \theta)$. However, how do we know that maximizing $\mathcal{L}(q, \theta)$ also maximizes the incomplete log likelihood $\ell(\theta; x)$?

30 General Formulation. The difference between $\ell(\theta; x)$ and $\mathcal{L}(q, \theta)$ is the KL divergence $D\bigl(q(z \mid x)\,\|\,p(z \mid x, \theta)\bigr)$, which is non-negative and uniquely minimized (at zero) when $q(z \mid x) = p(z \mid x, \theta)$; the decomposition is written out below.
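
Writing the difference out from the definitions above gives the standard decomposition:

```latex
\begin{align*}
\mathcal{L}(q,\theta)
  &= \sum_z q(z \mid x)\,\log \frac{p(x, z \mid \theta)}{q(z \mid x)}
   = \sum_z q(z \mid x)\,\log \frac{p(z \mid x, \theta)\,p(x \mid \theta)}{q(z \mid x)} \\
  &= \log p(x \mid \theta) - \sum_z q(z \mid x)\,\log \frac{q(z \mid x)}{p(z \mid x, \theta)}
   = \ell(\theta; x) - D\!\bigl(q(z \mid x)\,\|\,p(z \mid x, \theta)\bigr).
\end{align*}
```

Since $D(q\,\|\,p) \ge 0$, with equality if and only if $q(z \mid x) = p(z \mid x, \theta)$, the auxiliary function never exceeds $\ell(\theta; x)$, and the E step closes the gap exactly.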

31 General Formulation: EM and alternating minimization. Recall that maximizing the likelihood is exactly the same as minimizing the KL divergence between the empirical distribution and the model. Including the latent variable, the KL divergence becomes a "complete KL divergence" between joint distributions on $(x, z)$.

32 General Formulation

33 General Formulation. Reformulated EM algorithm as alternating minimization of the complete KL divergence. (E step): with $\theta^{(t)}$ fixed, minimize over $q$, which gives $q^{(t+1)}(z \mid x) = p(z \mid x, \theta^{(t)})$. (M step): with $q^{(t+1)}$ fixed, minimize over $\theta$, which is equivalent to maximizing the expected complete log likelihood.

34 Summary. Unconditional mixture: graphical model and EM algorithm. Conditional mixture: graphical model and EM algorithm. A general formulation of the EM algorithm: maximizing the auxiliary function; minimizing the "complete KL divergence".

35 Incomplete Graphical Models Thank You!

