Lecture 11: Generalizations of EM

Last Time: the Gaussian mixture model example. E-step: compute the sufficient statistics w.r.t. the posterior. M-step: maximize Q. (MoG_demo.)
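
To make the E-step / M-step split concrete, here is a minimal NumPy sketch of EM for a mixture of Gaussians. The diagonal-covariance simplification, the function name, and the toy data are illustrative choices, not taken from the lecture.

    # Minimal sketch of EM for a mixture of Gaussians (diagonal covariances).
    import numpy as np

    def em_mog(X, K, n_iter=50, seed=0):
        rng = np.random.default_rng(seed)
        N, D = X.shape
        # Initialization: random means, unit variances, uniform mixing weights.
        mu = X[rng.choice(N, K, replace=False)].astype(float)   # (K, D)
        var = np.ones((K, D))
        pi = np.full(K, 1.0 / K)

        for _ in range(n_iter):
            # E-step: responsibilities r[n, k] = p(y_n = k | x_n, theta),
            # i.e. the expected sufficient statistics under the posterior.
            log_p = (np.log(pi)
                     - 0.5 * np.sum(np.log(2 * np.pi * var), axis=1)
                     - 0.5 * np.sum((X[:, None, :] - mu) ** 2 / var, axis=2))
            log_p -= log_p.max(axis=1, keepdims=True)            # numerical stability
            r = np.exp(log_p)
            r /= r.sum(axis=1, keepdims=True)

            # M-step: maximize Q(theta) = sum_{n,k} r[n,k] log p(x_n, y_n = k | theta).
            Nk = r.sum(axis=0)                                   # effective counts
            pi = Nk / N
            mu = (r.T @ X) / Nk[:, None]
            var = (r.T @ (X ** 2)) / Nk[:, None] - mu ** 2 + 1e-6
        return pi, mu, var, r

    # Toy usage: two well-separated 2-D clusters.
    X = np.vstack([np.random.randn(100, 2), np.random.randn(100, 2) + 5.0])
    pi, mu, var, r = em_mog(X, K=2)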

Generalizations: MAP-EM includes a prior on the parameters, so EM computes a maximum a-posteriori estimate. By interchanging the roles of X and the parameters we can also compute the most probable configuration under P(x). In "generalized EM" (GEM) we only need to do partial M-steps. We can apply EM to maximize positive functions of a special form. We can do partial E-steps as well!
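
As a hedged illustration of MAP-EM and GEM, the sketch below shows how only the M-step of the mixture model above changes: with an assumed symmetric Dirichlet prior on the mixing weights, the M-step maximizes Q(theta) + log p(theta) instead of Q(theta) alone. The prior choice, the value of alpha, and the names are mine, not from the lecture.

    import numpy as np

    def map_m_step_pi(r, alpha=2.0):
        """MAP update for the mixing weights given responsibilities r of shape (N, K)."""
        N, K = r.shape
        Nk = r.sum(axis=0)
        # Maximum-likelihood update:          pi_k = Nk / N
        # MAP update with Dirichlet(alpha):   pi_k = (Nk + alpha - 1) / (N + K*(alpha - 1))
        return (Nk + alpha - 1.0) / (N + K * (alpha - 1.0))

    # GEM: instead of solving the M-step exactly, any partial update that merely
    # increases Q (e.g. a single gradient step on the means) still guarantees
    # that the log-likelihood does not decrease.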

Variational EM (VEM): EM can be viewed as coordinate ascent on Q(theta, q), where q(y) is a parameterized family of distributions over the hidden variables. The optimal choice is q(y) = p(y|x, theta). But the allowed family does not even have to contain that optimal solution; in that case we maximize a lower bound on the log-likelihood, which still makes sense. This approximate EM algorithm can be very helpful in making an intractable E-step tractable (at the expense of accuracy). A simple example is k-means, where we choose q(y) to be a delta peak at a single mean (a hard assignment).
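
The k-means special case can be sketched as follows: restricting q(y) to delta peaks (hard assignments) turns the E-step into a nearest-mean assignment and the M-step into a per-cluster average, so the algorithm maximizes a bound on the log-likelihood rather than the likelihood itself. The function and variable names below are illustrative, not from the lecture.

    import numpy as np

    def hard_em_kmeans(X, K, n_iter=20, seed=0):
        rng = np.random.default_rng(seed)
        mu = X[rng.choice(len(X), K, replace=False)].astype(float)   # initial means
        for _ in range(n_iter):
            # Restricted E-step: q(y_n) is a delta peak on the nearest mean.
            d = ((X[:, None, :] - mu) ** 2).sum(axis=2)              # squared distances (N, K)
            z = d.argmin(axis=1)                                     # hard assignments
            # M-step: maximize the bound w.r.t. the means given the delta-peak q.
            for k in range(K):
                if np.any(z == k):
                    mu[k] = X[z == k].mean(axis=0)
        return mu, z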