Lecture 11: Generalizations of EM

Last Time: the Gaussian mixture model example. E-step: compute the sufficient statistics w.r.t. the posterior. M-step: maximize Q. (MoG_demo.)
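
To make the E-step / M-step split concrete, here is a minimal NumPy sketch of EM for a mixture of Gaussians. The diagonal-covariance simplification, the function name, and the toy data are illustrative choices, not taken from the lecture.

    # Minimal sketch of EM for a mixture of Gaussians (diagonal covariances).
    import numpy as np

    def em_mog(X, K, n_iter=50, seed=0):
        rng = np.random.default_rng(seed)
        N, D = X.shape
        # Initialization: random means, unit variances, uniform mixing weights.
        mu = X[rng.choice(N, K, replace=False)].astype(float)   # (K, D)
        var = np.ones((K, D))
        pi = np.full(K, 1.0 / K)

        for _ in range(n_iter):
            # E-step: responsibilities r[n, k] = p(y_n = k | x_n, theta),
            # i.e. the expected sufficient statistics under the posterior.
            log_p = (np.log(pi)
                     - 0.5 * np.sum(np.log(2 * np.pi * var), axis=1)
                     - 0.5 * np.sum((X[:, None, :] - mu) ** 2 / var, axis=2))
            log_p -= log_p.max(axis=1, keepdims=True)            # numerical stability
            r = np.exp(log_p)
            r /= r.sum(axis=1, keepdims=True)

            # M-step: maximize Q(theta) = sum_{n,k} r[n,k] log p(x_n, y_n = k | theta).
            Nk = r.sum(axis=0)                                   # effective counts
            pi = Nk / N
            mu = (r.T @ X) / Nk[:, None]
            var = (r.T @ (X ** 2)) / Nk[:, None] - mu ** 2 + 1e-6
        return pi, mu, var, r

    # Toy usage: two well-separated 2-D clusters.
    X = np.vstack([np.random.randn(100, 2), np.random.randn(100, 2) + 5.0])
    pi, mu, var, r = em_mog(X, K=2)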

Generalizations: MAP-EM includes a prior on the parameters, so EM computes a maximum a-posteriori estimate. By interchanging the roles of X and the parameters we can also compute the most probable configuration under P(x). In "generalized EM" (GEM) we only need to do partial M-steps. We can apply EM to maximize positive functions of a special form. We can do partial E-steps as well!
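
As a hedged illustration of MAP-EM and GEM, the sketch below shows how only the M-step of the mixture model above changes: with an assumed symmetric Dirichlet prior on the mixing weights, the M-step maximizes Q(theta) + log p(theta) instead of Q(theta) alone. The prior choice, the value of alpha, and the names are mine, not from the lecture.

    import numpy as np

    def map_m_step_pi(r, alpha=2.0):
        """MAP update for the mixing weights given responsibilities r of shape (N, K)."""
        N, K = r.shape
        Nk = r.sum(axis=0)
        # Maximum-likelihood update:          pi_k = Nk / N
        # MAP update with Dirichlet(alpha):   pi_k = (Nk + alpha - 1) / (N + K*(alpha - 1))
        return (Nk + alpha - 1.0) / (N + K * (alpha - 1.0))

    # GEM: instead of solving the M-step exactly, any partial update that merely
    # increases Q (e.g. a single gradient step on the means) still guarantees
    # that the log-likelihood does not decrease.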

Variational EM (VEM): EM can be viewed as coordinate ascent on Q(theta, q), where q(y) is a parameterized family of distributions over the hidden variables. The optimal choice is q(y) = p(y|x, theta). But the allowed family does not even have to contain that optimal solution; in that case we maximize a lower bound on the log-likelihood, which still makes sense. This approximate EM algorithm can be very helpful in making an intractable E-step tractable (at the expense of accuracy). A simple example is k-means, where we choose q(y) to be a delta peak at a single mean (a hard assignment).
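
The k-means special case can be sketched as follows: restricting q(y) to delta peaks (hard assignments) turns the E-step into a nearest-mean assignment and the M-step into a per-cluster average, so the algorithm maximizes a bound on the log-likelihood rather than the likelihood itself. The function and variable names below are illustrative, not from the lecture.

    import numpy as np

    def hard_em_kmeans(X, K, n_iter=20, seed=0):
        rng = np.random.default_rng(seed)
        mu = X[rng.choice(len(X), K, replace=False)].astype(float)   # initial means
        for _ in range(n_iter):
            # Restricted E-step: q(y_n) is a delta peak on the nearest mean.
            d = ((X[:, None, :] - mu) ** 2).sum(axis=2)              # squared distances (N, K)
            z = d.argmin(axis=1)                                     # hard assignments
            # M-step: maximize the bound w.r.t. the means given the delta-peak q.
            for k in range(K):
                if np.any(z == k):
                    mu[k] = X[z == k].mean(axis=0)
        return mu, z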