Incomplete Graphical Models Nan Hu

Outline: motivation; K-means clustering; the coordinate descent algorithm; density estimation and EM on unconditional mixtures; regression and classification with EM on conditional mixtures; a general formulation of the EM algorithm.

K-means clustering. Problem: given a set of observations, how do we group them into K clusters, assuming the value of K is given? The algorithm alternates between two phases: in the first phase each observation is assigned to its nearest cluster center; in the second phase each center is recomputed as the mean of the observations assigned to it.
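The two phases can be written in a few lines of NumPy. This is a minimal illustrative sketch of Lloyd-style k-means, not code from the original slides; the initialization and stopping rule are my own assumptions.

```python
import numpy as np

def kmeans(X, K, n_iters=100, seed=0):
    """Minimal k-means: alternate the assignment phase and the mean-update phase."""
    rng = np.random.default_rng(seed)
    # Initialize centers with K distinct data points (an arbitrary choice).
    centers = X[rng.choice(len(X), size=K, replace=False)]
    for _ in range(n_iters):
        # First phase: assign each point to its nearest center.
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Second phase: move each center to the mean of its assigned points.
        new_centers = np.array([
            X[labels == k].mean(axis=0) if np.any(labels == k) else centers[k]
            for k in range(K)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return centers, labels

# Example usage on synthetic 2-D data:
# X = np.vstack([np.random.randn(100, 2), np.random.randn(100, 2) + 5])
# centers, labels = kmeans(X, K=2)
```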

K-means clustering in action (figures omitted): the original data set, followed by the cluster assignments after the first, second, and third iterations.

K-means clustering as a coordinate descent algorithm. The algorithm minimizes the distortion measure $J = \sum_{n=1}^{N}\sum_{k=1}^{K} r_{nk}\,\lVert x_n - \mu_k\rVert^2$ by coordinate descent: holding the assignments $r_{nk}$ fixed and setting $\partial J / \partial \mu_k = 0$ gives the mean-update phase, while holding the means $\mu_k$ fixed and minimizing over the assignments gives the nearest-center phase.

Unconditional mixtures. Problem: if the sample data exhibit a multimodal density, how do we estimate the true density? Fitting a single density to a bimodal data set converges, but the resulting estimate bears little relationship to the truth.

Unconditional mixtures: a "divide-and-conquer" way to solve this problem. Introduce a latent variable $Z$, a multinomial node taking one of $K$ values, with an arrow $Z \to X$ in the graphical model. Assigning a density model $f_k(x \mid \theta_k)$ to each subpopulation, the overall density is $p(x \mid \theta) = \sum_{k=1}^{K} \pi_k\, f_k(x \mid \theta_k)$, where $\pi_k = P(Z = k)$.

Unconditional mixtures: Gaussian mixture models. In this model the mixture components are Gaussian distributions with parameters $(\mu_k, \Sigma_k)$, so the probability model for a Gaussian mixture is $p(x \mid \theta) = \sum_{k=1}^{K} \pi_k\, \mathcal{N}(x \mid \mu_k, \Sigma_k)$.

Unconditional mixtures: the posterior probability of the latent variable, $\tau_{nk} = P(Z_n = k \mid x_n, \theta)$ (the responsibility of component $k$ for data point $x_n$), and the log likelihood of the observed data both follow directly from the mixture density.
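The slide displays these two expressions as images; the standard forms they refer to are (my reconstruction):

\[
\tau_{nk} = P(Z_n = k \mid x_n, \theta)
          = \frac{\pi_k\, \mathcal{N}(x_n \mid \mu_k, \Sigma_k)}
                 {\sum_{j=1}^{K} \pi_j\, \mathcal{N}(x_n \mid \mu_j, \Sigma_j)},
\qquad
\ell(\theta \mid X) = \sum_{n=1}^{N} \log \sum_{k=1}^{K} \pi_k\, \mathcal{N}(x_n \mid \mu_k, \Sigma_k).
\]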

Unconditional mixtures: taking the partial derivative of the log likelihood with respect to the mixing proportions $\pi_k$, with a Lagrange multiplier enforcing $\sum_k \pi_k = 1$, and solving gives $\pi_k = \frac{1}{N}\sum_{n=1}^{N} \tau_{nk}$.

Unconditional mixtures: taking the partial derivative of the log likelihood with respect to the means $\mu_k$ and setting it to zero gives $\mu_k = \dfrac{\sum_n \tau_{nk}\, x_n}{\sum_n \tau_{nk}}$.

Unconditional mixtures: taking the partial derivative of the log likelihood with respect to the covariances $\Sigma_k$ and setting it to zero gives $\Sigma_k = \dfrac{\sum_n \tau_{nk}\,(x_n - \mu_k)(x_n - \mu_k)^{\top}}{\sum_n \tau_{nk}}$.

Unconditional mixtures: the EM algorithm. The first phase (E step) computes the responsibilities $\tau_{nk}$ from the current parameters; the second phase (M step) re-estimates $\pi_k$, $\mu_k$, and $\Sigma_k$ using the update formulas above. The two phases are iterated until convergence.
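For concreteness, here is a minimal NumPy sketch of these two phases for a Gaussian mixture. It is an illustrative implementation under my own assumptions (full covariances, fixed iteration count, small diagonal regularization), not code from the original slides.

```python
import numpy as np
from scipy.stats import multivariate_normal

def em_gmm(X, K, n_iters=50, seed=0):
    """EM for a Gaussian mixture: alternate the E step (responsibilities) and M step (updates)."""
    N, D = X.shape
    rng = np.random.default_rng(seed)
    pi = np.full(K, 1.0 / K)                      # mixing proportions
    mu = X[rng.choice(N, size=K, replace=False)]  # means, initialized at data points
    sigma = np.array([np.cov(X.T) + 1e-6 * np.eye(D) for _ in range(K)])  # covariances

    for _ in range(n_iters):
        # E step: tau[n, k] = P(Z_n = k | x_n, theta).
        dens = np.column_stack([
            pi[k] * multivariate_normal.pdf(X, mean=mu[k], cov=sigma[k])
            for k in range(K)
        ])
        tau = dens / dens.sum(axis=1, keepdims=True)

        # M step: re-estimate pi, mu, sigma from the responsibilities.
        Nk = tau.sum(axis=0)
        pi = Nk / N
        mu = (tau.T @ X) / Nk[:, None]
        for k in range(K):
            diff = X - mu[k]
            sigma[k] = (tau[:, k, None] * diff).T @ diff / Nk[k] + 1e-6 * np.eye(D)
    return pi, mu, sigma, tau
```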

Unconditional mixtures: the EM algorithm from the expected complete log likelihood point of view. Suppose the latent variables were observed; the data set would then be completely observed, and the resulting likelihood is the complete log likelihood $\ell_c(\theta \mid X, Z) = \sum_n \sum_k z_{nk}\,\log\bigl(\pi_k\, \mathcal{N}(x_n \mid \mu_k, \Sigma_k)\bigr)$.

Unconditional mixtures: we treat the $z_{nk}$ as random variables and take expectations conditioned on $X$ and the current parameters $\theta$. Note that the $z_{nk}$ are binary random variables, so $E[z_{nk} \mid x_n, \theta] = P(Z_n = k \mid x_n, \theta) = \tau_{nk}$. Using $\tau_{nk}$ as the "best guess" for $z_{nk}$ gives the expected complete log likelihood.
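The slide shows this quantity as an image; the standard expression it refers to is (my reconstruction):

\[
\langle \ell_c(\theta \mid X, Z) \rangle
  = \sum_{n=1}^{N} \sum_{k=1}^{K} \tau_{nk}
    \bigl( \log \pi_k + \log \mathcal{N}(x_n \mid \mu_k, \Sigma_k) \bigr).
\]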

Unconditional mixtures: maximizing the expected complete log likelihood by setting its derivatives to zero recovers exactly the same update formulas for $\pi_k$, $\mu_k$, and $\Sigma_k$ as before, with $\tau_{nk}$ in place of the unobserved $z_{nk}$.

Conditional mixtures: graphical model over $X$, $Z$, and $Y$, used for regression and classification. The latent variable $Z$ is a multinomial node taking one of $K$ values, and the relationship between $X$ and $Z$ is modeled in a discriminative classification way, e.g. with a softmax function with gating parameters $\xi$.

Conditional mixtures: marginalizing over $Z$, with $X$ taken to be always observed, gives the conditional density of $Y$ given $X$; the posterior probability of the latent variable is then defined by Bayes' rule.
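The slide shows both expressions as images; in the standard mixture-of-experts notation used above they read (my reconstruction):

\[
p(y \mid x, \theta) = \sum_{k=1}^{K} P(Z = k \mid x, \xi)\; p(y \mid x, Z = k, \theta_k),
\qquad
P(Z = k \mid x, y, \theta) = \frac{P(Z = k \mid x, \xi)\; p(y \mid x, Z = k, \theta_k)}
                                  {\sum_{j=1}^{K} P(Z = j \mid x, \xi)\; p(y \mid x, Z = j, \theta_j)}.
\]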

Conditional mixtures: some specific choices of mixture components. Gaussian components (for regression), e.g. $p(y \mid x, Z = k, \theta_k) = \mathcal{N}(y \mid \beta_k^{\top} x, \sigma_k^2)$, and logistic components (for binary classification), e.g. $p(y \mid x, Z = k, \theta_k) = \mu_k(x)^{y}\,(1 - \mu_k(x))^{1-y}$ with $\mu_k(x) = \sigma(\beta_k^{\top} x)$, where $\sigma(z) = 1/(1 + e^{-z})$ is the logistic function.

Conditional mixtures: parameter estimation via EM. The complete log likelihood is $\ell_c(\theta \mid X, Y, Z) = \sum_n \sum_k z_{nk} \log\bigl(P(Z_n = k \mid x_n, \xi)\, p(y_n \mid x_n, Z_n = k, \theta_k)\bigr)$. Using the expectation $\tau_{nk} = E[z_{nk} \mid x_n, y_n, \theta]$, i.e. the posterior probability above, as the "best guess" for $z_{nk}$ gives the expected complete log likelihood.

Conditional mixtures: the expected complete log likelihood can then be written by replacing $z_{nk}$ with $\tau_{nk}$; taking partial derivatives with respect to the gating and expert parameters and setting them to zero gives the update formulas for EM.
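Written out (my reconstruction of the slide's formula, in the notation above):

\[
\langle \ell_c \rangle = \sum_{n=1}^{N} \sum_{k=1}^{K} \tau_{nk}
  \bigl( \log P(Z_n = k \mid x_n, \xi) + \log p(y_n \mid x_n, Z_n = k, \theta_k) \bigr),
\]

which decouples into a weighted multinomial-logistic (softmax) problem for the gating parameters $\xi$ and $K$ weighted regression problems for the expert parameters $\theta_k$, which is why IRLS and weighted IRLS appear in the M step below.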

Conditional mixtures: summary of the EM algorithm. (E step) Calculate the posterior probabilities $\tau_{nk}$. (M step) Use the IRLS algorithm to update the gating parameters $\xi$, based on the data pairs $(x_n, \tau_{nk})$. (M step) Use the weighted IRLS algorithm to update the expert parameters $\theta_k$, based on the data points $(x_n, y_n)$, with weights $\tau_{nk}$.

A general formulation. Let $X$ denote all observable variables, $Z$ all latent variables, and $\theta$ all parameters. If $Z$ were observed, the ML estimate would maximize the complete log likelihood $\ell_c(\theta \mid X, Z) = \log p(X, Z \mid \theta)$. However, $Z$ is in fact not observed, so we must work with the incomplete log likelihood $\ell(\theta \mid X) = \log p(X \mid \theta) = \log \sum_{Z} p(X, Z \mid \theta)$.

A general formulation. Suppose $p(X, Z \mid \theta)$ factors in some way; the complete log likelihood then decomposes into terms involving $Z$. Since $Z$ is unknown, it is not clear how to solve this ML estimation directly. However, we can average over the random variable $Z$, conditioning on $X$ and the current parameters.

A general formulation. Using the posterior $p(Z \mid X, \theta)$ as an estimate of $Z$, the complete log likelihood becomes the expected complete log likelihood $\langle \ell_c(\theta \mid X, Z)\rangle = \sum_{Z} p(Z \mid X, \theta)\,\log p(X, Z \mid \theta)$. This expected complete log likelihood is solvable, and, hopefully, maximizing it will also improve the incomplete log likelihood in some way. (This is the basic idea behind EM.)

A general formulation: EM maximizes the incomplete log likelihood $\ell(\theta \mid X)$ by maximizing a lower bound obtained from Jensen's inequality, the auxiliary function $\mathcal{L}(q, \theta)$.
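The slide's derivation is shown as images; the standard bound it refers to is (my reconstruction):

\[
\ell(\theta \mid X) = \log \sum_{Z} p(X, Z \mid \theta)
 = \log \sum_{Z} q(Z \mid X)\,\frac{p(X, Z \mid \theta)}{q(Z \mid X)}
 \;\ge\; \sum_{Z} q(Z \mid X)\,\log \frac{p(X, Z \mid \theta)}{q(Z \mid X)}
 \;\equiv\; \mathcal{L}(q, \theta),
\]

where the inequality is Jensen's inequality applied to the concave logarithm.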

A general formulation. For fixed $q$, maximizing $\mathcal{L}(q, \theta)$ over $\theta$ is equal to maximizing the expected complete log likelihood, since $\mathcal{L}(q, \theta) = \sum_{Z} q \log p(X, Z \mid \theta) - \sum_{Z} q \log q$ and the entropy term does not depend on $\theta$.

A general formulation. For fixed $\theta$, the choice $q(Z \mid X) = p(Z \mid X, \theta)$ yields the maximum of $\mathcal{L}(q, \theta)$. Note that $\ell(\theta \mid X)$ is an upper bound of $\mathcal{L}(q, \theta)$, and this choice attains it: $\mathcal{L}\bigl(p(Z \mid X, \theta), \theta\bigr) = \ell(\theta \mid X)$.

A general formulation. From the above, every step of EM maximizes $\mathcal{L}(q, \theta)$. However, how do we know that maximizing $\mathcal{L}$ also maximizes the incomplete log likelihood $\ell(\theta \mid X)$?

A general formulation. The difference between $\ell(\theta \mid X)$ and $\mathcal{L}(q, \theta)$ is a KL divergence, which is non-negative and uniquely minimized (equal to zero) at $q(Z \mid X) = p(Z \mid X, \theta)$.
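Explicitly (my reconstruction of the slide's identity):

\[
\ell(\theta \mid X) - \mathcal{L}(q, \theta)
 = \sum_{Z} q(Z \mid X)\,\log \frac{q(Z \mid X)}{p(Z \mid X, \theta)}
 = D_{\mathrm{KL}}\bigl(q(Z \mid X)\,\|\,p(Z \mid X, \theta)\bigr) \;\ge\; 0,
\]

so increasing $\mathcal{L}$ can never decrease $\ell$, and after each E step the bound is tight.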

A general formulation: EM and alternating minimization. Recall that maximizing the likelihood is exactly the same as minimizing the KL divergence between the empirical distribution and the model. Including the latent variable, the KL divergence becomes a "complete KL divergence" between joint distributions on $(X, Z)$.

A general formulation: the reformulated EM algorithm alternates an E step (maximizing $\mathcal{L}$ over $q$) and an M step (maximizing $\mathcal{L}$ over $\theta$); viewed through the complete KL divergence, it is an alternating minimization algorithm.
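In symbols, a compact statement of the two steps (standard form; the slide shows it as images):

\[
\text{E step:}\quad q^{(t+1)} = \arg\max_{q}\, \mathcal{L}\bigl(q, \theta^{(t)}\bigr) = p\bigl(Z \mid X, \theta^{(t)}\bigr),
\qquad
\text{M step:}\quad \theta^{(t+1)} = \arg\max_{\theta}\, \mathcal{L}\bigl(q^{(t+1)}, \theta\bigr).
\]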

Summary: unconditional mixtures (graphical model and EM algorithm); conditional mixtures (graphical model and EM algorithm); a general formulation of the EM algorithm (maximizing the auxiliary function, minimizing the "complete KL divergence").

Incomplete Graphical Models Thank You!