Clustering (2) & EM algorithm


Outline: Model-based clustering; EM algorithm.
References: Data Clustering by Gan et al.; Machine Learning, a Probabilistic Perspective; The Expectation Maximization Algorithm: A Short Tutorial, by Borman.

Model-based clustering Impose certain model assumptions on the potential clusters and try to optimize the fit between the data and the model. The data are viewed as coming from a mixture of probability distributions; each of the distributions represents one cluster.

Model-based clustering For example, if we believe the data come from a mixture of several Gaussian densities, the likelihood that data point i comes from cluster j is given by the corresponding Gaussian density evaluated at that point. Classification likelihood approach: find the cluster assignments and parameters that maximize the product of these densities over all data points.
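As a sketch, with notation not printed on the slide: let γ_i ∈ {1, …, k} be the cluster assignment of point x_i and φ(x; μ_j, Σ_j) the Gaussian density of cluster j. The classification likelihood is L_C(μ, Σ; γ) = Π_i φ(x_i; μ_{γ_i}, Σ_{γ_i}), maximized jointly over the hard assignments γ_i and the cluster parameters.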

Model-based clustering Mixture likelihood approach: treat the cluster label of each point as an unobserved random variable, with the mixing proportions as additional parameters, and maximize the resulting mixture likelihood. The most commonly used method is the EM algorithm, which iterates between soft cluster assignment and parameter estimation.
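A sketch of the mixture likelihood with assumed notation, where τ_j ≥ 0 and Σ_j τ_j = 1 are the mixing proportions: L_M(μ, Σ, τ) = Π_i Σ_j τ_j φ(x_i; μ_j, Σ_j). Each point contributes a weighted sum over all clusters rather than a single assigned cluster.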

EM algorithm In maximum likelihood estimation, the likelihood L(θ) is a function of the parameter θ given the data X. The EM algorithm is an iterative procedure for maximizing L(θ). After the nth iteration the current estimate is θ_n, and we want an update θ_{n+1} that increases the likelihood, i.e. L(θ_{n+1}) > L(θ_n). In many problems there are unobserved variables, collected in a hidden random vector z; the observed-data likelihood is then obtained by summing the complete-data likelihood over z. In clustering, z is the (soft) cluster assignment.
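In symbols, following the notation of Borman's tutorial (which these slides cite): write L(θ) = ln P(X | θ) for the log-likelihood. With a hidden vector z, P(X | θ) = Σ_z P(X | z, θ) P(z | θ), so the log-likelihood contains a log of a sum over z. This is what makes direct maximization difficult and motivates EM.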

EM algorithm

EM algorithm The key tool is Jensen's inequality: −ln(·) is convex (equivalently, ln(·) is concave), so the log of a weighted average is at least the weighted average of the logs.
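Applying Jensen's inequality to the sum over z gives the standard lower bound (a sketch in the same notation; the symbol l(θ | θ_n) follows Borman and is not printed on the slide): L(θ) ≥ L(θ_n) + Σ_z P(z | X, θ_n) ln [ P(X | z, θ) P(z | θ) / ( P(z | X, θ_n) P(X | θ_n) ) ] =: l(θ | θ_n), with equality at θ = θ_n, so l(θ_n | θ_n) = L(θ_n).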

EM algorithm Maximizing the lower bound over θ: the part of l(θ | θ_n) that depends on θ is, up to an additive constant not involving θ, the expectation of ln[P(X, z | θ)] over the distribution of z | X, θ_n.
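In the usual EM notation (assumed here, not printed on the slide), this expectation is the Q-function: Q(θ | θ_n) = E_{z | X, θ_n}[ ln P(X, z | θ) ], and the update is θ_{n+1} = argmax_θ Q(θ | θ_n). Computing P(z | X, θ_n) and taking the expectation is the E-step; maximizing over θ is the M-step.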

EM algorithm Thus at every θ_n we find the conditional distribution of the hidden variables z given the data (E-step), then take the expectation of the complete-data log-likelihood over this distribution and maximize it over θ to obtain θ_{n+1} (M-step).

EM algorithm Convergence of the EM algorithm. At every step, θ_{n+1} is the maximizer of the lower bound l(θ | θ_n), which touches L(θ) at θ_n. So the likelihood L(θ) is non-decreasing from iteration to iteration. Most of the time, EM will converge to a local maximum; but because its updates are not small gradient steps, it can jump out of the basin of the closest local maximum.
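The monotonicity argument, written out with the lower bound from the earlier slide: L(θ_{n+1}) ≥ l(θ_{n+1} | θ_n) ≥ l(θ_n | θ_n) = L(θ_n). The first inequality is the lower-bound property, the second holds because θ_{n+1} maximizes l(· | θ_n), and the final equality is the tangency of the bound at θ_n.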

EM algorithm Illustration from Do & Batzoglou, "What is the expectation maximization algorithm?", Nature Biotechnology, volume 26, pages 897–899 (2008).

EM algorithm Example: the two-coin problem. Scenario 1: no missing values — for each set of tosses we know which coin was used. (Figure from Nature Biotechnology, volume 26, pages 897–899, 2008.)
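In this complete-data case the maximum-likelihood estimates are simply the observed frequencies: θ̂_A = (heads tossed with coin A) / (total tosses with coin A), and likewise for θ̂_B.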

EM algorithm Scenario 2: which coin was tossed is missing — the coin identity is a hidden variable, so EM alternates between computing the posterior probability of each coin for each set of tosses (E-step) and re-estimating the head probabilities from the correspondingly weighted counts (M-step). (Figure from Nature Biotechnology, volume 26, pages 897–899, 2008.)
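A minimal Python sketch of EM for this two-coin setting; the data, initial guesses, and variable names below are illustrative choices, not values taken from the slides:

import numpy as np
from scipy.stats import binom

# each entry: number of heads observed in one set of 10 tosses
heads = np.array([5, 9, 8, 4, 7])
tosses = np.full_like(heads, 10)

theta_A, theta_B = 0.6, 0.5          # initial guesses for the two head probabilities
for _ in range(50):
    # E-step: posterior probability that each set came from coin A (uniform prior over coins)
    like_A = binom.pmf(heads, tosses, theta_A)
    like_B = binom.pmf(heads, tosses, theta_B)
    w_A = like_A / (like_A + like_B)
    w_B = 1.0 - w_A
    # M-step: re-estimate head probabilities from the expected (weighted) counts
    theta_A = np.sum(w_A * heads) / np.sum(w_A * tosses)
    theta_B = np.sum(w_B * heads) / np.sum(w_B * tosses)

print(theta_A, theta_B)   # approaches roughly 0.80 and 0.52 for this illustrative data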

Model-based clustering EM algorithm in the simplest case: a two-component Gaussian mixture in 1D.
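A compact Python sketch of this simplest case; the synthetic data and all names are illustrative, assuming only NumPy:

import numpy as np

rng = np.random.default_rng(0)
# synthetic 1D data drawn from two Gaussians
x = np.concatenate([rng.normal(-2.0, 1.0, 200), rng.normal(3.0, 1.5, 300)])

def normal_pdf(x, mu, sigma):
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

# initial parameters: mixing proportion of component 1, means, standard deviations
pi1 = 0.5
mu = np.array([-1.0, 1.0])
sigma = np.array([1.0, 1.0])

for _ in range(100):
    # E-step: responsibility of component 1 for each point
    p1 = pi1 * normal_pdf(x, mu[0], sigma[0])
    p2 = (1 - pi1) * normal_pdf(x, mu[1], sigma[1])
    gamma = p1 / (p1 + p2)
    # M-step: weighted updates of mixing proportion, means, and variances
    pi1 = gamma.mean()
    mu[0] = np.sum(gamma * x) / np.sum(gamma)
    mu[1] = np.sum((1 - gamma) * x) / np.sum(1 - gamma)
    sigma[0] = np.sqrt(np.sum(gamma * (x - mu[0]) ** 2) / np.sum(gamma))
    sigma[1] = np.sqrt(np.sum((1 - gamma) * (x - mu[1]) ** 2) / np.sum(1 - gamma))

print(pi1, mu, sigma)   # should roughly recover 0.4, means (-2, 3), standard deviations (1, 1.5)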

Model-based clustering


Model-based clustering Gaussian cluster models. E-step: compute the responsibility of each cluster for each data point under the current parameters. M-step: re-estimate the mixing proportions, means, and covariances using the responsibilities as weights.
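The standard Gaussian-mixture updates, written out with assumed notation (γ_ij for responsibilities, τ_j for mixing proportions): E-step: γ_ij = τ_j φ(x_i; μ_j, Σ_j) / Σ_k τ_k φ(x_i; μ_k, Σ_k). M-step: τ_j = (1/n) Σ_i γ_ij, μ_j = Σ_i γ_ij x_i / Σ_i γ_ij, Σ_j = Σ_i γ_ij (x_i − μ_j)(x_i − μ_j)ᵀ / Σ_i γ_ij.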

Model-based clustering Common assumptions constrain the component covariance matrices (e.g., shared versus cluster-specific, spherical versus full). From 1 to 4, the model becomes more flexible, yet more parameters need to be estimated, so the fit may become less stable.
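As a generic illustration of the trade-off (not from the slides): in p dimensions, a cluster-specific full covariance matrix contributes p(p+1)/2 free parameters per cluster, while a single shared spherical covariance contributes just one variance parameter for the whole model.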

Model-based clustering Example: Mixture of multinoullis
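A sketch of such a model with assumed notation: each observation x_i is a vector of D discrete features, and p(x_i | θ) = Σ_j τ_j Π_d Π_v μ_{jdv}^{I(x_{id} = v)}, where μ_{jdv} is the probability that feature d takes value v in cluster j. EM applies exactly as for the Gaussian mixture, with the M-step replacing weighted means and covariances by weighted frequency counts.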