Clustering (2) & EM algorithm


Outline: Model-based clustering; EM algorithm.
References: Data Clustering by Gan et al.; Machine Learning, a Probabilistic Perspective; The Expectation Maximization Algorithm: A Short Tutorial, by Borman.

Model-based clustering Impose certain model assumptions on the potential clusters and try to optimize the fit between the data and the model. The data are viewed as coming from a mixture of probability distributions; each of the distributions represents one cluster.

Model-based clustering For example, if we believe the data come from a mixture of several Gaussian densities, the likelihood that data point i comes from cluster j is given by the corresponding Gaussian density evaluated at that point. Classification likelihood approach: find the cluster assignments and parameters that maximize the product of these densities over all data points.
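As a sketch, with notation not printed on the slide: let γ_i ∈ {1, …, k} be the cluster assignment of point x_i and φ(x; μ_j, Σ_j) the Gaussian density of cluster j. The classification likelihood is L_C(μ, Σ; γ) = Π_i φ(x_i; μ_{γ_i}, Σ_{γ_i}), maximized jointly over the hard assignments γ_i and the cluster parameters.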

Model-based clustering Mixture likelihood approach: treat the cluster label of each point as an unobserved random variable, with the mixing proportions as additional parameters, and maximize the resulting mixture likelihood. The most commonly used method is the EM algorithm, which iterates between soft cluster assignment and parameter estimation.
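A sketch of the mixture likelihood with assumed notation, where τ_j ≥ 0 and Σ_j τ_j = 1 are the mixing proportions: L_M(μ, Σ, τ) = Π_i Σ_j τ_j φ(x_i; μ_j, Σ_j). Each point contributes a weighted sum over all clusters rather than a single assigned cluster.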

EM algorithm In maximum likelihood estimation, the likelihood L(θ) is a function of the parameter θ given the data X. The EM algorithm is an iterative procedure for maximizing L(θ). After the nth iteration the current estimate is θ_n, and we want an update θ_{n+1} that increases the likelihood, i.e. L(θ_{n+1}) > L(θ_n). In many problems there are unobserved variables, collected in a hidden random vector z; the observed-data likelihood is then obtained by summing the complete-data likelihood over z. In clustering, z is the (soft) cluster assignment.
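In symbols, following the notation of Borman's tutorial (which these slides cite): write L(θ) = ln P(X | θ) for the log-likelihood. With a hidden vector z, P(X | θ) = Σ_z P(X | z, θ) P(z | θ), so the log-likelihood contains a log of a sum over z. This is what makes direct maximization difficult and motivates EM.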

EM algorithm

EM algorithm The key tool is Jensen's inequality: −ln(·) is convex (equivalently, ln(·) is concave), so the log of a weighted average is at least the weighted average of the logs.
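Applying Jensen's inequality to the sum over z gives the standard lower bound (a sketch in the same notation; the symbol l(θ | θ_n) follows Borman and is not printed on the slide): L(θ) ≥ L(θ_n) + Σ_z P(z | X, θ_n) ln [ P(X | z, θ) P(z | θ) / ( P(z | X, θ_n) P(X | θ_n) ) ] =: l(θ | θ_n), with equality at θ = θ_n, so l(θ_n | θ_n) = L(θ_n).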

EM algorithm Maximizing the lower bound over θ: the part of l(θ | θ_n) that depends on θ is, up to an additive constant not involving θ, the expectation of ln[P(X, z | θ)] over the distribution of z | X, θ_n.
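In the usual EM notation (assumed here, not printed on the slide), this expectation is the Q-function: Q(θ | θ_n) = E_{z | X, θ_n}[ ln P(X, z | θ) ], and the update is θ_{n+1} = argmax_θ Q(θ | θ_n). Computing P(z | X, θ_n) and taking the expectation is the E-step; maximizing over θ is the M-step.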

EM algorithm Thus at every θ_n we find the conditional distribution of the hidden variables z given the data (E-step), then take the expectation of the complete-data log-likelihood over this distribution and maximize it over θ to obtain θ_{n+1} (M-step).

EM algorithm Convergence of the EM algorithm. At every step, θ_{n+1} is the maximizer of the lower bound l(θ | θ_n), which touches L(θ) at θ_n. So the likelihood L(θ) is non-decreasing from iteration to iteration. Most of the time, EM will converge to a local maximum; but because its updates are not small gradient steps, it can jump out of the basin of the closest local maximum.
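The monotonicity argument, written out with the lower bound from the earlier slide: L(θ_{n+1}) ≥ l(θ_{n+1} | θ_n) ≥ l(θ_n | θ_n) = L(θ_n). The first inequality is the lower-bound property, the second holds because θ_{n+1} maximizes l(· | θ_n), and the final equality is the tangency of the bound at θ_n.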

EM algorithm Illustration from Do & Batzoglou, "What is the expectation maximization algorithm?", Nature Biotechnology, volume 26, pages 897–899 (2008).

EM algorithm Example: the two-coin problem. Scenario 1: no missing values — for each set of tosses we know which coin was used. (Figure from Nature Biotechnology, volume 26, pages 897–899, 2008.)
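In this complete-data case the maximum-likelihood estimates are simply the observed frequencies: θ̂_A = (heads tossed with coin A) / (total tosses with coin A), and likewise for θ̂_B.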

EM algorithm Scenario 2: which coin was tossed is missing — the coin identity is a hidden variable, so EM alternates between computing the posterior probability of each coin for each set of tosses (E-step) and re-estimating the head probabilities from the correspondingly weighted counts (M-step). (Figure from Nature Biotechnology, volume 26, pages 897–899, 2008.)
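A minimal Python sketch of EM for this two-coin setting; the data, initial guesses, and variable names below are illustrative choices, not values taken from the slides:

import numpy as np
from scipy.stats import binom

# each entry: number of heads observed in one set of 10 tosses
heads = np.array([5, 9, 8, 4, 7])
tosses = np.full_like(heads, 10)

theta_A, theta_B = 0.6, 0.5          # initial guesses for the two head probabilities
for _ in range(50):
    # E-step: posterior probability that each set came from coin A (uniform prior over coins)
    like_A = binom.pmf(heads, tosses, theta_A)
    like_B = binom.pmf(heads, tosses, theta_B)
    w_A = like_A / (like_A + like_B)
    w_B = 1.0 - w_A
    # M-step: re-estimate head probabilities from the expected (weighted) counts
    theta_A = np.sum(w_A * heads) / np.sum(w_A * tosses)
    theta_B = np.sum(w_B * heads) / np.sum(w_B * tosses)

print(theta_A, theta_B)   # approaches roughly 0.80 and 0.52 for this illustrative data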

Model-based clustering EM algorithm in the simplest case: a two-component Gaussian mixture in 1D.
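A compact Python sketch of this simplest case; the synthetic data and all names are illustrative, assuming only NumPy:

import numpy as np

rng = np.random.default_rng(0)
# synthetic 1D data drawn from two Gaussians
x = np.concatenate([rng.normal(-2.0, 1.0, 200), rng.normal(3.0, 1.5, 300)])

def normal_pdf(x, mu, sigma):
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

# initial parameters: mixing proportion of component 1, means, standard deviations
pi1 = 0.5
mu = np.array([-1.0, 1.0])
sigma = np.array([1.0, 1.0])

for _ in range(100):
    # E-step: responsibility of component 1 for each point
    p1 = pi1 * normal_pdf(x, mu[0], sigma[0])
    p2 = (1 - pi1) * normal_pdf(x, mu[1], sigma[1])
    gamma = p1 / (p1 + p2)
    # M-step: weighted updates of mixing proportion, means, and variances
    pi1 = gamma.mean()
    mu[0] = np.sum(gamma * x) / np.sum(gamma)
    mu[1] = np.sum((1 - gamma) * x) / np.sum(1 - gamma)
    sigma[0] = np.sqrt(np.sum(gamma * (x - mu[0]) ** 2) / np.sum(gamma))
    sigma[1] = np.sqrt(np.sum((1 - gamma) * (x - mu[1]) ** 2) / np.sum(1 - gamma))

print(pi1, mu, sigma)   # should roughly recover 0.4, means (-2, 3), standard deviations (1, 1.5)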

Model-based clustering


Model-based clustering Gaussian cluster models. E-step: compute the responsibility of each cluster for each data point under the current parameters. M-step: re-estimate the mixing proportions, means, and covariances using the responsibilities as weights.
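The standard Gaussian-mixture updates, written out with assumed notation (γ_ij for responsibilities, τ_j for mixing proportions): E-step: γ_ij = τ_j φ(x_i; μ_j, Σ_j) / Σ_k τ_k φ(x_i; μ_k, Σ_k). M-step: τ_j = (1/n) Σ_i γ_ij, μ_j = Σ_i γ_ij x_i / Σ_i γ_ij, Σ_j = Σ_i γ_ij (x_i − μ_j)(x_i − μ_j)ᵀ / Σ_i γ_ij.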

Model-based clustering Common assumptions constrain the component covariance matrices (e.g., shared versus cluster-specific, spherical versus full). From 1 to 4, the model becomes more flexible, yet more parameters need to be estimated, so the fit may become less stable.
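As a generic illustration of the trade-off (not from the slides): in p dimensions, a cluster-specific full covariance matrix contributes p(p+1)/2 free parameters per cluster, while a single shared spherical covariance contributes just one variance parameter for the whole model.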

Model-based clustering Example: Mixture of multinoullis
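A sketch of such a model with assumed notation: each observation x_i is a vector of D discrete features, and p(x_i | θ) = Σ_j τ_j Π_d Π_v μ_{jdv}^{I(x_{id} = v)}, where μ_{jdv} is the probability that feature d takes value v in cluster j. EM applies exactly as for the Gaussian mixture, with the M-step replacing weighted means and covariances by weighted frequency counts.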