DATA MINING: from data to information (van data naar informatie) Ronald Westra, Dept. of Mathematics, Maastricht University

PROBABILISTIC MODEL-BASED CLUSTERING USING MIXTURE MODELS Data Mining Lecture VI [Sections 4.5, 8.4, 9.2, 9.6 of Hand, Mannila, Smyth]

Probabilistic Model-Based Clustering using Mixture Models A probability mixture model A mixture model is a formalism for modeling a probability density function as a sum of parameterized functions. In mathematical terms:

p_X(x) = \sum_{k=1}^{K} a_k h(x \mid \lambda_k)

A probability mixture model where p_X(x) is the modeled probability density function, K is the number of components in the mixture model, and a_k is the mixture proportion of component k. By definition, 0 < a_k < 1 for all k = 1, ..., K and:

\sum_{k=1}^{K} a_k = 1

A probability mixture model h(x | λ_k) is a probability distribution parameterized by λ_k. Mixture models are often used when we know the parametric form h(x | λ) and can sample from p_X(x), but would like to determine the a_k and λ_k values. Such situations can arise in studies in which we sample from a population that is composed of several distinct subpopulations.
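As a concrete illustration (not part of the original slides), here is a minimal numerical sketch of this definition, assuming one-dimensional Gaussian components with λ_k = (μ_k, σ_k) and made-up parameter values:

    import numpy as np
    from scipy.stats import norm

    # Illustrative two-component mixture (parameter values are assumptions, not from the slides)
    a = np.array([0.3, 0.7])        # mixture proportions a_k, sum to 1
    mu = np.array([-1.0, 2.0])      # component means
    sigma = np.array([0.5, 1.5])    # component standard deviations

    def mixture_pdf(x):
        """p_X(x) = sum_k a_k * h(x | lambda_k) with Gaussian components h."""
        return sum(a_k * norm.pdf(x, m, s) for a_k, m, s in zip(a, mu, sigma))

    def sample(n, rng=np.random.default_rng(0)):
        """Sample from p_X: pick a component with probability a_k, then draw from it."""
        k = rng.choice(len(a), size=n, p=a)
        return rng.normal(mu[k], sigma[k])

Evaluating mixture_pdf on a grid and comparing it with a histogram of sample(10000) shows how the two subpopulations blend into a single density.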

A common approach for ‘decomposing’ a mixture model It is common to think of mixture modeling as a missing data problem. One way to understand this is to assume that the data points under consideration have "membership" in one of the distributions we are using to model the data. When we start, this membership is unknown, or missing. The job of estimation is to devise appropriate parameters for the model functions we choose, with the connection to the data points being represented as their membership in the individual model distributions.
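In symbols (a standard formalisation that the slide leaves implicit): attach to each data point x_i a hidden label z_i ∈ {1, ..., K} saying which component generated it; then

P(z_i = k) = a_k, \qquad p(x_i \mid z_i = k) = h(x_i \mid \lambda_k), \qquad p(x_i) = \sum_{k=1}^{K} a_k h(x_i \mid \lambda_k)

The z_i are the missing data, and estimating them together with the parameters is what EM does.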

Probabilistic Model-Based Clustering using Mixture Models The EM-algorithm [book section 8.4]

Mixture Decomposition: The ‘Expectation-Maximization’ Algorithm The Expectation-Maximization algorithm computes the missing memberships of data points in our chosen distribution model. It is an iterative procedure: we start with initial parameters for our model distribution (the a_k's and λ_k's of the model above) and then alternate between two steps, the Expectation step and the Maximization step.

The ‘Expectation-Maximization’ Algorithm The expectation step With initial guesses for the parameters in our mixture model, we compute "partial membership" of each data point in each constituent distribution. This is done by calculating expectation values for the membership variables of each data point.
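A minimal sketch of this step for the one-dimensional Gaussian mixture used above (function and variable names are illustrative assumptions, not from the slides):

    import numpy as np
    from scipy.stats import norm

    def e_step(x, a, mu, sigma):
        """Return responsibilities r[i, k]: the expected membership of point x[i]
        in component k under the current parameter guesses (a, mu, sigma)."""
        weighted = a * norm.pdf(x[:, None], mu, sigma)          # a_k * h(x_i | lambda_k), shape (n, K)
        return weighted / weighted.sum(axis=1, keepdims=True)   # normalise rows so each sums to 1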

The ‘Expectation-Maximization’ Algorithm The maximization step With the expectation values for group membership in hand, we can recompute plug-in estimates of our distribution parameters. For the mixing coefficient a_k, this is simply the fractional membership of all data points in component k.
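Continuing the sketch above, a corresponding M-step for the same one-dimensional Gaussian mixture (again an illustration, not the slides' own code):

    import numpy as np

    def m_step(x, r):
        """Re-estimate (a, mu, sigma) from data x and responsibilities r."""
        n_k = r.sum(axis=0)                      # effective number of points per component
        a = n_k / len(x)                         # new mixing coefficients: average membership
        mu = (r * x[:, None]).sum(axis=0) / n_k  # responsibility-weighted means
        var = (r * (x[:, None] - mu) ** 2).sum(axis=0) / n_k
        return a, mu, np.sqrt(var)

Alternating e_step and m_step until the parameters (or the log-likelihood) stop changing is the entire algorithm; the following slides justify why this works.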

EM-algorithm for Clustering Suppose we have data D generated by a model with parameters θ and hidden variables H. Interpretation: H = the class labels. Log-likelihood of the observed data:

l(\theta) = \log p(D \mid \theta) = \log \sum_{H} p(D, H \mid \theta)

EM-algorithm for Clustering Here p denotes the probability model over the data D. Let Q be the unknown distribution over the hidden variables H. Then the log-likelihood satisfies:

l(\theta) = \log \sum_{H} Q(H) \frac{p(D, H \mid \theta)}{Q(H)} \geq \sum_{H} Q(H) \log \frac{p(D, H \mid \theta)}{Q(H)} =: F(Q, \theta) \quad [*]

[*] by Jensen’s inequality

Jensen’s inequality: for a concave-down function, the expected value of the function is at most the function of the expected value, E[f(x)] ≤ f(E[x]). In the accompanying figure, the gray rectangle along the horizontal axis represents the probability distribution of x, which is uniform for simplicity, but the general idea applies to any distribution.
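Applied with the concave function log and the expectation taken over Q, this is exactly the step marked [*] two slides back:

\log \sum_{H} Q(H) \frac{p(D, H \mid \theta)}{Q(H)} = \log \mathbb{E}_{Q}\left[\frac{p(D, H \mid \theta)}{Q(H)}\right] \geq \mathbb{E}_{Q}\left[\log \frac{p(D, H \mid \theta)}{Q(H)}\right] = F(Q, \theta)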

EM-algorithm So F(Q, θ) is a lower bound on the log-likelihood function l(θ). EM alternates between: E-step: maximising F with respect to Q with θ fixed, and: M-step: maximising F with respect to θ with Q fixed.

EM-algorithm E-step: Q^{(t+1)} = \arg\max_{Q} F(Q, \theta^{(t)}), which is attained at Q^{(t+1)}(H) = p(H \mid D, \theta^{(t)}). M-step: \theta^{(t+1)} = \arg\max_{\theta} F(Q^{(t+1)}, \theta).

Probabilistic Model-Based Clustering using Gaussian Mixtures

Probabilistic Model-Based Clustering using Mixture Models

Gaussian Mixture Decomposition Gaussian mixture decomposition is a good classifier. It allows supervised as well as unsupervised learning (finding how many classes are optimal, how they should be defined, ...). But training is iterative and time consuming. The idea is to set the position and width of the Gaussian distribution(s) so as to optimise the coverage of the learning samples.
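In practice this fitting is rarely coded from scratch; as a hedged illustration (assuming scikit-learn is available, which the slides do not mention), decomposing data into two Gaussian components and reading off hard and soft cluster assignments could look like this:

    import numpy as np
    from sklearn.mixture import GaussianMixture

    rng = np.random.default_rng(0)
    # Synthetic 1-D data drawn from two made-up subpopulations
    X = np.concatenate([rng.normal(-1.0, 0.5, 300),
                        rng.normal(2.0, 1.5, 700)]).reshape(-1, 1)

    gmm = GaussianMixture(n_components=2, random_state=0).fit(X)  # runs EM internally
    print(gmm.weights_, gmm.means_.ravel())  # estimated mixing proportions and means
    labels = gmm.predict(X)                  # hard cluster labels (unsupervised classification)
    resp = gmm.predict_proba(X)              # soft partial memberships, as in the E-step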

Probabilistic Model-Based Clustering using Mixture Models

The End