Gaussian Mixture Models

Presentation transcript:

Gaussian Mixture Models
Speech and Image Processing Unit, Department of Computer Science, University of Joensuu, FINLAND
Ville Hautamäki
Clustering Methods: Part 8

Preliminaries
We assume that the dataset X has been generated by a parametric distribution p(X). Estimating the parameters of p is known as density estimation. Here we consider the Gaussian distribution.

Typical parameters (1)
Mean (μ): the expected (average) value under p(X), also called the expectation.
Variance (σ²): a measure of the variability of p(X) around the mean.
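In symbols (the slide's formula images are not in the transcript; these are the standard definitions):

\mu = \mathbb{E}[x] = \int x \, p(x) \, dx, \qquad \sigma^2 = \mathbb{E}\left[(x - \mu)^2\right]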

Typical parameters (2)
Covariance: measures how much two variables vary together.
Covariance matrix: the collection of covariances between all pairs of dimensions.
– The diagonal of the covariance matrix contains the variance of each attribute.
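The formulas on this slide were images; the usual definitions read:

\operatorname{cov}(x_i, x_j) = \mathbb{E}\left[(x_i - \mu_i)(x_j - \mu_j)\right], \qquad \Sigma = \mathbb{E}\left[(\mathbf{x} - \boldsymbol{\mu})(\mathbf{x} - \boldsymbol{\mu})^{\mathsf T}\right]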

One-dimensional Gaussian
The parameters to be estimated are the mean (μ) and the variance (σ²).
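The density formula on the slide was an image; the standard one-dimensional Gaussian is

\mathcal{N}(x \mid \mu, \sigma^2) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\!\left(-\frac{(x - \mu)^2}{2\sigma^2}\right)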

Multivariate Gaussian (1)
In the multivariate case we have a covariance matrix instead of a single variance.
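Again the formula was a slide image; the standard D-dimensional Gaussian density is

\mathcal{N}(\mathbf{x} \mid \boldsymbol{\mu}, \Sigma) = \frac{1}{(2\pi)^{D/2} \, |\Sigma|^{1/2}} \exp\!\left(-\tfrac{1}{2} (\mathbf{x} - \boldsymbol{\mu})^{\mathsf T} \Sigma^{-1} (\mathbf{x} - \boldsymbol{\mu})\right)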

Multivariate Gaussian (2)
(Figure panels: full covariance, diagonal covariance, single variance.)
Complete data log likelihood (see below):
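For N i.i.d. observations, the complete data log likelihood of a single Gaussian (the slide's equation is missing from the transcript; this is the standard form) is

\ln p(X \mid \mu, \Sigma) = \sum_{n=1}^{N} \ln \mathcal{N}(x_n \mid \mu, \Sigma)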

Maximum Likelihood (ML) parameter estimation
Maximize the log likelihood.
Setting the gradient of the complete data log likelihood to zero gives a closed-form solution.
– In the case of the mean, this is the sample average.
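The closed-form ML estimates (standard results, reproduced here because the slide's equations were images) are

\mu_{\mathrm{ML}} = \frac{1}{N} \sum_{n=1}^{N} x_n, \qquad \Sigma_{\mathrm{ML}} = \frac{1}{N} \sum_{n=1}^{N} (x_n - \mu_{\mathrm{ML}})(x_n - \mu_{\mathrm{ML}})^{\mathsf T}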

When one Gaussian is not enough
Real-world datasets are rarely unimodal!

Mixtures of Gaussians
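The mixture density itself appeared only as a figure; in standard notation, a Gaussian mixture with M components is

p(x) = \sum_{k=1}^{M} \pi_k \, \mathcal{N}(x \mid \mu_k, \Sigma_k)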

Mixtures of Gaussians (2)
In addition to the mean and covariance parameters (now M sets of them), we have the mixing coefficients π_k.
The following properties hold for the mixing coefficients (see below), so π_k can be seen as the prior probability of component k.
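The properties shown as an image on the slide are the usual constraints:

0 \le \pi_k \le 1, \qquad \sum_{k=1}^{M} \pi_k = 1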

Responsibilities (1)
The component labels (red, green, and blue) cannot be observed, so we have to calculate approximations to them, called responsibilities.
(Figure panels: complete data, incomplete data, responsibilities.)

Responsibilities (2)
A responsibility describes how probable it is that observation vector x comes from component k.
In hard clustering, responsibilities take only the values 0 and 1, and thus define a hard partitioning.

Responsibilities (3)
We can express the marginal density p(x) as a sum over the mixture components. From this, we can find the responsibility of the k-th component for x using Bayes' theorem (both expressions are given below).
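Written out (the slide's equations were images; these are the standard expressions), the marginal density and the responsibility of component k are

p(x) = \sum_{j=1}^{M} \pi_j \, \mathcal{N}(x \mid \mu_j, \Sigma_j), \qquad \gamma_k(x) = \frac{\pi_k \, \mathcal{N}(x \mid \mu_k, \Sigma_k)}{\sum_{j=1}^{M} \pi_j \, \mathcal{N}(x \mid \mu_j, \Sigma_j)}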

Expectation Maximization (EM)
Goal: maximize the log likelihood of the whole data (see below).
Once the responsibilities have been calculated, we can maximize individually with respect to the means, the covariances, and the mixing coefficients!
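The objective (the standard GMM log likelihood; the slide's formula was an image) is

\ln p(X \mid \pi, \mu, \Sigma) = \sum_{n=1}^{N} \ln\!\left( \sum_{k=1}^{M} \pi_k \, \mathcal{N}(x_n \mid \mu_k, \Sigma_k) \right)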

Exact update equations
New mean estimates, covariance estimates, and mixing coefficient estimates (the formulas are reproduced below).
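The update equations themselves were slide images; the standard M-step updates, with N_k = \sum_{n=1}^{N} \gamma_k(x_n), are

\mu_k^{\mathrm{new}} = \frac{1}{N_k} \sum_{n=1}^{N} \gamma_k(x_n) \, x_n, \qquad \Sigma_k^{\mathrm{new}} = \frac{1}{N_k} \sum_{n=1}^{N} \gamma_k(x_n) \, (x_n - \mu_k^{\mathrm{new}})(x_n - \mu_k^{\mathrm{new}})^{\mathsf T}, \qquad \pi_k^{\mathrm{new}} = \frac{N_k}{N}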

EM Algorithm
Initialize parameters.
While not converged:
– E step: calculate responsibilities.
– M step: estimate new parameters.
– Calculate the log likelihood of the new parameters.
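A minimal NumPy/SciPy sketch of this loop, following the standard E- and M-step formulas above (the function name em_gmm, the initialization scheme, the small covariance regularization term, and the convergence tolerance are illustrative assumptions, not from the slides):

import numpy as np
from scipy.stats import multivariate_normal

def em_gmm(X, M, max_iter=100, tol=1e-6, seed=0):
    """Fit an M-component Gaussian mixture to data X (N x D) with EM."""
    rng = np.random.default_rng(seed)
    N, D = X.shape
    # Initialization: M random data points as means, shared sample covariance, uniform weights
    means = X[rng.choice(N, M, replace=False)]
    covs = np.array([np.cov(X, rowvar=False) + 1e-6 * np.eye(D)] * M)
    weights = np.full(M, 1.0 / M)
    prev_ll = -np.inf

    for _ in range(max_iter):
        # E step: responsibilities gamma[n, k] proportional to pi_k * N(x_n | mu_k, Sigma_k)
        dens = np.column_stack([
            weights[k] * multivariate_normal.pdf(X, means[k], covs[k])
            for k in range(M)
        ])
        ll = np.sum(np.log(dens.sum(axis=1)))   # log likelihood of the current parameters
        gamma = dens / dens.sum(axis=1, keepdims=True)

        # M step: re-estimate means, covariances and mixing coefficients
        Nk = gamma.sum(axis=0)
        means = (gamma.T @ X) / Nk[:, None]
        for k in range(M):
            diff = X - means[k]
            covs[k] = (gamma[:, k, None] * diff).T @ diff / Nk[k] + 1e-6 * np.eye(D)
        weights = Nk / N

        if ll - prev_ll < tol:                  # stop when the likelihood no longer improves
            break
        prev_ll = ll
    return weights, means, covs, ll

For example, weights, means, covs, ll = em_gmm(X, M=3) fits a three-component mixture to an N x D data array X.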

Example of EM

Computational complexity
Hard clustering with the MSE criterion is NP-complete. Can we find the optimal GMM in polynomial time? Finding the optimal GMM is in class NP.

Some insights
In a GMM we need to estimate the parameters, which are all real numbers.
– Number of parameters: M + M·D + M·D(D+1)/2 (mixing coefficients, means, and the free entries of the symmetric full covariance matrices).
Hard clustering has no such parameters, just a set partitioning (remember the optimality criteria!).
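As a small worked example (an illustration, not from the slides): with M = 3 components in D = 2 dimensions, a full-covariance GMM has 3 + 3·2 + 3·(2·3)/2 = 3 + 6 + 9 = 18 real-valued parameters.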

Some further insights (2)
Both optimization functions are mathematically rigorous!
Solutions minimizing MSE are always meaningful.
Maximizing the log likelihood might lead to a singularity: if one component collapses onto a single data point, its variance shrinks toward zero and the likelihood grows without bound.

Example of singularity