EE462 MLCV Lecture 3-4: Clustering (1hr), Gaussian Mixture and EM (1hr). Tae-Kyun Kim.

Presentation transcript:

EE462 MLCV 1 Lecture 3-4: Clustering (1hr), Gaussian Mixture and EM (1hr). Tae-Kyun Kim

EE462 MLCV 2 Vector Clustering. Data points (green), which are 2D vectors, are grouped into two homogeneous clusters (blue and red). Clustering is achieved by an iterative algorithm (left to right); the cluster centers are marked with an x.

EE462 MLCV 3 Pixel Clustering (Image Quantisation). Image pixels are represented by 3D vectors of their R, G, B values. The vectors are grouped into K = 10, 3, 2 clusters, and each pixel is represented by the mean value of its cluster.
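
As a rough illustration (not part of the original slides), the quantisation described above can be sketched with scikit-learn's KMeans, assuming the image is already loaded as an H x W x 3 NumPy array; the function name quantise_image is just for this example.

```python
# Sketch: image quantisation with K-means; each pixel is replaced by its cluster's mean colour.
import numpy as np
from sklearn.cluster import KMeans

def quantise_image(image, K=10):
    h, w, _ = image.shape
    pixels = image.reshape(-1, 3).astype(np.float64)         # N x 3 matrix of R,G,B vectors
    km = KMeans(n_clusters=K, n_init=10, random_state=0).fit(pixels)
    quantised = km.cluster_centers_[km.labels_]               # look up each pixel's cluster mean
    return quantised.reshape(h, w, 3).astype(image.dtype)
```

Smaller K (e.g. 3 or 2) gives a coarser quantisation, as on the slide.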

EE462 MLCV 4 Patch Clustering. Image patches are harvested around interest points from a large number of images. Each patch is represented by a finite-dimensional vector (e.g. a SIFT descriptor, or the raw pixel values of a 20x20 patch, giving dimension D = 400), and the vectors are clustered into K codewords to form a visual dictionary. See Lecture 9-10 (BoW).
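
As a further illustrative sketch (not lecture material), a visual dictionary can be built by clustering descriptor vectors with K-means and then quantising each new descriptor to its nearest codeword; the helper names and the choice K = 1000 below are assumptions for the example.

```python
# Sketch: build a K-codeword visual dictionary and compute a bag-of-words histogram.
import numpy as np
from sklearn.cluster import KMeans

def build_dictionary(descriptors, K=1000):
    # descriptors: N x D array (e.g. SIFT vectors harvested around interest points)
    return KMeans(n_clusters=K, n_init=3, random_state=0).fit(descriptors)

def bow_histogram(km, descriptors):
    words = km.predict(descriptors)                           # nearest codeword for each patch
    hist = np.bincount(words, minlength=km.n_clusters)
    return hist / hist.sum()                                  # normalised bag-of-words vector
```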

EE462 MLCV 5 Image Clustering. Whole images are represented as finite-dimensional vectors. Homogeneous vectors are grouped together in Euclidean space. See Lecture 9-10 (BoW).

EE462 MLCV 6 K-means vs GMM. Two standard clustering methods are K-means and the Gaussian Mixture Model (GMM). Hard clustering: each data point is assigned to a single cluster. Soft clustering: a data point is explained probabilistically by a mixture of multiple Gaussians. K-means assigns each data point to its nearest cluster, while a GMM represents the data by multiple Gaussian densities.

EE462 MLCV 7 Matrix and Vector Derivatives. Matrix and vector derivatives are obtained by first taking element-wise derivatives and then re-assembling them into matrices and vectors.

EE462 MLCV 8 Matrix and Vector Derivatives
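
The identities on this slide were not captured in the transcript; as a stand-in, two standard results of this element-wise procedure, both used later in the lecture, are

\frac{\partial}{\partial \mu} (x - \mu)^T (x - \mu) = -2\,(x - \mu), \qquad
\frac{\partial}{\partial \mu} (x - \mu)^T \Sigma^{-1} (x - \mu) = -2\,\Sigma^{-1} (x - \mu) \quad (\Sigma \text{ symmetric}).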

EE462 MLCV 9 K-means Clustering. Given a data set {x_1, …, x_N} of N observations in a D-dimensional space, our goal is to partition the data set into K clusters or groups. The vectors μ_k, where k = 1, ..., K, represent the k-th cluster, e.g. the centers of the clusters. Binary indicator variables r_nk ∈ {0, 1}, where k = 1, ..., K, are defined for each data point x_n. This is a 1-of-K coding scheme: if x_n is assigned to cluster k, then r_nk = 1 and r_nj = 0 for j ≠ k.

EE462 MLCV 10 The objective function that measures the distortion is

J = \sum_{n=1}^{N} \sum_{k=1}^{K} r_{nk}\, \lVert x_n - \mu_k \rVert^2 .

We ought to find the {r_nk} and {μ_k} that minimise J.

EE462 MLCV 11 Iterative solution. First we choose some initial values for μ_k, then repeat the following two steps until convergence.

Step 1: Minimise J with respect to r_nk, keeping μ_k fixed. J is a linear function of r_nk, so we have a closed-form solution:

r_{nk} = 1 \ \text{if}\ k = \arg\min_j \lVert x_n - \mu_j \rVert^2, \qquad r_{nk} = 0 \ \text{otherwise}.

Step 2: Minimise J with respect to μ_k, keeping r_nk fixed. J is a quadratic function of μ_k, so we set its derivative with respect to μ_k to zero, giving

\mu_k = \frac{\sum_n r_{nk}\, x_n}{\sum_n r_{nk}},

i.e. each μ_k is set to the mean of the points currently assigned to cluster k.
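
For concreteness, a minimal NumPy sketch of this alternating procedure (an illustrative implementation, not the toolbox code referenced later in the lecture):

```python
# Minimal K-means: alternate the assignment step (Step 1) and the mean update (Step 2).
import numpy as np

def kmeans(X, K, n_iters=100, seed=0):
    rng = np.random.default_rng(seed)
    mu = X[rng.choice(len(X), size=K, replace=False)]              # initial cluster centres
    for _ in range(n_iters):
        # Step 1: assign each point to its nearest centre (minimises J over r_nk).
        d2 = ((X[:, None, :] - mu[None, :, :]) ** 2).sum(axis=2)   # N x K squared distances
        labels = d2.argmin(axis=1)
        # Step 2: recompute each centre as the mean of its assigned points.
        new_mu = np.array([X[labels == k].mean(axis=0) if np.any(labels == k) else mu[k]
                           for k in range(K)])
        if np.allclose(new_mu, mu):                                # J no longer decreases
            break
        mu = new_mu
    return mu, labels
```

Different random initialisations of μ_k can lead this sketch to different local minima, which is the point made on slide 13.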

EE462 MLCV 12 (Figure: K-means iterations on 2D data with K = 2, showing the cluster means μ_1, μ_2 and the assignments r_nk.)

EE462 MLCV 13 Each of the two steps reduces J, which provides a convergence proof. However, the algorithm reaches only a local minimum: its result depends on the initial values of μ_k.

EE462 MLCV 14 Generalisation of K-means: use a more generic dissimilarity measure V(x_n, μ_k). The objective function to minimise is

\tilde{J} = \sum_{n=1}^{N} \sum_{k=1}^{K} r_{nk}\, V(x_n, \mu_k), \qquad \text{e.g. } V(x_n, \mu_k) = (x_n - \mu_k)^T \Sigma_k^{-1} (x_n - \mu_k),

where Σ_k denotes the covariance matrix of cluster k. Different choices of Σ_k give different cluster shapes: Σ_k = I gives circles of the same size.

EE462 MLCV 15 Generalisation of K-means: cluster shapes given by different Σ_k.
Σ_k an isotropic matrix: circles of different sizes.
Σ_k a diagonal matrix: axis-aligned ellipses.
Σ_k a full matrix: rotated ellipses.
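
As an aside (not in the original slides), the assignment step under this Mahalanobis-style dissimilarity might be sketched as follows, where sigmas is assumed to be a list of the K covariance matrices Σ_k:

```python
# Sketch: assignment step of generalised K-means with per-cluster covariances Sigma_k.
import numpy as np

def assign_mahalanobis(X, mus, sigmas):
    # Returns, for each point, the k minimising (x - mu_k)^T Sigma_k^{-1} (x - mu_k).
    N, K = len(X), len(mus)
    V = np.empty((N, K))
    for k in range(K):
        diff = X - mus[k]                                          # N x D
        # Solve Sigma_k y = diff^T rather than forming the explicit inverse.
        V[:, k] = np.einsum('nd,nd->n', diff, np.linalg.solve(sigmas[k], diff.T).T)
    return V.argmin(axis=1)
```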

EE462 MLCV 16 Statistical Pattern Recognition Toolbox for Matlab (…rtool/): …\stprtool\probab\cmeans.m

EE462 MLCV 17 Mixture of Gaussians. Denote by z a 1-of-K representation: z_k ∈ {0, 1} and Σ_k z_k = 1. We define the joint distribution p(x, z) by a marginal distribution p(z) over the hidden variable z and a conditional distribution p(x|z) for the observable variable x (the data). See the lecture on probabilistic graphical models.

EE462 MLCV 18 The marginal distribution over z is written in terms of the mixing coefficients π_k,

p(z_k = 1) = \pi_k, \qquad 0 \le \pi_k \le 1, \qquad \sum_{k=1}^{K} \pi_k = 1.

Because z uses the 1-of-K representation, the marginal distribution takes the form

p(z) = \prod_{k=1}^{K} \pi_k^{\,z_k}.

Similarly, the conditional distribution of x given z is

p(x \mid z) = \prod_{k=1}^{K} \mathcal{N}(x \mid \mu_k, \Sigma_k)^{\,z_k}.

EE462 MLCV 19 The marginal distribution of x is

p(x) = \sum_{z} p(z)\, p(x \mid z) = \sum_{k=1}^{K} \pi_k\, \mathcal{N}(x \mid \mu_k, \Sigma_k),

which is a linear superposition of Gaussians.

EE462 MLCV 20 The conditional probability p(z_k = 1 | x), denoted by γ(z_k), is obtained by Bayes' theorem:

\gamma(z_k) \equiv p(z_k = 1 \mid x) = \frac{\pi_k\, \mathcal{N}(x \mid \mu_k, \Sigma_k)}{\sum_{j=1}^{K} \pi_j\, \mathcal{N}(x \mid \mu_j, \Sigma_j)}.

We view π_k as the prior probability of z_k = 1, and γ(z_k) as the corresponding posterior probability once x has been observed. γ(z_k) is the responsibility that component k takes for explaining the observation x.

EE462 MLCV 21 Maximum Likelihood Estimation. Given a data set X = {x_1, …, x_N}, the log of the likelihood function is

\ln p(X \mid \pi, \mu, \Sigma) = \sum_{n=1}^{N} \ln \left\{ \sum_{k=1}^{K} \pi_k\, \mathcal{N}(x_n \mid \mu_k, \Sigma_k) \right\},

to be maximised subject to (s.t.) \sum_{k=1}^{K} \pi_k = 1.

EE462 MLCV 22 Setting the derivative of ln p(X|π, μ, Σ) with respect to μ_k to zero, we obtain

\mu_k = \frac{1}{N_k} \sum_{n=1}^{N} \gamma(z_{nk})\, x_n, \qquad N_k = \sum_{n=1}^{N} \gamma(z_{nk}),

i.e. the mean of the data points weighted by the responsibilities of component k. An analogous calculation for Σ_k gives the responsibility-weighted covariance.

EE462 MLCV 23 Finally, we maximise ln p(X|π, μ, Σ) with respect to the mixing coefficients π_k. Because of the constraint Σ_k π_k = 1, we use a Lagrange multiplier: for an objective function f(x) and a constraint g(x), we solve max f(x) s.t. g(x) = 0 (refer to the Optimisation course). Here this means maximising

\ln p(X \mid \pi, \mu, \Sigma) + \lambda \left( \sum_{k=1}^{K} \pi_k - 1 \right).

EE462 MLCV 24 Setting the derivative with respect to π_k to zero gives

0 = \sum_{n=1}^{N} \frac{\mathcal{N}(x_n \mid \mu_k, \Sigma_k)}{\sum_{j} \pi_j\, \mathcal{N}(x_n \mid \mu_j, \Sigma_j)} + \lambda.

Multiplying both sides by π_k and summing over k, we find λ = -N and

\pi_k = \frac{N_k}{N}.

EE462 MLCV 25 EM (Expectation Maximisation) for Gaussian Mixtures
1. Initialise the means μ_k, covariances Σ_k and mixing coefficients π_k.
2. E step: evaluate the responsibilities γ(z_nk) using the current parameter values (Bayes' theorem, as derived above).
3. M step: re-estimate the parameters using the current responsibilities:

\mu_k^{new} = \frac{1}{N_k} \sum_{n=1}^{N} \gamma(z_{nk})\, x_n, \quad
\Sigma_k^{new} = \frac{1}{N_k} \sum_{n=1}^{N} \gamma(z_{nk})\, (x_n - \mu_k^{new})(x_n - \mu_k^{new})^T, \quad
\pi_k^{new} = \frac{N_k}{N}.

EE462 MLCV 26 EM (Expectation Maximisation) for Gaussian Mixtures (continued)
4. Evaluate the log likelihood and check for convergence of either the parameters or the log likelihood. If the convergence criterion is not satisfied, return to step 2. A code sketch of these four steps is given below.
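
The following is a minimal NumPy sketch of steps 1-4 (an illustrative implementation, not the stprtool code referenced on the next slide); it assumes X is an N x D data array and adds a small regulariser to keep the covariances well conditioned.

```python
# Minimal EM for a Gaussian mixture: E step (responsibilities), M step (parameter updates).
import numpy as np

def log_gauss(X, mu, Sigma):
    # Log density of N(x | mu, Sigma) for each row of X.
    D = X.shape[1]
    diff = X - mu
    _, logdet = np.linalg.slogdet(Sigma)
    maha = np.einsum('nd,nd->n', diff, np.linalg.solve(Sigma, diff.T).T)
    return -0.5 * (D * np.log(2 * np.pi) + logdet + maha)

def em_gmm(X, K, n_iters=100, tol=1e-6, seed=0):
    N, D = X.shape
    rng = np.random.default_rng(seed)
    # 1. Initialise the means, covariances and mixing coefficients.
    mu = X[rng.choice(N, size=K, replace=False)]
    Sigma = np.stack([np.cov(X.T) + 1e-6 * np.eye(D) for _ in range(K)])
    pi = np.full(K, 1.0 / K)
    prev_ll = -np.inf
    for _ in range(n_iters):
        # 2. E step: responsibilities gamma(z_nk) via Bayes' theorem, computed in log space.
        log_p = np.stack([np.log(pi[k]) + log_gauss(X, mu[k], Sigma[k]) for k in range(K)], axis=1)
        log_norm = np.logaddexp.reduce(log_p, axis=1, keepdims=True)
        gamma = np.exp(log_p - log_norm)                           # N x K
        # 3. M step: re-estimate the parameters using the current responsibilities.
        Nk = gamma.sum(axis=0)
        mu = (gamma.T @ X) / Nk[:, None]
        for k in range(K):
            diff = X - mu[k]
            Sigma[k] = (gamma[:, k, None] * diff).T @ diff / Nk[k] + 1e-6 * np.eye(D)
        pi = Nk / N
        # 4. Evaluate the log likelihood and check for convergence.
        ll = float(log_norm.sum())
        if abs(ll - prev_ll) < tol:
            break
        prev_ll = ll
    return pi, mu, Sigma, gamma
```

Working in log space in the E step avoids numerical underflow when the Gaussian densities are very small.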

EE462 MLCV 27 (figure-only slide)

EE462 MLCV 28 Statistical Pattern Recognition Toolbox for Matlab (…rtool/): …\stprtool\visual\pgmm.m, …\stprtool\demos\demo_emgmm.m

EE462 MLCV 29 Information Theory. The amount of information h(x) can be viewed as the degree of surprise on learning the value of x. If we have two events x and y that are unrelated, then h(x, y) = h(x) + h(y). Since p(x, y) = p(x) p(y) for independent events, h(x) must take the form of the logarithm of p(x):

h(x) = -\log_2 p(x),

where the minus sign ensures that the information is positive or zero. See Lecture 7 (Random forest).

EE462 MLCV 30 The average amount of information (called the entropy) is given by

H[x] = -\sum_{x} p(x) \log_2 p(x).

The differential entropy for a multivariate continuous variable x is

H[x] = -\int p(x) \ln p(x)\, dx.
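
As a quick worked example (not in the original slides), the discrete entropy can be computed directly from this definition:

```python
# Entropy of a discrete distribution, H[x] = -sum_x p(x) log2 p(x), measured in bits.
import numpy as np

def entropy(p):
    p = np.asarray(p, dtype=float)
    p = p[p > 0]                        # by convention, 0 log 0 = 0
    return float(-(p * np.log2(p)).sum())

print(entropy([0.5, 0.5]))              # fair coin: 1.0 bit
print(entropy([0.9, 0.1]))              # biased coin: about 0.47 bits (less surprise on average)
```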