EM Algorithm and Its Applications (slides adapted from Yi Li, WU)
Outline: Introduction of EM; K-Means; EM; EM Applications; Image Segmentation using EM in Clustering Algorithms
Clustering: How can we separate the data into different classes in an unsupervised way?
Example
Clustering: There are K clusters C1, …, CK with means m1, …, mK. The least-squares error is defined as
D = \sum_{k=1}^{K} \sum_{x_i \in C_k} \| x_i - m_k \|^2 .
Out of all possible partitions into K clusters, choose the one that minimizes D. Why don't we just do this? If we could, would we get meaningful objects?
Color Clustering by K-Means Algorithm. Form K-means clusters from a set of n-dimensional vectors:
1. Set ic (iteration count) to 1.
2. Choose randomly a set of K means m1(1), …, mK(1).
3. For each vector xi, compute the distance D(xi, mk(ic)) for k = 1, …, K and assign xi to the cluster Cj with the nearest mean.
4. Increment ic by 1 and update the means to get m1(ic), …, mK(ic).
5. Repeat steps 3 and 4 until the cluster memberships Ck no longer change.
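A minimal sketch of this loop, assuming NumPy and pixel feature vectors stacked row-wise in an array X; the function name kmeans is just illustrative:

```python
import numpy as np

def kmeans(X, K, max_iter=100, seed=0):
    """Minimal K-means on the rows of X (n samples x d features)."""
    rng = np.random.default_rng(seed)
    # Step 2: choose K initial means at random from the data points.
    means = X[rng.choice(len(X), size=K, replace=False)].astype(float)
    labels = None
    for _ in range(max_iter):
        # Step 3: assign each vector to the cluster with the nearest mean.
        dists = np.linalg.norm(X[:, None, :] - means[None, :, :], axis=2)
        new_labels = dists.argmin(axis=1)
        # Step 5: stop when cluster memberships no longer change.
        if labels is not None and np.array_equal(new_labels, labels):
            break
        labels = new_labels
        # Step 4: update each mean to the average of its assigned vectors.
        for k in range(K):
            if np.any(labels == k):
                means[k] = X[labels == k].mean(axis=0)
    return labels, means
```

For color clustering, X would be the image's pixel colors reshaped to (num_pixels, 3).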
K-Means Classifier (diagram): the input feature vectors x1 = {r1, g1, b1}, x2 = {r2, g2, b2}, …, xi = {ri, gi, bi} are fed to the K-means classifier, which produces the classification results x1 → C(x1), x2 → C(x2), …, xi → C(xi) and the cluster parameters m1 for C1, m2 for C2, …, mk for Ck.
K-Means Classifier (Cont.) (diagram): the inputs x1 = {r1, g1, b1}, …, xi = {ri, gi, bi} are known; the classification results C(x1), …, C(xi) and the cluster parameters m1, …, mk are unknown and must be estimated.
K-Means Iteration (diagram): starting from an initial guess of the cluster parameters m1, …, mk and the known inputs x1, …, xi, the algorithm alternates between producing classification results C(x1), …, C(xi) and re-estimating the cluster parameters, yielding classification results (1), cluster parameters (1), classification results (2), cluster parameters (2), …, classification results (ic), cluster parameters (ic).
K-Means (Cont.). Boot step: initialize K clusters C1, …, CK, each cluster represented by its mean mj. Iteration step: estimate the cluster of each data point, xi → C(xi), then re-estimate the cluster parameters.
K-Means Example
Image Segmentation by K-Means:
1. Select a value of K.
2. Select a feature vector for every pixel (color, texture, position, or a combination of these).
3. Define a similarity measure between feature vectors (usually Euclidean distance).
4. Apply the K-means algorithm.
5. Apply a connected-components algorithm.
6. Merge any component smaller than some threshold into the adjacent component that is most similar to it.
* From Marc Pollefeys COMP 256 2003
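A rough sketch of steps 2-5 of this pipeline, assuming the kmeans function from the earlier sketch, color-only features, and SciPy for connected components; merging of small components (step 6) is omitted:

```python
import numpy as np
from scipy import ndimage

def segment_image(img, K=4):
    """Cluster pixel colors with K-means, then find connected components per cluster."""
    h, w, c = img.shape
    X = img.reshape(-1, c).astype(float)         # one feature vector per pixel (color only)
    labels, _ = kmeans(X, K)                      # kmeans() from the earlier sketch
    label_img = labels.reshape(h, w)
    # Connected components: touching pixels in the same cluster form one region.
    regions = np.zeros((h, w), dtype=int)
    next_id = 0
    for k in range(K):
        comp, n = ndimage.label(label_img == k)   # label components of cluster k
        regions[comp > 0] = comp[comp > 0] + next_id
        next_id += n
    return label_img, regions
```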
Results of K-Means Clustering: I gave each pixel the mean intensity or mean color of its cluster; this is basically just vector-quantizing the image intensities/colors. Notice that there is no requirement that clusters be spatially localized, and they are not. (Figure: image, clusters on intensity, clusters on color; K-means clustering using intensity alone and color alone.) * From Marc Pollefeys COMP 256 2003
K-Means: Challenges. It will converge, but not necessarily to the global minimum of the objective function. Variations: search for an appropriate number of clusters by applying k-means with different values of K and comparing the results.
K-Means Variants: different ways to initialize the means; different stopping criteria; dynamic methods for determining the right number of clusters (K) for a given image; the EM algorithm, a probabilistic formulation of K-means.
EM Algorithm: like K-means, but with soft assignment (vs. hard assignment). Each point is assigned partly to all clusters, based on the probability that it belongs to each. Weighted averages (centers and covariances) are then computed.
K-Means vs. EM:
Cluster representation: K-means uses the mean; EM uses the mean, variance, and weight.
Cluster initialization: K-means randomly selects K means; EM initializes K Gaussian distributions.
Expectation: K-means assigns each point to the closest mean; EM soft-assigns each point to each distribution.
Maximization: K-means computes the means of the current clusters; EM computes new parameters of each distribution.
Notation: Normal distribution, 1D case. N(\mu, \sigma) is a 1D normal (Gaussian) distribution with mean \mu and standard deviation \sigma (so the variance is \sigma^2).
Normal distribution: multi-dimensional case. N(\mu, \Sigma) is a multivariate Gaussian distribution with mean \mu and covariance matrix \Sigma. What is a covariance matrix? For RGB features,
\Sigma = \begin{pmatrix} \sigma_R^2 & \sigma_{RG} & \sigma_{RB} \\ \sigma_{GR} & \sigma_G^2 & \sigma_{GB} \\ \sigma_{BR} & \sigma_{BG} & \sigma_B^2 \end{pmatrix},
where variance(X): \sigma_X^2 = \frac{1}{N}\sum_i (x_i - \mu)^2 and cov(X, Y) = \frac{1}{N}\sum_i (x_i - \mu_X)(y_i - \mu_Y).
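A quick check of these definitions with NumPy, assuming an array of RGB pixel values (the random data here is just a placeholder):

```python
import numpy as np

pixels = np.random.rand(1000, 3)            # placeholder RGB feature vectors, one row per pixel
mu = pixels.mean(axis=0)                    # mean vector (mu_R, mu_G, mu_B)
# Covariance matrix as defined above (dividing by N, hence bias=True).
sigma = np.cov(pixels, rowvar=False, bias=True)
# Equivalent explicit computation: (1/N) * sum of (x_i - mu)(x_i - mu)^T
sigma_manual = (pixels - mu).T @ (pixels - mu) / len(pixels)
assert np.allclose(sigma, sigma_manual)
```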
K-Means -> EM with GMM: The Intuition (1). Instead of making a "hard" decision about which class a pixel belongs to, we use probability theory and assign pixels to classes probabilistically. Using Bayes' rule,
P(C_j \mid x) = \frac{P(x \mid C_j)\, P(C_j)}{P(x)},
where P(C_j | x) is the posterior, P(x | C_j) is the likelihood, and P(C_j) is the prior. We want to maximize the posterior.
K-Means -> EM: The Intuition (2). We model the likelihood P(x | C_j) as a Gaussian (normal) distribution.
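The slide's formula is not preserved in the extracted text; the standard multivariate Gaussian density it refers to is:

```latex
P(x \mid C_j) = \mathcal{N}(x \mid \mu_j, \Sigma_j)
  = \frac{1}{(2\pi)^{d/2} |\Sigma_j|^{1/2}}
    \exp\!\left( -\tfrac{1}{2} (x - \mu_j)^\top \Sigma_j^{-1} (x - \mu_j) \right)
```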
K-Means -> EM: The Intuition (3). Instead of modeling the color distribution with one Gaussian, we model it as a weighted sum of Gaussians. We want to find the parameters that maximize the likelihood of the observed data. We use the general trick of taking the logarithm of the probability function and maximizing that; this is called Maximum Likelihood Estimation (MLE). We solve the maximization in two steps (E and M).
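In the notation introduced on the "Symbols" slide below (weights \alpha_j, parameters \mu_j, \Sigma_j), the weighted sum of Gaussians and the log-likelihood being maximized are:

```latex
P(x \mid \Theta) = \sum_{j=1}^{K} \alpha_j \, \mathcal{N}(x \mid \mu_j, \Sigma_j),
\qquad
\log L(\Theta) = \sum_{i=1}^{N} \log \sum_{j=1}^{K} \alpha_j \, \mathcal{N}(x_i \mid \mu_j, \Sigma_j)
```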
K-Means -> EM with Gaussian Mixture Models (GMM). Boot step: initialize K clusters C1, …, CK. Iteration step: estimate the (soft) cluster assignment of each data point (Expectation), then re-estimate the cluster parameters (\mu_j, \Sigma_j) and P(C_j) for each cluster j (Maximization).
EM Classifier (diagram): the input feature vectors x1 = {r1, g1, b1}, x2 = {r2, g2, b2}, …, xi = {ri, gi, bi} are fed to the EM classifier, which produces the classification results p(C1|x1), p(Cj|x2), …, p(Cj|xi) and the cluster parameters (\mu_1, \Sigma_1), p(C1) for C1; (\mu_2, \Sigma_2), p(C2) for C2; …; (\mu_k, \Sigma_k), p(Ck) for Ck.
EM Classifier (Cont.) (diagram): the inputs x1 = {r1, g1, b1}, …, xi = {ri, gi, bi} are known; the cluster parameters (\mu_j, \Sigma_j), p(Cj) and the classification results p(Cj|xi) are unknown and must be estimated.
Expectation Step (diagram): the known inputs x1, …, xi plus the current estimates of the cluster parameters (\mu_j, \Sigma_j), p(Cj) produce the classification results p(C1|x1), p(Cj|x2), …, p(Cj|xi) as output.
Maximization Step (diagram): the known inputs x1, …, xi plus the current classification results p(C1|x1), p(Cj|x2), …, p(Cj|xi) produce new estimates of the cluster parameters (\mu_j, \Sigma_j), p(Cj) as output.
EM Algorithm. Boot step: initialize K clusters C1, …, CK, with (\mu_j, \Sigma_j) and P(C_j) for each cluster j. Iteration step: Expectation step, then Maximization step; repeat until convergence.
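A minimal sketch of this iteration for a Gaussian mixture, assuming NumPy and SciPy; the function name em_gmm, the random initialization, and the small ridge added to the covariances are illustrative choices, not part of the original slides:

```python
import numpy as np
from scipy.stats import multivariate_normal

def em_gmm(X, K, n_iter=50, seed=0):
    """Minimal EM for a Gaussian mixture on the rows of X (n samples x d features)."""
    n, d = X.shape
    rng = np.random.default_rng(seed)
    # Boot step: random means, identity covariances, uniform weights.
    mu = X[rng.choice(n, size=K, replace=False)].astype(float)
    sigma = np.array([np.eye(d) for _ in range(K)])
    alpha = np.full(K, 1.0 / K)
    for _ in range(n_iter):
        # E-step: responsibilities p(C_j | x_i) for every point and cluster.
        resp = np.zeros((n, K))
        for j in range(K):
            resp[:, j] = alpha[j] * multivariate_normal.pdf(X, mean=mu[j], cov=sigma[j])
        resp /= resp.sum(axis=1, keepdims=True)
        # M-step: re-estimate weights, means, and covariances from the responsibilities.
        Nj = resp.sum(axis=0)
        alpha = Nj / n
        for j in range(K):
            mu[j] = (resp[:, j, None] * X).sum(axis=0) / Nj[j]
            diff = X - mu[j]
            sigma[j] = (resp[:, j, None, None] *
                        np.einsum('ni,nj->nij', diff, diff)).sum(axis=0) / Nj[j]
            sigma[j] += 1e-6 * np.eye(d)   # small ridge to keep covariances invertible
    return alpha, mu, sigma, resp
```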
EM Demo: video of image classification with EM, and an online tutorial.
EM Applications Blobworld: Image Segmentation Using Expectation-Maximization and its Application to Image Querying
Image Segmentation using EM. Step 1: feature extraction. Step 2: image segmentation using EM. In the literature, EM is often used to segment the image into multiple regions, where each region usually corresponds to one Gaussian model. In our project we segment one foreground region from the background: in training, we crop the foreground pixels and train a Gaussian Mixture Model on those pixels; in testing, we evaluate the conditional probability of each pixel under that model and threshold on that value (a sketch follows).
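A sketch of this foreground/background scheme, assuming scikit-learn's GaussianMixture; the function names and the threshold value are illustrative and would need tuning:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def train_foreground_gmm(fg_pixels, K=5):
    """Fit a K-component GMM to the cropped foreground pixel features (n x d array)."""
    return GaussianMixture(n_components=K, covariance_type='full').fit(fg_pixels)

def segment(img, gmm, log_thresh=-10.0):
    """Label a pixel as foreground if its log-likelihood under the GMM exceeds a threshold."""
    h, w, c = img.shape
    X = img.reshape(-1, c).astype(float)
    log_lik = gmm.score_samples(X)            # per-pixel log p(x | foreground model)
    return (log_lik > log_thresh).reshape(h, w)
```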
Symbols. The feature vector for pixel i is called x_i. There will be K Gaussian models; K is given. The j-th model is a Gaussian distribution with parameters \theta_j = (\mu_j, \Sigma_j). The \alpha_j are the weights (which sum to 1) of the Gaussians. \Theta is the collection of parameters: \Theta = (\alpha_1, …, \alpha_K, \theta_1, …, \theta_K).
Initialization. Each of the K Gaussians has parameters \theta_j = (\mu_j, \Sigma_j), where \mu_j is the mean of the j-th Gaussian and \Sigma_j is its covariance matrix. The covariance matrices are initialized to the identity matrix. The means can be initialized to the average feature vectors in each of K windows in the image; this is a data-driven initialization.
E-Step: compute the posterior probabilities p(C_j | x_i, \Theta) for each pixel and cluster (referred to as responsibilities).
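The slide's equation is not preserved in the extracted text; in the notation above, the standard GMM E-step it describes computes

```latex
p(C_j \mid x_i, \Theta^{(t)}) \;=\;
\frac{\alpha_j^{(t)}\, \mathcal{N}\!\big(x_i \mid \mu_j^{(t)}, \Sigma_j^{(t)}\big)}
     {\sum_{l=1}^{K} \alpha_l^{(t)}\, \mathcal{N}\!\big(x_i \mid \mu_l^{(t)}, \Sigma_l^{(t)}\big)}
```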
M-Step
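Again the slide's equations are not preserved; the standard GMM M-step updates, written with the responsibilities from the E-step, are

```latex
\alpha_j^{(t+1)} = \frac{1}{N} \sum_{i=1}^{N} p(C_j \mid x_i, \Theta^{(t)}), \qquad
\mu_j^{(t+1)} = \frac{\sum_i p(C_j \mid x_i, \Theta^{(t)})\, x_i}{\sum_i p(C_j \mid x_i, \Theta^{(t)})}, \qquad
\Sigma_j^{(t+1)} = \frac{\sum_i p(C_j \mid x_i, \Theta^{(t)})\,(x_i - \mu_j^{(t+1)})(x_i - \mu_j^{(t+1)})^\top}{\sum_i p(C_j \mid x_i, \Theta^{(t)})}
```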
Sample Results for Scene Segmentation
Finding good feature vectors
Fitting Gaussians in RGB space
Fitting Gaussians in other spaces: experiment with other color spaces; experiment with dimensionality reduction (PCA).
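A small sketch of the PCA option, assuming scikit-learn; the choice of three components and the name X for the pixel feature matrix are placeholders:

```python
from sklearn.decomposition import PCA

# Reduce d-dimensional pixel features (e.g. color plus texture) to a few principal
# components before fitting the Gaussians; n_components=3 is an arbitrary choice here.
pca = PCA(n_components=3)
X_reduced = pca.fit_transform(X)   # X: (num_pixels, d) feature matrix from earlier steps
```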