1
2018/12/7
Problem Definition
Input:
    $X = \{x_1, x_2, \dots, x_n\}$: a data set in d-dimensional space
    m: number of clusters (we avoid k here so it is not confused with a summation index)
Output:
    Cluster centers: $C = [c_1, c_2, \dots, c_m]$, a $d \times m$ matrix
    Assignment of each $x_i$ to one of the m clusters: $A = [a_{ij}]$, an $n \times m$ matrix with $a_{ij} = 1$ if $x_i$ belongs to cluster j and $a_{ij} = 0$ otherwise
Requirement:
    The output should minimize the objective function…
2
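Objective Function
Objective function (distortion):
    $J(X; C, A) = \sum_{i=1}^{n} \sum_{j=1}^{m} a_{ij} \|x_i - c_j\|^2$
    where $c_j$ is the center of cluster j, and $a_{ij} = 1$ if $x_i$ is assigned to cluster j and 0 otherwise
Tunable parameters:
    d*m (for matrix C) plus n*m (for matrix A), with constraints on matrix A: $a_{ij} \in \{0, 1\}$ and $\sum_{j=1}^{m} a_{ij} = 1$ for every i
Complexity:
    Finding the exact solution is an NP-hard problem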
3
Task 1: How to Find Assignment in A?
Goal:
    Find A to minimize J(X; C, A) with C fixed
Facts:
    A closed-form solution exists: assign each $x_i$ to its nearest center, that is, $a_{ij} = 1$ if $j = \arg\min_{q} \|x_i - c_q\|^2$ and $a_{ij} = 0$ otherwise
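The assignment step can be sketched in a few lines of Python. This is an illustrative version, not the toolbox implementation: the names `sq_dist` and `assign` are made up for this example, and ties between equally near centers are broken in favor of the lowest index, which is one arbitrary but common choice.

```python
def sq_dist(x, c):
    """Squared Euclidean distance between two points given as sequences."""
    return sum((xi - ci) ** 2 for xi, ci in zip(x, c))

def assign(points, centers):
    """Return, for each point, the index of its nearest center (lowest index wins ties)."""
    return [min(range(len(centers)), key=lambda j: sq_dist(x, centers[j]))
            for x in points]

points = [(0.0, 0.0), (1.0, 0.0), (9.0, 0.0), (10.0, 0.0)]
centers = [(0.0, 0.0), (10.0, 0.0)]
labels = assign(points, centers)
```

Because each point's term in J depends only on that point's own assignment, choosing the nearest center point by point minimizes J for the fixed C.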
4
Task 2: How to Find Centers in C?
Goal:
    Find C to minimize J(X; C, A) with A fixed
Facts:
    A closed-form solution exists: each center is the mean of the points assigned to it, $c_j = \frac{\sum_{i=1}^{n} a_{ij} x_i}{\sum_{i=1}^{n} a_{ij}}$
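The center update can be sketched as follows in Python. The function name `update_centers` is illustrative. The slides do not say what to do when a cluster ends up with no points, so this sketch assumes the old center is simply kept in that case.

```python
def update_centers(points, labels, old_centers):
    """Recompute each center as the mean of the points assigned to it."""
    m = len(old_centers)
    dim = len(points[0])
    sums = [[0.0] * dim for _ in range(m)]
    counts = [0] * m
    for x, j in zip(points, labels):
        counts[j] += 1
        for t in range(dim):
            sums[j][t] += x[t]
    new_centers = []
    for j in range(m):
        if counts[j] == 0:
            # Assumption: an empty cluster keeps its previous center
            new_centers.append(tuple(old_centers[j]))
        else:
            new_centers.append(tuple(s / counts[j] for s in sums[j]))
    return new_centers

points = [(0.0, 0.0), (2.0, 0.0), (10.0, 4.0)]
labels = [0, 0, 1]
old_centers = [(5.0, 5.0), (6.0, 6.0), (7.0, 7.0)]
centers = update_centers(points, labels, old_centers)
```

The mean is the minimizer because, for a fixed set of points, the sum of squared distances to a single point c is smallest when c is their average.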
5
Algorithm
1. Initialize
    Select m initial cluster centers
2. Find clusters
    For each x_i, assign the cluster with the nearest center
    (Find A to minimize J(X; C, A) with C fixed)
3. Find centers
    Recompute each cluster center as the mean of the data in the cluster
    (Find C to minimize J(X; C, A) with A fixed)
4. Stopping criterion
    Stop if the clusters stay the same. Otherwise go to step 2.
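The four steps can be put together in a short Python sketch. It is a minimal illustration, not the toolbox's `kMeansClustering.m`: the function name `kmeans`, the `max_iter` safety cap, and the choice to pass the initial centers in explicitly are all assumptions made for this example.

```python
def sq_dist(x, c):
    """Squared Euclidean distance between two points."""
    return sum((xi - ci) ** 2 for xi, ci in zip(x, c))

def kmeans(points, init_centers, max_iter=100):
    """Alternate the assignment and center-update steps until labels stop changing."""
    centers = [tuple(c) for c in init_centers]        # step 1: initial centers
    labels = None
    for _ in range(max_iter):
        # Step 2: assign each point to its nearest center
        new_labels = [min(range(len(centers)), key=lambda j: sq_dist(x, centers[j]))
                      for x in points]
        # Step 4: stop when the clusters stay the same
        if new_labels == labels:
            break
        labels = new_labels
        # Step 3: recompute each center as the mean of its cluster
        for j in range(len(centers)):
            members = [x for x, lab in zip(points, labels) if lab == j]
            if members:  # assumption: an empty cluster keeps its old center
                centers[j] = tuple(sum(col) / len(members) for col in zip(*members))
    return centers, labels

points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
          (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
centers, labels = kmeans(points, init_centers=[points[0], points[3]])
```

Here the two initial centers are taken from different groups of points, so the first assignment already separates the groups and the second pass confirms that nothing changes.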
6
Stopping Criteria
Two stopping criteria:
    Repeat until there is no more change in cluster assignment
    Repeat until the distortion improvement is less than a threshold
Facts:
    Convergence is assured since J never increases from one iteration to the next and is bounded below by zero
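The convergence claim follows directly from Task 1 and Task 2. As a short sketch, write $C^{(t)}$ and $A^{(t)}$ for the centers and assignments after iteration t (this superscript notation is introduced here for illustration and does not appear in the slides):

```latex
\begin{align*}
J\bigl(X; C^{(t)}, A^{(t+1)}\bigr) &\le J\bigl(X; C^{(t)}, A^{(t)}\bigr)
  && \text{(Task 1: best } A \text{ for fixed } C^{(t)}\text{)} \\
J\bigl(X; C^{(t+1)}, A^{(t+1)}\bigr) &\le J\bigl(X; C^{(t)}, A^{(t+1)}\bigr)
  && \text{(Task 2: best } C \text{ for fixed } A^{(t+1)}\text{)} \\
\Rightarrow\quad 0 \le J\bigl(X; C^{(t+1)}, A^{(t+1)}\bigr) &\le J\bigl(X; C^{(t)}, A^{(t)}\bigr)
\end{align*}
```

A non-increasing sequence that is bounded below converges. In addition, there are only finitely many ways to assign n points to m clusters, so the assignment-based criterion is reached after a finite number of iterations provided ties are broken consistently.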
7
Properties of K-means Clustering
    K-means can find an approximate solution efficiently.
    The distortion (squared error) is a monotonically non-increasing function of the iteration count.
    The goal is to minimize the squared error, but the algorithm can end up in a local minimum.
    To increase the probability of finding the global minimum, run k-means several times with different initial conditions.
    "Cluster validation" refers to a set of methods that try to determine the best value of k (the number of clusters, written m in these slides).
    Other distance measures can be used in place of the Euclidean distance, with a corresponding change in how the centers are computed.
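The advice about trying different initial conditions can be sketched in Python as a restart loop that keeps the run with the lowest distortion. Everything here is illustrative: the names `kmeans` and `best_of_restarts`, the default of 10 restarts, and the choice of initial centers as m distinct data points drawn at random are assumptions, not details taken from the slides.

```python
import random

def sq_dist(x, c):
    """Squared Euclidean distance between two points."""
    return sum((xi - ci) ** 2 for xi, ci in zip(x, c))

def kmeans(points, init_centers, max_iter=100):
    """Plain k-means; returns (centers, labels, distortion)."""
    centers = [tuple(c) for c in init_centers]
    labels = None
    for _ in range(max_iter):
        new_labels = [min(range(len(centers)), key=lambda j: sq_dist(x, centers[j]))
                      for x in points]
        if new_labels == labels:
            break
        labels = new_labels
        for j in range(len(centers)):
            members = [x for x, lab in zip(points, labels) if lab == j]
            if members:  # assumption: an empty cluster keeps its old center
                centers[j] = tuple(sum(col) / len(members) for col in zip(*members))
    J = sum(sq_dist(x, centers[lab]) for x, lab in zip(points, labels))
    return centers, labels, J

def best_of_restarts(points, m, restarts=10, seed=0):
    """Run k-means from several random initializations and keep the lowest-distortion run."""
    rng = random.Random(seed)
    runs = [kmeans(points, rng.sample(points, m)) for _ in range(restarts)]
    return min(runs, key=lambda run: run[2]), [run[2] for run in runs]

points = [(0.0, 0.0), (0.0, 1.0), (5.0, 0.0), (5.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
best, all_J = best_of_restarts(points, m=3)
```

Comparing the values in `all_J` shows how much the final distortion depends on the starting centers. Restarts raise the chance of reaching the global minimum but do not guarantee it.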
8
K-means Snapshots
9
Demo of K-means Clustering
Toolbox download:
    Utility Toolbox
    Machine Learning Toolbox
Demos:
    kMeansClustering.m
    vecQuantize.m
10
Demos of K-means Clustering
    kMeansClustering.m
11
Application: Image Compression
Goal:
    Convert an image from true colors to indexed colors with minimum distortion.
Steps:
    Collect data from a true-color image
    Perform k-means clustering to obtain cluster centers as the indexed colors
    Compute the compression rate
12
Example: Image Compression
Original image
    Dimension: 480x640
    Data size: 480*640*3*8 bits = 7,372,800 bits, about 0.92 MB
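The slides do not spell out how the compression rate is computed, so the Python sketch below is only one plausible accounting. It assumes an indexed image stores ceil(log2(k)) bits per pixel for the color index plus a palette of k colors at 3*8 bits each, and it defines the compression ratio as original size divided by indexed size. The function name `indexed_size_bits` is made up for this example.

```python
def indexed_size_bits(height, width, k, channels=3, bits_per_channel=8):
    """Bits needed for an indexed image with k palette colors (assumed storage model)."""
    bits_per_index = (k - 1).bit_length()          # equals ceil(log2(k)) for k >= 1
    index_bits = height * width * bits_per_index   # one palette index per pixel
    palette_bits = k * channels * bits_per_channel # the palette itself
    return index_bits + palette_bits

original_bits = 480 * 640 * 3 * 8  # true-color image: 3 channels at 8 bits each

# Compression ratio for k = 2, 4, ..., 64 indexed colors
ratios = {2 ** i: original_bits / indexed_size_bits(480, 640, 2 ** i)
          for i in range(1, 7)}
```

Under this model the palette is negligible next to the per-pixel indices, so the ratio is close to 24 / log2(k), for example just under 4 for 64 colors.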
13
Example: Image Compression
14
Example: Image Compression
15
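Code
    X = imread('annie19980405.jpg');        % true-color image of size m x n x p
    image(X)
    [m, n, p] = size(X);
    % Linear indices that gather the p channel values of each pixel into one column
    index = (1:m*n:m*n*p)'*ones(1, m*n) + ones(p, 1)*(0:m*n-1);
    data = double(X(index));                % p x (m*n) data matrix, one pixel per column
    maxI = 6;
    for i = 1:maxI
        centerNum = 2^i;                    % number of indexed colors: 2, 4, ..., 64
        fprintf('i=%d/%d: no. of centers=%d\n', i, maxI, centerNum);
        center = kMeansClustering(data, centerNum);   % cluster centers, one per column
        distMat = distPairwise(center, data);         % distances from centers to pixels
        [minValue, minIndex] = min(distMat);          % nearest center for each pixel
        X2 = reshape(minIndex, m, n);                 % indexed image
        map = center'/255;                            % colormap scaled to [0, 1]
        figure; image(X2); colormap(map); colorbar; axis image;
    end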