Clustering Methods: Part 2d – Swap-based algorithms
Pasi Fränti
Speech & Image Processing Unit, School of Computing, University of Eastern Finland, Joensuu, FINLAND
Part I: Random Swap algorithm
P. Fränti and J. Kivijärvi, "Randomised local search algorithm for the clustering problem", Pattern Analysis and Applications, 3 (4), 2000.
Pseudo code of Random Swap
Demonstration of the algorithm
Centroid swap
Local repartition
Fine-tuning by K-means 1st iteration
Fine-tuning by K-means 2nd iteration
Fine-tuning by K-means 3rd iteration
Fine-tuning by K-means 16th iteration
Fine-tuning by K-means 17th iteration
Fine-tuning by K-means 18th iteration
Fine-tuning by K-means 19th iteration
Fine-tuning by K-means Final result after 25 iterations
Implementation of the swap
1. Random swap: replace a randomly chosen centroid with a randomly chosen data vector.
2. Re-partition vectors from old cluster: reassign the vectors of the removed cluster to their nearest centroids.
3. Create new cluster: attract to the new centroid the vectors that are closer to it than to their current centroid.
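A minimal NumPy sketch of one swap step as described above (the function and variable names are illustrative, not taken from the original implementation):

```python
import numpy as np

def random_swap_step(X, C, P, rng):
    """One swap step (a sketch): replace a random centroid by a random
    data vector, then repair the partition locally."""
    N, M = len(X), len(C)
    C, P = C.copy(), P.copy()
    j = rng.integers(M)          # cluster whose centroid is removed
    C[j] = X[rng.integers(N)]    # new centroid location = random data vector

    # Re-partition vectors of the old cluster j to their nearest centroids.
    orphans = np.flatnonzero(P == j)
    if orphans.size:
        d = np.linalg.norm(X[orphans, None, :] - C[None, :, :], axis=2)
        P[orphans] = d.argmin(axis=1)

    # Create the new cluster: attract vectors that are now closer to the
    # new centroid than to their current one.
    d_new = np.linalg.norm(X - C[j], axis=1)
    d_cur = np.linalg.norm(X - C[P], axis=1)
    P[d_new < d_cur] = j
    return C, P
```

In the full algorithm, a couple of K-means iterations would then fine-tune C and P, and the new solution is kept only if the objective value (e.g. MSE) improves.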
Random swap as local search Study neighbor solutions
Random swap as local search: select one neighbor and move to it
Role of K-means: fine-tune the solution by hill climbing!
Role of K-means: consider only local optima!
Role of swap: reduce the effective search space
Chain reaction by K-means after swap
Independence of initialization: results for T = 5000 iterations (initial, worst and best runs shown)
Part II: Efficiency of Random Swap
Probability of good swap
Select a proper centroid for removal:
– There are M clusters in total: p_removal = 1/M.
Select a proper new location:
– There are N choices: p_add = 1/N
– Only M are significantly different: p_add = 1/M
In total:
– M² significantly different swaps.
– Probability of each different swap is p_swap = 1/M².
– Open question: how many of these are good?
Number of neighbors
Open question: what is the size of the neighborhood (α)?
– Voronoi neighbors
– Neighbors by distance
Observed number of neighbors: data set S2
Average number of neighbors
Expected number of iterations
– Probability of not finding a good swap
– Estimated number of iterations
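The formulas themselves did not survive extraction; a sketch of the standard estimates, assuming a good swap succeeds with probability p ≈ α²/M² in each iteration (α = number of cluster neighbors, M = number of clusters):

```latex
q = \Pr[\text{no good swap in } T \text{ iterations}] = (1-p)^{T},
\qquad
T(q) = \frac{\ln q}{\ln(1-p)} \approx \frac{M^{2}}{\alpha^{2}}\,\ln\frac{1}{q}.
```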
Estimated number of iterations depending on T (data sets S1, S2, S3, S4)
Observed = number of iterations needed in practice.
Estimated = estimate of the number of iterations needed for given q.
Probability of success (p) depending on T
Probability of failure (q) depending on T
Observed probabilities depending on dimensionality
Bounds for the number of iterations
– Upper limit
– Lower limit, derived similarly
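The bounds are likewise missing from the slide; one standard derivation, using p ≤ −ln(1−p) ≤ p/(1−p), gives the following (constants may differ from the original):

```latex
\frac{(1-p)\,\ln(1/q)}{p} \;\le\; T(q) \;\le\; \frac{\ln(1/q)}{p}
\quad\Longrightarrow\quad
T(q) = O\!\left(\frac{M^{2}}{\alpha^{2}}\,\ln\frac{1}{q}\right).
```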
Multiple swaps (w)
– Probability of performing fewer than w swaps
– Expected number of iterations
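Again the slide's formulas are missing; a sketch assuming independent iterations with success probability p ≈ α²/M²:

```latex
\Pr[\text{fewer than } w \text{ good swaps in } T \text{ iterations}]
  = \sum_{i=0}^{w-1}\binom{T}{i}\, p^{i}(1-p)^{T-i},
\qquad
E[T] \approx \frac{w}{p} \approx w\,\frac{M^{2}}{\alpha^{2}}.
```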
Number of swaps needed Example from image quantization
Efficiency of the random swap
Total time to find correct clustering:
– Time per iteration × number of iterations
Time complexity of a single step:
– Swap: O(1)
– Remove cluster: 2M · N/M = O(N)
– Add cluster: 2N = O(N)
– Centroids: 2 · (2N/M) + 2 + 2 = O(N/M)
– (Fast) K-means iteration: 4αN = O(αN) *
* See Fast K-means for analysis.
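Summing the per-step costs listed above gives the cost of one iteration:

```latex
t = O(1) + O(N) + O(N) + O(N/M) + O(\alpha N) = O(\alpha N).
```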
Time complexity and the observed number of steps
Time spent by K-means iterations
Effect of K-means iterations
Total time complexity
– Time complexity of a single step (t): t = O(αN)
– Number of iterations needed (T)
– Total time: T · t
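The total-time formula is not reproduced on the slide; combining the estimates above gives the following, consistent with the conclusions on the next slide:

```latex
\text{Total time} = T(q)\cdot t
  \approx \frac{M^{2}}{\alpha^{2}}\,\ln\!\frac{1}{q}\cdot O(\alpha N)
  = O\!\left(\frac{N M^{2}}{\alpha}\,\ln\frac{1}{q}\right).
```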
Time complexity: conclusions
1. Logarithmic dependency on q
2. Linear dependency on N
3. Quadratic dependency on M (with a large number of clusters it can be too slow)
4. Inverse dependency on α (worst case α = 2) (the higher the dimensionality and the cluster overlap, the faster the method)
Time-distortion performance
References
Random swap algorithm:
– P. Fränti and J. Kivijärvi, "Randomised local search algorithm for the clustering problem", Pattern Analysis and Applications, 3 (4), 2000.
– P. Fränti, J. Kivijärvi and O. Nevalainen, "Tabu search algorithm for codebook generation in VQ", Pattern Recognition, 31 (8), 1139-1148, August 1998.
Efficiency of Random swap algorithm:
– P. Fränti, O. Virmajoki and V. Hautamäki, "Efficiency of random swap based clustering", IAPR Int. Conf. on Pattern Recognition (ICPR'08), Tampa, FL, Dec 2008.
Part III: Example when 4 swaps needed
1st swap: MSE = 4.2 × 10^9 → 3.4 × 10^9
2nd swap: MSE = 3.1 × 10^9 → 3.0 × 10^9
3rd swap: MSE = 2.3 × 10^9 → 2.1 × 10^9
4th swap: MSE = 1.9 × 10^9 → 1.7 × 10^9
Final result: MSE = 1.3 × 10^9
Part IV: Deterministic Swap
Deterministic swap Costs for the swap: From where to where?
Cluster removal
– Merge two existing clusters [Frigui 1997, Kaukoranta 1998], following the spirit of agglomerative clustering.
– Local optimization: remove the prototype that increases the cost function value least [Fritzke 1997, Likas 2003, Fränti 2006].
– Smart swap: find the two nearest prototypes and remove one of them randomly [Chen 2010].
– Pairwise swap: locate a pair of inconsistent prototypes in two solutions [Zhao 2012].
Cluster addition
1. Select an existing cluster
– Depending on strategy: 1..M choices.
– Each choice takes O(N) time to test.
2. Select a location within this cluster
– Add a new prototype
– Consider only existing points
Select the cluster
Cluster with the biggest MSE:
– Intuitive heuristic [Fritzke 1997, Chen 2010]
– Computationally demanding
Local optimization:
– Try all clusters for the addition [Likas et al. 2003]
– Computationally demanding: O(NM)-O(N²)
Select the location
1. Current prototype + ε [Fritzke 1997]
2. Furthest vector [Fränti et al. 1997]
3. Any other split heuristic [Fränti et al. 1997]
4. Random location
5. Every possible location [Likas et al. 2003]
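As an illustration of how these removal and addition heuristics can be combined, here is a minimal NumPy sketch of one deterministic swap, assuming the nearest-pair removal of smart swap together with the biggest-MSE cluster and furthest-vector location (all names are illustrative, not from the original code):

```python
import numpy as np

def deterministic_swap_step(X, C, P, rng):
    """One deterministic swap (a sketch): remove one of the two nearest
    prototypes, then add the furthest vector of the cluster with the
    largest squared error. Assumes all clusters are non-empty."""
    M = len(C)
    C = C.copy()

    # Removal: the two nearest prototypes are likely redundant; drop one.
    D = np.linalg.norm(C[:, None, :] - C[None, :, :], axis=2)
    np.fill_diagonal(D, np.inf)
    a, b = np.unravel_index(D.argmin(), D.shape)
    removed = int(rng.choice([a, b]))

    # Addition: cluster with the biggest total squared error ...
    sse = np.array([((X[P == k] - C[k]) ** 2).sum() for k in range(M)])
    worst = int(sse.argmax())
    # ... and within it, the vector furthest from its centroid.
    members = np.flatnonzero(P == worst)
    far = members[np.linalg.norm(X[members] - C[worst], axis=1).argmax()]

    C[removed] = X[far]
    # Local repartition and K-means fine-tuning would follow, and the swap
    # is accepted only if the cost function improves.
    return C, removed, worst
```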
Complexity of swaps
Furthest point in cluster (figure: prototype removed, cluster where the new one is added, and the furthest point selected)
Smart swap
Initialization: O(MN)
Swap iteration:
– Finding nearest pair: O(M²)
– Calculating distortion: O(N)
– Sorting clusters: O(M·logM)
– Evaluation of result: O(N)
– Repartition and fine-tuning: O(αN)
Total: O(MN + M² + I·N)
Expected number of iterations: < 2·M
Estimated total time: O(2M²N)
Figure: the nearest pair of prototypes and the cluster with the largest distortion
Smart swap pseudo code

SmartSwap(X, M) → C, P
  C ← InitializeCentroids(X);
  P ← PartitionDataset(X, C);
  Maxorder ← log2(M);
  order ← 1;
  WHILE order < Maxorder
    (c_i, c_j) ← FindNearestPair(C);
    S ← SortClustersByDistortion(P, C);
    c_swap ← RandomSelect(c_i, c_j);
    c_location ← s_order;
    C_new ← Swap(c_swap, c_location);
    P_new ← LocalRepartition(P, C_new);
    KmeansIteration(P_new, C_new);
    IF f(C_new) < f(C) THEN
      order ← 1;
      C ← C_new;
    ELSE
      order ← order + 1;
  KmeansIteration(P, C);
Pairwise swap
– Unpaired prototypes
– Nearest neighbors of each other
– Nearest neighbor in the other set further than in the same set → subject to swap
Combinations of random and deterministic swap

Variant   Removal                        Addition
RR        Random                         Random
RD        Random                         Deterministic
DR        Deterministic                  Random
DD        Deterministic                  Deterministic
D2R       Deterministic + data update    Random
D2D       Deterministic + data update    Deterministic
Summary of the time complexities

              Random removal        Deterministic removal
              RR        RD          DR        DD        D2R       D2D
Removal       O(1)      O(1)        O(MN)     O(MN)     O(αN)     O(αN)
Addition      O(1)      O(N)        O(1)      O(N)      O(1)      O(N)
Repartition   O(N)      O(N)        O(N)      O(N)      O(N)      O(N)
K-means       O(αN)     O(αN)       O(MN)     O(MN)     O(αN)     O(αN)
Profiles of the processing time
Test data sets
Birch data sets: Birch1, Birch2, Birch3
Experiments: Bridge (curves: RD, DD, DR, Random Swap)
Experiments Bridge
Experiments: Birch 2 (curves: Random Swap, DD, DR, RD)
Experiments Miss America
Quality comparisons (MSE) with a 10-second time constraint
Methods compared: Repeated Random, Repeated K-means, Random Swap, RD-variant.
Data sets (MSE scale): Bridge, House, Miss America, Europe (×10^7), Birch1 (×10^8), Birch2 (×10^6).
Average speed-up from RR to RD: 18:1, 4:1, 6:1, 5:1, 4:1, 2:1.
Literature
1. P. Fränti and J. Kivijärvi, "Randomised local search algorithm for the clustering problem", Pattern Analysis and Applications, 3 (4), 2000.
2. P. Fränti, J. Kivijärvi and O. Nevalainen, "Tabu search algorithm for codebook generation in VQ", Pattern Recognition, 31 (8), 1139-1148, August 1998.
3. P. Fränti, O. Virmajoki and V. Hautamäki, "Efficiency of random swap based clustering", IAPR Int. Conf. on Pattern Recognition (ICPR'08), Tampa, FL, Dec 2008.
4. P. Fränti, M. Tuononen and O. Virmajoki, "Deterministic and randomized local search algorithms for clustering", IEEE Int. Conf. on Multimedia and Expo (ICME'08), Hannover, Germany, June 2008.
5. P. Fränti and O. Virmajoki, "On the efficiency of swap-based clustering", Int. Conf. on Adaptive and Natural Computing Algorithms (ICANNGA'09), Kuopio, Finland, LNCS 5495, April 2009.
Literature
6. J. Chen, Q. Zhao and P. Fränti, "Smart swap for more efficient clustering", Int. Conf. Green Circuits and Systems (ICGCS'10), Shanghai, China, June 2010.
7. B. Fritzke, "The LBG-U method for vector quantization – an improvement over LBG inspired from neural networks", Neural Processing Letters, 5 (1), 1997.
8. P. Fränti and O. Virmajoki, "Iterative shrinking method for clustering problems", Pattern Recognition, 39 (5), May 2006.
9. T. Kaukoranta, P. Fränti and O. Nevalainen, "Iterative split-and-merge algorithm for VQ codebook generation", Optical Engineering, 37 (10), October 1998.
10. H. Frigui and R. Krishnapuram, "Clustering by competitive agglomeration", Pattern Recognition, 30 (7), July 1997.
Literature
11. A. Likas, N. Vlassis and J.J. Verbeek, "The global k-means clustering algorithm", Pattern Recognition, 36, 2003.
12. PAM (Kaufman and Rousseeuw, 1987)
13. CLARA (Kaufman and Rousseeuw, 1990)
14. CLARANS: A Clustering Algorithm based on Randomized Search (Ng and Han, 1994)
15. R.T. Ng and J. Han, "CLARANS: A method for clustering objects for spatial data mining", IEEE Transactions on Knowledge and Data Engineering, 14 (5), September/October 2002.