How much k-means can be improved by using better initialization and repeats?
Pasi Fränti and Sami Sieranoja
P. Fränti and S. Sieranoja, "How much k-means can be improved by using better initialization and repeats?", Pattern Recognition, 2019.
Introduction
Goal of k-means
Input: N points X = {x1, x2, …, xN}
Output: partition P = {p1, p2, …, pN} and k centroids C = {c1, c2, …, ck}
Objective function: sum-of-squared errors, SSE = Σ_{i=1}^{N} ||x_i - c_{p_i}||²
Assumptions: SSE is a suitable objective function, and the number of clusters k is known.
Using SSE objective function
K-means algorithm
K-means optimization steps
Assignment step: p_i ← argmin_{1≤j≤k} ||x_i - c_j||²
Centroid step: c_j ← mean of the points x_i with p_i = j
(Figures: centroids before and after.)
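These two steps translate directly into code. Below is a minimal NumPy sketch of plain k-means (function and variable names are mine, not from the paper); the later sketches on these slides reuse this kmeans() function.

```python
import numpy as np

def kmeans(X, C, max_iter=100):
    """Plain k-means: alternate the assignment and centroid steps
    until the centroids stop moving (or max_iter is reached)."""
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        d = np.linalg.norm(X[:, None, :] - C[None, :, :], axis=2)
        P = d.argmin(axis=1)
        # Centroid step: each centroid moves to the mean of its points
        # (an empty cluster keeps its old centroid).
        C_new = np.array([X[P == j].mean(axis=0) if np.any(P == j) else C[j]
                          for j in range(len(C))])
        if np.allclose(C_new, C):
            break
        C = C_new
    return C, P
```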
Examples
Iterate by k-means: 1st iteration
Iterate by k-means: 2nd iteration
Iterate by k-means: 3rd iteration
Iterate by k-means: 16th iteration
Iterate by k-means: 17th iteration
Iterate by k-means: 18th iteration
Iterate by k-means: 19th iteration
Final result: 25 iterations
Problems of k-means: distance of clusters
K-means cannot move centroids between clusters that are far away from each other.
Data and methodology
Clustering basic benchmark
Datasets: S1, S2, S3, S4, A1, A2, A3, Unbalance (cluster sizes 2000 and 100), Birch1, Birch2, DIM32
Fränti and Sieranoja, "K-means properties on six clustering benchmark datasets", Applied Intelligence, 2018.
Centroid index
Requires ground truth.
CI = Centroid index: P. Fränti, M. Rezaei and Q. Zhao, "Centroid index: cluster level similarity measure", Pattern Recognition, 47 (9), September 2014.
Centroid index example
CI = 4: some clusters have missing centroids, others have too many.
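The CI of such an example can be computed mechanically. A sketch following the cluster-level definition (the function name is mine): map each centroid to its nearest ground-truth centroid, count the ground-truth centroids that receive no mapping, do the same in the opposite direction, and take the maximum.

```python
import numpy as np

def centroid_index(C, G):
    """Centroid index (CI) between a solution C and ground truth G.
    CI = 0 means the cluster-level structure is correct."""
    def orphans(A, B):
        # Map each centroid in A to its nearest centroid in B...
        d = np.linalg.norm(A[:, None, :] - B[None, :, :], axis=2)
        nearest = d.argmin(axis=1)
        # ...and count the centroids in B that received no mapping.
        return len(B) - len(np.unique(nearest))
    return max(orphans(C, G), orphans(G, C))
```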
Success rate
How often is CI = 0?
Example: six runs gave CI = 1, 2, 1, 0, 2, 2, so the success rate is 17%.
Properties of k-means
Cluster overlap: definition
d1 = distance to the point's own (nearest) centroid
d2 = distance to the nearest centroid of another cluster
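A sketch of how such an overlap statistic could be measured per point (my own code; the slide does not show the exact aggregation rule, so the criterion d2 < 2·d1 below is an assumption):

```python
import numpy as np

def overlap_fraction(X, C, P):
    """Fraction of points that show evidence of cluster overlap.
    d1 = distance to the point's own centroid,
    d2 = distance to the nearest centroid of another cluster."""
    d = np.linalg.norm(X[:, None, :] - C[None, :, :], axis=2)
    idx = np.arange(len(X))
    d1 = d[idx, P]              # own centroid (fancy indexing returns a copy)
    d[idx, P] = np.inf          # mask the own cluster...
    d2 = d.min(axis=1)          # ...so the minimum is the nearest other centroid
    return np.mean(d2 < 2 * d1) # ASSUMED criterion, not stated on the slide
```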
Dependency on overlap (S datasets)
Overlap increases from S1 to S4. Success rates and CI-values:
S1: 3%, CI = 1.8
S2: 11%, CI = 1.4
S3: 12%, CI = 1.3
S4: 26%, CI = 0.9
Why does overlap help?
(Figures compare overlap = 7% and overlap = 22%; 13 iterations shown.)
Linear dependency on the number of clusters (k): Birch2 subsets
(Figure; success rate 16%.)
Dependency on dimensions (DIM datasets)
Dimensions: 32, 64, 128, 256, 512, 1024
CI: 3.6, 3.5, 3.8, 3.8, 3.9, 3.7
Success rate: 0% in all cases
Lack of overlap is the cause! (G2 datasets)
When dimensionality increases, what matters is how the overlap changes: the success rate follows the overlap (correlation 0.91).
Effect of unbalance (Unbalance dataset)
Success rate: 0%; average CI: 3.9
The problem originates from the random initialization.
Summary of k-means properties
1. Overlap: good! (more overlap helps)
2. Number of clusters: linear dependency
3. Dimensionality: no direct effect
4. Unbalance of cluster sizes: bad!
How to improve?
Repeated k-means (RKM)
Repeat 100 times: initialize, then run k-means; keep the best result.
The initialization must include randomness.
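A sketch of RKM with random-centroids initialization (reusing the kmeans() sketch from above; the best result is kept by SSE):

```python
import numpy as np

def repeated_kmeans(X, k, repeats=100, seed=None):
    """Repeated k-means (RKM): run k-means from many random
    initializations and keep the solution with the lowest SSE."""
    rng = np.random.default_rng(seed)
    best = (None, None, np.inf)
    for _ in range(repeats):
        # Random centroids: k distinct data points (the randomness
        # that makes repeating worthwhile).
        C = X[rng.choice(len(X), size=k, replace=False)]
        C, P = kmeans(X, C)
        sse = ((X - C[P]) ** 2).sum()
        if sse < best[2]:
            best = (C, P, sse)
    return best[0], best[1]
```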
How to initialize?
Some obvious heuristics: furthest point, sorting, density, projection.
A clear state-of-the-art is missing: no single technique outperforms the others in all cases.
Initialization is not significantly easier than the clustering problem itself.
K-means can be used as a fine-tuner with almost anything.
Another desperate effort: repeat it LOTS OF times.
Initialization techniques: criteria
Simple to implement
Lower (or equal) time complexity than k-means
No additional parameters
Includes randomness
Requirements for initialization
1. Simple to implement: random centroids takes 2 functions and 26 lines of code; repeated k-means takes 5 functions. The initialization should be simpler than k-means itself.
2. Faster than k-means: k-means is real-time (<1 s) up to N ≈ 10,000. Its complexity is O(IkN) ≈ 25N·√N, assuming I ≈ 25 iterations and k = √N. The initialization must be faster than O(N²).
Simplicity of algorithms
(Table: parameters, functions, and lines of code per algorithm.)
Kinnunen, Sidoroff, Tuononen and Fränti, "Comparison of clustering methods: a case study of text-independent speaker modeling", Pattern Recognition Letters, 2011.
Alternative: a better algorithm
Random Swap (RS) reaches CI = 0: P. Fränti, "Efficiency of random swap clustering", Journal of Big Data, 2018.
Genetic Algorithm (GA) reaches CI = 0: P. Fränti, "Genetic algorithm with deterministic crossover for vector quantization", Pattern Recognition Letters, 2000.
Initialization techniques
Techniques considered
Random centroids, random partitions, furthest points (maxmin), projection-based, sorting, density-based (Luxburg), and splitting; each is detailed on the following slides.
Random centroids
Random partitions
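A sketch (my naming): label every point uniformly at random and use the partition means as initial centroids. Note that all such centroids start near the data mean, which is one reason this variant works best when clusters overlap heavily.

```python
import numpy as np

def random_partition_init(X, k, seed=None):
    """Random partitions: assign each point a random cluster label,
    then take the mean of each partition as its initial centroid."""
    rng = np.random.default_rng(seed)
    P = rng.integers(0, k, size=len(X))
    return np.array([X[P == j].mean(axis=0) if np.any(P == j)
                     else X[rng.integers(len(X))]   # rare empty label
                     for j in range(k)])
```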
Furthest points (maxmin)
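A sketch of the maxmin heuristic (my naming): pick one centroid at random, then repeatedly add the point furthest from all centroids chosen so far. It runs in O(kN).

```python
import numpy as np

def maxmin_init(X, k, seed=None):
    """Furthest points (maxmin): greedily add the point with the
    largest distance to its nearest already-chosen centroid."""
    rng = np.random.default_rng(seed)
    C = [X[rng.integers(len(X))]]
    d = np.linalg.norm(X - C[0], axis=1)  # distance to nearest chosen centroid
    for _ in range(k - 1):
        C.append(X[d.argmax()])           # furthest point becomes a centroid
        d = np.minimum(d, np.linalg.norm(X - C[-1], axis=1))
    return np.array(C)
```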
Projection-based initialization
Most common projection axes: diagonal, principal axis (PCA), principal curves.
Initialization: uniform partition along the axis.
Also used in divisive clustering: iterative split.
PCA costs O(DN)-O(D²N).
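A sketch using the simplest axis, the diagonal (my code; I split the projected order into k equal-count groups, which is one reading of "uniform partition along the axis"):

```python
import numpy as np

def diagonal_projection_init(X, k):
    """Projection-based initialization: project points onto the
    diagonal axis, partition the order into k groups, use group means."""
    t = X.sum(axis=1)                   # projection onto the diagonal
    order = np.argsort(t)
    groups = np.array_split(order, k)   # uniform partition along the axis
    return np.array([X[g].mean(axis=0) for g in groups])
```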
Furthest point projection
Projection example (1): Birch2 (good projection!)
Projection example (2): Birch1 (bad projection!)
Projection example (3): Unbalance (bad projection!)
More complex projections
I. Cleju, P. Fränti, X. Wu, "Clustering based on principal curve", Scandinavian Conf. on Image Analysis, LNCS vol. 3540, June 2005.
More complex projections: travelling salesman problem!
(Same reference as above.)
Sorting heuristic
Sort the points with respect to a reference point (figure: points numbered 1-4 in sorted order).
Density-based heuristics: Luxburg
Luxburg's technique:
1. Select k·log(k) preliminary clusters using k-means.
2. Eliminate the smallest ones.
3. Use the furthest-point heuristic to select the final k centroids.
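A hedged sketch of the three steps (reusing kmeans() and the maxmin idea from above; the slide does not say how many of the smallest clusters are eliminated, so the size threshold below is an assumption):

```python
import numpy as np

def luxburg_init(X, k, seed=None):
    """Luxburg-style initialization: k*log(k) preliminary k-means
    clusters, drop the smallest, maxmin-select k of the survivors.
    Assumes at least k clusters survive the pruning."""
    rng = np.random.default_rng(seed)
    m = int(np.ceil(k * np.log(k)))
    C0 = X[rng.choice(len(X), size=m, replace=False)]
    C0, P = kmeans(X, C0)
    sizes = np.bincount(P, minlength=m)
    cand = C0[sizes > len(X) / (2 * m)]   # ASSUMED threshold for "smallest"
    # Furthest-point (maxmin) selection among the surviving centroids:
    C = [cand[rng.integers(len(cand))]]
    d = np.linalg.norm(cand - C[0], axis=1)
    for _ in range(k - 1):
        C.append(cand[d.argmax()])
        d = np.minimum(d, np.linalg.norm(cand - C[-1], axis=1))
    return np.array(C)
```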
Splitting algorithm
Repeat until k clusters:
1. Select the biggest cluster.
2. Select two random points in it.
3. Re-allocate its points between the two.
4. Fine-tune by k-means locally.
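A sketch of the loop (my code; for brevity the local k-means tuning of step 4 is left out, and empty-split corner cases are ignored):

```python
import numpy as np

def splitting_init(X, k, seed=None):
    """Splitting: grow from one cluster to k by repeatedly splitting
    the biggest cluster around two of its random points."""
    rng = np.random.default_rng(seed)
    P = np.zeros(len(X), dtype=int)
    C = [X.mean(axis=0)]
    while len(C) < k:
        j = np.bincount(P, minlength=len(C)).argmax()      # biggest cluster
        idx = np.flatnonzero(P == j)
        a, b = X[rng.choice(idx, size=2, replace=False)]   # two random points
        closer_b = (np.linalg.norm(X[idx] - b, axis=1)
                    < np.linalg.norm(X[idx] - a, axis=1))
        P[idx[closer_b]] = len(C)                          # re-allocate points
        C[j] = X[P == j].mean(axis=0)                      # update old centroid
        C.append(X[P == len(C)].mean(axis=0))              # add new centroid
    return np.array(C)
```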
Results
Success rates: k-means (without repeats)
(Table: average success rate and the number of datasets never solved.)
Most problems with: A2, A3, Unbalance, Birch1, Birch2.
Cluster overlap: high overlap
(Figures: initial centroids vs. after k-means.)
Cluster overlap: low overlap (G2)
(Figures: initial centroids vs. after k-means.)
Number of clusters: Birch2 subsets
Dimensions
Unbalance
Success rates: repeated k-means
(Table: average success rate and the number of datasets never solved.)
Furthest-point approaches solve Unbalance. Still problems with: Birch1, Birch2.
How many repeats are needed?
(Figures; Unbalance dataset.)
Summary of results (CI-values)
Plain k-means: CI = 4.5. Repeats and Maxmin initialization each reduce the error; combined they reach CI = 0.7.
For most applications: good enough! When accuracy is vital: find a better method (random swap).
Cluster overlap is the most important factor.
Effect of different factors
Conclusions
How effective: repeats + Maxmin initialization reduce the error from CI = 4.5 to CI = 0.7.
Is it enough? For most applications: YES. If accuracy is important: NO.
Important factors: cluster overlap is critical for k-means; dimensionality does not matter.
Random swap
Random swap (RS)
Initialize, then repeat 5000 times: swap a random centroid and fine-tune by k-means (only 2 iterations). Reaches CI = 0.
P. Fränti, "Efficiency of random swap clustering", Journal of Big Data, 2018.
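A sketch of one way to implement it (reusing the kmeans() sketch from above with max_iter=2; a swap is kept only if it lowers the SSE):

```python
import numpy as np

def random_swap(X, k, T=5000, seed=None):
    """Random swap (RS): trial-and-error swaps of single centroids,
    each fine-tuned by just two k-means iterations."""
    rng = np.random.default_rng(seed)
    C = X[rng.choice(len(X), size=k, replace=False)]
    C, P = kmeans(X, C, max_iter=2)
    best_sse = ((X - C[P]) ** 2).sum()
    for _ in range(T):
        C_try = C.copy()
        C_try[rng.integers(k)] = X[rng.integers(len(X))]  # the swap
        C_try, P_try = kmeans(X, C_try, max_iter=2)       # only 2 iterations
        sse = ((X - C_try[P_try]) ** 2).sum()
        if sse < best_sse:                                # keep improving swaps
            C, P, best_sse = C_try, P_try, sse
    return C, P
```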
How many repeats?
P. Fränti, "Efficiency of random swap clustering", Journal of Big Data, 2018.
The end