1 How much k-means can be improved by using better initialization and repeats?
Pasi Fränti and Sami Sieranoja
P. Fränti and S. Sieranoja, "How much k-means can be improved by using better initialization and repeats?", Pattern Recognition, 2019.

2 Introduction

3 Goal of k-means
Input: N points X = {x1, x2, …, xN}
Output: partition P = {p1, p2, …, pk} and k centroids C = {c1, c2, …, ck}
Objective function: SSE = sum-of-squared errors

4 Goal of k-means (continued)
Same input, output, and SSE objective as above, with two assumptions: SSE is a suitable objective function, and k is known.
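Written out in the notation above (a standard formulation; c_j denotes the centroid of cluster p_j):

SSE(C, P) = \sum_{j=1}^{k} \sum_{x_i \in p_j} \lVert x_i - c_j \rVert^2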

5 Using SSE objective function

6 K-means algorithm

7 K-means optimization steps
Assignment step: assign each point to its nearest centroid.
Centroid step: move each centroid to the mean of the points assigned to it.
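The two steps can be stated as a minimal NumPy sketch (illustrative only; the function and variable names are not taken from the paper):

import numpy as np

def kmeans(X, C, max_iter=100):
    """Plain k-means: alternate assignment and centroid steps until the centroids stop moving."""
    for _ in range(max_iter):
        # Assignment step: each point joins the cluster of its nearest centroid.
        dist = np.linalg.norm(X[:, None, :] - C[None, :, :], axis=2)   # (N, k) distance matrix
        labels = dist.argmin(axis=1)
        # Centroid step: each centroid moves to the mean of the points assigned to it.
        new_C = np.array([X[labels == j].mean(axis=0) if np.any(labels == j) else C[j]
                          for j in range(len(C))])
        if np.allclose(new_C, C):
            break
        C = new_C
    return C, labels

def sse(X, C, labels):
    """Sum-of-squared errors of a clustering (the objective defined above)."""
    return float(((X - C[labels]) ** 2).sum())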

8 Examples

9 Iterate by k-means 1st iteration

10 Iterate by k-means 2nd iteration

11 Iterate by k-means 3rd iteration

12 Iterate by k-means 16th iteration

13 Iterate by k-means 17th iteration

14 Iterate by k-means 18th iteration

15 Iterate by k-means 19th iteration

16 Final result (25 iterations)

17 Problems of k-means Distance of clusters
K-means cannot move centroids between clusters that are far away from each other.

18 Data and methodology

19 Clustering basic benchmark
Fränti and Sieranoja, "K-means properties on six clustering benchmark datasets", Applied Intelligence, 2018.
Datasets: S1-S4, A1-A3, Unbalance, Birch1, Birch2, DIM32.

20 Centroid index Requires ground truth
CI = Centroid index. P. Fränti, M. Rezaei and Q. Zhao, "Centroid index: cluster level similarity measure", Pattern Recognition, 47 (9), September 2014.
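A sketch of how CI could be computed against ground-truth centroids, based on my reading of the cited paper: map each solution centroid to its nearest ground-truth centroid, count the ground-truth centroids that receive no mapping (orphans), and take the larger of the two one-way counts.

import numpy as np

def ci_one_way(A, B):
    """Map each centroid in A to its nearest centroid in B and count
    the centroids of B that were never chosen (orphans)."""
    dist = np.linalg.norm(A[:, None, :] - B[None, :, :], axis=2)
    nearest = dist.argmin(axis=1)
    mapped = np.zeros(len(B), dtype=bool)
    mapped[nearest] = True
    return int((~mapped).sum())

def centroid_index(A, B):
    """CI = max of the two one-way orphan counts; CI == 0 means the cluster structure is correct."""
    return max(ci_one_way(A, B), ci_one_way(B, A))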

21 Centroid index example
Example: CI = 4. Some clusters have missing centroids while others have too many.

22 Success rate How often CI=0?
Example: six runs give CI = 1, 2, 1, 0, 2, 2; only one run reaches CI = 0, so the success rate is 17%.

23 Properties of k-means

24 Cluster overlap Definition
d1 = distance to the nearest centroid (own cluster)
d2 = distance to the nearest centroid of another cluster
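A sketch of an overlap estimate built from d1 and d2; the exact counting rule appears only as a figure in the slides, so the threshold used here (a point counts as overlapping when d2 < 2*d1) is an assumption for illustration:

import numpy as np

def overlap_estimate(X, C, labels, factor=2.0):
    """Fraction of points whose nearest other-cluster centroid is within
    'factor' times the distance to their own centroid (factor is assumed)."""
    dist = np.linalg.norm(X[:, None, :] - C[None, :, :], axis=2)
    d1 = dist[np.arange(len(X)), labels]     # distance to own centroid
    dist[np.arange(len(X)), labels] = np.inf
    d2 = dist.min(axis=1)                    # distance to nearest centroid of another cluster
    return float((d2 < factor * d1).mean())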

25 Dependency on overlap (S datasets)
Success rates and CI-values as overlap increases from S1 to S4:
S1: 3% (CI = 1.8); S2: 11% (CI = 1.4); S3: 12% (CI = 1.3); S4: 26% (CI = 0.9)

26 Why does overlap help? Example comparing overlap = 7% and overlap = 22% (13 iterations).

27 Linear dependency on the number of clusters (k): Birch2 dataset
Success rate: 16%

28 Dependency on dimensions DIM datasets
As dimensionality increases:
Dimensions:    32   64   128  256  512  1024
CI:            3.6  3.5  3.8  3.8  3.9  3.7
Success rate:  0% in every case

29 Lack of overlap is the cause! G2 datasets
On the G2 datasets, success follows cluster overlap rather than dimensionality: when overlap increases, the success rate increases (correlation 0.91).

30 Effect of unbalance (Unbalance datasets)
Success rate: 0%. Average CI: 3.9. The problem originates from the random initialization.

31 Summary of k-means properties
1. Overlap: Good!
2. Number of clusters: Linear dependency
3. Dimensionality: No direct effect
4. Unbalance of cluster sizes: Bad!

32 How to improve?

33 Repeated k-means (RKM)
Repeat 100 times: initialize (the initialization must include randomness) and run k-means; keep the result with the lowest SSE.
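A sketch of RKM, reusing kmeans() and sse() from the sketch under slide 7; random centroids supply the required randomness, and the lowest-SSE run is kept (names are illustrative):

import numpy as np

def random_centroids(X, k, rng):
    """Pick k distinct data points as initial centroids."""
    return X[rng.choice(len(X), size=k, replace=False)]

def repeated_kmeans(X, k, repeats=100, seed=0):
    """Run k-means from many random initializations and keep the lowest-SSE result."""
    rng = np.random.default_rng(seed)
    best = None
    for _ in range(repeats):
        C, labels = kmeans(X, random_centroids(X, k, rng))
        err = sse(X, C, labels)
        if best is None or err < best[0]:
            best = (err, C, labels)
    return best[1], best[2]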

34 How to initialize?
Some obvious heuristics: furthest point, sorting, density, projection.
A clear state-of-the-art is missing:
No single technique outperforms the others in all cases.
Initialization is not significantly easier than the clustering problem itself.
K-means can be used as a fine-tuner with almost anything.
Another desperate effort: repeat it LOTS OF times.

35 Initialization techniques Criteria
Simple to implement
Lower (or equal) time complexity than k-means
No additional parameters
Includes randomness

36 Requirements for initialization
1. Simple to implement
Random centroids: 2 functions + 26 lines of code.
Repeated k-means: 5 functions.
The initialization should be simpler than k-means itself.
2. Faster than k-means
K-means is real-time (<1 s) up to N ≈ 10,000.
O(IkN) ≈ 25N√N, assuming I ≈ 25 and k ≈ √N.
The initialization must be faster than O(N²).

37 Simplicity of algorithms
Compared in terms of parameters, functions, and lines of code.
Kinnunen, Sidoroff, Tuononen and Fränti, "Comparison of clustering methods: a case study of text-independent speaker modeling", Pattern Recognition Letters, 2011.

38 Alternative: a better algorithm
Random Swap (RS): reaches CI = 0. P. Fränti, "Efficiency of random swap clustering", Journal of Big Data, 2018.
Genetic Algorithm (GA): reaches CI = 0. P. Fränti, "Genetic algorithm with deterministic crossover for vector quantization", Pattern Recognition Letters, 2000.

39 Initialization techniques

40 Techniques considered

41 Random centroids

42 Random partitions

43 Random partitions

44 Furthest points (maxmin)

45 Furthest points (maxmin)
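A sketch of the furthest point (maxmin) heuristic as it is commonly described: the first centroid is a random data point and each further centroid is the point lying furthest from all centroids chosen so far (these details are an assumption, not read off the slides):

import numpy as np

def maxmin_init(X, k, rng):
    """Furthest point (maxmin) initialization."""
    centroids = [X[rng.integers(len(X))]]            # first centroid: a random point
    d = np.linalg.norm(X - centroids[0], axis=1)     # distance to nearest chosen centroid
    for _ in range(k - 1):
        centroids.append(X[d.argmax()])              # furthest point becomes the next centroid
        d = np.minimum(d, np.linalg.norm(X - centroids[-1], axis=1))
    return np.array(centroids)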

46 Projection-based initialization
Most common projection axes: diagonal, principal axis (PCA), principal curves.
Initialization: uniform partition along the axis.
Used in divisive clustering (iterative split).
PCA costs O(DN) to O(D²N).
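A sketch of projection-based initialization using the principal axis: points are projected onto the first PCA direction, the order along the axis is cut into k segments, and each segment's mean becomes an initial centroid. Cutting into equal-size segments is an assumption; a uniform partition of the axis range would work similarly:

import numpy as np

def pca_projection_init(X, k):
    """Project onto the principal axis and take one centroid per segment of the order."""
    Xc = X - X.mean(axis=0)
    # Principal axis = first right singular vector of the centered data.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    t = Xc @ Vt[0]                        # scalar projection of every point
    order = np.argsort(t)
    groups = np.array_split(order, k)     # k consecutive segments along the axis
    return np.array([X[g].mean(axis=0) for g in groups])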

47 Furthest point projection

48 Projection example (1): Birch2. Good projection!

49 Projection example (2): Birch1. Bad projection!

50 Projection example (3): Unbalance. Bad projection!

51 More complex projections
I. Cleju, P. Fränti, X. Wu, "Clustering based on principal curve", Scandinavian Conf. on Image Analysis, LNCS vol. 3540, June 2005.

52 More complex projections: travelling salesman problem!
I. Cleju, P. Fränti, X. Wu, "Clustering based on principal curve", Scandinavian Conf. on Image Analysis, LNCS vol. 3540, June 2005.

53 Sorting heuristic: order the points by their distance to a reference point.
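A sketch of one way to realize the sorting heuristic: order the points by distance to a reference point and pick k evenly spaced points from that order as initial centroids. Both the choice of reference point (the data mean here) and the selection rule are assumptions for illustration:

import numpy as np

def sorting_init(X, k, ref=None):
    """Sort points by distance to a reference point; pick k evenly spaced ones."""
    if ref is None:
        ref = X.mean(axis=0)                                  # reference point (assumed)
    order = np.argsort(np.linalg.norm(X - ref, axis=1))
    picks = order[np.linspace(0, len(X) - 1, k).round().astype(int)]
    return X[picks]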

54 Density-based heuristics Luxburg
Luxburg's technique:
Select k log(k) preliminary clusters using k-means.
Eliminate the smallest ones.
Use the furthest point heuristic to select the final k centroids.
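A sketch following the three steps above, reusing kmeans(), random_centroids() and maxmin_init() from the earlier sketches; how many of the smallest clusters are eliminated is not specified on the slide, so the "keep the largest half, at least k" rule below is an assumption:

import numpy as np

def luxburg_init(X, k, rng):
    """k*log(k) preliminary clusters, drop the smallest, maxmin-select k centroids."""
    L = max(k, int(np.ceil(k * np.log(k))))
    C, labels = kmeans(X, random_centroids(X, L, rng))
    sizes = np.bincount(labels, minlength=L)
    keep = C[np.argsort(sizes)[-max(k, L // 2):]]   # eliminate the smallest preliminary clusters
    return maxmin_init(keep, k, rng)                # furthest point heuristic on the survivors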

55 Splitting algorithm
Repeat until k clusters are reached:
Select the biggest cluster.
Select two random points as new centroids (split).
Re-allocate the cluster's points between them.
Tune by k-means locally.
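A sketch following the listed steps, reusing kmeans() from the earlier sketch: grow from one cluster to k by repeatedly splitting the biggest cluster with two random points and tuning only that cluster's points with a couple of k-means iterations (the local scope and the iteration count are assumptions):

import numpy as np

def splitting_init(X, k, rng, local_iters=2):
    """Grow from 1 to k clusters by repeatedly splitting the biggest cluster."""
    centroids = [X.mean(axis=0)]
    labels = np.zeros(len(X), dtype=int)
    while len(centroids) < k:
        big = np.bincount(labels, minlength=len(centroids)).argmax()  # biggest cluster
        members = np.where(labels == big)[0]
        two = X[rng.choice(members, size=2, replace=False)]           # two random points
        C2, sub = kmeans(X[members], two, max_iter=local_iters)       # local re-allocation + tuning
        centroids[big] = C2[0]
        centroids.append(C2[1])
        labels[members[sub == 1]] = len(centroids) - 1
    return np.array(centroids)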

56 Results

57 Success rates K-means (without repeats)
Reported: average success rate and the number of datasets never solved.
Most problems with: A2, A3, Unbalance, Birch1, Birch2.

58 Cluster overlap: high cluster overlap
Figure: initial centroids and the result after k-means.

59 Cluster overlap: low cluster overlap (G2)
Figure: initial centroids and the result after k-means.

60 Number of clusters Birch2 subsets

61 Dimensions

62 Unbalance

63 Success rates Repeated k-means
Reported: average success rate and the number of datasets never solved.
Furthest point approaches solve Unbalance.
Still problems: Birch1, Birch2.

64 How many repeats needed?

65 How many repeats needed?
Unbalance

66 Summary of results CI-values
K-means: CI = 4.5. Repeated k-means: CI = … Maxmin initialization: CI = … Both combined: CI = 0.7.
For most applications: good enough!
If accuracy is vital: find a better method (random swap).
Cluster overlap is the most important factor.

67 Effect of different factors

68 Conclusions
How effective: repeats + Maxmin initialization reduce the error from CI = 4.5 to CI = 0.7.
Is it enough? For most applications: YES. If accuracy is important: NO.
Important factors: cluster overlap is critical for k-means; dimensionality does not matter.

69 Random swap

70 Random swap (RS)
Initialize, then repeat 5000 times: swap one centroid and fine-tune with k-means (only 2 iterations). Reaches CI = 0.
P. Fränti, "Efficiency of random swap clustering", Journal of Big Data, 2018.
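A sketch of random swap as described in the cited paper, reusing kmeans(), sse() and random_centroids() from the earlier sketches: each trial replaces one randomly chosen centroid with a randomly chosen data point, fine-tunes with two k-means iterations, and keeps the swap only if the SSE improves:

import numpy as np

def random_swap(X, k, trials=5000, seed=0):
    """Random swap clustering: trial-and-error centroid swaps plus short k-means."""
    rng = np.random.default_rng(seed)
    C, labels = kmeans(X, random_centroids(X, k, rng))
    best_err = sse(X, C, labels)
    for _ in range(trials):
        trial = C.copy()
        trial[rng.integers(k)] = X[rng.integers(len(X))]       # swap one centroid to a random point
        trial, trial_labels = kmeans(X, trial, max_iter=2)      # only 2 k-means iterations
        err = sse(X, trial, trial_labels)
        if err < best_err:                                      # accept only improving swaps
            C, labels, best_err = trial, trial_labels, err
    return C, labels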

71 How many repeats? P. Fränti, "Efficiency of random swap clustering", Journal of Big Data, 2018

72 The end

