K-MEANS ALGORITHM Jelena Vukovic 53/07


1 K-MEANS ALGORITHM Jelena Vukovic 53/07 jeca.zr@gmail.com

2 Introduction: basic idea of the k-means algorithm; detailed explanation; most common problems of the algorithm; applications; possible improvements. Elektrotehnički fakultet u Beogradu

3 Basic principles of the algorithm: given a set of points $(x_1, x_2, \dots, x_n)$, partition the n points into k sets $(S_1, S_2, \dots, S_k)$, with n > k. The goal is to minimize the within-cluster sum of squares, $\arg\min_{S} \sum_{i=1}^{k} \sum_{x_j \in S_i} \lVert x_j - \mu_i \rVert^2$, where $\mu_i$ is the mean of the points in $S_i$.
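The objective on this slide can be evaluated directly for a given partition. The following is a minimal sketch, not code from the presentation; the function name `within_cluster_ss` and the sample data are assumptions made for illustration, and every cluster is assumed to be non-empty.

```python
def within_cluster_ss(clusters):
    """Sum of squared Euclidean distances from each point to its cluster mean.

    `clusters` is a list of non-empty lists of equal-length coordinate tuples
    (hypothetical input format chosen for this sketch).
    """
    total = 0.0
    for points in clusters:
        d = len(points[0])
        # mu_i: coordinate-wise mean of the points in S_i
        mean = [sum(p[j] for p in points) / len(points) for j in range(d)]
        total += sum(sum((p[j] - mean[j]) ** 2 for j in range(d)) for p in points)
    return total


clusters = [[(0.0, 0.0), (2.0, 0.0)], [(10.0, 10.0), (10.0, 12.0)]]
print(within_cluster_ss(clusters))  # 4.0
```

Each of the four sample points lies at distance 1 from its cluster mean, so the sum of squares is 4.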

4 The algorithm: choose the number of means (k) and initialize them. Iterate: 1. Assign each point to the nearest mean. 2. Move each mean to the center of its cluster.
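The two iterated steps can be sketched as follows. This is an illustrative sketch only: the slide does not say how the means are initialized or when to stop, so the code assumes the initial means are k points drawn at random from the data and stops once the assignments no longer change. The name `kmeans` and its parameters are hypothetical.

```python
import math
import random


def kmeans(points, k, max_iter=100, seed=0):
    """Return (means, assignment) for a list of coordinate tuples."""
    rng = random.Random(seed)
    means = rng.sample(points, k)  # assumption: initial means are data points
    assignment = None
    for _ in range(max_iter):
        # Step 1: assign each point to the nearest mean (Euclidean distance).
        new_assignment = [
            min(range(k), key=lambda i: math.dist(p, means[i])) for p in points
        ]
        if new_assignment == assignment:
            break  # assumption: stop when no point changes cluster
        assignment = new_assignment
        # Step 2: move each mean to the center of its cluster.
        for i in range(k):
            members = [p for p, a in zip(points, assignment) if a == i]
            if members:  # an empty cluster keeps its previous mean
                means[i] = tuple(sum(c) / len(members) for c in zip(*members))
    return means, assignment


points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
means, assignment = kmeans(points, k=2)
print(means, assignment)
```

On this small, well-separated sample the first three points end up in one cluster and the last three in the other; which cluster gets label 0 depends on the random initialization.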

5 The algorithm (figures): assign points to the nearest mean; move the means.

6 The algorithm: the complexity is O(n * k * I * d), where n is the number of points, k the number of clusters, I the number of iterations, and d the number of attributes. (Figure: re-assign points.)
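To get a feel for the bound, the product can be evaluated for concrete values. The numbers below are made up for illustration, and the result is only an order-of-magnitude operation count that ignores constant factors.

```python
def kmeans_cost(n, k, iterations, d):
    """Order-of-magnitude operation count n * k * I * d (constants ignored)."""
    return n * k * iterations * d


# Hypothetical workload: 10,000 points, 5 clusters, 20 iterations, 3 attributes.
print(kmeans_cost(10_000, 5, 20, 3))  # 3000000
```

Doubling any single factor doubles the estimate, which is why the algorithm scales linearly in the number of points.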

7 The algorithm

8 K nearest neighbors: a similar distance-based algorithm, used for classification rather than clustering. The decision is made by a simple majority vote among the k closest neighbors. In k-means the Euclidean distance measure is used.
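The majority-vote rule can be sketched as below. This is an illustrative sketch, not code from the presentation: the function name `knn_classify`, the labeled sample data, and the choice of Euclidean distance for the neighbor search are all assumptions.

```python
import math
from collections import Counter


def knn_classify(query, labeled_points, k):
    """Label `query` by simple majority among its k nearest labeled points."""
    nearest = sorted(labeled_points, key=lambda item: math.dist(query, item[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


training = [((0, 0), "a"), ((0, 1), "a"), ((1, 0), "a"), ((5, 5), "b"), ((6, 5), "b")]
print(knn_classify((1, 1), training, 3))  # a
```

Unlike k-means, this needs labeled training points, and k here counts neighbors rather than clusters.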

9 Some limitations of the algorithm: the number of clusters needs to be known in advance; the result depends on the initial positions of the means; problems appear when clusters have different shapes, sizes, or densities.

10 Initial centroids problem, possible approaches: random initialization (the most common); multiple runs; testing on a data sample; analyzing the data.
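Random initialization and multiple runs are often combined: run the algorithm from several random starting points and keep the result with the lowest within-cluster sum of squares. The sketch below assumes that selection criterion, which the slide does not state; the names `lloyd` and `best_of_restarts` are hypothetical.

```python
import math
import random


def lloyd(points, means, max_iter=100):
    """Run k-means from the given initial means; return (means, wcss)."""
    means = list(means)
    for _ in range(max_iter):
        clusters = [[] for _ in means]
        for p in points:
            nearest = min(range(len(means)), key=lambda i: math.dist(p, means[i]))
            clusters[nearest].append(p)
        # An empty cluster keeps its previous mean.
        new_means = [
            tuple(sum(c) / len(members) for c in zip(*members)) if members else m
            for members, m in zip(clusters, means)
        ]
        if new_means == means:
            break
        means = new_means
    # Within-cluster sum of squares with respect to the final means.
    wcss = sum(min(math.dist(p, m) for m in means) ** 2 for p in points)
    return means, wcss


def best_of_restarts(points, k, restarts=10, seed=0):
    """Try several random initializations and keep the lowest-WCSS run."""
    rng = random.Random(seed)
    runs = (lloyd(points, rng.sample(points, k)) for _ in range(restarts))
    return min(runs, key=lambda run: run[1])


points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
means, wcss = best_of_restarts(points, k=2)
print(means, wcss)
```

The restarts are independent of each other, so they could also be run in parallel.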

11 Different density (figures): original points; 3 clusters.

12 Non-globular shapes (figures): original points; 2 clusters.

13 Pros and cons. Pros: simple to implement; fast; not resource-intensive. Cons: k needs to be known; a globular (roughly spherical) cluster shape is assumed; requires some knowledge about the data in advance; may run for many iterations without significant changes in the clusters.

14 Applications of the algorithm: many different uses, including computer vision, market segmentation, geostatistics, astronomy, etc.

15 Improvements: pre-process the data in order to better estimate k; run multiple instances in parallel with different centroid initializations; ignore possible errors to avoid non-standard cluster shapes.

16 Thank you!

