1
K-MEANS ALGORITHM Jelena Vukovic 53/07 jeca.zr@gmail.com
2
Introduction
Basic idea of the k-means algorithm
Detailed explanation
Most common problems of the algorithm
Applications
Possible improvements
Elektrotehnički fakultet u Beogradu 2/16
3
Basic principles of the algorithm
Given a set of points $(x_1, x_2, \ldots, x_n)$
Partition the n points into k sets $(S_1, S_2, \ldots, S_k)$, with n > k
The goal is to minimize the within-cluster sum of squares:
$$\underset{S}{\arg\min} \sum_{i=1}^{k} \sum_{x_j \in S_i} \lVert x_j - \mu_i \rVert^2$$
where $\mu_i$ is the mean of the points in $S_i$
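As a minimal sketch of the objective on this slide, the following Python computes the within-cluster sum of squares for a given partition. The function names `mean_point` and `wcss` and the sample points are illustrative assumptions, not part of the original slides.

```python
def mean_point(points):
    """Coordinate-wise mean of a non-empty list of equal-length tuples."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def wcss(clusters):
    """Within-cluster sum of squares for a list of clusters (lists of points)."""
    total = 0.0
    for cluster in clusters:
        mu = mean_point(cluster)  # the mean of the points in this cluster
        for x in cluster:
            total += sum((a - b) ** 2 for a, b in zip(x, mu))
    return total


# Hypothetical data: two tight groups on a line, partitioned two ways.
good = [[(0.0, 0.0), (2.0, 0.0)], [(10.0, 0.0), (12.0, 0.0)]]
bad = [[(0.0, 0.0), (12.0, 0.0)], [(2.0, 0.0), (10.0, 0.0)]]
print(wcss(good))  # 4.0
print(wcss(bad))   # 104.0
```

The partition that groups nearby points together has the much smaller sum, which is the quantity k-means tries to minimize.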
4
The algorithm
Choose the number of means (k) and initialize their positions
Iterate:
1. Assign each point to the nearest mean
2. Move each mean to the center of its cluster
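The two steps above can be sketched in plain Python as follows. This is an illustrative sketch, not the presenter's implementation: the function names, the choice to start from k randomly sampled input points, and the stopping rule (stop when assignments no longer change) are all assumptions.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean_point(points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(c) / n for c in zip(*points))


def kmeans(points, k, max_iterations=100, seed=0):
    """Plain k-means; returns (means, labels)."""
    rng = random.Random(seed)
    means = rng.sample(points, k)  # assumption: start from k input points
    labels = None
    for _ in range(max_iterations):
        # Step 1: assign each point to the nearest mean.
        new_labels = [
            min(range(k), key=lambda i: squared_distance(p, means[i]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Step 2: move each mean to the center of its cluster.
        for i in range(k):
            members = [p for p, lab in zip(points, labels) if lab == i]
            if members:  # keep the old mean if a cluster is empty
                means[i] = mean_point(members)
    return means, labels


# Hypothetical data: two well-separated groups of three points.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
          (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
means, labels = kmeans(points, k=2)
print(means, labels)
```

Which cluster gets index 0 depends on the random start, so only the grouping of the points, not the label values, is meaningful.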
5
The algorithm [figure: assign points to the nearest mean, then move the means]
6
The algorithm
The complexity is O(n * k * I * d), where
n is the number of points
k is the number of clusters
I is the number of iterations
d is the number of attributes
[figure: re-assign points]
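To give a feel for the n * k * I * d product, here is a small worked example in Python. The helper name `kmeans_operation_count` and the sample sizes are hypothetical, and the result is only the order-of-magnitude product from the formula, not a measured cost.

```python
def kmeans_operation_count(n, k, iterations, d):
    """Order-of-magnitude work estimate from the O(n * k * I * d) formula."""
    return n * k * iterations * d


# Hypothetical workload: 10,000 points, 5 clusters, 20 iterations, 3 attributes.
print(kmeans_operation_count(n=10_000, k=5, iterations=20, d=3))  # 3000000
```

Doubling any one of the four factors doubles the estimate, which is why the cost grows linearly in the number of points.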
7
The algorithm [figure]
8
K nearest neighbors
A similarly named, distance-based algorithm, but one used for supervised classification rather than clustering
The decision is made by simple majority vote among the k closest neighbors
In k-means, the Euclidean distance measure is used
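For contrast with k-means, the majority-vote rule of k nearest neighbors can be sketched as below. The function name `knn_classify`, the use of Euclidean distance here, and the labeled sample points are assumptions made for illustration.

```python
from collections import Counter


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def knn_classify(query, labeled_points, k):
    """Label `query` by simple majority among its k nearest labeled points."""
    nearest = sorted(
        labeled_points, key=lambda item: squared_distance(query, item[0])
    )[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical labeled training data: (point, label) pairs.
training = [((0.0, 0.0), "a"), ((0.0, 1.0), "a"), ((1.0, 0.0), "a"),
            ((5.0, 5.0), "b"), ((5.0, 6.0), "b")]
print(knn_classify((0.5, 0.5), training, k=3))  # a
print(knn_classify((5.0, 5.5), training, k=3))  # b
```

Unlike k-means, this needs labels for the training points, and k counts neighbors rather than clusters.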
9
Some limitations of the algorithm
The number of clusters needs to be known in advance
The result depends on the initial positions of the means
Problems appear when clusters have different shapes, sizes, or densities
10
Initial centroids problem
Random distribution of the initial centroids (the most common)
Multiple runs
Testing on a data sample
Analyzing the data
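The "multiple runs" idea can be sketched as follows: repeat k-means from different random starting centroids and keep the run with the lowest within-cluster sum of squares. The names `kmeans_once` and `best_of_runs`, the default of 10 runs, and the sample data are assumptions for illustration only.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_once(points, k, rng, max_iterations=100):
    """One k-means run from a random start; returns (wcss, means, labels)."""
    means = rng.sample(points, k)  # random initial centroids drawn from the data
    labels = None
    for _ in range(max_iterations):
        new_labels = [
            min(range(k), key=lambda i: sq_dist(p, means[i])) for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        for i in range(k):
            members = [p for p, lab in zip(points, labels) if lab == i]
            if members:  # keep the old mean if a cluster is empty
                means[i] = tuple(sum(c) / len(members) for c in zip(*members))
    wcss = sum(sq_dist(p, means[lab]) for p, lab in zip(points, labels))
    return wcss, means, labels


def best_of_runs(points, k, runs=10, seed=0):
    """Repeat k-means with different random starts; keep the lowest-WCSS run."""
    rng = random.Random(seed)
    results = (kmeans_once(points, k, rng) for _ in range(runs))
    return min(results, key=lambda result: result[0])


# Hypothetical data: two well-separated groups of three points.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
          (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
wcss, means, labels = best_of_runs(points, k=2)
print(wcss, labels)
```

On data this cleanly separated every start reaches the same partition; the restarts matter on harder data, where single runs can settle in different local minima.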
11
Different density [figure: original points; 3 clusters]
12
Non-globular shapes [figure: original points; 2 clusters]
13
Pros and cons
Pros:
Simple to implement
Fast
Not highly resource-demanding
Cons:
k needs to be known
Ellipsoidal cluster shape is assumed
Requires some knowledge about the data in advance
May run for many iterations without significant changes in the clusters
14
Applications of the algorithm
Many different uses, including:
Computer vision
Market segmentation
Geostatistics
Astronomy
15
Improvements
Pre-process the data in order to better estimate k
Run multiple instances in parallel with different centroid initializations
Ignore likely erroneous points (outliers) to avoid non-standard cluster shapes
16
Thank you!