Dr. Unnikrishnan P.C. Professor, EEE

EE368 Soft Computing

Module III Data Clustering Algorithms K-Means Clustering

Data Clustering with the K-Means Algorithm K=3

K-Means Clustering

KTU Clusters

Introduction
● Clustering is the classification of objects into different groups, or more precisely, the partitioning of a data set into subsets (clusters), so that the data in each subset (ideally) share some common trait, often according to some defined distance measure.

What is Clustering?
● Clustering is dividing data points into homogeneous classes or clusters:
● Points in the same group are as similar as possible
● Points in different groups are as dissimilar as possible
● When a collection of objects is given, we put the objects into groups based on similarity.

Applications of Clustering
● Marketing: finding groups of customers with similar behavior, given a large database of customer data containing their properties and past buying records;
● Biology: classification of plants and animals given their features;
● Libraries: book ordering;
● Insurance: identifying groups of motor insurance policy holders with a high average claim cost; identifying frauds;
● City planning: identifying groups of houses according to their house type, value and geographical location.

Applications of Clustering ……..
● Earthquake studies: clustering observed earthquake epicenters to identify dangerous zones. Based on the areas hit by earthquakes in a region, clustering can help analyse the next probable location where an earthquake can occur.
● Document classification: clustering weblog data to discover groups of similar access patterns.

Clustering Algorithms:

K-Means Clustering
● The k-means algorithm is an algorithm to cluster n objects based on attributes into k partitions, where k < n.
● It is similar to the expectation-maximization algorithm for mixtures of Gaussians in that they both attempt to find the centers of natural clusters in the data.
● It assumes that the object attributes form a vector space.

K-Means Clustering ……
An algorithm for partitioning (or clustering) N data points into K disjoint subsets S_j of data points so as to minimize the sum-of-squares criterion

$$J = \sum_{j=1}^{K} \sum_{n \in S_j} \lVert x_n - \mu_j \rVert^2$$

where x_n is a vector representing the n-th data point and \mu_j is the geometric centroid of the data points in S_j.

K-Means Clustering
● Simply speaking, k-means clustering is an algorithm to classify or group objects, based on their attributes/features, into K groups.
● K is a positive integer.
● The grouping is done by minimizing the sum of squares of distances between the data points and the corresponding cluster centroid.
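
As a rough illustration of the sum-of-squares criterion described above, the following Python sketch evaluates it for two different groupings of the same four points. The function names `centroid` and `sum_of_squares` and the sample points are assumptions made for illustration; they are not taken from the slides.

```python
def centroid(points):
    """Mean of a list of equal-length coordinate tuples."""
    n = len(points)
    return tuple(sum(p[d] for p in points) / n for d in range(len(points[0])))

def sum_of_squares(clusters):
    """Sum, over all clusters, of squared Euclidean distances to the cluster centroid."""
    total = 0.0
    for points in clusters:
        c = centroid(points)
        total += sum(sum((x - m) ** 2 for x, m in zip(p, c)) for p in points)
    return total

# Hypothetical data: the same four points grouped in two different ways.
tight = [[(0.0, 0.0), (0.0, 2.0)], [(10.0, 0.0), (10.0, 2.0)]]
mixed = [[(0.0, 0.0), (10.0, 0.0)], [(0.0, 2.0), (10.0, 2.0)]]

print(sum_of_squares(tight))  # 4.0
print(sum_of_squares(mixed))  # 100.0
```

The grouping that keeps nearby points together gives the smaller value, which is the kind of grouping k-means tries to find.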

Common Distance measures
The distance measure determines how the similarity of two elements is calculated, and it influences the shape of the clusters. Common measures include:
● Euclidean distance
● Manhattan distance
● Maximum norm
● Inner product space
● Hamming distance

Euclidean distance

Manhattan distance is a rectilinear distance, named after the number of blocks north, south, east, or west a taxicab must travel on to reach its destination on the grid of streets in parts of New York City.
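
To make the distance measures concrete, here is a minimal Python sketch of four of them (Euclidean, Manhattan, maximum norm and Hamming). The function names and the sample points are illustrative assumptions, not part of the slides.

```python
import math

def euclidean(a, b):
    """Straight-line distance: square root of the summed squared differences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def manhattan(a, b):
    """Rectilinear (city-block) distance: sum of absolute differences."""
    return sum(abs(x - y) for x, y in zip(a, b))

def maximum_norm(a, b):
    """Largest absolute difference along any single coordinate."""
    return max(abs(x - y) for x, y in zip(a, b))

def hamming(a, b):
    """Number of positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))

p, q = (1.0, 1.0), (4.0, 5.0)  # hypothetical points
print(euclidean(p, q))                # 5.0
print(manhattan(p, q))                # 7.0
print(maximum_norm(p, q))             # 4.0
print(hamming("karolin", "kathrin"))  # 3
```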

K-Means Clustering

How to choose k (no. of clusters)?

How does the K-Means Clustering algorithm work?

K-Means Algorithm-Example K=2

Example ……
Step 1: Initialization: We randomly choose the following two centroids (k=2) for the two clusters: m1=(1.0,1.0) and m2=(5.0,7.0).

Example ……
Step 2: Assigning each point to its nearest centroid, we obtain two clusters containing: {1,2,3} and {4,5,6,7}
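
The two steps above can be sketched in Python as follows. The data table from the slide is not present in this transcript, so the seven points below are an assumed stand-in, chosen so that the assignment step reproduces the clusters {1,2,3} and {4,5,6,7}; the function names are likewise hypothetical. The sketch also assumes every cluster receives at least one point.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def assign(points, centroids):
    """Index of the nearest centroid for each point (ties go to the lower index)."""
    return [min(range(len(centroids)), key=lambda j: squared_distance(p, centroids[j]))
            for p in points]

def update(points, labels, k):
    """Recompute each centroid as the mean of the points assigned to it."""
    new_centroids = []
    for j in range(k):
        members = [p for p, label in zip(points, labels) if label == j]
        new_centroids.append(tuple(sum(c) / len(members) for c in zip(*members)))
    return new_centroids

# ASSUMED data (not from the slides); points are numbered 1..7 in order.
points = [(1.0, 1.0), (1.5, 2.0), (3.0, 4.0), (5.0, 7.0),
          (3.5, 5.0), (4.5, 5.0), (3.5, 4.5)]
centroids = [(1.0, 1.0), (5.0, 7.0)]  # m1 and m2 from Step 1

labels = assign(points, centroids)
clusters = [[i + 1 for i, label in enumerate(labels) if label == j] for j in range(2)]
print(clusters)  # [[1, 2, 3], [4, 5, 6, 7]]

# Next iteration would start from the updated centroids.
print(update(points, labels, 2))
```

With this assumed data, point 3 is equidistant from both initial centroids, and the tie-breaking rule places it in the first cluster. Repeating `assign` and `update` until the labels stop changing gives the usual k-means loop.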

Example ……
