DATA MINING CLUSTERING ANALYSIS
Data Mining (by R.S.K. Baber)
CLUSTERING
Example: suppose we have 9 balls of three different colours, and we want to cluster them into three groups according to colour. The balls of the same colour end up in the same group.
Concept: definitions of cluster and cluster analysis
CLUSTERING
Which is a good cluster?
Data structures in data mining / clustering
Types of data in cluster analysis
Types of clustering
K-means: concept, algorithm, example, comments
The K-Means Clustering Method
Example with K = 2 (figure): arbitrarily choose K objects as the initial cluster centers; assign each object to the most similar center; update the cluster means and reassign.
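The loop in the K = 2 figure (choose K objects, assign, update the means, reassign) can be sketched in Python as follows. This is a minimal illustration, not the author's code: it assumes 2-D points, takes the first K points as the "arbitrary" initial centers, uses Euclidean distance as the meaning of "most similar", and stops once no object changes cluster. The function name `kmeans` and the sample points are hypothetical.

```python
import math
from typing import List, Optional, Tuple

Point = Tuple[float, float]


def kmeans(points: List[Point], k: int,
           max_iter: int = 100) -> Tuple[List[Point], Optional[List[int]]]:
    """Minimal K-means sketch: assign, update means, reassign until stable."""
    # Assumption: the "arbitrary" initial centers are simply the first k points.
    centers: List[Point] = list(points[:k])
    labels: Optional[List[int]] = None
    for _ in range(max_iter):
        # Assign each object to the most similar center
        # (assumption: "most similar" = smallest Euclidean distance).
        new_labels = [min(range(k), key=lambda j: math.dist(p, centers[j]))
                      for p in points]
        if new_labels == labels:  # no object changed cluster: stop
            break
        labels = new_labels
        # Update each cluster mean from its current members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # assumption: an empty cluster keeps its old center
                centers[j] = (sum(x for x, _ in members) / len(members),
                              sum(y for _, y in members) / len(members))
    return centers, labels


pts = [(1, 1), (1.5, 2), (8, 8), (9, 8.5), (1, 0.5), (8.5, 9)]  # hypothetical data
centers, labels = kmeans(pts, k=2)
print(labels)  # [0, 0, 1, 1, 0, 1]
```

Both initial centers start in the lower-left group; after the first update of the means, the point (1.5, 2) is reassigned and the two clusters settle on the two visible groups.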
How it works: suppose we have 8 points, A1(2, 10), A2(2, 5), A3(8, 4), A4(5, 8), A5(7, 5), A6(6, 4), A7(1, 2) and A8(4, 9), and we want 3 clusters. The initial cluster centers are A1(2, 10), A4(5, 8) and A7(1, 2). The distance between two points a = (x1, y1) and b = (x2, y2) is the Manhattan distance ρ(a, b) = |x2 - x1| + |y2 - y1|.
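The distance function can be written directly from its definition. A minimal Python sketch; the names `manhattan`, `points` and `initial_centers` are illustrative and not taken from the slides.

```python
from typing import Tuple

Point = Tuple[float, float]


def manhattan(a: Point, b: Point) -> float:
    """rho(a, b) = |x2 - x1| + |y2 - y1| for a = (x1, y1), b = (x2, y2)."""
    (x1, y1), (x2, y2) = a, b
    return abs(x2 - x1) + abs(y2 - y1)


# The eight points and the three initial centers from the example.
points = {"A1": (2, 10), "A2": (2, 5), "A3": (8, 4), "A4": (5, 8),
          "A5": (7, 5), "A6": (6, 4), "A7": (1, 2), "A8": (4, 9)}
initial_centers = [points["A1"], points["A4"], points["A7"]]

# Distances from A1 to each initial center.
print([manhattan(points["A1"], c) for c in initial_centers])  # [0, 5, 9]
```

These are the same three values worked out by hand for A1 in Iteration #1.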
Iteration #1: the current means are Mean 1 = (2, 10), Mean 2 = (5, 8) and Mean 3 = (1, 2). For each point A1 to A8, compute its distance to each of the three means and assign the point to the cluster whose mean is nearest.
Iteration #1: distances from A1(2, 10) to each mean, using ρ(a, b) = |x2 - x1| + |y2 - y1| with a = the point and b = the mean:
ρ(A1, mean1), mean1 = (2, 10): |2 - 2| + |10 - 10| = 0 + 0 = 0
ρ(A1, mean2), mean2 = (5, 8): |5 - 2| + |8 - 10| = 3 + 2 = 5
ρ(A1, mean3), mean3 = (1, 2): |1 - 2| + |2 - 10| = 1 + 8 = 9
The smallest distance is to mean1, so A1 is assigned to cluster 1.
Iteration #1, complete distance table (Mean 1 = (2, 10), Mean 2 = (5, 8), Mean 3 = (1, 2)):

Point        Dist Mean 1   Dist Mean 2   Dist Mean 3   Cluster
A1(2, 10)    0             5             9             1
A2(2, 5)     5             6             4             3
A3(8, 4)     12            7             9             2
A4(5, 8)     5             0             10            2
A5(7, 5)     10            5             9             2
A6(6, 4)     10            5             7             2
A7(1, 2)     9             10            0             3
A8(4, 9)     3             2             10            2
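The assignment step behind the Iteration #1 table can be sketched as below: for each point it computes ρ to every mean and picks the nearest one. The helper names `manhattan` and `assign` are hypothetical, and ties are broken toward the lower-numbered cluster; that tie rule is an assumption, since the slides do not say how ties are handled (none occur in this table).

```python
from typing import Dict, List, Tuple

Point = Tuple[float, float]


def manhattan(a: Point, b: Point) -> float:
    """rho(a, b) = |x2 - x1| + |y2 - y1|."""
    return abs(b[0] - a[0]) + abs(b[1] - a[1])


def assign(points: Dict[str, Point],
           means: List[Point]) -> Dict[str, Tuple[List[float], int]]:
    """Map each point name to (distances to every mean, 1-based nearest cluster)."""
    table = {}
    for name, p in points.items():
        dists = [manhattan(p, m) for m in means]
        # list.index returns the first minimum, so ties go to the lower cluster number.
        table[name] = (dists, dists.index(min(dists)) + 1)
    return table


points = {"A1": (2, 10), "A2": (2, 5), "A3": (8, 4), "A4": (5, 8),
          "A5": (7, 5), "A6": (6, 4), "A7": (1, 2), "A8": (4, 9)}
means = [(2, 10), (5, 8), (1, 2)]  # initial centers A1, A4, A7

table = assign(points, means)
for name, (dists, cluster) in table.items():
    print(name, points[name], dists, cluster)
```

Each printed row carries the three distances and the cluster number of one row of the Iteration #1 table.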
Iteration #1, new clusters:
Cluster 1: (2, 10)
Cluster 2: (8, 4), (5, 8), (7, 5), (6, 4), (4, 9)
Cluster 3: (2, 5), (1, 2)
New means:
Cluster 1 has only one point, A1(2, 10), which was already the old mean, so its center stays at (2, 10).
Cluster 2: ((8+5+7+6+4)/5, (4+8+5+4+9)/5) = (6, 6)
Cluster 3: ((2+1)/2, (5+2)/2) = (1.5, 3.5)
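Recomputing the means is a coordinate-wise average over each cluster's members. In this sketch (hypothetical function name `update_means`), an empty cluster keeps its previous center; that rule is an assumption, since the example never produces an empty cluster.

```python
from typing import Dict, List, Tuple

Point = Tuple[float, float]


def update_means(points: Dict[str, Point], labels: Dict[str, int],
                 old_means: List[Point]) -> List[Point]:
    """New center of each cluster = coordinate-wise mean of its members."""
    new_means = []
    for c, old in enumerate(old_means, start=1):  # clusters are numbered 1..k
        members = [p for name, p in points.items() if labels[name] == c]
        if not members:  # assumption: an empty cluster keeps its old center
            new_means.append(old)
            continue
        new_means.append((sum(x for x, _ in members) / len(members),
                          sum(y for _, y in members) / len(members)))
    return new_means


points = {"A1": (2, 10), "A2": (2, 5), "A3": (8, 4), "A4": (5, 8),
          "A5": (7, 5), "A6": (6, 4), "A7": (1, 2), "A8": (4, 9)}
# Cluster numbers from the Iteration #1 table.
labels = {"A1": 1, "A2": 3, "A3": 2, "A4": 2, "A5": 2, "A6": 2, "A7": 3, "A8": 2}
new_means = update_means(points, labels, [(2, 10), (5, 8), (1, 2)])
print(new_means)  # [(2.0, 10.0), (6.0, 6.0), (1.5, 3.5)]
```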
After iteration 1 (figure).
After iterations 2 and 3 (figure).
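Iterations 2 and 3 repeat the same two steps with the updated means. The sketch below loops until the assignment stops changing; it is an illustration under stated assumptions (hypothetical names, ties go to the lower cluster index, an empty cluster keeps its center). Following the slides, centers are updated with the mean even though the distance is Manhattan; a variant that minimises total Manhattan distance would use the coordinate-wise median instead (k-medians).

```python
from typing import List, Optional, Tuple

Point = Tuple[float, float]


def manhattan(a: Point, b: Point) -> float:
    """rho(a, b) = |x2 - x1| + |y2 - y1|."""
    return abs(b[0] - a[0]) + abs(b[1] - a[1])


def kmeans_manhattan(points: List[Point], centers: List[Point], max_iter: int = 100):
    """Repeat assign/update until no point changes cluster.

    Returns (labels, centers, n): labels are 0-based cluster indices and
    n is the number of iterations in which the assignment changed.
    """
    labels: Optional[List[int]] = None
    n = 0
    for _ in range(max_iter):
        # min() keeps the first minimum, so ties go to the lower index (assumption).
        new_labels = [min(range(len(centers)), key=lambda j: manhattan(p, centers[j]))
                      for p in points]
        if new_labels == labels:  # same clusters as last time: converged
            break
        labels, n = new_labels, n + 1
        updated = []
        for j, old in enumerate(centers):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                updated.append((sum(x for x, _ in members) / len(members),
                                sum(y for _, y in members) / len(members)))
            else:  # assumption: an empty cluster keeps its old center
                updated.append(old)
        centers = updated
    return labels, centers, n


# A1..A8 in order; initial centers A1, A4, A7 as in the example.
pts = [(2, 10), (2, 5), (8, 4), (5, 8), (7, 5), (6, 4), (1, 2), (4, 9)]
labels, centers, n = kmeans_manhattan(pts, [pts[0], pts[3], pts[6]])
print(n, labels)  # 3 [0, 2, 1, 0, 1, 1, 2, 0]
print(centers)
```

With the example's data, the assignment changes in iterations 1, 2 and 3 and is unchanged in iteration 4, so the loop stops with clusters {A1, A4, A8}, {A3, A5, A6} and {A2, A7} and means (11/3, 9), (7, 13/3) and (1.5, 3.5).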