SEEM4630 2011-2012 Tutorial 4 – Clustering
What is Cluster Analysis?
- Finding groups of objects such that the objects in a group are similar (or related) to one another and different from (or unrelated to) the objects in other groups.
- A good clustering method will produce high-quality clusters:
  - high intra-class similarity: cohesive within clusters
  - low inter-class similarity: distinctive between clusters
Notion of a Cluster Can Be Ambiguous
How many clusters? The same set of points can reasonably be grouped in more than one way.
[Figure: the same points shown as Two Clusters, Four Clusters, and Six Clusters]
K-Means Clustering
- The number of clusters, K, is fixed in advance.
- Each object is assigned to the cluster whose mean (centroid) is closest; closeness is measured by Euclidean distance, etc.
- Cluster means are then recomputed, and assignment repeats until the clusters no longer change.
K-Means Clustering: Example
Given:
- Mean of cluster K_i containing m objects t_i1, ..., t_im: m_i = (t_i1 + t_i2 + ... + t_im) / m
- Data: {2, 4, 10, 12, 3, 20, 30, 11, 25}, K = 2
Solution (initial means m_1 = 2, m_2 = 4):
- m_1 = 2, m_2 = 4: K_1 = {2, 3}, K_2 = {4, 10, 12, 20, 30, 11, 25}
- m_1 = 2.5, m_2 = 16: K_1 = {2, 3, 4}, K_2 = {10, 12, 20, 30, 11, 25}
- m_1 = 3, m_2 = 18: K_1 = {2, 3, 4, 10}, K_2 = {12, 20, 30, 11, 25}
- m_1 = 4.75, m_2 = 19.6: K_1 = {2, 3, 4, 10, 11, 12}, K_2 = {20, 30, 25}
- m_1 = 7, m_2 = 25: K_1 = {2, 3, 4, 10, 11, 12}, K_2 = {20, 30, 25} (clusters unchanged, so the algorithm stops)
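A minimal Python sketch of this 1-D K-means run (the data, K = 2, and the initial means 2 and 4 are taken from the example above; the function name kmeans_1d is my own, not from the slides):

```python
def kmeans_1d(data, means, max_iter=100):
    """Simple 1-D K-means: assign each point to the nearest mean, then
    recompute the means, until the assignments stop changing.
    Assumes no cluster ever becomes empty."""
    for _ in range(max_iter):
        # Assignment step: each point joins the cluster with the closest mean.
        clusters = [[] for _ in means]
        for x in data:
            nearest = min(range(len(means)), key=lambda j: abs(x - means[j]))
            clusters[nearest].append(x)
        # Update step: each mean becomes the average of its cluster.
        new_means = [sum(c) / len(c) for c in clusters]
        if new_means == means:  # converged
            break
        means = new_means
    return clusters, means

data = [2, 4, 10, 12, 3, 20, 30, 11, 25]
clusters, means = kmeans_1d(data, means=[2, 4])
print(clusters)  # [[2, 4, 10, 12, 3, 11], [20, 30, 25]] -- the K_1 and K_2 above, in data order
print(means)     # [7.0, 25.0]
```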
K-Means Clustering: Evaluation
- Sum of Squared Error (SSE): SSE = Σ_{i=1..K} Σ_{x ∈ C_i} dist(x, m_i)², where x is a data point in cluster C_i and m_i is the centroid of cluster C_i.
- Given several clusterings, choose the one with the smallest error.
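Continuing the sketch above, SSE can be computed directly from the clusters and their means (sse is an illustrative helper name, not from the slides):

```python
def sse(clusters, means):
    """Sum of squared distances from every point to its own cluster's centroid."""
    return sum((x - m) ** 2
               for cluster, m in zip(clusters, means)
               for x in cluster)

print(sse(clusters, means))  # 150.0 for the K = 2 clustering found above
```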
Limitations of K-means
- It is hard to determine a good value of K.
- The result depends on the choice of the initial K centroids (a common remedy, following the SSE criterion above, is sketched below).
- K-means has problems when the data contains outliers. Outliers can be handled better by hierarchical clustering and density-based clustering.
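One way to soften the dependence on the initial centroids, in line with the "smallest SSE" criterion on the previous slide, is to rerun K-means from several random initializations and keep the best run. A hedged sketch reusing the kmeans_1d and sse helpers above (the restart count and seed are arbitrary choices):

```python
import random

def kmeans_restarts(data, k=2, restarts=10, seed=0):
    """Run K-means from several random initial means; keep the run with the smallest SSE."""
    rng = random.Random(seed)
    best = None
    for _ in range(restarts):
        init = rng.sample(data, k)            # k distinct data points as initial means
        clusters, means = kmeans_1d(data, init)
        err = sse(clusters, means)
        if best is None or err < best[0]:
            best = (err, clusters, means)
    return best

print(kmeans_restarts([2, 4, 10, 12, 3, 20, 30, 11, 25]))  # (best SSE, clusters, means)
```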
Hierarchical Clustering
- Produces a set of nested clusters organized as a hierarchical tree.
- Can be visualized as a dendrogram: a tree-like diagram that records the sequence of merges or splits.
Strengths of Hierarchical Clustering
- No particular number of clusters has to be assumed in advance: any desired number of clusters can be obtained by 'cutting' the dendrogram at the proper level.
Partition direction
- Agglomerative: start with single elements and aggregate them into clusters.
- Divisive: start with the complete data set and divide it into partitions.
Agglomerative Hierarchical Clustering
The basic algorithm is straightforward:
1. Compute the proximity matrix
2. Let each data point be a cluster
3. Repeat
4.   Merge the two closest clusters
5.   Update the proximity matrix
6. Until only a single cluster remains
The key operation is the computation of the proximity of two clusters; different approaches define the distance between clusters (a sketch follows below).
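A minimal Python sketch of this basic algorithm, operating directly on a precomputed proximity matrix and using MIN (single link) as the inter-cluster distance; the names agglomerative and linkage are my own:

```python
def agglomerative(prox, linkage=min):
    """Repeatedly merge the two closest clusters until only one remains.
    prox: symmetric matrix of pairwise distances between data points.
    linkage: min (single link) or max (complete link), applied to all
             point-to-point distances between the two clusters.
    Returns the sequence of merges as (cluster_a, cluster_b, distance)."""
    clusters = [[i] for i in range(len(prox))]  # step 2: each point is its own cluster
    merges = []
    while len(clusters) > 1:                    # steps 3/6: repeat until one cluster remains
        # Step 4: find the pair of clusters with the smallest inter-cluster distance.
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = linkage(prox[i][j] for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        merges.append((clusters[a], clusters[b], d))
        # Step 5: replace the two merged clusters by their union.
        merged = clusters[a] + clusters[b]
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)] + [merged]
    return merges
```

Applied to the 6x6 proximity matrix on the single-link slide below, this produces the merge distances 0.11, 0.14, 0.15, 0.15, 0.22; ties at 0.15 may be broken in a different order than on the slide, but the resulting merge heights are the same.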
Hierarchical Clustering: Defining Inter-Cluster Similarity
- MIN (single link)
- MAX (complete link)
- Group average
- Distance between centroids
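Illustrative Python definitions of these four measures, assuming prox is a pairwise distance matrix over the data points, a and b are clusters given as lists of point indices, and (for the centroid version) pts holds the 1-D point values themselves; all names here are my own:

```python
def single_link(prox, a, b):
    """MIN: distance between the closest pair of points, one from each cluster."""
    return min(prox[i][j] for i in a for j in b)

def complete_link(prox, a, b):
    """MAX: distance between the farthest pair of points, one from each cluster."""
    return max(prox[i][j] for i in a for j in b)

def group_average(prox, a, b):
    """Average of all pairwise distances between the two clusters."""
    return sum(prox[i][j] for i in a for j in b) / (len(a) * len(b))

def centroid_distance(pts, a, b):
    """Distance between the two cluster centroids (1-D case for simplicity)."""
    ca = sum(pts[i] for i in a) / len(a)
    cb = sum(pts[j] for j in b) / len(b)
    return abs(ca - cb)
```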
Hierarchical Clustering: MIN or Single Link
The entries below are Euclidean distances between the six points I1-I6. At each step the two clusters with the smallest single-link distance are merged and the proximity matrix is updated.

Initial proximity matrix:
            I1    I2    I3    I4    I5    I6
  I1      0.00  0.24  0.22  0.37  0.34  0.23
  I2      0.24  0.00  0.15  0.20  0.14  0.25
  I3      0.22  0.15  0.00  0.15  0.28  0.11
  I4      0.37  0.20  0.15  0.00  0.29  0.22
  I5      0.34  0.14  0.28  0.29  0.00  0.39
  I6      0.23  0.25  0.11  0.22  0.39  0.00

Merge I3 and I6 (distance 0.11):
              I1    I2  {I3,I6}   I4    I5
  I1        0.00  0.24   0.22   0.37  0.34
  I2        0.24  0.00   0.15   0.20  0.14
  {I3,I6}   0.22  0.15   0.00   0.15  0.28
  I4        0.37  0.20   0.15   0.00  0.29
  I5        0.34  0.14   0.28   0.29  0.00

Merge I2 and I5 (distance 0.14):
              I1  {I2,I5} {I3,I6}  I4
  I1        0.00   0.24    0.22  0.37
  {I2,I5}   0.24   0.00    0.15  0.20
  {I3,I6}   0.22   0.15    0.00  0.15
  I4        0.37   0.20    0.15  0.00

Merge {I2,I5} and {I3,I6} (distance 0.15):
                    I1  {I2,I5,I3,I6}   I4
  I1              0.00      0.22      0.37
  {I2,I5,I3,I6}   0.22      0.00      0.15
  I4              0.37      0.15      0.00

Merge {I2,I5,I3,I6} and I4 (distance 0.15):
                       I1  {I2,I5,I3,I6,I4}
  I1                 0.00        0.22
  {I2,I5,I3,I6,I4}   0.22        0.00

Finally, I1 joins at distance 0.22, leaving a single cluster.

[Dendrogram: leaf order I3, I6, I2, I5, I4, I1; merge heights 0.11, 0.14, 0.15, 0.15, 0.22]
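For reference, the same single-link result can be reproduced with SciPy, assuming scipy and numpy are installed; the distance matrix is the one from this slide:

```python
import numpy as np
from scipy.spatial.distance import squareform
from scipy.cluster.hierarchy import linkage, fcluster

# Pairwise Euclidean distances between I1..I6, copied from the slide.
D = np.array([
    [0.00, 0.24, 0.22, 0.37, 0.34, 0.23],
    [0.24, 0.00, 0.15, 0.20, 0.14, 0.25],
    [0.22, 0.15, 0.00, 0.15, 0.28, 0.11],
    [0.37, 0.20, 0.15, 0.00, 0.29, 0.22],
    [0.34, 0.14, 0.28, 0.29, 0.00, 0.39],
    [0.23, 0.25, 0.11, 0.22, 0.39, 0.00],
])

Z = linkage(squareform(D), method='single')  # single link = MIN
print(Z[:, 2])  # merge distances: [0.11 0.14 0.15 0.15 0.22]

# 'Cutting' the dendrogram at height 0.16 leaves two clusters:
# {I2, I3, I4, I5, I6} and {I1}.
print(fcluster(Z, t=0.16, criterion='distance'))
# scipy.cluster.hierarchy.dendrogram(Z) would draw the dendrogram (needs matplotlib).
```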