Clustering Deviance From CART Analysis and Silhouette Widths


1 Clustering Deviance From CART Analysis and Silhouette Widths
Michael A. Lindgren EWHALE Laboratory Institute of Arctic Biology University of Alaska Fairbanks March 09, 2011

2

3 Clustering Level Solution
Table of Deviance Summary Statistics **The previous graph was created with this data**
Columns: Clustering Level Solution, Min, 1st Quart, Median, Mean, 3rd Quart, Max
clus03_Dev
clus04_Dev
clus05_Dev 0.1195 0.1214 0.1604 0.8492 0.4993
clus06_Dev
clus07_Dev
clus08_Dev
clus09_Dev
clus10_Dev
clus11_Dev
clus12_Dev 0.6806
clus13_Dev
clus14_Dev
clus15_Dev 0.1096 0.3779 0.3986 1.3743 0.64
clus16_Dev 0.1084 0.3344 0.409 1.4127 0.5554
clus17_Dev 0.1198 0.3431 0.4768 1.5284 0.9333
clus18_Dev 0.1157 0.3089 0.4171 1.4201 0.5849
clus19_Dev 0.1404 0.4873 1.565 1.0354 16.163
clus20_Dev 0.2698
clus25_Dev
clus30_Dev
clus40_Dev 1.8659
clus50_Dev
LOWEST VALUE -->

4

5 Average Silhouette Width
Overall Average Silhouette Widths **this is the data used in previous bar graphic**
Columns: Cluster Level, Average Silhouette Width
3 0.26
4 0.21
5 0.19
6 0.18
7
8 0.17
9
10 0.16
11
12
13
14
15
16
17
18
19
20
25
30
40 0.2
50

6 PAM Metrics Across Clustering Solutions
Michael A. Lindgren EWHALE Laboratory Institute of Arctic Biology University of Alaska Fairbanks March 02, 2011

7 Recap on PAM
The algorithm searches for k representative objects (medoids) that are centrally located in the clusters they define. The representative object of a cluster, the medoid, is the object for which the average dissimilarity to all the objects in the cluster is minimal. Strictly, the PAM algorithm minimizes the sum of dissimilarities rather than the average dissimilarity.
Average distance (dissimilarity) to a medoid: if j is the medoid of cluster C, the average distance of all objects of C to j is calculated as follows: $\bar{d}(C, j) = \frac{1}{|C|} \sum_{i \in C} d(i, j)$, where d(i, j) is the dissimilarity between object i and the medoid j.
This measure gives insight into how similar the cases clustered around a given medoid are to that medoid. The lower this number in a clustering solution, the less dissimilar the cluster's members are from their medoid.
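The medoid and its average dissimilarity can be illustrated with a minimal Python sketch. It is not the full PAM build-and-swap search: it only picks, for a cluster that is already given, the member with the smallest summed dissimilarity and reports the average distance to it. The function name, the 1-D toy points, and the cluster assignment are all hypothetical and are not taken from the presentation's data.

```python
def medoid_and_average_dissimilarity(dissim, members):
    """Return (medoid index, average dissimilarity to the medoid) for one cluster.

    dissim  : square matrix as a list of lists, dissim[i][j] = dissimilarity of i and j
    members : indices of the objects that belong to the cluster
    """
    # The medoid is the member whose summed dissimilarity to all members is smallest.
    medoid = min(members, key=lambda j: sum(dissim[i][j] for i in members))
    # Average distance of all objects of the cluster (medoid included) to the medoid.
    total = sum(dissim[i][medoid] for i in members)
    return medoid, total / len(members)


# Hypothetical 1-D observations; dissimilarity is the absolute difference.
points = [1.0, 2.0, 3.0, 10.0, 11.0, 15.0]
dissim = [[abs(a - b) for b in points] for a in points]

# Hypothetical cluster assignment (in practice this comes from PAM itself).
clusters = {"A": [0, 1, 2], "B": [3, 4, 5]}
results = {
    name: medoid_and_average_dissimilarity(dissim, idx)
    for name, idx in clusters.items()
}
for name, (medoid, avg) in results.items():
    print(name, "medoid index:", medoid, "average dissimilarity:", round(avg, 4))
```

In this toy case cluster A is tighter around its medoid than cluster B, which is the kind of difference the average dissimilarity measure is meant to expose.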

8 Red-Orange: indicates a higher dissimilarity
Matrix of average dissimilarity. Columns: Clustering Solution (k) = 3 to 20, 25, 30, 40, 50. Rows: Number of Returned Clusters, 1 to 50.
This graphic is a matrix of the “Average Dissimilarity” measure for each cluster from all examined clustering levels. Using conditional formatting, the cell color is based on the dissimilarity value of that cluster.
Cell Color Range:
Red-Orange: indicates a higher dissimilarity
Yellow-Green: indicates a lower dissimilarity

9 Red-Orange: indicates a higher average silhouette
Matrix of average silhouette widths. Columns: Clustering Solution (k) = 3 to 20, 25, 30, 40, 50. Rows: Number of Returned Clusters, 1 to 50.
This graphic is a matrix of the “Average Silhouette Widths”, obtained by averaging the silhouette widths of every sample within a given cluster. This is performed for each cluster in each of the clustering levels. The total average silhouette is the average of these values for each clustering level. Using conditional formatting, the cell color is based on the average silhouette width of that cluster.
Cell Color Range:
Red-Orange: indicates a higher average silhouette
Yellow-Green: indicates a lower average silhouette
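The averaging described here can be sketched in Python. The slides do not define the silhouette width itself, so the sketch assumes the standard definition s(i) = (b - a) / max(a, b), where a is the mean dissimilarity of sample i to the other members of its own cluster and b is the smallest mean dissimilarity of i to any other cluster. The data, labels, and function names are hypothetical. The overall figure is taken here as the mean over all samples, which coincides with the mean of the cluster averages only when the clusters are equal in size.

```python
def silhouette_widths(dissim, labels):
    """Silhouette width s(i) = (b - a) / max(a, b) for every sample.

    Assumes at least two clusters; samples in singleton clusters get 0.
    """
    clusters = {}
    for i, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(i)

    widths = []
    for i, lab in enumerate(labels):
        own = clusters[lab]
        if len(own) == 1:
            widths.append(0.0)
            continue
        # a: mean dissimilarity to the other members of the sample's own cluster
        a = sum(dissim[i][j] for j in own if j != i) / (len(own) - 1)
        # b: smallest mean dissimilarity to the members of any other cluster
        b = min(
            sum(dissim[i][j] for j in idx) / len(idx)
            for other, idx in clusters.items()
            if other != lab
        )
        denom = max(a, b)
        widths.append((b - a) / denom if denom > 0 else 0.0)
    return widths


def mean(values):
    values = list(values)
    return sum(values) / len(values)


# Hypothetical 1-D observations and cluster labels (not the presentation's data).
points = [1.0, 2.0, 3.0, 10.0, 11.0, 15.0]
labels = [0, 0, 0, 1, 1, 1]
dissim = [[abs(a - b) for b in points] for a in points]

widths = silhouette_widths(dissim, labels)

# One matrix cell per cluster: the average silhouette width of its samples.
cluster_avg = {
    lab: mean(w for w, l in zip(widths, labels) if l == lab)
    for lab in sorted(set(labels))
}
# Overall average silhouette width for this clustering level.
overall_avg = mean(widths)

print("per-cluster averages:", {k: round(v, 4) for k, v in cluster_avg.items()})
print("overall average:", round(overall_avg, 4))
```

Repeating this for each k and collecting `cluster_avg` column by column would give a matrix of the same shape as the one on this slide.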

10 This table is a way of interpreting the Average Silhouette Width for the entire solution, also known as the Silhouette Coefficient (SC).
Columns: Range of SC, Interpretation
A strong structure has been found
A reasonable structure has been found
The structure is weak and could be artificial
< 0.25 No substantial structure has been found
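The interpretation table can be applied to the overall average silhouette widths with a small Python lookup. Only the "< 0.25" boundary appears in this transcript; the 0.70 and 0.50 cut-offs below are an assumption based on the commonly cited Kaufman and Rousseeuw ranges and may not match the slide exactly. The function name is hypothetical, and the example values are the overall widths that survive in the slide 5 table.

```python
def interpret_silhouette_coefficient(sc):
    """Map a silhouette coefficient to the interpretation wording of the table.

    The 0.70 and 0.50 thresholds are assumed (commonly cited cut-offs);
    only the 0.25 boundary is stated in the transcript.
    """
    if sc > 0.70:
        return "A strong structure has been found"
    if sc > 0.50:
        return "A reasonable structure has been found"
    if sc >= 0.25:
        return "The structure is weak and could be artificial"
    return "No substantial structure has been found"


# Overall average silhouette widths that are legible in the slide 5 table.
overall_widths = {3: 0.26, 4: 0.21, 5: 0.19, 6: 0.18, 8: 0.17, 10: 0.16}

for k, sc in overall_widths.items():
    print(f"k={k}: SC={sc} -> {interpret_silhouette_coefficient(sc)}")
```

Under these assumed cut-offs, only the k = 3 solution rises above the "no substantial structure" tier, and only into the "weak" tier.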

