Clustering

Jagdish Gangolly
State University of New York at Albany
Acc 522, Fall 2001
Clustering

Clustering in S-Plus
Objectives of Clustering
Methods:
    Hierarchical
    Partitioning (iterative relocation)
    Model-based methods
Clustering in S-Plus

You need to load the S-Plus cluster library:

    library(cluster)

Data can be in one of two forms: an n × p matrix of measurements on each of the p variables for each of the n objects, or an n × n matrix of dissimilarities, where entry d(i,j) represents the dissimilarity between object i and object j. The function daisy in the cluster library constructs the dissimilarity matrix from a measurement matrix.
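A minimal sketch of constructing a dissimilarity matrix with daisy; the toy two-group matrix x is invented for illustration (the same calls work in R's cluster package):

    library(cluster)
    # hypothetical data: 20 objects on 2 variables, two well-separated groups
    set.seed(1)
    x <- rbind(matrix(rnorm(20, mean = 0), ncol = 2),
               matrix(rnorm(20, mean = 5), ncol = 2))
    d <- daisy(x, metric = "euclidean", stand = F)   # n × n dissimilarities

The resulting d can be passed to the clustering routines below with diss = T instead of the raw measurement matrix.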
Objectives of Clustering

To classify a data set into groups that are internally cohesive and externally isolated (loosely coupled).
The analyst must choose:
    dataset (matrix, data frame)
    distance measure
    optimisation criterion
    number of clusters (partitioning methods)
    shape of clusters, probability distribution (model-based methods)
Clustering methods: Hierarchical I

Hierarchical Methods:
    Agglomerative methods: Start with each observation forming a separate group. Observations close to each other are successively merged. The results are displayed in the form of a dendrogram.
    Divisive methods: Start with a single cluster containing the whole dataset, which is successively split into two smaller clusters until each cluster contains exactly one object.
Clustering methods: Hierarchical II

Agglomerative Nesting: agnes(x, diss, metric, stand, method, …)
Methods:
    average (group average)
    single (linkage), nearest-neighbour method
    complete (linkage), furthest-neighbour method
    ward (Ward's method)
    weighted (weighted average linkage)
Evaluation criterion: agglomerative coefficient (AC)
Results display: dendrogram, banner plot
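A sketch of agnes on the toy matrix x from the daisy example; the method and stand settings are illustrative choices, not the only ones:

    ag <- agnes(x, diss = F, metric = "euclidean",
                stand = T, method = "average")
    ag$ac        # agglomerative coefficient (AC)
    plot(ag)     # plot.agnes: banner plot and dendrogram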
Clustering methods: Hierarchical III

hclust: hierarchical clustering
hclust(dist, method, sim)
    dist: distances
    method:
        compact (complete linkage)
        average
        connected (single linkage)
Results are displayed using plclust.
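A minimal sketch using the S-Plus method names above; note that R's hclust instead uses "complete", "average", and "single", and in modern R plot(hc) replaces plclust:

    d  <- dist(x)                        # Euclidean distance matrix
    hc <- hclust(d, method = "compact")  # complete linkage
    plclust(hc)                          # draw the dendrogram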
Clustering methods: Hierarchical IV

Divisive Analysis: diana(x, diss, metric, stand, …)
    Evaluation criterion: divisive coefficient (DC)
    Results display: dendrogram, banner plot
Monothetic Analysis: mona(x)
    For a binary data matrix. For each split, mona uses a single (well-chosen) variable.
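A sketch of both divisive routines; x is the toy matrix from earlier, and xb is a hypothetical binary matrix made up solely so mona has suitable input:

    di <- diana(x, metric = "euclidean", stand = T)
    di$dc        # divisive coefficient (DC)
    plot(di)     # plot.diana: banner plot and dendrogram

    # hypothetical binary data: 10 objects, 4 binary variables
    xb <- matrix(sample(0:1, 40, replace = T), ncol = 4)
    mn <- mona(xb)
    print(mn)    # each split uses a single well-chosen variable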
Clustering methods: Partitioning Methods I

Methods for dividing the set of objects into k clusters; k must be specified by the user.
Partitioning Around Medoids (pam): unlike k-means, it accepts a dissimilarity matrix, minimises a sum of dissimilarities (rather than a sum of squared distances) and so is more robust, and it displays a silhouette plot.
pam(data, k, diss, metric, stand, …)
    data: matrix or data frame
    diss: T or F
    metric: euclidean or manhattan
    stand: T or F
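A minimal pam sketch on the toy data; k = 2 is chosen purely for illustration:

    pm <- pam(x, k = 2, diss = F, metric = "euclidean", stand = T)
    pm$medoids   # the k representative objects
    plot(pm)     # plot.pam, including the silhouette plot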
Clustering methods: Partitioning Methods II

Clustering Large Applications: clusters very large datasets by working on data subsets of fixed size.
clara(x, k, metric, stand, samples, sampsize, …)
Fuzzy clustering: fanny(x, k, diss, metric, stand, …)
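A sketch of both; the samples and sampsize values are illustrative only, and fanny's membership matrix holds the fuzzy coefficients:

    cl <- clara(x, k = 2, metric = "euclidean", stand = T,
                samples = 5, sampsize = 10)
    cl$medoids       # medoids found from the best subsample
    fn <- fanny(x, k = 2, metric = "euclidean")
    fn$membership    # degree of membership of each object in each cluster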
Clustering: Displays of Results

Dendrograms: plot.agnes(), plot.diana(), plot.mona()
Print: print.agnes(), print.diana(), print.mona(), print.pam(), print.fanny(), print.clara()
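These are methods for the generic plot and print functions, so in practice you call the generics and dispatch selects the right method; for example, with the agnes result from above:

    plot(ag)     # dispatches to plot.agnes
    print(ag)    # dispatches to print.agnes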