Published by Julian Sullivan. Modified over 9 years ago.
1
Clustering Procedure
Cheng Lei
rexlei86@uvic.ca
Department of Electrical and Computer Engineering
University of Victoria
April 16, 2015
2
Outline
❖ Overview
❖ CLUSTER Procedure
❖ Clustering Methods
3
Overview
❖ Data: distances or coordinates
❖ Clustering methods: 11 methods supported
❖ FASTCLUS Procedure
    ❖ CPU time is proportional to the number of observations
    ❖ Use FASTCLUS for a preliminary cluster analysis
    ❖ Use CLUSTER to cluster the preliminary clusters hierarchically
❖ Principles
    ❖ Each observation begins in a cluster by itself
    ❖ The two closest clusters are merged to form a new cluster that replaces the two old ones
    ❖ The merging step is repeated until only one cluster is left
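The three principles can be sketched in a few lines of Python. This is only an illustration of the merging loop, not how PROC CLUSTER is implemented: the function names `single_link` and `agglomerate` are hypothetical, and "closest" is assumed here to mean the smallest point-to-point Euclidean distance (single linkage).

```python
from itertools import combinations
from math import dist


def single_link(a, b):
    """Smallest point-to-point distance between clusters a and b (assumed criterion)."""
    return min(dist(p, q) for p in a for q in b)


def agglomerate(points, cluster_distance=single_link):
    """Merge the two closest clusters until one is left; return the merge history."""
    clusters = [[tuple(p)] for p in points]  # each observation starts in its own cluster
    history = []
    while len(clusters) > 1:
        # Find the indices of the closest pair among the current clusters.
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: cluster_distance(clusters[ij[0]], clusters[ij[1]]),
        )
        d = cluster_distance(clusters[i], clusters[j])
        history.append((clusters[i], clusters[j], d))
        # Replace the two old clusters with the merged one.
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return history


points = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.5)]
for left, right, d in agglomerate(points):
    print(left, "+", right, "at distance", round(d, 3))
```

With n observations the loop always records n - 1 merges. The repeated search over all cluster pairs is also why this naive form scales poorly as the data grow.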
4
Overview
❖ CLUSTER Procedure
    ❖ Not practical for very large data sets, as CPU time is roughly proportional to the square or cube of the number of observations
    ❖ Displays a history of the clustering process
    ❖ Shows statistics for estimating the number of clusters: RMSSTD, Pseudo F, Pseudo T-square
    ❖ Creates a dendrogram
    ❖ Creates output data sets that the TREE procedure uses to output the cluster membership
5
CLUSTER Procedure
PROC CLUSTER METHOD=method-name;
    BY variables;
    COPY variables;
    FREQ variable;
    ID variable;
    RMSSTD variable;
    VAR variables;
6
Options
❖ RMSSTD: root mean squared standard deviation of a cluster
❖ Pseudo F: the ratio of between-cluster variance to within-cluster variance
❖ Pseudo T-square: a measure of how well separated two clusters are at the step where they are merged into a new cluster
7
RMSSTD
$$\mathrm{RMSSTD}_k = \sqrt{\frac{W_k}{v\,(N_k - 1)}}$$
❖ $W_k$: the within-group sum of squares of cluster $k$
❖ $N_k$: the number of elements in cluster $k$
❖ $v$: the number of variables
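As a worked illustration, the sketch below computes RMSSTD for one cluster, assuming the usual definition: the within-group sum of squares divided by (number of variables) × (number of elements − 1), then square-rooted. The helper names `within_ss` and `rmsstd` are hypothetical, and returning 0.0 for a single-element cluster is a convention chosen for this sketch only.

```python
from math import sqrt


def within_ss(cluster):
    """Sum of squared deviations from the cluster mean, over all variables."""
    n = len(cluster)
    v = len(cluster[0])
    means = [sum(row[j] for row in cluster) / n for j in range(v)]
    return sum((row[j] - means[j]) ** 2 for row in cluster for j in range(v))


def rmsstd(cluster):
    """Root mean squared standard deviation of one cluster (assumed definition)."""
    n, v = len(cluster), len(cluster[0])
    if n < 2:
        return 0.0  # assumption of this sketch: no spread in a singleton
    return sqrt(within_ss(cluster) / (v * (n - 1)))


cluster = [(1.0, 2.0), (3.0, 4.0), (5.0, 6.0)]
print(rmsstd(cluster))  # within SS = 16, v = 2, n - 1 = 2, so sqrt(16 / 4) = 2.0
```

Smaller values indicate a more homogeneous cluster, so a sharp rise in RMSSTD after a merge suggests that two dissimilar clusters were joined.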
8
Pseudo F
$$F = \frac{B / (G - 1)}{W / (n - G)}$$
❖ $B$: the between-group sum of squares
❖ $W$: the within-group sum of squares
❖ $G$: the number of clusters at a certain step
❖ $n$: the number of observations
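The pseudo F statistic can be checked by hand on a small partition. The sketch below assumes the standard form, (between-group SS / (clusters − 1)) divided by (within-group SS / (observations − clusters)), with the between-group sum of squares obtained as total minus within. The function names are hypothetical.

```python
def mean_vector(rows):
    """Coordinate-wise mean of a list of equal-length tuples."""
    n = len(rows)
    return [sum(r[j] for r in rows) / n for j in range(len(rows[0]))]


def sum_sq(rows, center):
    """Sum of squared deviations of rows from a given center."""
    return sum((r[j] - center[j]) ** 2 for r in rows for j in range(len(center)))


def pseudo_f(clusters):
    """Pseudo F for a partition; needs 1 < clusters < observations and within SS > 0."""
    all_rows = [r for c in clusters for r in c]
    n, g = len(all_rows), len(clusters)
    total = sum_sq(all_rows, mean_vector(all_rows))
    within = sum(sum_sq(c, mean_vector(c)) for c in clusters)
    between = total - within
    return (between / (g - 1)) / (within / (n - g))


partition = [[(0.0, 0.0), (0.0, 2.0)], [(10.0, 0.0), (10.0, 2.0)]]
print(pseudo_f(partition))  # total SS = 104, within = 4, between = 100, so (100/1)/(4/2) = 50.0
```

Large values point to compact, well-separated clusters. In practice the statistic is compared across different numbers of clusters rather than read in isolation.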
9
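Pseudo T-Square
$$t^2 = \frac{B_{KL}}{(W_K + W_L) / (N_K + N_L - 2)}$$
❖ $W_K$, $W_L$: within-cluster sums of squares of clusters $K$ and $L$
❖ $N_K$, $N_L$: numbers of observations in clusters $K$ and $L$
❖ $B_{KL}$: between-cluster sum of squares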
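A small numeric sketch of the pseudo T-square for one merge follows. It assumes the standard form, in which the between-cluster sum of squares is the increase in within-cluster sum of squares caused by the merge, divided by the pooled within-cluster sum of squares over (observations in both clusters − 2). The function names are hypothetical.

```python
def within_ss(cluster):
    """Sum of squared deviations from the cluster mean, over all variables."""
    n = len(cluster)
    v = len(cluster[0])
    means = [sum(row[j] for row in cluster) / n for j in range(v)]
    return sum((row[j] - means[j]) ** 2 for row in cluster for j in range(v))


def pseudo_t2(k, l):
    """Pseudo T-square for merging clusters k and l (assumed definition)."""
    w_k, w_l = within_ss(k), within_ss(l)
    # Between-cluster SS: growth in within-cluster SS when k and l are merged.
    b_kl = within_ss(k + l) - w_k - w_l
    return b_kl / ((w_k + w_l) / (len(k) + len(l) - 2))


cluster_k = [(1.0,), (3.0,)]
cluster_l = [(6.0,), (8.0,), (10.0,)]
print(round(pseudo_t2(cluster_k, cluster_l), 4))
```

Here the within sums of squares are 2 and 8, the merged cluster has 53.2, so the between-cluster term is 43.2 and the statistic is 43.2 / (10 / 3) = 12.96. A large value at a given step suggests the two clusters should not have been merged.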
10
METHODS
❖ Average Linkage (AVE or AVERAGE)
❖ Centroid Method (CEN or CENTROID)
❖ Complete Linkage (COM or COMPLETE)
❖ Density Linkage (DEN or DENSITY)
❖ Maximum Likelihood (EML)
❖ Flexible-Beta Method (FLE or FLEXIBLE)
❖ McQuitty’s Similarity Analysis (MCQ or MCQUITTY)
❖ Median Method (MED or MEDIAN)
❖ Single Linkage (SIN or SINGLE)
❖ Two-Stage Density Linkage (TWO or TWOSTAGE)
❖ Ward’s Minimum-Variance Method (WAR or WARD)
11
Average Linkage
Idea: The distance between two clusters is defined as the average distance over all pairs of observations, one from each cluster
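The idea translates directly into code. The sketch below is a minimal illustration with a hypothetical function name; it assumes plain Euclidean distance between observations and takes the pairwise distance as a parameter, since software may work with squared distances instead.

```python
from math import dist


def average_linkage(a, b, pair_distance=dist):
    """Average distance over all pairs with one observation from each cluster."""
    total = sum(pair_distance(p, q) for p in a for q in b)
    return total / (len(a) * len(b))


cluster_a = [(0.0, 0.0), (0.0, 3.0)]
cluster_b = [(4.0, 0.0)]
# The two pairwise distances are 4 and 5, so their average is 4.5.
print(average_linkage(cluster_a, cluster_b))
```

Because every pair contributes, a single outlying observation shifts the result less than it would under single or complete linkage.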
12
Centroid Method
Idea: The distance between two clusters is defined as the (squared) Euclidean distance between their centroids, i.e. their mean vectors
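A minimal sketch of the centroid method's distance follows, assuming each cluster is represented by its mean vector. The function names are hypothetical. The plain Euclidean distance is returned; squaring it gives the squared form that some software uses.

```python
from math import dist


def centroid(cluster):
    """Coordinate-wise mean of the observations in a cluster."""
    n = len(cluster)
    return tuple(sum(row[j] for row in cluster) / n for j in range(len(cluster[0])))


def centroid_distance(a, b):
    """Euclidean distance between the centroids of clusters a and b."""
    return dist(centroid(a), centroid(b))


cluster_a = [(0.0, 0.0), (2.0, 0.0)]   # centroid (1, 0)
cluster_b = [(4.0, 3.0), (4.0, 5.0)]   # centroid (4, 4)
d = centroid_distance(cluster_a, cluster_b)
print(d, d ** 2)  # distance and squared distance between the centroids
```

Unlike average linkage, only the two mean vectors enter the calculation, so the spread of observations inside each cluster has no effect on the distance.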
13
Next week’s work
❖ Do examples with the SAS base language
❖ Read more about other procedures in SAS/STAT
14
Thank You!!!