Clustering Procedure Cheng Lei Department of Electrical and Computer Engineering University of Victoria April 16, 2015.


1 Clustering Procedure Cheng Lei rexlei86@uvic.ca Department of Electrical and Computer Engineering University of Victoria April 16, 2015

2 Outline ❖ Overview ❖ CLUSTER Procedure ❖ Clustering Methods

3 Overview
❖ Data
  - Distances
  - Coordinates
❖ Clustering methods
  - 11 methods supported
❖ FASTCLUS Procedure
  - CPU time: proportional to the number of observations
  - Use FASTCLUS for a preliminary cluster analysis
  - Use CLUSTER to cluster the preliminary clusters hierarchically
❖ Principles (agglomerative hierarchical clustering)
  - Each observation begins in a cluster by itself
  - The two closest clusters are merged into a new cluster that replaces the two old ones
  - Repeat the merging step until only one cluster is left
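The three merging principles on this slide can be sketched in a few lines of plain Python. This is only an illustration, not how PROC CLUSTER is implemented: the function names `agglomerate` and `single_link`, the toy points, and the choice of single linkage with Euclidean distance are all assumptions made for the example.

```python
from itertools import combinations
from math import dist

def agglomerate(points, linkage):
    """Merge the two closest clusters until one is left; return the merge history."""
    clusters = [[p] for p in points]  # each observation begins in a cluster by itself
    history = []
    while len(clusters) > 1:
        # Find the pair of clusters (i < j) with the smallest linkage distance.
        i, j = min(combinations(range(len(clusters)), 2),
                   key=lambda ij: linkage(clusters[ij[0]], clusters[ij[1]]))
        d = linkage(clusters[i], clusters[j])
        history.append((clusters[i], clusters[j], d))
        merged = clusters[i] + clusters[j]
        # Replace the two old clusters with the merged one (pop j first since j > i).
        clusters.pop(j)
        clusters.pop(i)
        clusters.append(merged)
    return history

def single_link(a, b):
    """Assumed linkage for the demo: smallest Euclidean distance between members."""
    return min(dist(p, q) for p in a for q in b)

points = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.5)]
history = agglomerate(points, single_link)
for left, right, d in history:
    print(left, "+", right, "at distance", round(d, 3))
```

Each pass scans every pair of clusters, which is why the cost of this naive version grows much faster than the number of observations.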

4 Overview
❖ CLUSTER Procedure
  - Not practical for very large data sets, as CPU time is roughly proportional to the square or cube of the number of observations
  - Displays a history of the clustering process
  - Shows statistics for estimating the number of clusters
    - RMSSTD
    - Pseudo F
    - Pseudo T-square
  - Creates a dendrogram
  - Creates output data sets that the TREE procedure uses to output the cluster membership

5 CLUSTER Procedure
  PROC CLUSTER METHOD=method-name <options>;
     BY variables;
     COPY variables;
     FREQ variable;
     ID variable;
     RMSSTD variable;
     VAR variables;

6 Options
❖ RMSSTD
  - Root mean square standard deviation of a cluster
❖ Pseudo F
  - The ratio of between-cluster variance to within-cluster variance
❖ Pseudo T-square
  - A measure of the separation between the two clusters being merged into a new cluster

7 RMSSTD
$$\mathrm{RMSSTD}_k = \sqrt{\frac{W_k}{v\,(N_k - 1)}}$$
❖ $W_k$: the within-group sum of squares of cluster k
❖ $N_k$: the number of elements in cluster k
❖ $v$: the number of variables
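A small numeric check may help here. The sketch below assumes the usual definition of RMSSTD, the square root of the within-group sum of squares divided by (number of variables) × (cluster size − 1). The helper names `within_ss` and `rmsstd` and the toy cluster are hypothetical.

```python
from math import sqrt

def within_ss(cluster):
    """Within-group sum of squares: squared deviations from the cluster mean, over all variables."""
    n, v = len(cluster), len(cluster[0])
    means = [sum(row[j] for row in cluster) / n for j in range(v)]
    return sum((row[j] - means[j]) ** 2 for row in cluster for j in range(v))

def rmsstd(cluster):
    """Pooled standard deviation of all variables in one cluster (needs at least 2 members)."""
    n, v = len(cluster), len(cluster[0])
    return sqrt(within_ss(cluster) / (v * (n - 1)))

# Toy cluster with 3 elements and 2 variables (made-up values).
cluster = [(1.0, 2.0), (3.0, 2.0), (5.0, 8.0)]
value = rmsstd(cluster)
print(within_ss(cluster), value)
```

For this cluster the within-group sum of squares is 32, so the result is sqrt(32 / (2 × 2)) = sqrt(8).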

8 Pseudo F
$$F = \frac{B / (G - 1)}{W / (n - G)}$$
❖ $B$: the between-group sum of squares
❖ $W$: the within-group sum of squares
❖ $G$: the number of clusters at a certain step
❖ $n$: the number of observations
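The pseudo F statistic can be checked on a tiny example. The sketch assumes the common form, between-group sum of squares over (clusters − 1), divided by within-group sum of squares over (observations − clusters), with the between-group term taken as total minus within. The function names and data are illustrative only.

```python
def sum_sq(rows):
    """Sum of squared deviations from the mean, summed over all variables."""
    n, v = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(v)]
    return sum((r[j] - means[j]) ** 2 for r in rows for j in range(v))

def pseudo_f(clusters):
    """Pseudo F for a partition given as a list of clusters (lists of observations)."""
    all_rows = [r for c in clusters for r in c]
    n, g = len(all_rows), len(clusters)
    within = sum(sum_sq(c) for c in clusters)   # pooled within-group sum of squares
    between = sum_sq(all_rows) - within         # total minus within
    return (between / (g - 1)) / (within / (n - g))

# Two well-separated one-variable clusters (made-up values).
clusters = [[(0.0,), (2.0,)], [(10.0,), (12.0,)]]
f_value = pseudo_f(clusters)
print(f_value)
```

Here the total sum of squares is 104 and the within-group part is 4, so the statistic is (100 / 1) / (4 / 2) = 50. Large values indicate compact, well-separated clusters.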

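9 Pseudo T-Square
$$t^2 = \frac{B_{KL}}{(W_K + W_L) / (N_K + N_L - 2)}$$
❖ $W_K$, $W_L$: the within-cluster sums of squares of clusters K and L
❖ $N_K$, $N_L$: the numbers of observations in clusters K and L
❖ $B_{KL}$: the between-cluster sum of squares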
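As a rough illustration, the pseudo T-square for one merge can be computed from the quantities listed on this slide. The sketch assumes the between-cluster sum of squares is the increase in within-cluster sum of squares caused by the merge, and that it is divided by the pooled within-cluster sum of squares over (N_K + N_L − 2). Names and data are hypothetical.

```python
def sum_sq(rows):
    """Sum of squared deviations from the mean, summed over all variables."""
    n, v = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(v)]
    return sum((r[j] - means[j]) ** 2 for r in rows for j in range(v))

def pseudo_t2(cluster_k, cluster_l):
    """Pseudo T-square for merging clusters K and L (each a list of observations)."""
    w_k, w_l = sum_sq(cluster_k), sum_sq(cluster_l)
    # Between-cluster sum of squares: within SS of the merged cluster minus the two parts.
    b_kl = sum_sq(cluster_k + cluster_l) - w_k - w_l
    return b_kl / ((w_k + w_l) / (len(cluster_k) + len(cluster_l) - 2))

# Made-up one-variable clusters.
cluster_k = [(0.0,), (2.0,), (4.0,)]
cluster_l = [(10.0,), (12.0,)]
t2 = pseudo_t2(cluster_k, cluster_l)
print(round(t2, 2))
```

With these values W_K = 8, W_L = 2 and B_KL = 97.2, giving 97.2 / (10 / 3) = 29.16. A large value suggests that the two clusters being merged are quite distinct.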

10 METHODS
❖ Average Linkage (AVE or AVERAGE)
❖ Centroid Method (CEN or CENTROID)
❖ Complete Linkage (COM or COMPLETE)
❖ Density Linkage (DEN or DENSITY)
❖ Maximum Likelihood (EML)
❖ Flexible-Beta Method (FLE or FLEXIBLE)
❖ McQuitty’s Similarity Analysis (MCQ or MCQUITTY)
❖ Median Method (MED or MEDIAN)
❖ Single Linkage (SIN or SINGLE)
❖ Two-Stage Density Linkage (TWO or TWOSTAGE)
❖ Ward’s Minimum-Variance Method (WAR or WARD)

11 Average Linkage
❖ Idea: the distance between two clusters is defined as the average distance between pairs of observations, one in each cluster
$$D_{KL} = \frac{1}{N_K N_L} \sum_{i \in C_K} \sum_{j \in C_L} d(x_i, x_j)$$
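The average-linkage idea translates directly into code. This is a minimal sketch that assumes plain Euclidean distance between observations (the slide does not say which pairwise distance is used). The function name and sample points are made up.

```python
from math import dist

def average_linkage(cluster_k, cluster_l):
    """Average distance over all pairs with one observation from each cluster."""
    # Assumption: pairwise distance is plain (unsquared) Euclidean distance.
    total = sum(dist(p, q) for p in cluster_k for q in cluster_l)
    return total / (len(cluster_k) * len(cluster_l))

cluster_k = [(0.0, 0.0), (0.0, 4.0)]
cluster_l = [(3.0, 0.0), (3.0, 4.0)]
d_avg = average_linkage(cluster_k, cluster_l)
print(d_avg)
```

The four pairwise distances here are 3, 5, 5 and 3, so the cluster distance is 4.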

12 Centroid Method
❖ Idea: the distance between two clusters is defined as the Euclidean distance between their centroids (the cluster mean vectors)
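For comparison with average linkage, the centroid idea can be sketched as follows, assuming the cluster distance is measured between the cluster means. The helper names and points are illustrative, and some implementations work with the squared distance instead.

```python
from math import dist

def centroid(cluster):
    """Mean vector of a cluster given as a list of equal-length tuples."""
    n = len(cluster)
    return tuple(sum(p[j] for p in cluster) / n for j in range(len(cluster[0])))

def centroid_distance(cluster_k, cluster_l):
    """Euclidean distance between the two cluster centroids (unsquared; an assumption)."""
    return dist(centroid(cluster_k), centroid(cluster_l))

cluster_k = [(0.0, 0.0), (0.0, 4.0)]    # centroid (0, 2)
cluster_l = [(3.0, 0.0), (3.0, 12.0)]   # centroid (3, 6)
d_cen = centroid_distance(cluster_k, cluster_l)
print(d_cen)
```

Only the two means enter the calculation, so the spread of points inside each cluster does not affect the result.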

13 Next week’s work
❖ Do examples with the SAS base language
❖ Read more about other procedures in SAS/STAT

14 Thank You!!!

