Cluster Analysis
Charity Morgan, Functional Data Analysis, April 12, 2005
Sources
Everitt, B. S. (1979). Unresolved problems in cluster analysis. Biometrics, 35, 169-181.
Romesburg, H. C. (1984). Cluster Analysis for Researchers. Belmont: Lifetime Learning Publications.
Everitt, B. S., Landau, S., & Leese, M. (2001). Cluster Analysis. New York: Oxford University Press.
Outline
Motivation
Introduction
Methods: measure proximity; choose a clustering method (hierarchical clustering or optimization clustering); select the best clustering
Motivation - An Example The dataset used in this presentation comes from a paper on infant temperament. [Stern, H. S., Arcus, D., Kagan, J., Rubin, D. B., & Snidman, N. (1995). Using mixture models in temperament research. International Journal of Behavioral Development, 18, 407-423.] 76 infants were measured on 3 dimensions: motor activity (Motor), irritability (Cry), and fear response (Fear).
Motivation – An Example [figure slide: plot of the infant temperament data]
Motivation Given a data set, can we find natural groupings in the data? How can we decide how many groups exist? Could there be subgroups within the groups?
Introduction – What is Cluster Analysis? Cluster analysis is a method to uncover groups in data. The group memberships of the data points are not known at the outset. Data points are placed into groups based on how “close” or “far apart” they are from each other.
Introduction – Examples of Cluster Analysis Astronomy: Faundez-Abans et al. (1996) used cluster analysis to classify 192 planetary nebulae. Psychiatry: Pilowsky et al. (1969) clustered 200 patients, using their responses to a depression symptom questionnaire. Archaeology: Hodson (1971) used a clustering technique to group hand axes found in the British Isles.
Methods – Measurement of Proximity Given n individuals $X_1, \ldots, X_n$, where $X_i = (x_{i1}, \ldots, x_{ip})$, we will create a dissimilarity matrix, D, where $d_{ij}$ is the distance between individual i and individual j. There are many ways of defining distance.
Methods – Measurement of Proximity [formula slide: definitions of distance measures, e.g. the Euclidean distance $d_{ij} = \left(\sum_{k=1}^{p} (x_{ik} - x_{jk})^2\right)^{1/2}$]
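As a minimal sketch (not from the original slides), such a dissimilarity matrix can be computed in Python with SciPy; the array X and the library choice are assumptions here:

```python
# Sketch: build a Euclidean dissimilarity matrix D for n individuals on p
# variables. pdist computes all pairwise distances in condensed form;
# squareform expands it to the full n-by-n matrix with D[i, j] = d_ij.
import numpy as np
from scipy.spatial.distance import pdist, squareform

rng = np.random.default_rng(0)
X = rng.normal(size=(76, 3))   # stand-in for 76 infants on 3 dimensions
D = squareform(pdist(X, metric="euclidean"))
# Swapping metric for "cityblock" or "minkowski" gives other definitions
# of distance between individuals.
```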
Methods – Hierarchical Clustering The data are not partitioned into a set number of classes; instead, the classification consists of a series of nested partitions. Results can be presented in a diagram known as a dendrogram. Methods can be agglomerative or divisive.
Methods – Hierarchical Clustering Agglomerative: first partition is n single member clusters; last partition is one cluster containing all n individuals. Divisive: first partition is one cluster containing all n individuals; last partition is n single member clusters.
Methods – Agglomerative Clustering Methods Single Linkage (Nearest Neighbor): distance between groups is defined as that of the closest pair of individuals. Only the proximity matrix is needed, not the original data. Tends to produce unbalanced, straggly clusters, especially in large data sets.
Methods – Agglomerative Clustering Methods [figure slides: single linkage worked example on five individuals]
Methods – Agglomerative Clustering Methods Add individual 3 to the cluster containing individuals 4 and 5. Then merge the groups (1,2) and (3,4,5) into a single cluster.
Methods – Agglomerative Clustering Methods [figure slide: dendrogram for the single linkage example]
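A sketch of single linkage in SciPy, using a hypothetical five-individual distance matrix chosen so the merge order matches the steps described above; only the proximity matrix is supplied, not the raw data:

```python
# Sketch: single linkage operates on the proximity matrix alone. Each row of
# Z records one merge: the two cluster indices, the merge height, and the
# size of the new cluster.
import numpy as np
from scipy.cluster.hierarchy import dendrogram, linkage
from scipy.spatial.distance import squareform

# Hypothetical distances between individuals 1-5 (symmetric, zero diagonal).
D = np.array([[0.0, 2.0, 6.0, 10.0, 9.0],
              [2.0, 0.0, 5.0, 9.0, 8.0],
              [6.0, 5.0, 0.0, 4.0, 5.0],
              [10.0, 9.0, 4.0, 0.0, 3.0],
              [9.0, 8.0, 5.0, 3.0, 0.0]])
Z = linkage(squareform(D), method="single")
print(Z)       # merges (1,2), then (4,5), then 3 with (4,5), then the rest
dendrogram(Z)  # renders the dendrogram when shown with matplotlib
```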
Methods – Agglomerative Clustering Methods Complete Linkage (Furthest Neighbor): distance between groups is that of the furthest pair of individuals; tends to find compact clusters of roughly equal diameter. Centroid Clustering: distance between groups is the distance between their centroids; requires the original data, not just the proximity matrix.
Methods – Agglomerative Clustering Methods [figure slides: complete linkage and centroid clustering worked example]
Methods – Agglomerative Clustering Methods Final step will merge clusters (1,2) with (3,4,5).
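A sketch contrasting complete linkage with centroid clustering in SciPy; the ten 2-D points are assumptions. Note the centroid method is handed raw coordinates, reflecting its need for the original data:

```python
# Sketch: complete linkage merges on the furthest pair between groups;
# centroid clustering merges on the distance between group centroids and
# needs the original (Euclidean) coordinates, not just a proximity matrix.
import numpy as np
from scipy.cluster.hierarchy import linkage

rng = np.random.default_rng(1)
X = rng.normal(size=(10, 2))                # hypothetical observations

Z_complete = linkage(X, method="complete")
Z_centroid = linkage(X, method="centroid")  # requires Euclidean coordinates
print(Z_complete[:3])                       # the first few merges generally
print(Z_centroid[:3])                       # differ between the criteria
```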
Methods – Agglomerative Clustering Methods Ward’s Minimum Variance: at each stage, fuse the two clusters whose merger produces the smallest increase in the within-cluster sum of squares (the variance criterion).
Methods – Agglomerative Clustering Methods i.e., we want to minimize the increase in E, where $E = \sum_{m=1}^{g} E_m$ and $E_m = \sum_{l=1}^{n_m} \sum_{k=1}^{p} (x_{ml,k} - \bar{x}_{m,k})^2$; $\bar{x}_{m,k}$ is the mean of the mth cluster for the kth variable, and $x_{ml,k}$ is the score on the kth variable for the lth object in the mth cluster.
Methods – Agglomerative Clustering Methods Tends to find equal-sized, spherical clusters. Sensitive to outliers. The most widely used agglomerative technique.
Methods – Divisive Clustering Methods Can be computationally demanding if all $2^{k-1} - 1$ possible divisions of a cluster of k objects into two subclusters are considered at each stage. Less commonly used than agglomerative methods.
Methods – Hierarchical Clustering of Motivating Example Used a Euclidean distance matrix and Ward’s minimum variance technique.
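A sketch of this analysis, assuming the 76 × 3 temperament scores sit in an array X (a random placeholder below, since the original data are not reproduced here):

```python
# Sketch: Ward's minimum-variance method on Euclidean distances, then the
# tree is cut into a chosen number of groups.
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage

rng = np.random.default_rng(2)
X = rng.normal(size=(76, 3))      # placeholder for Motor, Cry, Fear scores

Z = linkage(X, method="ward")     # Ward's method; Euclidean by definition
labels = fcluster(Z, t=3, criterion="maxclust")  # e.g., cut into 3 groups
print(np.bincount(labels)[1:])    # resulting cluster sizes
```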
Methods – Optimization Clustering Assumes the number of clusters has already been fixed by the investigator. Basic idea: associated with each partition of the n individuals into the required number of groups, g, is an adequacy index c(n,g). This index is used to compare partitions.
Methods – Optimization Clustering Concepts of homogeneity and separation can be used to develop the adequacy index. Homogeneity: objects within a group should have a cohesive structure. Separation: groups should be well isolated from each other.
Methods – Optimization Clustering Criteria Decompose the total dispersion matrix, T, given by $T = \sum_{m=1}^{g} \sum_{l=1}^{n_m} (x_{ml} - \bar{x})(x_{ml} - \bar{x})'$, into T = W + B.
Methods – Optimization Clustering Criteria W is the within-group dispersion matrix, given by $W = \sum_{m=1}^{g} \sum_{l=1}^{n_m} (x_{ml} - \bar{x}_m)(x_{ml} - \bar{x}_m)'$. B is the between-group dispersion matrix, given by $B = \sum_{m=1}^{g} n_m (\bar{x}_m - \bar{x})(\bar{x}_m - \bar{x})'$.
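A numerical check (an added sketch, not from the slides) that the decomposition holds for any partition; the data and labels below are arbitrary:

```python
# Sketch: verify T = W + B. T pools squared deviations about the grand mean,
# W pools them about the group means, and B weights squared group-mean
# deviations by the group sizes n_m.
import numpy as np

rng = np.random.default_rng(3)
X = rng.normal(size=(30, 2))
labels = np.arange(30) % 3                 # arbitrary partition, g = 3

grand = X.mean(axis=0)
T = (X - grand).T @ (X - grand)
W = np.zeros((2, 2))
B = np.zeros((2, 2))
for m in range(3):
    Xm = X[labels == m]
    mean_m = Xm.mean(axis=0)
    W += (Xm - mean_m).T @ (Xm - mean_m)
    B += len(Xm) * np.outer(mean_m - grand, mean_m - grand)
print(np.allclose(T, W + B))               # True
```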
Methods – Optimization Clustering Criteria Minimize trace(W): equivalent to maximizing trace(B), since T = W + B is fixed. Minimizes the sum of the squared Euclidean distances between individuals and their group mean. Also known as the k-means criterion. Not scale-invariant, and tends to find spherical clusters.
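A sketch of this criterion via scikit-learn's KMeans (the library choice and data are assumptions); its inertia_ attribute is exactly trace(W):

```python
# Sketch: k-means minimizes trace(W), the sum of squared Euclidean distances
# from each individual to its group mean; KMeans reports this as inertia_.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(4)
X = rng.normal(size=(76, 3))                   # placeholder data

km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
print(km.labels_[:10])                         # group memberships
print(km.inertia_)                             # trace(W) for this partition
# Because the criterion is not scale-invariant, rescaling one variable can
# change the solution; standardizing the data first is common.
```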
Methods – Optimization Clustering Criteria Minimize det(W): actually want to maximize det(T)/det(W), but T is the same for all partitions of the n individuals into g groups, so this is equivalent to minimizing det(W). Can identify elliptical clusters and is scale-invariant. Tends to produce clusters that contain equal numbers of objects and have the same shape.
Methods – Optimization Clustering Criteria [figure: solutions under (a) trace(W) and (b) det(W)]
Methods – Optimization Clustering Criteria Minimize a criterion built from the separate within-group matrices, e.g. $\sum_{m=1}^{g} n_m \log \lvert W_m / n_m \rvert$, where $W_m$, the dispersion matrix within the mth group, is given by $W_m = \sum_{l=1}^{n_m} (x_{ml} - \bar{x}_m)(x_{ml} - \bar{x}_m)'$. Can produce clusters of different shapes. Not often used.
Methods – Optimization Clustering of Motivating Example [figure slides: optimization clustering results for the infant temperament data]
Methods – Choosing the Optimal Number of Clusters Plot the clustering criterion against the number of groups and look for large changes in the plot. Choose g to maximize C(g), where $C(g) = \dfrac{\operatorname{trace}(B)/(g-1)}{\operatorname{trace}(W)/(n-g)}$. Or choose g to minimize $g^2 \det(W)$.
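C(g) as written is the Calinski-Harabasz index, which scikit-learn exposes directly; a sketch of scanning candidate values of g (data and library choice assumed):

```python
# Sketch: compute C(g) for the k-means partition at each candidate g and
# look for the maximum.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import calinski_harabasz_score

rng = np.random.default_rng(5)
X = rng.normal(size=(76, 3))                   # placeholder data

for g in range(2, 7):
    labels = KMeans(n_clusters=g, n_init=10, random_state=0).fit_predict(X)
    print(g, calinski_harabasz_score(X, labels))
```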
Methods – Choosing the Optimal Number of Clusters Hypothesis tests: let $J_1^2(m)$ be the within-cluster sum of squares of the mth cluster, and let $J_2^2(m)$ be the within-cluster sum of squares when the mth cluster is optimally divided in two. Reject the null hypothesis that the mth cluster (of size $n_m$) is homogeneous if L(m) exceeds the critical value of a standard normal, where $L(m) = \left[1 - \frac{J_2^2(m)}{J_1^2(m)} - \frac{2}{\pi p}\right] \left[\frac{n_m p}{2\left(1 - 8/(\pi^2 p)\right)}\right]^{1/2}$.
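A sketch implementing the statistic above (commonly known as the Duda-Hart rule) as a hypothetical helper; the caller supplies $J_1^2(m)$, $J_2^2(m)$, the cluster size $n_m$, and the dimension p:

```python
# Sketch (hypothetical helper): the test statistic L(m) from the formula
# above; reject homogeneity of cluster m if L(m) exceeds the standard
# normal critical value.
import math

def duda_hart_L(J1_sq, J2_sq, n_m, p):
    term = 1.0 - J2_sq / J1_sq - 2.0 / (math.pi * p)
    scale = math.sqrt(n_m * p / (2.0 * (1.0 - 8.0 / (math.pi**2 * p))))
    return term * scale
```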
Methods – Choosing the Optimal Number of Clusters Let $S_g^2$ be the sum of squared deviations from cluster centroids when there are g clusters. A division of the n objects into $g_2$ clusters is significantly better than one into $g_1$ clusters ($g_2 > g_1$) if $F^*(g_1, g_2)$ exceeds the critical value of an F distribution with degrees of freedom $p(g_2 - g_1)$ and $p(n - g_2)$, where $F^*(g_1, g_2) = \dfrac{(S_{g_1}^2 - S_{g_2}^2)/S_{g_2}^2}{\left(\frac{n - g_1}{n - g_2}\right)\left(\frac{g_2}{g_1}\right)^{2/p} - 1}$.
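A sketch implementing $F^*(g_1, g_2)$ as a hypothetical helper mirroring the formula above (this criterion is usually attributed to Beale):

```python
# Sketch (hypothetical helper): the F statistic above, comparing g1- and
# g2-cluster solutions; compare the result to an F(p*(g2 - g1), p*(n - g2))
# critical value.
def f_star(S1_sq, S2_sq, n, p, g1, g2):
    numerator = (S1_sq - S2_sq) / S2_sq
    denominator = ((n - g1) / (n - g2)) * (g2 / g1) ** (2.0 / p) - 1.0
    return numerator / denominator
```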
The End