Cluster Analysis
Charity Morgan
Functional Data Analysis
April 12, 2005
Sources
Everitt, B. S. (1979). Unresolved Problems in Cluster Analysis. Biometrics, 35.
Romesburg, H. C. (1984). Cluster Analysis for Researchers. Belmont: Lifetime Learning Publications.
Everitt, B. S., Landau, S., & Leese, M. (2001). Cluster Analysis. New York: Oxford University Press.
Outline
Motivation
Introduction
Method
  Measure Proximity
  Choose Clustering Method
    Hierarchical Clustering
    Optimization Clustering
  Select Best Clustering
Motivation – An Example
The dataset in this presentation comes from a paper on infant temperament [Stern, H. S., Arcus, D., Kagan, J., Rubin, D. B., & Snidman, N. (1995). Using Mixture Models in Temperament Research. International Journal of Behavioral Development, 18]. 76 infants were measured on 3 dimensions: motor activity (Motor), irritability (Cry), and fear response (Fear).
Motivation Given a data set, can we find natural groupings in the data? How can we decide how many groups exist? Could there be subgroups within the groups?
Introduction – What is Cluster Analysis?
Cluster analysis is a method to uncover groups in data. The group memberships of the data points are not known at the outset. Data points are placed into groups based on how “close” or “far apart” they are from each other.
Introduction – Examples of Cluster Analysis
Astronomy: Faundez-Abans et al. (1996) used cluster analysis to classify 192 planetary nebulae. Psychiatry: Pilowsky et al. (1969) clustered 200 patients, using their responses to a depression symptom questionnaire. Archaeology: Hodson (1971) used a clustering technique to group hand axes found in the British Isles.
Methods – Measurement of Proximity
Given $n$ individuals $X_1, \ldots, X_n$, where $X_i = (x_{i1}, \ldots, x_{ip})$, we will create a dissimilarity matrix $D$, where $d_{ij}$ is the distance between individual $i$ and individual $j$. There are many ways of defining distance.
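For example, with Euclidean distance the matrix D can be computed directly; below is a minimal sketch in Python (the data array X is hypothetical, standing in for any n × p dataset):

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform

# Hypothetical data: n = 4 individuals measured on p = 3 variables.
X = np.array([[1.0, 2.0, 0.5],
              [1.1, 1.9, 0.4],
              [5.0, 6.2, 3.3],
              [4.8, 6.0, 3.1]])

# pdist returns the condensed (upper-triangle) distances; squareform
# expands them into the full n x n dissimilarity matrix D, where
# D[i, j] is the Euclidean distance between individuals i and j.
D = squareform(pdist(X, metric="euclidean"))
print(np.round(D, 2))
```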
Methods – Hierarchical Clustering
Data is not partitioned into a set number of classes, but classification consists of a series of partitions. Results can be presented as a diagram known as a dendrogram. Can be agglomerative or divisive.
Agglomerative: first partition is n single member clusters; last partition is one cluster containing all n individuals. Divisive: first partition is one cluster containing all n individuals; last partition is n single member clusters.
Methods – Agglomerative Clustering Methods
Single Linkage (Nearest Neighbor) Distance between groups is defined as that of the closest pair of individuals. Only need proximity matrix, not the original data. Tends to produce unbalanced and straggly clusters, especially in large data sets.
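In symbols (a standard formulation consistent with the definition above, not taken verbatim from the slides), the single linkage distance between clusters $A$ and $B$ is

$$d_{AB} = \min_{i \in A,\; j \in B} d_{ij}.$$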
[Figures: single linkage worked example on a five-individual distance matrix]
Add individual 3 to the cluster containing individuals 4 and 5. Then merge the groups (1,2) and (3,4,5) into a single cluster.
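The distance matrix itself appeared only in the slide figures, so the following sketch uses an invented five-individual matrix, chosen so that single linkage reproduces the merge order just described:

```python
import numpy as np
from scipy.spatial.distance import squareform
from scipy.cluster.hierarchy import linkage

# Hypothetical distances for individuals 1-5 (the values from the
# original slides are not recoverable; these are chosen so the merges
# come out as in the text).
D = np.array([[0.0, 1.0, 6.0, 7.0, 8.0],
              [1.0, 0.0, 5.0, 7.5, 8.5],
              [6.0, 5.0, 0.0, 2.0, 2.5],
              [7.0, 7.5, 2.0, 0.0, 1.5],
              [8.0, 8.5, 2.5, 1.5, 0.0]])

# Single linkage merges (1,2), then (4,5), then adds 3 to (4,5),
# and finally joins (1,2) with (3,4,5).
Z = linkage(squareform(D), method="single")
print(Z)  # each row: the two clusters merged, merge height, new size
```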
Complete Linkage (Furthest Neighbor) Distance between groups is that of the furthest pair of individuals. Tends to find compact clusters with equal diameters. Centroid Clustering Distance between groups is the distance between their centers. Requires original data.
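In the same notation (again standard formulations, not from the original slides):

$$d_{AB}^{\text{complete}} = \max_{i \in A,\; j \in B} d_{ij}, \qquad d_{AB}^{\text{centroid}} = \lVert \bar{x}_A - \bar{x}_B \rVert^2,$$

where $\bar{x}_A$ and $\bar{x}_B$ are the group mean vectors (the centroid distance is sometimes taken unsquared).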
[Figures: worked example on the five-individual distance matrix]
Final step will merge clusters (1,2) with (3,4,5).
Ward’s Minimum Variance: at each stage, the objective is to fuse the two clusters whose merger keeps the increase in variance, or within-cluster sum of squares, as small as possible.
I.e., we want to minimize the increase in

$$E = \sum_{m=1}^{g} E_m, \qquad E_m = \sum_{l=1}^{n_m} \sum_{k=1}^{p} \left(x_{ml,k} - \bar{x}_{m,k}\right)^2,$$

where $\bar{x}_{m,k}$ is the mean of the mth cluster for the kth variable and $x_{ml,k}$ is the score on the kth variable for the lth object in the mth cluster.
Tends to find same-sized, spherical clusters. Sensitive to outliers. The most widely used agglomerative technique.
Methods – Divisive Clustering Methods
Can be computationally demanding if all $2^{k-1} - 1$ possible divisions of a cluster of k objects into two subclusters are considered at each stage. Less commonly used than agglomerative methods.
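For example, optimally splitting a single cluster of k = 20 objects already means examining $2^{19} - 1 = 524{,}287$ candidate divisions.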
Methods – Hierarchical Clustering of Motivating Example
Used a Euclidean distance matrix and Ward’s minimum variance technique.
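A minimal sketch of such an analysis in Python (the actual infant temperament measurements are not reproduced here, so a random 76 × 3 stand-in array is used):

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.cluster.hierarchy import linkage, dendrogram, fcluster

# Random stand-in for the 76 x 3 data (Motor, Cry, Fear);
# the real values from Stern et al. (1995) are not included here.
rng = np.random.default_rng(0)
X = rng.normal(size=(76, 3))

# scipy's Ward linkage works on the raw observations and uses
# Euclidean distances internally, matching the analysis described.
Z = linkage(X, method="ward")

dendrogram(Z)                                     # draw the hierarchy
plt.show()
labels = fcluster(Z, t=2, criterion="maxclust")   # cut into 2 groups
```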
Methods – Optimization Clustering
Assumes the number of clusters has already been fixed by the investigator. Basic idea: associated with each partition of the n individuals into the required number of groups, g, is an adequacy index c(n, g). This index is used to compare partitions.
Concepts of homogeneity and separation can be used to develop the adequacy index. Homogeneity: objects within a group should have a cohesive structure. Separation: groups should be well isolated from each other.
Methods – Optimization Clustering Criteria
Decompose the total dispersion matrix T, given by

$$T = \sum_{m=1}^{g} \sum_{l=1}^{n_m} (x_{ml} - \bar{x})(x_{ml} - \bar{x})^T,$$

where $x_{ml}$ is the observation vector for the lth object in the mth group and $\bar{x}$ is the overall mean vector, into T = W + B.
W is the within-group dispersion matrix, given by

$$W = \sum_{m=1}^{g} \sum_{l=1}^{n_m} (x_{ml} - \bar{x}_m)(x_{ml} - \bar{x}_m)^T,$$

and B is the between-group dispersion matrix, given by

$$B = \sum_{m=1}^{g} n_m (\bar{x}_m - \bar{x})(\bar{x}_m - \bar{x})^T,$$

where $\bar{x}_m$ is the mean vector of the mth group.
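These definitions are easy to verify numerically; the sketch below uses arbitrary data and an arbitrary partition (both hypothetical) to confirm that T = W + B:

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(20, 3))              # 20 individuals, p = 3 variables
labels = rng.integers(0, 2, size=20)      # an arbitrary 2-group partition

xbar = X.mean(axis=0)                     # overall mean vector
T = (X - xbar).T @ (X - xbar)             # total dispersion matrix

W = np.zeros((3, 3))                      # within-group dispersion
B = np.zeros((3, 3))                      # between-group dispersion
for m in np.unique(labels):
    Xm = X[labels == m]
    xbar_m = Xm.mean(axis=0)
    W += (Xm - xbar_m).T @ (Xm - xbar_m)
    B += len(Xm) * np.outer(xbar_m - xbar, xbar_m - xbar)

print(np.allclose(T, W + B))              # True: the decomposition holds
```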
Minimize trace(W). Equivalent to maximizing trace(B). Minimizes the sum of the squared Euclidean distances between individuals and their group mean. Also known as the k-means algorithm. Not scale-invariant and tends to find spherical clusters.
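scikit-learn's KMeans minimizes exactly this criterion; a minimal sketch (again with a hypothetical stand-in for the data):

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(2)
X = rng.normal(size=(76, 3))     # hypothetical stand-in for the data

km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print(km.labels_)    # group membership of each individual
print(km.inertia_)   # trace(W): within-cluster sum of squares
```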
Minimize det(W) Actually want to maximize det(T)/det(W), but T is the same for all possible partitions of n individuals into g groups. Can identify elliptical clusters and is scale-invariant. Tends to produce clusters that have an equal number of objects and are the same shape.
[Figure: clusters found by (a) minimizing trace(W) and (b) minimizing det(W)]
Minimize a criterion based on the dispersion matrices of the individual groups, where $W_m$, the dispersion matrix within the mth group, is given by

$$W_m = \sum_{l=1}^{n_m} (x_{ml} - \bar{x}_m)(x_{ml} - \bar{x}_m)^T.$$

Can produce clusters of different shapes. Not often used.
Methods – Optimization Clustering of Motivating Example
[Figures: optimization clustering results for the infant temperament data]
Methods – Choosing the Optimal Number of Clusters
Plot the clustering criterion against the number of groups and look for large changes in the plot. Choose g to maximize

$$C(g) = \frac{\operatorname{trace}(B)/(g-1)}{\operatorname{trace}(W)/(n-g)}.$$

Choose g to minimize $g^2 \det(W)$.
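A sketch of scanning both criteria over g (scikit-learn's calinski_harabasz_score computes C(g); the data array is again a hypothetical stand-in):

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import calinski_harabasz_score

rng = np.random.default_rng(3)
X = rng.normal(size=(76, 3))     # hypothetical stand-in for the data

for g in range(2, 7):
    labels = KMeans(n_clusters=g, n_init=10, random_state=0).fit_predict(X)
    C = calinski_harabasz_score(X, labels)    # C(g): choose g to maximize it
    W = np.zeros((3, 3))
    for m in range(g):
        Xm = X[labels == m]
        dev = Xm - Xm.mean(axis=0)
        W += dev.T @ dev                      # within-group dispersion
    print(g, round(C, 2), g**2 * np.linalg.det(W))  # g^2 det(W): minimize
```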
Hypothesis tests: let $J_1^2(m)$ be the within-cluster sum of squares of the mth cluster, and let $J_2^2(m)$ be the within-cluster sum of squares when the mth cluster is optimally divided in two. Reject the null hypothesis that the mth cluster is homogeneous if $L(m)$ exceeds the critical value of a standard normal, where

$$L(m) = \left(1 - \frac{J_2^2(m)}{J_1^2(m)} - \frac{2}{\pi p}\right)\left(\frac{n_m\, p}{2\left(1 - 8/(\pi^2 p)\right)}\right)^{1/2}$$

and $n_m$ is the number of objects in the mth cluster.
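A direct transcription of this statistic as reconstructed above (both the reconstruction and the numbers fed in are assumptions for illustration):

```python
import numpy as np

def L_statistic(J1_sq, J2_sq, n_m, p):
    """L(m) as given above: J1_sq is the within-cluster sum of squares
    of cluster m, J2_sq the sum of squares after its optimal two-way
    split, n_m the cluster size, p the number of variables."""
    term = 1.0 - J2_sq / J1_sq - 2.0 / (np.pi * p)
    scale = np.sqrt(n_m * p / (2.0 * (1.0 - 8.0 / (np.pi**2 * p))))
    return term * scale

# Hypothetical example: splitting drops the WSS from 100 to 60.
L = L_statistic(J1_sq=100.0, J2_sq=60.0, n_m=40, p=3)
print(L, L > 1.645)   # compare with the 5% standard normal critical value
```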
Let $S_g^2$ be the sum of squared deviations from cluster centroids when there are g clusters. A division of the n objects into $g_2$ clusters is significantly better than one into $g_1$ clusters ($g_2 > g_1$) if $F^*(g_1, g_2)$ exceeds the critical value of an F distribution with $p(g_2 - g_1)$ and $p(n - g_2)$ degrees of freedom, where

$$F^*(g_1, g_2) = \frac{\left(S_{g_1}^2 - S_{g_2}^2\right)/S_{g_2}^2}{\left(\frac{n - g_1}{n - g_2}\right)\left(\frac{g_2}{g_1}\right)^{2/p} - 1}.$$
The End