Cluster Analysis.

Presentation on theme: "Cluster Analysis."— Presentation transcript:

1 Cluster Analysis

2 Introduction
Goal: Group individual units into subsets (clusters) of similar units based on observed variables
Groups are not known in advance (unsupervised learning)
Groupings are made in terms of similarities/distances of variables between individual units

3 Similarity Measures

4 Similarity Coefficients for Binary Outcomes on 2 Units

5 Example – Diversity of Artifacts at 8 Canadian Forts

6 Similarity and Distance Measures

7 Similarity and Association for Variables
Binary Variables

8 Example – Diversity of Artifacts at 8 Canadian Forts

9 Hierarchical Clustering Methods
Agglomerative Methods – Begin with individual units or variables and combine until a single cluster remains
Linking Strategies for Combining Clusters:
Single Linkage – Minimum distance between objects in clusters
Complete Linkage – Maximum distance between objects in clusters
Average Linkage – Mean distance between objects in clusters
Divisive Methods – Begin with a single cluster and split apart until each object is its own cluster
Dendrogram – 2-dimensional diagram of the process
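
The three linking strategies differ only in how the object-to-object distances between two clusters are summarized (minimum, maximum, or mean). A minimal sketch of an agglomerative pass under that reading follows. The function names `cluster_distance` and `agglomerate` and the four-object distance table `toy` are illustrative assumptions, not taken from the slides.

```python
from itertools import combinations
from statistics import mean

# Reducers for the three linking strategies: min, max, and mean of the
# object-to-object distances between two clusters.
LINKAGES = {"single": min, "complete": max, "average": mean}


def cluster_distance(dist, a, b, how):
    """Distance between clusters a and b (tuples of labels) under a linkage."""
    pairwise = [dist[frozenset((i, j))] for i in a for j in b]
    return LINKAGES[how](pairwise)


def agglomerate(labels, dist, how="single"):
    """Merge the two closest clusters until one remains; return the merge log."""
    clusters = [(label,) for label in labels]
    history = []
    while len(clusters) > 1:
        a, b = min(
            combinations(clusters, 2),
            key=lambda pair: cluster_distance(dist, pair[0], pair[1], how),
        )
        history.append((a, b, cluster_distance(dist, a, b, how)))
        clusters = [c for c in clusters if c not in (a, b)] + [a + b]
    return history


# Hypothetical distances between four objects (made up for illustration).
toy = {
    frozenset(("A", "B")): 1.0,
    frozenset(("A", "C")): 4.0,
    frozenset(("A", "D")): 6.0,
    frozenset(("B", "C")): 3.0,
    frozenset(("B", "D")): 7.0,
    frozenset(("C", "D")): 2.0,
}

for how in LINKAGES:
    print(how, agglomerate("ABCD", toy, how))
```

With this toy table all three strategies merge A with B and then C with D; only the height of the final merge differs (3.0 for single, 7.0 for complete, 5.0 for average). The merge log is the information a dendrogram draws.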

10 Example – Clustering of 5 WNBA Players
n = 5 Players (Angel McCoughtry, Candace Parker, Maya Moore, Skylar Diggins, Tina Charles)
p = 3 Variables (Rebounds, Assists, Points, each per 36 minutes played)
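
The clustering steps work from a table of pairwise distances between units. The sketch below builds such a table from n units measured on p = 3 variables. It assumes Euclidean distance, which the transcript does not confirm. The unit names and values are hypothetical placeholders, not the players' actual statistics.

```python
import math

# Hypothetical per-36-minute values (rebounds, assists, points); these are
# placeholders for illustration only, not real player statistics.
units = {
    "U1": (5.0, 3.0, 20.0),
    "U2": (8.0, 3.0, 16.0),
    "U3": (5.0, 6.0, 20.0),
}


def euclidean(x, y):
    """Euclidean distance between two equal-length tuples of values."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def distance_matrix(data):
    """Upper-triangle distance table keyed by (unit_i, unit_j)."""
    names = list(data)
    return {
        (i, j): euclidean(data[i], data[j])
        for idx, i in enumerate(names)
        for j in names[idx + 1:]
    }


for pair, d in distance_matrix(units).items():
    print(pair, round(d, 3))
```

If the variables are on very different scales, standardizing each one before computing distances is a common choice. Whether the slides did so is not stated.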

11 Clustering of 5 WNBA Players – Single Linkage
Step 1: Closest 2 are AM, CP => Combine AM/CP
Step 2:
dM,(AC) = min( , ) =
dS,(AC) = min( , ) =
dT,(AC) = min( , ) =
Smallest distance in table => Combine S with AC (ACS)
Step 3:
dM,(ACS) = min( , , ) =
dT,(ACS) = min( , , ) =
Smallest distance in table => Combine M with ACS (ACMS)
Step 4: Add T (ACMST)

12

13 Clustering of 5 WNBA Players – Complete Linkage
Step 1: Closest 2 are AM, CP => Combine AM/CP
Step 2:
dM,(AC) = max( , ) =
dS,(AC) = max( , ) =
dT,(AC) = max( , ) =
Smallest distance in table => Combine M with AC (ACM)
Step 3:
dS,(ACM) = max( , , ) =
dT,(ACM) = max( , , ) =
Smallest distance in table => Combine T with ACM (ACMT)
Step 4: Add S (ACMST)

14

15 Clustering of 5 WNBA Players – Average Linkage
Step 1: Closest 2 are AM, CP => Combine AM/CP
Step 2:
dM,(AC) = mean( , ) =
dS,(AC) = mean( , ) =
dT,(AC) = mean( , ) =
Smallest distance in table => Combine M with AC (ACM)
Step 3:
dS,(ACM) = mean( , , ) =
dT,(ACM) = mean( , , ) =
Smallest distance in table => Combine T with ACM (ACMT)
Step 4: Add S (ACMST)

16

17 Nonhierarchical Clustering Methods
Intended to cluster individual units (not variables) into K clusters
K can be selected a priori or by the process
Computationally simpler than hierarchical methods and can be used on larger datasets
Distance matrix is not computed and raw data need not be stored during run
K-means Method:
Step 1: Randomly partition units into K groups (using random seed)
Step 2: Go through all units one at a time, moving each to the group with the nearest centroid, and re-calculate the centroids of the groups it exits and enters
Step 3: Continue until no units change groups
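
The K-means steps above can be sketched as follows. This is a minimal illustration, not the presenter's implementation. It assumes Euclidean distance and a near-equal random starting partition. For simplicity it rebuilds every centroid before each unit is examined rather than updating only the two affected groups. The function names, the `max_passes` safety cap, and the six sample points are hypothetical.

```python
import math
import random


def centroid(points):
    """Mean of each variable over a non-empty list of points."""
    return tuple(sum(col) / len(points) for col in zip(*points))


def kmeans_one_at_a_time(data, k, seed=0, max_passes=100):
    """Cluster rows of `data` into k groups, moving one unit at a time."""
    rng = random.Random(seed)
    # Random initial partition with near-equal group sizes.
    order = list(range(len(data)))
    rng.shuffle(order)
    group = [0] * len(data)
    for pos, unit in enumerate(order):
        group[unit] = pos % k

    for _ in range(max_passes):
        moved = False
        for unit, x in enumerate(data):
            # Centroids are rebuilt from current membership, so any earlier
            # move is reflected before this unit is examined.
            cents = {}
            for g in range(k):
                members = [data[u] for u in range(len(data)) if group[u] == g]
                if members:  # skip groups that have become empty
                    cents[g] = centroid(members)
            nearest = min(cents, key=lambda g: math.dist(x, cents[g]))
            if nearest != group[unit]:
                group[unit] = nearest
                moved = True
        if not moved:  # stop once a full pass changes nothing
            break
    return group


# Hypothetical two-variable data with two well-separated groups.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
print(kmeans_one_at_a_time(points, k=2, seed=1))
```

The group labels (0 or 1) depend on the seed, but on data this well separated the first three points should end up together and the last three together.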

18 Example – 12 WNBA Players & K=2 Clusters
Give each player a random number, and sort so that 6 are in group 1 and 6 in group 2
Obtain the mean of the p variables by group (group centroids)
Player by player, obtain the distance from each centroid, move the player to the closest group, and re-compute group centroids
Players 3, 4, 5, 6 remain in group 1
Player 8 moves to group 2 and centroids are re-calculated
Player 12 distances are measured from the new centroids (n1=7, n2=5) and the player stays in group 1
Players 10, 7, 1 stay in group 2; Player 2 switches to group 1, centroids re-calculated
Player 11 remains in group 2; Player 9 switches to group 1, centroids re-calculated

