1
Clustering Theory
John Nicholas Owen Sarah Smith
2
What is clustering? The activity of grouping similar objects.
Clustering methods are useful for data reduction, for developing classification schemes, and for suggesting or supporting hypotheses about the structure of the data.
3
Steps to Clustering
1. Pattern representation: the analyst identifies the number, type, and scale of the features available to the clustering algorithm.
2. Pattern proximity: choose a proximity measure suited to the data domain. This is usually the Euclidean distance.
3. Grouping, or clustering, of the data.
4. Data abstraction.
5. Assessment of output.
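The pattern proximity step can be illustrated with a small sketch. It assumes each pattern is a tuple of numeric features; the names `euclidean`, `proximity_matrix`, and `patterns` are hypothetical and chosen only for this example.

```python
import math

def euclidean(a, b):
    """Straight-line distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def proximity_matrix(points):
    """Symmetric matrix of pairwise Euclidean distances between patterns."""
    return [[euclidean(p, q) for q in points] for p in points]

# Illustrative data, not taken from the presentation.
patterns = [(0.0, 0.0), (3.0, 4.0), (6.0, 8.0)]
matrix = proximity_matrix(patterns)
print(matrix[0][1])  # distance between the first two patterns: 5.0
```

Other proximity measures may suit other data domains better; Euclidean distance is used here only because it is the usual choice named above.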
4
Creating Clusters
There are two basic approaches for creating the clusters: partitional and hierarchical.
5
Partitional Theory
The analyst evaluates and groups the data using statistical algorithms.
The most popular methods of partitioning include:
k-means
Hierarchical agglomerative clustering
Unsupervised Bayes
Mode finding, or density-based
6
k-means Clustering
Clusters are defined by measuring the Euclidean distances between data points.
Requires the analyst to know something about the underlying data.
The analyst needs to provide the number of clusters to be formed.
The software then performs a four-step iterative process to cluster the data.
7
Step 1 Randomly assign an initial position to each cluster center.
8
Step 2 Assign each data point to its nearest cluster center.
9
Step 3 Find the actual center (the mean of the assigned points) of each new cluster.
10
Step 4 Move each centroid to its new position.
11
End State Repeat steps 2 through 4 until the clusters are optimized, that is, until the assignments stop changing and the centroids no longer move.
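The four steps can be sketched in a few lines of plain Python. This is a minimal illustration, not the software the slides refer to. It assumes the initial centers are drawn at random from the data points themselves (one common choice; the slides only say the positions are random), and the names `kmeans`, `max_iter`, and `seed` are hypothetical.

```python
import math
import random

def kmeans(points, k, max_iter=100, seed=0):
    """Return (centers, assignment) for points given as numeric tuples."""
    rng = random.Random(seed)
    # Step 1: random initial centers (assumption: sampled from the data).
    centers = rng.sample(points, k)
    assignment = None
    for _ in range(max_iter):
        # Step 2: assign each point to its nearest center.
        new_assignment = [
            min(range(k), key=lambda j: math.dist(p, centers[j]))
            for p in points
        ]
        # End state: stop once no point changes cluster.
        if new_assignment == assignment:
            break
        assignment = new_assignment
        # Steps 3 and 4: compute each cluster's mean and move the center there.
        for j in range(k):
            members = [p for p, a in zip(points, assignment) if a == j]
            if members:  # leave a center in place if its cluster is empty
                centers[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return centers, assignment

# Illustrative data: two well-separated groups of three points.
data = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centers, labels = kmeans(data, k=2)
print(centers, labels)
```

The analyst still has to supply `k`. On less clearly separated data, different random starts can converge to different clusterings, so the procedure is often run several times.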
12
Hierarchical Theory
Does not generate a single set of disjoint clusters.
Uses a top-down (divisive) or bottom-up (agglomerative) approach; the bottom-up approach is more common.
13
Divisive Approach
Generates a hierarchy of nested clusters that can be represented by a tree called a dendrogram.
A dendrogram consists of many upside-down U-shaped lines connecting data points in a hierarchical tree.
This method is favored by biologists because it may give more insight into the structure of the clusters than other methods.
14
Dendrogram
15
Agglomerative Approach
Each individual data point starts alone in its own group.
The two groups closest to each other are merged.
This continues until all of the data points are in a single group.
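A bottom-up merge loop of this kind can be sketched as follows. The slides do not say how the distance between two groups is measured, so this example assumes single linkage (the distance between the closest pair of members); `single_link`, `agglomerate`, and `samples` are hypothetical names.

```python
import math

def single_link(a, b):
    """Assumed linkage: distance between the closest members of two groups."""
    return min(math.dist(p, q) for p in a for q in b)

def agglomerate(points):
    """Merge the two closest groups until one remains; return the merge history."""
    clusters = [[p] for p in points]  # every point starts in its own group
    merges = []
    while len(clusters) > 1:
        # Find the pair of groups with the smallest linkage distance.
        i, j = min(
            ((i, j) for i in range(len(clusters))
                    for j in range(i + 1, len(clusters))),
            key=lambda ij: single_link(clusters[ij[0]], clusters[ij[1]]),
        )
        distance = single_link(clusters[i], clusters[j])
        merges.append((clusters[i], clusters[j], distance))
        merged = clusters[i] + clusters[j]
        clusters = [c for n, c in enumerate(clusters) if n not in (i, j)]
        clusters.append(merged)
    return merges

# Illustrative one-dimensional data.
samples = [(0.0,), (1.0,), (5.0,), (6.5,)]
history = agglomerate(samples)
for left, right, distance in history:
    print(left, "+", right, "at distance", distance)
```

The merge history, with the distance at which each merge happens, is the information a dendrogram draws: each merge becomes one upside-down U whose height is the merge distance.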
16
Agglomerative Clustering
[Figure: agglomerative clustering, Step 1 and Step 2]
17
Questions?