John Nicholas Owen Sarah Smith

1 Clustering Theory

2 What is clustering? Clustering is the activity of grouping similar objects.
Clustering methods are useful for data reduction, for developing classification schemes, and for suggesting or supporting hypotheses about the structure of the data.

3 Steps to Clustering (1) Pattern representation: the analyst identifies the number, type, and scale of the features available to the clustering algorithm. (2) Pattern proximity: the analyst chooses a proximity measure suited to the data domain, usually the Euclidean distance between feature vectors. (3) Grouping, or clustering, of the data. (4) Data abstraction. (5) Assessment of output.
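As a small illustration of the proximity step, the sketch below computes a matrix of pairwise Euclidean distances in plain Python. The function names `euclidean` and `proximity_matrix` and the three sample points are made up for this example. Because Euclidean distance is sensitive to the scale of each feature, a real analysis would usually rescale the features during pattern representation first.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def proximity_matrix(points):
    """Symmetric n x n matrix of pairwise Euclidean distances."""
    n = len(points)
    return [[euclidean(points[i], points[j]) for j in range(n)] for i in range(n)]

points = [(0.0, 0.0), (3.0, 4.0), (6.0, 8.0)]
for row in proximity_matrix(points):
    print(row)
# [0.0, 5.0, 10.0]
# [5.0, 0.0, 5.0]
# [10.0, 5.0, 0.0]
```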

4 Creating Clusters There are two basic approaches to creating clusters: partitional and hierarchical.

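5 Partitional Theory The analyst evaluates and groups the data using statistical algorithms that divide it into a single set of non-overlapping clusters. The most popular clustering methods include k-means, hierarchical agglomerative clustering, unsupervised Bayes, and mode finding (density-based methods). Of these, k-means is the classic partitional method, while hierarchical agglomerative clustering belongs to the hierarchical family.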

6 k-means Clustering Clusters are defined by the Euclidean distances between the data points and the cluster centers. The method requires the analyst to know something about the underlying data, because the number of clusters, k, must be provided in advance. The software then performs a four-step iterative process to cluster the data.
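One standard way to state what this iterative process is doing: k-means looks for clusters that minimize the within-cluster sum of squared Euclidean distances. In the formula below, the symbols are introduced here rather than taken from the slides: $C_j$ is the $j$-th cluster, $\mu_j$ its centroid, and $x_i$ a data point.

```latex
J = \sum_{j=1}^{k} \sum_{x_i \in C_j} \lVert x_i - \mu_j \rVert^2,
\qquad
\mu_j = \frac{1}{|C_j|} \sum_{x_i \in C_j} x_i
```

Assigning each point to its nearest center never increases $J$, and moving each center to the mean of its points never increases it either, which is why repeating the steps eventually settles.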

7 Step 1 Randomly assign the positions of the k cluster centers.
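A minimal sketch of Step 1, assuming the common choice of picking k distinct data points at random as the starting centers (drawing random positions inside the range of the data is another option). `init_centers` and its `seed` parameter are hypothetical names used only for illustration.

```python
import random

def init_centers(points, k, seed=None):
    """Step 1: choose k distinct data points at random as starting centers.

    Picking existing points is one common way to "randomly assign" the
    centers; drawing random positions inside the data's range also works.
    """
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    rng = random.Random(seed)  # seeded so a run can be reproduced
    return [tuple(p) for p in rng.sample(points, k)]

points = [(1.0, 1.0), (1.5, 2.0), (8.0, 8.0), (9.0, 8.5)]
centers = init_centers(points, k=2, seed=0)
print(centers)  # two of the four points; which two depends on the seed
```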

8 Step 2 Assign each data point to its nearest center point.
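A sketch of Step 2 in plain Python: each point gets the index of the center with the smallest Euclidean distance. `assign` is a hypothetical helper name; `math.dist` is the standard-library Euclidean distance (Python 3.8+). Ties go to the lower-numbered center, which is one arbitrary but consistent convention.

```python
import math

def assign(points, centers):
    """Step 2: label each point with the index of its nearest center."""
    labels = []
    for p in points:
        distances = [math.dist(p, c) for c in centers]
        # index() returns the first minimum, so ties go to the lower index
        labels.append(distances.index(min(distances)))
    return labels

points = [(1.0, 1.0), (1.5, 2.0), (8.0, 8.0), (9.0, 8.5)]
centers = [(1.0, 1.0), (9.0, 8.5)]
print(assign(points, centers))  # → [0, 0, 1, 1]
```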

9 Step 3 Find the actual center (the mean) of each of the new clusters.

10 Step 4 Move each centroid to its new position.
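Steps 3 and 4 together can be sketched as a single update: compute the mean of the points assigned to each center and move the center there. `update_centers` is a hypothetical name, and leaving a center in place when no points were assigned to it is just one simple convention; implementations handle empty clusters in different ways.

```python
def update_centers(points, labels, centers):
    """Steps 3-4: move each center to the mean of the points assigned to it."""
    dim = len(centers[0])
    new_centers = []
    for j, old in enumerate(centers):
        members = [p for p, label in zip(points, labels) if label == j]
        if not members:
            # Empty cluster: keep the old position (one possible convention)
            new_centers.append(tuple(old))
            continue
        # Coordinate-wise mean of the member points
        new_centers.append(tuple(sum(p[d] for p in members) / len(members)
                                 for d in range(dim)))
    return new_centers

points = [(1.0, 1.0), (1.5, 2.0), (8.0, 8.0), (9.0, 8.5)]
labels = [0, 0, 1, 1]
print(update_centers(points, labels, [(1.0, 1.0), (9.0, 8.5)]))
# → [(1.25, 1.5), (8.5, 8.25)]
```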

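11 End State Repeat steps 2 through 4 until the cluster assignments stop changing. The result is a local optimum, so different random starting positions can produce different clusters.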
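Putting the four steps together gives the usual iterative loop (often called Lloyd's algorithm). This is a minimal, self-contained sketch on tuples of floats, assuming initial centers are drawn from the data points; `kmeans`, `max_iter`, and `seed` are hypothetical names, and a library implementation would add smarter initialization and multiple restarts.

```python
import math
import random

def kmeans(points, k, max_iter=100, seed=None):
    """Minimal k-means: returns (centers, labels).

    Stops when the assignments stop changing or after max_iter rounds.
    """
    rng = random.Random(seed)
    centers = [tuple(p) for p in rng.sample(points, k)]  # Step 1
    labels = None
    for _ in range(max_iter):
        # Step 2: assign every point to its nearest center
        new_labels = [min(range(k), key=lambda j: math.dist(p, centers[j]))
                      for p in points]
        if new_labels == labels:  # End state: nothing changed
            break
        labels = new_labels
        # Steps 3-4: move each center to the mean of its members
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centers[j] = tuple(sum(col) / len(members) for col in zip(*members))
    return centers, labels

points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
centers, labels = kmeans(points, k=2, seed=1)
print(labels)
```

With two well-separated groups like these, the first three points should end up sharing one label and the last three the other; which group is called 0 depends on the seed.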

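12 Hierarchical Theory Hierarchical clustering does not generate a single set of disjoint clusters; instead, it produces a nested hierarchy of clusters.
It can take a top-down (divisive) or a bottom-up (agglomerative) approach, the bottom-up approach being more common.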

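13 Divisive Approach The divisive approach starts with all data points in one cluster and repeatedly splits clusters until each data point stands alone. Like the agglomerative approach, it generates a hierarchy of nested clusters that can be represented by a tree called a dendrogram. A dendrogram consists of many upside-down U-shaped lines connecting data points in a hierarchical tree; the height of each U shows the distance at which the two clusters it connects are joined. This kind of hierarchy is favored by biologists because it may give more insight into the structure of the clusters than other methods.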
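The slides do not give a specific divisive algorithm, so the sketch below uses a deliberately simple splitting rule chosen for illustration (not a standard named method such as DIANA): split each cluster around its two farthest-apart points, then recurse until every point stands alone. The nested tuples it returns are the tree a dendrogram draws. `split` and `divisive` are hypothetical names, and the points are assumed to be distinct.

```python
import math
from itertools import combinations

def split(cluster):
    """Split a cluster in two around its farthest-apart pair of points."""
    a, b = max(combinations(cluster, 2), key=lambda pair: math.dist(*pair))
    if math.dist(a, b) == 0:
        raise ValueError("duplicate points cannot be split")
    left = [p for p in cluster if math.dist(p, a) <= math.dist(p, b)]
    right = [p for p in cluster if math.dist(p, a) > math.dist(p, b)]
    return left, right

def divisive(cluster):
    """Top-down: recursively split until every point is alone.

    Each internal node of the returned nested tuple is one split.
    """
    if len(cluster) == 1:
        return cluster[0]
    left, right = split(cluster)
    return (divisive(left), divisive(right))

points = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.0)]
print(divisive(points))
# → (((0.0, 0.0), (0.0, 1.0)), ((5.0, 5.0), (5.0, 6.0)))
```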

14 Dendrogram

15 Agglomerative Approach
Each data point starts alone in its own group. The two groups closest to each other are merged. This continues until all data points are in one single group.
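A naive sketch of the bottom-up procedure above. The slides say only that the closest groups are merged; this example assumes single linkage (the distance between the closest pair of points drawn from two groups), which is one of several common choices alongside complete and average linkage. It recomputes every group distance on each pass, so it is meant for small illustrations only. `single_linkage` and `agglomerative` are hypothetical names.

```python
import math
from itertools import combinations

def single_linkage(a, b):
    """Distance between the closest pair of points drawn from groups a and b."""
    return min(math.dist(p, q) for p in a for q in b)

def agglomerative(points):
    """Bottom-up: start with singletons and repeatedly merge the closest pair.

    Returns the merge history as (group_a, group_b, distance) tuples;
    the distances are the heights a dendrogram would draw.
    """
    clusters = [(p,) for p in points]
    merges = []
    while len(clusters) > 1:
        i, j = min(combinations(range(len(clusters)), 2),
                   key=lambda ij: single_linkage(clusters[ij[0]], clusters[ij[1]]))
        d = single_linkage(clusters[i], clusters[j])
        merges.append((clusters[i], clusters[j], d))
        merged = clusters[i] + clusters[j]
        clusters = [c for n, c in enumerate(clusters) if n not in (i, j)] + [merged]
    return merges

points = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 7.0)]
for a, b, d in agglomerative(points):
    print(a, "+", b, "at distance", round(d, 3))
```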

16 Agglomerative Clustering
[Figure: agglomerative clustering, panels Step 1 and Step 2]

17 Questions?