John Nicholas Owen Sarah Smith


Clustering Theory
John Nicholas Owen Sarah Smith

What is clustering?
Clustering is the activity of grouping similar objects. Clustering methods are useful for data reduction, for developing classification schemes, and for suggesting or supporting hypotheses about the structure of the data.

Steps to Clustering
1. Pattern representation. The analyst identifies the number, type, and scale of the features available to the clustering algorithm.
2. Pattern proximity. Identify a proximity measure appropriate to the data domain. This is usually the Euclidean distance.
3. Grouping, or clustering, of the data.
4. Data abstraction.
5. Assessment of the output.
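The pattern proximity step can be illustrated with a small sketch. It assumes every pattern is a tuple of numeric features; the names `euclidean`, `proximity_matrix`, and `patterns` are hypothetical and do not come from the slides.

```python
import math

def euclidean(p, q):
    """Euclidean distance between two equal-length feature vectors."""
    if len(p) != len(q):
        raise ValueError("points must have the same number of features")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def proximity_matrix(points):
    """Symmetric matrix of pairwise Euclidean distances."""
    n = len(points)
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = euclidean(points[i], points[j])
            matrix[i][j] = matrix[j][i] = d  # distance is symmetric
    return matrix

# Hypothetical patterns, each described by two numeric features.
patterns = [(0.0, 0.0), (3.0, 4.0), (6.0, 8.0)]
distances = proximity_matrix(patterns)
```

A grouping step would then read its distances from `distances[i][j]` instead of recomputing them.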

Creating Clusters
There are two basic approaches for creating clusters: partitional and hierarchical.

Partitional Theory
The analyst evaluates and groups the data using statistical algorithms. The most popular partitioning methods include k-means, hierarchical agglomerative clustering, unsupervised Bayes, and mode finding (density-based) methods.

k-means Clustering
Clusters are defined by measuring the Euclidean distance between each data point and the cluster centers. The method requires the analyst to know something about the underlying data. The analyst needs to provide the number of clusters to be formed. The software then performs a four-step iterative process to cluster the data.

Step 1: Randomly assign each cluster center's initial position.

Step 2: Assign each data point to its nearest "center point".

Step 3: Find the actual center (the mean of the assigned points) of each new cluster.

Step 4: Move each centroid to its new position.

End State: Repeat Steps 2 to 4 until the clusters are optimized, that is, until the assignments stop changing and the centroids no longer move.
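The iterative process can be sketched in plain Python as follows. This is a minimal illustration, not the software the slides refer to. It makes two assumptions: the initial centers are chosen at random from the data points themselves, and a cluster that ends up empty keeps its previous center. The function name `kmeans`, its parameters, and the sample `data` are hypothetical.

```python
import math
import random

def kmeans(points, k, max_iter=100, seed=0):
    """Cluster points into k groups; returns (centroids, labels)."""
    rng = random.Random(seed)
    # Step 1: pick k distinct data points as the initial centers (assumption).
    centroids = [tuple(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(max_iter):
        # Step 2: assign each point to its nearest center.
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing, so the centers are stable
        labels = new_labels
        # Steps 3 and 4: compute each cluster's mean and move its center there.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its old center (assumption)
                centroids[c] = tuple(sum(x) / len(members) for x in zip(*members))
    return centroids, labels

# Hypothetical data: two well-separated groups of three points each.
data = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
centroids, labels = kmeans(data, k=2)
```

The analyst still has to supply `k`. The result can depend on the random starting centers, so in practice the procedure is often run several times with different seeds.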

Hierarchical Theory
Does not generate a set of disjoint clusters. Uses a top-down (divisive) or bottom-up (agglomerative) approach, with the bottom-up approach being more common.

Divisive Approach
Starts with all data points in one group and splits it repeatedly into smaller groups. Generates a hierarchy of nested clusters that can be represented by a tree, called a dendrogram. A dendrogram consists of many upside-down U-shaped lines connecting data points in a hierarchical tree. This method is favored by biologists because it may give more insight into the structure of the clusters than other methods.

Dendrogram

Agglomerative Approach
Each individual data point starts alone in its own group. The groups closest to each other are merged with one another. This continues until all individual data points are in one single group.
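The merging procedure can be sketched as below. The slides do not say how the distance between two groups is measured, so this sketch assumes single linkage (the smallest Euclidean distance between any two members). The function `agglomerate` and the sample `points` are hypothetical.

```python
import math

def agglomerate(points):
    """Merge the two closest groups until one remains; returns the merge history."""
    # Each data point starts alone in its own group (groups hold point indices).
    groups = [[i] for i in range(len(points))]
    history = []

    def single_link(a, b):
        # Assumed group distance: smallest distance between any two members.
        return min(math.dist(points[i], points[j]) for i in a for j in b)

    while len(groups) > 1:
        # Find the pair of groups that are closest to each other.
        i, j = min(
            ((i, j) for i in range(len(groups)) for j in range(i + 1, len(groups))),
            key=lambda pair: single_link(groups[pair[0]], groups[pair[1]]),
        )
        distance = single_link(groups[i], groups[j])
        history.append((groups[i], groups[j], distance))
        merged = groups[i] + groups[j]
        # Replace the two merged groups with their union.
        groups = [g for idx, g in enumerate(groups) if idx not in (i, j)] + [merged]
    return history

# Hypothetical data: two nearby pairs of points.
points = [(0.0, 0.0), (0.0, 1.0), (5.0, 0.0), (5.0, 3.0)]
history = agglomerate(points)
```

Each entry in `history` records which two groups were merged and at what distance. That is the information a dendrogram draws, with the merge distance giving the height of each upside-down U.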

Agglomerative Clustering (figure panels: Step 1, Step 2)

Questions?