John Nicholas Owen Sarah Smith

Presentation transcript:

Clustering Theory: John Nicholas Owen, Sarah Smith

What is clustering? The activity of grouping similar objects. Clustering methods are useful for data reduction, for developing classification schemes, and for suggesting or supporting hypotheses about the structure of the data.

Steps to Clustering: (1) Pattern representation: the analyst identifies the number, type, and scale of the features available to the clustering algorithm. (2) Pattern proximity: define a proximity measure appropriate to the data domain, usually the Euclidean distance. (3) Grouping, or clustering, of the data. (4) Data abstraction. (5) Assessment of the output.
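
The proximity measure named in step 2 can be sketched in a few lines. This is a minimal illustration, assuming each pattern is a sequence of numeric features; the function name `euclidean_distance` is hypothetical and not from the slides.

```python
import math


def euclidean_distance(a, b):
    """Straight-line distance between two patterns with the same number of features."""
    if len(a) != len(b):
        raise ValueError("patterns must have the same number of features")
    # Square the per-feature differences, sum them, and take the square root.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


print(euclidean_distance((0, 0), (3, 4)))  # 5.0
```

Because the differences are summed across features, a feature measured on a large scale can dominate the result, which is one reason the scale of each feature is identified in step 1.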

Creating Clusters: There are two basic approaches for creating clusters: partitional and hierarchical.

Partitional Theory: The analyst evaluates and groups the data using statistical algorithms. The most popular methods of partitioning include: k-means; hierarchical agglomerative clustering; unsupervised Bayes; mode finding, or density-based.

k-means Clustering: Clusters are defined by measuring the Euclidean distances between data points. The method requires the analyst to know something about the underlying data, because the analyst must provide the number of clusters to form. The software then performs a four-step iterative process to cluster the data.

Step 1: Randomly assign an initial position to each cluster center.

Step 2: Assign each data point to its nearest center point.

Step 3: Find the actual center (the centroid) of each of the new clusters.

Step 4: Move each center point to the new centroid position.

End State: Repeat steps 2 to 4 until the assignments stop changing, at which point the clusters are optimized.
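
The four steps and the end state can be sketched as a short loop. This is an illustrative sketch, not the software the slides refer to. It assumes that the random start in Step 1 means picking k of the data points as initial centers (one common choice among several), and the names `kmeans`, `max_iter`, and `seed` are hypothetical.

```python
import math
import random


def kmeans(points, k, max_iter=100, seed=0):
    """Cluster a list of numeric tuples into k groups; returns (centers, assignment)."""
    rng = random.Random(seed)
    # Step 1: random initial centers (assumption: sample k of the data points).
    centers = rng.sample(points, k)
    assignment = None
    for _ in range(max_iter):
        # Step 2: assign each point to the index of its nearest center.
        new_assignment = [
            min(range(k), key=lambda c: math.dist(p, centers[c])) for p in points
        ]
        # End state: stop once no point changes cluster.
        if new_assignment == assignment:
            break
        assignment = new_assignment
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:  # an empty cluster keeps its previous center
                # Steps 3 and 4: compute the centroid and move the center there.
                centers[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return centers, assignment


data = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centers, labels = kmeans(data, k=2)
print(centers, labels)
```

The result depends on the random start, so a different `seed` can give a different grouping on less clearly separated data.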

Hierarchical Theory: Does not generate a set of disjoint clusters. Uses either a top-down (divisive) or a bottom-up (agglomerative) approach; the bottom-up approach is more common.

Divisive Approach: Generates a hierarchy of nested clusters that can be represented by a tree, called a dendrogram. A dendrogram consists of many upside-down U-shaped lines connecting data points in a hierarchical tree. This method is favored by biologists because it may give more insight into the structure of the clusters than other methods.

Dendrogram

Agglomerative Approach: Each individual data point starts alone in its own group. The groups closest to each other are merged with one another. This continues until all individual data points are in one single group.
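
The bottom-up merging described above can be sketched as follows. The slides do not say how the distance between two groups is measured, so this sketch assumes single linkage (the distance between the closest pair of points, one from each group); the name `agglomerative` is hypothetical.

```python
import math


def agglomerative(points):
    """Merge the two closest groups until one remains; returns the merge history."""
    # Each data point starts alone in its own group.
    clusters = [[p] for p in points]
    merges = []
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Single linkage (assumption): closest pair across the two groups.
                d = min(math.dist(a, b) for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        merges.append((clusters[i], clusters[j], d))
        merged = clusters[i] + clusters[j]
        # Replace the two merged groups with their union.
        clusters = [c for idx, c in enumerate(clusters) if idx not in (i, j)]
        clusters.append(merged)
    return merges


for left, right, dist in agglomerative([(0, 0), (0, 1), (5, 5)]):
    print(left, "+", right, "at distance", round(dist, 3))
```

Each recorded merge corresponds to one upside-down U in a dendrogram, drawn at the height of the merge distance.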

Agglomerative Clustering (figure: Step 1, Step 2)

Questions?