Different types of data e.g. Continuous data:height Categorical data ordered (nominal):growth rate very slow, slow, medium, fast, very fast not ordered:fruit.

Slides:



Advertisements
Similar presentations
Multivariate Description. What Technique? Response variable(s)... Predictors(s) No Predictors(s) Yes... is one distribution summary regression models...
Advertisements

What we Measure vs. What we Want to Know
Multivariate Description. What Technique? Response variable(s)... Predictors(s) No Predictors(s) Yes... is one distribution summary regression models...
Clustering Clustering of data is a method by which large sets of data is grouped into clusters of smaller sets of similar data. The example below demonstrates.
Basic Gene Expression Data Analysis--Clustering
Clustering.
Cluster analysis Species Sequence P.symA AATGCCTGACGTGGGAAATCTTTAGGGCTAAGGTTTTTATTTCGTATGCTATGTAGCTTAAGGGTACTGACGGTAG P.xanA AATGCCTGACGTGGGAAATCTTTAGGGCTAAGGTTAATATTCCGTATGCTATGTAGCTTAAGGGTACTGACGGTAG.
Clustering Categorical Data The Case of Quran Verses
O(N 1.5 ) divide-and-conquer technique for Minimum Spanning Tree problem Step 1: Divide the graph into  N sub-graph by clustering. Step 2: Solve each.
Metrics, Algorithms & Follow-ups Profile Similarity Measures Cluster combination procedures Hierarchical vs. Non-hierarchical Clustering Statistical follow-up.
Introduction to Bioinformatics
Statistics for Marketing & Consumer Research Copyright © Mario Mazzocchi 1 Cluster Analysis (from Chapter 12)
Clustering (1) Clustering Similarity measure Hierarchical clustering Model-based clustering Figures from the book Data Clustering by Gan et al.
6-1 ©2006 Raj Jain Clustering Techniques  Goal: Partition into groups so the members of a group are as similar as possible and different.
What is Cluster Analysis? Finding groups of objects such that the objects in a group will be similar (or related) to one another and different from (or.
Clustering (slide from Han and Kamber)
Today Unsupervised Learning Clustering K-means. EE3J2 Data Mining Lecture 18 K-means and Agglomerative Algorithms Ali Al-Shahib.
Lecture 4 Cluster analysis Species Sequence P.symA AATGCCTGACGTGGGAAATCTTTAGGGCTAAGGTTTTTATTTCGTATGCTATGTAGCTTAAGGGTACTGACGGTAG P.xanA AATGCCTGACGTGGGAAATCTTTAGGGCTAAGGTTAATATTCCGTATGCTATGTAGCTTAAGGGTACTGACGGTAG.
Applied Geostatistics Geostatistical techniques are designed to evaluate the spatial structure of a variable, or the relationship between a value measured.
Distance Measures Tan et al. From Chapter 2.
Introduction to Hierarchical Clustering Analysis Pengyu Hong 09/16/2005.
Introduction to Bioinformatics - Tutorial no. 12
Hmmmmm..Pie!. WALT Can you read a pie chart? WILF By the end of the lesson can you….. put a percentage next to a pie chart section? match the result to.
Clustering. What is clustering? Grouping similar objects together and keeping dissimilar objects apart. In Information Retrieval, the cluster hypothesis.
Dr. Michael R. Hyman Cluster Analysis. 2 Introduction Also called classification analysis and numerical taxonomy Goal: assign objects to groups so that.
COMP53311 Clustering Prepared by Raymond Wong Some parts of this notes are borrowed from LW Chan ’ s notes Presented by Raymond Wong
CLUSTER ANALYSIS.
Why is it useful to use multivariate statistical methods for microfacies analysis? A microfacies is a multivariate object: each sample is characterized.
Classification. Similarity measures Each ordination or classification method is based (explicitely or implicitely) on some similarity measure (Two possible.
Multivariate Data Analysis  G. Quinn, M. Burgman & J. Carey 2003.
CLUSTERING. Overview Definition of Clustering Existing clustering methods Clustering examples.
Quantitative analysis of 2D gels Generalities. Applications Mutant / wild type Physiological conditions Tissue specific expression Disease / normal state.
Cluster Analysis.
Clustering.
Chapter 2: Getting to Know Your Data
K-Means Algorithm Each cluster is represented by the mean value of the objects in the cluster Input: set of objects (n), no of clusters (k) Output:
Data Mining Techniques Clustering. Purpose In clustering analysis, there is no pre-classified data Instead, clustering analysis is a process where a set.
CZ5225: Modeling and Simulation in Biology Lecture 3: Clustering Analysis for Microarray Data I Prof. Chen Yu Zong Tel:
Data Mining Practical Machine Learning Tools and Techniques By I. H. Witten, E. Frank and M. A. Hall 6.8: Clustering Rodney Nielsen Many / most of these.
Slide 1 EE3J2 Data Mining Lecture 18 K-means and Agglomerative Algorithms.
© Tan,Steinbach, Kumar Introduction to Data Mining 4/18/ Data Mining: Data Lecture Notes for Chapter 2 Introduction to Data Mining by Tan, Steinbach,
Unsupervised Learning
Measurements and Data. Topics Types of Data Distance Measurement Data Transformation Forms of Data Data Quality.
CLUSTER ANALYSIS. What is Cluster analysis? Cluster analysis is a techniques for grouping objects, cases, entities on the basis of multiple variables.
Data Science Practical Machine Learning Tools and Techniques 6.8: Clustering Rodney Nielsen Many / most of these slides were adapted from: I. H. Witten,
Clustering (1) Clustering Similarity measure Hierarchical clustering
Machine Learning for the Quantified Self
CSE 4705 Artificial Intelligence
CZ5211 Topics in Computational Biology Lecture 3: Clustering Analysis for Microarray Data I Prof. Chen Yu Zong Tel:
Ananya Das Christman CS311 Fall 2016
Lecture 2-2 Data Exploration: Understanding Data
Canadian Bioinformatics Workshops
Data Clustering Michael J. Watts
Data Mining -Cluster Analysis. What is a clustering ? Clustering is the process of grouping data into classes, or clusters, so that objects within a cluster.
Jagdish Gangolly State University of New York at Albany
Data Mining Chapter 4 Cluster Analysis Part 1
Similarity and Dissimilarity
Multivariate community analysis
Clustering and Multidimensional Scaling
Information Organization: Clustering
Classification (Dis)similarity measures, Resemblance functions
Multivariate Statistical Methods
Jagdish Gangolly State University of New York at Albany
Data Mining – Chapter 4 Cluster Analysis Part 2
What Is Good Clustering?
Chapter_20 Cluster Analysis
Register variation: correlation, clusters and factors
Cluster Analysis.
Text Categorization Berlin Chen 2003 Reference:
Clustering The process of grouping samples so that the samples are similar within each group.
Presentation transcript:

Different types of data e.g. Continuous data:height Categorical data ordered (nominal):growth rate very slow, slow, medium, fast, very fast not ordered:fruit colour yellow, green, purple, red, orange Binary data:fruit / no fruit

Similarity matrix We define a similarity between units – like the correlation between continuous variables. (also can be a dissimilarity or distance matrix) A similarity can be constructed as an average of the similarities between the units on each variable. (can use weighted average) This provides a way of combining different types of variables.

relevant for continuous variables: Euclidean city block or Manhattan Distance metrics A B A B (also many other variations)

Similarity coefficients for binary data simple matching count if both units 0 or both units 1 Jaccard count only if both units 1 (and many other variants) simple matching can be extended to categorical data

Clustering methods hierarchical divisive put everything together and split monothetic / polythetic agglomerative keep everything separate and join the most similar points (classical cluster analysis) non-hierarchical k-means clustering

Agglomerative hierarchical Single linkage or nearest neighbour finds the minimum spanning tree: shortest tree that connects all points chaining Complete linkage or furthest neighbour Compact clusters of approximately equal size. (makes compact groups even when none exist) Average linkage methods between single and average linkage