Classification

Similarity measures. Each ordination or classification method is based (explicitly or implicitly) on some similarity measure (two possible formulations of the ordination problem).

Similarities (dissimilarities, resemblance functions) can be based on qualitative or quantitative data. Different indices are used for sample similarity and for species similarity. The similarity of two samples has a meaning by itself; the similarity of two species has meaning only in relation to the data set. The species set is "fixed", whereas the samples are a random selection from a population.

Sample similarity based on qualitative data: Sörensen and Jaccard indices. d = number of species absent from both samples (usually not used).
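The index formulas themselves are not preserved in this transcript. The sketch below uses the standard textbook definitions and assumes the usual 2x2 notation: a = species present in both samples, b and c = species present in only one of them (d is not needed). The function names and the species lists are illustrative only.

```python
def presence_counts(sample1, sample2):
    """Return (a, b, c) for two samples given as collections of species names."""
    s1, s2 = set(sample1), set(sample2)
    a = len(s1 & s2)   # species shared by both samples
    b = len(s1 - s2)   # species only in the first sample
    c = len(s2 - s1)   # species only in the second sample
    return a, b, c


def sorensen(sample1, sample2):
    """Sorensen index: 2a / (2a + b + c)."""
    a, b, c = presence_counts(sample1, sample2)
    return 2 * a / (2 * a + b + c)


def jaccard(sample1, sample2):
    """Jaccard index: a / (a + b + c)."""
    a, b, c = presence_counts(sample1, sample2)
    return a / (a + b + c)


# Hypothetical species lists, for illustration only.
releve_1 = ["Poa", "Festuca", "Carex"]
releve_2 = ["Poa", "Carex", "Luzula", "Nardus"]

print("Sorensen:", round(sorensen(releve_1, releve_2), 3))
print("Jaccard: ", round(jaccard(releve_1, releve_2), 3))
```

Neither function handles two empty samples (the denominator would be zero); a real analysis would need to decide how to treat that case.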

Species similarity based on presence-absence data. d = number of quadrats containing neither species; here it is absolutely necessary.
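The slide's own formula is not preserved in the transcript. As one possible illustration of an index that needs d, the sketch below computes the phi coefficient (the Pearson correlation of two 0/1 variables), which is an assumption here and not necessarily the index shown on the slide. The presence vectors are made up.

```python
import math


def phi_coefficient(presence1, presence2):
    """Phi coefficient for two species recorded as 0/1 per quadrat."""
    pairs = list(zip(presence1, presence2))
    a = sum(1 for p, q in pairs if p and q)          # both species present
    b = sum(1 for p, q in pairs if p and not q)      # only species 1
    c = sum(1 for p, q in pairs if not p and q)      # only species 2
    d = sum(1 for p, q in pairs if not p and not q)  # neither species
    denom = math.sqrt((a + b) * (c + d) * (a + c) * (b + d))
    if denom == 0:
        raise ValueError("undefined when a species is present everywhere or nowhere")
    return (a * d - b * c) / denom


# Hypothetical data: 8 quadrats.
species_1 = [1, 1, 1, 0, 0, 0, 0, 0]
species_2 = [1, 1, 0, 1, 0, 0, 0, 0]
v_small = phi_coefficient(species_1, species_2)

# The same records plus 4 extra quadrats containing neither species.
v_large = phi_coefficient(species_1 + [0] * 4, species_2 + [0] * 4)

print(round(v_small, 3), round(v_large, 3))
```

Adding empty quadrats changes only d, yet the value of the index changes, which is one way to see why species similarity has meaning only in relation to the data set.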

Quantitative data. A transformation is an algebraic function X_ij' = f(X_ij) that is applied to each value independently of the other values. Standardization is done either with respect to the values of other species in the sample (standardization by samples) or with respect to the values of the species in other samples (standardization by species). Centering means subtracting the mean, so that the resulting variable (species) or sample has a mean of zero. Standardization usually means dividing each value by the sample (species) norm or by the total of all the values in the sample (species).
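A minimal sketch of these operations, assuming a data table stored as a list of rows with samples as rows and species as columns. The log(x + 1) transformation is just one example of a function f; the table values and function names are made up.

```python
import math


def log_transform(values):
    """Transformation: applied to each value independently of the others."""
    return [math.log(v + 1) for v in values]


def center(values):
    """Subtract the mean so that the result has a mean of zero."""
    mean = sum(values) / len(values)
    return [v - mean for v in values]


def standardize_by_norm(values):
    """Divide each value by the norm (square root of the sum of squares)."""
    norm = math.sqrt(sum(v * v for v in values))
    return [v / norm for v in values]


def standardize_by_total(values):
    """Divide each value by the total of all values."""
    total = sum(values)
    return [v / total for v in values]


# Hypothetical table: 2 samples (rows) x 3 species (columns).
table = [[3.0, 4.0, 0.0],
         [1.0, 0.0, 2.0]]

# Standardization by samples works on rows.
by_samples = [standardize_by_norm(row) for row in table]

# Standardization by species works on columns: transpose, standardize, transpose back.
columns = [standardize_by_norm(list(col)) for col in zip(*table)]
by_species = [list(row) for row in zip(*columns)]

print(by_samples)
print(by_species)
```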

Euclidean distance. For ED, standardize by sample norm, not by total. Samples marked t contain values standardized by the total; those marked n contain values standardized by the sample norm. For samples standardized by total, ED12 = 1.41 (√2) whereas ED34 = 0.82; for samples standardized by sample norm, ED12 = ED34 = 1.41.
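The table behind these numbers is not part of the transcript. The sketch below uses hypothetical data chosen to reproduce the quoted distances: samples 1 and 2 each contain a single (different) species, samples 3 and 4 each contain three species, and no pair shares any species. The abundance values are an assumption.

```python
import math


def by_total(sample):
    """Standardize a sample by the total of its values."""
    total = sum(sample)
    return [v / total for v in sample]


def by_norm(sample):
    """Standardize a sample by its norm (square root of the sum of squares)."""
    norm = math.sqrt(sum(v * v for v in sample))
    return [v / norm for v in sample]


def euclidean(x, y):
    """Euclidean distance between two samples."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


# Hypothetical abundances: 4 samples x 8 species, no shared species within a pair.
samples = {
    1: [10, 0, 0, 0, 0, 0, 0, 0],
    2: [0, 10, 0, 0, 0, 0, 0, 0],
    3: [0, 0, 10, 10, 10, 0, 0, 0],
    4: [0, 0, 0, 0, 0, 10, 10, 10],
}

t = {k: by_total(v) for k, v in samples.items()}  # standardized by total
n = {k: by_norm(v) for k, v in samples.items()}   # standardized by sample norm

ed12_t, ed34_t = euclidean(t[1], t[2]), euclidean(t[3], t[4])
ed12_n, ed34_n = euclidean(n[1], n[2]), euclidean(n[3], n[4])

print("by total:", round(ed12_t, 2), round(ed34_t, 2))
print("by norm: ", round(ed12_n, 2), round(ed34_n, 2))
```

Both pairs are completely dissimilar in composition, yet after standardization by total the species-rich pair appears closer. Standardization by sample norm gives both pairs the same distance.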

Percentage similarity (quantitative Sörensen)
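The formula is not preserved in the transcript. The sketch below uses the standard definition: twice the sum of the minimum abundances of each species, divided by the sum of all abundances in both samples, expressed in percent. The abundance vectors are made up.

```python
def percentage_similarity(sample1, sample2):
    """Percentage similarity: 100 * 2 * sum(min(x1, x2)) / sum(x1 + x2)."""
    shared = sum(min(a, b) for a, b in zip(sample1, sample2))
    total = sum(sample1) + sum(sample2)
    return 100 * 2 * shared / total


# Hypothetical abundances of four species in two samples.
quadrat_1 = [10, 5, 0, 3]
quadrat_2 = [6, 5, 4, 0]
ps = percentage_similarity(quadrat_1, quadrat_2)

# With presence-absence (0/1) data the index reduces to the qualitative
# Sorensen index, expressed in percent.
ps_binary = percentage_similarity([1, 1, 1, 0], [1, 1, 0, 1])

print(round(ps, 1), round(ps_binary, 1))
```

The complement of the unscaled index, 1 - PS/100, is the dissimilarity commonly called the Bray-Curtis distance.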

Similarity of species based on quantitative data: correlation coefficients (ordinary, rank)
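A minimal sketch of the two kinds of correlation coefficient, assuming "ordinary" means Pearson's r and "rank" means Spearman's coefficient (Pearson's r computed on ranks, with tied values given their average rank). The abundance vectors are made up.

```python
import math


def pearson(x, y):
    """Ordinary (Pearson) correlation coefficient."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def ranks(values):
    """Ranks starting at 1; tied values receive the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman(x, y):
    """Rank (Spearman) correlation: Pearson's r of the ranks."""
    return pearson(ranks(x), ranks(y))


# Hypothetical abundances of two species in five samples.
species_a = [0, 1, 2, 5, 20]
species_b = [1, 2, 3, 4, 100]

r_ordinary = pearson(species_a, species_b)
r_rank = spearman(species_a, species_b)
print(round(r_ordinary, 3), round(r_rank, 3))
```

The two species increase together in every sample, so the rank correlation is 1, while the ordinary correlation is lower because the relationship is not linear.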

Similarity of samples vs. similarity of communities

ESS22: expected number of shared species in two subsamples taken randomly from the second sample. Normalized expected shared species (NESS) = ESS12 / ((ESS11 + ESS22) / 2)

Similarity matrices are used directly in multidimensional scaling (both metric and non-metric) and in the Mantel test.
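A minimal sketch of a permutation-based Mantel test, assuming the common form in which the statistic is the Pearson correlation between corresponding off-diagonal elements of two symmetric matrices, and significance is assessed by permuting the rows and columns of one matrix together. The function names, default number of permutations, and example matrices are illustrative.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def lower_triangle(matrix, order=None):
    """Below-diagonal elements, optionally with objects relabelled by `order`."""
    n = len(matrix)
    if order is None:
        order = list(range(n))
    return [matrix[order[i]][order[j]] for i in range(1, n) for j in range(i)]


def mantel(matrix1, matrix2, n_perm=999, seed=0):
    """Return (observed correlation, one-sided permutation p-value)."""
    rng = random.Random(seed)
    x = lower_triangle(matrix1)
    r_obs = pearson(x, lower_triangle(matrix2))
    order = list(range(len(matrix2)))
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(order)  # permute rows and columns of matrix2 together
        if pearson(x, lower_triangle(matrix2, order)) >= r_obs - 1e-12:
            hits += 1
    # The observed arrangement counts as one of the permutations.
    return r_obs, (hits + 1) / (n_perm + 1)


# Hypothetical example: positions along a transect and a second matrix
# that is exactly proportional to the first.
positions = [0, 1, 2, 4, 7, 11]
dist_a = [[abs(p - q) for q in positions] for p in positions]
dist_b = [[2 * v for v in row] for row in dist_a]

r, p_value = mantel(dist_a, dist_b)
print(round(r, 3), p_value)
```

The elements of a distance matrix are not independent, which is why the p-value comes from permuting whole objects and not from the usual test of a correlation coefficient.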

Classification

Hierarchical agglomerative (cluster analysis)

Subjective decisions in the objective procedure

Single linkage and complete linkage
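A minimal sketch of hierarchical agglomerative clustering with the two linkage rules, written for clarity and not for speed. Under single linkage the distance between two clusters is the smallest distance between their members; under complete linkage it is the largest. The function name and the example points are made up.

```python
def agglomerate(dist, linkage="single"):
    """Cluster objects from a symmetric distance matrix.

    Returns the merges in order as (cluster_1, cluster_2, height),
    where clusters are tuples of object indices.
    """
    combine = min if linkage == "single" else max
    clusters = [(i,) for i in range(len(dist))]
    merges = []
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = combine(dist[p][q] for p in clusters[i] for q in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        merges.append((clusters[i], clusters[j], d))
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return merges


# Hypothetical objects on a line with slowly increasing gaps.
points = [0.0, 1.0, 2.1, 3.3, 4.6]
dist = [[abs(p - q) for q in points] for p in points]

single = agglomerate(dist, "single")
complete = agglomerate(dist, "complete")

for name, merges in (("single", single), ("complete", complete)):
    print(name)
    for c1, c2, height in merges:
        print("  merge", c1, c2, "at", round(height, 2))
```

With these points, single linkage adds one object at a time to a single growing cluster, while complete linkage first forms separate compact groups.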

Single linkage -> chaining

Order does not play a role

TWINSPAN (Two-Way INdicator SPecies ANalysis). Pseudospecies
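A minimal sketch of the pseudospecies idea: a quantitative abundance is recoded as a series of presence-absence "pseudospecies", one for each cut level the abundance reaches. The cut levels below (0, 2, 5, 10, 20) and the rule "present if the abundance is positive and at least the cut level" are assumptions made for this illustration, as are the species names and cover values.

```python
def pseudospecies(abundances, cut_levels=(0, 2, 5, 10, 20)):
    """Recode one sample into a set of pseudospecies names.

    abundances: dict mapping species name to abundance (e.g. percent cover).
    A species reaching the k-th cut level is present as pseudospecies 1..k.
    """
    present = set()
    for species, value in abundances.items():
        for level, cut in enumerate(cut_levels, start=1):
            if value > 0 and value >= cut:
                present.add(f"{species}{level}")
    return present


# Hypothetical sample with percent cover values.
sample = {"Poa": 12, "Carex": 1, "Luzula": 0}
print(sorted(pseudospecies(sample)))
```

After this recoding the data are purely qualitative, but some of the quantitative information is retained: an abundant species contributes several pseudospecies, a rare one only the first.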

Group 01 is more similar to group 1 than group 00 is