Multivariate community analysis

Similarity; ANOSIM; Cluster analysis; Ordination

Similarity

Abundances of species A to E at Site 1 and Site 2 (the table is incomplete in the transcript): A 12, 10; B 8; C 4; D 6; E 5.

Two kinds of measure: presence/absence and distance coefficients.

Similarity: presence/absence

[Table: presence (1) or absence of species A to E at Site 1 and Site 2; only the entry for A survives in the transcript.]

Jaccard = number of species in both / total number of species = 80%
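A minimal Python sketch of the Jaccard calculation. The species lists are hypothetical (the slide's presence/absence table did not survive in the transcript); they are chosen only so that four of five species are shared, which matches the 80% quoted on the slide.

```python
def jaccard(site1, site2):
    """Jaccard similarity: species present at both sites / species present at either."""
    s1, s2 = set(site1), set(site2)
    union = s1 | s2
    if not union:
        raise ValueError("both sites are empty")
    return len(s1 & s2) / len(union)

# Hypothetical species lists, not the slide's data
site1 = {"A", "B", "C", "D", "E"}
site2 = {"A", "C", "D", "E"}
print(jaccard(site1, site2))  # 4 shared / 5 in total = 0.8
```

Abundances play no part here: a species counts once whether it is rare or common.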

Similarity: distance

Columns: Site 1, Site 2, |Site 1 - Site 2| (absolute difference)
A: 12, 10, 2
Rows B to E are incomplete in the transcript: 8; C 4; D 6; E 5, 1
Total: 38, 31, 13

Bray-Curtis = sum of absolute differences / total abundances = 13 / (38 + 31)
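The Bray-Curtis arithmetic can be sketched in Python as follows. The first line uses only the column totals given on the slide; the abundance vectors passed to `bray_curtis` are hypothetical, since the slide's full table is not recoverable from the transcript. The function name is an illustrative choice.

```python
def bray_curtis(x, y):
    """Bray-Curtis dissimilarity between two abundance vectors of equal length."""
    if len(x) != len(y):
        raise ValueError("both vectors must cover the same species")
    total = sum(x) + sum(y)
    if total == 0:
        raise ValueError("no individuals at either site")
    return sum(abs(a - b) for a, b in zip(x, y)) / total

# Column totals as given on the slide
print(round(13 / (38 + 31), 3))  # 0.188

# Hypothetical abundances, not the slide's table
print(bray_curtis([3, 0, 2], [1, 4, 2]))  # (2 + 4 + 0) / (5 + 7) = 0.5
```

The value is a dissimilarity: 0 means identical abundances and 1 means no shared species. One minus this value gives the corresponding similarity.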

Similarity matrix

       A     B     C
  B  0.78
  C  0.67  0.54
  D  0.18  0.21  0.44

All pairwise combinations, excluding repeats and the diagonal.
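As a small illustration, the pairs that make up such a matrix can be enumerated in Python. The assignment of values to pairs is an assumption: the six numbers are read as the lower triangle of the matrix, row by row, which agrees with the similarity levels quoted on the cluster analysis slides.

```python
from itertools import combinations

sites = ["A", "B", "C", "D"]

# Assumed reading order of the slide's values: A-B, A-C, B-C, A-D, B-D, C-D
order = [("A", "B"), ("A", "C"), ("B", "C"), ("A", "D"), ("B", "D"), ("C", "D")]
values = [0.78, 0.67, 0.54, 0.18, 0.21, 0.44]
similarity = dict(zip(order, values))

# Every unordered pair once: no repeats (B-A) and no diagonal (A-A)
pairs = list(combinations(sites, 2))
print(len(pairs))  # n(n-1)/2 = 6 for n = 4
for pair in pairs:
    print(pair, similarity[pair])
```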

ANOSIM (Analysis of similarity)

1. Rank all pairwise combinations of samples by their similarity, so that rank 1 means the most similar pair.
2. Divide the pairwise combinations into two types: between groups and within groups.
3. Calculate the mean rank for each type. The smaller the rank, the more similar!

ANOSIM (Analysis of similarity)

R = (mean rank between groups - mean rank within groups) / correction factor for number of combinations

If there is no effect of groups, the two mean ranks are the same, so expect R = 0.

If samples within groups are more similar than samples between groups, the mean rank between groups is big (dissimilar) and the mean rank within groups is small (similar), so expect R > 0.
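A minimal Python sketch of the R statistic, under two assumptions that the slides do not spell out: the correction factor is taken to be M/2, where M = n(n-1)/2 is the number of pairwise combinations (the usual ANOSIM definition), and tied values share their mean rank. Ranking dissimilarities in ascending order is equivalent to ranking similarities with rank 1 as the most similar pair. The function names and the example matrix are hypothetical.

```python
from itertools import combinations

def average_ranks(values):
    """Rank values ascending from 1, giving tied values their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # sorted positions i..j share this rank
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def anosim_r(dissimilarity, groups):
    """ANOSIM R from a square dissimilarity matrix and one group label per sample."""
    n = len(groups)
    pairs = list(combinations(range(n), 2))
    ranks = average_ranks([dissimilarity[i][j] for i, j in pairs])
    within = [r for r, (i, j) in zip(ranks, pairs) if groups[i] == groups[j]]
    between = [r for r, (i, j) in zip(ranks, pairs) if groups[i] != groups[j]]
    m = len(pairs)  # number of pairwise combinations, n(n-1)/2
    return (sum(between) / len(between) - sum(within) / len(within)) / (m / 2)

# Hypothetical data: two perfectly separated groups of two samples each
d = [
    [0.0, 0.1, 0.7, 0.8],
    [0.1, 0.0, 0.6, 0.9],
    [0.7, 0.6, 0.0, 0.2],
    [0.8, 0.9, 0.2, 0.0],
]
print(anosim_r(d, ["X", "X", "Y", "Y"]))  # 1.0
```

In the example every within-group pair ranks below every between-group pair, which is the strongest possible group effect and gives R = 1. The sketch expects at least one within-group and one between-group pair.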

ANOSIM (Analysis of similarity)

How to test for significance? Randomisation test!

In the following data, three groups were composed of 5, 7 and 5 samples and gave an R of 0.264. What is the likelihood of obtaining this R by chance division of the dataset into three “groups” of 5, 7 and 5 samples?

There are 2450448 possible ways to divide the dataset into 5, 7, 5 “groups”. Randomly select 999 of these and calculate R for each.
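The count of possible divisions can be checked with a short Python sketch. It assumes that groups of equal size are interchangeable, which is the reading that reproduces the slide's figure; `count_groupings` is an illustrative name.

```python
from collections import Counter
from math import factorial

def count_groupings(sizes):
    """Number of distinct ways to split sum(sizes) samples into unlabeled groups of these sizes."""
    ways = factorial(sum(sizes))
    for size in sizes:
        ways //= factorial(size)
    # Groups of equal size are interchangeable, so divide out their orderings
    for repeats in Counter(sizes).values():
        ways //= factorial(repeats)
    return ways

print(count_groupings([5, 7, 5]))  # 2450448
```

Without the division for the two equal-sized groups the count would be twice as large, 4900896.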

[Figure: distribution of R for the null “groups”, with the real-group R (0.26) marked.]

12 out of 999 permutations (1.3%) are greater than or equal to 0.26.

Global Test
Sample statistic (Global R): 0.264
Significance level of sample statistic: 1.3%
Number of permutations: 999 (Random sample from 2450448)
Number of permuted statistics greater than or equal to Global R: 12

Pairwise Tests
Groups   R Statistic   Significance Level %   Possible Permutations   Actual Permutations   Number >= Observed
A, B     0.175         9.7                    792                     792                   77
A, C     0.592         0.8                    126                     126                   1
B, C     0.147         11.5                   792                     792                   91
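The significance levels in this output can be reproduced with simple arithmetic, sketched here in Python. One point is an inference from the numbers rather than something the output states: the global 1.3% matches (12 + 1) / (999 + 1), that is, the observed arrangement counted as one more permutation. The pairwise values match the count divided by the number of possible permutations, all of which were evaluated.

```python
def percent(count, total):
    """Percentage rounded to one decimal place, as in the output above."""
    return round(100 * count / total, 1)

# Global test: 999 random permutations, 12 with R >= 0.264.
# Assumption: the observed arrangement is added to both counts.
print(percent(12 + 1, 999 + 1))  # 1.3

# Pairwise tests: (number >= observed, possible permutations)
pairwise = {"A, B": (77, 792), "A, C": (1, 126), "B, C": (91, 792)}
for groups, (n_ge, possible) in pairwise.items():
    print(groups, percent(n_ge, possible))  # 9.7, 0.8, 11.5
```

With only 126 possible permutations for A versus C, the smallest significance level that pair could ever reach is 1/126, about 0.8%.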

Cluster analysis - nearest neighbour

Distances are 1 - similarity.

[Figure: sites A, B, C and D joined by lines labelled with their pairwise similarities from the similarity matrix: A-B 0.78, A-C 0.67, B-C 0.54, A-D 0.18, B-D 0.21, C-D 0.44. A dendrogram on a similarity axis is built up step by step.]

Step 1: A and B join at similarity 0.78.
Step 2: C joins A-B at similarity 0.67.
Step 3: D joins A-B-C at similarity 0.44.
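The merging procedure can be sketched in Python as below. This is an illustrative implementation, not the software used for the slides. The pair-to-value assignment is an assumption (the matrix values read as a lower triangle, row by row), and `agglomerate` is a hypothetical name. Working on similarities, nearest neighbour takes the maximum similarity between members of two clusters and furthest neighbour takes the minimum.

```python
def agglomerate(similarity, labels, linkage=max):
    """Repeatedly merge the two most similar clusters.

    similarity maps frozenset({a, b}) to the similarity of sites a and b.
    linkage=max gives nearest neighbour, linkage=min gives furthest neighbour.
    Returns (members of the new cluster, similarity level) in merge order.
    """
    clusters = [frozenset([label]) for label in labels]
    merges = []

    def between(c1, c2):
        return linkage(similarity[frozenset([a, b])] for a in c1 for b in c2)

    while len(clusters) > 1:
        level, c1, c2 = max(
            ((between(c1, c2), c1, c2)
             for i, c1 in enumerate(clusters) for c2 in clusters[i + 1:]),
            key=lambda item: item[0],
        )
        clusters = [c for c in clusters if c not in (c1, c2)] + [c1 | c2]
        merges.append((sorted(c1 | c2), level))
    return merges

# Assumed assignment of the slide's values to pairs of sites
similarity = {
    frozenset("AB"): 0.78, frozenset("AC"): 0.67, frozenset("BC"): 0.54,
    frozenset("AD"): 0.18, frozenset("BD"): 0.21, frozenset("CD"): 0.44,
}

for members, level in agglomerate(similarity, list("ABCD")):
    print(members, level)  # levels 0.78, 0.67, 0.44
```

Passing `linkage=min` gives the furthest neighbour version, in which C joins A-B at 0.54 instead of 0.67.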

Cluster analysis - furthest neighbour

Distances are 1 - similarity.

[Figure: the same four sites and pairwise similarities, with a dendrogram on a similarity axis.]

C joins A-B at similarity 0.54, the lower of its similarities to A (0.67) and B (0.54).

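Cluster analysis - average linkage

Distances are 1 - similarity.

[Figure: A and B merged at similarity 0.78; the merged pair is linked to C at 0.61 and to D at 0.195, and C is linked to D at 0.44. A dendrogram on a similarity axis shows A, B and C.]

After A and B merge, similarities to the merged pair are averaged: (0.67 + 0.54) / 2 ≈ 0.61 for C and (0.18 + 0.21) / 2 = 0.195 for D. C joins A-B at similarity 0.61.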
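The averaging step can be sketched in Python as follows. The pair-to-value assignment is again an assumption (the matrix values read as a lower triangle, row by row), and `merged_similarity` is a hypothetical helper name.

```python
def merged_similarity(similarity, merged, other):
    """Average linkage: mean similarity between `other` and each member of `merged`."""
    return sum(similarity[frozenset([m, other])] for m in merged) / len(merged)

# Assumed assignment of the slide's values to pairs of sites
similarity = {
    frozenset("AB"): 0.78, frozenset("AC"): 0.67, frozenset("BC"): 0.54,
    frozenset("AD"): 0.18, frozenset("BD"): 0.21, frozenset("CD"): 0.44,
}

print(round(merged_similarity(similarity, "AB", "C"), 3))  # 0.605, shown as 0.61 on the slide
print(round(merged_similarity(similarity, "AB", "D"), 3))  # 0.195
```

The C-D similarity is untouched by the merge and stays at 0.44, so the most similar remaining pair is C with the merged A-B.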

Ordination

[Figure: Sites A, B, C and D plotted as points.]

Plot the most similar sites closest to each other. The plot can be multidimensional.