Multidimensional scaling (MDS). G. Quinn, M. Burgman & J. Carey, 2003.


Aim: graphical representation of dissimilarities between objects in as few dimensions (axes) as possible.

The graphical representation is termed an “ordination” in ecology. The axes of the graph represent new variables that are summaries of the original variables.

Approximate distances by air (km) between Australian capital cities
[Table: symmetric distance matrix with rows and columns CAN, SYD, MELB, BRIS, ADEL, PER, HOB, DAR; the distance values are not preserved in this transcript]

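[Figure: MDS ordination of the eight cities, Dimension 1 against Dimension 2. Stress = 0.014]

[Figure: the same ordination with Dimension 2 multiplied by -1; points labelled Darwin, Perth, Adelaide, Hobart, Melbourne, Canberra, Sydney, Brisbane]

[Figure: Darwin, Perth, Adelaide, Hobart, Melbourne, Canberra, Sydney, Brisbane]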

Haynes & Quinn (unpublished)
Four sites along the Morwell River:
–site 1 upstream from planned sewage outfall
–sites 2, 3 and 4 downstream
–site 3 below fish farm
Abundance of all species of invertebrates recorded from 3 stations at each site.

12 objects (sampling units):
–4 sites by 3 stations at each site
94 variables (species).
Do invertebrate communities (or assemblages) differ between stations and sites?
–Is Site 1 different from the rest?

Multidimensional scaling
1. Set up a raw data matrix: one row per site/sample (S11, S12, S13, S21, S22, S23, etc.) and one column per species (1, 2, 3, 4, 5, etc.), with abundances as the entries.

2. Calculate a dissimilarity (Bray-Curtis) matrix: rows and columns are both the sampling units (S11, S12, S13, S21, S22, S23, etc.), and each entry is the dissimilarity between a pair of units.
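Steps 1 and 2 can be sketched in a few lines of Python. This is only an illustration: the sample labels follow the slide, but the abundance values and the function names `bray_curtis` and `dissimilarity_matrix` are made up for the example. The formula is the Bray-Curtis dissimilarity given later in the SIMPER slide.

```python
# Hypothetical raw data matrix: rows are sampling units, columns are species.
abundances = {
    "S11": [10, 0, 3],
    "S12": [8, 1, 3],
    "S21": [0, 5, 0],
}

def bray_curtis(a, b):
    """Sum of absolute differences divided by sum of abundances, over species."""
    numerator = sum(abs(x - y) for x, y in zip(a, b))
    denominator = sum(x + y for x, y in zip(a, b))
    # Two empty samples have no defined dissimilarity; 0.0 is an arbitrary choice.
    return numerator / denominator if denominator else 0.0

def dissimilarity_matrix(samples):
    """Square, symmetric matrix of pairwise Bray-Curtis dissimilarities."""
    rows = list(samples.values())
    return [[bray_curtis(r, s) for s in rows] for r in rows]

matrix = dissimilarity_matrix(abundances)
for label, row in zip(abundances, matrix):
    print(label, [round(v, 3) for v in row])
```

Samples that share no species (here S11 and S21) get the maximum dissimilarity of 1.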

3. Decide on the number of dimensions (axes) for the ordination:
–suspected number of underlying ecological gradients
–match distances between objects on plot and dissimilarities between objects as closely as possible
–more dimensions means better match
–usually between 2 and 4 dimensions

4. Arrange objects (e.g. sampling units) initially on the ordination plot in the chosen number of dimensions:
–starting configuration
–usually generated randomly

[Figure: starting configuration, Axis I against Axis II; legend: Site 1, Site 2, Site 3, Site 4]

5. Compare distances between objects on the ordination plot with Bray-Curtis dissimilarities between objects:
–strength of relationship measured by Kruskal’s stress value
–stress measures “badness of fit”, so lower values indicate a better match
–the plot of distance against dissimilarity is called a Shepard plot
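The slides do not give the stress formula. The sketch below assumes Kruskal's stress formula 1, in which plotted distances are compared with the best non-decreasing (monotone) fit to them once pairs are ordered by dissimilarity. The function names are hypothetical, and tied dissimilarities are simply left in input order.

```python
from math import sqrt

def monotone_fit(values):
    """Pool-adjacent-violators: least-squares non-decreasing fit to `values`."""
    blocks = []  # each block holds [mean, count]
    for v in values:
        blocks.append([v, 1])
        # Merge backwards while the sequence of block means decreases.
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            m2, n2 = blocks.pop()
            m1, n1 = blocks.pop()
            blocks.append([(m1 * n1 + m2 * n2) / (n1 + n2), n1 + n2])
    fitted = []
    for mean, count in blocks:
        fitted.extend([mean] * count)
    return fitted

def kruskal_stress(dissimilarities, distances):
    """Stress formula 1 (assumed): sqrt(sum((d - d_hat)^2) / sum(d^2))."""
    order = sorted(range(len(dissimilarities)), key=lambda k: dissimilarities[k])
    d = [distances[k] for k in order]       # distances ordered by dissimilarity
    d_hat = monotone_fit(d)                 # fitted values on the Shepard plot
    numerator = sum((x - h) ** 2 for x, h in zip(d, d_hat))
    denominator = sum(x ** 2 for x in d)
    return sqrt(numerator / denominator)

# One value per pair of objects (made-up numbers).
print(kruskal_stress([0.1, 0.5, 0.9], [1.0, 2.0, 3.0]))  # rank order preserved
print(kruskal_stress([0.1, 0.5, 0.9], [1.0, 3.0, 2.0]))  # one pair out of order
```

When the distances preserve the rank order of the dissimilarities the stress is zero; any reversal raises it.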

[Figure: starting configuration (Axis I against Axis II; Sites 1 to 4) alongside its Shepard plot (distance against dissimilarity). Stress = 0.394]

6. Move objects on the ordination plot iteratively by the method of steepest descent:
–each step improves the match between dissimilarities and distances between objects on the ordination plot
–lowers the stress value
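A minimal sketch of the iterative idea, under a simplifying assumption: it minimises raw metric stress (the sum of squared differences between plotted distances and target dissimilarities) rather than Kruskal's rank-based stress, because its gradient is easy to write down. All names, the step-halving rule and the target values are illustrative choices, not taken from the slides.

```python
import random
from math import sqrt

def distances(points):
    """Euclidean distance for every pair (i, j) with i < j."""
    n = len(points)
    return {(i, j): sqrt(sum((a - b) ** 2 for a, b in zip(points[i], points[j])))
            for i in range(n) for j in range(i + 1, n)}

def raw_stress(points, target):
    d = distances(points)
    return sum((d[pair] - target[pair]) ** 2 for pair in target)

def gradient(points, target):
    """Partial derivatives of raw stress with respect to every coordinate."""
    dims = len(points[0])
    grad = [[0.0] * dims for _ in points]
    d = distances(points)
    for (i, j), delta in target.items():
        if d[(i, j)] == 0:
            continue  # coincident points: direction undefined, skip this pair
        coeff = 2 * (d[(i, j)] - delta) / d[(i, j)]
        for k in range(dims):
            g = coeff * (points[i][k] - points[j][k])
            grad[i][k] += g
            grad[j][k] -= g
    return grad

def steepest_descent(target, n, dims=2, iterations=50, seed=1):
    rng = random.Random(seed)
    # Random starting configuration, as in step 4.
    points = [[rng.uniform(-1, 1) for _ in range(dims)] for _ in range(n)]
    history = [raw_stress(points, target)]
    for _ in range(iterations):
        grad = gradient(points, target)
        step, improved = 0.1, False
        while step > 1e-8 and not improved:
            trial = [[x - step * g for x, g in zip(p, gp)]
                     for p, gp in zip(points, grad)]
            trial_stress = raw_stress(trial, target)
            if trial_stress < history[-1]:
                points, improved = trial, True
                history.append(trial_stress)
            else:
                step /= 2  # step overshot; try a shorter move
        if not improved:
            break  # no downhill move found: treat as the final configuration
    return points, history

# Target dissimilarities: the pairwise distances of a unit square.
target = distances([[0, 0], [1, 0], [1, 1], [0, 1]])
points, history = steepest_descent(target, n=4)
print(history[0], history[-1])
```

The recorded `history` plays the role of the iteration history: each accepted move lowers the stress, and the loop stops when no further move helps.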

[Figure: configuration after 20 iterations (Axis I against Axis II) alongside its Shepard plot (distance against dissimilarity). Stress = 0.119]

7. Final configuration:
–further moving of objects on the ordination plot cannot improve the match between dissimilarities and distances
–stress as low as possible

[Figure: final configuration after 50 iterations (Axis I against Axis II) alongside its Shepard plot (distance against dissimilarity). Stress = 0.069]

Iteration history
[Table: stress at each iteration; the values, including the stress of the final configuration, are not preserved in this transcript]

How low should stress be?
Clarke (1993) suggests:
> 0.20 is basically random
< 0.15 is good
< 0.10 is ideal
–configuration is close to actual dissimilarities

How many dimensions?
Increasing the number of dimensions above 4 usually offers little reduction in stress.
2 or 3 dimensions are usually adequate to get a good fit (i.e. low stress).
2 dimensions are straightforward to plot.

Types of MDS
Based on how stress is measured, i.e. on the relationship between distance and dissimilarity.
[Figure: distance plotted against dissimilarity]

Metric MDS
Stress measured from the relationship between actual dissimilarities and distances,
but the relationship is often non-linear.
Inefficient?

Non-metric MDS
Stress measured from the relationship between ranks of dissimilarities and ranks of distances.
Similar to Spearman rank correlation.
Better for ecological data.

Anderson et al. (1994)
Effects of substratum type on recruitment of an intertidal estuarine fouling assemblage.
Six replicate panels of 4 substrata placed in an estuary for 1 month at 2 times of the year.
14 species in total recorded.

MDS to examine the relationships between panels:
–do substrata appear different in species composition?
Bray-Curtis dissimilarity.
Non-metric MDS.

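[Figure: non-metric MDS ordinations of panels for January and October; legend: concrete, aluminium, plywood, fibreglass. One stress value reads 0.126; the other is not preserved in this transcript]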

Comparing groups in MDS
Haynes & Quinn data:
–4 groups (sites), which must be a priori groups
–3 replicate stations per site (n = 3)
Are sites significantly different in species composition?
Is there an ANOVA-like equivalent for MDS?

Analysis of similarities - ANOSIM
Uses the (dis)similarity matrix.
Because dissimilarities are not normally distributed, uses ranks of pairwise dissimilarities.
Because dissimilarities are not independent of each other, uses a randomisation test rather than the usual significance testing procedure.
Generates its own test statistic (called R) by randomisation of rank dissimilarities.
Available through the PRIMER package:
–not SYSTAT nor SPSS

Null hypothesis
Average of rank dissimilarities between objects within groups = average of rank dissimilarities between objects between groups:
$r_B = r_W$
No difference in species composition between groups.

[Figure: within-group dissimilarities and between-group dissimilarities]

Test statistic R
R is the average of rank dissimilarities between objects between groups minus the average of rank dissimilarities between objects within groups, scaled:
$$R = \frac{r_B - r_W}{M/2}, \qquad M = \frac{n(n-1)}{2}$$
Here n is the total number of objects, so M is the number of pairwise dissimilarities.
R lies between -1 and +1.
Use a randomization test to generate the probability distribution of R when $H_0$ is true.

Haynes & Quinn ANOSIM
R = 0.583, P = (value not preserved in this transcript), so reject H0.
Significant differences between sites.
Followed by pairwise ANOSIM comparisons, with adjusted significance levels.

ANOSIM
Also available for 2-level nested and factorial designs.
PRIMER package.
Limited to a total of 125 objects (e.g. SUs).
If 2 groups, n must be > 4 for the randomization procedure.
An alternative is to use ANOVA on NMDS axis scores, but ANOSIM is better.

Which variables (species) are most important?
For MDS-type analyses, three methods:
–correlate individual variables (species abundances) with axis scores
–SIMPER (similarity percentages) to determine which species contribute most to Bray-Curtis dissimilarity
–CA and/or CANOCO to simultaneously ordinate objects and species (biplots)

SIMPER (similarity percentages)
Bray-Curtis dissimilarity between objects j and k:
$$\frac{\sum |y_{ij} - y_{ik}|}{\sum (y_{ij} + y_{ik})}$$
Note that $\sum$ is summing over each species, 1 to p.
The contribution of species i is:
$$\delta_i = \frac{|y_{ij} - y_{ik}|}{\sum (y_{ij} + y_{ik})}$$

Which species discriminate groups of objects?
Calculate the average $\delta_i$ over all pairs of objects between groups:
–larger values indicate species contribute more to group differences
Calculate the standard deviation of $\delta_i$:
–smaller values indicate the species contribution is consistent across all pairs of objects
Calculate the ratio $\delta_i / SD(\delta_i)$:
–larger values indicate good discriminating species between 2 groups
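The three SIMPER calculations above can be sketched for two groups as follows. The function name `simper`, the use of the sample standard deviation, and the abundance values are assumptions for illustration; this is not the PRIMER routine.

```python
from statistics import mean, stdev

def simper(group_a, group_b):
    """Per-species contribution to Bray-Curtis dissimilarity between two groups.

    Returns (species index, average contribution, SD, average/SD) tuples,
    sorted so the largest average contribution comes first.
    """
    n_species = len(group_a[0])
    contributions = [[] for _ in range(n_species)]
    for a in group_a:                    # every between-group pair of objects
        for b in group_b:
            total = sum(a) + sum(b)      # denominator: summed over all species
            for i in range(n_species):
                contributions[i].append(abs(a[i] - b[i]) / total)
    results = []
    for i, values in enumerate(contributions):
        avg, sd = mean(values), stdev(values)
        ratio = avg / sd if sd > 0 else float("inf")
        results.append((i, avg, sd, ratio))
    return sorted(results, key=lambda r: r[1], reverse=True)

# Made-up abundances: rows are objects, columns are 3 species.
group_a = [[10, 0, 2], [12, 1, 2]]
group_b = [[1, 0, 3], [0, 2, 2]]
results = simper(group_a, group_b)
for species, avg, sd, ratio in results:
    print(species, round(avg, 3), round(sd, 3), round(ratio, 2))
```

The average contributions add up to the average Bray-Curtis dissimilarity between the two groups, which is why they can be reported as percentages of it.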

Linking biota MDS to environmental variables
Are differences between SUs in species abundances related to differences in environmental variables?
–Correlate MDS axis scores with environmental variables.
–BIO-ENV procedure: correlates dissimilarities from biota with dissimilarities from environmental variables.

BIO-ENV procedure
[Diagram: for the same samples, species abundances give a Bray-Curtis dissimilarity matrix and subsets of environmental variables give Euclidean dissimilarity matrices; the two matrices are compared by rank correlation (Spearman or weighted Spearman)]

BIO-ENV correlations
Exploratory rather than hypothesis-testing procedure.
Tries to find the best combination of environmental variables, i.e. the combination most correlated with biotic dissimilarities.
A priori chosen correlations can be tested with the RELATE procedure, a randomization test of correlation.
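The search over subsets can be sketched as below. It assumes plain (unweighted) Spearman correlation, unstandardised environmental variables and an exhaustive search over all subsets; the function names and data are hypothetical, and PRIMER's own BIO-ENV routine may differ in these details.

```python
from itertools import combinations
from math import sqrt

def rank_values(values):
    """Ranks starting at 1; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks

def spearman(x, y):
    """Pearson correlation of the ranks; 0.0 if either input is constant."""
    rx, ry = rank_values(x), rank_values(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    return cov / sqrt(vx * vy) if vx > 0 and vy > 0 else 0.0

def euclidean_pairs(env, columns):
    """Euclidean distance for each pair of samples, using only `columns`."""
    return [sqrt(sum((env[i][c] - env[j][c]) ** 2 for c in columns))
            for i, j in combinations(range(len(env)), 2)]

def bio_env(biotic_dissim, env):
    """Return (correlation, columns) for the best subset of env variables."""
    n_vars = len(env[0])
    candidates = []
    for size in range(1, n_vars + 1):
        for columns in combinations(range(n_vars), size):
            rho = spearman(biotic_dissim, euclidean_pairs(env, columns))
            candidates.append((rho, columns))
    return max(candidates)

# Made-up data for 4 samples. Biotic dissimilarities are listed per pair in the
# order (0,1), (0,2), (0,3), (1,2), (1,3), (2,3); env has 2 variables per sample.
biotic_dissim = [0.2, 0.4, 0.6, 0.2, 0.4, 0.2]
env = [[0, 5], [1, 1], [2, 4], [3, 2]]
best_rho, best_columns = bio_env(biotic_dissim, env)
print(best_rho, best_columns)
```

Here the first environmental variable alone reproduces the rank order of the biotic dissimilarities, so it is returned as the best subset. Because the search picks the maximum over many subsets, the resulting correlation is exploratory, as the slide notes.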

Vector fitting
Uses the final NMDS configuration rather than the dissimilarity matrix, so it is dependent on the number of dimensions.
Calculates the vector (direction) through the configuration of samples along which sample scores have maximum correlation with an environmental variable (one at a time).
Significance testing (H0: no correlation) done with a randomization (Monte Carlo) test.
Available in DECODA and PATN.