Step three: statistical analyses to test biological hypotheses (general protocol, continued)

Biological hypotheses and statistical tests
- Hypotheses are driven by biology.
- The choice of statistics depends on the data and the hypotheses.
- NO NEW STATISTICAL TOOLS ARE NEEDED FOR MORPHOMETRICS!!
- Exploratory hypotheses: relative position of, and relationships among, specimens in data space.
- Confirmatory hypotheses: compare groups, associate shape with other variables, etc.

Some hypotheses (shape related)
- How do populations and species differ?
- Does the observed variation generate a predictable pattern?
- Are there additional factors (ecological, evolutionary) correlated with variation?
- How does shared evolutionary history affect the observed patterns?

Hypotheses as statistical tests
- Do populations differ? MANOVA, CVA
- Is there a predictable pattern? PCA, UPGMA
- Correlated factors? Regression, 2B-PLS
- Effect of phylogeny? Comparative method

Exploratory data analysis
- Investigate the data using only the Y-matrix of shape variables (PWScores + U1, U2).
- Specimens are points in a high-dimensional data space.
- Look for patterns and distributions of points.
- Generate a summary plot of the data space (ordination).
- Look for relationships among points (clustering).

Ordination and dimension reduction
- Visualize the high-dimensional data space as succinctly as possible.
- Describe variation in the original data with a new set of variables (typically orthogonal vectors).
- Order the new variables by variation explained (most to least).
- Plot the first few dimensions to summarize the data.
- Principal Components Analysis (PCA) is one approach (others include PCoA, MDS, CA, etc.).

PCA: what does it do?
- Rotates the data so that the main axis of variation (PC1) is horizontal.
- Subsequent PC axes are orthogonal to PC1 and are ordered to explain sequentially less variation.
- The goal is to explain more variation in fewer dimensions.

PCA: interpretations
- Eigenvectors are linear combinations of the original variables (interpreted through the PC loadings of each variable).
- PCA PRESERVES EUCLIDEAN DISTANCES among objects.
- PCA does NOTHING to the data except rotate it to the axes expressing the most variation; it loses NO INFORMATION (if all PC vectors are retained).
- If the original variables are uncorrelated, PCA is not helpful in reducing the dimensionality of the data.
- PCA does not find a particular factor (e.g., group differences, allometry): it identifies the direction of most variation, which may or may not be interpretable as a factor.
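The rotation described above can be illustrated with a minimal sketch, assuming NumPy and a made-up data matrix `Y` (rows are specimens, columns are variables). The function name `pca` and the helper `pairwise_dist` are hypothetical names for illustration only; the sketch checks that Euclidean distances among specimens are unchanged when all PCs are retained.

```python
import numpy as np

def pca(Y):
    """PCA of an n x p data matrix via SVD of the column-centered data."""
    Yc = Y - Y.mean(axis=0)                      # center each variable
    U, s, Vt = np.linalg.svd(Yc, full_matrices=False)
    eigvals = s**2 / (Y.shape[0] - 1)            # variance along each PC, largest first
    loadings = Vt.T                              # columns are eigenvectors (PC directions)
    scores = Yc @ loadings                       # specimens expressed on the PC axes
    return scores, loadings, eigvals

def pairwise_dist(A):
    """Euclidean distances among all pairs of rows of A."""
    diff = A[:, None, :] - A[None, :, :]
    return np.sqrt((diff**2).sum(axis=-1))

rng = np.random.default_rng(0)
# Made-up correlated data: 20 specimens, 4 variables (illustration only)
Y = rng.normal(size=(20, 4)) @ rng.normal(size=(4, 4))

scores, loadings, eigvals = pca(Y)

# With all PCs retained, the rotation leaves inter-specimen distances unchanged
print(np.allclose(pairwise_dist(Y), pairwise_dist(scores)))
# Proportion of total variation explained by each PC, in decreasing order
print(eigvals / eigvals.sum())
```

Plotting the first two columns of `scores` gives the usual ordination plot; dropping later columns is where information starts to be lost.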

Example: leatherside chub

Clustering
- Data are dots in a high-dimensional space (Y-matrix).
- Can we connect the dots into groupings, where clusters represent groups of similar specimens?
- Cluster methods generate a 1-dimensional view of relationships, based on some criterion.
- Clustering requires a distance (or similarity) between points.
- There are MANY different criteria.
- Clustering is algorithmic, not algebraic (i.e., it is a procedure, or set of rules, for connecting data).

Clustering: UPGMA
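As a hedged illustration of UPGMA, the sketch below uses SciPy's average-linkage clustering (`method="average"` is UPGMA) on Euclidean distances. The data are made up: two well-separated groups of five specimens, used only to show the mechanics.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import pdist

rng = np.random.default_rng(1)
# Made-up data: two groups of 5 specimens in 3 shape variables (illustration only)
Y = np.vstack([rng.normal(0.0, 0.1, size=(5, 3)),
               rng.normal(1.0, 0.1, size=(5, 3))])

D = pdist(Y, metric="euclidean")      # clustering needs distances between points
Z = linkage(D, method="average")      # average linkage = UPGMA
groups = fcluster(Z, t=2, criterion="maxclust")   # cut the tree into 2 clusters
print(groups)
```

`scipy.cluster.hierarchy.dendrogram(Z)` draws the corresponding cluster diagram when a plotting backend is available.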

Conclusions: exploratory methods
- Useful tools for summarizing shape variation.
- Help you understand your data by visualizing variation (both ordination plots and cluster diagrams).
- Help describe relationships among specimens in terms of overall similarity.

Confirmatory data analysis
- Investigate the data using shape variables (Y-matrix) and other (independent) variables (X-matrix).
- Test for patterns of shape variation.
- The independent variables determine the type of statistical test.

Types of independent variables
- Categorical: variables delineating groups of specimens (e.g., male/female, species, etc.).
- Continuous: variables on a continuous scale (e.g., size, moisture, age, etc.).
- Different statistical methods apply to each.

Some statistical tests
- Categorical (shape differences among groups): MANOVA
- Continuous (relationship of variables and shape): multivariate regression
- Continuous (association of variables and shape): 2B-PLS (2-block partial least squares)
- MANOVA and multivariate regression are both GLM statistics (general linear models).

Group differences: MANOVA
- Is there a difference in shape between groups?
- Multivariate generalization of ANOVA.
- Compares variation within groups to variation between groups.
- Significant MANOVA: group means are different in shape.
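One common MANOVA statistic, Wilks' Lambda, compares within-group to total variation directly. The following is a minimal sketch, assuming NumPy and made-up data; `wilks_lambda` is a hypothetical helper name, and the significance test (conversion to an F approximation or a permutation test) is left out.

```python
import numpy as np

def wilks_lambda(Y, groups):
    """Wilks' Lambda = det(W) / det(W + B) for a one-way design.

    W is the within-group and B the between-group sums-of-squares and
    cross-products (SSCP) matrix. Values near 0 indicate strong group
    differences; values near 1 indicate none.
    """
    Y = np.asarray(Y, dtype=float)
    groups = np.asarray(groups)
    grand_mean = Y.mean(axis=0)
    p = Y.shape[1]
    W = np.zeros((p, p))
    B = np.zeros((p, p))
    for g in np.unique(groups):
        Yg = Y[groups == g]
        mean_g = Yg.mean(axis=0)
        resid = Yg - mean_g
        W += resid.T @ resid                                   # variation within group g
        B += Yg.shape[0] * np.outer(mean_g - grand_mean,
                                    mean_g - grand_mean)       # variation between groups
    return np.linalg.det(W) / np.linalg.det(W + B)

rng = np.random.default_rng(2)
# Made-up data: two groups of 15 specimens, 3 shape variables, shifted means
group_a = rng.normal(size=(15, 3))
group_b = rng.normal(size=(15, 3)) + 2.0
Y = np.vstack([group_a, group_b])
labels = np.array(["a"] * 15 + ["b"] * 15)

lam = wilks_lambda(Y, labels)
print(lam)
```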

[Table: MANOVA of relative warps RW1-RW30 for Utah chub. Sources: Sex, Loc, Sex × Loc, IL/SL, Size. Statistic: Wilks' Lambda for each source. Most values were lost in extraction; the two surviving P-values are <.0001.]

MANOVA: post hoc tests
- Pairwise comparisons using generalized Mahalanobis distance (D² or D).
- Convert D² → T² → F to test.
- For the experiment-wise error rate, adjust using Bonferroni: α_exp = α / (number of comparisons).
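The D² → T² → F conversion can be sketched for a single pair of groups using the standard two-sample Hotelling relationships, T² = [n1·n2/(n1+n2)]·D² and F = [(n1+n2−p−1) / (p(n1+n2−2))]·T² with p and n1+n2−p−1 degrees of freedom. This is a sketch under stated assumptions: the covariance matrix is pooled over the two groups being compared only (a full post hoc analysis might pool over all groups), the data are made up, and `mahalanobis_f_test` and `n_comparisons` are hypothetical names.

```python
import numpy as np
from scipy import stats

def mahalanobis_f_test(Y1, Y2):
    """Two-group test: Mahalanobis D^2 -> Hotelling T^2 -> F and its P-value."""
    n1, p = Y1.shape
    n2 = Y2.shape[0]
    diff = Y1.mean(axis=0) - Y2.mean(axis=0)
    # Pooled within-group covariance matrix (two groups only: an assumption)
    S1 = np.atleast_2d(np.cov(Y1, rowvar=False))
    S2 = np.atleast_2d(np.cov(Y2, rowvar=False))
    S = ((n1 - 1) * S1 + (n2 - 1) * S2) / (n1 + n2 - 2)
    D2 = float(diff @ np.linalg.solve(S, diff))      # squared Mahalanobis distance
    T2 = n1 * n2 / (n1 + n2) * D2                    # Hotelling's T^2
    df1, df2 = p, n1 + n2 - p - 1
    F = df2 / (p * (n1 + n2 - 2)) * T2               # exact F under normality
    p_value = stats.f.sf(F, df1, df2)
    return D2, T2, F, p_value

rng = np.random.default_rng(4)
# Made-up data: 12 and 10 specimens, 3 shape variables
Y1 = rng.normal(size=(12, 3))
Y2 = rng.normal(size=(10, 3)) + 1.0

D2, T2, F, p_value = mahalanobis_f_test(Y1, Y2)

# Bonferroni adjustment for a hypothetical set of 3 pairwise comparisons
alpha = 0.05
n_comparisons = 3
alpha_adj = alpha / n_comparisons
print(D2, T2, F, p_value, p_value < alpha_adj)
```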

Discriminant analysis: CVA & DFA
- Combination of MANOVA and PCA.
- Tests for group differences (MANOVA).
- PCA of among-group variation relative to within-group variation.
- Suggests which groups differ on which variables.
- Can classify specimens to groups.
- Special case: 2 groups = discriminant function analysis (DFA).
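The idea of "PCA of among-group variation relative to within-group variation" can be sketched as a generalized eigenproblem, B v = λ W v, where B and W are the between- and within-group SSCP matrices. This is a minimal sketch assuming SciPy's `eigh` for the generalized symmetric problem and made-up data; `cva` is a hypothetical helper name, and classification of specimens is not shown.

```python
import numpy as np
from scipy.linalg import eigh

def cva(Y, groups):
    """Canonical variates: eigenvectors of B relative to W, largest first."""
    Y = np.asarray(Y, dtype=float)
    groups = np.asarray(groups)
    grand_mean = Y.mean(axis=0)
    p = Y.shape[1]
    W = np.zeros((p, p))
    B = np.zeros((p, p))
    for g in np.unique(groups):
        Yg = Y[groups == g]
        mean_g = Yg.mean(axis=0)
        W += (Yg - mean_g).T @ (Yg - mean_g)
        B += len(Yg) * np.outer(mean_g - grand_mean, mean_g - grand_mean)
    # Generalized symmetric eigenproblem; eigenvectors satisfy v' W v = 1
    vals, vecs = eigh(B, W)
    order = np.argsort(vals)[::-1]
    vals, vecs = vals[order], vecs[:, order]
    scores = (Y - grand_mean) @ vecs          # specimens on the canonical axes
    return vals, vecs, scores, W

rng = np.random.default_rng(5)
# Made-up data: 3 groups of 10 specimens, 4 variables, non-collinear group means
means = np.array([[0, 0, 0, 0], [1, 0, 0, 0], [0, 1, 0, 0]], dtype=float)
Y = np.vstack([rng.normal(m, 0.3, size=(10, 4)) for m in means])
labels = np.repeat([0, 1, 2], 10)

vals, vecs, scores, W = cva(Y, labels)
# With k = 3 groups, at most k - 1 = 2 canonical axes carry among-group variation
print(vals)
```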

DFA/CVA: post-hoc tests
- For DFA/CVA, compare differences among groups using generalized Mahalanobis distance (D²).
- Mahalanobis D² is the logical choice because CVA/DFA is MANOVA, and the PCA is relative to within-group variability (i.e., VCV standardized).
- Convert D² → T² → F to perform the statistical test.
- The experiment-wise error rate is adjusted as before (i.e., adjusted α).

Continuous variation: regression
- Is there a relationship between shape and some other variable?
- Multivariate regression of shape on a continuous variable.
- A significant regression implies that shape changes as a function of the other variable (e.g., size).
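A multivariate regression of shape on size can be sketched with ordinary least squares applied to all shape variables at once. The example below assumes NumPy and simulated data; the variable names (`size`, `true_slopes`) are hypothetical, and Wilks' Lambda is computed as the ratio of residual to total SSCP determinants without the accompanying significance test.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 40
# Made-up data: a size measure and 3 shape variables, two of which depend on size
size = rng.uniform(1.0, 3.0, size=n)
true_slopes = np.array([0.5, -0.3, 0.0])
Y = np.outer(size, true_slopes) + rng.normal(0.0, 0.05, size=(n, 3))

# Design matrix with an intercept column; one regression for all shape variables
X = np.column_stack([np.ones(n), size])
coef, *_ = np.linalg.lstsq(X, Y, rcond=None)   # row 0: intercepts, row 1: slopes

resid = Y - X @ coef
E = resid.T @ resid                            # residual (error) SSCP
Yc = Y - Y.mean(axis=0)
T = Yc.T @ Yc                                  # total SSCP about the mean
wilks = np.linalg.det(E) / np.linalg.det(T)    # small values: size explains shape

print(coef[1])
print(wilks)
```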

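Example: regression of shape on size in mountain sucker
[Output table: multivariate tests of significance (Wilks' Lambda, Pillai's trace, Hotelling-Lawley trace, Roy's maximum root) with columns Statistic, Value, Fs, df1, df2, Prob, followed by the test that the kth root and those that follow are zero (columns k, U, Fs, df1, df2, Prob). Numeric values were lost in extraction; only the exponent E-078 of each Prob value survives.]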

Continuous variation: association (2B-PLS)
- Is there an association between shape and some other set of variables (not causal)?
- Find pairs of linear combinations for X and Y that maximize the covariation between the data sets.
- Linear combinations are constrained to be orthogonal within each set (like PC axes) but NOT between data sets.
- Calculations are less complicated for 2B-PLS (because there are fewer mathematical constraints).
- Analogous to multivariate correlation.
- 2B-PLS is called SINGULAR WARPS when shape is one or more of the data sets (Bookstein et al., 2003, J. of Hum. Evol.).
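One standard way to obtain the paired linear combinations is a singular value decomposition of the cross-covariance matrix between the two blocks. The sketch below assumes that formulation, NumPy, and made-up data; `two_block_pls` is a hypothetical helper name.

```python
import numpy as np

def two_block_pls(X, Y):
    """2B-PLS via SVD of the cross-covariance matrix between blocks X and Y."""
    n = X.shape[0]
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    R = Xc.T @ Yc / (n - 1)                    # cross-covariance (p x q)
    U, s, Vt = np.linalg.svd(R, full_matrices=False)
    V = Vt.T
    x_scores = Xc @ U                          # linear combinations of the X block
    y_scores = Yc @ V                          # paired combinations of the Y block
    return U, s, V, x_scores, y_scores

rng = np.random.default_rng(6)
n = 30
# Made-up data: 2 environmental variables and 3 shape variables that covary with them
X = rng.normal(size=(n, 2))
Y = X @ np.array([[0.8, 0.0, 0.2], [0.0, 0.5, -0.4]]) + rng.normal(0.0, 0.3, size=(n, 3))

U, s, V, x_scores, y_scores = two_block_pls(X, Y)
# Singular values are the covariances between paired scores, in decreasing order
print(s)
```

The columns of `U` are orthonormal within the X block and the columns of `V` within the Y block, matching the within-set orthogonality described above.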

Resampling methods
- Methods that take many samples from the original data set in some specified way and evaluate the significance of the original based on these samples.
- Resampling approaches are nonparametric, because they do not depend on theoretical distributions for significance testing (they generate a distribution from the data).
- They are very flexible and can allow for complicated designs.
- Very useful in morphometrics, and can be used for:
  - Testing standard designs
  - Testing non-standard designs
  - Testing when sample sizes are small relative to the number of variables

Randomization (permutation)
- Proposed by Fisher (1935) for assessing the significance of a 2-sample comparison (Fisher's exact test).
- Fisher's exact test: a total enumeration of possible pairings of data.
- Randomization can be used to evaluate the significance of almost any test statistic.
- Protocol:
  1. Calculate the observed statistic (e.g., t-statistic): E_obs.
  2. Reorder the data set (i.e., randomly shuffle the data) and recalculate the statistic: E_rand.
  3. Repeat many times to generate a distribution of the statistic.
  4. The percentage of E_rand values more extreme than E_obs is the significance level.
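The protocol above can be sketched for a two-group comparison as follows. This is a minimal illustration assuming NumPy and made-up data; the statistic (absolute difference in group means), the function name `permutation_test`, and the convention of counting the observed value as one of the permutations are all choices made here, not taken from the slides.

```python
import numpy as np

def permutation_test(y, groups, n_perm=999, seed=0):
    """Two-group randomization test on the absolute difference in means."""
    rng = np.random.default_rng(seed)
    y = np.asarray(y, dtype=float)
    groups = np.asarray(groups)

    def statistic(labels):
        return abs(y[labels == 1].mean() - y[labels == 0].mean())

    e_obs = statistic(groups)                              # step 1: observed statistic
    e_rand = np.array([statistic(rng.permutation(groups))  # steps 2-3: shuffle, recompute
                       for _ in range(n_perm)])
    # step 4: proportion at least as extreme as observed (observed counted once)
    p_value = (np.sum(e_rand >= e_obs) + 1) / (n_perm + 1)
    return e_obs, p_value

# Made-up data: two groups of 5 with clearly different means
y = np.array([1, 2, 3, 4, 5, 11, 12, 13, 14, 15], dtype=float)
groups = np.array([0] * 5 + [1] * 5)

e_obs, p_value = permutation_test(y, groups)
print(e_obs, p_value)
```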

Randomization: comments
- Randomization is an EXTREMELY useful and flexible technique.
- How and what to resample depends upon the data and the hypothesis:
  - Regression and correlation: shuffle Y vs. X.
  - Group comparison (e.g., ANOVA): shuffle Y on groups.
  - Some tests (e.g., t-test) may depend on direction (1-tailed vs. 2-tailed).
- Also useful when no theoretical distribution exists for the statistic, or when the design is non-standard.
- This is frequently the case in E&E (ecology and evolution) studies.
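For regression and correlation, "shuffle Y vs. X" means permuting one variable while holding the other fixed. The sketch below assumes NumPy and made-up data, uses Pearson's r as the statistic, and exposes the 1-tailed versus 2-tailed choice as a flag; `permutation_corr` is a hypothetical helper name.

```python
import numpy as np

def permutation_corr(x, y, n_perm=999, two_tailed=True, seed=0):
    """Randomization test for a correlation: shuffle y relative to x."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    r_obs = np.corrcoef(x, y)[0, 1]
    r_rand = np.array([np.corrcoef(x, rng.permutation(y))[0, 1]
                       for _ in range(n_perm)])
    if two_tailed:
        extreme = np.abs(r_rand) >= abs(r_obs)   # either direction counts
    else:
        extreme = r_rand >= r_obs                # only positive association counts
    p_value = (extreme.sum() + 1) / (n_perm + 1)
    return r_obs, p_value

# Made-up data: y increases with x, plus a small deterministic wobble
x = np.arange(12, dtype=float)
y = 2.0 * x + np.sin(x)

r_obs, p_two = permutation_corr(x, y, two_tailed=True)
_, p_one = permutation_corr(x, y, two_tailed=False)
print(r_obs, p_two, p_one)
```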

Step four: graphical depiction of results
- A strength of the landmark-based TPS approach.
- Can view the deformation of the TPS grid among groups or along a continuous variable.

Superimposition

Effect of relative intestinal length (IL/SL), a measure of trophic level
[Figure panels: long IL/SL = 3.0; short IL/SL = 0.72.]

Effect of gradient on shape in mountain sucker
[Figure panels: low gradient; high gradient.]