Step three: statistical analyses to test biological hypotheses General protocol continued
Biological hypotheses and statistical tests Hypotheses driven by Biology Statistics depend on data and hypotheses NO NEW STATISTICAL TOOLS ARE NEEDED FOR MORPHOMETRICS!! Exploratory hypotheses: relative position of specimens in data space; relationships among specimens in data space Confirmatory hypotheses: compare groups, associate shape with other variables, etc.
Some hypotheses (shape related) How do populations and species differ? Does the observed variation generate a predictable pattern? Are there additional factors (ecological, evolutionary) correlated with variation? How does shared evolutionary history affect the observed patterns?
Hypotheses as statistical tests
Do populations differ? MANOVA, CVA
Is there a predictable pattern? PCA, UPGMA
Correlated factors? Regression, 2B-PLS
Effect of phylogeny? Comparative Method
Exploratory data analysis Investigate data using only Y-matrix of shape variables (PWScores + U1,U2) Specimens are points in high-dimensional data space Look for patterns and distributions of points Generate summary plot of data space (ordination) Look for relationships of points (clustering)
Ordination and dimension reduction Visualize high dimensional data space as succinctly as possible Describe variation in original data with new set of variables (typically orthogonal vectors) Order new variables by variation explained (most – least) Plot first few dimensions to summarize data Principal Components Analysis (PCA) one approach (others include: PCoA, MDS, CA, etc.)
PCA: what does it do? Rotates data so that main axis of variation (PC1) is horizontal Subsequent PC axes are orthogonal to PC1, and are ordered to explain sequentially less variation The goal is to explain more variation in fewer dimensions
PCA: interpretations Eigenvectors are linear combinations of original variables (interpreted by PC loadings of each variable) PCA PRESERVES EUCLIDEAN DISTANCES among objects PCA does NOTHING to the data, except rotate it to axes expressing the most variation; it loses NO INFORMATION (if all PC vectors retained) If the original variables are uncorrelated, PCA not helpful in reducing dimensionality of data PCA does not find a particular factor (e.g., group differences, allometry): it identifies the direction of most variation, which may be interpretable as a factor (but may not)
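The rotation view of PCA can be sketched for the simplest case of two variables. This is a minimal illustration, not a general PCA routine: it assumes 2-D data so the principal axis angle has a closed form, and the function name `pca_2d` and the example coordinates are hypothetical. It shows that the scores are a rigid rotation of the centered data, so PC1 carries the most variance, the PC scores are uncorrelated, and Euclidean distances among specimens are unchanged.

```python
import math

def pca_2d(points):
    """Rotate 2-D points onto their principal axes (closed form for a 2x2 covariance)."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    centered = [(x - mx, y - my) for x, y in points]
    # Entries of the sample variance-covariance matrix.
    sxx = sum(x * x for x, _ in centered) / (n - 1)
    syy = sum(y * y for _, y in centered) / (n - 1)
    sxy = sum(x * y for x, y in centered) / (n - 1)
    # Angle of the direction of greatest variance (PC1).
    theta = 0.5 * math.atan2(2 * sxy, sxx - syy)
    c, s = math.cos(theta), math.sin(theta)
    # Project onto PC1 = (c, s) and PC2 = (-s, c): a pure rotation.
    scores = [(c * x + s * y, -s * x + c * y) for x, y in centered]
    return scores, theta

# Hypothetical data: two correlated variables measured on five specimens.
pts = [(0.0, 0.0), (1.0, 1.1), (2.0, 1.9), (3.0, 3.2), (4.0, 3.9)]
scores, theta = pca_2d(pts)
# Distance between the first and last specimen, before and after rotation.
print(math.dist(pts[0], pts[4]), math.dist(scores[0], scores[4]))
```

With more than two variables the same rotation is obtained from the eigenvectors of the variance-covariance matrix, which needs a linear algebra library rather than this closed form.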
Example: leatherside chub
Clustering Data are dots in a high-dimensional space (Y-matrix) Can we connect the dots into groupings, where clusters represent groups of similar specimens? Cluster methods generate 1-dimensional view of relationships, based on some criterion Clustering requires distance (or similarity) between points MANY different criteria Clustering is algorithmic, not algebraic (i.e., it is a procedure, or set of rules for connecting data)
Clustering: UPGMA
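As a procedure, UPGMA can be sketched in a few lines: repeatedly join the two clusters with the smallest average pairwise distance between their members. The sketch below is illustrative only; the function name `upgma`, the specimen labels, and the distances are hypothetical, and it recomputes averages from the original distances for clarity rather than speed.

```python
from itertools import combinations

def upgma(labels, d):
    """Return the merge sequence as (cluster_1, cluster_2, average_distance).

    d maps frozenset({a, b}) to the distance between specimens a and b.
    """
    clusters = [(lab,) for lab in labels]
    merges = []

    def avg(c1, c2):
        # Unweighted average of all between-cluster specimen distances.
        total = sum(d[frozenset((a, b))] for a in c1 for b in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > 1:
        c1, c2 = min(combinations(clusters, 2), key=lambda pair: avg(*pair))
        merges.append((c1, c2, avg(c1, c2)))
        clusters = [c for c in clusters if c not in (c1, c2)] + [c1 + c2]
    return merges

# Hypothetical distances among four specimens.
dist = {
    frozenset("AB"): 2.0, frozenset("CD"): 4.0,
    frozenset("AC"): 8.0, frozenset("AD"): 8.0,
    frozenset("BC"): 8.0, frozenset("BD"): 8.0,
}
for c1, c2, height in upgma(["A", "B", "C", "D"], dist):
    print(c1, c2, height)
```

The reported value is the average distance at which two clusters join; dendrogram software often draws the node at half that distance.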
Conclusions: exploratory methods Useful tools for summarizing shape variation Help you understand your data through visualizing variation (both ordination plots and cluster diagrams) Help describe relationships among specimens in terms of overall similarity
Confirmatory data analysis Investigate data using shape variables (Y-matrix) and other (independent) variables (X-matrix) Test for patterns of shape variation Independent variables determine type of statistical test
Types of independent variables Categorical: variables delineating groups of specimens (e.g., male/female, species, etc.) Continuous: variables on a continuous scale (e.g., size, moisture, age, etc.) Different statistical methods for each
Some statistical tests
Categorical, shape differences among groups: MANOVA
Continuous, relationship of variables and shape: Mult. Regression
Continuous, association of variables and shape: 2B-PLS (2-Block Partial Least Squares)
MANOVA and multivariate regression are both GLM statistics (General Linear Models)
Group differences: MANOVA Is there a difference in shape between groups? Multivariate generalization of ANOVA Compares variation within groups to variation between groups Significant MANOVA: Group means are different in shape
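The comparison of within-group to between-group variation can be made concrete with Wilks' Lambda, computed as det(W) / det(T), where W is the pooled within-group SSCP matrix and T is the total SSCP matrix. The sketch below assumes only two shape variables so that determinants can be written out by hand; the function names and data are hypothetical, and it returns the statistic only, without a significance test.

```python
def mean_2d(rows):
    n = len(rows)
    return (sum(r[0] for r in rows) / n, sum(r[1] for r in rows) / n)

def sscp_2d(rows, center):
    """Sums of squares and cross-products about a center: (sxx, syy, sxy)."""
    sxx = sum((r[0] - center[0]) ** 2 for r in rows)
    syy = sum((r[1] - center[1]) ** 2 for r in rows)
    sxy = sum((r[0] - center[0]) * (r[1] - center[1]) for r in rows)
    return sxx, syy, sxy

def wilks_lambda_2d(groups):
    """Wilks' Lambda = det(W) / det(T) for two variables."""
    all_rows = [r for g in groups for r in g]
    txx, tyy, txy = sscp_2d(all_rows, mean_2d(all_rows))  # total SSCP
    wxx = wyy = wxy = 0.0
    for g in groups:  # pooled within-group SSCP
        gxx, gyy, gxy = sscp_2d(g, mean_2d(g))
        wxx, wyy, wxy = wxx + gxx, wyy + gyy, wxy + gxy
    return (wxx * wyy - wxy ** 2) / (txx * tyy - txy ** 2)

# Hypothetical groups: the second is the first shifted far away in both variables.
group_a = [(1.0, 2.0), (2.0, 3.0), (3.0, 1.0)]
group_b = [(11.0, 12.0), (12.0, 13.0), (13.0, 11.0)]
print(wilks_lambda_2d([group_a, group_b]))
```

Values near 0 mean that most variation lies between groups; a value of 1 means the group means coincide.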
Example: Utah chub, MANOVA on RW1-RW30 [Table: sources Sex, Loc, Sex × Loc, IL/SL, Size; test statistic Wilks' Lambda for each source; the P-values that survive are <.0001.]
MANOVA: post hoc tests Pairwise comparisons using Generalized Mahalanobis Distance (D² or D) Convert D² → T² → F to test For experiment-wise error rate, adjust using Bonferroni: α_exp = α / (number of comparisons)
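The conversion chain and the Bonferroni adjustment are simple arithmetic. The sketch below uses the standard two-sample relations T² = [n1·n2 / (n1 + n2)]·D² and F = [(n1 + n2 − p − 1) / (p·(n1 + n2 − 2))]·T², with p and n1 + n2 − p − 1 degrees of freedom, where p is the number of shape variables. The function names and the input values are hypothetical; obtaining a P-value from F would need an F distribution from a statistics library and is left out.

```python
def d2_to_f(d2, n1, n2, p):
    """Convert a squared Mahalanobis distance between two group means to T^2 and F."""
    t2 = (n1 * n2) / (n1 + n2) * d2
    df1 = p
    df2 = n1 + n2 - p - 1
    f = df2 / (p * (n1 + n2 - 2)) * t2
    return t2, f, df1, df2

def bonferroni_alpha(alpha, n_groups):
    """Per-comparison alpha when all pairs of groups are compared."""
    n_comparisons = n_groups * (n_groups - 1) // 2
    return alpha / n_comparisons

# Hypothetical values: D^2 = 4 between two groups of 10 specimens, 2 shape variables.
t2, f, df1, df2 = d2_to_f(4.0, 10, 10, 2)
print(t2, f, df1, df2)
# Three groups give three pairwise comparisons.
print(bonferroni_alpha(0.05, 3))
```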
Discriminant analysis: CVA & DFA Combination of MANOVA and PCA Tests for group differences (MANOVA) PCA of among-group variation relative to within-group variation Suggests which groups differ on which variables Can classify specimens to groups Special case: 2 groups= discriminant function analysis (DFA)
DFA/CVA: post-hoc tests For DFA/CVA, compare difference among groups using Generalized Mahalanobis Distance (D²) Mahalanobis D² is logical choice because CVA/DFA is MANOVA, and the PCA is relative to within-group variability (i.e., VCV standardized) Convert D² → T² → F to perform statistical test Experiment-wise error rate adjusted as before (i.e., adjusted α)
Continuous variation: regression Is there a relationship between shape and some other variable? Multivariate regression of shape on continuous variable Significant regression implies shape changes as a function of other variable (e.g., size)
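With a single continuous predictor such as size, the coefficients of a multivariate regression are the ordinary least-squares slope and intercept of each shape variable on that predictor, computed column by column. The sketch below estimates those coefficients and predicts the expected shape at a new size. The function names and data are hypothetical, and the multivariate significance test (e.g., Wilks' Lambda) is not included.

```python
def regress_shape_on_x(x, shape):
    """Least-squares intercept and slope for each shape variable on x.

    shape is a list of specimens, each a list of shape variables.
    """
    n = len(x)
    x_mean = sum(x) / n
    ss_x = sum((xi - x_mean) ** 2 for xi in x)
    intercepts, slopes = [], []
    for j in range(len(shape[0])):
        y = [row[j] for row in shape]
        y_mean = sum(y) / n
        slope = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y)) / ss_x
        slopes.append(slope)
        intercepts.append(y_mean - slope * x_mean)
    return intercepts, slopes

def predict_shape(intercepts, slopes, x_new):
    """Expected value of every shape variable at x_new."""
    return [a + b * x_new for a, b in zip(intercepts, slopes)]

# Hypothetical data: size and two shape variables for four specimens.
size = [1.0, 2.0, 3.0, 4.0]
shape = [[3.1, 2.0], [4.9, 1.1], [7.2, -0.1], [8.8, -1.0]]
intercepts, slopes = regress_shape_on_x(size, shape)
print(predict_shape(intercepts, slopes, 5.0))
```

Predicted shapes at small and large values of the predictor are what would then be visualized as deformation grids.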
Example of shape on size in mountain sucker [Output table: multivariate tests of significance (Wilks' Lambda, Pillai's trace, Hotelling-Lawley trace, Roy's maximum root) with columns Statistic, Value, Fs, df1, df2, Prob, followed by the test that the kth root and those that follow are zero (columns k, U, Fs, df1, df2, Prob). Only the exponent of each probability, E-078, is legible.]
Continuous variation: association 2B-PLS Is there an association between shape and some other set of variables (not causal)? Find pairs of linear combinations for X & Y that maximize the covariation between data sets Linear combinations are constrained to be orthogonal within each set (like PC axes) but NOT between data sets Calculations less complicated for 2B-PLS (because fewer mathematical constraints) Analogous to multivariate correlation 2B-PLS is called SINGULAR WARPS when shape is one or more of the data sets (Bookstein et al., 2003: J. of Hum. Evol.)
Resampling methods Methods that take many samples from original data set in some specified way and evaluate the significance of the original based on these samples Resampling approaches are nonparametric, because they do not depend on theoretical distributions for significance testing (they generate a distribution from the data) Are very flexible, and can allow for complicated designs Very useful in morphometrics, and can be used for: Testing standard designs Testing non-standard designs Testing when sample sizes small relative to # of variables
Randomization (permutation) Proposed by Fisher (1935) for assessing significance of 2-sample comparison (Fisher's exact test) Fisher's exact test: a total enumeration of possible pairings of data Randomization can be used to assess the significance of almost any test statistic Protocol: (1) Calculate observed statistic (e.g., T-statistic): E obs (2) Reorder data set (i.e., randomly shuffle data) and recalculate statistic: E rand (3) Repeat many times to generate distribution of statistic (4) Percentage of E rand as or more extreme than E obs is significance level
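The protocol can be sketched for a two-group comparison of a single variable. This is a minimal illustration under stated assumptions: the statistic is the absolute difference in group means (any other statistic could be substituted), the observed value is counted as one member of the random distribution (a common convention), and the function name, data, and number of permutations are hypothetical.

```python
import random

def permutation_test(group_a, group_b, n_perm=9999, seed=0):
    """Two-group randomization test on the absolute difference in means."""
    rng = random.Random(seed)

    def stat(a, b):
        return abs(sum(a) / len(a) - sum(b) / len(b))

    e_obs = stat(group_a, group_b)          # step 1: observed statistic
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    count = 1                               # observed value counts as one outcome
    for _ in range(n_perm):                 # step 3: repeat many times
        rng.shuffle(pooled)                 # step 2: shuffle group membership
        e_rand = stat(pooled[:n_a], pooled[n_a:])
        if e_rand >= e_obs - 1e-12:         # tolerance for floating-point ties
            count += 1
    return e_obs, count / (n_perm + 1)      # step 4: proportion as extreme

# Hypothetical measurements for two groups of five specimens.
e_obs, p_value = permutation_test([1, 2, 3, 4, 5], [11, 12, 13, 14, 15])
print(e_obs, p_value)
```

For multivariate shape data the same loop applies with rows of the Y-matrix shuffled among groups and a multivariate statistic in place of the difference in means.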
Randomization: comments Randomization EXTREMELY useful and flexible technique How and what to resample depends upon data and hypothesis Regression and correlation: shuffle Y vs. X Group comparison (e.g., ANOVA): shuffle Y on groups Some tests (e.g., t-test) may depend on direction (1-tailed vs. 2-tailed) Also useful when no theoretical distribution exists for statistic, or when design is non-standard This is frequently the case in E&E studies
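For regression and correlation, shuffling Y relative to X breaks any association while keeping both sets of values intact. The sketch below does this for a Pearson correlation and exposes the 1-tailed vs. 2-tailed choice as a parameter. The function names, data, and number of permutations are hypothetical, and counting the observed value as one permutation is an assumed convention.

```python
import math
import random

def pearson_r(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def permutation_correlation(x, y, n_perm=999, seed=0, two_tailed=True):
    """Randomization test of a correlation: shuffle Y against a fixed X."""
    rng = random.Random(seed)
    r_obs = pearson_r(x, y)
    y_shuffled = list(y)
    count = 1  # observed value counts as one outcome
    for _ in range(n_perm):
        rng.shuffle(y_shuffled)
        r_rand = pearson_r(x, y_shuffled)
        if two_tailed:
            extreme = abs(r_rand) >= abs(r_obs) - 1e-12
        else:
            # 1-tailed: only correlations at least as positive as observed.
            extreme = r_rand >= r_obs - 1e-12
        if extreme:
            count += 1
    return r_obs, count / (n_perm + 1)

# Hypothetical data: size and one shape variable for eight specimens.
size = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
shape_var = [1.2, 1.9, 3.1, 4.2, 4.8, 6.1, 7.0, 8.1]
print(permutation_correlation(size, shape_var))
```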
Step four: Graphical depiction of results Strength of landmark-based TPS approach Can view deformation of TPS grid among groups or with continuous variable
Superimposition
Effect of relative intestinal length (IL/SL): measure of trophic level Long: IL/SL = 3.0 Short: IL/SL = 0.72
Effect of gradient on shape in mountain sucker Low High