Published by Darren Flynn. Modified over 6 years ago.
1
Multivariate Data Analysis of Biological Data/Biometrics Ryan McEwan
Department of Biology University of Dayton
2
Biological phenomena are very often “multivariate” in nature, meaning that many different variables must be considered simultaneously to understand the system.
3
A classic example is a species by plot matrix (see below)
4
For example, suppose you wanted to compare the animal/insect/plant communities between burned and unburned grassland.
5
You might install plots like this.
Then measure the plants/animals/microbes, etc., within the plots. Then you want to compare them… But… the data set…
7
You may also be interested in environmental parameters…measured in the same plots…
8
Traditional Statistical Approaches Just Won't Do!
9
How to proceed? (a) Display the compositional information
10
(a) Display the compositional information
11
(a) Display the compositional information
Note these are summary values!
12
(a) Display the compositional information (could be environmental data)
13
(a) Display the compositional information (stacked charts could be good)
14
How to proceed? (a) Display the compositional information (b) Multiple regression
15
Multiple Regression
16
Simple linear regression is a way of understanding the relationship between two variables where the data analyst assumes that one variable (the predictor, or independent variable) drives a second variable (the response, or dependent variable). This is extremely useful, and yet in most biological situations any given response variable is likely to be determined by more than a single predictor. In this case, wing length is related to age, but you can imagine that nutritional status or gender could be important as well.
17
Here is aboveground biomass (AGB; Y axis) in a forest
and stem density in that forest. You see a relationship, but a messy one. Maybe adding other variables would help explain AGB. How about soil nitrogen? How about species diversity? How about mean temperature at each point?
19
In biology you may be collecting a slew of values that might serve as predictors for a potential response.
20
Consider a correlation matrix!!
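A correlation matrix can be sketched in a few lines. The example below is only an illustration: the variable names (`herb_cover`, `light`, `soil_n`) and all of the plot values are made up for the demonstration and do not come from the slides.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def correlation_matrix(variables):
    """Return a nested dict: matrix[a][b] is the correlation of a with b."""
    names = list(variables)
    return {a: {b: pearson(variables[a], variables[b]) for b in names}
            for a in names}

# Hypothetical measurements from eight plots (invented values).
plots = {
    "herb_cover": [17.5, 6.5, 18.3, 13.7, 34.2, 20.8, 35.4, 33.6],
    "light":      [1, 2, 3, 4, 5, 6, 7, 8],
    "soil_n":     [5, 1, 4, 2, 8, 3, 7, 6],
}

matrix = correlation_matrix(plots)
for name, row in matrix.items():
    print(name, [round(r, 2) for r in row.values()])
```

Scanning the rows shows which candidate predictors track the response, and which predictors track each other.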
22
Herbaceous cover = [predictor]
23
You are building a model!!
Herbaceous cover = [predictor] + [predictor]
24
Herbaceous cover = [predictor] + [predictor] + [predictor] + [predictor]
25
Multiple regression is a process of figuring out statistically which suite of variables best predicts a particular response… Okay, how do you proceed? Herbaceous cover = [predictor] + [predictor] + [predictor] + [predictor]
26
Forward selection:
(1) Select the variable that forms the best regression relationship with the response variable. (2) Add each of the remaining variables in the pool, in a stepwise fashion, to find the best relationship, throwing the weaker ones back into the pool. (3) Repeat step 2 until adding variables no longer makes a stronger relationship. Herbaceous cover = [predictor] + [predictor] + [predictor]
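The forward-selection steps can be sketched in plain Python. This is only an illustration under stated assumptions: the slides do not say how a "stronger relationship" is measured, so adjusted R² is used here as one possible choice. The helper names, the predictor names (`light`, `soil_n`, `litter`) and the data are all hypothetical.

```python
def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x

def adjusted_r2(columns, y):
    """Fit y ~ intercept + columns by least squares; return adjusted R^2."""
    n, k = len(y), len(columns)
    design = [[1.0] + [col[i] for col in columns] for i in range(n)]
    p = k + 1
    xtx = [[sum(row[a] * row[b] for row in design) for b in range(p)]
           for a in range(p)]
    xty = [sum(row[a] * yi for row, yi in zip(design, y)) for a in range(p)]
    beta = solve_linear(xtx, xty)
    fitted = [sum(bj * xj for bj, xj in zip(beta, row)) for row in design]
    mean_y = sum(y) / n
    rss = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    tss = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - (rss / (n - k - 1)) / (tss / (n - 1))

def forward_select(predictors, y):
    """Add the best remaining predictor each round; stop when nothing helps."""
    chosen, best = [], float("-inf")
    pool = dict(predictors)
    while pool:
        scores = {name: adjusted_r2([predictors[c] for c in chosen] + [col], y)
                  for name, col in pool.items()}
        name = max(scores, key=scores.get)
        if scores[name] <= best:   # no candidate strengthens the model
            break
        best = scores[name]
        chosen.append(name)
        del pool[name]
    return chosen, best

# Hypothetical data from eight plots (invented values).
predictors = {
    "light":  [1, 2, 3, 4, 5, 6, 7, 8],
    "soil_n": [5, 1, 4, 2, 8, 3, 7, 6],
    "litter": [3, 7, 2, 8, 1, 6, 4, 5],
}
herb_cover = [17.5, 6.5, 18.3, 13.7, 34.2, 20.8, 35.4, 33.6]

chosen, best = forward_select(predictors, herb_cover)
print(chosen, round(best, 4))
```

Other stopping criteria (a p-value threshold, AIC) are also in common use, and swapping one in only changes the scoring function.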
27
Backward selection:
(1) Start with all variables in the model. (2) Eject each one in turn and test the relationship. (3) Throw back into the pool the variable(s) that weaken, or fail to strengthen, the relationship. Herbaceous cover = [predictor] + [predictor] + [predictor]
28
Backward selection:
(1) Start with all variables in the model. (2) Eject each one in turn and test the relationship. (3) Throw back into the pool the variable(s) that weaken, or fail to strengthen, the relationship. Herbaceous cover = [predictor] + [predictor] + [predictor] + [predictor] + [predictor]
29
A few more things to cover: How to evaluate models?
What about correlated variables? What about categorical variables? Herbaceous cover = [predictor] + [predictor] + [predictor] + [predictor] + [predictor]
30
A few more things to cover: How to evaluate models?
Herbaceous cover = [predictor] + [predictor] + [predictor] + [predictor] + [predictor] vs. Herbaceous cover = [predictor] + [predictor] + [predictor]
31
A few more things to cover: How to evaluate models?
P-value, R², Akaike Information Criterion (AIC). Herbaceous cover = [predictor] + [predictor] + [predictor] + [predictor] + [predictor] vs. Herbaceous cover = [predictor] + [predictor] + [predictor]
32
A few more things to cover: How to evaluate models?
P-value, R², Akaike Information Criterion (AIC). AIC is a way of comparing the information content of different models. It does not provide a statistical test, per se, but rather provides a quantitative way to assess model fit vs. model complexity. Among candidate models fit to the same data, the best model is the one with the lowest AIC.
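The fit-versus-complexity trade-off can be made concrete with the usual least-squares form of AIC, n·ln(RSS/n) + 2k, which matches the likelihood-based AIC up to a constant shared by all models fit to the same data. The residual sums of squares and sample size below are invented for illustration; they are not results from the slides.

```python
import math

def aic_least_squares(rss, n, k):
    """AIC for a least-squares fit, up to an additive constant.

    rss: residual sum of squares, n: number of observations,
    k: number of estimated coefficients (slopes plus intercept).
    """
    return n * math.log(rss / n) + 2 * k

n_plots = 30  # hypothetical sample size

# Hypothetical fits: the five-predictor model fits slightly better (lower RSS)
# but pays a larger complexity penalty than the three-predictor model.
aic_five = aic_least_squares(rss=410.0, n=n_plots, k=6)
aic_three = aic_least_squares(rss=425.0, n=n_plots, k=4)

print(round(aic_five, 2), round(aic_three, 2))
print("preferred:", "three predictors" if aic_three < aic_five else "five predictors")
```

Only differences in AIC between models fit to the same observations are meaningful; the absolute values carry no interpretation on their own.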
33
A few more things to cover: How to evaluate models?
What about correlated variables? What about categorical variables? Herbaceous cover = [predictor] + [predictor] + [predictor] + [predictor] + [predictor]
35
A few more things to cover: How to evaluate models?
What about correlated variables? What about categorical variables? Strongly correlated variables effectively contain the same information, and thus should not be inserted into the same model. The data analyst needs to assess “multicollinearity” among the variables in the model. One simple way to think about it is the correlation matrix. Formally, a model-building procedure generally includes calculating “Variance Inflation Factors” (VIF) and ejecting from the model one of any two variables that are highly correlated. Herbaceous cover = [predictor] + [predictor] + [predictor] + [predictor] + [predictor]
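For reference, the standard definition of the variance inflation factor for predictor j is given here. The symbol R_j² is notation introduced for this note, not taken from the slides: it is the R² obtained by regressing predictor j on all of the other predictors in the model.

```latex
\[
  \mathrm{VIF}_j = \frac{1}{1 - R_j^{2}}
\]
```

A predictor unrelated to the others has a VIF of 1, and the value grows without bound as R_j² approaches 1. Cutoffs of roughly 5 to 10 are often quoted as rules of thumb, but they are conventions rather than formal tests.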
36
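A few more things to cover: How to evaluate models?
What about correlated variables? What about categorical variables? Multiple regression models CAN incorporate yes/no (binary) predictors, or even categorical predictors with several levels, typically by coding them as indicator (dummy) variables. (Logistic regression is the related tool for when the response itself is yes/no.) Herbaceous cover = [predictor] + [predictor] + [predictor] + [predictor] + [predictor]. Categorical examples: H vs. M vs. N invaded; Burned vs. Unburned.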
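One common way to put a categorical predictor into a regression is indicator (dummy) coding, sketched below. The function name, the plot values and the choice of "N" as the reference level are assumptions made for illustration; the slide gives only the level labels H, M and N.

```python
def dummy_code(values, reference):
    """Turn one categorical column into 0/1 indicator columns.

    The reference level gets no column of its own; a plot at the reference
    level is coded 0 in every indicator column.
    """
    levels = sorted(set(values) - {reference})
    return {f"is_{level}": [1 if v == level else 0 for v in values]
            for level in levels}

# Hypothetical invasion class for five plots, using the slide's H/M/N labels.
invasion = ["H", "M", "N", "H", "N"]
coded = dummy_code(invasion, reference="N")
print(coded)

# A two-level factor such as Burned vs. Unburned needs a single 0/1 column.
burn = ["Burned", "Unburned", "Burned", "Unburned", "Burned"]
burned = dummy_code(burn, reference="Unburned")
print(burned)
```

Each indicator column then enters the model like any other predictor, and its coefficient is read as the difference from the reference level.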
37
How to proceed? (a) Display the compositional information (b) Multiple regression (c) Ordination
38
Ordination
39
Non-metric multidimensional scaling
NMS, or NMDS, is the analysis approach du jour, partly because analyses are fashionable and folks are just jumping in line, but mostly because the approach does not require multivariate normality and it can handle zeros in the matrix relatively well. As you might imagine, many matrices have a LOT of zeros! There are a few things you need to know: (1) NMS involves a randomization/resampling procedure. (2) The number of axes is set by the process. (3) The axis that explains the most variation might not be the 1st axis. (4) You CAN assess environmental factors, but that is a wholly separate 2nd step…not the same as CCA. (5) Instead of p-values, here you want to minimize stress and instability.
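To give a sense of what "stress" measures, here is a minimal sketch of Kruskal's stress (formula 1) for a single candidate ordination. It is not a full NMS implementation: it assumes the pairwise dissimilarities and the pairwise distances in the ordination have already been computed, it ignores ties, and the function names and numbers are hypothetical. Software may report stress on a different scale.

```python
import math

def isotonic_fit(values):
    """Best non-decreasing fit to a sequence (pool-adjacent-violators)."""
    blocks = []  # each block holds [sum, count] of the values pooled so far
    for v in values:
        blocks.append([v, 1])
        # Merge backwards while the previous block's mean exceeds this one's.
        while (len(blocks) > 1 and
               blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]):
            s, c = blocks.pop()
            blocks[-1][0] += s
            blocks[-1][1] += c
    fitted = []
    for s, c in blocks:
        fitted.extend([s / c] * c)
    return fitted

def kruskal_stress(dissimilarities, distances):
    """Stress-1: how far ordination distances depart from a monotone
    relationship with the original dissimilarities (0 = perfect rank match)."""
    order = sorted(range(len(dissimilarities)), key=lambda i: dissimilarities[i])
    d = [distances[i] for i in order]
    d_hat = isotonic_fit(d)
    numerator = sum((a - b) ** 2 for a, b in zip(d, d_hat))
    denominator = sum(a ** 2 for a in d)
    return math.sqrt(numerator / denominator)

# Three plot pairs: original dissimilarities and two candidate ordinations.
dissim = [0.2, 0.5, 0.9]
print(kruskal_stress(dissim, [1.0, 2.0, 3.5]))  # rank order preserved
print(kruskal_stress(dissim, [1.0, 3.0, 2.0]))  # rank order violated
```

An NMS run repeatedly moves the points to lower this quantity, which is why stress, rather than a p-value, is the figure of merit.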
43
How to proceed? (a) Display the compositional information (b) Multiple regression (c) Ordination (d) Cluster analysis
44
Cluster Analysis
45
Figure 2. Sørensen, UPGMA
46
Figure 5. Correlation, Farthest Neighbor
47
Figure 6. Sørensen, Farthest Neighbor
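The figure labels pair a distance measure with a linkage method. As an illustration of the distance half, here is a sketch of the quantitative Sørensen (Bray-Curtis) distance between plots. The plot names and abundances are invented, and the linkage step (UPGMA or farthest neighbor) is not shown.

```python
def sorensen_distance(a, b):
    """Quantitative Sorensen (Bray-Curtis) distance between two plots.

    a and b are species abundances in the same species order.
    Returns 0 for identical plots and 1 for plots sharing no species.
    """
    difference = sum(abs(x - y) for x, y in zip(a, b))
    total = sum(a) + sum(b)
    return difference / total if total else 0.0

# Hypothetical species-by-plot data: three plots, three species.
plots = {
    "plot_1": [1, 0, 3],
    "plot_2": [1, 2, 0],
    "plot_3": [0, 0, 4],
}

names = list(plots)
for i, first in enumerate(names):
    for second in names[i + 1:]:
        print(first, second, round(sorensen_distance(plots[first], plots[second]), 3))
```

A clustering routine would take this pairwise distance matrix as input and merge plots according to the chosen linkage rule.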