1
Principal Component Analysis
Erek Dyskant
Physiological Ecology, Swarthmore College
2
The Problem
In biological research we tend to isolate one variable's effect on another. However, the goal in ecology is, by definition, to study systems: how many variables together contribute to an end result. For example, in Bio 2 we studied how the diameter of gallfly galls predicts larval survival. What about density? Skin thickness? Roundness? If these variables are correlated, we could find a single value that captures what they have in common.
3
Component analysis
Imagine a 3D scatter plot of your data. Rotate the graph until the 2D projection shows as much variance as possible, then take a linear best-fit line of that projection. Strictly speaking, PCA is a linear transformation of the data (a rotation of the axes). While this method can only be visualized in up to 3 dimensions, the logic holds for n dimensions.
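The rotation described above can be sketched numerically. This is a minimal illustration, not the procedure of any particular statistics package: it assumes NumPy, runs the analysis on the correlation matrix, and uses synthetic data in which three variables share one hidden driver. The function name `pca` and the data are made up for the example.

```python
import numpy as np

def pca(data):
    """PCA on the correlation matrix (rows = observations, columns = variables)."""
    # Standardize each variable to mean 0 and variance 1.
    z = (data - data.mean(axis=0)) / data.std(axis=0, ddof=1)
    # Covariance of standardized data is the correlation matrix.
    corr = np.cov(z, rowvar=False)
    # eigh handles symmetric matrices and returns eigenvalues in ascending order.
    eigenvalues, eigenvectors = np.linalg.eigh(corr)
    order = np.argsort(eigenvalues)[::-1]  # largest variance first
    eigenvalues = eigenvalues[order]
    eigenvectors = eigenvectors[:, order]
    # Rotate the standardized data onto the new axes (PC1, PC2, ...).
    scores = z @ eigenvectors
    return eigenvalues, eigenvectors, scores

# Synthetic data: three variables that all follow one shared signal plus noise.
rng = np.random.default_rng(0)
shared = rng.normal(size=200)
data = np.column_stack([shared + 0.3 * rng.normal(size=200) for _ in range(3)])

eigenvalues, eigenvectors, scores = pca(data)
print("eigenvalues:", np.round(eigenvalues, 3))
```

The first column of `scores` is the data projected onto PC1, the direction along which the rotated data varies the most.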
4
Interpreting components
PCA separates each factor into its own component. The principal (most important) component is called PC1. Each successive, less important component is called PC2, PC3, etc. When the analysis is run on the correlation matrix, the variance of each variable is set to 1.
5
Interpreting components
Eigenvalue is the variance of that component. Since the variance of each original variable is set to 1, it tells you how many variables' worth of variance the component incorporates. Percent is the percentage of total variance incorporated by the component. Cum Percent is the cumulative percentage for the component and all of the more important components preceding it.
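As a small worked illustration of the three columns, the sketch below computes Percent and Cum Percent from a list of eigenvalues. The eigenvalues are hypothetical (chosen to sum to 3, as they would for three standardized variables) and the function name `variance_table` is made up for the example.

```python
def variance_table(eigenvalues):
    """Return (name, eigenvalue, percent, cumulative percent) for each component."""
    total = sum(eigenvalues)  # equals the number of variables for a correlation matrix
    rows = []
    cumulative = 0.0
    for i, ev in enumerate(sorted(eigenvalues, reverse=True), start=1):
        percent = 100.0 * ev / total
        cumulative += percent
        rows.append((f"PC{i}", ev, percent, cumulative))
    return rows

# Hypothetical eigenvalues for a 3-variable analysis (not from any real data set).
example_eigenvalues = [2.4, 0.45, 0.15]
rows = variance_table(example_eigenvalues)
for name, ev, pct, cum in rows:
    print(f"{name}  eigenvalue={ev:.2f}  percent={pct:.1f}  cum percent={cum:.1f}")
```

Here PC1 carries 2.4 variables' worth of variance, which is 80% of the total; PC2 adds 15% and PC3 the remaining 5%.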
6
How many variables to retain?
The purpose of PCA is to reduce the number of variables, but we don't want to do this at the expense of removing useful information. Each successive component accounts for less variability, so ideally we want to stop extracting components when only a little random variability is left. The decision is essentially arbitrary; however, two guidelines are commonly used.
7
Kaiser Criterion
Proposed by Kaiser (1960). Retain only the factors with an eigenvalue greater than one. Essentially, we're saying that we're only interested in a component if it incorporates at least as much variance as was present in one original variable. This criterion is the most commonly used in ecology.
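The rule is simple enough to state in a few lines of code. This is a sketch under the assumption that the eigenvalues come from a correlation-matrix PCA; the eigenvalues shown are hypothetical and `kaiser_retain` is a name invented for the example.

```python
def kaiser_retain(eigenvalues, threshold=1.0):
    """Return the 1-based indices of components whose eigenvalue exceeds the threshold."""
    ordered = sorted(eigenvalues, reverse=True)
    return [i for i, ev in enumerate(ordered, start=1) if ev > threshold]

# Hypothetical eigenvalues for a 4-variable analysis (they sum to 4).
kept = kaiser_retain([2.2, 1.1, 0.5, 0.2])
print("retain components:", kept)  # PC1 and PC2
```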
8
Scree Test
Proposed by Cattell (1966). Plot the eigenvalues as a function of their index. Find the point where the graph appears to level off into a linear descent, and use only the factors to the left of that point. Fun fact: scree is the rocky debris that accumulates at the base of a steep hill.
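A scree plot can be drawn with any plotting tool; the sketch below uses plain text bars so it runs without extra libraries. The eigenvalues are hypothetical and `text_scree_plot` is a made-up helper. Picking the point where the plot levels off is left to the reader's eye, as in the test itself.

```python
def text_scree_plot(eigenvalues, width=40):
    """Return one text line per component, with bar length proportional to the eigenvalue."""
    ordered = sorted(eigenvalues, reverse=True)
    largest = ordered[0]
    lines = []
    for index, ev in enumerate(ordered, start=1):
        bar = "#" * max(1, round(width * ev / largest))
        lines.append(f"{index:>2} | {bar} {ev:.2f}")
    return lines

# Hypothetical eigenvalues for a 6-variable analysis (they sum to 6).
lines = text_scree_plot([3.1, 1.6, 0.5, 0.4, 0.25, 0.15])
print("\n".join(lines))
```

In this made-up example the bars drop steeply over the first two components and then flatten out from the third onward.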
9
Which criterion to use?
Both criteria have been studied in depth. In theory, one can find out which is best by generating correlated data with random elements and checking which criterion recovers the known number of factors. Both perform well when there are many variables and few factors. Kaiser tends to lean towards too many factors, whereas Scree can result in too few. In practice, the choice is often based on which model gives the easiest explanation. Kaiser tends to be used more for research experiments, and Scree for data exploration. The Kaiser method is used much more in ecology research.
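The simulation idea can be sketched as follows. This is an assumption-laden illustration, not the design of any published comparison: it assumes NumPy, draws data from a known number of hidden factors plus noise, and counts how many components the Kaiser rule keeps. All names and parameter values (`simulate_data`, `kaiser_count`, 300 cases, 10 variables, noise of 1.0) are invented for the example.

```python
import numpy as np

def simulate_data(n_cases, n_vars, n_factors, noise_sd, rng):
    """Generate variables that are random mixtures of hidden factors plus noise."""
    factors = rng.normal(size=(n_cases, n_factors))
    loadings = rng.normal(size=(n_factors, n_vars))
    noise = noise_sd * rng.normal(size=(n_cases, n_vars))
    return factors @ loadings + noise

def kaiser_count(data):
    """Number of correlation-matrix eigenvalues greater than one."""
    corr = np.corrcoef(data, rowvar=False)
    eigenvalues = np.linalg.eigvalsh(corr)
    return int(np.sum(eigenvalues > 1.0))

rng = np.random.default_rng(42)
true_factors = 2
counts = [kaiser_count(simulate_data(300, 10, true_factors, 1.0, rng))
          for _ in range(20)]
print("true number of factors:", true_factors)
print("Kaiser estimates over 20 simulated data sets:", counts)
```

Comparing `counts` with `true_factors` across many settings of sample size, noise, and number of variables shows where a criterion over- or under-extracts.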
10
Ecology Example
Moe, Stølevik, & Beck (2004) examined how ducklings prioritize their physiological processes in response to short-term food shortage. Ducklings experience wide variation in food availability in the wild, but in most lab experiments the animals are given an unlimited food supply. The authors restricted the ducklings' food supply for five-day periods and determined growth rate, overall metabolic rate, muscle development, and organ [...]. As part of this experiment, they wanted to see how well the growth rate of the ducklings predicted the overall metabolic rate.
11
How to measure growth rate?
They measured wing length, tarsus length, and skull length every 5 days for 15 days. All of these measurements are likely to be correlated, but how do we make an index of overall growth rate? A mean of all the values would give some overview; however, a mean tends to decrease variation, when we really want to maximize it. Moe performed a PCA on all three growth measurements and compared PC1 (the first principal component) with the metabolic rate. They found that growth rate is a weak positive predictor of metabolic rate.
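To illustrate how PC1 can serve as a single growth index, here is a sketch on synthetic data. It does not use the study's data or reproduce its results: the measurements are simulated from a hidden "overall growth" variable, the numbers are arbitrary, and NumPy's SVD is one of several equivalent ways to obtain PC1.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 60
size = rng.normal(size=n)  # hidden overall growth driving all three measurements
wing = 50 + 10 * size + 2.0 * rng.normal(size=n)
tarsus = 20 + 3 * size + 1.0 * rng.normal(size=n)
skull = 30 + 2 * size + 0.8 * rng.normal(size=n)
measurements = np.column_stack([wing, tarsus, skull])

# Standardize so that wing length (largest units) does not dominate the index.
z = (measurements - measurements.mean(axis=0)) / measurements.std(axis=0, ddof=1)

# The rows of vt are the component directions, most important first.
_, singular_values, vt = np.linalg.svd(z, full_matrices=False)
loadings = vt[0]
if loadings.sum() < 0:  # the sign of a component is arbitrary; make "bigger" positive
    loadings = -loadings

growth_index = z @ loadings  # one PC1 score per individual
eigenvalue_pc1 = singular_values[0] ** 2 / (n - 1)

print("PC1 loadings (wing, tarsus, skull):", np.round(loadings, 2))
print("share of variance in PC1:", round(eigenvalue_pc1 / 3, 2))
```

The resulting `growth_index` could then be compared with another variable, such as metabolic rate, in an ordinary regression.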
12
Second Example
Salamon & Davies (2004) wanted to know whether related koalas had sternal secretions (a territorial marking) that were more similar than those of non-related koalas, and whether the composition changed with age. They took sternal secretion samples from koalas and used gas chromatography to determine the relative concentrations of 12 different chemicals associated with odour. They ran a PCA on those 12 variables and, using the Kaiser method, determined that the first 2 factors were worth retaining. While their sample size was too small to rule out random chance, they found that PC1 was associated with age, and PC2 with family vs. non-family.
13
Other related statistical methods
Confirmatory Factor Analysis: Used to test a hypothesis about how many factors there are and what the relationships are within each factor. Used to test experimental results against a model. Frequently used in face recognition algorithms to determine how closely a face matches a reference face. Could potentially be used to test complex ecological models against experimental needs, where one needs to compare 2 or more classes of separate variables.
Linear Discriminant Analysis: Finds the linear relationship that best represents the difference between two defined classes. Used when we already hypothesize (or know from ANOVA) that there is a difference between two groups, and we want to model the linear combination of variables that best separates them.
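For the linear discriminant case, a minimal two-class sketch is given below. It uses Fisher's classic formulation (the direction is the inverse of the pooled within-class scatter applied to the difference in class means), assumes NumPy, and runs on synthetic data; `fisher_discriminant` and the class parameters are made up for the example.

```python
import numpy as np

def fisher_discriminant(class_a, class_b):
    """Unit vector along which the two classes are best separated (Fisher's criterion)."""
    mean_a = class_a.mean(axis=0)
    mean_b = class_b.mean(axis=0)
    # Pooled within-class scatter: sum of squared deviations inside each class.
    scatter = (np.cov(class_a, rowvar=False) * (len(class_a) - 1)
               + np.cov(class_b, rowvar=False) * (len(class_b) - 1))
    direction = np.linalg.solve(scatter, mean_b - mean_a)
    return direction / np.linalg.norm(direction)

# Synthetic data: two groups measured on two variables, with different means.
rng = np.random.default_rng(7)
class_a = rng.normal(loc=[0.0, 0.0], scale=1.0, size=(50, 2))
class_b = rng.normal(loc=[2.0, 1.0], scale=1.0, size=(50, 2))

w = fisher_discriminant(class_a, class_b)
proj_a = class_a @ w  # each observation reduced to a single discriminant score
proj_b = class_b @ w
print("discriminant direction:", np.round(w, 2))
print("mean score, class A vs class B:", round(proj_a.mean(), 2), round(proj_b.mean(), 2))
```

Unlike PCA, which ignores group labels and looks for overall variance, this direction is chosen specifically to pull the two labeled groups apart.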
14
The end
References
JMP experimental design manual
Statsoft PCA tutorial
Salamon & Davies (2004)
Moe, Stølevik, & Beck (2004)