Statistics review
Basic concepts: variability measures, distributions, hypotheses, types of error
Common analyses: t-tests, one-way ANOVA, two-way ANOVA, regression
Variance
Ecological rule #1: everything varies … but how much does it vary?
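Variance
$s^2 = \frac{\sum_i (x_i - \bar{x})^2}{n - 1}$, where $\bar{x}$ is the sample mean and $n$ is the sample size.
What is the variance of 4, 3, 3, 2? Answer: 2/3 (mean = 3, squared deviations sum to 1 + 0 + 0 + 1 = 2, and n - 1 = 3).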
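The worked example above can be checked with a few lines of Python. This is a minimal sketch; the function name `sample_variance` is ours, not part of the slides.

```python
def sample_variance(values):
    """Sample variance: sum of squared deviations from the mean, divided by n - 1."""
    n = len(values)
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / (n - 1)


# The slide's example: mean is 3, squared deviations are 1, 0, 0, 1.
variance = sample_variance([4, 3, 3, 2])
print(variance)
```

Dividing by n - 1 rather than n is what makes this the sample variance used throughout the rest of the review.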
Variance variants
1. Standard deviation (s, or SD) = $\sqrt{s^2}$, the square root of the variance. Advantage: same units as the data.
2. Standard error (S.E.) = $s / \sqrt{n}$. Advantage: indicates the reliability of the mean.
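As a quick illustration, both variants can be computed for the same four values used in the variance example. This sketch relies on Python's standard `statistics` module; the variable names are ours.

```python
import math
import statistics

data = [4, 3, 3, 2]

# Standard deviation: square root of the sample variance (n - 1 denominator).
sd = statistics.stdev(data)

# Standard error of the mean: SD divided by the square root of the sample size.
se = sd / math.sqrt(len(data))

print(f"SD = {sd:.3f}, SE = {se:.3f}")
```

Note that the SE shrinks as n grows even if the SD stays the same, which is why it speaks to the reliability of the mean rather than the spread of the data.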
How to report
We observed 29.7 (± 5.3) grizzly bears per month (mean ± S.E.).
A mean (± SD) of 29.7 (± 7.4) grizzly bears were seen per month.
[Figure: error bars extend 1 SE or SD above and 1 SE or SD below the mean]
Distributions
Normal: quantitative data
Poisson: count (frequency) data
Normal distribution
About 68% of the data lie within 1 SD of the mean.
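The share of a normal distribution within 1 SD of the mean can be read off the cumulative distribution function. A small sketch using `statistics.NormalDist` from the Python standard library (variable names are ours):

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0.0, sigma=1.0)

# Probability mass between -1 SD and +1 SD, and between -2 SD and +2 SD.
within_1_sd = standard_normal.cdf(1.0) - standard_normal.cdf(-1.0)
within_2_sd = standard_normal.cdf(2.0) - standard_normal.cdf(-2.0)

print(f"within 1 SD: {within_1_sd:.3f}")
print(f"within 2 SD: {within_2_sd:.3f}")
```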
Poisson distribution
Frequency (count) data.
Mostly, nothing happens: lots of zero (or minimum-value) data.
Variance increases with the mean.
[Figure: Poisson frequency distribution with the mean marked]
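Both properties (many zeros at low means, variance tracking the mean) follow directly from the Poisson probability formula. The sketch below evaluates that formula numerically; the helper names and the chosen means are illustrative assumptions, not values from the slides.

```python
import math


def poisson_pmf(k, mean):
    """Probability of observing exactly k events when the expected count is `mean`."""
    return math.exp(-mean) * mean ** k / math.factorial(k)


def poisson_summary(mean, max_k=100):
    """Return (P(zero events), mean, variance), summing the pmf up to max_k."""
    probs = [poisson_pmf(k, mean) for k in range(max_k + 1)]
    mu = sum(k * p for k, p in enumerate(probs))
    var = sum((k - mu) ** 2 * p for k, p in enumerate(probs))
    return probs[0], mu, var


for mean in (0.5, 2.0, 8.0):
    p_zero, mu, var = poisson_summary(mean)
    print(f"mean={mean}: P(0)={p_zero:.3f}, variance={var:.3f}")
```

For a Poisson variable the variance equals the mean, so the spread grows automatically as counts get larger.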
What do you do with Poisson data?
Correct for the correlation between mean and variance by log-transforming y (but log(0) is undefined!)
Use non-parametric statistics (but these have low power)
Use a generalized linear model specifying a Poisson distribution
Hypotheses
Null (Ho): no effect of our experimental treatment, the "status quo"
Alternative (Ha): there is an effect
Whose null hypothesis?
The conditions for rejecting Ho are very strict, whereas accepting Ho is easy (it is just a matter of not finding grounds to reject it).
A criminal trial? Environmental protection? Industrial viability? Exotic plant species? WTO?
Hypotheses
Null (Ho) and alternative (Ha) are always mutually exclusive.
So if Ha is treatment > control…
Types of error
            Reject Ho           Accept Ho
Ho true     Type 1 error        correct decision
Ho false    correct decision    Type 2 error
Types of error
We usually allow only a 5% chance of a type 1 error (i.e. alpha = 0.05).
The ability to avoid a type 2 error is called power.
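Both error rates can be seen in a small simulation. The sketch below is only illustrative: it assumes normally distributed data with a known SD and therefore uses a simple two-sided z-test (critical value 1.96) rather than a t-test. The sample size, effect size, seed and function names are all our own choices.

```python
import math
import random


def z_test_rejects(sample_a, sample_b, sigma, z_crit=1.96):
    """Two-sided z-test for a difference in means, assuming a known SD (sigma)."""
    n_a, n_b = len(sample_a), len(sample_b)
    diff = sum(sample_a) / n_a - sum(sample_b) / n_b
    se = sigma * math.sqrt(1 / n_a + 1 / n_b)
    return abs(diff / se) > z_crit


def rejection_rate(true_difference, n=10, sigma=1.0, trials=4000, seed=1):
    """Fraction of simulated experiments in which Ho is rejected."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(trials):
        a = [rng.gauss(0.0, sigma) for _ in range(n)]
        b = [rng.gauss(true_difference, sigma) for _ in range(n)]
        rejections += z_test_rejects(a, b, sigma)
    return rejections / trials


# Ho true: every rejection is a type 1 error, so the rate should sit near alpha.
type_1_rate = rejection_rate(true_difference=0.0)

# Ho false: every rejection is correct, so the rate estimates power.
power = rejection_rate(true_difference=1.5)

print(f"type 1 error rate: {type_1_rate:.3f}, power: {power:.3f}")
```

The type 2 error rate in the second scenario is simply 1 minus the estimated power.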
The t-test
Asks: do two samples come from different populations?
[Figure: DATA from samples A and B, with YES and NO answers; NO corresponds to Ho]
The t-test
Depends on whether the difference between samples is much greater than the difference within samples.
[Figure: samples A and B] Between >> within…
[Figure: samples A and B] Between < within…
The t-test
t-statistic = (difference between means) / (standard error within each sample):
$t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\frac{s_p^2}{n_1} + \frac{s_p^2}{n_2}}}$, where $s_p^2$ is the pooled within-sample variance.
How many degrees of freedom? $(n_1 - 1) + (n_2 - 1)$
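The formula and its degrees of freedom can be written out as a short function. This is a sketch of the pooled-variance (equal-variance) t-test; the function name and the example numbers are hypothetical.

```python
import math


def pooled_t(sample_1, sample_2):
    """Return (t-statistic, degrees of freedom) for a pooled-variance two-sample t-test."""
    n1, n2 = len(sample_1), len(sample_2)
    mean_1 = sum(sample_1) / n1
    mean_2 = sum(sample_2) / n2
    # Sums of squared deviations within each sample.
    ss_1 = sum((x - mean_1) ** 2 for x in sample_1)
    ss_2 = sum((x - mean_2) ** 2 for x in sample_2)
    df = (n1 - 1) + (n2 - 1)
    pooled_variance = (ss_1 + ss_2) / df
    se_difference = math.sqrt(pooled_variance / n1 + pooled_variance / n2)
    return (mean_1 - mean_2) / se_difference, df


# Hypothetical data: two samples of three observations each.
t_stat, df = pooled_t([4.0, 5.0, 6.0], [1.0, 2.0, 3.0])
print(f"t = {t_stat:.3f} with {df} df")
```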
T-tables
Critical values of t by degrees of freedom (v) and one-tailed probability:
v           0.10     0.05     0.025
1           3.078    6.314    12.706
2           1.886    2.920    4.303
3           1.638    2.353    3.182
4           1.533    2.132    2.776
infinity    1.282    1.645    1.960
Two samples, each n = 3, with a t-statistic of 2.50: significantly different? No! With df = (3 - 1) + (3 - 1) = 4, a two-tailed test at alpha = 0.05 uses the 0.025 column, and 2.50 < 2.776.
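The table lookup can be sketched in code. The dictionary below copies only the values from the slide's table; the function names are ours, and the sketch handles alpha = 0.05 only.

```python
import math

# Critical t values from the table: df -> (0.10, 0.05, 0.025) one-tailed columns.
T_TABLE = {
    1: (3.078, 6.314, 12.706),
    2: (1.886, 2.920, 4.303),
    3: (1.638, 2.353, 3.182),
    4: (1.533, 2.132, 2.776),
    math.inf: (1.282, 1.645, 1.960),
}


def critical_t(df, two_tailed=True):
    """Critical value at alpha = 0.05; a two-tailed test puts 0.025 in each tail."""
    _, one_tailed_05, one_tailed_025 = T_TABLE[df]
    return one_tailed_025 if two_tailed else one_tailed_05


def is_significant(t_stat, df, two_tailed=True):
    return abs(t_stat) > critical_t(df, two_tailed)


# Two samples of n = 3 each give df = (3 - 1) + (3 - 1) = 4.
print(is_significant(2.50, df=4))
```

The same t-statistic would pass a one-tailed test at alpha = 0.05 (critical value 2.132), which is why the direction of the alternative hypothesis needs to be fixed before looking at the data.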
If you have two samples with similar n and S.E., why do you know instantly that they are not significantly different if their error bars overlap?
Because the difference in means is then < 2 x S.E., i.e. the t-statistic is < 2 (in fact < 1.41, since the standard error of the difference is about 1.41 x S.E.), and, for any df, t must be > 1.96 to be significant (the infinity row of the t-table).
Careful! It doesn't work the other way around!
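The overlap argument can be written out in one short derivation. It assumes both samples have the same standard error (SE), that the error bars show ± 1 SE, and that the t-statistic uses the pooled form given earlier.

```latex
\begin{align*}
\text{error bars overlap} &\;\Rightarrow\; |\bar{x}_1 - \bar{x}_2| < 2\,\mathrm{SE} \\
\mathrm{SE}_{\text{diff}} &= \sqrt{\mathrm{SE}^2 + \mathrm{SE}^2} = \sqrt{2}\,\mathrm{SE} \\
|t| &= \frac{|\bar{x}_1 - \bar{x}_2|}{\mathrm{SE}_{\text{diff}}}
      < \frac{2\,\mathrm{SE}}{\sqrt{2}\,\mathrm{SE}} = \sqrt{2} \approx 1.41 < 1.96
\end{align*}
```

The converse fails because bars that do not overlap only guarantee |t| > 1.41, which can still fall below the critical value for the available df.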
One-way ANOVA
General form of the t-test; can have more than 2 samples.
Ho: all samples are the same…
Ha: at least one sample is different
[Figure: DATA from samples A, B and C, compared with the patterns expected under Ho and under Ha]
One-way ANOVA
Just like the t-test, compares differences between samples to differences within samples.
T-test statistic: t = (difference between means) / (standard error within sample)
ANOVA statistic: F = (MS between groups) / (MS within group)
Mean squares
MS = sum of squares / df. Analogous to variance:
$s^2 = \frac{\sum_i (x_i - \bar{x})^2}{n - 1}$, where the numerator is the sum of squared differences and the denominator is the df.
ANOVA tables
                              df       SS     MS                 F            p
Treatment (between groups)    df(X)    SSX    MSX = SSX/df(X)    MSX / MSE    Look up!
Error (within groups)         df(E)    SSE    MSE = SSE/df(E)
Total                         df(T)    SST
Do three species of palms differ in growth rate? We have 5 observations per species. Complete the table!
                              df        SS     MS    F    p
Treatment (between groups)    ?         69     ?     ?    ?
Error (within groups)         k(n-1)    ?      ?
Total                         ?         104
Hint: for the total df, remember that we calculate the total SS as if there are no groups…
At alpha = 0.05, the critical value is F(2,12) = 3.89.
                              df    SS     MS      F       p
Treatment (between groups)    2     69     34.5    11.8    < 0.05
Error (within groups)         12    35     2.92
Total                         14    104
The observed F of 11.8 exceeds 3.89, so p < 0.05.
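The steps used to complete the palm table can be sketched as a small function. It assumes a balanced design (the same n in every group); the function name and dictionary keys are ours.

```python
def one_way_anova_table(k, n, ss_treatment, ss_total):
    """Fill in a balanced one-way ANOVA table from k groups of n observations."""
    df_treatment = k - 1
    df_error = k * (n - 1)
    df_total = k * n - 1  # total SS ignores the groups: N - 1
    ss_error = ss_total - ss_treatment
    ms_treatment = ss_treatment / df_treatment
    ms_error = ss_error / df_error
    return {
        "df": (df_treatment, df_error, df_total),
        "SS": (ss_treatment, ss_error, ss_total),
        "MS": (ms_treatment, ms_error),
        "F": ms_treatment / ms_error,
    }


# Three palm species, five observations each, with the SS values from the exercise.
table = one_way_anova_table(k=3, n=5, ss_treatment=69, ss_total=104)
print(table)

# Compare against the critical value quoted on the slide for F(2, 12) at alpha = 0.05.
print(table["F"] > 3.89)
```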
Two-way ANOVA
Just like one-way ANOVA, except that it subdivides the treatment SS into:
Treatment 1
Treatment 2
Interaction 1 & 2
Two-way ANOVA
Suppose we wanted to know if moss grows thicker on the north or south side of trees, and we look at 10 aspen and 10 fir trees:
Aspect (2 levels, so 1 df)
Tree species (2 levels, so 1 df)
Aspect x species interaction (1 df x 1 df = 1 df)
Error? k(n-1) = 4 (10-1) = 36
                         df    SS             MS             F
Aspect                   1     SS(Aspect)     MS(Aspect)     MS(As) / MSE
Species                  1     SS(Species)    MS(Species)    MS(Sp) / MSE
Aspect x Species         1     SS(Int)        MS(Int)        MS(Int) / MSE
Error (within groups)    36    SSE            MSE
Total                    39    SST
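The degrees of freedom in the moss example follow a fixed recipe, sketched below for a balanced two-factor design. The function name is hypothetical, and each aspect x species combination is treated as its own group of n observations, as on the slide.

```python
def two_way_df(levels_a, levels_b, n_per_cell):
    """Degrees of freedom for a balanced two-way ANOVA with interaction."""
    df_a = levels_a - 1
    df_b = levels_b - 1
    df_interaction = df_a * df_b
    cells = levels_a * levels_b
    df_error = cells * (n_per_cell - 1)  # k(n - 1), with k = number of cells
    df_total = cells * n_per_cell - 1
    return df_a, df_b, df_interaction, df_error, df_total


# Aspect (north/south) x species (aspen/fir), 10 observations per combination.
dfs = two_way_df(levels_a=2, levels_b=2, n_per_cell=10)
print(dfs)
```

A useful check is that the first four values always add up to the total df.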
Interactions
A combination of treatments gives a non-additive effect.
[Figure: north vs. south for alder and fir]
Anything not parallel!
[Figure: further north vs. south examples with non-parallel lines]
Careful! If you log-transformed your variables, the absence of an interaction means a multiplicative effect on the original scale: log(a) + log(b) = log(ab)
[Figure: y and log(y) plotted for north vs. south]
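A small numeric sketch makes the point concrete. The cell means below are invented for illustration: the south side doubles moss thickness and fir triples it, so the effects multiply. The function name is ours.

```python
import math

# Hypothetical cell means with purely multiplicative effects.
cell_means = {
    ("north", "aspen"): 2.0,
    ("south", "aspen"): 4.0,
    ("north", "fir"): 6.0,
    ("south", "fir"): 12.0,
}


def interaction_contrast(means, transform=lambda y: y):
    """(south - north) in fir minus (south - north) in aspen; zero means parallel lines."""
    fir_effect = transform(means[("south", "fir")]) - transform(means[("north", "fir")])
    aspen_effect = transform(means[("south", "aspen")]) - transform(means[("north", "aspen")])
    return fir_effect - aspen_effect


raw_contrast = interaction_contrast(cell_means)            # lines not parallel on the raw scale
log_contrast = interaction_contrast(cell_means, math.log)  # lines parallel on the log scale

print(raw_contrast, round(log_contrast, 12))
```

On the raw scale these data show an interaction; after the log transform it disappears, because multiplying becomes adding.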
Regression
Problem: to draw a straight line through the points that best explains the variance.
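One common way to pick that line is ordinary least squares, which minimizes the sum of squared vertical distances from the points to the line. The sketch below assumes that criterion; the data are made up and the function name is ours.

```python
def fit_line(xs, ys):
    """Least-squares slope and intercept for a straight line through (x, y) points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products and sum of squares of x around their means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical data lying close to y = 2x.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]
slope, intercept = fit_line(xs, ys)
print(f"y = {slope:.2f} x + {intercept:.2f}")
```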
Regression
Test with F, just like ANOVA:
F = (variance explained by x-variable / df) / (variance still unexplained / df)
Variance explained: change in (line lengths)^2. Variance unexplained: residual (line lengths)^2.
In regression, each x-variable will normally have 1 df.
Essentially a cost-benefit analysis: is the benefit in variance explained worth the cost of using up degrees of freedom?
Regression example
Total variance for 32 data points is 300 units. An x-variable is then regressed against the data, accounting for 150 units of variance. What is the R2? What is the F ratio?
$R^2 = 150/300 = 0.5$
$F_{1,30} = \frac{150/1}{(300 - 150)/30} = \frac{150}{5} = 30$
Why is df error = 30? (Total df = 32 - 1 = 31, and the x-variable uses 1 df, leaving 30.)
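The same arithmetic as a short sketch, using the F ratio defined earlier (variance explained / df over variance still unexplained / df). It treats the slide's "units of variance" as sums of squares, which is our assumption, and the function name is ours.

```python
def regression_f(total_ss, explained_ss, n_points, n_predictors=1):
    """Return (R squared, F ratio, error df) for a regression with an intercept."""
    unexplained_ss = total_ss - explained_ss
    df_model = n_predictors
    # One df is lost to the mean (intercept) and one to each x-variable.
    df_error = n_points - 1 - n_predictors
    r_squared = explained_ss / total_ss
    f_ratio = (explained_ss / df_model) / (unexplained_ss / df_error)
    return r_squared, f_ratio, df_error


r_squared, f_ratio, df_error = regression_f(total_ss=300, explained_ss=150, n_points=32)
print(r_squared, f_ratio, df_error)
```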
ANCOVA
In regression, x-variables can be continuous or categorical.
E.g. to convert a treatment (size) with two levels (small, large) into a regression variable, we could code small = 0, large = 1:
Y = size * B + constant
where constant = mean value for small, and B = difference between large and small.
Test the significance of "size" in the regression.
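The interpretation of B and the constant can be verified with a least-squares fit on 0/1-coded data. The response values below are invented for illustration, and the function name is ours.

```python
def fit_line(xs, ys):
    """Least-squares slope and intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Hypothetical responses for the two treatment levels.
small = [2.0, 3.0, 4.0]
large = [6.0, 7.0, 8.0]

# Dummy coding: small = 0, large = 1.
size = [0] * len(small) + [1] * len(large)
y = small + large

B, constant = fit_line(size, y)

mean_small = sum(small) / len(small)
mean_large = sum(large) / len(large)
print(f"constant = {constant} (mean of small = {mean_small})")
print(f"B = {B} (large - small = {mean_large - mean_small})")
```

With a single 0/1 variable, testing whether B differs from zero asks the same question as a two-sample t-test on small vs. large.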
ANCOVA
In an analysis of covariance, we look at the effect of a treatment (categorical) while accounting for a covariate (continuous).
[Figure: growth rate (g/day) and plant height (cm) for plants fertilized with N+P and with N]
ANCOVA
1. Fit the full model (categorical treatment, covariate, interaction)
2. Test for interaction (if significant, stop!)
3. Test for differences in intercept
[Figure: growth rate (g/day) and plant height (cm), fertilized N+P vs. fertilized N: no interaction, intercepts differ]