Lecture 7: Single classification analysis of variance (ANOVA)
Bio 4118 Applied Biostatistics 2001
When to use ANOVA
ANOVA models and partitioning sums of squares
ANOVA: hypothesis testing
ANOVA: assumptions
A non-parametric alternative: Kruskal-Wallis ANOVA
Power analysis in single classification ANOVA
When to use ANOVA
ANOVA tests for the effect of "discrete" independent variables. Each independent variable is called a factor, and each factor may have two or more levels or treatments (e.g. crop yields with nitrogen (N) or nitrogen and phosphorus (N + P) added).
ANOVA tests whether all group means are the same.
Use it when the number of levels (groups) is greater than two.
[Figure: frequency distributions of yield for the Control, Experimental (N) and Experimental (N+P) groups, with means $\mu_C$, $\mu_N$ and $\mu_{N+P}$]
Why not use multiple 2-sample tests?
[Figure: frequency distributions of yield for the Control, Experimental (N) and Experimental (N+P) groups, with the three pairwise comparisons $\mu_C:\mu_N$, $\mu_N:\mu_{N+P}$ and $\mu_C:\mu_{N+P}$]
For $k$ comparisons, the probability of accepting a true $H_0$ for all $k$ is $(1-\alpha)^k$ (treating the comparisons as independent).
For 4 means there are $k = 6$ pairwise comparisons, so $(1-\alpha)^k = (0.95)^6 = 0.735$, and $\alpha$ (for all comparisons) $= 1 - 0.735 = 0.265$.
So, when comparing the means of four samples from the same population, we would expect to find a significant difference in at least one pair about 27% of the time.
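The arithmetic on this slide can be checked with a few lines of Python. This is a minimal sketch that assumes, as the slide does, that the pairwise tests behave as if they were independent; the function name `familywise_error` is made up for illustration.

```python
from math import comb


def familywise_error(n_means: int, alpha: float = 0.05) -> float:
    """Chance of at least one false rejection over all pairwise tests.

    Assumes the pairwise tests are independent, which is only an
    approximation when they share the same samples.
    """
    k = comb(n_means, 2)           # number of pairwise comparisons
    return 1 - (1 - alpha) ** k    # 1 - P(accept a true H0 every time)


if __name__ == "__main__":
    for n_means in (3, 4, 5):
        rate = familywise_error(n_means)
        print(n_means, "means:", comb(n_means, 2), "comparisons,", round(rate, 3))
```

The inflation grows quickly with the number of groups, which is the motivation for a single overall test.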
What ANOVA does/doesn’t do
Tells us whether all group means are equal (at a specified $\alpha$ level)...
...but if we reject $H_0$, the ANOVA does not tell us which pairs of means are different from one another.
[Figure: frequency distributions of yield for the Control, Experimental (N) and Experimental (N+P) groups, with means $\mu_C$, $\mu_N$ and $\mu_{N+P}$]
Model I ANOVA: effects of temperature on trout growth
3 treatments determined (set) by investigator.
Dependent variable is growth rate ($\lambda$), factor (X) is temperature.
Since X is controlled, we can estimate the effect of a unit increase in X (temperature) on $\lambda$ (the effect size)...
…and can predict $\lambda$ at other temperatures.
[Figure: growth rate $\lambda$ (cm/day) against water temperature (°C)]
Model II ANOVA: geographical variation in body size of black bears
3 locations (groups) sampled from set of possible locations.
Dependent variable is body size, factor (X) is location.
Even if locations differ, we have no idea what factors are controlling this variability...
…so we cannot predict body size at other locations.
[Figure: body size (kg) of black bears at Riding Mountain, Kluane and Algonquin]
Model differences
In Model I, the putative causal factor(s) can be manipulated by the experimenter, whereas in Model II they cannot.
In Model I, we can estimate the magnitude of treatment effects and make predictions, whereas in Model II we can do neither.
In one-way (but NOT multi-way!) ANOVA, calculations are identical for both models.
How is it done? And why call it ANOVA?
In ANOVA, the total variance in the dependent variable is partitioned into two components:
among-groups: variance of means of different groups (treatments)
within-groups (error): variance of individual observations within groups around the mean of the group
Statistical analysis as model building
All statistical analyses begin with a mathematical model that supposedly "describes" the data, e.g., regression, ANOVA.
"Model fitting" is then the process by which model parameters are estimated.
[Figure: two panels. Linear regression: Y against X with a fitted line. ANOVA: Y for Group 1, Group 2 and Group 3, showing the grand mean $\mu$, a group mean, a group effect and an individual residual]
Least squares estimation (LSE)
An ordinary least squares (OLS) estimate of a model parameter $\Theta$ is that which minimizes the sum of squared differences between observed and predicted values (SSR):
$$\hat{\Theta}_{OLS} = \arg\min_{\Theta} \mathrm{SSR}, \qquad \mathrm{SSR} = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$$
Predicted values $\hat{y}_i$ are derived from some model whose parameters we wish to estimate.
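Example: LSE of model parameters in simple linear regression
Data consist of a set of $n$ paired observations $(x_1, y_1), \ldots, (x_n, y_n)$.
The "model" for the $i$th observation is: $y_i = \alpha + \beta x_i + \varepsilon_i$
Residual: $e_i = y_i - \hat{y}_i$
What is the LSE of the model parameters $\alpha$ and $\beta$?
[Figure: scatterplot of Y against X with the fitted line and one residual $e_i$ marked]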
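For simple linear regression the least squares estimates have a well-known closed form: the slope is the sum of cross-products divided by the sum of squares of x, and the intercept follows from the two means. The sketch below applies it to made-up numbers; the data and the name `fit_line` are illustrative only and do not come from the lecture.

```python
def fit_line(xs, ys):
    """Least squares intercept a and slope b for y = a + b*x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Sum of cross-products and sum of squares about the means.
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sxy / sxx
    a = y_bar - b * x_bar
    return a, b


# Hypothetical data, for illustration only.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.1, 4.9, 7.2, 8.8]

a, b = fit_line(xs, ys)
residuals = [y - (a + b * x) for x, y in zip(xs, ys)]
ssr = sum(e ** 2 for e in residuals)   # the quantity that OLS minimizes

if __name__ == "__main__":
    print("intercept:", round(a, 3), "slope:", round(b, 3), "SSR:", round(ssr, 4))
```

Any other choice of intercept and slope gives a larger SSR for the same data, which is what "least squares" means.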
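The general ANOVA model
The general model is: $Y_{ij} = \mu + \alpha_i + \varepsilon_{ij}$
where $\mu$ is the grand mean, $\alpha_i$ is the effect of group $i$ (so the group mean is $\mu_i = \mu + \alpha_i$) and $\varepsilon_{ij}$ is the residual of observation $j$ in group $i$.
ANOVA algorithms fit the above model (by ordinary least squares) to estimate the $\alpha_i$'s.
$H_0$: all $\alpha_i$'s $= 0$, i.e. $\mu = \mu_1 = \mu_2 = \mu_3$ and $\alpha_1 = \alpha_2 = \alpha_3 = 0$.
[Figure: two panels of Y by group (Group 1, Group 2, Group 3). One shows the grand mean $\mu$, a group mean $\mu_2$, a group effect $\alpha_2$ and an individual residual; the other shows the case under $H_0$]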
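Partitioning the total sums of squares
Total SS = Model (Groups) SS + Error SS
Total SS: squared deviations of all observations from the grand mean, $\sum_i \sum_j (Y_{ij} - \bar{Y})^2$
Model (Groups) SS: squared deviations of group means from the grand mean, weighted by group size, $\sum_i n_i (\bar{Y}_i - \bar{Y})^2$
Error SS: squared deviations of observations from their own group mean, $\sum_i \sum_j (Y_{ij} - \bar{Y}_i)^2$
[Figure: observations in Group 1, Group 2 and Group 3 on the Y axis, with the grand mean $\mu$ and group means $\mu_1$ and $\mu_3$ marked]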
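The partition can be verified numerically. The following is a minimal sketch on made-up data (three groups of three observations); the function name `partition_ss` and the numbers are illustrative assumptions, not part of the lecture.

```python
def partition_ss(groups):
    """Return (total SS, groups SS, error SS) for a list of groups."""
    all_obs = [y for g in groups for y in g]
    grand_mean = sum(all_obs) / len(all_obs)
    group_means = [sum(g) / len(g) for g in groups]

    # Deviations of every observation from the grand mean.
    ss_total = sum((y - grand_mean) ** 2 for y in all_obs)
    # Deviations of group means from the grand mean, weighted by group size.
    ss_groups = sum(len(g) * (m - grand_mean) ** 2
                    for g, m in zip(groups, group_means))
    # Deviations of observations from their own group mean.
    ss_error = sum((y - m) ** 2
                   for g, m in zip(groups, group_means) for y in g)
    return ss_total, ss_groups, ss_error


# Hypothetical data, for illustration only.
groups = [[4.0, 5.0, 6.0], [6.0, 7.0, 8.0], [8.0, 9.0, 10.0]]
ss_total, ss_groups, ss_error = partition_ss(groups)

if __name__ == "__main__":
    print("total:", ss_total, "groups:", ss_groups, "error:", ss_error)
```

For these numbers the groups and error components add up to the total, as the partition requires.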
The ANOVA table
Source of variation    Sum of squares    Degrees of freedom (df)    Mean square    F
Total                  Total SS          n - 1                      SS/df
Groups                 Groups SS         k - 1                      SS/df          MSgroups / MSerror
Error                  Error SS          n - k                      SS/df
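Filling in the table is mechanical once the sums of squares are known. A small sketch, assuming hypothetical sums of squares for n = 9 observations in k = 3 groups; the function name `anova_table` and the dictionary layout are choices made for this example.

```python
def anova_table(ss_groups, ss_error, n, k):
    """Build the rows of a one-way ANOVA table from the two SS components."""
    df_groups, df_error, df_total = k - 1, n - k, n - 1
    ss_total = ss_groups + ss_error
    ms_groups = ss_groups / df_groups      # mean square = SS / df
    ms_error = ss_error / df_error
    return {
        "total": {"ss": ss_total, "df": df_total, "ms": ss_total / df_total},
        "groups": {"ss": ss_groups, "df": df_groups, "ms": ms_groups},
        "error": {"ss": ss_error, "df": df_error, "ms": ms_error},
        "F": ms_groups / ms_error,
    }


# Hypothetical sums of squares, for illustration only.
table = anova_table(ss_groups=24.0, ss_error=6.0, n=9, k=3)

if __name__ == "__main__":
    for source in ("total", "groups", "error"):
        print(source, table[source])
    print("F =", table["F"])
```

The resulting F would then be compared with the F distribution on k - 1 and n - k degrees of freedom, using tables or statistical software.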
Variance components and group means
MSgroups measures the average squared difference among group means.
MSerror measures the variability of observations within groups, i.e. it is a measure of precision.
[Figure: frequency distributions of yield for the Control, Experimental (N) and Experimental (N+P) groups, with means $\mu_C$, $\mu_N$ and $\mu_{N+P}$]
ANOVA: the null hypothesis
$H_0$: all group means are the same, or...
$H_0$: all group effects ($\alpha_i$) are zero, or...
$H_0$: $F = MS_{groups}/MS_{error} = 1$ (under $H_0$ both mean squares estimate the same variance).
For $k$ groups and $N$ observations, compare the observed $F$ with the F distribution at the desired $\alpha$ level with $k - 1$ and $N - k$ degrees of freedom.
[Figure: frequency distributions of yield for the Control, Experimental (N) and Experimental (N+P) groups, with means $\mu_C$, $\mu_N$ and $\mu_{N+P}$]
Lab example: temporal variation in size of sturgeon (Model II ANOVA)
Prediction: dam construction resulted in loss of large sturgeon.
Test: compare sturgeon size before and after dam construction.
$H_0$: mean size is the same for all years (?)
[Figure: FKLNGTH by YEAR (1954, 1958, 1965, 1966), with the time of dam construction marked]
Temporal variation in size of sturgeon (ANOVA results)
Conclusion: reject $H_0$
ANOVA assumptions
Residuals are independent of one another.
Residuals are normally distributed.
Variance of residuals within groups is the same for all groups (homoscedasticity).
Note: all assumptions apply to the residuals, not the raw data, so all tests of assumptions are done after the analysis is completed (and residuals have been generated).
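The general ANOVA model
The general model is: $Y_{ij} = \mu + \alpha_i + \varepsilon_{ij}$
… so the predicted value of all observations in the $i$th group is $\hat{Y}_{ij} = \mu + \alpha_i = \mu_i$.
The difference between the observed value for an observation and its predicted value is its residual: $\varepsilon_{ij} = Y_{ij} - \hat{Y}_{ij}$.
[Figure: two panels of Y by group (Group 1, Group 2, Group 3). One shows the grand mean $\mu$, a group mean $\mu_2$, a group effect $\alpha_2$ and an individual residual; the other shows the case $\mu = \mu_1 = \mu_2 = \mu_3$, $\alpha_1 = \alpha_2 = \alpha_3 = 0$]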
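In a one-way ANOVA the fitted value for every observation is simply its group mean, so residuals are easy to generate. A minimal sketch with made-up data; `group_residuals` is a name invented for this example.

```python
def group_residuals(groups):
    """Residual of each observation = observed value - its group mean."""
    result = []
    for g in groups:
        predicted = sum(g) / len(g)           # fitted value for the group
        result.append([y - predicted for y in g])
    return result


# Hypothetical data, for illustration only.
groups = [[4.0, 5.0, 6.0], [6.0, 7.5, 9.0], [8.0, 9.0, 13.0]]
residuals = group_residuals(groups)

if __name__ == "__main__":
    for i, res in enumerate(residuals, start=1):
        print("group", i, [round(e, 2) for e in res])
```

Residuals within each group sum to zero by construction; it is these values, not the raw data, that the assumption checks examine.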
Residual independence
Lack of residual independence usually arises because observations are correlated in time or space.
E.g. measures of phosphorus concentrations upstream and downstream of a point source.
[Figure: upstream site and downstream site relative to a point source]
What are degrees of freedom?
The degrees of freedom for any estimated quantity (e.g. the mean, variance, etc.) is the total sample size minus one…
… because if you know the quantity (e.g. the mean) and the values of $n-1$ observations, you know the value of the $n$th observation.
The degrees of freedom for any statistical model is the total sample size minus the number of estimated parameters.
Why do degrees of freedom matter?
The distribution of the test statistic is different for different degrees of freedom.
Therefore, depending on the degrees of freedom, the same difference in sample means will give different p values.
[Figure: probability distributions of t (from -3 to 3) for larger and smaller df]
Why do observations need to be independent?
If observations are not independent, then the true degrees of freedom is less (sometimes much less) than the calculated degrees of freedom …
… the distribution used to calculate p will be wrong …
… and p will be smaller than it ought to be.
[Figure: probability distributions of t (from -3 to 3) for the calculated df and the true df, with the calculated t marked]
Checking independence of observations (residuals)
Does the experimental design suggest that sampling units may not be independent (e.g. spatiotemporal correlation)?
Calculate the intraclass correlation (R).
Do autocorrelation plots to check for serial autocorrelation.
Testing independence of residuals: ACF plots
Sort residuals by estimate and run the ACF (autocorrelation function).
Do any correlations fall outside the 95% confidence intervals?
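An ACF plot shows the sample autocorrelation of the ordered residuals at each lag. The sketch below computes those correlations directly and flags lags outside an approximate 95% band of plus or minus 1.96 divided by the square root of n. That band is a common large-sample approximation and may differ from the intervals drawn by the course software; the residuals are made up and deliberately alternate in sign.

```python
from math import sqrt


def acf(values, max_lag):
    """Sample autocorrelation at lags 1..max_lag."""
    n = len(values)
    mean = sum(values) / n
    c0 = sum((v - mean) ** 2 for v in values)
    return [
        sum((values[t] - mean) * (values[t + lag] - mean)
            for t in range(n - lag)) / c0
        for lag in range(1, max_lag + 1)
    ]


def lags_outside_band(values, max_lag):
    """Lags whose autocorrelation exceeds the approximate 95% band."""
    bound = 1.96 / sqrt(len(values))   # large-sample approximation
    return [(lag, r) for lag, r in enumerate(acf(values, max_lag), start=1)
            if abs(r) > bound]


# Hypothetical residuals that alternate in sign, for illustration only.
residuals = [1.0, -1.0] * 10
flagged = lags_outside_band(residuals, max_lag=3)

if __name__ == "__main__":
    for lag, r in flagged:
        print("lag", lag, "autocorrelation", round(r, 3))
```

Any flagged lag suggests serial dependence among residuals, which in turn suggests that the calculated degrees of freedom overstate the true value.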
Testing normality of residuals
Generate a normal probability plot of residuals and check for linearity.
If warranted, run the Lilliefors test, keeping in mind the power issue!
[Figure: normal probability plot of RESIDUAL against Expected Value for Normal Distribution, with possible outliers marked]
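The coordinates of a normal probability plot can be generated with the standard library. This sketch pairs the sorted residuals with expected normal scores and reports their correlation as a rough index of linearity. The plotting positions (i - 0.375)/(n + 0.25) are one common convention and are an assumption here; statistical packages differ on this detail, and the residuals are made up.

```python
from math import sqrt
from statistics import NormalDist


def normal_scores(n):
    """Expected standard normal values for ranks 1..n (assumed convention)."""
    nd = NormalDist()
    return [nd.inv_cdf((i - 0.375) / (n + 0.25)) for i in range(1, n + 1)]


def probability_plot_points(residuals):
    """(expected normal score, ordered residual) pairs to plot."""
    return list(zip(normal_scores(len(residuals)), sorted(residuals)))


def pearson_r(points):
    """Correlation between the two plot coordinates."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxy = sum((x - mx) * (y - my) for x, y in points)
    sxx = sum((x - mx) ** 2 for x, _ in points)
    syy = sum((y - my) ** 2 for _, y in points)
    return sxy / sqrt(sxx * syy)


# Hypothetical residuals, for illustration only.
points = probability_plot_points([2.0, -1.0, 0.0, 1.0, -2.0])

if __name__ == "__main__":
    for expected, observed in points:
        print(round(expected, 3), observed)
    print("r =", round(pearson_r(points), 3))
```

A correlation close to 1 corresponds to a nearly straight plot. It is only a visual aid in numerical form, not a substitute for a formal test such as Lilliefors.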
Testing homoscedasticity I: plotting residuals against estimates
Does the "spread" of residuals appear the same for each group?
[Figure: studentized residuals (STUDENT) against ESTIMATE, with a possible outlier marked]
Testing homoscedasticity II: Levene’s test
Calculate the mean absolute residual for each group.
Does this value vary among groups?
[Figure: least squares means of the absolute residuals (ABSRES) by YEAR (1954, 1958, 1965, 1966)]
Testing homoscedasticity II: Levene’s test (cont’d)
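Levene's test in the form described here amounts to a one-way ANOVA on the absolute residuals. A minimal sketch, assuming residuals are taken about the group mean (some implementations use the median instead); the data and function names are made up for illustration.

```python
def one_way_f(groups):
    """F ratio (MSgroups / MSerror) of a one-way ANOVA."""
    all_obs = [y for g in groups for y in g]
    n, k = len(all_obs), len(groups)
    grand_mean = sum(all_obs) / n
    means = [sum(g) / len(g) for g in groups]
    ss_groups = sum(len(g) * (m - grand_mean) ** 2
                    for g, m in zip(groups, means))
    ss_error = sum((y - m) ** 2 for g, m in zip(groups, means) for y in g)
    return (ss_groups / (k - 1)) / (ss_error / (n - k))


def levene_f(groups):
    """One-way ANOVA F computed on absolute residuals about group means."""
    abs_residuals = []
    for g in groups:
        mean = sum(g) / len(g)
        abs_residuals.append([abs(y - mean) for y in g])
    return one_way_f(abs_residuals)


# Hypothetical data with increasing spread, for illustration only.
groups = [[9.0, 10.0, 11.0], [8.0, 10.0, 12.0], [4.0, 10.0, 16.0]]

if __name__ == "__main__":
    print("Levene F =", round(levene_f(groups), 3))
```

The resulting F is compared with the F distribution on k - 1 and n - k degrees of freedom; a large value indicates that the mean absolute residual differs among groups.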
Effects of violations of assumptions
Calculation of p assumes p(F) = p(F*) … but as residuals conform less to the required assumptions, the deviation between the two increases.
Therefore, calculated p values are incorrect.
[Figure: probability against F for the true F (F*), F with high conformity and F with low conformity]
Robustness of ANOVA with respect to violation of assumptions
Residual analysis: questions
Which assumptions are not met, and how robust is ANOVA to their violation?
What is the sample size?
Is the violation of assumptions due to a couple of outliers?
How close is p to $\alpha$?
Possible remedies:
Eliminate outliers and rerun the analysis.
Transform the data.
Try a non-parametric alternative (generally recommended if sample sizes are small, i.e. < 10 per group) such as Kruskal-Wallis ANOVA.
A non-parametric alternative: Kruskal-Wallis ANOVA
Rank all observations together, then calculate the rank sum ($R_g$) for each group.
$H_0$: $R_C = R_1 = R_2$
Calculate the K-W H statistic:
$$H = \frac{12}{N(N+1)} \sum_{g} \frac{R_g^2}{n_g} - 3(N+1)$$
where $N$ is the total number of observations and $n_g$ is the number in group $g$.
… which is distributed as $\chi^2$ with $k-1$ df if the sample size of each group is not too small; otherwise use critical values for H.
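The H statistic can be computed by hand from the rank sums. The sketch below uses the standard formula without a correction for ties (tied values simply receive their average rank), which is an assumption made to keep the example short; the data are made up.

```python
def kruskal_wallis_h(groups):
    """Return (H, rank sums) using average ranks and no tie correction."""
    # Pool all observations, remembering which group each came from.
    pooled = sorted((y, gi) for gi, g in enumerate(groups) for y in g)
    n = len(pooled)
    rank_sums = [0.0] * len(groups)

    i = 0
    while i < n:
        j = i
        while j < n and pooled[j][0] == pooled[i][0]:
            j += 1                       # pooled[i:j] share the same value
        avg_rank = (i + 1 + j) / 2       # mean of ranks i+1 .. j
        for _, gi in pooled[i:j]:
            rank_sums[gi] += avg_rank
        i = j

    h = (12 / (n * (n + 1))
         * sum(r * r / len(g) for r, g in zip(rank_sums, groups))
         - 3 * (n + 1))
    return h, rank_sums


# Hypothetical data, for illustration only.
h, rank_sums = kruskal_wallis_h([[1.0, 2.0, 3.0],
                                 [4.0, 5.0, 6.0],
                                 [7.0, 8.0, 9.0]])

if __name__ == "__main__":
    print("rank sums:", rank_sums, "H =", round(h, 3))
```

With many ties, a tie-corrected H would be more appropriate than this uncorrected version.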
Power and sample size in single-classification ANOVA
If $H_0$ is true, then the variance ratio $MS_{groups}/MS_{error}$ follows the central F distribution.
But if $H_0$ is false, then $MS_{groups}/MS_{error}$ follows the non-central F distribution, defined by $\nu_1$, $\nu_2$ and the non-centrality parameter $\phi$.
So power calculations depend on the non-central F.
[Figure: frequency distributions of yield for the Control, Experimental (N) and Experimental (N+P) groups, with means $\mu_C$, $\mu_N$ and $\mu_{N+P}$]
Power and sample size in single-classification ANOVA
Power of a test involving $k$ groups with $n$ replicates per group at specified $\alpha$ when (1) group means are known; (2) the minimal detectable difference is specified.
Estimation of minimum sample size and minimal detectable difference among groups.
[Figure: frequency distributions of yield for the Control, Experimental (N) and Experimental (N+P) groups, with means $\mu_C$, $\mu_N$ and $\mu_{N+P}$]
Power and sample size in single-classification ANOVA
ANOVA with $k$ groups with $n$ replicates per group at specified $\alpha$.
If we have an estimate of the within-group variability $s^2$ ($MS_{error}$), we can calculate $\phi$.
[Figure: frequency distributions of yield for the Control, Experimental (N) and Experimental (N+P) groups, with means $\mu_C$, $\mu_N$ and $\mu_{N+P}$]
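The equation on this slide did not survive extraction. A commonly used form when the group means are specified, found in textbooks such as Zar, is $\phi = \sqrt{n \sum_i (\mu_i - \mu)^2 / (k s^2)}$. The sketch below assumes that form; it may not match the slide's notation exactly, and the numbers are made up.

```python
from math import sqrt


def phi_from_means(means, n, s2):
    """phi for k groups of n replicates, given assumed group means.

    Assumes phi = sqrt(n * sum((mu_i - mu)^2) / (k * s2)),
    where mu is the mean of the group means and s2 estimates MSerror.
    """
    k = len(means)
    grand_mean = sum(means) / k
    return sqrt(n * sum((m - grand_mean) ** 2 for m in means) / (k * s2))


# Hypothetical planning values, for illustration only.
means, n, s2 = [8.0, 10.0, 12.0], 5, 4.0
phi = phi_from_means(means, n, s2)
nu1 = len(means) - 1           # groups df
nu2 = len(means) * (n - 1)     # error df

if __name__ == "__main__":
    print("phi =", round(phi, 3), "nu1 =", nu1, "nu2 =", nu2)
```

Power would then be read from power curves or tables using these values of phi, nu1 and nu2 at the chosen significance level.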
Calculating power given $\phi$
Given $\nu_1$, $\nu_2$, $\alpha$ and $\phi$, we can read $1-\beta$ from suitable tables or curves (e.g. Zar (1996), Appendix Figure B.1).
[Figure: power curves of $1-\beta$ against $\phi$ for $\nu_1 = 2$, with separate $\phi$ scales for $\alpha = .05$ and $\alpha = .01$ and curves ordered by decreasing $\nu_2$]
Model I ANOVA: minimal detectable difference
Suppose we want to detect a difference between the two most different sample means of at least $\delta$.
To test at the $\alpha$ significance level with $1-\beta$ power, we can calculate the minimal sample size $n_{min}$ required to detect $\delta$, given a sample group variance $s^2$, by solving iteratively.
[Figure: frequency distributions for the Control, Experimental (N) and Experimental (N+P) groups, with means $\mu_C$, $\mu_N$ and $\mu_{N+P}$ and the difference $\delta$ marked]
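The iteration can be organised as a table of candidate sample sizes. The sketch below assumes the textbook relation $\phi = \sqrt{n \delta^2 / (2 k s^2)}$ (as given in Zar for a minimum detectable difference); the slide's own equation was not recoverable, so treat this form as an assumption. Looking up power for each row still requires power curves or software for the non-central F, which this sketch does not attempt.

```python
from math import sqrt


def phi_for_delta(n, k, delta, s2):
    """phi for k groups of n replicates and minimum difference delta.

    Assumes phi = sqrt(n * delta^2 / (2 * k * s2)).
    """
    return sqrt(n * delta ** 2 / (2 * k * s2))


def candidate_designs(k, delta, s2, n_values):
    """Rows of (n, nu1, nu2, phi) to look up on power curves."""
    return [(n, k - 1, k * (n - 1), phi_for_delta(n, k, delta, s2))
            for n in n_values]


# Hypothetical planning values, for illustration only.
designs = candidate_designs(k=3, delta=4.0, s2=4.0, n_values=range(3, 9))

if __name__ == "__main__":
    for n, nu1, nu2, phi in designs:
        print("n =", n, "nu1 =", nu1, "nu2 =", nu2, "phi =", round(phi, 3))
```

The smallest n whose row reaches the desired power on the curves is the estimate of the minimal sample size.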
Model I ANOVA: power of the test
If $H_0$ is accepted, it is good practice to calculate power!
Knowing $MS_{groups}$, $s^2$ ($= MS_{error}$) and $k$, we can calculate $\phi$.
[Figure: frequency distributions of yield for the Control, Experimental (N) and Experimental (N+P) groups, with means $\mu_C$, $\mu_N$ and $\mu_{N+P}$]
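For a test that has already been run, a textbook form of the calculation (as in Zar) is $\phi = \sqrt{(k-1)(MS_{groups} - s^2) / (k s^2)}$. The sketch below assumes that form, since the slide's equation was lost; the zero returned when $MS_{groups}$ does not exceed $s^2$ is a choice made for this example, and the input values are made up.

```python
from math import sqrt


def phi_from_mean_squares(ms_groups, s2, k):
    """phi from the mean squares of a completed one-way ANOVA.

    Assumes phi = sqrt((k - 1) * (ms_groups - s2) / (k * s2)).
    """
    if ms_groups <= s2:
        # No excess among-group variance: the expression under the
        # square root would not be positive, so report phi = 0.
        return 0.0
    return sqrt((k - 1) * (ms_groups - s2) / (k * s2))


# Hypothetical ANOVA results, for illustration only.
phi = phi_from_mean_squares(ms_groups=12.0, s2=1.0, k=3)

if __name__ == "__main__":
    print("phi =", round(phi, 3))
```

As with the planning case, this phi is then used with the groups and error degrees of freedom to read power from curves or tables.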
Power of the test: an example
Effect of temperature on insect development time.
4 eggs each at two temperatures, 5 at the third (k = 3, sample sizes $n_1 = n_2 = 4$, $n_3 = 5$).
So, there is a 67% chance of committing a Type II error (power $1-\beta = 0.33$).
Factors determining power in single classification ANOVA
Power increases with increasing $\phi$.
Therefore, power increases with (1) increasing sample size $n$; (2) increasing differences among group means ($MS_{groups}$); (3) decreasing number of groups; (4) decreasing within-group variability $s^2$ ($MS_{error}$).
Power in single-classification Model II ANOVA
In this case, we can calculate $1-\beta$ from the central F distribution.
Knowing $\nu_1$, $\nu_2$, $\alpha$ and $MS_{groups}$, we can estimate $1-\beta$.
[Figure: body size (kg) of black bears at Riding Mountain, Kluane and Algonquin]
Power in non-parametric single-classification ANOVA
If the assumptions of parametric ANOVA are met, then non-parametric ANOVA is $3/\pi \approx 95\%$ as powerful.
If non-parametric ANOVA is used, calculate power for the parametric ANOVA to get a rough estimate of the power of the non-parametric test.