AP Statistics: ANOVA Section 1
In section 13.1 A, we used a t-test to compare the means between two groups. An ANOVA (ANalysis Of VAriance) test is used to test for a difference in means among several groups.
EXAMPLE: As a young student in Australia, Dominic Kelly enjoyed watching ants gather on pieces of sandwich. Later, as a university student, he decided to conduct a formal experiment. He chose three types of sandwich filling: Vegemite, peanut butter, and ham & pickles. To conduct the experiment, he randomly chose a sandwich, broke off a piece, and left it on the ground near an anthill. After several minutes, he placed a jar over the sandwich piece and counted the number of ants. He repeated this process until he had 8 samples for each of the three sandwich fillings.
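We wish to determine whether the differences in the means are statistically significant. Hypotheses:
$H_0: \mu_1 = \mu_2 = \mu_3$ (all three fillings attract the same mean number of ants)
$H_a$: at least one $\mu_i$ is different from the others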
The means and standard deviations for the data above are given at the right. The sample means are different, and the mean for ham & pickles is quite a bit larger than the others. BUT, is the difference statistically significant?
Assessing the difference in means among several groups depends on two kinds of variability: how different the group means are from each other AND how much variability there is within each sample.
The basic idea of ANOVA is to split the total variability into these two parts:
TOTAL variability of all data values (SST) = Variability BETWEEN groups (SSG) + Variability WITHIN groups (SSE)
$SST = SSG + SSE$
SST: sum of squares total
SSG: sum of squares for groups
SSE: sum of squares for error
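Here is a minimal sketch of the decomposition, computed directly from the definitions. It assumes a small set of made-up numbers (not the ant counts, whose raw values are not reproduced in these notes). The function name `sums_of_squares` is hypothetical. SSG is computed as each group's size times the squared distance from its group mean to the grand mean, and SSE as the squared distances of each value from its own group mean.

```python
from statistics import mean

def sums_of_squares(groups):
    """Return (SST, SSG, SSE) for a list of groups, each a list of numbers."""
    all_values = [x for g in groups for x in g]
    grand_mean = mean(all_values)
    # SST: every value's squared deviation from the grand mean
    sst = sum((x - grand_mean) ** 2 for x in all_values)
    # SSG: each group mean's squared deviation from the grand mean, weighted by group size
    ssg = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # SSE: every value's squared deviation from its own group mean
    sse = sum((x - mean(g)) ** 2 for g in groups for x in g)
    return sst, ssg, sse

# Hypothetical data: three groups of three values each
groups = [[2, 4, 6], [5, 7, 9], [8, 10, 12]]
sst, ssg, sse = sums_of_squares(groups)
print(sst, ssg, sse)  # → 78 54 24, and indeed 78 = 54 + 24
```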
The variability between the groups (SSG) is a good measure of how much the group means vary, but we need to balance that against the background variation within the groups (SSE).
We cannot compare these two pieces of variation directly, however. In our example, SSG is computed using only the 3 group means, while SSE is computed using all 24 data values (remember, if we assume $H_0$ is true, these 24 values come from populations with the same mean).
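To put them on comparable scales, we use degrees of freedom, where k is the number of groups and n is the total number of data values:
Degrees of freedom for groups: $df_{groups} = k - 1$
Degrees of freedom for error: $df_{error} = n - k$
Note: df for groups + df for error = total df, that is, $(k - 1) + (n - k) = n - 1$.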
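Applying these formulas to the ant experiment (k = 3 fillings, n = 24 data values, as stated above) gives:

```latex
\begin{aligned}
df_{groups} &= k - 1 = 3 - 1 = 2 \\
df_{error}  &= n - k = 24 - 3 = 21 \\
df_{total}  &= n - 1 = 24 - 1 = 23 = 2 + 21
\end{aligned}
```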
We can now compute the mean square for groups (MSG) and the mean square for error (MSE):
$MSG = \dfrac{SSG}{k - 1}$ and $MSE = \dfrac{SSE}{n - k}$
An ANOVA test compares variability within groups to variability between groups. If the ratio of variability between groups to variability within groups is higher than we would expect just by random chance, we have evidence of a difference in means. This ratio is called the F-statistic.
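The ratio idea can be checked numerically with a sketch like the one below. It assumes two made-up datasets whose groups have exactly the same means (4, 7, and 10) but very different spread within each group. The function name `f_statistic` is hypothetical. It follows the steps above: SSG and SSE, then degrees of freedom, then MSG and MSE, and finally F = MSG / MSE.

```python
from statistics import mean

def f_statistic(groups):
    """Return (F, df_groups, df_error) for a one-way ANOVA on a list of groups."""
    k = len(groups)                       # number of groups
    n = sum(len(g) for g in groups)       # total number of data values
    grand_mean = mean([x for g in groups for x in g])
    ssg = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    sse = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_groups, df_error = k - 1, n - k
    msg = ssg / df_groups                 # mean square for groups
    mse = sse / df_error                  # mean square for error
    return msg / mse, df_groups, df_error

# Hypothetical data: same group means, different within-group spread
tight = [[3, 4, 5], [6, 7, 8], [9, 10, 11]]
spread = [[0, 4, 8], [3, 7, 11], [6, 10, 14]]
print(f_statistic(tight))   # → (27.0, 2, 6)
print(f_statistic(spread))  # → (1.6875, 2, 6)
```

The between-group variability is identical in both datasets. F is much larger for the tight data because the same differences in means stand out against much less background variation.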
The table at the right is called an analysis of variance (ANOVA) table. The sums of squares for the example above are given. Fill in the missing parts of the table.
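For reference, a generic one-way ANOVA table assembled from the formulas in these notes is shown below. The table at the right may arrange its columns slightly differently.

```latex
\begin{array}{l|c|c|c|c}
\text{Source} & df & SS & MS & F \\ \hline
\text{Groups} & k - 1 & SSG & MSG = \dfrac{SSG}{k - 1} & F = \dfrac{MSG}{MSE} \\
\text{Error}  & n - k & SSE & MSE = \dfrac{SSE}{n - k} & \\
\text{Total}  & n - 1 & SST & &
\end{array}
```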
Note that our test statistic is $F = \dfrac{MSG}{MSE}$, so we will need to use the F-distribution to find our p-value. Like the $\chi^2$-distribution, the F-distribution is right skewed. When referencing the F-distribution, the numerator degrees of freedom are always given first and the denominator degrees of freedom second.
We can use the F-distribution to find the p-value when the following conditions are true:
1. The data from each population should follow a Normal distribution. (Watch for clear skewness or outliers if the sample size is small.)
2. The variability should be roughly the same within each group. (General rule: the largest sd is not more than twice the smallest sd.)
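The equal-variability rule of thumb in condition 2 is easy to automate. This sketch assumes made-up samples. The function name `sd_rule_ok` is hypothetical, and it only encodes the "largest sd at most twice the smallest sd" guideline stated above. It does not check Normality.

```python
from statistics import stdev

def sd_rule_ok(sds):
    """Rule of thumb from the notes: largest sd is at most twice the smallest sd."""
    return max(sds) <= 2 * min(sds)

# Hypothetical samples (not the ant data)
group_a = [3, 4, 5]   # sample sd 1.0
group_b = [2, 4, 6]   # sample sd 2.0
group_c = [0, 4, 8]   # sample sd 4.0

print(sd_rule_ok([stdev(group_a), stdev(group_b)]))  # → True  (2.0 <= 2 * 1.0)
print(sd_rule_ok([stdev(group_a), stdev(group_c)]))  # → False (4.0 >  2 * 1.0)
```

If only the summary table of standard deviations is available, the list of sds can be passed to `sd_rule_ok` directly.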
Conditions:
Calculations: TI-83/84: 2nd VARS (DISTR), 9:Fcdf(lower limit, upper limit, df-numerator, df-denominator)
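Outside the calculator, the same area can be computed in code. The sketch below uses only the Python standard library. It relies on the fact that the F cumulative distribution function equals a regularized incomplete beta function, P(F ≤ f) = I_x(df_num/2, df_den/2) with x = df_num·f / (df_num·f + df_den). The incomplete beta function is evaluated with the standard continued-fraction (modified Lentz) method. The names `fcdf`, `f_cdf`, and `reg_inc_beta` are hypothetical and only mimic the calculator's argument order. If SciPy is available, `scipy.stats.f.sf(F, dfn, dfd)` should give the same upper-tail area.

```python
import math

def _betacf(a, b, x, max_iter=200, eps=3e-14):
    """Continued fraction for the incomplete beta function (modified Lentz method)."""
    tiny = 1e-300
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c, d = 1.0, 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # even step of the continued fraction
        aa = m * (b - m) * x / ((qam + m2) * (a + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        h *= d * c
        # odd step of the continued fraction
        aa = -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h

def reg_inc_beta(a, b, x):
    """Regularized incomplete beta function I_x(a, b) for 0 <= x <= 1."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    # log of the prefactor x^a (1 - x)^b / B(a, b)
    ln_front = (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                + a * math.log(x) + b * math.log(1.0 - x))
    if x < (a + 1.0) / (a + b + 2.0):
        return math.exp(ln_front) * _betacf(a, b, x) / a
    # use the symmetry I_x(a, b) = 1 - I_{1-x}(b, a) for faster convergence
    return 1.0 - math.exp(ln_front) * _betacf(b, a, 1.0 - x) / b

def f_cdf(f, df_num, df_den):
    """P(F <= f) for an F-distribution with (df_num, df_den) degrees of freedom."""
    if f <= 0:
        return 0.0
    x = df_num * f / (df_num * f + df_den)
    return reg_inc_beta(df_num / 2.0, df_den / 2.0, x)

def fcdf(lower, upper, df_num, df_den):
    """Area under the F curve between lower and upper, in the calculator's argument order."""
    upper_cdf = 1.0 if math.isinf(upper) else f_cdf(upper, df_num, df_den)
    return upper_cdf - f_cdf(lower, df_num, df_den)

# Hypothetical example: F = 27 with 2 and 6 degrees of freedom
p_value = fcdf(27.0, math.inf, 2, 6)
print(round(p_value, 6))  # → 0.001
```

As on the calculator, the p-value is the area to the right of the observed F, so the upper limit is infinity (or a very large number such as 1e99).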
Conclusion:
Example: Two sets of sample data, A and B, are given. Without doing any calculations, indicate in which set of sample data there is likely to be stronger evidence of a difference in the two population means.
[Figure: Dataset A and Dataset B, each showing Group 1 and Group 2]