Comparing k Population Means – One-way Analysis of Variance (ANOVA)
Example
In this example we are looking at the weight gains (grams) for rats under six diets differing in level of protein (High or Low) and source of protein (Beef, Cereal, or Pork). There are ten test animals for each diet.
Diets
1. High protein, Beef
2. High protein, Cereal
3. High protein, Pork
4. Low protein, Beef
5. Low protein, Cereal
6. Low protein, Pork
Table: Gains in weight (grams) for rats under six diets differing in level of protein (High or Low) and source of protein (Beef, Cereal, or Pork). Columns: Diets 1 to 6, grouped by level (High protein, Low protein) and source (Beef, Cereal, Pork). Summary rows: Median, Mean, IQR, PSD, Variance, Std. Dev.
Figure: weight gains by source of protein (Beef, Cereal, Pork) for the High Protein and Low Protein diets.
Exploratory Conclusions
Weight gain is higher for the high protein meat diets.
Increasing the level of protein increases weight gain, but only if the source of protein is a meat source.
The differences observed between the diets may be due to chance (random variation), or they may be due to actual differences between the diets. A confirmatory test of hypothesis will answer this question (at a 5% or 1% significance level). We need confirmatory tests.
One possible solution for comparing k populations
Use the two-sample t test to compare the means of each pair of populations. The number of tests in the example is
\[ \binom{6}{2} = \frac{6 \cdot 5}{2} = 15 \]
The problem with this approach is the build-up of the probability of a type I error (declaring a difference when it does not exist). Suppose that each test is performed using $\alpha = 0.05$. This means that each test has a 5% chance of making a type I error. However, in a group of 15 tests the chance that at least one type I error is made could be considerably higher than 5%.
A batter in baseball may have a 5% chance of hitting a home run each time he comes to bat. If he comes to bat 15 times (and the at-bats are independent), the chance that he hits a home run at least once is actually $1 - 0.95^{15} \approx 53.7\%$. We need a single test that will detect a difference amongst the means. This test is called the F-test.
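The 53.7% figure can be reproduced with a few lines of Python. This is a minimal sketch that assumes the 15 trials are independent; pairwise t tests that share samples are not strictly independent, so for the diet example the result is only indicative. The variable names are illustrative.

```python
from math import comb

k = 6          # number of diets in the example
alpha = 0.05   # type I error rate of each individual test

# Number of pairwise comparisons among k populations
n_tests = comb(k, 2)

# Chance of at least one type I error, ASSUMING independent tests
familywise = 1 - (1 - alpha) ** n_tests

print(n_tests)               # 15
print(round(familywise, 3))  # 0.537
```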
The F test – for comparing k means
Situation: We have k normal populations. Let $\mu_i$ and $\sigma_i$ denote the mean and standard deviation of population i, i = 1, 2, 3, …, k. Note: we assume that the standard deviation is the same for each population: $\sigma_1 = \sigma_2 = \dots = \sigma_k = \sigma$.
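We want to test
\[ H_0: \mu_1 = \mu_2 = \dots = \mu_k \]
against
\[ H_A: \mu_i \ne \mu_j \text{ for at least one pair } i, j \]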
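The data
Assume we have collected data from each of the k populations. Let $x_{i1}, x_{i2}, x_{i3}, \dots, x_{in_i}$ denote the $n_i$ observations from population i, i = 1, 2, 3, …, k. Let
\[ \bar{x}_i = \frac{1}{n_i} \sum_{j=1}^{n_i} x_{ij}, \qquad s_i = \sqrt{\frac{\sum_{j=1}^{n_i} (x_{ij} - \bar{x}_i)^2}{n_i - 1}} \]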
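The pooled estimate of standard deviation and variance (with $N = \sum_{i=1}^{k} n_i$):
\[ s_{Pooled} = \sqrt{\frac{\sum_{i=1}^{k} \sum_{j=1}^{n_i} (x_{ij} - \bar{x}_i)^2}{N - k}} = \sqrt{\frac{\sum_{i=1}^{k} (n_i - 1) s_i^2}{N - k}}, \qquad s_{Pooled}^2 = \frac{\sum_{i=1}^{k} (n_i - 1) s_i^2}{N - k} \]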
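Consider the statistic comparing the sample means
\[ s_{Between}^2 = \frac{1}{k - 1} \sum_{i=1}^{k} n_i (\bar{x}_i - \bar{x})^2 \]
where
\[ \bar{x} = \frac{1}{N} \sum_{i=1}^{k} \sum_{j=1}^{n_i} x_{ij} = \frac{1}{N} \sum_{i=1}^{k} n_i \bar{x}_i, \qquad N = \sum_{i=1}^{k} n_i \]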
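To test $H_0: \mu_1 = \mu_2 = \dots = \mu_k$ against $H_A$: at least one pair of means differs, use the test statistic
\[ F = \frac{s_{Between}^2}{s_{Pooled}^2} = \frac{\sum_{i=1}^{k} n_i (\bar{x}_i - \bar{x})^2 / (k - 1)}{\sum_{i=1}^{k} \sum_{j=1}^{n_i} (x_{ij} - \bar{x}_i)^2 / (N - k)} \]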
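As an illustration, the F statistic can be computed directly from its definition as the between-sample variance divided by the pooled within-sample variance. The sketch below assumes that standard one-way ANOVA form; the function name `f_statistic` and the data are hypothetical and are not the rat weight gains.

```python
def f_statistic(groups):
    """One-way ANOVA F: between-sample variance over pooled variance."""
    k = len(groups)                          # number of samples
    N = sum(len(g) for g in groups)          # total sample size
    grand_mean = sum(x for g in groups for x in g) / N
    means = [sum(g) / len(g) for g in groups]

    # Sum of n_i * (xbar_i - xbar)^2 over the k samples
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, means))
    # Sum of (x_ij - xbar_i)^2 over all observations
    ss_within = sum((x - m) ** 2
                    for g, m in zip(groups, means) for x in g)

    s2_between = ss_between / (k - 1)
    s2_pooled = ss_within / (N - k)
    return s2_between / s2_pooled

# Hypothetical data: three samples of three observations each
groups = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]
f_value = f_statistic(groups)
print(round(f_value, 6))  # 13.0
```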
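Computing Formulae
Let $T_i = \sum_{j=1}^{n_i} x_{ij}$ denote the total for sample i, $G = \sum_{i=1}^{k} T_i$ the grand total, and $N = \sum_{i=1}^{k} n_i$ the total sample size, so that $\bar{x}_i = T_i / n_i$ and $\bar{x} = G / N$.
Now
\[ \sum_{i=1}^{k} n_i (\bar{x}_i - \bar{x})^2 = \sum_{i=1}^{k} \frac{T_i^2}{n_i} - \frac{G^2}{N}, \qquad \sum_{i=1}^{k} \sum_{j=1}^{n_i} (x_{ij} - \bar{x}_i)^2 = \sum_{i=1}^{k} \sum_{j=1}^{n_i} x_{ij}^2 - \sum_{i=1}^{k} \frac{T_i^2}{n_i} \]
Thus
\[ F = \frac{\left( \sum_{i} T_i^2 / n_i - G^2 / N \right) / (k - 1)}{\left( \sum_{i} \sum_{j} x_{ij}^2 - \sum_{i} T_i^2 / n_i \right) / (N - k)} \]
To Compute F:
Compute
1) $T_i = \sum_{j=1}^{n_i} x_{ij}$ (total for sample i)
2) $G = \sum_{i=1}^{k} T_i$ (grand total)
3) $N = \sum_{i=1}^{k} n_i$ (total sample size)
4) $\sum_{i=1}^{k} \sum_{j=1}^{n_i} x_{ij}^2$
5) $\sum_{i=1}^{k} T_i^2 / n_i$
Then
1) $SS_{Between} = \sum_{i=1}^{k} T_i^2 / n_i - G^2 / N$
2) $SS_{Within} = \sum_{i=1}^{k} \sum_{j=1}^{n_i} x_{ij}^2 - \sum_{i=1}^{k} T_i^2 / n_i$
3) $F = \dfrac{SS_{Between} / (k - 1)}{SS_{Within} / (N - k)}$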
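The computing steps can be sketched in Python using only totals and sums of squares. This sketch assumes the usual shortcut formulae, with $T_i$ the total of sample i, $G$ the grand total and $N$ the total sample size; the function name and the data are hypothetical.

```python
def f_from_totals(groups):
    """One-way ANOVA F via totals, without computing any means."""
    k = len(groups)
    n = [len(g) for g in groups]        # sample sizes n_i
    T = [sum(g) for g in groups]        # sample totals T_i
    G = sum(T)                          # grand total
    N = sum(n)                          # total sample size
    sum_sq = sum(x * x for g in groups for x in g)       # sum of x_ij^2
    sum_T2_n = sum(t * t / m for t, m in zip(T, n))      # sum of T_i^2 / n_i

    ss_between = sum_T2_n - G * G / N
    ss_within = sum_sq - sum_T2_n
    f_ratio = (ss_between / (k - 1)) / (ss_within / (N - k))
    return ss_between, ss_within, f_ratio

# Hypothetical data: three samples of three observations each
groups = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]
ss_between, ss_within, f_ratio = f_from_totals(groups)
print(ss_between, ss_within, f_ratio)
```

For these hypothetical numbers the totals are 6, 9 and 18, so the between and within sums of squares are 26 and 6, giving F = (26/2)/(6/6) = 13.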
The sampling distribution of F
The sampling distribution of the statistic F when $H_0$ is true is called the F distribution. The F distribution arises when you form the ratio of two independent $\chi^2$ random variables, each divided by its degrees of freedom.
i.e. if $U_1$ and $U_2$ are two independent $\chi^2$ random variables with degrees of freedom $\nu_1$ and $\nu_2$, then the distribution of
\[ F = \frac{U_1 / \nu_1}{U_2 / \nu_2} \]
is called the F-distribution with $\nu_1$ degrees of freedom in the numerator and $\nu_2$ degrees of freedom in the denominator.
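Recall: To test $H_0: \mu_1 = \mu_2 = \dots = \mu_k$ against $H_A$: at least one pair of means differs, use the test statistic
\[ F = \frac{\sum_{i=1}^{k} n_i (\bar{x}_i - \bar{x})^2 / (k - 1)}{\sum_{i=1}^{k} \sum_{j=1}^{n_i} (x_{ij} - \bar{x}_i)^2 / (N - k)} \]
We reject $H_0$ if $F \ge F_\alpha$, where $F_\alpha$ is the critical point under the F distribution with $\nu_1 = k - 1$ degrees of freedom in the numerator and $\nu_2 = N - k$ degrees of freedom in the denominator.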
Example
In the following example we are comparing weight gains resulting from the following six diets:
1. Diet 1 - High protein, Beef
2. Diet 2 - High protein, Cereal
3. Diet 3 - High protein, Pork
4. Diet 4 - Low protein, Beef
5. Diet 5 - Low protein, Cereal
6. Diet 6 - Low protein, Pork
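Hence, with k = 6 diets and N = 60 test animals, the F statistic has $\nu_1 = k - 1 = 5$ degrees of freedom in the numerator and $\nu_2 = N - k = 54$ degrees of freedom in the denominator.
Thus, since $F > F_\alpha$, we reject $H_0$.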
The ANOVA Table
A convenient method for displaying the calculations for the F-test.

Anova Table
Source    d.f.    Sum of Squares    Mean Square    F-ratio
Between   k - 1   SS_Between        MS_Between     MS_B / MS_W
Within    N - k   SS_Within         MS_Within
Total     N - 1   SS_Total
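A short Python sketch shows how the rows of such a table might be assembled, with each mean square taken as the sum of squares divided by its degrees of freedom. The function names `anova_table` and `fmt` and the data are hypothetical; the layout simply mirrors the table above.

```python
def anova_table(groups):
    """Return one-way ANOVA rows as (source, d.f., SS, MS, F) tuples."""
    k = len(groups)
    N = sum(len(g) for g in groups)
    grand_mean = sum(x for g in groups for x in g) / N
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, means))
    ss_within = sum((x - m) ** 2
                    for g, m in zip(groups, means) for x in g)
    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (N - k)
    return [
        ("Between", k - 1, ss_between, ms_between, ms_between / ms_within),
        ("Within", N - k, ss_within, ms_within, None),
        ("Total", N - 1, ss_between + ss_within, None, None),
    ]

def fmt(value):
    """Format a number to 3 decimals; blank for an empty cell."""
    return "" if value is None else f"{value:.3f}"

# Hypothetical data: three samples of three observations each
groups = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]
table = anova_table(groups)

print(f"{'Source':<8}{'d.f.':>5}{'SS':>10}{'MS':>10}{'F':>10}")
for source, df, ss, ms, f in table:
    print(f"{source:<8}{df:>5}{ss:>10.3f}{fmt(ms):>10}{fmt(f):>10}")

# The total sum of squares computed directly agrees with Between + Within
grand_mean = sum(x for g in groups for x in g) / 9
ss_total_direct = sum((x - grand_mean) ** 2 for g in groups for x in g)
```

The last two lines check the decomposition SS_Total = SS_Between + SS_Within, which is why the Total row can be filled in by addition.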
Diet Example
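Equivalence of the F-test and the t-test when k = 2
the t-test
\[ t = \frac{\bar{x}_1 - \bar{x}_2}{s_{Pooled} \sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}}, \qquad s_{Pooled}^2 = \frac{(n_1 - 1) s_1^2 + (n_2 - 1) s_2^2}{n_1 + n_2 - 2} \]
the F-test
\[ F = \frac{\sum_{i=1}^{2} n_i (\bar{x}_i - \bar{x})^2 / (2 - 1)}{s_{Pooled}^2} = \frac{n_1 (\bar{x}_1 - \bar{x})^2 + n_2 (\bar{x}_2 - \bar{x})^2}{s_{Pooled}^2}, \qquad \bar{x} = \frac{n_1 \bar{x}_1 + n_2 \bar{x}_2}{n_1 + n_2} \]
Since $\bar{x}_1 - \bar{x} = \dfrac{n_2 (\bar{x}_1 - \bar{x}_2)}{n_1 + n_2}$ and $\bar{x}_2 - \bar{x} = -\dfrac{n_1 (\bar{x}_1 - \bar{x}_2)}{n_1 + n_2}$,
\[ n_1 (\bar{x}_1 - \bar{x})^2 + n_2 (\bar{x}_2 - \bar{x})^2 = \frac{n_1 n_2}{n_1 + n_2} (\bar{x}_1 - \bar{x}_2)^2 = \frac{(\bar{x}_1 - \bar{x}_2)^2}{\dfrac{1}{n_1} + \dfrac{1}{n_2}} \]
Hence
\[ F = \frac{(\bar{x}_1 - \bar{x}_2)^2}{s_{Pooled}^2 \left( \dfrac{1}{n_1} + \dfrac{1}{n_2} \right)} = t^2 \]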
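The identity F = t² for k = 2 can be checked numerically. The sketch below assumes the pooled-variance two-sample t statistic and the one-way ANOVA F statistic in their standard forms; the function names and the two small samples are hypothetical.

```python
from math import sqrt

def pooled_t(x, y):
    """Two-sample t statistic with pooled standard deviation."""
    n1, n2 = len(x), len(y)
    m1, m2 = sum(x) / n1, sum(y) / n2
    ss1 = sum((v - m1) ** 2 for v in x)
    ss2 = sum((v - m2) ** 2 for v in y)
    s_pooled = sqrt((ss1 + ss2) / (n1 + n2 - 2))
    return (m1 - m2) / (s_pooled * sqrt(1 / n1 + 1 / n2))

def f_two_samples(x, y):
    """One-way ANOVA F statistic for k = 2 samples."""
    n1, n2 = len(x), len(y)
    m1, m2 = sum(x) / n1, sum(y) / n2
    grand_mean = (sum(x) + sum(y)) / (n1 + n2)
    ss_between = n1 * (m1 - grand_mean) ** 2 + n2 * (m2 - grand_mean) ** 2
    ss_within = (sum((v - m1) ** 2 for v in x)
                 + sum((v - m2) ** 2 for v in y))
    # k - 1 = 1 numerator d.f., N - k = n1 + n2 - 2 denominator d.f.
    return (ss_between / 1) / (ss_within / (n1 + n2 - 2))

# Hypothetical samples of unequal size
x = [1, 2, 3]
y = [2, 4, 6, 8]
t_value = pooled_t(x, y)
f_value = f_two_samples(x, y)
print(t_value ** 2, f_value)  # the two values agree up to rounding error
```

Because F equals t², the F-test with k = 2 rejects exactly when the two-sided pooled t-test does.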