IE241: Introduction to Design of Experiments
Last term we talked about testing the difference between two independent means. For means from a normal population, the test statistic is

$t_0 = \dfrac{\bar{y}_1 - \bar{y}_2}{s_{\bar{y}_1 - \bar{y}_2}}$

where the denominator is the estimated standard deviation of the difference between two independent means. This denominator represents the random variation to be expected with two different samples. Only if the difference between the sample means is much greater than this expected random variation do we declare the means different.
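Here is a minimal Python sketch of the two-sample statistic. It assumes the two populations share a common variance, so the denominator uses the pooled estimate $s_p\sqrt{1/n_1 + 1/n_2}$. The pooling is an assumption on our part; the text above only says that the denominator estimates the standard deviation of $\bar{y}_1 - \bar{y}_2$. The function name `pooled_t` and the sample data are illustrative.

```python
import math
import statistics


def pooled_t(y1, y2):
    """Two-sample t statistic for independent means (equal-variance assumption).

    Returns (t0, df), where df = n1 + n2 - 2.
    """
    n1, n2 = len(y1), len(y2)
    # Pooled variance: df-weighted average of the two sample variances
    sp2 = ((n1 - 1) * statistics.variance(y1)
           + (n2 - 1) * statistics.variance(y2)) / (n1 + n2 - 2)
    # Estimated standard deviation of ybar1 - ybar2
    se_diff = math.sqrt(sp2 * (1 / n1 + 1 / n2))
    return (statistics.mean(y1) - statistics.mean(y2)) / se_diff, n1 + n2 - 2


# Illustrative data only
t0, df = pooled_t([5, 7, 6, 8], [3, 4, 5, 4])
print(f"t0 = {t0:.2f} with {df} df")  # t0 = 3.27 with 6 df
```

The value of $t_0$ would then be compared with a tabulated t critical value with $n_1 + n_2 - 2$ df.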
We also covered the case where the two means are not independent, and what we must do to account for the fact that they are dependent.
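When the dependence comes from pairing, the usual approach is to analyze the within-pair differences $d_j$ as a single sample and compute $t_0 = \bar{d}/(s_d/\sqrt{n})$ with $n - 1$ df. The following minimal sketch assumes the two lists are aligned pair by pair. The function name `paired_t` and the data are illustrative.

```python
import math
import statistics


def paired_t(first, second):
    """Paired t statistic on the differences d_j = first_j - second_j.

    Returns (t0, df), where df = n - 1 for n pairs.
    """
    if len(first) != len(second):
        raise ValueError("paired samples must have the same length")
    d = [x - y for x, y in zip(first, second)]
    n = len(d)
    # Dependence is handled by analyzing one sample of differences
    return statistics.mean(d) / (statistics.stdev(d) / math.sqrt(n)), n - 1


t0, df = paired_t([10, 12, 9, 11], [8, 11, 9, 9])
print(f"t0 = {t0:.2f} with {df} df")  # t0 = 2.61 with 3 df
```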
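And finally, we talked about testing the difference between two variances, where we used the F ratio. The F distribution is the distribution of a ratio of two independent chi-square variables, each divided by its degrees of freedom. So if $\chi^2_{v_1}$ and $\chi^2_{v_2}$ are independent chi-square variables with $v_1$ and $v_2$ df, respectively, then

$F = \dfrac{\chi^2_{v_1}/v_1}{\chi^2_{v_2}/v_2}$

has the F distribution with $v_1$ and $v_2$ df. For samples from normal populations, $v_k s_k^2/\sigma_k^2$ is chi-square with $v_k = n_k - 1$ df. Therefore, under $H_0: \sigma_1^2 = \sigma_2^2$, the ratio $F_0 = s_1^2/s_2^2$ has this distribution.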
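Here is a minimal sketch of the variance-ratio statistic for two samples, assuming normal populations. The result would be compared with a tabulated F critical value with $n_1 - 1$ and $n_2 - 1$ df. The function name `variance_ratio` and the data are illustrative.

```python
import statistics


def variance_ratio(y1, y2):
    """F0 = s1^2 / s2^2 with (n1 - 1, n2 - 1) df.

    Under H0: sigma1^2 = sigma2^2 (normal populations), F0 follows F(n1 - 1, n2 - 1).
    """
    f0 = statistics.variance(y1) / statistics.variance(y2)
    return f0, len(y1) - 1, len(y2) - 1


f0, v1, v2 = variance_ratio([5, 7, 6, 8], [3, 4, 5, 4])
print(f"F0 = {f0:.2f} with {v1} and {v2} df")  # F0 = 2.50 with 3 and 3 df
```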
Reference: Chapter 3

All of this is valuable if we are testing only two means. But what if we want to test whether there is a difference among three means, or four, or ten? What if we want to know whether fertilizer A, fertilizer B, or fertilizer C is best? Or whether treatment A, treatment B, treatment C, or treatment D is best?
Enter the analysis of variance! ANOVA, as it is usually called, is a way to test the differences between means in such situations. Previously, we tested single-factor experiments with only two treatment levels. Now we move to single-factor experiments with more than two treatment levels.
ANOVA begins with a linear statistical model

$y_{ij} = \mu + \tau_i + \varepsilon_{ij}, \quad i = 1, 2, \ldots, a, \quad j = 1, 2, \ldots, n$

where
$y_{ij}$ = the $j$th observation under the $i$th treatment
$\mu$ = the grand mean of all $N$ observations (in a balanced design, $N = an$)
$\tau_i$ = the $i$th treatment effect
$\varepsilon_{ij}$ = a random error component. The errors are assumed to be independent, normally distributed random variables with mean 0 and variance $\sigma^2$, and this variance is constant for all levels of the factor.
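To make the model concrete, the following minimal sketch generates data from $y_{ij} = \mu + \tau_i + \varepsilon_{ij}$ with independent $N(0, \sigma^2)$ errors. The parameter values are arbitrary illustrations, and `simulate_one_way` is a hypothetical helper name.

```python
import random


def simulate_one_way(mu, tau, n, sigma, seed=None):
    """Draw y_ij = mu + tau_i + eps_ij with eps_ij ~ N(0, sigma^2), independent.

    Returns one list of n observations per treatment (a = len(tau) lists).
    """
    rng = random.Random(seed)
    return [[mu + t + rng.gauss(0.0, sigma) for _ in range(n)] for t in tau]


# Illustrative values: a = 3 treatments, n = 4 replicates, so N = 12
data = simulate_one_way(mu=15.0, tau=[-2.0, 0.0, 2.0], n=4, sigma=1.0, seed=1)
for i, obs in enumerate(data, start=1):
    print(i, [round(y, 2) for y in obs])
```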
Randomization is critical in every experiment. To draw valid conclusions from the experiment, the treatments must be applied in random order, and the experimental units must be assigned to the treatments at random. When both kinds of randomization are carried out, the design is called a completely randomized design.
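Both randomizations can be done with a shuffle. This minimal sketch first assigns units to treatments at random and then randomizes the run order. The unit IDs 1 to $an$ and the function name are hypothetical labels chosen for illustration.

```python
import random


def completely_randomized_plan(treatments, n, seed=None):
    """Randomly assign a * n units to treatments, then randomize the run order.

    Returns a list of (unit_id, treatment) pairs in the order they are run.
    Unit IDs 1..a*n are hypothetical labels for the experimental units.
    """
    rng = random.Random(seed)
    units = list(range(1, len(treatments) * n + 1))
    rng.shuffle(units)  # random assignment of units to treatments
    assignment = [(units[i * n + j], t)
                  for i, t in enumerate(treatments) for j in range(n)]
    rng.shuffle(assignment)  # random order in which the runs are made
    return assignment


for run, (unit, trt) in enumerate(completely_randomized_plan("ABCD", 3, seed=42), start=1):
    print(f"run {run:2d}: unit {unit:2d} gets treatment {trt}")
```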
More terminology: replication of the design means using more than one experimental unit for each treatment in the experiment. If every treatment has the same number $n$ of replicates, the design is said to be balanced.
This model is for a one-way or single-factor ANOVA. The goal of the model is to test hypotheses about the treatment effects and to estimate them. If the treatments have been selected by the experimenter, the model is called a fixed-effects model. In this case, the conclusions will apply only to the treatments under consideration.
Another type of model is the random-effects model, or components-of-variance model. In this situation, the treatments used are a random sample from a large population of treatments. Here the $\tau_i$ are random variables, and we are interested in their variability rather than in the differences among the particular means being tested.
First, we will talk about fixed-effects, completely randomized, balanced models. In the model shown earlier, the $\tau_i$ are defined as deviations from the grand mean, so

$\sum_{i=1}^{a} \tau_i = 0$

where $a$ = the number of levels of the factor being tested. It follows that the mean of the $i$th treatment is

$\mu_i = \mu + \tau_i, \quad i = 1, 2, \ldots, a$
Now the hypotheses under test are

$H_0: \mu_1 = \mu_2 = \cdots = \mu_a$
$H_a: \mu_i \neq \mu_j$ for at least one pair $(i, j)$

The test procedure is ANOVA, which decomposes the total sum of squares into its component parts according to the model.
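The total SS is divided into its component parts:

$SS_{\text{treatments}}$ = variability of the differences among the $a$ treatment means
$SS_{\text{error}}$ = variability of the random error within treatments

so that

$SS_{\text{total}} = SS_{\text{treatments}} + SS_{\text{error}}$

Sometimes this is written $SS_{\text{total}} = SS_{\text{between}} + SS_{\text{within}}$. In terms of the data, let $\bar{y}_{i\cdot}$ be the mean of treatment $i$ and $\bar{y}_{\cdot\cdot}$ the grand mean. Then

$\sum_{i=1}^{a}\sum_{j=1}^{n}(y_{ij} - \bar{y}_{\cdot\cdot})^2 = n\sum_{i=1}^{a}(\bar{y}_{i\cdot} - \bar{y}_{\cdot\cdot})^2 + \sum_{i=1}^{a}\sum_{j=1}^{n}(y_{ij} - \bar{y}_{i\cdot})^2$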
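Here is a minimal sketch of the partition that computes each sum of squares directly from its definition. The helper name `sums_of_squares` and the toy data are illustrative.

```python
def sums_of_squares(groups):
    """Return (SS_treatments, SS_error, SS_total) for a one-way layout.

    groups: one list of observations per treatment.
    """
    all_obs = [y for g in groups for y in g]
    grand = sum(all_obs) / len(all_obs)            # grand mean ybar..
    means = [sum(g) / len(g) for g in groups]      # treatment means ybar_i.
    ss_treat = sum(len(g) * (m - grand) ** 2 for g, m in zip(groups, means))
    ss_error = sum((y - m) ** 2 for g, m in zip(groups, means) for y in g)
    ss_total = sum((y - grand) ** 2 for y in all_obs)
    return ss_treat, ss_error, ss_total


ss_t, ss_e, ss_tot = sums_of_squares([[1, 2, 3], [4, 5, 6]])
print(ss_t, ss_e, ss_tot)  # 13.5 4.0 17.5
```

Note that 13.5 + 4.0 = 17.5, so the between and within pieces add up to the total.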
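Each of these SS terms becomes a mean square (MS) when divided by its df: $MS_{\text{treatments}} = SS_{\text{treatments}}/(a-1)$ and $MS_{\text{error}} = SS_{\text{error}}/(N-a)$. Their expected values are

$E(MS_{\text{error}}) = \sigma^2$

$E(MS_{\text{treatments}}) = \sigma^2 + \dfrac{n\sum_{i=1}^{a}\tau_i^2}{a-1}$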
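The expected mean squares can be checked by simulation: generate many data sets from the model with known $\tau_i$ and $\sigma$, then average the two mean squares. This is a minimal Monte Carlo sketch with arbitrary parameter values that do not come from the text. With $a = 3$, $n = 5$, $\tau = (-1, 0, 1)$ and $\sigma = 1$, the formulas give $E(MS_{\text{error}}) = 1$ and $E(MS_{\text{treatments}}) = 1 + 5 \cdot 2/2 = 6$. The function names are hypothetical.

```python
import random


def mean_squares(groups):
    """Return (MS_treatments, MS_error) for a one-way layout."""
    a = len(groups)
    N = sum(len(g) for g in groups)
    grand = sum(sum(g) for g in groups) / N
    means = [sum(g) / len(g) for g in groups]
    ss_treat = sum(len(g) * (m - grand) ** 2 for g, m in zip(groups, means))
    ss_error = sum((y - m) ** 2 for g, m in zip(groups, means) for y in g)
    return ss_treat / (a - 1), ss_error / (N - a)


def average_mean_squares(mu, tau, n, sigma, reps=20000, seed=0):
    """Average MS_treatments and MS_error over `reps` simulated experiments."""
    rng = random.Random(seed)
    total_tr = total_e = 0.0
    for _ in range(reps):
        groups = [[mu + t + rng.gauss(0.0, sigma) for _ in range(n)] for t in tau]
        ms_tr, ms_e = mean_squares(groups)
        total_tr += ms_tr
        total_e += ms_e
    return total_tr / reps, total_e / reps


tau, n, sigma = [-1.0, 0.0, 1.0], 5, 1.0
theory_tr = sigma ** 2 + n * sum(t * t for t in tau) / (len(tau) - 1)
sim_tr, sim_e = average_mean_squares(10.0, tau, n, sigma)
print(f"MS_treatments: simulated {sim_tr:.2f}, theory {theory_tr:.2f}")
print(f"MS_error:      simulated {sim_e:.2f}, theory {sigma ** 2:.2f}")
```

Setting every entry of `tau` to zero should bring the simulated $MS_{\text{treatments}}$ down to about $\sigma^2$ as well.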
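Now if there are no differences among the treatment means, then $\tau_i = 0$ for all $i$. So we can test for differences with our old friend F,

$F_0 = \dfrac{MS_{\text{treatments}}}{MS_{\text{error}}}$

with $a-1$ and $N-a$ df. Under $H_0$, both numerator and denominator estimate $\sigma^2$. $F_0$ should therefore be near 1 and will exceed the critical value only with probability $\alpha$. Under $H_a$, $F_0$ should be large because the numerator estimates the treatment effects as well as $\sigma^2$. We reject $H_0$ when $F_0 > F_{\alpha, a-1, N-a}$.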
The results of an ANOVA are presented in an ANOVA table. For this one-way, fixed-effects, balanced model:

Source    SS            df     MS            F0                      p
Model     SS_between    a-1    MS_between    MS_between/MS_within    p
Error     SS_within     N-a    MS_within
Total     SS_total      N-1
In the tensile strength example in the text, the ANOVA table is

Source    SS        df    MS        F0       p
Model     475.76    4     118.94    14.76    < 0.01
Error     161.20    20    8.06
Total     636.96    24

In this case, we would reject $H_0$ and conclude that cotton weight percent has an effect on tensile strength.
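The following minimal sketch rebuilds this table from the raw observations of the tensile strength example: five cotton weight percentages from 15 to 35 percent, with five replicates each. The function name is hypothetical. The p-value is not computed here because that requires an F-distribution routine (for example, from a statistics library) or an F table.

```python
# Tensile strength observations by cotton weight percent (Chapter 3 example)
TENSILE = {
    15: [7, 7, 15, 11, 9],
    20: [12, 17, 12, 18, 18],
    25: [14, 18, 18, 19, 19],
    30: [19, 25, 22, 19, 23],
    35: [7, 10, 11, 15, 11],
}


def anova_table(groups):
    """One-way fixed-effects ANOVA: return rows (source, SS, df, MS) and F0."""
    a = len(groups)
    N = sum(len(g) for g in groups)
    grand = sum(sum(g) for g in groups) / N
    means = [sum(g) / len(g) for g in groups]
    ss_treat = sum(len(g) * (m - grand) ** 2 for g, m in zip(groups, means))
    ss_error = sum((y - m) ** 2 for g, m in zip(groups, means) for y in g)
    ss_total = sum((y - grand) ** 2 for g in groups for y in g)
    ms_treat, ms_error = ss_treat / (a - 1), ss_error / (N - a)
    rows = [
        ("Model", ss_treat, a - 1, ms_treat),
        ("Error", ss_error, N - a, ms_error),
        ("Total", ss_total, N - 1, None),
    ]
    return rows, ms_treat / ms_error


rows, f0 = anova_table(list(TENSILE.values()))
print(f"{'Source':<8}{'SS':>10}{'df':>5}{'MS':>10}")
for source, ss, df, ms in rows:
    ms_text = f"{ms:10.2f}" if ms is not None else ""
    print(f"{source:<8}{ss:10.2f}{df:5d}{ms_text}")
print(f"F0 = {f0:.2f}")  # F0 = 14.76
```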
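We can estimate the treatment effects by subtracting the grand mean from the treatment means, $\hat{\tau}_i = \bar{y}_{i\cdot} - \bar{y}_{\cdot\cdot}$. In this example, the grand mean is $\bar{y}_{\cdot\cdot} = 15.04$, so

$\hat{\tau}_1 = 9.80 - 15.04 = -5.24$
$\hat{\tau}_2 = 15.40 - 15.04 = 0.36$
$\hat{\tau}_3 = 17.60 - 15.04 = 2.56$
$\hat{\tau}_4 = 21.60 - 15.04 = 6.56$
$\hat{\tau}_5 = 10.80 - 15.04 = -4.24$

Clearly, treatment 4 is the best because it provides the greatest mean tensile strength.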
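The same estimates can be computed from the raw data. This minimal sketch uses the tensile strength observations, keyed by cotton weight percent, so treatment 4 corresponds to 30 percent. The function name is hypothetical.

```python
TENSILE = {
    15: [7, 7, 15, 11, 9],
    20: [12, 17, 12, 18, 18],
    25: [14, 18, 18, 19, 19],
    30: [19, 25, 22, 19, 23],
    35: [7, 10, 11, 15, 11],
}


def treatment_effects(groups_by_level):
    """Estimate tau_i = ybar_i. - ybar.. for each factor level."""
    all_obs = [y for ys in groups_by_level.values() for y in ys]
    grand = sum(all_obs) / len(all_obs)
    effects = {level: sum(ys) / len(ys) - grand for level, ys in groups_by_level.items()}
    return grand, effects


grand, effects = treatment_effects(TENSILE)
print(f"grand mean = {grand:.2f}")
for level, tau_hat in effects.items():
    print(f"{level}% cotton: tau_hat = {tau_hat:+.2f}")
best = max(effects, key=effects.get)
print("largest estimated effect at", best, "percent")  # 30
```

The estimated effects sum to zero (up to rounding), which matches the constraint $\sum \tau_i = 0$.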
Now you could have computed these values from the raw data yourself without doing the ANOVA. You would get the same estimates, but you wouldn't know whether the differences among the treatment means were statistically significant. A scatter diagram of the original data would also show that treatment 4 was best, with no analysis whatsoever. You should always look at the original data to check that the results make sense; a scatter diagram of the raw data usually tells as much as any analysis can.
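Looking at the raw data does not require special software. This minimal sketch prints a text dot diagram with one row per treatment. It assumes integer-valued observations, as in the tensile strength data. Each column is one unit on the response scale, running from the smallest to the largest observation, and the digit shows how many observations fall at that value. The function name is hypothetical, and a plotting library would give a nicer picture.

```python
TENSILE = {
    15: [7, 7, 15, 11, 9],
    20: [12, 17, 12, 18, 18],
    25: [14, 18, 18, 19, 19],
    30: [19, 25, 22, 19, 23],
    35: [7, 10, 11, 15, 11],
}


def dot_diagram(groups_by_level):
    """Return one text row per level; each digit counts observations at that value."""
    values = [round(y) for ys in groups_by_level.values() for y in ys]
    lo, hi = min(values), max(values)
    rows = []
    for level, ys in groups_by_level.items():
        counts = [0] * (hi - lo + 1)
        for y in ys:
            counts[round(y) - lo] += 1
        # Blank = no observations; "+" marks 10 or more at one value
        cells = "".join(" " if c == 0 else (str(c) if c < 10 else "+") for c in counts)
        rows.append(f"{level!s:>4} |{cells}")
    return rows


print("\n".join(dot_diagram(TENSILE)))
```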
Designs are more powerful if they are balanced, but balance is not always possible. Suppose you are doing an experiment and the equipment breaks down on one of the tests. Now, not by design but by circumstance, you have unequal replicates $n_i$ for the treatments. For the one-way model, the only effect of this is the obvious one in the computations: $N = \sum_{i=1}^{a} n_i$ and $SS_{\text{treatments}} = \sum_{i=1}^{a} n_i(\bar{y}_{i\cdot} - \bar{y}_{\cdot\cdot})^2$. That is a problem for the computer, not the experimenter. The partition $SS_{\text{total}} = SS_{\text{treatments}} + SS_{\text{error}}$ still holds with unequal $n_i$.
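With unequal replicates, each treatment's contribution to the treatment SS is weighted by its own $n_i$, and the error df become $N - a$ with $N = \sum n_i$. This minimal sketch uses the tensile strength data with one observation dropped to mimic a lost run. Which run is lost is a hypothetical choice, and so is the function name.

```python
TENSILE_LOST_RUN = {
    15: [7, 7, 15, 11, 9],
    20: [12, 17, 12, 18, 18],
    25: [14, 18, 18, 19, 19],
    30: [19, 25, 22, 19],      # hypothetical: the fifth run at 30% was lost
    35: [7, 10, 11, 15, 11],
}


def one_way_unbalanced(groups):
    """One-way ANOVA quantities when treatment i has n_i replicates."""
    a = len(groups)
    n_i = [len(g) for g in groups]
    N = sum(n_i)
    grand = sum(sum(g) for g in groups) / N
    means = [sum(g) / k for g, k in zip(groups, n_i)]
    # Each treatment's squared deviation is weighted by its own n_i
    ss_treat = sum(k * (m - grand) ** 2 for k, m in zip(n_i, means))
    ss_error = sum((y - m) ** 2 for g, m in zip(groups, means) for y in g)
    ss_total = sum((y - grand) ** 2 for g in groups for y in g)
    df_treat, df_error = a - 1, N - a
    f0 = (ss_treat / df_treat) / (ss_error / df_error)
    return {"SS_treatments": ss_treat, "SS_error": ss_error, "SS_total": ss_total,
            "df": (df_treat, df_error, N - 1), "F0": f0}


result = one_way_unbalanced(list(TENSILE_LOST_RUN.values()))
for key, value in result.items():
    print(key, value)
```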