Topic 24: Two-Way ANOVA
Outline
  Two-way ANOVA
    – Data
    – Cell means model
    – Parameter estimates
    – Factor effects model
Two-Way ANOVA
  The response variable Y is continuous
  There are two categorical explanatory variables or factors
Data for two-way ANOVA
  Y is the response variable
  Factor A with levels i = 1 to a
  Factor B with levels j = 1 to b
  Y_ijk is the kth observation in cell (i,j)
  In Chapter 19, we assume equal sample size in each cell (n_ij = n)
KNNL Example
  KNNL p 833
  Y is the number of cases of bread sold
  A is the height of the shelf display, a = 3 levels: bottom, middle, top
  B is the width of the shelf display, b = 2 levels: regular, wide
  n = 2 stores for each of the 3x2 = 6 treatment combinations (n_T = 12)
Read the data
    data a1;
      infile '../data/ch19ta07.txt';
      input sales height width;
    run;
    proc print data=a1;
    run;
The data Obs sales height width
Notation
  For Y_ijk we use
    – i to denote the level of factor A
    – j to denote the level of factor B
    – k to denote the kth observation in cell (i,j)
  i = 1,..., a levels of factor A
  j = 1,..., b levels of factor B
  k = 1,..., n observations in cell (i,j)
Model
  We assume that the response variable observations are
    – Normally distributed
        With a mean that may depend on the levels of the factors A and B
        With a constant variance
    – Independent
Cell Means Model
  Y_ijk = μ_ij + ε_ijk
    – where μ_ij is the theoretical mean or expected value of all observations in cell (i,j)
    – the ε_ijk are iid N(0, σ^2)
  This means Y_ijk ~ N(μ_ij, σ^2), independent
  The parameters of the model are
    – μ_ij, for i = 1 to a and j = 1 to b
    – σ^2
Estimates
  Estimate μ_ij by the mean of the observations in cell (i,j), Ybar_ij. = (Σ_k Y_ijk) / n_ij
  For each (i,j) combination, we can get an estimate of the variance, s_ij^2 = Σ_k (Y_ijk - Ybar_ij.)^2 / (n_ij - 1)
  We need to combine these to get an estimate of σ^2
Pooled estimate of σ^2
  In general we pool the s_ij^2, using weights proportional to the df, n_ij - 1
  The pooled estimate is s^2 = (Σ (n_ij - 1) s_ij^2) / (Σ (n_ij - 1))
  Here, n_ij = n, so s^2 = (Σ s_ij^2) / (ab), which is the average sample variance
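The pooling step can be sketched in a few lines of Python (used here instead of SAS only so the arithmetic is easy to check by hand). The sales values are an assumption: they are recalled from KNNL Table 19.7 and are not listed on the slides, so verify them against ch19ta07.txt before relying on them.

```python
from statistics import mean, variance

# Assumed data: bread sales per (height, width) cell, recalled from
# KNNL Table 19.7; the slides themselves do not list these values.
cells = {
    ("bottom", "regular"): [47, 43],
    ("bottom", "wide"): [46, 40],
    ("middle", "regular"): [62, 68],
    ("middle", "wide"): [67, 71],
    ("top", "regular"): [41, 39],
    ("top", "wide"): [42, 46],
}

# Estimate mu_ij by the cell mean and sigma^2 within each cell by s_ij^2.
cell_means = {cell: mean(obs) for cell, obs in cells.items()}
cell_vars = {cell: variance(obs) for cell, obs in cells.items()}

# Pool the cell variances with weights proportional to the df, n_ij - 1.
numerator = sum((len(obs) - 1) * cell_vars[cell] for cell, obs in cells.items())
denominator = sum(len(obs) - 1 for obs in cells.values())
pooled_var = numerator / denominator

# With equal n this is the plain average of the six cell variances.
average_var = mean(cell_vars.values())
print(pooled_var, average_var)
```

Because every cell has n = 2 here, the weighted and unweighted averages coincide, which is the point made on the slide.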
Run proc glm
    proc glm data=a1;
      class height width;
      model sales = height width height*width;
      means height width height*width;
    run;
Output
  Class Level Information
  Class    Levels   Values
  height   3        1 2 3
  width    2        1 2
  Number of Observations Read   12
  Number of Observations Used   12
Means statement height
  Level of height   N   sales Mean   Std Dev
Means statement width
  Level of width   N   sales Mean   Std Dev
Means statement ht*w
  Level of height   Level of width   N   sales Mean   Std Dev
Code the factor levels
    data a1;
      set a1;
      if height eq 1 and width eq 1 then hw='1_BR';
      if height eq 1 and width eq 2 then hw='2_BW';
      if height eq 2 and width eq 1 then hw='3_MR';
      if height eq 2 and width eq 2 then hw='4_MW';
      if height eq 3 and width eq 1 then hw='5_TR';
      if height eq 3 and width eq 2 then hw='6_TW';
    run;
Plot the data
    symbol1 v=circle i=none;
    proc gplot data=a1;
      plot sales*hw / frame;
    run;
The plot
Put the means in a2
    * BY processing assumes a1 is already sorted by height and width;
    proc means data=a1;
      var sales;
      by height width;
      output out=a2 mean=avsales;
    run;
    proc print data=a2;
    run;
Output Data Set
  Obs   height   width   _TYPE_   _FREQ_   avsales
Plot the means
    symbol1 v=square i=join c=black;
    symbol2 v=diamond i=join c=black;
    proc gplot data=a2;
      plot avsales*height=width / frame;
    run;
The interaction plot
Questions to consider
  Does the height of the display affect sales?
    – If yes, compare top with middle, top with bottom, and middle with bottom
  Does the width of the display affect sales?
    – If yes, compare regular and wide
But wait!!!
  Are these factor level comparisons meaningful?
  Does the effect of height on sales depend on the width?
  Does the effect of width on sales depend on the height?
  If yes, we have an interaction and we need to do some additional analysis
Factor effects model
  For the one-way ANOVA model, we wrote μ_i = μ + α_i
  Here we use μ_ij = μ + α_i + β_j + (αβ)_ij
  Under the “common” formulation
    – μ (μ_.. in KNNL) is the “overall mean”
    – α_i is the main effect of A
    – β_j is the main effect of B
    – (αβ)_ij is the interaction between A and B
Factor effects model
  μ = (Σ_ij μ_ij) / (ab)
  μ_i. = (Σ_j μ_ij) / b and μ_.j = (Σ_i μ_ij) / a
  α_i = μ_i. - μ and β_j = μ_.j - μ
  (αβ)_ij is the difference between μ_ij and μ + α_i + β_j
  (αβ)_ij = μ_ij - (μ + (μ_i. - μ) + (μ_.j - μ)) = μ_ij - μ_i. - μ_.j + μ
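A small numerical sketch of these definitions, in Python. The 3x2 table of cell means is hypothetical (it stands in for the unknown μ_ij), and all variable names are illustrative.

```python
# Hypothetical cell means standing in for mu_ij
# (rows: levels of A; columns: levels of B).
mu = [[45.0, 43.0], [65.0, 69.0], [40.0, 44.0]]
a, b = len(mu), len(mu[0])

# Overall mean, row (factor A) means and column (factor B) means.
grand = sum(sum(row) for row in mu) / (a * b)
row_mean = [sum(mu[i]) / b for i in range(a)]
col_mean = [sum(mu[i][j] for i in range(a)) / a for j in range(b)]

# Main effects and interactions as defined above.
alpha = [m - grand for m in row_mean]
beta = [m - grand for m in col_mean]
inter = [
    [mu[i][j] - row_mean[i] - col_mean[j] + grand for j in range(b)]
    for i in range(a)
]

# Each cell mean is recovered exactly as mu + alpha_i + beta_j + (alpha beta)_ij.
rebuilt = [
    [grand + alpha[i] + beta[j] + inter[i][j] for j in range(b)]
    for i in range(a)
]
print(grand, alpha, beta, inter)
```

The decomposition is just a re-expression of the ab cell means, so rebuilding the table from the four pieces returns the original values.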
Interpretation
  μ_ij = μ + α_i + β_j + (αβ)_ij
  μ is the “overall” mean
  α_i is an adjustment for level i of A
  β_j is an adjustment for level j of B
  (αβ)_ij is an additional adjustment that takes into account both i and j and cannot be explained by the previous adjustments
Constraints for this framework
  α_. = Σ_i α_i = 0
  β_. = Σ_j β_j = 0
  (αβ)_.j = Σ_i (αβ)_ij = 0 for all j
  (αβ)_i. = Σ_j (αβ)_ij = 0 for all i
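These constraints are not extra assumptions; they follow from defining the effects as deviations from averages of the μ_ij. A short derivation for two of them, written in LaTeX (the other two follow by swapping the roles of i and j):

```latex
\begin{align*}
\sum_i \alpha_i
  &= \sum_i (\mu_{i.} - \mu)
   = \frac{1}{b}\sum_i \sum_j \mu_{ij} - a\mu
   = a\mu - a\mu = 0, \\
\sum_j (\alpha\beta)_{ij}
  &= \sum_j \mu_{ij} - b\,\mu_{i.} - \sum_j \mu_{.j} + b\,\mu
   = b\,\mu_{i.} - b\,\mu_{i.} - b\,\mu + b\,\mu = 0 .
\end{align*}
```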
Estimates for factor effects model
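Assuming the balanced design used throughout (n_ij = n), the usual least squares estimates replace each population mean in the factor effects definitions with the corresponding sample mean. The bar-and-dot notation below follows the common KNNL convention and is an assumption here, since the slide body is not reproduced:

```latex
\begin{align*}
\hat{\mu} &= \bar{Y}_{...} \\
\hat{\alpha}_i &= \bar{Y}_{i..} - \bar{Y}_{...} \\
\hat{\beta}_j &= \bar{Y}_{.j.} - \bar{Y}_{...} \\
\widehat{(\alpha\beta)}_{ij} &= \bar{Y}_{ij.} - \bar{Y}_{i..} - \bar{Y}_{.j.} + \bar{Y}_{...}
\end{align*}
```

These estimates satisfy the same sum-to-zero constraints as the parameters they estimate.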
SS for ANOVA Table
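For the balanced case (n_ij = n), the standard sums of squares can be written in terms of the sample means. This is a sketch in the usual textbook notation, which is assumed rather than taken from the slide:

```latex
\begin{align*}
SSA  &= nb \sum_i (\bar{Y}_{i..} - \bar{Y}_{...})^2 \\
SSB  &= na \sum_j (\bar{Y}_{.j.} - \bar{Y}_{...})^2 \\
SSAB &= n \sum_i \sum_j (\bar{Y}_{ij.} - \bar{Y}_{i..} - \bar{Y}_{.j.} + \bar{Y}_{...})^2 \\
SSE  &= \sum_i \sum_j \sum_k (Y_{ijk} - \bar{Y}_{ij.})^2 \\
SSTO &= \sum_i \sum_j \sum_k (Y_{ijk} - \bar{Y}_{...})^2 = SSA + SSB + SSAB + SSE
\end{align*}
```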
df for ANOVA Table
  df_A = a-1
  df_B = b-1
  df_AB = (a-1)(b-1)
  df_E = ab(n-1)
  df_T = abn-1 = n_T-1
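As a quick check, the component degrees of freedom add up to the total. In LaTeX, with the bread example (a = 3, b = 2, n = 2) as the numerical case:

```latex
\begin{align*}
(a-1) + (b-1) + (a-1)(b-1) &= ab - 1, \\
(ab - 1) + ab(n-1) &= abn - 1, \\
2 + 1 + 2 + 6 &= 11 = n_T - 1 .
\end{align*}
```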
MS for ANOVA Table
  MSA = SSA/df_A
  MSB = SSB/df_B
  MSAB = SSAB/df_AB
  MSE = SSE/df_E
  MST = SSTO/df_T
Hypotheses for two-way ANOVA
  H_0A: α_i = 0 for all i
  H_1A: α_i ≠ 0 for at least one i
  H_0B: β_j = 0 for all j
  H_1B: β_j ≠ 0 for at least one j
  H_0AB: (αβ)_ij = 0 for all (i,j)
  H_1AB: (αβ)_ij ≠ 0 for at least one (i,j)
F statistics
  H_0A is tested by F_A = MSA/MSE; df = df_A, df_E
  H_0B is tested by F_B = MSB/MSE; df = df_B, df_E
  H_0AB is tested by F_AB = MSAB/MSE; df = df_AB, df_E
ANOVA Table
  Source  df          SS    MS    F
  A       a-1         SSA   MSA   MSA/MSE
  B       b-1         SSB   MSB   MSB/MSE
  AB      (a-1)(b-1)  SSAB  MSAB  MSAB/MSE
  Error   ab(n-1)     SSE   MSE
  Total   abn-1       SSTO  MST
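The table can be built by hand for a balanced design. The following Python sketch does so for the bread example; the sales values are an assumption (recalled from KNNL Table 19.7, not listed on the slides), and the code is an illustration of the formulas, not a replacement for proc glm.

```python
from statistics import mean

# Assumed data, recalled from KNNL Table 19.7 (not listed in the slides).
# y[i][j] holds the n sales values for height level i and width level j.
y = [
    [[47, 43], [46, 40]],  # bottom: regular, wide
    [[62, 68], [67, 71]],  # middle: regular, wide
    [[41, 39], [42, 46]],  # top: regular, wide
]
a, b, n = len(y), len(y[0]), len(y[0][0])

# Cell, row, column and grand means (simple averages are valid because n_ij = n).
cell = [[mean(y[i][j]) for j in range(b)] for i in range(a)]
row = [mean(cell[i]) for i in range(a)]
col = [mean([cell[i][j] for i in range(a)]) for j in range(b)]
grand = mean(row)

# Sums of squares for A, B, the AB interaction and error.
ssa = n * b * sum((r - grand) ** 2 for r in row)
ssb = n * a * sum((c - grand) ** 2 for c in col)
ssab = n * sum(
    (cell[i][j] - row[i] - col[j] + grand) ** 2
    for i in range(a) for j in range(b)
)
sse = sum(
    (obs - cell[i][j]) ** 2
    for i in range(a) for j in range(b) for obs in y[i][j]
)

# Degrees of freedom, mean squares and F statistics.
df_a, df_b, df_ab, df_e = a - 1, b - 1, (a - 1) * (b - 1), a * b * (n - 1)
mse = sse / df_e
f_a = (ssa / df_a) / mse
f_b = (ssb / df_b) / mse
f_ab = (ssab / df_ab) / mse

for name, df, ss, f in [("A", df_a, ssa, f_a), ("B", df_b, ssb, f_b),
                        ("AB", df_ab, ssab, f_ab)]:
    print(f"{name:5s} {df:3d} {ss:8.1f} {ss / df:8.2f} {f:8.2f}")
print(f"{'Error':5s} {df_e:3d} {sse:8.1f} {mse:8.2f}")
```

With these assumed values the three F statistics round to 74.71, 1.16 and 1.16, which agrees with the results reported for the bread example.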
P-values
  P-values are calculated using the F(df_Numerator, df_Denominator) distributions
  If P ≤ 0.05 we conclude that the effect being tested is statistically significant at the 0.05 level
KNNL Example
  KNNL p 833
  Y is the number of cases of bread sold
  A is the height of the shelf display, a = 3 levels: bottom, middle, top
  B is the width of the shelf display, b = 2 levels: regular, wide
  n = 2 stores for each of the 3x2 treatment combinations
PROC GLM
    proc glm data=a1;
      class height width;
      model sales = height width height*width;
    run;
Output
  Note that there are 6 cells in this design, so the model has 6-1 = 5 df
  Source            DF   Sum of Squares   Mean Square   F Value   Pr > F
  Model
  Error
  Corrected Total
Output ANOVA
  Note that the Type I and Type III analyses are the same because n_ij is constant
  Source          DF   Type III SS   Mean Square   F Value   Pr > F
  height                                                     <.0001
  width
  height*width
Other output
  R-Square   Coeff Var   Root MSE   sales Mean
  R-square is not commonly considered when performing ANOVA; we are more interested in differences between factor levels than in the model's predictive ability
Results
  The main effect of height is statistically significant (F=74.71; df=2,6; P<0.0001)
  The main effect of width is not statistically significant (F=1.16; df=1,6; P=0.32)
  The interaction between height and width is not statistically significant (F=1.16; df=2,6; P=0.37)
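The two P-values with df = 2,6 can be checked without statistical software, because the F distribution has a closed-form upper tail when the numerator df is exactly 2: P(F > f) = (1 + 2f/ν)^(-ν/2), where ν is the denominator df. A minimal Python sketch follows; the function name is hypothetical, and the shortcut does not apply to the width test, which has 1 numerator df.

```python
def f_upper_tail_df1_2(f_stat, df_den):
    """P(F > f_stat) for an F(2, df_den) distribution (numerator df = 2 only)."""
    return (1.0 + 2.0 * f_stat / df_den) ** (-df_den / 2.0)


# F statistics as reported above, both with df = 2,6.
p_height = f_upper_tail_df1_2(74.71, 6)
p_interaction = f_upper_tail_df1_2(1.16, 6)
print(p_height, p_interaction)
```

Using the rounded F of 1.16 gives roughly 0.375 for the interaction, in line with the reported P=0.37, and the height P-value comes out below 0.0001.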
Interpretation
  The height of the display affects sales of bread
  The width of the display has no apparent effect
  The effect of the height of the display is similar for both the regular and the wide widths
Plot of the means
Additional analyses
  We will need to do additional analyses to explain the height effect (factor A)
  There were three levels: bottom, middle and top
  We could rerun the analysis as a one-way ANOVA and use the methods we learned in the previous chapters
  Use the means statement with the lines option
Run Proc GLM
    proc glm data=a1;
      class height width;
      model sales = height width height*width;
      means height / tukey lines;
      lsmeans height / adjust=tukey;
    run;
MEANS Output
  Alpha                                 0.05
  Error Degrees of Freedom              6
  Error Mean Square
  Critical Value of Studentized Range
  Minimum Significant Difference
  Means with the same letter are not significantly different.
  Tukey Grouping   Mean   N   height
  A
  B
  B
  B
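The quantities in this output are linked by the standard Tukey formula for equal group sizes. A sketch in LaTeX, assuming the MEANS statement uses this usual form; each height mean averages bn = 4 observations:

```latex
\text{MSD} = q(1-\alpha;\; a,\; df_E)\,\sqrt{\frac{MSE}{bn}}
```

Here q(1-α; a, df_E) is the critical value of the studentized range for a = 3 means and df_E = 6 error degrees of freedom. Two height means are declared different when their absolute difference exceeds the MSD.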
LSMEANS Output
  height   sales LSMEAN   LSMEAN Number
  Least Squares Means for effect height
  Pr > |t| for H0: LSMean(i)=LSMean(j)
  Dependent Variable: sales
  i/j   <   <.0001
Last slide
  We went over Chapter 19
  We used program topic24.sas to generate the output for today.