Download presentation
Presentation is loading. Please wait.
Published byElijah Fields Modified over 9 years ago
1
Topic 24: Two-Way ANOVA
2
Outline Two-way ANOVA –Data –Cell means model –Parameter estimates –Factor effects model
3
Two-Way ANOVA The response variable Y is continuous There are two categorical explanatory variables or factors
4
Data for two-way ANOVA Y is the response variable Factor A with levels i = 1 to a Factor B with levels j = 1 to b Y ijk is the k th observation in cell (i,j) In Chapter 19, we assume equal sample size in each cell (n ij =n)
5
KNNL Example KNNL p 833 Y is the number of cases of bread sold A is the height of the shelf display, a=3 levels: bottom, middle, top B is the width of the shelf display, b=2 levels: regular, wide n=2 stores for each of the 3x2=6 treatment combinations (n T =12)
6
Read the data data a1; infile ‘../data/ch19ta07.txt'; input sales height width; proc print data=a1; run;
7
The data Obs sales height width 1 47 1 1 2 43 1 1 3 46 1 2 4 40 1 2 5 62 2 1 6 68 2 1 7 67 2 2 8 71 2 2 9 41 3 1 10 39 3 1 11 42 3 2 12 46 3 2
8
Notation For Y ijk we use –i to denote the level of the factor A –j to denote the level of the factor B –k to denote the k th observation in cell (i,j) i = 1,..., a levels of factor A j = 1,..., b levels of factor B k = 1,..., n observations in cell (i,j)
9
Model We assume that the response variable observations are –Normally distributed With a mean that may depend on the levels of the factors A and B With a constant variance –Independent
10
Cell Means Model Y ijk = μ ij + ε ijk –where μ ij is the theoretical mean or expected value of all observations in cell (i,j) –the ε ijk are iid N(0, σ 2 ) This means Y ijk ~ N(μ ij, σ 2 ), independent The parameters of the model are – μ ij, for i = 1 to a and j = 1 to b –σ 2
11
Estimates Estimate μ ij by the mean of the observations in cell (i,j), For each (i,j) combination, we can get an estimate of the variance We need to combine these to get an estimate of σ 2
12
Pooled estimate of σ 2 In general we pool the s ij 2, using weights proportional to the df, n ij -1 The pooled estimate is s 2 = (Σ (n ij -1)s ij 2 ) / (Σ(n ij -1)) Here, n ij = n, so s 2 = (Σs ij 2 ) / (ab), which is the average sample variance
13
Run proc glm proc glm data=a1; class height width; model sales= height width height*width; means height width height*width; run;
14
Output Class Level Information ClassLevelsValues height31 2 3 width21 2 Number of Observations Read12 Number of Observations Used12
15
Means statement height Level of heightN sales MeanStd Dev 1444.00000003.16227766 2467.00000003.74165739 3442.00000002.94392029
16
Means statement width Level of widthN sales MeanStd Dev 1650.000000012.0664825 2652.000000013.4313067
17
Means statement ht*w Level of height Level of widthN sales MeanStd Dev 11245.00000002.82842712 12243.00000004.24264069 21265.00000004.24264069 22269.00000002.82842712 31240.00000001.41421356 32244.00000002.82842712
18
Code the factor levels data a1; set a1; if height eq 1 and width eq 1 then hw='1_BR'; if height eq 1 and width eq 2 then hw='2_BW'; if height eq 2 and width eq 1 then hw='3_MR'; if height eq 2 and width eq 2 then hw='4_MW'; if height eq 3 and width eq 1 then hw='5_TR'; if height eq 3 and width eq 2 then hw='6_TW';
19
Plot the data symbol1 v=circle i=none; proc gplot data=a1; plot sales*hw/frame; run;
20
The plot
21
Put the means in a2 proc means data=a1; var sales; by height width; output out=a2 mean=avsales; proc print data=a2; run;
22
Output Data Set Obs height width _TYPE_ _FREQ_ avsales 1 1 1 0 2 45 2 1 2 0 2 43 3 2 1 0 2 65 4 2 2 0 2 69 5 3 1 0 2 40 6 3 2 0 2 44
23
Plot the means symbol1 v=square i=join c=black; symbol2 v=diamond i=join c=black; proc gplot data=a2; plot avsales*height=width/frame; run;
24
The interaction plot
25
Questions to consider Does the height of the display affect sales? If yes, compare top with middle, top with bottom, and middle with bottom Does the width of the display affect sales? If yes, compare regular and wide
26
But wait!!! Are these factor level comparisons meaningful? Does the effect of height on sales depend on the width? Does the effect of width on sales depend on the height? If yes, we have an interaction and we need to do some additional analysis
27
Factor effects model For the one-way ANOVA model, we wrote μ i = μ + α i Here we use μ ij = μ + α i + β j + (αβ) ij Under “common” formulation –μ (μ.. in KNNL) is the “overall mean” –α i is the main effect of A –β j is the main effect of B –(αβ) ij is the interaction between A and B
28
Factor effects model μ = (Σ ij μ ij )/(ab) μ i. = (Σ j μ ij )/b and μ.j = (Σ i μ ij )/a α i = μ i. – μ and β j = μ.j - μ (αβ) ij is difference between μ ij and μ + α i + β j (αβ) ij = μ ij - (μ + (μ i. - μ) + (μ.j - μ)) = μ ij – μ i. – μ.j + μ
29
Interpretation μ ij = μ + α i + β j + (αβ) ij μ is the “overall” mean α i is an adjustment for level i of A β j is an adjustment for level j of B (αβ) ij is an additional adjustment that takes into account both i and j that cannot be explained by the previous adjustments
30
Constraints for this framework α. = Σ i α i = 0 β. = Σ j β j = 0 (αβ).j = Σ i (αβ) ij = 0 for all j (αβ) i. = Σ j (αβ) ij = 0 for all i
31
Estimates for factor effects model
32
SS for ANOVA Table
33
df for ANOVA Table df A = a-1 df B = b-1 df AB = (a-1)(b-1) df E = ab(n-1) df T = abn-1 = n T -1
34
MS for ANOVA Table MSA = SSA/df A MSB = SSB/df B MSAB = SSAB/df AB MSE = SSE/df E MST = SST/df T
35
Hypotheses for two-way ANOVA H 0A : α i = 0 for all i H 1A : α i ≠ 0 for at least one i H 0B : β j = 0 for all j H 1B : β j ≠ 0 for at least one j H 0AB : (αβ) ij = 0 for all (i,j) H 1AB : (αβ) ij ≠ 0 for at least one (i,j)
36
F statistics H 0A is tested by F A = MSA/MSE; df=df A, df E H 0B is tested by F B = MSB/MSE; df=df B, df E H 0AB is tested by F AB = MSAB/MSE; df=df AB, df E
37
ANOVA Table Source df SS MS F A a-1 SSA MSA MSA/MSE B b-1 SSB MSB MSB/MSE AB (a-1)(b-1) SSAB MSAB MSAB/MSE Error ab(n-1) SSE MSE _ Total abn-1 SSTO MST
38
P-values P-values are calculated using the F(dfNumerator, dfDenominator) distributions If P ≤ 0.05 we conclude that the effect being tested is statistically significant
39
KNNL Example NKNW p 833 Y is the number of cases of bread sold A is the height of the shelf display, a=3 levels: bottom, middle, top B is the width of the shelf display, b=2: regular, wide n=2 stores for each of the 3x2 treatment combinations
40
PROC GLM proc glm data=a1; class height width; model sales= height width height*width; run;
41
Output Note that there are 6 cells in this design…(6-1)df for model SourceDF Sum of Squares Mean SquareF ValuePr > F Model51580.0000316.00000030.580.0003 Error662.00000010.333333 Corrected Total111642.0000
42
Output ANOVA Note Type I and Type III Analyses are the same because n ij is constant SourceDFType III SSMean SquareF ValuePr > F height21544.00000772.00000074.71<.0001 width112.000000 1.160.3226 height*width224.00000012.0000001.160.3747
43
Other output R-SquareCoeff VarRoot MSEsales Mean 0.9622416.3030403.21455051.00000 Commonly do not consider R-sq when performing ANOVA…interested more in difference in levels rather than the models predictive ability
44
Results The main effect of height is statistically significant (F=74.71; df=2,6; P<0.0001) The main effect of width is not statistically significant (F=1.16; df=1,6; P=0.32) The interaction between height and width is not statistically significant (F=1.16; df=2,6; P=0.37)
45
Interpretation The height of the display affects sales of bread The width of the display has no apparent effect The effect of the height of the display is similar for both the regular and the wide widths
46
Plot of the means
47
Additional analyses We will need to do additional analyses to explain the height effect (factor A) There were three levels: bottom, middle and top We could rerun the data with a one- way anova and use the methods we learned in the previous chapters Use means statement with lines
48
Run Proc GLM proc glm data=a1; class height width; model sales= height width height*width; means height / tukey lines; lsmeans height / adjust=tukey; run;
49
MEANS Output Alpha0.05 Error Degrees of Freedom6 Error Mean Square10.33333 Critical Value of Studentized Range4.33920 Minimum Significant Difference6.9743 Means with the same letter are not significantly different. Tukey GroupingMeanNheight A67.00042 B44.00041 B B42.00043
50
LSMEANS Output heightsales LSMEAN LSMEAN Number 144.00000001 267.00000002 342.00000003 Least Squares Means for effect height Pr > |t| for H0: LSMean(i)=LSMean(j) Dependent Variable: sales i/j123 10.00010.6714 20.0001<.0001 30.6714<.0001
51
Last slide We went over Chapter 19 We used program topic24.sas to generate the output for today.
Similar presentations
© 2024 SlidePlayer.com. Inc.
All rights reserved.