Topic 29: Three-Way ANOVA
Outline
Three-way ANOVA
–Data
–Model
–Inference
Data for three-way ANOVA
Y, the response variable
Factor A with levels i = 1 to a
Factor B with levels j = 1 to b
Factor C with levels k = 1 to c
$Y_{ijkl}$ is the $l$th observation in cell (i,j,k), $l = 1$ to $n_{ijk}$
A balanced design has $n_{ijk} = n$
KNNL Example KNNL p 1005 Y is exercise tolerance, minutes until fatigue on a bicycle test A is gender, a=2 levels: male, female B is percent body fat, b=2 levels: high, low C is smoking history, c=2 levels: light, heavy n=3 persons aged per (i,j,k) cell
Read and check the data
data a1;
  infile 'c:\...\CH24TA04.txt';
  input extol gender fat smoke;
proc print data=a1;
run;
proc print output (columns): Obs, extol, gender, fat, smoke
Define variable for a plot
data a1; set a1;
  if (gender eq 1)*(fat eq 1)*(smoke eq 1) then gfs='1_Mfs';
  if (gender eq 1)*(fat eq 2)*(smoke eq 1) then gfs='2_MFs';
  if (gender eq 1)*(fat eq 1)*(smoke eq 2) then gfs='3_MfS';
  if (gender eq 1)*(fat eq 2)*(smoke eq 2) then gfs='4_MFS';
  if (gender eq 2)*(fat eq 1)*(smoke eq 1) then gfs='5_Ffs';
  if (gender eq 2)*(fat eq 2)*(smoke eq 1) then gfs='6_FFs';
  if (gender eq 2)*(fat eq 1)*(smoke eq 2) then gfs='7_FfS';
  if (gender eq 2)*(fat eq 2)*(smoke eq 2) then gfs='8_FFS';
run;
proc print output (columns): Obs, extol, gender, fat, smoke, gfs
The rows shown have gfs = 1_Mfs, 3_MfS, 2_MFs, 4_MFS (three observations each)
Plot the data
title1 'Plot of the data';
symbol1 v=circle i=none c=black;
proc gplot data=a1;
  plot extol*gfs/frame;
run;
Find the means
proc sort data=a1; by gender fat smoke;
proc means data=a1;
  output out=a2 mean=avextol;
  by gender fat smoke;
run;
Define fat*smoke
data a2; set a2;
  if (fat eq 1)*(smoke eq 1) then fs='1_fs';
  if (fat eq 1)*(smoke eq 2) then fs='2_fS';
  if (fat eq 2)*(smoke eq 1) then fs='3_Fs';
  if (fat eq 2)*(smoke eq 2) then fs='4_FS';
run;
proc print output (columns): Obs, gen, fat, smoke, FR, avextol, fs
The eight rows shown have fs = 1_fs, 1_fs, 2_fS, 2_fS, 3_Fs, 3_Fs, 4_FS, 4_FS
Plot the means
proc sort data=a2; by fs;
title1 'Plot of the means';
symbol1 v='M' i=join c=black;
symbol2 v='F' i=join c=black;
proc gplot data=a2;
  plot avextol*fs=gender/frame;
run;
Cell means model
$Y_{ijkl} = \mu_{ijk} + \varepsilon_{ijkl}$
–where $\mu_{ijk}$ is the theoretical mean or expected value of all observations in cell (i,j,k)
–the $\varepsilon_{ijkl}$ are iid $N(0, \sigma^2)$
–$Y_{ijkl} \sim N(\mu_{ijk}, \sigma^2)$, independent
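Estimates
Estimate $\mu_{ijk}$ by the mean of the observations in cell (i,j,k): $\bar{Y}_{ijk\cdot} = (\sum_l Y_{ijkl})/n_{ijk}$
For each (i,j,k) combination, we can get an estimate of the variance: $s_{ijk}^2 = \sum_l (Y_{ijkl} - \bar{Y}_{ijk\cdot})^2/(n_{ijk} - 1)$
We need to combine these to get an estimate of $\sigma^2$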
Pooled estimate of $\sigma^2$
We pool the $s_{ijk}^2$, giving weights proportional to the df, $n_{ijk} - 1$
The pooled estimate is $MSE = s^2 = \left(\sum_{i,j,k} (n_{ijk} - 1)\, s_{ijk}^2\right) / \left(\sum_{i,j,k} (n_{ijk} - 1)\right)$
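As a worked special case of the pooling formula, here is what it reduces to for a balanced design ($n_{ijk} = n$), with the error df evaluated for the layout of the exercise tolerance example ($a = b = c = 2$, $n = 3$). This is a sketch of the arithmetic only; it uses no data values.

```latex
\begin{aligned}
MSE &= \frac{\sum_{i,j,k} (n-1)\, s_{ijk}^2}{\sum_{i,j,k} (n-1)}
     = \frac{(n-1)\sum_{i,j,k} s_{ijk}^2}{abc\,(n-1)}
     = \frac{1}{abc}\sum_{i,j,k} s_{ijk}^2 \\
df_E &= abc\,(n-1) = 2 \cdot 2 \cdot 2 \cdot (3-1) = 16
\end{aligned}
```

With equal cell sizes the weights are all the same, so the MSE is simply the average of the abc cell variances.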
Factor effects model
Model the cell mean as $\mu_{ijk} = \mu + \alpha_i + \beta_j + \gamma_k + (\alpha\beta)_{ij} + (\alpha\gamma)_{ik} + (\beta\gamma)_{jk} + (\alpha\beta\gamma)_{ijk}$
$\mu$ is the overall mean
$\alpha_i$, $\beta_j$, $\gamma_k$ are the main effects of A, B, and C
$(\alpha\beta)_{ij}$, $(\alpha\gamma)_{ik}$, and $(\beta\gamma)_{jk}$ are the two-way interactions (first-order interactions)
$(\alpha\beta\gamma)_{ijk}$ is the three-way interaction (second-order interaction)
Extensions of the usual constraints apply
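The "usual constraints" are not written out on the slide. Assuming the standard sum-to-zero constraints of the two-way factor effects model are the ones meant, their three-way extension can be written as follows: every effect sums to zero over each of its subscripts.

```latex
\begin{aligned}
&\sum_i \alpha_i = \sum_j \beta_j = \sum_k \gamma_k = 0 \\
&\sum_i (\alpha\beta)_{ij} = 0 \ \text{for all } j, \qquad
 \sum_j (\alpha\beta)_{ij} = 0 \ \text{for all } i \\
&\text{and likewise for } (\alpha\gamma)_{ik} \text{ and } (\beta\gamma)_{jk} \\
&\sum_i (\alpha\beta\gamma)_{ijk} = 0 \ \text{for all } j,k, \quad
 \sum_j (\alpha\beta\gamma)_{ijk} = 0 \ \text{for all } i,k, \quad
 \sum_k (\alpha\beta\gamma)_{ijk} = 0 \ \text{for all } i,j
\end{aligned}
```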
ANOVA table
Sources of model variation are the three main effects, the three two-way interactions, and the one three-way interaction
With balanced data the SS and DF add to the model SS and DF
Still have Model + Error = Total
Each effect is tested by an F statistic with MSE in the denominator
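The degrees of freedom behind this table follow the usual pattern. The block below lists them for a balanced design and then evaluates them for the layout of the exercise tolerance example ($a = b = c = 2$, $n = 3$); it is a bookkeeping sketch, not SAS output.

```latex
\begin{aligned}
df_A &= a-1, \quad df_B = b-1, \quad df_C = c-1 \\
df_{AB} &= (a-1)(b-1), \quad df_{AC} = (a-1)(c-1), \quad df_{BC} = (b-1)(c-1) \\
df_{ABC} &= (a-1)(b-1)(c-1) \\
df_{Model} &= abc - 1, \quad df_{Error} = abc\,(n-1), \quad df_{Total} = abcn - 1 \\
F_{\text{effect}} &= MS_{\text{effect}} / MSE \\[4pt]
a=b=c=2,\ n=3:\quad & \text{each effect has } 1 \text{ df}, \quad
df_{Model} = 7, \quad df_{Error} = 16, \quad df_{Total} = 23
\end{aligned}
```

The seven single-df effects add up to the 7 model df, and 7 + 16 = 23 matches Model + Error = Total.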
Run proc glm
proc glm data=a1;
  class gender fat smoke;
  model extol=gender fat smoke gender*fat gender*smoke fat*smoke gender*fat*smoke;
  means gender*fat*smoke;
run;
Run proc glm
proc glm data=a1;
  class gender fat smoke;
  model extol=gender|fat|smoke;
  means gender*fat*smoke;
run;
Shorthand way to express the model
SAS Parameter Estimates
The solution option on the model statement gives parameter estimates for the glm parameterization
These are as we have seen before; any main effect or interaction with a subscript of a, b, or c is zero
These reproduce the cell means in the usual way
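The slide mentions the solution option without showing it in code. A minimal sketch, assuming the same data set a1 and model as the previous slides:

```sas
proc glm data=a1;
  class gender fat smoke;
  /* solution prints the estimates under the glm parameterization */
  model extol=gender|fat|smoke / solution;
run;
```

Summing the intercept and the nonzero estimates that apply to a given cell gives that cell's fitted mean.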
ANOVA Table
Source   DF   Sum of Squares   Mean Square   F Value   Pr > F
Rows: Model, Error, Corrected Total
Type I and III SS are the same here (balanced design)
Factor effects output
Source   DF   Type I SS   Mean Square   F Value   Pr > F
Rows: gender, fat, gender*fat, smoke, gender*smoke, fat*smoke, gender*fat*smoke
Analytical Strategy
First examine interactions, from highest order to lowest order
Some options when one or more interactions are significant
–Interpret the plot of means
–Run analyses for each level of one factor, e.g. run A*B by C (lsmeans with slice option)
–Run as a one-way with abc levels
–Define a composite factor by combining two factors, e.g. AB with ab levels
–Use contrasts
Analytical Strategy
Some options when no interactions are significant
–Use a multiple comparison procedure for the main effects
–Use contrasts
–When needed, rerun without the interactions
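Two of the options for significant interactions can be sketched in SAS as follows. Both steps assume the data set a1 and the composite variable gfs defined on the earlier slides; slicing by smoke is an arbitrary choice for illustration, not a recommendation from the slides.

```sas
/* Option: compare the gender*fat cell means within each level of smoke */
proc glm data=a1;
  class gender fat smoke;
  model extol=gender|fat|smoke;
  lsmeans gender*fat*smoke / slice=smoke;
run;

/* Option: one-way analysis on the abc = 8 treatment combinations */
proc glm data=a1;
  class gfs;
  model extol=gfs;
  means gfs / tukey;
run;
```

The slice tests use the MSE from the full three-way model, so they differ from running separate analyses with a by statement, which would estimate the error variance separately in each subset.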
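For the no-interaction case, a minimal sketch of rerunning with main effects only and comparing their levels. It is shown on a1 purely to illustrate the syntax, since the example data do appear to have an interaction.

```sas
proc glm data=a1;
  class gender fat smoke;
  /* main effects only: the interaction SS and df are pooled into error */
  model extol=gender fat smoke;
  /* Tukey-adjusted pairwise differences with confidence limits */
  lsmeans gender fat smoke / pdiff cl adjust=tukey;
run;
```

If only the three-way term is to be dropped, the bar notation gender|fat|smoke@2 keeps the main effects and the two-way interactions.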
Example Interpretation
Since there appears to be a fat by smoke interaction, let's run a two-way ANOVA (no interaction) with gender and the combined fat*smoke variable fs as the two factors
Note that we could also use the interaction plot to describe the interaction
Run glm
/* fs was created in a2 on the earlier slide; it must also be defined in a1 before this step */
proc glm data=a1;
  class gender fs;
  model extol=gender fs;
  means gender fs/tukey;
run;
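The proc glm call on this slide reads fs from a1, while the earlier slides only show fs being created in a2. A minimal sketch of adding it to a1, assuming the same coding as the "Define fat*smoke" step and run before the proc glm step:

```sas
data a1; set a1;
  /* same four-level composite factor as in a2 */
  if (fat eq 1)*(smoke eq 1) then fs='1_fs';
  if (fat eq 1)*(smoke eq 2) then fs='2_fS';
  if (fat eq 2)*(smoke eq 1) then fs='3_Fs';
  if (fat eq 2)*(smoke eq 2) then fs='4_FS';
run;
```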
ANOVA Table
Source   DF   Sum of Squares   Mean Square   F Value   Pr > F
Rows: Model (Pr > F <.0001), Error, Corrected Total
Factor effects output
Source   DF   Type I SS   Mean Square   F Value   Pr > F
Rows: gender, fs (Pr > F <.0001)
Both are significant as expected, so compare means
Means for gender
Columns: Tukey grouping, Mean, N, gender
The two genders fall in different groups (A and B)
Tukey comparisons for fs
Columns: Tukey grouping, Mean, N, fs
Group A: 1_fs
Group B: 2_fS, 4_FS, 3_Fs
Conclusions
There is a gender difference, with males having roughly 5.5 minutes higher exercise tolerance; it would be beneficial to add a CI here
There was a smoking history by body fat interaction: those with low body fat and a light smoking history had significantly higher exercise tolerance than the other three groups
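One way to get the confidence interval suggested for the gender difference is the lsmeans statement with the pdiff and cl options. This is a sketch, assuming the additive gender + fs model from the earlier slide and that fs is present in a1:

```sas
proc glm data=a1;
  class gender fs;
  model extol=gender fs;
  /* cl with pdiff adds confidence limits for the male-female difference */
  lsmeans gender / pdiff cl;
run;
```

With only two genders there is a single pairwise difference, so no multiplicity adjustment is needed for this interval.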
Last slide
Read KNNL Chapter 24
We used program topic29.sas to generate the output for today