Xuhua Xia
Fitting Several Regression Lines
Many applications of statistical analysis involve a continuous dependent variable (DV) together with both continuous and categorical independent variables (IVs).
– The relationship between the DV and the continuous IVs is linear and the slope is the same in all groups: ANCOVA.
– The slopes differ among groups: full model.
An illustrative data set will make this clear.
Fitting Several Regression Lines
The muscle strength (MS) depends on the diameter of the muscle fiber (D) and the type of muscle (TM).
Identify the DV and the IVs.
How do we incorporate the qualitative variable into the model? With dummy variables.

TM   D   MS
A    1   11.5
A    2   13.8
A    3   14.4
A    4   16.8
A    5   18.7
B    1   10.8
B    2   12.3
B    3   13.7
B    4   14.2
B    5   16.6
C    1   13.1
C    2   16.2
C    3   19.0
C    4   22.9
C    5   26.5
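The table above can be entered directly in R, which is handy when the data file is not at hand. This is a minimal sketch; the data frame name md and the helper variable group_means are choices made here for illustration.

```r
# Muscle data transcribed from the table above
# (TM = muscle type, D = fiber diameter, MS = muscle strength).
md <- data.frame(
  TM = factor(rep(c("A", "B", "C"), each = 5)),
  D  = rep(1:5, times = 3),
  MS = c(11.5, 13.8, 14.4, 16.8, 18.7,
         10.8, 12.3, 13.7, 14.2, 16.6,
         13.1, 16.2, 19.0, 22.9, 26.5)
)

str(md)  # one factor (TM) and two numeric columns (D, MS)

# Mean MS per muscle type, ignoring D
group_means <- tapply(md$MS, md$TM, mean)
print(group_means)
```

TM is the qualitative IV, D the continuous IV, and MS the DV.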
Two Scenarios
Same intercept, different slopes: full model
Different intercepts, same slope: ANCOVA
Two Scenarios
Same intercept, different slopes (multiplicative effect):
  $Y_1 = a + b_1 X$
  $Y_2 = a + b_2 X$
  $Y_1 - Y_2 = (b_1 - b_2) X$
Different intercepts, same slope (additive effect):
  $Y_1 = a_1 + b X$
  $Y_2 = a_2 + b X$
  $Y_1 - Y_2 = a_1 - a_2$
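Both scenarios can be read as special cases of a single model. The sketch below assumes a hypothetical indicator G (0 for group 1, 1 for group 2) that does not appear on the slide, and it writes the difference as Y_2 - Y_1, so that c = a_2 - a_1 and d = b_2 - b_1.

```latex
\begin{align*}
Y &= a + bX + cG + dGX \\
G = 0:\quad Y_1 &= a + bX \\
G = 1:\quad Y_2 &= (a + c) + (b + d)X \\
Y_2 - Y_1 &= c + dX \\
\text{same slope } (d = 0):\quad Y_2 - Y_1 &= c \quad \text{(additive effect)} \\
\text{same intercept } (c = 0):\quad Y_2 - Y_1 &= dX \quad \text{(multiplicative effect)}
\end{align*}
```

With an additive effect the group difference is the same at every X; with a multiplicative effect it grows in proportion to X.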
[Figure: plot of MS vs D by TM]
Objectives
– Obtain regression equations relating MS to D for each TM.
– Compare the mean MS for the three TMs at a given level of D.
– Is it meaningful to compare the mean MS for the three TMs without specifying the level of D?
Explaining the R functions
Every 'factor' variable (TM in our case) used in lm model-fitting creates k-1 dummy variables, where k is the number of levels of the factor:
DUMA = 0 (not created; A is the reference level)
DUMB = 1 if TM = B, 0 otherwise
DUMC = 1 if TM = C, 0 otherwise
$MS = \alpha + \beta_1 DUMB + \beta_2 DUMC + \beta_3 D + \beta_4 (DUMB \times D) + \beta_5 (DUMC \times D) + \varepsilon$
The summary() function prints estimates of the model coefficients.
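Illustration with EXCEL

MS    TM  D  DUMB  DUMC  DUMB*D  DUMC*D
11.5  A   1  0     0     0       0
13.8  A   2  0     0     0       0
14.4  A   3  0     0     0       0
16.8  A   4  0     0     0       0
18.7  A   5  0     0     0       0
10.8  B   1  1     0     1       0
12.3  B   2  1     0     2       0
13.7  B   3  1     0     3       0
14.2  B   4  1     0     4       0
16.6  B   5  1     0     5       0
13.1  C   1  0     1     0       1
16.2  C   2  0     1     0       2
19.0  C   3  0     1     0       3
22.9  C   4  0     1     0       4
26.5  C   5  0     1     0       5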
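The dummy columns do not have to be typed by hand: model.matrix() shows the design matrix that lm builds from a formula. The sketch below assumes the data are held in a data frame named md (a name chosen here) with TM stored as a factor whose first level is A. R labels the columns TMB, TMC, D:TMB and D:TMC rather than DUMB, DUMC, DUMB*D and DUMC*D.

```r
# Muscle data (TM = muscle type, D = fiber diameter, MS = muscle strength)
md <- data.frame(
  TM = factor(rep(c("A", "B", "C"), each = 5)),
  D  = rep(1:5, times = 3),
  MS = c(11.5, 13.8, 14.4, 16.8, 18.7,
         10.8, 12.3, 13.7, 14.2, 16.6,
         13.1, 16.2, 19.0, 22.9, 26.5)
)

# Design matrix for the full model: intercept, D, two dummy
# variables for TM, and their products with D
X <- model.matrix(~ D * TM, data = md)
print(colnames(X))

# First observation of each muscle type (rows 1, 6 and 11)
print(X[c(1, 6, 11), ])
```

Level A gets no column of its own; it is absorbed into the intercept and the D column.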
R functions
# Read the data (columns TM, D, MS) and make the columns directly accessible
md <- read.table("DiffSlopeMuscle.txt", header = TRUE)
attach(md)
# Axis limits shared by all plots
minX <- min(D); maxX <- max(D)
minY <- min(MS); maxY <- max(MS)
# Scatter plot of MS vs D: A in black, B in red, C in blue
plot(D[TM == "A"], MS[TM == "A"], xlab = "D", ylab = "MS",
     xlim = c(minX, maxX), ylim = c(minY, maxY), pch = 16)
points(D[TM == "B"], MS[TM == "B"], col = "red", pch = 16)
points(D[TM == "C"], MS[TM == "C"], col = "blue", pch = 16)
# Does D differ among the three muscle types?
fitANOVA <- aov(D ~ TM); anova(fitANOVA)
# No significant difference in D, so the three muscle types are
# compared over similar fiber diameters.
# Will ANOVA reveal the difference in MS among the three muscle types?
fitANOVA <- aov(MS ~ TM); anova(fitANOVA)
# Check the plot for slope heterogeneity
# Explicit test of slope heterogeneity
fit <- lm(MS ~ D * TM)
anova(fit)
# Check the D:TM term for significance: if not significant, then do ANCOVA.
# Stored under a separate name so that fit remains the full model.
fitANCOVA <- lm(MS ~ D + TM)
anova(fitANCOVA)
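The test of slope heterogeneity can also be phrased as a comparison of two nested models: the common-slope (ANCOVA) model against the full model. This is a sketch, not necessarily how the original analysis was run; the names md, fitFull, fitANCOVA and cmp are chosen here, and the data are entered inline so the block runs without the data file.

```r
# Muscle data (TM = muscle type, D = fiber diameter, MS = muscle strength)
md <- data.frame(
  TM = factor(rep(c("A", "B", "C"), each = 5)),
  D  = rep(1:5, times = 3),
  MS = c(11.5, 13.8, 14.4, 16.8, 18.7,
         10.8, 12.3, 13.7, 14.2, 16.6,
         13.1, 16.2, 19.0, 22.9, 26.5)
)

fitANCOVA <- lm(MS ~ D + TM, data = md)  # one common slope
fitFull   <- lm(MS ~ D * TM, data = md)  # a separate slope per muscle type

# F-test of the two extra slope parameters (the D:TM term)
cmp <- anova(fitANCOVA, fitFull)
print(cmp)
```

A small p-value here means the common-slope model fits markedly worse, so the full model should be kept.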
R Output
> anova(fit)
Analysis of Variance Table
Response: MS
            Df  Sum Sq  Mean Sq  F value  Pr(>F)
D            …       …        …        …  …e-10
TM           …       …        …        …  …e-08
D:TM         …       …        …        …  …e-06
Residuals    …       …        …
> summary(fit)
Coefficients:
             Estimate  Std. Error  t value  Pr(>|t|)
(Intercept)         …           …        …  …e-09
D                   …           …        …  …e-07
TMB                 …           …        …  …
TMC                 …           …        …  …
D:TMB               …           …        …  …
D:TMC               …           …        …  …e-05

The interaction is highly significant.
MS = … + … D - 0.35 B - 0.33 C - 0.39 D*B + 1.61 D*C
A: MS = … + … D
B: MS = … + … D - 0.35 - 0.39 D = … + … D
C: MS = … + … D - 0.33 + 1.61 D = … + … D
It might help to show regression with dummy variables in EXCEL.
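The per-group regression lines follow from the fitted coefficients by adding the TM offsets to the reference (A) intercept and slope. The sketch below refits the full model to the data table given earlier; the names md and lines_by_TM are chosen here for illustration.

```r
# Muscle data (TM = muscle type, D = fiber diameter, MS = muscle strength)
md <- data.frame(
  TM = factor(rep(c("A", "B", "C"), each = 5)),
  D  = rep(1:5, times = 3),
  MS = c(11.5, 13.8, 14.4, 16.8, 18.7,
         10.8, 12.3, 13.7, 14.2, 16.6,
         13.1, 16.2, 19.0, 22.9, 26.5)
)

fit <- lm(MS ~ D * TM, data = md)
b <- coef(fit)

# Intercept and slope of each line: reference values plus offsets
lines_by_TM <- rbind(
  A = c(intercept = b[["(Intercept)"]],
        slope     = b[["D"]]),
  B = c(intercept = b[["(Intercept)"]] + b[["TMB"]],
        slope     = b[["D"]] + b[["D:TMB"]]),
  C = c(intercept = b[["(Intercept)"]] + b[["TMC"]],
        slope     = b[["D"]] + b[["D:TMC"]])
)
print(round(lines_by_TM, 2))
```

With the data as transcribed, this gives A: MS = 9.82 + 1.74 D, B: MS = 9.47 + 1.35 D and C: MS = 9.49 + 3.35 D, which agrees with the offsets -0.35, -0.33, -0.39 and +1.61 in the fitted equation.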
Type I and Type III SS
Type I SS and F-test:
> anova(fit)
Analysis of Variance Table
Response: MS
            Df  Sum Sq  Mean Sq  F value  Pr(>F)
D            …       …        …        …  …e-10 ***
TM           …       …        …        …  …e-08 ***
D:TM         …       …        …        …  …e-06 ***
Residuals    …       …        …
Type III SS and F-test:
> drop1(fit,~.,test="F")
Single term deletions
Model: MS ~ D * TM
        Df  Sum of Sq  RSS  AIC  F value  Pr(>F)
D        …          …    …    …        …  …e-07 ***
TM       …          …    …    …        …  …
D:TM     …          …    …    …        …  …e-06 ***
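The two tables can be reproduced side by side to see where they agree and where they differ. This is a sketch on the data table given earlier; the names md, typeI and typeIII are chosen here. anova() tests each term after the terms listed before it, while drop1() with scope ~ . tests each term by removing it from the full model.

```r
# Muscle data (TM = muscle type, D = fiber diameter, MS = muscle strength)
md <- data.frame(
  TM = factor(rep(c("A", "B", "C"), each = 5)),
  D  = rep(1:5, times = 3),
  MS = c(11.5, 13.8, 14.4, 16.8, 18.7,
         10.8, 12.3, 13.7, 14.2, 16.6,
         13.1, 16.2, 19.0, 22.9, 26.5)
)

fit <- lm(MS ~ D * TM, data = md)

typeI   <- anova(fit)                            # sequential tests
typeIII <- drop1(fit, scope = ~ ., test = "F")   # single term deletions
print(typeI)
print(typeIII)

# The last term entered (D:TM) gets the same F value in both tables
print(c(typeI["D:TM", "F value"], typeIII["D:TM", "F value"]))
```

The TM row differs sharply between the two tables: sequentially, TM is tested as a shift between common-slope lines, whereas in the deletion table it is tested with the D:TM columns still in the model, which amounts to asking whether the three intercepts differ.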
R functions
# Split the data by muscle type and sort each subset by D
nd1 <- subset(md, subset = (TM == "A"))
nd2 <- subset(md, subset = (TM == "B"))
nd3 <- subset(md, subset = (TM == "C"))
nd1 <- nd1[order(nd1$D), ]
nd2 <- nd2[order(nd2$D), ]
nd3 <- nd3[order(nd3$D), ]
# Fitted values (column 1) with lower and upper 95% confidence limits (columns 2 and 3)
y1 <- predict(fit, nd1, interval = "confidence")
y2 <- predict(fit, nd2, interval = "confidence")
y3 <- predict(fit, nd3, interval = "confidence")
par(mfrow = c(1, 3))
# Call plot before lines
plot(D[TM == "A"], MS[TM == "A"], xlab = "D", ylab = "MS",
     xlim = c(minX, maxX), ylim = c(minY, maxY), pch = 16)
points(D[TM == "B"], MS[TM == "B"], col = "red", pch = 16)
points(D[TM == "C"], MS[TM == "C"], col = "blue", pch = 16)
# Regression line and confidence limits for each muscle type
lines(nd1$D, y1[, 1], col = "black")
lines(nd1$D, y1[, 2], col = "black")
lines(nd1$D, y1[, 3], col = "black")
lines(nd2$D, y2[, 1], col = "red")
lines(nd2$D, y2[, 2], col = "red")
lines(nd2$D, y2[, 3], col = "red")
lines(nd3$D, y3[, 1], col = "blue")
lines(nd3$D, y3[, 2], col = "blue")
lines(nd3$D, y3[, 3], col = "blue")
[Figure: 95% CI plots]
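The second objective, comparing mean MS among the three TMs at a given level of D, can be addressed with predict() on a small grid of new data. This is a sketch under stated assumptions: the names md, newd and pred are chosen here, and D = 3 is an arbitrary illustrative level, not one singled out by the slides.

```r
# Muscle data (TM = muscle type, D = fiber diameter, MS = muscle strength)
md <- data.frame(
  TM = factor(rep(c("A", "B", "C"), each = 5)),
  D  = rep(1:5, times = 3),
  MS = c(11.5, 13.8, 14.4, 16.8, 18.7,
         10.8, 12.3, 13.7, 14.2, 16.6,
         13.1, 16.2, 19.0, 22.9, 26.5)
)

fit <- lm(MS ~ D * TM, data = md)

# One row per muscle type, all at the same (illustrative) diameter
newd <- data.frame(
  D  = 3,
  TM = factor(c("A", "B", "C"), levels = levels(md$TM))
)

# Predicted mean MS with 95% confidence limits
pred <- predict(fit, newd, interval = "confidence")
print(cbind(newd, round(pred, 2)))
```

Because the slopes differ, the gaps between the three predicted means change with D, which is why a comparison of mean MS among TMs is only meaningful once the level of D is specified.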