Unit 26: Model Assumptions & Case Analysis II: More complex designs


1 Unit 26: Model Assumptions & Case Analysis II: More complex designs

2 What is New since Case Analysis I in First Semester
This semester we have focused on how to handle analyses when observations are not completely independent. Four types of designs:
Pre-test/post-test designs, analyzed with ANCOVA
Traditional repeated measures designs with categorical within-subject factors, analyzed either with LM on transformed differences and averages or with LMER
More advanced repeated measures designs with quantitative within-subject factors, analyzed in LMER
Other designs with dependent data where observations are nested within groups (e.g., students within classrooms, siblings within families), analyzed with LMER
(The model call for each design is sketched below.)
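A minimal sketch of the four kinds of model calls, assuming hypothetical data frames (d, dW, dL, dNested) and variables like those used later in this unit; lm() is base R, lmer() is from lme4:

library(lme4)
# 1. Pre-test/post-test: ANCOVA in LM
mAncova = lm(Post ~ Pre + X, data=d)
# 2. Traditional repeated measures in LM: regress the within-subject difference score
mDiff = lm(STLDiff ~ BG, data=dW)
# 2./3. Repeated measures in LMER, with a random effect for the within-subject factor
mRM = lmer(STL ~ BG*Threat + (1 + Threat|SubID), data=dL)
# 4. Nested/grouped data in LMER, with a random intercept per group (hypothetical variables)
mNest = lmer(Score ~ X + (1|Classroom), data=dNested)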

3 Pre-test/Post-test Designs
How do we evaluate model assumptions and conduct case analysis in the pre-test/post-test design? These designs involve standard general linear models analyzed with LM, so model assumptions and case analysis are handled as we learned first semester.
Case analysis is conducted first (generally):
Regression outliers: visual inspection and test of studentized residuals
Influence: visual inspection and thresholds for Cooks D and DFBetas
Model assumptions focus on the residuals:
Normality: Q-Q (quantile comparison) plot
Constant variance (across Y-hat): statistical test and spread-level plot
Linearity (mean = 0 for all Y-hat): component plus residual plot
(This workflow is sketched in code below.)
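A compact sketch of this workflow using the lmSupport functions demonstrated on the following slides (m is the ANCOVA model fit on slide 5):

library(lmSupport)
# Case analysis first
modelCaseAnalysis(m, 'Residuals')  # regression outliers via studentized residuals
modelCaseAnalysis(m, 'Cooksd')     # overall influence
modelCaseAnalysis(m, 'DFBETAS')    # influence on specific parameter estimates
# Then model assumptions, evaluated on the residuals
modelAssumptions(m, 'Normal')      # Q-Q plot
modelAssumptions(m, 'Constant')    # spread-level plot and score test
modelAssumptions(m, 'Linear')      # component + residual plot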

4 Pre-test/Post-test Design
Evaluate a course designed to improve math achievement:
Measure math achievement at pre-test (before the course)
Randomly assign to course vs. control group
Measure math achievement at post-test
How do you analyze these data, why, and what is the critical support for the new course?

5 > m = lm(Post ~ Pre + X, data=d)
> modelSummary(m)
lm(formula = Post ~ Pre + X, data = d)
Observations: 102

Linear model fit by least squares

Coefficients:
            Estimate  SE  t  Pr(>|t|)
(Intercept)                        **
Pre                                **
X                                  **
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Sum of squared errors (SSE): , Error df: 99
R-squared:

FOCUS ON TEST OF PARAMETER ESTIMATE FOR X

6 Make Data for Pre-Post Design
# set.seed() provides the same results from rnorm() each time
set.seed(4321)
X = rep(c(-.5, .5), 50)
Pre = rnorm(length(X), 100, 10)
Post = *X + .5*Pre + rnorm(length(X), 0, 15)  # the intercept and X coefficient were lost in the transcript
d = data.frame(X=X, Pre=Pre, Post=Post)

# add two more points for demonstration purposes
d[101,] = c(.5, max(Pre), max(Post)+10)
d[102,] = c(-.5, min(Pre), max(Post)+10)

7 Regression Outliers: Studentized Residuals
> modelCaseAnalysis(m,'Residuals')

8 Regression Outliers: Studentized Residuals
> modelCaseAnalysis(m,'Residuals')

9 Overall Model Influence: Cooks D
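The slide shows only the plot; following the pattern of the other case-analysis slides, the call was presumably:

modelCaseAnalysis(m, 'Cooksd')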

10 Specific Influence: DFBetas
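Again only the plot is shown; presumably:

modelCaseAnalysis(m, 'DFBETAS')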

11 Specific Influence: Added Variable Plots
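No call is shown for this slide. One way to produce added-variable plots is with the car package:

car::avPlots(m)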

12 Updated Results without Outlier (102)
> d1 = dfRemoveCases(d, c(102))
> m1 = lm(Post ~ Pre + X, data=d1)
> modelSummary(m1)
lm(formula = Post ~ Pre + X, data = d1)
Observations: 101

Linear model fit by least squares

Coefficients:
            Estimate  SE  t  Pr(>|t|)
(Intercept)
Pre                               ***
X                                 ***
---
Sum of squared errors (SSE): , Error df: 98
R-squared:

13 Normality of Residuals
> modelAssumptions(m, 'Normal')

14 Constant Variance
> modelAssumptions(m, 'Constant')
Suggested power transformation:

Non-constant Variance Score Test
Variance formula: ~ fitted.values
Chisquare =    Df =    p =

15 Constant Variance

16 Linearity
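Only the component + residual plot is shown; presumably:

modelAssumptions(m, 'Linear')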

17 Next Example: Traditional Repeated Measures
Beverage Group (BG): Alcohol vs. No alcohol (between subjects)
Threat: Shock vs. Safe (within subjects)
DV is startle response
Prediction is a BG X Threat interaction, with the threat effect smaller in the Alcohol than in the No alcohol group (i.e., a stress response dampening effect of alcohol)

18 Make repeated measures data
set.seed(11112)
N = 50
Obs = 2
SubID = rep(1:N, each=Obs)
BG = rep(c(-.5, .5), each=N)
Threat = rep(c(-.5, .5), N)
STL = (100 + rep(rnorm(N,0,20), each=Obs)) +
      ((30 + rep(rnorm(N,0,3), each=Obs))*Threat) +
      (0*BG) + (-10*BG*Threat) +
      rnorm(N*Obs, 0, 5)
dL = data.frame(SubID=SubID, BG=BG, Threat=Threat, STL=STL)

# SubID 51: Big startler
dL[101,] = c(51, .5, .5, 500)
dL[102,] = c(51, .5, -.5, 475)
# SubID 52: Atypical Threat but really big BG
dL[103,] = c(52, .5, .5, 50)
dL[104,] = c(52, .5, -.5, 150)

19 The Data in Long Format
> head(dL, 30)
  SubID  BG  Threat  STL
  . . .

20 The Data in Long Format
> tail(dL, 30)
  SubID  BG  Threat  STL
  . . .

21 Cast to Wide Format
# dcast() is from the reshape2 package; varRename() is from lmSupport
dW = dcast(data=dL, formula = SubID + BG ~ Threat, value.var='STL')
dW = varRename(dW, c(-.5, .5), c('Safe', 'Shock'))
> head(dW, 20)
  SubID  BG  Safe  Shock
  . . .

22 How can you test for the BG X Threat interaction using LM?
Calculate the Shock – Safe difference
Regress the difference on BG
The test of the BG parameter estimate is the BG X Threat interaction

23 > dW$STLDiff = dW$Shock - dW$Safe
> mDiff = lm(STLDiff ~ BG, data=dW)
> modelSummary(mDiff)
lm(formula = STLDiff ~ BG, data = dW)
Observations: 52

Linear model fit by least squares

Coefficients:
            Estimate  SE  t  Pr(>|t|)
(Intercept)                       ***
BG                                **
---
Sum of squared errors (SSE): , Error df: 50
R-squared:

What does the intercept test? (With BG centered, it is the mean Shock – Safe difference, i.e., the Threat main effect.)

24 > dW$STLAvg = (dW$Shock + dW$Safe)/2
> mAvg = lm(STLAvg ~ BG, data=dW)
> modelSummary(mAvg)
lm(formula = STLAvg ~ BG, data = dW)
Observations: 52

Linear model fit by least squares

Coefficients:
            Estimate  SE  t  Pr(>|t|)
(Intercept)                  <2e-16 ***
BG
---
Sum of squared errors (SSE): , Error df: 50
R-squared:

25 Your experiment focused on the BG X Threat interaction
You want to report the test of that parameter estimate with confidence. What do you do?
Evaluate model assumptions and do case analysis on the mDiff model. This is just a simple linear regression in LM. You know exactly what to do.

26 Regression Outliers: Studentized Residuals
modelCaseAnalysis(mDiff,'Residuals')

27 Model Influence: Cooks d
modelCaseAnalysis(mDiff, 'Cooksd')

28 Specific Influence: DFBetas
modelCaseAnalysis(mDiff, 'DFBETAS')

29 Added Variable Plots
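Only the plots are shown; as above, one way to produce added-variable plots is with the car package:

car::avPlots(mDiff)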

30 > dW1 = dfRemoveCases(dW, c(52))
> dim(dW1)
[1] 51  6
> mDiff1 = lm(STLDiff ~ BG, data=dW1)
> modelSummary(mDiff1)
lm(formula = STLDiff ~ BG, data = dW1)
Observations: 51

Linear model fit by least squares

Coefficients:
            Estimate  SE  t  Pr(>|t|)
(Intercept)                  < 2e-16 ***
BG                                   ***
---
Sum of squared errors (SSE): , Error df: 49
R-squared:

31 Normality
modelAssumptions(mDiff1, 'Normal')

32 Constant Variance
modelAssumptions(mDiff1, 'Constant')
Non-constant Variance Score Test
Chisquare =    Df =    p =

33 Component + Residual Plot
modelAssumptions(mDiff1, 'Linear')

34 What about the mAvg model?
Should you evaluate model assumptions and conduct case analysis on that model? That model has no effect at all on the parameter estimate that tests the BG X Threat interaction. If you are interested in reporting and interpreting the effects from that model (main effect of BG on startle, mean startle response), then you should of course evaluate the model. However, if not, then it is not necessary. You will find SubID 51 to be unusual in that model (a quick check is sketched below).
Classic ANOVA bound these models together in results. As such, if you evaluate both, people will probably want to see the same sample in both models.
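If you do evaluate mAvg, the same tools apply; e.g., following the calls used above:

modelCaseAnalysis(mAvg, 'Residuals')
modelCaseAnalysis(mAvg, 'Cooksd')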

35 Classic Repeated Measures in LMER
m = lmer(STL ~ BG*Threat + (1 + Threat|SubID), data=dL,
         control = lmerControl(check.nobs.vs.nRE="ignore"))
# with only two observations per subject, lmer's check that observations
# outnumber random effects must be disabled via check.nobs.vs.nRE

36 Classic Repeated Measures in LMER
modelSummary(m)
Observations: 104; Groups: SubID, 52

Linear mixed model fit by REML

Fixed Effects:
            Estimate  SE  F  error df  Pr(>F)
(Intercept)                            < 2e-16
BG
Threat
BG:Threat
---
NOTE: F, error df, and p-values from Kenward-Roger approximation

Random Effects:
 Groups   Name        Std.Dev.  Corr
 SubID    (Intercept)
          Threat
 Residual

AIC: ; BIC: ; logLik: ; Deviance: 995.5

37 Case Analysis and Model Assumptions in LMER
Standards are not as well established for LMEM as for linear models. We will follow recommendations from Loy & Hofmann (2014).
Can examine residuals at both levels of the model (eij at level 1 and random effects at level 2)
Can use these residuals to check for normality, constant variance, and linearity
Level 2 residuals can also identify model outliers with respect to fixed effects
Can calculate influence statistics (Cooks D) …BUT it’s a bit complicated at this point. Stay tuned for integration in lmSupport.
(Package setup for these diagnostics is sketched below.)
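The diagnostics on the following slides come from the HLMdiag package, which implements the Loy & Hofmann recommendations; a minimal setup sketch (varPlot() is from lmSupport):

library(lme4)      # lmer() model fits
library(HLMdiag)   # HLMresid(), cooks.distance() for lmer fits, dotplot_diag(), ggplot_qqnorm()
library(lmSupport) # varPlot()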

38 Examine level 1 residuals using LS residuals (basically the residuals from fitting an OLS regression within each subject)
> resid1 = HLMresid(m, level = 1, type = "LS", standardize = TRUE)
> head(resid1)
  STL  BG  Threat  SubID  LS.resid  fitted  std.resid
  . . .
(std.resid is NaN here: with only two observations per subject, the within-subject OLS fit is saturated)

39 Examine level 2 residuals using the Empirical Bayes approach (default)
These are simply the random effects (ranef) for each subject from the model.
> resid2 = HLMresid(object = m, level = "SubID")
> head(resid2)
  (Intercept)  Threat
  . . .

40 Level 2 Outlier for Intercept
varPlot(resid2[,1], VarName = 'Intercept', IDs = rownames(resid2))

41 Level 2 Outlier for Threat
varPlot(resid2$Threat, VarName = 'Threat', IDs=rownames(resid2))

42 Influence: Cooks D
cooksd <- cooks.distance(m, group = "SubID")
varPlot(cooksd, IDs = rownames(resid2))

43 dotplot_diag(x = cooksd, cutoff = "internal", name = "cooks.distance") +
   ylab("Cook's distance") + xlab("SubID")

44 Quantifying Parameter Estimate Change
beta_51 = as.numeric(attr(cooksd, "beta_cdd")[[51]])
names(beta_51) <- names(fixef(m))
beta_52 = as.numeric(attr(cooksd, "beta_cdd")[[52]])
names(beta_52) <- names(fixef(m))

fixef(m)
(Intercept)  BG  Threat  BG:Threat

beta_51  # this subject increases the intercept and BG estimates
beta_52  # this subject decreases the Threat estimate and increases (makes more negative) the interaction

45 Normality for Random Intercept
ggplot_qqnorm(x = resid2[,1], line = "rlm")

46 Normality for Random Threat
ggplot_qqnorm(x = resid2$Threat, line = "rlm")

47 TRUE LMEM
Threat: Safe vs. Shock
Trials 1-20
DV is startle
Focus on the Trial X Threat interaction

48 set.seed(11111)
N = 50
Obs = 20
SubID = rep(1:N, each=Obs)
Trial = rep(1:Obs, N)
cTrial = Trial - mean(Trial)
Threat = rep(c(-.5, .5), Obs/2*N)
STL = (100 + rep(rnorm(N,0,20), each=Obs)) +
      ((30 + rep(rnorm(N,0,3), each=Obs))*Threat) +
      ((-2 + rep(rnorm(N,0,.3), each=Obs))*cTrial) +
      ((0 + rep(rnorm(N,0,.3), each=Obs))*cTrial*Threat) +
      rnorm(N*Obs, 0, 3)
# cTrial is included in the data frame here so the model can use it
dL = data.frame(SubID=SubID, Trial=Trial, cTrial=cTrial, Threat=Threat, STL=STL)

49 > head(dL, 30)
  SubID  Trial  Threat  STL
  . . .
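The refit of the lmer model for this design is not shown in the transcript; given the four random effects examined below (intercept, Threat, cTrial, and their interaction), it was presumably something like:

m = lmer(STL ~ Threat*cTrial + (1 + Threat + cTrial + Threat:cTrial|SubID), data=dL)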

50 Level 1 Residuals: Normality
resid1 = HLMresid(m, level = 1, type = "LS", standardize = TRUE)
ggplot_qqnorm(x = resid1$std.resid, line = "rlm")

51 Constant Variance
plot(resid1$fitted, resid1$std.resid)

52 Linear
plot(dL$Trial, resid1$std.resid)

53 Level 1 Outliers
varPlot(resid1$std.resid)

54 Level 2: Normal Random Intercept
resid2 <- HLMresid(object = m, level = "SubID")
ggplot_qqnorm(x = resid2[,1], line = "rlm")

55 Level 2: Normal Random Threat
ggplot_qqnorm(x = resid2$Threat, line = "rlm")

56 Level 2: Normal Random cTrial
ggplot_qqnorm(x = resid2$cTrial, line = "rlm")

57 Level 2: Normal Random Threat:cTrial
ggplot_qqnorm(x = resid2[,4], line = "rlm")

58 Level 2 Outliers: cTrial:Threat
varPlot(resid2[,4], VarName = 'cTrial:Threat', IDs = rownames(resid2))  # the transcript repeats the Q-Q call from the previous slide here; varPlot() matches the outlier slides earlier in the unit

59 Influence: Cooks D
cooksd <- cooks.distance(m, group = "SubID")
varPlot(cooksd, IDs = rownames(resid2))

60 Cooks D
dotplot_diag(x = cooksd, cutoff = "internal", name = "cooks.distance") +
   ylab("Cook's distance") + xlab("SubID")

