Lecture 5 Linear Mixed Effects Models

Name: Lecture 5 Linear Mixed Effects Models
Uploaded: 2017-07-13T19:06:13+00:00
Duration: PTM35S47
Channel: Veronica Curtis
Description: Lecture 5 Linear Mixed Effects Models

Advanced Research Skills. Lecture 5: Linear Mixed Effects Models. Olivier MISSA

Outline
Explore options available when the assumptions of classical linear models are untenable. In this lecture: what can we do when observations (and thus residuals) are not strictly independent?

Classical Linear Models
Defined by three assumptions:
(1) the response variable is continuous;
(2) the residuals (ε) are normally distributed; and
(3) the residuals are independently and identically distributed.
Today, we will consider a range of options available when we either know or suspect that our data are not strictly independent of each other. (Departures from the other assumptions will be dealt with later.)

Non-independent Residuals
In previous lectures, we merely checked the independence of our residuals by inspecting the plot of residuals vs. fitted values. Example from lecture 2: a non-linear trend, which suggested that our linear model was probably misspecified.

Non-independent Observations
Data collection can often lead to non-independence among your observations. A few examples:
- Repeated (longitudinal) observations on the same "individuals" (on different days, weeks, months, years).
- Collecting data from a few locations (spatial structure): surveys conducted in schools/streets, fields/sites/islands.
- Collecting data on related individuals (father & sons, twins, species within the same genus/tribe/family).
If we treat all these observations as fully independent, we are likely to overestimate the number of degrees of freedom, which may lead us to wrongly reject a null hypothesis (type I error).
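The risk of a type I error described above can be illustrated with a small simulation. This is only a sketch under assumed settings that are not from the lecture: 10 hypothetical clusters of 10 observations, a treatment applied to whole clusters, no true treatment effect, and equal between-cluster and within-cluster standard deviations. It counts how often a plain linear model that ignores the clustering rejects the null hypothesis at the 5% level.

```r
set.seed(1)

n_sim      <- 500   # number of simulated data sets (arbitrary choice)
n_clusters <- 10    # hypothetical clusters, e.g. schools or sites
n_per      <- 10    # observations per cluster

cluster <- rep(seq_len(n_clusters), each = n_per)
# Treatment is assigned to whole clusters: first 5 clusters "a", last 5 "b"
treatment <- factor(rep(c("a", "b"), each = n_clusters / 2 * n_per))

reject <- logical(n_sim)
for (i in seq_len(n_sim)) {
  # Shared cluster effect makes observations within a cluster non-independent
  cluster_effect <- rnorm(n_clusters, sd = 1)
  y <- cluster_effect[cluster] + rnorm(length(cluster), sd = 1)
  # Naive model: treats all 100 observations as independent
  p_value <- summary(lm(y ~ treatment))$coefficients["treatmentb", "Pr(>|t|)"]
  reject[i] <- p_value < 0.05
}

# Proportion of false positives; the nominal rate would be 0.05
type1_rate <- mean(reject)
print(type1_rate)
```

Because there is no treatment effect in the simulated data, every rejection is a false positive, so a rate well above 0.05 reflects the overestimated degrees of freedom.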

Non-independent Observations
Two ways to cope with non-independent observations:
When the design is balanced ("equal sample size"): we can use factors to partition our observations into different "groups" and analyse them as an ANOVA or ANCOVA. We already know how to do that when factors are "crossed"; we just need to figure out how to cope with nested factors.
When the design is unbalanced ("uneven sample size"): mixed effects models are then called for.

Nested Anova
Example: a designed field experiment on crop yield with three treatments:
- irrigation (control, irrigated)
- sowing density (low, medium, high)
- fertilizer (N, P, NP)
Split-plot design: each block has 18 different subplots.
[Figure: layout of one block, split into control and irrigated halves, each divided by sowing density (high, medium, low) and then by fertilizer (N, P, NP).]

Nested Anova
Example: a designed field experiment on crop yield with three treatments
## stringsAsFactors=TRUE is needed in R >= 4.0 to read the text columns as factors
> yields <- read.table("splityield.txt", header=TRUE, stringsAsFactors=TRUE)
> attach(yields)
> names(yields)
[1] "yield" "block" "irrigation" "density" "fertilizer"
> str(yields)
'data.frame': 72 obs. of 5 variables:
 $ yield     : int
 $ block     : Factor w/ 4 levels "A","B","C","D":
 $ irrigation: Factor w/ 2 levels "control","irrigated":
 $ density   : Factor w/ 3 levels "high","low","medium":
 $ fertilizer: Factor w/ 3 levels "N","NP","P":

Nested Anova
Example: a designed field experiment on crop yield with three treatments
> model0 <- aov(yield ~ irrigation*density*fertilizer) ## non-nested version (incorrect !!)
> summary(model0)
                              Df Sum Sq Mean Sq F value Pr(>F)
irrigation                                              e-10 ***
density                                                 **
fertilizer                                              **
irrigation:density                                      ***
irrigation:fertilizer                                   *
density:fertilizer
irrigation:density:fertilizer
Residuals
Sum

Nested Anova
## Correct nested version, nesting from large to small
> model1 <- aov(yield ~ irrigation*density*fertilizer + Error(block/irrigation/density) )
> summary(model1)
Error: block
                              Df Sum Sq Mean Sq F value Pr(>F)
Residuals
Error: block:irrigation
                              Df Sum Sq Mean Sq F value Pr(>F)
irrigation                                              *
Residuals
Error: block:irrigation:density
                              Df Sum Sq Mean Sq F value Pr(>F)
density
irrigation:density                                      *
Residuals
Error: Within
                              Df Sum Sq Mean Sq F value Pr(>F)
fertilizer                                              ***
irrigation:fertilizer                                   **
density:fertilizer
irrigation:density:fertilizer
Residuals
Res Sum
Gd Sum

Nested Anova
Comparison between nested and non-nested results
                        Non-nested         Nested
                   Df   F value  Pr(>F)    F value  Pr(>F)
irrigation
density
fertilizer
irrig:dens
irrig:ferti
dens:ferti
irrig:dens:ferti
[Figure: split-plot layout of one block (control and irrigated halves; high, medium and low density; N, P and NP fertilizer).]

Recognizing Nestedness is key !
Being able to distinguish crossed factors (independent from each other) from nested factors is essential.
Nestedness occurs most often from spatial structure:
- Student surveys in different classes from different schools.
- Samples from individual branches on sets of trees within a number of forest patches.
But it can also occur from temporal structure:
- Samples taken from the same individuals every fortnight for 2 months in two successive years.
A few things on Error formulae:
- Independent ("crossed") factors are better inserted within the main model (which can then be simplified).
- Nested factors need to be specified in the Error formula (and must not be simplified).
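The rule on Error formulae can be tried out without the lecture's data file. The sketch below simulates a hypothetical balanced split-plot layout with the same factor names (4 blocks x 2 irrigation levels x 3 densities x 3 fertilizers = 72 subplots); the effect sizes and standard deviations are invented purely for illustration. Crossed treatment factors go in the main model and the nesting goes in Error(), from large to small.

```r
set.seed(42)

# Hypothetical balanced split-plot layout (values below are simulated, not real data)
d <- expand.grid(fertilizer = c("N", "P", "NP"),
                 density    = c("low", "medium", "high"),
                 irrigation = c("control", "irrigated"),
                 block      = c("A", "B", "C", "D"))

# Identify the nested units: half-blocks and density strips within half-blocks
half  <- interaction(d$block, d$irrigation)
strip <- interaction(d$block, d$irrigation, d$density)

# One random shift per block, per half-block and per strip, plus subplot noise
d$yield <- 100 +
  10 * (d$irrigation == "irrigated") +
  rnorm(nlevels(d$block), sd = 5)[as.integer(d$block)] +
  rnorm(nlevels(half),    sd = 3)[as.integer(half)] +
  rnorm(nlevels(strip),   sd = 2)[as.integer(strip)] +
  rnorm(nrow(d), sd = 1)

# Crossed factors in the main model, nested structure in the Error term
fit <- aov(yield ~ irrigation * density * fertilizer +
             Error(block / irrigation / density), data = d)

# Each error stratum gets its own ANOVA table
strata <- names(fit)
print(strata)
print(summary(fit))
```

Each treatment is tested against the residual variation of the stratum at which it was applied, which is why irrigation ends up with far fewer residual degrees of freedom than in a non-nested analysis.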

When the design is not balanced
We need a different modelling framework: Mixed Effects Models. So called because they mix together fixed effects and random effects. Until now, we have only used fixed effects in our models, each effect having an estimated parameter (intercept, slope, mean, ...). But in certain circumstances, these parameters may not be very informative and one would be better off trying to "estimate" the underlying distribution they come from. An example will help clarify the difference between these 2 approaches.

Mixed Effects Modelling
Example: railway rails tested for longitudinal stress. Six rails were chosen at random and each tested three times with ultrasound.
> library(nlme) ## package dedicated to mixed effects models
> data(Rail)
> names(Rail)
[1] "Rail" "travel"
> stripchart(Rail$travel ~ Rail$Rail, pch=16, ylab="Ultrasonic Travel Time (nanosecs)", xlab="Rail number", vertical=TRUE, col=rainbow(6) )
> abline(h=mean(Rail$travel), col="Gray85", lty=2, lwd=2)
Classically, in a linear model, we would be able to tell whether the rails differ significantly from each other. But that doesn't help us make predictions about other rails.

Random effects: we are interested in explaining the variance of the response.
Fixed effects: we are interested in explaining the response itself.
Examples of fixed effects: Male & Female; Control & Treatment; Wet vs. Dry; Light vs. Shade.
Examples of random effects: Blocks; Individuals with repeated measures; Genotypes; Sites.

Back to our example:
## Makes Rail a simple factor (not an ordered one)
> Rail2 <- data.frame(travel=Rail$travel, Rail=factor(as.character(Rail$Rail)) )
> Rail.lm <- lm(travel ~ Rail, data=Rail2) ## LINEAR MODEL
> summary(Rail.lm)
Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)                              e-11 ***
Rail                                     e-05 ***
Rail                                     e-07 ***
Rail                                     e-08 ***
Rail
Rail                                     e-06 ***
---
Residual standard error: on 12 degrees of freedom
Multiple R-squared: , Adjusted R-squared:
F-statistic: on 5 and 12 DF, p-value: 1.033e-09

> anova(Rail.lm)
Analysis of Variance Table
Response: travel
          Df Sum Sq Mean Sq F value Pr(>F)
Rail                                e-09 ***
Residuals
## now as a MIXED EFFECT MODEL
> Rail.lme <- lme(travel ~ 1, data=Rail, random= ~1|Rail)
> summary(Rail.lme)
Linear mixed-effects model fit by REML
Data: Rail
AIC BIC logLik
Random effects:
Formula: ~1 | Rail
        (Intercept) Residual
StdDev:
Fixed effects: travel ~ 1
            Value Std.Error DF t-value p-value
(Intercept)
In this output, the (Intercept) StdDev is the standard deviation associated with rails, the Residual StdDev is the standard deviation of the residuals, and the fixed-effect (Intercept) is the grand average.
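The three quantities read off the summary (standard deviation associated with rails, standard deviation of residuals, grand average) can also be extracted programmatically. The following is a minimal sketch using the same Rail data; the variable names and the intraclass correlation at the end are additions for illustration and do not appear in the lecture.

```r
library(nlme)

# Same random-intercept model as in the lecture (REML is the default method)
rail_fit <- lme(travel ~ 1, data = Rail, random = ~ 1 | Rail)

# VarCorr() returns a character matrix, hence the as.numeric() conversion
vc <- VarCorr(rail_fit)
sd_between <- as.numeric(vc["(Intercept)", "StdDev"])  # SD associated with rails
sd_within  <- rail_fit$sigma                           # SD of the residuals
grand_mean <- fixef(rail_fit)[["(Intercept)"]]         # grand average

# Share of the total variance attributable to differences between rails
icc <- sd_between^2 / (sd_between^2 + sd_within^2)

print(c(sd_between = sd_between, sd_within = sd_within,
        grand_mean = grand_mean, icc = icc))
```

Unlike the five rail-specific coefficients of the fixed-effects fit, the between-rail standard deviation describes the population of rails, which is what makes statements about untested rails possible.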

Is the Random effect significant ?
> Rail.lme$call ## random effect model
lme.formula(fixed = travel ~ 1, data = Rail, random = ~1 | Rail)
> AIC(Rail.lme) ## exact AIC version
[1]
> Rail.lm0 <- lm(travel ~ 1, data=Rail2) ## NULL linear model
> AIC(Rail.lm0)
[1]
> Rail.lm$call ## model with Rail as a fixed effect factor
lm(formula = travel ~ Rail, data = Rail2)
> AIC(Rail.lm)
[1]
Comparing models which only differ in their random effects is easy (with AIC). Comparing models which differ in their fixed effects is a little harder: it can only be done using "maximum likelihood" (not the default method in lme).
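A variant of the comparison above, sketched with a swapped-in technique: instead of lm(), the model without a random effect is fitted with gls() from nlme, so that both models are estimated by REML with identical fixed effects and their AIC values and log-likelihoods are on the same footing. The object names are illustrative.

```r
library(nlme)

# Model without any random effect, fitted by REML (gls default)
rail_gls <- gls(travel ~ 1, data = Rail)

# Model with a random intercept per rail, fitted by REML (lme default)
rail_lme <- lme(travel ~ 1, data = Rail, random = ~ 1 | Rail)

# Same fixed effects, different random structure: AIC and a likelihood ratio test
aic_gls <- AIC(rail_gls)
aic_lme <- AIC(rail_lme)
cmp <- anova(rail_gls, rail_lme)

print(c(gls = aic_gls, lme = aic_lme))
print(cmp)
```

A lower AIC for the lme fit supports keeping the rail random effect.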

Applied to the split-plot study of crop yield
## stringsAsFactors=TRUE is needed in R >= 4.0 to read the text columns as factors
> yields <- read.table("splityield.txt", header=TRUE, stringsAsFactors=TRUE)
> yield.lme <- lme(yield ~ irrigation*density*fertilizer, data=yields, random= ~1|block/irrigation/density)
> summary(yield.lme)
Linear mixed-effects model fit by REML
Data: yields
AIC BIC logLik
Random effects:
Formula: ~1 | block
        (Intercept)
StdDev:
Formula: ~1 | irrigation %in% block
StdDev:
Formula: ~1 | density %in% irrigation %in% block
        (Intercept) Residual
StdDev:

Fixed effects: yield ~ irrigation * density * fertilizer
                          Value Std.Error DF t-value p-value
(Intercept)
irrigirrig
dnslow
dnsmed
fertiNP
fertiP
irrigirrig:dnslow
irrigirrig:dnsmed
irrigirrig:fertiNP
irrigirrig:fertiP
dnslow:fertiNP
dnsmed:fertiNP
dnslow:fertiP
dnsmed:fertiP
irrigirrig:dnslow:fertiNP
irrigirrig:dnsmed:fertiNP
irrigirrig:dnslow:fertiP
irrigirrig:dnsmed:fertiP

> anova(yield.lme)
                              numDF denDF F-value p-value
(Intercept)                                       <.0001
irrigation
density
fertilizer
irrigation:density
irrigation:fertilizer
density:fertilizer
irrigation:density:fertilizer
## We should probably remove the three-way interaction.
## But if we are fiddling with the fixed effects, we ought
## to fit the model through Maximum Likelihood and base our
## decisions on its AIC values and Likelihood Ratio Tests
> yield.lme.ml <- update(yield.lme, ~. , method="ML")
> AIC(yield.lme.ml)
[1]

> yield.lme.ml2 <- update(yield.lme.ml, ~. - irrigation:density:fertilizer)
> yield.lme.ml2$method
[1] "ML" ## just checking that update() kept using "ML"
> AIC(yield.lme.ml2)
[1] ## an improvement
> anova(yield.lme.ml2)
                      numDF denDF F-value p-value
(Intercept)                               <.0001
irrigation
density
fertilizer
irrigation:density
irrigation:fertilizer
density:fertilizer
> yield.lme.ml3 <- update(yield.lme.ml2, ~. - density:fertilizer)
> AIC(yield.lme.ml3)
[1]

> anova(yield.lme.ml, yield.lme.ml2)
              Model df AIC BIC logLik Test L.Ratio p-value
yield.lme.ml
yield.lme.ml2                         vs
> anova(yield.lme.ml2, yield.lme.ml3)
              Model df AIC BIC logLik Test L.Ratio p-value
yield.lme.ml2
yield.lme.ml3                         vs
> anova(yield.lme.ml3)
                      numDF denDF F-value p-value
(Intercept)                               <.0001
irrigation
density
fertilizer
irrigation:density
irrigation:fertilizer
> yield.lme.ml4 <- update(yield.lme.ml3, ~. - irrigation:density)
> AIC(yield.lme.ml4)
[1]
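The same simplification workflow can be run end to end on data that ships with nlme. The sketch below uses the Orthodont data set as a stand-in for the crop-yield data (an assumption made only so the example is self-contained): fit by maximum likelihood, drop one fixed-effect term with update(), then compare the two fits by AIC and a likelihood ratio test.

```r
library(nlme)

# Full model fitted by ML, because the fixed effects are about to be compared
full <- lme(distance ~ age * Sex, data = Orthodont,
            random = ~ 1 | Subject, method = "ML")

# Drop the interaction; update() re-uses the original call, including method = "ML"
reduced <- update(full, . ~ . - age:Sex)

# Check the estimation method and how many fixed-effect coefficients were removed
method_kept   <- reduced$method
n_coef_change <- length(fixef(full)) - length(fixef(reduced))

print(method_kept)
print(c(full = AIC(full), reduced = AIC(reduced)))
print(anova(full, reduced))   # likelihood ratio test between the nested models
```

Once the fixed-effect structure is settled, a common practice is to refit the chosen model by REML before reporting the variance components.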

> anova(yield.lme.ml3, yield.lme.ml4)
              Model df AIC BIC logLik Test L.Ratio p-value
yield.lme.ml3
yield.lme.ml4                         vs
## ml4 is one simplification too far
> anova(yield.lme.ml4)
                      numDF denDF F-value p-value
(Intercept)                               <.0001
irrigation
density
fertilizer
irrigation:fertilizer
## here comes Model Checking
## "fixed" selects one column of the residuals matrix
> shapiro.test(yield.lme.ml3$residuals[,"fixed"]) # Best Model
Shapiro-Wilk normality test
data: yield.lme.ml3$residuals[, "fixed"]
W = , p-value =

## Residuals including all random effects (column "fixed": observed minus fixed-effects prediction)
> res <- yield.lme.ml3$residuals[,"fixed"]
> st.res <- res/sd(res)
> qqnorm(st.res, pch=16, main="")
> qqline(st.res, col="red", lwd=2)
## Residuals excluding all random effects (column 4: innermost grouping level)
> res <- yield.lme.ml3$residuals[,4]
> st.res <- res/sd(res)
> qqnorm(st.res, pch=16, main="")
> qqline(st.res, col="red", lwd=2)
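The columns of the residuals matrix used above correspond to grouping levels, and the same values can be requested through resid() with its level argument. A minimal sketch on the Rail model (chosen here only because the data ship with nlme, so there is one grouping level rather than three):

```r
library(nlme)

rail_fit <- lme(travel ~ 1, data = Rail, random = ~ 1 | Rail)

# Level 0: observed minus the fixed-effects prediction (random effects still in the residuals)
res_pop <- resid(rail_fit, level = 0)
# Level 1: observed minus the prediction that includes each rail's random effect
res_inner <- resid(rail_fit, level = 1)

# These match the columns of the residuals matrix stored in the fitted object
same_pop   <- isTRUE(all.equal(as.numeric(res_pop),
                               as.numeric(rail_fit$residuals[, "fixed"])))
same_inner <- isTRUE(all.equal(as.numeric(res_inner),
                               as.numeric(rail_fit$residuals[, "Rail"])))

print(colnames(rail_fit$residuals))
print(c(sd_level0 = sd(res_pop), sd_level1 = sd(res_inner)))
```

The level 0 residuals are much more spread out because they still contain the rail-to-rail differences; the innermost-level residuals are the ones expected to behave like the within-group error.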

> plot(yield.lme.ml3) ## by default Residuals vs Fitted values
> plot(yield.lme.ml3, yield ~ fitted(.) ) ## Observed vs Fitted values

> qqnorm(yield.lme.ml3, ~resid(.) | block) ## qqplot but broken down by block
