LME4
Andrea Friese, Silvia Artuso, Danai Laina

What we are going to do...
- introduction of the package
- linear mixed models: introduction to an example
- fixed and random effects
- crossed and nested random effects
- random slope and random intercept models
- comparison and evaluation of the models
- statistics

About lme4
lme4 provides functions to fit and analyse linear mixed models, generalized linear mixed models, and nonlinear mixed models. We focus on linear mixed models. What are they? → I'll get there.

mainly used in fields such as biology, physics, and the social sciences
developed by Bates, Mächler, Bolker, and Walker in the early 2010s, as the redesigned successor of the earlier nlme package
lme4 serves as a baseline for multilevel modelling in R, and various other packages build on it to extend its feature set

Now: what is a linear mixed model? Picture a basic linear model with one dependent variable and one independent variable (= fixed effect). Then add further fixed effects, and finally add random effects. A random effect is a factor we want to control for because it can affect the result; it acts as a "grouping factor".

In lme4: the command to analyse fixed effects on the dependent variable is the familiar lm():
lm <- lm(dependent ~ fixed1 + ..., data = lmm.data)
summary(lm)
To see if/how individual fixed effects respond in other combinations, replace the + operator with * (main effects plus their interaction) or / (nesting: fixed1 + fixed1:fixed2):
... fixed1 * fixed2 + fixed3 ...
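A minimal sketch with the lmm.data example dataset used in the exercises below (the variables extro, open, agree, and social appear later in these slides; this assumes the data are already loaded):

# plain linear model: extroversion predicted by three fixed effects
fit.lm <- lm(extro ~ open + agree + social, data = lmm.data)
summary(fit.lm)
# crossing two fixed effects: expands to open + agree + open:agree
fit.int <- lm(extro ~ open * agree + social, data = lmm.data)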

to fit a random effect into the function, use lmer() instead of lm():
lmer <- lmer(dependent ~ fixed1 + ... + (1 | random), data = lmm.data)
summary(lmer)
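For instance, a sketch adding school as a random intercept to the model above (again assuming lmm.data is loaded):

library(lme4)
fit.lmer <- lmer(extro ~ open + agree + social + (1 | school), data = lmm.data)
summary(fit.lmer)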

Exercises
Exercise 1: Load the data that's provided, then use head(lmm.data) and str(lmm.data) to take a look at your data
Exercise 2: Analyse the fixed effects and the dependent variable from our example
Exercise 3: Add the random variable 'school'
To simplify the code, check the dataset for the abbreviated names of the fixed effects

Types of random effects
We used (1|random) to fit our random effect. The term on the right side of the | operator is the so-called "grouping factor" (→ "clusters"). Random effects (factors) can be crossed or nested - it depends on the relationship between the variables.

Crossed random effects
"Crossed" means that a given factor appears in more than one level of the upper-level factor. We can have observations of the same subject across all levels of the other random variable (fully crossed) or across at least some of them (partially crossed). We then fit the identity of the subject and the other random factor as (partially) crossed random effects. For example, pupils within classes are measured over several years. To specify this: (1|class) + (1|pupil)
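A hypothetical sketch of this pupils-measured-over-years example (the data frame pupils.df and the variables score and year are illustrative names, not from the slides):

library(lme4)
# pupil and class are crossed: the same pupil is observed in several years,
# and each class contains observations from many pupils
fit.crossed <- lmer(score ~ year + (1 | class) + (1 | pupil), data = pupils.df)
summary(fit.crossed)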

Be careful, from Wikipedia: "Multilevel models (also known as hierarchical linear models, linear mixed-effects models, mixed models, nested data models, random coefficient, random-effects models, random parameter models, or split-plot designs)". But while all "hierarchical linear models" are mixed models, not all mixed models are hierarchical. This is because you can have crossed (or partially crossed) random factors that do not represent levels in a hierarchy.

Nested random effects
A factor is nested when a lower-level factor appears only within a particular level of an upper-level factor. For example, pupils within classes at a fixed point in time, or classes within schools (implicit nesting). There are two equivalent ways to specify this explicitly in lme4: (1|class) + (1|class:pupil) or (1|class/pupil). Alternatively, we can create a new nested factor: dataset <- within(dataset, sample <- factor(class:pupil))

But the nesting can also be explicit in the dataset, when the variables are coded differently and uniquely (e.g. each pupil ID occurs in only one class). In this case, using (1|class/pupil) or (1|class) + (1|pupil) in the model will produce the same fit.
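A sketch with the lmm.data dataset from the exercises, where class is treated as nested within school; the two formulas below are equivalent specifications of the same model:

library(lme4)
fit.n1 <- lmer(extro ~ open + agree + social + (1 | school/class), data = lmm.data)
fit.n2 <- lmer(extro ~ open + agree + social + (1 | school) + (1 | class:school), data = lmm.data)
# both fits estimate the same variance components: school, and class within school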

The same model specification can be used to represent both (partially) crossed and nested factors, so the model specification alone can't tell you what's going on with the random factors; you have to look at the structure of the factors in the data. Example: head(dataset) or str(dataset). To check for explicit nesting, we can use the function isNested: with(dataset, isNested(pupil, class)) # FALSE or TRUE
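For example, a one-line sketch against the exercise dataset (this anticipates Exercise 4 below):

library(lme4)
with(lmm.data, isNested(class, school))  # TRUE only if every class label occurs in a single school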

And what if we need to make explicit the nesting of a random effect in a fixed effect? In this case: fixed + (1|fixed:random), because the interaction between a fixed and a random effect is always random.
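A hypothetical sketch of this pattern (treatment, subject, response, and dat are illustrative names, not from the slides):

library(lme4)
# subject (random) nested within treatment (fixed)
fit.fr <- lmer(response ~ treatment + (1 | treatment:subject), data = dat)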

Exercise
Exercise 4: In the dataset lmm.data, check whether class is nested in school
Exercise 5: Add the random variable class as an effect nested in school and look at the results; use summary() and plot()
Exercise 6: Look at the variance explained by the school factor. What happens if you treat it as a fixed effect? Try it out

Random slopes vs. random intercepts models

Random effects:
 Groups       Name        Variance Std.Dev.
 class:school (Intercept)  8.2046  2.8644
 school       (Intercept) 93.8433  9.6873
 Residual                  0.9684  0.9841
Number of obs: 1200, groups: class:school, 24; school, 6

Our model is what is called a random intercept model: subjects and items are allowed to have different intercepts, but not different slopes (inspect them with coef()). But what if we think that a fixed effect is not the same for all subjects and items?
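To see this, a short sketch (assuming lmer holds the nested model fitted in Exercise 5): coef() returns one data frame per grouping factor, and in a random intercept model only the (Intercept) column varies between rows:

coef(lmer)$school          # six rows: intercepts differ, the slope columns are identical
coef(lmer)$`class:school`  # 24 rows, same pattern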

Random slopes model: (1 + fixed|random)
Example: randomslopes_model <- lmer(response ~ fixedA + fixedB + (1 + fixedA|random), data = data)
This allows the slope of the fixedA variable to vary across the levels of the random one. Here we modify our random effect term to include variables before the grouping term: (1 + open|school/class) tells R to fit a varying-slope and varying-intercept model for schools and for classes nested within schools, allowing the slope of the open variable to vary by school and by class within school.
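A sketch fitting this to the exercise data (this anticipates Exercise 7; note that richer random-slope structures can fail to converge on small datasets):

library(lme4)
fit.slopes <- lmer(extro ~ open + agree + social + (1 + open | school/class), data = lmm.data)
coef(fit.slopes)$school  # the open column now varies from school to school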

Exercise
Exercise 7: Try to fit a random slope model to our dataset, with the slope of openness varying by school. Look at the results

How can we compare and evaluate our mixed models?
# Check the arguments of the lmer command and find the REML argument
?lmer
# Run the following model
# Model_1
lmer <- lmer(extro ~ open + agree + social + (1|school/class), data = lmm.data, REML = T)
summary(lmer)
Check the REML argument: what does it stand for? REML stands for REstricted Maximum Likelihood. It is the default option when constructing a model in lme4, but it is not suitable for comparing models that differ in their fixed effects.
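As a side note (not from the original slides), an already fitted REML model can be re-fitted by plain maximum likelihood without retyping the formula:

lmer.ml <- update(lmer, REML = FALSE)  # re-evaluates the original call with REML switched off
lmer.ml <- refitML(lmer)               # equivalent lme4 helper, used internally by anova()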

How can we compare and evaluate our mixed models?
# Run the following model
# Model_1
lmer <- lmer(extro ~ open + agree + social + (1|school/class), data = lmm.data, REML = F)
summary(lmer)
Do you notice any differences in the output compared to the previous ones?

Linear mixed model fit by maximum likelihood ['lmerMod']
Formula: extro ~ open + agree + social + (1 | school/class)
   Data: lmm.data
     AIC      BIC   logLik deviance df.resid
  3545.1   3580.7  -1765.5   3531.1     1193

How can we compare and evaluate our mixed models?
# Run the following model
# Model_2
lmer2 <- lmer(extro ~ social + (1|class) + (1|school), data = lmm.data, REML = F)
summary(lmer2)
What is the output this time?

Linear mixed model fit by maximum likelihood ['lmerMod']
Formula: extro ~ social + (1 | class) + (1 | school)
   Data: lmm.data
     AIC      BIC   logLik deviance df.resid
  4715.6   4741.0  -2352.8   4705.6     1195

Which model should we select?

How can we statistically test our models?
# Test the two models against each other
anova(lmer, lmer2)
Check the output. Which model would you choose?
# Recall the deviance and residual df recorded above for the two models
# lmer: deviance: 3531.1; df resid: 1193
# lmer2: deviance: 4705.6; df resid: 1195
# Perform a chi-squared test by hand
pchisq(4705.6 - 3531.1, 1195 - 1193, lower.tail = FALSE)
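Working the arithmetic through (anova() reports this same likelihood-ratio test automatically):

# deviance difference: 4705.6 - 3531.1 = 1174.5, on 1195 - 1193 = 2 df
pchisq(1174.5, 2, lower.tail = FALSE)  # essentially zero: the full model (lmer) is strongly preferred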

And what if we drop all our fixed factors?
# Fit the following models: a full model, and a reduced model in which we drop our fixed effects
Full.lmer <- lmer(extro ~ open + agree + social + (1|school/class), data = lmm.data, REML = F)
Reduced.lmer <- lmer(extro ~ 1 + (1|school/class), data = lmm.data, REML = F)
# Compare them
anova(Reduced.lmer, Full.lmer)
What would you conclude?

And after choosing the appropriate model?
REML gives less biased estimates of the variance components, so once the model structure has been chosen, we re-fit the model with this criterion:
# After choosing the appropriate model
# Model_1
lmer <- lmer(extro ~ open + agree + social + (1|school/class), data = lmm.data, REML = T)
summary(lmer)

Conclusions
To compare models with different fixed effects and evaluate how much information they explain → maximum likelihood measures (AIC, BIC, deviance)
The lower these measures, the better the model fits the data set (AIC and BIC also penalize model complexity)
Depending always on our questions, we construct and select the best-fitting models, and then optimize the chosen model with the REML criterion

THANK YOU FOR YOUR ATTENTION