SC968: Panel Data Methods for Sociologists Random coefficients models.

Presentation transcript:

SC968: Panel Data Methods for Sociologists Random coefficients models

Overview
Random coefficients models
Continuous data
Binary data
Growth curves

Random coefficients models
Also known as: multilevel models (MLwiN), hierarchical models (HLM), mixed models (Stata)

Random coefficients models for continuous outcomes

Random coefficients models
We started off with OLS models that pooled data across many waves of panel data
We separated the between and within variance with fixed effects and between effects models
Then we allowed intercepts to vary for each individual using random effects models
We can also allow the coefficients for the independent variables to vary for each individual
These models are called random coefficients or random slopes models
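As a rough sketch of this progression in Stata (y, x and pid are placeholder variable names; xtmixed is the mixed-model command used later in these slides):

* declare the panel structure
xtset pid
* pooled OLS across all waves
regress y x
* fixed effects (within) and between effects estimators
xtreg y x, fe
xtreg y x, be
* random intercepts for each individual
xtreg y x, re
* random intercept plus a random slope on x for each individual
xtmixed y x || pid: x, mle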

Example of a random coefficients model
Schools’ mean maths scores and student socioeconomic status (SES)
Students at level 1 nested within schools at level 2
Using a random coefficients model we can estimate:
the overall mean maths score
how SES relates to individual maths scores
within-school variability in maths scores
between-school variability in mean maths scores
between-school variability in the relationship between SES and individual maths scores
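A hedged sketch of how such a model might be fitted in Stata, assuming hypothetical variables mathscore, ses and schoolid:

* random intercept and random SES slope for each school,
* allowing the intercept and slope to covary
xtmixed mathscore ses || schoolid: ses, mle cov(unstr) variance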

Another example: children’s emotional problems
Suppose we have emotional problems measured each year for pupils in a junior school
We want to know whether a school policy implemented in 2004 reduces problems
Emotional problems for each year are at level 1 and pupils at level 2
Using a random coefficients model we can examine:
levels of emotional problems, averaged over years
within-pupil variability in emotional problems
between-pupil variability in emotional problems
whether the intervention reduced emotional problems
whether the intervention had different effects for different children
what pupil characteristics made the intervention more or less successful

Possible combinations of slopes and intercepts with panel data
Constant slopes, constant intercept: the OLS model

Possible combinations of slopes and intercepts with panel data
Constant slopes, varying intercepts: the random effects model

Possible combinations of slopes and intercepts with panel data
Varying slopes, constant intercept: unlikely to occur

Possible combinations of slopes and intercepts with panel data
Varying slopes, varying intercepts: the random coefficients model (a separate regression line for each individual)

Random coefficients model for continuous data
y_ij = (β_0 + β_1·x_ij) + (u_i + b_i·x_ij) + ε_ij
Fixed coefficients: β_0 + β_1·x_ij; random coefficients: u_i + b_i·x_ij; residual: ε_ij

Random coefficients model for continuous data
y_ij = β_0 + β_1·x_ij + u_i + b_i·x_ij + ε_ij
β_0 = fixed intercept; β_1 = fixed slope; u_i = random intercept; b_i = random slope; ε_ij = random error

Partitioning unexplained variance in a random coefficients model
The total variance at each level splits into variance explained by the predictors and remaining unexplained variance; the unexplained variance splits into variance due to the random intercept and a remainder; that remainder splits into variance due to the random slopes and a final remaining unexplained (residual) variance.

Steps in multi-level modelling (Hox, 1995)
1. Compute the variance for the baseline/null/unconditional model, which includes only the intercept.
2. Compute the variance for the model with the level-1 independent variables included and the variance components of the slopes constrained to zero (that is, a fixed coefficients model).
3. Use a chi-square difference test to see whether the fixed coefficients model fits significantly better than the baseline model. If it does, proceed to investigate random coefficients. At this stage, non-significant level-1 independent variables can be dropped from the model.

Steps in multi-level modelling (Hox, 1995)
4. Identify which level-1 regression coefficients have significant variance across level-2 groups. Compute -2LL for the model with the variance components constrained to zero only for those level-1 coefficients that do not have significant variance across level-2 groups.
5. Add level-2 independent variables, determining which improve model fit. Drop variables that do not improve model fit.
6. Add cross-level interactions between explanatory level-2 variables and the level-1 independent variables that had random coefficients (in step 3). Drop interactions that do not improve model fit.
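A sketch of these steps in Stata for a generic two-level data set (y, x1, z1 and id are placeholder names; ML estimation is used so that likelihood-ratio tests are comparable, and tests of variance components are conservative because the null lies on a boundary):

* Step 1: baseline/null model, intercept only
xtmixed y || id:, mle
estimates store m0
* Step 2: add a level-1 predictor with a fixed coefficient only
xtmixed y x1 || id:, mle
estimates store m1
* Step 3: chi-square difference test against the baseline
lrtest m0 m1
* Step 4: let the level-1 coefficient vary across level-2 groups
xtmixed y x1 || id: x1, mle cov(unstr)
estimates store m2
lrtest m1 m2
* Steps 5-6: add a level-2 predictor and a cross-level interaction
gen x1z1 = x1*z1
xtmixed y x1 z1 x1z1 || id: x1, mle cov(unstr)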

Worked example
Random 20% sample from BHPS waves
Ages 21 to 59
Outcome: GHQ Likert scores
Explanatory variable: household income last month (logged)

Random coefficients model example
y_ij = β_0 + (β_1 + b_i)·x_ij + u_i + ε_ij
where
y_ij = GHQ score for subject i at wave j, j = 1,…, J
x_ij = logged household income in the month before wave j
β_1 = mean slope
b_i = subject-specific random deviation from the mean slope
u_i = subject-specific random intercept
ε_ij = random error

Linear random coefficients model Stata output

. xtmixed hlghq1 lnfihhmn || pid: lnfihhmn, mle cov(unstr) variance
The random slope on lnfihhmn is specified after || pid:. The cov(unstr) option estimates the covariance between all random effects, the least restrictive model.
Output (Mixed-effects ML regression): group variable pid; 2508 groups; observations per group min 1, avg 7.4, max 15. The fixed part reports coefficients for lnfihhmn (the fixed effect) and _cons. The random-effects parameters are var(lnfihhmn), var(_cons), cov(lnfihhmn,_cons) and var(Residual), followed by an LR test vs. linear regression with chi2(3). (Numeric estimates not reproduced here.)

The same output, highlighting the fixed part: the coefficient on lnfihhmn is the fixed coefficient (slope) and _cons is the fixed intercept.

The same output, highlighting the random part: var(_cons) is the random intercept variance, var(lnfihhmn) is the random slope variance, and cov(lnfihhmn,_cons) is the covariation between the random intercept and the random slope.

Post estimation predictions Stata output

Post estimation predictions – random coefficients
. predict re_slope re_int, reffects
The listing that follows shows pid, re_int and re_slope for a few example individuals (values not reproduced here).

Predicted individual regression lines
. gen intercept = _b[_cons] + re_int
. gen slope = _b[lnfihhmn] + re_slope
The listing shows pid, intercept and slope for the same example individuals (values not reproduced here).
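These person-specific intercepts and slopes can then be used, for example, to plot the predicted individual regression lines; a sketch (yhat is a new variable created here):

* fitted GHQ for each person at their observed incomes
gen yhat = intercept + slope*lnfihhmn
* spaghetti plot: one fitted line per person
sort pid lnfihhmn
twoway line yhat lnfihhmn, connect(ascending)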

Partitioning unexplained variance in a random coefficients model (recall the earlier breakdown: the unexplained variance splits into variance due to the random intercept, variance due to the random slopes, and a remaining residual variance).

Calculating the variance partition coefficient
Random intercepts model:
VPC = between variance / total variance (i.e. between + within) = σ²_u / (σ²_u + σ²_ε)

Calculating the variance partition coefficient
Random slopes model: the between-subject variance now depends on x:
σ²_u + 2σ_ub·x + σ²_b·x²
so VPC(x) = (σ²_u + 2σ_ub·x + σ²_b·x²) / (σ²_u + 2σ_ub·x + σ²_b·x² + σ²_ε)
At the intercept, x = 0, so the VPC for the random slopes model reduces to the same expression as for the random intercepts model.

Variance partition coefficient for our example Tentative interpretation: least variability in GHQ for those on average incomes

Random coefficients models Categorical outcomes

Random coefficients model for binary data
logit{Pr(y_ij = 1)} = β_0 + Σ_k (β_k + b_ik)·x_kij + u_i
where
β_k is the mean coefficient or fixed effect of covariate k
b_ik is a subject-specific random deviation from the mean coefficient
u_i is a subject-specific random intercept with mean zero

Worked example
Random 20% sample, 15 waves of BHPS
Ages 21 to 59
Outcome: binary GHQ scores (psychological morbidity cases: hlghq2 > 2)
Explanatory variable: employment status (jbstat recoded to employed/unemployed/olf)

Logistic random coefficients example
logit{Pr(y_ij = 1)} = β_0 + (β_1 + b_i)·x_ij + u_i
where
y_ij = binary GHQ score for subject i at wave j, j = 1,…, J
x_ij = employment status at wave j (entered as dummies for unemployed and out of the labour force)
β_1 = mean slope
b_i = subject-specific random deviation from the mean slope
u_i = subject-specific random intercept

Logistic random coefficients model Stata output

. xtmelogit ghq unemp olf || pid: unemp olf, variance cov(unstr)

In the output there is no constant term when odds ratios are reported, and no random residual term: because this is a logit model, the level-1 residual variance is fixed rather than estimated.
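A hedged sketch of some follow-up commands (option names are as I recall them from xtmelogit's documentation; check help xtmelogit postestimation):

* replay the model with the fixed part shown as odds ratios
xtmelogit, or
* predicted probability of caseness for each observation
predict pr_case, mu
* predictions of the random effects
* (one new variable per random effect, in model order, as with xtmixed above)
predict re_unemp re_olf re_cons, reffects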

Random coefficients models for development over time

Growth curve models
Model change over time as a continuous trajectory
Suitable for research questions such as:
What is the trajectory for the population?
Are there distinct trajectories for each respondent?
If individuals have distinct trajectories, what variables predict these individual trajectories?

Linear growth curve model
Individual growth curves: y_it = (β_0 + u_i) + (β_1 + b_i)·t + ε_it
t = 0 at baseline and 1, 2, 3, …, T in successive waves
Mean population growth curve: E(y_t) = β_0 + β_1·t

Worked example
Random 20% sample from BHPS waves
All respondents over 16 years
Outcome: self-rated health (hlstat), a 5-point Likert scale with higher scores indicating poorer health
Linear growth function
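A sketch of the corresponding Stata command, assuming a wave variable from which time (0 at baseline) is derived:

* time = 0, 1, 2, ... across successive waves
gen time = wave - 1
* random intercept and random linear slope on time for each person
xtmixed hlstat time || pid: time, mle cov(unstr) variance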

In the output, the coefficient on time is the slope (change in health over time) and the constant is the intercept (mean health at baseline).

The random-effects variances capture individual differences in baseline health (the random intercept) and individual differences in health change over time (the random slope).

Adding time-invariant covariates
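For example, a time-invariant covariate such as gender enters only the fixed part of the model; a sketch, with female as a hypothetical 0/1 variable:

* female is constant within person, so it gets no random slope
xtmixed hlstat time female || pid: time, mle cov(unstr) variance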

Interacting gender with time
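A sketch of the interaction model, which lets the average health trajectory differ between men and women (female and femtime are hypothetical variables):

* interaction of gender with time
gen femtime = female*time
xtmixed hlstat time female femtime || pid: time, mle cov(unstr) variance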

Adding time-varying covariates
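A time-varying covariate, such as logged household income at each wave, is simply added to the fixed part (and could be given its own random slope if required); a sketch:

xtmixed hlstat time lnfihhmn || pid: time, mle cov(unstr) variance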

Beyond linear change
Polynomial trajectories: quadratic or cubic
Piecewise linear trajectories
Exponential trajectories
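For instance, a quadratic trajectory can be fitted by adding a squared time term to the fixed part; a sketch (keeping the random slope on linear time only):

gen timesq = time^2
xtmixed hlstat time timesq || pid: time, mle cov(unstr) variance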

Non-linear growth curves

Piecewise growth
Research questions:
Children’s reading before and after the summer holiday
Grip strength before and after a stroke
Change in well-being before and after retirement
Household income before and after the birth of the first child
What sort of growth function is expected?
Jointed piecewise growth with a single function
Disjointed piecewise growth with a single function
Disjointed piecewise growth with two functions
Jointed piecewise growth with two functions
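A sketch of one way to set up jointed piecewise linear growth with a single knot at time k (y is a placeholder outcome and the knot value, illustrated here as 5, is the researcher's choice):

* split time into two pieces that join at the knot
local k = 5
gen time1 = min(time, `k')
gen time2 = max(time - `k', 0)
* random slopes on both pieces allow individual trajectories
xtmixed y time1 time2 || pid: time1 time2, mle cov(unstr) variance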

Specifying time
Metrics of time:
Wave of assessment
Chronological age
Time before/after an event
Individually varying values of time

Age-period-cohort
A cohort is defined by its age in a particular period
It is impossible to separate age, period and cohort effects, but any two of the three factors can be varied independently (the third is then determined)
If wave is our metric of time, it is usual to control for age at baseline (i.e. cohort)
If age is our metric of time, we can control for period (i.e. wave) effects using dummy variables

Accelerated panel designs
Respondents of varying ages are sampled at the same time-point and then followed for several years
Period and age effects are not completely confounded, as they are in a birth cohort design
Assumption: after controlling for period, the growth curves for each cohort overlap and form a smooth curve

Checking the assumption
Stratify the sample into a number of cohorts
Estimate the growth curve model for each stratum
Plot the growth curves together and check that they overlap to form a smooth curve
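A sketch of this check in Stata (age0, age at first interview, and the cohort cut-points are assumptions):

* define cohorts by age at baseline
egen cohort = cut(age0), at(16 26 36 46 56 66 100)
* fit the growth model separately within each cohort
levelsof cohort, local(groups)
foreach g of local groups {
    xtmixed hlstat time if cohort == `g' || pid: time, mle
}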

Example of an accelerated panel design
Random 20% sample from BHPS waves
All respondents over 16 years
Outcome: self-rated health (hlstat)
Quadratic growth by age
Controlling for period effects (wave)
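A sketch of this model, using the xi: prefix to create wave dummies for the period effects (age and wave are assumed variable names):

* quadratic growth in age, random linear age slope, wave dummies for period
gen agesq = age^2
xi: xtmixed hlstat age agesq i.wave || pid: age, mle cov(unstr) variance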

When are random coefficients not necessary?
When the number of units at level 2 is small: use dummy variables with OLS and the cluster option
When correlation within subjects is small: OLS with the cluster option will give similar estimates
When you want to correct for correlation between observations but are not interested in variation between and within subjects: use a fixed effects or random effects model
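Sketches of these simpler alternatives (y, x, groupid and pid are placeholders):

* few level-2 units: dummy variables in OLS, with cluster-robust SEs
xi: regress y x i.groupid, vce(cluster groupid)
* correct standard errors for within-subject clustering only
regress y x, vce(cluster pid)
* or use the standard panel estimators
xtreg y x, fe
xtreg y x, re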

Finally…
Random coefficients models can take a very long time to estimate
You can speed things up by collapsing the data and using frequency weights
My personal recommendation is to use MLwiN: excellent online training material, and it is easier to build up the model step by step