‘Interpreting coefficients from longitudinal models’ Professor Vernon Gayle and Dr Paul Lambert (Stirling University) Wednesday 1st April 2009.



Structure of this Session
Briefly Mention
Change Score Models
Transitions (tables etc.)
Repeated Cross-Sectional Data
Duration Models
Panel Models

Change in Score (first difference model)
$Y_{i2} - Y_{i1} = \beta'(X_{i2} - X_{i1}) + (\varepsilon_{i2} - \varepsilon_{i1})$
Here $\beta'$ is simply a regression on the difference, or change, in scores
The panel fixed effects linear model is a special case of the change score model
This modelling approach is identified only by "switchers" (individuals whose X changes between waves)!
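A minimal sketch of the change score idea, using invented two-wave data (all values and function names here are hypothetical, for illustration only). It fits the first difference regression without an intercept, as in the equation on the slide, and compares it with the within (fixed effects) estimator. With two waves the two estimates coincide, and people whose X does not change contribute nothing to either.

```python
# Hypothetical two-wave panel: (person id, x wave 1, x wave 2, y wave 1, y wave 2)
panel = [
    (1, 0, 1, 10.0, 13.0),  # switcher: x goes 0 -> 1
    (2, 1, 0, 14.0, 12.0),  # switcher: x goes 1 -> 0
    (3, 0, 0, 9.0, 9.5),    # stayer: x never changes
    (4, 1, 1, 15.0, 15.5),  # stayer: x never changes
    (5, 0, 1, 11.0, 13.5),  # switcher: x goes 0 -> 1
]

def first_difference_beta(rows):
    """Regress (y2 - y1) on (x2 - x1) through the origin."""
    num = sum((x2 - x1) * (y2 - y1) for _, x1, x2, y1, y2 in rows)
    den = sum((x2 - x1) ** 2 for _, x1, x2, y1, y2 in rows)
    return num / den

def within_beta(rows):
    """Fixed effects (within) estimator: regress person-demeaned y on person-demeaned x."""
    num = den = 0.0
    for _, x1, x2, y1, y2 in rows:
        xbar, ybar = (x1 + x2) / 2, (y1 + y2) / 2
        for x, y in ((x1, y1), (x2, y2)):
            num += (x - xbar) * (y - ybar)
            den += (x - xbar) ** 2
    return num / den

switchers = [row for row in panel if row[1] != row[2]]

fd_all = first_difference_beta(panel)
fe_all = within_beta(panel)
fd_switchers = first_difference_beta(switchers)

print(fd_all, fe_all, fd_switchers)
```

Dropping the stayers leaves the estimate unchanged, which is what "identified by switchers" means in practice.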

Transitions
Historically, social mobility tables
Large literature on log-linear models
Essentially cross-sectional models are fitted
Care is required if $\beta$ is essentially a lagged effect (e.g. the association between mother and daughter)
– In some circumstances this may swamp other effects

Repeated Cross-Sectional Surveys
The UK has a wealth of repeated cross-sectional data
– Much of it is comparable
Often not considered longitudinal because there are no explicit repeated contacts
However, very useful for trend-over-time analyses
Cross-sectional models are employed
– Be careful with the interpretation of $\beta$ and of its interaction with time
– Time is often survey year, but can be cohort (e.g. YCS)
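One way to write down a pooled cross-sectional model with a time interaction is sketched below. The notation is an assumption for illustration (the slide gives no equation): $x_i$ is the explanatory variable and $\mathrm{time}_i$ is the survey year or cohort of respondent $i$, coded so that zero is the first survey.

```latex
\[
y_i = \beta_0 + \beta_1 x_i + \beta_2\,\mathrm{time}_i
      + \beta_3\,(x_i \times \mathrm{time}_i) + \varepsilon_i
\]
\[
\text{effect of } x \text{ at a given time} = \beta_1 + \beta_3\,\mathrm{time}
\]
```

Under this coding $\beta_1$ on its own is only the effect of $x$ in the first survey, which is why the main effect and the interaction need to be read together.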

Duration Models
Modelling time to an event taking place
Duration is the outcome

Simple approach: accelerated life model
$\log_e t_i = \beta_0 + \beta_1 x_{1i} + e_i$
This is a regression model
$\beta_1$ is the effect on the log duration
When there are no (or only a small number of) right-censored cases this approach is suitable, although referees may question it!
This model is a little old-fashioned, but results are often very similar to those from hazard models (although in practice the betas should be carefully compared with those from hazard models)
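A small sketch of the accelerated life regression, assuming invented, fully observed (uncensored) durations and a single binary x. The data and names are hypothetical. It regresses log duration on x by ordinary least squares and exponentiates the slope to give a ratio of typical durations.

```python
import math

# Hypothetical uncensored durations (months) and a binary explanatory variable
x = [0, 0, 0, 0, 1, 1, 1, 1]
t = [2.0, 3.0, 4.0, 6.0, 5.0, 8.0, 9.0, 14.0]

def ols(xs, ys):
    """Ordinary least squares with one regressor; returns (intercept, slope)."""
    n = len(xs)
    xbar, ybar = sum(xs) / n, sum(ys) / n
    sxy = sum((a - xbar) * (b - ybar) for a, b in zip(xs, ys))
    sxx = sum((a - xbar) ** 2 for a in xs)
    slope = sxy / sxx
    return ybar - slope * xbar, slope

log_t = [math.log(v) for v in t]
beta0, beta1 = ols(x, log_t)

# exp(beta1) is the multiplicative effect of x on (geometric mean) duration
time_ratio = math.exp(beta1)
print(beta0, beta1, time_ratio)
```

A positive slope means longer durations for the x = 1 group, so the sign runs the opposite way to a hazard model coefficient for the same data.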

Duration Models
Duration models
Survival models
Cox regression
Failure time analysis
Event history models
Hazard models
Cox, D.R. (1972) ‘Regression models and life tables’ JRSS,B, 34 pp
These are all names for the same thing, depending on your substantive discipline

Hazard Models
Model time to an event
They do not model duration – they model the ‘hazard’
Hazard: a measure of the probability that an event occurs at time t, conditional on it not having occurred before t
These models appropriately control for right-censored data
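The definition of the hazard can be illustrated with a simple discrete-time life table. This is a sketch on invented spell data, and it assumes that a case censored at time t is still counted as at risk at t. The point is that censored cases stay in the denominator until they drop out, rather than being discarded.

```python
# Hypothetical spells: (duration in months, event observed?)
# False means the spell was right-censored at that duration
spells = [
    (1, True), (2, True), (2, False), (3, True),
    (3, True), (4, False), (4, True), (5, False),
]

def discrete_hazard(spells):
    """Hazard at t = events at t / number still at risk at t."""
    hazards = {}
    for t in sorted({duration for duration, _ in spells}):
        at_risk = sum(1 for duration, _ in spells if duration >= t)
        events = sum(1 for duration, event in spells if duration == t and event)
        hazards[t] = events / at_risk
    return hazards

hazards = discrete_hazard(spells)
for t, h in hazards.items():
    print(t, round(h, 3))
```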

Hazard Models
Hazard models are similar to logit models
$\beta$ is estimated on the logit scale
$\beta$ estimates the increase/decrease in the speed at which individuals (in the group) leave the risk set
$\beta$ is about speed and not rate (as is commonly suggested)
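For reference, two common hazard model forms are sketched below in standard textbook notation (the notation is not taken from the slides). In the Cox proportional hazards model $\beta$ sits on the log-hazard scale; in a discrete-time hazard model fitted as a logit it sits literally on the logit scale, which is probably the sense in which the two are "similar".

```latex
\[
h(t \mid x) = h_0(t)\,\exp(\beta x),
\qquad
\frac{h(t \mid x = 1)}{h(t \mid x = 0)} = \exp(\beta)
\]
\[
\operatorname{logit}\, h(t \mid x) = \alpha_t + \beta x
\]
```

Here $h_0(t)$ is the baseline hazard and $\alpha_t$ is a separate intercept for each discrete time period.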

Alternative Types of Event History Analysis
Describing sequences / trajectories: characterise progression through states into clusters / sequences / frameworks
Growing recent social science interest in sequence analysis
– Often analyse cluster membership as a categorical factor
A problem: neutrality of data, e.g. cluster 1 = men in full-time employment

Panel Models

Orthodox Panel Data Structure (figure: axes labelled Individuals and Observations (t))

Panel Regression Approach
xt suite in Stata
$\beta$ can usually be interpreted relatively easily
Similarity to $\beta$ in the multilevel modelling framework

Standard Linear Model
Slopes and Intercepts: constant slopes, constant intercept
$\beta_0$ is a constant intercept
$\beta_1$ is a constant slope

Possible Slopes and Intercepts
Constant slopes, varying intercepts (the fixed effects model): $\beta_{0j}$ is not a constant intercept; $\beta_1$ is a constant slope
Varying slopes, varying intercepts (separate regression for each individual): $\beta_{0j}$ is not a constant intercept; $\beta_{1j}$ is not a constant slope
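The three specifications can be written side by side as follows. The indexing is an assumption made for illustration: $j$ indexes individuals (following the $\beta_{0j}$ subscripts on the slides) and $i$ indexes observation occasions within an individual.

```latex
\[
\text{Constant intercept, constant slope:}\quad
y_{ij} = \beta_0 + \beta_1 x_{ij} + e_{ij}
\]
\[
\text{Varying intercepts, constant slope:}\quad
y_{ij} = \beta_{0j} + \beta_1 x_{ij} + e_{ij}
\]
\[
\text{Varying intercepts, varying slopes:}\quad
y_{ij} = \beta_{0j} + \beta_{1j} x_{ij} + e_{ij}
\]
```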

Regression Approach
Fixed or Random effects estimators
Fierce debate
– F.E. $\beta$ will tend to be consistent
– R.E. standard errors will be efficient but $\beta$ may not be consistent
– R.E. assumes no correlation between observed X variables and unobserved characteristics

xt Regression Approach
Fixed or Random effects
– Economists tend towards F.E. (attractive property of consistent $\beta$)
– With continuous Y there is little problem: fit both F.E. and R.E. models and then run a Hausman test comparing $\beta_{f.e.}$ with $\beta_{r.e.}$ (don’t be surprised if it points towards the F.E. model) (Steve Pudney’s suggestion)
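A sketch of the Hausman comparison for a single coefficient, using made-up estimates (the numbers and the function name are hypothetical). It assumes the random effects estimator is efficient under the null, so the variance of the difference is the F.E. variance minus the R.E. variance. The full test uses the whole coefficient vector and covariance matrices; this one-coefficient version only shows the logic.

```python
import math

def hausman_single(b_fe, se_fe, b_re, se_re):
    """One-coefficient Hausman statistic and its chi-squared (1 df) p-value."""
    var_diff = se_fe ** 2 - se_re ** 2
    if var_diff <= 0:
        raise ValueError("F.E. variance must exceed R.E. variance for this form of the test")
    h = (b_fe - b_re) ** 2 / var_diff
    # For 1 df, P(chi2 > h) = P(|Z| > sqrt(h)) = erfc(sqrt(h / 2))
    p_value = math.erfc(math.sqrt(h / 2))
    return h, p_value

# Hypothetical estimates of the same coefficient from the two models
h_stat, p_value = hausman_single(b_fe=0.30, se_fe=0.10, b_re=0.50, se_re=0.06)
print(round(h_stat, 3), round(p_value, 4))
```

A small p-value is evidence that the two estimates differ systematically, which points towards the F.E. model.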

xt Regression Approach
Fixed or Random effects estimators
Preference for Random Effects (RE) models in some areas (e.g. education studies)
Frequent criticism: a key assumption in RE models is that random effects are uncorrelated with the observed variables in the model
In practice this assumption goes untested and could potentially result in biased estimates (see Halaby 2004 Ann. Rev. Sociology 30)

Which approaches in practice?
Some more general thoughts
– banana skins
– flies in the ointment

The Hausman test is very sensitive and will usually lead to a preference for the FE model.
Substantively the RE model may be better; the FE model is more appropriate in relation to growth or individual-level change.

Fixed or Random Effect Estimators?
In our view R.E. is most appropriate when there are substantively important fixed-in-time X variables (which are not correlated with unobserved effects)
F.E. can be especially misleading for variables that change little over time (e.g. trade union membership) because they are “identified by changers”
This may be compounded by measurement errors

A further thought about fixed effects models….

The Panel Model
(Diagram labels: Earnings (y); Time changing x vars; Education level (x) fixed in time; Unobserved ability)
The F.E. panel model estimator is theoretically attractive in this situation
F.E. is commonly used in economics, as the effect of education level is correlated with ability
Remember that this rests on the (potentially strong) assumption that ability is fixed in time

The Panel Model
(Diagram labels: Earnings (y); Time changing x vars; Education level (x) fixed in time; Unobserved ability; Correlation)
R.E. is commonly used in multilevel modelling, but the effect of education level may be correlated with ability
Remember that this rests on the (potentially strong) assumption that ability is fixed in time

The Panel Model
(Diagram labels: Explanatory variable; Unobserved Fixed Effects)
The standard theoretical position (two slides back) is questionable if there is two-way causality; the econometrician Stephen Pudney makes this point

Population Average Model (Marginal Models)
Is a model that accounts for clustering between individuals all we need? e.g. logit y x1, cluster(id)
Becoming more popular (Pickles – preference in USA in public health)
Do we need a ‘subject’-specific random/fixed effect? (Is ‘frailty’ or unobserved heterogeneity important?)
Time-constant X variables might be analytically important
Marginal Modelling (GEE approaches) may be all we need (e.g. estimating a policy or ‘social group’ difference)

Some further thoughts on comparing estimates between models……

Binary Outcome Panel Models: An example
Married women’s employment (SCELI Data)
y: is the woman working? yes=1; no=0
x: woman has a child aged under 1 year
I have contrived this illustration….

Probit
(Results table, values not preserved. Columns: $\beta$, s.e. Rows: Child under 1, Constant, Log likelihood, n, Pseudo R², Clusters)
Consistent $\beta$ and smaller standard errors (double the sample size), but Stata thinks that there are 202 individuals and not 101 people surveyed in two waves!

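Probit with robust standard errors
(Results table, values not preserved. Columns: Probit ($\beta$, s.e.) and Robust ($\beta$, s.e.). Rows: Child under 1, Constant, Log likelihood, n, Pseudo R², Clusters)
Consistent $\beta$, and the standard errors are now corrected: Stata knows that there are 101 individuals (i.e. repeated measures)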
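The contrast between treating repeated measures as independent cases and clustering on the person can be sketched with a much simpler estimator than a probit: the proportion working. The data below are invented (4 people observed in two waves), and the G/(G-1) small-sample correction is one simple choice assumed for illustration, not necessarily the adjustment Stata applies.

```python
import math

# Hypothetical records: (person id, y), y = 1 if working; 4 people x 2 waves
records = [(1, 1), (1, 1), (2, 0), (2, 0), (3, 1), (3, 0), (4, 1), (4, 1)]

def naive_se(records):
    """Standard error of the mean, treating all 8 rows as independent."""
    ys = [y for _, y in records]
    n = len(ys)
    mean = sum(ys) / n
    s2 = sum((y - mean) ** 2 for y in ys) / (n - 1)
    return math.sqrt(s2 / n)

def cluster_robust_se(records):
    """Sandwich standard error of the mean, clustering on person id."""
    n = len(records)
    mean = sum(y for _, y in records) / n
    resid_sums = {}
    for pid, y in records:
        resid_sums[pid] = resid_sums.get(pid, 0.0) + (y - mean)
    g = len(resid_sums)  # number of clusters (people)
    variance = sum(r ** 2 for r in resid_sums.values()) / n ** 2
    return math.sqrt(variance * g / (g - 1))

se_naive = naive_se(records)
se_cluster = cluster_robust_se(records)
print(round(se_naive, 4), round(se_cluster, 4))
```

The point estimate is identical in both cases; only the standard error changes, and it grows once the software is told that rows from the same person are not independent.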

R.E. Probit
(Results table, values not preserved. Columns: Probit ($\beta$, s.e.), Robust ($\beta$, s.e.) and R.E. Probit ($\beta$, s.e.). Rows: Child under 1, Constant, Log likelihood, n, Pseudo R², Clusters)
Beware: $\beta$ and standard errors are no longer measured on the same scale
Stata knows that there are 101 individuals (i.e. repeated measures)

$\beta$ in Binary Panel Models
The $\beta$ in a probit random effects model is scaled differently
– Mark Stewart suggests comparing $\beta_{r.e.} \times \sqrt{1-\rho}$ with $\beta$ from the pooled probit
$\rho$ (analogous to an ICC) is the proportion of the total variance contributed by the person-level variance
Panel logit models also have this issue!
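The rescaling is a one-line calculation. The sketch below uses invented values; the function names are hypothetical. It relies on the usual random effects probit convention that the individual-level error has variance 1, so that rho = sigma_u^2 / (sigma_u^2 + 1).

```python
import math

def rho_from_sigma_u(sigma_u):
    """Share of total variance due to the person-level random effect (probit R.E.)."""
    return sigma_u ** 2 / (sigma_u ** 2 + 1.0)

def rescale_re_beta(beta_re, rho):
    """Put an R.E. probit coefficient on the pooled probit scale."""
    return beta_re * math.sqrt(1.0 - rho)

# Hypothetical R.E. probit output
sigma_u = 1.0
beta_re = -1.2

rho = rho_from_sigma_u(sigma_u)
beta_comparable = rescale_re_beta(beta_re, rho)
print(rho, round(beta_comparable, 4))
```

The rescaled coefficient is always closer to zero than the raw R.E. coefficient, so comparing raw coefficients across the two models overstates the difference.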

$\beta$ in Binary Panel Models
Conceptually there are two types of $\beta$ in a binary random effects model
X is time changing: $\beta$ is the ‘effect’ for a woman of changing her value of X
X is fixed in time: $\beta$ is analogous to the effect for two women (e.g. Chinese / Indian) with the same value of the random effect (e.g. $u_i = 0$)
– For fixed-in-time X, Fiona Steele suggests simulating to get a more appropriate value of $\beta$
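One possible reading of the simulation idea is sketched below; it is not necessarily the exact procedure Steele proposes. Assuming a random effects probit with hypothetical parameter values, it draws many random effects, averages the predicted probability over them for each group, and so gives a group difference that does not condition on two women sharing the same $u_i$. A known closed form for the probit case is included as a check.

```python
import math
import random
from statistics import NormalDist

PHI = NormalDist().cdf  # standard normal CDF

def simulated_probability(beta0, beta1, x, sigma_u, draws=50_000, seed=1):
    """Average P(y = 1) over simulated random effects u ~ N(0, sigma_u^2)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(draws):
        u = rng.gauss(0.0, sigma_u)
        total += PHI(beta0 + beta1 * x + u)
    return total / draws

def closed_form_probability(beta0, beta1, x, sigma_u):
    """Probit-only shortcut: E[Phi(a + u)] = Phi(a / sqrt(1 + sigma_u^2))."""
    return PHI((beta0 + beta1 * x) / math.sqrt(1.0 + sigma_u ** 2))

# Hypothetical R.E. probit estimates; x is a fixed-in-time group indicator
beta0, beta1, sigma_u = 0.4, -0.9, 1.0

p_sim_0 = simulated_probability(beta0, beta1, 0, sigma_u)
p_sim_1 = simulated_probability(beta0, beta1, 1, sigma_u)
p_exact_0 = closed_form_probability(beta0, beta1, 0, sigma_u)
p_exact_1 = closed_form_probability(beta0, beta1, 1, sigma_u)

print(round(p_sim_1 - p_sim_0, 3), round(p_exact_1 - p_exact_0, 3))
```

The closed form makes the link with the rescaling on the previous slide explicit, since $1/\sqrt{1+\sigma_u^2} = \sqrt{1-\rho}$; for logit models no such shortcut exists and simulation is the practical route.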

Population Average Model / Marginal Models
Motivation for thinking about these approaches:
– Not really been adopted in British sociology
Population average models / Marginal Modelling / GEE approaches are developing rapidly
They might be useful for estimating policy or ‘social group’ differences
Population average models are becoming more popular (Pickles – preference in USA in public health)
Is a model that accounts for clustering between individual observations adequate?
Simple pop. average model: regress y x1, cluster(id)

Conclusion
Clustering is sometimes part of the substantive story
– e.g. the orthodox hierarchical (or multilevel) situation, pupils nested in schools
Explicitly modelling hierarchical structure may be desirable
– Ironically, in some instances even with ‘highly’ clustered data we would tell a similar story whichever model we used (strength of coefficients, signs & significance)

Conclusion
Population average models / Marginal Modelling / GEE might be useful for estimating policy or ‘social group’ differences
– Is the ‘average’ effect for a group substantively more interesting, or more important for informing policy or practice?

Conclusion
Some estimators (xtprobit) don’t have F.E. equivalents (xtlogit F.E. is not equivalent to R.E.)
Here population average approaches might be attractive, since a key assumption in RE models is that random effects are uncorrelated with the observed variables in the model, and this can’t be formally tested