1
‘Interpreting coefficients from longitudinal models’ Professor Vernon Gayle and Dr Paul Lambert (Stirling University) Wednesday 1st April 2009
2
Structure of this Session: Change Score Models (briefly mentioned); Transitions (tables etc.); Repeated Cross-Sectional Data; Duration Models; Panel Models
3
Change in Score (first difference model): Y_i2 − Y_i1 = β(X_i2 − X_i1) + (ε_i2 − ε_i1). Here β is simply a regression coefficient on the difference, or change, in scores. The panel fixed effects linear model is a special case of the change score model. This modelling approach is identified by switchers only!
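As an illustration, a minimal Stata sketch of this first difference regression, assuming hypothetical wide-format variables y1, y2, x1 and x2 holding the two waves for each individual:

gen d_y = y2 - y1        // change in the outcome between waves
gen d_x = x2 - x1        // change in the explanatory variable
regress d_y d_x          // the coefficient on d_x is the change score β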
4
Transitions. Historically, social mobility tables; a large literature on log-linear models; essentially cross-sectional models are fitted. Care is required if b is essentially a lagged effect (association between mother & daughter) – in some circumstances this may swamp other effects.
5
Repeated Cross-Sectional Surveys. The UK has a wealth of repeated cross-sectional data – much of it is comparable. Often not considered longitudinal because there are no explicit repeated contacts; however, very useful for trend-over-time analyses. Cross-sectional models are employed – be careful of the interpretation of β and the interpretation of time. Time is often survey year, but can be cohort (e.g. YCS).
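For example, a sketch of a pooled cross-sectional model in Stata, assuming hypothetical variables y, x1 and a survey-year variable year; i.year enters time as a set of survey-year dummies (a cohort variable could be used instead, e.g. for the YCS):

regress y x1 i.year      // pooled repeated cross-sections with survey-year dummies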
6
Duration Models Modelling time to an event taking place Duration is the outcome
7
Simple approach: the accelerated life model, log_e(t_i) = βx_1i + e_i. This is a regression model; β is the effect on the log duration. When there are no (or a small number of) right censored cases this approach is suitable – it may be questioned by referees however! This model is a little old fashioned, but often results are very similar to hazard models (although in practice the βs should be carefully compared to hazard models).
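A sketch of this in Stata, under assumed variable names (duration = time to the event, event = 1 if observed and 0 if right censored, x1 an explanatory variable); the streg line is an accelerated failure-time alternative that does handle right censoring:

gen log_t = ln(duration)
regress log_t x1                          // β is the effect on the log duration
stset duration, failure(event)
streg x1, distribution(weibull) time      // AFT metric, allows for right censoring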
8
Duration Models. Duration models, survival models, Cox regression, failure time analysis, event history models, hazard models – these are all the same thing, depending on your substantive discipline. Cox, D. R. (1972) 'Regression models and life tables', Journal of the Royal Statistical Society, Series B, 34, pp. 187-220.
9
Hazard Models. They model time to an event. They do not model duration – they model the 'hazard'. Hazard: a measure of the probability that an event occurs at time t conditional on it not having occurred before t. These models appropriately control for right-censored data.
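A minimal Cox model sketch in Stata, again with assumed variable names (duration, event, x1):

stset duration, failure(event)    // declare the survival-time structure
stcox x1                          // proportional hazards model, reports hazard ratios
stcox x1, nohr                    // report coefficients (log hazard ratios) instead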
10
Hazard Models. Hazard models are similar to logit models: β is estimated on the logit scale. β estimates the increase/decrease in the speed at which individuals (in the group) leave the risk set. β is about speed and not rate (as is commonly suggested).
11
Alternative Types of Event History Analysis. Describing sequences / trajectories: characterise progression through states into clusters / sequences / frameworks. Growing recent social science interest in sequence analysis – often analyse cluster membership as a categorical factor. A problem – neutrality of the data, e.g. cluster 1 = men in full-time employment.
12
Panel Models
13
Orthodox Panel Data Structure [diagram: individuals (rows) observed at repeated occasions t = 1, …, 5 (columns)]
14
Panel Regression Approach. The xt suite in Stata: β can usually be interpreted relatively easily. Similarity to β in the multilevel modelling framework.
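A minimal sketch of the xt commands, assuming a long-format panel with hypothetical identifiers id and wave and variables y and x1:

xtset id wave            // declare the panel structure
xtreg y x1, fe           // fixed effects (within) estimator
xtreg y x1, re           // random effects estimator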
15
Standard Linear Model – Slopes and Intercepts. Constant slopes; constant intercept. β_0 is a constant intercept, β_1 is a constant slope.
16
Possible Slopes and Intercepts. Constant slopes, varying intercepts – the fixed effects model: β_0j is not a constant intercept, β_1 is a constant slope. Varying slopes, varying intercepts – a separate regression for each individual: β_0j is not a constant intercept, β_1j is not a constant slope.
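In equation form (one way of writing these cases, with individuals indexed j and occasions i): y_ij = β_0 + β_1 x_ij + e_ij (constant intercept and slope); y_ij = β_0j + β_1 x_ij + e_ij (varying intercepts, constant slope – the fixed effects case); y_ij = β_0j + β_1j x_ij + e_ij (varying intercepts and varying slopes – in the limit, a separate regression for each individual).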
17
Regression Approach. Fixed or Random effects estimators – a fierce debate. F.E. will tend to be consistent. R.E. standard errors will be efficient, but β may not be consistent. R.E. assumes no correlation between observed X variables and unobserved characteristics.
18
xt Regression Approach. Fixed or Random effects – economists tend towards F.E. (attractive property of a consistent β). With a continuous Y there is little problem: fit both F.E. and R.E. models and then Hausman test f.e. / r.e. (don't be surprised if it points towards the F.E. model) (Steve Pudney's suggestion).
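A sketch of that Hausman comparison in Stata (same assumed variables as above):

xtreg y x1, fe
estimates store fe
xtreg y x1, re
estimates store re
hausman fe re            // consistent (F.E.) estimator first, efficient (R.E.) second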
19
xt Regression Approach. Fixed or Random effects estimators. Preference for Random Effects (RE) models in some areas (e.g. education studies). Frequent criticism – a key assumption in RE models is that the random effects are uncorrelated with the observed variables in the model. In practice this assumption goes untested and could potentially result in biased estimates (see Halaby 2004, Annual Review of Sociology, 30).
20
Which approaches in practice? Some more general thoughts – banana skins, flies in the ointment.
24
The Hausman test is very sensitive and will usually lead to a preference for the FE model. Substantively the RE model may be better, while the FE model is more appropriate in relation to growth or individual-level change.
25
Fixed or Random Effect Estimators? In our view R.E. is most appropriate when there are substantively important fixed-in-time X variables (which are not correlated with unobserved effects). F.E. can be especially misleading for variables that change little in time (e.g. trade union members) because they are "identified by changers". This may be compounded by measurement errors.
26
A further thought about fixed effects models….
27
The Panel Model [diagram: earnings (y); time-changing x vars; education level (x), fixed in time; unobserved ability]. The F.E. panel model estimator is theoretically attractive in this situation. F.E. is commonly used in economics, as the effect of education level is correlated with ability. Remember that this rests on the (potentially strong) assumption that ability is fixed in time.
28
The Panel Model [diagram: earnings (y); time-changing x vars; education level (x), fixed in time; unobserved ability, correlated with education]. R.E. is commonly used in multilevel modelling, but the effect of education level may be correlated with ability. Remember that this rests on the (potentially strong) assumption that ability is fixed in time.
29
The Panel Model [diagram: explanatory variable; unobserved fixed effects]. The standard theoretical position (two slides back) is questionable if there is two-way causality – econometrician Stephen Pudney makes this point.
30
Population Average Models (Marginal Models). Is a model that accounts for clustering between individuals all we need? logit y x1, cluster(id). Becoming more popular (Pickles – preference in the USA in public health). Do we need a 'subject'-specific random/fixed effect? (is 'frailty' or unobserved heterogeneity important?) Time-constant X variables might be analytically important. Marginal modelling (GEE approaches) may be all we need (e.g. estimating a policy or 'social group' difference).
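A sketch of the two marginal-style options in Stata, with the assumed variables y, x1, id and wave: a pooled logit with cluster-robust standard errors, and a GEE population-averaged logit:

logit y x1, cluster(id)                                        // pooled logit, clustered s.e.
xtset id wave
xtgee y x1, family(binomial) link(logit) corr(exchangeable)    // population-averaged (GEE) logit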
31
Some further thoughts on comparing estimates between models……
32
Binary Outcome Panel Models: An Example. Married women's employment (SCELI data). y: is the woman working (yes = 1; no = 0). x: the woman has a child aged under 1 year. I have contrived this illustration….
33
                  Probit    s.e.    Probit    s.e.
Child under 1     -1.95     0.56    -1.95     0.40
Constant           0.67     0.14     0.67     0.10
Log likelihood    -54.70           -109.39
n                  101              202
Pseudo R2          0.13
Clusters            -                 -
Consistent β, smaller standard errors (double the sample size), but Stata thinks that there are 202 individuals and not 101 people surveyed in two waves!
34
                  Probit    s.e.    Probit    s.e.    Robust    s.e.
Child under 1     -1.95     0.56    -1.95     0.40    -1.95     0.56
Constant           0.67     0.14     0.67     0.10     0.67     0.14
Log likelihood    -54.70           -109.39
n                  101              202
Pseudo R2          0.13
Clusters            -                 -               101
Consistent β – the standard errors are now corrected; Stata knows that there are 101 individuals (i.e. repeated measures).
35
                  Probit    s.e.    Probit    s.e.    Robust    s.e.    R.E. Probit    s.e.
Child under 1     -1.95     0.56    -1.95     0.40    -1.95     0.56      -19.41       1.22
Constant           0.67     0.14     0.67     0.10     0.67     0.14        6.39       0.28
Log likelihood    -54.70           -109.39                                -49.57
n                  101              202
Pseudo R2          0.13
Clusters            -                 -               101
Beware: β and the standard errors are no longer measured on the same scale. Stata knows that there are 101 individuals (i.e. repeated measures).
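One plausible reconstruction of the commands behind these columns (hypothetical variable names: work = woman working, child_u1 = child aged under 1, id and wave as before):

probit work child_u1 if wave == 1     // single wave, n = 101 (assumed)
probit work child_u1                  // both waves pooled, n = 202
probit work child_u1, cluster(id)     // pooled, cluster-robust standard errors
xtset id wave
xtprobit work child_u1, re            // random effects probit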
36
β in Binary Panel Models. The β in a probit random effects model is scaled differently – Mark Stewart suggests comparing β_r.e. × √(1 − ρ) with the pooled probit β. ρ (rho) is analogous to an icc – the proportion of the total variance contributed by the person-level variance. Panel logit models also have this issue!
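As a sketch of this rescaling in Stata, assuming the random effects probit above has just been fitted (xtprobit reports rho and stores it in e(rho)):

xtprobit work child_u1, re
display _b[child_u1] * sqrt(1 - e(rho))   // put the R.E. β roughly on the pooled probit scale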
37
β in Binary Panel Models. Conceptually there are two types of β in a binary random effects model. When X is time-changing, β is the 'effect' for a woman of changing her value of X. When X is fixed in time, β is analogous to the effect for two women (e.g. Chinese / Indian) with the same value of the random effect (e.g. u_i = 0). For fixed-in-time X, Fiona Steele suggests simulating to get a more appropriate value of β.
38
Population Average Models / Marginal Models. Motivation for thinking about these approaches: they have not really been adopted in British sociology. Population average models / marginal modelling / GEE approaches are developing rapidly. They might be useful for estimating a policy or 'social group' difference. Population average models are becoming more popular (Pickles – preference in the USA in public health). Is a model that accounts for clustering between individual observations adequate? Simple pop. average model: regress y x1, cluster(id).
39
Conclusion. Clustering is sometimes part of the substantive story – e.g. the orthodox hierarchical (or multi-level) situation, pupils nested in schools. Explicitly modelling the hierarchical structure may be desirable. Ironically, in some instances even with 'highly' clustered data we would tell a similar story whichever model we used (strength of coefficients, signs & significance).
40
Conclusion. Population average models / marginal modelling / GEE might be useful for estimating a policy or 'social group' difference. Is the 'average' effect for a group substantively more interesting, or more important for informing policy or practice?
41
Conclusion. Some estimators (e.g. xtprobit) don't have F.E. equivalents (and xtlogit F.E. is not equivalent to xtlogit R.E.). Here population average approaches might be attractive, since a key assumption in RE models is that the random effects are uncorrelated with the observed variables in the model, and this can't be formally tested.