Longitudinal Data Analysis for Social Science Researchers Thinking About Event Histories www.longitudinal.stir.ac.uk.

Slides:



Advertisements
Similar presentations
What is Event History Analysis?
Advertisements

Multilevel Event History Modelling of Birth Intervals
What is Event History Analysis?
Survival Analysis. Key variable = time until some event time from treatment to death time for a fracture to heal time from surgery to relapse.
Longitudinal Data Analysis for Social Science Researchers Introduction to Panel Models
Introduction to Survival Analysis October 19, 2004 Brian F. Gage, MD, MSc with thanks to Bing Ho, MD, MPH Division of General Medical Sciences.
SC968: Panel Data Methods for Sociologists Random coefficients models.
1 BINARY CHOICE MODELS: LOGIT ANALYSIS The linear probability model may make the nonsense predictions that an event will occur with probability greater.
Multilevel Models 4 Sociology 8811, Class 26 Copyright © 2007 by Evan Schofer Do not copy or distribute without permission.
SC968: Panel Data Methods for Sociologists
Models with Discrete Dependent Variables
April 25 Exam April 27 (bring calculator with exp) Cox-Regression
Logistic Regression Multivariate Analysis. What is a log and an exponent? Log is the power to which a base of 10 must be raised to produce a given number.
In previous lecture, we highlighted 3 shortcomings of the LPM. The most serious one is the unboundedness problem, i.e., the LPM may make the nonsense predictions.
Duration models Bill Evans 1. timet0t0 t2t2 t 0 initial period t 2 followup period a b c d e f h g i Flow sample.
Biostatistics in Research Practice Time to event data Martin Bland Professor of Health Statistics University of York
BIOST 536 Lecture 3 1 Lecture 3 – Overview of study designs Prospective/retrospective  Prospective cohort study: Subjects followed; data collection in.
Chapter 11 Survival Analysis Part 2. 2 Survival Analysis and Regression Combine lots of information Combine lots of information Look at several variables.
In previous lecture, we dealt with the unboundedness problem of LPM using the logit model. In this lecture, we will consider another alternative, i.e.
Event History Models Sociology 229: Advanced Regression Class 5
Missing Data.. What do we mean by missing data? Missing observations which were intended to be collected but: –Never collected –Lost accidently –Wrongly.
Event History Models 2 Sociology 229A: Event History Analysis Class 4 Copyright © 2008 by Evan Schofer Do not copy or distribute without permission.
BINARY CHOICE MODELS: LOGIT ANALYSIS
Unit 5c: Adding Predictors to the Discrete Time Hazard Model © Andrew Ho, Harvard Graduate School of EducationUnit 5c– Slide 1
Analysis of Complex Survey Data
Single and Multiple Spell Discrete Time Hazards Models with Parametric and Non-Parametric Corrections for Unobserved Heterogeneity David K. Guilkey.
1 BINARY CHOICE MODELS: PROBIT ANALYSIS In the case of probit analysis, the sigmoid function is the cumulative standardized normal distribution.
Essentials of survival analysis How to practice evidence based oncology European School of Oncology July 2004 Antwerp, Belgium Dr. Iztok Hozo Professor.
Survival Data John Kornak March 29, 2011
G Lecture 121 Analysis of Time to Event Survival Analysis Language Example of time to high anxiety Discrete survival analysis through logistic regression.
Dr Laura Bonnett Department of Biostatistics. UNDERSTANDING SURVIVAL ANALYSIS.
Biostatistics Case Studies 2005 Peter D. Christenson Biostatistician Session 4: Taking Risks and Playing the Odds: OR vs.
Logit model, logistic regression, and log-linear model A comparison.
Sep 2005:LDA - ONS1 Event history data structures and data management Paul Lambert Stirling University Prepared for “Longitudinal Data Analysis for Social.
Assessing Survival: Cox Proportional Hazards Model
EHA: More On Plots and Interpreting Hazards Sociology 229A: Event History Analysis Class 9 Copyright © 2008 by Evan Schofer Do not copy or distribute without.
Design and Analysis of Clinical Study 11. Analysis of Cohort Study Dr. Tuan V. Nguyen Garvan Institute of Medical Research Sydney, Australia.
Longitudinal Data Analysis Professor Vernon Gayle
“Further Modeling Issues in Event History Analysis by Robert E. Wright University of Strathclyde, CEPR-London, IZA-Bonn and Scotecon.
Modelling Longitudinal Data Survival Analysis. Event History. Recurrent Events. A Final Point – and link to Multilevel Models (perhaps).
Introduction to Survival Analysis Utah State University January 28, 2008 Bill Welbourn.
Assessing Binary Outcomes: Logistic Regression Peter T. Donnan Professor of Epidemiology and Biostatistics Statistics for Health Research.
MBP1010 – Lecture 8: March 1, Odds Ratio/Relative Risk Logistic Regression Survival Analysis Reading: papers on OR and survival analysis (Resources)
© Department of Statistics 2012 STATS 330 Lecture 20: Slide 1 Stats 330: Lecture 20.
Longitudinal Data Analysis for Social Science Researchers Re-introducing Analysis Methods
Lecture 12: Cox Proportional Hazards Model
‘Interpreting coefficients from longitudinal models’ Professor Vernon Gayle and Dr Paul Lambert (Stirling University) Wednesday 1st April 2009.
More complex event history analysis. Start of Study End of Study 0 t1 0 = Unemployed; 1 = Working UNEMPLOYMENT AND RETURNING TO WORK STUDY Spell or Episode.
01/20151 EPI 5344: Survival Analysis in Epidemiology Cox regression: Introduction March 17, 2015 Dr. N. Birkett, School of Epidemiology, Public Health.
Exact Logistic Regression
1 Ordinal Models. 2 Estimating gender-specific LLCA with repeated ordinal data Examining the effect of time invariant covariates on class membership The.
: LSS1 Longitudinal Studies Seminars: Longitudinal Analyses Using STATA Stirling University, Data and Variable Management Paul Lambert.
Modelling Longitudinal Data General Points Single Event histories (survival analysis) Multiple Event histories.
Nonparametric Statistics
Birthweight (gms) BPDNProp Total BPD (Bronchopulmonary Dysplasia) by birth weight Proportion.
1 BINARY CHOICE MODELS: LOGIT ANALYSIS The linear probability model may make the nonsense predictions that an event will occur with probability greater.
1 COMPARING LINEAR AND LOGARITHMIC SPECIFICATIONS When alternative specifications of a regression model have the same dependent variable, R 2 can be used.
STATA WORKSHOP
Additional Regression techniques Scott Harris October 2009.
Nonparametric Statistics
Logistic Regression APKC – STATS AFAC (2016).
April 18 Intro to survival analysis Le 11.1 – 11.2
Event History Analysis 3
Multiple logistic regression
Nonparametric Statistics
Jeffrey E. Korte, PhD BMTRY 747: Foundations of Epidemiology II
Biost 513 Discussion Section Week 9
Problems with infinite solutions in logistic regression
Presentation transcript:

Longitudinal Data Analysis for Social Science Researchers Thinking About Event Histories

DURATIONS In simple form – “Time to an EVENT”

Survival Data – Time to an event In the medical area… Time from diagnosis to death Duration from treatment to full health Time to return of pain after taking a pain killer

Survival Data – Time to an event Social Sciences… Duration of unemployment Duration of housing tenure Duration of marriage Time to conception

Start of Study End of Study t1 t2 t3 A B C

These durations are a continuous Y so why can’t we use standard regression techniques?

Two examples of when we can

Start of Study t1 t2 t3 t Birth Cohort Study Research Project 2060 (1 st August 2032 VG retires!) 1=Death A C B

Breast Feeding Study – Data Collection Strategy 1. Retrospective questioning of mothers 2. Data collected by Midwives 3. Health Visitor and G.P. Record

Birth 1995 Start of Study Breast Feeding Study – Age

Birth 1995 Start of Study Breast Feeding Study – Age Time

These durations are a continuous Y so why can’t we use standard regression techniques? We can. It might be better to model the log of Y however. These models are sometimes known as ‘accelerated life models’.

These durations are a continuous Y so why can’t we use standard regression techniques? Here we have censored observations. An old guideline used to be that if less than 10% of observations were censored then a standard regression approach was okay. However, you’d have trouble getting this past a good referee and there is now no excuse given that techniques are widely understood and suitable software is available.

Accelerated Life Model Log e t i =     x 1i +e i constant explanatory variable error term Beware this is log t

Start of StudyEnd of Study C1 D E B CENSORED OBSERVATIONS A CENSORING

Cox Regression (proportional hazard model) is a method for modelling time-to-event data in the presence of censored cases Explanatory variables in your model (continuous and categorical) Estimated coefficients for each of the covariates Handles the censored cases correctly

Cox, D.R. (1972) ‘Regression models and life tables’ JRSS,B, 34 pp

Event History with Cox Model No longer modelling the duration Modelling the Hazard Hazard: measure of the probability that an event occurs at time t conditional on it not having occurred before t A more technical account follows later on!

An Example Duration of first job after leaving education Data from the BHPS 15,401 individual records 11,061 (72%) failures i.e. job spell ended 4,340 (28%) censored – no information on exact end of first job (e.g. still in job) Time in months Mean 78; s.d.102; min 1; max 793 (66 years)

My interests… Gender – males 7,992 (52%) – females 7,409 (48%) School system – Compulsory school age 14 2,244 (15%) – Compulsory school age 155,034 (33%) – Compulsory school age 168,123 (53%)

What is the data structure? pidstart time end time durationgendercohortcensored The row is a person The tricky part is often calculating the duration Remember we need an indicator for censored cases

Histogram of duration

Descriptive Analysis Durations Months Gender MeanMedian – males – females 6639 School system – Compulsory school age – Compulsory school age – Compulsory school age

Kaplan-Meir Survival Plot

A Cox’s Regression – (Hazard Model) This is in the st suite in STATA You must tell STATA what is the duration variable and what is the censor variable stset duration, failure(status)

A Cox’s Regression – (Hazard Model) Think of this as being similar to a logit model. One way of fixing this conceptually is that there is a binary response but we are interested in the time to this outcome.

STATA Output Cox regression -- Breslow method for ties No. of subjects = Number of obs = No. of failures = Time at risk = LR chi2(3) = Log likelihood = Prob > chi2 = _t | Haz. Ratio Std. Err. z P>|z| [95% Conf. Interval] fem | cohort1 | cohort2 |

STATA Output No. of subjects = Number of obs = No. of failures = Time at risk = LR chi2(3) = Log likelihood =

STATA Output _t | Haz. Ratio Std. Err. z P>|z| [95% Conf. Interval] fem | cohort1 | cohort2 |

STATA Output _t | Haz. Ratio Std. Err. z P>|z| [95% Conf. Interval] fem | cohort1 | cohort2 | stcox, nohr These are the Coefficients rather than Haz Ratios (i.e. anti ln (.36)=1.44) _t | Coef. Std. Err. z P>|z| [95% Conf. Interval] fem | cohort1 | cohort2 |

Checking and Testing the Proportional Hazard Assumption Key assumption is that the hazard ratio is proportional over time. More on this in Prof Wright’s talk. See also (see ST (manual) p for a discussion). A simple visualisation might help.

Checking and Testing the Proportional Hazard Assumption

Checking and Testing the Proportional Hazard Assumption A simple example with one X var Time: Rank(t) | rho chi2 df Prob>chi fem | global test | Ho: Fem=0 is proportional to Fem=1 Reject this null hypothesis. See [ST] pg. 142 and more in Professor Wright’s talk!

Treatment of tied durations If time could be measured on a true continuous scale then no observations would be tied. In reality because of the resolution (or scale) that we measure time on there may be tied observations. The basic problem is this affects the size of the risk set - we don’t know who left first. There are various methods for handling this the default is Breslow; this is okay if there are not too many ties (see ST (manual) p.118 for a discussion of options).

Event history data permutations Single state single episode –e.g. duration in first post-school job till end –analogous to a logit framework Single episode competing risks –e.g. duration in job until –promotion / retire / unemp –Analogous to a multinomial logit framework

Event history analysis software SPSS – very limited analysis options STATA – wide range of pre-prepared methods SAS – as STATA S-Plus/R – vast capacity but non-introductory TDA – simple but powerful freeware MLwiN; lEM; {others} – small packages targeted at specific analysis situations [GLIM / SABRE – some unique options]

Discrete Time “We believe that discrete-time methods are simply more appropriate for much of the event-history data that are currently collected because, for logistical and financial reasons, observations are often made in discrete time” (Willet and Singer 1995).

Discrete Time In a discrete time model the dependent variable is a binary indicator Therefore it can be fitted in standard software

Discrete Time We observe this woman until she experiences the event (marriage) pidstart ageend age She need a row for each year – Sometimes this is called person-period format

Data Structure PidYAge

Discrete Time Discrete time approaches are often appropriate when analysing social data collected at ‘discrete’ intervals Being able to fit standard regression models is an obvious attraction

Another event history data permutation Another more complex situation is analysing Multi-state multi-episode –e.g. adult working life histories Paul will show an example of the state-space in his talk

Social Science Event Histories: Comment: Potentially powerful techniques – however in practice they are often trickier to operationalise with ‘real’ social science data. Neat examples are often used in textbooks! In particular: –Many research applications have concentrated on quite simplistic state spaces (e.g. working V not in work) –Incorporating many explanatory factors can be difficult – time constant V time-varying; and duration data V panel data.

Social Science Event Histories: In particular: –Many research applications have concentrated on quite simplistic state spaces (e.g. working V not in work) –Incorporating many explanatory factors can be difficult – time constant V time-varying

Social Science Event Histories: Is an event history analysis really what we need? Are we really interested in the ‘time’ to an event? Often a panel modelling approach may be more appropriate given our substantive interest