Modelling Longitudinal Data Survival Analysis. Event History. Recurrent Events. A Final Point – and link to Multilevel Models (perhaps).


$Y_{i1} = \beta' X_{i1} + \varepsilon_{i1}$
$Y_{i1}$: Outcome 1 for individual $i$
$\beta' X_{i1}$: Vector of explanatory variables and estimates
$\varepsilon_{i1}$: Independent, identically distributed error

THE SAME AGAIN AT TIME 2
$Y_{i2} = \beta' X_{i2} + \varepsilon_{i2}$
$Y_{i2}$: Outcome 2 for individual $i$
$\beta' X_{i2}$: Vector of explanatory variables and estimates
$\varepsilon_{i2}$: Independent, identically distributed error

$Y_{i1} = \beta' X_{i1} + \varepsilon_{i1}$
$Y_{i2} = \beta' X_{i2} + \varepsilon_{i2}$
Considered together, conventional regression analysis is NOT appropriate.

Change in Score
$Y_{i2} - Y_{i1} = \beta'(X_{i2} - X_{i1}) + (\varepsilon_{i2} - \varepsilon_{i1})$
Here the $\beta'$ is simply a regression on the difference or change in scores.
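The change-score regression can be illustrated with a minimal sketch. The two-wave data below are hypothetical values chosen for illustration, there is a single explanatory variable, and the estimate is the least-squares slope through the origin (the change-score equation has no intercept).

```python
# Hypothetical two-wave data for four individuals (illustrative values only).
y1 = [10.0, 12.0, 9.0, 15.0]   # outcome at time 1
y2 = [13.0, 13.0, 13.0, 17.0]  # outcome at time 2
x1 = [1.0, 2.0, 0.0, 3.0]      # explanatory variable at time 1
x2 = [2.5, 2.5, 2.0, 4.0]      # explanatory variable at time 2

# Change scores: Y_i2 - Y_i1 and X_i2 - X_i1.
dy = [after - before for before, after in zip(y1, y2)]
dx = [after - before for before, after in zip(x1, x2)]

# Least-squares slope for a regression of dy on dx with no intercept.
beta = sum(u * v for u, v in zip(dx, dy)) / sum(u * u for u in dx)
print(beta)
```

In these made-up data each change in Y is exactly twice the change in X, so the estimated coefficient is 2.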

As social scientists we are often substantively interested in whether a specific event has occurred.

Survival Data: Time to an event
In the medical area…
Duration from treatment to death.
Time to return of pain after taking a pain killer.

Survival Data: Time to an event
Social Sciences…
Duration of unemployment.
Duration of time on a training scheme.
Duration of housing tenure.
Duration of marriage.
Time to conception.

Consider a binary outcome or two-state event
0 = Event has not occurred
1 = Event has occurred

[Figure: timeline from Start of Study to End of Study, with durations t1, t2 and t3.]

These durations are a continuous Y so why can’t we use standard regression techniques?

[Figure: CENSORED OBSERVATIONS on a timeline from Start of Study to End of Study.]

[Figure: CENSORED OBSERVATIONS for persons A and B on a timeline from Start of Study to End of Study.]

These durations are a continuous Y so why can’t we use standard regression techniques? What should be the value of Y for person A and person B at the end of our study (when we fit the model)?
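The problem posed by persons A and B can be made concrete with a small sketch. The spells below are hypothetical: each record stores the observed duration and an event indicator, which is the usual way of encoding censoring. The class name `Spell` and all values are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Spell:
    person: str
    duration: float  # months observed before the event or the end of the study
    event: int       # 1 = event observed, 0 = censored at the end of the study


# Hypothetical data: A and B are still event-free when the study ends.
spells = [
    Spell("A", 12, 0),
    Spell("B", 7, 0),
    Spell("C", 5, 1),
    Spell("D", 9, 1),
]

censored = [s.person for s in spells if s.event == 0]

# Treating censored durations as if they were complete understates the
# true durations, because A and B last at least as long as observed.
naive_mean = sum(s.duration for s in spells) / len(spells)

# Dropping censored cases discards information and also distorts the mean.
events_only = [s.duration for s in spells if s.event == 1]
events_only_mean = sum(events_only) / len(events_only)

print(censored, naive_mean, events_only_mean)
```

Neither average is a satisfactory value of Y for a standard regression, which is why the event indicator has to be carried into the model alongside the duration.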

Cox Regression is a method for modelling time-to-event data in the presence of censored cases. Explanatory variables in your model (continuous and categorical). Estimated coefficients for each of the covariates. Handles the censored cases correctly.
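For reference, the Cox model can be written in its standard textbook form. This is a sketch in notation introduced here rather than taken from the slides: $h_0(t)$ is an unspecified baseline hazard, $\delta_i$ is the event indicator (1 = event observed, 0 = censored), $R(t_i)$ is the set of cases still at risk at time $t_i$, and tied event times are assumed absent.

```latex
h(t \mid x_i) = h_0(t)\,\exp(\beta' x_i)

L(\beta) = \prod_{i:\,\delta_i = 1}
  \frac{\exp(\beta' x_i)}{\sum_{j \in R(t_i)} \exp(\beta' x_j)}
```

Censored cases contribute to the risk sets in the denominator for as long as they are observed, but never to the numerator, which is how the partial likelihood uses their information without requiring a completed duration.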

UNEMPLOYMENT AND RETURNING TO WORK STUDY
0 = Unemployed; 1 = Returned to work
[Figure: CENSORED OBSERVATIONS on a timeline from Start of Study to End of Study.]

[Figure: A Statistical Model. Y variable = duration with censored observations; explanatory variables X1, X2 and X3.]

[Figure: A Statistical Model. Y variable = duration with censored observations; explanatory variables: Previous Occupation, Educational Qualifications, Length of Work Experience (a continuous covariate).]

More complex event history analysis

UNEMPLOYMENT AND RETURNING TO WORK STUDY
0 = Unemployed; 1 = Returned to work
[Figure: timeline from Start of Study to End of Study, with durations t1, t2 and t3 ending in state 1 or still in state 0.]

UNEMPLOYMENT AND RETURNING TO WORK STUDY
0 = Unemployed; 1 = Returned to work
[Figure: timeline from Start of Study to End of Study showing duration t1 in state 0.]
Spell or Episode

UNEMPLOYMENT AND RETURNING TO WORK STUDY
0 = Unemployed; 1 = Returned to work
[Figure: timeline from Start of Study to End of Study showing a change from state 0 to state 1 at t1.]
Transition = movement from one state to another

Recurrent Events Analysis

The structure of many large-scale studies results in survey data being collected at a number of discrete occasions. In this situation, rather than being continuous, time lends itself to being conceptualized as a sequence of discrete events. Furthermore, social scientists are often substantively interested in whether a specific event has occurred. Taken together, these two considerations favour the adoption of a discrete-time or event history approach.

Recurrent events are merely outcomes that can take place on a number of occasions. A simple example is unemployment measured month by month. In any given month an individual can either be employed or unemployed. If we had data for a calendar year we would have twelve discrete outcome measures (i.e. one for each month).
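The monthly unemployment example translates naturally into a "person-period" layout, with one row per individual per month. The sketch below uses hypothetical names and values, coded 0 = unemployed and 1 = employed.

```python
# Hypothetical calendar-year records: one 0/1 outcome per month per person
# (0 = unemployed, 1 = employed).
monthly_status = {
    "anna": [0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1],
    "ben": [1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1],
}

# Person-period rows: (person, month, outcome), months numbered 1 to 12.
rows = [
    (person, month, outcome)
    for person, outcomes in monthly_status.items()
    for month, outcome in enumerate(outcomes, start=1)
]

for row in rows[:4]:
    print(row)
```

Two individuals observed for twelve months give 24 rows, each holding one discrete outcome measure.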

Social scientists now routinely employ statistical models for the analysis of discrete data, most notably logistic and log-linear models, in a wide variety of substantive areas. I believe that the adoption of a recurrent events approach is appealing because it is a logical extension of these models.

Willett and Singer (1995) conclude that discrete-time methods are generally considered to be simpler and more comprehensible. Moreover, mastery of discrete-time methods facilitates a transition to continuous-time approaches should that be required.
Willett, J. and Singer, J. (1995) Investigating Onset, Cessation, Relapse, and Recovery: Using Discrete-Time Survival Analysis to Examine the Occurrence and Timing of Critical Events. In J. Gottman (ed) The Analysis of Change (Hove: Lawrence Erlbaum Associates).

STATISTICAL ANALYSIS FOR BINARY RECURRENT EVENTS (SABRE) Fits appropriate models for recurrent events. It is like GLIM. It can be downloaded free.

Consider a binary outcome or two-state event
0 = Event has not occurred
1 = Event has occurred
In the cross-sectional situation we are used to modelling this with logistic regression.

0 = Unemployed; 1 = Returned to work UNEMPLOYMENT AND RETURNING TO WORK STUDY – A study for six months

Constantly unemployed
Months: 1 2 3 4 5 6
obs:    0 0 0 0 0 0

Constantly employed
Months: 1 2 3 4 5 6
obs:    1 1 1 1 1 1

Employed in month 1 then unemployed
Months: 1 2 3 4 5 6
obs:    1 0 0 0 0 0

Unemployed but gets a job in month six
Months: 1 2 3 4 5 6
obs:    0 0 0 0 0 1

Mixed employment patterns
[Table: months by obs for four individuals.]

Here we have a binary outcome, so could we simply use logistic regression to model it?
Months: 1 2 3 4 5 6
obs:    0 0 0 0 0 0

Yes and No!

SABRE fits two models that are appropriate to this analysis. Model 1 = Pooled Cross-Sectional Logit Model

POOLED CROSS-SECTIONAL LOGIT MODEL
$x_{it}$ is a vector of explanatory variables and $\beta$ is a vector of parameter estimates.

POOLED CROSS-SECTIONAL LOGIT MODEL
In conventional logistic regression models, where each observation is assumed to be independent and a logistic link function is used, the contribution to the likelihood by the $i$th case and the $t$th event is given by the equation above.
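In standard notation, and as a sketch rather than a reproduction of the slide's own equation, the likelihood contribution for a binary outcome $y_{it}$ under a logistic link can be written as follows. The symbol $L_{it}$ is introduced here for illustration.

```latex
L_{it}(\beta) = \frac{\exp\left(y_{it}\,\beta' x_{it}\right)}
                     {1 + \exp\left(\beta' x_{it}\right)},
\qquad y_{it} \in \{0, 1\}
```

Under the independence assumption, the pooled likelihood is simply the product of these contributions over all cases $i$ and all occasions $t$.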

This approach can be regarded as a naïve solution to our data analysis problem.

We need to consider a number of issues….

Months: $Y_1$ $Y_2$
obs:    0 0
Pickles' tip: In repeated measures analysis we would require something like a 'paired' t test rather than an 'independent' t test because we can assume that $Y_1$ and $Y_2$ are related.

SABRE fits two models that are appropriate to this analysis. Model 2 = Random Effects Model (or logistic mixture model)

Repeated measures data violate an important assumption of conventional regression models. The responses of an individual at different points in time will not be independent of each other. This problem has been overcome by the inclusion of an additional, individual-specific error term.

The random effects model extends the pooled cross-sectional model to include a case-specific random error term to account for residual heterogeneity. For a sequence of outcomes for the $i$th case, the basic random effects model has the integrated (or marginal) likelihood given by the equation.
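A common way to write such a marginal likelihood is $L_i = \int \prod_t \frac{\exp\{y_{it}(\beta' x_{it} + \varepsilon_i)\}}{1 + \exp(\beta' x_{it} + \varepsilon_i)} f(\varepsilon_i)\, d\varepsilon_i$. The sketch below evaluates it numerically, assuming a normal mixing distribution for the case-specific term and using a plain trapezoid rule. The function names, the grid settings and the example values are assumptions made for illustration, not SABRE's own algorithm.

```python
import math


def logistic(z):
    return 1.0 / (1.0 + math.exp(-z))


def sequence_likelihood(ys, etas, eps):
    """Likelihood of one case's outcome sequence, given its random effect eps.

    ys are 0/1 outcomes; etas are the linear predictors beta'x_it.
    """
    like = 1.0
    for y, eta in zip(ys, etas):
        p = logistic(eta + eps)
        like *= p if y == 1 else 1.0 - p
    return like


def marginal_likelihood(ys, etas, sigma, n_points=201, width=8.0):
    """Integrate eps ~ N(0, sigma^2) out of the sequence likelihood."""
    if sigma == 0:
        return sequence_likelihood(ys, etas, 0.0)  # pooled model
    lower = -width * sigma
    step = 2.0 * width * sigma / (n_points - 1)
    total = 0.0
    for k in range(n_points):
        eps = lower + k * step
        density = math.exp(-0.5 * (eps / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))
        weight = 0.5 if k in (0, n_points - 1) else 1.0  # trapezoid end points
        total += weight * sequence_likelihood(ys, etas, eps) * density * step
    return total


# Hypothetical case: unemployed in all six months, linear predictor zero.
ys = [0, 0, 0, 0, 0, 0]
etas = [0.0] * 6
pooled = marginal_likelihood(ys, etas, sigma=0.0)
mixed = marginal_likelihood(ys, etas, sigma=2.0)
print(pooled, mixed)
```

With these illustrative values, the unchanging sequence is more probable under the random effects model than under the pooled model, because heterogeneity allows some cases to have a persistently low chance of employment.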

Davies and Pickles (1985) have demonstrated that the failure to explicitly model the effects of residual heterogeneity may cause severe bias in parameter estimates. Using longitudinal data, the effects of omitted explanatory variables can be overtly accounted for within the statistical model. This greatly improves the accuracy of the estimated effects of the explanatory variables.

Movers and Stayers
When considering data on recurrent events there will be individuals for whom there will be zero (or very low) probabilities of change in outcome from one event to the next. These individuals are termed 'stayers'.
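As a rough descriptive check, individuals whose observed outcome never changes can be flagged before modelling. The sequences below are hypothetical, and an unchanging observed sequence is only a candidate stayer: the definition above concerns the underlying probability of change, which the data do not reveal directly.

```python
# Hypothetical six-month sequences (0 = unemployed, 1 = employed).
sequences = {
    "p1": [0, 0, 0, 0, 0, 0],
    "p2": [1, 1, 1, 1, 1, 1],
    "p3": [0, 0, 1, 1, 0, 1],
    "p4": [1, 0, 0, 0, 0, 0],
}


def count_transitions(outcomes):
    """Number of month-to-month changes in outcome."""
    return sum(1 for prev, curr in zip(outcomes, outcomes[1:]) if prev != curr)


# Candidate stayers: no observed change in outcome over the whole period.
candidate_stayers = [p for p, ys in sequences.items() if count_transitions(ys) == 0]
print(candidate_stayers)
```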

[Table: months by obs for one individual.]
This person is a stayer!

[Table: months by obs for one individual.]
So is this person.

An awareness of the issue of 'stayers' is important for technical reasons. A limitation of a parametric modelling approach is that the tail behaviour of the normal distribution is inconsistent with 'stayers', and their number will tend to be underestimated (see Spilerman 1972).
Spilerman, S. (1972) 'Extensions of the Mover-Stayer Model', American Journal of Sociology, 78, pp

Recurrent events may be analysed using other software, but SABRE is specifically designed to handle stayers, and this feature increases SABRE's flexibility in representing residual heterogeneity (Barry, Francis, Davies, and Stott 1998).
Barry, J., Francis, B., Davies, R.B. and Stott, D. (1998) SABRE Users Guide

STATE DEPENDENCE
[Figure: Past Behaviour leading to Current Behaviour.]

Young People Aged 19
[Figure: Unemployed or Employed in APRIL, with Different Probabilities of Employment in MAY.]

This is called a MARKOV model.
A Markov model helps to control for a previous outcome (or behaviour).

ACCOUNTS FOR PREVIOUS OUTCOME ($y_{t-1}$)
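The idea of conditioning on the previous outcome can be sketched by pairing each month's outcome with the one before it and tabulating the transitions. The sequences below are hypothetical, and the proportions are simple descriptive estimates rather than fitted model output.

```python
from collections import Counter

# Hypothetical monthly sequences (0 = unemployed, 1 = employed).
sequences = {
    "p1": [0, 0, 0, 1, 1, 1],
    "p2": [1, 1, 0, 0, 1, 1],
}

# Pair each outcome y_t with the previous outcome y_{t-1}.
pairs = [
    (prev, curr)
    for outcomes in sequences.values()
    for prev, curr in zip(outcomes, outcomes[1:])
]
counts = Counter(pairs)


def prob_employed_given(previous):
    """Observed proportion employed this month, given last month's state."""
    total = counts[(previous, 0)] + counts[(previous, 1)]
    return counts[(previous, 1)] / total


p_after_unemployed = prob_employed_given(0)
p_after_employed = prob_employed_given(1)
print(p_after_unemployed, p_after_employed)
```

In these made-up data the proportion employed is 0.4 after a month of unemployment and 0.8 after a month of employment, the kind of difference that a first-order Markov model is designed to capture.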

The Model Provides TWO sets of estimates
[Figure: APRIL to MAY. Explanatory Variables for the Unemployed; Explanatory Variables for the Employed.]

This is a 'two-state' MARKOV model.
But we can make it more complicated.

First Order Markov Model
Months: $Y_1$ $Y_2$
obs:    0 0

Second Order Markov Model
Months: $Y_1$ $Y_2$ $Y_3$
obs:    0 0 0

FINAL POINT – A THOUGHT!

Mixed employment patterns
[Table: months by obs for four individuals.]

Hierarchical or Multilevel Data Structure
[Figure: Observations (Months, labelled a to g) nested within Individuals.]

Is the recurrent events model simply a multilevel model fitted at the single level? A controversial point! More later…..