Modelling Longitudinal Data: General Points; Single Event Histories (survival analysis); Multiple Event Histories



Motivation: Attempt to go beyond the simpler material in the first workshop. Begin to develop an appreciation of the notation associated with these techniques. Gain a little “hands-on” experience.

Statistical Modelling Framework: Generalized Linear Models. “An interest in generalized linear models is richly rewarded. Not only does it bring together a wealth of interesting theoretical problems but it also encourages an ease of data analysis sadly lacking from traditional statistics… an added bonus of the glm approach is the insight provided by embedding a problem in a wider context. This in itself encourages a more critical approach to data analysis.” Gilchrist, R. (1985) ‘Introduction: GLIM and Generalized Linear Models’, Springer Verlag Lecture Notes in Statistics, 32, pp.1-5.

Statistical Modelling: Know your data. Start with, and be guided by, ‘substantive theory’. Start with simple techniques (these might suffice). Remember John Tukey! Practice.

Willett and Singer (1995) conclude that discrete-time methods are generally considered to be simpler and more comprehensible. Moreover, mastery of discrete-time methods facilitates a transition to continuous-time approaches should that be required. Willett, J. and Singer, J. (1995) Investigating Onset, Cessation, Relapse, and Recovery: Using Discrete-Time Survival Analysis to Examine the Occurrence and Timing of Critical Events. In J. Gottman (ed) The Analysis of Change (Hove: Lawrence Erlbaum Associates).

As social scientists we are often substantively interested in whether a specific event has occurred.

Survival Data – Time to an event. In the medical area: time from diagnosis to death; duration from treatment to full health; time to return of pain after taking a painkiller.

Survival Data – Time to an event. In the social sciences: duration of unemployment; duration of housing tenure; duration of marriage; time to conception; time to orgasm.

Consider a binary outcome or two-state event: 0 = event has not occurred; 1 = event has occurred.

[Figure: timeline from Start of Study to End of Study, marking times t1, t2, t3 and persons A, B, C]

These durations are a continuous Y so why can’t we use standard regression techniques?

We can. It might be better to model the log of Y however. These models are sometimes known as ‘accelerated life models’.

[Figure: Birth Cohort Study timeline from Start of Study through t1, t2, t3, with time axis t. Labels: Research Project 2060 (1st August 2032, VG retires!); 1 = Death; persons A, C, B]

Breast Feeding Study – Data Collection Strategy: 1. Retrospective questioning of mothers. 2. Data collected by midwives. 3. Health Visitor and G.P. records.

[Figure: Breast Feeding Study – Age. Timeline labels: Birth, 1995, Start of Study, t1, t2, t3]

Accelerated Life Model: $\log_e t_i = \beta_0 + \beta_1 x_{1i} + e_i$, where $\beta_0$ is the constant, $x_{1i}$ is the explanatory variable and $e_i$ is the error term. Beware: the left-hand side is $\log_e t_i$, not $t_i$.
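To make the notation concrete, here is a minimal sketch of fitting the log-duration model by ordinary least squares. It assumes every duration is fully observed (no censoring), which is rarely true in practice; the function name `fit_log_duration` and the data are hypothetical and are not taken from the workshop.

```python
import math

def fit_log_duration(durations, x):
    """Least-squares fit of log_e(t_i) = b0 + b1 * x_i + e_i.

    Assumes all durations are observed; censored cases are NOT handled.
    """
    y = [math.log(t) for t in durations]
    n = len(y)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx            # slope on the log-duration scale
    b0 = y_bar - b1 * x_bar   # constant
    return b0, b1

# Hypothetical durations generated exactly from log t = 1.0 + 0.5 * x
x = [0, 0, 1, 1, 2, 2]
durations = [math.exp(1.0 + 0.5 * xi) for xi in x]

b0, b1 = fit_log_duration(durations, x)
time_ratio = math.exp(b1)  # multiplicative change in duration per unit of x
print(b0, b1, time_ratio)
```

Because the model is on the log scale, `exp(b1)` is read as a multiplier on the duration itself, which is where the "accelerated" in the name comes from.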

At this point something should dawn on you – like fish scales falling from your eyes – like pennies from Heaven.

Think about the l.h.s.: $Y_i$ (standard linear model); $\log_e(\text{odds})$ of $Y_i$ (standard logistic model); $\log_e t_i$ (accelerated life model). In each case $\beta_0 + \beta_1 x_{1i} + e_i$ is the r.h.s. We can think of these as a single ‘class’ of models and (with a little care) can interpret them in a similar fashion (as Ian Diamond of the ESRC would say, “this is phenomenally groovy”).

[Figure: censored observations on a timeline from Start of Study to End of Study, showing persons A and B and outcome codes 0 and 1]

These durations are a continuous Y so why can’t we use standard regression techniques? What should be the value of Y for person A and person B at the end of our study (when we fit the model)?
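A small sketch of why this question matters. The "true" durations below are hypothetical (in a real study they are unknown for censored cases), and the 24-month window is borrowed from the childcare example purely for illustration. Both obvious choices for Y understate the typical duration.

```python
STUDY_LENGTH = 24  # months of observation (illustrative assumption)

# Hypothetical true durations; two of them outlast the study window
true_durations = [5, 10, 30, 40]

# What the researcher actually sees: time observed and an event flag
observed = [min(t, STUDY_LENGTH) for t in true_durations]
event = [1 if t <= STUDY_LENGTH else 0 for t in true_durations]

def mean(values):
    return sum(values) / len(values)

true_mean = mean(true_durations)

# Option 1: pretend censored cases ended at the end of the study
mean_censored_as_events = mean(observed)

# Option 2: throw the censored cases away
mean_dropping_censored = mean([t for t, e in zip(observed, event) if e == 1])

print(true_mean, mean_censored_as_events, mean_dropping_censored)
```

Neither shortcut recovers the true mean, which is the motivation for methods that use the partial information in censored cases.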

Cox Regression (proportional hazard model) is a method for modelling time-to-event data in the presence of censored cases. It allows explanatory variables in your model (continuous and categorical), gives an estimated coefficient for each of the covariates, and handles the censored cases correctly.

Cox, D.R. (1972) ‘Regression models and life tables’ JRSS,B, 34 pp

Childcare Study: studying a cohort of women who returned to work after having their first child. 24-month study. The focus of the study was childcare spell #2. 341 mothers (and babies).

Variables: ID; start of childcare spell #2 (month); end of childcare spell #2 (month); gender of baby (male; female); type of care in spell #2 (a relative; childminder; nursery); family income (crude measure).

Survival Function (or survival curve): describes the decline in the size of the risk set over time.

Survival Function: $S(t) = 1 - F(t) = \Pr(T > t)$, where $S(t)$ is the survival probability, $F(t)$ is the cumulative probability of the event (so $S(t)$ is its complement) and $T$ is the event time.

Also $S(t_1) \ge S(t_2)$ for all $t_2 > t_1$. All this means is… once you’ve left the risk set you can’t return!!!
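One common way to estimate a survival function from censored data is the Kaplan-Meier (product-limit) estimator. The slides do not say which estimator produced their curves, so treat this as an illustrative sketch; the function names and the data are hypothetical.

```python
def kaplan_meier(durations, events):
    """Return [(t, S(t))] at each distinct event time.

    events[i] is 1 if the event occurred, 0 if the case was censored.
    """
    event_times = sorted({t for t, e in zip(durations, events) if e == 1})
    s = 1.0
    curve = []
    for t in event_times:
        # Risk set: everyone whose observed duration is at least t
        at_risk = sum(1 for d in durations if d >= t)
        # Number experiencing the event exactly at t
        failed = sum(1 for d, e in zip(durations, events) if d == t and e == 1)
        s *= 1.0 - failed / at_risk
        curve.append((t, s))
    return curve

def median_survival(curve):
    """First time at which estimated S(t) is at or below 0.5, else None."""
    for t, s in curve:
        if s <= 0.5:
            return t
    return None

# Hypothetical durations in months; 0 marks a censored case
durations = [2, 3, 3, 5, 8, 8, 12, 24]
events = [1, 1, 0, 1, 1, 0, 1, 0]

curve = kaplan_meier(durations, events)
median = median_survival(curve)
for t, s in curve:
    print(t, round(s, 3))
print("median survival time:", median)
```

Censored cases stay in the risk set up to the time they are last observed and then drop out without counting as events, so the estimated curve can only step downwards.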

Median Survival Times

Too hard to interpret except for the Rain Man

HAZARD: In advanced analyses researchers sometimes examine the shape of something called the hazard. Unlike the survival function, which can only stay level or decline, the shape of the hazard is not constrained. Therefore it can potentially tell us something about the social process that is taking place.

For the very keen… Hazard: the rate at which events occur, or the risk of an event occurring at a particular time t, given that it has not happened before t.

For the even more keen… Hazard: the conditional probability of an event occurring at time t given that it has not happened before (strictly, in continuous time, a rate rather than a probability). If we call the hazard function $h(t)$ and the pdf for the duration $f(t)$, then $h(t) = f(t)/S(t)$.
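A quick numerical check of the relationship between the hazard, the pdf and the survival function. The exponential distribution and the rate of 0.1 are assumptions chosen for illustration only; with that choice the hazard works out the same at every t.

```python
import math

def pdf(t, rate):
    """f(t) for an exponential duration (illustrative assumption)."""
    return rate * math.exp(-rate * t)

def survival(t, rate):
    """S(t) = Pr(T > t) for the same exponential duration."""
    return math.exp(-rate * t)

def hazard(t, rate):
    """h(t) = f(t) / S(t)."""
    return pdf(t, rate) / survival(t, rate)

rate = 0.1  # hypothetical event rate per month
hazards = [hazard(t, rate) for t in (1, 6, 12, 24)]
print(hazards)
```

A flat hazard like this says the risk of the event does not depend on how long the spell has already lasted. Other distributions give rising or falling hazards, which is why the shape can be substantively interesting.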

A Statistical Model: Y variable = duration with censored observations; explanatory variables X1, X2, X3. In the childcare study the explanatory variables are family income, gender of baby, mother’s age (a continuous covariate) and type of childcare.

For the keen… Cox Proportional Hazard Model: $h(t) = h_0(t)\exp(bx)$, where $h(t)$ is the hazard, $h_0(t)$ is the baseline hazard (unknown), $\exp$ is the exponential, $b$ is the estimate and $x$ is the X variable.
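The word "proportional" can be seen numerically. In this sketch the baseline hazard and the coefficient are invented for illustration (Cox regression itself never needs the baseline to be specified); whatever baseline is used, the ratio of hazards for x = 1 versus x = 0 is the same at every time point.

```python
import math

def baseline_hazard(t):
    """Hypothetical h0(t); any positive function of t would do here."""
    return 0.02 + 0.01 * t

def hazard(t, x, b):
    """h(t) = h0(t) * exp(b * x)."""
    return baseline_hazard(t) * math.exp(b * x)

b = 0.7  # hypothetical coefficient
ratios = [hazard(t, 1, b) / hazard(t, 0, b) for t in (1, 6, 12, 24)]
print(ratios)  # each ratio equals exp(b), regardless of t
```

The baseline cancels in the ratio, so the effect of x is summarised by the single number `exp(b)`.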

For the very keen… the Cox Proportional Hazard Model can be transformed into an additive model by taking logs: $\log h(t) = a(t) + bx$, where $a(t) = \log h_0(t)$. Therefore, in the earlier notation, $\log h(t) = \beta_0(t) + \beta_1 x_1$. This should look distressingly familiar!
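For readers curious about how the coefficient is estimated without knowing the baseline hazard, here is a minimal sketch of maximising the Cox partial likelihood for a single covariate by Newton-Raphson. It assumes no tied event times, uses hypothetical data, and is not the routine any particular package runs; the function names are illustrative.

```python
import math

def score_and_information(b, durations, events, x):
    """First derivative and negative second derivative of the
    log partial likelihood at b (one covariate, no tied event times)."""
    score, info = 0.0, 0.0
    for t_i, e_i, x_i in zip(durations, events, x):
        if e_i != 1:
            continue  # censored cases contribute only through risk sets
        risk_x = [x_j for t_j, x_j in zip(durations, x) if t_j >= t_i]
        w = [math.exp(b * x_j) for x_j in risk_x]
        s0 = sum(w)
        s1 = sum(w_j * x_j for w_j, x_j in zip(w, risk_x))
        s2 = sum(w_j * x_j * x_j for w_j, x_j in zip(w, risk_x))
        score += x_i - s1 / s0
        info += s2 / s0 - (s1 / s0) ** 2
    return score, info

def cox_fit(durations, events, x, iterations=25):
    """Return (estimate, standard error) after a fixed number of Newton steps."""
    b = 0.0
    for _ in range(iterations):
        score, info = score_and_information(b, durations, events, x)
        b += score / info
    _, info = score_and_information(b, durations, events, x)
    return b, 1.0 / math.sqrt(info)

# Hypothetical data: durations in months, 0 marks censoring, x is a dummy
durations = [1, 2, 3, 4, 5, 6, 7, 8]
events = [1, 1, 1, 0, 1, 1, 0, 1]
x = [1, 0, 1, 1, 0, 1, 0, 0]

b_hat, se = cox_fit(durations, events, x)
print(b_hat, se, math.exp(b_hat))
```

Only the ordering of event times and the composition of each risk set enter the calculation, which is why the baseline hazard never has to be written down.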

Define the code for the event (i.e. 1 if the event occurred, 0 if censored)
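As a sketch of what this coding step does, the snippet below turns start and end months into a duration and an event code. Representing a still-running spell with `None` is an assumption made here for illustration, as are the function name and the data; the 24-month window follows the childcare study description.

```python
STUDY_LENGTH = 24  # months, as in the childcare study

def code_spell(start_month, end_month):
    """Return (duration, event).

    end_month is None when the spell was still in progress at the end of
    the study; such a case is censored and gets event code 0.
    """
    if end_month is None:
        return STUDY_LENGTH - start_month, 0
    return end_month - start_month, 1

# Hypothetical (start, end) months for four mothers
spells = [(3, 10), (5, None), (12, 20), (18, None)]
coded = [code_spell(start, end) for start, end in spells]
print(coded)
```

The censored cases keep the time they were observed for, so they still inform the risk set up to that point.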

Enter explanatory variables (dummies and continuous)

Output columns: X var; Estimate; Standard error; Chi-square related; Un-logged estimate
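The columns relate to one another in a simple way. This sketch assumes the chi-square column is a Wald statistic, (estimate / standard error) squared, which the slides do not confirm; the estimate and standard error used are hypothetical, not results from the childcare study.

```python
import math

def summarise(estimate, std_error):
    """Derive the usual companion quantities from a Cox coefficient."""
    return {
        # Wald chi-square on 1 degree of freedom (assumed test statistic)
        "wald_chi_square": (estimate / std_error) ** 2,
        # "Un-logged" estimate: the hazard ratio exp(b)
        "hazard_ratio": math.exp(estimate),
        # Approximate 95% interval for the hazard ratio
        "lower_95": math.exp(estimate - 1.96 * std_error),
        "upper_95": math.exp(estimate + 1.96 * std_error),
    }

row = summarise(estimate=0.7, std_error=0.2)  # hypothetical values
print(row)
```

A hazard ratio above 1 means the event happens at a higher rate for a one-unit increase in the X variable; below 1 means a lower rate.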

What does this mean? Our Y is the duration of childcare spell #2. Note that we are modelling the hazard!

Significant Variables: Family income p<.001; Gender of baby p=.696; Mother’s age p=.262; Childminder p<.001; Nursery p<.001

Effects on the hazard: Family income p<.001 (categories: £30K+; up to £30K); Childminder p<.001; Nursery p<.001