Judith D. Singer & John B. Willett, Harvard Graduate School of Education. Discrete-time survival analysis (ALDA, Chapters 10, 11, and 12)

Presentation transcript:

Judith D. Singer & John B. Willett, Harvard Graduate School of Education
Discrete-time survival analysis (ALDA, Chapters 10, 11, and 12)
"Times change, and we change with them." Anonymous, quoted in Holinshed's Chronicles, 1578

What we will cover:
Making sure we're all on the same page: a quick review of basic descriptive statistics for discrete-time event data (Ch 10), using the age at first intercourse study.
Specifying a discrete-time hazard model (§11.1 & 11.2): both heuristic and formal representations.
Model fitting, interpretation, and comparison (§ ): very similar to logistic regression.
Alternative specifications of the baseline in the discrete-time hazard model (§12.1): more parsimonious representations of TIME.
Including time-varying predictors (§12.3): use of a person-period data set makes them easy to include (although interpretations require care).
Evaluating and relaxing the proportionality assumption (§12.5): not all predictors have time-constant effects.
© Singer & Willett, page 2

The life table: Describing the distribution of event occurrence over time (ALDA, Section 10.1, pp )
[Life table with columns: n at risk, n events, and n censored in each time period j]
Recall the grade of first intercourse study: 180 middle school boys were tracked from 7th through 12th grade. By the end of data collection (the end of 12th grade), n=126 (70.0%) had had sex; n=54 (30.0%) were censored (still virgins).

The discrete-time hazard function: Assessing the conditional risk of event occurrence (ALDA, Section , pp )
Discrete-time hazard: the conditional probability that individual i experiences the target event in time period j ($T_i = j$), given that s/he didn't experience it in any earlier time period ($T_i \ge j$):
$h(t_{ij}) = \Pr\{T_i = j \mid T_i \ge j\}$
Easy to estimate, because each value of hazard is based on that interval's risk set.
As a probability, discrete-time hazard is bounded by 0 and 1. This is an issue for modeling that we'll need to address.
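The life-table bookkeeping and the risk-set estimate of hazard described above can be sketched in a few lines. This is a minimal illustration, not the authors' code: the function name `life_table`, the record layout, and the ten-person toy sample are all assumptions made for the example (the study's actual counts are not recoverable from this transcript).

```python
from collections import Counter

def life_table(records, periods):
    """Build life-table rows from (last_period, censored) pairs.

    An uncensored record experienced the event in last_period; a censored
    record was observed through last_period without experiencing it.
    """
    events = Counter(t for t, censored in records if not censored)
    censored_counts = Counter(t for t, censored in records if censored)
    at_risk = len(records)
    rows = []
    for j in periods:
        # Hazard estimate: events in period j divided by that period's risk set.
        hazard = events[j] / at_risk if at_risk else float("nan")
        rows.append({"period": j, "n_at_risk": at_risk, "n_events": events[j],
                     "n_censored": censored_counts[j], "hazard": hazard})
        # People who had the event or were censored leave the next risk set.
        at_risk -= events[j] + censored_counts[j]
    return rows

# Hypothetical toy sample of 10 people (NOT the study data).
toy_records = [(7, False), (9, False), (9, False), (10, False), (11, False),
               (12, False), (12, False), (12, True), (12, True), (12, True)]
toy_table = life_table(toy_records, periods=range(7, 13))
for row in toy_table:
    print(row)
```

Each period's hazard uses only the people still at risk in that period, which is why censored cases contribute information up to the point of censoring.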

The survivor function (and median lifetime): Cumulating risk over time (ALDA, Section 10.2, pp )
Discrete-time survival probability: the probability that individual i will survive beyond time period j ($T_i > j$), i.e., will not experience the event until after time period j:
$S(t_{ij}) = \Pr\{T_i > j\}$
By definition, at the beginning of time, $S(t_0) = 1.0$.
Strategy for estimation: since $h(t_{ij})$ tells us about the probability of event occurrence, $1 - h(t_{ij})$ tells us about the probability of non-occurrence (i.e., about survival).
[Plot: estimated survivor function S(t) by grade; estimated median lifetime ML = 10.6]
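The estimation strategy above amounts to multiplying successive values of 1 - h. The sketch below assumes the usual product form, S(t_j) = S(t_{j-1}) × [1 - h(t_j)], and estimates the median lifetime by linear interpolation between the two periods that bracket S = .5. The function names and the hazard values are hypothetical, chosen only to make the arithmetic easy to follow.

```python
def survivor(hazards):
    """Cumulate survival: S(t_j) = S(t_{j-1}) * (1 - h(t_j)), with S(t_0) = 1."""
    surv, s = [], 1.0
    for h in hazards:
        s *= 1.0 - h
        surv.append(s)
    return surv

def median_lifetime(periods, surv):
    """Linearly interpolate the time at which the survivor function hits 0.5.

    Returns None when the survivor function never falls to 0.5 within the
    observed periods, in which case the median cannot be estimated.
    """
    times = [periods[0] - 1] + list(periods)   # prepend the "beginning of time"
    values = [1.0] + list(surv)
    for k in range(1, len(values)):
        if values[k] <= 0.5:
            m, s_m, s_next = times[k - 1], values[k - 1], values[k]
            return m + (s_m - 0.5) / (s_m - s_next) * (times[k] - m)
    return None

# Hypothetical hazards for grades 7 through 12 (NOT the study estimates).
toy_periods = [7, 8, 9, 10, 11, 12]
toy_hazards = [0.10, 0.10, 0.20, 0.20, 0.25, 0.30]
toy_surv = survivor(toy_hazards)
toy_median = median_lifetime(toy_periods, toy_surv)
print(toy_surv, toy_median)
```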

Towards a discrete-time hazard model: Inspecting sample plots of within-group hazard functions, in raw and transformed scales (ALDA, Section , pp )
[Plots: sample hazard, odds, and logit hazard functions for the PT=1 and PT=0 groups]
Questions to ask when examining sample hazard functions: What is the shape of each hazard function? Does the relative level of hazard differ across groups? The plots suggest the appropriateness of the dual partition introduced earlier, but how do we deal with the bounded nature of hazard?
Transform into odds: solves the upper-bound problem but not the lower-bound problem.
Transform into logits: usually regularizes distances between functions (stretches distances between small values and compresses distances between large values). Not bounded at all (although you need to get used to negative numbers).
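A small numeric sketch of the two transformations may help. The helper names are ours, and the probabilities are arbitrary illustration values rather than sample hazards from the study.

```python
import math

def odds(h):
    """Odds: bounded below by 0, unbounded above."""
    return h / (1.0 - h)

def logit(h):
    """Log odds: unbounded in both directions."""
    return math.log(odds(h))

def inv_logit(x):
    """Back-transform a logit to a probability strictly between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-x))

# The same raw gap of 0.05 is stretched near 0 and compressed near 0.5.
gap_small = logit(0.10) - logit(0.05)
gap_large = logit(0.50) - logit(0.45)
print(round(gap_small, 3), round(gap_large, 3))

# Any real-valued logit maps back to a legitimate hazard.
print(inv_logit(-4.0), inv_logit(4.0))
```

Because the inverse logit always returns a value between 0 and 1, a linear model written on the logit scale can never produce an impossible fitted hazard.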

What population model might have generated these sample data? Sample hazard estimates, alternative hypothesized models, parameterizing the DT hazard model (ALDA, Section , pp )
Alternative hypothesized models:
Flat population logit hazard, shifted when PT switches from 0 to 1.
Linear population logit hazard, shifted when PT switches from 0 to 1.
General population logit hazard, shifted when PT switches from 0 to 1.
Baseline logit hazard function (when PT=0), written with the time indicators (D7=1), (D8=1), ..., (D11=1), (D12=1).
When PT=1, you shift this entire baseline vertically by $\beta_1$.
How do we fit this model to data?
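The model's equation did not survive in this transcript. Under the general specification described above, it would plausibly take the following form, where the symbols $\alpha_7, \dots, \alpha_{12}$ for the baseline parameters are our assumption (only the shift parameter and the dummies D7 through D12 are named on the slide):

```latex
\operatorname{logit} h(t_{ij}) =
  \left[\alpha_7 D_{7ij} + \alpha_8 D_{8ij} + \cdots + \alpha_{12} D_{12ij}\right]
  + \beta_1\, PT_i
```

In period j only $D_{jij}$ equals 1, so the bracketed term reduces to $\alpha_j$: the baseline logit hazard in that period when PT=0. Setting PT=1 adds $\beta_1$ in every period, which is the vertical shift of the entire baseline.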

The person-period data set: The key to fitting the discrete-time hazard model (ALDA, Section , pp )
Person-level data set, one row per person, with columns: id, T, censor, pt
Person-period data set, one row per person per period at risk, with columns: id, TIME, event, D7, D8, D9, D10, D11, D12, pt
When the model is fit by logistic regression in the person-period data set, all parameter estimates, standard errors, t- and z-statistics, goodness-of-fit statistics, and tests will be correct for the discrete-time hazard model.
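The conversion from the person-level layout to the person-period layout can be sketched as follows. The column names follow the slide (id, T, censor, pt, event, D7 through D12), with the period column called `period` here; the function name and the two example people are hypothetical.

```python
def to_person_period(person, first_period=7, last_period=12):
    """Expand one person-level record into one row per period at risk."""
    rows = []
    for j in range(first_period, person["T"] + 1):
        row = {
            "id": person["id"],
            "period": j,
            # event = 1 only in the final period, and only if not censored
            "event": int(j == person["T"] and not person["censor"]),
            "pt": person["pt"],
        }
        # One time dummy per period; exactly one equals 1 in each row.
        for k in range(first_period, last_period + 1):
            row[f"D{k}"] = int(j == k)
        rows.append(row)
    return rows

# Hypothetical people (NOT rows from the study data).
person_a = {"id": 1, "T": 9, "censor": 0, "pt": 0}    # event in grade 9
person_b = {"id": 2, "T": 12, "censor": 1, "pt": 1}   # censored after grade 12
rows_a = to_person_period(person_a)
rows_b = to_person_period(person_b)
for row in rows_a + rows_b:
    print(row)
```

A censored person contributes rows with event = 0 in every period, so censoring needs no special handling once the data are in this layout.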

Model A: A baseline discrete-time hazard model with no substantive predictors (ALDA, Section , pp )
Because there are no predictors in Model A, this baseline is for the entire sample.
If the estimates are approximately equal, the baseline is flat.
If the estimates decline, hazard declines.
If the estimates increase (as they do here), hazard increases.
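With a full set of time dummies and no other predictors, the maximum likelihood estimate for each period is simply the logit of that period's life-table hazard. The sketch below checks this on hypothetical counts by confirming that the logistic-regression score equation for each dummy is zero at those values; the counts and names are invented for illustration.

```python
import math

def logit(p):
    return math.log(p / (1.0 - p))

def inv_logit(x):
    return 1.0 / (1.0 + math.exp(-x))

# Hypothetical risk sets and event counts by grade (NOT the study data).
toy_at_risk = {7: 100, 8: 90, 9: 80, 10: 65, 11: 50, 12: 35}
toy_events = {7: 10, 8: 10, 9: 15, 10: 15, 11: 15, 12: 14}

# Candidate estimates: logit of the sample hazard in each period.
alpha_hat = {j: logit(toy_events[j] / toy_at_risk[j]) for j in toy_at_risk}

# Score for dummy D_j: sum over person-periods in period j of (event - fitted).
score = {j: toy_events[j] - toy_at_risk[j] * inv_logit(alpha_hat[j])
         for j in toy_at_risk}

for j in sorted(alpha_hat):
    print(j, round(alpha_hat[j], 3), round(score[j], 12))
```

In this toy example the estimates rise from one grade to the next, which is the pattern the slide reads as increasing hazard.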

Models B & C: Uncontrolled effects of substantive predictors (ALDA, Section & , pp )
Dichotomous predictors: as in regular logistic regression, antilogging a parameter estimate yields the estimated odds ratio associated with a 1-unit difference in the predictor. The estimated odds of first intercourse for boys who have experienced a parenting transition are 2.4 times the odds for boys who did not experience such a transition.
Continuous predictors: antilogging still yields an estimated odds ratio associated with a 1-unit difference in the predictor. The estimated odds of first intercourse are 1.56 times as high (just over 50% higher) for boys whose parents score one unit higher on this antisocial behavior index.
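The arithmetic linking coefficients and odds ratios runs in both directions. The sketch below starts from the two odds ratios reported on the slide (2.4 and 1.56) and recovers the implied logit-scale coefficients; those coefficients are back-calculated from rounded values, so they are approximations rather than the fitted estimates.

```python
import math

odds_ratio_pt = 2.4     # parenting transition (dichotomous), from the slide
odds_ratio_pas = 1.56   # parental antisocial behavior (continuous), from the slide

# Implied logit-scale coefficients (approximate, from rounded odds ratios).
beta_pt = math.log(odds_ratio_pt)
beta_pas = math.log(odds_ratio_pas)

# Percentage difference in odds for a 1-unit difference in the predictor.
pct_higher_pas = (odds_ratio_pas - 1.0) * 100.0

# For a continuous predictor, a k-unit difference multiplies the odds by OR**k.
odds_ratio_pas_two_units = math.exp(2 * beta_pas)

print(round(beta_pt, 3), round(beta_pas, 3))
print(round(pct_higher_pas, 1), round(odds_ratio_pas_two_units, 3))
```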

Comparing nested models using deviance statistics (and non-nested models using information criteria) (ALDA, Section 11.6, pp )
[Table: fitted models, each including the TIME dummies, with their goodness-of-fit statistics]
Deviance: smaller value, better fit; $\chi^2$ distribution; used to compare nested models.
AIC, BIC: smaller value, better fit; used to compare non-nested models.
Model B vs. Model A provides an uncontrolled test of $H_0: \beta_{PT} = 0$. Difference in deviance = 17.30 (1 df), p<.001.
Model C vs. Model A provides an uncontrolled test of $H_0: \beta_{PAS} = 0$. Difference in deviance = 14.79 (1 df), p<.001.
Model D vs. Models B and C provides controlled tests (both rejected as well).
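The two single-degree-of-freedom tests above can be checked with the standard library alone, because the upper tail of a chi-square distribution with 1 df equals erfc(sqrt(x/2)). The helper name is ours; the deviance differences are the ones reported on the slide.

```python
import math

def chi2_sf_1df(x):
    """Upper-tail probability of a chi-square variate with 1 degree of freedom."""
    return math.erfc(math.sqrt(x / 2.0))

# Differences in deviance reported on the slide, each on 1 df.
p_model_b_vs_a = chi2_sf_1df(17.30)   # test of the PT coefficient
p_model_c_vs_a = chi2_sf_1df(14.79)   # test of the PAS coefficient

print(p_model_b_vs_a, p_model_c_vs_a)
```

Both probabilities fall below .001, in line with the slide. For comparisons on more than 1 df, a general chi-square routine from a statistics library would be needed.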

Displaying fitted hazard and survivor functions: Substitute in prototypical predictor values and compute fitted values (ALDA, Section , pp )
Model B:
In logit hazard scale: a constant vertical separation between the fitted functions.
In hazard scale: a non-constant vertical separation (no simple interpretation, because this is a proportional odds model, not a proportional hazards model!).
The effect of PT cumulates into a large difference in estimated median lifetimes (9.9 vs years).
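The contrast between the two scales is easy to reproduce. In the sketch below the baseline logit hazards are hypothetical values, while the shift is taken as ln(2.4), the coefficient implied by the rounded odds ratio for PT reported earlier; neither should be read as the fitted Model B.

```python
import math

def inv_logit(x):
    return 1.0 / (1.0 + math.exp(-x))

# Hypothetical baseline logit hazards for grades 7 to 12 (NOT fitted values).
toy_alpha = [-3.0, -3.5, -2.3, -1.8, -1.6, -1.2]
beta_pt = math.log(2.4)   # implied by the slide's rounded odds ratio

# Fitted hazards for the two prototypical groups.
hazard_pt0 = [inv_logit(a) for a in toy_alpha]
hazard_pt1 = [inv_logit(a + beta_pt) for a in toy_alpha]

# Separation on each scale.
logit_gaps = [(a + beta_pt) - a for a in toy_alpha]
hazard_gaps = [h1 - h0 for h0, h1 in zip(hazard_pt0, hazard_pt1)]

def survivor(hazards):
    surv, s = [], 1.0
    for h in hazards:
        s *= 1.0 - h
        surv.append(s)
    return surv

surv_pt0 = survivor(hazard_pt0)
surv_pt1 = survivor(hazard_pt1)
print([round(g, 3) for g in hazard_gaps])
print(round(surv_pt0[-1], 3), round(surv_pt1[-1], 3))
```

The gap on the logit scale is the same in every period, but the gap in hazard grows as the baseline hazard moves toward .5, and the period-by-period differences compound in the survivor functions.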

Pros and cons of the dummy specification for the main effect of TIME (ALDA, Section 12.1, pp )
PRO. The dummy specification for TIME is:
Completely general, placing no constraints on the shape of the baseline (logit) hazard function;
Easily interpretable: each associated parameter represents logit hazard in time period j for the baseline group;
Consistent with life-table estimates.
CON. The dummy specification for TIME is also:
Nothing more than an analytic decision, not a requirement of the discrete-time hazard model;
Completely lacking in parsimony: if J is large, it requires the inclusion of many unknown parameters;
A problem when it yields fitted functions that fluctuate erratically across time periods because of nothing more than sampling variation.
Three reasons for considering an alternative specification:
Your study involves many discrete time periods (because data collection is long or time is less coarsely discretized).
Hazard is expected to be near 0 in some time periods (causing convergence problems).
Some time periods have small risk sets (because either the initial sample is small or hazard and censoring dramatically diminish the risk set over time).
The variable PERIOD in the person-period data set can be treated as continuous TIME.

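Comparing the general specification to an ordered set of polynomials: Not necessarily the best, but a systematic set of informative choices (ALDA, Section , pp )
Order of polynomial | Behavior of logit hazard | n parameters | Model: logit h(t_ij) =
n/a | General | J | $\alpha_1 D_1 + \dots + \alpha_J D_J$
0 | Constant | 1 | $\alpha_0\,\mathrm{ONE}$
1 | Linear | 2 | $\alpha_0\,\mathrm{ONE} + \alpha_1(\mathrm{TIME}-c)$
2 | Quadratic | 3 | $\alpha_0\,\mathrm{ONE} + \alpha_1(\mathrm{TIME}-c) + \alpha_2(\mathrm{TIME}-c)^2$
3 | Cubic | 4 | $\alpha_0\,\mathrm{ONE} + \alpha_1(\mathrm{TIME}-c) + \alpha_2(\mathrm{TIME}-c)^2 + \alpha_3(\mathrm{TIME}-c)^3$
4 | 3 stationary points | 5 | $\alpha_0\,\mathrm{ONE} + \alpha_1(\mathrm{TIME}-c) + \alpha_2(\mathrm{TIME}-c)^2 + \alpha_3(\mathrm{TIME}-c)^3 + \alpha_4(\mathrm{TIME}-c)^4$
5 | 4 stationary points | 6 | $\alpha_0\,\mathrm{ONE} + \alpha_1(\mathrm{TIME}-c) + \alpha_2(\mathrm{TIME}-c)^2 + \alpha_3(\mathrm{TIME}-c)^3 + \alpha_4(\mathrm{TIME}-c)^4 + \alpha_5(\mathrm{TIME}-c)^5$
Notes:
The general specification is always the best fit (lowest deviance); the constant is always the worst fit (highest deviance).
The centering constant c helps interpretation.
Common choices: linear, quadratic, and cubic.
Fourth- and fifth-order polynomials are rarely adopted, but they give a sense of whether you should stick with the completely general specification.
Strategy for model comparison: because each lower-order model is nested within each higher-order model, deviance statistics can be directly compared to help make analytic decisions.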
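Building the polynomial predictors and counting degrees of freedom for the comparisons is mechanical. The helper names below are ours. The last line uses the 36 annual periods of the depression example that appears later in the deck, where a cubic is compared with the general specification on 32 df.

```python
def polynomial_terms(time, order, c):
    """Predictors for one person-period row: ONE, (TIME-c), ..., (TIME-c)**order."""
    return [1] + [(time - c) ** k for k in range(1, order + 1)]

def n_parameters(order):
    """A polynomial of a given order has order + 1 baseline parameters."""
    return order + 1

def df_vs_general(n_periods, order):
    """Degrees of freedom for testing a polynomial against the J-dummy model."""
    return n_periods - n_parameters(order)

# Cubic terms for TIME = 7 with centering constant c = 5 (arbitrary values).
print(polynomial_terms(7, order=3, c=5))

# At TIME = c every term except ONE vanishes, so alpha_0 is the logit hazard
# in period c. This is what makes centering helpful for interpretation.
print(polynomial_terms(5, order=3, c=5))

# Cubic vs. general with 36 periods.
print(df_vs_general(36, order=3))
```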

Examining alternative polynomial specifications for TIME: Deviance statistics and fitted logit hazard functions (ALDA, Section , pp )
Sample: 260 faculty members (who had received a National Academy of Education post-doc). Each was tracked for up to 9 years after taking his/her first academic job. By the end of data collection, n=166 (63.8%) had received tenure; the other 36.2% were censored (because they might eventually receive tenure somewhere). Data source: Gamse and Conger (1997), Abt Associates.
[Plot: fitted logit hazard functions for the general, constant, linear, quadratic, and cubic specifications]
The quadratic looks reasonably good, but can we test whether it's good enough?
Comparisons always worth making: Is the added polynomial term necessary? Is this polynomial as good as the general specification?
Constant is terrible.
Linear is better, but not as good as general.
Quadratic is better still, and nearly as good as general.
Cubic on up seem thoroughly unnecessary.

Including time-varying predictors: Age of onset of first depressive episode
Data source: Blair Wheaton and colleagues (1997), Stress & adversity across the life course (ALDA, Section 12.3, p 428)
Sample: 1,393 adults ages 17 to …; n=387 (27.8%) reported a first depression onset between ages 4 and 39.
Specification of the baseline hazard function:
Many person-periods (36,997) and very few actual events (387).
Annual data between ages 4 and 39 require 36 TIME dummies: hardly parsimonious.
A cubic function of TIME fits nearly as well ($\chi^2$ = 34.51, 32 df, p>.25) as a completely general specification, and measurably better ($\chi^2$ = 5.83, 1 df, p<.05) than a quadratic.
Time-varying predictor: first parental divorce.
n=145 (10.4%) experienced a first parental divorce while still at risk of first depression onset.
PD is time-varying, indicating whether the parents of individual i divorced during, or before, time period j: $PD_{ij} = 0$ in periods before the divorce; $PD_{ij} = 1$ in periods coincident with or subsequent to the divorce.
[Person-period data listing with columns: id, age, female, pd, event]
ID 40: reported first depression onset at age 23; first parental divorce at age 9.
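The coding rule for PD can be made concrete with the one case described on the slide, ID 40 (parental divorce at age 9, depression onset at age 23, at risk from age 4). The function name is hypothetical, and the female column is omitted because ID 40's value is not recoverable from this transcript.

```python
def person_period_with_pd(person_id, first_age, last_age, onset, divorce_age):
    """One row per year at risk, with PD switching from 0 to 1 at the divorce.

    onset is True when the person experienced the event at last_age and
    False when they were censored at last_age. divorce_age is None when no
    parental divorce was observed.
    """
    rows = []
    for age in range(first_age, last_age + 1):
        pd = int(divorce_age is not None and age >= divorce_age)
        event = int(onset and age == last_age)
        rows.append({"id": person_id, "age": age, "pd": pd, "event": event})
    return rows

# ID 40 as described on the slide.
rows_40 = person_period_with_pd(40, first_age=4, last_age=23,
                                onset=True, divorce_age=9)
for row in rows_40:
    print(row)
```

Because each row carries its own value of PD, the time-varying predictor enters the model exactly like any other column of the person-period data set.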

Including a time-varying predictor in the discrete-time hazard model (ALDA, Section , p )
[Plots: sample logit(proportions) of people experiencing first depression onset at each age, by PD status at that age; hypothesized population model (note the constant effect of PD); implicit particular realization of the population model (for those whose parents divorce when they're age 20)]
What does $\beta_1$ tell us?
It contrasts the population logit hazard for people who have experienced a parental divorce with that for people who have not.
But because $PD_{ij}$ is time-varying, membership in the parental divorce group changes over time, so we're not always comparing the same people.
The predictor effectively compares different groups of people at different times!
But we're still assuming that the effect of the time-varying predictor is constant over time.

Interpreting a fitted DT hazard model that includes a TV predictor (ALDA, Section , pp )
PD: $e^{\hat{\beta}} = 1.51$. Controlling for gender, at every age from 4 to 39, the estimated odds of first depression onset are about 50% higher for individuals who experienced a concurrent, or previous, parental divorce.
FEMALE: $e^{\hat{\beta}} = 1.73$. Controlling for parental divorce, the estimated odds of first depression onset are 73% higher for women.
What about a woman whose parents divorced when she was 20?
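One way to approach the closing question: in a main-effects model the two odds ratios multiply, and the PD effect switches on only from the age of the divorce onward. The sketch below computes the odds multiplier relative to a same-aged man whose parents have not divorced. It uses the two rounded odds ratios from the slide and assumes no interaction between the predictors; the function name is ours.

```python
ODDS_RATIO_PD = 1.51       # from the slide
ODDS_RATIO_FEMALE = 1.73   # from the slide

def odds_multiplier(age, female, divorce_age=None):
    """Odds of onset relative to a same-aged man with no parental divorce yet."""
    multiplier = 1.0
    if divorce_age is not None and age >= divorce_age:
        multiplier *= ODDS_RATIO_PD
    if female:
        multiplier *= ODDS_RATIO_FEMALE
    return multiplier

# A woman whose parents divorced when she was 20.
before_divorce = odds_multiplier(age=19, female=True, divorce_age=20)
after_divorce = odds_multiplier(age=20, female=True, divorce_age=20)
print(before_divorce, round(after_divorce, 2))
```

Before age 20 she differs from the reference man only by the gender effect; from age 20 onward both effects apply, giving roughly 2.6 times the odds.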

The proportionality assumption: Is a predictor's effect constant over time, or might it vary? (ALDA, Section , pp )
[Four panels:]
Predictor's effect is constant over time.
Predictor's effect increases over time.
Predictor's effect decreases over time.
Predictor's effect is particularly pronounced in certain time periods.

Discrete-time hazard models that do not invoke the proportionality assumption (ALDA, Section , pp )
A completely general representation: the predictor has a unique effect in each period.
A more parsimonious representation: the predictor's effect changes linearly with time. $\beta_1$ assesses the effect of $X_1$ in time period c; $\beta_2$ describes how this effect linearly increases (if positive) or decreases (if negative).
Another parsimonious representation: the predictor's effect differs across epochs. $\beta_2$ assesses the additional effect of $X_1$ during those time periods declared to be "later" in time.
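The three equations are missing from this transcript. Forms consistent with the parameter descriptions above would be the following, where the baseline symbols $\alpha_j$ and the epoch indicator name $LATE_j$ are our assumptions rather than text recovered from the slide:

```latex
% Completely general: a separate effect of X_1 in every period
\operatorname{logit} h(t_{ij}) = \left[\alpha_1 D_{1ij} + \cdots + \alpha_J D_{Jij}\right]
  + \beta_1 X_{1ij} D_{1ij} + \beta_2 X_{1ij} D_{2ij} + \cdots + \beta_J X_{1ij} D_{Jij}

% Effect of X_1 changes linearly with time
\operatorname{logit} h(t_{ij}) = \left[\alpha_1 D_{1ij} + \cdots + \alpha_J D_{Jij}\right]
  + \beta_1 X_{1ij} + \beta_2 X_{1ij}\,(TIME_j - c)

% Effect of X_1 differs across epochs
\operatorname{logit} h(t_{ij}) = \left[\alpha_1 D_{1ij} + \cdots + \alpha_J D_{Jij}\right]
  + \beta_1 X_{1ij} + \beta_2 X_{1ij} \times LATE_j
```

In the second form the effect of $X_1$ in period j is $\beta_1 + \beta_2 (TIME_j - c)$, which equals $\beta_1$ when $TIME_j = c$. In the third it is $\beta_1$ in early periods and $\beta_1 + \beta_2$ in periods where $LATE_j = 1$.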

The proportionality assumption: Uncovering violations and simple solutions (ALDA, Section 12.4, pp 443)
Data source: Graham (1997) dissertation.
Sample: 3,790 high school students who participated in the Longitudinal Survey of American Youth (LSAY).
Research design: tracked from 10th grade through the 3rd semester of college, a total of 5 periods. Only n=132 (3.5%) took a math class for all of the 5 periods!
RQs: When are students most at risk of dropping out of math? What's the effect of gender? Does the gender differential vary over time?
Risk of dropping out zig-zags over time: it peaks at 12th grade and the 2nd semester of college.
Magnitude of the gender differential varies over time: it is smallest in 11th grade and increases over time. This suggests that the proportionality assumption is being violated.

Checking the proportionality assumption: Is the effect of FEMALE constant over time? (ALDA, Section , pp )
All models include a completely general specification for TIME using 5 time dummies: HS11, HS12, COLL1, COLL2, and COLL
[Table of model comparisons; surviving entries: (4) ns; 6.50 (1) p=]
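The p-value attached to the surviving entry "6.50 (1)" is truncated in this transcript. If that entry is a difference in deviance referred to a chi-square distribution on 1 df (our reading of the fragment, not something the transcript confirms), the implied tail probability can be computed directly; the helper name is ours.

```python
import math

def chi2_sf_1df(x):
    """Upper-tail probability of a chi-square variate with 1 degree of freedom."""
    return math.erfc(math.sqrt(x / 2.0))

# Assumption: 6.50 is a deviance difference on 1 df, as the "(1)" suggests.
p_interaction = chi2_sf_1df(6.50)
print(round(p_interaction, 4))
```

Under that reading the single added parameter is statistically significant at the .05 level, which would agree with the earlier suggestion that the proportionality assumption is violated for FEMALE.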