Duration models
Bill Evans



Flow sample
[Figure: timeline running from t0 (initial period) to t2 (followup period), showing spells a through i]

Stock sample
[Figure: timeline with t0 (initial period), t1 (people sampled) and t2 (followup period), showing spells a through i]
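The difference between the two sampling schemes can be sketched in a few lines of Python. This is a minimal illustration under the usual textbook definitions, which the slides do not spell out: a flow sample keeps spells that begin between t0 and t2, while a stock sample keeps spells in progress at the sampling date t1. The spell dates and function names are invented for illustration.

```python
# Hypothetical spells as (start, end) dates; the values are invented.
spells = {"a": (1, 4), "b": (2, 9), "c": (6, 8), "d": (0, 3), "e": (5, 12)}

T0, T1, T2 = 1, 5, 10  # initial period, sampling date, end of followup


def flow_sample(spells, t0, t2):
    """Spells that start inside the observation window [t0, t2]."""
    return sorted(k for k, (start, _) in spells.items() if t0 <= start <= t2)


def stock_sample(spells, t1):
    """Spells in progress at t1 (started on or before t1, not yet ended)."""
    return sorted(k for k, (start, end) in spells.items() if start <= t1 < end)


def right_censored(spells, t2):
    """Spells still running when followup ends at t2."""
    return sorted(k for k, (_, end) in spells.items() if end > t2)


print(flow_sample(spells, T0, T2))
print(stock_sample(spells, T1))
print(right_censored(spells, T2))
```

In this toy data the stock sample picks up only the longer spells, which is the length-bias problem usually associated with stock sampling.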

Interpreting Coefficients
This is the same for Weibull, Exponential, and any other proportional hazard model
For Weibull, $\lambda(t_i) = \rho\gamma_i t_i^{\rho-1}$
$\gamma_i = \exp(\beta_0 + \beta_1 x_{1i} + x_{2i}\beta_2 + \dots + x_{ki}\beta_k)$

Suppose $x_{1i}$ is a dummy variable
When $x_{1i} = 1$, then $\gamma_{i1} = \exp(\beta_0 + \beta_1 + x_{2i}\beta_2 + \dots + x_{ki}\beta_k)$
When $x_{1i} = 0$, then $\gamma_{i0} = \exp(\beta_0 + x_{2i}\beta_2 + \dots + x_{ki}\beta_k)$

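Let $\lambda_{i1}$ be the hazard when $x_{1i} = 1$ and $\lambda_{i0}$ the hazard when $x_{1i} = 0$
Proportional change in hazard:
$(\lambda_{i1} - \lambda_{i0})/\lambda_{i0} = (\rho\gamma_{i1}t_i^{\rho-1} - \rho\gamma_{i0}t_i^{\rho-1})/(\rho\gamma_{i0}t_i^{\rho-1}) = \exp(\beta_1) - 1$
Multiplied by 100, this is the percentage change in the hazard when $x_{1i}$ turns from 0 to 1
STATA prints out $\exp(\beta_1)$, just subtract 1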
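A quick numerical check of this result, as a minimal Python sketch. The parameter values are invented for illustration and are not estimates from the slides; `xb_rest` is a hypothetical name for the contribution of the remaining covariates.

```python
import math


def weibull_hazard(t, rho, gamma):
    """Weibull proportional hazard: lambda(t) = rho * gamma * t**(rho - 1)."""
    return rho * gamma * t ** (rho - 1)


def proportional_change(t, rho, beta0, beta1, xb_rest):
    """(lambda_i1 - lambda_i0) / lambda_i0 when the dummy x1 switches from 0 to 1."""
    gamma1 = math.exp(beta0 + beta1 + xb_rest)  # x1 = 1
    gamma0 = math.exp(beta0 + xb_rest)          # x1 = 0
    lam1 = weibull_hazard(t, rho, gamma1)
    lam0 = weibull_hazard(t, rho, gamma0)
    return (lam1 - lam0) / lam0


# Illustrative parameter values (assumptions, not estimates from the slides).
beta1 = 0.25
for t in (10.0, 500.0, 3288.0):
    change = proportional_change(t, rho=1.16, beta0=-9.0, beta1=beta1, xb_rest=0.4)
    print(t, round(change, 6), round(math.exp(beta1) - 1, 6))
```

The two printed columns agree at every duration, because the factor that depends on t and ρ cancels in the ratio.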

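Suppose $x_{2i}$ is continuous
Suppose we increase $x_{2i}$ by 1 unit
$\gamma_{i1} = \exp(\beta_0 + \beta_1 x_{1i} + x_{2i}\beta_2 + \dots + x_{ki}\beta_k)$
$\gamma_{i2} = \exp(\beta_0 + \beta_1 x_{1i} + (x_{2i}+1)\beta_2 + \dots + x_{ki}\beta_k)$
Can show that
$(\lambda_{i2} - \lambda_{i1})/\lambda_{i1} = (\rho\gamma_{i2}t_i^{\rho-1} - \rho\gamma_{i1}t_i^{\rho-1})/(\rho\gamma_{i1}t_i^{\rho-1}) = \exp(\beta_2) - 1$
Multiplied by 100, this is the percentage change in the hazard for a 1 unit increase in $x_{2i}$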
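The step behind "can show that" is short: the factor $\rho t_i^{\rho-1}$ is common to both hazards and cancels, leaving a ratio of exponentials that differ only in the $\beta_2$ term. Written out with the same symbols as a sketch of the algebra:

```latex
\frac{\lambda_{i2} - \lambda_{i1}}{\lambda_{i1}}
  = \frac{\gamma_{i2}}{\gamma_{i1}} - 1
  = \frac{\exp\bigl(\beta_0 + \beta_1 x_{1i} + (x_{2i}+1)\beta_2 + \dots + x_{ki}\beta_k\bigr)}
         {\exp\bigl(\beta_0 + \beta_1 x_{1i} + x_{2i}\beta_2 + \dots + x_{ki}\beta_k\bigr)} - 1
  = \exp(\beta_2) - 1
```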

NLMS
National Longitudinal Mortality Study
Match of monthly CPS data sets to National Death Index
Public Use version
–Five monthly CPS data sets from
–637,162 people
–Each followed for 9 years (3288 days)
Our sample
–Males, 50-70, who were married at the time of the survey
–Used to examine bereavement effect

Key Variables
followh -- days of followup for husband (max is 3288)
deathh -- =1 if husband dies during followup
Note: if deathh=0, then followh=3288
deathh identifies whether the observation is censored (deathh=0 means censored)
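How the duration and censoring indicator fit together can be sketched in Python. The helper `to_duration_record` and its argument are hypothetical, introduced only to illustrate right censoring at the end of followup; the input values are invented.

```python
MAX_FOLLOWUP = 3288  # days of followup, as stated above


def to_duration_record(days_until_death):
    """Return (followh, deathh) for one husband.

    days_until_death is None when no death is recorded (hypothetical input).
    """
    if days_until_death is not None and days_until_death <= MAX_FOLLOWUP:
        return days_until_death, 1   # spell ends in failure
    return MAX_FOLLOWUP, 0           # right-censored at the end of followup


# Invented inputs: an early death, no recorded death, and a death after followup.
records = [to_duration_record(d) for d in (412, None, 5000)]
print(records)
```

Every censored record carries followh = 3288, which matches the note that deathh=0 implies followh=3288.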

Summary statistics (Obs, Mean, Std. Dev., Min, Max) are reported for: followh, followw, age, educ, income, raceh1, raceh2, deathh, deathw, hhid


educh
–=1 if <8 years
–=2 if 9-11 years
–=3 if 12 years
–=4 if years
–=5 if 16+ years

income (Family income)
–1: < $5K
–2: ≥ $5K, < $10K
–3: ≥ $10K, < $15K
–4: ≥ $15K, < $20K
–5: ≥ $20K, < $25K
–6: ≥ $25K, < $50K
–7: ≥ $50K

Duration Data in STATA
Need to identify the variable that measures duration
stset length, failure(failvar)
length = duration variable
failvar = 1 when durations end in failure, = 0 for censored values
If all data is uncensored, omit failure(failvar)
In our case
stset followh, failure(deathh)

Kaplan-Meier Curves
Graph of raw data
What fraction of people exit the sample in each period
“Risk set” includes people who make it to the next period
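The calculation behind a Kaplan-Meier curve can be sketched in plain Python. This is a minimal illustration of the product-limit estimator on invented data, not a reproduction of what Stata does internally; the function name is hypothetical.

```python
from collections import Counter


def kaplan_meier(durations, events):
    """Return [(time, survival)] at each distinct failure time.

    durations: observed spell lengths; events: 1 = failure, 0 = censored.
    """
    deaths = Counter(t for t, d in zip(durations, events) if d == 1)
    exits = Counter(durations)          # failures and censorings leaving at t
    at_risk = len(durations)
    survival = 1.0
    curve = []
    for t in sorted(exits):
        if deaths[t]:
            # Fraction of the current risk set that survives past t.
            survival *= 1 - deaths[t] / at_risk
            curve.append((t, survival))
        at_risk -= exits[t]             # everyone observed at t leaves the risk set
    return curve


# Toy data, invented for illustration: 6 spells, two of them censored.
durations = [2, 3, 3, 5, 8, 8]
events = [1, 1, 0, 1, 1, 0]
for t, s in kaplan_meier(durations, events):
    print(t, round(s, 4))
```

Censored spells never trigger a drop in the curve, but they do shrink the risk set used at later failure times.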

Getting Kaplan-Meier Curves
Tabular presentation of results
sts list
Graphical presentation
sts graph
Results by subgroup
sts graph, by(educ)
Graph hazard functions
sts graph, hazard


MLE of duration model with Covariates
Basic syntax
streg covariates, d(distribution)
streg age raceh1 raceh2 _Ie* _Ii*, d(weibull) nohr;
In this model, STATA will print out exp(β)
If you want the coefficients, add the ‘nohr’ option (no hazard ratio)
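To make the "MLE" part concrete, here is a minimal Python sketch of the log-likelihood for a Weibull proportional-hazards model with right censoring. It uses the hazard λ(t) = ργt^(ρ-1) from the earlier slides, which implies the survivor function S(t) = exp(-γt^ρ). The function name and toy data are assumptions, and this is not Stata's implementation.

```python
import math


def weibull_loglik(rho, betas, rows):
    """Log-likelihood of a Weibull proportional-hazards model.

    rows: iterable of (t, d, x), with duration t > 0, failure indicator d
    (1 = failure, 0 = censored) and covariate list x whose first entry is 1
    for the constant.
    """
    total = 0.0
    for t, d, x in rows:
        log_gamma = sum(b * xj for b, xj in zip(betas, x))
        log_hazard = math.log(rho) + log_gamma + (rho - 1) * math.log(t)
        cum_hazard = math.exp(log_gamma) * t ** rho
        # Failures contribute log hazard + log S(t); censored spells only log S(t).
        total += d * log_hazard - cum_hazard
    return total


# Toy data, invented for illustration: (duration, failed?, [constant, dummy]).
rows = [(5.0, 1, [1, 0]), (8.0, 0, [1, 0]), (3.0, 1, [1, 1]), (6.0, 1, [1, 1])]
print(weibull_loglik(rho=1.0, betas=[-2.0, 0.5], rows=rows))
```

Setting rho to 1 gives the exponential model's log-likelihood, so the exponential is nested in the Weibull.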

Whites have higher mortality than Hispanics – Hispanic “paradox”
Mortality is falling in education, but it is not monotonic

Mortality is monotonic in income
Weibull parameter: hazard is increasing in duration


The magnitude (> or < 1) of the parameters is informative
–Hazard increasing in age
–Whites, Blacks have higher mortality rates
–Hazard decreases with income and education
–P-value is for the test that parameter = 1
The Weibull parameter ρ =
–Check 95% confidence interval (1.14, 1.19). Can reject null ρ=1 (exponential)
–Low probability ρ<1
–Hazard is increasing over time
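What ρ > 1 means for duration dependence can be illustrated with a short Python sketch. It uses the Weibull hazard λ(t) = ργt^(ρ-1), so the ratio of hazards at two durations depends only on ρ. The function name is hypothetical, and the two ρ values are the end points of the confidence interval quoted above.

```python
def hazard_ratio_over_time(t_late, t_early, rho):
    """lambda(t_late) / lambda(t_early) for a Weibull hazard; gamma cancels."""
    return (t_late / t_early) ** (rho - 1)


# End points of the 95% confidence interval for rho quoted above.
for rho in (1.14, 1.19):
    print(rho, round(hazard_ratio_over_time(2.0, 1.0, rho), 3))

# rho = 1 is the exponential case: a flat hazard.
print(hazard_ratio_over_time(2.0, 1.0, 1.0))
```

Across the whole interval, doubling the duration raises the hazard, which is why the exponential model's constant hazard is rejected.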

Interpret coefficients
Age: every year of age, hazard increases by 8%
Black, non-Hispanics: have 41% greater hazard than Hispanics
White, non-Hispanics: 24.5% greater hazard than Hispanics
Notice results are
–Monotonic in income
–Nearly monotonic in education
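Converting a hazard ratio into a percentage change is a one-line calculation. The Python sketch below uses the age hazard ratio of 1.079 that appears in the comparison table; the function name and the ten-year example are illustrative additions.

```python
def pct_change(hazard_ratio, units=1):
    """Percent change in the hazard for a `units` increase in the covariate."""
    return (hazard_ratio ** units - 1) * 100


age_hr = 1.079  # age hazard ratio from the comparison table
print(round(pct_change(age_hr), 1))        # one extra year of age
print(round(pct_change(age_hr, 10), 1))    # ten extra years compound multiplicatively
```

Because hazard ratios multiply, the effect of ten years is larger than ten times the one-year effect.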

Educ 5: those with college degree .762 – 1 = or a 32.8% lower hazard than those with <9 years of school
Income 5, those with >$50K in income have a 0.44 – 1 = or a 54% lower hazard than those with income <$5K

To run an exponential, just change the distribution
. streg age raceh1 raceh2 _Ie* _Ii*, d(exp);
Exponential regression -- log relative-hazard form
No. of failures = 7404
Also reported: No. of subjects, Number of obs, Time at risk, LR chi2(13), Log likelihood, Prob > chi2
Output columns: Haz. Ratio, Std. Err., z, P>|z|, [95% Conf. Interval]
Output rows: age, raceh1, raceh2, _Ieduc_2 to _Ieduc_5, _Iincome_2 to _Iincome_7

Cox models
. stcox age raceh1 raceh2 _Ie* _Ii*;

. * run cox proportional hazards model;
. stcox age raceh1 raceh2 _Ie* _Ii*;
failure _d: deathh
analysis time _t: followh
id: hhid
Cox regression -- Breslow method for ties
No. of failures = 7404
Also reported: No. of subjects, Number of obs, Time at risk, LR chi2(13), Log likelihood, Prob > chi2
Output columns: Haz. Ratio, Std. Err., z, P>|z|, [95% Conf. Interval]
Output rows: age, raceh1, raceh2, _Ieduc_2 to _Ieduc_5, _Iincome_2 to _Iincome_7

Comparing Hazard Ratios
            Expon.            Weibull     Cox
Age         1.079 (0.0024)    (0.0024)    (0.0024)
Raceh       (0.1145)          (0.1148)    (0.1149)
Raceh       (0.141)           (0.142)     (0.142)
Educ        (0.0357)          (0.0357)    (0.0357)
Educ        (0.0279)          (0.0280)    (0.0279)
Educ        (0.0407)          (0.0419)    (0.0419)
Educ        (0.0355)          (0.0359)    (0.0354)
Income      (0.0342)          (0.0359)    (0.0340)
Income      (0.0294)          (0.0293)    (0.0293)
Income      (0.0347)          (0.0345)    (0.0344)

Time-varying covariates
The examples so far have examined the impact of time-invariant covariates on outcomes
It can be the case that time-varying covariates matter as well
–What happens to a jobless spell when UI benefits run out?

Example: Bereavement Effect
Heightened mortality after the death of a spouse
Especially pronounced in the 2 years after spouse’s death
Many possible measures
Time-varying covariate – the dummy variable turns on the day your spouse dies ahead of you

followh is the husband’s duration measure
followw is the wife’s
If followw < followh, the wife dies before the husband

. stsplit bereavement, after(time=followw) at(0);
(2771 observations (episodes) created)
. recode bereavement -1=0 0=1;
(bereavement: changes made)
. stcox age raceh1 raceh2 _Ie* _Ii* bereavement;
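The episode splitting performed here can be mimicked in Python to show what the data look like afterwards. This is a minimal sketch of the idea (one record becomes two when the wife dies first), not a reimplementation of stsplit; the function name and example values are invented.

```python
def split_at_bereavement(followh, deathh, followw):
    """Split one husband's spell at the wife's death.

    Returns a list of episodes (start, stop, died, bereavement). Splitting
    happens only if the wife dies strictly before the husband's spell ends.
    """
    if followw < followh:
        return [
            (0, followw, 0, 0),             # before bereavement: no failure yet
            (followw, followh, deathh, 1),  # after bereavement: original outcome
        ]
    return [(0, followh, deathh, 0)]


# Invented example values (days).
print(split_at_bereavement(followh=2000, deathh=1, followw=900))
print(split_at_bereavement(followh=3288, deathh=0, followw=3288))
```

The first episode of a split record always ends without a failure, so the husband's death, if any, is attributed to the period in which bereavement equals 1.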

. stcox age raceh1 raceh2 _Ie* _Ii* bereavement;
Cox regression -- Breslow method for ties
No. of failures = 7404
Also reported: No. of subjects, Number of obs, Time at risk, LR chi2(14), Log likelihood, Prob > chi2
Output columns: Haz. Ratio, Std. Err., z, P>|z|, [95% Conf. Interval]
Output rows: age, raceh1, raceh2, _Ieduc_2 to _Ieduc_5, _Iincome_2 to _Iincome_7, bereavement