Lecture 15: Time-Varying Covariates

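Time-Dependent Covariates
Thus far we've only considered "fixed" time covariates, whose values are recorded once, at baseline.
Examples of time-varying covariates:
– Cumulative exposure
– Smoking status
– Blood pressure
Now the data structure for each subject is $[T, \delta, Z(t);\ 0 \le t \le T]$: the observed time $T$, the event indicator $\delta$, and the covariate history $Z(t)$ up to time $T$.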

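CPHM with Time-Varying Covariates
The model looks like what we've been working with; now, however, $Z$ is a function of $t$:
$$h(t \mid Z(t)) = h_0(t)\exp\{\beta^\top Z(t)\} = h_0(t)\exp\Big\{\sum_{k=1}^{p} \beta_k Z_k(t)\Big\}$$
The hazard at time $t$ depends on the covariate value at time $t$, so the hazard ratio between two subjects can change over time even though $\beta$ is constant.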

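Likelihood with Time-Varying Covariates
Again, we can use the partial likelihood to estimate $\beta$, but $Z$ is now a function of $t$ (as in the model statement). With distinct event times $t_1 < \dots < t_D$, $Z_{(i)}(t_i)$ the covariates of the subject failing at $t_i$, and $R(t_i)$ the risk set at $t_i$:
$$L(\beta) = \prod_{i=1}^{D} \frac{\exp\{\beta^\top Z_{(i)}(t_i)\}}{\sum_{j \in R(t_i)} \exp\{\beta^\top Z_j(t_i)\}}$$
Each subject in the risk set contributes its covariate value at the event time $t_i$. Otherwise, testing and estimation are the same as for fixed covariates.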
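As a check on the partial likelihood, here is a minimal sketch that evaluates it by hand on counting-process rows and compares the result with the value `coxph` reports. The toy data are invented for illustration, `log_pl` is a hypothetical helper (not part of the survival package), and it assumes no tied event times, where the Breslow and Efron forms coincide.

```r
library(survival)

# Toy counting-process data (made up): subject 1 has z = 0 on (0, 2] and
# z = 1 on (2, 5]; every other subject keeps a single covariate value.
d <- data.frame(
  id     = c(1, 1, 2, 3, 4, 5),
  tstart = c(0, 2, 0, 0, 0, 0),
  tstop  = c(2, 5, 4, 3, 6, 4.5),
  status = c(0, 1, 0, 1, 0, 1),
  z      = c(0, 1, 1, 0, 0, 1)
)

# Log partial likelihood for one covariate z (assumes no tied event times).
log_pl <- function(beta, d) {
  ll <- 0
  for (t in sort(unique(d$tstop[d$status == 1]))) {
    at_risk <- d$tstart < t & d$tstop >= t   # rows whose interval (tstart, tstop] covers t
    failed  <- d$status == 1 & d$tstop == t  # the single row that fails at t
    # numerator: z of the failing subject at t; denominator: every subject
    # in the risk set, each with its covariate value at time t
    ll <- ll + sum(beta * d$z[failed]) - log(sum(exp(beta * d$z[at_risk])))
  }
  ll
}

fit <- coxph(Surv(tstart, tstop, status) ~ z, data = d)
c(by_hand = log_pl(unname(coef(fit)), d), coxph = fit$loglik[2])
```

At $\beta = 0$ each event contributes one over the size of its risk set (5, 3 and 2 rows at times 3, 4.5 and 5), so `log_pl(0, d)` is $-\log 30$, which should match `fit$loglik[1]`. Note that subject 1 enters the risk set at time 3 with $z = 1$, its value at that time, not its baseline value.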

Example: Bone Marrow Transplant
Main covariate of interest is disease type:
– ALL
– low-risk AML
– high-risk AML
Interest is in determining factors associated with disease-free survival (DFS; time to death or relapse, whichever comes first).

BMT Fixed Time Covariates
There are several fixed time covariates we've found to be important:
– Patient age
– Donor age
– FAB identification
– Disease type
– Hospital

BMT Time-Varying Covariates
There are also several time-varying covariates:
– Acute graft vs. host disease (AGvHD)
– Chronic graft vs. host disease (CGvHD)
– Platelet recovery (PR)
These all occur after BMT or not at all, so a patient's status on each can change over the course of the study.

R: Time-Varying Covariates
Expand the data to describe all scenarios; we need to consider the possible combinations of events.
Example: AGvHD and DFS
– Possible scenarios at any point in time during the study for a given subject: no AGvHD yet (DFS event or not?), or AGvHD already occurred (DFS event or not?)
– For all patients with time to AGvHD < DFS time, we need two rows in the dataset to describe the change
– For all patients without AGvHD before their DFS time, we need only one row in the dataset

Timeline Examples: Observed Event
Patient who develops AGvHD at $t_a$:
– $t_0$ to $t_a$: no AGvHD, no event
– $t_a$ to $t_e$: AGvHD, event at $t_e$
Patient who never develops AGvHD:
– $t_0$ to $t_e$: no AGvHD, event at $t_e$

Timeline Examples: Censored Event
Patient who develops AGvHD at $t_a$:
– $t_0$ to $t_a$: no AGvHD, no event
– $t_a$ to $t_c$: AGvHD, no event (censored at $t_c$)
Patient who never develops AGvHD:
– $t_0$ to $t_c$: no AGvHD, no event (censored at $t_c$)

Time-Varying Covariates
First, look at each time-varying covariate: which (if any) are associated with DFS, adjusting for diagnosis?
Estimation and inference are the same as with fixed time covariates; the difference is the data structure.

Data Set-up
    > data[1:15, c(1, 25, 4:8)]
    ID Disease DFS Death Relapse Either TAGvH AGvH

Expansion
Consider row 1 (AGvHD at day 67, DFS time 2081). It now becomes two rows:
– Row 1: start time = 0, stop time = 67, agvhd = 0, …
– Row 2: start time = 67, stop time = 2081, agvhd = 1, …
Consider row 2 (no AGvHD before DFS time 1602). It is still one row:
– Row 1: start time = 0, stop time = 1602, agvhd = 0, …
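The same expansion can be produced with `tmerge()` from the survival package, a different tool from the hand-written loop used in these slides. A minimal sketch for the two patients above; their censoring status (`Either = 0`) and the use of `NA` to mean "no AGvHD observed" are assumptions made for illustration.

```r
library(survival)

# Patients 1 and 2 from the slide. Censoring status (Either = 0) is assumed,
# and NA marks "no AGvHD observed".
base <- data.frame(
  ID     = c(1, 2),
  DFS    = c(2081, 1602),
  Either = c(0, 0),
  TAGvH  = c(67, NA)
)

# event() sets each patient's follow-up to (0, DFS] with the DFS event flag;
# tdc() adds a covariate that is 0 before TAGvH and 1 after it.
long <- tmerge(base, base, id = ID,
               dfs_event = event(DFS, Either),
               agvhd = tdc(TAGvH))
long[, c("ID", "tstart", "tstop", "agvhd", "dfs_event")]
```

Patient 1 should come out as two rows, (0, 67] with `agvhd = 0` and (67, 2081] with `agvhd = 1`, and patient 2 as a single row (0, 1602] with `agvhd = 0`, matching the rows listed above.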

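What About Dependence?
You might be asking whether we need to worry about correlated data, since each subject now contributes several rows. In this case we do not: a subject's intervals do not overlap and the subject has at most one event, so at any time the subject appears in the risk set only once, exactly as if it were a single row whose covariate changes value. There are two exceptions:
– When subjects have multiple events
– When a subject appears in overlapping intervals
The second case is almost always a data error. (A subject can legitimately be at risk in multiple strata at the same time; this corresponds to being simultaneously at risk for two distinct outcomes.)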
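To screen an expanded dataset for the overlapping-interval case, a minimal base-R sketch follows. `find_overlaps` is a hypothetical helper, and the column names `ID`, `Tstart`, `Tstop` are assumptions matching these slides. In a stratified analysis, rows that legitimately overlap in different strata would also be flagged, so the check would need to be run within each stratum.

```r
# Hypothetical helper: IDs whose (Tstart, Tstop] intervals overlap.
find_overlaps <- function(d, id = "ID", start = "Tstart", stop = "Tstop") {
  d <- d[order(d[[id]], d[[start]]), ]
  n <- nrow(d)
  same_id  <- d[[id]][-1] == d[[id]][-n]       # consecutive rows, same subject
  overlaps <- d[[start]][-1] < d[[stop]][-n]   # next row starts before this one stops
  unique(d[[id]][-1][same_id & overlaps])
}

ok  <- data.frame(ID = c(1, 1, 2), Tstart = c(0, 67, 0), Tstop = c(67, 2081, 1602))
bad <- data.frame(ID = c(1, 1, 2), Tstart = c(0, 50, 0), Tstop = c(67, 2081, 1602))
find_overlaps(ok)   # numeric(0): no overlaps
find_overlaps(bad)  # 1: rows (0, 67] and (50, 2081] overlap
```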

R Expansion
    # Expand bmt to counting-process (Tstart, Tstop] format: one row per
    # interval between successive covariate-change times, up to DFS
    adata <- bmt[, c(1:2, 14:23)]  # fixed time columns
    pieces <- vector("list", nrow(bmt))
    for (i in seq_len(nrow(bmt))) {
      times1 <- c(bmt$TAGvH[i], bmt$TCGvH[i], bmt$TRP[i], bmt$DFS[i])
      events <- c(bmt$AGvH[i], bmt$CGvH[i], bmt$RP[i], bmt$Either[i])
      utimes <- sort(unique(times1[times1 <= times1[4]]))  # change times up to DFS
      vec <- c(0, 0, 0, 0)  # AGvH, CGvH, PR status and the DFS event flag
      rows <- vector("list", length(utimes))
      for (j in seq_along(utimes)) {
        if (j > 1) {
          # covariates that switched on at the start of this interval
          loc <- which(times1 == utimes[j - 1])
          vec[loc] <- events[loc]
        }
        if (j == length(utimes)) vec[4] <- events[4]  # event flag only on the last row
        rows[[j]] <- data.frame(Tstart = if (j == 1) 0 else utimes[j - 1],
                                Tstop = utimes[j], adata[i, ],
                                AGvH = vec[1], CGvH = vec[2], PR = vec[3], event = vec[4])
      }
      pieces[[i]] <- do.call(rbind, rows)
    }
    bmt.long <- do.call(rbind, pieces)  # no hard-coded row count needed
    sum(bmt.long$event)  # should equal sum(bmt$Either)

Expanded Data
    > bmt[1:2, ]
    ID Disease TTD TTR Death Relapse Either TAGvH AGvH TCGvH CGvH TRP RP PtAge …
    > bmt.long[1:8, ]
    Tstart Tstop ID Disease PtAge AGvH CGvH PR event …

Alternatively Use: expand.breakpoints
The previous code creates intervals from each patient's own covariate-change times. expand.breakpoints (written by John Maindonald) instead expands the dataset into rows per person using either the observed times or a pre-specified set of breakpoints.

expand.breakpoints Approach
    > bps <- sort(unique(c(bmt$DFS, bmt$TAGvH, bmt$TCGvH, bmt$TRP)))
    > bps
    [1] … [215]
    > bmt.long2 <- expand.breakpoints(bmt, index = "id", status = "Either", tevent = "DFS", breakpoints = bps)
    > bmt.long2
    ID Tstart Tstop Either epoch Disease TTD TTR Death Relapse TAGvH AGvH TCGvH CGvH TRP RP …

Still Not Done
That provides us with separate intervals per patient for all intervals of interest, BUT it treats AGvHD, CGvHD, and PR as "fixed" time covariates: every row still carries the patient's overall indicator. We need to create time-dependent versions.

R
    # create time-dependent covariates: 1 once the event has occurred by the start of the interval
    > bmt.long2$AGvHt <- ifelse(bmt.long2$TAGvH <= bmt.long2$Tstart & bmt.long2$AGvH == 1, 1, 0)
    > bmt.long2$CGvHt <- ifelse(bmt.long2$TCGvH <= bmt.long2$Tstart & bmt.long2$CGvH == 1, 1, 0)
    > bmt.long2$PRt <- ifelse(bmt.long2$TRP <= bmt.long2$Tstart & bmt.long2$RP == 1, 1, 0)
    # look again at patient 1 to compare the fixed and time-dependent versions
    > bmt.long2$AGvH[which(bmt.long2$ID == 1)]
    [1] … [175]
    > bmt.long2$AGvHt[which(bmt.long2$ID == 1)]
    [1] … [175]
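expand.breakpoints is not part of the survival package. A comparable split at pre-specified breakpoints can be sketched with `survSplit()`, a different function from the one in the slides, followed by the same `ifelse()` rule. The three patients and their times are invented for illustration; with the formula interface, `survSplit` keeps the original name `DFS` for the interval stop time.

```r
library(survival)

# Invented data in the slides' wide format; TAGvH = DFS time when no AGvHD.
bmt_toy <- data.frame(
  ID     = c(1, 2, 3),
  DFS    = c(2081, 1602, 400),
  Either = c(0, 0, 1),
  TAGvH  = c(67, 1602, 30),
  AGvH   = c(1, 0, 1)
)

# Cut every patient's follow-up at all distinct observed times; cuts at or
# beyond a patient's DFS time are not used for that patient.
bps <- sort(unique(c(bmt_toy$DFS, bmt_toy$TAGvH)))
long2 <- survSplit(Surv(DFS, Either) ~ ., data = bmt_toy, cut = bps, start = "Tstart")
long2 <- long2[order(long2$ID, long2$Tstart), ]

# Same rule as above: 1 once AGvHD has occurred by the start of the interval.
long2$AGvHt <- ifelse(long2$TAGvH <= long2$Tstart & long2$AGvH == 1, 1, 0)
long2[, c("ID", "Tstart", "DFS", "Either", "AGvHt")]
```

Patient 1 is split at 30, 67, 400 and 1602 into five rows, with `AGvHt` switching to 1 from the interval starting at day 67; the DFS event flag stays on the last row of each patient only.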

Syntax in R
To define the time-to-event variable, there are two options:
– Surv(time, y)
– Surv(start.time, stop.time, y)
For time-varying covariates (or left-truncated data), it is usually simpler to use the latter convention; in most other cases, the former is simpler.
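To see that the (start, stop] form changes only the bookkeeping, here is a sketch using the `veteran` data shipped with the survival package (a stand-in for the BMT data). Splitting each patient's follow-up at arbitrary cut points, with fixed covariates copied to every piece, leaves the risk sets and hence the Cox estimates unchanged; the cut points 90 and 180 are arbitrary choices.

```r
library(survival)

# One row per patient, Surv(time, status)
fit_one <- coxph(Surv(time, status) ~ karno + age, data = veteran)

# Same patients split at days 90 and 180, Surv(tstart, time, status)
vet2 <- survSplit(Surv(time, status) ~ ., data = veteran, cut = c(90, 180))
fit_split <- coxph(Surv(tstart, time, status) ~ karno + age, data = vet2)

cbind(one_row = coef(fit_one), split = coef(fit_split))
```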

Testing Time-Varying Covariates Controlling for Diagnosis
    # Acute graft vs. host disease
    rega <- coxph(Surv(Tstart, Tstop, Either) ~ AGvHt + factor(Disease), data = bmt.long2)
    # Chronic graft vs. host disease
    regc <- coxph(Surv(Tstart, Tstop, Either) ~ CGvHt + factor(Disease), data = bmt.long2)
    # Platelet recovery
    regp <- coxph(Surv(Tstart, Tstop, Either) ~ PRt + factor(Disease), data = bmt.long2)

AGvHD
    > rega
    Call:
    coxph(formula = Surv(Tstart, Tstop, Either) ~ AGvHt + factor(Disease), data = bmt.long2)
                       coef exp(coef) se(coef) z p
    AGvHt
    factor(Disease)2
    factor(Disease)3
    Likelihood ratio test=14.7 on 3 df, p=
    n= 19070, number of events= 83

CGvHD
    > regc
    Call:
    coxph(formula = Surv(Tstart, Tstop, Either) ~ CGvHt + factor(Disease), data = bmt.long2)
                       coef exp(coef) se(coef) z p
    CGvHt
    factor(Disease)2
    factor(Disease)3
    Likelihood ratio test=13.9 on 3 df, p=
    n= 19070, number of events= 83

Platelet Recovery
    > regp
    Call:
    coxph(formula = Surv(Tstart, Tstop, Either) ~ PRt + factor(Disease), data = bmt.long2)
                       coef exp(coef) se(coef) z p
    PRt
    factor(Disease)2
    factor(Disease)3
    Likelihood ratio test=22.9 on 3 df, p=4.32e-05
    n= 19070, number of events= 83

Interpretation?
– Patients with low-risk AML have a lower risk of an event compared to ALL patients
– Patients with high-risk AML have a greater risk of an event relative to patients with ALL
– Patients who have experienced platelet recovery by a given time have a lower risk of an event at that time than those who have not yet experienced platelet recovery

Back to Our Original Models
Only platelet recovery is significantly associated with disease-free survival. Now investigate a model that also adjusts for the previously mentioned fixed time covariates:
– Disease type
– FAB
– Donor/patient age and their interaction
– Hospital

Models with and without PRt
    # Model w/ donor/patient age, their interaction, FAB, diagnosis, & PR
    > st <- Surv(bmt.long2$Tstart, bmt.long2$Tstop, bmt.long2$Either)
    > reg.fixed <- coxph(st ~ factor(Disease) + FAB + PtAge + DonAge + PtAge*DonAge, data = bmt.long2)
    > reg.tv <- coxph(st ~ factor(Disease) + PRt, data = bmt.long2)
    > reg.all <- coxph(st ~ factor(Disease) + FAB + PtAge + DonAge + PtAge*DonAge + PRt, data = bmt.long2)
    # LRT for the 4 extra terms (FAB, PtAge, DonAge, PtAge:DonAge) given diagnosis and PRt
    > LRT <- 2*(reg.all$loglik[2] - reg.tv$loglik[2])
    > pchisq(LRT, 4, lower.tail = FALSE)
    [1]
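The likelihood ratio test above can be sketched on the `veteran` data from the survival package (a stand-in, not the BMT data): twice the difference in maximized log partial likelihoods is referred to a chi-square with as many df as extra parameters. As with reg.tv and reg.all, the models must be nested and fit to the same rows; `veteran` has no missing values, so that holds here.

```r
library(survival)

small <- coxph(Surv(time, status) ~ karno, data = veteran)
big   <- coxph(Surv(time, status) ~ karno + age + factor(celltype), data = veteran)

LRT      <- 2 * (big$loglik[2] - small$loglik[2])   # loglik[2] = log PL at the MLE
extra_df <- length(coef(big)) - length(coef(small))  # age + 3 celltype dummies = 4
p_value  <- pchisq(LRT, extra_df, lower.tail = FALSE)
c(LRT = LRT, df = extra_df, p = p_value)

anova(small, big)  # cross-check: should report the same chi-square on 4 df
```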

Recall Fixed Time Covariate Model
    > reg.fixed
    Call:
    coxph(formula = st ~ factor(Disease) + FAB + PtAge + DonAge + PtAge * DonAge, data = bmt.long)
                       coef exp(coef) se(coef) z p
    factor(Disease)2
    factor(Disease)3
    FAB
    PtAge
    DonAge
    PtAge:DonAge
    Likelihood ratio test=32.8 on 6 df, p=1.14e-05
    n= 342, number of events= 83

Time-Varying Covariate + Disease Type
    > reg.tv
    Call:
    coxph(formula = st ~ factor(Disease) + PR, data = bmt.long)
                       coef exp(coef) se(coef) z p
    factor(Disease)2
    factor(Disease)3
    PR
    Likelihood ratio test=22.9 on 3 df, p=4.32e-05
    n= 342, number of events= 83

Full Model
    > reg.all
    Call:
    coxph(formula = st ~ factor(Disease) + FAB + PtAge + DonAge + PtAge * DonAge + PR, data = bmt.long)
                       coef exp(coef) se(coef) z p
    factor(Disease)2
    factor(Disease)3
    FAB
    PtAge
    DonAge
    PR
    PtAge:DonAge
    Likelihood ratio test=39.9 on 7 df, p=1.3e-06
    n= 342, number of events= 83

Interactions Coding by Hand
    # Interaction coding (ages centered at 28):
    # Diagnosis 2 (low-risk AML) * PRt
    # Diagnosis 3 (high-risk AML) * PRt
    # FAB * PRt
    # PRt * donor age, PRt * patient age, PRt * donor age * patient age
    bmt.long2$ageint <- (bmt.long2$PtAge - 28) * (bmt.long2$DonAge - 28)
    bmt.long2$dx2.pr <- ifelse(bmt.long2$PRt == 1 & bmt.long2$Disease == 2, 1, 0)
    bmt.long2$dx3.pr <- ifelse(bmt.long2$PRt == 1 & bmt.long2$Disease == 3, 1, 0)
    bmt.long2$fab.pr <- bmt.long2$PRt * bmt.long2$FAB
    bmt.long2$dnr.pr <- bmt.long2$PRt * (bmt.long2$DonAge - 28)
    bmt.long2$pt.pr <- bmt.long2$PRt * (bmt.long2$PtAge - 28)
    bmt.long2$pt.pr.dnr <- bmt.long2$PRt * bmt.long2$ageint

Interactions
1. Diag 2 x PRt: "additional hazard of failure after platelet recovery in those with a diagnosis of low-risk AML vs. those with ALL"
2. Diag 3 x PRt: "additional hazard of failure after platelet recovery in those with a diagnosis of high-risk AML vs. those with ALL"
3. PRt x donor age: "additional hazard of failure after platelet recovery with an increase in donor age"
4. PRt x patient age: "additional hazard of failure after platelet recovery with an increase in patient age"
5. PRt x donor age x patient age (confusing): "additional hazard of failure after platelet recovery with an increase in the interaction between patient and donor age"

Series of Models
    reg1 <- coxph(st ~ factor(Disease) + FAB + DonAge + PtAge + ageint + PRt, data = bmt.long2)
    reg2 <- coxph(st ~ factor(Disease) + FAB + DonAge + PtAge + ageint + PRt + dx2.pr + dx3.pr, data = bmt.long2)
    reg3 <- coxph(st ~ factor(Disease) + FAB + DonAge + PtAge + ageint + PRt + fab.pr, data = bmt.long2)
    reg4 <- coxph(st ~ factor(Disease) + FAB + DonAge + PtAge + ageint + PRt + dnr.pr + pt.pr + pt.pr.dnr, data = bmt.long2)
    reg5 <- coxph(st ~ factor(Disease) + FAB + DonAge + PtAge + ageint + PRt + dx2.pr + dx3.pr + fab.pr, data = bmt.long2)
    reg6 <- coxph(st ~ factor(Disease) + FAB + DonAge + PtAge + ageint + PRt + dx2.pr + dx3.pr + dnr.pr + pt.pr + pt.pr.dnr, data = bmt.long2)
    reg7 <- coxph(st ~ factor(Disease) + FAB + DonAge + PtAge + ageint + PRt + dx2.pr + dx3.pr + fab.pr + dnr.pr + pt.pr + pt.pr.dnr, data = bmt.long2)

Full Model with Interactions
    > reg7
                       coef exp(coef) se(coef) z p
    factor(Disease)2
    factor(Disease)3
    FAB
    DonAge
    PtAge
    ageint
    PRt
    dx2.pr
    dx3.pr
    fab.pr
    dnr.pr
    pt.pr
    pt.pr.dnr
    Likelihood ratio test=63.6 on 13 df, p=1.19e-08
    n= 19070, number of events= 83

Fitting Interactions Directly
    > reg7b
    Call:
    coxph(formula = st ~ factor(Disease) + FAB + DonAge + PtAge + DonAge * PtAge + PR + PR * factor(Disease) + PR * FAB + PR * DonAge + PR * PtAge + DonAge * PtAge * PR, data = bmt.long)
                          coef exp(coef) se(coef) z p
    factor(Disease)2
    factor(Disease)3
    FAB
    DonAge
    PtAge
    PR
    DonAge:PtAge
    factor(Disease)2:PR
    factor(Disease)3:PR
    FAB:PR
    DonAge:PR
    PtAge:PR
    DonAge:PtAge:PR
    Likelihood ratio test=63.6 on 13 df, p=1.19e-08
    n= 342, number of events= 83
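The identical likelihood ratio statistic (63.6 on 13 df) for reg7 and reg7b reflects that hand-coded product columns and formula-built interactions are the same model. A small sketch of that equivalence on the `veteran` data from the survival package (a stand-in for the BMT data); the centering constant 60 for the Karnofsky score is an arbitrary choice, playing the role of the 28 used for ages above.

```r
library(survival)

vet <- veteran
vet$trt2       <- as.numeric(vet$trt == 2)  # 0/1 treatment indicator
vet$karno_c    <- vet$karno - 60            # centered, like PtAge - 28 above
vet$trt2_karno <- vet$trt2 * vet$karno_c    # hand-coded product term

fit_hand    <- coxph(Surv(time, status) ~ trt2 + karno_c + trt2_karno, data = vet)
fit_formula <- coxph(Surv(time, status) ~ trt2 * karno_c, data = vet)
cbind(hand = coef(fit_hand), formula = coef(fit_formula))
```

Coding by hand gives control over names and centering; the formula version is less error-prone when many interactions are involved.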

Low-Risk AML vs. ALL
Interaction between diagnosis and platelet recovery:
– Low-risk AML vs. ALL, prior to platelet recovery: b ≈ ln(3.76) ≈ 1.32; HR (95% CI): 3.76 (0.76, 18.76)
– Low-risk AML vs. ALL, after platelet recovery: b ≈ 1.32 + (-3.06) ≈ -1.74; HR (95% CI): 0.18 (0.08, 0.41)

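R Code for the HR and 95% CI
    # log HR for low-risk AML vs. ALL after platelet recovery: coef 1 (Disease 2) + coef 8 (dx2.pr)
    > betahr <- reg7$coef[1] + reg7$coef[8]
    > betahr
    factor(Disease)2
    # Var(b1 + b8) = Var(b1) + Var(b8) + 2 Cov(b1, b8)
    > seintx <- sqrt(reg7$var[1,1] + reg7$var[8,8] + 2*reg7$var[1,8])
    > seintx
    [1]
    > exp(betahr - qnorm(0.975)*seintx)
    factor(Disease)2
    > exp(betahr + qnorm(0.975)*seintx)
    factor(Disease)2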
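The calculation above generalizes to any linear combination $w^\top\beta$, with variance $w^\top V w$. A minimal sketch with a hypothetical helper `lincom_hr`; with `w` equal to 1 in positions 1 and 8 and 0 elsewhere it would reproduce the reg7 calculation. The demonstration uses the `veteran` data from the survival package, and the Karnofsky score of 80 is an arbitrary choice.

```r
library(survival)

# Hypothetical helper: HR and Wald CI for the linear combination sum(w * beta).
lincom_hr <- function(fit, w, level = 0.95) {
  b  <- sum(w * coef(fit))
  se <- sqrt(drop(t(w) %*% vcov(fit) %*% w))  # Var(w'b) = w' V w
  z  <- qnorm(1 - (1 - level) / 2)
  c(HR = exp(b), lower = exp(b - z * se), upper = exp(b + z * se))
}

vet <- veteran
vet$trt2 <- as.numeric(vet$trt == 2)
fit <- coxph(Surv(time, status) ~ trt2 * karno, data = vet)
# coefficients: trt2, karno, trt2:karno
# treatment HR at a Karnofsky score of 80: beta_trt2 + 80 * beta_trt2:karno
lincom_hr(fit, c(1, 0, 80))
```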

Other Interactions?
– High-risk AML vs. ALL?
– High-risk AML vs. low-risk AML?
– Age?
– …

What About Continuous Covariates?
Continuous variables can change over time as well. Given the times at which measurements are taken, we can expand the data in the same way: each interval starts at a measurement time and carries that measured value.
We are assuming the value is unchanging during the interval between measurements:
– A little unrealistic, BUT…
– This is no different from treating a single measure (e.g. blood pressure) as a fixed time covariate
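The assumption above, that a measured value holds until the next measurement, can be sketched in base R as follows. `expand_lvcf` is a hypothetical helper and the blood pressure values are invented; intervals that start before any measurement get `NA`.

```r
# Hypothetical helper: carry each measurement forward until the next one.
expand_lvcf <- function(id, futime, status, meas_time, meas_value) {
  o <- order(meas_time)
  meas_time  <- meas_time[o]
  meas_value <- meas_value[o]
  # split follow-up (0, futime] at measurement times strictly inside it
  cuts   <- unique(meas_time[meas_time > 0 & meas_time < futime])
  tstart <- c(0, cuts)
  tstop  <- c(cuts, futime)
  # most recent measurement taken at or before each interval's start (0 = none)
  idx   <- findInterval(tstart, meas_time)
  value <- rep(NA_real_, length(tstart))
  value[idx > 0] <- meas_value[idx[idx > 0]]
  data.frame(id = id, tstart = tstart, tstop = tstop, value = value,
             event = c(rep(0, length(cuts)), status))
}

# One invented subject: blood pressure measured on days 0, 90 and 210,
# event on day 300.
expand_lvcf(id = 1, futime = 300, status = 1,
            meas_time = c(0, 90, 210), meas_value = c(140, 150, 135))
```

The result has three rows, (0, 90], (90, 210] and (210, 300], carrying 140, 150 and 135, with the event flag on the last row; these rows can then be passed to `coxph(Surv(tstart, tstop, event) ~ value, ...)` like any other counting-process data.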

Next Time
Regression diagnostics: checking the proportional hazards assumption.