Statistics 262: Intermediate Biostatistics
May 18, 2004
Cox Regression III: residuals and diagnostics, repeated events
Jonathan Taylor and Kristin Cobb
Statistics 262

Residuals
Residuals are used to investigate the lack of fit of a model to a given subject.
For Cox regression, there’s no easy analog to the usual “observed minus predicted” residual of linear regression.

Deviance Residuals
Deviance residuals are based on martingale residuals: $c_i$ (1 if event, 0 if censored) minus the estimated cumulative hazard to $t_i$ (as a function of the fitted model) for individual i:
$c_i - \hat{H}(t_i, X_i, \hat{\beta})$
See Hosmer and Lemeshow for more discussion…

Deviance Residuals
Behave like residuals from ordinary linear regression.
Should be symmetrically distributed around 0 and have a standard deviation of 1.0.
Negative for observations with longer than expected observed survival times.
Plot deviance residuals against covariates to look for unusual patterns.

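As a rough illustration of how the two residuals relate, the sketch below computes a martingale residual and then applies the standard deviance transformation, sign(M) * sqrt(-2 [M + c log(c - M)]). The function names and input values are hypothetical, and the cumulative hazard is assumed to come from an already fitted model; this is not how SAS computes resdev= internally.

```python
import math

def martingale_residual(event, cum_hazard):
    """event: 1 if event, 0 if censored.
    cum_hazard: estimated cumulative hazard at the subject's time (assumed given)."""
    return event - cum_hazard

def deviance_residual(event, cum_hazard):
    """Deviance transformation of the martingale residual.
    Assumes cum_hazard > 0 for subjects with an event."""
    m = martingale_residual(event, cum_hazard)
    # c * log(c - M) equals log(cum_hazard) for events and is taken as 0 if censored
    log_term = math.log(cum_hazard) if event == 1 else 0.0
    inner = -2.0 * (m + log_term)
    return math.copysign(math.sqrt(max(inner, 0.0)), m)

# Hypothetical subjects: (event indicator, estimated cumulative hazard)
early_death = deviance_residual(1, 0.2)    # died sooner than the model expected
late_death = deviance_residual(1, 2.5)     # survived longer than expected
censored = deviance_residual(0, 0.5)       # censored subjects are never positive
```

Under these assumptions the early death gets a positive residual and the late death a negative one, which matches the sign rule on the slide.
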
Deviance Residuals
In SAS, option on the output statement:
output out=outdata resdev=

Schoenfeld residuals
Schoenfeld (1982) proposed the first set of residuals for use with Cox regression packages.
Schoenfeld D. Residuals for the proportional hazards regression model. Biometrika, 1982, 69(1):239-241.
Instead of a single residual for each individual, there is a separate residual for each individual for each covariate.
Based on the individual contributions to the derivative of the log partial likelihood (see chapter 6 in Hosmer and Lemeshow for more math details, p.198-199).
Note: Schoenfeld residuals are not defined for censored individuals.

Schoenfeld residuals
Where k is the covariate of interest, the Schoenfeld residual is the covariate value, $X_{ik}$, for the person (i) who actually died at time $t_i$ minus the expected value of the covariate for the risk set at $t_i$ (a weighted average of the covariate, weighted by each individual’s likelihood of dying at $t_i$).
Plot Schoenfeld residuals against time to evaluate the PH assumption.

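The definition above can be sketched for a single covariate as follows. This is a minimal illustration, not a package implementation: the function name, the toy data, and the choice of beta = 0 are all hypothetical, and the weights exp(beta * x) stand in for each individual's relative likelihood of dying under a Cox model with a given coefficient.

```python
import math

def schoenfeld_residuals(times, events, x, beta):
    """Return {subject index: residual} for subjects who had an event.
    Single covariate x; beta is an assumed, already estimated coefficient."""
    residuals = {}
    for i, (t_i, c_i) in enumerate(zip(times, events)):
        if c_i != 1:
            continue  # not defined for censored individuals
        # Risk set: everyone still under observation at time t_i
        risk_set = [j for j, t_j in enumerate(times) if t_j >= t_i]
        weights = [math.exp(beta * x[j]) for j in risk_set]
        expected_x = sum(w * x[j] for w, j in zip(weights, risk_set)) / sum(weights)
        residuals[i] = x[i] - expected_x
    return residuals

# Hypothetical data: four subjects, the second one censored
toy_times = [2, 4, 6, 8]
toy_events = [1, 0, 1, 1]
toy_x = [1, 0, 1, 0]
toy_resid = schoenfeld_residuals(toy_times, toy_events, toy_x, beta=0.0)
```

With beta = 0 every member of the risk set has equal weight, so the expected covariate value is just the risk-set mean. With more than one covariate the same calculation would be repeated per covariate, giving one residual per individual per covariate.
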
Schoenfeld residuals
In SAS, option on the output statement:
ressch=

Influence diagnostics
How would the result change if a particular observation were removed from the analysis?

Influence statistics
Likelihood displacement (ld): measures the influence of removing one individual on the model as a whole. How much does the likelihood change when this individual is omitted?
DFBETA: how much each coefficient changes when a single observation is removed. A negative DFBETA indicates that the coefficient increases when the observation is removed.

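To make the DFBETA idea concrete, the sketch below refits a one-covariate Cox model with each observation deleted in turn and records the full-data coefficient minus the leave-one-out coefficient, so the sign convention matches the one stated above. Everything here is an assumption for illustration: the function names, the toy data, the Newton-Raphson fitter, and the absence of tied event times. It computes the change by brute-force refitting, which need not equal the dfbeta= value that SAS reports.

```python
import math

def score_and_info(beta, times, events, x):
    """First derivative and negative second derivative of the log partial likelihood."""
    score, info = 0.0, 0.0
    for t_i, c_i, x_i in zip(times, events, x):
        if c_i != 1:
            continue
        risk = [(math.exp(beta * x_j), x_j) for t_j, x_j in zip(times, x) if t_j >= t_i]
        s0 = sum(w for w, _ in risk)
        mean = sum(w * x_j for w, x_j in risk) / s0
        second = sum(w * x_j * x_j for w, x_j in risk) / s0
        score += x_i - mean
        info += second - mean * mean
    return score, info

def fit_cox(times, events, x, max_iter=50, tol=1e-10):
    """Newton-Raphson for a single coefficient, starting from 0."""
    beta = 0.0
    for _ in range(max_iter):
        score, info = score_and_info(beta, times, events, x)
        step = score / info
        beta += step
        if abs(step) < tol:
            break
    return beta

def delete_one_dfbeta(times, events, x):
    """Return (full-data beta, list of beta_full - beta_without_i)."""
    beta_full = fit_cox(times, events, x)
    changes = []
    for i in range(len(times)):
        keep = [j for j in range(len(times)) if j != i]
        beta_minus_i = fit_cox([times[j] for j in keep],
                               [events[j] for j in keep],
                               [x[j] for j in keep])
        changes.append(beta_full - beta_minus_i)
    return beta_full, changes

# Hypothetical data: eight subjects, one censored, binary covariate
inf_times = [1, 2, 3, 4, 5, 6, 7, 8]
inf_events = [1, 1, 1, 0, 1, 1, 1, 1]
inf_x = [1, 0, 1, 1, 0, 1, 0, 0]
beta_hat, dfbeta = delete_one_dfbeta(inf_times, inf_events, inf_x)
```

In this toy data set the first subject has x = 1 and the earliest event, so it pulls the coefficient upward; deleting it lowers the estimate and its entry in dfbeta is positive.
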
Influence statistics
In SAS, options on the output statement:
ld=
dfbeta=

What about repeated events?
Death (presumably) can only happen once, but many outcomes can happen more than once:
Fractures
Heart attacks
Pregnancy
Etc.

Repeated events: Strategy 1
Run a second Cox regression (among those who had a first event), starting with the first event time as the origin.
Repeat for third, fourth, fifth events, etc.
Problem: progressively smaller sample sizes.

Repeated events: Strategy 2
Treat each interval as a distinct observation, such that someone who had 3 events, for example, contributes 3 observations to the dataset.
Major problem: dependence among observations from the same individual.

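The data layout for this strategy can be sketched as below, with one row per interval and the clock reset at each event. The function name, the field names, and the example event times are hypothetical. The optional trailing censored interval (follow-up continuing after the last event) is an assumption that goes beyond the slide.

```python
def to_interval_rows(subject_id, event_times, end_of_followup=None):
    """One row per interval between successive events for a single subject."""
    rows = []
    start = 0.0
    for k, t in enumerate(sorted(event_times), start=1):
        rows.append({"id": subject_id, "interval": k,
                     "gap_time": t - start, "event": 1})
        start = t  # the next interval starts at this event
    # Assumption: time observed after the last event forms a censored interval
    if end_of_followup is not None and end_of_followup > start:
        rows.append({"id": subject_id, "interval": len(rows) + 1,
                     "gap_time": end_of_followup - start, "event": 0})
    return rows

# Hypothetical subject "A" with events at times 2, 5 and 9
rows_a = to_interval_rows("A", [2.0, 5.0, 9.0])
rows_a_followed = to_interval_rows("A", [2.0, 5.0, 9.0], end_of_followup=12.0)
```

All rows for a subject carry the same id, which is what makes the dependence problem visible: the rows are not independent draws.
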
Repeated events: Strategy 3
Stratify by individual (“fixed effects partial likelihood”).
In PROC PHREG: strata id;
Problems:
Does not work well with RCT data (treatment assignment is typically constant within an individual, so its effect cannot be estimated).
Requires that most individuals have at least 2 events.
Can only estimate coefficients for those covariates that vary across successive spells for each individual; this excludes constant personal characteristics such as age, education, gender, ethnicity, genotype.

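One way to see why constant covariates cannot be estimated under stratification by individual is to compute a single person's contribution to the log partial likelihood. In the sketch below (hypothetical function name and toy spells, single covariate, no tied times), a covariate that never changes within the stratum cancels out of every term, so the contribution does not depend on beta at all.

```python
import math

def stratum_log_pl(beta, spells):
    """Log partial likelihood contribution of ONE individual (one stratum).
    spells: list of (gap_time, event, x), one tuple per interval."""
    total = 0.0
    for t_i, c_i, x_i in spells:
        if c_i != 1:
            continue
        # The risk set is restricted to this individual's own spells
        denom = sum(math.exp(beta * x_j) for t_j, _, x_j in spells if t_j >= t_i)
        total += beta * x_i - math.log(denom)
    return total

# Hypothetical individual with three spells
constant_x = [(2.0, 1, 1), (3.0, 1, 1), (4.0, 1, 1)]   # covariate never changes
varying_x = [(2.0, 1, 1), (3.0, 1, 0), (4.0, 1, 1)]    # covariate changes between spells

flat_gap = abs(stratum_log_pl(0.0, constant_x) - stratum_log_pl(1.7, constant_x))
informative_gap = abs(stratum_log_pl(0.0, varying_x) - stratum_log_pl(1.0, varying_x))
```

Here flat_gap is zero up to rounding, while informative_gap is not: only the individual whose covariate varies across spells carries any information about the coefficient.
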