Statistics 262: Intermediate Biostatistics

May 18, 2004: Cox Regression III: residuals and diagnostics, repeated events
Jonathan Taylor and Kristin Cobb

Residuals
Residuals are used to investigate how poorly a model fits a given subject. For Cox regression, there is no easy analog to the usual "observed minus predicted" residual of linear regression.

Deviance Residuals
Deviance residuals are based on martingale residuals. The martingale residual for individual i is ci (1 if event, 0 if censored) minus the estimated cumulative hazard up to ti (as a function of the fitted model): ci − H(ti, Xi, β), where β is the vector of estimated coefficients.
See Hosmer and Lemeshow for more discussion…
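The slide does not spell out how a martingale residual is turned into a deviance residual. A minimal sketch, assuming the commonly used transformation d_i = sign(M_i) * sqrt(-2 [M_i + c_i log(c_i - M_i)]), where M_i is the martingale residual and c_i - M_i equals the estimated cumulative hazard. The function names and the (event, cumulative hazard) pairs below are hypothetical, chosen only for illustration.

```python
import math

def martingale_residual(event, cum_hazard):
    """c_i minus the estimated cumulative hazard at t_i."""
    return event - cum_hazard

def deviance_residual(event, cum_hazard):
    """Rescale a martingale residual so it is more symmetric about 0."""
    m = martingale_residual(event, cum_hazard)
    # c_i - M_i is the cumulative hazard itself; the log term vanishes if censored.
    log_term = math.log(cum_hazard) if event == 1 else 0.0
    sign = (m > 0) - (m < 0)
    return sign * math.sqrt(-2.0 * (m + log_term))

# Hypothetical (event indicator, estimated cumulative hazard) pairs.
subjects = [(1, 0.2), (1, 1.0), (1, 2.5), (0, 0.5)]
for c, H in subjects:
    print(c, H, round(martingale_residual(c, H), 3), round(deviance_residual(c, H), 3))
```

A subject who has the event while the estimated cumulative hazard is still small (died sooner than expected) gets a positive residual; a subject with a large cumulative hazard (survived longer than expected) gets a negative one.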

Deviance Residuals
Behave like residuals from ordinary linear regression
Should be symmetrically distributed around 0 and have standard deviation of 1.0.
Negative for observations with longer than expected observed survival times.
Plot deviance residuals against covariates to look for unusual patterns.

Deviance Residuals
In SAS, option on the output statement:
output out=outdata resdev=

Schoenfeld residuals
Schoenfeld (1982) proposed the first set of residuals for use with Cox regression packages
Schoenfeld D. Residuals for the proportional hazards regression model. Biometrika, 1982, 69(1):
Instead of a single residual for each individual, there is a separate residual for each individual for each covariate
Based on the individual contributions to the derivative of the log partial likelihood (see chapter 6 in Hosmer and Lemeshow for more mathematical details)
Note: Schoenfeld residuals are not defined for censored individuals.

Schoenfeld residuals
Where k is the covariate of interest, the Schoenfeld residual is the covariate value, Xik, of the person (i) who actually died at time ti minus the expected value of the covariate over the risk set at ti (a weighted average of the covariate, weighted by each individual's estimated probability of being the one who dies at ti).
Plot Schoenfeld residuals against time to evaluate PH assumption
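The "observed minus risk-set expected" calculation can be sketched for a single covariate as follows. This is only an illustration under stated assumptions: no tied event times, weights taken as exp(beta * x), and a coefficient supplied by hand rather than estimated. The function name and the toy data are hypothetical.

```python
import math

def schoenfeld_residuals(times, events, x, beta):
    """Return {subject index: residual} for one covariate (no tied event times)."""
    residuals = {}
    for i, (t_i, c_i) in enumerate(zip(times, events)):
        if c_i == 0:
            continue  # not defined for censored individuals
        # Risk set: everyone still under observation at t_i.
        risk = [j for j, t_j in enumerate(times) if t_j >= t_i]
        weights = [math.exp(beta * x[j]) for j in risk]
        expected_x = sum(w * x[j] for w, j in zip(weights, risk)) / sum(weights)
        residuals[i] = x[i] - expected_x
    return residuals

# Hypothetical data: five subjects, binary covariate, coefficient fixed at 0.5.
times = [2, 4, 5, 7, 9]
events = [1, 1, 0, 1, 0]
x = [1, 0, 1, 0, 1]
for i, r in schoenfeld_residuals(times, events, x, beta=0.5).items():
    print(times[i], round(r, 3))
```

Plotting the printed residuals against the event times is the check described above: a trend over time suggests the covariate's effect is not constant.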

Schoenfeld residuals
In SAS: option on the output statement: ressch=

Influence diagnostics
How would the result change if a particular observation is removed from the analysis?

Influence statistics
Likelihood displacement (ld): measures influence of removing one individual on the model as a whole. What's the change in the likelihood when this individual is omitted?
DFBETA: how much each coefficient changes when a single observation is removed
A negative DFBETA indicates that the coefficient increases when the observation is removed
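To make the sign convention concrete, the sketch below computes a delete-one change in the coefficient by brute force: fit a one-covariate Cox model on everyone, refit without subject i, and take the full-data estimate minus the delete-one estimate. This is an illustrative assumption, not what statistical packages do internally (they typically report an approximation that avoids refitting). It also assumes no tied event times; all function names and data are hypothetical.

```python
import math

def score_and_info(beta, times, events, x):
    """Score and information of the log partial likelihood (one covariate, no ties)."""
    score, info = 0.0, 0.0
    for t_i, c_i, x_i in zip(times, events, x):
        if c_i == 0:
            continue  # censored subjects contribute only through risk sets
        risk = [(math.exp(beta * x_j), x_j)
                for t_j, x_j in zip(times, x) if t_j >= t_i]
        s0 = sum(w for w, _ in risk)
        s1 = sum(w * x_j for w, x_j in risk)
        s2 = sum(w * x_j * x_j for w, x_j in risk)
        score += x_i - s1 / s0
        info += s2 / s0 - (s1 / s0) ** 2
    return score, info

def fit_beta(times, events, x, n_iter=50):
    """Plain Newton-Raphson starting from 0 (no step-halving safeguards)."""
    beta = 0.0
    for _ in range(n_iter):
        score, info = score_and_info(beta, times, events, x)
        if info <= 0.0 or abs(score) < 1e-12:
            break
        beta += score / info
    return beta

def delete_one_dfbeta(times, events, x):
    """Full-data estimate minus the estimate with subject i removed, for each i."""
    beta_full = fit_beta(times, events, x)
    changes = []
    for i in range(len(times)):
        keep = [j for j in range(len(times)) if j != i]
        beta_minus_i = fit_beta([times[j] for j in keep],
                                [events[j] for j in keep],
                                [x[j] for j in keep])
        changes.append(beta_full - beta_minus_i)
    return beta_full, changes

# Hypothetical data: six subjects, the last one censored.
times = [1, 2, 3, 4, 5, 6]
events = [1, 1, 1, 1, 1, 0]
x = [2.0, 0.5, 1.5, 1.0, 3.0, 0.0]
beta_full, dfbeta = delete_one_dfbeta(times, events, x)
for i, d in enumerate(dfbeta):
    print(i, round(d, 3))
```

With this definition a negative value means the coefficient is larger once the subject is dropped, matching the interpretation given above.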

Influence statistics
In SAS: option on the output statement:
ld= dfbeta=

What about repeated events?
Death (presumably) can only happen once, but many outcomes can happen more than once…
Fractures
Heart attacks
Pregnancy
Etc…

Repeated events: Strategy 1
Strategy 1: run a second Cox regression (among those who had a first event) starting with first event time as the origin
Repeat for third, fourth, fifth, events, etc.
Problem: sample sizes become progressively smaller with each successive event.

Repeated events: Strategy 2
Treat each interval as a distinct observation, such that someone who had 3 events, for example, gives 3 observations to the dataset
Major problem: dependence among observations from the same individual
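The data layout for this strategy can be sketched as below: each interval between successive events becomes its own row, with the clock reset at the previous event. Adding a final censored row for follow-up time after the last event is an assumption of this sketch, as are the function name and field names.

```python
def to_gap_time_rows(subject_id, event_times, end_of_followup):
    """Split one subject's follow-up into one row per interval."""
    rows = []
    start = 0
    for spell, t in enumerate(sorted(event_times), start=1):
        rows.append({"id": subject_id, "spell": spell,
                     "duration": t - start, "event": 1})
        start = t  # the next interval starts at this event
    if end_of_followup > start:
        # Remaining follow-up after the last event ends in censoring.
        rows.append({"id": subject_id, "spell": len(rows) + 1,
                     "duration": end_of_followup - start, "event": 0})
    return rows

# Hypothetical subject with 3 events, follow-up ending at the third event.
for row in to_gap_time_rows(1, [10, 25, 40], 40):
    print(row)
# Hypothetical subject with 1 event and further event-free follow-up.
for row in to_gap_time_rows(2, [12], 30):
    print(row)
```

The first subject contributes 3 rows, one per event. Because all rows share the same id, they are not independent, which is the problem noted above.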

Repeated events: Strategy 3
Stratify by individual ("fixed effects partial likelihood")
In PROC PHREG: strata id;
Problems: does not work well with RCT data; also requires that most individuals have at least 2 events
Can only estimate coefficients for those covariates that vary across successive spells for each individual; this excludes constant personal characteristics such as age, education, gender, ethnicity, genotype
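Why constant characteristics drop out can be checked numerically. In a model stratified by individual, each risk set contains only that person's own spells, so a covariate with the same value in every spell cancels from numerator and denominator. The sketch below evaluates one person's log partial likelihood at two coefficient values; the function name, the spell data, and the absence of tied durations are assumptions made for illustration.

```python
import math

def stratum_log_partial_likelihood(durations, events, x, beta):
    """Log partial likelihood contribution of one individual's spells."""
    total = 0.0
    for d_i, c_i, x_i in zip(durations, events, x):
        if c_i == 0:
            continue
        # Risk set: this person's spells lasting at least as long as d_i.
        denom = sum(math.exp(beta * x_j)
                    for d_j, x_j in zip(durations, x) if d_j >= d_i)
        total += beta * x_i - math.log(denom)
    return total

# Hypothetical person with three spells, the last one censored.
durations = [10, 15, 22]
events = [1, 1, 0]
constant_x = [1, 1, 1]   # e.g. a fixed personal characteristic
varying_x = [0, 1, 1]    # a covariate that changes between spells

for beta in (0.0, 2.0):
    print(beta,
          round(stratum_log_partial_likelihood(durations, events, constant_x, beta), 4),
          round(stratum_log_partial_likelihood(durations, events, varying_x, beta), 4))
```

The value for the constant covariate is the same at every beta, so the data carry no information about its coefficient; only the varying covariate changes the likelihood.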

