EPI 5344: Survival Analysis in Epidemiology
Estimating S(t) from Cox models
March 29, 2016
Dr. N. Birkett, School of Epidemiology, Public Health & Preventive Medicine, University of Ottawa
Objectives
Theory of estimating S(t)
SAS methods
Introduction (1)
Cox methods do not require an equation for h(t).
–So a Cox model does not give direct estimates of h(t), H(t) or S(t).
If we could estimate any one of these, we could get the others, assuming h(t) is constant within each interval.
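As a reminder, the standard identities linking the three functions are sketched below. These are textbook relationships rather than a transcription of the slide, and the interval endpoints t_{j-1} and t_j are notation assumed here.

```latex
H(t) = \int_0^t h(u)\,du, \qquad
S(t) = \exp\{-H(t)\}, \qquad
h(t) = -\frac{d}{dt}\log S(t)

% If h(t) is constant on the interval (t_{j-1}, t_j]:
h_j = \frac{H(t_j) - H(t_{j-1})}{t_j - t_{j-1}}
    = \frac{\log S(t_{j-1}) - \log S(t_j)}{t_j - t_{j-1}}
```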
Introduction (2)
With Cox, we normally have explanatory/predictor variables.
We would like S(t) for different values of the predictors.
The proportional hazards assumption makes this ‘easy’ once we get over the basic hurdle.
Introduction (3)
Since hazards are proportional, all we need to know is h(t) or S(t) for the baseline group: h0(t) or S0(t).
So, estimate S0(t) and we can get S(t) for any set of x’s.
S(t|x) is the covariate-adjusted survival curve.
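The step from the baseline curve to a covariate-adjusted curve can be written out as follows. This is the usual proportional hazards derivation; the coefficient vector \beta and covariate vector x are notation assumed here, not taken from the slide.

```latex
h(t \mid x) = h_0(t)\exp(\beta' x)
\;\Rightarrow\;
H(t \mid x) = H_0(t)\exp(\beta' x)
\;\Rightarrow\;
S(t \mid x) = \exp\{-H_0(t)\exp(\beta' x)\} = S_0(t)^{\exp(\beta' x)}
```

So a subject whose hazard ratio relative to baseline is 2 has S(t|x) = S0(t)^2 at every t.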
Introduction (4)
How do we get an estimate of S0(t)? We must use methods outside Cox regression.
Two common approaches are used:
–Generalize the Kaplan-Meier method to estimate S0(t)
–Generalize the Nelson-Aalen method to estimate H0(t)
Both are implemented through the BASELINE statement in SAS.
First, a review of the technical background.
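For orientation before the technical review, here is a minimal sketch of how the two approaches might be requested through the METHOD= option of the BASELINE statement. It uses the njb1 data set and variables from the SAS slides later in this deck. The output data set and variable names (bl_pl, bl_br, s_pl, s_br, h_br) are hypothetical, and the METHOD= keywords are an assumption about the installed release (older releases spell the Breslow option METHOD=CH), so check the PHREG documentation for your version.

```sas
/* Product-limit (generalized Kaplan-Meier) estimate of S(t) */
proc phreg data=njb1;
  class nephrec / param=ref ref=first order=internal;
  model time*vstatus(0) = nephrec;
  baseline out=bl_pl survival=s_pl / method=pl;
run;

/* Breslow (generalized Nelson-Aalen) estimate of H(t), with S(t) = exp(-H(t)) */
proc phreg data=njb1;
  class nephrec / param=ref ref=first order=internal;
  model time*vstatus(0) = nephrec;
  baseline out=bl_br cumhaz=h_br survival=s_br / method=breslow;
run;
```

The two estimates are usually very close; they differ most when risk sets are small.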
A bit of technical stuff
We assume a piecewise constant hazard model based on risk sets.
–It keeps coming up, doesn’t it?
A bit of technical stuff
What we want to do is estimate the piecewise hazard for the interval ending at each risk set.
There are a number of ways to estimate this.
We’ll just look at one to get a general idea of what is being done.
The formula on this slide (Collett, p. 96) gives an estimate of h(t).
We don’t care about the origin.
It assumes there are no ties.
The risk set occurs at time t_j, where:
‘l’ is the subject having the event at t_j
‘m’ is an index for the members of the risk set at t_j
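The slide's equation did not survive in this transcript. As a hedged reconstruction, a no-ties estimator of the Kalbfleisch-Prentice type, written with the indices described above ('l' for the subject with the event at t_j, 'm' running over the risk set R(t_j)), is usually stated as below. The symbols \hat\beta, x and R(t_j) are assumed notation; check the form against Collett before relying on it.

```latex
\hat{h}_0(t_j) = 1 - \left( 1 -
  \frac{\exp(\hat\beta' x_l)}{\sum_{m \in R(t_j)} \exp(\hat\beta' x_m)}
\right)^{\exp(-\hat\beta' x_l)}
```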
Then, to get S(t), we use a second formula, where ‘k’ is the subject whose survival curve we want.
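The second equation is also missing from this transcript. Under the piecewise model described earlier, the usual way to turn the per-risk-set baseline hazard estimates \hat{h}_0(t_j) into a survival curve for subject 'k' is sketched below. This is offered as a likely form, not as the slide's exact expression; \hat\beta and x_k are assumed notation.

```latex
\hat{S}_0(t) = \prod_{j:\, t_j \le t} \{ 1 - \hat{h}_0(t_j) \}, \qquad
\hat{S}_k(t) = \hat{S}_0(t)^{\exp(\hat\beta' x_k)}
```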
Example (1)
Non-RCT study of therapy for hypernephroma (a type of kidney cancer).
All patients were treated with
–chemotherapy and
–immunotherapy
Some also had their affected kidney removed.
Questions
–Does having a nephrectomy affect survival?
–Does age affect survival?
Example (2)
Answer is YES.
A_2i = 1 if age is …, = 0 otherwise
A_3i = 1 if age is 70+, = 0 otherwise
N_i = 1 if had nephrectomy, = 0 if no nephrectomy
A_3 and N are statistically significant.
What do the survival curves look like?
Estimated baseline hazard curve
Could try smoothing the h(t) curve.
Example (3)
How did we produce these curves? Use the formulae given earlier.
In SAS, getting one S(t) is easy.
–ODS produces the graph
Having more flexibility requires more work.
–SAS uses the BASELINE statement
–The COVARIATES option is useful
–Use ODS graphs or plot your own
The data
SAS Methods (1)
To start, we need S(t) for one set of covariates:
–Can be ‘0’ for all variables
–mean values
–anything else.
We can generate any survival curve once we have this.
Here’s the simplest way to get an S(t) curve:
proc phreg data=njb1 plots=survival;
  class nephrec / param=ref ref=first order=internal;
  model time*vstatus(0) = nephrec;
run;
If you leave off the CLASS statement, you get a different curve (the ‘why’ comes later):
proc phreg data=njb1 plots=survival;
  model time*vstatus(0) = nephrec;
run;
SAS Methods (2)
How do we go beyond this simple curve?
SAS uses the BASELINE statement.
It needs to select covariate values.
What covariate values are used?
–Can use the default
–Can define your own
–Can use more than one set at a time
SAS Methods (3)
Here’s the ‘why’.
By default, SAS produces an S(t) using these covariate values:
–The reference level of any variable mentioned in a CLASS statement
–The mean value of each other variable
Let’s look at what the BASELINE statement does.
proc phreg data=njb1 plots=survival;
  model time*vstatus(0) = nephrec;
  baseline out=a survival=s lower=lcl upper=ucl;
run;
proc print data=a (obs=10);
run;
Why don’t we get the same curve?
Content of data set ‘a’
Hmm?
‘nephrectomy’ is a binary categorical variable (yes/no for surgery).
Why is it given a value between 0 and 1 in generating S(t)?
–We failed to put it in a CLASS statement
–It had numerical values assigned (0/1)
–SAS treated it as an interval variable and used its mean, which equals prob(nephrectomy = 1)
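A quick way to confirm where that covariate value comes from is to compute the mean of the 0/1 indicator directly. This is a small sketch using the njb1 data set and nephrec variable from these slides; the result should agree with the value shown in data set ‘a’ provided PHREG did not drop any observations (for example, for missing values).

```sas
/* Mean of the 0/1 indicator = proportion of patients who had a nephrectomy */
proc means data=njb1 n mean;
  var nephrec;
run;
```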
Let’s fix the nephrectomy covariate value:
proc phreg data=njb1 plots=survival;
  class nephrec / param=ref ref=first order=internal;
  model time*vstatus(0) = nephrec;
  baseline out=b survival=s lower=lcl upper=ucl;
run;
proc print data=b (obs=10);
run;
Same curve as before
Content of data set ‘b’
BASELINE output
What can we do with the output from the BASELINE statement?
–Use it as input to your own graphic plots
–Use it as data within PHREG to create more curves
Next, we need to learn about the COVARIATES option.
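As an illustration of the first use, the sketch below plots the curve stored in data set ‘b’ with its confidence limits using PROC SGPLOT. It assumes ‘b’ was created by the BASELINE statement shown earlier, so that it contains time, s, lcl and ucl; the axis labels and transparency setting are arbitrary choices.

```sas
proc sgplot data=b;
  /* Shaded confidence band drawn as a step function */
  band x=time lower=lcl upper=ucl / type=step transparency=0.6
       legendlabel="Confidence limits";
  /* Estimated survival curve */
  step x=time y=s / legendlabel="Estimated S(t)";
  xaxis label="Time";
  yaxis label="Survival probability" min=0 max=1;
run;
```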
COVARIATES option
What if you want S(t) for some other set of covariates?
–Interested in a specific target group
–Want to contrast extremes of the range of variables
You need to tell SAS what values to use for the covariates.
Use the COVARIATES option in SAS.
data covals2a;
  input nephrec;
  format nephrec nephrec.;
  datalines;
1
;
run;
proc phreg data=njb1 plots=survival;
  class nephrec / param=ref ref=first order=internal;
  model time*vstatus(0) = nephrec;
  baseline covariates=covals2a;
run;
COVARIATES
What if you want to display the results of multiple covariate sets on the same graph?
–Method #1
  Run several of the previous models, for different covariates.
  Combine the output data sets into one data set.
  Plot using SAS/GRAPH, etc.
–Method #2
  List more than one set of covariates in the covariate data set.
  ODS graphics has complex options to overlay plots.
data covals2;
  input nephrec;
  format nephrec nephrec.;
  datalines;
0
1
;
run;
proc phreg data=njb1 plots(overlay)=survival;
  class nephrec / param=ref ref=first order=internal;
  model time*vstatus(0) = nephrec;
  baseline covariates=covals2;
run;
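The same two-row covariate data set can also feed a plot you draw yourself. The sketch below writes both curves to one output data set and overlays them with PROC SGPLOT. The data set name ‘both’ is hypothetical; it relies on the OUT= data set carrying the covariate values (here nephrec) alongside time and the survival estimate.

```sas
proc phreg data=njb1;
  class nephrec / param=ref ref=first order=internal;
  model time*vstatus(0) = nephrec;
  /* One block of rows per covariate set in covals2 */
  baseline covariates=covals2 out=both survival=s;
run;

proc sgplot data=both;
  /* One step curve per nephrectomy group */
  step x=time y=s / group=nephrec;
  xaxis label="Time";
  yaxis label="Survival probability" min=0 max=1;
run;
```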
data covals3;
  input nephrec agegrp;
  format agegrp agegrp. nephrec nephrec.;
  datalines;
;
run;
proc phreg data=njb1 plots(overlay)=survival;
  class nephrec agegrp / param=ref ref=first order=internal;
  model time*vstatus(0) = nephrec agegrp;
  baseline covariates=covals3;
run;
ODS graphs
With multiple covariates, ODS doesn’t give nice labels by default.
You can get labels with ODS, but it needs more code.
Producing your own graphs lets you label things.
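One possible route to labelled ODS curves is the ROWID= option of the BASELINE statement, which names a variable in the COVARIATES= data set to identify each curve. The sketch below is an assumption about what is available in your release (the option is not present in older versions of SAS/STAT); the data set covals2_lab and the variable curve are hypothetical names, and the label text is illustrative.

```sas
data covals2_lab;
  length curve $ 16;
  input nephrec curve $;          /* curve holds the label for each row */
  format nephrec nephrec.;
  datalines;
0 No_nephrectomy
1 Nephrectomy
;
run;

proc phreg data=njb1 plots(overlay)=survival;
  class nephrec / param=ref ref=first order=internal;
  model time*vstatus(0) = nephrec;
  /* ROWID= asks ODS to identify each curve by the value of curve */
  baseline covariates=covals2_lab / rowid=curve;
run;
```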
Using STRATA (1)
Allison produces S(t) using the STRATA statement.
STRATA is used to allow different baseline hazards for the levels of the variable in the statement.
STRATA sex;
–allows a different baseline hazard for men and women
It can also produce S(t) curves.
proc phreg data=njb1 plots(overlay=row)=survival;
  class nephrec / param=ref ref=first order=internal;
  model time*vstatus(0) = nephrec;
  strata nephrec;
run;
Using STRATA (2)
The curves are different.
We could generate the STRATA curves by running stratum-specific Cox models (here, one for each nephrectomy group).
proc phreg data=njb1 plots(range=0,115)=survival;
  class nephrec / param=ref ref=first order=internal;
  model time*vstatus(0) = nephrec;
  where nephrec = 0;
run;
proc phreg data=njb1 plots=survival;
  class nephrec / param=ref ref=first order=internal;
  model time*vstatus(0) = nephrec;
  where nephrec = 1;
run;
Using STRATA (3)
Two different sets of S(t) curves:
STRATA
–Curves have different baseline hazards
–Do not follow PH for the variable in the STRATA statement
BASELINE
–Curves have the same baseline hazard
–Follow PH for the variables defining the curves
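The contrast between the two sets of curves can be written compactly. This is a sketch in assumed notation: g indexes the strata, and \beta and x are the coefficients and values of the remaining covariates.

```latex
% BASELINE curves: one shared baseline hazard, so PH holds across groups
h(t \mid x) = h_0(t)\exp(\beta' x), \qquad
S(t \mid x) = S_0(t)^{\exp(\beta' x)}

% STRATA curves: a separate baseline hazard h_{0g}(t) for each stratum g
h_g(t \mid x) = h_{0g}(t)\exp(\beta' x), \qquad
S_g(t \mid x) = S_{0g}(t)^{\exp(\beta' x)}
```

Because each h_{0g}(t) is estimated separately, the stratified curves are free to cross, whereas curves derived from a single S_0(t) cannot.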