Analysis of Lethal Cancer Risk Among a Cohort of Initially Disease-free Women
Bernard Rosner
Channing Division of Network Medicine, Harvard Medical School, Boston, MA 02115
Joint Statistical Meetings, Vancouver, BC, August 1, 2018
This work was supported by NCI grant T32 CA 9001 and CA 87969.
Background Pre-diagnosis factors may influence the likelihood that a cancer causes a patient’s death. Several methods have been used to evaluate associations with lethal cancer among an initially disease-free population, but each has limitations. In this talk we present a novel two-stage method that separately estimates the associations of pre-diagnosis risk factors with cancer incidence and with survival among cancer cases, and then combines them to yield a single measure of association.
Nurses’ Health Study Established in 1976 among 121,700 US female registered nurses, age 30-55. Followed every 2 years by mail questionnaire to inquire about lifestyle factors, health behaviors and medical history. This analysis began in 1984, since this was the first year in which postmenopausal estrogen and progesterone (E+P) therapy was used with appreciable frequency. Follow-up was through 2010. The goal is to relate pre-diagnosis risk factors to death due to breast cancer. 8675 total breast cancer cases, 1382 deaths due to breast cancer.
Notation
t0 = beginning of follow-up
T1 = time from t0 to breast cancer diagnosis
T2 = time from breast cancer diagnosis to breast cancer death
t = total follow-up time from t0
Xt = exposure of interest at time t
Z1t = vector of pre-diagnosis covariates measured at time t included in models of breast cancer incidence
Z2t = vector of pre-diagnosis covariates measured at time t included in models of breast cancer survival (among cases)
Vt = vector of post-diagnosis covariates measured at time t included in models of breast cancer survival (among cases)
Types of models considered
1. Time to event analysis using baseline covariates (TTE)
2. Time to event analysis using time-dependent covariates (TDC)
3. Time to event analysis with updated covariates through breast cancer diagnosis (TDX)
4. Ordered Multiple Event Analysis (Prentice, Williams and Peterson) (PWP)
5. Two-stage combined incidence and survival model (2S)
Time-to-event Analysis using Baseline Covariates (TTE) The time scale is the time between baseline and the end of follow-up or death, whichever occurred first. Follow-up was censored at the date of death, but only women who died of breast cancer were counted as cases. Women diagnosed with breast cancer who did not die during follow-up were censored in 2010.
TTE (continued) Non-cases were censored at the earliest of the date of diagnosis of any other cancer (except non-melanoma skin cancer), the time of death, or the end of follow-up. Issue: There may be a long interval between baseline and the end of follow-up, and many risk factors may have changed. If more recent exposure is important, this may bias HR estimates.
Time-to-event analysis with time-dependent covariates (TDC) The Andersen-Gill method was used to update covariates every 2 years. Issue: Some covariates may change dramatically after diagnosis, thus biasing HR estimates if the association between an exposure and incidence differs from the association between that exposure and death among cases.
Time-to-event Analysis with Updated Covariates until Breast Cancer Diagnosis (TDX) Issues: There may be a long time period between breast cancer diagnosis and death due to breast cancer (i.e., T2 large). Therefore, lethal cases may have their exposure measured at an earlier time than non-cases. If there are secular trends in risk factors over time, this may bias HR estimates.
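The TTE, TDC and TDX approaches differ mainly in which questionnaire value is attached to a given moment of follow-up. The following is a minimal Python sketch of that rule. The helper name `covariate_used`, the time grid (years since baseline) and the weights are hypothetical, chosen only for illustration.

```python
from bisect import bisect_right

def covariate_used(method, times, values, t, dx_time=None):
    """Return the questionnaire value a model would use at follow-up time t.

    times   -- sorted questionnaire times (years since baseline), times[0] is baseline
    values  -- covariate value reported at each questionnaire
    dx_time -- time of breast cancer diagnosis, or None for a non-case
    """
    if method == "TTE":      # baseline value, never updated
        cutoff = times[0]
    elif method == "TDC":    # most recent value, updated throughout follow-up
        cutoff = t
    elif method == "TDX":    # most recent value, but frozen at diagnosis for cases
        cutoff = t if dx_time is None else min(t, dx_time)
    else:
        raise ValueError(f"unknown method: {method}")
    idx = bisect_right(times, cutoff) - 1  # last questionnaire at or before cutoff
    return values[max(idx, 0)]

# Hypothetical woman: biennial weights (kg), diagnosed 5 years after baseline.
times = [0, 2, 4, 6, 8]
weights = [60, 62, 65, 58, 57]
used = {m: covariate_used(m, times, weights, t=8, dx_time=5)
        for m in ("TTE", "TDC", "TDX")}
print(used)
```

In this made-up example the three methods pick three different weights for the same woman at the same follow-up time, which is the source of the discrepancies discussed in the issues for each method.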
Ordered Multiple Event Analysis (Prentice-Williams-Peterson conditional models; Biometrika 1981;68(2):373-79) Separate models are specified for non-cases and for cases. Issue: the regression coefficients are assumed to be the same for all exposures before and after diagnosis, which may be unrealistic.
Two-stage method (2S) - Notation $S_1(t_1)$ = survival function for incidence = $\Pr(T_1 > t_1)$, where $t_1$ = time from baseline. $S_2(t_2)$ = survival function for mortality due to breast cancer = $\Pr(T_2 > t_2)$, where $t_2$ is time from breast cancer diagnosis. The corresponding hazard functions are $h_1(t_1)$ and $h_2(t_2)$, respectively.
Two-stage method (2S) - Rationale In a standard survival analysis we are interested in estimating the survival function S(t) and the hazard function h(t) where t = time from baseline to the event of interest (in this case death due to breast cancer). The risk set R(t) = set of subjects who are at risk, but have not developed the endpoint as of time t.
Two-stage method (2S) – Rationale (cont.) Usually, all subjects in R(t) are at risk of developing the endpoint at time t, and the probability of getting the endpoint between time $t$ and $t + \Delta t$ is $S(t) - S(t + \Delta t)$, while the probability of not developing the endpoint by time $t$ is $S(t)$. Hence, the risk of developing the endpoint between time $t$ and $t + \Delta t$ is risk = $[S(t) - S(t + \Delta t)] / S(t) \approx h(t)\,\Delta t$.
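A small numerical check of this risk calculation, assuming the usual definition risk = [S(t) − S(t + Δt)] / S(t) and a hypothetical constant hazard of 0.02 per year (an illustrative value, not an NHS estimate):

```python
import math

def interval_risk(surv, t, dt):
    """Probability of the endpoint in (t, t + dt], given event-free at time t."""
    return (surv(t) - surv(t + dt)) / surv(t)

# Hypothetical exponential survival curve with constant hazard `rate`.
rate = 0.02

def surv(t):
    return math.exp(-rate * t)

risk = interval_risk(surv, t=10.0, dt=0.01)
approx = rate * 0.01  # h(t) * dt
print(risk, approx)
```

For a short interval the conditional risk and h(t)Δt agree closely, which is why the hazard can be read as a risk per unit time among those still in the risk set.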
Two-stage method rationale (cont.) However, this construct doesn’t take into account that the only subjects who are at risk of dying of breast cancer between time $t$ and $t + \Delta t$ are women who already have the disease at time $t$. But we can separately estimate this risk based on the survival curves for incidence and mortality mentioned previously:
Two-stage method – Rationale (cont.) Numerator: probability of getting breast cancer shortly after time t1 × probability of dying of breast cancer shortly after t2. Denominator: probability of not getting lethal breast cancer by time t1+t2, which is approximately the probability of not getting breast cancer by time t1+t2 if disease incidence is low.
Two-stage method – Rationale (cont.) (1) hazard of lethal breast cancer = [probability of getting breast cancer between time $t_1$ and time $t_1 + \Delta t_1$ years and dying of breast cancer between $t_2$ and $t_2 + \Delta t_2$ years after diagnosis] / ($\Delta t_1 \Delta t_2$) / [probability of remaining disease-free over $t_1 + t_2$ years].
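The verbal definition on this slide can be illustrated numerically. The sketch below is a reading of equation 1, not the slide's own formula: it assumes constant (exponential) hazards for incidence and for mortality among cases, and the function name and rates are hypothetical.

```python
import math

def two_stage_hazard(t1, t2, rate_inc, rate_mort):
    """Joint density of diagnosis at t1 and death t2 years later,
    divided by the probability of remaining disease-free over t1 + t2 years.
    Assumes constant hazards at both stages (an illustrative simplification)."""
    f1 = rate_inc * math.exp(-rate_inc * t1)      # density of diagnosis at t1
    f2 = rate_mort * math.exp(-rate_mort * t2)    # density of death t2 years after diagnosis
    disease_free = math.exp(-rate_inc * (t1 + t2))
    return f1 * f2 / disease_free

# Hypothetical baseline rates per year, and an exposure that doubles both.
h_ref = two_stage_hazard(t1=5.0, t2=0.0, rate_inc=0.004, rate_mort=0.01)
h_exp = two_stage_hazard(t1=5.0, t2=0.0, rate_inc=0.008, rate_mort=0.02)
print(h_exp / h_ref)
```

Under these assumptions the ratio at t2 = 0 is the product of the two stage-specific hazard ratios (2 × 2 = 4), and it drifts away from that product as t2 grows, which matches the later remark that ln(HR) is in general a function of time since diagnosis.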
Two-stage model – Inclusion of Covariates Fit separate Cox proportional hazards models for incidence (equation 2) and for mortality among cases (equation 3).
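One conventional way to write such a pair of Cox models, using the covariate notation defined earlier (X, Z1, Z2, V), is sketched below. The coefficient symbols β1, β2, γ1, γ2 and δ are assumptions introduced here for illustration and may differ from the symbols used in the original equations 2 and 3.

```latex
\begin{align}
h_1(t_1 \mid X, Z_1) &= h_{01}(t_1)\,\exp\!\left(\beta_1 X + \gamma_1^{\top} Z_1\right)
  && \text{(incidence)} \\
h_2(t_2 \mid X, Z_2, V) &= h_{02}(t_2)\,\exp\!\left(\beta_2 X + \gamma_2^{\top} Z_2 + \delta^{\top} V\right)
  && \text{(mortality among cases)}
\end{align}
```

Here h01 and h02 denote stage-specific baseline hazards, so exp(β1) and exp(β2) are the hazard ratios per unit of exposure for incidence and for survival among cases, respectively.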
Two-stage model – Inclusion of Covariates (cont.) We now consider the log hazard ratio (HR) comparing a subject with exposure x+1 vs. a subject with exposure x at time 0, where all other pre-diagnosis covariates are the same at time 0 and post-diagnosis covariates are the same at time t2. Based on equations 1-3, this is given by equation 4.
Two-stage Model – Inclusion of Covariates (cont.) Note that if we have time-dependent covariates (as we do in the Nurses’ Health Study), then t1 = 0 and equation 4 reduces to equation 5.
Two-stage Model – Inclusion of Covariates (cont.) Because the terms involved are generally small, we approximate equation 5 by a first-order Taylor series expansion, which yields equation 6. In general, ln(HR) is a function of the time since breast cancer diagnosis, t2. However, for cancers where neither incidence nor mortality rates are high, the expression simplifies further (equation 7).
Two-stage model - Inference Standard methods of inference are performed based on either equations 6 or 7 assuming asymptotic normality.
Simulation Study - design We simulated data using an exponential distribution to mimic the HRs for breast cancer incidence and death due to breast cancer for a variety of risk factors. We simulated 4000 datasets of 100,000 observations each under 6 different combinations of HRincidence and HRsurvival, with the expected HR simulated using the 2S method. 28-year breast cancer incidence rates and 30-year breast cancer survival rates were taken from NHS data (similar to SEER rates). The results for TTE, PWP, 2S (eq. 6), and 2S (eq. 7) for a subset of the simulations are given on the next slide.
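A minimal sketch of how one arm of such a simulation might be generated, using only the Python standard library. This is not the authors' simulation code: the baseline rates, the single 28-year follow-up window, and the function name `simulate_lethal_cases` are all assumptions made for illustration.

```python
import random

def simulate_lethal_cases(n, exposed, hr_inc, hr_surv,
                          base_inc=0.004, base_mort=0.01,
                          follow_up=28.0, seed=0):
    """Simulate n women with exponential time to diagnosis and, among cases,
    exponential time from diagnosis to breast cancer death.
    Returns (number diagnosed, number of lethal cases) within follow-up.
    Baseline rates are hypothetical, not NHS or SEER values."""
    rng = random.Random(seed)
    rate_inc = base_inc * (hr_inc if exposed else 1.0)
    rate_mort = base_mort * (hr_surv if exposed else 1.0)
    diagnosed = lethal = 0
    for _ in range(n):
        t1 = rng.expovariate(rate_inc)       # time from baseline to diagnosis
        if t1 >= follow_up:
            continue                         # never diagnosed during follow-up
        diagnosed += 1
        t2 = rng.expovariate(rate_mort)      # time from diagnosis to death
        if t1 + t2 < follow_up:
            lethal += 1                      # died of breast cancer in follow-up
    return diagnosed, lethal

unexposed = simulate_lethal_cases(50_000, exposed=False, hr_inc=2.0, hr_surv=2.0, seed=1)
exposed = simulate_lethal_cases(50_000, exposed=True, hr_inc=2.0, hr_surv=2.0, seed=2)
print(unexposed, exposed)
```

Each simulated woman must pass through diagnosis before she can become a lethal case, so datasets built this way can be analyzed with any of the methods compared in the talk.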
Simulation Study Results Methods HRincidence HRsurvival Variable TTE 2S (eq. 6) 2S (eq. 7) PWP 1.0 Bias* -0.00 Coverage** 95.1 95.0 2.0 Bias -0.02 0.00 -0.01 -0.10 Coverage 93.0 94.9 94.8 5.0 0.04 0.10 -0.50 88.2 88.1 0.0 -0.04 -0.60 75.9 88.6 * mean observed ** % of 95% confidence intervals that include the true
Simulation Study - Discussion Under the null hypothesis that exposure is not associated with either incidence or mortality, all the methods have little bias and adequate coverage. Under the alternative hypothesis (Ha) that exposure is related to incidence and/or survival, the PWP method has substantial bias and low coverage probability. Under Ha, the TTE method has a slight negative bias and a coverage probability less than 95%. Most of the person-time for the TTE method is for incidence.
Simulation Study – Discussion (cont.) Under Ha, the 2S method incorporating survival probabilities at the 2nd stage has low bias and adequate coverage probability. Under Ha where the exposure has an effect on survival, the simplified 2S method has some positive bias and coverage probability < 95%. Overall, the 2S method incorporating survival probabilities at the 2nd stage performs best.
Nurses’ Health Study – Data Analysis Postmenopausal women: follow-up from 1984-2010, 2,532,073 person-years of follow-up; 8675 incident breast cancer cases and 1382 deaths due to breast cancer. Goal: to assess pre-diagnostic risk factors that predict deaths due to breast cancer. On the next slide we show results for weight change since age 18, an established risk factor for breast cancer, using the different methods of analysis mentioned previously in this talk.
Nurses’ Health Study – Data Analysis
Weight change since age 18: HR (95% CI) for gained > 30 kg vs. stayed within 5 kg (reference, HR = 1.0)
Incidence of breast cancer (8675 cases; 2,439,134 person-years): 1.56 (1.42-1.71)
Breast cancer survival among cases (1382 breast cancer deaths; 101,348 person-years): 1.11 (0.86-1.43)
TTE*: 2.32 (1.77-3.03)
TDC**: 0.73 (0.56-0.95)
TDX***: 1.41 (1.10-1.80)
2S (eq. 6): 1.73 (1.33-2.25)
2S (eq. 7): 1.72 (1.32-2.25)
PWP: 1.49 (1.13-1.96)
Total follow-up: 2,532,073 person-years
* covariates are from the baseline questionnaire (1984) and not updated
** covariates are updated throughout follow-up
*** covariates are updated until diagnosis for cases and until 2010 for non-cases
+ results are adjusted for age at menarche, age and type of menopause, age at each birth, hormone therapy, smoking (pack-years), history of benign breast disease, family history of breast cancer, physical activity (MET-hrs/wk), BMI at age 18 and alcohol intake
Nurses’ Health Study – Data Analysis - Discussion Weight change of > 30 kg since age 18 is associated with an increased incidence of breast cancer (HR = 1.56, 95% CI = 1.42-1.71) and a small but not statistically significant increase in breast cancer deaths among cases (HR = 1.11, 95% CI = 0.86-1.43). The TTE method reflecting early weight change (HR = 2.32, 95% CI = 1.77-3.03) and the TDC method of updating weight after breast cancer diagnosis (HR = 0.73, 95% CI = 0.56-0.95) provided dramatically different results. The TDX method of updating weight until diagnosis (HR = 1.41, 95% CI = 1.10-1.80) provided results intermediate between TTE and TDC.
Nurses’ Health Study – Data Analysis – Discussion (cont.) The TDC method is confounded by weight change in response to breast cancer treatment modalities and the TTE method is confounded by large changes in weight since 1984. The TDX method underestimates the HR because weight for cases is updated until diagnosis while weight for non-cases is updated until 2010, thus ignoring the secular trend of an increase in weight over time.
Nurses’ Health Study – Data Analysis – Discussion (cont.) The PWP method (HR = 1.49, 95% CI = 1.13-1.96) is essentially a weighted average of the HR for incidence and survival (emphasizing the former due to the larger number of person-years). The 2S method based on equation 6 (HR = 1.73, 95% CI = 1.33-2.25) integrates the HR for incidence and mortality into one cumulative HR and seems appropriate for this design. The simplified 2S method (eq. 7) yields essentially the same results as the original 2S method (eq. 6). Not all breast cancer risk factors showed differences as large between methods.
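As a rough arithmetic check on how the two stage-specific estimates relate to the 2S result, the sketch below adds the log HRs for incidence and survival and combines their standard errors. It rests on two assumptions that the slides do not confirm: that the simplified 2S estimate is the sum of the two log HRs, and that the two estimates are independent. Standard errors are backed out of the reported 95% confidence intervals.

```python
import math

def log_hr_and_se(hr, lower, upper, z=1.96):
    """Recover ln(HR) and its standard error from a reported 95% CI."""
    return math.log(hr), (math.log(upper) - math.log(lower)) / (2 * z)

def combine_stages(incidence, survival, z=1.96):
    """Sum two log HRs, assuming independent estimates (an assumption)."""
    b1, se1 = log_hr_and_se(*incidence)
    b2, se2 = log_hr_and_se(*survival)
    b = b1 + b2
    se = math.sqrt(se1 ** 2 + se2 ** 2)
    return math.exp(b), math.exp(b - z * se), math.exp(b + z * se)

# Reported HRs (95% CI) for weight gain > 30 kg: incidence, then survival among cases.
hr, lower, upper = combine_stages((1.56, 1.42, 1.71), (1.11, 0.86, 1.43))
print(f"{hr:.2f} ({lower:.2f}-{upper:.2f})")
```

The combined value comes out near 1.73 with an interval of roughly 1.3 to 2.3, close to the 2S estimates quoted above, which is consistent with the description of 2S as adding rather than averaging effects across stages.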
Summary We have presented several approaches for assessing risk factors for lethal cancer among disease-free women. The key difference between this design and the ordinary survival analysis design is that a subject must encounter two events, (a) getting breast cancer and (b) dying of breast cancer, to be considered a “case.” Thus, a person is not in the risk set for the 2nd event until they have realized the 1st event.
Summary (cont.) Some traditional approaches such as TTE or TDC don’t seem appropriate because, in the former case, risk factors may change substantially over a long period of time and, in the latter case, risk factors may be influenced by treatment variables after diagnosis. The TDX method is also inappropriate because it ignores secular trends in risk factors after diagnosis. The usual “multiple events” analysis, such as models for breast cancer recurrence, of which PWP is a prototype, also doesn’t seem appropriate since (a) there is an assumption that effects of risk factors are the same at each stage, and (b) the real goal is to add effects of a risk factor over multiple stages rather than to average them.
Summary (cont.) The 2S method seems appropriate for this design and with approximation can yield a single HR estimate and can be implemented using standard Cox regression software at each stage. It might be applicable to other diseases but would require an extension for diseases that are sometimes immediately fatal at the 1st stage, e.g., heart disease where some subjects may die immediately after a heart attack while others survive and subsequently may die of heart disease at a later age.