Biostatistics Case Studies 2007 Peter D. Christenson Biostatistician Session 2: Aging and Survival
Case Study
Cancer Registry + Medicare Records Death reports through 2002: Maximum follow-up = 10 years. Diagnosed
Follow Up Time
Life Expectancy: Mean Time from Diagnosis to Any Cause of Death How many men were in these subgroups? Calculated?
Some Subgroup Sizes ~35,755 x 0.40 x x x / 3 = ~100 ~105 ~ 40 ~ 86 ~ 96 ~ 36
Life Expectancy: More Detail N < 110 at each age N < 100 at each age N < 45 at each age
Summary So Far This subgroup has ~100 men followed ≤10 years, mean follow-up < 7 years, N=? died. Mean time to death extrapolated to over 9 years beyond maximum follow-up with apparently good precision. How was 19.1 years determined?
A Fixed Cohort Approach: as in a clinical trial, but add extrapolation. [Figure labels: Extrapolated Median Years to Death; 95% “CI”; Fitted Assumed Model; 95% Confidence Bands.] Not done here: with N=100, likely more error than is depicted.
Fixed Cohort Approach If followed long enough or deaths occur fast enough, would not need to extrapolate. Otherwise, would require specifying a model for the survival curve in order to extrapolate. Confidence intervals become wide even with minor extrapolation and do not incorporate uncertainty in the chosen model. The stepped Kaplan-Meier “curve” is the real data. Let’s review Kaplan-Meier before continuing with the paper.
Kaplan-Meier: Cumulated Probabilities Suppose that we want the probability of surviving for 5 years. If no subjects dropped out by 5 years, then this probability is the same as the proportion of subjects alive at that time. If some subjects are lost to us before 5 years, then we cannot use the proportion, because we don’t know the outcome for the subjects who dropped out, and hence we don’t know the numerator. We can divide the 5 years into intervals, using the dropout times as interval endpoints. The Ns are different in these intervals. Then, find the proportion surviving in each interval and cumulate by multiplying these proportions to get the survival probability. See next slide for an example.
Kaplan-Meier: Cumulated Probabilities The survival curve below for made-up data for 100 subjects gives the probability of being alive at 5 years as about 0.35. Suppose 9 subjects dropped out at 2 years and 7 dropped out at 4 years, and 20, 20, and 17 died in the intervals 0-2, 2-4, and 4-5 years. Then the 0-2 year interval has 80/100 surviving, the 2-4 interval has 51/71 surviving, and 4-5 has 27/44 surviving. So, the 5-year survival probability is (80/100)(51/71)(27/44) = 0.35. The actual method uses finer subdivisions than 0-2, 2-4, 4-5 years, with exact death times.
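The interval arithmetic on this slide can be checked with a short Python sketch. It uses only the made-up counts from the slide; the function name `cumulated_survival` and its input layout are illustrative assumptions, and it works with the three coarse intervals rather than the exact death times that a real Kaplan-Meier estimate would use.

```python
from fractions import Fraction

def cumulated_survival(n_start, intervals):
    """Multiply interval survival proportions (coarse-interval sketch).

    intervals: list of (deaths_in_interval, dropouts_at_end_of_interval).
    Returns the cumulated survival probability and the (alive, at_risk)
    pair used for each interval.
    """
    at_risk = n_start
    surv = Fraction(1)
    steps = []
    for deaths, dropouts in intervals:
        alive = at_risk - deaths              # survivors of this interval
        steps.append((alive, at_risk))
        surv *= Fraction(alive, at_risk)      # cumulate by multiplying
        at_risk = alive - dropouts            # dropouts leave before next interval
    return surv, steps

# Slide data: 100 subjects; 20, 20, 17 deaths in 0-2, 2-4, 4-5 years;
# 9 dropouts at 2 years and 7 at 4 years.
surv, steps = cumulated_survival(100, [(20, 9), (20, 7), (17, 0)])
print(steps)                  # interval proportions as (alive, at_risk)
print(round(float(surv), 3))  # cumulated 5-year survival probability
```

Note that the naive proportion 27/100 would understate survival, because it treats the 16 dropouts as if they had died.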
Fixed Cohort with Complete Follow Up Fixed N at age 67, no dropout, everyone has a time to death: We have a direct mean time to death = Total Follow-Up/N = PY/N. Mortality rate = N/PY = 1/MeanTime. [Figure: one line per subject (axes: Subject #, Age); death at end of line.] Sum of line lengths = PY.
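As a small illustration of the relations PY/N and N/PY, the sketch below uses a hypothetical cohort of six subjects, all entering at age 67 and all followed to death. The ages at death are invented for the example and do not come from the study.

```python
# Hypothetical ages at death for a cohort that all entered at age 67
# (invented values, for illustration only).
entry_age = 67
death_ages = [70, 74, 79, 83, 88, 95]

# Each subject's follow-up is the length of their "line" on the slide.
follow_up = [age - entry_age for age in death_ages]

n = len(death_ages)
person_years = sum(follow_up)        # PY = sum of line lengths
mean_time = person_years / n         # mean time to death = PY / N
mortality_rate = n / person_years    # deaths per person-year = N / PY

print(person_years, mean_time, round(mortality_rate, 4))
```

With complete follow-up the two quantities are exact reciprocals, which is why the mean time to death can be read directly from the mortality rate.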
Complete Follow Up Usually Unlikely Unlikely to be able to follow a single cohort so long: Suppose we have death records for one year for everyone 67 or older. Find N died and PY → MeanTime = PY/N. Follow up times are still for ≥67 years, but now, from different birth cohorts, i.e., persons 67 at different calendar times. Can generalize to more than 1 year of records, as in this study. Actuarial or life table methods - next slide.
General Life Table Method Made-up data, with the structure of the fixed cohort with complete follow up. Table columns: Age of Death, % Die, Cum % Survive. Cumulative proportions surviving in successive age intervals: 0.90, 0.72, 0.61, ..., 0.02. Numbers with each follow-up: 0.10*N with this F/U, 0.90*0.20*N with this F/U, 0.72*0.15*N with this F/U, and so on. Sum of line lengths = PY. Mean time = PY/N.
Life Table Method Cont’d Again, in general, follow up contributions are from different birth cohorts. Here, for age-specific time of diagnosis, birth cohorts are <10 years apart. Does not consider chronic conditions developed after diagnosis. Here, neither fixed cohort with complete follow up, nor general life table with all ages included. Does that matter? Next slide.
Life Table without All Age Groups Suppose there were uniform times of death between 67 and 101 years of age. [Figure: one line per subject (axes: Subject #, Age).] There are an equal number of deaths in both periods (67-84 and 84-101), but much more follow up in 67-84, so if a fixed cohort study ends before complete follow up, the mortality rate = deaths/PY is an underestimate. However, mortality from the life table is valid if the rate of diagnoses is in steady state over time.
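The underestimate can be seen numerically in the sketch below. It assumes a hypothetical cohort with one death at the midpoint of each year of age from 67 to 101 (a discrete stand-in for uniform death times) and a study that stops following everyone at age 84. The variable names are illustrative only.

```python
entry_age, max_age, cutoff_age = 67, 101, 84

# Hypothetical uniform pattern: one death at the midpoint of each year of age.
death_ages = [a + 0.5 for a in range(entry_age, max_age)]   # 34 subjects
n = len(death_ages)

# Complete follow-up: every subject contributes time until death.
py_full = sum(a - entry_age for a in death_ages)
rate_full = n / py_full

# Study ends at age 84: later deaths are unobserved, but those subjects
# still contribute person-years up to the cutoff.
deaths_trunc = sum(1 for a in death_ages if a <= cutoff_age)
py_trunc = sum(min(a, cutoff_age) - entry_age for a in death_ages)
rate_trunc = deaths_trunc / py_trunc

print(deaths_trunc, n - deaths_trunc)            # deaths in each period
print(py_trunc, py_full - py_trunc)              # person-years in each period
print(round(rate_full, 4), round(rate_trunc, 4)) # complete vs truncated rate
```

Under these assumptions the two periods have the same number of deaths, but the first period carries about three times the person-years, so the truncated rate (1/25.5 per year) falls well below the complete-follow-up rate (1/17 per year).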
Proportional Hazards Survival Analysis Kaplan-Meier curves use raw data - no modeling. Need separate curve for any subgroup. Cannot use for sparse subgroups due to high stratification or adjustment for factors on a continuum. Note use of proportional hazards model instead of Kaplan-Meier to make multiple adjustments:
Proportional Hazards Survival Analysis Hazard ~= short-term incidence (deaths/person-time) at time t; i.e., is a function over time. Hazard may increase or decrease over time, e.g., aging effects and removing the frail. Constant hazard → exponential survival curve. Hazard function for a baseline condition is unconstrained. The model forces hazards for other conditions to be proportional to it at all times. This is “semi-parametric”. There is not a fitted functional form like exponential, but more structure than Kaplan-Meier is imposed.
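Two of the statements above can be illustrated with a minimal Python sketch: a constant hazard h gives the survival curve exp(-h t), and hazards that are proportional with ratio r give a survival probability equal to the baseline survival probability raised to the power r. The function names and the numeric values (baseline hazard 0.1 per year, ratio 2, time 5 years) are hypothetical and chosen only for the example.

```python
import math

def survival_constant_hazard(h, t):
    """Survival probability at time t when the hazard is constant at h."""
    return math.exp(-h * t)

def survival_under_ph(baseline_surv, hazard_ratio):
    """Survival probability for a group whose hazard is hazard_ratio times
    the baseline hazard at all times (the proportional hazards structure)."""
    return baseline_surv ** hazard_ratio

# Hypothetical values: baseline hazard 0.1 per year, hazard ratio 2, t = 5 years.
b, ratio, t = 0.1, 2.0, 5.0
s_baseline = survival_constant_hazard(b, t)
s_group_direct = survival_constant_hazard(ratio * b, t)
s_group_via_ph = survival_under_ph(s_baseline, ratio)

print(round(s_baseline, 4), round(s_group_direct, 4), round(s_group_via_ph, 4))
```

The power relation holds for any baseline hazard shape, not only a constant one; the constant case is used here only because its survival curve has a simple closed form.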
Proportional Hazards Survival Analysis Proportionality assumption forces structure on survival probabilities that Kaplan-Meier does not. Validity of the assumption can be checked with the data. [Figure, two panels: Hazard Functions (hazard vs. time; h=b, h=5b, h=7b) and Survival Curves (survival probability vs. time; 0 chronic conditions, 1-2 chronic conditions, ≥3 chronic conditions; 46%, 37%, 25%).]