Applications of G-estimation using a new Stata command
Jonathan Sterne, Kate Tilling
Department of Social Medicine, University of Bristol, UK
Outline
– Time-varying confounding and G-estimation
– G-estimation in Stata
– Applications
– Discussion and future plans
A covariate is a time-varying confounder for the effect of exposure on outcome if:
1. past covariate values predict current exposure, and
2. the current covariate value predicts the outcome.
Example:
1. people with low CD4 counts are more likely to start HAART;
2. a low CD4 count is a risk factor for AIDS and death.
If, in addition, past exposure predicts the current covariate value, then standard survival analyses with time-updated exposure and covariates will give biased estimates of the exposure effect. For example, CD4 count predicts HAART, and HAART raises CD4 count.
G-estimation (1)
Assume that subject i has an underlying counterfactual failure time $U_i$: the time to failure had they never been exposed. This is unobservable for subjects who were exposed at any time.
Assume that exposure accelerates failure: it multiplies survival time by a factor $\exp(-\psi)$, the causal survival time ratio. So if $\psi > 0$, exposure decreases survival.
If we knew $\psi$, then for any subject who experienced the outcome event at time $T_i$, the counterfactual failure time could be derived as
$U_i = \int_0^{T_i} \exp\{\psi A_i(t)\}\,dt$, where $A_i(t) = 1$ if subject i was exposed at time t and 0 otherwise.
Example: if subject i experienced the outcome event at 5 years and was exposed for 3 years, then $U_i = 3\exp(\psi) + 2$.
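The counterfactual-time formula can be sketched in Python. The sketch assumes exposure is constant within each follow-up interval, as in the one-record-per-interval survival format used later. The function name `counterfactual_time` and the `(start, end, exposed)` layout are illustrative choices, not part of stgest.

```python
import math


def counterfactual_time(intervals, psi):
    """U_i(psi) = integral from 0 to T_i of exp(psi * A_i(t)) dt.

    intervals: list of (start, end, exposed) tuples covering [0, T_i],
    with exposed = 1 during exposed time and 0 otherwise (assumed layout).
    """
    total = 0.0
    for start, end, exposed in intervals:
        # Exposed time is scaled by exp(psi); unexposed time is unchanged.
        total += (end - start) * math.exp(psi * exposed)
    return total


# Slide example: event at 5 years, exposed for 3 of them.
# The order of exposed and unexposed time does not affect U_i.
history = [(0.0, 3.0, 1), (3.0, 5.0, 0)]
print(counterfactual_time(history, 0.0))          # 5.0 (psi = 0: no effect)
print(counterfactual_time(history, math.log(2)))  # about 8.0 = 3*2 + 2
```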
G-estimation (2)
Assume that there are no unmeasured confounders: conditional on measured history (past and present confounders and past exposure), a subject's present exposure is independent of their counterfactual failure time $U_i$.
For example, for two individuals with identical histories, the decision to quit smoking does not depend on their underlying survival time.
Use logistic regression of present exposure on measured history and $U_i(\psi)$ to search for the value of $\psi$ that satisfies this condition, i.e. the value for which the coefficient of $U_i(\psi)$ is zero.
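The logistic-regression search can be sketched without refitting the model at each candidate $\psi$. This sketch uses the score form of the same test instead: $S(\psi) = \sum (A - \hat p)\,U_i(\psi)$, where $\hat p$ are fitted exposure probabilities from a logistic regression of exposure on measured history without $U_i(\psi)$. The coefficient of $U_i(\psi)$ is estimated as exactly zero when this score is zero, so the point estimate agrees, but this is a swapped-in formulation, not the stgest code. The data and fitted probabilities are hypothetical. Here $\hat p = 0.5$ is what an intercept-only model gives for the two toy records.

```python
import math


def counterfactual_time(intervals, psi):
    # U_i(psi): exposed time scaled by exp(psi), unexposed time unchanged
    return sum((end - start) * math.exp(psi * a) for start, end, a in intervals)


def estimating_function(psi, rows, histories):
    """Score form of the G-estimation condition: sum of (A - p) * U_i(psi).

    rows: (subject id, observed exposure A at a visit, fitted P(A = 1 | history))
    histories: subject id -> list of (start, end, exposed) intervals
    """
    return sum((a - p) * counterfactual_time(histories[i], psi) for i, a, p in rows)


def grid_search(rows, histories, lo=-2.0, hi=2.0, step=0.01):
    # Evaluate the estimating function on a grid; keep the psi where it is closest to zero.
    n = int(round((hi - lo) / step))
    grid = [lo + k * step for k in range(n + 1)]
    return min(grid, key=lambda psi: abs(estimating_function(psi, rows, histories)))


# Hypothetical toy data: subject 1 exposed throughout and failed at 4 years,
# subject 2 never exposed and failed at 2 years. Here S(psi) = 2*exp(psi) - 1.
histories = {1: [(0.0, 4.0, 1)], 2: [(0.0, 2.0, 0)]}
rows = [(1, 1, 0.5), (2, 0, 0.5)]
print(round(grid_search(rows, histories), 2))  # -0.69 (exact root is -log 2)
```

A negative $\psi$ here means exposure lengthened survival, matching the toy data in which the exposed subject failed later.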
Censoring
No competing risks: replace $U_i(\psi)$ with a variable indicating whether the individual would have been observed to fail both if they had been exposed and if they had been unexposed.
Competing risks: assume that, conditional on known covariates, censoring due to competing risks is independent of failure time. Estimate the cumulative probability of remaining free from competing risks until the end of follow-up, and weight by the inverse of this probability.
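The "observed to fail whether exposed or unexposed" indicator can be sketched with the artificial-censoring device from the structural accelerated failure time literature. With end of follow-up $C_i$, define $C_i(\psi) = C_i \min\{1, \exp(\psi)\}$; the indicator is 1 when $U_i(\psi) < C_i(\psi)$. That stgest builds the indicator exactly this way, and that $C_i$ corresponds to its lasttime() option, are assumptions. The function names and numbers are hypothetical.

```python
import math


def artificial_censoring_time(c, psi):
    """Largest counterfactual time at which failure would be observed whether the
    subject had been exposed throughout or never exposed, given end of
    follow-up c: c * min(1, exp(psi))."""
    return c * min(1.0, math.exp(psi))


def observed_under_both(u, c, psi):
    """1 if U_i(psi) < C_i(psi). For a subject who did not fail, U_i(psi) computed
    over the whole follow-up is never below C_i(psi), so the indicator is 0."""
    return 1 if u < artificial_censoring_time(c, psi) else 0


# Hypothetical: follow-up ends at 10 years and psi = -log 2 (exposure doubles
# survival time), so C_i(psi) = 5 years.
psi = -math.log(2)
print(observed_under_both(4.0, 10.0, psi))  # 1
print(observed_under_both(6.0, 10.0, psi))  # 0
```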
The stgest command
– Written for Stata
– User specifies the exposure, covariates (including baseline and lagged covariates) and any censoring variables
– Data set up in Stata survival-analysis format (i.e. start time, end time and failure indicator for each interval for each individual)
– Uses interval bisection to search for the G-estimate and its 95% CI (or the user can specify a range and 'step' for a grid search)
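Interval bisection, as named above, can be sketched generically. This sketch assumes the test statistic z(ψ) for the coefficient of $U_i(\psi)$ changes monotonically over the search range; the slides do not state this. Under that assumption the G-estimate is where z(ψ) = 0, and the 95% limits are where z(ψ) = ±1.96. The default range −2 to 2 mirrors the range(-2 2) option in the Caerphilly examples. The z function used here is hypothetical.

```python
def bisect(f, lo, hi, tol=1e-6, max_iter=200):
    """Root of f on [lo, hi] by interval bisection; f(lo) and f(hi) must differ in sign."""
    f_lo, f_hi = f(lo), f(hi)
    if f_lo == 0:
        return lo
    if f_hi == 0:
        return hi
    if f_lo * f_hi > 0:
        raise ValueError("f(lo) and f(hi) must have opposite signs")
    for _ in range(max_iter):
        mid = 0.5 * (lo + hi)
        f_mid = f(mid)
        if f_mid == 0 or 0.5 * (hi - lo) < tol:
            return mid
        if f_lo * f_mid < 0:
            hi = mid               # sign change lies in [lo, mid]
        else:
            lo, f_lo = mid, f_mid  # sign change lies in [mid, hi]
    return 0.5 * (lo + hi)


def g_estimate_with_ci(z, lo=-2.0, hi=2.0, crit=1.96):
    """Point estimate solves z(psi) = 0; 95% limits solve z(psi) = +/- crit."""
    est = bisect(z, lo, hi)
    a = bisect(lambda psi: z(psi) - crit, lo, hi)
    b = bisect(lambda psi: z(psi) + crit, lo, hi)
    return est, min(a, b), max(a, b)


def z(psi):
    # Hypothetical, monotone test statistic for the coefficient of U_i(psi).
    return (psi - 0.5) / 0.2


est, lower, upper = g_estimate_with_ci(z)
print(round(est, 3), round(lower, 3), round(upper, 3))  # 0.5 0.108 0.892
```

Bisection halves the bracket at each step, so it needs far fewer evaluations of the logistic model than a fine grid, at the cost of relying on a single sign change within the range.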
Caerphilly study
– 2512 men first examined 1979 to 1983; mean age at baseline 52 years
– Three further follow-up surveys, with ascertainment of MI and deaths to August 2000
– Data from the first examination are used to provide baseline exposure measures, so follow-up starts from the second examination
– 1756 men included in analyses
– 244 had a first MI or died from CHD between the second examination and the end of follow-up
Data
Baseline: smoking history, age, self-reported CHD, gout, diabetes, high blood pressure
Every visit: BP, BMI, smoking status, total cholesterol, CHD, gout, diabetes, fibrinogen
Censoring
Four possibilities:
– Not censored: 1175 (66.9%)
– MI or MI death: 244 (13.9%)
– Death from other cause: 231 (13.2%)
– Lost to follow-up: 106 (6.0%)
Multinomial logistic regression is used to estimate each individual's probability of being censored (the last two categories) at each examination. The probability of remaining uncensored is the product, over examinations, of one minus these probabilities.
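The probability of remaining uncensored, and the inverse-probability weight built from it, can be sketched as follows. The sketch assumes per-examination fitted probabilities for the two censoring categories (death from other cause, loss to follow-up) are already available from the multinomial model. That this is the quantity passed to stgest's pnotcens() option is an assumption. The function names and probabilities are hypothetical.

```python
def prob_not_censored(exam_probs):
    """Cumulative probability of remaining uncensored across examinations.

    exam_probs: per-examination (p_other_death, p_lost) pairs, e.g. fitted
    probabilities for the two censoring categories of a multinomial model.
    """
    prob = 1.0
    for p_other_death, p_lost in exam_probs:
        prob *= 1.0 - (p_other_death + p_lost)
    return prob


def ipc_weight(exam_probs):
    # inverse-probability-of-censoring weight for an individual who stays uncensored
    return 1.0 / prob_not_censored(exam_probs)


# Hypothetical fitted probabilities at two examinations.
probs = [(0.05, 0.05), (0.15, 0.05)]
print(round(prob_not_censored(probs), 2))  # 0.72
print(round(ipc_weight(probs), 2))         # 1.39
```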
. list id visit examdat exitdate mi examdat2 cursmok if touse
(listing output omitted)
. stset exitdate, id(id) failure(mi) origin(time examdat2) scale(365.25)
                id:  id
     failure event:  mi ~= 0 & mi ~= .
obs. time interval:  (exitdate[_n-1], exitdate]
 exit on or before:  failure
    t for analysis:  (time-origin)/365.25
            origin:  time examdat2
(observation counts and total analysis time omitted)
      1756  subjects
       244  failures in single failure-per-subject data
            at risk from t = 0
            earliest observed entry t = 0
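What stset computes here can be sketched in Python. Each record's interval runs from the previous record's exit to this exit, (exitdate[_n-1], exitdate], measured in years (scale(365.25)) since the second examination. Dropping records that end on or before the origin is our reading of the "obs. end on or before enter()" line. The function name and dates are hypothetical, and the sketch assumes at most one failure, on the subject's last record.

```python
from datetime import date


def analysis_times(records, origin, scale=365.25):
    """Mimic (_t0, _t, _d) for one subject's records.

    records: time-ordered (exitdate, failed) pairs, one per interval.
    Analysis time is (date - origin) / scale years. Records ending on or
    before the origin are dropped; each kept interval starts where the
    previous kept record ended (or at t = 0).
    """
    out = []
    previous = 0.0
    for exitdate, failed in records:
        t = (exitdate - origin).days / scale
        if t <= 0:
            continue
        out.append((previous, t, 1 if failed else 0))
        previous = t
    return out


origin = date(1984, 1, 1)  # hypothetical second-examination date
records = [(date(1984, 1, 1), False), (date(1987, 1, 1), False), (date(1989, 1, 1), True)]
for row in analysis_times(records, origin):
    print(tuple(round(x, 3) for x in row))
# (0.0, 3.001, 0)
# (3.001, 5.002, 1)
```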
. list id visit examdat exitdate mi _t0 _t _d _st if touse, noobs nodisp
(listing output omitted)
. makebase cursmok hearta gout highbp diabet fibrin chol cholsq /*
> */ bpsyst bpdias obese thin, firstvis(1) visit(visit)
Baseline confounders
              storage  display    value
variable name   type   format     label      variable label
Bcursmok        byte   %9.0g
Bhearta         byte   %9.0g
Bgout           byte   %9.0g
Bhighbp         byte   %9.0g
Bdiabet         byte   %9.0g
Bfibrin         float  %9.0g
Bchol           float  %9.0g
Bcholsq         float  %9.0g
Bbpsyst         int    %9.0g
Bbpdias         int    %9.0g
Bobese          byte   %9.0g
Bthin           byte   %9.0g
. makelag cursmok hearta gout highbp diabet fibrin chol cholsq /*
> */ bpsyst bpdias obese thin, firstvis(1) visit(visit)
Lagged confounders
              storage  display    value
variable name   type   format     label      variable label
Lcursmok        byte   %9.0g
Lhearta         byte   %9.0g
Lgout           byte   %9.0g
Lhighbp         byte   %9.0g
Ldiabet         byte   %9.0g
Lfibrin         float  %9.0g
Lchol           float  %9.0g
Lcholsq         float  %9.0g
Lbpsyst         int    %9.0g
Lbpdias         int    %9.0g
Lobese          byte   %9.0g
Lthin           byte   %9.0g
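The B* and L* variables can be illustrated with a Python sketch. It assumes makebase copies each confounder's value at visit firstvis() to every record of the same individual, and makelag takes the value from the individual's previous record. This is our reading of the variable names and options, not the makebase/makelag code. With missing visits the real commands may behave differently. The function name and data are hypothetical.

```python
def add_baseline_and_lag(rows, varnames, firstvis=1):
    """Add B<var> (value at visit `firstvis`) and L<var> (value at the subject's
    previous record, None at the first) to each row dict, in place.

    rows: list of dicts with keys "id", "visit" and each name in varnames.
    """
    baseline = {r["id"]: {v: r[v] for v in varnames}
                for r in rows if r["visit"] == firstvis}
    previous = {}
    for r in sorted(rows, key=lambda r: (r["id"], r["visit"])):
        base = baseline.get(r["id"], {})
        prev = previous.get(r["id"])
        for v in varnames:
            r["B" + v] = base.get(v)
            r["L" + v] = prev[v] if prev is not None else None
        previous[r["id"]] = {v: r[v] for v in varnames}
    return rows


rows = [
    {"id": 1, "visit": 1, "cursmok": 1, "chol": 5.0},
    {"id": 1, "visit": 2, "cursmok": 1, "chol": 6.0},
    {"id": 1, "visit": 3, "cursmok": 0, "chol": 5.5},
]
add_baseline_and_lag(rows, ["cursmok", "chol"])
print(rows[2]["Bchol"], rows[2]["Lchol"], rows[2]["Lcursmok"])  # 5.0 6.0 1
```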
. stcox cursmok Agegrp* hearta gout highbp diabet fibrin chol cholsq bpsyst bpdias obese thin B* L*
         failure _d:  mi
   analysis time _t:  (exitdate-origin)/365.25
             origin:  time examdat2
                 id:  id
No. of subjects =  1756          Number of obs =  4621
No. of failures =   244
(time at risk, LR chi2(41), log likelihood and the hazard ratio table omitted)
. stgest cursmok Agegrp* fibrin hearta gout highbp diabet chol cholsq bpsyst bpdias obese thin, visit(visit) firstvis(2) lagconf(cursmok fibrin hearta gout highbp diabet chol cholsq bpsyst bpdias obese thin) baseconf(fibrin hearta gout highbp cursmok chol cholsq diabet bpsyst bpdias obese thin) lasttime(mienddat) range(-2 2) saveres(caergestsmoknocens) replace
causvar: cursmok
visit: visit
Range: -2 2, rnum: 2
Search method: interval bisection
savres: caergestsmoknocens
G estimate of psi for cursmok: … (95% CI … to 0.368)
Causal survival time ratio for cursmok: … (95% CI … to 1.001)
. weibull _t cursmok Agegrp* hearta gout highbp diabet fibrin chol cholsq bpsyst bpdias obese thin B* L* if visit>=2, dead(_d) t0(_t0) hr
(output omitted)
. gesttowb
g-estimated hazard ratio 1.28 (1.00 to 1.47)
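gesttowb reports the G-estimate on the hazard-ratio scale, presumably so it can be compared with the Weibull fit. Under a Weibull model with shape parameter p, a causal survival time ratio $\exp(-\psi)$ corresponds to a hazard ratio $\exp(p\psi)$. A minimal sketch of that conversion follows. That gesttowb uses exactly this formula, with p taken from the preceding weibull fit, is our assumption; the function names and numbers are hypothetical.

```python
import math


def weibull_hr_from_psi(psi, shape):
    """Hazard ratio implied by survival time ratio exp(-psi) under a Weibull
    model with shape p: S1(t) = S0(t * exp(psi)) gives HR = exp(p * psi)."""
    return math.exp(shape * psi)


def convert_estimate(psi_est, psi_lower, psi_upper, shape):
    # exp(p * psi) is increasing in psi for p > 0, so CI limits map directly
    return tuple(weibull_hr_from_psi(x, shape) for x in (psi_est, psi_lower, psi_upper))


# Hypothetical values: psi = log 2 with shape p = 1 (exponential model) gives HR 2.
print(round(weibull_hr_from_psi(math.log(2), 1.0), 6))  # 2.0
```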
. * allowing for censoring due to competing risks;
. stgest cursmok Agegrp* fibrin hearta gout highbp diabet chol cholsq bpsyst bpdias obese thin, visit(visit) firstvis(2) lagconf(fibrin hearta gout highbp diabet cursmok chol cholsq bpsyst bpdias obese thin) baseconf(fibrin hearta gout highbp cursmok chol cholsq diabet bpsyst bpdias obese thin) lasttime(mienddat) saveres(caergestsmok) replace idcens(idcrcens) range(-2 2) pnotcens(pnotcens)
G estimate of psi for cursmok: … (95% CI … to 0.773)
Causal survival time ratio for cursmok: … (95% CI … to 1.210)
. gesttowb
g-estimated hazard ratio 1.34 (0.82 to 2.19)
Atherosclerosis Risk in Communities (ARIC) study
– 15,792 members of 4 communities in the USA
– Baseline examination beginning in 1987, with follow-up examinations at 3-year intervals
– Followed up for death, CHD and stroke
ARIC data
Baseline: smoking history, education level, age, sex, ethnicity, self-reported stroke/CHD
Every visit: BP, BMI, smoking status, total, HDL and LDL cholesterol, diabetes status, use of anti-hypertensive medication
ARIC data
– … persons with data on visits 1 and 2, of whom … (55%) were female
– Mean age 54 years (min 45, max 65)
– CHD present in 625 (5%)
– 9754 (70%) not on anti-hypertensive medication at visits 1 or 2
Methods
– Weibull analysis and G-estimation
– Outcomes: death, incident CHD
– CHD as outcome: exclude those with CHD at baseline (1st visit); censor at death from other causes
– Exposures: BP, smoking, BMI, HDL, LDL
– BP: exclude those on anti-hypertensives at baseline; censor at start of anti-hypertensive use
Results
Published in the American Journal of Epidemiology, April 15th:
Tilling K, Sterne JAC, Szklo M. G-estimation of the effects of cardiovascular risk factors on all-cause mortality and CHD: the ARIC study. Am J Epidemiol 2002; 155.
Summary: effects tended to be underestimated by Weibull analysis compared with G-estimation.
Discussion: model specification
The model specified that exposure at a given visit multiplies survival time from that moment by a fixed factor.
Alternatives:
– the effect on survival lasts only for a given period (e.g. use of anti-hypertensives)
– the effect on survival starts only after a given period (e.g. a possible lagged effect of smoking)
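One hypothetical way to encode these alternatives is to replace the recorded exposure with the exposure "acting" in each interval before computing $U_i(\psi)$. In this sketch, `lag` delays the start of an effect by a number of intervals. `duration` lets an exposure recorded at one visit act for a fixed number of intervals. Both parameters, and the equal-interval setup, are illustrative assumptions, not features of stgest.

```python
import math


def effective_exposure(exposure, lag=0, duration=None):
    """Exposure acting in each interval under alternative specifications.

    exposure: 0/1 exposure recorded at each visit, one entry per interval.
    lag: number of intervals before an exposure starts to act.
    duration: number of intervals an exposure keeps acting once it starts
    (None: it acts only in the interval in which it is recorded).
    """
    n = len(exposure)
    shifted = [exposure[j - lag] if j >= lag else 0 for j in range(n)]
    if duration is None:
        return shifted
    return [1 if any(shifted[max(0, j - duration + 1): j + 1]) else 0 for j in range(n)]


def counterfactual_time(lengths, acting, psi):
    # U_i(psi) with exp(psi) applied only to intervals in which exposure acts
    return sum(length * math.exp(psi * a) for length, a in zip(lengths, acting))


# Hypothetical subject exposed at the first of four 3-year intervals.
exposure = [1, 0, 0, 0]
print(effective_exposure(exposure, lag=1))       # [0, 1, 0, 0]
print(effective_exposure(exposure, duration=2))  # [1, 1, 0, 0]
```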
Future work and (we hope) collaboration
– Implement marginal structural models (MSMs) in Stata
– Effect of cardiovascular risk factors (e.g. smoking, fibrinogen) and anti-hypertensives in the Caerphilly study
– Effect of treatments (e.g. anti-hypertensives, anti-platelet agents) on stroke recurrence, using the South London Stroke Register
Future work and (we hope) collaboration
Causal effect of HAART
– When to start
– Effect of different drug combinations
– Will require large collaborations between cohorts
– Aim to build on an existing collaboration between 13 cohorts involving patients starting HAART