The parametric g-formula and inverse probability weighting
Sara Lodi, Harvard T.H. Chan School of Public Health, slodi@hsph.harvard.edu
February 26th 2016
Recap from yesterday
- Observational studies should emulate a target trial without baseline randomization
- G-methods are needed in the presence of treatment-confounder feedback
- IPW is used to correct for time-varying confounding and informative censoring
Treatment-confounder feedback
- A_t: antiretroviral therapy
- L_t: CD4 cell count
- Y: death
- U: immunologic status
[Figure: causal diagram with nodes A0, L1, A1, Y and U]
- The time-varying confounders are affected by previous treatment
IPW: inverse probability weighting of marginal structural models
- Creates a pseudo-population with no confounding by L1
- Any outcome model can be used (marginal structural model)
Inverse probability weighting of marginal structural models
- A model for treatment. Covariates: time-varying confounders
- A weighted model for the outcome. Covariates: time-varying treatment and (optionally) baseline confounders, but no time-varying confounders
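As a hedged illustration of how the treatment model feeds the weighted outcome analysis, the sketch below builds stabilized weights as a cumulative product over time and then takes a weighted average of the outcome, which is the simplest possible weighted outcome model. The function names and all probabilities are hypothetical. In practice the probabilities would come from fitted treatment models, with the denominator conditioning on time-varying confounders and the numerator not.

```python
def stabilized_weights(num_probs, den_probs):
    """Cumulative stabilized weights over time for one person.

    num_probs[k]: probability of the treatment actually received at time k
                  given past treatment only (assumed to come from a fitted model).
    den_probs[k]: the same probability, additionally given confounder history.
    """
    weights, w = [], 1.0
    for p_num, p_den in zip(num_probs, den_probs):
        w *= p_num / p_den
        weights.append(w)
    return weights


def weighted_risk(outcomes, weights):
    """Weighted proportion with the outcome (risk in the pseudo-population)."""
    return sum(y * w for y, w in zip(outcomes, weights)) / sum(weights)


# Hypothetical person observed at two time points.
print(stabilized_weights([0.5, 0.8], [0.25, 0.8]))  # [2.0, 2.0]
```

A person whose treatment was unlikely given their confounder history (small denominator) receives a large weight, which is why extreme weights can dominate the analysis.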
Today
- Introduce the g-formula: another method to adjust for treatment-confounder feedback
- Discuss differences and similarities between the g-formula and IPW
Example: when to start antiretroviral treatment (ART) in HIV-positive patients
- Combined antiretroviral treatment (ART) is effective in reducing the risk of AIDS and mortality
- HIV is now considered a chronic disease
- Life-long treatment
- Debate on the optimal time to initiate ART
[Figure: strategies compared: early initiation, initiation at 500 or AIDS, initiation at 350 or AIDS (Lodi et al. CID 2011)]
When to start ART: target trial (HIV-CAUSAL Collaboration. Lancet HIV 2015)
Eligibility criteria: HIV-1-infected, ART-naïve, CD4 count > 500
Treatment strategies: (1) immediate ART initiation; (2) initiation at CD4 cell count < 500 or AIDS; (3) initiation at CD4 cell count < 350 or AIDS
Outcome: death
Start/end of follow-up: from randomization to death, loss to follow-up, or 7 years
Analysis plan: risk of death at 7 years under each treatment strategy
Dynamic treatment strategy
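A dynamic strategy is a rule that maps each patient's evolving covariates to a treatment decision. The sketch below encodes the three strategies of the target trial as a single rule with a CD4 threshold. The function and dictionary names are hypothetical, and treating immediate initiation as an infinite threshold is a convenience assumed here, not something stated in the slides.

```python
import math

# Hypothetical encoding: each strategy is a CD4 threshold (cells/mm3).
STRATEGIES = {
    "immediate": math.inf,      # every CD4 value is below infinity
    "CD4<500 or AIDS": 500.0,
    "CD4<350 or AIDS": 350.0,
}


def should_start_art(cd4, aids, threshold):
    """True if the strategy calls for ART initiation at this visit."""
    return aids or cd4 < threshold


# The same patient visit leads to different decisions under different strategies.
for name, threshold in STRATEGIES.items():
    print(name, should_start_art(cd4=420.0, aids=False, threshold=threshold))
```

Because the decision depends on the current CD4 count, two patients following the same strategy can start ART at different times.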
In the observational data…
[Figure: example patient timelines showing HIV diagnosis, CD4 and HIV-RNA measurements, ART start, treatment change, TB diagnosis, death and last visit, with administrative censoring on 1 June 2015]
In the observational data…
- No baseline randomization
- ART is a time-varying treatment
- ART initiation depends on prognostic factors that vary over time (time-varying confounders), such as CD4 count and HIV-RNA viral load
Treatment-confounder feedback
- A_t: antiretroviral therapy
- L_t: CD4 cell count
- Y: death
- U: immunologic status
[Figure: causal diagram with nodes A0, L1, A1, Y and U]
- The time-varying confounders are affected by previous treatment
Could use inverse probability weighting…
… or the g-formula
- First proposed in 1986 by Robins
- First realistic application to a complex longitudinal study published in 2009 (Taubman et al. Int J Epidemiol 2009)
- SAS software and documentation developed and publicly available online: www.hsph.harvard.edu/causal/software.htm
G-formula
- Standardized risk
- Allows estimation and comparison of risks under hypothetical interventions in the presence of treatment-confounder feedback
- Simulates outcomes and time-varying confounders as if all subjects in the study, contrary to fact, had followed the intervention
- Can be viewed as an imputation method
G-formula as standardized risk
Time-fixed confounding: standardized risk for fixed exposure a* and time-fixed confounder L
- a* = antiretroviral treatment (ART = 1)
- Y = death
- L = CD4 count stratum (<350 or >=350 cells/mm3)
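For reference, the usual expression for a standardized risk with a time-fixed exposure and a time-fixed confounder is sketched below in the notation of this slide. It is written from the standard definition and may differ in layout from the formula displayed in the lecture.

```latex
\[
\Pr\!\left[Y^{a^*} = 1\right]
  \;=\; \sum_{l} \Pr\!\left[Y = 1 \mid A = a^*,\, L = l\right]\,\Pr\!\left[L = l\right]
\]
```

Each stratum-specific risk among the treated is weighted by how common that CD4 stratum is in the whole study population.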
G-formula as standardized risk
Time-fixed confounding: standardized risk for fixed exposure a* and time-fixed confounder L
Time-varying confounding: standardized risk for treatment strategy g* and time-varying confounder L (the g-formula)
- a* = antiretroviral treatment (ART = 1)
- Y = death
- L = CD4 count stratum (<350 or >=350 cells/mm3)
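A minimal numerical sketch of the time-fixed case: the standardized risk is the average of stratum-specific risks among the treated, weighted by the population distribution of the CD4 stratum. All numbers and the function name are hypothetical and chosen only to make the arithmetic easy to follow.

```python
def standardized_risk(risk_by_stratum, stratum_prob):
    """Sum over strata of Pr[Y=1 | A=a*, L=l] * Pr[L=l]."""
    return sum(risk_by_stratum[l] * stratum_prob[l] for l in stratum_prob)


# Hypothetical risks of death among those on ART, by CD4 stratum.
risk_on_art = {"<350": 0.2, ">=350": 0.1}
# Hypothetical distribution of CD4 strata in the whole study population.
cd4_distribution = {"<350": 0.25, ">=350": 0.75}

print(standardized_risk(risk_on_art, cd4_distribution))  # about 0.125
```

The time-varying g-formula generalizes this weighted average to whole histories of confounders and treatment.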
G-formula notation
- k = 0, 1, …, K: time after randomization
- g: treatment strategy (example: initiation at 2nd CD4 < 350)
- Y_{k+1}: outcome at time k+1 (composite event: NAIDS, AIDS or death)
- Ā_k: treatment history up to time k (example: 0,0,0,…,1,1)
- L̄_k: history of time-varying confounders (CD4 and HIV-RNA) up to time k
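With this notation, one common way to write the g-formula for the risk by time K+1 under a deterministic strategy g is sketched below. Overbars denote histories, Y_k indicates that the outcome has occurred by time k, and the treatment history implied by g is written as a superscripted lower-case a. The conventions that quantities indexed by -1 are empty and that Y_0 = 0 are assumptions of this sketch, and the layout may differ from the formula shown in the lecture.

```latex
\[
\Pr\!\left[Y^{g}_{K+1} = 1\right]
  = \sum_{k=0}^{K} \sum_{\bar{l}_k}
    \Pr\!\left[Y_{k+1} = 1 \mid \bar{A}_k = \bar{a}^{g}_k,\ \bar{L}_k = \bar{l}_k,\ Y_k = 0\right]
    \prod_{j=0}^{k}
    \Big\{
      f\!\left(l_j \mid \bar{l}_{j-1},\ \bar{a}^{g}_{j-1},\ Y_j = 0\right)
      \Pr\!\left[Y_j = 0 \mid \bar{A}_{j-1} = \bar{a}^{g}_{j-1},\ \bar{L}_{j-1} = \bar{l}_{j-1},\ Y_{j-1} = 0\right]
    \Big\}
\]
```

The factors are a discrete-time hazard of the outcome, the conditional density of the confounders at each time, and the probability of remaining event-free up to that time.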
Parametric g-formula, STEP 1: fit regression models to the observed data to estimate each factor in the g-formula sum
Parametric g-formula, STEP 1
- A model for each time-varying confounder and for the outcome
- Covariates: baseline and time-varying confounders, treatment
- When to start ART example:
  - Time-varying confounders: linear model for CD4 count, linear model for HIV-RNA
  - Outcome: logistic model for death
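To make the three models concrete, one possible specification is sketched below. The functional forms, the choice of lagged covariates, the coefficient symbols and the use of L_0 for baseline confounders are all illustrative assumptions; the slides state only the model types (linear, linear, logistic).

```latex
\[
\begin{aligned}
E\!\left[\mathrm{CD4}_k \mid \cdot\,\right]
  &= \beta_0 + \beta_1\,\mathrm{CD4}_{k-1} + \beta_2\,\mathrm{RNA}_{k-1} + \beta_3\,A_{k-1} + \beta_4^{\top} L_0 \\
E\!\left[\mathrm{RNA}_k \mid \cdot\,\right]
  &= \gamma_0 + \gamma_1\,\mathrm{CD4}_{k} + \gamma_2\,\mathrm{RNA}_{k-1} + \gamma_3\,A_{k-1} + \gamma_4^{\top} L_0 \\
\operatorname{logit}\,\Pr\!\left[Y_{k+1} = 1 \mid \cdot\,,\ Y_k = 0\right]
  &= \theta_0 + \theta_1\,\mathrm{CD4}_{k} + \theta_2\,\mathrm{RNA}_{k} + \theta_3\,A_{k} + \theta_4^{\top} L_0
\end{aligned}
\]
```

Each model is fitted to the observed person-time data, and the fitted coefficients are then reused in the simulation step.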
Parametric g-formula, STEP 2: Monte Carlo simulations
- For each treatment strategy, we use the model parameters to simulate a dataset in which all subjects follow the treatment strategy
- Simulate time-varying covariates and outcome at each time point (0, 1, 2, …)
- Simulations are carried forward in time
Parametric g-formula, STEP 2
When to start ART example:
- Simulate a dataset where all individuals start ART immediately
- Simulate a dataset where all individuals start ART at CD4 count < 500 or AIDS
- Simulate a dataset where all individuals start ART at CD4 count < 350 or AIDS
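The simulation step can be sketched as follows. This is a deliberately simplified illustration, not the analysis used in the study: it tracks only CD4 count and death, ignores HIV-RNA and AIDS, and uses made-up coefficients in place of the Step 1 model fits. The names `CD4_MODEL`, `DEATH_MODEL` and `simulate_risk` are hypothetical.

```python
import math
import random

# Hypothetical coefficients standing in for the Step 1 model fits (not study estimates).
CD4_MODEL = {"intercept": 40.0, "prev_cd4": 0.9, "art": 50.0, "sd": 40.0}
DEATH_MODEL = {"intercept": -3.0, "cd4_per_100": -0.4, "art": -0.3}


def simulate_risk(baseline_cd4, threshold, n_periods, rng):
    """Simulated risk of death by n_periods if everyone starts ART the first
    time CD4 is below `threshold` (math.inf means immediate initiation)."""
    deaths = 0
    for cd4 in baseline_cd4:
        on_art = False
        for _ in range(n_periods):
            # Treatment is set by the strategy, not drawn from a model.
            on_art = on_art or cd4 < threshold
            # Outcome drawn from the (hypothetical) logistic model.
            logit = (DEATH_MODEL["intercept"]
                     + DEATH_MODEL["cd4_per_100"] * cd4 / 100.0
                     + DEATH_MODEL["art"] * on_art)
            if rng.random() < 1.0 / (1.0 + math.exp(-logit)):
                deaths += 1
                break
            # Next CD4 value drawn from the (hypothetical) linear model.
            mean = (CD4_MODEL["intercept"] + CD4_MODEL["prev_cd4"] * cd4
                    + CD4_MODEL["art"] * on_art)
            cd4 = max(0.0, rng.gauss(mean, CD4_MODEL["sd"]))
    return deaths / len(baseline_cd4)


baseline = [650.0, 520.0, 580.0, 700.0] * 250  # hypothetical baseline CD4 counts
for label, threshold in [("immediate", math.inf), ("CD4<500", 500.0), ("CD4<350", 350.0)]:
    risk = simulate_risk(baseline, threshold, n_periods=7, rng=random.Random(1))
    print(label, round(risk, 3))
```

Within each period the order is: apply the strategy to the current covariates, draw the outcome, then draw the next covariate value, so the simulation moves forward in time one step at a time.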
Parametric g-formula, STEP 3
- Use the simulated datasets to compute and compare the risk of the outcome under different treatment strategies
- Use the bootstrap to compute confidence intervals
(HIV-CAUSAL Collaboration. Lancet HIV 2015)
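A percentile bootstrap can be sketched as below. For the g-formula, the estimator applied to each resample would refit the Step 1 models and rerun the Step 2 simulation; here a simple proportion stands in so the example stays self-contained. The function name, the number of resamples and the data are hypothetical.

```python
import random


def percentile_bootstrap_ci(data, estimator, n_boot=500, alpha=0.05, seed=0):
    """Percentile confidence interval from resampling individuals with replacement."""
    rng = random.Random(seed)
    n = len(data)
    estimates = []
    for _ in range(n_boot):
        resample = [data[rng.randrange(n)] for _ in range(n)]
        estimates.append(estimator(resample))
    estimates.sort()
    lower = estimates[int((alpha / 2) * n_boot)]
    upper = estimates[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


def proportion(outcomes):
    """Stand-in estimator: observed proportion with the outcome."""
    return sum(outcomes) / len(outcomes)


# Hypothetical outcomes: 30 deaths among 200 individuals.
outcomes = [1] * 30 + [0] * 170
print(percentile_bootstrap_ci(outcomes, proportion))
```

Resampling whole individuals, rather than person-time records, keeps each person's full history together in every bootstrap sample.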
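Parametric g-formula
Assumptions
- No unmeasured or residual baseline or time-varying confounding
- Models for time-varying confounders and outcome are correctly specified
- Positivity: every treatment value required by the strategies has probability greater than zero within every level of confounder history that occurs in the population
Limitation
- G-null paradox (with treatment-confounder feedback, the parametric models can be mutually incompatible with the null of no treatment effect, so the null may be rejected even when it is true)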
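A rough empirical check related to positivity is to look for confounder strata in which only one treatment value is ever observed. The sketch below does this for a binary treatment; the function name and records are hypothetical, and an empty result does not establish that positivity holds, only that no violation is visible in the data at this level of stratification.

```python
from collections import defaultdict


def positivity_violations(records):
    """Return confounder strata in which not both treatment values (0 and 1) occur.

    records: iterable of (stratum, treatment) pairs, with treatment coded 0 or 1.
    """
    seen = defaultdict(set)
    for stratum, treatment in records:
        seen[stratum].add(treatment)
    return sorted(s for s, values in seen.items() if values != {0, 1})


# Hypothetical person-time records: nobody with CD4 >= 350 is treated.
records = [("<350", 1), ("<350", 0), (">=350", 0), (">=350", 0)]
print(positivity_violations(records))  # ['>=350']
```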
Other applications of the g-formula
- Estimation of the risk of coronary heart disease under interventions on risk factors (smoking, exercise, diet and weight loss) (Taubman et al. 2009)
- Estimation of the risk of adult-onset asthma under interventions on BMI and physical activity (Garcia-Aymerich et al. 2014)
G-formula OR inverse probability weighting
G-formula vs IPW (similarities)
Both the g-formula and inverse probability weighting:
- Adjust for time-varying confounders affected by prior exposure
- Can compare multiple strategies simultaneously
- Assume no unmeasured confounding, correct model specification and positivity
G-formula vs IPW (differences)
Models
- G-formula: a model for each time-varying covariate and for the outcome
- IPW: a model for treatment and a model for the outcome
Sensitivity
- G-formula: sensitive to model misspecification (errors reverberate in the simulations)
- IPW: sensitive to extreme observations (large weights)
Approach
- G-formula: fully parametric, based on maximum likelihood estimation (more efficient)
- IPW: semiparametric (less efficient)
Summary
- The g-formula estimates the counterfactual risk under hypothetical interventions
- Both the g-formula and IPW can be used to estimate the effect of interventions in the presence of treatment-confounder feedback
- The g-formula requires many models but in general gives more precise estimates (narrower confidence intervals)
References
IPW and dynamic treatment strategies
- HIV-CAUSAL Collaboration. When to initiate combined antiretroviral therapy to reduce mortality and AIDS-defining illness in HIV-infected persons. Ann Intern Med. 2011 Apr 19;154(8):509-15.
Applications of the g-formula
- HIV-CAUSAL Collaboration. Comparative effectiveness of immediate antiretroviral therapy versus CD4-based initiation in HIV-positive individuals. Lancet HIV. 2015 Aug;2(8):e335-43.
- Taubman et al. Intervening on risk factors for coronary heart disease: an application of the parametric g-formula. Int J Epidemiol. 2009 Dec;38(6):1599–1611.
- Garcia-Aymerich et al. Incidence of adult-onset asthma after hypothetical interventions on body mass index and physical activity. Am J Epidemiol. 2014 Jan 1;179(1):20-6.
G-formula macro
- www.hsph.harvard.edu/causal/software.htm