Applications of G-estimation using a new Stata command. Jonathan Sterne, Kate Tilling.

Presentation transcript:

Applications of G-estimation using a new Stata command
Jonathan Sterne, Kate Tilling
Department of Social Medicine, University of Bristol, UK

Outline
– Time-varying confounding and G-estimation
– G-estimation in Stata
– Applications
– Discussion and future plans

A covariate is a time-varying confounder for the effect of exposure on outcome if:
1. past covariate values predict current exposure
2. the current covariate value predicts outcome
Example:
1. people with low CD4 counts are more likely to get HAART
2. low CD4 count is a risk factor for AIDS and death
If, in addition, past exposure predicts the current covariate value, then standard survival analyses with time-updated exposure will give biased exposure effect estimates. For example, CD4 count predicts HAART and HAART raises CD4 counts.

G-estimation (1)
Assume that subject i has an underlying counterfactual failure time U_i: the time to failure had they never been exposed. This is unobservable for subjects who were exposed at any time.
Assume that exposure accelerates failure time by a factor exp(-ψ), the causal survival time ratio. So if ψ > 0, exposure decreases survival.
If we knew ψ, then for any subject who experienced the outcome event at time T_i, the counterfactual failure time could be derived by:
U_i = \int_0^{T_i} \exp(\psi \, e_i(t)) \, dt, where e_i(t) = 1 if subject i was exposed at time t and 0 otherwise.
Example: if subject i experienced the outcome event at 5 years and was exposed for 3 years, then U_i = 3 × exp(ψ) + 2.
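The worked example on this slide can be reproduced with a few lines of code. The following is a minimal sketch in Python (not Stata, and not part of stgest); the function name `counterfactual_time` and the `(start, stop, exposed)` record layout are assumptions made for illustration. It assumes exposure is constant within each interval, so each interval contributes its length multiplied by exp(ψ) if exposed and by 1 otherwise.

```python
import math


def counterfactual_time(intervals, psi):
    """Return U(psi) for one subject.

    intervals: list of (start, stop, exposed) tuples covering follow-up
    from time 0 to the observed failure time, with exposed equal to 0 or 1.
    Each interval contributes (stop - start) * exp(psi * exposed).
    """
    total = 0.0
    for start, stop, exposed in intervals:
        total += (stop - start) * math.exp(psi * exposed)
    return total


# Slide example: failure at 5 years, exposed for the first 3 years.
history = [(0.0, 3.0, 1), (3.0, 5.0, 0)]
psi = 0.2  # illustrative value only
u = counterfactual_time(history, psi)  # equals 3 * exp(psi) + 2
```

With ψ = 0 the function returns the observed failure time, which matches the interpretation that exposure then has no effect on survival.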

G-estimation (2)
Assume that there are no unmeasured confounders: conditional on measured history (past and present confounders and past exposure), subjects' present exposure is independent of their counterfactual failure time U_i.
e.g. for 2 individuals with identical histories, the decision to quit smoking does not depend on underlying survival time.
Use logistic regression to search for a value of ψ that satisfies this condition, i.e. a value for which U_i(ψ) does not predict present exposure.
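The search idea can be illustrated on a toy data set. This is a deliberately simplified sketch in Python, not the stgest algorithm: it assumes a single exposure fixed from baseline, no confounders and no censoring, and it replaces the logistic regression with a simpler score-type statistic (the covariance between exposure and U(ψ)), which is zero exactly when the mean of U(ψ) is the same in exposed and unexposed subjects. All names are hypothetical; the search range of -2 to 2 mirrors the `range(-2 2)` option used later in the slides.

```python
import math


def association(times, exposure, psi):
    """Covariance-type statistic between exposure and U(psi).

    For a subject exposed throughout, U(psi) = T * exp(psi);
    for a never-exposed subject, U(psi) = T.
    """
    u = [t * math.exp(psi * e) for t, e in zip(times, exposure)]
    mean_e = sum(exposure) / len(exposure)
    mean_u = sum(u) / len(u)
    return sum((e - mean_e) * (x - mean_u) for e, x in zip(exposure, u))


def g_estimate(times, exposure, lo=-2.0, hi=2.0, tol=1e-8):
    """Interval bisection for the psi at which the association is zero."""
    f_lo = association(times, exposure, lo)
    f_hi = association(times, exposure, hi)
    if f_lo * f_hi > 0:
        raise ValueError("no sign change in the search range")
    while hi - lo > tol:
        mid = (lo + hi) / 2
        f_mid = association(times, exposure, mid)
        if f_lo * f_mid <= 0:
            hi = mid          # root lies in [lo, mid]
        else:
            lo, f_lo = mid, f_mid  # root lies in [mid, hi]
    return (lo + hi) / 2


# Toy data: both groups share the same counterfactual times; exposure
# shortens survival by the factor exp(-true_psi).
true_psi = 0.5
baseline = [2.0, 4.0, 6.0]
times = baseline + [u * math.exp(-true_psi) for u in baseline]
exposure = [0, 0, 0, 1, 1, 1]
psi_hat = g_estimate(times, exposure)
```

In this constructed example the search recovers the value of ψ used to generate the data; with real data the statistic would come from a model that also conditions on the measured history.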

Censoring
No competing risks: replace U(ψ) with a variable indicating whether the individual would have been observed to fail both if they were exposed and if they were unexposed.
Competing risks: assume that, conditional on known covariates, censoring due to competing risks is independent of failure time. Estimate the cumulative probability of being free from competing risks until the end of follow-up, and weight by the inverse of this probability.

The stgest command
– Written for Stata
– User specifies exposure, covariates (including baseline and lagged covariates) and any censoring variables
– Data set up in Stata survival analysis format (i.e. start time, end time and failure indicator for each interval for each individual)
– Uses interval bisection method to search for the G-estimate and 95% CI (or the user can specify a range and 'step' for a grid search)
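The survival analysis format mentioned on this slide has one record per interval per individual. The sketch below, in Python rather than Stata, shows one way such records could be built from examination times; the function name and argument layout are hypothetical, and the tuple fields only mirror the roles of Stata's `_t0`, `_t` and `_d` variables.

```python
def to_intervals(visit_times, exit_time, failed):
    """Split one subject's follow-up into (t0, t, d) records.

    visit_times: increasing examination times; the first is the time origin.
    exit_time:   time of failure or censoring.
    failed:      True if follow-up ended with the outcome event.
    """
    origin = visit_times[0]
    # Only examinations that happen before exit start a new interval.
    starts = [v for v in visit_times if v < exit_time]
    records = []
    for k, start in enumerate(starts):
        stop = starts[k + 1] if k + 1 < len(starts) else exit_time
        # The failure indicator is set only on the final interval.
        d = 1 if (failed and stop == exit_time) else 0
        records.append((start - origin, stop - origin, d))
    return records


# A subject examined at years 0, 5 and 9 who fails at year 12.
rows = to_intervals([0.0, 5.0, 9.0], 12.0, True)
```

Time-updated covariates measured at each examination would then be attached to the interval that starts at that examination.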

Caerphilly study
– 2512 men first examined 1979 to 1983, mean age at baseline 52 years
– Three further follow-up surveys with ascertainment of MI and deaths to August 2000
– Data from the first examination are used to provide baseline exposure measures, so follow-up starts from the second examination
– 1756 men included in analyses
– 244 had a first MI or died from CHD between the second examination and the end of follow-up

Data
Baseline: smoking history, age, self-reported CHD, gout, diabetes, high blood pressure
Every visit: BP, BMI, smoking status, total cholesterol, CHD, gout, diabetes, fibrinogen

Censoring
Four possibilities:
– Not censored: 1175 (66.9%)
– MI or MI death: 244 (13.9%)
– Death from other cause: 231 (13.2%)
– Lost: 106 (6.0%)
Multinomial logistic regression is used to estimate the probability that each subject was censored (last two categories), combining the estimated probabilities at each examination as a product.
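The weighting step can be sketched as follows. This Python fragment is illustrative only and assumes one particular reading of the slides: that the per-examination censoring probabilities from the multinomial model are turned into a cumulative probability of remaining uncensored, and that this is the quantity passed to stgest as `pnotcens`. The function names and the input probabilities are hypothetical.

```python
def prob_not_censored(p_censor_by_exam):
    """Cumulative probability of remaining uncensored across examinations.

    p_censor_by_exam: estimated probability of being censored at each
    examination, conditional on being uncensored up to that point.
    """
    prob = 1.0
    for p in p_censor_by_exam:
        prob *= 1.0 - p
    return prob


def censoring_weight(p_censor_by_exam):
    """Inverse-probability weight for a subject who stayed uncensored."""
    return 1.0 / prob_not_censored(p_censor_by_exam)


# Illustrative probabilities only (not taken from the Caerphilly data).
p_stay = prob_not_censored([0.1, 0.2])   # 0.9 * 0.8
weight = censoring_weight([0.1, 0.2])
```

Subjects with characteristics that make censoring likely receive larger weights, so that the uncensored sample stands in for those who were lost or died of other causes.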

. list id visit examdat exitdate mi examdat2 cursmok if touse
(output: one row per subject visit, with columns id, visit, examdat, exitdate, mi, examdat2 and cursmok; values not recoverable)

. stset exitdate, id(id) failure(mi) origin(time examdat2) scale(365.25)

                id:  id
     failure event:  mi ~= 0 & mi ~= .
obs. time interval:  (exitdate[_n-1], exitdate]
 exit on or before:  failure
    t for analysis:  (time-origin)/365.25
            origin:  time examdat2

  obs. remaining, representing 1756 subjects
  244 failures in single failure-per-subject data
  total analysis time at risk, at risk from t = 0
  earliest observed entry t = 0
(observation counts, total time at risk and last observed exit not recoverable)

. list id visit examdat exitdate mi _t0 _t _d _st if touse, noobs nodisp
(output: one row per subject visit, with columns id, visit, examdat, exitdate, mi, _t0, _t, _d and _st; values not recoverable)

. makebase cursmok hearta gout highbp diabet fibrin chol cholsq /*
> */ bpsyst bpdias obese thin, firstvis(1) visit(visit)

Baseline confounders

              storage  display
variable name   type   format
Bcursmok        byte   %9.0g
Bhearta         byte   %9.0g
Bgout           byte   %9.0g
Bhighbp         byte   %9.0g
Bdiabet         byte   %9.0g
Bfibrin         float  %9.0g
Bchol           float  %9.0g
Bcholsq         float  %9.0g
Bbpsyst         int    %9.0g
Bbpdias         int    %9.0g
Bobese          byte   %9.0g
Bthin           byte   %9.0g

. makelag cursmok hearta gout highbp diabet fibrin chol cholsq /*
> */ bpsyst bpdias obese thin, firstvis(1) visit(visit)

Lagged confounders

              storage  display
variable name   type   format
Lcursmok        byte   %9.0g
Lhearta         byte   %9.0g
Lgout           byte   %9.0g
Lhighbp         byte   %9.0g
Ldiabet         byte   %9.0g
Lfibrin         float  %9.0g
Lchol           float  %9.0g
Lcholsq         float  %9.0g
Lbpsyst         int    %9.0g
Lbpdias         int    %9.0g
Lobese          byte   %9.0g
Lthin           byte   %9.0g
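What the baseline (B-prefixed) and lagged (L-prefixed) variables represent can be shown with a small sketch. This is Python rather than Stata and is not the makebase or makelag code: the function name is hypothetical, and details such as how those commands treat missing visits are not covered by the slides, so missing values are simply left as `None` here.

```python
def add_baseline_and_lag(rows, var, first_visit=1):
    """Add baseline ('B' + var) and previous-visit ('L' + var) values.

    rows: list of dicts with keys 'id', 'visit' and var.
    Returns new dicts sorted by id and visit; a value that is not
    available (e.g. the lag at the first visit) is set to None.
    """
    by_id = {}
    for r in rows:
        by_id.setdefault(r["id"], {})[r["visit"]] = r[var]
    out = []
    for r in sorted(rows, key=lambda r: (r["id"], r["visit"])):
        values = by_id[r["id"]]
        new = dict(r)
        new["B" + var] = values.get(first_visit)       # value at first visit
        new["L" + var] = values.get(r["visit"] - 1)    # value at previous visit
        out.append(new)
    return out


# One subject who smokes at visits 1 and 2 and has quit by visit 3.
data = [
    {"id": 1, "visit": 1, "cursmok": 1},
    {"id": 1, "visit": 2, "cursmok": 1},
    {"id": 1, "visit": 3, "cursmok": 0},
]
expanded = add_baseline_and_lag(data, "cursmok")
```

The baseline value is constant within a subject, while the lagged value changes from visit to visit; both enter the models on the later slides through the `B*` and `L*` variable lists.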

. stcox cursmok Agegrp* hearta gout highbp diabet fibrin chol cholsq bpsyst bpdias obese thin B* L*

         failure _d:  mi
   analysis time _t:  (exitdate-origin)/365.25
             origin:  time examdat2
                 id:  id

No. of subjects = 1756          Number of obs = 4621
No. of failures = 244
Time at risk    =
                                LR chi2(41)   =
Log likelihood  =               Prob > chi2   =

      _t _d | Haz. Ratio   Std. Err.   z   P>|z|   [95% Conf. Interval]
    cursmok |
(remaining output omitted)

. stgest cursmok Agegrp* fibrin hearta gout highbp diabet chol cholsq bpsyst bpdias obese thin, visit(visit) firstvis(2) lagconf(cursmok fibrin hearta gout highbp diabet chol cholsq bpsyst bpdias obese thin) baseconf(fibrin hearta gout highbp cursmok chol cholsq diabet bpsyst bpdias obese thin) lasttime(mienddat) range(-2 2) saveres(caergestsmoknocens) replace

causvar: cursmok
visit: visit
Range: -2 2, rnum: 2
Search method: interval bisection
savres: caergestsmoknocens

G estimate of psi for cursmok: (95% CI to 0.368)
Causal survival time ratio for cursmok: (95% CI to 1.001)

. weibull _t cursmok Agegrp* hearta gout highbp diabet fibrin chol cholsq bpsyst bpdias obese thin B* L* if visit>=2, dead(_d) t0(_t0) hr

      _t | Haz. Ratio   Std. Err.   z   P>|z|   [95% Conf. Interval]
 cursmok |
(rest of output omitted)

. gesttowb
g-estimated hazard ratio 1.28 ( 1.00 to 1.47)
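The link between a g-estimated ψ and a hazard ratio can be motivated as follows. This is a sketch under an assumption the slides do not state: that gesttowb relies on the standard equivalence between the accelerated failure time and proportional hazards forms of a Weibull model with shape parameter p (the symbols λ and p are introduced here for illustration). If unexposed survival is Weibull and exposure multiplies survival time by exp(-ψ), then:

```latex
\begin{align*}
S_0(t) &= \exp\!\left(-\lambda t^{p}\right) \\
S_1(t) &= S_0\!\left(t\,e^{\psi}\right)
        = \exp\!\left(-\lambda\, e^{p\psi}\, t^{p}\right) \\
\mathrm{HR} &= \frac{h_1(t)}{h_0(t)} = e^{p\psi}
\end{align*}
```

Under this reading, ψ > 0 corresponds to a hazard ratio above 1, and the reported upper limits (ψ = 0.368, hazard ratio 1.47) would be consistent with a shape parameter slightly above 1.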

. * allowing for censoring due to competing risks;
. stgest cursmok Agegrp* fibrin hearta gout highbp diabet chol cholsq bpsyst bpdias obese thin, visit(visit) firstvis(2) lagconf(fibrin hearta gout highbp diabet cursmok chol cholsq bpsyst bpdias obese thin) baseconf(fibrin hearta gout highbp cursmok chol cholsq diabet bpsyst bpdias obese thin) lasttime(mienddat) saveres(caergestsmok) replace idcens(idcrcens) range(-2 2) pnotcens(pnotcens)

G estimate of psi for cursmok: (95% CI to 0.773)
Causal survival time ratio for cursmok: (95% CI to 1.210)

. gesttowb
g-estimated hazard ratio 1.34 ( 0.82 to 2.19)

Atherosclerosis Risk in Communities (ARIC) study
– 15,792 members of 4 communities in the USA
– baseline exam between 1987 and follow-up exams at 3 year intervals
– followed up for death, CHD and stroke

ARIC data
Baseline: smoking history, education level, age, sex, ethnicity, self-reported stroke/CHD
Every visit: BP, BMI, smoking status, total, HDL and LDL cholesterol, diabetes status, use of anti-hypertensive medication

ARIC data
persons with data on visits 1 and (55%) female
Mean age = 54 (min = 45, max = 65)
CHD present in 625 (5%)
9754 (70%) not on anti-hypertensive medication at visits 1 or 2

Methods
– Weibull analysis and G-estimation
– Outcomes: death, incident CHD
– CHD as outcome: exclude those with CHD at baseline/1st visit, censor if they die of other causes
– Exposures: BP, smoking, BMI, HDL, LDL
– BP: exclude those on anti-hypertensives at baseline, censor at anti-hypertensive use

Results
Published in the American Journal of Epidemiology, April 15th
Tilling K, Sterne JAC, Szklo M. G-estimation of the effects of cardiovascular risk factors on all-cause mortality and CHD: the ARIC study. AJE 2004; 155:
Summary: effects tended to be under-estimated by Weibull compared to g-estimation.

Discussion - model specification
The model specified that exposure at a given visit multiplies survival from that moment by a given amount.
Alternatives:
– effect on survival only lasts for a given period (e.g. use of anti-hypertensives)
– effect on survival starts after a given period (e.g. possible lagged effect of smoking)
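One of the alternatives listed, an effect that starts only after a given period, can be sketched by shifting the exposure history before computing the counterfactual failure time. This Python fragment is a hypothetical illustration of that idea, not something the slides or stgest implement; the function name, the `lag` parameter and the rule that exposure at time t takes effect at time t + lag are all assumptions.

```python
import math


def counterfactual_time_lagged(exposed_spells, failure_time, psi, lag=0.0):
    """U(psi) when exposure only affects survival `lag` time units later.

    exposed_spells: non-overlapping (start, stop) periods of exposure.
    Each spell is shifted forward by `lag` and truncated at failure_time;
    time inside a shifted spell is scaled by exp(psi), other time is not.
    """
    effective = 0.0
    for start, stop in exposed_spells:
        lo = min(start + lag, failure_time)
        hi = min(stop + lag, failure_time)
        effective += max(hi - lo, 0.0)
    return (failure_time - effective) + effective * math.exp(psi)


# Failure at 5 years, exposed during the first 3 years.
u_no_lag = counterfactual_time_lagged([(0.0, 3.0)], 5.0, 0.2)
u_lag_3 = counterfactual_time_lagged([(0.0, 3.0)], 5.0, 0.2, lag=3.0)
```

With no lag this reduces to the calculation from the G-estimation (1) slide; with a 3 year lag only the last 2 years of follow-up are affected by the exposure.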

Future work and (we hope) collaboration
– Implement MSMs in Stata
– Effect of cardiovascular risk factors (e.g. smoking, fibrinogen) and anti-hypertensives in the Caerphilly study
– Effect of treatments (e.g. anti-hypertensives, anti-platelet agents) on stroke recurrence using the South London Stroke Register

Future work and (we hope) collaboration
Causal effect of HAART
– When to start
– Effect of different drug combinations
– Will require large collaborations between cohorts
– Aim to build on an existing collaboration between 13 cohorts involving patients starting HAART