Confounding adjustment: Ideas in Action -a case study Xiaochun Li, Ph.D. Associate Professor Division of Biostatistics Indiana University School of Medicine.

Presentation transcript:

2 Outline
- Description of the data set
- Quantity to be estimated
- Summary of baseline characteristics
- Approaches to data analyses
- Results
- Discussion

3 Simulation Setup
Lindner Center data, described and analyzed in Kereiakes et al. (2000)
6-month follow-up data on 996 patients who
- underwent an initial Percutaneous Coronary Intervention (PCI)
- were treated with "usual care" alone or usual care plus a relatively expensive blood thinner (a IIb/IIIa cascade blocker)
The data set has 10 variables
- Y: 2 outcomes, mort6mo (efficacy) and cardcost (cost)
- X: 1 treatment variable and 7 baseline covariates: stent, height, female, diabetic, acutemi, ejecfrac and ves1proc

4 Baseline characteristics
stent       coronary stent deployment
female      patient sex
diabetic    diabetes mellitus
acutemi     acute myocardial infarction
ves1proc    number of vessels involved in initial PCI
height      height in centimeters
ejecfrac    left ejection fraction (%)

5 The "LSIM10K" dataset
The simulation data set was based on the Lindner Center data
17 copies of the clustered Lindner data, with fudge factors added to ejfract and hgt, and some clipping
- same correlation among covariates, same clustering patterns
Contains the values of 10 simulated variables for 10,325 hypothetical patients
To simplify analyses, the data contain no missing values.
Details and dataset are available from Bob's website

6 What do we want to estimate?
The population average treatment effect (ATE), i.e., $E(Y_1) - E(Y_0)$
$Y_1$ and $Y_0$ are counterfactual outcomes
In plain words: "what if" scenarios
The expected response if treatment had been assigned to the entire study population, minus the expected response if control had been assigned to the entire study population
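To make the counterfactual definition concrete, here is a minimal Python sketch on made-up data. The risk score `x`, the outcome probabilities and the treatment-assignment rule are all hypothetical and are not taken from the LSIM10K data. Because the simulation records both potential outcomes for every patient, the ATE can be computed directly and compared with the naive treated-versus-control difference.

```python
import random

def simulate_counterfactuals(n, seed=0):
    """Simulate hypothetical patients with BOTH potential outcomes known."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x = rng.random()                                  # hypothetical baseline risk score
        y0 = 1 if rng.random() < 0.05 + 0.10 * x else 0   # outcome if assigned control
        y1 = 1 if rng.random() < 0.02 + 0.08 * x else 0   # outcome if assigned treatment
        t = 1 if rng.random() < 0.2 + 0.6 * x else 0      # higher-risk patients treated more often
        rows.append((t, y0, y1))
    return rows

def true_ate(rows):
    """E(Y1) - E(Y0): everyone treated versus everyone on control."""
    n = len(rows)
    return sum(y1 for _, _, y1 in rows) / n - sum(y0 for _, y0, _ in rows) / n

def naive_difference(rows):
    """Observed treated mean minus observed control mean (confounded by x)."""
    treated = [y1 for t, _, y1 in rows if t == 1]
    control = [y0 for t, y0, _ in rows if t == 0]
    return sum(treated) / len(treated) - sum(control) / len(control)

rows = simulate_counterfactuals(200_000)
ate = true_ate(rows)
naive = naive_difference(rows)
print(f"ATE = {ate:.4f}, naive difference = {naive:.4f}")
```

By construction the population ATE in this toy setup is 0.06 - 0.10 = -0.04. The naive difference understates the benefit because sicker patients are more likely to be treated. In real data only one of the two potential outcomes is observed per patient, which is why adjustment methods are needed.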

7 Baseline covariate balance assessment
Variable       C (Usual care alone)    T (Usual care + Abciximab)    P value
stent          63%                     69%                           <0.001
female         33%                     34%                           0.36
diabetic       23%                     19%                           <0.001
acutemi        7%                      15%                           <0.001
ves1proc       1.4 (± 0.6)             1.3 (± 0.6)                   <0.001
height (cm)    (± 10)                  171.5 (± 10)                  <0.001
ejfract        53 (± 8)                50 (± 10)                     <0.001

8 Visualizing overall imbalance
[Figure: covariate values by group, C and T; deep blue = high values]

9 Analytical methods for confounding adjustment
The following methods were applied to lsim10k
- Outcome regression adjustment (OR)
- Propensity score (PS) stratification
- Inverse-probability-of-treatment-weighted (IPTW) estimation
- Doubly robust (DR) estimation
- Matching by
  - Mahalanobis distance
  - PS only
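As an illustration of the weighting idea, the Python sketch below computes two common IPTW variants on simulated data: one with unnormalized weights and one with weights normalized within each arm. The slides refer to IPTW1 and IPTW2 without defining them, so mapping those labels onto these two variants would be an assumption. The data-generating model, the true propensity score and the true effect of -2.0 are all hypothetical.

```python
import random

def simulate(n, seed=1):
    """Hypothetical data: x confounds treatment and outcome; true ATE is -2.0."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x = rng.random()
        ps = 0.2 + 0.6 * x                      # true propensity score P(T=1 | x)
        t = 1 if rng.random() < ps else 0
        y = 10.0 + 5.0 * x - 2.0 * t + rng.gauss(0.0, 1.0)
        data.append((t, y, ps))
    return data

def naive(data):
    """Unadjusted difference in observed means."""
    y1 = [y for t, y, _ in data if t == 1]
    y0 = [y for t, y, _ in data if t == 0]
    return sum(y1) / len(y1) - sum(y0) / len(y0)

def iptw_unnormalized(data):
    """Weights 1/ps and 1/(1-ps), divided by the total sample size."""
    n = len(data)
    mu1 = sum(t * y / ps for t, y, ps in data) / n
    mu0 = sum((1 - t) * y / (1 - ps) for t, y, ps in data) / n
    return mu1 - mu0

def iptw_normalized(data):
    """Same weights, but each arm is divided by its own sum of weights."""
    w1 = sum(t / ps for t, _, ps in data)
    w0 = sum((1 - t) / (1 - ps) for t, _, ps in data)
    mu1 = sum(t * y / ps for t, y, ps in data) / w1
    mu0 = sum((1 - t) * y / (1 - ps) for t, y, ps in data) / w0
    return mu1 - mu0

data = simulate(50_000)
print(naive(data), iptw_unnormalized(data), iptw_normalized(data))
```

The sketch uses the true propensity score for simplicity; in practice it would be replaced by an estimate from a fitted PS model. The normalized version is typically much less variable than the unnormalized one.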

10 Analysis of mort6mo
OR model for mort6mo:
- treatment indicator (trtm)
- main effect terms for all seven covariates
- quadratic terms for both height and ejfract
- Residual deviance: ... on ... degrees of freedom
PS model:
- saturated model for the five categorical covariates (main effects and interaction terms up to fifth order)
- main effects and quadratic terms for height and ejfract
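A parametric PS model of this kind is a logistic regression with main effects and quadratic terms. The Python sketch below fits a much smaller version by Newton-Raphson: one binary covariate `a` and one standardized continuous covariate `h` with its square. The covariates, coefficients and sample size are hypothetical, and the model is far simpler than the saturated-plus-quadratic specification described above.

```python
import math
import random

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x

def fit_logistic(X, t, iters=15):
    """Newton-Raphson for logistic regression; rows of X include the intercept."""
    p = len(X[0])
    beta = [0.0] * p
    for _ in range(iters):
        grad = [0.0] * p
        hess = [[0.0] * p for _ in range(p)]
        for xi, ti in zip(X, t):
            eta = sum(b * v for b, v in zip(beta, xi))
            mu = 1.0 / (1.0 + math.exp(-eta))
            w = mu * (1.0 - mu)
            for j in range(p):
                grad[j] += (ti - mu) * xi[j]
                for k in range(p):
                    hess[j][k] += w * xi[j] * xi[k]
        beta = [b + s for b, s in zip(beta, solve(hess, grad))]
    return beta

# Hypothetical truth: logit P(T=1) = -0.5 + 1.0*a + 0.5*h - 0.3*h^2
truth = [-0.5, 1.0, 0.5, -0.3]
rng = random.Random(4)
X, t = [], []
for _ in range(5000):
    a = 1 if rng.random() < 0.3 else 0      # binary covariate
    h = rng.gauss(0.0, 1.0)                 # standardized continuous covariate
    row = [1.0, a, h, h * h]                # intercept, main effects, quadratic term
    eta = sum(b * v for b, v in zip(truth, row))
    X.append(row)
    t.append(1 if rng.random() < 1.0 / (1.0 + math.exp(-eta)) else 0)

beta = fit_logistic(X, t)
ps = [1.0 / (1.0 + math.exp(-sum(b * v for b, v in zip(beta, row)))) for row in X]
print([round(b, 2) for b in beta])
```

The fitted probabilities `ps` are the estimated propensity scores that stratification, weighting or matching would then use.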

11 Covariate Balance Evaluations Based on PS Quintiles

12 Stent

13 Female

14 Diabetic

15 Acutemi

16 Ves1proc

17 Height
Strata 2 (0.95 cm) and 3 (-1.50 cm)

18 Height
Existence of residual confounding after adjusting for PS quintiles
The within-stratum between-group height difference (columns: mean, s.d., p)
- Stratum 2:
- Stratum 3:

19 Ejfract
Strata 1 (0.81), 2 (-1.32) and 3 (-0.72)

20 Ejfract
Existence of residual confounding after adjusting for PS quintiles
The within-stratum between-group ejfract difference (columns: mean, s.d., p-value)
- Stratum 1:
- Stratum 2: e-5
- Stratum 3:

21 PS Stratification
Residual confounding within strata
In the PS stratification method, height and ejfract are further adjusted for within each stratum, using stratum-specific
- treatment effect
- height and ejfract main effects and their quadratic terms
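The basic stratification estimator can be sketched in a few lines of Python. This version only takes the difference in means within each PS quintile and averages across strata by stratum size; it does not include the further within-stratum regression adjustment for height and ejfract described above. The simulated data, the use of the true propensity score and the true effect of -2.0 are hypothetical.

```python
import random

def simulate(n, seed=2):
    """Hypothetical data: x confounds treatment and outcome; true ATE is -2.0."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x = rng.random()
        ps = 0.2 + 0.6 * x                      # true propensity score P(T=1 | x)
        t = 1 if rng.random() < ps else 0
        y = 10.0 + 5.0 * x - 2.0 * t + rng.gauss(0.0, 1.0)
        data.append((ps, t, y))
    return data

def naive(data):
    """Unadjusted difference in observed means."""
    y1 = [y for _, t, y in data if t == 1]
    y0 = [y for _, t, y in data if t == 0]
    return sum(y1) / len(y1) - sum(y0) / len(y0)

def ps_stratified_ate(data, n_strata=5):
    """Average of within-stratum mean differences, weighted by stratum size.

    Assumes every stratum contains both treated and control patients.
    """
    ordered = sorted(data, key=lambda r: r[0])   # sort by propensity score
    n = len(ordered)
    total = 0.0
    for s in range(n_strata):
        stratum = ordered[s * n // n_strata:(s + 1) * n // n_strata]
        y1 = [y for _, t, y in stratum if t == 1]
        y0 = [y for _, t, y in stratum if t == 0]
        diff = sum(y1) / len(y1) - sum(y0) / len(y0)
        total += diff * len(stratum) / n
    return total

data = simulate(20_000)
print(naive(data), ps_stratified_ate(data))
```

With five strata the estimate lands close to, but not exactly on, the true effect, because `x` still varies within each quintile. That small remaining gap is the residual confounding that motivates the extra within-stratum adjustment.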

22 Results: mort6mo
Method                u1    u0    △    SE
Outcome Regression
PS strat
IPTW1
IPTW2
DR
Match: Mahalanobis
Match: PS             NA
Results of all methods are consistent, providing evidence of treatment effectiveness at preventing death at 6 months.
True △ = -0.036

23 Analysis of cardcost
cardcost model:
- treatment indicator (trtm)
- main effect terms for all seven covariates
- quadratic terms for both height and ejfract
PS model: same as before
cardcost model of CA with PS stratification, stratum-specific:
- treatment effect
- height and ejfract main effects and their quadratic terms

24 Model checking: OR
Adjusted R-squared:

25 Model checking: OR (log transformed)
Adjusted R-squared:

26 Results: cardcost
Method                 u1    u0    △    SE
OR: original scale
OR: log transformed
PS strat
IPTW1
IPTW2
DR
Match: Mahalanobis
Match: PS              NA

27 IPTW 1 vs 2

28 Discussion
All methods give consistent results on the 2 outcomes
All PS-based results have similar variance, except IPTW1
IPTW estimators depend on an approximately correct PS model
OR depends on an approximately correct outcome model
DR is a fortuitous combination of OR and IPTW: it depends on at least one of the two models being right
Nonparametric estimation of either model may be an alternative to parametric models

29 Double Robustness
Method    PS    outcome    △    SE
IPTW2     wrong    NA
DR        wrong right wrong right wrong wrong
Wrong PS model: adjusts for one covariate, 'acutemi', only
Wrong OR model for cardcost: adjusts for the treatment indicator 'trtm' and the 'acutemi' covariate
By "right", we mean approximately right.
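The double-robustness property can be checked numerically. The Python sketch below uses the augmented IPTW form of a doubly robust estimator, which is one standard choice and may differ from the exact estimator behind the slides. All modelling choices are hypothetical: the "right" PS is the true one from the simulation, the "wrong" PS is a constant, the "right" outcome model is a per-arm straight line in `x`, and the "wrong" outcome model is a per-arm constant. The true effect is -2.0.

```python
import random

def simulate(n, seed=3):
    """Hypothetical data: x confounds treatment and outcome; true ATE is -2.0."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x = rng.random()
        e = 0.2 + 0.6 * x                       # true propensity score
        t = 1 if rng.random() < e else 0
        y = 10.0 + 5.0 * x - 2.0 * t + rng.gauss(0.0, 1.0)
        rows.append((x, t, y, e))
    return rows

def mean(v):
    return sum(v) / len(v)

def fit_line(xs, ys):
    """Least-squares line: the 'right' outcome model in this simulation."""
    mx, my = mean(xs), mean(ys)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    return lambda x: my + slope * (x - mx)

def fit_constant(xs, ys):
    """Arm mean that ignores x: the 'wrong' outcome model."""
    my = mean(ys)
    return lambda x: my

def aipw(rows, ps_right, outcome_right):
    """Augmented IPTW estimate of the ATE under right/wrong working models."""
    p_treated = mean([t for _, t, _, _ in rows])   # 'wrong' PS: ignores x
    fit = fit_line if outcome_right else fit_constant
    m1 = fit([x for x, t, _, _ in rows if t == 1], [y for _, t, y, _ in rows if t == 1])
    m0 = fit([x for x, t, _, _ in rows if t == 0], [y for _, t, y, _ in rows if t == 0])
    total = 0.0
    for x, t, y, e_true in rows:
        e = e_true if ps_right else p_treated
        mu1 = m1(x) + t * (y - m1(x)) / e
        mu0 = m0(x) + (1 - t) * (y - m0(x)) / (1 - e)
        total += mu1 - mu0
    return total / len(rows)

rows = simulate(20_000)
for ps_ok in (True, False):
    for out_ok in (True, False):
        print(f"PS right={ps_ok}, outcome right={out_ok}: {aipw(rows, ps_ok, out_ok):.3f}")
```

In this setup the estimate stays near the true effect whenever at least one working model is right, and drifts to the confounded naive difference only when both are wrong.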

30 Propensity score estimation
The majority of applications in the literature use a parametric logistic regression model that assumes covariates are linear and additive on the log-odds scale
- May include selected interactions and polynomial terms
Accurate PS estimation is impeded by
- High-dimensional covariates: which ones should we de-confound?
- Unknown functional form: how do the covariates relate to treatment selection?
PS model misspecification can substantially bias the estimated treatment effect
A nonparametric approach, e.g., trees, is flexible enough to accommodate nonlinear or non-additive relationships between covariates and treatment assignment

31 Nonparametric regression techniques
Generalized Boosted Models (GBM) to estimate the propensity score function
- Friedman, 2001; Madigan and Ridgeway, 2004; McCaffrey, Ridgeway, and Morral, 2004
- R package: twang
Regression tree model to predict cardcost
- Ripley, 1996; Therneau and Atkinson, 1997
- R package: rpart

32 Generalized Boosted Models (GBM)
A multivariate nonparametric regression technique
Sum of a large set of simple regression trees modelling the log-odds
- gbm finds the MLE of $g(x) = \log\{p(x)/(1-p(x))\}$, where $p(x) = P(T=1 \mid x)$
Predicts treatment assignment from a large number of pretreatment covariates, choosing among them adaptively
Nonlinear
No need to select variables
Can model complex interactions
Invariant to monotone transformations of x
- E.g., same PS estimates whether using age, log(age) or $\mathrm{age}^2$
Outperforms alternative methods in prediction error
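The invariance claim can be illustrated with a toy boosting routine in Python. This is a deliberately simplified sketch, not the algorithm implemented in the gbm or twang packages: it boosts one-split trees (stumps) on a single covariate, fits each stump to the residuals `t - p`, and uses a Newton step for the leaf values. The covariate `age`, the step-shaped true propensity and all tuning values are hypothetical.

```python
import math
import random

def fit_stump(x, r, h, thresholds):
    """Best single split on x for residuals r; leaf values are Newton steps sum(r)/sum(h)."""
    best_gain, best = -1.0, None
    for c in thresholds:
        sl = sr = hl = hr = 0.0
        nl = nr = 0
        for xi, ri, hi in zip(x, r, h):
            if xi <= c:
                sl += ri; hl += hi; nl += 1
            else:
                sr += ri; hr += hi; nr += 1
        if nl == 0 or nr == 0:
            continue
        gain = sl * sl / nl + sr * sr / nr      # least-squares gain of the split
        if gain > best_gain:
            best_gain, best = gain, (c, sl / hl, sr / hr)
    return best

def boost_ps(x, t, rounds=60, lr=0.2, step=30):
    """Boosted stumps on the log-odds scale; returns fitted P(T=1 | x)."""
    n = len(x)
    thresholds = sorted(x)[step - 1::step]      # candidate splits at observed values
    pbar = sum(t) / n
    F = [math.log(pbar / (1 - pbar))] * n       # start from the marginal log-odds
    for _ in range(rounds):
        p = [1 / (1 + math.exp(-f)) for f in F]
        r = [ti - pi for ti, pi in zip(t, p)]   # gradient of the Bernoulli log-likelihood
        h = [pi * (1 - pi) for pi in p]
        c, vl, vr = fit_stump(x, r, h, thresholds)
        F = [f + lr * (vl if xi <= c else vr) for f, xi in zip(F, x)]
    return [1 / (1 + math.exp(-f)) for f in F]

rng = random.Random(5)
age = [rng.uniform(20, 80) for _ in range(600)]
# Hypothetical non-monotone truth: patients aged 40 to 60 are treated far more often.
t = [1 if rng.random() < (0.8 if 40 <= a < 60 else 0.2) else 0 for a in age]

ps_age = boost_ps(age, t)
ps_log = boost_ps([math.log(a) for a in age], t)
ps_sq = boost_ps([a * a for a in age], t)
print(max(abs(p - q) for p, q in zip(ps_age, ps_log)))
```

Because each split depends only on the ordering of the covariate values, the three fits partition the patients identically and give the same fitted propensity scores. The stumps also pick up the bump between ages 40 and 60, which a logistic model linear in age would miss.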

33 Results: cardcost, nonparametric approach
Method                        u1    u0    △    SE
DR: parametric models
DR: GBM + parametric model
DR: GBM + tree

34 Future research
People try quintiles or deciles for propensity score stratification; a data-driven approach (based on the bias-variance tradeoff) is needed to choose the number of strata
Model selection: PS model and outcome model
- Nonparametric estimation of the models may be intuitive, but the properties of the resulting causal estimates are not clear
- Nonparametric caveat: one still needs to define a set of "confounders" based on knowledge of the causal relationships among treatment, outcome and covariates, rather than conditioning indiscriminately on all covariates that are associated with treatment and outcome