Comparing high-dimensional propensity score versus lasso variable selection for confounding adjustment in a novel simulation framework Jessica Franklin.

Comparing high-dimensional propensity score versus lasso variable selection for confounding adjustment in a novel simulation framework Jessica Franklin Instructor in Medicine Division of Pharmacoepidemiology & Pharmacoeconomics Brigham and Women’s Hospital and Harvard Medical School QMC, Department of Quantitative Health Sciences University of Massachusetts Medical School April 15, 2014

Background Administrative healthcare claims data are a popular data source for nonrandomized studies of interventions. Because treatments are not randomized, addressing confounding is the primary methodological challenge.

Claims Data Comprehensive claims databases contain information on patient insurance enrollment and demographics, as well as every healthcare encounter, including diagnoses, procedures, hospitalizations, and medications dispensed. Dates of encounters provide a complete longitudinal record of patients’ healthcare interactions.

New user design Potential confounders are measured prior to initiation of exposure. An active treatment comparator group reduces biases associated with non-user comparators. [Timeline figure. Labels: End of: Data; Enrollment; Exposure initiation; Covariates assessed; Follow-up for outcome events]

Principles of variable selection Brookhart et al. (2006) showed that the best PS model is the one that includes all predictors of outcome (regardless of whether they are associated with exposure). Pearl (2010) and Myers et al. (2011) further noted that including instrumental variables (IVs) can increase bias from unmeasured confounding. IVs are associated with exposure, but not associated with outcome except through exposure.

hd-PS variable selection The high-dimensional propensity score (hd-PS) algorithm screens thousands of diagnoses, medications, and procedure codes and ranks variables according to likelihood of confounding. Relies on the idea that a large number of “proxy” variables can reduce bias from unmeasured confounding. Empirical evidence has shown a reduction in bias.

Shrinkage methods Greenland (2008) suggested regularization methods as preferable to variable selection. Shrinking coefficients allows for efficient estimation, even in models with many degrees of freedom. Shrinkage allows for direct modeling of the outcome even with many potential confounders. Lasso regression provides both shrinkage and principled variable selection: some coefficients are shrunk all the way to 0.
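
The contrast between the two penalties is easiest to see in the special case of an orthonormal design, where each penalized coefficient has a closed form in terms of the unpenalized one. The sketch below is illustrative only: the function names, the penalty value `lam`, and the example coefficients are assumptions, not values from the study.

```python
def lasso_shrink(b, lam):
    """Soft-thresholding: lasso coefficient under an orthonormal design,
    for the objective 0.5 * RSS + lam * sum(|beta|)."""
    if b > lam:
        return b - lam
    if b < -lam:
        return b + lam
    return 0.0  # small coefficients are set exactly to zero


def ridge_shrink(b, lam):
    """Ridge coefficient under an orthonormal design,
    for the objective 0.5 * RSS + 0.5 * lam * sum(beta ** 2)."""
    return b / (1.0 + lam)  # scaled toward zero, never exactly zero


# Hypothetical unpenalized coefficients and penalty strength.
unpenalized = [2.0, 0.6, -0.3, 0.05]
lam = 0.5

lasso = [lasso_shrink(b, lam) for b in unpenalized]
ridge = [ridge_shrink(b, lam) for b in unpenalized]
```

Under these assumptions the lasso drops the two smallest coefficients entirely, while ridge keeps all four at reduced magnitude.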

Objective To compare the performance of: (1) hd-PS variable selection; (2) ridge regression of the outcome on all potential confounders; (3) lasso regression of the outcome on all potential confounders. The goal is maximum reduction in confounding bias.

Comparing high-dimensional methods How can we answer this question? Empirical studies are useful when we “know” the true treatment effect, but even then we can’t determine the contributions of bias and variance to overall error. Ordinary simulation techniques with completely synthetic data cannot capture the complex correlation structure among covariates in claims data.
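
When the true effect is fixed by the simulation, the contributions of bias and variance to overall error can be separated directly. A minimal sketch, assuming a list of per-dataset effect estimates and a known truth (the function name and the example numbers are hypothetical):

```python
from statistics import mean, pvariance


def error_decomposition(estimates, truth):
    """Return (bias, variance, mse) of estimates around a known true value."""
    bias = mean(estimates) - truth
    variance = pvariance(estimates)  # population variance across datasets
    mse = mean((e - truth) ** 2 for e in estimates)
    return bias, variance, mse


# Hypothetical estimates from four simulated datasets; true effect is 0.
bias, variance, mse = error_decomposition([0.1, 0.3, 0.2, 0.4], truth=0.0)
```

With the population variance, the identity mse = bias² + variance holds exactly, which is what makes the decomposition possible in a simulation but not in a single empirical study.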

Plasmode simulation We start with a real empirical cohort study: 49,653 patients, exposed to either ns-NSAIDs or Cox-2 inhibitors (X) and followed for gastrointestinal events (Y). Pre-defined covariates include age, sex, race, and 16 diagnosis/medication/procedure variables (C1). To get reasonable values for associations between covariates and outcome, we estimated a model with: Y ~ X + all pre-defined covariates + interactions between age and binary covariates

Simulation setup True outcome generation model: estimated coefficient values from the observed outcome model, except for the coefficient on exposure: [...]. To create simulated datasets: (1) sample rows with replacement from (X, C); (2) calculate [...] for each patient in the sample; (3) simulate the outcome. We created 500 datasets, each of size 30,000, with outcome prevalence set to 5% and exposure prevalence set to 40%.
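
The resampling steps can be sketched as follows. This is a rough illustration under stated assumptions, not the study's code: the logistic form of the outcome model, the bisection used to hit a target prevalence, all function and column names, the toy cohort, and every coefficient value (including the placeholder exposure coefficient, whose actual value is not preserved in the transcript) are hypothetical.

```python
import math
import random


def expit(z):
    return 1.0 / (1.0 + math.exp(-z))


def linear_predictor(row, coefs, intercept):
    return intercept + sum(coefs[name] * row[name] for name in coefs)


def calibrate_intercept(rows, coefs, target_prevalence):
    """Bisection on the intercept so the mean outcome probability hits the target."""
    lo, hi = -30.0, 30.0
    for _ in range(100):
        mid = (lo + hi) / 2.0
        prev = sum(expit(linear_predictor(r, coefs, mid)) for r in rows) / len(rows)
        if prev < target_prevalence:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


def simulate_dataset(cohort, coefs, intercept, size, rng):
    """Resample (exposure, covariate) rows with replacement, then draw outcomes."""
    sample = [dict(rng.choice(cohort)) for _ in range(size)]  # copies of real rows
    for row in sample:
        p = expit(linear_predictor(row, coefs, intercept))
        row["y"] = 1 if rng.random() < p else 0
    return sample


rng = random.Random(0)
# Toy stand-in for the real cohort; exposure and covariates are kept together.
cohort = [
    {"x": rng.randint(0, 1), "age": rng.randint(65, 90), "prior_bleed": rng.randint(0, 1)}
    for _ in range(200)
]
# Hypothetical log-odds coefficients; "x" is a placeholder for the chosen true effect.
coefs = {"x": 0.0, "age": 0.03, "prior_bleed": 2.36}
b0 = calibrate_intercept(cohort, coefs, 0.05)
sim = simulate_dataset(cohort, coefs, b0, 1000, rng)
```

Because whole rows of (X, C) are resampled, the exposure-covariate and covariate-covariate correlations of the real cohort carry over to every simulated dataset; only the outcome is synthetic.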

True causal diagram Any variables associated with exposure remain associated with exposure. Any correlations among covariates and true confounders remain intact. Associations with outcome are determined by the chosen simulation model. C1 = true confounders, a subset of C = all measured covariates. [Diagram nodes: C, C1, X, Y]

Outcome generation

Variable                         True OR
Age                              1.030928413
Black race                       0.668385082
Male gender                      1.418991333
Congestive heart failure         1.220575229
Coronary disease                 1.184633001
Prior bleeding                   10.62470195
Prior ulcer                      0.777704249
Recent hospitalization           4.537106069
Recent nursing home admission    2.222756726
Warfarin                         1.011494072
Gastrointestinal drugs           1.858528101
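
If the outcome generation model is logistic, as the odds-ratio scale suggests, each true OR corresponds to a coefficient on the log-odds scale. The notation below is an assumption introduced for illustration, and the age-by-covariate interaction terms mentioned earlier are omitted for brevity.

```latex
\operatorname{logit} P(Y = 1 \mid X, C_1)
  = \beta_0 + \beta_X X + \sum_j \beta_j C_{1j},
\qquad \beta_j = \ln(\mathrm{OR}_j)
% Example: prior bleeding, OR = 10.62, gives \beta = \ln(10.62) \approx 2.36
```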

The mechanics of hd-PS For each diagnosis, procedure, and medication code, hd-PS creates 3 potential variables: code observed ≥ 1 time during the baseline period; code observed ≥ the median number of times; code observed ≥ the 75th percentile number of times. There are 2 potential ranking methods. Exposure-based: a simple RR association measure between exposure and each variable. Bias-based: Bross’s bias formula, which considers the association of each variable with both exposure and outcome.
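
Both steps can be sketched in a few lines. This is a simplified illustration, not the hd-PS software: the function and key names are hypothetical, computing the median and 75th percentile among patients with at least one occurrence is an assumption, and the ranking uses the commonly cited form of Bross's multiplicative bias for a binary covariate, ordered by the absolute log of that bias.

```python
import math
from statistics import median, quantiles


def recurrence_indicators(counts):
    """counts: per-patient number of times one code appeared at baseline."""
    positive = sorted(c for c in counts if c > 0)
    if not positive:
        return None  # code never occurs, so it yields no variables
    med = median(positive)
    if len(positive) > 1:
        q75 = quantiles(positive, n=4, method="inclusive")[2]
    else:
        q75 = positive[0]
    return {
        "ge_once": [int(c >= 1) for c in counts],
        "ge_median": [int(c >= med) for c in counts],
        "ge_q75": [int(c >= q75) for c in counts],
    }


def bross_bias(p_c1, p_c0, rr_cd):
    """p_c1, p_c0: covariate prevalence among exposed and comparator patients;
    rr_cd: relative risk relating the covariate to the outcome."""
    return (p_c1 * (rr_cd - 1.0) + 1.0) / (p_c0 * (rr_cd - 1.0) + 1.0)


def rank_by_bias(stats):
    """stats: {name: (p_c1, p_c0, rr_cd)}; largest |log bias| first."""
    return sorted(stats, key=lambda k: abs(math.log(bross_bias(*stats[k]))), reverse=True)


# Hypothetical code counts for six patients.
indicators = recurrence_indicators([0, 1, 2, 5, 0, 3])

# Hypothetical summary statistics for three candidate variables.
ranking = rank_by_bias({
    "code_a": (0.30, 0.10, 2.0),   # imbalanced and predictive of outcome
    "code_b": (0.20, 0.20, 5.0),   # strongly predictive but balanced
    "code_c": (0.05, 0.10, 1.5),   # mildly imbalanced, weakly predictive
})
```

Note that `code_b`, despite its strong association with the outcome, ranks last because it is balanced across exposure groups; the bias-based ranking rewards variables associated with both exposure and outcome.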

hd-PS Analyses PSs were constructed using: (1) the top 500 exposure-ranked variables + demographics; (2) the top 500 bias-ranked variables + demographics; (3) the top 30 exposure-ranked variables + demographics; (4) the top 30 bias-ranked variables + demographics. Each analysis used logistic regression of the outcome on exposure + deciles of the PS.
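
Adjusting for PS deciles means converting each estimated score into one of ten strata and entering the strata as indicator covariates alongside exposure. A minimal sketch of that conversion, assuming the scores have already been estimated (function names and the rank-based tie handling are assumptions):

```python
def decile_strata(scores):
    """Assign each score a stratum 0..9 by rank, giving near-equal-sized strata."""
    n = len(scores)
    order = sorted(range(n), key=lambda i: scores[i])
    strata = [0] * n
    for rank, i in enumerate(order):
        strata[i] = rank * 10 // n
    return strata


def stratum_dummies(strata, n_strata=10):
    """Indicator columns for strata 1..9; stratum 0 is the reference level."""
    return [[int(s == k) for k in range(1, n_strata)] for s in strata]


# Hypothetical propensity scores for 100 patients, in descending order.
scores = [i / 100 for i in range(99, -1, -1)]
strata = decile_strata(scores)
dummies = stratum_dummies(strata)
```

The outcome model would then regress the outcome on exposure plus the nine indicator columns.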

Shrinkage analyses Regression of the outcome on all hdPS-screened variables (4,800 minus those that never occur) + exposure + demographics, using ridge regression and lasso regression. We apply no shrinkage to the coefficient on exposure. We also calculate the crude estimate for comparison.
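
Leaving the exposure coefficient unpenalized can be written as a penalized likelihood in which the penalty sums only over the covariate coefficients. The notation is an assumption for illustration: \(\ell\) is the logistic log-likelihood, \(\beta_X\) the exposure coefficient, \(\beta_1, \dots, \beta_p\) the covariate coefficients, and \(\lambda\) a tuning parameter whose selection is not described in the transcript.

```latex
(\hat\beta_0, \hat\beta_X, \hat\beta)
  = \arg\min_{\beta_0,\, \beta_X,\, \beta}
    \Bigl\{ -\ell(\beta_0, \beta_X, \beta) + \lambda\, P(\beta) \Bigr\},
\qquad
P(\beta) =
  \begin{cases}
    \sum_{j=1}^{p} |\beta_j| & \text{(lasso)} \\[4pt]
    \sum_{j=1}^{p} \beta_j^{2} & \text{(ridge)}
  \end{cases}
```

Because \(\beta_0\) and \(\beta_X\) do not appear in \(P(\beta)\), only the confounder coefficients are shrunk.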

Combination approaches Using the variables selected by the lasso regression: include them in a PS analysis; include them in an ordinary logistic regression outcome model. Using the 500 variables chosen by bias-based hd-PS: include them in a lasso outcome model; include them in a ridge outcome model.

Results - Variable selection Lasso selected 103 variables on average. 66% were also selected by at least one hdPS algorithm (IQR: 62-70%). Age was selected in 100% of simulations. Race was selected in 28%.

Results - Bias

Results - Bias Crude confounding bias of 0.19.

Results - Bias Ridge and lasso regression with all variables reduce bias by 41% and 63%, respectively.

Results - Bias Ridge and lasso do better when they start with pre-screened variables. Bias is reduced by 70% and 83%, respectively.

Results - Bias Ordinary regression and PS approaches performed better. Exposure-based hdPS with 500 variables completely eliminated bias.

Results - Bias Bias-based hdPS variable selection also performed well, with 93% and 91% bias reduction in the PS and ordinary regression models, respectively.

Results - Bias PS and regular regression models also performed well using lasso variable selection (95% and 96% bias reduction, respectively).

Results - Bias When restricting variables to a very small set, bias-based hdPS was much preferred.
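
The reported percentages can be put back on the scale of the crude bias. The sketch below assumes each percentage is a reduction relative to the crude bias of 0.19 and that "completely eliminated" corresponds to 100%; the labels and the helper name are hypothetical, and the scale of the bias is not stated in the transcript.

```python
CRUDE_BIAS = 0.19

# Percent bias reduction reported on the slides, keyed by hypothetical labels.
reduction_pct = {
    "ridge, all variables": 41,
    "lasso, all variables": 63,
    "ridge, pre-screened variables": 70,
    "lasso, pre-screened variables": 83,
    "ordinary regression, bias-based hdPS": 91,
    "PS, bias-based hdPS": 93,
    "PS, lasso-selected": 95,
    "ordinary regression, lasso-selected": 96,
    "PS, exposure-based hdPS (500 variables)": 100,
}


def residual_bias(crude, pct):
    """Bias remaining after a given percent reduction of the crude bias."""
    return crude * (1.0 - pct / 100.0)


residual = {label: residual_bias(CRUDE_BIAS, pct) for label, pct in reduction_pct.items()}
```

Under that assumption, the shrinkage estimators applied to all variables leave residual bias several times larger than any of the PS or ordinary regression approaches.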

Conclusion The variable selection method had relatively little importance. The estimation method mattered much more. Shrinkage of coefficient estimates led to insufficient bias control. Focus on including a large number of potential confounders or confounder proxies.

Limitations There are many “instruments” in current simulation setup. Variables associated with exposure that are not included in the outcome simulation model are essentially IVs, which is unrealistic. There is no unmeasured confounding in these data. Variable selection is an easier task when all important confounders are measured.

Future work Enrich the outcome model: non-linear associations, more interactions, more true confounders. Vary the true treatment effect: modify the coefficient on treatment in the outcome generation model. Vary exposure prevalence: can be accomplished by sampling within exposure group. Vary outcome prevalence: modify the intercept in the outcome generation model. Unmeasured confounding: set aside one or more true confounders and don’t allow methods to utilize these variables. Other base datasets.
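
Two of these extensions are simple data manipulations. The sketch below shows one way they might look, assuming rows are dictionaries with a binary exposure column; the function names, the column names, and the toy cohort are hypothetical.

```python
import random


def sample_with_exposure_prevalence(cohort, size, prevalence, rng, exposure_key="x"):
    """Resample with replacement within each exposure group to fix exposure prevalence."""
    exposed = [r for r in cohort if r[exposure_key] == 1]
    unexposed = [r for r in cohort if r[exposure_key] == 0]
    n_exposed = round(size * prevalence)
    sample = [dict(rng.choice(exposed)) for _ in range(n_exposed)]
    sample += [dict(rng.choice(unexposed)) for _ in range(size - n_exposed)]
    rng.shuffle(sample)
    return sample


def hide_confounders(rows, hidden):
    """Drop selected true confounders so the analysis methods cannot use them."""
    return [{k: v for k, v in r.items() if k not in hidden} for r in rows]


rng = random.Random(1)
# Toy cohort with 20% exposure prevalence.
cohort = [{"x": int(i % 5 == 0), "age": 65 + i % 25, "prior_bleed": i % 2} for i in range(100)]
sample = sample_with_exposure_prevalence(cohort, 1000, 0.40, rng)
analysis_view = hide_confounders(sample, hidden={"prior_bleed"})
```

Sampling within exposure group changes the exposure prevalence while leaving the covariate distribution inside each group untouched, so the confounding structure among the retained rows is preserved.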

Thanks! Co-authors: Wesley Eddings, Jeremy A Rassen, Robert J Glynn, Sebastian Schneeweiss. Contact: jmfranklin@partners.org www.drugepi.org/faculty-staff-trainees/faculty/jessica-franklin/