Martijn Schuemie, Marc Suchard, Patrick Ryan, Tomas Bergvall

Slides:



Advertisements
Similar presentations
FDA/Industry Workshop September, 19, 2003 Johnson & Johnson Pharmaceutical Research and Development L.L.C. 1 Uses and Abuses of (Adaptive) Randomization:
Advertisements

Residuals Residuals are used to investigate the lack of fit of a model to a given subject. For Cox regression, there’s no easy analog to the usual “observed.
Agency for Healthcare Research and Quality (AHRQ)
V.: 9/7/2007 AC Submit1 Statistical Review of the Observational Studies of Aprotinin Safety Part I: Methods, Mangano and Karkouti Studies CRDAC and DSaRM.
HSRP 734: Advanced Statistical Methods July 24, 2008.
Sensitivity Analysis for Observational Comparative Effectiveness Research Prepared for: Agency for Healthcare Research and Quality (AHRQ)
JSM 2012, San Diego1 Caution should be used in applying propensity scores estimated in a full cohort to adjust for confounding in subgroup analyses Sue.
Statistics Micro Mini Threats to Your Experiment!
1 Predictors of customer perceived software quality Paul Luo Li (ISRI – CMU) Audris Mockus (Avaya Research) Ping Zhang (Avaya Research)
Vanderbilt Sports Medicine Chapter 4: Prognosis Presented by: Laurie Huston and Kurt Spindler Evidence-Based Medicine How to Practice and Teach EBM.
Simple Linear Regression
1 1 Using Marginal Structural Model to Estimate and Adjust for Causal Effect of Post- discontinuation Chemotherapy on Survival in Cancer Trials Y. Wang,
Model Selection and Validation. Model-Building Process 1. Data collection and preparation 2. Reduction of explanatory or predictor variables (for exploratory.
Lecture 12: Cox Proportional Hazards Model
Chapter 3: Statistical Significance Testing Warner (2007). Applied statistics: From bivariate through multivariate. Sage Publications, Inc.
A Claims Database Approach to Evaluating Cardiovascular Safety of ADHD Medications A. J. Allen, M.D., Ph.D. Child Psychiatrist, Pharmacologist Global Medical.
Transparency in the Use of Propensity Score Methods
Additional Regression techniques Scott Harris October 2009.
Population level estimation workgroup Martijn Schuemie Marc Suchard.
Chapter 6 Selecting a Design. Research Design The overall approach to the study that details all the major components describing how the research will.
Matching methods for estimating causal effects Danilo Fusco Rome, October 15, 2012.
PSY 626: Bayesian Statistics for Psychological Science
Graphical Data Engineering
Constructing OHDSI network studies
Trials Adrian Boyle.
Statistics Use of mathematics to ORGANIZE, SUMMARIZE and INTERPRET numerical data. Needed to help psychologists draw conclusions.
BINARY LOGISTIC REGRESSION
The comparative self-controlled case series (CSCCS)
FeatureExtraction v2.0 Martijn Schuemie.
Lurking inferential monsters
Harvard T.H. Chan School of Public Health
Population level estimation best practices
Principles of Quantitative Research
OHDSI Method Evaluation
A critical assessment of the utility of the case-control design in population-based databases Martijn Schuemie.
OHDSI : Design and implementation of a comparative cohort study in observational health care data 아주대학교 의료정보학과 유승찬.
Two-way ANOVA with significant interactions
Population level estimation workgroup
Sec 9C – Logistic Regression and Propensity scores
Statistical Approaches to Support Device Innovation- FDA View
Reproducing Graham et al using the CohortMethod package
Martijn Schuemie, PhD Janssen Research and Development
Martijn Schuemie, PhD Janssen Research and Development
Biostatistics Case Studies 2016
Designing and executing OHDSI studies
Method benchmark update
OHDSI population-level estimation workgroup meeting
Donald E. Cutlip, MD Beth Israel Deaconess Medical Center
Method evaluation Martijn Schuemie.
Randomized Trials: A Brief Overview
OHDSI estimation-methods papers in progress
CJT 765: Structural Equation Modeling
Statistics 262: Intermediate Biostatistics
12 Inferential Analysis.
Chapter Eight: Quantitative Methods
PSY 626: Bayesian Statistics for Psychological Science
S1316 analysis details Garnet Anderson Katie Arnold
Methods of Economic Investigation Lecture 12
Martijn Schuemie, Peter Rijnbeek, Jenna Reps, Marc Suchard
Annals of Internal Medicine • Vol. 167 No. 12 • 19 December 2017
What is Regression Analysis?
Jeffrey E. Korte, PhD BMTRY 747: Foundations of Epidemiology II
12 Inferential Analysis.
Analysing RWE for HTA: Challenges, methods and critique
Non-Experimental designs: Correlational & Quasi-experimental designs
How to assess an abstract
Two Halves to Statistics
Additional Regression techniques
Longitudinal Data & Mixed Effects Models
Regression and Clinical prediction models
Presentation transcript:

Martijn Schuemie, Marc Suchard, Patrick Ryan, Tomas Bergvall OHDSI Methods Library Martijn Schuemie, Marc Suchard, Patrick Ryan, Tomas Bergvall

Quick recap of previous meetings Martijn held almost same talk in western (4 participants) and eastern (16 participants) hemisphere Sceptical about evidence generated through observation research so far Root causes of problem: study bias, publication bias, and p-hacking Defined workgroup objective and started discussion on best practices Nicole and Martijn started discussion on SCCS best practices

Workgroup objective Develop scientific methods for observational research leading to population level estimates that are Accurate Reliable Reproducable And enable researchers to use these methods

Methods library Population-level estimation analysis designs New-user cohort method using propensity scores Self-Controlled Case Series Self-Controlled Cohort IC Temporal Pattern Discovery Implemented as open source R packages Run against the CDM In (almost) any environment Windows, Linux, Mac PostgreSQL, Oracle, SQL Server, Amazon RedShift, Microsoft APS Lots of flexibility ‘Validated’ (unit tests + simulations)

Where? https://github.com/OHDSI

CohortMethod package cmd <- getDbCohortMethodData(connectionDetails, cdmDatabaseSchema = cdmSchema, targetId = 1118084, comparatorId = 1124300, outcomeId = 192671, washoutPeriod = 183, firstExposureOnly = TRUE, removeDuplicateSubjects = TRUE, excludeDrugsFromCovariates = TRUE, covariateSettings = createCovariateSettings()) studyPop <- createStudyPopulation(cohortMethodData = cmd, removeSubjectsWithPriorOutcome = TRUE, minDaysAtRisk = 1, riskWindowStart = 0, riskWindowEnd = 30, addExposureDaysToEnd = TRUE) ps <- createPs(cmd, studyPop) plotPs(ps) stratPop <- matchOnPs(ps, caliper = 0.25, caliperScale = "standardized", maxRatio = 1) plotPs(stratPop, ps) balance <- computeCovariateBalance(strata, cmd) plotCovariateBalanceScatterPlot(balance) plotCovariateBalanceOfTopVariables(balance) outcomeModel <- fitOutcomeModel(stratPop, cmd useCovariates = TRUE, modelType = "cox", stratified = TRUE) plotKaplanMeier(stratPop, includeZero = FALSE) drawAttritionDiagram(stratPop) outcomeModel

CohortMethod package These 13 statements Implement a full study cmd <- getDbCohortMethodData(connectionDetails, cdmDatabaseSchema = cdmSchema, targetId = 1118084, comparatorId = 1124300, outcomeId = 192671, washoutPeriod = 183, firstExposureOnly = TRUE, removeDuplicateSubjects = TRUE, excludeDrugsFromCovariates = TRUE, covariateSettings = createCovariateSettings()) studyPop <- createStudyPopulation(cohortMethodData = cmd, removeSubjectsWithPriorOutcome = TRUE, minDaysAtRisk = 1, riskWindowStart = 0, riskWindowEnd = 30, addExposureDaysToEnd = TRUE) ps <- createPs(cmd, studyPop) plotPs(ps) stratPop <- matchOnPs(ps, caliper = 0.25, caliperScale = "standardized", maxRatio = 1) plotPs(stratPop, ps) balance <- computeCovariateBalance(strata, cmd) plotCovariateBalanceScatterPlot(balance) plotCovariateBalanceOfTopVariables(balance) outcomeModel <- fitOutcomeModel(stratPop, cmd useCovariates = TRUE, modelType = "cox", stratified = TRUE) plotKaplanMeier(stratPop, includeZero = FALSE) drawAttritionDiagram(stratPop) outcomeModel These 13 statements Implement a full study Celecoxib vs diclofenac for risk of GI bleed Interact directly with database in CDM Many covariates constructed Demographics Every drug (+ class) Every condition (+ group) Every procedure Every observation Charleston, CHAD2, etc. Propensity model using LASSO 1-on-1 matching on propensity score Fitting a Cox model including same covariates used in PS model

CohortMethod package Specify the two exposure groups and the outcome. cmd <- getDbCohortMethodData(connectionDetails, cdmDatabaseSchema = cdmSchema, targetId = 1118084, comparatorId = 1124300, outcomeId = 192671, washoutPeriod = 183, firstExposureOnly = TRUE, removeDuplicateSubjects = TRUE, excludeDrugsFromCovariates = TRUE, covariateSettings = createCovariateSettings()) studyPop <- createStudyPopulation(cohortMethodData = cmd, removeSubjectsWithPriorOutcome = TRUE, minDaysAtRisk = 1, riskWindowStart = 0, riskWindowEnd = 30, addExposureDaysToEnd = TRUE) ps <- createPs(cmd, studyPop) plotPs(ps) stratPop <- matchOnPs(ps, caliper = 0.25, caliperScale = "standardized", maxRatio = 1) plotPs(stratPop, ps) balance <- computeCovariateBalance(strata, cmd) plotCovariateBalanceScatterPlot(balance) plotCovariateBalanceOfTopVariables(balance) outcomeModel <- fitOutcomeModel(stratPop, cmd useCovariates = TRUE, modelType = "cox", stratified = TRUE) plotKaplanMeier(stratPop, includeZero = FALSE) drawAttritionDiagram(stratPop) outcomeModel Specify the two exposure groups and the outcome. Here using drug_era and condition_era tables (default) Typically using Circe-defined cohorts

CohortMethod package HR = .83 (0.67 – 1.01) cmd <- getDbCohortMethodData(connectionDetails, cdmDatabaseSchema = cdmSchema, targetId = 1118084, comparatorId = 1124300, outcomeId = 192671, washoutPeriod = 183, firstExposureOnly = TRUE, removeDuplicateSubjects = TRUE, excludeDrugsFromCovariates = TRUE, covariateSettings = createCovariateSettings()) studyPop <- createStudyPopulation(cohortMethodData = cmd, removeSubjectsWithPriorOutcome = TRUE, minDaysAtRisk = 1, riskWindowStart = 0, riskWindowEnd = 30, addExposureDaysToEnd = TRUE) ps <- createPs(cmd, studyPop) plotPs(ps) stratPop <- matchOnPs(ps, caliper = 0.25, caliperScale = "standardized", maxRatio = 1) plotPs(stratPop, ps) balance <- computeCovariateBalance(strata, cmd) plotCovariateBalanceScatterPlot(balance) plotCovariateBalanceOfTopVariables(balance) outcomeModel <- fitOutcomeModel(stratPop, cmd useCovariates = TRUE, modelType = "cox", stratified = TRUE) plotKaplanMeier(stratPop, includeZero = FALSE) drawAttritionDiagram(stratPop) outcomeModel HR = .83 (0.67 – 1.01)

All-by-all support For: Sensitivity analyses Target – (Comparator) - Outcome Analysis settings Target – (Comparator) - Outcome Analysis settings Target – (Comparator) - Outcome Analysis settings Drug – (Comparator) - Outcome Analysis settings Method For: Sensitivity analyses Including negative controls Methods research Safety surveillance? Estimates, Diagnostics

Validation Unit tests Simulations

Unit tests A unit test is a piece of code that tests a function: test_that("Simple 1-on-1 matching", { rowId <- 1:5 treatment <- c(1, 0, 1, 0, 1) propensityScore <- c(0, 0.1, 0.3, 0.4, 1) data <- data.frame(rowId = rowId, treatment = treatment, propensityScore = propensityScore) result <- matchOnPs(data, caliper = 0, maxRatio = 1) expect_equal(result$stratumId, c(0, 0, 1, 1)) }) All unit tests are executed every time a change is made to the package:

Simulation Using simulation for more complicated functionality E.g: SCCS seasonality modeling:

Validation Unit tests Simulations Not Double coding

Discussion on validation Are unit tests and simulations enough? Do we need (and believe in) double coding? Can we formulate best practices around validation?

Implemented methods New-user cohort method using propensity scores Self-Controlled Case Series Self-Controlled Cohort IC Temporal Pattern Discovery

New-user cohort method Compare new users of drug A to drug B for outcome X Automatic propensity score construction Trim, stratify, or match on PS Outcome modeling Cox, Poisson, or logistic Option to include same 50k+ covariates in outcome model Diagnostics PS distribution overlap Covariate balance after matching Kaplan-Meier plot

New-user cohort method Strengths Compare apples to apples (near zero residual bias) Subjects with same baseline characteristics Comparable point in time (initiation of a new treatment) Comparative effectiveness (is drug A safer than drug B?) Stop-go diagnostics Easy to interpret (similar to RCT) Weaknesses ‘Absolute’ safety (does drug A cause outcome X?) Data hungry Only works with a good comparator Sensitive to unmeasured between-person confounding

Self-Controlled Case Series Compare time on drug to time not on drug Self-controlled: adjusting for all constant patient characteristics Flexible risk window definitions Correct for age and seasonality Constant effect within a calendar month Spline smoothing across months Add other drugs to model (e.g. concomittant use) User picked All other drugs (MSCCS model using regularized Poisson regression) Adjust for event-dependent observation end

Self-Controlled Case Series Strengths ‘Absolute’ safety (does drug A cause outcome X?) Adjust for all patient characteristics that are constant in time Weaknesses Comparative effectiveness (is drug A safer than drug B?) Sensitive to within-person time-varying confounding Lots of things happen when you take a drug Sensitive to violations of underlying assumptions Does the event affect probability of exposure? Doest the event affect probability of subsequent events? Does the event affect probability of censoring (e.g. death)? Problematic for both chronic exposures and lethal outcomes Hard to interpret

Self-Controlled Cohort Compare time on drug to time just prior to starting drug Risk window definitions Length of risk window and control window Filter subjects who do not have full control period available

Self-Controlled Cohort Strengths Fast Adjusts for patient baseline characteristics Best performer in OMOP experiment Weaknesses Positively biased when there’s contra-indication Sensitive to within-person time-varying confounding Lots of things happen when you take a drug

IC Temporal Pattern Discovery Similar to Self-Controlled Cohort Pick different control period (e.g. one year ago) Adjust for typical pattern when initiating a drug Produces an IC statistic instead of an effect size estimate

IC Temporal Pattern Discovery Strengths Fast Adjusts for patient baseline characteristics More robust to contra-indications Weaknesses IC statistic is hard to interpret Sensitive to within-person time-varying confounding Lots of things happen when you take a drug

Methods Library R packages New-user cohort studies using large-scale regression for propensity and outcome models Cohort Method s Self-Controlled Case Series analysis using few or many predictors, includes splines for age and seasonality. Self-Controlled Case Series s A self-controlled cohort design, where time preceding exposure is used as control. Self-Controlled Cohort s A self-controlled design, but using temporal patterns around other exposures and outcomes to correct for time-varying confounding. IC Temporal Pattern Disc. Estimation methods s Automatically extract large sets of features for user-specified cohorts using data in the CDM. Feature Extraction s Build and evaluate predictive models for user-specified outcomes, using a wide array of machine learning algorithms. Patient Level Prediction Prediction methods s Use negative control exposure-outcome pairs (where relative risk is assumed to be 1) to profile and calibrate a particular analysis design. Empirical Calibration s Use real data and established reference sets as well as simulations injected in real data to evaluate the performance of methods. Method Evaluation Method characterization s Connect directly to a wide range of database platforms, including SQL Server, Oracle, and PostgreSQL. Database Connector s Generate SQL on the fly for the various SQL dialects. Sql Render s Highly efficient implementation of regularized logistic, Poisson and Cox regression. Cyclops s Support tools that didn’t fit other categories, including tools for maintaining R libraries. Ohdsi R Tools Supporting packages Under construction

Which methods? Implemented: New-user cohort method using propensity scores Self-Controlled Case Series Self-Controlled Cohort IC Temporal Pattern Discovery Not (yet) implemented: Case – control Case – crossover ? Not dealing with time-varying confounding between exposure start and outcome

Discussion Method package: Black box or best practice? Packages become prescriptive, e.g. Large scale regression is easy, hand-picking covariates is hard New-user CM + PS is easy, case-control is hard Template protocols per method package? Point-and-click interface? Training?

Funding opportunities Anyone?

Topic of next meeting(s)? Method evaluation Identifying the important questions that can be answered using observational research CohortMethod package in-depth Replicating RCTs in observational data ?

Next workgroup meeting(s) Western hemisphere: April 27 6pm Central European time 5pm UK time Noon Eastern Time (New York) 9am Pacific Coast Time (LA) Eastern hemisphere: May 4 3pm Hong Kong / Taiwan 4pm South Korea 4:30pm Adelaide (9am Central European time) http://www.ohdsi.org/web/wiki/doku.php?id=projects:workgroups:est-methods

Under the hood CDM JDBC SqlRender SQL Database Connector FF Data processing Results FF Cyclops