An Open Collaborative Approach for Rapid Evidence Generation


An Open Collaborative Approach for Rapid Evidence Generation David K. Vawdrey, PhD George Hripcsak MD, MS Jon D. Duke MD, MS Patrick Ryan PhD Nigam H. Shah MBBS, PhD AMIA Joint Summits on Translational Science March 25, 2015

Introduction David K. Vawdrey, PhD NewYork-Presbyterian Hospital Columbia University New York, USA

What is OHDSI? Video Introduction of OHDSI

What is OHDSI? The Observational Health Data Sciences and Informatics (OHDSI) program is a multi-stakeholder, interdisciplinary collaborative and an international network of researchers and observational health databases. The goal of OHDSI is to bring out the value of health data through large-scale analytics.

What is OHDSI? OHDSI builds on the Observational Medical Outcomes Partnership (OMOP), and maintains the OMOP Common Data Model (CDM) OHDSI provides a suite of tools and algorithms for conducting observational research using large data sets All OHDSI solutions are open-source OHDSI has established an international network of researchers and observational health databases with a central coordinating center housed at Columbia University

OHDSI Mission To transform medical decision-making by creating reliable scientific evidence about disease natural history, healthcare delivery, and the effects of medical interventions through large-scale analysis of observational health databases for population-level estimation and patient-level predictions.

OHDSI Vision OHDSI collaborators access a network of one billion patients to generate evidence about all aspects of healthcare. Patients and clinicians and all other decision-makers around the world use OHDSI tools and evidence every day.

OHDSI Objectives To establish a research community for observational health data sciences that enables active engagement across multiple disciplines (including statistics, computer science, epidemiology, physics, informatics, and biomedical sciences) spanning multiple stakeholder groups (including academia, healthcare providers, payers, medical product manufacturers, and government agencies)

OHDSI Objectives To develop and evaluate analytical methods that use observational health data to study the effects of medical interventions and predict health outcomes for patients, and to generate the empirical evidence base necessary to establish best practices in observational analysis

OHDSI Objectives To apply scientific best practices in the design and implementation of open-source systems for observational analysis to enable medical product risk identification, comparative effectiveness research, patient-level predictions, and healthcare improvement

OHDSI Objectives To generate evidence about disease natural history, healthcare delivery, and the effects of medical interventions, supporting medical decision-making in a way that is credible, consistent, transparent, and personalized to patients and providers

OHDSI Objectives To establish educational opportunities to train students, practitioners, and consumers about the foundational science of observational health data analysis

Panelists: George Hripcsak, MD, MS — Professor and Chair, Department of Biomedical Informatics, Columbia University. Jon D. Duke, MD, MS — Senior Scientist and Director, Drug Safety Informatics Program, Regenstrief Institute. Patrick Ryan, PhD — Sr. Director and Head, Epidemiology Analytics, Janssen Research and Development. Nigam H. Shah, MBBS, PhD — Assistant Professor, Dept. of Medicine (Biomedical Informatics), Stanford University.

OHDSI OMOP CDM at Columbia and Use of CDM in Clinical Data Research Networks (CDRNs) George Hripcsak, MD, MS Biomedical Informatics Columbia University New York, USA

How OHDSI Works (network diagram): OHDSI data partners each maintain a source data warehouse with identifiable patient-level data and ETL it into a standardized, de-identified patient-level database (OMOP CDM v5). The OHDSI coordinating center provides data network support, analytics development and testing, and research and education. Standardized large-scale analytics — comparative effectiveness, predictive modeling — produce analysis results that feed a summary statistics results repository at OHDSI.org. The diagram also shows Hill's causal viewpoints (strength, consistency, temporality, plausibility, experiment, coherence, biological gradient, specificity, analogy) informing the analytics.

OHDSI Information Architecture Each site retains its own data Use a common information model Concepts, terminologies, conceptual relations “OMOP Common Data Model (v4, v5)” Strictly defines terminology, mappings Supports world-wide queries Advanced observational research methods Aggregate the results centrally
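The "each site retains its own data, aggregate the results centrally" pattern described above can be sketched as follows. This is an illustrative Python sketch, not OHDSI code: the function names, data layout, and concept IDs are all hypothetical, and the real network exchanges aggregate statistics rather than Python dictionaries.

```python
# Federated-query sketch: each site runs the same query against its own
# data; only per-site aggregate counts (never row-level data) are shared
# with the coordinating center, which sums them.

def run_local_query(site_db, concept_id):
    """Count patients at one site whose record contains a given concept."""
    return sum(1 for patient in site_db if concept_id in patient["concepts"])

def network_query(sites, concept_id):
    """Coordinating center: collect per-site aggregates and total them."""
    per_site = {name: run_local_query(db, concept_id) for name, db in sites.items()}
    return {"per_site": per_site, "total": sum(per_site.values())}

# Toy data: two sites, patients represented by sets of concept IDs.
sites = {
    "site_a": [{"concepts": {201826}}, {"concepts": {201826, 316866}}],
    "site_b": [{"concepts": {316866}}],
}
result = network_query(sites, 201826)
```

The design point is that the common information model is what makes the same query meaningful at every site; the coordinating center never needs patient-level access.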

OMOP Common Data Model (CDM) v. 5.0

ACHILLES

NYC-CDRN: New York City Clinical Data Research Network
Partner organizations — health systems: Clinical Directors Network; Columbia University College of Physicians and Surgeons; Montefiore Medical Center and Albert Einstein College of Medicine; Mount Sinai Health System and Icahn School of Medicine; NewYork-Presbyterian Hospital; NYU Langone Medical Center and NYU School of Medicine; Weill Cornell Medical College.
Research infrastructure: Biomedical Research Alliance of New York; Cornell NYC Tech Campus; New York Genome Center; Rockefeller University.
Health information exchange: Bronx RHIO (Bronx Regional Informatics Center); Healthix.
Patient organizations and public health: American Diabetes Association; Center for Medical Consumers; Consumer Reports; Cystic Fibrosis Foundation; New York Academy of Medicine; NYS Department of Health.

NYC-CDRN data flow (diagram): data from seven member systems feeds the NYC-CDRN, whose aggregation activities include patient matching, de-duplication, and integration of additional data sources (patient-reported outcomes, bio-repository, claims). A centralized IRB approves studies and the NYC-CDRN approves queries; data requests and responses link the network to NYC-CDRN and national research and population health activities.

Scalable Collaborative Infrastructure for a Learning Health Care System (SCILHS): Boston Children's Hospital; Boston HealthNet (Boston Medical Center, etc.); Partners HealthCare System (Mass General, Brigham & Women's); Wake Forest Baptist Medical Center; Beth Israel Deaconess Medical Center; Cincinnati Children's Hospital; University of Texas Health Science Center at Houston; Columbia University Medical Center and NewYork-Presbyterian; Morehouse/Grady/RCMI; University of Mississippi Medical Center/RCMI.

SCILHS

Advance Clinical Trials (ACT) CTSA-driven, NCATS funded Promote innovation and efficiency in participant recruitment into multi-site studies 21 CTSA sites i2b2, SHRINE

OHDSI and i2b2: Opportunity The information model is distinct from the data schema. i2b2 is flexible but slows cross-entity research, while OHDSI is highly defined. Sites can use either the i2b2 or the OHDSI schema while adopting the OHDSI information model.

OHDSI and i2b2: PCORI Clinical Data Research Network (CDRN) in the U.S. Of 11 sites, 4 use OHDSI/OMOP and 7 use i2b2. Sites store data in OHDSI or i2b2, convert between the two, and convert to PCORnet.

CDRN Alignment Tasks
Construct CDRN Data Model (DM) and CDRN Vocabulary: based on the OMOP DM/Vocabulary; address PCOR requirements; address CDRN local needs; align with OMOP v5 development; align with other CDRN centers; address versioning.
Develop map-sets: vocabulary map-sets for sources-to-OMOP, i2b2-to-OMOP, and PCOR-to-OMOP; facilitate development of ETL processes.
Deliverables: design the Person table; design the terminology back-end; select/create a demographics controlled terminology; create mappings of site terminology to controlled terminology for submitting sites; provide QA recommendations; document decisions and artifacts.

Columbia CDRN Approach (diagram): at Columbia, EHR data is loaded into OHDSI/OMOP; from the OMOP store it is converted to i2b2 and to PCORnet for the NYC CDRN, and the i2b2 representation connects through SHRINE to SCILHS (Boston) and ACT.

Next panelist is Jon Duke

Population and Cohort Characterization Using the OMOP CDM Jon D. Duke MD, MS Regenstrief Institute

Characterization in OHDSI In OHDSI, characterization = generating a comprehensive overview of a patient dataset Clinical (e.g., conditions, medications, procedures) Metadata (e.g., observation periods, data density) Supports Feasibility studies Hypothesis generation Data quality assessment Data sharing (aggregate-level)

OHDSI Tools for Characterization Population-Wide ACHILLES (Automated Characterization of Health Information at Large-scale Longitudinal Evidence Systems) Specific Cohorts HERACLES (Health Enterprise Resource and Care Learning Exploration System)

ACHILLES https://10.16.1.23/Achilles/#/RI/dashboard

ACHILLES & Data Quality

ACHILLES Needs to be run only once per CDM Hybrid R / web-based application Can specify minimum cell size to enable sharing where possible
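The minimum-cell-size option mentioned above — masking counts below a threshold before aggregate results are shared — can be sketched in a few lines. This is a generic illustration of small-cell suppression, not ACHILLES's actual API; the function name and data are hypothetical.

```python
# Small-cell suppression sketch: any count below the configured minimum
# cell size is replaced with None so it is not disclosed when sharing
# aggregate summary statistics.

def suppress_small_cells(counts, min_cell_size=5):
    """Mask counts below the threshold; leave the rest unchanged."""
    return {k: (v if v >= min_cell_size else None)
            for k, v in counts.items()}

counts = {"condition_a": 120, "condition_b": 3, "condition_c": 17}
shared = suppress_small_cells(counts, min_cell_size=5)
```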

Cohort Characterization CDM cohorts can be created in a variety of ways: manual queries; the cohort building tool (CIRCE); import of an externally defined patient list.

HERACLES Demo

HERACLES includes some index-specific analyses.

HERACLES = Specialist Can limit to specific analyses (e.g., just procedures) Can target specific concepts (e.g., a drug class, a particular condition) Can window on cohort-specific date ranges

HERACLES Designed to be run many times per CDM New cohorts New target areas of interest Official release in April Both ACHILLES and HERACLES are part of a suite of OHDSI tools available on GitHub

Population-level Estimation Patrick Ryan, PhD Janssen Research and Development 25 March 2015

Questions OHDSI Seeks to Answer from Observational Data
Clinical characterization — Natural history: Who are the patients who have diabetes? Among those patients, who takes metformin? Quality improvement: What proportion of patients with diabetes experience disease-related complications?
Population-level estimation — Safety surveillance: Does metformin cause lactic acidosis? Comparative effectiveness: Does metformin cause lactic acidosis more than glyburide?
Patient-level prediction — Given everything you know about me and my medical history, if I start taking metformin, what is the chance that I am going to have lactic acidosis in the next year?

Opportunities for Standardization in the Evidence Generation Process
Data structure: tables, fields, data types
Data content: vocabulary to codify clinical domains
Data semantics: conventions about meaning
Cohort definition: algorithms for identifying the set of patients who meet a collection of criteria for a given interval of time
Covariate construction: logic to define variables available for use in statistical analysis
Analysis: collection of decisions and procedures required to produce aggregate summary statistics from patient-level data
Results reporting: series of aggregate summary statistics presented in tabular and graphical form
Together, these elements constitute the study protocol.
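The "cohort definition" element above — patients who meet a criterion during a given interval of time — can be illustrated with a minimal sketch. The record layout and function name are hypothetical, and the concept ID is purely illustrative; real OHDSI cohort definitions (e.g. in CIRCE) are far richer.

```python
# Sketch of a cohort-membership test: a patient qualifies if an event
# with the target concept falls inside the index interval.
from datetime import date

def in_cohort(patient, concept_id, interval_start, interval_end):
    """True if the patient has the concept recorded within the interval."""
    return any(
        ev["concept_id"] == concept_id
        and interval_start <= ev["date"] <= interval_end
        for ev in patient["events"]
    )

# Toy patient with one coded event.
patient = {"events": [{"concept_id": 1503297, "date": date(2014, 6, 1)}]}
qualifies = in_cohort(patient, 1503297, date(2014, 1, 1), date(2014, 12, 31))
```

Standardizing this logic is what lets the same cohort definition run unchanged at every network site.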

Data-to-Evidence Sharing Paradigms (each drawing on patient-level data in the OMOP CDM)
Single study (one-time): write protocol → develop code → execute analysis → compile result
Real-time query evidence (repeated): develop app → design query → submit job → review result
Large-scale analytics (repeated): develop app → execute script → explore results

Standardized Large-scale Analytics Tools Under Development within OHDSI (all operating on patient-level data in the OMOP CDM; source at http://github.com/OHDSI, several presented at AMIA CRI)
ACHILLES: database profiling
HERACLES: cohort characterization
CIRCE: cohort definition
CALYPSO: feasibility assessment
HERMES: vocabulary exploration
PLATO: patient-level predictive modeling
HOMER: population-level causality assessment
LAERTES: drug–adverse event evidence base
OHDSI Methods Library: CYCLOPS, CohortMethod, SelfControlledCaseSeries, SelfControlledCohort, TemporalPatternDiscovery, EmpiricalCalibration

Standardizing Analytic Decisions in Cohort Studies Decisions a researcher needs to make become parameters a standardized analytic routine needs to accommodate: washout period length; nesting cohorts within indication; comparator population; time-at-risk; propensity score covariate selection strategy; covariate eligibility window; propensity score adjustment strategy (trimming, stratification, matching); outcome model.
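One way to picture "decisions become parameters" is a single configuration object that a standardized routine accepts. The sketch below is a hypothetical Python analogue, not the OHDSI CohortMethod API (which is an R package); every field name and default here is illustrative.

```python
# Sketch: the researcher decisions listed above, packaged as parameters
# of a standardized cohort-study routine.
from dataclasses import dataclass

@dataclass
class CohortStudyParams:
    washout_days: int = 365             # washout period length
    nest_within_indication: bool = True  # nesting cohorts within indication
    comparator_cohort_id: int = 0        # comparator population
    time_at_risk_days: int = 365         # time-at-risk
    ps_covariate_strategy: str = "all"   # PS covariate selection strategy
    covariate_window_days: int = 365     # covariate eligibility window
    ps_adjustment: str = "matching"      # trimming | stratification | matching
    outcome_model: str = "cox"           # outcome model

# A study varies only the decisions it cares about; the rest stay at
# documented defaults, which is what makes analyses comparable.
params = CohortStudyParams(ps_adjustment="stratification")
```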

Standardized Analytics to Enable Reproducible Research http://github.com/OHDSI

Open-source Large-scale Analytics through R Why is this a novel approach? Large-scale analytics, scalable to ‘big data’ problems in healthcare: millions of patients millions of covariates millions of questions End-to-end analysis, from CDM through evidence No longer de-coupling ‘informatics’ from ‘statistics’ from ‘epidemiology’

Standardize Covariate Construction

Standardize Model Diagnostics

Standardize Analysis and Results Reporting

To Go Forward, We Must Go Back “What aspects of that association should we especially consider before deciding that the most likely interpretation of it is causation?” Strength Consistency Temporality Plausibility Experiment Coherence Biological gradient Specificity Analogy Austin Bradford Hill, “The Environment and Disease: Association or Causation?,” Proceedings of the Royal Society of Medicine, 58 (1965), 295-300. http://omop.org/2013Symposium

HOMER: Implementation of Hill's Viewpoints — strength, consistency, temporality, plausibility, experiment, coherence, biological gradient, specificity, analogy — applied across comparative effectiveness and predictive modeling. http://omop.org/2013Symposium

Concluding Thoughts We need to build informatics solutions that enable reliable, scalable evidence generation for population-level estimation. Open-source large-scale analytics on a common data platform are required to facilitate efficient, transparent, and reproducible science. A multi-disciplinary, community approach can greatly accelerate the research and development of shared solutions.

Personalized Risk Prediction Nigam H. Shah MBBS, PhD Assistant Professor Dept. of Medicine (Biomedical Informatics) Stanford University

Phenotyping and Risk Prediction [figure: a patient's medical record as a timeline of coded events, with phenotyping applied to the record up to the time of event "X"]

Dataset and Prediction Task
1,182,751 wound assessments spanning 5 years (2008–2013), from 68 Healogics Wound Care Centers in 26 states (center code recorded for each)
59,958 patients: age, gender, insurance, zip code
180,716 wounds: wound type, wound location, dimensions, edema, erythema, rubor, other wound qualities
Each wound assessment captures these patient, wound, and center attributes.

Setup and Feature Engineering
180k wounds were split 60:20:20 into training, validation, and test sets: 90,166 training wounds, 30,055 validation wounds, and 30,056 test wounds (held out for final evaluation).
"Not healed" wounds were eliminated from the training set, and other filters were applied to wound status in the training set, yielding 81,990 training examples, 27,470 validation examples, and 27,412 hold-out test examples.
Features (1,079 total): patient features (age, gender, insurance, ICD-9 codes), first-wound-assessment features (wound type, wound location, wound dimensions, wound characteristics such as erythema), center code, and socio-economic variables (census data linked by zip code).
Note: all results here are on the validation set.
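The 60:20:20 split described above can be sketched as follows. This is a generic illustration with a fixed seed for reproducibility, not the study's actual code; the function name is hypothetical.

```python
# Sketch: shuffle once with a fixed seed, then slice 60% / 20% / 20%
# into training, validation, and hold-out test sets.
import random

def split_60_20_20(items, seed=0):
    items = list(items)
    random.Random(seed).shuffle(items)
    n = len(items)
    n_train = n * 60 // 100
    n_val = n * 20 // 100
    return (items[:n_train],
            items[n_train:n_train + n_val],
            items[n_train + n_val:])

train, val, test = split_60_20_20(range(100))
```

Holding the test slice out until final evaluation (as the slide notes) is what keeps the reported validation results honest.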

Summary
Applications: screening for referral to a wound care center; advanced care decisions for wound care specialists; outlier detection at the first visit.
Performance: AUROC 0.857, sensitivity 0.796, specificity 0.724, precision/PPV 0.280.
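For readers less familiar with these metrics, the sensitivity, specificity, and PPV reported above all derive from a confusion matrix at a chosen decision threshold. The sketch below is generic; the counts in the example are made up, not the wound-care study's data.

```python
# Sketch: the reported metrics as functions of confusion-matrix counts
# (tp = true positives, fp = false positives, tn = true negatives,
# fn = false negatives) at one decision threshold.

def classification_metrics(tp, fp, tn, fn):
    return {
        "sensitivity": tp / (tp + fn),   # a.k.a. recall
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),           # a.k.a. precision
    }

# Illustrative counts only.
m = classification_metrics(tp=80, fp=20, tn=90, fn=10)
```

AUROC, by contrast, sweeps the threshold and summarizes the sensitivity/specificity trade-off across all of them, which is why it is reported alongside the single-threshold numbers.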

Phenotyping and Risk Prediction [figure repeated: a patient's medical record as a timeline of coded events, with phenotyping applied to the record up to the time of event "X"]

Electronic Phenotyping The dimensions of electronic phenotyping: features, time, and complexity. We represent the phenotyping problem in terms of three dimensions: a feature dimension (what features are used to characterize a phenotype, from a list of CUIs or ICD codes up to all features), a time dimension (does the representation support a time-variant definition of the phenotype, from ignoring time to time-aware), and a complexity dimension (how complex is the representation corresponding to a phenotype definition, from a "query" of inclusion and exclusion rules, to a regression model, to a generative model).

XPRESS: EXtraction of Phenotypes from clinical Records using Silver Standards
Term search — input: config.R (term search settings); output: keywords.tsv and ignore.tsv.
getPatients.R — input: config.R, keywords.tsv, ignore.tsv; output: feature_vectors.Rda.
buildModel.R — input: config.R, feature_vectors.Rda; output: model.Rda. config.R holds the model-build settings and the number of cases and controls used to build the models (more configuration options are on the way; we are open to suggestions). The default model is LASSO (it can also be changed to random forest), with 5-fold cross-validation and a 75/25 split between training and testing data. model.Rda contains the model built and the predictors (features) used, which ensures the trained model can be used to classify other patients in the future.
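The XPRESS stages above form a simple pipeline: term search produces keyword lists, getPatients.R turns matching records into feature vectors, and buildModel.R fits a model over them. The sketch below is a toy Python analogue of that flow, not the actual R implementation; all function bodies and the keyword-matching logic are hypothetical simplifications.

```python
# Toy analogue of the XPRESS pipeline stages (R scripts noted in comments).

def term_search(config):
    """config.R -> keywords.tsv, ignore.tsv (here: plain lists)."""
    return {"keywords": config["terms"], "ignore": config.get("ignore", [])}

def get_patients(keywords, records):
    """getPatients.R -> feature_vectors.Rda: keep records matching a keyword."""
    return [r for r in records if any(k in r["note"] for k in keywords)]

def build_model(feature_vectors):
    """buildModel.R -> model.Rda: stand-in 'model' recording its predictors."""
    tokens = {tok for r in feature_vectors for tok in r["note"].split()}
    return {"predictors": sorted(tokens)}

config = {"terms": ["infarction"]}
records = [{"note": "acute myocardial infarction"}, {"note": "routine visit"}]
kw = term_search(config)
vectors = get_patients(kw["keywords"], records)
model = build_model(vectors)
```

The real system's value is in the last stage — fitting LASSO or random-forest models against silver-standard (noisily labeled) cases and controls — which this sketch only stubs out.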

XPRESS: EXtraction of Phenotypes from clinical Records using Silver Standards
predict.R — input: config.R, model.Rda, classify.txt; output: predictions.txt.
Acute Myocardial Infarction — OMOP definition: Acc 0.87, PPV 0.84, time ?; XPRESS: Acc 0.89, PPV 0.86, time 2 hr.
Chronic T2DM — PheKB definition: Acc 0.98, PPV 0.96, time 1900; XPRESS: Acc 0.89, PPV 0.90, time 2 hr.
Can we train a phenotype for an outcome of interest using noisily labeled data (regression models, random forests, etc.)? Take a training set that we know is not perfect but does have the true positives correctly labeled. At the end, characterize the trade-off between hours spent and accuracy attained for the two approaches — we want to make this trade-off explicit, and to build a library of reusable models.

The Sources of Features (Weber et al.)

Questions and Discussion An Open Collaborative Approach for Rapid Evidence Generation David K. Vawdrey, PhD Jon D. Duke MD, MS George Hripcsak MD, MS Patrick Ryan PhD Nigam H. Shah MBBS, PhD AMIA Joint Summits on Translational Science March 25, 2015