Constructing OHDSI network studies

Slides:



Advertisements
Similar presentations
Connecting to Databases. relational databases tables and relations accessed using SQL database -specific functionality –transaction processing commit.
Advertisements

AD User Import From SIMS.NET
Configuration management
Copyright Hub Software Engineering Ltd 2010All rights reserved Hub Document Manager Product Overview.
Kyle Thurow, Kyle Neuschaefer, Alexander Matusiak, and Justin Carroll.
| imodules.com RE Adapter for Encompass (v2.0) Encompass and The Raiser's Edge® Integrated Data Solution CONFIDENTIAL.
Test Case Management and Results Tracking System October 2008 D E L I V E R I N G Q U A L I T Y (Short Version)
HP Quality Center Overview.
July 11 th, 2005 Software Engineering with Reusable Components RiSE’s Seminars Sametinger’s book :: Chapters 16, 17 and 18 Fred Durão.
Background Info The UK Mirror Service provides mirror copies of data and programs from many sources all over the world. This enables users in the UK to.
Software Configuration Management (SCM)
"Piled Higher and Deeper" by Jorge Cham
Introduction to the Enterprise Library. Sounds familiar? Writing a component to encapsulate data access Building a component that allows you to log errors.
WIKI IN EDUCATION Giti Javidi. W HAT IS WIKI ? A Wiki can be thought of as a combination of a Web site and a Word document. At its simplest, it can be.
Benefits of PL/SQL. 2 home back first prev next last What Will I Learn? In this lesson, you will learn to: –List and explain the benefits of PL/SQL –List.
Version control Using Git Version control, using Git1.
Architectural Design Identifying system components and their interfaces.
9 Systems Analysis and Design in a Changing World, Fourth Edition.
NA-MIC National Alliance for Medical Image Computing UCSD: Engineering Core 2 Portal and Grid Infrastructure.
GAAIN Virtual Appliances: Virtual Machine Technology for Scientific Data Analysis Arihant Patawari USC Stevens Neuroimaging and Informatics Institute July.
Database Management Systems (DBMS)
Architecture View Models A model is a complete, simplified description of a system from a particular perspective or viewpoint. There is no single view.
Introduction HNDIT DBMS 1. Database Management Systems Module code HNDIT Module title Database Management Systems Credits2HoursLectures15.
Infrastructure as code. “Enable the reconstruction of the business from nothing but a source code repository, an application data backup, and bare metal.
Building Preservation Environments with Data Grid Technology Reagan W. Moore Presenter: Praveen Namburi.
CMPE 226 Database Systems April 19 Class Meeting Department of Computer Engineering San Jose State University Spring 2016 Instructor: Ron Mak
NCQA’s Approach to the New Quality Measurement Landscape
Systems Analysis and Design in a Changing World, Fifth Edition
SharePoint 101 – An Overview of SharePoint 2010, 2013 and Office 365
Build and Test system for FairRoot
Martijn Schuemie, Marc Suchard, Patrick Ryan, Tomas Bergvall
Development Environment
Unit Testing.
FeatureExtraction v2.0 Martijn Schuemie.
What are they? The Package Repository Client is a set of Tcl scripts that are capable of locating, downloading, and installing packages for both Tcl and.
Population level estimation best practices
Shared Services with Spotfire
Data Virtualization Tutorial: Introduction to SQL Script
OHDSI Method Evaluation
PLM, Document and Workflow Management
The Use of AMET and Automated Scripts for Model Evaluation
Version control, using Git
Designing and executing OHDSI studies
NURS 736: Technology Solutions for Knowledge Generation in Healthcare
OHDSI population-level estimation workgroup meeting
Perspectives on the intersection between computer science and psychology Developing reproducible – and reusable – methods through research software engineering.
Software Support Framework
Distribution and components
Introduction to R Programming with AzureML
Enterprise Computing Collaboration System Example
THE ARCHITECTURE AND FUNCTIONALITIES OF SELECTED MODULES.
Systems Analysis – ITEC 3155 Evaluating Alternatives for Requirements, Environment, and Implementation.
C2CAMP (A Working Title)
Computerized and Manual Systems
X in [Integration, Delivery, Deployment]
JD Edwards Support and Oracle Cloud Infrastructure: A Successful Path to Oracle Cloud
Design and Programming
Automated Testing and Integration with CI Tool
Course: Module: Lesson # & Name Instructional Material 1 of 32 Lesson Delivery Mode: Lesson Duration: Document Name: 1. Professional Diploma in ERP Systems.
Project Information Management Jiwei Ma
REAL-TIME, INTERACTIVE DOCUMENT AUTOMATION
JENKINS TIPS Ideas for making your life with Jenkins easier
External Validation of Existing Stroke Risk Models
Internet Protocols IP: Internet Protocol
CS 240 – Advanced Programming Concepts
Prepared by Peter Boško, Luxembourg June 2012
SOFTWARE DEVELOPMENT LIFE CYCLE
David Cleverly – Development Lead
Web Application Development Using PHP
Preparing for the Windows 8.1 MCSA
Presentation transcript:

Constructing OHDSI network studies Marc A. Suchard

A journey from data set to paper Most epidemiologists view a study as a journey from data set to paper. The protocol might be your map You will come across obstacles that you will have to overcome Several steps will require manual intervention In the end, it will be impossible to retrace your exact steps

Current epi studies are non-reproducible How do we know what happened? How do we know if it was done correctly? How do we know how well it worked? How could we be more efficient? How can we deal with more complex studies? How can multiple people work together on the same analysis? How could other reproduce this study on a different database?

What should OHDSI studies look like? Paper Database A study should be like a pipeline A fully automated and reproducible process from database to paper ‘Performing a study’ = building the pipeline

Distributed research network Many observational databases in OHDSI large numbers large diversity We cannot share patient-level data Solution: analysis code ‘visits’ the data only population-level data is shared

Hub and spoke network Data site A Data site F Data site B Coordinating center Data site E Data site C Data site D

OHDSI network Stanford UCLA IMS Columbia University University of Hong Kong Taipei Medical University Regenstrief University of South Australia Janssen Ajou School of Medicine

Treatment pathway study Stanford UCLA IMS Columbia University University of Hong Kong Taipei Medical University Regenstrief University of South Australia Janssen Ajou School of Medicine

Everyone can initiate and lead a study See the Wiki Collaborative Study FAQ: http://www.ohdsi.org/web/wiki/doku.php?id=research:studies:faq Post preliminary protocol on Wiki Invite community review Post final protocol on Wiki can be used for IRB approval Develop study code, post on GitHub Test code at at least 2 sites Invite sites to join

Implementation Standards: Study coordinator Data site Standards: PostgreSQL, Oracle, SQL Server, RedShift, or APS OMOP Common Data Model Windows, macOs, Linux R

Implementation Mini-Sentinel: SAS EU-ADR: Java application (Jerboa) Study coordinator Data site Content: R package Delivery: GitHub (StudyProtocols repo) E.g. https://github.com/OHDSI/StudyProtocols/tree/master/KeppraAngioedema Mini-Sentinel: SAS EU-ADR: Java application (Jerboa) Why R? Open source Efficient in deploying advanced computing code Easy to integrate different modules Can we written by person ≠ Marc

Implementation Content: zip file containing Delivery: Plain text Study coordinator Data site Content: zip file containing Plain text CSV (comma-separated values) PNG (plots) … Needs to be: Non-identifiable information Human reviewable Delivery: E-mail Amazon S3

Study package User experience Under the hood Examples: Keppra-angioedema study packages

Study package user experience Steps for the user: Install the package (and dependencies) Run the analyis Review the results Share the results

Study package user experience Steps for the user: Install the package (and dependencies) Run the analyis Review the results Share the results

Installing the package Using R’s package infrastructure to deploy packages Not all packages are in CRAN. There’s an OHDSI package repository (using “drat”)

Study package user experience Steps for the user: Install the package (and dependencies) Run the analyis Review the results Share the results

Running the analysis Ideally a single command to run the entire analysis This should create a folder called ‘export’ with all files that will be shared

Study package user experience Steps for the user: Install the package (and dependencies) Run the analyis Review the results Share the results

Review the results

Study package user experience Steps for the user: Install the package (and dependencies) Run the analyis Review the results Share the results

Share the results Push the StudyResults.zip file to an S3 bucket S3 is Amazon’s storage service with secure up and download One bucket per study Two accounts per bucket: Study coordinator can read and write Study participants can write Contact the coordinating center for your study-specific bucket

Under the hood Executing the analysis typically entails: Instantiating cohorts Running some SQL against the CDM database Do some further processing in R Write results to the export folder

Example: Keppra – angioedema study OHDSI study: Does exposure to Keppra (levetiracetam) lead to an increased risk of angioedema? Compared to phenytoin https://github.com/OHDSI/StudyProtocols/tree/master/KeppraAngioedema

Full traceability Study package contains Cohort definitions (e.g. angioedema definition) OhdsiRTools::insertCohortDefinitionInPackage(2193, "Angioedema") All analysis details for the CohortMethod package CohortMethod package describes data extraction Code to generate tables and figures Code to generate full report

We can check for correctness We can review the study code We should make the study code publicly available as part of the paper Large parts of the study are automatically checked using unit tests

We can evaluate how well the study worked Included 100 negative control outcomes Results show little residual confounding when using propensity score matching

Writing the study was very efficient Reuse of R code in CohortMethod, DatabaseConnector, SqlRender, EmpiricalCalibration, etc. Implementation took days instead of months Next study will be faster

Complexity is not a problem Use software engineering approaches to deal with complexity: Abstraction Encapsulation Writing clear code Re-use

Several people can work on the same analysis through version control Mary Version 2 History of changes: How did the final analysis come about? Version control system Version 1 Version 2 Use for everything: Protocol Analysis code Paper X Me Version 1 Version 2 Version 3 Version 2 Time

Commit log

Easy to rerun on different data The Keppra – Angioedema study was run on: Columbia University EHR Stanford EHR Cerner (University of Texas) Pharmetrics Plus (IMS) Optum Truven CCAE Truven MDCD Truven MDCR …

Viewing a study as a pipeline has many advantages Full traceability Ability to check for correctness Ability to evaluate using controls More efficient Ability to deal with complexity Ability to work with several people on one analysis Easy to rerun on different data

Study package structure R contains all user-executable R code

Study package structure extras contains code used by the study coordinator: Code for importing CIRCE definitions Testing code Code for aggregating data across sites

Study package structure inst is for non-R content of the payload SQL Settings files …

Study package structure DESCRIPTION tells R about package dependencies (E.g. methods in the methods library)

Study package structure README.md is what is shown in the GitHub front page

Essentials Reading OHDSI Collobarative Studies FAQ: http://www.ohdsi.org/web/wiki/doku.php?id=research:studies:faq R packages manual: http://r-pkgs.had.co.nz/ SqlRender vignette: https://raw.githubusercontent.com/OHDSI/SqlRender/master/inst/doc/UsingSqlRender.pdf OHDSI R and SQL code styles: http://www.ohdsi.org/web/wiki/doku.php?id=development:ohdsi_code_style_for_r http://www.ohdsi.org/web/wiki/doku.php?id=development:ohdsi_code_style_for_sql Sites: SqlRender test site: http://sqlrenderweb.ohdsi.org:2121/ Contacts: schuemie@ohdsi.org msuchard@ucla.edu

Conclusions Anyone can initiate a study and be a study coordinator Follow the steps described in the FAQ Need to have a clear research question Need someone with R and SQL skills OHDSI standards for distributed analysis are evolving Check latest study package for most recent ideas Martijn and Marc can help with the writing of the package End-product: fully reproducible (and replicable) study with demonstrated operating characteristics