Secondary Use of Patients’ Electronic Records (SUPER): An Approach for Meeting Specific Data Needs of Clinical and Translational Researchers
S23: Methods for Identification, Classification, and Association using EHR Data
Evan Sholle, MS
Weill Cornell Medicine
Twitter: #AMIA2017
Disclosure
Neither I nor my spouse has any relevant relationships with commercial interests to disclose.
AMIA 2017 | amia.org
Learning Objectives
After participating in this session, the learner should be better able to:
  Describe Weill Cornell Medicine's methodology and infrastructure for aggregating and integrating patient electronic data for secondary use.
Introduction
Obtaining electronic patient data for research is challenging
  Multiple electronic health record (EHR) systems
  Transformation of clinical data into scientific variables
  Regulatory approval
No “one-size-fits-all” approach exists
Clinical data warehouse
  Centralized resource
  Normalizes data to clinical reference terminologies
  Consistent information model
Methods: institutional setting
Weill Cornell Medicine
  Multispecialty outpatient group practice
  Over 1,000 clinical faculty members
  Over 650,000 patients/year treated
  Epic Ambulatory since 2000
NewYork-Presbyterian Hospital
  2,600 beds
  Over 2 million visits/year
  Allscripts Sunrise Clinical Manager (SCM) since 2007
Ancillary EHR systems to cover specific use cases
Methods: data sources
SUPER: Secondary Use of Patients’ Electronic Records
  Stores data from multiple clinical and research information systems
  Entirety of data gathered to support clinical, billing, and research activities across WCM/NYP, including:
    Epic Clarity
    Genomic information systems
    Clinical trials management system
    REDCap
    Allscripts SCM
    Eagle
    Perioperative ancillary EHR systems (CompuRecord, OR Manager, ProVation, etc.)
Methods: data acquisition process [diagram]
Methods: ETL, indexing, terminology
ETL code management
  Tracked using Subversion (SVN)
  Scheduled to avoid conflict
  Paver allows for automation and modularization
Indexing
  Cron job powered by Paver syncs index generation with ETL completion
Terminology management
  None conducted; relies on data harmonization implemented by source teams
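The idea of syncing index generation with ETL completion can be sketched without any particular tool. The following is a minimal, standard-library-only illustration of one way to do it (completion marker files checked by a scheduled job). It does not use Paver's API and is not the WCM implementation; the job names, the marker-file convention, and all function names are hypothetical.

```python
from __future__ import annotations

import tempfile
from pathlib import Path


def run_etl(job_name: str, marker_dir: Path) -> None:
    """Stand-in for one ETL job; leaves a completion marker when done."""
    # ... extract, transform, and load would happen here ...
    (marker_dir / f"{job_name}.done").touch()


def etl_complete(job_names: list[str], marker_dir: Path) -> bool:
    """True only when every listed ETL job has left its marker."""
    return all((marker_dir / f"{name}.done").exists() for name in job_names)


def build_indexes_if_ready(job_names: list[str], marker_dir: Path) -> bool:
    """Meant to be invoked on a schedule (for example, by cron)."""
    if not etl_complete(job_names, marker_dir):
        return False  # ETL still running; try again on the next tick
    # ... index generation would happen here ...
    for name in job_names:
        (marker_dir / f"{name}.done").unlink()  # reset for the next cycle
    return True


if __name__ == "__main__":
    jobs = ["epic_clarity", "allscripts_scm"]  # hypothetical job names
    with tempfile.TemporaryDirectory() as tmp:
        markers = Path(tmp)
        run_etl("epic_clarity", markers)
        print(build_indexes_if_ready(jobs, markers))  # one job still pending
        run_etl("allscripts_scm", markers)
        print(build_indexes_if_ready(jobs, markers))  # all jobs finished
```

Marker files keep the ETL and indexing steps decoupled: the indexing job needs no knowledge of how long each load takes, only whether it has finished.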
Methods: documentation, hardware
Documentation and workflow management
  Multiple off-the-shelf work management tools
    ServiceNow
    Jira
    Slack
    Confluence
  Weekly code review
Hardware
  Four Microsoft SQL Server 2014 database servers
  Five Linux virtual machines
Results
SUPER supported multiple tools for secondary use of patient data
  Cohort discovery via i2b2
    De-identified
    Identified/custom
  EHR analytics
    OMOP CDM
    Customized data marts
  Data capture
    REDCap DDP plugin (SUPER REDCap)
  Multi-institutional data sharing
    TriNetX
    NYC-CDRN
Results: data flow to sources [diagram: SUPER]
Discussion
SUPER supported multiple scientific workflows
SUPER integrated both raw and transformed data from disparate information systems, including clinical data warehouses
Discussion: what is SUPER?
Clinical data warehouse
  Centralized resource
  Normalizes data to clinical reference terminologies
  Consistent information model
Data lake
  Ad hoc accession
  Storage in untransformed format
Data marts
  Designed to address specific research questions
  Data models built to purpose per specifications
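To make the contrast between the first two patterns concrete, here is a small sketch: a "lake" that stores each record exactly as received, and a "warehouse" load that maps source-specific fields and local codes onto one information model. The source names, field names, local codes, and the tiny mapping table are all invented for illustration; a real warehouse would rely on a maintained reference terminology and not a hand-written dictionary.

```python
import json

# Hypothetical per-source field names: (patient id, local code, value).
FIELD_MAP = {
    "ehr_a": ("pid", "test", "result"),
    "ehr_b": ("patient", "lab_name", "val"),
}

# Hypothetical mapping from (source, local code) to one shared concept.
LOCAL_TO_REFERENCE = {
    ("ehr_a", "HGB"): "hemoglobin",
    ("ehr_b", "Hgb, blood"): "hemoglobin",
}


def store_in_lake(lake, source, payload):
    """Data lake pattern: keep the record as received, untransformed."""
    lake.append(json.dumps({"source": source, "payload": payload}))


def load_into_warehouse(lake):
    """Warehouse pattern: one information model, normalized concepts."""
    rows = []
    for raw in lake:
        record = json.loads(raw)
        source, payload = record["source"], record["payload"]
        id_key, code_key, value_key = FIELD_MAP[source]
        concept = LOCAL_TO_REFERENCE.get((source, payload[code_key]))
        if concept is None:
            continue  # unmapped local codes remain available in the lake only
        rows.append(
            {
                "patient_id": payload[id_key],
                "concept": concept,
                "value": float(payload[value_key]),
            }
        )
    return rows


if __name__ == "__main__":
    lake = []
    store_in_lake(lake, "ehr_a", {"pid": "p1", "test": "HGB", "result": "13.2"})
    store_in_lake(lake, "ehr_b", {"patient": "p2", "lab_name": "Hgb, blood", "val": "11.8"})
    for row in load_into_warehouse(lake):
        print(row)
```

The lake keeps everything, including records the warehouse cannot yet interpret, which is what allows later, purpose-built extracts to go back to the raw form when needed.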
Discussion: SUPER as data kitchen [diagram]
Acknowledgements
Funding
  Joint Clinical Trials Office
  Clinical and Translational Science Center (UL1 TR000457)
  ARCH
Leadership
  Curtis Cole, MD
  Stephen Johnson, PhD
  John Leonard, MD
  Jyotishman Pathak, PhD
  Vinay Varughese
Research Informatics
  Prakash Adekkanattu
  Cindy Chen
  David Kraemer
  Steven Flores
  Joseph Kabariti
  Ryan McGregor
  Julian Schwartz
  Jacob Weiser
  Anthony DiFazio
  Sean Pompea
  Marcos Davila, MS
Questions
A researcher affiliated with an academic medical center has engaged with a clinical research informatics group to obtain a data set for a specific group of patients: women aged 40-74 with a diagnosis of small cell lung cancer. This researcher would like to examine all instances of these patients' CBC results, as well as instances of specific medications ordered. Which of the following methodologies is the clinical research informatics group most likely to employ in addressing the researcher's needs?
  Grant the researcher access to a de-identified data warehouse and instruct her to write her own code to access the data
  Work with the researcher to define the parameters in terms of structured clinical reference terminologies, then create a data mart with the variables the researcher requests
  Request that the researcher refine her query further before re-engaging the clinical research informatics group
  Suggest that the researcher use i2b2 to obtain the data
  Grant the researcher access to a limited data set with all labs and medications for all women
Answer
Work with the researcher to define the parameters in terms of structured clinical reference terminologies, then create a data mart with the variables the researcher requests
Explanation
Researchers with specific requests for defined sets of rows-and-columns data are best served by a data mart, which is tailored to their individual needs. Granting the researcher access to a data warehouse presumes her familiarity with the raw data model, as well as the technical skills required to query it. Requesting that the researcher refine her query is a good idea, but this is better accomplished as part of the definitional work required to build a data mart. i2b2 may be able to address this query, depending on its local implementation parameters, but the researcher would still need help to generate the data set. Granting the researcher access to a limited data set with all labs and medications for all women would go beyond the minimum necessary PHI disclosure for the request.
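As a rough illustration of the data mart approach described in the explanation, the sketch below materializes only the cohort, lab results, and medication orders that were requested from a toy in-memory warehouse. Everything here is assumed for the example: the table and column names, the sample rows, and the concept labels ('SCLC', 'CBC_HGB', 'MED_X') are placeholders, not real terminology codes, and this is not the SUPER schema.

```python
import sqlite3


def make_warehouse():
    """Tiny in-memory stand-in for a warehouse; schema and rows are invented."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE patient (patient_id INTEGER PRIMARY KEY, sex TEXT, age INTEGER);
        CREATE TABLE diagnosis (patient_id INTEGER, concept TEXT);
        CREATE TABLE lab_result (patient_id INTEGER, concept TEXT, value REAL);
        CREATE TABLE med_order (patient_id INTEGER, concept TEXT);
        INSERT INTO patient VALUES (1, 'F', 55), (2, 'F', 80), (3, 'M', 60), (4, 'F', 45);
        INSERT INTO diagnosis VALUES (1, 'SCLC'), (2, 'SCLC'), (3, 'SCLC'), (4, 'ASTHMA');
        INSERT INTO lab_result VALUES
            (1, 'CBC_HGB', 12.9), (1, 'LIPID_LDL', 101.0), (2, 'CBC_HGB', 10.4);
        INSERT INTO med_order VALUES (1, 'MED_X'), (1, 'MED_Y'), (4, 'MED_X');
        """
    )
    return conn


def build_data_mart(conn, dx, labs, meds, min_age, max_age):
    """Copy only the requested cohort, labs, and medication orders."""
    conn.executescript(
        """
        CREATE TABLE mart_cohort (patient_id INTEGER PRIMARY KEY);
        CREATE TABLE mart_lab (patient_id INTEGER, concept TEXT, value REAL);
        CREATE TABLE mart_med (patient_id INTEGER, concept TEXT);
        """
    )
    # Cohort: women in the age range with the diagnosis concept.
    conn.execute(
        """
        INSERT INTO mart_cohort
        SELECT DISTINCT p.patient_id
        FROM patient p JOIN diagnosis d ON d.patient_id = p.patient_id
        WHERE p.sex = 'F' AND p.age BETWEEN ? AND ? AND d.concept = ?
        """,
        (min_age, max_age, dx),
    )
    # Labs and medications: restricted to the cohort and the requested concepts.
    lab_marks = ",".join("?" for _ in labs)
    conn.execute(
        f"""
        INSERT INTO mart_lab
        SELECT l.patient_id, l.concept, l.value
        FROM lab_result l JOIN mart_cohort c ON c.patient_id = l.patient_id
        WHERE l.concept IN ({lab_marks})
        """,
        list(labs),
    )
    med_marks = ",".join("?" for _ in meds)
    conn.execute(
        f"""
        INSERT INTO mart_med
        SELECT m.patient_id, m.concept
        FROM med_order m JOIN mart_cohort c ON c.patient_id = m.patient_id
        WHERE m.concept IN ({med_marks})
        """,
        list(meds),
    )


if __name__ == "__main__":
    conn = make_warehouse()
    build_data_mart(conn, "SCLC", ["CBC_HGB"], ["MED_X"], 40, 74)
    print(conn.execute("SELECT * FROM mart_lab").fetchall())
    print(conn.execute("SELECT * FROM mart_med").fetchall())
```

Because the mart contains only rows for the defined cohort and the requested variables, it also reflects the minimum-necessary principle mentioned in the explanation.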
Thank you! evs2008@med.cornell.edu