Preparing Electronic Health Records for Multi-Site CER Studies
Michael G. Kahn 1,3,4, Lisa Schilling 2
1 Department of Pediatrics, University of Colorado, Denver
2 Department of Medicine, University of Colorado, Denver
3 Colorado Clinical and Translational Sciences Institute
4 Department of Clinical Informatics, Children’s Hospital Colorado
AcademyHealth Annual Research Meeting
Building a Data Infrastructure for Multi-stakeholder Comparative Effectiveness Research
26 June 2012
Funding provided by AHRQ 1R01HS (Scalable Architecture for Federated Therapeutic Inquiries Network)
Setting the context: AHRQ Distributed Research Networks
AHRQ ARRA OS: Recovery Act 2009: Scalable Distributed Research Networks for Comparative Effectiveness Research (R01)
Goal: enhance the capability and capacity of electronic health networks designed for distributed research to conduct prospective, comparative effectiveness research on outcomes of clinical interventions.
AHRQ Distributed Research Networks Funded Projects
SAFTINet: Scalable Architecture for Federated Therapeutic Inquiries Network
–Lisa M. Schilling, University of Colorado Denver (R01 HS )
SCANNER: Scalable National Network for Effectiveness Research
–Lucila Ohno-Machado, University of California San Diego (R01 HS )
SPAN: Scalable PArtnering Network for CER: Across Lifespan, Conditions, and Settings
–John F. Steiner, Kaiser Foundation Research Institute (R01 HS )
SAFTINet Partners
Clinical partners
–Colorado Community Managed Care Network and the Colorado Associated Community Health Information Enterprise
  –Colorado Federally Qualified Health Centers
–Denver Health and Hospital Authority
–Cherokee Health Systems, Tennessee
Technology partners
–University of Utah, Center for High Performance Computing
–QED Clinical, Inc., d/b/a CINA
Medicaid partners
–Colorado Health Care Policy & Financing
–Utah Department of Public Health (partnership in development)
–TennCare and Tennessee managed care organizations (partnership in development)
Leadership
–University of Colorado Denver
–American Academy of Family Physicians, National Research Network
Key Differences between EHR and CER data
EHR Data | CER Data | EHR->CER task
Fully identified | LDS or de-identified | Strip identifiers; keep mappings?
Local codes and values | Standardized codes and values | Terminology and value set mapping (manual!)
Broad data domains | Focused data domains | Filtering by patient, encounter, date, facility
Variable data quality; high level of missingness | Substantial data quality processes applied | Data profiling; iterative investigations
Lots of free text | Fully coded data only | NLP or ignore free text
Local access only | Shared access | Distributed or centralized data access
Single data source | Multiple data sources | Record linkage
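The "data profiling" task in the table can be illustrated with a minimal sketch that measures missingness per field. The field names (`patient_id`, `birth_date`, `hba1c`) and the sample rows are hypothetical and are not taken from any SAFTINet extract; a real profile would cover far more checks than blank values.

```python
from typing import Dict, List, Optional

Record = Dict[str, Optional[str]]


def missingness(records: List[Record], fields: List[str]) -> Dict[str, float]:
    """Return, per field, the fraction of records where it is absent, None, or blank."""
    total = len(records)
    result: Dict[str, float] = {}
    for field in fields:
        missing = 0
        for record in records:
            value = record.get(field)  # absent keys count as missing
            if value is None or value.strip() == "":
                missing += 1
        result[field] = missing / total if total else 0.0
    return result


# Hypothetical extract rows, for illustration only.
sample: List[Record] = [
    {"patient_id": "A1", "birth_date": "1980-02-01", "hba1c": "7.1"},
    {"patient_id": "A2", "birth_date": "", "hba1c": None},
    {"patient_id": "A3", "birth_date": "1975-11-30"},
    {"patient_id": "A4", "birth_date": "1990-05-12", "hba1c": "6.4"},
]

profile = missingness(sample, ["patient_id", "birth_date", "hba1c"])
for field, fraction in profile.items():
    print(f"{field}: {fraction:.0%} missing")
```

A per-field summary like this is a starting point for the "iterative investigations" the table mentions: a field with unexpectedly high missingness usually sends the analyst back to the source system.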
A common data model is critical!
[Figure: Crossing the CER chasm. Labels: CINA CDR; Local Data Warehouse; Existing Clinical Registries; Other EHR (x3); Limited Data Set / Common Data Model / Common Terminology (x3); Common Query Interface; CER]
ROSITA-GRID-PORTAL
Grid Portal
Why ROSITA?
ROSITA: Reusable OMOP and SAFTINet Interface Adaptor
ROSITA: The only bilingual Muppet
Converts EHR data into research limited data set
1. Replaces local codes with standardized codes
2. Replaces direct identifiers with random identifiers
3. Supports clear-text and encrypted record linkage
4. Provides data quality metrics
5. Pushes data sets to grid node for distributed queries
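Steps 1 and 2 above can be sketched in a few lines. This is an illustration under stated assumptions, not ROSITA's implementation: the `LOCAL_TO_STANDARD` table, the `STD:` placeholder codes, the `Pseudonymizer` class, and the row fields (`mrn`, `name`, `local_code`, `value`) are all hypothetical.

```python
import secrets
from typing import Dict, Optional

# Hypothetical local-to-standard code table with placeholder target codes.
# Real terminology mappings are curated manually.
LOCAL_TO_STANDARD: Dict[str, str] = {
    "GLU-FAST": "STD:0001",
    "HGBA1C": "STD:0002",
}


class Pseudonymizer:
    """Assigns each direct identifier a random identifier and remembers the pairing."""

    def __init__(self) -> None:
        self._assigned: Dict[str, str] = {}

    def pseudonym(self, direct_id: str) -> str:
        # The random value is not derived from the identifier itself,
        # so it cannot be reversed without the stored pairing.
        if direct_id not in self._assigned:
            self._assigned[direct_id] = secrets.token_hex(16)
        return self._assigned[direct_id]


def transform(row: Dict[str, str], pseudonymizer: Pseudonymizer) -> Dict[str, Optional[str]]:
    """Build a research row: random person id, standardized code, no direct identifiers."""
    return {
        "person_id": pseudonymizer.pseudonym(row["mrn"]),
        # Unmapped local codes become None so they can be reviewed by hand.
        "concept_code": LOCAL_TO_STANDARD.get(row["local_code"]),
        "value": row["value"],
    }


p = Pseudonymizer()
first = transform({"mrn": "12345", "name": "Pat Doe", "local_code": "HGBA1C", "value": "7.1"}, p)
second = transform({"mrn": "12345", "name": "Pat Doe", "local_code": "XYZ", "value": "3"}, p)
print(first)
print(second)
```

Keeping the identifier-to-pseudonym pairing in memory here stands in for the "keep mappings?" question raised earlier in the deck: whether and where that pairing is stored determines if the data set can later be re-linked at the source site.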
ROSITA: transforming EHR data for comparative effectiveness research
SAFTINet ETL specifications
SAFTINet ETL Specifications
Transforming EHR Data: What does ROSITA do?
What does ROSITA do?
Why ROSITA?
Converts EHR data into research limited data set
1. Replaces local codes with standardized codes
2. Replaces direct identifiers with random identifiers
3. Supports clear-text and encrypted record linkage
4. Provides data quality metrics
5. Pushes data sets to grid node for distributed queries
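The slides do not say how ROSITA performs encrypted record linkage. One common approach, shown here purely as an assumption, is for each data source to compute a keyed hash (HMAC-SHA256) over normalized identifiers, so that records can be matched on the resulting tokens without exchanging the identifiers in clear text. The function names, the choice of linkage fields, and the key are hypothetical.

```python
import hashlib
import hmac


def normalize(value: str) -> str:
    """Lowercase and keep only letters and digits so formatting differences do not block a match."""
    return "".join(ch for ch in value.lower() if ch.isalnum())


def linkage_token(key: bytes, last_name: str, birth_date: str) -> str:
    """Return a keyed hash of the normalized identifiers (hex string)."""
    message = (normalize(last_name) + "|" + normalize(birth_date)).encode("utf-8")
    return hmac.new(key, message, hashlib.sha256).hexdigest()


# Hypothetical shared secret; in practice it would be exchanged out of band.
SHARED_KEY = b"example-key-not-for-production"

clinic_token = linkage_token(SHARED_KEY, "O'Brien ", "1980-02-01")
medicaid_token = linkage_token(SHARED_KEY, "OBRIEN", "1980/02/01")
other_token = linkage_token(SHARED_KEY, "Smith", "1980-02-01")

print(clinic_token == medicaid_token)  # same person, different formatting
print(clinic_token == other_token)     # different person
```

Exact-match tokens like these are brittle against typos and name changes; that limitation is one reason linkage between clinical and claims data tends to need iteration.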
Open issue: we do not yet have Medicaid figured out
ROSITA Security Discussion Framework
ROSITA: Current Status
Software development underway
–Phase 1 (in progress): 16-week development; clinical data only, no Medicaid
–Phase 2: Medicaid + record linkage
OMOP data model V4 finalized!
–Clinical & financial extensions
All SAFTINet partners have begun ETL activities
–Two sites have provided full ETL extracts for development and testing
Everything is/will be available
Questions?