Reuse of Electronic Medical Records for Research
Outline: our architecture; two examples
Francis Collins, NEJM 9/16/2009
Vanderbilt BioVU: A clinical laboratory for genomics and pharmacogenomics
Vanderbilt BioVU: an Opt-Out DNA Biobank
Extracting DNA from leftover blood samples
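De-Identification
– Eligible patient (John Doe) -> one-way hash -> research identifier (32ef34a6e88c2…)
– EMR: John Doe's record is scrubbed and stored under the research identifier 32ef34a6e88c2… (1.7 million records)
– Blood sample: DNA is extracted and labeled with the same research identifier (~135,000 samples, >14,000 children)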
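The one-way hash step can be illustrated with a minimal sketch. The slide does not name the hash function or the input field, so the keyed SHA-256 (HMAC), the `SECRET_KEY` value, the `research_id` function, and the `MRN-…` identifier format below are all assumptions chosen for illustration, not the BioVU implementation.

```python
import hashlib
import hmac

# Hypothetical secret key (assumption); in practice it would be stored
# separately from the research data so identifiers cannot be recomputed.
SECRET_KEY = b"example-key-not-for-production"


def research_id(patient_identifier: str, key: bytes = SECRET_KEY) -> str:
    """Map a patient identifier to a stable, non-reversible research identifier."""
    digest = hmac.new(key, patient_identifier.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()


# The same input always yields the same research identifier, so a scrubbed
# record and a DNA sample can be linked without keeping the original identifier.
rid_emr = research_id("MRN-0001234")
rid_dna = research_id("MRN-0001234")
```

Because the mapping is deterministic, the record and the sample meet under the same research identifier; because it is keyed and one-way, the identifier alone does not reveal the patient.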
Patient Chart
Sample accrual into BioVU
– Currently >40 active projects with DNA
– >100 projects using the Synthetic Derivative
Platform for EMR-clinical research at VUMC
– Clinical sources: Clinical Notes, WizOrder Orders, Clinical Messaging, StarChart, ICD9 and CPT codes, Test Results
– De-identification of these sources yields the Synthetic Derivative
– Discarded blood samples yield de-identified DNA
The “demonstration project”
Are genotype-phenotype relations replicated in BioVU?
Genotype “high-value” SNPs in the first 10,000 samples accrued:
– 21 established loci (>1 SNP for some)
– in 5 diseases with known associations: atrial fibrillation, Crohn’s disease, multiple sclerosis, rheumatoid arthritis, type 2 diabetes
Develop “electronic phenotype algorithms” to identify cases and controls
Finding cases accurately
Billing codes alone are only 50-80% accurate
Negation terms
– “I don’t think this is MS”
Context clues
– “FAMILY MEDICAL HISTORY: positive for rheumatoid arthritis.”
Others
– Note titles: “Multiple Sclerosis Clinic Note”
Diagram: true cases sit at the overlap of billing codes, natural language processing, and medications & labs; these cases feed the genetic association tests
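The negation and context problems above can be sketched with a small rule-based check. This is a simplified illustration, not the NLP system used for BioVU: the cue lists and the `mention_status` function are assumptions, and a real system would handle scope, sections, and abbreviations far more carefully.

```python
import re

# Illustrative cue lists (assumptions), far shorter than a production lexicon.
NEGATION_CUES = re.compile(
    r"\b(?:no|not|denies|without|ruled out|don['’]t think|do not think)\b",
    re.IGNORECASE,
)
FAMILY_HISTORY_CUE = re.compile(
    r"\bfamily\s+(?:medical\s+)?history\b", re.IGNORECASE
)


def mention_status(sentence: str, term: str) -> str:
    """Classify a disease mention as absent, family_history, negated, or affirmed."""
    match = re.search(r"\b" + re.escape(term) + r"\b", sentence, re.IGNORECASE)
    if match is None:
        return "absent"
    if FAMILY_HISTORY_CUE.search(sentence):
        return "family_history"
    # Only negation cues that appear before the term are considered.
    if NEGATION_CUES.search(sentence[: match.start()]):
        return "negated"
    return "affirmed"
```

Only mentions classified as affirmed would count as evidence for a case; negated and family-history mentions are the ones that make raw keyword or billing-code matching unreliable.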
Rheumatoid Arthritis: Case Definition Evolution
# | Definition | # Cases (in first 10k in BioVU) | Problem
1 | ICD9 codes for RA + medications (only in problem list) | 371 | Found incomplete problem lists
2 | Same as above but searched notes | 411 | Patients billed as RA but actually other conditions, overlap syndromes, juvenile RA
3 | Above + require “rheumatoid arthritis” and small list of exclusions | 358 | Overlap syndromes with other autoimmune conditions, conditions in which physicians did not agree
4 | Above + exclusion of other inflammatory arthritides | 255 | PPV = 97%; a few “possible RA” or family history items remained
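The final definition combines a billing code, a medication, a text mention, and exclusions. The sketch below shows that shape only; the `PatientRecord` type, the medication subset, and the exclusion terms are illustrative assumptions, not the lists used in the study.

```python
from dataclasses import dataclass

RA_ICD9_CODE = "714.0"  # ICD-9-CM code for rheumatoid arthritis
# Illustrative subsets (assumptions), not the study's actual lists.
RA_MEDICATIONS = {"methotrexate", "hydroxychloroquine", "etanercept"}
EXCLUSION_TERMS = {
    "juvenile rheumatoid arthritis",
    "psoriatic arthritis",
    "ankylosing spondylitis",
}


@dataclass
class PatientRecord:
    icd9_codes: set
    medications: set
    note_text: str


def is_definite_ra_case(patient: PatientRecord) -> bool:
    """Require code + medication + text mention, and no exclusion term."""
    text = patient.note_text.lower()
    has_code = RA_ICD9_CODE in patient.icd9_codes
    has_medication = any(m.lower() in RA_MEDICATIONS for m in patient.medications)
    has_mention = "rheumatoid arthritis" in text
    has_exclusion = any(term in text for term in EXCLUSION_TERMS)
    return has_code and has_medication and has_mention and not has_exclusion


example = PatientRecord(
    icd9_codes={"714.0"},
    medications={"Methotrexate"},
    note_text="Seropositive rheumatoid arthritis, stable on current therapy.",
)
```

Each added requirement trades recall for precision, which mirrors the drop from 411 to 255 cases across the definitions in the table.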
Finding cases: Rheumatoid Arthritis
Algorithm output groups: Definite Cases (algorithm-defined), Possible Cases (require manual review), Controls (algorithm-defined), Excluded (algorithm-defined)
7121 Used for analysis
Validating EMR phenotype algorithms (using first 10,000 patients in BioVU)
Disease | Methods | Definite Cases | Controls | Case PPV | Control PPV
Atrial fibrillation | NLP of ECG impressions; ICD9 codes; CPT codes | [missing] | [missing] | [missing] | 100%
Crohn’s disease | ICD9 codes; medications (NLP) | [missing] | [missing] | [missing] | [missing]
Type 2 diabetes | ICD9 codes; medications (NLP); NLP exclusions; labs | [missing] | [missing] | [missing] | [missing]
Multiple sclerosis | ICD9 codes or text diagnosis | [missing] | [missing] | [missing] | 100%
Rheumatoid arthritis | ICD9 codes; medications (NLP); NLP exclusions | [missing] | [missing] | [missing] | 100%
NLP = natural language processing
Common themes: Billing codes – 5/5; NLP – 5/5; Meds – 4/5; Labs – 2/5
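Positive predictive value (PPV) is the fraction of algorithm-identified patients that manual chart review confirms. A minimal sketch follows; the function name and the review counts in the example are hypothetical and are not taken from the validation study.

```python
def positive_predictive_value(confirmed: int, not_confirmed: int) -> float:
    """PPV = confirmed / (confirmed + not confirmed) among algorithm-identified patients."""
    reviewed = confirmed + not_confirmed
    if reviewed == 0:
        raise ValueError("at least one reviewed chart is required")
    return confirmed / reviewed


# Hypothetical review: 97 of 100 algorithm-defined cases confirmed on chart review.
case_ppv = positive_predictive_value(confirmed=97, not_confirmed=3)
```

The same function applies to controls: there, "confirmed" means that review found the patient truly free of the disease.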
Results (Ritchie et al., AJHG 2010)
Figure: odds ratio for each marker, observed vs. published (axis: Odds Ratio), grouped by disease: atrial fibrillation, Crohn's disease, multiple sclerosis, rheumatoid arthritis, type 2 diabetes.
Gene / region labels, in plot order: Chr. 4q25, Chr. 4q25, IL23R, Chr. 5, Chr. 5, NOD2, PTPN22, DRB1*1501, IL2RA, IL7RA, Chr. 6, RSBN1, PTPN22, TCF7L2, TCF7L2, TCF7L2, CDKN2B, FTO, KCNJ11 (rs5219), KCNJ11 (rs5215), IGF2BP2.
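For readers unfamiliar with the quantity plotted, an odds ratio and its approximate 95% confidence interval can be computed from a 2x2 table of risk-allele carriers among cases and controls. This is a generic textbook calculation with hypothetical counts; it is not necessarily the statistical model used by Ritchie et al.

```python
import math


def odds_ratio_ci(case_carriers, case_noncarriers,
                  control_carriers, control_noncarriers, z=1.96):
    """Return (odds ratio, lower bound, upper bound) using the log-OR normal approximation."""
    a, b, c, d = case_carriers, case_noncarriers, control_carriers, control_noncarriers
    if min(a, b, c, d) <= 0:
        raise ValueError("all four cell counts must be positive")
    odds_ratio = (a * d) / (b * c)
    standard_error = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * standard_error)
    upper = math.exp(math.log(odds_ratio) + z * standard_error)
    return odds_ratio, lower, upper


# Hypothetical counts: 50 of 100 cases and 100 of 300 controls carry the risk allele.
observed_or, ci_low, ci_high = odds_ratio_ci(50, 50, 100, 200)
```

A replication in this sense means the observed odds ratio points in the same direction as the published one, ideally with a confidence interval that excludes 1.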
The eMERGE Network
Goal: to assess utility of DNA collections integrated with electronic medical records (EMRs) as resources for genome science
Outcome: GWAS data in >20,000 subjects with EMRs
Vanderbilt phenotype: normal variability in QRS duration
Map labels: coordinating center; 135,000; 20,000; 10,000; 4,000; 3,000
Hypothyroidism: An eMERGE network phenotype
– Domain experts define phenotype (VU)
– Create initial EMR-based algorithm (VU)
– Evaluate & refine
– Share algorithm
Other sites shown: GH, Marsh, Mayo, NW
Hypothyroidism algorithm Diagram courtesy Mike Conway (Mayo)
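Hypothyroidism Validation (Denny et al., AJHG 2011)
Same algorithm, deployed at five sites
Site | Case PPV (%) | Control PPV (%)
Group Health | [missing] | [missing]
Marshfield | [missing] | [missing]
Mayo Clinic | 82 | 96
Northwestern | [missing] | [missing]
Vanderbilt | [missing] | [missing]
All sites (weighted) | [missing] | [missing]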
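The "All sites (weighted)" row implies a weighted average of per-site PPVs. The slide does not state the weights, so the sketch below assumes each site is weighted by a count such as its number of algorithm-identified patients; the function name and the numbers in the example are hypothetical.

```python
def weighted_ppv(site_results):
    """Weighted mean of per-site PPVs.

    site_results: iterable of (ppv_percent, weight) pairs, where weight is
    assumed to be a per-site count (an assumption, not stated on the slide).
    """
    site_results = list(site_results)
    total_weight = sum(weight for _, weight in site_results)
    if total_weight <= 0:
        raise ValueError("total weight must be positive")
    return sum(ppv * weight for ppv, weight in site_results) / total_weight


# Hypothetical two-site example: 90% PPV with weight 100, 80% PPV with weight 300.
overall = weighted_ppv([(90.0, 100), (80.0, 300)])
```

Weighting keeps a small site with an extreme PPV from dominating the network-wide figure.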
Hypothyroidism: “No-Genotyping” GWAS (Denny et al., AJHG 2011)
Labeled locus: FOXE1
eMERGE Phenotypes
Site | Primary phenotype | Secondary phenotypes
Group Health | Dementia | white blood cell counts
Marshfield | Cataracts | diabetic retinopathy
Mayo Clinic | Peripheral arterial disease | red blood cell counts, ESR levels
Northwestern | Type 2 diabetes | lipids and height
Vanderbilt | Normal cardiac conduction | PheWAS
Network phenotypes: autoimmune hypothyroidism, resistant hypertension
Legend: [marker] = novel associations discovered; bold = GWAS completed with significant results