Square wheels: electronic medical records for discovery research in rheumatoid arthritis Robert M. Plenge, M.D., Ph.D. October 30, 2009 NCRR sponsored.

Presentation transcript:

Square wheels: electronic medical records for discovery research in rheumatoid arthritis Robert M. Plenge, M.D., Ph.D. October 30, 2009 NCRR sponsored "Using EHR Data for Discovery Research" HARVARD MEDICAL SCHOOL

Key questions What are the regulatory obstacles impacting your work? What are the resource needs required to replicate your work at other institutions? What are the priority short term "translational" questions in your field that would represent the most rapid payoff on investment?

Key questions How can I implement your approach, and how much better is it?

genotype phenotype clinical care

genotype phenotype clinical care bottleneck

Raychaudhuri et al., in press, Nature Genetics. October 2009: >30 RA risk loci. Loci shown: PTPN, “shared epitope” hypothesis, HLA DR, PADI4, CTLA4, TNFAIP3, STAT4, TRAF1-C5, IL2-IL21, CD40, CCL21, CD244, IL2RB, TNFRSF14, PRKCQ, PIP4K2C, IL2RA, AFF3. Latest GWAS (2009) in 25,000 case-control samples with replication in 20,000 additional samples: >10 new loci: REL, BLK, TAGAP, CD28, TRAF6, PTPRC, FCGR2A, PRDM1, CD2-CD58. Together explain ~35% of the genetic burden of disease

genotype phenotype clinical care bottleneck

Genetic predictors of response to anti-TNF therapy in RA PTPRC/CD45 allele n=1,283 patients P= Submitted to Arth & Rheum

How can we collect DNA and detailed clinical data on >20,000 RA patients?

What are the options for collecting clinical data and DNA for genetic studies?

Options for clinical + DNA. Columns: design | clinical data | DNA | sample size | cost. Rows: clinical trial: +++, +, $$$; registry: $$; claims data: +, n/a, +++, $; EMR: $

Content of EMRs. Narrative data = free-form written text –info about symptoms, medical history, medications, exam, impression/plan. Codified data = structured format –age, demographics, and billing codes. EMRs are increasingly utilized!

Gabriel (1994) Arthritis and Rheumatism. This is not a new idea… Sens: 89%, PPV: 57%

Gabriel (1994) Arthritis and Rheumatism Conclusion: The sole reliance on such databases for the diagnosis of RA can result in substantial misdiagnosis. …but EMR data are “dirty”

Partners HealthCare: 4 million patients

Partners HealthCare: linked by EMR

Partners HealthCare: organized by i2b2

4 million patients 31,171 patients ICD9 RA and/or CCP checked (goal = high sensitivity) 3,585 RA patients Classification algorithm (goal = high PPV) Clinical subsets Discarded blood for DNA
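The two-stage funnel on this slide (a broad, high-sensitivity screen followed by a high-PPV classification step) can be sketched as below. This is only an illustration under stated assumptions: the `Patient` fields, the toy records, and the 0.5 cutoff are hypothetical and are not taken from the i2b2 DataMart.

```python
from dataclasses import dataclass


@dataclass
class Patient:
    patient_id: str
    has_ra_icd9: bool      # hypothetical field: at least one ICD9 code for RA
    ccp_checked: bool      # hypothetical field: an anti-CCP test was ordered
    ra_probability: float  # hypothetical field: output of a classification model


def screen(patients):
    """Stage 1: broad filter aimed at high sensitivity (ICD9 RA and/or CCP checked)."""
    return [p for p in patients if p.has_ra_icd9 or p.ccp_checked]


def classify(candidates, threshold):
    """Stage 2: keep candidates whose model probability clears the threshold (aimed at high PPV)."""
    return [p for p in candidates if p.ra_probability >= threshold]


# Toy records, invented for illustration only.
patients = [
    Patient("a", True, True, 0.91),
    Patient("b", True, False, 0.12),
    Patient("c", False, True, 0.40),
    Patient("d", False, False, 0.00),
]

candidates = screen(patients)
cases = classify(candidates, threshold=0.5)  # 0.5 is an arbitrary illustrative cutoff
print(len(candidates), [p.patient_id for p in cases])
```

Keeping the two stages separate lets the screen stay permissive while the cutoff in the second stage is tuned for PPV.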

Our library of RA phenotypes. Natural language processing (NLP) –disease terms (e.g., RA, lupus) –medications (e.g., methotrexate) –autoantibodies (e.g., CCP, RF) –radiographic erosions. Codified data –ICD9 disease codes –prescription medications –laboratory autoantibodies. Qing Zeng. Concept/term: accuracy of concept. Presence of erosion: 88%; seropositive: 96%; CCP positive: 98.7%; RF positive: 99.3%; etanercept: 100%; methotrexate: 100%

Natural language processing (NLP) –disease terms (e.g., RA, lupus) –medications (e.g., methotrexate) –autoantibodies (e.g., CCP, RF) –radiographic erosions Codified data –ICD9 disease codes –prescription medications –laboratory autoantibodies Our library of RA phenotypes Shawn Murphy

‘Optimal’ algorithm to classify RA: NLP + codified data. Regression model with a penalty parameter (to avoid over-fitting). Inputs: codified data, NLP data. Tianxi Cai, Kat Liao
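A minimal sketch of what "a regression model with a penalty parameter" can look like, assuming a logistic regression with an L2 (ridge) penalty fitted by plain gradient descent. The slide does not say which penalty or fitting procedure was used, so the penalty type, the two toy features (one codified, one NLP-derived), and all function names here are assumptions for illustration.

```python
import math


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_penalized_logistic(X, y, penalty=0.1, lr=0.5, steps=2000):
    """Gradient descent on mean log-loss + (penalty / 2) * sum(w_j ** 2).

    The intercept b is not penalized. A larger penalty shrinks the weights
    toward zero, which is what limits over-fitting.
    """
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(steps):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            grad_b += err / n
            for j in range(d):
                grad_w[j] += err * xi[j] / n
        b -= lr * grad_b
        w = [wj - lr * (gj + penalty * wj) for wj, gj in zip(w, grad_w)]
    return w, b


def predict_probability(w, b, x):
    return sigmoid(b + sum(wj * xj for wj, xj in zip(w, x)))


# Toy data, invented for illustration: [codified RA code present, NLP mention of RA present]
X = [[1, 1], [1, 0], [0, 1], [0, 0], [1, 1], [0, 0]]
y = [1, 1, 0, 0, 1, 0]

w, b = fit_penalized_logistic(X, y, penalty=0.1)
w_strong, _ = fit_penalized_logistic(X, y, penalty=1.0)
print(predict_probability(w, b, [1, 1]), predict_probability(w, b, [0, 0]))
```

The first toy feature separates the classes perfectly, so an unpenalized fit would push its weight without bound; comparing `w` with `w_strong` shows the penalty keeping the weights finite and smaller.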

High PPV with adequate sensitivity ✪ 392 out of 400 (98%) had definite or possible RA!

This means more patients! ~25% more subjects with the complete algorithm: 3,585 subjects (3,334 with true RA) vs. 3,046 subjects (2,680 with true RA)
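The counts on this slide can be checked with a few lines of arithmetic. The sketch below computes the PPV implied by each pair of numbers and the relative gain in true RA subjects; the variable names are illustrative, and the transcript does not say which algorithm produced the second pair of counts, so it is called the "comparison" here.

```python
def ppv(true_positives, predicted_positives):
    """Positive predictive value: fraction of selected subjects who truly have RA."""
    return true_positives / predicted_positives


# Counts as given on the slide.
complete_selected, complete_true_ra = 3585, 3334
comparison_selected, comparison_true_ra = 3046, 2680

ppv_complete = ppv(complete_true_ra, complete_selected)
ppv_comparison = ppv(comparison_true_ra, comparison_selected)

# Relative gain in true RA subjects and in selected subjects overall.
gain_true_ra = complete_true_ra / comparison_true_ra - 1
gain_selected = complete_selected / comparison_selected - 1

print(f"PPV complete:   {ppv_complete:.1%}")
print(f"PPV comparison: {ppv_comparison:.1%}")
print(f"Gain in true RA subjects: {gain_true_ra:.1%}")
print(f"Gain in selected subjects: {gain_selected:.1%}")
```

On these numbers the "~25% more" figure matches the gain in subjects with true RA (about 24%) rather than the gain in selected subjects (about 18%).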

4 million patients 31,171 patients ICD9 RA and/or CCP checked (goal = high sensitivity) 3,585 RA patients Classification algorithm (goal = high PPV) Discarded blood for DNA

Linking the Datamart-Crimson NLP data Codified data

Status of i2b2 Crimson collection. Over 3,000 samples collected to date –cost = $10 per sample. DNA extracted on >2,400 buffy coats –cost = $20 per sample –>90% had ≥1 ug of DNA –>99% had ≥5 ug of DNA after WGA. Genotyping of 384 SNPs (RA risk alleles, AIMs, other) is ongoing at Broad Institute

Status of i2b2 Crimson collection. Measured autoantibodies from plasma –5 autoantibodies in ~380 RA patients –~85% are CCP+, ~35% ANA+, ~15% TPO+. Question: are non-RA autoantibodies present at increased frequency in RA patients vs matched controls? Stay tuned…more data soon!

Key questions How can I implement your approach, and how much better is it?

Key questions What are the regulatory obstacles impacting your work? What are the resource needs required to replicate your work at other institutions? What are the priority short term "translational" questions in your field that would represent the most rapid payoff on investment?

Key questions What are the regulatory obstacles impacting your work? What are the resource needs required to replicate your work at other institutions? What are the priority short term "translational" questions in your field that would represent the most rapid payoff on investment?

Regulatory obstacles IRB approval De-identified vs truly anonymous Open question: sharing of genetic data

Key questions What are the regulatory obstacles impacting your work? What are the resource needs required to replicate your work at other institutions? What are the priority short term "translational" questions in your field that would represent the most rapid payoff on investment?

Resources required Building a research DataMart –clinical EMR ≠ research EMR –multiple FTE’s to build/maintain NLP expertise –open-source software available –iterative process for fine-tuning Clinical expertise –understand nature of clinical data

Resources required (cont.) Statistical expertise –simple algorithm is not sufficient –prepare for the unexpected! –true for narrative and codified Biospecimen collection, DNA extraction –varies by institution –Crimson –Broad Institute

Key questions What are the regulatory obstacles impacting your work? What are the resource needs required to replicate your work at other institutions? What are the priority short term "translational" questions in your field that would represent the most rapid payoff on investment?

4 million patients 31,171 patients ICD9 RA and/or CCP checked (goal = high sensitivity) 3,585 RA patients Classification algorithm (goal = high PPV) Clinical subsets Discarded blood for DNA

Clinical features of patients. Characteristics: i2b2 RA | CORRONA. Total number: 3,585 | 7,971. Mean age (SD): 57.5 (17.5) | 58.9 (13.4). Anti-CCP (%): 63 | N/A. Rows without values in the transcript: Female (%), RF (%), Erosions (%), MTX (%), Anti-TNF (%). CCP has an OR = 1.5 for predicting erosions

Subset patients in clinically meaningful ways: causes of mortality NLP+codified data, together with statistical modeling, to define cardiovascular disease

Non-responder to anti-TNF therapy NLP+codified data, together with statistical modeling, to define treatment response

Responder to anti-TNF therapy NLP+codified data, together with statistical modeling, to define treatment response

Post-marketing surveillance of adverse events NLP+codified data, together with statistical modeling, to define treatment response pharmacovigilance

Conclusions

Options for clinical + DNA. Columns: design | clinical data | DNA | sample size | cost. Rows: clinical trial: +++, +, $$$; registry: $$; claims data: +, n/a, +++, $; EMR: $. Conclusion: NLP + codified data, together with appropriate statistical modeling, can yield accurate clinical data.

Options for clinical + DNA. Columns: design | clinical data | DNA | sample size | cost. Rows: clinical trial: +++, +, $$$; registry: $$; claims data: +, n/a, +++, $; EMR: $. Conclusion: We can collect DNA and plasma in a high-throughput manner.

Options for clinical + DNA. Columns: design | clinical data | DNA | sample size | cost. Rows: clinical trial: +++, +, $$$; registry: $$; claims data: +, n/a, +++, $; EMR: $. Conclusion: The cost is reasonable...even for >20,000 RA patients!

genotype phenotype clinical care

Acknowledgments: Zak Kohane, Susanne Churchill, Vivian Gainer, Kat Liao, Tianxi Cai, Shawn Murphy, Qing Zeng, Soumya Raychaudhuri, Beth Karlson, Pete Szolovits, Lee-Jen Wei, Lynn Bry (Crimson), Sergey Goryachev, Barbara Mawn & many others! Namaste!

Narrative data (NLP text extractions) Codified data (ICD9 codes, etc)

Run specific queries

Visualize results in a timeline

Identifying RA patients in our i2b2 RA DataMart. Signs and symptoms; diseases that mimic RA; medications specific to RA; notes (including whether seen by a rheumatologist); diagnostic codes for RA. Shawn Murphy, Vivian Gainer, others

signs and symptoms c/w RA RA without other diseases Specific RA meds, including MTX Seen by rheumatology Many diagnostic codes for RA Identifying RA patients in our i2b2 RA DataMart

Probability of RA: all 31K subjects [histogram; x-axis: probability of RA; y-axis: frequency; groups: not RA, RA (n=3,585)]

ROC curves for algorithms [plot; x-axis: 1 - specificity; y-axis: sensitivity; curves: codified + NLP, NLP only, codified only; marked operating point: 97% specificity]
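Reading an ROC curve at a fixed specificity amounts to choosing the probability cutoff that achieves that specificity in the non-cases and then reporting the sensitivity among cases at that cutoff. The sketch below shows one way to do this; the scores are invented, the function names are hypothetical, and the toy target is 90% (the slide marks 97%) because the toy set has only ten non-cases.

```python
def specificity(scores_non_cases, threshold):
    """Fraction of non-cases scored below the threshold (correctly called not RA)."""
    return sum(s < threshold for s in scores_non_cases) / len(scores_non_cases)


def sensitivity(scores_cases, threshold):
    """Fraction of true cases scored at or above the threshold (correctly called RA)."""
    return sum(s >= threshold for s in scores_cases) / len(scores_cases)


def threshold_at_specificity(scores_cases, scores_non_cases, target):
    """Lowest observed score usable as a cutoff while meeting the target specificity."""
    candidates = sorted(set(scores_cases) | set(scores_non_cases)) + [float("inf")]
    for t in candidates:
        if specificity(scores_non_cases, t) >= target:
            return t


# Toy model probabilities, invented for illustration.
non_cases = [0.02, 0.05, 0.08, 0.10, 0.15, 0.20, 0.22, 0.30, 0.35, 0.60]
cases = [0.25, 0.40, 0.55, 0.70, 0.80, 0.90, 0.95]

cutoff = threshold_at_specificity(cases, non_cases, target=0.9)
print(cutoff, specificity(non_cases, cutoff), sensitivity(cases, cutoff))
```

Fixing specificity first makes the three curves comparable: each algorithm is judged by the sensitivity it reaches at the same false-positive rate.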

Other algorithms to classify RA NLP Only Codified only Portability!

Classification of RA cases (and not RA) [figure; axis: probability of RA; categories: Not RA, possible, Yes RA; threshold 0.29 ???]

Diagnosis = Ankylosing Spondylitis (but many RA codes) A few signs and symptoms c/w RA NLP with few mentions of RA Specific meds Visits to BWH/MGH diagnostic codes for RA Probability RA = 0.78

Diagnosis = JRA (but many RA codes) signs and symptoms c/w RA NLP with “RA” and “JRA” Specific meds Visits to the RA Center at BWH Many diagnostic codes for RA

Probability RA = 0.33 Diagnosis not clear initially… signs and symptoms c/w RA NLP without much “RA”, few specific meds (MTX x 1) …and few diagnostic codes for RA, despite multiple LMR notes, including visits to the BWH Arthritis Center

Now the false negatives…

Diagnosed in 1992, little follow-up For some reason few RA diagnostic codes Probability RA = 0.11

Medications: codified data vs. NLP. Enbrel (etanercept): codified: 1,628; NLP: 3,796; overlap: 1,612 (99%). Note: review of 50 NLP occurrences shows that 38 out of 50 were actively on Enbrel
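The Enbrel counts can be unpacked with simple set arithmetic. The sketch below derives the share of codified patients also found by NLP, the number found by NLP alone, the combined total, and the fraction confirmed in the chart review; variable names are illustrative, and no extrapolation beyond the slide's numbers is attempted.

```python
# Counts as given on the slide.
codified = 1628   # patients with a codified Enbrel record
nlp = 3796        # patients with an NLP mention of Enbrel
overlap = 1612    # patients found by both

reviewed, reviewed_active = 50, 38  # chart review of NLP occurrences

codified_also_in_nlp = overlap / codified   # share of codified patients that NLP also finds
nlp_only = nlp - overlap                    # patients found only through narrative text
either_source = codified + nlp - overlap    # patients found by at least one source
review_active_fraction = reviewed_active / reviewed

print(f"Codified patients also found by NLP: {codified_also_in_nlp:.1%}")
print(f"Found by NLP only: {nlp_only}")
print(f"Found by either source: {either_source}")
print(f"Reviewed NLP occurrences actively on Enbrel: {review_active_fraction:.0%}")
```

NLP recovers nearly all codified patients and adds many more mentions, but the review suggests a mention is not always active use, so the extra mentions need validation before being counted as treated patients.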