Published by Ashlee Knight. Modified over 9 years ago.
1
Square wheels: electronic medical records for discovery research in rheumatoid arthritis Robert M. Plenge, M.D., Ph.D. October 30, 2009 NCRR sponsored "Using EHR Data for Discovery Research" HARVARD MEDICAL SCHOOL
2
Key questions What are the regulatory obstacles impacting your work? What are the resource needs required to replicate your work at other institutions? What are the priority short term "translational" questions in your field that would represent the most rapid payoff on investment?
3
Key questions How can I implement your approach, and how much better is it?
5
genotype phenotype clinical care
6
genotype phenotype clinical care bottleneck
7
Raychaudhuri et al., in press, Nature Genetics. October 2009: >30 RA risk loci. [Timeline figure; year labels: 1978, 1987, 2003, 2004, 2005, 2007, 2008, 2009.] Labels on the timeline: PTPN22; “shared epitope” hypothesis; HLA DR4; PADI4; CTLA4; TNFAIP3; STAT4; TRAF1-C5; IL2-IL21; CD40; CCL21; CD244; IL2RB; TNFRSF14; PRKCQ; PIP4K2C; IL2RA; AFF3. Latest GWAS in 25,000 case-control samples with replication in 20,000 additional samples: >10 new loci (2009): REL; BLK; TAGAP; CD28; TRAF6; PTPRC; FCGR2A; PRDM1; CD2-CD58. Together these explain ~35% of the genetic burden of disease.
8
genotype phenotype clinical care bottleneck
9
Genetic predictors of response to anti-TNF therapy in RA: PTPRC/CD45 allele; n=1,283 patients; P=0.0001. Submitted to Arth & Rheum.
10
How can we collect DNA and detailed clinical data on >20,000 RA patients?
11
What are the options for collecting clinical data and DNA for genetic studies?
12
Options for clinical + DNA (columns: design, clinical data, DNA, sample size, cost): clinical trial +++ +$$$; registry +++++++$$; claims data +n/a+++$; EMR +++++ $
13
Content of EMRs. Narrative data = free-form written text –info about symptoms, medical history, medications, exam, impression/plan. Codified data = structured format –age, demographics, and billing codes. EMRs are increasingly utilized!
14
This is not a new idea… Gabriel (1994) Arthritis and Rheumatism. Sens: 89%; PPV: 57%
15
…but EMR data are “dirty”. Gabriel (1994) Arthritis and Rheumatism. Conclusion: The sole reliance on such databases for the diagnosis of RA can result in substantial misdiagnosis.
16
Partners HealthCare: 4 million patients
17
Partners HealthCare: linked by EMR
18
Partners HealthCare: organized by i2b2
19
4 million patients → [ICD9 RA and/or CCP checked; goal = high sensitivity] → 31,171 patients → [classification algorithm; goal = high PPV] → 3,585 RA patients → clinical subsets; discarded blood for DNA
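A minimal sketch of the first (high-sensitivity) step in the funnel above, assuming each patient record carries a list of ICD9 codes and a flag for whether an anti-CCP test was ever checked. The `Patient` fields and the use of the `714` code prefix to mark RA are illustrative assumptions, not the DataMart's actual schema or query. The last lines only restate the funnel counts from the slide as fractions, treating "4 million" as a round figure.

```python
from dataclasses import dataclass, field


@dataclass
class Patient:
    # Hypothetical record layout, for illustration only.
    patient_id: str
    icd9_codes: list = field(default_factory=list)
    ccp_checked: bool = False


def passes_screen(patient):
    """Keep anyone with an RA-like ICD9 code and/or a CCP test on file."""
    has_ra_code = any(code.startswith("714") for code in patient.icd9_codes)
    return has_ra_code or patient.ccp_checked


patients = [
    Patient("p1", icd9_codes=["714.0", "401.9"]),
    Patient("p2", icd9_codes=["250.00"]),
    Patient("p3", icd9_codes=["715.9"], ccp_checked=True),
]
screened = [p.patient_id for p in patients if passes_screen(p)]

# Funnel counts as stated on the slide.
total_patients, screened_n, classified_n = 4_000_000, 31_171, 3_585
screen_fraction = screened_n / total_patients      # under 1% of all patients
classified_fraction = classified_n / screened_n    # roughly 1 in 9 screened
```

The screen is deliberately loose: it is the second step, the classification algorithm, that is tuned for high PPV.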
20
Our library of RA phenotypes (Qing Zeng)
Natural language processing (NLP) –disease terms (e.g., RA, lupus) –medications (e.g., methotrexate) –autoantibodies (e.g., CCP, RF) –radiographic erosions
Codified data –ICD9 disease codes –prescription medications –laboratory autoantibodies
Concept/term            Accuracy of concept
presence of erosion     88%
seropositive            96%
CCP positive            98.7%
RF positive             99.3%
etanercept              100%
methotrexate            100%
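To make "NLP concept extraction" concrete, here is a toy sketch that pulls a few of the concepts listed above out of a free-text note. It is a keyword matcher with a crude negation check, written for illustration only: the patterns, the negation cues, and the sample note are all assumptions, and the NLP system behind the accuracy figures on this slide is certainly more sophisticated.

```python
import re

# Illustrative patterns only; a real concept dictionary would be far larger.
CONCEPTS = {
    "methotrexate": re.compile(r"\b(methotrexate|mtx)\b", re.IGNORECASE),
    "etanercept": re.compile(r"\b(etanercept|enbrel)\b", re.IGNORECASE),
    "erosion": re.compile(r"\berosi(on|ons|ve)\b", re.IGNORECASE),
}
NEGATION = re.compile(r"\b(no|not|without|denies|negative for)\b", re.IGNORECASE)


def extract_concepts(note):
    """Return the set of concepts asserted (not negated) in a note."""
    found = set()
    for sentence in re.split(r"[.;\n]+", note):
        for name, pattern in CONCEPTS.items():
            match = pattern.search(sentence)
            if match is None:
                continue
            # Crude negation rule: skip the mention if a negation cue
            # appears earlier in the same sentence.
            if NEGATION.search(sentence[:match.start()]):
                continue
            found.add(name)
    return found


note = ("Patient with RA on methotrexate. "
        "No erosions on hand films. "
        "Started Enbrel in 2006.")
concepts = extract_concepts(note)
```

Accuracy figures like those in the table would come from comparing such extractions against manual chart review, concept by concept.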
21
Natural language processing (NLP) –disease terms (e.g., RA, lupus) –medications (e.g., methotrexate) –autoantibodies (e.g., CCP, RF) –radiographic erosions Codified data –ICD9 disease codes –prescription medications –laboratory autoantibodies Our library of RA phenotypes Shawn Murphy
22
‘Optimal’ algorithm to classify RA: NLP + codified data. Regression model with a penalty parameter (to avoid over-fitting). Inputs: codified data; NLP data. Tianxi Cai, Kat Liao
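A minimal sketch of what "a regression model with a penalty parameter" can look like. The slide does not say which regression or which penalty was used, so this assumes a logistic regression with an L2 (ridge-style) penalty fitted by plain gradient descent. The three features (a scaled count of RA ICD9 codes, a scaled count of NLP mentions of RA, and a methotrexate flag) and all of the toy data are hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_penalized_logistic(X, y, penalty, lr=0.1, steps=2000):
    """Gradient descent on mean log-loss + (penalty / 2) * ||w||^2.

    The intercept b is left unpenalized.
    """
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(steps):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        for j in range(d):
            w[j] -= lr * (grad_w[j] / n + penalty * w[j])
        b -= lr * grad_b / n
    return w, b


def predict_proba(w, b, x):
    return sigmoid(b + sum(wj * xj for wj, xj in zip(w, x)))


# Hypothetical training rows: [codified RA codes, NLP RA mentions, on MTX]
X = [
    [0.9, 0.8, 1.0], [0.7, 0.9, 1.0], [0.8, 0.6, 0.0], [0.6, 0.7, 1.0],
    [0.2, 0.1, 0.0], [0.3, 0.0, 0.0], [0.1, 0.2, 0.0], [0.4, 0.1, 0.0],
]
y = [1, 1, 1, 1, 0, 0, 0, 0]  # 1 = RA on chart review, 0 = not RA

w, b = fit_penalized_logistic(X, y, penalty=0.1)
p_likely_ra = predict_proba(w, b, [0.9, 0.8, 1.0])
p_unlikely_ra = predict_proba(w, b, [0.1, 0.1, 0.0])
```

Raising `penalty` shrinks the weights toward zero, which is the mechanism that guards against over-fitting when many codified and NLP variables are offered to the model.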
23
High PPV with adequate sensitivity ✪ 392 out of 400 (98%) had definite or possible RA!
24
This means more patients! ~25% more subjects with the complete algorithm: 3,585 subjects (3,334 with true RA) vs. 3,046 subjects (2,680 with true RA)
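The arithmetic behind this slide can be checked directly, reading the parenthesised counts as true RA cases among the subjects each algorithm selects. The slide does not name the comparison algorithm, so it is simply called `comparison` here.

```python
def ppv(true_cases, selected):
    """Positive predictive value: share of selected subjects who truly have RA."""
    return true_cases / selected


complete = {"selected": 3585, "true_ra": 3334}
comparison = {"selected": 3046, "true_ra": 2680}

ppv_complete = ppv(complete["true_ra"], complete["selected"])        # about 0.93
ppv_comparison = ppv(comparison["true_ra"], comparison["selected"])  # about 0.88

# Relative gain in true RA subjects from using the complete algorithm.
gain_true_ra = complete["true_ra"] / comparison["true_ra"] - 1       # about 0.24
```

On these numbers the "~25% more" figure corresponds to the gain in true RA subjects (3,334 vs. 2,680), and the complete algorithm achieves it with a higher PPV.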
25
4 million patients → [ICD9 RA and/or CCP checked; goal = high sensitivity] → 31,171 patients → [classification algorithm; goal = high PPV] → 3,585 RA patients → discarded blood for DNA
26
Linking the Datamart-Crimson NLP data Codified data
27
Status of i2b2 Crimson collection. Over 3,000 samples collected to date –cost = $10 per sample. DNA extracted on >2,400 buffy coats –cost = $20 per sample –>90% had ≥1 µg of DNA –>99% had ≥5 µg of DNA after WGA. Genotyping of 384 SNPs (RA risk alleles, AIMs, other) is ongoing at Broad Institute.
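A back-of-the-envelope sketch of what the per-sample costs above imply at scale. It assumes costs scale linearly with sample count and covers only collection and DNA extraction. Genotyping, staff time, and informatics are not included because the slide gives no figures for them.

```python
COLLECTION_COST_USD = 10   # per discarded blood sample collected
EXTRACTION_COST_USD = 20   # per buffy coat extracted


def biospecimen_cost(n_samples):
    """Collection plus extraction cost, assuming every sample is extracted."""
    return n_samples * (COLLECTION_COST_USD + EXTRACTION_COST_USD)


# Lower bound on spend to date, using the "over 3,000" and ">2,400" counts.
spent_to_date = 3_000 * COLLECTION_COST_USD + 2_400 * EXTRACTION_COST_USD

# Projected cost for the 20,000-patient target raised earlier in the talk.
projected_20k = biospecimen_cost(20_000)
```

At $30 per subject, 20,000 subjects comes to $600,000 for specimens and DNA alone under these assumptions.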
28
Status of i2b2 Crimson collection. Measured autoantibodies from plasma –5 autoantibodies in ~380 RA patients –~85% are CCP+, ~35% ANA+, ~15% TPO+. Question: are non-RA autoantibodies present at increased frequency in RA patients vs matched controls? Stay tuned…more data soon!
29
Key questions How can I implement your approach, and how much better is it?
30
Key questions What are the regulatory obstacles impacting your work? What are the resource needs required to replicate your work at other institutions? What are the priority short term "translational" questions in your field that would represent the most rapid payoff on investment?
31
Key questions What are the regulatory obstacles impacting your work? What are the resource needs required to replicate your work at other institutions? What are the priority short term "translational" questions in your fields that would represent the most rapid payoff on investment?
32
Regulatory obstacles IRB approval De-identified vs truly anonymous Open question: sharing of genetic data
33
Key questions What are the regulatory obstacles impacting your work? What are the resource needs required to replicate your work at other institutions? What are the priority short term "translational" questions in your fields that would represent the most rapid payoff on investment?
34
Resources required. Building a research DataMart –clinical EMR ≠ research EMR –multiple FTEs to build/maintain. NLP expertise –open-source software available –iterative process for fine-tuning. Clinical expertise –understand nature of clinical data.
35
Resources required (cont.). Statistical expertise –simple algorithm is not sufficient –prepare for the unexpected! –true for narrative and codified. Biospecimen collection, DNA extraction –varies by institution –Crimson –Broad Institute.
36
Key questions What are the regulatory obstacles impacting your work? What are the resource needs required to replicate your work at other institutions? What are the priority short term "translational" questions in your field that would represent the most rapid payoff on investment?
37
4 million patients → [ICD9 RA and/or CCP checked; goal = high sensitivity] → 31,171 patients → [classification algorithm; goal = high PPV] → 3,585 RA patients → clinical subsets; discarded blood for DNA
38
Clinical features of patients
Characteristics     i2b2 RA         CORRONA
Total number        3,585           7,971
Mean age (SD)       57.5 (17.5)     58.9 (13.4)
Female (%)          79.9            74.5
Anti-CCP (%)        63              N/A
RF (%)              74.4            72.1
Erosions (%)        59.2            59.7
MTX (%)             59.5            52.8
Anti-TNF (%)        32.6            22.6
CCP has an OR = 1.5 for predicting erosions
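For readers less familiar with odds ratios, the sketch below shows how an OR like the one quoted for CCP and erosions is computed from a 2x2 table. The slide does not give the underlying counts, so the counts here are hypothetical and chosen only so that the result comes out at 1.5.

```python
def odds_ratio(exposed_with, exposed_without, unexposed_with, unexposed_without):
    """Odds of the outcome in the exposed group divided by odds in the unexposed."""
    return (exposed_with * unexposed_without) / (exposed_without * unexposed_with)


# Hypothetical counts, not taken from the DataMart:
# CCP-positive: 60 with erosions, 40 without
# CCP-negative: 50 with erosions, 50 without
ccp_erosion_or = odds_ratio(60, 40, 50, 50)
```

With these made-up counts the odds of erosions are 1.5 in CCP-positive patients and 1.0 in CCP-negative patients, giving an OR of 1.5.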
39
Subset patients in clinically meaningful ways: causes of mortality NLP+codified data, together with statistical modeling, to define cardiovascular disease
40
Non-responder to anti-TNF therapy NLP+codified data, together with statistical modeling, to define treatment response
41
Responder to anti-TNF therapy NLP+codified data, together with statistical modeling, to define treatment response
42
Post-marketing surveillance of adverse events NLP+codified data, together with statistical modeling, to define treatment response pharmacovigilance
43
Conclusions
44
Options for clinical + DNA (columns: design, clinical data, DNA, sample size, cost): clinical trial +++ +$$$; registry +++++++$$; claims data +n/a+++$; EMR +++++ $. Conclusion: NLP + codified data, together with appropriate statistical modeling, can yield accurate clinical data.
45
Options for clinical + DNA (columns: design, clinical data, DNA, sample size, cost): clinical trial +++ +$$$; registry +++++++$$; claims data +n/a+++$; EMR +++++ $. Conclusion: We can collect DNA and plasma in a high-throughput manner.
46
Options for clinical + DNA (columns: design, clinical data, DNA, sample size, cost): clinical trial +++ +$$$; registry +++++++$$; claims data +n/a+++$; EMR +++++ $. Conclusion: The cost is reasonable...even for >20,000 RA patients!
47
genotype phenotype clinical care
48
Acknowledgments: Zak Kohane, Susanne Churchill, Vivian Gainer, Kat Liao, Tianxi Cai, Shawn Murphy, Qing Zeng, Soumya Raychaudhuri, Beth Karlson, Pete Szolovits, Lee-Jen Wei, Lynn Bry (Crimson), Sergey Goryachev, Barbara Mawn & many others! Namaste!
50
Narrative data (NLP text extractions) Codified data (ICD9 codes, etc)
51
Run specific queries
52
Visualize results in a timeline
53
Identifying RA patients in our i2b2 RA DataMart. [Timeline figure, 1993 to 2008, with rows for: signs and symptoms; diseases that mimic RA; medications specific to RA; notes (including whether seen by a rheumatologist); diagnostic codes for RA.] Shawn Murphy, Vivian Gainer, others
54
Identifying RA patients in our i2b2 RA DataMart. [Timeline figure, 1993 to 2008: signs and symptoms c/w RA; RA without other diseases; specific RA meds, including MTX; seen by rheumatology; many diagnostic codes for RA.]
55
Probability of RA: all 31K subjects. [Figure: frequency vs. probability of RA, split into not RA and RA (n=3,585).]
56
ROC curves for algorithms. [Figure: sensitivity vs. 1 - specificity for three algorithms (codified + NLP; NLP only; codified only), with 97% specificity marked.]
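Comparing algorithms at a fixed specificity, as the ROC slide does at 97%, amounts to picking a probability cutoff from the non-cases and then reading off sensitivity among the cases. The sketch below shows that procedure on made-up scores. With only ten toy non-cases it uses a 90% target rather than 97%, and none of the numbers come from the DataMart.

```python
def specificity(non_case_scores, threshold):
    """Share of non-cases scoring below the cutoff (correctly called not RA)."""
    return sum(s < threshold for s in non_case_scores) / len(non_case_scores)


def sensitivity(case_scores, threshold):
    """Share of true cases scoring at or above the cutoff."""
    return sum(s >= threshold for s in case_scores) / len(case_scores)


def threshold_for_specificity(case_scores, non_case_scores, target):
    """Lowest observed score that reaches the target specificity, or None."""
    for t in sorted(set(case_scores) | set(non_case_scores)):
        if specificity(non_case_scores, t) >= target:
            return t
    return None  # would need a cutoff above every observed score


# Hypothetical predicted probabilities from a chart-reviewed validation set.
non_case_scores = [0.02, 0.05, 0.08, 0.10, 0.12, 0.15, 0.20, 0.22, 0.35, 0.60]
case_scores = [0.11, 0.33, 0.45, 0.70, 0.78, 0.85, 0.90, 0.95]

cutoff = threshold_for_specificity(case_scores, non_case_scores, target=0.90)
sens_at_cutoff = sensitivity(case_scores, cutoff)
```

Running the same procedure on the scores from each algorithm gives one sensitivity per algorithm at the shared specificity, which is the comparison the figure makes.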
57
Other algorithms to classify RA: NLP only; codified only. Portability!
58
Classification of RA cases (and not RA). [Figure: probability of RA (0.00 to 1.00) by chart-review category (Not RA; possible; Yes RA).] Threshold 0.29 ???
59
Diagnosis = Ankylosing Spondylitis (but many RA codes). A few signs and symptoms c/w RA; NLP with few mentions of RA; specific meds; visits to BWH/MGH; diagnostic codes for RA. Probability RA = 0.78
60
Diagnosis = JRA (but many RA codes). Signs and symptoms c/w RA; NLP with “RA” and “JRA”; specific meds; visits to the RA Center at BWH; many diagnostic codes for RA.
61
Diagnosis not clear initially… Signs and symptoms c/w RA; NLP without much “RA”; few specific meds (MTX x 1); …and few diagnostic codes for RA, despite multiple LMR notes, including visits to the BWH Arthritis Center. Probability RA = 0.33
62
Now the false negatives…
63
Diagnosed in 1992, little follow-up; for some reason, few RA diagnostic codes. Probability RA = 0.11
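The example charts can be tied back to the classification cutoff with a few lines of code. This sketch applies the 0.29 threshold from the earlier slide, which was shown there with question marks and is treated here as provisional, to the three charts that state a probability. Treating "at or above the threshold" as RA is an assumption about how the cutoff is applied.

```python
RA_THRESHOLD = 0.29  # provisional; marked "???" on the threshold slide


def classify(probability, threshold=RA_THRESHOLD):
    """Call a subject RA when the model probability reaches the threshold."""
    return "RA" if probability >= threshold else "not RA"


# Probabilities as stated on the example slides.
examples = {
    "ankylosing spondylitis, many RA codes": 0.78,
    "diagnosis not clear initially": 0.33,
    "diagnosed in 1992, little follow-up": 0.11,
}
calls = {label: classify(p) for label, p in examples.items()}
```

Under this cutoff the ankylosing spondylitis chart is called RA despite a different diagnosis, and the 1992 chart falls below the cutoff, which matches its placement among the false negatives.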
64
Medications: codified data vs. NLP. Enbrel (etanercept): codified: 1,628; NLP: 3,796; overlap: 1,612 (99%). Note: review of 50 NLP occurrences shows that 38 out of 50 were actively on Enbrel.
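The Enbrel counts can be unpacked with simple arithmetic. The sketch assumes the 99% figure is the overlap as a share of codified patients, which is the reading that matches the numbers.

```python
codified_n, nlp_n, overlap_n = 1628, 3796, 1612

overlap_share_of_codified = overlap_n / codified_n   # about 0.99
codified_only = codified_n - overlap_n               # found by codified data alone
nlp_only = nlp_n - overlap_n                         # found by NLP alone
either_source = codified_n + nlp_n - overlap_n       # found by at least one source

# Manual review of 50 NLP mentions: share actively on Enbrel.
active_share_in_review = 38 / 50
```

NLP recovers nearly every codified Enbrel patient and adds 2,184 more, but the chart review suggests that roughly a quarter of NLP mentions do not reflect active use.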