Identifying RA patients from the electronic medical records at Partners HealthCare. Robert Plenge, M.D., Ph.D. VA Hospital, July 20, 2010. HARVARD MEDICAL SCHOOL.


1 Identifying RA patients from the electronic medical records at Partners HealthCare. Robert Plenge, M.D., Ph.D. VA Hospital, July 20, 2010. HARVARD MEDICAL SCHOOL

2 Genotype, phenotype, clinical care

3 Genotype, phenotype, clinical care (with a “bottleneck” marked)

4 July 2010: >30 RA risk loci. Timeline of discoveries: 1978: HLA-DR4; 1987: “shared epitope” hypothesis; 2003: PADI4; 2004: PTPN22; 2005: CTLA4; 2007 to 2008: TNFAIP3, STAT4, TRAF1-C5, IL2-IL21, CD40, CCL21, CD244, IL2RB, TNFRSF14, PRKCQ, PIP4K2C, IL2RA, AFF3; 2009: REL, BLK, TAGAP, CD28, TRAF6, PTPRC, FCGR2A, PRDM1, CD2-CD58; 2010 (Q2): IL6ST, SPRED2, 5q21, RBPJ, IRF5, CCR6, PXK. Latest GWAS in 25,000 case-control samples, with replication in 20,000 additional samples. Together, these loci explain ~35% of the genetic burden of disease.

5 Genotype, phenotype, clinical care (with a “bottleneck” marked)

6 Genetic predictors of response to anti-TNF therapy in RA: PTPRC/CD45 allele (n = 1,283 patients; P = 0.0001). Cui et al. (2010), Arthritis & Rheumatism.

7 How can we collect DNA and detailed clinical data on >20,000 RA patients?

8 What are the options for collecting clinical data and DNA for genetic studies?

9 Options for clinical + DNA design (columns: clinical data, DNA, sample size, cost):
clinical trial: +++ + $$$
registry: +++++++ $$
claims data: + n/a +++ $
EMR: +++++ $

10 Content of EMRs. Narrative data: free-form written text (information about symptoms, medical history, medications, exam, impression/plan). Codified data: structured format (age, demographics, and billing codes). EMRs are increasingly utilized!
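The split between codified and narrative data can be pictured as a record type. The following is a minimal sketch, not the Partners schema. All class and field names are hypothetical, and the ICD-9 code 714.0 (rheumatoid arthritis) is used only as an illustrative billing code.

```python
from dataclasses import dataclass, field

@dataclass
class Note:
    """Narrative data: one free-text clinical note (hypothetical structure)."""
    date: str
    text: str

@dataclass
class PatientRecord:
    """Hypothetical EMR record separating codified from narrative data."""
    patient_id: str
    age: int
    sex: str
    icd9_codes: list[str] = field(default_factory=list)   # codified: billing codes
    medications: list[str] = field(default_factory=list)  # codified: prescriptions
    notes: list[Note] = field(default_factory=list)        # narrative: free text

    def has_code(self, prefix: str) -> bool:
        """True if any billing code starts with the given prefix."""
        return any(code.startswith(prefix) for code in self.icd9_codes)

record = PatientRecord(
    patient_id="P0001", age=57, sex="F",
    icd9_codes=["714.0"], medications=["methotrexate"],
    notes=[Note("2009-03-02", "Seropositive RA, erosions on hand films.")],
)
print(record.has_code("714"))  # True
```

Codified fields can be queried directly. The narrative notes need text processing before they yield usable variables.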

11 This is not a new idea… Gabriel (1994), Arthritis and Rheumatism. Sensitivity: 89%; PPV: 57%.

12 …but EMR data are “dirty.” Gabriel (1994), Arthritis and Rheumatism. Conclusion: “The sole reliance on such databases for the diagnosis of RA can result in substantial misdiagnosis.”

13 Partners HealthCare: 4 million patients

14 Partners HealthCare: linked by EMR

15 Partners HealthCare: organized by i2b2

16 4 million patients → 31,171 patients with an ICD-9 code for RA and/or a CCP test checked (goal: high sensitivity) → classification algorithm (goal: high PPV) → 3,585 RA patients
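This funnel has two stages. A broad, high-sensitivity screen comes first, followed by a strict, high-PPV classifier. The sketch below is minimal and rests on several assumptions. The field names are hypothetical. The ICD-9 prefix "714" stands in for "RA codes". The toy score replaces the real regression model.

```python
from typing import Callable

def screen(patients: list[dict]) -> list[dict]:
    """Stage 1 (high sensitivity): keep anyone with an RA billing code
    (ICD-9 prefix 714, an assumption here) or a CCP test on file."""
    return [
        p for p in patients
        if any(code.startswith("714") for code in p["icd9"]) or p["ccp_checked"]
    ]

def classify(patients: list[dict],
             score: Callable[[dict], float],
             threshold: float) -> list[dict]:
    """Stage 2 (high PPV): keep only patients whose model score clears a
    strict threshold, trading some sensitivity for precision."""
    return [p for p in patients if score(p) >= threshold]

def toy_score(p: dict) -> float:
    """Hypothetical stand-in for the classification model: scaled NLP mentions."""
    return min(p["nlp_ra_mentions"] / 10, 1.0)

patients = [
    {"id": 1, "icd9": ["714.0"], "ccp_checked": True,  "nlp_ra_mentions": 12},
    {"id": 2, "icd9": ["714.0"], "ccp_checked": False, "nlp_ra_mentions": 0},
    {"id": 3, "icd9": ["401.9"], "ccp_checked": True,  "nlp_ra_mentions": 1},
    {"id": 4, "icd9": ["401.9"], "ccp_checked": False, "nlp_ra_mentions": 0},
]
candidates = screen(patients)
cases = classify(candidates, toy_score, threshold=0.5)
print([p["id"] for p in candidates])  # [1, 2, 3]
print([p["id"] for p in cases])       # [1]
```

Splitting the stages this way lets the cheap screen cast a wide net. The more expensive model then only has to run on the candidates.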

17 Our library of RA phenotypes. Natural language processing (NLP): disease terms (e.g., RA, lupus); medications (e.g., methotrexate); autoantibodies (e.g., CCP, RF); radiographic erosions. Codified data: ICD-9 disease codes; prescription medications; laboratory autoantibodies. Accuracy of concept, by concept/term: presence of erosion, 88%; seropositive, 96%; CCP positive, 98.7%; RF positive, 99.3%; etanercept, 100%; methotrexate, 100%. Qing Zeng, Guergana Savova.
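To show what "concept presence" means for narrative data, here is a deliberately simple sketch. It is dictionary matching plus a crude same-sentence negation check. It is not the NLP system behind the accuracy figures above. The concept patterns and negation cues are hypothetical and far smaller than a real vocabulary.

```python
import re

# Hypothetical concept dictionary (illustration only).
CONCEPTS = {
    "methotrexate": re.compile(r"\b(methotrexate|mtx)\b", re.I),
    "etanercept":   re.compile(r"\b(etanercept|enbrel)\b", re.I),
    "erosion":      re.compile(r"\berosions?\b|\berosive\b", re.I),
    "ccp_positive": re.compile(r"\b(anti-?)?ccp\s*(\+|positive)", re.I),
}
# Crude negation cues, checked only in the same sentence before the match.
NEGATION = re.compile(r"\b(no|not|denies|without|negative for)\b", re.I)

def extract_concepts(note: str) -> set[str]:
    """Return concepts mentioned affirmatively in a free-text note."""
    found = set()
    for sentence in re.split(r"(?<=[.!?])\s+", note):
        for name, pattern in CONCEPTS.items():
            m = pattern.search(sentence)
            if m and not NEGATION.search(sentence[: m.start()]):
                found.add(name)
    return found

note = ("Long-standing RA, anti-CCP positive. Continues MTX 20 mg weekly. "
        "Hand films show no erosions.")
print(sorted(extract_concepts(note)))  # ['ccp_positive', 'methotrexate']
```

Negation handling is the reason "no erosions" is not counted. Errors of this kind may help explain why "presence of erosion" scores lower than the medication terms.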

18 Our library of RA phenotypes. Natural language processing (NLP): disease terms (e.g., RA, lupus); medications (e.g., methotrexate); autoantibodies (e.g., CCP, RF); radiographic erosions. Codified data: ICD-9 disease codes; prescription medications; laboratory autoantibodies. Shawn Murphy.

19 ‘Optimal’ algorithm to classify RA: NLP + codified data. A regression model with a penalty parameter (to avoid overfitting), using codified data and NLP data as inputs. Tianxi Cai, Kat Liao.
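The slide says only "a regression model with a penalty parameter". The sketch below assumes an L1 (lasso) penalty on a logistic regression, which is one common choice, fitted by proximal gradient descent. Each column of `X` would hold one NLP or codified feature. The toy data are invented so that column 1 is pure noise.

```python
import math

def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))

def soft_threshold(w: float, t: float) -> float:
    """Proximal operator of t*|w|: shrink toward zero, exactly 0 if |w| <= t."""
    if w > t:
        return w - t
    if w < -t:
        return w + t
    return 0.0

def fit_l1_logistic(X, y, lam=0.05, lr=0.5, n_iter=2000):
    """L1-penalized logistic regression by proximal gradient descent.

    Minimizes mean log-loss + lam * sum(|beta_j|); the intercept is unpenalized.
    X: list of feature rows; y: list of 0/1 labels.
    """
    n, p = len(X), len(X[0])
    b0, beta = 0.0, [0.0] * p
    for _ in range(n_iter):
        # Residuals (predicted probability minus label) give the log-loss gradient.
        resid = [sigmoid(b0 + sum(bj * xj for bj, xj in zip(beta, row))) - yi
                 for row, yi in zip(X, y)]
        g0 = sum(resid) / n
        g = [sum(r * row[j] for r, row in zip(resid, X)) / n for j in range(p)]
        # Gradient step, then soft-thresholding applies the L1 penalty.
        b0 -= lr * g0
        beta = [soft_threshold(bj - lr * gj, lr * lam) for bj, gj in zip(beta, g)]
    return b0, beta

# Toy data: column 0 tracks the label in 6 of 8 rows; column 1 is unrelated.
X = [[1, 1], [1, -1], [1, 1], [-1, -1], [-1, 1], [-1, -1], [-1, 1], [1, -1]]
y = [1, 1, 1, 1, 0, 0, 0, 0]
b0, beta = fit_l1_logistic(X, y)
print([round(b, 3) for b in beta])  # noise coefficient is shrunk to exactly 0.0
```

Raising `lam` zeroes out more features. That is how the penalty guards against overfitting when there are many candidate NLP and codified variables.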

20 High PPV with adequate sensitivity: 392 out of 400 (98%) had definite or possible RA!
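The PPV here is 392 / 400 = 0.98, counting "definite or possible RA" as a true positive. The slide gives no interval. Attaching one is an addition. The sketch below assumes a Wilson score interval, which behaves well for proportions near 1.

```python
import math

def ppv_with_wilson_ci(true_pos: int, n_predicted_pos: int, z: float = 1.96):
    """PPV = TP / (TP + FP), plus a Wilson score interval (~95% for z = 1.96)."""
    n = n_predicted_pos
    p = true_pos / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return p, center - half, center + half

ppv, lo, hi = ppv_with_wilson_ci(392, 400)
print(f"PPV = {ppv:.2f} (95% CI {lo:.3f} to {hi:.3f})")
```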

21 This means more patients! About 25% more subjects with true RA using the complete algorithm: 3,585 subjects (3,334 with true RA), compared with 3,046 subjects (2,680 with true RA).
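Checking the arithmetic shows that the "~25% more" figure matches the true-RA counts (3,334 vs. 2,680). It does not match the total subjects. The ratios below use only the numbers on the slide.

```python
def relative_gain(new: int, old: int) -> float:
    """Fractional increase of `new` over `old`."""
    return new / old - 1

complete = {"subjects": 3585, "true_ra": 3334}     # complete algorithm
comparison = {"subjects": 3046, "true_ra": 2680}   # the other algorithm on the slide

print(f"more true RA: {relative_gain(complete['true_ra'], comparison['true_ra']):.1%}")     # 24.4%
print(f"more subjects: {relative_gain(complete['subjects'], comparison['subjects']):.1%}")  # 17.7%
for name, alg in (("complete", complete), ("comparison", comparison)):
    print(name, f"fraction with true RA = {alg['true_ra'] / alg['subjects']:.1%}")
```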

22 Clinical features of patients. Liao et al. (2010), Arthritis Care & Research.

| Characteristic | i2b2 RA | CORRONA |
|---|---|---|
| Total number | 3,585 | 7,971 |
| Mean age (SD) | 57.5 (17.5) | 58.9 (13.4) |
| Female (%) | 79.9 | 74.5 |
| Anti-CCP (%) | 63 | N/A |
| RF (%) | 74.4 | 72.1 |
| Erosions (%) | 59.2 | 59.7 |
| MTX (%) | 59.5 | 52.8 |
| Anti-TNF (%) | 32.6 | 22.6 |

CCP has an OR = 1.5 for predicting erosions.

23 4 million patients → 31,171 patients with an ICD-9 code for RA and/or a CCP test checked (goal: high sensitivity) → classification algorithm (goal: high PPV) → 3,585 RA patients → discarded blood for DNA

24 Linking the Datamart and Crimson: NLP data and codified data
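The slide does not show how phenotype data are linked to stored samples. Here is a minimal sketch, assuming both sides share a patient identifier (a hypothetical `mrn` field). It performs an inner join, keeping only patients who have both EMR-derived phenotypes and a sample. All records are invented.

```python
# Hypothetical phenotype rows (EMR datamart) and sample rows (blood collection).
phenotypes = [
    {"mrn": "A1", "ra_prob": 0.97, "ccp_pos": True},
    {"mrn": "A2", "ra_prob": 0.12, "ccp_pos": False},
    {"mrn": "A3", "ra_prob": 0.91, "ccp_pos": False},
]
samples = [
    {"mrn": "A1", "sample_id": "S-001"},
    {"mrn": "A3", "sample_id": "S-007"},
    {"mrn": "A9", "sample_id": "S-010"},
]

def link(phenotypes: list[dict], samples: list[dict]) -> list[dict]:
    """Inner join on the shared patient identifier."""
    by_mrn = {s["mrn"]: s["sample_id"] for s in samples}
    return [{**p, "sample_id": by_mrn[p["mrn"]]}
            for p in phenotypes if p["mrn"] in by_mrn]

linked = link(phenotypes, samples)
print([(r["mrn"], r["sample_id"]) for r in linked])  # [('A1', 'S-001'), ('A3', 'S-007')]
```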

25 ORs in the EMR cohort are similar to those from traditional cohorts: 1,500 multi-ethnic RA cases and 1,500 matched controls.
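A per-allele odds ratio compares risk-allele frequencies between cases and controls. Each person carries two alleles, so 1,500 people give 3,000 alleles. The sketch below uses a Woolf (log-scale) confidence interval. The counts in the usage lines are hypothetical, not study results.

```python
import math

def odds_ratio(a: int, b: int, c: int, d: int, z: float = 1.96):
    """Odds ratio from a 2x2 table with a Woolf (log-scale) confidence interval.

    a: risk alleles in cases      b: other alleles in cases
    c: risk alleles in controls   d: other alleles in controls
    """
    or_ = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lo = math.exp(math.log(or_) - z * se_log)
    hi = math.exp(math.log(or_) + z * se_log)
    return or_, lo, hi

# Hypothetical allele counts for 1,500 cases and 1,500 controls (3,000 alleles each).
or_, lo, hi = odds_ratio(600, 2400, 450, 2550)
print(f"OR = {or_:.2f} (95% CI {lo:.2f} to {hi:.2f})")
```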

26 The genetic risk score is also similar.
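The slide does not define its genetic risk score. A common form is assumed here: a weighted sum over risk loci of the risk-allele count (0, 1, or 2) times the log of that locus's odds ratio. The locus names and ORs below are hypothetical.

```python
import math

def genetic_risk_score(genotypes: dict[str, int], odds_ratios: dict[str, float]) -> float:
    """Weighted GRS: sum over loci of (risk-allele count 0/1/2) x ln(OR).

    Loci absent from `genotypes` are skipped.
    """
    return sum(
        genotypes[snp] * math.log(or_)
        for snp, or_ in odds_ratios.items()
        if snp in genotypes
    )

ors = {"locus_1": 1.8, "locus_2": 1.2, "locus_3": 1.1}  # hypothetical per-allele ORs
patient = {"locus_1": 1, "locus_2": 2, "locus_3": 0}    # risk-allele counts
print(round(genetic_risk_score(patient, ors), 3))       # 0.952
```

Using log ORs makes the score additive on the log-odds scale, so multiplicative risks at independent loci add up.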

27 4 million patients → 31,171 patients with an ICD-9 code for RA and/or a CCP test checked (goal: high sensitivity) → classification algorithm (goal: high PPV) → 3,585 RA patients → clinical subsets; discarded blood for DNA

28 Response to therapy

29 Non-responder to anti-TNF therapy. NLP + codified data, together with statistical modeling, to define treatment response.

30 Responder to anti-TNF therapy. NLP + codified data, together with statistical modeling, to define treatment response.

31 Responder to anti-TNF therapy. 5-year NIH grant as part of the PharmacoGenomics Research Network (PGRN).

32 Conclusions

33 Options for clinical + DNA design (table as on slide 9). Conclusion: NLP + codified data, together with appropriate statistical modeling, can yield accurate clinical data.

34 Options for clinical + DNA design (table as on slide 9). Conclusion: Genetic studies in our EMR cohort yield effect sizes similar to those in traditional cohorts.

35 Options for clinical + DNA design (table as on slide 9). Conclusion: It should be possible to extend this same framework to classify response vs. non-response to drugs used to treat RA.

