Elizabeth Karlson, MD Associate Professor of Medicine


1 i2b2 Rheumatoid Arthritis DBP: Defining RA in the electronic health record for future studies
Elizabeth Karlson, MD
Associate Professor of Medicine
Harvard Medical School
Brigham and Women’s Hospital

2 Background: Partners Resources
i2b2: “Informatics for Integrating Biology and the Bedside”
RPDR: “Research Patient Data Repository”
Natural Language Processing (HiTEX)
Gold standard dataset:
  Training set: 500 manual chart reviews
  Validation set: 400 manual chart reviews

3 Coded data
ICD-9 codes for RA
ICD-9 codes for related phenotypes
  Lupus (SLE), psoriatic arthritis (PsA), juvenile inflammatory arthritis (JIA)
Lab results for RA-related antibodies
  Rheumatoid factor (RF), anti-CCP
Medications
  Physician entry, e-scripts

4 Different colors depending on yes/no/mentioned
Click a tick mark to go to the corresponding note or mention in the record

5 NLP Concepts
NLP queries
  Rheumatoid arthritis
  RA-related antibodies
    Anti-CCP/RF/seropositive
    Result coded as positive/negative
  RA medications
    Coded as any mention
  Radiographs: RA erosions
    Coded as any erosion

6 Approach to develop RA cohort
Classification algorithm
  Step 1: Develop gold standard training set
  Step 2: Identify variables important for predicting RA
  Step 3: Develop algorithm

7 Chart review results
RA Mart, N=32,000
  ICD9 = 714.xxx OR CCP test ordered
Manual chart review for 500 patients
  20% validation rate
  Definite RA = 100
  Possible/no RA = 400
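The screening rule for the RA Mart (any ICD-9 code in the 714 family, or a CCP test ordered) could be sketched roughly as follows. The `Patient` record and its field names are hypothetical and are not taken from the RPDR schema; the example patients are made up.

```python
from dataclasses import dataclass, field


@dataclass
class Patient:
    # Hypothetical record layout, for illustration only.
    patient_id: str
    icd9_codes: list = field(default_factory=list)
    ccp_test_ordered: bool = False


def has_ra_code(codes):
    """True if any code is 714 or a 714.xxx subcode."""
    return any(c == "714" or c.startswith("714.") for c in codes)


def in_ra_mart(patient):
    """Screening rule from the slide: ICD9 = 714.xxx OR CCP test ordered."""
    return has_ra_code(patient.icd9_codes) or patient.ccp_test_ordered


patients = [
    Patient("p1", ["714.0"]),
    Patient("p2", ["710.0"], ccp_test_ordered=True),
    Patient("p3", ["7140.1", "401.9"]),  # not a 714.xxx code, no CCP test
]
mart_ids = [p.patient_id for p in patients if in_ra_mart(p)]
```

The rule is deliberately broad: it only defines the pool from which the 500 charts were drawn for manual review.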

8 Comparison of NLP to manual chart review
Precision of NLP queries
  Methotrexate: 100%
  Etanercept: 100%
  CCP: (value not captured in the transcript)
  Seropositive: 96%
  Erosion: 88%
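As a rough illustration of how the precision of an NLP query can be computed against manual chart review, the sketch below compares per-record NLP flags with reviewer labels. The flags and labels are invented for the example and are not the study data.

```python
def precision(nlp_flags, chart_labels):
    """Fraction of NLP-positive records that chart review confirms."""
    flagged = sum(1 for n in nlp_flags if n)
    if flagged == 0:
        raise ValueError("NLP query flagged no records")
    confirmed = sum(1 for n, c in zip(nlp_flags, chart_labels) if n and c)
    return confirmed / flagged


# Made-up example: one entry per reviewed record.
nlp_erosion = [True, True, True, True, False, False, True, True, True, True]
chart_erosion = [True, True, True, True, True, False, True, True, True, False]

erosion_precision = precision(nlp_erosion, chart_erosion)
```

Records that the NLP query missed (flag False, chart True) do not affect precision; they would lower recall instead.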

9 Approach to develop RA cohort
Classification algorithm
Step 2: Define variables (Vivian Gainer, Sergey Goryachev, Qing Zeng-Treitler, Shawn Murphy)
Codified data
  ICD9 billing codes
  Electronic medication prescription
  CCP, RF lab results
Narrative data extracted using natural language processing (NLP), e.g. from physician notes and radiology reports
  Erosions
  RF positive, CCP positive, seropositive
  RA medications

10 Approach to develop RA cohort
Classification algorithm
Step 3: Develop algorithm (Tianxi Cai)
  Penalized logistic regression with adaptive LASSO
  Parsimonious predictors selected based on BIC
We then applied the classification algorithm to the RA Mart
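The slide does not spell out the estimator, so the following is only the usual textbook form of an adaptive-LASSO logistic regression with BIC tuning, not necessarily the exact variant used here. The symbols are assumptions introduced for illustration: $\ell(\beta)$ is the logistic log-likelihood over $n$ training charts, $\tilde\beta$ is an initial (for example unpenalized or ridge) estimate, $\gamma > 0$ is a fixed exponent, and $\lambda$ is the tuning parameter.

```latex
\hat\beta_\lambda
  = \arg\min_{\beta}\;
    \Bigl\{ -\ell(\beta)
      + \lambda \sum_{j=1}^{p} \frac{|\beta_j|}{|\tilde\beta_j|^{\gamma}} \Bigr\},
\qquad
\mathrm{BIC}(\lambda)
  = -2\,\ell(\hat\beta_\lambda)
    + \log(n)\,\bigl|\{\, j : \hat\beta_{\lambda,j} \neq 0 \,\}\bigr| .
```

Choosing the $\lambda$ that minimizes $\mathrm{BIC}(\lambda)$ favors models with few nonzero coefficients, which is one way to read "parsimonious predictors selected based on BIC".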

11 Algorithm performance: PPV and sensitivity
Model                            RA (n)   PPV (%)   Sensitivity (%)   Difference in PPV
Algorithms
  Narrative + Codified           3585     94        63                reference
  Codified only                  3046     88        51                6
  NLP only                       3341     89        56                5
Published administrative codified criteria
  ≥3 ICD9 RA                     7960     80, 38 (one value missing from the transcript; column assignment uncertain)
  ≥1 ICD9 RA + med               7799     45        66                49
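The "Difference in PPV" column is the PPV of the reference model (narrative + codified) minus the PPV of each comparison model. A minimal sketch of that arithmetic, together with the standard definitions of PPV and sensitivity, is below. The PPV percentages are the ones reported on the slide; the confusion counts passed to `ppv` and `sensitivity` at the end are hypothetical.

```python
def ppv(true_pos, false_pos):
    """Positive predictive value: confirmed cases among those the model labels RA."""
    return true_pos / (true_pos + false_pos)


def sensitivity(true_pos, false_neg):
    """Share of true RA cases that the model labels RA."""
    return true_pos / (true_pos + false_neg)


# PPV (%) as reported on the slide.
reported_ppv = {
    "Narrative + Codified": 94,
    "Codified only": 88,
    "NLP only": 89,
    ">=1 ICD9 RA + med": 45,
}
reference = reported_ppv["Narrative + Codified"]
ppv_difference = {
    model: reference - value
    for model, value in reported_ppv.items()
    if model != "Narrative + Codified"
}

# Hypothetical counts, only to show the two definitions in use.
example_ppv = ppv(true_pos=47, false_pos=3)
example_sensitivity = sensitivity(true_pos=47, false_neg=28)
```

The computed differences (6, 5 and 49) agree with the last column of the slide for the rows where a PPV is available.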

12 Top 5 predictive variables for RA
Variable                     Standardized regression coefficient   Standard error
NLP rheumatoid arthritis     1.11                                  0.48
NLP seropositive             0.74                                  0.26
ICD9 RA normalized           0.71                                  0.23
ICD9 RA                      0.66                                  0.44
NLP erosions                 0.46                                  0.29
An interesting aspect of our algorithm is that 3 out of the 5 most predictive variables for RA were derived using NLP.
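To show how coefficients like these enter a logistic model, the sketch below sums coefficient times feature value and passes the result through the logistic function. Only the five coefficients come from the slide. The intercept, the feature names, the feature values and the omission of every other predictor in the full model are assumptions, so the output is not a real RA probability.

```python
import math

# Standardized coefficients of the top 5 variables, as reported on the slide.
TOP5_COEFFICIENTS = {
    "nlp_rheumatoid_arthritis": 1.11,
    "nlp_seropositive": 0.74,
    "icd9_ra_normalized": 0.71,
    "icd9_ra": 0.66,
    "nlp_erosions": 0.46,
}


def linear_predictor(features, intercept=0.0):
    """Intercept plus coefficient * standardized feature value (missing = 0)."""
    return intercept + sum(
        coef * features.get(name, 0.0) for name, coef in TOP5_COEFFICIENTS.items()
    )


def probability(features, intercept=0.0):
    """Logistic transform of the linear predictor."""
    return 1.0 / (1.0 + math.exp(-linear_predictor(features, intercept)))


# Hypothetical patient: one standard deviation above the mean on two NLP variables.
example = {"nlp_rheumatoid_arthritis": 1.0, "nlp_seropositive": 1.0}
score = linear_predictor(example)
```

Because the coefficients are standardized, each one describes the change in log-odds per standard deviation of its variable, which is what makes them comparable across variables.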

13 i2b2 RA cohort
Characteristics      i2b2 RA, n=3,585   CORRONA*, n=7,971
Age, mean (SD)       57.5 (17.5)        58.9 (13.4)
Women (%)            79.9               74.5
Anti-CCP+ (%)        63                 N/A
RF+ (%)              74.4               72.1
Erosions (%)         59.2               52.8
MTX use (%)          59.5 (only one value captured in the transcript)
TNFi use (%)         32.6               22.6
*Consortium of Rheumatology Researchers of North America
Although we were happy with the performance of our classification algorithm, we also wanted to know that these RA subjects clinically had RA.
CORRONA: a more traditional cohort created by patient recruitment
Liao, et al., Arthritis Care & Research 2010

14 i2b2 Virtual RA Cohort Studies
Case-control cohort
  ~4,000 RA cases
  ~13,000 matched non-RA controls
    Age, gender, race and health care utilization
Samples collected from 1500 cases/1500 controls for genotyping
Genetic risk score predicts RA with same magnitude as in GWAS (Kurreeman, 2010)
CAD outcomes in RA cases being validated in i2b2
Pharmacogenetics Research Network (PGRN)
I’m going to start with what we have.

15 i2b2 RA Project
Selected codified data from RPDR
Performed NLP queries for RA features
Developed algorithm based on coded + NLP data
Liao, 2010

16 PGRN Methods
Select codified data from RPDR (meds)
Perform NLP queries for RA disease activity features
Develop algorithm(s) based on meds + NLP data


18 PGRN Specific Aims
Aim 1: Define RA disease activity level in the EMR
Aim 2: Develop an algorithm to predict RA disease activity from EMR data
Aim 3: Define temporal relations between RA medications and disease activity to define treatment response in RA

19 Background
In RA, the disease activity score (DAS28) is considered the gold standard tool to evaluate disease activity and response to treatment in clinical practice
DAS28 has 2 components:
  Disease activity level
  Change in disease activity level

20 Disease activity level scored as low, moderate, high
Disease activity change scored as low, moderate, high Van Gestel AM et al. Arthritis Rheum 1996; 39: 34-40
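A minimal sketch of how DAS28 values might be binned into these categories is below. The cut-offs are not given on the slide: the level thresholds (remission below 2.6, low up to 3.2, moderate up to 5.1, high above 5.1) and the change thresholds (0.6 and 1.2) are the commonly cited DAS28 and EULAR values and are included here as an assumption. The function names are hypothetical.

```python
def das28_level(score):
    """Bin a DAS28 score using commonly cited cut-offs (assumed, not from the slide)."""
    if score < 2.6:
        return "remission"
    if score <= 3.2:
        return "low"
    if score <= 5.1:
        return "moderate"
    return "high"


def das28_change(baseline, follow_up):
    """Bin the improvement in DAS28 as low/moderate/high (thresholds assumed)."""
    improvement = baseline - follow_up
    if improvement > 1.2:
        return "high"
    if improvement > 0.6:
        return "moderate"
    return "low"


levels = [das28_level(s) for s in (2.5, 3.2, 4.0, 5.2)]
change = das28_change(baseline=5.6, follow_up=4.1)
```

Treatment response is usually judged from the two outputs together: the level reached at follow-up and the size of the change from baseline.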

21 Research Methods
Construct a virtual cohort of RA patients (N=5906)
Review charts for disease activity (document level)
  Remission
  Low
  Moderate
  High
  Indeterminate
  Remission/Low vs. High/Moderate
Annotate charts for disease activity features (Knowtator)
  Disease_disorder
  Symptoms (reported pain, stiffness, swelling)
  Signs (objective tenderness, limited range of motion, synovitis)
  Anatomic site (relations with signs and symptoms)
  RA medication signature
  RA labs, level of inflammation (CRP, ESR)
  Patient functioning (activities of daily living)


23 NLP Methods
Move from keyword matching in i2b2 to ontology mapping in PGRN
Customize cTAKES for
  RA medications
  RA anatomic sites
Find relations between entities
Define new modules
  RA medication changes (start/stop)
  Reasons to stop medications
  Lab values
  Patient functioning status


25 NLP Analytic Approaches
1- Internal gold standard datasets
  N=200 BWH annotated notes
  N=200 MGH annotated notes
2- Analyses
  Study whether MD summary (1-3 sentences) predicts disease activity
  SVM: construct vectors based on features and relations to predict disease activity
  Bag of concepts to predict disease activity
3- External gold standard datasets
  DAS28 scores from standardized tool at MGH matched to clinical note
  DAS28 scores from BRASS matched to clinical note
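The bag-of-concepts idea can be sketched as counting how often each concept from a fixed vocabulary appears in a note, ignoring order. The vocabulary and the concept labels below are hypothetical stand-ins for whatever the annotation or cTAKES pipeline actually emits.

```python
from collections import Counter

# Hypothetical concept vocabulary; a real one would come from the annotation schema.
VOCABULARY = [
    "joint_pain",
    "morning_stiffness",
    "joint_swelling",
    "synovitis",
    "tenderness",
]


def bag_of_concepts(note_concepts, vocabulary=VOCABULARY):
    """Count vector over the vocabulary; concepts outside it are ignored."""
    counts = Counter(note_concepts)
    return [counts[concept] for concept in vocabulary]


# Concepts extracted from one made-up note, in order of appearance.
note = ["joint_pain", "synovitis", "joint_pain", "fatigue"]
vector = bag_of_concepts(note)
```

One such vector per annotated note, paired with the reviewer's disease activity label, is the kind of input an SVM classifier (for example `sklearn.svm.SVC` in scikit-learn) could be trained on.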

26 Future work
Define temporal relations between anti-TNF medication use (e.g. new starts) and pre- and post-start disease activity to define response to therapy
Construct disease activity timeline (patient level)
Construct medication timeline (patient level)
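One way to relate the two patient-level timelines is to look up the last disease activity assessment before a medication start and the first one after it. The sketch below assumes each assessment is a (date, level) pair; the data layout, function name and example dates are all hypothetical.

```python
from datetime import date


def activity_around_start(start, activity_timeline):
    """Return (last assessment before start, first assessment after start)."""
    ordered = sorted(activity_timeline)
    before = [entry for entry in ordered if entry[0] < start]
    after = [entry for entry in ordered if entry[0] > start]
    pre = before[-1] if before else None
    post = after[0] if after else None
    return pre, post


# Made-up patient: disease activity assessments and one anti-TNF start date.
timeline = [
    (date(2009, 1, 10), "high"),
    (date(2009, 6, 2), "moderate"),
    (date(2008, 11, 5), "high"),
    (date(2009, 9, 15), "low"),
]
anti_tnf_start = date(2009, 3, 1)
pre, post = activity_around_start(anti_tnf_start, timeline)
```

A real analysis would also need a rule for how far before or after the start an assessment may lie and still count, which this sketch leaves out.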

27 Use NLP to define temporal sequence of medication start and adverse event

28 Questions?


