Medication Information Extraction

1 Medication Information Extraction
General review of the Third i2b2 Workshop on Natural Language Processing Challenges for Clinical Records
Dongfang Xu, School of Information

2 Outline
i2b2 Workshop Introduction
Medication information task
Overview of Medication Challenge
Data and Materials
Systems
Evaluation and Analysis Methods
Results and Discussion
Conclusion

3 i2b2 Workshop
A workshop to enhance NLP tools for acquiring fine-grained information from clinical records.
Releases datasets regularly.
Calls for participants.
Challenges: De-identification Challenge, Smoking Challenge, Obesity Challenge, Medication Challenge, Relations Challenge, Heart Disease Risks Challenge.
2016 challenge: de-identification over ~1000 psychiatric evaluation records; RDoC classification (determine symptom severity in a domain for a patient); non-specific tasks related to mental health.
See:

4 Medication Task
Extract the following information (each item is called a field) on medications experienced by the patient from the discharge summary:
Medications (m): names, brand names, generics, and collective names of prescription substances, over-the-counter medications, and other biological substances.
Dosages (do): indicating the amount of a medication.
Modes (mo): indicating the route for administering the medication.
Frequencies (f): indicating how often each dose of the medication should be taken.
Durations (du): indicating how long the medication is to be administered.
Reasons (r): stating the medical reason for which the medication is given.
List/narrative (ln): indicating whether the medication information appears in a list structure or in narrative running text in the discharge summary.

5 Medication Task
The text corresponding to each field was specified by its line and token offsets in the discharge summary so that repeated mentions of a medication could be distinguished from each other.
The values for the set of fields related to a medication mention, if present within a two-line window of the mention, were linked to create what was defined as an ‘entry’.
If the value of a field for a mention was not specified within a two-line window, then the value ‘nm’ for ‘not mentioned’ was entered and the offsets were left unspecified.
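A minimal sketch of how such an entry could be represented in code. The class and function names (`Offset`, `FieldValue`, `Entry`, `link`) are hypothetical and not part of the challenge materials, and reading the "two-line window" as "at most two lines away from the mention" is an assumption.

```python
from dataclasses import dataclass, field
from typing import Optional

NOT_MENTIONED = "nm"  # value entered when a field is not found near the mention


@dataclass(frozen=True)
class Offset:
    line: int   # line number in the discharge summary
    token: int  # token position on that line


@dataclass
class FieldValue:
    text: str = NOT_MENTIONED
    start: Optional[Offset] = None  # offsets stay unspecified for 'nm'
    end: Optional[Offset] = None


def link(mention: FieldValue, candidate: FieldValue, window: int = 2) -> FieldValue:
    """Keep the candidate only if it starts within `window` lines of the mention.

    Assumption: the window is measured as an absolute line distance.
    """
    if mention.start is None or candidate.start is None:
        return FieldValue()
    if abs(candidate.start.line - mention.start.line) <= window:
        return candidate
    return FieldValue()  # falls back to 'nm' with no offsets


@dataclass
class Entry:
    medication: FieldValue
    dosage: FieldValue = field(default_factory=FieldValue)
    mode: FieldValue = field(default_factory=FieldValue)
    frequency: FieldValue = field(default_factory=FieldValue)
    duration: FieldValue = field(default_factory=FieldValue)
    reason: FieldValue = field(default_factory=FieldValue)
    list_or_narrative: str = NOT_MENTIONED  # "list" or "narrative"


# Hypothetical example values, not taken from the challenge data.
med = FieldValue("lasix", Offset(14, 3), Offset(14, 3))
near = FieldValue("40 mg", Offset(15, 0), Offset(15, 1))
far = FieldValue("daily", Offset(20, 0), Offset(20, 0))
entry = Entry(medication=med, dosage=link(med, near), frequency=link(med, far))
```

In this sketch the dosage one line below the mention is linked, while the frequency six lines away is replaced by 'nm' with no offsets.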

7 Data & Materials
1243 discharge summaries

                          Training Data   Test Data
Annotated by expert       17              -----
Annotated by community                    251 (based on system outputs)
Without annotation
Total                     696             547

20 teams participated in this medication challenge.

9 Systems
These 20 teams were classified along three dimensions:
External resources (marked as “Yes” or “No”): systems that used proprietary systems, data, and resources that were not available to other teams. Four teams declared that they had used external resources.
Medical expert involvement (marked as “Yes” or “No”): five teams declared that they had benefited from medical experts.
Methods (marked as “rule-based”, “supervised”, or “hybrid”): ten systems were described by their authors as rule-based, four as supervised, and six as hybrids.

11 Methods
Two sets of evaluation metrics: horizontal metrics and vertical metrics.
Precision, recall, and F1 score were computed at the phrase level and the token level.
Phrase level: the complete text of a field value.
Token level: individual tokens, delimited by spaces and punctuation.
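The difference between phrase-level and token-level scoring can be sketched as follows. This is an illustration under simplifying assumptions: field values are compared as plain strings and offsets are ignored, whereas the challenge identified text by line and token offsets. The function names and the example values are hypothetical.

```python
import re
from collections import Counter
from typing import List, Tuple


def prf(n_correct: int, n_system: int, n_gold: int) -> Tuple[float, float, float]:
    """Precision, recall, and F1 from raw counts."""
    p = n_correct / n_system if n_system else 0.0
    r = n_correct / n_gold if n_gold else 0.0
    f = 2 * p * r / (p + r) if (p + r) else 0.0
    return p, r, f


def overlap(a: List[str], b: List[str]) -> int:
    """Size of the multiset intersection of two lists."""
    return sum((Counter(a) & Counter(b)).values())


def tokenize(text: str) -> List[str]:
    """Split on whitespace and punctuation (runs of word characters are tokens)."""
    return re.findall(r"\w+", text.lower())


def phrase_level(system: List[str], gold: List[str]) -> Tuple[float, float, float]:
    # A field value counts only if its complete text matches.
    return prf(overlap(system, gold), len(system), len(gold))


def token_level(system: List[str], gold: List[str]) -> Tuple[float, float, float]:
    # Partial matches earn credit token by token.
    sys_tokens = [t for value in system for t in tokenize(value)]
    gold_tokens = [t for value in gold for t in tokenize(value)]
    return prf(overlap(sys_tokens, gold_tokens), len(sys_tokens), len(gold_tokens))


# Hypothetical field values: the system truncates the frequency phrase.
gold_values = ["40 mg", "twice a day"]
system_values = ["40 mg", "twice a"]
phrase_scores = phrase_level(system_values, gold_values)
token_scores = token_level(system_values, gold_values)
```

Here the truncated phrase "twice a" is a complete miss at the phrase level but contributes two correct tokens at the token level, so the token-level scores are higher.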

12 Methods
To run the significance test on each pair of system outputs, approximate randomization was used:
1. Compute the difference f between the horizontal phrase-level F-measures of two system outputs A and B.
2. Let j be the number of entries in A and k the number of entries in B, and let C be the combined output of A and B.
3. For n = 1000 iterations: randomly select j entries from C without replacement as the new A*, and let the remaining k entries be B*. Recalculate the horizontal phrase-level F-measures for A* and B* and take their difference f*. Count the iterations in which f* is at least as large as f (that is, f - f* <= 0); call this count r.
4. Compute the p value: p = r/n.
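The shuffling procedure can be sketched in a few lines. This is not the organizers' implementation: the `score` callback stands in for the horizontal phrase-level F-measure, the toy `fraction_correct` scorer and the example data are hypothetical, and comparing absolute differences (a two-sided variant) is an assumption.

```python
import random
from typing import Callable, Sequence


def approximate_randomization(
    a: Sequence,
    b: Sequence,
    score: Callable[[Sequence], float],
    n: int = 1000,
    seed: int = 0,
) -> float:
    """Estimate a p value for the score difference between outputs a and b."""
    rng = random.Random(seed)
    observed = abs(score(a) - score(b))  # difference on the real outputs
    pooled = list(a) + list(b)           # combined output C
    j = len(a)
    hits = 0
    for _ in range(n):
        rng.shuffle(pooled)              # random split without replacement
        a_star, b_star = pooled[:j], pooled[j:]
        shuffled = abs(score(a_star) - score(b_star))
        if shuffled >= observed:         # shuffle is at least as extreme
            hits += 1
    return hits / n


def fraction_correct(entries: Sequence[int]) -> float:
    """Toy scorer: each entry is 1 if it matched the gold standard, else 0."""
    return sum(entries) / len(entries)


# Two identical outputs: every shuffle is at least as extreme, so p = 1.0.
p_same = approximate_randomization([1] * 20, [1] * 20, fraction_correct)
# Maximally different outputs: a shuffle almost never reproduces the gap.
p_diff = approximate_randomization([1] * 20, [0] * 20, fraction_correct)
```

A small p value suggests the observed gap is unlikely to arise from a random reassignment of entries between the two systems. A real scorer would need the gold standard to compute recall, which the toy scorer leaves out.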

14 Systems Introduction
1. The teams applied text filtering to eliminate content that was not related to the patient's medications.
2. They built vocabularies from publicly available knowledge sources, enriched these vocabularies with examples from the training data and the annotation guidelines, and bootstrapped examples from unlabeled i2b2 discharge summaries as well as the web.

15 The top 10 teams with best performing submissions
Rank  Group         (External resources, medical experts)  Methods     Notes
1     Usyd          N, Y                                   Hybrid      Combined CRFs with SVMs and rules.
2     Vanderbilt    Y, Y                                   Rule-based  MedEx system for tagging, Context free gra
3     Manchester    N, N
4     NLM                                                              MetaMap for marking reasons
5     BME-Humboldt                                                     GNU software for RE, Unstructured Information Management Architecture (UIMA) as their base
6     OpenU                                                            Genia Tagger for POS tagging
7     Uparis                                                           Ogmios platform for linguistic processing
8     LIMSI
9     UofUtah                                                          Compiled a knowledge base, OpenNLP, MetaMap, UMLS
10    UWisconsin                                                       CRFs and rule-based for Medi, AdaBoost for pairing

16 The top 10 teams with best performing submissions
List/narrative: indicating whether the medication information appears in a list structure or in narrative running text in the discharge summary.

17 The top 10 teams with best performing submissions
University of Wisconsin-Milwaukee’s system is statistically indiscernible from all but two systems, including one of the top three. See red box. In terms of the phrase-level horizontal F-measures, the only systems to perform significantly differently from all systems that scored below them came from the University of Sydney and the University of Manchester. See green and blue box.

18 Vertical metrics: evaluation on fields
Expert-annotated discharge summaries and community-annotated discharge summaries evaluated against the final community ground truth (gold standard) using macro-averaged F-measure. From the community annotation experiment paper.

19 Vertical metrics: evaluation on fields
Expert-annotated discharge summaries evaluated against the final community ground truth (gold standard) using micro-averaged F-measure. From the community annotation experiment paper.
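The two averaging schemes named on these slides differ in where the averaging happens. The sketch below assumes counts of true positives, false positives, and false negatives are available per group (for example per discharge summary); the grouping unit and the example counts are assumptions, not figures from the challenge.

```python
from typing import List, Tuple

Counts = Tuple[int, int, int]  # (true positives, false positives, false negatives)


def f_measure(tp: int, fp: int, fn: int) -> float:
    """F1 from raw counts; returns 0.0 when undefined."""
    p = tp / (tp + fp) if (tp + fp) else 0.0
    r = tp / (tp + fn) if (tp + fn) else 0.0
    return 2 * p * r / (p + r) if (p + r) else 0.0


def macro_f(groups: List[Counts]) -> float:
    """Score each group separately, then average the scores."""
    return sum(f_measure(*g) for g in groups) / len(groups)


def micro_f(groups: List[Counts]) -> float:
    """Pool the counts over all groups, then score once."""
    tp = sum(g[0] for g in groups)
    fp = sum(g[1] for g in groups)
    fn = sum(g[2] for g in groups)
    return f_measure(tp, fp, fn)


# Hypothetical counts: a small group scored poorly, a large group scored well.
example = [(1, 0, 1), (9, 1, 0)]
macro = macro_f(example)
micro = micro_f(example)
```

In this example the macro average is pulled down by the small group, while the micro average is dominated by the large one, which is why the two figures are reported separately.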

21 Conclusion The Third i2b2 Workshop on Natural Language Processing Challenges for Clinical Records attracted 20 international teams and tackled a complex set of information extraction problems. The state-of-the-art NLP systems perform well in extracting medication names, dosages, modes, and frequencies. Detecting duration and the reason for medication events remains a challenge.

22 Reference
Uzuner, Ö., Solti, I., & Cadag, E. (2010). Extracting medication information from clinical text. Journal of the American Medical Informatics Association, 17(5).
Uzuner, Ö., Solti, I., Xia, F., & Cadag, E. (2010). Community annotation experiment for ground truth generation for the i2b2 medication challenge. Journal of the American Medical Informatics Association, 17(5).

23 Thank you!

