Medication Information Extraction



Medication Information Extraction: General review of the Third i2b2 Workshop on Natural Language Processing Challenges for Clinical Records. Dongfang Xu, School of Information

Outline I2b2 Workshop Introduction Medication information task Overview of Medication Challenge Data and Materials Systems Evaluation and Analysis Methods Results and Discussion Conclusion

I2B2 Workshop
A workshop to enhance NLP tools for acquiring fine-grained information from clinical records.
Releases datasets regularly and issues a call for participants.
Past challenges: De-identification Challenge, Smoking Challenge, Obesity Challenge, Medication Challenge, Relations Challenge, Heart Disease Risks Challenge.
2016 challenge: de-identification over ~1000 psychiatric evaluation records; RDoC classification (determine symptom severity in a domain for a patient); non-specific tasks related to mental health. See:

Medication Task
Extract the following information (each item is called a field) on medications experienced by the patient from the discharge summary:
Medications (m): names, brand names, generics, and collective names of prescription substances, over-the-counter medications, and other biological substances.
Dosages (do): indicating the amount of a medication.
Modes (mo): indicating the route for administering the medication.
Frequencies (f): indicating how often each dose of the medication should be taken.
Durations (du): indicating how long the medication is to be administered.
Reasons (r): stating the medical reason for which the medication is given.
List/narrative (ln): indicating whether the medication information appears in a list structure or in narrative running text in the discharge summary.

Medication Task
The text corresponding to each field was specified by its line and token offsets in the discharge summary, so that repeated mentions of a medication could be distinguished from each other. The values for the set of fields related to a medication mention, if present within a two-line window of the mention, were linked to create what was defined as an ‘entry’. If the value of a field for a mention was not specified within a two-line window, the value ‘nm’ (‘not mentioned’) was entered and the offsets were left unspecified.
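The linking rule above can be sketched as a small data structure. This is only an illustration under stated assumptions: the names `FieldValue`, `build_entry`, and `NOT_MENTIONED` are hypothetical, and choosing the closest candidate when several fall inside the window is an assumption the slides do not state.

```python
from dataclasses import dataclass
from typing import Optional

# Field codes from the task definition (medication "m" and "ln" are handled separately).
FIELDS = ("do", "mo", "f", "du", "r")

@dataclass(frozen=True)
class FieldValue:
    """A field value with its line and token offsets (hypothetical layout)."""
    text: str
    line: Optional[int] = None
    start_token: Optional[int] = None
    end_token: Optional[int] = None

# 'nm' = not mentioned; offsets are left unspecified (None).
NOT_MENTIONED = FieldValue("nm")

def build_entry(medication, candidates, list_or_narrative, window=2):
    """Link field values found within `window` lines of a medication mention."""
    entry = {"m": medication, "ln": list_or_narrative}
    for name in FIELDS:
        nearby = [v for v in candidates.get(name, [])
                  if abs(v.line - medication.line) <= window]
        # Assumption: if several candidates qualify, keep the closest one.
        entry[name] = (min(nearby, key=lambda v: abs(v.line - medication.line))
                       if nearby else NOT_MENTIONED)
    return entry

med = FieldValue("lasix", line=10, start_token=3, end_token=3)
cands = {
    "do": [FieldValue("40 mg", line=10, start_token=4, end_token=5)],
    "f": [FieldValue("daily", line=14, start_token=0, end_token=0)],  # outside the window
}
entry = build_entry(med, cands, "list")
```

Here the dosage is linked because it sits on the same line as the mention, while the frequency four lines away falls outside the window and is recorded as ‘nm’.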

Data & Materials
1243 discharge summaries

                          Training Data    Test Data
Annotated by expert       17               -----
Annotated by community                     251 (based on system outputs)
Without annotation
Total                     696              547

20 teams participated in this medication challenge.

Systems
The 20 teams were classified along three dimensions:
External resources (marked as “Yes” or “No”): whether the system used proprietary systems, data, or resources that were not available to other teams. Four teams were declared to have used external resources.
Medical expert involvement (marked as “Yes” or “No”): five teams were declared to have benefited from medical experts.
Methods (marked as “rule-based”, “supervised”, or “hybrid”): 10 systems were described by their authors as rule-based, four as supervised, and six as hybrids.

Methods
Two sets of evaluation metrics: horizontal metrics and vertical metrics.
Precision, recall, and F1 score are reported at the phrase level and the token level.
Phrase level: the complete text of a field value.
Token level: individual tokens, delimited by spaces and punctuation.
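The difference between phrase-level and token-level scoring can be illustrated with a minimal sketch. It is a simplification, not the official challenge scorer: it compares field values by text only and ignores the line and token offsets, and the function names (`prf`, `tokenize`, `phrase_level`, `token_level`) are hypothetical.

```python
import re
from collections import Counter

def prf(n_correct, n_predicted, n_gold):
    """Return (precision, recall, F1) from raw counts."""
    p = n_correct / n_predicted if n_predicted else 0.0
    r = n_correct / n_gold if n_gold else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f

def tokenize(text):
    """Split on spaces and punctuation (assumption: word characters form tokens)."""
    return re.findall(r"\w+", text.lower())

def phrase_level(predicted, gold):
    """A prediction counts only if the complete field value matches."""
    overlap = Counter(predicted) & Counter(gold)
    return prf(sum(overlap.values()), len(predicted), len(gold))

def token_level(predicted, gold):
    """Each matching token earns credit, so partial matches are rewarded."""
    pred_tokens = Counter(t for value in predicted for t in tokenize(value))
    gold_tokens = Counter(t for value in gold for t in tokenize(value))
    overlap = pred_tokens & gold_tokens
    return prf(sum(overlap.values()),
               sum(pred_tokens.values()),
               sum(gold_tokens.values()))

gold = ["lasix", "40 mg", "once a day"]
predicted = ["lasix", "40 mg", "a day"]  # frequency only partially extracted
print(phrase_level(predicted, gold))  # 2 of 3 phrases match exactly
print(token_level(predicted, gold))   # 5 of 6 gold tokens are recovered
```

In this toy case the truncated frequency "a day" earns no phrase-level credit but most of the token-level credit, which is why the two levels can rank systems differently.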

Methods
Approximate randomization was used to test the significance of the difference between each pair of system outputs:
1. Compute the difference f between the horizontal phrase-level F-measures of two system outputs A and B.
2. Let j be the number of entries in A and k the number of entries in B, and let C be the combined output of A and B.
3. For n = 1000 iterations: randomly select j entries from C without replacement as a new A*, and let the remaining k entries be B*. Recalculate the horizontal phrase-level F-measures for A* and B* and take their difference f*. Count the iterations in which f* is at least as large as f; call this count r.
4. The p value is p = r/n.
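The procedure can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the organizers' implementation: entries are modeled as hashable tuples compared against a gold set, the absolute difference is used (a two-sided test), and the names `f_measure` and `approximate_randomization` are hypothetical.

```python
import random

def f_measure(entries, gold):
    """F1 of predicted entries against a gold set (duplicates are collapsed)."""
    predicted = set(entries)
    if not predicted or not gold:
        return 0.0
    tp = len(predicted & gold)
    if tp == 0:
        return 0.0
    precision = tp / len(predicted)
    recall = tp / len(gold)
    return 2 * precision * recall / (precision + recall)

def approximate_randomization(a, b, gold, n=1000, seed=0):
    """Estimate p for the difference in F-measure between outputs a and b."""
    rng = random.Random(seed)
    # Assumption: absolute difference, i.e. a two-sided test.
    observed = abs(f_measure(a, gold) - f_measure(b, gold))
    combined = list(a) + list(b)
    j = len(a)
    count = 0
    for _ in range(n):
        rng.shuffle(combined)  # random split without replacement
        a_star, b_star = combined[:j], combined[j:]
        diff = abs(f_measure(a_star, gold) - f_measure(b_star, gold))
        if diff >= observed:
            count += 1
    return count / n

# Toy data: each entry is a (line, medication) tuple.
gold = {(1, "lasix"), (2, "aspirin"), (3, "insulin"), (4, "heparin")}
system_a = [(1, "lasix"), (2, "aspirin"), (3, "insulin")]
system_b = [(1, "lasix"), (9, "water")]
p_value = approximate_randomization(system_a, system_b, gold)
```

A small p value suggests that the observed gap between the two systems is unlikely to arise from a random reassignment of their entries. With only a handful of entries, as in this toy data, the estimate is not meaningful and serves only to show the mechanics.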

Systems Introduction 1. These teams applied text filtering to eliminate the content that was not related to the medications of the patient. 2. Built vocabularies from publicly available knowledge sources, enriched these vocabularies with examples from the training data and the annotation guidelines, and bootstrapped examples from unlabeled i2b2 discharge summaries as well as the web.

The top 10 teams with best performing submissions
Columns: Rank, Group, (external resources, medical experts), Methods, Notes
1. Usyd. (N, Y). Hybrid. Combined CRFs with SVMs and rules.
2. Vanderbilt. (Y, Y). Rule-based. MedEx system for tagging, Context free gra
3. Manchester. (N, N).
4. NLM. MetaMap for marking reasons.
5. BME-Humboldt. GNU software for RE, Unstructured Information Management Architecture (UIMA) as their base.
6. OpenU. Genia Tagger for POS tagging.
7. Uparis. Ogmios platform for linguistic processing.
8. LIMSI.
9. UofUtah. Compiled a knowledge base; OpenNLP, MetaMap, UMLS.
10. UWisconsin. CRFs and rule-based for Medi, AdaBoost for pairing.

The top 10 teams with best performing submissions List/narrative: indicating whether the medication information appears in a list structure or in narrative running text in the discharge summary.

The top 10 teams with best performing submissions University of Wisconsin-Milwaukee’s system is statistically indiscernible from all but two systems, including one of the top three. See red box. In terms of the phrase-level horizontal F-measures, the only systems to perform significantly differently from all systems that scored below them came from the University of Sydney and the University of Manchester. See green and blue box.

Vertical metrics: evaluation on fields. Expert-annotated discharge summaries and community-annotated discharge summaries against the final community ground truth (gold standard), using macro-averaged F-measure. From the community annotation experiment paper.

Vertical metrics: evaluation on fields. Expert-annotated discharge summaries against the final community ground truth (gold standard), using micro-averaged F-measure. From the community annotation experiment paper.

Conclusion The Third i2b2 Workshop on Natural Language Processing Challenges for Clinical Records attracted 20 international teams and tackled a complex set of information extraction problems. The state-of-the-art NLP systems perform well in extracting medication names, dosages, modes, and frequencies. Detecting duration and the reason for medication events remains a challenge.

References
Uzuner, Ö., Solti, I., & Cadag, E. (2010). Extracting medication information from clinical text. Journal of the American Medical Informatics Association, 17(5), 514-518.
Uzuner, Ö., Solti, I., Xia, F., & Cadag, E. (2010). Community annotation experiment for ground truth generation for the i2b2 medication challenge. Journal of the American Medical Informatics Association, 17(5), 519-523.

Thank you!