
A Hands-on Introduction to Natural Language Processing in Healthcare: Annotation as a Central Task for Development of NLP Systems
Scott Duvall, Brett South, Stéphane Meystre
MedInfo 2010 Congress, September 11, 2010

Slide 2: NLP Applied to Clinical Documents

Detailed clinical information is often locked in free-text clinical notes. Applying Natural Language Processing (NLP) methods to clinical free text allows more detailed document-level review. The extracted clinical information can be used for decision support, question answering, research, performance improvement, surveillance, and more. Annotation is a central task in the evaluation of NLP systems used for Information Retrieval (IR) or Information Extraction (IE) tasks.

Slide 3: Why and When to Annotate?

Annotation as a central task:
- Manually annotated corpora focus and clarify NLP system requirements.
- They establish reference standards to train and evaluate NLP tools applied to clinical texts for various tasks.
- They provide data for NLP system development (supervised learning):
  - Extraction rules may be created automatically or by hand.
  - Statistical models of text documents can be built by machine learning algorithms.

Slide 4: What to Annotate, and at What Level?

- Meta information: document type, annotator, author, institution, clinic.
- Document level: sections, headers, paragraphs, templates, or other document-level assessments.
- Lexical: semantic categories of words.
- Syntax: how words combine in well-defined structures to produce sentences (part of speech, syntactic parse trees, grammatical structure).
- Semantics: meaning and interpretation; individual word senses combine to form meaningful sentences.

Slide 5: What to Annotate, and at What Level? (continued)

- Pragmatics: relies on clinical inference; context affects the interpretation of meaning (domain, report section, location within the section of a report, or other implicit information).
- Discourse: links between annotated instances, or across sentences. Previous information affects the interpretation of the current information. Includes referents (pronouns, definite or bridging clauses), time of events, and coherence of sentences.
- World knowledge: facts about the world at large and/or common sense.

These levels may differ depending on the specific use case, the clinical question, and the goals of the NLP application.

Slide 6: Semantic Annotation

- Concepts ("markables"): the types of information defined at the annotated-instance level. Use case dependent:
  - Focus on noun phrases only?
  - Focus on specific semantic types (diagnoses, findings, treatments, procedures, etc.)?
- Modifiers ("attributes"): features of the information: negation, experiencer, temporality, certainty, change over time, severity, numeric values, anatomic locations, note section, information quality, etc.
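The concept/modifier distinction above can be sketched as a simple data structure. The class and field names here are illustrative only; they are not Knowtator's actual data model:

```python
from dataclasses import dataclass, field

@dataclass
class Annotation:
    """One annotated instance: a text span typed as a concept,
    plus modifier attributes qualifying it."""
    start: int      # character offset where the span begins
    end: int        # character offset where the span ends (exclusive)
    text: str       # the covered text
    concept: str    # semantic type, e.g. "Diagnosis"
    attributes: dict = field(default_factory=dict)  # e.g. {"negation": "negated"}

# A negated diagnosis mention, annotated with its span and attribute.
note = "No evidence of peripheral arterial disease."
ann = Annotation(start=15, end=42, text=note[15:42],
                 concept="Diagnosis",
                 attributes={"negation": "negated"})
```

In a real schema each concept class would constrain which attributes and values are legal; here the `attributes` dict is left open for brevity.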

Slide 7: Some Jargon

Annotation guideline: defines what qualifies as a "markable" for a given use case, how annotated instances should be identified, and which attributes are associated with annotated instances. In other words, the rules of the game: what information will be used to train and evaluate the performance of the NLP system.

Annotation schema: provides a logical representation of the annotation guideline.
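A schema in this sense can be represented as a mapping from concept classes to their allowed attributes and legal values. This is a hypothetical toy schema, not the workshop's actual one:

```python
# Toy annotation schema: each concept class lists its allowed
# attributes and the legal values for each (illustrative only).
schema = {
    "Diagnosis": {
        "negation": ["affirmed", "negated"],
        "certainty": ["certain", "probable", "possible"],
    },
}

def validate(concept, attributes):
    """Check that an annotation's attributes conform to the schema."""
    allowed = schema.get(concept)
    if allowed is None:
        return False
    return all(key in allowed and value in allowed[key]
               for key, value in attributes.items())
```

Encoding the guideline this way lets an annotation tool reject out-of-schema values at entry time instead of at adjudication.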

Slide 8: Common Annotation Tasks

[Figure: three example annotation tasks, Task 1, Task 2, and Task 3]

Slide 9: What Is Measured at the Task Level?

- Estimate of reliability (task consistency): inter-annotator agreement, IAA = matches / (matches + nonmatches).
  - Partial match: spans of annotated instances overlap.
  - Exact match: spans of annotated instances match exactly.
- Measurement of validity (task accuracy):
  - Recall = TP / (TP + FN)
  - Precision = TP / (TP + FP)
  - F-measure = (1 + β²)·P·R / (β²·P + R)

These metrics will be discussed in more depth in Part 2.
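The formulas above are straightforward to compute from match counts. A minimal sketch (function names are illustrative; β = 1 gives the balanced F1 score):

```python
def iaa(matches, nonmatches):
    """Inter-annotator agreement as simple percent agreement."""
    return matches / (matches + nonmatches)

def precision_recall_f(tp, fp, fn, beta=1.0):
    """Precision, recall, and F-measure from annotation match counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    b2 = beta ** 2
    f = (1 + b2) * precision * recall / (b2 * precision + recall)
    return precision, recall, f

def spans_match(a, b, exact=True):
    """Compare two (start, end) spans: exact match, or partial overlap."""
    if exact:
        return a == b
    return a[0] < b[1] and b[0] < a[1]  # intervals overlap
```

Note that for β = 1 the F-measure reduces to 2·TP / (2·TP + FP + FN), which is often the easier form to sanity-check by hand.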

Slide 10: Who Should Do Annotation Tasks?

- Who: depends on the use case and the annotation goals. Some use cases may need many annotators.
- The level of domain expertise required (physicians, nurses, nurse practitioners, pharmacists, physician assistants, coders, and yes, even graduate students) depends on the level of clinical inference required.

Slide 11: A Commonly Used Approach

Slide 12: The Annotation Task

Use case: extract as many explicitly mentioned diagnoses as possible from a collection of 75 discharge summaries selected from one of the i2b2 Challenge tasks.

Goals:
- Illustrate the level of difficulty involved in annotation tasks.
- Demonstrate the use of an annotation guideline and schema to develop a reference standard.
- Demonstrate the calculation of evaluation metrics for task consistency and accuracy (IAA, precision, recall, F-measure).

Slide 13: Workshop Annotation Task

The good news (things we built for you):
- We don't expect you to infer clinical diagnoses (no discourse or linking of concepts across sentences).
- We have already developed an annotation guideline and schema for this task.
- Diagnoses are loosely based on semantic types from the UMLS.

Slide 14: Workshop Annotation Task (continued)

The bad news (or the challenge):
- One of the attributes we will identify is negation status, e.g. "No evidence of peripheral arterial disease."
- This task does have a certain level of difficulty, but it will be a good demonstration of building a reference standard and of practical NLP application.
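Negation status of the kind shown above is often assigned by looking for trigger phrases preceding the concept mention, in the spirit of NegEx-style algorithms. A crude sketch, where the trigger list is a tiny illustrative sample rather than any published lexicon:

```python
# A tiny illustrative sample of negation trigger phrases.
NEGATION_TRIGGERS = ("no evidence of", "denies", "without", "ruled out")

def is_negated(sentence, concept, window=5):
    """True if a negation trigger occurs within `window` words
    immediately before the concept mention (a crude scope rule)."""
    s = sentence.lower()
    idx = s.find(concept.lower())
    if idx == -1:
        return False  # concept not mentioned at all
    preceding = " ".join(s[:idx].split()[-window:])
    return any(trigger in preceding for trigger in NEGATION_TRIGGERS)
```

Real negation detection also has to handle post-span triggers, pseudo-negations ("no increase in..."), and scope termination, which is part of why this attribute makes the annotation task harder.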

Slide 15: Protégé/Knowtator

What tool will be used?
- For annotation tasks we will use the Knowtator plugin, written for the Protégé knowledge representation system. Knowtator facilitates annotation and adjudication tasks.
- A final reference standard has been created and will be available to participants.
- In Knowtator, concepts ("markables") are called "classes" and modifiers ("attributes") are called "slots".

Slide 16: Hands-on Component

Install Protégé 3.3.1 and Knowtator 1.9, available from:
- Protégé:
- Knowtator:

Review the annotation guideline and try using the Knowtator schema: annotate the first 5 documents.

Don't panic! Ask for help from any of the instructors; this is a hands-on exercise.

Slide 17: For More Information

For more information:
TA:

Thank you for your attention!