Published by Jeffry Cooper. Modified over 9 years ago.
1
Scott Duvall, Brett South, Stéphane Meystre
A Hands-on Introduction to Natural Language Processing in Healthcare
Annotation as a Central Task for Development of NLP Systems
MedInfo 2010 Congress, September 11, 2010
2
NLP applied to Clinical Documents
Detailed clinical information is often locked in free-text clinical notes.
Applying Natural Language Processing (NLP) methods to clinical free text allows more detailed document-level review.
Clinical information can be used for DSS, Q/A, research, performance improvement, surveillance, etc.
Annotation is a central task for evaluation of NLP systems used for Information Retrieval (IR) or Information Extraction (IE) tasks.
3
Why and When to Annotate?
Annotation as a central task:
Manually annotated corpora focus and clarify NLP system requirements.
They establish reference standard(s) to train and evaluate NLP tools applied to clinical texts for various tasks.
They provide data for NLP systems development (supervised learning):
  o Extraction rules may be created automatically or by hand.
  o Statistical models of text documents may be built by machine learning algorithms.
4
What to annotate and at what level?
Meta information: document type, annotator, institution, clinic, author
Document level: sections, headers, paragraphs, templates, or other document-level assessments
Lexical: semantic categories of words
Syntax: structures combined to produce sentences
  o Words combine in well-defined structures (POS, syntactic parse trees) at the grammatical level.
Semantics: meaning and interpretation
  o Individual word senses combine to form meaningful sentences.
5
What to annotate and at what level? (continued)
Pragmatics: relies on clinical inference; context affects the interpretation of meaning
  o Domain, report section, location within the section of a report, or other implicit information.
Discourse: links between annotated instances, or across sentences
  o Previous information affects the interpretation of the current information.
  o Includes referents (pronouns, definite or bridging clauses, time of events, coherence of sentences).
World knowledge: facts about the world at large and/or common sense
These levels may differ depending on the specific use case, the clinical question, and the goals of the NLP application.
6
Semantic annotation
Concepts (“markables”): types of information defined at the annotated instance level.
  o Use case dependent
  o Focus on noun phrases only?
  o Focus on specific semantic types (diagnoses, findings, treatments, procedures, etc.)?
Modifiers (“attributes”): information features
  o Negation, experiencer, temporality, certainty, change over time, severity, numeric values, anatomic locations, note section, modifiers, information quality, etc.?
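As a rough illustration of how a concept and its attributes might be stored together, here is a minimal Python sketch. The class names (`Span`, `Annotation`), the use of character offsets, and the attribute values are assumptions made for this example; they are not taken from the workshop materials or from any particular annotation tool.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Span:
    """Character offsets into the note text (hypothetical representation)."""
    start: int  # inclusive
    end: int    # exclusive


@dataclass
class Annotation:
    """One annotated instance: a concept ("markable") plus its attributes."""
    span: Span
    concept: str                                    # e.g. "Diagnosis"
    attributes: dict = field(default_factory=dict)  # e.g. {"negation": "negated"}

    def text(self, document: str) -> str:
        """Return the text covered by this annotation."""
        return document[self.span.start:self.span.end]


note = "No evidence of peripheral arterial disease."
mention = "peripheral arterial disease"
start = note.index(mention)

ann = Annotation(
    span=Span(start, start + len(mention)),
    concept="Diagnosis",
    attributes={"negation": "negated", "experiencer": "patient"},
)
print(ann.text(note), ann.attributes)
```

Keeping the span separate from the attributes makes it possible to compare two annotators on span agreement first and on attribute agreement second.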
7
Some jargon
Annotation guideline: defines what qualifies as a “markable” for a given use case, how annotated instances should be identified, and how and which attributes are associated with annotated instances.
  - In other words, the rules of the game: they define what information will be used to train the NLP system and evaluate its performance.
Annotation schema: provides a logical representation of the annotation guideline.
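One way to picture a schema as a "logical representation" of a guideline is as a table of allowed concepts, attributes, and values that each annotation can be checked against. The sketch below assumes a single `Diagnosis` concept with a `negation` attribute; the value names (`affirmed`, `negated`) and the `validate` helper are hypothetical and only illustrate the idea.

```python
# Hypothetical schema: concept name -> attribute name -> allowed values.
SCHEMA = {
    "Diagnosis": {
        "negation": {"affirmed", "negated"},
    },
}


def validate(concept: str, attributes: dict, schema: dict = SCHEMA) -> list:
    """Return a list of problems; an empty list means the annotation conforms."""
    if concept not in schema:
        return [f"unknown concept: {concept}"]
    allowed = schema[concept]
    problems = []
    for name, value in attributes.items():
        if name not in allowed:
            problems.append(f"unknown attribute: {name}")
        elif value not in allowed[name]:
            problems.append(f"invalid value for {name}: {value}")
    return problems


print(validate("Diagnosis", {"negation": "negated"}))
print(validate("Diagnosis", {"negation": "maybe"}))
print(validate("Procedure", {}))
```

A check like this catches annotations that fall outside the guideline before they reach adjudication.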
8
Common annotation tasks
[Figure: Task 1, Task 2, Task 3]
9
What is measured at the task level?
Estimate of reliability (task consistency): inter-annotator agreement, IAA = matches / (matches + nonmatches)
  o Partial (spans of annotated instances overlap)
  o Exact (spans of annotated instances match exactly)
Measurement of validity (task accuracy):
  o Recall: R = TP / (TP + FN)
  o Precision: P = TP / (TP + FP)
  o F-measure: F = (1 + β²)PR / (β²P + R)
These metrics will be discussed in more depth in Part 2.
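The formulas on this slide can be sketched in a few lines of Python. This sketch assumes that each annotation is a `(start, end)` character span with an exclusive end, and that matches and nonmatches are counted from both annotators' points of view; the function names are hypothetical, and the workshop tooling may count things differently.

```python
def overlaps(a, b):
    """True if two (start, end) spans share at least one character."""
    return a[0] < b[1] and b[0] < a[1]


def count_matched(source, target, exact):
    """Count spans in `source` that have a counterpart in `target`."""
    if exact:
        return sum(1 for s in source if s in target)
    return sum(1 for s in source if any(overlaps(s, t) for t in target))


def iaa(spans_a, spans_b, exact=True):
    """IAA = matches / (matches + nonmatches), summed over both annotators."""
    matches = count_matched(spans_a, spans_b, exact) + count_matched(spans_b, spans_a, exact)
    total = len(spans_a) + len(spans_b)  # matches + nonmatches
    return matches / total if total else 0.0  # 0.0 when nothing was annotated (sketch convention)


def precision_recall_f(tp, fp, fn, beta=1.0):
    """Precision, recall, and F-measure from counts against a reference standard."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = beta ** 2 * precision + recall
    f = (1 + beta ** 2) * precision * recall / denom if denom else 0.0
    return precision, recall, f


annotator_a = [(0, 5), (10, 20), (30, 40)]
annotator_b = [(0, 5), (12, 20)]
print("exact IAA:  ", iaa(annotator_a, annotator_b, exact=True))
print("partial IAA:", iaa(annotator_a, annotator_b, exact=False))
print("P, R, F:    ", precision_recall_f(tp=8, fp=2, fn=2))
```

In this toy example the two annotators agree exactly on one span and only partially on another, so partial IAA comes out higher than exact IAA. With β = 1 the F-measure reduces to the harmonic mean of precision and recall.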
10
Who should do annotation tasks?
Who: depends on the use case and annotation goals
  o Some use cases may need many annotators.
Level of domain expertise (physicians, nurses, nurse practitioners, pharmacists, physician assistants, coders, and yes, even graduate students)
  o Depends on the level of clinical inference required.
11
A commonly used approach
12
The annotation task?
Use case: focus on extracting as many explicitly mentioned diagnoses as possible from a collection of 75 discharge summaries selected from one of the i2b2 Challenge tasks.
Goals:
  o Illustrate the level of difficulty involved in annotation tasks.
  o Demonstrate use of an annotation guideline and schema to develop a reference standard.
  o Demonstrate calculation of evaluation metrics for task consistency and accuracy (i.e., IAA, precision, recall, F-measure).
13
Workshop annotation task
The good news (things we built for you):
  o We don’t expect you to infer clinical diagnoses (no discourse or linking of concepts across sentences).
  o We have already developed an annotation guideline and schema for this task.
  o Diagnoses are loosely based on semantic types from the UMLS:
14
Workshop annotation task
The bad news (or the challenge):
  o One of the attributes we will identify is negation status.
    - e.g., “No evidence of peripheral arterial disease”.
  o This task does have a certain level of difficulty, but it will be a good demonstration of building a reference standard and of the practical application of NLP.
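To show why negation status is not trivial, here is a deliberately naive trigger-phrase heuristic in Python. It is not the workshop's method: in the workshop, negation is assigned by human annotators. The trigger list, the five-token window, and the function name `is_negated` are all assumptions chosen for this sketch.

```python
import re

# Hypothetical trigger phrases; a real system would need a far larger, validated list.
NEGATION_TRIGGERS = ("no evidence of", "no", "denies", "without", "negative for")


def is_negated(sentence: str, concept: str, window: int = 5) -> bool:
    """Guess negation by looking for a trigger phrase shortly before the concept."""
    lowered = sentence.lower()
    pos = lowered.find(concept.lower())
    if pos < 0:
        raise ValueError("concept not found in sentence")
    # Keep only the last `window` word tokens that precede the concept mention.
    tokens = re.findall(r"[a-z']+", lowered[:pos])
    context = " " + " ".join(tokens[-window:]) + " "
    # Padding with spaces ensures triggers match whole words only.
    return any(f" {trigger} " in context for trigger in NEGATION_TRIGGERS)


print(is_negated("No evidence of peripheral arterial disease", "peripheral arterial disease"))
print(is_negated("History of peripheral arterial disease", "peripheral arterial disease"))
```

A heuristic like this ignores scope, double negation, and triggers that follow the concept, which is part of why a manually annotated reference standard is needed to measure how well any automated approach performs.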
15
Protégé/Knowtator
What tool will be used?
  o For annotation tasks we will use the Knowtator plugin written for the Protégé knowledge representation system. Knowtator facilitates annotation and adjudication tasks.
  o A final reference standard has been created and will be available to participants.
Concepts (“markables”) are called “classes”.
Modifiers (“attributes”) are called “slots”.
16
Hands-on component:
Install Protégé 3.3.1 and Knowtator 1.9, available from:
  Protégé: http://protege.cim3.net/download/old-releases/3.3.1/basic
  Knowtator: http://knowtator.sourceforge.net
Review the annotation guideline and try using the Knowtator schema.
  - Annotate the first 5 documents.
Don’t panic! Ask for help from any of the instructors; this is a hands-on exercise.
17
For more information:
  Brett.South@hsc.utah.edu
  Shuying.Shen@hsc.utah.edu
  Scott.Duvall@hsc.utah.edu
  Stephane.Meystre@hsc.utah.edu
  TA: Chris.Leng@utah.edu
Thank you for your attention!