Towards comprehensive syntactic and semantic annotations of the clinical narrative Daniel Albright, Arrick Lanfranchi, Anwen Fredriksen, William F Styler.


Towards comprehensive syntactic and semantic annotations of the clinical narrative
Daniel Albright, Arrick Lanfranchi, Anwen Fredriksen, William F Styler IV, Colin Warner, Jena D Hwang, Jinho D Choi, Dmitriy Dligach, Rodney D Nielsen, James Martin, Wayne Ward, Martha Palmer, Guergana K Savova
Albright D, Lanfranchi A, Fredriksen A, et al. JAMIA, Dec 2012. doi:10.1136/amiajnl-2012-001317

Three projects
Corpus: clinical narrative text, anonymized, from Mayo Clinic
- Pathology reports (colon cancer related)
- Mayo Clinic CN: randomly selected
Projects:
1. Treebank
2. PropBank
3. UMLS: Unified Medical Language System
Create a Gold Standard: text manually annotated for UMLS entities
The Gold Standard is used to train and evaluate algorithms

UMLS

Annotation Statistics

Corpus Statistics
  Total Sentences               13091
  Tokens                        127606
  Predicate Lemmas (PropBank)   1772
  Named Entities                28539
    (15 semantic groups, 1 semantic type, Person semantic category (non-UMLS))

Named Entity Types
  Semantic Class         Proportion   Count
  Procedures             15.71%       4483
  Concepts & ideas       15.10%       4308
  Disorders              14.74%       4208
  Anatomy                12.80%       3652
  Sign or Symptom        12.46%       3556
  Chemicals and drugs    7.49%        2137
  All Other              21.7%

130,000 tokens ≈ 300 notes
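The proportions on this slide can be recomputed from the counts, which also confirms that 28539 is the total number of named entity annotations. The following is a minimal sketch of that arithmetic; the counts are copied from the slide, while the "All Other" count is not given there and is derived as the remainder, which is an assumption. The function and variable names are illustrative only.

```python
# Counts copied from the slide. The slide gives no count for "All Other",
# so it is derived below as the remainder (an assumption, not a reported figure).
TOTAL_ENTITIES = 28539
counts = {
    "Procedures": 4483,
    "Concepts & ideas": 4308,
    "Disorders": 4208,
    "Anatomy": 3652,
    "Sign or Symptom": 3556,
    "Chemicals and drugs": 2137,
}


def proportions(class_counts, total):
    """Return each class's share of the total as a percentage, rounded to 2 places."""
    return {name: round(100 * n / total, 2) for name, n in class_counts.items()}


shares = proportions(counts, TOTAL_ENTITIES)

# Remainder not covered by the six listed classes.
other_count = TOTAL_ENTITIES - sum(counts.values())
other_share = round(100 * other_count / TOTAL_ENTITIES, 2)

for name, pct in shares.items():
    print(f"{name:<22}{pct:6.2f}%")
print(f"{'All Other':<22}{other_share:6.2f}%")
```

Each recomputed share matches the slide to two decimal places, and the derived remainder is consistent with the 21.7% reported for "All Other".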


IAA Results

  Project                 Average IAA   Double Annotation Size
  Treebank                0.926         8%
  PropBank, exact         0.891         100%?
  PropBank, Core-arg      0.917
  PropBank, Constituent   0.931
  UMLS, exact             0.697         74%
  UMLS, partial           0.750
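The slide distinguishes "exact" from "partial" agreement for UMLS but does not say how either is computed. As an illustration only, the sketch below assumes agreement is an F1 score between two annotators' entity spans, with "exact" requiring identical offsets and "partial" accepting any overlap. The span representation, the function names, and the choice of F1 are all assumptions, not the authors' stated procedure.

```python
def overlaps(a, b):
    """True if two (start, end) character spans share at least one position."""
    return a[0] < b[1] and b[0] < a[1]


def agreement_f1(spans_a, spans_b, partial=False):
    """F1 agreement between two annotators' span lists (hypothetical metric).

    Exact mode counts a span as matched only if the other annotator has a
    span with identical offsets; partial mode accepts any overlap.
    """
    if not spans_a or not spans_b:
        return 0.0
    match = overlaps if partial else (lambda x, y: x == y)
    # Spans from A that some span from B agrees with, and vice versa.
    matched_a = sum(1 for x in spans_a if any(match(x, y) for y in spans_b))
    matched_b = sum(1 for y in spans_b if any(match(x, y) for x in spans_a))
    recall = matched_a / len(spans_a)
    precision = matched_b / len(spans_b)
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Toy example: one identical span, one overlapping span, one disjoint span.
annotator_a = [(0, 5), (10, 20), (30, 35)]
annotator_b = [(0, 5), (12, 20), (40, 45)]

exact_score = agreement_f1(annotator_a, annotator_b)
partial_score = agreement_f1(annotator_a, annotator_b, partial=True)
print(f"exact={exact_score:.3f} partial={partial_score:.3f}")
```

In this toy example the overlapping span counts only under partial matching, so the partial score is higher than the exact score, the same direction as the UMLS figures on the slide (0.750 versus 0.697).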

Costs

  Project    Cost               Startup %
  Treebank   $100,000           70%
  PropBank   $40,000            <50%
  UMLS       $50,000-60,000     33%

Tools Built on Annotations (and incorporated into cTAKES)
- POS tagger
- Constituency parser
- Dependency parser
- Semantic role labeler

Tools Built on Annotations (and incorporated into cTAKES)

  Tool                           Best result of MiPACQ training model
  POS tagger                     94.28
  Dependency parser
    - Labeled Attachment         83.63
    - Unlabeled Attachment       85.72
  Semantic role labeler
    - Identification             86.58
    - Ident. + classification    77.72
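The slide reports labeled and unlabeled attachment scores for the dependency parser without defining them. The sketch below uses the conventional definitions: the unlabeled score is the percentage of tokens assigned the correct head, and the labeled score additionally requires the correct dependency label. The data layout, the function name, and the toy sentence are assumptions for illustration; evaluation details such as the treatment of punctuation are not stated in the slides and are not modeled here.

```python
def attachment_scores(gold, predicted):
    """Return (labeled, unlabeled) attachment scores as percentages.

    Each token is a (head_index, dependency_label) pair; both lists must
    cover the same tokens in the same order.
    """
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same number of tokens")
    if not gold:
        raise ValueError("cannot score an empty sentence")
    n = len(gold)
    # Unlabeled: head matches. Labeled: head and label both match.
    head_ok = sum(1 for g, p in zip(gold, predicted) if g[0] == p[0])
    both_ok = sum(1 for g, p in zip(gold, predicted) if g == p)
    return 100 * both_ok / n, 100 * head_ok / n


# Toy four-token sentence (hypothetical): token 3 has a wrong head,
# token 4 has the right head but a wrong label.
gold = [(2, "nsubj"), (0, "root"), (4, "det"), (2, "dobj")]
pred = [(2, "nsubj"), (0, "root"), (2, "det"), (2, "iobj")]

las, uas = attachment_scores(gold, pred)
print(f"LAS={las:.2f} UAS={uas:.2f}")
```

Because a labeled match requires everything an unlabeled match does plus the correct label, the labeled score can never exceed the unlabeled one, which agrees with the figures on the slide (83.63 versus 85.72).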