Open Health Natural Language Processing Consortium (OHNLP)

Mayo Clinic: Guergana Savova, Ph.D.; James Masanz (clinicalnlp@mayo.edu)
IBM Watson Research: Anni Coden, Ph.D.; Michael Tanenblatt (mednlp@us.ibm.com)

Overview
- OHNLP? Oh, NLP?
- Demo of a clinical OHNLP system (cTAKES)
- Demo of a medical OHNLP system (MedKAT) with extensions to pathology (/P)
- How can I adapt the system to my data?
- Lively discussion: how can I get involved, OHNLP future steps…

Open Health Natural Language Processing Consortium
www.ohnlp.org (part of caBIG Vocabulary Knowledge Center web presence)
Goal
- Foster an open-source collaborative community around clinical NLP that can deliver best-of-breed annotators, leverage the dynamic features of UIMA flow-control, and establish the infrastructure for clinical NLP.
Two open source releases as part of OHNLP
- Mayo’s pipeline for processing clinical notes (cTAKES)
- IBM’s pipeline for processing medical notes (MedKAT) and pathology reports (MedKAT/P)

Other non-OHNLP clinical NLP systems
Proprietary
- MedLEE (Columbia University)
- Topaz (University of Pittsburgh)
- Vanderbilt University
- caTIES (University of Pittsburgh)
- MPLUS/Onyx (University of Utah)
- VA Hospital system
Open source
- i2b2 HITEx (Health Information Text Extraction)

Clinical example: clinical Text Analysis and Knowledge Extraction System (cTAKES)
Presenters: Guergana Savova, James Masanz

Overview: cTAKES
- Developed at Mayo Clinic
- Goals:
  - Phenotype extraction
  - Generic: to be used for a variety of retrievals and use cases
  - Expandable: at the information model level and methods
  - Modular
  - Cutting edge technologies: best methods combining existing practices and novel research with rapid technology transfer
  - Best software practices (80M+ notes)
- Commitment to both R and D in R&D

cTAKES: Components
- Clinical narrative as a sublanguage
- Core components
  - Sentence boundary detection (OpenNLP technology)
  - Tokenization (rule-based)
  - Morphologic normalization (NLM’s LVG)
  - POS tagging (OpenNLP technology)
  - Shallow parsing (OpenNLP technology)
  - Named Entity Recognition
    - Dictionary mapping (lookup algorithm)
    - Machine learning (MAWUI)
  - Negation and context identification (NegEx)
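The negation step can be illustrated with a small NegEx-style sketch. This is not the cTAKES implementation: the trigger list, the `is_negated` function, and the default `window` of five tokens are all assumptions made for illustration, and the real NegEx lexicon and scope rules are considerably richer.

```python
import re

# Hypothetical trigger list; the real NegEx lexicon is much larger.
NEGATION_TRIGGERS = ("no evidence of", "no", "denies", "without")


def tokenize(text):
    """Lowercase word tokenizer used for both triggers and text."""
    return re.findall(r"[a-z0-9]+", text.lower())


def is_negated(sentence, concept, window=5):
    """Return True if a trigger occurs within the `window` tokens before the concept."""
    tokens = tokenize(sentence)
    target = tokenize(concept)
    triggers = [tokenize(t) for t in NEGATION_TRIGGERS]
    for start in range(len(tokens) - len(target) + 1):
        if tokens[start:start + len(target)] != target:
            continue
        # Only the tokens immediately preceding the concept mention are searched.
        left = max(0, start - window)
        context = tokens[left:start]
        for trig in triggers:
            for i in range(len(context) - len(trig) + 1):
                if context[i:i + len(trig)] == trig:
                    return True
    return False
```

With these assumptions, `is_negated("No evidence of unstable angina.", "unstable angina")` returns `True`, while a trigger that lies further back than `window` tokens is ignored. Widening or narrowing the window trades recall against false negations.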

Output Example: Disorder Object
“No evidence of unstable angina.”
Disorder
- Text: unstable angina
- Associated code: SNOMED 4557003
- Named entity type: disease/disorder
- Status: current
- Negation: true
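The fields of the disorder object can be pictured as a simple record. The sketch below is only a stand-in: the class name `DisorderMention` and its attribute names are hypothetical and do not reflect the actual cTAKES UIMA type system; only the field values come from the slide.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DisorderMention:
    """Hypothetical container mirroring the fields shown on the slide."""
    text: str
    code_system: str
    code: str
    entity_type: str
    status: str
    negated: bool


# Values taken from the slide's example sentence.
mention = DisorderMention(
    text="unstable angina",
    code_system="SNOMED",
    code="4557003",
    entity_type="disease/disorder",
    status="current",
    negated=True,
)
```

Keeping negation as an attribute of the mention, rather than dropping negated mentions, lets downstream queries decide whether negated findings are relevant.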

Methods
- Preliminary results: Savova, Guergana; Kipper-Schuler, Karin; Buntrock, James; and Chute, Christopher. 2008. UIMA-based clinical information extraction system. LREC 2008: Towards enhanced interoperability for large HLT systems: UIMA for NLP.
- Manuscript with detailed system description and evaluation under review (JAMIA)

cTAKES demo

Medical example: Medical Knowledge Analysis Tool (MedKAT and MedKAT/P)
Presenters: Anni Coden, Michael Tanenblatt

Overview: MedKAT and MedKAT/P
- Developed at IBM
- Goal: identification of concepts and their attributes based on a standard or proprietary terminology/ontology
- /P: adaptation to pathology reports; relation extraction
- Modular, generic, expandable
  - Terminology, conceptual model
- Easy adaptation to specific corpus and conventions
- Integration into institutional system
- Ongoing commitment to research and development

Core Components
- Document structure
- Syntactic tools (tokenization ... shallow parsing)
- Concept identification
- Negation
- Relationship extraction

Extracted data      F-score
Anatomic site       0.95
Histology           0.98
Size                1.00
Date
Grade
Gross Desc          0.80
Lymph Nodes         0.81
Primary Tumor       0.82
Metastatic Tumor    0.65
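The slide reports F-scores without saying how they were computed. Assuming the usual balanced F-score (F1, the harmonic mean of precision and recall), the calculation can be sketched as follows; the function name and the counts in the usage note are illustrative and are not taken from the MedKAT evaluation.

```python
def f_score(true_positives, false_positives, false_negatives):
    """Balanced F-score (F1): harmonic mean of precision and recall."""
    predicted = true_positives + false_positives
    actual = true_positives + false_negatives
    # With no correct extractions the score is defined here as 0.0.
    if predicted == 0 or actual == 0 or true_positives == 0:
        return 0.0
    precision = true_positives / predicted
    recall = true_positives / actual
    return 2 * precision * recall / (precision + recall)
```

For example, 8 correct extractions with 2 spurious and 2 missed give precision and recall of 0.8 each, and therefore an F-score of 0.8.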

Document Structure

Output

Cancer Disease Knowledge Representation Model

Demos
- Query by Model / Cancer
- Detailed view of annotations in Document Analyzer
http://domino.research.ibm.com/comm/research_projects.nsf/pages/medicalinformatics.index.html

Adaptation
Presenters: Anni Coden, Michael Tanenblatt

Adaptation
- Sentence breaks
- Text case
- Part of speech tags
- Shallow parser
- Dictionary lookup
- Document structure

Sentence Breaks
Some solutions:
- Use annotator to re-break sentences
- Retrain tagger

Case/Part of Speech Tags
Some solutions:
- Retrain tagger
- Use UIMA annotator to create a “true case” view

Part of Speech Tags
Some solutions:
- Retrain tagger
- Use dictionary lookup to modify incorrect tags
- Create rule-based annotator to modify incorrect tags
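The dictionary-lookup fix for incorrect tags can be sketched as a post-processing pass over the tagger output. Everything here is an assumption for illustration: the `POS_OVERRIDES` table, its Penn Treebank style tags, and the `correct_tags` function are hypothetical and are not part of MedKAT or cTAKES.

```python
# Hypothetical override table: lowercase token -> tag that replaces the tagger's guess.
POS_OVERRIDES = {"mg": "NN", "prn": "RB"}


def correct_tags(tagged_tokens, overrides=POS_OVERRIDES):
    """Replace the tag of any token found (case-insensitively) in the override table."""
    return [
        (token, overrides.get(token.lower(), tag))
        for token, tag in tagged_tokens
    ]
```

Tokens that are absent from the table keep the tag assigned by the tagger, so the pass only touches the cases a site has explicitly reviewed.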

Shallow Parser

Dictionary Lookup
- Dictionary entries can be added, changed, deleted
- Dictionary entry attributes can be added, changed, deleted
- Search parameters can be modified
- Post processing filters
- Tokenization of text and dictionary should be the same
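The last point, that text and dictionary should be tokenized the same way, is easiest to see in a small longest-match lookup. This is a minimal sketch and not the MedKAT or cTAKES lookup algorithm: `tokenize`, `build_index`, `lookup`, and the `max_len` search parameter are hypothetical names chosen for illustration.

```python
import re


def tokenize(text):
    """Single tokenizer applied to both dictionary terms and document text."""
    return tuple(re.findall(r"[a-z0-9]+", text.lower()))


def build_index(entries):
    """Map tokenized term -> concept code. `entries` is {term: code}."""
    return {tokenize(term): code for term, code in entries.items()}


def lookup(text, index, max_len=4):
    """Return (start_token, end_token, code) for the longest match at each position."""
    tokens = tokenize(text)
    matches = []
    i = 0
    while i < len(tokens):
        # Try the longest candidate span first, then shrink it.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            code = index.get(tokens[i:i + n])
            if code is not None:
                matches.append((i, i + n, code))
                i += n
                break
        else:
            # No dictionary term starts at this token.
            i += 1
    return matches
```

Because both sides go through the same `tokenize`, a dictionary entry "Unstable angina" still matches the text "unstable-angina"; if the dictionary were tokenized differently (for example, keeping hyphens), the match would silently fail.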

Document Structure
- Plain text or XML (e.g., CDA)
- Processes specific document section types (e.g., diagnosis)
- Detection of formatting (e.g., bullets)
- Detection of relations between sections
- Making implicit conventions explicit (e.g., meaning of title)
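For plain-text reports, restricting processing to specific section types might start from a header-based splitter like the one below. The section titles, the `SECTION_HEADER` pattern, and `split_sections` are assumptions for illustration; real reports need an institution-specific list of headers, and XML input such as CDA would be handled with a parser instead.

```python
import re

# Hypothetical section titles; real reports need an institution-specific list.
SECTION_HEADER = re.compile(
    r"^(final diagnosis|diagnosis|gross description|clinical history)\s*:\s*",
    re.IGNORECASE | re.MULTILINE,
)


def split_sections(report):
    """Return {lowercased section title: body text} for the known headers."""
    sections = {}
    matches = list(SECTION_HEADER.finditer(report))
    for idx, m in enumerate(matches):
        # A section body runs until the next recognized header or the end of the report.
        end = matches[idx + 1].start() if idx + 1 < len(matches) else len(report)
        sections[m.group(1).lower()] = report[m.end():end].strip()
    return sections
```

A downstream annotator could then run only on `sections["final diagnosis"]`, which is one way to make the implicit convention behind a section title explicit.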

Discussion: Future of OHNLP.ORG
- Provided seed annotators and tools
- Goal: growing community
  - Annotators, tools
  - Methodologies
  - Gold standards
  - Common type system for plug-and-play
- What are the hurdles?

Hands-on Customization

MedKAT
- Dictionary adaptation
- Concept identification parameters
- Document structure detection

cTAKES
- Negation window
- Lookup window
- Dictionary modifications

Questions?