Open Health Natural Language Processing Consortium (OHNLP)

Name: Open Health Natural Language Processing Consortium (OHNLP)
Uploaded: 2017-10-03T16:37:31+00:00
Duration: PTM9S31
Channel: Alexandrina Lyons
Description: Open Health Natural Language Processing Consortium (OHNLP)

Open Health Natural Language Processing Consortium (OHNLP)
Mayo Clinic: Guergana Savova, Ph.D.; James Masanz
IBM Watson Research: Anni Coden, Ph.D.; Michael Tanenblatt

Overview
OHNLP? Oh, NLP?
Demo of a clinical OHNLP system (cTAKES)
Demo of a medical OHNLP system (MedKAT) with extensions to pathology (/P)
How can I adapt the system to my data?
Lively discussion: how can I get involved, OHNLP future steps…

Open Health Natural Language Processing Consortium
(part of caBIG Vocabulary Knowledge Center web presence)
Goal: Foster an open-source collaborative community around clinical NLP that can deliver best-of-breed annotators, leverage the dynamic features of UIMA flow-control, and establish the infrastructure for clinical NLP.
Two open-source releases as part of OHNLP:
  Mayo’s pipeline for processing clinical notes (cTAKES)
  IBM’s pipeline for processing medical notes (MedKAT) and pathology reports (MedKAT/P)

Other non-OHNLP clinical NLP Systems
Proprietary:
  MedLEE (Columbia University)
  Topaz (University of Pittsburgh)
  Vanderbilt University
  caTIES (University of Pittsburgh)
  MPLUS/Onyx (University of Utah)
  VA Hospital system
Open source:
  i2b2 HITEx (Health Information Text Extraction)

Clinical example: clinical Text Analysis and Knowledge Extraction System (cTAKES)
Presenters: Guergana Savova, James Masanz

Overview cTAKES
Developed at Mayo Clinic
Goals:
  Phenotype extraction
  Generic: to be used for a variety of retrievals and use cases
  Expandable: at the information model level and methods
  Modular
  Cutting-edge technologies: best methods combining existing practices and novel research with rapid technology transfer
  Best software practices (80M+ notes)
Commitment to both R and D in R&D

cTAKES: Components
Clinical narrative as a sublanguage
Core components:
  Sentence boundary detection (OpenNLP technology)
  Tokenization (rule-based)
  Morphologic normalization (NLM’s LVG)
  POS tagging (OpenNLP technology)
  Shallow parsing (OpenNLP technology)
  Named Entity Recognition
    Dictionary mapping (lookup algorithm)
    Machine learning (MAWUI)
  Negation and context identification (NegEx)

Output Example: Disorder Object
“No evidence of unstable angina.”
Disorder
  Text: unstable angina
  Associated code: SNOMED
  Named entity type: disease/disorder
  Status: current
  Negation: true
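
The disorder object in this example can be pictured as a small record with a negation flag. The following is a minimal Python sketch, not cTAKES code (cTAKES is a UIMA-based system): the `Disorder` class, the short trigger list, and the `window` parameter are illustrative assumptions, and the negation check is only a simplified NegEx-style rule that looks for a trigger phrase within a few tokens before the mention.

```python
from dataclasses import dataclass

# Illustrative trigger phrases only; the real NegEx trigger list is much larger.
NEGATION_TRIGGERS = ("no evidence of", "denies", "without", "no")


@dataclass
class Disorder:
    text: str
    code_system: str   # the slide names SNOMED but gives no concept code
    entity_type: str
    status: str
    negated: bool


def is_negated(sentence: str, mention: str, window: int = 5) -> bool:
    """Return True if a trigger phrase occurs within `window` tokens before the mention."""
    lowered = sentence.lower()
    start = lowered.find(mention.lower())
    if start < 0:
        return False
    # Keep only the last `window` tokens that precede the mention.
    preceding_tokens = lowered[:start].split()[-window:]
    preceding = " " + " ".join(preceding_tokens) + " "
    return any(" " + trigger + " " in preceding for trigger in NEGATION_TRIGGERS)


def make_disorder(sentence: str, mention: str) -> Disorder:
    """Build the record for one mention, using the field values shown on the slide."""
    return Disorder(
        text=mention,
        code_system="SNOMED",
        entity_type="disease/disorder",
        status="current",
        negated=is_negated(sentence, mention),
    )


disorder = make_disorder("No evidence of unstable angina.", "unstable angina")
print(disorder.negated)  # True
```

The `window` argument plays the role of a negation window: a smaller value makes the rule more conservative, a larger one lets a trigger reach mentions further to its right.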

Methods
Preliminary results: Savova, Guergana; Kipper-Schuler, Karin; Buntrock, James; and Chute, Christopher. UIMA-based clinical information extraction system. LREC 2008: Towards enhanced interoperability for large HLT systems: UIMA for NLP.
Manuscript with detailed system description and evaluation under review (JAMIA)

cTAKES demo

Medical example: Medical Knowledge Analysis System MedKAT and MedKAT/P
Presenters: Anni Coden, Michael Tanenblatt

Overview MedKAT and MedKAT/P
Developed at IBM
Goal: Identification of concepts and their attributes based on a standard or proprietary terminology/ontology
/P: adaptation to pathology reports, relation extraction
Modular, Generic, Expandable
  Terminology, Conceptual Model
  Easy adaptation to specific corpus and conventions
  Integration into institutional system
Ongoing commitment to Research and Development

Core Components
Document structure
Syntactic tools (tokenization ... Shallow parsing)
Concept identification
Negation
Relationship extraction

Extracted data     F-score
Anatomic site      0.95
Histology          0.98
Size               1.00
Date
Grade
Gross Desc         0.80
Lymph Nodes        0.81
Primary Tumor      0.82
Metastatic Tumor   0.65
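
The slide reports F-scores without defining them. As a reminder of how such a figure is usually computed, here is a minimal Python sketch that assumes the balanced F1 measure (harmonic mean of precision and recall); whether the MedKAT/P evaluation used exactly this variant is an assumption. The counts in the usage line are made up for illustration and are not from that evaluation.

```python
def f_score(true_positives: int, false_positives: int,
            false_negatives: int, beta: float = 1.0) -> float:
    """F-beta computed from raw counts; beta=1 gives the balanced F1."""
    if true_positives == 0:
        return 0.0
    precision = true_positives / (true_positives + false_positives)
    recall = true_positives / (true_positives + false_negatives)
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Hypothetical counts: 8 correct extractions, 2 spurious, 2 missed.
print(round(f_score(8, 2, 2), 2))  # 0.8
```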

Document Structure

Output

Cancer Disease Knowledge Representation Model

Demos
Query by Model / Cancer
Detailed view of annotations in Document Analyzer
m/research_projects.nsf/pages/medicalinformatics.index.html

Adaptation
Presenters: Anni Coden, Michael Tanenblatt

Adaptation
Sentence breaks
Text case
Part of speech tags
Shallow parser
Dictionary lookup
Document structure

Sentence Breaks
Some solutions:
  Use annotator to re-break sentences
  Retrain tagger

Case/Part of Speech Tags
Some solutions:
  Retrain tagger
  Use UIMA annotator to create a “true case” view
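
To make the “true case” idea concrete, here is a minimal Python sketch of what such a view might contain. It is not the MedKAT or cTAKES annotator: the function name, the `KEEP_UPPER` acronym list, and the casing rules are illustrative assumptions. For ASCII input the output has the same length as the input, so offsets computed on the true-case copy still point at the right characters in the original text, which is the property a separate view relies on.

```python
import re

# Hypothetical list of tokens that should stay upper case.
KEEP_UPPER = {"CT", "MRI", "ECG"}


def true_case(text: str) -> str:
    """Return a copy of `text` in which all-caps words are lower-cased.

    Sentence-initial words are capitalized and known acronyms are kept.
    Simplification: any '.', '!' or '?' starts a new sentence, including a
    decimal point; a real annotator would reuse sentence annotations instead.
    """
    out = []
    sentence_start = True
    # Alternate between runs of letters and runs of everything else.
    for match in re.finditer(r"[A-Za-z]+|[^A-Za-z]+", text):
        chunk = match.group()
        if chunk[0].isalpha():
            if chunk.isupper() and chunk not in KEEP_UPPER:
                chunk = chunk.capitalize() if sentence_start else chunk.lower()
            sentence_start = False
        elif re.search(r"[.!?]", chunk):
            sentence_start = True
        out.append(chunk)
    return "".join(out)


print(true_case("PATIENT HAD CT SCAN. NO MASS SEEN."))
# Patient had CT scan. No mass seen.
```

A tagger trained on mixed-case text can then run on the true-case copy while downstream components keep reading the original.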

Part of Speech Tags
Some solutions:
  Retrain tagger
  Use dictionary lookup to modify incorrect tags
  Create rule-based annotator to modify incorrect tags
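
The dictionary-lookup fix can be sketched in a few lines of Python. This is an illustration under stated assumptions, not code from either system: the `TAG_OVERRIDES` table and its two entries are hypothetical, and the tags are Penn Treebank labels. The function runs after the tagger and overwrites the tag of any token listed in the table.

```python
from typing import Dict, List, Tuple

# Hypothetical corrections: lower-cased token -> tag to force.
TAG_OVERRIDES: Dict[str, str] = {"left": "JJ", "discharge": "NN"}


def correct_tags(tagged: List[Tuple[str, str]],
                 overrides: Dict[str, str] = TAG_OVERRIDES) -> List[Tuple[str, str]]:
    """Replace the tagger's tag whenever the token appears in the override table."""
    return [(token, overrides.get(token.lower(), tag)) for token, tag in tagged]


tagger_output = [("Left", "VBD"), ("arm", "NN"), ("swollen", "JJ")]
print(correct_tags(tagger_output))
# [('Left', 'JJ'), ('arm', 'NN'), ('swollen', 'JJ')]
```

An unconditional override like this only suits tokens that are almost always mis-tagged in the target corpus; a rule-based annotator would add a context check before changing the tag.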

Shallow Parser


Dictionary Lookup
Dictionary entries can be added, changed, deleted
Dictionary entry attributes can be added, changed, deleted
Search parameters can be modified
Post-processing filters
Tokenization of text and dictionary should be the same
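
The last point, that text and dictionary should be tokenized the same way, is easiest to see in code. The Python sketch below is an illustration only, assuming a simple longest-match lookup over a token window; the function names, the `max_window` parameter, and the placeholder codes are hypothetical and do not reflect either system's actual lookup algorithm. Because the same `tokenize` function is applied to dictionary terms and to the document, a hyphenated dictionary entry still matches a differently hyphenated mention.

```python
import re
from typing import Dict, List, Tuple


def tokenize(text: str) -> Tuple[str, ...]:
    """Lower-case and split on non-alphanumerics; used for BOTH text and dictionary."""
    return tuple(re.findall(r"[a-z0-9]+", text.lower()))


def build_index(entries: Dict[str, str]) -> Dict[Tuple[str, ...], str]:
    """Key every dictionary term by its token tuple."""
    return {tokenize(term): code for term, code in entries.items()}


def lookup(text: str, index: Dict[Tuple[str, ...], str],
           max_window: int = 4) -> List[Tuple[str, str]]:
    """Scan left to right, preferring the longest match of up to `max_window` tokens."""
    tokens = tokenize(text)
    hits = []
    i = 0
    while i < len(tokens):
        for n in range(min(max_window, len(tokens) - i), 0, -1):
            key = tokens[i:i + n]
            if key in index:
                hits.append((" ".join(key), index[key]))
                i += n
                break
        else:
            i += 1  # no entry starts at this token
    return hits


# Placeholder codes, not real terminology identifiers.
index = build_index({"Non-small cell carcinoma": "CODE-1", "carcinoma": "CODE-2"})
print(lookup("Findings: non small-cell carcinoma of the lung.", index))
# [('non small cell carcinoma', 'CODE-1')]
```

Adding, changing, or deleting entries amounts to editing the mapping passed to `build_index`, and `max_window` is one example of a search parameter that can be tuned per corpus.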

Document Structure
Plain text or XML (e.g., CDA)
Processes specific document section types (e.g., diagnosis)
Detection of formatting (e.g., bullets)
Detection of relations between sections
Making implicit conventions explicit (e.g., meaning of title)
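
For plain-text reports, section and bullet detection can be approximated with a couple of regular expressions. The Python sketch below is a rough illustration under stated assumptions, not the MedKAT document-structure component: it assumes headings are upper-case labels at the start of a line followed by a colon, and that bullets start with "- "; real reports follow institution-specific conventions that would need their own patterns.

```python
import re
from typing import Dict, List

# Assumed heading convention: upper-case label at line start, then a colon.
HEADING = re.compile(r"^([A-Z][A-Z ]+):\s*", re.MULTILINE)


def split_sections(report: str) -> Dict[str, str]:
    """Map each lower-cased heading to the text up to the next heading."""
    sections = {}
    matches = list(HEADING.finditer(report))
    for i, match in enumerate(matches):
        end = matches[i + 1].start() if i + 1 < len(matches) else len(report)
        sections[match.group(1).strip().lower()] = report[match.end():end].strip()
    return sections


def bullet_items(section_text: str) -> List[str]:
    """Return the content of lines that start with '- '."""
    return [line[2:].strip() for line in section_text.splitlines()
            if line.startswith("- ")]


report = "DIAGNOSIS: Adenocarcinoma.\nGROSS DESCRIPTION:\n- Specimen one\n- Specimen two\n"
sections = split_sections(report)
print(sections["diagnosis"])                        # Adenocarcinoma.
print(bullet_items(sections["gross description"]))  # ['Specimen one', 'Specimen two']
```

Once sections are explicit, later components can be restricted to specific section types, for example running concept identification only on the diagnosis text.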

Discussion: Future of OHNLP.ORG
Provided seed annotators and tools
Goal: growing community
  Annotators, tools
  Methodologies
  Gold standards
  Common type system for plug-and-play
What are the hurdles?

Hands-on Customization

MedKAT
  Dictionary adaptation
  Concept identification parameters
  Document structure detection

cTAKES
  Negation window
  Lookup window
  Dictionary modifications

Questions?
