cTAKES: The clinical Text Analysis and Knowledge Extraction System


cTAKES Overview
- Open source software
- Natural Language Processing (NLP)
- Developed at Mayo Clinic
- Contributed to the Open Health Natural Language Processing (OHNLP) Consortium
- Built on the Apache UIMA framework (Unstructured Information Management Architecture)
- The UIMA framework itself is also open source

Open Health Natural Language Processing (OHNLP) Consortium
Goal: Foster an open-source collaborative community around clinical NLP that can deliver best-of-breed annotators, leverage the dynamic features of UIMA flow control, and establish the infrastructure for clinical NLP.
www.ohnlp.org

www.ohnlp.org
Gateway to:
- News
- Documentation
- Downloads
- Forums for asking questions
- Bug tracker for reporting issues
- List of publications

cTAKES Goals
- Phenotype extraction
- Generic: to be used for a variety of retrievals and use cases
- Expandable: at the information model level and methods
- Modular
- Cutting edge technologies: best methods combining existing practices and novel research with rapid technology transfer
- Best software practices (80M+ notes)
- Commitment to both R and D in R&D

Original cTAKES Components
- Sentence boundary detection (OpenNLP technology)
- Tokenization (rule-based)
- Morphologic normalization (NLM's LVG)
- POS tagging (OpenNLP technology)
- Shallow parsing (OpenNLP technology)
- Named Entity Recognition
- Dictionary mapping (lookup algorithm)
- Negation and context identification (both based on NegEx)
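To make the order of the components concrete, here is a minimal Python sketch of three of the stages (sentence boundary detection, rule-based tokenization, dictionary lookup). It is not cTAKES code: cTAKES is built on UIMA and uses OpenNLP models, whereas this sketch uses simple regular expressions. The names `DICTIONARY`, `Mention`, `split_sentences`, `tokenize`, `lookup`, and `run_pipeline` are all hypothetical, and the one-entry dictionary is only an assumption for illustration.

```python
import re
from dataclasses import dataclass

# Hypothetical toy dictionary: surface phrase -> (entity type, UMLS CUI).
# The CUI is the one shown in the deck's own output example.
DICTIONARY = {
    "unstable angina": ("disease/disorder", "C0002965"),
}


@dataclass
class Mention:
    text: str
    entity_type: str
    cui: str
    sentence: str


def split_sentences(text):
    """Naive regex sentence splitter (a stand-in for the OpenNLP detector)."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tokenize(sentence):
    """Rule-based tokenization: words and punctuation become separate tokens."""
    return re.findall(r"\w+|[^\w\s]", sentence)


def lookup(tokens, sentence, max_len=4):
    """Dictionary mapping: prefer the longest token window found in DICTIONARY."""
    mentions = []
    i = 0
    while i < len(tokens):
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            phrase = " ".join(tokens[i:i + n]).lower()
            if phrase in DICTIONARY:
                entity_type, cui = DICTIONARY[phrase]
                mentions.append(Mention(phrase, entity_type, cui, sentence))
                i += n  # skip past the matched window
                break
        else:
            i += 1  # no match starting at this token
    return mentions


def run_pipeline(text):
    """Chain the stages: sentences -> tokens -> dictionary mentions."""
    mentions = []
    for sentence in split_sentences(text):
        mentions.extend(lookup(tokenize(sentence), sentence))
    return mentions
```

Calling `run_pipeline("No evidence of unstable angina. Patient is stable.")` yields a single `Mention` for the first sentence. Morphologic normalization, POS tagging, shallow parsing, and negation are left out of this sketch.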

Original cTAKES Named Entities
- Drug mentions
- Disease/disorder mentions
- Sign/symptom mentions
- Anatomical site mentions
With these attributes:
- RxNorm code or Concept Unique Identifier (CUI) and SNOMED-CT codes
- Negation (e.g., "denies chest pain")
- Status (history of, family history of, possible/probable)

Additional cTAKES Components
- Smoking status classifier
- More detailed drug mention annotator: dosage, route, form, drug change status, and more
- Peripheral Artery Disease (PAD) annotator
- Dependency parser
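As a rough illustration of what the drug mention attributes look like, the Python sketch below pulls dosage, route, and form out of a short medication phrase with regular expressions and small word lists. This is an assumption-laden toy and not how the cTAKES drug annotator works; `ROUTES`, `FORMS`, `DOSAGE_RE`, and `extract_drug_attributes` are hypothetical names, and the vocabularies are deliberately tiny.

```python
import re

# Hypothetical, deliberately small vocabularies for illustration only.
ROUTES = ("oral", "intravenous", "topical")
FORMS = ("tablet", "capsule", "solution")

# A number followed by a unit, e.g. "81 mg" or "2.5ml".
DOSAGE_RE = re.compile(r"\b(\d+(?:\.\d+)?)\s*(mg|mcg|g|ml)\b", re.IGNORECASE)


def extract_drug_attributes(text):
    """Return the first dosage, route, and form found in a medication phrase."""
    dosage = DOSAGE_RE.search(text)
    words = set(re.findall(r"[a-z]+", text.lower()))
    return {
        "dosage": f"{dosage.group(1)} {dosage.group(2).lower()}" if dosage else None,
        "route": next((r for r in ROUTES if r in words), None),
        "form": next((f for f in FORMS if f in words), None),
    }
```

For example, `extract_drug_attributes("Aspirin 81 mg oral tablet daily")` fills all three fields, while a phrase with no recognizable attribute returns `None` for each.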

Output Example: Disorder Object
"No evidence of unstable angina."
- Text: unstable angina
- Associated codes: SNOMED 4557003, UMLS CUI C0002965
- Named entity type: disease/disorder
- Negation: true
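The disorder object above can be pictured as a small record plus a negation check. The Python sketch below is a simplified, NegEx-inspired illustration under stated assumptions: the `NamedEntity` class, the `is_negated` function, the five-token window, and the short trigger list are all hypothetical, and the real NegEx algorithm uses a much larger trigger set and additional scope rules.

```python
import re
from dataclasses import dataclass, field

# Small illustrative subset of negation triggers (an assumption, not the NegEx list).
NEGATION_TRIGGERS = ["no evidence of", "denies", "without", "no"]


@dataclass
class NamedEntity:
    text: str
    entity_type: str
    codes: dict = field(default_factory=dict)
    negated: bool = False


def is_negated(sentence, mention, window=5):
    """Return True if a trigger appears within `window` tokens before the mention."""
    lowered = sentence.lower()
    start = lowered.find(mention.lower())
    if start < 0:
        return False  # mention not present in this sentence
    context = " ".join(lowered[:start].split()[-window:])
    return any(
        re.search(r"\b" + re.escape(trigger) + r"\b", context)
        for trigger in NEGATION_TRIGGERS
    )


sentence = "No evidence of unstable angina."
entity = NamedEntity(
    text="unstable angina",
    entity_type="disease/disorder",
    codes={"SNOMED": "4557003", "UMLS_CUI": "C0002965"},
)
entity.negated = is_negated(sentence, entity.text)
```

After this runs, `entity.negated` is `True`, matching the slide's "Negation: true". The same function returns `False` for "Patient has unstable angina."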

cTAKES Configuration Options
- XML configuration files
- Control many things, such as:
  - Dictionary location
  - Dictionary format
  - Which dictionaries to use
  - Type of input (plain text or CDA)
- Forums contain details on creating your own dictionary
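To show the kind of settings listed above, here is a Python sketch that reads a made-up XML configuration with the standard library. The element and attribute names (`pipelineConfig`, `input`, `dictionary`, `enabled`, and so on) and the file paths are hypothetical and do not reflect the actual cTAKES descriptor format; consult the OHNLP documentation and forums for the real files.

```python
import xml.etree.ElementTree as ET

# Hypothetical configuration; names and paths are illustrative only.
CONFIG_XML = """
<pipelineConfig>
  <input type="plaintext"/>
  <dictionaries>
    <dictionary name="disorders" format="csv" location="resources/disorders.csv" enabled="true"/>
    <dictionary name="drugs" format="lucene" location="resources/drug_index" enabled="false"/>
  </dictionaries>
</pipelineConfig>
"""


def load_config(xml_text):
    """Return the input type and the list of enabled dictionaries."""
    root = ET.fromstring(xml_text)
    input_type = root.find("input").get("type")
    dictionaries = [
        {
            "name": d.get("name"),
            "format": d.get("format"),
            "location": d.get("location"),
        }
        for d in root.iter("dictionary")
        if d.get("enabled") == "true"  # "which dictionaries to use"
    ]
    return {"input_type": input_type, "dictionaries": dictionaries}


config = load_config(CONFIG_XML)
```

Here `config["input_type"]` is `"plaintext"` and only the `disorders` dictionary is selected, since the `drugs` entry is disabled.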

cTAKES Methods
Guergana K Savova, James J Masanz, Philip V Ogren, Jiaping Zheng, Sunghwan Sohn, Karin C Kipper-Schuler, Christopher G Chute. Mayo clinical Text Analysis and Knowledge Extraction System (cTAKES): architecture, component evaluation and applications. JAMIA 2010;17:507-513.

References
http://www.ohnlp.org
http://uima.apache.org