A knowledge-based text annotation tool


Knowtator: A knowledge-based text annotation tool

Philip Ogren (Philip.Ogren@uchsc.edu) Larry Hunter, PhD (Larry.Hunter@uchsc.edu) Center for Computational Pharmacology University of Colorado Health Sciences Center Aurora, CO

Availability: bionlp.sourceforge.net/Knowtator. Source code will be available under MPL soon. Comments and suggestions welcome! This work was supported by NIH grant R01-LM008111.

Knowtator is a general-purpose text annotation tool and a Protégé plugin.

Knowtator screenshot

Synopsis: Knowtator facilitates the manual creation of training and evaluation corpora for a variety of biomedical language processing tasks. Knowtator’s key strength is the ability to define an annotation schema using a Protégé knowledge base.

Features
Stand-off annotation: original text is not modified. A simple API allows annotation of any arbitrary text source.
Inter-annotator agreement metrics.
Annotation filters: all annotations are assigned an annotator and (optionally) one or more annotation sets. Annotations of many types, from multiple annotators and annotation sets, can clutter the user interface. Filters allow viewing selected annotations.
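The stand-off and filtering ideas above can be illustrated with a minimal Python sketch. This is not Knowtator's API (Knowtator is a Protégé plugin). The names `Annotation`, `annotate`, and `filter_annotations`, the annotator and annotation-set names, and the sample sentence are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    """A stand-off annotation: character offsets into a text plus metadata."""
    start: int
    end: int
    label: str
    annotator: str
    annotation_sets: frozenset = frozenset()  # optional set membership

    def covered_text(self, text: str) -> str:
        # The span is looked up in the text on demand; the text is never edited.
        return text[self.start:self.end]


def annotate(text, phrase, label, annotator, annotation_sets=()):
    """Hypothetical helper: annotate the first occurrence of `phrase` in `text`."""
    start = text.index(phrase)
    return Annotation(start, start + len(phrase), label, annotator,
                      frozenset(annotation_sets))


def filter_annotations(annotations, annotator=None, annotation_set=None):
    """Keep only annotations matching the given annotator and/or annotation set."""
    return [
        a for a in annotations
        if (annotator is None or a.annotator == annotator)
        and (annotation_set is None or annotation_set in a.annotation_sets)
    ]


# Hypothetical sample data.
TEXT = "Endocytosis of thromboxane A2 receptor from the cell surface"
ANNOTATIONS = [
    annotate(TEXT, "Endocytosis", "endocytosis", "annotator_a", {"gold"}),
    annotate(TEXT, "thromboxane A2 receptor", "molecule", "annotator_a", {"gold"}),
    annotate(TEXT, "thromboxane A2 receptor", "molecule", "annotator_b"),
]

if __name__ == "__main__":
    for a in filter_annotations(ANNOTATIONS, annotator="annotator_a"):
        print(a.label, "->", a.covered_text(TEXT))
```

Because annotations hold only offsets and metadata, several annotators can mark the same span without interfering with each other, and a filter is just a selection over that metadata.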

Knowtator annotation schemas are defined by a Protégé knowledge base. Biological and linguistic concepts can be modeled in Protégé.

Entities in an annotation schema are defined by Protégé class definitions. Protégé slots and constraints on those slots can be used to relate annotations in meaningful ways.
Figure: class definition for endocytosis

Example: endocytosis annotation. Annotations of endocytosis relate to annotations of cellular component and molecule via the slot definitions of the endocytosis class definition.
Six slots of endocytosis:
location: filled by cellular component annotations
origin: subslot of location
destination: subslot of location
transport participants: filled by molecule annotations
transported entities: subslot of transport participants
transporters: subslot of transport participants
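The slot hierarchy above can be written down as plain data. The following Python sketch is illustrative only and does not use Protégé. It assumes that a subslot inherits the filler constraint of its parent slot, which the slides do not state explicitly. The names `SLOT_PARENT`, `SLOT_FILLER`, `allowed_filler`, and `can_fill` are hypothetical.

```python
# Parent of each of the six endocytosis slots (None marks a top-level slot).
SLOT_PARENT = {
    "location": None,
    "origin": "location",
    "destination": "location",
    "transport participants": None,
    "transported entities": "transport participants",
    "transporters": "transport participants",
}

# Filler constraints stated on the top-level slots.
SLOT_FILLER = {
    "location": "cellular component",
    "transport participants": "molecule",
}


def allowed_filler(slot):
    """Return the annotation type that may fill `slot`.

    Assumption: a subslot inherits its constraint from the nearest
    ancestor slot that declares one.
    """
    if slot not in SLOT_PARENT:
        raise KeyError(f"unknown slot: {slot}")
    current = slot
    while current is not None:
        if current in SLOT_FILLER:
            return SLOT_FILLER[current]
        current = SLOT_PARENT[current]
    raise ValueError(f"no filler constraint found for slot: {slot}")


def can_fill(slot, annotation_type):
    """True if an annotation of `annotation_type` may fill `slot`."""
    return allowed_filler(slot) == annotation_type


if __name__ == "__main__":
    for slot in SLOT_PARENT:
        print(f"{slot}: filled by {allowed_filler(slot)} annotations")
```

Under that inheritance assumption, `origin` and `destination` accept cellular component annotations, while `transported entities` and `transporters` accept molecule annotations.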

Example endocytosis annotation

Knowtator data model: The goal of Knowtator is to create mappings between concepts represented in a knowledge base and texts that talk about those concepts.

The Knowtator data model has three parts:
1. Ontology/knowledge base of concepts and relationships (Protégé frames)
2. Mentions of concepts and assertions about relationships between concepts found in text
3. A mapping between the target text and members of 1 and 2 (annotations)

Figure: the three parts of the data model (I. Ontology/KB, II. Mentions/Assertions, III. Annotations), illustrated with "Endocytosis of molecule with thromboxane A2 receptor from endosome to cell surface"
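One way to picture the three-part data model is the Python sketch below. It is a simplified illustration, not Knowtator's actual classes (those are Protégé frames). The class names `Concept`, `Mention`, and `Annotation`, the helpers `annotate` and `text_for`, and the sample sentence are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Concept:
    """Part 1: a concept from the ontology/knowledge base."""
    name: str


@dataclass
class Mention:
    """Part 2: a mention of a concept, with assertions stored as slot values."""
    concept: Concept
    slot_values: dict = field(default_factory=dict)  # slot name -> list of Mention


@dataclass
class Annotation:
    """Part 3: a mapping from a span of the target text to a mention."""
    start: int
    end: int
    mention: Mention


def annotate(text, phrase, mention):
    """Hypothetical helper: map the first occurrence of `phrase` to `mention`."""
    start = text.index(phrase)
    return Annotation(start, start + len(phrase), mention)


# Part 1: concepts.
ENDOCYTOSIS = Concept("endocytosis")
MOLECULE = Concept("molecule")
CELLULAR_COMPONENT = Concept("cellular component")

# Hypothetical target text.
TEXT = "endocytosis of the receptor from the cell surface"

# Part 2: mentions, plus assertions relating them through slots.
receptor = Mention(MOLECULE)
cell_surface = Mention(CELLULAR_COMPONENT)
event = Mention(ENDOCYTOSIS, {
    "transported entities": [receptor],
    "origin": [cell_surface],
})

# Part 3: annotations tie each mention to a span of the text.
ANNOTATIONS = [
    annotate(TEXT, "endocytosis", event),
    annotate(TEXT, "receptor", receptor),
    annotate(TEXT, "cell surface", cell_surface),
]


def text_for(mention):
    """Return the text span annotated with `mention`, or None if there is none."""
    for a in ANNOTATIONS:
        if a.mention is mention:  # identity, since equal-looking mentions are distinct
            return TEXT[a.start:a.end]
    return None


if __name__ == "__main__":
    for slot, fillers in event.slot_values.items():
        print(slot, "->", [text_for(m) for m in fillers])
```

Keeping mentions separate from annotations means the relationships between concepts live in part 2, while part 3 only records where in the text each mention was found.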

To do:
report on annotation efforts
mechanism for semi-automated annotation
import/export scripts for other annotation formats (e.g. ATLAS)