Combining GATE and UIMA

Slides:



Advertisements
Similar presentations
Copyright © 2008 Pearson Education, Inc. Publishing as Pearson Addison-Wesley Chapter 15 Introduction to Rails.
Advertisements

Bringing Procedural Knowledge to XLIFF Prof. Dr. Klemens Waldhör TAUS Labs & FOM University of Applied Science FEISGILTT 16 October 2012 Seattle, USA.
An Introduction to GATE
Impact of OASIS UIMA Standard on Apache UIMA OASIS Unstructured Information Management Architecture (UIMA) TC
University of Sheffield NLP Module 11: Advanced Machine Learning.
Snejina Lazarova Senior QA Engineer, Team Lead CRMTeam Dimo Mitev Senior QA Engineer, Team Lead SystemIntegrationTeam Telerik QA Academy SOAP-based Web.
Feature requests for Case Manager By Spar Nord Bank A/S IBM Insight 2014 Spar Nord Bank A/S1.
Experiences with UIMA in NLP teaching and research Manuela Kunze, Dietmar Rösner University of Magdeburg C Knowledge Based Systems and Document Processing.
Text Analytics on UIMA and UIMA Semantic Search Engine ISM209 David Lewis Student Project Presentation
UIMA Overview Fall 2005 OOPD John Anthony. UIMA Conceptual Overview.
Use Case Modelling Visual Annotator for studying ICU Notes Bacchus Beale.
UIMA Introduction SHARPn Summit June 11, 2012
UNIT-V The MVC architecture and Struts Framework.
GRID job tracking and monitoring Dmitry Rogozin Laboratory of Particle Physics, JINR 07/08/ /09/2006.
Implementation Yaodong Bi. Introduction to Implementation Purposes of Implementation – Plan the system integrations required in each iteration – Distribute.
IBM User Technologies 11 / 2004 © 2004 IBM Corporation Information development with DITA Ian Larner User Technologies, IBM Hursley Lab, England
© 2006 IBM Corporation IBM WebSphere Portlet Factory Architecture.
Interoperability in Information Schemas Ruben Mendes Orientador: Prof. José Borbinha MEIC-Tagus Instituto Superior Técnico.
1 CSC 221: Introduction to Programming Fall 2012 Functions & Modules  standard modules: math, random  Python documentation, help  user-defined functions,
4/2/03I-1 © 2001 T. Horton CS 494 Object-Oriented Analysis & Design Software Architecture and Design Readings: Ambler, Chap. 7 (Sections to start.
UIMA SHARP 4 - NLP May 25, Outline UIMA Terminology (not just TLAs) Parts of a UIMA pipeline Running a pipeline Viewing annotations Creating a new.
WordFreak A Language Independent, Extensible Annotation Tool.
Querying Structured Text in an XML Database By Xuemei Luo.
11 CORE Architecture Mauro Bruno, Monica Scannapieco, Carlo Vaccari, Giulia Vaste Antonino Virgillito, Diego Zardetto (Istat)
EXist Indexing Using the right index for you data Date: 9/29/2008 Dan McCreary President Dan McCreary & Associates (952) M.
Web Services Standards. Introduction A web service is a type of component that is available on the web and can be incorporated in applications or used.
Introduction to GATE Developer Ian Roberts. University of Sheffield NLP Overview The GATE component model (CREOLE) Documents, annotations and corpora.
Grid Computing at Yahoo! Sameer Paranjpye Mahadev Konar Yahoo!
Introduction to programming in the Java programming language.
Efficient RDF Storage and Retrieval in Jena2 Written by: Kevin Wilkinson, Craig Sayers, Harumi Kuno, Dave Reynolds Presented by: Umer Fareed 파리드.
©2003 Paula Matuszek Taken primarily from a presentation by Lin Lin. CSC 9010: Text Mining Applications.
IBM Research © Copyright IBM Corporation 2005 | A Development Environment for Configurable Meta-Annotators in a Pipelined NLP Architecture Youssef Drissi,
Xml:tm XML Text Memory Using XML technology to reduce the cost of translating XML documents.
Fall 2013, Databases, Exam 2 Questions for the second exam…
Copyright © 2006 Pilothouse Consulting Inc. All rights reserved. Search Overview Search Features: WSS and Office Search Architecture Content Sources and.
Combining GATE and UIMA Ian Roberts. University of Sheffield NLP 2 Overview Introduction to UIMA Comparison with GATE Mapping annotations between GATE.
Working With Objects Tonga Institute of Higher Education.
Dean Anderson Polk County, Oregon GIS in Action 2014 Modifying Open Source Software (A Case Study)
Software Engineering for Business Information Systems (sebis) Department of Informatics Technische Universität München, Germany wwwmatthes.in.tum.de A.
Reviews Crawler (Detection, Extraction & Analysis) FOSS Practicum By: Syed Ahmed & Rakhi Gupta April 28, 2010.
The Instruction Set Architecture. Hardware – Software boundary Java Program C Program Ada Program Compiler Instruction Set Architecture Microcode Hardware.
Combining GATE and UIMA Ian Roberts. 2 Overview Introduction to UIMA Comparison with GATE Mapping annotations between GATE and UIMA.
CMS Experience with the Common Analysis Framework I. Fisk & M. Girone Experience in CMS with the Common Analysis Framework Ian Fisk & Maria Girone 1.
Apache Cocoon – XML Publishing Framework 데이터베이스 연구실 박사 1 학기 이 세영.
1 Introduction to Engineering Spring 2007 Lecture 18: Digital Tools 2.
Introduction toData structures and Algorithms
Unit 2 Technology Systems
Type Checking Generalizes the concept of operands and operators to include subprograms and assignments Type checking is the activity of ensuring that the.
Spark Presentation.
Programming the Web Using Visual Studio .NET
DBW - PHP DBW2017.
ECE 551: Digital System Design & Synthesis
Flexible Extensible Digital Object Repository Architecture
Flexible Extensible Digital Object Repository Architecture
Data Modeling II XML Schema & JAXB Marc Dumontier May 4, 2004
Extraction, aggregation and classification at Web Scale
Overview of Hadoop MapReduce MapReduce is a soft work framework for easily writing applications which process vast amounts of.
CS6604 Digital Libraries IDEAL Webpages Presented by
Chapter 11 Introduction to Programming in C
Chapter 10 Thinking in Objects
Chapter 9 Web Services: JAX-RPC, WSDL, XML Schema, and SOAP
Chapter 15 Introduction to Rails.
Collections Not in our text.
Introduction C is a general-purpose, high-level language that was originally developed by Dennis M. Ritchie to develop the UNIX operating system at Bell.
Lecture 8 Information Retrieval Introduction
Introduction to Data Structure
JavaScript is a scripting language designed for Web pages by Netscape.
Oriented Design and Abstract Data Type
SPL – PS1 Introduction to C++.
Presentation transcript:

Combining GATE and UIMA Ian Roberts

Overview Introduction to UIMA Comparison with GATE Mapping annotations between GATE and UIMA Examples and demo

What is UIMA? Language processing framework developed by IBM Similar document processing pipeline architecture to GATE Concentrates on performance and scalability Supports components written in different programming languages (currently Java and C++) Native support for distributed processing via web services

UIMA Terminology Processing tasks in UIMA are encapsulated in Analysis Engines (AEs) Text-specific processing by Text Analysis Engines (TAEs) In UIMA, AEs can be primitive (~ a single PR in GATE terms), or aggregate (~ a GATE controller). Aggregate AE can include other primitive or aggregate AEs GATE includes interoperability layer to run GATE controller as a primitive TAE in UIMA UIMA TAE (primitive or aggregate) as a GATE PR

UIMA and GATE In GATE, unit of processing is the Document Text, plus features, plus annotations Annotations can have arbitrary features, with any Java object as value In UIMA, unit of processing is CAS (common analysis structure) Text, plus Feature Structures Annotations are just a special kind of FS, which includes start and end offset features

Key Differences In GATE, annotations can have any features, with any values In UIMA, feature structures are strongly typed Must declare what types of annotations are supported by each analysis engine Must specify what features each annotation type supports Must specify what type feature values may take Primitive types - string, integer, float Reference types - reference to another FS in the CAS Arrays of the above All defined in XML descriptor for the AE

Integrating GATE and UIMA So the problem is to map between the loosely-typed GATE world and the strongly-typed UIMA world Best explained by example…

Example 1 Simple UIMA annotator that annotates each instance of the word “Goldfish” in a document. Does not need any input annotations Produces output annotations of type gate.example.Goldfish

Example 1 GATE UIMA This is a document that talks about Goldfish… Create UIMA doc This is a document that talks about Goldfish… Add GATE annotation of type Goldfish at the corresponding place Goldfish Annotator adds annotation of type gate.example.Goldfish Goldfish Copy annotations back Run UIMA annotator

Example 2 We may want to copy annotations, as well as text, from the original GATE document. Consider a UIMA annotator that takes gate.example.Sentence annotations as input annotates “Goldfish” as before also adds a feature GoldfishCount to each Sentence giving the number of goldfish annotations in that sentence

Example 2 GATE UIMA Create UIMA doc, with Sentence annotations numFish = 1 …and adding a feature to each Sentence GoldfishCount = 1 This is a document that talks about Goldfish. Goldfish are easy to look after, and … This is a document that talks about Goldfish. Goldfish are easy to look after, and … Goldfish Goldfish Run UIMA annotator, annotating Goldfish as before… Copy Goldfish annotations back… … and also want to copy new feature back to original Sentences We need an index linking UIMA annotations to the GATE annotations they came from

Defining the mapping The mapping must be defined by the user in XML <uimaGateMapping> <inputs> <uimaAnnotation type="gate.example.Sentence" gateType="Sentence" indexed="true"/> </inputs> … create a UIMA annotation of this type at the same place… For each GATE annotation of type Sentence… … and remember this mapping Can copy features as well, but not in this example. Don’t index by default for efficiency in distributed operation

Defining the mapping (2) <outputs> <added> <gateAnnotation type="Goldfish" uimaType="gate.example.Goldfish" /> </added> <updated> <gateAnnotation type="Sentence" uimaType="gate.example.Sentence"> <feature name="numFish"> <uimaFSFeatureValue name="gate.example.Sentence:GoldfishCount" kind="int" /> </feature> </gateAnnotation> </updated> </outputs> </uimaGateMapping> … create a GATE annotation at the same place For each UIMA annotation of this type… … find the GATE annotation it came from… For each UIMA annotation of this type… … and set its numFish feature… … to the value of the GoldfishCount feature of the UIMA annotation. Notice that GATE/UIMA feature names need not match