UIMA Introduction
SHARPn Summit, June 11, 2012
Hi, I’m James Masanz from Mayo Clinic. Welcome.
Outline
UIMA terminology (not just TLAs)
Parts of a UIMA pipeline
Running a pipeline
Viewing annotations interactively
UIMA Terminology
CAS, XCAS, JCas, View
Analysis Engine (AE) / Annotator
XML output: XCAS, XMI
Type System, JCasGen
CAS Visual Debugger (CVD)
CPE (Collection Processing Engine)
We will not have time to cover all of these today. Most of these terms are UIMA-specific.
UIMA: Framework and Tooling
Framework: defining data types; passing data from one component to another
Tooling: viewing results; debugging; editing XML visually
UIMA (Unstructured Information Management Architecture) is both a framework and a set of tools built around that framework.
Data Through a Pipeline
Type System: defines the data types passed along the pipeline
CAS (Common Analysis Structure): the container for the data passed along; created by UIMA from the Type System
The Type System defines which data types are allowed in the CAS. A CAS is a runtime data structure.
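As a rough mental model only, the Type System/CAS relationship can be sketched in plain Java. The classes `ToyTypeSystem`, `ToyCas`, and `ToyAnnotation` below are hypothetical and are not the UIMA API; in UIMA itself the type system comes from an XML descriptor, and JCasGen generates Java classes for its types. The sketch captures just two ideas from the slide: the type system lists the allowed types, and the CAS is a runtime container for the document text plus annotations of those types. It assumes Java 16 or later (records).

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Set;

// Hypothetical toy model, NOT the UIMA API.
// The "type system" is just the set of annotation type names the CAS may hold.
final class ToyTypeSystem {
    private final Set<String> typeNames;

    ToyTypeSystem(Set<String> typeNames) {
        this.typeNames = Set.copyOf(typeNames);
    }

    boolean defines(String typeName) {
        return typeNames.contains(typeName);
    }
}

// An annotation: a type name plus begin/end character offsets into the document text.
record ToyAnnotation(String type, int begin, int end) {}

// The "CAS": a runtime container for the document text and its annotations,
// created from (and checked against) a type system.
final class ToyCas {
    private final ToyTypeSystem typeSystem;
    private final String documentText;
    private final List<ToyAnnotation> annotations = new ArrayList<>();

    ToyCas(ToyTypeSystem typeSystem, String documentText) {
        this.typeSystem = typeSystem;
        this.documentText = documentText;
    }

    // Reject any annotation whose type the type system does not declare.
    void add(ToyAnnotation annotation) {
        if (!typeSystem.defines(annotation.type())) {
            throw new IllegalArgumentException("Type not in type system: " + annotation.type());
        }
        annotations.add(annotation);
    }

    String coveredText(ToyAnnotation annotation) {
        return documentText.substring(annotation.begin(), annotation.end());
    }

    List<ToyAnnotation> annotations() {
        return Collections.unmodifiableList(annotations);
    }
}

final class TypeSystemDemo {
    public static void main(String[] args) {
        ToyCas cas = new ToyCas(new ToyTypeSystem(Set.of("Token")), "Family history");
        cas.add(new ToyAnnotation("Token", 0, 6));
        System.out.println(cas.coveredText(cas.annotations().get(0))); // Family
        try {
            cas.add(new ToyAnnotation("Sentence", 0, 14)); // "Sentence" is not declared
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage()); // Type not in type system: Sentence
        }
    }
}
```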
Parts of a UIMA Pipeline
Collection Reader: reads input documents
Analysis Engine(s) / Annotator(s): process each document
CAS Consumer: outputs data
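The division of labor among these three parts can be illustrated with a minimal plain-Java sketch. The interfaces `ToyReader`, `ToyAnnotator`, and `ToyConsumer`, the `Doc` container, and the `ToyPipeline.run` driver are all hypothetical stand-ins for the roles, not UIMA classes. In UIMA the framework (for example, a CPE) does the driving, and the per-document container is a CAS.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical toy model of the three pipeline roles, NOT the UIMA API.

// Per-document container passed from component to component (stands in for a CAS).
final class Doc {
    final String text;
    final List<String> results = new ArrayList<>();

    Doc(String text) {
        this.text = text;
    }
}

// Collection Reader role: supplies input documents one at a time.
interface ToyReader {
    boolean hasNext();
    Doc next();
}

// Analysis Engine / Annotator role: analyzes a document and records results on it.
interface ToyAnnotator {
    void process(Doc doc);
}

// CAS Consumer role: receives each fully processed document and outputs data.
interface ToyConsumer {
    void consume(Doc doc);
}

// A reader over an in-memory list (a real one might read files from a directory).
final class ListReader implements ToyReader {
    private final Deque<String> pending;

    ListReader(List<String> texts) {
        this.pending = new ArrayDeque<>(texts);
    }

    @Override public boolean hasNext() { return !pending.isEmpty(); }
    @Override public Doc next() { return new Doc(pending.poll()); }
}

final class ToyPipeline {
    // Each document flows: reader -> every annotator, in order -> consumer.
    static void run(ToyReader reader, List<ToyAnnotator> annotators, ToyConsumer consumer) {
        while (reader.hasNext()) {
            Doc doc = reader.next();
            for (ToyAnnotator annotator : annotators) {
                annotator.process(doc);
            }
            consumer.consume(doc);
        }
    }

    public static void main(String[] args) {
        ToyAnnotator wordCounter = doc -> doc.results.add("words=" + doc.text.split("\\s+").length);
        run(new ListReader(List.of("Family history of hyperlipidemia.", "No diabetes.")),
            List.of(wordCounter),
            doc -> System.out.println(doc.text + " " + doc.results));
        // Family history of hyperlipidemia. [words=4]
        // No diabetes. [words=2]
    }
}
```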
Tying a Pipeline Together
CPE (Collection Processing Engine) descriptor: ties together the Collection Reader, Analysis Engine(s), and CAS Consumer
Aggregate Analysis Engine: combines multiple Analysis Engines and specifies their order
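One way to picture an aggregate Analysis Engine is as a composite: it is itself an Analysis Engine, and when asked to process a document it runs its delegate engines in a declared order. The sketch below models only that idea in plain Java; `ToyEngine`, `NamedEngine`, and `ToyAggregate` are hypothetical names, and in UIMA the delegates and their order are declared in the aggregate's XML descriptor rather than in code. It assumes Java 16 or later (records).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical toy model, NOT the UIMA API.
// An engine processes a document; here the "document" is just a trace of which engines ran.
interface ToyEngine {
    void process(List<String> trace);
}

// A primitive engine that records its own name when it runs.
record NamedEngine(String name) implements ToyEngine {
    @Override
    public void process(List<String> trace) {
        trace.add(name);
    }
}

// An aggregate is itself a ToyEngine, so it can be used (or nested) anywhere
// a single engine can. It runs its delegates in the order they were given.
final class ToyAggregate implements ToyEngine {
    private final List<ToyEngine> delegates;

    ToyAggregate(List<ToyEngine> delegates) {
        this.delegates = List.copyOf(delegates);
    }

    @Override
    public void process(List<String> trace) {
        for (ToyEngine delegate : delegates) {
            delegate.process(trace);
        }
    }

    public static void main(String[] args) {
        ToyEngine tokenAndPos = new ToyAggregate(
            List.of(new NamedEngine("tokenizer"), new NamedEngine("pos-tagger")));
        ToyEngine pipeline = new ToyAggregate(
            List.of(new NamedEngine("sentence-detector"), tokenAndPos));
        List<String> trace = new ArrayList<>();
        pipeline.process(trace);
        System.out.println(trace); // [sentence-detector, tokenizer, pos-tagger]
    }
}
```

Because the aggregate implements the same interface as its delegates, nesting works for free, which mirrors how a UIMA aggregate can be loaded wherever a single Analysis Engine is expected.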
Pipeline Example (UIMA term: example)
Collection Reader: read files from a directory
Analysis Engines: sentence detector, tokenizer annotator, part-of-speech tagger
CAS Consumer: output tokens to a database
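The Analysis Engine row can be made concrete with deliberately naive plain-Java stand-ins for a sentence detector and a tokenizer, followed by a consumer step. Everything here is hypothetical: `Span` and `ExamplePipeline` are not UIMA or cTAKES classes, the splitting rules are simplistic, the part-of-speech tagger is omitted, and the "consumer" returns rows in a list where a real CAS Consumer might insert them into a database. The tokenizer works within the sentence detector's output, which is one reason the order of Analysis Engines matters. It assumes Java 16 or later (records).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical toy stand-ins, NOT UIMA or cTAKES code.
// An annotation is a typed span of character offsets into the document text.
record Span(String type, int begin, int end) {}

final class ExamplePipeline {
    // "Sentence detector": naively ends a sentence at '.', '!' or '?'.
    static List<Span> detectSentences(String text) {
        List<Span> sentences = new ArrayList<>();
        int start = 0;
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c == '.' || c == '!' || c == '?') {
                addTrimmed(text, start, i + 1, sentences);
                start = i + 1;
            }
        }
        addTrimmed(text, start, text.length(), sentences); // trailing text without final punctuation
        return sentences;
    }

    // Adds a Sentence span after trimming surrounding whitespace; skips empty spans.
    private static void addTrimmed(String text, int begin, int end, List<Span> out) {
        while (begin < end && Character.isWhitespace(text.charAt(begin))) begin++;
        while (end > begin && Character.isWhitespace(text.charAt(end - 1))) end--;
        if (begin < end) out.add(new Span("Sentence", begin, end));
    }

    // "Tokenizer annotator": runs of letters/digits inside each sentence become Tokens.
    static List<Span> tokenize(String text, List<Span> sentences) {
        List<Span> tokens = new ArrayList<>();
        for (Span sentence : sentences) {
            int i = sentence.begin();
            while (i < sentence.end()) {
                if (Character.isLetterOrDigit(text.charAt(i))) {
                    int j = i;
                    while (j < sentence.end() && Character.isLetterOrDigit(text.charAt(j))) j++;
                    tokens.add(new Span("Token", i, j));
                    i = j;
                } else {
                    i++; // skip punctuation and whitespace
                }
            }
        }
        return tokens;
    }

    // "CAS Consumer": collects each token's covered text; a real one might write rows to a DB.
    static List<String> outputTokens(String text, List<Span> tokens) {
        List<String> rows = new ArrayList<>();
        for (Span token : tokens) rows.add(text.substring(token.begin(), token.end()));
        return rows;
    }

    public static void main(String[] args) {
        String text = "Family history of hyperlipidemia. No diabetes.";
        List<Span> sentences = detectSentences(text);
        List<Span> tokens = tokenize(text, sentences);
        System.out.println(sentences.size() + " sentences"); // 2 sentences
        System.out.println(outputTokens(text, tokens));
        // [Family, history, of, hyperlipidemia, No, diabetes]
    }
}
```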
UIMA Plugin for Eclipse
Provides visual editors for descriptors: a mini GUI for selecting options, rather than editing the XML directly
An “update site” is available for installing the plugin: http://www.apache.org/dist/incubator/uima/eclipse-update-site
UIMA Tooling Options
Tools: CPE Configurator; CVD (CAS Visual Debugger)
Options: command-line scripts/.bat files; run within Eclipse
Running a Pipeline: CPE
cTAKES provides a script and a .bat file, runctakesCPE
Choose a CPE descriptor, such as test_plaintext.xml from cTAKESdesc/cdpdesc/collection_processing_engine
Viewing Annotations: CVD
To view annotations using the CVD:
Load the Type System
Load the XCAS or XMI file
Annotation Viewers
UIMA tools: CVD (CAS Visual Debugger); Annotation Viewer
Viewing the XML output directly: any XML viewer; any text editor
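Because XMI output is ordinary XML, even a generic XML parser can pull out the annotations for a quick look. The sketch below uses only the JDK's DOM parser; `XmiPeek` and its sample are hypothetical. The sample is hand-written and abbreviated in the general shape of a UIMA XMI file (a `cas:Sofa` element holding the document text in `sofaString`, and annotation elements carrying `begin`/`end` offsets); the `ex:Token` type and its namespace are invented for illustration, and the code assumes a single view (one Sofa). It assumes Java 16 or later.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

final class XmiPeek {
    // Hand-written, abbreviated sample; "ex:Token" and its namespace are hypothetical.
    static final String SAMPLE = """
        <?xml version="1.0" encoding="UTF-8"?>
        <xmi:XMI xmlns:xmi="http://www.omg.org/XMI" xmlns:cas="http:///uima/cas.ecore"
                 xmlns:tcas="http:///uima/tcas.ecore" xmlns:ex="http:///org/example.ecore"
                 xmi:version="2.0">
          <cas:NULL xmi:id="0"/>
          <cas:Sofa xmi:id="1" sofaNum="1" sofaID="_InitialView" mimeType="text"
                    sofaString="Family history of hyperlipidemia."/>
          <tcas:DocumentAnnotation xmi:id="8" sofa="1" begin="0" end="33" language="x-unspecified"/>
          <ex:Token xmi:id="13" sofa="1" begin="0" end="6"/>
          <cas:View sofa="1" members="8 13"/>
        </xmi:XMI>
        """;

    // Returns "tag[begin,end] coveredText" for each top-level element with begin/end offsets.
    static List<String> listAnnotations(String xmi) {
        Element root;
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            Document doc = factory.newDocumentBuilder()
                .parse(new ByteArrayInputStream(xmi.getBytes(StandardCharsets.UTF_8)));
            root = doc.getDocumentElement();
        } catch (ParserConfigurationException | SAXException | IOException ex) {
            throw new IllegalArgumentException("Could not parse XMI", ex);
        }
        NodeList children = root.getChildNodes();
        String sofaText = "";
        for (int i = 0; i < children.getLength(); i++) {
            if (children.item(i) instanceof Element el && "Sofa".equals(el.getLocalName())) {
                sofaText = el.getAttribute("sofaString"); // assumes a single Sofa (one view)
            }
        }
        List<String> out = new ArrayList<>();
        for (int i = 0; i < children.getLength(); i++) {
            if (children.item(i) instanceof Element el && el.hasAttribute("begin") && el.hasAttribute("end")) {
                int begin = Integer.parseInt(el.getAttribute("begin"));
                int end = Integer.parseInt(el.getAttribute("end"));
                out.add(el.getTagName() + "[" + begin + "," + end + "] " + sofaText.substring(begin, end));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        listAnnotations(SAMPLE).forEach(System.out::println);
        // tcas:DocumentAnnotation[0,33] Family history of hyperlipidemia.
        // ex:Token[0,6] Family
    }
}
```

This only lists spans; features that reference other objects by `xmi:id` are not resolved, which is one reason a type-system-aware viewer such as the CVD is more convenient.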
Questions? http://uima.apache.org/ masanz.james@mayo.edu
Supplemental slides follow
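Options to Run a Pipeline
CPE GUI
CVD GUI: runs a single (possibly aggregate) Analysis Engine, with no Collection Reader
Programmatically: instantiate a CpeDescription, produce a CollectionProcessingEngine from it, and invoke its process() method
uimaFIT: removes the dependency on XML descriptors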
Creating a New Annotator Within Eclipse
Create a Java project
Right-click the project -> Add UIMA Nature
Add the UIMA jars to the .classpath (Build Path)
Create an Analysis Engine (AE) descriptor
Add types to the AE descriptor, or optionally create a separate Type System descriptor
Write code!
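For the "Write code!" step, the basic shape of an annotator is: the framework hands your class a CAS, and your process method adds annotations to it. In real UIMA code an annotator commonly extends `JCasAnnotator_ImplBase` and overrides `process(JCas)`, creating instances of JCasGen-generated types and calling `addToIndexes()` on them. So that the sketch runs with only the JDK, it mirrors that shape with hypothetical classes (`ToyCas`, `ToyAnnotatorBase`, `DisorderAnnotator`, `Mention`) instead of the UIMA API. The dictionary matching is deliberately naive (no word-boundary checks). It assumes Java 16 or later (records).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical toy classes that mirror the shape of a UIMA annotator; NOT the UIMA API.

// An annotation: a type name plus begin/end character offsets.
record Mention(String type, int begin, int end) {}

// Stand-in for a CAS: the document text plus an "index" of annotations.
final class ToyCas {
    final String text;
    final List<Mention> index = new ArrayList<>();

    ToyCas(String text) {
        this.text = text;
    }
}

// Plays the role JCasAnnotator_ImplBase plays in UIMA: subclasses override process().
abstract class ToyAnnotatorBase {
    abstract void process(ToyCas cas);
}

// Example annotator: marks each case-insensitive occurrence of a dictionary term.
final class DisorderAnnotator extends ToyAnnotatorBase {
    private final List<String> terms;

    DisorderAnnotator(List<String> terms) {
        this.terms = List.copyOf(terms);
    }

    @Override
    void process(ToyCas cas) {
        String text = cas.text;
        for (String term : terms) {
            if (term.isEmpty()) continue; // an empty term would match everywhere
            for (int i = 0; i + term.length() <= text.length(); i++) {
                // regionMatches(ignoreCase=true, ...) compares in place, so offsets stay exact.
                if (text.regionMatches(true, i, term, 0, term.length())) {
                    cas.index.add(new Mention("Disorder", i, i + term.length()));
                }
            }
        }
    }

    public static void main(String[] args) {
        ToyCas cas = new ToyCas("Family history of hyperlipidemia.");
        new DisorderAnnotator(List.of("hyperlipidemia")).process(cas);
        System.out.println(cas.index); // [Mention[type=Disorder, begin=18, end=32]]
    }
}
```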
Running an AE in CVD
Using the CVD to run an Analysis Engine:
No Collection Reader
A single Analysis Engine (which can be an aggregate)
No CAS Consumer
Steps: load an Analysis Engine, then paste or type in the text to process, for example:
“Family history of hyperlipidemia.”
Modifying a Parameter
UIMA’s descriptor editors let you modify most parameters without looking at the XML itself.
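The point of a descriptor parameter is that behavior changes without a code change: the annotator reads the value when it is initialized. In UIMA, parameters are declared in the descriptor, and an annotator typically reads them in `initialize(UimaContext)` via `getConfigParameterValue(String)`. The plain-Java sketch below imitates that pattern with a `Map` standing in for the context; `TermFinder` and the parameter names `Term` and `CaseSensitive` are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical toy, NOT the UIMA API: the Map stands in for the parameter values
// that a UIMA annotator would read from its UimaContext.
final class TermFinder {
    private String term;
    private boolean caseSensitive;

    // Read parameters once, before any documents are processed.
    void initialize(Map<String, Object> params) {
        Object value = params.get("Term"); // required parameter
        if (!(value instanceof String) || ((String) value).isEmpty()) {
            throw new IllegalArgumentException("Missing required parameter: Term");
        }
        term = (String) value;
        // Optional parameter: defaults to false when absent.
        caseSensitive = Boolean.TRUE.equals(params.get("CaseSensitive"));
    }

    // Returns the begin offset of every match of the configured term.
    List<Integer> process(String text) {
        List<Integer> hits = new ArrayList<>();
        for (int i = 0; i + term.length() <= text.length(); i++) {
            if (text.regionMatches(!caseSensitive, i, term, 0, term.length())) {
                hits.add(i);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        String text = "Family History of hyperlipidemia.";
        TermFinder finder = new TermFinder();
        finder.initialize(Map.of("Term", "history"));
        System.out.println(finder.process(text)); // [7]
        finder.initialize(Map.of("Term", "history", "CaseSensitive", true));
        System.out.println(finder.process(text)); // []
    }
}
```

Only the parameter values differ between the two runs; the code is unchanged, which is the same effect as editing a parameter in the descriptor editor.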
Links
Getting started with UIMA: http://uima.apache.org/doc-uima-annotator.html
UIMA update site for use in Eclipse: http://www.apache.org/dist/incubator/uima/eclipse-update-site
Email address masanz.james@mayo.edu