IBM Research © Copyright IBM Corporation 2005
A Development Environment for Configurable Meta-Annotators in a Pipelined NLP Architecture
Youssef Drissi, Branimir Boguraev, Mary Neff, David Ferrucci, Paul Keyser and Anthony Levas
IBM T.J. Watson Research Center
{youssefd,bran,ferrucci,pkeyser,levas}@us.ibm.com

IBM Research © Copyright IBM Corporation 2003
Outline
Background:
  - Text Analytics
  - Unstructured Information Management Architecture (UIMA)
The Challenges
  - The Consumability Challenges
Our Approach to meet these challenges
  - The Concept-Centric Approach
  - Our Text Analytics Development Cycle
A Scenario (Demo)
  - Detecting sentiments about cars from a corpus of car reviews

Text Analytics
[Figure: the sentence "Fred Center is the CEO of Center Micros" annotated at three levels. Parser: NP, VP, PP. Named Entity: Person, Organization. Relationship: CeoOf (Arg1: Person, Arg2: Org).]
UIMA: Unstructured Information Management Architecture

UIMA: A runtime framework for Text Analytics
UIMA: Unstructured Information Management Architecture
[Figure: data flows through a pipeline of annotators (Tokenizer, POS Tagger, PERSON Finder, COMPANY Finder, CEO Relationship) that produces analysis results for the concepts PERSON, COMPANY and CEO Relationship.]
Models represented by:
  - List of terms
  - Dictionaries
  - Regular expressions
  - Pattern files
  - Statistical models
  - etc.
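The pipeline idea on this slide can be sketched in a few lines of plain Java. This is only an illustration under stated assumptions: `PipelineSketch`, `Document`, `Annotation`, `Annotator`, `termFinder` and `ceoRelationshipFinder` are hypothetical names, not UIMA classes, and the Tokenizer and POS Tagger stages are left out for brevity. Each stage reads the shared document and adds its own annotations, so later stages can build on earlier results.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of an annotator pipeline. None of these types are UIMA classes.
class PipelineSketch {

    /** A typed span over the document text. */
    static final class Annotation {
        final String type;
        final int begin;
        final int end;

        Annotation(String type, int begin, int end) {
            this.type = type;
            this.begin = begin;
            this.end = end;
        }
    }

    /** The text plus every annotation produced so far; shared by all stages. */
    static final class Document {
        final String text;
        final List<Annotation> annotations = new ArrayList<>();

        Document(String text) {
            this.text = text;
        }

        List<Annotation> ofType(String type) {
            List<Annotation> result = new ArrayList<>();
            for (Annotation a : annotations) {
                if (a.type.equals(type)) {
                    result.add(a);
                }
            }
            return result;
        }
    }

    /** One pipeline stage: reads the document and adds annotations to it. */
    interface Annotator {
        void process(Document doc);
    }

    /** Marks every occurrence of a listed term with the given type ("list of terms" model). */
    static Annotator termFinder(String type, List<String> terms) {
        return doc -> {
            for (String term : terms) {
                if (term.isEmpty()) {
                    continue; // an empty term would match at every position
                }
                int pos = doc.text.indexOf(term);
                while (pos >= 0) {
                    doc.annotations.add(new Annotation(type, pos, pos + term.length()));
                    pos = doc.text.indexOf(term, pos + term.length());
                }
            }
        };
    }

    /** Links a PERSON to a later COMPANY when only "is the CEO of" separates them. */
    static Annotator ceoRelationshipFinder() {
        return doc -> {
            List<Annotation> found = new ArrayList<>();
            for (Annotation person : doc.ofType("PERSON")) {
                for (Annotation company : doc.ofType("COMPANY")) {
                    if (person.end <= company.begin
                            && doc.text.substring(person.end, company.begin).trim()
                                    .equals("is the CEO of")) {
                        found.add(new Annotation("CEO_RELATIONSHIP", person.begin, company.end));
                    }
                }
            }
            doc.annotations.addAll(found);
        };
    }

    /** Runs the stages in order over one shared document. */
    static Document run(String text, List<Annotator> pipeline) {
        Document doc = new Document(text);
        for (Annotator stage : pipeline) {
            stage.process(doc);
        }
        return doc;
    }

    /** Example pipeline; the term lists are made up for illustration. */
    static Document demo() {
        return run("Fred Center is the CEO of Center Micros", Arrays.asList(
                termFinder("PERSON", Arrays.asList("Fred Center")),
                termFinder("COMPANY", Arrays.asList("Center Micros")),
                ceoRelationshipFinder()));
    }
}
```

The relationship stage depends on the PERSON and COMPANY annotations, which is why stage order matters in a pipelined design.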

Sample Annotator: Java Code

/**
 * This annotator searches for person titles using simple string matching.
 *
 * @param aTCAS TCAS containing document text and previously discovered
 *   annotations, and to which new annotations are to be written.
 * @param aResultSpec A list of output types and features that this annotator
 *   should produce.
 *
 * @see com.ibm.uima.analysis_engine.annotator.TextAnnotator#process(TCAS, ResultSpecification)
 */
public void process(TCAS aTCAS, ResultSpecification aResultSpec)
    throws AnnotatorProcessException {
  try {
    // If the ResultSpec doesn't include the PersonTitle type, we have
    // nothing to do.
    if (!aResultSpec.containsType("example.PersonTitle")) {
      return;
    }

    if (mContainingType == null) {
      // Search the whole document for PersonTitle annotations
      String text = aTCAS.getDocumentText();
      annotateRange(aTCAS, text, 0, aResultSpec);
    } else {
      // Search only within annotations of type mContainingType

      // Get an iterator over the annotations of type mContainingType.
      FSIterator it = aTCAS.getAnnotationIndex(mContainingType).iterator();
      // Loop over the iterator.
      while (it.isValid()) {
        // Get the next annotation from the iterator
        AnnotationFS annot = (AnnotationFS) it.get();
        // Get text covered by this annotation
        String coveredText = annot.getCoveredText();
        // Get begin position of this annotation
        int annotBegin = annot.getBegin();
        // Search for matches within this annotation
        annotateRange(aTCAS, coveredText, annotBegin, aResultSpec);
        // Advance the iterator.
        it.moveToNext();
      }
    }
  } catch (Exception e) {
    throw new AnnotatorProcessException(e);
  }
}
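The slide does not show `annotateRange`, so the following is only a guess at what "simple string matching" over a text range might look like, written without any UIMA dependency. The class name `TitleMatcherSketch`, the title list, and the two-argument signature are assumptions made for illustration. The one detail taken from the slide is the offset handling: matches found in a covered-text substring are shifted by the range's begin position so they refer to the whole document.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the annotateRange helper that the slide calls but does not show.
class TitleMatcherSketch {

    // Assumed title list, for illustration only.
    static final String[] TITLES = {"Mr.", "Mrs.", "Dr.", "CEO"};

    /**
     * Finds every title in the given text and returns {begin, end} pairs.
     * rangeBegin is the position of the text within the full document, so the
     * returned offsets are absolute document positions.
     */
    static List<int[]> annotateRange(String text, int rangeBegin) {
        List<int[]> spans = new ArrayList<>();
        for (String title : TITLES) {
            int pos = text.indexOf(title);
            while (pos >= 0) {
                spans.add(new int[] {rangeBegin + pos, rangeBegin + pos + title.length()});
                pos = text.indexOf(title, pos + title.length());
            }
        }
        return spans;
    }
}
```

In the real annotator each span would presumably become a PersonTitle annotation in the TCAS rather than an entry in a list.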

Sample Annotator: AFST Grammar Syntax

# Shallow parser cascade: level 8
honour % SUB[], PSUB[], Phrase[] ;
boundary % Sentence[] ;
#_____
#
auxtensed = Token[_unilex=~"VB+AUX:P"] | Token[_unilex=~"VB+AUX:Z"] | Token[_unilex=~"VB+AUX:D"] ;
vrbtensed = Token[_unilex=~"VB-AUX:P"] | Token[_unilex=~"VB-AUX:Z"] | Token[_unilex=~"VB-AUX:D"] ;
vrbuntensed = Token[_unilex=~"VB-AUX:I"] ;
vrbgrpmodal =
  ( VG[@descend].
    Token[_unilex=~"MD"].
    Token[_unilex=~"RB"]*.
    ( ( Token[_unilex=~"VB-AUX:I"] ) |
      ( Token[_unilex=~"VB+AUX:I"]. Token[_unilex=~"VB-AUX:G"] ) ).
    Token[_unilex=~"RB"]*.
  ) |
  ( PVG[@descend].
    Token[_unilex=~"MD"].
    Token[_unilex=~"RB"]*.
    Token[_unilex=~"VB+AUX:I"].
    Token[_unilex=~"RB"]*.
    Token[_unilex=~"VB-AUX:N"].
    Token[_unilex=~"RB"]*.
  ) ;
vrbgrpinfform = VG[@descend]. Token[_orth=~*SWORD]*. Token[_unilex=~"VB:I"]. ;

#_____
simplenp = NP[] ;       # simple noun phrase
possnp = PNP[] ;        # possessive noun phrase
npp = NPP[] ;           # noun phrase with a trailing PP
nplist = NPList[] ;     # a list of NP's
complexnp = CNP[] ;     # complex (appositive) NP
npphrase = :simplenp | :possnp | :npp | :nplist | :complexnp ;  # an entity behaving like an NP
#______
export scannerEight = ( :vrbgrptensed | :vrbgrpinfform ). Token[_unilex=~"RP"]|. /[OBJ. :npphrase. /]OBJ ;

Sample Annotator: Semantic Dictionary Authority File

The Consumability Challenge
Building Analytics is a complex process
  - Requires highly trained individuals:
      NLP Experts
      UIMA Experts
      Advanced Java programmers with XML skills
  - Is very time consuming:
      Need time for learning the UIMA framework
      Need time for building the annotators

Key Features
End-to-End Text Analytics Development Tool
  - Supports the full Cycle of Text Analytics Development Activities
Ease of Use
  - Insulates the user from the complexity of the underlying frameworks
Concept-Centric
  - Lets the user think in terms of concepts as opposed to annotators and software components
Extensibility
  - Supports plugging in new model types, model editors, results viewers, and exploration tools

Text Analytics Development Cycle
[Figure: development cycle diagram.
 Activities: Start; Corpus & Domain Exploration; Identify Domain-Relevant Concepts; Type System Development; Develop Concept Models; Configure & Assemble Application Analysis Engine; Run Analytics; Evaluate Discovery Results.
 Artifacts: Ontology (Type System); Concept Models; Concept Finder; Structured Information; Evaluation Results.]

Scenario: Detecting Sentiments about Cars and Car Features

Demo

Conclusion
This work addresses the text analytics consumability challenges with a platform that provides:
  - Support for the full Cycle of Text Analytics Development Activities
  - Ease of Use
  - Support for a Concept-Centric development process
  - Extensibility

Thank You
Merci
Shoukran

IBM Research © Copyright IBM Corporation 2003  Concepts -Concepts to find in Text  Documents -Corpora that can be used in analysis  Concept Finders -Analysis Engines built from concept models  Results -Results from running Concept Finder on Corpora. Overview

Domain Exploration
GlossEx: Domain Exploration Tool

IBM Research © Copyright IBM Corporation 2003  Ontology -A group of concepts in a domain  Concept -A Concept in the domain  Model -Analytic for finding a specific Concept Ontologies, Concepts and Models

Building Models For Concepts
Build CarAspectModel using Semantic Dictionary CAT
1. Enter a representative Term
2. Select synonyms (e.g. from WordNet)
3. Store Terms in a dictionary

Building Models For Concepts
Build CarAspectModel using Semantic Dictionary CAT
1. Add representative Terms
2. Select synonyms (e.g. from WordNet)
3. Store Terms in a dictionary
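The three steps above amount to building a lookup table from surface words to a representative term. A minimal sketch, assuming a plain in-memory map: `SemanticDictionarySketch`, its methods, and the example synonyms are hypothetical, the synonyms are hand-picked rather than retrieved from WordNet, and only single-word terms are handled. The deck does not describe the tool's actual dictionary format.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical sketch of a dictionary-backed concept model; not the tool's actual format.
class SemanticDictionarySketch {

    // Maps each lower-cased surface word to its representative term.
    private final Map<String, String> surfaceToTerm = new LinkedHashMap<>();

    /** Steps 1 to 3: store a representative term together with its selected synonyms. */
    void addTerm(String representative, String... synonyms) {
        surfaceToTerm.put(representative.toLowerCase(Locale.ROOT), representative);
        for (String synonym : synonyms) {
            surfaceToTerm.put(synonym.toLowerCase(Locale.ROOT), representative);
        }
    }

    /** Returns the representative term for a word, or null if the word is not listed. */
    String lookup(String word) {
        return surfaceToTerm.get(word.toLowerCase(Locale.ROOT));
    }

    /** Returns the representative term of every dictionary word in the text, in text order. */
    List<String> findAll(String text) {
        List<String> hits = new ArrayList<>();
        // Split on anything that is not a letter; handles single-word terms only.
        for (String word : text.split("[^\\p{L}]+")) {
            String term = lookup(word);
            if (term != null) {
                hits.add(term);
            }
        }
        return hits;
    }

    /** Example car-aspect dictionary; terms and synonyms are made up for illustration. */
    static SemanticDictionarySketch carAspects() {
        SemanticDictionarySketch dictionary = new SemanticDictionarySketch();
        dictionary.addTerm("engine", "motor");
        dictionary.addTerm("brakes", "brake");
        return dictionary;
    }
}
```

Mapping every synonym back to one representative term means downstream components only need to know the concept's canonical form.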

Building ConceptFinders
Build a ConceptFinder for CarSentiments
1. Select All Relevant Concepts
2. The System generates a ConceptFinder for the selected concepts
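One way to picture step 2 is a finder that simply collects the models of the selected concepts and runs each of them over the text. The sketch below assumes this; the deck does not say how the system actually generates a ConceptFinder. `ConceptFinderSketch`, the concept names, and the regular-expression models are all hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch: a "concept finder" assembled from the models of selected concepts.
class ConceptFinderSketch {

    // Available concept models, keyed by concept name. The patterns are made up.
    static final Map<String, Pattern> MODELS = new LinkedHashMap<>();
    static {
        MODELS.put("CarAspect", Pattern.compile("\\b(engine|brakes|seats)\\b"));
        MODELS.put("Sentiment", Pattern.compile("\\b(great|terrible|noisy)\\b"));
    }

    private final Map<String, Pattern> selected = new LinkedHashMap<>();

    /** Builds a finder from the models of the selected concepts. */
    static ConceptFinderSketch forConcepts(List<String> conceptNames) {
        ConceptFinderSketch finder = new ConceptFinderSketch();
        for (String name : conceptNames) {
            Pattern model = MODELS.get(name);
            if (model == null) {
                throw new IllegalArgumentException("No model for concept: " + name);
            }
            finder.selected.put(name, model);
        }
        return finder;
    }

    /** Runs every selected model over the text; returns concept name -> matched strings. */
    Map<String, List<String>> find(String text) {
        Map<String, List<String>> results = new LinkedHashMap<>();
        for (Map.Entry<String, Pattern> entry : selected.entrySet()) {
            List<String> matches = new ArrayList<>();
            Matcher matcher = entry.getValue().matcher(text);
            while (matcher.find()) {
                matches.add(matcher.group());
            }
            results.put(entry.getKey(), matches);
        }
        return results;
    }
}
```

A finder for the car-review scenario could then be built with `ConceptFinderSketch.forConcepts(Arrays.asList("CarAspect", "Sentiment"))` and applied to each review in the corpus.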

Plugin Components: CATs & KoGs
[Figure: two plugin frameworks. CATs Plugin Framework, with an example CAT made of the Semantic Dictionary UI and a Dictionary Configurable Annotator. KoGs Plugin Framework, with an example KoG made of the Concordance Explorer UI and a Concordance Indexer.]
