Published by Buck Rich. Modified over 9 years ago.
1
IBM Research © Copyright IBM Corporation 2005
A Development Environment for Configurable Meta-Annotators in a Pipelined NLP Architecture
Youssef Drissi, Branimir Boguraev, Mary Neff, David Ferrucci, Paul Keyser and Anthony Levas
IBM T.J. Watson Research Center
{youssefd,bran,ferrucci,pkeyser,levas}@us.ibm.com
2
IBM Research © Copyright IBM Corporation 2003
Outline
Background:
- Text Analytics
- Unstructured Information Management Architecture (UIMA)
The Challenges
- The Consumability Challenges
Our Approach to meet these challenges
- The Concept-Centric Approach
- Our Text Analytics Development Cycle
A Scenario (Demo)
- Detecting sentiments about cars from a corpus of car reviews
3
Text Analytics
UIMA: Unstructured Information Management Architecture
[Figure: layered analysis of an example sentence (tokens: Fred, Center, is, the, CEO, of, Center, Micros). Parser layer: NP, VP, PP. Named Entity layer: Person, Organization. Relationship layer: CeoOf with Arg1:Person and Arg2:Org.]
4
UIMA: A runtime framework for Text Analytics
UIMA: Unstructured Information Management Architecture
[Figure: data passes through annotators, which produce analysis results.]
Annotators: Tokenizer, POS Tagger, PERSON Finder, COMPANY Finder, CEO Relationship
Concepts: PERSON, COMPANY, CEO Relationship
Models represented by:
- List of terms
- Dictionaries
- Regular expressions
- Pattern files
- Statistical models
- etc.
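The pipeline on this slide can be pictured as a chain of annotators, each reading the document plus earlier results and adding new annotations. The following is a minimal sketch in plain Java, not the UIMA API: the Annotation and Annotator types, the term lists, and the example sentence are hypothetical stand-ins chosen for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class PipelineSketch {

    /** A typed span of text; a hypothetical stand-in for a framework annotation. */
    static final class Annotation {
        final String type;
        final int begin;
        final int end;

        Annotation(String type, int begin, int end) {
            this.type = type;
            this.begin = begin;
            this.end = end;
        }
    }

    /** One pipeline stage: reads the text and earlier results, appends new ones. */
    interface Annotator {
        void process(String text, List<Annotation> results);
    }

    /** Marks every occurrence of any listed term with the given type. */
    static Annotator termFinder(final String type, final List<String> terms) {
        return (text, results) -> {
            for (String term : terms) {
                int from = text.indexOf(term);
                while (from >= 0) {
                    results.add(new Annotation(type, from, from + term.length()));
                    from = text.indexOf(term, from + term.length());
                }
            }
        };
    }

    /** Adds a CeoOf relation when a PERSON precedes a COMPANY with "CEO of" in between. */
    static Annotator ceoRelationFinder() {
        return (text, results) -> {
            List<Annotation> found = new ArrayList<>();
            for (Annotation p : results) {
                for (Annotation c : results) {
                    if (p.type.equals("PERSON") && c.type.equals("COMPANY")
                            && p.end <= c.begin
                            && text.substring(p.end, c.begin).contains("CEO of")) {
                        found.add(new Annotation("CeoOf", p.begin, c.end));
                    }
                }
            }
            results.addAll(found);
        };
    }

    /** Runs the annotators in order over one document. */
    static List<Annotation> run(String text, List<Annotator> pipeline) {
        List<Annotation> results = new ArrayList<>();
        for (Annotator annotator : pipeline) {
            annotator.process(text, results);
        }
        return results;
    }

    public static void main(String[] args) {
        String text = "Fred Center is the CEO of Center Micros";
        List<Annotator> pipeline = Arrays.asList(
                termFinder("PERSON", Arrays.asList("Fred Center")),
                termFinder("COMPANY", Arrays.asList("Center Micros")),
                ceoRelationFinder());
        for (Annotation a : run(text, pipeline)) {
            System.out.println(a.type + " [" + a.begin + ", " + a.end + ") "
                    + text.substring(a.begin, a.end));
        }
    }
}
```

The relation finder only works because it runs after the two term finders, which is the point of a pipelined architecture: later stages consume the annotations of earlier ones.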
5
Sample Annotator: Java Code

/**
 * This annotator searches for person titles using simple string matching.
 *
 * @param aTCAS TCAS containing document text and previously discovered
 *   annotations, and to which new annotations are to be written.
 * @param aResultSpec A list of output types and features that this annotator
 *   should produce.
 *
 * @see com.ibm.uima.analysis_engine.annotator.TextAnnotator#process(TCAS, ResultSpecification)
 */
public void process(TCAS aTCAS, ResultSpecification aResultSpec)
    throws AnnotatorProcessException {
  try {
    // If the ResultSpec doesn't include the PersonTitle type, we have
    // nothing to do.
    if (!aResultSpec.containsType("example.PersonTitle")) {
      return;
    }

    if (mContainingType == null) {
      // Search the whole document for PersonTitle annotations
      String text = aTCAS.getDocumentText();
      annotateRange(aTCAS, text, 0, aResultSpec);
    } else {
      // Search only within annotations of type mContainingType

      // Get an iterator over the annotations of type mContainingType.
      FSIterator it = aTCAS.getAnnotationIndex(mContainingType).iterator();
      // Loop over the iterator.
      while (it.isValid()) {
        // Get the next annotation from the iterator
        AnnotationFS annot = (AnnotationFS) it.get();
        // Get text covered by this annotation
        String coveredText = annot.getCoveredText();
        // Get begin position of this annotation
        int annotBegin = annot.getBegin();
        // Search for matches within this annotation
        annotateRange(aTCAS, coveredText, annotBegin, aResultSpec);
        // Advance the iterator.
        it.moveToNext();
      }
    }
  } catch (Exception e) {
    throw new AnnotatorProcessException(e);
  }
}
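The slide calls annotateRange but does not show it. As a rough illustration of what "simple string matching" over a sub-range involves, here is a self-contained sketch in plain Java. The title list, the simplified signature (no TCAS or ResultSpecification), and the int[] span representation are assumptions made for this example, not the original helper.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class PersonTitleMatcher {

    // Hypothetical title list; the slide does not show the real one.
    static final List<String> TITLES = Arrays.asList("Mr.", "Mrs.", "Dr.", "Prof.");

    /**
     * Finds titles in `text` and returns {begin, end} pairs in document
     * coordinates. `rangeBegin` is the offset of `text` within the document,
     * which matters when only the text covered by an annotation is searched.
     */
    static List<int[]> annotateRange(String text, int rangeBegin) {
        List<int[]> spans = new ArrayList<>();
        for (String title : TITLES) {
            int pos = text.indexOf(title);
            while (pos >= 0) {
                spans.add(new int[] {rangeBegin + pos, rangeBegin + pos + title.length()});
                pos = text.indexOf(title, pos + title.length());
            }
        }
        return spans;
    }

    public static void main(String[] args) {
        // Pretend this sentence starts at offset 100 of a larger document.
        for (int[] span : annotateRange("Dr. Smith met Mr. Jones", 100)) {
            System.out.println(span[0] + "-" + span[1]);
        }
    }
}
```

Adding rangeBegin to each match mirrors the annotBegin argument in the slide's code: matches found in covered text have to be shifted back into whole-document offsets.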
6
Sample Annotator: AFST Grammar Syntax

# Shallow parser cascade: level 8
honour % SUB[], PSUB[], Phrase[] ;
boundary % Sentence[] ;
#_____
#
auxtensed = Token[_unilex=~"VB+AUX:P"] | Token[_unilex=~"VB+AUX:Z"] | Token[_unilex=~"VB+AUX:D"] ;
vrbtensed = Token[_unilex=~"VB-AUX:P"] | Token[_unilex=~"VB-AUX:Z"] | Token[_unilex=~"VB-AUX:D"] ;
vrbuntensed = Token[_unilex=~"VB-AUX:I"] ;
vrbgrpmodal =
  ( VG[@descend].
    Token[_unilex=~"MD"].
    Token[_unilex=~"RB"]*.
    ( ( Token[_unilex=~"VB-AUX:I"] )
    | ( Token[_unilex=~"VB+AUX:I"]. Token[_unilex=~"VB-AUX:G"] ) ).
    Token[_unilex=~"RB"]*.
  )
  |
  ( PVG[@descend].
    Token[_unilex=~"MD"].
    Token[_unilex=~"RB"]*.
    Token[_unilex=~"VB+AUX:I"].
    Token[_unilex=~"RB"]*.
    Token[_unilex=~"VB-AUX:N"].
    Token[_unilex=~"RB"]*.
  ) ;
vrbgrpinfform =
  VG[@descend].
  Token[_orth=~*SWORD]*.
  Token[_unilex=~"VB:I"].
  ;
#_____
simplenp = NP[] ;      # simple noun phrase
possnp = PNP[] ;       # possessive noun phrase
npp = NPP[] ;          # noun phrase with a trailing PP
nplist = NPList[] ;    # a list of NP's
complexnp = CNP[] ;    # complex (appositive) NP
npphrase = :simplenp | :possnp | :npp | :nplist | :complexnp ;  # an entity behaving like an NP
#______
export scannerEight =
  ( :vrbgrptensed | :vrbgrpinfform ).
  Token[_unilex=~"RP"]|.
  /[OBJ. :npphrase. /]OBJ ;
7
Sample Annotator: Semantic Dictionary Authority File
8
The Consumability Challenge
Building Analytics is a complex process
- Requires highly trained individuals:
  - NLP Experts
  - UIMA Experts
  - Advanced Java programmers with XML skills
- Is very time consuming:
  - Need time for learning the UIMA framework
  - Need time for building the annotators
9
Key Features
End to End Text Analytics Development Tool
- Supports the full Cycle of Text Analytics Development Activities
Ease Of Use
- Insulates the user from the complexity of the underlying frameworks
Concept-Centric
- Lets the user think in terms of concepts as opposed to annotators and software components
Extensibility
- Supports plugging in new model types, model editors, results viewers, and exploration tools
10
Text Analytics Development Cycle
[Figure: development cycle diagram. Activities: Start, Corpus & Domain Exploration, Identify Domain-Relevant Concepts, Type System Development, Develop Concept Models, Configure & Assemble Application Analysis Engine, Run Analytics, Evaluate. Artifacts: Ontology (Type System), Concept Models, Concept Finder, Discovery Results, Structured Information, Evaluation Results.]
11
Scenario: Detecting Sentiments about Cars and Car Features
12
Demo
13
Conclusion
This work addresses the text analytics consumability challenges with a platform that provides:
- Support for the full Cycle of Text Analytics Development Activities
- Ease Of Use
- Support for a Concept-Centric development process
- Extensibility
14
Thank You
Merci
Shoukran
15
Overview
Concepts
- Concepts to find in Text Documents
- Corpora that can be used in analysis
Concept Finders
- Analysis Engines built from concept models
Results
- Results from running Concept Finder on Corpora.
17
Domain Exploration
GlossEx: Domain Exploration Tool
18
Ontologies, Concepts and Models
Ontology
- A group of concepts in a domain
Concept
- A Concept in the domain
Model
- Analytic for finding a specific Concept
19
Building Models For Concepts
Build CarAspectModel using Semantic Dictionary CAT
1. Enter a representative Term
2. Select synonyms (e.g. from WordNet)
3. Store Terms in a dictionary
20
Building Models For Concepts
Build CarAspectModel using Semantic Dictionary CAT
1. Add representative Terms
2. Select synonyms (e.g. from WordNet)
3. Store Terms in a dictionary
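The three steps above amount to building a term dictionary for one concept and then matching text against it. A minimal sketch follows, assuming a case-insensitive single-word lookup. The class name, the hard-coded synonym list (standing in for the user's WordNet selection), and the matching strategy are illustrative assumptions, not the tool's implementation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

public class DictionaryModelSketch {
    private final String concept;
    private final Set<String> terms = new LinkedHashSet<>();

    DictionaryModelSketch(String concept) {
        this.concept = concept;
    }

    /** Steps 1 to 3: store a representative term plus the synonyms the user selected. */
    void addTerm(String representative, List<String> selectedSynonyms) {
        terms.add(representative.toLowerCase(Locale.ROOT));
        for (String synonym : selectedSynonyms) {
            terms.add(synonym.toLowerCase(Locale.ROOT));
        }
    }

    /** Returns the words of `text` that are in the dictionary, in order of appearance. */
    List<String> find(String text) {
        List<String> hits = new ArrayList<>();
        for (String word : text.split("\\W+")) {
            if (terms.contains(word.toLowerCase(Locale.ROOT))) {
                hits.add(word);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        DictionaryModelSketch model = new DictionaryModelSketch("CarAspect");
        model.addTerm("engine", Arrays.asList("motor")); // synonym picked by the user
        model.addTerm("brakes", Collections.<String>emptyList());
        System.out.println(model.concept + ": "
                + model.find("The Engine is loud but the motor and brakes are fine."));
    }
}
```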
21
Building Models
Build CarSentimentModel using AFST CAT
1. Drag and Drop ConceptModels onto WorkArea
2. Interconnect to define pattern sequence
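Interconnecting concept models into a pattern sequence can be approximated as "a CarAspect followed, within a few tokens, by a Sentiment". The sketch below is a deliberately simplified plain-Java stand-in for that idea and not an AFST grammar. The word lists, the MAX_GAP window, and the "aspect:sentiment" output format are all assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

public class SequencePatternSketch {

    // Hypothetical concept models, reduced to word sets.
    static final Set<String> CAR_ASPECTS =
            new HashSet<>(Arrays.asList("engine", "brakes", "seats"));
    static final Set<String> SENTIMENTS =
            new HashSet<>(Arrays.asList("great", "noisy", "comfortable"));

    // At most this many other tokens may sit between the aspect and the sentiment.
    static final int MAX_GAP = 3;

    /** Returns "aspect:sentiment" for each aspect followed closely by a sentiment word. */
    static List<String> findCarSentiments(String text) {
        String[] tokens = text.toLowerCase(Locale.ROOT).split("\\W+");
        List<String> matches = new ArrayList<>();
        for (int i = 0; i < tokens.length; i++) {
            if (!CAR_ASPECTS.contains(tokens[i])) {
                continue;
            }
            int limit = Math.min(tokens.length - 1, i + 1 + MAX_GAP);
            for (int j = i + 1; j <= limit; j++) {
                if (CAR_ASPECTS.contains(tokens[j])) {
                    break; // a new aspect starts its own match
                }
                if (SENTIMENTS.contains(tokens[j])) {
                    matches.add(tokens[i] + ":" + tokens[j]);
                    break;
                }
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        System.out.println(findCarSentiments(
                "The engine is really noisy but the seats are comfortable."));
    }
}
```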
22
Building ConceptFinders
Build a ConceptFinder for CarSentiments
1. Select All Relevant Concepts
2. The System generates a ConceptFinder for the selected concepts
23
Running Analytics to get Results
Run ConceptFinder on a Corpus
1. Select ConceptFinder
2. Select Corpus
3. Run the analysis
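Conceptually, a ConceptFinder bundles the models of the selected concepts and applies all of them to each document of a corpus. The following sketch shows that shape in plain Java, with word-set dictionaries standing in for concept models and per-concept hit counts standing in for results. The class and method names are hypothetical and do not reflect the tool's generated analysis engines.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

public class ConceptFinderSketch {

    /** Concept name to dictionary terms; stands in for the selected concept models. */
    private final Map<String, Set<String>> models = new LinkedHashMap<>();

    /** Selects a concept by registering its (hypothetical) dictionary model. */
    void addConcept(String name, String... terms) {
        models.put(name, new HashSet<>(Arrays.asList(terms)));
    }

    /** Runs every model over every document; returns concept name to hit count. */
    Map<String, Integer> run(List<String> corpus) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String name : models.keySet()) {
            counts.put(name, 0);
        }
        for (String doc : corpus) {
            for (String word : doc.toLowerCase(Locale.ROOT).split("\\W+")) {
                for (Map.Entry<String, Set<String>> model : models.entrySet()) {
                    if (model.getValue().contains(word)) {
                        counts.merge(model.getKey(), 1, Integer::sum);
                    }
                }
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        ConceptFinderSketch finder = new ConceptFinderSketch();
        finder.addConcept("CarAspect", "engine", "brakes");
        finder.addConcept("Sentiment", "great", "noisy");
        List<String> corpus = Arrays.asList("The engine is great.", "The brakes are fine.");
        System.out.println(finder.run(corpus));
    }
}
```

Counts aggregated over the whole corpus are also the kind of figure a collection-level comparison between two runs would start from.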
24
Results Evaluation
Annotations Viewer
25
Iterative Refinement Tools
Concordance Viewer
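A concordance lists every occurrence of a term together with a window of surrounding text, which makes it easy to spot missed or spurious matches while refining a model. Below is a small keyword-in-context sketch in plain Java. The character-based window and the bracket formatting are assumptions for illustration and say nothing about how the Concordance Viewer itself is built.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ConcordanceSketch {

    /**
     * Returns one line per occurrence of `term`, with up to `width`
     * characters of context on each side and the term in brackets.
     */
    static List<String> concordance(List<String> corpus, String term, int width) {
        List<String> lines = new ArrayList<>();
        if (term.isEmpty()) {
            return lines; // an empty term would match everywhere
        }
        for (String doc : corpus) {
            int pos = doc.indexOf(term);
            while (pos >= 0) {
                int left = Math.max(0, pos - width);
                int afterTerm = pos + term.length();
                int right = Math.min(doc.length(), afterTerm + width);
                lines.add(doc.substring(left, pos) + "[" + term + "]"
                        + doc.substring(afterTerm, right));
                pos = doc.indexOf(term, afterTerm);
            }
        }
        return lines;
    }

    public static void main(String[] args) {
        List<String> corpus = Arrays.asList("the engine is loud", "a quiet engine");
        for (String line : concordance(corpus, "engine", 4)) {
            System.out.println(line);
        }
    }
}
```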
26
Results Evaluation
Collection Level Statistics: Comparing Results
27
Plugin Components: CATs & KoGs
[Figure: two plugin frameworks. CATs Plugin Framework: Dictionary Configurable Annotator (CAT), Semantic Dictionary UI. KoGs Plugin Framework: Concordance Indexer (KoG), Concordance Explorer UI (KoG).]