Encoding Extraction as Inferences


1 Encoding Extraction as Inferences
J. William Murdock (1), Paulo Pinheiro da Silva (2), David Ferrucci (1), Christopher Welty (1), Deborah McGuinness (2)
(1) IBM Watson Research Center, 19 Skyline Drive, Hawthorne, NY 10532, USA
(2) Knowledge Systems Laboratory, Stanford University, Stanford, CA 94305, USA

2 Idea
Provide a declarative representation of knowledge extraction processes
  To provide browsable explanations
  For other metacognition tasks: checking consistency, determining trust, selecting components, etc.
Reuse, adapt, and integrate existing representation/explanation technology: Inference Web
Reuse, adapt, and integrate existing extraction technology: UIMA

3 Pre-Existing Inference Web Technology
Enables browsing representations of processes
Requires that systems describe their processing as inferences
Has been used with theorem-proving technology
  Lends itself to formal inference representation
We are now applying this technique to knowledge extraction
  Requires a new perspective: extraction as inference

4 Pre-Existing Inference Web Component: IW Browser

5 Pre-Existing UIMA Technology
Architecture and framework for creating, composing, and deploying multi-modal analytics
UIMA provides a declarative model of the structure and requirements of analytic components
UIMA provides shared programming interfaces and knowledge structures for analytics
Because components share interfaces and structures, one can provide generic tools that interact with components
  E.g., a tool to record analysis tasks as inferences

6 Pre-Existing UIMA Tooling: Semantic Analysis Workbench

7 Formalizing UIMA Analytics for IW
A taxonomy of extraction tasks expressed as inference rules
A language for expressing the conclusions of a given extraction inference rule
Components that record extraction traces using tasks in the taxonomy
  Client-side software iterates through internal extraction results and infers what processes were performed
  Server-side registrar produces formally represented semantic web nodes
To leverage IW for text analysis, the conclusions of the text analysis process must be rendered in the same formalization
  Uses an upper-level type system for extraction results sufficient for expressing the conclusions

8 Taxonomy (1): Extraction
1) Entity Recognition (ER): A span refers to some entity of a specified type
2) Relation Recognition (RR): A span refers to some relation of a specified type
3) Relation Annotation Argument Identification (RAAI): An annotation fills a role in a relation annotation
4) Entity Identification (EI): An entity annotation refers to a particular entity
5) Relation Identification (RI): A relation annotation refers to a particular relation
6) Extracted Entity Classification (EEC): An entity has a particular type
[Figure: recorded trace for the text "Thomas Gradgrind is the owner of Gradgrind Foods. He lives in New York City." The annotation layer shows Person, Organization, and Person entity annotations and an OwnerOf relation annotation. Steps labeled ER, RR, RAAI, EI, RI, and EEC connect the text to the extraction results: Entity: e1 (Person), Entity: e2 (Organization), Relation: OwnerOf(e1, e2).]
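The six primitive tasks and the recorded trace for the Gradgrind example can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation: the names ExtractionTask, TraceStep, and tasks_used are hypothetical, and the wiring between steps is an assumption, since the slide's figure does not survive in enough detail to confirm each arrow.

```python
from dataclasses import dataclass, field
from enum import Enum


class ExtractionTask(Enum):
    """The six primitive extraction tasks, keyed by the abbreviations used in the trace."""
    ER = "Entity Recognition"
    RR = "Relation Recognition"
    RAAI = "Relation Annotation Argument Identification"
    EI = "Entity Identification"
    RI = "Relation Identification"
    EEC = "Extracted Entity Classification"


@dataclass
class TraceStep:
    """One recorded inference: a task, its conclusion, and what it was derived from."""
    task: ExtractionTask
    conclusion: str
    antecedents: list = field(default_factory=list)  # TraceStep objects or raw source text


def tasks_used(step):
    """Collect the abbreviations of every task reachable from a trace step."""
    found = {step.task.name}
    for antecedent in step.antecedents:
        if isinstance(antecedent, TraceStep):
            found |= tasks_used(antecedent)
    return found


TEXT = "Thomas Gradgrind is the owner of Gradgrind Foods. He lives in New York City."

# Assumed wiring of the example (hypothetical; only the labels come from the slide).
er_person = TraceStep(ExtractionTask.ER, "Person (Entity Annotation)", [TEXT])
er_org = TraceStep(ExtractionTask.ER, "Organization (Entity Annotation)", [TEXT])
rr = TraceStep(ExtractionTask.RR, "OwnerOf (Relation Annotation)", [TEXT])
ei_e1 = TraceStep(ExtractionTask.EI, "Entity: e1", [er_person])
ei_e2 = TraceStep(ExtractionTask.EI, "Entity: e2", [er_org])
raai_owner = TraceStep(ExtractionTask.RAAI, "Person annotation fills a role in OwnerOf", [er_person, rr])
raai_owned = TraceStep(ExtractionTask.RAAI, "Organization annotation fills a role in OwnerOf", [er_org, rr])
ri = TraceStep(ExtractionTask.RI, "Relation: OwnerOf(e1, e2)", [raai_owner, raai_owned, ei_e1, ei_e2])
eec_e1 = TraceStep(ExtractionTask.EEC, "Entity: e1 (Person)", [ei_e1])

print(sorted(tasks_used(ri)))
```

Walking the antecedents from a conclusion back to the source text is what makes the trace browsable: each conclusion can be explained by the task that produced it and the steps it depends on.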

9 Taxonomy (2): Semantic Integration
7) Entity Mapping (EM): An entity encoded in the target ontology is derived from an entity or relation encoded in the type system
8) Relation Mapping (RM): A relation in the target ontology is derived from an entity or relation instance in the type system
9) Target Entity Classification (TEC): An entity is a member of a class in the target ontology
[Figure: the recorded trace from the previous slide, extended with integration steps. Steps labeled EM, RM, and TEC connect the extraction results Entity: e1 (Person), Entity: e2 (Organization), and Relation: OwnerOf(e1, e2) to the integration results: Entity: #e1 (Person), Entity: #e2 (Company), Relation: <HasOwner #e2 #e1>.]
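The three integration tasks can be illustrated with the slide's own example, where OwnerOf(e1, e2) in the type system becomes <HasOwner #e2 #e1> in the target ontology and Organization becomes Company. The sketch below is hypothetical: the function names and the lookup table are assumptions made for illustration, and a real mapping would be driven by the target ontology, not hard-coded.

```python
# Type-system class -> target-ontology class, taken from the slide's example.
TYPE_MAP = {"Person": "Person", "Organization": "Company"}


def map_entity(entity_id):
    """Entity Mapping (EM): derive a target-ontology entity from an extracted entity."""
    return "#" + entity_id


def map_relation(name, args):
    """Relation Mapping (RM): derive a target-ontology relation from an extracted one.

    Only the OwnerOf example from the slide is covered; note the argument swap,
    since HasOwner names the owned thing first.
    """
    if name == "OwnerOf":
        owner, owned = args
        return ("HasOwner", map_entity(owned), map_entity(owner))
    raise ValueError(f"no mapping known for relation {name}")


def classify_target(entity_id, extracted_type):
    """Target Entity Classification (TEC): place the mapped entity in a target class."""
    return (map_entity(entity_id), TYPE_MAP[extracted_type])


print(map_relation("OwnerOf", ("e1", "e2")))
print(classify_target("e2", "Organization"))
```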

10 Visualizing the Recorded Trace
[Figure: browsable trace, one node per conclusion]
Direct Assertion: Thomas Gradgrind is the owner of Gradgrind Foods.
Direct Assertion: He lives in New York City.
Direct Assertion: (subClassOf Person Entity)
Entity Recognition (IBM EAnnotator): Thomas Gradgrind (Person) is the owner of Gradgrind Foods.
Entity Recognition (IBM ACE-model Annotator): He (Person) lives in New York City.
Entity Identification (IBM Cross-Annotator Coreference): “Thomas Gradgrind (Person)”, “He (Person)” refer to E1
Extracted Entity Classification (IBM Cross-Annotator Coreference): (hasClass E1 Person)

11 Representing Effects
Internal representations are used to encode the conclusions for the steps in the trace
Source text is encoded in the original natural language
Intermediate results relating to text are encoded using a specialized formalism
  refersToMemberOf(0-16, com.ibm.hutt.Person)
  Example: Thomas Gradgrind (Person) is the owner of Gradgrind Foods.
Final results (extracted knowledge) are represented in KIF
  May want to use KIF for intermediate results too
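A small Python sketch shows how an intermediate conclusion in this formalism lines up with the source text. It assumes that the span 0-16 uses a zero-based start offset and an exclusive end offset, which matches the 16 characters of "Thomas Gradgrind"; the helper name refers_to_member_of is hypothetical.

```python
def refers_to_member_of(text, begin, end, type_name):
    """Return a refersToMemberOf conclusion for text[begin:end].

    Spans that fall outside the text are rejected, since a conclusion
    about a non-existent span cannot be grounded in the source.
    """
    if not (0 <= begin < end <= len(text)):
        raise ValueError(f"span {begin}-{end} does not exist in a text of length {len(text)}")
    return f"refersToMemberOf({begin}-{end}, {type_name})"


TEXT = "Thomas Gradgrind is the owner of Gradgrind Foods."

conclusion = refers_to_member_of(TEXT, 0, 16, "com.ibm.hutt.Person")
covered = TEXT[0:16]  # the characters the conclusion talks about

print(conclusion)
print(covered)
```

Keeping the span as offsets, not as copied text, lets every intermediate conclusion point back into the same unmodified source string.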

12 Proof Markup Language (Simplified)
<rdf:RDF xmlns:iw= xmlns:rdf= xmlns:owl="
<iw:MethodRule rdf:about="
<iw:InferenceEngine rdf:about="
<iw:NodeSet rdf:about="
<iw:Language rdf:about="
<iw:conclusion>refersToMemberOf(0-16 , com.ibm.hutt.Person)</iw:conclusion>
<iw:isConsequentOf rdf:parseType="Collection">
  <iw:InferenceStep iw:hasIndex="0">
    <iw:hasAntecedent rdf:parseType="Collection">
      <iw:NodeSet rdf:about="
    </iw:hasAntecedent>
  </iw:InferenceStep>
</iw:isConsequentOf>
</iw:NodeSet>
</rdf:RDF>
[Figure: the corresponding trace node. Conclusion: Thomas Gradgrind (Person) is the owner of Gradgrind Foods. Engine: IBM EAnnotator. Rule: Entity Recognition.]

13 Accessing Traces in the Semantic Analysis Workbench

14

15 Future Work: Taxonomy Extensions
Other primitive text extraction tasks
  e.g., assigning canonical form and variant forms to entities
Primitive extraction tasks from other modalities
  audio, images, video, etc.
Non-primitive extraction tasks
  e.g., aggregating entity identification, entity classification, etc. into coreference resolution
  More abstract explanations at a non-primitive level, with the ability to drill down to primitive tasks

16 Future Work: Other Metacognition Applications
Determining trust for extracted knowledge
  Inference Web trace connects conclusions to sources via inference steps
  Trust for a conclusion depends on trust of the sources and trust of the inference steps
Checking consistency
  e.g., rejecting entity recognition over a span that does not exist
Automatically selecting extraction components
  Based on information about the component in the IW Registry
  Based on past IW traces (case-based reasoning, CBR)
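The dependence of trust on sources and inference steps can be sketched as a walk over the trace. The slides do not say how the two kinds of trust are combined, so the multiplicative rule below is purely an assumption chosen for illustration, as are the names Node and trust and the numeric trust values.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A conclusion in the trace (hypothetical structure, not the IW data model)."""
    conclusion: str
    source_trust: float = 1.0  # used only for direct assertions (no antecedents)
    step_trust: float = 1.0    # trust in the inference step that produced the conclusion
    antecedents: list = field(default_factory=list)


def trust(node):
    """Assumed rule: step trust multiplied by the trust of every antecedent."""
    if not node.antecedents:
        return node.source_trust
    value = node.step_trust
    for antecedent in node.antecedents:
        value *= trust(antecedent)
    return value


# Illustrative values only; nothing here is measured.
source = Node("Thomas Gradgrind is the owner of Gradgrind Foods.", source_trust=0.9)
recognized = Node(
    "Thomas Gradgrind (Person) is the owner of Gradgrind Foods.",
    step_trust=0.8,
    antecedents=[source],
)

print(round(trust(recognized), 2))
```

Other combination rules, such as taking the minimum along the path, would fit the same traversal; the point is only that the trace supplies the structure needed to compute trust at all.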

