Encoding Extraction as Inferences

J. William Murdock (1), Paulo Pinheiro da Silva (2), David Ferrucci (1), Christopher Welty (1), Deborah McGuinness (2)
(1) IBM Watson Research Center, 19 Skyline Drive, Hawthorne, NY 10532, USA
(2) Knowledge Systems Laboratory, Stanford University, Stanford, CA 94305, USA

Idea
Providing a declarative representation of knowledge extraction processes
  To provide browsable explanations
  For other metacognition tasks: checking consistency, determining trust, selecting components, etc.
Reuse, adapt, and integrate existing representation/explanation technology: Inference Web
Reuse, adapt, and integrate existing extraction technology: UIMA

Pre-Existing Inference Web Technology
Enables browsing representations of processes
Requires that systems describe their processing as inferences
Has been used with theorem-proving technology
  Lends itself to a formal inference representation
We are now applying this technique to knowledge extraction
  Requires a new perspective: extraction as inference

Pre-Existing Inference Web Component: IW Browser

Pre-Existing UIMA Technology
Architecture and framework for creating, composing, and deploying multi-modal analytics
UIMA provides a declarative model of the structure and requirements of analytic components
UIMA provides shared programming interfaces and knowledge structures for analytics
Because components share interfaces and structures, one can provide generic tools that interact with components
  E.g., a tool to record analysis tasks as inferences

Pre-Existing UIMA Tooling: Semantic Analysis Workbench

Formalizing UIMA Analytics for IW
A taxonomy of extraction tasks expressed as inference rules
A language for expressing the conclusions of a given extraction inference rule
Components that record extraction traces using tasks in the taxonomy
  Client-side software iterates through internal extraction results and infers what processes were performed
  Server-side registrar produces formally represented Semantic Web nodes
To leverage IW for text analysis, the conclusions of text analysis processes must be rendered in the same formalization
Uses an upper-level type system for extraction results that is sufficient for expressing the conclusions

Taxonomy (1): Extraction
1) Entity Recognition: A span refers to some entity of a specified type
2) Relation Recognition: A span refers to some relation of a specified type
3) Relation Annotation Argument Identification: An annotation fills a role in a relation annotation
4) Entity Identification: An entity annotation refers to a particular entity
5) Relation Identification: A relation annotation refers to a particular relation
6) Extracted Entity Classification: An entity has a particular type
[Figure: recorded trace for the example text "Thomas Gradgrind is the owner of Gradgrind Foods. He lives in New York City."]
  Annotations: Person (Entity Annotation), Organization (Entity Annotation), Person (Entity Annotation), OwnerOf (Relation Annotation)
  Extraction results: Entity e1 (Person), Entity e2 (Organization), Relation OwnerOf(e1, e2)
  Trace steps: ER (3), RR (1), RAAI (2), EI (2), RI (1), EEC (2)
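The six primitive tasks and the recorded trace for the Gradgrind example can be sketched as a small data structure. This is a minimal illustration, not the authors' implementation: the `Task`, `Step`, `span`, and `build_trace` names are hypothetical, the span chosen for the OwnerOf relation annotation is an assumption, and the antecedent links are a guess, since the arrows of the slide's diagram are not recoverable from the transcript.

```python
from collections import Counter
from dataclasses import dataclass, field
from enum import Enum


class Task(Enum):
    """The six primitive extraction tasks, keyed by the slide's abbreviations."""
    ER = "Entity Recognition"
    RR = "Relation Recognition"
    RAAI = "Relation Annotation Argument Identification"
    EI = "Entity Identification"
    RI = "Relation Identification"
    EEC = "Extracted Entity Classification"


@dataclass
class Step:
    """One inference step: a task, its conclusion, and the steps it builds on."""
    task: Task
    conclusion: str
    antecedents: list = field(default_factory=list)


TEXT = "Thomas Gradgrind is the owner of Gradgrind Foods. He lives in New York City."


def span(phrase, start=0):
    """Return (begin, end) character offsets of phrase in TEXT, end exclusive."""
    begin = TEXT.index(phrase, start)
    return (begin, begin + len(phrase))


def build_trace():
    thomas = span("Thomas Gradgrind")
    foods = span("Gradgrind Foods")
    he = span("He", foods[1])          # search after the first sentence
    owner = span("is the owner of")    # assumed span for the relation annotation

    er1 = Step(Task.ER, f"span {thomas} refers to a Person")
    er2 = Step(Task.ER, f"span {foods} refers to an Organization")
    er3 = Step(Task.ER, f"span {he} refers to a Person")
    rr = Step(Task.RR, f"span {owner} refers to an OwnerOf relation")
    # Assumed antecedent structure from here on.
    raai1 = Step(Task.RAAI, "Person annotation fills a role in OwnerOf", [er1, rr])
    raai2 = Step(Task.RAAI, "Organization annotation fills a role in OwnerOf", [er2, rr])
    ei1 = Step(Task.EI, "both Person annotations refer to e1", [er1, er3])
    ei2 = Step(Task.EI, "Organization annotation refers to e2", [er2])
    ri = Step(Task.RI, "OwnerOf annotation refers to OwnerOf(e1, e2)",
              [raai1, raai2, ei1, ei2])
    eec1 = Step(Task.EEC, "e1 has type Person", [ei1])
    eec2 = Step(Task.EEC, "e2 has type Organization", [ei2])
    return [er1, er2, er3, rr, raai1, raai2, ei1, ei2, ri, eec1, eec2]


trace = build_trace()
counts = Counter(step.task.name for step in trace)
print(dict(counts))
```

The step counts per task match the labels that survive from the slide's diagram: three ER, one RR, two RAAI, two EI, one RI, and two EEC.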

Taxonomy (2): Semantic Integration
7) Entity Mapping: An entity encoded in the target ontology is derived from an entity or relation encoded in the type system
8) Relation Mapping: A relation in the target ontology is derived from an entity or relation instance in the type system
9) Target Entity Classification: An entity is a member of a class in the target ontology
[Figure: recorded trace for the example text "Thomas Gradgrind is the owner of Gradgrind Foods. He lives in New York City.", extended with integration steps]
  Extraction results: Entity e1 (Person), Entity e2 (Organization), Relation OwnerOf(e1, e2)
  Integration results: Entity #e1 (Person), Entity #e2 (Company), Relation <HasOwner #e2 #e1>
  Trace steps: ER (3), RR (1), RAAI (2), EI (2), RI (1), EEC (2), EM (2), RM (1), TEC (2)
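The three integration tasks can be illustrated with a short sketch that maps the extraction results of the example into the target ontology and records one trace entry per step. The mapping tables and the `integrate` function are hypothetical and are inferred only from the example on the slide (Organization becomes Company, and OwnerOf(e1, e2) becomes <HasOwner #e2 #e1> with the arguments swapped); a real system would take these mappings from its ontology alignment.

```python
# Assumed mappings, read off the slide's single example.
TYPE_TO_CLASS = {"Person": "Person", "Organization": "Company"}
# Relation name -> (target relation name, whether the arguments are swapped).
RELATION_MAP = {"OwnerOf": ("HasOwner", True)}


def integrate(entities, relations):
    """Map type-system results to the target ontology, recording EM/TEC/RM steps.

    entities:  dict of entity id -> extracted type, e.g. {"e1": "Person"}
    relations: list of (relation name, first argument id, second argument id)
    """
    trace = []
    target_entities = {}
    for eid, etype in entities.items():
        tid = "#" + eid
        trace.append(("EM", f"{tid} is derived from {eid}"))
        target_entities[tid] = TYPE_TO_CLASS[etype]
        trace.append(("TEC", f"{tid} is a member of {TYPE_TO_CLASS[etype]}"))

    target_relations = []
    for name, first, second in relations:
        target_name, swap = RELATION_MAP[name]
        args = (second, first) if swap else (first, second)
        statement = f"<{target_name} #{args[0]} #{args[1]}>"
        target_relations.append(statement)
        trace.append(("RM", f"{statement} is derived from {name}({first}, {second})"))
    return target_entities, target_relations, trace


ents, rels, steps = integrate(
    {"e1": "Person", "e2": "Organization"},
    [("OwnerOf", "e1", "e2")],
)
print(ents)
print(rels)
```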

Visualizing the Recorded Trace
[Figure: IW Browser view of the recorded trace. Nodes, from sources to conclusions:]
  Direct Assertion: Thomas Gradgrind is the owner of Gradgrind Foods.
  Direct Assertion: He lives in New York City.
  Direct Assertion: (subClassOf Person Entity)
  Entity Recognition (IBM EAnnotator): Thomas Gradgrind (Person) is the owner of Gradgrind Foods.
  Entity Recognition (IBM ACE-model Annotator): He (Person) lives in New York City.
  Entity Identification (IBM Cross-Annotator Coreference): "Thomas Gradgrind (Person)", "He (Person)" refer to E1
  Extracted Entity Classification (IBM Cross-Annotator Coreference): (hasClass E1 Person)

Representing Effects
Internal representations are used to encode the conclusions for the steps in the trace
Source text is encoded in the original natural language
Intermediate results relating to text are encoded using a specialized formalism:
  refersToMemberOf(0-16, com.ibm.hutt.Person)
  which corresponds to: Thomas Gradgrind (Person) is the owner of Gradgrind Foods.
Final results (extracted knowledge) are represented in KIF
  May want to use KIF for intermediate results too
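A conclusion in the specialized formalism can be read back against the source text. The sketch below is illustrative only: the regular expression, the function names, and the interpretation of the span as character offsets with an exclusive end are assumptions, although that interpretation is consistent with "Thomas Gradgrind" being 16 characters long.

```python
import re

# Assumed surface syntax: refersToMemberOf(<begin>-<end>, <type name>)
PATTERN = re.compile(r"refersToMemberOf\(\s*(\d+)-(\d+)\s*,\s*([\w.]+)\s*\)")

SOURCE = "Thomas Gradgrind is the owner of Gradgrind Foods."


def parse_conclusion(conclusion):
    """Split a refersToMemberOf conclusion into (begin, end, type name)."""
    match = PATTERN.fullmatch(conclusion.strip())
    if match is None:
        raise ValueError(f"not a refersToMemberOf conclusion: {conclusion!r}")
    return int(match.group(1)), int(match.group(2)), match.group(3)


def covered_text(conclusion, source):
    """Return the source text covered by the conclusion's span."""
    begin, end, _ = parse_conclusion(conclusion)
    return source[begin:end]


parsed = parse_conclusion("refersToMemberOf( 0-16, com.ibm.hutt.Person)")
print(parsed)
print(covered_text("refersToMemberOf(0-16, com.ibm.hutt.Person)", SOURCE))
```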

Proof Markup Language (Simplified)
<rdf:RDF xmlns:iw= xmlns:rdf= xmlns:owl="
<iw:MethodRule rdf:about="
<iw:InferenceEngine rdf:about="
<iw:NodeSet rdf:about="
<iw:Language rdf:about="
  <iw:conclusion>refersToMemberOf(0-16, com.ibm.hutt.Person)</iw:conclusion>
  <iw:isConsequentOf rdf:parseType="Collection">
    <iw:InferenceStep iw:hasIndex="0">
      <iw:hasAntecedent rdf:parseType="Collection">
        <iw:NodeSet rdf:about="
      </iw:hasAntecedent>
    </iw:InferenceStep>
  </iw:isConsequentOf>
</iw:NodeSet>
</rdf:RDF>
Node shown: Thomas Gradgrind (Person) is the owner of Gradgrind Foods. (IBM EAnnotator, Entity Recognition)

Accessing Traces in the Semantic Analysis Workbench

Future Work: Taxonomy Extensions
Other primitive text extraction tasks
  e.g., assigning canonical form and variant forms to entities
Primitive extraction tasks from other modalities
  audio, images, video, etc.
Non-primitive extraction tasks
  e.g., aggregating entity identification, entity classification, etc. into coreference resolution
  More abstract explanations at a non-primitive level, with the ability to drill down to primitive tasks

Future Work: Other Metacognition Applications
Determining trust for extracted knowledge
  Inference Web trace connects conclusions to sources via inference steps
  Trust for a conclusion depends on trust of the sources and trust of the inference steps
Checking consistency
  e.g., rejecting entity recognition over a span that does not exist
Automatically selecting extraction components
  Based on information about the component in the IW Registry
  Based on past IW traces (case-based reasoning, CBR)
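The first two applications can be sketched over a trace of linked nodes. The slides do not say how trust values are combined, so the product rule below is purely an assumption chosen for simplicity, as are the `Node`, `trust`, and `span_exists` names and the numeric trust values.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A trace node: a conclusion, the trust placed in the step (or source)
    that produced it, and the nodes it was derived from."""
    conclusion: str
    step_trust: float = 1.0
    antecedents: list = field(default_factory=list)


def trust(node):
    """Assumed combination rule: the step's own trust multiplied by the trust
    of every antecedent. A node shared by several antecedents is counted once
    per path, which a real system might want to avoid."""
    result = node.step_trust
    for antecedent in node.antecedents:
        result *= trust(antecedent)
    return result


def span_exists(begin, end, source):
    """Consistency check: the span must lie inside the source text."""
    return 0 <= begin < end <= len(source)


SOURCE = "Thomas Gradgrind is the owner of Gradgrind Foods."
assertion = Node(SOURCE, step_trust=0.9)                     # trust in the source
recognition = Node("refersToMemberOf(0-16, com.ibm.hutt.Person)",
                   step_trust=0.8, antecedents=[assertion])  # trust in the annotator

print(round(trust(recognition), 2))
print(span_exists(0, 16, SOURCE), span_exists(40, 200, SOURCE))
```

Under this rule a conclusion can never be trusted more than its least trusted source or step, which is one simple way to read the slide's statement that trust depends on both.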
