Knowledge and Tree-Edits in Learnable Entailment Proofs Asher Stern and Ido Dagan (earlier partial version by Roy Bar-Haim) Download at:

Slides:

Advertisements

Similar presentations

Computational language: week 10 Lexical Knowledge Representation concluded Syntax-based computational language Sentence structure: syntax Context free.

Advertisements

COGEX at the Second RTE Marta Tatu, Brandon Iles, John Slavick, Adrian Novischi, Dan Moldovan Language Computer Corporation April 10 th, 2006.

COGEX at the Second RTE Marta Tatu, Brandon Iles, John Slavick, Adrian Novischi, Dan Moldovan Language Computer Corporation April 10 th, 2006.

Progress update Lin Ziheng. System overview 2 Components – Connective classifier Features from Pitler and Nenkova (2009): – Connective: because – Self.

Proceedings of the Conference on Intelligent Text Processing and Computational Linguistics (CICLing-2007) Learning for Semantic Parsing Advisor: Hsin-His.

Recognizing Textual Entailment Challenge PASCAL Suleiman BaniHani.

Max-Margin Matching for Semantic Role Labeling David Vickrey James Connor Daphne Koller Stanford University.

Image Indexing and Retrieval using Moment Invariants Imran Ahmad School of Computer Science University of Windsor – Canada.

LEDIR : An Unsupervised Algorithm for Learning Directionality of Inference Rules Advisor: Hsin-His Chen Reporter: Chi-Hsin Yu Date: From EMNLP.

Global Learning of Type Entailment Rules Jonathan Berant, Ido Dagan, Jacob Goldberger June 21 st, 2011.

Robust Textual Inference via Graph Matching Aria Haghighi Andrew Ng Christopher Manning.

Normalized alignment of dependency trees for detecting textual entailment Erwin Marsi & Emiel Krahmer Tilburg University Wauter Bosma & Mariët Theune University.

Predicting Text Quality for Scientific Articles Annie Louis University of Pennsylvania Advisor: Ani Nenkova.

Logic in general Logics are formal languages for representing information such that conclusions can be drawn Syntax defines the sentences in the language.

Clustering… in General In vector space, clusters are vectors found within  of a cluster vector, with different techniques for determining the cluster.

Knowledge Acquisitioning. Definition The transfer and transformation of potential problem solving expertise from some knowledge source to a program.

July 9, 2003ACL An Improved Pattern Model for Automatic IE Pattern Acquisition Kiyoshi Sudo Satoshi Sekine Ralph Grishman New York University.

Introduction to CL Session 1: 7/08/2011. What is computational linguistics? Processing natural language text by computers  for practical applications.

Using Maximal Embedded Subtrees for Textual Entailment Recognition Sophia Katrenko & Pieter Adriaans Adaptive Information Disclosure project Human Computer.

Information Retrieval and Extraction 資訊檢索與擷取 Chia-Hui Chang National Central University

Presented by Zeehasham Rasheed

Data Mining CS 341, Spring 2007 Lecture 4: Data Mining Techniques (I)

Learning syntactic patterns for automatic hypernym discovery Rion Snow, Daniel Jurafsky and Andrew Y. Ng Prepared by Ang Sun

Semi-Automatic Learning of Transfer Rules for Machine Translation of Low-Density Languages Katharina Probst April 5, 2002.

Latent Semantic Analysis (LSA). Introduction to LSA Learning Model Uses Singular Value Decomposition (SVD) to simulate human learning of word and passage.

Employing Two Question Answering Systems in TREC 2005 Harabagiu, Moldovan, et al 2005 Language Computer Corporation.

A Confidence Model for Syntactically-Motivated Entailment Proofs Asher Stern & Ido Dagan ISCOL June 2011, Israel 1.

Learning Table Extraction from Examples Ashwin Tengli, Yiming Yang and Nian Li Ma School of Computer Science Carnegie Mellon University Coling 04.

Outline P1EDA’s simple features currently implemented –And their ablation test Features we have reviewed from Literature –(Let’s briefly visit them) –Iftene’s.

Overview of the Fourth Recognising Textual Entailment Challenge NIST-Nov. 17, 2008TAC Danilo Giampiccolo (coordinator, CELCT) Hoa Trang Dan (NIST)

Extracting Places and Activities from GPS Traces Using Hierarchical Conditional Random Fields Yong-Joong Kim Dept. of Computer Science Yonsei.

Ontology Learning from Text: A Survey of Methods Source: LDV Forum,Volume 20, Number 2, 2005 Authors: Chris Biemann Reporter:Yong-Xiang Chen.

Issues with Data Mining

Richard Socher Cliff Chiung-Yu Lin Andrew Y. Ng Christopher D. Manning

CSC 9010 Spring Paula Matuszek A Brief Overview of Watson.

Tree Kernels for Parsing: (Collins & Duffy, 2001) Advanced Statistical Methods in NLP Ling 572 February 28, 2012.

Chapter 14: Artificial Intelligence Invitation to Computer Science, C++ Version, Third Edition.

Knowledge and Tree-Edits in Learnable Entailment Proofs Asher Stern, Amnon Lotan, Shachar Mirkin, Eyal Shnarch, Lili Kotlerman, Jonathan Berant and Ido.

Empirical Explorations with The Logical Theory Machine: A Case Study in Heuristics by Allen Newell, J. C. Shaw, & H. A. Simon by Allen Newell, J. C. Shaw,

Querying Structured Text in an XML Database By Xuemei Luo.

Automatic Image Annotation by Using Concept-Sensitive Salient Objects for Image Content Representation Jianping Fan, Yuli Gao, Hangzai Luo, Guangyou Xu.

CS774. Markov Random Field : Theory and Application Lecture 19 Kyomin Jung KAIST Nov

Today Ensemble Methods. Recap of the course. Classifier Fusion

Exploiting Context Analysis for Combining Multiple Entity Resolution Systems -Ramu Bandaru Zhaoqi Chen Dmitri V.kalashnikov Sharad Mehrotra.

Formal Specification of Intrusion Signatures and Detection Rules By Jean-Philippe Pouzol and Mireille Ducassé 15 th IEEE Computer Security Foundations.

LANGUAGE MODELS FOR RELEVANCE FEEDBACK Lee Won Hee.

A Systematic Exploration of the Feature Space for Relation Extraction Jing Jiang & ChengXiang Zhai Department of Computer Science University of Illinois,

1 Intelligente Analyse- und Informationssysteme Frank Reichartz, Hannes Korte & Gerhard Paass Fraunhofer IAIS, Sankt Augustin, Germany Dependency Tree.

Joseph Xu Soar Workshop Learning Modal Continuous Models.

Finding frequent and interesting triples in text Janez Brank, Dunja Mladenić, Marko Grobelnik Jožef Stefan Institute, Ljubljana, Slovenia.

Mining Dependency Relations for Query Expansion in Passage Retrieval Renxu Sun, Chai-Huat Ong, Tat-Seng Chua National University of Singapore SIGIR2006.

© Copyright 2008 STI INNSBRUCK Intelligent Systems Propositional Logic.

Toward an Open Source Textual Entailment Platform (Excitement Project) Bernardo Magnini (on behalf of the Excitement consortium) 1 STS workshop, NYC March.

Support Vector Machines and Kernel Methods for Co-Reference Resolution 2007 Summer Workshop on Human Language Technology Center for Language and Speech.

Towards Entailment Based Question Answering: ITC-irst at Clef 2006 Milen Kouylekov, Matteo Negri, Bernardo Magnini & Bonaventura Coppola ITC-irst, Centro.

Finding document topics for improving topic segmentation Source: ACL2007 Authors: Olivier Ferret (18 route du Panorama, BP6) Reporter:Yong-Xiang Chen.

Data Mining and Decision Support

SALSA-WS 09/05 Approximating Textual Entailment with LFG and FrameNet Frames Aljoscha Burchardt, Anette Frank Computational Linguistics Department Saarland.

Carnegie Mellon School of Computer Science Language Technologies Institute CMU Team-1 in TDT 2004 Workshop 1 CMU TEAM-A in TDT 2004 Topic Tracking Yiming.

1 Propositional Logic Limits The expressive power of propositional logic is limited. The assumption is that everything can be expressed by simple facts.

Department of Computer Science The University of Texas at Austin USA Joint Entity and Relation Extraction using Card-Pyramid Parsing Rohit J. Kate Raymond.

Relation Extraction (RE) via Supervised Classification See: Jurafsky & Martin SLP book, Chapter 22 Exploring Various Knowledge in Relation Extraction.

Logical Agents. Inference : Example 1 How many variables? 3 variables A,B,C How many models? 2 3 = 8 models.

Logical Agents. Outline Knowledge-based agents Logic in general - models and entailment Propositional (Boolean) logic Equivalence, validity, satisfiability.

Korean version of GloVe Applying GloVe & word2vec model to Korean corpus speaker : 양희정 date :

Recognizing Partial Textual Entailment

Automatic Detection of Causal Relations for Question Answering

Overfitting and Underfitting

CS246: Information Retrieval

Extracting Why Text Segment from Web Based on Grammar-gram

Presentation transcript:

Knowledge and Tree-Edits in Learnable Entailment Proofs Asher Stern and Ido Dagan (earlier partial version by Roy Bar-Haim) Download at:

Transformation-based Inference Sequence of transformations (A proof)  Tree-Edits Complete proofs – by limited pre-defined set of operations Estimate confidence in each operation  Knowledge based Entailment Rules Arbitrary knowledge-based transformations Formalize many types of knowledge 2 T = T 0 → T 1 → T 2 →... → T n = H

3

Transformation based RTE - Example T = T 0 → T 1 → T 2 →... → T n = H Text: The boy was located by the police. Hypothesis: Eventually, the police found the child. 4

Transformation based RTE - Example T = T 0 → T 1 → T 2 →... → T n = H Text: The boy was located by the police. The police located the boy. The police found the boy. The police found the child. Hypothesis: Eventually, the police found the child. 5

Transformation based RTE - Example T = T 0 → T 1 → T 2 →... → T n = H 6

7 B IU T EE ’s Inference Formalism Analogy to logic proof systems: Parse TreesPropositions Tree transformation/generationInference Steps Sequence of generated trees: T … T i … HProof

BIUTEE Goals Rely on Entailment Rules  Supported by many types of knowledge Tree Edits  Allow complete proofs BIUTEE  Integrates the benefits of both  Estimate confidence of both 8

Challenges / System Components 1. generate linguistically motivated complete proofs? 2. estimate proof confidence? 3. find the best proof? 4. learn the model parameters? How to … 9

1. Generate linguistically motivated complete proofs 10

Knowledge-based Entailment Rules boy child Generic Syntactic Lexical Syntactic Lexical Bar-Haim et al Semantic inference at the lexical-syntactic level.

Extended Tree Edits (On The Fly Operations) Predefined custom tree edits  Insert node on the fly  Move node / move sub-tree on the fly  Flip part of speech  … Heuristically capture linguistic phenomena  Operation definition  Features – to estimate confidence 12

Proof over Parse Trees - Example T = T 0 → T 1 → T 2 →... → T n = H Text: The boy was located by the police. Passive to active The police located the boy. X locate Y  X find Y The police found the boy. Boy  child The police found the child. Tree-edit insertion Hypothesis: Eventually, the police found the child. 13

14 Co-reference Substitution For co-referring subtrees S 1, S 2 :  Copy source tree containing S 1 while replacing it with S 2 My brother is a musician. He plays the drums. be verb ROOT i musician noun brother noun subj my noun gen a det det pred play verb drum noun the det obj ROOT i he noun det subj play verb drum noun the det obj ROOT i det subj My brother plays the drums. brother noun my noun gen 

2. Estimate proof confidence 15

Cost based Model (Variant of Raina et al., 2005) Define operation cost  Represent each operation as a feature vector  Cost is linear combination of feature values Define proof cost as the sum of the operations’ costs Classify: entailment if and only if proof cost is lower than a threshold 16

Feature vector representation Define operation feature value  Represent each operation as a feature vector Features (Insert-Named-Entity, Insert-Verb, …, WordNet, Lin, DIRT, …) The police located the boy. DIRT: X locate Y  X find Y (score = 0.9) The police found the boy. (0,0,…,0.257,…,0) (0,0,…,0,…,0) Feature vector that represents the operation 17 An operation A downward function of score

Cost based Model Define operation cost – Cost is standard linear combination of feature values Cost = weight-vector * feature-vector Weight-vector is learned automatically 18

Confidence Model Define operation cost  Represent each operation as a feature vector Define proof cost as the sum of the operations’ costs Cost of proof Weight vector Vector represents the proof. Define

Feature vector representation - example T = T 0 → T 1 → T 2 →... → T n = H (0,0,……………….……..,1,0) (0,0,………..……0.457,..,0,0) (0,0,..…0.5,.……….……..,0,0) (0,0,1,……..…….…..…....,0,0) (0,0, …0.457,....…1,0) = 20 Text: The boy was located by the police. Passive to active The police located the boy. X locate Y  X find Y The police found the boy. Boy  child The police found the child. Insertion on the fly Hypothesis: Eventually, the police found the child.

Cost based Model Define operation cost  Represent each operation as a feature vector Define proof cost as the sum of the operations’ costs Classify: “entailing” if and only if proof cost is smaller than a threshold 21 Learn

3. Find the best proof 22

Search the best proof 23 Proof #1 Proof #2 Proof #3 Proof #4 T - H

Search the best proof 24 Need to consider the “best” proof for the positive pairs “Best Proof” = proof with lowest cost ‒ Assuming a weight vector is given Search space exponential – AI-style search (ACL-12) ‒ Gradient-based evaluation function ‒ Local lookahead for “complex” operations Proof #1 Proof #2 Proof #3 Proof #4 T  H Proof #1 Proof #2 Proof #3 Proof #4 T  H

4. Learn model parameters 25

Learning Goal: Learn parameters (w, b) Use a linear learning algorithm  logistic regression 26

27 Inference vs. Learning Training samples Vector representation Learning algorithm w,b Best Proofs Feature extraction

28 Inference vs. Learning Training samples Vector representation Learning algorithm w,b Best Proofs Feature extraction

29 Iterative Learning Scheme Training samples Vector representation Learning algorithm w,b Best Proofs 1. W=reasonable guess 2. Find the best proofs 3. Learn new w and b 4. Repeat to step 2

Summary- System Components 1. Generate syntactically motivated complete proofs?  Entailment rules  On the fly operations (Extended Tree Edit Operations) 2. Estimate proof validity?  Confidence Model 3. Find the best proof?  Novel search Algorithm 4. Learn the model parameters?  Iterative Learning Scheme How to 30

Results RTE SystemRTE-1RTE-2RTE-3RTE-5 Raina et al Harmeling, Wang and Manning, Bar-Haim et al., Mehdad and Magnini, Our System Text: Hypothesis: Text: Hypothesis: Evaluation by accuracy – comparison with transformation-based systems

Results RTE 6 32 RTE 6 (F1%) Base line (Use IR top-5 relevance)34.63 Median (2010)36.14 Best (2010)48.01 Our system49.54 Natural distribution of entailments Evaluation by Recall / Precision / F1

Hybrid mode: Transformation and Similarity 33

Hybrid Mode Recall: Entailment rules are insufficient in generating a complete proof of H from T. The problem:  Incomplete proofs are insufficient for entailment decision, neither are they comparable with each other. Learning & Inference cannot be performed. Solutions:  First solution – complete proof: incorporate (less reliable) on-the-fly transformations, to “force” a proof, even if it’s incorrect  Second solution: hybrid mode 34

Hybrid Mode Generate partial proofs, using only the more reliable transformations Jointly estimate cost of transformations and remaining gap Note: feature vectors of incomplete proofs + remaining gaps are comparable! 35 Features(WordNet, Lin, DIRT, …, Missing predicate, predicate mismatch …) Features (WordNet, Lin, DIRT, …, Insert-Named-Entity, Insert-Verb, … ) Feature vector in pure mode Feature vector in hybrid mode

Hybrid Mode: feature categories Lexical features 1. H’s words that are missing in T predicate-argument features 1. Argument in H missing from T 2. Same argument is connected to a different predicate in T and H 3. Argument partial match (e.g. “boy” vs. “little boy”) 4. Predicate in H missing from T 36

Hybrid Mode: discussion Hybrid mode pros.  Clear and explainable proofs Hybrid mode cons.  Cannot chain on-the-fly transformations required to subsequently apply entailment rules.  Abstraction of T and H, so some gaps are not detected. 37

Hybrid Mode: example T: The death of Allama Hassan Turabi is likely to raise tensions …muslim … He was the leader of a Shia party, Islami Tehreek Pakistan … H: Allama Hassan Turabi was a member of the Pakistani Shia Muslim in Karachi. Pure mode  Co-reference: He  Turabi  DIRT: X is leader of Y  X is member of Y  5 insertions (“a”, “the”, “pakistani”, “in” “karachi”)  One move (“muslim” under “member”) Hybrid mode:  Co-reference and DIRT as in pure mode  Gap detected: “Pakistani Shia Muslim” is not connected to the predicate “member” 38

Hybrid Mode: results Accuracy %F1 % RTE-1RTE-2RTE-3RTE-5RTE-6RTE-7 Pure mode Hybrid mode Run with the 6 resources that were most commonly applied. Takeouts  Pure mode usually performs better that hybrid mode  However, hybrid mode proofs are clean and explainable, in contrast to pure mode. Deserve further exploration, possibly in conjunction with some on-the-fly operations.

Conclusions – The BIUTEE Inference Engine Inference as proof over parse trees  Natural to incorporate many inference types Results - close to best or best on RTEs Open Source  Configurable  Extensible  Visual tracing  Support 40

Adding extra-linguistic inferences Some tasks may benefit from extra-linguistic “expert” inferences  Temporal / arithmetic / spatial reasoning / … 2 soldiers and a civilian => 3 people Need to integrate with primary inference over language structures  “Expert” may detect on the fly inferences that would bridge text and hypothesis,  Interleaved within tree-generation process 41

Slide from Inderjeet Mani 42