
1 Knowledge and Tree-Edits in Learnable Entailment Proofs
Asher Stern and Ido Dagan (earlier partial version by Roy Bar-Haim)
Download at: http://www.cs.biu.ac.il/~nlp/downloads/biutee

2 Transformation-based Inference
A proof is a sequence of transformations: T = T0 → T1 → T2 → ... → Tn = H
Tree-Edits:
- Complete proofs via a limited pre-defined set of operations
- Estimate confidence in each operation
Knowledge-based Entailment Rules:
- Arbitrary knowledge-based transformations
- Formalize many types of knowledge
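A minimal sketch of this formalism (illustrative Python, not BIUTEE's actual API; parse trees are simplified to plain strings here): a proof is just an ordered list of transformations applied to T until H is produced.

```python
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class Transformation:
    name: str                      # e.g. "passive-to-active" or a DIRT rule
    apply: Callable[[str], str]    # maps T_i to T_{i+1}

def run_proof(text: str, steps: List[Transformation]) -> List[str]:
    """Return the generated sequence T = T0 -> T1 -> ... -> Tn."""
    trees = [text]
    for step in steps:
        trees.append(step.apply(trees[-1]))
    return trees
```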

3 [image-only slide; no text content]

4 Transformation-based RTE - Example
T = T0 → T1 → T2 → ... → Tn = H
Text: The boy was located by the police.
Hypothesis: Eventually, the police found the child.

5 Transformation-based RTE - Example
T = T0 → T1 → T2 → ... → Tn = H
Text: The boy was located by the police.
→ The police located the boy.
→ The police found the boy.
→ The police found the child.
Hypothesis: Eventually, the police found the child.

6 Transformation-based RTE - Example
T = T0 → T1 → T2 → ... → Tn = H
[diagram slide]

7 BIUTEE's Inference Formalism
Analogy to logic proof systems:

BIUTEE                                      | Logic proof systems
Parse trees                                 | Propositions
Tree transformation/generation              | Inference steps
Sequence of generated trees: T ... Ti ... H | Proof

8 BIUTEE Goals
Rely on Entailment Rules:
- Supported by many types of knowledge
Tree Edits:
- Allow complete proofs
BIUTEE:
- Integrates the benefits of both
- Estimates the confidence of both

9 Challenges / System Components
How to ...
1. generate linguistically motivated complete proofs?
2. estimate proof confidence?
3. find the best proof?
4. learn the model parameters?

10 1. Generate linguistically motivated complete proofs

11 Knowledge-based Entailment Rules
Example: boy → child
Rule types: generic, syntactic, lexical-syntactic, lexical.
Bar-Haim et al. 2007. Semantic inference at the lexical-syntactic level.
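For illustration, a toy sketch of applying a lexical entailment rule such as boy → child to a simplified parse tree; the Node and LexicalRule types are hypothetical stand-ins, not BIUTEE's representation.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Node:
    lemma: str
    pos: str
    children: List["Node"] = field(default_factory=list)

@dataclass
class LexicalRule:
    lhs: str      # left-hand-side lemma, e.g. "boy"
    rhs: str      # entailed lemma, e.g. "child"
    score: float  # rule confidence from the knowledge resource

def apply_lexical_rule(tree: Node, rule: LexicalRule) -> None:
    """Rewrite every matching lemma in place (one transformation step)."""
    if tree.lemma == rule.lhs:
        tree.lemma = rule.rhs
    for child in tree.children:
        apply_lexical_rule(child, rule)
```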

12 Extended Tree Edits (On-the-Fly Operations)
Predefined custom tree edits:
- Insert node on the fly
- Move node / move sub-tree on the fly
- Flip part of speech
- ...
Heuristically capture linguistic phenomena:
- Operation definition
- Features - to estimate confidence
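A sketch of one such on-the-fly operation under the same toy tree type: inserting a node for a hypothesis word (e.g. "Eventually") that no knowledge rule can derive from T. The representation is assumed for illustration.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Node:
    lemma: str
    children: List["Node"] = field(default_factory=list)

def insert_on_the_fly(parent: Node, missing_lemma: str) -> Node:
    """Attach a new node under `parent`, as when H contains a word that
    appears nowhere in the current tree and no entailment rule covers it."""
    new_node = Node(missing_lemma)
    parent.children.append(new_node)
    return new_node
```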

13 Proof over Parse Trees - Example
T = T0 → T1 → T2 → ... → Tn = H
Text: The boy was located by the police.
(Passive to active) The police located the boy.
(X locate Y → X find Y) The police found the boy.
(boy → child) The police found the child.
(Tree-edit insertion) Hypothesis: Eventually, the police found the child.

14 Co-reference Substitution
For co-referring subtrees S1, S2: copy the source tree containing S1 while replacing it with S2.
Example: "My brother is a musician. He plays the drums." → "My brother plays the drums."
[dependency-tree diagrams omitted]
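A toy sketch of the substitution itself, under an assumed simplified node type: copy the sentence tree while replacing the pronoun subtree S1 with the co-referring antecedent subtree S2.

```python
import copy
from dataclasses import dataclass, field
from typing import List

@dataclass
class Node:
    lemma: str
    children: List["Node"] = field(default_factory=list)

def substitute(tree: Node, pronoun: str, antecedent: Node) -> Node:
    """Copy `tree`, replacing each node whose lemma is `pronoun` with a
    copy of the co-referring subtree `antecedent`."""
    if tree.lemma == pronoun:
        return copy.deepcopy(antecedent)
    return Node(tree.lemma,
                [substitute(c, pronoun, antecedent) for c in tree.children])

# "He plays the drums."  +  he -> "my brother"
he_tree = Node("play", [Node("he"), Node("drum", [Node("the")])])
brother = Node("brother", [Node("my")])
merged = substitute(he_tree, "he", brother)  # ~ "My brother plays the drums."
```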

15 2. Estimate proof confidence

16 Cost-based Model (variant of Raina et al., 2005)
Define operation cost:
- Represent each operation as a feature vector
- Cost is a linear combination of feature values
Define proof cost as the sum of the operations' costs.
Classify: entailment if and only if the proof cost is lower than a threshold.

17 Feature Vector Representation
Define operation feature values: represent each operation as a feature vector.
Features: (Insert-Named-Entity, Insert-Verb, ..., WordNet, Lin, DIRT, ...)
Example operation: "The police located the boy." --[DIRT: X locate Y → X find Y (score = 0.9)]--> "The police found the boy."
This operation is represented by a vector that is zero everywhere except the DIRT feature, e.g. (0, 0, ..., 0.257, ..., 0), whose value is a downward function of the rule score.
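A sketch of how one rule application might be mapped to a sparse feature vector. The slide only says the value is "a downward function of score"; the -log(score) choice below, like the feature list, is an assumption for illustration.

```python
import math

# Assumed feature inventory, mirroring the slide's list.
FEATURES = ["Insert-Named-Entity", "Insert-Verb", "WordNet", "Lin", "DIRT"]

def operation_vector(feature_name: str, score: float = None) -> list:
    """All-zero vector except the slot of the applied operation's feature."""
    vec = [0.0] * len(FEATURES)
    idx = FEATURES.index(feature_name)
    # -log(score) is one possible "downward function of score": higher
    # rule scores yield lower feature values, hence lower cost.
    vec[idx] = -math.log(score) if score is not None else 1.0
    return vec

dirt_step = operation_vector("DIRT", score=0.9)   # [0, 0, 0, 0, ~0.105]
```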

18 Cost-based Model
Define operation cost: cost is a standard linear combination of feature values.
Cost = weight-vector · feature-vector
The weight vector is learned automatically.

19 Confidence Model
Define operation cost: represent each operation as a feature vector.
Define proof cost as the sum of the operations' costs:
Cost(P) = w · Σi f(oi)
where f(oi) is the feature vector of operation oi and w is the weight vector; the summed vector Σi f(oi) represents the whole proof.
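A minimal sketch of the whole cost model: sum the operation vectors into a proof vector, take the dot product with the weight vector, and compare against the threshold. The weights and threshold values are invented for illustration.

```python
def proof_cost(weights, operation_vectors):
    """Sum the operation vectors into a proof vector, then dot with w."""
    proof_vector = [sum(col) for col in zip(*operation_vectors)]
    return sum(w * v for w, v in zip(weights, proof_vector))

def entails(weights, threshold, operation_vectors):
    """Classify as entailing iff the proof cost is below the threshold."""
    return proof_cost(weights, operation_vectors) < threshold

ops = [[0, 0, 0, 0, 1.0],      # e.g. a DIRT step
       [0, 0, 1.0, 0, 0],      # e.g. a WordNet step
       [1.0, 0, 0, 0, 0]]      # e.g. an insert-named-entity step
print(entails([0.7, 0.3, 0.2, 0.5, 1.0], 2.0, ops))   # True: cost = 1.9
```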

20 Feature Vector Representation - Example
T = T0 → T1 → T2 → ... → Tn = H
Text: The boy was located by the police.
(Passive to active) The police located the boy.          (0, 0, ..., 1, 0)
(X locate Y → X find Y) The police found the boy.        (0, 0, ..., 0.457, ..., 0, 0)
(boy → child) The police found the child.                (0, 0, ..., 0.5, ..., 0, 0)
(Insertion on the fly) Hypothesis: Eventually, the police found the child.   (0, 0, 1, ..., 0, 0)
Sum: (0, 0, 1, ..., 0.5, ..., 0.457, ..., 1, 0) - the vector representing the whole proof.

21 Cost-based Model
Define operation cost: represent each operation as a feature vector.
Define proof cost as the sum of the operations' costs.
Classify: "entailing" if and only if the proof cost is smaller than a threshold (the threshold is learned).

22 3. Find the best proof

23 Search the best proof
[diagram: multiple alternative proofs (Proof #1 ... Proof #4) lead from T to H]

24 Search the best proof
Need to consider the "best" proof for the positive pairs.
"Best proof" = the proof with the lowest cost, assuming a weight vector is given.
The search space is exponential, so an AI-style search is used (ACL-12):
- Gradient-based evaluation function
- Local lookahead for "complex" operations
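A rough best-first-search sketch of the "expand the cheapest partial proof first" idea; the actual ACL-12 algorithm (gradient-based evaluation function, local lookahead) is more involved, and all names here are illustrative.

```python
import heapq
import itertools

def find_best_proof(initial_tree, hypothesis, expand, cost_of):
    """`expand(tree)` yields (operation, new_tree) pairs; `cost_of(ops)`
    scores a partial proof. Returns the cheapest sequence reaching H."""
    tie = itertools.count()                 # tie-breaker for the heap
    frontier = [(0.0, next(tie), initial_tree, [])]
    visited = set()
    while frontier:
        cost, _, tree, ops = heapq.heappop(frontier)
        if tree == hypothesis:
            return ops
        if tree in visited:
            continue
        visited.add(tree)
        for op, new_tree in expand(tree):
            new_ops = ops + [op]
            heapq.heappush(frontier,
                           (cost_of(new_ops), next(tie), new_tree, new_ops))
    return None                             # no complete proof found
```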

25 4. Learn model parameters

26 Learning
Goal: learn the parameters (w, b).
Use a linear learning algorithm, e.g. logistic regression.

27 Inference vs. Learning
[diagram: training samples → best proofs → feature extraction → vector representation → learning algorithm → (w, b)]

29 Iterative Learning Scheme
1. w = reasonable guess
2. Find the best proofs for the training samples
3. Learn new w and b (vector representation → learning algorithm)
4. Repeat from step 2
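A sketch of this loop, assuming scikit-learn's logistic regression as the linear learner and a hypothetical find_best_proof_vector as a stand-in for the search component.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def train(pairs, labels, find_best_proof_vector, iterations=5, dim=5):
    w, b = np.ones(dim), 0.0                   # step 1: reasonable guess
    for _ in range(iterations):
        # step 2: find the best (lowest-cost) proof for each T-H pair
        # under the current weights, and take its proof feature vector
        X = np.array([find_best_proof_vector(t, h, w) for t, h in pairs])
        # step 3: learn new w and b from the proof vectors
        clf = LogisticRegression().fit(X, labels)
        w, b = clf.coef_[0], clf.intercept_[0]
        # step 4: repeat - new weights change which proofs are "best"
    return w, b
```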

30 Summary - System Components
How to ...
1. Generate syntactically motivated complete proofs? Entailment rules; on-the-fly operations (extended tree-edit operations)
2. Estimate proof validity? Confidence model
3. Find the best proof? Novel search algorithm
4. Learn the model parameters? Iterative learning scheme

31 Results: RTE 1-5
Evaluation by accuracy - comparison with transformation-based systems

System                   | RTE-1 | RTE-2 | RTE-3 | RTE-5
Raina et al. 2005        | 57.0  |       |       |
Harmeling, 2009          |       | 56.39 | 57.88 |
Wang and Manning, 2010   |       | 63.0  | 61.10 |
Bar-Haim et al., 2007    |       | 61.12 | 63.80 |
Mehdad and Magnini, 2009 | 58.62 | 59.87 | 62.4  | 60.2
Our System               | 57.13 | 61.63 | 67.13 | 63.50

32 Results: RTE 6
Natural distribution of entailments; evaluation by Recall / Precision / F1

System                            | RTE-6 (F1 %)
Baseline (use IR top-5 relevance) | 34.63
Median (2010)                     | 36.14
Best (2010)                       | 48.01
Our system                        | 49.54

33 Hybrid mode: Transformation and Similarity

34 Hybrid Mode
Recall: entailment rules alone are often insufficient for generating a complete proof of H from T.
The problem: incomplete proofs are insufficient for an entailment decision, and they are not comparable with each other, so learning and inference cannot be performed.
Solutions:
- First solution (complete proofs): incorporate (less reliable) on-the-fly transformations to "force" a proof, even if it is incorrect.
- Second solution: hybrid mode.

35 Hybrid Mode
Generate partial proofs, using only the more reliable transformations.
Jointly estimate the cost of the transformations and of the remaining gap.
Note: the feature vectors of incomplete proofs plus their remaining gaps are comparable!
Feature vector in pure mode: (WordNet, Lin, DIRT, ..., Insert-Named-Entity, Insert-Verb, ...)
Feature vector in hybrid mode: (WordNet, Lin, DIRT, ..., Missing predicate, Predicate mismatch, ...)

36 Hybrid Mode: Feature Categories
Lexical features:
1. H's words that are missing in T
Predicate-argument features:
1. Argument in H missing from T
2. The same argument is connected to different predicates in T and H
3. Argument partial match (e.g. "boy" vs. "little boy")
4. Predicate in H missing from T
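A toy sketch of the first lexical feature (H's words missing from T); the predicate-argument features would require parse trees, so only the word-level case is illustrated here.

```python
import re

def words(s):
    return set(re.findall(r"[a-z]+", s.lower()))

def missing_word_count(text, hypothesis):
    """Feature value: how many of H's words have no match in T."""
    return len(words(hypothesis) - words(text))

print(missing_word_count("The police found the child.",
                         "Eventually, the police found the child."))  # 1
```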

37 Hybrid Mode: Discussion
Pros:
- Clear and explainable proofs
Cons:
- Cannot chain the on-the-fly transformations that are required to subsequently apply entailment rules.
- T and H are abstracted, so some gaps are not detected.

38 Hybrid Mode: Example
T: The death of Allama Hassan Turabi is likely to raise tensions ... muslim ... He was the leader of a Shia party, Islami Tehreek Pakistan ...
H: Allama Hassan Turabi was a member of the Pakistani Shia Muslim in Karachi.
Pure mode:
- Co-reference: He → Turabi
- DIRT: X is leader of Y → X is member of Y
- 5 insertions ("a", "the", "pakistani", "in", "karachi")
- One move ("muslim" under "member")
Hybrid mode:
- Co-reference and DIRT as in pure mode
- Gap detected: "Pakistani Shia Muslim" is not connected to the predicate "member"

39 Hybrid Mode: Results
Run with the 6 most commonly applied resources.

Mode        | RTE-1 | RTE-2 | RTE-3 | RTE-5 | RTE-6 | RTE-7
Pure mode   | 55.88 | 60.50 | 65.63 | 63.17 | 47.40 | 41.95
Hybrid mode | 56.75 | 59.63 | 65.88 | 61.33 | 45.42 | 37.45
(RTE-1 to RTE-5: accuracy %; RTE-6 and RTE-7: F1 %)

Takeaways:
- Pure mode usually performs better than hybrid mode.
- However, hybrid-mode proofs are clean and explainable, in contrast to pure mode; hybrid mode deserves further exploration, possibly in conjunction with some on-the-fly operations.

40 Conclusions - The BIUTEE Inference Engine
Inference as proof over parse trees: natural to incorporate many inference types.
Results: close to best, or best, on the RTE benchmarks.
Open source:
- Configurable
- Extensible
- Visual tracing
- Support

41 Adding Extra-linguistic Inferences
Some tasks may benefit from extra-linguistic "expert" inferences: temporal / arithmetic / spatial reasoning / ...
Example: 2 soldiers and a civilian => 3 people
These need to be integrated with the primary inference over language structures:
- An "expert" may detect on-the-fly inferences that would bridge text and hypothesis,
- interleaved within the tree-generation process.
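A toy sketch of what an arithmetic "expert" might do: normalize counted person phrases so that "2 soldiers and a civilian" can match "3 people". The word lists and matching below are assumptions for illustration, not part of the system.

```python
import re

# Hypothetical lexicons for the toy example.
PERSON_NOUNS = {"soldier", "soldiers", "civilian", "civilians",
                "person", "people"}
WORD_NUMBERS = {"a": 1, "an": 1, "one": 1, "two": 2, "three": 3}

def count_people(phrase):
    """Sum the counts attached to person-denoting nouns in the phrase."""
    total = 0
    # the lookahead keeps overlapping (count, noun) word pairs
    for count, noun in re.findall(r"(\w+)\s+(?=(\w+))", phrase.lower()):
        if noun in PERSON_NOUNS:
            total += int(count) if count.isdigit() else WORD_NUMBERS.get(count, 0)
    return total

assert count_people("2 soldiers and a civilian") == 3   # bridges to "3 people"
```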

42 Slide from Inderjeet Mani [image slide]

