1 Knowledge and Tree-Edits in Learnable Entailment Proofs
Asher Stern, Amnon Lotan, Shachar Mirkin, Eyal Shnarch, Lili Kotlerman, Jonathan Berant and Ido Dagan
TAC November 2011, NIST, Gaithersburg, Maryland, USA
Download at:

2 RTE
Classify a (T,H) pair as ENTAILING or NON-ENTAILING
Example
T: The boy was located by the police.
H: Eventually, the police found the child.

3 Matching vs. Transformations
Matching
Sequence of transformations (a proof): T = T_0 → T_1 → T_2 → ... → T_n = H
– Tree-Edits
  Complete proofs
  Estimate confidence
– Knowledge-based Entailment Rules
  Linguistically motivated
  Formalize many types of knowledge

4 Transformation based RTE - Example
T = T_0 → T_1 → T_2 → ... → T_n = H
Text: The boy was located by the police.
Hypothesis: Eventually, the police found the child.

5 Transformation based RTE - Example
T = T_0 → T_1 → T_2 → ... → T_n = H
Text: The boy was located by the police.
The police located the boy.
The police found the boy.
The police found the child.
Hypothesis: Eventually, the police found the child.

6 Transformation based RTE - Example
T = T_0 → T_1 → T_2 → ... → T_n = H

7 BIUTEE Goals
Tree Edits
1. Complete proofs
2. Estimate confidence
Entailment Rules
3. Linguistically motivated
4. Formalize many types of knowledge
BIUTEE integrates the benefits of both worlds

8 Challenges / System Components
How to
1. generate linguistically motivated complete proofs?
2. estimate proof confidence?
3. find the best proof?
4. learn the model parameters?

9 1. Generate linguistically motivated complete proofs

10 Entailment Rules
Rule types: Generic Syntactic, Lexical Syntactic, Lexical (e.g. boy → child)
Bar-Haim et al. 2007. Semantic inference at the lexical-syntactic level.

11 Extended Tree Edits (On The Fly Operations)
Predefined custom tree edits
– Insert node on the fly
– Move node / move sub-tree on the fly
– Flip part of speech
– …
Heuristically capture linguistic phenomena
– Operation definition
– Features definition

12 Proof over Parse Trees - Example
T = T_0 → T_1 → T_2 → ... → T_n = H
Text: The boy was located by the police.
  Passive to active
The police located the boy.
  X locate Y → X find Y
The police found the boy.
  Boy → child
The police found the child.
  Insertion on the fly
Hypothesis: Eventually, the police found the child.

13 2. Estimate proof confidence

14 Cost based Model
Define operation cost
– Assesses the operation’s validity
– Represent each operation as a feature vector
– Cost is a linear combination of feature values
Define proof cost as the sum of the operations’ costs
Classify: entailment if and only if proof cost is smaller than a threshold
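The cost model on this slide can be sketched in a few lines of Python. This is a minimal illustration, not BIUTEE's code: the function names, the three-feature layout, and all numeric values are hypothetical.

```python
from typing import Sequence

Vector = Sequence[float]


def operation_cost(weights: Vector, features: Vector) -> float:
    """Cost of one operation: a linear combination of its feature values."""
    return sum(w * f for w, f in zip(weights, features))


def proof_cost(weights: Vector, operations: Sequence[Vector]) -> float:
    """Cost of a proof: the sum of the costs of its operations."""
    return sum(operation_cost(weights, op) for op in operations)


def classify(weights: Vector, operations: Sequence[Vector], threshold: float) -> str:
    """Entailing if and only if the proof cost is smaller than the threshold."""
    if proof_cost(weights, operations) < threshold:
        return "ENTAILING"
    return "NON-ENTAILING"


# Hypothetical weights and a hypothetical two-operation proof.
weights = [2.0, 1.0, 0.5]
proof = [
    [0.0, 1.0, 0.0],  # first operation fires feature 1
    [0.0, 0.0, 0.4],  # second operation fires feature 2 with value 0.4
]
print(proof_cost(weights, proof))
print(classify(weights, proof, threshold=2.0))
```

Both the weights and the threshold are given by hand here; in the system they are learned.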

15 Feature vector representation
Define operation cost
– Represent each operation as a feature vector
Features (Insert-Named-Entity, Insert-Verb, …, WordNet, Lin, DIRT, …)
An operation: The police located the boy. → The police found the boy.
DIRT: X locate Y → X find Y (score = 0.9)
Feature vector that represents the operation: (0,0,…,0.457,…,0) (0,0,…,0,…,0)
The feature value is a downward (decreasing) function of the score

16 Cost based Model
Define operation cost
– Cost is a linear combination of feature values
Cost = weight-vector * feature-vector
Weight-vector is learned automatically

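17 Confidence Model
Define operation cost
– Represent each operation as a feature vector
Define proof cost as the sum of the operations’ costs
Define the vector that represents the proof as the sum of its operations’ feature vectors: $f(P) = \sum_{i=1}^{n} f(o_i)$
Cost of proof: $\mathrm{cost}_w(P) = \sum_{i=1}^{n} w \cdot f(o_i) = w \cdot f(P)$, where $w$ is the weight vector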

18 Feature vector representation - example
T = T_0 → T_1 → T_2 → ... → T_n = H
Text: The boy was located by the police.
  Passive to active
The police located the boy.
  X locate Y → X find Y
The police found the boy.
  Boy → child
The police found the child.
  Insertion on the fly
Hypothesis: Eventually, the police found the child.
Sum of the operations’ feature vectors:
  (0,0,……………….………..,1,0)
+ (0,0,………..……0.457,..,0,0)
+ (0,0,..…0.5,.……….……..,0,0)
+ (0,0,1,……..…….…..…....,0,0)
= (0,0,1..0.5..…0.457,....…1,0)
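A small sketch of why summing the operation vectors is enough: the cost of the summed vector equals the sum of the per-operation costs. The non-zero values (1, 0.457, 0.5, 1) come from the slide; the five-dimensional layout, the assignment of each vector to an operation, and the weights are assumptions made only for illustration.

```python
def add_vectors(vectors):
    """Component-wise sum of equally sized feature vectors."""
    return [sum(column) for column in zip(*vectors)]


def dot(weights, features):
    """Linear cost: weight vector times feature vector."""
    return sum(w * f for w, f in zip(weights, features))


# Hypothetical 5-feature layout; the real vectors are much longer.
operation_vectors = [
    [0.0, 0.0, 0.0, 0.0, 1.0],    # assumed: passive to active
    [0.0, 0.0, 0.0, 0.457, 0.0],  # assumed: X locate Y -> X find Y (DIRT)
    [0.0, 0.0, 0.5, 0.0, 0.0],    # assumed: boy -> child
    [0.0, 1.0, 0.0, 0.0, 0.0],    # assumed: insertion on the fly
]
weights = [0.3, 2.0, 1.0, 1.0, 0.1]  # hypothetical weight vector

proof_vector = add_vectors(operation_vectors)
cost_from_sum = dot(weights, proof_vector)
cost_per_operation = sum(dot(weights, v) for v in operation_vectors)

print(proof_vector)
print(cost_from_sum, cost_per_operation)
```

Because the cost is linear, a proof can be stored as a single vector and scored with one dot product.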

19 Cost based Model
Define operation cost
– Represent each operation as a feature vector
Define proof cost as the sum of the operations’ costs
Classify: “entailing” if and only if proof cost is smaller than a threshold
Learn: the weight vector and the threshold

20 3. Find the best proof

21 Search for the best proof
[Figure: alternative proofs (Proof #1 to Proof #4) leading from T to H]

22 Search for the best proof
Need to find the “best” proof
“Best Proof” = proof with lowest cost
‒ Assuming a weight vector is given
Search space is exponential
‒ AI style search algorithm
[Figure: alternative proofs (Proof #1 to Proof #4) leading from T to H]
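As a toy illustration of searching for the lowest-cost proof, the sketch below runs uniform-cost search over string rewrites. The slides only say "AI style search algorithm", so uniform-cost search is a stand-in chosen for simplicity, not the system's actual algorithm. The string-level rules, their costs, and the function name are hypothetical; the real system operates on parse trees.

```python
import heapq


def find_best_proof(text, hypothesis, rules, max_steps=10):
    """Uniform-cost search; rules are (name, old, new, cost) string rewrites."""
    frontier = [(0.0, text, [])]  # (cost so far, current sentence, rule names applied)
    best_seen = {text: 0.0}
    while frontier:
        cost, state, proof = heapq.heappop(frontier)
        if state == hypothesis:
            return cost, proof
        # Skip stale queue entries and proofs that are already too long.
        if cost > best_seen.get(state, float("inf")) or len(proof) >= max_steps:
            continue
        for name, old, new, rule_cost in rules:
            if old in state:
                next_state = state.replace(old, new, 1)
                new_cost = cost + rule_cost
                if new_cost < best_seen.get(next_state, float("inf")):
                    best_seen[next_state] = new_cost
                    heapq.heappush(frontier, (new_cost, next_state, proof + [name]))
    return None  # no proof found within the explored space


text = "The boy was located by the police."
hypothesis = "Eventually, the police found the child."
rules = [  # hypothetical costs
    ("passive to active", "The boy was located by the police.",
     "The police located the boy.", 0.1),
    ("X locate Y -> X find Y", "located", "found", 0.5),
    ("boy -> child", "boy", "child", 0.3),
    ("insert 'Eventually,'", "The police", "Eventually, the police", 1.0),
]

print(find_best_proof(text, hypothesis, rules))
```

The queue always expands the cheapest partial proof first, so the first time the hypothesis is popped its proof has the lowest total cost among those the rules can produce.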

23 4. Learn model parameters

24 Learning
Goal: Learn parameters (w, b)
Use a linear learning algorithm
– logistic regression, SVM, etc.

25 Inference vs. Learning
[Diagram labels: Training samples, Best Proofs, Feature extraction, Vector representation, Learning algorithm, w,b]

27 Iterative Learning Scheme
[Diagram labels: Training samples, Best Proofs, Vector representation, Learning algorithm, w,b]
1. w = reasonable guess
2. Find the best proofs
3. Learn new w and b
4. Repeat from step 2
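The loop above can be sketched as follows. This is a toy under stated assumptions: each training pair comes with a fixed, hand-made list of candidate proof vectors (standing in for the search step), the learner is a tiny batch logistic regression written for this example, and all names and numbers are hypothetical.

```python
import math


def dot(w, f):
    return sum(wi * fi for wi, fi in zip(w, f))


def best_proof(w, candidates):
    """Step 2: lowest-cost candidate proof vector under the current weights."""
    return min(candidates, key=lambda f: dot(w, f))


def fit_linear(vectors, labels, dim, lr=0.1, epochs=2000):
    """Step 3: batch logistic regression with P(entailing) = sigmoid(b - w.f)."""
    w, b = [0.0] * dim, 0.0
    n = len(vectors)
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * dim, 0.0
        for f, y in zip(vectors, labels):
            p = 1.0 / (1.0 + math.exp(-(b - dot(w, f))))
            err = y - p
            grad_b += err
            for j in range(dim):
                grad_w[j] -= err * f[j]  # higher cost must lower P(entailing)
        b += lr * grad_b / n
        w = [wj + lr * gj / n for wj, gj in zip(w, grad_w)]
    return w, b


def iterative_learning(samples, dim, rounds=3):
    w, b = [1.0] * dim, 0.0  # step 1: initial guess (assumed: uniform costs)
    for _ in range(rounds):  # step 4: repeat from step 2
        vectors = [best_proof(w, candidates) for candidates, _ in samples]
        labels = [label for _, label in samples]
        w, b = fit_linear(vectors, labels, dim)
    return w, b


# Hypothetical training pairs: (candidate proof vectors, 1 = entailing, 0 = not).
samples = [
    ([[1.0, 0.0], [3.0, 3.0]], 1),
    ([[0.0, 1.0], [3.0, 3.0]], 1),
    ([[4.0, 0.0], [6.0, 6.0]], 0),
    ([[0.0, 4.0], [6.0, 6.0]], 0),
]
w, b = iterative_learning(samples, dim=2)
for candidates, label in samples:
    cost = dot(w, best_proof(w, candidates))
    print(label, cost < b)
```

The circular dependency is the point of the scheme: the best proofs depend on w, and w is learned from the vectors of the best proofs, so the two steps alternate.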

28 Summary - System Components
How to
1. Generate syntactically motivated complete proofs?
– Entailment rules
– On the fly operations (Extended Tree Edit Operations)
2. Estimate proof validity?
– Confidence Model
3. Find the best proof?
– Search Algorithm
4. Learn the model parameters?
– Iterative Learning Scheme

29 Results RTE7
ID   | Knowledge Resources                                                          | Precision % | Recall % | F1 %
BIU1 | WordNet, Directional Similarity                                              | 38.97       | 47.40    | 42.77
BIU2 | WordNet, Directional Similarity, Wikipedia                                   | 41.81       | 44.11    | 42.93
BIU3 | WordNet, Directional Similarity, Wikipedia, FrameNet, Geographical database  | 39.26       | 45.95    | 42.34

BIUTEE 2011 on RTE 6 (F1 %)
Base line (Use IR top-5 relevance) | 34.63
Median (September 2010)            | 36.14
Best (September 2010)              | 48.01
Our system                         | 49.54
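The F1 column can be checked against the precision and recall columns with the standard harmonic-mean formula. The snippet below is only a consistency check on the RTE7 numbers reported above; the dictionary and function names are illustrative.

```python
def f1(precision, recall):
    """Harmonic mean of precision and recall (both given in percent)."""
    return 2 * precision * recall / (precision + recall)


# (precision %, recall %) per submitted run, as reported on the slide.
runs = {
    "BIU1": (38.97, 47.40),
    "BIU2": (41.81, 44.11),
    "BIU3": (39.26, 45.95),
}
for run_id, (precision, recall) in runs.items():
    print(run_id, round(f1(precision, recall), 2))
```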

30 Conclusions
Inference via sequence of transformations
– Knowledge
– Extended Tree Edits
Proof confidence estimation
Results
– Better than median on RTE7
– Best on RTE6
Open Source

31 Thank You

