A Confidence Model for Syntactically-Motivated Entailment Proofs
Asher Stern & Ido Dagan
ISCOL, June 2011, Israel
Recognizing Textual Entailment (RTE)
Given a text T and a hypothesis H: does T entail H?
Example:
T: An explosion caused by gas took place at a Taba hotel.
H: A blast occurred at a hotel in Taba.
Proof Over Parse Trees
T = T_0 → T_1 → T_2 → … → T_n = H
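To make the chain concrete, here is a minimal Python sketch of a proof as a sequence of transformation steps from T to H (the class names are illustrative, not taken from the paper's implementation):

```python
from dataclasses import dataclass, field

@dataclass
class ProofStep:
    """One transformation T_i -> T_{i+1}: the operation applied and the resulting tree."""
    operation: str        # e.g. "lexical rule: explosion -> blast"
    result_tree: object   # the parse tree T_{i+1} (tree type left abstract here)

@dataclass
class Proof:
    """A proof is a chain T = T_0 -> T_1 -> ... -> T_n = H."""
    source_tree: object                 # T_0, the parse tree of the text T
    steps: list = field(default_factory=list)

    def final_tree(self):
        """Return T_n; entailment is proven when T_n equals the hypothesis tree H."""
        return self.steps[-1].result_tree if self.steps else self.source_tree
```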
Bar Ilan Proof System: Entailment Rules
[Diagram of entailment rule types: lexical (e.g., explosion → blast), syntactic, lexical-syntactic, and generic rules]
Bar Ilan Proof System (example proof)
H: A blast occurred at a hotel in Taba.
An explosion caused by gas took place at a Taba hotel
→ A blast caused by gas took place at a Taba hotel
→ A blast took place at a Taba hotel
→ A blast occurred at a Taba hotel
→ A blast occurred at a hotel in Taba = H
Each step applies a lexical or syntactic entailment rule.
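A lexical rule such as explosion → blast can be viewed as a node substitution over the parse tree. The sketch below is purely illustrative; the `DepNode` class and rule format are assumptions, not the Bar Ilan system's actual API:

```python
class DepNode:
    """A dependency-tree node: a word plus its children."""
    def __init__(self, word, children=None):
        self.word = word
        self.children = children or []

def apply_lexical_rule(node, lhs, rhs):
    """Return a copy of the tree with every occurrence of `lhs` rewritten to `rhs`."""
    new_word = rhs if node.word == lhs else node.word
    return DepNode(new_word, [apply_lexical_rule(c, lhs, rhs) for c in node.children])

# "An explosion took place" --(explosion -> blast)--> "A blast took place"
t = DepNode("took", [DepNode("explosion", [DepNode("An")]), DepNode("place")])
t1 = apply_lexical_rule(t, "explosion", "blast")
```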
Tree-Edit-Distance
Insurgents attacked soldiers → Soldiers were attacked by insurgents
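For readers who want to experiment, here is a toy tree-edit-distance computation over this pair, assuming the third-party `zss` package (an implementation of the Zhang-Shasha algorithm); the hand-built dependency trees are illustrative simplifications:

```python
# pip install zss  (third-party Zhang-Shasha tree edit distance)
from zss import Node, simple_distance

# Toy dependency trees for the active and passive sentences.
active = (Node("attacked")
          .addkid(Node("insurgents"))
          .addkid(Node("soldiers")))

passive = (Node("attacked")
           .addkid(Node("soldiers"))
           .addkid(Node("were"))
           .addkid(Node("by").addkid(Node("insurgents"))))

# Minimal number of node insertions, deletions, and relabelings between the trees.
print(simple_distance(active, passive))
```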
Proof over Parse Trees
Which steps?
– Tree edits: regular or custom
– Entailment rules
How to classify?
– Decide "yes" if and only if a proof was found
– But then the answer is almost always "no", and knowledge inaccuracies cannot be handled
– Better: estimate the confidence that the proof is correct
Proof Systems
TED-based:
– Estimate the cost of a proof
– Complete proofs
– Arbitrary operations
– Limited knowledge
Entailment-rules-based:
– Linguistically motivated
– Rich knowledge
– No estimation of proof correctness
– Incomplete proofs: mixed systems with ad-hoc approximate-match criteria
Our System: the benefits of both worlds, and more!
– Linguistically motivated complete proofs
– Confidence model
Our Method
1. Complete proofs – on-the-fly operations
2. Cost model
3. Learning model parameters
On-the-Fly Operations
– Insert node on the fly
– Move node / move sub-tree on the fly
– Flip part of speech
– Etc.
More syntactically motivated than tree edits. These operations are not linguistically justified, but their impact on proof correctness can be estimated by the cost model (see the sketch below).
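One way to picture how unjustified operations can still be priced: each on-the-fly operation is counted by its own feature, and the learned weight on that feature estimates its impact on correctness. A hypothetical sketch (the operation names and feature layout are illustrative):

```python
from collections import Counter

# Each on-the-fly operation maps to one feature counting how often it was used.
ON_THE_FLY = ["insert_node", "move_subtree", "flip_pos"]

def proof_features(proof_operations):
    """Count operation uses; the cost model later weighs each count."""
    counts = Counter(proof_operations)
    return [counts[op] for op in ON_THE_FLY]

# A proof that inserted two nodes and flipped one part of speech:
f = proof_features(["insert_node", "insert_node", "flip_pos"])  # -> [2, 0, 1]
```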
Cost Model
The idea:
1. Represent the proof as a feature vector
2. Use the vector in a learning algorithm
Cost Model
Represent a proof as a feature vector: F(P) = (F_1, F_2, …, F_D)
Define a weight vector: w = (w_1, w_2, …, w_D)
Define the proof cost: cost(P) = w · F(P)
Classify a proof as entailing iff cost(P) < b, where b is a threshold
Learn the parameters (w, b)
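A minimal sketch of this linear cost model in Python; the weight and feature values are made-up numbers, purely for illustration:

```python
import numpy as np

def proof_cost(w, features):
    """cost(P) = w . F(P): a linear cost over the proof's feature vector."""
    return float(np.dot(w, features))

def classify(w, b, features):
    """Declare entailment iff the proof's cost is below the threshold b."""
    return proof_cost(w, features) < b

w = np.array([1.5, 0.3, 2.0])          # learned per-feature weights
F = np.array([2.0, 0.0, 1.0])          # F(P) for some proof P
print(classify(w, b=4.0, features=F))  # cost = 5.0 -> False (not entailing)
```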
Search Algorithm
Need to find the "best" proof
– "Best proof" = proof with lowest cost, assuming a weight vector is given
Search space is exponential → pruning (see the sketch below)
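The slides only say that the exponential space is pruned. A generic beam-search sketch, assuming an abstract proof-state interface (`expand`, `is_goal`, and `cost` are illustrative hooks, not the system's actual API), shows one standard way to do this:

```python
import heapq

def beam_search(initial, expand, is_goal, cost, beam=50, max_steps=20):
    """Keep only the `beam` cheapest partial proofs at each step (pruning)."""
    frontier, best, best_cost = [initial], None, float("inf")
    for _ in range(max_steps):
        # Expand every surviving partial proof by one more operation.
        candidates = [s for state in frontier for s in expand(state)]
        for s in candidates:
            if is_goal(s) and cost(s) < best_cost:
                best, best_cost = s, cost(s)
        # Prune the exponential space down to the cheapest `beam` states.
        frontier = heapq.nsmallest(beam, candidates, key=cost)
        if not frontier:
            break
    return best
```

A larger beam trades runtime for a better chance of finding the truly lowest-cost proof.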
Parameter Estimation
Goal: find a good weight vector and threshold (w, b)
Use a standard machine-learning algorithm (logistic regression or linear SVM)
But there is a chicken-and-egg problem: training samples are not given as feature vectors
– The learning algorithm requires training samples
– Constructing training samples (by finding best proofs) requires a weight vector
– The weight vector is what the learning algorithm produces
Solution: iterative learning
Parameter Estimation
[Diagram of the iterative loop: weight vector → training samples → learning algorithm → new weight vector]
Parameter Estimation
1. Start with w_0, a reasonable guess for the weight vector
2. i = 0
3. Repeat until convergence:
   a. Find the best proofs and construct feature vectors, using w_i
   b. Use a linear ML algorithm to find a new weight vector, w_{i+1}
   c. i = i + 1
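A compact sketch of this loop, assuming scikit-learn's LogisticRegression and a hypothetical `find_best_proof_features` hook that stands in for the proof search of step 3a; the fixed iteration count simplifies the convergence test:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def learn_weights(pairs, labels, find_best_proof_features, w0, iterations=10):
    """Alternate between proof search (needs w) and weight learning (produces w).

    `find_best_proof_features(pair, w)` is a hypothetical hook: it runs the
    proof search under weight vector `w` and returns the best proof's F(P).
    """
    w = w0
    for _ in range(iterations):
        # Step 3a: build training vectors from the best proofs under the current w.
        X = np.array([find_best_proof_features(p, w) for p in pairs])
        # Step 3b: fit a linear model to obtain the next weight vector.
        clf = LogisticRegression().fit(X, labels)
        w = clf.coef_[0]
    # The intercept plays the role of the classification threshold b.
    return w, clf.intercept_[0]
```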
Results
Accuracy on the RTE benchmarks (RTE-1, RTE-2, RTE-3, RTE-5), compared against:
– Logical Resolution Refutation (Raina et al., 2005): 57.0 on RTE-1
– Probabilistic Calculus of Tree Transformations (Harmeling, 2009)
– Probabilistic Tree Edit Model (Wang and Manning, 2010)
– Deterministic Entailment Proofs (Bar-Haim et al., 2007)
– Our System
Learned operation statistics (average count in positive pairs, average count in negative pairs, and their ratio) for:
– Insert Named Entity
– Insert Content Word
– DIRT
– Change "subject" to "object" and vice versa
– Flip Part-of-speech
– Lin similarity
– WordNet
Conclusions
1. Linguistically motivated proofs – complete proofs
2. Cost model – estimation of proof correctness
3. Search for the best proof
4. Learning of parameters
5. Results – reasonable behavior of the learning scheme
Thank you
Q & A