Syntactic Contributions in the Entailment Task Lucy Vanderwende, Arul Menezes, Rion Snow (Stanford)
RTE-1 analysis
Recap of MSR’s manual analysis of RTE-1 test data; in principle, 74% is achievable using syntax and thesaurus.

              Without thesaurus   Using thesaurus
True          69 (9%)             147 (18%)
False         197 (25%)           243 (30%)
Not syntax    534 (67%)           410 (51%)
MENT algorithm
Predicting negative entailment using syntactic features:
–Obtain syntactic dependency graphs for T and H sentences
–Attempt to align each H node to a node in T
–Check syntactic heuristics on aligned nodes
–If match, then predict false
–If no match, use lexical similarity model (with threshold)
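The control flow of the steps above can be sketched roughly as follows. This is only an illustration: `Node`, `align`, `lexical_similarity`, and the default `threshold` are hypothetical stand-ins (the slides do not describe the actual aligner or similarity model), and alignment is reduced here to exact lemma matching.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class Node:
    """Hypothetical minimal dependency-graph node."""
    lemma: str


# Maps each H node index to a T node index, or None when unaligned.
Alignment = Dict[int, Optional[int]]
Heuristic = Callable[[List[Node], List[Node], Alignment], bool]


def align(t_nodes: List[Node], h_nodes: List[Node]) -> Alignment:
    """Toy aligner: exact lemma match (stand-in for the real aligner)."""
    first_index: Dict[str, int] = {}
    for i, node in enumerate(t_nodes):
        first_index.setdefault(node.lemma, i)
    return {j: first_index.get(h.lemma) for j, h in enumerate(h_nodes)}


def lexical_similarity(h_nodes: List[Node], alignment: Alignment) -> float:
    """Placeholder model: fraction of H nodes that found a partner in T."""
    if not h_nodes:
        return 1.0
    aligned = sum(1 for j in range(len(h_nodes)) if alignment[j] is not None)
    return aligned / len(h_nodes)


def predict_entailment(
    t_nodes: List[Node],
    h_nodes: List[Node],
    heuristics: List[Heuristic],
    threshold: float = 0.75,  # assumed value, not from the slides
) -> bool:
    alignment = align(t_nodes, h_nodes)
    # A matching syntactic heuristic predicts "false" immediately.
    if any(fires(t_nodes, h_nodes, alignment) for fires in heuristics):
        return False
    # Otherwise fall back to thresholded lexical similarity.
    return lexical_similarity(h_nodes, alignment) >= threshold
```

Each heuristic is passed in as a function that returns True when it matches, so new heuristics can be added without touching the fallback logic.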
MENT: heuristic alignment
MENT: superlative heuristic
Superlative heuristic (100% accurate, 5 test items):
–If the superlatives align, and their heads are aligned, and the head in Text has any additional modifiers, and those modifiers are aligned to some modifier in H, say yes, else say no.
(RTE2-test #477)
Crater Lake is the deepest lake in the United States, the second deepest in the Western Hemisphere, and the seventh deepest in the world, dropping downward to 1,932 feet just southeast of Merriam Cone.
Crater Lake is the deepest lake in the world.
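A minimal sketch of the superlative rule, read literally, might look like this. The `SuperlativePhrase` type is hypothetical, and "aligned" is approximated by string equality on lemmas, which is much cruder than graph alignment.

```python
from dataclasses import dataclass
from typing import FrozenSet


@dataclass(frozen=True)
class SuperlativePhrase:
    """Hypothetical flattened view of a superlative and its head noun."""
    superlative: str                      # e.g. "deep" for "deepest"
    head: str                             # e.g. "lake"
    modifiers: FrozenSet[str] = frozenset()


def superlative_heuristic(t: SuperlativePhrase, h: SuperlativePhrase) -> bool:
    """Return True for "yes", False for "no", following the slide's wording."""
    if t.superlative != h.superlative:    # superlatives must align
        return False
    if t.head != h.head:                  # heads must align
        return False
    if not t.modifiers:                   # Text head needs additional modifiers
        return False
    # Every modifier on the Text head must align to some modifier in H.
    return t.modifiers <= h.modifiers


# Simplified encoding of RTE2-test #477.
text = SuperlativePhrase("deep", "lake", frozenset({"united states"}))
hypothesis = SuperlativePhrase("deep", "lake", frozenset({"world"}))
print(superlative_heuristic(text, hypothesis))  # False
```

In the Crater Lake pair the Text modifier "in the United States" has no counterpart in H, so the rule says no.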
MENT: Counterfactual heuristic
Counterfactual heuristic (80% accurate, 15 test items):
–If there is a pair of aligned nodes, and a second pair of aligned nodes, and the PATH in the dependency contains a conditional or counterfactual, say no.
(RTE2-test #473)
Blondlot was trying to polarize X-rays when he claimed to have discovered this new form of radiation.
Blondlot discovered x-rays.
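One way to picture the path check is sketched below, assuming the Text parse is a tree stored as a child-to-parent map. The marker list is hypothetical (the slides do not enumerate the triggers), and only nodes strictly between the two aligned nodes are inspected, which is also an assumption.

```python
from typing import Dict, List, Optional

# Hypothetical trigger lemmas; not taken from the slides.
COUNTERFACTUAL_MARKERS = {"if", "unless", "claim", "try", "would"}


def path_to_root(node: int, parent: Dict[int, Optional[int]]) -> List[int]:
    """Walk from a node up to the root of the dependency tree."""
    path = [node]
    while parent.get(path[-1]) is not None:
        path.append(parent[path[-1]])
    return path


def dependency_path(a: int, b: int, parent: Dict[int, Optional[int]]) -> List[int]:
    """Nodes on the tree path from a to b, endpoints included."""
    up_a = path_to_root(a, parent)
    up_b = path_to_root(b, parent)
    ancestors_a = set(up_a)
    for i, node in enumerate(up_b):
        if node in ancestors_a:           # lowest common ancestor
            return up_a[: up_a.index(node) + 1] + list(reversed(up_b[:i]))
    raise ValueError("nodes are not in the same tree")


def counterfactual_heuristic(
    t_a: int, t_b: int, parent: Dict[int, Optional[int]], lemma: Dict[int, str]
) -> bool:
    """Return True when the heuristic fires, i.e. the prediction is "no"."""
    inner = dependency_path(t_a, t_b, parent)[1:-1]
    return any(lemma[n] in COUNTERFACTUAL_MARKERS for n in inner)


# Simplified fragment of #473: "Blondlot claimed to have discovered radiation".
lemma = {0: "blondlot", 1: "claim", 2: "discover", 3: "radiation"}
parent = {0: 1, 1: None, 2: 1, 3: 2}
print(counterfactual_heuristic(0, 2, parent, lemma))  # True
```

Here the Text nodes aligned to "Blondlot" and "discovered" are connected only through "claim", so the heuristic fires.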
MENT: example of alignment heuristics
Unaligned entity (64.49% accuracy, 13.38% of the test items):
–If a node in H is an entity, but is not aligned to any node in T, say no
(RTE2-test #781)
Former European Ryder Cup winning captain Sam Torrance says Welshman Ian Woosnam is the right man to lead Europe at the 2006 match in Ireland.
Torrance told BBC Sport: "I think Ian Woosnam should get it" (the 2006 captaincy).
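The unaligned-entity rule reduces to a single check over the alignment. In this sketch the alignment is a hypothetical dictionary from H entity strings to their T counterparts (None when unaligned); the example values are illustrative, not the system's actual output, and assume the second sentence is H.

```python
from typing import Dict, Iterable, Optional


def unaligned_entity_heuristic(
    h_entities: Iterable[str], alignment: Dict[str, Optional[str]]
) -> bool:
    """Return True when the heuristic fires, i.e. the prediction is "no"."""
    return any(alignment.get(entity) is None for entity in h_entities)


# Illustrative alignment for #781: "BBC Sport" has no counterpart.
alignment = {
    "Torrance": "Sam Torrance",
    "Ian Woosnam": "Ian Woosnam",
    "BBC Sport": None,
}
print(unaligned_entity_heuristic(alignment.keys(), alignment))  # True
```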
MENT: training feature weights
–“run2”: treating a syntactic heuristic match as a yes/no vote, alignment threshold set using training data
–“run1”: learning weights (using MaxEnt) for each syntactic and alignment heuristic, as well as for sub-components of these heuristics
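The difference between the two runs can be sketched as a hard vote versus a weighted score. For a binary decision, a MaxEnt model has the same form as logistic regression, which is what the second function computes. All feature names, weights, and thresholds below are made up for illustration; in the real system the weights would be learned from training data.

```python
import math
from typing import Dict, Iterable


def run2_style(heuristic_fired: Iterable[bool], alignment_score: float,
               threshold: float) -> bool:
    """Hard vote: any heuristic match means "no"; otherwise apply the threshold."""
    if any(heuristic_fired):
        return False
    return alignment_score >= threshold


def run1_style(features: Dict[str, float], weights: Dict[str, float],
               bias: float = 0.0) -> bool:
    """Weighted evidence: each heuristic contributes a weight to a logistic score."""
    z = bias + sum(weights.get(name, 0.0) * value
                   for name, value in features.items())
    p_yes = 1.0 / (1.0 + math.exp(-z))
    return p_yes >= 0.5


# Hypothetical weights: a fired heuristic lowers the score without vetoing it.
weights = {"unaligned_entity": -1.2, "alignment_score": 2.0}
print(run1_style({"unaligned_entity": 1.0, "alignment_score": 0.8},
                 weights, bias=-1.0))  # False
print(run1_style({"unaligned_entity": 0.0, "alignment_score": 0.8},
                 weights, bias=-1.0))  # True
```

With weights, a noisy heuristic can be outvoted by strong alignment evidence, whereas under the voting scheme a single match always forces a no.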
MENT: results
[Table: results for Run1 (with feature weights) and Run2 on Training (1717 sents), Dev (450 sents), and RTE2 test (800 sents), plus a truth-by-system yes/no matrix for Run1; cell values not preserved]
MENT Run1 says no 43.25% of the time
MENT variations – no thresholds
–If heuristics apply, say no; else say yes
  56% accurate; system says no 35% of the time
–Say no, unless everything is aligned and no heuristics apply
  59.25% accurate; system says no 74.5% of the time
  Truth vs. system (only the truth = No row is preserved): system Yes 65, system No 335
** Note: Run2 = if no heuristics apply, and alignment score is above a threshold trained on the training set, then say yes, else no. Accuracy: 58.50%
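As a sanity check on these figures, the full truth-by-system matrix can be recovered from the accuracy and the "says no" rate alone, under the assumption (not stated on the slide) that the 800 test items are split evenly between yes and no. The function name and the dictionary keys are illustrative.

```python
from typing import Dict


def recover_confusion(n_items: int, accuracy: float, no_rate: float) -> Dict[str, int]:
    """Solve for the 2x2 counts, keyed as "<truth>_<system>".

    Assumes truth is balanced, so each truth row sums to n_items / 2.
    """
    half = n_items // 2
    correct = round(accuracy * n_items)   # yes_yes + no_no
    says_no = round(no_rate * n_items)    # yes_no + no_no
    # Adding the two row sums and subtracting gives 2 * yes_yes.
    yes_yes = (half + correct - says_no) // 2
    yes_no = half - yes_yes
    no_no = correct - yes_yes
    no_yes = half - no_no
    return {"yes_yes": yes_yes, "yes_no": yes_no,
            "no_yes": no_yes, "no_no": no_no}


# Second variation: 59.25% accurate, says no 74.5% of the time.
print(recover_confusion(800, 0.5925, 0.745))
# {'yes_yes': 139, 'yes_no': 261, 'no_yes': 65, 'no_no': 335}
```

Under that assumption the recovered truth = No row is 65 and 335, matching the two counts that survive on the slide.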
MENT variations – with threshold
–With learned alignment and syntactic heuristic weights, with alignment threshold from training, say no; else say yes
  60.25% accurate; system says no 43% of the time
–Say no, unless alignment score is above an Oracle threshold and no heuristics apply
  61.25% accurate; system says no 70% of the time
Truth vs. system matrices (only one row of each is preserved):
  SYSTEM, truth = No: 75, 325
  RUN1: 186, 214
Lessons?
–Use syntactic heuristics and sub-components as features and apply discriminative training
–Thresholding for lexical similarity isn’t stable across data sets
Error Analysis …
–bad parses (e.g., RTE2 test #550)
How far do you take syntactic heuristics?
Location: for a pair of aligned verb nodes, if there is an argument in H, and that argument is aligned to a node in T, say no if that node is not also the same argument of the aligned verb (applied 7 times, 5 incorrect)
Brandenburg Gate is one of Berlin's best known landmarks and is now regarded as one of the greatest symbols of German unity.
Brandenburg Gate is in Berlin.
A great heuristic … but
Unaligned Verb: if there is an aligned subject and an aligned object, then if their verb is not aligned, say no
This heuristic was not used because of its poor performance, for example:
–Rodriguez told detectives he never touched the burning backpack, which was loaded with plastic pipes packed with gunpowder and BBs.
–The burning backpack contained plastic pipes packed with gunpowder and BBs.
Need to learn paraphrase similarity for verbs – see NAACL-HLT paper forthcoming.
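The failure mode in this example can be reproduced with a toy version of the rule. The `Triple` type and the `verb_paraphrases` set are hypothetical; the latter stands in for the learned verb paraphrase similarity that the slide says is still needed.

```python
from typing import FrozenSet, NamedTuple, Tuple


class Triple(NamedTuple):
    """Hypothetical subject-verb-object view of a clause, as lemmas."""
    subject: str
    verb: str
    obj: str


def unaligned_verb_heuristic(
    t: Triple,
    h: Triple,
    verb_paraphrases: FrozenSet[Tuple[str, str]] = frozenset(),
) -> bool:
    """Return True when the heuristic fires, i.e. the prediction is "no"."""
    args_aligned = t.subject == h.subject and t.obj == h.obj
    verbs_aligned = t.verb == h.verb or (t.verb, h.verb) in verb_paraphrases
    return args_aligned and not verbs_aligned


text = Triple("backpack", "load", "pipe")        # "was loaded with plastic pipes"
hypothesis = Triple("backpack", "contain", "pipe")

# With exact verb matching the rule wrongly rejects the pair.
print(unaligned_verb_heuristic(text, hypothesis))  # True

# Knowing that "load (with)" can paraphrase "contain" removes the false "no".
print(unaligned_verb_heuristic(
    text, hypothesis, frozenset({("load", "contain")})))  # False
```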
Directions and Plans
–MSR submission available at
–Might it be possible to have access to all sites’ submissions?
–Need to learn paraphrase similarity for verbs
–More feature engineering
–Different graph-matching strategies to avoid brittleness of syntactic heuristics
–Find more data for training to build more stable systems
A plug for Pyramids
–Conservatives oppose any form of devolution.
–The conservatives are opposed to devolution.
–The UK’s Tory Prime Minister adamantly resisted calls for devolution of British rule.
–Scotts want self-rule
–… as buoyed as most Scotts by North Ireland’s prospective self-rule
–Wales is following Scotland, and moving towards a call for an elected assembly with devolved powers …
–A self-governing Wales would be part of the EU
–… an independent Wales within the European community …
–Wales could participate directly in forthcoming EC meetings …
–… a fully self-governing Wales within the European Community.
SCU name, given by annotator
Candidate hypothesis?
Candidate Text?