Toward Dependency Path based Entailment Rodney Nielsen, Wayne Ward, and James Martin
Why Entailment? Intelligent Tutoring Systems: student interaction analysis. Are all aspects of the student's answer entailed by the text and the gold-standard answer? Are all aspects of the desired answer entailed by the student's response?
Dependency Path-based Entailment. DIRT (Lin and Pantel, 2001): an unsupervised method to discover inference rules, e.g., "X is author of Y ≈ X wrote Y" and "X solved Y ≈ X found a solution to Y". Based on Harris' Distributional Hypothesis: words occurring in the same contexts tend to be similar. If two dependency paths tend to link the same sets of words, their meanings are hypothesized to be similar.
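As a rough illustration of the DIRT intuition (not Lin and Pantel's actual mutual-information formulation), the sketch below scores two dependency paths by how much their X and Y slot fillers overlap; the paths, fillers, and the Jaccard/geometric-mean combination are illustrative assumptions.

```python
import math

def slot_overlap(fillers_a, fillers_b):
    """Jaccard overlap between the word sets filling one slot of two paths."""
    a, b = set(fillers_a), set(fillers_b)
    return len(a & b) / len(a | b) if (a | b) else 0.0

def path_similarity(path_a, path_b):
    """Geometric mean of the X-slot and Y-slot overlaps; a crude stand-in
    for DIRT's mutual-information-based slot similarity."""
    sim_x = slot_overlap(path_a["X"], path_b["X"])
    sim_y = slot_overlap(path_a["Y"], path_b["Y"])
    return math.sqrt(sim_x * sim_y)

# Toy slot fillers observed for two paths across a corpus
wrote = {"X": ["Austen", "Orwell", "Tolkien"], "Y": ["Emma", "1984", "novel"]}
is_author_of = {"X": ["Austen", "Orwell", "King"], "Y": ["Emma", "1984", "It"]}

print(path_similarity(wrote, is_author_of))  # 0.5: substantial overlap, similar meanings
```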
ML Classification Approach. Features derived from corpus statistics: unigram co-occurrence, surface-form bigram co-occurrence, and dependency-derived bigram co-occurrence. Mixture of experts: about 18 ML classifiers from the Weka toolkit, classified by majority vote or average probability. [Figure: spectrum from Bag of Words to Graph Matching, with Dependency Path Based Entailment in between.]
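A minimal sketch of the mixture-of-experts combination, assuming each expert outputs a probability of entailment; the example probabilities and the 0.5 threshold are hypothetical, and the real system combined roughly 18 Weka classifiers rather than this toy list.

```python
from statistics import mean

def combine_experts(probs, method="average", threshold=0.5):
    """Combine per-classifier P(entailment) estimates into one decision.

    probs: one probability per expert (e.g., ~18 Weka classifiers).
    method: 'average' thresholds the mean probability; 'vote' takes a
            majority vote of each expert's thresholded decision.
    """
    if method == "average":
        return mean(probs) >= threshold
    if method == "vote":
        votes = sum(p >= threshold for p in probs)
        return votes > len(probs) / 2
    raise ValueError(f"unknown method: {method}")

# Hypothetical expert outputs for one text/hypothesis pair
expert_probs = [0.62, 0.55, 0.48, 0.71, 0.50, 0.66]
print(combine_experts(expert_probs, "average"))  # True (mean is about 0.59)
print(combine_experts(expert_probs, "vote"))     # True (5 of 6 experts vote yes)
```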
Corpora: 7.4M articles, 2.5B words, 347 words/doc. Gigaword (Graff, 2003) – 77% of documents; Reuters Corpus (Lewis et al., 2004); TIPSTER. Lucene IR engine with two indices: word surface form and Porter stem filter. Stop words = {a, an, the}.
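A toy illustration of the two-index setup, using NLTK's Porter stemmer as a stand-in for Lucene's Porter stem filter; the actual system indexed Gigaword, Reuters, and TIPSTER with Lucene, whereas this sketch only builds in-memory postings for two example sentences.

```python
import re
from collections import defaultdict
from nltk.stem import PorterStemmer

STOP_WORDS = {"a", "an", "the"}     # the only stop words removed
stemmer = PorterStemmer()

surface_index = defaultdict(set)    # surface form -> doc ids
stem_index = defaultdict(set)       # Porter stem  -> doc ids

def index_document(doc_id, text):
    """Add one document to both the surface-form and stemmed indices."""
    for token in re.findall(r"[a-z]+", text.lower()):
        if token in STOP_WORDS:
            continue
        surface_index[token].add(doc_id)
        stem_index[stemmer.stem(token)].add(doc_id)

index_document(1, "Newspapers choke on rising paper costs and falling revenue.")
index_document(2, "The cost of paper is rising.")
print(stem_index["cost"])   # {1, 2}: 'costs' and 'cost' share the stem
```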
Word Alignment Features: unigram word alignment.
Core Features. Core repeated features: product of probabilities; average of probabilities; geometric mean of probabilities; worst non-zero probability; entailing n-grams for the lowest non-zero probability; largest entailing n-gram count with a zero probability; smallest entailing n-gram count with a non-zero probability; count of n-grams in h that do not co-occur with any n-grams from t; count of n-grams in h that do co-occur with n-grams in t.
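A sketch of how the core repeated features might be computed from per-n-gram entailment probabilities for a hypothesis h; it covers only a subset of the features listed above, and the exact definitions are a paraphrase of the slide, not the authors' code.

```python
import math

def core_features(ngram_probs, ngram_counts):
    """Aggregate per-n-gram entailment probabilities into core repeated features.

    ngram_probs:  {h n-gram: MLE probability that it is entailed by t}
    ngram_counts: {h n-gram: corpus count associated with its best entailing t n-gram}
    """
    probs = list(ngram_probs.values())
    nonzero = [p for p in probs if p > 0]
    zero_ngrams = [g for g, p in ngram_probs.items() if p == 0]
    return {
        "product": math.prod(probs),
        "average": sum(probs) / len(probs),
        "geometric_mean": math.prod(nonzero) ** (1 / len(nonzero)) if nonzero else 0.0,
        "worst_nonzero": min(nonzero, default=0.0),
        "largest_count_with_zero_prob": max((ngram_counts[g] for g in zero_ngrams), default=0),
        "smallest_count_with_nonzero_prob": min(
            (ngram_counts[g] for g, p in ngram_probs.items() if p > 0), default=0),
        "num_not_cooccurring": len(zero_ngrams),
        "num_cooccurring": len(probs) - len(zero_ngrams),
    }

# Hypothetical probabilities and counts for three hypothesis bigrams
features = core_features({"cost of": 0.17, "of paper": 0.33, "is rising": 0.0},
                         {"cost of": 6086, "of paper": 1200, "is rising": 0})
print(features["average"])
```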
Word Alignment Features: bigram word alignment. Example: t = "Newspapers choke on rising paper costs and falling revenue." h = "The cost of paper is rising." MLE(cost, t) = n(cost of, costs of) / n(costs of) = 6086/35800 = 0.17.
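The MLE in the example reduces to a single corpus-count ratio: the co-occurrence count of the hypothesis bigram with the text bigram, divided by the corpus count of the text bigram. The sketch below just reproduces that arithmetic (the function name is illustrative, not the authors').

```python
def bigram_alignment_mle(cooccurrence_count, text_bigram_count):
    """MLE that a hypothesis bigram is entailed by a text bigram:
    n(h_bigram, t_bigram) / n(t_bigram)."""
    return cooccurrence_count / text_bigram_count

# Numbers from the slide: h bigram 'cost of' aligned to t bigram 'costs of'
print(round(bigram_alignment_mle(6086, 35800), 2))   # 0.17
```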
Dependency Features: dependency bigram features; descendent relation statistics. [Figure: dependency trees for text t, "Newspapers choke on rising paper costs and falling revenues," and hypothesis h, "The cost of paper is rising."]
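A sketch of extracting descendent (governor, dependent) word pairs from a dependency parse, using spaCy purely as a stand-in parser (the slides do not name the parser the system used); the pairs for t and h could then be looked up in the corpus statistics like the surface bigrams.

```python
import spacy

nlp = spacy.load("en_core_web_sm")   # stand-in parser, not the one used in the system

def descendent_relations(sentence):
    """Return (governor, dependent) word pairs for every descendent relation:
    each token paired with every other token in its subtree."""
    doc = nlp(sentence)
    pairs = []
    for token in doc:
        for descendant in token.subtree:
            if descendant is not token:
                pairs.append((token.text.lower(), descendant.text.lower()))
    return pairs

t_pairs = descendent_relations("Newspapers choke on rising paper costs and falling revenues.")
h_pairs = descendent_relations("The cost of paper is rising.")
print(("cost", "paper") in h_pairs)   # True with a typical parse of h
```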
Verb Dependency Features: combined verb descendent relation features; worst verb descendent relation features. [Same dependency tree figure as above.]
Subject Dependency Features: combined and worst subject descendent relations; combined and worst subject-to-verb paths. [Same dependency tree figure as above.]
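A minimal sketch of the "combined" and "worst" aggregation over the descendent-relation probabilities attached to one hypothesis word (a verb or a subject); the slides do not spell out the combination rule, so product and minimum here are assumptions for illustration, as are the example probabilities.

```python
import math

def combined_and_worst(relation_probs):
    """Given per-descendent-relation entailment probabilities for one
    hypothesis word, return the 'combined' (here: product) and 'worst'
    (minimum) feature values."""
    return math.prod(relation_probs), min(relation_probs)

# Hypothetical probabilities for the relations under the h verb 'rising'
combined, worst = combined_and_worst([0.42, 0.17, 0.66])
print(combined, worst)
```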
Other Dependency Features: repeat these same features for object, pcomp-n, and other descendent relations.
Results. RTE2 by task (IE, IR, QA, SUM, Overall): accuracy and average precision. RTE2 accuracy (SUM, NonSUM, Overall): test set and training-set cross-validation. RTE1 accuracy (CD, NonCD, Overall), test set, with the best RTE1 submission in parentheses: 83.3 (83.3), 56.8 (52.8), 61.8 (58.6); training-set cross-validation.
Feature Analysis. All feature sets contribute, according to cross-validation on the training set. Most significant feature set: unigram stem-based word alignment. Most significant core repeated feature: average probability.
Conclusions. While our current dependency path features are only a step toward our proposed inference system, they provided a significant improvement over the best results from the first PASCAL Recognizing Textual Entailment challenge (RTE1). Our system (after fixing a couple of bugs) ranked 6th in accuracy and 4th in average precision out of 23 entrants at this year's RTE2 challenge. We believe our proposed system will provide an effective foundation for the detailed assessment of students' responses to an intelligent tutor.
Questions? Summary: a mixture-of-experts classifier using corpus co-occurrence statistics, moving in the direction of DIRT. Domain of interest: student response analysis in intelligent tutoring systems.