Slide 1 - PAE-DIRT: Paraphrase Acquisition Enhancing DIRT
David Hall, Michael Chung
Slide 2 - Motivation
DIRT (Discovery of Inference Rules from Text) extracts paraphrases using dependency links:
"Desperate, Bush asked Congress if it would..."
"Desperate, Bush inquired of Congress if it would..."
("asked" and "inquired of" are similar!)
Unfortunately, some of its inferences aren't so good:
"Angry, Bush told Congress it would..."
Is "told" = "inquired of"? DIRT would think so.
Idea: use context clues to determine which sentences really are similar, then extract paraphrases from those.
Slide 3 - Dependency Link Overlap Model
Sentences are parsed with Minipar, and sets of dependency links (triples with path length one) are extracted, e.g. (produce, obj, evidence).
Sentence similarity is scored by the percentage of dependency links the two sentences share.
Best metric:
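The formula that followed "Best metric:" is not preserved in this text dump. As a rough illustration only, the sketch below scores two parsed sentences by a Dice-style share of their dependency links; the triple format and the name dice_overlap are assumptions, not necessarily the authors' exact metric.

```python
# Minimal sketch of a dependency-link overlap score between two sentences.
# Assumes each sentence has already been parsed (the slides use Minipar) into
# a set of (head, relation, dependent) triples; the Dice-style formula is an
# illustrative choice, not necessarily the metric reported on the slide.

def dice_overlap(links_a: set, links_b: set) -> float:
    """Fraction of dependency links shared by the two sentences."""
    if not links_a and not links_b:
        return 0.0
    shared = links_a & links_b
    return 2 * len(shared) / (len(links_a) + len(links_b))

# Example: triples like the slide's (produce, obj, evidence).
s1 = {("produce", "obj", "evidence"), ("bush", "subj", "produce")}
s2 = {("produce", "obj", "evidence"), ("congress", "subj", "produce")}
print(dice_overlap(s1, s2))  # 0.5 -> one of the two links in each sentence is shared
```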
Slide 4 - Bag of Words Models
Word overlap: use IDF scores to determine the important words and measure how much information two sentences share (sketched below).
N-gram overlap: use a BLEU-style metric to count the n-grams the two sentences share, again weighted with IDF scores.
LSA + word overlap: use word-to-word similarity scores to help find phrase-to-phrase similarity.
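A minimal sketch of the first idea, assuming a small corpus for the IDF table and pre-tokenised input (both assumptions; the slides only say to weight words by IDF): the snippet scores two sentences by the IDF mass of their shared words. A BLEU-style n-gram variant would apply the same weighting to shared bigrams or trigrams.

```python
import math
from collections import Counter

# Illustrative IDF-weighted word-overlap score; the IDF table construction and
# the exact weighting scheme are assumptions, not the authors' implementation.

def idf_table(corpus: list[list[str]]) -> dict[str, float]:
    """IDF scores computed from a corpus of tokenised sentences."""
    n = len(corpus)
    df = Counter(w for sent in corpus for w in set(sent))
    return {w: math.log(n / df[w]) for w in df}

def idf_overlap(a: list[str], b: list[str], idf: dict[str, float]) -> float:
    """Share of the combined IDF mass that the two sentences have in common."""
    shared = set(a) & set(b)
    total = sum(idf.get(w, 0.0) for w in set(a) | set(b))
    return sum(idf.get(w, 0.0) for w in shared) / total if total else 0.0
```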
Slide 5 - Results
Threshold / Precision / Recall / F-Measure per metric (some rows list only the values available):
Unigram-IDF: 0.24 / 0.68 / 0.74 / 0.70
Bigram: 0.20 / 0.71 / 0.80 / 0.75
Trigram: 0.16 / 0.81 / 0.77
Tetragram: 0.12 / 0.79 / 0.76
Infomap: 0.34 / 0.35
Infomap-IDF: 0.69 / 0.28 / 0.47
Dependency-penalize: 0.19 / 0.95 / 0.53
Dependency-bigger: 0.63
Dependency-smaller: 0.40 / 0.88 / 0.65
Slide 6 - Conclusion
The trigram overlap metric performs best, with an F-measure of 0.77.
Future work:
Combine metrics: filter with a high-recall, high-speed metric first, then re-score the survivors with the dependency metric (see the sketch below).
Try the various metrics inside DIRT itself.
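A minimal sketch of that metric cascade, assuming the two scoring functions are supplied by the caller and that both thresholds are illustrative rather than tuned values:

```python
# Sketch of the "combine metrics" idea: prune candidate pairs with a cheap,
# high-recall metric, then re-score the survivors with the slower dependency
# metric. The helper names and thresholds are illustrative assumptions.

def cascade(pairs, fast_score, dependency_score,
            fast_threshold=0.2, dep_threshold=0.4):
    """Yield sentence pairs that pass the cheap filter and the dependency re-check."""
    for a, b in pairs:
        if fast_score(a, b) < fast_threshold:
            continue  # cheap metric says "not a paraphrase"; skip the expensive step
        if dependency_score(a, b) >= dep_threshold:
            yield a, b
```

The cheap first pass keeps recall high while avoiding the dependency computation for most candidate pairs; only the survivors pay for the more precise metric.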