Group 3: Chad Mills, Esad Suskic, Wee Teck Tan
Outline
- System and Data
- Document Retrieval
- Passage Retrieval
- Results
- Conclusion
System and Data
- System: Indri
- Data: development on TREC 2004, testing on TREC 2005
Document Retrieval Baseline:
- Remove "?"
- Add the target string
- MAP: 0.307
Document Retrieval Attempted Improvement 1:
- Settings from baseline
- Rewrite "When was…" questions as "[target] was [last word] on" queries (sketch below)
- MAP: (not shown); best so far: 0.307
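A minimal sketch of that rewrite, assuming whitespace tokenization; the question text and target in the comment are made-up examples, and the real system's handling may differ:

def rewrite_when_was(question, target):
    # Rewrite a "When was ..." question as "[target] was [last word] on".
    words = question.rstrip("?").split()
    if len(words) >= 3 and words[0].lower() == "when" and words[1].lower() == "was":
        return "%s was %s on" % (target, words[-1])
    return question  # leave all other question forms unchanged

# Hypothetical example: rewrite_when_was("When was the company founded?", "AARP")
# returns "AARP was founded on"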
Document Retrieval Attempted Improvement 2:
- Settings from baseline
- Remove "wh" words
- Remove stop words
- Replace pronouns with the target string (cleanup sketch below)
- MAP: (not shown); best so far: (not shown)
- "Wh" / stop words: what, who, where, why, how many, how often, how long, which, how did, does, is, the, a, an, of, was, as
- Pronouns: he, she, it, its, they, their, his
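A minimal sketch of the Improvement 2 cleanup, using the word lists from the slide; the tokenization and the order of operations are assumptions:

WH_AND_STOP = {"what", "who", "where", "why", "how many", "how often", "how long",
               "which", "how did", "does", "is", "the", "a", "an", "of", "was", "as"}
PRONOUNS = {"he", "she", "it", "its", "they", "their", "his"}

def clean_question(question, target):
    # Drop wh-/stop words and replace pronouns with the target string.
    text = question.rstrip("?").lower()
    for phrase in (p for p in WH_AND_STOP if " " in p):
        text = text.replace(phrase, " ")      # multi-word phrases like "how many"
    tokens = []
    for tok in text.split():
        if tok in WH_AND_STOP:
            continue                          # single-word wh-/stop words
        tokens.append(target if tok in PRONOUNS else tok)
    return " ".join(tokens)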
Document Retrieval Attempted Improvement 3:
- Settings from Improvement 2
- Stemmed index (Krovetz stemmer)
- MAP: (not shown); best so far: 0.319
Document Retrieval Attempted Improvement 4:
- Settings from Improvement 3
- Remove punctuation
- Remove non-alphanumeric characters
- MAP: (not shown); best so far: 0.336
Document Retrieval Attempted Improvement 5:
- Settings from Improvement 4
- Remove duplicate words (sketch below)
- MAP: (not shown); best so far: 0.374
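A minimal sketch of the last two cleanup steps (Improvements 4 and 5); keeping the first occurrence of each repeated word is an assumption:

import re

def finalize_query(query):
    # Remove punctuation and any other non-alphanumeric characters.
    query = re.sub(r"[^A-Za-z0-9 ]+", " ", query)
    # Remove duplicate words, preserving the order of first occurrence.
    seen, kept = set(), []
    for tok in query.split():
        if tok not in seen:
            seen.add(tok)
            kept.append(tok)
    return " ".join(kept)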
Passage Retrieval Baseline:
- Out-of-the-box Indri, same question formulation
- Changed "#combine(" to "#combine[passageX:Y](" (sketch below)
- Passage windows, top 20 passages, no re-ranking
- X=40, Y=20: Strict 0.126, Lenient 0.337
- X=200, Y=100: Strict 0.414, Lenient 0.537
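A minimal sketch of that query change, assuming X is the passage window size and Y the increment in Indri's passage-extent syntax; the example query in the comment is made up:

def to_passage_query(doc_query, x=200, y=100):
    # Swap the document-level #combine for a passage-window #combine[passageX:Y].
    return doc_query.replace("#combine(", "#combine[passage%d:%d](" % (x, y), 1)

# e.g. to_passage_query("#combine(aarp founded)") returns
#      "#combine[passage200:100](aarp founded)"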
Passage Retrieval Attempted Re-ranking:
- Mallet MaxEnt classifier
- Training set: TREC 2004
  - 80% train : 20% dev
  - Split by target to avoid cheating, e.g. all 1.* questions land in either train or dev (sketch below)
- Labels: positive = passage contains the correct answer; negative = passage does not
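A minimal sketch of the by-target split; the question records, the target_id field, and the random seed are assumptions:

import random

def split_by_target(questions, dev_fraction=0.2, seed=0):
    # Hold out whole targets so every question for one target (e.g. all of 1.*)
    # lands entirely in train or entirely in dev -- no cheating across the split.
    targets = sorted({q["target_id"] for q in questions})
    random.Random(seed).shuffle(targets)
    n_dev = max(1, int(len(targets) * dev_fraction))
    dev_targets = set(targets[:n_dev])
    train = [q for q in questions if q["target_id"] not in dev_targets]
    dev = [q for q in questions if q["target_id"] in dev_targets]
    return train, dev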
Passage Retrieval Features used:
- For both the passage and the question+target:
  - unigrams, bigrams, trigrams
  - POS tags: unigrams, bigrams, trigrams
- Question/passage correspondence (sketch below):
  - number of overlapping terms (and bigrams)
  - distance between overlapping terms
- Tried the top 20 passages from Indri, then expanded to the top 200
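A minimal sketch of the correspondence features, assuming token lists as input; reading "distance between overlapping terms" as the span from the first to the last overlapping term in the passage is our interpretation, and the feature names are made up:

def overlap_features(question_tokens, passage_tokens):
    # Count shared unigrams/bigrams and measure how far apart the shared
    # terms sit inside the passage.
    q_terms = set(question_tokens)
    q_bigrams = set(zip(question_tokens, question_tokens[1:]))
    p_bigrams = set(zip(passage_tokens, passage_tokens[1:]))
    positions = [i for i, tok in enumerate(passage_tokens) if tok in q_terms]
    return {
        "term_overlap": len(q_terms & set(passage_tokens)),
        "bigram_overlap": len(q_bigrams & p_bigrams),
        "overlap_span": positions[-1] - positions[0] if positions else 0,
    }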
Passage Retrieval Result: all re-ranking attempts were worse than the un-re-ranked Indri baseline.
- Example confusion matrix: (values not shown)
- Many negative examples; 67-69% accuracy on every feature combination tried
Passage Re-Ranking: Indri was very good to start with (e.g., Q10.1)
- Indri ranking: 1 Yes, 2 No, 3 Yes, 4 (not shown), 5 No
- Our ranking (has answer): 1 No, 2 No, 3 Yes, 4 No, 5 Yes (P(Yes), P(No), and original Indri ranks not shown)
- Our first 2 were wrong, and only 1 of Indri's top 5 made our top 5
- If we completely replace Indri's ranking, ours must be very good
- Many low confidence scores (e.g., the best P(Yes) was 7.6%)
- A slight edit to Indri's ranking was less bad, but we found no variant that actually helped
  - E.g., bump a high-confidence Yes to the top of the list and leave the others in Indri order (see the sketch below)
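A minimal sketch of that adjustment; the passage records, the p_yes field, and the 0.5 threshold are assumptions:

def bump_confident_yes(passages_in_indri_order, threshold=0.5):
    # Move passages the classifier is highly confident contain the answer to
    # the front; everything else keeps Indri's original order.
    confident = [p for p in passages_in_indri_order if p["p_yes"] >= threshold]
    rest = [p for p in passages_in_indri_order if p["p_yes"] < threshold]
    return confident + rest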
Results
- TREC 2004 (development): MAP 0.377, Strict MRR 0.414, Lenient MRR 0.537
- TREC 2005 (test): MAP 0.316, Strict MRR 0.366, Lenient MRR 0.543
Conclusions
- Cleaning the input queries helped
- A small, targeted stop word list worked well
- With minimal settings, Indri performs passage retrieval well out of the box
- A re-ranking implementation needs to be really good to beat Indri's own ranking
- Feature selection didn't help
- A slight adjustment to Indri's ranking, rather than a wholly different ranking, might help