Published by Ira Little. Modified over 9 years ago.
1
Finding Similar Questions in Large Question and Answer Archives
Jiwoon Jeon, W. Bruce Croft and Joon Ho Lee
Retrieval Models for Question and Answer Archives
Jiwoon Jeon, W. Bruce Croft and Xiaobing Xue
Presenter: Sawood Alam
2
Finding Similar Questions in Large Question and Answer Archives
Jiwoon Jeon, W. Bruce Croft and Joon Ho Lee
Center for Intelligent Information Retrieval, Computer Science Department
University of Massachusetts, Amherst, MA 01003
[jeon,croft,joonho]@cs.umass.edu
CIKM '05, Proceedings of the 14th ACM Conference on Information and Knowledge Management, 2005
3
Introduction
Q&A systems quickly build large archives
  – Naver, a popular Korean search site, gets 25,000+ questions per day
A great linguistic resource
Questions can be answered from the archive before a human response appears
4
Q&A Over Usual Search
Opinions or summaries
Direct answers rather than relevant documents
Search in a collection of questions associated with answers
Lexical similarity vs. semantic similarity
  – Is downloading movies illegal?
  – Can I share a copy of a DVD online?
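The two example questions mean nearly the same thing but share no words. A purely lexical measure therefore scores them as unrelated. The sketch below is a minimal illustration that uses the overlap coefficient, |X ∩ Y| / min(|X|, |Y|), over word sets. It assumes a naive lowercase, letters-only tokenizer with no stemming or stopword removal, and the function names are hypothetical.

```python
import re


def tokenize(text: str) -> set[str]:
    """Lowercase and keep runs of letters; a deliberately naive tokenizer."""
    return set(re.findall(r"[a-z]+", text.lower()))


def overlap_coefficient(a: str, b: str) -> float:
    """|X ∩ Y| / min(|X|, |Y|) over the word sets of a and b."""
    x, y = tokenize(a), tokenize(b)
    if not x or not y:
        return 0.0
    return len(x & y) / min(len(x), len(y))


q1 = "Is downloading movies illegal?"
q2 = "Can I share a copy of a DVD online?"
print(overlap_coefficient(q1, q2))  # 0.0: no shared words, despite similar meaning
```

A score of zero here is exactly the word mismatch problem that the rest of the deck addresses.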
5
Solving the Word Mismatch Problem
Knowledge databases (machine-readable dictionaries)
  – unreliable performance
Manual rules or templates
  – hard to scale
Statistical techniques
  – most promising
  – require a large training data set
6
Question and Answer Archive
Average lengths (in words)
  – Title: 5.8
  – Body: 49
  – Answer: 179
7
Relevance Judgments
Eighteen different retrieval results (varying retrieval algorithms)
  – Query likelihood, Okapi BM25 and overlap coefficient
Top 20 Q&A pairs taken from each retrieval result
Manual judgment
  – Correctness of the answer was ignored
Manual browsing for missing relevant Q&A pairs
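Taking the top 20 Q&A pairs from each of several runs and judging only the union is the usual pooling procedure. The minimal sketch below builds such a pool. The run contents and Q&A pair IDs are hypothetical, and the follow-up manual browsing for missed relevant pairs is not modeled.

```python
def build_pool(runs: list[list[str]], depth: int = 20) -> list[str]:
    """Union of the top-`depth` IDs from every ranked run, in first-seen order.

    Each run is a ranked list of Q&A pair IDs (best first). The returned pool
    is what assessors would judge manually.
    """
    pool: list[str] = []
    seen: set[str] = set()
    for ranking in runs:
        for pair_id in ranking[:depth]:
            if pair_id not in seen:
                seen.add(pair_id)
                pool.append(pair_id)
    return pool


# Two tiny hypothetical runs, pooled to depth 2
print(build_pool([["qa1", "qa2", "qa3"], ["qa2", "qa4", "qa1"]], depth=2))
```

With 18 runs and a depth of 20, the pool has at most 360 pairs per query. It is usually much smaller, because different runs return overlapping results.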
8
Field Importance
9
Generation of Training Sample
LM-HRANK
$\mathrm{Sim}(A, B) = \frac{1}{2}\left(\frac{1}{r_1} + \frac{1}{r_2}\right)$
where:
  – answer A retrieves B at rank $r_1$
  – answer B retrieves A at rank $r_2$
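The LM-HRANK score averages the reciprocal ranks in both retrieval directions. Two answers that retrieve each other at rank 1 therefore score 1.0, and the score drops as either rank grows. The sketch below only evaluates the formula. It assumes 1-based ranks, and the function name is hypothetical. The slide does not say which score cutoff turns a pair into a training sample, so no threshold is applied.

```python
def lm_hrank_similarity(r1: int, r2: int) -> float:
    """Sim(A, B) = (1/r1 + 1/r2) / 2.

    r1: 1-based rank at which answer A retrieves answer B.
    r2: 1-based rank at which answer B retrieves answer A.
    """
    if r1 < 1 or r2 < 1:
        raise ValueError("ranks are 1-based")
    return (1.0 / r1 + 1.0 / r2) / 2.0


print(lm_hrank_similarity(1, 4))  # 0.625
```

The measure is symmetric in its arguments, so Sim(A, B) = Sim(B, A).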
10
Word Translation Probabilities
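Word translation probabilities T(w | s) are learned from aligned text pairs. The minimal sketch below learns them with IBM Model 1 EM, the model the second paper refers to. Several details are assumptions: the number of iterations, the added NULL source token, and the choice of questions as "source". The actual training setup in the papers (tool, iterations, preprocessing) may differ. The toy pairs reuse words from the deck and are hypothetical.

```python
from collections import defaultdict

NULL = "<null>"  # empty-word token so a target word can align to "nothing"


def train_ibm1(pairs, iterations=10):
    """Estimate T(w | s) with IBM Model 1 EM.

    pairs: list of (source_tokens, target_tokens), e.g. (question, answer).
    Returns a dict keyed by (target_word, source_word).
    """
    target_vocab = {w for _, tgt in pairs for w in tgt}
    uniform = 1.0 / len(target_vocab)
    t = defaultdict(lambda: uniform)  # uniform initialisation
    for _ in range(iterations):
        count = defaultdict(float)  # expected counts c(w, s)
        total = defaultdict(float)  # expected counts c(s)
        for src, tgt in pairs:
            src = [NULL] + list(src)
            for w in tgt:
                # E-step: split w's alignment mass over every source word
                norm = sum(t[(w, s)] for s in src)
                for s in src:
                    frac = t[(w, s)] / norm
                    count[(w, s)] += frac
                    total[s] += frac
        # M-step: renormalise so that sum_w T(w | s) = 1 for every s
        t = defaultdict(float, {(w, s): c / total[s] for (w, s), c in count.items()})
    return dict(t)


pairs = [
    (["cheat", "husband"], ["leave", "trust"]),
    (["cheat", "boyfriend"], ["leave", "dump"]),
    (["dvd", "boyfriend"], ["copy", "dump"]),
]
table = train_ibm1(pairs)
print(table[("leave", "cheat")], table[("trust", "cheat")])
```

"leave" co-occurs with "cheat" in two pairs, so EM shifts more of the probability mass of T(· | cheat) onto it than onto "trust".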
11
Experiments and Results
12
Examples and Analysis
13
Retrieval Models for Question and Answer Archives
Jiwoon Jeon
Google, Inc., Mountain View, CA 94043, USA
jjeon@google.com
W. Bruce Croft and Xiaobing Xue
Center for Intelligent Information Retrieval, Computer Science Department
University of Massachusetts, Amherst, MA 01003
[croft,xuexb]@cs.umass.edu
SIGIR '08, Proceedings of the 31st annual international ACM SIGIR conference on Research and development in information retrieval, 2008
14
Introduction
Word mismatch problem
Focus on the translation-based approach
Explains why the pure IBM model performs worse than the query-likelihood language model
Proposed a mixed model
  – Question part: translation-based language model
  – Answer part: query-likelihood language model
15
LM vs. IBM model 1
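The two scoring functions compared here differ only in the document model. Query likelihood uses P_ml(w | D) directly. The IBM Model 1 variant replaces it with Σ_t T(w | t)·P_ml(t | D). The minimal sketch below scores a query both ways, and it assumes several things:

- Dirichlet-style smoothing with a hypothetical prior `mu`.
- Query words unseen in the collection are skipped rather than floored.
- The translation table is a hand-made, hypothetical dict keyed by (query_word, doc_word).

```python
import math
from collections import Counter


def _smoothed_log(p_doc, p_coll, doc_len, mu):
    """log((|D| * p_doc + mu * p_coll) / (|D| + mu)), Dirichlet-style smoothing."""
    return math.log((doc_len * p_doc + mu * p_coll) / (doc_len + mu))


def query_likelihood(query, doc, collection, mu=1000.0):
    """log P(q | D) using the maximum-likelihood model P_ml(w | D)."""
    d_counts, c_counts = Counter(doc), Counter(collection)
    score = 0.0
    for w in query:
        p_coll = c_counts[w] / len(collection)
        if p_coll == 0.0:  # unseen in the collection: skipped in this sketch
            continue
        score += _smoothed_log(d_counts[w] / len(doc), p_coll, len(doc), mu)
    return score


def ibm1_likelihood(query, doc, collection, trans, mu=1000.0):
    """log P(q | D) using sum_t T(w | t) * P_ml(t | D) as the document model."""
    d_counts, c_counts = Counter(doc), Counter(collection)
    score = 0.0
    for w in query:
        p_coll = c_counts[w] / len(collection)
        if p_coll == 0.0:
            continue
        p_trans = sum(trans.get((w, t), 0.0) * n / len(doc) for t, n in d_counts.items())
        score += _smoothed_log(p_trans, p_coll, len(doc), mu)
    return score


collection = ["cheat", "husband", "movie", "dvd"]
doc = ["cheat", "husband"]
trans = {("cheat", "cheat"): 0.2, ("cheat", "husband"): 0.3}  # hypothetical
ql = query_likelihood(["cheat"], doc, collection, mu=1.0)
ib = ibm1_likelihood(["cheat"], doc, collection, trans, mu=1.0)
print(ql > ib)  # True
```

In the IBM variant, an exact match contributes only through T(w | w). If the learned self-translation probability is small, a document that contains the query word verbatim can score lower than it would under plain query likelihood. The toy numbers above show exactly that.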
16
Question Part
17
Answer Part
Gamma = 0: translation-based LM (for the question part)
Gamma = 1: query likelihood LM (for the answer part)
Beta = 0: combination model
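The slide gives only the special cases of the weights. The minimal sketch below implements the combined model under several assumptions, which come from our reading of the SIGIR '08 paper rather than from the slide:

- The per-word mixture is P_mx(w | (q, a)) = α·P_ml(w | q) + β·Σ_t T(w | t)·P_ml(t | q) + γ·P_ml(w | a), with α + β + γ = 1.
- The mixture is then smoothed against the collection using the combined length |q| + |a| and a prior `mu`.
- The parameter names `alpha`, `beta`, `gamma` and `mu`, and the translation table keyed by (query_word, question_word), are all assumptions.

The three cases on the slide follow from this form. With γ = 0 it reduces to a translation-based LM on the question part. With γ = 1 it becomes query likelihood on the answer part. With β = 0 it is a plain combination of the two query-likelihood models.

```python
import math
from collections import Counter


def mixture_log_likelihood(query, question, answer, collection, trans,
                           alpha, beta, gamma, mu=1000.0):
    """log P(query | (question, answer)) under an assumed alpha/beta/gamma mixture."""
    if abs(alpha + beta + gamma - 1.0) > 1e-9:
        raise ValueError("alpha + beta + gamma must equal 1")
    q_counts, a_counts = Counter(question), Counter(answer)
    c_counts = Counter(collection)
    q_len, a_len = len(question), len(answer)
    doc_len = q_len + a_len  # assumed length for smoothing the (q, a) pair
    score = 0.0
    for w in query:
        p_coll = c_counts[w] / len(collection)
        if p_coll == 0.0:  # unseen in the collection: skipped in this sketch
            continue
        p_q = q_counts[w] / q_len                      # P_ml(w | q)
        p_trans = sum(trans.get((w, t), 0.0) * n / q_len  # sum_t T(w|t) P_ml(t|q)
                      for t, n in q_counts.items())
        p_a = a_counts[w] / a_len                      # P_ml(w | a)
        p_mx = alpha * p_q + beta * p_trans + gamma * p_a
        score += math.log((doc_len * p_mx + mu * p_coll) / (doc_len + mu))
    return score


question, answer = ["cheat", "husband"], ["leave", "him"]
collection = ["cheat", "husband", "leave", "him"]
answer_only = mixture_log_likelihood(["leave"], question, answer, collection, {},
                                     alpha=0.0, beta=0.0, gamma=1.0, mu=1.0)
translated = mixture_log_likelihood(["leave"], question, answer, collection,
                                    {("leave", "cheat"): 0.4},
                                    alpha=0.5, beta=0.5, gamma=0.0, mu=1.0)
print(answer_only, translated)
```

In the second call, "leave" never appears in the question. It still gets non-zero question-part probability through the hypothetical entry T(leave | cheat), which is how the translation component addresses word mismatch.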
18
Word-to-Word Translation Probability
Word “cheat” in a question
  – “trust”, “forgive”, “dump”, “leave”, etc. in the answer
Word “cheat” in an answer
  – “husband”, “boyfriend”, etc. in the question
All these words are useful for attacking the word mismatch problem
  – Combined probability used: P(Q|A) and P(A|Q)
19
Examples
20
Experimental Results
21
Conclusions
Translation-based language model for the question part and query-likelihood language model for the answer part
Experiments done on a Q&A web service where people answer others' questions
Future work
  – Testing the effect of the proposed model on FAQ archives
  – Yahoo! Answers collection
  – Phrase-based machine translation rather than word-based translation