Finding Similar Questions in Large Question and Answer Archives (Jiwoon Jeon, W. Bruce Croft and Joon Ho Lee) and Retrieval Models for Question and Answer Archives (Jiwoon Jeon, W. Bruce Croft and Xiaobing Xue)

Presentation transcript:

Finding Similar Questions in Large Question and Answer Archives (Jiwoon Jeon, W. Bruce Croft and Joon Ho Lee) and Retrieval Models for Question and Answer Archives (Jiwoon Jeon, W. Bruce Croft and Xiaobing Xue). Presenter: Sawood Alam

Finding Similar Questions in Large Question and Answer Archives. Jiwoon Jeon, W. Bruce Croft and Joon Ho Lee. Center for Intelligent Information Retrieval, Computer Science Department, University of Massachusetts, Amherst, MA. CIKM '05: Proceedings of the 14th ACM International Conference on Information and Knowledge Management, 2005

Introduction Q&A services quickly build large archives – Naver, a popular Korean search portal, receives 25,000+ questions per day These archives are a great linguistic resource Goal: answer new questions from the archive before a human response appears

Q&A Over Usual Search Users often want an opinion or a summary Direct answers rather than lists of relevant documents Search a collection of questions that already have associated answers Lexical similarity vs. semantic similarity – "Is downloading movies illegal?" – "Can I share a copy of a DVD online?" (nearly the same question, yet almost no words in common)
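A tiny illustration (not from the slides) of why purely lexical matching falls short: a plain word-overlap (Jaccard) score between the two example questions above is zero even though they ask essentially the same thing.

def word_overlap(a, b):
    # Jaccard overlap between the word sets of two questions
    ta, tb = set(a.lower().split()), set(b.lower().split())
    return len(ta & tb) / len(ta | tb) if (ta | tb) else 0.0

print(word_overlap("Is downloading movies illegal?",
                   "Can I share a copy of a DVD online?"))   # 0.0 despite similar intent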

Solving the Word Mismatch Problem Knowledge databases (machine-readable dictionaries) – unreliable performance Manual rules or templates – hard to scale Statistical techniques – most promising, but require a large training data set

Question and Answer Archive Average field lengths (words) – Title: 5.8 – Body: 49 – Answer: 179

Relevance Judgments Eighteen retrieval runs from varying retrieval algorithms – query likelihood, Okapi BM25 and overlap coefficient Top 20 Q&A pairs pooled from each run Manual judgment Correctness of the answer was ignored Manual browsing to find missing relevant Q&A pairs
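A hedged sketch of the pooling step just described, under assumed inputs: runs is a hypothetical dict mapping each of the eighteen retrieval runs to its ranked list of Q&A-pair ids, and the union of the top 20 from every run forms the pool that is judged manually.

def build_pool(runs, depth=20):
    # runs: {"query-likelihood": [qa_id1, qa_id2, ...], "okapi-bm25": [...], ...}
    pool = set()
    for ranked_ids in runs.values():
        pool.update(ranked_ids[:depth])   # top 20 from each run
    return pool                           # the pooled Q&A pairs are then judged by hand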

Field Importance

Generation of Training Samples LM-HRANK Sim(A, B) = (1/r1 + 1/r2) / 2, where answer A retrieves answer B at rank r1 and answer B retrieves answer A at rank r2
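A minimal sketch of this similarity, assuming a hypothetical retrieve(text) function that returns a ranked list of answer ids; the reciprocal rank is taken as 0 when the other answer is not retrieved at all. Answer pairs that score highly under this measure are then used as training samples for the word translation probabilities (next slide).

def harmonic_rank_sim(a_id, b_id, a_text, b_text, retrieve):
    # Sim(A, B) = (1/r1 + 1/r2) / 2, where answer A retrieves answer B
    # at rank r1 and answer B retrieves answer A at rank r2
    def reciprocal_rank(query_text, target_id):
        ranked = retrieve(query_text)                  # ranked list of answer ids
        if target_id not in ranked:
            return 0.0
        return 1.0 / (ranked.index(target_id) + 1)
    return (reciprocal_rank(a_text, b_id) + reciprocal_rank(b_text, a_id)) / 2.0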

Word Translation Probabilities

Experiments and Results

Examples and Analysis

Retrieval Models for Question and Answer Archives. Jiwoon Jeon (Google, Inc., Mountain View, CA 94043, USA), W. Bruce Croft and Xiaobing Xue (Center for Intelligent Information Retrieval, Computer Science Department, University of Massachusetts, Amherst, MA). SIGIR '08: Proceedings of the 31st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, 2008

Introduction Word mismatch problem Focus on the translation-based approach Explanation of the poor performance of the pure IBM model compared with the query-likelihood language model Proposed a mixed model – Question part: translation-based language model – Answer part: query-likelihood language model

LM vs. IBM model 1

Question Part

Answer Part – Gamma = 0: translation-based model (question part only) – Gamma = 1: query-likelihood LM (answer part only) – Beta = 0: combination model
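A minimal sketch of the mixed model, assuming a simple linear interpolation: each query word is scored against the question field with a translation-based language model, against the answer field with a query-likelihood estimate, and the two parts are mixed and then smoothed with the collection model. The parameter names here (beta for the translation mixture, gamma for the question/answer mix, lam for collection smoothing) and the exact form of the interpolation are assumptions; see the paper for the precise parameterization.

import math

def p_ml(word, field_tokens):
    # maximum-likelihood estimate P(w | field)
    return field_tokens.count(word) / len(field_tokens) if field_tokens else 0.0

def p_trans(word, question_tokens, trans_prob):
    # translation-based estimate: sum over t of P(w | t) * P_ml(t | Q)
    return sum(trans_prob.get((word, t), 0.0) * p_ml(t, question_tokens)
               for t in set(question_tokens))

def score(query_tokens, question_tokens, answer_tokens, collection_tokens,
          trans_prob, beta=0.3, gamma=0.5, lam=0.2):
    # log P(q | (Q, A)) under the assumed linear mixture
    log_p = 0.0
    for w in query_tokens:
        q_part = (1 - beta) * p_ml(w, question_tokens) + beta * p_trans(w, question_tokens, trans_prob)
        a_part = p_ml(w, answer_tokens)                      # query likelihood on the answer field
        mixed = (1 - gamma) * q_part + gamma * a_part
        smoothed = (1 - lam) * mixed + lam * p_ml(w, collection_tokens)
        log_p += math.log(smoothed) if smoothed > 0 else float("-inf")
    return log_p

With gamma = 0 the score reduces to the translation-based model over the question part, and with gamma = 1 to query likelihood over the answer part, matching the settings listed on the slide above.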

Word-to-Word Translation Probabilities The word "cheat" in questions translates to "trust", "forgive", "dump", "leave", etc. in answers The word "cheat" in answers translates to "husband", "boyfriend", etc. in questions All of these words help attack the word mismatch problem – Combined probabilities used: P(Q|A) and P(A|Q)
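One simple way to combine the two translation directions mentioned above is to average the two tables entry by entry; this is only a sketch, and the paper's exact combination scheme may differ (for example, pooling the training pairs before estimation). Here p_q_given_a and p_a_given_q are hypothetical dicts mapping (target_word, source_word) pairs to translation probabilities learned from (question, answer) and (answer, question) training pairs respectively.

def pooled_translation_table(p_q_given_a, p_a_given_q):
    # average the two directional translation tables entry by entry
    combined = {}
    for pair in set(p_q_given_a) | set(p_a_given_q):
        combined[pair] = 0.5 * (p_q_given_a.get(pair, 0.0) + p_a_given_q.get(pair, 0.0))
    return combined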

Examples

Experimental Results

Conclusions Translation-based language model for the question part and query-likelihood language model for the answer part Experiments done on a Q&A web service where people answer others' questions Future work – Testing the effect of the proposed model on FAQ archives – The Yahoo! Answers collection – Phrase-based machine translation rather than word-based translation