SEARCHING QUESTION AND ANSWER ARCHIVES Dr. Jiwoon Jeon Presented by CHARANYA VENKATESH KUMAR.

Discussion Current Information Retrieval systems?

OVERVIEW: Introduction; Q&A Retrieval; Test Collections; Translation-Based Q&A Retrieval Framework; Learning Word-to-Word Translations

INTRODUCTION: Q&A retrieval problem. Challenges: (1) Semantically similar questions. Problem: word mismatch. Solution: machine translation-based information retrieval model. (2) Quality of the answers. Problem: many answers to a given question. Solution: answer quality prediction technique.

What is New? New type of information system; new translation-based retrieval model; new document quality estimation method; integration of advances in multiple research areas; new paraphrase generation method; utilizing the web as a resource for retrieval.

Q & A RETRIEVAL: Question and answer archives (websites with FAQs; community-based question answering services). Task definition.

Q & A Retrieval (Contd..)

Advantages: handles natural language questions; returns answers instead of relevant documents. Disadvantages: can answer only previously answered questions.

Q & A RETRIEVAL SYSTEM ARCHITECTURE

CHALLENGES: Finding relevant question and answer pairs (importance of question parts; word mismatch problem). Estimating answer quality (importance).

TEST COLLECTIONS: Components: a set of documents; a set of information needs (queries); a set of relevance judgments. Pooling method.
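The slide only names the pooling method. A minimal sketch of the usual idea follows: the top-ranked results of several retrieval systems are merged into one pool, and only the pooled items are passed to human judges. The function name `build_pool`, the `depth` parameter, and the toy runs are hypothetical and not taken from the presentation.

```python
def build_pool(runs, depth):
    """Return the union of the top-`depth` ids from each system's ranked list."""
    pool = set()
    for ranked_ids in runs.values():
        pool.update(ranked_ids[:depth])
    return pool


# Hypothetical ranked lists produced by two retrieval systems for one query.
runs = {
    "system_a": ["d3", "d1", "d7", "d2"],
    "system_b": ["d1", "d5", "d3", "d9"],
}

# Only the pooled items would be shown to assessors for relevance judgment.
pool = build_pool(runs, depth=2)
print(sorted(pool))
```

Items outside the pool are normally treated as non-relevant, which keeps the judging effort bounded.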

WONDIR COLLECTION: Earliest community-based QA service in the US. 1 million question and answer pairs from this service were used. Average question length = 27 words; average answer length = 28 words.

Examples

Queries: closed-class questions that ask for fact-based short answers, e.g., "Where is Charlotte located?" Relevance judgment: 220 relevant Q&A pairs for 50 queries, found using the pooling method. Relevance judgment criteria.

WebFAQ COLLECTION by Jijkoun and de Rijke: A collection of FAQs gathered with web crawlers and made public for research purposes. Found web pages that contain the word "FAQ". Used heuristic methods to automatically extract question and answer pairs from those pages.

NAVER COLLECTION: Leading portal site in South Korea; community-based answering service. Collection A: category information, used to test category-specific translations. Collection B: non-textual information, used to build the answer quality prediction technique.

Naver Collection (Contd..): Question: title and body. Naver Test Collection A. Naver Test Collection B. Relevance: the question is semantically related to the query and contains all query terms; the Q&A pair was clicked multiple times for the query.

Comparison of test Collections

Translation-Based Q&A Retrieval Framework: Use of machine translation techniques for information retrieval. Word mismatch problem. Translation-based approach.

IBM Statistical Machine Translation Models: Do not require any linguistic knowledge of the source or target language. Exploit only co-occurrence statistics of terms in the training data.

IBM Models: Model 1: treats every possible word alignment equally. Model 2: assumes only the positions of terms are related to the word alignment. Model 3: the first term and the second term generated from the same term are independent.

IBM Models (Contd..): Model 4: first-order alignment model; every word depends only on the previously aligned word. Model 5: reformulation of Model 4.

Advantages of Model 1: Efficient implementation is possible using a form of query expansion. The performance gain from using low-level translation models is high. Can be easily integrated into the query likelihood model.

IBM Model 1 Equation The probability that a query Q of length m is the translation of a document D (of length n) is given as

IBM Model 1 Equation
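The equation itself is not preserved in this transcript. For reference, the standard textbook form of IBM Model 1, written with the slide's symbols (query Q of length m, document D of length n), is sketched below. The null word d_0, the length-probability constant ε, and the per-term notation q_i and d_j are assumptions of this sketch and may differ from the notation on the original slide.

```latex
% Standard IBM Model 1 form; notation is assumed, not copied from the slide.
% q_i : i-th query term, d_j : j-th document term, d_0 : null word,
% \epsilon : probability of generating a query of length m.
P(Q \mid D) = \frac{\epsilon}{(n+1)^{m}} \prod_{i=1}^{m} \sum_{j=0}^{n} P(q_i \mid d_j)
```

Every alignment between a query term and a document term receives the same weight 1/(n+1), which is what "treats every possible word alignment equally" means for Model 1.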

Translation-Based Language Models: A language model is a mechanism for generating text. Unigram language model: assumes each word is generated independently; concerns only the probabilities of sampling a single word.

Language modeling approach to IR: With the maximum likelihood estimator, words that do not occur in a document have zero probability. Smoothing transfers some probability mass from the seen words to the unseen words. Dirichlet smoothing offers good performance at a low computational cost.

Language modeling approach to IR (Contd..) The ranking function for the query likelihood language model with Dirichlet smoothing can be written as
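The ranking function shown on the slide is not preserved in this transcript. A commonly used form of Dirichlet-smoothed query likelihood estimates P(w|D) = (c(w,D) + mu * P(w|C)) / (|D| + mu) and sums the log probabilities over the query terms. The sketch below assumes that form; the function name, the value of `mu`, and the toy documents are hypothetical.

```python
import math
from collections import Counter


def dirichlet_score(query, doc, coll_counts, coll_len, mu=10.0):
    """Log query likelihood of `query` under a Dirichlet-smoothed model of `doc`."""
    doc_counts = Counter(doc)
    score = 0.0
    for w in query:
        p_wc = coll_counts.get(w, 0) / coll_len           # collection model P(w|C)
        p_wd = (doc_counts[w] + mu * p_wc) / (len(doc) + mu)
        if p_wd == 0.0:                                    # word unseen everywhere
            return float("-inf")
        score += math.log(p_wd)
    return score


# Toy collection of two tokenized documents (assumed data).
docs = [["where", "is", "charlotte"], ["how", "to", "bake", "bread"]]
coll_counts = Counter(w for d in docs for w in d)
coll_len = sum(coll_counts.values())

query = ["charlotte"]
scores = [dirichlet_score(query, d, coll_counts, coll_len) for d in docs]
print(scores)
```

The second document does not contain the query term, yet it still receives a finite score because the collection model supplies some probability mass.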

IBM Model 1 vs. Query Likelihood Comparable components in the two models

Self-Translation Model: Every word has some probability of translating to itself. This probability cannot be 1, and if it is too low, retrieval performance deteriorates.

TransLM Final ranking Function looks like
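The final ranking function is not preserved in this transcript. One plausible form, assumed here rather than taken from the slide, mixes the maximum likelihood estimate with a translation-based estimate and then applies Dirichlet smoothing against the collection. The mixing weight `beta`, the smoothing parameter `mu`, the table layout `trans[t][w] = P(w|t)`, and the toy data are all hypothetical.

```python
import math
from collections import Counter


def translm_score(query, doc, trans, coll_counts, coll_len, beta=0.5, mu=10.0):
    """Log likelihood of `query` under a translation-augmented model of a non-empty `doc`."""
    counts = Counter(doc)
    n = len(doc)
    score = 0.0
    for w in query:
        p_ml = counts[w] / n
        # Translation part: sum over document terms t of P(w|t) * P_ml(t|D).
        p_tr = sum(trans.get(t, {}).get(w, 0.0) * c / n for t, c in counts.items())
        p_mx = (1.0 - beta) * p_ml + beta * p_tr
        p_wc = coll_counts.get(w, 0) / coll_len
        p = (n * p_mx + mu * p_wc) / (n + mu)
        if p == 0.0:
            return float("-inf")
        score += math.log(p)
    return score


# Toy data: the query word "car" never appears in either document.
trans = {"automobile": {"car": 0.4, "automobile": 0.6}}
coll_counts = {"automobile": 1, "price": 1, "bread": 1, "recipe": 1, "car": 1}
coll_len = 5
doc_a = ["automobile", "price"]
doc_b = ["bread", "recipe"]

score_a = translm_score(["car"], doc_a, trans, coll_counts, coll_len)
score_b = translm_score(["car"], doc_b, trans, coll_counts, coll_len)
print(score_a, score_b)
```

With `beta=0` the sketch reduces to plain Dirichlet-smoothed query likelihood, so the two documents tie. With a positive `beta`, the document containing "automobile" is preferred, which is the word mismatch case the translation component is meant to handle.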

Efficiency Issues and Implementation of TransLM: flipped translation tables

Term-at-a-time Algorithm
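The slides name these two ideas without detail, so the following is one interpretation rather than the presenter's implementation. A translation table keyed by source word is flipped so that each query word can look up the source words that translate into it, and scores are then accumulated one term at a time by walking the posting list of each such source word. The table layout, the posting format, and all names are assumptions.

```python
from collections import defaultdict


def flip(trans):
    """Turn trans[t][w] = P(w|t) into flipped[w][t] = P(w|t)."""
    flipped = defaultdict(dict)
    for t, targets in trans.items():
        for w, p in targets.items():
            flipped[w][t] = p
    return dict(flipped)


def translation_mass(query, flipped, postings, doc_len):
    """Accumulate sum_t P(w|t) * tf(t, D) / |D| per document and query term."""
    acc = defaultdict(lambda: defaultdict(float))
    for w in query:                                   # one query term at a time
        for t, p in flipped.get(w, {}).items():       # sources that translate to w
            for doc_id, tf in postings.get(t, {}).items():
                acc[doc_id][w] += p * tf / doc_len[doc_id]
    return {d: dict(ws) for d, ws in acc.items()}


# Toy translation table, inverted index, and document lengths (assumed data).
trans = {
    "automobile": {"car": 0.4, "automobile": 0.6},
    "car": {"car": 0.8, "automobile": 0.2},
}
postings = {"automobile": {"d1": 1}, "car": {"d2": 2}}
doc_len = {"d1": 2, "d2": 4}

flipped = flip(trans)
mass = translation_mass(["car"], flipped, postings, doc_len)
print(flipped["car"], mass)
```

Only documents that contain at least one source word for some query term are ever touched, which is the practical benefit of indexing the table by target word.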

Properties of Word Relationships: Not symmetric. Not fixed: they change depending on the retrieval or translation task. Must be given as probability values.

Training Sample Generation: Key idea: if two answers are very similar, then the corresponding questions are semantically similar. Similarity measures: cosine similarity; query likelihood scores between two answers (LM SCORE); LM-HRANK.
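The key idea can be sketched with the simplest of the listed measures, cosine similarity over raw term counts: questions whose answers are similar enough are paired up as training samples. The threshold value, the function names, and the toy Q&A pairs are hypothetical, and the sketch does not cover the LM SCORE or LM-HRANK measures.

```python
import math
from collections import Counter
from itertools import combinations


def cosine(a, b):
    """Cosine similarity between two token lists using raw term counts."""
    ca, cb = Counter(a), Counter(b)
    dot = sum(ca[w] * cb[w] for w in ca)
    norm = math.sqrt(sum(v * v for v in ca.values())) * math.sqrt(
        sum(v * v for v in cb.values())
    )
    return dot / norm if norm else 0.0


def question_pairs(qa_pairs, threshold):
    """Pair up questions whose answers have cosine similarity >= threshold."""
    pairs = []
    for (q1, a1), (q2, a2) in combinations(qa_pairs, 2):
        if cosine(a1, a2) >= threshold:
            pairs.append((q1, q2))
    return pairs


# Toy archive of (question, tokenized answer) pairs (assumed data).
qa_pairs = [
    ("where is charlotte located", ["charlotte", "is", "in", "north", "carolina"]),
    ("which state is charlotte in", ["it", "is", "in", "north", "carolina"]),
    ("how hot should the oven be", ["bake", "at", "200", "degrees"]),
]

samples = question_pairs(qa_pairs, threshold=0.7)
print(samples)
```

The two Charlotte questions share few words with each other, yet they are paired because their answers overlap, which yields the kind of training sample needed to learn word-to-word translations.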

Word Relationship Types: P(Q|A): source is the answer, target is the question. P(A|Q): source is the question, target is the answer. P(Q|Q) P(Q Q)

EM Algorithm: Finds the word relationships that maximize the likelihood of sampling the target text from the source text in the training samples.

EM Algorithm (Contd..) The translation probability from a source word t to a target word w is given as
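The update equations are not preserved in this transcript. In the standard IBM Model 1 training procedure, the E-step spreads each target word w over the source words t of its training pair in proportion to the current P(w|t), and the M-step renormalizes the collected fractional counts so that the probabilities for each source word sum to 1. The sketch below assumes that standard procedure and omits the null word; the function name and the toy training pairs are hypothetical.

```python
from collections import defaultdict


def train_model1(pairs, iterations=10):
    """Estimate prob[t][w] = P(w|t) from (source_tokens, target_tokens) pairs."""
    target_vocab = {w for _, tgt in pairs for w in tgt}
    uniform = 1.0 / len(target_vocab)
    prob = defaultdict(lambda: defaultdict(lambda: uniform))  # uniform start
    for _ in range(iterations):
        count = defaultdict(lambda: defaultdict(float))
        total = defaultdict(float)
        for src, tgt in pairs:
            for w in tgt:
                # E-step: share of w attributed to each source word t.
                norm = sum(prob[t][w] for t in src)
                for t in src:
                    frac = prob[t][w] / norm
                    count[t][w] += frac
                    total[t] += frac
        # M-step: renormalize fractional counts per source word.
        prob = {
            t: {w: c / total[t] for w, c in ws.items()} for t, ws in count.items()
        }
    return prob


# Toy training samples: (source text, target text) token lists (assumed data).
pairs = [
    (["automobile", "cost"], ["car", "price"]),
    (["automobile", "fix"], ["car", "repair"]),
]

probs = train_model1(pairs, iterations=20)
print(probs["automobile"])
```

Because "automobile" co-occurs with "car" in both pairs, repeated iterations shift its probability mass toward "car", while "cost" and "fix" settle on "price" and "repair".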

Examples

Examples (Contd..)

SUMMARY: Introduction; Q&A Retrieval; Test Collections; Translation-Based Q&A Retrieval Framework; Learning Word-to-Word Translations

Coming Up Next: Estimating Answer Quality; Experiments