Extending Relevance Model for Relevance Feedback
Le Zhao, Chenmin Liang, Jamie Callan
Language Technologies Institute, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA 15213, USA

Introduction
The TREC 2008 Relevance Feedback track defines a testbed for evaluating relevance feedback algorithms. It includes different levels of feedback, from a single relevant feedback document to over 100 judgments with at least 3 relevant documents per topic.

Goal
The design of feedback algorithms is most challenging when the amount of feedback information is minimal. We therefore aim to design a robust relevance feedback algorithm that can use even a small number of feedback documents to achieve robust performance.

Data Set
- Documents: the GOV2 collection.
- Topics: 50 topics from previous Terabyte tracks and 150 topics from Million Query tracks.
- Feedback: top documents ranked by systems from the previous tracks; judgments also come from the previous tracks.

Figure 1. Flowchart of our relevance feedback model: initial query, retrieval, top documents, user feedback, relevance-model term weighting, feedback retrieval.

The Relevance Model
A distribution over terms given the information need I (Lavrenko and Croft 2001). For a term r, the weight is P(r | I); the constant P(I) can be dropped without affecting the relative term weights. The top n terms form the relevance model Indri query
    #weight( w_1 r_1 w_2 r_2 ... w_n r_n ), where w_i = P(r_i | I),
which is interpolated with the original query:
    #weight( w Original_Query (1-w) Relevance_Model_Query )

The Extended Relevance Model
Problem setup: weight feedback terms according to both the judged relevant feedback documents and the pseudo-relevant documents, instead of building two queries and combining them; a single tuning parameter controls how much more important the true relevant documents should be than the pseudo-relevant ones.
Goal: separate out the factors that affect term weights from the two sources (number of feedback documents, number of relevant documents, P(I), etc.), so that the weights are stable across topics.
Key problem: modeling P(I), which can no longer be dropped without cost.
The decomposed extended relevance model uses the empirical judged relevance for the relevant documents and a uniform empirical document distribution (1/|Pseudo|) for the pseudo-relevant documents. These empirical distributions normalize out factors such as the number of feedback documents and the number of relevant documents, and thus correct the bias toward the majority source. A small code sketch of this weighting scheme follows below.
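As a concrete illustration of the weighting just described, the following minimal Python sketch combines the two feedback sources with uniform empirical document distributions and builds the interpolated Indri query from the Relevance Model section. It is a sketch under simplifying assumptions, not the implementation behind our TREC runs: P(r | D) is unsmoothed maximum likelihood, and rel_boost, n_terms, and orig_weight are hypothetical parameter names standing in for the single source-importance knob, the top-n cutoff, and the interpolation weight w.

    from collections import Counter

    def term_dist(doc_tokens):
        """Maximum-likelihood P(r | D) for a tokenized document."""
        counts = Counter(doc_tokens)
        total = sum(counts.values())
        return {t: c / total for t, c in counts.items()}

    def extended_relevance_model(rel_docs, pseudo_docs, rel_boost=0.7, n_terms=20):
        """Combine judged relevant and pseudo-relevant feedback documents into
        one term distribution.  Each source uses a uniform empirical document
        distribution (1/|Rel| and 1/|Pseudo|), so the number of documents per
        source does not bias the weights; rel_boost is a hypothetical single
        tuning parameter for how much more the judged documents count."""
        weights = Counter()
        for docs, source_weight in ((rel_docs, rel_boost), (pseudo_docs, 1.0 - rel_boost)):
            if not docs:
                continue
            for doc in docs:
                for term, p in term_dist(doc).items():
                    weights[term] += source_weight * p / len(docs)
        top = weights.most_common(n_terms)
        norm = sum(w for _, w in top)
        return [(term, w / norm) for term, w in top]

    def indri_feedback_query(original_query, model_terms, orig_weight=0.5):
        """Interpolate the original query with the relevance-model query:
        #weight( w #combine(original) (1-w) #weight(w1 r1 ... wn rn) )."""
        rm = " ".join(f"{w:.4f} {term}" for term, w in model_terms)
        return (f"#weight( {orig_weight} #combine( {original_query} ) "
                f"{1.0 - orig_weight} #weight( {rm} ) )")

    if __name__ == "__main__":
        rel = [["airport", "security", "screening", "security"]]
        pseudo = [["airport", "delays", "security"], ["travel", "screening", "rules"]]
        terms = extended_relevance_model(rel, pseudo, rel_boost=0.7, n_terms=10)
        print(indri_feedback_query("airport security", terms, orig_weight=0.6))

In the actual extended model, the P(I) estimates discussed in the next section enter this combination as an additional normalization; the sketch omits that step.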
Modeling P(I)
- Generated from the collection model: P(I | C), approximated by P(Q | C).
- Considering documents in the collection: max_{D in C} P(I | D), approximated by max_{D in C} P(Q | D). Intuition: a relevant document is as good as the best document in the collection.
- avg_{D in TopN} P(I | D), approximated by avg_{D in TopN} P(Q | D). Intuition: a relevant document is as good as the average of the top N documents.
The goal is to keep the term weights stable across topics with different P(I | D) values. A small sketch of these estimates appears at the end of this poster.

Experiments
Baseline: dependency model queries for increased top precision, plus pseudo relevance feedback (the relevance model) for better recall; these were among the best runs in the 2005 and 2006 Terabyte tracks.
Extended relevance model:
- Stability of the optimal tuning: per-topic tuning gives only a 3-4% improvement on feedback sets C or D, which suggests simply tuning the interpolation of the extended relevance model with the original query.
- At its optimal interpolation weight, the extended model is significantly better than relevance feedback alone when only one (the top-ranked) relevant document is used for feedback (p < 0.004 by a paired sign test).
- There is no significant difference between the merged model with top relevant-document feedback and pseudo relevance feedback.
- Performance change as the amount of feedback information increases: the training topics come from previous Terabyte (TB) and Million Query (MQ) tracks, differing from the test setting (TB only), and the training feedback documents are randomly sampled from the judgments, differing from the test setting (top-ranked by previous TREC runs). The resulting curve is almost flat while PRF gains a lot, raising the question of whether lower-ranked relevant documents are needed for effective feedback.

Conclusions & Future Work
- The extended relevance model works well; otherwise performance would vary with the number of relevant documents.
- One randomly sampled relevant document is more informative than a top-ranked relevant document.
- Merging relevance feedback and PRF is significantly better than relevance feedback alone.
- Top-ranked negative feedback documents probably carry more information for the system than top-ranked relevant feedback documents.
- Future work.
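For completeness, here is the small sketch of the three P(I) estimates referenced in the Modeling P(I) section. It is an illustration only: it assumes Dirichlet-smoothed query likelihood for P(Q | D), and the function and parameter names (query_log_likelihood, estimate_p_I, mu, coll_prob) are hypothetical rather than those used in our TREC runs.

    import math
    from collections import Counter

    def query_log_likelihood(query_terms, doc_tokens, coll_prob, mu=2500):
        """Dirichlet-smoothed log P(Q | D); coll_prob maps term -> P(t | C)."""
        counts = Counter(doc_tokens)
        doc_len = sum(counts.values())
        score = 0.0
        for q in query_terms:
            p = (counts[q] + mu * coll_prob.get(q, 1e-9)) / (doc_len + mu)
            score += math.log(p)
        return score

    def estimate_p_I(query_terms, top_docs, coll_prob, mode="avg"):
        """Three stand-ins for P(I), each approximated through the query:
           'collection' -> P(Q | C);
           'max'        -> max over the top documents of P(Q | D);
           'avg'        -> average over the top documents of P(Q | D)."""
        if mode == "collection":
            return math.exp(sum(math.log(coll_prob.get(q, 1e-9)) for q in query_terms))
        scores = [math.exp(query_log_likelihood(query_terms, d, coll_prob))
                  for d in top_docs]
        return max(scores) if mode == "max" else sum(scores) / len(scores)

The choice among the three estimates is aimed at keeping the resulting term weights stable across topics whose P(Q | D) values sit on different scales, as noted in the Modeling P(I) section.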