Positional Relevance Model for Pseudo–Relevance Feedback Yuanhua Lv & ChengXiang Zhai Department of Computer Science, UIUC Presented by Bo Man 2014/11/18.

Presentation Guideline Review on Feedback Motivation Positional Relevance Model Experiments, Results and Analyses Conclusions

Review on Feedback (1) Explicit Feedback: easy to learn from, but requires explicit user interaction.

Review on Feedback (2) Implicit Feedback: no explicit interaction needed, but requires more effort on mining user behavior.

Review on Feedback (3) Pseudo-Relevance Feedback: no user interaction and no behavior mining needed; the top-ranked documents are simply assumed to be relevant.

Problems? Traditional pseudo-relevance feedback assumes that the contents of a document are coherent (sharing a single topic). What if a document covers several topics? Should feedback be term-based or document-based?

Presentation Guideline Review on Feedback Motivation Positional Relevance Model Experiments, Results and Analyses Conclusions

Motivation How can we use the positions of terms in feedback documents to effectively select the words that are actually focused on the query topic?

Presentation Guideline Review on Feedback Motivation Positional Relevance Model Experiments, Results and Analyses Conclusions

Positional Relevance Model Start from the Relevance Model (one of the most robust feedback methods). Θ represents the set of smoothed document models for the pseudo-feedback documents. p(θD) is a prior on documents, often assumed to be uniform when there is no additional prior knowledge about document D. After the relevance model is estimated, the estimated P(w|Q) can be interpolated with the original query model θQ to improve performance; α is a parameter that controls the amount of feedback.
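The relevance model equation itself did not survive the transcript; as a sketch following Lavrenko and Croft's formulation, with q_1, …, q_m the query terms and θ_F the resulting feedback model:

```latex
P(w \mid Q) \;\propto\; \sum_{\theta_D \in \Theta} p(\theta_D)\, P(w \mid \theta_D) \prod_{j=1}^{m} P(q_j \mid \theta_D),
\qquad
\theta_Q' = (1-\alpha)\,\theta_Q + \alpha\,\theta_F .
```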

Positional Relevance Model (PRM) i indicates a position in document D; F is the set of feedback documents (assumed to be relevant). The challenge: how to estimate the joint probability P(w, Q, D, i)?
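The PRM equation on the slide was lost in transcription; consistent with the surrounding description, the model marginalizes a joint distribution over words, queries, feedback documents, and positions:

```latex
P(w \mid Q) \;=\; \frac{P(w, Q)}{P(Q)} \;\propto\; P(w, Q) \;=\; \sum_{D \in F} \sum_{i=1}^{|D|} P(w, Q, D, i).
```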

Methods for estimating P(w, Q, D, i): (1) i.i.d. sampling; (2) conditional sampling.

i.i.d. sampling

i.i.d. sampling derivation (1) P(D) can be interpreted as a document prior and set to a uniform distribution when there is no prior knowledge about document D. Every position is assumed to be equally likely, though P(i|D) could also be estimated from document structure. The generation of word w and that of query Q are assumed to be independent given the position.
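Putting the three assumptions on this slide together, the i.i.d.-sampling factorization is (a sketch reconstructed from the slide's text, with q_1, …, q_m the query terms):

```latex
P(w, Q, D, i) \;=\; P(D)\, P(i \mid D)\, P(w \mid D, i) \prod_{j=1}^{m} P(q_j \mid D, i),
```

so that, with uniform P(D) and P(i|D),

```latex
P(w \mid Q) \;\propto\; \sum_{D \in F} \sum_{i=1}^{|D|} P(w \mid D, i)\, P(Q \mid D, i).
```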

i.i.d. sampling derivation (2) P(w|D, i) is the probability of sampling word w at position i in document D. To improve the efficiency of PRM, P(w|D, i) is simplified. Question: how to estimate the query likelihood at position i of document D?

Conditional sampling …… The same question arises: how to estimate the query likelihood at position i of document D?

Estimating the query likelihood at position i of document D (1) Use a Positional Language Model (PLM) (2) Use a Gaussian kernel function (3) Approximate (4) Set parameters
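A minimal sketch of steps (1)–(2): a positional language model in which each term occurrence propagates a Gaussian-discounted count to position i. Function names and the toy tokenization are illustrative, not from the paper, and smoothing (next slide) is omitted here:

```python
import math
from collections import Counter

def positional_lm(doc_terms, i, sigma=25.0):
    """Positional language model at position i: an occurrence of a term at
    position j contributes a propagated count exp(-(i-j)^2 / (2*sigma^2))."""
    counts = Counter()
    for j, w in enumerate(doc_terms):
        counts[w] += math.exp(-((i - j) ** 2) / (2.0 * sigma ** 2))
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}

def positional_query_likelihood(doc_terms, query_terms, i, sigma=25.0):
    """Unsmoothed query likelihood at position i of the document."""
    plm = positional_lm(doc_terms, i, sigma)
    score = 1.0
    for q in query_terms:
        score *= plm.get(q, 0.0)
    return score
```

Positions near a query-term occurrence get a higher positional query likelihood than distant ones, which is what PRM exploits when weighting words by position.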

Estimating the query likelihood at position i of document D (5) Use Jelinek–Mercer (JM) smoothing (6) Compute. The computation of the positional query likelihood is the most time-consuming part of estimating PRM.
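Step (5) is plain Jelinek–Mercer interpolation of the positional estimate with the collection language model; the dictionaries and the λ value below are illustrative, not taken from the paper:

```python
def jm_smooth(w, positional_probs, collection_probs, lam=0.5):
    """Jelinek-Mercer smoothing: interpolate the positional estimate
    p(w | D, i) with the collection language model p(w | C)."""
    p_pos = positional_probs.get(w, 0.0)
    p_coll = collection_probs.get(w, 0.0)
    return (1.0 - lam) * p_pos + lam * p_coll
```

Smoothing keeps the positional query likelihood nonzero for query terms that do not occur near position i, which the unsmoothed product would zero out.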

Presentation Guideline Review on Feedback Motivation Positional Relevance Model Experiments, Results and Analyses Conclusions

Experiments and Results Evaluation methods:
(1) The basic retrieval model is the KL-divergence retrieval model, with the Dirichlet smoothing method [33] for smoothing document language models; the smoothing parameter μ was set empirically. This method is labeled "NoFB".
(2) The baseline pseudo-feedback method is the relevance model "RM3", one of the most effective and robust pseudo-feedback methods under the language modeling framework.
(3) Another baseline pseudo-feedback method is a standard passage-based feedback model, labeled "RM3-p", which estimates the RM3 relevance model on the best-matching passage of each feedback document.
(4) Two variations of PRM, "PRM1" and "PRM2", are based on the two estimation methods described in Section 3.2, respectively.
(5) In addition, PRM1 and PRM2 were also used for passage feedback in the same way as RM3-p: first compute a PLM for each position of the document, then estimate a PRM on a passage of size 2σ centered at the position with the maximum positional query-likelihood score.

Results

More results: Robustness Analysis

Presentation Guideline Review on Feedback Motivation Positional Relevance Model Experiments, Results and Analyses Conclusions

Conclusions A novel positional relevance model (PRM) is proposed. PRM exploits term position and proximity to assign more weight to words closer to query words, based on the intuition that words closer to query words are more likely to be consistent with the query topic. Experimental results show that PRM is quite effective and performs significantly better than document-based and passage-based feedback baselines.

Questions?