Using Statistical Decision Theory and Relevance Models for Query-Performance Prediction Anna Shtok and Oren Kurland and David Carmel SIGIR 2010 Hao-Chin.


Using Statistical Decision Theory and Relevance Models for Query-Performance Prediction Anna Shtok and Oren Kurland and David Carmel SIGIR 2010 Hao-Chin Chang Department of Computer Science & Information Engineering National Taiwan Normal University 2011/08/01

Outline
Introduction
Relevance-Model
Relevance Score
– Clarity
– WIG
– NQC
– QF
Ranking List
Experiment
Conclusion

Introduction
We present a novel framework for query-performance prediction that is based on statistical decision theory and relevance models. We consider a ranking induced by a retrieval method M in response to a query q as a decision taken so as to satisfy the underlying information need. Our goal is to predict the query performance of M with respect to q. We instantiate various query-performance predictors from the framework by varying:
– the estimates of the relevance model
– the measures for the quality of a relevance-model estimate
– the measure of similarity between ranked lists

Relevance-Model
The relevance model represents the information need I_q.
Negative Cross Entropy
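The slide names negative cross entropy without preserving its formula. A minimal sketch of the standard definition, assuming unigram language models stored as plain dictionaries: a document d is scored against the relevance model R by sum_w p(w|R) log p(w|d), and documents are sorted by that score. The function names, the `floor` smoothing constant, and the toy models are illustrative assumptions, not taken from the slides.

```python
import math


def negative_cross_entropy(relevance_model, doc_model, floor=1e-10):
    """Return sum_w p(w|R) * log p(w|d); a higher value means d fits R better.

    `floor` is an assumed stand-in for proper smoothing of unseen terms.
    """
    return sum(
        p_w * math.log(doc_model.get(w, floor))
        for w, p_w in relevance_model.items()
        if p_w > 0.0
    )


def rank_by_relevance_model(relevance_model, doc_models):
    """Sort document ids by decreasing negative cross entropy."""
    scored = {
        doc_id: negative_cross_entropy(relevance_model, model)
        for doc_id, model in doc_models.items()
    }
    return sorted(scored, key=scored.get, reverse=True)


# Toy data (hypothetical): d1 is closer to the relevance model than d2.
relevance_model = {"query": 0.5, "performance": 0.3, "prediction": 0.2}
doc_models = {
    "d1": {"query": 0.4, "performance": 0.3, "prediction": 0.3},
    "d2": {"query": 0.1, "performance": 0.1, "prediction": 0.8},
}
ranking = rank_by_relevance_model(relevance_model, doc_models)
```

Here `ranking` puts `d1` before `d2`, because `d1` assigns more probability to the terms that the relevance model emphasizes.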

Relevance Score (Clarity, WIG)
Clarity: the score is measured by the KL divergence between the relevance model and the collection language model.
WIG is based on estimating the presumed percentage of relevant documents in the set S from which the relevance model is constructed.
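As a rough illustration of the Clarity score only (WIG is not sketched here), the KL divergence between a relevance model and a collection model can be computed as below. The dictionary representation, the `floor` constant, and the toy distributions are assumptions made for the example.

```python
import math


def clarity(relevance_model, collection_model, floor=1e-10):
    """KL(R || C) = sum_w p(w|R) * log(p(w|R) / p(w|C)).

    `floor` guards against terms missing from the collection model
    (an assumption; a real system would use a smoothed collection model).
    """
    return sum(
        p_w * math.log(p_w / max(collection_model.get(w, 0.0), floor))
        for w, p_w in relevance_model.items()
        if p_w > 0.0
    )


# Toy distributions (hypothetical).
collection = {"the": 0.5, "query": 0.25, "jaguar": 0.25}
focused = {"the": 0.1, "query": 0.1, "jaguar": 0.8}    # diverges from the collection
diffuse = {"the": 0.5, "query": 0.25, "jaguar": 0.25}  # identical to the collection

focused_clarity = clarity(focused, collection)
diffuse_clarity = clarity(diffuse, collection)
```

A relevance model that looks like the collection gets a Clarity of zero, while a focused one gets a positive value, which is the intuition behind using the divergence as a quality estimate.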

Relevance Score (NQC)
NQC is based on the hypothesis that the standard deviation of retrieval scores in the result list is negatively correlated with the potential amount of query drift, i.e., non-query-related information manifested in the list. μ is the mean retrieval score in the result list.
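A minimal sketch of the quantity described above: the standard deviation of the retrieval scores in the result list around their mean μ. The optional division by the absolute corpus score follows the commonly cited form of NQC and is an assumption here, since the slide does not preserve the formula; `corpus_score` is a hypothetical parameter name.

```python
import math


def nqc(scores, corpus_score=None):
    """Standard deviation of result-list scores, optionally normalized.

    scores: retrieval scores of the documents in the result list.
    corpus_score: assumed normalizer (score of the corpus treated as one
    document); when None, the raw standard deviation is returned.
    """
    if not scores:
        raise ValueError("scores must not be empty")
    mu = sum(scores) / len(scores)  # mean retrieval score in the list
    std = math.sqrt(sum((s - mu) ** 2 for s in scores) / len(scores))
    if corpus_score is None:
        return std
    if corpus_score == 0:
        raise ValueError("corpus_score must be non-zero")
    return std / abs(corpus_score)


spread_out = nqc([2, 4, 4, 4, 5, 5, 7, 9])
normalized = nqc([2, 4, 4, 4, 5, 5, 7, 9], corpus_score=-4.0)
flat = nqc([3.0, 3.0, 3.0])
```

Under the hypothesis on the slide, a higher value (scores well separated from the mean) suggests less query drift, and a flat score list yields zero.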

Relevance Score (QF)
– The goal is to represent the ranked list L by a language model.
– Terms are ranked by their contribution to the language model's KL (Kullback-Leibler) divergence from the background collection model.
– The top-ranked terms are chosen to form the new query Q'.

Relevance Score (QF)
– P(D|L) is estimated by a linearly decreasing function of the rank of document D.
– Each term in P(w|L) is ranked.
– The top N ranked terms form a weighted query Q' = {(w_i, t_i)}, where w_i denotes the i-th ranked term and the weight t_i is the KL-divergence contribution of w_i.
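The QF steps above can be sketched as follows. The exact linear form of P(D|L) (weight n for rank 1 down to 1 for rank n, then normalized), the mixture P(w|L) = sum_D P(w|D) P(D|L), and all function names are assumptions made for illustration; the slides state only that P(D|L) decreases linearly with rank.

```python
import math


def list_language_model(ranked_doc_models):
    """Build P(w|L) as a rank-weighted mixture of document models.

    ranked_doc_models: document language models in rank order (best first).
    Assumed weights: rank r out of n gets (n - r + 1), normalized to sum to 1.
    """
    n = len(ranked_doc_models)
    raw_weights = [n - i for i in range(n)]
    total = sum(raw_weights)
    model = {}
    for weight, doc_model in zip(raw_weights, ranked_doc_models):
        for w, p in doc_model.items():
            model[w] = model.get(w, 0.0) + p * weight / total
    return model


def feedback_query(ranked_doc_models, collection_model, top_n, floor=1e-10):
    """Return the top_n (term, weight) pairs, weight = KL contribution of the term."""
    p_list = list_language_model(ranked_doc_models)
    contribution = {
        w: p * math.log(p / max(collection_model.get(w, 0.0), floor))
        for w, p in p_list.items()
        if p > 0.0
    }
    ranked_terms = sorted(contribution.items(), key=lambda kv: kv[1], reverse=True)
    return ranked_terms[:top_n]


# Toy data (hypothetical): two ranked documents and a collection model.
ranked_docs = [
    {"jaguar": 0.6, "car": 0.4},
    {"jaguar": 0.2, "the": 0.8},
]
collection = {"jaguar": 0.05, "car": 0.15, "the": 0.8}
new_query = feedback_query(ranked_docs, collection, top_n=2)
```

In this toy case the common word "the" has a negative KL contribution and is dropped, so `new_query` keeps "jaguar" and "car" with their contributions as weights.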

Similarity between ranked lists
Pearson's coefficient, Spearman's-ρ, and Kendall's-τ correlation between the original list ranking and its relevance-model-based ranking are computed.
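A small self-contained sketch of the three similarity measures, applied to the scores that two rankings assign to the same documents. It assumes no tied scores and uses the simplest (tau-a) form of Kendall's coefficient; the score lists are hypothetical.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)


def ranks(values):
    """Rank positions (1 = highest value); assumes no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    result = [0] * len(values)
    for position, index in enumerate(order, start=1):
        result[index] = position
    return result


def spearman(x, y):
    """Spearman's rho: Pearson correlation computed on rank positions."""
    return pearson(ranks(x), ranks(y))


def kendall_tau(x, y):
    """Kendall's tau-a: (concordant - discordant) / number of pairs."""
    concordant = discordant = 0
    for i, j in combinations(range(len(x)), 2):
        sign = (x[i] - x[j]) * (y[i] - y[j])
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    n_pairs = len(x) * (len(x) - 1) // 2
    return (concordant - discordant) / n_pairs


# Scores of the same four documents under the original retrieval method
# and under the relevance-model-based ranking (hypothetical values).
original_scores = [0.9, 0.7, 0.5, 0.2]
relevance_model_scores = [-1.0, -1.5, -1.2, -3.0]

rho = spearman(original_scores, relevance_model_scores)
tau = kendall_tau(original_scores, relevance_model_scores)
r = pearson(original_scores, relevance_model_scores)
```

The two rankings disagree only on the second and third documents, so the rank correlations are high but below 1. Pearson's coefficient works on the raw scores, while the other two depend only on the induced order.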

Experiment

Conclusion
– Improving the sampling technique used for relevance-model construction
– Devising and adapting better measures of representativeness for relevance models constructed from clusters