QUERY-PERFORMANCE PREDICTION: SETTING THE EXPECTATIONS STRAIGHT
Date: 2014/08/18
Authors: Fiana Raiber, Oren Kurland
Source: SIGIR’14
Advisor: Jia-ling Koh
Speaker: Shao-Chun Peng
Outline: Introduction, Related Work, Approach, Experimental, Conclusion
Introduction
What is a “query”?
The user submits a query through the browser; a retrieval method runs it against the corpus and returns a retrieved list.
Introduction
What does the user want when issuing a query?
Documents that are truly relevant to the query.
Motivation
Why do we need to “predict” query performance?
Improved prediction methods have not led to improved retrieval methods.
Diagram: a bad query and a good query each pass through the browser, the method, and the corpus to a retrieved list; for a good query, the method does not need to change.
Purpose
Estimate retrieval effectiveness in the absence of relevance judgments.
Prediction task
Prediction over corpora
Prediction over retrieved lists
Prediction over queries: pre-retrieval and post-retrieval
Prediction task notation
Q: queries
C: document corpora
M: retrieval methods
L: retrieved list, produced by running a query against a corpus with a retrieval method
R: 1 if the retrieval was effective, 0 otherwise
Prediction over corpora
Application: federated search.
Fix Q = q. For each corpus c, with any assignment m of the retrieval method, estimate whether the retrieval was effective.
Prediction over retrieved lists
Application: fusion.
The lists differ because of the retrieval method used.
Fix Q = q and C = c. For each retrieved list l, estimate whether the retrieval was effective.
Prediction over queries
Pre-retrieval: fix C = c and estimate for each query q.
Post-retrieval: fix C = c and estimate for each pair of q and m.
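The three tasks differ only in which variables are held fixed and which are varied. A minimal sketch of that bookkeeping follows. The function name `settings`, the task labels, and the choice of the first list element as the "fixed" value are all hypothetical; for prediction over corpora the slide allows any method assignment, and this sketch simply reuses one method for every corpus.

```python
from itertools import product

def settings(task, queries, corpora, methods):
    """Enumerate the (query, corpus, method) settings compared under each task.

    None marks a variable the task does not use.
    """
    # Arbitrary fixed values (an assumption made only for illustration).
    q0, c0, m0 = queries[0], corpora[0], methods[0]
    if task == "corpora":        # fix Q = q; one estimate per corpus
        return [(q0, c, m0) for c in corpora]
    if task == "lists":          # fix Q = q, C = c; lists differ by method
        return [(q0, c0, m) for m in methods]
    if task == "queries-pre":    # fix C = c; one estimate per query, no retrieval
        return [(q, c0, None) for q in queries]
    if task == "queries-post":   # fix C = c; one estimate per (q, m) pair
        return [(q, c0, m) for q, m in product(queries, methods)]
    raise ValueError(f"unknown task: {task}")
```

For example, with two queries, one corpus, and two methods, the post-retrieval task yields four (q, m) pairs, while the pre-retrieval task yields two entries with no method.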
Related Work
Why was the expectation that previously proposed query-performance predictors would help improve retrieval effectiveness not realized?
How can query-performance predictors be used to improve retrieval effectiveness?
Approach
Prediction over corpora: cluster ranking
Prediction over retrieved lists: learning to rank queries using Markov Random Fields
Prediction over queries: learning to rank queries using Markov Random Fields
Markov Random Fields
Feature selection
SCQ: similarity between a term and the corpus (tf.idf based)
VAR: variance of the tf.idf values of a term over the documents in the corpus in which it appears
IDF: inverse document frequency
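The three term-level features above can be sketched for a single term as follows. This is an illustration under stated assumptions, not the authors' implementation: the tf.idf weight (1 + ln tf) * ln(1 + N/df) and the SCQ form (1 + ln of the term's corpus frequency) * ln(1 + N/df) are the commonly cited definitions, which the slide does not spell out, and the function name `term_stats` is hypothetical.

```python
import math
from collections import Counter

def term_stats(term, docs):
    """Return (idf, scq, var) for one term; docs is a list of token lists."""
    n_docs = len(docs)
    # Term frequencies in the documents that contain the term.
    tfs = [tf for tf in (Counter(d)[term] for d in docs) if tf > 0]
    df = len(tfs)
    if df == 0:
        return 0.0, 0.0, 0.0
    idf = math.log(n_docs / df)
    idf_weight = math.log(1 + n_docs / df)   # assumed smoothed idf factor
    # SCQ: corpus-level tf.idf similarity (assumed form).
    scq = (1 + math.log(sum(tfs))) * idf_weight
    # VAR: variance of per-document tf.idf weights (assumed weighting).
    weights = [(1 + math.log(tf)) * idf_weight for tf in tfs]
    mean = sum(weights) / df
    var = sum((w - mean) ** 2 for w in weights) / df
    return idf, scq, var
```

Query-level values are typically obtained by aggregating the per-term values over the query terms, for example by taking their sum, mean, or maximum.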
Feature selection
Entropy: high entropy of the term distribution in a document potentially indicates content breadth
Cohesion: for each document d in L, compute its average similarity with all documents in L
Sw1: the ratio between the number of stopwords and non-stopwords in the document
Sw2: the fraction of stopwords in a stopword list that appear in the document
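A minimal sketch of the entropy and stopword features for one tokenized document. The stopword list `STOPWORDS` is a hypothetical toy list, and Sw2 is read here as the fraction of the stopword list that appears in the document, which is an interpretation of the slide's wording.

```python
import math
from collections import Counter

STOPWORDS = {"the", "a", "of", "and", "to"}  # hypothetical tiny stopword list

def entropy(tokens):
    """Shannon entropy (bits) of the document's term distribution."""
    counts = Counter(tokens)
    total = len(tokens)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def sw1(tokens, stopwords=STOPWORDS):
    """Ratio of stopword tokens to non-stopword tokens."""
    stop = sum(t in stopwords for t in tokens)
    non_stop = len(tokens) - stop
    return stop / non_stop if non_stop else float("inf")

def sw2(tokens, stopwords=STOPWORDS):
    """Fraction of the stopword list that occurs in the document."""
    return len(stopwords & set(tokens)) / len(stopwords)
```

A document that repeats a single term has entropy 0; a document that spreads its mass evenly over many terms has high entropy.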
Feature selection
Clarity: KL divergence between a relevance language model induced from the list and the language model induced from the corpus
ImpClarity: a variant of Clarity proposed for Web corpora
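The Clarity computation can be sketched as a KL divergence between two unigram models. This is a simplification: the list model below pools raw term counts over the retrieved documents, whereas a relevance language model would normally weight each document by its query likelihood. The function name `clarity` and the `top_terms` cutoff are assumptions, and the retrieved documents are assumed to be part of the corpus so that every list term has non-zero corpus probability.

```python
import math
from collections import Counter

def clarity(list_docs, corpus_docs, top_terms=100):
    """KL divergence (bits) between a list language model and the corpus model.

    Both arguments are lists of token lists; list_docs must be drawn from
    corpus_docs so that corpus probabilities are never zero.
    """
    list_counts = Counter(t for d in list_docs for t in d)
    corpus_counts = Counter(t for d in corpus_docs for t in d)
    list_total = sum(list_counts.values())
    corpus_total = sum(corpus_counts.values())
    score = 0.0
    # Sum only over the most frequent list terms (assumed cutoff).
    for term, count in list_counts.most_common(top_terms):
        p_list = count / list_total
        p_corpus = corpus_counts[term] / corpus_total
        score += p_list * math.log2(p_list / p_corpus)
    return score
```

A list whose term distribution matches the corpus scores 0; a list focused on a few terms that are rare in the corpus scores higher.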
Feature selection
WIG: the difference between the mean retrieval score in the list and that of the corpus, which represents a pseudo non-relevant document
NQC: the standard deviation of retrieval scores in the list
UEF(Clarity), UEF(ImpClarity), UEF(WIG), UEF(NQC)
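Both score-based predictors can be computed directly from the retrieval scores of the top-ranked documents. The sketch below follows the slide's descriptions and adds two normalizations that are assumptions here: WIG is divided by the square root of the query length, and the score-dispersion predictor (NQC) is divided by the absolute corpus score. The function names and parameters are hypothetical.

```python
import math
import statistics

def wig(scores, corpus_score, query_len):
    """Mean gap between list scores and the corpus score.

    Dividing by sqrt(query_len) is an assumed normalization.
    """
    mean_diff = sum(s - corpus_score for s in scores) / len(scores)
    return mean_diff / math.sqrt(query_len)

def nqc(scores, corpus_score):
    """Standard deviation of list scores.

    Dividing by |corpus_score| is an assumed normalization.
    """
    return statistics.pstdev(scores) / abs(corpus_score)
```

In both cases `corpus_score` is the retrieval score assigned to the corpus when it is treated as one large document.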
Data Set
Experimental
X_QC, X_L, X_LC, X_QLC
Experimental
Conclusion
Explained why using previously proposed query-performance predictors was not shown to improve retrieval effectiveness.
Devised a learning-to-rank approach for predicting performance over queries.