
Learning to Estimate Query Difficulty: Including Applications to Missing Content Detection and Distributed Information Retrieval
Elad Yom-Tov, Shai Fine, David Carmel, Adam Darlow
IBM Haifa Research Labs, SIGIR 2005

2 Abstract
- Novel learning methods are used for estimating the quality of results returned by a search engine in response to a query.
- Estimation is based on the agreement between the top results of the full query and the top results of its sub-queries.
- Quality estimates are useful for several applications, including improving retrieval, detecting queries for which no relevant content exists in the document collection, and distributed information retrieval.

3 Introduction (1/2)
- Many IR systems suffer from a radical variance in performance across queries.
- Estimating query difficulty is an attempt to quantify the quality of the results returned by a given system for the query.
- Reasons for estimating query difficulty:
  - Feedback to the user: the user can rephrase "difficult" queries.
  - Feedback to the search engine: invoke alternative retrieval strategies for different queries.
  - Feedback to the system administrator: identify queries related to a specific subject and expand the document collection accordingly.
  - Distributed information retrieval.

4 Introduction (2/2)
- Observation and motivation: queries answered well are those whose query terms agree on most of the returned documents.
- Agreement is measured by the overlap between the top results.
- Difficult queries are those where:
  a. the query terms cannot agree on the top results, or
  b. most of the terms agree except a few outliers.
- An example TREC query: "What impact has the chunnel (underwater tunnel) had on the British economy and/or the life style of the British?"

5 Related Work (1/2)
- In the Robust track of TREC 2004, systems were asked to rank the topics by predicted difficulty.
  - The goal is eventually to use such predictions for topic-specific processing.
- Prediction methods suggested by the participants:
  - Measuring clarity based on the system's score of the top results
  - Analyzing the ambiguity of the query terms
  - Learning a predictor using old TREC topics as training data
- (Ounis, 2004) showed that an IDF-based predictor is positively correlated with query precision.
- (Diaz, 2004) used the temporal distribution together with the content of the documents to improve the prediction of average precision for a query.

6 Related Work (2/2)
- The Reliable Information Access (RIA) workshop investigated the reasons for system performance variance across queries.
  - 10 failure categories were identified, 4 of which are due to emphasizing only partial aspects of the query.
- One of the conclusions of this workshop: "...comparing a full topic ranking against ranking based on only one aspect of the topic will give a measure of the importance of that aspect to the retrieved set."

7 Estimating Query Difficulty
- Query terms are defined as the keywords and the lexical affinities.
- Features used for learning:
  - The overlap between the top results of each sub-query and of the full query, measured by the kappa statistic
  - The rounded logarithm of the document frequency, log(DF), of each of the sub-queries
- Two challenges for learning:
  - The number of sub-queries is not constant, so a canonical representation is needed.
  - The sub-queries are not ordered.
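
As a hedged illustration of the first feature, the sketch below computes the agreement between the top-N results of the full query and of one sub-query as a Cohen's-kappa-style statistic over the whole collection. Function and variable names are illustrative, and the exact kappa formulation used in the paper may differ.

```python
# Hypothetical sketch: kappa-style agreement between two top-N result sets,
# treating "retrieved in the top N" as a binary judgement over the collection.
def kappa_overlap(full_top, sub_top, collection_size):
    a, b = set(full_top), set(sub_top)
    both = len(a & b)                      # documents retrieved by both
    only_a = len(a - b)
    only_b = len(b - a)
    neither = collection_size - both - only_a - only_b

    p_observed = (both + neither) / collection_size
    # chance agreement from the marginal retrieval rates
    p_a, p_b = len(a) / collection_size, len(b) / collection_size
    p_expected = p_a * p_b + (1 - p_a) * (1 - p_b)
    if p_expected == 1.0:
        return 1.0
    return (p_observed - p_expected) / (1 - p_expected)
```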

8 Query Estimator Using a Histogram (1/2)
- The basic procedure:
  1. Find the top N results for the full query and for each sub-query.
  2. Build a histogram of the overlaps, h(i, j), to form a feature vector.
     - Values of log(DF) are split into 3 discrete bins: {0-1, 2-3, 4+}.
     - h(i, j) counts the sub-queries with binned log(DF) = i and overlap = j.
     - The rows of h(i, j) are concatenated into a feature vector.
  3. Compute the linear weight vector c for prediction.
- Example: suppose a query has 4 sub-queries with log(DF(n)) = [ ], overlap = [ ] → h(i) = [ ]
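
A minimal sketch of the histogram construction described above, assuming the overlap of each sub-query is an integer between 0 and N (here N = 10) and using the three log(DF) bins from the slide; all names are illustrative.

```python
import numpy as np

def histogram_features(log_dfs, overlaps, top_n=10):
    """Build the flattened h(i, j) feature vector from per-sub-query
    rounded log(DF) values and overlap counts."""
    def df_bin(value):
        if value <= 1:
            return 0          # bin {0-1}
        if value <= 3:
            return 1          # bin {2-3}
        return 2              # bin {4+}

    h = np.zeros((3, top_n + 1))
    for df, overlap in zip(log_dfs, overlaps):
        h[df_bin(df), overlap] += 1
    return h.flatten()        # rows of h(i, j) concatenated
```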

9 Query Estimator Using a Histogram (2/2)
- Two additional features:
  1. The score of the top-ranked document
  2. The number of words in the query
- Estimate the linear weight vector c with the Moore-Penrose pseudo-inverse:
  c = (H · H^T)^(-1) · H · t^T
  - H = the matrix whose columns are the feature vectors of the training queries
  - t = a vector of the target measure (P@10 or MAP) of the training queries
  - (H and t can be modified according to the objective)
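
A small numeric sketch of the weight estimation, assuming H is the d×m matrix whose columns are the feature vectors of m training queries and t is the length-m vector of target measures; numpy's pseudo-inverse is used in place of the explicit (H·H^T)^(-1)·H, which it equals when H·H^T is invertible.

```python
import numpy as np

def fit_linear_estimator(H, t):
    # least-squares solution of H^T c ≈ t, i.e. c = (H H^T)^-1 H t^T
    return np.linalg.pinv(H.T) @ t       # weight vector c of length d

def predict_difficulty(c, feature_vector):
    return float(c @ feature_vector)
```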

10 Query Estimator Using a Modified Decision Tree (1/2)
- Useful when the data are sparse, i.e. when queries are too short.
- A binary decision tree:
  - Pairs of overlap and log(DF) of the sub-queries form the features.
  - Each node consists of a weight vector, a threshold, and a score.
- An example:
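
The slide's example tree is not reproduced here; instead, below is a hedged sketch of one possible reading of the node description above: each (overlap, log(DF)) pair is routed through the tree by comparing its weighted sum to the node's threshold, and the query's prediction averages the leaf scores over its sub-queries. The routing and averaging details are assumptions, not the paper's exact algorithm.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Node:
    weights: tuple                      # applied to the (overlap, log_df) pair
    threshold: float
    score: float                        # returned when the node is a leaf
    left: Optional["Node"] = None
    right: Optional["Node"] = None

def route(node, pair):
    while node.left is not None and node.right is not None:
        projection = node.weights[0] * pair[0] + node.weights[1] * pair[1]
        node = node.left if projection <= node.threshold else node.right
    return node.score

def predict(root, pairs):
    # average the leaf scores obtained for every (overlap, log_df) pair
    return sum(route(root, p) for p in pairs) / len(pairs)
```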

11 Query Estimator Using a Modified Decision Tree (2/2)
- The concept of a Random Forest: better predictions can be obtained by training a multitude of trees, each in a slightly different manner or on different data.
- The AdaBoost algorithm is applied to resample the training data.
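
As an illustration of the boosted-ensemble idea only (not of the paper's modified trees), the sketch below trains scikit-learn's AdaBoostRegressor over shallow regression trees on fixed-length histogram features and toy random data; it is a stand-in, since the paper's trees consume the variable-length sub-query pairs directly.

```python
import numpy as np
from sklearn.ensemble import AdaBoostRegressor
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X = rng.random((40, 33))    # toy features: 40 queries, 3x11 flattened histogram
y = rng.random(40)          # toy target precision values

model = AdaBoostRegressor(
    DecisionTreeRegressor(max_depth=3),   # shallow base tree
    n_estimators=25,
    random_state=0,
).fit(X, y)

predicted_difficulty = model.predict(X[:5])
```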

12 Experiment and Evaluation (1/2)
- The IR system is Juru.
- Two document collections:
  - TREC-8: 528,155 documents, 200 topics
  - WT10G: 1,692,096 documents, 100 topics
- Four-fold cross-validation
- Prediction quality is measured by Kendall's τ coefficient.
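
A tiny sketch of the evaluation step with toy numbers: the estimator's predicted difficulty values and the measured precision values of the test queries are compared as rankings with Kendall's τ via scipy.

```python
from scipy.stats import kendalltau

predicted = [0.42, 0.10, 0.77, 0.31]   # toy estimator output per test query
actual    = [0.50, 0.05, 0.60, 0.40]   # toy measured precision per test query

tau, p_value = kendalltau(predicted, actual)
print(f"Kendall's tau = {tau:.3f} (p = {p_value:.3f})")
```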

13 Experiment and Evaluation (2/2)
- Compared with several other estimation algorithms:
  - Estimation based on the score of the top result
  - Estimation based on the average score of the top ten results
  - Estimation based on the standard deviation of the IDF values of the query terms
  - Estimation based on learning an SVM for regression

14 Application 1: Improving IR Using Query Estimation (1/2)
- Selective automatic query expansion:
  1. Query expansion adds terms to the query based on frequently appearing terms in the top retrieved documents.
  2. It only helps easy queries.
  3. The same features are used to train an SVM classifier that decides when to expand.
- Deciding which part of the topic should be used:
  1. TREC topics contain two parts: a short title and a longer description.
  2. Some topics that are not answered well by the description part are better answered by the title part.
  3. Difficult topics use the title part; easy topics use the description.

15 Application 1: Improving IR Using Query Estimation (2/2)

16 Application 2: Detecting Missing Content (1/2)
- Missing content queries (MCQs) are queries that have no relevant document in the collection.
- Experimental setup:
  - 166 MCQs were created artificially from 400 TREC queries.
  - The 200 TREC topics consist of a title and a description.
  - Ten-fold cross-validation
  - A tree-based classifier is trained to separate MCQs from non-MCQs.
  - A query difficulty estimator may or may not be used as a pre-filter that removes easy queries before the MCQ classifier.

17 Application 2: Detecting Missing Content (2/2)

18 Application 3: Merging the Results of Distributed Retrieval (1/2)
- It is difficult to rerank documents retrieved from different datasets because the scores are local to each specific dataset.
- CORI (Callan et al., 1995) is one of the state-of-the-art algorithms for distributed retrieval; it uses an inference network to rank the collections.
- Applying the difficulty estimator to this problem:
  - A query estimator is trained for each dataset.
  - The estimated difficulty is used for weighting the scores.
  - The weighted scores are merged to build the final ranking.
- Ten-fold cross-validation
- Only minimal information is supplied by the search engine.
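
A hedged sketch of the merging step under stated assumptions: each collection returns (doc_id, local_score) pairs, the per-collection difficulty estimate is used directly as a weight, and the weighted scores are merged into a single ranking. The paper's exact weighting and merge functions may differ.

```python
from collections import defaultdict

def merge_results(per_collection_results, difficulty_estimates):
    """per_collection_results: {collection: [(doc_id, local_score), ...]}
    difficulty_estimates: {collection: estimated quality for this query}"""
    merged = defaultdict(float)
    for collection, results in per_collection_results.items():
        weight = difficulty_estimates[collection]
        for doc_id, local_score in results:
            merged[doc_id] = max(merged[doc_id], weight * local_score)
    return sorted(merged.items(), key=lambda kv: kv[1], reverse=True)
```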

19 Application 3: Merging the Results of Distributed Retrieval (2/2)
- Selective weighting:
  - All queries are clustered (2-means) based on their difficulty estimates for each of the datasets.
  - In one cluster the variance of the estimates is small → unweighted scores are better for the queries in this cluster.
  - The difficulty estimates become noise when there is little variance.
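
An illustrative sketch of the selective-weighting decision, assuming a (queries × collections) matrix of difficulty estimates: queries are clustered with 2-means, and weighted merging is applied only to the cluster whose estimates vary more across collections. The variance comparison is an assumption made for illustration.

```python
import numpy as np
from sklearn.cluster import KMeans

def should_weight(estimates_per_query):
    """estimates_per_query: array of shape (num_queries, num_collections)."""
    X = np.asarray(estimates_per_query, dtype=float)
    labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
    spread = X.var(axis=1)              # per-query variance across collections
    # in the low-variance cluster the estimates carry little signal, so keep raw scores
    low_var = 0 if spread[labels == 0].mean() <= spread[labels == 1].mean() else 1
    return labels != low_var            # True where weighted merging should be used
```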

20 Conclusions and Future Work
- Two methods for learning an estimator of query difficulty are described.
- The learned estimator predicts the expected precision of the query by analyzing the overlap between the results of the full query and the results of its sub-queries.
- We show that such an estimator can be used for several applications.
- Our results show that the quality of query prediction strongly depends on the query length.
- Future work includes looking for additional features that do not depend on the query length.
- Whether more training data can be accumulated in an automatic or semi-automatic manner is left for future research.