A Machine Learning Approach for Improved BM25 Retrieval


A Machine Learning Approach for Improved BM25 Retrieval Krysta M. Svore Microsoft Research One Microsoft Way Redmond, WA 98052 ksvore@microsoft.com Christopher J. C. Burges Microsoft Research One Microsoft Way Redmond, WA 98052 cburges@microsoft.com CIKM’09, November 2–6, 2009, Hong Kong, China. Presenter: SHIH KAI-WUN

Outline 1. Introduction 2. Related Work 3. Document Fields 4. BM25 5. Learning A BM25-Style Function 6. Data And Evaluation 7. Experiments And Results 8. Conclusions And Future Work

Introduction (1/2) BM25 is arguably one of the most important and widely used information retrieval functions. BM25F is an extension of BM25 that prescribes how to compute BM25 across a document description over several fields. Recently, it has been shown that LambdaRank is empirically optimal for NDCG and other IR measures.

Introduction (2/2) Our primary contributions are threefold: We empirically determine the effectiveness of BM25 for different fields. We develop a data-driven machine learning model called LambdaBM25 that is based on the attributes of BM25 and the training method of LambdaRank. We extend our empirical analysis to a document description over various field combinations.

Related Work A drawback of BM25 and BM25F is the difficulty in optimizing the function parameters for a given information retrieval measure. There have been extensive studies on how to set term frequency saturation and length normalization parameters. Our work provides both an extensive study of the contributions of different document fields to information retrieval and a framework for improving BM25-style retrieval.

Document Fields (1/2) A Web document description is composed of several fields of information. The document frequency for term t is the number of documents in the collection that contain term t in their document descriptions. Term frequency is calculated per term and per field by counting the number of occurrences of term t in field F of the document. Field length is the number of terms in the field.
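
These per-field statistics can be sketched in a few lines of Python; the toy corpus and field names below are illustrative, not the paper's data:

```python
from collections import Counter

# Toy corpus: each document description is a dict of field -> text.
# Field names are illustrative; the paper uses title, URL, body, anchor, and query click.
docs = [
    {"title": "bm25 retrieval", "body": "bm25 is a ranking function for retrieval"},
    {"title": "neural ranking", "body": "learning to rank with neural nets"},
]

def document_frequency(term, docs):
    """Number of documents whose description (any field) contains the term."""
    return sum(
        any(term in field_text.split() for field_text in d.values())
        for d in docs
    )

def term_frequency(term, doc, field):
    """Occurrences of the term in one field of one document."""
    return Counter(doc[field].split())[term]

def field_length(doc, field):
    """Number of terms in the field."""
    return len(doc[field].split())
```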

Document Fields (2/2) The query click field is built from query session data extracted from one year of a commercial search engine’s query log files and is represented by a set of query-score pairs (q, Score(d, q)), where q is a unique query string and Score(d, q) is derived from raw session click data as The term frequency of term t for the query click field is calculated as where p is the set of query-score pairs.

BM25 (1/2) BM25 is a function of term frequencies, document frequencies, and the field length for a single field. BM25F is an extension of BM25 to a document description over multiple fields; we refer to both functions as BM25F. BM25F is computed for document d with description over fields F and query q as BM25F(d, q) = Σ_{t∈q} TF_t · IDF_t, where the sum is over all terms t in query q. IDF_t is the Robertson-Sparck-Jones inverse document frequency of term t: IDF_t = log((N − n_t + 0.5) / (n_t + 0.5)), with N the number of documents in the collection and n_t the document frequency of t.

BM25 (2/2) We calculate document frequency over the body field for all document frequency attributes. TF_t is a term frequency saturation formula: TF_t = f / (k + f), where f is calculated as f = Σ_F w_F · tf_{t,F} / β_F, and β_F accounts for varying field lengths: β_F = (1 − b_F) + b_F · (l_F / avg_l_F), with l_F the length of field F and avg_l_F its average length over the collection. BM25F requires the tuning of 2K + 1 parameters when calculated across K fields, namely k, b_F, and w_F.
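
Putting the pieces together, a minimal Python sketch of BM25F scoring under these formulas (the function and parameter names are illustrative, and defaults such as k = 1.8 and b_F = 0.75 are assumptions, not the paper's tuned settings):

```python
import math

def bm25f_score(query_terms, doc_fields, doc_freq, n_docs, avg_len, k=1.8, b=None, w=None):
    """Minimal BM25F sketch following the formulas above.
    doc_fields: field -> list of terms for this document
    doc_freq:   term -> document frequency n_t (body-based, as in the paper)
    avg_len:    field -> average field length over the collection
    b, w:       per-field length-normalization and weight parameters
    """
    b = b or {f: 0.75 for f in doc_fields}
    w = w or {f: 1.0 for f in doc_fields}
    score = 0.0
    for t in query_terms:
        f_t = 0.0
        for F, terms in doc_fields.items():
            beta = (1 - b[F]) + b[F] * len(terms) / avg_len[F]  # field length normalization
            f_t += w[F] * terms.count(t) / beta                 # weighted, normalized TF
        tf = f_t / (k + f_t)                                    # term frequency saturation
        n_t = doc_freq.get(t, 0)
        idf = math.log((n_docs - n_t + 0.5) / (n_t + 0.5))      # Robertson-Sparck-Jones IDF
        score += tf * idf
    return score
```

Note that the 2K + 1 tunable parameters appear directly as k plus the per-field b and w dictionaries.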

Learning A BM25-Style Function (1/2) We now describe our simple machine learning ranking model that uses the input attributes of BM25F and the training method of LambdaRank. LambdaRank is trained on pairs of documents per query, where documents in a pair have different relevance labels. LambdaRank leverages the fact that neural net training only needs the gradients of the measure with respect to the model scores, and not the function itself, thus avoiding the problem of direct optimization.

Learning A BM25-Style Function (2/2) We directly address these challenges by introducing a new machine learning approach to BM25-like retrieval called LambdaBM25. LambdaBM25 has the flexibility to learn complex relationships between attributes directly from the data. We develop our model as follows. We train our model using LambdaRank and the same input attributes as BM25. We train single- and two-layer LambdaRank neural nets to optimize for NDCG with varying numbers of hidden nodes chosen using the validation set.
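
As an illustration of the training signal, the LambdaRank gradient for a document pair (i more relevant than j) can be sketched as the RankNet pairwise gradient scaled by the NDCG change from swapping the pair; this is a sketch of the general idea, with sigma an assumed shape parameter rather than a value from the paper:

```python
import math

def lambda_gradient(s_i, s_j, delta_ndcg, sigma=1.0):
    """LambdaRank-style gradient for a pair where document i is more
    relevant than document j: the RankNet pairwise gradient scaled by
    the magnitude of the NDCG change from swapping the two documents.
    s_i, s_j are the model scores of the two documents."""
    rank_net = -sigma / (1.0 + math.exp(sigma * (s_i - s_j)))
    return rank_net * abs(delta_ndcg)
```

The gradient is largest when the model scores the pair in the wrong order, which is why training only needs these per-pair gradients and never the (non-differentiable) NDCG function itself.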

Data And Evaluation Our train/validation/test data contains 67683/11911/12185 queries, respectively. Each query-URL pair has a human-generated label between 0, meaning document d is not relevant to query q, and 4, meaning document d is the most relevant to query q. We evaluate using Normalized Discounted Cumulative Gain (NDCG) at truncation levels 1, 3, and 10. NDCG at truncation level L is defined for query q as NDCG@L(q) = (1/Z) Σ_{r=1..L} (2^{l(r)} − 1) / log(1 + r), where l(r) is the relevance label of the document at rank r and Z normalizes so that a perfect ranking scores 1; mean NDCG averages this quantity over all queries.
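
A minimal Python sketch of NDCG at a truncation level, consistent with the definition above (using the standard 2^l − 1 gain and logarithmic discount):

```python
import math

def ndcg_at(labels_in_ranked_order, ideal_labels, L):
    """NDCG at truncation level L for one query.
    labels_in_ranked_order: relevance labels (0-4) in the system's ranking.
    ideal_labels: the same labels sorted descending (the ideal ranking)."""
    def dcg(labels):
        return sum((2 ** l - 1) / math.log2(1 + r)
                   for r, l in enumerate(labels[:L], start=1))
    z = dcg(ideal_labels)  # normalizer Z: DCG of the perfect ranking
    return dcg(labels_in_ranked_order) / z if z > 0 else 0.0
```

Mean NDCG over a query set is then just the average of this value across queries.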

Experiments And Results (1/3) In the upper first three columns of Table 1, we find that Title (T), URL (U), and Body (B) are equally effective and that the popularity fields achieve higher NDCG. In particular, the query click field achieves the highest NDCG accuracy. Table 1 contains results for single-layer LambdaBM25F. Table 1: Mean NDCG accuracy results on the test set for BM25F, 1-layer LambdaBM25F, and 2-layer LambdaBM25F for single fields and multiple field combinations. Statistical significance is determined at the 95% confidence level using a t-test. Bold indicates statistical significance over the corresponding BM25F model. Italic indicates statistical significance of the corresponding BM25F model over the LambdaBM25F model. Parentheses indicate no statistically significant difference.

Experiments And Results (2/3) We train our two-layer LambdaBM25F model and determine if it can outperform BM25F. We find the following numbers of hidden nodes to be best: Title (10), URL (15), Body (15), Anchor (15), Click (5). The first three lower columns of Table 1 list the results of BM25F on various field combinations.

Experiments And Results (3/3) As shown in the lower middle columns of Table 1, we find that BM25F performs as well as or better than single-layer LambdaBM25F for all field combinations. Finally, we train our two-layer LambdaBM25F models using 15 hidden nodes. For every field combination, as shown in Table 1, LambdaBM25F achieves statistically significant gains over the corresponding BM25F model.

Conclusions And Future Work Our main contribution is LambdaBM25F, a new information retrieval model trained using LambdaRank and the input attributes of BM25. LambdaBM25F optimizes directly for the chosen target IR evaluation measure and avoids the need for parameter tuning. Our model is general and can potentially act as a framework for modelling other retrieval functions. In the future we would like to perform more extensive studies to determine the relative importance of attributes in our model.