CS276: Information Retrieval and Web Search
Christopher Manning and Prabhakar Raghavan
Lecture 15: Learning to Rank

Machine learning for IR ranking? (Sec. 15.4)
We've looked at methods for ranking documents in IR: cosine similarity, inverse document frequency, pivoted document length normalization, PageRank, …
We've also looked at methods for classifying documents using supervised machine learning classifiers: Naïve Bayes, Rocchio, kNN, SVMs.
Surely we can also use machine learning to rank the documents displayed in search results? It sounds like a good idea.
A.k.a. "machine-learned relevance" or "learning to rank".

Machine learning for IR ranking
This "good idea" has been actively researched – and actively deployed by the major web search engines – in the last 5 years.
Why didn't it happen earlier? Modern supervised ML has been around for about 15 years; Naïve Bayes has been around for about 45 years.

Machine learning for IR ranking
There's some truth to the claim that the IR community wasn't very connected to the ML community. But there were a whole bunch of precursors:
Wong, S.K. et al. 1988. Linear structure in information retrieval. SIGIR 1988.
Fuhr, N. 1992. Probabilistic methods in information retrieval. Computer Journal.
Gey, F.C. 1994. Inferring probability of relevance using the method of logistic regression. SIGIR 1994.
Herbrich, R. et al. 2000. Large margin rank boundaries for ordinal regression. Advances in Large Margin Classifiers.

Why weren't early attempts very successful/influential?
Sometimes an idea just takes time to be appreciated.
Limited training data: especially for real-world use (as opposed to writing academic papers), it was very hard to gather test collection queries and relevance judgments that are representative of real user needs and judgments on the documents returned. This has changed, both in academia and industry.
Poor machine learning techniques.
Insufficient customization to the IR problem.
Not enough features for ML to show value.

Why wasn't ML much needed?
Traditional ranking functions in IR used a very small number of features, e.g., term frequency, inverse document frequency, and document length.
It was easy to tune the weighting coefficients by hand, and people did.
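A minimal sketch of such a hand-tuned function (a BM25-style shape built from exactly those three signals; the coefficient values k1 and b are illustrative, not any particular engine's tuning):

    import math

    def hand_tuned_score(query_terms, doc_tf, doc_len, avg_doc_len, df, N,
                         k1=1.2, b=0.75):
        """Score a document using term frequency, idf, and document length."""
        s = 0.0
        for t in query_terms:
            tf = doc_tf.get(t, 0)
            if tf == 0 or df.get(t, 0) == 0:
                continue
            idf = math.log(N / df[t])
            # Length-normalized tf; k1 and b are the hand-tuned coefficients
            norm = tf * (k1 + 1) / (tf + k1 * (1 - b + b * doc_len / avg_doc_len))
            s += idf * norm
        return s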

Why is ML needed now?
Modern systems – especially on the Web – use a great number of features: arbitrary useful features, not a single unified model.
Log frequency of query word in anchor text? Query word in color on page? # of images on page? # of (out)links on page? PageRank of page? URL length? URL contains "~"? Page edit recency? Page length?
The New York Times (2008-06-03) quoted Amit Singhal as saying Google was using over 200 such features.
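Concretely, one (query, document) pair might yield a feature vector like the following sketch; the feature names and values are invented purely to illustrate how heterogeneous the signals are:

    features = {
        "cosine_tf_idf":  0.53,    # classic text similarity score
        "anchor_log_tf":  1.10,    # log frequency of query word in anchor text
        "query_in_color": 0,       # query word in color on page?
        "num_images":     12,      # # of images on page
        "num_out_links":  14,      # # of (out)links on page
        "pagerank":       0.0023,  # PageRank of page
        "url_length":     23,
        "url_has_tilde":  0,       # URL contains "~"?
        "page_length":    1874,
    }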

Simple example: Using classification for ad hoc IR (Sec. 15.4.1)
Collect a training corpus of (q, d, r) triples. Relevance r is here binary (but may be multiclass, with 3–7 values).
Each document is represented by a feature vector x = (α, ω), where α is cosine similarity and ω is minimum query window size. ω is the size of the shortest text span that includes all query words (the smaller the window, the stronger the query term proximity).
Train a machine learning model to predict the class r of a document-query pair.
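As a sketch of how ω could be computed (a brute-force scan for illustration, not necessarily what any production system does):

    def min_query_window(doc_tokens, query_terms):
        """Length of the shortest span of doc_tokens containing every query term."""
        query = set(query_terms)
        best = float("inf")  # stays inf if some query term never occurs
        for i, tok in enumerate(doc_tokens):
            if tok not in query:
                continue
            seen = set()
            for j in range(i, len(doc_tokens)):
                if doc_tokens[j] in query:
                    seen.add(doc_tokens[j])
                    if seen == query:          # all terms inside [i, j]
                        best = min(best, j - i + 1)
                        break
        return best

    # min_query_window("to be or not to be".split(), ["or", "not"]) -> 2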

Simple example: Using classification for ad hoc IR (Sec. 15.4.1)
Build a classifier trained on this training data. Manning et al.: SVM; Segaran: neural nets; you have experience with Naïve Bayes.
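A minimal sketch of this setup (assuming scikit-learn; the training triples below are invented toy data with x = (α, ω)):

    import numpy as np
    from sklearn.svm import LinearSVC

    # Each row is (cosine similarity alpha, min query window omega)
    X = np.array([
        [0.81, 3],    # similar text, tight window  -> relevant
        [0.72, 5],
        [0.15, 40],   # dissimilar, scattered terms -> nonrelevant
        [0.20, 28],
    ])
    r = np.array([1, 1, 0, 0])  # binary relevance judgments

    clf = LinearSVC(C=1.0).fit(X, r)
    # The signed decision value can then be used as a ranking score
    print(clf.decision_function([[0.60, 8]]))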

"Learning to rank" (Sec. 15.4.2)
Classification probably isn't the right way to think about approaching ad hoc IR:
Classification problems: map to an unordered set of classes.
Regression problems: map to a real value.
Ordinal regression problems: map to an ordered set of classes.
This formulation gives extra power: relations between relevance levels are modeled, and documents are judged good relative to the other documents for a query over a given collection, not on an absolute scale of goodness.

"Learning to rank"
Assume a number of categories C of relevance exist, totally ordered: c1 < c2 < … < cJ. This is the ordinal regression setup.
Assume training data is available consisting of document-query pairs represented as feature vectors ψi with relevance rankings ci.
We could do point-wise learning, where we try to map items of a certain relevance rank to a subinterval (e.g., Crammer et al. 2002, PRank).
But most work does pair-wise learning, where the input is a pair of results for a query and the class is the relevance ordering relationship between them, as in the sketch below.
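A sketch of that pair-wise construction for a single query (the feature vectors ψ and grades c are invented toy values; any binary classifier can then be trained on the output):

    import numpy as np

    def make_pairs(psi, c):
        """Emit (psi_i - psi_j, +1) and (psi_j - psi_i, -1) whenever c_i > c_j."""
        X, y = [], []
        for i in range(len(c)):
            for j in range(len(c)):
                if c[i] > c[j]:
                    X.append(psi[i] - psi[j]); y.append(+1)
                    X.append(psi[j] - psi[i]); y.append(-1)
        return np.array(X), np.array(y)

    psi = np.array([[0.9, 2.0], [0.4, 6.0], [0.1, 9.0]])  # one query's results
    c = [2, 1, 0]                                          # graded relevance
    X_pairs, y_pairs = make_pairs(psi, c)                  # 6 training pairs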

The Ranking SVM [Herbrich et al. 1999, 2000; Joachims et al. 2002] (Sec. 15.4.2)
The aim is to classify instance pairs as correctly ranked or incorrectly ranked. This turns an ordinal regression problem back into a binary classification problem.
We aren't covering SVMs in detail, so the derivation is excluded.
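For reference, the standard Ranking SVM training problem from the literature (a soft-margin SVM over the pair difference vectors ψi − ψj above) can be written as:

    \begin{aligned}
    \min_{w,\,\xi}\quad & \tfrac{1}{2}\|w\|^{2} + C \sum_{(i,j)} \xi_{ij} \\
    \text{s.t.}\quad & w^{\top}(\psi_i - \psi_j) \ge 1 - \xi_{ij}
        \quad \text{for all pairs with } c_i > c_j, \\
    & \xi_{ij} \ge 0.
    \end{aligned}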

Adapting the Ranking SVM for (successful) information retrieval [Yunbo Cao, Jun Xu, Tie-Yan Liu, Hang Li, Yalou Huang, Hsiao-Wuen Hon. SIGIR 2006]
A Ranking SVM model already works well, using things like vector space model scores as features; as we shall see, it outperforms them in evaluations. But it does not model important aspects of practical IR well.
This paper addresses two customizations of the Ranking SVM to fit an IR utility model.

The Ranking SVM fails to model the IR problem well…
Correctly ordering the most relevant documents is crucial to the success of an IR system, while misordering less relevant results matters little. The Ranking SVM considers all ordering violations as the same.
Some queries have many (somewhat) relevant documents, and other queries few. If we treat all pairs of results for a query equally, queries with many results will dominate the learning. But actually queries with few relevant results are at least as important to do well on.
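One way to picture the two customizations (a hedged sketch in the spirit of, but not identical to, the scheme of Cao et al. 2006): weight each training pair by the relevance grades involved, so misordering highly relevant documents costs more, and normalize by the query's pair count so no single query dominates.

    def pair_weights(grade_pairs, n_pairs_for_query):
        """grade_pairs: (c_i, c_j) grade tuples with c_i > c_j for one query."""
        # Higher grades -> larger loss for misordering; dividing by the
        # query's pair count makes every query contribute equally overall.
        return [max(ci, cj) / n_pairs_for_query for ci, cj in grade_pairs]

    print(pair_weights([(2, 1), (2, 0), (1, 0)], 3))  # [0.67, 0.67, 0.33]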

LETOR test collection
From Microsoft Research Asia: an openly available standard test collection with pregenerated features, baselines, and research results for learning to rank. Its availability has really driven research in this area.
Includes OHSUMED (MEDLINE abstracts): 350,000 articles, 106 queries, 16,140 query-document pairs, with 3-class judgments: definitely relevant (DR), partially relevant (PR), non-relevant (NR).
Includes the TREC GOV collection: 1 million web pages, 125 queries.
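LETOR distributes its features in an SVMlight-style line format (relevance label, query id, numbered feature:value pairs, trailing comment). A minimal parsing sketch; the sample line below is invented:

    def parse_letor_line(line):
        """Parse 'label qid:Q fid:val ... # comment' into its parts."""
        body = line.split("#")[0].split()
        label = int(body[0])
        qid = body[1].split(":")[1]
        feats = {int(f): float(v) for f, v in (p.split(":") for p in body[2:])}
        return label, qid, feats

    parse_letor_line("2 qid:10 1:0.71 2:0.36 # docid=GX001")
    # -> (2, '10', {1: 0.71, 2: 0.36})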

Other machine learning methods for learning to rank
Other machine learning methods have also been used successfully:
Boosting: RankBoost.
Ordinal regression loglinear models.
Neural nets: RankNet.
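As one concrete instance, a minimal numpy sketch of the RankNet-style pairwise loss (following Burges et al. 2005 in spirit; the neural network that produces the scores is omitted): the probability that document i should rank above document j is modeled as a logistic function of the score difference, trained with cross-entropy against the target ordering.

    import numpy as np

    def ranknet_loss(s_i, s_j, p_target):
        """s_i, s_j: model scores; p_target: 1.0 if i should rank above j."""
        p_ij = 1.0 / (1.0 + np.exp(-(s_i - s_j)))   # P(i ranked above j)
        return -(p_target * np.log(p_ij) + (1.0 - p_target) * np.log(1.0 - p_ij))

    print(ranknet_loss(2.0, 0.5, 1.0))  # small loss: pair already ordered correctly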

Summary
The idea of learning ranking functions has been around for about 20 years, but only recently have ML knowledge, the availability of training datasets, a rich space of features, and massive computation come together to make this a hot research area.
It's too early to give a definitive statement on which methods are best in this area; it's still advancing rapidly.
But machine-learned ranking over many features now easily beats traditional hand-designed ranking functions in comparative evaluations [in part by using the hand-designed functions as features!].
And there is every reason to think that the importance of machine learning in IR will only increase in the future.

Resources
IIR secs. 6.1.2–3 and 15.4.
LETOR benchmark datasets: website with data, links to papers, benchmarks, etc. http://research.microsoft.com/users/LETOR/ Everything you need to start research in this area!
Nallapati, R. 2004. Discriminative models for information retrieval. SIGIR 2004.
Cao, Y., Xu, J., Liu, T.-Y., Li, H., Huang, Y. and Hon, H.-W. 2006. Adapting Ranking SVM to document retrieval. SIGIR 2006.
Yue, Y., Finley, T., Radlinski, F. and Joachims, T. 2007. A support vector method for optimizing average precision. SIGIR 2007.