Learning to Rank for Information Retrieval

Presentation on theme: "Learning to Rank for Information Retrieval"— Presentation transcript:

1 Learning to Rank for Information Retrieval

2 This Talk
Learning to rank for information retrieval; learning in vector space.
Mainly based on papers at SIGIR, WWW, ICML, NIPS, and KDD. Papers at other conferences and journals might not be covered comprehensively.

3 Background Knowledge Required
Information Retrieval, Machine Learning, Probability Theory, Linear Algebra, Optimization

4 Outline
Ranking in Information Retrieval: IR Evaluation, Conventional Ranking Models
Learning to Rank: Pointwise Approach, Pairwise Approach, Listwise Approach
Summary

5 Ranking in Information Retrieval

6 We are Overwhelmed by Flood of Information

7 Information Retrieval
Predates the internet: "As We May Think" (1945).
An active research topic: Vector Space Model (1970s), Probabilistic Models (1980s), Learning to Rank (recent years).

8 Search Engine as A Tool

9 Inside Search Engine
Diagram: a query is submitted to the ranking model, which scores documents from the indexed document repository and returns a ranked list of documents.

10 IR Evaluation
Objective: evaluate the effectiveness of a ranking model.
A standard test set: contains a large number of (randomly sampled) queries, their associated documents, and the labels (relevance judgments) of these documents.
A measure: evaluates the effectiveness of a ranking model for a particular query; it is averaged over the entire test set to represent the expected effectiveness of the model.

11 Widely-used Judgments
Binary judgment: relevant vs. irrelevant
Multi-level ratings: Perfect > Excellent > Good > Fair > Bad
Pairwise preferences: document A is more relevant than document B w.r.t. query q

12 Evaluation Measures
MAP (Mean Average Precision)
NDCG (Normalized Discounted Cumulative Gain)
MRR (Mean Reciprocal Rank)
WTA (Winners Take All)
...

13 MAP
Precision at position n: P@n = (number of relevant documents in the top n results) / n
Average precision: AP = the sum over positions n of P@n x rel(n), divided by the total number of relevant documents for the query, where rel(n) = 1 if the document at position n is relevant and 0 otherwise
MAP: AP averaged over all queries in the test set
Example query and ranked list: d1: Hsing Kuo University, d2: Hong Kong University, d3: 香港大學, d4: HKUST, d5: HKU CS Department
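Not part of the original slide: a minimal Python sketch of these measures, assuming binary relevance labels given in ranked order (1 = relevant, 0 = irrelevant); the judgments in the example call are made up for illustration.

```python
def precision_at_n(rels, n):
    """Fraction of relevant documents among the top n of a ranked list."""
    return sum(rels[:n]) / n

def average_precision(rels):
    """Mean of P@n over the positions n where a relevant document appears."""
    hits = [precision_at_n(rels, i + 1) for i, r in enumerate(rels) if r]
    return sum(hits) / max(1, sum(rels))

def mean_average_precision(rels_per_query):
    """MAP: average precision averaged over all queries in the test set."""
    return sum(average_precision(r) for r in rels_per_query) / len(rels_per_query)

# Hypothetical judgments for two queries, e.g. a 5-document list like the one above.
print(mean_average_precision([[1, 1, 1, 0, 1], [0, 1, 0, 0, 0]]))
```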

14 Conventional Ranking Models
Similarity-based models: Boolean Model, Vector Space Model, Latent Semantic Indexing, ...
Probabilistic models: BM25 Model, Language Model for IR
Hyperlink-based models: HITS, PageRank

15 Language Model
Answers the question: how likely is the ith word in a sequence to occur, given the identities of the preceding i-1 words? In speech recognition, such a model tries to predict the next word in a speech sequence. It can be regarded as a probability distribution over sentences of m words, P(w_1, w_2, ..., w_m).

16 Language Model: N-gram Model
Example: the probability of the sentence "I like HKU very much" is approximated as
unigram (n=1): P(I) P(like) P(HKU) P(very) P(much)
bigram (n=2): P(I) P(like | I) P(HKU | like) P(very | HKU) P(much | very)
trigram (n=3): P(I) P(like | I) P(HKU | I like) P(very | like HKU) P(much | HKU very)
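A minimal sketch of this factorization, assuming a hypothetical estimator cond_prob(word, context) that returns P(word | context); the estimator itself is not part of the slide.

```python
def sentence_prob(words, cond_prob, n):
    """Approximate P(w_1 ... w_m) as the product of P(w_i | previous n-1 words)."""
    p = 1.0
    for i, w in enumerate(words):
        context = tuple(words[max(0, i - n + 1):i])  # last n-1 words (fewer at sentence start)
        p *= cond_prob(w, context)
    return p

# Unigram, bigram and trigram approximations of the example sentence:
sentence = ["I", "like", "HKU", "very", "much"]
# for n in (1, 2, 3): sentence_prob(sentence, my_estimator, n)   # my_estimator is hypothetical
```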

17 Language Model for IR
View each document as a language sample and estimate the probabilities of producing terms (words). A query is treated as a generation process from the document's language model. Retrieved documents are ranked based on P(q | d), the probability that the document's model generates the query; with a unigram model, P(q | d) is the product over query terms t of P(t | d).

18 Language Model for IR
How to estimate P(t | d)? Maximum Likelihood Estimation!
Suppose we use a unigram model: P_ML(t | d) = tf(t, d) / |d|, where tf(t, d) is the number of occurrences of term t in document d, and |d| is the total number of terms in document d.
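A minimal sketch of query-likelihood ranking with the unigram MLE above. Real systems smooth the estimate (e.g., Dirichlet or Jelinek-Mercer) so that unseen query terms do not zero out the score; the function names here are illustrative, not from the slides.

```python
from collections import Counter

def unigram_mle(doc_tokens):
    """P_ML(t | d) = tf(t, d) / |d|."""
    counts = Counter(doc_tokens)
    total = len(doc_tokens)
    return {t: c / total for t, c in counts.items()}

def query_likelihood(query_tokens, doc_tokens):
    """Score a document by P(q | d) = product of P_ML(t | d) over query terms."""
    p = unigram_mle(doc_tokens)
    score = 1.0
    for t in query_tokens:
        score *= p.get(t, 0.0)  # unsmoothed: any unseen query term makes the score 0
    return score

# Rank documents for a query by sorting on query_likelihood in descending order.
```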

19 Discussion on Conventional Ranking Models
For a particular model: manual parameter tuning is usually difficult, especially when there are many parameters.
For comparison between two models: given a test set, it is difficult or unfair to compare two models if one is over-tuned while the other is not.
For a collection of models: there are hundreds of models proposed in the literature, and it is non-trivial to combine them to produce an even more effective model.

20 Machine Learning Can Help
Automatically tune parameters. Combine multiple sources of evidence. Avoid over-fitting. Customize retrieval.
"Learning to Rank": use machine learning techniques to train the ranking model. A hot research topic in recent years.

21

22 Learning to Rank Why didn’t it happen earilier?
Modern supervised ML has been around for about 15 years… Naïve Bayes has been around for about 45 years… There is some truth to the fact that the IR community wasn’t very connected to the ML community

23 Why wasn’t ML much needed?
Traditional ranking functions in IR used a very small number of features, e.g., Term frequency Inverse document frequency Document length It was easy to tune weighting coefficients by hand And people did

24 Why is ML needed now?
Modern systems, especially on the Web, use a great number of features:
Log frequency of query word in anchor text?
Query word in color on page?
# of images on page?
# of (out) links on page?
PageRank of page?
URL length?
URL contains "~"?
Page length?
The New York Times ( ) quoted Amit Singhal as saying Google was using over 200 such features.

25 Framework of Learning to Rank

26 Categorization of Learning to Rank Algorithms
Relation to conventional machine learning:
Regression: assume labels to be scores
Classification: assume labels to have no order

27 Categorization of Learning to Rank Algorithms
Ranking in IR has unique properties:
Relative order is important: no need to predict a category or the value of f(x).
Position sensitive: top-ranked objects are more important.

28 Categorization: Basic Unit of Learning
Pointwise: input is a single document; output is a score or class label.
Pairwise: input is a document pair; output is a partial-order preference.
Listwise: input is a document collection; output is a ranked document list.

29 Categorization of Learning to Rank Algorithms
Reduced to classification or regression:
Pointwise Approach: Discriminative model for IR (SIGIR 2004), McRank (NIPS 2007)
Pairwise Approach: Ranking SVM (ICANN 1999), RankBoost (JMLR 2003), LDM (SIGIR 2005), RankNet (ICML 2005), FRank (SIGIR 2007), GBRank (SIGIR 2007), QBRank (NIPS 2007), MPRank (ICML 2007), ...
Make use of the unique properties of ranking for IR:
Listwise Approach and IR-adapted pairwise methods: IRSVM (SIGIR 2006), LambdaRank (NIPS 2006), AdaRank (SIGIR 2007), SVM-MAP (SIGIR 2007), SoftRank (LR4IR 2007), GPRank (LR4IR 2007), CCA (SIGIR 2007), RankCosine (IP&M 2007), ListNet (ICML 2007), ListMLE (ICML 2008), ...

30 Learning to Rank Algorithms
Pointwise Approach: Ordinal Regression Pairwise Approach: Preference Learning Listwise Approach: Listwise Ranking

31 Ordinal Regression: A Pointwise Approach
Input space: features of a single document (w.r.t. a query), x in R^d
Output space: ordered categories, y in {1, 2, ..., K} with 1 < 2 < ... < K

32 Ordinal Regression vs. Regression/Classification
Learning approach and its output space:
Regression: real values
Classification: non-ordered categories
Ordinal regression: discrete values / ordered categories
Ordinal regression can be regarded as something between regression and classification.

33 McRank (P. Li, C. Burges, et al. NIPS 2007)
Input data: a training dataset contains a set of queries; the jth query corresponds to a set of URLs, and each URL is manually labeled with one of the K = 5 relevance levels.
Engineers have developed methodologies to construct "features" by combining the query and URLs, but the details are usually a "trade secret".

34 McRank (P. Li, C. Burges, et al. NIPS 2007)
How to convert classification results into ranking scores?
Possibility 1: classify each data point (i.e., query + URL) into one of the K = 5 classes and rank the data points according to the class labels. Data points with the same label are ranked arbitrarily, which leads to highly unstable ranking results.
Another solution?

35 McRank (P. Li, C. Burges, et al. NIPS 2007)
Recall we assume a training dataset where the class label of each data point is one of the K = 5 relevance levels k.
Learn the class probabilities p_k = P(class = k) for each data point and define a scoring function S = sum over k of p_k * T(k), where T(k) is some monotone (increasing) function of the relevance level k.
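A sketch of this conversion, assuming a multi-class classifier that outputs per-class probabilities for the K = 5 relevance levels; T(k) = k is one common monotone choice, not prescribed by the slide.

```python
import numpy as np

def mcrank_scores(class_probs, T=lambda k: k):
    """Turn per-document class probabilities into ranking scores:
    S_i = sum over k of p_{i,k} * T(k), with T a monotone function of level k."""
    class_probs = np.asarray(class_probs)        # shape: (num_docs, K)
    levels = np.arange(class_probs.shape[1])     # k = 0, ..., K-1
    return class_probs @ np.array([T(k) for k in levels], dtype=float)

# Documents are then ranked by sorting on these scores; ties are now rare,
# unlike ranking directly by the hard class labels.
probs = [[0.1, 0.2, 0.4, 0.2, 0.1],
         [0.6, 0.3, 0.1, 0.0, 0.0]]
print(mcrank_scores(probs))   # higher expected relevance ranks first
```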

36 Discussions
Assumption: relevance is absolute and query-independent, e.g., documents associated with different queries are put into the same category if their labels are all "fair".
However, in practice, relevance is query-dependent: an irrelevant document for a popular query might have a higher term frequency than a relevant document for a rare query.

37 Learning to Rank Algorithms
Pointwise Approach: Ordinal Regression Pairwise Approach: Preference Learning Listwise Approach: Listwise Ranking

38 Overview of Pairwise Approach
No longer assume absolute relevance; reduce ranking to classification on document pairs w.r.t. the same query.
Input space: document pairs (x_u, x_v)
Output space: preference y in {+1, -1}
Example query and ranked list: d1: Hsing Kuo University, d2: Hong Kong University, d3: 香港大學, d4: HKUST, d5: HKU CS Department

39 Ranking SVM
Consider the linear ranking function for query q: f(d) = w · x, where x is the feature vector of document d w.r.t. q; document d_i is ranked before d_j if w · x_i > w · x_j.

40 Ranking SVM
Input data: a vector of features x_i for each query+document pair, and feature differences x_i - x_j for two documents d_i and d_j of the same query.
If d_i is judged more relevant than d_j, assign the vector x_i - x_j the class +1, otherwise -1.
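A sketch of this reduction using a linear SVM on feature differences; scikit-learn is an assumed choice (the slide names no library), features is assumed to be a NumPy array of per-document feature vectors, and pairs are formed only within the same query.

```python
import numpy as np
from sklearn.svm import LinearSVC

def pairwise_dataset(features, labels, query_ids):
    """Build (x_i - x_j, +1/-1) examples from documents of the same query."""
    X, y = [], []
    for q in set(query_ids):
        idx = [i for i, qid in enumerate(query_ids) if qid == q]
        for i in idx:
            for j in idx:
                if labels[i] > labels[j]:        # d_i judged more relevant than d_j
                    X.append(features[i] - features[j]); y.append(+1)
                    X.append(features[j] - features[i]); y.append(-1)
    return np.array(X), np.array(y)

# After fitting, clf.coef_ plays the role of w in the linear ranking function
# f(d) = w . x, and documents are ranked by this score.
# X_pairs, y_pairs = pairwise_dataset(features, labels, query_ids)
# clf = LinearSVC(C=1.0).fit(X_pairs, y_pairs)
```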

41 Discussions
Progress made compared to the pointwise approach: no longer assumes absolute relevance; uses pairwise relationships to represent relative ranking.
However, the unique properties of ranking in IR have not been fully modeled.

42 Problems with Pairwise Approach (1)
The number of instance pairs varies according to the query. In the example with two queries in total, the error is the same in terms of pairwise classification (780/790 = 98.73%), but different in terms of query-level evaluation (99% vs. 50%).

43 Problems with Pairwise Approach (2)
Negative effects of making errors at top positions: the same error in terms of pairwise classification, but different errors in terms of position-discounted evaluation.
Labels: p = perfect, g = good, b = bad
ideal:     p g g b b b b
ranking 1: g p g b b b b (one wrong pair; worse, the error is at the top)
ranking 2: p g b g b b b (one wrong pair; better, the error is further down)
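A worked check of this claim with DCG as the position-discounted measure, mapping the grades to gains p = 2, g = 1, b = 0 (an illustrative choice, not specified on the slide):

```python
import math

def dcg(gains):
    """Discounted cumulative gain: sum of gain_i / log2(i + 1) over positions i = 1..n."""
    return sum(g / math.log2(i + 1) for i, g in enumerate(gains, start=1))

grade = {"p": 2, "g": 1, "b": 0}
ideal     = [grade[c] for c in "pggbbbb"]
ranking_1 = [grade[c] for c in "gpgbbbb"]   # wrong pair at the top
ranking_2 = [grade[c] for c in "pgbgbbb"]   # wrong pair further down

print(dcg(ranking_1) / dcg(ideal), dcg(ranking_2) / dcg(ideal))
# ranking_2 scores higher: the same single pairwise error hurts less away from the top
```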

44 Learning to Rank Algorithms
Pointwise Approach: Ordinal Regression Pairwise Approach: Preference Learning Listwise Approach: Listwise Ranking

45 Why Listwise Approach?
By treating the list of documents associated with the same query as a learning instance, one can naturally obtain the rank (position) information, and thus embed more of the unique properties of ranking for IR in the learning process; a sketch of one such listwise loss follows.
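Not on the slide itself, but as a concrete instance: a sketch of the ListNet-style top-one cross-entropy loss (ICML 2007, listed earlier), which compares the top-one probabilities induced by the model scores against those induced by the labels over the whole list for a query.

```python
import numpy as np

def listnet_top1_loss(scores, labels):
    """Cross entropy between the top-one probabilities induced by the labels
    and those induced by the model scores for one query's document list."""
    def softmax(v):
        e = np.exp(v - np.max(v))   # subtract max for numerical stability
        return e / e.sum()
    p_label = softmax(np.asarray(labels, dtype=float))
    p_score = softmax(np.asarray(scores, dtype=float))
    return -np.sum(p_label * np.log(p_score))

# One query, four documents: the loss is lower when the score order matches the label order.
print(listnet_top1_loss([3.0, 2.0, 1.0, 0.0], [2, 1, 1, 0]))
print(listnet_top1_loss([0.0, 1.0, 2.0, 3.0], [2, 1, 1, 0]))
```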

46 Summary
Ranking is a new application and also a new machine learning problem.
Learning to rank: from the pointwise and pairwise approaches to the listwise approach.

47 Thank you!

