1 Fast Learning of Document Ranking Functions with the Committee Perceptron
Jonathan Elsas, LTI Student Research Symposium, Sept. 14, 2007. Joint work with Jaime Carbonell and Vitor Carvalho.

2 Briefly… Joint work with Vitor Carvalho and Jaime Carbonell
Submitted to the Web Search and Data Mining conference (WSDM 2008)

3 Evolution of Features in IR
“In the beginning, there was TF…” It became clear that other features were needed for effective document ranking: IDF, document length… Along came HTML: document structure & link-network features… Now we have collective annotation: social-bookmarking features…

4 Challenges Which features are important? How do we best choose the weights for each feature? With just a few features, manual tuning or parameter sweeps sufficed, but this approach becomes impractical with more than 5-6 features.

5 Learning Approach to Setting Feature Weights
Goal: utilize existing relevance judgments to learn an optimal weight setting. “Learning to Rank” has recently become a hot research area in IR (see the SIGIR 2007 Learning to Rank workshop). There are many approaches (gradient descent, SVMs, etc.), but pairwise preference learning has emerged as a favored approach.

6 Pair-wise Preference Learning
Learning a document scoring function is treated as a classification problem on pairs of documents; the resulting scoring function is used as the learned document ranker. Assume the ranking function has the form f(q, d) = w · Φ(q, d), where Φ(q, d) is a vector of feature values for this document-query pair: a pair is ranked correctly when the relevant document scores higher than the non-relevant one. Why pairwise preference instead of list-wise learning or classifying relevant/non-relevant? (1) It allows application of existing classification techniques. (2) From an operational perspective, it may be easier and more intuitive to collect preference data than to force users to place documents on a graded relevance scale. (3) It works better than classifying relevant/non-relevant.
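As a concrete illustration, here is a minimal sketch of the pairwise reduction in Python (our illustration, not code from the talk; all names are hypothetical):

```python
# Minimal sketch of the pairwise reduction (illustrative only).
import numpy as np

def score(w, phi):
    """Linear scoring function f(q, d) = w . phi(q, d)."""
    return np.dot(w, phi)

def pairwise_instances(preference_pairs):
    """preference_pairs: iterable of (phi_rel, phi_non) feature vectors for a
    relevant and a non-relevant document of the same query. The pair is
    ranked correctly exactly when w . (phi_rel - phi_non) > 0, so each pair
    becomes one positive instance for an ordinary linear classifier."""
    for phi_rel, phi_non in preference_pairs:
        yield phi_rel - phi_non
```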

7 Perceptron Algorithm Proposed in 1958 by Rosenblatt
An online algorithm (it processes one instance at a time, which makes training fast). Whenever a ranking mistake is made, the hypothesis is updated: w ← w + (Φ(q, dR) − Φ(q, dN)). It has provable mistake bounds & convergence guarantees.
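A sketch of the pairwise perceptron loop, under our assumptions (the update shown is the standard pairwise form, not necessarily the authors' exact code):

```python
import numpy as np

def pairwise_perceptron(pairs, n_features, n_iters=10):
    """pairs: list of (phi_rel, phi_non) numpy arrays of length n_features.
    On each mis-ranked pair, nudge w toward the relevant document's features."""
    w = np.zeros(n_features)
    for _ in range(n_iters):
        for phi_rel, phi_non in pairs:
            if np.dot(w, phi_rel - phi_non) <= 0:  # ranking mistake
                w = w + (phi_rel - phi_non)        # update: w <- w + (phi_rel - phi_non)
    return w
```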

8 Perceptron Algorithm Variants
Pocket Perceptron (Gallant, 1990): keep the single best hypothesis. Voted Perceptron (Freund & Schapire, 1999): keep all the intermediate hypotheses and combine them at the end. Both are ways to convert the online perceptron learner into a batch algorithm and to improve stability on non-separable data. In practice, true voting isn’t always feasible, since there may be very many hypotheses, so the hypotheses are often averaged instead.
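A sketch of the averaged variant (our illustration, following the common recipe of accumulating every intermediate hypothesis):

```python
import numpy as np

def averaged_perceptron(pairs, n_features, n_iters=10):
    """Accumulate every intermediate hypothesis and return the mean,
    a cheap stand-in for true voting when there are many hypotheses."""
    w = np.zeros(n_features)
    w_sum = np.zeros(n_features)
    count = 0
    for _ in range(n_iters):
        for phi_rel, phi_non in pairs:
            if np.dot(w, phi_rel - phi_non) <= 0:
                w = w + (phi_rel - phi_non)
            w_sum += w          # every intermediate hypothesis contributes
            count += 1
    return w_sum / count        # the averaged hypothesis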

9 Committee Perceptron Algorithm
An ensemble method: it selectively keeps the N best hypotheses encountered during training, which gives significant advantages over previous perceptron variants. There are many ways to combine the hypotheses’ outputs: voting, score averaging, and hybrid approaches, optionally weighting each hypothesis by a retrieval performance metric. Our approach shows performance improvements over existing rank-learning algorithms with a significant reduction in training time (45 times faster).
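One of the combination schemes mentioned above, performance-weighted score averaging, might look like this (the exact weighting is our assumption):

```python
import numpy as np

def committee_score(committee, phi):
    """committee: list of (w, perf) pairs, where perf is a retrieval metric
    such as average precision measured for that hypothesis. The committee's
    score for a document is the perf-weighted average of member scores."""
    total = sum(perf for _, perf in committee)
    return sum(perf * np.dot(w, phi) for w, perf in committee) / total
```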

10 Committee Perceptron Training
[Diagram, built up over slides 10-13: for each training example (q, dR, dN), the current hypothesis scores the relevant (R) and non-relevant (N) documents, with a committee of retained hypotheses kept alongside.] If the current hypothesis is better than the worst committee member, it replaces the worst hypothesis in the committee; otherwise it is discarded. The current hypothesis is then updated to better classify this training example.
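A sketch of the whole training loop as the slides describe it. How a hypothesis's quality is measured is not specified in the transcript; we use the number of correctly ranked training pairs as a stand-in:

```python
import numpy as np

def committee_perceptron(pairs, n_features, committee_size=5, n_iters=10):
    """pairs: list of (phi_rel, phi_non) numpy arrays.
    Returns the committee as a list of (quality, weight-vector) entries."""
    def quality(w):
        # Proxy for hypothesis quality: correctly ranked training pairs.
        return sum(np.dot(w, pr - pn) > 0 for pr, pn in pairs)

    w = np.zeros(n_features)
    committee = []
    for _ in range(n_iters):
        for phi_rel, phi_non in pairs:
            if np.dot(w, phi_rel - phi_non) <= 0:   # ranking mistake
                q = quality(w)
                if len(committee) < committee_size:
                    committee.append((q, w.copy()))
                else:
                    worst = min(range(committee_size), key=lambda i: committee[i][0])
                    if q > committee[worst][0]:           # better than the worst member:
                        committee[worst] = (q, w.copy())  # replace it; otherwise discard w
                # update the current hypothesis to better classify this example
                w = w + (phi_rel - phi_non)
    return committee
```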

14 Evaluation Compared the Committee Perceptron to
RankSVM (Joachims, 2002) and RankBoost (Freund et al., 2003) on the Learning To Rank (LETOR) dataset, which provides three test collections, standardized feature sets, and train/validation/test splits.

15 Committee Perceptron Learning Curves
The committee/ensemble approach reaches a better solution faster than existing perceptron variants.

16 Committee Perceptron Performance
Comparable or better performance than two state-of-the-art batch learning algorithms. Added bonus: more than 45 times faster training than RankSVM.

17 Committee Perceptron Performance (OHSUMED)

18 Committee Perceptron Performance (TD2004)

19 Committee Perceptron Training Time
Much faster than other rank-learning algorithms. Training time on the OHSUMED dataset: CP, ~450 seconds for 50 iterations; RankSVM, > 21,000 seconds. That is a 45-fold reduction in training time with comparable performance (the CP implementation is in Java, while RankSVM is in C/C++).

20 Committee Perceptron: Summary
CP is a fast perceptron-based learning algorithm, applied to document ranking. It significantly outperforms the pocket and averaged perceptron variants on learning document ranking functions, and performs comparably to two strong baseline rank-learning algorithms while training in much less time.

21 Future Directions Performance of the Committee Perceptron is good, but it could be better. What are we really optimizing? (Not MAP or NDCG…)

22 Loss Functions for Pairwise Preference Learners
Pairwise preference learners minimize the number of mis-ranked document pairs, which only loosely corresponds to rank-based evaluation measures. Problem: all rank positions are treated the same. A sketch of this loss appears below.
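```python
import numpy as np

def pairwise_loss(w, pairs):
    """The quantity pairwise learners minimize: the number of mis-ranked
    relevant/non-relevant pairs. Note that every pair costs the same,
    no matter where in the ranking the swap occurs."""
    return sum(np.dot(w, phi_rel - phi_non) <= 0 for phi_rel, phi_non in pairs)
```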

23 Problems with Optimizing the Wrong Metric
[Plot: MAP vs. BPREF on TREC BLOG data, from a full parameter sweep over 8 training queries and 5 features, with the best-MAP and best-BPREF settings marked.] Optimizing BPREF (which is close to the mis-ranked-document-pairs criterion) results in a very sub-optimal MAP.

24 Ranked Retrieval Pairwise-Preference Loss Functions
Average Precision places more emphasis on higher-ranked documents.
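For reference, the standard definition of AP (our addition, not the slide's equation), with R the set of relevant documents and rank(d) the position of document d in the ranked list:

```latex
\mathrm{AP} = \frac{1}{|R|} \sum_{d \in R}
  \frac{\left|\{\, d' \in R : \mathrm{rank}(d') \le \mathrm{rank}(d) \,\}\right|}{\mathrm{rank}(d)}
```

A swap near the top of the ranking changes the precision term of many relevant documents, while a swap near the bottom affects few, which is the emphasis the slide refers to.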

25 Ranked Retrieval Pairwise-Preference Loss Functions
Re-writing AP as a pairwise loss function:
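The slide's equation itself is not preserved in the transcript, but one standard rewriting (assuming no score ties) uses the identity rank(d) = 1 + Σ_{d' ≠ d} 1[f(q,d') > f(q,d)], so that AP depends on the learned scores only through pairwise comparisons:

```latex
\mathrm{AP} = \frac{1}{|R|} \sum_{d \in R}
  \frac{1 + \sum_{d' \in R,\, d' \neq d} \mathbf{1}\left[ f(q,d') > f(q,d) \right]}
       {1 + \sum_{d' \neq d} \mathbf{1}\left[ f(q,d') > f(q,d) \right]}
```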

26 Preliminary Results
[Two plots: results using MAP-loss and using pairs-loss.]

27 Questions?

