Fast Learning of Document Ranking Functions with the Committee Perceptron
Jonathan Elsas
LTI Student Research Symposium, Sept. 14, 2007
Briefly… Joint work with Vitor Carvalho and Jaime Carbonell Submitted to Web Search and Data Mining conference (WSDM 2008) http://wsdm2008.org
Evolution of Features in IR “In the beginning, there was TF…” It became clear that other features were needed for effective document ranking: IDF, document length… Along came HTML: document structure & link network features… Now we have collective annotation: social bookmarking features…
Challenges Which features are important? How to best choose the weights for each feature? With just a few features, manual tuning or parameter sweeps sufficed. This approach becomes impractical with more than 5-6 features.
Learning Approach to Setting Feature Weights
Goal: utilize existing relevance judgments to learn an optimal weight setting.
This has recently become a hot research area in IR: “Learning to Rank” (see the SIGIR 2007 Learning to Rank workshop, http://research.microsoft.com/users/LR4IR-2007/).
Many approaches exist (gradient descent, SVMs, etc.), but pairwise preference learning has emerged as a favored approach.
Pair-wise Preference Learning
Learning a document scoring function, treated as a classification problem on pairs of documents; the resulting scoring function is used as the learned document ranker.
Assume our ranking function is of the form f(q, d) = w · Φ(q, d), where Φ(q, d) is a vector of feature values for this document-query pair.
Why pair-wise preference instead of list-wise learning or classifying relevant/non-relevant? (1) It allows application of existing classification techniques. (2) From an operational perspective, it may be easier and more intuitive to collect preference data than to force users to place documents on a graded relevance scale. (3) It works better than classifying relevant/non-relevant.
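As a rough illustration of this classification view, here is a minimal sketch (not the authors' code; class, method, and variable names are illustrative):

    // Each (relevant, non-relevant) document pair for a query becomes one
    // classification instance: the feature difference Φ(q, dR) − Φ(q, dN),
    // implicitly labeled +1. A linear ranker w orders the pair correctly
    // exactly when w · x > 0.
    class PairwiseExamples {
        static double[] pairInstance(double[] phiRel, double[] phiNonRel) {
            double[] x = new double[phiRel.length];
            for (int i = 0; i < x.length; i++) {
                x[i] = phiRel[i] - phiNonRel[i];
            }
            return x;
        }
    }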
Perceptron Algorithm
Proposed in 1958 by Rosenblatt.
Online algorithm (instance-at-a-time) -> fast training.
Whenever a ranking mistake is made, update the hypothesis (update rule sketched below).
Provable mistake bounds & convergence.
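A minimal sketch of the pairwise perceptron update, assuming the standard rule for preference pairs (the slide's exact equation is not reproduced here):

    class PairwisePerceptron {
        double[] w;                                     // current hypothesis

        PairwisePerceptron(int numFeatures) { w = new double[numFeatures]; }

        static double dot(double[] a, double[] b) {
            double s = 0.0;
            for (int i = 0; i < a.length; i++) s += a[i] * b[i];
            return s;
        }

        // For a pair (q, dR, dN) with feature vectors phiR and phiN:
        // if dR is not scored above dN, shift w toward the correct ordering.
        // Returns true when a mistake was made and w was updated.
        boolean update(double[] phiR, double[] phiN) {
            if (dot(w, phiR) <= dot(w, phiN)) {
                for (int i = 0; i < w.length; i++) {
                    w[i] += phiR[i] - phiN[i];
                }
                return true;
            }
            return false;
        }
    }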
Perceptron Algorithm Variants
Pocket Perceptron (Gallant, 1990): keep the one best hypothesis.
Voted Perceptron (Freund & Schapire, 1999): keep all intermediate hypotheses and combine them at the end; often in practice, the hypotheses are averaged.
These are ways to convert the online perceptron learner into a batch algorithm and to improve stability on non-separable data.
In reality, true voting isn't always practical with the voted perceptron -- there may be many hypotheses, so averaging is used instead.
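A minimal sketch of that averaging (hypotheses are just weight vectors here; in the standard averaged perceptron each hypothesis is weighted by how many examples it survived, which this uniform average omits for brevity):

    import java.util.List;

    class AveragedPerceptron {
        // Collapse a list of intermediate hypotheses into one weight vector.
        static double[] average(List<double[]> hypotheses) {
            double[] avg = new double[hypotheses.get(0).length];
            for (double[] w : hypotheses) {
                for (int i = 0; i < avg.length; i++) avg[i] += w[i];
            }
            for (int i = 0; i < avg.length; i++) avg[i] /= hypotheses.size();
            return avg;
        }
    }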
Committee Perceptron Algorithm
Ensemble method: selectively keeps the N best hypotheses encountered during training.
Significant advantages over previous perceptron variants.
Many ways to combine the output of the hypotheses: voting, score averaging, hybrid approaches; members can be weighted by a retrieval performance metric.
Our approach shows performance improvements over existing rank learning algorithms with a significant reduction in training time -- 45 times faster.
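One way such a performance-weighted combination could look -- a sketch under the assumption of weighted score averaging; the Member fields and the choice of metric are illustrative, not the authors' exact implementation:

    import java.util.List;

    class Committee {
        static class Member {
            double[] w;       // a retained hypothesis
            double metric;    // e.g., its retrieval performance on training data
            Member(double[] w, double metric) { this.w = w; this.metric = metric; }
        }

        // Score one document-query feature vector by averaging the committee
        // members' scores, each weighted by its retrieval performance metric.
        static double score(List<Member> committee, double[] phi) {
            double total = 0.0, norm = 0.0;
            for (Member m : committee) {
                double s = 0.0;
                for (int i = 0; i < phi.length; i++) s += m.w[i] * phi[i];
                total += m.metric * s;
                norm += m.metric;
            }
            return norm > 0 ? total / norm : 0.0;
        }
    }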
Committee Perceptron Training (diagram: training data supplies triples (q, dR, dN) to the current hypothesis, which is maintained alongside a committee of hypotheses; R and N mark the relevant and non-relevant documents)
Committee Perceptron Training
For each training example (q, dR, dN):
If the current hypothesis is better than the committee's worst member, it replaces that worst hypothesis; otherwise the current hypothesis is discarded.
The current hypothesis is then updated to better classify this training example (see the sketch below).
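An illustrative sketch of this training loop, reusing the PairwisePerceptron and Committee classes from the earlier sketches. The quality score used to compare hypotheses (here a plain argument) is an assumption -- e.g., a retrieval metric on the training queries -- not necessarily the authors' exact criterion:

    import java.util.Comparator;
    import java.util.PriorityQueue;

    class CommitteeTrainer {
        final PairwisePerceptron current;                 // from the perceptron sketch above
        final PriorityQueue<Committee.Member> committee;  // worst-scoring member at the head
        final int maxSize;

        CommitteeTrainer(int numFeatures, int maxSize) {
            this.current = new PairwisePerceptron(numFeatures);
            this.committee = new PriorityQueue<>(
                Comparator.comparingDouble((Committee.Member m) -> m.metric));
            this.maxSize = maxSize;
        }

        // Process one training example (q, dR, dN), given its feature vectors
        // and a quality score for the current hypothesis.
        void step(double[] phiR, double[] phiN, double qualityOfCurrent) {
            boolean mistake = PairwisePerceptron.dot(current.w, phiR)
                           <= PairwisePerceptron.dot(current.w, phiN);
            if (!mistake) return;                          // correctly ranked pair: nothing to do
            if (committee.size() < maxSize) {
                committee.add(new Committee.Member(current.w.clone(), qualityOfCurrent));
            } else if (qualityOfCurrent > committee.peek().metric) {
                committee.poll();                          // current beats the worst: replace it
                committee.add(new Committee.Member(current.w.clone(), qualityOfCurrent));
            }                                              // otherwise the current hypothesis is discarded
            current.update(phiR, phiN);                    // then update and keep training
        }
    }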
Evaluation
Compared the Committee Perceptron to RankSVM (Joachims, 2002) and RankBoost (Freund et al., 2003).
Learning To Rank (LETOR) dataset: http://research.microsoft.com/users/tyliu/LETOR/default.aspx
Provides three test collections, standardized feature sets, and train/validation/test splits.
Committee Perceptron Learning Curves The committee/ensemble approach reaches a better solution faster than existing perceptron variants.
Committee Perceptron Performance Comparable or better performance than two state-of-the-art batch learning algorithms. Added bonus: more than 45 times faster training than RankSVM.
Committee Perceptron Performance (OHSUMED)
Committee Perceptron Performance (TD2004)
Committee Perceptron Training Time
Much faster than other rank learning algorithms. Training time on the OHSUMED dataset: CP ~450 seconds for 50 iterations; RankSVM > 21,000 seconds. A 45-fold reduction in training time with comparable performance (CP implemented in Java, RankSVM in C/C++).
Committee Perceptron: Summary CP is a fast perceptron-based learning algorithm applied to document ranking. Significantly outperforms the pocket and averaged perceptron variants on learning document ranking functions. Performs comparably to two strong baseline rank learning algorithms, but trains in much less time.
Future Directions Performance of the Committee Perceptron is good, but it could be better What are we really optimizing? (not MAP or NDCG…)
Loss Functions for Pairwise Preference Learners These learners minimize the number of mis-ranked document pairs, which only loosely corresponds to rank-based evaluation measures. Problem: all rank positions are treated the same.
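As a sketch, in the notation of the earlier scoring-function slide (a linear ranker w · Φ(q, d)), the quantity these learners minimize can be written as:

    L_{pairs}(w) = \sum_{q} \sum_{(d_R, d_N)} \mathbf{1}\big[\, w \cdot \Phi(q, d_R) \le w \cdot \Phi(q, d_N) \,\big]

where the inner sum runs over pairs of a relevant and a non-relevant document for query q: every mis-ranked pair contributes the same cost, no matter where it falls in the ranked list.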
Problems with Optimizing the Wrong Metric (figure: best-MAP vs. best-BPREF parameter settings) TREC BLOG data, full parameter sweep over 8 training queries, 5 features. Optimizing BPREF (close to the mis-ranked-pairs criterion) results in a very sub-optimal MAP -- 0.501 (best MAP) vs. 0.450 (best BPREF).
Ranked Retrieval Measures as Pairwise-Preference Loss Functions Average Precision places more emphasis on higher-ranked documents. Re-writing AP as a pairwise loss function:
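One standard way to express AP for a query through pairwise score comparisons, shown here as a reconstruction (assuming no tied scores; the slide's original equation may use different notation):

    AP(w) = \frac{1}{|R|} \sum_{j \in R}
        \frac{\sum_{i \in R} \mathbf{1}\big[\, w \cdot \Phi(q, d_i) \ge w \cdot \Phi(q, d_j) \,\big]}
             {\sum_{k \in D} \mathbf{1}\big[\, w \cdot \Phi(q, d_k) \ge w \cdot \Phi(q, d_j) \,\big]}

where R is the set of relevant documents and D the set of all retrieved documents for the query; the numerator counts relevant documents ranked at or above d_j, and the denominator is d_j's rank. Every term depends only on pairwise comparisons, but documents near the top of the ranking appear in more terms, so mistakes there are penalized more heavily -- unlike the uniform pairs-loss above.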
Preliminary Results (figure: comparing training with MAP-loss vs. pairs-loss)
Questions?