Interactively Optimizing Information Retrieval Systems as a Dueling Bandits Problem
Tingdan Luo tl3xd@virginia.edu 05/02/2016
Offline Learning to Rank
Goal: find an optimal combination of features.
Features: BM25, LM, PageRank...
Offline evaluation metrics: P@k, MAP, NDCG, MRR...
Downsides:
- Requires a train/test split over a human-annotated dataset, which can be small and expensive to build.
- Relevance of documents to queries can change over time (e.g., news search), which a fixed annotated dataset cannot capture.
Online Learning to Rank
Goal: find an optimal combination of features.
Features: BM25, LM, PageRank...
Learn from interactive user feedback in real time: training and evaluation happen at the same time, so there is no need to compute P@k, MAP, NDCG, or MRR, and the ranker can adapt to changing user preferences.
Feedback signals: mouse clicks, mouse movement.
Behavior-based metrics: clicks per query, time to first click, time to last click, abandonment rate, reformulation rate (a sketch of computing these from a click log follows below).
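These behavior-based metrics can be computed directly from interaction logs. Below is a minimal sketch under an assumed, hypothetical per-query log format; the field names are illustrative and not from the slides.

```python
# Hedged sketch: compute behavior-based metrics from a hypothetical click log.
# Each session dict is assumed to look like:
#   {'query': str, 'click_times': [seconds since results were shown], 'reformulated': bool}

def behavior_metrics(sessions):
    n = len(sessions)
    clicks_per_query = sum(len(s['click_times']) for s in sessions) / n
    clicked = [s for s in sessions if s['click_times']]
    time_to_first_click = (sum(min(s['click_times']) for s in clicked) / len(clicked)
                           if clicked else float('nan'))
    time_to_last_click = (sum(max(s['click_times']) for s in clicked) / len(clicked)
                          if clicked else float('nan'))
    abandon_rate = sum(1 for s in sessions if not s['click_times']) / n
    reformulation_rate = sum(1 for s in sessions if s['reformulated']) / n
    return {
        'clicks_per_query': clicks_per_query,
        'time_to_first_click': time_to_first_click,
        'time_to_last_click': time_to_last_click,
        'abandon_rate': abandon_rate,
        'reformulation_rate': reformulation_rate,
    }
```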
A ranker is a weight vector over features, e.g., BM25 = 1.3, LM = 0.8, PageRank = 2.5, ...
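As a concrete illustration of a ranker as a feature-weight vector, the sketch below scores documents with the weights from the slide and sorts them by score; the per-document feature values are invented for illustration.

```python
import numpy as np

# A linear ranker is just a weight vector over query-document features
# such as BM25, LM, and PageRank. Weights mirror the slide (1.3, 0.8, 2.5);
# the document feature values below are made up.

w = np.array([1.3, 0.8, 2.5])               # weights for [BM25, LM, PageRank]

docs = {
    'doc_a': np.array([2.1, 0.4, 0.9]),
    'doc_b': np.array([1.5, 0.7, 1.8]),
    'doc_c': np.array([0.6, 0.2, 2.4]),
}

# Score = w . features; the ranking is the documents sorted by score, descending.
ranking = sorted(docs, key=lambda d: float(w @ docs[d]), reverse=True)
print(ranking)
```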
How do we generate a new ranker without computing offline metrics (P@k, MAP, NDCG, MRR)?
Answer: randomly! Perturb the current weight vector (BM25 = 1.3, LM = 0.8, PageRank = 2.5, ...) in a random direction to obtain a candidate ranker.
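A minimal sketch of this "random" idea, assuming the ranker is the weight vector above: propose a candidate by moving the current weights a distance 𝛿 along a uniformly random unit direction. No offline metric is computed; the candidate will instead be compared against the current ranker using user feedback.

```python
import numpy as np

# Propose a candidate ranker by perturbing the current weights along a
# uniformly random unit direction u: w' = w + delta * u.

def random_unit_vector(d, rng):
    u = rng.standard_normal(d)
    return u / np.linalg.norm(u)

def propose_candidate(w, delta, rng):
    u = random_unit_vector(len(w), rng)
    return w + delta * u, u      # keep u: it is reused for the gradient-style update

rng = np.random.default_rng(0)
w = np.array([1.3, 0.8, 2.5])
w_candidate, u = propose_candidate(w, delta=1.0, rng=rng)
```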
Dueling Bandit Gradient Descent
Which Ranker is better?
Which ranker is better? Team-Draft Interleaving: merge the two rankers' result lists and credit each click to the ranker that contributed the clicked document (sketched below).
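A simplified sketch of Team-Draft Interleaving (function names and the click-log format are illustrative): the two rankings take turns drafting their best not-yet-selected document into one combined list, a coin flip decides who drafts first in each round, and clicks are credited to the team that contributed the clicked document.

```python
import random

def team_draft_interleave(ranking_a, ranking_b, rng=random):
    """Merge two rankings; return the interleaved list and a doc -> 'A'/'B' credit map."""
    interleaved, team = [], {}
    picks_a = picks_b = 0
    while len(interleaved) < len(set(ranking_a) | set(ranking_b)):
        # the team with fewer picks drafts next; a coin flip breaks ties
        a_turn = picks_a < picks_b or (picks_a == picks_b and rng.random() < 0.5)
        source, label = (ranking_a, 'A') if a_turn else (ranking_b, 'B')
        doc = next((d for d in source if d not in team), None)
        if doc is None:                       # this team has no fresh documents left
            source, label = (ranking_b, 'B') if a_turn else (ranking_a, 'A')
            doc = next(d for d in source if d not in team)
        interleaved.append(doc)
        team[doc] = label
        picks_a += label == 'A'
        picks_b += label == 'B'
    return interleaved, team

def winner(team, clicked_docs):
    """Decide the duel from the clicks on the interleaved list."""
    a = sum(1 for d in clicked_docs if team.get(d) == 'A')
    b = sum(1 for d in clicked_docs if team.get(d) == 'B')
    return 'A' if a > b else 'B' if b > a else 'tie'
```

For example, team_draft_interleave(['d1', 'd2', 'd3'], ['d2', 'd4', 'd1']) yields a merged list plus the credit map; winner(team, clicked_docs) then decides which ranker won that query.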
Dueling Bandit Gradient Descent
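Putting the pieces together, here is a condensed sketch of the DBGD loop. It assumes a duel(w, w_prime, query) routine (an assumption for illustration) that runs the interleaved comparison against live users and returns True when the candidate wins; 𝛿 controls exploration and 𝛾 the step size.

```python
import numpy as np

def dbgd(w0, duel, queries, delta=1.0, gamma=0.01, seed=0):
    """Sketch of Dueling Bandit Gradient Descent over a stream of queries."""
    rng = np.random.default_rng(seed)
    w = np.asarray(w0, dtype=float)
    for query in queries:
        u = rng.standard_normal(len(w))
        u /= np.linalg.norm(u)              # random unit direction
        w_prime = w + delta * u             # candidate ranker (exploration)
        if duel(w, w_prime, query):         # candidate won the interleaved comparison
            w = w + gamma * u               # small step in the winning direction
    return w
```

The proposal step is the w' = w + 𝛿u perturbation from the earlier slide, and the update w ← w + 𝛾u only fires when the interleaved comparison favors the candidate.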
Experiment Results
Making multiple comparisons per update has no impact on performance. However, sampling multiple queries per update is still realistic, since a search system might be constrained to, e.g., making daily updates to its ranking function.
Performance on the validation and test sets closely follows training-set performance (so those results are omitted), which implies the method is not overfitting.
How to choose 𝛿 and 𝛾?
Figure: the average (across all iterations) and final training NDCG@10 for different settings of 𝛿 and 𝛾.
Theoretical Analysis: Regret Formulation
Regret is the cumulative performance gap between the proposed algorithm and the optimal ranker. A good algorithm should achieve regret that is sublinear in T, which implies that the average regret per comparison decreases over time.
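One common way to write this down in the dueling-bandits setting (the notation below is a reconstruction, not copied from the slides): with w* the best ranker, w_t the current ranker, and w_t' the candidate at step t, let ε(w1, w2) = P(w1 ≻ w2) − 1/2 be the margin by which w1 beats w2 in a user comparison. Then:

```latex
% Cumulative regret over T comparisons
\Delta_T = \sum_{t=1}^{T} \big( \epsilon(w^*, w_t) + \epsilon(w^*, w_t') \big),
\qquad
\text{sublinear regret:}\quad \lim_{T \to \infty} \frac{\Delta_T}{T} = 0 .
```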
Regret Bound
Under a Lipschitz condition on the underlying value function, DBGD achieves sublinear regret. Choosing 𝛿 and 𝛾 to attain the stated bound requires knowledge of problem constants such as the Lipschitz constant L, which are typically not known in practical settings. Still, sublinear regret is achievable using many choices of 𝛿 and 𝛾.
Limitations
Not efficient enough: it does not take historical exploration into account, and relying on a single random direction per update leads to large variance in the gradient estimate.
Questions?