Published by Hope Singleton. Modified over 6 years ago.
1
Interactively Optimizing Information Retrieval Systems as a Dueling Bandits Problem
Tingdan Luo, tl3xd@virginia.edu, 05/02/2016
2
Offline Learning to Rank
Goal: find an optimal combination of features.
Features: BM25, LM, PageRank, …
Offline evaluation metrics: MAP, NDCG, MRR, …
Downsides:
Training and testing use separate data (train/test split).
Requires a human-annotated dataset, which can be small and expensive to build.
The relevance of documents to queries can change over time, for example in news search.
3
Online Learning to Rank
Goal: find an optimal combination of features.
Features: BM25, LM, PageRank, …
Learn from interactive feedback in real time.
Conduct training and testing at the same time.
No need to calculate MAP, NDCG, MRR, …
Adapt and improve the results as user preferences change.
Feedback signals: mouse clicks, mouse movement.
Behavior-based metrics: clicks per query, time to first click, time to last click, abandonment rate, reformulation rate.
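The behavior-based metrics listed above can be computed from a click log. The sketch below is only an illustration: the session format (`click_times`, `reformulated`) is a hypothetical one invented for this example, and the definitions used (abandonment means a query with no click) are common conventions rather than something the slides specify.

```python
def behavior_metrics(sessions):
    """Aggregate behavior-based metrics over a list of query sessions.

    Each session is a dict (hypothetical format) with:
      'click_times': seconds from query to each click, possibly empty
      'reformulated': True if the user rewrote the query afterwards
    """
    n = len(sessions)
    clicked = [s for s in sessions if s["click_times"]]

    def mean(values):
        # Return None when no session had a click, instead of dividing by zero.
        return sum(values) / len(values) if values else None

    return {
        "clicks_per_query": sum(len(s["click_times"]) for s in sessions) / n,
        "abandonment_rate": (n - len(clicked)) / n,
        "reformulation_rate": sum(1 for s in sessions if s["reformulated"]) / n,
        "mean_time_to_first_click": mean([min(s["click_times"]) for s in clicked]),
        "mean_time_to_last_click": mean([max(s["click_times"]) for s in clicked]),
    }


# Made-up log of four queries.
log = [
    {"click_times": [2.0, 10.0], "reformulated": False},
    {"click_times": [], "reformulated": True},
    {"click_times": [4.0], "reformulated": False},
    {"click_times": [], "reformulated": False},
]
metrics = behavior_metrics(log)
```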
4
Ranker
Feature weights: BM25 = 1.3, LM = 0.8, PageRank = 2.5, …
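One way to read this slide is that a ranker is a weight vector applied to each document's feature values. The sketch below assumes a linear combination (the slides only say "combination of features"); the document ids and feature values are made up, and only the three weights come from the slide.

```python
# Feature order assumed for this example; the weights are the ones on the slide.
FEATURES = ["BM25", "LM", "PageRank"]
WEIGHTS = [1.3, 0.8, 2.5]


def score(weights, features):
    """Linear ranker: weighted sum of one document's feature values."""
    return sum(w * x for w, x in zip(weights, features))


def rank(weights, docs):
    """Return document ids sorted by descending score.

    docs maps a document id to its feature vector, in FEATURES order.
    """
    return sorted(docs, key=lambda d: score(weights, docs[d]), reverse=True)


# Made-up feature values for three documents.
docs = {
    "d1": [0.2, 0.5, 0.1],
    "d2": [0.9, 0.1, 0.0],
    "d3": [0.1, 0.1, 0.6],
}
ranking = rank(WEIGHTS, docs)
```

Under this reading, learning to rank amounts to searching for the weight vector that produces the best orderings.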
5
How to generate a new ranker without calculation?
Current weights: BM25 = 1.3, LM = 0.8, PageRank = 2.5, …
Do not calculate MAP, NDCG, or MRR.
6
How to generate a new ranker without calculation? Random!
Current weights: BM25 = 1.3, LM = 0.8, PageRank = 2.5, …
Answer: generate the new ranker at random.
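"Random" can be made concrete as follows. In the published Dueling Bandit Gradient Descent algorithm the candidate is the current weight vector plus a step of size 𝛿 in a uniformly random direction; the sketch below assumes that reading. The function names and the value `delta=0.5` are illustrative choices, not taken from the slides.

```python
import math
import random


def random_unit_vector(dim, rng):
    """Draw a direction uniformly from the unit sphere by normalizing Gaussians."""
    v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def propose_candidate(weights, delta, rng):
    """Return a candidate ranker at distance delta from weights, plus the direction used."""
    u = random_unit_vector(len(weights), rng)
    candidate = [w + delta * x for w, x in zip(weights, u)]
    return candidate, u


rng = random.Random(0)
current = [1.3, 0.8, 2.5]  # weights from the slide
candidate, direction = propose_candidate(current, delta=0.5, rng=rng)
```

No offline metric is computed here: whether the candidate is any good is decided later by comparing it with the current ranker on live feedback.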
7
Dueling Bandit Gradient Descent
8
Which Ranker is better?
9
Team-Draft Interleave
Compare the two rankers with team-draft interleaving.
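A minimal sketch of team-draft interleaving, assuming the usual formulation: the two rankers take turns "drafting" their best remaining document into one merged list, a coin flip decides who picks first whenever the teams are the same size, and the ranker whose picks collect more clicks wins the comparison. The function and variable names are illustrative.

```python
import random


def team_draft_interleave(ranking_a, ranking_b, rng):
    """Merge two rankings; return (interleaved list, team A docs, team B docs)."""
    interleaved, used = [], set()
    team_a, team_b = set(), set()
    while True:
        rest_a = [d for d in ranking_a if d not in used]
        rest_b = [d for d in ranking_b if d not in used]
        if not rest_a and not rest_b:
            break
        # The team with fewer picks goes next; ties are broken by a coin flip.
        a_turn = len(team_a) < len(team_b) or (
            len(team_a) == len(team_b) and rng.random() < 0.5
        )
        # If the chosen ranker has nothing left, the other one picks instead.
        if a_turn and not rest_a:
            a_turn = False
        elif not a_turn and not rest_b:
            a_turn = True
        doc = rest_a[0] if a_turn else rest_b[0]
        (team_a if a_turn else team_b).add(doc)
        used.add(doc)
        interleaved.append(doc)
    return interleaved, team_a, team_b


def winner(team_a, team_b, clicked):
    """Credit each click to the team that contributed the clicked document."""
    clicks_a = sum(1 for d in clicked if d in team_a)
    clicks_b = sum(1 for d in clicked if d in team_b)
    if clicks_a == clicks_b:
        return "tie"
    return "a" if clicks_a > clicks_b else "b"


rng = random.Random(1)
a = ["d1", "d2", "d3", "d4"]
b = ["d3", "d1", "d4", "d2"]
mixed, team_a, team_b = team_draft_interleave(a, b, rng)
```

The user sees only `mixed`; the team assignment stays hidden and is used afterwards to attribute clicks.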
10
Dueling Bandit Gradient Descent
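The pieces above combine into the following loop: propose a random candidate at distance 𝛿, duel it against the current ranker, and move a step of size 𝛾 toward the candidate only if it wins. This is a sketch under stated assumptions, not the presenter's code. The projection back onto a bounded weight space is omitted, and the duel is simulated with a hidden target vector and a logistic win probability, both invented for illustration; a real system would run an interleaved comparison on live queries instead.

```python
import math
import random


def random_unit_vector(dim, rng):
    """Draw a direction uniformly from the unit sphere by normalizing Gaussians."""
    v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def dbgd(initial, duel, delta, gamma, steps, rng):
    """Dueling Bandit Gradient Descent without the projection step.

    duel(current, candidate) must return True when the candidate wins.
    delta is the exploration step size, gamma the update step size.
    """
    w = list(initial)
    for _ in range(steps):
        u = random_unit_vector(len(w), rng)
        candidate = [wi + delta * ui for wi, ui in zip(w, u)]
        if duel(w, candidate):
            # Candidate won: take a (smaller) step in the same direction.
            w = [wi + gamma * ui for wi, ui in zip(w, u)]
    return w


def make_simulated_duel(target, rng):
    """Hypothetical noisy comparison: rankers closer to a hidden target win more often."""
    def utility(w):
        return -sum((wi - ti) ** 2 for wi, ti in zip(w, target))

    def duel(current, candidate):
        p_win = 1.0 / (1.0 + math.exp(-(utility(candidate) - utility(current))))
        return rng.random() < p_win

    return duel


rng = random.Random(0)
hidden_target = [1.3, 0.8, 2.5]  # stands in for the unknown best weights
learned = dbgd([0.0, 0.0, 0.0], make_simulated_duel(hidden_target, rng),
               delta=0.5, gamma=0.1, steps=2000, rng=rng)
```

Keeping `gamma` smaller than `delta` means the ranker explores with a large step but commits only a small one, which dampens the effect of noisy comparisons.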
11
Experiment Result
Making multiple comparisons per update has no impact on performance. However, sampling multiple queries is very realistic, since a search system might be constrained to, e.g., making daily updates to its ranking function.
Performance on the validation and test sets closely follows training set performance (so we omit their results). This implies that our method is not overfitting.
12
How to choose 𝛿 and 𝛾 ?
13
[Table omitted: the average (across all iterations) and final training … for varying 𝛿 and 𝛾]
14
Theoretical Analysis: Regret formulation
Regret: the performance gap between the proposed algorithm and the optimal algorithm.
A good algorithm should achieve regret that is sublinear in T, which implies that the average regret per iteration decreases.
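For reference, the regret is commonly written as below in the dueling bandits setting. The symbols are assumptions introduced here, not taken from the slide: $w^*$ is the best ranker, $w_t$ and $w_t'$ are the two rankers compared at iteration $t$, and $\epsilon(w, w')$ measures how much more often $w$ beats $w'$ than chance.

```latex
\Delta_T = \sum_{t=1}^{T} \bigl[ \epsilon(w^*, w_t) + \epsilon(w^*, w_t') \bigr],
\qquad
\epsilon(w, w') = P(w \succ w') - \tfrac{1}{2}

% Sublinear regret means the average regret vanishes:
\lim_{T \to \infty} \frac{\Delta_T}{T} = 0
```

Both rankers shown to users count toward the regret, so exploring with a poor candidate is penalized just like keeping a poor current ranker.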
15
Regret Bound: Lipschitz condition
Choosing 𝛿 and 𝛾 to achieve the regret bound requires knowledge of the Lipschitz constant L, which is typically not known in practical settings. However, sublinear regret is achievable using many choices for 𝛿 and 𝛾.
17
Limitations
Not efficient enough.
One single random vector => large variance.
19
Limitations
Not efficient enough.
Does not consider historical exploration.
One single random vector => large variance.
20
Questions?