1
Interactively Optimizing Information Retrieval Systems as a Dueling Bandits Problem
Tingdan Luo (tl3xd@virginia.edu)
05/02/2016
2
Offline Learning to Rank
Goal: find an optimal combination of features.
Features: BM25, LM, PageRank...
Offline evaluation metrics: MAP, NDCG, MRR...
Downsides: training and testing data are kept separate (a train/test split); human annotations are required, so datasets can be small and expensive to build; and the relevance of documents to queries can change over time (e.g., news search).
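As a quick reference for one of the offline metrics named above, here is a minimal NDCG@k sketch (not from the slides; the graded relevance labels in the example are made up):

import math

def dcg_at_k(relevances, k):
    """Discounted cumulative gain over the top-k ranked documents."""
    return sum(rel / math.log2(i + 2) for i, rel in enumerate(relevances[:k]))

def ndcg_at_k(relevances, k):
    """DCG normalized by the DCG of the ideal (sorted) ranking."""
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0

# Example: relevance labels of the ranked list returned for one query.
print(ndcg_at_k([3, 2, 3, 0, 1, 2], k=5))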
3
Online Learning to Rank
Goal: find an optimal combination of features.
Features: BM25, LM, PageRank...
Learn from interactive feedback in real time: training and testing happen at the same time.
No need to compute MAP, NDCG, MRR...
The ranking adapts and improves as user preferences change.
Feedback signals: mouse clicks, mouse movement.
Behavior-based metrics: clicks per query, time to first click, time to last click, abandonment rate, reformulation rate.
4
Ranker: a weight vector over the retrieval features, e.g. BM25 = 1.3, LM = 0.8, PageRank = 2.5, ...
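A minimal sketch of how such a linear ranker scores and orders documents (the document feature values below are illustrative, not from the slides):

import numpy as np

# Weights of the current ranker, one per feature (values as on the slide).
FEATURES = ["BM25", "LM", "PageRank"]
w = np.array([1.3, 0.8, 2.5])

def rank(doc_features, weights):
    """Score each document as a weighted sum of its feature values; sort descending."""
    scores = doc_features @ weights
    return np.argsort(-scores)

# Example: rows are documents, columns follow FEATURES.
docs = np.array([
    [0.2, 0.1, 0.9],
    [0.7, 0.4, 0.1],
    [0.5, 0.6, 0.3],
])
print(rank(docs, w))  # indices of documents in ranked order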
5
How to generate a new ranker without calculation?
Ranker weights: BM25 = 1.3, LM = 0.8, PageRank = 2.5, ...
How can we generate a new ranker without computing offline metrics (MAP, NDCG, MRR)?
6
How to generate a new ranker without calculation? Random!
Ranker weights: BM25 = 1.3, LM = 0.8, PageRank = 2.5, ...
How to generate a new ranker without calculation? Perturb the current weights in a random direction!
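"Random" here means perturbing the current weight vector in a uniformly random direction instead of optimizing an offline metric. A minimal sketch of that step, assuming an exploration step size delta (𝛿):

import numpy as np

def propose_challenger(w, delta, rng=np.random.default_rng()):
    """Sample a uniformly random unit vector u and return the perturbed ranker w + delta * u."""
    u = rng.standard_normal(len(w))
    u /= np.linalg.norm(u)          # normalize to a random direction on the unit sphere
    return w + delta * u, u

w = np.array([1.3, 0.8, 2.5])       # current ranker weights from the slide
w_prime, u = propose_challenger(w, delta=0.5)
print(w_prime)                       # candidate ranker to compare against w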
7
Dueling Bandit Gradient Descent
8
Which Ranker is better?
9
Team-Draft Interleave
Which ranker is better? Use Team-Draft Interleaving: merge the two result lists into one, show it to the user, and credit each click to the ranker that contributed the clicked document.
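A minimal sketch of Team-Draft Interleaving as described above: the two rankings take turns contributing their next unused document (ties in pick counts broken by a coin flip), and clicks are credited to the team that contributed the clicked document. The helper names are mine, not from the presentation:

import random

def team_draft_interleave(ranking_a, ranking_b, rng=random):
    """Merge two rankings; each round the team with fewer picks (ties broken
    randomly) contributes its highest-ranked document not yet shown."""
    interleaved, team_of = [], {}
    counts = {"A": 0, "B": 0}
    pool = {"A": list(ranking_a), "B": list(ranking_b)}
    remaining = set(ranking_a) | set(ranking_b)
    while remaining:
        if counts["A"] != counts["B"]:
            first = "A" if counts["A"] < counts["B"] else "B"
        else:
            first = rng.choice(["A", "B"])
        for team in (first, "B" if first == "A" else "A"):
            doc = next((d for d in pool[team] if d in remaining), None)
            if doc is not None:
                interleaved.append(doc)
                team_of[doc] = team
                counts[team] += 1
                remaining.discard(doc)
                break
    return interleaved, team_of

def duel_winner(team_of, clicked_docs):
    """The ranker whose team collected more clicks wins the comparison."""
    a = sum(1 for d in clicked_docs if team_of.get(d) == "A")
    b = sum(1 for d in clicked_docs if team_of.get(d) == "B")
    return "A" if a > b else "B" if b > a else "tie"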
10
Dueling Bandit Gradient Descent
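Putting the pieces together, a minimal sketch of the Dueling Bandit Gradient Descent loop under the setup above: propose a randomly perturbed challenger, let an interleaved comparison on a live query decide the duel, and step toward the challenger only if it wins. The duel callback is a hypothetical stand-in for the interleaving experiment:

import numpy as np

def dbgd(w0, duel, delta, gamma, num_iterations, rng=np.random.default_rng()):
    """Dueling Bandit Gradient Descent (sketch).

    w0:    initial ranker weights
    duel:  duel(w, w_prime) -> True if the challenger w_prime wins the
           interleaved comparison on the next user query (hypothetical callback)
    delta: exploration step size, gamma: exploitation (update) step size
    """
    w = np.asarray(w0, dtype=float)
    for _ in range(num_iterations):
        u = rng.standard_normal(len(w))
        u /= np.linalg.norm(u)                 # random unit direction
        w_prime = w + delta * u                # candidate (challenger) ranker
        if duel(w, w_prime):                   # user feedback decides the duel
            w = w + gamma * u                  # move toward the winning direction
    return w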
11
Experiment Result
Making multiple comparisons per update has no measurable impact on performance. However, sampling multiple queries per update is very realistic, since a search system might be constrained to, e.g., making daily updates to its ranking function. Performance on the validation and test sets closely follows training-set performance (so those results are omitted), which implies the method is not overfitting.
12
How to choose 𝛿 and 𝛾 ?
13
How to choose 𝛿 and 𝛾? [Figure: the average (across all iterations) and final training performance for different choices of 𝛿 and 𝛾.]
14
Theoretical Analysis Regret formulation
Regret: the performance gap between the proposed algorithm and the best possible ranker. A good algorithm should achieve regret that is sublinear in T, which implies a decreasing average (per-step) regret.
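Written out (as I recall the paper's formulation; a sketch, with w* denoting the best ranker and w_t, w'_t the two rankers interleaved at step t):

% \epsilon(w, w') = P(w \succ w') - \tfrac{1}{2} is the margin by which w beats w'
\Delta_T \;=\; \sum_{t=1}^{T} \Big( \epsilon(w^*, w_t) + \epsilon(w^*, w'_t) \Big),
\qquad
\text{sublinear regret: } \lim_{T\to\infty} \frac{\mathbb{E}[\Delta_T]}{T} = 0 .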
15
Regret Bound Lipschitz condition
Choosing 𝛿 and 𝛾 to achieve the regret bound requires knowledge of the Lipschitz constant L, which is typically not known in practical settings. Still, sublinear regret is achievable using many choices for 𝛿 and 𝛾.
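For reference, a hedged sketch of the shape of the guarantee (the exact constants, which depend on the dimension and the Lipschitz constant L, are omitted because I am not certain of them):

\mathbb{E}[\Delta_T] \;=\; O\!\big(T^{3/4}\big),
\qquad\text{hence}\qquad
\frac{\mathbb{E}[\Delta_T]}{T} \;=\; O\!\big(T^{-1/4}\big) \;\longrightarrow\; 0 .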
17
Limitations: not efficient enough
A single random exploration vector per update leads to a high-variance gradient estimate.
19
Limitations: not efficient enough
Historical exploration is not taken into account; a single random exploration vector per update leads to a high-variance gradient estimate.
20
Questions?