1 Interactively Optimizing Information Retrieval Systems as a Dueling Bandits Problem
Tingdan Luo (tl3xd@virginia.edu), 05/02/2016

2 Offline Learning to Rank
Goal: find an optimal combination of features (e.g., BM25, LM, PageRank).
Offline evaluation metrics: MAP, NDCG, MRR (NDCG is sketched below).
Downsides:
- Requires a train/test split and a human-annotated dataset, which can be small and expensive to build.
- The relevance of documents to queries can change over time (e.g., news search).
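As a reminder of what these offline metrics compute, here is a minimal sketch of NDCG@k, one of the metrics listed above; the relevance grades and cutoff in the example are illustrative, not from the slides.

```python
import math

def dcg_at_k(relevances, k):
    """Discounted cumulative gain over the top-k graded relevance labels."""
    return sum((2 ** rel - 1) / math.log2(i + 2)
               for i, rel in enumerate(relevances[:k]))

def ndcg_at_k(relevances, k):
    """NDCG@k: DCG of the ranking divided by DCG of the ideal (sorted) ranking."""
    ideal_dcg = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal_dcg if ideal_dcg > 0 else 0.0

# Example: graded relevance of the top 5 results returned for one query.
print(ndcg_at_k([3, 2, 3, 0, 1], k=5))
```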

3 Online Learning to Rank
Goal: find an optimal combination of features (e.g., BM25, LM, PageRank).
- Learn from interactive user feedback in real time; training and evaluation happen at the same time.
- No need to compute MAP, NDCG, or MRR offline.
- Adapt and improve results as user preferences change.
- Feedback signals: mouse clicks, mouse movement.
- Behavior-based metrics: clicks per query, time to first click, time to last click, abandonment rate, reformulation rate.

4 Ranker
A ranker is a weighted combination of features, e.g. BM25 = 1.3, LM = 0.8, PageRank = 2.5 (scoring sketched below).
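Reading the slide's weights as a linear ranker: each document's score is the dot product of the weight vector with its feature vector, and documents are sorted by score. A minimal sketch; only the weights come from the slide, the document feature values are made up for illustration.

```python
weights = {"BM25": 1.3, "LM": 0.8, "PageRank": 2.5}  # weights from the slide

def score(doc_features, weights):
    """Linear ranker: weighted sum of the document's feature values."""
    return sum(weights[name] * value for name, value in doc_features.items())

def rank(docs, weights):
    """Sort documents by descending ranker score."""
    return sorted(docs, key=lambda d: score(d["features"], weights), reverse=True)

# Hypothetical feature values for two documents retrieved for one query.
docs = [
    {"id": "d1", "features": {"BM25": 2.1, "LM": 1.7, "PageRank": 0.3}},
    {"id": "d2", "features": {"BM25": 1.5, "LM": 2.2, "PageRank": 0.9}},
]
print([d["id"] for d in rank(docs, weights)])
```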

5 How to generate a new ranker without calculation?
Current weights: BM25 = 1.3, LM = 0.8, PageRank = 2.5. We want to produce a new ranker without computing offline metrics (MAP, NDCG, MRR).

6 How to generate a new ranker without calculation? Random!
Current weights: BM25 = 1.3, LM = 0.8, PageRank = 2.5. Answer: perturb the current weight vector in a random direction (see the sketch below).
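A minimal sketch of the "random" step, assuming the DBGD-style perturbation: draw a random unit vector and step a distance 𝛿 away from the current weights to obtain a candidate ranker. The 𝛿 value below is illustrative.

```python
import numpy as np

def random_unit_vector(dim, rng):
    """Uniformly random direction on the unit sphere."""
    u = rng.standard_normal(dim)
    return u / np.linalg.norm(u)

def candidate_ranker(w, delta, rng):
    """Perturb the current weight vector by step size delta in a random direction."""
    return w + delta * random_unit_vector(len(w), rng)

rng = np.random.default_rng(0)
w = np.array([1.3, 0.8, 2.5])               # current weights: BM25, LM, PageRank
w_candidate = candidate_ranker(w, delta=0.5, rng=rng)
print(w_candidate)
```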

7 Dueling Bandit Gradient Descent

8 Which Ranker is better?

9 Team-Draft Interleave
To decide which ranker is better, interleave the two result lists with team-draft interleaving and credit user clicks to the ranker that contributed each clicked document (a sketch follows).
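A minimal sketch of team-draft interleaving, assuming the two rankers' results are given as lists of document ids: in each round the drafting order is randomized, each team contributes its highest-ranked document not yet shown, and clicks are credited to the contributing team.

```python
import random

def team_draft_interleave(ranking_a, ranking_b, rng=random):
    """Interleave two rankings; return the merged list and each document's team."""
    interleaved, team_of, seen = [], {}, set()
    ia = ib = 0
    while ia < len(ranking_a) or ib < len(ranking_b):
        # Randomly decide which ranker drafts first in this round.
        order = ["A", "B"] if rng.random() < 0.5 else ["B", "A"]
        for team in order:
            ranking, idx = (ranking_a, ia) if team == "A" else (ranking_b, ib)
            # Skip documents already contributed by the other team.
            while idx < len(ranking) and ranking[idx] in seen:
                idx += 1
            if idx < len(ranking):
                doc = ranking[idx]
                interleaved.append(doc)
                team_of[doc] = team
                seen.add(doc)
                idx += 1
            if team == "A":
                ia = idx
            else:
                ib = idx
    return interleaved, team_of

def winner(team_of, clicked_docs):
    """The ranker whose contributed documents received more clicks wins the duel."""
    a = sum(1 for d in clicked_docs if team_of.get(d) == "A")
    b = sum(1 for d in clicked_docs if team_of.get(d) == "B")
    return "A" if a > b else "B" if b > a else "tie"
```

Because the drafting order is randomized in every round, neither ranker gets a systematic position advantage, so the click counts give an unbiased preference signal between the two rankers.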

10 Dueling Bandit Gradient Descent
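Putting the pieces together, a sketch of the Dueling Bandit Gradient Descent loop: compare the current ranker against a randomly perturbed candidate via an interleaved comparison, and step a distance 𝛾 in the exploration direction only if the candidate wins. The duel function is a placeholder for the team-draft comparison on live queries, and the projection of the weights back onto the feasible set is omitted for brevity.

```python
import numpy as np

def dbgd(w0, delta, gamma, duel, num_iterations, seed=0):
    """Dueling Bandit Gradient Descent (sketch).

    duel(w, w_candidate) should return True if the candidate wins the
    interleaved comparison on a live query (placeholder for the real system).
    """
    rng = np.random.default_rng(seed)
    w = np.asarray(w0, dtype=float)
    for _ in range(num_iterations):
        u = rng.standard_normal(len(w))
        u /= np.linalg.norm(u)            # random exploration direction
        w_candidate = w + delta * u       # exploratory ranker
        if duel(w, w_candidate):          # e.g., team-draft interleaving + clicks
            w = w + gamma * u             # move toward the winning direction
    return w
```

Here 𝛿 controls how far the algorithm explores away from the current ranker, while 𝛾 controls how large a step it takes when the candidate wins, which is why the choice of these two parameters is studied later in the talk.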

11 Experiment Result
Making multiple comparisons per update has no impact on performance. However, sampling multiple queries per update is realistic, since a search system might be constrained to, e.g., making daily updates to its ranking function. Performance on the validation and test sets closely follows training-set performance (so those results are omitted), which implies that the method is not overfitting.

12 How to choose 𝛿 and 𝛾?

13 How to choose 𝛿 and 𝛾?
The slide compares the average (across all iterations) and the final training performance for different choices of 𝛿 and 𝛾.

14 Theoretical Analysis
Regret formulation. Regret is the performance gap between the rankers the algorithm presents and the optimal ranker. A good algorithm should achieve regret that is sublinear in T, which implies that the average regret per comparison decreases toward zero (a sketch of the formulation follows).
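A LaTeX sketch of the regret as it is commonly written in this dueling-bandits setting (notation assumed: w* is the best ranker, w_t the ranker used at step t, w_t' the candidate it duels against, and ε(w, w') the advantage of w over w' in a pairwise comparison).

```latex
% Cumulative regret over T interleaved comparisons (standard dueling-bandits form)
\Delta_T \;=\; \sum_{t=1}^{T} \Big( \epsilon(w^{*}, w_t) + \epsilon(w^{*}, w_t') \Big),
\qquad \epsilon(w, w') \;=\; P(w \succ w') - \tfrac{1}{2}.
% Sublinear regret, \Delta_T = o(T), means the average regret \Delta_T / T \to 0.
```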

15 Regret Bound
Lipschitz condition. Choosing 𝛿 and 𝛾 to achieve the stated regret bound requires knowledge of problem parameters such as the Lipschitz constant L, which are typically not known in practical settings. Fortunately, sublinear regret is still achievable for many choices of 𝛿 and 𝛾 (one way to state the Lipschitz condition is sketched below).
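One common way to write the Lipschitz condition the slide refers to, as a sketch; the paper's exact assumptions and constants may differ.

```latex
% L-Lipschitz assumption on the comparison probabilities (sketch)
|\epsilon(w_1, w_3) - \epsilon(w_2, w_3)| \;\le\; L \,\|w_1 - w_2\|
\quad \text{for all } w_1, w_2, w_3 \in \mathcal{W}.
% Under this and related smoothness assumptions, DBGD achieves expected regret
% that grows sublinearly in T (on the order of T^{3/4}).
```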

16

17 Limitations
Not efficient enough: a single random exploration vector per update gives a high-variance estimate of the gradient direction.

18

19 Limitations
Not efficient enough: the update does not take historical exploration directions into account, and a single random exploration vector per update gives a high-variance estimate of the gradient direction.

20 Questions?

