Online Rank Elicitation for Plackett- Luce: A Dueling Bandits Approach Balázs Szörényi Technion, Haifa, Israel / MTA-SZTE Research Group on Artificial.

Slides:

Advertisements

Similar presentations

Feature Selection as Relevant Information Encoding Naftali Tishby School of Computer Science and Engineering The Hebrew University, Jerusalem, Israel NIPS.

Advertisements

Size-estimation framework with applications to transitive closure and reachability Presented by Maxim Kalaev Edith Cohen AT&T Bell Labs 1996.

Reinforcement Learning (II.) Exercise Solutions Ata Kaban School of Computer Science University of Birmingham 2003.

Reinforcement Learning

Rutgers CS440, Fall 2003 Review session. Rutgers CS440, Fall 2003 Topics Final will cover the following topics (after midterm): 1.Uncertainty & introduction.

Beat the Mean Bandit Yisong Yue (CMU) & Thorsten Joachims (Cornell)

Acoustic design by simulated annealing algorithm

Meeting 3 POMDP (Partial Observability MDP) 資工四阮鶴鳴李運寰 Advisor: 李琳山教授.

Using Monte Carlo and Directional Sampling combined with an Adaptive Response Surface for system reliability evaluation L. Schueremans, D. Van Gemert

1 Reinforcement Learning Introduction & Passive Learning Alan Fern * Based in part on slides by Daniel Weld.

Matchin: Eliciting User Preferences with an Online Game Severin Hacker, and Luis von Ahn Carnegie Mellon University SIGCHI 2009.

Artificial Intelligence Lecture 2 Dr. Bo Yuan, Professor Department of Computer Science and Engineering Shanghai Jiaotong University

Iowa State University Department of Computer Science Artificial Intelligence Research Laboratory Research supported in part by grants from the National.

Lecture: Dudu Yanay.  Input: Each instance is associated with a rank or a rating, i.e. an integer from ‘1’ to ‘K’.  Goal: To find a rank-prediction.

Iterative Optimization of Hierarchical Clusterings Doug Fisher Department of Computer Science, Vanderbilt University Journal of Artificial Intelligence.

XYZ 6/18/2015 MIT Brain and Cognitive Sciences Convergence Analysis of Reinforcement Learning Agents Srinivas Turaga th March, 2004.

Introduction to Boosting Aristotelis Tsirigos SCLT seminar - NYU Computer Science.

Josu Ceberio Alexander Mendiburu Jose A. Lozano

Learning HMM parameters Sushmita Roy BMI/CS 576 Oct 21 st, 2014.

Visual Recognition Tutorial1 Markov models Hidden Markov models Forward/Backward algorithm Viterbi algorithm Baum-Welch estimation algorithm Hidden.

Image Registration of Very Large Images via Genetic Programming Sarit Chicotay Omid E. David Nathan S. Netanyahu CVPR ‘14 Workshop on Registration of Very.

Determining the Significance of Item Order In Randomized Problem Sets Zachary A. Pardos, Neil T. Heffernan Worcester Polytechnic Institute Department of.

CS 4100 Artificial Intelligence Prof. C. Hafner Class Notes April 3, 2012.

Combined Lecture CS621: Artificial Intelligence (lecture 25) CS626/449: Speech-NLP-Web/Topics-in- AI (lecture 26) Pushpak Bhattacharyya Computer Science.

Extracting Places and Activities from GPS Traces Using Hierarchical Conditional Random Fields Yong-Joong Kim Dept. of Computer Science Yonsei.

Introduction to Adaptive Digital Filters Algorithms

6. Experimental Analysis Visible Boltzmann machine with higher-order potentials: Conditional random field (CRF): Exponential random graph model (ERGM):

Surface Simplification Using Quadric Error Metrics Michael Garland Paul S. Heckbert.

1 Dr. Itamar Arel College of Engineering Electrical Engineering & Computer Science Department The University of Tennessee Fall 2009 August 24, 2009 ECE-517:

Reinforcement Learning (II.) Exercise Solutions Ata Kaban School of Computer Science University of Birmingham.

Fundamentals of Hidden Markov Model Mehmet Yunus Dönmez.

Trust-Aware Optimal Crowdsourcing With Budget Constraint Xiangyang Liu 1, He He 2, and John S. Baras 1 1 Institute for Systems Research and Department.

CIKM’09 Date:2010/8/24 Advisor: Dr. Koh, Jia-Ling Speaker: Lin, Yi-Jhen 1.

Energy-Aware Scheduling with Quality of Surveillance Guarantee in Wireless Sensor Networks Jaehoon Jeong, Sarah Sharafkandi and David H.C. Du Dept. of.

Treatment Learning: Implementation and Application Ying Hu Electrical & Computer Engineering University of British Columbia.

1 ECE-517 Reinforcement Learning in Artificial Intelligence Lecture 7: Finite Horizon MDPs, Dynamic Programming Dr. Itamar Arel College of Engineering.

Bayesian inference for Plackett-Luce ranking models

Mixture of Gaussians This is a probability distribution for random variables or N-D vectors such as… –intensity of an object in a gray scale image –color.

Gibbs sampling for motif finding Yves Moreau. 2 Overview Markov Chain Monte Carlo Gibbs sampling Motif finding in cis-regulatory DNA Biclustering microarray.

Michael Isard and Andrew Blake, IJCV 1998 Presented by Wen Li Department of Computer Science & Engineering Texas A&M University.

Learning to Detect Events with Markov-Modulated Poisson Processes Ihler, Hutchins and Smyth (2007)

Channel Independent Viterbi Algorithm (CIVA) for Blind Sequence Detection with Near MLSE Performance Xiaohua(Edward) Li State Univ. of New York at Binghamton.

The generalization of Bayes for continuous densities is that we have some density f(y|  ) where y and  are vectors of data and parameters with  being.

The Markov Chain Monte Carlo Method Isabelle Stanton May 8, 2008 Theory Lunch.

29 August 2013 Venkat Naïve Bayesian on CDF Pair Scores.

Kernels of Mallows Models for Solving Permutation-based Problems

NTU & MSRA Ming-Feng Tsai

1 Chapter 8: Model Inference and Averaging Presented by Hui Fang.

Introduction to Sampling Methods Qi Zhao Oct.27,2004.

Reinforcement Learning: Learning algorithms Yishay Mansour Tel-Aviv University.

Predicting Consensus Ranking in Crowdsourced Setting Xi Chen Mentors: Paul Bennett and Eric Horvitz Collaborator: Kevyn Collins-Thompson Machine Learning.

1 Representation and Evolution of Lego-based Assemblies Maxim Peysakhov William C. Regli ( Drexel University) Authors: {umpeysak,

1Computer Sciences Department. 2 QUICKSORT QUICKSORT TUTORIAL 5.

Visual Recognition Tutorial1 Markov models Hidden Markov models Forward/Backward algorithm Viterbi algorithm Baum-Welch estimation algorithm Hidden.

A Collapsed Variational Bayesian Inference Algorithm for Latent Dirichlet Allocation Yee W. Teh, David Newman and Max Welling Published on NIPS 2006 Discussion.

A Method to Approximate the Bayesian Posterior Distribution in Singular Learning Machines Kenji Nagata, Sumio Watanabe Tokyo Institute of Technology.

ICPR2004 (24 July, 2004, Cambridge) 1 Probabilistic image processing based on the Q-Ising model by means of the mean- field method and loopy belief propagation.

Ranking: Compare, Don’t Score Ammar Ammar, Devavrat Shah (LIDS – MIT) Poster ( No preprint), WIDS 2011.

Eick: Introduction Machine Learning

Tingdan Luo 05/02/2016 Interactively Optimizing Information Retrieval Systems as a Dueling Bandits Problem Tingdan Luo

COMP61011 : Machine Learning Ensemble Models

Representation and Evolution of Lego-based Assemblies

Hidden Markov Models Part 2: Algorithms

Aviv Rosenberg 10/01/18 Seminar on Experts and Bandits

Topic models for corpora and for graphs

Neural Networks Geoff Hulten.

Emna Krichene 1, Youssef Masmoudi 1, Adel M

Christoph F. Eick: A Gentle Introduction to Machine Learning

RANDOM NUMBERS SET # 1:

CHAPTER 11 REINFORCEMENT LEARNING VIA TEMPORAL DIFFERENCES

Presentation transcript:

Online Rank Elicitation for Plackett- Luce: A Dueling Bandits Approach Balázs Szörényi Technion, Haifa, Israel / MTA-SZTE Research Group on Artificial Intelligence, Hungary Róbert Busa-Fekete, Adil Paul, Eyke Hüllermeier Department of Computer Science, University of Paderborn Paderborn, Germany Twenty-ninth Annual Conference on Neural Information Processing Systems (NIPS 2015)

Problem: Rank Elicitation from pairwise preferences > > >

Moving to online setting

The online ranking problem

Stochastic transitivity assumptions

Connection with sorting algorithm  Naively apply a sorting algorithm as sampling scheme  Since all the pairwise comparisons are stochastic  A random order will be produced  What can we say about the optimality of such an order? >> > > >

Contributions

Preliminaries

Dueling Bandit Framework Continue or terminate? Prediction

Estimation of pairwise probability

The Plackett-Luce Model

Properties of PL Model  The marginal probabilities are easy to calculate  Satisfies the stochastic transitivity  Most probable ranking:  simply sort the items according to their skill parameters

Harmony of QuickSort and Plackett-Luce model

Budgeted QuickSort algorithm

Random tree of QuickSort algorithm

Random tree of Budgeted QuickSort

Pairwise Stability of Budget QuickSort

Proof sketch of Proposition 2

First Goal of learner: PAC-item

PLPAC Algorithm Goal: Finding the PAC item In each iteration,  Generate a partial ranking (line 6)  Translate ranking into pairwise comparisons  Update the estimates of marginal

PLPAC Algorithm

Sample Complexity analysis of PLPAC

Sample Complexity analysis of PLPAC

Second goal of learner: AMPR

PLPAC-AMPR algorithm

PLPAC-AMPR

PLPAC-AMPR algorithm

Sample Complexity Analysis of PLPAC- AMPR

Experiments – The PAC-item Problem

Experiment – The AMPR Problem

Conclusion