
Multi-armed Bandit Problems. Jian Li, Institute for Interdisciplinary Information Sciences, Tsinghua University. WAIM 2014

Decision making with limited information
An "algorithm" that we use every day:
- Initially, nothing (or little) is known
- Explore (to gain a better understanding)
- Exploit (make your decision)
Balance between exploration and exploitation:
- We would like to explore widely so that we do not miss really good choices
- We do not want to waste too many resources exploring bad choices (equivalently: we want to identify good choices as quickly as possible)
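To make the balance concrete, here is a minimal epsilon-greedy sketch (mine, not from the slides): with a small probability, explore a uniformly random arm; otherwise exploit the arm with the best empirical mean so far. The pull function is a hypothetical stand-in for playing an arm once.

    import random

    def epsilon_greedy(pull, n_arms, horizon, eps=0.1):
        # pull(i) is assumed to return a stochastic reward for arm i.
        counts = [0] * n_arms
        sums = [0.0] * n_arms
        for _ in range(horizon):
            if 0 in counts or random.random() < eps:
                i = random.randrange(n_arms)  # explore: try a random arm
            else:
                # exploit: play the arm with the largest empirical mean
                i = max(range(n_arms), key=lambda a: sums[a] / counts[a])
            counts[i] += 1
            sums[i] += pull(i)
        return [s / c for s, c in zip(sums, counts)]  # empirical means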

The Stochastic Multi-armed Bandit
One of the classic problems in stochastic control, stochastic optimization, and online learning.

The Stochastic Multi-armed Bandit
Stochastic Multi-armed Bandit (MAB): usually refers to a class of problems (with many variations).
Problem 1 (Top-K arm identification):
- You can take N samples. A sample: choose an arm, play it once, and observe the reward.
- (Approximately) identify the best K arms (the arms with the largest means).
- Goal: minimize N.
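A minimal sketch of the sampling model (the class and names are mine): one sample = one call to play(), and the algorithm is charged for the total number of calls N.

    import random

    class BernoulliArm:
        # One arm with unknown mean p: each play returns reward 1
        # with probability p and reward 0 otherwise.
        def __init__(self, p):
            self.p = p
        def play(self):
            return 1 if random.random() < self.p else 0

    # Problem 1 in this notation: using as few calls to play() as
    # possible (N in total), output the K arms with the largest means.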

Motivating Applications
Wide applications: clinical trials (Thompson, 1933), sequential experimental design, evolutionary computing, online learning, stochastic network optimization.
Motivating application here: crowdsourcing.

Motivating Applications
Test a worker with a golden task (a task whose true answer is already known) and obtain a binary-valued sample (correct/wrong).

Evaluation Metric
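The formula on this slide did not survive the transcript. Assuming the standard metric for top-K identification (my reconstruction, not verbatim from the slide), the quality of a chosen set T of K arms is its aggregate regret relative to the true top-K set T^*:

    L_T = \frac{1}{K} \left( \sum_{i \in T^*} \theta_i - \sum_{i \in T} \theta_i \right)

where \theta_i denotes the mean of arm i. The goal is then L_T \le \epsilon with probability at least 1 - \delta.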

Simplification
Assume Bernoulli distributions from now on: think of a collection of biased coins. Try to (approximately) find the K coins with the largest bias (towards heads).

Why regret? An alternative objective is the misidentification probability (Bubeck et al., ICML'13). Consider the case K=1.

Naïve Solution

Uniform Sampling
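The slide body is lost; below is a sketch of the natural baseline under the BernoulliArm model above: split the sample budget evenly across all arms, then keep the K largest empirical means.

    def uniform_sampling(arms, K, budget):
        # Play every arm the same number of times.
        per_arm = budget // len(arms)
        means = [sum(a.play() for _ in range(per_arm)) / per_arm
                 for a in arms]
        # Return the indices of the K largest empirical means, e.g.
        # uniform_sampling([BernoulliArm(p) for p in [0.9, 0.5, 0.5]], 1, 3000)
        order = sorted(range(len(arms)), key=means.__getitem__, reverse=True)
        return order[:K]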

Can we do better? Consider the following example: 0.9, 0.5, 0.5, …, 0.5 (a million coins with mean 0.5).
- Uniform sampling spends too many samples on bad coins; we should spend more samples on good coins. However, we do not know in advance which coins are good and which are bad.
- Sample each coin M = O(1) times. If the empirical mean of a coin is large, we DO NOT yet know whether it is good or bad. But if the empirical mean of a coin is very small, we DO know it is bad (with high probability).

Optimal Multiple Arm Identification (OptMAI)
Eliminate the quarter of the arms with the lowest empirical means.

Quartile-Elimination
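A hedged sketch of the quartile-elimination idea as I read the previous slide (a paraphrase, not the paper's exact OptMAI pseudocode; the per-round sample count is a placeholder):

    def quartile_elimination(arms, K, samples_per_round=100):
        def empirical_means(indices):
            return {i: sum(arms[i].play() for _ in range(samples_per_round))
                        / samples_per_round
                    for i in indices}
        survivors = list(range(len(arms)))
        while len(survivors) > max(K, 4):
            means = empirical_means(survivors)
            survivors.sort(key=means.__getitem__, reverse=True)
            # Drop the quarter with the lowest empirical means,
            # but never drop below K survivors.
            survivors = survivors[:max(K, len(survivors) - len(survivors) // 4)]
        means = empirical_means(survivors)  # final pass over the survivors
        return sorted(survivors, key=means.__getitem__, reverse=True)[:K]

In this sketch each round shrinks the survivor set by roughly a 3/4 factor, so the total number of samples forms a geometric series and stays linear in the number of arms.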

Sample Complexity
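The bound itself is lost from the transcript. From what I recall of the companion ICML'14 paper on this algorithm (the exact form should be double-checked against the paper), OptMAI achieves aggregate regret at most \epsilon with probability at least 1 - \delta over n arms using

    N = O\left( \frac{n}{\epsilon^2} \left( 1 + \frac{\ln(1/\delta)}{K} \right) \right)

samples, and the lower bound on the next slide matches this up to constant factors.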

Matching Lower Bounds


Experiments

Simulated Experiment

Simulated Data

Real Data
RTE data for textual entailment (Snow et al., '08): 800 binary labeling tasks with true labels, 164 workers.

Real Data
Crowdsourcing constraint: it is infeasible to assign too many tasks to a single worker.

Real Data

Multi-armed Bandit
The classic setting plays 1 arm each time; a natural variant plays K arms each time.

Multi-armed Bandit
There are many variations of MAB:
- Reward models: stochastic, adversarial, Markovian
- Side information: contextual bandits
- Several meaningful objective functions: regret, expected regret, pseudo-regret
See the comprehensive survey by Bubeck and Cesa-Bianchi.

Multi-armed Bandit
An application in ranking documents:
- A set of documents D = {d_1, …, d_n} and a population of users; a user of type i clicks d_j with probability p_ij.
- Each time, the system presents a top-k list to the user; the user clicks the first document she likes.
- Objective: maximize E[#users who click at least once].
Learning diverse rankings with multi-armed bandits. Radlinski, Kleinberg and Joachims, ICML'08.
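A compact sketch of the ranked-bandit idea in that paper as I understand it: one bandit instance per rank position (plain epsilon-greedy stands in here for the per-position bandit), and only the position whose document was actually clicked receives reward 1. The click function is a hypothetical environment call returning the position of the first liked document, or None.

    import random

    def ranked_bandits(n_docs, k, click, horizon, eps=0.1):
        counts = [[0] * n_docs for _ in range(k)]   # per-position play counts
        sums = [[0.0] * n_docs for _ in range(k)]   # per-position reward sums
        ranking = []
        for _ in range(horizon):
            ranking, used = [], set()
            for pos in range(k):
                fresh = [d for d in range(n_docs) if d not in used]
                unseen = [d for d in fresh if counts[pos][d] == 0]
                if unseen or random.random() < eps:
                    doc = random.choice(unseen or fresh)   # explore
                else:
                    # exploit: best empirical click rate at this position
                    doc = max(fresh, key=lambda d: sums[pos][d] / counts[pos][d])
                ranking.append(doc)
                used.add(doc)
            clicked = click(ranking)  # position of first liked doc, or None
            for pos, doc in enumerate(ranking):
                counts[pos][doc] += 1
                sums[pos][doc] += 1.0 if pos == clicked else 0.0
        return ranking

Crediting only the clicked position pushes different positions toward different documents, which is how the algorithm learns a diverse ranking.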

Final Remarks
MAB is a very powerful framework for describing sequential decision problems with limited information, and it has found many applications in several areas. I am not aware of much work in the DB community, except an application in data cleaning [Cheng et al., VLDB'10]. It might be useful in your own research problems!

Thanks.