Blinded Bandits
Ofer Dekel, Elad Hazan, Tomer Koren
NIPS 2014 (presented yesterday)
Overview
- Online learning setting with bandit feedback
- No feedback when we switch actions
- The "Blinded" Multi-Armed Bandit
Online Learning
Regret, measured against the best fixed action in hindsight:
R_T = \sum_{t=1}^{T} \ell_t(x_t) - \min_{x \in [K]} \sum_{t=1}^{T} \ell_t(x)
Oblivious vs. non-oblivious (adaptive) adversaries
- Oblivious: the loss depends only on the current action, \ell_t(x_t)
- Non-oblivious: the loss may also depend on the player's past actions. Simple non-oblivious costs:
  - Switching: \ell_t(x_t) + \mathbb{1}\{x_t \neq x_{t-1}\}
  - m-memory: \ell_t(x_{t-m}, \dots, x_t)
  - Max: \max_{0 \le i \le m} \ell_t(x_{t-i})
  - Average: \frac{1}{m+1} \sum_{i=0}^{m} \ell_t(x_{t-i})
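A small numeric illustration of the cost types above, as a Python sketch. The losses, action sequence, and the toy values of T, K, m are all made up for the example; the point is that these costs are trivial to compute in hindsight, while the player must commit to actions online.

```python
import numpy as np

rng = np.random.default_rng(0)
T, K, m = 8, 3, 2
losses = rng.uniform(size=(T, K))        # oblivious base losses l_t(x)
actions = rng.integers(K, size=T)        # some action sequence x_1..x_T

base = losses[np.arange(T), actions]     # oblivious cost: l_t(x_t)

switching = base.copy()
switching[1:] += (actions[1:] != actions[:-1])   # +1 whenever the player switches

# Composite losses over the window of the last m+1 actions (defined for t >= m):
# row i holds l_t(x_{t-i}) for i = 0..m.
window = np.stack([losses[np.arange(m, T), actions[m - i:T - i]]
                   for i in range(m + 1)])
max_loss = window.max(axis=0)            # "Max" composite cost
avg_loss = window.mean(axis=0)           # "Average" composite cost
```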
Review of works discussed in class
[1] Weighted Majority: N. Littlestone and M. K. Warmuth. The weighted majority algorithm. Information and Computation, 1994.
[2] Follow-The-Perturbed-Leader: A. Kalai and S. Vempala. Efficient algorithms for online decision problems. Journal of Computer and System Sciences, 2005.
[3] EXP3: P. Auer, N. Cesa-Bianchi, Y. Freund, and R. E. Schapire. The nonstochastic multiarmed bandit problem. SIAM Journal on Computing, 2002.
[4] Switching Costs: Dekel et al. Bandits with switching costs: T^{2/3} regret. STOC 2014.
[5] Linear Composite Costs: Dekel et al. Online learning with composite loss functions. 2014.
(Diagram on slide: the above works grouped by feedback type, Bandit vs. Full-Information.)
Reminder: A Bandit Game
The EXP3 algorithm [3] (Auer et al. The nonstochastic multiarmed bandit problem, 2002).
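As a concrete reminder, here is a minimal Python sketch of EXP3 under bandit feedback. The learning-rate tuning and the stability shift are standard textbook details, not taken from these slides, and the classical EXP3 of [3] additionally mixes in explicit uniform exploration, omitted here for brevity.

```python
import numpy as np

def exp3(losses, seed=0):
    """EXP3 on a T x K matrix of losses in [0, 1]; returns total loss incurred."""
    T, K = losses.shape
    eta = np.sqrt(2 * np.log(K) / (T * K))  # standard learning-rate tuning
    L_hat = np.zeros(K)                     # cumulative importance-weighted loss estimates
    total = 0.0
    rng = np.random.default_rng(seed)
    for t in range(T):
        w = np.exp(-eta * (L_hat - L_hat.min()))  # shift by min for numerical stability
        p = w / w.sum()                           # exponential-weights distribution
        x = rng.choice(K, p=p)                    # play one arm
        total += losses[t, x]                     # bandit feedback: only this loss is seen
        L_hat[x] += losses[t, x] / p[x]           # unbiased estimate of l_t(x)
    return total

# Toy run on random oblivious losses:
print(exp3(np.random.default_rng(1).uniform(size=(1000, 5))))
```

This achieves expected regret of order \sqrt{T K \log K} against an oblivious adversary.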
The Blinded Bandit
Same as the bandit game, except: on any round where the player switches actions (x_t \neq x_{t-1}), she receives no feedback at all. The loss \ell_t(x_t) is still incurred, just not observed.
(Proof on the board)
Blinded EXP3: The Guarantee
Blinded EXP3 attains expected regret of order \sqrt{T K \log K} in the blinded setting, matching EXP3's guarantee with full bandit feedback up to constant factors: blindness does not change the \sqrt{T} rate. Proofs on the board!
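To make the blinded feedback loop concrete, here is a minimal Python sketch of a simplified blinded variant of EXP3. The play-each-drawn-arm-for-two-rounds schedule is my own simplification for illustration; the paper's actual Blinded EXP3 schedules its switches differently, and this is not the authors' pseudocode. The point it illustrates: updates come only from rounds with no switch, where feedback is observed.

```python
import numpy as np

def blinded_exp3(losses, seed=0):
    """Simplified blinded play on a T x K matrix of losses in [0, 1].

    Each drawn arm is played for two consecutive rounds: the first round
    may involve a switch (so its feedback is lost: the player is blinded),
    while the second round never switches, so its loss is observed and
    used for an importance-weighted EXP3 update.
    """
    T, K = losses.shape
    eta = np.sqrt(2 * np.log(K) / (T * K))
    L_hat = np.zeros(K)
    total = 0.0
    rng = np.random.default_rng(seed)
    for t in range(0, T - 1, 2):
        w = np.exp(-eta * (L_hat - L_hat.min()))
        p = w / w.sum()
        x = rng.choice(K, p=p)
        total += losses[t, x]        # possible switch: loss incurred, feedback lost
        total += losses[t + 1, x]    # no switch: loss observed
        L_hat[x] += losses[t + 1, x] / p[x]  # update only from the observed round
    return total
```

Since each EXP3 step governs two actual rounds, this scheme pays at most a constant factor over standard EXP3, which is the qualitative message of the guarantee above.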