Regret to the Best vs. Regret to the Average
Eyal Even-Dar, Computer and Information Science, University of Pennsylvania
Collaborators: Michael Kearns (Penn), Yishay Mansour (Tel Aviv), Jenn Wortman (Penn)

The No-Regret Setting

Learner maintains a weighting over N experts
On each of T trials, learner observes payoffs for all N experts
– Payoff to the learner = weighted payoff
– Learner then dynamically adjusts weights
Let R_{i,T} be the cumulative payoff of expert i on some sequence of T trials
Let R_{A,T} be the cumulative payoff of learning algorithm A
Classical no-regret results: we can produce a learning algorithm A such that on any sequence of trials
R_{A,T} ≥ max_i R_{i,T} − sqrt(log(N) · T)
– No regret: the per-trial regret sqrt(log(N)/T) approaches 0 as T grows
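To make the protocol concrete, here is a minimal sketch in Python (the harness and all names are illustrative, not from the talk): an algorithm object exposes weights() and update(), and the harness tracks the cumulative payoffs R_{i,t} and R_{A,t}.

```python
import numpy as np

def run_experts(algo, payoffs):
    """Run one sequence. payoffs: T x N array of per-trial expert gains in [0, 1]."""
    T, N = payoffs.shape
    cum_experts = np.zeros(N)       # R_{i,t}: cumulative payoff of each expert
    cum_algo = 0.0                  # R_{A,t}: cumulative payoff of the algorithm
    for t in range(T):
        p = algo.weights()          # current weighting over the N experts
        cum_algo += p @ payoffs[t]  # payoff to the learner = weighted payoff
        cum_experts += payoffs[t]
        algo.update(payoffs[t])     # learner dynamically adjusts weights
    regret_to_best = cum_experts.max() - cum_algo
    regret_to_avg = cum_experts.mean() - cum_algo
    return regret_to_best, regret_to_avg
```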

This Work

We simultaneously examine:
– Regret to the best expert in hindsight
– Regret to the average return of all experts
Note that no learning is required to achieve just this: fixed uniform weights earn exactly the average on every sequence!
Why look at the average?
– A safety net or sanity check: if a learner falls below it, the trivial uniform-weights algorithm outperforms the learner
– Future direction: S&P 500 (an index as the natural "average")
We assume a fixed horizon T
– But this can easily be relaxed…

Our Results

Every difference-based algorithm with regret O(T^α) to the best expert has Ω(T^{1−α}) regret to the average
There exists a simple difference-based algorithm achieving this tradeoff
Every algorithm with O(T^{1/2}) regret to the best expert must have Ω(T^{1/2}) regret to the average
We can produce an algorithm with O(log(T) · T^{1/2}) regret to the best and O(1) regret to the average

Oscillations: The Cost of an Update

Consider 2 experts with instantaneous gains in {0,1}
Let w be the weight on the first expert and initialize w = ½
Suppose expert 1 gets a gain of 1 on the first time step, and expert 2 gets a gain of 1 on the second
Best, worst, and average all earn 1
Suppose the algorithm reacts to the gains (1,0) by updating w to w + Δ; it then earns 1 − w − Δ on the second step, when the gains are (0,1)
Algorithm earns w + (1 − w − Δ) = 1 − Δ
Regret to Best = Regret to Worst = Regret to Average = Δ
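A tiny numeric check of this cost (the update size Δ = 0.01 is an arbitrary illustrative choice):

```python
w, delta = 0.5, 0.01
earn1 = w * 1 + (1 - w) * 0   # trial 1, gains (1,0): algorithm earns w
w = w + delta                 # algorithm shifts weight toward expert 1
earn2 = w * 0 + (1 - w) * 1   # trial 2, gains (0,1): earns 1 - w - delta
print(earn1 + earn2)          # 0.99 = 1 - delta; best, worst, and average earned 1
```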

A Bad Sequence

Consider the following sequence
– Expert 1: 1,0,1,0,1,0,1,0,…,1,0
– Expert 2: 0,1,0,1,0,1,0,1,…,0,1
We can examine w over time for existing algorithms…
Follow the Perturbed Leader: ½, ½ + 1/(T(1+ln 2))^{1/2} − 1/(2T), ½, ½ + 1/(T(1+ln 2))^{1/2} − 1/(2T), ½, …
Weighted Majority: ½, ½ + (ln(2)/(2T))^{1/2} / (1 + (ln(2)/(2T))^{1/2}), ½, ½ + (ln(2)/(2T))^{1/2} / (1 + (ln(2)/(2T))^{1/2}), ½, …
Both oscillate with update size ~1/T^{1/2}, so over T trials both lose ~T^{1/2} to best, worst, and average

A Simple Trade-off: The Ω(T) Barrier

Again, consider 2 experts with instantaneous gains in {0,1}
Let w be the weight on the first expert and initialize w = ½
Will first examine algorithms that depend only on the cumulative difference in payoffs
– The insight holds more generally for aggressive updating
The argument behind the slide's diagram: on a run of L consecutive (1,0) steps, either w stays below 2/3 and the regret to the best exceeds L/3, or w climbs from ½ to 2/3 and some single update has size Δ_t > 1/(6L); in the latter case the adversary oscillates at that point, and over T steps the updates cost ~ (T/2) · (1/(6L)) = Ω(T/L) against the average
Regret to Best · Regret to Average ~ Ω(T)!

Exponential Weights [F94]

Let N be the number of experts
Unnormalized weight on expert i at time t: w_{i,t} = e^{η · R_{i,t}}
Define W_t = Σ_i w_{i,t}, so we have p_{i,t} = w_{i,t} / W_t
Setting η = O(1/T^{1/2}) achieves O(T^{1/2}) regret to the best
Setting η = O(1/T^{1/2+α}) achieves O(T^{1/2+α}) regret to the best
It can be shown that with η = O(1/T^{1/2+α}), the regret to the average is O(T^{1/2−α})
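A sketch of EW in the interface of the earlier harness, followed by a run on the bad alternating sequence (all names illustrative; the class is a straightforward reading of the formulas above):

```python
import numpy as np

class ExponentialWeights:
    """EW sketch: p_{i,t} proportional to exp(eta * R_{i,t})."""
    def __init__(self, n_experts, eta):
        self.eta = eta
        self.cum = np.zeros(n_experts)   # R_{i,t} for each expert
    def weights(self):
        w = np.exp(self.eta * (self.cum - self.cum.max()))  # stable w_{i,t}
        return w / w.sum()               # p_{i,t} = w_{i,t} / W_t
    def update(self, gains):
        self.cum += gains

# On the bad sequence, EW with eta ~ 1/sqrt(T) oscillates and loses on the
# order of sqrt(T) to best, worst, and average alike:
T = 10_000
payoffs = np.zeros((T, 2))
payoffs[0::2, 0] = 1                     # expert 1: 1,0,1,0,...
payoffs[1::2, 1] = 1                     # expert 2: 0,1,0,1,...
print(run_experts(ExponentialWeights(2, eta=1 / np.sqrt(T)), payoffs))
```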

So far…

[Figure: tradeoff plot of regret to best ~ T^x against regret to average ~ T^y, with axes marked at ½ and 1; cumulative-difference algorithms sit on the frontier x + y = 1]

An Unrestricted Lower Bound

Any algorithm achieving O(T^{1/2}) regret to the best must suffer Ω(T^{1/2}) regret to the average
Any algorithm achieving O((log(T) · T)^{1/2}) regret to the best must suffer Ω(T^ε) regret to the average, for some constant ε > 0
Not restricted to cumulative difference algorithms!

[Figure: the same tradeoff plot, now with a lower-bound curve for all algorithms alongside the x + y = 1 frontier for cumulative-difference algorithms]

A Simple Additive Algorithm

Once again, 2 experts with instantaneous gains in {0,1}, w initialized to ½
Let D_t be the difference in cumulative payoffs of the two experts at time t
The algorithm makes the following updates
– If expert gains are (0,0) or (1,1): no change to w
– If expert gains are (1,0): w ← w + Δ
– If expert gains are (0,1): w ← w − Δ
Assume we never reach w = 1
For any difference D_t = d we have w = ½ + dΔ
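The same rule as a plug-in for the earlier harness (a sketch; Δ is left as a parameter):

```python
import numpy as np

class AdditiveTwoExpert:
    """Additive update for two {0,1}-gain experts (sketch)."""
    def __init__(self, delta):
        self.delta = delta
        self.w = 0.5              # weight on expert 1; always 1/2 + delta * D_t
    def weights(self):
        return np.array([self.w, 1.0 - self.w])
    def update(self, gains):
        g1, g2 = gains
        if g1 > g2:               # gains (1,0)
            self.w += self.delta
        elif g2 > g1:             # gains (0,1)
            self.w -= self.delta
        # gains (0,0) or (1,1): no change to w
```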

Breaking the Ω(T) Barrier

While |D_t| < H
– (0,0) or (1,1): no change to w
– (1,0): w ← w + Δ
– (0,1): w ← w − Δ
Then play EW with a suitably tuned η
Will analyze what happens:
1. If we stay in the loop
2. If we exit the loop

Staying in the Loop

While |D_t| < H
– (0,0) or (1,1): no change to w
– (1,0): w ← w + Δ
– (0,1): w ← w − Δ
Observe R_{best,t} − R_{avg,t} < H, so it is enough to compute the regret to the average
As in the earlier diagram, each oscillation (w → w + Δ on gains (1,0), back down on (0,1)) loses Δ to both best and average
Regret to the Average: at most ΔT
Regret to the Best: at most H + ΔT

Exiting the Loop

While |D_t| < H: additive updates as above; upon exit, play EW
On each (1,0) step at distance d (per the slide's diagram), we lose 1 − w to the best but gain w − ½ = dΔ over the average
Upon exit from the loop:
– Regret to the best: still at most H + ΔT
– Gain over the average: Δ(1 + 2 + ⋯ + H) − ΔT ~ ΔH² − ΔT
So e.g. H = T^{2/3} and Δ = 1/T gives
– Regret to best: < T^{2/3} in loop or upon exit
– Regret to average: constant in loop; but gain T^{1/3} upon exit
Now EW's regret to the best is T^{2/3} and to the average T^{1/3}, and the T^{1/3} gain banked upon exit pays for EW's loss to the average
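A sketch of the combined scheme for two experts, reusing the ExponentialWeights sketch from above. H = T^{2/3} and Δ = 1/T follow the slide; the post-exit rate η = T^{−2/3} is an inferred tuning (the value that gives EW the stated T^{2/3} / T^{1/3} regrets), not stated explicitly in the talk:

```python
import numpy as np

def additive_then_ew(payoffs, H=None, delta=None):
    """Additive updates while |D_t| < H, then hand off to EW (sketch)."""
    T = len(payoffs)
    H = T ** (2 / 3) if H is None else H
    delta = 1 / T if delta is None else delta
    w, D, cum_algo = 0.5, 0.0, 0.0
    for t in range(T):
        if abs(D) >= H:                    # exit the loop: switch to EW
            ew = ExponentialWeights(2, eta=T ** (-2 / 3))  # inferred tuning
            for g in payoffs[t:]:
                g = np.asarray(g, dtype=float)
                cum_algo += ew.weights() @ g
                ew.update(g)
            break
        g1, g2 = payoffs[t]
        cum_algo += w * g1 + (1 - w) * g2  # additive-phase payoff
        D += g1 - g2                       # cumulative payoff difference
        w = 0.5 + delta * D                # same as the +/- delta updates
    return cum_algo
```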

[Figure: tradeoff plot updated with the new algorithm's point at x = 2/3 (regret to best ~ T^{2/3}, constant regret to the average), below the x + y = 1 frontier for cumulative-difference algorithms; the unrestricted lower bound for all algorithms remains at x = ½]

Obliterating the Ω(T) Barrier

Two refinements:
– Instead of playing the additive algorithm inside the loop, we can play EW with η = Δ = 1/T
– Instead of having one phase, we can have many
Set η = 1/T, k = log T
For i = 1 to k
– Reset and run EW with the current value of η until R_{best,t} − R_{avg,t} > H = O(T^{1/2})
– Set η ← η · 2
Reset and run EW with the final value of η
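A sketch of the phased scheme, again reusing the ExponentialWeights class. One assumption: I read each reset as also restarting the gap statistic R_best − R_avg, so the threshold test below uses per-phase payoffs:

```python
import numpy as np

def phased_ew(payoffs):
    """EW in phases: double eta and reset on each threshold crossing (sketch)."""
    payoffs = np.asarray(payoffs, dtype=float)
    T, N = payoffs.shape
    eta, H = 1.0 / T, np.sqrt(T)    # eta doubles at most k = log T times
    ew = ExponentialWeights(N, eta)
    cum_algo = 0.0
    phase = np.zeros(N)             # per-phase cumulative expert payoffs
    for t in range(T):
        cum_algo += ew.weights() @ payoffs[t]
        ew.update(payoffs[t])
        phase += payoffs[t]
        if phase.max() - phase.mean() > H and eta < 1.0:
            eta *= 2                # double eta, reset EW, start a new phase
            ew = ExponentialWeights(N, eta)
            phase = np.zeros(N)
    return cum_algo                 # the last phase runs with the final eta
```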

Extensions and Open Problems

Known extensions to our algorithm:
– Instead of the average, can compete with any static weight vector inside the simplex
Future goals:
– Nicer dependence on the number of experts: ours is O(log N), while the typical dependence is O(sqrt(log N))
– Generalization to the returns setting and to other loss functions

Thanks! Questions?