
Defensive Universal Learning, or: Master Algorithms for Active Experts Problems based on Increasing Loss Values
Jan Poland and Marcus Hutter, IDSIA, Lugano, Switzerland

2 What
Universal learning: an algorithm that performs asymptotically optimally on any online decision problem.
[Diagram: Agent and Environment exchanging Actions/Decisions and Loss/Reward.]

3 How
- Prediction with Expert Advice
- Bandit techniques
- Infinite expert classes
- Growing losses

4 Prediction with Expert Advice
[Table: instantaneous losses of Experts 1, 2, 3, ... and of the Master over rounds t = 1, 2, ...]
Instantaneous losses are bounded in [0,1].

5 Prediction with Expert Advice
[Table: example loss sequences for two experts.]
Do not follow the leader, but the perturbed leader.
Regret = E loss(Master) - loss(best expert)
[Hannan; Littlestone and Warmuth; Cesa-Bianchi; McMahan; Blum; etc.]
Proven for adversarial environments!
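As an illustration of "follow the perturbed leader", here is a minimal Python sketch; the fixed perturbation scale `eta` and the toy loss sequences are invented (the talk uses a dynamic learning rate, see the next slide):

```python
import random

def fpl_choose(cum_losses, eta):
    """Follow the Perturbed Leader: perturb each expert's cumulative loss with
    independent exponential noise of scale 1/eta, then pick the minimizer."""
    return min(range(len(cum_losses)),
               key=lambda i: cum_losses[i] - random.expovariate(1.0) / eta)

# Toy run: 2 experts with adversary-fixed loss sequences, losses in [0, 1].
losses = [[1, 0, 1, 0, 1, 0], [0.5] * 6]
cum = [0.0, 0.0]
master = 0.0
for t in range(6):
    i = fpl_choose(cum, eta=1.0)
    master += losses[i][t]
    for j in range(len(cum)):      # full information: every expert's loss is observed
        cum[j] += losses[j][t]
print("master loss:", master, "best expert:", min(cum))  # regret = difference
```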

6 Learning Rate
- The cumulative loss grows, so it has to be scaled down by a learning rate (otherwise we end up following the leader).
- Dynamic learning rate [Cesa-Bianchi et al., Auer et al., etc.]
- Learning rate and bounds can be significantly improved for small losses, but things get more complicated.
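A standard choice from the experts literature (this particular schedule is my assumption, not read off the slide) decays the learning rate with time,

$$ \eta_t = \sqrt{\frac{\ln n}{t}}, $$

which for $n$ experts gives regret of order $\sqrt{t \ln n}$ without knowing the horizon $t$ in advance.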

7 Priors for Expert Classes
- Expert class of finite size n: a uniform prior 1/n is common, giving uniform complexity log n [Hutter and Poland, for dynamic learning rate].
- Countably infinite expert class (n = ∞): a uniform prior is impossible; bounds are instead in log w_i^{-1}, where i is the best expert in hindsight.

8 Universal Expert Class
- Expert i = the i-th program on some universal Turing machine.
- Prior: complexity = length of the program.
- Interpret the output appropriately, depending on the problem.
- This construction is common in Algorithmic Information Theory.
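Concretely (the exact weighting here is my assumption, based on the standard construction rather than read off the slide): if expert $i$ is computed by a prefix program $p_i$ of length $\ell(p_i)$, set

$$ w_i = 2^{-\ell(p_i)}, \qquad \sum_i w_i \le 1 \quad \text{(Kraft inequality)}, $$

so the regret bound $\log_2 w_i^{-1} = \ell(p_i)$ scales with the description length of the best expert.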

9 Bandit Case
[Table: as on slide 4, but only the loss of the expert the Master actually followed is revealed each round; all other entries are '?'.]

10 Bandit Case
- Explore with small probability γ_t, otherwise exploit.
- γ_t is the exploration rate; γ_t → 0 as t → ∞.
- Deal with estimated losses: estimated loss = (observed loss of the action) / (probability of the action), an unbiased estimate. [Auer et al., McMahan and Blum]
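A minimal sketch of this importance-weighted ("unbiased") loss estimate; the uniform exploration mixing is an assumption in the spirit of [Auer et al.], and all names are illustrative:

```python
import random

def bandit_round(probs, true_losses, gamma):
    """One bandit round: sample an action, observe only its loss, and return
    an importance-weighted loss-estimate vector that is unbiased in expectation."""
    n = len(probs)
    # Mix in uniform exploration with probability gamma.
    mixed = [(1 - gamma) * p + gamma / n for p in probs]
    a = random.choices(range(n), weights=mixed)[0]
    observed = true_losses[a]        # only this entry is revealed
    est = [0.0] * n
    est[a] = observed / mixed[a]     # E[est[i]] = true_losses[i]
    return a, est
```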

11 Bandit Case: Consequences
- Bounds are in w_i^{-1} instead of log w_i^{-1}; this (exponentially worse) bound is sharp in general.
- Analysis gets harder for adaptive adversaries (with martingales...).

12 Reactive Environments
Repeated game: Prisoner's Dilemma [de Farias and Megiddo, 2003]

Loss matrix (row = our action, column = opponent's action):
                 C      D
Cooperate (C)    0.3    1
Defect (D)       0      0.7

[Figure: loss sequences for cooperating vs. defecting.]

13 Prisoner's Dilemma [de Farias and Megiddo, 2003]
- Defecting is dominant, but cooperating may still have the better long-term reward, e.g. against Tit for Tat.
- Expert Advice fails against Tit for Tat: Tit for Tat is reactive (a toy simulation follows below).
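A toy check of this claim (code and numbers are mine, using the loss matrix from slide 12): always-defect wins round 1 against Tit for Tat but loses in the long run.

```python
# Loss matrix from slide 12: LOSS[my_action][opponent_action], actions 'C' and 'D'.
LOSS = {'C': {'C': 0.3, 'D': 1.0}, 'D': {'C': 0.0, 'D': 0.7}}

def play_vs_tit_for_tat(my_action, rounds=1000):
    """Average per-round loss of a fixed strategy against Tit for Tat,
    which starts with 'C' and then repeats our previous move."""
    opp, total = 'C', 0.0
    for _ in range(rounds):
        total += LOSS[my_action][opp]
        opp = my_action               # Tit for Tat copies our last action
    return total / rounds

print(play_vs_tit_for_tat('C'))  # always cooperate: 0.3 per round
print(play_vs_tit_for_tat('D'))  # always defect: ~0.7 per round (loss 0 only once)
```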

14 Time Scale Change
Idea: yield control to the selected expert for increasingly many time steps.
[Diagram: original time scale vs. new master time scale.]
Instantaneous losses may grow in time.

15 Follow or Explore (FoE)
Need a master algorithm and analysis for:
- losses in [0, B_t], where B_t grows
- countable expert classes
- dynamic learning rate
- dynamic exploration rate
- technical issue: dynamic confidence for almost-sure assertions
Algorithm FoE (Follow or Explore); details are in the paper. A schematic sketch follows below.
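This is only my schematic reading of the ingredients listed above (prior weights, exploration, perturbed leader, growing epochs); every schedule and constant here is invented for illustration, and the real FoE algorithm with its analysis is in the paper:

```python
import math
import random

def foe_sketch(expert_loss, n_experts, rounds):
    """Schematic Follow-or-Explore loop. expert_loss(i, t) is the loss of
    expert i at time t, observed only while expert i is being followed."""
    w = [2.0 ** -(i + 1) for i in range(n_experts)]   # toy prior over experts
    est = [0.0] * n_experts                           # importance-weighted loss estimates
    t, epoch_len = 0, 1
    while t < rounds:
        gamma = 1.0 / math.sqrt(t + 1)                # invented exploration rate, -> 0
        eta = 1.0 / math.sqrt(t + 1)                  # invented learning rate, -> 0
        if random.random() < gamma:                   # explore: uniform random expert
            i = random.randrange(n_experts)
            p = gamma / n_experts
        else:                                         # exploit: follow the perturbed leader,
            i = min(range(n_experts),                 # penalized by the prior complexity
                    key=lambda j: est[j]
                    - (random.expovariate(1.0) + math.log(w[j])) / eta)
            p = 1.0 - gamma                           # crude stand-in for the true probability
        for _ in range(epoch_len):                    # yield control for a whole epoch
            if t >= rounds:
                break
            est[i] += expert_loss(i, t) / p           # importance-weighted estimate
            t += 1
        epoch_len += 1                                # time-scale change: epochs grow
    return est
```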

16 Main Result
Theorem: For any online decision problem, FoE performs in the limit as well as any computable strategy (expert). That is, FoE's average per-round regret converges to 0.
Moreover, FoE uses only finitely many experts at each time step and is thus computationally feasible.

17 Universal Optimality
Universally optimal = the average per-round regret tends to zero, i.e. the cumulative regret grows more slowly than t.

18 Universal Optimality
[Chart: growth of cumulative regret over time, comparing finite regret, logarithmic regret, (square-)root regret, and linear regret.]

19 Conclusions
FoE is universally optimal in the limit, but maybe too defensive for practice?!
[Diagram: Agent and Adversary exchanging Actions/Decisions and Loss/Reward.]
Thank you!