Algorithmic Game Theory (Uri Feige, Robi Krauthgamer, Moni Naor). Lecture 8: Regret Minimization. Lecturer: Moni Naor.

Announcements
Next week (Dec 24th): Israeli Seminar on Computational Game Theory (10:00 - 4:30) at Microsoft Israel R&D Center, 13 Shenkar St., Herzliya.
January: the course will meet 13:00-15:00; meetings on Jan 7th, 14th and 21st, 2009.

The Peanuts Game
There are n bins. At each round "nature" throws a peanut into one of the bins.
If you (the player) are at the chosen bin, you catch the peanut; otherwise you miss it.
You may move to any bin before any round.
The game ends when d peanuts have been thrown at some bin, independent of whether they were caught or not.
Goal: catch as many peanuts as possible.
This is hopeless if the opponent tosses the peanuts based on knowing where you stand. So say the sequence of bins is predetermined (but unknown). How well can we do as a function of d and n?

Basic setting
View learning as a sequence of trials. In each trial, the algorithm is given x, asked to predict f(x), and then is told the correct value.
Make no assumptions about how the examples are chosen.
Goal: minimize the number of mistakes; we need to "learn from our mistakes".

Using "expert" advice
E.g., we want to predict the stock market. We solicit n "experts" for their advice: will the market go up or down tomorrow? We want to use their advice to make our prediction.
What is a good strategy for using their opinions, given no a priori knowledge of which expert is the best?
("Expert" here just means someone with an opinion.)

Some expert is perfect
We have n "experts". One of them is perfect (never makes a mistake); we just don't know which one. Can we find a strategy that makes no more than log n mistakes?
Simple algorithm: take a majority vote over all experts that have been completely correct so far. Each mistake at least halves the set of consistent experts, so we make at most log₂ n mistakes.
What if we have a "prior" p over the experts, and want no more than log(1/p_i) mistakes, where expert i is the perfect one? Take a weighted vote according to p.

Relation to concept learning
If computation time is no object, we can have one "expert" per concept in C. If the target is in C, the number of mistakes is at most log |C|.
More generally, for any description language, the number of mistakes is at most the number of bits needed to write down f.

What if no expert is perfect?
Goal: do nearly as well as the best one in hindsight.
Strategy #1: the iterated halving algorithm. Same as before, but once we've crossed off all the experts, restart from the beginning.
Makes at most log(n)·OPT mistakes, where OPT is the number of mistakes of the best expert in hindsight.
Seems wasteful: we are constantly forgetting what we've "learned".

Weighted Majority Algorithm Intuition : Making a mistake doesn't completely disqualify an expert. So, instead of crossing off, just lower its weight. Weighted Majority Alg: – Start with all experts having weight 1. – Predict based on weighted majority vote. – Penalize mistakes by cutting weight in half.

Weighted Majority Algorithm
Weighted Majority Alg:
– Start with all experts having weight 1.
– Predict based on weighted majority vote.
– Penalize mistakes by cutting weight in half.
Example (figure on slide): expert weights evolving over Day 1 through Day 4.
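A minimal sketch of the deterministic algorithm, for binary up/down predictions (the interface is my own; β = 1/2 as on the slide):

```python
def weighted_majority(expert_preds, outcomes, beta=0.5):
    """Weighted Majority: predict with the weighted majority vote,
    then multiply the weight of every mistaken expert by beta."""
    n = len(expert_preds[0])
    w = [1.0] * n                          # all experts start at weight 1
    mistakes = 0
    for preds, truth in zip(expert_preds, outcomes):
        up = sum(wi for wi, p in zip(w, preds) if p == 1)
        down = sum(wi for wi, p in zip(w, preds) if p == 0)
        guess = 1 if up >= down else 0     # weighted majority vote
        if guess != truth:
            mistakes += 1
        # penalize mistakes by cutting weight in half
        w = [wi * beta if p != truth else wi for wi, p in zip(w, preds)]
    return mistakes
```

Unlike the halving algorithm, a wrong expert is only down-weighted, never discarded, so it can still contribute later.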

Analysis: not far from the best expert in hindsight
M = # mistakes we've made so far; b = # mistakes the best expert has made so far; W = total weight, initially W = n.
After each of our mistakes, W drops by at least 25% (at least half the weight voted wrong and gets halved). So after M mistakes, W ≤ n(3/4)^M.
The weight of the best expert is (1/2)^b. So (1/2)^b ≤ n(3/4)^M, giving M ≤ (b + log₂ n)/log₂(4/3) ≈ 2.4(b + log₂ n): a constant competitive ratio.

Randomized Weighted Majority
If the best expert makes a mistake 20% of the time, then 2.4(b + log n) is not so good: can we do better?
Instead of a majority vote, use the weights as probabilities: if 70% of the weight is on up and 30% on down, predict up with probability 0.7 and down with probability 0.3.
Idea: smooth out the worst case. Also, generalize the factor 1/2 to 1-ε.

Randomized Weighted Majority
/* Initialization */: W_i ← 1 for i ∈ 1..n
/* Main loop */: for t = 1..T:
  Let P_t(i) = W_i / (Σ_{j=1}^n W_j). Choose i according to P_t.
  /* Update scores */: observe the losses; for i ∈ 1..n: W_i ← W_i · (1-ε)^{loss(i)}
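A direct Python transcription of this loop (the loss-vector interface and function name are mine; `random.choices` samples i according to P_t):

```python
import random

def rwm(loss_rounds, n, eps=0.1, seed=0):
    """Randomized Weighted Majority: choose expert i with probability
    W_i / sum_j W_j, then decay each weight by (1 - eps) ** loss(i).
    Each loss_rounds[t][i] is assumed to lie in [0, 1]."""
    rng = random.Random(seed)
    w = [1.0] * n
    total_loss = 0.0
    for losses in loss_rounds:
        i = rng.choices(range(n), weights=w)[0]   # choose i according to P_t
        total_loss += losses[i]
        w = [wi * (1 - eps) ** l for wi, l in zip(w, losses)]
    return total_loss
```

Against a sequence where one expert always has loss 0, the total loss stays near the ln(n)/ε additive term, since the weight on bad experts decays geometrically.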

Analysis
M = expected # mistakes; b = # mistakes the best expert made; W = total weight.
Say at time t a fraction F_t of the weight is on experts that made a mistake. We make a mistake with probability F_t, and remove an εF_t fraction of the total weight:
  W_final = n(1 - εF_1)(1 - εF_2)···
  ln(W_final) = ln(n) + Σ_t ln(1 - εF_t) ≤ ln(n) - ε Σ_t F_t   (using ln(1-x) ≤ -x)
              = ln(n) - εM                                     (since Σ_t F_t = E[# mistakes])
If the best expert makes b mistakes, then ln(W_final) ≥ ln((1-ε)^b) = b ln(1-ε). Now solve:
  ln(n) - εM ≥ b ln(1-ε), so M ≤ b·(-ln(1-ε))/ε + ln(n)/ε ≤ b(1+ε) + ln(n)/ε
(using -ln(1-x) ≤ x + x² for 0 ≤ x ≤ 1/2).

Summarizing: RWM
M = # mistakes made; b = # mistakes the best expert made.
M ≤ b(1+ε) + ln(n)/ε: RWM is (1+ε)-competitive with the best expert in hindsight, with additive loss ε⁻¹ ln(n).
If running T time steps, set ε = (ln n / T)^{1/2} to get
  M ≤ b(1 + (ln n/T)^{1/2}) + ln(n)/(ln n/T)^{1/2}
    = b + b(ln n/T)^{1/2} + (T ln n)^{1/2}
    ≤ b + 2(T ln n)^{1/2}   (since b ≤ T).
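For a feel for the numbers, with hypothetical values n = 100 and T = 10,000 (my choice, not from the slides):

```python
import math

n, T = 100, 10_000
eps = math.sqrt(math.log(n) / T)           # tuned eps, roughly 0.0215
additive = 2 * math.sqrt(T * math.log(n))  # roughly 429 extra mistakes
print(eps, additive, additive / T)         # per-round regret ~ 4.3%
```

So over 10,000 rounds the algorithm loses only about 4% more per round than the best expert in hindsight, whoever that turns out to be.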

Questions
Isn't it sometimes better to take the majority? The best expert may have a hard time on an easy question, and we would be better off using the wisdom of the crowd.
Answer: if it's a good idea, make the majority vote one of the experts!

Lower Bounds
Cannot hope to do better than additive regret Ω((T log n)^{1/2}).

What can we use this for?
Combine multiple algorithms to do nearly as well as the best one in hindsight. E.g., online auctions: one expert per price level.
Play a repeated game so as to do nearly as well as the best fixed strategy in hindsight: regret minimization.
Extensions: the "bandit" problem, movement costs.

"No-regret" algorithms for repeated games
Repeated play of a matrix game with N rows (the algorithm is the row player; rows represent the different possible actions).
At each step t: the algorithm picks a row i_t; life picks a column.
– Alg pays the cost of the action chosen: c_t(i_t).
– Alg gets the column c_t as feedback (or just its own cost, in the "bandit" model).
– Assume a bound on the maximum cost: all costs are between 0 and 1.
(Slide diagram: Algorithm plays i_t against Adversary/world/life, which returns c_t.)

"No-regret" algorithms for repeated games
At each time step, the algorithm picks a row, life picks a column.
– Alg pays the cost of the action chosen: c_t(i_t).
– Alg gets the column as feedback.
– Assume all costs are between 0 and 1.
Define the average regret in T time steps as [avg cost of alg] - [avg cost of best fixed row in hindsight]:
  (1/T) · [ Σ_{t=1}^T c_t(i_t) - min_i Σ_{t=1}^T c_t(i) ]
Want this to go to 0 (or better) as T gets large [= a "no-regret" algorithm].
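The quantity above translates directly to code (a small helper; `costs[t][i]` is c_t(i) and `plays[t]` is the row i_t, names mine):

```python
def average_regret(costs, plays):
    """(1/T) * [ sum_t c_t(i_t) - min_i sum_t c_t(i) ]"""
    T, n = len(costs), len(costs[0])
    alg_cost = sum(costs[t][plays[t]] for t in range(T))
    best_fixed = min(sum(costs[t][i] for t in range(T)) for i in range(n))
    return (alg_cost - best_fixed) / T
```

E.g., with alternating costs [[0,1],[1,0],...] an algorithm that always picks the cost-1 row has average regret 1/2, while any fixed row already pays 1/2 on average, so regret 0 means matching that.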

Randomized Weighted Majority
/* Initialization */: W_i ← 1 for i ∈ 1..n
/* Main loop */: for t = 1..T:
  Let P_t(i) = W_i / (Σ_{j=1}^n W_j). Choose i according to P_t.
  /* Update scores */: observe column c_t; for i ∈ 1..n: W_i ← W_i · (1-ε)^{c_t(i)}

Analysis
Similar to the {0,1} case. For ε ≤ 1/2:
  E[cost of RWM] ≤ (min_i Σ_{t=1}^T c_t(i))/(1-ε) + ln(n)/ε ≤ (min_i Σ_{t=1}^T c_t(i))(1+2ε) + ln(n)/ε
No regret: as T grows, the per-step difference goes to 0.

Properties of no-regret algorithms
Time-average performance is guaranteed to approach the minimax value V of the game, or better, if life isn't adversarial.
Two no-regret algorithms playing against each other: their empirical distribution approaches minimax optimal.
The existence of no-regret algorithms yields a proof of the minimax theorem.

von Neumann's Minimax Theorem
Zero-sum game: u_2(a_1,a_2) = -u_1(a_1,a_2).
Theorem: for any two-player zero-sum game with finite strategy sets A_1, A_2 there is a value v ∈ ℝ, the game value, s.t.
  v = max_{p∈Δ(A_1)} min_{q∈Δ(A_2)} u_1(p,q) = min_{q∈Δ(A_2)} max_{p∈Δ(A_1)} u_1(p,q).
For all mixed Nash equilibria (p,q): u_1(p,q) = v.  (Δ(A) = mixed strategies over A.)

Convergence to Minimax
Suppose we know v = max_{p∈Δ(A_1)} min_{q∈Δ(A_2)} u_1(p,q) = min_{q∈Δ(A_2)} max_{p∈Δ(A_1)} u_1(p,q).
Consider the distribution q for player 2: the observed frequencies of player 2's play over T steps.
There is a best response x ∈ A_1 to q, so that u_1(x,q) ≥ v.
If player 1 always plays x: expected gain is at least vT.
If player 1 follows a no-regret procedure: gain is at least vT - R, where R/T → 0.
Using RWM: average gain is at least v - O((log n/T)^{1/2}).

Proof of the Minimax Theorem
Want to show v = max_{p∈Δ(A_1)} min_{q∈Δ(A_2)} u_1(p,q) = min_{q∈Δ(A_2)} max_{p∈Δ(A_1)} u_1(p,q).
Consider for player 1:
  v_1^max = max_{p∈Δ(A_1)} min_{q∈Δ(A_2)} u_1(p,q)   (best choice given player 2's distribution)
  v_1^min = min_{q∈Δ(A_2)} max_{p∈Δ(A_1)} u_1(p,q)   (best distribution, not given player 2's distribution)
and symmetrically v_2^max, v_2^min for player 2, whose payoff is -u_1.
Need to prove v_1^max = v_1^min. Easy direction: v_1^max ≤ v_1^min.
Suppose v_1^min = v_1^max + δ for some δ > 0. Let players 1 and 2 both follow a no-regret procedure for T steps with regret R, where T is large enough that R/T < δ/2.

Proof of the Minimax Theorem (continued)
Suppose v_1^min = v_1^max + δ with δ > 0, and both players follow a no-regret procedure for T steps with regret R, where R/T < δ/2.
Let L_T be player 1's total loss (so player 2's loss is -L_T), and let L_1, L_2 be the losses of each player's best response to the opponent's empirical distribution of play.
By no-regret: L_T ≤ L_1 + R and -L_T ≤ L_2 + R.
Best-responding to a known distribution secures the game's value: the best fixed row against any column distribution gains at least v_1^min, so L_1/T ≤ -v_1^min; similarly player 2's best response gains at least -v_1^max, so L_2/T ≤ v_1^max.
Adding the two inequalities: 0 ≤ (v_1^max - v_1^min)·T + 2R = -δT + 2R < 0, a contradiction. Hence v_1^max = v_1^min.

History and development
[Hannan'57, Blackwell'56]: algorithms with regret O((N/T)^{1/2}).
– Need T = O(N/ε²) steps to get time-average regret ε; call this quantity T_ε.
– Optimal dependence on T (and ε); views N as a constant.
Learning theory, '80s-'90s: "combining expert advice" – perform (nearly) as well as the best f ∈ C; views N as large.
[Littlestone-Warmuth '89]: the weighted majority algorithm:
  E[cost] ≤ OPT(1+ε) + (log N)/ε ≤ OPT + εT + (log N)/ε.
Regret O(((log N)/T)^{1/2}); T_ε = O((log N)/ε²).

Why Regret Minimization?
Finding Nash equilibria can be computationally difficult, and it is not clear that agents would converge to one, or remain in one if there are several.
Regret minimization is realistic:
– There are efficient algorithms that minimize regret.
– It is computed locally.
– Players improve by lowering their regret.

The bounds have only a log dependence on n. So, conceivably, we can do well even when n is exponential in the natural problem size, if only we could implement the algorithm efficiently, e.g., the case of paths in a graph. Recent years have seen a series of results giving efficient implementations or alternatives in various settings; an efficient implicit implementation for large n is the target.

The Eavesdropping Game
Let G = (V,E). Player 1 chooses an edge e ∈ E; player 2 chooses a spanning tree T.
Payoff: u_1(e,T) = 1 if e ∈ T and 0 otherwise.
The number of moves is exponential in |G|. But the best response for player 2, given a distribution on the edges, is easy: solve a minimum spanning tree problem with the probabilities as edge weights.
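Player 2's best response is just an MST computation over the edge probabilities; a sketch using Kruskal's algorithm with union-find (the graph representation and names are mine):

```python
def best_response_tree(n, edges, prob):
    """Minimum spanning tree under weights prob[e]: the spanning tree
    minimizing the expected number of eavesdropped edges.  edges is a
    list of (u, v) pairs on vertices 0..n-1; returns chosen edge indices."""
    parent = list(range(n))
    def find(x):                      # union-find with path compression
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x
    tree = []
    for e in sorted(range(len(edges)), key=lambda e: prob[e]):  # Kruskal
        ru, rv = find(edges[e][0]), find(edges[e][1])
        if ru != rv:                  # edge joins two components: take it
            parent[ru] = rv
            tree.append(e)
    return tree
```

This is the point of the example: even though player 2's strategy space is exponential, a best response to any edge distribution is computable in polynomial time.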

Correlated Equilibrium and Swap Regret
What about Nash? For correlated equilibrium: if every player uses an algorithm with low swap regret, then the empirical distribution of play converges to a correlated equilibrium.

Back to the Peanuts Game
There are n bins. At each round "nature" throws a peanut into one of the bins.
If you (the player) are at the chosen bin, you catch the peanut; otherwise you miss it.
You may move to any bin before any round.
The game ends when d peanuts have been thrown at some bin.
Homework: what guarantees can you provide?