Moshe Tennenholtz, Aviv Zohar Learning Equilibria in Repeated Congestion Games.


Motivation
The Nash equilibrium is an important concept. It exists for many reasonable games and provides a good recommendation for a group of players. However, it assumes that all players are fully aware of the game, so that, given each other's strategies, they can compute a best response.

Motivation
What about games with missing information? We show that an equilibrium exists even if players learn the game while playing it, for a broad family of games: repeated symmetric congestion games. Our equilibrium is in pure strategies (rare even for the regular Nash equilibrium). It is also very efficient: the total cost of all players is minimal. The repeated game must be long enough.

Congestion Games
A congestion game [Rosenthal] is defined by:
- A set of players N = {1, …, n}.
- A set of resources R.
- A set of bundles (subsets of R) that each player i can pick.
- A cost function for each resource r that depends only on the number of players that have picked this resource. This cost is applied to all players that use the resource.
The strategy of a player is to pick a bundle. The cost to a player is the sum of the costs of the resources in his bundle.
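The definition above can be sketched in a few lines of Python. This is only an illustration: the names `player_costs`, `profile` and `cost`, and the two-resource example, are assumptions made here and do not come from the talk. Each resource cost is stored as a list indexed by load.

```python
from collections import Counter

def player_costs(profile, cost):
    """Return the cost paid by each player.

    profile: one bundle (a frozenset of resources) per player.
    cost:    dict mapping a resource r to a list, where cost[r][k - 1]
             is the cost of r when exactly k players use it.
    """
    # Load of a resource = number of players whose bundle contains it.
    load = Counter(r for bundle in profile for r in bundle)
    # A player pays the sum of the costs of the resources in his bundle.
    return [sum(cost[r][load[r] - 1] for r in bundle) for bundle in profile]

# Hypothetical example: two resources, two players.
cost = {"a": [1, 3], "b": [2, 2]}
profile = [frozenset({"a"}), frozenset({"a", "b"})]
print(player_costs(profile, cost))  # [3, 5]
```

Resource "a" is shared by both players, so each pays 3 for it; the second player additionally pays 2 for using "b" alone.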

Example of a Congestion Game
Number of players: n. Resources: the edges of a graph. Allowed bundles: all simple paths connecting S to T. Costs per edge: given in the figure (a graph from S to T whose edges are labeled 0/1/2, 0/3/2, 0/4/4, 2/2/2, 0/0/1 and 2/2/0).

Congestion Games
Theorem: Every congestion game is a potential game and thus has a pure Nash equilibrium. That is, there exists a pure strategy profile (each player deterministically picks a bundle) such that no player wishes to change his bundle given the choices of the others.
Definition: A symmetric congestion game is a congestion game in which all players can pick from the same set of bundles.
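A minimal sketch of why the theorem holds, assuming Rosenthal's standard potential (for each resource, the sum of its costs at loads 1 up to the current load). Every improving move by a single player lowers this potential, so repeatedly letting players switch to better bundles must stop at a pure Nash equilibrium. The function names and the small example game (two resources, three players) are assumptions chosen for illustration.

```python
from collections import Counter

def player_cost(i, profile, cost):
    load = Counter(r for bundle in profile for r in bundle)
    return sum(cost[r][load[r] - 1] for r in profile[i])

def potential(profile, cost):
    # Rosenthal's potential: for each resource, c_r(1) + ... + c_r(load).
    load = Counter(r for bundle in profile for r in bundle)
    return sum(sum(cost[r][:load[r]]) for r in load)

def swap(profile, i, bundle):
    return profile[:i] + [bundle] + profile[i + 1:]

def is_pure_nash(profile, bundles, cost):
    return all(
        player_cost(i, swap(profile, i, b), cost) >= player_cost(i, profile, cost)
        for i in range(len(profile)) for b in bundles
    )

def best_response_dynamics(profile, bundles, cost):
    profile = list(profile)
    # Each switch strictly lowers the potential, so the loop terminates.
    while not is_pure_nash(profile, bundles, cost):
        for i in range(len(profile)):
            best = min(bundles, key=lambda b: player_cost(i, swap(profile, i, b), cost))
            if player_cost(i, swap(profile, i, best), cost) < player_cost(i, profile, cost):
                profile[i] = best
    return profile

# Hypothetical game: resources x and y, each costing 1 / 2 / 2 under load 1 / 2 / 3.
cost = {"x": [1, 2, 2], "y": [1, 2, 2]}
bundles = [frozenset({"x"}), frozenset({"y"})]
start = [frozenset({"x"})] * 3
end = best_response_dynamics(start, bundles, cost)
print(potential(start, cost), potential(end, cost))  # 5 4
```

Starting with all three players on x, one player moves to y and no further improving move exists.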

Resource Selection Games
Definition: A resource selection game is a symmetric congestion game in which every bundle has size 1. Example: identical processes running on machines. Every process chooses the machine to run on, and its running time depends on the number of processes on that machine.

Our Setting
Assume that players repeatedly play some congestion game. The number of rounds T is finite. The cost of every resource is initially unknown. At every round, every player picks a bundle and observes his own payment for each resource and the actions of all others. A player's total payment is the average of his payments over the rounds of the game. Is there an equilibrium of some sort? How efficient can we be?

Previous (slightly related) work
There is a lot of work in learning that tries to converge to a Nash equilibrium of the game while learning it. The problem: the converging strategies themselves are not in equilibrium. There is also work on equilibria in fully known repeated games, where folk theorems guarantee a wide variety of equilibria. However, it is less realistic to expect players to fully know the game.

The Learning Equilibrium
Due to Brafman and Tennenholtz. A form of ex-post equilibrium. There is an unknown state of the world S (that affects the payments in the game).
Definition: Given an unknown repeated game, a strategy profile for the players is a learning equilibrium if no player wishes to change his strategy, even if he knows the game.

Previous (more related) work
- Repeated 2-player games have a (mixed) learning equilibrium, if each player can see the payment of the other player [Brafman, Tennenholtz].
- All repeated symmetric 2-player games have a (mixed) learning equilibrium.
- All repeated monotonic resource selection games have a (mixed) learning equilibrium [Ashlagi, Monderer, Tennenholtz].
Our result: a pure equilibrium in all repeated symmetric congestion games (no limit on the number of players).

Communication between agents
For the remainder of the talk, I will assume that agents can communicate (cheap talk) through some channel in order to coordinate. This assumption is not essential: agents can communicate through the repeated game itself, via the actions they take. Such signaling makes the proofs a bit more complex but has little effect on the game (provided that the game is long enough and only a finite amount of data is communicated).

What can we hope to achieve?
The cooperative solution is the best we can hope for. Denote its total cost by OPT. The cooperative solution:
1. Players play all combinations of bundles and learn the cost of each resource for any load.
2. They then compute the optimal joint action.
3. They play the joint action while taking turns playing the different roles in it.
Each player gets a cost of OPT/n if the game is long enough.
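The last two steps of the cooperative solution can be sketched as follows, assuming the costs have already been learned. The brute-force search and the particular rotation rule (player i takes role (i + t) mod n in round t) are choices made here for illustration, not necessarily the ones used in the talk; the example game is hypothetical.

```python
from collections import Counter
from itertools import product

def player_costs(profile, cost):
    load = Counter(r for bundle in profile for r in bundle)
    return [sum(cost[r][load[r] - 1] for r in bundle) for bundle in profile]

def optimal_profile(n, bundles, cost):
    # Brute force over all joint actions; fine for tiny games only.
    return min(product(bundles, repeat=n),
               key=lambda p: sum(player_costs(p, cost)))

def rotated(profile, t):
    # In round t, player i plays the role of player (i + t) mod n.
    n = len(profile)
    return [profile[(i + t) % n] for i in range(n)]

# Hypothetical game: resources x and y, each costing 1 / 2 / 2 under load 1 / 2 / 3.
cost = {"x": [1, 2, 2], "y": [1, 2, 2]}
bundles = [frozenset({"x"}), frozenset({"y"})]
n = 3

opt = optimal_profile(n, bundles, cost)
OPT = sum(player_costs(opt, cost))

# Over n consecutive rounds every player passes through every role once.
totals = [0] * n
for t in range(n):
    for i, c in enumerate(player_costs(rotated(opt, t), cost)):
        totals[i] += c
print(OPT, totals)  # 5 [5, 5, 5]
```

Each player pays OPT in total over n rounds, that is, OPT/n per round on average.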

Our Main Result – Exact Statement
For any symmetric congestion game G with n players and for every ε > 0, there exists a number of rounds T such that the repeated game in which G is played for T rounds has an ε-equilibrium in which each player suffers a cost of at most (OPT/n) + ε. An equivalent statement could be made about infinite games with discounted payments. (Discuss)

Proof
The equilibrium strategy consists of 3 behaviors:
1. Cooperative learning: players play all combinations of bundles to learn the game. If some player deviates, start punishing; otherwise, start playing optimally.
2. Playing optimally: players play optimally, taking turns in the different roles. If someone deviates, start punishing.
3. Punishment of a deviator: this is the tricky part.

How to Punish
Let us start by assuming that the deviation occurs after the game has been learned. Assume w.l.o.g. that player n has deviated. A punishment strategy in this case: all n-1 honest players compute a Nash equilibrium for the congestion game G while ignoring the n-th player, i.e., they consider the game as having only n-1 players, and play it. Note that such an equilibrium always exists.

Effective Punishment
Lemma: If all other players play a Nash equilibrium for n-1 players, the deviating player has a cost no lower than that of any other player.
Proof: Assume that some honest player i pays more than the deviator. Fix the bundles of all other players. (Figure: the bundle played by the deviator and the bundle played by player i.)

The total cost of player i was greater. The difference must come from resources that the two bundles do not share. Due to symmetry, player i could have picked the deviator's bundle, and would gain by it (in the game of n-1 players in which player n does not exist). This contradicts the fact that player i is playing a Nash equilibrium for n-1 players. (Figure: the bundle played by the deviator and the bundle played by player i.)
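The lemma can be checked by brute force on a small game. The sketch below enumerates every pure Nash equilibrium of the game with n-1 players, adds a deviator playing an arbitrary bundle, and verifies that no honest player pays more than the deviator. The function names and the example game (three resources, four bundles, n = 3) are hypothetical.

```python
from collections import Counter
from itertools import product

def player_costs(profile, cost):
    load = Counter(r for bundle in profile for r in bundle)
    return [sum(cost[r][load[r] - 1] for r in bundle) for bundle in profile]

def pure_nash_profiles(n, bundles, cost):
    """All pure Nash equilibria of the n-player symmetric game."""
    result = []
    for p in product(bundles, repeat=n):
        base = player_costs(p, cost)
        if all(player_costs(p[:i] + (b,) + p[i + 1:], cost)[i] >= base[i]
               for i in range(n) for b in bundles):
            result.append(p)
    return result

def lemma_holds(n, bundles, cost):
    # Honest players 0..n-2 play an equilibrium of the (n-1)-player game;
    # the deviator (last position) may play any bundle.
    for eq in pure_nash_profiles(n - 1, bundles, cost):
        for deviation in bundles:
            costs = player_costs(eq + (deviation,), cost)
            if any(c > costs[-1] for c in costs[:-1]):
                return False
    return True

# Hypothetical symmetric game with three resources and n = 3 players.
cost = {"x": [1, 4, 6], "y": [2, 3, 9], "z": [0, 5, 5]}
bundles = [frozenset({"x"}), frozenset({"y"}),
           frozenset({"x", "z"}), frozenset({"y", "z"})]
print(lemma_holds(3, bundles, cost))  # True
```

Note that the check does not require the cost functions to be monotone; the argument only uses symmetry and the equilibrium condition.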

Punishing when information is missing
Our proof so far relied on full knowledge of the game. If the game is unknown, we cannot compute the Nash equilibrium for n-1 players. When knowledge is missing, we optimistically under-evaluate the costs.

Now, players use these under-evaluated costs when they try to punish a deviator. They compute a Nash equilibrium of the congestion game with n-1 players whose resource costs are the under-evaluated ones, and they repeatedly play this equilibrium.

Now, again, assume that during some round the deviator pays less than some player i. The difference must come from resources they did not have in common. But then, why didn't player i switch to the deviator's bundle? There can be only one reason: he under-evaluated his own bundle. Therefore, he must have observed something new. (Figure: the bundle played by the deviator and the bundle played by player i.)

The modified Lemma
At every round of punishment, at least one of two things must happen:
1. One of the players learns a previously unknown value, or
2. the deviator has a cost at least as high as that of any other player.
Once a player learns a new value, he broadcasts it to the other honest players. This way, they have common knowledge of the values found and can continue to compute the Nash equilibrium strategy.
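One round of punishment under missing information can be sketched as below. The talk's exact rule for under-evaluating costs is not available here, so this sketch ASSUMES that costs are nonnegative and that every unknown (resource, load) value is estimated as 0. All names and the example game are hypothetical.

```python
from collections import Counter
from itertools import product

def loads(profile):
    return Counter(r for bundle in profile for r in bundle)

def player_costs(profile, cost_fn):
    load = loads(profile)
    return [sum(cost_fn(r, load[r]) for r in bundle) for bundle in profile]

def pure_nash(n, bundles, cost_fn):
    """First pure Nash equilibrium found by brute force."""
    for p in product(bundles, repeat=n):
        base = player_costs(p, cost_fn)
        if all(player_costs(p[:i] + (b,) + p[i + 1:], cost_fn)[i] >= base[i]
               for i in range(n) for b in bundles):
            return p

def punishment_round(n, bundles, known, true_cost, deviator_bundle):
    # ASSUMPTION: an unknown value is optimistically estimated as 0.
    estimate = lambda r, k: known.get((r, k), 0)
    eq = pure_nash(n - 1, bundles, estimate)
    profile = eq + (deviator_bundle,)
    load = loads(profile)
    # Values that honest players observe this round for the first time.
    learned = {(r, load[r]) for bundle in eq for r in bundle} - set(known)
    costs = player_costs(profile, true_cost)
    punished = all(c <= costs[-1] for c in costs[:-1])
    return learned, punished

# Hypothetical resource selection game with partially known costs.
true = {"x": [1, 2, 3], "y": [2, 2, 4], "z": [5, 5, 5]}
true_cost = lambda r, k: true[r][k - 1]
bundles = [frozenset({r}) for r in "xyz"]
known = {("x", 1): 1, ("y", 1): 2}

for d in bundles:
    learned, punished = punishment_round(3, bundles, known, true_cost, d)
    print(sorted(d), sorted(learned), punished)
```

In every round either `learned` is nonempty or `punished` is true, which is the dichotomy stated in the modified lemma. In a full strategy the learned values would be added to `known` before the next round.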

Proving the Theorem
If no one deviates, players spend a finite amount of time learning the game and then play optimally. If the game is long enough, each player's payment is at most OPT/n + ε/2. If one player deviates, he can do better than the other players only for a finite number of rounds. For the rest of the game his cost is at least as high as that of every other player. So if the game is long enough, his gains in the finite number of rounds are dwarfed by this high cost, and he gains at best some small ε.

Imperfect monitoring
What players can observe during the game is critical. The theorem also holds for weaker levels of monitoring. For example, let us now assume that players see the actions of other players only on the resources that they themselves selected. Can we still detect deviations, punish, and coordinate? One of the main problems is communication: players can still signal, but can no longer broadcast to all others at the same time (unless they are on the same resource).

Imperfect Monitoring
Assume some honest player observes some other player deviating from the proposed strategy. He has to call this to the attention of the other players. He does so by deviating himself, thereby notifying some of the others. They in turn deviate and notify others, and so on. After every player has seen some other player deviate, we have to find out whom to punish, and how.

Blaming others
Each player signals which other player he has seen deviating, and when this deviation occurred. Everyone suspects the player who has been reported as deviating in the earliest round. (Figure: timeline of rounds T, T+1, T+2, marking the actual deviator.)

Blaming others
But the deviator may also lie. To shift the blame, he can claim that he saw someone else deviate in an earlier round. So the players must suspect both the earliest reported deviator and the player who reported him. (Figure: timeline of rounds T-1, T, T+1, T+2, marking the actual deviator.)
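The suspicion rule can be sketched as a small function. The report format (reporter, accused, round) and the assumption that the earliest report is unique are illustrative choices made here, not details from the talk.

```python
def suspects(reports):
    """reports: iterable of (reporter, accused, round) tuples.

    Returns the accused of the earliest reported deviation together with
    the player who made that report. Assumes the earliest round is unique.
    """
    reporter, accused, _ = min(reports, key=lambda rep: rep[2])
    return {accused, reporter}

# Hypothetical run: player 3 really deviated in round 10 and was seen by
# player 1, who then deviated in round 11 to notify player 2. Player 3 lies
# and claims to have seen player 2 deviate in round 9.
reports = [(1, 3, 10), (2, 1, 11), (3, 2, 9)]
print(sorted(suspects(reports)))  # [2, 3]
```

Whether the deviator lies or not, he ends up among the two suspects: either he is the earliest accused, or he is the reporter of a fabricated earlier deviation.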

How to Punish
So how can we punish in this case? Note that the identity of the deviator is important: all other players need to compute a Nash equilibrium for n-1 players and play it, and each bundle in the Nash equilibrium has to be picked by one of the players. Solution: tell both suspects to pick the same bundle (one that is part of the Nash equilibrium). At least one of them is honest and will play that strategy. The other player must then have a high cost.

Non-Detailed monitoring
Another way to restrict the level of monitoring is to allow players to see only their total cost, without details regarding each resource. If all players see enough combinations of profiles, they will be able to deduce all the needed information about the costs of resources. The problem: there is no way to under-evaluate the costs of resources in a way that can be used to punish the deviator.

Example
Let us look at a game with 2 players that has 3 bundles A, B, C. Assume that player 1 has observed the costs in the table below.

Player 1's action | Player 2's action | Cost
A                 | C                 | C(A) = 1
B                 | A                 | C(B) = 1
C                 | B                 | C(C) = 1

The scenario is completely symmetric. (Figure: a possible assignment of costs.) All 3 symmetric assignments are also possible. So in fact, any resource can be valued at cost 0 (when one player visits it).

There is in fact no sure way to punish the deviating player with a constant pure strategy. If player 1 picks bundle α, the deviator can pick γ and gain without revealing new information. A pure strategy that does punish (or learn): select α, β, γ in sequence.

Conjecture: There is a pure-strategy learning equilibrium even in the non-detailed monitoring case.
Theorem: There is a mixed-strategy equilibrium in the case of non-detailed monitoring. The equilibrium strategy punishes by playing the Nash equilibrium of the known part of the game and, with some small probability, taking a random exploratory action.

Asymmetric Congestion Games
Theorem: There exists an asymmetric congestion game with no learning equilibrium (not even mixed). In fact, it is even an asymmetric resource selection game. (Figure: the bundles allowed for player 1 and for player 2; each player has a private resource of cost 0.5 or 1000, and the two players share one resource.)

If some player has a cost of 1000 on his private resource, his best option is to select the shared resource all the time. If the other player has 0.5 on his resource, his best choice is to play that resource all the time. If both players have 0.5, at least one of them pays more than 0.25. He can pretend to have 1000 on his resource and play the shared resource, to get a cost of 0.

This is a very interesting game. It is quite unclear how to play it rationally.

Strong Equilibria in repeated congestion games
There exists a (symmetric) resource selection game that has no strong equilibrium. Assumption: the deviators can correlate their actions. Observe the following game with 3 players. (Figure: the game's resources, with cost label 1 / 2 / 2.)

The total cost to all players is at least 5 in any profile. Any pair of players have a cost of at least 3. In any strategy profile there exist 2 players that each have a cost of 1.5 or more, and at least one pays strictly more. These 2 players can deviate, play on different resources, and get a payment of 1.5 each.
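The two numeric claims can be checked by enumeration. The figure is not available, so the sketch below ASSUMES the game has two identical resources, each costing 1 / 2 / 2 under load 1 / 2 / 3; this reading is consistent with the bounds stated above but is a reconstruction, and the resource names are hypothetical.

```python
from collections import Counter
from itertools import combinations, product

COST = [1, 2, 2]            # ASSUMPTION: cost under load 1, 2, 3
RESOURCES = ("r1", "r2")    # ASSUMPTION: two identical resources

def costs(profile):
    load = Counter(profile)
    return [COST[load[r] - 1] for r in profile]

profiles = list(product(RESOURCES, repeat=3))

# Smallest total cost over all profiles, and smallest combined cost of a pair.
min_total = min(sum(costs(p)) for p in profiles)
min_pair = min(costs(p)[i] + costs(p)[j]
               for p in profiles for i, j in combinations(range(3), 2))
print(min_total, min_pair)  # 5 3

# Two deviators (positions 0 and 1) on different resources pay 3 together,
# whatever the third player does.
deviating_pair = {sum(costs((a, b, c))[:2])
                  for a, b in (("r1", "r2"), ("r2", "r1")) for c in RESOURCES}
print(deviating_pair)  # {3}
```

By alternating which of the two deviators shares a resource with the third player, each deviator averages 3/2 = 1.5 per round.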