Moshe Tennenholtz, Aviv Zohar: Learning Equilibria in Repeated Congestion Games
The Nash equilibrium is an important concept. It exists for many reasonable games and provides a good recommendation for a group of players. However, it assumes that all players are fully aware of the game: given each other's strategies, they can compute a best response. Motivation
What about games with missing information? We show the existence of an equilibrium even if players learn the game while playing it, for a broad family of games: repeated symmetric congestion games. Our equilibrium will be in pure strategies (rare even for the regular Nash eq.). It will be very efficient: the total cost of all players will be minimal. The repeated game must be long enough. Motivation
A congestion game [Rosenthal] is defined by: a set of players N={1…n}; a set of resources R; a set of bundles each player i can pick; and a cost function for each resource r that depends only on the number of players that have picked this resource. This cost is applied to all players that use this resource. The strategy of a player is to pick a bundle. The cost to a player: the sum of all costs for his bundle. Congestion Games
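The definition above can be sketched in a few lines. This is a minimal illustration, not code from the talk: the representation (`cost[r][k - 1]` is the per-player cost of resource `r` when `k` players use it) and the two-resource example are assumptions made for the sketch.

```python
from collections import Counter

def player_costs(profile, cost):
    """Cost paid by each player in a congestion game.

    profile: one bundle (tuple of resources) per player.
    cost:    cost[r][k - 1] is the per-player cost of resource r under load k
             (hypothetical representation chosen for this sketch).
    """
    # Load of a resource = number of players whose bundle contains it.
    load = Counter(r for bundle in profile for r in bundle)
    # A player pays the sum of the costs of the resources in his bundle.
    return [sum(cost[r][load[r] - 1] for r in bundle) for bundle in profile]

# Hypothetical example: two resources, two players.
cost = {"a": [1, 3], "b": [2, 2]}
profile = [("a",), ("a", "b")]
print(player_costs(profile, cost))  # loads: a=2, b=1, so the costs are [3, 5]
```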
Number of players: n. Resources: the edges in a graph. Allowed bundles: all simple paths connecting S to T. Costs per edge: [Figure: graph from S to T with edge cost labels 0/1/2, 0/3/2, 0/4/4, 2/2/2, 0/0/1, 2/2/0.] Example of a Congestion Game
Theorem: Every congestion game is a potential game and thus has a pure Nash equilibrium. I.e., there exists a pure strategy profile (each player deterministically picks a bundle) such that no player wishes to change his bundle given the choices of the others. Def: A symmetric congestion game is a game where all players can pick the same set of bundles. Congestion Games
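The potential argument behind this theorem can be illustrated with best-response dynamics: every strictly improving move of a single player lowers Rosenthal's potential by exactly that player's saving, so the process must stop at a pure Nash equilibrium. The sketch below is an illustration under assumptions; the function names and the small resource selection game are hypothetical, and costs are stored as `cost[r][k - 1]` for load `k`.

```python
from collections import Counter

def loads(profile):
    """Number of players using each resource."""
    return Counter(r for bundle in profile for r in bundle)

def cost_of(i, profile, cost):
    """Cost paid by player i; cost[r][k - 1] is the per-player cost of r under load k."""
    load = loads(profile)
    return sum(cost[r][load[r] - 1] for r in profile[i])

def potential(profile, cost):
    """Rosenthal's potential: for each resource, add its costs for loads 1..current load."""
    return sum(sum(cost[r][:k]) for r, k in loads(profile).items())

def best_response_dynamics(profile, bundles, cost):
    """Let players switch to strictly cheaper bundles until nobody wants to move."""
    profile = list(profile)
    improved = True
    while improved:
        improved = False
        for i in range(len(profile)):
            for b in bundles:
                trial = profile[:i] + [b] + profile[i + 1:]
                if cost_of(i, trial, cost) < cost_of(i, profile, cost):
                    profile, improved = trial, True
    return profile

# Hypothetical symmetric game: 3 players, resources "x" and "y", bundles of size 1.
cost = {"x": [1, 2, 4], "y": [1, 3, 5]}
bundles = [("x",), ("y",)]
start = [("x",)] * 3
equilibrium = best_response_dynamics(start, bundles, cost)
print(equilibrium, potential(start, cost), potential(equilibrium, cost))
```

In this example one player moves from "x" to "y", saving 3, and the potential drops by the same amount.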
Def: A resource selection game is a symmetric congestion game where all possible bundles are of size 1. Example: identical processes running on machines. Every process chooses the machine to run on. Running time depends on the number of processes on that machine. Resource Selection Games
Assume that players repeatedly play some congestion game. The number of rounds T is finite. The cost of every resource is unknown initially. At every round, everyone picks a bundle and observes their own payment for each resource and the actions of all others. The total payment is the average over the rounds of the game. Is there an equilibrium of some sort? How efficient can we be? Our Setting
Lots of work in learning tries to converge to a Nash equilibrium of the game while learning it. The problem: the converging strategies themselves are not in equilibrium. There is also work on equilibria in fully known repeated games: folk theorems guarantee a wide variety of equilibria. However, it is less realistic to expect players to fully know the game. Previous (slightly related) work
Due to Brafman and Tennenholtz. A form of ex-post equilibrium. There is an unknown state of the world S (that affects the payments in the game). Def: Given an unknown repeated game, a strategy profile for the players is a learning equilibrium if no player wishes to change strategy, even if he knows the game. The Learning Equilibrium
Repeated 2 player games have a (mixed) learning equilibrium (if you can see the payment of the other player) [Brafman, Tennenholtz]. All repeated symmetric 2 player games have a (mixed) learning equilibrium. All repeated monotonic resource selection games have a (mixed) learning equilibrium [Ashlagi, Monderer, Tennenholtz]. Our result: a pure equilibrium in all repeated symmetric congestion games (no limit on number of players). Previous (more related) work
For the remainder of the talk, I will assume agents can communicate (Cheap talk) to coordinate through some channel. This assumption is not a must. Agents can communicate through the repeated game via the actions they take. Such signaling makes the proofs a bit more complex but has little effect on the game (provided that the game is long enough, and we communicate only a finite amount of data) Communication between agents
The cooperative solution is the best we can hope for. Denote its total cost by OPT. The cooperative solution: players play all combinations of bundles and learn the cost of each resource for any load; then they compute the optimal joint action; then they play the joint action while taking turns playing the different roles in it. Each player's average cost approaches OPT/n if the game is long enough. What can we hope to achieve?
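Once the costs are known, the last two steps of the cooperative solution can be sketched as follows. This is an illustration under assumptions: the optimal joint action is found by brute force (fine for tiny games only), the rotation scheme "player i takes role (i + t) mod n in round t" is one possible way to take turns, and the example game is hypothetical.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

def player_costs(profile, cost):
    """Per-player costs; cost[r][k - 1] is the per-player cost of r under load k."""
    load = Counter(r for bundle in profile for r in bundle)
    return [sum(cost[r][load[r] - 1] for r in bundle) for bundle in profile]

def optimal_joint_action(n, bundles, cost):
    """Brute-force search for a joint action that minimizes the total cost (OPT)."""
    return min(product(bundles, repeat=n), key=lambda p: sum(player_costs(p, cost)))

def rotation_average(profile, cost):
    """Average per-round cost of each player over one full rotation of the roles."""
    n = len(profile)
    per_role = player_costs(profile, cost)
    # In round t, player i plays role (i + t) mod n, so over n rounds he plays every role once.
    totals = [sum(per_role[(i + t) % n] for t in range(n)) for i in range(n)]
    return [Fraction(total, n) for total in totals]

# Hypothetical game with integer costs: 3 players, resources "x" and "y".
cost = {"x": [1, 2, 4], "y": [1, 3, 5]}
bundles = [("x",), ("y",)]
opt_profile = optimal_joint_action(3, bundles, cost)
opt = sum(player_costs(opt_profile, cost))
print(opt, rotation_average(opt_profile, cost))  # every player averages OPT/n
```

Rotating the roles is what makes the outcome fair: the joint action itself may treat players unequally, but over a full rotation each player pays exactly OPT/n on average.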
For any symmetric congestion game G with n players, and for every ε>0 there exists a number (of rounds) T such that the repeated game that has T rounds in which we play G in every round, has an ε-equilibrium in which each player suffers a cost of at most (OPT/n)+ε. An equivalent statement could be made about infinite games with discounted payments. (Discuss) Our Main Result – Exact Statement
The equilibrium strategy will consist of 3 behaviors. 1. Cooperative learning: Players play all combinations of bundles to learn the game. If some player deviates, start punishing; otherwise start playing optimally. 2. Playing optimally: Players play optimally, taking turns in different roles. If someone deviates, start punishing. 3. Punishment of a deviator: This is the tricky part. Proof
Let us start by assuming that the deviation occurs after the game has been learned. Assume w.l.o.g. that player n has deviated. A punishment strategy in this case: All n-1 honest players compute a Nash equilibrium for the congestion game G, while ignoring the n’th player. I.e., they consider the game as having only n-1 players. Note that the equilibrium always exists. How to Punish
Lemma: If all other players play a Nash eq. for n-1 players, the deviating player has a cost no lower than any other player. Proof: Assume, for contradiction, that some honest player i pays more than the deviator. Fix all other players. [Figure: the bundle played by the deviator and the bundle played by player i.] Effective Punishment
The total cost of player i was greater. The difference must come from resources that are not shared. Due to symmetry, player i could have picked the deviator's bundle and would gain by it (in the game of n-1 players, in which player n does not exist). This contradicts the fact that player i is playing a Nash eq. for n-1 players.
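The lemma can be checked numerically on a small game. The sketch below is an illustration under assumptions: the honest players find a pure Nash equilibrium of the (n-1)-player game by best-response dynamics (any method of computing it would do), the example game is hypothetical, and costs are stored as `cost[r][k - 1]` for load `k`.

```python
from collections import Counter

def costs(profile, cost):
    """Per-player costs; cost[r][k - 1] is the per-player cost of r under load k."""
    load = Counter(r for bundle in profile for r in bundle)
    return [sum(cost[r][load[r] - 1] for r in bundle) for bundle in profile]

def nash(n, bundles, cost):
    """A pure Nash equilibrium for n players, found by best-response dynamics."""
    profile = [bundles[0]] * n
    improved = True
    while improved:
        improved = False
        for i in range(n):
            for b in bundles:
                trial = profile[:i] + [b] + profile[i + 1:]
                if costs(trial, cost)[i] < costs(profile, cost)[i]:
                    profile, improved = trial, True
    return profile

def punish(n, bundles, cost):
    """For each bundle the deviator might pick: (bundle, deviator cost, highest honest cost)."""
    honest = nash(n - 1, bundles, cost)  # the honest players ignore the deviator
    outcomes = []
    for b in bundles:
        c = costs(honest + [b], cost)    # the deviator is the last player
        outcomes.append((b, c[-1], max(c[:-1])))
    return outcomes

# Hypothetical game: 3 players (player 3 deviates), resources "x" and "y".
cost = {"x": [1, 2, 4], "y": [1, 3, 5]}
bundles = [("x",), ("y",)]
for bundle, deviator_cost, worst_honest_cost in punish(3, bundles, cost):
    print(bundle, deviator_cost, worst_honest_cost)
```

Whatever bundle the deviator picks, his cost is at least the cost of every honest player, which is the statement of the lemma.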
Our proof so far relied on full knowledge of the game. If the game is unknown, we cannot compute the Nash eq. of n-1 players. In case of missing knowledge, we will optimistically under-evaluate the costs. Punishing when information is missing
Now, players will use these under-evaluated costs when they try to punish a deviator. They will compute the Nash equilibrium of the congestion game with n-1 players whose resource costs are given by these under-evaluated values, and they will repeatedly play this equilibrium.
Now, again, assume that during some round the deviator pays less than some player i. The difference must come from resources they did not have in common. But then, why didn't player i switch to the bundle the deviator has? There can be only one reason: He under-evaluated his own bundle. Therefore, he must have observed something new.
At every round of punishment, at least one of two things must happen: 1. One of the players learns a previously unknown value 2. or, the deviator has a higher cost than any other player. Once a player learns a new value, he will broadcast it to the other honest players. This way, they have common knowledge of the values found and can continue to compute the Nash eq. strategy. The modified Lemma
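The modified lemma can be simulated. The sketch below rests on assumptions that are not in the talk: costs are non-negative, an unobserved cost is under-evaluated as 0, the believed equilibrium is found by best-response dynamics, and the example game and the deviator's moves are hypothetical. `known` plays the role of the values the honest players have broadcast to each other.

```python
from collections import Counter

def loads(profile):
    return Counter(r for bundle in profile for r in bundle)

def believed(known, r, k):
    """Optimistic under-evaluation (assumption): an unobserved cost counts as 0."""
    return known.get((r, k), 0)

def believed_cost(i, profile, known):
    load = loads(profile)
    return sum(believed(known, r, load[r]) for r in profile[i])

def believed_nash(n, bundles, known):
    """Pure Nash eq. of the n-player game with the under-evaluated costs."""
    profile = [bundles[0]] * n
    improved = True
    while improved:
        improved = False
        for i in range(n):
            for b in bundles:
                trial = profile[:i] + [b] + profile[i + 1:]
                if believed_cost(i, trial, known) < believed_cost(i, profile, known):
                    profile, improved = trial, True
    return profile

def punishment_round(n, bundles, true_cost, known, deviator_bundle):
    """Play one punishment round; return (true costs paid, whether something was learned)."""
    honest = believed_nash(n - 1, bundles, known)
    profile = honest + [deviator_bundle]  # the deviator is the last player
    load = loads(profile)
    paid = [sum(true_cost[r][load[r] - 1] for r in b) for b in profile]
    # Honest players observe the costs on their own resources and broadcast new values.
    new = {(r, load[r]): true_cost[r][load[r] - 1]
           for b in honest for r in b if (r, load[r]) not in known}
    known.update(new)
    return paid, bool(new)

# Hypothetical game: true costs are hidden from the players at the start.
true_cost = {"x": [1, 2, 4], "y": [1, 3, 5]}
bundles = [("x",), ("y",)]
known = {}
history = []
for deviator_bundle in [("y",), ("y",), ("x",), ("x",), ("y",), ("x",)]:
    paid, learned = punishment_round(3, bundles, true_cost, known, deviator_bundle)
    history.append((paid, learned))
    print(paid, learned)
```

In every round of this run, either an honest player learns a new value or the deviator pays at least as much as every honest player. Since there are finitely many (resource, load) pairs, the first case can occur only finitely often.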
If no one deviates, players spend a finite amount of time learning the game, and then play optimally. If the game is long enough, each of them pays at most OPT/n+ε/2. If one player deviates, he can do better than the other players only for a finite number of rounds. For the rest of the game he pays at least as much as every other player, and therefore at least OPT/n. So if the game is long enough, his gains in the finite number of rounds are dwarfed by this cost. He gains at best some small ε. Proving the Theorem
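The counting behind this argument can be written out. The following is a sketch under assumptions that are not on the slide: A and B are hypothetical constants independent of T, where A bounds the total extra cost a player incurs on the honest path beyond T·OPT/n (learning rounds and an incomplete final rotation), and B bounds the total amount by which a deviator can undercut T·OPT/n.

```latex
% Average cost of a player who follows the prescribed strategy:
\[
  c_{\mathrm{honest}} \;\le\; \frac{T\cdot\frac{OPT}{n} + A}{T}
  \;=\; \frac{OPT}{n} + \frac{A}{T}.
\]
% Average cost of a deviator:
\[
  c_{\mathrm{dev}} \;\ge\; \frac{T\cdot\frac{OPT}{n} - B}{T}
  \;=\; \frac{OPT}{n} - \frac{B}{T}.
\]
% Gain from deviating:
\[
  c_{\mathrm{honest}} - c_{\mathrm{dev}} \;\le\; \frac{A + B}{T} \;\le\; \varepsilon
  \qquad\text{whenever}\qquad T \;\ge\; \frac{A + B}{\varepsilon}.
\]
```

Taking T at least 2A/ε as well gives the honest-path bound of OPT/n+ε/2.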
What players can observe during the game is critical. The theorem also holds for weaker levels of monitoring. E.g., let us now assume that players see the actions of other players only where they select the same resources that they have. Can we still detect deviations, punish and coordinate? One of the main problems is communication. Players can still signal, but no longer broadcast to all others at the same time (unless they are on the same resource). Imperfect monitoring
Assume some honest player observes some other player deviating from the proposed strategy. He has to bring this to the attention of the other players. He does so by deviating himself, and notifying some of the others. They in turn deviate and notify others, etc. After every player has seen some other player deviate, we have to find out who to punish, and how. Imperfect Monitoring
Each player will signal which other player he has seen deviating, and when this deviation occurred. Everyone suspects the player who has been reported as the deviator in the earliest round. [Figure: timeline of rounds T, T+1, T+2 with the actual deviator marked.] Blaming others.
But the deviator may also lie. To throw off the blame, he can try to say that he saw someone else deviate in an earlier round. So the players must suspect: the earliest reported deviator, and the player that reported him. [Figure: timeline of rounds T-1, T, T+1, T+2 with the actual deviator marked.] Blaming others.
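The suspect rule can be sketched as a small function. The report format `(reporter, accused, round)` and the player numbers are hypothetical, and ties between equally early reports are not handled in this sketch.

```python
def suspects(reports):
    """Return the two suspects: the earliest reported deviator and the player who reported him.

    reports: list of (reporter, accused, round) tuples (hypothetical format).
    """
    reporter, accused, _round = min(reports, key=lambda report: report[2])
    return {accused, reporter}

# Player 3 actually deviates in round 10 and is seen by players 1 and 2.
honest_reports = [(1, 3, 10), (2, 3, 10)]
print(suspects(honest_reports))

# Player 3 lies and claims to have seen player 2 deviate in round 9.
reports_with_lie = honest_reports + [(3, 2, 9)]
print(suspects(reports_with_lie))
```

In both cases the actual deviator is one of the two suspects: a lie can move the earliest report to an innocent player, but then the liar is suspected as the reporter.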
So how can we punish in this case? Note that the identity of the deviator is important. All other players need to compute a Nash eq. for n-1 players, and play it. Each bundle in the Nash eq. has to be picked by one of the players. Solution: tell both suspect players to pick the same bundle (that is part of the Nash eq.). At least one of them is honest, and will play that strategy. The other player must have a high cost. How to Punish
Another way to restrict the level of monitoring is to allow players to see only their total cost, without details regarding each resource. If all players see enough combinations of profiles, they will be able to deduce all the needed information about the cost of resources. The problem: There is no way to under-evaluate the costs of resources in a way that can be used to punish the deviator. Non-Detailed monitoring
Let us look at a game with 2 players that has 3 bundles A, B, C. Assume that player 1 has observed the costs in the table below. Example
Player 1's action | Player 2's action | Cost
A | C | C(A)=1
B | A | C(B)=1
C | B | C(C)=1
The scenario is completely symmetric. A possible assignment of costs: [Figure: one cost assignment consistent with the observations.] All 3 symmetric assignments are also possible. So in fact, any resource can be valued at cost 0 (when one player visits it).
There is in fact no sure way to punish the deviating player with a constant pure strategy. If player 1 picks bundle α, the deviator can pick γ and gain without revealing new information. A pure strategy that does punish (or learn): select α, β, γ in sequence.
Conjecture: There is a pure strategy learning equilibrium even in the non-detailed monitoring case. Theorem: there is a mixed strategy equilibrium in the case of non detailed monitoring. The equilibrium strategy punishes by playing the Nash equilibrium of the known part of the game, and with some small probability does a random exploratory action.
Theorem: There exists an asymmetric congestion game with no learning equilibrium (not even mixed). In fact, it is even an asymmetric resource selection game. [Figure: the bundles allowed for player 1 and for player 2; each player has a private resource whose cost is 0.5 or 1000, and both can use a shared resource.] Asymmetric Congestion Games
If some player has a cost of 1000 on his private resource, his best option is to select the shared resource all the time. If the other player has 0.5 on his resource, his best choice is to play that resource all the time. If both players have 0.5, at least one of them pays more than 0.25. He can pretend to have 1000 on his resource, and play the shared resource, to get a cost of 0.
This is a very interesting game. It is quite unclear how to play it rationally.
There exists a (symmetric) resource selection game that has no strong equilibrium. Assumption: the deviators can correlate their actions. Observe the following game with 3 players: [Figure: resource costs 1 / 2 / 2.] Strong Equilibria in repeated congestion games
The total cost to all players is at least 5 in any profile. Any pair of players has a total cost of at least 3. In any strategy profile there exist 2 players that each have a cost of 1.5 or more, and at least one pays strictly more. These 2 players can deviate, play on different resources, and get a payment of 1.5 each.
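The bounds quoted here can be checked by brute force. The figure did not survive, so the game below is an assumed reading of it: two identical resources, here called "A" and "B", each with per-player cost 1, 2, 2 under loads 1, 2, 3. This reading reproduces the bounds 5 and 3, but it remains an assumption.

```python
from collections import Counter
from itertools import combinations, product

COST = [1, 2, 2]  # assumed per-player cost of a resource under loads 1, 2, 3

def costs(profile):
    """Per-player costs when player i uses resource profile[i]."""
    load = Counter(profile)
    return [COST[load[r] - 1] for r in profile]

profiles = list(product("AB", repeat=3))

# Lowest total cost over all single-round profiles.
min_total = min(sum(costs(p)) for p in profiles)

# Lowest combined cost of any pair of players over all single-round profiles.
min_pair = min(costs(p)[i] + costs(p)[j]
               for p in profiles for i, j in combinations(range(3), 2))

# Two deviators on different resources: their combined cost for each choice of the third player.
deviating_pair = [sum(costs(("A", "B", third))[:2]) for third in "AB"]

print(min_total, min_pair, deviating_pair)
```

A deviating pair that splits across the two resources pays 3 in total whatever the third player does, so by alternating roles each of the two averages 1.5 over the repeated game.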