When is it Best to Best-Reply? Michael Schapira (Yale University and UC Berkeley) Joint work with Noam Nisan (Hebrew U), Gregory Valiant (UC Berkeley) and Aviv Zohar (Hebrew U)
Motivation: Internet Routing Establish routes between Autonomous Systems (ASes). Currently handled by the Border Gateway Protocol (BGP). [Figure: example ASes AT&T, Qwest, Comcast, Sprint]
Internet Routing as a Game [Levin-S-Zohar] Internet routing is a game! –players = ASes –players’ types = preferences over routes –strategies = routes BGP = Best-Response Dynamics –each AS constantly selects its best available route to each destination –… until a “stable state” (= PNE) is reached.
But… Challenge I: No synchronization of players’ actions –players can best-reply simultaneously. –players can best-reply based on outdated information. –When is BGP guaranteed to converge to a stable state? Challenge II: Are players incentivized to follow best-response dynamics? –Can an AS gain from not executing BGP?
Agenda Mechanism design approach to best-response dynamics. (main focus of this talk) Convergence of best-response dynamics in asynchronous environments. [Jaggard-S-Wright] (if time permits)
Main Questions When is myopic best-replying also good in the long run? When can stable outcomes be implemented in partial-information settings? Can we reason about partial-information settings via complete-information games?
Our Results Have Implications For Internet protocols –Internet routing (BGP), congestion control (TCP) Auctions –1st-price auctions, unit-demand auctions, GSP Matching –correlated markets, interns and hospitals Cost-sharing mechanisms –Moulin mechanisms, …
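1st-Price Auction
Alice (v_a = 4) bids along the columns, Bob (v_b = 2) bids along the rows. Each cell shows winner:utility.
Bids | 0    | 1    | 2    | 3    | 4   | 5
0    | B:2  | A:3  | A:2  | A:1  | A:0 | A:-1
1    | B:1  | B:1  | A:2  | A:1  | A:0 | A:-1
2    | B:0  | B:0  | B:0  | A:1  | A:0 | A:-1
3    | B:-1 | B:-1 | B:-1 | B:-1 | A:0 | A:-1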
Ascending-Price English Auction [Bid table for Alice (v_a = 4) and Bob (v_b = 2), cells showing winner:utility, as on the 1st-Price Auction slide]
Best-Reply (with some tie-breaking) [Bid table for Alice (v_a = 4) and Bob (v_b = 2), cells showing winner:utility, as on the 1st-Price Auction slide]
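A minimal sketch of round-robin best-reply dynamics in this auction may help make the slide concrete. It rests on assumptions that are not stated on the slide: integer bids (0 to 5 for Alice, 0 to 3 for Bob), ties in the auction going to Bob, and ties among best replies broken toward the highest bid. The names `utility`, `best_reply`, and `best_response_dynamics` are hypothetical.

```python
VALUES = {"A": 4, "B": 2}                      # v_a = 4, v_b = 2 (from the slide)
BIDS = {"A": range(0, 6), "B": range(0, 4)}    # assumed integer bid grids


def utility(player, bids):
    """First-price rule: the winner pays its own bid, the loser gets 0.
    Assumption: Alice wins only with a strictly higher bid (ties go to Bob)."""
    winner = "A" if bids["A"] > bids["B"] else "B"
    return VALUES[player] - bids[player] if player == winner else 0


def best_reply(player, bids):
    """Utility-maximizing bid against the other player's current bid.
    Assumption: ties among best replies are broken toward the highest bid."""
    return max(BIDS[player],
               key=lambda b: (utility(player, {**bids, player: b}), b))


def best_response_dynamics(start, max_steps=100):
    """Activate Alice and Bob in round-robin order until neither changes its bid."""
    bids, unchanged = dict(start), 0
    for step in range(max_steps):
        player = "AB"[step % 2]
        new_bid = best_reply(player, bids)
        unchanged = unchanged + 1 if new_bid == bids[player] else 0
        bids[player] = new_bid
        if unchanged == 2:          # a full round without changes: stable state
            return bids, step + 1
    return None, max_steps


final_bids, steps = best_response_dynamics({"A": 0, "B": 0})
print(final_bids, steps)
```

Under these assumptions, the run from (0, 0) climbs like an ascending auction and settles with Alice bidding 3 and Bob bidding 2. A different tie-breaking rule can change the behavior, which is why the slide stresses tie-breaking.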
The Model n players Player i has –action set A_i –(private) type t_i ∈ T_i –utility function u_i
The Model: Dynamic Interaction Discrete time steps. Initial action profile a^0. One player is activated in each time step –round-robin (cyclic) order –our results are independent of the order (and also hold for asynchronous environments) Players’ strategies specify which actions are selected in each time step. –can be history-dependent Best-response dynamics = the strategy profile in which each player constantly best-replies to others’ actions
Two Possible Payoff Models Cumulative model (more natural, but sometimes too restrictive) –Payoffs are accumulated –Alternative formulation with discount factors Payoff at the limit (weaker: actively discourages oscillations; interesting applications) –If the dynamics converges to a stable outcome a*, the payoff is the utility from a* –If there is no convergence, the resulting payoff is low.
Solution Concept A strategy profile is an ex-post Nash equilibrium if no player wishes to deviate from it, regardless of the types (this is essentially the best possible in a distributed environment [Shneidman-Parkes])
Best-Replying is Not Always Best
Payoff entries are (row player, column player).
Row Player: Type 1
2,1 | 0,0
3,0 | 1,3
Row Player: Type 2
3,1 | 1,0
2,0 | 0,3
Properties: dominance-solvable; potential game; unique and Pareto optimal PNE
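The following sketch illustrates why best-replying can fail to be best here, in the "payoff at the limit" model. It assumes the two 2x2 matrices are read as (row utility, column utility) with two rows and two columns each, that the column player always best-replies, and that play starts in the top-left cell. The function names are hypothetical.

```python
# Payoff entries are (row utility, column utility); indexing is [row][column].
GAME = {
    1: [[(2, 1), (0, 0)], [(3, 0), (1, 3)]],   # Row Player: Type 1
    2: [[(3, 1), (1, 0)], [(2, 0), (0, 3)]],   # Row Player: Type 2
}


def row_best_reply(game, col):
    """Row action maximizing the row player's utility against column `col`."""
    return max((0, 1), key=lambda r: game[r][col][0])


def col_best_reply(game, row):
    """Column action maximizing the column player's utility against row `row`."""
    return max((0, 1), key=lambda c: game[row][c][1])


def limit_payoffs(game, row_strategy, rounds=10):
    """Alternate row and column moves; return the payoffs of the final profile."""
    row, col = 0, 0                      # assumed initial action profile
    for _ in range(rounds):
        row = row_strategy(game, col)
        col = col_best_reply(game, row)
    return game[row][col]


# A Type 1 row player who best-replies ends at the unique PNE (bottom, right).
honest = limit_payoffs(GAME[1], row_best_reply)
# The same player stubbornly playing the top row (as Type 2 would) does better.
deviant = limit_payoffs(GAME[1], lambda game, col: 0)
print(honest, deviant)
```

With these matrices the best-replying Type 1 row player ends with utility 1, while the deviation that always plays the top row ends with utility 2, so best-response dynamics is not an ex-post Nash equilibrium in this game.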
When is it Good to Best-Reply? Goal: identify a class of games in which best-response dynamics is an ex-post Nash equilibrium. –i.e., best-replying is incentive-compatible –close in spirit to “learning equilibria” [Brafman-Tennenholtz] This class is going to be VERY restricted. Still… a variety of mechanisms/protocols. Remark: The best replies are not always unique. Thus, we must handle tie-breaking.
One Class of Games Lemma: If each realization of types yields a game in which each player has a single dominant strategy, then best-response dynamics is an ex-post Nash equilibrium.
On the Other Hand… 9,09,01,11,11,31,3 10,00,20,20,10,1 0,10,10,3 9,09,01,21,21,11,1 Row Player: Type 1 Row Player: Type 2 No player has a dominant strategy (in both realizations). Best-response dynamics is an ex-post Nash equilibrium. This game is blindly solvable.
Blindly-Dominated Strategy Sets T
Blindly-Solvable Games Defn: A game is blindly-solvable if iterated elimination of blindly-dominated strategy sets results in a single strategy profile. –Observation: the “surviving” strategy profile is the unique PNE of the game. Defn: A partial-information game is blindly-solvable if every realization of types yields a blindly-solvable game.
1st-Price Auctions Revisited [Bid table for Alice (v_a = 4) and Bob (v_b = 2), cells showing winner:utility, as on the 1st-Price Auction slide]
Merits of Blindly-Solvable Games Thm: Let G be a blindly-solvable partial-information game. Let a* be the surviving strategy profile. Then, 1. Best-response dynamics converges to a* within n(Σ_j |A_j|) time steps. 2. In the “payoff at the limit” model, best-response dynamics is incentive-compatible, and even collusion-proof, in ex-post Nash.
Intuition for Proof of (2) The first action that was not “eliminated” in the elimination sequence of G must belong to a manipulator. The manipulator’s utility from that action is lower than his utility from a*.
Best-Response 1st-Price Auction Mechanism [Bid table for Alice (v_a = 4) and Bob (v_b = 2), cells showing winner:utility, as on the 1st-Price Auction slide]
Implications for Internet Environments Under realistic conditions routing with the Border Gateway Protocol is incentive compatible. [Levin-S-Zohar] Convergence and incentive compatibility results for congestion control. [Godfrey-S-Zohar-Shenker] Mechanism design without money!
Generalized 2nd-Price Auction (GSP) Used for selling ads on search engines. k slots. Each slot j has its own click-through rate. Users submit per-click bids b_i and are ranked in order of bids. If an ad is clicked, its owner pays the next-highest bid.
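The GSP allocation and pricing rule described above can be sketched as follows. This is only an illustration: the function name `gsp`, the example bids, and the click-through rates are hypothetical, and a last-ranked advertiser with nobody below is assumed to pay 0 per click.

```python
def gsp(bids, click_through_rates):
    """Rank advertisers by bid; the advertiser in each slot pays, per click,
    the next-highest bid (assumed 0 if nobody is ranked below).

    bids: dict mapping advertiser -> per-click bid
    click_through_rates: list of slot click-through rates, best slot first
    Returns a list of (advertiser, slot click-through rate, price per click).
    """
    ranked = sorted(bids, key=bids.get, reverse=True)
    outcome = []
    for slot, rate in enumerate(click_through_rates):
        if slot >= len(ranked):
            break                                   # more slots than advertisers
        below = ranked[slot + 1] if slot + 1 < len(ranked) else None
        price = bids[below] if below is not None else 0
        outcome.append((ranked[slot], rate, price))
    return outcome


# Hypothetical example: three advertisers competing for k = 2 slots.
result = gsp({"x": 5, "y": 3, "z": 1}, [0.5, 0.2])
print(result)
```

Here advertiser x takes the top slot at a price of 3 per click and y takes the second slot at a price of 1 per click; z gets no slot.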
Generalized 2nd-Price Auction (GSP) No dominant strategy equilibrium. There exists an equilibrium with VCG payments. [Edelman-Ostrovsky-Schwarz, Varian] Best-response dynamics (with tie-breaking) converge with probability 1 to that equilibrium. [Cary et al.] Thm (informal): Best-replying in GSP is incentive-compatible. –Generalizes the English auction of [Edelman-Ostrovsky-Schwarz]
Auctions With Unit-Demand Bidders n bidders. m items. Each bidder i has value v_{i,j} for each item j, and is interested in at most one item. Thm: There exists a best-response mechanism for auctions with unit-demand bidders that is incentive-compatible in ex-post Nash and converges to the VCG outcome. –Generalizes the English auction of [Demange-Gale-Sotomayor] The proof of incentive-compatibility is simple. The proof of convergence is more complex and is based on Kuhn’s Hungarian method.
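To make the target of the theorem concrete, here is a brute-force sketch of the VCG outcome for unit-demand bidders. It is not the best-response mechanism from the talk and does not use the Hungarian method; it simply enumerates assignments, so it only suits tiny instances. The names `max_welfare` and `vcg` and the example values are hypothetical, and values are assumed non-negative.

```python
from itertools import permutations


def max_welfare(values, bidders):
    """Welfare-maximizing assignment of distinct items to the given bidders.
    values[i][j] is bidder i's value for item j; None means 'no item'."""
    if not bidders:
        return 0, {}
    n_items = len(values[0])
    options = list(range(n_items)) + [None] * len(bidders)
    best_welfare, best = -1, None
    for perm in permutations(options, len(bidders)):
        welfare = sum(values[i][j] for i, j in zip(bidders, perm) if j is not None)
        if welfare > best_welfare:
            best_welfare, best = welfare, dict(zip(bidders, perm))
    return best_welfare, best


def vcg(values):
    """VCG allocation and payments: each bidder pays the welfare loss it
    imposes on the other bidders."""
    bidders = list(range(len(values)))
    total, assignment = max_welfare(values, bidders)
    payments = []
    for i in bidders:
        others = [k for k in bidders if k != i]
        welfare_without_i, _ = max_welfare(values, others)
        own = values[i][assignment[i]] if assignment[i] is not None else 0
        payments.append(welfare_without_i - (total - own))
    return assignment, payments


# Hypothetical instance: 2 bidders, 2 items.
assignment, payments = vcg([[5, 2], [4, 3]])
print(assignment, payments)
```

In this instance bidder 0 receives item 0 and pays 1, and bidder 1 receives item 1 and pays 0.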
Centralized vs. Distributed Centralized: players declare types; the mechanism simulates the interaction and outputs the outcome. Distributed: players reach a stable outcome in a distributed manner. An ex-post equilibrium in the decentralized setting gives a dominant strategy implementation in the centralized setting.
The Centralized Setting Each player i has an action set A_i, a private type t_i, and a utility function u_i (as before). Wanted: a direct revelation mechanism that outputs a pure Nash equilibrium of the game and incentivizes truthfulness.
Clearly, This is Not Always Possible
Payoff entries are (row player, column player).
Row Player: Type 1
2,1 | 0,0
3,0 | 1,3
Row Player: Type 2
3,1 | 1,0
2,0 | 0,3
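A small sketch of why truthfulness fails in this example: a direct revelation mechanism that outputs the pure Nash equilibrium of the reported game can be manipulated by the row player. The matrices are read as (row utility, column utility), and the names `pure_nash`, `mechanism`, and `row_utility` are hypothetical.

```python
# Payoff entries are (row utility, column utility); indexing is [row][column].
GAME = {
    1: [[(2, 1), (0, 0)], [(3, 0), (1, 3)]],   # Row Player: Type 1
    2: [[(3, 1), (1, 0)], [(2, 0), (0, 3)]],   # Row Player: Type 2
}


def pure_nash(game):
    """All pure Nash equilibria of a 2x2 game, as (row, column) pairs."""
    equilibria = []
    for r in (0, 1):
        for c in (0, 1):
            row_ok = all(game[r][c][0] >= game[r2][c][0] for r2 in (0, 1))
            col_ok = all(game[r][c][1] >= game[r][c2][1] for c2 in (0, 1))
            if row_ok and col_ok:
                equilibria.append((r, c))
    return equilibria


def mechanism(reported_type):
    """Direct revelation: output the (here unique) PNE of the reported game."""
    return pure_nash(GAME[reported_type])[0]


def row_utility(true_type, reported_type):
    """Row player's true utility from the outcome chosen for its report."""
    r, c = mechanism(reported_type)
    return GAME[true_type][r][c][0]


truthful = row_utility(true_type=1, reported_type=1)
misreport = row_utility(true_type=1, reported_type=2)
print(truthful, misreport)
```

A Type 1 row player gets utility 1 when reporting truthfully and utility 2 when reporting Type 2, so the mechanism is not truthful for this game.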
Corollary I If every player has a single dominant strategy in every realization, then the direct-revelation mechanism is truthful. –Give each player his dominant strategy in the reported realization.
Corollary II If the game is blindly solvable, then the direct-revelation mechanism is truthful. 9,09,01,11,11,31,3 10,00,20,20,10,1 0,10,10,3 9,09,01,21,21,11,1 Row Player: Type 1 Row Player: Type 2
More Blindly-Solvable Games Cost-Sharing mechanisms –Moulin mechanisms [Moulin, Moulin-Shenker] –Acyclic mechanisms [Mehta-Roughgarden-Sundararajan] Matching games –Interns and Hospitals –Correlated two-sided markets
Directions for Future Research Implementability of other kinds of equilibria (mixed Nash, correlated, …)? Incentive-compatibility of other kinds of dynamics (fictitious play, regret minimization)?
Agenda Part I: mechanism design approach to best-response dynamics. Part II: on the convergence of best-response dynamics in asynchronous environments. Best-Response Dynamics Out of Sync
Synchronous Environments In traditional best-response dynamics players are activated one at a time. More generally, the study of game dynamics normally supposes synchrony. What if the interaction between players is asynchronous? (Internet, markets)
Illustration Row Player vs. Column Player; payoff entries: 2,1  0,0  1,2  0,0
But… Row Player vs. Column Player; payoff entries: 2,1  0,0  1,2  0,0
Model for Analyzing Asynchronous Best-Response Dynamics Infinite sequence of discrete time-steps. In each time-step a subset of the players best-replies. The “schedule” is chosen by an adversarial entity (“the Scheduler”). The schedule must be fair (no player is indefinitely “starved” from best-replying).
Result [Jaggard-S-Wright] Thm: If two (or more) pure Nash equilibria exist in a game, then asynchronous best-reply dynamics can potentially oscillate. Implications for Internet protocols, diffusion of innovations in social networks, and more.
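The oscillation can be seen in a small simulation. The sketch below assumes a 2x2 game with the payoffs 2,1 and 1,2 on the diagonal and 0,0 elsewhere (an assumed layout, giving two pure Nash equilibria) and compares a schedule that activates both players in every step with one that activates them one at a time. Both schedules are fair. The names `best_reply` and `run` are hypothetical.

```python
# Assumed layout: (2,1) and (1,2) on the diagonal, (0,0) off the diagonal.
# Entries are (row utility, column utility); indexing is [row][column].
GAME = [[(2, 1), (0, 0)], [(0, 0), (1, 2)]]


def best_reply(player, state):
    """Best reply of `player` ("row" or "col") to the other player's action."""
    row, col = state
    if player == "row":
        return max((0, 1), key=lambda r: GAME[r][col][0])
    return max((0, 1), key=lambda c: GAME[row][c][1])


def run(state, schedule):
    """Each schedule entry is the set of players activated in that time-step.
    All activated players reply to the same (previous) action profile."""
    trace = [state]
    for active in schedule:
        row, col = state
        new_row = best_reply("row", state) if "row" in active else row
        new_col = best_reply("col", state) if "col" in active else col
        state = (new_row, new_col)
        trace.append(state)
    return trace


# Both players best-reply simultaneously in every step: the dynamics oscillate.
simultaneous = run((0, 1), [{"row", "col"}] * 6)
# Players activated one at a time: the dynamics reach a pure Nash equilibrium.
sequential = run((0, 1), [{"row"}, {"col"}] * 3)
print(simultaneous)
print(sequential)
```

Starting from the off-diagonal profile, the simultaneous schedule flips between the two off-diagonal profiles forever, while the one-at-a-time schedule settles at an equilibrium.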
Directions for Future Research Characterization of games for which asynchronous best-response dynamics converge. More generally, exploring game dynamics in the realm that lies beyond synchronization (fictitious play, regret minimization).