Best Reply Mechanisms Justin Thaler and Victor Shnayder.


What are best-reply dynamics? Start with an arbitrary strategy profile. In each step, let some player switch his strategy to a best reply to the current strategies of the others.

What are best-reply dynamics? Definition: A repeated-reply mechanism for a private-information game G is an extensive-form game with perfect recall (same players) that runs for at most M steps. In each step, a single player i announces an element of $A_i$; players play in round-robin order. The mechanism stops when all players “pass” in n consecutive steps, and then enforces the action profile of the most recently announced actions. If M steps go by without stopping, the players are penalized.

What are best-reply dynamics? The penalty is needed to ensure that non-convergence is not in the best interest of any player; this is a realistic modeling assumption for BGP, TCP, etc. Best-reply dynamics is the strategy profile of a repeated-reply mechanism in which each player i updates to i’s best reply to the other players’ strategies each time it is i’s turn.
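The round-robin procedure defined above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the payoff matrix is the 2x2 game that appears later as Example 1 (read here as rows (5,5),(0,0) and (10,0),(4,4)), `max_steps` stands in for the bound M, and the names `PAYOFFS`, `utilities`, and `best_reply_dynamics` are hypothetical.

```python
from typing import List, Tuple

# PAYOFFS[r][c] = (row player's utility, column player's utility).
# Assumption: the 2x2 game of Example 1, read as rows
# (5,5),(0,0) and (10,0),(4,4).
PAYOFFS: List[List[Tuple[int, int]]] = [
    [(5, 5), (0, 0)],
    [(10, 0), (4, 4)],
]


def utilities(player: int, profile: List[int]) -> List[int]:
    """Utility of each of `player`'s strategies, holding the other player fixed."""
    if player == 0:
        return [PAYOFFS[r][profile[1]][0] for r in range(len(PAYOFFS))]
    return [PAYOFFS[profile[0]][c][1] for c in range(len(PAYOFFS[0]))]


def best_reply_dynamics(start: Tuple[int, int], max_steps: int = 20):
    """Round-robin best replies; stop once both players pass consecutively.

    Returns (final profile, converged flag). Reaching max_steps without
    stopping corresponds to the penalty case in the definition.
    """
    profile = list(start)
    n_players, passes = 2, 0
    for step in range(max_steps):
        player = step % n_players
        utils = utilities(player, profile)
        if utils[profile[player]] == max(utils):
            passes += 1  # current strategy is already a best reply: "pass"
        else:
            profile[player] = utils.index(max(utils))
            passes = 0
        if passes == n_players:
            return tuple(profile), True
    return tuple(profile), False


print(best_reply_dynamics((0, 0)))  # ((1, 1), True)
```

Starting from the top-left cell, the row player first switches to the bottom row, the column player then switches to the right column, and both pass, so the dynamics stop at the (4, 4) outcome.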

Why best-reply dynamics? If convergence occurs, we have a highly justifiable Nash equilibrium. Computationally simple. Players only need private information. Feasible in distributed, asynchronous settings. Prescribed by existing protocols (e.g., BGP).

Why best-reply dynamics? In light of Theorems 1 and 2 (which we’ll see soon): often gives a non-VCG way of creating incentive-compatible mechanisms (?), and sometimes without money. Often get collusion-proofness and Pareto efficiency.

Outline: When do best-reply dynamics work? Universal max-solvability (UMS). Thm: UMS implies convergence to a unique NE and collusion-proofness. Example applications (correlated markets, BGP, etc.). Connections to strategy-proofness. Discussion.

Universal max-dominance. A subset T of S is universally max-dominated if: [defining condition not recovered from the slide]. Very strong condition! Existence of a max-dominated set is strictly stronger than existence of a dominated strategy, i.e., that there exist $s_i, s_i'$ such that $u_i(s_i, s_{-i}) < u_i(s_i', s_{-i})$ for all $s_{-i}$.

Universal max-solvability (UMS). A game G is universally max-solvable if we can iteratively remove universally max-dominated strategy sets and arrive at a single strategy for each player. This is a stronger condition than solvability by iterated removal of strictly dominated strategies (IRSDS).

Example 1

     5, 5      0, 0
    10, 0      4, 4

Solvable by IRSDS, but not UMS. Neither player has a universally max-dominated set. Note that the unique NE is not PE, and best-reply dynamics are not incentive compatible for the row player.
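The IRSDS claim for Example 1 can be checked mechanically. The sketch below assumes the payoff matrix reads as rows (5,5),(0,0) and (10,0),(4,4); the function names `irsds` and `pareto_dominators` are hypothetical helpers, not anything from the talk. It does not test UMS, only iterated strict dominance and Pareto domination.

```python
# EXAMPLE_1[r][c] = (row player's utility, column player's utility).
# Assumption: this is the intended reading of the Example 1 matrix.
EXAMPLE_1 = [
    [(5, 5), (0, 0)],
    [(10, 0), (4, 4)],
]


def irsds(payoffs):
    """Iteratively remove strictly dominated rows and columns.

    Returns the indices of the surviving rows and columns.
    """
    rows = list(range(len(payoffs)))
    cols = list(range(len(payoffs[0])))
    changed = True
    while changed:
        changed = False
        for r in list(rows):
            # r is strictly dominated if some other surviving row beats it
            # against every surviving column.
            if any(all(payoffs[r2][c][0] > payoffs[r][c][0] for c in cols)
                   for r2 in rows if r2 != r):
                rows.remove(r)
                changed = True
        for c in list(cols):
            if any(all(payoffs[r][c2][1] > payoffs[r][c][1] for r in rows)
                   for c2 in cols if c2 != c):
                cols.remove(c)
                changed = True
    return rows, cols


def pareto_dominators(payoffs, r, c):
    """Cells that are weakly better for both players and strictly better for one."""
    u_row, u_col = payoffs[r][c]
    return [
        (i, j)
        for i, row in enumerate(payoffs)
        for j, (a, b) in enumerate(row)
        if a >= u_row and b >= u_col and (a > u_row or b > u_col)
    ]


print(irsds(EXAMPLE_1))                    # ([1], [1])
print(pareto_dominators(EXAMPLE_1, 1, 1))  # [(0, 0)]
```

The surviving profile is the (4, 4) cell, and the (5, 5) cell Pareto-dominates it, which matches the note that the unique NE is not PE.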

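Example 2 (UMS). Payoff cells recovered from the slide: (0, 1), (1, 1), (1, 0); the remaining matrix layout is not recoverable.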

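Example 3 (UMS). Rows A, B, C; columns L, M, R. Payoff cells recovered from the slide: (1, 9), (2, 9); (3, 1), (3, 2); (3, 1), (4, 3), (5, 4); the full matrix layout is not recoverable.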

Theorems. Theorem 1: G is UMS ⇒ G has a unique, pure NE, and it is collusion-proof. Corollary: A collusion-proof NE is Pareto optimal. Note that solvability by IRSDS suffices for a unique, pure NE; UMS is needed for collusion-proofness and PE.

Proof of Theorem 1: By contradiction. G is UMS, so fix an elimination sequence of max-dominated strategy sets, and let $s^*$ be the final strategy profile. If $s^*$ is not a collusion-proof NE, some set of players T can deviate and all be better off. Let s be the new profile, in which the players in T change strategy from $s^*$. Let $s_i$ be the first strategy in s to be eliminated; it belongs to some player i in T, since the strategies in $s^*$ are never eliminated. Then $s_i$ was max-dominated, so $s_i^*$ is strictly better for i, so i can’t be better off: a contradiction.

Example 1

     5, 5      0, 0
    10, 0      4, 4

Solvable by IRSDS, but not UMS. Neither player has a universally max-dominated set. Note that the unique NE is not PE, and best-reply dynamics are not incentive compatible for the row player.

Theorems. Theorem 2: If G is UMS with private information, then best-reply dynamics are incentive-compatible in ex-post NE and converge to the unique NE of the induced full-information game. Proof: Similar to Theorem 1. The main idea is that a strategy eliminated in the $t$-th step of the UMS elimination process can never be used after the $nt$-th step of the best-reply mechanism.

Correlated two-sided markets. Agents: buyers and sellers. Game: a weighted bipartite graph, with buyers on one side and sellers on the other. Buyers have a preference order over sellers (higher edge weight = higher preference). Sellers prefer buyers connected by heavier edges.

Correlated two-sided markets are UMS. Let e be the maximum-weight edge. Choosing it universally max-dominates all other strategies of both endpoints. Remove the two endpoints of e and all incident edges, and repeat. Therefore, best-reply dynamics converge to an ex-post NE.
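The elimination argument above doubles as a greedy procedure for finding the matching that best-reply dynamics converge to. The following is a minimal sketch, assuming distinct edge weights and a made-up four-edge market; `greedy_max_edge_matching` and the buyer and seller names are hypothetical.

```python
def greedy_max_edge_matching(weights):
    """Repeatedly match the endpoints of the heaviest remaining edge.

    `weights` maps (buyer, seller) pairs to edge weights. Distinct weights
    are assumed so that the maximum-weight edge is unique at every step.
    """
    remaining = dict(weights)
    matching = []
    while remaining:
        (buyer, seller), weight = max(remaining.items(), key=lambda kv: kv[1])
        matching.append((buyer, seller, weight))
        # Remove both endpoints together with all their incident edges.
        remaining = {
            edge: w
            for edge, w in remaining.items()
            if edge[0] != buyer and edge[1] != seller
        }
    return matching


# Illustrative market (weights invented for this example).
market = {
    ("b1", "s1"): 9,
    ("b1", "s2"): 7,
    ("b2", "s1"): 8,
    ("b2", "s2"): 3,
}
print(greedy_max_edge_matching(market))  # [('b1', 's1', 9), ('b2', 's2', 3)]
```

Buyer b2 would prefer s1, but s1 prefers b1 (weight 9 over 8), so once the heaviest edge is fixed b2 is left with s2.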

Extended Example: BGP

Internet routing: BGP. Receive update messages from neighbours announcing routes to d. Choose a single neighbour, the one whose route you prefer most, to send traffic through. Announce your new route to all your neighbours. [Figure: destination d and nodes 1 and 2; route lists 12d, 1d at node 1 and 21d, 2d at node 2.]

Internet routing: BGP. BGP is asynchronous and distributed, and it prescribes best-reply dynamics. But does BGP converge? And is BGP “incentive compatible”: do ASes have an incentive to deviate from the protocol?

Does BGP Converge? We can break this into two questions: Does a stable solution even exist in the static game? If so, will BGP find such a solution? But we only need one answer.

Does a Stable Solution Exist? [Figure: destination d and nodes 1, 2, 3; route lists 13d, 1d at node 1; 21d, 2d at node 2; 32d, 3d at node 3.] No stable solution exists! It is actually NP-complete to determine existence in general networks.
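The non-existence claim for the three-node example can be verified by brute force. The sketch below rests on two assumptions that are not stated on the slide: each node's route list is ordered most preferred first, and a route such as 13d is available to node 1 only while node 3 is currently using 3d. The names `best_route`, `stable_solutions`, and `THREE_NODE` are hypothetical.

```python
from itertools import product


def best_route(node, state, permitted):
    """Most preferred route of `node` that is consistent with the others' routes."""
    for route in permitted[node]:
        direct = len(route) == 2  # e.g. "1d": straight to the destination
        # An indirect route u-v-...-d is available only if v currently uses v-...-d.
        if direct or state[route[1]] == route[1:]:
            return route
    return None


def stable_solutions(permitted):
    """All route assignments in which every node already plays its best reply."""
    nodes = sorted(permitted)
    stable = []
    for choice in product(*(permitted[u] for u in nodes)):
        state = dict(zip(nodes, choice))
        if all(best_route(u, state, permitted) == state[u] for u in nodes):
            stable.append(state)
    return stable


# Assumed preference lists, most preferred first.
THREE_NODE = {
    "1": ["13d", "1d"],
    "2": ["21d", "2d"],
    "3": ["32d", "3d"],
}
print(stable_solutions(THREE_NODE))  # []
```

Under these assumptions none of the eight route assignments is stable: whenever one node gets its preferred indirect route, the node it relies on wants to switch.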

Does BGP Converge When A Stable Solution Exists? [Figure: destination d and nodes 1 and 2; route lists 12d, 1d at node 1 and 21d, 2d at node 2.] Notice that multiple NE exist, and asynchronous best-reply dynamics do not necessarily converge. So the game must not be UMS.
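A small simulation illustrates how the two-node example can fail to converge. It assumes that each route list is ordered most preferred first, that an indirect route is available only while the next hop uses the matching direct route, and that an asynchronous schedule may activate both nodes at the same instant; `PERMITTED`, `best_route`, and `run` are hypothetical names.

```python
# Assumed preference lists, most preferred first.
PERMITTED = {"1": ["12d", "1d"], "2": ["21d", "2d"]}


def best_route(node, state):
    """Most preferred route of `node` that is consistent with the other node's route."""
    for route in PERMITTED[node]:
        if len(route) == 2 or state[route[1]] == route[1:]:
            return route
    raise ValueError("no available route")


def run(schedule, start):
    """Apply best replies; nodes activated in the same step all read the old state."""
    state = dict(start)
    history = [(state["1"], state["2"])]
    for active in schedule:
        updates = {u: best_route(u, state) for u in active}
        state.update(updates)
        history.append((state["1"], state["2"]))
    return history


START = {"1": "1d", "2": "2d"}

# Both nodes activated together in every step: the state flips back and forth.
simultaneous = run([{"1", "2"}] * 4, START)
print(simultaneous)

# Strict alternation: the state settles at one of the two stable solutions.
round_robin = run([{"1"}, {"2"}] * 2, START)
print(round_robin[-1])  # ('12d', '2d')
```

With simultaneous activation the pair alternates between (1d, 2d) and (12d, 21d) indefinitely, whereas strict alternation reaches (12d, 2d) after one update, so the outcome depends on timing.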

So What Do We Do? Approach #1: Use mechanism design to achieve IC convergence, but solution must be distributed. Approach #2: Identify conditions (on network topology and/or AS preferences) under which BGP converges and is IC. Both approaches are canonical problems in Distributed Algorithmic Mechanism Design.

Approach #2 for Convergence. Griffin et al. (1999): If BGP fails to converge, then there exists a Dispute Wheel. Each $u_i$ would rather route clockwise through $u_{i+1}$ than through $Q_i$. Image source: Levin et al., “Internet Routing and Games,” 2008.

Approach #2 for Convergence Gao and Rexford (2001): Identified reasonable conditions based on economic structure of the Internet that guarantee No Dispute Wheel and hence convergence. (No bounds on convergence rate given). But limited progress made until recently on conditions for guaranteeing that BGP is IC.

Approach #2 for Incentive Compatibility. Theorem 3: Assuming that non-convergence after $n^3$ rounds is penalized and that No Dispute Wheel holds, routing games are UMS. Corollary: Under the above conditions, best-reply strategies are IC in collusion-proof ex-post NE. Corollary: Under the Gao-Rexford conditions, BGP converges in $O(n^3)$ time and is IC.

Theorem 3. Proof sketch: The case of finding the first universally max-dominated action set is general. Find a node $a_1$ with at least 2 actions. Let R be $a_1$’s most preferred existing route. One of two cases must occur:

Theorem 3. 1. Every node $a_2$ on R prefers the suffix of R leading from $a_2$ to d. In this case, if u is the closest node to d on R with at least two actions, then (u, d) universally max-dominates all other actions of u, and we’re done. 2. Some node $a_2$ on R prefers some other path over the suffix of R leading from $a_2$ to d. In this case, we repeat the analysis at $a_2$. Eventually we either form a dispute wheel or find ourselves in Case 1.

What’s left in Routing? Complete characterization of BGP convergence (No Dispute Wheel is sufficient, not necessary). Conditions for convergence to a globally optimal solution. Can it even be efficiently found? Do mechanism design and/or money have a role to play? Changes in network topology?

Other applications. Congestion control (criticism: best-reply dynamics are only somewhat descriptive of how TCP works in practice). Cost-sharing games. Matching games (stable-roommate, intern assignment). Auctions (unit-demand bidders, GSP): relies a lot on VCG results; the main contribution is the proof of convergence (the opposite of BGP)!

Relationship to DSIC. [Diagram labels: θ, Play s(θ), Ex-post NE, Outcome.] Given a UMS game, best-replying is a strategy that gives an ex-post NE. Get a direct-revelation, dominant-strategy IC mechanism. Good: New way to create DSIC mechanisms. Bad: Impossibility results limit the class of problems amenable to this approach (at least without money or limits on preferences).

Discussion What is the main contribution? 1. Sufficient conditions for IC convergence of best-reply dynamics. General enough to encompass many applications, esp. BGP. 2. Bounds on time to convergence. 3. New framework for developing IC mechanisms?

Next Steps 1. Necessary conditions for best-reply dynamics to converge? To be IC (under what definition?)? 2. Better-reply dynamics? Other types of dynamics aka algorithms? What types of dynamics are reasonable or “natural”?

Economists and Complexity. See the recent blog post by Noam Nisan: Does complexity of equilibria matter? Kamal Jain: “If your laptop can’t find it then neither can the market.” Jeff Ely: “Solving the n-body problem is beyond the capabilities of the world’s smartest mathematicians. How do those rocks-for-brains planets manage to pull it off?”
