1
Communication Complexity as a Lower Bound for Learning in Games
Vincent Conitzer and Tuomas Sandholm Computer Science Department Carnegie Mellon University
2
Overview
- Little work has been done on finding lower bounds on convergence time for learning in games
- Here, we derive such lower bounds for settings where the other player's payoffs are not observed, but the other player's actions can be observed
- We use tools from communication complexity theory to obtain these lower bounds
3
Matrix games
Player 1 chooses a row; player 2 (simultaneously) chooses a column. In each entry, the first number is player 1's payoff and the second is player 2's payoff:

    4, 2    0, 3
    3, 0    1, 1
4
Solution concepts
- (Strict) dominance: one strategy is always better than another
- Iterated dominance: repeat the dominance process
- (Pure) Nash equilibrium: a pair of strategies such that neither player wants to unilaterally deviate

    4, 2    0, 3
    3, 0    1, 1

In this example game, the second column strictly dominates the first; once it is removed, the second row dominates the first, and the remaining entry 1, 1 is the unique pure Nash equilibrium.
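As an illustration (a sketch in Python; our own code, not from the slides), the pure Nash equilibria of the example game can be found by checking every entry for profitable unilateral deviations:

```python
# Example game from the slides: A holds the row player's payoffs,
# B the column player's payoffs.
A = [[4, 0],
     [3, 1]]
B = [[2, 3],
     [0, 1]]

def pure_nash(A, B):
    """All cells (i, j) from which neither player wants to unilaterally deviate."""
    n = len(A)
    return [(i, j)
            for i in range(n) for j in range(n)
            if all(A[i][j] >= A[k][j] for k in range(n))     # row player stays
            and all(B[i][j] >= B[i][k] for k in range(n))]   # column player stays

print(pure_nash(A, B))  # -> [(1, 1)]: the bottom-right entry is the unique equilibrium
```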
5
Learning in games (in this paper)
Full game:
    4, 2    0, 3
    3, 0    1, 1

Row player observes only her own payoffs:
    4, ?    0, ?
    3, ?    1, ?

Column player observes only her own payoffs:
    ?, 2    ?, 3
    ?, 0    ?, 1
6
Many different algorithms for learning in games…
- Iterated best response
- Fictitious play
- …
These algorithms sometimes fail to converge, but when they do converge: how do we know there are no faster algorithms?
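As a concrete illustration of one of the named algorithms, here is a minimal fictitious-play sketch (our own code, using the 2 x 2 example game from earlier; not the paper's implementation):

```python
# Fictitious play: each round, both players best-respond to the empirical
# frequencies of the other's past actions. A, B are the example payoffs.
A = [[4, 0],
     [3, 1]]
B = [[2, 3],
     [0, 1]]

def fictitious_play(A, B, rounds=50):
    n = len(A)
    counts_row = [0] * n  # how often the row player has played each row
    counts_col = [0] * n  # how often the column player has played each column
    i = j = 0
    for _ in range(rounds):
        # best response to the opponent's empirical mixture (ties -> lowest index)
        i = max(range(n), key=lambda r: sum(A[r][c] * counts_col[c] for c in range(n)))
        j = max(range(n), key=lambda c: sum(B[r][c] * counts_row[r] for r in range(n)))
        counts_row[i] += 1
        counts_col[j] += 1
    return i, j

print(fictitious_play(A, B))  # -> (1, 1), the pure Nash equilibrium
```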
7
Communication lower bound
- Suppose the players are only interested in converging to a solution as soon as possible
  - No concern for payoffs while learning
  - No concern for which of the solutions they converge to
- They still need to communicate enough information to compute a solution
- In an n x n game, each player can communicate log(n) bits to the other per round through the choice of strategy
- If c is the number of bits of communication required to compute the solution, then at least c/(2 log(n)) rounds are required before convergence
- Communication complexity theory tells us c
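The conversion from bits to rounds is simple arithmetic; a small sketch (our own code, with hypothetical numbers):

```python
import math

# If computing a solution requires c bits, and each of the two players can
# convey at most log2(n) bits per round via their choice of strategy, then
# at least c / (2 * log2(n)) rounds are needed.
def min_rounds(c_bits, n):
    return math.ceil(c_bits / (2 * math.log2(n)))

# e.g. for pure Nash, c is on the order of n^2 bits (see later slides):
print(min_rounds(64 ** 2, 64))  # -> 342 rounds for a 64 x 64 game with c = n^2
```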
8
Two-player model for communication complexity [Yao 79]
- Player 1 holds input x; player 2 holds input y
- They seek to compute the value of a known function f(x, y) mapping to {0, 1}
- The players alternately send a bit according to a protocol
- When the protocol terminates, the function value should be known
- Deterministic communication complexity = worst-case number of bits sent over all input pairs (x, y), under the best protocol
- Nondeterministic protocols: the next bit to be sent is chosen nondeterministically
  - False positives OR false negatives are allowed, as long as the answer is correct under some sequence of nondeterministic choices
9
A lower bounding technique from communication complexity
[Figure: the tree of all possible communications, with players 1 and 2 alternately branching on the bit sent (0 or 1); every input pair (x, y) goes to some leaf, which is labeled with the function value f = 0 or f = 1.]
- If (x1, y1) goes to the same leaf as (x2, y2), then so must (x1, y2) and (x2, y1)
  - Only possible if f is the same in all 4 cases
- Suppose we have a fooling set {(x1, y1), (x2, y2), …, (xn, yn)}, all with f(xi, yi) = f0, but for any i ≠ j, either f(xi, yj) ≠ f0 or f(xj, yi) ≠ f0
- Then all n pairs (xi, yi) go to different leaves => tree depth ≥ log(n)
- Also a lower bound on nondeterministic communication complexity
  - With false positives or negatives allowed, depending on f0
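The fooling-set condition can be checked mechanically. A sketch for the standard EQUALITY function (our example, not from the slides), whose diagonal pairs form a fooling set:

```python
# f(x, y) = 1 iff x == y; the pairs (x, x) form a fooling set for f0 = 1.
def f(x, y):
    return int(x == y)

fooling_set = [(x, x) for x in range(8)]
f0 = 1
for a, (x1, y1) in enumerate(fooling_set):
    assert f(x1, y1) == f0                      # same value f0 on every pair
    for b, (x2, y2) in enumerate(fooling_set):
        if a != b:
            # mixing inputs across two pairs must break the value f0
            assert f(x1, y2) != f0 or f(x2, y1) != f0
# All 8 pairs must reach distinct leaves, so the protocol tree has
# depth >= log2(8) = 3: at least 3 bits of communication.
```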
10
Our strategy
- For each solution concept we study, we give a communication protocol for finding a solution
  - Gives an upper bound O(f(n)) on communication complexity
- Then we give a large fooling set
  - Matching lower bound Ω(f(n)) on communication complexity
- A fooling set F here is a set of games, each of which has (does not have) at least one solution, such that for any pair of games g1 and g2 in F, either using the row player's payoffs from g1 and the column player's payoffs from g2 yields a game with no solution (at least one solution), or using the row player's payoffs from g2 and the column player's payoffs from g1 yields a game with no solution (at least one solution)
[Figure: two example games, illustrating how the row payoffs of one are combined with the column payoffs of the other.]
11
Pure Nash – upper bound
What is the communication complexity of determining whether a pure-strategy Nash equilibrium exists?
One protocol: each player communicates, for every entry, whether they would deviate from it; an entry from which neither player would deviate is a Nash equilibrium.
[Figure: an example game with each player's deviation bits marked and the Nash equilibrium entry highlighted.]
One bit per entry, so O(n^2) communication
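A sketch of this protocol in Python (our own code): each player computes one deviation bit per entry, and an entry with two zero bits is a pure Nash equilibrium.

```python
# Each player marks, for every cell, whether they would deviate from it.
def row_deviates(A):
    n = len(A)
    return [[any(A[k][j] > A[i][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def col_deviates(B):
    n = len(B)
    return [[any(B[i][k] > B[i][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def has_pure_nash(A, B):
    # 2 * n^2 bits are exchanged in total: one per player per cell.
    dr, dc = row_deviates(A), col_deviates(B)
    n = len(A)
    return any(not dr[i][j] and not dc[i][j]
               for i in range(n) for j in range(n))

print(has_pure_nash([[4, 0], [3, 1]], [[2, 3], [0, 1]]))  # -> True
print(has_pure_nash([[1, 0], [0, 1]], [[0, 1], [1, 0]]))  # matching pennies -> False
```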
12
Pure Nash – lower bound
- Consider games in which every entry is 0, 1 or 1, 0
- These almost never have a pure Nash equilibrium
  - Exceptions: a whole row of 1, 0 or a whole column of 0, 1
  - Exclude the exceptions from the fooling set
- But for any two different such games, the payoffs can be combined to obtain a 1, 1 entry, which is a Nash equilibrium
- Fooling set of size (almost) 2^(n^2), so Ω(n^2) communication is required
13
Pure Nash with unique best response – upper bound
- What if the column player always has a unique best response?
- Let just her communicate her best responses
- Now the row player knows whether a pure Nash equilibrium exists, and can communicate this in one bit
- Pointing out a unique best response requires log(n) bits of communication, so O(n log(n)) communication total
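A sketch (our own code) under the slide's assumption that the column player has a unique best response to every row:

```python
# The column player sends her best response for each row (log n bits each);
# the row player then checks whether any (row, best-response) pair is an
# equilibrium, i.e. whether that row is also a best response to the column.
def best_responses(B):
    n = len(B)
    return [max(range(n), key=lambda j: B[i][j]) for i in range(n)]

def pure_nash_exists(A, br):
    n = len(A)
    return any(A[i][br[i]] == max(A[k][br[i]] for k in range(n))
               for i in range(n))

A = [[4, 0], [3, 1]]
B = [[2, 3], [0, 1]]
br = best_responses(B)          # n * log(n) bits of communication
print(pure_nash_exists(A, br))  # one final bit -> True: an equilibrium exists
```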
14
Pure Nash with unique best response – lower bound
Consider (the generalization to n x n of) this game:

    0, 1    1, 0
    1, 0    0, 1

- No Nash equilibrium, and the column player has a unique best response to every row
- Any permutation of the rows will still have no Nash equilibrium
- But combining the payoffs as before (row payoffs from one permutation, column payoffs from another) always yields a game with Nash equilibria
- Fooling set of size n! => Ω(log(n!)) = Ω(n log(n)) communication is required
15
Iterated strict dominance – upper bound
One protocol: let the players alternately communicate a dominated strategy.
[Figure: an example game in which dominated rows and columns are eliminated one at a time.]
Communicating a strategy takes log(n) bits, and at most 2n strategies need to be eliminated, so O(n log(n)) communication
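A sketch of iterated elimination of strictly dominated pure strategies (our own code; the slides' protocol would interleave the two players' announcements):

```python
# Repeatedly remove any row (column) that is strictly dominated by another
# surviving row (column); at most 2n strategies can ever be eliminated.
def iterated_strict_dominance(A, B):
    rows, cols = list(range(len(A))), list(range(len(A[0])))
    changed = True
    while changed:
        changed = False
        for i in list(rows):
            if any(all(A[k][j] > A[i][j] for j in cols)
                   for k in rows if k != i):
                rows.remove(i)
                changed = True
        for j in list(cols):
            if any(all(B[i][k] > B[i][j] for i in rows)
                   for k in cols if k != j):
                cols.remove(j)
                changed = True
    return rows, cols  # the surviving strategies

# On the earlier example game, only the bottom-right entry survives:
print(iterated_strict_dominance([[4, 0], [3, 1]], [[2, 3], [0, 1]]))  # -> ([1], [1])
```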
16
Iterated strict dominance lower bound
Consider (the generalization to n x n of) this game:
[Figure: a game with entries 0, 1; 1, 0; and 1, 1 that is solved by iterated strict dominance.]
- Any permutation of the rows is also solved by iterated strict dominance
- But combining the payoffs (row payoffs from one permutation, column payoffs from another) yields a game on which iterated strict dominance gets stuck
- Fooling set of size n!, so at least Ω(log(n!)) = Ω(n log(n)) communication is required
17
Conclusions & additional results
- Pure Nash equilibrium: Θ(n^2)
- Pure Nash equilibrium with unique best response: Θ(n log n)
- Iterated strict dominance: Θ(n log n)
  - Whether or not dominance by mixed strategies is allowed
- Iterated weak dominance: Ω(n log n)
  - O(n log n) is only known to hold for nondeterministic protocols
- Backward induction: Θ(n), where n = #nodes in the tree
- Divide by log n to get the minimum number of rounds needed
18
Future research
- How do specific learning algorithms compare?
- Can we find sensible learning algorithms that achieve the lower bounds?
  - Taking into account agents' preferences over solutions and over paths of learning
- If not, can we get stronger lower bounds by restricting the allowed communication?
  - E.g., allow players to communicate only through actions that appear to give good payoffs
19
Thank you for your attention!