Bilinear Games: Polynomial Time Algorithms for Rank Based Subclasses
  Ruta Mehta, Indian Institute of Technology, Bombay
  Joint work with Jugal Garg and Albert X. Jiang
A Game: Rock-Paper-Scissors
Rock-Paper-Scissors: A Play
  [Figure: one round of play; the winner is paid $1.]
Rock-Paper-Scissors Payoffs
  Each entry is (row player payoff, column player payoff).
         R       P       C
  R     0,0    -1,1    1,-1
  P    1,-1     0,0    -1,1
  C    -1,1    1,-1     0,0
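Bimatrix Game
  Steady state: no player gains by unilateral deviation.
  Pure strategy sets: $S_1 = \{R, P, C\}$, $S_2 = \{R, P, C\}$.
  Payoff matrices (rows and columns ordered R, P, C):
  $A = \begin{pmatrix} 0 & -1 & 1 \\ 1 & 0 & -1 \\ -1 & 1 & 0 \end{pmatrix}$, $B = \begin{pmatrix} 0 & 1 & -1 \\ -1 & 0 & 1 \\ 1 & -1 & 0 \end{pmatrix}$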
Bimatrix Game: No Steady State
  With the payoff matrices A and B above and pure strategy sets $S_1 = S_2 = \{R, P, C\}$, no pure strategy profile is a steady state.
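The claim that no pure profile is a steady state can be checked by brute force. The sketch below assumes the standard Rock-Paper-Scissors payoffs (win 1, lose -1, tie 0) with rows and columns ordered R, P, C; the function name `pure_nash_equilibria` is illustrative, not from the talk.

```python
# Rock-Paper-Scissors payoffs, rows/columns ordered R, P, C (assumed standard values).
A = [[0, -1, 1],
     [1, 0, -1],
     [-1, 1, 0]]
B = [[-a for a in row] for row in A]  # zero-sum game: B = -A


def pure_nash_equilibria(A, B):
    """Return every pure profile (i, j) from which no player gains by deviating alone."""
    m, n = len(A), len(A[0])
    result = []
    for i in range(m):
        for j in range(n):
            # Player 1 cannot improve by switching rows against column j.
            row_ok = all(A[k][j] <= A[i][j] for k in range(m))
            # Player 2 cannot improve by switching columns against row i.
            col_ok = all(B[i][l] <= B[i][j] for l in range(n))
            if row_ok and col_ok:
                result.append((i, j))
    return result


print(pure_nash_equilibria(A, B))  # prints []
```

The empty list reflects that whenever player 1 is best responding (payoff 1), player 2 receives -1 and would rather switch.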
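Mixed Play: Steady State
  Each player plays R, P, and C with probability 1/3 each; under A and B this is a steady state.
  Pure strategy sets: $S_1 = S_2 = \{R, P, C\}$.
  Mixed strategy sets: $\Delta_1 = \{(r_1, p_1, c_1) \ge 0 \mid r_1 + p_1 + c_1 = 1\}$, $\Delta_2 = \{(r_2, p_2, c_2) \ge 0 \mid r_2 + p_2 + c_2 = 1\}$.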
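A mixed profile is a steady state exactly when no pure deviation beats the current expected payoff, which is easy to test numerically. The following is a minimal sketch under the assumed standard payoffs; `is_nash` is a hypothetical helper name, and exact rational arithmetic is used to avoid rounding issues.

```python
from fractions import Fraction

# Assumed standard Rock-Paper-Scissors payoffs, rows/columns ordered R, P, C.
A = [[0, -1, 1], [1, 0, -1], [-1, 1, 0]]
B = [[-a for a in row] for row in A]


def is_nash(A, B, x, y):
    """Check that neither player has a profitable pure deviation from (x, y)."""
    m, n = len(A), len(A[0])
    # Expected payoff of each pure row against y, and of each pure column against x.
    row_payoffs = [sum(A[i][j] * y[j] for j in range(n)) for i in range(m)]
    col_payoffs = [sum(x[i] * B[i][j] for i in range(m)) for j in range(n)]
    u1 = sum(x[i] * row_payoffs[i] for i in range(m))  # x^T A y
    u2 = sum(y[j] * col_payoffs[j] for j in range(n))  # x^T B y
    return max(row_payoffs) <= u1 and max(col_payoffs) <= u2


uniform = [Fraction(1, 3)] * 3
print(is_nash(A, B, uniform, uniform))    # prints True
print(is_nash(A, B, [1, 0, 0], uniform))  # prints False
```

In the second call player 1 always plays R, so player 2 gains by moving all weight to P.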
John Nash (1951)
  Finite game: finitely many players, each with finitely many strategies.
  Nash: every finite game has a steady state in mixed strategies, henceforth called a Nash equilibrium (NE).
  Proved using the Kakutani fixed point theorem; the proof is highly non-constructive.
Nash Equilibrium Computation
  Papadimitriou (JCSS’94): the class PPAD, containing problems where existence of a solution is guaranteed, such as fixed points, Sperner’s Lemma, and Nash equilibrium.
  Chen and Deng (FOCS’06): computing a NE is PPAD-hard.
  CDT (FOCS’06): even approximation is PPAD-hard.
Rank and Computation
  Kannan and Theobald (SODA’07): define the rank of a game (A, B) as rank(A+B); FPTAS for fixed rank games.
  Polynomial time algorithms for exact Nash:
  Dantzig (1963): zero-sum (rank-0) is equivalent to LP.
  AGMS (STOC’11): rank-1 games.
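To make the rank definition concrete, the sketch below computes rank(A+B) by exact Gaussian elimination. The helper names `matrix_rank` and `game_rank` and the second example game are illustrative assumptions, not from the talk.

```python
from fractions import Fraction


def matrix_rank(M):
    """Rank of a matrix via Gaussian elimination over exact rationals."""
    rows = [[Fraction(v) for v in row] for row in M]
    n_cols = len(rows[0]) if rows else 0
    rank = 0
    for col in range(n_cols):
        # Find a row at or below position `rank` with a nonzero entry in this column.
        pivot = next((r for r in range(rank, len(rows)) if rows[r][col] != 0), None)
        if pivot is None:
            continue
        rows[rank], rows[pivot] = rows[pivot], rows[rank]
        for r in range(rank + 1, len(rows)):
            factor = rows[r][col] / rows[rank][col]
            rows[r] = [a - factor * b for a, b in zip(rows[r], rows[rank])]
        rank += 1
    return rank


def game_rank(A, B):
    """Rank of the game (A, B), defined as rank(A + B)."""
    total = [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]
    return matrix_rank(total)


rps = [[0, -1, 1], [1, 0, -1], [-1, 1, 0]]
print(game_rank(rps, [[-v for v in row] for row in rps]))  # prints 0 (zero-sum)
print(game_rank([[1, 2], [3, 4]], [[0, 1], [-1, 2]]))      # prints 1
```

In the second example A + B = [[1, 3], [2, 6]], which is the outer product of (1, 2) and (1, 3).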
Bilinear Games
  A bimatrix game with polyhedral strategy sets.
  Two players: 1 and 2.
  Polyhedral strategy sets: $X = \{x \mid Ex = e,\ x \ge 0\}$, $Y = \{y \mid Fy = f,\ y \ge 0\}$.
  Payoff matrices: A, B.
  Bilinear payoff: the profile $(x, y)$ fetches $x^T A y$ to player 1 and $x^T B y$ to player 2.
  Motivation: Koller et al. (STOC’94), for two-player extensive form games with perfect recall.
Nash Equilibrium in Bilinear Games
  NE: no player gains by unilateral deviation.
  Existence: corollary of Glicksberg’s result.
  Symmetric game: $B = A^T$ and $Y = X$. $(x, y)$ is a symmetric profile if $y = x$.
  Existence of symmetric NE: an adaptation of Nash’s proof for symmetric bimatrix games.
Bilinear Contains: Bimatrix, Polymatrix, Bayesian, etc.
  Bimatrix: $X = \Delta_1$, $Y = \Delta_2$.
  Polymatrix: N players. Each pair plays a bimatrix game.
  Player i: $S_i$ finite strategy set, $\Delta_i$ mixed strategy set.
  Goal of i: choose $x_i$ from $\Delta_i$ to maximize total payoff.
  [Figure: players i and j joined by an edge labeled $A_{ij}$.]
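Polymatrix to Bilinear
  $M = |S_1| + \dots + |S_n|$.
  $X = \{(x_1, \dots, x_n) \mid x_i \in \Delta_i\}$, $Y = X$.
  $A$ is the $M \times M$ block matrix with zero diagonal blocks and block $A_{ij}$ in block row $i$, block column $j$; $B = A^T$.
  A symmetric NE of (A, B) maps to a NE of the polymatrix game.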
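The block construction can be sketched as follows. The input format (a dict keyed by player pairs) and the function name `polymatrix_to_bilinear` are assumptions made for illustration; the returned E and e encode X as one simplex constraint per player.

```python
def polymatrix_to_bilinear(sizes, pair_payoffs):
    """Build (A, B, E, e) for the bilinear game of a polymatrix game.

    sizes[i] is |S_i|; pair_payoffs[(i, j)] is the block A_ij, given as a
    sizes[i] x sizes[j] list of lists (hypothetical input format).
    """
    offsets = [0]
    for s in sizes:
        offsets.append(offsets[-1] + s)
    M = offsets[-1]  # M = |S_1| + ... + |S_n|

    A = [[0] * M for _ in range(M)]  # diagonal blocks stay zero
    for (i, j), block in pair_payoffs.items():
        if i == j:
            raise ValueError("a player does not play against itself")
        for r in range(sizes[i]):
            for c in range(sizes[j]):
                A[offsets[i] + r][offsets[j] + c] = block[r][c]

    B = [list(col) for col in zip(*A)]  # B = A^T

    # X = {x >= 0 | Ex = e}: each player's block of x sums to 1.
    E = [[1 if offsets[i] <= k < offsets[i + 1] else 0 for k in range(M)]
         for i in range(len(sizes))]
    e = [1] * len(sizes)
    return A, B, E, e


A, B, E, e = polymatrix_to_bilinear(
    [1, 2], {(0, 1): [[5, 6]], (1, 0): [[7], [8]]})
print(A)  # prints [[0, 5, 6], [7, 0, 0], [8, 0, 0]]
```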
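Best Response (Koller et al.)
  Fix a strategy $y$ of player 2. Player 1 solves the LP below; its dual is shown next to it.
  Primal: max $x^T (Ay)$ s.t. $Ex = e$, $x \ge 0$.
  Dual: min $e^T p$ s.t. $E^T p \ge Ay$.
  Notation: $A_i$ is the $i$-th row of $A$, and $E^i$, $B^j$, $F^j$ are columns of $E$, $B$, $F$.
  At optimal: $\exists p$ s.t. $A_i y \le p^T E^i$ for all $i$, and $x_i > 0 \Rightarrow A_i y = p^T E^i$.
  Given $x \in X$, for player 2 we get, at optimal: $\exists q$ s.t. $x^T B^j \le q^T F^j$ for all $j$, and $y_j > 0 \Rightarrow x^T B^j = q^T F^j$.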
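Best Response Polytopes (BRPs)
  $P = \{(y, p) \mid Ay \le E^T p,\ Fy = f,\ y \ge 0\}$, $Q = \{(x, q) \mid x^T B \le q^T F,\ Ex = e,\ x \ge 0\}$.
  $(x, y)$ is a NE iff
  $\exists p$: $Ay \le E^T p$; $x_i > 0 \Rightarrow A_i y = p^T E^i$, and
  $\exists q$: $x^T B \le q^T F$; $y_j > 0 \Rightarrow x^T B^j = q^T F^j$,
  where $A_i$ is the $i$-th row of $A$ and $E^i$, $B^j$, $F^j$ are columns.
  On $P \times Q$: $x^T (Ay - E^T p) \le 0$ and $(x^T B - q^T F) y \le 0$, hence
  $x^T (A+B) y - e^T p - f^T q \le 0$.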
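The combined inequality follows from the two separate ones by using the strategy constraints $Ex = e$ and $Fy = f$. A short derivation, written out as a sketch (each factor is sign-constrained because $x, y \ge 0$ while $Ay - E^T p \le 0$ and $x^T B - q^T F \le 0$); note that the last term works out to $f^T q$:

```latex
\begin{align*}
x^T (Ay - E^T p) &= x^T A y - (Ex)^T p = x^T A y - e^T p, \\
(x^T B - q^T F)\, y &= x^T B y - q^T (Fy) = x^T B y - f^T q, \\
x^T (A+B)\, y - e^T p - f^T q
  &= x^T (Ay - E^T p) + (x^T B - q^T F)\, y \;\le\; 0 .
\end{align*}
```

Since both summands are non-positive, their sum is zero exactly when each is zero, which is the complementarity condition for a NE.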
Nash Equilibrium in BRPs
  NE iff $x^T (Ay - E^T p) = 0$ and $(x^T B - q^T F) y = 0$, i.e.
  $x^T (A+B) y - e^T p - f^T q = 0$.
  Assumption: P and Q are non-degenerate.
  If $(u, v) \in P \times Q$ gives a NE, then $(u, v)$ is a vertex.
QP Formulation
  max: $x^T (A+B) y - e^T p - f^T q$
  s.t. $(y, p) \in P$, $(x, q) \in Q$
  Optimal value is 0. Only vertex solutions.
Our Results
  Rank-1 games, rank(A+B) = 1: extend the algorithm of Adsul et al. for exact NE.
  Fixed rank games, rank(A+B) = k: extend the FPTAS of Kannan et al.
  Rank of A or B is constant: enumerate all NE in polynomial time.
Rank-1 Case
  Zero-sum, i.e. rank(A+B) = 0: LP formulation (Charnes’53).
  If rank(A+B) = 1, then $A + B = a b^T$.
  The QP formulation becomes:
  max: $(x^T a)(b^T y) - e^T p - f^T q$
  s.t. $(y, p) \in P$, $(x, q) \in Q$
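Writing a rank-1 matrix as an outer product $a b^T$ is straightforward: take any nonzero row as $b$ and scale the matching column to get $a$. The sketch below illustrates this; the factorization is not unique (scaling $a$ up and $b$ down gives another one), and `rank_one_factors` is a hypothetical helper name.

```python
from fractions import Fraction


def rank_one_factors(C):
    """Return (a, b) with C[i][j] == a[i] * b[j], or None if C does not have rank 1."""
    # Locate any nonzero entry to serve as the pivot.
    pivot = next(((i, j) for i, row in enumerate(C)
                  for j, v in enumerate(row) if v != 0), None)
    if pivot is None:
        return None  # zero matrix has rank 0
    pi, pj = pivot
    b = [Fraction(v) for v in C[pi]]              # pivot row
    a = [Fraction(row[pj]) / b[pj] for row in C]  # pivot column, scaled so a[pi] == 1
    # Verify that the outer product reproduces C exactly.
    ok = all(a[i] * b[j] == C[i][j]
             for i in range(len(C)) for j in range(len(C[0])))
    return (a, b) if ok else None


a, b = rank_one_factors([[1, 3], [2, 6]])
print(a == [1, 2], b == [1, 3])          # prints True True
print(rank_one_factors([[1, 0], [0, 1]]))  # prints None
```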
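Rank-1 Case
  Replace $x^T a$ by a variable $z$. Recall $B = -A + a b^T$, so $x^T B = -x^T A + z\, b^T$; $Q'$ is $Q$ after this substitution, with points $(x, z, q)$.
  The condition $x^T (A+B) y - e^T p - f^T q = 0$ becomes $z (b^T y) - e^T p - f^T q = 0$.
  $N$ = points of $P \times Q'$ with $z (b^T y) - e^T p - f^T q = 0$.
  $N$ forms paths and cycles, since $z$ gives one degree of freedom.
  NE of (A, B): points in the intersection of $N$ and the hyperplane $z - x^T a = 0$.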
Parameterized LP
  LP(z) = max: $z (b^T y) - e^T p - f^T q$
  s.t. $(y, p) \in P$, $(x, z, q) \in Q'$
  Given any c, the optimal value of LP(c) is 0, and OPT(c) lies on $N$.
  Let $N(c)$ = {points of $N$ with $z = c$}; then OPT(c) = $N(c)$.
  $N$ is a single path on which $z$ is monotonic.
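Rank-1: The Algorithm
  NE: intersection of $N$ and the hyperplane $H$: $z - x^T a = 0$.
  Start with $c_1 = a_{\min}$, $c_2 = a_{\max}$.
  [Figure: the path $N$ crossing $H$ at the NE, with half-spaces $H^-$ and $H^+$ and the points $N(c_1)$, $N(c_2)$ marked.]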
Rank-1: Binary Search Algorithm
  NE of (A, B): points in the intersection of $N$ and $H$.
  $c = (c_1 + c_2)/2$.
  If $N(c)$ is in $H^-$, then set $c_1 = c$; else set $c_2 = c$.
  [Figure: the path $N$ crossing $H$ at the NE, with $N(c)$ between $N(c_1)$ and $N(c_2)$ and half-spaces $H^-$, $H^+$.]
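The search loop can be sketched as below. This is only an outline under stated assumptions: `x_dot_a_at(c)` is a hypothetical oracle that solves LP(c) and returns $x^T a$ at the point $N(c)$; $H^-$ is taken to be the side where $z - x^T a < 0$; and the loop stops at a numerical tolerance rather than identifying the exact edge of $N$ that crosses $H$, as an exact algorithm would.

```python
def rank1_binary_search(x_dot_a_at, a_min, a_max, tol=1e-9, max_iter=200):
    """Binary search for c with N(c) on the hyperplane H: z - x^T a = 0.

    x_dot_a_at(c) is a hypothetical oracle: it is assumed to solve LP(c)
    and return x^T a at the optimal point N(c).
    """
    c1, c2 = a_min, a_max
    for _ in range(max_iter):
        c = (c1 + c2) / 2
        gap = c - x_dot_a_at(c)  # sign says on which side of H the point N(c) lies
        if abs(gap) <= tol or c2 - c1 <= tol:
            return c
        if gap < 0:   # N(c) in H^- (assumed sign convention): move the lower end up
            c1 = c
        else:         # N(c) in H^+: move the upper end down
            c2 = c
    return (c1 + c2) / 2


# Toy oracle standing in for the LP solve: x^T a is constant at 0.3.
result = rank1_binary_search(lambda c: 0.3, 0.0, 1.0)
print(abs(result - 0.3) < 1e-6)  # prints True
```

In a real implementation the oracle would call an LP solver on LP(c), and the final step would solve for the NE exactly on the edge of $N$ bracketed by $N(c_1)$ and $N(c_2)$.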
Analysis
  Terminates because $z$ is monotonic on $N$.
  The increase in $z$ on each edge is lower bounded by $1/d$, where the size of $d$ is polynomial in the input.
  Time complexity: solve LP(c) to get $N(c)$ in each pivot; $\log(d) \cdot \log(a_{\max} - a_{\min})$ pivots.
Conclusions
  Bilinear games: bimatrix games with polytopal strategy sets. Fairly general: contains polymatrix, Bayesian, etc.
  Polynomial time algorithms for rank based subclasses.
  Open problems: designing a Lemke-Howson type algorithm; degree, index, and stability concepts; computation of approximate equilibria.
Thank You