A Lyapunov Optimization Approach to Repeated Stochastic Games
Michael J. Neely, University of Southern California
Proc. Allerton Conference on Communication, Control, and Computing, Oct
[Figure: a game manager connected to Players 1 through 5]
Game structure
Slotted time $t \in \{0, 1, 2, \ldots\}$. $N$ players, 1 game manager.
The slot-$t$ utility of each player depends on:
(i) Random events $\omega(t) = (\omega_0(t), \omega_1(t), \ldots, \omega_N(t))$
(ii) Control actions $\alpha(t) = (\alpha_1(t), \ldots, \alpha_N(t))$
Players: maximize time average utility.
Game manager: provides suggestions; maintains fairness of utilities subject to equilibrium constraints.
Random events $\omega(t)$
Player $i$ sees $\omega_i(t)$.
The manager sees the full vector $\omega(t) = (\omega_0(t), \omega_1(t), \ldots, \omega_N(t))$. The full vector is known only to the manager!
The vector $\omega(t)$ is i.i.d. over slots (its components are possibly correlated).
[Figure: the game manager observes $(\omega_0(t), \omega_1(t), \ldots, \omega_N(t))$; Players 1, 2, 3 observe $\omega_1(t)$, $\omega_2(t)$, $\omega_3(t)$ respectively]
Actions and utilities
The manager sends suggested actions $M_i(t)$.
Players take actions $\alpha_i(t) \in A_i$.
Utilities: $U_i(t) = u_i(\alpha(t), \omega(t))$.
[Figure: the game manager observes $(\omega_0(t), \omega_1(t), \ldots, \omega_N(t))$ and sends $M_1(t)$, $M_2(t)$, $M_3(t)$ to Players 1, 2, 3, who take actions $\alpha_1(t)$, $\alpha_2(t)$, $\alpha_3(t)$]
Example: Wireless MAC game
The manager knows the current channel conditions: $\omega_0(t) = (C_1(t), C_2(t), \ldots, C_N(t))$.
Users do not have this knowledge: $\omega_i(t) = \text{NULL}$.
[Figure: Users 1, 2, 3 connected to an access point over channels $C_1(t)$, $C_2(t)$, $C_3(t)$]
Example: Economic market
$\omega_0(t)$ = vector of current prices, for example $\omega_0(t) = [\text{price}_{\text{HAM}}(t), \text{price}_{\text{EGGS}}(t)]$.
Prices are commonly known to everyone: $\omega_i(t) = \omega_0(t)$ for all $i$.
[Figure: game manager and Players 1, 2, 3]
Participation
At the beginning of the game, each player chooses either:
(i) Participate: receive messages $M_i(t)$ and always choose $\alpha_i(t) = M_i(t)$.
(ii) Do not participate: do not receive messages $M_i(t)$; can choose $\alpha_i(t)$ however they like.
Need incentives for participation…
Equilibrium concepts considered: Nash equilibrium (NE), Correlated equilibrium (CE), Coarse Correlated Equilibrium (CCE).
NE for Static Game
Consider the special case with no $\omega(t)$ process.
Nash equilibrium (NE): the players' actions are independent, $\Pr[\alpha] = \Pr[\alpha_1]\Pr[\alpha_2]\cdots\Pr[\alpha_N]$. A game manager is not needed.
Definition: A distribution $\Pr[\alpha]$ is a Nash equilibrium (NE) if no player can benefit by unilaterally changing its action probabilities.
Finding a NE in a general game is a nonconvex problem!
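The NE definition can be checked numerically for a small game. The sketch below is illustrative only: the two-player "chicken" payoff table, the action labels, and the function names are assumptions made for this example and do not come from the talk.

```python
from itertools import product

ACTIONS = ("D", "C")  # "dare" / "chicken": illustrative labels (assumed)
# U[(a1, a2)] = (utility of player 1, utility of player 2); assumed payoffs
U = {("D", "D"): (0, 0), ("D", "C"): (7, 2),
     ("C", "D"): (2, 7), ("C", "C"): (6, 6)}

def expected_utility(i, p1, p2):
    """Expected utility of player i under independent mixed actions p1, p2."""
    return sum(p1[a1] * p2[a2] * U[(a1, a2)][i]
               for a1, a2 in product(ACTIONS, ACTIONS))

def is_nash(p1, p2, tol=1e-9):
    """True if neither player gains by a unilateral deviation.

    Checking pure deviations suffices because a player's expected utility
    is linear in its own action probabilities.
    """
    for b in ACTIONS:
        pure = {a: float(a == b) for a in ACTIONS}
        if expected_utility(0, pure, p2) > expected_utility(0, p1, p2) + tol:
            return False
        if expected_utility(1, p1, pure) > expected_utility(1, p1, p2) + tol:
            return False
    return True

mixed = {"D": 1 / 3, "C": 2 / 3}   # makes the opponent indifferent
pure_d = {"D": 1.0, "C": 0.0}
pure_c = {"D": 0.0, "C": 1.0}
```

Under these assumed payoffs, `is_nash(mixed, mixed)` and `is_nash(pure_d, pure_c)` hold, while `is_nash(pure_c, pure_c)` fails because either player gains by switching to "D". Verifying a candidate is easy; the nonconvexity lies in searching over product distributions.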
CE for Static Game
The manager suggests actions $\alpha(t)$ i.i.d. with distribution $\Pr[\alpha]$. Suppose all players participate.
Definition [Aumann 1974, 1987]: A distribution $\Pr[\alpha]$ is a Correlated Equilibrium (CE) if
$E[U_i(t) \mid \alpha_i(t) = \alpha] \geq E[u_i(\beta, \alpha_{-i}) \mid \alpha_i(t) = \alpha]$
for all $i \in \{1, \ldots, N\}$ and all pairs $\alpha, \beta \in A_i$.
This is an LP with $|A_1|^2 + |A_2|^2 + \cdots + |A_N|^2$ constraints.
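A minimal sketch of checking the CE inequalities for a given joint distribution follows. The payoff table is the same kind of illustrative two-player "chicken" game (an assumption, not taken from the talk), and the check uses the unnormalized form of the conditional inequality, which is equivalent whenever the suggested action has positive probability.

```python
ACTIONS = ("D", "C")  # illustrative labels (assumed)
# U[(a1, a2)] = (utility of player 1, utility of player 2); assumed payoffs
U = {("D", "D"): (0, 0), ("D", "C"): (7, 2),
     ("C", "D"): (2, 7), ("C", "C"): (6, 6)}

def is_correlated_eq(pr, tol=1e-9):
    """Check every (player, suggestion, deviation) CE constraint.

    pr maps joint actions (a1, a2) to probabilities. For each player i,
    suggested action a, and deviation b, the expected gain from playing b
    whenever a is suggested must not be positive.
    """
    for i in (0, 1):
        for a in ACTIONS:        # suggested action
            for b in ACTIONS:    # candidate deviation
                gain = 0.0
                for joint, prob in pr.items():
                    if joint[i] != a:
                        continue
                    deviated = (b, joint[1]) if i == 0 else (joint[0], b)
                    gain += prob * (U[deviated][i] - U[joint][i])
                if gain > tol:
                    return False
    return True

# Uniform over three outcomes: correlated, not a product distribution.
pr_uniform3 = {("C", "C"): 1 / 3, ("D", "C"): 1 / 3, ("C", "D"): 1 / 3}
pr_all_chicken = {("C", "C"): 1.0}
```

With these assumed payoffs, `pr_uniform3` satisfies all constraints, whereas `pr_all_chicken` does not, since a player told "C" would gain by playing "D". The loop visits $|A_i|^2$ (suggestion, deviation) pairs per player, matching the constraint count of the LP.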
Criticism of CE
The manager gives suggestions $M_i(t)$ to players even if they do not participate.
Without knowing the message $M_i(t) = \alpha_i$: Player $i$ only knows the a priori likelihood of the other players' actions via the joint distribution $\Pr[\alpha]$.
Knowing $M_i(t) = \alpha_i$: Player $i$ knows the a posteriori likelihood of the other players' actions via the conditional distribution $\Pr[\alpha \mid \alpha_i]$.
CCE for Static Game
The manager suggests $\alpha(t)$ i.i.d. with distribution $\Pr[\alpha]$ and gives suggestions only to participating players. Suppose all players participate.
Definition [Moulin and Vial, 1978]: A distribution $\Pr[\alpha]$ is a Coarse Correlated Equilibrium (CCE) if
$E[U_i(t)] \geq E[u_i(\beta, \alpha_{-i})]$
for all $i \in \{1, \ldots, N\}$ and all $\beta \in A_i$.
This is an LP with $|A_1| + |A_2| + \cdots + |A_N|$ constraints (significantly less complex!).
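To see how the CCE constraints differ from the CE ones, the sketch below uses a small hypothetical game constructed for this example (three actions for player 1, two for player 2, with player 2 indifferent). None of the payoffs or names come from the talk.

```python
A1 = ("T", "M", "B")   # player 1 actions (hypothetical)
A2 = ("L", "R")        # player 2 actions (hypothetical)
# Player 1 utilities; player 2 is assumed indifferent (utility 0 everywhere),
# so only player 1's constraints can bind.
U1 = {("T", "L"): 1, ("T", "R"): 0,
      ("M", "L"): 0, ("M", "R"): 1,
      ("B", "L"): 2, ("B", "R"): -2}

# Suggested joint actions: (T, L) or (M, R), each with probability 1/2.
pr = {("T", "L"): 0.5, ("M", "R"): 0.5}

def cce_holds_for_player1(pr, tol=1e-9):
    """One constraint per deviation b: E[U_1] >= E[u_1(b, alpha_2)]."""
    base = sum(p * U1[joint] for joint, p in pr.items())
    return all(
        sum(p * U1[(b, joint[1])] for joint, p in pr.items()) <= base + tol
        for b in A1
    )

def ce_holds_for_player1(pr, tol=1e-9):
    """One constraint per (suggestion a, deviation b) pair."""
    for a in A1:
        for b in A1:
            gain = sum(p * (U1[(b, joint[1])] - U1[joint])
                       for joint, p in pr.items() if joint[0] == a)
            if gain > tol:
                return False
    return True
```

In this constructed game the distribution `pr` passes the CCE check but fails the CE check: a player told "T" learns that the other player's action is "L" and would then prefer "B", while committing to "B" without seeing the suggestion is not profitable on average. A game with only two actions per player cannot show this gap, because there the CCE and CE constraints coincide.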
Superset Theorem
The NE, CE, CCE definitions extend easily to the stochastic game.
Theorem: $\{\text{all NE}\} \subseteq \{\text{all CE}\} \subseteq \{\text{all CCE}\}$
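A short sketch of why the inclusions hold in the static case, using only the definitions above (the stochastic case is not worked here). For CE implies CCE, fix a player $i$ and a deviation $\beta \in A_i$, weight the CE inequality for each suggested action $a$ by $\Pr[\alpha_i = a]$, and sum over $a$:

```latex
\begin{align*}
E[U_i]
  &= \sum_{a \in A_i} \Pr[\alpha_i = a]\, E[U_i \mid \alpha_i = a] \\
  &\geq \sum_{a \in A_i} \Pr[\alpha_i = a]\, E[u_i(\beta, \alpha_{-i}) \mid \alpha_i = a] \\
  &= E[u_i(\beta, \alpha_{-i})].
\end{align*}
```

For NE implies CE, note that under a product distribution $\alpha_{-i}$ is independent of $\alpha_i$, so conditioning on the suggestion does not change the distribution of the other players' actions. At a NE every action played with positive probability is a best response to that distribution, which is exactly the CE inequality.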
Example (static game)
[Figure: utility function tables for Player 1 and Player 2, and a plot with axes Avg. Utility 1 and Avg. Utility 2 showing the CCE region, the NE and CE point, and the labeled points (3.5, 2.4), (3.5, 9.3), (3.87, 3.79)]
All players benefit if non-participants are denied access to the suggestions of the game manager.
Pure strategies for stochastic games
Player $i$ observes $\omega_i(t) \in \Omega_i$ and chooses $\alpha_i(t) \in A_i$.
Definition: A pure strategy for player $i$ is a function $b_i : \Omega_i \to A_i$.
There are $|A_i|^{|\Omega_i|}$ pure strategies for player $i$. Define $S_i$ as this set of pure strategies.
[Figure: a mapping from $\Omega_i$ to $A_i$ sending $\omega_i$ to $b_i(\omega_i)$]
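The count $|A_i|^{|\Omega_i|}$ can be confirmed by enumerating the functions directly. In this small sketch the observation and action labels are hypothetical and chosen only for illustration.

```python
from itertools import product

def pure_strategies(observations, actions):
    """Return every function b: observations -> actions, each as a dict."""
    return [dict(zip(observations, choice))
            for choice in product(actions, repeat=len(observations))]

omega_i = ["good", "bad"]            # hypothetical observation set Omega_i
a_i = ["send", "wait", "probe"]      # hypothetical action set A_i
strategies = pure_strategies(omega_i, a_i)
```

Here `len(strategies)` equals `len(a_i) ** len(omega_i)`. Each strategy in $S_i$ contributes one CCE constraint for player $i$, so the number of constraints grows with $|\Omega_i|$ but does not depend on the size of $\Omega_0$.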
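Stochastic optimization problem
Maximize: $\phi(\overline{U}_1, \overline{U}_2, \ldots, \overline{U}_N)$
Subject to:
1) $\overline{U}_i \geq \overline{U}_i^{(s)}$ for all $i \in \{1, \ldots, N\}$, for all $s \in S_i$
2) $\alpha(t) \in A_1 \times A_2 \times \cdots \times A_N$ for all $t \in \{0, 1, 2, \ldots\}$
Here $\phi$ is a concave fairness function, $\overline{U}_i$ is the time average utility of player $i$, and the constraints in 1) are the CCE constraints.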
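Lyapunov optimization approach
Constraints: $\overline{U}_i \geq \overline{U}_i^{(s)}$ for all $i \in \{1, \ldots, N\}$, for all $s \in S_i$.
Virtual queue: one queue $Q_i^{(s)}(t)$ per constraint, driven by the per-slot terms $u_i(\alpha(t), \omega(t))$ and $u_i^{(s)}(\alpha(t), \omega(t))$.
[Figure: virtual queue diagram]
Formally: $u_i^{(s)}(\alpha(t), \omega(t)) = u_i\big((b_i^{(s)}(\omega_i(t)), \alpha_{-i}(t)), \omega(t)\big)$.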
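The queue update itself is not written out in this transcript. A minimal sketch, assuming the usual virtual-queue form from Lyapunov optimization, $Q(t+1) = \max[Q(t) + u^{(s)}(t) - u(t), 0]$, is given below. The function name and the random utility sequences are placeholders for illustration only.

```python
import random

def update_virtual_queue(q, u_actual, u_deviation):
    """Assumed update: the deviation utility arrives, the actual utility serves."""
    return max(q + u_deviation - u_actual, 0.0)

def run(T, seed=0):
    """Feed placeholder utility sequences through one virtual queue."""
    rng = random.Random(seed)
    q = 0.0
    sum_actual = 0.0
    sum_deviation = 0.0
    for _ in range(T):
        u_actual = rng.uniform(0.0, 1.0)      # placeholder for u_i(alpha(t), omega(t))
        u_deviation = rng.uniform(0.0, 1.0)   # placeholder for u_i^(s)(alpha(t), omega(t))
        q = update_virtual_queue(q, u_actual, u_deviation)
        sum_actual += u_actual
        sum_deviation += u_deviation
    return q, sum_actual / T, sum_deviation / T
```

Under this assumed update, $Q(T) \geq \sum_{t<T} [u^{(s)}(t) - u(t)]$ when $Q(0) = 0$, so the time average of the deviation utility exceeds that of the actual utility by at most $Q(T)/T$. Keeping the queue small relative to $T$ therefore enforces the corresponding constraint in the limit.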
Online algorithm (main part):
Every slot $t$:
The game manager observes the queues and $\omega(t)$.
It chooses $\alpha(t) \in A_1 \times A_2 \times \cdots \times A_N$ to minimize: [expression not recovered from the slide]
Do an auxiliary variable selection (omitted here).
Update virtual queues.
Knowledge of $\Pr[\omega(t) = (\omega_0, \omega_1, \ldots, \omega_N)]$ is not required!
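The expression minimized each slot is not available in this transcript, so the following is only a sketch of the loop's shape under stated assumptions, not the algorithm from the paper. It assumes a linear fairness function (the sum of utilities), in which case no auxiliary variables are needed, and an assumed drift-plus-penalty cost $-V \sum_i u_i + \sum_{i,s} Q_i^{(s)}(t)\,[u_i^{(s)} - u_i]$. The two-user collision game, the channel values, and all names are hypothetical.

```python
import random
from itertools import product

ACTIONS = (0, 1)   # 0 = idle, 1 = transmit (hypothetical two-user MAC game)
N = 2
V = 20.0           # trade-off weight (assumed parameter)

def utility(i, alpha, omega):
    """User i earns its channel value only if it transmits alone (assumed)."""
    others_idle = all(alpha[j] == 0 for j in range(N) if j != i)
    return omega[i] if alpha[i] == 1 and others_idle else 0.0

def deviation_utility(i, s, alpha, omega):
    """Utility of user i if it ignored the suggestion and always played s.

    Users observe nothing here (omega_i = NULL), so a pure strategy is
    simply a fixed action s.
    """
    deviated = tuple(s if j == i else alpha[j] for j in range(N))
    return utility(i, deviated, omega)

def choose_actions(queues, omega):
    """Pick the joint action minimizing the assumed drift-plus-penalty cost."""
    def cost(alpha):
        total = 0.0
        for i in range(N):
            u = utility(i, alpha, omega)
            total -= V * u
            for s in ACTIONS:
                total += queues[i][s] * (deviation_utility(i, s, alpha, omega) - u)
        return total
    return min(product(ACTIONS, repeat=N), key=cost)

def simulate(T, seed=0):
    """Run the loop: observe omega(t), choose alpha(t), update the queues."""
    rng = random.Random(seed)
    queues = [{s: 0.0 for s in ACTIONS} for _ in range(N)]
    avg_u = [0.0] * N
    avg_dev = [{s: 0.0 for s in ACTIONS} for _ in range(N)]
    for _ in range(T):
        omega = tuple(rng.choice((0.2, 1.0)) for _ in range(N))  # i.i.d. channels
        alpha = choose_actions(queues, omega)
        for i in range(N):
            u = utility(i, alpha, omega)
            avg_u[i] += u / T
            for s in ACTIONS:
                d = deviation_utility(i, s, alpha, omega)
                avg_dev[i][s] += d / T
                queues[i][s] = max(queues[i][s] + d - u, 0.0)
    return avg_u, avg_dev, queues
```

The manager never uses the distribution of $\omega(t)$, only its current value and the queue states, which is the property the slide emphasizes. The brute-force search over joint actions is for clarity and scales exponentially in $N$; it is not meant to reflect how the paper handles large $N$.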
Conclusions:
CCE constraints are simpler and lead to improved utilities.
Online algorithm for the stochastic game.
No knowledge of $\Pr[\omega(t) = (\omega_0, \omega_1, \ldots, \omega_N)]$ required!
Complexity and convergence time are independent of the size of $\Omega_0$.
Scales gracefully with large $N$.
Aux variable update:
Choose $x_i(t) \in [0, 1]$ to maximize:
$V\phi(x_1(t), \ldots, x_N(t)) - \sum_{i=1}^{N} Z_i(t) x_i(t)$
where $Z_i(t)$ is another virtual queue, one for each player $i \in \{1, \ldots, N\}$.
See paper for details:
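For a separable fairness function the auxiliary variable update decouples across players and has a closed form. The sketch below assumes $\phi(x) = \sum_i \log(1 + x_i)$, which is a choice made only for this example; the talk does not specify $\phi$. Setting the derivative $V/(1 + x_i) - Z_i$ to zero gives $x_i = V/Z_i - 1$, which is then clipped to $[0, 1]$.

```python
import math

def objective(x_i, z_i, v):
    """Per-player term of V*phi(x) - sum_i Z_i*x_i for phi = sum_i log(1 + x_i)."""
    return v * math.log(1.0 + x_i) - z_i * x_i

def aux_update(z, v):
    """Maximize each per-player term over x_i in [0, 1] (assumed log fairness)."""
    x = []
    for z_i in z:
        if z_i <= 0.0:
            x.append(1.0)  # the term is increasing in x_i when Z_i = 0
        else:
            x.append(min(max(v / z_i - 1.0, 0.0), 1.0))
    return x
```

For example, with `v = 10.0` an empty queue gives `x_i = 1.0`, a queue of `8.0` gives the interior solution `0.25`, and a queue of `40.0` gives `0.0`. A large $Z_i(t)$ pushes $x_i(t)$ down, and a larger $V$ puts more weight on the fairness objective.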