Games with Secure Equilibria Krishnendu Chatterjee (Berkeley) Thomas A. Henzinger (EPFL) Marcin Jurdzinski (Warwick)
Classification of 2-Player Games Zero-sum games: complementary payoffs. Non-zero-sum games: arbitrary payoffs. 1,-10,0 2,-2-1,1 3,11,0 4,23,2
Classical Notion of Rationality Nash equilibrium: none of the players gains by deviation. 3,11,0 4,23,2 (row, column)
Classical Notion of Rationality Nash equilibrium: none of the players gains by deviation. 3,11,0 4,23,2 (row, column)
New Notion of Rationality Nash equilibrium: none of the players gains by deviation. Secure equilibrium: none hurts the opponent by deviation. 3,11,0 4,23,2 (row, column)
Secure Equilibria Natural notion of rationality for component systems: –First, a component tries to meet its spec. –Second, a component may obstruct the other components. For -regular specs, there is always unique maximal secure equilibrium.
Games on Graphs Modeling component interaction: –Vertices = states –Players = components –Moves = transitions Applications: –Synthesis (control) of sequential systems –Verification of adversarial specs –Receptiveness –Compatibility –Early error detection –(Bi)simulation checking –Model checking –Game semantics –etc.
Starvation Freedom (mutual exclusion protocols, cache coherence protocols) In a multi-process system, can a process that wishes P to proceed always eventually proceed no matter what the other processes do? Example: Verification 8 (a ) hhPii } b) 8 (a ) 8} b) X 8 (a ! 9} b) X
Starvation Freedom (mutual exclusion protocols, cache coherence protocols) In a multi-process system, can a process P that wishes to proceed always eventually proceed no matter what the other processes do provided they meet their specs ? Example: Verification 8 (a ) hhPii } b) X 8 (a ) 8} b) X 8 (a ! 9} b) X
Games on Graphs Turn-based (perfect-information) games: –Game graph G=((V,E), (V 1,V 2 )). –E µ V £ V: serial edge relation. –(V 1,V 2 ): partition of the vertex set V. The game is played by moving token along edges of the graph: –V 1 : player-1 moves the token. –V 2 : player-2 moves the token.
Example: A Game Graph s0s0 s1s1 s2s2 s3s3
Plays and Strategies Play (outcome) of a game: –Infinite path (s 0,s 1,…) of states s i 2 V such that (s i,s i+1 ) 2 E for all i ¸ 0. – : set of all plays. Player-1 strategy: – Given prefix of play ending in V 1, specifies how to extend the play. – : V * ¢ V 1 ! V such that (s, (x¢s)) 2 E for all x 2 V * and s 2 V 1. – Symmetric definition for player-2 strategies . Given two strategies , and a start state s, there is a unique play , (s).
Example s0s0 s1s1 s2s2 s3s3 Example of a play: s 0 s 1 s 2 Strategies that yield this play: - Player-1: s 0 ! s 1 - Player-2: s 1 ! s 2
Memoryless Strategies Independent of the history of the play: : V 1 ! V : V 2 ! V Yield simple controllers. Existence puts games into NP.
Objectives and Payoffs What the players are playing for: –Player-1 objective: play in set 1 µ V . –Player-2 objective: play in set 2 µ V . –General objectives: Borel sets in the Cantor topology. –Finite-state objectives: -regular sets (level 2.5 Borel sets). From objectives to payoffs: –If , (s) 2 i, then player i gets payoff 1 else payoff 0. –The payoff profile for a strategy profile ( , ) at a state s is (v 1 , (s), v 2 , (s)).
Classification of Games Zero-sum games: –Complementary objectives: 2 = : 1. –Possible payoff profiles (1,0) and (0,1). Non-zero-sum games: –Arbitrary objectives 1, 2. –Possible payoff profiles (1,1), (1,0), (0,1), and (0,0).
Zero-Sum Games on Graphs Winning: - Winning-1 states s: (9 ) (8 ) , (s) 2 1. - Winning-2 states s: (9 ) (8 ) , (s) 2 2. Determinacy: –Every state is winning-1 or winning-2. –Borel determinacy [Martin 75]. –Memoryless determinacy for parity games [Emerson/Jutla 91]. (1,0)(0,1)
Non-Zero-Sum Games on Graphs Nash equilibrium ( , ) at state s: (8 ’) v 1 , (s) ¸ v 1 ’, (s) (8 ’) v 2 , (s) ¸ v 2 , ’ (s)
Example: Reachability Game s0s0 s1s1 s2s2 s3s3 R1R1 R2R2 Objective for player i is to visit R i.
Example s0s0 s1s1 s2s2 s3s3 Nash equilibria: (s 0 ! s 1, s 1 ! s 2 ): (1,0) R1R1 R2R2
Example s0s0 s1s1 s2s2 s3s3 Nash equilibria: (s 0 ! s 1, s 1 ! s 2 ): (1,0) (s 0 ! s 3, s 1 ! s 0 ): (0,1) (s 0 ! s 1, s 1 ! s 0 ): (0,0) R1R1 R2R2
Example s0s0 s1s1 s2s2 s3s3 Nash equilibria: (s 0 ! s 1, s 1 ! s 2 ): (1,0) (s 0 ! s 3, s 1 ! s 0 ): (0,1) R1R1 R2R2
Synthesis: - Zero-sum game controller versus plant. - Control against all plant behaviors. Verification: - Non-zero-sum specs for components. - Components may behave adversarially, but without threatening their own specs. Regular Objectives
Drawbacks of Nash equilibrium: - Does not capture adversarial behavior. - Not unique. A new notion of equilibrium: - Takes into account both non-zero-sum payoffs and adversarial behavior. - Captures the essence of component-based systems. - Unique for Borel objectives. - Computable for -regular objectives. Non-Zero-Sum Verification Games
Secure Equilibria Secure strategy profile ( , ) at state s: (8 ’) ( v 1 , ’ (s) < v 1 , (s) ) v 2 , ’ (s) < v 2 , (s) ) (8 ’) ( v 2 ’, (s) < v 2 , (s) ) v 1 ’, (s) < v 1 , (s) ) A secure profile ( , ) is a contract: if the player-1 deviates to lower player-2’s payoff, her own payoff decreases as well, and vice versa. Secure equilibrium: secure strategy profile that is also a Nash equilibrium.
Secure Equilibria 3,31,3 2,23,1 2,10,0 1,20,0 (row, column)
Example s0s0 s1s1 s2s2 s3s3 R1R1 R2R2 Nash equilibria: (s 0 ! s 1, s 1 ! s 2 ): (1,0) (s 0 ! s 3, s 1 ! s 0 ): (0,1) (s 0 ! s 1, s 1 ! s 0 ): (0,0) not secure
Example s0s0 s1s1 s2s2 s3s3 R1R1 R2R2 Nash equilibria: (s 0 ! s 1, s 1 ! s 2 ): (1,0) (s 0 ! s 3, s 1 ! s 0 ): (0,1) (s 0 ! s 1, s 1 ! s 0 ): (0,0) not secure
Example s0s0 s1s1 s2s2 s3s3 Nash equilibria: (s 0 ! s 1, s 1 ! s 2 ): (1,0) (s 0 ! s 3, s 1 ! s 0 ): (0,1) (s 0 ! s 1, s 1 ! s 0 ): (0,0) R1R1 R2R2 not secure secure
Lexicographic Payoff Profile Ordering Player-1 preference º 1 : –(v 1,v 2 ) Â 1 (v’ 1,v’ 2 ) iff v 1 > v’ 1 or ( v 1 = v’ 1 and v 2 < v’ 2 ). –(v 1,v 2 ) º 1 (v’ 1,v’ 2 ) iff (v 1,v 2 ) Â 1 (v’ 1,v’ 2 ) or (v 1,v 2 ) = (v’ 1,v’ 2 ). Player-2 preference º 2 symmetric. Captures payoff maximization with external adversarial choice. Provides notion of maximality: –Player-1: (1,0) º 1 (1,1) º 1 (0,0) º 1 (0,1) –Player-2: (0,1) º 2 (1,1) º 2 (0,0) º 2 (1,0)
Alternative Characterization A secure equilibrium is an equilibrium with respect to the º 1 and º 2 payoff profile orderings: A strategy profile ( , ) is a secure equilibrium at s iff ( 8 ’) (v 1 , (s), v 2 , (s)) º 1 (v 1 ’, (s), v 2 ’, (s)) (8 ’) (v 1 , (s), v 2 ,’ (s)) º 2 (v 1 , ’ (s), v 2 , ’ (s))
Example: Buechi Game s4s4 s0s0 s2s2 s1s1 s3s3 B1B1 B2B2 Objective for player i is to visit B i infinitely often.
Example s4s4 s0s0 s2s2 s1s1 s3s3 B1B1 B2B2 Nash equilibria: (s 0 ! s 4, s 1 ! s 4 ): (0,0) secure
Example s4s4 s0s0 s2s2 s1s1 s3s3 B1B1 B2B2 Nash equilibria: (s 0 ! s 4, s 1 ! s 4 ): (0,0) (s 0 ! s 1, s 1 ! s 0 ): (1,0) secure
Example s4s4 s0s0 s2s2 s1s1 s3s3 B1B1 B2B2 Nash equilibria: (s 0 ! s 4, s 1 ! s 4 ): (0,0) (s 0 ! s 1, s 1 ! s 0 ): (1,0) secure not secure
Example s4s4 s0s0 s2s2 s1s1 s3s3 B1B1 B2B2 Nash equilibria: (s 0 ! s 4, s 1 ! s 4 ): (0,0) (s 0 ! s 1, s 1 ! s 0 ): (1,0) (s 0 ! s 2, s 3 ! s 1 ): (1,1) secure not secure
Example s4s4 s0s0 s2s2 s1s1 s3s3 B1B1 B2B2 Nash equilibria: (s 0 ! s 4, s 1 ! s 4 ): (0,0) (s 0 ! s 1, s 1 ! s 0 ): (1,0) (s 0 ! s 2, s 3 ! s 1 ): (1,1) secure not secure
Example s4s4 s0s0 s2s2 s1s1 s3s3 B1B1 B2B2 Nash equilibria: (s 0 ! s 4, s 1 ! s 4 ): (0,0) (s 0 ! s 1, s 1 ! s 0 ): (1,0) (s 0 ! s 2, s 3 ! s 1 ): (1,1) secure not secure secure
Example s4s4 s0s0 s2s2 s1s1 s3s3 B1B1 B2B2 Secure equilibrium ( , ) with payoff (1,1) at s 0 : : if s 1 ! s 0, then s 2 else s 4. : if s 3 ! s 1, then s 0 else s 4. Pair of “retaliating” strategies. Require memory.
Theorem: At every state s of a graph game with Borel objectives, there is a unique secure equilibrium profile that is maximal with respect to both º 1 and º 2. Maximal Secure Equilibria This is the rational behavior of both players at s if they wish to 1. satisfy their own objectives and, then, 2. sabotage the opponent’s objective.
Strongly Winning and Retaliating Strategies Winning strategies: –Player-1 wins for the objective 1. –Player-2 wins for 2. Strongly winning strategies: –Player-1 wins for the objective 1 Æ : 2. –Player-2 wins for 2 Æ : 1. Retaliating strategies: –Player-1 wins for the objective 2 ) 1. –Player-2 wins for 1 ) 2.
Winning Sets W 1 : set of states s.t. player-1 has a winning strategy. W 2 : set of states s.t. player-2 has a winning strategy. W 10 : set of states s.t. player-1 has a strongly winning strategy. W 01 : set of states s.t. player-2 has a strongly winning strategy. W 11 : set of states s s.t. there is a pair ( , ) of retaliating strategies with , (s) ² 1 Æ 2. W 00 : set of states s s.t. each player has a retaliating strategy and for every pair ( , ) of retaliating strategies, , (s) ² : 1 Æ : 2.
State Space Partition
W 10 hh1ii ( 1 Æ : 2 ) hh2ii ( : 1 Ç 2 )
State Space Partition W 10 hh1ii ( 1 Æ : 2 ) hh2ii ( : 1 Ç 2 ) W 01 hh2ii ( 2 Æ : 1 ) hh1ii ( : 2 Ç 1 )
State Space Partition W 10 hh1ii ( 1 Æ : 2 ) hh2ii ( : 1 Ç 2 ) W 01 hh2ii ( 2 Æ : 1 ) hh1ii ( : 2 Ç 1 ) Retaliating strategies There is no player-2 retaliating strategy in W 10, and no player-1 retaliating strategy in W 01.
Retaliating Strategy Pairs Every player-1 retaliating strategy ensures for every player-2 strategy that 2 ) 1. Hence every pair ( , ) of retaliating strategies ensures (: 1 Ç : 2 ) ) (: 1 Æ : 2 ). W 11 and W 00 are the regions of the state space where both players have retaliating strategies: –W 11 contains the states where some pair ( , ) of retaliating strategies ensures 1 Æ 2. –W 00 contains the states where all pairs ( , ) of retaliating strategies lead to : 1 Æ : 2.
State Space Partition W 10 hh1ii ( 1 Æ : 2 ) W 01 hh2ii ( 2 Æ : 1 ) W 11 W 00
Uniqueness of Maximal Secure Equilibria W 11 : (1,1) is a secure equilibrium, and hence maximal W 10 : (1,0) is the only secure equilibrium W 01 : (0,1) is the only secure equilibrium W 00 : (0,0) is the only possible secure equilibrium W 11 is the set of states where both players can collaborate to win, yet each player keeps an “insurance policy.”
Generalization of Determinacy W 10 W 01 W 11 W 00 W1W1 W2W2 Zero-sum games: 2 = : 1 Non-zero-sum games: 1, 2
Computing the Partition For -regular 1 and 2. Computing –W 10 = hh1ii ( 1 Æ : 2 ) –W 01 = hh2ii ( 2 Æ : 1 ) involves solving games with conjunctive (Streett) objectives. These are generally not memoryless. We need to compute W 11 µ hh1,2ii ( 1 Æ 2 ) = 9 ( 1 Æ 2 ).
s0s0 s1s1 B1B1 B2B2 W 1,0 even though hh1,2ii ( 1 Æ 2 )
Computing the Partition W 10 hh1ii ( 1 Æ : 2 ) hh2ii ( 1 ) 2 ) W 01 hh2ii ( 2 Æ : 1 ) hh1ii ( 2 ) 1 )
Computing the Partition W 10 hh1ii ( 1 Æ : 2 ) hh2ii ( 1 ) 2 ) W 01 hh2ii ( 2 Æ : 1 ) hh1ii ( 2 ) 1 ) hh1ii 1 U1U1
Computing the Partition W 10 hh1ii ( 1 Æ : 2 ) hh2ii ( 1 ) 2 ) W 01 hh2ii ( 2 Æ : 1 ) hh1ii ( 2 ) 1 ) hh1ii 1 hh2ii 2 U1U1 U2U2
Computing the Partition W 10 hh1ii ( 1 Æ : 2 ) hh2ii ( 1 ) 2 ) W 01 hh2ii ( 2 Æ : 1 ) hh1ii ( 2 ) 1 ) hh1ii 1 hh2ii 2 Threat strategies T, T hh2ii : 1 hh1ii : 2 U1U1 U2U2
Computing the Partition W 10 hh1ii ( 1 Æ : 2 ) hh2ii ( 1 ) 2 ) W 01 hh2ii ( 2 Æ : 1 ) hh1ii ( 2 ) 1 ) hh1ii 1 hh2ii 2 Threat strategies T, T hh2ii : 1 hh1ii : 2 hh1,2ii ( 1 Æ 2 ) Cooperation strategies C, C U1U1 U2U2
Computing W 11 The pair ( C + T, C + T ) is a pair of winning retaliating strategies: –Player-1 follows C until reaching U 1 [ U 2 (then switch to known retaliating pair) or until player-2 deviates from C (then switch to T ). –Player-2 behaves symmetrically. From states s where there are no cooperative strategies, for all retaliation strategies ( , ) we have , (s) ² : 1 Æ : 2. Hence W 11 = hh1,2ii( 1 Æ 2 ) in the subgame without W 10 [ W 01.
Computing the Partition W 10 hh1ii ( 1 Æ : 2 ) W 01 hh2ii ( 2 Æ : 1 ) hh1ii 1 hh2ii 2 hh1,2ii ( 1 Æ 2 ) U1U1 U2U2 W00W00
1, 2 LTL: 2EXPTIME [Pnueli/Rosner]. 1, 2 parity: coNP [Emerson/Jutla]. Computing the Partition
P 1 2 W 1 ( 1 ) P 2 2 W 2 ( 2 ) 1 Æ 2 ) P 1 ||P 2 ² Application: Compositional Verification
P 1 2 W 1 ( 1 ) P 2 2 W 2 ( 2 ) 1 Æ 2 ) P 1 ||P 2 ² Application: Compositional Verification P 1 2 (W 10 [ W 11 ) ( 1 ) P 2 2 (W 01 [ W 11 ) ( 2 ) 1 Æ 2 ) P 1 ||P 2 ² W 1 ½ W 10 [ W 11 W 2 ½ W 01 [ W 11 An assume/guarantee rule.
Summary Rational behavior in non-zero-sum graph games: –not as restrictive as zero-sum games. –capture the essence of multi-component systems. –unique equilibria. –computable for -regular objectives. Reference: LICS 2004.