Stochastic -Regular Games Krishnendu Chatterjee UC Berkeley Dissertation Talk 11/6/2018
Games on Graphs Games on graphs Model interaction between components. Control of finite state systems Solving two player games. 11/6/2018
Graph Models of Processes vertices = states edges = transitions paths = behaviors 11/6/2018
Game Models of Processes vertices = states edges = transitions paths = behaviors Different processes or processes in open environment. 11/6/2018
Graphs vs. Games a a a b a b 11/6/2018
Game Models Enable -synthesis [Church, Rabin, Ramadge/Wonham, Pnueli/Rosner] -receptiveness [Dill, Abadi/Lamport] -semantics of interaction [Abramsky] -reasoning about adversarial behavior -non-emptiness of non-deterministic automata -behavioral type systems and interface automata [deAlfaro/ Henzinger] -modular reasoning [Kupferman/Vardi, Alur/deAlfaro/Henzinger /Mang] -early error detection [deAlfaro/Henzinger/Mang] -model-based testing [Gurevich/Veanes et al.] -etc. 11/6/2018
Game Models Always about open systems: -players = processes / components / agents -demonic or adversarial non-determinism 11/6/2018
Stochastic Games 11/6/2018
Stochastic Games Where: What for: How: Arena: Game graphs with probabilistic transitions. What for: Objectives: -regular. How: Strategies. 11/6/2018
Game Graphs Two broad class Turn-based games Concurrent games Players make moves in turns. Example: tic-tac-toe, chess. Concurrent games Players make moves simultaneously and independently. Example: matrix games. 11/6/2018
Goals Computational issues. Strategy classification Characterize simplest class of strategies that achieve the optimal payoff. 11/6/2018
Three key components Classes of game graphs. Objectives. Strategies and notion of values. 11/6/2018
Markov Decision Process A Markov Decision Process (MDP) is defined as G=((V,E),(V1,V0)), where (V,E) is a finite graph. (V1,V0) is a partition of V. V0 are random nodes chooses between successors uniformly at random. 11/6/2018
A Markov Decision Process 1 1 1 1 1 11/6/2018
Markov Decision Process Games against nature or randomness. Control of systems in presence of uncertainty. We call them 1 ½ -player games. 11/6/2018
Two-player Games A two-player game is defined as G=((V,E),(V1,V2)), where (V,E) is a finite graph. (V1,V2) is a partition of V. V1 player 1 makes moves. V2 player 2 makes moves. 11/6/2018
Two-player Games Games against adversary. Control in open environment controller synthesis. We call them 2-player games. 11/6/2018
Turn-based Probabilistic Games A turn-based probabilistic game is defined as G=((V,E),(V1,V2,V0)), where (V,E) is a finite graph. (V1,V2,V0) is a partition of V. V1 player 1 makes moves. V2 player 2 makes moves. V0 randomly chooses successors. 11/6/2018
Turn-based Probabilistic Games Games against adversary and nature. Controller synthesis in presence of uncertainty controller synthesis of stochastic reactive systems. We call them 2 ½ -player games. Also known as perfect-information stochastic games. 11/6/2018
Game Played Token placed on an initial vertex. If current vertex is Player 1 vertex then player 1 chooses successor. Player 2 vertex then player 2 chooses successor. Player random vertex proceed to successors uniformly at random. Generates infinite sequence of vertices. 11/6/2018
Concurrent Games Players make move simultaneously. Finite set of vertices V. Finite set of actions A. Action assignments 1,2:V ! 2A n ;. Probabilistic transition function (s, a1, a2)(t) = Pr [ t | s, a1, a2] 11/6/2018
A Concurrent Game Actions at s0: a, b for player 1, c, d for player 2. ad Actions at s0: a, b for player 1, c, d for player 2. s0 ac,bd bc s2 s1 11/6/2018
Concurrent Games Games with simultaneous interaction. Model synchronous interaction. 11/6/2018
Stochastic Games 1 ½ pl. 2 pl. 2 ½ pl. Conc. games 11/6/2018
Three key components Classes of game graphs. Objectives. Strategies and notion of values. 11/6/2018
Plays Plays: infinite sequence of vertices or infinite trajectories. V: set of all infinite plays or infinite trajectories. 11/6/2018
Objectives Plays: infinite sequence of vertices. Objectives: subset of plays, 1 µ V. Play is winning for player 1 if it is in 1. Zero-sum game: 2 = V n1. 11/6/2018
Reachability and Safety Let R µ V set of target vertices. Reachability objective requires to visit the set R of vertices. Let F µ V set of safe vertices. Safety objective requires never to visit any vertex outside F. 11/6/2018
Reachability and Safety 11/6/2018
Buechi and coBuechi Let B µ V a set of Buechi vertices. Buechi objective requires that the set B is visited infinitely often. Let C µ V a set of coBuechi vertices. coBuechi objective requires that vertices outside C visited finitely often. 11/6/2018
Buechi and coBuechi C B coBuchi Buchi 11/6/2018
Rabin and Streett Let {(G1,R1), (G2,R2),…, (Gd,Rd)} set of vertex set pairs. Rabin: requires there is a pair (Gj,Rj) such that Gj finitely often and Rj infinitely often. Streett: requires for every pair (Gj,Rj) if Rj infinitely often then Gj infinitely often. Rabin-chain: both a Rabin-Streett, also known as parity objectives; it is a complementation closed subset of Rabin. 11/6/2018
Mueller Objective Mueller objective Fµ 2V is specified as a set of subset of vertices and requires that the set of states that are visited infinitely often is in F. Rabin, Streett, Rabin-chain objectives are all Mueller objectives. 11/6/2018
Objectives -regular: [,¢ , *,. Safety, Reachability, Liveness, etc. Rabin and Streett canonical ways to express. Borel -regular 11/6/2018
Three key components Classes of game graphs. Objectives. Strategies and notion of values. 11/6/2018
Strategies Given a finite sequence of vertices, (that represents the history of play) a strategy for player 1 is a probability distribution over the set of successors. : V* ¢ V1 ! D(V) 11/6/2018
Strategies 1/2 1 1 1/2 1 1 1 11/6/2018
Strategies 3/4 1 1 1/4 1 1 1 11/6/2018
Strategies with Memory It has memory to remember the history. Let M represent a set called memory. Strategy with memory represented as two functions: m : V £ M ! M s : V1 £ M ! D(V) A finite memory strategy is one with finite memory size M. 11/6/2018
Subclass of Strategies Memoryless (stationary) strategies: Strategy is independent of the history of the play and depends on the current vertex, i.e., |M|=1. : V1 ! D(V) Pure strategies: chooses a successor rather than a probability distribution. Pure-memoryless: both pure and memoryless (simplest class). 11/6/2018
Strategies The set of strategies Set of strategy for player 1; strategies . Set of strategy for player 2; strategies . 11/6/2018
Values Given objectives 1 and 2 = V n 1 the value for the players are val1(1)(v) = sup 2 inf 2 Prv,(1). val2(2)(v) = sup 2 inf 2 Prv,(2). 11/6/2018
Determinacy Determinacy Determinacy means val1(1)(v) + val2(2)(v) =1. Determinacy means sup inf = inf sup. von Neumann’s minmax theorem in matrix games. 11/6/2018
Optimal Strategies A strategy is optimal for objective 1 if val1(1)(v) = inf Prv, (1). Analogous definition for player 2. 11/6/2018
Computational Issues Algorithms to compute values in games. Identify the simplest class of strategies that suffices for optimality. 11/6/2018
Outline of Results: Turn-based Games 11/6/2018
History and Results Two-player games. Determinacy (sup inf = inf sup) theorem for Borel objectives. [Mar75] Strong determinacy: values only 0 and 1. Finite memory determinacy (i.e., finite memory optimal strategy exists) for -regular objective. [GurHar82] Pure memoryless optimal strategy exists for Rabin objectives. [EmeJut88] NP-complete. coNP-complete for Streett objectives. Mueller objectives Precisely characterize memory requirement by a number mF [DzeJurWal97]. PSPACE-complete [HunDaw05]. 11/6/2018
History and Results 2 ½ -player games. Determinacy (sup inf = inf sup) theorem for Borel objectives. [Mar98] Strong determinacy (values only 0 and 1) does not hold. 11/6/2018
History and Results 2 ½ -player games Reachability objectives [Con92] Pure memoryless optimal strategy exists. Decided in NP Å coNP. Decision problem: given a 2 ½ -player game, an objective 1, a vertex v, and a rational r, whether val1(1)(v) ¸ r. 11/6/2018
History and Results 2 ½ -player games -regular objectives [deAl Maj01] Parity games: 2EXPTIME. Rabin, Streett and Mueller: 3EXPTIME. Algorithms obtained as a special case of an algorithm for the general case of concurrent games. 11/6/2018
Overview of Results Borel Mar75 1 ½ pl. GH82 2 pl. -regular EJ88 Mar98, dAM01 Conc. games 11/6/2018
Our Results: Turn-based Games 11/6/2018
Results on 2 ½ player Games The complexity of Rabin, Streett and parity objectives [C/deAl Hen 05] Pure memoryless optimal strategies exist for Rabin objectives. NP-complete for Rabin (previously was 3EXPTIME). coNP-complete for Streett (previously was 3EXPTIME). NP Å coNP for parity (previously was 2EXPTIME). 11/6/2018
Rabin and Strett Objectives 2 ½ pl. EJ88 :PM NP comp. 2 pl. PM, NP comp. Reach Con 92: PM 11/6/2018
Results on 2 ½ player Games The complexity of Mueller objectives [C 07] Winning strategies with memory mF suffices (the same memory upper bound as 2 player games). There is a matching lower bound also. PSPACE-complete (previously known 3EXPTIME). 11/6/2018
Results on 2 ½ player Games Practical algorithms Strategy improvement algorithms: iterate over local optimizations of strategies, that must converge to global optimal. Converges fast in practice though worst case bound in exponential. Strategy improvement for parity objectives [C/Hen 06a]. Strategy improvement for Rabin and Streett objectives [C/Hen 06b]. 11/6/2018
Outline of Results: Concurrent Games 11/6/2018
Concurrent Games: Example ad Actions at s0: a, b for player 1, c, d for player 2. Objective for player 1: to reach s1 Pure strategy not enough: a -> d b -> c s0 ac,bd bc s1 s2 11/6/2018
Concurrent Games: Example ad Fact: For any strategy for player 1 she cannot win with prob. 1. As long it plays a deterministically play d, else play c with positive probability. s0 ac,bd bc s1 s2 11/6/2018
Concurrent Games: Example 1- ad Fix strategy for player 1 a !1- b ! c s0 d ac,bd bc 1- s1 s2 For every positive player 1 can win with probability 1 -. 11/6/2018
Concurrent Games: Example ad Actions at s0: a, b for player 1, c, d for player 2. Objective for player 1: to reach s1 infinitely often. Infinite memory is required. s0 ac,bd bc s2 s1 11/6/2018
Concurrent Games Optimal strategies need not exist: -optimal strategies for all >0. Require randomization: pure strategies not enough. For -regular objectives -optimal strategies require infinite memory. 11/6/2018
History and Results Detailed analysis of concurrent games [Fil Vri 97]. Determinacy theorem for all Borel objectives [Mar 98]. Concurrent -regular games: Reachability objectives [deAl Hen Kup98]. Rabin-chain objectives [deAl Hen00]. Rabin-chain objectives [deAl Maj01]. 11/6/2018
History and Results Qualitative analysis Quantitative analysis Reachability objectives [deAl Hen Kup98] Quadratic time. Parity objectives [deAl Hen 00] NP and coNP. Quantitative analysis Reachability objectives [Ete Yan 06] PSPACE. 11/6/2018
History and Results Concurrent games with parity objectives 3EXPTIME [deAl Maj 01] Quantitative -calculus formula. Formula in the theory of reals with length exponential in the size of the game and exponentially many quantifier alternations in the number of parities. Rabin, Streett and Mueller: 4EXPTIME. 11/6/2018
Overview of Results Borel 1 ½ pl. 2 pl. -regular 2 ½ pl. dAH00,dAM01 Reach dAH00,dAM01 Conc. games EY06 Mar98 11/6/2018
Our Results: Concurrent Games 11/6/2018
Results on Concurrent Games The complexity of concurrent parity games [C/deAl Hen06]. PSPACE A polynomial guess followed by a formula in existential theory of reals. Previous formula in the theory of reals required exponential quantifier alternations and was 3EXPTIME. The complexity of concurrent Rabin, Streett and Mueller objectives EXPSPACE – reduction to parity (previous known was 4EXPTIME). 11/6/2018
Results on Concurrent Games Practical algorithms Strategy improvement algorithm for reachability objectives [C/deAl Hen 06b]. Some improved analysis of strategies for concurrent reachability games. A new combinatorial proof of existence of memoryless optimal strategies, for all >0. 11/6/2018
Outline of Results: Non-zero-sum Games 11/6/2018
Non-zero-sum Games So far we discussed strictly competitive games where the objectives were complementary. Non-zero-sum games Each player has own objective. Concept of rationality: Nash equilibrium, i.e., a strategy profile of all the players such that no player has incentive to deviate if the other players keep playing the strategies in the profile. -Nash equilibrium: can only gain at most by deviation. 11/6/2018
Closed sets (Safety). [Sec Sud 02] History and Results Two-player nonzero-sum stochastic games with limit-average payoff. [Vie 00a, Vie 00b] Closed sets (Safety). [Sec Sud 02] Discounted/ Halting Games. [Fink 64] 11/6/2018
Overview of Results Borel -reg Nash:SecSud02 S n pl. conc. R n pl. turn-based 2 pl. conc. Nash:Vie00 Lim. avg 11/6/2018
Our Results: Non-zero-sum Games 11/6/2018
Our Results: Non-zero-sum Concurrent Games An n-player non-zero-sum reachability game has an Nash equilibrium in memoryless strategies for all >0 .[C/ Maj Jur 04] A two-player non-zero-sum parity game has an Nash equilibrium for all >0. [C 05] 11/6/2018
Our Results: Non-zero-sum Turn Based Games Results from [C/ Maj Jur 04] n-player turn based probabilistic games with Borel payoffs have -Nash equilibria in deterministic (pure) strategies. n-player turn based deterministic games with Borel payoffs have Nash equilibria in deterministic (pure) strategies. 11/6/2018
Can it be strengthened? Can -Nash equilibrium be strengthened to Nash equilibrium? 11/6/2018
Can it be strengthened? Can -Nash equilibrium be strengthened to Nash equilibrium? No. Counter-example. 11/6/2018
Las Vegas Game Work 1/2 Jackpot Sorry you lose Go to Vegas Play again 11/6/2018
For every >0, Las Vegas game has a (1-)-optimal winning strategy For < 1/2n, work for n days before heading to Vegas. But no optimal winning strategy. 11/6/2018
-Regular? The Las Vegas game is not -regular Turn based probabilistic non-zero-sum games with -regular objectives have pure Nash equilibria. 11/6/2018
Results in Non-zero-sum Games Borel -reg Nash:SecSud02 S n pl. conc. Nash Nash R n pl. turn-based Nash Nash 2 pl. conc. Lim. avg Nash:Vil00 11/6/2018
Conclusion and Open Problems 11/6/2018
Conclusion Stochastic Games Model for controller synthesis in the presence of uncertainty. Improved strategy complexity, computational complexity, and practical algorithms for various classes of games with different classes of objectives. Existence of Nash and Nash equilibrium in various classes of games. 11/6/2018
Open Problems Complexity of parity games NP Å coNP for 2-player and 2 ½ -player games. Polynomial time algorithm is a major open question. Complexity of 2 ½ player reachability games NP Å coNP. Polynomial time algorithm is a major open problem. Other open problems Algorithms for concurrent parity games. Better algorithmic analysis of 2-player, 2 ½ -player and concurrent games with sub-classes of parity objectives. 11/6/2018
Open Problems Several open problems in non-zero-sum games Related to existence of equilibria and also the related complexity problems. 11/6/2018
Results in Non-zero-sum Games Borel -reg Nash:SecSud02 S n pl. conc. Nash Nash R n pl. turn-based Nash Nash 2 pl. conc. Lim. avg Nash:Vil00 11/6/2018
Open Problems Borel -reg S n pl. conc. R n pl. turn-based 2 pl. conc. Lim. avg 11/6/2018
Acknowledgments Joint work with Thanks to you all !!! Luca de Alfaro Thomas A. Henzinger Marcin Jurdzinski Rupak Majumdar Thanks to you all !!! 11/6/2018
Questions ??? http://www.eecs.berkeley.edu/~c_krish 11/6/2018