1
Stochastic ω-Regular Games
Krishnendu Chatterjee, UC Berkeley. Dissertation Talk.
2
Games on Graphs
Games on graphs model the interaction between components. Control of finite-state systems amounts to solving two-player games.
3
Graph Models of Processes
vertices = states, edges = transitions, paths = behaviors.
4
Game Models of Processes
vertices = states, edges = transitions, paths = behaviors. Models different processes, or a process in an open environment.
5
Graphs vs. Games (figure: a graph and a game over the same states, with transitions labeled a and b.)
6
Game Models Enable
- synthesis [Church, Rabin, Ramadge/Wonham, Pnueli/Rosner]
- receptiveness [Dill, Abadi/Lamport]
- semantics of interaction [Abramsky]
- reasoning about adversarial behavior
- non-emptiness of non-deterministic automata
- behavioral type systems and interface automata [deAlfaro/Henzinger]
- modular reasoning [Kupferman/Vardi, Alur/deAlfaro/Henzinger/Mang]
- early error detection [deAlfaro/Henzinger/Mang]
- model-based testing [Gurevich/Veanes et al.]
- etc.
7
Game Models Always about open systems:
- players = processes / components / agents
- demonic or adversarial non-determinism
8
Stochastic Games
9
Stochastic Games
Where: an arena, i.e., game graphs with probabilistic transitions.
What for: objectives, which are ω-regular.
How: strategies.
10
Game Graphs
Two broad classes:
Turn-based games: players make moves in turns. Example: tic-tac-toe, chess.
Concurrent games: players make moves simultaneously and independently. Example: matrix games.
11
Goals
Computational issues.
Strategy classification: characterize the simplest class of strategies that achieve the optimal payoff.
12
Three key components
Classes of game graphs. Objectives. Strategies and the notion of values.
13
Markov Decision Process
A Markov Decision Process (MDP) is defined as G = ((V,E),(V1,V0)), where (V,E) is a finite graph and (V1,V0) is a partition of V. V1 is the set of player-1 vertices; V0 is the set of random vertices, which choose among their successors uniformly at random.
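A minimal sketch of this definition in Python (our own illustration; the vertex names and edges are made up):

import random

# MDP G = ((V,E),(V1,V0)): V1 vertices are controlled, V0 vertices are random.
V1 = {"s0", "s1"}                                      # player-1 vertices
V0 = {"r0"}                                            # random vertices
E = {"s0": ["r0"], "s1": ["s1"], "r0": ["s0", "s1"]}   # successor lists

def step(v, choose):
    """One move: player 1 chooses at V1 vertices, nature at V0 vertices."""
    if v in V0:
        return random.choice(E[v])   # uniform over successors, as on the slide
    return choose(v)                 # player 1 picks some successor in E[v]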
14
A Markov Decision Process
(figure: an example MDP; edges are labeled with transition probabilities.)
15
Markov Decision Process
Games against nature or randomness. Control of systems in the presence of uncertainty. We call them 1½-player games.
16
Two-player Games
A two-player game is defined as G = ((V,E),(V1,V2)), where (V,E) is a finite graph and (V1,V2) is a partition of V. At V1 vertices player 1 makes moves; at V2 vertices player 2 makes moves.
17
Two-player Games
Games against an adversary. Control in an open environment: controller synthesis. We call them 2-player games.
18
Turn-based Probabilistic Games
A turn-based probabilistic game is defined as G = ((V,E),(V1,V2,V0)), where (V,E) is a finite graph and (V1,V2,V0) is a partition of V. At V1 vertices player 1 makes moves; at V2 vertices player 2 makes moves; V0 vertices choose successors uniformly at random.
19
Turn-based Probabilistic Games
Games against an adversary and nature. Controller synthesis in the presence of uncertainty, i.e., controller synthesis of stochastic reactive systems. We call them 2½-player games. Also known as perfect-information stochastic games.
20
Game Played
A token is placed on an initial vertex. If the current vertex is a player-1 vertex, player 1 chooses a successor; if it is a player-2 vertex, player 2 chooses a successor; if it is a random vertex, the game proceeds to a successor chosen uniformly at random. This generates an infinite sequence of vertices.
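A minimal sketch (our own illustration, not from the talk) of this token game; strat1 and strat2 are assumed strategy callbacks that return a successor of the current vertex given the history:

import random

def play_prefix(v, E, V1, V2, V0, strat1, strat2, steps=100):
    """Generate a finite prefix of the (in principle infinite) play."""
    history = [v]
    for _ in range(steps):
        if v in V1:
            v = strat1(history)       # player 1 picks a successor of v
        elif v in V2:
            v = strat2(history)       # player 2 picks a successor of v
        else:                         # random vertex in V0
            v = random.choice(E[v])   # uniform over successors
        history.append(v)
    return history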
21
Concurrent Games
Players make moves simultaneously. Finite set of vertices V. Finite set of actions A. Action assignments Γ1, Γ2 : V → 2^A ∖ ∅. Probabilistic transition function δ(s, a1, a2)(t) = Pr[t | s, a1, a2].
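A minimal sketch (our own, with illustrative names) of how the transition function above combines with mixed moves of the two players:

def next_distribution(s, x1, x2, delta):
    """Distribution over successor vertices when player 1 plays the mixed
    action x1 = {a: prob} and player 2 plays x2 = {b: prob} at vertex s;
    delta[(s, a1, a2)] is a dict {t: Pr[t | s, a1, a2]} as on the slide."""
    dist = {}
    for a1, p1 in x1.items():
        for a2, p2 in x2.items():
            for t, p in delta[(s, a1, a2)].items():
                dist[t] = dist.get(t, 0.0) + p1 * p2 * p
    return dist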
22
A Concurrent Game
Actions at s0: a, b for player 1; c, d for player 2. (figure: from s0, the move pair ad stays at s0, ac and bd lead to s1, and bc leads to s2.)
23
Concurrent Games
Games with simultaneous interaction. They model synchronous interaction.
24
Stochastic Games (diagram: the classes of 1½-player, 2-player, 2½-player, and concurrent games.)
25
Three key components
Classes of game graphs. Objectives. Strategies and the notion of values.
26
Plays
Plays: infinite sequences of vertices, i.e., infinite trajectories. V^ω: the set of all infinite plays or infinite trajectories.
27
Objectives
Plays: infinite sequences of vertices. Objectives: subsets of plays, Φ1 ⊆ V^ω. A play is winning for player 1 if it is in Φ1. Zero-sum game: Φ2 = V^ω ∖ Φ1.
28
Reachability and Safety
Let R ⊆ V be a set of target vertices. The reachability objective requires visiting the set R. Let F ⊆ V be a set of safe vertices. The safety objective requires never visiting any vertex outside F.
29
Reachability and Safety
30
Buechi and coBuechi
Let B ⊆ V be a set of Buechi vertices. The Buechi objective requires that the set B is visited infinitely often. Let C ⊆ V be a set of coBuechi vertices. The coBuechi objective requires that vertices outside C are visited only finitely often.
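For intuition only (our own illustration, not material from the talk): on an ultimately periodic "lasso" play prefix · cycle^ω in a finite game, the vertices visited infinitely often are exactly those on the cycle, so these objectives reduce to simple set checks:

def reach(prefix, cycle, R):      # reachability: visit R at least once
    return any(v in R for v in prefix + cycle)

def safety(prefix, cycle, F):     # safety: never leave the safe set F
    return all(v in F for v in prefix + cycle)

def buechi(prefix, cycle, B):     # Buechi: visit B infinitely often
    return any(v in B for v in cycle)

def cobuechi(prefix, cycle, C):   # coBuechi: eventually only vertices of C
    return all(v in C for v in cycle)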
31
Buechi and coBuechi (figure: the coBuechi set C and the Buechi set B.)
32
Rabin and Streett
Let {(G1,R1), (G2,R2), …, (Gd,Rd)} be a set of pairs of vertex sets. Rabin: requires that there is a pair (Gj,Rj) such that Gj is visited only finitely often and Rj infinitely often. Streett: requires that for every pair (Gj,Rj), if Rj is visited infinitely often then Gj is visited infinitely often. Rabin-chain: both a Rabin and a Streett condition, also known as parity objectives; the Rabin-chain objectives are the complementation-closed subset of the Rabin objectives.
33
Mueller Objective
A Mueller objective F ⊆ 2^V is specified as a set of subsets of vertices and requires that the set of vertices visited infinitely often is in F. Rabin, Streett, and Rabin-chain objectives are all Mueller objectives.
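Continuing the same lasso illustration (our own, not from the talk): with Inf = set(cycle), the set of vertices visited infinitely often, the remaining objectives are also simple set checks:

def rabin(inf, pairs):     # some (G, R): G visited finitely often, R infinitely often
    return any(not (inf & G) and (inf & R) for (G, R) in pairs)

def streett(inf, pairs):   # every (G, R): R infinitely often implies G infinitely often
    return all((inf & G) or not (inf & R) for (G, R) in pairs)

def mueller(inf, F):       # the infinity set itself must belong to F
    return inf in F        # F: a collection (e.g. a list) of vertex sets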
34
Objectives
ω-regular: built from union, concatenation, *, and ω operators; includes safety, reachability, liveness, etc. Rabin and Streett conditions are canonical ways to express them. The ω-regular objectives are a subclass of the Borel objectives.
35
Three key components
Classes of game graphs. Objectives. Strategies and the notion of values.
36
Strategies
Given a finite sequence of vertices (the history of the play) ending in a player-1 vertex, a strategy for player 1 gives a probability distribution over the set of successors: σ : V* · V1 → D(V).
37
Strategies (figure: an example strategy that splits probability 1/2, 1/2 between two successors.)
38
Strategies (figure: the same example with probabilities 3/4 and 1/4.)
39
Strategies with Memory
A strategy may use memory to remember the history. Let M be a set called memory. A strategy with memory is represented by two functions: a memory-update function σ_m : V × M → M and a next-move function σ_s : V1 × M → D(V). A finite-memory strategy is one with a finite memory set M.
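Purely as an illustration (the names are ours), the two functions above can be packaged as a small state machine:

class FiniteMemoryStrategy:
    """Illustrative wrapper around the two functions above."""
    def __init__(self, m0, sigma_m, sigma_s):
        self.m = m0                  # current memory state, an element of M
        self.sigma_m = sigma_m       # sigma_m : V x M -> M (memory update)
        self.sigma_s = sigma_s       # sigma_s : V1 x M -> D(V), e.g. {vertex: prob}
    def observe(self, v):            # update memory on every visited vertex
        self.m = self.sigma_m(v, self.m)
    def move(self, v):               # distribution over successors at a V1 vertex
        return self.sigma_s(v, self.m)

# With |M| = 1 this degenerates to a memoryless strategy; if sigma_s always
# puts probability 1 on a single successor, the strategy is pure.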
40
Subclass of Strategies
Memoryless (stationary) strategies: the strategy is independent of the history of the play and depends only on the current vertex, i.e., |M| = 1: σ : V1 → D(V). Pure strategies: choose a single successor rather than a probability distribution. Pure memoryless: both pure and memoryless (the simplest class).
41
Strategies
The set of strategies for player 1 is Σ, with typical strategy σ ∈ Σ. The set of strategies for player 2 is Π, with typical strategy π ∈ Π.
42
Values
Given objectives Φ1 and Φ2 = V^ω ∖ Φ1, the values for the players are
val1(Φ1)(v) = sup_{σ ∈ Σ} inf_{π ∈ Π} Pr_v^{σ,π}(Φ1),
val2(Φ2)(v) = sup_{π ∈ Π} inf_{σ ∈ Σ} Pr_v^{σ,π}(Φ2).
43
Determinacy
Determinacy means val1(Φ1)(v) + val2(Φ2)(v) = 1, i.e., sup inf = inf sup. It is the analogue of von Neumann's minimax theorem for matrix games.
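As a concrete aside (our own illustration, not a slide from the talk): for a finite matrix game, the minimax value and an optimal mixed strategy can be computed with a small linear program, e.g. using scipy:

import numpy as np
from scipy.optimize import linprog

def matrix_game_value(A):
    """Value of the zero-sum matrix game with payoff matrix A (row player
    maximizes): maximize v subject to x^T A >= v componentwise, sum x = 1, x >= 0."""
    A = np.asarray(A, dtype=float)
    m, n = A.shape
    c = np.zeros(m + 1)
    c[-1] = -1.0                                  # variables (x_1..x_m, v); minimize -v
    A_ub = np.hstack([-A.T, np.ones((n, 1))])     # v - x^T A[:, j] <= 0 for every column j
    b_ub = np.zeros(n)
    A_eq = np.concatenate([np.ones(m), [0.0]]).reshape(1, -1)   # sum of x_i equals 1
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=[1.0],
                  bounds=[(0, None)] * m + [(None, None)])
    return res.x[-1], res.x[:m]                   # game value, optimal mixed strategy

# Example: matching pennies, matrix_game_value([[1, -1], [-1, 1]]),
# has value 0 with optimal strategy (1/2, 1/2).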
44
Optimal Strategies
A strategy σ is optimal for objective Φ1 if val1(Φ1)(v) = inf_{π ∈ Π} Pr_v^{σ,π}(Φ1). The definition for player 2 is analogous.
45
Computational Issues
Algorithms to compute values in games. Identify the simplest class of strategies that suffices for optimality.
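For the simplest class above, 1½-player games (MDPs) with a reachability objective, one standard way to compute the values is value iteration. The sketch below is our own illustration (not an algorithm from the talk), using the uniform-random MDP model from the earlier slides:

def mdp_reach_values(V1, V0, E, R, iters=1000):
    """Approximate val(v) = max Pr[reach R] in an MDP ((V,E),(V1,V0)) with
    uniform-random V0 vertices, by iterating the Bellman operator."""
    val = {v: 1.0 if v in R else 0.0 for v in V1 | V0}
    for _ in range(iters):
        new = {}
        for v in val:
            if v in R:
                new[v] = 1.0                            # target already reached
            elif v in V1:
                new[v] = max(val[w] for w in E[v])      # player 1 maximizes
            else:
                new[v] = sum(val[w] for w in E[v]) / len(E[v])   # uniform random
        val = new
    return val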
46
Outline of Results: Turn-based Games
47
History and Results
Two-player games.
- Determinacy (sup inf = inf sup) theorem for Borel objectives [Mar75]. Strong determinacy: values are only 0 and 1.
- Finite-memory determinacy (i.e., finite-memory optimal strategies exist) for ω-regular objectives [GurHar82].
- Pure memoryless optimal strategies exist for Rabin objectives [EmeJut88]; NP-complete. coNP-complete for Streett objectives.
- Mueller objectives: the memory requirement is precisely characterized by a number m_F [DzeJurWal97]; PSPACE-complete [HunDaw05].
48
History and Results
2½-player games.
- Determinacy (sup inf = inf sup) theorem for Borel objectives [Mar98].
- Strong determinacy (values only 0 and 1) does not hold.
49
History and Results
2½-player games, reachability objectives [Con92]:
- Pure memoryless optimal strategies exist.
- Decidable in NP ∩ coNP.
Decision problem: given a 2½-player game, an objective Φ1, a vertex v, and a rational r, decide whether val1(Φ1)(v) ≥ r.
50
History and Results
2½-player games, ω-regular objectives [deAlMaj01]:
- Parity games: 2EXPTIME.
- Rabin, Streett and Mueller: 3EXPTIME.
The algorithms are obtained as a special case of an algorithm for the general case of concurrent games.
51
Overview of Results
(table: known results by game class (1½-player, 2-player, 2½-player, concurrent) and objective class (ω-regular, Borel), citing [Mar75], [GH82], [EJ88], [Mar98], [dAM01].)
52
Our Results: Turn-based Games
53
Results on 2 ½ player Games
The complexity of Rabin, Streett and parity objectives [C/deAlHen05]:
- Pure memoryless optimal strategies exist for Rabin objectives.
- NP-complete for Rabin (previously 3EXPTIME).
- coNP-complete for Streett (previously 3EXPTIME).
- NP ∩ coNP for parity (previously 2EXPTIME).
54
Rabin and Streett Objectives
(table: pure memoryless (PM) strategies and NP-completeness; 2-player Rabin [EJ88]: PM, NP-complete; 2½-player reachability [Con92]: PM; 2½-player Rabin: PM, NP-complete.)
55
Results on 2 ½ player Games
The complexity of Mueller objectives [C07]:
- Winning strategies with memory m_F suffice (the same memory upper bound as for 2-player games); there is a matching lower bound.
- PSPACE-complete (previously known to be in 3EXPTIME).
56
Results on 2 ½ player Games
Practical algorithms. Strategy improvement algorithms iterate over local optimizations of strategies and must converge to a global optimum; they converge fast in practice, though the worst-case bound is exponential. Strategy improvement for parity objectives [C/Hen06a]. Strategy improvement for Rabin and Streett objectives [C/Hen06b].
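A schematic sketch (ours, not the specific algorithms of [C/Hen06a, 06b]) of the generic strategy-improvement loop described above; evaluate and improve_locally stand for problem-specific subroutines:

def strategy_improvement(initial_strategy, evaluate, improve_locally):
    """Iterate local improvements until no switch increases the value.
    evaluate(sigma) -> value vector of sigma against a best response;
    improve_locally(sigma, values) -> improved strategy, or None if optimal."""
    sigma = initial_strategy
    values = evaluate(sigma)
    while True:
        improved = improve_locally(sigma, values)
        if improved is None:          # no profitable local switch: sigma is optimal
            return sigma, values
        sigma = improved
        values = evaluate(sigma)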
57
Outline of Results: Concurrent Games
58
Concurrent Games: Example
Actions at s0: a, b for player 1; c, d for player 2. Objective for player 1: reach s1. (figure: from s0, the move pair ad stays at s0, ac and bd lead to s1, and bc leads to s2.) A pure strategy is not enough: against a, player 2 plays d; against b, player 2 plays c.
59
Concurrent Games: Example
Fact: no strategy for player 1 wins with probability 1. Player 2's counter-strategy: as long as player 1 plays a deterministically, play d; otherwise, play c with positive probability.
60
Concurrent Games: Example
Fix a strategy for player 1: play a with probability 1 - ε and b with probability ε. For every positive ε, player 1 then wins (reaches s1) with probability at least 1 - ε.
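A small sanity check of this claim (our own calculation, not a slide from the talk), using the transitions read off the example (ad stays at s0, ac and bd reach s1, bc reaches s2) and assuming player 2 plays c with a fixed probability q in each round:

def eventual_win_prob(eps, q):
    """Player 1 plays a w.p. 1-eps and b w.p. eps; player 2 plays c w.p. q.
    Per round: reach s1 on ac or bd, reach s2 on bc, stay at s0 on ad."""
    win = (1 - eps) * q + eps * (1 - q)   # probability the round reaches s1
    lose = eps * q                        # probability the round reaches s2
    if win + lose == 0:                   # eps = 0 and q = 0: stay at s0 forever
        return 0.0
    return win / (win + lose)             # the first decisive round decides the play

# e.g. eventual_win_prob(0.01, 1.0) is about 0.99; the same per-round bound
# gives at least 1 - eps against any behaviour of player 2.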
61
Concurrent Games: Example
Actions at s0: a, b for player 1; c, d for player 2. Objective for player 1: reach s1 infinitely often. Infinite memory is required.
62
Concurrent Games
Optimal strategies need not exist: only ε-optimal strategies, for every ε > 0. Strategies require randomization: pure strategies are not enough. For ω-regular objectives, ε-optimal strategies may require infinite memory.
63
History and Results
- Detailed analysis of concurrent games [FilVri97].
- Determinacy theorem for all Borel objectives [Mar98].
- Concurrent ω-regular games: reachability objectives [deAlHenKup98]; Rabin-chain objectives [deAlHen00]; Rabin-chain objectives [deAlMaj01].
64
History and Results
Qualitative analysis:
- Reachability objectives [deAlHenKup98]: quadratic time.
- Parity objectives [deAlHen00]: NP and coNP.
Quantitative analysis:
- Reachability objectives [EteYan06]: PSPACE.
65
History and Results
Concurrent games with parity objectives: 3EXPTIME [deAlMaj01], via a quantitative μ-calculus formula; the formula in the theory of reals has length exponential in the size of the game and exponentially many quantifier alternations in the number of parities. Rabin, Streett and Mueller: 4EXPTIME.
66
Overview of Results
(table: known results for concurrent games by objective class (reachability, ω-regular, Borel), citing [dAH00], [dAM01], [EY06], [Mar98].)
67
Our Results: Concurrent Games
68
Results on Concurrent Games
The complexity of concurrent parity games [C/deAlHen06]: PSPACE, via a polynomial-size guess followed by a formula in the existential theory of the reals; the previous formula in the theory of reals required exponentially many quantifier alternations and gave 3EXPTIME. The complexity of concurrent Rabin, Streett and Mueller objectives: EXPSPACE, by reduction to parity (previously 4EXPTIME).
69
Results on Concurrent Games
Practical algorithms: a strategy improvement algorithm for reachability objectives [C/deAlHen06b]. Some improved analysis of strategies for concurrent reachability games. A new combinatorial proof of the existence of memoryless ε-optimal strategies, for all ε > 0.
70
Outline of Results: Non-zero-sum Games
71
Non-zero-sum Games
So far we discussed strictly competitive games, where the objectives are complementary. In non-zero-sum games each player has her own objective. Concept of rationality: Nash equilibrium, i.e., a strategy profile of all the players such that no player has an incentive to deviate if the other players keep playing the strategies in the profile. ε-Nash equilibrium: a player can gain at most ε by deviating.
72
History and Results
- Two-player non-zero-sum stochastic games with limit-average payoff [Vie00a, Vie00b].
- Closed sets (safety) [SecSud02].
- Discounted / halting games [Fink64].
73
Overview of Results
(table: known equilibrium results by game class (n-player concurrent, n-player turn-based, 2-player concurrent) and objective (safety, reachability, ω-regular, Borel, limit-average); Nash equilibria known for safety [SecSud02] and for limit-average [Vie00].)
74
Our Results: Non-zero-sum Games
75
Our Results: Non-zero-sum Concurrent Games
An n-player non-zero-sum reachability game has an ε-Nash equilibrium in memoryless strategies, for all ε > 0 [C/MajJur04]. A two-player non-zero-sum parity game has an ε-Nash equilibrium, for all ε > 0 [C05].
76
Our Results: Non-zero-sum Turn Based Games
Results from [C/MajJur04]:
- n-player turn-based probabilistic games with Borel payoffs have ε-Nash equilibria in deterministic (pure) strategies.
- n-player turn-based deterministic games with Borel payoffs have Nash equilibria in deterministic (pure) strategies.
77
Can it be strengthened? Can ε-Nash equilibrium be strengthened to Nash equilibrium?
78
Can it be strengthened? Can ε-Nash equilibrium be strengthened to Nash equilibrium? No; there is a counter-example.
79
Las Vegas Game
(figure: the player may Work and Play again, or Go to Vegas; the figure shows a probability-1/2 branch and the outcomes Jackpot and Sorry, you lose.)
80
For every ε > 0, the Las Vegas game has a strategy that wins with probability at least 1 - ε: given ε, pick n with 1/2^n < ε and work for n days before heading to Vegas. But there is no optimal winning strategy.
81
ω-Regular? The Las Vegas game is not ω-regular.
Turn-based probabilistic non-zero-sum games with ω-regular objectives have pure Nash equilibria.
82
Results in Non-zero-sum Games
(table: equilibrium results by game class (n-player concurrent, n-player turn-based, 2-player concurrent) and objective (safety, reachability, ω-regular, Borel, limit-average); the previously known Nash results [SecSud02], [Vie00] together with the ε-Nash and Nash results of this work.)
83
Conclusion and Open Problems
84
Conclusion Stochastic Games
A model for controller synthesis in the presence of uncertainty. Improved strategy complexity, computational complexity, and practical algorithms for various classes of games with different classes of objectives. Existence of ε-Nash and Nash equilibria in various classes of games.
85
Open Problems
Complexity of parity games: NP ∩ coNP for 2-player and 2½-player games; a polynomial-time algorithm is a major open question.
Complexity of 2½-player reachability games: NP ∩ coNP; a polynomial-time algorithm is a major open problem.
Other open problems: algorithms for concurrent parity games; better algorithmic analysis of 2-player, 2½-player and concurrent games with subclasses of parity objectives.
86
Open Problems
Several open problems remain in non-zero-sum games, related to the existence of equilibria and the associated complexity questions.
87
Results in Non-zero-sum Games
(table: equilibrium results by game class (n-player concurrent, n-player turn-based, 2-player concurrent) and objective (safety, reachability, ω-regular, Borel, limit-average); the previously known Nash results [SecSud02], [Vie00] together with the ε-Nash and Nash results of this work.)
88
Open Problems
(table: the same grid of game classes and objectives, highlighting the combinations for which the existence of equilibria remains open.)
89
Acknowledgments
Joint work with Luca de Alfaro, Thomas A. Henzinger, Marcin Jurdzinski, and Rupak Majumdar. Thanks to you all!
90
Questions?