Stochastic -Regular Games


Stochastic -Regular Games Krishnendu Chatterjee UC Berkeley Dissertation Talk 11/6/2018

Games on Graphs: games on graphs model the interaction between components. Control of finite-state systems reduces to solving two-player games.

Graph Models of Processes vertices = states edges = transitions paths = behaviors

Game Models of Processes: vertices = states, edges = transitions, paths = behaviors. The players model different processes, or a process interacting with an open environment.

Graphs vs. Games (figure: a graph and a game over the same states, with transitions labeled a and b).

Game Models Enable -synthesis [Church, Rabin, Ramadge/Wonham, Pnueli/Rosner] -receptiveness [Dill, Abadi/Lamport] -semantics of interaction [Abramsky] -reasoning about adversarial behavior -non-emptiness of non-deterministic automata -behavioral type systems and interface automata [de Alfaro/Henzinger] -modular reasoning [Kupferman/Vardi, Alur/de Alfaro/Henzinger/Mang] -early error detection [de Alfaro/Henzinger/Mang] -model-based testing [Gurevich/Veanes et al.] -etc.

Game Models Always about open systems: -players = processes / components / agents -demonic or adversarial non-determinism

Stochastic Games

Stochastic Games. Where: the arena is a game graph with probabilistic transitions. What for: the objectives are ω-regular. How: strategies.

Game Graphs: two broad classes. Turn-based games: players make moves in turns (examples: tic-tac-toe, chess). Concurrent games: players make moves simultaneously and independently (example: matrix games).

Goals: computational issues; strategy classification, i.e., characterize the simplest class of strategies that achieves the optimal payoff.

Three key components Classes of game graphs. Objectives. Strategies and notion of values.

Markov Decision Process: a Markov Decision Process (MDP) is defined as G = ((V,E),(V1,V0)), where (V,E) is a finite graph and (V1,V0) is a partition of V. At V1 vertices player 1 chooses the successor; V0 are random vertices, where the successor is chosen uniformly at random. (A small sketch of this structure follows.)
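As a concrete illustration (not from the talk; the names MDP and step are mine), here is a minimal Python sketch of exactly this structure: a finite graph with a partition into player-1 vertices and random vertices whose successors are drawn uniformly at random.

```python
import random

class MDP:
    """G = ((V, E), (V1, V0)): player-1 vertices V1, random vertices V0."""
    def __init__(self, edges, v1, v0):
        self.succ = edges          # dict: vertex -> list of successors
        self.v1 = set(v1)          # player-1 vertices
        self.v0 = set(v0)          # random vertices (uniform over successors)

    def step(self, v, choose):
        """One move from v; `choose` resolves player-1 choices."""
        if v in self.v1:
            return choose(v, self.succ[v])
        return random.choice(self.succ[v])   # uniform at random on V0

# A tiny illustrative MDP: player 1 owns u and t, the vertex r is random.
g = MDP(edges={"u": ["r", "t"], "r": ["u", "t"], "t": ["t"]},
        v1=["u", "t"], v0=["r"])
print(g.step("r", choose=lambda v, succs: succs[0]))
```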

A Markov Decision Process (figure: an example MDP; the edge labels are transition probabilities).

Markov Decision Process: games against nature or randomness; control of systems in the presence of uncertainty. We call them 1½-player games.

Two-player Games: a two-player game is defined as G = ((V,E),(V1,V2)), where (V,E) is a finite graph and (V1,V2) is a partition of V. At V1 vertices player 1 makes moves; at V2 vertices player 2 makes moves.

Two-player Games: games against an adversary; control in an open environment, i.e., controller synthesis. We call them 2-player games.

Turn-based Probabilistic Games: a turn-based probabilistic game is defined as G = ((V,E),(V1,V2,V0)), where (V,E) is a finite graph and (V1,V2,V0) is a partition of V. At V1 vertices player 1 makes moves, at V2 vertices player 2 makes moves, and at V0 vertices the successor is chosen at random.

Turn-based Probabilistic Games: games against an adversary and nature; controller synthesis in the presence of uncertainty, i.e., controller synthesis for stochastic reactive systems. We call them 2½-player games; they are also known as perfect-information stochastic games.

Game Played: a token is placed on an initial vertex. If the current vertex is a player-1 vertex, player 1 chooses the successor; if it is a player-2 vertex, player 2 chooses the successor; if it is a random vertex, the play proceeds to a successor uniformly at random. This generates an infinite sequence of vertices.
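A minimal sketch (my own illustration, with hypothetical names) of how a play is generated on a 2½-player game graph: the owner of the current vertex, or the uniform random choice, picks each successor, producing a finite prefix of the infinite play.

```python
import random

def play(succ, v1, v2, v0, start, strat1, strat2, steps=20):
    """Generate a finite prefix of a play in a 2 1/2-player game.

    succ: dict vertex -> list of successors; (v1, v2, v0) partition the vertices;
    strat1 / strat2 map (history, current vertex) to a chosen successor.
    """
    history, v = [start], start
    for _ in range(steps):
        if v in v1:
            v = strat1(history, v)
        elif v in v2:
            v = strat2(history, v)
        else:                       # random vertex: uniform over successors
            v = random.choice(succ[v])
        history.append(v)
    return history

# Illustrative game: vertex a owned by player 1, b by player 2, r random, t a sink.
succ = {"a": ["b", "t"], "b": ["r", "a"], "r": ["a", "t"], "t": ["t"]}
prefix = play(succ, v1={"a", "t"}, v2={"b"}, v0={"r"}, start="a",
              strat1=lambda h, v: succ[v][0], strat2=lambda h, v: succ[v][-1])
print(prefix)
```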

Concurrent Games: players make moves simultaneously. Finite set of vertices V; finite set of actions A. Action assignments Γ1, Γ2 : V → 2^A \ {∅}. Probabilistic transition function δ(s, a1, a2)(t) = Pr[ t | s, a1, a2 ].
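A small sketch (illustrative names and numbers, not the game from the next slide) of the probabilistic transition function δ(s, a1, a2)(t): for each state and pair of simultaneously chosen actions, a distribution over successor states.

```python
import random

# delta[(s, a1, a2)] is a distribution over successors: {t: Pr[t | s, a1, a2]}.
delta = {
    ("s", "a", "c"): {"s": 1.0},
    ("s", "a", "d"): {"t": 0.5, "u": 0.5},
    ("s", "b", "c"): {"u": 1.0},
    ("s", "b", "d"): {"s": 1.0},
}

def concurrent_step(s, a1, a2):
    """Sample the successor of s when the players play a1 and a2 simultaneously."""
    dist = delta[(s, a1, a2)]
    targets, probs = zip(*dist.items())
    return random.choices(targets, weights=probs)[0]

print(concurrent_step("s", "a", "d"))
```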

A Concurrent Game (figure): actions at s0 are a, b for player 1 and c, d for player 2; the pairs of simultaneous moves (ad, ac, bd, bc) determine whether the play stays at s0 or moves to s1 or s2.

Concurrent Games Games with simultaneous interaction. Model synchronous interaction.

Stochastic Games (figure: the hierarchy of game classes: 1½-player, 2-player, 2½-player, and concurrent games).

Three key components Classes of game graphs. Objectives. Strategies and notion of values.

Plays: a play is an infinite sequence of vertices, i.e., an infinite trajectory. Vω denotes the set of all infinite plays.

Objectives: an objective is a subset of plays, Ω1 ⊆ Vω. A play is winning for player 1 if it belongs to Ω1. Zero-sum game: Ω2 = Vω \ Ω1.

Reachability and Safety: let R ⊆ V be a set of target vertices; the reachability objective requires visiting some vertex of R. Let F ⊆ V be a set of safe vertices; the safety objective requires never visiting any vertex outside F.

Reachability and Safety

Buechi and coBuechi: let B ⊆ V be a set of Buechi vertices; the Buechi objective requires that B is visited infinitely often. Let C ⊆ V be a set of coBuechi vertices; the coBuechi objective requires that vertices outside C are visited only finitely often.

Buechi and coBuechi (figure: the Buechi set B and the coBuechi set C).

Rabin and Streett: let {(G1,R1), (G2,R2), …, (Gd,Rd)} be a set of pairs of vertex sets. Rabin: requires that there is a pair (Gj,Rj) such that Gj is visited only finitely often and Rj infinitely often. Streett: requires that for every pair (Gj,Rj), if Rj is visited infinitely often then Gj is visited infinitely often. Rabin-chain: both a Rabin and a Streett objective, also known as parity objectives; they form a complementation-closed subset of the Rabin objectives.

Mueller Objective: a Mueller objective F ⊆ 2^V is specified as a set of subsets of vertices and requires that the set of vertices visited infinitely often belongs to F. Rabin, Streett, and Rabin-chain objectives are all Mueller objectives. (A compact restatement of all these objectives follows.)
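The objectives above can be stated compactly in terms of Inf(ρ), the set of vertices occurring infinitely often in a play ρ; this is just a restatement of the definitions on the preceding slides.

```latex
\[
\begin{aligned}
\mathrm{Reach}(R)  &= \{\rho \in V^{\omega} \mid \rho \text{ visits some vertex of } R\},\\
\mathrm{Safe}(F)   &= \{\rho \in V^{\omega} \mid \rho \text{ visits only vertices of } F\},\\
\mathrm{Buechi}(B) &= \{\rho \mid \mathrm{Inf}(\rho) \cap B \neq \emptyset\},\qquad
\mathrm{coBuechi}(C) = \{\rho \mid \mathrm{Inf}(\rho) \subseteq C\},\\
\mathrm{Rabin}     &= \{\rho \mid \exists j.\ \mathrm{Inf}(\rho) \cap G_j = \emptyset \ \wedge\ \mathrm{Inf}(\rho) \cap R_j \neq \emptyset\},\\
\mathrm{Streett}   &= \{\rho \mid \forall j.\ \mathrm{Inf}(\rho) \cap R_j \neq \emptyset \ \Rightarrow\ \mathrm{Inf}(\rho) \cap G_j \neq \emptyset\},\\
\mathrm{Mueller}(\mathcal{F}) &= \{\rho \mid \mathrm{Inf}(\rho) \in \mathcal{F}\}.
\end{aligned}
\]
```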

Objectives -regular: [,¢ , *,. Safety, Reachability, Liveness, etc. Rabin and Streett canonical ways to express. Borel -regular 11/6/2018

Three key components Classes of game graphs. Objectives. Strategies and notion of values.

Strategies: given a finite sequence of vertices (representing the history of the play), a strategy σ for player 1 gives a probability distribution over the set of successors of the current vertex: σ : V* · V1 → D(V).

Strategies (figure: an example strategy choosing between two successors with probabilities 1/2 and 1/2).

Strategies (figure: another example strategy, choosing with probabilities 3/4 and 1/4).

Strategies with Memory: a strategy can use memory to remember the history. Let M be a set called memory. A strategy with memory is represented by two functions, a memory-update function σm : V × M → M and a next-move function σs : V1 × M → D(V). A finite-memory strategy is one whose memory set M is finite. (A small sketch follows.)
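A minimal Python sketch (illustrative, hypothetical names) of a finite-memory strategy as exactly this pair of functions: the memory is updated on every observed vertex, and the next move is a distribution over successors.

```python
class FiniteMemoryStrategy:
    """A strategy given by a finite memory set M, an update function
    sigma_m : V x M -> M, and a next-move function sigma_s : V1 x M -> D(V)."""

    def __init__(self, initial_memory, update, next_move):
        self.m = initial_memory      # current memory value in M
        self.update = update         # sigma_m(v, m) -> m'
        self.next_move = next_move   # sigma_s(v, m) -> {successor: probability}

    def move(self, v):
        """Observe vertex v, update the memory, return a distribution over successors."""
        self.m = self.update(v, self.m)
        return self.next_move(v, self.m)

# Example with |M| = 2: remember one bit, namely whether vertex "b" has been seen.
strat = FiniteMemoryStrategy(
    initial_memory=False,
    update=lambda v, m: m or (v == "b"),
    next_move=lambda v, m: {"b": 1.0} if not m else {"t": 1.0},
)
print(strat.move("a"), strat.move("b"), strat.move("a"))
```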

Subclasses of Strategies. Memoryless (stationary) strategies: the strategy is independent of the history of the play and depends only on the current vertex, i.e., |M| = 1; σ : V1 → D(V). Pure strategies: choose a single successor rather than a probability distribution. Pure memoryless: both pure and memoryless (the simplest class).

Strategies: the set of strategies σ of player 1 is denoted Σ; the set of strategies π of player 2 is denoted Π.

Values Given objectives 1 and 2 = V n 1 the value for the players are val1(1)(v) = sup 2  inf 2  Prv,(1). val2(2)(v) = sup 2  inf 2  Prv,(2). 11/6/2018

Determinacy: determinacy means val1(Ω1)(v) + val2(Ω2)(v) = 1, i.e., sup inf = inf sup. The analogue for matrix games is von Neumann's minmax theorem.

Optimal Strategies: a strategy σ is optimal for objective Ω1 if val1(Ω1)(v) = inf_{π∈Π} Pr_v^{σ,π}(Ω1). The definition for player 2 is analogous.
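Putting the last three slides in one place (a restatement in the notation above, together with the standard notion of ε-optimality used later in the talk):

```latex
\[
\mathrm{val}_1(\Omega_1)(v) = \sup_{\sigma \in \Sigma}\ \inf_{\pi \in \Pi}\ \Pr_v^{\sigma,\pi}(\Omega_1),
\qquad
\mathrm{val}_2(\Omega_2)(v) = \sup_{\pi \in \Pi}\ \inf_{\sigma \in \Sigma}\ \Pr_v^{\sigma,\pi}(\Omega_2),
\]
\[
\text{determinacy:}\quad \mathrm{val}_1(\Omega_1)(v) + \mathrm{val}_2(\Omega_2)(v) = 1 \ \text{ for all vertices } v,
\]
\[
\sigma \text{ is optimal if } \inf_{\pi \in \Pi} \Pr_v^{\sigma,\pi}(\Omega_1) = \mathrm{val}_1(\Omega_1)(v),
\quad
\varepsilon\text{-optimal if } \inf_{\pi \in \Pi} \Pr_v^{\sigma,\pi}(\Omega_1) \ \ge\ \mathrm{val}_1(\Omega_1)(v) - \varepsilon .
\]
```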

Computational Issues Algorithms to compute values in games. Identify the simplest class of strategies that suffices for optimality.

Outline of Results: Turn-based Games

History and Results: two-player games. Determinacy (sup inf = inf sup) theorem for Borel objectives [Mar75]; strong determinacy: the values are only 0 and 1. Finite-memory determinacy (i.e., finite-memory optimal strategies exist) for ω-regular objectives [GurHar82]. Pure memoryless optimal strategies exist for Rabin objectives [EmeJut88]; NP-complete for Rabin, coNP-complete for Streett objectives. Mueller objectives: the memory requirement is precisely characterized by a number m_F [DzeJurWal97]; PSPACE-complete [HunDaw05].

History and Results: 2½-player games. Determinacy (sup inf = inf sup) theorem for Borel objectives [Mar98]. Strong determinacy (values only 0 and 1) does not hold.

History and Results: 2½-player games, reachability objectives [Con92]. Pure memoryless optimal strategies exist; the problem is decided in NP ∩ coNP. Decision problem: given a 2½-player game, an objective Ω1, a vertex v, and a rational r, decide whether val1(Ω1)(v) ≥ r.

History and Results 2 ½ -player games -regular objectives [deAl Maj01] Parity games: 2EXPTIME. Rabin, Streett and Mueller: 3EXPTIME. Algorithms obtained as a special case of an algorithm for the general case of concurrent games. 11/6/2018

Overview of Results (table: game classes 1½-player, 2-player, 2½-player, and concurrent against ω-regular and Borel objectives; entries cite Mar75, GH82, EJ88, Mar98, and dAM01).

Our Results: Turn-based Games

Results on 2½-player Games: the complexity of Rabin, Streett, and parity objectives [C/deAlHen05]. Pure memoryless optimal strategies exist for Rabin objectives. NP-complete for Rabin (previously 3EXPTIME); coNP-complete for Streett (previously 3EXPTIME); NP ∩ coNP for parity (previously 2EXPTIME).

Rabin and Streett Objectives (figure: 2-player games [EJ88]: pure memoryless (PM) strategies, NP-complete; 2½-player games: PM, NP-complete; reachability [Con92]: PM).

Results on 2½-player Games: the complexity of Mueller objectives [C07]. Winning strategies with memory m_F suffice (the same memory upper bound as for 2-player games), and there is a matching lower bound. PSPACE-complete (previously 3EXPTIME).

Results on 2½-player Games: practical algorithms. Strategy improvement algorithms iterate local improvements of a strategy and must converge to a globally optimal strategy; they converge fast in practice, though the worst-case bound is exponential. Strategy improvement for parity objectives [C/Hen06a]; strategy improvement for Rabin and Streett objectives [C/Hen06b]. (A simplified sketch of the improvement loop appears below.)
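To illustrate the strategy-improvement idea, here is a sketch for the simplest setting where it can be stated briefly: a 1½-player (MDP) reachability game. This is my own simplified illustration, not the algorithm of [C/Hen06a,b]; the current memoryless strategy is evaluated approximately by iterating its Bellman equations, and then improved locally at each controlled vertex.

```python
def improve_strategy(succ, player, target, rounds=50, eval_iters=200):
    """Strategy improvement for maximal reachability in an MDP (1 1/2-player game).

    succ: vertex -> list of successors; vertices in `player` are controlled,
    all other vertices are random (uniform over successors); `target` is the goal set.
    """
    verts = list(succ)
    strat = {v: succ[v][0] for v in player}            # arbitrary initial strategy

    def evaluate(strat):
        # Approximate evaluation of the current memoryless strategy by iterating
        # its Bellman equations (exact evaluation would solve a linear system).
        val = {v: (1.0 if v in target else 0.0) for v in verts}
        for _ in range(eval_iters):
            for v in verts:
                if v in target:
                    continue
                if v in player:
                    val[v] = val[strat[v]]
                else:
                    val[v] = sum(val[t] for t in succ[v]) / len(succ[v])
        return val

    for _ in range(rounds):
        val = evaluate(strat)
        improved = False
        for v in player:                                # local improvement step
            best = max(succ[v], key=lambda t: val[t])
            if val[best] > val[strat[v]] + 1e-12:
                strat[v], improved = best, True
        if not improved:                                # no local improvement possible
            break
    return strat, evaluate(strat)

# Illustrative MDP: from "a", choose the random vertex "r" (goal with probability 1/2)
# or the losing sink "dead".
succ = {"a": ["r", "dead"], "r": ["goal", "dead"], "goal": ["goal"], "dead": ["dead"]}
strat, val = improve_strategy(succ, player={"a"}, target={"goal"})
print(strat, round(val["a"], 3))   # expected: {'a': 'r'} 0.5
```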

Outline of Results: Concurrent Games

Concurrent Games: Example (figure: at s0 the move pair ad stays at s0, the pairs ac and bd lead to s1, and bc leads to s2). Actions at s0: a, b for player 1; c, d for player 2. Objective for player 1: reach s1. A pure strategy is not enough: pure a is countered by d, and pure b is countered by c.

Concurrent Games: Example (figure). Fact: no strategy for player 1 can win with probability 1. The adversary: as long as player 1 plays a deterministically, play d; otherwise play c with positive probability.

Concurrent Games: Example 1- ad Fix strategy for player 1 a !1- b !  c s0 d ac,bd bc   1- s1 s2 For every positive  player 1 can win with probability 1 -. 11/6/2018

Concurrent Games: Example (figure). Actions at s0: a, b for player 1; c, d for player 2. Objective for player 1: reach s1 infinitely often (a Buechi objective). Infinite memory is required.

Concurrent Games: optimal strategies need not exist; instead there are ε-optimal strategies for all ε>0. Randomization is required: pure strategies are not enough. For ω-regular objectives, ε-optimal strategies may require infinite memory.

History and Results: detailed analysis of concurrent games [FilVri97]; determinacy theorem for all Borel objectives [Mar98]. Concurrent ω-regular games: reachability objectives [deAlHenKup98]; Rabin-chain objectives [deAlHen00]; Rabin-chain objectives [deAlMaj01].

History and Results. Qualitative analysis: reachability objectives [deAlHenKup98], quadratic time; parity objectives [deAlHen00], NP and coNP. Quantitative analysis: reachability objectives [EteYan06], PSPACE.

History and Results: concurrent games with parity objectives, 3EXPTIME [deAlMaj01], via a quantitative μ-calculus formula: a formula in the theory of the reals with length exponential in the size of the game and exponentially many quantifier alternations in the number of parities. Rabin, Streett, and Mueller: 4EXPTIME.

Overview of Results (table: game classes 1½-player, 2-player, 2½-player, and concurrent against reachability, ω-regular, and Borel objectives; concurrent entries cite dAH00, dAM01, EY06, and Mar98).

Our Results: Concurrent Games

Results on Concurrent Games: the complexity of concurrent parity games [C/deAlHen06]: PSPACE, via a polynomial-size guess followed by a formula in the existential theory of the reals. The previous formula in the theory of the reals required exponentially many quantifier alternations and gave 3EXPTIME. The complexity of concurrent Rabin, Streett, and Mueller objectives: EXPSPACE, by reduction to parity (previously 4EXPTIME).

Results on Concurrent Games: practical algorithms. Strategy improvement algorithm for reachability objectives [C/deAlHen06b]. Some improved analysis of strategies for concurrent reachability games. A new combinatorial proof of the existence of memoryless ε-optimal strategies, for all ε>0.

Outline of Results: Non-zero-sum Games

Non-zero-sum Games: so far we discussed strictly competitive games, where the objectives are complementary. In non-zero-sum games each player has her own objective. Concept of rationality: Nash equilibrium, i.e., a strategy profile of all the players such that no player has an incentive to deviate if the other players keep playing the strategies in the profile. ε-Nash equilibrium: a player can gain at most ε by deviating. (Formal statements follow.)
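Formally (a restatement of the two notions just described): for a strategy profile (σ1, …, σn) and payoffs u_i, here the probability that player i's objective is satisfied,

```latex
\[
(\sigma_1,\dots,\sigma_n) \text{ is a Nash equilibrium if }\;
u_i(\sigma_i,\sigma_{-i}) \;\ge\; u_i(\sigma_i',\sigma_{-i})
\quad \text{for every player } i \text{ and every deviation } \sigma_i',
\]
\[
\text{and an } \varepsilon\text{-Nash equilibrium if }\;
u_i(\sigma_i',\sigma_{-i}) \;\le\; u_i(\sigma_i,\sigma_{-i}) + \varepsilon
\quad \text{for every } i \text{ and every } \sigma_i'.
\]
```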

History and Results: two-player non-zero-sum stochastic games with limit-average payoff [Vie00a, Vie00b]; closed sets (safety) [SecSud02]; discounted/halting games [Fink64].

Overview of Results Borel -reg Nash:SecSud02 S n pl. conc. R n pl. turn-based 2 pl. conc.  Nash:Vie00 Lim. avg 11/6/2018

Our Results: Non-zero-sum Games

Our Results: Non-zero-sum Concurrent Games. An n-player non-zero-sum reachability game has an ε-Nash equilibrium in memoryless strategies, for all ε>0 [C/MajJur04]. A two-player non-zero-sum parity game has an ε-Nash equilibrium, for all ε>0 [C05].

Our Results: Non-zero-sum Turn-based Games. Results from [C/MajJur04]: n-player turn-based probabilistic games with Borel payoffs have ε-Nash equilibria in deterministic (pure) strategies; n-player turn-based deterministic games with Borel payoffs have Nash equilibria in deterministic (pure) strategies.

Can it be strengthened? Can -Nash equilibrium be strengthened to Nash equilibrium? 11/6/2018

Can it be strengthened? Can -Nash equilibrium be strengthened to Nash equilibrium? No. Counter-example. 11/6/2018

Las Vegas Game (figure: states Work, Go to Vegas, Jackpot, Sorry you lose, Play again; the branch labelled 1/2 leads to the Jackpot).

For every >0, Las Vegas game has a (1-)-optimal winning strategy For  < 1/2n, work for n days before heading to Vegas. But no optimal winning strategy. 11/6/2018

ω-Regular? The Las Vegas game objective is not ω-regular. Turn-based probabilistic non-zero-sum games with ω-regular objectives have pure Nash equilibria.

Results in Non-zero-sum Games (table: n-player concurrent, n-player turn-based, and 2-player concurrent games against safety (S), reachability (R), ω-regular, Borel, and limit-average objectives; new ε-Nash and Nash entries, alongside Nash [SecSud02] and ε-Nash [Vie00]).

Conclusion and Open Problems 11/6/2018

Conclusion. Stochastic games are a model for controller synthesis in the presence of uncertainty. Improved strategy complexity, computational complexity, and practical algorithms for various classes of games with different classes of objectives. Existence of Nash and ε-Nash equilibria in various classes of games.

Open Problems. Complexity of parity games: NP ∩ coNP for 2-player and 2½-player games; a polynomial-time algorithm is a major open question. Complexity of 2½-player reachability games: NP ∩ coNP; a polynomial-time algorithm is a major open problem. Other open problems: algorithms for concurrent parity games; better algorithmic analysis of 2-player, 2½-player, and concurrent games with subclasses of parity objectives.

Open Problems: several open problems in non-zero-sum games, related to the existence of equilibria and the associated complexity questions.

Results in Non-zero-sum Games (table: repeated from the results summary above).

Open Problems Borel -reg S n pl. conc. R n pl. turn-based 2 pl. conc. Lim. avg 11/6/2018

Acknowledgments: joint work with Luca de Alfaro, Thomas A. Henzinger, Marcin Jurdzinski, and Rupak Majumdar. Thanks to you all!

Questions ??? http://www.eecs.berkeley.edu/~c_krish