Games, Times, and Probabilities: Value Iteration in Verification and Control. Krishnendu Chatterjee, Tom Henzinger.


Graph Models of Systems vertices = states edges = transitions paths = behaviors

Extended Graph Models. Start from a graph and add: CONTROL: game graph. OBJECTIVE: ω-automaton. PROBABILITIES: Markov decision process, stochastic game, ω-regular game. CLOCKS: timed automaton, stochastic hybrid system.

Graphs vs. Games (figure: a transition graph and a game graph with edge labels a, b)

Games model Open Systems Two players: environment / controller / input vs. system / plant / output Multiple players: processes / components / agents Stochastic players: nature / randomized algorithms

Example
P1: init x := 0; loop choice | x := x+1 mod 2 | x := 0 end choice end loop
P2: init y := 0; loop choice | y := x | y := x+1 mod 2 end choice end loop
ψ1: □(x = y)    ψ2: □(y = 0)

Graph Questions (CTL): ∀□(x = y) ✗    ∃□(x = y) ✓

Zero-Sum Game Questions (ATL [Alur/H/Kupferman]): ⟨⟨P1⟩⟩□(x = y) ✗    ⟨⟨P2⟩⟩□(y = 0) ✓

Nonzero-Sum Game Questions (secure equilibria [Chatterjee/H/Jurdzinski]): ⟨⟨P1⟩⟩□(x = y) ∥ ⟨⟨P2⟩⟩□(y = 0) ✓

Strategies
Strategies x, y: Q* → Q. From a state q, a pair (x, y) of a player-1 strategy x ∈ Σ1 and a player-2 strategy y ∈ Σ2 gives a unique infinite path Outcome_{x,y}(q) ∈ Qω.
⟨⟨P1⟩⟩ψ1 = (∃x ∈ Σ1)(∀y ∈ Σ2) ψ1(x, y), short for: q ⊨ ⟨⟨P1⟩⟩ψ1 iff (∃x ∈ Σ1)(∀y ∈ Σ2)(Outcome_{x,y}(q) ⊨ ψ1)
⟨⟨P1⟩⟩ψ1 ∥ ⟨⟨P2⟩⟩ψ2 = (∃x ∈ Σ1)(∃y ∈ Σ2) [ (ψ1 ∧ ψ2)(x, y) ∧ (∀y′ ∈ Σ2)(ψ2 → ψ1)(x, y′) ∧ (∀x′ ∈ Σ1)(ψ1 → ψ2)(x′, y) ]

Objectives ψ1 and ψ2 ⊆ Qω. Qualitative: reachability; Buechi; parity (ω-regular). Quantitative: max; lim sup; lim avg

Normal Forms of ω-Regular Sets
Borel-1: Reachability ◊a; Safety □a = ¬◊¬a
Borel-2: Buechi □◊a; coBuechi ◊□a = ¬□◊¬a
Borel-2.5: Streett ∧(□◊a → □◊b) = ∧(◊□¬a ∨ □◊b); Rabin ∨(◊□a ∧ □◊b); Parity: complement-closed subset of Streett/Rabin

Buechi Game (figure: states q0–q4 with target sets G and B). Secure equilibrium (x, y) at q0: x: if q1 → q0, then q2 else q4; y: if q3 → q1, then q0 else q4. Strategies require memory.

Zero-Sum Games: Determinacy (figure: state space partitioned into W1 = ⟨⟨P1⟩⟩ψ1 and W2 = ⟨⟨P2⟩⟩ψ2, where ψ1 = ¬ψ2)

Nonzero-sum Games (figure: state space partitioned into W10 = ⟨⟨P1⟩⟩(ψ1 ∧ ¬ψ2), W01 = ⟨⟨P2⟩⟩(ψ2 ∧ ¬ψ1), W11 = ⟨⟨P1⟩⟩ψ1 ∥ ⟨⟨P2⟩⟩ψ2, and W00)

Objectives. Qualitative: reachability (Borel-1); Buechi (Borel-2); parity (ω-regular, Borel-3). Quantitative: max; lim sup; lim avg

Quantitative Games (figure: weighted game graph): ⟨⟨P1⟩⟩ lim sup = 3; ⟨⟨P1⟩⟩ lim avg = …

Solving Games by Value Iteration. Generalization of the μ-calculus: computing fixpoints of transfer functions (pre; post). Generalization of dynamic programming: iterative optimization. A region R: Q → V assigns a value to each state; the update R(q) := pre(R(q′)) propagates values from successors q′ back to q.

Graph
Q states; Γ transition labels; δ: Q × Γ → Q transition function
Regions: Θ = [Q → {0,1}] with V = B
∃pre: Θ → Θ with q ∈ ∃pre(R) iff (∃γ ∈ Γ) δ(q, γ) ∈ R
∀pre: Θ → Θ with q ∈ ∀pre(R) iff (∀γ ∈ Γ) δ(q, γ) ∈ R

Graph (figure: states a, b, c): ∃◊c = (μX)(c ∨ ∃pre(X))    ∀◊c = (μX)(c ∨ ∀pre(X))

Graph Reachability. Given R ⊆ Q, find the states from which some path leads to R. Iterate R, then R ∪ ∃pre(R), then R ∪ ∃pre(R) ∪ ∃pre²(R), … until the fixpoint ∃◊R = (μX)(R ∨ ∃pre(X)) is reached.

Graph Reachability. Given R ⊆ Q, find the states from which all paths lead to R: ∀◊R = (μX)(R ∨ ∀pre(X))
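The least-fixpoint iteration for ∃◊R can be sketched directly, reusing the same hypothetical dict encoding of δ:

```python
# Kleene iteration for "some path reaches R": start from R and keep adding
# the exists-predecessors of the current region until nothing changes.

def reach_exists(states, labels, delta, R):
    X = set(R)
    while True:
        new_X = X | {q for q in states
                     if any(delta[(q, g)] in X for g in labels if (q, g) in delta)}
        if new_X == X:          # least fixpoint reached
            return X
        X = new_X
```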

Value Iteration Algorithms consist of
A. LOCAL PART: ∃pre and ∀pre computation
B. GLOBAL PART: evaluation of a fixpoint expression
We need to generalize both parts to solve games.

Turn-based Game
Q1, Q2 states (Q = Q1 ∪ Q2); Γ transition labels; δ: Q × Γ → Q transition function
Regions: Θ = [Q → {0,1}] with V = B
1pre: Θ → Θ with q ∈ 1pre(R) iff [q ∈ Q1 ∧ (∃γ ∈ Γ) δ(q, γ) ∈ R] or [q ∈ Q2 ∧ (∀γ ∈ Γ) δ(q, γ) ∈ R]
2pre: Θ → Θ with q ∈ 2pre(R) iff [q ∈ Q1 ∧ (∀γ ∈ Γ) δ(q, γ) ∈ R] or [q ∈ Q2 ∧ (∃γ ∈ Γ) δ(q, γ) ∈ R]
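The controllable predecessor 1pre can be sketched the same way: at player-1 states one good label suffices, at player-2 states every label must stay in R. The encoding (sets plus a dict for δ) is illustrative:

```python
# 1pre for a turn-based game; q1_states are the states where player 1
# chooses the label, all other states belong to player 2.

def one_pre(states, q1_states, labels, delta, R):
    out = set()
    for q in states:
        succs = [delta[(q, g)] for g in labels if (q, g) in delta]
        if q in q1_states:
            if any(s in R for s in succs):      # player 1 picks one good label
                out.add(q)
        elif succs and all(s in R for s in succs):  # player 2 cannot escape
            out.add(q)
    return out
```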

Turn-based Game (figure: states a, b, c): ⟨⟨P1⟩⟩◊c = (μX)(c ∨ 1pre(X))    ⟨⟨P2⟩⟩◊c = (μX)(c ∨ 2pre(X))

Reachability Game. Given R ⊆ Q, find the states from which player 1 has a strategy to force the game to R. Iterate R, then R ∪ 1pre(R), then R ∪ 1pre(R) ∪ 1pre²(R), … until the fixpoint ⟨⟨P1⟩⟩◊R = (μX)(R ∨ 1pre(X)) is reached.
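This iteration is the classical attractor construction; a self-contained sketch under the same illustrative encoding (sets plus a dict for δ):

```python
# <<P1>> Eventually-R as a least fixpoint: starting from the target R,
# repeatedly add states where player 1 can force entry into the current set.

def attractor(states, q1_states, labels, delta, R):
    def one_pre(X):
        out = set()
        for q in states:
            succs = [delta[(q, g)] for g in labels if (q, g) in delta]
            if q in q1_states:
                ok = any(s in X for s in succs)
            else:
                ok = bool(succs) and all(s in X for s in succs)
            if ok:
                out.add(q)
        return out
    X = set(R)
    while True:
        new_X = X | one_pre(X)
        if new_X == X:
            return X
        X = new_X
```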

Safety Game. Given R ⊆ Q, find the states from which player 1 has a strategy to keep the game in R. Iterate R, then R ∩ 1pre(R), then R ∩ 1pre(R) ∩ 1pre²(R), … until the fixpoint ⟨⟨P1⟩⟩□R = (νX)(R ∧ 1pre(X)) is reached.
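The safety fixpoint runs in the opposite direction: start from the full safe set and shrink it. A self-contained sketch under the same illustrative encoding:

```python
# <<P1>> Always-R as a greatest fixpoint: keep only states where player 1
# can stay inside the current set for one more step, until stable.

def safety_win(states, q1_states, labels, delta, R):
    def one_pre(X):
        out = set()
        for q in states:
            succs = [delta[(q, g)] for g in labels if (q, g) in delta]
            if q in q1_states:
                ok = any(s in X for s in succs)
            else:
                ok = bool(succs) and all(s in X for s in succs)
            if ok:
                out.add(q)
        return out
    X = set(R)
    while True:
        new_X = X & one_pre(X)
        if new_X == X:
            return X
        X = new_X
```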

Quantitative Game
Q1, Q2 states (Q = Q1 ∪ Q2); Γ transition labels; δ: Q × Γ → N × Q transition function
Regions: Θ = [Q → N] with V = N
1pre: Θ → Θ with 1pre(R)(q) = (max γ ∈ Γ) max(δ1(q, γ), R(δ2(q, γ))) if q ∈ Q1, and (min γ ∈ Γ) max(δ1(q, γ), R(δ2(q, γ))) if q ∈ Q2
2pre: Θ → Θ with 2pre(R)(q) = (min γ ∈ Γ) max(δ1(q, γ), R(δ2(q, γ))) if q ∈ Q1, and (max γ ∈ Γ) max(δ1(q, γ), R(δ2(q, γ))) if q ∈ Q2
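The quantitative 1pre replaces the boolean any/all with max/min over numbers; a sketch where δ(q, γ) = (weight, successor) is stored in a dict (an illustrative encoding, not from the talk):

```python
# Quantitative 1pre: each label contributes max(edge weight, value at the
# successor); player 1 maximizes over labels, player 2 minimizes.

def one_pre_quant(states, q1_states, labels, delta, R):
    out = {}
    for q in states:
        vals = []
        for g in labels:
            if (q, g) in delta:
                w, s = delta[(q, g)]
                vals.append(max(w, R[s]))
        out[q] = max(vals) if q in q1_states else min(vals)
    return out
```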

Maximizing Game (figure: weighted game with states a, b, c): ⟨⟨P1⟩⟩max = (μX) max(0, 1pre(X))

Buechi Graph. Given B ⊆ Q, find the states from which some path visits B infinitely often. Iterate R1 = ∃◊∃pre(B), R2 = ∃◊∃pre(B ∩ R1), … ; ∃□◊B = (νY) ∃◊(B ∧ ∃pre(Y)) = (νY)(μX)((B ∧ ∃pre(Y)) ∨ ∃pre(X))

Buechi Game. Given B ⊆ Q, find the states from which player 1 has a strategy to force the game to B infinitely often. Iterate R1 = ⟨⟨P1⟩⟩◊1pre(B), R2 = ⟨⟨P1⟩⟩◊1pre(B ∩ R1), … ; ⟨⟨P1⟩⟩□◊B = (νY)(μX)((B ∧ 1pre(Y)) ∨ 1pre(X))
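The nested fixpoint can be sketched as two loops: the inner one is an attractor to B-states that can re-enter the current Y, the outer one shrinks Y until it stabilizes. Encoding (sets plus a dict for δ) is illustrative:

```python
# <<P1>> Always-Eventually-B via (nu Y)(mu X)((B and 1pre(Y)) or 1pre(X)).

def buchi_win(states, q1_states, labels, delta, B):
    def one_pre(X):
        out = set()
        for q in states:
            succs = [delta[(q, g)] for g in labels if (q, g) in delta]
            if q in q1_states:
                ok = any(s in X for s in succs)
            else:
                ok = bool(succs) and all(s in X for s in succs)
            if ok:
                out.add(q)
        return out
    Y = set(states)
    while True:
        X = set()
        while True:                          # inner least fixpoint (mu X)
            new_X = (B & one_pre(Y)) | one_pre(X)
            if new_X == X:
                break
            X = new_X
        if X == Y:                           # outer greatest fixpoint (nu Y)
            return Y
        Y = X
```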

From Graphs to Games. Can we use the same value iteration scheme? Yes, iff the fixpoint expression computes correctly on all single-player (player-1 and player-2) structures. Reachability: ∃◊p = (μX)(p ∨ ∃pre(X)) and ∀◊p = (μX)(p ∨ ∀pre(X)). Hence: ⟨⟨P1⟩⟩◊p = (μX)(p ∨ 1pre(X)) and ⟨⟨P2⟩⟩◊p = (μX)(p ∨ 2pre(X))

Complexity of Turn-based Games
1. Reachability, safety: linear time (P-complete)
2. Buechi: quadratic time (optimal?); on graphs linear
3. Parity: NP ∩ coNP (in P?); on graphs polynomial

Beyond Graphs as Finite Carrier Sets. Graph-based (finite-carrier) systems: Q = B^m; Θ = boolean formulas [e.g. BDDs]; ∃pre = (∃x ∈ B). Timed and hybrid systems: Q = B^m × R^n; Θ = formulas of (R, ≤, +) [e.g. polyhedral sets]; ∃pre = (∃x ∈ R).

Concurrent Game
Q states; Γ1, Γ2 moves of the two players; δ: Q × Γ1 × Γ2 → Q transition function
Regions: Θ = [Q → {0,1}] with V = B
1pre: Θ → Θ with q ∈ 1pre(R) iff (∃γ1 ∈ Γ1)(∀γ2 ∈ Γ2) δ(q, γ1, γ2) ∈ R
2pre: Θ → Θ with q ∈ 2pre(R) iff (∃γ2 ∈ Γ2)(∀γ1 ∈ Γ1) δ(q, γ1, γ2) ∈ R

Concurrent Game (figure: states a, b, c with joint moves (1,1), (1,2), (2,1), (2,2)): ⟨⟨P2⟩⟩◊c = (μX)(c ∨ 2pre(X)); with randomized moves, Pr(1) = 0.5 and Pr(2) = 0.5.

Extended Graph Models. Start from a graph and add: CONTROL: game graph. OBJECTIVE: ω-automaton. PROBABILITIES: Markov decision process, stochastic game, ω-regular game. CLOCKS: timed automaton, stochastic hybrid system.

Graph: 1 Player. Nondeterministic closed system. (figure: states q1, q2, q3 with labels a, b)

MDP: 1.5 Players. Probabilistic closed system. (figure: states q1–q5 with labels a, b, c and probabilistic branches)

Turn-based Game: 2 Players. Asynchronous open system. (figure: states q1–q5 with labels a, b, c)

Turn-based Stochastic Game: 2.5 Players. Probabilistic asynchronous open system. (figure: states q1–q7 with labels a, b, c)

Concurrent Game: Synchronous open system. (figure: states q1–q5 with labels a, b and joint moves (1,1), (1,2), (2,1), (2,2))

Concurrent Stochastic Game: Probabilistic synchronous open system. Matrix game at each vertex; at q1, each move pair yields a distribution over successors:
q2: 0.3, q3: 0.2, q4: 0.5, q5: 0.0
q2: 0.1, q3: 0.1, q4: 0.5, q5: 0.3
q2: 0.0, q3: 0.2, q4: 0.1, q5: 0.7
q2: 1.0, q3: 0.0, q4: 0.0, q5: 0.0

Graph + Strategies for both players → Behavior. Graph: nondeterministic generator of behaviors (possibly stochastic). Strategy: deterministic selector of behaviors (possibly randomized).

Model = graph; pure behavior = path. Two pure strategies at q1: “left” and “right”. Two pure behaviors: ab; aa. (figure: states q1, q2, q3 with labels a, b)

Model = MDP; pure behavior = probability distribution on paths = p-path. Two pure strategies at q1: “left” and “right”. Two pure behaviors: {ab: 1}; {aac: 0.4, aaa: 0.6}. (figure: states q1–q5 with labels a, b, c)

Model = turn-based game; pure behavior = path; general (randomized) behavior = p-path. Two pure player-1 strategies at q1: “left” and “right”; two pure player-2 strategies at q3: “left” and “right”. Three pure behaviors: ab; aac; aaa. Infinitely many randomized behaviors, e.g. {aac: 0.5, aaa: 0.5}. (figure: states q1–q5 with labels a, b, c)

The objective of each player is to find a strategy that optimizes the value of the resulting behavior. How do we define “value”?
A. Assign a value to each path
B. Assign a value to each behavior (expected value of A)
C. Assign a value to each state (sup-inf, over strategies, of B)

A. Value of Paths. Qualitative value function ψ: Qω → {0,1}, e.g. ω-regular subsets of Qω

B. Value of Behaviors. For a path t: ψ(t) as in A. For a p-path T: ψ(T) = Exp{ψ(t) : t ∈ T} (expected value). Example: T = {aaa: 0.2, aab: 0.7, bbb: 0.1}; (◊b)(T) = 0.8
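The expected value of a qualitative objective over a finite p-path is a one-line sum; a sketch where paths are written as strings, as in the slide's example:

```python
# Value of a p-path (finite distribution on paths) for the objective
# "eventually b": the expectation of the indicator that "b" occurs.

def eventually_value(dist, letter="b"):
    return sum(p for path, p in dist.items() if letter in path)
```

On the slide's T = {aaa: 0.2, aab: 0.7, bbb: 0.1} this gives 0.7 + 0.1 = 0.8.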

C. Value of States. ⟨⟨1⟩⟩ψ(q) = sup_x inf_y ψ(Outcome_{x,y}(q))    ⟨⟨2⟩⟩ψ(q) = sup_y inf_x ψ(Outcome_{x,y}(q))

Concurrent Stochastic Game
Q states; Γ1, Γ2 moves of the two players; δ: Q × Γ1 × Γ2 → Dist(Q) probabilistic transition function
Regions: Θ = [Q → [0,1]] with V = [0,1]
1pre: Θ → Θ with 1pre(R)(q) = (sup γ1 ∈ Γ1)(inf γ2 ∈ Γ2) R(δ(q, γ1, γ2))
2pre: Θ → Θ with 2pre(R)(q) = (sup γ2 ∈ Γ2)(inf γ1 ∈ Γ1) R(δ(q, γ1, γ2))
(where R is extended to distributions by expectation)
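In general this 1pre requires solving a matrix game at each state; the sketch below covers only the turn-based (2.5-player) special case, where it reduces to a max/min of expectations. The encoding, with δ(q, a) a dict {successor: probability}, is an illustrative assumption:

```python
# Turn-based stochastic 1pre: each action yields the expectation of R over
# its successor distribution; player 1 maximizes, player 2 minimizes.

def one_pre_stochastic(states, q1_states, actions, delta, R):
    out = {}
    for q in states:
        vals = []
        for a in actions:
            if (q, a) in delta:
                vals.append(sum(p * R[s] for s, p in delta[(q, a)].items()))
        out[q] = max(vals) if q in q1_states else min(vals)
    return out
```

Iterating X := max(target, 1pre(X)) pointwise with this operator gives the reachability values of the game.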

Concurrent Stochastic Game (figure: states a, b, c; at each state the players choose move distributions, e.g. Pl.1 plays a: 0.6, b: 0.4 against Pl.2 playing a: 0.0, c: 1.0): ⟨⟨P1⟩⟩◊c = (μX) max(c, 1pre(X)); the iterates increase toward the value, reaching 1 only in the limit.

Solving Games by Value Iteration. Reachability / max: single fixpoint (μ). Buechi / lim sup: nested fixpoint (νμ). Parity: alternating fixpoints of increasing depth, … Many open questions: How do different evaluation orders compare? How fast do these algorithms converge? When are they optimal?

Summary: Classification of Games
1. Number of players: 1, 1.5, 2, 2.5
2. Alternation: turn-based or concurrent
3. Strategies: pure or randomized
4. Value of a path: qualitative (boolean) or quantitative (real)
5. Objective: Borel 1, 2, 3
6. Zero-sum vs. nonzero-sum

Summary: Zero-Sum Games. The two players have complementary path values: ψ2(t) = 1 − ψ1(t): reachability vs. safety / max vs. min; Buechi vs. coBuechi / lim sup vs. lim inf; Rabin vs. Streett. Main Theorem [Martin75, Martin98]: the concurrent stochastic games are determined for all Borel objectives, i.e., ⟨⟨1⟩⟩ψ1(q) + ⟨⟨2⟩⟩ψ2(q) = 1 (sup inf = inf sup).

Summary: Zero-Sum Games. Concurrent parity games: 1.5 players: CY98, dAl97 (polynomial); 2 players: GH82, EJ88, dAM01; 2.5 players: dAH00, CdAH06 (NP ∩ coNP)

Concurrent Games are Difficult
- optimal strategies may not exist
- limit values may not be rational
- ε-close strategies, for fixed ε, may require infinite memory
- no determinacy for pure strategies
(figure: concurrent game at q1 with joint moves (1,1), (1,2), (2,1), (2,2) and labels a, b): ⟨⟨P1⟩⟩(◊a)(q1) = 0 and ⟨⟨P2⟩⟩(◊b)(q1) = 0

Turn-based Games are More Pleasant
- optimal strategies always exist [McIver/Morgan]
- in the non-stochastic case, pure finite-memory optimal strategies exist for ω-regular objectives [Gurevich/Harrington]
- for parity objectives, pure memoryless optimal strategies exist [Emerson/Jutla: non-stochastic Rabin; Condon: stochastic reachability; Chatterjee/deAlfaro/H: stochastic Rabin], hence NP ∩ coNP
Whether they are solvable in P is open for non-stochastic parity games and for stochastic reachability games.

Summary Verification and control are very special (boolean) cases of graph-based optimization problems. They can be generalized to solve questions that involve multiple players, quantitative resources, probabilistic transitions, and continuous state spaces. The theory and practice of this is still wide open …