Games, Times, and Probabilities: Value Iteration in Verification and Control
Krishnendu Chatterjee, Tom Henzinger
Graph Models of Systems
vertices = states; edges = transitions; paths = behaviors
Extended Graph Models
graph + CONTROL → game graph
graph + OBJECTIVE → ω-automaton (ω-regular game)
graph + PROBABILITIES → Markov decision process / stochastic game
graph + CLOCKS → timed automaton
all combined → stochastic hybrid system
Graphs vs. Games
[diagram: the same transition structure viewed as a graph and as a game, edges labeled a, b]
Games model Open Systems
Two players: environment / controller / input vs. system / plant / output
Multiple players: processes / components / agents
Stochastic players: nature / randomized algorithms
Example
P1: init x := 0; loop choice | x := x+1 mod 2 | x := 0 end choice end loop
Objective φ₁: □(x = y)
P2: init y := 0; loop choice | y := x | y := x+1 mod 2 end choice end loop
Objective φ₂: □(y = 0)
Graph Questions (CTL)
∀□(x = y)?
∃□(x = y)?
Zero-Sum Game Questions (ATL [Alur/H/Kupferman])
⟨⟨P1⟩⟩□(x = y)?
⟨⟨P2⟩⟩□(y = 0)?
Nonzero-Sum Game Questions (Secure equilibria [Chatterjee/H/Jurdzinski])
⟨⟨P1⟩⟩□(x = y)? ⟨⟨P2⟩⟩□(y = 0)?
Strategies
Strategies x, y: Q* → Q. From a state q, a pair (x, y) of a player-1 strategy x ∈ Λ₁ and a player-2 strategy y ∈ Λ₂ yields a unique infinite path Outcome_{x,y}(q) ∈ Q^ω.

⟨⟨P1⟩⟩φ₁ = (∃x ∈ Λ₁)(∀y ∈ Λ₂) φ₁(x,y)
Short for: q ⊨ ⟨⟨P1⟩⟩φ₁ iff (∃x ∈ Λ₁)(∀y ∈ Λ₂)(Outcome_{x,y}(q) ⊨ φ₁)

⟨⟨P1⟩⟩φ₁ ∧ ⟨⟨P2⟩⟩φ₂ = (∃x ∈ Λ₁)(∃y ∈ Λ₂) [ (φ₁ ∧ φ₂)(x,y) ∧ (∀y′ ∈ Λ₂)(φ₂ → φ₁)(x,y′) ∧ (∀x′ ∈ Λ₁)(φ₁ → φ₂)(x′,y) ]
Objectives φ₁ and φ₂
Qualitative: reachability; Büchi; parity (ω-regular)
Quantitative: max; lim sup; lim avg
Normal Forms of ω-Regular Sets
Reachability: ◇a; Safety: □a = ¬◇¬a (Borel-1)
Büchi: □◇a; coBüchi: ◇□a = ¬□◇¬a (Borel-2)
Streett: ∧ᵢ(□◇aᵢ → □◇bᵢ) = ∧ᵢ(◇□¬aᵢ ∨ □◇bᵢ); Rabin: ∨ᵢ(◇□aᵢ ∧ □◇bᵢ) (Borel-2.5)
Parity: complement-closed subset of Streett/Rabin
Büchi Game
[diagram: game graph with states q0–q4 and Büchi target set B]
[diagram: the same game graph, states q0–q4]
Secure equilibrium (x, y) at q0:
x: if q1 → q0, then q2 else q4
y: if q3 → q1, then q0 else q4
Strategies require memory.
Zero-Sum Games: Determinacy
φ₁ = ¬φ₂: the state space partitions into W₁ = ⟨⟨P1⟩⟩φ₁ and W₂ = ⟨⟨P2⟩⟩φ₂.
Nonzero-Sum Games
The state space partitions into four regions:
W₁₁ = ⟨⟨P1⟩⟩φ₁ ∧ ⟨⟨P2⟩⟩φ₂; W₁₀ = ⟨⟨P1⟩⟩(φ₁ ∧ ¬φ₂); W₀₁ = ⟨⟨P2⟩⟩(φ₂ ∧ ¬φ₁); W₀₀ = the remaining states.
Objectives
Qualitative: reachability; Büchi; parity (ω-regular)
Quantitative: max; lim sup; lim avg
(Borel-1; Borel-2; Borel-3, respectively)
Quantitative Games
[diagram: weighted game graph]
⟨⟨P1⟩⟩ lim sup = 3
⟨⟨P1⟩⟩ lim avg = …
Solving Games by Value Iteration
Generalization of the μ-calculus: computing fixpoints of transfer functions (pre; post).
Generalization of dynamic programming: iterative optimization.
[diagram: a region R: Q → V assigns a value to each state; the value at q is updated from the successor values q′, R(q) := pre(R)(q)]
Graph
Q: states; Γ: transition labels; δ: Q × Γ → Q: transition function
Regions: ℛ = [Q → {0,1}] with V = 𝔹
∃pre: q ∈ ∃pre(R) iff (∃γ ∈ Γ) δ(q, γ) ∈ R
∀pre: q ∈ ∀pre(R) iff (∀γ ∈ Γ) δ(q, γ) ∈ R
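A minimal executable sketch of these two operators, assuming an explicit successor-map encoding (the names epre, apre, delta are illustrative, not from the talk):

```python
# Regions are sets of states; delta maps each state to its successor set.

def epre(delta, region):
    """q is in Epre(R) iff SOME transition from q leads into R."""
    return {q for q, succs in delta.items() if succs & region}

def apre(delta, region):
    """q is in Apre(R) iff ALL transitions from q lead into R."""
    return {q for q, succs in delta.items() if succs and succs <= region}

# Example graph: a -> {b, c}, b -> {c}, c -> {c}
delta = {"a": {"b", "c"}, "b": {"c"}, "c": {"c"}}
print(epre(delta, {"c"}))  # {'a', 'b', 'c'} (set order may vary)
print(apre(delta, {"c"}))  # {'b', 'c'}
```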
Graph
[diagram: three states a, b, c]
∃◇c = (μX)(c ∨ ∃pre(X))
∀◇c = (μX)(c ∨ ∀pre(X))
Graph Reachability
Given R ⊆ Q, find the states from which some path leads to R.
Iterate: R; R ∪ pre(R); R ∪ pre(R) ∪ pre²(R); …
∃◇R = (μX)(R ∨ ∃pre(X))
Given R ⊆ Q, find the states from which all paths lead to R.
∀◇R = (μX)(R ∨ ∀pre(X))
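The least-fixpoint iteration above is directly executable; a small Python sketch under the same assumed encoding, parameterized by the predecessor operator:

```python
# (mu X)(target or pre(X)): grow X until it stops changing.

def lfp_reach(delta, target, pre):
    x = set(target)
    while True:
        nxt = x | pre(delta, x)
        if nxt == x:
            return x
        x = nxt

def epre(delta, region):
    return {q for q, s in delta.items() if s & region}

def apre(delta, region):
    return {q for q, s in delta.items() if s and s <= region}

delta = {"a": {"a", "b"}, "b": {"c"}, "c": {"c"}}
print(lfp_reach(delta, {"c"}, epre))  # some path reaches c: all three states
print(lfp_reach(delta, {"c"}, apre))  # all paths reach c: {'b', 'c'} (a may loop forever)
```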
Value Iteration Algorithms consist of
A. LOCAL PART: ∃pre and ∀pre computation
B. GLOBAL PART: evaluation of a fixpoint expression
We need to generalize both parts to solve games.
Turn-based Game
Q₁, Q₂: states (Q = Q₁ ∪ Q₂); Γ: transition labels; δ: Q × Γ → Q: transition function
Regions: ℛ = [Q → {0,1}] with V = 𝔹
1pre: q ∈ 1pre(R) iff (q ∈ Q₁ ∧ (∃γ ∈ Γ) δ(q, γ) ∈ R) or (q ∈ Q₂ ∧ (∀γ ∈ Γ) δ(q, γ) ∈ R)
2pre: q ∈ 2pre(R) iff (q ∈ Q₁ ∧ (∀γ ∈ Γ) δ(q, γ) ∈ R) or (q ∈ Q₂ ∧ (∃γ ∈ Γ) δ(q, γ) ∈ R)
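A hedged Python sketch of the two controllable-predecessor operators, again assuming an explicit encoding (q1 is the set of player-1 states):

```python
# At player-1 states, 1pre asks for SOME successor in R;
# at player-2 states, for ALL successors in R (and dually for 2pre).

def pre1(delta, q1, region):
    return {q for q, s in delta.items()
            if (q in q1 and s & region)
            or (q not in q1 and s and s <= region)}

def pre2(delta, q1, region):
    return {q for q, s in delta.items()
            if (q in q1 and s and s <= region)
            or (q not in q1 and s & region)}

delta = {"a": {"b", "c"}, "b": {"c"}, "c": {"c"}}
print(pre1(delta, {"a"}, {"c"}))  # {'a', 'b', 'c'}: a may choose c; b, c must go to c
```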
Turn-based Game
[diagram: three states a, b, c]
⟨⟨P1⟩⟩◇c = (μX)(c ∨ 1pre(X))
⟨⟨P2⟩⟩◇c = (μX)(c ∨ 2pre(X))
Reachability Game
Given R ⊆ Q, find the states from which player 1 has a strategy to force the game to R.
Iterate: R; R ∪ 1pre(R); R ∪ 1pre(R) ∪ 1pre²(R); …
⟨⟨P1⟩⟩◇R = (μX)(R ∨ 1pre(X))
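The attractor iteration is the reachability-game instance of value iteration; a small sketch under the same assumed encoding:

```python
# Player 1's attractor to `target`: the iterates R, R | 1pre(R), ...
# compute <<P1>> diamond R = (mu X)(R or 1pre(X)).

def pre1(delta, q1, region):
    return {q for q, s in delta.items()
            if (q in q1 and s & region)
            or (q not in q1 and s and s <= region)}

def attractor1(delta, q1, target):
    x = set(target)
    while True:
        nxt = x | pre1(delta, q1, x)
        if nxt == x:
            return x
        x = nxt

delta = {"a": {"b", "c"}, "b": {"a"}, "c": {"c"}}
print(attractor1(delta, {"a"}, {"c"}))  # all states: player 1 steers a into c
```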
Safety Game
Given R ⊆ Q, find the states from which player 1 has a strategy to keep the game in R.
Iterate: R; R ∩ 1pre(R); R ∩ 1pre(R) ∩ 1pre²(R); …
⟨⟨P1⟩⟩□R = (νX)(R ∧ 1pre(X))
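The safety game is the dual greatest-fixpoint iteration, shrinking the region instead of growing it; a sketch under the same assumed encoding:

```python
# <<P1>> box R = (nu X)(R and 1pre(X)): intersect until stable; what is
# left is exactly the set where player 1 can stay inside `safe` forever.

def pre1(delta, q1, region):
    return {q for q, s in delta.items()
            if (q in q1 and s & region)
            or (q not in q1 and s and s <= region)}

def safe1(delta, q1, safe):
    x = set(safe)
    while True:
        nxt = x & pre1(delta, q1, x)
        if nxt == x:
            return x
        x = nxt

delta = {"a": {"a", "b"}, "b": {"c"}, "c": {"c"}}
print(safe1(delta, {"a"}, {"a", "b"}))  # {'a'}: player 1 keeps looping on a
```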
Quantitative Game
Q₁, Q₂: states (Q = Q₁ ∪ Q₂); Γ: transition labels; δ: Q × Γ → ℕ × Q: transition function (δ₁: weight, δ₂: successor)
Regions: ℛ = [Q → ℕ] with V = ℕ
1pre(R)(q) = (max over γ ∈ Γ) max(δ₁(q,γ), R(δ₂(q,γ))) if q ∈ Q₁; (min over γ ∈ Γ) max(δ₁(q,γ), R(δ₂(q,γ))) if q ∈ Q₂
2pre(R)(q) = (min over γ ∈ Γ) max(δ₁(q,γ), R(δ₂(q,γ))) if q ∈ Q₁; (max over γ ∈ Γ) max(δ₁(q,γ), R(δ₂(q,γ))) if q ∈ Q₂
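A sketch of the quantitative operator and the resulting max-objective iteration, assuming delta[q] is a list of (weight, successor) pairs (an illustrative encoding, not the talk's):

```python
# Player 1 maximizes, player 2 minimizes; R maps states to numbers.

def pre1_quant(delta, q1, R):
    out = {}
    for q, edges in delta.items():
        vals = [max(w, R[q2]) for (w, q2) in edges]
        out[q] = max(vals) if q in q1 else min(vals)
    return out

def value_max(delta, q1):
    """<<P1>> max = (mu X) max(0, 1pre(X)), iterated from the 0 region."""
    R = {q: 0 for q in delta}
    while True:
        nxt = {q: max(0, v) for q, v in pre1_quant(delta, q1, R).items()}
        if nxt == R:
            return R
        R = nxt

delta = {"a": [(3, "b"), (1, "a")], "b": [(0, "a")]}
print(value_max(delta, {"a"}))  # {'a': 3, 'b': 3}
```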
Maximizing Game
[diagram: turn-based game with weighted edges, states a, b, c]
⟨⟨P1⟩⟩ max = (μX) max(0, 1pre(X)) (the iteration starts from the constant-0 region)
Büchi Graph
Given B ⊆ Q, find the states from which some path visits B infinitely often.
R₁ = pre⁺(B), computed iteratively: pre(B); pre(B) ∪ pre²(B); …
R₂ = pre⁺(B ∩ R₁); …
∃□◇B = (νY) ∃◇(B ∧ ∃pre(Y)) = (νY)(μX)((B ∧ ∃pre(Y)) ∨ ∃pre(X))
Büchi Game
Given B ⊆ Q, find the states from which player 1 has a strategy to force the game to visit B infinitely often.
R₁ = 1pre⁺(B); R₂ = 1pre⁺(B ∩ R₁); …
⟨⟨P1⟩⟩□◇B = (νY)(μX)((B ∧ 1pre(Y)) ∨ 1pre(X))
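The nested fixpoint is executable as two nested loops; a sketch under the same assumed encoding:

```python
# <<P1>> box-diamond B = (nu Y)(mu X)((B and 1pre(Y)) or 1pre(X)).

def pre1(delta, q1, region):
    return {q for q, s in delta.items()
            if (q in q1 and s & region)
            or (q not in q1 and s and s <= region)}

def buchi1(delta, q1, B):
    Y = set(delta)                        # nu: start from all states
    while True:
        target = B & pre1(delta, q1, Y)   # B-states from which Y is re-enterable
        X = set()                         # mu: least fixpoint from empty
        while True:
            nxt = target | pre1(delta, q1, X)
            if nxt == X:
                break
            X = nxt
        if X == Y:
            return Y
        Y = X

# Player 1 (at state b) can return to the Buechi state a forever; c cannot.
delta = {"a": {"b"}, "b": {"a", "c"}, "c": {"c"}}
print(buchi1(delta, {"b"}, {"a"}))  # {'a', 'b'}
```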
From Graphs to Games
Can we use the same value iteration scheme? Yes, iff the fixpoint expression computes correctly on all single-player (player-1 and player-2) structures.
Reachability: ∃◇p = (μX)(p ∨ ∃pre(X)); ∀◇p = (μX)(p ∨ ∀pre(X))
Hence: ⟨⟨P1⟩⟩◇p = (μX)(p ∨ 1pre(X)); ⟨⟨P2⟩⟩◇p = (μX)(p ∨ 2pre(X))
Complexity of Turn-based Games
1. Reachability, safety: linear time (P-complete)
2. Büchi: quadratic time (optimal?); linear on graphs
3. Parity: NP ∩ coNP (in P?); polynomial on graphs
Beyond Graphs as Finite Carrier Sets
Graph-based (finite-carrier) systems: Q = 𝔹^m; regions ℛ = boolean formulas [e.g. BDDs]; ∃pre = (∃x ∈ 𝔹)
Timed and hybrid systems: Q = 𝔹^m × ℝ^n; regions ℛ = formulas of (ℝ, ≤, +) [e.g. polyhedral sets]; ∃pre = (∃x ∈ ℝ)
Concurrent Game
Q: states; Γ₁, Γ₂: moves of the two players; δ: Q × Γ₁ × Γ₂ → Q: transition function
Regions: ℛ = [Q → {0,1}] with V = 𝔹
1pre: q ∈ 1pre(R) iff (∃γ₁ ∈ Γ₁)(∀γ₂ ∈ Γ₂) δ(q, γ₁, γ₂) ∈ R
2pre: q ∈ 2pre(R) iff (∃γ₂ ∈ Γ₂)(∀γ₁ ∈ Γ₁) δ(q, γ₁, γ₂) ∈ R
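A sketch of concurrent 1pre, assuming delta[q][(g1, g2)] gives the (deterministic) successor when the players simultaneously play g1 and g2:

```python
# q is in 1pre(R) iff player 1 has a move that, against every
# player-2 move, leads into R.

def pre1_conc(delta, moves1, moves2, region):
    return {q for q in delta
            if any(all(delta[q][(g1, g2)] in region for g2 in moves2)
                   for g1 in moves1)}

# A matching-pennies-like state: player 2 can always mismatch, so
# player 1 cannot force {'b'} from 'a' with a pure move.
delta = {
    "a": {(1, 1): "b", (1, 2): "a", (2, 1): "a", (2, 2): "b"},
    "b": {(1, 1): "b", (1, 2): "b", (2, 1): "b", (2, 2): "b"},
}
print(pre1_conc(delta, {1, 2}, {1, 2}, {"b"}))  # {'b'}
```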
Concurrent Game
[diagram: states a, b, c; edges labeled with move pairs (1,1), (1,2), (2,1), (2,2)]
⟨⟨P2⟩⟩◇c = (μX)(c ∨ 2pre(X))
Randomized strategy: Pr(1) = 0.5, Pr(2) = 0.5.
Graph: 1 Player
Nondeterministic closed system.
[diagram: states q1–q3 with transition labels a, b]
MDP: 1.5 Players
Probabilistic closed system.
[diagram: states q1–q5 with transition labels a, b, c and probabilities]
Turn-based Game: 2 Players
Asynchronous open system.
[diagram: states q1–q5 with transition labels a, b, c]
Turn-based Stochastic Game: 2.5 Players
Probabilistic asynchronous open system.
[diagram: states q1–q7 with transition labels a, b, c and probabilities]
Concurrent Game
Synchronous open system.
[diagram: states q1–q5; edges from q1 labeled with move pairs (1,1), (1,2), (2,1), (2,2) and letters a, b]
Concurrent Stochastic Game
Probabilistic synchronous open system. Matrix game at each vertex.
[diagram: at q1, each move pair yields a distribution over q2–q5]
move pair | q2  | q3  | q4  | q5
(1,1)     | 0.3 | 0.2 | 0.5 | 0.0
(1,2)     | 0.1 | 0.1 | 0.5 | 0.3
(2,1)     | 0.0 | 0.2 | 0.1 | 0.7
(2,2)     | 1.0 | 0.0 | 0.0 | 0.0
Graph + Strategies for both players → Behavior
Graph: nondeterministic generator of behaviors (possibly stochastic).
Strategy: deterministic selector of behaviors (possibly randomized).
Model = graph; pure behavior = path.
[diagram: states q1–q3]
Two pure strategies at q1: “left” and “right”. Two pure behaviors: ab; aa.
Model = MDP; pure behavior = probability distribution on paths = p-path.
[diagram: states q1–q5]
Two pure strategies at q1: “left” and “right”. Two pure behaviors: {ab: 1}; {aac: 0.4, aaa: 0.6}.
Model = turn-based game; pure behavior = path; general (randomized) behavior = p-path.
[diagram: states q1–q5]
Two pure player-1 strategies at q1: “left” and “right”; two pure player-2 strategies at q3: “left” and “right”.
Three pure behaviors: ab; aac; aaa. Infinitely many randomized behaviors, e.g. {aac: 0.5, aaa: 0.5}.
The objective of each player is to find a strategy that optimizes the value of the resulting behavior. How do we define “value”?
A. Assign a value to each path
B. Assign a value to each behavior (expected value of A)
C. Assign a value to each state (strategy sup-inf of B)
A. Value of Paths
Qualitative value function Φ: Q^ω → {0,1}, e.g. ω-regular subsets of Q^ω
B. Value of Behaviors
Path t: value Φ(t).
p-path T: Φ(T) = Exp_{t∼T}{Φ(t)} (expected value).
Example: T = {aaa: 0.2, aab: 0.7, bbb: 0.1}; (◇b)(T) = 0.7 + 0.1 = 0.8.
C. Value of States
⟨⟨1⟩⟩Φ(q) = sup_x inf_y Φ(Outcome_{x,y}(q))
⟨⟨2⟩⟩Φ(q) = sup_y inf_x Φ(Outcome_{x,y}(q))
Concurrent Stochastic Game
Q: states; Γ₁, Γ₂: moves of the two players; δ: Q × Γ₁ × Γ₂ → Dist(Q): probabilistic transition function
Regions: ℛ = [Q → [0,1]] with V = [0,1]
1pre: 1pre(R)(q) = (sup over mixed moves ξ₁ ∈ Dist(Γ₁)) (inf over mixed moves ξ₂ ∈ Dist(Γ₂)) of the expected value of R(δ(q, γ₁, γ₂))
2pre: 2pre(R)(q) = (sup ξ₂ ∈ Dist(Γ₂)) (inf ξ₁ ∈ Dist(Γ₁)) of the expected value of R(δ(q, γ₁, γ₂))
(i.e., the value of the matrix game at q with payoffs given by R)
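Computing this 1pre amounts to solving a matrix game at each state; a sketch using linear programming via scipy.optimize.linprog (the encoding and names are assumptions, not the talk's implementation):

```python
import numpy as np
from scipy.optimize import linprog

def matrix_game_value(A):
    """Value of the zero-sum matrix game A for the row maximizer."""
    m, n = A.shape
    c = np.zeros(m + 1); c[-1] = -1.0          # maximize v
    A_ub = np.hstack([-A.T, np.ones((n, 1))])  # v - x.A[:,j] <= 0 for all j
    b_ub = np.zeros(n)
    A_eq = np.ones((1, m + 1)); A_eq[0, -1] = 0.0
    b_eq = np.array([1.0])                     # x is a probability distribution
    bounds = [(0, None)] * m + [(None, None)]  # x >= 0, v free
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=bounds)
    return res.x[-1]

def pre1_stoch(delta, moves1, moves2, R):
    """delta[q][(g1, g2)] is a dict successor -> probability."""
    out = {}
    for q in delta:
        A = np.array([[sum(p * R[q2] for q2, p in delta[q][(g1, g2)].items())
                       for g2 in moves2] for g1 in moves1])
        out[q] = matrix_game_value(A)
    return out

# Matching pennies: value 0.5 for the maximizer.
print(round(matrix_game_value(np.array([[1.0, 0.0], [0.0, 1.0]])), 3))  # 0.5
```

Iterating R ↦ max(c, 1pre(R)) then approximates ⟨⟨P1⟩⟩◇c from below, converging only in the limit, as the next slide illustrates.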
Concurrent Stochastic Game
[diagram: states a, b, c; at each state the players choose mixed moves, e.g. Pl.1: a: 0.6, b: 0.4; Pl.2: a: 0.1, b: 0.9; …]
⟨⟨P1⟩⟩◇c = (μX) max(c, 1pre(X))
The iterates increase toward the value but reach 1 only in the limit.
Solving Games by Value Iteration
Reachability / max: a single least-fixpoint iteration (μ)
Büchi / lim sup: nested fixpoint iterations (νμ)
Parity: …
Many open questions: How do different evaluation orders compare? How fast do these algorithms converge? When are they optimal?
Summary: Classification of Games
1. Number of players: 1, 1.5, 2, 2.5
2. Alternation: turn-based or concurrent
3. Strategies: pure or randomized
4. Value of a path: qualitative (boolean) or quantitative (real)
5. Objective: Borel 1, 2, 3
6. Zero-sum vs. nonzero-sum
Summary: Zero-Sum Games
The two players have complementary path values: Φ₂(t) = 1 − Φ₁(t)
- reachability vs. safety / max vs. min
- Büchi vs. coBüchi / lim sup vs. lim inf
- Rabin vs. Streett
Main Theorem [Martin75, Martin98]: Concurrent stochastic games are determined for all Borel objectives, i.e., ⟨⟨1⟩⟩Φ₁(q) + ⟨⟨2⟩⟩Φ₂(q) = 1 (sup inf = inf sup).
Summary: Zero-Sum Games
Complexity of parity objectives:
1.5 players: polynomial [CY98, dAl97]
2 players: NP ∩ coNP [GH82, EJ88]
2.5 players: NP ∩ coNP [CdAH06]
concurrent: [dAH00, dAM01]
Concurrent Games are Difficult
- Optimal strategies may not exist
- Limit values may not be rational
- ε-close strategies, for fixed ε, may require infinite memory
- No determinacy for pure strategies
[diagram: concurrent game at q1 with move pairs (1,1), (1,2), (2,1), (2,2)]
With pure strategies: ⟨⟨P1⟩⟩(◇a)(q1) = 0 and ⟨⟨P2⟩⟩(◇b)(q1) = 0.
Turn-based Games are More Pleasant
- Optimal strategies always exist [McIver/Morgan]
- In the non-stochastic case, pure finite-memory optimal strategies exist for ω-regular objectives [Gurevich/Harrington]
- For parity objectives, pure memoryless optimal strategies exist [Emerson/Jutla: non-stochastic Rabin; Condon: stochastic reachability; Chatterjee/deAlfaro/H: stochastic Rabin], hence NP ∩ coNP
- Whether non-stochastic parity games and stochastic reachability games are solvable in P is open.
Summary
Verification and control are very special (boolean) cases of graph-based optimization problems. They can be generalized to solve questions that involve multiple players, quantitative resources, probabilistic transitions, and continuous state spaces. The theory and practice of this are still wide open …