
1 Randomness for Free Laurent Doyen LSV, ENS Cachan & CNRS joint work with Krishnendu Chatterjee, Hugo Gimbert, Tom Henzinger

2 Stochastic games on graphs Games for synthesis - Reactive system synthesis = finding a winning strategy in a two-player game Game played on a graph - Infinite number of rounds - Players' moves determine the successor state - Outcome = infinite path in the graph When is randomness more powerful? When is randomness for free?

3 Stochastic games on graphs Games for synthesis - Reactive system synthesis = finding a winning strategy in a game Game played on a graph - Infinite number of rounds - Players' moves determine the successor state - Outcome = infinite path in the graph When is randomness more powerful? When is randomness for free? … in game structures? … in strategies?

4 Interaction: how players' moves are combined. Information: what is visible to the players. Classification according to Information & Interaction Round 1 Stochastic games on graphs

5 Interaction: how players' moves are combined. Information: what is visible to the players. Classification according to Information & Interaction Round 2 Stochastic games on graphs

6 Interaction: how players' moves are combined. Information: what is visible to the players. Classification according to Information & Interaction Round 3 Stochastic games on graphs

7 Mode of interaction Classification according to Information & Interaction Interaction General case: concurrent & stochastic Player 1's move and Player 2's move together determine a probability distribution on the successor state Players choose their moves simultaneously and independently 2½-player games

8 Mode of interaction Classification according to Information & Interaction Interaction General case: concurrent & stochastic Player 1's move and Player 2's move together determine a probability distribution on the successor state Players choose their moves simultaneously and independently 1½-player games (MDP) if A1 or A2 is a singleton.
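The general model is easy to sketch in code. The following is an illustrative toy (state names, moves, and probabilities are made up, not from the talk): the transition function maps a state and a pair of simultaneous moves to a distribution over successors; if one player's move set is a singleton, the model degenerates to an MDP.

```python
import random

# Illustrative concurrent stochastic game: delta maps
# (state, Player 1 move, Player 2 move) to a distribution over successor
# states. Moves are chosen simultaneously, then chance resolves.
delta = {
    ("s", "a", "x"): {"s1": 0.5, "s2": 0.5},
    ("s", "a", "y"): {"s1": 1.0},
    ("s", "b", "x"): {"s2": 1.0},
    ("s", "b", "y"): {"s1": 0.5, "s2": 0.5},
}

def step(state, move1, move2, rng=random):
    """Sample a successor state once both moves are fixed."""
    dist = delta[(state, move1, move2)]
    succs, probs = zip(*sorted(dist.items()))
    return rng.choices(succs, weights=probs)[0]
```

If Player 2's move set were the singleton {x}, only Player 1's choice would matter and the structure would be a 1½-player game (MDP).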

9 Mode of interaction Interaction General case: concurrent & stochastic Separation of concurrency & probabilities

10 Mode of interaction Interaction Special case: turn-based Player 1 state Player 2 state In each state, one player’s move determines successor

11 Knowledge Classification according to Information & Interaction Information General case: partial observation Two partitions O1 and O2 of the state space In state s, player i sees the observation o ∈ Oi such that s ∈ o Player 1's view Player 2's view

12 Knowledge Information Special case 1: one-sided complete observation One of the partitions consists of singletons: O1 = {{s} | s ∈ S} or O2 = {{s} | s ∈ S} Player 1's view Player 2's view

13 Knowledge Information Special case 2: complete observation Both partitions consist of singletons: O1 = O2 = {{s} | s ∈ S} Player 1's view Player 2's view

14 Classification Classification according to Information & Interaction Information: complete observation / one-sided complete obs. / partial observation Interaction: turn-based / concurrent 2½-player games

15 Classification Classification according to Information & Interaction Information: complete observation / one-sided complete obs. / partial observation Interaction: turn-based / concurrent 1½-player games: with complete observation = MDP (Markov Decision Process), with partial observation = POMDP

16 Strategies A strategy for Player i is a function that maps histories to probability distributions over actions. Strategies are observation-based: they depend only on the sequence of observations.

17 Strategies A strategy for Player i is a function that maps histories to probability distributions over actions. Strategies are observation-based: they depend only on the sequence of observations. Special case: pure strategies, which map each history to a single action.
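The two notions can be sketched directly (a toy illustration: the actions a/b and the particular bias are made up, not from the talk). Representing a history as the sequence of observations makes both strategies observation-based by construction.

```python
# Sketch: a randomized strategy maps a history of observations to a
# distribution over actions; a pure strategy is the special case where
# all the probability mass sits on a single action.
def randomized(history):
    # Illustrative rule: the bias depends only on the length of the
    # observation sequence.
    p = 0.5 if len(history) % 2 == 0 else 0.9
    return {"a": p, "b": 1.0 - p}

def pure(history):
    # Pure strategy: a Dirac distribution on one action.
    return {"a": 1.0} if len(history) % 2 == 0 else {"b": 1.0}
```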

18 Objectives An objective (for player 1) is a measurable set φ ⊆ S^ω of infinite sequences of states.

19 Objectives An objective (for player 1) is a measurable set φ ⊆ S^ω of infinite sequences of states. Examples: - Reachability, safety - Büchi, coBüchi - Parity - Borel
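These objectives are easy to decide on an ultimately periodic play, given as a finite prefix followed by an infinitely repeated cycle. The conventions below (e.g. min-even parity) are one common choice, not necessarily the talk's exact definitions:

```python
# Sketch: checking objectives on an ultimately periodic play.
def reach(prefix, cycle, target):
    # Reachability: some target state occurs somewhere along the play.
    return bool(target & (set(prefix) | set(cycle)))

def buchi(prefix, cycle, accepting):
    # Buchi: an accepting state is visited infinitely often, i.e. on the cycle.
    return bool(accepting & set(cycle))

def parity(prefix, cycle, priority):
    # Parity (min-even convention): the least priority seen infinitely
    # often is even.
    return min(priority[s] for s in cycle) % 2 == 0
```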

20 Value The probabilities of the finite prefixes of a play induce a unique probability measure on measurable sets of plays, and a value function for Player 1.

21 Value The probabilities of the finite prefixes of a play induce a unique probability measure on measurable sets of plays, and a value function for Player 1. Our reductions preserve values and existence of optimal strategies.
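For a complete-observation MDP with a reachability objective, the value function can be approximated by standard value iteration. The MDP below is a made-up toy, not from the talk:

```python
# Sketch: value iteration for the reachability value of a small MDP
# (Player 1 maximizes the probability of hitting the target set).
def reach_values(states, actions, delta, target, iters=200):
    v = {s: 1.0 if s in target else 0.0 for s in states}
    for _ in range(iters):
        v = {s: 1.0 if s in target else
                max(sum(p * v[t] for t, p in delta[s, a].items())
                    for a in actions[s])
             for s in states}
    return v

states = ["s0", "goal", "sink"]
actions = {"s0": ["a", "b"], "goal": ["a"], "sink": ["a"]}
delta = {
    ("s0", "a"): {"goal": 0.5, "sink": 0.5},   # risky move
    ("s0", "b"): {"sink": 1.0},                # hopeless move
    ("goal", "a"): {"goal": 1.0},
    ("sink", "a"): {"sink": 1.0},
}
values = reach_values(states, actions, delta, {"goal"})
```

Note the value at s0 is 1/2, strictly between 0 and 1: already in 1½-player games, values need not be 0 or 1.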

22 Outline Randomness in game structures - for free with complete-observation, concurrent - for free with one-sided, turn-based Randomness in strategies - for free in (PO)MDP Corollary: undecidability results

23 Preliminary Rational probabilities → probabilities ½ only 1. Make all probabilistic states have two successors

24 Preliminary Rational probabilities → probabilities ½ only 2. Use the binary encodings of p and q−p to simulate a transition with probability p/q [Zwick, Paterson 96]

25 Preliminary Rational probabilities → probabilities ½ only 2. Use the binary encodings of p and q−p to simulate a transition with probability p/q [Zwick, Paterson 96]

26 Preliminary Rational probabilities → probabilities ½ only 2. Use the binary encodings of p and q−p to simulate a transition with probability p/q [Zwick, Paterson 96] Polynomial-time reduction, for parity objectives
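The idea behind this step can be sketched as follows (a simplified reading of the [Zwick, Paterson 96] trick, not their exact gadget): a transition taken with rational probability p is simulated by a chain of fair coin flips that lazily compares a uniform bit stream against the binary expansion of p, stopping at the first disagreeing bit.

```python
import random
from fractions import Fraction

def biased_from_fair(p, fair=lambda: random.randrange(2)):
    """Return True with probability p, using only fair coin flips.

    Compares the bits of a uniform x in [0,1) with the binary expansion
    of p, one flip per bit; the expected number of flips is 2.
    """
    p = Fraction(p)
    while True:
        digit = 1 if p >= Fraction(1, 2) else 0   # next binary digit of p
        p = 2 * p - digit                         # shift to the next digit
        flip = fair()
        if flip != digit:
            return flip < digit                   # x < p iff flip falls below
```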

27 Outline Randomness in game structure - for free with complete-observation, concurrent - for free with one-sided, turn-based Randomness in strategies - for free in (PO)MDP Corollary: undecidability results

28 Classification Classification according to Information & Interaction complete observation one-sided complete obs. partial observation turn-based concurrent Information Interaction

29 Reduction Simulate probabilistic state with concurrent state

30 Reduction Simulate probabilistic state with concurrent state Probability to go from s to s'0: …

31 Reduction Simulate probabilistic state with concurrent state Probability to go from s to s'0: … If …, then …

32 Reduction Simulate probabilistic state with concurrent state Probability to go from s to s'0: … If …, then …

33 Reduction Simulate probabilistic state with concurrent state Probability to go from s to s'0: … If …, then … Each player can unilaterally decide to simulate the original game.

34 Simulate probabilistic state with concurrent state Player 1 can obtain at least the value v(s) by playing all actions uniformly at random: v’(s) ≥ v(s) Player 2 can force the value to be at most v(s) by playing all actions uniformly at random: v’(s) ≤ v(s) Reduction

35 Reduction For {complete, one-sided, partial} observation, given a game with rational probabilities we can construct a concurrent game with deterministic transition function such that the values coincide, and existence of optimal observation-based strategies is preserved. The reduction is in polynomial time for complete-observation parity games.
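For a uniform probabilistic choice, the gadget behind this theorem is easy to sketch (a standard construction consistent with the slide's uniform-play argument; the code below is an illustration, not the talk's exact figure): both players pick a number in {0, …, k−1} and the successor index is their sum modulo k, so either player alone can force the uniform distribution by randomizing, and neither can bias it.

```python
from fractions import Fraction

def successor_distribution(k, p1_mix, p2_mix):
    """Distribution over successor indices (i + j) mod k under mixed moves."""
    out = [Fraction(0)] * k
    for i, pi in enumerate(p1_mix):
        for j, pj in enumerate(p2_mix):
            out[(i + j) % k] += pi * pj
    return out

uniform = [Fraction(1, 3)] * 3
pure0 = [Fraction(1), Fraction(0), Fraction(0)]   # an arbitrary opposing move
```

Against the uniform mix, every opposing move (here the arbitrary pure move `pure0`) yields the uniform successor distribution, which is the deterministic-concurrent simulation of the original probabilistic state.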

36 Information leak from the back edges… Partial information

37 Partial information Information leak from the back edges… No information leakage if all probabilities have the same denominator q → use 1/q = gcd of all probabilities in G

38 Partial information Information leak from the back edges… No information leakage if all probabilities have the same denominator q → use 1/q = gcd of all probabilities in G The reduction is then exponential.

39 Example

40 Outline Randomness in game structure - for free with complete-observation, concurrent - for free with one-sided, turn-based Randomness in strategies - for free in (PO)MDP Corollary: undecidability results

41 Overview Classification according to Information & Interaction complete observation one-sided complete obs. partial observation turn-based concurrent Information Interaction

42 Reduction Simulate a probabilistic state with imperfect-information turn-based states (gadget with Player 1 states, a Player 2 state, and Player 1's observation)

43 Reduction Simulate the probabilistic state with imperfect-information turn-based states Each player can unilaterally decide to simulate the probabilistic state by playing uniformly at random: Player 2 chooses states (s,0), (s,1) uniformly at random Player 1 chooses actions 0, 1 uniformly at random

44 Reduction Games with rational probabilities can be reduced to turn-based games with deterministic transitions and (at least) one-sided complete observation. Values and existence of optimal strategies are preserved.

45 Randomness for free In transition function (this talk):

46 When randomness is not for free Complete-information turn-based (2-player) games - in deterministic games, value is either 0 or 1 [Martin98] - MDPs with reachability objective can have values in [0,1] Randomness is not for free. 1½-player games (MDP & POMDP) - in deterministic partial-information 1-player games, value is either 0 or 1 [see later] - MDPs have values in [0,1] Randomness is not for free.

47 Randomness for free In transition function (this talk):

48 Randomness for free In transition function (this talk): In strategies (Everett’57, Martin’98, CDHR’07):

49 Randomness in strategies Example: concurrent, complete observation, reachability Reminder: a randomized strategy for Player i maps each history to a probability distribution over actions; a pure strategy maps each history to a single action. Randomized strategies are more powerful.
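The phenomenon can be checked numerically on the standard matching-pennies pattern (an illustration consistent with the slide's setting, not a reproduction of its exact game): say the target is reached exactly when the two simultaneous moves differ. Any pure Player 1 strategy is countered to value 0, while mixing uniformly guarantees 1/2.

```python
from fractions import Fraction

def reach_prob(p, q):
    """Probability of reaching the target when the target is reached iff
    the moves differ; p and q are the players' 'heads' probabilities."""
    return p * (1 - q) + (1 - p) * q

half = Fraction(1, 2)
```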

50 Randomness in strategies Example - turn-based, one-sided complete observation - reachability Randomized strategies are more powerful

51 Outline Randomness in game structure - for free with complete-observation, concurrent - for free with one-sided, turn-based Randomness in strategies - for free in (PO)MDP Corollary: undecidability results

52 Randomness in strategies Randomized strategy: play a with probability p and b with probability 1−p RandStrategy(p) { x = rand(0…1); if x ≤ p then out(a) else out(b) }

53 Randomness in strategies Randomized strategy: play a with probability p and b with probability 1−p = random generator (uniform x ∈ [0,1]) + pure strategy (a if x ≤ p, b if x > p)
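The slide's decomposition can be written directly: the randomness is pushed into a single uniform sample x, after which the decision rule is pure (deterministic in x).

```python
import random

def pure_rule(p, x):
    # Deterministic once the uniform sample x is fixed.
    return "a" if x <= p else "b"

def rand_strategy(p, rng=random):
    # Randomized strategy = uniform random generator + pure rule.
    return pure_rule(p, rng.random())
```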

54 Randomness for free in strategies 1½-player game (POMDP), s0 initial state. For every randomized observation-based strategy, there exists a pure observation-based strategy with the same value. Proof. (assume an alphabet of size 2 and fan-out = 2) Given σ, we show that the value of σ can be obtained as the average of the values of the pure strategies σx:

55 Randomness for free in strategies Strategies σx are obtained by «de-randomization» of σ Assume σ(s1)(a) = p and σ(s1)(b) = 1−p Then σ is equivalent to playing σ0 with frequency p and σ1 with frequency 1−p, where:


57 Randomness for free in strategies Equivalently, toss a coin x  [0,1], play σ 0 if x ≤ p, and play σ 1 if x > p. Playing σ in G can be viewed as a sequence of coin tosses: s0s0 a b x ≤ p x > p toss x  [0,1] [ p = σ(s 1 )(a) ]

58 s1s1 Randomness for free in strategies Equivalently, toss a coin x  [0,1], play σ 0 if x ≤ p, and play σ 1 if x > p. Playing σ in G can be viewed as a sequence of coin tosses: s0s0 a b x ≤ p x > p y ≤ q a y > q a y ≤ q b y > q b s’ s’’ s’’’ s’’’’ s1s1 s1s1 s1s1 toss x  [0,1] toss y  [0,1] [ p = σ(s 1 )(a) ] [ q a = δ(s 0,a)(s 1 ) ]s’ [ q b = δ(s 0,b)(s 1 ) ]s’’’

59 Randomness for free in strategies Equivalently, toss a coin x  [0,1], play σ 0 if x ≤ p, and play σ 1 if x > p. Playing σ in G can be viewed as a sequence of coin tosses: s0s0 a b x ≤ p x > p y ≤ q a y > q a y ≤ q b y > q b … … s’ s’’ s’’’ s’’’’ s1s1 s1s1 s1s1 s1s1 toss x  [0,1] toss y  [0,1]toss x  [0,1] [ p = σ(s 1 )(a) ] [ q a = δ(s 1,a)(s 1 ) ]s’ [ q b = δ(s 1,b)(s 1 ) ]s’’’

60 Randomness for free in strategies Given an infinite sequence x = (xn)n≥0 ∈ [0,1]^ω, define σx for all histories s0, s1, …, sn: σx is a pure and observation-based strategy! σx plays like σ, assuming that the results of the coin tosses are given by the sequence x. The value of σ is the «average of the outcomes» of the strategies σx.
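The definition of σx can be sketched as code (with an illustrative representation: a history is the sequence of observations, and a strategy returns an action-to-probability dict): σx answers the n-th history with the action that σ's n-th coin toss would select when the toss comes out xn.

```python
def derandomize(sigma, x):
    """Pure strategy sigma_x: resolve sigma's n-th randomized choice with
    the fixed coin value x[n] (inverse-CDF over a fixed action order).
    It depends only on the observable history, so it stays
    observation-based whenever sigma is."""
    def sigma_x(history):
        dist = sigma(history)
        threshold = 0.0
        for action, prob in sorted(dist.items()):
            threshold += prob
            if x[len(history)] <= threshold:
                return action
        return action          # guard against floating-point rounding
    return sigma_x
```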

61 Randomness for free in strategies Assume x = (xn)n≥0 and y = (yn)n≥0 are fixed. Then σ determines a unique outcome… which is the outcome of some pure strategy! [diagram: the fixed coin values select one branch of the coin-toss tree]

62 Randomness for free in strategies Assume x = (xn)n≥0 and y = (yn)n≥0 are fixed. Let … where … [diagram: coin-toss tree]

63 Randomness for free in strategies Assume x = (xn)n≥0 and y = (yn)n≥0 are fixed. Let … where … Theorem: …

64 Randomness for free in strategies Assume x = (xn)n≥0 and y = (yn)n≥0 are fixed. Playing σ in G is equivalent to choosing (xn) and (yn) uniformly at random, and then producing the corresponding outcome.

65 Randomness for free in strategies The random sequences (xn) and (yn) can be generated separately, and independently! σx is a pure strategy!

66 Proof The value of σ is the «average of the outcomes» of the strategies σx.

67 Proof The value of σ is the «average of the outcomes» of the strategies σx. By Fubini's theorem…

68 Proof The value of σ is the «average of the outcomes» of the strategies σx. Pure and randomized strategies are equally powerful in POMDPs!
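The averaging claim can be checked numerically in a one-step toy POMDP (entirely made up for illustration): action a wins, action b loses, and σ plays a with probability p, so σ has value p. Each fixed coin value x induces the pure strategy σx, whose value is 1 if x ≤ p and 0 otherwise, and integrating over x uniform on [0,1] recovers p.

```python
def pure_value(x, p):
    # Value of the pure strategy sigma_x in the toy game: "a" wins iff x <= p.
    return 1.0 if x <= p else 0.0

def average_over_x(p, n=10000):
    # Midpoint Riemann sum approximating the integral over x in [0,1].
    return sum(pure_value((i + 0.5) / n, p) for i in range(n)) / n
```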

69 Randomness for free In transition function: In strategies:

70 Outline Randomness in game structure - for free with complete-observation, concurrent - for free with one-sided, turn-based Randomness in strategies - for free in (PO)MDP Corollary: undecidability results

71 Corollary Using [Baier, Bertrand, Größer '08]: almost-sure coBüchi (and positive Büchi) with randomized (or pure) strategies is undecidable for POMDPs. Using randomness for free in transition functions: almost-sure coBüchi (and positive Büchi) is undecidable for deterministic turn-based one-sided complete-observation games.

72 Thank you ! Questions ?

