Randomness for Free Laurent Doyen LSV, ENS Cachan & CNRS joint work with Krishnendu Chatterjee, Hugo Gimbert, Tom Henzinger
Stochastic games on graphs
Games for synthesis - Reactive system synthesis = finding a winning strategy in a two-player game
Game played on a graph - Infinite number of rounds - Players' moves determine the successor state - Outcome = infinite path in the graph
When is randomness more powerful? When is randomness for free? … in game structures? … in strategies?
Stochastic games on graphs
Classification according to Information & Interaction: Interaction = how players' moves are combined; Information = what is visible to the players.
Mode of interaction (Classification according to Information & Interaction)
Interaction, general case: concurrent & stochastic. Players choose their moves simultaneously and independently; Player 1's move and Player 2's move together determine a probability distribution on the successor state (2½-player games).
1½-player games (MDP) if A1 or A2 is a singleton.
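To make the general case concrete, here is a minimal sketch in Python (not from the talk; names such as GameStructure are illustrative) of a 2½-player concurrent game structure, with the 1½-player (MDP) check on the action sets:

```python
# Minimal sketch (illustrative, not from the talk): a 2.5-player concurrent
# stochastic game structure.  delta maps (state, Player 1 move, Player 2 move)
# to a probability distribution over successor states.
from dataclasses import dataclass
from typing import Dict, Hashable, Set, Tuple

State = Hashable
Action = Hashable
Dist = Dict[State, float]   # successor -> probability (summing to 1)

@dataclass
class GameStructure:
    states: Set[State]
    actions1: Set[Action]   # A1
    actions2: Set[Action]   # A2
    delta: Dict[Tuple[State, Action, Action], Dist]

    def is_one_and_a_half_player(self) -> bool:
        # 1.5-player game (MDP/POMDP) if A1 or A2 is a singleton.
        return len(self.actions1) == 1 or len(self.actions2) == 1

# Tiny example: Player 2 is trivial, so this is an MDP.
g = GameStructure(
    states={"s", "t"},
    actions1={"a", "b"},
    actions2={"*"},
    delta={("s", "a", "*"): {"s": 0.5, "t": 0.5},
           ("s", "b", "*"): {"s": 1.0},
           ("t", "a", "*"): {"t": 1.0},
           ("t", "b", "*"): {"t": 1.0}},
)
assert g.is_one_and_a_half_player()
```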
Mode of interaction
Interaction, general case: concurrent & stochastic. Separation of concurrency & probabilities: a concurrent probabilistic transition can be split into a concurrent (deterministic) choice followed by a purely probabilistic state.
Mode of interaction
Interaction, special case: turn-based. States are partitioned into Player 1 states and Player 2 states; in each state, one player's move determines the successor.
Knowledge (Classification according to Information & Interaction)
Information, general case: partial observation. Two partitions O1 and O2 of the state space give Player 1's view and Player 2's view: in state s, player i sees the observation o ∈ Oi such that s ∈ o.
Knowledge
Information, special case 1: one-sided complete observation. O1 = {{s} : s ∈ S} or O2 = {{s} : s ∈ S}, i.e. one of the players sees the exact state.
Knowledge
Information, special case 2: complete observation. O1 = {{s} : s ∈ S} and O2 = {{s} : s ∈ S}, i.e. both players see the exact state.
Classification according to Information & Interaction
Information: complete observation / one-sided complete obs. / partial observation. Interaction: 1½-player / turn-based / concurrent.
1½-player games with complete observation = MDP (Markov Decision Process); 1½-player games with partial observation = POMDP.
Strategies
A strategy for Player i is a function σi that maps histories to probability distributions over Player i's actions. Strategies are observation-based: two histories with the same sequence of observations are mapped to the same distribution.
Special case: pure strategies map every history to a single action.
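As a small illustration (Python, not from the talk; the helper name pure is an assumed one), a strategy can be modelled as a function from observation histories to distributions over the player's actions, with pure strategies as the Dirac special case:

```python
# Sketch (illustrative, not from the talk): observation-based strategies.
from typing import Callable, Dict, Hashable, Sequence

Obs = Hashable
Action = Hashable
Dist = Dict[Action, float]

# Randomized observation-based strategy: observation history -> distribution.
Strategy = Callable[[Sequence[Obs]], Dist]

def pure(choice: Callable[[Sequence[Obs]], Action]) -> Strategy:
    """Pure strategies are the special case of Dirac distributions."""
    return lambda history: {choice(history): 1.0}

# Example: play 'a' until the observation 'goal' has been seen, then play 'b'.
sigma = pure(lambda history: "b" if "goal" in history else "a")
print(sigma(("init", "init")))   # {'a': 1.0}
```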
Objectives
An objective (for Player 1) is a measurable set of infinite sequences of states: φ ⊆ S^ω.
Examples: reachability, safety; Büchi, coBüchi; parity; Borel.
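For intuition, a hedged sketch (Python, not from the talk) of how these objectives are evaluated on an ultimately periodic play, given as a finite prefix followed by a cycle repeated forever; parity is taken with the min-even convention, of which Büchi and coBüchi are special cases:

```python
# Sketch (illustrative, not from the talk): objectives on ultimately periodic
# plays, represented as prefix + cycle (the cycle is repeated forever).
from typing import Dict, Hashable, Sequence, Set

State = Hashable

def reach(prefix: Sequence[State], cycle: Sequence[State], target: Set[State]) -> bool:
    return any(s in target for s in (*prefix, *cycle))

def safe(prefix: Sequence[State], cycle: Sequence[State], bad: Set[State]) -> bool:
    return not reach(prefix, cycle, bad)

def parity(prefix: Sequence[State], cycle: Sequence[State],
           priority: Dict[State, int]) -> bool:
    # The states visited infinitely often are exactly those on the cycle;
    # the play is winning if the minimal priority seen there is even.
    return min(priority[s] for s in cycle) % 2 == 0

print(parity(["s0"], ["s1", "s2"], {"s0": 1, "s1": 2, "s2": 1}))   # False
```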
Value
Fixing strategies σ, π and an initial state s, the probability of a finite prefix of a play is the product of the probabilities of its transitions (as chosen by the strategies). This induces a unique probability measure on measurable sets of plays, and a value function for Player 1: v(s) = sup over σ, inf over π, of the probability of the objective.
Our reductions preserve values and existence of optimal strategies.
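As a sanity check of these definitions, here is a hedged sketch (Python, not from the talk; estimate_reach and the memoryless strategy are assumptions for illustration) estimating the probability of a reachability objective in an MDP under a fixed strategy by sampling finite prefixes of plays:

```python
# Sketch (illustrative, not from the talk): Monte Carlo estimate of the
# probability of Reach(target) in an MDP under a fixed memoryless strategy.
import random

def estimate_reach(delta, sigma, s0, target, horizon=200, samples=20_000):
    """delta: (state, action) -> {successor: probability}; sigma: state -> action."""
    hits = 0
    for _ in range(samples):
        s = s0
        for _ in range(horizon):
            if s in target:
                hits += 1
                break
            succ = delta[(s, sigma(s))]
            s = random.choices(list(succ), weights=list(succ.values()))[0]
    return hits / samples

# From s, action 'a' moves to the target t with probability 1/2 at each step,
# so the true value is 1; the estimate should be close to 1.
delta = {("s", "a"): {"s": 0.5, "t": 0.5}, ("t", "a"): {"t": 1.0}}
print(estimate_reach(delta, lambda s: "a", "s", {"t"}))
```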
Outline Randomness in game structures - for free with complete-observation, concurrent - for free with one-sided, turn-based Randomness in strategies - for free in (PO)MDP Corollary: undecidability results
Preliminary: rational probabilities → probabilities ½ only
1. Make all probabilistic states have two successors.
Preliminary: rational probabilities → probabilities ½ only
2. Use binary encodings of p and q-p to simulate probability p/q with probability-½ transitions [Zwick, Paterson 96].
Polynomial-time reduction, for parity objectives.
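The idea behind the binary-encoding trick can be sketched as follows (Python, not from the talk; the actual gadget unrolls the loop below into a chain of probability-½ states): a transition taken with probability p/q is simulated by fair coin flips that progressively narrow down a uniform number x in [0,1) and compare it with p/q.

```python
# Sketch (illustrative, not from the talk): simulating probability p/q with
# fair coins only -- the idea behind the [Zwick, Paterson 96] gadget built
# from the binary encodings of p and q-p.
import random
from fractions import Fraction

def bernoulli(p: Fraction) -> bool:
    """Return True with probability exactly p using only fair coin flips."""
    low, high = Fraction(0), Fraction(1)   # x is uniform in [low, high)
    while True:
        mid = (low + high) / 2
        if random.getrandbits(1):          # one fair coin flip: a bit of x
            low = mid                      # x falls in the upper half
        else:
            high = mid                     # x falls in the lower half
        if high <= p:
            return True                    # x < p is already certain
        if low >= p:
            return False                   # x >= p is already certain

p = Fraction(3, 7)
print(sum(bernoulli(p) for _ in range(20_000)) / 20_000)   # close to 3/7 ~ 0.4286
```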
Outline Randomness in game structure - for free with complete-observation, concurrent - for free with one-sided, turn-based Randomness in strategies - for free in (PO)MDP Corollary: undecidability results
Classification according to Information & Interaction (recalled): Information = complete observation / one-sided complete obs. / partial observation; Interaction = turn-based / concurrent.
Reduction: simulate a probabilistic state with a concurrent state.
Probability to go from s to s'0: it is determined by the two players' (possibly mixed) choices of actions in the concurrent state. If either player plays all of its actions uniformly at random, then each successor is reached with its probability in the original game.
Each player can unilaterally decide to simulate the original game.
Reduction: simulate a probabilistic state with a concurrent state.
Player 1 can obtain at least the value v(s) by playing all actions uniformly at random: v'(s) ≥ v(s). Player 2 can force the value to be at most v(s) by playing all actions uniformly at random: v'(s) ≤ v(s).
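A hedged simulation of this gadget idea (Python, not from the talk; addition mod q is one standard way to realize the matching, the construction in the paper may differ in details): in the deterministic concurrent state both players pick a number in {0,…,q-1}, the successor index is their sum mod q, and either player alone can enforce the uniform distribution by randomizing uniformly.

```python
# Sketch (illustrative, not from the talk): a deterministic concurrent state
# simulating a uniform probabilistic choice among q successors.
# Successor index = (a1 + a2) mod q, with moves chosen simultaneously.
import random
from collections import Counter

def empirical_distribution(q, strategy1, strategy2, rounds=30_000):
    counts = Counter((strategy1(q) + strategy2(q)) % q for _ in range(rounds))
    return {i: counts[i] / rounds for i in range(q)}

uniform = lambda q: random.randrange(q)     # play all actions uniformly at random
stubborn = lambda q: 0                      # an arbitrary (even adversarial) choice

# Player 1 alone enforces the original uniform distribution (roughly 1/3 each):
print(empirical_distribution(3, uniform, stubborn))
# Symmetrically, Player 2 alone enforces it as well:
print(empirical_distribution(3, stubborn, uniform))
```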
Reduction (theorem): for {complete, one-sided, partial} observation, given a game with rational probabilities we can construct a concurrent game with deterministic transition function such that the values coincide, and the existence of optimal observation-based strategies is preserved. The reduction is in polynomial time for complete-observation parity games.
Partial information: information leak from the back edges…
No information leakage if all probabilities have the same denominator q: use 1/q = gcd of all probabilities in G.
The reduction is then exponential.
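A small sketch (Python, not from the talk) of the normalization behind this step: compute a common denominator q so that every transition probability is a multiple of 1/q (the slide's 1/q = gcd of the probabilities). The exponential cost of the reduction comes from the size of q, which can be exponential in the binary encoding of the probabilities, not from this computation.

```python
# Sketch (illustrative, not from the talk): put all transition probabilities
# of G over a common denominator q, so each equals some k/q.
from fractions import Fraction
from math import lcm

probabilities = [Fraction(1, 3), Fraction(1, 2), Fraction(1, 6), Fraction(2, 3)]
q = lcm(*(p.denominator for p in probabilities))
print(q, [int(p * q) for p in probabilities])   # 6 [2, 3, 1, 4] -> every prob is k/6
```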
Example
Outline Randomness in game structure - for free with complete-observation, concurrent - for free with one-sided, turn-based Randomness in strategies - for free in (PO)MDP Corollary: undecidability results
Overview: classification according to Information & Interaction (recalled): Information = complete observation / one-sided complete obs. / partial observation; Interaction = turn-based / concurrent.
Reduction: simulate a probabilistic state with imperfect-information turn-based states: a Player 2 state followed by Player 1 states (s,0) and (s,1) that belong to the same Player 1 observation.
Each player can unilaterally decide to simulate the probabilistic state by playing uniformly at random: Player 2 chooses between the states (s,0) and (s,1) uniformly at random; Player 1 chooses between the actions 0 and 1 uniformly at random.
Reduction (theorem): games with rational probabilities can be reduced to turn-based games with deterministic transitions and (at least) one-sided complete observation. Values and existence of optimal strategies are preserved.
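A hedged sketch of the turn-based gadget (Python, not from the talk; the XOR encoding is one natural way to realize the construction above): Player 2 moves first to (s,0) or (s,1), Player 1 observes only s and then picks a bit, and the XOR of the two bits selects the successor, so either player alone makes the choice a fair coin.

```python
# Sketch (illustrative, not from the talk): turn-based, one-sided-observation
# gadget for a fair probabilistic choice.  Player 2's bit is hidden from
# Player 1 (both intermediate states (s,0) and (s,1) have observation "s").
import random

def frequency_of_successor_1(strategy1, strategy2, rounds=30_000):
    ones = 0
    for _ in range(rounds):
        b2 = strategy2()              # Player 2 moves to state (s, b2)
        observation = "s"             # all Player 1 sees of (s, b2)
        b1 = strategy1(observation)   # Player 1's bit, based on the observation only
        ones += b1 ^ b2               # successor index = b1 xor b2
    return ones / rounds

# Either player can unilaterally make both successors equally likely (~0.5):
print(frequency_of_successor_1(lambda obs: random.getrandbits(1), lambda: 0))
print(frequency_of_successor_1(lambda obs: 1, lambda: random.getrandbits(1)))
```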
Randomness for free In transition function (this talk):
When randomness is not for free
Complete-information turn-based games: in deterministic games, the value is either 0 or 1 [Martin 98], while MDPs with a reachability objective can have values strictly between 0 and 1. Randomness is not for free.
1½-player games (MDP & POMDP): in deterministic partial-information 1½-player games, the value is either 0 or 1 [see later], while MDPs can have values strictly between 0 and 1. Randomness is not for free.
Randomness for free In transition function (this talk): In strategies (Everett’57, Martin’98, CDHR’07):
Randomness in strategies
Example: concurrent, complete observation, reachability.
Reminder: a randomized strategy for Player i maps histories to probability distributions over actions; a pure strategy for Player i maps histories to actions.
Randomized strategies are more powerful.
Randomness in strategies Example - turn-based, one-sided complete observation - reachability Randomized strategies are more powerful
Outline Randomness in game structure - for free with complete-observation, concurrent - for free with one-sided, turn-based Randomness in strategies - for free in (PO)MDP Corollary: undecidability results
Randomness in strategies
Randomized strategy: play a with probability p and b with probability (1-p).
RandStrategy(p) { x = rand(0…1); if x ≤ p then out(a) else out(b) }
Randomness in strategies
A randomized strategy (a with probability p, b with probability 1-p) = a random generator producing a uniform x ∈ [0,1] + a pure strategy: play a if x ≤ p, play b if x > p.
Randomness for free in strategies
Theorem: in a 1½-player game (POMDP) with initial state s0, for every randomized observation-based strategy there exists a pure observation-based strategy that achieves at least the same probability of satisfying the objective.
Proof (assume an action alphabet of size 2 and fan-out 2). Given σ, we show that the value of σ can be obtained as the average of the values of the pure strategies σ_x:
Randomness for free in strategies
The strategies σ_x are obtained by «de-randomization» of σ. Assume σ(s1)(a) = p and σ(s1)(b) = 1-p. Then σ is equivalent to playing σ0 with frequency p and σ1 with frequency 1-p, where σ0 fixes the choice at s1 to a and σ1 fixes it to b.
Randomness for free in strategies
Equivalently, toss a coin x ∈ [0,1], play σ0 if x ≤ p, and play σ1 if x > p. Playing σ in G can be viewed as a sequence of coin tosses: a toss x ∈ [0,1] resolves the strategy's choice (a if x ≤ p, b if x > p, with p = σ(s1)(a)), a toss y ∈ [0,1] resolves the probabilistic transition (e.g. successor s1 if y ≤ qa, with qa = δ(s0,a)(s1)), and so on, round after round. [figure: binary tree of alternating coin tosses]
Randomness for free in strategies
Given an infinite sequence x = (xn)n≥0 ∈ [0,1]^ω, define, for every history s0 s1 … sn: σ_x(s0 s1 … sn) = a if xn ≤ σ(s0 s1 … sn)(a), and b otherwise.
σ_x is a pure and observation-based strategy! σ_x plays like σ, assuming that the results of the coin tosses form the sequence x. The value of σ is the «average of the outcomes» of the strategies σ_x.
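A hedged sketch of this de-randomization (Python, not from the talk; derandomize is an assumed helper name): for a fixed sequence x, the pure strategy σ_x answers a history of length n+1 by thresholding x_n against the cumulative probabilities of σ at that history, which reproduces the rule "play a iff x_n ≤ σ(…)(a)" from the slide.

```python
# Sketch (illustrative, not from the talk): de-randomizing a strategy sigma
# against a fixed sequence x = (x_n) in [0,1]^omega.
from typing import Callable, Dict, Hashable, Sequence

Obs = Hashable
Action = Hashable
Randomized = Callable[[Sequence[Obs]], Dict[Action, float]]

def derandomize(sigma: Randomized, x: Sequence[float]) -> Callable[[Sequence[Obs]], Action]:
    def sigma_x(history: Sequence[Obs]) -> Action:
        dist = sigma(history)
        threshold, acc = x[len(history) - 1], 0.0   # x_n for a history s0 ... sn
        for action, prob in sorted(dist.items()):   # fixed order over the actions
            acc += prob
            if threshold <= acc:
                return action
        return action                               # guard against float rounding
    return sigma_x

# With sigma(s1) = {a: 0.4, b: 0.6}: sigma_x plays a iff x_n <= 0.4, as on the slide.
sigma = lambda history: {"a": 0.4, "b": 0.6}
print(derandomize(sigma, [0.3, 0.9])(["s1"]))          # 'a'  (x_0 = 0.3 <= 0.4)
print(derandomize(sigma, [0.3, 0.9])(["s1", "s1"]))    # 'b'  (x_1 = 0.9 >  0.4)
```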
Randomness for free in strategies
Assume x = (xn)n≥0 and y = (yn)n≥0 are fixed. Then σ determines a unique outcome… which is the outcome of some pure strategy!
Theorem: the probability of the objective under σ from s0 equals the probability that this outcome satisfies the objective when (xn) and (yn) are chosen uniformly at random.
Playing σ in G is thus equivalent to choosing (xn) and (yn) uniformly at random and producing the corresponding outcome. The random sequences (xn) and (yn) can be generated separately, and independently! The x-tosses define the pure strategy σ_x; the y-tosses resolve the probabilistic transitions. [figure: binary tree of coin tosses]
Proof
The value of σ is the «average of the outcomes» of the strategies σ_x: by Fubini's theorem, integrating first over the transition tosses y (for fixed x) yields exactly the value of the pure strategy σ_x, and then averaging over x cannot exceed the best σ_x.
Pure and randomized strategies are equally powerful in POMDPs!
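Written out, the averaging step is (a sketch in the slides' notation, with λ the coin-tossing product measure on [0,1]^ω; the paper's exact formulation may differ):

```latex
% Fubini / averaging step behind "randomness for free in strategies".
\Pr\nolimits^{\sigma}_{s_0}(\varphi)
   \;=\; \int_{[0,1]^{\omega}} \Pr\nolimits^{\sigma_x}_{s_0}(\varphi)\, d\lambda(x)
   \;\le\; \sup_{x\in[0,1]^{\omega}} \Pr\nolimits^{\sigma_x}_{s_0}(\varphi).
% Hence some pure strategy sigma_x achieves at least the value of sigma; the
% other direction is immediate since pure strategies are randomized strategies.
```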
Randomness for free In transition function: In strategies:
Outline Randomness in game structure - for free with complete-observation, concurrent - for free with one-sided, turn-based Randomness in strategies - for free in (PO)MDP Corollary: undecidability results
Corollary
Using [Baier, Bertrand, Größer '08]: almost-sure coBüchi (and positive Büchi) with randomized (or pure) strategies is undecidable for POMDPs.
Using randomness for free in transition functions: almost-sure coBüchi (and positive Büchi) is undecidable for deterministic turn-based games with one-sided complete observation.
Thank you ! Questions ?