Energy and Mean-Payoff Parity Markov Decision Processes Laurent Doyen LSV, ENS Cachan & CNRS Krishnendu Chatterjee IST Austria MFCS 2011.

Games for system analysis
A system interacts with its environment through inputs and outputs, against a specification φ(input, output).
- Verification: check whether a given system is correct → reduces to graph searching.
- Synthesis: construct a correct system → reduces to game solving, i.e., finding a winning strategy.
This talk: the environment is abstracted as a stochastic process → the model is a Markov decision process (MDP).

Markov decision process

Markov decision process (MDP)
- Nondeterministic states (player 1)
- Probabilistic states (player 2)
A strategy (policy) is a recipe to extend the play prefix.
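
The slide's picture can be sketched in code. A minimal, illustrative encoding (state names, kinds, and probabilities are all made up for the example): player-1 states carry a list of choices, probabilistic states a distribution over successors, and a memoryless strategy resolves the nondeterminism.

```python
import random

# Illustrative toy MDP: player-1 states list their choices;
# probabilistic states map successors to probabilities.
mdp = {
    "q0": {"kind": "player1", "succ": ["p0", "q1"]},
    "p0": {"kind": "prob", "succ": {"q0": 0.5, "q1": 0.5}},
    "q1": {"kind": "player1", "succ": ["q1"]},
}

# A memoryless strategy (policy): one chosen successor per player-1 state.
strategy = {"q0": "p0", "q1": "q1"}

def step(state):
    """Extend the play by one step: the strategy resolves player-1
    nondeterminism; probabilistic states are sampled."""
    node = mdp[state]
    if node["kind"] == "player1":
        return strategy[state]
    succs, probs = zip(*node["succ"].items())
    return random.choices(succs, weights=probs)[0]
```

Fixing the strategy turns the MDP into a Markov chain over the same states, which is exactly the view used on the next slides.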

Objective
Fix a strategy → the MDP becomes an (infinite) Markov chain.
A strategy is almost-sure winning if, with probability 1:
- Büchi: accepting states are visited infinitely often;
- Parity: the least priority visited infinitely often is even.

Decision problem
Given an MDP, decide whether there exists an almost-sure winning strategy for a parity objective.

Decision problem
End-component = a set of states U such that:
- if q ∈ U is a player-1 state, then some successor of q is in U;
- if q ∈ U is a probabilistic state, then all successors of q are in U;
- U is strongly connected.
An end-component is good if its least priority is even. Almost-sure reachability to good end-components is in PTIME.
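
The end-component conditions above translate directly into a check. A sketch under the same toy encoding used elsewhere in these notes (player-1 states hold a list of choices, probabilistic states a distribution; all names are illustrative):

```python
def _succs(node):
    # successors of a state, regardless of its kind
    s = node["succ"]
    return s if isinstance(s, list) else list(s)

def _reachable(src, U, mdp):
    """States of U reachable from src using only edges inside U."""
    seen, stack = {src}, [src]
    while stack:
        q = stack.pop()
        for s in _succs(mdp[q]):
            if s in U and s not in seen:
                seen.add(s)
                stack.append(s)
    return seen

def is_end_component(U, mdp):
    U = set(U)
    for q in U:
        node, succs = mdp[q], _succs(mdp[q])
        # player-1 states: some successor stays in U
        if node["kind"] == "player1" and not any(s in U for s in succs):
            return False
        # probabilistic states: all successors stay in U
        if node["kind"] == "prob" and not all(s in U for s in succs):
            return False
    # U must be strongly connected
    return all(_reachable(q, U, mdp) == U for q in U)

# Illustrative example MDP.
mdp = {
    "q0": {"kind": "player1", "succ": ["p0"]},
    "p0": {"kind": "prob", "succ": {"q0": 0.5, "q1": 0.5}},
    "q1": {"kind": "player1", "succ": ["q1"]},
}
```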

Objectives
- Parity condition (qualitative): ω-regular specifications (reactivity, liveness, …)
- Energy condition (quantitative): resource-constrained specifications

Energy objective
Positive and negative weights (encoded in binary).
Energy level = initial credit + sum of weights along the play, e.g., 20, 21, 11, 1, 2, … with initial credit 20.
A play is winning if the energy level is always nonnegative: "never exhaust the resource (memory, battery, …)".
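
The running energy level and the least sufficient initial credit for a finite play are easy to compute. A sketch (the weight sequence used in the test is an assumption chosen to reproduce the slide's example levels; the slide only shows the levels):

```python
def energy_levels(initial_credit, weights):
    """Energy levels along a play: initial credit plus running sum of weights."""
    levels, level = [], initial_credit
    for w in weights:
        level += w
        levels.append(level)
    return levels

def min_initial_credit(weights):
    """Smallest initial credit keeping every level nonnegative on this
    finite play: negate the minimum prefix sum (0 if never negative)."""
    level, worst = 0, 0
    for w in weights:
        level += w
        worst = min(worst, level)
    return -worst
```

With weights [1, -10, -10, 1] and initial credit 20, the levels are 21, 11, 1, 2 as on the slide; an initial credit of 19 would already suffice.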

Decision problem
Given a weighted MDP, decide whether there exist an initial credit c0 and an almost-sure winning strategy to maintain the energy level always nonnegative.
Equivalent to a two-player game (probabilistic states become player-2 states): if player 2 can force a negative energy level on some path, then that path is finite and has positive probability in the MDP.

Energy Parity MDP

Objectives
- Parity condition (qualitative): ω-regular specifications (reactivity, liveness, …)
- Energy condition (quantitative): resource-constrained specifications
- Energy parity MDP: mixed qualitative-quantitative

Energy parity MDP
A strategy is almost-sure winning with initial credit c0 if, with probability 1, both the energy condition and the parity condition hold: "never exhaust the resource" and "always eventually do something useful".
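
For ultimately periodic plays (a finite prefix followed by a repeated cycle) the combined condition is checkable directly. A sketch with an illustrative encoding — (state, weight) pairs and a priority map are my own choices, not the slides':

```python
def winning_lasso(initial_credit, prefix, cycle, priorities):
    """Check the energy parity objective on the play prefix + cycle repeated
    forever: the energy level must never go negative, and the least priority
    seen infinitely often (i.e., on the cycle) must be even."""
    level = initial_credit
    for _, w in prefix + cycle:
        level += w
        if level < 0:
            return False
    # After one safe traversal, energy stays safe forever iff the
    # cycle's total weight is nonnegative.
    if sum(w for _, w in cycle) < 0:
        return False
    return min(priorities[q] for q, _ in cycle) % 2 == 0
```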

Algorithm for energy Büchi MDPs
For parity, the probabilistic player is my friend; for energy, the probabilistic player is my opponent.
Replace each probabilistic state by a gadget: this reduces energy Büchi MDPs to energy Büchi games.
Reduction of energy parity MDPs to energy Büchi MDPs: player 1 can guess an even priority 2i and win in the energy Büchi MDP where the Büchi states are the 2i-states and transitions to states with priority < 2i are disallowed.
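
The guessing step of the reduction can be sketched as a graph transformation. A hypothetical encoding (the priority map and successor lists are my own; the slides only describe the construction in words):

```python
def buchi_reduction(priorities, transitions, guess):
    """For a guessed even priority `guess`: Büchi states are the states of
    priority exactly `guess`; states of priority < guess, and transitions
    into them, are disallowed."""
    assert guess % 2 == 0
    allowed = {q for q, p in priorities.items() if p >= guess}
    buchi = {q for q, p in priorities.items() if p == guess}
    trans = {q: [s for s in transitions[q] if s in allowed]
             for q in transitions if q in allowed}
    return buchi, trans
```

This mirrors the slide: player 1 guesses an even priority 2i and then plays in the restricted energy Büchi MDP.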

Mean-payoff Parity MDP

Mean-payoff
Mean-payoff value of a play = limit-average of the visited weights.
The optimal mean-payoff value can be achieved with a memoryless strategy.
Decision problem: given a rational threshold, decide whether there exists a strategy for player 1 to ensure a mean-payoff value at least the threshold with probability 1.

Mean-payoff games
A memoryless strategy σ ensures nonnegative mean-payoff value
iff all cycles in G_σ are nonnegative
iff σ is winning in the energy game.
Mean-payoff games with threshold 0 are equivalent to energy games.
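
The cycle analysis behind this equivalence is easy to test on the graph G_σ induced by a memoryless strategy. A sketch using Bellman-Ford-style negative-cycle detection (the edge-list encoding is illustrative):

```python
def cycle_mean(weights):
    """Mean-payoff of a play that eventually repeats this cycle:
    the limit-average ignores any finite prefix."""
    return sum(weights) / len(weights)

def has_negative_cycle(n, edges):
    """Bellman-Ford check on a graph with n nodes and weighted edges
    (u, v, w): updates that persist for n rounds expose a negative cycle."""
    dist = [0] * n
    for _ in range(n):
        updated = False
        for u, v, w in edges:
            if dist[u] + w < dist[v]:
                dist[v] = dist[u] + w
                updated = True
        if not updated:
            return False  # relaxation converged: all cycles nonnegative
    return True
```

So σ ensures nonnegative mean-payoff in G_σ exactly when `has_negative_cycle` reports False, matching the equivalence on the slide.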

Mean-payoff vs. Energy

           Energy      Mean-Payoff
Games      NP ∩ coNP   NP ∩ coNP
MDPs       NP ∩ coNP   PTIME

Mean-payoff parity MDPs
Find a strategy which ensures, with probability 1:
- the parity condition, and
- mean-payoff value ≥ threshold.
The gadget reduction does not work: there are MDPs where player 1 almost-surely wins, but player 1 loses the game obtained from the gadget.

Algorithm for mean-payoff parity MDPs
End-component analysis:
- almost-surely, all states of the end-component can be visited infinitely often;
- the expected mean-payoff value is the same in all states of the end-component (which is strongly connected).
An end-component is good if its least priority is even and its expected mean-payoff value is at least the threshold.
Almost-sure reachability to good end-components is in PTIME.
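
The good-end-component test can be sketched once the expected mean-payoff value of the component is known; for an irreducible, aperiodic component with a fixed strategy, power iteration on the distribution approximates that value (the matrix and weights below are illustrative):

```python
def expected_mean_payoff(P, w, iters=5000):
    """Expected mean-payoff of an irreducible, aperiodic Markov chain:
    approximate the stationary distribution by power iteration, then
    average the state weights under it."""
    n = len(P)
    dist = [1.0 / n] * n
    for _ in range(iters):
        dist = [sum(dist[i] * P[i][j] for i in range(n)) for j in range(n)]
    return sum(dist[i] * w[i] for i in range(n))

def is_good_end_component(priorities, value, threshold):
    """Good = least priority even AND expected mean-payoff meets the threshold."""
    return min(priorities) % 2 == 0 and value >= threshold
```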

MDPs
- Qualitative: parity MDPs
- Quantitative: energy MDPs, mean-payoff MDPs
- Mixed qualitative-quantitative: energy parity MDPs, mean-payoff parity MDPs

Summary

Algorithmic complexity:

           Energy parity   Mean-payoff parity
Games      NP ∩ coNP       NP ∩ coNP
MDPs       NP ∩ coNP       PTIME

Strategy complexity (memory bounds):

           Energy parity   Mean-payoff parity
Games      n·d·W           infinite
MDPs       2·n·W           infinite

Thank you! Questions? The end

References
[CdAHS03] A. Chakrabarti, L. de Alfaro, T. A. Henzinger, and M. Stoelinga. Resource Interfaces. Proc. of EMSOFT: Embedded Software, LNCS 2855, Springer, 2003.
[EM79] A. Ehrenfeucht and J. Mycielski. Positional Strategies for Mean-Payoff Games. International Journal of Game Theory, vol. 8, 1979.
[BFL+08] P. Bouyer, U. Fahrenberg, K. G. Larsen, N. Markey, and J. Srba. Infinite Runs in Weighted Timed Automata with Energy Constraints. Proc. of FORMATS: Formal Modeling and Analysis of Timed Systems, LNCS 5215, Springer, 2008.
[CHJ05] K. Chatterjee, T. A. Henzinger, and M. Jurdzinski. Mean-payoff Parity Games. Proc. of LICS: Logic in Computer Science, IEEE, 2005.

Energy parity

Mean-payoff parity

Complexity

                      Strategy                       Algorithmic
                      Player 1       Player 2        complexity
Energy                memoryless     memoryless      NP ∩ coNP
Parity                memoryless     memoryless      NP ∩ coNP
Energy parity         exponential    memoryless      NP ∩ coNP
Mean-payoff           memoryless     memoryless      NP ∩ coNP
Mean-payoff parity    infinite       memoryless      NP ∩ coNP