
Discounting the Future in Systems Theory. Chess Review, May 11, 2005, Berkeley, CA. Luca de Alfaro (UC Santa Cruz), Tom Henzinger (UC Berkeley), Rupak Majumdar (UC Los Angeles).

A Graph Model of a System [figure: a graph with states a, b, c]

Property ◊c ("eventually c") [figure: the graph with states a, b, c]

∃◊c … some trace has the property ◊c. [figure: the graph with states a, b, c]

Property ◊c ("eventually c"). ∃◊c … some trace has the property ◊c. ∀◊c … all traces have the property ◊c. [figure: the graph with states a, b, c]

Richer Models. Starting from the plain graph: ADVERSARIAL CONCURRENCY gives a game graph; FAIRNESS gives an ω-automaton; PROBABILITIES give a Markov decision process; in combination: stochastic game, parity game.

Concurrent Game [figure: the graph with states a, b, c, each edge labeled by a pair of moves of player "left" and player "right"]: for modeling open systems [Abramsky, Alur, Kupferman, Vardi, …]; for strategy synthesis ("control") [Ramadge, Wonham, Pnueli, Rosner].

Property ◊c. ⟨⟨left⟩⟩◊c … player "left" has a strategy to enforce ◊c. [figure: the concurrent game with states a, b, c]

Property ◊c. ⟨⟨left⟩⟩◊c … player "left" has a strategy to enforce ◊c. ⟨left⟩◊c … player "left" has a randomized strategy to enforce ◊c (e.g. Pr(1): 0.5, Pr(2): 0.5). [figure: the concurrent game with states a, b, c]

Qualitative Models. Trace: sequence of observations. Property p: assigns a reward to each trace; boolean rewards (𝔹). Model m: generates a set of traces; a (game) graph. Value(p,m): defined from the rewards of the generated traces; ∃ or ∀ (∃∀).

Stochastic Game [figure: the game with states a, b, c; for each state and each pair of moves of players "left" and "right", a probability distribution over successor states, e.g. a: 0.6, b: 0.4]

Probability with which player "left" can enforce ◊c? Property ◊c. [figure: the same stochastic game]

Semi-Quantitative Models. Trace: sequence of observations. Property p: assigns a reward to each trace; still boolean rewards. Model m: generates a set of traces; a (game) graph. Value(p,m): defined from the rewards of the generated traces; now sup or inf (sup inf), with values in [0,1] ⊆ ℝ.

A Systems Theory: a class of properties p over traces; an algorithm for computing Value(p,m) over models m; a distance between models w.r.t. property values.

A Systems Theory: GRAPHS. Class of properties p over traces: ω-regular properties. Algorithm for computing Value(p,m) over models m: μ-calculus. Distance between models w.r.t. property values: bisimilarity.

Transition Graph. Q states; δ: Q → 2^Q transition relation.

Graph Regions. Q states; δ: Q → 2^Q transition relation; regions [Q → 𝔹]; ∃pre, ∀pre map regions to regions. [figure: a region R ⊆ Q together with ∃pre(R) and ∀pre(R)]

Graph Property Values: Reachability. ∃◊R: given R ⊆ Q, find the states from which some trace leads to R. [figure: a target region R]

R... RR R [  pre(R) R [  pre(R) [  pre 2 (R)   R = (  X) (R Ç 9 pre(X)) Given R µ Q, find the states from which some trace leads to R. Graph Property Values: Reachability

R... RR R [  pre(R) R [  pre(R) [  pre 2 (R)   R = (  X) (R Ç 8 pre(X)) Given R µ Q, find the states from which all traces lead to R. Graph Property Values: Reachability

Concurrent Game. Q states; Σ_l, Σ_r moves of the two players; δ: Q × Σ_l × Σ_r → Q transition function.

Game Regions. Q states; Σ_l, Σ_r moves of both players; δ: Q × Σ_l × Σ_r → Q transition function; regions [Q → 𝔹]; lpre, rpre map regions to regions: q ∈ lpre(R) iff (∃σ_l ∈ Σ_l) (∀σ_r ∈ Σ_r) δ(q, σ_l, σ_r) ∈ R. [figure: a region R ⊆ Q and lpre(R), edges labeled by move pairs]

Game Property Values: Reachability. ⟨⟨left⟩⟩◊R: given R ⊆ Q, find the states from which player "left" has a strategy to force the game to R. [figure: a target region R]

Game Property Values: Reachability. Given R ⊆ Q, find the states from which player "left" has a strategy to force the game to R. Iterate R, R ∪ lpre(R), R ∪ lpre(R) ∪ lpre²(R), …; ⟨⟨left⟩⟩◊R = (μX)(R ∨ lpre(X)).
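
A small Python sketch of the same iteration for a deterministic concurrent game, not from the slides; the transition encoding delta[q][(l, r)] and the toy game below are made up for illustration.

```python
# Hypothetical sketch: <<left>> reachability in a deterministic concurrent game.
# delta[q] maps a pair of moves (left_move, right_move) to the successor state;
# every state is assumed to offer the full product of its move sets.

def lpre(delta, X):
    """States where some move of player 'left' sends the game into X
    against every move of player 'right'."""
    out = set()
    for q, trans in delta.items():
        lefts = {l for (l, r) in trans}
        rights = {r for (l, r) in trans}
        if any(all(trans[(l, r)] in X for r in rights) for l in lefts):
            out.add(q)
    return out

def left_reach(delta, R):
    """<<left>> diamond R  =  (mu X)(R or lpre(X))."""
    X = set(R)
    while True:
        nxt = X | lpre(delta, X)
        if nxt == X:
            return X
        X = nxt

# Toy game with states a, b, c and moves 1, 2 for each player.
delta = {
    "a": {(1, 1): "a", (1, 2): "b", (2, 1): "b", (2, 2): "a"},
    "b": {(1, 1): "c", (1, 2): "c", (2, 1): "a", (2, 2): "b"},
    "c": {(1, 1): "c", (1, 2): "c", (2, 1): "c", (2, 2): "c"},
}
print(left_reach(delta, {"c"}))   # {'b', 'c'} here: from a, player "right" can keep the play away from c
```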

An Open Systems Theory: GAME GRAPHS. Class of winning conditions p over traces: ω-regular properties. Algorithm for computing Value(p,m) over models m: (lpre, rpre) fixpoint calculus. Distance between models w.r.t. property values: alternating bisimilarity [Alur, H, Kupferman, Vardi].

An Open Systems Theory: GAME GRAPHS. Class of winning conditions p over traces: ω-regular properties. Algorithm for computing Value(p,m) over models m: (lpre, rpre) fixpoint calculus. Every deterministic fixpoint formula φ computes Value(p,m), where p is the linear interpretation [Vardi] of φ. Example: ⟨⟨left⟩⟩◊R = (μX)(R ∨ lpre(X)).

An Open Systems Theory: GAME GRAPHS. Algorithm for computing Value(p,m) over models m: (lpre, rpre) fixpoint calculus. Distance between models w.r.t. property values: alternating bisimilarity. Two states agree on the values of all fixpoint formulas iff they are alternating bisimilar [Alur, H, Kupferman, Vardi].

Stochastic Game. Q states; Σ_l, Σ_r moves of both players; δ: Q × Σ_l × Σ_r → Dist(Q) probabilistic transition function.

Quantitative Game Regions. Q states; Σ_l, Σ_r moves of both players; δ: Q × Σ_l × Σ_r → Dist(Q) probabilistic transition function; quantitative regions [Q → [0,1]]; lpre, rpre map quantitative regions to quantitative regions: lpre(R)(q) = (sup σ_l ∈ Σ_l) (inf σ_r ∈ Σ_r) R(δ(q, σ_l, σ_r)).

Quantitative Game Regions. Q states; Σ_l, Σ_r moves of both players; δ: Q × Σ_l × Σ_r → Dist(Q) probabilistic transition function; quantitative regions [Q → [0,1]] (replacing the booleans 𝔹); lpre, rpre map quantitative regions to quantitative regions, with sup/inf taking the place of ∃/∀: lpre(R)(q) = (sup σ_l ∈ Σ_l) (inf σ_r ∈ Σ_r) R(δ(q, σ_l, σ_r)).

Probability with which player "left" can enforce ◊c: (μX)(c ∨ lpre(X)), where ∨ = pointwise max. [figure: the stochastic game; the initial iterate written at each state: 1 at c, 0 at a and b]

Probability with which player "left" can enforce ◊c: (μX)(c ∨ lpre(X)), where ∨ = pointwise max. [figure: the stochastic game; the next iterate written at each state: 0, 1, 1]

Probability with which player "left" can enforce ◊c: (μX)(c ∨ lpre(X)), where ∨ = pointwise max. [figure: the stochastic game; a later iterate written at each state]

Probability with which player "left" can enforce ◊c: (μX)(c ∨ lpre(X)), where ∨ = pointwise max. [figure: the stochastic game; a further iterate written at each state]

Probability with which player "left" can enforce ◊c: (μX)(c ∨ lpre(X)), where ∨ = pointwise max. [figure: the stochastic game; the limit values 1, 1, 1 at the states] In the limit, the deterministic fixpoint formulas work for all ω-regular properties [de Alfaro, Majumdar].
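
A Python sketch of this value iteration, not from the talk: lpre follows the quantitative definition from the "Quantitative Game Regions" slide, the transition probabilities below are invented (they are not the ones in the slide's figure), and since the plain iteration converges only in the limit, the loop stops at a small tolerance.

```python
# Hypothetical sketch: probability with which player 'left' can enforce
# "eventually c", computed as (mu X)(c or lpre(X)) with 'or' read as
# pointwise max.  Convergence holds only in the limit, so we stop at a tolerance.

def lpre(delta, X):
    """lpre(X)(q) = sup over left moves of inf over right moves of E[X(successor)]."""
    val = {}
    for q, trans in delta.items():
        lefts = {l for (l, r) in trans}
        rights = {r for (l, r) in trans}
        val[q] = max(
            min(sum(p * X[s] for s, p in trans[(l, r)].items()) for r in rights)
            for l in lefts
        )
    return val

def left_reach_prob(delta, target, tol=1e-9, max_iter=100_000):
    tgt = {q: (1.0 if q in target else 0.0) for q in delta}   # indicator of the target
    X = dict(tgt)
    for _ in range(max_iter):
        pre = lpre(delta, X)
        nxt = {q: max(tgt[q], pre[q]) for q in delta}          # c 'or' lpre(X), pointwise max
        if max(abs(nxt[q] - X[q]) for q in delta) < tol:
            return nxt
        X = nxt
    return X

# Toy stochastic game: delta[q][(l, r)] is a distribution over successor states.
delta = {
    "a": {(1, 1): {"a": 0.5, "b": 0.5}, (1, 2): {"b": 1.0},
          (2, 1): {"a": 1.0},           (2, 2): {"a": 0.2, "b": 0.8}},
    "b": {(1, 1): {"c": 1.0},           (1, 2): {"a": 0.7, "c": 0.3},
          (2, 1): {"b": 1.0},           (2, 2): {"c": 1.0}},
    "c": {(1, 1): {"c": 1.0}},
}
print(left_reach_prob(delta, {"c"}))
```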

A Probabilistic Systems Theory: MARKOV DECISION PROCESSES. Class of properties p over traces: ω-regular properties. Algorithm for computing Value(p,m) over models m: quantitative fixpoint calculus. Distance between models w.r.t. property values: quantitative bisimilarity [Desharnais, Gupta, Jagadeesan, Panangaden].

A Probabilistic Systems Theory: MARKOV DECISION PROCESSES. Class of properties p over traces: quantitative ω-regular properties. Algorithm for computing Value(p,m) over models m: quantitative fixpoint calculus. Every deterministic fixpoint formula φ computes the expected Value(p,m), where p is the linear interpretation of φ. Example: max expected value of satisfying ◊R = (μX)(R ∨ ∃pre(X)).

Qualitative Bisimilarity. e: Q² → {0,1} … equivalence relation; F … function on equivalences: F(e)(q,q') = 0 if q and q' disagree on observations, = min { e(r,r') | r ∈ ∃pre(q) ∧ r' ∈ ∃pre(q') } otherwise. Qualitative bisimilarity … greatest fixpoint of F.

Quantitative Bisimilarity. d: Q² → [0,1] … pseudo-metric ("distance"); F … function on pseudo-metrics: F(d)(q,q') = 1 if q and q' disagree on observations, ≈ max of sup_l inf_r d(δ(q,l,r), δ(q',l,r)) and sup_r inf_l d(δ(q,l,r), δ(q',l,r)) otherwise. Quantitative bisimilarity … greatest fixpoint of F. A natural generalization of bisimilarity from binary relations to pseudo-metrics.

A Probabilistic Systems Theory: MARKOV DECISION PROCESSES. Algorithm for computing Value(p,m) over models m: quantitative fixpoint calculus. Distance between models w.r.t. property values: quantitative bisimilarity. Two states agree on the values of all quantitative fixpoint formulas iff their quantitative bisimilarity distance is 0.

Great, BUT … (1) The theory is too precise: even the smallest change in the probability of a transition can cause an arbitrarily large change in the value of a property. (2) The theory is not computational: we cannot bound the rate of convergence for quantitative fixpoint formulas.

Solution: Discounting. Economics: a dollar today is better than a dollar tomorrow. Value of $1 today: 1; tomorrow: α, for discount factor 0 < α < 1; the day after tomorrow: α²; and so on.

Solution: Discounting. Economics: a dollar today is better than a dollar tomorrow. Value of $1 today: 1; tomorrow: α, for discount factor 0 < α < 1; the day after tomorrow: α²; and so on. Engineering: a bug today is worse than a bug tomorrow.

Discounted Reachability. Reward(◊c) = α^k if c is first true after k transitions, and 0 if c is never true. The reward is larger the sooner c is satisfied.

Discounted Property ◊c [figure: the graph with states a, b, c, each state annotated with its discounted value of ◊c: 1, α, α²]

Discounted fixpoint calculus: pre(·) becomes α · pre(·). [figure: the graph with states a, b, c, annotated with discounted ◊c values 1, α, α²]
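
A small Python sketch of this discounted evaluation on a plain graph, under the existential reading (pre = max over successors); the graph and the discount factor are made up for illustration.

```python
# Hypothetical sketch: discounted "eventually c" on a plain graph, computed as
# (mu X)(c or alpha * Epre(X)) with 'or' as pointwise max and
# Epre(X)(q) = max of X over the successors of q.

def discounted_reach(delta, target, alpha, tol=1e-9):
    X = {q: (1.0 if q in target else 0.0) for q in delta}
    while True:
        nxt = {q: max((1.0 if q in target else 0.0),
                      alpha * max(X[s] for s in delta[q]))
               for q in delta}
        if max(abs(nxt[q] - X[q]) for q in delta) < tol:
            return nxt
        X = nxt

delta = {"a": {"a", "b"}, "b": {"b", "c"}, "c": {"c"}}   # toy graph
print(discounted_reach(delta, {"c"}, alpha=0.9))
# values fall off with distance from c:  c -> 1,  b -> alpha,  a -> alpha^2
```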

Fully Quantitative Models. Trace: sequence of observations. Property p: assigns a reward to each trace; now a real reward. Model m: generates a set of traces; a (game) graph. Value(p,m): defined from the rewards of the generated traces; sup or inf (sup inf), with values in [0,1] ⊆ ℝ.

Discounted Bisimilarity. d: Q² → [0,1] … pseudo-metric ("distance"); F … function on pseudo-metrics: F(d)(q,q') = 1 if q and q' disagree on observations, ≈ α · max of sup_l inf_r d(δ(q,l,r), δ(q',l,r)) and sup_r inf_l d(δ(q,l,r), δ(q',l,r)) otherwise. Discounted bisimilarity … greatest fixpoint of F.
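
A Python sketch of the induced iteration, not from the talk, and simplified to deterministic transitions so that d is applied to states rather than to distributions (the slide's "≈" hides exactly that lifting); the toy game, observations, and discount factor are made up. With 0 < α < 1 the functional is a contraction, so iterating from any starting metric reaches its unique fixpoint.

```python
# Hypothetical sketch: discounted bisimilarity distances for a deterministic
# concurrent game.  delta[q][(l, r)] is the successor state, obs[q] the
# observation at q; all states are assumed to share the same move pairs.

def discounted_bisim(delta, obs, alpha, tol=1e-9):
    states = list(delta)
    d = {(q, p): 0.0 for q in states for p in states}
    while True:
        nxt = {}
        for q in states:
            for p in states:
                if obs[q] != obs[p]:
                    nxt[(q, p)] = 1.0
                else:
                    lefts = {l for (l, r) in delta[q]}
                    rights = {r for (l, r) in delta[q]}
                    left_view = max(min(d[(delta[q][(l, r)], delta[p][(l, r)])]
                                        for r in rights) for l in lefts)
                    right_view = max(min(d[(delta[q][(l, r)], delta[p][(l, r)])]
                                         for l in lefts) for r in rights)
                    nxt[(q, p)] = alpha * max(left_view, right_view)
        if max(abs(nxt[k] - d[k]) for k in d) < tol:
            return nxt
        d = nxt

# Toy game: a and b carry the same observation, c a different one.
delta = {
    "a": {(1, 1): "a", (1, 2): "a", (2, 1): "a", (2, 2): "a"},
    "b": {(1, 1): "c", (1, 2): "c", (2, 1): "b", (2, 2): "b"},
    "c": {(1, 1): "c", (1, 2): "c", (2, 1): "c", (2, 2): "c"},
}
obs = {"a": "white", "b": "white", "c": "black"}
print(discounted_bisim(delta, obs, alpha=0.8))
# d(a, b) comes out as alpha: a and b agree on the current observation, but
# player 'left' can force b (and not a) into the 'black' state one step later.
```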

A Discounted Systems Theory: STOCHASTIC GAMES. Class of winning rewards p over traces: discounted ω-regular properties. Algorithm for computing Value(p,m) over models m: discounted fixpoint calculus. Distance between models w.r.t. property values: discounted bisimilarity.

A Discounted Systems Theory: STOCHASTIC GAMES. Class of expected rewards p over traces: discounted ω-regular properties. Algorithm for computing Value(p,m) over models m: discounted fixpoint calculus. Every discounted deterministic fixpoint formula φ computes Value(p,m), where p is the linear interpretation of φ. Example: the max expected reward for ◊R achievable by player "left" = (μX)(R ∨ α · lpre(X)).

A Discounted Systems Theory: STOCHASTIC GAMES. Algorithm for computing Value(p,m) over models m: discounted fixpoint calculus. Distance between models w.r.t. property values: discounted bisimilarity. The difference between two states in the values of discounted fixpoint formulas is bounded by their discounted bisimilarity distance.

Discounting is Robust. Continuity over Traces: every discounted fixpoint formula defines a reward function on traces that is continuous in the Cantor metric. Continuity over Models: if transition probabilities are perturbed by ε, then discounted bisimilarity distances change by at most f(ε). Discounting is robust against effects at infinity, and against numerical perturbations.

Discounting is Computational. The iterative evaluation of an α-discounted fixpoint formula converges geometrically in α. (So we can compute to any desired precision.)
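
In symbols, a back-of-the-envelope bound (stated here for illustration, assuming each iteration shrinks the error by the factor α): to reach precision ε it suffices to iterate k times with

```latex
\alpha^{k} \le \varepsilon
\quad\Longleftrightarrow\quad
k \;\ge\; \frac{\ln \varepsilon}{\ln \alpha}
      \;=\; \frac{\ln (1/\varepsilon)}{\ln (1/\alpha)},
```

so the number of iterations grows only logarithmically in 1/ε.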

Discounting is Approximation. If the discount factor tends to 1, we recover the classical theory: lim_{α→1} (α-discounted interpretation of fixpoint formula φ) = classical interpretation of φ; lim_{α→1} (α-discounted bisimilarity) = classical (alternating, quantitative) bisimilarity.

Further Work. Exact computation of discounted values of temporal formulas over finite-state systems [de Alfaro, Faella, H, Majumdar, Stoelinga]. Discounting real-time systems: continuous discounting of the time delay rather than discrete discounting of the number of steps [Prabhu].

Conclusions. Discounting provides a continuous and computational approximation theory of discrete and probabilistic processes. Discounting captures an important engineering intuition: "In the long run, we're all dead." (J.M. Keynes)