Planning and Verification for Stochastic Processes with Asynchronous Events
Håkan L. S. Younes
Carnegie Mellon University

Introduction
- Asynchronous processes are abundant in the real world: telephone systems, computer networks, etc.
- Discrete-time models are inappropriate for systems with asynchronous events
- We need to go beyond semi-Markov models!

Two Problems
- Model checking: given a model of an asynchronous system, check whether some property holds
- Planning: in the presence of asynchronous events, find a control policy that satisfies the specified objectives
My thesis research: provide solutions to both problems

Illustrative Example: Tandem Queuing Network
[Figure: two queues, q1 and q2, connected in series, with events arrive (into q1), route (from q1 to q2), and depart (from q2); a sample execution path shows the queue contents changing at times t = 0, 1.2, 3.7, 3.9, and 5.5.]
With both queues empty, is the probability less than 0.5 that both queues become full within 5 seconds?
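To make the example concrete, here is a minimal sketch of the tandem network as a stochastic discrete event system. The event rates and the capacity of 2 per queue are illustrative assumptions (the capacity matches the full-queue formula two slides below); the actual benchmark uses different parameters and distributions.

```python
import random

ARRIVE, ROUTE, DEPART = 1.0, 2.0, 1.0   # event rates (assumed)
CAPACITY = 2                            # "full" means both queues at 2

def enabled_events(q1, q2):
    """Return the (name, rate) pairs enabled in state (q1, q2)."""
    events = []
    if q1 < CAPACITY:
        events.append(("arrive", ARRIVE))   # customer enters queue 1
    if q1 > 0 and q2 < CAPACITY:
        events.append(("route", ROUTE))     # move from queue 1 to queue 2
    if q2 > 0:
        events.append(("depart", DEPART))   # customer leaves queue 2
    return events

def sample_path(horizon):
    """Simulate one path past the time horizon; yield (time, q1, q2)."""
    t, q1, q2 = 0.0, 0, 0
    yield t, q1, q2
    while t < horizon:
        events = enabled_events(q1, q2)
        # Race the enabled events: with exponential delays, the winner
        # is chosen with probability proportional to its rate.
        total = sum(rate for _, rate in events)
        t += random.expovariate(total)
        name = random.choices([n for n, _ in events],
                              [r for _, r in events])[0]
        if name == "arrive":
            q1 += 1
        elif name == "route":
            q1, q2 = q1 - 1, q2 + 1
        else:
            q2 -= 1
        yield t, q1, q2
```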

Probabilistic Model Checking
- Given a model M, a state s, and a property φ, does φ hold in s for M?
- Model: a stochastic discrete event system
- Property: a probabilistic temporal logic formula

Probabilistic Model Checking: Example
With both queues empty, is the probability less than 0.5 that both queues become full within 5 seconds?
State: q1 = 0 ∧ q2 = 0
Property (CSL): P<0.5(true U≤5 q1 = 2 ∧ q2 = 2)
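Over a single simulated path, a time-bounded until formula like this one can be checked by scanning the states in order of entry. A minimal sketch, reusing sample_path from above:

```python
def holds_on_path(path, bound=5.0, cap=2):
    """Check true U<=bound (q1 = cap and q2 = cap) over one sample path.
    The state is constant between events, so it suffices to test each
    state at the time it is entered."""
    for t, q1, q2 in path:
        if t > bound:
            return False        # bound passed without reaching the goal
        if q1 == cap and q2 == cap:
            return True         # goal state entered within the bound
    return False
```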

Statistical Solution Method [Younes & Simmons, CAV-02]
- Use discrete event simulation to generate sample paths
- Use acceptance sampling to verify probabilistic properties
- Hypothesis: P≥θ(φ)
- Observation: verify φ over a single sample path
- Not estimation!
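Acceptance sampling can be realized with Wald's sequential probability ratio test, which is the test used in the CAV-02 work. A sketch, with an indifference region of half-width delta around the threshold theta:

```python
import math

def sprt(theta, delta, alpha, beta, observe):
    """Sequential probability ratio test of H0: p >= theta + delta
    against H1: p <= theta - delta, with error bounds alpha (false
    rejection of H0) and beta (false acceptance of H0). `observe`
    draws one Bernoulli sample: did phi hold on a fresh sample path?"""
    p0, p1 = theta + delta, theta - delta
    accept_h1 = math.log((1 - beta) / alpha)   # upper boundary
    accept_h0 = math.log(beta / (1 - alpha))   # lower boundary
    llr = 0.0                                  # log likelihood ratio H1/H0
    while accept_h0 < llr < accept_h1:
        x = observe()
        llr += math.log(p1 / p0) if x else math.log((1 - p1) / (1 - p0))
    return llr <= accept_h0    # True: accept the hypothesis P>=theta(phi)

# Usage with the tandem network sketches above. The slide's property is
# P<0.5(...), i.e. the negation of the hypothesis tested here:
# result = not sprt(0.5, 0.005, 0.01, 0.01,
#                   lambda: holds_on_path(sample_path(5.0)))
```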

Why a Statistical Approach?
Benefits:
- Insensitive to the size of the system
- Easy to trade accuracy for speed
- Easy to parallelize
Alternative: the numerical approach is memory intensive and limited to certain classes of systems

Tandem Queuing Network: Results [Younes et al., TACAS-04]
[Plot: verification time (seconds) against the size of the state space for the property P≥0.5(true U≤T full); parameter settings ε = 10⁻⁶ for the numerical method and α = β = 10⁻², δ = 0.5·10⁻² for the statistical method.]

Tandem Queuing Network: Distributed Sampling
Use multiple machines to generate observations:
- m1: Pentium IV 3 GHz
- m2: Pentium III 733 MHz
- m3: Pentium III 500 MHz
[Plots: the percentage of observations contributed by each machine, and verification time with m1 only versus with all machines, as a function of problem size n.]
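Distributed sampling fits acceptance sampling naturally, since each observation is an independent Bernoulli trial. Below is a single-host sketch with worker processes standing in for the three machines, reusing sample_path and holds_on_path from above; in a real heterogeneous setting, care is needed so that faster machines do not bias the sequential test.

```python
from multiprocessing import Pool
import random

def observe_batch(args):
    """One worker: generate `batch` independent Bernoulli observations."""
    seed, batch = args
    random.seed(seed)               # distinct seeds keep streams independent
    return [holds_on_path(sample_path(5.0)) for _ in range(batch)]

if __name__ == "__main__":
    with Pool(3) as pool:           # three workers, one per "machine"
        for batch in pool.imap_unordered(observe_batch,
                                         [(s, 100) for s in range(3)]):
            pass                    # feed observations into the SPRT above
```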

Planning with Asynchronous Events
- Goal-oriented planning: satisfy a goal condition, expressed in CSL, using local search with policy repair [Younes & Simmons, ICAPS-03, ICAPS-04]
- Decision-theoretic planning: maximize reward in a generalized semi-Markov decision process (GSMDP) [Younes & Simmons, AAAI-04]

Generate, Test and Debug [Simmons, AAAI-88]
1. Generate an initial policy
2. Test whether the policy is good; if it is, stop
3. Debug and repair the policy, then repeat from step 2
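In outline, the loop looks as follows; the three stage functions are passed in as hypothetical stand-ins for the stages above:

```python
def generate_test_debug(problem, generate, test, repair):
    """Generate-test-debug loop (a sketch with hypothetical stage functions)."""
    policy = generate(problem)                  # stage 1
    while True:
        good, sample_paths = test(policy)       # stage 2: statistical test
        if good:
            return policy
        policy = repair(policy, sample_paths)   # stage 3: debug and repair
```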

Policy Generation
1. Eliminate uncertainty: probabilistic planning problem → deterministic planning problem
2. Solve the deterministic problem with a temporal planner (e.g. VHPOP [Younes & Simmons, JAIR 20]) → temporal plan
3. Generate training data by simulating the plan → state-action pairs
4. Decision tree learning over the state-action pairs → policy (decision tree)
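A sketch of this pipeline in code, with the stage functions passed in as hypothetical stand-ins for the determinizer, the temporal planner, and the simulator; scikit-learn's decision tree plays the learner:

```python
from sklearn.tree import DecisionTreeClassifier

def generate_policy(problem, determinize, plan, simulate):
    """Policy generation pipeline (a sketch; stage functions are assumed)."""
    det_problem = determinize(problem)       # step 1: eliminate uncertainty
    temporal_plan = plan(det_problem)        # step 2: e.g. VHPOP
    states, actions = simulate(problem, temporal_plan)   # step 3: training data
    policy = DecisionTreeClassifier().fit(states, actions)  # step 4
    return policy                            # maps state features to an action
```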

Policy Debugging
1. Sample path analysis over sample execution paths → failure scenarios
2. Solve a deterministic planning problem that takes the failure scenario into account → temporal plan
3. Generate training data by simulating the plan → state-action pairs
4. Incremental decision tree learning [Utgoff et al., MLJ 29] → revised policy
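The first step can be caricatured as ranking failed paths; a hedged sketch, where `reached_goal` and `value` are assumed path attributes (the actual sample path analysis scores how states and events contribute to failure):

```python
def failure_scenarios(paths, k=3):
    """Keep the k worst failed sample paths as scenarios to plan against
    (a sketch; the path attributes are assumptions)."""
    failed = [p for p in paths if not p.reached_goal]
    return sorted(failed, key=lambda p: p.value)[:k]
```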

A Model of Stochastic Discrete Event Systems
- Generalized semi-Markov process (GSMP) [Matthes 1962]: a set of events E and a set of states S
- GSMDP: the actions A ⊆ E are controllable events
- Policy: a mapping from execution histories to sets of actions

Events
With each event e are associated:
- A condition φe identifying the set of states in which e is enabled
- A distribution Ge governing the time e must remain enabled before it triggers
- A distribution pe(s′|s) determining the probability that the next state is s′ if e triggers in state s
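These three components map directly onto a data structure, and a next-event simulation step shows why the per-event clocks matter. A minimal sketch, assuming states are dictionaries of propositions:

```python
import random
from dataclasses import dataclass
from typing import Callable

@dataclass(eq=False)   # identity-based hashing, so events can key a dict
class Event:
    """One GSMP event: the triple (phi_e, G_e, p_e) from this slide."""
    enabled: Callable[[dict], bool]       # condition phi_e
    delay: Callable[[], float]            # sample a trigger time from G_e
    next_state: Callable[[dict], dict]    # sample s' from p_e(s'|s)

def gsmp_step(state, clocks, events):
    """Advance one transition. Each enabled event keeps its own clock,
    so events race asynchronously; this is what takes the process
    beyond semi-Markov behavior."""
    for e in events:                       # newly enabled events draw a
        if e.enabled(state) and e not in clocks:   # fresh delay from G_e
            clocks[e] = e.delay()
    winner = min(clocks, key=clocks.get)   # first event to trigger
    dt = clocks.pop(winner)
    for e in clocks:
        clocks[e] -= dt                    # others keep their elapsed time
    state = winner.next_state(state)
    for e in [e for e in clocks if not e.enabled(state)]:
        del clocks[e]                      # disabled events lose their clocks
    return state, dt
```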

Events: Example
Three events e1, e2, e3, with conditions φ1 = a, φ2 = b, φ3 = c and trigger-time distributions G1 = Exp(1), G2 = D(2), G3 = U(0,1).
[Figure: a sample trajectory. At t = 0 all three events are enabled in state a ∧ b ∧ c. At t = 0.6, e1 triggers; because e2 has already been enabled for 0.6 time units, its remaining trigger time is distributed as D(1.4). At t = 2, e2 triggers.]
Asynchronous events take us beyond semi-Markov models.
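The slide's three events, written with the Event sketch above; the slide fixes only the conditions and distributions, so the state effects here are assumptions:

```python
e1 = Event(enabled=lambda s: s["a"],
           delay=lambda: random.expovariate(1.0),    # G1 = Exp(1)
           next_state=lambda s: {**s, "a": False})   # effect assumed
e2 = Event(enabled=lambda s: s["b"],
           delay=lambda: 2.0,                        # G2 = D(2)
           next_state=lambda s: {**s, "b": False})   # effect assumed
e3 = Event(enabled=lambda s: s["c"],
           delay=lambda: random.uniform(0.0, 1.0),   # G3 = U(0,1)
           next_state=lambda s: {**s, "c": False})   # effect assumed
```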

Rewards and Optimality
- A lump-sum reward k(s, e, s′) is associated with the transition from s to s′ caused by e
- A continuous reward rate r(s, A) is associated with the action set A being enabled in s
- Infinite-horizon discounted reward: a unit reward earned at time t counts as e^(−αt)
- The optimal choice may depend on the entire execution history
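Combining the two reward components, the objective being maximized can be written as follows (a reconstruction consistent with the slide, with t_i the trigger time of the i-th event e_i and the expectation taken over sample paths):

```latex
V^\pi(s_0) \;=\; E_\pi\!\left[\,\sum_{i=1}^{\infty} e^{-\alpha t_i}\,
    k(s_{i-1}, e_i, s_i)
  \;+\; \int_{0}^{\infty} e^{-\alpha t}\, r\big(s(t), A(t)\big)\, dt \right]
```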

GSMDP Solution Method
GSMDP → (approximate event distributions with phase-type distributions) → continuous-time MDP → (uniformization [Jensen 1953]) → discrete-time MDP
To execute the resulting MDP policy as a GSMDP policy, simulate the phase transitions.
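The uniformization step is purely mechanical; a minimal sketch for a CTMC generator matrix (the two-state example is illustrative):

```python
import numpy as np

def uniformize(Q, rate=None):
    """Jensen's uniformization: embed a CTMC with generator Q into a
    discrete-time chain observed at the jumps of a Poisson process
    whose rate is at least the fastest exit rate in Q."""
    if rate is None:
        rate = max(-Q[i, i] for i in range(Q.shape[0]))
    return np.eye(Q.shape[0]) + Q / rate

# Illustrative two-state chain, flipping at rates 1 and 3:
Q = np.array([[-1.0,  1.0],
              [ 3.0, -3.0]])
P = uniformize(Q)   # [[2/3, 1/3], [1, 0]]
```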

Continuous Phase-Type Distributions [Neuts 1981]
Time to absorption in a continuous-time Markov chain with n transient states. Examples:
- Exponential (a single phase)
- Two-phase Coxian (absorb from phase 1 with probability 1 − p, otherwise continue to phase 2)
- n-phase generalized Erlang (pass through all n phases with probability p, otherwise skip ahead)
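A sampler makes the structure concrete; this sketch follows the generalized-Erlang reading above (all n phases with probability p, otherwise a single phase):

```python
import random

def sample_generalized_erlang(n, rate, p):
    """Time to absorption in an n-phase generalized Erlang (a sketch):
    with probability p the chain visits all n exponential phases,
    otherwise it visits a single phase before absorbing."""
    phases = n if random.random() < p else 1
    return sum(random.expovariate(rate) for _ in range(phases))
```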

Method of Moments
Approximate a general distribution G with a phase-type distribution PH by matching the first k moments:
- The ith moment: μi = E[X^i]
- Mean (first moment): μ1
- Variance: σ² = μ2 − μ1²
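As a toy instance of the idea (not the exact construction from the thesis), the first two moments already determine a reasonable phase count: an Erlang with n phases has squared coefficient of variation 1/n, so matching cv² suggests n ≈ 1/cv².

```python
def match_two_moments(mu1, mu2):
    """Pick a simple phase-type fit from the first two moments (a sketch).
    Matches the mean exactly and the variance approximately."""
    cv2 = (mu2 - mu1 ** 2) / mu1 ** 2    # squared coefficient of variation
    if cv2 >= 1.0:
        return ("exponential", 1.0 / mu1)    # exponential has cv2 = 1
    n = max(2, round(1.0 / cv2))             # Erlang(n) has cv2 = 1/n
    return ("erlang", n, n / mu1)            # n phases, rate n/mu1 each

# Example: U(0,1) has mu1 = 1/2 and mu2 = 1/3, so cv2 = 1/3 and n = 3.
```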

System Administration
- A network of n machines
- Reward rate c(s) = k in states where k machines are up
- One crash event and one reboot action per machine
- At most one action enabled at any time (single agent)
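Sketched with the Event class from above; the crash-time distribution and the effects are assumptions (the slide fixes only the U(0,1) reboot time used in the experiments), and the single-action constraint is left to the policy:

```python
def machine_events(i, crash_rate=1.0):
    """Crash event and reboot action for machine i; the state is a dict
    mapping machine index to up/down. The crash rate is an assumption."""
    crash = Event(enabled=lambda s: s[i],
                  delay=lambda: random.expovariate(crash_rate),
                  next_state=lambda s: {**s, i: False})
    reboot = Event(enabled=lambda s: not s[i],
                   delay=lambda: random.uniform(0.0, 1.0),   # U(0,1)
                   next_state=lambda s: {**s, i: True})
    return crash, reboot

def reward_rate(s):
    return sum(s.values())   # c(s) = number of machines that are up
```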

System Administration: Performance
[Plot: expected reward as a function of the number of machines n, with reboot-time distribution U(0,1).]

System Administration: Performance
State-space size as a function of the number of moments matched, for a network of n machines:

  size   1 moment: states   2 moments: states   3 moments: states
  n      2^n                (n+1)·2^n           (1.5n+1)·2^n

[The table also reports solution time in seconds for each setting.]

Summary
- Asynchronous processes exist in the real world, and their complexity is manageable
- Model checking: statistical hypothesis testing and simulation
- Planning: sample path analysis and phase-type distributions

Challenges Ahead
- Evaluation of policy repair techniques
- Complexity results for optimal GSMDP planning (undecidable?)
- Discrete phase-type distributions, which handle deterministic distributions

Tools
- Ymer: statistical probabilistic model checking
- Tempastic-DTP: decision-theoretic planning with asynchronous events