A Formalism for Stochastic Decision Processes with Asynchronous Events
Håkan L. S. Younes and Reid G. Simmons, Carnegie Mellon University

Presentation transcript:

A Formalism for Stochastic Decision Processes with Asynchronous Events
Håkan L. S. Younes and Reid G. Simmons, Carnegie Mellon University

2 Introduction
- Asynchronous processes are abundant in the real world: telephone systems, computer networks, etc.
- Discrete-time and semi-Markov models are inappropriate for systems with asynchronous events
- Generalized semi-Markov (decision) processes, GSM(D)Ps, are great for this!

3 Asynchronous Processes: Example
Two machines, m1 and m2, connected by a network:
- t = 0: m1 up, m2 up

4 Asynchronous Processes: Example
- t = 0: m1 up, m2 up
- t = 2.5: m2 crashes → m1 up, m2 down

5 Asynchronous Processes: Example
- t = 0: m1 up, m2 up
- t = 2.5: m2 crashes → m1 up, m2 down
- t = 3.1: m1 crashes → m1 down, m2 down

6 Asynchronous Processes: Example
- t = 0: m1 up, m2 up
- t = 2.5: m2 crashes → m1 up, m2 down
- t = 3.1: m1 crashes → m1 down, m2 down
- t = 4.9: m2 rebooted → m1 down, m2 up

7 A Model of Stochastic Discrete Event Systems
Generalized semi-Markov process (GSMP) [Matthes 1962]:
- A set of events E
- A set of states S
GSMDP:
- Actions A ⊆ E are controllable events

8 Events
With each event e is associated:
- A condition φ_e identifying the set of states in which e is enabled
- A distribution G_e governing the time e must remain enabled before it triggers
- A distribution p_e(s′|s) determining the probability that the next state is s′ if e triggers in state s
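
To make the three components concrete, here is a minimal Python sketch of an event record for the two-machine example; the class and field names (Event, enabled, sample_delay, next_state) are our own illustration, not part of the formalism, and the next-state distribution is kept deterministic for simplicity.

import random
from dataclasses import dataclass
from typing import Callable

State = frozenset  # a state is the set of machines that are currently up

@dataclass
class Event:
    name: str
    enabled: Callable[[State], bool]      # condition phi_e: states where the event is enabled
    sample_delay: Callable[[], float]     # draw from G_e: time the event must stay enabled
    next_state: Callable[[State], State]  # p_e(s'|s): deterministic in this sketch

# "m2 crashes" is enabled whenever m2 is up; crash time ~ Exp(1)
crash_m2 = Event(
    name="crash m2",
    enabled=lambda s: "m2" in s,
    sample_delay=lambda: random.expovariate(1.0),
    next_state=lambda s: s - {"m2"},
)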

9 Events: Example
Network with two machines; crash time: Exp(1), reboot time: U(0,1)
- t = 0 (m1 up, m2 up): G_c1 = Exp(1), G_c2 = Exp(1)
- t = 0.6, crash m2 (m1 up, m2 down): G_c1 = Exp(1), G_r2 = U(0,1)
- t = 1.1, crash m1 (m1 down, m2 down): G_r2 = U(0,0.5) (residual reboot-time distribution after r2 has been enabled for 0.5)
Asynchronous events ⇒ beyond semi-Markov

10 Semantics of GSMP Model
Associate a real-valued clock t_e with each event e:
- For each e ∈ E_s, sample t_e from G_e
- Let e* = argmin_{e ∈ E_s} t_e and t* = min_{e ∈ E_s} t_e
- Sample s′ from p_e*(s′|s)
- For each e ∈ E_s′: t_e′ = t_e − t* if e ∈ E_s \ {e*}; otherwise sample t_e′ from G_e
- Repeat with s = s′ and t_e = t_e′
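
A minimal simulation sketch of this clock semantics, assuming the hypothetical Event class from the earlier sketch; the function name gsmp_step and the clocks dictionary are ours.

def gsmp_step(state, clocks, events):
    """One GSMP transition: trigger the enabled event with the smallest clock.

    clocks maps the name of each currently enabled event to its remaining time t_e.
    Returns the next state, the updated clocks, and the elapsed time t*.
    """
    enabled = [e for e in events if e.enabled(state)]     # E_s
    winner = min(enabled, key=lambda e: clocks[e.name])   # e* = argmin t_e
    t_star = clocks[winner.name]                          # t* = min t_e
    next_state = winner.next_state(state)                 # next state per p_e* (deterministic in the sketch)

    new_clocks = {}
    for e in events:
        if not e.enabled(next_state):                     # only events in E_s'
            continue
        if e.enabled(state) and e is not winner:          # e in E_s \ {e*}: carry clock over, decremented
            new_clocks[e.name] = clocks[e.name] - t_star
        else:                                             # newly enabled (or e*): fresh sample from G_e
            new_clocks[e.name] = e.sample_delay()
    return next_state, new_clocks, t_star

Clocks are initialized by sampling t_e from G_e for every event enabled in the initial state; the "repeat" step is simply repeated calls to gsmp_step.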

11 Semantics: Example
Network with two machines; crash time: Exp(1), reboot time: U(0,1)
- t = 0 (m1 up, m2 up): t_c1 = 1.1, t_c2 = 0.6
- t = 0.6, crash m2 (m1 up, m2 down): t_c1 = 0.5, t_r2 = 0.7
- t = 1.1, crash m1 (m1 down, m2 down): t_r2 = 0.2

12 General State-Space Markov Chain (GSSMC)
- The model is Markovian if we include the clocks in the description of states
- Extended state space X
- Next-state distribution f(x′|x) is well-defined
- Clock values are not known to the observer
- The time each event has been enabled is known

13 Observations: Example
Network with two machines; crash time: Exp(1), reboot time: U(0,1)
Same trajectory as before; the observer sees how long each event has been enabled (u_e), not the clock values (t_e):
- t = 0 (m1 up, m2 up): u_c1 = 0, u_c2 = 0
- t = 0.6, crash m2 (m1 up, m2 down): u_c1 = 0.6, u_r2 = 0
- t = 1.1, crash m1 (m1 down, m2 down): u_r2 = 0.5

14 Observation Model
- An observation o is a state s and a real value u_e for each event, representing the time e has currently been enabled
- An observation summarizes the execution history: f(x|o) is well-defined
- Observations change continuously!
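
As a rough illustration (the class and field names are ours, not from the slides), the extended GSSMC state x and the observation o can be represented as:

from dataclasses import dataclass
from typing import Dict, FrozenSet

@dataclass
class ExtendedState:               # x in the GSSMC: state plus hidden clock values t_e
    state: FrozenSet[str]
    clocks: Dict[str, float]       # remaining time until each enabled event triggers

@dataclass
class Observation:                 # o: state plus observable enabled-times u_e
    state: FrozenSet[str]
    enabled_for: Dict[str, float]  # time each enabled event has been enabled so far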

15 Policies
Actions as controllable events:
- We can choose to disable an action even if its enabling condition is satisfied
- A policy determines the set of actions to keep enabled at any given time during execution
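
Combined with the Observation sketch above, a policy can then be viewed as a function from observations to the set of action names to keep enabled. A trivial hypothetical example for the two-machine network (the reboot actions are assumed, not defined on the slides):

def reboot_first_policy(obs: Observation) -> set[str]:
    # Keep a (hypothetical) reboot action enabled for whichever machine is currently down.
    return {f"reboot {m}" for m in ("m1", "m2") if m not in obs.state}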

16 Rewards and Optimality
- Lump sum reward k(s,e,s′) associated with a transition from s to s′ caused by e
- Continuous reward rate r(s,A′) associated with A′ ⊆ A being enabled in s
- Infinite-horizon discounted reward: unit reward earned at time t counts as e^(−αt)
- Optimal choice may depend on the entire execution history
- A policy maps observations to sets of actions
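
Putting the two reward components together, for a trajectory s_0, e_0, s_1, e_1, ... with transition times 0 = t_0 < t_1 < t_2 < ... and enabled action sets A_i, the discounted value can be written roughly as follows (our reconstruction from the quantities named on this slide, not a formula taken from the talk):

V = \mathbb{E}\left[\sum_{i=0}^{\infty}\left( e^{-\alpha t_{i+1}}\, k(s_i, e_i, s_{i+1}) + \int_{t_i}^{t_{i+1}} e^{-\alpha t}\, r(s_i, A_i)\, dt \right)\right]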

17 Shape of Optimal Policy
Continuous-time MDP: G_e = Exp(λ_e) for all events
- Sufficient to change the action choice at the time of state transitions
General distributions:
- The action choice may have to change between state transitions (see [Chitgopekar 1969, Stone 1973, Cantaluppi 1984] for SMDPs)

18 Solving GSMDPs
Optimal infinite-horizon GSMDP planning: feasible?
Approximate solution techniques:
- Discretize time
- Approximate general distributions with phase-type distributions and solve the resulting MDP [Younes & Simmons, AAAI-04] (see the sketch below)
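
As a very rough illustration of the phase-type idea (not the moment-matching method of the AAAI-04 paper), one can replace a low-variance general distribution with an Erlang distribution that has the same mean and a similar variance; each phase is then an exponential delay, so standard continuous-time MDP machinery applies:

def erlang_approximation(mean: float, variance: float) -> tuple[int, float]:
    """Pick an Erlang(n, rate) whose mean matches and whose variance is close.

    Erlang(n, rate) has mean n/rate and variance n/rate**2, so its squared
    coefficient of variation is 1/n.  Works best when variance <= mean**2.
    """
    cv2 = variance / mean**2          # squared coefficient of variation of the target
    n = max(1, round(1.0 / cv2))      # number of exponential phases
    rate = n / mean                   # each phase ~ Exp(rate)
    return n, rate

# Example: reboot time U(0,1) has mean 0.5 and variance 1/12
n, rate = erlang_approximation(0.5, 1.0 / 12.0)   # -> n = 3, rate = 6.0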

19 Summary
- Current research in decision-theoretic planning is aimed at synchronous systems
- Asynchronous systems have largely been ignored
- The generalized semi-Markov decision process fills this gap