Planning and Execution with Phase Transitions Håkan L. S. Younes Carnegie Mellon University Follow-up paper to Younes & Simmons’ “Solving Generalized Semi-Markov Processes using Continuous Phase-Type Distributions” (AAAI’04)

Slide 2: Planning with Time and Probability
- Temporal uncertainty: when to schedule service?
- [Figure: state diagram with states Working, Failed, and Serviced, and transitions Service (F1), Return (F2), and Fail (F3)]
- Memoryless distributions → MDP

Slide 3: The Role of Phases
- The memoryless property of the exponential distribution makes MDPs tractable
- Many phenomena are not memoryless (e.g., the lifetime of a product or a computer process)
- Phases introduce memory into the state space
- Phases are a modeling tool, not part of the real world (phases are hidden variables)

Slide 4: Phase-Type Distributions
- Time to absorption in a Markov chain with n transient states and one absorbing state
- [Figure: example chains for the exponential distribution (one phase), the two-phase Coxian (continue to the second phase with probability p, absorb with probability 1 – p), and the n-phase Erlang distribution]
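For reference, a phase-type distribution is described by an initial distribution α over the transient phases and a subgenerator matrix T, and its time to absorption has the standard closed form below. The n-phase Erlang is the special case where the chain passes through phases 1, …, n in order, each at rate λ. The symbols α, T, and λ are standard notation, not taken from the slides.

```latex
% CDF and density of a phase-type distribution (\alpha, T), exit-rate vector t = -T\mathbf{1}
F(x) = 1 - \boldsymbol{\alpha}\, e^{Tx}\, \mathbf{1}, \qquad
f(x) = \boldsymbol{\alpha}\, e^{Tx}\, \mathbf{t}

% n-phase Erlang with rate \lambda: \alpha = (1, 0, \ldots, 0) and
T = \begin{pmatrix}
  -\lambda & \lambda  &        &          \\
           & -\lambda & \ddots &          \\
           &          & \ddots & \lambda  \\
           &          &        & -\lambda
\end{pmatrix}
```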

Slide 5: Phase-Type Approximation
- [Figure: CDF F(x) of the failure-time distribution Weibull(8, 4.5) compared with the phase-type approximations Erlang(7.3/8, 8), Erlang(7.3/4, 4), and Exponential(1/7.3)]
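A minimal sketch of the comparison in this figure, assuming Weibull(8, 4.5) denotes scale 8 and shape 4.5 (mean ≈ 7.3) and Erlang(7.3/n, n) denotes n phases with per-phase mean 7.3/n; SciPy is used here as a convenient stand-in, not as the tooling behind the slide.

```python
# Sketch: compare a Weibull failure-time CDF against mean-matched phase-type
# approximations (Erlang with 8 or 4 phases, plain exponential), as in the figure.
# The parameter conventions (scale/shape, per-phase mean) are assumptions.
import numpy as np
from scipy import stats

weibull = stats.weibull_min(c=4.5, scale=8)   # Weibull(8, 4.5)
mean = weibull.mean()                         # ~7.3

approximations = {
    "Erlang(mean/8, 8)":   stats.erlang(a=8, scale=mean / 8),
    "Erlang(mean/4, 4)":   stats.erlang(a=4, scale=mean / 4),
    "Exponential(1/mean)": stats.expon(scale=mean),
}

xs = np.linspace(0.0, 2 * mean, 400)
for name, dist in approximations.items():
    max_gap = np.max(np.abs(dist.cdf(xs) - weibull.cdf(xs)))
    print(f"{name}: max CDF gap vs. Weibull = {max_gap:.3f}")
```

More phases should shrink the gap, which is the point the figure makes.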

Slide 6: Solving MDPs with Phase Transitions
- Solve the MDP as if phases were observable
- Maintain a belief distribution over phase configurations during execution
- AAAI’04 paper: simulate phase transitions
- Use the Q_MDP value method to select actions (a sketch follows below)
- This is OK because there are typically no actions that are useful for observing phases
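The Q_MDP rule referenced here is simple to state: score each action by the belief-weighted Q-values of the MDP solved as if phases were observable, and take the argmax. A minimal sketch, with array shapes and the toy numbers being purely illustrative:

```python
# Sketch of Q_MDP action selection: weight the Q-values of the phase-augmented
# MDP (solved as if phases were observable) by the current belief over states.
import numpy as np

def qmdp_action(belief: np.ndarray, q_values: np.ndarray) -> int:
    """belief: shape (num_states,); q_values: shape (num_states, num_actions)."""
    return int(np.argmax(belief @ q_values))

# Toy example: two possible phase configurations, three actions.
belief = np.array([0.7, 0.3])
q_values = np.array([[1.0, 0.2, 0.0],
                     [0.1, 0.9, 0.0]])
print(qmdp_action(belief, q_values))  # -> 0
```

Because Q_MDP never plans to gather information, the slide’s caveat matters: the method is acceptable here precisely because no action helps to observe phases.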

Slide 7: Factored Event Model
- Asynchronous events and actions
- Each event/action e has:
  - An enabling condition φ_e
  - A (phase-type) distribution G_e governing the time e must remain enabled before it triggers
  - A distribution p_e(s′; s) determining the probability that the next state is s′ if e triggers in state s
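One way to picture the factored event model as a data structure; this is a hedged sketch with hypothetical names (the paper’s actual representation is symbolic, and G_e is a general phase-type distribution rather than the exponential stand-in used here). The crash event of the next slide is included as a concrete instance.

```python
# Sketch of the factored event model: each event/action carries an enabling
# condition (phi_e), a trigger-time distribution (G_e), and a next-state map
# (the deterministic special case of p_e(s'; s)). Names are illustrative only.
import random
from dataclasses import dataclass
from typing import Callable, Dict

State = Dict[str, bool]  # assignment to boolean state variables

@dataclass
class Event:
    name: str
    enabled: Callable[[State], bool]      # enabling condition phi_e
    sample_delay: Callable[[], float]     # sampler for G_e (here: exponential stand-in)
    next_state: Callable[[State], State]  # successor state when the event triggers

def make_crash(i: int, rate: float = 1.0) -> Event:
    """Crash event for computer i: enabled while up_i holds, flips up_i to false."""
    return Event(
        name=f"crash{i}",
        enabled=lambda s: s[f"up{i}"],
        sample_delay=lambda: random.expovariate(rate),
        next_state=lambda s: {**s, f"up{i}": False},
    )
```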

Slide 8: Event Example
- Computer cluster with a crash event and a reboot action for each computer
- Crash event for computer i: φ_i = up_i; p(s′; s) = 1 iff s ⊨ up_i and s′ ⊨ ¬up_i
- Reboot action for computer i: φ_i = ¬up_i; p(s′; s) = 1 iff s ⊨ ¬up_i and s′ ⊨ up_i

Slide 9: Event Example (continued)
- Cluster with two computers
- [Figure: sample trajectory; the state is up1 ∧ up2 at t = 0, crash2 triggers at t = 0.6 leaving up1 ∧ ¬up2, and crash1 triggers at t = 0.9 leaving ¬up1 ∧ ¬up2]
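A trajectory like the one on this slide can be reproduced in spirit by racing the currently enabled crash events and applying whichever triggers first. A self-contained sketch, assuming exponential crash times purely for illustration (memorylessness is what makes re-sampling after every transition legitimate here):

```python
# Sketch: simulate asynchronous crash events in a two-computer cluster by
# sampling a trigger time for every enabled event and applying the earliest.
# Exponential crash times are an assumption made only for this illustration.
import random

def simulate(rate: float = 1.0, seed: int = 0) -> None:
    random.seed(seed)
    state = {"up1": True, "up2": True}
    t = 0.0
    print(f"t = {t:.2f}: {state}")
    while any(state.values()):
        # Only events whose enabling condition (up_i) holds take part in the race.
        races = {i: random.expovariate(rate) for i in (1, 2) if state[f"up{i}"]}
        i, delay = min(races.items(), key=lambda kv: kv[1])
        t += delay
        state[f"up{i}"] = False
        print(f"t = {t:.2f}: crash{i} -> {state}")

simulate()
```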

Slide 10: Exploiting Structure
- Value iteration with ADDs
- A factored state representation means that some variable assignments may be invalid:
  - Phase is one for disabled events
  - Binary encoding of phases with n ≠ 2^k
- Value iteration with a state filter f (a hedged sketch of one possible form follows below)
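The filtered backup itself is not spelled out on this slide, so the following is only an assumption about its shape: the filter f marks valid variable assignments, and multiplying by it keeps value from ever flowing through invalid states.

```latex
% Hedged sketch of value iteration with a 0/1 state filter f (not the slide's exact formula):
V_{k+1}(s) = f(s)\,\max_{a}\Bigl[ R(s, a) + \gamma \sum_{s'} P(s' \mid s, a)\, V_k(s') \Bigr],
\qquad f(s) \in \{0, 1\}
```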

Slide 11: Phase Tracking
- Infer phase configurations from observable features of the environment:
  - Physical state of the process
  - Current time
- Transient analysis for Markov chains: the probability of being in state s at time t
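Transient analysis here is the standard computation for a continuous-time Markov chain with generator matrix Q: the state distribution at time t obeys the forward Kolmogorov equation and has a matrix-exponential solution.

```latex
% Transient distribution of a CTMC with generator Q and initial distribution \pi(0)
\frac{d\pi(t)}{dt} = \pi(t)\, Q,
\qquad
\pi(t) = \pi(0)\, e^{Qt},
\qquad
\Pr[X(t) = s] = \pi_s(t)
```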

Slide 12: Phase Tracking (continued)
- The belief distribution changes continuously
- Use a fixed update interval
- Independent phase tracking for each event
- The Q matrix is n × n for an n-phase distribution
- Phase tracking is O(n) for the Erlang distribution
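A minimal sketch of fixed-interval phase tracking for a single still-enabled event, using the n × n subgenerator of its phase-type distribution. Renormalizing after each step conditions on the event not having triggered; treating the update that way, and using scipy.linalg.expm for the matrix exponential, are assumptions of this sketch rather than details from the paper.

```python
# Sketch: propagate the phase belief of one enabled event over a fixed interval dt
# using the n x n subgenerator T of its phase-type distribution, then renormalize
# to condition on the event not having triggered during the interval.
import numpy as np
from scipy.linalg import expm

def update_phase_belief(belief: np.ndarray, T: np.ndarray, dt: float) -> np.ndarray:
    propagated = belief @ expm(T * dt)    # mass still in the transient phases
    return propagated / propagated.sum()  # renormalize: no trigger observed yet

# Example: 4-phase Erlang with overall mean 7.3 (per-phase rate 4/7.3).
n, rate = 4, 4 / 7.3
T = -rate * np.eye(n) + rate * np.eye(n, k=1)
belief = np.zeros(n)
belief[0] = 1.0
print(update_phase_belief(belief, T, 0.5))
```

For an Erlang distribution the subgenerator is bidiagonal, which is presumably the structure behind the slide’s O(n) remark.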

Slide 13: The Foreman’s Dilemma
- When should the “Service” action be enabled in the “Working” state?
- [Figure: states Working (c = 1), Failed (c = 0), and Serviced (c = –0.1); transitions Service ~ Exp(10), Return ~ Exp(1), and Fail ~ G]

Slide 14: Foreman’s Dilemma: Policy Performance
- [Figure: percent of optimal reward versus x, with failure-time distribution W(1.6x, 4.5); curves for n = 8 and n = 4 phases with update interval 0.5, n = 8 and n = 4 phases with simulated phase transitions, and the SMDP solution]

Slide 15: Foreman’s Dilemma: Enabling Time for Action
- [Figure: service enabling time versus x; curves for the optimal policy, n = 8 phases with update interval 0.5, n = 8 phases with simulated phase transitions, and the SMDP solution]

Slide 16: System Administration
- Network of n machines
- Reward rate c(s) = k in states where k machines are up
- One crash event and one reboot action per machine
- At most one action enabled at any time (single agent)

Slide 17: System Administration: Policy Performance
- [Figure: expected discounted reward versus number of machines, with reboot-time distribution U(0, 1); curves for n = 4 and n = 2 phases with fixed-interval phase tracking, n = 4 and n = 2 phases with simulated phase transitions, and n = 1]

Slide 18: System Administration: Planning Time
- [Figure: planning time versus number of machines; curves for no filter, pre-filter, and filter, each with n = 4 and n = 5 phases]

Slide 19: Discussion
- Phase-type approximations add state variables, but structure can be exploited
- Use approximate solution methods (e.g., Guestrin et al., JAIR 19)
- Improved phase tracking using transient analysis instead of phase simulation
- Recent CTBN work with phase-type distributions addresses modeling, not planning

Slide 20: Tempastic-DTP
- A tool for GSMDP planning: