1 Factored MDPs Alan Fern * * Based in part on slides by Craig Boutilier

2 Planning in Large State Space MDPs  You have learned algorithms for computing optimal policies  Value Iteration  Policy Iteration  These algorithms explicitly enumerate the state space  Often this is impractical  Simulation-based planning and RL allowed for approximate planning in large MDPs  Did not utilize an explicit model of the MDP. Only used a strong or weak simulator.  What if we had a compact representation for a large MDP and could efficiently plan with it?  Would allow for exact solutions to very large MDPs.  We will study representations and algorithms for doing this
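To make the cost of explicit enumeration concrete, here is a minimal value-iteration sketch over a fully enumerated state space. This is an illustrative toy, not code from the lecture: the transition and reward tables are random placeholders, and the point is only that each Bellman sweep touches every (action, state, next-state) triple.

```python
import numpy as np

# Toy enumerated MDP: T[a, s, s2] = Pr(s2 | s, a), R[s] = reward(s).
# Both tables are random placeholders; what matters is the
# O(|A| * |S|^2) cost per sweep, which explodes as |S| grows.
num_states, num_actions, gamma = 64, 7, 0.95
rng = np.random.default_rng(0)
T = rng.dirichlet(np.ones(num_states), size=(num_actions, num_states))
R = rng.uniform(-10.0, 10.0, size=num_states)

V = np.zeros(num_states)
for sweep in range(1000):
    Q = R[None, :] + gamma * (T @ V)   # Bellman backup for every (a, s)
    V_new = Q.max(axis=0)              # greedy choice over actions
    if np.max(np.abs(V_new - V)) < 1e-6:
        break
    V = V_new
```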

3 A Planning Problem

4 Logical or Feature-based Problems  For most AI problems, states are not viewed as atomic entities.  They contain structure. For example, they are described by a set of propositions  |S| is exponential in the number of propositions  Basic policy and value iteration do nothing to exploit the structure of the MDP when it is available

5 Solution?  Require structured representations  compactly represent transition function  compactly represent reward function  compactly represent value functions and policies  Require structured computation  perform steps of PI or VI directly on structured representations  can avoid the need to enumerate state space  We start by representing the transition structure as dynamic Bayesian networks

6 Propositional Representations  States decomposable into state variables  Structured representations the norm in AI  Decision diagrams, Bayesian networks, etc.  Describe how actions affect/depend on features  Natural, concise, can be exploited computationally  Same ideas can be used for MDPs

7 Robot Domain as Propositional MDP  Propositional variables for single user version  Loc (robot’s location): Office, Entrance  T (lab is tidy): boolean  CR (coffee request outstanding): boolean  RHC (robot holding coffee): boolean  RHM (robot holding mail): boolean  M (mail waiting for pickup): boolean  Actions/Events  move to an adjacent location, pickup mail, get coffee, deliver mail, deliver coffee, tidy lab  mail arrival, coffee request issued, lab gets messy  Rewards  rewarded for tidy lab, satisfying a coffee request, delivering mail  (or penalized for their negation)
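As a concrete sketch (my own encoding, not from the lecture), a state of this domain is simply an assignment to the six variables above:

```python
from typing import Literal, NamedTuple

class RobotState(NamedTuple):
    loc: Literal["Office", "Entrance"]  # robot's location
    t: bool     # lab is tidy
    cr: bool    # coffee request outstanding
    rhc: bool   # robot holding coffee
    rhm: bool   # robot holding mail
    m: bool     # mail waiting for pickup

# One of the 2 * 2^5 = 64 possible states:
s0 = RobotState(loc="Office", t=True, cr=True, rhc=False, rhm=False, m=True)
```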

8 State Space  State of MDP: assignment to these six variables  64 states  grows exponentially with number of variables  Transition matrices  4032 parameters required per matrix  one matrix per action (6 or 7 or more actions)  Reward function  64 reward values needed  Factored state and action descriptions will break this exponential dependence (generally)
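A quick check of the counts on this slide. Each row of a transition matrix is a probability distribution over the 64 next states, so one of its 64 entries is determined by the others:

```python
n_states = 2 ** 6                       # 6 binary variables -> 64 states
per_row = n_states - 1                  # 63 free entries per row
matrix_params = n_states * per_row      # 64 * 63 = 4032 per action matrix
reward_values = n_states                # 64 reward entries
print(n_states, matrix_params, reward_values)   # 64 4032 64
```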

9 Dynamic Bayesian Networks (DBNs)  Bayesian networks (BNs) a common representation for probability distributions  A graph (DAG) represents conditional independence  Conditional probability tables (CPTs) quantify local probability distributions  Dynamic Bayes net action representation  one Bayes net for each action a, representing the set of conditional distributions Pr(S_{t+1} | A_t, S_t)  each state variable occurs at time t and t+1  dependence of t+1 variables on t variables depicted by directed arcs

10 DBN Representation: deliver coffee  [Figure: two-slice DBN for the deliver-coffee action. The state variables RHM, M, T, L, CR, RHC each appear at times t and t+1; directed arcs give, e.g., T_{t+1} the parent T_t, RHM_{t+1} the parent RHM_t, and CR_{t+1} the parents L_t, CR_t, RHC_t. CPT tables quantify Pr(CR_{t+1} | L_t, CR_t, RHC_t) (one row per assignment to L ∈ {O, E}, CR, RHC), Pr(T_{t+1} | T_t), and Pr(RHM_{t+1} | RHM_t).]
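A sketch of the three CPTs named in the figure, encoded as Python dicts mapping a parent assignment at time t to Pr(variable = true at t+1). The parent sets follow the DBN; the numeric probabilities are placeholders chosen for illustration, since the figure's table entries are not part of the transcript:

```python
# Pr(T_{t+1} = true | T_t): deliver-coffee does not tidy the lab
cpt_T = {True: 0.91, False: 0.0}

# Pr(RHM_{t+1} = true | RHM_t): mail in hand is unaffected
cpt_RHM = {True: 1.0, False: 0.0}

# Pr(CR_{t+1} = true | L_t, CR_t, RHC_t), keyed by (loc, cr, rhc)
# with loc in {"O", "E"} (office/entrance)
cpt_CR = {
    ("O", True,  True):  0.2,  # at office with coffee: likely delivered
    ("E", True,  True):  1.0,  # wrong location: request persists
    ("O", True,  False): 1.0,  # no coffee in hand: request persists
    ("E", True,  False): 1.0,
    ("O", False, True):  0.0,  # a satisfied request stays satisfied
    ("E", False, True):  0.0,
    ("O", False, False): 0.0,
    ("E", False, False): 0.0,
}
```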

11 Benefits of DBN Representation
Pr(S_{t+1} | S_t)
  = Pr(RHM_{t+1}, M_{t+1}, T_{t+1}, L_{t+1}, CR_{t+1}, RHC_{t+1} | RHM_t, M_t, T_t, L_t, CR_t, RHC_t)
  = Pr(RHM_{t+1} | RHM_t) * Pr(M_{t+1} | M_t) * Pr(T_{t+1} | T_t) * Pr(L_{t+1} | L_t) * Pr(CR_{t+1} | CR_t, RHC_t, L_t) * Pr(RHC_{t+1} | RHC_t, L_t)
- Only 20 parameters vs. 4032 for the full 64 x 64 transition matrix
- Removes the global exponential dependence on the number of state variables
[Figure: the deliver-coffee DBN alongside the 64 x 64 full transition matrix it replaces.]
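A hedged sketch of the computation this factorization licenses: Pr(s_{t+1} | s_t) is a product of one local CPT lookup per variable. The two-variable CPTs below are toy values, not the robot domain's actual numbers:

```python
def transition_prob(s, s_next, cpts):
    """Pr(s_next | s) as a product of per-variable factors.

    cpts maps each variable to (parent_names, table), where table maps a
    tuple of parent values at time t to Pr(variable = true at t+1).
    """
    p = 1.0
    for var, (parents, table) in cpts.items():
        pr_true = table[tuple(s[q] for q in parents)]
        p *= pr_true if s_next[var] else 1.0 - pr_true
    return p

# Toy two-variable instance of the same product structure:
cpts = {
    "T":  (("T",), {(True,): 0.91, (False,): 0.0}),
    "CR": (("CR", "T"), {(True, True): 0.2, (True, False): 0.3,
                         (False, True): 0.0, (False, False): 0.1}),
}
s, s_next = {"T": True, "CR": True}, {"T": True, "CR": False}
print(transition_prob(s, s_next, cpts))   # 0.91 * (1 - 0.2) = 0.728
```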

12 Structure in CPTs  So far we have represented each CPT as a table of size exponential in the number of parents  Notice that there’s regularity in CPTs  e.g., Pr(CR_{t+1} | L_t, CR_t, RHC_t) has many similar entries  Compact function representations for CPTs can be used to great effect  decision trees  algebraic decision diagrams (ADDs/BDDs)  Here we show examples of decision trees (DTs)

13 Action Representation – DBN/DT  [Figure: the deliver-coffee DBN with the CPT for CR_{t+1} drawn as a decision tree (DT) that tests CR(t), then RHC(t), then L(t) ∈ {o, e}; one leaf value shown is 0.2.]  Leaves of the DT give Pr(CR_{t+1} = true | L_t, CR_t, RHC_t)  DTs can often represent conditional probabilities much more compactly than a full conditional probability table  e.g., if CR(t) = true and RHC(t) = false, then CR(t+1) = true with the probability given at the corresponding leaf
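One way to encode such a decision-tree CPT (my own encoding, not the lecture's): the split order follows the slide's tree, only the 0.2 leaf value comes from the figure, the other leaves are placeholders, and L(t) is simplified to a boolean "at office" test.

```python
# A node is either ("leaf", prob) or ("test", var, subtree_true, subtree_false).
cr_tree = ("test", "CR",
           ("test", "RHC",
            ("test", "L",
             ("leaf", 0.2),     # CR & RHC, at office: delivery may succeed
             ("leaf", 1.0)),    # CR & RHC, at entrance: request persists
            ("leaf", 1.0)),     # CR & ~RHC: cannot satisfy the request
           ("leaf", 0.0))       # ~CR: no request carries over

def dt_lookup(tree, state):
    """Walk from the root to a leaf and return its value."""
    while tree[0] == "test":
        _, var, if_true, if_false = tree
        tree = if_true if state[var] else if_false
    return tree[1]

# 4 leaves stand in for all 8 rows of the full CPT:
print(dt_lookup(cr_tree, {"CR": True, "RHC": False, "L": True}))  # 1.0
```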

14 Reward Representation  Rewards represented with DTs in a similar fashion  Would require a vector of size 2^n for an explicit representation  [Figure: reward decision tree testing CR, then M, then T; one leaf value shown is -10.]  High cost for an unsatisfied coffee request  High, but lower, cost for undelivered mail  Cost for the lab being untidy  Small reward for satisfying all of these conditions
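The reward tree admits the same encoding. In this sketch I assume -10 is the unsatisfied-coffee-request leaf; the remaining leaf values are placeholders chosen only to respect the ordering the slide describes (coffee cost > mail cost > untidiness cost, small reward when all conditions hold).

```python
def dt_lookup(tree, state):
    """Walk from the root to a leaf and return its value."""
    while tree[0] == "test":
        _, var, if_true, if_false = tree
        tree = if_true if state[var] else if_false
    return tree[1]

reward_tree = ("test", "CR",
               ("leaf", -10.0),        # unsatisfied coffee request
               ("test", "M",
                ("leaf", -5.0),        # undelivered mail: high, but lower
                ("test", "T",
                 ("leaf", 1.0),        # everything satisfied: small reward
                 ("leaf", -1.0))))     # lab untidy

print(dt_lookup(reward_tree, {"CR": False, "M": False, "T": True}))  # 1.0
```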

15 Structured Computation  Given a compact representation, can we solve the MDP without explicit state-space enumeration?  Can we avoid O(|S|) computations by exploiting the regularities made explicit by the representation?  We will study a general approach for doing this called structured dynamic programming