Decision Making Under Uncertainty Lec #10: Partially Observable MDPs UIUC CS 598: Section EA Professor: Eyal Amir Spring Semester 2006 Some slides by Jeremy Wyatt (U Birmingham), Alp Sardağ, and Craig Boutilier (Toronto)

Partially Observable Planning

Today
Partially Observable Markov Decision Processes:
– Stochastic domains
– Partially observable

POMDPs
A partially observable Markov decision process (POMDP) is:
– a stochastic system Σ = (S, A, P) as before
– a finite set O of observations
– P_a(o|s) = probability of observation o in state s after executing action a
– Require that for each a and s, ∑_{o in O} P_a(o|s) = 1
O models partial observability:
– The controller can't observe s directly; it can only observe o
– The same observation o can occur in more than one state
Why do the observations depend on the action a? Why do we have P_a(o|s) rather than P(o|s)?
– This is a way to model sensing actions, which do not change the state but make some observation available (e.g., a sensor reading)
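For concreteness, here is a minimal Python sketch of this definition (the class layout and the tiny two-state example are illustrative, not from the lecture): states, actions, transition probabilities P_a(s'|s), observation probabilities P_a(o|s), and a check that each distribution sums to 1.

```python
# A sketch of a POMDP container (illustrative names, not the lecture's code).
# P[a][s][s2]  = P_a(s2 | s)   -- transition probability
# Obs[a][s][o] = P_a(o | s)    -- observation probability
from dataclasses import dataclass

@dataclass
class POMDP:
    states: list
    actions: list
    observations: list
    P: dict
    Obs: dict

    def check(self, tol=1e-9):
        """Verify that every transition and observation distribution sums to 1."""
        for a in self.actions:
            for s in self.states:
                assert abs(sum(self.P[a][s].values()) - 1.0) < tol
                assert abs(sum(self.Obs[a][s].values()) - 1.0) < tol

# Hypothetical two-state example: a "listen" action that never changes the
# state and reports it correctly 85% of the time.
pomdp = POMDP(
    states=["s1", "s2"],
    actions=["listen"],
    observations=["o1", "o2"],
    P={"listen": {"s1": {"s1": 1.0, "s2": 0.0},
                  "s2": {"s1": 0.0, "s2": 1.0}}},
    Obs={"listen": {"s1": {"o1": 0.85, "o2": 0.15},
                    "s2": {"o1": 0.15, "o2": 0.85}}},
)
pomdp.check()
```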

Example of a Sensing Action
Suppose there are a state s1, an action a1, and an observation o1 with the following properties:
– For every state s, P_a1(s|s) = 1 (a1 does not change the state)
– P_a1(o1|s1) = 1, and P_a1(o1|s) = 0 for every state s ≠ s1 (after performing a1, o1 occurs if and only if we're in state s1)
Then to tell whether you're in state s1, just perform action a1 and see whether you observe o1.
Two states s and s' are indistinguishable if P_a(o|s) = P_a(o|s') for every o and a.

Belief States
At each point we will have a probability distribution b(s) over the states in S.
– b(s) is called a belief state (our belief about what state we're in)
Basic properties:
– 0 ≤ b(s) ≤ 1 for every s in S
– ∑_{s in S} b(s) = 1
Definitions:
– b_a = the belief state after doing action a in belief state b; thus
  b_a(s) = P(in s after doing a in b) = ∑_{s' in S} P_a(s|s') b(s')
– b_a(o) = P(observe o after doing a in b) = ∑_{s in S} P_a(o|s) b_a(s)
– b_a^o(s) = P(in s after doing a in b and observing o)

Belief States (Continued)
Recall that in general, P(x|y,z) P(y|z) = P(x,y|z). Thus
  P_a(o|s) b_a(s) = P(observe o after doing a in s) * P(in s after doing a in b)
                  = P(in s and observe o after doing a in b)
Similarly,
  b_a^o(s) b_a(o) = P(in s after doing a in b and observing o) * P(observe o after doing a in b)
                  = P(in s and observe o after doing a in b)
Thus
  b_a^o(s) = P_a(o|s) b_a(s) / b_a(o)
We can use this to distinguish states that would otherwise be indistinguishable.
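A short Python sketch of this belief update, reusing the POMDP container sketched above (function and variable names are mine, not the lecture's): given b, a, and o, it computes b_a, b_a(o), and b_a^o.

```python
def belief_update(pomdp, b, a, o):
    """Return (b_a, p_o, b_a_o) for belief b, action a, observation o.

    Implements the slide's formulas:
      b_a(s)   = sum_{s'} P_a(s|s') b(s')
      b_a(o)   = sum_{s}  P_a(o|s) b_a(s)
      b_a^o(s) = P_a(o|s) b_a(s) / b_a(o)
    """
    # Predict: push the belief through the transition model.
    b_a = {s: sum(pomdp.P[a][s2][s] * b[s2] for s2 in pomdp.states)
           for s in pomdp.states}
    # Probability of observing o after doing a in b.
    p_o = sum(pomdp.Obs[a][s][o] * b_a[s] for s in pomdp.states)
    # Condition on the observation (Bayes' rule).
    b_a_o = {s: (pomdp.Obs[a][s][o] * b_a[s] / p_o if p_o > 0 else 0.0)
             for s in pomdp.states}
    return b_a, p_o, b_a_o
```

With the two-state example above, belief_update(pomdp, {"s1": 0.5, "s2": 0.5}, "listen", "o1") returns a posterior of 0.85 on s1.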

Example
Robot r1 can move between l1 and l2:
– move(r1,l1,l2)
– move(r1,l2,l1)
There may be a container c1 in location l2: in(c1,l2)
O = {full, empty}
– full: c1 is present
– empty: c1 is absent
– abbreviate full as f, and empty as e
[Figure: the states s1–s4, the belief state b, and the belief state b_a for a = move(r1,l1,l2)]

Example (Continued)
Neither “move” action returns useful observations. For every state s and for a = either “move” action,
  P_a(f|s) = P_a(e|s) = 0.5
Thus if there are no other actions, then
– s1 and s2 are indistinguishable
– s3 and s4 are indistinguishable
[Figure: belief states b and b_a for a = move(r1,l1,l2), as before]

Example (Continued)
Suppose there’s a sensing action see that works perfectly in location l2:
  P_see(f|s4) = P_see(e|s3) = 1
  P_see(f|s3) = P_see(e|s4) = 0
see does not work elsewhere:
  P_see(f|s1) = P_see(e|s1) = P_see(f|s2) = P_see(e|s2) = 0.5
Then
– s1 and s2 are still indistinguishable
– s3 and s4 are now distinguishable
[Figure: belief states b and b_a for a = move(r1,l1,l2), as before]

Example (Continued)
By itself, see doesn’t tell us the state with certainty:
  b_see^e(s3) = P_see(e|s3) * b_see(s3) / b_see(e) = 1 * 0.25 / 0.5 = 0.5
If we first do a = move(r1,l1,l2) and then do see, this tells us the state with certainty. Let b' = b_a; then
  b'_see^e(s3) = P_see(e|s3) * b'_see(s3) / b'_see(e) = 1 * 0.5 / 0.5 = 1
[Figure: belief states b and b' = b_a for a = move(r1,l1,l2)]
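A tiny self-contained script to double-check those two numbers. The state layout (s3 = robot in l2 with c1 absent, s4 = robot in l2 with c1 present, s1 and s2 in l1) is inferred from the probabilities on the previous slides; the uniform initial belief is what the 0.25 in the slide implies; since see does not change the state, b_see = b.

```python
# Check the example's belief calculations for the "see" action.
P_see_e = {"s1": 0.5, "s2": 0.5, "s3": 1.0, "s4": 0.0}   # P_see(e | s)

def posterior_s3(b):
    """b_see^e(s3) = P_see(e|s3) * b(s3) / b_see(e)."""
    b_see_e = sum(P_see_e[s] * b[s] for s in b)            # b_see(e)
    return P_see_e["s3"] * b["s3"] / b_see_e

b       = {"s1": 0.25, "s2": 0.25, "s3": 0.25, "s4": 0.25}  # initial belief
b_prime = {"s1": 0.0,  "s2": 0.0,  "s3": 0.5,  "s4": 0.5}   # b' = b_a after move(r1,l1,l2)

print(posterior_s3(b))        # 0.5  -- "see" alone is not enough
print(posterior_s3(b_prime))  # 1.0  -- move first, then "see"
```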

Policies on Belief States
Let B be the set of all belief states. In a partially observable domain, a policy is a partial function from B into A.
– S was finite, but B is infinite and continuous
– A policy may be either finite or infinite

Modified Example
Suppose we know the initial belief state is b.
A policy to tell whether there’s a container in l2:
  π = {(b, move(r1,l1,l2)), (b', see)}
[Figure: belief states b and b' = b_a for a = move(r1,l1,l2), as before]

Solving POMDPs
Information-state MDPs:
– Belief states of the POMDP become the states of a new MDP
– Continuous state space
– Discretise (see the sketch below)
Policy-tree algorithm
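As a rough illustration of the discretisation step (the grid resolution k is an arbitrary choice, not from the lecture): enumerate all beliefs whose entries are multiples of 1/k, i.e., a regular grid over the probability simplex, and use those grid points as the states of the information-state MDP.

```python
from itertools import product

def belief_grid(n_states, k):
    """All beliefs over n_states whose entries are multiples of 1/k
    (a regular grid over the probability simplex)."""
    return [tuple(c / k for c in counts)
            for counts in product(range(k + 1), repeat=n_states)
            if sum(counts) == k]

# 4 states, resolution 1/4: C(4+4-1, 4-1) = 35 grid points.
print(len(belief_grid(4, 4)))   # 35
```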

Policy Trees
Tree(a,T) – create a new policy tree with action a at the root and, for each observation z, subtree T(z)
V_p – value-function vector for policy tree p, with one component per state
Act(p) – action at the root of tree p
Subtree(p,z) – subtree of p followed after observation z
Stval(a,z,p) – vector of the probability-weighted value of tree p after doing a and observing z
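A minimal Python sketch of these operations, reusing the POMDP container from earlier (the concrete representation and the exact form of Stval are assumptions; papers differ on where the discount and the observation model appear):

```python
from dataclasses import dataclass

@dataclass
class PolicyTree:
    action: object    # action taken at the root
    children: dict    # observation z -> PolicyTree (one subtree per observation)

def Tree(a, T):
    """New policy tree with action a at the root and subtree T[z] for each observation z."""
    return PolicyTree(action=a, children=dict(T))

def Act(p):
    """Action at the root of tree p."""
    return p.action

def Subtree(p, z):
    """Subtree of p followed after observing z."""
    return p.children[z]

def Stval(a, z, p, pomdp, V_p, gamma=1.0):
    """One common reading of Stval(a,z,p): for each state s, the
    probability-weighted (discounted) value of following tree p after
    doing a in s and then observing z:
        gamma * sum_{s'} P_a(s'|s) * P_a(z|s') * V_p(s')
    V_p is the value vector of tree p (dict state -> value)."""
    return {s: gamma * sum(pomdp.P[a][s][s2] * pomdp.Obs[a][s2][z] * V_p[s2]
                           for s2 in pomdp.states)
            for s in pomdp.states}
```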

Monahan Enumeration Phase
Generate all vectors.
Number of generated vectors = |A| * M^|O|, where M is the number of vectors from the previous step.
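A sketch of the enumeration step, reusing the POMDP container from earlier (the reward convention R[a][s] and the discount are assumptions): each new vector picks one action and, for each observation, one vector from the previous step, which is where the |A| * M^|O| count comes from.

```python
from itertools import product

def enumerate_vectors(pomdp, prev, R, gamma=1.0):
    """Monahan-style enumeration: build all |A| * M^|O| value vectors of
    depth t from the list `prev` of M depth t-1 vectors.

    prev:    list of dicts state -> value
    R[a][s]: immediate reward for doing a in s (assumed convention)
    """
    new_vectors = []
    for a in pomdp.actions:
        # Choose one previous vector for each observation.
        for choice in product(range(len(prev)), repeat=len(pomdp.observations)):
            vec = {}
            for s in pomdp.states:
                future = 0.0
                for z, i in zip(pomdp.observations, choice):
                    # sum_{s'} P_a(s'|s) * P_a(z|s') * V_i(s')
                    future += sum(pomdp.P[a][s][s2] * pomdp.Obs[a][s2][z] * prev[i][s2]
                                  for s2 in pomdp.states)
                vec[s] = R[a][s] + gamma * future
            new_vectors.append(vec)
    return new_vectors
```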

Monahan Reduction Phase
All vectors can be kept:
– Each time, maximize over all vectors.
– But this carries a lot of excess baggage: the number of vectors at the next step will be even larger.
An LP is used to trim away useless vectors.

Remove Dominated Policy Trees
For a vector to be useful, there must be at least one belief point at which it gives a larger value than all the others.
Thus, for each policy tree p we solve an LP that looks for a belief point and a margin d > 0 by which p beats every other policy tree p'.
If the maximum achievable d is 0, p is dominated and we remove it.
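A sketch of that LP using scipy (the variable layout is an assumption): the decision variables are a belief x over the states and a margin d; maximize d subject to x·V_p ≥ x·V_{p'} + d for every other tree p' and x lying on the probability simplex. If the optimal d is not positive, p is dominated and can be pruned; the same LP also serves the witness algorithm below, which needs a belief point where a candidate tree beats the current set U.

```python
import numpy as np
from scipy.optimize import linprog

def find_witness(Vp, others):
    """Dominance-check LP (sketch; requires scipy).

    Vp:     value vector of candidate tree p, length |S|
    others: list of value vectors of the competing trees
    Returns (belief, d): a belief point where p beats every competitor by
    margin d.  If d <= 0, p is dominated and can be pruned.
    """
    n = len(Vp)
    if not others:                       # nothing to compete against
        return np.full(n, 1.0 / n), np.inf
    # Variables: belief x (n entries) and margin d.  linprog minimizes,
    # so minimize -d to maximize d.
    c = np.zeros(n + 1)
    c[-1] = -1.0
    # For every other vector u:  d - x . (Vp - u) <= 0
    A_ub = np.array([np.append(-(np.asarray(Vp) - np.asarray(u)), 1.0)
                     for u in others])
    b_ub = np.zeros(len(others))
    # x must be a probability distribution: sum(x) = 1, 0 <= x_i <= 1.
    A_eq = np.append(np.ones(n), 0.0).reshape(1, -1)
    b_eq = [1.0]
    bounds = [(0.0, 1.0)] * n + [(None, None)]
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=bounds)
    return res.x[:n], res.x[-1]
```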

Witness Algorithm for Enumeration of Policy Trees
Maintain a set U of useful policy trees.
If some tree p is new and not dominated by U, then U is not complete:
– Find a belief point x where p dominates U
– Find the tree p' that dominates all other trees at x and is lexicographically better than all others
– This p' is useful: add it to U
How do we find p'?

Witness/Sondik’s Algorithm Create set U of depth t from U of depth t-1

Homework
1. Read readings for next time: Alex’s paper