1
Decision Making Under Uncertainty, Lec #10: Partially Observable MDPs. UIUC CS 598: Section EA. Professor: Eyal Amir. Spring Semester 2006. Some slides by Jeremy Wyatt (U Birmingham), Alp Sardağ, and Craig Boutilier (Toronto)
2
Partially Observable Planning
3
Today Partially Observable Markov Decision Processes: –Stochastic Domains –Partially observable
4
POMDPs Partially observable Markov Decision Process (POMDP): –a stochastic system Σ = (S, A, P) as before –A finite set O of observations; P_a(o|s) = probability of observation o in state s after executing action a –Require that for each a and s, ∑_{o∈O} P_a(o|s) = 1 O models partial observability –The controller can’t observe s directly; it can only observe o –The same observation o can occur in more than one state Why do the observations depend on the action a? Why do we have P_a(o|s) rather than P(o|s)? –This is a way to model sensing actions, which do not change the state but make some observation available (e.g., from a sensor)
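A minimal sketch of how this tuple could be represented in code (the container and field names are hypothetical, not from the lecture): transition probabilities P_a(s'|s) and observation probabilities P_a(o|s) stored as nested dictionaries, with a check of the normalization requirement above.

```python
# Hypothetical POMDP container (a sketch, not the lecture's notation).
# P[a][s][s']  = transition probability P_a(s' | s)
# Obs[a][s][o] = observation probability P_a(o | s)
from dataclasses import dataclass

@dataclass
class POMDP:
    states: list
    actions: list
    observations: list
    P: dict
    Obs: dict

    def check(self, tol=1e-9):
        # For each action a and state s, observation probabilities must sum to 1.
        for a in self.actions:
            for s in self.states:
                total = sum(self.Obs[a][s][o] for o in self.observations)
                assert abs(total - 1.0) < tol, (a, s, total)
```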
5
Example of a Sensing Action Suppose there are a state s1, an action a1, and an observation o1 with the following properties: –For every state s, P_a1(s|s) = 1 (a1 does not change the state) –P_a1(o1|s1) = 1, and P_a1(o1|s) = 0 for every state s ≠ s1 (after performing a1, o1 occurs if and only if we’re in state s1) Then to tell whether you’re in state s1, just perform action a1 and see whether you observe o1 Two states s and s’ are indistinguishable if for every o and a, P_a(o|s) = P_a(o|s’)
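A small helper over the same hypothetical container that tests the indistinguishability condition just defined: s and s' are indistinguishable iff P_a(o|s) = P_a(o|s') for every action a and observation o.

```python
def indistinguishable(pomdp, s1, s2, tol=1e-9):
    """True iff P_a(o|s1) = P_a(o|s2) for every action a and observation o."""
    return all(
        abs(pomdp.Obs[a][s1][o] - pomdp.Obs[a][s2][o]) < tol
        for a in pomdp.actions
        for o in pomdp.observations
    )
```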
6
Belief States At each point we will have a probability distribution b(s) over the states in S –b is called a belief state (our belief about what state we’re in) Basic properties: –0 ≤ b(s) ≤ 1 for every s in S –∑_{s∈S} b(s) = 1 Definitions: –b_a = the belief state after doing action a in belief state b Thus b_a(s) = P(in s after doing a in b) = ∑_{s'∈S} P_a(s|s') b(s') –b_a(o) = P(observe o after doing a in b) = ∑_{s∈S} P_a(o|s) b_a(s) –b_a^o(s) = P(in s after doing a in b and observing o)
7
Belief States (Continued) Recall that in general, P(x|y,z) P(y|z) = P(x,y|z) Thus P_a(o|s) b_a(s) = P(observe o after doing a in s) · P(in s after doing a in b) = P(in s and observe o after doing a in b) Similarly, b_a^o(s) b_a(o) = P(in s after doing a in b and observing o) · P(observe o after doing a in b) = P(in s and observe o after doing a in b) Thus b_a^o(s) = P_a(o|s) b_a(s) / b_a(o) Can use this to distinguish states that would otherwise be indistinguishable
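The three definitions from the previous slide and the Bayes step above translate directly into code; a sketch built on the hypothetical POMDP container from earlier:

```python
def belief_after_action(pomdp, b, a):
    """b_a(s) = sum over s' of P_a(s | s') * b(s')"""
    return {s: sum(pomdp.P[a][sp].get(s, 0.0) * b[sp] for sp in pomdp.states)
            for s in pomdp.states}

def obs_probability(pomdp, b_a, a, o):
    """b_a(o) = sum over s of P_a(o | s) * b_a(s)"""
    return sum(pomdp.Obs[a][s][o] * b_a[s] for s in pomdp.states)

def belief_after_action_and_obs(pomdp, b, a, o):
    """b_a^o(s) = P_a(o | s) * b_a(s) / b_a(o)"""
    b_a = belief_after_action(pomdp, b, a)
    p_o = obs_probability(pomdp, b_a, a, o)
    return {s: pomdp.Obs[a][s][o] * b_a[s] / p_o for s in pomdp.states}
```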
8
Example Robot r1 can move between l1 and l2 –move(r1,l1,l2) –move(r1,l2,l1) There may be a container c1 in location l2 –in(c1,l2) O = {full, empty} –full: c1 is present –empty: c1 is absent –abbreviate full as f, and empty as e [Figure: belief states b and b_a, for a = move(r1,l1,l2), over the four states s1–s4]
9
Example (Continued) Neither “move” action returns useful observations For every state s and for a = either “move” action, –P_a(f|s) = P_a(e|s) = 0.5 Thus if there are no other actions, then –s1 and s2 are indistinguishable –s3 and s4 are indistinguishable
10
Example (Continued) Suppose there’s a sensing action see that works perfectly in location l2: –P_see(f|s4) = P_see(e|s3) = 1 –P_see(f|s3) = P_see(e|s4) = 0 see does not work elsewhere: –P_see(f|s1) = P_see(e|s1) = P_see(f|s2) = P_see(e|s2) = 0.5 Then –s1 and s2 are still indistinguishable –s3 and s4 are now distinguishable
11
Example (Continued) By itself, see doesn’t tell us the state with certainty: –b_see^e(s3) = P_see(e|s3) · b_see(s3) / b_see(e) = 1 · 0.25 / 0.5 = 0.5 If we first do a = move(r1,l1,l2) and then do see, this tells us the state with certainty: –Let b' = b_a –b'_see^e(s3) = P_see(e|s3) · b'_see(s3) / b'_see(e) = 1 · 0.5 / 0.5 = 1
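A worked version of this calculation, reusing the container and belief-update helpers sketched above. The exact identities of s1 and s2 and the behaviour of move(r1,l1,l2) when the robot is already in l2 are assumptions made only to complete the model; the resulting numbers match the slide and do not depend on them.

```python
# States (an assumed labelling): s1, s2 = robot in l1; s3 = robot in l2, c1 absent;
# s4 = robot in l2, c1 present. move is assumed to map s1->s3, s2->s4 and to be a
# no-op when the robot is already in l2.
S = ['s1', 's2', 's3', 's4']
P = {
    'move': {'s1': {'s3': 1.0}, 's2': {'s4': 1.0}, 's3': {'s3': 1.0}, 's4': {'s4': 1.0}},
    'see':  {s: {s: 1.0} for s in S},                # see does not change the state
}
Obs = {
    'move': {s: {'f': 0.5, 'e': 0.5} for s in S},    # move returns no information
    'see':  {'s1': {'f': 0.5, 'e': 0.5}, 's2': {'f': 0.5, 'e': 0.5},
             's3': {'f': 0.0, 'e': 1.0}, 's4': {'f': 1.0, 'e': 0.0}},
}
pomdp = POMDP(states=S, actions=['move', 'see'], observations=['f', 'e'], P=P, Obs=Obs)

b = {s: 0.25 for s in S}                             # initial uniform belief
print(belief_after_action_and_obs(pomdp, b, 'see', 'e'))
# {'s1': 0.25, 's2': 0.25, 's3': 0.5, 's4': 0.0}  -- see alone leaves s3 at 0.5
b_prime = belief_after_action(pomdp, b, 'move')      # b' = b_a
print(belief_after_action_and_obs(pomdp, b_prime, 'see', 'e'))
# {'s1': 0.0, 's2': 0.0, 's3': 1.0, 's4': 0.0}    -- move then see pins down the state
```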
12
Policies on Belief States Let B be the set of all belief states In a partially observable domain, a policy is a partial function from B into A. S was finite, but B is infinite and continuous –A policy may be either finite or infinite
13
Modified Example Suppose we know the initial belief state is b Policy to tell if there’s a container in l2: –π = {(b, move(r1,l1,l2)), (b', see)}
14
Solving POMDPs Information-state MDPs –Belief states of POMDP are states in new MDP –Continuous state space –Discretise Policy-tree algorithm
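One way to make the discretisation concrete (a sketch of the idea, not an algorithm given in the lecture): lay a regular grid over the belief simplex, run value iteration over the grid points, and approximate V at an off-grid successor belief by the nearest grid point. The array layout T[a][s,s'] = P(s'|s,a), Z[a][s,o] = P(o|s,a), R[a][s] (lists of numpy arrays) is an assumption.

```python
import itertools
import numpy as np

def grid_beliefs(n_states, resolution):
    """All beliefs whose entries are multiples of 1/resolution (a grid on the simplex)."""
    return [np.array(c) / resolution
            for c in itertools.product(range(resolution + 1), repeat=n_states)
            if sum(c) == resolution]

def value_iteration_on_grid(T, Z, R, gamma, resolution=4, iters=50):
    grid = grid_beliefs(T[0].shape[0], resolution)
    V = np.zeros(len(grid))
    nearest = lambda b: min(range(len(grid)), key=lambda i: np.linalg.norm(grid[i] - b))
    for _ in range(iters):
        V_new = np.zeros_like(V)
        for i, b in enumerate(grid):
            q = []
            for a in range(len(T)):
                ba = b @ T[a]                              # belief after the action
                value = float(b @ R[a])                    # expected immediate reward
                for o in range(Z[a].shape[1]):
                    p_o = float(ba @ Z[a][:, o])           # P(o | b, a)
                    if p_o > 1e-12:
                        b_next = (Z[a][:, o] * ba) / p_o   # updated belief b_a^o
                        value += gamma * p_o * V[nearest(b_next)]
                q.append(value)
            V_new[i] = max(q)
        V = V_new
    return grid, V
```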
15
Policy Trees Tree(a,T) – create a new policy tree with action a at the root and T(z) as the subtree for each observation z Vp – vector for the value function of policy tree p, with one component per state Act(p) – action at the root of tree p Subtree(p,z) – subtree of p after observing z Stval(a,z,p) – vector for the probability-weighted value of tree p after a,z
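In this notation the value vector of a policy tree p satisfies V_p(s) = R(s, Act(p)) + γ ∑_z Stval(Act(p), z, Subtree(p, z))(s). A recursive sketch, with the same assumed array layout as above and a tree encoded as (action, {observation: subtree}):

```python
import numpy as np

def policy_tree_value(p, T, Z, R, gamma):
    """V_p, one value per state; p = (action, {observation: subtree}), leaves have {}."""
    a, subtrees = p
    V = np.array(R[a], dtype=float)
    for z, sub in subtrees.items():
        V_sub = policy_tree_value(sub, T, Z, R, gamma)
        # Stval(a, z, sub)(s) = sum over s' of P(s'|s,a) * P(z|s',a) * V_sub(s')
        V += gamma * (T[a] @ (Z[a][:, z] * V_sub))
    return V
```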
16
Monahan Enumeration Phase Generate all vectors: Number of generated vectors = |A|·M^|O|, where M is the number of vectors from the previous step
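A sketch of the enumeration step under the same assumed array layout: for each action and each way of assigning one previous-step vector to each observation, build one new value vector, giving |A|·M^|O| vectors in total.

```python
import itertools
import numpy as np

def monahan_enumerate(prev_vectors, T, Z, R, gamma):
    """One enumeration step: returns |A| * M**|O| vectors, M = len(prev_vectors)."""
    n_obs = Z[0].shape[1]
    new_vectors = []
    for a in range(len(T)):
        for choice in itertools.product(prev_vectors, repeat=n_obs):
            V = np.array(R[a], dtype=float)
            for z, V_z in enumerate(choice):
                V += gamma * (T[a] @ (Z[a][:, z] * V_z))
            new_vectors.append(V)
    return new_vectors
```

For the first step, prev_vectors can be seeded with a single all-zero vector.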
17
Monahan Reduction Phase All vectors can be kept: –Each time, maximize over all vectors –A lot of excess baggage –The number of vectors at the next step will be even larger An LP is used to trim away useless vectors
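The "maximize over all vectors" step is just evaluating the piecewise-linear, convex value function at a belief point; for reference:

```python
import numpy as np

def value_at_belief(b, vectors):
    """V(b) = max over kept vectors V_p of the dot product b . V_p."""
    return max(float(np.dot(b, V)) for V in vectors)
```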
18
Remove Dominated Policy Trees For a vector to be useful, there must be at least one belief point at which it gives a larger value than all the others Thus, for each policy tree p we solve an LP that maximizes the margin d by which p beats every other tree p' If the best achievable d is not positive, p is dominated and we remove it
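A sketch of this LP test using scipy (each value vector is assumed to be a numpy array over states). The LP variables are the belief b and the margin d; the vector Vp is kept only if the maximal d is strictly positive.

```python
import numpy as np
from scipy.optimize import linprog

def is_useful(Vp, others, eps=1e-9):
    """Keep Vp only if some belief gives it a strictly larger value than every vector in others."""
    n = len(Vp)
    if not others:
        return True
    c = np.zeros(n + 1)
    c[-1] = -1.0                                         # maximize d  ==  minimize -d
    # Constraints: for each other vector Vq,  b.(Vq - Vp) + d <= 0
    A_ub = np.array([np.append(Vq - Vp, 1.0) for Vq in others])
    b_ub = np.zeros(len(others))
    A_eq = np.array([np.append(np.ones(n), 0.0)])        # belief entries sum to 1
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=[1.0],
                  bounds=[(0.0, 1.0)] * n + [(None, None)])
    return bool(res.success and res.x[-1] > eps)
```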
19
Witness Algorithm for Enumeration of Policy Trees Maintain a set U of useful policies If a new p is not dominated by U, then U is not complete: –Find a belief x where p dominates U –Find the p’ that dominates all other trees at x, and is lexicographically better than all the others –This p’ is useful – add it to U How do we find p’?
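Finding x can be done with essentially the same LP as the pruning test above, except that we keep the maximizing belief rather than just the sign of the margin; a sketch:

```python
import numpy as np
from scipy.optimize import linprog

def find_witness(V_candidate, U_vectors, eps=1e-9):
    """Return a belief x where V_candidate strictly beats every vector in U, or None."""
    n = len(V_candidate)
    if not U_vectors:
        return np.full(n, 1.0 / n)                       # any belief witnesses an empty U
    c = np.zeros(n + 1)
    c[-1] = -1.0                                         # maximize the margin d
    A_ub = np.array([np.append(Vq - V_candidate, 1.0) for Vq in U_vectors])
    b_ub = np.zeros(len(U_vectors))
    A_eq = np.array([np.append(np.ones(n), 0.0)])        # belief entries sum to 1
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=[1.0],
                  bounds=[(0.0, 1.0)] * n + [(None, None)])
    if res.success and res.x[-1] > eps:
        return res.x[:-1]                                # the witness belief point
    return None
```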
20
Witness/Sondik’s Algorithm Create set U of depth t from U of depth t-1
21
Homework 1. Read readings for next time: Alex’s paper