Partially-Observable Markov Decision Processes. Tom Dietterich, MCAI 2013
Markov Decision Process as a Decision Diagram
What If We Can’t Directly Observe the State?
POMDPs are Hard to Solve. There is a tradeoff between taking actions to gain information and taking actions to change the world; some actions do both.
Optimal Management of Difficult-to-Observe Invasive Species [Regan et al., 2011]. Branched broomrape (Orobanche ramosa) is an annual parasitic plant that attaches to the root system of a host plant, causing a 75-90% reduction in host biomass. Each plant makes roughly 50,000 seeds, and the seeds remain viable for 12 years.
Quarantine Area in South Australia: 375 farms in a 70 km x 70 km area.
Formulation as a POMDP: Single Farm. [State diagram]
Optimal MDP Policy: if a plant is detected, Fumigate; else, Do Nothing. This assumes perfect detection.
[Policy FSM diagram: decision states 0 and 1; actions Fumigate and Do Nothing; transitions labeled by the observations ABSENT and PRESENT. The resulting policy is the same as the optimal MDP policy.]
[FSM policy diagram: states 0 through 16; Fumigate while observations are PRESENT, advance through states on ABSENT observations, and Do Nothing after state 16.]
Probability of Eradication. [Results chart]
Discussion
Ways to Avoid a POMDP (1)
Ways to Avoid a POMDP (2)
Formulation as an MDP
Belief States. [Diagram: belief distribution over the underlying states empty, seeds, and weeds + seeds]
Belief State Reasoning. Each observation updates the belief state. Example: observing the presence of weeds means weeds are present, and seeds might also be present. [Diagram: belief over (empty, seeds, weeds + seeds) before and after observing "present"]
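The observation update described above can be sketched as a Bayes-rule filter over the three belief components. The observation-model probabilities below are illustrative assumptions, not values from the talk:

```python
# Bayes-rule observation update for a 3-state belief.
# States: 0 = empty, 1 = seeds only, 2 = weeds + seeds.
# P(observe "present" | state): illustrative numbers, not from the talk.
# Seeds alone are unobservable; weeds are sometimes missed.
P_PRESENT_GIVEN_STATE = [0.0, 0.0, 0.9]

def observe(belief, saw_weeds):
    """Update the belief: b(s) <- P(o | s) * b(s) / P(o)."""
    likelihood = [p if saw_weeds else 1.0 - p for p in P_PRESENT_GIVEN_STATE]
    unnorm = [l * b for l, b in zip(likelihood, belief)]
    total = sum(unnorm)
    return [u / total for u in unnorm]

# Observing weeds rules out "empty" and "seeds only":
print(observe([0.5, 0.3, 0.2], saw_weeds=True))  # -> [0.0, 0.0, 1.0]
```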
Taking Actions. Each action also updates the belief state. Example: fumigate. [Diagram: belief over (empty, seeds, weeds + seeds) before and after fumigating]
Belief MDP
State space: all reachable belief states
Action space: the same actions as the POMDP
Reward function: expected rewards derived from the underlying states
Transition function: moves in belief space
Problem: the belief space is continuous, and there can be an immense number of reachable belief states
Monte Carlo Policy Evaluation. Key insight: it is just as easy to evaluate a policy by Monte Carlo trials in a POMDP as in an MDP! Approach: define a space of policies, evaluate each by Monte Carlo trials, and pick the best one.
Finite State Machine Policies. In many POMDPs (and MDPs), a policy can be represented as a finite state machine. We can design a set of FSM policies and then evaluate them, and there are algorithms for incrementally improving FSM policies. [Diagram: the FSM policy shown earlier, with states 0 through 16]
Summary