
1 Partially-Observable Markov Decision Processes (Tom Dietterich, MCAI 2013)

2 Markov Decision Process as a Decision Diagram

3 What If We Can’t Directly Observe the State?

4 POMDPs are Hard to Solve
- Tradeoff between taking actions to gain information and taking actions to change the world
- Some actions can do both

5 Optimal Management of Difficult-to-Observe Invasive Species [Regan et al., 2011]
- Branched Broomrape (Orobanche ramosa)
- Annual parasitic plant that attaches to the root system of a host plant
- Results in a 75-90% reduction in host biomass
- Each plant makes ~50,000 seeds
- Seeds are viable for 12 years

6 Quarantine Area in S. Australia
- 375 farms in a 70 km x 70 km area (map: Google Maps)

7 Formulation as a POMDP: Single Farm
[State diagram]

8 Optimal MDP Policy
- If plant is detected, Fumigate; else, Do Nothing
- Assumes perfect detection
(photo: www.grdc.com.au)

9 Same as the Optimal MDP Policy
[Finite-state controller diagram: decision states 0 (Do Nothing) and 1 (Fumigate); a PRESENT observation leads to Fumigate, an ABSENT observation back to Do Nothing]

10 [Finite-state controller diagram: a chain of Fumigate states 0, 1, 2, ..., 16; a PRESENT observation restarts the chain, and only after a long run of consecutive ABSENT observations does the controller switch to Do Nothing]

11 Probability of Eradication

12 Discussion

13 Ways to Avoid a POMDP (1)

14 Ways to Avoid a POMDP (2)

15 Formulation as an MDP

16 Belief States
[Diagram: beliefs are probability distributions over the three underlying states: empty, seeds, weeds + seeds]

17 Belief State Reasoning
- Each observation updates the belief state
- Example: observing the presence of weeds means weeds are present and seeds might also be present
[Diagram: belief over {empty, seeds, weeds + seeds} before and after an "observe present" update]
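A minimal Python sketch of this observation update as a Bayes' rule step over the three-state belief. The detection probability (0.8) and the starting belief are illustrative assumptions, not numbers from the talk:

```python
# Bayesian belief update after one weed survey (illustrative numbers).
STATES = ["empty", "seeds", "weeds+seeds"]

# Assumed P(observe PRESENT | state): weeds are visible only in the
# "weeds+seeds" state, and even then detection is imperfect.
P_PRESENT = {"empty": 0.0, "seeds": 0.0, "weeds+seeds": 0.8}

def observe(belief, saw_weeds):
    """Condition the belief on one ABSENT/PRESENT observation."""
    new = {}
    for s in STATES:
        likelihood = P_PRESENT[s] if saw_weeds else 1.0 - P_PRESENT[s]
        new[s] = likelihood * belief[s]
    total = sum(new.values())
    assert total > 0, "observation impossible under current belief"
    return {s: p / total for s, p in new.items()}

belief = {"empty": 0.5, "seeds": 0.3, "weeds+seeds": 0.2}
# After seeing weeds, all mass moves to "weeds+seeds", as on the slide.
print(observe(belief, saw_weeds=True))
```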

18 Taking Actions
- Each action updates the belief state
- Example: fumigate
[Diagram: belief over {empty, seeds, weeds + seeds} before and after a fumigate action]
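The action update is a prediction step that pushes the belief through a transition model. The fumigation survival probabilities below are placeholders for illustration, not values from Regan et al.:

```python
# Belief prediction step for the "fumigate" action (illustrative numbers).
STATES = ["empty", "seeds", "weeds+seeds"]

# Assumed P(next_state | state) under fumigation: growing plants are
# killed, but dormant seeds may survive in the seed bank.
T_FUMIGATE = {
    "empty":       {"empty": 1.0, "seeds": 0.0, "weeds+seeds": 0.0},
    "seeds":       {"empty": 0.3, "seeds": 0.7, "weeds+seeds": 0.0},
    "weeds+seeds": {"empty": 0.1, "seeds": 0.9, "weeds+seeds": 0.0},
}

def act(belief, transition):
    """Push the belief through the transition model (prediction step)."""
    new = {s: 0.0 for s in STATES}
    for s, p in belief.items():
        for s2, t in transition[s].items():
            new[s2] += p * t
    return new

belief = {"empty": 0.0, "seeds": 0.0, "weeds+seeds": 1.0}
print(act(belief, T_FUMIGATE))  # mass moves from weeds+seeds to seeds/empty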

19 Belief MDP
- State space: all reachable belief states
- Action space: same actions as the POMDP
- Reward function: expected rewards derived from the underlying states
- Transition function: moves in belief space
- Problem: the belief space is continuous, and there can be an immense number of reachable belief states
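These bullets correspond to the standard belief-MDP construction. Written out (notation assumed, not on the slide), with R the underlying reward, T the state transition model, and O the observation model:

```latex
% Expected reward of action a in belief state b:
\rho(b, a) = \sum_{s} b(s)\, R(s, a)

% Belief transition after taking a and observing o (a Bayes filter step;
% this combines the act-then-observe updates from the previous slides):
\tau(b, a, o)(s') \propto O(o \mid s', a) \sum_{s} T(s' \mid s, a)\, b(s)
```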

20 Monte Carlo Policy Evaluation
- Key insight: it is just as easy to evaluate a policy via Monte Carlo trials in a POMDP as in an MDP!
- Approach (sketched in code below):
  - Define a space of policies
  - Evaluate them by Monte Carlo trials
  - Pick the best one
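A minimal Python sketch of the evaluation loop, assuming a hypothetical simulator interface (reset() / step(action) returning observation, reward, done) and a policy with reset() / act(observation); neither interface comes from the talk:

```python
# Monte Carlo policy evaluation for a POMDP via a black-box simulator.
def evaluate(policy, simulator, n_trials=1000, horizon=50, gamma=0.96):
    """Average discounted return of `policy` over simulated episodes."""
    total = 0.0
    for _ in range(n_trials):
        obs = simulator.reset()     # sample a hidden start state
        policy.reset()              # reset the policy's internal memory
        ret, discount = 0.0, 1.0
        for _ in range(horizon):
            action = policy.act(obs)
            obs, reward, done = simulator.step(action)
            ret += discount * reward
            discount *= gamma
            if done:
                break
        total += ret
    return total / n_trials
```

Note that the evaluator never needs the hidden state: it only sees observations and rewards, which is why this works as easily for a POMDP as for an MDP.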

21 Finite State Machine Policies
- In many POMDPs (and MDPs), a policy can be represented as a finite state machine
- We can design a set of FSM policies and then evaluate them
- There are algorithms for incrementally improving FSM policies
[Finite-state controller diagram repeated from slide 10]
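One plausible reading of the controller drawn on this slide (and on slide 10), expressed in Python: node 0 emits Do Nothing, a PRESENT observation jumps to the top of a chain of Fumigate nodes, and ABSENT observations count down toward Do Nothing. The chain length of 16 is read off the diagram and should be treated as an assumption:

```python
class FSMPolicy:
    """Finite-state controller: node 0 = Do Nothing, nodes 1..N = Fumigate."""

    def __init__(self, chain_length=16):
        self.chain_length = chain_length
        self.node = 0

    def reset(self):
        self.node = 0

    def act(self, observation):
        # Emit the action for the current node, then transition on the
        # observation: PRESENT restarts the fumigation chain, ABSENT
        # steps one node closer to Do Nothing.
        action = "nothing" if self.node == 0 else "fumigate"
        if observation == "PRESENT":
            self.node = self.chain_length
        elif self.node > 0:
            self.node -= 1
        return action
```

This class matches the reset() / act(observation) interface assumed by the Monte Carlo evaluator above, so candidate controllers like this one can be compared directly by simulation.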

22 Summary

