Partially-Observable Markov Decision Processes
Tom Dietterich
MCAI 2013
Markov Decision Process as a Decision Diagram
What If We Can’t Directly Observe the State?
POMDPs are Hard to Solve
Tradeoff between taking actions to gain information and taking actions to change the world
– Some actions can do both
Optimal Management of Difficult-to-Observe Invasive Species [Regan et al., 2011]
Branched Broomrape (Orobanche ramosa)
– Annual parasitic plant
– Attaches to the root system of its host plant
– Causes a 75-90% reduction in host biomass
– Each plant makes ~50,000 seeds
– Seeds remain viable for 12 years
Quarantine Area in S. Australia
375 farms; 70 km x 70 km area
[Figure: map of the quarantine area (Google Maps)]
Formulation as a POMDP: Single Farm
[Figure: state diagram]
Optimal MDP Policy
If the plant is detected, Fumigate; else Do Nothing
– Assumes perfect detection
Same as the Optimal MDP Policy
[Figure: finite-state-machine policy. Each decision state carries an action (Fumigate or Nothing); the observation (ABSENT or PRESENT) determines the next decision state.]
[Figure: finite-state-machine policy for the POMDP. After a PRESENT observation the policy fumigates; a chain of states 0, 1, 2, ..., 16 counts consecutive ABSENT observations, and only after the full count does the policy return to Nothing.]
Probability of Eradication
Discussion
Ways to Avoid a POMDP (1)
Ways to Avoid a POMDP (2)
Formulation as an MDP
Belief States
[Figure: belief distribution over the three states: empty, seeds, weeds + seeds]
Belief State Reasoning
Each observation updates the belief state
– Example: observing the presence of weeds means weeds are present and seeds might also be present
[Figure: belief over {empty, seeds, weeds + seeds} before and after observing "present"]
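The observation update on the slide is just Bayes' rule applied to the belief vector. Below is a minimal sketch for the three-state broomrape model; the detection probabilities are illustrative assumptions, not values from Regan et al.

```python
# Belief is a distribution over three states:
# 0 = empty, 1 = seeds only, 2 = weeds + seeds.

def observation_update(belief, obs, p_obs_given_state):
    """Bayes' rule: b'(s) is proportional to P(obs | s) * b(s)."""
    posterior = [p_obs_given_state[obs][s] * belief[s]
                 for s in range(len(belief))]
    z = sum(posterior)  # probability of the observation under the belief
    return [p / z for p in posterior]

# Assumed detection model (illustrative): weeds are detected with
# probability 0.8 when present, and never falsely detected.
p_obs = {
    "present": [0.0, 0.0, 0.8],
    "absent":  [1.0, 1.0, 0.2],
}

b = [1/3, 1/3, 1/3]
b_after = observation_update(b, "present", p_obs)
# With no false positives, seeing weeds puts all mass on "weeds + seeds".
```

Note how the asymmetry on the slide falls out of the model: a "present" observation is conclusive (no false positives), while an "absent" observation only shifts mass away from "weeds + seeds" without ruling it out.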
Taking Actions
Each action updates the belief state
– Example: fumigate
[Figure: belief over {empty, seeds, weeds + seeds} before and after fumigating]
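The action update pushes the belief through the transition model. A sketch, again with an assumed (illustrative) transition matrix for fumigation in the three-state model:

```python
# States: 0 = empty, 1 = seeds only, 2 = weeds + seeds.

def action_update(belief, T):
    """Prediction step: b'(s2) = sum over s of P(s2 | s, action) * b(s)."""
    n = len(belief)
    return [sum(T[s][s2] * belief[s] for s in range(n)) for s2 in range(n)]

# Assumed effect of Fumigate: growing weeds are killed, but seeds
# survive in the seed bank, so "weeds + seeds" collapses to "seeds only".
T_fumigate = [
    [1.0, 0.0, 0.0],   # empty stays empty
    [0.0, 1.0, 0.0],   # seeds remain seeds
    [0.0, 1.0, 0.0],   # weeds killed, seeds remain
]

b = [0.2, 0.3, 0.5]
b_after = action_update(b, T_fumigate)
# Mass from "weeds + seeds" moves onto "seeds only": roughly [0.2, 0.8, 0.0]
```

A full belief-MDP step alternates these two updates: apply the action's transition matrix, then condition on the resulting observation.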
Belief MDP
– State space: all reachable belief states
– Action space: same actions as the POMDP
– Reward function: expected rewards derived from the underlying states
– Transition function: moves in belief space
Problem: the belief space is continuous, and there can be an immense number of reachable states
Monte Carlo Policy Evaluation
Key Insight: it is just as easy to evaluate a policy via Monte Carlo trials in a POMDP as in an MDP!
Approach:
– Define a space of policies
– Evaluate them by Monte Carlo trials
– Pick the best one
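The key insight can be made concrete with a short sketch: the simulator keeps the true state internally, while the policy sees only observations, so running trials is no harder than in an MDP. The toy step function and its numbers below are illustrative assumptions, not the model from the broomrape study.

```python
import random

def simulate_episode(policy, step_fn, init_state, horizon, gamma=0.95):
    """Run one trial: the policy sees only observations, never the state."""
    state, total, discount, obs = init_state, 0.0, 1.0, None
    for _ in range(horizon):
        action = policy(obs)
        state, obs, reward = step_fn(state, action)
        total += discount * reward
        discount *= gamma
    return total

def evaluate(policy, step_fn, init_state, horizon, n_trials=1000):
    """Average discounted return over independent Monte Carlo trials."""
    returns = [simulate_episode(policy, step_fn, init_state, horizon)
               for _ in range(n_trials)]
    return sum(returns) / n_trials

# Toy two-state simulator: 1 = infested, 0 = clean (numbers illustrative).
def step_fn(state, action):
    if action == "fumigate":
        next_state = 0 if (state == 1 and random.random() < 0.9) else state
        reward = -10                         # cost of fumigation
    else:
        next_state = state
        reward = -5 if state == 1 else 0     # ongoing damage if infested
    obs = "present" if (next_state == 1 and random.random() < 0.8) else "absent"
    return next_state, obs, reward

always_fumigate = lambda obs: "fumigate"
do_nothing = lambda obs: "nothing"
v = evaluate(always_fumigate, step_fn, init_state=1, horizon=20, n_trials=200)
```

Picking the best member of a policy space is then just calling `evaluate` on each candidate and taking the argmax.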
Finite State Machine Policies
In many POMDPs (and MDPs), a policy can be represented as a finite state machine
– We can design a set of FSM policies and then evaluate them
– There are algorithms for incrementally improving FSM policies
[Figure: the FSM policy from the earlier slide, with counting states 0 through 16]
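An FSM policy is just a table of actions per node plus observation-driven transitions. A minimal sketch in the spirit of the diagram, using an assumed small count (`N = 3`) in place of the diagram's 16 states:

```python
class FSMPolicy:
    """Finite-state-machine policy: each node carries an action,
    and each observation moves the machine to a new node."""
    def __init__(self, actions, transitions, start=0):
        self.actions = actions          # node -> action
        self.transitions = transitions  # (node, observation) -> next node
        self.node = start

    def act(self, obs=None):
        if obs is not None:
            self.node = self.transitions[(self.node, obs)]
        return self.actions[self.node]

# Toy version of the slide's policy: after a detection, fumigate until
# N consecutive ABSENT observations, then return to doing nothing.
N = 3  # illustrative; the slide's diagram counts up to 16
actions = {i: "fumigate" for i in range(N)}
actions[N] = "nothing"
transitions = {}
for i in range(N + 1):
    transitions[(i, "present")] = 0             # any detection restarts the count
    transitions[(i, "absent")] = min(i + 1, N)  # count absences toward quitting

policy = FSMPolicy(actions, transitions, start=N)
```

Because such a policy is a small discrete object, the Monte Carlo approach from the previous slide applies directly: enumerate or locally perturb FSMs and keep the one with the best estimated return.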
Summary