Partially-Observable Markov Decision Processes
Tom Dietterich, MCAI 2013

Markov Decision Process as a Decision Diagram

What If We Can't Directly Observe the State?

POMDPs are Hard to Solve
- Tradeoff between taking actions to gain information and taking actions to change the world
- Some actions can do both

Optimal Management of Difficult-to-Observe Invasive Species [Regan et al., 2011]
- Branched broomrape (Orobanche ramosa)
- Annual parasitic plant
- Attaches to the root system of a host plant
- Results in a 75-90% reduction in host biomass
- Each plant makes ~50,000 seeds
- Seeds are viable for 12 years

Quarantine Area in S. Australia
- 375 farms; 70 km x 70 km area (map: Google Maps)

Formulation as a POMDP: Single Farm
[Figure: state diagram for the single-farm model]

Optimal MDP Policy
- If the plant is detected, Fumigate; else Do Nothing
- Assumes perfect detection
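This policy is simple enough to write down directly. A one-line sketch; the action labels are mine, not from the talk:

```python
def optimal_mdp_policy(plant_detected: bool) -> str:
    """Optimal policy of the underlying MDP, assuming perfect detection."""
    return "fumigate" if plant_detected else "nothing"
```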

Same as the Optimal MDP Policy
[Policy diagram: actions Fumigate and Nothing; transitions on observations ABSENT and PRESENT; columns labelled Action, Observation, Decision State, After State]

[Policy diagram: finite-state controller with numbered states (0, 1, 2, ... 16); actions Deny, Fumigate, and Nothing; transitions on observations ABSENT (ABS) and PRESENT]

Probability of Eradication
[Results plot]

Discussion

Ways to Avoid a POMDP (1)

Ways to Avoid a POMDP (2)

Formulation as an MDP

Belief States
A belief state is a probability distribution over the underlying states.
[Figure: belief over the states empty, seeds, and weeds + seeds]

Belief State Reasoning
- Each observation updates the belief state.
- Example: observing the presence of weeds means weeds are present and seeds might also be present.
[Figure: belief over (empty, seeds, weeds + seeds) before and after the observation "present"]
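As an illustration of how an observation updates a belief, here is a minimal Python sketch of the Bayes-rule update. The observation model O and the example probabilities are made up for illustration; they are not the numbers from Regan et al.

```python
import numpy as np

STATES = ["empty", "seeds", "weeds+seeds"]

def observation_update(belief, obs, O):
    """Bayes-rule belief update: b'(s) is proportional to P(obs | s) * b(s).
    belief: length-3 probability vector over STATES
    O[s, obs]: probability of observation obs in state s (obs: 0=absent, 1=present)"""
    posterior = belief * O[:, obs]
    return posterior / posterior.sum()

# Illustrative (made-up) observation model: weeds are detected only when present,
# and even then detection can fail.
O = np.array([[1.0, 0.0],    # empty: never observe "present"
              [1.0, 0.0],    # seeds only: plant not visible above ground
              [0.3, 0.7]])   # weeds + seeds: detected 70% of the time

b = np.array([0.5, 0.3, 0.2])
print(observation_update(b, obs=1, O=O))  # all mass moves to weeds+seeds
```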

Taking Actions
- Each action also updates the belief state.
- Example: fumigate.
[Figure: belief over (empty, seeds, weeds + seeds) before and after fumigating]
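The corresponding prediction step pushes the belief through the action's transition model. Again a minimal sketch with a hypothetical transition matrix, not the model from the paper:

```python
import numpy as np

def action_update(belief, T_a):
    """Predicted belief after taking an action (before the next observation).
    T_a[s, s'] = P(s' | s, action); belief is a probability vector."""
    return belief @ T_a

# Illustrative (made-up) transition model for "fumigate": plants above ground are
# killed, but buried seeds may survive.
T_fumigate = np.array([[1.0, 0.0, 0.0],
                       [0.4, 0.6, 0.0],
                       [0.4, 0.6, 0.0]])

print(action_update(np.array([0.0, 0.0, 1.0]), T_fumigate))  # -> [0.4, 0.6, 0.0]
```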

Belief MDP
- State space: all reachable belief states
- Action space: the same actions as the POMDP
- Reward function: expected rewards derived from the underlying states
- Transition function: moves in belief space
- Problem: the belief space is continuous, and there can be an immense number of reachable belief states
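For concreteness, here is a sketch of the two quantities that turn a POMDP into a belief MDP: the expected reward of a belief, and the belief-to-belief transition (action update followed by an observation update, weighted by the probability of each observation). The function and variable names are mine, not from the talk.

```python
import numpy as np

def belief_reward(belief, R_a):
    """Expected immediate reward in the belief MDP: sum_s b(s) * R(s, a)."""
    return belief @ R_a

def belief_transitions(belief, T_a, O):
    """All successor beliefs for one action, with their probabilities.
    Returns a list of (P(obs | b, a), next_belief) pairs."""
    predicted = belief @ T_a                        # action (prediction) update
    successors = []
    for obs in range(O.shape[1]):
        p_obs = predicted @ O[:, obs]               # probability of this observation
        if p_obs > 0:
            next_b = predicted * O[:, obs] / p_obs  # observation (Bayes) update
            successors.append((p_obs, next_b))
    return successors
```

Even in this three-state problem, the set of reachable beliefs grows with every distinct action-observation history, which is the slide's point about the size of the space.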

Monte Carlo Policy Evaluation
- Key insight: it is just as easy to evaluate a policy via Monte Carlo trials in a POMDP as it is in an MDP!
- Approach: define a space of policies, evaluate them by Monte Carlo trials, and pick the best one.
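A minimal rollout sketch of why this works: the simulator tracks the hidden state and samples transitions and observations, while the policy only ever sees observations. The `pomdp` and `policy` interfaces (and the horizon and discount defaults) are hypothetical stand-ins, not code from the talk.

```python
def run_episode(pomdp, policy, horizon=50, gamma=0.96):
    """Simulate one episode. The simulator knows the hidden state s;
    the policy is driven only by the observations it receives."""
    s = pomdp.sample_initial_state()
    policy.reset()
    ret, discount = 0.0, 1.0
    for _ in range(horizon):
        a = policy.act()                       # action from the policy's internal memory
        ret += discount * pomdp.reward(s, a)
        s = pomdp.sample_next_state(s, a)      # hidden from the policy
        o = pomdp.sample_observation(s, a)
        policy.observe(o)                      # policy updates its memory
        discount *= gamma
    return ret

def monte_carlo_value(pomdp, policy, n_trials=1000):
    """Average discounted return over repeated simulated trials."""
    return sum(run_episode(pomdp, policy) for _ in range(n_trials)) / n_trials
```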

Finite State Machine Policies
- In many POMDPs (and MDPs), a policy can be represented as a finite state machine.
- We can design a set of FSM policies and then evaluate them.
- There are algorithms for incrementally improving FSM policies.
[Policy diagram: finite-state controller with numbered states (0, 1, 2, ... 16); actions Deny, Fumigate, and Nothing; transitions on observations ABSENT (ABS) and PRESENT]
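One way to represent such a controller, compatible with the rollout sketch above. The class and the toy controller built at the end (fumigate until the weed has been observed absent several times in a row) are my illustration of the idea, not the exact controller from the slide.

```python
class FSMPolicy:
    """A policy whose memory is a single controller node: each node carries an
    action, and the observation received selects the next node."""
    def __init__(self, node_action, node_transition, start=0):
        self.node_action = node_action          # node -> action
        self.node_transition = node_transition  # (node, observation) -> next node
        self.start = start

    def reset(self):
        self.node = self.start

    def act(self):
        return self.node_action[self.node]

    def observe(self, obs):
        self.node = self.node_transition[(self.node, obs)]

# Toy controller: keep fumigating until "absent" has been observed k times in a row,
# then do nothing; any "present" observation restarts the count.
k = 3
actions = {n: "fumigate" for n in range(k)}
actions[k] = "nothing"
transitions = {(n, "absent"): min(n + 1, k) for n in range(k + 1)}
transitions.update({(n, "present"): 0 for n in range(k + 1)})
policy = FSMPolicy(actions, transitions)
```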

Summary