Planning to Gather Information


Planning to Gather Information
Richard Dearden, University of Birmingham
Joint work with Moritz Göbelbecker (ALU), Charles Gretton, Bramley Merton (NOC), Zeyn Saigol, Mohan Sridharan (Texas Tech), Jeremy Wyatt

Underwater Vent Finding
- An AUV is used to find underwater vents.
- It can detect a vent itself reliably, and can also detect the plume of fresh water a vent emits.
- The problem is deciding where to go to collect data so that the vents are found as efficiently as possible.
- This is hard because plume detection is unreliable, and we can't easily assign ‘blame’ for the detections we do make.

Vision Algorithm Planning
- Goal: answer queries and execute commands, e.g. "Is there a red triangle in the scene?" or "Move the mug to the right of the blue circle."
- Our operators: colour, shape, SIFT identification, viewpoint change, zoom, etc.
- Problem: build a plan that achieves the goal with high confidence.

Assumptions
- The visual operators are unreliable.
- Their reliability can be represented by a confusion matrix, computed from data.
- Speed of response and answering the query correctly are what really matter.
- We want to build the fastest plan that is ‘reliable enough’.
- We should include planning time in our performance estimate too.
[Table: confusion matrix of observed vs. actual labels over square/circle/triangle; the visible entries give P(observe square | actual square) = 0.85, P(observe circle | actual square) = 0.1, P(observe triangle | actual square) = 0.05, plus a further entry of 0.80.]
Note: the confusion matrices are learned from data; colour is much more reliable than shape or SIFT.
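As a concrete illustration of the assumption above, here is a minimal sketch (not from the talk) of using a learned confusion matrix as the observation model for one operator. Only the 0.85 / 0.1 / 0.05 row comes from the slide; the label set and all other values are placeholders.

```python
# Minimal sketch: one shape operator's reliability as a confusion matrix.
import numpy as np

LABELS = ["square", "circle", "triangle"]

# Rows = actual shape, columns = observed shape; each row sums to 1.
CONFUSION = np.array([
    [0.85, 0.10, 0.05],   # actual square (values from the slide)
    [0.10, 0.80, 0.10],   # actual circle (placeholder values)
    [0.05, 0.15, 0.80],   # actual triangle (placeholder values)
])

def observation_prob(actual: str, observed: str) -> float:
    """P(observed label | actual label) under the confusion matrix."""
    return float(CONFUSION[LABELS.index(actual), LABELS.index(observed)])

def sample_observation(actual: str, rng=np.random.default_rng(0)) -> str:
    """Simulate one noisy run of the shape operator on a region."""
    return str(rng.choice(LABELS, p=CONFUSION[LABELS.index(actual)]))

print(observation_prob("square", "circle"))   # 0.1
print(sample_observation("triangle"))
```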

POMDPs: Partially Observable Markov Decision Problems
- MDPs: (discrete) states, stochastic actions, rewards; maximise expected (discounted) long-term reward; assumes the state is completely observable.
- POMDPs: MDPs with observations; the state must be inferred from a (sequence of) observations.
- Typically we maintain a belief state and plan over that.
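Since everything that follows revolves around maintaining a belief state, a small hedged sketch of the standard belief update, b'(s') proportional to P(o | s', a) * sum_s P(s' | s, a) b(s), may help; the array layout and the toy numbers are assumptions for illustration.

```python
# Standard POMDP belief update (one Bayes filter step); indexing convention assumed.
import numpy as np

def belief_update(b, T, O, a, o):
    """b: belief over states, shape (S,).
    T[a, s, s'] = P(s' | s, a); O[a, s', o] = P(o | s', a).
    Returns the normalised posterior belief after doing a and observing o."""
    predicted = b @ T[a]                  # sum_s P(s'|s,a) b(s)
    unnormalised = O[a, :, o] * predicted
    return unnormalised / unnormalised.sum()

# Toy example: 2 states, 1 action, 2 observations.
T = np.array([[[0.9, 0.1],
               [0.2, 0.8]]])
O = np.array([[[0.7, 0.3],
               [0.4, 0.6]]])
print(belief_update(np.array([0.5, 0.5]), T, O, a=0, o=0))  # ~[0.68, 0.32]
```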

POMDP Formulation
- States: Cartesian product of the individual state vectors.
- Actions: A = {Colour, Shape, SIFT, terminal actions}.
- Observations: {red, green, blue, circle, triangle, square, empty, unknown}.
- Transition function.
- Observation function: given by the confusion matrices.
- Reward specification: time cost of actions, large +ve/-ve rewards on terminal actions.
- Maintain a belief over states and the likelihood of action outcomes.
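To make the formulation concrete, here is a sketch of a container for this POMDP; the field names and comments are illustrative assumptions based on the slide, not the authors' code.

```python
# Sketch of the flat vision POMDP as a plain container (names are illustrative).
from dataclasses import dataclass
from typing import Callable, Sequence

@dataclass
class VisionPOMDP:
    states: Sequence[str]        # Cartesian product of per-ROI colour x shape states
    actions: Sequence[str]       # {"Colour", "Shape", "SIFT"} plus terminal answers
    observations: Sequence[str]  # {"red", "green", "blue", "circle", ..., "unknown"}
    transition: Callable[[str, str, str], float]    # P(s' | s, a)
    observation: Callable[[str, str, str], float]   # P(o | s', a), from confusion matrices
    reward: Callable[[str, str], float]             # time cost + terminal rewards
```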

POMDP Formulation
- For a broad query (‘what is that?’), for each ROI:
  - 26 states (5 colours x 5 shapes + terminal)
  - 12 actions (2 operations, 10 terminal actions: SayBlueSquare, SayRedTriangle, SayUnknown, ...)
  - 8 observations
- For n ROIs: 25^n + 1 states.
- This is impractical for even a very small number of ROIs.
- BUT: there is lots of structure. How do we exploit it?
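Reading the transcript's "25n + 1" as 25^n + 1 (the superscript was likely lost in extraction), a two-line check shows why the flat model blows up:

```python
# 25 joint colour/shape configurations per ROI, plus one global terminal state.
for n in range(1, 6):
    print(n, "ROIs ->", 25 ** n + 1, "states")
# 1 -> 26, 2 -> 626, 3 -> 15626, 4 -> 390626, 5 -> 9765626
```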

A Hierarchical POMDP
- Proposed solution: Hierarchical Planning in POMDPs (HiPPo).
- One LL-POMDP plans the actions within each ROI.
- A higher-level POMDP chooses which LL-POMDP to use at each step.
- This significantly reduces the complexity of the state-action-observation space.
- Model creation and policy generation are automatic, based on the input query.
[Diagram: the HL POMDP decides which region to process; the LL POMDP decides how to process it.]

Low-level POMDP
- The LL-POMDP has the same structure as the flat POMDP, but only ever operates on a single ROI.
- 26 states, 12 actions.
- The reward combines a time-based cost for actions with answer quality.
- The terminal actions answer the query for this region.
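A hedged sketch of how such a reward could look in code; the specific operator costs and the plus/minus 100 terminal values are invented for illustration, not taken from the talk.

```python
# Illustrative LL reward: time-based cost per visual operator, plus large
# positive/negative rewards for the terminal "answer the query" actions.
OPERATOR_COST = {"Colour": -0.5, "Shape": -1.0, "SIFT": -2.0}   # assumed costs

def ll_reward(action, answer_correct=False):
    """Reward for executing `action`; terminal actions are judged on answer quality."""
    if action in OPERATOR_COST:
        return OPERATOR_COST[action]
    return 100.0 if answer_correct else -100.0    # e.g. SayFound, SayNotFound, ...
```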

Example
- Query: ‘where is the blue circle?’
- State space: {RedCircle, RedTriangle, BlueCircle, BlueTriangle, ..., Terminal}
- Actions: {Colour, Shape, ..., SayFound, ...}
- Observations: {Red, Blue, NoColour, UnknownColour, Triangle, Circle, NoShape, UnknownShape, ...}
- Observation probabilities are given by the confusion matrix.

Policy
- Policy tree for a uniform prior over initial states.
- We limit all LL policies to a fixed maximum number of steps.
[Figure: a policy tree rooted at Colour, branching on the observed colour (B/R) to Shape nodes or sNotFound, with the Shape observations (C/T) leading to sFound / sNotFound leaves.]
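A small, self-contained sketch of executing such a depth-limited policy tree; the tree structure and labels below are illustrative, not the exact tree on the slide.

```python
# Internal nodes name a visual operator; edges are labelled by its observation;
# leaves give the definite answer for this ROI.
POLICY = ("Colour", {
    "Blue": ("Shape", {"Circle": "sFound", "Triangle": "sNotFound"}),
    "Red": "sNotFound",
})

def execute_policy(policy, run_operator, max_steps=3):
    node = policy
    for _ in range(max_steps):
        if isinstance(node, str):            # leaf: definite label for the ROI
            return node
        operator, children = node
        obs = run_operator(operator)         # apply the operator to the ROI
        node = children.get(obs, "sNotFound")
    return "sNotFound"                       # step limit reached

# An ROI that actually contains a blue circle (noise-free for simplicity):
print(execute_policy(POLICY, {"Colour": "Blue", "Shape": "Circle"}.get))  # sFound
```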

High-level POMDP
- The state space consists of the regions the object of interest is in.
- Actions are regions to process.
- Observations are whether the object of interest was found in a particular region.
- We derive the observation function and action costs for the HL-POMDP from the policy tree of the LL-POMDP.
- The LL-POMDP is treated as a black box that returns definite labels (not belief densities).

Example
- Query: ‘where is the blue circle?’
- State space: the regions the blue circle could be in.
- Actions: {DoR1, DoR2, SayR1, SayR2, SayR1^R2, SayNo}
- Observations: {FoundR1, ¬FoundR1, FoundR2, ¬FoundR2}
- Observation probabilities are computed from the LL-POMDP.
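Putting the two levels together, here is a small self-contained toy of the overall HiPPo-style loop under the black-box treatment above; the detector accuracy, the confidence threshold, and the greedy region-selection rule are all assumptions, not the planned HL policy from the talk.

```python
# Toy hierarchical loop: the HL layer keeps a belief over which region holds the
# target and repeatedly sends the most promising region to an LL "policy",
# modelled here as a noisy detector that returns a definite found/not-found label.
import random

def ll_process(region_has_target, p_correct=0.9):
    """Stand-in for executing an LL policy tree on one region."""
    return region_has_target if random.random() < p_correct else not region_has_target

def hl_search(true_region, n_regions=3, threshold=0.95, p_correct=0.9):
    belief = [1.0 / n_regions] * n_regions          # P(target in region i)
    while max(belief) < threshold:
        region = belief.index(max(belief))          # which region to process next?
        found = ll_process(true_region == region, p_correct)
        # Bayes update given the LL label, treated as an HL observation.
        likelihood = [p_correct if ((i == region) == found) else 1 - p_correct
                      for i in range(n_regions)]
        belief = [b * l for b, l in zip(belief, likelihood)]
        total = sum(belief)
        belief = [b / total for b in belief]
    return belief.index(max(belief))                # the region to "Say"

print(hl_search(true_region=2))   # usually prints 2
```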

Results (very briefly)

Approach      Reliability (%)
No Planning   76.67
CP
Hier-P        91.67

Vent Finding Approach
- Assume mapping using an occupancy grid.
- Rewards only for visiting cells that contain vents.
- The state space is also too large to solve the POMDP.
- Instead, do a fixed-length lookahead in belief space.
- Reasoning in belief space lets us account for the value of the information gained from observations.
- Use P(vent | all observations so far) as the heuristic value at the end of the lookahead.
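A hedged sketch of fixed-depth lookahead in belief space on a toy 1-D grid; the detector probabilities, the grid, and the depth are assumptions, and the real system plans over an occupancy-grid map with richer dynamics.

```python
# Fixed-depth lookahead in belief space on a 1-D grid of cells; each cell holds
# an independent probability that it contains a vent. Detector model is assumed.
P_DETECT_VENT = 0.7   # P(plume detected | vent in the cell visited)  - assumed
P_FALSE_ALARM = 0.1   # P(plume detected | no vent in the cell)       - assumed

def update_cell(p_vent, detected):
    """Bayes update of one cell's vent probability after a plume reading there."""
    like_v = P_DETECT_VENT if detected else 1 - P_DETECT_VENT
    like_n = P_FALSE_ALARM if detected else 1 - P_FALSE_ALARM
    return like_v * p_vent / (like_v * p_vent + like_n * (1 - p_vent))

def lookahead(belief, pos, depth):
    """Value of the best depth-limited plan from (belief, pos); at the horizon we
    use P(vent at the current cell | all observations so far) as the heuristic."""
    if depth == 0:
        return belief[pos]
    best = 0.0
    for move in (-1, +1):                           # candidate actions: move left/right
        new_pos = min(max(pos + move, 0), len(belief) - 1)
        p = belief[new_pos]
        value = 0.0
        for detected in (True, False):              # expectation over plume observations
            if detected:
                p_obs = P_DETECT_VENT * p + P_FALSE_ALARM * (1 - p)
            else:
                p_obs = (1 - P_DETECT_VENT) * p + (1 - P_FALSE_ALARM) * (1 - p)
            new_belief = list(belief)
            new_belief[new_pos] = update_cell(p, detected)
            value += p_obs * lookahead(new_belief, new_pos, depth - 1)
        best = max(best, value)
    return best

prior = [0.1, 0.2, 0.6, 0.2, 0.1]                   # assumed prior P(vent) per cell
print(round(lookahead(prior, pos=0, depth=3), 3))
```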

What we’re working on now
- Most of these POMDPs are too big to solve.
- Take a domain and problem description in a very general language and generate a classical planning problem from it.
- Assume we can observe any variable we care about.
- For each such observation, use a POMDP planner to determine the value of the variable with high confidence.