High-level robot behavior control using POMDPs
Joelle Pineau and Sebastian Thrun
Carnegie Mellon University

Abstract: This paper describes a robot controller which uses probabilistic decision-making techniques at the highest level of behavior control. The POMDP-based robot controller has the ability to incorporate noisy and partial sensor information, and can arbitrate between information-gathering and performance-related actions. We present a hierarchical variant of the POMDP model which exploits structure in the problem domain to accelerate planning. This POMDP controller is implemented and tested onboard a mobile robot in the context of an interactive service task.

I - Background

We need a high-level controller that can:
- select good behaviors or actions
- share sensor information between modules
- handle uncertainty
- negotiate over goals from different specialized modules
- arbitrate between information-gathering and performance actions

[Figure: the robot's specialized software modules (people tracking/following, Autominder, speech recognition & synthesis, autonomous navigation), with difficulty annotations ranging from "Not so hard." to "Very hard!"]

Our approach: high-level robot behavior control using Partially Observable Markov Decision Processes (POMDPs).

What are POMDPs? POMDPs model decision-theoretic planning problems. The goal is to find an action-selection strategy that maximizes reward, even in the presence of state uncertainty.

[Figure: the POMDP interaction loop. The controller maintains a belief state; its actions act on the user + world + robot, and it receives observations generated by the hidden state.]

Formally, a POMDP is a tuple {S, A, Ω, b, T, O, R}:
- S : set of states
- A : set of actions
- Ω : set of observations
- b(s) := Pr(s | t=0), the initial belief
- T(s,a,s') := Pr(s' | s,a), the transition probabilities
- O(s,a,o) := Pr(o | s,a), the observation probabilities
- R(s,a), the reward function

[Figure: dynamic decision network spanning time steps t-1 and t. The control layer carries the robot's beliefs (b_t-1, ...); the world state layer carries states (s_t-1, s_t), actions (a_t-1), observations (o_t-1, o_t), and rewards (r_t-1, r_t); a top controller selects each action from the current belief.]

POMDP task 1 - Track state: after an action, what is the state of the world?
POMDP task 2 - Optimize policy: which action should the controller apply next?
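Belief tracking (task 1) is the easy half: it is a single Bayes-filter step, b'(s') = η · O(s',a,o) · Σ_s T(s,a,s') · b(s), built directly from the components defined above. The sketch below is a minimal illustration rather than the paper's implementation; it assumes the model is stored as dense NumPy arrays and that actions and observations are integer indices.

import numpy as np

def belief_update(b, a, o, T, O):
    # One POMDP belief-tracking step (task 1).
    #   b: (|S|,) current belief over states
    #   a: index of the action just executed
    #   o: index of the observation just received
    #   T: (|S|, |A|, |S|) array with T[s, a, s2] = Pr(s2 | s, a)
    #   O: (|S|, |A|, |O|) array with O[s2, a, o] = Pr(o | s2, a)
    predicted = T[:, a, :].T @ b           # sum_s T(s,a,s') b(s)
    unnormalized = O[:, a, o] * predicted  # weight by observation likelihood
    return unnormalized / unnormalized.sum()  # normalizing constant eta

At Pearl's problem size (|S| = 576) this update costs well under a million multiply-adds, so state tracking runs comfortably at interaction rates. Policy optimization is another matter.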
Exact policy optimization (task 2) is computationally intractable for large problems (~20+ states), therefore we need approximations.

II - Robot control using Hierarchical POMDPs

Approximating POMDPs for large domains

Key Idea: Exploit hierarchical structure in the problem domain to break a large problem into many "related" POMDPs. What type of structure? Action set partitioning.

[Figure: example action hierarchy. Act is the root subtask, with InvestigateHealth and Move below it; CheckPulse and CheckMeds fall under InvestigateHealth, AskWhere and Navigate under Move, and North, South, East, West under Navigate. Internal nodes are subtasks reached by abstract actions; leaf nodes are primitive actions.]

Assumptions:
- Each subtask is a separate POMDP.
- Each subtask has a subset of all actions.
- Primitive actions are placed in leaf nodes, and are from the original action set.
- Abstract actions are introduced in internal nodes.
- Each subtask has a non-trivial reward function (i.e. R(s,a) is not constant).

Planning with Hierarchical POMDPs

Given POMDP model M = {S, A, Ω, b, T, O, R} and subtask hierarchy H, for each subtask h in H:
1) Set components: A_h <- children nodes of h; S_h ⊆ S; Ω_h ⊆ Ω; and the corresponding b_h, T_h, O_h, R_h.
2) Minimize the model: S_h -> {z_h(s0), ..., z_h(sn)}, Ω_h -> {y_h(o0), ..., y_h(op)}.
3) Solve subtask h using {b_h, T_h, O_h, R_h}.

Example (the Move subtree): the Move subtask has A_Move = {AskWhere, Navigate}, S_Move = {X', Y', Destination}, Ω_Move = {o0, ..., op}; its Navigate child has A_Nav = {North, South, East, West}, S_Nav = {X, Y}, Ω_Nav = {o0, ..., om}.

Execution with Hierarchical POMDPs

Step 1 - Update belief: b'(s') = η · O(s',a,o) · Σ_s T(s,a,s') · b(s).
Step 2 - Traverse the hierarchy top-down; for each subtask:
1) Get the local belief b_h by projecting the global belief onto the subtask's minimized state set.
2) Consult the local policy: a = π_h(b_h).
3) If a is a leaf node (primitive action), execute it and terminate. Else, descend into that subtask. (A code sketch of this traversal appears after Section IV.)

III - Experimental Setup

Introducing Pearl, the nursing assistant robot.

Task domain: Robot provides reminders and guidance to elderly users.
Problem size: |S| = 576, |A| = 19, |O| = 18.
State features: robot location, person location, person status, reminder goal, motion goal, conversation goal.
Observation features: words from speech recognition, button presses from the touchscreen, laser readings, reminder messages.
Action hierarchy: [Figure: Pearl's full action hierarchy.]
Experimental scenario: Robot must go meet subjects in their apartment and take them to a physiotherapy appointment, while also engaging in appropriate social interaction.
Environment: Nursing home near Pittsburgh, PA. [Figure: map of the environment, marking the physiotherapy department, the patient room, and the robot's home position.]
Test subjects: Six elderly residents of an assisted living facility.

IV - Results

Performance results for three contrasting users compare the POMDP policy against an MDP policy. The MDP policy does not consider uncertainty during planning, and is therefore unable to take clarification actions. For all users, performance is much better using the POMDP policy.

Figure 1: Example of a successful guidance experiment. (a) Pearl approaching subject; (b) reminding of appointment; (c) guidance through corridor; (d) entering physiotherapy dept.; (e) asking for weather forecast; (f) Pearl leaves.

Observations from the nursing-home experiments:
- 100% task completion rate amongst test subjects
- overall high level of excitement amongst subjects
- adaptive speed control for the robot is necessary
- improved speech recognition would be great
- more verbal interaction during guidance is recommended
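To make the execution procedure of Section II concrete, here is a minimal sketch of the top-down traversal. The Subtask class, its state_map (standing in for the abstraction function z_h), and the per-subtask policy callables are hypothetical scaffolding, not the paper's code; Step 1 reuses belief_update from the earlier sketch.

import numpy as np

class Subtask:
    # One node of the action hierarchy (e.g. Act, Move, Navigate).
    def __init__(self, name, state_map, n_local, policy, children=None):
        self.name = name
        self.state_map = state_map      # z_h: global state index -> local state index
        self.n_local = n_local          # size of the minimized local state set
        self.policy = policy            # pi_h: local belief -> action name
        self.children = children or {}  # abstract action name -> child Subtask

def local_belief(b, subtask):
    # Project the global belief onto the subtask's minimized state set:
    # each abstract state collects the mass of the global states mapped onto it.
    b_h = np.zeros(subtask.n_local)
    for s, p in enumerate(b):
        b_h[subtask.state_map[s]] += p
    return b_h

def select_action(root, b):
    # Step 2: traverse the hierarchy top-down until a primitive action is found.
    node = root
    while True:
        a = node.policy(local_belief(b, node))  # consult the local policy
        if a not in node.children:              # leaf node => primitive action
            return a
        node = node.children[a]                 # abstract action => descend

A control loop would alternate select_action with belief_update after each observation; obtaining the local policies π_h is exactly the planning procedure of Section II.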