Robust Belief-based Execution of Manipulation Programs

Robust Belief-based Execution of Manipulation Programs. Kaijen Hsiao, Tomás Lozano-Pérez, Leslie Pack Kaelbling. MIT CSAIL.

Achieving Goals under Uncertainty. Two kinds of uncertainty: uncertainty about the current state, which requires planning in information space, and uncertainty about the results of future actions, which means the search must branch on outcomes as well as actions. The choice of action must therefore depend on the current information state.

Discrete POMDP Formulation: states, actions, observations, transition model, observation model, reward.
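In standard POMDP notation (a sketch using the usual symbols, not copied from the slides), the formulation is the tuple

\[
\langle S, A, \Omega, T, O, R \rangle, \qquad T(s' \mid s, a), \quad O(o \mid s', a), \quad R(s, a),
\]

and the belief update performed by the controller on the next slide is

\[
b'(s') \;=\; \frac{O(o \mid s', a) \sum_{s \in S} T(s' \mid s, a)\, b(s)}{\Pr(o \mid b, a)} .
\]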

POMDP Controller (block diagram: the environment produces sensing for the state estimator (SE), which maintains the belief; the controller maps the belief to an action sent back to the environment). State estimation is a discrete Bayesian filter; the policy maps belief states to actions.
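A minimal sketch of one such discrete Bayes filter step over a gridded object state space (the array layout is an assumption for illustration, not the authors' implementation):

```python
import numpy as np

def bayes_filter_update(belief, transition, obs_likelihood):
    """One discrete Bayes filter step for a fixed action and observation.

    belief:          (N,) probability over grid cells
    transition:      (N, N) array, transition[s_next, s] = P(s_next | s, action)
    obs_likelihood:  (N,) array, obs_likelihood[s_next] = P(observation | s_next, action)
    """
    predicted = transition @ belief           # prediction through the action model
    posterior = obs_likelihood * predicted    # weight by the observation model
    total = posterior.sum()
    if total == 0.0:
        raise ValueError("observation has zero probability under the model")
    return posterior / total                  # renormalize
```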

Action selection in POMDPs. Off-line optimal policy generation is intractable for large spaces. On-line search, a finite-depth expansion of the belief-space tree from the current belief state to select a single action, is tractable in a broad subclass of problems.

Challenges for action selection: continuous state spaces; the requirement to select an action for any belief state; long horizon; action branching factor; outcome branching factor; computationally complex observation and transition models.

Grasping in uncluttered environments. Points of leverage: the robot pose is approximately observable; robot dynamics are nearly deterministic; uncertainty over the unobserved object parameters is bounded; and there is room to maneuver.

Online belief-space search Continuous state space: discretize object state space

Discretize object configuration space (figure: workspace, configuration space, belief state).
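One way to build such a gridded belief over object (x, y, theta), assuming an independent Gaussian prior over the pose (the grid resolution and prior are illustrative, not the authors' code):

```python
import numpy as np

def make_grid_belief(x_vals, y_vals, theta_vals, mean, std):
    """Grid over object (x, y, theta) with an independent-Gaussian prior.

    mean, std: length-3 arrays for (x, y, theta); angles in radians.
    Note: the angular dimension is not wrapped here, a simplification.
    Returns (poses, belief): poses is (N, 3), belief is a normalized (N,) vector.
    """
    X, Y, T = np.meshgrid(x_vals, y_vals, theta_vals, indexing="ij")
    poses = np.stack([X.ravel(), Y.ravel(), T.ravel()], axis=1)
    logp = -0.5 * (((poses - mean) / std) ** 2).sum(axis=1)
    belief = np.exp(logp - logp.max())
    return poses, belief / belief.sum()
```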

Online belief-space search Continuous state space: discretize object state space Action for any belief: search forward from current belief state

Search forward from the current belief. Low-entropy belief states enable a reliable grasp, so entropy is used as the static evaluation function at the leaves. Actions can be useful purely for information gathering.
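The static evaluation at the leaves can be as simple as the Shannon entropy of the discrete belief (lower entropy means a more certain, more graspable state); a minimal sketch:

```python
import numpy as np

def belief_entropy(belief):
    """Shannon entropy (nats) of a discrete belief; lower is better for grasping."""
    p = np.asarray(belief)
    p = p[p > 0.0]
    return float(-np.sum(p * np.log(p)))
```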

Online belief-space search Continuous state space: discretize object state space Action for any belief: search forward from current belief state Long horizon: use temporally extended actions

Use temporally extended actions: instead of primitive actions, use entire trajectories. This reduces the horizon; observations arrive at the end of each trajectory.

Online belief-space search Continuous state space: discretize object state space Action for any belief: search forward from current belief state Long horizon: use temporally extended actions Large action branching factor: parameterize small set of action types by current belief

Parameterize actions with the belief. Actions are entire world-relative trajectories (WRTs). In the current belief state, execute the trajectory with respect to the most likely object configuration, terminating on contact or at the end of the trajectory.
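A sketch of executing a world-relative trajectory against the most likely pose; the `robot` interface (`move_to`, `in_contact`, `contact_observation`) and the 4x4-transform representation are assumptions for illustration:

```python
import numpy as np

def execute_wrt(wrt_waypoints, belief, grid_poses, robot):
    """Execute a WRT re-expressed relative to the most likely object pose.

    wrt_waypoints: end-effector poses (4x4 transforms) in the object frame
    grid_poses:    candidate object poses (4x4 world transforms), one per grid cell
    """
    most_likely = grid_poses[int(np.argmax(belief))]
    for waypoint in wrt_waypoints:
        robot.move_to(most_likely @ waypoint)   # waypoint mapped into the world frame
        if robot.in_contact():                  # terminate early on contact
            return "contact", robot.contact_observation()
    return "end_of_trajectory", robot.contact_observation()
```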

Online belief-space search Continuous state space: discretize object state space Action for any belief: search forward from current belief state Long horizon: use temporally extended actions Large action branching factor: parameterize small set of action types by current belief Computationally complex observation and transition models: precompute models

Precompute models. Execute the WRT with respect to estimated state e while the true world state is w, and record the expected observation and transition. Based on geometric simulation.
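A sketch of that precomputation; `simulate` is a placeholder for the geometric simulation of playing the WRT as if the object were at e while it is actually at w:

```python
def precompute_models(wrt, grid_poses, simulate):
    """Tabulate the simulated observation (and resulting contact state) for each
    (estimated pose e, true pose w) pair under one world-relative trajectory."""
    models = {}
    for ei, e in enumerate(grid_poses):
        for wi, w in enumerate(grid_poses):
            models[(ei, wi)] = simulate(wrt, e, w)
    return models
```

The canonicalization two slides below collapses this from all (e, w) pairs to a single e with w varied, since only the relative transformation matters.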

Online belief-space search Continuous state space: discretize object state space Action for any belief: search forward from current belief state Long horizon: use temporally extended actions Large action branching factor: parameterize small set of action types by current belief Computationally complex observation and transition models: precompute models Large observation branching factor: canonicalize observations for each discrete state and action

Canonicalize observations. Any (e, w) pair with the same relative transformation has the same world-relative outcomes and observations, so it suffices to sample for a single e with w varying within the initial range of uncertainty. Cluster the observations and represent each bin of object configurations by a single representative; branch only on these canonical observations.
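A simple quantization-based binning is one way to cluster nearly identical simulated observations (the authors' clustering method may differ):

```python
import numpy as np

def canonicalize_observations(observations, bin_width):
    """Group raw simulated observations into canonical bins.

    observations: dict mapping world-pose index -> observation feature vector
    bin_width:    quantization step used to merge near-identical observations
    Returns (canonical, assignment): one representative per bin, plus the bin
    index assigned to each world pose.
    """
    canonical, assignment, bins = [], {}, {}
    for wi, obs in observations.items():
        key = tuple(np.round(np.asarray(obs) / bin_width).astype(int))
        if key not in bins:
            bins[key] = len(canonical)
            canonical.append(obs)       # first member stands for the whole bin
        assignment[wi] = bins[key]
    return canonical, assignment
```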

Algorithm.
Off-line: plan WRTs for grasping and information gathering; compute the models.
On-line: while the current belief state does not satisfy the goal, compute the expected information gain of each WRT, execute the best WRT until termination, use the observation to update the current belief, and return to the initial pose. Then execute the final grasp trajectory.
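A sketch of the on-line loop with every helper left as a placeholder (names are illustrative, not the authors' API):

```python
def online_execution(wrts, final_grasp, belief, goal_satisfied,
                     expected_info_gain, execute, update_belief, go_home):
    """1-step-lookahead execution loop from the algorithm slide."""
    while not goal_satisfied(belief):
        best = max(wrts, key=lambda wrt: expected_info_gain(wrt, belief))
        observation = execute(best, belief)       # runs until contact or trajectory end
        belief = update_belief(belief, best, observation)
        go_home()                                 # return to the initial pose
    execute(final_grasp, belief)                  # commit to the target grasp
    return belief
```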

Application to grasping with a simulated robot arm. Initial conditions (ultimately from vision): the object shape is roughly known (contacted vertices should be within ~1 cm of their actual positions), and the object is on a table with pose (x, y, rotation) roughly known (center-of-mass std ~5 cm, 30 deg). Goal: achieve a specific grasp of the object.

Observations. Fingertips: 6-axis force/torque sensors give contact position and normal. Additional contact sensors report contact only. The swept non-colliding path rules out poses that would have generated contact.
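The swept-path observation is negative information; a sketch that zeroes out poses inconsistent with a contact-free motion (`path_would_have_contacted` is a placeholder for a geometric test):

```python
import numpy as np

def prune_by_swept_path(belief, grid_poses, path_would_have_contacted):
    """Remove probability mass from poses that would have produced contact
    along a swept path that was actually executed without contact."""
    belief = np.array(belief, dtype=float)
    for i, pose in enumerate(grid_poses):
        if path_would_have_contacted(pose):
            belief[i] = 0.0
    total = belief.sum()
    return belief / total if total > 0.0 else belief
```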

Grasping a Box (figure: the most likely robot-relative box position vs. where it actually is).

Initial belief state

Summed over theta

Tried to move down; finger hit corner

Probability of contact observation at each location

Updated belief

Re-centered

Trying again with the new belief: back up, then try again.

Final state and observation (figure panels: observation probabilities; grasp).

Updated belief state: Success! Goal: variance < 1 cm x, 15 cm y, 6 deg theta

What if Y coord of grasp matters?

Need explicit information gathering

Simulation Experiments. Methods tested: (1) single open-loop execution of the goal-achieving WRT with respect to the most likely state; (2) repeated execution of the goal-achieving WRT with respect to the most likely state; (3) online selection of information-gathering and goal-achieving grasps (1-step lookahead).

Box experiments. Allowed variation in the goal grasp: 1 cm, 1 cm, 5 deg. Initial uncertainty: 5 cm, 5 cm, 30 deg.

Cup experiments

Cup experiments. Goal: 1 cm in x, 1 cm in y; rotation doesn't matter (no info-grasps used). Starting uncertainty: 30 deg in theta (x, y varies). (Plot: results with increasing uncertainty.)

Grasping a Brita Pitcher. Target grasp: put one finger through the handle and grasp.

Brita Pitcher experiments

Brita Pitcher results (plot: results with increasing uncertainty).

Other recent probabilistic approaches to manipulation: off-line POMDP solution for grasping (Hsiao et al. 2007); Bayesian state estimation using tactile sensors to locate the object before grasping (Petrovskaya et al. 2006); finding a fixed trajectory that is most likely to succeed under uncertainty (Alterovitz et al. 2007; Burns and Brock 2007).

The End.

Timing for the Brita Pitcher (2.16 GHz processor, 3.24 GB RAM, running Python; times in seconds), by initial uncertainty level (1 cm, 3 deg / 3 cm, 9 deg / 5 cm, 15 deg / 30 deg):
Grid size: 5733 / 16337 / 14415 / 24025
Computing observation matrix (1 trajectory): 12 / 33 / 29 / 51
1st belief-state update: 4 / 10 / 19
Choosing 1st info-grasp: 9 / 17 / 30

Number of Actions Used, by initial uncertainty level (1 cm, 3 deg / 3 cm, 9 deg / 5 cm, 15 deg / 30 deg):
Robust execution of target: 1.9 / 2.5 / 3.3 / 3.5
Robust execution with info-grasps: not run / 4.4 / 4.1 / 4.2

Creating Information-gain Trajectories. Trajectory generation: either generate endpoints and use a randomized planner (such as OpenRAVE) to find a nominal collision-free path, or sweep through the entire workspace. Then choose a small set of trajectories based on information gain from the starting uncertainty.
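A sketch of the final selection step, assuming an `expected_info_gain` scoring function like the one used during on-line execution:

```python
def select_info_trajectories(candidates, start_belief, expected_info_gain, k):
    """Keep the k candidate trajectories with the largest expected information
    gain under the initial uncertainty (a simple greedy stand-in)."""
    ranked = sorted(candidates,
                    key=lambda traj: expected_info_gain(traj, start_belief),
                    reverse=True)
    return ranked[:k]
```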