Robust Belief-based Execution of Manipulation Programs
Kaijen Hsiao, Tomás Lozano-Pérez, Leslie Pack Kaelbling
MIT CSAIL
Achieving Goals under Uncertainty
Two kinds of uncertainty:
- current state: need to plan in information space
- results of future actions: search branches on outcomes as well as actions
Choice of action must depend on the current information state.
Discrete POMDP Formulation
- states s
- actions a
- observations o
- transition model Pr(s' | s, a)
- observation model Pr(o | s', a)
- reward R(s, a)
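POMDP Controller
[Figure: the controller receives sensing from the environment; its state estimator (SE) maintains the belief, and the policy turns the belief into an action sent back to the environment.]
- State estimation is a discrete Bayesian filter
- The policy maps belief states to actions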
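Here is a minimal sketch of the discrete Bayesian filter named above. It assumes beliefs are stored as Python dicts from state to probability. `transition` and `observation` are hypothetical callables that stand in for the models; they are not part of the original system.

```python
from typing import Callable, Dict, Hashable

Belief = Dict[Hashable, float]


def bayes_update(
    belief: Belief,
    action: Hashable,
    obs: Hashable,
    transition: Callable[[Hashable, Hashable], Dict[Hashable, float]],
    observation: Callable[[Hashable, Hashable, Hashable], float],
) -> Belief:
    """One step of a discrete Bayesian filter (the state estimator SE).

    transition(s, a)          -> {s_next: Pr(s_next | s, a)}
    observation(s_next, a, o) -> Pr(o | s_next, a)
    """
    # Prediction: push the belief through the transition model.
    predicted: Belief = {}
    for s, p in belief.items():
        for s_next, t in transition(s, action).items():
            predicted[s_next] = predicted.get(s_next, 0.0) + p * t

    # Correction: weight each state by the likelihood of the observation, then normalize.
    posterior = {s: p * observation(s, action, obs) for s, p in predicted.items()}
    z = sum(posterior.values())
    if z == 0.0:
        raise ValueError("observation has zero probability under this belief")
    return {s: p / z for s, p in posterior.items() if p > 0.0}
```

Take two static states and a sensor that reports the true state with probability 0.8. One reading of "left" moves a uniform prior to 0.8 / 0.2.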
Action selection in POMDPs
- Off-line optimal policy generation: intractable for large spaces
- On-line search: finite-depth expansion of the belief-space tree from the current belief state to select a single action; tractable in a broad subclass of problems
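Below is a minimal sketch of finite-depth online search over the belief-space tree. For brevity it assumes actions change only the belief, so the object stays put. `obs_model` and `leaf_value` are hypothetical callables. The slides later use entropy as the leaf evaluation.

```python
from typing import Callable, Dict, Hashable, List, Optional, Tuple

Belief = Dict[Hashable, float]
ObsModel = Callable[[Hashable, Hashable], Dict[Hashable, float]]  # (action, state) -> {obs: prob}


def outcomes(belief: Belief, action: Hashable, obs_model: ObsModel) -> List[Tuple[float, Belief]]:
    """Branch the belief on each possible observation: (Pr(o), posterior given o)."""
    joint: Dict[Hashable, Belief] = {}
    for s, p in belief.items():
        for o, q in obs_model(action, s).items():
            if p * q > 0.0:
                joint.setdefault(o, {})[s] = p * q
    result = []
    for unnorm in joint.values():
        z = sum(unnorm.values())
        result.append((z, {s: v / z for s, v in unnorm.items()}))
    return result


def lookahead(
    belief: Belief,
    actions: List[Hashable],
    obs_model: ObsModel,
    leaf_value: Callable[[Belief], float],
    depth: int,
) -> Tuple[float, Optional[Hashable]]:
    """Depth-limited expectimax over the belief tree; returns (value, best first action)."""
    if depth == 0:
        return leaf_value(belief), None
    best_value, best_action = float("-inf"), None
    for a in actions:
        # Expected value over observation branches of the subtree below action a.
        v = sum(po * lookahead(b2, actions, obs_model, leaf_value, depth - 1)[0]
                for po, b2 in outcomes(belief, a, obs_model))
        if v > best_value:
            best_value, best_action = v, a
    return best_value, best_action
```

Only the returned first action is executed. After the real observation arrives, the search is repeated from the updated belief.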
Challenges for action selection
- Continuous state spaces
- Requirement to select an action for any belief state
- Long horizon
- Action branching factor
- Outcome branching factor
- Computationally complex observation and transition models
Grasping in uncluttered environments
Points of leverage:
- Robot pose is approximately observable
- Robot dynamics are nearly deterministic
- Bounded uncertainty over unobserved object parameters
- Room to maneuver
Online belief-space search
- Continuous state space: discretize the object state space
Discretize object configuration space
[Figure panels: workspace; configuration space; belief state]
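Here is a sketch of discretizing the object configuration (x, y, theta) into a uniform grid and placing an initial Gaussian belief on it. The grid bounds and resolution in the tests are illustrative assumptions. The standard deviations echo the 5 cm / 30 deg initial uncertainty used in the box experiments. Angles are not wrapped, which assumes the theta range stays within one turn.

```python
import math
from typing import Dict, List, Tuple

Cell = Tuple[float, float, float]  # (x [m], y [m], theta [rad])


def axis(lo: float, hi: float, n: int) -> List[float]:
    """Centers of n equal-width cells covering [lo, hi]."""
    width = (hi - lo) / n
    return [lo + (i + 0.5) * width for i in range(n)]


def make_grid(x_range, y_range, theta_range, steps) -> List[Cell]:
    """Discretize (x, y, theta) into a uniform grid of cell centers."""
    xs = axis(x_range[0], x_range[1], steps[0])
    ys = axis(y_range[0], y_range[1], steps[1])
    ts = axis(theta_range[0], theta_range[1], steps[2])
    return [(x, y, t) for x in xs for y in ys for t in ts]


def gaussian_belief(grid: List[Cell], mean: Cell, std: Cell) -> Dict[Cell, float]:
    """Independent Gaussian prior on each coordinate, renormalized over the grid."""
    weights = {}
    for cell in grid:
        d2 = sum(((c - m) / s) ** 2 for c, m, s in zip(cell, mean, std))
        weights[cell] = math.exp(-0.5 * d2)
    z = sum(weights.values())
    return {cell: w / z for cell, w in weights.items()}


def marginal_xy(belief: Dict[Cell, float]) -> Dict[Tuple[float, float], float]:
    """Sum the belief over theta, as in the 'summed over theta' plots."""
    out: Dict[Tuple[float, float], float] = {}
    for (x, y, _t), p in belief.items():
        out[(x, y)] = out.get((x, y), 0.0) + p
    return out
```

The grid size grows multiplicatively with resolution in each coordinate. This is one reason the per-grid model computations are done off-line.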
Online belief-space search
- Continuous state space: discretize the object state space
- Action for any belief: search forward from the current belief state
Search forward from current belief
- Low-entropy belief states enable a reliable grasp
- Use entropy as the static evaluation function at the leaves
- Actions can be useful for information gathering
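Entropy as the static evaluation function fits in a few lines. This sketch assumes the dict-based belief representation; `leaf_value` is a hypothetical name for the leaf score a depth-limited search would maximize.

```python
import math
from typing import Dict, Hashable


def entropy(belief: Dict[Hashable, float]) -> float:
    """Shannon entropy (bits) of a discrete belief; lower means more certain."""
    return -sum(p * math.log2(p) for p in belief.values() if p > 0.0)


def leaf_value(belief: Dict[Hashable, float]) -> float:
    """Static evaluation at a search leaf: prefer low-entropy beliefs."""
    return -entropy(belief)
```

A uniform belief over four poses scores -2 bits, while a point mass scores 0. An action that splits the belief informatively therefore looks attractive even if it does not grasp anything.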
Online belief-space search
- Continuous state space: discretize the object state space
- Action for any belief: search forward from the current belief state
- Long horizon: use temporally extended actions
Use temporally extended actions
- Entire trajectories instead of primitive actions
- Reduces the horizon
- Observations arrive at the end of each trajectory
Online belief-space search
- Continuous state space: discretize the object state space
- Action for any belief: search forward from the current belief state
- Long horizon: use temporally extended actions
- Large action branching factor: parameterize a small set of action types by the current belief
Parameterize actions with belief
- Actions are entire world-relative trajectories (WRTs)
- In the current belief state, execute with respect to the most likely object configuration
- Terminate on contact or at the end of the trajectory
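Here is a minimal 2-D sketch of executing a world-relative trajectory with respect to the most likely object configuration. It assumes waypoints are stored in the object's frame as (x, y) points and poses are (x, y, theta). `in_contact` is a hypothetical stand-in for the contact sensors. A real system would drive the arm controller instead of looping over points.

```python
import math
from typing import Callable, Dict, List, Optional, Tuple

Pose = Tuple[float, float, float]  # (x, y, theta)
Point = Tuple[float, float]


def most_likely(belief: Dict[Pose, float]) -> Pose:
    """The most probable object configuration in the current belief."""
    return max(belief, key=belief.get)


def to_world(pose: Pose, waypoint: Point) -> Point:
    """Map an object-frame waypoint into the world using the object pose."""
    px, py, th = pose
    wx, wy = waypoint
    c, s = math.cos(th), math.sin(th)
    return (px + c * wx - s * wy, py + s * wx + c * wy)


def execute_wrt(
    belief: Dict[Pose, float],
    waypoints: List[Point],
    in_contact: Callable[[Point], bool],
) -> Tuple[List[Point], Optional[Point]]:
    """Follow the trajectory relative to the most likely pose; stop on first contact.

    Returns the visited world points and the contact point (None if the trajectory ended).
    """
    pose = most_likely(belief)
    visited: List[Point] = []
    for wp in waypoints:
        p = to_world(pose, wp)
        visited.append(p)
        if in_contact(p):
            return visited, p
    return visited, None
```

Because the whole trajectory is parameterized by one pose, the planner chooses among a handful of trajectory types. It never branches over individual motions.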
Online belief-space search
- Continuous state space: discretize the object state space
- Action for any belief: search forward from the current belief state
- Long horizon: use temporally extended actions
- Large action branching factor: parameterize a small set of action types by the current belief
- Computationally complex observation and transition models: precompute models
Precompute models
- Execute a WRT with respect to estimated state e when the true world state is w
- Record the expected observation and transition
- Based on geometric simulation
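This toy one-dimensional sketch illustrates what precomputation buys. The object is a single edge at position w, and a hypothetical WRT sweeps a finger along +x from e - 2 to e. `simulate` stands in for the geometric simulator. The object is assumed not to move, so the table stores only the expected observation: the contact position, or None.

```python
from typing import Dict, List, Optional, Tuple


def simulate(e: int, w: int) -> Optional[int]:
    """Toy geometric simulation: the finger sweeps [e - 2, e] and touches the edge at w."""
    return w if e - 2 <= w <= e else None


def precompute(states: List[int]) -> Dict[Tuple[int, int], Optional[int]]:
    """Expected observation for every (estimated e, true w) pair, computed off-line."""
    return {(e, w): simulate(e, w) for e in states for w in states}
```

On-line, each model query becomes a dictionary lookup. The table has one entry per (e, w) pair, though, so it grows quadratically with the number of grid cells. The canonicalization slide addresses this cost.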
Online belief-space search
- Continuous state space: discretize the object state space
- Action for any belief: search forward from the current belief state
- Long horizon: use temporally extended actions
- Large action branching factor: parameterize a small set of action types by the current belief
- Computationally complex observation and transition models: precompute models
- Large observation branching factor: canonicalize observations for each discrete state and action
Canonicalize observations
- Any (e, w) pair with the same relative transformation has the same world-relative outcomes and observations
- So only sample for one e, with w varying within the initial range of uncertainty
- Cluster observations, and represent each bin of object configurations by a single representative configuration
- Only branch on canonical observations
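Below is a sketch of the two canonicalization steps for planar poses (x, y, theta). First, it keys each (e, w) pair by the pose of w expressed in the frame of e, rounded to the grid resolution, so pairs related by the same transform share a key. Second, it groups world states by binned scalar observations. The bin size, the scalar observation, and choosing the most probable member as each bin's representative are all illustrative assumptions.

```python
import math
from typing import Dict, Hashable, List, Optional, Tuple

Pose = Tuple[float, float, float]  # (x, y, theta)


def wrap(angle: float) -> float:
    """Wrap an angle to [-pi, pi)."""
    return (angle + math.pi) % (2 * math.pi) - math.pi


def relative_pose(e: Pose, w: Pose) -> Pose:
    """Express the true pose w in the frame of the estimated pose e."""
    dx, dy = w[0] - e[0], w[1] - e[1]
    c, s = math.cos(-e[2]), math.sin(-e[2])
    return (c * dx - s * dy, s * dx + c * dy, wrap(w[2] - e[2]))


def canonical_key(e: Pose, w: Pose, res_xy: float, res_theta: float) -> Tuple[int, int, int]:
    """(e, w) pairs with the same relative transform (up to grid resolution) share a key."""
    rx, ry, rt = relative_pose(e, w)
    return (round(rx / res_xy), round(ry / res_xy), round(rt / res_theta))


def cluster_observations(
    obs: Dict[Hashable, Optional[float]], bin_size: float
) -> Dict[Optional[int], List[Hashable]]:
    """Group world states by binned observation; None (no contact) gets its own bin."""
    bins: Dict[Optional[int], List[Hashable]] = {}
    for w, o in obs.items():
        key = None if o is None else round(o / bin_size)
        bins.setdefault(key, []).append(w)
    return bins


def representatives(
    bins: Dict[Optional[int], List[Hashable]], belief: Dict[Hashable, float]
) -> Dict[Optional[int], Hashable]:
    """Pick the most probable member of each bin as its single representative."""
    return {k: max(ws, key=belief.get) for k, ws in bins.items()}
```

The search then branches once per bin rather than once per raw observation.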
Algorithm
Off-line:
- plan WRTs for grasping and information gathering
- compute models
On-line:
- while the current belief state doesn't satisfy the goal:
  - compute the expected information gain of each WRT
  - execute the best WRT until termination
  - use the observation to update the current belief
  - return to the initial pose
- execute the final grasp trajectory
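Here is a sketch of the on-line loop. It greedily picks the WRT with the highest expected information gain, measured as entropy before minus expected entropy after. `model(wrt, state)` is a hypothetical observation model. It is assumed to already account for executing the WRT relative to the most likely state. `execute` is a hypothetical stand-in for running the WRT and returning its observation. The final grasp would follow once the loop exits.

```python
import math
from typing import Callable, Dict, Hashable, List, Tuple

Belief = Dict[Hashable, float]
ObsModel = Callable[[Hashable, Hashable], Dict[Hashable, float]]  # (wrt, state) -> {obs: prob}


def entropy(b: Belief) -> float:
    return -sum(p * math.log2(p) for p in b.values() if p > 0.0)


def split(b: Belief, wrt: Hashable, model: ObsModel) -> Dict[Hashable, Tuple[float, Belief]]:
    """For each possible observation o: (Pr(o), posterior belief given o)."""
    joint: Dict[Hashable, Belief] = {}
    for s, p in b.items():
        for o, q in model(wrt, s).items():
            if p * q > 0.0:
                joint.setdefault(o, {})[s] = p * q
    out = {}
    for o, unnorm in joint.items():
        z = sum(unnorm.values())
        out[o] = (z, {s: v / z for s, v in unnorm.items()})
    return out


def info_gain(b: Belief, wrt: Hashable, model: ObsModel) -> float:
    """Expected entropy reduction from executing wrt."""
    return entropy(b) - sum(po * entropy(post) for po, post in split(b, wrt, model).values())


def run_online(
    b: Belief,
    wrts: List[Hashable],
    model: ObsModel,
    goal_reached: Callable[[Belief], bool],
    execute: Callable[[Hashable], Hashable],
    max_steps: int = 20,
) -> Tuple[Belief, bool]:
    """Repeat: pick the most informative WRT, run it, update the belief."""
    for _ in range(max_steps):
        if goal_reached(b):
            break
        best = max(wrts, key=lambda a: info_gain(b, a, model))
        o = execute(best)  # run until termination; the arm then returns to its initial pose
        b = split(b, best, model)[o][1]  # Bayesian update with the actual observation
    return b, goal_reached(b)
```

The greedy choice corresponds to the 1-step lookahead used in the simulation experiments. A deeper search could replace `max(...)`.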
Application to grasping with a simulated robot arm
Initial conditions (ultimately from vision):
- Object shape is roughly known (contacted vertices should be within ~1 cm of actual positions)
- Object is on the table and its pose (x, y, rotation) is roughly known (center of mass std ~5 cm, 30 deg)
Goal: achieve a specific grasp of the object
Observations
- Fingertips: 6-axis force/torque sensors (contact position and normal)
- Additional contact sensors: contact only
- A swept, non-colliding path rules out poses that would have generated contact
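"Ruling out" poses with a non-colliding sweep is itself a Bayesian update on the observation "no contact". In this sketch, `would_contact(w)` is a hypothetical geometric test of whether the swept path would have touched the object at pose w. `p_miss` is an assumed probability that a real contact goes undetected. With the default of 0.0, such poses are eliminated outright.

```python
from typing import Callable, Dict, Hashable

Belief = Dict[Hashable, float]


def update_no_contact(
    belief: Belief, would_contact: Callable[[Hashable], bool], p_miss: float = 0.0
) -> Belief:
    """Posterior after a swept path that produced no contact."""
    # Likelihood of "no contact": p_miss where contact was predicted, 1 elsewhere.
    weighted = {w: p * (p_miss if would_contact(w) else 1.0) for w, p in belief.items()}
    z = sum(weighted.values())
    if z == 0.0:
        raise ValueError("every pose in the belief predicts contact")
    return {w: p / z for w, p in weighted.items() if p > 0.0}
```

A nonzero `p_miss` keeps contradicted poses alive at low weight, so the estimate is more forgiving of sensor misses.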
Grasping a Box
[Figure: the box at its most likely robot-relative position vs. where it actually is]
Initial belief state
Summed over theta
Tried to move down; finger hit corner
Probability of contact observation at each location
Updated belief
Re-centered
Trying again, with new belief: back up, then try again
Final state and observation
[Figure panels: observation probabilities; grasp]
Updated belief state: Success!
Goal: variance < 1 cm x, 15 cm y, 6 deg theta
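The goal test on the updated belief can be sketched as a per-coordinate spread check. The slide calls the thresholds variances but states them in cm and degrees, so this sketch assumes they bound the standard deviation. Poses are assumed to be in meters and radians. Theta is treated as a plain number, which is reasonable only while the belief's angular spread is small.

```python
import math
from typing import Dict, List, Tuple

Cell = Tuple[float, float, float]  # (x [m], y [m], theta [rad])


def spread(belief: Dict[Cell, float]) -> List[float]:
    """Per-coordinate standard deviation of a belief over (x, y, theta) cells."""
    mean = [sum(p * s[i] for s, p in belief.items()) for i in range(3)]
    var = [sum(p * (s[i] - mean[i]) ** 2 for s, p in belief.items()) for i in range(3)]
    return [math.sqrt(v) for v in var]


def goal_satisfied(
    belief: Dict[Cell, float],
    tol: Tuple[float, float, float] = (0.01, 0.15, math.radians(6)),
) -> bool:
    """True when every coordinate's spread is below its tolerance (1 cm, 15 cm, 6 deg)."""
    return all(sd < t for sd, t in zip(spread(belief), tol))
```

The loose 15 cm tolerance in y is why the box grasp can succeed without pinning down y. When y matters, the tolerance tightens and explicit information gathering is needed.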
What if Y coord of grasp matters?
Need explicit information gathering
Simulation Experiments
Methods tested:
- Single open-loop execution of the goal-achieving WRT with respect to the most likely state
- Repeated execution of the goal-achieving WRT with respect to the most likely state
- Online selection of information-gathering and goal-achieving grasps (1-step lookahead)
Box experiments
- Allowed variation in goal grasp: 1 cm, 1 cm, 5 deg
- Initial uncertainty: 5 cm, 5 cm, 30 deg
Cup experiments
Cup experiments
- Goal: 1 cm x, 1 cm y; rotation doesn't matter (no info-grasps used)
- Start uncertainty: 30 deg theta (x, y varies)
[Plot: results vs. increasing uncertainty]
Grasping a Brita Pitcher
Target grasp: put one finger through the handle and grasp
Brita Pitcher experiments
Brita Pitcher results
[Plot: results vs. increasing uncertainty]
Other recent probabilistic approaches to manipulation
- Off-line POMDP solution for grasping (Hsiao et al. 2007)
- Bayesian state estimation using tactile sensors to locate the object before grasping (Petrovskaya et al. 2006)
- Finding a fixed trajectory that is most likely to succeed under uncertainty (Alterovitz et al. 2007; Burns and Brock 2007)
The End.
Timing for Brita Pitcher (2.16 GHz processor, 3.24 GB RAM, running Python; times in seconds)
Uncertainty:                             1 cm, 3 deg | 3 cm, 9 deg | 5 cm, 15 deg | 30 deg
Grid size:                               5733        | 16337       | 14415        | 24025
Computing observation matrix (1 traj):   12          | 33          | 29           | 51
1st belief-state update:                 4, 10, 19 (three of the four conditions)
Choosing 1st info-grasp:                 9, 17, 30 (three of the four conditions)
Number of Actions Used
Uncertainty:                          1 cm, 3 deg | 3 cm, 9 deg | 5 cm, 15 deg | 30 deg
Robust execution of target:           1.9         | 2.5         | 3.3          | 3.5
Robust execution with info-grasps:    not run     | 4.4         | 4.1          | 4.2
Creating Information-gain Trajectories
- Trajectory generation: generate endpoints, then use a randomized planner (such as OpenRAVE) to find a nominal collision-free path
- Sweep through the entire workspace
- Choose a small set based on information gain from the start uncertainty