Laboratory for Perceptual Robotics – Department of Computer Science
Hierarchical Mechanisms for Robot Programming
Shiraj Sen, Stephen Hart, Rod Grupen
Laboratory for Perceptual Robotics, University of Massachusetts Amherst
May 30, 2008, NEMS '08

Outline
Hierarchical mechanisms for robot programming
Representation: action (potential functions, value functions); state representation
Programming: user defined; reinforcement learning (intrinsic, extrinsic)

Hierarchical Actions
Potential fields ϕ: closed-loop primitive actions
Value functions Φ: programs; greedy traversal avoids local minima
[Figure: three nested closed-loop control blocks (Σ, G, H), labeled with force, velocity, references, and feedback signals.]
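
As a rough illustration of a closed-loop primitive action descending a potential field, the sketch below runs gradient descent on a quadratic potential until the gradient vanishes. The quadratic form, the function names (`spring_potential`, `run_primitive`), the gain, and the tolerance are assumptions made for illustration; the slides do not specify them.

```python
def spring_potential(x, x_ref):
    """Quadratic potential 0.5 * ||x - x_ref||^2 (assumed form)."""
    return 0.5 * sum((xi - ri) ** 2 for xi, ri in zip(x, x_ref))


def spring_gradient(x, x_ref):
    """Gradient of the quadratic potential with respect to x."""
    return [xi - ri for xi, ri in zip(x, x_ref)]


def run_primitive(x, x_ref, gain=0.5, tol=1e-6, max_steps=1000):
    """Descend the potential in closed loop until the gradient norm is below tol.

    Returns the final state and the number of steps taken.
    """
    for step in range(max_steps):
        grad = spring_gradient(x, x_ref)
        if sum(g * g for g in grad) ** 0.5 < tol:
            return x, step  # converged: the gradient has (numerically) vanished
        # Feedback step: move against the gradient of the potential.
        x = [xi - gain * g for xi, g in zip(x, grad)]
    return x, max_steps


if __name__ == "__main__":
    final_x, steps = run_primitive([1.0, -2.0], [0.0, 0.0])
    print(final_x, steps)
```

A single quadratic potential has no local minima; the point made on the slide is that a program choosing among several such controllers by following a value function can avoid the local minima that a single composite field might have.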

Primitive Action Programming Interface
Sensory error: visual (u_ref), tactile (f_ref), configuration variables (θ_ref), operational space (x_ref)
Potential functions: spring potential fields (ϕ_h), collision-free motion fields (ϕ_c), kinematic conditioning fields (ϕ_cond)
Motor variables: subsets of configuration variables and operational space variables
Primitive actions: each action a pairs a sensory error and a potential function with a set of motor variables.
Nullspace projection: composes two actions, a_1 and a_2.
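
Nullspace projection of one action under another can be sketched as follows. This is a minimal sketch, assuming the higher-priority action a_1 descends a scalar potential, so that its Jacobian is a single gradient row and the projector reduces to removing the component of the lower-priority command along that gradient. The names `nullspace_project` and `compose` are hypothetical and not part of the interface on the slide.

```python
def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def nullspace_project(primary_grad, secondary_cmd, eps=1e-12):
    """Project secondary_cmd into the nullspace of the primary gradient.

    For a single-row Jacobian g^T, the projector is I - g g^T / (g^T g).
    """
    gg = dot(primary_grad, primary_grad)
    if gg < eps:
        # The primary action has converged; it places no constraint.
        return list(secondary_cmd)
    scale = dot(primary_grad, secondary_cmd) / gg
    return [s - scale * g for s, g in zip(secondary_cmd, primary_grad)]


def compose(primary_grad, secondary_grad, gain=1.0):
    """Combine two gradient-descent commands with a_2 subordinate to a_1."""
    primary_cmd = [-gain * g for g in primary_grad]
    secondary_cmd = [-gain * g for g in secondary_grad]
    projected = nullspace_project(primary_grad, secondary_cmd)
    return [p + s for p, s in zip(primary_cmd, projected)]


if __name__ == "__main__":
    # The secondary command keeps only its component orthogonal to a_1's gradient.
    print(compose([1.0, 0.0], [1.0, 1.0]))
```

The design choice here is that the subordinate action can never slow the primary action's descent, because the projected command has no component along the primary gradient.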

State Representation
Discrete abstraction of action dynamics.
4-level logic in control predicate p_i: unknown (X), no reference (-), descending gradient (0), convergence (1)
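
One way to picture the 4-level predicate is as a small classifier over a controller's run-time status. The sketch below is illustrative only: the symbol assignment follows the X, -, 0, 1 labels used on later slides, while the inputs (`observed`, `has_reference`, `phi_dot`) and the threshold `eps` are assumptions.

```python
from enum import Enum


class Predicate(Enum):
    UNKNOWN = "X"        # controller status has not been observed
    NO_REFERENCE = "-"   # no reference signal is available
    DESCENDING = "0"     # controller is descending the potential gradient
    CONVERGED = "1"      # the potential has stopped changing


def classify(observed, has_reference, phi_dot, eps=1e-3):
    """Map a controller's status to one of the four predicate levels.

    phi_dot is the rate of change of the controller's potential (assumed input).
    """
    if not observed:
        return Predicate.UNKNOWN
    if not has_reference:
        return Predicate.NO_REFERENCE
    if abs(phi_dot) > eps:
        return Predicate.DESCENDING
    return Predicate.CONVERGED


if __name__ == "__main__":
    print(classify(True, True, -0.2).value)
```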

Hierarchical Programming
A program is defined as an MDP over a vector of controller predicates: S = (p_1 … p_N)
Absorbing states in the value function capture "convergence" of programs.
Learn value functions using reinforcement learning.
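
To make the MDP formulation concrete, here is a small tabular Q-learning sketch over a two-predicate state S = (p_a, p_b). Everything specific in it is an assumption for illustration: the two hypothetical controllers "a" and "b", the deterministic toy transition rule in `step`, the reward of 1 on reaching the absorbing state, and the learning parameters. The slides do not state which reinforcement learning algorithm was used.

```python
import random

ACTIONS = ["a", "b"]  # hypothetical controllers


def step(state, action):
    """Toy dynamics: 'b' can only converge once 'a' has converged."""
    p_a, p_b = state
    if action == "a":
        return ("1", p_b), 0.0, False
    if p_a == "1":
        return (p_a, "1"), 1.0, True   # absorbing state: program has converged
    return (p_a, "-"), 0.0, False      # 'b' has no reference yet


def q_learning(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    """Learn a Q-table over predicate-vector states with epsilon-greedy exploration."""
    rng = random.Random(seed)
    q = {}

    def value(s, a):
        return q.get((s, a), 0.0)

    def greedy(s):
        return max(ACTIONS, key=lambda a: value(s, a))

    for _ in range(episodes):
        state = ("X", "X")
        for _ in range(20):  # cap on episode length
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)
            else:
                action = greedy(state)
            nxt, reward, done = step(state, action)
            future = 0.0 if done else max(value(nxt, a) for a in ACTIONS)
            target = reward + gamma * future
            q[(state, action)] = value(state, action) + alpha * (target - value(state, action))
            state = nxt
            if done:
                break
    return q, greedy


if __name__ == "__main__":
    q, greedy = q_learning()
    print(greedy(("X", "X")), greedy(("1", "X")))
```

In this toy problem the greedy policy learns to run "a" first and then "b", which is the kind of sequencing knowledge a value function over predicate states is meant to capture.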

Intrinsic Reward
Goal: build deep control knowledge.
Reward controllable interaction with the world: controllers with direct feedback from the external world.
[Figure: catalog of behaviors (Track, Touch, Grasp, Insert, Stack); predicate diagram for Track marking the convergence event, with values X, -, 0, 1.]
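
A possible reading of the intrinsic reward is a function that fires when a controller with external feedback undergoes a convergence event. The sketch below assumes that a convergence event is a transition of the predicate into "1" from any other value and that each event is worth one unit of reward; the function name and both assumptions are illustrative, not taken from the slides.

```python
def intrinsic_reward(prev, curr, external_feedback):
    """Count convergence events among controllers with external feedback.

    prev, curr: dicts mapping controller name -> predicate symbol ("X", "-", "0", "1").
    external_feedback: names of controllers whose feedback comes from the external world.
    """
    reward = 0.0
    for name in external_feedback:
        # Assumed definition of a convergence event: entering "1" from another value.
        if curr.get(name) == "1" and prev.get(name) != "1":
            reward += 1.0
    return reward


if __name__ == "__main__":
    before = {"track": "0", "posture": "0"}
    after = {"track": "1", "posture": "1"}
    # Only "track" is assumed to receive feedback from the external world.
    print(intrinsic_reward(before, after, {"track"}))
```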

Experimental Demonstration
Robot: Dexter
Motor units: two 7-DOF Barrett WAMs, two 4-DOF Barrett Hands, 2-DOF pan/tilt stereo head
Sensory feedback: visual (hue, saturation, intensity, texture), tactile (6-axis fingertip F/T sensors), proprioceptive

STAGE 1: SaccadeTrack - 25 Learning Episodes
State: S_st = (p_saccade, p_track)
Actions: a_saccade, a_track
Catalog: Track-saturation
[Figure: learned state-transition diagram over (p_saccade, p_track) with predicate values X, -, 0, 1; the rewarding action is marked.]

STAGE 2: ReachGrab - 25 Learning Episodes
State: S_rg = (p_st, p_reach, p_grab)
Catalog: Touch, Track-saturation
[Figure: learned state-transition diagram; the rewarding action is marked.]

STAGE 2: ReachGrab - 25 Learning Episodes (continued)
Catalog: Touch, Track-saturation
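
The Stage 2 state reuses the Stage 1 program as a single predicate, p_st. One way this recursion could work is sketched below. The rule that an absorbing state maps to "1" follows the earlier statement that absorbing states capture convergence of programs; the other two rules, the function names, and the example absorbing set are assumptions for illustration.

```python
def program_predicate(state, absorbing_states):
    """Summarize a lower-level program's state vector as one predicate symbol."""
    if state in absorbing_states:
        return "1"   # program has converged
    if all(p == "X" for p in state):
        return "X"   # nothing is known yet (assumed rule)
    return "0"       # program is still running (assumed rule)


def reach_grab_state(saccade_track_state, p_reach, p_grab, st_absorbing):
    """Build S_rg = (p_st, p_reach, p_grab) from the SaccadeTrack program state."""
    p_st = program_predicate(saccade_track_state, st_absorbing)
    return (p_st, p_reach, p_grab)


if __name__ == "__main__":
    # Hypothetical absorbing state for SaccadeTrack: the track controller has converged.
    st_absorbing = {("X", "1")}
    print(reach_grab_state(("X", "1"), "X", "X", st_absorbing))
```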

STAGE 3: VisualInspect - 25 Learning Episodes
State: S_vi = (p_rg, p_cond, p_track(blue))
Catalog: Touch, Track-saturation, Track-blue
[Figure: learned state-transition diagram; the rewarding action is marked.]

STAGE 3: VisualInspect - 25 Learning Episodes (continued)
Catalog: Touch, Track-saturation, Track-blue

STAGE 4: Grasp - User Defined Reward
State: S_grasp = (p_rg, p_moment, p_force)
Actions: ReachGrab, a_moment, a_force
Catalog: Touch, Track-saturation, Grasp, Track-blue
[Figure: state-transition diagram over (p_rg, p_moment, p_force) with predicate values X, 0, 1; the rewarding action is marked.]

STAGE 5: PickAndPlace - User Defined Reward
State: S_pnp = (p_g, p_transport, p_moment)
Actions: Grasp, a_transport, a_moment
[Figure: state-transition diagram over (p_g, p_transport, p_moment) with predicate values X, -, 0, 1; the rewarding action is marked.]

Conclusions
Mechanisms for creating hierarchical programs: a recursive formulation of potential functions and value functions; a control-theoretic representation for action, state, and intrinsic reward.
Experimental demonstration of programming manipulation skills using staged learning episodes.
Intrinsic reward pushes out new behavior and models the affordances of objects.

Thank You