Function Approximation for Imitation Learning in Humanoid Robots Rajesh P. N. Rao Dept of Computer Science and Engineering University of Washington,

Uploaded: 2017-07-13T11:06:12+00:00
Duration: PTM6S50
Channel: Jeremy Russell

Function Approximation for Imitation Learning in Humanoid Robots
Rajesh P. N. Rao
Dept of Computer Science and Engineering, University of Washington, Seattle
neural.cs.washington.edu
Students: Rawichote Chalodhorn, David Grimes
Funding: ONR, NSF, Packard Foundation

The Problem: Robotic Imitation of Human Actions
Teacher (David Grimes) and HOAP-2 humanoid robot (Morpheus, or Mo)

Example of Motion Capture Data
Motion Capture Sequence; Attempted Imitation

Goals
Learn from only observations of teacher states
Expert does not control robot
Also called “implicit imitation” (Price & Boutilier, 1999)
Similar to how humans learn from imitation
Avoid hand-coded physics-based models
Learn dynamics in terms of sensory consequences of executed actions
Use teacher demonstration to restrict search space of feasible actions

Step 1: Kinematic mapping
Need to solve the “correspondence problem”
Solved by assuming markers are on a scaled version of the robot body
Standard inverse kinematics recovers joint angles for the motion

Step 2: Dimensionality Reduction
Humanoid robots have many degrees of freedom (DOF), making action optimization intractable
HOAP-2 has 25 DOF
Fortunately, most actions are highly redundant
Can use dimensionality reduction techniques (e.g., PCA) to represent states and actions
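A minimal sketch of PCA-based posture compression, assuming postures are stored as a frames-by-joints matrix of joint angles. The function names (fit_eigenposes, encode, decode), the choice of 3 components, and the synthetic data are illustrative assumptions, not taken from the talk.

```python
import numpy as np

def fit_eigenposes(postures, n_components):
    """PCA on a (frames x joints) matrix of joint angles."""
    mean = postures.mean(axis=0)
    centered = postures - mean
    # Rows of vt are principal directions ("eigenposes"), ordered by variance.
    _, singular_values, vt = np.linalg.svd(centered, full_matrices=False)
    explained = singular_values**2 / np.sum(singular_values**2)
    return mean, vt[:n_components], explained[:n_components]

def encode(postures, mean, eigenposes):
    """Project full joint-angle vectors into the low-dimensional latent space."""
    return (postures - mean) @ eigenposes.T

def decode(latent, mean, eigenposes):
    """Map latent coordinates back to full joint-angle vectors."""
    return latent @ eigenposes + mean

# Synthetic stand-in for motion data (assumption): 25 joint angles driven by
# a periodic 3-D latent signal plus a little noise.
rng = np.random.default_rng(0)
t = np.linspace(0.0, 4.0 * np.pi, 200)
latent_true = np.stack([np.sin(t), np.cos(t), np.sin(2.0 * t)], axis=1)
mixing = rng.normal(size=(3, 25))
postures = latent_true @ mixing + 0.01 * rng.normal(size=(200, 25))

mean, eigenposes, explained = fit_eigenposes(postures, n_components=3)
latent = encode(postures, mean, eigenposes)
reconstruction = decode(latent, mean, eigenposes)
rms_error = np.sqrt(np.mean((postures - reconstruction) ** 2))
print(latent.shape, float(explained.sum()), float(rms_error))
```

Searching over actions in the 3-D latent space and decoding back to 25 joint angles is what keeps the later optimization steps small.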

Posture Representation using Eigenposes

Eigenposes for Walking

Step 3: Learning Forward Models using Function Approximation
Basic Idea:
1. Learn a forward model in the neighborhood of the teacher demonstration. Use function approximation techniques to map actions to observed sensory consequences
2. Use the learned model to infer stable actions for imitation
3. Iterate between 1 and 2 for higher accuracy
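The three steps can be sketched on a one-dimensional toy problem. Everything here is an assumption for illustration: the plant function stands in for the robot, a straight-line fit stands in for the function approximator, and the step limit of 0.1 stands in for the restriction to a neighborhood of the demonstration.

```python
import numpy as np

def plant(action):
    """Hypothetical 'robot': gyro reading caused by an action (unknown to the learner)."""
    return (action - 0.3) + 0.5 * (action - 0.3) ** 3

teacher_action = 0.0
max_step = 0.1  # search stays within this distance of the current action

# Initial data: execute small perturbations of the teacher action.
actions = list(teacher_action + np.linspace(-0.05, 0.05, 5))
gyros = [plant(a) for a in actions]

current = teacher_action
history = [abs(plant(current))]
for _ in range(8):
    # 1. Learn a forward model from all data gathered so far.
    slope, intercept = np.polyfit(actions, gyros, deg=1)
    # 2. Use the model to pick the most stable action near the current one.
    candidates = np.linspace(current - max_step, current + max_step, 41)
    predicted = slope * candidates + intercept
    current = float(candidates[np.argmin(np.abs(predicted))])
    # 3. Execute it, record the observed consequence, and repeat.
    observed = plant(current)
    actions.append(current)
    gyros.append(observed)
    history.append(abs(observed))

print([round(h, 4) for h in history])
```

Each pass adds data closer to the stable action, so the model becomes more accurate exactly where the next action is chosen.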

Approach 1: RBF Networks for Deterministic Action Selection
Radial Basis Function (RBF) network used to learn the forward model as an n-th order Markov function of sensory states and actions
s_t is the sensory state vector, e.g., the 3D gyroscope signal
a_t is the action vector in latent space, e.g., servo joint angle commands in latent space
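A minimal sketch of an RBF-network forward model, simplified to first order (one previous state and one action as input) with a scalar state. The class name, the Gaussian width, the grid of centers, the ridge term, and the toy dynamics are all assumptions chosen for illustration, not the configuration used in the cited work.

```python
import numpy as np

class RBFNetwork:
    """Gaussian RBF features plus a bias, with linear output weights."""

    def __init__(self, centers, width, ridge=1e-6):
        self.centers = np.asarray(centers, dtype=float)
        self.width = float(width)
        self.ridge = float(ridge)
        self.weights = None

    def _features(self, x):
        x = np.atleast_2d(x)
        sq_dist = ((x[:, None, :] - self.centers[None, :, :]) ** 2).sum(axis=2)
        phi = np.exp(-sq_dist / (2.0 * self.width**2))
        return np.hstack([phi, np.ones((x.shape[0], 1))])

    def fit(self, x, y):
        # Ridge-regularized least squares for the output weights.
        phi = self._features(x)
        gram = phi.T @ phi + self.ridge * np.eye(phi.shape[1])
        self.weights = np.linalg.solve(gram, phi.T @ y)
        return self

    def predict(self, x):
        return self._features(x) @ self.weights

def toy_dynamics(state_action):
    """Hypothetical plant: next state from (previous state, action)."""
    return 0.8 * state_action[:, 0] + np.sin(state_action[:, 1])

rng = np.random.default_rng(0)
grid = np.linspace(-1.2, 1.2, 7)
centers = np.array([[s, a] for s in grid for a in grid])

x_train = rng.uniform(-1.0, 1.0, size=(300, 2))
y_train = toy_dynamics(x_train)
model = RBFNetwork(centers, width=0.5).fit(x_train, y_train)

x_test = rng.uniform(-1.0, 1.0, size=(200, 2))
rmse = np.sqrt(np.mean((model.predict(x_test) - toy_dynamics(x_test)) ** 2))
print(float(rmse))
```

An n-th order version would concatenate the last n states and actions into the input vector; the fitting code stays the same.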

Action Selection using the Learned Function
Select the optimal action a_t* for the next time step t using a function that measures torso stability based on predicted gyroscope signals
Search for the optimal action a_t* is limited to a local region around the teacher trajectory in the subspace
(Chalodhorn et al., Humanoids, 2005; IJCAI 2007; IROS, 2009)
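One way to sketch this selection step is random search in a box around the teacher's action, scoring each candidate by the weighted squared magnitude of the predicted gyroscope signal. The cost form, the sampling scheme, the radius, and toy_forward_model are assumptions for illustration; the slide does not specify them.

```python
import numpy as np

def stability_cost(gyro, weights=(1.0, 1.0, 1.0)):
    """Weighted squared magnitude of a predicted 3-axis gyroscope signal."""
    return float(np.sum(np.asarray(weights) * np.asarray(gyro) ** 2))

def select_action(forward_model, state, teacher_action, radius,
                  n_candidates=500, seed=0):
    """Pick the lowest-cost action within `radius` (per axis) of the teacher action."""
    rng = np.random.default_rng(seed)
    offsets = rng.uniform(-radius, radius, size=(n_candidates, teacher_action.size))
    # Include the teacher action itself so the result is never worse than it.
    candidates = np.vstack([teacher_action, teacher_action + offsets])
    costs = [stability_cost(forward_model(state, a)) for a in candidates]
    best = int(np.argmin(costs))
    return candidates[best], costs[best]

def toy_forward_model(state, action):
    """Hypothetical learned model: gyro vanishes at a stable action near the teacher's."""
    stable_action = np.array([0.10, -0.05, 0.02])
    return 0.5 * state + 2.0 * (action - stable_action)

state = np.zeros(3)
teacher_action = np.zeros(3)
best_action, best_cost = select_action(
    toy_forward_model, state, teacher_action, radius=0.2
)
teacher_cost = stability_cost(toy_forward_model(state, teacher_action))
print(best_action, best_cost, teacher_cost)
```

Restricting candidates to a small region keeps the search cheap and keeps queries inside the part of the action space where the learned model has data.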

Example: Learning to Walk
Human motion capture data; unoptimized (kinematic) imitation

Example: Learning to Walk
Motion scaling
Take baby steps first (literally!)
Final Result (Chalodhorn et al., IJCAI 2007)

Result: Learning to Walk
Human Motion Capture; Optimized Stable Walk

Approach 2: Gaussian Processes for Probabilistic Action Selection
Dynamic Bayesian Network (DBN) for Imitation [slice at time t]
O_t are observations of states S_t
S_t = low-D joint space, gyro, foot pressure readings
C_t are constraints on states (e.g., gyroscope values near zero)
(Grimes et al., RSS 2006; NIPS 2007; IROS 2007; IROS 2008)

DBN for Imitative Learning
Gaussian Process-based Forward Model (input [s_{t-1}, a_t])
(Grimes, Chalodhorn, & Rao, RSS 2006)
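A minimal sketch of Gaussian process regression used as a forward model with input [s_{t-1}, a_t], simplified to a scalar state. The squared-exponential kernel, its fixed hyperparameters, the class name, and the toy dynamics are assumptions for illustration; the cited work may differ on all of them.

```python
import numpy as np

def sq_exp_kernel(xa, xb, length_scale, signal_var):
    """Squared-exponential covariance between two sets of input rows."""
    sq_dist = ((xa[:, None, :] - xb[None, :, :]) ** 2).sum(axis=2)
    return signal_var * np.exp(-0.5 * sq_dist / length_scale**2)

class GPForwardModel:
    def __init__(self, length_scale=0.5, signal_var=1.0, noise_var=1e-4):
        self.length_scale = length_scale
        self.signal_var = signal_var
        self.noise_var = noise_var

    def fit(self, inputs, targets):
        self.inputs = np.asarray(inputs, dtype=float)
        k = sq_exp_kernel(self.inputs, self.inputs, self.length_scale, self.signal_var)
        k += self.noise_var * np.eye(len(self.inputs))
        self.chol = np.linalg.cholesky(k)
        self.alpha = np.linalg.solve(self.chol.T, np.linalg.solve(self.chol, targets))
        return self

    def predict(self, queries):
        """Posterior mean and variance of the latent function at each query row."""
        queries = np.atleast_2d(queries)
        k_star = sq_exp_kernel(queries, self.inputs, self.length_scale, self.signal_var)
        mean = k_star @ self.alpha
        v = np.linalg.solve(self.chol, k_star.T)
        var = self.signal_var - np.sum(v**2, axis=0)
        return mean, np.maximum(var, 0.0)

# Toy training data (assumption): s_t = 0.8 * s_{t-1} + sin(a_t) + noise.
rng = np.random.default_rng(1)
inputs = rng.uniform(-1.0, 1.0, size=(60, 2))
targets = 0.8 * inputs[:, 0] + np.sin(inputs[:, 1]) + 0.01 * rng.normal(size=60)
model = GPForwardModel().fit(inputs, targets)

queries = np.array([[0.2, -0.3],   # inside the region covered by the data
                    [5.0, 5.0]])   # far from any data
mean, var = model.predict(queries)
print(mean, var)
```

The predictive variance grows away from the training data, which gives probabilistic action inference a way to discount actions whose consequences the model has not yet observed.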

Action Inference using Nonparametric Belief Propagation
Maximum marginal posterior actions
Evidence (blue nodes)

Summary of Approach
Learning and action inference are interleaved to yield progressively more accurate forward models and actions

Example of Learning

Progression of Imitative Learning

Result after Learning
Human Action; Imitation
(Grimes, Rashid, & Rao, NIPS 2007)

Other Examples

From Planning to Policy Learning
Behaviors shown in the previous slides were open-loop, based on planning by inference
Can we learn closed-loop “reactive” behaviors?
Idea: Learn state-to-action mappings (“policies”) based on the final optimized output of the planner and resulting sensory measurements

Policy Learning using Gaussian Processes
For a parameterized task T, watch demonstrations for particular values of the task parameter
E.g., teacher lifting objects of different weight
The parameter is not given but is intrinsically encoded in sensory measurements
Use inference-based planning to infer stable actions a_t and states s_t for the demonstrated parameter values
Learn a Gaussian process policy based on {s_t, a_t}
(Grimes & Rao, IROS 2008)

Example: Learning to Lift Objects of Different Weights

Generalization by Gaussian Process Policy

Generalizing to a Novel Object

Summary and Conclusions
Stable full-body human imitation in a humanoid robot may be achievable without a physics-based model
Function approximation techniques (RBF networks, Gaussian processes) play a crucial role in learning a forward model and in action inference
Function approximation is also used to learn policies for reactive behavior
Dimensionality reduction using PCA (via “eigenposes”) helps keep learning and inference tractable
Challenges: scaling up to a large number of actions, smooth transitions between actions, hierarchical control
