Learning From Demonstration
Robot Learning
A good control policy u = π(x, t) is often hard to engineer from first principles.
- Reinforcement learning: learn from trial and error
- Direct teaching: have a human guide the robot's motion
- Imitation learning: observe and mimic human demonstrations
Demos
Learning "Flavors"
- Given demonstrations, learn a dynamics model: a system identification problem.
- Given an objective function, optimize the policy: a standard optimal control problem; can also be solved using reinforcement learning (simulated demonstrations).
- Given policy demonstrations, find the objective function: inverse optimal control / inverse reinforcement learning.
Learning "Flavors"
[Diagram: demonstrations map to a performance objective via inverse optimal control, to a plan or control policy via direct policy learning, and to a dynamics model via system ID; optimal control turns the performance objective into a plan or control policy.]
Direct Policy Learning
We wish to learn u = π(x).
Human demonstrations: {(x_i, u_i) for i = 1, …, n} (system traces).
Learn the mapping using:
- Nearest neighbors
- Regression
- Neural networks
- Locally weighted regression (see the sketch below)
- Etc.
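As one concrete instance of the options above, here is a minimal locally weighted regression sketch in Python (NumPy only). The state/control dimensions, bandwidth, and synthetic demonstration data are illustrative assumptions, not taken from the slides.

```python
import numpy as np

def lwr_policy(x_query, X, U, bandwidth=0.5):
    """Locally weighted linear regression: fit a local linear map u ~ A [x; 1]
    around x_query, weighting each demonstration by a Gaussian kernel."""
    # Gaussian weights based on distance to the query state
    d2 = np.sum((X - x_query) ** 2, axis=1)
    w = np.exp(-d2 / (2.0 * bandwidth ** 2))
    # Augment states with a bias term
    Xa = np.hstack([X, np.ones((X.shape[0], 1))])
    W = np.diag(w)
    # Weighted least squares: A = (Xa^T W Xa)^-1 Xa^T W U (with a small regularizer)
    A = np.linalg.solve(Xa.T @ W @ Xa + 1e-8 * np.eye(Xa.shape[1]), Xa.T @ W @ U)
    return np.append(x_query, 1.0) @ A

# Illustrative demonstration data: states X (n x 2) and controls U (n x 1)
rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(200, 2))
U = (X @ np.array([[1.5], [-0.7]])) + 0.05 * rng.normal(size=(200, 1))
print(lwr_policy(np.array([0.2, -0.3]), X, U))
```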
Nearest Neighbors
Observe {(x_i, u_i) for i = 1, …, n}.
π(x) = u_{i*}, where i* = argmin_i ||x − x_i||_2.
Extension: K-nearest neighbors.
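A minimal nearest-neighbor (and k-nearest-neighbor) policy lookup in NumPy; the toy demonstration data and the value of k are assumptions for illustration.

```python
import numpy as np

def knn_policy(x_query, X, U, k=1):
    """Return the control of the k nearest demonstration states
    (k=1 is plain nearest neighbor; k>1 averages the neighbors' controls)."""
    dists = np.linalg.norm(X - x_query, axis=1)
    nearest = np.argsort(dists)[:k]
    return U[nearest].mean(axis=0)

# Illustrative demonstrations: 100 states in R^2 with scalar controls
rng = np.random.default_rng(1)
X = rng.uniform(-1, 1, size=(100, 2))
U = np.sin(X[:, :1]) + np.cos(X[:, 1:])
print(knn_policy(np.array([0.1, 0.4]), X, U, k=3))
```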
Linear Regression
Hypothesize π(x) = Σ_k θ_k φ_k(x), where the φ_k(x) are basis functions.
Observe {(x_i, u_i) for i = 1, …, n}.
Minimize Σ_i ||u_i − π(x_i)||²: a least squares problem.
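A sketch of the least-squares fit with basis functions in NumPy; the particular polynomial basis and the synthetic data are illustrative assumptions.

```python
import numpy as np

def basis(x):
    """Polynomial basis functions phi_k(x) for a scalar state (an assumed choice)."""
    return np.stack([np.ones_like(x), x, x**2, x**3], axis=1)

# Illustrative demonstrations: scalar states and controls
rng = np.random.default_rng(2)
x = rng.uniform(-1, 1, size=200)
u = 0.5 * x - 2.0 * x**3 + 0.05 * rng.normal(size=200)

# Solve min_theta sum_i ||u_i - Phi(x_i) theta||^2 by linear least squares
Phi = basis(x)
theta, *_ = np.linalg.lstsq(Phi, u, rcond=None)
print("learned coefficients:", theta)
print("pi(0.3) =", basis(np.array([0.3])) @ theta)
```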
Model-based Nonlinear Regression
Hypothesize a model class π_θ(x), e.g., where θ are feedback gain parameters.
Observe {(x_i, u_i) for i = 1, …, n}.
Minimize Σ_i ||u_i − π_θ(x_i)||²: a nonlinear least squares problem.
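A minimal nonlinear least-squares sketch using scipy.optimize.least_squares; the chosen policy class (a saturated feedback law) and the synthetic data are assumptions for illustration.

```python
import numpy as np
from scipy.optimize import least_squares

def policy(theta, x):
    """Assumed nonlinear policy class: a saturated linear feedback u = a * tanh(b * x)."""
    a, b = theta
    return a * np.tanh(b * x)

# Illustrative demonstrations generated from a similar (unknown) policy plus noise
rng = np.random.default_rng(3)
x = rng.uniform(-2, 2, size=300)
u = 1.2 * np.tanh(0.8 * x) + 0.05 * rng.normal(size=300)

# Minimize sum_i ||u_i - pi_theta(x_i)||^2 over theta
def residuals(theta):
    return policy(theta, x) - u

result = least_squares(residuals, x0=np.array([1.0, 1.0]))
print("estimated gain parameters:", result.x)
```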
Inverse Optimal Control
Parsimony hypothesis: goals are better than policies at describing appropriate behavior in an open world.
Two stages:
- Learn the objective from demonstrations.
- Plan online using the objective and sensory input.
Difficulty: a highly underconstrained learning problem.
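One common way to make the first stage concrete (an assumption here, not stated on the slide) is to parameterize the objective linearly in features and require that the demonstrated behavior score at least as well as any alternative policy:

```latex
% Assumed linear parameterization of the objective: R_w(x,u) = w^\top \phi(x,u).
% Feature-matching view of inverse RL: find w such that the demonstrated
% policy \pi_D scores no worse than any alternative policy \pi.
\mathbb{E}_{\pi_D}\!\Big[\sum_t w^\top \phi(x_t, u_t)\Big] \;\ge\;
\mathbb{E}_{\pi}\!\Big[\sum_t w^\top \phi(x_t, u_t)\Big] \quad \forall \pi .
```

Many weight vectors (including w = 0) satisfy these constraints, which is one concrete sense in which the problem is underconstrained; max-margin and maximum-entropy formulations are standard ways to resolve the ambiguity.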
Example
Reinforcement Learning
Have an immediate reward/cost function R(x, u).
Find the policy that maximizes expected global return.
Use trial and error to improve the return over time:
- TD methods
- Q-learning
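A minimal tabular Q-learning sketch on a toy chain environment; the environment, learning rate, discount, and exploration schedule are illustrative assumptions.

```python
import numpy as np

# Toy 5-state chain: actions 0 (left) / 1 (right); reward 1 for reaching the last state.
N_STATES, N_ACTIONS = 5, 2

def step(s, a):
    s_next = min(s + 1, N_STATES - 1) if a == 1 else max(s - 1, 0)
    reward = 1.0 if s_next == N_STATES - 1 else 0.0
    done = s_next == N_STATES - 1
    return s_next, reward, done

def epsilon_greedy(q_row, epsilon, rng):
    """Random action with probability epsilon, otherwise greedy (ties broken randomly)."""
    if rng.random() < epsilon:
        return int(rng.integers(len(q_row)))
    best = np.flatnonzero(q_row == q_row.max())
    return int(rng.choice(best))

rng = np.random.default_rng(4)
Q = np.zeros((N_STATES, N_ACTIONS))
alpha, gamma, epsilon = 0.1, 0.95, 0.1

for episode in range(500):
    s = 0
    for _ in range(200):  # step cap per episode as a safeguard
        a = epsilon_greedy(Q[s], epsilon, rng)
        s_next, r, done = step(s, a)
        # Q-learning temporal-difference update
        Q[s, a] += alpha * (r + gamma * np.max(Q[s_next]) - Q[s, a])
        s = s_next
        if done:
            break

print("greedy policy per state:", np.argmax(Q, axis=1))
```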
Trajectory Following
Problem 1: learn a reference trajectory from human demonstrations.
Problem 2: learn to follow the reference trajectory under dynamics and disturbances.
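A small sketch of both subproblems under simplifying assumptions: the reference is taken as the pointwise mean of time-aligned demonstrations, and it is tracked with a PD controller on double-integrator dynamics. The demonstration data, gains, and dynamics model are illustrative choices, not from the slides.

```python
import numpy as np

# --- Problem 1: learn a reference from demonstrations (assume time-aligned demos) ---
T = 100
t = np.linspace(0.0, 1.0, T)
rng = np.random.default_rng(5)
# Three noisy demonstrations of a 1-D reaching motion (illustrative data)
demos = np.stack([np.sin(np.pi * t) + 0.05 * rng.normal(size=T) for _ in range(3)])
x_ref = demos.mean(axis=0)                        # reference trajectory = pointwise mean
v_ref = np.gradient(x_ref, t)                     # reference velocity by finite differences

# --- Problem 2: track the reference with a PD controller on a double integrator ---
dt = t[1] - t[0]
kp, kd = 50.0, 10.0                               # assumed feedback gains
x, v = 0.0, 0.0
tracking_error = 0.0
for k in range(T):
    u = kp * (x_ref[k] - x) + kd * (v_ref[k] - v) # PD tracking law
    disturbance = 0.2 * rng.normal()              # unmodeled disturbance
    v += (u + disturbance) * dt                   # double-integrator dynamics (Euler step)
    x += v * dt
    tracking_error += abs(x_ref[k] - x) / T

print("mean tracking error:", tracking_error)
```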
Characterizing Performance
Performance metrics:
- Optimality: does the learned policy perform optimally (e.g., track the reference well)?
- Generality: does the learned policy perform well in new scenarios (e.g., under disturbances)?
Discussion
Learning is useful for exotic devices, deforming environments, dynamic tasks, and social robots.
Theory and benchmarking are not as well developed as in classic machine learning:
- Temporal component
- Difficulty of gathering training/testing datasets
- Nonuniform hardware testbeds
Reminder: IU Robotics Open House, April 16, 4-7pm. R-House: 919 E 13th St.