Learning From Demonstration
Robot Learning
A good control policy u = π(x, t) is often hard to engineer from first principles
- Reinforcement learning: learn from trial and error
- Direct teaching: have a human guide the robot's motion
- Imitation learning: observe and mimic human demonstrations
Demos
Learning "Flavors"
- Given demonstrations, learn a dynamics model: a system identification problem
- Given an objective function, optimize the policy: a standard optimal control problem, solvable using reinforcement learning (simulated demonstrations)
- Given policy demonstrations, find the objective function: inverse optimal control / inverse reinforcement learning
Learning "Flavors" (diagram)
- Demonstrations → dynamics model (system ID)
- Demonstrations → performance objective (inverse optimal control)
- Demonstrations → plan or control policy (direct policy learning)
- Performance objective + dynamics model → plan or control policy (optimal control)
Direct Policy Learning
Wish to learn u = π(x) from human performances {(x_i, u_i) for i = 1, …, n} (system traces)
Learn the mapping with:
- Nearest neighbors
- Regression
- Neural networks
- Locally weighted regression
- Etc.
Nearest Neighbors
Observe {(x_i, u_i) for i = 1, …, n}
π(x) = u_{i*}, where i* = argmin_i ||x − x_i||²
Extension: k-nearest neighbors
[Figure: demonstration states and a query point]
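A minimal sketch of the nearest-neighbor lookup above, in Python with NumPy; the arrays and toy data are illustrative, and setting k > 1 gives the k-nearest-neighbor extension by averaging the retrieved controls.

```python
import numpy as np

def nn_policy(x_query, X_demo, U_demo, k=1):
    """Return a control for x_query by looking up the k nearest
    demonstrated states and averaging their controls."""
    # Squared Euclidean distances ||x_query - x_i||^2 to every demonstrated state
    dists = np.sum((X_demo - x_query) ** 2, axis=1)
    # Indices of the k closest demonstrations (k = 1 is plain nearest neighbor)
    nearest = np.argsort(dists)[:k]
    # k-NN extension: average the controls of the k nearest neighbors
    return U_demo[nearest].mean(axis=0)

# Toy demonstration data: 1-D states, scalar controls
X_demo = np.array([[0.0], [1.0], [2.0], [3.0]])
U_demo = np.array([[0.1], [0.4], [0.9], [1.6]])
print(nn_policy(np.array([1.2]), X_demo, U_demo, k=2))  # averages the two closest controls
```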
Linear Regression
Hypothesize π(x) = Σ_k θ_k φ_k(x), where the φ_k(x) are basis functions
Observe {(x_i, u_i) for i = 1, …, n}
min_θ Σ_i ||u_i − π(x_i)||² is a least-squares problem
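A sketch of the least-squares fit, assuming a NumPy setting where the basis function is user-supplied; the constant-plus-linear basis and the synthetic data below are only for illustration.

```python
import numpy as np

def fit_linear_policy(X, U, basis):
    """Solve min_theta sum_i ||u_i - sum_k theta_k phi_k(x_i)||^2 by least squares."""
    Phi = np.array([basis(x) for x in X])              # (n, K) design matrix of phi_k(x_i)
    Theta, *_ = np.linalg.lstsq(Phi, U, rcond=None)    # (K, m) least-squares solution
    return Theta

def policy(x, Theta, basis):
    """pi(x) = sum_k theta_k phi_k(x), the learned linear-in-features controller."""
    return basis(x) @ Theta

# Illustrative basis: a constant term plus the raw state
basis = lambda x: np.concatenate(([1.0], x))
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 2))                           # synthetic demonstrated states
U = X @ np.array([[1.0], [-2.0]]) + 0.5                # synthetic demonstrated controls
Theta = fit_linear_policy(X, U, basis)
print(policy(np.array([0.3, -0.1]), Theta, basis))
```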
Model-based Nonlinear Regression
Hypothesize a model class π_θ(x); e.g., θ are feedback gain parameters
Observe {(x_i, u_i) for i = 1, …, n}
min_θ Σ_i ||u_i − π_θ(x_i)||² is a nonlinear least-squares problem
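A sketch of the nonlinear least-squares fit using scipy.optimize.least_squares; the saturated feedback law pi_theta and its gain parameters are an assumed, illustrative model class rather than one specified in the slides.

```python
import numpy as np
from scipy.optimize import least_squares

def pi_theta(theta, X):
    """Assumed policy class: saturated linear feedback u = u_max * tanh(k1*x1 + k2*x2),
    with theta = (u_max, k1, k2) playing the role of feedback gain parameters."""
    u_max, k1, k2 = theta
    return u_max * np.tanh(X @ np.array([k1, k2]))

def residuals(theta, X, U):
    # Residual vector u_i - pi_theta(x_i); least_squares minimizes its squared norm
    return U - pi_theta(theta, X)

# Synthetic demonstrations from an unknown saturated controller
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
U = 2.0 * np.tanh(X @ np.array([1.5, -0.7])) + 0.05 * rng.normal(size=100)

theta0 = np.array([1.0, 0.1, -0.1])                    # initial guess for the gains
sol = least_squares(residuals, theta0, args=(X, U))
print(sol.x)                                           # recovered (u_max, k1, k2)
```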
Inverse Optimal Control
Parsimony hypothesis: goals describe appropriate behavior in an open world better than policies do
Two stages:
- Learn the objective from demonstrations
- Plan using the objective and sensory input online
Difficulty: a highly underconstrained learning problem
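The slides do not give a specific algorithm, so the following is only an illustrative sketch in the spirit of feature-matching inverse RL: a linear reward R = w·φ is adjusted until demonstrated trajectories score at least as well as sampled alternative trajectories. The feature map, the trajectories, and the perceptron-style update are all assumptions.

```python
import numpy as np

def features(traj):
    """Trajectory feature counts: sum over time of phi(x) = (x, ||x||^2) (illustrative)."""
    phi = np.hstack([traj, np.sum(traj ** 2, axis=1, keepdims=True)])
    return phi.sum(axis=0)

def learn_reward(demo_trajs, candidate_trajs, lr=0.1, iters=200):
    """Adjust w so demonstrated trajectories score at least as well as every
    sampled alternative under R(traj) = w . features(traj)."""
    mu_demo = np.mean([features(t) for t in demo_trajs], axis=0)
    mu_cand = np.array([features(t) for t in candidate_trajs])
    w = np.zeros_like(mu_demo)
    for _ in range(iters):
        best = mu_cand[np.argmax(mu_cand @ w)]   # best-scoring sampled alternative
        if best @ w < mu_demo @ w:               # demos strictly preferred: done
            break
        w += lr * (mu_demo - best)               # otherwise push the reward toward the demos
    return w

# Demos move steadily toward the origin; alternatives are random walks
rng = np.random.default_rng(1)
demos = [2.0 + np.cumsum(-0.1 * np.ones((20, 2)), axis=0) + 0.01 * rng.normal(size=(20, 2))
         for _ in range(5)]
cands = [2.0 + np.cumsum(0.3 * rng.normal(size=(20, 2)), axis=0) for _ in range(30)]
print(learn_reward(demos, cands))   # tends to penalize distance from the origin
```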
Example
Reinforcement Learning
Given an immediate reward/cost function R(x, u), find a policy that maximizes the expected global return
Use trial and error to improve the return over time:
- TD methods
- Q-learning
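A minimal tabular Q-learning sketch on a made-up chain MDP (the states, rewards, and hyperparameters are illustrative), showing the TD update and the epsilon-greedy trial and error.

```python
import numpy as np

# Tabular Q-learning on a tiny chain MDP: states 0..3, actions 0 = left / 1 = right,
# reward +1 for reaching the goal state 3.
N_STATES, N_ACTIONS, GOAL = 4, 2, 3

def step(s, a):
    s_next = min(s + 1, GOAL) if a == 1 else max(s - 1, 0)
    reward = 1.0 if s_next == GOAL else 0.0
    return s_next, reward, s_next == GOAL

Q = np.zeros((N_STATES, N_ACTIONS))
alpha, gamma, eps = 0.1, 0.95, 0.2
rng = np.random.default_rng(0)

for episode in range(300):
    s, done = 0, False
    while not done:
        # Epsilon-greedy exploration: the "trial and error" of the slide
        a = rng.integers(N_ACTIONS) if rng.random() < eps else int(np.argmax(Q[s]))
        s_next, r, done = step(s, a)
        # TD update toward the target r + gamma * max_a' Q(s', a')
        Q[s, a] += alpha * (r + gamma * np.max(Q[s_next]) - Q[s, a])
        s = s_next

print(np.argmax(Q[:GOAL], axis=1))   # greedy policy for states 0..2; should favor action 1 (right)
```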
Trajectory Following
Problem 1: learn a reference trajectory from human demonstrations
Problem 2: learn to follow a reference trajectory under dynamics and disturbances
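An illustrative sketch of both problems under simplifying assumptions not stated in the slides: the reference is the mean of time-normalized 1-D demonstrations, and it is tracked with a PD law plus velocity feedforward on a double integrator subject to a constant disturbance.

```python
import numpy as np

def learn_reference(demos, T=50):
    """Problem 1 (sketch): time-normalize each demonstrated trajectory to T samples
    and average them into a single reference trajectory."""
    resampled = []
    for d in demos:                                    # d: (len_i,) array of positions
        t_old = np.linspace(0.0, 1.0, len(d))
        t_new = np.linspace(0.0, 1.0, T)
        resampled.append(np.interp(t_new, t_old, d))   # put all demos on a common time base
    return np.mean(resampled, axis=0)

def track(ref, dt=0.02, kp=40.0, kd=10.0):
    """Problem 2 (sketch): follow the reference with PD feedback plus velocity
    feedforward on a 1-D double integrator, despite a constant disturbance force."""
    v_ref = np.gradient(ref, dt)                       # reference velocity by finite differences
    x, v, xs = ref[0], 0.0, []
    for x_r, v_r in zip(ref, v_ref):
        u = kp * (x_r - x) + kd * (v_r - v)            # tracking control law
        a = u - 2.0                                    # dynamics with an unknown constant disturbance
        v += a * dt
        x += v * dt
        xs.append(x)
    return np.array(xs)

# Noisy demonstrations of the same reach-to-1.0 motion, recorded at different lengths
rng = np.random.default_rng(2)
demos = [np.linspace(0.0, 1.0, n) + 0.02 * rng.normal(size=n) for n in (40, 55, 70)]
ref = learn_reference(demos)
print(np.max(np.abs(track(ref) - ref)))   # worst-case tracking error: small relative to the 1.0-unit motion
```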
Characterizing Performance
Performance metrics:
- Optimality: does the learned policy perform optimally (e.g., track the reference well)?
- Generality: does the learned policy perform well in new scenarios (e.g., under disturbances)?
Discussion
- Learning is useful for exotic devices, deforming environments, dynamic tasks, and social robots
- Theory and benchmarking are not as well developed as in classic machine learning:
  - Temporal component
  - Difficulty of gathering training/testing datasets
  - Nonuniform hardware testbeds
Reminder: IU Robotics Open House, April 16, 4-7pm, R-House: 919 E 13th St