Learning From Demonstration
Robot Learning
A good control policy u = π(x, t) is often hard to engineer from first principles
- Reinforcement learning: learn from trial and error
- Direct teaching: have a human guide the robot's motion
- Imitation learning: observe and mimic human demonstrations
Demos
Learning "Flavors"
- Given demonstrations, learn a dynamics model: a system identification problem
- Given an objective function, optimize the policy: a standard optimal control problem, solvable using reinforcement learning (simulated demonstrations)
- Given policy demonstrations, find the objective function: inverse optimal control / inverse reinforcement learning
Learning "Flavors" (diagram)
- Demonstrations → dynamics model (system ID)
- Demonstrations → performance objective (inverse optimal control)
- Demonstrations → plan or control policy (direct policy learning)
- Performance objective + dynamics model → plan or control policy (optimal control)
Direct Policy Learning
Wish to learn u = π(x) from human performances {(x_i, u_i) for i = 1, …, n} (system traces)
Learn the mapping with:
- Nearest neighbors
- Regression
- Neural networks
- Locally weighted regression
- Etc.
Nearest Neighbors
Observe {(x_i, u_i) for i = 1, …, n}
π(x) = u_{i*}, where i* = argmin_i ||x − x_i||²
Extension: k-nearest neighbors
[Figure: demonstration states and a query point]
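A minimal sketch of the nearest-neighbor lookup above, in Python with NumPy; the arrays and toy data are illustrative, and setting k > 1 gives the k-nearest-neighbor extension by averaging the retrieved controls.

```python
import numpy as np

def nn_policy(x_query, X_demo, U_demo, k=1):
    """Return a control for x_query by looking up the k nearest
    demonstrated states and averaging their controls."""
    # Squared Euclidean distances ||x_query - x_i||^2 to every demonstrated state
    dists = np.sum((X_demo - x_query) ** 2, axis=1)
    # Indices of the k closest demonstrations (k = 1 is plain nearest neighbor)
    nearest = np.argsort(dists)[:k]
    # k-NN extension: average the controls of the k nearest neighbors
    return U_demo[nearest].mean(axis=0)

# Toy demonstration data: 1-D states, scalar controls
X_demo = np.array([[0.0], [1.0], [2.0], [3.0]])
U_demo = np.array([[0.1], [0.4], [0.9], [1.6]])
print(nn_policy(np.array([1.2]), X_demo, U_demo, k=2))  # averages the two closest controls
```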
Linear Regression
Hypothesize π(x) = Σ_k θ_k φ_k(x), where the φ_k(x) are basis functions
Observe {(x_i, u_i) for i = 1, …, n}
min_θ Σ_i ||u_i − π(x_i)||² is a least-squares problem
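A sketch of the least-squares fit, assuming a NumPy setting where the basis function is user-supplied; the constant-plus-linear basis and the synthetic data below are only for illustration.

```python
import numpy as np

def fit_linear_policy(X, U, basis):
    """Solve min_theta sum_i ||u_i - sum_k theta_k phi_k(x_i)||^2 by least squares."""
    Phi = np.array([basis(x) for x in X])              # (n, K) design matrix of phi_k(x_i)
    Theta, *_ = np.linalg.lstsq(Phi, U, rcond=None)    # (K, m) least-squares solution
    return Theta

def policy(x, Theta, basis):
    """pi(x) = sum_k theta_k phi_k(x), the learned linear-in-features controller."""
    return basis(x) @ Theta

# Illustrative basis: a constant term plus the raw state
basis = lambda x: np.concatenate(([1.0], x))
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 2))                           # synthetic demonstrated states
U = X @ np.array([[1.0], [-2.0]]) + 0.5                # synthetic demonstrated controls
Theta = fit_linear_policy(X, U, basis)
print(policy(np.array([0.3, -0.1]), Theta, basis))
```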
Model-based Nonlinear Regression
Hypothesize a model class π_θ(x); e.g., θ are feedback gain parameters
Observe {(x_i, u_i) for i = 1, …, n}
min_θ Σ_i ||u_i − π_θ(x_i)||² is a nonlinear least-squares problem
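A sketch of the nonlinear least-squares fit using scipy.optimize.least_squares; the saturated feedback law pi_theta and its gain parameters are an assumed, illustrative model class rather than one specified in the slides.

```python
import numpy as np
from scipy.optimize import least_squares

def pi_theta(theta, X):
    """Assumed policy class: saturated linear feedback u = u_max * tanh(k1*x1 + k2*x2),
    with theta = (u_max, k1, k2) playing the role of feedback gain parameters."""
    u_max, k1, k2 = theta
    return u_max * np.tanh(X @ np.array([k1, k2]))

def residuals(theta, X, U):
    # Residual vector u_i - pi_theta(x_i); least_squares minimizes its squared norm
    return U - pi_theta(theta, X)

# Synthetic demonstrations from an unknown saturated controller
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
U = 2.0 * np.tanh(X @ np.array([1.5, -0.7])) + 0.05 * rng.normal(size=100)

theta0 = np.array([1.0, 0.1, -0.1])                    # initial guess for the gains
sol = least_squares(residuals, theta0, args=(X, U))
print(sol.x)                                           # recovered (u_max, k1, k2)
```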
Inverse Optimal Control
Parsimony hypothesis: goals describe appropriate behavior in an open world better than policies do
Two stages:
- Learn the objective from demonstrations
- Plan using the objective and sensory input online
Difficulty: a highly underconstrained learning problem
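The slides do not give a specific algorithm, so the following is only an illustrative sketch in the spirit of feature-matching inverse RL: a linear reward R = w·φ is adjusted until demonstrated trajectories score at least as well as sampled alternative trajectories. The feature map, the trajectories, and the perceptron-style update are all assumptions.

```python
import numpy as np

def features(traj):
    """Trajectory feature counts: sum over time of phi(x) = (x, ||x||^2) (illustrative)."""
    phi = np.hstack([traj, np.sum(traj ** 2, axis=1, keepdims=True)])
    return phi.sum(axis=0)

def learn_reward(demo_trajs, candidate_trajs, lr=0.1, iters=200):
    """Adjust w so demonstrated trajectories score at least as well as every
    sampled alternative under R(traj) = w . features(traj)."""
    mu_demo = np.mean([features(t) for t in demo_trajs], axis=0)
    mu_cand = np.array([features(t) for t in candidate_trajs])
    w = np.zeros_like(mu_demo)
    for _ in range(iters):
        best = mu_cand[np.argmax(mu_cand @ w)]   # best-scoring sampled alternative
        if best @ w < mu_demo @ w:               # demos strictly preferred: done
            break
        w += lr * (mu_demo - best)               # otherwise push the reward toward the demos
    return w

# Demos move steadily toward the origin; alternatives are random walks
rng = np.random.default_rng(1)
demos = [2.0 + np.cumsum(-0.1 * np.ones((20, 2)), axis=0) + 0.01 * rng.normal(size=(20, 2))
         for _ in range(5)]
cands = [2.0 + np.cumsum(0.3 * rng.normal(size=(20, 2)), axis=0) for _ in range(30)]
print(learn_reward(demos, cands))   # tends to penalize distance from the origin
```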
Example
Reinforcement Learning
Given an immediate reward/cost function R(x, u), find a policy that maximizes the expected global return
Use trial and error to improve the return over time:
- TD methods
- Q-learning
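A minimal tabular Q-learning sketch on a made-up chain MDP (the states, rewards, and hyperparameters are illustrative), showing the TD update and the epsilon-greedy trial and error.

```python
import numpy as np

# Tabular Q-learning on a tiny chain MDP: states 0..3, actions 0 = left / 1 = right,
# reward +1 for reaching the goal state 3.
N_STATES, N_ACTIONS, GOAL = 4, 2, 3

def step(s, a):
    s_next = min(s + 1, GOAL) if a == 1 else max(s - 1, 0)
    reward = 1.0 if s_next == GOAL else 0.0
    return s_next, reward, s_next == GOAL

Q = np.zeros((N_STATES, N_ACTIONS))
alpha, gamma, eps = 0.1, 0.95, 0.2
rng = np.random.default_rng(0)

for episode in range(300):
    s, done = 0, False
    while not done:
        # Epsilon-greedy exploration: the "trial and error" of the slide
        a = rng.integers(N_ACTIONS) if rng.random() < eps else int(np.argmax(Q[s]))
        s_next, r, done = step(s, a)
        # TD update toward the target r + gamma * max_a' Q(s', a')
        Q[s, a] += alpha * (r + gamma * np.max(Q[s_next]) - Q[s, a])
        s = s_next

print(np.argmax(Q[:GOAL], axis=1))   # greedy policy for states 0..2; should favor action 1 (right)
```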
Trajectory Following
Problem 1: learn a reference trajectory from human demonstrations
Problem 2: learn to follow a reference trajectory under dynamics and disturbances
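An illustrative sketch of both problems under simplifying assumptions not stated in the slides: the reference is the mean of time-normalized 1-D demonstrations, and it is tracked with a PD law plus velocity feedforward on a double integrator subject to a constant disturbance.

```python
import numpy as np

def learn_reference(demos, T=50):
    """Problem 1 (sketch): time-normalize each demonstrated trajectory to T samples
    and average them into a single reference trajectory."""
    resampled = []
    for d in demos:                                    # d: (len_i,) array of positions
        t_old = np.linspace(0.0, 1.0, len(d))
        t_new = np.linspace(0.0, 1.0, T)
        resampled.append(np.interp(t_new, t_old, d))   # put all demos on a common time base
    return np.mean(resampled, axis=0)

def track(ref, dt=0.02, kp=40.0, kd=10.0):
    """Problem 2 (sketch): follow the reference with PD feedback plus velocity
    feedforward on a 1-D double integrator, despite a constant disturbance force."""
    v_ref = np.gradient(ref, dt)                       # reference velocity by finite differences
    x, v, xs = ref[0], 0.0, []
    for x_r, v_r in zip(ref, v_ref):
        u = kp * (x_r - x) + kd * (v_r - v)            # tracking control law
        a = u - 2.0                                    # dynamics with an unknown constant disturbance
        v += a * dt
        x += v * dt
        xs.append(x)
    return np.array(xs)

# Noisy demonstrations of the same reach-to-1.0 motion, recorded at different lengths
rng = np.random.default_rng(2)
demos = [np.linspace(0.0, 1.0, n) + 0.02 * rng.normal(size=n) for n in (40, 55, 70)]
ref = learn_reference(demos)
print(np.max(np.abs(track(ref) - ref)))   # worst-case tracking error: small relative to the 1.0-unit motion
```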
Characterizing Performance
Performance metrics:
- Optimality: does the learned policy perform optimally (e.g., track the reference well)?
- Generality: does the learned policy perform well in new scenarios (e.g., under disturbances)?
Discussion
- Learning is useful for exotic devices, deforming environments, dynamic tasks, and social robots
- Theory and benchmarking are not as well developed as in classic machine learning:
  - Temporal component
  - Difficulty of gathering training/testing datasets
  - Nonuniform hardware testbeds
Reminder: IU Robotics Open House, April 16, 4-7pm, R-House: 919 E 13th St