Learning From Demonstration

Robot Learning
A good control policy u = π(x,t) is often hard to engineer from first principles.
- Reinforcement learning: learn from trial and error.
- Direct teaching: have a human guide the robot's motion.
- Imitation learning: observe and mimic human demonstrations.

Demos

Learning "Flavors"
- Given demonstrations, learn a dynamics model: a system identification problem.
- Given an objective function, optimize the policy: a standard optimal control problem; can be solved using reinforcement learning (simulated demonstrations).
- Given policy demonstrations, find the objective function: inverse optimal control / inverse reinforcement learning.

Learning "Flavors"
[Diagram relating Demonstrations, Performance Objective, Plan or Control Policy, and Dynamics Model via the four problems: inverse optimal control, direct policy learning, optimal control, and system ID]

Direct Policy Learning
Wish to learn u = π(x).
Human performances: {(x_i, u_i), i = 1,…,n} (system traces).
Learn the mapping π with:
- Nearest neighbors
- Regression
- Neural networks
- Locally weighted regression
- Etc.

Nearest Neighbors
Observe {(x_i, u_i), i = 1,…,n}.
π(x) = u_i*, where i* = argmin_i ||x − x_i||².
Extension: k-nearest neighbors.
[Figure: a query point and its nearest demonstrated states]
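
A minimal nearest-neighbor policy can be sketched in a few lines of NumPy; the class name and interface below are illustrative choices, not from the slides:

    import numpy as np

    class NearestNeighborPolicy:
        """Return the control of the closest demonstrated state."""

        def __init__(self, states, controls, k=1):
            self.states = np.asarray(states, dtype=float)      # (n, dx) demonstrated states x_i
            self.controls = np.asarray(controls, dtype=float)  # (n, du) demonstrated controls u_i
            self.k = k

        def __call__(self, x):
            # Squared Euclidean distance ||x - x_i||^2 to every demonstration
            d2 = np.sum((self.states - np.asarray(x, dtype=float)) ** 2, axis=1)
            if self.k == 1:
                return self.controls[np.argmin(d2)]            # pi(x) = u_{i*}
            nearest = np.argsort(d2)[: self.k]                  # indices of the k nearest states
            return self.controls[nearest].mean(axis=0)          # average their controls

Usage would look like policy = NearestNeighborPolicy(X_demo, U_demo, k=3) followed by u = policy(x); with k > 1 the lookup is smoothed by averaging the neighbors' controls.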

Linear Regression
Hypothesize π_θ(x) = Σ_k θ_k φ_k(x), where the φ_k(x) are basis functions.
Observe {(x_i, u_i), i = 1,…,n}.
Solve min_θ Σ_i ||u_i − π_θ(x_i)||²: a least-squares problem.
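
The least-squares fit can be written directly with NumPy, as in the sketch below; the constant-linear-quadratic basis used for the φ_k is just one illustrative choice:

    import numpy as np

    def features(x):
        # Basis functions phi_k(x); this particular basis is only an example.
        x = np.atleast_1d(np.asarray(x, dtype=float))
        return np.concatenate(([1.0], x, x ** 2))

    def fit_linear_policy(states, controls):
        """Least-squares fit of pi_theta(x) = sum_k theta_k * phi_k(x)."""
        Phi = np.vstack([features(x) for x in states])   # (n, num_features) design matrix
        U = np.asarray(controls, dtype=float)             # (n, du) demonstrated controls
        theta, *_ = np.linalg.lstsq(Phi, U, rcond=None)   # min_theta ||U - Phi theta||^2
        return theta

    def linear_policy(theta, x):
        return features(x) @ theta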

Model-based Nonlinear Regression
Hypothesize a model class π_θ(x), e.g., where θ are feedback gain parameters.
Observe {(x_i, u_i), i = 1,…,n}.
Solve min_θ Σ_i ||u_i − π_θ(x_i)||²: a nonlinear least-squares problem.
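
A rough sketch using SciPy's nonlinear least-squares solver; the particular policy class here (a saturated linear feedback law with a scalar control) is an assumption for illustration only:

    import numpy as np
    from scipy.optimize import least_squares

    def policy(theta, x):
        # Illustrative nonlinear policy class: u = u_max * tanh(K . x)
        u_max, *gains = theta
        return u_max * np.tanh(np.dot(gains, x))

    def fit_nonlinear_policy(states, controls, theta0):
        """Nonlinear least squares: min_theta sum_i ||u_i - pi_theta(x_i)||^2."""
        states = np.asarray(states, dtype=float)
        controls = np.asarray(controls, dtype=float)     # (n,) scalar controls for simplicity

        def residuals(theta):
            pred = np.array([policy(theta, x) for x in states])
            return pred - controls

        return least_squares(residuals, theta0).x        # fitted parameter vector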

Inverse Optimal Control
Parsimony hypothesis: goals are better than policies at describing appropriate behavior in an open world.
Two stages:
- Learn the objective from demonstrations.
- Plan using the objective and sensory input on-line.
Difficulty: a highly underconstrained learning problem.
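
One common way to make this tractable (an assumption here, not something stated on the slide) is to posit a reward linear in hand-chosen features, R(x,u) = w·f(x,u), and adjust w until a planner's feature expectations match those of the demonstrations. The loop below is only a rough sketch of that idea; plan_and_evaluate is a hypothetical user-supplied planner:

    import numpy as np

    def feature_matching_irl(mu_demo, plan_and_evaluate, num_iters=50, lr=0.1):
        """Crude feature-matching loop for a reward R(x,u) = w . f(x,u).

        mu_demo           : average feature vector of the demonstrations
        plan_and_evaluate : given weights w, returns the feature expectations of
                            the policy that is optimal for the reward w . f(x,u)
        """
        mu_demo = np.asarray(mu_demo, dtype=float)
        w = np.zeros_like(mu_demo)
        for _ in range(num_iters):
            mu_policy = plan_and_evaluate(w)       # stage 2: plan with the current objective
            w += lr * (mu_demo - mu_policy)        # push the reward toward the demonstrated features
        return w                                   # demonstrations look near-optimal when mu_policy ~ mu_demo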

Example

Reinforcement Learning
Have an immediate reward/cost function R(x,u).
Find a policy that maximizes the expected global return.
Use trial and error to improve the return over time:
- TD methods
- Q-learning
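
A tabular Q-learning sketch of this trial-and-error loop; the discrete environment interface (reset() and step() returning the next state, reward, and a done flag) is an assumption for illustration:

    import numpy as np

    def q_learning(env, num_states, num_actions, episodes=500,
                   alpha=0.1, gamma=0.99, epsilon=0.1):
        """Tabular Q-learning with epsilon-greedy exploration."""
        Q = np.zeros((num_states, num_actions))
        for _ in range(episodes):
            x, done = env.reset(), False
            while not done:
                if np.random.rand() < epsilon:
                    u = np.random.randint(num_actions)   # explore: trial and error
                else:
                    u = int(np.argmax(Q[x]))             # exploit the current estimate
                x_next, r, done = env.step(u)
                # TD update toward reward plus discounted best future value
                target = r + gamma * np.max(Q[x_next]) * (not done)
                Q[x, u] += alpha * (target - Q[x, u])
                x = x_next
        return Q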

Trajectory Following
Problem 1: Learn a reference trajectory from human demonstrations.
Problem 2: Learn to follow the reference trajectory under the system dynamics and disturbances.
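
A deliberately simplified sketch of both problems: averaging time-aligned demonstrations to obtain the reference, and proportional feedback to follow it. Real data usually needs time alignment (e.g., dynamic time warping) first, and a real controller must account for the dynamics:

    import numpy as np

    def learn_reference(demos):
        """Problem 1 (sketch): average several time-aligned demonstrations.
        demos: list of (T, d) state trajectories of equal length."""
        return np.mean(np.stack(demos), axis=0)

    def tracking_policy(x, t, reference, kp=5.0):
        """Problem 2 (sketch): proportional feedback toward the reference state at time t."""
        x_ref = reference[min(t, len(reference) - 1)]
        return kp * (x_ref - np.asarray(x, dtype=float))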

Characterizing Performance
Performance metrics:
- Optimality: does the learned policy perform optimally (e.g., track the reference well)?
- Generality: does the learned policy perform well in new scenarios (e.g., under disturbances)?

Discussion
Learning is useful for exotic devices, deforming environments, dynamic tasks, and social robots.
Theory and benchmarking are not as well developed as in classic machine learning:
- Temporal component
- Difficulty of gathering training/testing datasets
- Nonuniform hardware testbeds

Reminder: IU Robotics Open House
April 16, 4-7pm
R-House: 919 E. 13th St.