An Application of Reinforcement Learning to Autonomous Helicopter Flight
Pieter Abbeel, Adam Coates, Morgan Quigley and Andrew Y. Ng (Stanford University)

Overview

Challenges in reinforcement learning for complex physical systems such as helicopters:
- Data collection: aggressive exploration is dangerous.
- It is difficult to specify the proper reward function for a given task.

We present apprenticeship learning algorithms, which use an expert demonstration and:
- do not require explicit exploration;
- do not require an explicit reward function specification.

Experimental results:
- demonstrate the effectiveness of the algorithms on a highly challenging control problem;
- significantly extend the state of the art in autonomous helicopter flight: in particular, the first completion of autonomous stationary forward flips, stationary sideways rolls, nose-in funnels and tail-in funnels.

Reinforcement Learning and Apprenticeship Learning

Standard reinforcement learning takes two inputs, a dynamics model P_sa and a reward function R, and computes a control policy:

  Dynamics Model P_sa + Reward Function R  ->  Reinforcement Learning  ->  Control policy

Both inputs are problematic for complex physical systems:
- Data collection for the dynamics model: aggressive exploration is dangerous.
- Complex tasks: hard to specify the reward function.

Apprenticeship Learning for the Dynamics Model

Key question: How do we fly the helicopter for data collection? How do we ensure that the entire flight envelope is covered by the data collection process?

State of the art: the E^3 algorithm, Kearns and Singh (2002), and its variants/extensions (Kearns and Koller, 1999; Kakade, Kearns and Langford, 2003; Brafman and Tennenholtz, 2002). These algorithms repeatedly ask: do we have a good model of the dynamics? If no, "explore"; if yes, "exploit".

Can we avoid explicit exploration? Our approach alternates model learning and flight (a code sketch of this loop appears at the end of this slide):
- Expert human pilot flight (a_1, s_1, a_2, s_2, a_3, s_3, ...)  ->  learn P_sa.
- Run reinforcement learning with P_sa and R to obtain a control policy.
- Autonomous flight with that policy (a_1, s_1, a_2, s_2, a_3, s_3, ...)  ->  relearn P_sa, and repeat.

Take-away message: in the apprenticeship learning setting, i.e., when we have an expert demonstration, we do not need explicit exploration to perform as well as the expert.

Theorem. Given a polynomial number of teacher demonstrations, the apprenticeship learning algorithm returns a policy that performs as well as the teacher within a polynomial number of iterations. [See Abbeel & Ng, 2005 for details.]

Apprenticeship Learning for the Reward Function

The reward function can be very difficult to specify. E.g., for our helicopter control problem we have:

  R(s) = c_1 (position error)^2 + c_2 (orientation error)^2 + c_3 (velocity error)^2 + c_4 (angular rate error)^2 + ... + c_25 (inputs)^2.

Can we avoid the need to specify the reward function?

Our approach [Abbeel & Ng, 2004] is based on inverse reinforcement learning [Ng & Russell, 2000]. It returns a policy whose performance is as good as the expert's, as measured by the expert's unknown reward function, within a polynomial number of iterations.

Inverse RL algorithm (a code sketch follows at the end of this slide). For t = 1, 2, ...
- Inverse RL step: estimate the expert's reward function R(s) = w^T φ(s) such that under R(s) the expert performs better than all previously found policies {π_i}.
- RL step: compute the optimal policy π_t for the estimated reward w.

Related work:
- Imitation learning: learn to predict the expert's actions as a function of states; usually lacks strong performance guarantees. [E.g., Pomerleau, 1989; Sammut et al., 1992; Kuniyoshi et al., 1994; Demiris & Hayes, 1994; Amit & Mataric, 2002; Atkeson & Schaal, 1997.]
- Max margin planning, Ratliff et al., 2006.
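The loop in the Apprenticeship Learning for the Dynamics Model section can be written compactly in code. The following Python sketch is purely illustrative: the callables fit_dynamics_model, run_rl and fly_autonomously are hypothetical placeholders supplied by the caller, not part of the poster.

```python
# Illustrative sketch only: the three callables are hypothetical placeholders,
# not the authors' implementation.

def apprenticeship_dynamics_loop(expert_trajectories, reward_fn,
                                 fit_dynamics_model, run_rl, fly_autonomously,
                                 n_iters=10):
    """Alternate between (re)learning the dynamics model P_sa from all flight
    data collected so far and flying the policy obtained by RL on that model.
    There is no explicit exploration step: coverage of the flight envelope is
    assumed to come from the expert demonstration."""
    data = list(expert_trajectories)           # (s, a, s') transitions from the human pilot
    policy = None
    for _ in range(n_iters):
        P_sa = fit_dynamics_model(data)        # learn P_sa from all data so far
        policy = run_rl(P_sa, reward_fn)       # RL / optimal control on the learned model
        new_data = fly_autonomously(policy)    # autonomous flight: more (s, a, s') data
        data.extend(new_data)                  # add the new data and repeat
    return policy
```

This is the contrast with E^3-style algorithms drawn on the poster: no separate "explore" policy is ever flown.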
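The Inverse RL Algorithm box alternates an inverse RL step with an RL step. Below is a minimal Python sketch of one concrete way to realize it, the projection-style update from Abbeel & Ng (2004), which chooses w so that the expert's feature expectations beat those of all previously found policies. The helpers estimate_mu and rl_step, the tolerance tol, and the iteration cap are assumptions of this sketch, not details from the poster.

```python
import numpy as np

def apprenticeship_irl(mu_expert, estimate_mu, rl_step, n_iters=20, tol=1e-3):
    """mu_expert    : expert feature expectations E[sum_t gamma^t phi(s_t)] (length-k vector).
    estimate_mu(pi) : feature expectations of a policy (e.g. a Monte Carlo estimate).
    rl_step(w)      : (approximately) optimal policy for the reward R(s) = w . phi(s)."""
    pi = rl_step(np.zeros_like(mu_expert))     # arbitrary initial policy
    mu_bar = estimate_mu(pi)                   # feature expectations of the best mixture so far
    policies = [pi]
    w = mu_expert - mu_bar
    for _ in range(n_iters):
        if np.linalg.norm(w) <= tol:           # expert (nearly) matched: stop
            break
        pi = rl_step(w)                        # RL step: optimal policy pi_t for reward w
        policies.append(pi)
        mu = estimate_mu(pi)
        # Inverse RL step (projection form): move mu_bar toward mu_expert along mu - mu_bar,
        # then point w from the mixture toward the expert, so that under R(s) = w . phi(s)
        # the expert scores higher than every policy found so far.
        d = mu - mu_bar
        mu_bar = mu_bar + (np.dot(d, mu_expert - mu_bar) / np.dot(d, d)) * d
        w = mu_expert - mu_bar
    return policies, w
```

For the helicopter tasks on this poster, the RL step corresponds to the differential dynamic programming controller described under RL/Optimal Control on the next slide.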
Experimental Results

[Image panels: Stationary Flips, Stationary Rolls, Tail-In Funnels, Nose-In Funnels.] Video available.

Conclusion

- Apprenticeship learning for the dynamics model avoids explicit exploration in our experiments.
- The procedure based on apprenticeship learning (inverse RL) for the reward function gives performance similar to human pilots.
- Our results significantly extend the state of the art in autonomous helicopter flight: the first autonomous completion of stationary flips and rolls, tail-in funnels and nose-in funnels.

ADDITIONAL REFERENCES (SPECIFIC TO AUTONOMOUS HELICOPTER FLIGHT)

[1] J. Bagnell and J. Schneider. Autonomous helicopter control using reinforcement learning policy search methods. In International Conference on Robotics and Automation. IEEE, 2001.
[2] V. Gavrilets, I. Martinos, B. Mettler, and E. Feron. Control logic for automated aerobatic flight of miniature helicopter. In AIAA Guidance, Navigation and Control Conference, 2002.
[3] M. La Civita. Integrated Modeling and Robust Control for Full-Envelope Flight of Robotic Helicopters. PhD thesis, Carnegie Mellon University, Pittsburgh, PA, 2003.
[4] M. La Civita, G. Papageorgiou, W. C. Messner, and T. Kanade. Design and flight testing of a high-bandwidth H-infinity loop shaping controller for a robotic helicopter. Journal of Guidance, Control, and Dynamics, 29(2):485-494, March-April 2006.
[5] B. Mettler, M. Tischler, and T. Kanade. System identification of small-size unmanned helicopter dynamics. In American Helicopter Society, 55th Forum, 1999.
[6] J. M. Roberts, P. I. Corke, and G. Buskey. Low-cost flight control system for a small autonomous helicopter. In IEEE Int'l Conf. on Robotics and Automation, 2003.
[7] S. Saripalli, J. Montgomery, and G. Sukhatme. Visually-guided landing of an unmanned aerial vehicle, 2003.

RL/Optimal Control

We use differential dynamic programming. We penalize high-frequency controls. We include integrated orientation errors in the cost. (See the paper for more details; an illustrative cost sketch follows below.)
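The RL/Optimal Control box lists three ingredients of the cost optimized by differential dynamic programming: the quadratic tracking and input penalties from R(s) on slide 1, a penalty on high-frequency controls, and an integrated orientation error term. The sketch below shows one plausible way to assemble such a per-time-step cost in Python; the weight names, the use of a control-difference term for the high-frequency penalty, and the exact grouping of terms are assumptions of this sketch, not the authors' coefficients.

```python
import numpy as np

def per_step_cost(pos_err, orient_err, vel_err, ang_rate_err,
                  u, u_prev, orient_err_integral, c):
    """Illustrative per-time-step cost: weighted squared tracking errors and inputs
    (as in R(s) on slide 1), plus a control-difference penalty standing in for the
    high-frequency control penalty, plus an integrated-orientation-error penalty.
    `c` is a dict of hypothetical weights, e.g. {"pos": 1.0, "orient": 1.0, ...}."""
    return (c["pos"]        * np.sum(pos_err ** 2)
            + c["orient"]   * np.sum(orient_err ** 2)
            + c["vel"]      * np.sum(vel_err ** 2)
            + c["ang_rate"] * np.sum(ang_rate_err ** 2)
            + c["inputs"]   * np.sum(u ** 2)
            + c["du"]       * np.sum((u - u_prev) ** 2)            # high-frequency control penalty
            + c["orient_i"] * np.sum(orient_err_integral ** 2))    # integrated orientation error
```

Differential dynamic programming then minimizes the sum of such per-step costs over the planning horizon under the learned dynamics model P_sa.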