
An Application of Reinforcement Learning to Aerobatic Helicopter Flight
Greg McChesney, Texas Tech University
Apr 08, 2009, CS5331: Autonomous Mobile Robots

Overview
• Goal: a helicopter that flies aerobatic maneuvers autonomously
• Software developed at Stanford as part of its AI lab
• The paper is slightly dated: many new maneuvers have been demonstrated since

Learning Approach: Apprenticeship
• Collect data from a human expert flying the maneuver (multiple times)
• Learn a dynamics model from the data (sketched below)
• Find a controller that performs the maneuver well in simulation under the learned model
• Test on the real helicopter (and pray it doesn't crash)
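A minimal sketch of the model-learning step, assuming (for illustration only) linear dynamics fit by least squares; the paper's actual model predicts accelerations, with gravity subtracted out, from the flight logs:

```python
# Illustration only: fit linear dynamics x[t+1] ~ A x[t] + B u[t] to logged
# flight data by least squares. This is not the authors' model, just the idea.
import numpy as np

def fit_linear_dynamics(states, inputs):
    """states: (T+1, n) array of states; inputs: (T, m) array of controls."""
    X = np.hstack([states[:-1], inputs])        # regressors [x_t, u_t]
    Y = states[1:]                              # targets x_{t+1}
    W, *_ = np.linalg.lstsq(X, Y, rcond=None)   # minimize ||X W - Y||^2
    n = states.shape[1]
    return W[:n].T, W[n:].T                     # A, B

# Usage with synthetic data standing in for recorded demonstrations:
rng = np.random.default_rng(0)
demo_states = rng.normal(size=(101, 12))   # 12-D state, 100 time steps
demo_inputs = rng.normal(size=(100, 4))    # 4-D control input
A, B = fit_linear_dynamics(demo_states, demo_inputs)
```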

Helicopter State
• Position
• Orientation
• Velocity
• Angular velocity
• Controlled through 4 input dimensions: longitudinal and lateral cyclic pitch, tail-rotor pitch, and main-rotor collective pitch
• Gravity's known contribution is subtracted out when learning the model
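The quantities above could be collected into containers like the following (a hypothetical sketch; the slide names only cyclic pitch and tail rotor explicitly, so the remaining channel labels follow standard helicopter convention):

```python
# Hypothetical containers for the state and control variables listed above.
from dataclasses import dataclass
import numpy as np

@dataclass
class HeliState:
    position: np.ndarray          # (3,) world-frame position
    orientation: np.ndarray       # (4,) attitude quaternion
    velocity: np.ndarray          # (3,) linear velocity
    angular_velocity: np.ndarray  # (3,) body-frame angular rates

@dataclass
class HeliControl:
    cyclic_lon: float  # longitudinal cyclic pitch (pitches the helicopter)
    cyclic_lat: float  # lateral cyclic pitch (rolls the helicopter)
    tail_rotor: float  # tail-rotor pitch (yaw)
    collective: float  # main-rotor collective pitch (thrust)
```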

Controller Design
• Pose the problem as a Markov decision process (MDP), a sextuple (S, A, T, H, s0, R):
  S: the set of states
  A: the set of actions (control inputs)
  T: the dynamics model, a set of probability distributions over the next state
  H: the horizon, the number of time steps of interest
  s0: the initial state
  R: the reward function
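Written out as an explicit (hypothetical) structure, the sextuple looks like this:

```python
# The MDP sextuple as a Python structure. T maps a (state, action) pair to a
# distribution over next states; R scores a (state, action) pair.
from typing import Any, Callable, NamedTuple

class MDP(NamedTuple):
    S: Any                          # set of states
    A: Any                          # set of actions (inputs)
    T: Callable[[Any, Any], Any]    # dynamics model: (s, a) -> P(s')
    H: int                          # horizon: number of time steps
    s0: Any                         # initial state
    R: Callable[[Any, Any], float]  # reward function: (s, a) -> reward
```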

Differential Dynamic Programming (DDP)
• Iteratively compute a linear approximation of the dynamics around the current trajectory
• Compute the optimal solution to the resulting linear quadratic regulator (LQR) problem (see the backward pass sketched below)
  Must work in the error state (deviation from the target trajectory)
  Must include a cost on changes in the inputs, which is needed in real testing
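Below is a minimal sketch of the LQR backward pass at the heart of each DDP iteration, assuming for brevity a single time-invariant linearization (A, B) and quadratic costs Q on the error state and R_u on the inputs; the real algorithm relinearizes along the trajectory:

```python
# Finite-horizon LQR backward pass: compute time-varying feedback gains K_t
# so that u_t = -K_t @ x_t is optimal for the linear-quadratic problem.
import numpy as np

def lqr_backward_pass(A, B, Q, R_u, H):
    P = Q.copy()                    # cost-to-go matrix, seeded at the horizon
    gains = []
    for _ in range(H):
        # Riccati recursion, stepping backward in time.
        K = np.linalg.solve(R_u + B.T @ P @ B, B.T @ P @ A)
        P = Q + A.T @ P @ (A - B @ K)
        gains.append(K)
    return gains[::-1]              # gains[t] is the gain for time step t
```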

DDP, Continued
• Two phases:
  Run DDP once to find an open-loop input sequence
  Run DDP again, refining the inputs as deviations from the nominal open-loop input sequence
• Integral control: accounts for wind and errors in the model (sketched below)
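A hypothetical sketch of the integral-control idea: accumulate the tracking error over time and feed it back, so that steady disturbances such as wind or model bias are driven out even though the learned model never saw them:

```python
# Hypothetical integral-control update around the LQR feedback law.
import numpy as np

def control_with_integral(K, Ki, x_err, err_integral, dt):
    """K, Ki: feedback and integral gain matrices; x_err: error state."""
    err_integral = err_integral + x_err * dt  # accumulate tracking error
    u = -K @ x_err - Ki @ err_integral        # feedback plus integral action
    return u, err_integral
```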

Rewards
• The reward is built from 24 features of the trajectory
• Inverse reinforcement learning was used to estimate the feature weights from the demonstrations
• The rewards from inverse RL alone usually did not produce correct results
• The authors took the inverse-RL weights and manually tuned them to get good results (see the sketch below)
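The reward therefore has roughly this (hypothetical) shape: a weighted sum of features, with weights proposed by inverse RL and then adjusted by hand:

```python
# Hypothetical shape of the reward: a weighted sum of 24 features of the
# state and input (e.g., squared deviations from the target trajectory).
import numpy as np

def reward(phi, w):
    """phi: (24,) feature vector for a (state, input) pair; w: (24,) weights."""
    return float(w @ phi)

# Inverse RL proposes w from the expert demonstrations; the weights are then
# tuned by hand until the flown maneuvers look right.
```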

Helicopter
• XCell Tempest
• 54" long, 19" high
• 13 lbs
• Two-stroke engine
• Orientation sensors
• GPS (loses lock during flips)

Flip
[Video: the helicopter performs an in-place 360° flip about its lateral axis]

Roll
[Video: an in-place 360° roll about the longitudinal axis]

Tail-In Funnel
[Video: the helicopter flies fast circles sideways with its tail pointed at the center of the circle]

Nose-In Funnel
[Video: the same maneuver flown with the nose, rather than the tail, pointed at the circle's center]

Questions
• Motivation: who pays for it?
  There are clear applications in the defense sector (e.g., DARPA)
• Could more maneuvers be flown just by changing some parameters?
  Probably not: the controller is derived from a model learned for each maneuver, so a new maneuver would require creating a new model

More Questions
• What is the relationship between reinforcement learning and MDPs?
  Not sure (roughly: the MDP is the standard formalism in which RL problems are posed, and RL algorithms compute good policies for an MDP whose dynamics or reward may have to be learned)
• Could a helicopter like this operate in the West Texas wind storms?

Fun Stuff
• Videos: dxqn0fcnE
• Helicopter: elicopterkits/1025_Spectra_G/1025_kit_main.asp