Learning Vehicular Dynamics, with Application to Modeling Helicopters

Pieter Abbeel, Varun Ganapathi, Andrew Y. Ng. Stanford University.

Overview

Model-based reinforcement learning has been very successful. The state of the art: reinforcement learning returns policies that fly well in simulation, and the remaining helicopter failures are typically caused by inaccurate simulation. The key technical challenge is therefore building an accurate simulator. Our approach: encode all constraints known from physics (gravity, inertia, etc.); learn only the parts of the model not determined by physics; and explicitly learn a simulation that is predictive at long time-scales. Result: a significantly improved helicopter model, and the first autonomous funnel (an aerobatic maneuver) flown using our model.

RC Helicopters

(Photos of the two RC helicopters used in the experiments: the Bergen Industrial Twin and the XCell Tempest.)

Helicopter State and Inputs

12-D state: position; orientation (roll, pitch, yaw); velocity; angular rates. We encode symmetries using body (= robot-centric) coordinates, with the body coordinate frame attached to the helicopter; since the dynamics are then invariant to position and yaw, an 8-D state (roll, pitch, velocity, angular rates) suffices for prediction.

Inputs:
- u1, u2: the longitudinal (front-back) and latitudinal (left-right) cyclic pitch controls cause the helicopter to pitch forward/backward or roll sideways.
- u3: the tail rotor collective pitch control affects tail rotor thrust, and can be used to yaw (turn) the helicopter.
- u4: the main rotor collective pitch control affects the main rotor's thrust.

Models in Prior Work

Predict velocities and angular rates: $f : (s_t, u_t) \mapsto (v_{t+1}, \omega_{t+1})$, with $f$ learned from data. Obtain position and orientation by numerical integration.

Shortcomings. From physics we have

$(v_{t+1}, \omega_{t+1}) = R_t^{t+1} \big[ (v_t, \omega_t) + \Delta t \, (\dot v_t, \dot \omega_t) \big]$,

where $R_t^{t+1}$ is the rotation between the body coordinate frames at times $t$ and $t+1$, and $(\dot v_t, \dot \omega_t)$ are the accelerations. The body coordinate frame is different at every time step, which makes inertia highly non-linear in the state and very difficult to capture/learn from data. For most physical systems, forces and torques have a fairly simple relation to the inputs and the current state; this simplicity is lost by the change of coordinate frame.

Our Acceleration Prediction Model

Predict accelerations: $f : (s_t, u_t) \mapsto (\dot v_t, \dot \omega_t)$, with $f$ learned from data. Obtain velocity, angular rates, position and orientation by numerical integration.

Advantages:
- No need to learn inertia from data.
- Constraints from physics are incorporated explicitly.
- The relation between state, inputs and accelerations is not cluttered by the change of coordinate frame, and is thus easier to learn from data.

Standard Learning Criteria

- Frequency domain fitting: requires a linear model; used in CIFER (the industry standard).
- Minimize the one-step prediction error: $\min_{A,B} \sum_t \| s_{t+1} - (A s_t + B u_t) \|^2$. For $f$ linear in the state $s$ and inputs $u$, $f$ can be found by linear regression.

Longer Time-Scale Criterion

Accuracy of the simulation over longer time-scales is important for control. The following longer time-scale criterion was suggested in [Abbeel & Ng, 2004]: maximize the likelihood of the observations at the time-scale of interest $H$, i.e. $\max_{A,B} \sum_t \log p(s_{t+H} \mid s_t, u_t, \ldots, u_{t+H-1})$. The EM algorithm for this maximization is expensive in our continuous state-action space setting. We present a simple and fast algorithm for (approximately) minimizing the average squared error of the simulation over a certain duration.

Sketch of the Algorithmic Idea (see the paper for the full algorithm)

Model: $s_{t+1} = A s_t + B u_t$.
- One-step prediction at time $t$: $\hat s_{t+1|t} = A s_t + B u_t$.
- One-step prediction at time $t+1$: $\hat s_{t+2|t+1} = A s_{t+1} + B u_{t+1}$.
- Two-step prediction at time $t$: $\hat s_{t+2|t} = A (A s_t + B u_t) + B u_{t+1} = A \hat s_{t+1|t} + B u_{t+1}$.

Therefore we can approximate the multiple-step dynamics by a linear combination of the one-step dynamics. Our algorithm iterates the following two steps:
1. Compute an estimate of $s_{t+1}$ given $s_t$, $u_t$, $u_{t+1}$ for the current model $A, B$.
2. Estimate $A, B$ by linear regression, treating these intermediate state estimates as fixed.
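The poster only sketches this procedure; below is a minimal NumPy illustration of one plausible reading of it for a general horizon H. Everything here (the function names fit_one_step and fit_longer_timescale, the rollout scheme, the refit details) is an assumption for illustration, not the paper's implementation.

```python
import numpy as np

def fit_one_step(S, U):
    """Least-squares fit of s[t+1] ~= A s[t] + B u[t] (the one-step criterion)."""
    ds = S.shape[1]
    X = np.hstack([S[:-1], U[:-1]])            # inputs:  (s_t, u_t)
    Y = S[1:]                                  # targets: s_{t+1}
    W, _, _, _ = np.linalg.lstsq(X, Y, rcond=None)
    return W[:ds].T, W[ds:].T                  # A: (ds, ds), B: (ds, du)

def fit_longer_timescale(S, U, H=10, n_iters=20):
    """Approximately minimize the average squared H-step simulation error.

    Alternates between (1) rolling the current model (A, B) forward to get
    predicted intermediate states, and (2) refitting (A, B) by linear
    regression mapping those *predicted* states (plus the recorded inputs)
    onto the *observed* next states, so that multi-step errors drive the fit.
    """
    A, B = fit_one_step(S, U)                  # initialize from the one-step criterion
    T, ds = S.shape
    for _ in range(n_iters):
        X_rows, Y_rows = [], []
        for t in range(T - H):
            s_hat = S[t]                       # start each rollout from the data
            for h in range(H):
                X_rows.append(np.concatenate([s_hat, U[t + h]]))
                Y_rows.append(S[t + h + 1])    # observed state the rollout should match
                s_hat = A @ s_hat + B @ U[t + h]
        W, _, _, _ = np.linalg.lstsq(np.asarray(X_rows), np.asarray(Y_rows), rcond=None)
        A, B = W[:ds].T, W[ds:].T
    return A, B
```

With a trajectory stacked as S (T x state-dim) and U (T x input-dim), fit_longer_timescale(S, U, H=10) returns the refitted (A, B); setting H = 1 recovers the ordinary one-step regression.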
Simulator Accuracy

(Plots of simulator accuracy for the Bergen Industrial Twin and the XCell Tempest; only the legend and the observations are recoverable here.)

Legend:
- Linear model, one-step prediction error.
- Linear model, frequency domain fit with CIFER.
- Linear model, longer time-scale prediction error.
- Acceleration model, one-step prediction error.
- Acceleration model, longer time-scale prediction error.

Observations:
- The acceleration prediction model is significantly better. Reasons: it captures gravity exactly, and it captures inertia, and thus the side-slip effects in the data.
- The longer time-scale criterion outperforms CIFER, which in turn outperforms the one-step criterion.
- The differences are more significant for the Tempest than for the Bergen, since the Bergen data is mostly around hover.

First Autonomous Funnel

An aerobatic maneuver. Method: model-based reinforcement learning. Simulator: the acceleration prediction model, trained with the longer time-scale criterion. Video available. Acknowledgments: the control work is joint with Adam Coates and Ben Tse (paper forthcoming).

Conclusion

The key technical challenge for model-based reinforcement learning applied to helicopters is building an accurate simulator. Our approach: by using the acceleration-based approach, we can encode all constraints known from physics (gravity, inertia, etc.); we learn only the parts of the model not determined by physics; and we explicitly learn a simulation that is predictive at long time-scales. Result: a significantly improved helicopter model, and the first autonomous funnel (an aerobatic maneuver) using our model.
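As a closing illustration of the acceleration-based approach, here is a minimal sketch of a single simulator step. The only physics encoded explicitly are gravity and the rotating-frame coupling term; simulate_step, f_learned, the Euler integration and every other detail are hypothetical stand-ins, not the paper's implementation.

```python
import numpy as np
from scipy.spatial.transform import Rotation

GRAVITY_WORLD = np.array([0.0, 0.0, -9.81])    # m/s^2; known exactly, never learned

def simulate_step(p, R, v, w, u, f_learned, dt):
    """One Euler step of an acceleration-based helicopter simulator.

    p: (3,) world position          R: (3, 3) body-to-world rotation
    v: (3,) body-frame velocity     w: (3,) body-frame angular rates
    u: (4,) controls u1..u4         f_learned: (v, w, u) -> (dv, dw), learned from data
    """
    # Physics encoded explicitly: gravity rotated into the body frame, and the
    # inertial coupling term -w x v from writing Newton's law in a rotating frame.
    g_body = R.T @ GRAVITY_WORLD
    dv, dw = f_learned(v, w, u)                 # the learned part of the model
    v_next = v + dt * (dv + g_body - np.cross(w, v))
    w_next = w + dt * dw
    # Kinematics recovered by numerical integration; nothing to learn here.
    p_next = p + dt * (R @ v)
    R_next = R @ Rotation.from_rotvec(w * dt).as_matrix()
    return p_next, R_next, v_next, w_next
```

The design point the poster makes is visible in the sketch: gravity and the coordinate-frame bookkeeping are hard-coded, so f_learned only has to model the comparatively simple relation between the controls, the state, and the body-frame forces and torques.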