Our acceleration prediction model

- Predict accelerations with a function f of state and inputs; f is learned from data.
- Obtain velocity, angular rates, position and orientation from numerical integration.

Advantages

- No need to learn inertia from data.
- Constraints from physics are incorporated explicitly.
- The relation between state, inputs and accelerations is not cluttered by the change of coordinate frame, and is thus easier to learn from data.

Standard learning criteria

- Frequency domain fitting: requires a linear model; used in CIFER (industry standard).
- Minimize one-step prediction error: for f linear in state s and inputs u, f can be found by linear regression.

Longer time-scale criterion

- Accuracy of simulation over longer time-scales is important for control.
- A longer time-scale criterion, with H the time-scale of interest, was suggested in [Abbeel & Ng, 2004].
- The EM algorithm for maximizing that criterion is expensive in our continuous state-action space setting.
- We present a simple and fast algorithm for (approximately) minimizing the average squared error over a certain duration.

Sketch of algorithmic idea (see paper for full algorithm)

- For a model with parameters A, B, compare the one-step prediction at time t, the one-step prediction at time t+1, and the two-step prediction at time t.
- Therefore, multiple-step dynamics can be approximated by a linear combination of one-step dynamics.
- Our algorithm iterates the following two steps:
  1. Compute an estimate of s_{t+1} given s_t, u_t, u_{t+1} for the current model A, B.
  2. Estimate a new model A, B.

Models in Prior Work

- Predict velocities and angular rates with a function f learned from data.
- Obtain position and orientation from numerical integration.

Shortcomings

- From physics, the body coordinate frame is different at every time step. This makes inertia highly non-linear in the state and very difficult to capture or learn from data.
- For most physical systems, forces and torques have a fairly simple relation to inputs and current state. This simplicity is lost by the change of coordinate frame.
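The acceleration prediction model described above can be illustrated with a minimal sketch of one integration step. This is not the authors' implementation. It assumes a world frame whose z-axis points up, a body-to-world rotation matrix R as the orientation representation, simple Euler integration, and a hypothetical learned function `accel_fn` that returns linear and angular accelerations in body coordinates with gravity excluded. All function and variable names are illustrative.

```python
import numpy as np

GRAVITY_WORLD = np.array([0.0, 0.0, -9.81])  # assumption: world z-axis points up


def skew(w):
    """Cross-product matrix: skew(w) @ x equals np.cross(w, x)."""
    return np.array([[0.0, -w[2], w[1]],
                     [w[2], 0.0, -w[0]],
                     [-w[1], w[0], 0.0]])


def rotation_from_rates(w, dt):
    """Rotation between consecutive body frames (Rodrigues' formula)."""
    rate = np.linalg.norm(w)
    angle = rate * dt
    if angle < 1e-12:
        return np.eye(3)
    K = skew(w / rate)
    return np.eye(3) + np.sin(angle) * K + (1.0 - np.cos(angle)) * (K @ K)


def step(R, p, v_body, w_body, accel_fn, u, dt):
    """One Euler integration step of the acceleration-based simulator sketch.

    R: body-to-world rotation, p: world position,
    v_body, w_body: velocity and angular rates in body coordinates,
    accel_fn: hypothetical learned model (gravity excluded), u: inputs.
    """
    a_body, alpha_body = accel_fn(v_body, w_body, u)
    # Known physics is applied explicitly: gravity is added in the world frame.
    v_world = R @ v_body
    v_world_next = v_world + dt * (R @ a_body + GRAVITY_WORLD)
    p_next = p + dt * v_world
    w_next = w_body + dt * alpha_body
    # The body frame rotates between time steps, so the new velocity is
    # re-expressed in the new body frame instead of being learned from data.
    R_next = R @ rotation_from_rates(w_body, dt)
    v_body_next = R_next.T @ v_world_next
    return R_next, p_next, v_body_next, w_next
```

With a learned model that returns zero accelerations, the world-frame velocity changes only through gravity, while the body-frame velocity still changes because the frame itself has rotated. That is the inertia effect a velocity-prediction model would have to learn from data.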
[Equation annotations: rotation between body coordinate frames at times t and t+1; accelerations]

First Autonomous Funnel

- Aerobatic maneuver.
- Method: model-based reinforcement learning.
- Simulator: acceleration prediction, longer time-scale criterion.
- Acknowledgments: control is joint work with Adam Coates and Ben Tse. (Paper forthcoming.)
- Video available.

Overview

- Model-based reinforcement learning has been very successful.
- State of the art: reinforcement learning returns policies that fly well in simulation. Remaining helicopter failures are typically caused by inaccurate simulation.
- Key technical challenge: building an accurate simulator.

Our approach

- Encode all constraints known from physics (gravity, inertia, etc.).
- Learn only the parts of the model not determined by physics.
- Explicitly learn a simulation that is predictive at long time-scales.

Result

- Significantly improved helicopter model.
- First autonomous funnel (aerobatic maneuver) using our model.

Learning Vehicular Dynamics, with Application to Modeling Helicopters
Pieter Abbeel, Varun Ganapathi, Andrew Y. Ng
Stanford University

RC Helicopters

Helicopter State and Inputs

- 12-D state: position; orientation (roll, pitch, yaw); velocity; angular rates.
- Encode symmetries using body (= robot-centric) coordinates, with the body coordinate frame attached to the helicopter: 8-D state.
- u_1, u_2: the longitudinal (front-back) and latitudinal (left-right) cyclic pitch controls cause the helicopter to pitch forward/backward or roll sideways.
- u_3: the tail rotor collective pitch control affects tail rotor thrust, and can be used to yaw (turn) the helicopter.
- u_4: the main rotor collective pitch control affects the main rotor's thrust.

Conclusion

- Key technical challenge for model-based reinforcement learning applied to helicopters: building an accurate simulator.

Our approach

- By using an acceleration-based approach, we can encode all constraints known from physics (gravity, inertia, etc.).
- Learn only the parts of the model not determined by physics.
- Explicitly learn a simulation that is predictive at long time-scales.

Result

- Significantly improved helicopter model.
- First autonomous funnel (aerobatic maneuver) using our model.

[Figure: RC helicopters: Bergen Industrial Twin, XCell Tempest]

Simulator Accuracy

[Figure: simulator accuracy plots for the Bergen Industrial Twin and the XCell Tempest]

Legend

- Linear model, one-step prediction error.
- Linear model, frequency domain fit with CIFER.
- Linear model, longer time scale prediction error.
- Acceleration model, one-step prediction error.
- Acceleration model, longer time scale prediction error.

Observations

- The acceleration prediction model is significantly better. Reasons: it captures gravity exactly, and it captures inertia, and thus the side-slip effects in the data.
- The longer time scale criterion outperforms CIFER, which in turn outperforms the one-step criterion.
- Differences are more significant for the Tempest than for the Bergen, since the Bergen data is mostly around hover.
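The two linear-model criteria compared in the legend can be made concrete with a small sketch. It assumes a linear model of the form s_{t+1} = A s_t + B u_t, which is inferred from the poster's A, B notation and not stated explicitly. It fits A and B by one-step linear regression, then measures the average squared error of H-step open-loop simulation. It only evaluates the longer time scale error; it is not the authors' algorithm for minimizing it. The function names are hypothetical.

```python
import numpy as np


def fit_one_step(states, inputs):
    """Least-squares fit of s[t+1] ~ A s[t] + B u[t] (one-step criterion).

    states: array of shape (T, n); inputs: array of shape (T, m).
    """
    X = np.hstack([states[:-1], inputs[:-1]])      # regressors [s_t, u_t]
    Y = states[1:]                                 # targets s_{t+1}
    theta, *_ = np.linalg.lstsq(X, Y, rcond=None)  # solves X @ theta ~ Y
    n = states.shape[1]
    return theta[:n].T, theta[n:].T                # A (n x n), B (n x m)


def simulate(A, B, s0, inputs):
    """Multi-step prediction built by chaining one-step predictions."""
    s, out = s0, []
    for u in inputs:
        s = A @ s + B @ u
        out.append(s)
    return np.array(out)


def horizon_error(A, B, states, inputs, H):
    """Average squared error of H-step open-loop simulation."""
    errs = []
    for t in range(len(states) - H):
        pred = simulate(A, B, states[t], inputs[t:t + H])
        truth = states[t + 1:t + H + 1]
        errs.append(np.mean(np.sum((pred - truth) ** 2, axis=1)))
    return float(np.mean(errs))
```

Calling `horizon_error` with H = 1 gives the one-step prediction error, while a larger H scores the same model on simulation over the time-scale of interest. A two-step prediction from `simulate` is A (A s_t + B u_t) + B u_{t+1}, a composition of one-step dynamics.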