Reinforcement Learning for Intelligent Control Presented at Chinese Youth Automation Conference National Academy of Science, Beijing, 8/22/05 George G.

Slides:

Advertisements

Similar presentations

Self tuning regulators

Advertisements

Lecture 20 Dimitar Stefanov. Microprocessor control of Powered Wheelchairs Flexible control; speed synchronization of both driving wheels, flexible control.

Perceptron Lecture 4.

Slides from: Doug Gray, David Poole

Neural networks Introduction Fitting neural networks

Optimization of Global Chassis Control Variables Josip Kasać*, Joško Deur*, Branko Novaković*, Matthew Hancock**, Francis Assadian** * University of Zagreb,

Performance oriented anti-windup for a class of neural network controlled systems G. Herrmann M. C. Turner and I. Postlethwaite Control and Instrumentation.

Combining Inductive and Analytical Learning Ch 12. in Machine Learning Tom M. Mitchell 고려대학교 자연어처리 연구실 한 경 수

Genetic fuzzy controllers for uncertain systems Yonggon Lee and Stanislaw H. Żak Supported by National Science Foundation under grant ECS

Bond Graph Simulation of Bicycle Model

Neural Networks Marco Loog.

Controller Tuning: A Motivational Example

February 24, Final Presentation AAE Final Presentation Backstepping Based Flight Control Asif Hossain.

Course AE4-T40 Lecture 5: Control Apllication

CH 1 Introduction Prof. Ming-Shaung Ju Dept. of Mechanical Engineering NCKU.

Traction Control Michael Boersma Michael LaGrand 12/10/03.

D D L ynamic aboratory esign 5-Nov-04Group Meeting Accelerometer Based Handwheel State Estimation For Force Feedback in Steer-By-Wire Vehicles Joshua P.

MODEL REFERENCE ADAPTIVE CONTROL

Motion Control (wheeled robots)

1 CMPUT 412 Motion Control – Wheeled robots Csaba Szepesvári University of Alberta TexPoint fonts used in EMF. Read the TexPoint manual before you delete.

Spinning Out, With Calculus J. Christian Gerdes Associate Professor Mechanical Engineering Department Stanford University.

A Shaft Sensorless Control for PMSM Using Direct Neural Network Adaptive Observer Authors: Guo Qingding Luo Ruifu Wang Limei IEEE IECON 22 nd International.

System & Control Control theory is an interdisciplinary branch of engineering and mathematics, that deals with the behavior of dynamical systems. The desired.

Adaptive Critic Design for Aircraft Control Silvia Ferrari Advisor: Prof. Robert F. Stengel Princeton University FAA/NASA Joint University Program on Air.

Adapting Simulated Behaviors For New Characters Jessica K. Hodgins and Nancy S. Pollard presentation by Barış Aksan.

Book Adaptive control -astrom and witten mark

Computers on Cruise Control Creating Adaptive Systems with Control Theory Ricardo Portillo The University of Texas at El Paso

NW Computational Intelligence Laboratory Implementing DHP in Software: Taking Control of the Pole-Cart System Lars Holmstrom.

Optimal Nonlinear Neural Network Controllers for Aircraft Joint University Program Meeting October 10, 2001 Nilesh V. Kulkarni Advisors Prof. Minh Q. Phan.

1 Adaptive, Optimal and Reconfigurable Nonlinear Control Design for Futuristic Flight Vehicles Radhakant Padhi Assistant Professor Dept. of Aerospace Engineering.

Sérgio Ronaldo Barros dos Santos (ITA-Brazil)

Model Reference Adaptive Control (MRAC). MRAS The Model-Reference Adaptive system (MRAS) was originally proposed to solve a problem in which the performance.

Artificial Intelligence Chapter 3 Neural Networks Artificial Intelligence Chapter 3 Neural Networks Biointelligence Lab School of Computer Sci. & Eng.

Mechanical Engineering Department Automatic Control Dr. Talal Mandourah 1 Lecture 1 Automatic Control Applications: Missile control Behavior control Aircraft.

Non-Bayes classifiers. Linear discriminants, neural networks.

Akram Bitar and Larry Manevitz Department of Computer Science

Real-Time Simultaneous Localization and Mapping with a Single Camera (Mono SLAM) Young Ki Baik Computer Vision Lab. Seoul National University.

NW Computational Intelligence Laboratory 1 Designing A Contextually Discerning Controller By Lars Holmstrom.

A Robust Method for Lane Tracking Using RANSAC James Ian Vaughn Daniel Gicklhorn CS664 Computer Vision Cornell University Spring 2008.

Review: Neural Network Control of Robot Manipulators; Frank L. Lewis; 1996.

Hazırlayan NEURAL NETWORKS Backpropagation Network PROF. DR. YUSUF OYSAL.

Professor : Ming – Shyan Wang Department of Electrical Engineering Southern Taiwan University Thesis progress report Sensorless Operation of PMSM Using.

By : Rohini H M USN : 2VX11LVS19.  This system includes sensors for measuring vehicle speed; steering input; relative displacement of the wheel assembly.

Basilio Bona DAUIN – Politecnico di Torino

Intro. ANN & Fuzzy Systems Lecture 3 Basic Definitions of ANN.

MECHANICAL and AEROSPACE ENGINEERING Active Reconfiguration for Performance Enhancement in Articulated Wheeled Vehicles Aliakbar Alamdari PhD Candidate.

A Presentation on Adaptive Neuro-Fuzzy Inference System using Particle Swarm Optimization and it’s Application By Sumanta Kundu (En.R.No.

A PID Neural Network Controller

Vision Based Automation of Steering using Artificial Neural Network Team Members: Sriganesh R. Prabhu Raj Kumar T. Senthil Prabu K. Raghuraman V. Guide:

Character Animation Forward and Inverse Kinematics

Presentation at NI Day April 2010 Lillestrøm, Norway

Multi-Policy Control of Biped Walking

Date of download: 10/22/2017 Copyright © ASME. All rights reserved.

Date of download: 10/26/2017 Copyright © ASME. All rights reserved.

Date of download: 11/5/2017 Copyright © ASME. All rights reserved.

第 3 章神经网络.

Soft Computing Applied to Finite Element Tasks

Reinforcement Learning for Intelligent Control Part 2 Presented at Chinese Youth Automation Conference National Academy of Science, Beijing, 8/22/05.

Controller Tuning: A Motivational Example

Roads and Bridges Central Laboratory University of Versailles

Artificial Intelligence Chapter 3 Neural Networks

Chapter 1 Introduction.

Digital Control Systems Waseem Gulsher

Vision based automated steering

Artificial Intelligence Chapter 3 Neural Networks

Artificial Intelligence Chapter 3 Neural Networks

Neuro-Computing Lecture 2 Single-Layer Perceptrons

Artificial Intelligence Chapter 3 Neural Networks

Artificial Intelligence Chapter 3 Neural Networks

Presentation transcript:

Reinforcement Learning for Intelligent Control Presented at Chinese Youth Automation Conference National Academy of Science, Beijing, 8/22/05 George G. Lendaris NW Computational Intelligence Laboratory Portland State University, Portland, OR

Basic Control Design Problem ControllerPlant Controller designer needs following: Problem domain specifications Design objectives / Criteria for “success” All available a priori information about Plant and Environment High Level Design Objective: “Intelligent” Control

In this talk, we focus on Adaptive Critic Method (ACM) A methodology for designing an (approximately) optimal controller for a given plant according to a stated criterion, via a learning process. ACM may be implemented using two neural networks (also Fuzzy systems): ---> one in role of controller, and ---> one in role of critic.

In Adaptive Critic method User provides the Design objectives / Criteria for “success ” through a Utility Function, U(t) (local cost). Then, a new utility function is defined (Bellman Eqn.), which is to be minimized: J(t) = Σ ∞ γ k U(t + k) [“cost to go”] k = 0 [We note: J(t) = U(t) + γJ(t + 1)  Bellman Recursion]

Family of Adaptive Critic Methods : The critic approximates either J( t ) or the gradient of J( t ) wrt state vector R(t) [ ∇ J(R)] Two members of this “family”: Heuristic Dynamic Programming (HDP) Critic approximates J( t ) (cf. “Q Learning”) Dual Heuristic Programming (DHP) Critic approximates ∇ J(R(t))≡λ(t) [Today, focus on DHP]

May describe DHP training process via two primary feedback loops:

Simple, 1-Hidden Layer, Feedforward Neural Network

Perform search over Criterion Function (J function) Surface in Weight (State) Space

Via Plant model Via Critic Via Plant model

Key ADP Process Parameters Specification of: state variables size of NN structure, connection pattern, and type of activation functions “gain” of weight update rules discount factor (gamma of Bellman Equation) “lesson plan” for training

Pole-Cart Benchmark Problem

Progress of training process under each of the strategies. [ Pole angles during the first 80 sec. of training. Note how fast Strategies 4a & 4b learn.]

Pole angles during part of test sequence. [Markings below and parallel to axis are plotting artifacts.] Controller Responses to Disturbances

Autonomous Vehicle Example (Two-axle terrestrial vehicle) Objective: Demonstrate design via DHP (“from scratch”) of a steering controller which effects a Change of Lanes (on a straight highway). Controller Task: Specify sequence of steering angles to effect accelerations needed to change orientations of vehicle velocity vector.

Design Scenarios -- general: Maintain approximately constant forward velocity during lane change. Control angular velocity of wheel. Thus, Controller needs 2 outputs: 1.) Steering angle 2.) Wheel rotation velocity

Major unknown for controller: Coefficient of Friction (cof) between tire/road at any point in time. It can change abruptly (based on road conditions). Requires Robustness and/or fast on-line adaptation.

Tire side-force/side-slip-angle as a function of COF

Assume on-board sensors for constructing state information Only 3 accelerometers needed: -- x direction accel. of chassis -- lateral accel. at rear axle -- lateral accel. at front axle Plus load cells at front & rear axles (to give estimate of c.g. location) [Simulation used analytic proxies]

Criteria for various Design Scenarios: 1. Velocity Error reduce to zero. 2. Velocity Rate limit rate of velocity corrections. 3. Y-direction Error reduce to zero. 4. Lateral front axle acceleration re. “comfort” design specification. 5. Friction Sense estimate closeness to knee of cof curves.

Utility Functions for three Design Scenarios: [different combinations of above criteria] 1.U(1,2,3) 2. U(1,2,3,5) 3. U(1,2,3,4,5) All applied to task of designing controller for autonomous 2-axle terrestrial vehicle.

Design Scenario 2. Add Criterion 5 (“friction sense”) in U2. This is intended to allow aggressive lane changes on dry pavement, PLUS make lane changes on icy road conditions as aggressively as the icy road will allow.

Expanded task for Autonomous Vehicle Controller: Additional road condition of encountering an ice patch while doing lane change; also, accommodate side- wind load disturbances.

Generalize Test Put ice patch at point where wheel angle is greatest in dry pavement case, retrained with higher coefficient in Utility function term to give more aggressive controller.

Using DHP Adaptive Critic Methods to Tune a Fuzzy Automobile Steering Controller Pre-Tuning Tracking Error: Post-Tuning Tracking Error (60% improvement)

System configuration during DHP Design of Airplane Controller Augmentation System

Pilot stick-x doublet signal (arbitrary scale in the Figure ), and roll-rate responses of 3 aircraft: LoFLYTE® w/Unaugmented control, LoFLYTE® w/Augmented Control, and LoFLYTE®*. (Note: Responses of latter two essentially coincide.)

Stick-x doublet : pilot’s stick signal vs. augmented signal (the latter is sent to aircraft actuators)

Roll-rate error (for above stick-x signal) between LoFLYTE®* and LoFLYTE® w/Unaugmented Control, and between LoFLYTE®* and LoFLYTE® w/Augmented Control signals

Pitch-rate error (for above stick-x signal) between LoFLYTE®* and LoFLYTE® w/Unaugmented Control, and between LoFLYTE®* and LoFLYTE® w/Augmented Control signals.

Yaw-rate error (for above stick-x signal) between LoFLYTE®* and LoFLYTE® w/Unaugmented Control, and between LoFLYTE®* and LoFLYTE® w/Augmented Control signals.

Augmentation commands for stick-y and pedal that the controller learned to provide to make the induced a) pitch (stick-y) and b) yaw (pedal) responses of LoFLYTE® match those of LoFLYTE®*

Controller Structure Lateral Longitudinal P(t) R(t) R(t+1) Critic Navigational Controller Maneuvering Controller Ř o (t) Ř a (t) δ e (t), δ t (t) δ a (t), δ r (t) 6 Degree of Freedom Simulation Model