NW Computational Intelligence Laboratory Implementing DHP in Software: Taking Control of the Pole-Cart System Lars Holmstrom.

NW Computational Intelligence Laboratory Overview Provides a brief overview of Dual Heuristic Programming (DHP) Describes a software implementation of DHP for designing a non-linear controller for the pole-cart system Follows the methodology outlined in –Lendaris, G.G. & J.S. Neidhoefer, 2004, "Guidance in the Use of Adaptive Critics for Control," Ch. 4 in Handbook of Learning and Approximate Dynamic Programming, Si et al., Eds., IEEE Press & Wiley-Interscience, 2004.

NW Computational Intelligence Laboratory DHP Foundations Reinforcement Learning –A process in which an agent learns behaviors through trial-and-error interactions with its environment, based on "reinforcement" signals acquired over time –As opposed to Supervised Learning, in which an error signal based on the desired outcome of an action is known, reinforcement signals indicate a "better" or "worse" action to take rather than the "best" one

NW Computational Intelligence Laboratory DHP Foundations (continued) Dynamic Programming –Provides a mathematical formalism for finding optimal solutions to control problems within a Markovian decision process –“Cost to Go” Function –Bellman’s Recursion
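The slide names the cost-to-go function and Bellman's recursion without reproducing the equations (they appear to have been images lost in transcription). In the discounted form standard in the adaptive critic literature, with discount factor γ and instantaneous utility U, they read:

```latex
J(t) = \sum_{k=0}^{\infty} \gamma^{k}\, U(t+k)
\qquad\Longrightarrow\qquad
J(t) = U(t) + \gamma\, J(t+1)
```

The recursion on the right is what the critic training in later slides exploits: an estimate of J one step ahead provides a target for J now.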

NW Computational Intelligence Laboratory DHP Foundations (continued) Adaptive Critics –An application of Reinforcement Learning for solving Dynamic Programming problems –The Critic is charged with the task of estimating J for a particular control policy π –The Critic’s knowledge about J, in turn, allows us to improve the control policy π –This process is iterated until the optimal J surface, J *, is found along with the associated optimal control policy π*

NW Computational Intelligence Laboratory DHP Architecture
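The architecture figure itself is not preserved in the transcript. Its defining feature, per the DHP literature this talk follows, is that the critic estimates the gradient of the cost-to-go with respect to the plant state s(t), rather than J itself:

```latex
\lambda(t) \;=\; \frac{\partial J(t)}{\partial s(t)}
```

These derivative estimates are exactly the "lambda" quantities referenced in the training-loop slide below.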

NW Computational Intelligence Laboratory Weight Update Calculation for the Action Network
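The update equations on this slide are not preserved. A hedged sketch consistent with standard DHP: the action network's weights w_a descend the gradient of the cost-to-go with respect to the control u(t), obtained by backpropagating λ(t+1) through the plant model:

```latex
\frac{\partial J(t)}{\partial u(t)}
  = \frac{\partial U(t)}{\partial u(t)}
  + \gamma\,\lambda(t+1)^{\top}\frac{\partial s(t+1)}{\partial u(t)},
\qquad
\Delta w_a = -\eta\,\frac{\partial J(t)}{\partial u(t)}\,\frac{\partial u(t)}{\partial w_a}
```

Here η is a learning rate; the exact form used in the talk's software may differ.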

NW Computational Intelligence Laboratory Calculating the Critic Targets
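The slide's equation is likewise lost; the standard DHP critic target, as given in the cited Lendaris & Neidhoefer chapter, differentiates Bellman's recursion through both the plant and the controller:

```latex
\lambda^{\circ}(t) = \frac{\partial U(t)}{\partial s(t)}
 + \gamma\left[\frac{\partial s(t+1)}{\partial s(t)}
 + \frac{\partial s(t+1)}{\partial u(t)}\,\frac{\partial u(t)}{\partial s(t)}\right]^{\top}\lambda(t+1)
```

The critic is then trained toward λ°(t) in a supervised fashion.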

NW Computational Intelligence Laboratory The Pole-Cart Problem The dynamical system (plant) consists of a cart on a length of track with an inverted pendulum attached to it. The control problem is to balance the inverted pendulum while keeping the cart near the center of the track by applying a horizontal force to the cart. [Pole-cart animation]

NW Computational Intelligence Laboratory Simulating the Plant

NW Computational Intelligence Laboratory Calculating the Instantaneous Derivative
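The transcript preserves only the slide titles here, not the code. A minimal sketch of the plant's instantaneous state derivative, using the standard cart-pole benchmark equations and commonly used constants (both are assumptions; the talk's own model and parameter values are not preserved):

```python
import math

# Standard cart-pole parameters (assumed; the slides do not list them)
GRAVITY = 9.81        # m/s^2
CART_MASS = 1.0       # kg
POLE_MASS = 0.1       # kg
POLE_HALF_LEN = 0.5   # m (half the pole length)

def plant_derivative(state, force):
    """Return d(state)/dt for state = (x, x_dot, theta, theta_dot)."""
    x, x_dot, theta, theta_dot = state
    total_mass = CART_MASS + POLE_MASS
    sin_t, cos_t = math.sin(theta), math.cos(theta)
    # Effective force per unit mass from the input and the pole's rotation
    temp = (force + POLE_MASS * POLE_HALF_LEN * theta_dot**2 * sin_t) / total_mass
    theta_acc = (GRAVITY * sin_t - cos_t * temp) / (
        POLE_HALF_LEN * (4.0 / 3.0 - POLE_MASS * cos_t**2 / total_mass))
    x_acc = temp - POLE_MASS * POLE_HALF_LEN * theta_acc * cos_t / total_mass
    return (x_dot, x_acc, theta_dot, theta_acc)
```

At the upright equilibrium with no force the derivative is zero, and a small positive angle produces a positive angular acceleration (the pole falls), which is a quick sanity check on the signs.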

NW Computational Intelligence Laboratory Iterating One Step In Time
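Advancing one step in time can be sketched as a forward-Euler update (the integrator and step size are assumptions; the slides do not say which method the toolkit uses). The `decay` plant below is a toy one-dimensional example for illustration only:

```python
def euler_step(state, deriv_fn, control, dt=0.02):
    """One forward-Euler step: s(t+1) = s(t) + dt * ds/dt."""
    deriv = deriv_fn(state, control)
    return tuple(s + dt * d for s, d in zip(state, deriv))

# Toy plant for illustration: ds/dt = -s + u
decay = lambda state, u: (-state[0] + u,)
```

Iterating the model over a trajectory is then just repeated application of `euler_step`, feeding each output state back in as the next input.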

NW Computational Intelligence Laboratory Iterating the Model Over a Trajectory

NW Computational Intelligence Laboratory Running the Simulation

NW Computational Intelligence Laboratory Calculating the Model Jacobians –Analytically –Numerical approximation –Backpropagation
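Of the three options listed, the numerical approximation is the easiest to sketch; a central-difference version (illustrative, not the toolkit's actual code):

```python
def numerical_jacobian(f, x, eps=1e-6):
    """Approximate jac[i][j] = df_i/dx_j by central differences."""
    n = len(x)
    m = len(f(x))
    jac = [[0.0] * n for _ in range(m)]
    for j in range(n):
        xp, xm = list(x), list(x)
        xp[j] += eps
        xm[j] -= eps
        fp, fm = f(xp), f(xm)
        for i in range(m):
            jac[i][j] = (fp[i] - fm[i]) / (2 * eps)
    return jac
```

Applied to the plant model, `f` maps (state, control) to the next state, so one call per input block yields the two Jacobians the training loop needs. Analytic Jacobians are cheaper and exact when the plant equations are known; backpropagation applies when the model is itself a neural network.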

NW Computational Intelligence Laboratory Defining a Utility Function The utility function, along with the plant dynamics, defines the optimal control policy For this example, I will choose Note: there is no penalty for effort, horizontal velocity (the cart), or angular velocity (the pole)
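The chosen utility equation was an image lost in transcription. Consistent with the slide's note that only cart position x and pole angle θ are penalized, it has a quadratic form along the lines of

```latex
U(t) = \alpha\, x(t)^{2} + \beta\, \theta(t)^{2}
```

where the weights α and β are placeholders, not values from the talk.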

NW Computational Intelligence Laboratory Setting Up the DHP Training Loop For each training iteration (step in time) –Measure the current state –Calculate the control to apply –Calculate the control Jacobian –Iterate the model –Calculate the model Jacobian –Calculate the utility derivative –Calculate the present lambda –Calculate the future lambda –Calculate the reinforcement signal for the controller –Train the controller –Calculate the desired target for the critic –Train the critic
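The per-iteration steps above can be summarized in pseudocode (all function names are illustrative, not taken from the toolkit, and the matrix shapes are only indicative):

```
for each training iteration t:
    s            = measure_state()              # measure the current state
    u            = action(s)                    # control to apply
    du_ds        = action_jacobian(s)           # control Jacobian  ∂u(t)/∂s(t)
    s_next       = model(s, u)                  # iterate the model
    ds_ds, ds_du = model_jacobians(s, u)        # ∂s(t+1)/∂s(t), ∂s(t+1)/∂u(t)
    dU_ds        = utility_derivative(s)        # ∂U(t)/∂s(t)
    lam          = critic(s)                    # present lambda
    lam_next     = critic(s_next)               # future lambda
    # reinforcement signal for the controller: backpropagate λ(t+1) through the model
    train_action(s, gamma * ds_du.T @ lam_next)
    # desired target for the critic (standard DHP form)
    target = dU_ds + gamma * (ds_ds + ds_du @ du_ds).T @ lam_next
    train_critic(s, target)
```

Each pass thus interleaves one step of controller improvement with one step of critic refinement, as described in the Adaptive Critics slide.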

NW Computational Intelligence Laboratory Defining an Experiment Define the neural network architecture for the action and critic networks Define the constants to be used for the model Set up the lesson plan –Define incremental steps in the learning process Set up a test plan

NW Computational Intelligence Laboratory Defining an Experiment in the DHP Toolkit

NW Computational Intelligence Laboratory Training Step 1 : 2 Degrees

NW Computational Intelligence Laboratory Training Step 2 : -5 Degrees

NW Computational Intelligence Laboratory Training Step 2 : 15 Degrees

NW Computational Intelligence Laboratory Training Step 2 : -30 Degrees

NW Computational Intelligence Laboratory Testing Step 2 : 20 Degrees

NW Computational Intelligence Laboratory Testing Step 2 : 30 Degrees

NW Computational Intelligence Laboratory Software Availability This software is available to anyone who would like to make use of it. We also have software available for performing backpropagation through time (BPTT) experiments. Set up an appointment with me, or come in during my office hours, to get more information about the software.