NW Computational Intelligence Laboratory Implementing DHP in Software: Taking Control of the Pole-Cart System Lars Holmstrom.


1 NW Computational Intelligence Laboratory Implementing DHP in Software: Taking Control of the Pole-Cart System Lars Holmstrom

2 Overview
Provides a brief overview of Dual Heuristic Programming (DHP)
Describes a software implementation of DHP for designing a non-linear controller for the pole-cart system
Follows the methodology outlined in
–Lendaris, G.G. & J.S. Neidhoefer, 2004, "Guidance in the Use of Adaptive Critics for Control", Ch. 4 in "Handbook of Learning and Approximate Dynamic Programming", Si, et al., Eds., IEEE Press & Wiley Interscience, pp. 97-124.

3 DHP Foundations
Reinforcement Learning
–A process in which an agent learns behaviors through trial-and-error interactions with its environment, based on “reinforcement” signals acquired over time
–As opposed to Supervised Learning, in which an error signal based on the desired outcome of an action is known, reinforcement signals provide information about a “better” or “worse” action to take rather than the “best” one

4 DHP Foundations (continued)
Dynamic Programming
–Provides a mathematical formalism for finding optimal solutions to control problems within a Markovian decision process
–“Cost to Go” Function
–Bellman’s Recursion
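The formulas for the last two bullets are not preserved in this transcript. As a reminder, their standard discrete-time forms are sketched below. The symbols are assumptions rather than the slide's own notation: U(t) is the utility (local cost) at time t, γ is a discount factor, u(t) is the control, and J is treated as a cost to be minimized.

```latex
% "Cost to Go" function: discounted sum of future utilities
J(t) = \sum_{k=0}^{\infty} \gamma^{k}\, U(t+k), \qquad 0 < \gamma \le 1

% Bellman's recursion for a fixed control policy
J(t) = U(t) + \gamma\, J(t+1)

% Bellman's optimality equation (U treated as a cost)
J^{*}(t) = \min_{u(t)} \left[ U(t) + \gamma\, J^{*}(t+1) \right]
```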

5 DHP Foundations (continued)
Adaptive Critics
–An application of Reinforcement Learning for solving Dynamic Programming problems
–The Critic is charged with the task of estimating J for a particular control policy π
–The Critic’s knowledge about J, in turn, allows us to improve the control policy π
–This process is iterated until the optimal J surface, J*, is found along with the associated optimal control policy π*

6 DHP Architecture

7 Weight Update Calculation for the Action Network

8 Calculating the Critic Targets
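The diagrams and equations for slides 6 to 8 are not preserved in this transcript. The block below sketches the usual DHP relations for the action-network update and the critic targets. The notation is an assumption: R(t) is the state vector, u(t) the control vector, λ(t) = ∂J(t)/∂R(t) the critic output, w_a the action-network weights, η a learning rate, and γ the discount factor.

```latex
% Reinforcement signal for the action network
\frac{\partial J(t)}{\partial u_j(t)}
  = \frac{\partial U(t)}{\partial u_j(t)}
  + \gamma \sum_{k} \lambda_k(t+1)\, \frac{\partial R_k(t+1)}{\partial u_j(t)}

% Gradient-descent weight update for the action network
\Delta w_a = -\eta \sum_{j} \frac{\partial J(t)}{\partial u_j(t)}\,
             \frac{\partial u_j(t)}{\partial w_a}

% Desired target for the critic output \lambda_i(t)
\lambda_i^{\circ}(t)
  = \frac{\partial U(t)}{\partial R_i(t)}
  + \sum_{j} \frac{\partial U(t)}{\partial u_j(t)}\, \frac{\partial u_j(t)}{\partial R_i(t)}
  + \gamma \sum_{k} \lambda_k(t+1)
    \left[ \frac{\partial R_k(t+1)}{\partial R_i(t)}
         + \sum_{j} \frac{\partial R_k(t+1)}{\partial u_j(t)}\,
                    \frac{\partial u_j(t)}{\partial R_i(t)} \right]
```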

9 The Pole Cart Problem
The dynamical system (plant) consists of a cart on a length of track with an inverted pendulum attached to it.
The control problem is to balance the inverted pendulum while keeping the cart near the center of the track by applying a horizontal force to the cart.
Pole Cart Animation

10 Simulating the Plant

11 Calculating the Instantaneous Derivative

12 Iterating One Step In Time

13 Iterating the Model Over a Trajectory

14 Running the Simulation
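The code shown on slides 10 to 14 is not preserved in this transcript. The following is a minimal Python sketch of the same four stages (instantaneous derivative, one time step, a trajectory, running it), not the toolkit's actual code. It assumes the frictionless pole-cart equations of the classic Barto, Sutton and Anderson benchmark, Euler integration, and illustrative constants; all function and constant names are hypothetical.

```python
import math

# Assumed plant constants (classic benchmark values, not taken from the slides)
GRAVITY = 9.8           # m/s^2
CART_MASS = 1.0         # kg
POLE_MASS = 0.1         # kg
POLE_HALF_LENGTH = 0.5  # m
DT = 0.02               # s, integration step


def derivative(state, force):
    """Instantaneous derivative of state = [x, x_dot, theta, theta_dot].

    theta is measured from the upright position, in radians.
    """
    _, x_dot, theta, theta_dot = state
    total_mass = CART_MASS + POLE_MASS
    sin_t, cos_t = math.sin(theta), math.cos(theta)
    temp = (force + POLE_MASS * POLE_HALF_LENGTH * theta_dot ** 2 * sin_t) / total_mass
    theta_acc = (GRAVITY * sin_t - cos_t * temp) / (
        POLE_HALF_LENGTH * (4.0 / 3.0 - POLE_MASS * cos_t ** 2 / total_mass))
    x_acc = temp - POLE_MASS * POLE_HALF_LENGTH * theta_acc * cos_t / total_mass
    return [x_dot, x_acc, theta_dot, theta_acc]


def step(state, force, dt=DT):
    """Advance the plant one step in time with forward Euler integration."""
    rates = derivative(state, force)
    return [s + dt * r for s, r in zip(state, rates)]


def simulate(state, policy, n_steps, dt=DT):
    """Iterate the model over a trajectory; policy maps a state to a force."""
    trajectory = [list(state)]
    for _ in range(n_steps):
        state = step(state, policy(state), dt)
        trajectory.append(state)
    return trajectory


if __name__ == "__main__":
    # Run the simulation: pole starts 2 degrees off vertical, no control applied.
    start = [0.0, 0.0, math.radians(2.0), 0.0]
    path = simulate(start, lambda s: 0.0, n_steps=50)
    print("final pole angle (deg):", math.degrees(path[-1][2]))
```

Forward Euler keeps the sketch short; a higher-order integrator such as Runge-Kutta could be substituted in `step` without changing the other functions.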

15 Calculating the Model Jacobians
Analytically
Numerical approximation
Backpropagation
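Of the three options, numerical approximation is the easiest to illustrate. The sketch below uses central differences on a generic vector function; the function names and the step size `eps` are illustrative assumptions, not part of the original software. For the plant, `f` would be the one-step model with the state and control stacked into a single input vector.

```python
import math


def numerical_jacobian(f, x, eps=1e-6):
    """Approximate J[i][j] = d f_i / d x_j at x with central differences."""
    n_out = len(f(x))
    jac = [[0.0] * len(x) for _ in range(n_out)]
    for j in range(len(x)):
        x_plus, x_minus = list(x), list(x)
        x_plus[j] += eps
        x_minus[j] -= eps
        f_plus, f_minus = f(x_plus), f(x_minus)
        for i in range(n_out):
            jac[i][j] = (f_plus[i] - f_minus[i]) / (2.0 * eps)
    return jac


def example(v):
    """Toy function with a known Jacobian: [[v1, v0], [cos(v0), 0]]."""
    return [v[0] * v[1], math.sin(v[0])]


if __name__ == "__main__":
    print(numerical_jacobian(example, [1.0, 2.0]))
```

Central differences cost two model evaluations per input dimension, which is acceptable for a four-state plant; an analytic Jacobian or backpropagation through a neural model avoids that cost.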

16 Defining a Utility Function
The utility function, along with the plant dynamics, defines the optimal control policy
For this example, I will choose (formula not preserved in this transcript)
Note: there is no penalty for effort, horizontal velocity (the cart), or angular velocity (the pole)
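The utility formula chosen on this slide did not survive extraction. As a hypothetical stand-in that is consistent with the note (a penalty on cart position and pole angle only), the sketch below uses an unweighted quadratic cost, together with the derivatives that DHP needs. The exact form and the function names are assumptions, not the author's choice.

```python
def utility(state, force):
    """Assumed utility: U = x^2 + theta^2 (no effort or velocity terms)."""
    x, _, theta, _ = state
    return x ** 2 + theta ** 2


def utility_derivatives(state, force):
    """Return (dU/dR, dU/du) for state R = [x, x_dot, theta, theta_dot]."""
    x, _, theta, _ = state
    dU_dR = [2.0 * x, 0.0, 2.0 * theta, 0.0]
    dU_du = 0.0  # no penalty for control effort under this assumption
    return dU_dR, dU_du
```

Because DHP trains the critic on derivatives of J rather than on J itself, only `utility_derivatives` is needed inside the training loop; `utility` is useful for evaluating test runs.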

17 Setting Up the DHP Training Loop
For each training iteration (step in time)
–Measure the current state
–Calculate the control to apply
–Calculate the control Jacobian
–Iterate the model
–Calculate the model Jacobian
–Calculate the utility derivative
–Calculate the present lambda
–Calculate the future lambda
–Calculate the reinforcement signal for the controller
–Train the controller
–Calculate the desired target for the critic
–Train the critic
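The steps above can be sketched in Python as follows. This is a simplified illustration under several assumptions that are not in the slides: the controller and critic are linear maps instead of neural networks, the utility is U = x² + θ², the plant is the frictionless benchmark pole-cart with Euler integration, the model Jacobians are estimated by central differences, and all constants and names are hypothetical. It shows the order of the computations and makes no claim about convergence.

```python
import math

G, M_CART, M_POLE, HALF_LEN, DT = 9.8, 1.0, 0.1, 0.5, 0.02  # assumed plant constants
GAMMA, LR_ACTION, LR_CRITIC = 0.9, 0.01, 0.01                # assumed training constants


def model(state, force):
    """One Euler step of the pole-cart; state = [x, x_dot, theta, theta_dot]."""
    x, x_dot, th, th_dot = state
    total = M_CART + M_POLE
    temp = (force + M_POLE * HALF_LEN * th_dot ** 2 * math.sin(th)) / total
    th_acc = (G * math.sin(th) - math.cos(th) * temp) / (
        HALF_LEN * (4.0 / 3.0 - M_POLE * math.cos(th) ** 2 / total))
    x_acc = temp - M_POLE * HALF_LEN * th_acc * math.cos(th) / total
    return [x + DT * x_dot, x_dot + DT * x_acc, th + DT * th_dot, th_dot + DT * th_acc]


def model_jacobians(state, force, eps=1e-6):
    """Central-difference estimates of A = dR(t+1)/dR(t) and B = dR(t+1)/du(t)."""
    n = len(state)
    A = [[0.0] * n for _ in range(n)]
    for j in range(n):
        hi, lo = list(state), list(state)
        hi[j] += eps
        lo[j] -= eps
        f_hi, f_lo = model(hi, force), model(lo, force)
        for i in range(n):
            A[i][j] = (f_hi[i] - f_lo[i]) / (2 * eps)
    f_hi, f_lo = model(state, force + eps), model(state, force - eps)
    B = [(a - b) / (2 * eps) for a, b in zip(f_hi, f_lo)]
    return A, B


def train_dhp(n_iterations, start_angle_deg=2.0):
    n = 4
    w_action = [0.0] * n                      # linear controller: u = w_action . R
    w_critic = [[0.0] * n for _ in range(n)]  # linear critic: lambda = w_critic R
    start = [0.0, 0.0, math.radians(start_angle_deg), 0.0]
    state = list(start)
    for _ in range(n_iterations):
        R = state                                            # measure the current state
        u = sum(w * r for w, r in zip(w_action, R))          # control to apply
        du_dR = list(w_action)                               # control Jacobian
        R_next = model(R, u)                                 # iterate the model
        A, B = model_jacobians(R, u)                         # model Jacobians
        dU_dR, dU_du = [2 * R[0], 0.0, 2 * R[2], 0.0], 0.0   # utility derivatives
        lam = [sum(w * r for w, r in zip(row, R)) for row in w_critic]       # present lambda
        lam_next = [sum(w * r for w, r in zip(row, R_next)) for row in w_critic]  # future lambda
        # Reinforcement signal for the controller: dJ/du
        dJ_du = dU_du + GAMMA * sum(l * b for l, b in zip(lam_next, B))
        # Train the controller (du/dw_action = R for a linear controller)
        w_action = [w - LR_ACTION * dJ_du * r for w, r in zip(w_action, R)]
        # Desired target for the critic
        target = [dU_dR[i] + dU_du * du_dR[i] + GAMMA * sum(
            lam_next[k] * (A[k][i] + B[k] * du_dR[i]) for k in range(n)) for i in range(n)]
        # Train the critic toward the target
        for i in range(n):
            err = lam[i] - target[i]
            w_critic[i] = [w - LR_CRITIC * err * r for w, r in zip(w_critic[i], R)]
        state = R_next
        if abs(state[2]) > math.radians(45) or abs(state[0]) > 2.4:
            state = list(start)                              # restart after a failure
    return w_action, w_critic


if __name__ == "__main__":
    action_weights, critic_weights = train_dhp(200)
    print("controller weights:", action_weights)
```

The restart thresholds (45 degrees, 2.4 m) are arbitrary choices for the sketch. In practice the starting angle would be varied according to a lesson plan rather than held fixed.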

18 Defining an Experiment
Define the neural network architecture for action and critic networks
Define the constants to be used for the model
Set up the lesson plan
–Define incremental steps in the learning process
Set up a test plan

19 Defining an Experiment in the DHP Toolkit

20 Training Step 1 : 2 Degrees

21 Training Step 2 : -5 Degrees

22 Training Step 2 : 15 Degrees

23 Training Step 2 : -30 Degrees

24 Testing Step 2 : 20 Degrees

25 Testing Step 2 : 30 Degrees

26 Software Availability
This software is available to anyone who would like to make use of it
We also have software available for performing backpropagation through time (BPTT) experiments
Set up an appointment with me or come in during my office hours to get more information about the software

