1
ECE 517: Reinforcement Learning in Artificial Intelligence
Lecture 17: TRTRL, Implementation Considerations, Apprenticeship Learning
November 3, 2010
Dr. Itamar Arel
College of Engineering, Department of Electrical Engineering and Computer Science
The University of Tennessee, Fall 2010
2
Outline
Recap on RNNs
Implementation and usage issues with RTRL
Computational complexity and resources required
Vanishing gradient problem
Apprenticeship learning
3
Recap on RNNs
RNNs are potentially much more powerful than FFNNs: they can capture temporal dependencies, embed a complex state representation (i.e. memory), and serve as models of discrete-time dynamic systems. They are, however, (very) complex to train: TDNNs offer limited performance bounded by the input window, while RTRL calculates a dynamic gradient on-line.
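For reference (the notation here is mine, since the slides' own equations are not reproduced in this transcript), the discrete-time dynamics of a fully connected recurrent network of N neurons can be written as

y_k(t+1) = f\big(\mathrm{net}_k(t)\big), \qquad \mathrm{net}_k(t) = \sum_{l=1}^{N} w_{kl}\, y_l(t) + \sum_{m} w_{km}\, x_m(t),

where y_k is the activation of neuron k, x_m are the external inputs, and f is the activation function. This is the setting assumed for RTRL below.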
4
RTRL reviewed
RTRL is a gradient-descent-based method. It relies on sensitivities expressing the impact of any weight w_ij on the activation of neuron k; the algorithm then consists of computing the corresponding weight changes. Let's look at the resources involved …
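In the standard Williams-Zipser formulation (a reconstruction, since the slide's own equations are not shown here; the exact notation may differ), the sensitivity and weight-update equations read

p^k_{ij}(t+1) = f'\big(\mathrm{net}_k(t)\big)\Big[\sum_{l} w_{kl}\, p^l_{ij}(t) + \delta_{ik}\, z_j(t)\Big], \qquad p^k_{ij}(0) = 0,

\Delta w_{ij}(t) = \alpha \sum_{k} e_k(t)\, p^k_{ij}(t),

where p^k_{ij}(t) = \partial y_k(t) / \partial w_{ij} is the sensitivity of neuron k to weight w_{ij}, z_j(t) is the activation at the source end of that weight, e_k(t) is the error at output neuron k (zero for non-output neurons), and \alpha is the learning rate.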
5
Implementing RTRL – computations involved
The key component in RTRL is the sensitivity matrix, which must be calculated for each neuron. RTRL, however, is NOT local … can the calculations be efficiently distributed? Storing the sensitivities requires on the order of N^3 values, and updating them requires on the order of N^4 operations per time step.
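To make that cost concrete, here is a deliberately naive NumPy sketch of a single RTRL step (illustrative only; the function name, argument layout, and the simplification of ignoring external inputs are mine, not the lecture's):

import numpy as np

def rtrl_step(W, y, p, f_prime_net, err, lr=0.01):
    # W           : (N, N) recurrent weight matrix
    # y           : (N,) neuron activations at time t
    # p           : (N, N, N) sensitivities, p[k, i, j] = dy_k / dw_ij  -> O(N^3) storage
    # f_prime_net : (N,) activation-function derivative at each neuron's net input
    # err         : (N,) error signal per neuron (zero for non-output neurons)
    N = W.shape[0]
    p_new = np.zeros_like(p)
    # Naive quadruple loop: on the order of N^4 multiply-accumulates per time step.
    for k in range(N):
        for i in range(N):
            for j in range(N):
                acc = sum(W[k, l] * p[l, i, j] for l in range(N))
                if k == i:
                    acc += y[j]  # Kronecker-delta term (external inputs ignored here)
                p_new[k, i, j] = f_prime_net[k] * acc
    # Weight update: dw_ij = lr * sum_k err_k * p_new[k, i, j]
    dW = lr * np.einsum('k,kij->ij', err, p_new)
    return W + dW, p_new

Every weight needs a sensitivity toward every neuron, which is exactly why the storage scales as N^3 and the computation as N^4.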
6
Implementing RTRL – storage requirements
Let's assume a fully-connected network of N neurons. Memory resources: the weight matrix w_ij requires N^2 values, the activations y_k require N, and the sensitivity matrix requires N^3, so the total memory requirement is O(N^3). Let's go over an example: assume we have 1000 neurons in the system and each value requires 20 bits to represent; that comes to ~20 Gb of storage!
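Spelling out the arithmetic behind that figure (the 20-bit word size is the slide's assumption):

N^3 = 1000^3 = 10^9 \text{ sensitivity values}, \qquad 10^9 \times 20 \text{ bits} = 2 \times 10^{10} \text{ bits} = 20\ \text{Gb} \approx 2.5\ \text{GB}.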
7
Possible solutions – static subgrouping
Zipser et al. (1989) suggested a static grouping of neurons, relaxing the "fully-connected" requirement. This has backing in neuroscience: the average "branching factor" in the brain is ~1000. The complexity is reduced by simply leaving out elements of the sensitivity matrix based upon a subgrouping of neurons: neurons are subgrouped arbitrarily, sensitivities between groups are ignored, and all connections still exist in the forward path. If g is the number of subgroups, then storage is O(N^3/g^2), the computational speedup is g^3, and each node communicates with N/g nodes.
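To put numbers on the savings (the particular values of N and g here are mine, not the slides'): with N = 1000 neurons and g = 10 subgroups,

\frac{N^3}{g^2} = \frac{10^9}{10^2} = 10^7 \text{ stored sensitivities (down from } 10^9\text{)}, \qquad \text{speedup} = g^3 = 10^3.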
8
Possible solutions – static subgrouping (cont.)
Zipser's empirical tests indicate that these networks can solve many of the problems full RTRL solves. One caveat of the subgrouped RTRL training is that each subnet must have at least one unit for which a target exists (since gradient information is not exchanged between groups). Others have proposed dynamic subgrouping, i.e. subgrouping based on maximal gradient information, which is not realistic for hardware realization. Open research question: how to calculate the gradient without the O(N^3) storage requirement?
9
Truncated Real Time Recurrent Learning (TRTRL)
Motivation: to obtain a scalable version of the RTRL algorithm while minimizing performance degradation. How? Limit the sensitivities of each neuron to its ingress (incoming) and egress (outgoing) links.
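A minimal sketch of what this truncation means for storage (the array names and layout are mine, purely illustrative):

import numpy as np

N = 100
# Full RTRL keeps dy_k/dw_ij for every triple (k, i, j): an N x N x N tensor.
p_full = np.zeros((N, N, N))   # O(N^3) values

# TRTRL keeps, for each weight w_ij, only the sensitivities of the two neurons
# that weight touches: its destination (ingress) neuron and its source (egress) neuron.
p_ingress = np.zeros((N, N))   # sensitivity of the destination neuron to w_ij
p_egress = np.zeros((N, N))    # sensitivity of the source neuron to w_ij
# -> 2 * N^2 values instead of N^3.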
10
Performing Sensitivity Calculations in TRTRL
For all nodes that are not in the output set, the egress sensitivity values for node i are calculated by imposing k = j in the original RTRL sensitivity equation; the ingress sensitivity values for node j are obtained analogously. For output neurons, a nonzero sensitivity element must exist in order to update the weights.
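One consistent way to write these truncated recursions (my reconstruction: take the full RTRL recursion and set every sensitivity other than the ingress term p^i_{ij} and the egress term p^j_{ij} to zero; the slides' indexing convention may differ) is, for i \neq j,

p^j_{ij}(t+1) = f'\big(\mathrm{net}_j(t)\big)\big[\, w_{ji}\, p^i_{ij}(t) + w_{jj}\, p^j_{ij}(t) \,\big] \quad \text{(egress)},

p^i_{ij}(t+1) = f'\big(\mathrm{net}_i(t)\big)\big[\, w_{ii}\, p^i_{ij}(t) + w_{ij}\, p^j_{ij}(t) + z_j(t) \,\big] \quad \text{(ingress)},

with z_j(t) the activation at the source end of w_{ij}.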
11
Resource Requirements of TRTRL
The network structure remains the same with TRTRL; only the calculation of sensitivities is reduced. This yields a significant reduction in resource requirements: the computational load for each neuron drops from O(N^3) to O(2KN), where K denotes the number of output neurons, so the total computational complexity is now O(2KN^2). Storage requirements drop from O(N^3) to O(N^2). Example revisited: for N = 100 and 10 outputs, roughly 100k multiplications and only ~20 kB of storage!
12
Further TRTRL Improvements – Clustering of Neurons
TRTRL introduced localization and a memory improvement; clustered TRTRL adds scalability by reducing the number of long connection lines between processing elements. [Figure: a clustered TRTRL topology, with the input and output nodes marked.]
13
Test case #1: Frequency Doubler
Input: sin(x); target output: sin(2x). Both networks had 12 neurons.
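A minimal sketch of how the training sequence for this task might be generated (the sampling step and sequence length are assumptions of mine; the lecture does not specify them):

import numpy as np

# Frequency-doubling task: the network is fed sin(x_t) one sample at a time
# and must output sin(2 * x_t), which requires tracking phase internally.
x = np.arange(0.0, 20.0 * np.pi, 0.1)   # time axis (step size is an assumption)
inputs = np.sin(x)                       # input sequence: sin(x)
targets = np.sin(2.0 * x)                # target sequence: sin(2x)
# Each (inputs[t], targets[t]) pair is presented to the network sequentially.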
14
Vanishing Gradient Problem
Recap on goals: find temporal dependencies in data with an RNN. The idea behind RTRL: when an error value is found, apply it to inputs seen an indefinite number of epochs ago. In 1994, Bengio et al. showed that both BPTT and RTRL suffer from the problem of vanishing gradient information: when using gradient-based training rules, the "error signal" that is applied to previous inputs tends to vanish. Because of this, long-term dependencies in the data are often overlooked. Short-term memory is fine; long-term memory (>10 epochs) is lost.
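The mechanism can be summarized in one line (a standard way of writing it, not taken from the slides): the gradient of the loss at time t with respect to the state s_k at an earlier time k is a product of Jacobians,

\frac{\partial L_t}{\partial s_k} = \frac{\partial L_t}{\partial s_t} \prod_{i=k+1}^{t} \frac{\partial s_i}{\partial s_{i-1}},

and when these factors have norm below 1 the product shrinks exponentially with t - k, so the error signal applied to inputs far in the past effectively vanishes.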
15
Vanishing Gradient Problem (cont.)
[Figure: an RNN unrolled in time, with inputs x_t, hidden states s_t, and outputs y_t.] A learning error yields gradients on the outputs, and therefore on the state variables s_t. Since the weights (parameters) are shared across time, the gradient reaching earlier states is formed by repeatedly multiplying the same factors, which is what makes it vanish or explode.
16
What is Apprenticeship Learning?
Many times we want to train an agent based on a reference controller, e.g. for riding a bicycle or flying a plane. Starting from scratch may take a very long time, particularly for large state/action spaces, and may cost a lot (e.g. a helicopter crashing). Process: train the agent on the reference controller, evaluate the trained agent, then improve the trained agent. Note: the reference controller can be anything (e.g. a heuristic controller for the Car Race problem).
17
Formalizing Apprenticeship Learning
Let's assume we have a reference policy π from which we want our agent to learn. We would first like to learn the (approximate) value function V^π. Once we have V^π, we can try to improve it based on the policy improvement theorem: by following the original policy greedily we obtain a better policy! In practice, many issues should be considered, such as state-space coverage and exploration/exploitation: train with zero exploration, then explore gradually …
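One standard way to write the improvement step (a reconstruction following Sutton and Barto's policy improvement theorem, not necessarily the slide's exact form) is to act greedily with respect to V^π:

\pi'(s) = \arg\max_{a} \sum_{s'} P(s' \mid s, a)\,\big[ R(s, a, s') + \gamma V^{\pi}(s') \big],

which guarantees V^{\pi'}(s) \geq V^{\pi}(s) for all states s.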