
1  ECE 517: Reinforcement Learning in Artificial Intelligence
Lecture 17: TRTRL, Implementation Considerations, Apprenticeship Learning
Dr. Itamar Arel
College of Engineering, Department of Electrical Engineering and Computer Science, The University of Tennessee
Fall 2010 (November 3, 2010)

2  Outline
Recap on RNNs
Implementation and usage issues with RTRL
Computational complexity and resources required
Vanishing gradient problem
Apprenticeship learning

3  Recap on RNNs
RNNs are potentially much more powerful than FFNNs:
Can capture temporal dependencies
Embed complex state representation (i.e. memory)
Act as models of discrete-time dynamic systems
However, they are (very) complex to train:
TDNN – limited performance, constrained by its fixed input window
RTRL – calculates a dynamic gradient on-line

4  RTRL reviewed
RTRL is a gradient-descent-based method.
It relies on sensitivities expressing the impact of any weight w_ij on the activation of neuron k.
The algorithm then consists of computing weight changes from these sensitivities and the output error.
Let's look at the resources involved …
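The equations on this slide were images and are not in the transcript. For reference, the RTRL recursion in its standard form (Williams and Zipser, 1989), with p^k_ij denoting the sensitivity of activation y_k to weight w_ij; the exact notation on the slide may differ:

```latex
% Sensitivity recursion: how activation y_k responds to a change in weight w_{ij}
p^{k}_{ij}(t+1) = f'\big(\mathrm{net}_k(t)\big)
    \Big[ \sum_{l} w_{kl}\, p^{l}_{ij}(t) + \delta_{ik}\, y_j(t) \Big],
\qquad p^{k}_{ij}(0) = 0

% Weight update: accumulate the output errors e_k(t) through the sensitivities
\Delta w_{ij}(t) = \alpha \sum_{k} e_k(t)\, p^{k}_{ij}(t)
```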

5  Implementing RTRL – computations involved
The key component in RTRL is the sensitivity matrix, which must be calculated for each neuron.
RTRL, however, is NOT local … can the calculations be efficiently distributed?
(Slide annotations: the sensitivity matrix holds on the order of N^3 entries, and updating it takes on the order of N^4 operations per time step.)
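To make those counts concrete, here is a deliberately naive sketch of one RTRL sensitivity update (my own illustration, assuming tanh units and the Williams and Zipser form above; the names are illustrative). The four nested loops over neurons give the O(N^4) per-step cost, and the p tensor is the O(N^3) storage.

```python
import numpy as np

def rtrl_step(W, y_prev, net, p):
    """One naive RTRL sensitivity update for a fully connected network of N neurons.

    W      : (N, N) recurrent weights, W[k, l] = weight from neuron l to neuron k
    y_prev : (N,) activations at the previous time step
    net    : (N,) pre-activations at the current time step
    p      : (N, N, N) sensitivities, p[k, i, j] = d y_k / d w_ij
    """
    N = W.shape[0]
    fprime = 1.0 - np.tanh(net) ** 2                     # derivative of tanh
    p_new = np.zeros_like(p)
    for k in range(N):
        for i in range(N):
            for j in range(N):
                acc = sum(W[k, l] * p[l, i, j] for l in range(N))  # O(N) inner sum
                if k == i:                               # Kronecker delta term
                    acc += y_prev[j]
                p_new[k, i, j] = fprime[k] * acc
    return p_new

def rtrl_weight_update(p, error, lr=0.01):
    """Weight changes: dw_ij = lr * sum_k e_k * p[k, i, j] (an O(N^3) contraction)."""
    return lr * np.einsum('k,kij->ij', error, p)

# Tiny usage example
N = 5
rng = np.random.default_rng(0)
W = rng.normal(scale=0.1, size=(N, N))
y = np.tanh(rng.normal(size=N))
p = rtrl_step(W, y, W @ y, np.zeros((N, N, N)))
```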

6  Implementing RTRL – storage requirements
Let's assume a fully-connected network of N neurons. Memory resources:
Weight matrix, w_ij → N^2
Activations, y_k → N
Sensitivity matrix → N^3
Total memory requirement: O(N^3)
Let's go over an example. Assume we have 1000 neurons in the system and each value requires 20 bits to represent:
→ 10^9 sensitivity values × 20 bits ≈ 20 Gbits (roughly 2.5 GB) of storage!
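A quick back-of-the-envelope check of the slide's example (the 20-bit word size is the slide's assumption; the function name is mine):

```python
def rtrl_storage(n_neurons: int, bits_per_value: int = 20) -> dict:
    """Storage estimate for fully-connected RTRL, dominated by the sensitivity tensor."""
    values = n_neurons ** 2 + n_neurons + n_neurons ** 3   # weights + activations + sensitivities
    bits = values * bits_per_value
    return {"values": values, "gigabits": bits / 1e9, "gigabytes": bits / 8e9}

print(rtrl_storage(1000))   # ~20 Gbits, i.e. roughly 2.5 GB
```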

7  Possible solutions – static subgrouping
Zipser et al. (1989) suggested static grouping of neurons, relaxing the "fully-connected" requirement.
This has backing in neuroscience: the average "branching factor" in the brain is ~1000.
Complexity is reduced by simply leaving out elements of the sensitivity matrix based on a subgrouping of the neurons:
Neurons are subgrouped arbitrarily
Sensitivities between groups are ignored
All connections still exist in the forward path
If g is the number of subgroups, then (see the sketch below):
Storage is O(N^3 / g^2)
Computational speedup is g^3
Communications → each node communicates with N/g nodes
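Where those factors come from, as a small sketch (my own illustration, not from the slides): each of the g groups of N/g neurons keeps sensitivities only for its own units and weights, so the per-group costs are those of a network of size N/g.

```python
def subgrouped_rtrl_resources(n_neurons: int, n_groups: int) -> dict:
    """Resource estimates for Zipser-style static subgrouping of RTRL."""
    n_per_group = n_neurons / n_groups
    grouped_storage = n_groups * n_per_group ** 3    # = N^3 / g^2 sensitivity entries
    grouped_ops = n_groups * n_per_group ** 4        # = N^4 / g^3 operations per step
    return {
        "storage_reduction": n_neurons ** 3 / grouped_storage,   # g^2
        "speedup": n_neurons ** 4 / grouped_ops,                 # g^3
        "fanout_per_node": n_per_group,                          # ~N/g communication partners
    }

print(subgrouped_rtrl_resources(1000, 10))   # storage_reduction=100, speedup=1000
```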

8  Possible solutions – static subgrouping (cont.)
Zipser's empirical tests indicate that these networks can solve many of the problems full RTRL solves.
One caveat of subgrouped RTRL training is that each subnet must have at least one unit for which a target exists (since gradient information is not exchanged between groups).
Others have proposed dynamic subgrouping:
Subgrouping based on maximal gradient information
Not realistic for hardware realization
Open research question: how to calculate the gradient without the O(N^3) storage requirement?

9  Truncated Real-Time Recurrent Learning (TRTRL)
Motivation: to obtain a scalable version of the RTRL algorithm while minimizing performance degradation.
How? Limit the sensitivities of each neuron to its ingress (incoming) and egress (outgoing) links.

10  Performing sensitivity calculations in TRTRL
For all nodes that are not in the output set, the egress sensitivity values for node i are calculated by imposing k = j in the original RTRL sensitivity equation (the resulting expression appears on the slide).
Similarly, the ingress sensitivity values for node j are given by the analogous restricted expression.
For output neurons, a nonzero sensitivity element must exist in order to update the weights.
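The slide's equations were images and are not reproduced here. As a rough sketch only (my reconstruction; index conventions and the exact retained terms may differ from the TRTRL formulation), the truncation keeps, for each weight w_ij, only the sensitivities whose superscript is one of that weight's endpoints, and the RTRL sum over all neurons l collapses to those retained terms:

```latex
% Truncated recursion: only k in {i, j} is stored, and only stored
% sensitivities appear inside the bracket
p^{k}_{ij}(t+1) \;\approx\; f'\big(\mathrm{net}_k(t)\big)
    \Big[ w_{ki}\, p^{i}_{ij}(t) + w_{kj}\, p^{j}_{ij}(t) + \delta_{ik}\, y_j(t) \Big],
\qquad k \in \{i, j\}
```

This keeps a constant number of sensitivities per weight instead of N, which is what drives the resource numbers on the next slide.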

11  Resource requirements of TRTRL
The network structure remains the same with TRTRL; only the calculation of sensitivities is reduced.
Significant reduction in resource requirements:
Computational load for each neuron drops from O(N^3) to O(2KN), where K denotes the number of output neurons
Total computational complexity is now O(2KN^2)
Storage requirements drop from O(N^3) to O(N^2)
Example revisited: for N = 100 and 10 outputs → 100k multiplications and only 20 kB of storage!
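A minimal sketch (my own illustration, with assumed names) of why storage falls to O(N^2): instead of the full N×N×N sensitivity tensor, TRTRL keeps a fixed number of sensitivities per weight, which fit in a couple of N×N arrays.

```python
import numpy as np

class TRTRLSensitivities:
    """Truncated sensitivity storage: a constant number of values per weight.

    p_ingress[i, j] ~ sensitivity kept on the ingress side of weight w_ij
    p_egress[i, j]  ~ sensitivity kept on the egress side of weight w_ij
    Total storage is 2 * N^2 values, versus N^3 for full RTRL.
    """
    def __init__(self, n_neurons: int):
        self.p_ingress = np.zeros((n_neurons, n_neurons))
        self.p_egress = np.zeros((n_neurons, n_neurons))

    def n_values(self) -> int:
        return self.p_ingress.size + self.p_egress.size

print(TRTRLSensitivities(100).n_values())   # 20,000 stored values for N = 100
```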

12  Further TRTRL improvements – clustering of neurons
TRTRL introduced localization and memory improvement.
Clustered TRTRL adds scalability by reducing the number of long connection lines between processing elements.
(Figure: clustered network topology, with input and output connections shown.)

13  Test case #1: frequency doubler
Input: sin(x); target output: sin(2x).
Both networks had 12 neurons.
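For concreteness, a small sketch of how the frequency-doubler training signal can be generated (the sampling step and sequence length are my assumptions, not from the slide):

```python
import numpy as np

def frequency_doubler_data(n_steps: int = 1000, dt: float = 0.1):
    """Frequency-doubler task: input sin(x_t), target sin(2 * x_t)."""
    x = np.arange(n_steps) * dt
    return np.sin(x), np.sin(2.0 * x)

inputs, targets = frequency_doubler_data()
print(inputs[:3], targets[:3])
```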

14  Vanishing gradient problem
Recap on goals: find temporal dependencies in the data with an RNN.
The idea behind RTRL: when an error value is found, apply it to inputs seen an indefinite number of time steps ago.
In 1994, Bengio et al. showed that both BPTT and RTRL suffer from the problem of vanishing gradient information:
When using gradient-based training rules, the "error signal" that is applied to previous inputs tends to vanish
Because of this, long-term dependencies in the data are often overlooked
Short-term memory is fine; long-term dependencies (>10 time steps) are lost

15  Vanishing gradient problem (cont.)
A learning error yields gradients on the outputs, and therefore on the state variables s_t.
Since the weights (parameters) are shared across time, the gradient that reaches an earlier state is a product of many per-step Jacobians, and that product tends to shrink exponentially with the time lag.
(Figure: unrolled RNN with inputs x_t, states s_t, and outputs y_t.)
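In symbols, following the usual presentation of this argument (notation is mine, not taken from the slide):

```latex
% Gradient of the loss at time t with respect to an earlier state s_k
\frac{\partial E_t}{\partial s_k}
  = \frac{\partial E_t}{\partial s_t}
    \prod_{m=k+1}^{t} \frac{\partial s_m}{\partial s_{m-1}}

% If every Jacobian satisfies \left\| \partial s_m / \partial s_{m-1} \right\| \le \gamma < 1,
% the product is bounded by \gamma^{\,t-k}: the error signal applied to inputs
% seen t-k steps earlier vanishes exponentially in the time lag.
```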

16  What is apprenticeship learning?
Many times we want to train an agent based on a reference controller, e.g. for riding a bicycle or flying a plane.
Starting from scratch may take a very long time, particularly for large state/action spaces, and may cost a lot (e.g. a helicopter crashing).
Process:
Train the agent on the reference controller
Evaluate the trained agent
Improve the trained agent
Note: the reference controller can be anything (e.g. a heuristic controller for the Car Race problem).

17  Formalizing apprenticeship learning
Let's assume we have a reference policy π from which we want our agent to learn.
We would first like to learn the (approximate) value function V^π.
Once we have V^π, we can try to improve it using the policy improvement theorem (the expression appears on the slide): by following the original policy greedily we obtain a better policy!
In practice, many issues should be considered, such as state-space coverage and exploration/exploitation:
Train with zero exploration, then explore gradually …
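The slide's expression is not in the transcript; the standard greedy improvement step it refers to (as in Sutton and Barto) can be written as:

```latex
% Greedy improvement on the reference policy's value function
\pi'(s) \;=\; \arg\max_{a} \sum_{s'} P(s' \mid s, a)\,
              \big[ r(s, a, s') + \gamma\, V^{\pi}(s') \big]
        \;=\; \arg\max_{a} Q^{\pi}(s, a)

% Policy improvement theorem: Q^{\pi}(s, \pi'(s)) \ge V^{\pi}(s) for all s
% implies V^{\pi'}(s) \ge V^{\pi}(s), so the greedy policy \pi' is at least
% as good as the reference policy \pi.
```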

