ECE 517: Reinforcement Learning in Artificial Intelligence, Lecture 17: TRTRL, Implementation Considerations, Apprenticeship Learning. Dr. Itamar Arel.

Presentation transcript:

ECE 517: Reinforcement Learning in Artificial Intelligence
Lecture 17: TRTRL, Implementation Considerations, Apprenticeship Learning
Dr. Itamar Arel
College of Engineering, Department of Electrical Engineering and Computer Science
The University of Tennessee
Fall 2010, November 3, 2010

Outline
- Recap on RNNs
- Implementation and usage issues with RTRL
- Computational complexity and resources required
- Vanishing gradient problem
- Apprenticeship learning

Recap on RNNs
RNNs are potentially much stronger than FFNNs:
- Can capture temporal dependencies
- Embed a complex state representation (i.e., memory)
- Model discrete-time dynamic systems
They are (very) complex to train:
- TDNN: limited performance, bounded by its input window
- RTRL: calculates a dynamic gradient on-line

RTRL Reviewed
RTRL is a gradient-descent-based method. It relies on sensitivities expressing the impact of any weight w_ij on the activation of neuron k; the algorithm then uses these sensitivities to compute the weight changes at each time step (a minimal sketch follows below). Let's look at the resources involved...
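A minimal sketch of the RTRL recursion being discussed, written in the standard Williams-Zipser form; the network size, sigmoid nonlinearity, learning rate, and target signal are illustrative assumptions rather than anything specified on the slides:

```python
import numpy as np

N, alpha = 8, 0.01
rng = np.random.default_rng(0)
W = rng.normal(scale=0.1, size=(N, N))   # recurrent weights w_ij
y = np.zeros(N)                          # unit activations y_k
P = np.zeros((N, N, N))                  # sensitivities P[k, i, j] = dy_k/dw_ij  (O(N^3) storage)

def f(x):
    return 1.0 / (1.0 + np.exp(-x))      # logistic activation

for t in range(200):
    net = W @ y
    y_new = f(net)
    fprime = y_new * (1.0 - y_new)

    # Sensitivity recursion: p_k_ij(t+1) = f'(net_k) * [ sum_l w_kl * p_l_ij(t) + delta_ki * y_j(t) ]
    P_new = np.einsum('kl,lij->kij', W, P)      # the sum over l: O(N^4) multiplications per step
    for i in range(N):
        P_new[i, i, :] += y                     # the delta_ki * y_j(t) term
    P = fprime[:, None, None] * P_new

    # Weight update from the output error; here unit 0 tracks an arbitrary sine target
    target = np.sin(0.1 * t)
    e = np.zeros(N)
    e[0] = target - y_new[0]
    W += alpha * np.einsum('k,kij->ij', e, P)   # dw_ij = alpha * sum_k e_k * p_k_ij
    y = y_new
```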

Implementing RTRL: Computations Involved
The key component in RTRL is the sensitivity matrix, which must be updated for every neuron: it holds on the order of N^3 entries, and updating it requires on the order of N^4 multiplications per time step.
RTRL, however, is NOT local... can the calculations be efficiently distributed?

Implementing RTRL: Storage Requirements
Let's assume a fully-connected network of N neurons. Memory resources:
- Weight matrix w_ij: N^2 values
- Activations y_k: N values
- Sensitivity matrix: N^3 values
- Total memory requirement: O(N^3)
Let's go over an example: assume we have 1000 neurons in the system and each value requires 20 bits to represent. That comes to ~20 Gb of storage!
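A quick back-of-the-envelope check of the slide's example, counting only the dominant O(N^3) sensitivity matrix (the numbers are taken from the slide):

```python
N = 1000                       # neurons, as in the slide's example
bits_per_value = 20            # bits per stored value
sensitivity_entries = N ** 3   # the O(N^3) sensitivity matrix dominates storage
total_bits = sensitivity_entries * bits_per_value
print(total_bits / 1e9, "gigabits")   # 20.0 Gb, i.e. roughly 2.5 GB
```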

Possible Solutions: Static Subgrouping
Zipser et al. (1989) suggested a static grouping of neurons, relaxing the "fully-connected" requirement. This has backing in neuroscience: the average "branching factor" in the brain is ~1000.
The complexity is reduced by simply leaving out elements of the sensitivity matrix based upon a subgrouping of the neurons:
- Neurons are subgrouped arbitrarily
- Sensitivities between groups are ignored
- All connections still exist in the forward path
If g is the number of subgroups, then (see the short calculation below):
- Storage is O(N^3/g^2)
- Computational speedup is g^3
- Communications: each node communicates with N/g nodes
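The scaling quoted above is easy to tabulate; the following few lines simply plug example values of N and g (chosen arbitrarily for illustration) into the slide's formulas:

```python
def subgrouped_rtrl_costs(N, g):
    storage = N ** 3 / g ** 2   # sensitivity entries kept after subgrouping
    speedup = g ** 3            # computational speedup relative to full RTRL
    fan_out = N / g             # nodes each neuron must communicate with
    return storage, speedup, fan_out

for g in (1, 5, 10):
    storage, speedup, fan_out = subgrouped_rtrl_costs(N=1000, g=g)
    print(f"g={g:>2}: storage={storage:.1e} values, speedup=x{speedup}, fan-out={fan_out:.0f}")
```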

Possible Solutions: Static Subgrouping (cont.)
Zipser's empirical tests indicate that these networks can solve many of the problems full RTRL solves.
One caveat of subgrouped RTRL training is that each subnet must have at least one unit for which a target exists (since gradient information is not exchanged between groups).
Others have proposed dynamic subgrouping:
- Subgrouping based on maximal gradient information
- Not realistic for hardware realization
Open research question: how can the gradient be calculated without the O(N^3) storage requirement?

Truncated Real Time Recurrent Learning (TRTRL)
Motivation: obtain a scalable version of the RTRL algorithm while minimizing performance degradation.
How? Limit the sensitivities of each neuron to its ingress (incoming) and egress (outgoing) links.

Performing Sensitivity Calculations in TRTRL
For all nodes that are not in the output set, the egress sensitivity values for node i are obtained by imposing k = j in the original RTRL sensitivity equation; the ingress sensitivity values for node j are obtained analogously. For output neurons, a nonzero sensitivity element must exist in order to update the weights.

Resource Requirements of TRTRL
The network structure remains the same with TRTRL; only the calculation of sensitivities is reduced. This yields a significant reduction in resource requirements:
- The computational load for each neuron drops from O(N^3) to O(2KN), where K denotes the number of output neurons
- The total computational complexity is now O(2KN^2)
- Storage requirements drop from O(N^3) to O(N^2)
Example revisited: for N = 100 and 10 outputs, roughly 100k multiplications and only ~20 kB of storage!
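To make the truncation concrete, here is a small sketch that keeps only the sensitivity entries in which the neuron is an endpoint of the weight in question; the masking formulation and variable names are illustrative, not the exact equations of the TRTRL derivation:

```python
import numpy as np

N = 12                                    # example network size
mask = np.zeros((N, N, N), dtype=bool)    # mask over full-RTRL sensitivities P[k, i, j]
for i in range(N):
    for j in range(N):
        mask[i, i, j] = True              # egress sensitivity of neuron i (k == i)
        mask[j, i, j] = True              # ingress sensitivity of neuron j (k == j)

def truncate(P):
    """Discard every sensitivity that full RTRL tracks but TRTRL leaves out."""
    return np.where(mask, P, 0.0)

# Full RTRL carries N^3 sensitivities; the truncated set keeps roughly 2*N^2 of them.
print("full RTRL entries:", N ** 3, "  TRTRL entries kept:", int(mask.sum()))
```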

Further TRTRL Improvements: Clustering of Neurons
TRTRL introduced localization and a memory improvement. Clustered TRTRL adds scalability by reducing the number of long connection lines between processing elements.
(Figure: clustered network topology with input and output nodes.)

Test Case #1: Frequency Doubler
Input: sin(x); target output: sin(2x). Both networks had 12 neurons.
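For reference, generating the training signal for this task takes only a couple of lines; the sampling step and sequence length below are arbitrary assumptions:

```python
import numpy as np

x = np.arange(0.0, 20.0 * np.pi, 0.1)   # assumed sampling grid
inputs = np.sin(x)                       # what the recurrent network sees, one sample per step
targets = np.sin(2.0 * x)                # the doubled-frequency signal it must reproduce
```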

Vanishing Gradient Problem
Recap on goals: find temporal dependencies in data with an RNN. The idea behind RTRL: when an error value is found, apply it to inputs seen an indefinite number of epochs ago.
In 1994, Bengio et al. showed that both BPTT and RTRL suffer from vanishing gradient information:
- When using gradient-based training rules, the "error signal" applied to previous inputs tends to vanish
- Because of this, long-term dependencies in the data are often overlooked
- Short-term memory is OK; long-term memory (>10 epochs) is lost

Vanishing Gradient Problem (cont.)
A learning error yields gradients on the outputs, and therefore on the state variables s_t. Since the weights (parameters) are shared across time, that gradient must be propagated back through every earlier state transition, and the repeated multiplication shrinks it (see the numerical sketch below).
(Figure: recurrent network unrolled in time, with inputs x_t, states s_t, and outputs y_t.)
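A small numerical illustration of this effect, with all sizes, weight scales, and the tanh nonlinearity assumed purely for demonstration: multiplying the error signal by the state Jacobian at every step drives its norm toward zero.

```python
import numpy as np

rng = np.random.default_rng(1)
N, T = 20, 50
W = rng.normal(scale=0.5 / np.sqrt(N), size=(N, N))   # modest recurrent weights
s = rng.normal(size=N)                                 # current state
grad = rng.normal(size=N)                              # error signal injected at the last step

for t in range(T):
    s = np.tanh(W @ s)
    grad = W.T @ ((1.0 - s ** 2) * grad)               # multiply by the state Jacobian once per step
    if (t + 1) % 10 == 0:
        print(f"after {t + 1:2d} steps, |grad| = {np.linalg.norm(grad):.3e}")
```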

What is Apprenticeship Learning?
Many times we want to train an agent based on a reference controller, e.g., riding a bicycle or flying a plane. Starting from scratch may take a very long time, particularly for large state/action spaces, and may cost a lot (e.g., a helicopter crashing).
Process:
- Train the agent on the reference controller
- Evaluate the trained agent
- Improve the trained agent
Note: the reference controller can be anything (e.g., a heuristic controller for the Car Race problem).

Formalizing Apprenticeship Learning
Let's assume we have a reference policy π from which we want our agent to learn. We would first like to learn the (approximate) value function V^π. Once we have V^π, we can try to improve it based on the policy improvement theorem, i.e.,
π'(s) = argmax_a Σ_{s'} P(s'|s,a) [ R(s,a,s') + γ V^π(s') ],
so by acting greedily with respect to the original policy's value function we obtain a better policy!
In practice, many issues must be considered, such as state-space coverage and exploration vs. exploitation: train with zero exploration, then explore gradually.
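A minimal tabular sketch of this two-step process, assuming access to a known model P[s, a, s'] and R[s, a, s'] and a deterministic reference policy pi_ref; all names are illustrative, and in the model-free setting the evaluation step would instead be fit from trajectories generated by the reference controller:

```python
import numpy as np

def evaluate_reference(P, R, pi_ref, gamma, sweeps=500):
    """Iterative policy evaluation: approximate V^pi for the reference policy."""
    S = P.shape[0]
    V = np.zeros(S)
    for _ in range(sweeps):
        for s in range(S):
            a = pi_ref[s]
            V[s] = np.sum(P[s, a] * (R[s, a] + gamma * V))   # expected one-step backup
    return V

def greedy_improvement(P, R, V, gamma):
    """Policy improvement: act greedily with respect to V^pi to obtain a better policy."""
    S, A = P.shape[0], P.shape[1]
    Q = np.array([[np.sum(P[s, a] * (R[s, a] + gamma * V)) for a in range(A)]
                  for s in range(S)])
    return np.argmax(Q, axis=1)

# Usage sketch: V_ref = evaluate_reference(P, R, pi_ref, gamma=0.95)
#               pi_improved = greedy_improvement(P, R, V_ref, gamma=0.95)
```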