ECE 517: Reinforcement Learning in Artificial Intelligence Lecture 17: TRTRL, Implementation Considerations, Apprenticeship Learning November 3, 2010.

Presentation transcript:

ECE 517: Reinforcement Learning in Artificial Intelligence Lecture 17: TRTRL, Implementation Considerations, Apprenticeship Learning November 3, 2010 Dr. Itamar Arel College of Engineering Department of Electrical Engineering and Computer Science The University of Tennessee Fall 2010

Outline: Recap on RNNs; implementation and usage issues with RTRL; computational complexity and resources required; the vanishing gradient problem; apprenticeship learning.

Recap on RNNs: RNNs are potentially much stronger than FFNNs. They can capture temporal dependencies, embed complex state representations (i.e., memory), and model discrete-time dynamic systems. They are, however, (very) complex to train: TDNN – performance limited by the window size; RTRL – calculates a dynamic gradient on-line.

RTRL reviewed: RTRL is a gradient-descent-based method. It relies on sensitivities expressing the impact of any weight wij on the activation of neuron k. The algorithm then consists of computing weight changes from these sensitivities. Let's look at the resources involved …
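For reference, a sketch of the sensitivity recursion in the standard Williams–Zipser (1989) formulation, which the description above matches (the notation here is assumed, since the slide's equations did not survive extraction):

$$p_{ij}^{k}(t) = \frac{\partial y_k(t)}{\partial w_{ij}}, \qquad p_{ij}^{k}(t+1) = f'\!\big(s_k(t)\big)\Big[\sum_{l} w_{kl}\, p_{ij}^{l}(t) + \delta_{ik}\, z_j(t)\Big], \qquad p_{ij}^{k}(0)=0,$$

where $z_j(t)$ is the j-th input or activation and $\delta_{ik}$ is the Kronecker delta. The weight changes then accumulate the sensitivities against the output errors $e_k(t)$:

$$\Delta w_{ij}(t) = \alpha \sum_{k} e_k(t)\, p_{ij}^{k}(t).$$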

Implementing RTRL – computations involved: The key component in RTRL is the sensitivity matrix, which must be calculated for each neuron. The matrix holds N3 sensitivity values, and updating each one involves a sum over N terms, for O(N4) operations per time step. RTRL, however, is NOT local … Can the calculations be efficiently distributed?

Implementing RTRL – storage requirements: Let's assume a fully-connected network of N neurons. Memory resources: weights matrix wij → N2; activations yk → N; sensitivity matrix → N3. Total memory requirements: O(N3). Example: assume 1000 neurons in the system, with each value requiring 20 bits to represent → ~20 Gb of storage!!
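A back-of-the-envelope check of the storage example above, as a minimal Python sketch (the parameter values simply mirror the slide's assumptions):

```python
N = 1000           # number of fully connected neurons, per the slide
bits_per_value = 20

weights       = N ** 2   # w_ij
activations   = N        # y_k
sensitivities = N ** 3   # p^k_ij, the dominant O(N^3) term

total_values = weights + activations + sensitivities
total_gigabits = total_values * bits_per_value / 1e9

print(f"values stored:  {total_values:.3e}")
print(f"storage needed: {total_gigabits:.1f} Gb")  # ~20 Gb, matching the slide
```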

Possible solutions – static subgrouping: Zipser et al. (1989) suggested static grouping of neurons, relaxing the "fully-connected" requirement. This has backing in neuroscience: the average "branching factor" in the brain is ~1000. The complexity is reduced by simply leaving out elements of the sensitivity matrix based upon a subgrouping of neurons: neurons are subgrouped arbitrarily, sensitivities between groups are ignored, and all connections still exist in the forward path. If g is the number of subgroups, then storage is O(N3/g2), the computational speedup is g3, and each node communicates with N/g nodes.

Possible solutions – static subgrouping (cont.): Zipser's empirical tests indicate that these networks can solve many of the problems full RTRL solves. One caveat of subgrouped RTRL training is that each subnet must have at least one unit for which a target exists (since gradient information is not exchanged between groups). Others have proposed dynamic subgrouping, with subgroups chosen based on maximal gradient information – not realistic for hardware realization. Open research question: how to calculate the gradient without the O(N3) storage requirement?

Truncated Real Time Recurrent Learning (TRTRL) Motivation: to obtain a scalable version of the RTRL algorithm while minimizing performance degradation. How? Limit the sensitivities of each neuron to its ingress (incoming) and egress (outgoing) links.

Performing Sensitivity Calculations in TRTRL: For all nodes that are not in the output set, the egress sensitivity values for node i are calculated by imposing k=j in the original RTRL sensitivity equation; the ingress sensitivity values for node j are obtained analogously (reconstructed sketches of both follow below). For output neurons, a nonzero sensitivity element must exist in order to update the weights.
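The equations themselves did not survive extraction. One plausible reconstruction, obtained by keeping only the terms of the full RTRL sum that lie along the ingress/egress links of weight $w_{ij}$ (a sketch, not necessarily the paper's exact equations):

$$\text{egress } (k=j):\quad p_{ij}^{j}(t+1) = f'\!\big(s_j(t)\big)\, w_{ji}\, p_{ij}^{i}(t),$$

$$\text{ingress } (k=i):\quad p_{ij}^{i}(t+1) = f'\!\big(s_i(t)\big)\Big[ w_{ii}\, p_{ij}^{i}(t) + z_j(t) \Big].$$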

Resource Requirements of TRTRL: The network structure remains the same with TRTRL; only the calculation of sensitivities is reduced. This yields a significant reduction in resource requirements: the computational load for each neuron drops from O(N3) to O(2KN), where K denotes the number of output neurons, so the total computational complexity is now O(2KN2). Storage requirements drop from O(N3) to O(N2). Example revisited: for N=100 and 10 outputs → on the order of 105 multiplications and only ~20 kB of storage!
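A small Python sketch comparing the two complexity counts quoted on this slide (N and K are the slide's example values; the formulas below are just the O(·) expressions evaluated literally):

```python
def rtrl_cost(N):
    # Full RTRL: N^3 sensitivities, each updated with an O(N) sum.
    return {"compute_per_step": N ** 4, "storage_values": N ** 3}

def trtrl_cost(N, K):
    # TRTRL: O(2KN) per neuron -> O(2KN^2) total compute, O(N^2) storage.
    return {"compute_per_step": 2 * K * N ** 2, "storage_values": N ** 2}

N, K = 100, 10
print("RTRL :", rtrl_cost(N))      # 1e8 ops, 1e6 stored values
print("TRTRL:", trtrl_cost(N, K))  # 2e5 ops, 1e4 stored values
```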

Further TRTRL Improvements – Clustering of Neurons: TRTRL introduced localization and a memory improvement. Clustered TRTRL adds scalability by reducing the number of long connection lines between processing elements. [Figure: clustered network topology with input and output connections.]

Test case #1: Frequency Doubler. Input: sin(x); target output: sin(2x). Both networks had 12 neurons.
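As a concrete illustration, a minimal sketch of how the frequency-doubler training data could be generated (the sampling step and sequence length are assumptions, not values taken from the lecture):

```python
import numpy as np

dt, steps = 0.1, 1000          # hypothetical sampling parameters
t = np.arange(steps) * dt

x = np.sin(t)                  # network input:  sin(t)
y = np.sin(2 * t)              # training target: sin(2t), the doubled frequency

# Each (x[k], y[k]) pair is one time step of the sequence fed to the RNN.
```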

Vanishing Gradient Problem. Recap on goals: find temporal dependencies in the data with an RNN. The idea behind RTRL: when an error value is found, apply it to inputs seen an indefinite number of time steps ago. Bengio et al. (1994) showed that both BPTT and RTRL suffer from the problem of vanishing gradient information: when using gradient-based training rules, the "error signal" applied to earlier inputs tends to vanish. Because of this, long-term dependencies in the data are often overlooked. Short-term memory is OK; long-term dependencies (>10 time steps) are lost.

Vanishing Gradient Problem (cont.) xt yt st RNN A learning error yields gradients on outputs, and therefore on the state variables st Since the weights (parameters) are shared across time

What is Apprenticeship Learning? Many times we want to train an agent based on a reference controller, e.g. for riding a bicycle or flying a plane. Starting from scratch may take a very long time, particularly for large state/action spaces, and may cost a lot (e.g. a helicopter crashing). The process: train the agent on the reference controller, evaluate the trained agent, then improve the trained agent (a schematic sketch follows below). Note: the reference controller can be anything (e.g. a heuristic controller for the Car Race problem).
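A schematic Python sketch of the three-step process above. Every name here (env.rollout, agent.fit, agent.improve, etc.) is a hypothetical placeholder illustrating the loop, not the lecture's code or a real library API:

```python
def apprenticeship_learning(reference_controller, agent, env, n_rounds=10):
    # 1. Train: gather demonstrations from the reference controller
    #    and fit the agent to imitate them.
    demos = [env.rollout(reference_controller) for _ in range(100)]
    agent.fit(demos)

    exploration = 0.0                       # start with zero exploration ...
    for _ in range(n_rounds):
        # 2. Evaluate the trained agent on the actual task.
        score = env.evaluate(agent)

        # 3. Improve the agent using its own experience.
        agent.improve(env, exploration)
        exploration = min(1.0, exploration + 0.1)   # ... then explore gradually

    return agent
```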

Formalizing Apprenticeship Learning: Let's assume we have a reference policy π from which we want our agent to learn. We would first like to learn the (approximate) value function Vπ. Once we have Vπ, we can try to improve on it based on the policy improvement theorem, i.e. (reconstructing the elided equation in its standard form)

$$\pi'(s) = \arg\max_a \sum_{s'} P(s' \mid s, a)\,\big[R(s,a,s') + \gamma\, V^{\pi}(s')\big].$$

By acting greedily with respect to Vπ we obtain a policy at least as good as the original! In practice, many issues must be considered, such as state-space coverage and exploration/exploitation: train with zero exploration, then explore gradually …