Bernard Ans, Stéphane Rousset, Robert M. French & Serban Musca (European Commission grant HPRN-CT-1999-00065) Preventing Catastrophic Interference in Multiple-Sequence Learning Using Coupled Reverberating Elman Networks

The Problem of Multiple-Sequence Learning Real cognition requires the ability to learn sequences of patterns (or actions). (This is why SRNs – Elman networks – were originally developed.) But learning sequences in any real sense means being able to learn multiple sequences, one after the other, without the most recently learned ones erasing the previously learned ones. Catastrophic interference is a serious problem for the sequential learning of individual patterns; it is far worse when multiple sequences of patterns have to be learned consecutively.

The Solution We have developed a “dual-network” system using coupled Elman networks that completely solves this problem. These two separate networks exchange information by means of “reverberated pseudopatterns.”

Pseudopatterns Assume a network-in-a-box learns a series of patterns produced by a function f(x). The original patterns are no longer available. How can you approximate f(x)?

A random input is fed through the trained network and the associated output (here, 1 1 0) is collected. Together, the random input and its associated output form a pseudopattern: ψ1: [random input] → 1 1 0

A large enough collection of these pseudopatterns (ψ1, ψ2, ψ3, ψ4, etc.) will approximate the originally learned function.
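
A minimal sketch (not from the original work) of how such a collection could be harvested in NumPy: random binary inputs are pushed through a frozen single-hidden-layer network and the resulting input-output pairs are kept as pseudopatterns. The weight matrices, their shapes, and the sigmoid units are assumptions; biases are omitted.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def generate_pseudopatterns(W_in_hid, W_hid_out, n_pseudo, n_inputs, rng):
    """Feed random binary inputs through a frozen network and keep each
    (input, associated output) pair as a pseudopattern approximating f(x)."""
    pseudopatterns = []
    for _ in range(n_pseudo):
        x = rng.integers(0, 2, size=n_inputs).astype(float)  # random input
        h = sigmoid(W_in_hid @ x)                             # hidden activations
        y = sigmoid(W_hid_out @ h)                            # associated output
        pseudopatterns.append((x, y))
    return pseudopatterns

# Illustrative usage with arbitrary shapes:
# rng = np.random.default_rng(0)
# W1, W2 = rng.normal(size=(20, 100)), rng.normal(size=(100, 20))
# psi = generate_pseudopatterns(W1, W2, n_pseudo=10_000, n_inputs=100, rng=rng)
```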

Transferring information from Net 1 to Net 2 with pseudopatterns: a random input presented to Net 1 produces an associated output; this input-output pair is then given to Net 2 as an input and target to train on.

Information transfer by pseudopatterns in dual-network systems New information is presented to one network (Net 1). Pseudopatterns are generated by Net 2, where previously learned information is stored. Net 1 then trains not only on the new pattern(s) to be learned, but also on the pseudopatterns produced by Net 2. Once Net 1 has learned the new information, it generates (lots of) pseudopatterns that train Net 2. This is why we say that information is continually transferred between the two networks by means of pseudopatterns.
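
Written as pseudocode, this schedule looks roughly like the sketch below. Every method name (has_learned, generate_pseudopatterns, train_one_epoch, train_one_pass) is a hypothetical helper, not an API from the original work.

```python
def dual_network_learning(net1, net2, new_patterns,
                          n_rehearsal=32, n_transfer=10_000):
    """Hypothetical dual-network schedule: Net 1 (new learning) trains on the new
    patterns interleaved with pseudopatterns from Net 2 (long-term storage); once
    Net 1 has learned, its knowledge is sent back to Net 2 as a pseudopattern batch."""
    while not net1.has_learned(new_patterns):              # criterion on the new patterns only
        old = net2.generate_pseudopatterns(n_rehearsal)    # stand-ins for stored knowledge
        net1.train_one_epoch(list(new_patterns) + old)     # interleave new and old
    for psi in net1.generate_pseudopatterns(n_transfer):
        net2.train_one_pass(psi)                           # consolidate into the storage network
```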

Are all pseudopatterns created equal? No. Even though the simple dual-network system (i.e., new learning in one network; long-term storage in the other) using simple pseudopatterns does eliminate catastrophic interference, we can do better using “reverberated” pseudopatterns.

Building a Network that uses "reverberated" pseudopatterns. Start with a standard backpropagation network: input layer, hidden layer, output layer.

Add an autoassociator: additional output units that are trained to reproduce the input pattern, so that the output layer has both a "target" side and an autoassociative side.

A new pattern to be learned, P: Input → Target, is learned by placing the Target on the target side of the output layer and a copy of the Input on the autoassociative side.
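
Concretely, each training example for the augmented network pairs the input with a composite teaching signal: the target on the heteroassociative output units plus a copy of the input on the autoassociative units. A minimal sketch, assuming the two groups of output units are simply concatenated:

```python
import numpy as np

def make_augmented_pattern(input_pattern, target_pattern):
    """Teaching signal for the network with an autoassociator: the heteroassociative
    target concatenated with a copy of the input itself."""
    composite_target = np.concatenate([target_pattern, input_pattern])
    return input_pattern, composite_target
```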

What are “reverberated pseudopatterns” and how are they generated?

We start with a random input î0, feed it through the network, and collect the output on the autoassociative side of the network. This output is fed back into the input layer ("reverberated") and, again, the output on the autoassociative side is collected. This is done R times.

After R reverberations, we associate the reverberated input with the "target" output that the network produces for it. This forms the reverberated pseudopattern.
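
A sketch of this reverberation loop, assuming (as in the earlier sketch) that the autoassociative units occupy the last n_inputs positions of the output vector; net.forward is a hypothetical forward pass returning the full output layer:

```python
import numpy as np

def reverberated_pseudopattern(net, n_inputs, R, rng):
    """Reverberate a random input R times through the autoassociative side,
    then associate the reverberated input with the network's 'target' output."""
    i_hat = rng.integers(0, 2, size=n_inputs).astype(float)  # random input î0
    for _ in range(R):
        full_output = net.forward(i_hat)      # hypothetical forward pass
        i_hat = full_output[-n_inputs:]       # autoassociative side fed back as input
    target = net.forward(i_hat)[:-n_inputs]   # heteroassociative ('target') side
    return i_hat, target                      # the reverberated pseudopattern
```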

This dual-network approach, using reverberated pseudopattern information transfer between the two networks (Net 1: new-learning network; Net 2: storage network), effectively overcomes catastrophic interference in multiple-pattern learning.

But what about multiple-sequence learning? Elman networks are designed to learn sequences of patterns. But they forget catastrophically when they attempt to learn multiple sequences. Can we generalize the dual-network, reverberated-pseudopattern technique to dual Elman networks and eliminate catastrophic interference in multiple-sequence learning? Yes.

Elman networks (a.k.a. Simple Recurrent Networks) The hidden-unit activations from the previous time step are copied into a context layer: the hidden layer H(t) receives both the standard input S(t) and the context H(t-1), and the output layer is trained to produce S(t+1). In this way the network learns a sequence S(1), S(2), …, S(n).
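
A minimal sketch of one SRN time step; the sigmoid units and weight shapes are assumptions, and biases are omitted:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def srn_step(s_t, context, W_sh, W_ch, W_ho):
    """One Elman/SRN time step: hidden H(t) from input S(t) and context H(t-1);
    the output predicts S(t+1); the new context is a copy of H(t)."""
    hidden = sigmoid(W_sh @ s_t + W_ch @ context)   # H(t)
    prediction = sigmoid(W_ho @ hidden)             # predicted S(t+1)
    return prediction, hidden.copy()                # context for the next time step
```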

A “Reverberated Simple Recurrent Network” (RSRN): an Elman network with an autoassociative part

RSRN technique for sequentially learning two sequences A(t) and B(t): 1. Net 1 learns A(t) completely. 2. Reverberated pseudopattern transfer to Net 2. 3. Net 1 makes one weight-change pass through B(t). 4. Net 2 generates a few "static" reverberated pseudopatterns. 5. Net 1 does one learning epoch on these pseudopatterns from Net 2. Steps 3–5 are repeated until Net 1 has learned B(t). Finally, test how well Net 1 has retained A(t).
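
This schedule, written out as hypothetical pseudocode (all helper names are assumptions, not the authors' code):

```python
def learn_two_sequences(net1, net2, seq_A, seq_B,
                        n_transfer=10_000, n_static=8):
    """Hypothetical RSRN schedule for learning sequence B after sequence A."""
    net1.learn_sequence_completely(seq_A)                   # Net 1 learns A(t) completely
    for psi in net1.generate_reverberated_pseudopatterns(n_transfer):
        net2.train_one_pass(psi)                            # pseudopattern transfer to Net 2
    while not net1.has_learned(seq_B):
        net1.one_weight_change_pass(seq_B)                  # one pass through B(t)
        static = net2.generate_reverberated_pseudopatterns(n_static)
        net1.train_one_epoch(static)                        # rehearse Net 2's pseudopatterns
    return net1.error_on_sequence(seq_A)                    # test retention of A(t)
```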

Two sequences to be learned: A(0), A(1), …, A(10) and B(0), B(1), …, B(10). First, Net 1 learns (completely) the sequence A(0), A(1), …, A(10).

Transferring the learning to Net 2: Net 1 produces 10,000 pseudopatterns, each consisting of an input and its associated teacher output. For each of these 10,000 pseudopatterns produced by Net 1, Net 2 makes one feedforward pass followed by one backpropagation weight change (one FF-BP pass).

Learning B(0), B(1), …, B(10) by Net 1: 1. Net 1 does ONE learning epoch on the sequence B(0), B(1), …, B(10). 2. Net 2 generates a few pseudopatterns. 3. Net 1 does one FF-BP pass on each of these pseudopatterns. Steps 1–3 are repeated until Net 1 has learned B(0), B(1), …, B(10).

Sequences chosen Twenty-two distinct random binary vectors of length 100 are created. Half of these vectors are used to produce the first ordered sequence of items, A, denoted by A(0), A(1), …, A(10). The remaining 11 vectors are used to create a second sequence of items, B, denoted by B(0), B(1), …, B(10). In order to introduce a degree of ambiguity into each sequence (so that a simple BP network would not be able to learn them), we modify each sequence so that A(8) = A(5) and B(5) = B(1).
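
The stimulus construction just described can be sketched directly; the seed is an assumption, and the distinctness of the 22 random vectors is not explicitly enforced here:

```python
import numpy as np

def make_sequences(rng=None):
    """22 random binary vectors of length 100: the first 11 form sequence A(0..10),
    the remaining 11 form B(0..10); setting A(8)=A(5) and B(5)=B(1) makes each
    sequence ambiguous for a simple feedforward BP network."""
    if rng is None:
        rng = np.random.default_rng(0)
    vectors = rng.integers(0, 2, size=(22, 100)).astype(float)
    A = [vectors[i].copy() for i in range(11)]
    B = [vectors[11 + i].copy() for i in range(11)]
    A[8] = A[5].copy()   # ambiguity in A
    B[5] = B[1].copy()   # ambiguity in B
    return A, B
```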

Test method First, sequence A is completely learned by the network. Then sequence B is learned. During the course of learning, we monitor at regular intervals how much of sequence A has been forgotten by the network.

Normal Elman networks: Catastrophic forgetting (a): Learning of sequence B (after having previously learned sequence A). By 450 epochs (an epoch corresponds to one pass through the entire sequence), sequence B has been completely learned. (b): The number of incorrect units (out of 100) for each serial position of sequence A during learning of sequence B. After 450 epochs, the SRN has, for all intents and purposes, completely forgotten the previously learned sequence A.
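
The forgetting measure plotted in panel (b), the number of incorrect output units (out of 100) at each serial position of sequence A, might be computed as in the sketch below; net.srn_step and net.n_hidden are assumed names, and thresholding the predictions at 0.5 is an assumption.

```python
import numpy as np

def incorrect_units_per_position(net, sequence):
    """For each serial position, count how many of the 100 output units differ
    from the correct next item (predictions thresholded at 0.5)."""
    errors = []
    context = np.zeros(net.n_hidden)                  # assumed attribute
    for t in range(len(sequence) - 1):
        prediction, context = net.srn_step(sequence[t], context)  # assumed method
        wrong = np.sum((prediction > 0.5) != (sequence[t + 1] > 0.5))
        errors.append(int(wrong))
    return errors
```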

Dual RSRNs: Catastrophic forgetting is eliminated Recall performance for sequences B and A during learning of sequence B by a dual-network RSRN. (a): By 400 epochs, the second sequence B has been completely learned. (b): The previously learned sequence A shows virtually no forgetting. Catastrophic forgetting of the previously learned sequence A has been completely overcome.

[Graph: % error on sequence A while sequence B is being learned] Normal Elman network: massive forgetting of sequence A. Dual RSRN: no forgetting of sequence A.

Cognitive/Neurobiological plausibility? The brain, somehow, does not forget catastrophically. Separating new learning from previously learned information seems necessary. McClelland, McNaughton, and O'Reilly (1995) have suggested that the hippocampal-neocortical separation may be Nature's way of solving this problem. Pseudopattern transfer is not so far-fetched if we accept results that claim that neocortical memory consolidation is due, at least in part, to REM sleep.

Conclusions The RSRN reverberating dual-network architecture (Ans & Rousset, 1997, 2000) can be generalized to the sequential learning of multiple temporal sequences. When learning multiple sequences of patterns, interleaving simple reverberated input-output pseudopatterns, each of which reflects the entire previously learned sequence(s), reduces (or eliminates entirely) forgetting of the initially learned sequence(s).