Robert M. French, LEAD-CNRS UMR 5022, Dijon, France. Why you will remember this talk and a neural network would not... or: The Problem of (and a Solution to) Catastrophic Interference in Neural Networks

Organization of this talk: What is catastrophic forgetting and why does it occur? A dual-network technique using pseudopattern information transfer to overcome it for multiple-pattern learning. What is "pseudopattern information transfer"? What are "reverberated" pseudopatterns? What about sequence learning? What about learning multiple sequences? Applications, theoretical questions, and future directions.

The problem: NEURAL NETWORKS FORGET CATASTROPHICALLY. Learning new information may completely destroy previously learned information. This makes sequential learning, i.e., learning one thing after another, the way humans learn, impossible. Can sequential learning be modeled so that (i) new information does not interfere catastrophically with already-learned information, and (ii) previously learned items do not have to be kept around?

Barnes-Underwood (1959) forgetting paradigm. Subjects learn List A-B (nonword/word pairs): pled – table, splog – book, bim – car, milt – bag, etc., until they have learned all the pairs. Then they learn List A-C (the same nonwords paired with different real words): pled – rock, splog – cow, bim – square, milt – clock, etc.

How humans do on this task

How artificial neural networks (backprop) do on this task

WHY does this occur? Answer: Overlap of internal representations “Catastrophic forgetting is a direct consequence of the overlap of distributed representations and can be reduced by reducing this overlap.” (French, 1991)

How can we reduce this overlap of internal representations? Answer: two separate networks in continual interaction, one for long-term storage and one for immediate processing of new patterns (French, 1997; etc.). This also seems to be the solution discovered by the brain: hippocampus – neocortex (McClelland, McNaughton, & O’Reilly, 1995).

Implementation We have implemented a “dual-network” system using coupled networks that completely solves this problem (French, 1997; Ans & Rousset, 1997, 2000; Ans, Rousset, French, & Musca, 2002, in press). These two separate networks exchange information by means of “reverberated pseudopatterns.”

Pseudopatterns? Reverberated pseudopatterns? What?

Pseudopatterns: Assume a network-in-a-box learns a series of patterns produced by a function f(x). The original patterns are no longer available. How can you approximate f(x)?

Feed a random input into the trained network and collect the associated output (for example, 1 1 0). This input-output pair forms a pseudopattern: ψ1: random input → 1 1 0.

A large enough collection of these pseudopatterns (ψ1, ψ2, ψ3, ψ4, etc.) will approximate the originally learned function.
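
To make the idea concrete, here is a minimal sketch in Python/NumPy of how such pseudopatterns could be generated; the function name make_pseudopatterns and the net_forward callable are illustrative assumptions, not part of the original model.

```python
import numpy as np

def make_pseudopatterns(net_forward, n_patterns, input_size, rng=None):
    """Generate pseudopatterns from an already-trained network.

    net_forward: callable mapping an input vector to the trained network's
                 output; it stands in for the 'network-in-a-box'.
    Returns a list of (random_input, associated_output) pairs; a large enough
    collection of these approximates the function the network has learned.
    """
    rng = np.random.default_rng() if rng is None else rng
    pseudopatterns = []
    for _ in range(n_patterns):
        x = rng.integers(0, 2, size=input_size).astype(float)  # random binary input
        y = net_forward(x)                                      # network's response
        pseudopatterns.append((x, y))
    return pseudopatterns
```

These input-output pairs can then be trained into a second network, which will approximate f(x) without ever seeing the original patterns.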

Transferring information from Net 1 to Net 2 with pseudopatterns: a random input is given to Net 1, Net 1's associated output is taken as the target, and Net 2 is trained on the resulting input-target pair.

Learning new information in Net 1 with pseudopatterns from Net 2: while Net 1 is trained on the new input-target pattern, its training is interleaved with pseudopatterns from Net 2 (a random input together with Net 2's associated output used as the teacher), and so on throughout the learning of the new pattern.

This is how information is continually transferred between the two networks by means of pseudopatterns.
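
A hedged sketch of this continual exchange, reusing the make_pseudopatterns helper from above; the forward and train-step callables are placeholders for whatever network implementation is used, not the authors' code.

```python
def transfer_knowledge(src_forward, dst_train_step, n_pseudo, input_size, rng):
    """Teach the destination network what the source network knows,
    using only pseudopatterns (no original training items are needed)."""
    for x, y in make_pseudopatterns(src_forward, n_pseudo, input_size, rng):
        dst_train_step(x, y)              # one backprop step on the pseudopattern

def learn_new_pattern(net1_train_step, net2_forward,
                      new_input, new_target, n_steps, input_size, rng):
    """Net 1 learns a new pattern; each training step is interleaved with one
    pseudopattern from Net 2 so that previously stored knowledge is rehearsed."""
    for _ in range(n_steps):
        net1_train_step(new_input, new_target)                    # new association
        x, y = make_pseudopatterns(net2_forward, 1, input_size, rng)[0]
        net1_train_step(x, y)                                     # old knowledge
```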

Sequential Learning using the dual-network approach Sequential learning of 20 patterns – one after the other (French, 1997)

On to reverberated pseudopatterns... Even though the simple dual-network system (i.e., new learning in one network; long-term storage in the other) using simple pseudopatterns does eliminate catastrophic interference, we can do better using “reverberated” pseudopatterns.

Building a network that uses "reverberated" pseudopatterns: start with a standard backpropagation network (input layer, hidden layer, output layer).

Then add an autoassociator: an additional set of output units trained to reproduce the input.

A new pattern to be learned, P: Input → Target, is learned as follows.

We start with a random input î0, feed it through the network, and collect the output on the autoassociative side of the network. This output is fed back into the input layer ("reverberated") and, again, the output on the autoassociative side is collected. This is done R times.

After R reverberations, we associate the reverberated input with the "target" output. This forms the reverberated pseudopattern.
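
A minimal sketch of this generation procedure, assuming a forward function that returns both the autoassociative output and the target-side output (this interface is an assumption for illustration, not the original implementation):

```python
import numpy as np

def reverberated_pseudopattern(forward, input_size, R, rng=None):
    """Generate one reverberated pseudopattern.

    forward(x) is assumed to return (auto_out, target_out): the activations of
    the autoassociative output units and of the ordinary output units.
    A random input is reverberated R times through the autoassociative side;
    the final reverberated input is then paired with the target-side output.
    """
    rng = np.random.default_rng() if rng is None else rng
    x = rng.random(input_size)          # random initial input, î0
    for _ in range(R):
        auto_out, _ = forward(x)        # collect the autoassociative output
        x = auto_out                    # feed it back into the input layer
    _, target_out = forward(x)          # output associated with the final input
    return x, target_out                # the reverberated pseudopattern
```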

This dual-network approach, using reverberated pseudopattern information transfer between the two networks, effectively overcomes catastrophic interference in multiple-pattern learning. (Net 1 is the new-learning network; Net 2 is the storage network.)

But what about multiple-sequence learning? Elman networks are designed to learn sequences of patterns. But they forget catastrophically when they attempt to learn multiple sequences. Can we generalize the dual-network, reverberated pseudopattern technique to dual Elman networks and eliminate catastrophic interference in multiple-sequence learning? Yes

The Problem of Multiple-Sequence Learning: Real cognition requires the ability to learn sequences of patterns (or actions). (This is why SRNs, i.e., Elman networks, were originally developed.) But learning sequences really means being able to learn multiple sequences without the most recently learned ones erasing the previously learned ones. Catastrophic interference is a serious problem for the sequential learning of individual patterns. It is far worse when multiple sequences of patterns have to be learned consecutively.

Elman networks (a.k.a. Simple Recurrent Networks): the hidden-unit activations H(t-1) from the previous time step are copied into a context layer. At each step the network receives the standard input S(t) together with the context H(t-1), computes the hidden state H(t), and is trained to output S(t+1). In this way it learns a sequence S(1), S(2), …, S(n).
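
A sketch of a single time step of such a network (the weight matrices, the sigmoid activation, and the omission of biases are illustrative choices, not details from the talk):

```python
import numpy as np

def srn_step(W_in, W_ctx, W_out, s_t, h_prev):
    """One step of a simple recurrent (Elman) network.

    s_t    : current sequence element S(t)
    h_prev : hidden activations from the previous step, H(t-1), used as context
    Returns the new hidden state H(t) and the network's prediction of S(t+1).
    """
    sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))
    h_t = sigmoid(W_in @ s_t + W_ctx @ h_prev)   # input plus copied-back context
    s_next_pred = sigmoid(W_out @ h_t)           # prediction of the next item
    return h_t, s_next_pred
```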

A “Reverberated Simple Recurrent Network” (RSRN): an Elman network with an autoassociative part

RSRN technique for sequentially learning two sequences A(t) and B(t): Net 1 learns A(t) completely, and this knowledge is transferred to Net 2 by reverberated pseudopatterns. Then, repeatedly: Net 1 makes one weight-change pass through B(t); Net 2 generates a single "static" reverberated pseudopattern; Net 1 does one learning epoch on this pseudopattern from Net 2. Continue until Net 1 has learned B(t), then test how well Net 1 has retained A(t).
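
The same procedure, summarized in code. The method names below (learn_sequence, one_epoch, pseudopattern, one_pass, has_learned) are hypothetical stand-ins for whatever RSRN implementation is used, not the original simulation code; the transfer count of 10,000 pseudopatterns is the figure used later in the talk.

```python
def train_two_sequences(net1, net2, seq_A, seq_B,
                        n_transfer=10_000, max_epochs=1_000):
    """Sketch of the dual-RSRN procedure for learning two sequences in a row."""
    net1.learn_sequence(seq_A)                   # 1. Net 1 learns A completely
    for _ in range(n_transfer):                  # 2. transfer A to Net 2 ...
        x, y = net1.pseudopattern()              #    ... via reverberated
        net2.one_pass(x, y)                      #        pseudopatterns
    for _ in range(max_epochs):                  # 3. learn B with rehearsal
        net1.one_epoch(seq_B)                    #    one epoch on sequence B
        x, y = net2.pseudopattern()              #    one 'static' pseudopattern
        net1.one_pass(x, y)                      #    refreshes A's traces in Net 1
        if net1.has_learned(seq_B):
            break
```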

Two sequences to be learned: A(0), A(1), …, A(10) and B(0), B(1), …, B(10). First, Net 1 learns sequence A(0), A(1), …, A(10) completely.

Transferring the learning to Net 2: Net 1 is given a random input and produces an associated teacher output (a pseudopattern); Net 2 does a feedforward pass on that input and a backprop weight change toward the teacher. Repeat for 10,000 pseudopatterns produced by Net 1.

Learning B(0), B(1), …, B(10) by Net 1: 1. Net 1 does ONE learning epoch on sequence B(0), B(1), …, B(10). 2. Net 2 generates ONE pseudopattern, ψNET 2. 3. Net 1 does one FF-BP (feedforward-backprop) pass on ψNET 2. Continue until Net 1 has learned B(0), B(1), …, B(10).

Sequences chosen: Twenty-two distinct random binary vectors of length 100 are created. Half of these vectors are used to produce the first ordered sequence of items, A, denoted by A(0), A(1), …, A(10). The remaining 11 vectors are used to create a second sequence of items, B, denoted by B(0), B(1), …, B(10). In order to introduce a degree of ambiguity into each sequence (so that a simple BP network would not be able to learn them), we modify each sequence so that A(5) = A(8) and B(1) = B(5).
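
A small sketch of how these stimuli could be constructed (the function name and the choice of which duplicate item overwrites which are illustrative assumptions):

```python
import numpy as np

def make_sequences(rng=None):
    """Build the two 11-item sequences of 100-bit vectors described above."""
    rng = np.random.default_rng() if rng is None else rng
    # 22 random binary vectors of length 100 (distinct with near-certainty)
    vectors = rng.integers(0, 2, size=(22, 100)).astype(float)
    A = [vectors[i].copy() for i in range(11)]        # sequence A(0)..A(10)
    B = [vectors[11 + i].copy() for i in range(11)]   # sequence B(0)..B(10)
    A[8] = A[5].copy()    # introduce ambiguity: A(5) = A(8)
    B[5] = B[1].copy()    # introduce ambiguity: B(1) = B(5)
    return A, B
```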

Test method First, sequence A is completely learned by the network. Then sequence B is learned. During the course of learning, we monitor at regular intervals how much of sequence A has been forgotten by the network.

Normal Elman networks: catastrophic forgetting (the height of the bars indicates how much forgetting has occurred). By 450 epochs sequence B has been completely learned. However, the SRN has, for all intents and purposes, completely forgotten the previously learned sequence A.

As sequence B is being learned, recall performance for sequence A in the dual-RSRN model: by 400 epochs, the second sequence B has been completely learned, and the previously learned sequence A shows virtually no forgetting. Forgetting, not just catastrophic forgetting, of the previously learned sequence A has been completely overcome.

Summary (% error on sequence A): the normal Elman network shows massive forgetting, while the dual RSRN shows no forgetting of sequence A.

Cognitive/neurobiological plausibility? The brain, somehow, does not forget catastrophically. Separating new learning from previously learned information seems necessary. McClelland, McNaughton, & O’Reilly (1995) have suggested that the hippocampal-neocortical separation may be Nature’s way of solving this problem. Pseudopattern transfer is not so far-fetched if we accept results claiming that neocortical memory consolidation is due, at least in part, to REM sleep.

Prediction of the model: a "recall rebound." In the 2-network RSRN, recall of the old sequences (% correct) is plotted against the number of presentations of the new sequence. Empirical data from humans, plotted the same way, confirm the recall rebound.

Examples of the "recall rebound" in the real world: learning a new language (an initial drop in performance for the first language, followed by a return to the initial level of performance); learning a new piece of music; learning new motor activities; etc.

In case you missed it... What is so interesting about the RSRN procedure is that, by means of a number of "static" input-output patterns (pseudopatterns), we can transfer sequential information into another network. In other words, a sequence A-B-C-D-B-E-C-F-G of actions, words, patterns, etc. can be transferred by means of a set of I/O patterns. OK, cute. But why is this so interesting?

Attention, all roboticists! Consider a group of robots R1-R5 exploring their world. R1 is learning the sequence of actions needed to open a door, R2 the sequence needed to open a window, and R3 the sequence needed to pick up a block.

Efficient robot communication: R1 is continually broadcasting pseudopatterns that will be picked up by the other robots and interleaved with the sequences they are learning. Thus, all robots within transmission range of R1 will learn how to open a door without ever having actually opened one. Similarly, all robots (including R1) learn what the other robots have learned by picking up their pseudopatterns.

Efficient parallel learning: assume there is a long sequence in which each letter represents an action: A-B-C-B-A-S-T-B-S-Q-S-A-D-B. R1 learns this and transmits it to R2; R2 learns it and transmits it to R3; R3 learns it and transmits it to R4. R4 will then have learned the entire sequence!

Other research issues: Is this the optimal way to generate pseudopatterns? Consider the following function learned by an ANN.

There are really two parts to the function.

With a uniform distribution of pseudopatterns, there are too many pseudopatterns in one part of the function and not enough in the other.

A much better distribution of pseudopatterns can be obtained by using feedback (Holland, 1975, 1992).

Noise, dynamical agents and psychological theory The apparent ubiquity of 1/f noise … may inform and guide development of psychological theory (e.g., Gilden, 2001; Van Orden et al., 2003; Wagenmakers, Farrell, & Ratcliff, in press; Ward, 2002). One claim made on the basis of such findings is that 1/f noise reflects the emergent global dynamics of locally interacting agents. (Farrell, Wagenmakers, Ratcliff, submitted)

Conclusions. For engineers: the dual-network architecture using pseudopattern information transfer solves the problem of catastrophic forgetting in artificial neural networks, both for multiple-pattern learning and for multiple-sequence learning. For cognitive psychologists and neuroscientists: rather than noise being a problem, there is a good chance that evolution has found a way for the brain to turn noise to its advantage, as an efficient mechanism of information transfer.