Complementary Learning Systems


1 Complementary Learning Systems
McClelland, McNaughton & O’Reilly, 1995
McClelland & Goddard, 1996
Anthony Cate, March 22, 2001

2 Hippocampal Amnesic Syndrome
Patient HM: bilateral medial temporal lobectomy. Still alive (as of this 2001 talk).

3 Patient HM – Deficits
Explicit memory for events and episodes
New (arbitrary) facts
Paired-associate learning: “locomotive-dishtowel”

4 Patient HM – Preserved Abilities
Motor skill acquisition
Implicit learning – priming
Memories from before the damage, with a qualification…

5 Temporally Graded Retrograde Amnesia

6 The sea monster in your head:

7 Hippocampal Anatomy
Archicortex: phylogenetically older than the neocortex, with fewer layers
Much smaller than the neocortex, especially in humans

8 Inputs from diffuse areas of neocortex converge on hippocampus (via the Entorhinal Cortex)

9 Hippocampus forms a simple circuit:

10 Hippocampus encodes arbitrary conjunctions

11 Hippocampus reproduces patterns during sleep

12 The McC, McN & O’R model
Information processing takes place via the propagation of activation among neurons in the neocortical system

13 The McC, McN & O’R model
Implicit learning results from small changes to connections among the neurons active in each episode of processing
Single processing episodes produce item-specific effects
Skills arise through the accumulation of episodes

14 The McC, McN & O’R model
New arbitrary associations are based on learning that takes place in the hippocampal system, alongside the processing that occurs in the cortex
Bidirectional connections between the neocortex and the hippocampal system
Large connection changes in the hippocampus allow fast learning
Recall of new associations depends on pattern completion in the hippocampus (sketched below)
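To make "pattern completion" concrete, here is a minimal sketch (toy sizes and a plain Hebbian autoassociator, not the 1995 model itself): one sparse binary pattern is stored with a single large weight change on recurrent connections, and the full pattern is then recovered from a partial cue by letting the recurrent dynamics settle.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100
pattern = (rng.random(n) < 0.1).astype(float)   # a sparse binary pattern

# One-shot Hebbian storage on recurrent connections (a single large weight change).
v = 2 * pattern - 1                              # recode as +/-1 for the Hebb rule
W = np.outer(v, v)
np.fill_diagonal(W, 0)

# Partial cue: switch off half of the pattern's active units.
active = np.flatnonzero(pattern)
cue = pattern.copy()
cue[active[: len(active) // 2]] = 0

# Pattern completion: let the recurrent dynamics settle from the partial cue.
state = 2 * cue - 1
for _ in range(10):
    state = np.sign(W @ state)
    state[state == 0] = 1

recovered = (state > 0).astype(float)
print("fraction of units correctly recovered:", np.mean(recovered == pattern))
```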

15 The McC, McN & O’R model
Consolidation of a memory occurs through the accumulation of small weight changes in neocortical connections
These changes are spread over time
They result from repeated reinstatements of the hippocampal memory to the neocortex
This may happen when a memory is retrieved from the hippocampal system, and during sleep

16 Discovery of Structure
How the cortex learns about the world: Discovery of Structure via Interleaved Learning (Rumelhart, 1990)
Train a multi-layer network on a set of propositions about things in the world
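A toy sketch of the idea (the items, attributes, and network sizes below are assumed for illustration, not Rumelhart's actual training set): one-hot items feed a small "concept" layer, all propositions are trained by backpropagation interleaved on every pass at a slow rate, and the concept-layer patterns gradually come to reflect the similarity structure of the domain.

```python
import numpy as np

rng = np.random.default_rng(1)

items = ["canary", "robin", "salmon", "sunfish"]            # assumed toy items
attributes = ["can_fly", "has_wings", "can_swim", "has_gills", "is_living"]
# item -> attribute truth table (1 = proposition is true), assumed for illustration
targets = np.array([
    [1, 1, 0, 0, 1],   # canary
    [1, 1, 0, 0, 1],   # robin
    [0, 0, 1, 1, 1],   # salmon
    [0, 0, 1, 1, 1],   # sunfish
], dtype=float)

X = np.eye(len(items))                                      # one-hot item inputs
n_hidden = 3                                                # "concept" units
W1 = rng.normal(0, 0.1, (len(items), n_hidden))
W2 = rng.normal(0, 0.1, (n_hidden, len(attributes)))
lr = 0.05                                                   # slow learning rate

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

for epoch in range(20000):                                  # interleaved: every item, every pass
    h = sigmoid(X @ W1)
    y = sigmoid(h @ W2)
    dy = (y - targets) * y * (1 - y)                        # output delta (squared error)
    dh = (dy @ W2.T) * h * (1 - h)                          # backprop to the concept layer
    W2 -= lr * h.T @ dy
    W1 -= lr * X.T @ dh

# The bird items tend to end up with similar concept-layer patterns, distinct from the fish items.
print(np.round(sigmoid(X @ W1), 2))
```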

17 An intuitive view of conceptual structure:

18 Rumelhart 1990

19 How does the model discover this structure?
The concept-representation units exploit similarities between patterns
The most efficient way to reduce error is to group inputs with similar outputs on the same concept unit

20 With interleaved learning at a slow rate, concept unit representations differentiate

21 These patterns of activation describe a hierarchical structure!

22 Points from Rumelhart’s model
Interleaved learning allows a network to discover structure in an environment of patterns
A “hierarchical” relationship can be found in this structure, based on the graded similarity of particular patterns

23 Rumelhart’s model depends on:
Interleaved learning
A slow learning rate
A set of inputs that overlap in similarity
What if learning had to take place in a situation where none of these applied?

24 Failures of this architecture (why a neocortex alone is insufficient):
One-shot learning (= a high learning rate)
Focused learning (= non-interleaved training)

25 Catastrophic Interference!

26 Modeling Paired Associate Learning
McCloskey & Cohen, 1989. AB-AC paradigm:
List A        List B        List C
locomotive    dishtowel     seltzer
table         pinecone      headphones
weasel        jacket        waterfall
…             …             …

27 A simple model for paired associate learning:

28 The model completely fails to retain the AB associations once the AC associations are learned.
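A hedged sketch of this failure (toy one-hot encodings and parameters chosen for illustration, not McCloskey & Cohen's exact network): a small backprop network is trained to criterion on the AB pairs, then trained only on the AC pairs (focused learning); recall of the AB responses typically collapses.

```python
import numpy as np

rng = np.random.default_rng(2)
n_pairs, n_hidden = 3, 10
A = np.eye(n_pairs)                                   # one-hot A cues (locomotive, table, weasel)
ctx = np.array([[1.0, 0.0], [0.0, 1.0]])              # list-1 vs list-2 context units
B_targets = np.hstack([np.eye(n_pairs), np.zeros((n_pairs, n_pairs))])   # B responses: outputs 0..2
C_targets = np.hstack([np.zeros((n_pairs, n_pairs)), np.eye(n_pairs)])   # C responses: outputs 3..5

W1 = rng.normal(0, 0.3, (n_pairs + 2, n_hidden))
W2 = rng.normal(0, 0.3, (n_hidden, 2 * n_pairs))

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train(inputs, targets, epochs, lr=0.5):
    global W1, W2
    for _ in range(epochs):
        h = sigmoid(inputs @ W1)
        y = sigmoid(h @ W2)
        dy = (y - targets) * y * (1 - y)
        dh = (dy @ W2.T) * h * (1 - h)
        W2 -= lr * h.T @ dy
        W1 -= lr * inputs.T @ dh

def recall(inputs, targets):
    y = sigmoid(sigmoid(inputs @ W1) @ W2)
    return np.mean(np.argmax(y, axis=1) == np.argmax(targets, axis=1))

AB_in = np.hstack([A, np.tile(ctx[0], (n_pairs, 1))])
AC_in = np.hstack([A, np.tile(ctx[1], (n_pairs, 1))])

train(AB_in, B_targets, epochs=2000)                  # phase 1: learn the AB pairs
print("AB recall after AB training:", recall(AB_in, B_targets))
train(AC_in, C_targets, epochs=2000)                  # phase 2: focused AC training only
print("AB recall after AC training:", recall(AB_in, B_targets))   # typically collapses
print("AC recall after AC training:", recall(AC_in, C_targets))
```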

29 Catastrophic interference is a general phenomenon that occurs in complex models as well.
Training a network on a set of patterns with backprop and similar algorithms tunes the weights to minimize error on that set
It guarantees nothing about other patterns

30 Catastrophic interference in Rumelhart’s model:

31 Catastrophic interference can be overcome by changing a model’s architecture or the set of training patterns
Use patterns with less overlap
Weight changes for each pattern then won’t affect the network’s performance on other patterns
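A small sketch of why this works (toy vectors assumed for illustration): with orthogonal, non-overlapping inputs, a linear associator's weight changes for one pattern leave its output for another pattern untouched.

```python
import numpy as np

x1 = np.array([1.0, 0.0, 0.0, 0.0])      # orthogonal (non-overlapping) inputs
x2 = np.array([0.0, 1.0, 0.0, 0.0])
t1, t2 = np.array([1.0, 0.0]), np.array([0.0, 1.0])

W = np.zeros((2, 4))
for _ in range(50):                        # train on pattern 1 only (delta rule)
    W += 0.2 * np.outer(t1 - W @ x1, x1)
out1_before = W @ x1

for _ in range(50):                        # focused training on pattern 2 only
    W += 0.2 * np.outer(t2 - W @ x2, x2)

print(np.allclose(out1_before, W @ x1))    # True: pattern 1's output is untouched
```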

32 But maybe we don’t want to make these changes.
Problem with less-overlapping patterns: no way to extract structure, because by definition there is no structure in the pattern set
Structure = covariations

33 Another problem with less overlap:
Less generalization
A novel input will never be very similar to the trained patterns, so the network cannot produce an appropriate output
We never see exactly the same thing twice

34 How arbitrary are real world exceptions and events?
“Pint”: most of the phonemes are regular
JFK assassination example: most of what we know about an event is drawn from existing associations in memory

35 The role of the hippocampus
Encodes patterns in a sparse, non-overlapping manner
Trains the cortex by reinstating each pattern repeatedly over time

36 A simple model: In this model, hippocampal and cortical representations are the same
Consolidation rate “C” determines the rate at which hippocampal units feed patterns to cortical units
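The consolidation-rate idea can be caricatured in a few lines (the layer sizes, rates, and linear "cortex" below are assumptions for illustration, not the paper's simulation): the hippocampus holds a new association after a single exposure and, at rate C, reinstates it to a slow-learning cortical network interleaved with the cortex's ordinary experience, so the new memory is absorbed without overwriting old ones.

```python
import numpy as np

rng = np.random.default_rng(3)
n_in, n_out = 20, 20
old_inputs = rng.standard_normal((5, n_in))        # the cortex's established knowledge
old_targets = rng.standard_normal((5, n_out))
# Start the cortical weights off already fitting the old associations.
W = np.linalg.lstsq(old_inputs, old_targets, rcond=None)[0].T

new_input = rng.standard_normal(n_in)              # a new arbitrary association...
new_target = rng.standard_normal(n_out)
hippocampus = (new_input, new_target)              # ...stored in one shot in the hippocampus

C, lr = 0.2, 0.01                                  # consolidation rate, slow cortical rate
for _ in range(20000):
    if rng.random() < C:                           # hippocampal reinstatement (recall or sleep)
        x, t = hippocampus
    else:                                          # ordinary, interleaved cortical experience
        i = rng.integers(5)
        x, t = old_inputs[i], old_targets[i]
    W += lr * np.outer(t - W @ x, x)               # small neocortical weight change

print("error on the new association:", round(float(np.linalg.norm(W @ new_input - new_target)), 3))
print("error on an old association: ", round(float(np.linalg.norm(W @ old_inputs[0] - old_targets[0])), 3))
```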

37 The role of the hippocampus
Sparse coding allows a high learning rate, since learning one pattern won’t interfere with weights for another pattern

38 A problem with sparse coding:
There are many fewer cells in the hippocampus than in the neocortex
How can the hippocampus encode patterns in a way that is both sparse and compressed?

39 Sparse coding and hidden unit activity:
Coarse coding Sparse coding Really sparse coding

40 Really, really sparse coding:

41 Random, conjunctive coding
k-winners-take-all: the weights between the k most active units and a randomly selected hidden unit are increased
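The slide's one-line description is compact, so here is one common reading of random, conjunctive coding as a sketch (fixed random weights and toy sizes assumed, not O'Reilly's exact implementation): project the input through random weights onto a much larger hidden layer and keep only the k most active units. Overlapping inputs then map to sparse hidden codes that overlap much less, which is the pattern-separation property the hippocampus needs.

```python
import numpy as np

rng = np.random.default_rng(4)
n_in, n_hidden, k = 100, 1000, 25

W = rng.random((n_hidden, n_in))                 # fixed random conjunctive weights

def kwta_code(x):
    """Indices of the k hidden units most strongly driven by input x."""
    return set(np.argsort(W @ x)[-k:])

# Two inputs that share roughly two thirds of their active units.
a = (rng.random(n_in) < 0.3).astype(float)
b = a.copy()
on = np.flatnonzero(a)
off = np.flatnonzero(a == 0)
flip = rng.choice(on, size=len(on) // 3, replace=False)
b[flip] = 0
b[rng.choice(off, size=len(flip), replace=False)] = 1

code_a, code_b = kwta_code(a), kwta_code(b)
print("input overlap :", round(float(np.sum(a * b) / np.sum(a)), 2))
print("hidden overlap:", len(code_a & code_b) / k)   # typically far lower than the input overlap
```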

42 Implementing this in a model of the hippocampal system:
Assume that the cortex employs componential coding (efficient, good for generalization): letters from a 9×9 pixel array can be encoded with 13 feature units instead of 81 pixel units

43 Much overlap between patterns
With componential coding, about 34% of the 13 input units will be active to represent a given letter
With the 9×9 = 81, 1-pixel-per-hidden-unit system, about 28% of the input units are active
Much overlap between patterns: good for generalization, bad for pattern separation
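A quick back-of-the-envelope check of these figures (treating units as independently active at the stated rates): for two random patterns, the expected fraction of one pattern's active units that are also active in the other is just the activity level itself, so both cortical codes overlap heavily, in contrast to the roughly 2% activity of the sparse code on the next slide.

```python
# Expected overlap between two random patterns, assuming each unit is active
# independently at the stated rate (an illustrative simplification).
for name, p, n in [("13-feature code", 0.34, 13), ("81-pixel code", 0.28, 81)]:
    shared = p * p * n        # expected number of units active in both patterns
    print(f"{name}: ~{shared:.1f} shared active units; overlap fraction ~{p:.0%}")
```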

44 Really, really sparse coding:
Have every possible combination of 3 input units correspond to 1 hidden unit
For an input of five 9×9-pixel letters (5 × 81 = 405 input units), there are over 10 million such triples!
A 5-letter input would activate about 230,000 of these triples
Only about 2% of the possible input triples are active: much more sparse
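The arithmetic behind these numbers can be checked directly (assuming, as above, that the 5-letter input spans five 9×9 pixel arrays):

```python
# Checking the slide's counts, assuming the 5-letter input spans five 9x9
# pixel arrays (5 * 81 = 405 input units) with about 28% of them active.
from math import comb

total_units = 5 * 81
active_units = round(0.28 * total_units)                         # about 113 active pixels

print(comb(total_units, 3))                                      # 10,989,810 triples: over 10 million
print(comb(active_units, 3))                                     # 234,136 active triples: about 230,000
print(round(comb(active_units, 3) / comb(total_units, 3), 3))    # 0.021: about 2%
```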

45

46 Sparse hippocampal representation
“Compressed” representation from cortex
Cortical representation

47 How this model maps onto anatomy:

48 Plausibility of this encoding scheme:
Many more cells in the Dentate Gyrus (hippocampus proper) than in the Entorhinal Cortex
An autoassociator (via recurrent collaterals) and a Hebbian pattern associator are also present in the basic hippocampal circuit

49 In short: Representations in cortex are compressed via connections with the Entorhinal Cortex
These coarse, componential representations are made sparse (random, conjunctive coding) via connections with the hippocampus proper
First compress, then sparsify
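Putting the two stages together, a schematic sketch (random weights and toy layer sizes, purely illustrative of the "first compress, then sparsify" idea rather than the actual anatomy or learning rules):

```python
import numpy as np

rng = np.random.default_rng(5)
n_cortex, n_ec, n_dg, k = 2000, 200, 10000, 50   # toy layer sizes (DG much larger than EC)

W_compress = rng.standard_normal((n_ec, n_cortex)) / np.sqrt(n_cortex)   # cortex -> EC bottleneck
W_expand = rng.standard_normal((n_dg, n_ec)) / np.sqrt(n_ec)             # EC -> dentate gyrus

def hippocampal_code(cortical_pattern):
    ec = W_compress @ cortical_pattern           # step 1: compress into the EC bottleneck
    dg_drive = W_expand @ ec                     # step 2: expand onto many DG units
    code = np.zeros(n_dg)
    code[np.argsort(dg_drive)[-k:]] = 1.0        # step 3: k-winners-take-all gives a sparse code
    return code

pattern = (rng.random(n_cortex) < 0.15).astype(float)
code = hippocampal_code(pattern)
print("cortical activity:", pattern.mean(), " DG activity:", code.mean())   # ~0.15 vs 0.005
```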

50 Problems with these models:
Does the hippocampus really encode all kinds of arbitrary associations, or just spatial maps?
Cortical learning implies that only a prototype is stored in memory, with no information about individual training events in the cortex

51 In summary: Important pairs of concepts:
Interleaved / Focused (training)
Slow / Fast (learning rates)
Coarse / Sparse (representations)

