
1 Computation and cognition: Unsupervised learning
Oren Shriki

2 Information Maximization and Critical Brain Dynamics

3 Information Processing in the Brain Involves Feedforward, Recurrent and Feedback Processing
Input layer Internal representations Output layer

4 Question What is the role of recurrent interactions in sensory processing?

5 Network Architecture Input Layer W K Output Layer

6 Dynamics of the recurrent network
F.F input Recurrent input
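The dynamical equations on this slide are shown only as images. A plausible form, consistent with the feedforward/recurrent split labeled above and with the steady-state output s used on the following slides (the exact equations in the talk may differ, e.g. in where the nonlinearity is applied), is:

```latex
\tau \frac{du_i}{dt} = -u_i + \sum_j W_{ij}\, x_j + \sum_j K_{ij}\, s_j ,
\qquad s_i = g(u_i) ,
```

where g is a sigmoidal transfer function, so the steady state satisfies s = g(Wx + Ks).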

7 Computational Task Maximize the mutual information between the (steady-state) output s and the input x with respect to W and K.

8 The Objective Function
Mutual Information Output Entropy Sensitivity Matrix

9 The Objective Function
Sensitivity Matrix
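The formulas on slides 8 and 9 appear only as images. A sketch of the standard infomax quantities they name, assuming small additive output noise (following Bell and Sejnowski, 1995, and Shriki, Sompolinsky and Lee, 2001); the exact noise and regularization terms used in the talk are not recoverable from the transcript:

```latex
I(s;x) = H(s) - H(s \mid x) ,
\qquad
\chi_{ij}(x) = \frac{\partial s_i}{\partial x_j}\bigg|_{\text{steady state}} .
```

In the low-noise limit the conditional entropy H(s|x) does not depend on W and K, so maximizing the mutual information amounts to maximizing the output entropy H(s); its parameter-dependent part is governed by the sensitivity matrix χ through an average log volume-expansion term of the form (1/2)·⟨ln det(χᵀχ)⟩, which is the geometric picture on slide 10.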

10 Geometrical Interpretation of the Objective Function
[Figure: the input x and the output s live in spaces of different dimensionality (N vs. M)]
Geometrically, the goal is to maximize the volume change produced by the transformation. This improves discrimination between similar inputs.

11 Learning Rules Feedforward Connections: Recurrent Connections: ,
For N=M and no recurrent interactions the network performs ICA (Independent Component Analysis). Shriki, Sompolinsky and Lee, 2001
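The learning-rule formulas on this slide are shown only as images and are not reproduced in the transcript. For the special case noted here (N = M, no recurrent interactions), the feedforward rule reduces to the standard infomax ICA rule of Bell and Sejnowski (1995); with a logistic nonlinearity, its natural-gradient form is:

```latex
\Delta W \;\propto\; \big[\, I + (1 - 2s)\, u^{\mathsf T} \,\big] W ,
\qquad u = W x , \quad s = g(u) = \frac{1}{1 + e^{-u}} .
```

The general rules for W and K in the recurrent, possibly overcomplete case are derived in Shriki, Sompolinsky and Lee (2001).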

12 Implementing the Learning Rules
1. For each data sample:
   a) Compute the total feedforward input to each neuron.
   b) Run a simulation of the recurrent network to its steady state.
   c) Calculate the contribution to the updates of W and K.
2. Update W and K using the learning rules.
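A minimal runnable sketch of this procedure in Python/NumPy. The talk's closed-form update rules for W and K are not reproduced in the transcript, so this sketch instead ascends the sensitivity term of the objective, (1/2)·ln det(χᵀχ), by numerical gradients; the logistic nonlinearity, the dimensions, the input distribution and all hyper-parameters are illustrative assumptions, not the original algorithm.

```python
# Sketch of the per-sample procedure: (a) feedforward input, (b) relax the
# recurrent network to its steady state, (c) compute the contribution to the
# updates, then update W and K.  The updates below ascend the sensitivity term
# 0.5*ln det(chi^T chi) by numerical gradients (a stand-in for the analytic
# rules of slide 11, which are not reproduced in the transcript).
import numpy as np

rng = np.random.default_rng(0)
N, M = 2, 3                                  # input / output dimensions (illustrative)
W = rng.normal(scale=0.5, size=(M, N))       # feedforward connections
K = np.zeros((M, M))                         # recurrent connections

def g(u):                                    # logistic transfer function (assumed)
    return 1.0 / (1.0 + np.exp(-u))

def steady_state(x):
    h = W @ x                                # (a) total feedforward input
    s = g(h)
    for _ in range(200):                     # (b) relax the recurrent dynamics
        s += 0.1 * (g(h + K @ s) - s)
    return s, h

def sensitivity_term(x):
    # (c) 0.5*ln det(chi^T chi), chi = (I - D K)^(-1) D W, D = diag(g'(u*))
    s, h = steady_state(x)
    u = h + K @ s
    D = np.diag(g(u) * (1.0 - g(u)))
    chi = np.linalg.solve(np.eye(M) - D @ K, D @ W)
    _, logdet = np.linalg.slogdet(chi.T @ chi)
    return 0.5 * logdet

def numerical_grad(f, P, eps=1e-5):          # central-difference gradient in P
    G = np.zeros_like(P)
    for idx in np.ndindex(P.shape):
        P[idx] += eps; fp = f()
        P[idx] -= 2 * eps; fm = f()
        P[idx] += eps
        G[idx] = (fp - fm) / (2 * eps)
    return G

eta = 0.05
for step in range(500):
    x = rng.normal(size=N)                   # placeholder input distribution
    W += eta * numerical_grad(lambda: sensitivity_term(x), W)
    K += eta * numerical_grad(lambda: sensitivity_term(x), K)
    np.fill_diagonal(K, 0.0)                 # assume no self-coupling
```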

13 Example: 3D Representation of 2D Inputs
Input distribution:

14 A 3D View of the Transformation Performed by the Network
Two different views of the resulting surface. The transformation tries to span the dynamic range of each output neuron in order to achieve the maximum-entropy representation.

15 The Basic Message of this Talk
Maximizing mutual information means maximizing sensitivity (susceptibility); in recurrent networks, the network parameters therefore organize so that the network operates near a critical point.

16 Role of Recurrent Connections in Early Visual Processing

17 Analytical Results

18 A Simple Model of a Hypercolumn
“Retina”/ ”LGN” “Cortex”

19 Input Distribution and Feedforward Connections
The inputs are characterized by their orientation and contrast. The orientations are uniformly distributed. The contrast distribution is Gaussian (we denote the mean contrast by r). Orientation Contrast

20 Input Distribution and Feedforward Connections
The inputs are characterized by their orientation and contrast. The orientations are uniformly distributed. The contrast distribution is Gaussian (we denote the mean contrast by r). The feedforward connections are set such that the input to each neuron has a cosine tuning curve around a certain preferred orientation (PO). The feedforward connections are fixed throughout the learning stage.

21 Input Distribution and Feedforward Connections
Angle = “orientation” Radius = “contrast” The rows of W are unit vectors, uniformly distributed over all possible angles. Thus, the input to each neuron has a cosine tuning curve.
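Writing this out explicitly (directly from the slide's description; θ is the stimulus angle, c the contrast, and θ_i the preferred orientation of neuron i):

```latex
x = c\,(\cos\theta,\ \sin\theta), \qquad
w_i = (\cos\theta_i,\ \sin\theta_i)
\quad\Longrightarrow\quad
h_i = w_i \cdot x = c\,\cos(\theta - \theta_i) ,
```

so the feedforward input to each neuron is a cosine tuning curve around its preferred orientation, with amplitude set by the contrast.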

22 Analytical Results In the limit of low contrast (r → 0), the optimal pattern of interactions can be calculated analytically. The profile of interactions as a function of the distance between preferred orientations (POs) is predicted to have a cosine shape. The amplitude of the cosine profile is:

23 From normal amplification to hallucinations
At the optimal amplitude the network dynamics undergoes a transition to a state of 'spontaneous symmetry breaking': for higher amplitudes, an activity profile with a finite magnitude forms even when the input is effectively uniform (apart from some random noise). The peak orientation is arbitrary, depending on the noise. This can be thought of as a "hallucination".

24 The dynamics of a recurrent network with symmetric interactions is governed by an energy function. The shape of the energy function changes at the transition:
[Figure: energy landscape as a function of the activity, for k1 < kc and for k1 > kc]
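The energy function itself appears only in the figure. For graded-response dynamics with symmetric interactions K and external (feedforward) input h, one standard Lyapunov function (Hopfield, 1984; Cohen and Grossberg, 1983) has the form:

```latex
E(s) = -\tfrac{1}{2}\sum_{i,j} K_{ij}\, s_i s_j \;-\; \sum_i h_i s_i \;+\; \sum_i \int_0^{s_i} g^{-1}(v)\, dv ,
```

which decreases along the dynamics, so the steady states are its local minima. The change of shape sketched in the figure corresponds to the single symmetric minimum being replaced by symmetry-broken minima once the interaction amplitude k1 exceeds kc. The exact energy function used in the talk may differ in detail.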

25 The Network Operates Near a Critical Point
A dynamical system is said to be at a critical state if it operates on the border between two qualitatively different types of behavior. Physical systems at a critical point have maximal susceptibility (sensitivity) to external inputs. Here, the network optimizes susceptibility and tends to operate near a critical point.

26 Numerical Results

27 Training with r=0.9: Input Amplification

28 r = 0.9 Pattern of Interactions
The amplitude of the interaction profile is ~5 (normalized by the number of neurons).

29 r = 0.9 Objective Function and Convergence Time

30 Training with r=0.1: Input Amplification

31 r = 0.1 Pattern of Interactions
The amplitude of the interaction profile is ~8 (normalized by the number of neurons).

32 r = 0.1 Objective Function and Convergence Time

33 r = 0.1 Objective Function and Convergence Time
Larger learning rate!

34 Training the Network with Natural Images
Our visual system has evolved to pick up relevant visual information quickly and efficiently, but with minimal assumptions about exactly what we see (since such assumptions could lead to frequent hallucinations!). An important assumption that visual systems can make is that the statistical nature of natural scenes is fairly stable. How can a visual system, either natural or synthetic, extract maximum information from a visual scene most efficiently? Infomax ICA, developed under ONR funding by Bell and Sejnowski, does just this. It is a neural network approach to blind signal processing that seeks to maximize the total information (in Shannon's sense) in its output channels, given its input; this is equivalent to minimizing the mutual information between pairs of outputs. Applied to image patches from natural scenes by Tony Bell and others, ICA derives maximally informative sets of visual patch filters that strongly resemble the receptive fields of primary visual neurons.
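A minimal sketch of infomax ICA on natural-image patches (the feedforward special case described in these notes), using the standard Bell and Sejnowski natural-gradient rule with PCA whitening. The image here is a random stand-in; with real natural images the learned filters become Gabor-like, as on the following slides. All sizes and hyper-parameters are illustrative.

```python
# Infomax ICA on image patches (Bell & Sejnowski, 1995) -- illustrative sketch.
import numpy as np

rng = np.random.default_rng(0)
img = rng.random((256, 256))            # stand-in image: replace with a natural image
P, n_patches = 8, 20000                 # 8x8 patches

# Extract random patches and remove the mean of each pixel dimension
r = rng.integers(0, img.shape[0] - P, n_patches)
c = rng.integers(0, img.shape[1] - P, n_patches)
X = np.stack([img[i:i+P, j:j+P].ravel() for i, j in zip(r, c)], axis=1)
X -= X.mean(axis=1, keepdims=True)

# PCA whitening
evals, evecs = np.linalg.eigh(X @ X.T / n_patches)
Wht = evecs @ np.diag(1.0 / np.sqrt(evals + 1e-8)) @ evecs.T
Z = Wht @ X

# Natural-gradient infomax rule: dW ~ (I + (1 - 2*g(U)) U^T) W, g = logistic
d = P * P
W = np.eye(d)
eta, batch = 0.01, 100
for it in range(2000):
    U = W @ Z[:, rng.integers(0, n_patches, batch)]
    Y = 1.0 / (1.0 + np.exp(-U))
    W += eta * (np.eye(d) + (1.0 - 2.0 * Y) @ U.T / batch) @ W

filters = W @ Wht                        # rows are the learned filters in pixel space
```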

35 Network Architecture “Retina”/ ”LGN” W K “Cortex”

36 Feedforward Filters Evolve to be Gabor-like Filters

37 Feedforward Filters Evolve to be Gabor-like Filters

38 Feedforward Filters
(The filters were set manually)

39 Pattern of Interactions

40 Pattern of Interactions

41 Critical Slowing Down σ - a scaling parameter that multiplies the recurrent interaction matrix
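A small numerical illustration of critical slowing down (an arbitrary symmetric matrix, not the learned interaction profile from the talk): for linearized dynamics τ ds/dt = −s + σ K s, the slowest relaxation time scales like 1/(1 − σ·λmax(K)) and diverges as σ approaches its critical value.

```python
# Critical slowing down in a linear recurrent network (illustrative example).
import numpy as np

rng = np.random.default_rng(1)
n = 50
A = rng.normal(size=(n, n))
K = (A + A.T) / 2
K /= np.linalg.eigvalsh(K).max()        # normalize so that lambda_max(K) = 1

def relaxation_time(sigma, dt=0.01, tol=1e-3):
    """Time for ds/dt = -s + sigma*K@s to decay from a random state to |s| < tol."""
    s = rng.normal(size=n)
    s /= np.linalg.norm(s)
    t = 0.0
    while np.linalg.norm(s) > tol and t < 1e5:
        s += dt * (-s + sigma * (K @ s))
        t += dt
    return t

for sigma in [0.5, 0.9, 0.99]:          # the critical value is sigma = 1 here
    print(f"sigma = {sigma:4.2f}  relaxation time ~ {relaxation_time(sigma):7.1f}"
          "  (grows like 1/(1 - sigma))")
```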

42 Near-Criticality as a Universal Computational Principle
Operating near critical points may be a general principle used by recurrent networks in order to increase their sensitivity to external inputs. Critical systems have universal properties which do not depend on the specific microscopic details (e.g., details of single neuron dynamics and synaptic interactions). Several scenarios (e.g., high plasticity) can lead networks to become super-critical, which may manifest as hallucinations and more generally as neurological or neuropsychiatric phenomena.

43 One small step for the synapses – one giant leap for the network dynamics.

44 Application of the Model to Tinnitus
W K

45 Input attenuation leads to hallucinations in the deprived frequency range
[Figure: output before vs. after input attenuation; hallucination in the deprived range; Spont. Activity = 0.5]

46 Application of the Model to Synaesthesia
Module 1 Module 2 Input Output

47 Definitions of Synaesthesia
“For people with synesthesia, everyday activities (e.g., reading, listening to music) trigger unusual sensations (e.g. colours, tastes). These sensations are not triggered in everyone, but for synesthetes, they’re immediate, spontaneous, and tend to be consistent over time”. Baron-Cohen: Stimulation of one sensory modality automatically triggers a perception in a second sensory modality, in the absence of any stimulation in the second modality.

48

49

50 A Simple Network Model for Studying the Conditions for Cross-talk
Unit 1 Unit 2

51 Analytical Results

52 “Sensory Deprivation” Scenario
Unit 1 Unit 2

53 “Sensory Deprivation” Scenario
Cross-talk No Cross-talk Cross-talk

54 “Sensory Deprivation” Scenario
Unit 1 Unit 2

55 “Sensory Flooding” Scenario
Cross-talk No Cross-talk Cross-talk

56 What happens if we increase the learning rate?

57 “High Plasticity” Scenario
Cross-talk

58 Numerical Results

59 Two Coupled Hypercolumns
Input Hypercolumn 1 Hypercolumn 2 Output

60 “No Synaesthesia” Conditions
Pattern of Interactions r1 = 0.2 r2 = 0.2 Learning rate =

61 “No Synaesthesia” Conditions
Inducer-Concurrent Map

62 “Sensory Deprivation” Scenario
Pattern of Interactions r1 = 0.2 r2 = Learning rate =

63 “Sensory Deprivation” Scenario
Inducer-Concurrent Map

64 “High Plasticity” Scenario
Pattern of Interactions r1 = 0.2 r2 = 0.2 Learning rate = 0.002

65 “High Plasticity” Scenario
Inducer-Concurrent Map

66 The Phase of the Mapping is Arbitrary
Inducer-Concurrent Map

67

68

69 Near-Criticality as a Universal Computational Principle
The correlation between sensory discrimination and general intelligence leads to the hypothesis that criticality may also play a role in the neural dynamics of the networks underlying intelligence and cognitive control (this is not a logical necessity, just a hypothesis). Such a system would have a rich repertoire of representations and a good ability to discriminate among them. Criticality also lends itself to self-similar representations (scaling).

70 Criticality and other NN architectures
Reservoir Computing: a random recurrent network generates a high-dimensional representation of its inputs, and a simple linear readout of the reservoir state achieves good performance on function approximation and decision tasks. Reservoirs operating near criticality tend to perform better.
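An illustrative echo-state-network sketch (not from the talk): a random reservoir whose recurrent weights are scaled toward the edge of stability (spectral radius near 1, a common operational proxy for "critical"), with a linear ridge-regression readout on a simple delayed-recall task. The task and all parameters are placeholders.

```python
# Echo state network with a linear readout, at sub-, near- and super-critical scaling.
import numpy as np

rng = np.random.default_rng(0)
T, n_res, delay = 5000, 200, 10
u = rng.uniform(-1, 1, T)                          # scalar input stream
target = np.roll(u, delay)                         # task: recall the input 10 steps back

W_in = rng.uniform(-0.5, 0.5, n_res)
W_res = rng.normal(size=(n_res, n_res))
W_res *= 1.0 / np.max(np.abs(np.linalg.eigvals(W_res)))   # spectral radius = 1

def run_reservoir(rho):
    X = np.zeros((T, n_res))
    x = np.zeros(n_res)
    for t in range(T):
        x = np.tanh(rho * W_res @ x + W_in * u[t])
        X[t] = x
    return X

for rho in [0.5, 0.95, 1.2]:                       # sub-, near-, super-critical scaling
    X = run_reservoir(rho)
    A, y = X[delay:], target[delay:]
    w = np.linalg.solve(A.T @ A + 1e-6 * np.eye(n_res), A.T @ y)   # ridge readout
    mse = np.mean((A @ w - y) ** 2)
    print(f"spectral radius {rho:4.2f}:  readout MSE = {mse:.4f}")
```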

71 Criticality and other NN architectures
Deep Learning: attempts to model high-level abstractions in data using architectures composed of multiple non-linear transformations, so that abstract, higher-level concepts are learned from lower-level ones. Hypothesis: stacking infomax recurrent networks would produce a deep network that operates near criticality.

