Computation and Cognition: Unsupervised Learning
Oren Shriki
Information Maximization and Critical Brain Dynamics
Information Processing in the Brain Involves Feedforward, Recurrent and Feedback Processing (input layer, internal representations, output layer)
Question What is the role of recurrent interactions in sensory processing?
Network Architecture: input layer, feedforward connections W, recurrent connections K, output layer
Dynamics of the Recurrent Network: each neuron receives a feedforward (F.F.) input and a recurrent input.
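The steady-state computation used throughout the talk can be sketched as a simple rate model. The tanh nonlinearity, unit time constant, and Euler step size below are illustrative assumptions, since the exact dynamics appear only on the slide figure.

```python
import numpy as np

def steady_state(x, W, K, g=np.tanh, dt=0.1, steps=500):
    """Euler integration of rate dynamics of the form
    ds/dt = -s + g(W x + K s), returning the approximate fixed point.
    The feedforward drive W x is computed once and held fixed."""
    s = np.zeros(K.shape[0])
    ff = W @ x                       # feedforward input
    for _ in range(steps):
        s += dt * (-s + g(ff + K @ s))   # recurrent input enters via K @ s
    return s
```

With no recurrent interactions (K = 0), the fixed point is simply g(W x), which is a useful sanity check.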
Computational Task Maximize the mutual information between the (steady-state) output s and the input x with respect to W and K.
The Objective Function Mutual Information Output Entropy Sensitivity Matrix
Geometrical Interpretation of the Objective Function: the network maps the input x to the output s (generally of different dimensionality). Geometrically, the goal is to maximize the volume change induced by the transformation; this improves discrimination between similar inputs.
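For a smooth map with equal input and output dimensions, this volume-change reading corresponds to the log-determinant of the Jacobian. A minimal finite-difference sketch (my own illustration, not code from the talk):

```python
import numpy as np

def log_abs_det_jacobian(f, x, eps=1e-5):
    """Finite-difference Jacobian of a square map f at x, and log|det J|.
    A larger value means the map locally expands volume more, i.e.
    nearby inputs are pushed further apart in output space."""
    n = x.size
    J = np.empty((n, n))
    for j in range(n):
        dx = np.zeros(n)
        dx[j] = eps
        J[:, j] = (f(x + dx) - f(x - dx)) / (2 * eps)   # central difference
    sign, logdet = np.linalg.slogdet(J)
    return logdet
```

For example, the map f(x) = 2x doubles every coordinate, so in 2D it expands areas by a factor of 4 and the function returns log 4.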
Learning Rules for the feedforward connections W and the recurrent connections K. For N = M and no recurrent interactions, the network performs ICA (Independent Component Analysis). (Shriki, Sompolinsky and Lee, 2001)
Implementing the Learning Rules
1. For each data sample:
   a) Compute the total feedforward input to each neuron.
   b) Run a simulation of the recurrent network.
   c) Calculate the contribution to the update of W and K.
2. Update W and K using the learning rules.
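A minimal sketch of this loop for the special case noted on the previous slide (N = M, no recurrent interactions), where the infomax gradient reduces to the Bell-Sejnowski ICA rule with a logistic nonlinearity. The general W and K updates from the slides are not reproduced here, and the mixing matrix and learning rate are assumed values.

```python
import numpy as np

def infomax_step(W, x, eta=0.01):
    """One infomax update for the special case N = M with no recurrent
    interactions (Bell-Sejnowski ICA), logistic nonlinearity, in
    natural-gradient form: dW = eta * (I + (1 - 2y) u^T) W."""
    u = W @ x                              # 1a) total feedforward input
    y = 1.0 / (1.0 + np.exp(-u))           # output (no recurrent dynamics here)
    n = W.shape[0]
    return W + eta * (np.eye(n) + np.outer(1.0 - 2.0 * y, u)) @ W

# Toy run: adapt W on a linear mixture of independent Laplacian sources.
rng = np.random.default_rng(0)
A = np.array([[1.0, 0.6], [0.4, 1.0]])     # mixing matrix (assumed example)
W = np.eye(2)
for s in rng.laplace(size=(5000, 2)):
    W = infomax_step(W, s @ A.T)           # 2) update W after each sample
```

At u = 0 the logistic output is 0.5, the nonlinear term vanishes, and one step reduces to W ← (1 + eta) W, which makes the rule easy to sanity-check.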
Example: 3D Representation of 2D Inputs Input distribution:
A 3D View of the Transformation Performed by the Network Two different views of the resulting surface The transformation tries to span the dynamic range of each output neuron in order to achieve the maximum entropy representation.
The Basic Message of this Talk: maximizing mutual information = maximizing sensitivity (susceptibility); in recurrent networks, the network parameters organize to operate near a critical point.
Role of Recurrent Connections in Early Visual Processing
Analytical Results
A Simple Model of a Hypercolumn “Retina”/ ”LGN” “Cortex”
Input Distribution and Feedforward Connections The inputs are characterized by their orientation and contrast. The orientations are uniformly distributed. The contrast distribution is Gaussian (we denote the mean contrast by r). The feedforward connections are set such that the input to each neuron has a cosine tuning curve around a certain preferred orientation (PO). The feedforward connections are fixed throughout the learning stage.
Input Distribution and Feedforward Connections Angle = “orientation” Radius = “contrast” The rows of W are unit vectors, uniformly distributed over all possible angles. Thus, the input to each neuron has a cosine tuning curve.
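A quick sketch of this construction (the number of neurons and the input values are illustrative):

```python
import numpy as np

N = 8                                        # number of cortical neurons
po = 2 * np.pi * np.arange(N) / N            # preferred orientations (POs)
W = np.column_stack([np.cos(po), np.sin(po)])  # unit-vector rows of W

theta, r = 1.0, 0.9                          # input orientation and contrast
x = r * np.array([np.cos(theta), np.sin(theta)])

ff = W @ x   # feedforward input: r * cos(theta - PO), a cosine tuning curve
```

Each row of W picks out the component of the input along its preferred orientation, which is exactly the cosine tuning curve stated above.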
Analytical Results. In the limit of low contrast (r → 0), the optimal pattern of interactions can be calculated analytically. The profile of interactions as a function of distance in POs is predicted to have a cosine shape. The amplitude of the cosine profile is:
From Normal Amplification to Hallucinations. At the optimal amplitude, the network dynamics undergoes a transition to a state of 'spontaneous symmetry breaking': for higher amplitudes, an activity profile of finite magnitude forms even when the input is effectively uniform (up to some random noise). The peak orientation is arbitrary, depending on the noise. This can be thought of as a "hallucination".
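This symmetry breaking can be demonstrated in a toy ring network. In this simplified tanh model (my own simplification, not the slide's exact model), the cosine mode destabilizes at amplitude k1 = 2, so k1 = 2.5 is super-critical:

```python
import numpy as np

rng = np.random.default_rng(0)
N = 64
phi = 2 * np.pi * np.arange(N) / N
k1 = 2.5                                   # above the critical amplitude
K = (k1 / N) * np.cos(phi[:, None] - phi[None, :])   # cosine interaction profile
x = 0.01 * rng.normal(size=N)              # effectively uniform input + noise

s = np.zeros(N)
for _ in range(5000):                      # relax to the steady state
    s += 0.05 * (-s + np.tanh(K @ s + x))
# s is now a cosine-shaped activity bump of finite amplitude; its peak
# location is set by the noise, not by any structure in the input.
```

The emergent bump is orders of magnitude larger than the input, which is the "hallucination" described above.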
The dynamics of a recurrent network with symmetric interactions is governed by an energy function. The shape of the energy function changes at the transition: for k1 < kc there is a single minimum, while for k1 > kc minima at finite amplitude appear.
The Network Operates Near a Critical Point A dynamical system is said to be at a critical state if it operates on the border between two qualitatively different types of behavior. Physical systems at a critical point have maximal susceptibility (sensitivity) to external inputs. Here, the network optimizes susceptibility and tends to operate near a critical point.
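In a linearized version of the ring model (again my simplification), the susceptibility to a cosine-tuned input can be computed directly, and it diverges as the interaction amplitude approaches its critical value:

```python
import numpy as np

N = 64
phi = 2 * np.pi * np.arange(N) / N
x = np.cos(phi)                            # weak cosine-tuned input pattern

def gain(k1):
    """Amplification of a cosine input by the linear steady state
    s = (I - K)^(-1) x with K_ij = (k1/N) cos(phi_i - phi_j).
    The cosine mode has eigenvalue k1/2, so gain = 1/(1 - k1/2),
    diverging as k1 -> kc = 2: maximal susceptibility at criticality."""
    K = (k1 / N) * np.cos(phi[:, None] - phi[None, :])
    s = np.linalg.solve(np.eye(N) - K, x)
    return np.max(np.abs(s))
```

For example, gain(1.0) = 2, and the gain keeps growing without bound as k1 approaches 2 from below.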
Numerical Results
Training with r=0.9: Input Amplification
r = 0.9 Pattern of Interactions The amplitude of the interaction profile is ~5 (normalized by the number of neurons).
r = 0.9 Objective Function and Convergence Time
Training with r=0.1: Input Amplification
r = 0.1 Pattern of Interactions The amplitude of the interaction profile is ~8 (normalized by the number of neurons).
r = 0.1 Objective Function and Convergence Time
r = 0.1 Objective Function and Convergence Time (with a larger learning rate)
Training the Network with Natural Images
Our visual system has evolved to pick up relevant visual information quickly and efficiently, but with minimal assumptions about exactly what we see (since strong assumptions could lead to frequent hallucinations). One assumption a visual system can safely make is that the statistical nature of natural scenes is fairly stable. How can a visual system, natural or synthetic, extract maximum information from a visual scene most efficiently? Infomax ICA, developed under ONR funding by Bell and Sejnowski, does just this. Infomax ICA is a neural network approach to blind signal processing that seeks to maximize the total information (in Shannon's sense) in its output channels, given its input. This is equivalent to minimizing the mutual information between pairs of outputs. Applied to image patches from natural scenes by Tony Bell and others, ICA derives maximally informative sets of visual patch filters that strongly resemble the receptive fields of primary visual neurons.
Network Architecture “Retina”/ ”LGN” W K “Cortex”
Feedforward Filters Evolve to be Gabor-like Filters
Feedforward Filters (the filters were set manually)
Pattern of Interactions
Critical Slowing Down. σ: a scaling parameter that multiplies the recurrent interaction matrix.
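Critical slowing down can be measured directly in simulation: the time to settle grows as σ approaches the critical value. The linear dynamics and specific matrix below are my own illustrative choices:

```python
import numpy as np

def settling_time(sigma, K0, x, dt=0.05, tol=1e-4, max_steps=10**6):
    """Number of Euler steps until the linear dynamics
    ds/dt = -s + sigma * K0 @ s + x stop changing (|ds/dt| < tol).
    As sigma approaches the value where the largest eigenvalue of
    sigma * K0 reaches 1, the slowest time constant diverges and
    settling takes longer and longer: critical slowing down."""
    K = sigma * K0
    s = np.zeros_like(x, dtype=float)
    for t in range(max_steps):
        v = -s + K @ s + x
        if np.max(np.abs(v)) < tol:
            return t
        s += dt * v
    return max_steps
```

With a K0 whose largest eigenvalue is 1, the critical point sits at σ = 1, so settling at σ = 0.95 takes far longer than at σ = 0.5.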
Near-Criticality as a Universal Computational Principle Operating near critical points may be a general principle used by recurrent networks in order to increase their sensitivity to external inputs. Critical systems have universal properties which do not depend on the specific microscopic details (e.g., details of single neuron dynamics and synaptic interactions). Several scenarios (e.g., high plasticity) can lead networks to become super-critical, which may manifest as hallucinations and more generally as neurological or neuropsychiatric phenomena.
One small step for the synapses – one giant leap for the network dynamics.
Application of the Model to Tinnitus W K
Input attenuation leads to hallucinations in the deprived frequency range (before vs. after input attenuation; spontaneous activity = 0.5).
Application of the Model to Synaesthesia Module 1 Module 2 Input Output
Definitions of Synaesthesia
"For people with synesthesia, everyday activities (e.g., reading, listening to music) trigger unusual sensations (e.g., colours, tastes). These sensations are not triggered in everyone, but for synesthetes, they're immediate, spontaneous, and tend to be consistent over time."
Baron-Cohen: stimulation of one sensory modality automatically triggers a perception in a second sensory modality, in the absence of any stimulation in the second modality.
A Simple Network Model for Studying the Conditions for Cross-talk Unit 1 Unit 2
Analytical Results
“Sensory Deprivation” Scenario Unit 1 Unit 2
“Sensory Deprivation” Scenario Cross-talk No Cross-talk Cross-talk
“Sensory Deprivation” Scenario Unit 1 Unit 2
"Sensory Flooding" Scenario Cross-talk No Cross-talk Cross-talk
What happens if we increase the learning rate?
“High Plasticity” Scenario Cross-talk
Numerical Results
Two Coupled Hypercolumns Input Hypercolumn 1 Hypercolumn 2 Output
“No Synaesthesia” Conditions Pattern of Interactions r1 = 0.2 r2 = 0.2 Learning rate = 0.0006
“No Synaesthesia” Conditions Inducer-Concurrent Map
“Sensory Deprivation” Scenario Pattern of Interactions r1 = 0.2 r2 = 0.05 Learning rate = 0.0006
“Sensory Deprivation” Scenario Inducer-Concurrent Map
“High Plasticity” Scenario Pattern of Interactions r1 = 0.2 r2 = 0.2 Learning rate = 0.002
“High Plasticity” Scenario Inducer-Concurrent Map
The Phase of the Mapping is Arbitrary Inducer-Concurrent Map
Near-Criticality as a Universal Computational Principle. The correlation between sensory discrimination and general intelligence leads to the hypothesis that criticality may also play a role in the neural dynamics of networks underlying intelligence and cognitive control (not a logical necessity, just a hypothesis). Such a system would have a rich repertoire of representations and a good ability to discriminate among them. Criticality also lends itself to self-similar representations (scaling).
Criticality and other NN architectures Reservoir Computing: A random recurrent network generates a high-dimensional representation of inputs. Linear readout achieves very good performance (function approximation / decisions). Critical reservoir networks lead to better performance.
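A minimal reservoir sketch (my own illustration): the standard way to place a random recurrent network near criticality is to rescale its weight matrix to a spectral radius close to 1.

```python
import numpy as np

rng = np.random.default_rng(1)

def make_reservoir(n, spectral_radius):
    """Random recurrent weight matrix rescaled so its largest eigenvalue
    magnitude equals `spectral_radius`; values near 1 place the reservoir
    close to the edge of instability (near criticality)."""
    J = rng.normal(size=(n, n)) / np.sqrt(n)
    J *= spectral_radius / np.max(np.abs(np.linalg.eigvals(J)))
    return J

def run_reservoir(J, w_in, inputs):
    """Drive the reservoir with a scalar input sequence and collect the
    high-dimensional state trajectory (to be fed to a linear readout)."""
    s = np.zeros(J.shape[0])
    states = []
    for u in inputs:
        s = np.tanh(J @ s + w_in * u)
        states.append(s.copy())
    return np.array(states)
```

A linear readout is then trained on the collected states; the claim on the slide is that performance is best when the spectral radius sits near the critical value.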
Criticality and other NN architectures. Deep Learning: attempts to model high-level abstractions in data using architectures composed of multiple non-linear transformations; abstract, higher-level concepts are learned from lower-level ones. Hypothesis: stacking infomax recurrent networks will produce a deep learning network that operates near criticality.