
1 Computation and cognition: Unsupervised learning
Oren Shriki

2 Information Maximization and Critical Brain Dynamics

3 Information Processing in the Brain Involves Feedforward, Recurrent and Feedback Processing
Input layer Internal representations Output layer

4 Question What is the role of recurrent interactions in sensory processing?

5 Network Architecture Input Layer W K Output Layer

6 Dynamics of the recurrent network
F.F input Recurrent input
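The dynamical equations on this slide are shown only as images. A plausible form, consistent with the feedforward/recurrent split labeled above and with the steady-state output s used on the following slides (the exact equations in the talk may differ, e.g. in where the nonlinearity is applied), is:

```latex
\tau \frac{du_i}{dt} = -u_i + \sum_j W_{ij}\, x_j + \sum_j K_{ij}\, s_j ,
\qquad s_i = g(u_i) ,
```

where g is a sigmoidal transfer function, so the steady state satisfies s = g(Wx + Ks).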

7 Computational Task Maximize the mutual information between the (steady-state) output s and the input x with respect to W and K.

8 The Objective Function
Mutual Information Output Entropy Sensitivity Matrix

9 The Objective Function
Sensitivity Matrix
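The formulas on slides 8 and 9 appear only as images. A sketch of the standard infomax quantities they name, assuming small additive output noise (following Bell and Sejnowski, 1995, and Shriki, Sompolinsky and Lee, 2001); the exact noise and regularization terms used in the talk are not recoverable from the transcript:

```latex
I(s;x) = H(s) - H(s \mid x) ,
\qquad
\chi_{ij}(x) = \frac{\partial s_i}{\partial x_j}\bigg|_{\text{steady state}} .
```

In the low-noise limit the conditional entropy H(s|x) does not depend on W and K, so maximizing the mutual information amounts to maximizing the output entropy H(s); its parameter-dependent part is governed by the sensitivity matrix χ through an average log volume-expansion term of the form (1/2)·⟨ln det(χᵀχ)⟩, which is the geometric picture on slide 10.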

10 Geometrical Interpretation of the Objective Function
[Figure: the input x and the output s live in spaces of different dimensionality (N vs. M)]
Geometrically, the goal is to maximize the volume change produced by the transformation. This improves discrimination between similar inputs.

11 Learning Rules Feedforward Connections: Recurrent Connections: ,
For N=M and no recurrent interactions the network performs ICA (Independent Component Analysis). Shriki, Sompolinsky and Lee, 2001
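The learning-rule formulas on this slide are shown only as images and are not reproduced in the transcript. For the special case noted here (N = M, no recurrent interactions), the feedforward rule reduces to the standard infomax ICA rule of Bell and Sejnowski (1995); with a logistic nonlinearity, its natural-gradient form is:

```latex
\Delta W \;\propto\; \big[\, I + (1 - 2s)\, u^{\mathsf T} \,\big] W ,
\qquad u = W x , \quad s = g(u) = \frac{1}{1 + e^{-u}} .
```

The general rules for W and K in the recurrent, possibly overcomplete case are derived in Shriki, Sompolinsky and Lee (2001).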

12 Implementing the Learning Rules
1. For each data sample:
   a) Compute the total feedforward input to each neuron.
   b) Run a simulation of the recurrent network to its steady state.
   c) Calculate the contribution to the updates of W and K.
2. Update W and K using the learning rules.
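A minimal runnable sketch of this procedure in Python/NumPy. The talk's closed-form update rules for W and K are not reproduced in the transcript, so this sketch instead ascends the sensitivity term of the objective, (1/2)·ln det(χᵀχ), by numerical gradients; the logistic nonlinearity, the dimensions, the input distribution and all hyper-parameters are illustrative assumptions, not the original algorithm.

```python
# Sketch of the per-sample procedure: (a) feedforward input, (b) relax the
# recurrent network to its steady state, (c) compute the contribution to the
# updates, then update W and K.  The updates below ascend the sensitivity term
# 0.5*ln det(chi^T chi) by numerical gradients (a stand-in for the analytic
# rules of slide 11, which are not reproduced in the transcript).
import numpy as np

rng = np.random.default_rng(0)
N, M = 2, 3                                  # input / output dimensions (illustrative)
W = rng.normal(scale=0.5, size=(M, N))       # feedforward connections
K = np.zeros((M, M))                         # recurrent connections

def g(u):                                    # logistic transfer function (assumed)
    return 1.0 / (1.0 + np.exp(-u))

def steady_state(x):
    h = W @ x                                # (a) total feedforward input
    s = g(h)
    for _ in range(200):                     # (b) relax the recurrent dynamics
        s += 0.1 * (g(h + K @ s) - s)
    return s, h

def sensitivity_term(x):
    # (c) 0.5*ln det(chi^T chi), chi = (I - D K)^(-1) D W, D = diag(g'(u*))
    s, h = steady_state(x)
    u = h + K @ s
    D = np.diag(g(u) * (1.0 - g(u)))
    chi = np.linalg.solve(np.eye(M) - D @ K, D @ W)
    _, logdet = np.linalg.slogdet(chi.T @ chi)
    return 0.5 * logdet

def numerical_grad(f, P, eps=1e-5):          # central-difference gradient in P
    G = np.zeros_like(P)
    for idx in np.ndindex(P.shape):
        P[idx] += eps; fp = f()
        P[idx] -= 2 * eps; fm = f()
        P[idx] += eps
        G[idx] = (fp - fm) / (2 * eps)
    return G

eta = 0.05
for step in range(500):
    x = rng.normal(size=N)                   # placeholder input distribution
    W += eta * numerical_grad(lambda: sensitivity_term(x), W)
    K += eta * numerical_grad(lambda: sensitivity_term(x), K)
    np.fill_diagonal(K, 0.0)                 # assume no self-coupling
```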

13 Example: 3D Representation of 2D Inputs
Input distribution:

14 A 3D View of the Transformation Performed by the Network
Two different views of the resulting surface. The transformation tries to span the dynamic range of each output neuron in order to achieve the maximum-entropy representation.

15 The Basic Message of this Talk
Maximizing mutual information means maximizing sensitivity (susceptibility); in recurrent networks, the network parameters therefore organize so that the network operates near a critical point.

16 Role of Recurrent Connections in Early Visual Processing

17 Analytical Results

18 A Simple Model of a Hypercolumn
“Retina”/ ”LGN” “Cortex”

19 Input Distribution and Feedforward Connections
The inputs are characterized by their orientation and contrast. The orientations are uniformly distributed. The contrast distribution is Gaussian (we denote the mean contrast by r). Orientation Contrast

20 Input Distribution and Feedforward Connections
The inputs are characterized by their orientation and contrast. The orientations are uniformly distributed. The contrast distribution is Gaussian (we denote the mean contrast by r). The feedforward connections are set such that the input to each neuron has a cosine tuning curve around a certain preferred orientation (PO). The feedforward connections are fixed throughout the learning stage.

21 Input Distribution and Feedforward Connections
Angle = “orientation” Radius = “contrast” The rows of W are unit vectors, uniformly distributed over all possible angles. Thus, the input to each neuron has a cosine tuning curve.
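Writing this out explicitly (directly from the slide's description; θ is the stimulus angle, c the contrast, and θ_i the preferred orientation of neuron i):

```latex
x = c\,(\cos\theta,\ \sin\theta), \qquad
w_i = (\cos\theta_i,\ \sin\theta_i)
\quad\Longrightarrow\quad
h_i = w_i \cdot x = c\,\cos(\theta - \theta_i) ,
```

so the feedforward input to each neuron is a cosine tuning curve around its preferred orientation, with amplitude set by the contrast.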

22 Analytical Results In the limit of low contrast (r → 0), the optimal pattern of interactions can be calculated analytically. The profile of interactions as a function of the distance between preferred orientations (POs) is predicted to have a cosine shape. The amplitude of the cosine profile is:

23 From normal amplification to hallucinations
At the optimal amplitude the network dynamics undergoes a transition to a state of 'spontaneous symmetry breaking': for higher amplitudes, an activity profile with a finite magnitude forms even when the input is effectively uniform (apart from some random noise). The peak orientation is arbitrary, depending on the noise. This can be thought of as a "hallucination".

24 The dynamics of a recurrent network with symmetric interactions is governed by an energy function. The shape of the energy function changes at the transition:
[Figure: energy landscape as a function of the activity, for k1 < kc and for k1 > kc]
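The energy function itself appears only in the figure. For graded-response dynamics with symmetric interactions K and external (feedforward) input h, one standard Lyapunov function (Hopfield, 1984; Cohen and Grossberg, 1983) has the form:

```latex
E(s) = -\tfrac{1}{2}\sum_{i,j} K_{ij}\, s_i s_j \;-\; \sum_i h_i s_i \;+\; \sum_i \int_0^{s_i} g^{-1}(v)\, dv ,
```

which decreases along the dynamics, so the steady states are its local minima. The change of shape sketched in the figure corresponds to the single symmetric minimum being replaced by symmetry-broken minima once the interaction amplitude k1 exceeds kc. The exact energy function used in the talk may differ in detail.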

25 The Network Operates Near a Critical Point
A dynamical system is said to be at a critical state if it operates on the border between two qualitatively different types of behavior. Physical systems at a critical point have maximal susceptibility (sensitivity) to external inputs. Here, the network optimizes susceptibility and tends to operate near a critical point.

26 Numerical Results

27 Training with r=0.9: Input Amplification

28 r = 0.9 Pattern of Interactions
The amplitude of the interaction profile is ~5 (normalized by the number of neurons).

29 r = 0.9 Objective Function and Convergence Time

30 Training with r=0.1: Input Amplification

31 r = 0.1 Pattern of Interactions
The amplitude of the interaction profile is ~8 (normalized by the number of neurons).

32 r = 0.1 Objective Function and Convergence Time

33 r = 0.1 Objective Function and Convergence Time
Larger learning rate!

34 Training the Network with Natural Images
Our visual system has evolved to pick up relevant visual information quickly and efficiently, but with minimal assumptions about exactly what we see (since such assumptions could lead to frequent hallucinations!). An important assumption that visual systems can make is that the statistical nature of natural scenes is fairly stable. How can a visual system, either natural or synthetic, extract maximum information from a visual scene most efficiently? Infomax ICA, developed under ONR funding by Bell and Sejnowski, does just this. It is a neural network approach to blind signal processing that seeks to maximize the total information (in Shannon's sense) in its output channels, given its input; this is equivalent to minimizing the mutual information between pairs of outputs. Applied to image patches from natural scenes by Tony Bell and others, ICA derives maximally informative sets of visual patch filters that strongly resemble the receptive fields of primary visual neurons.
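A minimal sketch of infomax ICA on natural-image patches (the feedforward special case described in these notes), using the standard Bell and Sejnowski natural-gradient rule with PCA whitening. The image here is a random stand-in; with real natural images the learned filters become Gabor-like, as on the following slides. All sizes and hyper-parameters are illustrative.

```python
# Infomax ICA on image patches (Bell & Sejnowski, 1995) -- illustrative sketch.
import numpy as np

rng = np.random.default_rng(0)
img = rng.random((256, 256))            # stand-in image: replace with a natural image
P, n_patches = 8, 20000                 # 8x8 patches

# Extract random patches and remove the mean of each pixel dimension
r = rng.integers(0, img.shape[0] - P, n_patches)
c = rng.integers(0, img.shape[1] - P, n_patches)
X = np.stack([img[i:i+P, j:j+P].ravel() for i, j in zip(r, c)], axis=1)
X -= X.mean(axis=1, keepdims=True)

# PCA whitening
evals, evecs = np.linalg.eigh(X @ X.T / n_patches)
Wht = evecs @ np.diag(1.0 / np.sqrt(evals + 1e-8)) @ evecs.T
Z = Wht @ X

# Natural-gradient infomax rule: dW ~ (I + (1 - 2*g(U)) U^T) W, g = logistic
d = P * P
W = np.eye(d)
eta, batch = 0.01, 100
for it in range(2000):
    U = W @ Z[:, rng.integers(0, n_patches, batch)]
    Y = 1.0 / (1.0 + np.exp(-U))
    W += eta * (np.eye(d) + (1.0 - 2.0 * Y) @ U.T / batch) @ W

filters = W @ Wht                        # rows are the learned filters in pixel space
```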

35 Network Architecture “Retina”/ ”LGN” W K “Cortex”

36 Feedforward Filters Evolve to be Gabor-like Filters

37 Feedforward Filters Evolve to be Gabor-like Filters

38 Feedforward Filters
(The filters were set manually)

39 Pattern of Interactions

40 Pattern of Interactions

41 Critical Slowing Down σ - a scaling parameter that multiplies the recurrent interaction matrix
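A small numerical illustration of critical slowing down (an arbitrary symmetric matrix, not the learned interaction profile from the talk): for linearized dynamics τ ds/dt = −s + σ K s, the slowest relaxation time scales like 1/(1 − σ·λmax(K)) and diverges as σ approaches its critical value.

```python
# Critical slowing down in a linear recurrent network (illustrative example).
import numpy as np

rng = np.random.default_rng(1)
n = 50
A = rng.normal(size=(n, n))
K = (A + A.T) / 2
K /= np.linalg.eigvalsh(K).max()        # normalize so that lambda_max(K) = 1

def relaxation_time(sigma, dt=0.01, tol=1e-3):
    """Time for ds/dt = -s + sigma*K@s to decay from a random state to |s| < tol."""
    s = rng.normal(size=n)
    s /= np.linalg.norm(s)
    t = 0.0
    while np.linalg.norm(s) > tol and t < 1e5:
        s += dt * (-s + sigma * (K @ s))
        t += dt
    return t

for sigma in [0.5, 0.9, 0.99]:          # the critical value is sigma = 1 here
    print(f"sigma = {sigma:4.2f}  relaxation time ~ {relaxation_time(sigma):7.1f}"
          "  (grows like 1/(1 - sigma))")
```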

42 Near-Criticality as a Universal Computational Principle
Operating near critical points may be a general principle used by recurrent networks in order to increase their sensitivity to external inputs. Critical systems have universal properties which do not depend on the specific microscopic details (e.g., details of single neuron dynamics and synaptic interactions). Several scenarios (e.g., high plasticity) can lead networks to become super-critical, which may manifest as hallucinations and more generally as neurological or neuropsychiatric phenomena.

43 One small step for the synapses – one giant leap for the network dynamics.

44 Application of the Model to Tinnitus
W K

45 Input attenuation leads to hallucinations in the deprived frequency range
[Figure: output before vs. after input attenuation; hallucination in the deprived range; Spont. Activity = 0.5]

46 Application of the Model to Synaesthesia
Module 1 Module 2 Input Output

47 Definitions of Synaesthesia
“For people with synesthesia, everyday activities (e.g., reading, listening to music) trigger unusual sensations (e.g. colours, tastes). These sensations are not triggered in everyone, but for synesthetes, they’re immediate, spontaneous, and tend to be consistent over time”. Baron-Cohen: Stimulation of one sensory modality automatically triggers a perception in a second sensory modality, in the absence of any stimulation in the second modality.

48

49

50 A Simple Network Model for Studying the Conditions for Cross-talk
Unit 1 Unit 2

51 Analytical Results

52 “Sensory Deprivation” Scenario
Unit 1 Unit 2

53 “Sensory Deprivation” Scenario
Cross-talk No Cross-talk Cross-talk

54 “Sensory Deprivation” Scenario
Unit 1 Unit 2

55 “Sensory Flooding” Scenario
Cross-talk No Cross-talk Cross-talk

56 What happens if we increase the learning rate?

57 “High Plasticity” Scenario
Cross-talk

58 Numerical Results

59 Two Coupled Hypercolumns
Input Hypercolumn 1 Hypercolumn 2 Output

60 “No Synaesthesia” Conditions
Pattern of Interactions r1 = 0.2 r2 = 0.2 Learning rate =

61 “No Synaesthesia” Conditions
Inducer-Concurrent Map

62 “Sensory Deprivation” Scenario
Pattern of Interactions r1 = 0.2 r2 = Learning rate =

63 “Sensory Deprivation” Scenario
Inducer-Concurrent Map

64 “High Plasticity” Scenario
Pattern of Interactions r1 = 0.2 r2 = 0.2 Learning rate = 0.002

65 “High Plasticity” Scenario
Inducer-Concurrent Map

66 The Phase of the Mapping is Arbitrary
Inducer-Concurrent Map

67

68

69 Near-Criticality as a Universal Computational Principle
The correlation between sensory discrimination and general intelligence leads to the hypothesis that criticality may also play a role in the neural dynamics of the networks underlying intelligence and cognitive control (this is not a logical necessity, just a hypothesis). Such a system would have a rich repertoire of representations and a good ability to discriminate among them. Criticality also lends itself to self-similar representations (scaling).

70 Criticality and other NN architectures
Reservoir Computing: a random recurrent network generates a high-dimensional representation of its inputs, and a simple linear readout of the reservoir state achieves good performance on function approximation and decision tasks. Reservoirs operating near criticality tend to perform better.
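An illustrative echo-state-network sketch (not from the talk): a random reservoir whose recurrent weights are scaled toward the edge of stability (spectral radius near 1, a common operational proxy for "critical"), with a linear ridge-regression readout on a simple delayed-recall task. The task and all parameters are placeholders.

```python
# Echo state network with a linear readout, at sub-, near- and super-critical scaling.
import numpy as np

rng = np.random.default_rng(0)
T, n_res, delay = 5000, 200, 10
u = rng.uniform(-1, 1, T)                          # scalar input stream
target = np.roll(u, delay)                         # task: recall the input 10 steps back

W_in = rng.uniform(-0.5, 0.5, n_res)
W_res = rng.normal(size=(n_res, n_res))
W_res *= 1.0 / np.max(np.abs(np.linalg.eigvals(W_res)))   # spectral radius = 1

def run_reservoir(rho):
    X = np.zeros((T, n_res))
    x = np.zeros(n_res)
    for t in range(T):
        x = np.tanh(rho * W_res @ x + W_in * u[t])
        X[t] = x
    return X

for rho in [0.5, 0.95, 1.2]:                       # sub-, near-, super-critical scaling
    X = run_reservoir(rho)
    A, y = X[delay:], target[delay:]
    w = np.linalg.solve(A.T @ A + 1e-6 * np.eye(n_res), A.T @ y)   # ridge readout
    mse = np.mean((A @ w - y) ** 2)
    print(f"spectral radius {rho:4.2f}:  readout MSE = {mse:.4f}")
```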

71 Criticality and other NN architectures
Deep Learning: attempts to model high-level abstractions in data using architectures composed of multiple non-linear transformations, so that abstract, higher-level concepts are learned from lower-level ones. Hypothesis: stacking infomax recurrent networks would produce a deep network that operates near criticality.

