Computation and cognition: Unsupervised learning

Presentation transcript:

Computation and cognition: Unsupervised learning Oren Shriki

Information Maximization and Critical Brain Dynamics

Information Processing in the Brain Involves Feedforward, Recurrent and Feedback Processing Input layer Internal representations Output layer

Question What is the role of recurrent interactions in sensory processing?

Network Architecture Input Layer W K Output Layer

Dynamics of the recurrent network The total input to each output neuron is the sum of its feedforward input (through W) and its recurrent input (through K).

Computational Task Maximize the mutual information between the (steady-state) output s and the input x with respect to W and K.

The Objective Function Mutual Information Output Entropy Sensitivity Matrix

Geometrical Interpretation of the Objective Function The network maps an N-dimensional input x to an M-dimensional output s. Geometrically, the target is to maximize the local volume change induced by the transformation. This improves discrimination between similar inputs.
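This volume-change objective can be sketched numerically. A minimal illustration (assuming, for concreteness, a plain feedforward sigmoid map s = g(Wx) with a square W; the slides do not fix these details): the log volume change log|det ∂s/∂x| splits into a pointwise sensitivity term plus a fixed log|det W| term, which can be checked against a brute-force numerical Jacobian.

```python
import numpy as np

def sigmoid(u):
    return 1.0 / (1.0 + np.exp(-u))

def log_volume_change(W, x):
    """log |det ds/dx| for the map s = sigmoid(W x), W square.

    The Jacobian is diag(g'(Wx)) @ W, so the log-determinant splits into
    a pointwise sensitivity term plus a constant log|det W| term."""
    u = W @ x
    gprime = sigmoid(u) * (1.0 - sigmoid(u))
    sign, logdetW = np.linalg.slogdet(W)   # log|det W|, sign handled separately
    return np.sum(np.log(gprime)) + logdetW

rng = np.random.default_rng(0)
W = rng.normal(size=(3, 3))
x = rng.normal(size=3)

# Check against a brute-force numerical Jacobian (central differences).
eps = 1e-6
J = np.column_stack([(sigmoid(W @ (x + eps * e)) - sigmoid(W @ (x - eps * e))) / (2 * eps)
                     for e in np.eye(3)])
print(np.isclose(log_volume_change(W, x), np.linalg.slogdet(J)[1], atol=1e-5))
```

Maximizing the expectation of this quantity over the input distribution is what "maximizing the volume change" means in practice.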

Learning Rules The feedforward connections W and the recurrent connections K are both updated by gradient ascent on the mutual information. For N=M and no recurrent interactions the network performs ICA (Independent Component Analysis). Shriki, Sompolinsky and Lee, 2001
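For the N=M, no-recurrence special case mentioned here, the classic Bell–Sejnowski infomax rule is concrete enough to sketch (this is the standard natural-gradient form with a logistic nonlinearity, not code from the talk; the mixing matrix A and all hyperparameters are illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)

# Two independent super-Gaussian sources, linearly mixed.
S = rng.laplace(size=(2, 5000))
A = np.array([[1.0, 0.6], [0.4, 1.0]])       # illustrative mixing matrix
X = A @ S

def sigmoid(u):
    return 1.0 / (1.0 + np.exp(-u))

# Bell & Sejnowski infomax ICA, natural-gradient form:
# dW ∝ (I + (1 - 2y) u^T) W, with u = W x and y = sigmoid(u).
W = np.eye(2)
lr = 0.01
for epoch in range(200):
    for i in range(0, X.shape[1], 100):       # mini-batches of 100 samples
        x = X[:, i:i + 100]
        u = W @ x
        y = sigmoid(u)
        W += lr * (np.eye(2) + (1 - 2 * y) @ u.T / x.shape[1]) @ W

P = W @ A                                     # should approach a scaled permutation
P = np.abs(P) / np.abs(P).max(axis=1, keepdims=True)
print(P)
```

If separation succeeds, each row of the normalized matrix P has a single dominant entry, in a different column per row.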

Implementing the Learning Rules 1. For each data sample: a) Compute the total feedforward input to each neuron. b) Run a simulation of the recurrent network. c) Calculate the contribution to the update of W and K. 2. Update W and K using the learning rules.
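Steps (a) and (b) can be sketched as follows. The slides do not spell out the single-neuron dynamics, so a standard sigmoidal rate model with forward-Euler integration is assumed, and a weak random K keeps the fixed point unique:

```python
import numpy as np

def sigmoid(u):
    return 1.0 / (1.0 + np.exp(-u))

def steady_state(W, K, x, dt=0.1, tol=1e-9, max_steps=10000):
    """Relax the recurrent network to its steady state for one sample.

    Assumed rate dynamics (a common choice, not specified in the slides):
    tau * ds/dt = -s + g(W x + K s), integrated by forward Euler."""
    h = W @ x                       # (a) total feedforward input, computed once
    s = sigmoid(h)                  # start from the purely feedforward response
    for _ in range(max_steps):
        s_new = s + dt * (-s + sigmoid(h + K @ s))
        if np.max(np.abs(s_new - s)) < tol:
            return s_new
        s = s_new
    return s

rng = np.random.default_rng(2)
W = rng.normal(size=(4, 3))
K = 0.2 * rng.normal(size=(4, 4))   # weak recurrence -> contraction, unique fixed point
np.fill_diagonal(K, 0.0)            # no self-connections
x = rng.normal(size=3)
s = steady_state(W, K, x)
print(np.allclose(s, sigmoid(W @ x + K @ s), atol=1e-7))
```

The returned s is the steady-state output used in step (c) to accumulate the gradient contributions for W and K.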

Example: 3D Representation of 2D Inputs Input distribution:

A 3D View of the Transformation Performed by the Network Two different views of the resulting surface The transformation tries to span the dynamic range of each output neuron in order to achieve the maximum entropy representation.

The Basic Message of this Talk Maximizing mutual information Maximizing sensitivity (susceptibility) Recurrent networks Network parameters organize to operate near a critical point

Role of Recurrent Connections in Early Visual Processing

Analytical Results

A Simple Model of a Hypercolumn “Retina”/ ”LGN” “Cortex”

Input Distribution and Feedforward Connections The inputs are characterized by their orientation and contrast. The orientations are uniformly distributed. The contrast distribution is Gaussian (we denote the mean contrast by r). The feedforward connections are set such that the input to each neuron has a cosine tuning curve around a certain preferred orientation (PO). The feedforward connections are fixed throughout the learning stage.

Input Distribution and Feedforward Connections Angle = “orientation” Radius = “contrast” The rows of W are unit vectors, uniformly distributed over all possible angles. Thus, the input to each neuron has a cosine tuning curve.
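The construction on this slide can be written down directly (a small sketch; the number of neurons N and the stimulus parametrization are illustrative): with unit rows of W at angles θᵢ, the feedforward input to neuron i for a stimulus at angle φ and contrast r is exactly r·cos(φ − θᵢ).

```python
import numpy as np

# N output neurons; each row of W is a unit vector at the neuron's
# preferred orientation (PO), uniformly spread over all angles.
N = 8
theta = np.linspace(0, 2 * np.pi, N, endpoint=False)   # preferred orientations
W = np.column_stack([np.cos(theta), np.sin(theta)])    # N x 2, unit-norm rows

def stimulus(phi, r):
    """2-D input: angle encodes 'orientation', radius encodes 'contrast'."""
    return r * np.array([np.cos(phi), np.sin(phi)])

phi, r = 1.3, 0.7
h = W @ stimulus(phi, r)
# Dot product of unit vectors gives the cosine tuning curve directly:
print(np.allclose(h, r * np.cos(phi - theta)))
```

This is why every neuron in the model has a cosine tuning curve around its PO, with an amplitude set by the stimulus contrast.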

Analytical Results In the limit of low contrasts (r → 0) the optimal pattern of interactions can be calculated analytically. The profile of interactions as a function of the distance between POs is predicted to have a cosine shape, and the amplitude of the cosine profile can be computed in closed form.

From normal amplification to hallucinations At the optimal amplitude the network dynamics undergoes a transition to a state of spontaneous symmetry breaking: for higher amplitudes, an activity profile of finite magnitude forms even when the input is effectively uniform (up to random noise). The peak orientation is arbitrary, depending on the noise. This can be thought of as a “hallucination”.

The dynamics of a recurrent network with symmetric interactions is governed by an energy function. The shape of the energy function changes at the transition. (Figure: energy surfaces over the network state for interaction amplitudes k1 < kc and k1 > kc.)
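The slides do not reproduce the energy function itself; for a symmetric graded-rate network of the kind assumed here, one standard (Hopfield/Cohen–Grossberg-style) form is:

```latex
E(\mathbf{s}) \;=\; -\tfrac{1}{2}\sum_{i \ne j} K_{ij}\, s_i s_j
\;-\; \sum_i h_i\, s_i
\;+\; \sum_i \int_0^{s_i} g^{-1}(v)\, dv ,
```

where hᵢ is the feedforward input to unit i and g is the neuronal transfer function. Along the dynamics dE/dt ≤ 0, so the network relaxes to a local minimum of E, and the transition above corresponds to a change in the structure of those minima.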

The Network Operates Near a Critical Point A dynamical system is said to be at a critical state if it operates on the border between two qualitatively different types of behavior. Physical systems at a critical point have maximal susceptibility (sensitivity) to external inputs. Here, the network optimizes susceptibility and tends to operate near a critical point.

Numerical Results

Training with r=0.9: Input Amplification

r = 0.9 Pattern of Interactions The amplitude of the interaction profile is ~5 (normalized by the number of neurons).

r = 0.9 Objective Function and Convergence Time

Training with r=0.1: Input Amplification

r = 0.1 Pattern of Interactions The amplitude of the interaction profile is ~8 (normalized by the number of neurons).

r = 0.1 Objective Function and Convergence Time

r = 0.1 Objective Function and Convergence Time Larger learning rate!

Training the Network with Natural Images [On start: forest image plus title] Our visual system has evolved to pick up relevant visual information quickly and efficiently, but with minimal assumptions about exactly what we see (since this could lead to frequent hallucinations!). [Enter: 2nd forest image] An important assumption that visual systems can make is that the statistical nature of natural scenes is fairly stable. How can a visual system, either natural or synthetic, extract maximum information from a visual scene most efficiently? [Enter] Infomax ICA, developed under ONR funding by Bell and Sejnowski, does just this. Infomax ICA is a neural network approach to blind signal processing that seeks to maximize the total information (in Shannon’s sense) in its output channels, given its input. This is equivalent to minimizing the mutual information contained in pairs of outputs. Applied to image patches from natural scenes like these by Tony Bell and others, [Enter] ICA derives maximally informative sets of visual patch filters that strongly resemble the receptive fields of primary visual neurons. [Enter]

Network Architecture “Retina”/ ”LGN” W K “Cortex”

Feedforward Filters Evolve to be Gabor-like Filters

Feedforward Filters (the filters were set manually)

Pattern of Interactions

Critical Slowing Down σ - a scaling parameter that multiplies the recurrent interaction matrix
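Critical slowing down is easy to reproduce in a toy linearized version of such a network (the dynamics and all parameters here are illustrative, not the talk's): with the recurrent matrix normalized so the critical value of σ is 1, the relaxation time grows sharply as σ approaches 1.

```python
import numpy as np

def relaxation_steps(sigma, K, x, dt=0.1, tol=1e-6, max_steps=100000):
    """Steps for the linear rate dynamics ds/dt = -s + sigma*K s + x to settle."""
    s = np.zeros_like(x)
    for t in range(1, max_steps + 1):
        s_new = s + dt * (-s + sigma * (K @ s) + x)
        if np.max(np.abs(s_new - s)) < tol * dt:   # residual below tol
            return t
        s = s_new
    return max_steps

rng = np.random.default_rng(3)
M = rng.normal(size=(20, 20))
K = (M + M.T) / 2                      # symmetric interactions
K /= np.max(np.linalg.eigvalsh(K))     # largest eigenvalue = 1 -> critical at sigma = 1
x = rng.normal(size=20)

for sigma in (0.5, 0.9, 0.99):
    print(sigma, relaxation_steps(sigma, K, x))
```

The slowest mode decays at rate 1 − σλmax, so the settling time scales like 1/(1 − σ) near the critical point; this divergence is the "critical slowing down" of the slide title.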

Near-Criticality as a Universal Computational Principle Operating near critical points may be a general principle used by recurrent networks in order to increase their sensitivity to external inputs. Critical systems have universal properties which do not depend on the specific microscopic details (e.g., details of single neuron dynamics and synaptic interactions). Several scenarios (e.g., high plasticity) can lead networks to become super-critical, which may manifest as hallucinations and more generally as neurological or neuropsychiatric phenomena.

One small step for the synapses – one giant leap for the network dynamics.

Application of the Model to Tinnitus W K

Input attenuation leads to hallucinations in the deprived frequency range (Figure: network activity before and after attenuation; after attenuation, spontaneous activity ≈ 0.5 appears in the deprived range as a hallucination.)

Application of the Model to Synaesthesia Module 1 Module 2 Input Output

Definitions of Synaesthesia “For people with synesthesia, everyday activities (e.g., reading, listening to music) trigger unusual sensations (e.g. colours, tastes). These sensations are not triggered in everyone, but for synesthetes, they’re immediate, spontaneous, and tend to be consistent over time”. Baron-Cohen: Stimulation of one sensory modality automatically triggers a perception in a second sensory modality, in the absence of any stimulation in the second modality.

A Simple Network Model for Studying the Conditions for Cross-talk Unit 1 Unit 2

Analytical Results

“Sensory Deprivation” Scenario Unit 1 Unit 2

“Sensory Deprivation” Scenario (Figure: phase diagram; cross-talk regions flank a region with no cross-talk.)

“Sensory Flooding” Scenario (Figure: phase diagram; cross-talk regions flank a region with no cross-talk.)

What happens if we increase the learning rate?

“High Plasticity” Scenario Cross-talk

Numerical Results

Two Coupled Hypercolumns Input Hypercolumn 1 Hypercolumn 2 Output

“No Synaesthesia” Conditions Pattern of Interactions r1 = 0.2 r2 = 0.2 Learning rate = 0.0006

“No Synaesthesia” Conditions Inducer-Concurrent Map

“Sensory Deprivation” Scenario Pattern of Interactions r1 = 0.2 r2 = 0.05 Learning rate = 0.0006

“Sensory Deprivation” Scenario Inducer-Concurrent Map

“High Plasticity” Scenario Pattern of Interactions r1 = 0.2 r2 = 0.2 Learning rate = 0.002

“High Plasticity” Scenario Inducer-Concurrent Map

The Phase of the Mapping is Arbitrary Inducer-Concurrent Map

Near-Criticality as a Universal Computational Principle The correlation between sensory discrimination and general intelligence leads to the hypothesis that criticality may also play a role in the neural dynamics of networks underlying intelligence and cognitive control (not a logical necessity, just a hypothesis). Such a system would have a rich repertoire of representations and a good ability to discriminate among them. Criticality also lends itself to self-similar representations (scaling).

Criticality and other NN architectures Reservoir Computing: A random recurrent network generates a high-dimensional representation of inputs. Linear readout achieves very good performance (function approximation / decisions). Critical reservoir networks lead to better performance.
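A minimal echo-state-style sketch of this idea (the reservoir size, the 0.9 spectral radius, and the delay-memory task are all illustrative choices, not from the talk): a random recurrent network held just below instability generates a rich state, and only a linear readout is trained.

```python
import numpy as np

rng = np.random.default_rng(4)
N, T, delay = 100, 2000, 3

# Random recurrent reservoir, rescaled to spectral radius 0.9
# (just below the critical value 1); its weights are never trained.
Wres = rng.normal(size=(N, N))
Wres *= 0.9 / np.max(np.abs(np.linalg.eigvals(Wres)))
Win = rng.normal(size=N)

u = rng.uniform(-1, 1, size=T)          # scalar input stream
states = np.zeros((T, N))
s = np.zeros(N)
for t in range(T):
    s = np.tanh(Wres @ s + Win * u[t])  # drive the reservoir
    states[t] = s

# Task: reconstruct the input from `delay` steps ago (a memory task),
# using a linear readout fit by least squares.
X, y = states[delay:], u[:-delay]
washout = 100                            # discard initial transient
X, y = X[washout:], y[washout:]
Wout, *_ = np.linalg.lstsq(X, y, rcond=None)
pred = X @ Wout
print(np.mean((pred - y) ** 2) / np.var(y))   # normalized error, small if it works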

Criticality and other NN architectures Deep Learning: attempts to model high-level abstractions in data using architectures composed of multiple non-linear transformations; abstract, higher-level concepts are learned from lower-level ones. Hypothesis: stacking infomax recurrent networks will produce a deep learning network that operates near criticality.