The University of Manchester. Introduction to the analysis of the neural code with information theory methods. Dr Marcelo A Montemurro

Information theory

Entropy. Suppose there is a source that produces symbols taken from a given alphabet. Assume also that there is a certain probability distribution, with support over the alphabet, that determines the outcome of the source (for the moment we assume i.i.d. sources).

Probability of observing outcome $i$: $p_i$. Normalisation of a probability distribution: $\sum_i p_i = 1$. We define the 'surprise' of event $i$ as $h_i = -\log_2 p_i$ [bits]. Empirical determination of a probability: if there are $n_i$ outcomes of event $i$ in a total of $N$ trials, then for $N \gg 1$, $p_i \approx n_i/N$.
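The slides give no code, but a minimal Python sketch of these two definitions (empirical probability and surprise) may help; the counts used are hypothetical:

```python
import math

def empirical_probability(counts):
    """Estimate p_i ≈ n_i / N from outcome counts (valid when N >> 1)."""
    total = sum(counts.values())
    return {outcome: n / total for outcome, n in counts.items()}

def surprise(p):
    """Surprise of an event with probability p, in bits: -log2(p)."""
    return -math.log2(p)

counts = {"A": 500, "B": 400, "C": 100}   # hypothetical counts over N = 1000 trials
probs = empirical_probability(counts)
print(probs["C"], surprise(probs["C"]))   # rare outcomes carry a large surprise
```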

Example: a coin with outcomes heads and tails, $p(\text{heads}) = 0.5$, $p(\text{tails}) = 0.5$. What is the average surprise? Average of a random variable: $\langle X \rangle = \sum_i p_i x_i$.

Entropy. The average surprise is $H = \langle h \rangle = -\sum_i p_i \log_2 p_i$. For our coin, $H = -(0.5\log_2 0.5 + 0.5\log_2 0.5) = 1$ bit.
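A short sketch of the entropy as the average surprise, checked on the coin example (and on the die examples that follow); function and variable names are mine:

```python
import math

def entropy(probs):
    """Shannon entropy in bits: H = -sum_i p_i * log2(p_i)."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

print(entropy([0.5, 0.5]))   # fair coin: 1.0 bit
print(entropy([1.0]))        # a die that always gives the same face: 0.0 bits
print(entropy([1/6] * 6))    # fair die: log2(6) ≈ 2.585 bits
```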

Example: frequency of letters in English text. $p(a) = 0.082$; $p(e) = 0.127$; $p(q) = 0.001$. Surprise of letter 'e': $-\log_2 0.127 \approx 3.0$ bits. Surprise of letter 'q': $-\log_2 0.001 \approx 10.0$ bits.

If all 26 letters appeared with the same probability, then $p_i = 1/26$ and $H = \log_2 26 \approx 4.7$ bits, which is larger than for the real distribution. It can be shown that the entropy attains its maximum value for a uniform distribution.

Imagine a loaded die that always produces the same outcome. What is the surprise of each outcome? What is the average surprise?

What if the die is fair? What is the surprise of each outcome? What is the average surprise?

In general, the less uniform (less random) a distribution is, the lower its entropy.

In general, for an independent binary variable with $p(1) = p$, the entropy is $H(p) = -p\log_2 p - (1-p)\log_2(1-p)$; it is zero at $p = 0$ or $p = 1$ and maximal (1 bit) at $p = 0.5$.
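A small sketch of the binary entropy function just described, showing that it peaks at p = 0.5:

```python
import math

def binary_entropy(p):
    """H(p) = -p*log2(p) - (1-p)*log2(1-p), with H(0) = H(1) = 0."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

for p in (0.0, 0.1, 0.25, 0.5, 0.75, 0.9, 1.0):
    print(f"H({p}) = {binary_entropy(p):.3f} bits")   # maximal, 1 bit, at p = 0.5
```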

Thus, for a noiseless communication system the entropy quantifies the amount of information that can be encoded in the signal: a signal with low entropy carries low information; a signal with high entropy carries high information. [Diagram: noiseless channel mapping inputs a, b (with probabilities p(a), p(b)) to outputs 0, 1.]

[Figure: spike rasters for Stimulus 1 and Stimulus 2 over three trials, with spike counts in parentheses.] However, many real systems, like neurons, have a noisy output. Because of the noise, a new source of variability has to be taken into account: on the one hand, the variability driven by the stimulus (good variability); on the other, the variability created by the noise (bad variability). How do we handle this more complex problem? How can we quantify information in the presence of noise in the channel?

[Diagram: a transmitter sends X through a noisy channel characterised by p(Y|X); the receiver observes Y.]

[Diagram: comparison of a noiseless channel and a noisy channel, each mapping inputs a, b (with probabilities p(a), p(b)) to outputs 0, 1.]

Probabilistic dictionary: stimulus s is mapped to response r with probability P(r|s). The amount of information about the stimulus encoded in the neural response is quantified by the mutual information I(S;R). In general, mutual information quantifies how much can be known about one variable by looking at the other. It can be computed from real data by characterising the stimulus-response statistics.

Mutual information: $I(S;R) = H(R) - H(R|S)$, where the response entropy $H(R) = -\sum_r P(r)\log_2 P(r)$ is the variability of the whole response, and the noise entropy $H(R|S) = -\sum_s P(s)\sum_r P(r|s)\log_2 P(r|s)$ is the variability of the response at fixed stimulus.

Noisy binary channel. Stimulus = {a, b}, with p(S) = {p(a), p(b)}; response = {0, 1}, with p(R) = {p(0), p(1)}. The probabilistic dictionary is the conditional matrix P(R|S), giving the probability of each response for each stimulus.

Simple example: a noisy binary channel with stimuli {a, b}, responses {0, 1}, and p(S) = {0.5, 0.5}. [Diagram: the transition probabilities P(R|S).]

Let us first find p(R) = {p(0), p(1)}. We must marginalise over the stimulus: $p(r) = \sum_s P(r|s)\,p(s)$, so $p(0) = P(0|a)p(a) + P(0|b)p(b)$ and then $p(1) = 1 - p(0)$.

Now we can find the entropies H(R) and H(R|S) needed to compute the information.

Then, to compute the information we just take the difference between the two entropies: $I(S;R) = H(R) - H(R|S)$.
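The transition probabilities of this worked example were shown only in the slide's figure and are not preserved in the transcript, so the sketch below uses assumed values (P(0|a) = 0.9, P(1|a) = 0.1, P(0|b) = 0.2, P(1|b) = 0.8, chosen purely for illustration) to show the full recipe I(S;R) = H(R) - H(R|S):

```python
import math

def entropy(probs):
    """H = -sum_i p_i log2(p_i), in bits."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# Assumed channel: these transition probabilities are NOT the slide's values
p_s = {"a": 0.5, "b": 0.5}
p_r_given_s = {"a": {0: 0.9, 1: 0.1},
               "b": {0: 0.2, 1: 0.8}}

# Marginal response distribution: p(r) = sum_s p(s) p(r|s)
p_r = {0: 0.0, 1: 0.0}
for s, ps in p_s.items():
    for r, prs in p_r_given_s[s].items():
        p_r[r] += ps * prs

h_r = entropy(p_r.values())                              # response entropy H(R)
h_noise = sum(ps * entropy(p_r_given_s[s].values())      # noise entropy H(R|S)
              for s, ps in p_s.items())
print(h_r - h_noise)   # I(S;R) ≈ 0.40 bits for these assumed values
```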

What is the meaning of information?

$I(S;R) = H(R) - H(R|S)$: response entropy (variability of the whole response) minus noise entropy (variability of the response at fixed stimulus).

Equivalently, $I(S;R) = H(S) - H(S|R)$: stimulus entropy (variability of the whole stimulus) minus noise entropy (variability of the stimulus at fixed response).

Meaning 1: the number of yes/no questions needed to identify the stimulus. a) Deterministic responses: four stimuli, each producing its own distinct response, with P(S) = 1/4 so H(S) = 2 bits. Before observing the responses, 2 questions need to be asked on average; when a response is observed, 0 questions need to be asked, since the response identifies the stimulus exactly.

b) Overlapping responses: two stimuli with P(S) = 1/2, so H(S) = 1 bit, but Stimulus 1 and Stimulus 2 can produce overlapping responses. Before observing the responses, 1 question needs to be asked on average; when a response is observed, some uncertainty about the stimulus may remain, so on average more than 0 questions are still needed. Information measures the reduction in uncertainty about the stimulus after the responses are observed.

Meaning 2: an upper bound to the number of messages that can be transmitted through a communication channel. Question: what is the number of stimuli n that can be encoded in the neural response such that their responses do not overlap? [Diagram: the set of all responses partitioned into non-overlapping regions of responses to S1, S2, S3, and S4.]

Typical sequences. Consider long sequences of N symbols drawn from the source. What is the probability of a given sequence? A typical sequence is one in which every symbol appears a number of times equal to its average, $n_i \approx N p_i$. Then the probability of a typical sequence will be $P_{typ} = \prod_i p_i^{N p_i}$. Taking logs, $\log_2 P_{typ} = N \sum_i p_i \log_2 p_i = -NH$. Then $P_{typ} = 2^{-NH}$ is the probability of each typical sequence.

$2^{-NH}$ is the probability of each typical sequence. What is the probability of all typical sequences? First, how many typical sequences are there? When we have k symbols, the number of typical sequences is the number of arrangements $M = N!/(n_1!\,n_2!\cdots n_k!)$ with $n_i = N p_i$. If the sequences are very long, we can compute, using Stirling's approximation $\log(n!) \approx n\log(n) - n$, that $\log_2 M \approx NH$, so $M \approx 2^{NH}$ and the total probability of the typical sequences is $M \cdot 2^{-NH} \approx 1$.
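A numerical check of the counting argument above (the example distribution is my own): the log2 of the exact number of typical arrangements, N!/(n_1!...n_k!), stays close to N·H for long sequences:

```python
import math

def entropy(probs):
    return -sum(p * math.log2(p) for p in probs if p > 0)

def log2_multinomial(counts):
    """log2 of N! / (n_1! n_2! ... n_k!), computed via log-gamma to avoid huge factorials."""
    N = sum(counts)
    ln_count = math.lgamma(N + 1) - sum(math.lgamma(n + 1) for n in counts)
    return ln_count / math.log(2)

N = 1000
probs = (0.25, 0.75)                     # k = 2 symbols
counts = [int(N * p) for p in probs]     # typical composition: n_i = N * p_i
print(log2_multinomial(counts))          # ≈ 806 bits
print(N * entropy(probs))                # N*H ≈ 811 bits (they agree up to O(log N) terms)
```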

Question: what is the number of stimuli n that can be encoded in the neural response such that their responses do not overlap? [Diagram: the set of all responses partitioned into non-overlapping regions of responses to S1, S2, S3, and S4.]

Simple explanation: there are typically $2^{H(R)}$ responses that could be generated by the stimuli. However, due to the 'noise' fluctuations in the response, a number $2^{H(R|S)}$ of different responses can be attributed to the same stimulus. Then, how many stimuli can be reliably encoded in the neural response? $n = 2^{H(R)}/2^{H(R|S)} = 2^{H(R)-H(R|S)} = 2^{I(S;R)}$. Therefore, finding that a neuron transmits n bits of information within a behaviourally relevant time window means that there are potentially $2^n$ different stimuli that can be discriminated on the basis of the neuron's response alone.

How do we estimate information in a neural system?

Encoding: an external stimulus drives the sensory system, which produces spike trains. Each trial is discretised: a time window of length T [ms] is divided into L bins of size Δt (T = L Δt), and the spike train is converted into a binary response word r = (r_1, r_2, ..., r_L), e.g. 0110... There are S stimulus conditions (S_1, S_2, S_3, ...), with N_s trials per stimulus; each stimulus is presented with probability P(s), and the responses to each stimulus are characterised by P(r|s).
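A sketch of the discretisation step, turning a trial's spike times into a binary response word r = (r_1, ..., r_L); the function name and data layout are my own:

```python
import numpy as np

def spikes_to_word(spike_times_ms, t_start, T, dt):
    """Binarise a spike train: L = T/dt bins of size dt; r_i = 1 if a spike falls in bin i."""
    L = int(round(T / dt))
    word = np.zeros(L, dtype=int)
    for t in spike_times_ms:
        if t_start <= t < t_start + T:
            word[int((t - t_start) / dt)] = 1
    return tuple(word)

# Example: spikes at 3.2, 7.9 and 12.4 ms, window T = 16 ms, dt = 4 ms  ->  (1, 1, 0, 1)
print(spikes_to_word([3.2, 7.9, 12.4], t_start=0.0, T=16.0, dt=4.0))
```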

P(r|t): response probability conditional on the stimulus, estimated across trials at a fixed time t within the window T (bins of size Δt). P(r): unconditional response probability. Response entropy: variability of the whole response. Noise entropy: variability of the response at fixed time. Mutual information quantifies how much variability is left after subtracting the effect of noise. It is measured in bits (Meaning 3).

Bias in the information estimation. To measure P(r|s) we need to estimate up to $2^L - 1$ parameters from the data. The statistical errors in the estimation of P(r|s) lead to a systematic bias in the entropies. For $N_s \gg 1$ we can obtain a first-order approximation to the bias: $\text{Bias}[H(R)] \approx -\frac{\bar{R}-1}{2N\ln 2}$ and $\text{Bias}[H(R|S)] \approx -\frac{\sum_s(\bar{R}_s-1)}{2N\ln 2}$, where $\bar{R}$ ($\bar{R}_s$) is the number of response 'words' with non-zero probability (for stimulus s), and $N = N_s S$. (Miller, G. A., Information Theory in Psychology, 1955.)
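A sketch of this first-order correction as I read it (so treat the exact form as an assumption): the bias of each plug-in entropy is approximately -(number of occupied words - 1)/(2 N ln 2), and the net bias of the information is their difference:

```python
import math

def first_order_information_bias(words_per_stimulus):
    """First-order (Miller) bias of the plug-in information estimate, in bits.

    words_per_stimulus: dict stimulus -> list of response words (tuples), one per trial.
    Returns the (positive) amount by which the plug-in information is overestimated."""
    all_words = [w for trials in words_per_stimulus.values() for w in trials]
    N = len(all_words)
    R_bar = len(set(all_words))                                       # occupied words overall
    R_bar_s = [len(set(trials)) for trials in words_per_stimulus.values()]
    bias_h_r = -(R_bar - 1) / (2 * N * math.log(2))                   # bias of H(R)
    bias_h_rs = -sum(r - 1 for r in R_bar_s) / (2 * N * math.log(2))  # bias of H(R|S)
    return bias_h_r - bias_h_rs                                       # bias of I = H(R) - H(R|S)
```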

If the response is more random, responses are spread more uniformly over the possible response words, so $\bar{R}$ is large and the bias is large. If the response is less random, responses are concentrated on a few response words, so $\bar{R}$ is small and the bias is small.

Because of the bias, the information is overestimated: $-\text{Bias}[H(R|S)] > -\text{Bias}[H(R)]$, so $\text{Bias}[H(R) - H(R|S)] > 0$. (Adapted from Panzeri et al., J. Neurophysiol., 2007.)

A lower bound to the information. For words of length L, we need to estimate up to $2^L$ parameters from the data! Independent model: $P_{ind}(r|s) = \prod_{i=1}^{L} P(r_i|s)$; in general $P(r|s) \neq P_{ind}(r|s)$. Using the independent model we can compute the noise entropy from the single-bin marginals. To estimate this probability we need only 2L parameters! This entropy is much less biased.

[Diagram: shuffling. For each stimulus, the bins r_1, r_2, r_3, r_4 of the response words are permuted independently across trials.] There is an alternative way of estimating the entropy of the independent model: instead of neglecting the correlations by computing the marginals, we simply destroy them in the original dataset by shuffling each bin independently across trials, at fixed stimulus.
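A sketch of the shuffling operation itself (names mine): at fixed stimulus, each bin is permuted independently across trials, which destroys across-bin correlations but preserves each bin's marginal statistics:

```python
import numpy as np

def shuffle_across_trials(words, rng=None):
    """words: array-like of shape (n_trials, L) with the responses to ONE stimulus.
    Returns a copy in which each bin (column) has been permuted independently across trials."""
    rng = np.random.default_rng() if rng is None else rng
    shuffled = np.array(words, copy=True)
    for i in range(shuffled.shape[1]):
        shuffled[:, i] = rng.permutation(shuffled[:, i])   # permute bin i across trials
    return shuffled

# Example: three trials of 4-bin words for one stimulus
print(shuffle_across_trials([[1, 0, 0, 1],
                             [0, 1, 0, 1],
                             [1, 1, 0, 0]]))
```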

[Figure: panels a) and b) comparing the bias of the shuffled and unshuffled noise-entropy estimates.] The shuffled estimate is more strongly biased, essentially because shuffling creates a larger number of response words with non-zero probability.

[Figure: information estimates I, I_sh, and ΔI as a function of log2(number of trials); Montemurro et al., Neural Computation, 2007.] We now propose the following estimator for the entropy, combining the shuffled and independent-model noise entropies.
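The estimator's equation was an image in the original slide and is not preserved here. The sketch below implements one common shuffle-corrected form, I_sh = H(R) - H(R|S) - [H_ind(R|S) - H_sh(R|S)]; this particular formula is my reading of the shuffling approach, not a verbatim copy of the slide, and all names are mine:

```python
import math
import numpy as np
from collections import Counter

def plugin_entropy(samples):
    """Plug-in entropy (bits) of a list of hashable observations."""
    counts = Counter(samples)
    n = len(samples)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

def shuffle_corrected_information(words_per_stimulus, rng=None):
    """Assumed shuffle-corrected estimator I_sh = H(R) - H(R|S) - [H_ind(R|S) - H_sh(R|S)].

    words_per_stimulus: dict stimulus -> list of response words (tuples of 0/1), one per trial."""
    rng = np.random.default_rng() if rng is None else rng
    all_words = [w for ws in words_per_stimulus.values() for w in ws]
    N = len(all_words)
    h_r = plugin_entropy(all_words)                      # H(R)
    h_rs = h_ind = h_sh = 0.0
    for ws in words_per_stimulus.values():
        arr = np.array(ws)                               # shape (n_trials, L)
        p_s = len(ws) / N
        h_rs += p_s * plugin_entropy(ws)                 # H(R|S): plug-in word entropy per stimulus
        # Independent model: word entropy = sum of single-bin entropies
        h_ind += p_s * sum(plugin_entropy(arr[:, i].tolist()) for i in range(arr.shape[1]))
        # Shuffled data: permute each bin across trials, then take the plug-in word entropy
        sh = np.column_stack([rng.permutation(arr[:, i]) for i in range(arr.shape[1])])
        h_sh += p_s * plugin_entropy([tuple(row) for row in sh])
    return h_r - h_rs - (h_ind - h_sh)
```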

Quadratic extrapolation. Further improvements can be achieved with extrapolation methods. We have N trials; we also compute estimates of the entropies from subsets of the trials of size N/2 and N/4. This gives three estimates of the information: I_1, I_2, and I_4. Up to second order the bias is a polynomial in 1/N, so the three estimates lie on a parabola in 1/N, whose intercept at 1/N → 0 gives the extrapolated, bias-corrected information.
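A sketch of the quadratic extrapolation (names mine): the three information estimates are plotted against 1/N and fitted with a parabola whose intercept is the extrapolated value. In practice the N/2 and N/4 estimates would be averaged over several random subsets of trials:

```python
import numpy as np

def quadratic_extrapolation(I_full, I_half, I_quarter, n_trials):
    """Fit I(x) = I_inf + a*x + b*x^2, with x = 1/(number of trials), and return I_inf."""
    x = np.array([1.0 / n_trials, 2.0 / n_trials, 4.0 / n_trials])
    y = np.array([I_full, I_half, I_quarter])
    b, a, I_inf = np.polyfit(x, y, 2)     # coefficients returned from highest power down
    return I_inf

# Made-up example: estimates grow as fewer trials are used, as expected from the positive bias
print(quadratic_extrapolation(0.52, 0.58, 0.72, n_trials=128))
```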

The practical: efficiency of the neural code of the H1 neuron of the fly.

The experiment was done right before sunset, at midday, and right after sunset. The same visual scene was presented repeatedly.
1) Examine the data.
2) Generate rasters for the three conditions.
3) Compute the time-varying firing rate, allowing for different binnings.
4) Compute spike-count information as a function of window size.
5) Compute spike-time information as a function of window size.
6) Determine the maximum response word length for which the estimation is accurate.
7) Compute the efficiency of the code, e = I(R;S)/H(R) = 1 - H(R|S)/H(R) (a code sketch follows below).
8) Discuss.
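As a starting point for steps 4 and 7, here is a self-contained sketch (the data layout, names, and the lack of bias correction are my own simplifications; in a real analysis the bias correction and extrapolation described above would be applied):

```python
import math
from collections import Counter

def entropy_bits(samples):
    """Plug-in entropy (bits) of a list of discrete observations."""
    counts = Counter(samples)
    n = len(samples)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

def spike_count_information(trials_per_condition, window_ms):
    """Spike-count information I(R;S) and code efficiency e = I/H(R) for a window [0, window_ms).

    trials_per_condition: dict condition -> list of trials, each a list of spike times in ms."""
    all_counts, h_noise = [], 0.0
    n_total = sum(len(trials) for trials in trials_per_condition.values())
    for trials in trials_per_condition.values():
        counts = [sum(1 for t in trial if 0.0 <= t < window_ms) for trial in trials]
        all_counts.extend(counts)
        h_noise += (len(trials) / n_total) * entropy_bits(counts)    # H(R|S)
    h_response = entropy_bits(all_counts)                            # H(R)
    info = h_response - h_noise
    efficiency = info / h_response if h_response > 0 else 0.0        # e = 1 - H(R|S)/H(R)
    return info, efficiency

# Usage with hypothetical data:
# data = {"before_sunset": [[3.1, 10.4], ...], "midday": [...], "after_sunset": [...]}
# for T in (10, 20, 50, 100):   # window sizes in ms
#     print(T, spike_count_information(data, T))
```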