BCS547 Neural Encoding.

Slides:



Advertisements
Similar presentations
What is the neural code? Puchalla et al., What is the neural code? Encoding: how does a stimulus cause the pattern of responses? what are the responses.
Advertisements

Noise, Information Theory, and Entropy (cont.) CS414 – Spring 2007 By Karrie Karahalios, Roger Cheng, Brian Bailey.
Random Variables ECE460 Spring, 2012.
Spike Train Statistics Sabri IPM. Review of spike train  Extracting information from spike trains  Noisy environment:  in vitro  in vivo  measurement.
Random Variable A random variable X is a function that assign a real number, X(ζ), to each outcome ζ in the sample space of a random experiment. Domain.
An Introductory to Statistical Models of Neural Data SCS-IPM به نام خالق ناشناخته ها.
Neurophysics Part 1: Neural encoding and decoding (Ch 1-4) Stimulus to response (1-2) Response to stimulus, information in spikes (3-4) Part 2: Neurons.
1 Testing the Efficiency of Sensory Coding with Optimal Stimulus Ensembles C. K. Machens, T. Gollisch, O. Kolesnikova, and A.V.M. Herz Presented by Tomoki.
What is the language of single cells? What are the elementary symbols of the code? Most typically, we think about the response as a firing rate, r(t),
Shin Ishii Nara Institute of Science and Technology
The University of Manchester Introducción al análisis del código neuronal con métodos de la teoría de la información Dr Marcelo A Montemurro
Chapter 6 Information Theory
Reading population codes: a neural implementation of ideal observers Sophie Deneve, Peter Latham, and Alexandre Pouget.
For stimulus s, have estimated s est Bias: Cramer-Rao bound: Mean square error: Variance: Fisher information How good is our estimate? (ML is unbiased:
Background Knowledge Brief Review on Counting,Counting, Probability,Probability, Statistics,Statistics, I. TheoryI. Theory.
Fundamental limits in Information Theory Chapter 10 :
Probability theory Much inspired by the presentation of Kren and Samuelsson.
2015/6/15VLC 2006 PART 1 Introduction on Video Coding StandardsVLC 2006 PART 1 Variable Length Coding  Information entropy  Huffman code vs. arithmetic.
Discrete Event Simulation How to generate RV according to a specified distribution? geometric Poisson etc. Example of a DEVS: repair problem.
Spike-triggering stimulus features stimulus X(t) multidimensional decision function spike output Y(t) x1x1 x2x2 x3x3 f1f1 f2f2 f3f3 Functional models of.
Machine Learning CMPT 726 Simon Fraser University
ECE 776 Information Theory Capacity of Fading Channels with Channel Side Information Andrea J. Goldsmith and Pravin P. Varaiya, Professor Name: Dr. Osvaldo.
Spike Train decoding Summary Decoding of stimulus from response –Two choice case Discrimination ROC curves –Population decoding MAP and ML estimators.
2015/7/12VLC 2008 PART 1 Introduction on Video Coding StandardsVLC 2008 PART 1 Variable Length Coding  Information entropy  Huffman code vs. arithmetic.
Laurent Itti: CS599 – Computational Architectures in Biological Vision, USC Lecture 7: Coding and Representation 1 Computational Architectures in.
Review of Probability.
Prof. SankarReview of Random Process1 Probability Sample Space (S) –Collection of all possible outcomes of a random experiment Sample Point –Each outcome.
©2003/04 Alessandro Bogliolo Background Information theory Probability theory Algorithms.
1 Performance Evaluation of Computer Systems By Behzad Akbari Tarbiat Modares University Spring 2009 Introduction to Probabilities: Discrete Random Variables.
STATISTIC & INFORMATION THEORY (CSNB134)
2. Mathematical Foundations
INFORMATION THEORY BYK.SWARAJA ASSOCIATE PROFESSOR MREC.
1. Entropy as an Information Measure - Discrete variable definition Relationship to Code Length - Continuous Variable Differential Entropy 2. Maximum Entropy.
Tahereh Toosi IPM. Recap 2 [Churchland and Abbott, 2012]
Population Coding Alexandre Pouget Okinawa Computational Neuroscience Course Okinawa, Japan November 2004.
§4 Continuous source and Gaussian channel
TELECOMMUNICATIONS Dr. Hugh Blanton ENTC 4307/ENTC 5307.
Copyright © 2010, 2007, 2004 Pearson Education, Inc. Review and Preview This chapter combines the methods of descriptive statistics presented in.
1 / 41 Inference and Computation with Population Codes 13 November 2012 Inference and Computation with Population Codes Alexandre Pouget, Peter Dayan,
Theory of Probability Statistics for Business and Economics.
Biological Modeling of Neural Networks Week 8 – Noisy input models: Barrage of spike arrivals Wulfram Gerstner EPFL, Lausanne, Switzerland 8.1 Variation.
Neural coding (1) LECTURE 8. I.Introduction − Topographic Maps in Cortex − Synesthesia − Firing rates and tuning curves.
ECE 8443 – Pattern Recognition ECE 8423 – Adaptive Signal Processing Objectives: Deterministic vs. Random Maximum A Posteriori Maximum Likelihood Minimum.
Channel Capacity.
Projects: 1.Predictive coding in balanced spiking networks (Erwan Ledoux). 2.Using Canonical Correlation Analysis (CCA) to analyse neural data (David Schulz).
COMMUNICATION NETWORK. NOISE CHARACTERISTICS OF A CHANNEL 1.
JHU CS /Jan Hajic 1 Introduction to Natural Language Processing ( ) Essential Information Theory I AI-lab
Web page: Textbook. Abbott and Dayan. Homework and grades Office Hours.
BCS547 Neural Decoding. Population Code Tuning CurvesPattern of activity (r) Direction (deg) Activity
Mathematical Foundations Elementary Probability Theory Essential Information Theory Updated 11/11/2005.
BCS547 Neural Decoding.
Random Variable The outcome of an experiment need not be a number, for example, the outcome when a coin is tossed can be 'heads' or 'tails'. However, we.
Expected values of discrete Random Variables. The function that maps S into S X in R and which is denoted by X(.) is called a random variable. The name.
1 Lecture 7 System Models Attributes of a man-made system. Concerns in the design of a distributed system Communication channels Entropy and mutual information.
Probability and Distributions. Deterministic vs. Random Processes In deterministic processes, the outcome can be predicted exactly in advance Eg. Force.
1 Introduction to Statistics − Day 4 Glen Cowan Lecture 1 Probability Random variables, probability densities, etc. Lecture 2 Brief catalogue of probability.
6. Population Codes Presented by Rhee, Je-Keun © 2008, SNU Biointelligence Lab,
Eizaburo Doi, CNS meeting at CNBC/CMU, 2005/09/21 Redundancy in the Population Code of the Retina Puchalla, Schneidman, Harris, and Berry (2005)
Center for Computational Biology Department of Mathematical Sciences Montana State University Collaborators: Alexander Dimitrov John P. Miller Zane Aldworth.
Bayesian Perception.
(C) 2000, The University of Michigan 1 Language and Information Handout #2 September 21, 2000.
Theoretical distributions: the Normal distribution.
Statistical methods in NLP Course 2 Diana Trandab ă ț
Statistical methods in NLP Course 2
CSC2535: Computation in Neural Networks Lecture 11 Extracting coherent properties by maximizing mutual information across space or time Geoffrey Hinton.
2016 METHODS IN COMPUTATIONAL NEUROSCIENCE COURSE
Appendix A: Probability Theory
Outline Introduction Signal, random variable, random process and spectra Analog modulation Analog to digital conversion Digital transmission through baseband.
Where did we stop? The Bayes decision rule guarantees an optimal classification… … But it requires the knowledge of P(ci|x) (or p(x|ci) and P(ci)) We.
Parametric Methods Berlin Chen, 2005 References:
Presentation transcript:

BCS547 Neural Encoding

Introduction to computational neuroscience 10/01 Neural encoding 17/01 Neural decoding   24/01 Low level vision 31/01 Object recognition 7/02 Bayesian Perception 21/02 Sensorimotor transformations

Neural Encoding

What’s a code? Example Deterministic code: A ->10 B -> 01 If you see the string 01, recovering the encoded letter (B) is easy.

Noise and coding Two of the hardest problems with coding come from: Non invertible codes (e.g., two values get mapped onto the same code) Noise

Example Noisy code: A -> 01 with p=0.8 -> 10 with p=0.2 B -> 01 with p=0.3 -> 10 with p=0.7 Now, given the string 01, it’s no longer obvious what the encoded letter is…

What types of codes and noises are found in the nervous system?

q: Direction of motion Receptive field Code: number of spikes 10 Response Stimulus

q: Direction of motion Receptive field 10 Trial 1 7 Trial 2 4 Trial 3 8 4 Trial 1 Trial 2 Trial 3 Trial 4 Stimulus

Variance of the noise, si(0)2 Variance, si(q)2, can depend on the input Mean activity fi(0) Tuning curve fi(q) Encoded variable (q)

Tuning curves and noise The activity (# of spikes per second) of a neuron can be written as: where fi(q) is the mean activity of the neuron (the tuning curve) and ni is a noise with zero mean. If the noise is gaussian, then:

Probability distributions and activity The noise is a random variable which can be characterized by a conditional probability distribution, P(ni|q). Since the activity of a neuron is the sum of a deterministic term, fi(q), and the noise, it is also a random variable with a conditional probability distribution, P(ai| q). The distributions of the activity and the noise differ only by their means (E[ni]=0, E[ai]=fi(q)).

P(ai|q=0) P(ai|q=-60) Activity distribution P(ai|q=-60)

Examples of activity distributions Gaussian noise with fixed variance Gaussian noise with variance equal to the mean

Poisson activity (or noise): The Poisson distribution works only for discrete random variables. However, the mean, fi(q), does not have to be an integer. The variance of a Poisson distribution is equal to its mean.

Comparison of Poisson vs Gaussian noise with variance equal to the mean 0.09 0.08 0.07 0.06 0.05 Probability 0.04 0.03 0.02 0.01 20 40 60 80 100 120 140 Activity (spike/sec)

Poisson noise and renewal process We bin time into small intervals, dt. Then, for each interval, we toss a coin with probability, P(head) =p. If we get a head, we record a spike. For small p, the number of spikes per second follows a Poisson distribution with mean p/dt spikes/second (e.g., p=0.01, dt=1ms, mean=10 spikes/sec).

Properties of a Poisson process The number of events follows a Poisson distribution (in particular the variance should be equal to the mean) A Poisson process does not care about the past, i.e., at a given time step, the outcome of the coin toss is independent of the past. As a result, the inter-event intervals follow an exponential distribution (Caution: this is not a good marker of a Poisson process)

Poisson process and spiking The inter spike interval (ISI) distribution is indeed close to an exponential except for short intervals (refractory period) and for bursting neurons. (CV close to 1, Softy and Koch, fig. 1 and 3) The variance in the spike count is proportional to the mean but the the constant of proportionality is 1.2 instead of 1 and there is spontaneous activity. (Softy and Koch, fig. 5)

Open Questions Is this Poisson variability really noise? Where does it come from? Hard question because dendrites integrate their inputs and average out the noise (Softky and Koch)

Non-Answers It’s probably not in the sensory inputs It’s not the spike initiation mechanism (Mainen and Sejnowski) It’s not the stochastic nature of ionic channels It’s probably not the unreliable synapses.

Possible Answers Neurons embedded in a recurrent network with sparse connectivity tend to fire with statistics close to Poisson (Van Vreeswick and Sompolinski) Random walk model (Shadlen and Newsome; Troyer and Miller)

Problems with the random walk model The ratio variance over mean is still smaller than the one measured in vivo (0.8 vs 1.2) It’s unstable over several layers! (this is likely to be a problem for any mechanisms…) Noise injection in real neurons fails to produce the predicted variability (Stevens, Zador)

Other sources of noise and uncertainty Shot noise in the retina Physical noise in the stimulus Uncertainty inherent to the stimulus (e.g. the aperture problem)

An alternative: information theory Beyond tuning curves Tuning curves are often non invariant under stimulus changes (e.g. motion tuning curves for blobs vs bars) Deal poorly with time varying stimulus Assume a rate code An alternative: information theory

Information Theory (Shannon)

Definitions: Entropy Entropy: Measures the degree of uncertainty Minimum number of bits required to encode a random variable

P(X=1) = p P(X=0) = 1- p Entropy (bits) Probability (p) 1 0.9 0.8 0.7 0.6 Entropy (bits) 0.5 0.4 0.3 0.2 0.1 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1 Probability (p)

Maximum entropy is achieved for flat probability distributions, i. e Maximum entropy is achieved for flat probability distributions, i.e., for distributions in which events are equally likely. For a given variance, the normal distribution is the one with maximum entropy

Entropy of Spike Trains A spike train can be turned into a binary vector by discretizing time into small bins (1ms or so). Computing the entropy of the spike train amounts to computing the entropy of the binary vector. 1 1 1 1 1 1 1

Definition: Conditional Entropy Uncertainty due to noise: How uncertain is X knowing Y?

Example H(Y|X) is equal to zero if the mapping from X to Y is deterministic and many to one. Ex: Y = X for odd X Y= X+1 for even X X=1, Y is equal to 2. H(Y|X)=0 Y=4, X is either 4 or 3. H(X|Y)>0

Example In general, H(X|Y)H(Y|X), except for an invertible and deterministic mapping, in which case H(X|Y)=H(Y|X)=0 Ex: Y= X+1 for all X Y=2, X is equal to 1. H(X|Y)=0 X=1, Y is equal to 2. H(Y|X)=0

Example If Y=f(X)+noise, H(Y|X) and H(X|Y) are strictly greater than zero Ex: Y is the firing rate of a noisy neuron, X is the orientation of a line: ai=fi(q)+ni. Knowing the firing rate does not tell you for sure what the orientation is, H(X|Y)= H(q|ai)>0.

Definition: Joint Entropy Special case. X and Y independent

Independent Variables If X and Y are independent, then knowing Y tells you nothing about X. In other words, knowing Y does not reduce the uncertainty about X, i.e., H(X)=H(X|Y). It follows that:

Entropy of Spike Trains For a given firing rate, the maximum entropy is achieved by a Poisson process because they generate the most unpredictable sequence of spikes .

Definition: Mutual Information Independent variables: H(Y|X)=H(Y)

Data Processing Inequality Computation and information transmission can only decrease mutual information: If Z=f(Y), I(Z,X)  I(X,Y) In other words, computation can only decrease information or change its format.

KL distance Mutual information can be rewritten as: This distance is zero when P(X,Y)=P(X)P(Y), i.e., when X and Y are independent.

Measuring entropy from data Consider a population of 100 neurons firing for 100ms with 1 ms time bins. Each data point is a 100x100 binary vector. The number of possible data point is 2100x100. To compute the entropy we need to estimate a probability distribution over all these states…Hopeless?…

Direct Method Fortunately, in general, only a fraction of all possible states actually occurs Direct method: evaluate P(A) and P(A|q) directly from the the data. Still require tons of data but not 2100x100…

Upper Bound Assume all distributions are gaussian Recover SNR in the Fourier domain using simple averaging Compute information from the SNR (box 3) No need to recover the full P(R) and P(R|S) because gaussian distributions are fully characterized by their mean and variance.

Lower Bound Estimate a variable from the neuronal responses and compute the mutual information between the estimate and the stimulus (easy if the estimate follows a gaussian distribution) The data processing inequality guarantees that this is a lower bound on information. It gives an idea of how well the estimated variable is encoded.

Mutual information in spikes Among temporal processes, Poisson processes are the one with highest entropy because time bins are independent from one another. The entropy of a Poisson spike train vector is the sum of the individual time bins, which is best you can achieve.

Mutual information in spikes A deterministic Poisson process is the best way to transmit information with spikes Spike trains are indeed close to Poisson BUT they are not deterministic, i.e., they vary from trial to trial even for a fixed input. Even worse, the conditional entropy is huge because the noise follows a poisson distribution

The choice of stimulus Neurons are known to be selective to particular features. In information theory terms, this means that two sets of stimuli with the same entropy do necessarily lead to the same amount of mutual information in the response of a neuron. Natural stimuli often lead to larger mutual information (which makes sense since they are more likely)

Information Theory: Pro Assumption free: does not assume any particular code Read-out free: does not depend on a read-out method (direct method) It can be used to identify the features best encoded by a neurons

Information theory: Con’s Does not tell you how to read out the code: the code might be unreadable by the rest of the nervous system. Data intensive: needs TONS of data

Animal system Method Bits per second spik e H igh-fr eq. cutoff (Neur on) (efficiency) or limiting Stim ulus timing Fl y visual 10 Lo w r 6 4 1 2 ms (H1) Motion 15 Dir ect 81 — 0.7 37 er and 36 (HS, graded potential) upper 104 Monk 16 5.5 0.6 100 (ar ea MT) dir 12 1.5 Fr og auditor 38 Noise 46 1.4 750 Hz (Auditor ner v ) Call 133 ( 20%) call 7.8 90%) Salamander 50 3.2 1.6 (22%) (Ganglion cells) Random spots Crick et cer cal 40 294 > 500 (Sensor aff ent) 50%) Mechanical motion 51 75–220 0.6–3.1 500–1000 Wind noise 11,38 8–80 A vg = 100–400 (10-2 10-3) Electric fish Absolute 0–200 0–1.2 200 (P-aff lo Amplitude modulation natur e neur os c i e n ce • v o lume 2 no 11 • no v e mb er 1999