Neural Coding Through The Ages
February 1, 2002
Albert E. Parker
Complex Biological Systems
Department of Mathematical Sciences
Center for Computational Biology
Montana State University

Outline
- Introduction
- The Problem
- Approaches in Neural Encoding
  - Spike Count Coding (Adrian and Zotterman 1926)
  - Poisson Model (Fatt and Katz 1952)
  - Wiener/Volterra series (1930, 1958)
- Approaches in Neural Decoding
  - Linear Methods
    - Linear Reconstruction (Rieke et al 1997)
    - Vector Method (Georgopoulos et al 1983)
    - Optimal Linear Estimator (Abbott and Salinas 1994)
  - Gaussian Model (de Ruyter van Steveninck and Bialek 1988)
  - Metric Space (Victor and Purpura 1996)

We want to understand the neural code. We seek an answer to the question: How does neural activity represent information about environmental stimuli? “The little fly sitting in the fly’s brain trying to fly the fly”

Looking for the dictionary to the neural code …
encoding: stimulus X(·) → response Y(t)
decoding: response Y(t) → stimulus X(·)

… but the dictionary is not deterministic! Given a stimulus X(·), an experimenter observes many different neural responses (spike trains): Y_i(t) | X(·), i = 1, 2, 3, 4.

… but the dictionary is not deterministic! Given a stimulus X(·), an experimenter observes many different neural responses (spike trains): Y_i(t) | X(·), i = 1, 2, 3, 4. Neural coding is stochastic!!

Similarly, neural decoding is stochastic: given a response Y(t), there are many candidate stimuli X_i(·) | Y(t), i = 1, 2, …, 9.

Probability Framework
environmental stimuli X(·) ↔ neural responses Y(t)
encoder: P(Y(t) | X(·))
decoder: P(X(·) | Y(t))

The joint distribution P(X(·), Y(t)) over environmental stimuli X and neural responses Y has areas of high probability. Information theory tells us that if the relationship between X and Y can be modeled as an optimal communication channel, then a coding scheme needs to be stochastic on a fine scale and almost deterministic on a large scale.

How to determine a coding scheme? There are two methodologies in this search: encoding (determining P(Y|X)) and decoding (determining P(X|Y)). As we search for a coding scheme, we proceed in the spirit of John Tukey: it is better to be approximately right than exactly wrong.

Neural encoding ….

Spike Count Coding (Adrian and Zotterman 1926)
Encoding as a non-linear process: the response tuning curve (response amplitude as a function of stimulus amplitude).
Examples: hanging weights from a muscle (Adrian 1926), where the stimulus amplitude is the mass in grams; moving a pattern across the visual field of a blowfly (de Ruyter van Steveninck and Bialek, 1988), where the stimulus amplitude is the average velocity in a 200 ms window.
In Spike Count Coding, the response amplitude is the (spike count)/time.
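To make the tuning-curve idea concrete, here is a minimal sketch (not from the talk) of estimating a spike-count tuning curve from repeated trials; the data layout, variable names, and numbers are illustrative assumptions.

```python
import numpy as np

def tuning_curve(trials, window_s):
    """Mean firing rate (spikes/time) at each stimulus amplitude.
    trials: dict mapping stimulus amplitude -> list of spike counts
    observed in a counting window of length window_s seconds."""
    amps = np.array(sorted(trials))
    rates = np.array([np.mean(trials[a]) / window_s for a in amps])
    return amps, rates

# Illustrative data: three stimulus amplitudes, five repeats each, 0.2 s window.
trials = {1.0: [2, 3, 2, 4, 3], 2.0: [6, 5, 7, 6, 6], 4.0: [9, 10, 8, 9, 11]}
amps, rates = tuning_curve(trials, window_s=0.2)
```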

Spike Count Coding. An experimenter repeats each stimulus many times to estimate the encoder P(Y | X), where Y is the response in spikes/time and X is the stimulus amplitude.

Spike Count Coding. Given the stimulus distribution P(X), the encoder P(Y | X), and hence the response distribution P(Y), you can get the decoder P(X | Y) from Bayes' rule.
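For reference, Bayes' rule written out for this discrete spike-count setting (a standard identity, not something specific to the talk):

```latex
P(X = x \mid Y = y)
  = \frac{P(Y = y \mid X = x)\, P(X = x)}{P(Y = y)}
  = \frac{P(Y = y \mid X = x)\, P(X = x)}{\sum_{x'} P(Y = y \mid X = x')\, P(X = x')}
```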

Spike Count Coding: the directional tuning curve (Miller, Jacobs and Theunissen 1991). The response tuning curves for the 4 interneurons in the cricket cercal sensory system; the preferred directions are orthogonal to each other.

Cons:
- Counting spikes/time neglects the temporal pattern of the spikes of the neural response, which potentially decreases the information conveyed; known short behavioral decision times imply that many neurons make use of just a few spikes.
- Sensory systems respond to stimulus attributes that are very complex. That is, the space of possible stimuli for some systems is a very large (infinite-dimensional) space. Hence, it is not feasible to present all possible stimuli in an experiment.
Pros:
- Some neural systems do seem to encode certain stimulus attributes by (number of spikes)/time.
- Can work well if the stimulus space is small (e.g. when coding direction in the cricket cercal sensory system). (Abbott, 2001)

Poisson Model (Fatt and Katz 1952)
Examples: electrical multi-site stimulation of chicken retina in vitro (Stett et al., 2000); P-type afferent responses in the electric fish to a transdermal potential stimulus (Xu, Payne and Nelson 1996).
These (normalized) histograms give the probability per unit time of firing given that x(·) occurred: r[t | X(·) = x(·)].

Poisson Model
If we assume that the spikes are independent of each other given a stimulus X(·), then we can model P(Y | X) as an inhomogeneous (time-dependent) Poisson process:
- If Y is the (spike count)/time, then P(Y | X(·)) = Poisson(∫ r[t | X(·)] dt).
- If Y(t) is a spike train, then P(Y(t) | X(·)) = Poisson-like(r[t | X(·)]).
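As a concrete illustration (not part of the talk), here is a minimal sketch of sampling a spike train from an inhomogeneous Poisson process with rate r(t) by thinning; the rate function, time window, and rate bound are made-up assumptions.

```python
import numpy as np

def sample_inhomogeneous_poisson(rate_fn, t_max, rate_max, rng=None):
    """Sample spike times on [0, t_max] by thinning a homogeneous Poisson
    process of rate rate_max (which must upper-bound rate_fn everywhere)."""
    rng = np.random.default_rng() if rng is None else rng
    t, spikes = 0.0, []
    while True:
        t += rng.exponential(1.0 / rate_max)      # candidate event
        if t > t_max:
            break
        if rng.random() < rate_fn(t) / rate_max:  # keep with prob r(t)/rate_max
            spikes.append(t)
    return np.array(spikes)

# Illustrative stimulus-driven rate: 20 Hz baseline with a slow modulation.
r = lambda t: 20.0 * (1.0 + 0.8 * np.sin(2.0 * np.pi * 2.0 * t))
spike_train = sample_inhomogeneous_poisson(r, t_max=1.0, rate_max=40.0)
```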

Poisson Model. When is the Poisson model a good one to try? Examine the mean and variance of the spike counts/time given each stimulus; for a Poisson process, they should be equal. (Abbott, 2001)
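A quick way to run this mean-vs-variance check on data (a sketch assuming spike counts over repeated presentations of each stimulus are available; the names are illustrative):

```python
import numpy as np

def fano_factors(counts_by_stimulus):
    """counts_by_stimulus: dict mapping stimulus -> array of spike counts over
    repeated trials. Returns variance/mean per stimulus; values near 1 are
    consistent with Poisson statistics."""
    return {s: np.var(c, ddof=1) / np.mean(c)
            for s, c in counts_by_stimulus.items() if np.mean(c) > 0}
```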

Pros:
- Have an explicit form of P(Y|X).
- A Poisson process is a basic, well-studied process.
Cons:
- Counting spikes/time neglects the temporal pattern of the spikes of the neural response.
- Assuming that spikes are independent neglects the refractory period of a neuron. This period must be 'small' compared to the mean ISI in order for the Poisson model to be appropriate. To deal with this:
  - Berry and Meister (1998) have proposed a Poisson model that includes the refractory period.
  - Emery Brown (2001) uses a mixture of Gamma and inverse Gaussian models.
- The space of possible stimuli for some systems is a very large space, so it is not possible to present all possible stimuli in experiments to estimate r(t | X(·)).

Wiener / Volterra Series (1930, 1958)
The Taylor series for a function y = g(x):
y(x) = y(x_0) + y'(x_0)(x − x_0) + ½ y''(x_0)(x − x_0)² + … = f_0 + f_1(x − x_0) + f_2(x − x_0)² + …
The Volterra series is the analog of a Taylor series for a functional Y(t) = G[X(t)]:
Y(t) = f_0 + ∫ dτ_1 f_1(τ_1) X(t − τ_1) + ∫∫ dτ_1 dτ_2 f_2(τ_1, τ_2) X(t − τ_1) X(t − τ_2) + …
How to compute the {f_i}? Wiener reformulated the Volterra series so that the new coefficient functions, or kernels, could be measured from experiment. There are theorems that assure that this series, with sufficiently many terms, provides a complete description for a broad class of systems.
The first Wiener kernel is proportional to the cross-correlation of stimulus and response: f_1(τ) = ⟨X(t − τ) Y(t)⟩ / S_X. This is proportional to the spike-triggered average when Y(t) is a spike train.
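To make the last point concrete, here is a minimal sketch (not from the talk) of estimating the spike-triggered average, which for a white-noise stimulus is proportional to the first Wiener kernel; the array names, window length, and toy data are assumptions.

```python
import numpy as np

def spike_triggered_average(stimulus, spike_indices, window):
    """Average the `window` stimulus samples preceding each spike.
    For a white-noise stimulus this is proportional to the first Wiener kernel."""
    snippets = [stimulus[i - window:i] for i in spike_indices if i >= window]
    return np.mean(snippets, axis=0)

# Toy example: white-noise stimulus, spikes at arbitrary sample indices.
rng = np.random.default_rng(0)
stim = rng.standard_normal(10_000)
spike_idx = np.sort(rng.integers(100, 10_000, size=500))
sta = spike_triggered_average(stim, spike_idx, window=100)
```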

Wiener / Volterra Series. Constructing the neural response (here, Y is the firing rate) from the first Wiener kernel seems to be able to capture slow modulations in the firing rate. (Rieke et al. 1997: recordings from H1 of the fly; actual vs. predicted response.)

Pros:
- Computing the first Wiener kernel is inexpensive; not much data is required.
Cons:
- Although it is theoretically possible to compute many terms in the Wiener series, practical low-order approximations, of just f_0 and f_1 for example, don't work well in practice (i.e. coding is NOT linear).
- The Wiener series is for a continuous function Y(t). This is fine when Y(t) = r[t | X(·)], the firing rate, but how do we construct a series to model the discrete spiking of neurons?
- The Wiener series gives a specific Y(t) | X(·). What is P(Y | X)? In principle, one can do a lot of repeated experiments to estimate P(Y | X); in practice, the preparation dies on you before enough data is collected.
(Figure: slice of a fly brain.)

Neural Decoding. Why decoding? Encoding looks non-linear in many systems; maybe decoding is linear and hence easier. It is conceivably easier to estimate P(X|Y) over an ensemble of responses {Y}, since the {Y} live in a much smaller space than the {X}.

Linear Reconstruction Method (Rieke et al 1997)
Consider a linear Volterra approximation (from Y to X):
X_est(t) = ∫ K_1(τ) Y(t − τ) dτ = Σ_i K_1(t − t_i),
if we represent the discrete spike train as Y(t) = Σ_i δ(t − t_i), where the i-th spike occurs at t_i.
How to determine K_1? Minimize the mean squared error:
min over K(·) of ⟨ ∫ |X_observed(t) − Σ_i K(t − t_i)|² dt ⟩_X.
In the frequency domain, the solution is the Fourier transform of the average stimulus surrounding a spike divided by the power spectrum of the spike train.
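Here is a minimal sketch (an illustration, not the book's code) of estimating such a decoding filter in the frequency domain, assuming the stimulus and the binned spike train are sampled on the same time grid; the regularization constant is an assumption to avoid dividing by near-zero power.

```python
import numpy as np

def linear_reconstruction_filter(stimulus, spike_train, eps=1e-12):
    """Estimate K_1 as (cross-spectrum of stimulus and spike train) /
    (power spectrum of the spike train), then transform back to the time domain.
    stimulus and spike_train: 1-D arrays on the same time grid."""
    S = np.fft.rfft(stimulus)
    R = np.fft.rfft(spike_train)
    return np.fft.irfft(S * np.conj(R) / (np.abs(R) ** 2 + eps), n=len(stimulus))

def reconstruct(spike_train, K):
    """X_est(t) = (K * Y)(t); a circular convolution for simplicity."""
    return np.fft.irfft(np.fft.rfft(spike_train) * np.fft.rfft(K),
                        n=len(spike_train))
```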

Linear Reconstruction Method. Reconstructing the stimulus with the linear filter K_1. (Rieke et al. 1997: recordings from H1 of the fly; actual vs. predicted stimulus.)

Pros:
- It's cheap.
- The temporal pattern of the spikes is considered.
- Even though encoding is non-linear (hence the failure of the Wiener linear approximation), decoding for some neurons seems linear.
Cons:
- Only one neuron is modeled.
- No explicit form of P(X|Y).

Other Linear Methods (for populations of neurons)
Assumptions:
1. Y_i = number of spikes from neuron i in a time window.
2. X(t) is randomly chosen and continuously varying.
Vector Method (Georgopoulos et al 1983): X_est(t) = Σ_i Y_i C_i, where C_i is the preferred stimulus for neuron i.
Optimal Linear Estimator (OLE) (Abbott and Salinas 1994): X_est(t) = Σ_i Y_i D_i, where the D_i are chosen so that ⟨ ∫ dt |X_observed(t) − Σ_i Y_i D_i|² ⟩, averaged over Y and X, is minimized. The solution is D_i = Σ_j (Q⁻¹)_ij L_j, where L_j is the center of mass of the tuning curve for cell j and Q_ij is the correlation of Y_i and Y_j.
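A minimal sketch of both estimators (illustrative; the array shapes and names are assumptions, and the correlation matrix and tuning-curve centers are taken as given rather than estimated here):

```python
import numpy as np

def vector_method(counts, preferred):
    """Population vector: X_est = sum_i Y_i * C_i.
    counts: (N,) spike counts; preferred: (N, d) preferred stimuli C_i."""
    return counts @ preferred

def ole_weights(Q, L):
    """Optimal linear estimator weights D = Q^{-1} L.
    Q: (N, N) response correlation matrix; L: (N, d) tuning-curve centers."""
    return np.linalg.solve(Q, L)

def ole_decode(counts, D):
    """X_est = sum_i Y_i * D_i."""
    return counts @ D
```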

Other Linear Methods (Abbott and Salinas 1994): the cricket cercal sensory system. Evidence suggests that the cricket can code direction with an accuracy of up to 5 degrees, and this data suggests that these algorithms decode direction as well as the cricket does in this experiment. (Figure: difference between the stimulus reconstructed by the Vector and OLE methods and the true stimulus presented.)

Pros:
- Vector Method:
  - It's cheap.
  - This method is ideal when the tuning curve is a (half) cosine.
  - Has small error if the C_i are orthogonal.
- OLE:
  - Has the smallest average MSE of all linear methods over a population of neurons.
Cons:
- Vector Method:
  - It is not always obvious what the preferred stimulus C_i is for generic stimuli.
  - Does not work well if the C_i are not uniformly distributed (orthogonal).
  - Requires a lot of neurons in practice.
- Counting spikes/time neglects the temporal pattern of the spikes of the neural response Y(t).
- No explicit form of P(X|Y)!

Gaussian Model (de Ruyter van Steveninck and Bialek 1988)
In experiment, let X(t) be a randomly chosen, continuously varying stimulus (GWN, Gaussian white noise). Approximate P(X|Y) with a Gaussian whose mean ⟨X|Y⟩ and covariance Σ_{X|Y} are computed from data. (Rieke et al.: recordings from H1 of the fly.)
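A minimal sketch of this idea (illustrative, not the paper's pipeline): gather the stimulus segments observed together with a given response, and summarize them by their empirical mean and covariance, which define the Gaussian approximation to P(X | Y).

```python
import numpy as np

def fit_gaussian_decoder(stimulus_segments):
    """stimulus_segments: (n_trials, d) array of stimulus vectors that
    accompanied a given response Y.  Returns the mean and covariance of the
    Gaussian approximation to P(X | Y)."""
    mu = stimulus_segments.mean(axis=0)
    cov = np.cov(stimulus_segments, rowvar=False)
    return mu, cov
```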

Pros:
- The temporal pattern of the spikes is considered.
- We have an explicit form for P(X|Y).
- Why should P(X|Y) be Gaussian? This choice is justified by Jaynes' maximum entropy principle: of all models that satisfy a given set of constraints, choose the one that maximizes the entropy. For a fixed mean and covariance, the Gaussian is the maximum entropy model.
Cons:
- An inordinate amount of data is required to obtain good estimates of the covariance Σ_{X|Y=y} over all observed responses y(t).
  - One way to deal with the problem of not having enough data is to cluster the responses together and estimate a Gaussian model for each response cluster.

Metric Space Approach (Victor and Purpura 1996)
We desire a rigorous decoding method that …
- Estimates P(X|Y).
- Takes the temporal structure of the spikes of the neural responses Y(t) into account.
- Deals with the insufficient-data problem by clustering the responses.
Assumptions:
- The stimuli X_1, X_2, …, X_C must each be repeated multiple times.
- There are a total of T neural responses: Y_1, …, Y_T.
(Abbott, 2001)

The Method:
1. Given two spike trains Y_i and Y_j, the distance between them is defined by the metric D[q](Y_i, Y_j), the minimum cost required to transform Y_i into Y_j via a path of elementary steps:
   a. adding or deleting a spike (cost = 1)
   b. shifting a spike in time by Δt (cost = q · |Δt|)
- 1/q is a measure of the temporal precision of the metric.
- D[q = 0](Y_i, Y_j) is just the difference in the number of spikes between the spike trains Y_i and Y_j; decoding based on this metric is just counting spikes.
- D[q = ∞](Y_i, Y_j) gives infinitesimally precise timing of the spikes.
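For concreteness, here is a sketch of the standard dynamic-programming recursion for this edit-style spike-train metric (insertion/deletion cost 1, shift cost q·|Δt|); the spike times in the usage line are made up, and this is an illustration rather than the talk's own code.

```python
import numpy as np

def victor_purpura_distance(t_a, t_b, q):
    """D[q]: minimum cost to transform spike train t_a into t_b using spike
    insertion/deletion (cost 1) and spike shifts (cost q * |dt|)."""
    n, m = len(t_a), len(t_b)
    G = np.zeros((n + 1, m + 1))
    G[:, 0] = np.arange(n + 1)            # delete every spike of t_a
    G[0, :] = np.arange(m + 1)            # insert every spike of t_b
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            G[i, j] = min(G[i - 1, j] + 1,                          # delete
                          G[i, j - 1] + 1,                          # insert
                          G[i - 1, j - 1] + q * abs(t_a[i - 1] - t_b[j - 1]))
    return G[n, m]

# With q = 0 the distance reduces to the difference in spike counts:
print(victor_purpura_distance([0.010, 0.050, 0.200], [0.012, 0.220], q=0.0))  # 1.0
```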

2. Let r_1, r_2, …, r_C be response classes.

Metric Space Approach
2. Let r_1, r_2, …, r_C be response classes. Let N be the classification matrix; in this example its rows are indexed by the stimuli X_1, …, X_5 and its columns by the response classes r_1, …, r_5.

Metric Space Approach
3. Suppose that Y_1 was elicited by X_1. Assign Y_1 to response class r_3 if ⟨D[q](Y_1, Y)^z⟩^(1/z), averaged over the Y elicited by X_3, is the minimum over all X_k for k = 1, …, C.
(Classification matrix N: rows X_1 … X_5, columns r_1 … r_5.)

Metric Space Approach
4. Increment N_{1,3} by 1.
(Classification matrix N: rows X_1 … X_5, columns r_1 … r_5.)

Metric Space Approach
In general:
3. Suppose that Y_i was elicited by X_α. Assign Y_i to response class r_β if ⟨D[q](Y_i, Y)^z⟩^(1/z), averaged over the Y elicited by X_β, is the minimum over all X_k for k = 1, …, C.
4. Increment N_{α,β} by 1.
5. Repeat steps 3 and 4 for each Y_i, i = 1, …, T.
(Classification matrix N: rows X_1 … X_5, columns r_1 … r_5.)

Metric Space Approach
After repeating the process for all Y elicited by X_1 (X_1 was presented 20 times, eliciting 20 neural responses), the row of N for X_1 is filled in.

Metric Space Approach
Then for all Y elicited by X_2 (X_2 was presented 20 times, eliciting 20 neural responses), filling in the row of N for X_2 …

Metric Space Approach
… until we have repeated the process for all Y, including the ones elicited by X_C. In this example, the T = 100 responses (20 neural responses were elicited by each stimulus) were quantized, or clustered, into 5 classes.
(Classification matrix N: rows X_1 … X_5, columns r_1 … r_5.)
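As a rough illustration of steps 2–5 (not the authors' implementation), the loop below builds the classification matrix N using the victor_purpura_distance sketch from earlier; the values of q and z are placeholders (the talk notes they should be chosen to maximize the transmitted information), and the data layout is an assumption.

```python
import numpy as np
# Assumes victor_purpura_distance() from the earlier sketch.

def classification_matrix(responses_by_stimulus, q=100.0, z=-2.0):
    """responses_by_stimulus: list over stimuli X_1..X_C, each a list of spike
    trains (arrays of spike times).  Each response is assigned to the class
    whose responses it is closest to in the <D[q]^z>^(1/z) sense, and
    N[alpha, beta] counts responses elicited by X_alpha assigned to r_beta."""
    C = len(responses_by_stimulus)
    N = np.zeros((C, C))
    for alpha, group in enumerate(responses_by_stimulus):
        for y in group:
            scores = []
            for k in range(C):
                others = [r for r in responses_by_stimulus[k] if r is not y]
                d = np.array([victor_purpura_distance(y, r, q) for r in others])
                d = np.maximum(d, 1e-12)            # guard the power mean
                scores.append(np.mean(d ** z) ** (1.0 / z))
            N[alpha, int(np.argmin(scores))] += 1
    return N
```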

Metric Space Approach
Note that by normalizing the columns of the matrix N, we get the decoder P(X | r). Decode a neural response Y(t) by looking up its response class r in the normalized matrix N.
(Table: P(X | r), rows X_1 … X_5, columns r_1 … r_5.)
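The column normalization itself is one line (a sketch; the zero-column guard is an assumption for empty response classes):

```python
import numpy as np

def decoder_from_counts(N):
    """Normalize each column of N so that column beta is P(X | r_beta)."""
    col_sums = N.sum(axis=0, keepdims=True)
    return N / np.where(col_sums == 0, 1, col_sums)
```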

Pros:
- The responses are clustered together; P(X|r) estimates P(X|Y).
- Considers the temporal pattern of the spikes.
- Minimizing the cost function D[q] is intuitively a nice way to quantify jitter in the spike trains. In information theory, this type of cost function is called a distortion function.
- What to choose for q and z? The values that maximize the transmitted information from stimulus to response.
Cons:
- D[q] imposes our assumptions of what is important in the structure of spike trains (namely, that shifts and spike insertions/deletions are important).
- The space of possible stimuli for some systems is a very large space, so it is not possible to present all possible stimuli in experiment.

So we're looking for a decoding algorithm that …
- Produces an estimate of P(X|Y) as well as of X(·)|Y(t).
- Considers the temporal structure of the spike trains Y(t).
- Makes no assumptions about the linearity of decoding.
- Does not require that all stimuli be presented. That is, X(t) ought to be randomly chosen and continuously varying (such as a GWN stimulus).
- Considers a population of neurons.
- Deals with the problem of never having enough data by clustering the neural responses.
Tune in next week, on Friday at the CBS seminar, to see how our method deals with these issues.