Population Coding Alexandre Pouget Okinawa Computational Neuroscience Course Okinawa, Japan November 2004.

Outline: Definition; The encoding process; Decoding population codes; Quantifying information (Shannon and Fisher information); Basis functions and optimal computation

Receptive field. s: direction of motion. Stimulus → response; the code is the number of spikes (e.g., 10 spikes on this trial).

Receptive field. s: direction of motion. The same stimulus presented on trials 1–4 evokes variable spike counts (e.g., 7, 8, 4, …).

Tuning curve f_i(s): the mean activity of neuron i as a function of the encoded variable s. The variance of the noise, σ_i(s)², can depend on the input.

Tuning curves and noise. Examples of tuning curves: tuning to retinal location, orientation, depth, color, eye movements, arm movements, numbers… etc.

Population Codes. [Figure: left, tuning curves (activity vs. direction, deg); right, pattern of activity r (activity vs. preferred direction, deg) evoked by an unknown stimulus s.]

Bayesian approach. We want to recover P(s|r). Using Bayes' theorem, we have: P(s|r) = P(r|s) P(s) / P(r).

Bayesian approach. Bayes' rule: P(s|r) = P(r|s) P(s) / P(r).

Bayesian approach. We want to recover P(s|r). Using Bayes' theorem, we have: P(s|r) = P(r|s) P(s) / P(r), where P(r|s) is the likelihood of s, P(s|r) is the posterior distribution over s, P(s) is the prior distribution over s, and P(r) is the prior distribution over r (a normalization term).

Bayesian approach. If we are to do any type of computation with population codes, we need a probabilistic model of how the activity is generated, p(r|s), i.e., we need to model the encoding process.

Activity distributions. [Figure: distributions P(r_i|s = -60) and P(r_i|s = 0) for an individual neuron.]

Tuning curves and noise. The activity (# of spikes per second) of a neuron can be written as: r_i = f_i(s) + n_i, where f_i(s) is the mean activity of the neuron (the tuning curve) and n_i is a noise term with zero mean. If the noise is Gaussian, then: P(r_i|s) = (1/√(2πσ_i²)) exp(-(r_i - f_i(s))² / (2σ_i²)).

Probability distributions and activity. The noise is a random variable which can be characterized by a conditional probability distribution, P(n_i|s). The distributions of the activity, P(r_i|s), and of the noise differ only by their means (E[n_i] = 0, E[r_i] = f_i(s)).

Examples of activity distributions: Gaussian noise with fixed variance, P(r_i|s) = (1/√(2πσ²)) exp(-(r_i - f_i(s))² / (2σ²)); Gaussian noise with variance equal to the mean, P(r_i|s) = (1/√(2πf_i(s))) exp(-(r_i - f_i(s))² / (2f_i(s))).

Poisson distribution: P(r_i|s) = e^(-f_i(s)) f_i(s)^(r_i) / r_i!. The variance of a Poisson distribution is equal to its mean.

[Figure: comparison of Poisson vs. Gaussian noise with variance equal to the mean — probability vs. activity (spikes/sec).]

Gaussian noise with fixed variance, population of neurons: with independent noise, P(r|s) = Π_i (1/√(2πσ²)) exp(-(r_i - f_i(s))² / (2σ²)).

Gaussian noise with arbitrary covariance matrix Σ, population of neurons: P(r|s) = (1/((2π)^(n/2) |Σ|^(1/2))) exp(-(r - f(s))ᵀ Σ⁻¹ (r - f(s)) / 2).
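A minimal sketch of this encoding process, assuming bell-shaped tuning curves for direction and independent Poisson noise (the tuning-curve parameters and the 64-neuron population are illustrative choices, not values from the slides):

```python
import numpy as np

def tuning_curve(s, preferred, peak_rate=50.0, width=30.0):
    """Bell-shaped tuning curve f_i(s) (spikes/s), as a function of direction in degrees."""
    d = (s - preferred + 180.0) % 360.0 - 180.0   # circular difference between s and preferred
    return peak_rate * np.exp(-0.5 * (d / width) ** 2)

def encode(s, preferred_dirs, window=1.0, rng=None):
    """Draw one population response: spike counts r_i ~ Poisson(f_i(s) * window)."""
    rng = rng if rng is not None else np.random.default_rng()
    return rng.poisson(tuning_curve(s, preferred_dirs) * window)

preferred_dirs = np.linspace(-180, 180, 64, endpoint=False)  # preferred directions of 64 neurons
r = encode(-60.0, preferred_dirs)                             # pattern of activity for s = -60 deg
```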

Outline: Definition; The encoding process; Decoding population codes; Quantifying information (Shannon and Fisher information); Basis functions and optimal computation

Population Codes. [Figure: left, tuning curves (activity vs. direction, deg); right, pattern of activity r (activity vs. preferred direction, deg) evoked by an unknown stimulus s.]

Nature of the problem. In response to a stimulus with unknown value s, you observe a pattern of activity r. What can you say about s given r? Bayesian approach: recover p(s|r) (the posterior distribution). Estimation theory: come up with a single-value estimate ŝ from r.

Estimation Theory. [Figure: the encoder (nervous system) maps the stimulus onto an activity vector r (activity vs. preferred orientation), which a decoder maps onto an estimate ŝ.]

[Figure: on trials 1, 2, …, 200 the encoder (nervous system) produces different activity vectors r_1, r_2, …, r_200 for the same stimulus (activity vs. preferred retinal location), so the decoder returns a different estimate on each trial.]

Estimation Theory. If E[ŝ] = s, the estimate is said to be unbiased. If the variance of ŝ is as small as possible, the estimate is said to be efficient.

Estimation theory. A common measure of decoding performance is the mean square error between the estimate and the true value, E[(ŝ - s)²]. This error can be decomposed as: E[(ŝ - s)²] = bias² + variance, where bias = E[ŝ] - s.

Efficient Estimators. The smallest achievable variance for an unbiased estimator is known as the Cramér–Rao bound, σ_CR². An efficient estimator is such that σ² = σ_CR². In general: σ² ≥ σ_CR².

Estimation Theory: examples of decoders. [Figure: encoder (nervous system) → activity vector r (activity vs. preferred orientation) → decoder.]

Voting Methods Optimal Linear Estimator

Linear Estimators: estimators of the form ŝ = Σ_i w_i r_i.

For the optimal linear estimator, X and Y must be zero mean (responses and stimulus are centered first). The optimal weights trust cells that have small variances and large covariances with the encoded variable.
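A sketch of how such an optimal linear estimator could be fit from data by least squares (the function names and the explicit centering step are mine): on centered data the solution is Cov(r)⁻¹ Cov(r, s), so cells with small variance and large covariance with s receive large weights.

```python
import numpy as np

def fit_linear_estimator(R, s):
    """Fit w for the linear estimator s_hat = mean(s) + w . (r - mean(r)) by least squares.

    R: (trials, neurons) responses; s: (trials,) stimulus values.
    Centering makes X and Y zero mean, as required; the solution equals Cov(r)^-1 Cov(r, s).
    """
    R0 = R - R.mean(axis=0)
    s0 = s - s.mean()
    w, *_ = np.linalg.lstsq(R0, s0, rcond=None)
    return w, R.mean(axis=0), s.mean()

def linear_decode(r, w, r_mean, s_mean):
    """Apply the fitted linear estimator to a single response vector r."""
    return s_mean + (np.asarray(r, dtype=float) - r_mean) @ w
```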

Voting Methods Optimal Linear Estimator

Voting Methods: Optimal Linear Estimator; Center of Mass. The center of mass is linear in r_i / Σ_j r_j, with the weights set to the preferred values s_i: ŝ = Σ_i s_i r_i / Σ_j r_j.

Center of Mass / Population Vector. The center of mass is optimal (unbiased and efficient) iff the tuning curves are Gaussian with a zero baseline and uniformly distributed, and the noise follows a Poisson distribution. In general, the center of mass has a large bias and a large variance.
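A sketch of the center-of-mass decoder for a non-circular variable (the function name is mine; for circular variables such as direction, the population vector handles wrap-around instead):

```python
import numpy as np

def center_of_mass(r, preferred):
    """Center-of-mass estimate: s_hat = sum_i s_i r_i / sum_j r_j,
    a vote over preferred values weighted by each neuron's activity."""
    r = np.asarray(r, dtype=float)
    return np.dot(preferred, r) / r.sum()
```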

Voting Methods Optimal Linear Estimator Center of Mass Population Vector

[Figure: population vector — each neuron contributes a vector r_i P_i along its preferred direction P_i; their sum P points approximately toward the stimulus direction s.]

Voting Methods: Optimal Linear Estimator; Center of Mass; Population Vector. The population vector is linear in r_i with weights set to the preferred directions P_i, followed by a nonlinear step (taking the direction of the resulting vector).

Population Vector. Typically, P = Σ_i r_i P_i and ŝ is taken to be the direction of P. The population vector is not the optimal linear estimator.

Population Vector

The population vector is optimal iff the tuning curves are cosine-shaped and uniformly distributed, and the noise follows a normal distribution with fixed variance. In most cases, the population vector is biased and has a large variance.
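A sketch of the population vector for a direction variable (the function name is mine): it is linear in r_i with weights P_i, followed by the nonlinear step of taking the angle of the summed vector.

```python
import numpy as np

def population_vector(r, preferred_deg):
    """Population vector: sum the unit vectors P_i along each preferred direction,
    weighted by the activities r_i, and return the direction of the sum."""
    angles = np.deg2rad(np.asarray(preferred_deg, dtype=float))
    x = np.sum(r * np.cos(angles))
    y = np.sum(r * np.sin(angles))
    return np.rad2deg(np.arctan2(y, x))   # nonlinear step: angle of the resulting vector
```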

Maximum Likelihood. The maximum likelihood estimate is the value of s maximizing the likelihood P(r|s) (the noise distribution). Therefore, we seek ŝ_ML such that: ŝ_ML = argmax_s P(r|s). ŝ_ML is unbiased and efficient.

Maximum Likelihood. [Figure: tuning curves (activity vs. direction, deg) and a pattern of activity r (activity vs. preferred direction, deg).]

Maximum Likelihood Template

Maximum Likelihood. [Figure: the template fit to the pattern of activity (activity vs. preferred direction, deg).]

ML and template matching. Maximum likelihood is a template-matching procedure, BUT the metric used is not always the Euclidean distance; it depends on the noise distribution.

Maximum Likelihood. The maximum likelihood estimate is the value of s maximizing the likelihood P(r|s). Therefore, we seek ŝ_ML such that: ŝ_ML = argmax_s P(r|s).

Maximum Likelihood. If the noise is Gaussian and independent, P(r|s) = Π_i (1/√(2πσ_i²)) exp(-(r_i - f_i(s))² / (2σ_i²)). Therefore log P(r|s) = const − Σ_i (r_i - f_i(s))² / (2σ_i²), and the estimate is given by: ŝ_ML = argmin_s Σ_i (r_i - f_i(s))² / (2σ_i²). Distance measure: the (variance-weighted) squared Euclidean distance between r and the template f(s) — template matching.

Maximum Likelihood. [Figure: activity vs. preferred direction (deg).]

Gaussian noise with variance proportional to the mean. If the noise is Gaussian with variance proportional to the mean, the distance being minimized changes to: Σ_i (r_i - f_i(s))² / f_i(s) (up to the proportionality constant). Data points with small variance are weighted more heavily.
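A sketch of maximum-likelihood decoding by template matching over a grid of candidate stimuli (the function name and the grid-search implementation are mine): with independent Gaussian noise of fixed variance it minimizes the Euclidean distance to the template, while with Poisson noise it maximizes the Poisson log-likelihood instead.

```python
import numpy as np

def ml_decode(r, candidate_s, templates, noise="gaussian"):
    """Template matching: return the candidate s whose template f(s) best explains r.

    templates[k, i] = f_i(candidate_s[k]), the mean activity of neuron i for candidate k.
    """
    r = np.asarray(r, dtype=float)
    if noise == "gaussian":                                   # fixed variance
        scores = -np.sum((templates - r) ** 2, axis=1)        # minus the squared Euclidean distance
    else:                                                     # "poisson"
        scores = np.sum(r * np.log(templates + 1e-12) - templates, axis=1)
    return candidate_s[np.argmax(scores)]
```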

Bayesian approach. We want to recover P(s|r). Using Bayes' theorem, we have: P(s|r) = P(r|s) P(s) / P(r).

Bayesian approach. The prior P(s) corresponds to any knowledge we may have about s before we get to see any activity. Note: the Bayesian approach does not reduce to the use of a prior…

Bayesian approach. Once we have P(s|r), we can proceed in two different ways. We can keep this distribution for Bayesian inferences (as we would do in a Bayesian network), or we can make a decision about s. For instance, we can estimate s as the value that maximizes P(s|r). This is known as the maximum a posteriori (MAP) estimate. For a flat prior, ML and MAP are equivalent.
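A sketch of the Bayesian route on a discretized stimulus grid, assuming independent Poisson noise (all names are mine): it returns the full posterior p(s|r), which can be kept for further inference, together with the MAP estimate; with a flat prior the MAP estimate equals the ML estimate.

```python
import numpy as np

def posterior_and_map(r, candidate_s, templates, prior=None):
    """Grid posterior p(s|r) ∝ P(r|s) P(s) for independent Poisson noise, plus the MAP estimate.

    templates[k, i] = f_i(candidate_s[k]); prior is p(s) on the grid (flat if None).
    """
    r = np.asarray(r, dtype=float)
    log_like = np.sum(r * np.log(templates + 1e-12) - templates, axis=1)
    log_prior = 0.0 if prior is None else np.log(np.asarray(prior) + 1e-12)
    log_post = log_like + log_prior
    log_post -= log_post.max()            # subtract max for numerical stability
    post = np.exp(log_post)
    post /= post.sum()                    # normalize so the grid posterior sums to 1
    return post, candidate_s[np.argmax(post)]
```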

Bayesian approach Limitations: the Bayesian approach and ML require a lot of data (estimating P(r|s) requires at least n+(n-1)(n-1)/2 parameters)…

Bayesian approach. Limitations: the Bayesian approach and ML require a lot of data (estimating P(r|s) requires at least O(n²) parameters; for n = 100, n² = 10,000)… Alternative: estimate P(s|r) directly using a nonlinear estimate (if s is a scalar and P(s|r) is Gaussian, we only need to estimate two parameters!).

Outline: Definition; The encoding process; Decoding population codes; Quantifying information (Shannon and Fisher information); Basis functions and optimal computation

Fisher Information. Fisher information is defined as: I_F(s) = −E[∂² log P(r|s) / ∂s²], and it is equal to: I_F(s) = E[(∂ log P(r|s) / ∂s)²], where P(r|s) is the distribution of the neuronal noise.

For one neuron with Poisson noise: I_F(s) = f′(s)² / f(s). For n independent neurons: I_F(s) = Σ_i f_i′(s)² / f_i(s). The more neurons, the better! Small variance is good! Large slope is good!
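A sketch of this sum for a population of independent Poisson neurons with bell-shaped tuning curves (the parameters are illustrative and match the earlier encoding sketch):

```python
import numpy as np

def fisher_information_poisson(s, preferred, peak_rate=50.0, width=30.0):
    """I_F(s) = sum_i f_i'(s)^2 / f_i(s) for independent Poisson neurons
    with bell-shaped tuning curves over direction (degrees)."""
    d = (s - preferred + 180.0) % 360.0 - 180.0      # circular difference
    f = peak_rate * np.exp(-0.5 * (d / width) ** 2)  # tuning curves f_i(s)
    fprime = -f * d / width ** 2                     # derivatives f_i'(s)
    return np.sum(fprime ** 2 / f)

preferred = np.linspace(-180, 180, 64, endpoint=False)
print(fisher_information_poisson(0.0, preferred))    # more neurons or steeper slopes -> larger I_F
```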

Fisher Information and Tuning Curves. Fisher information is maximum where the slope is maximum; this is consistent with adaptation experiments. Fisher information adds up for independent neurons (unlike Shannon information!).

Fisher Information. In 1D, Fisher information decreases as the width of the tuning curves increases. In 2D, Fisher information does not depend on the width of the tuning curves. In 3D and above, Fisher information increases as the width of the tuning curves increases. WARNING: this is true for independent Gaussian noise.

Ideal observer. The discrimination threshold of an ideal observer, δs, is proportional to the standard deviation given by the Cramér–Rao bound, σ_CR. In other words, an efficient estimator is an ideal observer.

An ideal observer is an observer that can recover all the Fisher information in the activity (this provides an easy link between Fisher information and behavioral performance). If all distributions are Gaussian, Fisher information is the same as Shannon information.

Population Vector and Fisher Information. [Figure: variance of the population vector estimate vs. the Cramér–Rao bound, 1/Fisher information.] The population vector should NEVER be used to estimate information content! The indirect method is prone to severe problems…

Outline: Definition; The encoding process; Decoding population codes; Quantifying information (Shannon and Fisher information); Basis functions and optimal computation

So far we have only talked about decoding from the point of view of an experimentalist. How is that relevant to neural computation? Neurons do not decode, they compute! What kind of computation can we perform with population codes?

Computing functions. If we denote the sensory input as a vector S and the motor command as M, a sensorimotor transformation is a mapping from S to M: M = f(S), where f is typically a nonlinear function.

Example: a two-joint arm. [Figure: joint angles θ1, θ2 and end-point coordinates x, y.]

Basis functions. Most nonlinear functions can be approximated by linear combinations of basis functions. Examples: the Fourier transform; radial basis functions.
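A sketch of a radial-basis-function approximation of a nonlinear function (the target function, number of basis functions, and widths are arbitrary choices): the basis layer is fixed, and only the linear readout weights are fit.

```python
import numpy as np

def rbf_features(x, centers, width=0.5):
    """One Gaussian bump per center: the basis function layer."""
    return np.exp(-0.5 * ((x[:, None] - centers[None, :]) / width) ** 2)

x = np.linspace(-np.pi, np.pi, 200)
target = np.sin(2 * x) + 0.3 * x ** 2                    # some nonlinear function to approximate
centers = np.linspace(-np.pi, np.pi, 25)                 # centers of the basis functions
Phi = rbf_features(x, centers)                           # activity of the basis function layer
w, *_ = np.linalg.lstsq(Phi, target, rcond=None)         # linear combination fit by least squares
print(np.max(np.abs(Phi @ w - target)))                  # approximation error
```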

Basis Functions. [Figure: activity vs. direction (deg), and activity vs. preferred direction (deg).]

Basis Functions. A basis function decomposition is like a three-layer network: input X, a hidden layer whose intermediate units are the basis functions, and output y.

Basis Functions Networks with sigmoidal units are also basis function networks

Basis Function Layer. [Figure: panels A–D — inputs X and Y feed a layer of basis function units, whose outputs (e.g., combinations of X and Y) are linearly combined to produce Z.]

Basis Functions. Decompose the computation of M = f(S, P) into two stages: 1. Compute basis functions of S and P. 2. Combine the basis functions linearly to obtain the motor command.

Basis Functions Note that M can be a population code, e.g. the components of that vector could correspond to units with bell-shaped tuning curves.

Example: computing the head-centered location of an object from its retinal location. [Figure: fixation point, gaze, eye position X_e, retinal location X_r, head-centered location X_a.]

Basis Functions

H_k = R_i + E_j. [Figure: basis function units with preferred retinal location R_i and preferred eye position E_j project to units with preferred head-centered locations; gain fields — activity vs. eye-centered location for E = −20°, 0°, 20°.]

H_k = R_i + E_j. [Figure: same network; the basis function units show partially shifting receptive fields — activity vs. eye-centered location for E = −20°, 0°, 20°.]
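A sketch of this basis-function scheme for the coordinate transform H = R + E (grid sizes, widths, and ranges are arbitrary assumptions): units tuned jointly to retinal location R and eye position E (multiplicative gain fields) form the basis layer, and a linear readout of that layer recovers the head-centered location.

```python
import numpy as np

rng = np.random.default_rng(0)

def gauss(x, centers, width=10.0):
    """Bell-shaped tuning around the preferred values 'centers'."""
    return np.exp(-0.5 * ((x - centers) / width) ** 2)

r_prefs = np.linspace(-40, 40, 17)   # preferred retinal locations R_i
e_prefs = np.linspace(-20, 20, 9)    # preferred eye positions E_j

def basis_layer(R, E):
    """Units tuned to both R and E: response = retinal tuning x eye-position gain field."""
    return np.outer(gauss(R, r_prefs), gauss(E, e_prefs)).ravel()

# Fit a linear readout of the basis layer that recovers H = R + E.
R = rng.uniform(-40, 40, 500)
E = rng.uniform(-20, 20, 500)
Phi = np.array([basis_layer(r, e) for r, e in zip(R, E)])
w, *_ = np.linalg.lstsq(Phi, R + E, rcond=None)
print(np.max(np.abs(Phi @ w - (R + E))))   # readout error (small away from the edges of the range)
```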

Visual receptive fields in VIP are partially shifting with the eye (Duhamel, Bremmer, Ben Hamed and Graf, 1997). [Figure: screen with fixation point, head-centered and retinotopic locations.]

Summary. Definition: population codes involve the concerted activity of large populations of neurons. The encoding process: the activity of the neurons can be formalized as the sum of a tuning curve plus noise.

Summary. Decoding population codes: optimal decoding can be performed with maximum likelihood estimation (ŝ_ML) or Bayesian inference (p(s|r)). Quantifying information (Fisher information): Fisher information provides an upper bound on the amount of information available in a population code.

Summary Basis functions and optimal computation Population codes can be used to perform arbitrary nonlinear transformations because they provide basis sets.

Where do we go from here? Computation and Bayesian inferences. Knill, Koerding, Todorov: experimental evidence for Bayesian inferences in humans. Shadlen: neural basis of Bayesian inferences. Latham, Olshausen: Bayesian inferences in recurrent neural nets.

Where do we go from here? Other encoding hypotheses: probabilistic interpretations. Zemel, Rao.