1
Synergy, redundancy, and independence in population codes, revisited. Or: Are correlations important?
Peter Latham* and Sheila Nirenberg†
*Gatsby Computational Neuroscience Unit
†University of California, Los Angeles
2
The neural coding problem

Estimate the stimulus from the responses: s → r1, r2, ..., rn.

Approach the problem probabilistically (Bayes):

P(s|r1, r2, ..., rn) = P(r1, r2, ..., rn|s) P(s) / P(r1, r2, ..., rn)

Given the measured P(r1, r2, ..., rn|s), can we get to P(s|r1, r2, ..., rn)?
3
P(r|s): easy in 1-D. P(r1, r2|s): harder in 2-D. Impossible in high-D, where "high" ~ 3.

The problem is easy for one neuron (1-D) but harder for populations (≥ 2-D). Why? Because correlations force you to measure the probability in every bin. Note: this problem disappears when the responses are uncorrelated, since then P(r1, r2|s) = P(r1|s) P(r2|s) and only the 1-D marginals need to be measured.
4
● If you want to understand how to decode spike trains, you have to figure out how to deal with correlations. ● The first step is to understand whether you need to deal with correlations. ● In other words, are correlations important?
5
How to determine if correlations are important: get rid of them by treating the cells as though they were independent, and then estimate the stimulus. If your estimate of the stimulus differs from the true estimate, correlations are important; otherwise they are not.

Formally, compare P_ind(s|r1, r2) to P(s|r1, r2), where the independent response distribution gives

P_ind(s|r1, r2) = P(r1|s) P(r2|s) P(s) / Σ_s' P(r1|s') P(r2|s') P(s')

If P_ind(s|r1, r2) ≠ P(s|r1, r2), correlations are important for decoding. If P_ind(s|r1, r2) = P(s|r1, r2), they are not.
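As a numerical sketch of this comparison (all probabilities below are made up), the snippet builds a two-stimulus, two-neuron response table and computes both posteriors:

```python
import numpy as np

# Hypothetical joint conditionals P(r1, r2 | s): two stimuli, binary responses.
P_r_given_s = np.array([
    [[0.4, 0.1], [0.1, 0.4]],   # s = 0: responses tend to agree
    [[0.1, 0.4], [0.4, 0.1]],   # s = 1: responses tend to disagree
])
P_s = np.array([0.5, 0.5])      # equally likely stimuli

# True posterior P(s | r1, r2) via Bayes' rule
joint = P_s[:, None, None] * P_r_given_s
P_s_given_r = joint / joint.sum(axis=0, keepdims=True)

# Independent posterior P_ind(s | r1, r2) from the marginals P(r1|s), P(r2|s)
P_r1_s = P_r_given_s.sum(axis=2)
P_r2_s = P_r_given_s.sum(axis=1)
num = P_s[:, None, None] * P_r1_s[:, :, None] * P_r2_s[:, None, :]
P_ind = num / num.sum(axis=0, keepdims=True)

print("P(s|r):\n", P_s_given_r)
print("P_ind(s|r):\n", P_ind)
```

In this deliberately extreme table the marginals P(r1|s) and P(r2|s) are flat, so P_ind(s|r1, r2) = 1/2 everywhere while the true posterior is decisive; by the criterion above, correlations are important here.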
6
One might wonder: how can P_ind(s|r1, r2) = P(s|r1, r2) when the neurons are correlated, i.e., when P(r1|s) P(r2|s) ≠ P(r1, r2|s)? It can happen: neurons can be correlated, P(r1|s) P(r2|s) ≠ P(r1, r2|s), and yet the correlations don't matter for decoding, P_ind(s|r1, r2) = P(s|r1, r2). [Figure: joint distributions P(r1, r2|s) versus products P(r1|s) P(r2|s) in the (r1, r2) plane, for four stimuli s1–s4.]
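How this can happen is easy to see in a sketch (the stimulus and response values below are invented): give two neurons a shared noise source, so r1 = s + n and r2 = s + n. Given s they are perfectly correlated, yet every response that occurs still points to a single stimulus, so the independent posterior agrees with the true one:

```python
import numpy as np

# Hypothetical example: r1 = s + n, r2 = s + n with shared noise n in {0, 1},
# stimuli s in {0, 2}, equally likely.
P_s = np.array([0.5, 0.5])
stimuli = [0, 2]
P_r_given_s = np.zeros((2, 4, 4))        # indexed (s, r1, r2)
for i, s in enumerate(stimuli):
    for n in (0, 1):                     # shared noise value
        P_r_given_s[i, s + n, s + n] = 0.5

P_r1_s = P_r_given_s.sum(axis=2)
P_r2_s = P_r_given_s.sum(axis=1)

# Correlated given s: the joint is not the product of the conditionals
product = P_r1_s[:, :, None] * P_r2_s[:, None, :]
print("correlated given s:", not np.allclose(P_r_given_s, product))

# Yet the two posteriors agree on every response that actually occurs
joint = P_s[:, None, None] * P_r_given_s
P_r = joint.sum(axis=0)
num = P_s[:, None, None] * product
occurs = P_r > 0
with np.errstate(invalid="ignore", divide="ignore"):
    P_true = joint / P_r
    P_ind = num / num.sum(axis=0, keepdims=True)
print("P_ind == P on occurring responses:",
      np.allclose(P_true[:, occurs], P_ind[:, occurs]))
```

Intuitively, the shared noise moves both responses together along a direction that never crosses a decision boundary, so discarding the correlation costs nothing.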
7
Intuitively, the closer P_ind(s|r1, r2) is to P(s|r1, r2), the less important correlations are. We measure "close" using

ΔI = Σ_{r1, r2, s} P(r1, r2) P(s|r1, r2) log [ P(s|r1, r2) / P_ind(s|r1, r2) ]

ΔI = 0 if and only if P_ind(s|r1, r2) = P(s|r1, r2). ΔI is the penalty, in yes/no questions, for ignoring correlations, and it is an upper bound on the information loss. If ΔI/I is small, you don't lose much information by treating the cells as independent.
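A minimal sketch of the ΔI computation, using the same made-up two-stimulus, binary-response table as before:

```python
import numpy as np

# Toy distribution (made-up numbers): two equally likely stimuli, binary r1, r2.
P_s = np.array([0.5, 0.5])
P_r_given_s = np.array([
    [[0.4, 0.1], [0.1, 0.4]],   # P(r1, r2 | s = 0)
    [[0.1, 0.4], [0.4, 0.1]],   # P(r1, r2 | s = 1)
])

joint = P_s[:, None, None] * P_r_given_s        # P(s, r1, r2)
P_r = joint.sum(axis=0)                         # P(r1, r2)
P_s_given_r = joint / P_r                       # P(s | r1, r2)

# Independent posterior P_ind(s | r1, r2) from the product of marginals
P_r1_s = P_r_given_s.sum(axis=2)
P_r2_s = P_r_given_s.sum(axis=1)
num = P_s[:, None, None] * P_r1_s[:, :, None] * P_r2_s[:, None, :]
P_ind = num / num.sum(axis=0, keepdims=True)

# Delta-I: P(r)-weighted KL divergence between true and independent posteriors
delta_I = np.sum(P_r * np.sum(P_s_given_r * np.log2(P_s_given_r / P_ind), axis=0))

# Mutual information I(r1, r2; s), for the ratio Delta-I / I
I = np.sum(joint * np.log2(P_s_given_r / P_s[:, None, None]))

print(f"Delta-I = {delta_I:.3f} bits, I = {I:.3f} bits, ratio = {delta_I / I:.2f}")
```

In this extreme table the product marginals carry no information, so ΔI = I: ignoring correlations costs everything. A conditionally independent table would instead give ΔI = 0.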
8
Quantifying information loss

Information is the log of the number of messages that can be transmitted over a noisy channel with vanishingly small probability of error.

An example: neurons coding for orientation.
9
You know P(θ|r1, r2); from it you build θ̂(r1, r2), the optimal estimator. You know P_ind(θ|r1, r2); from it you build θ̂_ind(r1, r2), a suboptimal estimator.

For two orientations θ1 and θ2: the distributions of θ̂ given θ1 or θ2 are separated by Δθ, and I ≈ log(180/Δθ). The distributions of θ̂_ind given θ1 or θ2 are separated by Δθ_ind, and I_ind ≈ log(180/Δθ_ind).

Information loss: I − I_ind.
10
If Δθ is large:
● Show multiple trials (n = 20), with θ1 or θ2 presented on each trial.
● The stimuli appear in only two possible orders, and the sequence of estimates θ̂ traces out one of them.
I ≈ log(number of distinct orders) / n; this becomes rigorous as n → ∞.
11
Formal analysis: the general case. Code words are different orderings of the stimuli across trials:

c(1) = s1(1) s2(1) s3(1) ... sn(1)
c(2) = s1(2) s2(2) s3(2) ... sn(2)
...
c(w) = s1(w) s2(w) s3(w) ... sn(w)

● Observe r1 r2 r3 ... rn; guess the code word (guess w).
● More code words = mistakes are more likely.
● How many code words (a.k.a. orders) can you transmit using the true distribution p(r|s) versus an approximate distribution q(r|s)? You can transmit more code words without mistakes if you use p to decode than if you use q.
● The difference tells you how much information you lose by using q rather than p.
12
true probability: p(w|r1, r2, ..., rn) ∝ ∏_i p(r_i|s_i(w)) p(w)
approx. probability: q(w|r1, r2, ..., rn) ∝ ∏_i q(r_i|s_i(w)) p(w)

decode: ŵ = arg max_w ∏_i p(r_i|s_i(w)) or ∏_i q(r_i|s_i(w))

If w* is the true code word, you want ∏_i p(r_i|s_i(w*)) > ∏_i p(r_i|s_i(w)) for all w ≠ w*.

probability of error:
P_e[p, w] = prob { ∏_i p(r_i|s_i(w)) > ∏_i p(r_i|s_i(w*)) }
P_e[q, w] = prob { ∏_i q(r_i|s_i(w)) > ∏_i q(r_i|s_i(w*)) }

The number of code words that can be transmitted with vanishingly small probability of error ~ 1/P_e. By definition,
I[p] = log(1/P_e[p, w]) / n
I[q] = log(1/P_e[q, w]) / n

After some algebra: information loss = I[p] − I[q] ≤ ΔI.
13
Other measures

I_synergy = I(r1, r2; s) − I(r1; s) − I(r2; s)

Intuition: if the responses taken together provide more information than the sum of the individual responses, which can only happen when correlations exist, then correlations "must" be important.

Good points:
● Cool(ish) name.
● Compelling intuition.
Bad points:
● The intuition is wrong: I_synergy can't tell you whether correlations are important.
14
Schneidman, Bialek and Berry (2003) used this example to argue that I_synergy is a good measure of whether or not correlations are important. We find this baffling. I_synergy can be zero, positive, or negative when P_ind(s|r1, r2) = P(s|r1, r2) (Nirenberg and Latham, PNAS, 2003).

Example: a case where you can decode perfectly, that is, P_ind(s|r1, r2) = P(s|r1, r2) for all responses that occur, but I_synergy > 0. [Figure: joint distributions P(r1, r2|s) versus products P(r1|s) P(r2|s) for stimuli s1–s3.]
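A sketch in the spirit of this example (the specific response values are our own construction, not the slide's): three equally likely stimuli, each evoking one of two anti-correlated response pairs. Every pair that occurs identifies its stimulus uniquely, so P_ind(s|r1, r2) = P(s|r1, r2) wherever P(r1, r2) > 0, yet the synergy is positive:

```python
import numpy as np

def mi(p_sr):
    """Mutual information (bits) from a joint distribution P(s, r...)."""
    p_sr = p_sr.reshape(p_sr.shape[0], -1)
    p_s = p_sr.sum(axis=1, keepdims=True)
    p_r = p_sr.sum(axis=0, keepdims=True)
    nz = p_sr > 0
    return np.sum(p_sr[nz] * np.log2((p_sr / (p_s * p_r))[nz]))

# Each stimulus evokes one of two anti-correlated pairs; every occurring
# (r1, r2) pins down the stimulus, so decoding is perfect and P_ind = P
# on all responses that occur.
pairs = {0: [(0, 1), (1, 0)], 1: [(1, 2), (2, 1)], 2: [(2, 0), (0, 2)]}
P = np.zeros((3, 3, 3))                  # (s, r1, r2), stimuli equally likely
for s, rs in pairs.items():
    for r1, r2 in rs:
        P[s, r1, r2] = (1 / 3) * 0.5

I_joint = mi(P)                          # I(r1, r2; s) = log2(3)
I_r1 = mi(P.sum(axis=2))                 # I(r1; s)
I_r2 = mi(P.sum(axis=1))                 # I(r2; s)
synergy = I_joint - I_r1 - I_r2
print(f"I_synergy = {synergy:.3f} bits > 0, yet decoding is perfect")
```

The single-neuron marginals overlap across stimuli (each response value is shared by two stimuli), which depresses I(r1; s) and I(r2; s) and makes the pair look "synergistic" even though correlations are irrelevant for decoding.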
15
ΔI_shuffled = I − I_shuffled, where I_shuffled is the information from neurons that saw the same stimulus but at different times (so that correlations are removed).

Intuition:
1. I_shuffled > I: correlations hurt.
2. I_shuffled < I: correlations help.
3. I_shuffled = I: correlations don't matter.

Good point:
● Can be used to answer high-level questions about the neural code (e.g., what class of correlations increases information?).
Bad points:
● Intuition #3 is false; #1 and #2 are not so relevant, as they correspond to cases the brain doesn't see.
16
Example: a case where you can decode perfectly, that is, P_ind(s|r1, r2) = P(s|r1, r2) for all responses that occur, but ΔI_shuffled > 0: I = 1 bit, I_shuffled = 3/4 bit. [Figure: joint distributions P(r1, r2|s) versus products P(r1|s) P(r2|s) for stimuli s1 and s2.]

ΔI_shuffled can be zero, positive, or negative when P_ind(s|r1, r2) = P(s|r1, r2) (Nirenberg and Latham, PNAS, 2003).
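The slide's numbers can be reproduced with a sketch (the response values are invented; the structure is what matters): two stimuli, each evoking one of two anti-correlated pairs, decoded perfectly. Shuffling trials replaces P(r1, r2|s) by P(r1|s) P(r2|s), and the information drops from 1 bit to 3/4 bit:

```python
import numpy as np

def mi(p_sr):
    """Mutual information (bits) from a joint distribution P(s, r...)."""
    p_sr = p_sr.reshape(p_sr.shape[0], -1)
    p_s = p_sr.sum(axis=1, keepdims=True)
    p_r = p_sr.sum(axis=0, keepdims=True)
    nz = p_sr > 0
    return np.sum(p_sr[nz] * np.log2((p_sr / (p_s * p_r))[nz]))

# s1 evokes (0, 1) or (1, 0); s2 evokes (1, 2) or (2, 1). Every occurring
# response pair identifies the stimulus, so decoding is perfect.
P = np.zeros((2, 3, 3))                  # (s, r1, r2), stimuli equally likely
for s, rs in {0: [(0, 1), (1, 0)], 1: [(1, 2), (2, 1)]}.items():
    for r1, r2 in rs:
        P[s, r1, r2] = 0.25

I_true = mi(P)                           # 1 bit

# Shuffling across trials removes within-stimulus correlations:
# P_shuffled(r1, r2 | s) = P(r1 | s) P(r2 | s)
P_r1_s = P.sum(axis=2) / 0.5             # P(r1 | s)
P_r2_s = P.sum(axis=1) / 0.5             # P(r2 | s)
P_shuf = 0.5 * P_r1_s[:, :, None] * P_r2_s[:, None, :]

I_shuf = mi(P_shuf)                      # 3/4 bit
print(f"I = {I_true:.2f} bits, I_shuffled = {I_shuf:.2f} bits")
```

So ΔI_shuffled = 1/4 bit here even though P_ind(s|r1, r2) = P(s|r1, r2) on every occurring response: the shuffled measure conflates "correlations change the response distribution" with "correlations matter for decoding."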
17
Summary #1

ΔI_shuffled and I_synergy do not measure the importance of correlations for decoding; they are confounded.

ΔI does measure the importance of correlations for decoding: ΔI = 0 if and only if P_ind(s|r1, r2) = P(s|r1, r2), and ΔI is an upper bound on the information loss.
18
Summary #2

1. Our goal was to answer the question: are correlations important for decoding?
2. We developed a quantitative information-theoretic measure, ΔI, which is an upper bound on the information loss associated with ignoring correlations.
3. For pairs of neurons, ΔI/I is small, < 12%, except in the LGN where it's 20-40%.
4. For larger populations, this is still an open question.