Start MATLAB
Bayes tutorial. Konrad Kording and Ian Stevenson.
Why care? Experimental results are noisy. Our own sensors are noisy. See for example: Faisal AA, Selen LP, Wolpert DM (2008) Noise in the nervous system. Nature Reviews Neuroscience 9: 292.
Part I: Probabilities in decoding. Movement onset. (Figure: spike rasters, repetition vs. time, aligned to movement onset.) Obviously we may want not only to classify the direction of movement but also to estimate it; those techniques are very similar to the ones we use here and rely on the same Bayesian formalism.
Part II: Probabilities in perception Brain estimates P(stimulus|percept) Combine information across cues
The tutorial: learn the concepts, try it yourself (anything for you to type is in red courier), use it in the future. Remark: references in the lecture notes are available at www.koerding.com/tutorial. Part I: decoding movement intent. Part II: perception: decoding the world.
Part I: decoding movement intent. See lots of cool videos and interesting papers at: http://motorlab.neurobio.pitt.edu/ For experimental labs see (in no particular order): http://www.physio.northwestern.edu/Secondlevel/Miller/hp.html http://www.stanford.edu/~shenoy/Group.htm http://www.nicolelislab.net/ http://donoghue.neuro.brown.edu/ For decoding techniques see (equally in random order): http://www.stat.columbia.edu/~liam/ http://www.neurostat.mit.edu/ http://koerding.com http://lib.stat.cmu.edu/~kass/ Apologies to all the labs I forgot to include. (Video: from Andy Schwartz's group.)
Part I: Probabilities in decoding Movement onset
Simulated data: one neuron, two directions (100,000 trials). load datasetStart (remember: capitalization is important for MATLAB). This loads two vectors:
neuronL, firing rate during left trial i: 53.9781 57.6395 56.1187 38.0109 67.3739 …
neuronR, firing rate during right trial i: 58.3077 56.3932 38.9440 39.9052 50.7743 …
The simulated neuron has Gaussian tuning properties; this is a very simple-minded model of a neuron. It turns out, though, that the assumption of Gaussianity often works rather well for decoding even when neurons do not have Gaussian statistics (see the work of Shenoy).
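If datasetStart is not at hand, here is a minimal sketch of how such data could be simulated (the means and width below are assumptions, not the tutorial's actual values):
nTrials = 100000;
muL = 55; muR = 45; sigma = 10;             % assumed tuning: direction-dependent mean, common width
neuronL = muL + sigma*randn(nTrials,1);     % firing rates on left-movement trials
neuronR = muR + sigma*randn(nTrials,1);     % firing rates on right-movement trials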
Histograms. plotHistograms. (Figure: hist(neuronL) in the top panel, hist(neuronR) in the bottom panel; number of trials vs. firing rate [sp/s].)
Probability
Probabilities. plotProbability. (Figure: hist(neuronL)/length(neuronL) in the top panel, hist(neuronR)/length(neuronR) in the bottom panel; probability vs. firing rate [sp/s].)
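A sketch of the idea behind plotProbability (the bin centers are an assumption):
x = 0:2:140;                                % assumed firing-rate bins [sp/s]
pL = hist(neuronL, x) / length(neuronL);    % p(rate | left), normalized to sum to 1
pR = hist(neuronR, x) / length(neuronR);    % p(rate | right)
plot(x, pL, x, pR); xlabel('firing rate [sp/s]'); ylabel('probability')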
Gaussian approximations. plotProbabilitiesFromGauss. Each distribution is approximated by a Gaussian: pXgMS = 1/(sqrt(2*pi)*sigma) * exp(-(x-mu).^2/(2*sigma^2)). (Figure: the fitted curves for left and right; probability vs. firing rate [sp/s].)
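A minimal sketch of such a fit for the left-trial data, using the formula above (fitting by moments is an assumption about what the script does):
x = 0:0.5:140;
mu = mean(neuronL); sigma = std(neuronL);
pXgMS = 1/(sqrt(2*pi)*sigma) * exp(-(x-mu).^2/(2*sigma^2));   % Gaussian density p(x | mu, sigma)
plot(x, pXgMS)   % note: a density; multiply by the bin width to overlay on the normalized histogram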
Bayes rule for decoding: p(L|S) = p(S|L) p(L) / (p(S|L) p(L) + p(S|R) p(R)). If p(L) is not equal to p(R), the decoder will be (optimally) biased in its judgments. In this tutorial p(L) = p(R) = 0.5, so the priors cancel.
Decoding. decodeOneNeuron. (Figure: the two likelihoods p(S|L) (pSgL) and p(S|R) (pSgR) in the upper panels, and the decoded p(L|S) = pSgL./(pSgL+pSgR) in the lower panel, all vs. firing rate [Hz].) As we move toward low firing rates the probability of right decreases, but much less so than the probability of left, so the ratio in the lower equation goes toward zero; the opposite is true for high firing rates.
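A sketch of what decodeOneNeuron computes, assuming the Gaussian fits from above and equal priors p(L) = p(R) = 0.5:
S = 20:0.5:100;                                                  % firing rates [Hz]
muL = mean(neuronL); sigmaL = std(neuronL);
muR = mean(neuronR); sigmaR = std(neuronR);
pSgL = 1/(sqrt(2*pi)*sigmaL) * exp(-(S-muL).^2/(2*sigmaL^2));    % p(S | left)
pSgR = 1/(sqrt(2*pi)*sigmaR) * exp(-(S-muR).^2/(2*sigmaR^2));    % p(S | right)
pLgS = pSgL ./ (pSgL + pSgR);                                    % Bayes rule; equal priors cancel
plot(S, pLgS); xlabel('firing rate [Hz]'); ylabel('p(L|S)')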
Typical curves in the CNS. (Figure: two p(L|S) curves vs. spikes [Hz]: a shallow one that is not very informative, and a steep one that is very informative.) The width over which the curve goes from small values to high values depends on the ratio of the distance between the Gaussians to their width. The shallow, not-very-informative case is the typical one in the CNS.
Combining info across neurons. Assumption (naïve Bayes): given the movement intent, the neurons are conditionally independent, so p(Neuron 1, Neuron 2, ... | intent) = p(Neuron 1 | intent) p(Neuron 2 | intent) ... (Graphical model: movement intent with arrows to Neuron 1, Neuron 2, ...). Naïve Bayes is a very good technique in the limit of many observed variables (neurons) and small numbers of observations, for some exciting reasons that do not fit into the notes section here. See: Ng AY (2004) Feature selection, L1 vs. L2 regularization, and rotational invariance. ACM International Conference Proceeding Series. Ng AY, Jordan MI (2002) On discriminative vs. generative classifiers: a comparison of logistic regression and naive Bayes. In: NIPS, vol 14.
Exercise: combine two neurons Same Bayes rule as above
combineTwoNeurons. (Figure: pLgN1 and pLgN2, the decoded p(L) from Neuron 1 and from Neuron 2 on each trial.)
Classifying based on one neuron:
nClassifiedRightWhileLeftN1 = 30
nClassifiedLeftWhileRightN1 = 32
nClassifiedRightWhileLeftN2 = 32
nClassifiedLeftWhileRightN2 = 32
About 30% wrong in all cases. In fact, it is possible to calculate the expected number of errors mathematically in this case.
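For intuition, a sketch of that analytic error rate under the Gaussian model (this assumes equal widths and equal priors, which is a simplification):
muL = mean(neuronL); muR = mean(neuronR);
sigma = (std(neuronL) + std(neuronR)) / 2;      % pooled width (assumption)
d = abs(muL - muR);
pError = 0.5 * erfc((d/(2*sigma))/sqrt(2));     % tail beyond the midpoint boundary, = normcdf(-d/(2*sigma))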
Exercise: combine two neurons. combineTwoNeurons is already prepared so that really only this equation needs to be used.
pN1gL = probability of Neuron 1 spikes given left movement
pN1gR = probability of Neuron 1 spikes given right movement
pN2gL = probability of Neuron 2 spikes given left movement
pN2gR = probability of Neuron 2 spikes given right movement
edit combineTwoNeurons
Exercise: combine information from two neurons.
pBoth = pN1gL.*pN2gL*.5 ./ (pN1gL.*pN2gL*.5 + pN1gR.*pN2gR*.5);
(Figure: p(L|N1), decoding from Neuron 1 only; p(L|N2), decoding from Neuron 2 only; and p(L|N1,N2), decoding from both neurons, plotted against trial number.) Combining does help: instead of about 30 mistakes the decoder makes only about 20.
Model comparison: cross-validate. Train on one part of the data, test on another. For a fascinating discussion of such effects see: Puzzlingly High Correlations in fMRI Studies of Emotion, Personality, and Social Cognition, by Edward Vul, Christine Harris, Piotr Winkielman, and Harold Pashler. In this tutorial we ignore the problem.
Combine info across n neurons. Data: recordings from Lee Miller's group. Thanks to Lee Miller and Anil Cherian. The dataset was rebinned from the original results and we apologize for the way we mangled the data. Please do not draw any scientific conclusions from this; the conversion that we did to get the matrix we are using was a pretty fast hack. realData
Real Data
Decoding from real neurons. realNeuronDecoding. pLeft(i) = prod(pNgLeft(i,:)) just implements the naïve Bayes equation above. >99% correct.
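One practical caveat (not part of the tutorial script): with many neurons the product prod(pNgLeft(i,:)) can underflow, so it is often safer to sum log-likelihoods. A sketch, assuming a matrix pNgRight analogous to pNgLeft:
logL = sum(log(pNgLeft), 2);          % log p(all neurons | left), one entry per trial
logR = sum(log(pNgRight), 2);         % log p(all neurons | right)
pLeft = 1 ./ (1 + exp(logR - logL));  % p(left | data), equal priors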
Relevance of decoding: gene chip data, social neuroscience, marketing, estimating cellular networks, etc. Remark: there are better techniques than naïve Bayes for many cases. Here is a partial list of techniques, each of which may work great on some set of problems: support vector machines, boosting, L1-regularized logistic regression.
Part II: Probabilities in perception Perception is noisy – both for audition and for vision Cues can be combined. Brain estimates P(stimulus|percept) Combine information across cues
Example: McGurk effect. Found this video on the internet but was unable to find out whose video it was. If you made the video, please contact me so I can acknowledge you. First described by: Harry McGurk and John MacDonald, "Hearing lips and seeing voices", Nature 264, 746-748 (1976).
Perception (multiple modalities) Movement intent World state Sensor 1 Neuron 1 Sensor 2 Neuron 2 … … This Bayesian framework has emerged in the late 80s early 90s. Here are some of the early influential papers: Ghahramani Z (1995) Computational and psychophysics of sensorimotor integration. In: Brain and Cognitive Science. Massachusetts Institute of Technology, Cambridge Yuille A, Bulthoff HH (1996) Bayesian decision theory and psychophysics. In: Knill D, Richards W (eds) Perception as Bayesian Inference. Cambridge University Press., Cambridge, U.K. Landy MS, Maloney LT, Johnston EB, Young M (1995) Measurement and modeling of depth cue combination: in defense of weak fusion. Vision Res 35: 389-412
Visuo-auditory cue combination Various labs have variants of this kind of experiment. Projectors or monitors are often used. Some scientists use headphones. Some references Hairston WD, Wallace MT, Vaughan JW, Stein BE, Norris JL, Schirillo JA (2003) Visual localization ability influences cross-modal bias. J Cogn Neurosci 15: 20-29 Wallace MT, Roberson GE, Hairston WD, Stein BE, Vaughan JW, Schirillo JA (2004) Unifying multisensory signals across time and space. Exp Brain Res 158: 252-258 Alais D, Burr D (2004) The ventriloquist effect results from near-optimal bimodal integration. Curr Biol 14: 257-262 Choe CS, Welch RB, Gilford RM, Juola JF (1975) The "ventriloquist effect": visual dominance or response bias. Perception and Psychophysics 18: 55-60 Typical setup: from Hairston & Schirillo 2004
What is different: for direction decoding we did discrete estimation (left vs. right, with likelihoods like p(S|L)); now we do continuous 1D estimation of position.
Likelihood. plotProbabilitySpaceVision. (Figure: probability as a function of the visual percept and the position of the stimulus, both roughly from -10 to 10.) It is the difference between the actual source and the perceived position that matters for the probability.
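A sketch of that visual likelihood over candidate stimulus positions (all numbers assumed):
x = -10:0.1:10;            % candidate positions of the stimulus
perceivedV = 2;            % assumed visual percept
sigmaV = 2;                % assumed visual noise
pV = 1/(sqrt(2*pi)*sigmaV) * exp(-(x-perceivedV).^2/(2*sigmaV^2));
plot(x, pV); xlabel('position of stimulus'); ylabel('probability')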
Combining two cues. combineVisionAudition. (Figure: the visual likelihood centered on the perceived V position, the auditory likelihood centered on the perceived A position, and the combined Posterior ~ pV.*pA, all as probability vs. position.) In reality there may be a third factor, the prior. Priors are important for sensorimotor integration, e.g.: Adams WJ, Graf EW, Ernst MO (2004) Experience can change the 'light-from-above' prior. Nat Neurosci 7: 1057-1058. Brainard DH, Freeman WT (1997) Bayesian color constancy. J Opt Soc Am A Opt Image Sci Vis 14: 1393-1411. Ernst MO, Bulthoff HH (2004) Merging the senses into a robust percept. Trends Cogn Sci 8: 162-169. Flanagan JR, Bittner JP, Johansson RS (2008) Experience can change distinct size-weight priors engaged in lifting objects and judging their weights. Current Biology. Kersten D, Mamassian P, Yuille A (2004) Object perception as Bayesian inference. Annu Rev Psychol 55: 271-304. Kording KP, Wolpert DM (2004) Bayesian integration in sensorimotor learning. Nature 427: 244-247. Miyazaki M, Nozaki D, Nakajima Y (2005) Testing Bayesian models of human coincidence timing. J Neurophysiol 94: 395-399. Miyazaki M, Yamamoto S, Uchida S, Kitazawa S (2006) Bayesian calibration of simultaneity in tactile temporal order judgment. Nat Neurosci 9: 875-877. Stocker AA, Simoncelli EP (2006) Noise characteristics and prior expectations in human visual speed perception. Nat Neurosci 9: 578-585. Tassinari H, Hudson TE, Landy MS (2006) Combining priors and noisy visual cues in a rapid pointing task. J Neurosci 26: 10154-10163.
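Continuing the visual-likelihood sketch above (x, pV, perceivedV, sigmaV as defined there), a sketch of the auditory likelihood and the combined posterior, again with assumed numbers:
perceivedA = -2; sigmaA = 4;                   % assumed auditory percept and noise
pA = 1/(sqrt(2*pi)*sigmaA) * exp(-(x-perceivedA).^2/(2*sigmaA^2));
posterior = pV .* pA;
posterior = posterior / sum(posterior);        % normalize over the position grid
% for Gaussians the posterior peaks at the reliability-weighted average:
% (perceivedV/sigmaV^2 + perceivedA/sigmaA^2) / (1/sigmaV^2 + 1/sigmaA^2)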
What if vision is better? combineVisionAudition2. (Figure: probability vs. position as before, but with a narrower, more reliable visual likelihood; the posterior moves closer to the visual percept.)
Dependence on position of visual feedback. dependenceOnCuePosition. (Figure: best estimate [cm] as a function of feedback position [cm].)
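A sketch of the point this figure makes: under plain Gaussian cue combination the best estimate is a linear function of where the visual feedback appears (all numbers assumed; this is not the tutorial's script):
sigmaV = 2; sigmaA = 4;                                 % assumed noise levels
perceivedA = 0;                                         % audition fixed at 0 cm
feedback = -8:0.5:8;                                    % visual feedback positions [cm]
wV = (1/sigmaV^2) / (1/sigmaV^2 + 1/sigmaA^2);          % weight given to vision
bestEstimate = wV*feedback + (1-wV)*perceivedA;         % linear in feedback position
plot(feedback, bestEstimate); xlabel('feedback position [cm]'); ylabel('best estimate [cm]')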
Typical experimental strategy: measure how good one piece of information is (e.g. sigmaV, from vision-only trials); measure how good another piece of information is (e.g. sigmaA, from audition-only trials); measure the weight actually given to one cue (e.g. wV, from responses to small cue conflicts); compare to the prediction wV = sigmaA^2/(sigmaV^2 + sigmaA^2).
Near universal finding Vision & Audition Priors & Likelihood for movement Vision & Proprioception Texture & Disparity Cognitive science Here is a general list of Bayesian approaches to human behavior. It is highly incomplete. Alais D, Burr D (2004) The ventriloquist effect results from near-optimal bimodal integration. Curr Biol 14: 257-262 Albrecht DW, Zukerman I, Nicholson. AE (1998) Bayesian Models for Keyhole Plan Recognition in an Adventure Game. User Modeling and User-Adapted Interaction 8: 5-47 Barber MJ, Clark JW, Anderson CH (2003) Neural representation of probabilistic information. Neural Comput 15: 1843-1864 Battaglia PW, Schrater PR (2007) Humans trade off viewing time and movement duration to improve visuomotor accuracy in a fast reaching task. J Neurosci 27: 6984-6994 Beierholm U, Shams L, Kording K, Ma WJ (2008) Comparing Bayesian models for multisensory cue combination without mandatory integration. In: Neural Information Processing Systems, Vancouver Brainard DH, Freeman WT (1997) Bayesian color constancy. J Opt Soc Am A Opt Image Sci Vis 14: 1393-1411 Bresciani JP, Dammeier F, Ernst MO (2006) Vision and touch are automatically integrated for the perception of sequences of events. J Vis 6: 554-564 Chater N, Tenenbaum JB, Yuille A (2006) Probabilistic models of cognition: conceptual foundations. Trends Cogn Sci 10: 287-291 Courville AC, Daw ND, Touretzky DS (2006) Bayesian theories of conditioning in a changing world. Trends Cogn Sci 10: 294-300 Ernst M, O., (2006a) A Bayesian view on multimodal cue integration. In: Knoblich G, Thornton, I., M., Grosjean, M., Shiffrar, M., (ed) Human body perception from the inside out. Oxford University Press., New York, pp 105-131 Ernst MO (2006b) A Bayesian view on multimodal cue integration. A Bayesian view on multimodal cue integration. In: Knoblich G, M. , Grosjean I, Thornton M (eds). Oxford University Press, New York, pp 105-131 Ernst MO, Banks MS (2002) Humans integrate visual and haptic information in a statistically optimal fashion. Nature 415: 429-433 Jacobs RA (1999) Optimal integration of texture and motion cues to depth. Vision Res 39: 3621-3629 Kersten D, Yuille A (2003) Bayesian models of object perception. Curr Opin Neurobiol 13: 150-158 Knill D, Richards W (1996) Perception as Bayesian Inference. Cambridge University Press Knill DC (2003) Mixture models and the probabilistic structure of depth cues. Vision Res 43: 831-854 Knill DC (2007) Robust cue integration: a Bayesian model and evidence from cue-conflict studies with stereoscopic and figure cues to slant. J Vis 7: 5 1-24 Kording K (2007) Decision theory: what "should" the nervous system do? Science 318: 606-610 Körding KP, Ku SP, Wolpert D (2004) Bayesian Integration in force estimation. Journal of Neurophysiology 92: 3161-3165 Kording KP, Ku SP, Wolpert DM (2004) Bayesian integration in force estimation. J Neurophysiol 92: 3161-3165 Kording KP, Tenenbaum JB (2006) Causal inference in sensorimotor integration. In: Scholkopf B, Platt J, Hoffman T (eds) Advances in Neural Information Processing Systems vol 19, pp 737-744 Kording KP, Tenenbaum JB, Shadmehr R (2006 ) Multiple timescales and uncertainty in motor adaptation. In: Advances in Neural Information Processing Systems, vol 19 Kording KP, Tenenbaum JB, Shadmehr R (2007) The dynamics of memory as a consequence of optimal adaptation to a changing body. Nat Neurosci 10: 779-786 Kording KP, Wolpert DM (2004a) Bayesian integration in sensorimotor learning. 
Nature 427: 244-247 Kording KP, Wolpert DM (2004b) Probabilistic Inference in Human Sensorimotor Processing. Advances in Neural Information Processing Systems Kording KP, Wolpert DM (2006a) Bayesian decision theory in sensorimotor control. Trends Cogn Sci 10: 319-326 Kording KP, Wolpert DM (2006b) Bayesian decision theory in sensorimotor control. Trends Cogn Sci Laurens J, Droulez J (2007) Bayesian processing of vestibular information. Biol Cybern 96: 389-404 MacNeilage PR, Ganesan N, Angelaki DE (2008) Computational approaches to spatial orientation: from transfer functions to dynamic Bayesian inference. J Neurophysiol 100: 2981-2996 Miyazaki M, Nozaki D, Nakajima Y (2005) Testing Bayesian models of human coincidence timing. J Neurophysiol 94: 395-399 Miyazaki M, Yamamoto S, Uchida S, Kitazawa S (2006) Bayesian calibration of simultaneity in tactile temporal order judgment. Nat Neurosci 9: 875-877 Najemnik J, Geisler WS (2005) Optimal eye movement strategies in visual search. Nature 434: 387-391 Roach NW, Heron J, McGraw PV (2006) Resolving multisensory conflict: a strategy for balancing the costs and benefits of audio-visual integration. Proc Biol Sci 273: 2159-2168 Rowland B, Stanford T, Stein BE (2007) A Bayesian model unifies multisensory spatial localization with the physiological properties of the superior colliculus. Exp Brain Res DOI 10.1007/s00221-006-0847-2 Sato Y, Toyoizumi T, Aihara K (2007) Bayesian inference explains perception of unity and ventriloquism aftereffect: identification of common sources of audiovisual stimuli. Neural Comput 19: 3335-3355 Sobel D, Tenenbaum JB, Gopnik A (2004) Children's causal inferences from indirect evidence: Backwards blocking and Bayesian reasoning in preschoolers. . Cognitive Science 28: 303-333 Steyvers I, Tenenbaum JB, Wagenmakers EJ, Blum B (2003) Inferring causal networks from observations and interventions. Cognitive Science 27: 453-489. Stocker A, Simoncelli E (2005a) Sensory Adaptation within a Bayesian Framework for Perception. In: Weiss Y, Scholkopf B, J. P (eds) Advances in Neural Information Processing Systems. MIT Press, Vancouver BC Canada Stocker A, Simoncelli EP (2005b) Constraining a Bayesian model of human visual speed perception. In: Adv. Neural Information Processing Systems, v17, Stocker AA, Simoncelli EP (2006) Noise characteristics and prior expectations in human visual speed perception. Nat Neurosci 9: 578-585 Tassinari H, Hudson TE, Landy MS (2006) Combining priors and noisy visual cues in a rapid pointing task. J Neurosci 26: 10154-10163 Tenenbaum JB, Griffiths TL, Kemp C (2006) Theory-based Bayesian models of inductive learning and reasoning. Trends Cogn Sci 10: 309-318 Tenenbaum JB, Niyogi S (2003) Learning causal laws. In: Twenty-Fifth Annual Conference of the Cognitive Science Society Todorov E (2004) Optimality principles in sensorimotor control. Nat Neurosci 7: 907-915 Todorov E (2005) Stochastic optimal control and estimation methods adapted to the noise characteristics of the sensorimotor system. Neural Comput 17: 1084-1108 Todorov E, Ghahramani Z (2004) Analysis of the synergies underlying complex hand manipulation. Conf Proc IEEE Eng Med Biol Soc 6: 4637-4640 van Ee R, Adams WJ, Mamassian P (2003) Bayesian modeling of cue interaction: Bistability in stereoscopic slant perception. Journal of the Optical Society of America, A 20: 1398-1406 Wei K, Kording K (2009) Relevance of error: what drives motor adaptation? 
J Neurophysiol 101: 655-664 Wei K, Kording KP (2008) Relevance of error: what drives motor adaptation? J Neurophysiol Wolpert DM (2007) Probabilistic models in human sensorimotor control. Hum Mov Sci Yuille A, Bulthoff HH (1996) Bayesian decision theory and psychophysics. In: Knill D, Richards W (eds) Perception as Bayesian Inference. Cambridge University Press., Cambridge, U.K. Yuille A, Kersten D (2006) Vision as Bayesian inference: analysis by synthesis? Trends Cogn Sci 10: 301-308 Yuille AL, Bulthoff HH (1996 ) Bayesian decision theory and psychophysics In: Perception as Bayesian inference Cambridge University Press, Cambridge pp 123-161 Zupan LH, Merfeld DM, Darlot C (2002) Using sensory weighting to model the influence of canal, otolith and visual cues on spatial orientation and eye movements. Biological cybernetics 86: 209-230 countless papers in many domains
Classical interpretation: auditory input yields an auditory estimate, visual input yields a visual estimate, and the two are weighted to give a single position. The problem in this framework is that the weights need to change constantly, since sensory uncertainty is affected by various factors. The probabilistic framework led to a fundamental reinterpretation. (What follows is some newer development.)
But, back to the assumptions. (Generative model so far: a single position gives rise to both the visual and the auditory cue, and possibly others.)
Some more psychophysics
Causality / relevance. With probability P(causal), a single position drives both vision and audition; with probability P(not causal), vision and audition have separate sources and the other cue is irrelevant. See: Kording KP, Beierholm U, Ma WJ, Quartz S, Tenenbaum JB, Shams L (2007) Causal inference in multisensory perception. PLoS ONE 2: e943. Also see: Knill DC (2003) Mixture models and the probabilistic structure of depth cues. Vision Res 43: 831-854. Knill DC (2007) Robust cue integration: a Bayesian model and evidence from cue-conflict studies with stereoscopic and figure cues to slant. J Vis 7: 5 1-24. Beierholm U, Shams L, Kording K, Ma WJ (2008) Comparing Bayesian models for multisensory cue combination without mandatory integration. In: Neural Information Processing Systems, Vancouver. Shams L, Ma WJ, Beierholm U (2005) Sound-induced flash illusion as an optimal percept. Neuroreport 16: 1923-1927.
Causal inference: estimation for audition. If the cues share a common cause, the auditory estimate combines vision and audition; otherwise (vision is irrelevant) it relies on audition alone. Remark: we ignore priors here. They are important, but they would just make the derivation here significantly (p<.005) more complicated.
Nonlinear motor adaptation From Wei and Kording, J Neurophys, 2009
Exercises: causal inference. edit dependenceOnCuePosition2. Explore how the data I just showed (sublinear integration) can be replicated in a causal inference model, by giving the visual likelihood a small "irrelevant" component in addition to the "causal" Gaussian:
alpha = .00005;
pV = (1-alpha)*exp(-(x-muV).^2/(2*sigmaV^2)) + alpha;
The change goes in before the normalization; otherwise the alpha constant must be different.
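A self-contained sketch of what such a modified script might compute (all numbers and variable names are assumptions; only the pV line above comes from the exercise):
x = -15:0.05:15;                 % position grid [cm]
sigmaV = 2; sigmaA = 4;          % assumed noise levels
muA = 0;                         % auditory percept fixed at 0
alpha = .00005;
feedback = -8:0.5:8;             % visual feedback positions [cm]
best = zeros(size(feedback));
for i = 1:length(feedback)
    muV = feedback(i);
    pV = (1-alpha)*exp(-(x-muV).^2/(2*sigmaV^2)) + alpha;   % vision with an outlier component
    pA = exp(-(x-muA).^2/(2*sigmaA^2));                     % audition
    post = pV .* pA;  post = post / sum(post);              % normalize over the grid
    best(i) = sum(x .* post);                               % posterior-mean estimate
end
plot(feedback, best); xlabel('feedback position [cm]'); ylabel('best estimate [cm]')
For small conflicts the estimate follows the feedback almost linearly; for large conflicts the flat alpha component dominates the visual likelihood and the estimate falls back toward audition, reproducing the sublinear curve.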
Causal inference. (Figure: best estimate [cm] vs. feedback position [cm], now sublinear: the estimate follows the visual feedback for small offsets and falls back toward audition for large ones.)
Why talk about causal inference? The Bayesian approach is modular: to solve a new problem we can build almost entirely on previous work and take advantage of others' experiments.
Thank you check www.koerding.com/tutorial for more information send email to: i-stevenson@northwestern.edu or kk@northwestern.edu with any questions