Dopamine, Uncertainty and TD Learning
CoSyNe'04
Yael Niv, Michael Duff, Peter Dayan

What does Dopamine encode?
Important neuromodulator:
- Neurological/psychiatric disorders
- Drug addiction/self-stimulation
Fundamental role in reinforcement learning:
- Classical/Pavlovian conditioning
- Instrumental/operant conditioning
DA neurons respond to:
- Unexpected (appetitive) rewards
- Stimuli predicting (appetitive) rewards
- Withdrawal of expected rewards
- Novel/salient stimuli

What does Dopamine encode?  DA represents some aspect of reward, but not rewards as such.

The TD Hypothesis of Dopamine
DA encodes the reward prediction error δ(t)
[Figure: DA responses aligned to stimulus and reward]
- Precise theory for the generation of DA firing patterns
- Compelling account for the role of DA in classical conditioning

But: Fiorillo, Tobler & Schultz 2003
Introduced inherent uncertainty into the classical conditioning paradigm:
- Five visual stimuli indicating different reward probabilities: p = 0, ¼, ½, ¾, 1
- CS = 2 s visual stimulus
- US (probabilistic) = drops of juice

Fiorillo, Tobler & Schultz 2003
At stimulus time: DA represents the mean expected reward
Interesting: a ramp in activity up to the time of reward (highest for p = ½)
Their hypothesis: the DA ramp encodes uncertainty in reward

Dopamine: Uncertainty or TD error?
- No apparent reason for a ramp under TD: the ramp is predictable from the stimulus, and TD predicts away predictable quantities. Contradiction!
- Side issue: the ramp would be like a constantly surprising reward, so it cannot usefully influence action choice

A closer look at FTS's results
At the time of reward: prediction errors result from uncertainty
Crucially: positive and negative errors should cancel out on average
[Figure: PSTHs for p = 0.5 and p = 0.75]

A closer look at FTS's results
- TD error δ(t) can be positive or negative
- Neuronal firing rate is only positive (negative values are coded relative to the baseline firing rate)
- But: the DA baseline firing rate is low, forcing asymmetric encoding of δ(t)
[Figure: δ(t) vs. DA response; the dynamic range above baseline (~270%) far exceeds that below it (~55%)]
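The cancellation argument can be checked with toy arithmetic (illustrative numbers assumed: reward magnitude r = 1, and the d = 1/6 scaling of negative errors from the modeling slides):

```python
# Toy arithmetic: at p = 0.5 the TD error at reward time is +(1-p)r with
# probability p and -p*r with probability 1-p (assuming a converged prediction).
p, r = 0.5, 1.0
d = 1.0 / 6.0                                          # scaling of negative errors
mean_true = p * (1 - p) * r + (1 - p) * (-p * r)       # errors cancel exactly
mean_coded = p * (1 - p) * r + (1 - p) * d * (-p * r)  # net positive after scaling
print(mean_true, round(mean_coded, 3))                 # -> 0.0 0.208
```

So errors that cancel in the true signal leave a net positive trace once the negative lobe is compressed, which is exactly what a summed PSTH would pick up.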

Modeling TD with asymmetric errors
- Tapped delay line representation: taps x(1), x(2), …, predictions V(1)…V(20), inputs r(t), error δ(t)
- Standard online TD learning with a fixed learning rate
- Negative δ(t) scaled by d = 1/6 prior to computing the PSTH
- Learning proceeds normally (without scaling):
  - necessary to produce the right predictions
  - can be biologically plausible
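A minimal sketch of this model (not the authors' actual simulation; trial length T, learning rate, and trial counts are illustrative assumptions, while d = 1/6 is from the slide):

```python
import numpy as np

# Tapped-delay-line TD(0) for a probabilistic-reward trial, with the
# asymmetric scaling of negative errors applied only to the recorded PSTH.
rng = np.random.default_rng(0)
T, p = 20, 0.5                   # timesteps per trial; reward probability
alpha, gamma, d = 0.1, 1.0, 1.0 / 6.0

w = np.zeros(T)                  # one weight per tap => V(t) = w[t]
deltas = np.zeros(T)             # mean scaled error per timestep (the "PSTH")

n_trials, n_record = 5000, 1000
for trial in range(n_trials):
    r = np.zeros(T)
    if rng.random() < p:
        r[T - 1] = 1.0           # probabilistic juice at trial end
    for t in range(T):
        v_next = w[t + 1] if t + 1 < T else 0.0
        delta = r[t] + gamma * v_next - w[t]
        w[t] += alpha * delta    # learning uses the *unscaled* error
        if trial >= n_trials - n_record:
            scaled = delta if delta >= 0 else d * delta
            deltas[t] += scaled / n_record   # asymmetry only in the readout

print(round(w[0], 2))            # stimulus-time value, approximately p
```

Because the learning rate stays fixed, the predictions keep fluctuating with recent reward history; the asymmetric readout then turns those zero-mean fluctuations into a positive ramp in `deltas`.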

Modeling TD with asymmetric errors
TD learning with asymmetric prediction errors replicates the recorded data accurately.
- Ramps result from asymmetrically coded prediction errors propagating back to the stimulus
- The ramp is an artifact of summing PSTHs over nonstationary recent reward histories

DA: Uncertainty or Temporal Difference?
Analytically, the mean error at the time of reward under asymmetric coding is proportional to p(1−p), so the ramp is indeed highest for p = ½.
But: DA encodes nothing but the temporal difference error!
Experimental test: is the ramp a within-trial or a between-trial phenomenon?
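The p(1−p) dependence follows from a short calculation (a sketch, assuming the value prediction has converged to $V = pr$ and that negative errors are displayed scaled by $d < 1$):

```latex
% Reward delivered (prob. p):   \delta_+ = r - pr = (1-p)\,r
% Reward omitted  (prob. 1-p):  \delta_- = -p\,r, displayed scaled by d
\mathbb{E}\big[\delta_{\mathrm{measured}}\big]
  = p\,(1-p)\,r + (1-p)\,d\,(-p\,r)
  = (1-d)\,p\,(1-p)\,r
```

which is maximized at $p = \tfrac{1}{2}$ for any $d < 1$, and vanishes when the coding is symmetric ($d = 1$).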

Trace conditioning: a puzzle and its resolution
- CS = short visual stimulus, then a trace period, then US (probabilistic) = drops of juice
- Same (if not more) uncertainty, but… no DA ramping! (Fiorillo et al.; Morris, Arkadir, Nevet, Vaadia & Bergman)
- Resolution: a lower learning rate in trace conditioning eliminates the ramp

Modeling TD with asymmetric errors
The small response to the reward at p = 1, and to the stimulus at p = 0, results from occasional misidentification of the stimuli (Morris, Arkadir, Nevet, Vaadia & Bergman)

Other sources of uncertainty: representational noise (1)
- Rate coding is inherently stochastic (Mirenowicz and Schultz, 1996)
- Add noise to the tapped delay line representation
- => TD learning is robust to this type of noise
[Figure: prediction errors and weights for several noise levels σ]
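One way to probe this (a sketch under assumed details: the slide does not specify the noise model, so here the active tap's firing rate is jittered multiplicatively with σ = 0.1, and all other parameters are illustrative):

```python
import numpy as np

# TD(0) on a tapped delay line whose active tap has a stochastic rate.
rng = np.random.default_rng(1)
T, p = 20, 0.5
alpha, gamma, sigma = 0.05, 1.0, 0.1

def noisy_tap(t):
    """One-hot code for within-trial time t with multiplicative rate noise."""
    x = np.zeros(T)
    x[t] = 1.0 + sigma * rng.standard_normal()
    return x

w = np.zeros(T)
for trial in range(10000):
    reward = 1.0 if rng.random() < p else 0.0
    x_t = noisy_tap(0)
    for t in range(T):
        r_t = reward if t == T - 1 else 0.0
        x_next = noisy_tap(t + 1) if t + 1 < T else np.zeros(T)
        delta = r_t + gamma * (w @ x_next) - w @ x_t
        w += alpha * delta * x_t
        x_t = x_next

# Predictions degrade only gracefully: values stay in the right range
# all the way back to the stimulus, rather than collapsing.
print(round(float(w[T - 1]), 2), round(float(w[0]), 2))
```

With this kind of amplitude noise the learned values shrink slightly as the error propagates back through the taps, but the qualitative predictions survive, consistent with the robustness claim on the slide.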

Other sources of uncertainty: representational noise (2)
- Neural timing of events is necessarily inaccurate
- Add temporal noise to the tapped delay line representation
- => Devastating effects of even small amounts of temporal noise on TD's predictions!
[Figure: predictions for ε = 0.05 and ε = 0.10]

Conclusions
Preserve the TD hypothesis of dopamine:
- No explicit coding of uncertainty
- Ramping explained by neural constraints
- Explains the disappearance of the ramp in trace conditioning
Important challenges to the TD hypothesis:
- Conditioned inhibition
- Effects of timing