1
Dopamine, Uncertainty and TD Learning
CNS 2004
Yael Niv, Michael Duff, Peter Dayan
Gatsby Computational Neuroscience Unit, UCL
2
What is the function of Dopamine?
[Figure labels: Dorsal Striatum (Caudate, Putamen); Ventral Tegmental Area; Substantia Nigra; Amygdala; Nucleus Accumbens (Ventral Striatum); Prefrontal Cortex]
Parkinson’s Disease -> Movement control?
Intracranial self-stimulation; Drug addiction -> Reward pathway? -> Learning?
Also involved in:
- Working memory
- Novel situations
- ADHD
- Schizophrenia
- …
3
What does phasic Dopamine encode?
- Unpredicted reward (neutral/no stimulus)
- Predicted reward (learned task)
- Omitted reward (probe trial)
(Schultz et al.)
4
The TD Hypothesis of Dopamine
Phasic DA encodes a reward prediction error
- Precise theory for generation of DA firing patterns
- Compelling account for the role of DA in classical conditioning
(Sutton+Barto 1987; Schultz, Dayan, Montague 1997)
Temporal difference error
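The slide names the temporal difference error without spelling it out. A minimal sketch, assuming the standard form δ(t) = r(t) + γV(t+1) − V(t) and a tapped-delay-line representation with one value weight per time step after stimulus onset, reproduces the three phasic response patterns listed earlier (unpredicted, predicted and omitted reward). The function name `td_trial`, the 10 time steps, the learning rate 0.1, γ = 1 and the unit reward are illustrative assumptions, not values from the talk.

```python
import numpy as np

def td_trial(w, reward, alpha=0.1, gamma=1.0, learn=True):
    """Run one trial; return TD errors at stimulus onset and at each later step.

    w[k] is the learned value of the k-th delay-line state after stimulus onset.
    The pre-stimulus state and the post-reward state are assumed to have value 0.
    """
    n = len(w)
    values = np.append(w, 0.0)           # values[n] = 0 once the reward time has passed
    delta = np.zeros(n + 1)
    delta[0] = gamma * values[0]         # stimulus arrives unpredicted: r = 0, previous V = 0
    for k in range(n):
        r = reward if k == n - 1 else 0.0    # reward (if any) comes at the last step
        delta[k + 1] = r + gamma * values[k + 1] - values[k]
    if learn:
        w += alpha * delta[1:]           # in-place TD(0) update, one weight per state
    return delta

w = np.zeros(10)
naive = td_trial(w.copy(), reward=1.0, learn=False)   # before any learning
for _ in range(1000):                                  # deterministic reward on every trial
    td_trial(w, reward=1.0)
predicted = td_trial(w, reward=1.0, learn=False)       # learned task, reward delivered
omitted = td_trial(w, reward=0.0, learn=False)         # probe trial, reward withheld

print("unpredicted reward, error at reward:", naive[-1])
print("predicted reward, error at stimulus / reward:", predicted[0], predicted[-1])
print("omitted reward, error at reward time:", omitted[-1])
```

Under these assumptions the error sits at the reward before learning, moves to the stimulus after learning, and turns negative at the expected reward time when the reward is withheld.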
5
But: Fiorillo, Tobler & Schultz 2003
- Introduce inherent uncertainty into the classical conditioning paradigm
- Five visual stimuli indicating different reward probabilities: P = 100%, 75%, 50%, 25%, 0%
[Figure labels: Stimulus = 2 sec visual stimulus; Reward (probabilistic) = drops of juice]
6
Fiorillo, Tobler & Schultz 2003
- At stimulus time: DA represents mean expected reward
- Delay activity: A ramp in activity up to reward
Hypothesis: DA ramp encodes uncertainty in reward
7
“Uncertainty Ramping” and TD error?
- The uncertainty is predictable from the stimulus
- TD predicts away predictable quantities
- If it represents uncertainty, the ramping activity should disappear with learning according to TD
Uncertainty ramping is not easily compatible with the TD hypothesis
Are the ramps really coding uncertainty?
8
A closer look at FTS’s results
At time of reward:
- Prediction errors result from probabilistic reward delivery
- Crucially: Positive and negative errors cancel out
[Figure panels: p = 50%, p = 75%]
9
A TD Resolution:
- TD prediction error δ(t) can be positive or negative
- Neuronal firing rate is only positive (negative values can be encoded relative to base firing rate)
- But: DA base firing rate is low -> asymmetric encoding of δ(t)
[Figure labels: δ(t), DA, 55%, 270%]
10
Simulating TD with asymmetric errors
Negative δ(t) scaled by d=1/6 prior to PSTH summation
Learning proceeds normally (without scaling)
- Necessary to produce the right predictions
- Can be biologically plausible
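The procedure on this slide can be sketched as follows: learn with the unscaled TD error, then scale negative errors by d before averaging across trials (standing in for the PSTH summation). Only d = 1/6 comes from the slide; the function name `mean_scaled_errors`, the 10-step tapped delay line, learning rate 0.1, γ = 1, unit reward, burn-in length and random seed are assumptions made for illustration.

```python
import numpy as np

def mean_scaled_errors(p, n_steps=10, n_trials=5000, burn_in=1000,
                       alpha=0.1, d=1/6, seed=0):
    """Average asymmetrically scaled TD errors per time step, across trials."""
    rng = np.random.default_rng(seed)
    w = np.zeros(n_steps)                    # one value weight per delay-line state
    total = np.zeros(n_steps)
    for trial in range(n_trials):
        r = float(rng.random() < p)          # unit reward with probability p
        values = np.append(w, 0.0)           # value after the reward time is 0
        delta = values[1:] - values[:-1]     # gamma = 1, no reward before the last step
        delta[-1] += r                       # reward arrives at the last step
        w += alpha * delta                   # learning uses the unscaled errors
        if trial >= burn_in:
            # scale only the negative errors before summing over trials
            total += np.where(delta < 0, d * delta, delta)
    return total / (n_trials - burn_in)

ramp = mean_scaled_errors(p=0.5)
print(np.round(ramp, 3))   # averaged scaled error at each step of the delay period
```

Because learning never stops, the predictions keep fluctuating from trial to trial, so nonzero errors appear throughout the delay; once negative errors are compressed, their across-trial average no longer cancels. In this sketch the averaged error is largest at the time of reward.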
11
DA - Uncertainty or Temporal Difference?
[Figure panels: Experiment, Model]
With asymmetric coding of errors, the mean TD error at the time of reward is proportional to p(1-p) => Maximal at p=50%
However:
- No need to assume explicit coding of uncertainty: ramping is explained by neural constraints.
- Explanation for puzzling absence of ramp in trace conditioning results.
- Experimental test: Ramp as within or between trial phenomenon?
- Challenges: TD and noise; Conditioned inhibition, additivity
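The p(1-p) dependence of the mean error at the time of reward follows in a few lines. The derivation below is a sketch under two assumptions not stated on the slide: the reward has unit magnitude, and the learned prediction just before reward equals p. The factor d is the scaling applied to negative errors (d = 1/6 in the simulation slide).

```latex
% TD error at reward time, given prediction V = p and unit reward:
%   rewarded   (probability p):     \delta = 1 - p
%   unrewarded (probability 1 - p): \delta = -p
\begin{align*}
\mathbb{E}[\delta]
  &= p\,(1-p) + (1-p)\,(-p) = 0
  && \text{(symmetric coding: errors cancel)} \\
\mathbb{E}[\delta_{\text{scaled}}]
  &= p\,(1-p) + d\,(1-p)\,(-p) = (1-d)\,p\,(1-p)
  && \text{(negative errors scaled by } d\text{)} \\
\arg\max_{p}\; p\,(1-p) &= \tfrac{1}{2},
  \qquad
  \mathbb{E}[\delta_{\text{scaled}}]\big|_{p=1/2,\; d=1/6}
  = \tfrac{5}{6}\cdot\tfrac{1}{4} = \tfrac{5}{24} \approx 0.21
\end{align*}
```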
12
Trace conditioning: A puzzle and its resolution
[Figure labels: CS = short visual stimulus; Trace period; US (probabilistic) = drops of juice]
- Same (if not more) uncertainty, but no DA ramping (Fiorillo et al.; Morris, Arkadir, Nevet, Vaadia & Bergman)
- Resolution: lower learning rate in trace conditioning eliminates ramp
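One way to see why a lower learning rate could shrink the ramp, assuming the ramp is driven by trial-to-trial fluctuations of the learned predictions (as the asymmetric-scaling account suggests): for the simplest update w <- w + α(r − w) with a Bernoulli(p) reward, the stationary variance of w is αp(1−p)/(2−α), which falls roughly in proportion to α. The sketch below checks that formula numerically; the function names, the single-weight simplification and the chosen values of α are illustrative assumptions, not the model used in the talk.

```python
import numpy as np

def stationary_variance(alpha, p):
    """Stationary variance of w under w <- w + alpha * (r - w), r ~ Bernoulli(p)."""
    return alpha * p * (1 - p) / (2 - alpha)

def simulated_variance(alpha, p, n_trials=200_000, seed=0):
    """Empirical variance of the same weight over a long run of trials."""
    rng = np.random.default_rng(seed)
    rewards = (rng.random(n_trials) < p).astype(float)
    w = p                                # start at the mean prediction
    trace = np.empty(n_trials)
    for i, r in enumerate(rewards):
        w += alpha * (r - w)             # prediction tracks the recent reward history
        trace[i] = w
    return trace.var()

for alpha in (0.1, 0.01):
    print(alpha, stationary_variance(alpha, 0.5), simulated_variance(alpha, 0.5))
```

With smaller fluctuations in the prediction, the errors that propagate back through the delay period are smaller too, so less remains after asymmetric scaling and averaging.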
13
Other sources of uncertainty: Representational Noise (1)
- Rate coding is inherently stochastic
- Add noise to tapped delay line representation
=> TD learning is robust to this type of noise
[Figure panels: σ = 0.0577, σ = 0.0866, σ = 0.1155; labels: prediction error, weights]
Mirenowicz and Schultz (1996)
14
Other sources of uncertainty: Representational Noise (2)
- Neural timing of events is necessarily inaccurate
- Add temporal noise to tapped delay line representation
=> Devastating effects of even small amounts of temporal noise on TD predictions
[Figure panels: ε = 0.05, ε = 0.10]