Dopamine, Uncertainty and TD Learning. CNS 2004. Yael Niv, Michael Duff, Peter Dayan. Gatsby Computational Neuroscience Unit, UCL.

What is the function of Dopamine?
Dopaminergic nuclei (Ventral Tegmental Area, Substantia Nigra) project to the Dorsal Striatum (Caudate, Putamen), Nucleus Accumbens (Ventral Striatum), Amygdala and Prefrontal Cortex.
- Parkinson's Disease -> movement control?
- Intracranial self-stimulation; drug addiction -> reward pathway? -> learning?
Also involved in: working memory, novel situations, ADHD, schizophrenia, ...

What does phasic Dopamine encode?
- Unpredicted reward (neutral/no stimulus): phasic burst at reward delivery
- Predicted reward (learned task): the burst transfers to the predictive stimulus
- Omitted reward (probe trial): depression of firing at the expected reward time
(Schultz et al.)

The TD Hypothesis of Dopamine
- Phasic DA encodes a reward prediction error: the temporal-difference (TD) error, delta(t) = r(t) + V(t+1) - V(t)
- A precise theory for the generation of DA firing patterns
- A compelling account of the role of DA in classical conditioning
(Sutton & Barto 1987; Schultz, Dayan & Montague 1997)
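The TD hypothesis with the tapped-delay-line stimulus representation used later in the talk can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code; all parameter values (trial length, CS and reward times, learning rate) are illustrative, and gamma is taken as 1.

```python
import numpy as np

def td_learning(T=25, cs_time=5, reward_time=20, n_trials=1000, alpha=0.1):
    """TD(0) prediction learning with a tapped-delay-line stimulus
    representation: one binary feature per time step since CS onset."""
    w = np.zeros(T)                        # one weight per delay-line tap
    for _ in range(n_trials):
        x_prev = np.zeros(T)
        v_prev = 0.0
        for t in range(T):
            x = np.zeros(T)
            if t >= cs_time:
                x[t - cs_time] = 1.0       # tap active (t - CS onset) steps after CS
            v = w @ x                      # linear value prediction V(t)
            r = 1.0 if t == reward_time else 0.0
            delta = r + v - v_prev         # TD error delta(t) = r(t) + V(t) - V(t-1)
            w += alpha * delta * x_prev    # credit the previous state's features
            x_prev, v_prev = x, v
    return w

w = td_learning()
# After learning, the prediction is ~1 from CS onset up to the reward,
# so the prediction error has moved from the reward to the stimulus.
```

This reproduces the basic Schultz-style pattern: after training, delta is zero at the (now predicted) reward and nonzero at the unpredicted CS onset.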

But: Fiorillo, Tobler & Schultz 2003
- Introduced inherent uncertainty into the classical conditioning paradigm
- Five visual stimuli indicating different reward probabilities: p = 100%, 75%, 50%, 25%, 0%
- Stimulus = 2 sec visual stimulus
- Reward (probabilistic) = drops of juice

Fiorillo, Tobler & Schultz 2003
- At stimulus time, DA represents the mean expected reward
- Delay activity: a ramp in firing from stimulus up to the reward time, largest at p = 50%
- Their hypothesis: the DA ramp encodes uncertainty in reward

"Uncertainty ramping" and the TD error?
- The uncertainty is predictable from the stimulus, and TD predicts away predictable quantities
- If the ramp represented uncertainty, it should therefore disappear with learning under TD
- So uncertainty ramping is not easily compatible with the TD hypothesis
- Are the ramps really coding uncertainty?

A closer look at FTS's results (p = 50%, p = 75%)
At the time of reward:
- Prediction errors result from probabilistic reward delivery
- Crucially: positive errors (rewarded trials) and negative errors (unrewarded trials) cancel out on average

A TD Resolution
- The TD prediction error delta(t) can be positive or negative
- A neuronal firing rate is only positive (negative values can be encoded relative to the baseline firing rate)
- But the DA baseline firing rate is low -> asymmetric encoding of delta(t)
(Figure: DA firing rate as a function of delta(t); scale marks at 55% and 270%.)

Simulating TD with asymmetric errors
- Negative delta(t) scaled by d = 1/6 prior to PSTH summation
- Learning itself proceeds normally (without scaling):
  - necessary to produce the right predictions
  - can be biologically plausible
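The effect of this scaling on the averaged signal can be checked numerically. Once learning has converged (V = p), the error at reward time is 1-p with probability p and -p otherwise, so the raw mean is zero; scaling negatives by d = 1/6 before averaging leaves a mean of (1-d)*p*(1-p). A quick sketch (illustrative code with assumed trial counts, not the authors' simulation):

```python
import numpy as np

def mean_recorded_error(p, d=1/6, n=100_000, seed=0):
    """Mean TD error at reward time after learning (V = p), with negative
    errors scaled by d before averaging, as in a PSTH built from a neuron
    with a low baseline firing rate."""
    rng = np.random.default_rng(seed)
    rewarded = rng.random(n) < p
    delta = np.where(rewarded, 1.0 - p, -p)           # per-trial TD error
    recorded = np.where(delta < 0, d * delta, delta)  # asymmetric coding
    return recorded.mean()

for p in (0.25, 0.5, 0.75):
    print(f"p={p}: mean recorded error = {mean_recorded_error(p):.3f}, "
          f"(1-d)*p*(1-p) = {(1 - 1/6) * p * (1 - p):.3f}")
```

The recorded mean is maximal at p = 50%, matching the p(1-p) argument, while an unscaled average (d = 1) comes out near zero.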

DA: Uncertainty or Temporal Difference?
- With asymmetric coding of errors, the mean TD error at the time of reward is proportional to p(1-p) => maximal at p = 50% (compare experiment and model)
- However: no need to assume explicit coding of uncertainty; the ramping is explained by neural constraints
- This also explains the puzzling absence of a ramp in trace-conditioning results
- Experimental test: is the ramp a within-trial or a between-trial phenomenon?
- Challenges: TD and noise; conditioned inhibition; additivity

Trace conditioning: a puzzle and its resolution
- CS = short visual stimulus; trace period; US (probabilistic) = drops of juice
- Same (if not more) uncertainty, but no DA ramping (Fiorillo et al.; Morris, Arkadir, Nevet, Vaadia & Bergman)
- Resolution: a lower learning rate in trace conditioning eliminates the ramp

Other sources of uncertainty: representational noise (1)
- Rate coding is inherently stochastic (Mirenowicz and Schultz 1996)
- Add rate noise to the tapped-delay-line representation (simulated at several noise levels sigma; prediction errors and weights shown)
- => TD learning is robust to this type of noise

Other sources of uncertainty: representational noise (2)
- Neural timing of events is necessarily inaccurate
- Add temporal noise to the tapped-delay-line representation (epsilon = 0.05, epsilon = 0.10)
- => Devastating effects of even small amounts of temporal noise on TD predictions