

FIGURE 4 Responses of dopamine neurons to unpredicted primary reward (top) and the transfer of this response to progressively earlier reward-predicting conditioned stimuli with training (middle). The bottom record shows a control baseline task when the reward is predicted by an earlier stimulus and not the light. From Schultz et al. (1995) with permission.

Odor-selective cells in the amygdala fire preferentially according to the outcome or reward value of an odor before, or at the same time as, the animal behaviorally demonstrates that it has learned this outcome or value.

Cells in orbitofrontal cortex (OFC) show less outcome selectivity in rats without an amygdala, demonstrating a role for the amygdala in conveying motivational/reward information to the OFC.

Dopamine, reward processing and optimal prediction (only as a reference for those interested in beginning to cross the neurobehavioral/computational divide; maybe after the exam?)

Human dopaminergic system

Cortical and striatal projections Schultz, 1998

Koob & Le Moal, 2001

Schultz, Dayan & Montague 1997

Expected reward
v = wu
v : expected reward
w : weight (association)
u : stimulus (binary)

Rescorla-Wagner rule
Association update rule: w ← w + α δ u
w : weight (association)
α : learning rate
u : stimulus
Prediction error: δ = r − v
r : actual reward
v : expected reward

Rescorla-Wagner provides an account for:
Some Pavlovian conditioning
Extinction
Partial reinforcement
and, with more than one stimulus:
Blocking
Inhibitory conditioning
Overshadowing
… but not:
Latent inhibition (CS preexposure effect)
Secondary conditioning
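Several of these phenomena fall straight out of the update rule. A minimal sketch in Python (my own illustrative code; the function name, learning rate, and trial counts are arbitrary choices, not from the slides):

```python
# Rescorla-Wagner sketch: v = w.u, delta = r - v, w <- w + alpha * delta * u

def rw_update(w, u, r, alpha=0.1):
    """One trial: update association weights w for binary stimulus vector u."""
    v = sum(wi * ui for wi, ui in zip(w, u))   # expected reward v = w.u
    delta = r - v                              # prediction error
    return [wi + alpha * delta * ui for wi, ui in zip(w, u)], delta

# Acquisition: stimulus A alone is paired with reward, so w_A grows toward 1
w = [0.0, 0.0]
for _ in range(100):
    w, _ = rw_update(w, [1, 0], r=1.0)
print(round(w[0], 2))   # 1.0

# Blocking: the A+B compound is rewarded, but A already predicts the reward,
# so delta is ~0 and B acquires almost no association
for _ in range(100):
    w, _ = rw_update(w, [1, 1], r=1.0)
print(round(w[1], 2))   # 0.0 -- B is blocked

# Extinction: A is presented without reward, so w_A decays back toward 0
for _ in range(100):
    w, _ = rw_update(w, [1, 0], r=0.0)
print(round(w[0], 2))   # 0.0
```

Note that latent inhibition and secondary conditioning do not emerge from this scheme, which is what motivates the uncertainty and temporal-difference extensions below.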

A recent update: uncertainty (σᵢ²). Kakade, Montague & Dayan, 2001

Kalman weight update rule: wᵢ ← wᵢ + αᵢ δ
With associability: αᵢ = σᵢ² uᵢ / ( Σⱼ σⱼ² uⱼ + E )

An example:

[Diagram, built up over several slides: inputs u₁ … u₅ form the stimulus vector u(t); together with the reward r(t), the estimated weights ŵ(t), each carrying an uncertainty σᵢ, produce the prediction v(t) and the prediction error δ(t).]

Error rule: δ(t) = r(t) − v(t)

Uncertainty

Kalman learning & associability
Weight update rule: ŵᵢ(t+1) = ŵᵢ(t) + αᵢ(t) δ(t)
Associability: αᵢ(t) = σᵢ(t)² xᵢ(t) / ( Σⱼ σⱼ(t)² xⱼ(t) + E )
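A rough sketch of how this update could be simulated (my own code, not from the slides; the uncertainty-shrinkage line uses the standard Kalman variance update, and E, the trial count, and all names are illustrative assumptions):

```python
# Kalman-filter view of associability (after Kakade, Montague & Dayan, 2001):
# each cue i has a weight estimate w_hat[i] and an uncertainty s2[i] ~ sigma_i^2;
# E plays the role of the observation-noise variance.

def kalman_trial(w_hat, s2, x, r, E=0.5):
    """One trial with binary cue vector x and reward r."""
    v = sum(w * xi for w, xi in zip(w_hat, x))         # prediction v = w_hat.x
    delta = r - v                                      # prediction error
    denom = sum(s * xi for s, xi in zip(s2, x)) + E    # summed uncertainty + noise
    alpha = [s * xi / denom for s, xi in zip(s2, x)]   # associability per cue
    w_hat = [w + a * delta for w, a in zip(w_hat, alpha)]
    s2 = [s * (1 - a) for s, a in zip(s2, alpha)]      # Kalman variance shrinkage
    return w_hat, s2, delta

# Training cue 0 alone: its uncertainty (and hence associability) shrinks,
# while the untrained cue 1 stays uncertain and highly associable.
w_hat, s2 = [0.0, 0.0], [1.0, 1.0]
for _ in range(20):
    w_hat, s2, _ = kalman_trial(w_hat, s2, [1, 0], r=1.0)
print(s2[0] < s2[1])         # True
print(round(w_hat[0], 2))    # 0.98
```

The key point of the scheme: the learning rate is set per cue by its uncertainty, not by a fixed α, which is what lets it capture associability effects such as latent inhibition.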

Stimulus uncertainties

Reward prediction

Predicting future reward
Single time steps: v = wu
v : expected reward
w : weight (association)
u : stimulus
Total predicted reward: v(t) = Σ_{τ=0}^{t} w(τ) u(t − τ)
t : time steps in a trial
τ : current time step

Sum of discounted future rewards: v(t) = r(t) + γ r(t+1) + γ² r(t+2) + …, with 0 ≤ γ ≤ 1
In recursive form: v(t) = r(t) + γ v(t+1)
Schultz, Dayan & Montague, 1997
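A quick numeric check that the discounted sum and its recursive form agree (the reward sequence is a toy example of my own choosing):

```python
# Check that v(t) = r(t) + gamma*r(t+1) + gamma^2*r(t+2) + ...
# satisfies the recursion v(t) = r(t) + gamma * v(t+1).
gamma = 0.95
r = [0.0, 0.0, 1.0, 0.0, 0.5, 0.0]   # arbitrary reward sequence
v = [sum(gamma ** k * r[t + k] for k in range(len(r) - t)) for t in range(len(r))]
for t in range(len(r) - 1):
    assert abs(v[t] - (r[t] + gamma * v[t + 1])) < 1e-12
print("recursion holds")
```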

Exponential discounting, γ = 0.95

Temporal difference rule
Total estimated future reward: v(t) = r(t) + γ v(t+1), so r(t) = v(t) − γ v(t+1)
Temporal difference rule: δ = r(t) + γ v(t+1) − v(t)
(With single time steps: δ = r − v; r : actual reward, v : expected reward)

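The hallmark result of the TD account, the transfer of the dopamine-like prediction error from the time of reward back to the time of the cue, can be reproduced in a few lines. This is my own illustrative simulation: the trial structure, parameters, and states-as-time-steps representation are assumptions, not from the slides.

```python
# TD(0) over time steps within a trial: delta(t) = r(t) + gamma*v(t+1) - v(t).
# Before cue onset there is no stimulus, so those predictions stay at zero.

T, CS, REWARD = 20, 5, 15        # trial length, cue onset, reward time
gamma, alpha = 1.0, 0.1
v = [0.0] * (T + 1)              # v[t]: predicted future reward at step t

def run_trial(v):
    """Run one trial, updating v in place; return delta at each step."""
    deltas = []
    for t in range(T):
        r = 1.0 if t == REWARD else 0.0
        delta = r + gamma * v[t + 1] - v[t]   # TD error
        if t >= CS:                           # learn only while a stimulus is on
            v[t] += alpha * delta
        deltas.append(delta)
    return deltas

first = run_trial(v)
for _ in range(500):
    last = run_trial(v)

print(round(first[REWARD], 2))   # 1.0 -> early training: error at reward time
print(round(last[REWARD], 2))    # 0.0 -> late training: reward fully predicted
print(round(last[CS - 1], 2))    # 1.0 -> error now fires when the cue appears
```

This mirrors the Schultz recordings in Figure 4: the unpredicted reward initially drives the error, and with training the error migrates to the earliest reliable predictor.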

Schultz, Dayan & Montague, 1997

Schultz, 1996

Anatomical interpretation Schultz, Dayan & Montague, 1997

Temporal difference rule for navigation
Between successive states u and u′: δ = rₐ(u) + γ v(u′) − v(u)
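A toy illustration of this rule (my own sketch: a random walk along a short corridor stands in for the water maze, and the environment and parameters are assumptions, not the Foster, Morris & Dayan model itself):

```python
# TD learning of a state-value function during navigation:
# delta = r(u) + gamma * v(u') - v(u), applied along each path to the goal.
import random

N = 5                        # corridor states 0..4; entering state 4 is rewarded
gamma, alpha = 0.9, 0.1
v = [0.0] * N

random.seed(0)
for _ in range(2000):        # episodes of a random walk from the left end
    u = 0
    while u != N - 1:
        u2 = max(0, u - 1) if random.random() < 0.5 else u + 1
        r = 1.0 if u2 == N - 1 else 0.0
        nxt = 0.0 if u2 == N - 1 else v[u2]    # the goal state is terminal
        delta = r + gamma * nxt - v[u]         # TD error between u and u'
        v[u] += alpha * delta
        u = u2

print([round(x, 2) for x in v[:-1]])  # values rise smoothly toward the goal
```

The learned values form a gradient peaking at the goal, which is the value surface an actor can then climb to select actions.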

Behavior evaluation; hippocampal place fields. Foster, Morris & Dayan, 2000

Spatial learning Foster, Morris & Dayan 2000

Conclusions
Behavioral study of (nonhuman) neural systems is interesting
Neural processes are amenable to contemporary learning theory… they may play distinct roles in a normative framework of learning
e.g. VTA, hippocampus, subiculum; also ACh in NBM/SI, NE in LC, 5-HT, ventral striatum, lateral connections, core/shell distinctions of the NAcc, patch-matrix anatomy in the basal ganglia, the superior colliculus, psychoalphabetadiscobioaquadodoo