

Asymmetric Coding of Temporal Difference Errors: Implications for Dopamine Firing Patterns
Y. Niv (1,2), M.O. Duff (2) and P. Dayan (2)
(1) Interdisciplinary Center for Neural Computation, Hebrew University, Jerusalem; (2) Gatsby Computational Neuroscience Unit, University College London

Overview
Substantial evidence suggests that phasic dopaminergic firing represents a temporal difference (TD) error in the prediction of future reward. Recent experiments probe the way information about outcomes propagates back to the stimuli that predict them. They use stochastic rewards (e.g., Fiorillo et al., 2003 [1]), which allow systematic study of persistent prediction errors even in well-learned tasks. We use a novel theoretical analysis to show that across-trial ramping in DA activity may be a signature of this process. Importantly, we address the asymmetric coding of positive and negative TD errors in DA activity, and take account of the constant learning that results from ongoing prediction errors.

Selected References
[1] Fiorillo, Tobler & Schultz (2003). Discrete coding of reward probability and uncertainty by dopamine neurons. Science, 299, 1898-1902.
[2] Morris, Arkadir, Nevet, Vaadia & Bergman (2004). Coincident but distinct messages of midbrain dopamine and striatal tonically active neurons. Neuron, 43.
[3] Montague, Dayan & Sejnowski (1996). A framework for mesencephalic dopamine systems based on predictive Hebbian learning. J Neurosci, 16.
[4] Sutton & Barto (1998). Reinforcement Learning: An Introduction. MIT Press.

Acknowledgements
This research was funded by an EC Thematic Network short-term fellowship to YN and by The Gatsby Charitable Foundation.

Introduction: What does phasic dopamine encode?
DA single-cell recordings from the lab of Wolfram Schultz show that an unpredicted reward (neutral/no stimulus) evokes a phasic response, whereas for a predicted reward (learned task) the response transfers to the predictive stimulus -> DA encodes a temporally sophisticated reward signal.

Computational hypothesis -- DA encodes the reward prediction error of temporal-difference learning (Sutton & Barto, 1987; Montague, Dayan & Sejnowski, 1996 [3]):

    δ(t) = r(t) + V(t+1) - V(t)

-> Phasic DA encodes the reward prediction error. This gives a precise computational theory for the generation of DA firing patterns, and a compelling account of the role of DA in appetitive conditioning.

A classical conditioning paradigm (delay conditioning) using probabilistic outcomes generates ongoing prediction errors even in a learned task. Single DA cell recordings in VTA/SNc show that at stimulus time DA represents the mean expected reward (compliant with the TD hypothesis), but there is a surprising ramping of activity in the delay -> Fiorillo et al.'s hypothesis [1]: coding of uncertainty. However:
- there is no prediction error to `justify' the ramp;
- TD learning predicts away any predictable quantity;
- uncertainty coded this way is not available for control.
-> The uncertainty hypothesis seems contradictory to the TD hypothesis.
With asymmetric coding of errors, however, the mean TD error at the time of reward is proportional to p(1-p) -> indeed maximal at p = 50%, as the calculation below shows.
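To spell that claim out (our one-line calculation; the poster states only the result), assume a unit reward delivered with probability p, a fully learned prediction V = p at the time of reward, and negative errors scaled by a factor d < 1 before summation (the simulations below use d = 1/6):

    rewarded trials (probability p):       δ = 1 - p,  coded as +(1 - p)
    omitted reward (probability 1 - p):    δ = -p,     coded as -d·p

    mean coded error = p(1 - p) - d·(1 - p)·p = (1 - d)·p(1 - p)

This is proportional to p(1 - p) and maximal at p = 50%; it cancels to zero only under symmetric coding (d = 1), which is why, under asymmetric coding, the summed activity at the time of reward does not cancel out.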
Experimental results: measuring propagating errors
Fiorillo et al. (2003) [1]: a 2-sec visual stimulus indicates the reward probability (100%, 75%, 50%, 25% or 0%), followed by probabilistic reward (drops of juice).

A TD resolution
Ramps result from backpropagating prediction errors. Note that according to TD, activity at the time of reward should cancel out on average -- but it does not. This is because prediction errors can be positive or negative, while firing rates are positive, so negative errors must be encoded relative to baseline activity. But baseline activity in DA cells is low (2-5 Hz) -> asymmetric coding of errors.

Simulating TD with asymmetric coding
The model is compared with the experimental data, including omitted reward (probe trials): negative δ(t) is scaled by d = 1/6 prior to PSTH summation. Learning itself proceeds normally (without scaling); this is necessary to produce the right predictions, and can be biologically plausible.
-> Ongoing (intertwined) backpropagation of asymmetrically coded positive and negative errors causes ramps to appear in the summed PSTH.
-> The ramp itself is a between-trial and not a within-trial phenomenon (it results from summation over different reward histories). A runnable sketch of this simulation appears after the Conclusion.

Visualizing temporal-difference learning
[Figure: stimulus representation x(1), x(2), ..., reward r(t), TD error δ(t) and predictions V(1)...V(30) after the first trial, after the third trial, once the task is learned, and as learning continues (~10 trials); remaining panel labels read δ(t), DA, 55%, 270%, Bayer and Glimcher, Schultz lab, with traces for p = 25%, 50% and 75%.]

Trace conditioning: A puzzle solved
Short visual stimulus, trace period, then probabilistic reward (drops of juice): the same (if not more) uncertainty, but no DA ramping (Morris et al., 2004 [2]; see also Fiorillo et al., 2003 [1]). Solution: a lower learning rate in trace conditioning eliminates the ramp. Indeed, the learning rate computed from Morris et al.'s data is near zero (personal communication).

Conclusion: Uncertainty or temporal difference?
There is no need to assume explicit coding of uncertainty: ramping in DA activity is explained by neural constraints, and this also explains the puzzling absence of a ramp in trace conditioning. Experimental tests:
- Is the ramp a within-trial or a between-trial phenomenon?
- What is the relationship between ramp size and learning rate (within/between experiments)?
Challenges to TD remain: TD and noise; conditioned inhibition; additivity…
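To make the mechanism concrete, here is a minimal simulation sketch (ours, not the authors' code) of TD learning over a tapped delay line, with negative errors scaled by d = 1/6 only when the PSTH is summed. Only d = 1/6 and the probabilistic delay-conditioning design come from the poster; the horizon T, learning rate, trial counts and burn-in are illustrative assumptions, and the phasic response at stimulus onset is not modeled.

    import numpy as np

    rng = np.random.default_rng(0)

    T = 20           # time steps per trial; stimulus at t = 0, reward at t = T-1 (assumed)
    n_trials = 5000  # assumed; enough for the values to converge
    n_burn = 1000    # assumed burn-in: sum the PSTH only once the task is learned
    alpha = 0.05     # assumed learning rate
    d = 1.0 / 6.0    # scaling of negative delta prior to PSTH summation (from the poster)
    p = 0.5          # reward probability

    w = np.zeros(T)      # tapped delay line: V(t) = w[t], one prediction per time step
    psth = np.zeros(T)   # running sum of asymmetrically coded TD errors

    for trial in range(n_trials):
        rewarded = rng.random() < p
        for t in range(T):
            r = 1.0 if (rewarded and t == T - 1) else 0.0
            v_next = w[t + 1] if t + 1 < T else 0.0
            delta = r + v_next - w[t]       # delta(t) = r(t) + V(t+1) - V(t)
            w[t] += alpha * delta           # learning proceeds normally, without scaling
            if trial >= n_burn:             # PSTH: negative errors shrunk by d
                psth[t] += delta if delta >= 0 else d * delta

    psth /= n_trials - n_burn
    # The mean coded error ramps up toward the reward and stays positive at t = T-1,
    # close to (1 - d) * p * (1 - p), rather than cancelling to zero.
    print(np.round(psth, 4))

Averaged over trials after learning, the coded error grows toward the time of reward -- the between-trial ramp -- and repeating the run with p = 0.25, 0.5 and 0.75 reproduces the p(1 - p) ordering of the residual response at reward time.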