Dopamine enhances model-based over model-free choice behavior. Peter Smittenaar*, Klaus Wunderlich*, Ray Dolan.

Model-based and model-free systems

Model-free (habitual):
- Cached values: a single stored value per action
- Learned over many repetitions
- Driven by the TD prediction error
- Inflexible, but computationally cheap

Model-based (goal-directed):
- Model of the environment, with states and rewards
- A forward model computes the best action 'on the fly'
- Flexible, but computationally costly

Behavior is a combination of these two systems (Daw et al., 2011).
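To make the distinction concrete, here is a minimal Python sketch (hypothetical names and parameter values, not the authors' implementation) contrasting a cached model-free update with an on-the-fly model-based computation:

    import numpy as np

    ALPHA = 0.3  # hypothetical learning rate

    def mf_update(cached_value, reward):
        # model-free: nudge the single cached value toward the outcome
        prediction_error = reward - cached_value  # TD prediction error
        return cached_value + ALPHA * prediction_error

    def mb_value(transition_probs, state_values):
        # model-based: expected value computed on the fly from the learned
        # model: sum over states of p(state | action) * V(state)
        return float(np.dot(transition_probs, state_values))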

How do these two systems interact to generate behavior?
- Do they compete at the output stage, or collaborate during learning? (Daw et al., 2005; Doll et al., 2009; Biele et al., 2011)
- Both systems engage overlapping neural systems (Daw et al., 2011; Wunderlich et al., 2012).
- What is the role of dopamine in model-based/model-free interactions? How does L-DOPA affect the control exerted by either system?

[Figure: conjunction of model-based and model-free signals (Daw et al., 2011)]

Two-step task, based on Daw et al. (2011)

[Figure: task structure]

p(stay) dissociates the two systems: a model-free learner repeats first-stage choices that were rewarded, regardless of how the reward came about, whereas a model-based learner also takes the transition structure (common vs. rare) into account, producing a reward × transition interaction in p(stay).
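A minimal sketch of this standard analysis (array names are hypothetical; the logic follows Daw et al., 2011): split the probability of repeating the first-stage choice by whether the previous trial was rewarded and whether its transition was common or rare.

    import numpy as np

    def stay_probabilities(choices, transitions, rewards):
        """p(stay) split by previous reward (0/1) and previous transition
        type ('common'/'rare'); all inputs are per-trial numpy arrays."""
        stay = choices[1:] == choices[:-1]  # first-stage choice repeated?
        table = {}
        for r in (0, 1):
            for t in ('common', 'rare'):
                prev = (rewards[:-1] == r) & (transitions[:-1] == t)
                table[(r, t)] = stay[prev].mean() if prev.any() else np.nan
        return table

A purely model-free learner shows only a main effect of reward on p(stay); a purely model-based learner shows a reward × transition interaction.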

Daw et al. (2011): choices show that both systems exert control

Choice is a mix of model-free and model-based control.

This study: 18 subjects, tested on and off L-DOPA in a within-subject design.

L-DOPA enhances model-based control

L-DOPA increases model-based, but not model-free, behavior.

Parameter w weights the model-based (MB) and model-free (MF) influences on choice:

    V1 = w · V_MB + (1 − w) · V_MF

V1: value of stimulus 1
w: weighting parameter
α: model-free learning rate
λ: eligibility trace parameter
r: reward on trial t
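A minimal sketch of this weighting and the resulting choice rule (function names are hypothetical; β, the softmax inverse temperature, plays the role of the model's choice-noise parameter):

    import numpy as np

    def first_stage_values(v_mb, v_mf, w):
        # V1 = w * V_MB + (1 - w) * V_MF, per first-stage action
        return w * v_mb + (1.0 - w) * v_mf

    def choice_probabilities(values, beta):
        # softmax with inverse temperature beta (numerically stabilized)
        e = np.exp(beta * (values - values.max()))
        return e / e.sum()

    def mf_td_update(v_mf, action, delta1, delta2, alpha, lam):
        # the first-stage MF value absorbs both prediction errors, the
        # second-stage error weighted by the eligibility parameter lambda
        v_mf[action] += alpha * (delta1 + lam * delta2)
        return v_mf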

L-DOPA increases model-based control (w). * p = .005
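Since each subject contributes a fitted w both on and off the drug, this is a within-subject contrast; a sketch of one such paired test (the arrays here are hypothetical stand-ins for the fitted values, and the slide does not specify the exact test used):

    import numpy as np
    from scipy import stats

    # hypothetical fitted w values, one per subject (n = 18), both sessions
    rng = np.random.default_rng(0)
    w_ldopa = rng.uniform(0.3, 0.9, size=18)
    w_placebo = rng.uniform(0.2, 0.8, size=18)

    t_stat, p_value = stats.ttest_rel(w_ldopa, w_placebo)  # paired test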

L-DOPA does not affect the model-free system

L-DOPA enhances model-based over model-free control. No effect on model-free:
✗ learning rate
✗ noise
✗ policy / value updating
✗ positive / negative prediction errors

Conclusion

L-DOPA enhances model-based over model-free control. No effect on model-free:
✗ learning rate
✗ noise
✗ policy / value updating
✗ positive / negative prediction errors

L-DOPA might:
- improve components of the model-based system
- directly alter the interaction between the two systems at learning or choice (Doll et al., 2009)

L-DOPA minus placebo: effect stronger after unrewarded trials

The increase in model-based control is particularly strong after unrewarded trials.

Conclusion

L-DOPA enhances model-based over model-free behavior. L-DOPA might:
- improve components of the model-based system
- directly alter the interaction between the two systems at learning or choice (Doll et al., 2009)
- facilitate switching to model-based control when needed (Isoda and Hikosaka, 2011)

Acknowledgements

Klaus Wunderlich
Tamara Shiner
The Einstein meeting's organizers
Ray Dolan

Thank you

'Random effects' Bayesian model comparison

Alternative models, each compared against the winning model (parameters α, β, p, w); cells lost in transcription are marked "…":

Alternative model         | Placebo: better in # subjects | Placebo: exceedance prob. | L-DOPA: better in # subjects | L-DOPA: exceedance prob.
α1, α2, β1, β2, λ, p, w   | 17 | …     | …  | …
α1, α2, β1, β2, p, w      | …  | …     | …  | …
α, β1, β2, p, w           | …  | …     | …  | …
α1, α2, β, p, w           | 15 | …     | …  | >0.999
α+, α-, β, p, w           | 12 | >.831 | 15 | >.996
α, β, λ, p, w             | 16 | …     | …  | >0.999
α, β, w                   | 16 | …     | …  | …
α, β                      | 16 | …     | …  | …
MF/MB learning rates      | …  | …     | …  | …
Actor/critic learning     | 18 | …     | …  | >0.999
MB prediction errors      | 12 | …     | …  | …
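For reference, a minimal sketch of how exceedance probabilities can be estimated in a random-effects Bayesian model comparison (following the general approach of Stephan et al., 2009; the Dirichlet parameters below are hypothetical stand-ins for fitted ones):

    import numpy as np

    def exceedance_probabilities(dirichlet_alpha, n_samples=100_000, seed=0):
        """Monte Carlo estimate of the probability that each model is the
        most frequent one in the population, given the Dirichlet posterior
        over model frequencies from random-effects model comparison."""
        rng = np.random.default_rng(seed)
        freqs = rng.dirichlet(dirichlet_alpha, size=n_samples)
        winners = freqs.argmax(axis=1)
        return np.bincount(winners, minlength=len(dirichlet_alpha)) / n_samples

    # e.g. two models with hypothetical posterior counts:
    print(exceedance_probabilities([12.0, 4.0]))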