Computational Neuromodulation. Peter Dayan, Gatsby Computational Neuroscience Unit, University College London; with Nathaniel Daw, Sham Kakade, Read Montague, John O'Doherty, Wolfram Schultz, Ben Seymour, Terry Sejnowski, and Angela Yu.


Slide 1: Computational Neuromodulation. Peter Dayan, Gatsby Computational Neuroscience Unit, University College London. With Nathaniel Daw, Sham Kakade, Read Montague, John O'Doherty, Wolfram Schultz, Ben Seymour, Terry Sejnowski, Angela Yu.

Slide 2: 5. Diseases of the Will
- Contemplators
- Bibliophiles and Polyglots
- Megalomaniacs
- Instrument addicts
- Misfits
- Theorists

Slide 3: Theorists. "There are highly cultivated, wonderfully endowed minds whose wills suffer from a particular form of lethargy. Its undeniable symptoms include a facility for exposition, a creative and restless imagination, an aversion to the laboratory, and an indomitable dislike for concrete science and seemingly unimportant data… When faced with a difficult problem, they feel an irresistible urge to formulate a theory rather than question nature. As might be expected, disappointments plague the theorist…"

Slide 4: Computation and the Brain
Statistical computations:
- representation from density estimation (Terry)
- combining uncertain information over space, time, and modalities for sensory/memory inference
- learning as a hierarchical Bayesian problem
- learning as a filtering problem
Control-theoretic computations:
- optimising rewards and punishments
- homeostasis/allostasis

Slide 5: Conditioning
- Ethology and Psychology: classical/operant conditioning
- Computation: dynamic programming; Kalman filtering
- Algorithm: TD/delta rules
- Neurobiology: neuromodulators; amygdala; OFC; nucleus accumbens; dorsal striatum
Prediction: of important events (policy evaluation). Control: in the light of those predictions (policy improvement).

Slide 6: Dopamine
[Figure (Schultz et al.): dopamine responses in three conditions: no prediction, reward (R); prediction, reward; prediction, no reward.]
- drug addiction, self-stimulation
- effect of antagonists
- effect on vigour
- link to action
- 'scalar' signal

Slide 7: Prediction, but What Sort?
Sutton: predict the sum of future rewards, V(t) = E[ Σ_{τ≥t} r(τ) ], learned from the TD error δ(t) = r(t) + V(t+1) − V(t).
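The Sutton rule named on this slide can be illustrated with a minimal simulation (a toy setup of my own, not from the talk): a trial ends in a reward a few steps after onset, and repeated TD(0) updates propagate value back toward the start, so the prediction error migrates from the reward to earlier predictors.

```python
# Toy TD(0) sketch of the slide's equations (setup invented for
# illustration): reward arrives on the last of T timesteps.
import numpy as np

T = 5                          # timesteps per trial
alpha = 0.2                    # learning rate
r = np.zeros(T)
r[T - 1] = 1.0                 # reward on the last step

V = np.zeros(T + 1)            # value per timestep; V[T] = 0 ends trial

for trial in range(500):
    for t in range(T):
        delta = r[t] + V[t + 1] - V[t]   # TD error delta(t)
        V[t] += alpha * delta

# After learning, delta at the reward time is near zero and value
# has spread back to the start of the trial.
print(np.round(V[:T], 2))
```

After training, the TD error at the reward time, r(T−1) + V(T) − V(T−1), is roughly zero, mirroring the disappearance of the dopamine response to a fully predicted reward.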

Slide 8: Rewards rather than Punishments
[Figure (Schultz et al.): dopamine cells in VTA/SNc; V(t) and TD error traces for no prediction, reward (R); prediction, reward; prediction, no reward.]

Slide 9: Prediction, but What Sort?
Sutton: predict the sum of future rewards via the TD error (policy evaluation). Watkins: values over state-action pairs, Q.

Slide 10: Policy Improvement
Sutton: define a policy π(x; M) and improve it using the same TD error (actor-critic). Watkins: value iteration with Q.
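The Watkins value-iteration route can be sketched as tabular Q-learning (a toy two-state, two-action problem of my own devising, not the talk's task):

```python
# Hypothetical tabular Q-learning sketch: action 1 pays off in both
# states, and the off-policy update discovers this.
import numpy as np

rng = np.random.default_rng(0)
n_states, n_actions = 2, 2
reward = np.array([[0.0, 1.0],      # reward[s, a]
                   [0.0, 1.0]])
Q = np.zeros((n_states, n_actions))
alpha, gamma, eps = 0.1, 0.9, 0.1   # learning rate, discount, exploration

s = 0
for step in range(5000):
    # epsilon-greedy action selection
    if rng.random() < eps:
        a = int(rng.integers(n_actions))
    else:
        a = int(np.argmax(Q[s]))
    r = reward[s, a]
    s_next = int(rng.integers(n_states))
    # Watkins update: bootstrap from the best next action (off-policy)
    Q[s, a] += alpha * (r + gamma * Q[s_next].max() - Q[s, a])
    s = s_next

print(np.round(Q, 1))   # action 1 should dominate in both states
```

An actor-critic variant in Sutton's style would instead keep a separate policy and nudge its action preferences with the same TD error that trains the critic.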

Slide 11: Active Issues
- exploration/exploitation
- model-based (PFC) / cached (striatal) methods
- motivational influences; vigour
- hierarchical control (PFC)
- hyperbolic discounting, Pavlovian misbehavior and 'the will'
- representational learning
- appetitive/aversive opponency
- links with behavioural economics

Slide 12: Computation and the Brain
Statistical computations:
- representation from density estimation (Terry)
- combining uncertain information over space, time, and modalities for sensory/memory inference
- learning as a hierarchical Bayesian problem
- learning as a filtering problem
Control-theoretic computations:
- optimising rewards and punishments
- homeostasis/allostasis
- exploration/exploitation trade-offs

Slide 13: Uncertainty
Computational functions of uncertainty: weaken top-down influence over sensory processing; promote learning about the relevant representations.
We focus on two different kinds of uncertainty:
- expected uncertainty, from known variability or ignorance (ACh)
- unexpected uncertainty, due to gross mismatch between prediction and observation (NE)
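One way to make the distinction concrete (a toy construction of mine, not the talk's model): track the validity of a cue with a running estimate. The cue's known unreliability plays the expected-uncertainty (ACh-like) role, while a spike of surprise after an unsignalled reversal plays the unexpected-uncertainty (NE-like) role and transiently speeds re-learning.

```python
# Toy expected- vs unexpected-uncertainty sketch (all quantities are
# my own stand-ins): estimate a cue's validity p_hat online.
import numpy as np

rng = np.random.default_rng(1)
true_validity = 0.8
p_hat = 0.5            # running estimate of cue validity
lam = 0.05             # baseline estimation rate

for t in range(300):
    if t == 150:
        true_validity = 0.2                    # unsignalled reversal
    outcome = rng.random() < true_validity
    surprise = -np.log(p_hat if outcome else 1.0 - p_hat)
    expected_unc = 1.0 - p_hat                 # ACh-like quantity
    unexpected_unc = max(0.0, surprise - 1.0)  # NE-like quantity
    rate = lam * (1.0 + unexpected_unc)        # NE boosts learning
    p_hat += rate * (outcome - p_hat)

print(round(p_hat, 2))   # tracks the post-reversal validity
```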

Slide 14: Norepinephrine
- vigilance
- reversals
- modulates plasticity?
- exploration?
- scalar

Slide 15: Aston-Jones: Target Detection
- detect and react to a rare target amongst common distractors
- elevated tonic activity for reversal
- activated by the rare target (and reverses)
- not reward/stimulus related? more response related?

Slide 16: Vigilance Task
- variable time in start
- η controls confusability
- one single run; the cumulative trace is clearer
- exact inference; effect of an 80% prior

Slide 17: Phasic NE
- NE reports uncertainty about the current state: state in the model, not state of the model
- divisively related to the prior probability of that state
- NE measured relative to the default state sequence (start → distractor)
- temporal aspect: start → distractor; structural aspect: target versus distractor

Slide 18: Phasic NE
- onset response from timing uncertainty (SET)
- growth as P(target)/0.2 rises
- act when P(target) = 0.95; stop if P(target) = 0.01
- NE arbitrarily set to 0 after 5 timesteps (small probability of reflexive action)
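The threshold logic on this slide can be sketched as sequential Bayesian evidence accumulation (the Gaussian likelihood model and its parameters are invented for illustration): start from the 0.2 prior on a target, update the posterior with each noisy observation, respond at 0.95, and abandon the trial at 0.01.

```python
# Sketch of respond/stop thresholds on a running posterior P(target).
import numpy as np

rng = np.random.default_rng(2)
prior_target = 0.2

def run_trial(is_target, max_steps=200):
    log_odds = np.log(prior_target / (1 - prior_target))
    for t in range(max_steps):
        # noisy observation: mean +0.5 for target, -0.5 for distractor
        x = rng.normal(0.5 if is_target else -0.5, 1.0)
        # Gaussian log-likelihood ratio, target vs distractor
        log_odds += -0.5 * (x - 0.5) ** 2 + 0.5 * (x + 0.5) ** 2
        p = 1.0 / (1.0 + np.exp(-log_odds))
        if p >= 0.95:
            return "respond", t
        if p <= 0.01:
            return "stop", t
    return "timeout", max_steps

print(run_trial(is_target=True))
```

In this reading, the phasic NE signal would track the growth of P(target) against its low prior, and the 0.95 crossing is the point at which action is released.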

Slide 19: Four Types of Trial
[Figure: trial-type frequencies of 77%, 19%, 1.5%, and 1%.] The fall is rather arbitrary.

Slide 20: Response Locking
Slightly flatters the model, since there is no further response variability.

Slide 21: Interrupts/Resets (SB)
[Schematic: LC interacting with PFC/ACC.]

Slide 22: Active Issues
- approximate inference strategy
- interaction with expected uncertainty (ACh)
- other representations of uncertainty
- finer gradations of ignorance

Slide 23: Computation and the Brain
Statistical computations:
- representation from density estimation (Terry)
- combining uncertain information over space, time, and modalities for sensory/memory inference
- learning as a hierarchical Bayesian problem
- learning as a filtering problem
Control-theoretic computations:
- optimising rewards and punishments
- homeostasis/allostasis
- exploration/exploitation trade-offs

Slide 24: Computational Neuromodulation
- general: excitability, signal/noise ratios
- specific: prediction errors, uncertainty signals

Slide 25: Learning and Inference
Learning: predict; control. Δweight ∝ (learning rate) × (error) × (stimulus)
- dopamine: phasic prediction error for future reward
- serotonin: phasic prediction error for future punishment
- acetylcholine: expected uncertainty boosts learning
- norepinephrine: unexpected uncertainty boosts learning
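The update rule on this slide can be written out directly (the gain factors below are my own stand-ins for how the uncertainty signals might enter): the weight change is rate × error × stimulus, with ACh-like and NE-like terms multiplying up the rate.

```python
# Delta-rule sketch of the slide's update, with uncertainty-modulated
# learning rate (the additive gain form is an assumption).
base_rate = 0.05

def update(w, stimulus, outcome, ach=0.0, ne=0.0):
    error = outcome - w * stimulus       # prediction error
    rate = base_rate * (1.0 + ach + ne)  # uncertainty boosts learning
    return w + rate * error * stimulus

# cue (stimulus = 1) consistently paired with reward (outcome = 1)
w = 0.0
for _ in range(100):
    w = update(w, stimulus=1.0, outcome=1.0, ach=0.5)

print(round(w, 2))   # approaches the asymptote of 1.0
```

With either uncertainty term raised, the same asymptote is reached in fewer trials, which is the behavioural signature the later slides attribute to ACh and NE manipulations.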

Slide 26: Learning and Inference
[Schematic: cortical processing combines bottom-up sensory inputs with top-down context (prediction, learning, ...); ACh carries expected uncertainty, NE unexpected uncertainty.]

Slide 27: Temporal Difference Prediction Error (High Pain / Low Pain)
Predict the sum of future pain; TD error. Δweight ∝ (learning rate) × (error) × (stimulus)

Slide 28: Temporal Difference Prediction Error
[Figure: value and TD prediction-error traces for high-pain and low-pain trials.]

Slide 29: Temporal Difference Prediction Error
Experimental sequence in the MR scanner (Ben Seymour; John O'Doherty): A-B-HIGH, C-D-LOW, C-B-HIGH, A-B-HIGH, A-D-LOW, C-D-LOW, A-B-HIGH, A-B-HIGH, C-D-LOW, C-B-HIGH, ... Brain responses are compared against the TD model's prediction error.

Slide 30: TD prediction error: ventral striatum
[Figure: fMRI slice at Z = -4, right hemisphere (R).]

Slide 31: Temporal Difference Values
- right anterior insula
- dorsal raphe?

Slide 32: Rewards rather than Punishments
[Figure (Schultz et al.): dopamine cells in VTA/SNc; V(t) and TD error traces for no prediction, reward (R); prediction, reward; prediction, no reward.]

Slide 33: TD Prediction Errors
- computation: dynamic programming and optimal control
- algorithm: ongoing error in predictions of the future
- implementation: dopamine, phasic prediction error for reward and tonic punishment; serotonin, phasic prediction error for punishment and tonic reward
- evident in VTA; striatum; raphe?
- next: action; motivation; addiction; misbehavior

Slide 34: Two Cohenesque Theories
Qualitative (AJ): exploration vs exploitation
- high tonic mode involves labile attention; search for better options
- important if the short-term reward rate is below par
- implemented by changed brittleness?
Quantitative (EB): gain change in decision nets
- NE controls the balance of recurrence/bottom-up input
- implements a changed S/N ratio with target detection
- barely any benefit; why only for targets?

Slide 35: Task Difficulty
- set η = 0.65 rather than the value used earlier
- information accumulates over a longer period
- hits more affected than correct rejections
- timing not quite right

Slide 36: Intra-trial Uncertainty
- phasic NE as unexpected state change within a model; relative to prior probability; against a default
- interrupts (resets) ongoing processing; tie to ADHD?
- close to alerting (AJ), but not necessarily tied to behavioral output (onset rise)
- close to behavioural switching (PR), but not DA
- farther from optimal inference (EB)
- phasic ACh: aspects of known variability within a state?

Slide 37: Where Next
Dopamine:
- tonic release and vigour
- appetitive misbehaviour and hyperbolic discounting
- actions and habits
- psychosis
Serotonin:
- aversive misbehaviour and psychiatry
Norepinephrine:
- stress, depression and beyond

Slide 38: Experimental Data
ACh & NE have distinct behavioral effects:
- ACh boosts learning about stimuli with uncertain consequences (e.g. Bucci, Holland, & Gallagher, 1998)
- NE boosts learning upon encountering global changes in the environment (e.g. Devauges & Sara, 1990)
ACh & NE have similar physiological effects:
- suppress recurrent & feedback processing (e.g. Kimura et al, 1995; Kobayashi et al, 2000)
- enhance thalamocortical transmission (e.g. Gil et al, 1997)
- boost experience-dependent plasticity (e.g. Bear & Singer, 1986; Kilgard & Merzenich, 1998)

Slide 39: Model Schematics
[Schematic: cortical processing combines bottom-up sensory inputs with top-down context (prediction, learning, ...); ACh carries expected uncertainty, NE unexpected uncertainty.]

Slide 40: Attention
Attentional selection for (statistically) optimal processing, above and beyond the traditional view of resource constraint.
Example 1: Posner's task (Phillips, McAlonan, Robb, & Brown, 2000). [Schematic: a cue of high or low validity indicates the stimulus location; cue, then target, then response, with intervals of 0.1 s and 0.15 s.] Generalize to the case that the cue identity changes with no notice.

Slide 41: Formal Framework
- cues: vestibular, visual, ...; target: stimulus location, exit direction, ...
- variability in the quality of the relevant cue (ACh)
- variability in the identity of the relevant cue (NE)
- sensory information: avoid representing the full uncertainty

Slide 42: Simulation Results: Posner's Task
[Figure: with the relevant cue fixed (low NE), varying cue validity corresponds to varying ACh. Increasing ACh decreases the validity effect (as with nicotine) and decreasing ACh increases it (as with scopolamine), as a function of drug concentration; validity effect plotted as % of normal level (Phillips, McAlonan, Robb, & Brown, 2000).]
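The validity-effect logic can be mimicked with a crude simulation (all numbers invented; `cue_trust` is a hypothetical stand-in for how much the model relies on the cue, which high ACh, signalling cue unreliability, would reduce): attention follows the cued side, so valid trials are fast and invalid trials are slow, and trusting the cue less shrinks the gap.

```python
# Toy Posner-task simulation: validity effect = RT(invalid) - RT(valid).
import numpy as np

rng = np.random.default_rng(3)

def validity_effect(cue_trust, validity=0.8, n=4000):
    rts = {True: [], False: []}
    for _ in range(n):
        valid = rng.random() < validity
        # attention on the target's side: high if the cue is trusted
        # and happened to be valid
        attn = cue_trust if valid else (1 - cue_trust)
        rt = 300 + 100 * (1 - attn) + rng.normal(0, 10)   # ms, invented
        rts[valid].append(rt)
    return np.mean(rts[False]) - np.mean(rts[True])

print(round(validity_effect(cue_trust=0.8)))   # large validity effect
print(round(validity_effect(cue_trust=0.5)))   # effect near zero
```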

Slide 43: Maze Task
Example 2: attentional shift (Devauges & Sara, 1990). [Schematic: a rewarded maze with cue 1 and cue 2, one relevant and one irrelevant; the relevant cue changes.] No issue of validity.

Slide 44: Simulation Results: Maze Navigation
[Figure: experimental data and model data; % of rats reaching criterion versus number of days after the shift from the spatial to the visual task (Devauges & Sara, 1990).] Cue validity is fixed, with no explicit manipulation of ACh; the change of relevant cue maps onto NE.

Slide 45: Simulation Results: Full Model
[Figure: true and estimated relevant stimuli, neuromodulation in action, and the validity effect (VE) across trials.]

Slide 46: Simulated Psychopharmacology
[Figure: simulated depletions of 50% NE and 50% ACh/NE; ACh compensation; NE can nearly catch up.]

Slide 47: Summary
- a single framework for understanding ACh, NE and some aspects of attention
- ACh/NE as expected/unexpected uncertainty signals
- experimental psychopharmacological data replicated by model simulations
- implications from complex interactions between ACh & NE
- predictions at the cellular, systems, and behavioral levels
- activity vs weight vs neuromodulatory vs population representations of uncertainty