Computational Neuromodulation Peter Dayan Gatsby Computational Neuroscience Unit University College London Nathaniel Daw Sham Kakade Read Montague John O’Doherty Wolfram Schultz Ben Seymour Terry Sejnowski Angela Yu

2 5. Diseases of the Will
Contemplators
Bibliophiles and Polyglots
Megalomaniacs
Instrument addicts
Misfits
Theorists

3 Theorists
There are highly cultivated, wonderfully endowed minds whose wills suffer from a particular form of lethargy. Its undeniable symptoms include a facility for exposition, a creative and restless imagination, an aversion to the laboratory, and an indomitable dislike for concrete science and seemingly unimportant data… When faced with a difficult problem, they feel an irresistible urge to formulate a theory rather than question nature. As might be expected, disappointments plague the theorist…

4 Computation and the Brain
statistical computations
–representation from density estimation (Terry)
–combining uncertain information over space, time, modalities for sensory/memory inference
–learning as a hierarchical Bayesian problem
–learning as a filtering problem
control theoretic computations
–optimising rewards, punishments
–homeostasis/allostasis

5 Conditioning
Ethology
Psychology
–classical/operant conditioning
Computation
–dynamic programming
–Kalman filtering
Algorithm
–TD/delta rules
Neurobiology
–neuromodulators; amygdala; OFC; nucleus accumbens; dorsal striatum
prediction: of important events (policy evaluation)
control: in the light of those predictions (policy improvement)

6 Dopamine
[Figure: dopamine cell recordings (Schultz et al); panels: no prediction; prediction, reward; prediction, no reward]
drug addiction, self-stimulation
effect of antagonists
effect on vigour
link to action
‘scalar’ signal

7 Prediction, but What Sort? Sutton: predict sum future reward TD error
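The equations on this slide did not survive extraction. As a minimal sketch of the standard textbook form (not necessarily the notation used on the slide), the prediction V(t) estimates the sum of future reward and the TD error is delta(t) = r(t) + V(t+1) - V(t). The trial layout below (10 time steps, cue at step 2, reward at step 7, learning rate 0.2) and the function names are hypothetical choices for illustration only.

```python
import numpy as np

def td_errors(V, rewards, cue_t, alpha=0.0):
    """Run one trial and return delta[t] = r[t] + V[t+1] - V[t] (undiscounted).

    V is updated in place. Only time steps from cue onset onwards carry a
    learnable prediction; earlier steps keep V = 0 because cue timing is
    assumed unpredictable (an assumption of this sketch).
    """
    n = len(rewards)
    delta = np.zeros(n)
    for t in range(n):
        delta[t] = rewards[t] + V[t + 1] - V[t]
        if t >= cue_t:
            V[t] += alpha * delta[t]
    return delta

def simulate(n_steps=10, cue_t=2, reward_t=7, n_trials=200, alpha=0.2):
    """Return TD errors on the first trial, the last trial, and an omission probe."""
    V = np.zeros(n_steps + 1)          # V[n_steps] = 0 marks the end of the trial
    rewarded = np.zeros(n_steps)
    rewarded[reward_t] = 1.0
    omitted = np.zeros(n_steps)        # same trial with the reward withheld
    first = td_errors(V, rewarded, cue_t, alpha)
    last = first
    for _ in range(n_trials - 1):
        last = td_errors(V, rewarded, cue_t, alpha)
    probe = td_errors(V, omitted, cue_t, alpha=0.0)   # no learning on the probe
    return first, last, probe
```

In this sketch the error sits at the reward before learning, moves to the transition into the cue after learning, and dips below zero at the expected reward time when the reward is withheld, which is the qualitative pattern the three panels (no prediction; prediction, reward; prediction, no reward) refer to.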

8 Rewards rather than Punishments
[Figure: dopamine cells in VTA/SNc (Schultz et al) alongside the model TD error and V(t); panels: no prediction; prediction, reward; prediction, no reward]

9 Prediction, but What Sort? Sutton: Watkins: policy evaluation predict sum future reward TD error

10 Policy Improvement Sutton: define (x;M) do R-M on: uses the same TD error Watkins: value iteration with
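The symbols on this slide were lost in extraction, so the following is only a sketch of one common reading of "uses the same TD error": an actor-critic scheme in which a critic learns a value V and an actor adjusts action preferences M using the critic's error. The two-armed bandit, the softmax policy, the reward probabilities, and all names and parameter values are assumptions made for illustration and are not taken from the slide.

```python
import numpy as np

def softmax(m):
    """Turn action preferences into choice probabilities."""
    z = np.exp(m - m.max())
    return z / z.sum()

def actor_critic_bandit(p_reward=(0.2, 0.8), n_trials=2000,
                        alpha_v=0.1, alpha_m=0.1, seed=0):
    """One-state task: the critic learns V, the actor learns preferences M."""
    rng = np.random.default_rng(seed)
    V = 0.0
    M = np.zeros(len(p_reward))
    for _ in range(n_trials):
        pi = softmax(M)
        a = rng.choice(len(p_reward), p=pi)
        r = float(rng.random() < p_reward[a])
        delta = r - V              # TD error; the trial ends, so next value is 0
        V += alpha_v * delta       # critic: policy evaluation
        grad = -pi                 # gradient of log pi(a) w.r.t. M is onehot(a) - pi
        grad[a] += 1.0
        M += alpha_m * delta * grad   # actor: policy improvement, same delta
    return V, softmax(M)
```

The same scalar error drives both updates, which is the design point of the sketch: evaluation and improvement need not use separate teaching signals.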

11 Active Issues
exploration/exploitation
model-based (PFC)/cached (striatal) methods
motivational influences
vigour
hierarchical control (PFC)
hyperbolic discounting, Pavlovian misbehavior and ‘the will’
representational learning
appetitive/aversive opponency
links with behavioural economics

12 Computation and the Brain
statistical computations
–representation from density estimation (Terry)
–combining uncertain information over space, time, modalities for sensory/memory inference
–learning as a hierarchical Bayesian problem
–learning as a filtering problem
control theoretic computations
–optimising rewards, punishments
–homeostasis/allostasis
–exploration/exploitation trade-offs

13 Uncertainty
Computational functions of uncertainty:
–weaken top-down influence over sensory processing
–promote learning about the relevant representations
We focus on two different kinds of uncertainties:
–ACh: expected uncertainty from known variability or ignorance
–NE: unexpected uncertainty due to gross mismatch between prediction and observation

14 Norepinephrine
vigilance
reversals
modulates plasticity?
exploration?
scalar

15 Aston-Jones: Target Detection
detect and react to a rare target amongst common distractors
elevated tonic activity for reversal
activated by rare target (and reverses)
not reward/stimulus related? more response related?

16 Vigilance Task
variable time in start
η controls confusability
one single run
cumulative is clearer
exact inference
effect of 80% prior

17 Phasic NE
NE reports uncertainty about current state
–state in the model, not state of the model
–divisively related to prior probability of that state
NE measured relative to default state sequence start → distractor
–temporal aspect: start → distractor
–structural aspect: target versus distractor

18 Phasic NE
onset response from timing uncertainty (SET)
growth as P(target)/0.2 rises
act when P(target)=0.95
stop if P(target)=0.01
arbitrarily set NE=0 after 5 timesteps (small prob of reflexive action)
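The thresholds above can be illustrated with a much-reduced sketch: two hypotheses only (target versus distractor), binary samples that match the true stimulus with probability η, exact Bayesian updating from a prior P(target)=0.2, and an NE proxy equal to P(target)/0.2. The start state and its timing uncertainty are left out, and the function name and the binary-sample observation model are assumptions of this sketch rather than the model behind the slides.

```python
def phasic_ne_trace(observations, eta=0.675, prior_target=0.2,
                    act_at=0.95, stop_at=0.01):
    """Update P(target) sample by sample and report posterior / prior.

    observations: iterable of booleans, True if a sample looks like a target.
    eta: probability that a sample matches the true stimulus (confusability).
    Returns the NE proxy after each sample and the decision reached.
    """
    p = prior_target
    trace = []
    decision = "undecided"
    for looks_like_target in observations:
        like_target = eta if looks_like_target else 1.0 - eta
        like_distractor = 1.0 - eta if looks_like_target else eta
        # Bayes' rule for the two-hypothesis case
        p = like_target * p / (like_target * p + like_distractor * (1.0 - p))
        trace.append(p / prior_target)   # divisive relation to the prior
        if p >= act_at:
            decision = "target"          # act
            break
        if p <= stop_at:
            decision = "distractor"      # stop
            break
    return trace, decision
```

Lowering eta towards 0.5 makes each sample less informative, so more samples are needed before either threshold is crossed; in this reduced sketch that is the only effect of confusability.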

19 Four Types of Trial 19% 1.5% 1% 77% fall is rather arbitrary

20 Response Locking slightly flatters the model – since no further response variability

21 Interrupts/Resets (SB) LC PFC/ACC

22 Active Issues
approximate inference strategy
interaction with expected uncertainty (ACh)
other representations of uncertainty
finer gradations of ignorance

23 Computation and the Brain
statistical computations
–representation from density estimation (Terry)
–combining uncertain information over space, time, modalities for sensory/memory inference
–learning as a hierarchical Bayesian problem
–learning as a filtering problem
control theoretic computations
–optimising rewards, punishments
–homeostasis/allostasis
–exploration/exploitation trade-offs

24 Computational Neuromodulation
general: excitability, signal/noise ratios
specific: prediction errors, uncertainty signals

25 Learning and Inference
Learning: predict; control
∆ weight ∝ (learning rate) × (error) × (stimulus)
–dopamine: phasic prediction error for future reward
–serotonin: phasic prediction error for future punishment
–acetylcholine: expected uncertainty boosts learning
–norepinephrine: unexpected uncertainty boosts learning
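One way to see how uncertainty can boost learning is a scalar Kalman filter, in which the effective learning rate grows with the current uncertainty about the weight. This is a minimal sketch under assumed Gaussian noise; the function name, the single weight, and the observation-noise value are illustrative assumptions, and the mapping of this uncertainty onto ACh or NE is the deck's proposal rather than something the code demonstrates.

```python
def kalman_delta_step(w, var_w, x, r, obs_noise=1.0):
    """One delta-rule update with an uncertainty-dependent learning rate.

    w, var_w: current weight estimate and its variance (uncertainty).
    x, r: stimulus and observed outcome, assumed to follow r = w * x + noise.
    Returns the new weight, the new variance, and the learning rate used.
    """
    error = r - w * x                                     # prediction error
    learning_rate = var_w / (var_w * x * x + obs_noise)   # grows with var_w
    w_new = w + learning_rate * error * x                 # rate x error x stimulus
    var_new = (1.0 - learning_rate * x * x) * var_w       # uncertainty shrinks
    return w_new, var_new, learning_rate
```

With the same error and stimulus, a larger var_w gives a larger weight change, so an uncertain learner moves further on each observation.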

26 Learning and Inference
[Schematic labels: context; ACh (expected uncertainty); NE (unexpected uncertainty); top-down processing; bottom-up processing; sensory inputs; cortical processing; prediction, learning, ...]

27 Temporal Difference Prediction Error
[Figure: cue sequence leading to High Pain or Low Pain; transition probabilities 0.8, 1.0, 0.2]
predict sum future pain
TD error
∆ weight ∝ (learning rate) × (error) × (stimulus)

28 Temporal Difference Prediction Error
[Figure: cue sequence leading to High Pain or Low Pain; transition probabilities 0.8, 1.0, 0.2; traces: value, TD prediction error]

29 Temporal Difference Prediction Error
experimental sequence: A – B – HIGH; C – D – LOW; C – B – HIGH; A – B – HIGH; A – D – LOW; C – D – LOW; A – B – HIGH; A – B – HIGH; C – D – LOW; C – B – HIGH; …
TD model: prediction error ? brain responses (MR scanner)
Ben Seymour; John O’Doherty
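The sequence above suggests a two-stage cue design, and a minimal sketch shows what a TD model would predict for it. Everything numeric here is an assumption: the 0.8/0.2 first-stage transition probabilities are read from the figure residue on the preceding slides, the second cue is taken to predict the outcome with probability 1.0, and pain is coded as 1 for HIGH and 0 for LOW. The names are hypothetical.

```python
# Assumed first-stage transitions: A usually leads to B, C usually leads to D.
P_FIRST = {"A": {"B": 0.8, "D": 0.2},
           "C": {"B": 0.2, "D": 0.8}}
# Assumed outcome coding: B is always followed by HIGH (1), D by LOW (0).
PAIN = {"B": 1.0, "D": 0.0}

def cue_values():
    """Expected future pain for each cue under the assumed probabilities."""
    V = dict(PAIN)                       # second cues fully predict the outcome
    for cue, successors in P_FIRST.items():
        V[cue] = sum(p * V[nxt] for nxt, p in successors.items())
    return V

def transition_error(V, first, second):
    """TD error when the second cue appears (no pain is delivered at that step)."""
    return V[second] - V[first]
```

Under these assumptions the rare transitions carry the large errors: C followed by B gives a large positive error (more pain than expected) and A followed by D gives a large negative one, while the common transitions give small errors of the opposite sign pattern.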

30 TD prediction error: ventral striatum Z=-4R

31 Temporal Difference Values right anterior insula dorsal raphe?

32 Rewards rather than Punishments
[Figure: dopamine cells in VTA/SNc (Schultz et al) alongside the model TD error and V(t); panels: no prediction; prediction, reward; prediction, no reward]

33 TD Prediction Errors
computation: dynamic programming and optimal control
algorithm: ongoing error in predictions of the future
implementation:
–dopamine: phasic prediction error for reward; tonic punishment
–serotonin: phasic prediction error for punishment; tonic reward
evident in VTA; striatum; raphe?
next: action; motivation; addiction; misbehavior

34 Two Cohenesque Theories
Qualitative (AJ): exploration v exploitation
–high tonic mode involves labile attention
–search for better options
–important if short term reward rate is below par
–implemented by changed brittleness?
Quantitative (EB): gain change in decision nets
–NE controls balance of recurrence/bottom-up
–implements changed S/N ratio with target
–detect to detect
–barely any benefit
–why only for targets?

35 Task Difficulty
set η=0.65 rather than 0.675
information accumulates over a longer period
hits more affected than correct rejections (cr’s)
timing not quite right

36 Intra-trial Uncertainty
phasic NE as unexpected state change within a model
–relative to prior probability; against default
–interrupts (resets) ongoing processing
–tie to ADHD?
close to alerting (AJ) – but not necessarily tied to behavioral output (onset rise)
close to behavioural switching (PR) – but not DA
farther from optimal inference (EB)
phasic ACh: aspects of known variability within a state?

37 Where Next
dopamine
–tonic release and vigour
–appetitive misbehaviour and hyperbolic discounting
–actions and habits
–psychosis
serotonin
–aversive misbehaviour and psychiatry
norepinephrine
–stress, depression and beyond

38 Experimental Data
ACh & NE have distinct behavioral effects:
–ACh boosts learning to stimuli with uncertain consequences (e.g. Bucci, Holland, & Gallagher, 1998)
–NE boosts learning upon encountering global changes in the environment (e.g. Devauges & Sara, 1990)
ACh & NE have similar physiological effects:
–suppress recurrent & feedback processing (e.g. Kimura et al, 1995; Kobayashi et al, 2000)
–enhance thalamocortical transmission (e.g. Gil et al, 1997)
–boost experience-dependent plasticity (e.g. Bear & Singer, 1986; Kilgard & Merzenich, 1998)

39 Model Schematics
[Schematic labels: context; ACh (expected uncertainty); NE (unexpected uncertainty); top-down processing; bottom-up processing; sensory inputs; cortical processing; prediction, learning, ...]

40 Attention
attentional selection for (statistically) optimal processing, above and beyond the traditional view of resource constraint
Example 1: Posner’s Task (Phillips, McAlonan, Robb, & Brown, 2000)
[Figure: trial timeline cue, target, response, with intervals 0.2-0.5s, 0.1s, 0.15s; sensory input and cue inform stimulus location; cue of high validity vs low validity]
generalize to the case that cue identity changes with no notice

41 Formal Framework
cues: vestibular, visual,...
target: stimulus location, exit direction...
ACh: variability in quality of relevant cue
NE: variability in identity of relevant cue
Sensory Information
avoid representing full uncertainty

42 Simulation Results: Posner’s Task
vary cue validity → vary ACh
fix relevant cue → low NE
[Figure: validity effect vs ACh level; increase ACh (100, 120, 140 % normal level; nicotine); decrease ACh (100, 80, 60 % normal level; scopolamine); data panels: validity effect vs concentration (Phillips, McAlonan, Robb, & Brown, 2000)]

43 Maze Task
example 2: attentional shift (Devauges & Sara, 1990)
[Figure: maze with reward, cue 1, cue 2, shown before and after the shift; cues labelled relevant or irrelevant]
no issue of validity

44 Simulation Results: Maze Navigation
fix cue validity → no explicit manipulation of ACh
change relevant cue → NE
[Figure: % rats reaching criterion vs no. days after shift from spatial to visual task; panels: experimental data (Devauges & Sara, 1990), model data]

45 Simulation Results: Full Model true & estimated relevant stimuli neuromodulation in action trials validity effect (VE)

46 Simulated Psychopharmacology 50% NE 50% ACh/NE ACh compensation NE can nearly catch up

47 Summary
single framework for understanding ACh, NE and some aspects of attention
ACh/NE as expected/unexpected uncertainty signals
experimental psychopharmacological data replicated by model simulations
implications from complex interactions between ACh & NE
predictions at the cellular, systems, and behavioral levels
activity vs weight vs neuromodulatory vs population representations of uncertainty
