Computational models of cognitive control (II)
Matthew Botvinick
Princeton Neuroscience Institute and Department of Psychology, Princeton University
Banishing the homunculus
Decision-making in control:
Not only, “How does control shape decision-making?”
But also, “How are ‘control states’ selected?”
And, “How are they updated over time?”
1. Routine sequential action
Botvinick & Plaut, Psychological Review, 2004; Botvinick, Proceedings of the Royal Society B; Botvinick, TICS, 2008
‘Routine sequential action’
Action on familiar objects
Well-defined sequential structure
Concrete goals
Highly routine
Everyday tasks
Hierarchical structure
[Figure: task hierarchy. Top level: MAKE INSTANT COFFEE. Subtasks: ADD GROUNDS, ADD CREAM, ADD SUGAR. Variants of ADD SUGAR: ADD SUGAR FROM SUGARPACK, ADD SUGAR FROM SUGARBOWL. Actions: PICK-UP, PUT-DOWN, POUR, STIR, TEAR, SCOOP]
Hierarchical models of action
[Figure: schema hierarchy. MAKE INSTANT COFFEE; ADD GROUNDS, ADD CREAM, ADD SUGAR; ADD SUGAR FROM SUGARBOWL / PACKET; PICK-UP, PUT-DOWN, POUR, STIR, TEAR, SCOOP]
Hierarchical structure of task built directly into architecture (e.g., Cooper & Shallice, 2000; Estes, 1972; Houghton, 1990; MacKay, 1987; Rumelhart & Norman, 1982)
Schemas as primitive elements
An alternative approach
[Figure: network unrolled over time steps t, t+1, t+2, with patterns p, s, a at each step]
p, s, a = patterns of activation over simple processing units
Weighted, excitatory/inhibitory connections
Weights adjusted through gradient-descent learning in target task domains
Recurrent neural networks
Feedback as well as feedforward connections
Allow preservation of information over time
Demonstrated capacity to learn sequential behaviors (e.g., Cleeremans, 1993; Elman, 1990)
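The point about feedback connections preserving information over time can be illustrated with a minimal sketch of an Elman-style simple recurrent network. This is not the model from the talk: the layer sizes, the tanh/softmax choices, and the random (untrained) weights are all assumptions made for illustration. The sketch only shows the forward pass, in which the same probe input produces different outputs depending on what came before it.

```python
import math
import random


def make_matrix(rows, cols, rng, scale=0.5):
    """Random weights (assumption: a trained model would learn these by gradient descent)."""
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


def matvec(matrix, vector):
    """Multiply a weight matrix (list of rows) by an activation vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


class SimpleRecurrentNetwork:
    """Hypothetical Elman-style network: input -> hidden (with feedback) -> output."""

    def __init__(self, n_in, n_hidden, n_out, seed=0):
        rng = random.Random(seed)
        self.w_in = make_matrix(n_hidden, n_in, rng)       # feedforward: input -> hidden
        self.w_rec = make_matrix(n_hidden, n_hidden, rng)  # feedback: hidden(t-1) -> hidden(t)
        self.w_out = make_matrix(n_out, n_hidden, rng)     # feedforward: hidden -> output
        self.hidden = [0.0] * n_hidden

    def reset(self):
        self.hidden = [0.0] * len(self.hidden)

    def step(self, inputs):
        # The new hidden pattern depends on the current input AND the previous hidden pattern.
        drive = [a + b for a, b in zip(matvec(self.w_in, inputs),
                                       matvec(self.w_rec, self.hidden))]
        self.hidden = [math.tanh(d) for d in drive]
        logits = matvec(self.w_out, self.hidden)
        peak = max(logits)
        exps = [math.exp(v - peak) for v in logits]
        total = sum(exps)
        return [e / total for e in exps]  # softmax over output units


def run_sequence(net, sequence):
    """Reset the context, then present the inputs one step at a time."""
    net.reset()
    return [net.step(x) for x in sequence]


net = SimpleRecurrentNetwork(n_in=3, n_hidden=5, n_out=4)
probe = [0.0, 0.0, 1.0]
after_a = run_sequence(net, [[1.0, 0.0, 0.0], probe])[-1]
after_b = run_sequence(net, [[0.0, 1.0, 0.0], probe])[-1]
```

Here `after_a` and `after_b` are responses to the identical `probe` input; they differ only because the hidden layer carried forward a trace of the preceding step.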
The model
[Figure: loop linking perceptual input, internal representation, action, and environment]
Fixate(Blue), Fixate(Green), Fixate(Top), PickUp, Fixate(Table), PutDown, Fixate(Green), PickUp
Ballard, Hayhoe, Pook & Rao (1996), BBS.
Model architecture
[Figure: perceptual input (viewed object, held object) feeds the network; action output (manipulative, perceptual) acts on the environment]
Routine sequential action: Task domain
Hierarchically structured
Actions/subtasks may appear in multiple contexts
Environmental cues alone sometimes insufficient to guide action selection
Subtasks that may be executed in variable order
Subtask disjunctions
[Figure: task graph from Start to End, with subtasks grounds, cream, drink (coffee) and steep tea, drink (tea)]
Representations
[Figure: unit labels, e.g. sugar-packet; manipulative actions; perceptual actions]
[Figure series: input and target/output patterns at successive steps]
Model behavior
[Figure: task graphs (Start, grounds, cream, drink, End; Start, steep tea, drink, End) annotated with percentages 15%, 18%, 12%, 10%, 20%, 25%]
Slips of action (after Reason)
Occur at decision (or fork) points
Sequence errors involve subtask omissions, repetitions, and lapses
Lapses show effect of relative task frequency
Sample of behavior:
pick-up coffee-pack
pull-open coffee-pack
pour coffee-pack into cup
put-down coffee-pack
pick-up spoon
stir cup
put-down spoon
pick-up sugar-pack
tear-open sugar-pack
pour sugar-pack into cup
put-down sugar-pack
pick-up spoon
stir cup
put-down spoon
pick-up cup*
sip cup
say-done
Annotations: grounds; sugar (pack); drink; cream omitted
[Figure: percentage of trials error-free (0 to 100) by step in coffee sequence, subtasks 1 to 4]
[Figure: percentage of trials by noise level (variance); series: omissions / anticipations, repetitions / perseverations, intrusions / lapses]
[Figure: odds of lapse into coffee-making by tea : coffee ratio; task graph from Start to End with steps steep tea, sugar, cream (*), drink, grounds]
Action disorganization syndrome (after Schwartz and colleagues)
Fragmentation of sequential structure (independent actions)
Specific error types
Omission effect
Sample of behavior:
pick-up coffee-pack
pull-open coffee-pack
put-down coffee-pack*
pick-up coffee-pack
pour coffee-pack into cup
put-down coffee-pack
pick-up spoon
stir cup
put-down spoon
pick-up sugar-pack
tear-open sugar-pack
pour sugar-pack into cup
put-down sugar-pack
pick-up cup*
put-down cup
pull-off sugarbowl lid*
put-down lid
pick-up spoon
scoop sugarbowl with spoon
put-down spoon*
pick-up cup*
sip cup
say-done
Annotations: sugar repeated; cream omitted; disrupted subtask; subtask fragment
[Figure: proportion of independents by noise (variance). Empirical data: Schwartz et al., Neuropsychology]
[Figure: errors (per opportunity) by noise (variance); series: sequence errors, omission errors. From: Schwartz et al., Neuropsychology]
Internal representations
[Figure: internal representations across the coffee sequence (grounds, cream, drink) and the tea sequence (steep tea, drink)]
Etiology of a slip
[Figure: internal representations during the tea sequence (steep tea, drink)]
[Figure: tea representation and coffee representation]
[Figure: two panels. Coffee more frequent: coffee, tea. Tea more frequent: tea, coffee]
[Figure: layered network. Input, Peripheral (input), Intermediate (input), Apex, Intermediate (output), Peripheral (output), Output]
Store-Ignore-Recall (SIR) task
[Figure: example sequence with cue R and responses “nine”, “eight”, “four”, “seven”, “eight”]
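The slide names the SIR task without spelling out its rules, so the following is only a sketch under a common reading of the task: on Store and Ignore trials the correct response is the current digit, and on Recall trials it is the most recently stored digit. The cue letters, the `sir_targets` helper, and the assignment of cues to the digits in the example are assumptions, chosen so that the responses match the words shown on the slide.

```python
def sir_targets(trials):
    """Return the correct response for each (cue, digit) trial.

    Assumed rules (not stated on the slide):
      "S" (store):  respond with the digit and remember it
      "I" (ignore): respond with the digit, leave memory unchanged
      "R" (recall): respond with the remembered digit
    """
    stored = None
    targets = []
    for cue, digit in trials:
        if cue == "S":
            stored = digit
            targets.append(digit)
        elif cue == "I":
            targets.append(digit)
        elif cue == "R":
            if stored is None:
                raise ValueError("recall cue before any store cue")
            targets.append(stored)
        else:
            raise ValueError(f"unknown cue: {cue!r}")
    return targets


# Hypothetical cue assignment for the digits shown on the slide.
example_trials = [
    ("I", "nine"),
    ("S", "eight"),
    ("I", "four"),
    ("I", "seven"),
    ("R", None),
]
example_targets = sir_targets(example_trials)
```

The task is demanding for a network because the stored digit must survive several intervening Ignore trials, which is the kind of temporal-context information the talk attributes to units far from the sensory and motor peripheries.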
Conclusions
Architectural hierarchy is not necessary for hierarchically structured behavior (or to understand action errors).
Recurrent connectivity combined with graded, distributed representation is sufficient.
Nonetheless, if architectural hierarchy is present, it can lead to a graded division of labor, according to which units furthest from sensory and motor peripheries specialize in coding information pertaining to temporal context.
This may give us a way of explaining why the prefrontal cortex seems to be involved in routine sequential behavior.
2. Hierarchical reinforcement learning
Botvinick, Niv & Barto, Cognition, in press; Botvinick, TICS, 2008
Reinforcement Learning
1. States
2. Actions
3. Transition function
4. Reward function
Policy?
Action strengths
State values
Prediction error
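The three quantities listed here (action strengths, state values, prediction error) are the ingredients of an actor-critic learner. The sketch below shows how they fit together on a made-up five-state corridor; the task, the learning rates, the discount factor, and the softmax action choice are all assumptions for illustration, not details taken from the talk.

```python
import math
import random

N_STATES = 5            # hypothetical corridor: states 0..4, state 4 is the goal
ACTIONS = (-1, +1)      # step left or step right
GAMMA = 0.9             # discount factor (assumed)
ALPHA_VALUE = 0.1       # learning rate for state values (assumed)
ALPHA_ACTION = 0.1      # learning rate for action strengths (assumed)
GOAL = N_STATES - 1


def transition(state, action):
    """Transition function: move along the corridor, stopping at the walls."""
    return min(max(state + action, 0), GOAL)


def reward(next_state):
    """Reward function: 1 on reaching the goal, 0 otherwise."""
    return 1.0 if next_state == GOAL else 0.0


def choose(strengths, state, rng):
    """Policy: softmax over the action strengths for this state."""
    prefs = strengths[state]
    peak = max(prefs)
    exps = [math.exp(p - peak) for p in prefs]
    threshold = rng.random() * sum(exps)
    running = 0.0
    for index, weight in enumerate(exps):
        running += weight
        if threshold < running:
            return index
    return len(exps) - 1


def train(episodes=500, seed=0):
    rng = random.Random(seed)
    values = [0.0] * N_STATES                            # critic: state values
    strengths = [[0.0, 0.0] for _ in range(N_STATES)]    # actor: action strengths
    for _ in range(episodes):
        state = 0
        while state != GOAL:
            a = choose(strengths, state, rng)
            nxt = transition(state, ACTIONS[a])
            next_value = 0.0 if nxt == GOAL else values[nxt]
            # Temporal-difference prediction error.
            delta = reward(nxt) + GAMMA * next_value - values[state]
            values[state] += ALPHA_VALUE * delta          # update state value
            strengths[state][a] += ALPHA_ACTION * delta   # update action strength
            state = nxt
    return values, strengths


values, strengths = train()
```

The same prediction error `delta` drives both updates: it corrects the value estimate for the state just left and strengthens or weakens the action just taken.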
Adapted from Sutton et al., AI, 1999
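Hierarchical Reinforcement Learning
Options, O: ⟨I, π, β⟩ = initiation set, policy, termination function (after Sutton, Precup & Singh, 1999)
“Policy abstraction”
[Figure: network with inputs GREEN and RED, outputs “green” and “red”, and task units Color-naming and Word-reading; adapted from Cohen et al., Psych. Rev., 1990]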
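In the options framework of Sutton, Precup & Singh (1999), an option bundles an initiation set, a policy, and a termination function, and once selected it runs until it terminates. The sketch below shows one way to represent that triple. The `Option` class, the `run_option` helper, the corridor environment, and the per-step cost of -1 are hypothetical names and values introduced only for illustration.

```python
import random
from dataclasses import dataclass
from typing import Callable, FrozenSet, Tuple


@dataclass(frozen=True)
class Option:
    name: str
    initiation_set: FrozenSet[int]        # I: states where the option may be selected
    policy: Callable[[int], int]          # pi: state -> primitive action
    termination: Callable[[int], float]   # beta: state -> probability of terminating


def corridor_step(state: int, action: int) -> Tuple[int, float]:
    """Hypothetical environment: states 0..6 in a row, each step costs 1."""
    next_state = min(max(state + action, 0), 6)
    return next_state, -1.0


def run_option(option, state, step, rng, gamma=0.9):
    """Follow the option's policy until it terminates.

    Returns the final state, the discounted reward accumulated while the
    option was running, and the number of primitive steps taken.
    """
    if state not in option.initiation_set:
        raise ValueError(f"{option.name} cannot be initiated in state {state}")
    total, discount, steps = 0.0, 1.0, 0
    while True:
        state, r = step(state, option.policy(state))
        total += discount * r
        discount *= gamma
        steps += 1
        if rng.random() < option.termination(state):
            return state, total, steps


# Example option: walk right until reaching a "doorway" at state 3.
go_to_door = Option(
    name="go-to-door",
    initiation_set=frozenset({0, 1, 2}),
    policy=lambda s: +1,
    termination=lambda s: 1.0 if s == 3 else 0.0,
)

final_state, discounted_reward, n_steps = run_option(
    go_to_door, 0, corridor_step, random.Random(0)
)
```

From the higher-level learner's point of view, the whole run of `go_to_door` behaves like a single temporally extended action with one outcome state and one cumulative reward.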
From Humphreys & Forde, Cog. Neuropsych., 2001
cf. Luchins, Psychol. Monogr., 1942
The Option Discovery Problem
Genetic algorithms (Elfwing, 2003)
Frequently visited states (Pickett & Barto, 2002; Thrun & Schwartz, 1996)
Graph partitioning (Menache et al., 2002; Mannor et al., 2004; Simsek et al., 2005)
Intrinsic motivation (Simsek & Barto, 2005)
Other possibilities: Impasses (Soar); Social transmission
Extension 1: Support for representing option identifiers
White & Wise, Exp Br Res, 1999 (See also: Assad, Rainer & Miller, 2000; Bunge, 2004; Hoshi, Shima & Tanji, 1998; Johnston & Everling, 2006; Wallis, Anderson & Miller, 2001; White, 1999…)
Miller & Cohen, Ann. Rev. Neurosci, 2001
From Curtis & D’Esposito, TICS, 2003, after Funahashi et al., J. Neurophysiol., 1989.
Koechlin, Attn & Perf., 2008
Extension 2: Option-specific policies
O’Reilly & Frank, Neural Computation, 2006
Aldridge & Berridge, J Neurosci, 1998
Extension 3: Option-specific state values
Schoenbaum et al., J Neurosci. See also: O’Doherty, Critchley, Deichmann & Dolan, 2003
Extension 4: Temporal scope of the prediction error
Schoenbaum, Roesch & Stalnaker, TICS, 2006
Roesch, Taylor & Schoenbaum, Neuron, 2006
Daw, NIPS, 2003
3. Goal-directed behavior
Botvinick & An, submitted.
Niv, Joel & Dayan, TICS (2006)
[Figure: decision tree labeled T and R, with outcome values 4, 0, 2, 3]
Latent learning (Blodgett, 1929)
Detour behavior (Tolman & Honzik, 1930)
Devaluation (Niv, Joel & Dayan, TICS, 2006)
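Devaluation is a standard test of goal-directed behavior: an agent that plans with a transition function T and a reward function R can change its choice as soon as an outcome loses value, whereas cached action values stay the same until they are relearned. The sketch below illustrates that contrast. The tree shape, the state and outcome names, and the `plan` helper are hypothetical; only the outcome values 4, 0, 2, 3 are borrowed from the figure labels.

```python
# Transition function T: state -> {action: next state} (hypothetical two-level tree).
T = {
    "start": {"left": "A", "right": "B"},
    "A": {"left": "outcome_1", "right": "outcome_2"},
    "B": {"left": "outcome_3", "right": "outcome_4"},
}

# Reward function R: value of each terminal outcome.
R = {"outcome_1": 4.0, "outcome_2": 0.0, "outcome_3": 2.0, "outcome_4": 3.0}


def plan(state, transitions, rewards):
    """Exhaustive tree search: return (best value, best action sequence)."""
    if state not in transitions:          # terminal outcome
        return rewards.get(state, 0.0), []
    best_value, best_path = float("-inf"), []
    for action, next_state in transitions[state].items():
        value, path = plan(next_state, transitions, rewards)
        if value > best_value:
            best_value, best_path = value, [action] + path
    return best_value, best_path


# Goal-directed choice before devaluation.
value_before, path_before = plan("start", T, R)

# Devalue the best outcome (e.g. by satiety); nothing else is relearned.
R_devalued = dict(R, outcome_1=0.0)
value_after, path_after = plan("start", T, R_devalued)

# A habit-like system that cached first-step values before devaluation
# keeps preferring the old action.
cached_q = {"left": 4.0, "right": 3.0}
habitual_choice = max(cached_q, key=cached_q.get)
```

After devaluation the planner switches from the left branch to the right branch immediately, while `habitual_choice` still points left. That dissociation is what makes devaluation diagnostic of planning with an internal model.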
White & Wise, Exp Br Res, 1999 (See also: Assad, Rainer & Miller, 2000; Bunge, 2004; Hoshi, Shima & Tanji, 1998; Johnston & Everling, 2006; Wallis, Anderson & Miller, 2001; White, 1999; Miller & Cohen, 2001…)
Miller & Cohen, Ann. Rev. Neurosci, 2001
Padoa-Schioppa & Assad, Nature, 2006
Gopnik, et al., Psych Rev, 2004
Redish data… Johnson & Redish, J. Neurosci., 2007
Botvinick & An, submitted
Cf. Tatman & Shachter, 1990
Cf. Verma & Rao, 2006
Policy query
Reward query
Policy query Reward query
+1 / 0 +2 / -3
Collaborators
James An, Andy Barto, Todd Braver, Deanna Barch, Jonathan Cohen, Andrew Ledvina, Joseph McGuire, David Plaut, Yael Niv