Dynamical Models of Decision Making: Optimality, human and monkey performance, and the Leaky Competing Accumulator Model
Jay McClelland, Department of Psychology and Center for Mind, Brain and Computation, Stanford University
With Marius Usher, Phil Holmes, Juan Gao, Bill Newsome and Alan Rorie

Is the rectangle longer toward the northwest or longer toward the northeast?

Longer toward the Northeast! (2.00″ vs. 1.99″)

An Abstract Statistical Theory
How do you decide if an urn contains more black balls or white balls? We assume you can only draw balls one at a time and want to stop as soon as you have enough evidence to achieve a desired level of accuracy. You know in advance that the proportion of black balls is either p or 1−p.

Optimal Policy (Sequential Probability Ratio Test)
Reach in and grab a ball. Keep track of the difference d = #black − #white. A given difference d occurs with probability p^d / [p^d + (1−p)^d].
E.g., if p is either .9 or .1:
With d = 1, we'll be right 9 times out of 10 if we say "black".
With d = 2, we'll be right 81 times out of 82 if we say "black".
So, respond when the difference d reaches a criterion value +C or −C, set based on p to produce a desired level of accuracy. This policy produces the fastest possible decisions on average for a given level of accuracy.
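A minimal sketch of this policy in Python (the simulation is ours, not from the talk; p = .9 and C = 2 mirror the slide's example):

```python
import random

def sprt_urn(p_black, criterion):
    """Run the SPRT policy on one urn: draw balls until the
    black-minus-white difference reaches +criterion or -criterion.
    Returns the decision ('black' or 'white') and the number of draws."""
    d, n = 0, 0
    while abs(d) < criterion:
        d += 1 if random.random() < p_black else -1
        n += 1
    return ('black' if d > 0 else 'white'), n

# Illustrative check: with p = .9 and C = 2, accuracy should be
# near 81/82 ~ .988, per the slide's example.
trials = [sprt_urn(0.9, 2) for _ in range(10_000)]
accuracy = sum(choice == 'black' for choice, _ in trials) / len(trials)
mean_draws = sum(n for _, n in trials) / len(trials)
print(f"accuracy ~ {accuracy:.3f}, mean draws ~ {mean_draws:.2f}")
```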

The Drift Diffusion Model
Continuous version of the SPRT. A random decision variable x is initialized at 0. At each time step a small random step is taken; the mean direction of steps is +m for one alternative, −m for the other. The size of m corresponds to the discriminability of the alternatives. Start at 0; respond when a criterion is reached. Alternatively, in 'time controlled' tasks, respond when a response signal is given.
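A discrete-time sketch of this process (drift, noise, and criterion values are illustrative assumptions, not values from the talk):

```python
import random

def ddm_trial(drift, noise_sd, criterion, dt=0.001, max_t=5.0):
    """Simulate one drift-diffusion trial: accumulate noisy evidence
    from x = 0 until |x| reaches the criterion (or time runs out)."""
    x, t = 0.0, 0.0
    while abs(x) < criterion and t < max_t:
        x += drift * dt + random.gauss(0.0, noise_sd) * dt ** 0.5
        t += dt
    return (1 if x > 0 else 2), t  # choice and decision time

# Illustrative run: positive drift favors alternative 1.
results = [ddm_trial(drift=0.5, noise_sd=1.0, criterion=1.0)
           for _ in range(2000)]
p_correct = sum(c == 1 for c, _ in results) / len(results)
print(f"P(correct) ~ {p_correct:.3f}")
```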

Two Problems with the DDM
Accuracy should gradually improve toward 100% correct, even for very hard discriminations, as stimulus processing time is allowed to increase, but this is not what is observed in data.
The model also predicts that correct and incorrect RTs will be equally fast for a given level of difficulty, but incorrect RTs are generally slower than correct RTs.
These findings present problems for models based on the DDM/SPRT in both the psychological and neuroscience literature (e.g., Mazurek et al., 2003).
[Figure: probability correct and RT as a function of difficulty (hard to easy), for correct responses and errors.]

Goals of Our Research
We seek an understanding of how dynamical neural processes in the brain give rise to the patterns of behavior seen in decision making situations.
We seek a level of analysis in describing the neural processes that can be linked to behavior on the one hand and to the details of neural processes on the other.
We seek to understand individual differences in processing and the brain basis of these differences.
Future direction: can processing dynamics be optimized by training?

Usher and McClelland (2001) Leaky Competing Accumulator Model
Addresses the process of deciding between two alternatives based on neural population variables subject to external input (ρ1 + ρ2 = 1) with leakage, self-excitation, mutual inhibition, and noise:
dx1/dt = ρ1 − λx1 + αf(x1) − βf(x2) + ξ1
dx2/dt = ρ2 − λx2 + αf(x2) − βf(x1) + ξ2
The difference x1 − x2 reduces to the DDM under certain conditions, but produces a range of interesting features under other conditions.
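A simulation sketch of these two equations, run in 'time controlled' mode (all parameter values are illustrative assumptions; f is taken to be the threshold-linear function [x]+ used later in the slides):

```python
import random

def f(x):
    """Threshold-linear activation, [x]+."""
    return max(x, 0.0)

def lca_trial(rho1, rho2, lam, alpha, beta, noise_sd, dt=0.01, steps=200):
    """Two-unit leaky competing accumulator:
    dx_i/dt = rho_i - lam*x_i + alpha*f(x_i) - beta*f(x_j) + noise."""
    x1 = x2 = 0.0
    for _ in range(steps):
        dx1 = (rho1 - lam * x1 + alpha * f(x1) - beta * f(x2)) * dt \
              + random.gauss(0.0, noise_sd) * dt ** 0.5
        dx2 = (rho2 - lam * x2 + alpha * f(x2) - beta * f(x1)) * dt \
              + random.gauss(0.0, noise_sd) * dt ** 0.5
        x1, x2 = x1 + dx1, x2 + dx2  # simultaneous update
    return 1 if x1 > x2 else 2  # choice at the response signal

# Illustrative run with rho1 + rho2 = 1:
choices = [lca_trial(0.55, 0.45, lam=1.0, alpha=0.4, beta=0.4, noise_sd=0.3)
           for _ in range(2000)]
print("P(choose 1) ~", sum(c == 1 for c in choices) / len(choices))
```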

Dynamics of winning and losing populations in the brain and the LCAM

Wong & Wang (2006) ~Usher & McClelland (2001)

Usher and McClelland (2001) Leaky Competing Accumulator Model
Addresses the process of deciding between two alternatives based on external input (ρ1 + ρ2 = 1) with leakage λ, self-excitation α, mutual inhibition β, and noise ξ:
dx1/dt = ρ1 − λx1 + αf(x1) − βf(x2) + ξ1
dx2/dt = ρ2 − λx2 + αf(x2) − βf(x1) + ξ2
How does the model behave as a function of different choices of parameters?

Usher and McClelland (2001) Leaky Competing Accumulator Model
Consider the linearized version:
dx1/dt = ρ1 − λx1 + αx1 − βx2 + ξ1
dx2/dt = ρ2 − λx2 + αx2 − βx1 + ξ2
Then take the difference (xd = x1 − x2):
dxd/dt = ρd − λxd + αxd + βxd + ξd
Let k = λ − α, referred to as the 'net leakage'. Then we have:
dxd/dt = ρd − (k − β)xd + ξd
We now consider several cases.

[Figure: Time-accuracy curves for different |k − β|, assuming response 1 is chosen at iteration n if xd > 0. Curves shown for |k − β| = .2 (□: β = 0; ◊: β = .4) and |k − β| = .4. Solid lines: analytic solutions using the linear model. Data points: simulation results with the threshold nonlinearity f(xi) = [xi]+.]
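Because the linearized difference xd follows an Ornstein-Uhlenbeck process, the analytic curves above have a closed form: at response-signal time t, accuracy is Φ(mean(t)/sd(t)). A sketch (parameter values are illustrative assumptions):

```python
import math

def ou_accuracy(t, rho_d, g, sigma):
    """Accuracy at time t for the linearized difference
    dx_d/dt = rho_d - g*x_d + noise (g = k - beta), started at x_d = 0.
    Response 1 is chosen if x_d > 0, so accuracy = Phi(mean/sd)."""
    if abs(g) < 1e-12:  # pure drift-diffusion limit
        mean, var = rho_d * t, sigma**2 * t
    else:
        mean = rho_d * (1.0 - math.exp(-g * t)) / g
        var = sigma**2 * (1.0 - math.exp(-2.0 * g * t)) / (2.0 * g)
    return 0.5 * (1.0 + math.erf(mean / math.sqrt(2.0 * var)))

# As in the figure, the linear model's curve depends only on |k - beta|:
# leak-dominant (g > 0) and inhibition-dominant (g < 0) cases coincide.
for g in (0.2, -0.2, 0.4, -0.4):
    print(g, [round(ou_accuracy(t, rho_d=0.1, g=g, sigma=0.5), 3)
              for t in (0.5, 1.0, 2.0, 4.0)])
```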

[Figure: Underlying differences xd with different k (= λ − α) and β; |k − β| = .2 in all cases.]

Testing the Model
Quantitative tests: detailed fits to various aspects of experimental data, including shapes of 'time-accuracy curves'.
Qualitative test: understanding the dynamics of the model leads to a novel prediction.

LCAM vs. DDV
In the LCAM, accuracy is limited by 'imperfect integration': dxd/dt = ρd − (k − β)xd + ξd.
According to Ratcliff's well-known version of the DDM, accuracy is limited by between-trial variability in the direction of drift.
These assumptions produce subtle differences; our data slightly favor the LCAM.
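A sketch of the DDV assumption (parameter values are illustrative): drawing each trial's drift from a normal distribution yields errors that are slower than correct responses, unlike the pure DDM.

```python
import random

def ddv_trial(mean_drift, drift_sd, noise_sd, criterion, dt=0.001, max_t=10.0):
    """DDM with between-trial drift variability: each trial's drift is
    drawn from a normal distribution, then evidence accumulates from 0
    to +criterion (correct) or -criterion (error)."""
    drift = random.gauss(mean_drift, drift_sd)
    x, t = 0.0, 0.0
    while abs(x) < criterion and t < max_t:
        x += drift * dt + random.gauss(0.0, noise_sd) * dt ** 0.5
        t += dt
    return x > 0, t

# Illustrative: errors are dominated by low-drift trials, which are slow.
trials = [ddv_trial(0.5, 0.5, 1.0, 1.5) for _ in range(5000)]
corr = [t for ok, t in trials if ok]
errs = [t for ok, t in trials if not ok]
print(f"mean correct RT ~ {sum(corr)/len(corr):.2f}, "
      f"mean error RT ~ {sum(errs)/len(errs):.2f}")
```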

*OU = analytic approximation to the LCAM; DDV = DDM with between-trial drift variance (Ratcliff, 1978).

Assessing Integration Dynamics
Participant sees a stream of S's and H's and must decide which is predominant.
50% of trials last ~500 msec, allowing accuracy assessment; 50% last ~250 msec, allowing assessment of dynamics.
The streams contain an equal number of S's and H's, but there are clusters (of 0, 2, or 4) bunched together at the end.
[Figure: predicted activations for leak-dominant and inhibition-dominant integrators, for streams in which one letter is favored early vs. favored late; see the sketch below.]
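A noise-free sketch of the predicted contrast using the linearized difference equation (stream coding and parameter values are illustrative assumptions): a leak-dominant integrator (k > β) weighs late items more heavily, while an inhibition-dominant one (k < β) weighs early items more heavily.

```python
def integrate_stream(items, g, steps_per_item=20, dt=0.05):
    """Noise-free linearized LCA difference driven by an item stream:
    dx_d/dt = rho_d(t) - g*x_d, with g = k - beta. Each item supplies
    input +1 (an S) or -1 (an H) for steps_per_item time steps."""
    x = 0.0
    for item in items:
        for _ in range(steps_per_item):
            x += (item - g * x) * dt
    return x

# Equal numbers of S (+1) and H (-1), with the S's clustered early or late.
s_early = [+1] * 4 + [-1] * 4
s_late = [-1] * 4 + [+1] * 4
verdict = lambda x: "S" if x > 0 else "H"

for label, g in [("leak-dominant (k > beta)", +0.5),
                 ("inhibition-dominant (k < beta)", -0.5)]:
    print(label,
          "| S-early stream ->", verdict(integrate_stream(s_early, g)),
          "| S-late stream ->", verdict(integrate_stream(s_late, g)))
```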

Large individual differences: subjects show both kinds of biases. The less the bias, the higher the accuracy, as predicted. [Data shown for subject S3.]

Extension to N Alternatives
Extension to n alternatives is very natural:
dxi/dt = ρi − λxi + αf(xi) − β Σj≠i f(xj) + ξi
The model accounts quite well for Hick's law (RT increases with log n alternatives), assuming that the threshold is raised with n to maintain equal accuracy in all conditions. Use of a non-linear activation function increases efficiency when there are only a few alternatives 'in contention' given the stimulus.
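A sketch of the n-alternative model (parameter values and the fixed threshold are illustrative assumptions; per the slide, the threshold would be raised with n to equate accuracy across conditions):

```python
import random

def f(x):
    """Threshold-linear activation, [x]+."""
    return max(x, 0.0)

def lca_n(rhos, lam, alpha, beta, noise_sd, threshold,
          dt=0.01, max_steps=5000):
    """n-alternative LCA: dx_i/dt = rho_i - lam*x_i + alpha*f(x_i)
    - beta * sum_{j != i} f(x_j) + noise; respond when any unit
    crosses the threshold."""
    n = len(rhos)
    x = [0.0] * n
    for step in range(max_steps):
        fx = [f(xi) for xi in x]
        total = sum(fx)
        x = [xi + (r - lam * xi + alpha * fi - beta * (total - fi)) * dt
             + random.gauss(0.0, noise_sd) * dt ** 0.5
             for xi, r, fi in zip(x, rhos, fx)]
        best = max(range(n), key=lambda i: x[i])
        if x[best] >= threshold:
            return best, (step + 1) * dt
    return max(range(n), key=lambda i: x[i]), max_steps * dt

# Illustrative: alternative 0 gets the strongest input; the rest split
# the remainder so that the inputs sum to 1.
for n in (2, 4, 8):
    rhos = [0.6] + [0.4 / (n - 1)] * (n - 1)
    trials = [lca_n(rhos, 1.0, 0.4, 0.4, 0.3, threshold=0.8)
              for _ in range(500)]
    acc = sum(c == 0 for c, _ in trials) / len(trials)
    mean_rt = sum(t for _, t in trials) / len(trials)
    print(f"n={n}: accuracy ~ {acc:.2f}, mean RT ~ {mean_rt:.2f}")
```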

Integration of Reward and Motion Information (Rorie & Newsome)
The monkey's choices reflect a beautifully systematic combination of the effects of stimulus coherence and reward. Bias is slightly larger than optimal, but the cost to the monkey is small.

Population Response of LIP Neurons in four Reward Conditions (one Monkey, 51 Neurons)

Some Open Questions
How localized vs. distributed are the neural populations involved in decisions?
What does it mean in the brain to pass the threshold for response initiation?
How are the dynamics of reward bias tuned to allow a practiced participant to combine reward and stimulus information in the right mix to achieve (near) optimality?

Human Experiment Examining Reward Bias Effect at Different Time Points after Target Onset
[Figure: trial timeline — reward cue, stimulus, response signal, response window (0-2000 msec).]
The reward cue occurs 750 msec before the stimulus; a small arrowhead is visible for 250 msec. Only biased reward conditions (2 vs. 1 and 1 vs. 2) are considered. Stimuli are rectangles shifted 1, 3, or 5 pixels left or right of fixation. The response signal (a tone) occurs at 10 different lags: 0, 75, 150, 225, 300, 450, 600, 900, 1200, 2000 msec. The participant receives the reward if the response occurs within 250 msec of the response signal and is correct. Participants were run for 15-25 sessions to provide stable data; data are from later sessions in which the effect of reward appeared to be fairly stable.

A participant with very little reward bias
The top panel shows the probability of the response yielding the larger reward as a function of actual response time, for combinations of stimulus shift (1, 3, or 5 pixels) and reward-stimulus compatibility.
The lower panel shows the data transformed to z scores, corresponding to the theoretical construct
z(t) = [mean(x1(t) − x2(t)) + bias(t)] / sd(x1(t) − x2(t)),
where x1 represents the state of the accumulator associated with the greater reward, x2 the same for the lesser reward, and the subject (S) is thought to choose the larger reward if x1(t) − x2(t) + bias(t) > 0.
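If the accumulator difference is Gaussian, this construct implies P(choose larger reward at time t) = Φ(z(t)), so the lower-panel transformation is just the inverse normal CDF applied to choice proportions. A sketch with made-up proportions (not the participant's data):

```python
from statistics import NormalDist

def z_from_proportion(p):
    """Invert P(choose larger reward) = Phi(z) to recover
    z = [mean(x1 - x2) + bias] / sd(x1 - x2)."""
    return NormalDist().inv_cdf(p)

# Made-up example proportions at increasing response-signal lags:
for lag_ms, p in [(75, 0.88), (300, 0.75), (900, 0.68), (2000, 0.66)]:
    print(f"lag {lag_ms:4d} ms: p = {p:.2f} -> z = {z_from_proportion(p):+.2f}")
```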

Participants Showing Reward Bias

Actual vs. Optimal Bias for Three Subjects
Except for participant sl, all participants start with a high bias, then level off, as expected under the optimal policy. All participants are under-biased at short lags; at longer lags, some are under-biased, some over-biased, and some ~optimal.

Results of Reward Bias Analysis (with Juan Gao)
Within the LCAM, the data are not consistent with bias acting to set the starting state of a leaky information accumulation process. Nor are the data consistent with treating the bias as an additional input to the competing accumulators. The 'effective bias' starts large and falls to a fixed residual, but the details of how this is implemented are still under investigation.

Some Take-Home Messages and a Question
Participants vary in the dynamics of both stimulus and reward processing. Some are capable of achieving far closer approximations to optimality than others.
Question: are these fixed characteristics of individuals, or can we train optimization of the dynamics of decision making?