Dynamics of Reward Bias Effects in Perceptual Decision Making. Jay McClelland & Juan Gao. Building on: Newsome and Rorie; Holmes and Feng; Usher and McClelland.

Presentation transcript:

Our Questions Can we trace the effect of reward bias on decision making over time? Can we determine what would be the optimal reward effect? Can we determine how well participants do at achieving optimality? Can we uncover the processing mechanisms that lead to the observed patterns of behavior?

Overview Experiment Results Optimality analysis Abstract (one-d) dynamical model Mechanistic (two-d) dynamical model

Human Experiment Examining Reward Bias Effect at Different Time Points after Target Onset
Reward cue occurs 750 msec before stimulus.
–Small arrowhead visible for 250 msec.
–Only biased reward conditions (2 vs 1 and 1 vs 2) are considered.
Stimuli are rectangles shifted 1, 3, or 5 pixels L or R of fixation.
Response signal (tone) occurs at 10 different lags.
Participant receives reward if response occurs within 250 msec of response signal and is correct.
–Participants were run for multiple sessions to provide stable data.
–Data are from later sessions in which the effect of reward appeared to be fairly stable.
[Figure: trial timeline showing reward cue, stimulus (750 msec after the cue), response signal, and response window.]

Sequential sampling models are really about speed-accuracy tradeoff. Air Force has a strong interest in understanding dynamic decision making under time pressure: learn how the brain does it! (Slide from Zhang 2007)

A participant with very little reward bias
Top panel shows the probability of the response giving the larger reward as a function of actual response time, for combinations of:
–Stimulus shift (1, 3, or 5 pixels)
–Reward-stimulus compatibility
Lower panel shows the data transformed to z scores, corresponding to the theoretical construct
$$z(t) = \frac{\mathrm{mean}(x_1(t) - x_2(t)) + \mathrm{bias}(t)}{\mathrm{sd}(x_1(t) - x_2(t))}$$
where $x_1$ represents the state of the accumulator associated with the greater reward, $x_2$ the same for the lesser reward, and the subject (S) is thought to choose the larger reward if $x_1(t) - x_2(t) + \mathrm{bias}(t) > 0$.

Participants Showing Reward Bias

Summary
Initial bias is high and tapers off over time to a fixed low level.
Questions
–Is this reasonable?
–How close to optimal is it?
–Are some subjects more optimal than others?

Abstract optimality analysis

Assumptions (from Signal Detection Theory)
At a given time, the decision variable comes from one of two distributions with means $-\mu$ and $+\mu$ and the same STD $\sigma$. ['-' is 'consistent with high reward']
Choose the high reward alternative (H) if $x < X_c$, else choose the low reward alternative (L).
For three difficulty levels, the means are $\pm\mu_i$ ($i = 1, 2, 3$), with shared $\sigma$ and the same choice policy.

Optimal Bias
Premise: choose the alternative $c$ with the larger expected reward, where $\text{Expected Reward}_c = \text{Likelihood}_c \times \text{Reward}_c$.
Result: $X_{opt}/\sigma = \log(\text{Reward}_H/\text{Reward}_L)/d'$
(This policy maximizes expected reward overall.)
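The result above can be checked numerically. The following is a minimal sketch, not the authors' analysis code: it assumes equally likely H and L stimuli, Gaussian distributions with means $-\mu$ (H) and $+\mu$ (L), and $d' = 2\mu/\sigma$. All function names are hypothetical.

```python
import math


def phi(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def optimal_criterion(reward_h, reward_l, mu, sigma=1.0):
    """Closed form from the slide: X_opt = sigma * log(R_H / R_L) / d'."""
    d_prime = 2.0 * mu / sigma  # assumed definition: separation of means / sd
    return sigma * math.log(reward_h / reward_l) / d_prime


def expected_reward(x_c, reward_h, reward_l, mu, sigma=1.0):
    """Expected reward per trial, assuming H and L stimuli are equally likely.

    H stimuli have mean -mu and are answered correctly when x < x_c;
    L stimuli have mean +mu and are answered correctly when x >= x_c.
    """
    p_correct_h = phi((x_c + mu) / sigma)
    p_correct_l = 1.0 - phi((x_c - mu) / sigma)
    return 0.5 * (reward_h * p_correct_h + reward_l * p_correct_l)


def grid_search_criterion(reward_h, reward_l, mu, sigma=1.0,
                          lo=-5.0, hi=5.0, step=0.001):
    """Brute-force search for the criterion that maximizes expected reward."""
    n = int(round((hi - lo) / step)) + 1
    candidates = [lo + i * step for i in range(n)]
    return max(candidates,
               key=lambda x: expected_reward(x, reward_h, reward_l, mu, sigma))
```

Calling `optimal_criterion(2.0, 1.0, mu=0.5)` and `grid_search_criterion(2.0, 1.0, mu=0.5)` should give nearly the same criterion, which is one way to confirm that the closed form maximizes expected reward under these assumptions.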

As $d'$ increases, $X_{opt}$ decreases. [Figure: $X_{opt} = \infty$ at $d' = 0$; $X_{opt}$ shrinks as $d'$ increases.]

Estimating normalized  and X c values from data at each signal lag, one difficulty level  x c  Std normal deviates (  = 1)

Estimating normalized  and X c values from data at each signal lag with multiple difficulty levels

Optimal Bias with Multiple Difficulties [Figure legend: Actual Criterion, Optimal Criterion]

Empirical Characterization of Time-course of Change in Sensitivity (d’)
Subject’s sensitivity, as defined in the theory of signal detectability, is measured as the response signal delay varies. For each subject, the time course is fit with a function.

Subject Sensitivity

Actual vs. Optimal bias for three S’s
Except for sl, all participants start with a high bias, then level off, conforming approximately to optimal. All participants are under-biased for short lags. At longer lags, some are under, some are over, and some are ~optimal.

Our Questions Can we trace the effect of reward bias on decision making over time? Can we determine what would be the optimal reward effect? Can we determine how well participants do at achieving optimality? Can we uncover the processing mechanisms that lead to the observed patterns of behavior?

Two Paths…
Qualitative analysis with a one-dimensional decision variable (following Holmes and Feng), asking:
–How should reward bias be represented?
Possible answers:
–Offset in initial conditions?
–An additional term in the input to the decision variable?
–A time-varying offset that optimizes reward?
–A fixed offset in the value of the decision variable?
Inverse-Micro-Speed-Accuracy Tradeoff (discovered by Juan)
Steps toward a Leaky Competing Accumulator model that addresses this and other aspects of the data.

Qualitative Dynamical Analysis
Based on a one-dimensional leaky integrator model.
Input I = aC; C is chosen from {-5,-3,-1,1,3,5}.
Initial condition: x = 0.
Choose left if x > 0 when the response signal is detected; otherwise choose right.
Accuracy approximates an exponential approach to asymptote because of leakage.
How is the reward implemented?
–Offset in initial conditions?
–An additional term in the input to the decision variable?
–A time-varying offset that optimizes reward?
–A fixed offset in the value of the decision variable?
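The behavior of such a model can be illustrated with a short sketch. The slides do not give the equations, so this assumes a standard linear leaky integrator with Gaussian noise, dx = (-leak * x + a * C) dt + noise * dW, whose mean and standard deviation at time t have closed forms. The parameter names `leak` and `noise` and all values are assumptions for illustration.

```python
import math


def phi(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def integrator_moments(t, a, C, leak, noise, x0=0.0):
    """Mean and sd of x(t) for dx = (-leak*x + a*C) dt + noise dW, x(0) = x0."""
    decay = math.exp(-leak * t)
    mean = x0 * decay + (a * C / leak) * (1.0 - decay)
    var = noise ** 2 / (2.0 * leak) * (1.0 - math.exp(-2.0 * leak * t))
    return mean, math.sqrt(var)


def p_x_positive(t, a, C, leak, noise, x0=0.0):
    """Probability that x(t) > 0, i.e. that 'left' is chosen at lag t (t > 0)."""
    mean, sd = integrator_moments(t, a, C, leak, noise, x0)
    return phi(mean / sd)


def asymptotic_p_x_positive(a, C, leak, noise):
    """Limit of p_x_positive as t grows: leakage caps accuracy."""
    return phi((a * C / leak) / (noise / math.sqrt(2.0 * leak)))
```

Evaluating `p_x_positive` at increasing lags for a positive C shows accuracy rising and then saturating at `asymptotic_p_x_positive`, which is the leak-limited asymptote the slide refers to.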

Offset in Initial Conditions Note: Effect of bias decays away as t increases.
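The decay can be made explicit under one assumption not stated on the slide: that the decision variable follows a linear leaky integrator $dx = (-\lambda x + aC)\,dt + c\,dW$ with leak $\lambda > 0$ and starting offset $x(0) = x_0$. The mean of the decision variable is then

$$E[x(t)] = x_0 e^{-\lambda t} + \frac{aC}{\lambda}\left(1 - e^{-\lambda t}\right)$$

The first term is the contribution of the reward offset. It shrinks exponentially with time constant $1/\lambda$, while the stimulus-driven second term grows toward $aC/\lambda$, so the bias is strongest at short lags and vanishes at long ones.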

Reward as a term in the input
Reward signal comes at –  ; processing starts at that time.
For t <  : input = b
For t > , input = b+aC
Notes:
1. Effect of the bias persists.
2. But bias is sub-optimal initially, and there is no ‘dip’.
3. Initially high bias and dip occur if  starts low and increases at stimulus onset.

Time-varying term that optimizes rewards (No free parameter for reward bias)
Expression for b(t) is for a single difficulty level.
Bias is equivalent to a time-varying criterion = -b(t).
There is a dip.
No analytic expression is available for multiple difficulty levels, but numerical simulation is possible.
[Figure: P of choice toward larger reward vs. time (s), for RSC 1 and RSC 0 at difficulty levels 5, 3, and 1.]

Reward as a constant offset in the decision variable
Notes:
–Equivalent to setting criterion at –  0
–Bias effect persists for <0.
–With a single C level, there is a dip.
–Prediction and test: higher C level → earlier dip
–Variability in starting point or magnitude of offset can pull initial bias off ceiling.
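The last note can be illustrated with a small sketch. It assumes the same linear leaky integrator as the qualitative analysis (dx = (-leak * x + a * C) dt + noise * dW, x(0) = 0), adopts the sign convention that positive x favors the larger reward, and treats the offset as Gaussian across trials with standard deviation `offset_sd`. These names and values are illustrative assumptions, not the authors' fitted model.

```python
import math


def phi(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def p_choose_high_reward(t, a, C, leak, noise, offset, offset_sd=0.0):
    """P(x(t) + offset > 0) for a leaky integrator starting at x = 0.

    C > 0 means the stimulus agrees with the larger reward; C < 0 means it
    conflicts. Requires t > 0 (or offset_sd > 0) so the variance is nonzero.
    """
    decay = math.exp(-leak * t)
    mean = (a * C / leak) * (1.0 - decay)
    var = noise ** 2 / (2.0 * leak) * (1.0 - math.exp(-2.0 * leak * t))
    var += offset_sd ** 2  # trial-to-trial variability in the offset
    return phi((mean + offset) / math.sqrt(var))
```

With `offset_sd=0.0`, the probability of choosing the larger reward at very short lags is at ceiling, because the accumulated noise is still tiny relative to the offset. Setting `offset_sd` above zero lowers that initial value to roughly `phi(offset / offset_sd)`, which is one reading of how offset variability pulls the initial bias off ceiling.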

Preliminary Conclusion
A qualitative fit seems possible with the one-d model if we treat reward as a constant offset in the decision variable, and either
–the value of the constant varies from trial to trial, or
–there is added starting point variability.
Next step: actually try to fit individual subject data.

A New Phenomenon Discovered by Juan: Inverse-Micro-SAT! (also occurs in Monkey Data)

Consistent with other models?
Ratcliff and colleagues, and also Shadlen and colleagues, argue for ‘integration to a bound’, even in response-signal tasks like this one. Once the bound is reached, the participant enters a discrete decision state.
Our data suggest that the decision variable remains continuous even to the end of the trial.
–Time to respond reflects this continuous state.

High-Threshold Leaky Competing Accumulator Model
Decision variable remains continuous until signal occurs.
Signal provides additional input to the accumulators, driving to high threshold.
[Figure: accumulators x1 and x2 receiving reward bias, stimulus, and response signal inputs; the response is triggered at threshold after the response signal.]
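A toy simulation can make the idea concrete. This is a sketch under strong assumptions, not the authors' model: it uses the generic leaky competing accumulator form (leak, mutual inhibition, activations floored at zero), adds reward bias, stimulus, and response signal inputs together, and checks the threshold only after the signal. Every parameter name and value is hypothetical.

```python
import random


def simulate_lca(stim_inputs, reward_bias, signal_time, signal_input=2.0,
                 threshold=2.0, leak=1.0, inhibition=1.0, noise=0.3,
                 dt=0.01, max_time=3.0, seed=0):
    """Simulate one trial of a two-accumulator leaky competing model.

    stim_inputs, reward_bias: pairs of inputs to accumulators 0 and 1.
    The response signal adds `signal_input` to both accumulators from
    `signal_time` onward; a response is triggered when either accumulator
    reaches `threshold` after the signal.
    Returns (winner_index, time_since_signal), or (None, None) if no
    threshold crossing occurs before max_time.
    """
    rng = random.Random(seed)
    x = [0.0, 0.0]
    sqrt_dt = dt ** 0.5
    for step in range(1, int(round(max_time / dt)) + 1):
        t = step * dt
        signal_on = t >= signal_time
        drive = signal_input if signal_on else 0.0
        new_x = []
        for i in (0, 1):
            total_input = stim_inputs[i] + reward_bias[i] + drive
            dx = (total_input - leak * x[i] - inhibition * x[1 - i]) * dt
            dx += noise * sqrt_dt * rng.gauss(0.0, 1.0)
            new_x.append(max(0.0, x[i] + dx))  # activations cannot go negative
        x = new_x
        # The state stays continuous until the signal; only then can it commit.
        if signal_on and max(x) >= threshold:
            winner = 0 if x[0] >= x[1] else 1
            return winner, t - signal_time
    return None, None
```

In this sketch the time from signal to threshold depends on where the accumulators stand when the signal arrives, which is the sense in which time to respond can reflect a continuous decision state.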

Preliminary Simulations [Figure: inputs are reward bias, stimulus, and response signal.]

Reward, Stimulus and Response Cue All Contribute Input to Accumulators

Three Possible Architectures [Figure: three diagrams, numbered 1 to 3, each combining reward bias, stimulus, and response signal inputs.]