Dynamics of Reward Bias Effects in Perceptual Decision Making
Jay McClelland & Juan Gao
Building on: Newsome and Rorie; Holmes and Feng; Usher and McClelland
Our Questions
- Can we trace the effect of reward bias on decision making over time?
- Can we determine what the optimal policy would be, and what constraints there are on this policy?
- Can we determine how well participants do at achieving optimality?
- Can we uncover the processing mechanisms that lead to the observed patterns of behavior?
Overview
- Experiment
- Results
- Optimality analysis
- Abstract dynamical model
- Mechanistic dynamical model
Human Experiment Examining Reward Bias Effect at Different Time Points after Target Onset
- Stimuli are rectangles shifted 1, 3, or 5 pixels left or right of fixation.
- The reward cue occurs 750 msec before the stimulus; a small arrowhead is visible for 250 msec.
- Only biased reward conditions (2 vs 1 and 1 vs 2) are considered.
- The response signal occurs at one of these times after stimulus onset: 0, 75, 150, 225, 300, 450, 600, 900, 1200, or 2000 msec.
- The participant receives the reward (one or two points) if the response occurs within 250 msec of the response signal and is correct.
- Participants were run for 15-25 sessions to provide stable data. Data shown are from later sessions, in which the biasing effect of reward appeared to be fairly stable.
A participant with very little reward bias
- Top panel shows the probability of the response giving the larger reward, as a function of actual response time, for combinations of:
  - Stimulus shift (1, 3, or 5 pixels)
  - Reward-stimulus compatibility
- Lower panel shows the data transformed to z scores, corresponding to the theoretical quantity
  z(t) = [mean(x1(t) - x2(t)) + bias(t)] / sd(x1(t) - x2(t)),
  where x1 represents the state of the accumulator associated with the larger reward, x2 the same for the smaller reward, and the subject is assumed to choose the larger reward if x1(t) - x2(t) + bias(t) > 0.
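The z transform in the lower panel can be sketched as follows (a minimal illustration using the inverse normal CDF; the probability values below are made up, not the participant's data):

```python
from statistics import NormalDist

def z_transform(p_large):
    """Convert P(choose larger-reward response) to z scores via the
    inverse normal CDF.  Under the equal-variance Gaussian assumption
    this estimates [mean(x1 - x2) + bias] / sd(x1 - x2)."""
    nd = NormalDist()
    eps = 1e-6  # clip to avoid infinities at p = 0 or 1
    return [nd.inv_cdf(min(max(p, eps), 1 - eps)) for p in p_large]

# hypothetical choice probabilities at increasing response-signal delays
probs = [0.90, 0.80, 0.72, 0.68]
print(z_transform(probs))
```

A probability of 0.5 maps to z = 0 (no net evidence plus bias), and probabilities above 0.5 map to positive z, so the transformed curves can be read directly as the signed, noise-scaled state of the decision variable.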
Participants Showing Reward Bias
Abstract optimality analysis
Assumptions
- At a given time, the decision variable follows one of two distributions, with means +mu and -mu and the same SD sigma.
- Choice rule: the response depends on whether x falls above or below a criterion X_c.
- For three difficulty levels: same SD sigma, means mu_i (i = 1, 2, 3), same X_c.
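Under these assumptions the reward-optimal criterion has a closed form (a sketch, not the authors' fitted model; the 2-vs-1 reward values come from the experiment, the mu and sigma values are illustrative):

```python
import math

def optimal_criterion(mu, sigma, r_large=2.0, r_small=1.0):
    """Optimal criterion X_c for two equal-variance Gaussians with
    means +mu (larger-reward side) and -mu, equal priors.
    Choose the larger-reward response when x > X_c.
    From the likelihood-ratio rule, choose + when
        exp(2 * mu * x / sigma**2) > r_small / r_large,
    so X_c = (sigma**2 / (2 * mu)) * log(r_small / r_large)."""
    return (sigma**2 / (2 * mu)) * math.log(r_small / r_large)

# example: mu = 0.5, sigma = 1 -> X_c is negative, i.e. the criterion
# shifts toward the smaller-reward side, biasing choices toward the
# larger reward
print(optimal_criterion(0.5, 1.0))
```

With equal rewards the optimal criterion is 0; as sensitivity (mu/sigma) grows, the optimal shift shrinks toward 0, which is why the optimal bias is largest when the stimulus is weak or processing has just begun.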
Subject's sensitivity (a definition from the theory of signal detectability)
[Panels: only one difficulty level; three difficulty levels; sensitivity when response-signal delay varies]
For each subject, sensitivity as a function of response-signal delay is fit with a function (expression not recovered).
Subject Sensitivity
Real “bias” vs. optimal “bias”
Dynamical analysis
- Based on a one-dimensional leaky integrator model.
- Initial condition: x = 0.
- Choose left if x > 0 when the response signal is detected; otherwise choose right.
- Accuracy approximates an exponential approach to asymptote because of leakage.
How is the reward bias implemented?
1. A time-varying offset that optimizes reward?
2. An offset in the initial conditions?
3. An additional term in the input to the decision variable?
4. A fixed offset in the value of the decision variable?
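The leaky integrator itself can be simulated directly (a minimal Euler–Maruyama sketch without any reward bias term; lambda, drift, and noise values are illustrative, not fitted parameters):

```python
import math
import random

def simulate_choice_prob(lam=-4.0, drift=1.0, noise_sd=1.0,
                         t_signal=0.5, dt=0.001, n_trials=1000, seed=0):
    """Fraction of trials on which x > 0 at the response signal, for
        dx = (lam * x + drift) dt + noise_sd * sqrt(dt) * N(0, 1)
    starting from x = 0.  With lam < 0 (leakage), the mean and variance
    of x, and hence accuracy, approach their asymptotes roughly
    exponentially with time constant 1/|lam|."""
    rng = random.Random(seed)
    n_steps = int(t_signal / dt)
    correct = 0
    for _ in range(n_trials):
        x = 0.0
        for _ in range(n_steps):
            x += (lam * x + drift) * dt + noise_sd * math.sqrt(dt) * rng.gauss(0, 1)
        if x > 0:
            correct += 1
    return correct / n_trials

print(simulate_choice_prob())            # stronger stimulus
print(simulate_choice_prob(drift=0.2))   # weaker stimulus, lower accuracy
```

Because leakage bounds the variance as well as the mean, accuracy saturates rather than growing indefinitely, which is the "exponential approach to asymptote" referred to above.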
1. Time-varying term that optimizes rewards (no free parameter for reward bias)
[Figure: P(choice toward larger reward) vs. time (s), for RSC 1 and RSC 0 at difficulty levels 5, 3, and 1]
Notes:
- Equivalent to a time-varying criterion = -b(t).
- There is a dip at a predictable time (expression not recovered).
- Prediction and test: higher C level, earlier dip.
- For multiple C levels, there are no analytical expressions.
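The shape of the optimal time-varying bias can be sketched under a common assumption (the exponential sensitivity growth d'(t) = d'_max * (1 - exp(-t/tau)) and its parameter values are assumptions for illustration, not the fitted values): with equal priors and a 2:1 reward ratio, the reward-optimal criterion in d' units is -ln(2)/d'(t), so the bias is effectively unbounded at stimulus onset (always choose the larger reward) and decays as sensitivity grows.

```python
import math

def optimal_bias(t, dprime_max=2.5, tau=0.3, reward_ratio=2.0):
    """Reward-optimal criterion shift (in d' units) at time t, assuming
    sensitivity grows as d'(t) = dprime_max * (1 - exp(-t / tau)).
    Negative values shift the criterion toward the smaller-reward side,
    favoring the larger-reward response."""
    dprime = dprime_max * (1.0 - math.exp(-t / tau))
    if dprime <= 0:
        return -math.inf  # before any evidence, just take the larger reward
    return -math.log(reward_ratio) / dprime

for t in (0.0, 0.15, 0.3, 0.6, 1.2):
    print(t, optimal_bias(t))
```

This captures the qualitative pattern in the data: complete reward bias at the earliest response-signal delays, shrinking toward a small residual bias as the stimulus is processed.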
2. Offset in initial conditions
Notes:
- The effect of the bias decays away for lambda < 0.
- For a single C level, there is a dip at a predictable time (expression not recovered).
- Prediction and test: higher C level, earlier dip.
3. Reward as a term in the input
- The reward signal arrives tau seconds before the stimulus.
- For t < 0: input = b; noise sd = s.
- For t > 0: input = b + aC; noise continues as before.
Notes:
- The effect of the bias persists, but the bias is sub-optimal initially, and there is no dip.
- Theoretically, the dip would happen at (1/lambda) * log((aC - bk) / (aC k^2 - b k^2)), where k = exp(lambda * tau); the t calculated is negative, so no dip occurs. (An annotation notes that a factor of 2 was omitted in the slide's formula.)
4. Reward as a constant offset in the decision variable
Notes:
- Equivalent to setting the criterion at -m0.
- The effect persists for lambda < 0.
- For a single C level, there is a dip at a predictable time (expression not recovered).
- Prediction and test: higher C level, earlier dip.
5. Reward as a term in the input, creating variability at stimulus onset
- The reward signal arrives tau seconds before the stimulus.
- For t < 0: input = b; noise sd = s_b.
- For t > 0: input = b + aC; noise sd = s_b + s.
Notes:
- The effect of the bias persists. If s_b = 0, there is no dip.
- Prediction and test: given small s_b, a longer reward period produces a later and shallower dip.
- Theoretically, the dip would happen at (1/lambda) * log((aC - bk) / (aC k^2 - b k^2)), where k = exp(lambda * tau); the t calculated is negative. (An annotation notes that a factor of 2 was omitted in the slide's formula.)
Leaky Competing Integrator Model
Inputs for:
- reward
- stimulus
- response signal
High threshold for