Dynamics of Reward Bias Effects in Perceptual Decision Making. Jay McClelland & Juan Gao. Building on: Newsome and Rorie; Holmes and Feng; Usher and McClelland.

2 Our Questions Can we trace the effect of reward bias on decision making over time? Can we determine what would be the optimal reward effect? Can we determine how well participants do at achieving optimality? Can we uncover the processing mechanisms that lead to the observed patterns of behavior?

3 Overview: Experiment; Results; Optimality analysis; Abstract (one-d) dynamical model; Mechanistic (two-d) dynamical model

4 Human Experiment: Examining Reward Bias Effect at Different Time Points after Target Onset
Reward cue occurs 750 msec before the stimulus.
–Small arrowhead visible for 250 msec.
–Only biased reward conditions (2 vs 1 and 1 vs 2) are considered.
Stimuli are rectangles shifted 1, 3, or 5 pixels left or right of fixation.
Response signal (tone) occurs at 10 different lags (msec): 0, 75, 150, 225, 300, 450, 600, 900, 1200, 2000.
–Participant receives reward if the response occurs within 250 msec of the response signal and is correct.
–Participants were run for 15-25 sessions to provide stable data.
–Data are from later sessions in which the effect of reward appeared to be fairly stable.
[Figure: trial timeline: Reward Cue, Stimulus 750 msec later, Response Signal after a 0-2000 msec lag, then the Response Window]
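The design and reward rule above fit in a few lines. This is a hypothetical encoding, not the authors' experiment code: it assumes that "within 250 msec of the response signal" means the response falls in the 250 msec window that starts at the signal, and the helper names `conditions` and `is_rewarded` are illustrative.

```python
from itertools import product

# Design factors taken from the slide; the encoding itself is hypothetical.
LAGS_MS = [0, 75, 150, 225, 300, 450, 600, 900, 1200, 2000]  # response-signal lags after stimulus onset
SHIFTS_PX = [1, 3, 5]                 # size of the rectangle's shift from fixation
DIRECTIONS = ["left", "right"]        # side of fixation
REWARD_CUES = ["2 vs 1", "1 vs 2"]    # only the biased reward conditions
RESPONSE_WINDOW_MS = 250


def conditions():
    """All combinations of reward cue, shift, direction and response-signal lag."""
    return list(product(REWARD_CUES, SHIFTS_PX, DIRECTIONS, LAGS_MS))


def is_rewarded(response_ms: float, signal_ms: float, correct: bool) -> bool:
    """Reward correct responses in the assumed window [signal, signal + 250 msec]."""
    return correct and 0 <= response_ms - signal_ms <= RESPONSE_WINDOW_MS


print(len(conditions()), "design cells")
print(is_rewarded(response_ms=1050, signal_ms=900, correct=True))
```

If the actual window was centered on the signal rather than following it, only the comparison in `is_rewarded` would change.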

5 Tradeoff: Sequential sampling models are really about the speed-accuracy tradeoff. The Air Force has a strong interest in understanding dynamic decision making under time pressure: learn how the brain does it! (Slide from Zhang 2007)

6 A participant with very little reward bias. The top panel shows the probability of the response giving the larger reward as a function of actual response time, for combinations of stimulus shift (1, 3, 5 pixels) and reward-stimulus compatibility. The lower panel shows the data transformed to z scores, which correspond to the theoretical construct $\frac{\mathrm{mean}(x_1(t)-x_2(t)) + \mathrm{bias}(t)}{\mathrm{sd}(x_1(t)-x_2(t))}$, where $x_1$ represents the state of the accumulator associated with the greater reward, $x_2$ the same for the lesser reward, and S is thought to choose the larger reward if $x_1(t)-x_2(t)+\mathrm{bias}(t) > 0$.
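The lower-panel transformation is a probit: each proportion of larger-reward choices goes through the inverse standard normal CDF. If $x_1(t)-x_2(t)$ is Gaussian, this z score equals the construct above. A minimal sketch using Python's standard library; the clipping constant `eps`, which keeps proportions of 0 or 1 finite, is an assumption and not something the slides specify.

```python
from statistics import NormalDist


def z_score(p_larger: float, eps: float = 1e-3) -> float:
    """Probit transform z = Phi^{-1}(p) of the proportion of larger-reward choices.

    Proportions are clipped to [eps, 1 - eps] (an assumed convention) so that
    observed proportions of 0 or 1 map to finite z scores.
    """
    p = min(max(p_larger, eps), 1.0 - eps)
    return NormalDist().inv_cdf(p)


# If x1(t) - x2(t) is Gaussian, z equals [mean(x1 - x2) + bias] / sd(x1 - x2).
for p in (0.5, 0.69, 0.84, 0.975, 1.0):
    print(f"P(larger) = {p:.3f} -> z = {z_score(p):+.2f}")
```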

7 Participants Showing Reward Bias

8

9 Summary: Initial bias is high and tapers off over time to a fixed low level. Questions:
–Is this reasonable?
–How close to optimal is it?
–Are some subjects more optimal than others?

10 Abstract optimality analysis

11 Assumptions (from Signal Detection Theory): At a given time, the decision variable comes from one of two distributions, with means $-\mu$ and $+\mu$ and the same standard deviation $\sigma$. [‘$-$’ is ‘consistent with high reward’.] Choose the high-reward alternative (H) if $x < X_c$; otherwise choose the low-reward alternative (L). For three difficulty levels, the means are $\pm\mu_i$ ($i = 1, 2, 3$), with shared $\sigma$ and the same choice policy. [Figure: two equal-variance normal densities over $x$ from -10 to 10, with the criterion $X_c$ between them]

12 Optimal Bias ($X_{opt}$). Premises: for each choice $c$, $\text{Expected Reward}_c = \text{Likelihood}_c \times \text{Reward}_c$; choose the alternative with the larger expected reward. Result: $X_{opt}/\sigma = \log(\text{Reward}_H/\text{Reward}_L)/d'$. (This policy maximizes expected reward overall.)
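The result can be checked with a short derivation, assuming equal prior probabilities for the two stimulus classes, natural logarithms, and the standard signal-detection definition $d' = 2\mu/\sigma$ (the distance between the two means in units of $\sigma$). Writing $R_H$ and $R_L$ for the two rewards and $\phi$ for the standard normal density, "choose the alternative with the larger expected reward" gives:

```latex
\begin{aligned}
\text{choose H} &\iff R_H\,\phi\!\left(\tfrac{x+\mu}{\sigma}\right) > R_L\,\phi\!\left(\tfrac{x-\mu}{\sigma}\right) \\
&\iff \ln\frac{R_H}{R_L} - \frac{(x+\mu)^2 - (x-\mu)^2}{2\sigma^2} > 0 \\
&\iff \ln\frac{R_H}{R_L} - \frac{2\mu x}{\sigma^2} > 0 \\
&\iff x < X_{opt} = \frac{\sigma^2}{2\mu}\ln\frac{R_H}{R_L},
\qquad\text{so}\qquad \frac{X_{opt}}{\sigma} = \frac{\ln(R_H/R_L)}{d'} .
\end{aligned}
```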

13 As $d'$ increases, $X_{opt}$ decreases. [Figure: $X_{opt}$ as $d'$ increases from $d' = 0$, where $X_{opt} = \infty$]
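Plugging numbers into the slide-12 formula makes the trend concrete. The sketch assumes the experiment's 2 vs 1 reward ratio and natural logarithms; `optimal_criterion` is a hypothetical helper name.

```python
import math


def optimal_criterion(reward_high: float, reward_low: float, d_prime: float) -> float:
    """Normalized optimal criterion X_opt / sigma = ln(R_H / R_L) / d'.

    Assumes reward_high > reward_low, so the criterion is infinite when d' = 0
    (with no stimulus information, always choose the high-reward alternative).
    """
    if d_prime <= 0.0:
        return math.inf
    return math.log(reward_high / reward_low) / d_prime


for d in (0.0, 0.5, 1.0, 2.0, 4.0):
    print(f"d' = {d:.1f} -> X_opt/sigma = {optimal_criterion(2.0, 1.0, d):.3f}")
```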

14 Estimating normalized $\mu$ and $X_c$ values from data at each signal lag, one difficulty level. [Figure: the two-distribution model with criterion $X_c$, in standard normal deviates ($\sigma = 1$)]
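With $\sigma = 1$, the slide-11 model implies $z[P(H \mid \text{H-consistent})] = X_c + \mu$ and $z[P(H \mid \text{L-consistent})] = X_c - \mu$, so both quantities follow from the two z scores at each lag. A minimal sketch under those assumptions; the function name is hypothetical.

```python
from statistics import NormalDist
from typing import Tuple

_STD = NormalDist()  # sigma = 1: work in standard normal deviates


def estimate_mu_xc(p_h_given_h: float, p_h_given_l: float) -> Tuple[float, float]:
    """Return (mu, X_c) in units of sigma for one difficulty level at one lag.

    Assumed model: H-consistent trials x ~ N(-mu, 1), L-consistent trials x ~ N(+mu, 1),
    choose H iff x < X_c.  Hence z(P(H | H-consistent)) = X_c + mu and
    z(P(H | L-consistent)) = X_c - mu.
    """
    z_h = _STD.inv_cdf(p_h_given_h)
    z_l = _STD.inv_cdf(p_h_given_l)
    return (z_h - z_l) / 2.0, (z_h + z_l) / 2.0


mu, x_c = estimate_mu_xc(0.90, 0.40)
print(f"mu = {mu:.3f}, X_c = {x_c:.3f}, d' = {2 * mu:.3f}")
```

Proportions of exactly 0 or 1 must be clipped before calling this, since their z scores are infinite and `inv_cdf` rejects them.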

15 Estimating normalized $\mu$ and $X_c$ values from data at each signal lag with multiple difficulty levels
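With several difficulty levels the model has one $\mu_i$ per level but a single shared $X_c$, so there are more z scores than parameters. One simple option, assumed here because the slides do not say which estimator was used, is least squares on the z scores, which has a closed form: $\mu_i = (z_i^H - z_i^L)/2$, and $X_c$ is the average of $(z_i^H + z_i^L)/2$ over levels.

```python
from statistics import NormalDist
from typing import List, Sequence, Tuple

_STD = NormalDist()  # sigma = 1


def estimate_multi(p_pairs: Sequence[Tuple[float, float]]) -> Tuple[List[float], float]:
    """Least-squares estimates of mu_i (one per difficulty level) and a shared X_c.

    p_pairs[i] = (P(H | H-consistent), P(H | L-consistent)) for level i at one lag.
    Model: z_i^H = X_c + mu_i and z_i^L = X_c - mu_i.  Minimizing the summed squared
    error gives mu_i = (z_i^H - z_i^L) / 2 and X_c = mean_i (z_i^H + z_i^L) / 2.
    """
    zs = [(_STD.inv_cdf(ph), _STD.inv_cdf(pl)) for ph, pl in p_pairs]
    mus = [(zh - zl) / 2.0 for zh, zl in zs]
    x_c = sum((zh + zl) / 2.0 for zh, zl in zs) / len(zs)
    return mus, x_c


# Hypothetical proportions for shifts of 1, 3 and 5 pixels at one lag.
mus, x_c = estimate_multi([(0.70, 0.50), (0.85, 0.30), (0.95, 0.12)])
print([round(m, 3) for m in mus], round(x_c, 3))
```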

16 Optimal Bias with Multiple Difficulties [Figure: actual criterion vs. optimal criterion]
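To compare the actual criterion with the optimal one across several difficulty levels, the expected reward can be maximized numerically over a single shared $X_c$. The sketch assumes $\sigma = 1$, equally likely difficulty levels and stimulus directions, reward only for correct choices, and a 2 vs 1 reward ratio; it uses a plain grid search, not any particular optimizer the authors may have used. With a single level it should recover the closed form $\ln(R_H/R_L)/(2\mu)$.

```python
import math
from statistics import NormalDist
from typing import Sequence

_STD = NormalDist()  # sigma = 1


def expected_reward(x_c: float, mus: Sequence[float],
                    r_high: float = 2.0, r_low: float = 1.0) -> float:
    """Mean reward per trial for criterion x_c (choose H iff x < x_c).

    Assumes equally likely difficulty levels and stimulus directions,
    and reward only for correct choices.
    """
    total = 0.0
    for mu in mus:
        # H-consistent trial, x ~ N(-mu, 1): correct (rewarded r_high) if x < x_c.
        total += 0.5 * r_high * _STD.cdf(x_c + mu)
        # L-consistent trial, x ~ N(+mu, 1): correct (rewarded r_low) if x >= x_c.
        total += 0.5 * r_low * (1.0 - _STD.cdf(x_c - mu))
    return total / len(mus)


def optimal_shared_criterion(mus: Sequence[float], r_high: float = 2.0, r_low: float = 1.0,
                             lo: float = -5.0, hi: float = 5.0, steps: int = 20001) -> float:
    """Grid search for the single X_c that maximizes expected reward across levels."""
    grid = (lo + (hi - lo) * i / (steps - 1) for i in range(steps))
    return max(grid, key=lambda x: expected_reward(x, mus, r_high, r_low))


# Single level: should match ln(R_H / R_L) / (2 mu) = ln 2 / 2 for mu = 1.
print(optimal_shared_criterion([1.0]), math.log(2.0) / 2.0)
# Three hypothetical levels (mu in sigma units).
print(optimal_shared_criterion([0.25, 0.75, 1.25]))
```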

17 Empirical Characterization of Time-course of Change in Sensitivity ($d'$): the subject's sensitivity ($d'$, as defined in the theory of signal detectability) when the response signal delay varies. For each subject, the time course is fit with a function of the delay.
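The slide's fitting function is not recoverable from this transcript. As a stand-in, the sketch assumes the exponential approach to asymptote that the later leaky-integrator slide associates with leakage, $d'(t) = A(1 - e^{-(t - T_0)/\tau})$ for $t > T_0$ and 0 otherwise, and fits it by grid search over $T_0$ and $\tau$ with the asymptote $A$ solved in closed form. The functional form, the grids, and all names are assumptions.

```python
import math
from typing import Sequence, Tuple


def dprime_model(t: float, a: float, t0: float, tau: float) -> float:
    """Assumed form: d'(t) = a * (1 - exp(-(t - t0) / tau)) for t > t0, else 0."""
    return a * (1.0 - math.exp(-(t - t0) / tau)) if t > t0 else 0.0


def fit_dprime(times: Sequence[float], dprimes: Sequence[float],
               t0_grid: Sequence[float], tau_grid: Sequence[float]) -> Tuple[float, float, float]:
    """Least-squares fit of (a, t0, tau): grid over t0 and tau, closed-form a."""
    best = None
    for t0 in t0_grid:
        for tau in tau_grid:
            g = [dprime_model(t, 1.0, t0, tau) for t in times]
            gg = sum(v * v for v in g)
            if gg == 0.0:
                continue  # model predicts d' = 0 everywhere; a is undetermined
            a = sum(v * d for v, d in zip(g, dprimes)) / gg
            sse = sum((a * v - d) ** 2 for v, d in zip(g, dprimes))
            if best is None or sse < best[0]:
                best = (sse, a, t0, tau)
    if best is None:
        raise ValueError("no grid point produced a non-zero model prediction")
    _, a, t0, tau = best
    return a, t0, tau


lags_s = [0.0, 0.075, 0.15, 0.225, 0.3, 0.45, 0.6, 0.9, 1.2, 2.0]
t0_grid = [i / 100 for i in range(51)]       # 0.00 .. 0.50 s
tau_grid = [i / 100 for i in range(5, 101)]  # 0.05 .. 1.00 s
# Synthetic d' values generated from the model itself (not participant data).
synthetic = [dprime_model(t, 2.5, 0.2, 0.3) for t in lags_s]
print(fit_dprime(lags_s, synthetic, t0_grid, tau_grid))
```

Here the response-signal lag is used as the time variable; an analysis could equally use actual processing time, which would only change `lags_s`.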

18 Subject Sensitivity

19

20 Actual vs. Optimal Bias for Three S's: Except for sl, all participants start with a high bias and then level off, approximately conforming to the optimal bias. All participants are under-biased at short lags. At longer lags, some are under-biased, some are over-biased, and some are approximately optimal.

21 Our Questions Can we trace the effect of reward bias on decision making over time? Can we determine what would be the optimal reward effect? Can we determine how well participants do at achieving optimality?  Can we uncover the processing mechanisms that lead to the observed patterns of behavior?

22 Two Paths…
Qualitative analysis with a one-dimensional decision variable (following Holmes and Feng), asking:
–How should reward bias be represented? Possible answers:
–Offset in initial conditions?
–An additional term in the input to the decision variable?
–A time-varying offset that optimizes reward?
–A fixed offset in the value of the decision variable?
Inverse-Micro-Speed-Accuracy Tradeoff (discovered by Juan)
Steps toward a Leaky Competing Accumulator model that addresses this and other aspects of the data.
[Figure: time line of the reward bias, stimulus, and response signal inputs]

23 Qualitative Dynamical Analysis: Based on a one-dimensional leaky integrator model.
Input $I = aC$, where $C$ is chosen from $\{-5, -3, -1, 1, 3, 5\}$.
Initial condition: $x = 0$.
Choose left if $x > 0$ when the response signal is detected; otherwise choose right.
Accuracy approximates an exponential approach to asymptote because of leakage.
How is the reward implemented?
–Offset in initial conditions?
–An additional term in the input to the decision variable?
–A time-varying offset that optimizes reward?
–A fixed offset in the value of the decision variable?
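For a one-dimensional leaky integrator the state at time $t$ is Gaussian, so accuracy can be computed without simulation. The sketch assumes the Ornstein-Uhlenbeck form $dx = (I - kx)\,dt + c\,dW$ with $x(0) = 0$, leak $k > 0$ and noise amplitude $c$; the parameter values and the gain `a` are hypothetical. The mean is $(I/k)(1 - e^{-kt})$ and the variance is $\frac{c^2}{2k}(1 - e^{-2kt})$, so accuracy rises quickly and saturates at $\Phi(|I|\sqrt{2/k}/c)$.

```python
import math
from statistics import NormalDist


def p_correct(t: float, drift: float, leak: float = 2.0, noise: float = 1.0) -> float:
    """P(sign of x(t) matches the input) for dx = (drift - leak * x) dt + noise dW, x(0) = 0.

    x(t) is Gaussian with mean (drift / leak) * (1 - e^{-leak t}) and
    variance noise^2 / (2 leak) * (1 - e^{-2 leak t}).
    """
    if t <= 0.0 or drift == 0.0:
        return 0.5
    decay = math.exp(-leak * t)
    mean = drift / leak * (1.0 - decay)
    sd = math.sqrt(noise ** 2 / (2.0 * leak) * (1.0 - decay ** 2))
    return NormalDist().cdf(abs(mean) / sd)


a = 0.2  # hypothetical input gain, so I = a * C
for c in (1, 3, 5):
    print(c, [round(p_correct(t, a * c), 3) for t in (0.1, 0.3, 0.6, 1.2, 2.0)])
```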

24 Offset in Initial Conditions Note: Effect of bias decays away as t increases.
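Under the same assumed Ornstein-Uhlenbeck dynamics, a reward bias implemented as a starting offset $x(0) = x_0 > 0$ toward the larger-reward choice adds $x_0 e^{-kt}$ to the mean, which is why its effect decays. In the sketch below the parameter values are hypothetical and the sign of `drift` codes whether the stimulus favors the larger reward; the probability of choosing the larger reward starts at ceiling and relaxes to the unbiased value.

```python
import math
from statistics import NormalDist


def p_larger_start_offset(t: float, drift: float, x0: float,
                          leak: float = 2.0, noise: float = 1.0) -> float:
    """P(choose larger reward) when reward bias is a starting offset x(0) = x0.

    Choose larger reward iff x(t) > 0.  drift > 0 means the stimulus favors the
    larger-reward alternative, drift < 0 the smaller one.  x(t) is Gaussian with
    mean x0 e^{-leak t} + (drift / leak)(1 - e^{-leak t}) and
    variance noise^2 / (2 leak) * (1 - e^{-2 leak t}).
    """
    decay = math.exp(-leak * t)
    mean = x0 * decay + drift / leak * (1.0 - decay)
    var = noise ** 2 / (2.0 * leak) * (1.0 - decay ** 2)
    if var == 0.0:  # t = 0: the state is exactly x0
        return 1.0 if mean > 0.0 else (0.5 if mean == 0.0 else 0.0)
    return NormalDist().cdf(mean / math.sqrt(var))


# Stimulus favoring the smaller reward: the offset's influence fades with time.
for t in (0.0, 0.1, 0.3, 1.0, 3.0):
    biased = p_larger_start_offset(t, drift=-0.4, x0=0.5)
    unbiased = p_larger_start_offset(t, drift=-0.4, x0=0.0)
    print(f"t = {t:.1f}s  biased {biased:.3f}  unbiased {unbiased:.3f}")
```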

25 Reward as a term in the input
Reward signal comes at –  ; processing starts at that time.
For t <  : input = b. For t > , input = b + aC.
Notes:
1. Effect of the bias persists.
2. But bias is sub-optimal initially, and there is no ‘dip’.
3. Initially high bias and a dip occur if  starts low and increases at stimulus onset.
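A sketch of this variant under the same assumed Ornstein-Uhlenbeck dynamics: the integrator starts at the reward cue, receives input $b$ alone for a pre-stimulus interval (assumed here to be the 750 msec cue-to-stimulus interval of the experiment), and $b + aC$ afterwards. Because $b$ keeps driving the state, the mean approaches $(b + aC)/k$ and the bias persists; by stimulus onset the state is already noisy, so the bias is not at ceiling. Names and parameter values are hypothetical.

```python
import math
from statistics import NormalDist


def p_larger_input_bias(t: float, drift: float, b: float, t_pre: float = 0.75,
                        leak: float = 2.0, noise: float = 1.0) -> float:
    """P(choose larger reward) at time t >= 0 after stimulus onset, reward bias as input b.

    The integrator starts at x = 0 at the reward cue, t_pre seconds before the stimulus;
    its input is b before stimulus onset and b + drift afterwards (drift = aC, positive
    when the stimulus favors the larger reward).  Choose larger reward iff x(t) > 0.
    """
    elapsed = t + t_pre  # time since processing started
    mean = (b / leak * (1.0 - math.exp(-leak * elapsed))
            + drift / leak * (1.0 - math.exp(-leak * t)))
    var = noise ** 2 / (2.0 * leak) * (1.0 - math.exp(-2.0 * leak * elapsed))
    return NormalDist().cdf(mean / math.sqrt(var))


for t in (0.0, 0.2, 0.5, 1.0, 2.0):
    print(t, round(p_larger_input_bias(t, drift=-0.4, b=0.3), 3),
          round(p_larger_input_bias(t, drift=0.4, b=0.3), 3))
```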

26 Time-varying term that optimizes rewards (no free parameter for reward bias). The expression for b(t) is for a single difficulty level. The bias is equivalent to a time-varying criterion $= -b(t)$. There is a dip at …. No analytic expression is available for multiple difficulty levels, but numerical simulation is possible. [Figure: P of choice toward larger reward vs. Time (s), 0 to 2.5 s, for RSC 1 and RSC 0 (reward-stimulus compatibility) at diff 5, 3, and 1]
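The slide's expression for $b(t)$ is not recoverable here, but for a single difficulty level an optimal bias can be computed by applying the slide-12 rule at each moment. Under the same assumed Ornstein-Uhlenbeck dynamics, with mean $\pm m(t)$ and standard deviation $s(t)$, choosing the larger reward when $x + b(t) > 0$ maximizes expected reward for $b(t) = s(t)^2 \ln(R_H/R_L) / (2m(t))$. This is our derivation, not necessarily the authors' expression; parameter values are hypothetical.

```python
import math
from statistics import NormalDist


def optimal_bias(t: float, drift: float, r_high: float = 2.0, r_low: float = 1.0,
                 leak: float = 2.0, noise: float = 1.0):
    """Return (b, m, s): reward-optimal bias b(t), mean magnitude m(t) and sd s(t).

    Single difficulty level, t > 0, drift != 0.  x(t) ~ N(+/-m, s^2) with
    m = |drift| / leak * (1 - e^{-leak t}) and s^2 = noise^2 / (2 leak) * (1 - e^{-2 leak t}).
    Choosing the larger reward iff x + b > 0 maximizes expected reward for
    b = s^2 ln(R_H / R_L) / (2 m), i.e. b / s = ln(R_H / R_L) / d' with d' = 2 m / s.
    """
    decay = math.exp(-leak * t)
    m = abs(drift) / leak * (1.0 - decay)
    var = noise ** 2 / (2.0 * leak) * (1.0 - decay ** 2)
    return var * math.log(r_high / r_low) / (2.0 * m), m, math.sqrt(var)


def p_larger_optimal(t: float, drift: float, compatible: bool, **kwargs) -> float:
    """P(choose larger reward) under the optimal bias; compatible = stimulus favors it."""
    b, m, s = optimal_bias(t, drift, **kwargs)
    mean = m if compatible else -m
    return NormalDist().cdf((mean + b) / s)


for t in (0.01, 0.1, 0.3, 0.6, 1.2, 2.4):
    print(f"t = {t:.2f}s  compatible {p_larger_optimal(t, 0.6, True):.3f}  "
          f"incompatible {p_larger_optimal(t, 0.6, False):.3f}")
```

At very short times $b(t)/s(t)$ grows without bound, so both curves start near 1, matching the initially high bias; for several difficulty levels the optimal $b(t)$ would instead have to be found numerically.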

27 Reward as a constant offset in the decision variable Notes: Equivalent to setting criterion at –  0 Bias effect persists for <0. With a single C level, a dip at Prediction and test: higher C level  earlier dip Variability in starting point or magnitude of offset can pull initial bias off ceiling.
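Under the same assumed dynamics, a constant offset $\beta$ added to the decision variable, with $\beta$ drawn independently on each trial from $N(\beta_0, s_\beta^2)$, keeps $x(t) + \beta$ Gaussian with mean $m(t) + \beta_0$ and variance $s(t)^2 + s_\beta^2$. With $s_\beta = 0$ the choice probability starts at ceiling; trial-to-trial variability pulls it down to $\Phi(\beta_0 / s_\beta)$ at $t = 0$. A sketch with hypothetical names and values:

```python
import math
from statistics import NormalDist


def p_larger_constant_offset(t: float, drift: float, beta_mean: float, beta_sd: float = 0.0,
                             leak: float = 2.0, noise: float = 1.0) -> float:
    """P(choose larger reward) when a per-trial offset beta is added to x(t).

    Choose larger reward iff x(t) + beta > 0, with beta ~ N(beta_mean, beta_sd^2)
    drawn independently of the noise, so x(t) + beta is Gaussian with mean
    m(t) + beta_mean and variance s(t)^2 + beta_sd^2.
    """
    decay = math.exp(-leak * t)
    mean = drift / leak * (1.0 - decay) + beta_mean
    var = noise ** 2 / (2.0 * leak) * (1.0 - decay ** 2) + beta_sd ** 2
    if var == 0.0:  # t = 0 with a fixed offset: the choice is deterministic
        return 1.0 if mean > 0.0 else (0.5 if mean == 0.0 else 0.0)
    return NormalDist().cdf(mean / math.sqrt(var))


for t in (0.0, 0.1, 0.3, 1.0, 2.0):
    fixed = p_larger_constant_offset(t, drift=-0.4, beta_mean=0.3)
    variable = p_larger_constant_offset(t, drift=-0.4, beta_mean=0.3, beta_sd=0.3)
    print(f"t = {t:.1f}s  fixed offset {fixed:.3f}  variable offset {variable:.3f}")
```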

28 Preliminary Conclusion: A qualitative fit seems possible with the one-d model if
–we treat reward as a constant offset in the decision variable, and
–the value of the constant varies from trial to trial,
–or there is added starting-point variability.
Next step: actually try to fit individual subject data.

29 A New Phenomenon Discovered by Juan: Inverse-Micro-SAT! (also occurs in Monkey Data)

30 Consistent with other models? Ratcliff and colleagues, and also Shadlen and colleagues, argue for ‘integration to a bound’, even in response-signal tasks like this one. Once the bound is reached, the participant enters a discrete decision state. Our data suggest that the decision variable remains continuous even to the end of the trial. –Time to respond reflects this continuous state.

31 High-Threshold Leaky Competing Accumulator Model
Decision variable remains continuous until the signal occurs.
Signal provides additional input to the accumulators, driving them to a high threshold.
[Figure: accumulators $x_1$ and $x_2$; the response signal drives them toward the threshold, and reaching it triggers the response; inputs: reward bias, stimulus, response signal]
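A minimal simulation sketch of the high-threshold idea, loosely following the leaky competing accumulator of Usher and McClelland: two non-negative accumulators with leak and mutual inhibition, a reward input to the larger-reward unit from the cue onward, stimulus input from stimulus onset, and no threshold until the response signal, after which both units receive extra input and the first to reach threshold determines the choice. All parameter values, the 750 msec reward lead, the Euler step, and the function name are illustrative assumptions, not fitted values.

```python
import math
import random
from typing import Optional, Tuple


def simulate_trial(stim_input: float, signal_time: float, reward_input: float = 0.3,
                   signal_input: float = 3.0, leak: float = 1.0, inhibition: float = 2.0,
                   noise: float = 0.3, threshold: float = 1.0, dt: float = 0.001,
                   reward_lead: float = 0.75, max_wait: float = 2.0,
                   rng: Optional[random.Random] = None) -> Tuple[int, float]:
    """One trial of a two-unit leaky competing accumulator with a signal-triggered threshold.

    Unit 0 is associated with the larger reward, unit 1 with the smaller reward.
    Time 0 is stimulus onset; reward input to unit 0 starts at -reward_lead.
    stim_input > 0 drives unit 0, stim_input < 0 drives unit 1.
    No threshold applies before the response signal; from the signal on, both units
    get signal_input and the first to reach threshold gives the choice.
    Returns (choice, time from signal to response), choice 1 = larger reward, 2 = smaller.
    """
    rng = rng or random.Random()
    x = [0.0, 0.0]
    t = -reward_lead
    sqrt_dt = math.sqrt(dt)
    while True:
        inputs = [reward_input, 0.0]
        if t >= 0.0:  # stimulus is on
            inputs[0] += max(stim_input, 0.0)
            inputs[1] += max(-stim_input, 0.0)
        signal_on = t >= signal_time
        if signal_on:
            inputs[0] += signal_input
            inputs[1] += signal_input
        # Euler-Maruyama step with leak, mutual inhibition and non-negativity.
        x = [max(0.0, x[i] + (inputs[i] - leak * x[i] - inhibition * x[1 - i]) * dt
                 + noise * sqrt_dt * rng.gauss(0.0, 1.0))
             for i in (0, 1)]
        t += dt
        if signal_on:
            waited = t - signal_time
            if max(x) >= threshold or waited >= max_wait:
                return (1 if x[0] >= x[1] else 2), waited


rng = random.Random(0)
for lag in (0.0, 0.3, 0.9):
    trials = [simulate_trial(0.2, lag, rng=rng) for _ in range(50)]
    p_larger = sum(c == 1 for c, _ in trials) / len(trials)
    mean_rt = sum(rt for _, rt in trials) / len(trials)
    print(f"lag {lag:.1f}s  P(larger) {p_larger:.2f}  mean time after signal {mean_rt:.3f}s")
```

Because the response time after the signal depends on how far apart the accumulators already are, a model of this kind gives one way to examine how the continuous state at signal time shapes both choice and time to respond.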

32 Preliminary Simulations [Figure: simulations with reward bias, stimulus, and response signal inputs]

33 Reward, Stimulus and Response Cue All Contribute Input to Accumulators

34 Three Possible Architectures [Figure: three candidate architectures (1, 2, 3), each combining reward bias, stimulus, and response signal inputs]

