Seeing Patterns in Randomness: Irrational Superstition or Adaptive Behavior? Angela J. Yu University of California, San Diego March 9, 2010
“Irrational” Probabilistic Reasoning in Humans … Random stimulus sequence: 1 2 “hot hand” 2AFC: sequential effects (rep/alt) (Gilovich, Vallone, & Tversky, 1985) (Soetens, Boer, & Hueting, 1985) (Wilke & Barrett, 2009)
“Superstitious” Predictions. Subjects are “superstitious” when viewing randomized stimuli: O o o o o o O O o O o O O… [Figure: RT (slow to fast) across trials, with runs of repetitions and alternations marked.] Subjects are slower & more error-prone when the local pattern is violated. Patterns arise by chance and are not predictive of the next stimulus. Such “superstitious” behavior is apparently sub-optimal.
“Graded” Superstition (Cho et al, 2002) (Soetens et al, 1985). Example: RARR = [o o O O O] or [O O o o o]. [Figure: RT and ER as a function of the R/A pattern over trials t-3, t-2, t-1, t.] Hypothesis: Sequential adjustments may be adaptive for changing environments.
Outline: (1) “Ideal predictor” in a fixed vs. changing world; (2) Exponential forgetting: normative and descriptive; (3) Optimal Bayes or exponential filter? (4) Neural implementation of prediction/learning
I. Fixed Belief Model (FBM). Stimuli: A (0), R (1). [Graphical model: a single hidden bias generates all observed stimuli; the next stimulus (?) is to be predicted.]
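As a minimal sketch of what an FBM subject computes, assume the hidden bias is the probability of a repetition and carries a Beta prior. The slide does not state the prior, so the uniform Beta(1, 1) default and the names `fbm_predictions`, `a`, and `b` are illustrative assumptions. With A coded 0 and R coded 1 as on the slide, the predicted probability of a repetition is the posterior mean of the bias:

```python
def fbm_predictions(xs, a=1.0, b=1.0):
    """Predictive P(x_t = 1 | x_1..x_{t-1}) under a fixed-bias Beta-Bernoulli model.

    xs: sequence of 0 (alternation) / 1 (repetition).
    a, b: Beta prior pseudo-counts (assumed; uniform prior by default).
    """
    preds = []
    n_rep = 0  # number of repetitions seen so far
    for t, x in enumerate(xs):
        # Posterior after t observations is Beta(a + n_rep, b + t - n_rep);
        # its mean is the prediction for the next trial.
        preds.append((a + n_rep) / (a + b + t))
        n_rep += x
    return preds


seq = [1, 1, 0, 1, 1, 1]  # R R A R R R
preds = fbm_predictions(seq)
print([round(p, 3) for p in preds])  # [0.5, 0.667, 0.75, 0.6, 0.667, 0.714]
```

Because every past trial counts equally, a single observation moves the prediction less and less as the sequence grows.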
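II. Dynamic Belief Model (DBM). Stimuli: A (0), R (1). [Graphical model: a changing hidden bias (example values .3, .8, .3) generates the observed stimuli; the next stimulus (?) is to be predicted.]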
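One way to sketch the DBM is as a filter over a discretized bias. The assumptions here are not on the slide: on each trial the bias stays put with probability `alpha` and is otherwise redrawn from a uniform prior, and the belief is held on a grid of `n_grid` points. The default `alpha=0.77` echoes a value that appears later in the talk; all names are hypothetical.

```python
import numpy as np


def dbm_predictions(xs, alpha=0.77, n_grid=201):
    """Predictive P(x_t = 1 | x_1..x_{t-1}) under a changing-bias model.

    Assumed dynamics: bias_t = bias_{t-1} with probability alpha,
    otherwise bias_t is redrawn from a uniform prior on [0, 1].
    """
    grid = np.linspace(0.0, 1.0, n_grid)
    prior = np.full(n_grid, 1.0 / n_grid)  # uniform reset distribution (assumed)
    post = prior.copy()                    # belief about the previous bias
    preds = []
    for x in xs:
        # Mix the old belief with the reset distribution, then predict.
        pred_dist = alpha * post + (1.0 - alpha) * prior
        preds.append(float(grid @ pred_dist))
        # Bayes update with the Bernoulli likelihood of the new observation.
        like = grid if x == 1 else 1.0 - grid
        post = like * pred_dist
        post /= post.sum()
    return preds


print(dbm_predictions([1, 1, 0, 1, 1, 1]))
```

Setting `alpha=1.0` removes the resets, and the filter then behaves like a fixed-bias model with a uniform prior, up to grid error.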
FBM Subject’s Response to Random Inputs. What the FBM subject should believe about the bias of the coin, given a sequence of observations: R R A R R R. [Figure: belief distribution over the bias, axis from A to R.]
What the FBM subject should believe about the bias of the coin, given a long sequence of observations: R R A R A A R A A R A… [Figure: belief distribution over the bias, axis from A to R.]
DBM Subject’s Response to Random Inputs. What the DBM subject should believe about the bias of the coin, given a long sequence of observations: R R A R A A R A A R A… [Figure: belief distribution over the bias, axis from A to R.]
Randomized Stimuli: FBM > DBM. Given a sequence of truly random data (bias = .5) … FBM: belief distribution over the bias is driven by the long-term average. DBM: belief distribution over the bias is driven by transient patterns. [Figure panels: probability vs. simulated trials for each model.]
“Natural Environment”: DBM > FBM. In a changing world, where the bias undergoes un-signaled changes … FBM: posterior over the bias adapts poorly to changes. DBM: posterior over the bias adapts rapidly to changes. [Figure panels: probability vs. simulated trials for each model.]
Persistence of Sequential Effects. Sequential effects persist in data; DBM produces R/A asymmetry; Subjects = DBM (changing world). [Figure panels: FBM P(stimulus); DBM P(stimulus); human RT data (from Cho et al, 2002).]
Bayesian Computations in Neurons? Generative Model: what subjects need to know. Optimal Prediction: what subjects need to compute. Too hard to represent, too hard to compute!
Simpler Alternative for Neural Computation? Inspiration: exponential forgetting in tracking true changes (Sugrue, Corrado, & Newsome, 2004)
Exponential Forgetting in Behavior. Exponential discounting is a good descriptive model. [Figure: linear regression of human data on past R/A; coefficients vs. trials into the past (re-analysis of Cho et al).]
Exponential Forgetting Approximates DBM. Exponential discounting is a good normative model. [Figure: linear regression of DBM prediction on past R/A; coefficients vs. trials into the past.]
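The "coefficients vs. trials into the past" analysis can be sketched as an ordinary least-squares regression of a model's trial-by-trial prediction on the preceding outcomes. The sketch below applies it to a grid-based DBM driven by a random sequence. The uniform reset prior, `alpha=0.77`, the number of lags, and all function names are assumptions for illustration, not the analysis pipeline behind the slides.

```python
import numpy as np


def dbm_predict(xs, alpha=0.77, n_grid=101):
    """Grid-based changing-bias predictor (assumed uniform reset prior)."""
    grid = np.linspace(0.0, 1.0, n_grid)
    prior = np.full(n_grid, 1.0 / n_grid)
    post = prior.copy()
    preds = np.empty(len(xs))
    for t, x in enumerate(xs):
        pred_dist = alpha * post + (1.0 - alpha) * prior
        preds[t] = grid @ pred_dist
        post = (grid if x == 1 else 1.0 - grid) * pred_dist
        post /= post.sum()
    return preds


def lag_coefficients(xs, preds, n_lags=6):
    """Regress preds[t] on x[t-1], ..., x[t-n_lags]; return the lag weights."""
    xs = np.asarray(xs, dtype=float)
    # Each row holds the n_lags most recent outcomes, most recent first.
    rows = np.array([xs[t - n_lags:t][::-1] for t in range(n_lags, len(xs))])
    X = np.column_stack([np.ones(len(rows)), rows])  # intercept + lags
    coef, *_ = np.linalg.lstsq(X, preds[n_lags:], rcond=None)
    return coef[1:]  # drop the intercept


rng = np.random.default_rng(0)
xs = rng.integers(0, 2, size=5000)   # truly random R/A sequence
coefs = lag_coefficients(xs, dbm_predict(xs))
print(np.round(coefs, 3))
print(np.round(coefs[1:] / coefs[:-1], 2))  # ratios between successive lags
```

If the weights decay exponentially, the ratio between successive coefficients stays roughly constant, which is what an exponential filter would produce exactly.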
Discount Rate vs. Assumed Rate of Change … [Figure panels: DBM probability vs. simulated trials, for parameter values .95 and .77.]
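Reverse-engineering Subjects’ Assumptions. [Figure panels: regression coefficients vs. trials into the past, for human data and for DBM simulation.] Human data: β = .57. With α = p(γ_t = γ_{t-1}), where γ is the bias, β = .57 corresponds to α = .77: the bias changes about once every four trials. (Slide annotation: 2/3.)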
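The "once every four trials" reading follows from a short calculation, assuming that .77 is the per-trial probability that the bias stays the same, as the slide's fragment p( t = t-1 ) suggests. The symbol α for that probability is an assumption here. The number of trials until the next change is then geometrically distributed:

```latex
P(\text{change occurs after } k \text{ trials}) = \alpha^{k-1}(1-\alpha),
\qquad
\mathbb{E}[k] = \frac{1}{1-\alpha} = \frac{1}{1-0.77} \approx 4.3 \text{ trials}
```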
Analytical Approximation. Quality of approximation: nonlinear Bayesian computations (3-param model) vs. 1-param linear model.
Subjects’ RT vs. Model Stimulus Probability. Example sequence: R A R R R R … [Figure panels: Repetition Trials; Alternation Trials; RT vs. model stimulus probability.]
Subjects’ RT vs. Model Stimulus Probability Repetition vs. Alternation Trials
Multiple-Timescale Interactions. DBM; optimal discrimination (Wald, 1947) between stimuli 1 and 2: discrete time, SPRT; continuous time, DDM. (Yu, NIPS 2007) (Frazier & Yu, NIPS 2008) (Gold & Shadlen, Neuron 2002)
SPRT/DDM & Linear Effect of Prior on RT. [Figure panels: RT histograms over timesteps for different values of the bias P(s_1); x tanh x plotted against the bias P(s_1).]
SPRT/DDM & Linear Effect of Prior on RT. [Figure panels: empirical RT vs. stimulus probability; predicted RT vs. stimulus probability, with bias P(s_1).]
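To illustrate how a prior can shift decision time in an SPRT, the sketch below starts the log posterior odds at the log prior odds and accumulates Gaussian evidence until a symmetric bound is reached. The Gaussian observation model, the parameter values, and the name `sprt_mean_steps` are all assumptions for illustration; the sketch only shows the direction of the effect and makes no claim about its exact functional form.

```python
import numpy as np


def sprt_mean_steps(prior_s1, mu=0.2, sigma=1.0, threshold=3.0,
                    n_trials=1000, max_steps=10_000, seed=0):
    """Mean number of SPRT steps when stimulus s1 is the true one.

    Assumed observation model: x ~ N(+mu, sigma^2) under s1 and
    N(-mu, sigma^2) under s2, so each sample adds 2*mu*x/sigma^2
    to the log odds in favor of s1.
    """
    rng = np.random.default_rng(seed)
    start = np.log(prior_s1 / (1.0 - prior_s1))  # prior log odds for s1
    total = 0
    for _ in range(n_trials):
        log_odds, t = start, 0
        while abs(log_odds) < threshold and t < max_steps:
            x = rng.normal(mu, sigma)  # sample generated by s1
            log_odds += 2.0 * mu * x / sigma ** 2
            t += 1
        total += t
    return total / n_trials


for p in (0.2, 0.5, 0.8):
    print(p, round(sprt_mean_steps(p), 1))
```

A prior that favors the stimulus actually shown starts the accumulator closer to the correct bound, so fewer samples are needed on average.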
Neural Implementation of Prediction. Leaky-integrating neuron: bias = 1/2 (1-α), input = 1/3 α, recurrent = 2/3 α. Perceptual decision-making (Grice, 1972; Smith, 1995; Cook & Maunsell, 2002; Busemeyer & Townsend, 1993; McClelland, 1993; Bogacz et al, 2006; Yu, 2007; …). Trial-to-trial interactions (Kim & Myung, 1995; Dayan & Yu, 2003; Simen, Cohen & Holmes, 2006; Mozer, Kinoshita, & Shettel, 2007; …).
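The three labeled terms on this slide (bias, input, recurrent) can be read as a one-line recurrence for the predicted probability of a repetition. The sketch assumes the missing symbol is the DBM persistence parameter, called `alpha` here, giving weights 1/2 (1 - alpha), alpha / 3, and 2 alpha / 3. That reading, the starting value `p0`, and the function name are assumptions.

```python
def leaky_integrator(xs, alpha=0.77, p0=0.5):
    """Linear exponential filter: P_t = bias + w_in * x_{t-1} + w_rec * P_{t-1}."""
    bias = 0.5 * (1.0 - alpha)   # constant drive
    w_in = alpha / 3.0           # weight on the most recent outcome
    w_rec = 2.0 * alpha / 3.0    # leak / recurrent weight
    p = p0
    preds = [p]                  # prediction for the first trial
    for x in xs[:-1]:
        p = bias + w_in * x + w_rec * p
        preds.append(p)
    return preds


print([round(p, 3) for p in leaky_integrator([1, 1, 0, 1, 1, 1])])
```

With these weights, an input stream that averages 0.5 leaves the prediction hovering around 0.5, since 0.5 is a fixed point of the recurrence when the input is held at 0.5.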
Neuromodulation & Dynamic Filters. Leaky-integrating neuron: bias, input, recurrent. Norepinephrine (NE) (Hasselmo, Wyble, & Wallenstein 1996; Kobayashi, 2000). NE: Unexpected Uncertainty (Yu & Dayan, Neuron, 2005). [Figure x-axis: trials.]
Learning the Value of α. Humans (Behrens et al, 2007) and rats (Gallistel & Latham, 1999) may encode meta-changes in the rate of change, α. Bayesian Learning: iteratively compute the joint posterior, then the marginal posteriors over the rate of change and over the bias. [Graphical model: … … … …]
Neural Parameter Learning? Neurons don’t need to represent probabilities explicitly; they just need to estimate α. Stochastic gradient descent (delta-rule). [Equation with labeled terms: learning rate, error, gradient.]
Learning Results. [Figure panels: Stochastic Gradient Descent; Bayesian Learning; x-axis: trials.]
Summary. H: “Superstition” reflects adaptation to a changing world. Exponential “memory” is near-optimal & fits behavior; linear RT. Neurobiology: leaky integration, stochastic delta-rule, neuromodulation. Random sequences and changing biases are hard to distinguish. Questions: multiple outcomes? Explicit versus implicit prediction?
Unlearning Temporal Correlation is Slow (see Bialek, 2005). [Figure panels: two marginal posteriors; probability vs. trials.]
Insight from Brain’s “Mistakes” Ex: visual illusions (Adelson, 1995)
Insight from Brain’s “Mistakes”. Ex: visual illusions (lightness, depth, context). Neural computation is specialized for natural problems.
Discount Rate vs. Assumed Rate of Change. Iterative form of linear exponential. Exact inference is non-linear. Linear approximation. Empirical distribution.
Bayesian Inference. Generative Model (what subject “knows”); 1: repetition, 0: alternation. Optimal Prediction (Bayes’ Rule): posterior.
Power-Law Decay of Memory. Human memory; natural (language) statistics (Anderson & Schooler, 1991). Stationary process! Hierarchical Chinese Restaurant Process (Teh, 2006).
Ties Across Time, Space, and Modality. Time: sequential effects (RT). Modality: Stroop (GREEN). Space: Eriksen (SSHSS). (Yu, Dayan, Cohen, JEP: HPP 2008) (Liu, Yu, & Holmes, Neur Comp 2008)
Sequential Effects & Perceptual Discrimination. Sequential effects (R, A): DBM, PFC. Perceptual discrimination: optimal discrimination (Wald, 1947); discrete time, SPRT; continuous time, DDM. (Yu & Dayan, NIPS 2005) (Yu, NIPS 2007) (Frazier & Yu, NIPS 2008) (Gold & Shadlen, Neuron 2002)
Exponential Discounting for Changing Rewards (Sugrue, Corrado, & Newsome, 2004). [Figure panels: coefficients vs. trials into the past; Monkey G = .72, Monkey F = .63.]
Human & Monkey Share Assumptions? Monkey ≈ Human! [Figure panels: coefficients vs. trials into the past; Monkey G = .72, Monkey F = .63. Further slide values: .68, .80.]
Simulation Results. Learning via stochastic delta-rule. [Figure x-axis: trials.]
Monkeys’ Discount Rates in Choice Task (Sugrue, Corrado, & Newsome, 2004). [Figure panels: coefficients vs. trials into the past, for Monkey F and Monkey G.]
Human & Monkey Share Assumptions? Monkey ≈ Human!