Optimal Decision-Making in Humans & Animals Angela Yu March 05, 2009.

Slides:



Advertisements
Similar presentations
Bayesian inference Lee Harrison York Neuroimaging Centre 01 / 05 / 2009.
Advertisements

Brief introduction on Logistic Regression
Statistical Decision Theory Abraham Wald ( ) Wald’s test Rigorous proof of the consistency of MLE “Note on the consistency of the maximum likelihood.
Biointelligence Laboratory, Seoul National University
CMPUT 466/551 Principal Source: CMU
Chapter 4: Linear Models for Classification
Quasi-Continuous Decision States in the Leaky Competing Accumulator Model Jay McClelland Stanford University With Joel Lachter, Greg Corrado, and Jim Johnston.
Ai in game programming it university of copenhagen Statistical Learning Methods Marco Loog.
Decision Dynamics and Decision States: the Leaky Competing Accumulator Model Psychology 209 March 4, 2013.
The loss function, the normal equation,
Classification and Prediction: Regression Via Gradient Descent Optimization Bamshad Mobasher DePaul University.
Visual Recognition Tutorial
HON207 Cognitive Science Sequential Sampling Models.
0 Pattern Classification All materials in these slides were taken from Pattern Classification (2nd ed) by R. O. Duda, P. E. Hart and D. G. Stork, John.
Chapter 2: Bayesian Decision Theory (Part 1) Introduction Bayesian Decision Theory–Continuous Features All materials used in this course were taken from.
Statistical Decision Theory, Bayes Classifier
Classification with reject option in gene expression data Blaise Hanczar and Edward R Dougherty BIOINFORMATICS Vol. 24 no , pages
Sequential Hypothesis Testing under Stochastic Deadlines Peter Frazier, Angela Yu Princeton University TexPoint fonts used in EMF. Read the TexPoint manual.
Machine Learning CMPT 726 Simon Fraser University
Visual Recognition Tutorial
From T. McMillen & P. Holmes, J. Math. Psych. 50: 30-57, MURI Center for Human and Robot Decision Dynamics, Sept 13, Phil Holmes, Jonathan.
Distinguishing Evidence Accumulation from Response Bias in Categorical Decision-Making Vincent P. Ferrera 1,2, Jack Grinband 1,2, Quan Xiao 1,2, Joy Hirsch.
Collaborative Filtering Matrix Factorization Approach
Theory of Decision Time Dynamics, with Applications to Memory.
Seeing Patterns in Randomness: Irrational Superstition or Adaptive Behavior? Angela J. Yu University of California, San Diego March 9, 2010.
Statistical Decision Theory
ECE 8443 – Pattern Recognition ECE 8423 – Adaptive Signal Processing Objectives: Deterministic vs. Random Maximum A Posteriori Maximum Likelihood Minimum.
Decision Making Theories in Neuroscience Alexander Vostroknutov October 2008.
On optimal quantization rules for some sequential decision problems by X. Nguyen, M. Wainwright & M. Jordan Discussion led by Qi An ECE, Duke University.
Dynamic Decision Making in Complex Task Environments: Principles and Neural Mechanisms Annual Workshop Introduction August, 2008.
D ECIDING WHEN TO CUT YOUR LOSSES Matt Cieslak, Tobias Kluth, Maren Stiels & Daniel Wood.
Statistical Decision Theory Bayes’ theorem: For discrete events For probability density functions.
Ch15: Decision Theory & Bayesian Inference 15.1: INTRO: We are back to some theoretical statistics: 1.Decision Theory –Make decisions in the presence of.
1  The Problem: Consider a two class task with ω 1, ω 2   LINEAR CLASSIFIERS.
The Computing Brain: Focus on Decision-Making
ETHEM ALPAYDIN © The MIT Press, Lecture Slides for.
What’s optimal about N choices? Tyler McMillen & Phil Holmes, PACM/CSBMB/Conte Center, Princeton University. Banbury, Bunbury, May 2005 at CSH. Thanks.
Optimal Eye Movement Strategies In Visual Search.
Simultaneous integration versus sequential sampling in multiple-choice decision making Nate Smith July 20, 2008.
Giansalvo EXIN Cirrincione unit #4 Single-layer networks They directly compute linear discriminant functions using the TS without need of determining.
Bayesian inference Lee Harrison York Neuroimaging Centre 23 / 10 / 2009.
Psychology and Neurobiology of Decision-Making under Uncertainty Angela Yu March 11, 2010.
Dynamics of Reward Bias Effects in Perceptual Decision Making Jay McClelland & Juan Gao Building on: Newsome and Rorie Holmes and Feng Usher and McClelland.
ETHEM ALPAYDIN © The MIT Press, Lecture Slides for.
Bayesian Estimation and Confidence Intervals Lecture XXII.
Lecture 2. Bayesian Decision Theory
Lecture 1.31 Criteria for optimal reception of radio signals.
Bayesian Estimation and Confidence Intervals
Deep Feedforward Networks
Variational filtering in generated coordinates of motion
Dynamics of Reward Bias Effects in Perceptual Decision Making
Dynamical Models of Decision Making Optimality, human performance, and principles of neural information processing Jay McClelland Department of Psychology.
A Classical Model of Decision Making: The Drift Diffusion Model of Choice Between Two Alternatives At each time step a small sample of noisy information.
Pattern Classification All materials in these slides were taken from Pattern Classification (2nd ed) by R. O. Duda, P. E. Hart and D. G. Stork, John.
Collaborative Filtering Matrix Factorization Approach
Discrete Event Simulation - 4
Dynamic Causal Model for evoked responses in M/EEG Rosalyn Moran.
Dynamical Models of Decision Making Optimality, human performance, and principles of neural information processing Jay McClelland Department of Psychology.
Where did we stop? The Bayes decision rule guarantees an optimal classification… … But it requires the knowledge of P(ci|x) (or p(x|ci) and P(ci)) We.
Using Time-Varying Motion Stimuli to Explore Decision Dynamics
Braden A. Purcell, Roozbeh Kiani  Neuron 
Probabilistic Population Codes for Bayesian Decision Making
Banburismus and the Brain
Decision Making as a Window on Cognition
The loss function, the normal equation,
Pattern Classification All materials in these slides were taken from Pattern Classification (2nd ed) by R. O. Duda, P. E. Hart and D. G. Stork, John.
Mathematical Foundations of BME Reza Shadmehr
Pattern Classification All materials in these slides were taken from Pattern Classification (2nd ed) by R. O. Duda, P. E. Hart and D. G. Stork, John.
Ch 3. Linear Models for Regression (2/2) Pattern Recognition and Machine Learning, C. M. Bishop, Previously summarized by Yung-Kyun Noh Updated.
Presentation transcript:

Optimal Decision-Making in Humans & Animals Angela Yu March 05, 2009

Introduction A simple decision-making (DM) task binary decision rate of information flow constant over time task difficulty easily manipulated

Random-Dot Coherent Motion Paradigm 30% Coherence5% Coherence

Neurobiology Saccade generation system LIP response vs. coherence (Roitman & Shalden, 2002; Gold & Shadlen, 2004)

Behavioral Data Accuracy vs. Coherence vs. Coherence (Roitman & Shadlen, 2002)

A Need for Theoretical Framework for DM Speed Accuracyvs. Faster responses save time, but more errors What constitutes an optimal policy? Are humans/animals optimal? How is it neurally implemented?

Review of Bayesian decision theory: examples Quantifying temporal cost Optimal policy for 2-alternative forced choice – optimal policy (SPRT) – relation to data Deciding under a deadline Outline

Bayesian Decision Theory: the Idea s x hidden variable observation ? loss function

mean of p(s|x) is optimal Bayesian Decision Theory: Other Examples median of p(s|x) is optimal Ex. 2: loss function is absolute error: Ex. 1: loss function is mean square error:

Sequential Data  Repeated Decisions wait +: more info -: more time (Bayes’ Rule)

Loss function penalizes both error and delay: Optimal Policy (Bellman equation): Loss Function and Optimal Policy Stopping cost: Expected loss: Continuation cost: After observing inputs x 1, …, x t, continue (observe x t+1 ) only if the continuation cost < stopping cost

An Intractable Solution in Practice Stopping Cost Too hard!

b a Optimal Decision Policy: SPRT Wald & Wolfowitz (1948): Continuation region is a fixed interval (a, b) on the unit interval: At time t, continue if b > q t > a, stop and choose = 1 if q t > b, report = 0 if q t < a. example trajectories

Intuition for Optimality of SPRT Cost qtqt 01 c ab Q s (0) = w q t Q s (1) = v (1 - q t ) Q c is concave decide 0decide 1continue

Sequential Probability Ratio Test Log posterior ratio (monotonic to q t ): b’ a’ 0 rtrt Additive updates (random walk):

Relating Back to Data AccuracyMean RT (Data from Roitman & Shadlen, 2002; analysis from Ditterich, 2007) BehaviorBiology LIP response vs. coherence

Mid-way Summary Optimal binary DM = evidence accumulation to thresholds Perceptual DM behavior similar to that expected from SPRT Neural activities in LIP  Bayesian evidence integration SPRT a formal link between biology and behavior in DM

Review of Bayesian decision theory: examples Quantifying temporal cost Optimal policy for 2-alternative forced choice – optimal policy (SPRT) – relation to data Deciding under a deadline Outline

Caveat: Model Fit Imperfect Mean RTRT Distributions (Data from Roitman & Shadlen, 2002; analysis from Ditterich, 2007) SPRT predicts same RT distribution for correct & error trials Real error trials are slower

b a SPRT: Same RT for Correct and Errors

Fix 1: Variable Drift Rate (Data from Roitman & Shadlen, 2002; analysis from Ditterich, 2007) (idea from Ratcliff & Rouder, 1998) AccuracyMean RTRT Distributions

Fix 2: Increasing Drift Rate (Data from Roitman & Shadlen, 2002; analysis from Ditterich, 2007) AccuracyMean RTRT Distributions

A Principled Approach Trials aborted w/o response or reward if monkey breaks fixation (Data from Roitman & Shadlen, 2002) (Analysis from Ditterich, 2007) More “urgency” over time as risk of aborting trial increases Can we model this in a principled way?

Optimal Policy is a pair of of monotonically decaying thresholds Probability 1 0 Generalized SPRT Classic SPRT Time Intuition for decay With limited time, there is limited opportunity to observe enough evidence to bias the evidence in the opposite direction Optimal Decision Making Under a Deadline (Frazier & Yu, 2007) Loss function depends on error, delay, and whether deadline exceed

Threshold Decay  Fast RT & Slower Errors Probability 1 0 Generalized SPRT Classic SPRT Time (Frazier & Yu, 2007) Earlier deadline  greater urgency Time Probability

Timing Uncertainty (Frazier & Yu, 2007) Timing uncertainty  more conservative policy (lower threhsolds) Time Probability This leads to a set of testable predictions

Generalizing the Cost Function Loss function increases faster than linear in time f(  ) is convex and increasing. Nonlinear trade-off between accuracy and time (e.g. maximizing reward rate)

Summary Bayesian decision theory as a formal framework for DM Inadequacies of SPRT to account for behavior Inherent time pressure  decaying thresholds Neural implementation of decaying thresholds?

Perceptual DM + Saccade Planning Task: find & identify target sensory processing cost (time, accuracy) learning/memory attention Current/Future Project BehaviorTheoryNeurobiology Human Rats Bayesian modeling Stochastic control theory fMRI Recording

Bayesian Decision Theory Decision policy Loss function: Expected loss: A policy  : x  d is a mapping from input x into decision d, where x is a sampled from the distribution p(x|s). The loss function l(d(x); s) depends on the decision d(x) and the hidden variable s. The expected loss associated with a policy  is averaged over the prior distribution p(s) and the likelihood p(x|s): The optimal policy minimizes this expected loss.

Examples Ex. 1: loss function is mean square error For each observation x, we need to minimize: Differentiating and setting to 0, we find loss is minimized if: Optimal policy is to report the mean of posterior p(s|x).

Examples Ex. 2: loss function is absolute error: Optimal policy is to report the median of posterior p(s|x). Ex. 3: loss function is 0-1 error: 0 if correct, 1 all other choices Optimal policy is to report the mode of posterior p(s|x).

An Example: Binary Hypothesis Testing Decision Problem: xx … x1x1 s x2x2 x3x3 Incorporate evidence iteratively (Bayes’ Rule):

An Intractable Solution in Practice Stopping Cost Too hard!

Generalize Loss Function We assumed loss is linear in error and delay: Wald also proved a dual statement: Amongst all decision policies satisfying the criterion SPRT (with some thresholds) minimizes the expected sample size. This implies that the SPRT is optimal for all loss functions that increase with inaccuracy and (linearly in) delay (proof by contradiction). (Bogacz et al, 2006) What if it’s non-linear, e.g. maximize reward rate:

Examples Ex. 2: loss function is absolute error: Optimal policy is to report the median of posterior p(s|x). Ex. 3: loss function is 0-1 error: 0 if correct, 1 all other choices Optimal policy is to report the mode of posterior p(s|x).

Some Intuitions for the Proof (Frazier & Yu, 2007) Continuation Region Shrinking!