What’s optimal about N choices? Tyler McMillen & Phil Holmes, PACM/CSBMB/Conte Center, Princeton University. Banbury, Bunbury, May 2005 at CSH. Thanks to NSF & NIMH.

Neuro-inspired decision-making models*
1. The two-alternative forced-choice task (2-AFC). Optimal decisions: SPRT, LAM and DDM**.
2. Optimal performance curves.
3. MSPRT: an asymptotically optimal scheme for n > 2 choices (Dragalin et al.).
4. LAM realizations of n-AFC; mean RT vs ER; Hick’s law.
5. Summary (the maximal order statistics).
* Optimality viewpoint: maybe animals can’t do it, but they can’t do better.
** Sequential probability ratio test, leaky accumulator model, drift-diffusion model.

2-AFC, SPRT, LAM & DDM. Choosing between 2 alternatives with noisy incoming data x_1, x_2, ..., drawn from density p_1(x) under alternative 1 or p_2(x) under alternative 2. Set thresholds +Z, -Z and form the running product of likelihood ratios, written here as the accumulated log-likelihood ratio
$$R_n = \sum_{j=1}^{n} \log \frac{p_2(x_j)}{p_1(x_j)}.$$
Decide 1 (resp. 2) when R_n first falls below -Z (resp. exceeds +Z). Theorem (Wald, 1947; Barnard, 1946): the SPRT is optimal among fixed or variable sample size tests in the sense that, for a given error rate (ER), the expected number of samples to decide is minimal. (Or, for a given number of samples, ER is minimal.)
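A minimal simulation sketch of the 2-AFC SPRT just described, in Python; the Gaussian densities, drift values, and threshold below are illustrative assumptions, not parameters from the talk.

```python
import numpy as np

def sprt_2afc(mu_true, mu1=1.0, mu2=-1.0, sigma=1.0, Z=3.0, rng=None):
    """One 2-AFC SPRT trial with Gaussian observations.

    Observations x_j ~ N(mu_true, sigma^2). We accumulate the log-likelihood
    ratio R_n = sum_j log[p2(x_j)/p1(x_j)] and, following the slide's
    convention, decide 1 when R_n < -Z and 2 when R_n > +Z.
    """
    rng = rng or np.random.default_rng()
    R, n = 0.0, 0
    while -Z <= R <= Z:
        x = rng.normal(mu_true, sigma)
        # log p2(x) - log p1(x) for equal-variance Gaussians
        R += ((x - mu1) ** 2 - (x - mu2) ** 2) / (2 * sigma ** 2)
        n += 1
    return (1 if R < -Z else 2), n

# Example: alternative 1 is true; estimate ER and mean sample number.
rng = np.random.default_rng(0)
trials = [sprt_2afc(mu_true=1.0, rng=rng) for _ in range(2000)]
choices, samples = zip(*trials)
print("ER =", np.mean(np.array(choices) != 1), " mean #samples =", np.mean(samples))
```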

DDM is the continuum limit of the SPRT. Let $dx = a\,dt + c\,dW$: the accumulated evidence $x$ drifts at rate $a$ and diffuses with noise strength $c$ between thresholds $-Z$ and $+Z$. Extensive modeling of behavioral data (Stone, Laming, Ratcliff et al.).
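A sketch of an Euler–Maruyama simulation of this DDM; the drift, noise, threshold, and time step are illustrative assumptions rather than values from the talk.

```python
import numpy as np

def ddm_trial(a=0.2, c=1.0, Z=1.5, dt=1e-3, rng=None):
    """Simulate dx = a dt + c dW until x crosses +Z or -Z.

    Returns (choice, decision_time), with choice 1 for the +Z boundary
    (correct when a > 0) and choice 2 for the -Z boundary.
    """
    rng = rng or np.random.default_rng()
    x, t = 0.0, 0.0
    while abs(x) < Z:
        x += a * dt + c * np.sqrt(dt) * rng.standard_normal()
        t += dt
    return (1 if x >= Z else 2), t

rng = np.random.default_rng(1)
trials = [ddm_trial(rng=rng) for _ in range(1000)]
choices, dts = zip(*trials)
print("ER =", np.mean(np.array(choices) != 1), " mean DT =", np.mean(dts))
```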

There’s also increasing neural evidence for the DDM. FEF: Schall, Stuphorn & Brown, Neuron; LIP: Gold & Shadlen, Neuron, 2002.

Balanced LAM reduces to the DDM on an invariant line (linearized; it is a race model if the inhibition vanishes). Uncoupling into sum and difference coordinates gives a stable OU flow in $y_1 \propto x_1 + x_2$, which collapses quickly when leak plus inhibition is large, and pure drift-diffusion in $y_2 \propto x_2 - x_1$ when leak equals inhibition (the balanced case). Absolute thresholds in $(x_1, x_2)$ become relative thresholds $\pm Z$ on the difference $x_2 - x_1$!
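A concrete sketch of this reduction, using the standard linearized two-unit LAM with leak $k$, mutual inhibition $w$, inputs $I_{1,2}$ and noise strength $c$ (this notation is an assumption, not taken from the slides):

```latex
% Linearized two-unit LAM:
\[
\begin{aligned}
dx_1 &= (-k\,x_1 - w\,x_2 + I_1)\,dt + c\,dW_1,\\
dx_2 &= (-k\,x_2 - w\,x_1 + I_2)\,dt + c\,dW_2.
\end{aligned}
\]
% Rotating to y_1 = (x_1 + x_2)/\sqrt{2} and y_2 = (x_2 - x_1)/\sqrt{2}:
\[
\begin{aligned}
dy_1 &= \Bigl(-(k+w)\,y_1 + \tfrac{I_1 + I_2}{\sqrt{2}}\Bigr)dt + c\,dW_1',\\
dy_2 &= \Bigl(-(k-w)\,y_2 + \tfrac{I_2 - I_1}{\sqrt{2}}\Bigr)dt + c\,dW_2'.
\end{aligned}
\]
% y_1 is a stable OU process (eigenvalue -(k+w)); in the balanced case k = w
% the y_2 equation loses its linear term and is pure drift-diffusion along
% the invariant line, with drift (I_2 - I_1)/sqrt(2).
```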

LAM sample paths collapse towards an attracting invariant manifold (cf. C. Brody: Machens et al., Science, 2005). First passage across a threshold determines the choice.

Simple expressions for the first passage times and ERs: for the DDM with drift $a$, noise strength $c$ and thresholds $\pm Z$,
$$\langle DT \rangle = \frac{Z}{a}\tanh\!\Bigl(\frac{aZ}{c^2}\Bigr), \qquad ER = \frac{1}{1 + e^{2aZ/c^2}}.$$
Reduction to 2 params: with $\tilde a = (a/c)^2$ and $\tilde z = Z/a$ these become $\langle DT \rangle = \tilde z \tanh(\tilde a \tilde z)$ and $ER = 1/(1 + e^{2\tilde a \tilde z})$. Can compute thresholds that maximize the reward rate $RR = (1 - ER)/(\langle DT \rangle + D)$, where $D$ lumps the non-decision and inter-trial delays (Gold-Shadlen, 2002; Bogacz et al.). This leads to … (1)
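A numerical sketch of the threshold optimization just described, using the closed-form decision time and ER above; the parameter values (drift, noise, delay D) are assumptions chosen only for illustration.

```python
import numpy as np

def dt_er(a, c, Z):
    """Mean decision time and error rate of the DDM with thresholds +/-Z."""
    dt = (Z / a) * np.tanh(a * Z / c**2)
    er = 1.0 / (1.0 + np.exp(2 * a * Z / c**2))
    return dt, er

def reward_rate(a, c, Z, D):
    """RR = (1 - ER) / (<DT> + D), with D the non-decision plus inter-trial delay."""
    dt, er = dt_er(a, c, Z)
    return (1.0 - er) / (dt + D)

# Grid search for the threshold that maximizes reward rate.
a, c, D = 0.2, 1.0, 2.0                     # assumed values
Zs = np.linspace(0.01, 5.0, 2000)
rr = np.array([reward_rate(a, c, Z, D) for Z in Zs])
Z_opt = Zs[int(np.argmax(rr))]
print("optimal threshold Z* =", Z_opt, " RR* =", rr.max())
```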

Optimal performance curves (OPCs). Human behavioral data: the best subjects are optimal, but what about the rest? Bad objective function, or bad learners? Left: the RR defined previously; right: a family of RRs weighted for accuracy (curves shown for increasing accuracy weight). Learning is not considered here. (Bogacz et al., 2004; Simen, 2005.)
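One way to trace an optimal performance curve numerically, using the same closed-form DT and ER expressions: for each signal-to-noise level, find the RR-maximizing threshold and record the resulting (ER, DT/D) pair. The parameter ranges below are assumptions for illustration.

```python
import numpy as np

D, c = 2.0, 1.0                        # assumed delay and noise strength
opc = []
for a in np.linspace(0.05, 1.0, 40):   # sweep drift (signal-to-noise)
    Zs = np.linspace(0.01, 10.0, 4000)
    dt = (Zs / a) * np.tanh(a * Zs / c**2)
    er = 1.0 / (1.0 + np.exp(2 * a * Zs / c**2))
    rr = (1.0 - er) / (dt + D)         # reward rate for each candidate threshold
    i = int(np.argmax(rr))             # RR-maximizing threshold for this drift
    opc.append((er[i], dt[i] / D))     # one point on the OPC

for e, d in opc[::8]:
    print(f"ER = {e:.3f}   DT/D = {d:.3f}")
```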

N-AFC: MSPRT & LAM. MSPRT chooses among n alternatives by a max vs. next test: accumulate a log likelihood $L_i(n) = \sum_{k=1}^{n} \log p_i(x_k)$ for each alternative and stop, choosing the current leader, as soon as its $L_i$ exceeds the next-largest by the threshold. MSPRT is asymptotically optimal in the sense that the number of samples is minimal in the limit of low ERs (Dragalin et al., IEEE Trans.). A LAM realization of MSPRT (Usher-McClelland 2001) asymptotically predicts mean RTs growing like log(n-1) at fixed ER (cf. Usher et al., 2002).
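A sketch of the max-vs-next stopping rule with Gaussian observations; the means, noise level, and threshold are illustrative assumptions.

```python
import numpy as np

def max_vs_next_trial(n_alt=4, true_idx=0, delta=1.0, sigma=1.0, Z=3.0, rng=None):
    """n-alternative max-vs-next test.

    Each step observes an n-vector x ~ N(delta * e_true, sigma^2 I); hypothesis
    i says the signal sits in component i. Up to an i-independent constant its
    accumulated log-likelihood is L_i += (delta * x_i - delta^2/2) / sigma^2.
    Stop when the largest L exceeds the second largest by Z.
    """
    rng = rng or np.random.default_rng()
    mu = np.zeros(n_alt)
    mu[true_idx] = delta
    L = np.zeros(n_alt)
    k = 0
    while True:
        x = rng.normal(mu, sigma)                   # one n-vector observation
        L += (delta * x - delta**2 / 2) / sigma**2  # log-likelihood increments
        k += 1
        runner_up, leader = np.partition(L, -2)[-2:]
        if leader - runner_up >= Z:                 # max vs. next test
            return int(np.argmax(L)), k

rng = np.random.default_rng(2)
trials = [max_vs_next_trial(rng=rng) for _ in range(500)]
choices, ks = zip(*trials)
print("ER =", np.mean(np.array(choices) != 0), " mean #samples =", np.mean(ks))
```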

The log(n-1) dependence is similar to Hick’s Law: RT = A + B log n, or RT = B log(n+1) (W.E. Hick, Q.J. Exp. Psych.). We can provide a theoretical basis and predict explicit SNR and ER dependence in the coefficients A, B.

The multiplicative constants blow up logarithmically as ER -> 0. The behavior for both small and larger ERs is captured by an empirical formula, (2), which generalizes (1).

But a running max vs. next test is computationally costly (?). The LAM can approximately execute a max vs. average test via absolute thresholds. The n-unit LAM is decoupled by separating the sum coordinate $y_1 \propto x_1 + \dots + x_n$ from the differences: $y_1$ is attracted to the hyperplane $y_1 = A$, so max vs. average becomes an absolute test! Attraction is faster for larger n: the stable eigenvalue $\lambda_1 \sim n$. Drift-diffusion takes place on this hyperplane.
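A sketch of why the sum coordinate contracts faster as n grows, assuming a standard linearized n-unit LAM with leak $k$, all-to-all inhibition $w$, and inputs $I_i$ (notation assumed, not from the slides):

```latex
% Linearized n-unit LAM:
\[
dx_i = \Bigl(-k\,x_i - w\!\sum_{j \ne i} x_j + I_i\Bigr)dt + c\,dW_i .
\]
% Summing over i, with s = \sum_i x_i (so y_1 \propto s):
\[
ds = \Bigl(-\bigl(k + (n-1)\,w\bigr)\,s + \sum_i I_i\Bigr)dt + c\sum_i dW_i ,
\]
% i.e. the sum relaxes onto a fixed hyperplane with stable eigenvalue
% \lambda_1 = -(k + (n-1)\,w), whose magnitude grows like n; the remaining
% n-1 difference coordinates drift and diffuse on that hyperplane.
```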

Max vs. average is not optimal, but it’s not so bad (the figure compares the absolute, max vs. average, and max vs. next tests; unbalanced LAMs are OU processes). Max vs. next and max vs. average coincide for n = 2. As n increases, max vs. average deteriorates, approaching absolute-test performance. But it’s still better for n < 8-10!
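For concreteness, the three stopping rules being compared can be written side by side as tests on the vector of accumulated (log-likelihood or activation) values L; this is a simplified sketch with an assumed common threshold Z.

```python
import numpy as np

def stop_max_vs_next(L, Z):
    """Stop when the leader exceeds the runner-up by Z (the MSPRT-style test)."""
    runner_up, leader = np.partition(L, -2)[-2:]
    return leader - runner_up >= Z

def stop_max_vs_average(L, Z):
    """Stop when the leader exceeds the average of the other units by Z."""
    i = int(np.argmax(L))
    others = np.delete(L, i)
    return L[i] - others.mean() >= Z

def stop_absolute(L, Z):
    """Stop when any accumulated value crosses a fixed (absolute) threshold."""
    return L.max() >= Z
```

Plugging each rule into a loop like the max-vs-next sketch above, and matching thresholds so the error rates agree, lets one reproduce in simulation the comparison of mean sample counts summarized here.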

The simple LAM/DD predicts log(n-1), not log n or log(n+1) as in Hick’s law; but a distribution of starting points gives approximately log n scaling for 2 < n < 8, and ER and SNR effects may also enter.

Nonlinear LAMs: the effect of nonlinear activation functions, bounded below, is to shift the scaling toward linear in n (figure compares nonlinear LAMs with the linearized LAM). The limited dynamic range degrades performance, but this can be offset by a suitable bias (recentering).

Summary: N-AFC
MSPRT max vs. next test is asymptotically optimal in the low ER limit. LAM (& race model) can perform the max vs. next test.
Hick’s law emerges for max vs. next, max vs. average & absolute tests. A, B smallest for max vs. next, OK for max vs. average.
LAM executes a max vs. average test on its attracting hyperplane using absolute thresholds.
Variable start points give log n scaling for small n.
Nonlinear LAMs degrade performance: RT ~ n for sufficiently small dynamic range.
More info: