Institute for Theoretical Physics and Mathematics, Tehran, January 2006. Value-based decision making: behavior and theory.

Greg Corrado, Leo Sugrue

Schematic: sensory input → decision mechanisms → adaptive behavior. Low-level sensory analyzers feed the decision mechanisms, which drive motor output structures.

Schematic: sensory input → decision mechanisms → adaptive behavior, now with reward history added: a representation of stimulus/action value, shaped by reward history, feeds into the decision mechanisms.

How do we measure value? (Herrnstein RJ, 1961)

The Matching Law: the fraction of choices allocated to an option matches the fraction of rewards earned from it (plotted as choice fraction versus reward fraction).
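The matching law can be illustrated numerically: choice fraction equals reward fraction. A minimal sketch with made-up counts (not the monkeys' data):

```python
# The matching law: the fraction of choices allocated to an option
# equals the fraction of rewards earned from it.

def fraction(a, b):
    """Fraction of the total attributable to option a."""
    return a / (a + b)

# Hypothetical counts: 300 choices of red vs. 100 of green,
# yielding 75 rewards from red vs. 25 from green.
choice_frac = fraction(300, 100)   # 0.75
reward_frac = fraction(75, 25)     # 0.75
# choice_frac == reward_frac: behavior 'matches' the reward ratio.
```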

Behavior: what computation does the monkey use to 'match'? Theory: can we build a model that replicates the monkeys' behavior on the matching task? How can we validate the model's performance? Why is a model useful? Physiology: what are the neural circuits and signal transformations within the brain that implement the computation?

An eye movement matching task. Baiting fractions across blocks: 1:1, 6:1, 1:6, 6:1, 1:2, 2:1, 1:2.
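The task pays off on a concurrent baiting schedule: each target is baited probabilistically, and a bait, once armed, persists until that target is next chosen. A minimal one-trial simulator, with hypothetical names and probabilities (the real task also involves a changeover delay):

```python
import random

def baited_trial(baited, p_bait, choice):
    """One trial of a simplified concurrent baiting schedule.
    baited: dict target -> whether a reward is currently armed.
    p_bait: dict target -> per-trial baiting probability (hypothetical).
    A bait, once armed, persists until that target is chosen."""
    for target, p in p_bait.items():
        if not baited[target] and random.random() < p:
            baited[target] = True
    rewarded = baited[choice]
    baited[choice] = False   # choosing the target collects (clears) its bait
    return 1 if rewarded else 0
```

Because an armed bait on the leaner target simply waits to be collected, occasionally sampling that target pays off, which is what makes a matching-like allocation of choices sensible in this game.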

Dynamic Matching Behavior

Dynamic Matching Behavior: rewards

Dynamic Matching Behavior: responses and rewards

The Relation Between Reward and Choice is Local (responses and rewards shown).

How do they do this? What local mechanism underlies the monkey’s choices in this game? To estimate this mechanism we need a modeling framework.

Linear-Nonlinear-Poisson (LNP) models of choice behavior: strategy estimation is straightforward.
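The three stages can be sketched in a few lines. Everything below is illustrative, not the fitted model: a single exponential kernel stands in for the fitted linear filter, and the parameter values (tau, beta) are placeholders:

```python
import math
import random

def lnp_choice_prob(rewards_1, rewards_2, tau=8.0, beta=4.0):
    """Sketch of an LNP choice model (illustrative parameters).
    Linear stage: exponentially weighted sum of each target's reward
    history (most recent trial last, so recent rewards weigh most).
    Nonlinear stage: sigmoid of the differential value v1 - v2."""
    n = len(rewards_1)
    w = [math.exp(-(n - 1 - i) / tau) for i in range(n)]
    v1 = sum(wi * r for wi, r in zip(w, rewards_1))
    v2 = sum(wi * r for wi, r in zip(w, rewards_2))
    return 1.0 / (1.0 + math.exp(-beta * (v1 - v2)))

# Stochastic (Poisson-like) stage: the actual choice is a draw from p.
p = lnp_choice_prob([0, 1, 1], [1, 0, 0])   # target 1 recently rewarded
choice = 1 if random.random() < p else 2    # p > 0.5, so 1 is likelier
```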

Estimating the form of the linear stage: how do animals weigh past rewards in determining their current choice?
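One common parametric form for such a weighting function is a sum of a fast and a slow exponential component. The coefficients and time constants below are placeholders, not the values fitted to the monkeys:

```python
import math

def reward_weight(n_back, a1=0.7, tau1=2.0, a2=0.3, tau2=20.0):
    """Weight given to a reward obtained n_back trials in the past:
    a short (tau1) plus a long (tau2) exponential component.
    All parameter values here are illustrative placeholders."""
    return a1 * math.exp(-n_back / tau1) + a2 * math.exp(-n_back / tau2)

weights = [reward_weight(n) for n in range(30)]
# The weights decay monotonically: recent rewards influence choice most.
```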

Estimating the form of the nonlinear stage: how is differential value mapped onto the animal's instantaneous probability of choice?

Fitted nonlinearities for Monkey F and Monkey G: probability of choice (red) as a function of differential value (in rewards).

Model validation for our LNP model of choice behavior: can the model predict the monkey's next choice? Can the model generate behavior on its own?

Can the model predict the monkey’s next choice?

Predicting the next choice: single experiment

Predicting the next choice: all experiments

Can the model generate behavior on its own?

Model-generated behavior: single experiment

The distribution of stay durations summarizes behavior across all experiments (stay duration measured in trials).
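Stay durations can be read directly off a choice sequence as the lengths of runs on the same target. A small helper (the example sequence is made up):

```python
from collections import Counter

def stay_durations(choices):
    """Lengths of runs of consecutive choices of the same target.
    choices is any sequence of target labels, e.g. 'RRGGGR'."""
    if not choices:
        return []
    durations, run = [], 1
    for prev, cur in zip(choices, choices[1:]):
        if cur == prev:
            run += 1
        else:
            durations.append(run)
            run = 1
    durations.append(run)
    return durations

# Histogram of stay durations for a made-up choice sequence.
hist = Counter(stay_durations("RRGGGRGGRRRR"))
```

Comparing this histogram between monkey and model is the generative test: a model can predict single choices well yet still produce runs of the wrong lengths.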

Model-generated behavior: all experiments (stay duration histogram, in trials).

OK, now that you have a reasonable model, what can you do with it? 1. Explore second-order behavioral questions. 2. Explore neural correlates of valuation.

Choice of model input: reward history versus choice history. Surely 'not getting a reward' also has some influence on the monkey's behavior?

Choice of model input: reward history, choice history, or a hybrid history in which an unrewarded choice is assigned a value κ.

Can we build a better model by taking unrewarded choices into account? Hybrid history with parameter κ: systematically vary the value of κ; estimate new L and N stages for the model; test each new model's ability to (a) predict choice and (b) generate behavior.
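One way to encode the hybrid input: for each target, a rewarded choice contributes 1, an unrewarded choice contributes κ, and trials on which the other target was chosen contribute 0, so κ = 0 recovers the pure reward history and κ = 1 the pure choice history. A sketch with hypothetical encodings:

```python
def hybrid_input(choices, rewards, target, kappa):
    """Per-target model input for the hybrid history.
    choices[t]: target chosen on trial t; rewards[t]: 1 if rewarded.
    kappa = 0 gives the pure reward history; kappa = 1 the choice history."""
    out = []
    for c, r in zip(choices, rewards):
        if c != target:
            out.append(0.0)     # other target chosen: no contribution
        elif r:
            out.append(1.0)     # rewarded choice of this target
        else:
            out.append(kappa)   # unrewarded choice, worth kappa
    return out

# Example: trials choosing targets 1, 2, 1 with rewards 1, 0, 0.
seq = hybrid_input([1, 2, 1], [1, 0, 0], target=1, kappa=0.5)
# seq == [1.0, 0.0, 0.5]
```

Sweeping κ and refitting the L and N stages at each value then asks directly whether any weight on unrewarded choices helps.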

Unrewarded choices: the value of nothin'. Predictive and generative performance as a function of the value of unrewarded choices (κ).

Unrewarded choices: the value of nothin'. Predictive performance and generative performance (stay-duration histogram overlap, %) as a function of the value of unrewarded choices (κ).

Choice of model input: contrary to our intuition, including information about unrewarded choices does not improve model performance.

Optimality of Parameters

Weighting of past rewards: is there an 'optimal' weighting function that maximizes the rewards a player can harvest in this game?

Weighting of past rewards: the tuning of the τ2 (long) component of the L-stage affects foraging efficiency, and the monkeys have found this optimum. The τ1 (short) component of the L-stage does not affect foraging efficiency, so why do monkeys overweight recent rewards? The tuning of the nonlinear function relating value to p(choice) also affects foraging efficiency, and the monkeys have found this optimum as well.

The differential model is a better predictor of monkey choice

Summary: monkeys match; the best LNP model both predicts and generates choices; monkeys find the optimal τ2 and nonlinearity, while τ1 is not critical; unrewarded choices have no effect; differential value predicts choices better than fractional value.

?

Best LNP model. Candidate decision variable, differential value: g(v1 − v2) = p(choice).

Aside: what would Bayes do? 1) maintain beliefs over baiting probabilities 2) be greedy or use dynamic programming
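A sketch of the first, greedy variant under a strong simplifying assumption: treat each target as a fixed Bernoulli reward source with a Beta belief over its rate, ignoring the baiting dynamics (which are precisely what makes the real problem call for dynamic programming rather than greed). All names here are illustrative:

```python
def posterior_mean(a, b):
    # Mean of a Beta(a, b) belief about a target's reward probability.
    return a / (a + b)

def greedy_bayes_step(beliefs, outcome=None, last_choice=None):
    """One step of a simplified Bayes-greedy agent. beliefs maps each
    target to Beta parameters (a, b). After observing the last trial
    (outcome 1 = rewarded, 0 = not), update that target's belief, then
    greedily choose the target with the higher posterior mean."""
    if last_choice is not None:
        a, b = beliefs[last_choice]
        beliefs[last_choice] = (a + outcome, b + (1 - outcome))
    return max(beliefs, key=lambda t: posterior_mean(*beliefs[t]))

beliefs = {"red": (1, 1), "green": (1, 1)}   # uniform priors
pick = greedy_bayes_step(beliefs, outcome=1, last_choice="red")
```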

Firing rates in LIP are related to target value on a trial-by-trial basis (cell gm020b: responses for the target into vs. out of the RF, plotted against target value).

The differential model also accounts for more variance in LIP firing rates

What I've told you: how we control and measure value (the matching law); an experimental task based on that principle (a dynamic foraging task); a simple model of value-based choice (our Linear-Nonlinear-Poisson model); how we validate that model (predictive and generative validation); how we use the model to explore behavior (hybrid models; optimality of reward weights); and how we use the model to explore value-related signals in the brain (neural firing in area LIP correlates with 'differential value' on a trial-by-trial basis).

Foraging Efficiency Varies as a Function of τ2

Foraging Efficiency Does Not Vary as a Function of τ1

What do animals do? Matching is a probabilistic policy, and it is almost optimal within the set of probabilistic policies. Animals match.

+ the changeover delay

Greg Corrado

How do we implement the changeover delay? Only one 'live' target at a time.