Decision making
Blaise Pascal: probability in games of chance. How much should I bet on '20'? $E[\text{gain}] = \sum_x \text{gain}(x)\,\Pr(x)$
Decisions under uncertainty: maximize expected value (Pascal). Bets should be assessed according to their expected value.
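A worked instance of Pascal's criterion (my own illustration, assuming an American roulette wheel with 38 pockets and a 35-to-1 payout on a single number): a one-unit bet on '20' has
$E[\text{gain}] = 35 \cdot \tfrac{1}{38} + (-1) \cdot \tfrac{37}{38} = -\tfrac{2}{38} \approx -0.05$,
so every such bet has a slightly negative expected value, whatever its size.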
Decisions under uncertainty: the value of an alternative is a monotonic function of the probability of reward and the magnitude of reward.
Do Classical Decision Variables Influence Brain Activity in LIP?
Varying Movement Value (Platt and Glimcher 1999)
What Influences LIP? Related to movement desirability: the value/utility of the reward and the probability of the reward.
Varying Movement Probability
Decisions under uncertainty: neural activity in area LIP depends on the probability of reward and the magnitude of reward.
Relative or absolute reward? (Dorris and Glimcher 2004)
Choosing among monetary amounts: $X, $Y, $Z (and $A, $B, $C, $D, $E)
Maximization of utility. Consider a set of alternatives $X$ and a binary relation $\succsim$ on it, interpreted as "preferred at least as". Consider the following three axioms: C1. Completeness: for every $x, y \in X$, $x \succsim y$ or $y \succsim x$. C2. Transitivity: for every $x, y, z \in X$, if $x \succsim y$ and $y \succsim z$, then $x \succsim z$. C3. Separability.
Theorem: a binary relation $\succsim$ can be represented by a real-valued function $u$ if and only if it satisfies C1-C3. Under these conditions, the function $u$ is unique up to an increasing transformation (Cantor 1915).
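In symbols (a standard spelling-out of "represented by", added here for clarity and using the notation above):
$\exists\, u : X \to \mathbb{R}$ such that $x \succsim y \iff u(x) \ge u(y)$ for all $x, y \in X$.
Uniqueness means that if $v$ also represents $\succsim$, then $v = f \circ u$ for some strictly increasing $f : \mathbb{R} \to \mathbb{R}$.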
A face utility function?
Is there an explicit representation of the 'value' of a choice in the brain?
Neurons in the orbitofrontal cortex encode value (Padoa-Schioppa and Assad, 2006)
Examples of neurons encoding the chosen value
A neuron encoding the value of A
A neuron encoding the value of B
A neuron encoding the chosen juice taste
Encoding takes place at different times post-offer (a, d, e, blue), pre-juice (b, cyan), post-juice (c, f, black)
How does the brain learn the values?
The computational problem: the goal is to maximize the sum of rewards.
The computational problem: the value of the state $S_1$ depends on the policy. If the animal chooses 'right' at $S_1$, the rewards it subsequently collects differ from those it would collect by choosing 'left'.
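Written out (a standard formalization consistent with the slides; the discount factor $\gamma \le 1$ is an assumption I add only to keep the sum finite over long horizons):
$V^{\pi}(S_1) = E\!\left[\sum_{t \ge 1} \gamma^{\,t-1} r_t \,\middle|\, S_1, \pi\right]$,
and the computational problem is to find the policy $\pi$ that maximizes this expected sum of rewards.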
How to find the optimal policy in a complicated world? If the values of the different states are known, then this task is easy. How can the values of the different states be learned?
$V(S_t)$ = the value of the state at time $t$; $r_t$ = the (average) reward delivered at time $t$; $V(S_{t+1})$ = the value of the state at time $t+1$.
The TD (temporal difference) learning algorithm: $V(S_t) \leftarrow V(S_t) + \alpha\,\delta_t$, where $\delta_t = r_t + V(S_{t+1}) - V(S_t)$ is the TD error.
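A consequence worth noting (it follows directly from the update rule above): learning stops when the expected TD error is zero, i.e. when
$E[\delta_t] = 0 \iff V(S_t) = E\big[r_t + V(S_{t+1})\big]$,
so at convergence each state's value equals the average immediate reward plus the value of the successor state.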
Schultz, Dayan and Montague, Science, 1997
(Setup: the CS is presented at the start of the trial and a reward of size 1 is delivered at state 8.)
Before trial 1: $V(S) = 0$ for every state. In trial 1 there is no reward in states 1-7, so $\delta = 0$ there and the values do not change; at state 8 a reward of size 1 is delivered, so $\delta = 1$ and $V(8) \leftarrow \alpha$.
Before trial 2: $V(8) = \alpha$ and all other values are 0. In trial 2, for states 1-6, $\delta = 0$. For state 7, $\delta = V(8) - V(7) = \alpha$, so $V(7) \leftarrow \alpha^2$. For state 8, $\delta = 1 - V(8) = 1 - \alpha$, so $V(8) \leftarrow \alpha + \alpha(1-\alpha) = 1 - (1-\alpha)^2$.
Before trial 3: $V(7) = \alpha^2$, $V(8) = 1 - (1-\alpha)^2$, and all other values are 0. In trial 3, for states 1-5, $\delta = 0$. For state 6, $\delta = V(7) = \alpha^2$, so $V(6) \leftarrow \alpha^3$. For state 7, $\delta = V(8) - V(7)$, so $V(7)$ increases further. For state 8, $\delta = 1 - V(8) = (1-\alpha)^2$, so $V(8)$ moves closer to 1.
After many trials the values converge and the TD error vanishes at every state, except at the CS, whose time of occurrence cannot be predicted; the prediction error therefore appears at the time of the CS.
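A minimal simulation sketch of this walkthrough (my own illustration, not code from the lecture): tabular TD learning with the update rule above, eight states per trial, a reward of size 1 at state 8, and an assumed learning rate alpha = 0.3; the names (run_trial, ALPHA) are mine.

# Tabular TD(0) sketch of the CS -> reward example above (illustration only).
# Assumptions: 8 states per trial, reward of size 1 at state 8, values start
# at 0 before trial 1, learning rate ALPHA = 0.3, no discounting.

N_STATES = 8
ALPHA = 0.3                       # learning rate (assumed value)
V = [0.0] * (N_STATES + 2)        # V[1..8]; V[9] stays 0 (trial ends after state 8)

def run_trial(values):
    """Run one trial; update the values in place and return the TD errors."""
    deltas = []
    for s in range(1, N_STATES + 1):
        r = 1.0 if s == N_STATES else 0.0        # reward only at state 8
        delta = r + values[s + 1] - values[s]    # TD error delta_t
        values[s] += ALPHA * delta               # TD update
        deltas.append(delta)
    return deltas

for trial in range(1, 201):
    deltas = run_trial(V)
    if trial in (1, 2, 3, 200):
        print(f"trial {trial:3d}: V(6)={V[6]:.3f} V(7)={V[7]:.3f} "
              f"V(8)={V[8]:.3f} delta(8)={deltas[-1]:.3f}")

# Trial 1 reproduces delta = 1 at state 8; trial 2 gives V(7) = alpha^2 and
# delta(8) = 1 - alpha; after many trials the within-trial TD errors vanish.
# The persistent response at the CS reflects the unpredictability of the CS
# time itself, which this simple within-trial model does not include.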
Schultz, 1998
Bayer and Glimcher, 2005: "We found that these neurons encoded the difference between the current reward and a weighted average of previous rewards, a reward prediction error, but only for outcomes that were better than expected."
Bayer and Glimcher, 2005