Decision making
Blaise Pascal: probability in games of chance. How much should I bet on '20'? $E[\text{gain}] = \sum_x \text{gain}(x)\,\Pr(x)$
Decisions under uncertainty: maximize expected value (Pascal). Bets should be assessed according to their expected value.
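A worked instance of Pascal's criterion (my own illustration, assuming an American roulette wheel with 38 pockets and a 35-to-1 payout on a single number): a one-unit bet on '20' has
$E[\text{gain}] = 35 \cdot \tfrac{1}{38} + (-1) \cdot \tfrac{37}{38} = -\tfrac{2}{38} \approx -0.05$,
so every such bet has a slightly negative expected value, whatever its size.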
Decisions under uncertainty: the value of an alternative is a monotonic function of the probability of reward and the magnitude of reward.
Do Classical Decision Variables Influence Brain Activity in LIP?
Varying Movement Value (Platt and Glimcher 1999)
What Influences LIP? Related to movement desirability: the value/utility of the reward and the probability of the reward.
Varying Movement Probability
Decisions under uncertainty: neural activity in area LIP depends on the probability of reward and the magnitude of reward.
Relative or absolute reward? (Dorris and Glimcher 2004)
Choosing among monetary amounts: $X, $Y, $Z (and $A, $B, $C, $D, $E)
Maximization of utility. Consider a set of alternatives $X$ and a binary relation $\succsim$ on it, interpreted as "preferred at least as". Consider the following three axioms: C1. Completeness: for every $x, y \in X$, $x \succsim y$ or $y \succsim x$. C2. Transitivity: for every $x, y, z \in X$, if $x \succsim y$ and $y \succsim z$, then $x \succsim z$. C3. Separability.
Theorem: a binary relation $\succsim$ can be represented by a real-valued function $u$ if and only if it satisfies C1-C3. Under these conditions, the function $u$ is unique up to an increasing transformation (Cantor 1915).
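In symbols (a standard spelling-out of "represented by", added here for clarity and using the notation above):
$\exists\, u : X \to \mathbb{R}$ such that $x \succsim y \iff u(x) \ge u(y)$ for all $x, y \in X$.
Uniqueness means that if $v$ also represents $\succsim$, then $v = f \circ u$ for some strictly increasing $f : \mathbb{R} \to \mathbb{R}$.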
A face utility function?
Is there an explicit representation of the 'value' of a choice in the brain?
Neurons in the orbitofrontal cortex encode value (Padoa-Schioppa and Assad, 2006)
Examples of neurons encoding the chosen value
A neuron encoding the value of A
A neuron encoding the value of B
A neuron encoding the chosen juice taste
Encoding takes place at different times post-offer (a, d, e, blue), pre-juice (b, cyan), post-juice (c, f, black)
How does the brain learn the values?
The computational problem: the goal is to maximize the sum of rewards.
The computational problem: the value of the state $S_1$ depends on the policy. If the animal chooses 'right' at $S_1$, the rewards it subsequently collects differ from those it would collect by choosing 'left'.
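Written out (a standard formalization consistent with the slides; the discount factor $\gamma \le 1$ is an assumption I add only to keep the sum finite over long horizons):
$V^{\pi}(S_1) = E\!\left[\sum_{t \ge 1} \gamma^{\,t-1} r_t \,\middle|\, S_1, \pi\right]$,
and the computational problem is to find the policy $\pi$ that maximizes this expected sum of rewards.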
How to find the optimal policy in a complicated world? If the values of the different states are known, then this task is easy. How can the values of the different states be learned?
$V(S_t)$ = the value of the state at time $t$; $r_t$ = the (average) reward delivered at time $t$; $V(S_{t+1})$ = the value of the state at time $t+1$.
The TD (temporal difference) learning algorithm: $V(S_t) \leftarrow V(S_t) + \alpha\,\delta_t$, where $\delta_t = r_t + V(S_{t+1}) - V(S_t)$ is the TD error.
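A consequence worth noting (it follows directly from the update rule above): learning stops when the expected TD error is zero, i.e. when
$E[\delta_t] = 0 \iff V(S_t) = E\big[r_t + V(S_{t+1})\big]$,
so at convergence each state's value equals the average immediate reward plus the value of the successor state.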
Schultz, Dayan and Montague, Science, 1997
(Setup: the CS is presented at the start of the trial and a reward of size 1 is delivered at state 8.)
Before trial 1: $V(S) = 0$ for every state. In trial 1 there is no reward in states 1-7, so $\delta = 0$ there and the values do not change; at state 8 a reward of size 1 is delivered, so $\delta = 1$ and $V(8) \leftarrow \alpha$.
Before trial 2: $V(8) = \alpha$ and all other values are 0. In trial 2, for states 1-6, $\delta = 0$. For state 7, $\delta = V(8) - V(7) = \alpha$, so $V(7) \leftarrow \alpha^2$. For state 8, $\delta = 1 - V(8) = 1 - \alpha$, so $V(8) \leftarrow \alpha + \alpha(1-\alpha) = 1 - (1-\alpha)^2$.
Before trial 3: $V(7) = \alpha^2$, $V(8) = 1 - (1-\alpha)^2$, and all other values are 0. In trial 3, for states 1-5, $\delta = 0$. For state 6, $\delta = V(7) = \alpha^2$, so $V(6) \leftarrow \alpha^3$. For state 7, $\delta = V(8) - V(7)$, so $V(7)$ increases further. For state 8, $\delta = 1 - V(8) = (1-\alpha)^2$, so $V(8)$ moves closer to 1.
After many trials the values converge and the TD error vanishes at every state, except at the CS, whose time of occurrence cannot be predicted; the prediction error therefore appears at the time of the CS.
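A minimal simulation sketch of this walkthrough (my own illustration, not code from the lecture): tabular TD learning with the update rule above, eight states per trial, a reward of size 1 at state 8, and an assumed learning rate alpha = 0.3; the names (run_trial, ALPHA) are mine.

# Tabular TD(0) sketch of the CS -> reward example above (illustration only).
# Assumptions: 8 states per trial, reward of size 1 at state 8, values start
# at 0 before trial 1, learning rate ALPHA = 0.3, no discounting.

N_STATES = 8
ALPHA = 0.3                       # learning rate (assumed value)
V = [0.0] * (N_STATES + 2)        # V[1..8]; V[9] stays 0 (trial ends after state 8)

def run_trial(values):
    """Run one trial; update the values in place and return the TD errors."""
    deltas = []
    for s in range(1, N_STATES + 1):
        r = 1.0 if s == N_STATES else 0.0        # reward only at state 8
        delta = r + values[s + 1] - values[s]    # TD error delta_t
        values[s] += ALPHA * delta               # TD update
        deltas.append(delta)
    return deltas

for trial in range(1, 201):
    deltas = run_trial(V)
    if trial in (1, 2, 3, 200):
        print(f"trial {trial:3d}: V(6)={V[6]:.3f} V(7)={V[7]:.3f} "
              f"V(8)={V[8]:.3f} delta(8)={deltas[-1]:.3f}")

# Trial 1 reproduces delta = 1 at state 8; trial 2 gives V(7) = alpha^2 and
# delta(8) = 1 - alpha; after many trials the within-trial TD errors vanish.
# The persistent response at the CS reflects the unpredictability of the CS
# time itself, which this simple within-trial model does not include.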
Schultz, 1998
Bayer and Glimcher, 2005: "We found that these neurons encoded the difference between the current reward and a weighted average of previous rewards, a reward prediction error, but only for outcomes that were better than expected."
Bayer and Glimcher, 2005