1
Decision making
2
?
3
Blaise Pascal (1623–1662). Probability in games of chance. How much should I bet on ’20’? E[gain] = Σ_x gain(x) Pr(x)
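As a quick illustration of Pascal's rule, here is a minimal sketch in Python. The 35-to-1 payoff and the 37 equally likely outcomes are assumptions for illustration (a European-roulette-style bet on a single number); the slide does not specify the game.

# A minimal sketch (not from the slides) of Pascal's expected-value rule.
# Assumption: a bet on the single number '20' that pays 35:1, with 37
# equally likely outcomes.

def expected_gain(outcomes):
    # outcomes: list of (gain, probability) pairs; returns sum of gain * Pr.
    return sum(gain * p for gain, p in outcomes)

bet_on_20 = [(35, 1 / 37), (-1, 36 / 37)]   # win 35, or lose the 1-unit stake
print(expected_gain(bet_on_20))             # about -0.027 units per unit bet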
4
Decisions under uncertainty. Maximize expected value (Pascal): bets should be assessed according to their expected value.
5
Decisions under uncertainty. The value of an alternative is a monotonic function of the probability of reward and the magnitude of reward.
6
Do Classical Decision Variables Influence Brain Activity in LIP (the lateral intraparietal area)?
7
Varying Movement Value Platt and Glimcher 1999
11
What Influences LIP? Variables related to movement desirability: value/utility of reward, probability of reward.
12
Varying Movement Probability
15
What Influences LIP? Variables related to movement desirability: value/utility of reward, probability of reward.
17
Decisions under uncertainty. Neural activity in area LIP depends on: probability of reward and magnitude of reward.
18
Dorris and Glimcher 2004 Relative or absolute reward?
19
?
20
$X $Y $Z
21
$A $B $C $D $E
22
Maximization of utility. Consider a set of alternatives X and a binary relation ≽ on it, interpreted as “preferred at least as”. Consider the following three axioms: C1. Completeness: for every x, y in X, x ≽ y or y ≽ x. C2. Transitivity: for every x, y, z in X, if x ≽ y and y ≽ z then x ≽ z. C3. Separability: X contains a countable subset that is ≽-dense in X.
23
Theorem: A binary relation ≽ can be represented by a real-valued function u (that is, x ≽ y if and only if u(x) ≥ u(y)) if and only if it satisfies C1–C3. Under these conditions, the function u is unique up to increasing transformation (Cantor 1915).
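For a finite set of alternatives (where C3 holds trivially), the representation can be built very concretely: score each alternative by how many alternatives it is weakly preferred to. A minimal sketch, with made-up preferences that are not from the slides:

# Sketch: a utility function u representing a complete, transitive relation
# on a finite set X, via u(x) = number of alternatives x is weakly preferred to.
# The set X and the preference order below are hypothetical examples.

X = ["apple", "banana", "cherry"]
rank = {"apple": 3, "banana": 2, "cherry": 1}   # assumed preference order

def weakly_prefers(x, y):
    return rank[x] >= rank[y]

u = {x: sum(weakly_prefers(x, y) for y in X) for x in X}

# u represents the relation: x is weakly preferred to y exactly when u[x] >= u[y].
assert all(weakly_prefers(x, y) == (u[x] >= u[y]) for x in X for y in X)
print(u)   # {'apple': 3, 'banana': 2, 'cherry': 1}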
24
A face utility function?
25
Is there an explicit representation of the ‘value’ of a choice in the brain?
26
Neurons in the orbitofrontal cortex encode value Padoa-Schioppa and Assad, 2006
30
Examples of neurons encoding the chosen value
31
A neuron encoding the value of A
32
A neuron encoding the value of B
33
A neuron encoding the chosen juice taste
34
Encoding takes place at different times post-offer (a, d, e, blue), pre-juice (b, cyan), post-juice (c, f, black)
35
How does the brain learn the values?
36
The computational problem The goal is to maximize the sum of rewards
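Written out in the notation used below (a sketch assuming the undiscounted case; a discount factor is a common addition), the problem is to choose a policy π that maximizes the expected sum of rewards:

maximize over π:  E[ r_1 + r_2 + r_3 + ... | π ] = E[ Σ_t r_t | π ]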
37
The computational problem. The value of the state S_1 depends on the policy: if the animal chooses ‘right’ at S_1, the value of S_1 is determined by the rewards obtained along the ‘right’ branch.
38
How to find the optimal policy in a complicated world?
39
If values of the different states are known then this task is easy
40
How to find the optimal policy in a complicated world? If values of the different states are known then this task is easy How can the values of the different states be learned?
41
V(S_t) = the value of the state at time t; r_t = the (average) reward delivered at time t; V(S_{t+1}) = the value of the state at time t+1.
42
The TD (temporal difference) learning algorithm: V(S_t) ← V(S_t) + α·δ_t, where δ_t = r_t + V(S_{t+1}) − V(S_t) is the TD error and α is the learning rate.
43
Schultz, Dayan and Montague, Science, 1997
44
[Figure: a trial as a sequence of states 1–9, with the CS early in the trial and the reward in state 8.] Before trial 1: V(S_t) = 0 for every state. In trial 1: no reward in states 1–7, a reward of size 1 in state 8.
45
Before trial 2: V(S_8) has increased (it was the only state with a nonzero TD error in trial 1); all other values are still 0. In trial 2, for states 1–6: δ_t = r_t + V(S_{t+1}) − V(S_t) = 0, so their values do not change. For state 7: δ_7 = V(S_8) − V(S_7) > 0, so V(S_7) increases.
46
In trial 2 (continued): for state 8, δ_8 = 1 + V(S_9) − V(S_8) = 1 − V(S_8) > 0, so V(S_8) increases further.
47
Before trial 3: V(S_7) and V(S_8) are now positive; all other values are still 0. In trial 3, for states 1–5: δ_t = 0, so no change. For state 6: δ_6 = V(S_7) − V(S_6) > 0, so V(S_6) increases.
48
In trial 3 (continued): for state 7, δ_7 = V(S_8) − V(S_7) > 0, so V(S_7) increases further; for state 8, δ_8 = 1 − V(S_8) > 0, so V(S_8) increases further.
49
After many trials, the values have converged and the TD error is zero at every state, except at the CS, whose time of occurrence is unknown and therefore cannot be predicted from the preceding state; a positive prediction error remains at the CS.
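A minimal simulation of this example (states 1–9, a single reward of size 1 in state 8, no discounting; the learning rate α = 0.5 is an arbitrary choice, and the unpredictable timing of the CS is not modeled here) reproduces the pattern on the preceding slides: the value, and with it the TD error, creeps backward from the reward by one state per trial.

# Sketch of TD(0) on the 9-state chain described above (assumed parameters).
# Update: delta_t = r_t + V(S_{t+1}) - V(S_t);  V(S_t) <- V(S_t) + alpha * delta_t

alpha = 0.5                    # learning rate (arbitrary for illustration)
V = [0.0] * 10                 # V[1]..V[9]; index 0 unused
reward = {8: 1.0}              # a reward of size 1 delivered in state 8

for trial in range(1, 101):
    for s in range(1, 9):      # visit states 1..8 in order within a trial
        r = reward.get(s, 0.0)
        delta = r + V[s + 1] - V[s]
        V[s] += alpha * delta
    if trial in (1, 2, 3, 100):
        print(trial, [round(v, 3) for v in V[1:9]])

# Trial 1: only V[8] changes; trial 2: V[7] starts to grow; trial 3: V[6].
# After many trials every value approaches 1; in the slides the chain
# effectively starts at the CS, whose own timing cannot be predicted.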
51
Schultz, 1998
52
Bayer and Glimcher, 2005: “We found that these neurons encoded the difference between the current reward and a weighted average of previous rewards, a reward prediction error, but only for outcomes that were better than expected.”
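A hedged sketch of the quantity described in that sentence: current reward minus a running weighted average of previous rewards, reported only when the outcome is better than expected. The exponential weighting and its parameter are assumptions for illustration, not the weights Bayer and Glimcher estimated.

# Sketch (assumed weighting): reward prediction error as current reward minus
# an exponentially weighted average of previous rewards, kept only when positive.

def prediction_errors(rewards, w=0.3):
    baseline, errors = 0.0, []
    for r in rewards:
        errors.append(max(r - baseline, 0.0))   # only better-than-expected outcomes
        baseline += w * (r - baseline)          # update the running average
    return errors

print(prediction_errors([0, 0, 1, 1, 0, 1]))    # approximately [0, 0, 1.0, 0.7, 0, 0.64]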
53
Bayer and Glimcher, 2005