1
Goal-Directed Feature and Memory Learning
Cornelius Weber Frankfurt Institute for Advanced Studies (FIAS) Sheffield, 3rd November 2009 Collaborators: Sohrab Saeb and Jochen Triesch
2
for taking action, we need only the relevant features
3
actor and state space: unsupervised learning in cortex (state space); reinforcement learning in basal ganglia (actor) -- Doya, 1999
4
background:
- gradient descent methods generalize RL to several layers: Sutton & Barto, RL book (1998); Tesauro (1992; 1995)
- reward-modulated Hebb: Triesch, Neur Comp 19 (2007); Roelfsema & van Ooyen, Neur Comp 17 (2005); Franz & Triesch, ICDL (2007)
- reward-modulated activity leads to input selection: Nakahara, Neur Comp 14 (2002)
- reward-modulated STDP: Izhikevich, Cereb Cortex 17 (2007); Florian, Neur Comp 19/6 (2007); Farries & Fairhall, J Neurophysiol 98 (2007); ...
- RL models learn partitioning of input space: e.g. McCallum, PhD Thesis, Rochester, NY, USA (1996)
5
reinforcement learning
go up? go right? go down? go left?
6
reinforcement learning
(network diagram: input s -> weights -> action a)
7
reinforcement learning
q(s,a): value of a state-action pair (coded in the weights)
minimizing the value estimation error:
    Δq(s,a) ≈ 0.9 q(s',a') - q(s,a)    (moving target)
    Δq(s,a) ≈ r - q(s,a)               (value fixed at the goal)
repeated running to the goal: in state s, the agent performs the best action a (with some randomness), yielding s' and a'
--> values and action choices converge
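As a concrete illustration of the update above, a minimal tabular SARSA sketch: the learning rate, the epsilon-greedy exploration and the grid size are assumptions for illustration; only the 0.9 discount and the fixed target at the goal follow the slide.

```python
import numpy as np

# Minimal tabular SARSA sketch of the update above. ALPHA, EPSILON and the
# grid size are illustrative assumptions; the 0.9 discount and the fixed
# target at the goal follow the slide.
GAMMA = 0.9
ALPHA = 0.1
EPSILON = 0.1

n_states, n_actions = 25, 4
q = np.zeros((n_states, n_actions))   # q(s, a) stored as a table

def choose_action(s):
    """Best action with a little randomness (epsilon-greedy)."""
    if np.random.rand() < EPSILON:
        return np.random.randint(n_actions)
    return int(np.argmax(q[s]))

def sarsa_step(s, a, r, s_next, a_next, done):
    """Move q(s,a) towards its target; returns the value estimation error."""
    target = r if done else GAMMA * q[s_next, a_next]   # fixed at goal / moving target
    delta = target - q[s, a]
    q[s, a] += ALPHA * delta
    return delta
```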
8
reinforcement learning: actor acts directly on the input (state space)
simple input: go right!
complex input: go right? go left? ... the plain RL network can't handle this!
9
complex input scenario: bars are controlled by the actions 'up', 'down', 'left', 'right'; reward is given if the horizontal bar is at a specific position. sensory input -> action -> reward
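A possible simulation of this bars world, as a hedged sketch: the grid size, the target row and the wrap-around behaviour are assumptions; the slides only state that the bars are moved by the four actions and that reward depends on the horizontal bar's position.

```python
import numpy as np

# Hedged sketch of the bars world. TARGET_ROW and the wrap-around moves are
# assumptions for illustration only.
SIZE = 12          # 12x12 input, as in the 'short bars' experiment
TARGET_ROW = 0     # reward when the horizontal bar reaches this row (assumed)

class BarsWorld:
    def __init__(self):
        self.h_row = np.random.randint(SIZE)   # row of the horizontal bar (relevant)
        self.v_col = np.random.randint(SIZE)   # column of the vertical bar (distractor)

    def observe(self):
        """Sensory input: an image containing both bars."""
        img = np.zeros((SIZE, SIZE))
        img[self.h_row, :] = 1.0
        img[:, self.v_col] = 1.0
        return img.ravel()

    def step(self, action):
        """'up'/'down' move the horizontal bar, 'left'/'right' the vertical one."""
        if action == 'up':
            self.h_row = (self.h_row - 1) % SIZE
        elif action == 'down':
            self.h_row = (self.h_row + 1) % SIZE
        elif action == 'left':
            self.v_col = (self.v_col - 1) % SIZE
        elif action == 'right':
            self.v_col = (self.v_col + 1) % SIZE
        reward = 1.0 if self.h_row == TARGET_ROW else 0.0
        return self.observe(), reward
```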
10
need another layer(s) to pre-process complex data
network definition (top layer: action selection; bottom layer: feature detection):
a: action; Q: weight matrix (encodes q); s: state (position of the relevant bar); W: feature-detector weight matrix; I: input
    s = softmax(W I)
    P(a=1) = softmax(Q s)
    q = a Q s
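A sketch of this forward pass in NumPy; the matrix shapes (W: states x inputs, Q: actions x states) and the one-hot action sampling are assumptions consistent with the equations above.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - np.max(x))
    return e / e.sum()

def forward(W, Q, I):
    """Two-layer forward pass: feature detection, then action selection."""
    s = softmax(W @ I)                  # state layer: s = softmax(W I)
    p_a = softmax(Q @ s)                # action probabilities: P(a=1) = softmax(Q s)
    a = np.eye(len(p_a))[np.random.choice(len(p_a), p=p_a)]  # sample a one-hot action
    q_value = a @ Q @ s                 # q = a Q s, value of the chosen state-action pair
    return s, a, q_value
```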
11
network training (same architecture: action-selection weights Q, feature-detection weights W):
    E = (0.9 q(s',a') - q(s,a))^2 = δ^2
    ΔQ ∝ dE/dQ = δ a s        (reinforcement learning)
    ΔW ∝ dE/dW = δ Q s I + ε  (δ-modulated unsupervised learning)
minimize the error w.r.t. the current target
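A sketch of one learning step under these rules; the learning rates, the reward value and the exact factorization of the W gradient through the softmax are assumptions, following the slide's schematic forms δ a s for Q and δ Q s I + ε for W.

```python
import numpy as np

# Learning rates and the small noise term EPS are illustrative assumptions.
ETA_Q, ETA_W, EPS = 0.1, 0.01, 1e-4

def update(W, Q, I, s, a, q_sa, q_next, at_goal, reward=1.0):
    """One gradient step on E = (0.9 q(s',a') - q(s,a))^2 = delta^2."""
    # TD error: moving target en route, fixed target (the reward) at the goal
    target = reward if at_goal else 0.9 * q_next
    delta = target - q_sa

    # top layer, reinforcement learning:  dQ ~ delta * a * s
    Q += ETA_Q * delta * np.outer(a, s)

    # bottom layer, delta-modulated unsupervised learning, schematic delta * Q * s * I + eps
    # (the full chain rule through the softmax is omitted in this sketch)
    W += ETA_W * (delta * np.outer((Q.T @ a) * s, I) + EPS * np.random.randn(*W.shape))
    return delta
```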
12
Details: network training minimizes the error w.r.t. the target Vπ
identities used: ...; note: non-negativity constraint on the weights
13
SARSA with a winner-take-all (WTA) input layer
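The winner-take-all approximation can be sketched by replacing the softmax state layer with a one-hot winner, which is what makes the learning rule local; the helper names below are illustrative.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - np.max(x))
    return e / e.sum()

def wta(x):
    """Winner-take-all: only the best-matching feature detector is active."""
    s = np.zeros_like(x)
    s[np.argmax(x)] = 1.0
    return s

def forward_wta(W, Q, I):
    s = wta(W @ I)                      # one-hot state (the winning detector)
    p_a = softmax(Q @ s)                # action selection as before
    a_idx = np.random.choice(len(p_a), p=p_a)
    q_value = Q[a_idx, np.argmax(s)]    # with one-hot s, q(s,a) is a single weight entry
    return s, a_idx, q_value
```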
14
learning the 'short bars' data
(figure: data samples, learnt feature weights, RL action weights; action and reward signals)
15
short bars in a 12x12 grid; average number of steps to the goal: 11
16
learning the 'long bars' data
(figure: data input, learnt feature weights, RL action weights; reward; actions not shown)
17
(figure panels: WTA, non-negative weights; SoftMax, non-negative weights; SoftMax, no weight constraints)
18
extension to memory ...
19
if there are detection failures of features ...
grey bars are invisible to the network ... it would be good to have memory or a forward model
20
memory extension: a: action; Q: action weights; s: state; W: feature weights; I: input; additional recurrent inputs from a(t-1) and s(t-1)
network training by gradient descent as previously; softmax function used; no weight constraint
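A sketch of the forward pass with memory: the previous state s(t-1) and previous action a(t-1) enter through additional weight matrices, here called W_s and W_a (names and shapes are assumptions).

```python
import numpy as np

def softmax(x):
    e = np.exp(x - np.max(x))
    return e / e.sum()

def forward_memory(W, W_s, W_a, Q, I, s_prev, a_prev):
    """Forward pass with recurrent memory inputs from the previous time step."""
    # the state estimate combines the (possibly incomplete) input with memory,
    # so a briefly invisible bar can be bridged by the internal prediction
    s = softmax(W @ I + W_s @ s_prev + W_a @ a_prev)
    p_a = softmax(Q @ s)
    a = np.eye(len(p_a))[np.random.choice(len(p_a), p=p_a)]
    q_value = a @ Q @ s
    return s, a, q_value
```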
21
learnt feature detectors
22
the network updates its trajectory internally
23
network performance
24
discussion
- two-layer SARSA RL performs gradient descent on the value estimation error
- approximation with winner-take-all leads to a local rule with δ-feedback
- learns only action-relevant features
- non-negative coding aids feature extraction
- memory weights develop into a forward model
- link between unsupervised and reinforcement learning
- demonstration with more realistic data still needed
25
video
27
Thank you! Collaborators: Sohrab Saeb and Jochen Triesch Sponsors: Bernstein Focus Neurotechnology, BMBF grant 01GQ0840 EU project “IM-CLeVeR”, call FP7-ICT Frankfurt Institute for Advanced Studies, FIAS