Slide 1
Goal-Directed Feature and Memory Learning
Cornelius Weber, Frankfurt Institute for Advanced Studies (FIAS)
Sheffield, 3rd November 2009
Collaborators: Sohrab Saeb and Jochen Triesch
Slide 2
For taking action, we need only the relevant features. (Figure labels: x, y, z.)
Slide 3
Unsupervised learning in cortex; reinforcement learning in basal ganglia. (Figure labels: state space, actor; after Doya, 1999.)
Slide 4
Background:
- Gradient descent methods generalize RL to several layers: Sutton & Barto, RL book (1998); Tesauro (1992; 1995)
- Reward-modulated Hebb: Triesch, Neur Comp 19, 885-909 (2007); Roelfsema & van Ooyen, Neur Comp 17, 2176-214 (2005); Franz & Triesch, ICDL (2007)
- Reward-modulated activity leads to input selection: Nakahara, Neur Comp 14, 819-44 (2002)
- Reward-modulated STDP: Izhikevich, Cereb Cortex 17, 2443-52 (2007); Florian, Neur Comp 19/6, 1468-502 (2007); Farries & Fairhall, Neurophysiol 98, 3648-65 (2007); ...
- RL models learn a partitioning of the input space: e.g. McCallum, PhD thesis, Rochester, NY, USA (1996)
Slide 5
Reinforcement learning: go up? go right? go down? go left?
Slide 6
Reinforcement learning. (Figure labels: input s, action a, weights.)
Slide 7
Reinforcement learning: minimizing the value estimation error.
Δq(s,a) ≈ 0.9 q(s',a') − q(s,a)   (moving target)
Δq(s,a) ≈ 1 − q(s,a)              (value fixed at goal)
Here q(s,a) is the value of a state-action pair (coded in the weights).
Repeated running to the goal: in state s, the agent performs the best action a (with some randomness), yielding s' and a'. As a result, the values and the action choices converge.
(Figure labels: input s, action a, weights.)
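To make the two error terms above concrete, here is a minimal tabular SARSA sketch in Python/NumPy. The discount factor 0.9 and the reward of 1 at the goal follow the slide; the learning rate, table shapes and function names are illustrative assumptions.

```python
import numpy as np

GAMMA = 0.9   # discount factor from the slide
ALPHA = 0.1   # assumed learning rate

def sarsa_update(q, s, a, s_next, a_next, at_goal):
    """One SARSA step on a (num_states, num_actions) value table q."""
    if at_goal:
        delta = 1.0 - q[s, a]                        # value fixed at the goal
    else:
        delta = GAMMA * q[s_next, a_next] - q[s, a]  # moving (bootstrapped) target
    q[s, a] += ALPHA * delta
    return delta
```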
Slide 8
Reinforcement learning. With a simple input (state space) the actor knows what to do: go right! With a complex input it can't handle this: go right? go left?
Slide 9
Complex input scenario: bars controlled by the actions 'up', 'down', 'left', 'right'; reward is given if the horizontal bar is at a specific position. (Figure labels: sensory input, reward, action.)
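As a rough illustration of this scenario, the sketch below implements a bars environment in Python/NumPy. The grid size, the presence of a distractor vertical bar, the rewarded row, and which actions move which bar are all assumptions; the slide only states that bars are moved by the four actions and that reward depends on the horizontal bar's position.

```python
import numpy as np

SIZE = 12       # assumed grid size
GOAL_ROW = 0    # assumed rewarded position of the horizontal bar

class BarsEnv:
    """Sketch of the 'bars' scenario: a relevant horizontal bar and an
    (assumed) irrelevant vertical bar drawn into one image."""

    def __init__(self, rng=np.random):
        self.rng = rng
        self.reset()

    def reset(self):
        self.row = self.rng.randint(SIZE)   # horizontal (relevant) bar
        self.col = self.rng.randint(SIZE)   # vertical (distractor) bar
        return self.observe()

    def observe(self):
        img = np.zeros((SIZE, SIZE))
        img[self.row, :] = 1.0
        img[:, self.col] = 1.0
        return img.ravel()                  # network input I

    def step(self, action):                 # 0=up, 1=down, 2=left, 3=right
        if action == 0:
            self.row = max(self.row - 1, 0)
        elif action == 1:
            self.row = min(self.row + 1, SIZE - 1)
        elif action == 2:                   # assumed: left/right move the
            self.col = max(self.col - 1, 0) # irrelevant vertical bar only
        elif action == 3:
            self.col = min(self.col + 1, SIZE - 1)
        reward = 1.0 if self.row == GOAL_ROW else 0.0
        return self.observe(), reward, reward > 0
```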
Slide 10
We need one or more additional layers to pre-process complex data: feature detection, then action selection.
Network definition:
s = softmax(W I)
P(a=1) = softmax(Q s)
q = a Q s
where a is the action, s the state, I the input, W the feature weight matrix and Q the action weight matrix.
(Figure labels: position of relevant bar encodes q; feature detector.)
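The three equations above amount to the forward pass sketched below in Python/NumPy; the shapes and the helper names are illustrative assumptions, but the computation follows the slide's definitions (a is a sampled one-hot action, so q = a·Q·s is the value of the chosen state-action pair).

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def forward(W, Q, I, rng=np.random):
    s = softmax(W @ I)                       # state:   s = softmax(W I)
    p_a = softmax(Q @ s)                     # policy:  P(a=1) = softmax(Q s)
    a = np.zeros_like(p_a)
    a[rng.choice(len(p_a), p=p_a)] = 1.0     # sample a one-hot action a
    q = a @ (Q @ s)                          # value:   q = a Q s
    return s, a, q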
Slide 11
Network training (feature detection and action selection): minimize the error with respect to the current target,
E = (0.9 q(s',a') − q(s,a))² = δ²
ΔQ ≈ dE/dQ = δ a s         (reinforcement learning)
ΔW ≈ dE/dW = δ Q s I + ε   (δ-modulated unsupervised learning)
where a is the action, s the state, I the input, W the feature weight matrix and Q the action weight matrix.
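The update below is a schematic Python/NumPy reading of these two rules. The learning rates are assumptions, the ε noise term and the full softmax derivative are omitted, and "δ Q s I" is interpreted as the error fed back through Q to the state layer and paired with the input; the paper's exact rule may differ in these details.

```python
import numpy as np

ETA_Q = 0.1    # assumed learning rate for the action weights
ETA_W = 0.01   # assumed learning rate for the feature weights

def train_step(W, Q, I, s, a, q, q_next, gamma=0.9):
    """One gradient step on E = (gamma*q' - q)^2, following the slide."""
    delta = gamma * q_next - q                    # TD error delta
    Q += ETA_Q * delta * np.outer(a, s)           # dQ ~ delta * a * s  (RL)
    # delta-modulated unsupervised update of the feature weights:
    # error backpropagated through Q, gated by s, paired with the input I
    W += ETA_W * delta * np.outer((a @ Q) * s, I)
    np.maximum(W, 0.0, out=W)                     # non-negativity constraint
    np.maximum(Q, 0.0, out=Q)                     # (noted on the next slide)
    return delta
```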
Slide 12
Details: network training minimizes the error with respect to the target value V^π. Note: a non-negativity constraint is applied to the weights. (The identities used in the derivation are shown on the slide but not reproduced in this transcript.)
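As the slide's identities are not reproduced here, the following is only an assumed illustration of the kind of identity such a derivation typically relies on: the derivative of the softmax state activation.

```latex
% Assumed illustration only; not necessarily the identity listed on the slide.
% Derivative of the softmax state activation s = softmax(W I):
\[
  s_i = \frac{e^{(WI)_i}}{\sum_k e^{(WI)_k}},
  \qquad
  \frac{\partial s_i}{\partial (WI)_j} = s_i \,(\delta_{ij} - s_j).
\]
```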
Slide 13
SARSA with a winner-take-all (WTA) input layer. (In the slide's equations, v should read q.)
Slide 14
Learning the 'short bars' data. (Figure labels: data, feature weights, RL action weights, reward, action.)
Slide 15
Short bars in a 12×12 input; average number of steps to the goal: 11.
Slide 16
Learning the 'long bars' data. (Figure labels: data, input, feature weights, RL action weights, reward; 2 actions not shown.)
Slide 17
Learnt weights under three conditions: WTA with non-negative weights; softmax with non-negative weights; softmax with no weight constraints.
Slide 18
Extension to memory...
Slide 19
If there are detection failures of features... it would be good to have a memory or a forward model. (In the figure, grey bars are invisible to the network.)
Slide 20
Network with memory: the previous state s(t−1) and the previous action a(t−1) feed back into the network. Training is by gradient descent as previously; the softmax function is used, with no weight constraint.
(Figure labels: a action, s state, I input, W feature weights, Q action weights.)
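A minimal sketch of how the state update might incorporate the previous step is shown below. The extra weight matrices W_s and W_a and their additive combination are assumptions; the slide only indicates that s(t−1) and a(t−1) feed into the state layer.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def state_with_memory(W, W_s, W_a, I, s_prev, a_prev):
    # feature input plus memory / forward-model input from the previous step
    # (assumed additive combination through extra weight matrices W_s, W_a)
    return softmax(W @ I + W_s @ s_prev + W_a @ a_prev)
```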
Slide 21
Learnt feature detectors.
Slide 22
The network updates its trajectory internally.
Slide 23
Network performance.
Slide 24
Discussion:
- Two-layer SARSA RL performs gradient descent on the value estimation error.
- Approximation with winner-take-all leads to a local rule with δ-feedback.
- The network learns only action-relevant features.
- Non-negative coding aids feature extraction.
- The memory weights develop into a forward model.
- This establishes a link between unsupervised and reinforcement learning.
- A demonstration with more realistic data is still needed.
Slide 25
Video.
Slide 27
Thank you!
Collaborators: Sohrab Saeb and Jochen Triesch
Sponsors: Bernstein Focus Neurotechnology, BMBF grant 01GQ0840; EU project 231722 "IM-CLeVeR", call FP7-ICT-2007-3; Frankfurt Institute for Advanced Studies, FIAS.