On Linking Reinforcement Learning with Unsupervised Learning. Cornelius Weber, FIGSS talk, FIAS, 20th April 2009.


1 On Linking Reinforcement Learning with Unsupervised Learning. Cornelius Weber, FIGSS talk, FIAS, 20th April 2009

2 For taking action, we need only the relevant features (x, y, z in the figure).

3 Unsupervised learning in cortex; reinforcement learning in the basal ganglia (state space feeding an actor). Doya, 1999.

4 Reinforcement learning: go up? go right? go down? go left?

5 Reinforcement learning: input s, action a, connected by weights.

6 Reinforcement learning. v(s,a) is the value of a state-action pair (coded in the weights). Minimizing the value estimation error: d v(s,a) ≈ 0.9 v(s',a') - v(s,a) (moving target); d v(s,a) ≈ 1 - v(s,a) (value fixed at the goal). Repeated running to the goal: in state s, the agent performs the best action a (with some randomness), yielding s' and a'; values and action choices converge. (Figure: input s, action a, weights.)
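
The value update on slide 6 can be written as a tabular SARSA rule. Below is a minimal sketch in Python; the grid size, learning rate, and the epsilon-greedy action choice are illustrative assumptions, not taken from the talk.

```python
import numpy as np

n_states, n_actions = 16, 4          # assumed: e.g. a 4x4 grid with up/right/down/left
v = np.zeros((n_states, n_actions))  # v(s, a), here a table rather than weights
gamma, lr, eps = 0.9, 0.1, 0.1       # discount as on the slide; lr and eps assumed
rng = np.random.default_rng(0)

def choose_action(s):
    """Best action with a little randomness (epsilon-greedy)."""
    if rng.random() < eps:
        return int(rng.integers(n_actions))
    return int(np.argmax(v[s]))

def sarsa_update(s, a, s_next, a_next, at_goal):
    """d v(s,a) ~ 0.9 v(s',a') - v(s,a); at the goal the target is fixed at 1."""
    target = 1.0 if at_goal else gamma * v[s_next, a_next]
    v[s, a] += lr * (target - v[s, a])
```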

7 The reinforcement learning actor maps its input (the state space) to actions: given a simple input, "go right!"; given a complex input, "go right? go left? can't handle this!"

8 Complex input scenario (sensory input, reward, action): bars are controlled by the actions 'up', 'down', 'left' and 'right'; reward is given if the horizontal bar is at a specific position. (A toy sketch of such an environment follows below.)
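
As a rough illustration of slide 8, the following sketch implements one possible bars world; the grid size, which bar each action moves, and the rewarded row are assumptions for illustration only.

```python
import numpy as np

class BarsWorld:
    """Toy sketch of the bars scenario: one horizontal bar (relevant) and one
    vertical bar (distractor); 'up'/'down' move the horizontal bar, 'left'/'right'
    the vertical one; reward when the horizontal bar reaches an assumed target row."""

    def __init__(self, size=12, target_row=0, seed=0):
        self.size, self.target_row = size, target_row
        self.rng = np.random.default_rng(seed)
        self.reset()

    def reset(self):
        self.row = int(self.rng.integers(self.size))  # horizontal bar position
        self.col = int(self.rng.integers(self.size))  # vertical bar position
        return self.observe()

    def observe(self):
        img = np.zeros((self.size, self.size))
        img[self.row, :] = 1.0                        # horizontal bar
        img[:, self.col] = 1.0                        # vertical bar
        return img.ravel()                            # flattened sensory input I

    def step(self, action):                           # 0 up, 1 down, 2 left, 3 right
        if action == 0: self.row = (self.row - 1) % self.size
        elif action == 1: self.row = (self.row + 1) % self.size
        elif action == 2: self.col = (self.col - 1) % self.size
        elif action == 3: self.col = (self.col + 1) % self.size
        reward = 1.0 if self.row == self.target_row else 0.0
        return self.observe(), reward
```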

9 We need another layer (or layers) to pre-process the complex data: feature detection followed by action selection. Network definition: s = softmax(W I); P(a=1) = softmax(Q s); v = a Q s. Legend: I input, s state (feature detectors; the position of the relevant bar is encoded here), a action, W and Q weight matrices (Q encodes v).
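
The network definition on slide 9 maps directly onto a few lines of code. The layer sizes and weight initialization below are assumptions; the two softmax stages and v = a Q s follow the slide.

```python
import numpy as np

rng = np.random.default_rng(0)
n_input, n_state, n_action = 144, 12, 4     # assumed sizes (12x12 input, 4 actions)
W = rng.random((n_state, n_input)) * 0.01   # feature weights (non-negative)
Q = rng.random((n_action, n_state)) * 0.01  # action/value weights (non-negative)

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def forward(I):
    """Slide 9: s = softmax(W I), P(a=1) = softmax(Q s), v = a Q s."""
    s = softmax(W @ I)                      # feature-detection (state) layer
    p_a = softmax(Q @ s)                    # action-selection probabilities
    a = int(rng.choice(n_action, p=p_a))    # sample one action
    v = float(Q[a] @ s)                     # v = a Q s, with a as a one-hot row vector
    return s, a, v
```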

10 Network training (feature detection and action selection layers): minimize the error with respect to the current target, E = (0.9 v(s',a') - v(s,a))^2 = δ^2. dQ ≈ dE/dQ = δ a s (reinforcement learning); dW ≈ dE/dW = δ Q s I + ε (δ-modulated unsupervised learning). Legend: a action, s state, I input, W and Q weight matrices.
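
Continuing the sketch above, one possible reading of the learning rules on slide 10 (together with the non-negativity constraint of slide 11) is given below. Interpreting dW ≈ δ Q s I as an outer product of (Q transposed times a, elementwise with s) and I is my assumption, as are the learning rates.

```python
import numpy as np

def train_step(W, Q, I, s, a, v, v_next, at_goal,
               gamma=0.9, lr_q=0.1, lr_w=0.01):
    """One delta-modulated update; lr_q, lr_w and the exact form of the W
    gradient are assumptions, the rest follows the slide's E = delta^2."""
    target = 1.0 if at_goal else gamma * v_next   # value fixed at the goal
    delta = target - v                            # TD error
    Q = Q.copy(); W = W.copy()
    Q[a] += lr_q * delta * s                      # RL: dQ ~ delta a s
    W += lr_w * delta * np.outer(Q[a] * s, I)     # delta-modulated unsupervised: dW ~ delta Q s I
    return np.clip(W, 0.0, None), np.clip(Q, 0.0, None)   # non-negativity constraint
```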

11 Network training: minimize the error with respect to the target V^π; note the non-negativity constraint on the weights. (The identities used were given as equations on the slide.)

12 SARSA with WTA input layer
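
For the WTA input layer mentioned on slide 12, the state coding reduces to a one-hot winner; a minimal sketch, used in place of the softmax state layer above:

```python
import numpy as np

def wta(x):
    """Winner-take-all state coding: a one-hot vector on the most active unit
    (ties broken by the first maximum)."""
    s = np.zeros_like(x, dtype=float)
    s[int(np.argmax(x))] = 1.0
    return s

# e.g. s = wta(W @ I) instead of s = softmax(W @ I)
```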

13 Learning the 'short bars' data. (Figure: data, RL action weights, feature weights, reward, action.)

14 Short bars in a 12x12 grid; average number of steps to the goal: 11.

15 Learning the 'long bars' data. (Figure: data, input, RL action weights, feature weights, reward; 2 actions not shown.)

16 (Figure panels: WTA, non-negative weights; SoftMax, non-negative weights; SoftMax, no weight constraints.)

17 Models' background:
- gradient descent methods generalize RL to several layers: Sutton & Barto, RL book (1998); Tesauro (1992; 1995)
- reward-modulated Hebb: Triesch, Neural Comput 19, 885-909 (2007); Roelfsema & van Ooyen, Neural Comput 17, 2176-2214 (2005); Franz & Triesch, ICDL (2007)
- reward-modulated activity leads to input selection: Nakahara, Neural Comput 14, 819-44 (2002)
- reward-modulated STDP: Izhikevich, Cereb Cortex 17, 2443-52 (2007); Florian, Neural Comput 19(6), 1468-502 (2007); Farries & Fairhall, J Neurophysiol 98, 3648-65 (2007); ...
- RL models learn a partitioning of the input space: e.g. McCallum, PhD thesis, Rochester, NY, USA (1996)

18 Discussion:
- two-layer SARSA RL performs gradient descent on the value estimation error
- the approximation with winner-take-all leads to a local rule with δ-feedback
- the network learns only action-relevant features
- non-negative coding aids feature extraction
- this provides a link between unsupervised and reinforcement learning
- a demonstration with more realistic data is still needed
Sponsors: Bernstein Focus Neurotechnology, BMBF grant 01GQ0840; EU project 231722 "IM-CLeVeR", call FP7-ICT-2007-3; Frankfurt Institute for Advanced Studies, FIAS


21 Sponsors: Bernstein Focus Neurotechnology, BMBF grant 01GQ0840; EU project 231722 "IM-CLeVeR", call FP7-ICT-2007-3; Frankfurt Institute for Advanced Studies, FIAS. Thank you...

