
1 On Linking Reinforcement Learning with Unsupervised Learning Cornelius Weber, FIAS presented at Honda HRI, Offenbach, 17th March 2009

2 for taking action, we need only the relevant features x, y, z

3 unsupervised learning in cortex; reinforcement learning in basal ganglia; schematic: state space and actor (Doya, 1999)

4 a 1-layer RL model of the BG (actor over a state space; "go left? go right?") is too simple to handle complex input

5 complex input (cortex): we need another layer (or layers) to pre-process complex data; feature detection, then action selection (actor, state space)

6 models' background:
- gradient descent methods generalize RL to several layers: Sutton & Barto, RL book (1998); Tesauro (1992; 1995)
- reward-modulated Hebb: Triesch, Neur Comp 19, 885-909 (2007); Roelfsema & van Ooyen, Neur Comp 17, 2176-214 (2005); Franz & Triesch, ICDL (2007)
- reward-modulated activity leads to input selection: Nakahara, Neur Comp 14, 819-44 (2002)
- reward-modulated STDP: Izhikevich, Cereb Cortex 17, 2443-52 (2007); Florian, Neur Comp 19/6, 1468-502 (2007); Farries & Fairhall, J Neurophysiol 98, 3648-65 (2007); ...
- RL models learn partitioning of input space: e.g. McCallum, PhD Thesis, Rochester, NY, USA (1996)

7 sensory input, reward, action. scenario: bars controlled by the actions 'up', 'down', 'left', 'right'; reward is given if the horizontal bar is at a specific position
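A minimal sketch of such a bars world, assuming a square grid and assuming that 'up'/'down' move the reward-relevant horizontal bar while 'left'/'right' move a vertical distractor bar; the class name, grid size, and action mapping are hypothetical, not taken from the original model.

```python
import numpy as np

class BarsWorld:
    """Hypothetical bars environment: one horizontal bar (relevant) and one
    vertical bar (distractor); reward when the horizontal bar reaches goal_row."""

    def __init__(self, size=12, goal_row=0, rng=None):
        self.size = size
        self.goal_row = goal_row
        self.rng = rng if rng is not None else np.random.default_rng()
        self.reset()

    def reset(self):
        self.h_row = int(self.rng.integers(self.size))  # horizontal bar position
        self.v_col = int(self.rng.integers(self.size))  # vertical bar position
        return self.observe()

    def observe(self):
        img = np.zeros((self.size, self.size))
        img[self.h_row, :] = 1.0   # horizontal bar (determines reward)
        img[:, self.v_col] = 1.0   # vertical bar (irrelevant for reward)
        return img.ravel()

    def step(self, action):
        # assumed mapping: 'up'/'down' move the horizontal bar, 'left'/'right' the vertical one
        if action == 'up':
            self.h_row = max(self.h_row - 1, 0)
        elif action == 'down':
            self.h_row = min(self.h_row + 1, self.size - 1)
        elif action == 'left':
            self.v_col = max(self.v_col - 1, 0)
        elif action == 'right':
            self.v_col = min(self.v_col + 1, self.size - 1)
        reward = 1.0 if self.h_row == self.goal_row else 0.0
        return self.observe(), reward
```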

8 model that learns the relevant features. top layer: SARSA RL; lower layer: winner-take-all feature learning; both layers: learning modulated by δ (diagram: RL weights, feature weights, input, action)

9 SARSA with WTA input layer
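A hedged sketch of what SARSA on top of a winner-take-all input layer could look like; the variable names, learning rates, ε-greedy action selection, and the exact form of the δ-modulated feature update are assumptions, not the author's original code.

```python
import numpy as np

# two weight matrices, both kept non-negative (see slide 10)
n_input, n_features, n_actions = 144, 20, 4        # 12x12 input is an assumption from slide 12
W = np.abs(np.random.randn(n_features, n_input)) * 0.1   # feature weights
Q = np.abs(np.random.randn(n_actions, n_features)) * 0.1 # RL action weights
alpha, gamma, eps = 0.05, 0.9, 0.1

def wta_state(x):
    """Winner-take-all: the state is the index of the best-matching feature."""
    return int(np.argmax(W @ x))

def select_action(s):
    """Epsilon-greedy action selection on the Q-values of the winning feature."""
    if np.random.rand() < eps:
        return np.random.randint(n_actions)
    return int(np.argmax(Q[:, s]))

def sarsa_wta_update(x, s, a, r, s_next, a_next):
    """One SARSA step; delta also gates Hebbian learning of the feature weights."""
    delta = r + gamma * Q[a_next, s_next] - Q[a, s]
    Q[a, s] += alpha * delta                 # RL weights: standard SARSA update
    W[s] += alpha * delta * (x - W[s])       # feature weights: delta-modulated competitive Hebb
    np.clip(W, 0.0, None, out=W)             # keep the non-negativity constraint
    np.clip(Q, 0.0, None, out=Q)
```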

10 note: non-negativity constraint on the weights. Energy function: estimation error of the state-action value (identities used in the derivation)
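A hedged reconstruction of the energy function named on this slide, assuming the standard SARSA estimation error; the identities referred to on the slide are not reproduced here.

```latex
% squared estimation error of the state-action value; its negative gradient
% yields delta-modulated updates for both action weights and feature weights
E = \frac{1}{2}\bigl(r + \gamma\, Q(s',a') - Q(s,a)\bigr)^{2},
\qquad
\delta = r + \gamma\, Q(s',a') - Q(s,a),
\qquad
\Delta w \propto -\frac{\partial E}{\partial w} \propto \delta\,\frac{\partial Q(s,a)}{\partial w}.
```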

11 learning the 'short bars' data (figure: RL action weights, feature weights, data, reward, action)

12 short bars in a 12x12 grid; average # of steps to goal: 11

13 learning the 'long bars' data (figure: RL action weights, feature weights, input, reward; 2 actions not shown)

14 learned feature weights under three conditions (figure): WTA with non-negative weights; SoftMax with non-negative weights; SoftMax with no weight constraints
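A small sketch contrasting the two state codings compared in this figure, hard winner-take-all versus a graded SoftMax over feature activations; the function names and the inverse temperature β are assumptions.

```python
import numpy as np

def wta_code(h):
    """Hard winner-take-all coding: a one-hot vector at the most active feature."""
    s = np.zeros_like(h)
    s[np.argmax(h)] = 1.0
    return s

def softmax_code(h, beta=5.0):
    """Graded SoftMax coding over feature activations (beta is assumed)."""
    e = np.exp(beta * (h - h.max()))   # subtract max for numerical stability
    return e / e.sum()
```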

15 Discussion
- simple model: SARSA on a winner-take-all network with δ-feedback
- learns only the features that are relevant for the action strategy
- theory behind it: (approximate) derivation from the estimation of the state-action value
- non-negative coding aids feature extraction
- a link between unsupervised and reinforcement learning
- demonstration with more realistic data needed
Sponsors: Bernstein Focus Neurotechnology, BMBF grant 01GQ0840; EU project 231722 “IM-CLeVeR”, call FP7-ICT-2007-3; Frankfurt Institute for Advanced Studies, FIAS

16

17

18 Sponsors: Bernstein Focus Neurotechnology, BMBF grant 01GQ0840; EU project 231722 “IM-CLeVeR”, call FP7-ICT-2007-3; Frankfurt Institute for Advanced Studies, FIAS. thank you...

