Slide 1
On Linking Reinforcement Learning with Unsupervised Learning. Cornelius Weber, FIAS. Presented at Honda HRI, Offenbach, 17th March 2009.
Slide 2
For taking action, we need only the relevant features. Figure labels: x, y, z.
Slide 3
Unsupervised learning in cortex; reinforcement learning in the basal ganglia (Doya, 1999). Figure labels: state space, actor.
Slide 4
A 1-layer RL model of the basal ganglia (an actor over a state space: go left? go right?) is too simple to handle complex input.
Slide 5
Complex input (cortex): we need another layer (or layers) to pre-process the complex data, with feature detection feeding action selection (actor over the state space).
Slide 6
Models' background:
- gradient descent methods generalize RL to several layers: Sutton & Barto, RL book (1998); Tesauro (1992; 1995)
- reward-modulated Hebb: Triesch, Neur Comp 19, 885-909 (2007); Roelfsema & van Ooyen, Neur Comp 17, 2176-2214 (2005); Franz & Triesch, ICDL (2007)
- reward-modulated activity leads to input selection: Nakahara, Neur Comp 14, 819-44 (2002)
- reward-modulated STDP: Izhikevich, Cereb Cortex 17, 2443-52 (2007); Florian, Neur Comp 19/6, 1468-502 (2007); Farries & Fairhall, J Neurophysiol 98, 3648-65 (2007); ...
- RL models learn a partitioning of the input space: e.g. McCallum, PhD Thesis, Rochester, NY, USA (1996)
Slide 7
Scenario: sensory input, action, reward. Bars are controlled by the actions 'up', 'down', 'left', 'right'; reward is given if the horizontal bar is at a specific position.
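A minimal sketch of such a bars world, assuming details the transcript does not fix (grid size, target row, one horizontal and one vertical bar); it only stands in for the actual setup:

import numpy as np

class BarsWorld:
    """Toy 'bars' environment (hypothetical reconstruction of the scenario).

    The observation is a grid containing one horizontal and one vertical bar.
    Actions 'up'/'down' move the horizontal bar, 'left'/'right' move the
    vertical bar. Reward is given when the horizontal bar sits at a target row.
    """

    ACTIONS = ["up", "down", "left", "right"]

    def __init__(self, size=12, target_row=0, rng=None):
        self.size = size
        self.target_row = target_row
        self.rng = rng or np.random.default_rng()
        self.reset()

    def reset(self):
        self.h_row = self.rng.integers(self.size)   # row of the horizontal bar
        self.v_col = self.rng.integers(self.size)   # column of the vertical bar
        return self.observe()

    def observe(self):
        grid = np.zeros((self.size, self.size))
        grid[self.h_row, :] = 1.0                   # horizontal bar
        grid[:, self.v_col] = 1.0                   # vertical bar
        return grid.ravel()                         # flat input vector

    def step(self, action):
        if action == "up":
            self.h_row = max(self.h_row - 1, 0)
        elif action == "down":
            self.h_row = min(self.h_row + 1, self.size - 1)
        elif action == "left":
            self.v_col = max(self.v_col - 1, 0)
        elif action == "right":
            self.v_col = min(self.v_col + 1, self.size - 1)
        reward = 1.0 if self.h_row == self.target_row else 0.0
        return self.observe(), reward

Only the row of the horizontal bar determines reward, so it is the only feature the agent actually needs; the vertical bar acts as a distractor.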
Slide 8
A model that learns the relevant features. Top layer: SARSA RL; lower layer: winner-take-all feature learning; both layers modulate learning by the TD error δ. Diagram: input, feature weights, RL weights, action.
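A minimal sketch of such a two-layer learner, assuming a winner-take-all feature layer under a linear SARSA layer and an assumed form for the δ-modulated feature update (layer sizes, learning rates, and variable names are not from the slides):

import numpy as np

rng = np.random.default_rng(0)

n_input, n_features, n_actions = 144, 20, 4              # assumed sizes (12x12 input)
W = np.abs(rng.normal(0, 0.1, (n_features, n_input)))    # feature weights (non-negative)
Q = np.zeros((n_actions, n_features))                    # RL (action) weights
alpha_q, alpha_w, gamma, eps = 0.1, 0.01, 0.9, 0.1       # assumed hyperparameters

def features(x):
    """Winner-take-all feature layer: only the best-matching unit is active."""
    h = W @ x
    s = np.zeros(n_features)
    s[np.argmax(h)] = 1.0
    return s

def choose_action(s):
    """Epsilon-greedy action selection on the state-action values Q s."""
    if rng.random() < eps:
        return int(rng.integers(n_actions))
    return int(np.argmax(Q @ s))

def sarsa_step(x, a, r, x_next, a_next):
    """One SARSA update; the same TD error delta also gates the feature layer."""
    s, s_next = features(x), features(x_next)
    delta = r + gamma * (Q[a_next] @ s_next) - (Q[a] @ s)
    Q[a] += alpha_q * delta * s                # RL weights: delta-modulated
    k = np.argmax(s)                           # winning feature unit
    W[k] += alpha_w * delta * (x - W[k])       # feature weights: delta-modulated Hebbian move toward the input
    np.clip(W, 0.0, None, out=W)               # keep weights non-negative
    return delta

In a training loop one would compute the features of the current input, pick an action with choose_action, step the environment, pick the next action, and then call sarsa_step, so that the same δ updates both weight layers.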
Slide 9
SARSA with a WTA input layer.
Slide 10
Note: non-negativity constraint on the weights. Energy function: the estimation error of the state-action value; the identities used in the derivation are shown on the slide.
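The equations on this slide are not recoverable from the transcript; a hedged reconstruction in standard SARSA notation, reading the energy as the squared estimation error of the state-action value, would be:

% Squared estimation error of the state-action value (symbols assumed):
E = \tfrac{1}{2}\,\bigl(r + \gamma\, Q(s',a') - Q(s,a)\bigr)^{2} = \tfrac{1}{2}\,\delta^{2}

% Gradient descent on E then gives delta-modulated updates for the action
% weights q_{aj} and, via the chain rule, for the feature weights w_{jk}:
\Delta q_{aj} \propto \delta\, s_j, \qquad
\Delta w_{jk} \propto \delta\, \frac{\partial Q(s,a)}{\partial w_{jk}}

Under the non-negativity constraint noted on the slide, the weights would additionally be rectified (clipped at zero) after each gradient step.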
Slide 11
Learning the 'short bars' data. Figure: data samples, learned feature weights, RL action weights, reward and action signals.
Slide 12
Short bars in a 12x12 grid; average number of steps to goal: 11.
Slide 13
Learning the 'long bars' data. Figure: data, input, learned feature weights, RL action weights, reward; 2 actions (not shown).
Slide 14
Comparison of learned features under three settings: WTA with non-negative weights, SoftMax with non-negative weights, SoftMax with no weight constraints.
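For reference, a minimal sketch of the two activation rules compared on this slide (the temperature parameter beta is an assumption):

import numpy as np

def wta(h):
    """Winner-take-all: only the unit with the largest input is active."""
    s = np.zeros_like(h)
    s[np.argmax(h)] = 1.0
    return s

def softmax(h, beta=1.0):
    """SoftMax: graded activation spread over all units."""
    e = np.exp(beta * (h - np.max(h)))   # subtract the max for numerical stability
    return e / e.sum()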
Slide 15
Discussion:
- simple model: SARSA on a winner-take-all network with δ-feedback
- learns only the features that are relevant for the action strategy
- theory behind it: derived from (approximate) state-action value estimation
- non-negative coding aids feature extraction
- a link between unsupervised and reinforcement learning
- demonstration with more realistic data is needed
Slide 18
Sponsors: Bernstein Focus Neurotechnology, BMBF grant 01GQ0840; EU project 231722 “IM-CLeVeR”, call FP7-ICT-2007-3; Frankfurt Institute for Advanced Studies, FIAS.
Thank you...