Slide 1: Studies on Goal-Directed Feature Learning
Cornelius Weber, FIAS
Presented at the workshop "Machine Learning Approaches to Representational Learning and Recognition in Vision", Frankfurt Institute for Advanced Studies (FIAS), November 27-28, 2008
Slide 2: For taking action, we need only the relevant features
(figure: input space with axes x, y, z)
Slide 3: Models' background & overview
- unsupervised feature learning models are enslaved by bottom-up input
- reward-modulated activity leads to input selection: Nakahara, Neur Comp 14, 819-44 (2002)
- reward-modulated STDP: Izhikevich, Cereb Cortex 17, 2443-52 (2007); Florian, Neur Comp 19(6), 1468-502 (2007); Farries & Fairhall, J Neurophysiol 98, 3648-65 (2007); ...
- RL models learn a partitioning of the input space: e.g. McCallum, PhD thesis, Rochester, NY, USA (1996)
- reward-modulated Hebbian learning: Triesch, Neur Comp 19, 885-909 (2007); Roelfsema & van Ooyen, Neur Comp 17, 2176-214 (2005); Franz & Triesch, ICDL (2007) (model 3 presented here extends this to delayed reward)
- feature-pruning models learn all features but forget the irrelevant ones (models 1 & 2 presented here)
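As a schematic of the reward-modulated Hebbian idea (a generic form, not the exact rule of any cited paper), the Hebbian weight change between input x_j and output y_i is simply scaled by the reward r,

    \Delta w_{ij} = \eta \, r \, y_i \, x_j

so that input-output correlations are only learned when they coincide with reward.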
Slide 4: (figure: sensory input, reward, action)
Purely sensory data, in which one feature type is linked to reward; the action is not controlled by the network.
Slide 5: Model 1: obtaining the relevant features
1) build a feature-detecting model
2) learn associations between features
3) register the features' average reward
4) spread value along associative connections
5) check whether actions increase/decrease value
6) remove features where the action doesn't matter
(figure: irrelevant vs. relevant features; a sketch of steps 3-6 follows below)
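A minimal toy sketch of the pruning logic in steps 3-6. The associative matrix, reward statistics and action-conditioned values below are illustrative placeholders, not the paper's exact quantities:

    import numpy as np

    rng = np.random.default_rng(0)
    n = 20                                     # 20 features; only the first block sees reward
    A = np.zeros((n, n))                       # associative weights between features
    A[:10, :10] = rng.random((10, 10))         # features 0-9 associate among themselves
    A[10:, 10:] = rng.random((10, 10))         # features 10-19 form a separate block
    A /= A.sum(axis=1, keepdims=True)          # normalize outgoing associations

    r_avg = np.zeros(n)
    r_avg[:5] = 1.0                            # step 3: average reward per feature

    value = r_avg.copy()                       # step 4: spread value along associations
    for _ in range(20):
        value = 0.5 * r_avg + 0.5 * A @ value

    # step 5: compare feature values conditioned on different actions; here the
    # action-conditioned values are a stand-in for statistics gathered online
    value_by_action = np.stack([1.1 * value, 0.9 * value])
    action_effect = value_by_action.max(axis=0) - value_by_action.min(axis=0)

    relevant = action_effect > 1e-6            # step 6: prune where action doesn't matter
    print("relevant features:", np.where(relevant)[0])   # -> features 0-9

Because value only spreads within the associatively connected block that receives reward, the action has no effect on the value of the disconnected block, and those features are pruned.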
6
Földiák, Biol Cybern 64, 165-70 (1990) → homogeneous activity distr. features thresholds lateral weights (decorrelation) selected features associative weights action effect Weber & Triesch, Proc ICANN, 740-9 (2008); Witkowski, Adap Behav, 15(1), 73-97 (2007); Toussaint, Proc NIPS, 929-36 (2003); Weber, Proc ICANN, 1147-52 (2001) → relevant features indentified
Slide 7: (figure: sensory input, reward; irrelevant vs. relevant subspace)
Motor-sensory data (again, one feature type is linked to reward); the network selects the action (to get reward).
Slide 8: Model 2: removing the irrelevant inputs
1) initialize the feature-detecting model (but continue learning)
2) perform actor-critic RL, taking the features' outputs as the state representation
   - works despite irrelevant features
   - challenge: relevant features will occur at different frequencies
   - nevertheless, features may remain stable
3) observe the critic: it puts negative value on irrelevant features after long training
4) modulate (multiply) learning by the critic's value (a sketch follows below)
(figure: feature frequency vs. critic value)
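A minimal sketch of step 4, assuming linear features for brevity; the sizes, rates and the critic weights v are illustrative placeholders. The ordinary Hebbian feature update is multiplied by the critic's value of each feature, so devalued features are unlearned:

    import numpy as np

    rng = np.random.default_rng(1)
    n_in, n_feat = 16, 8
    W = rng.normal(0.0, 0.1, (n_feat, n_in))   # feature weights (kept learning)
    v = rng.normal(0.0, 0.1, n_feat)           # critic's value per feature (learned by RL)
    eta = 0.01

    x = rng.random(n_in)                        # one input sample
    y = W @ x                                   # feature activations (linear, for brevity)

    # Hebbian-style update, gated by the critic: features with negative value
    # receive a negative learning factor and decay away over training.
    W += eta * v[:, None] * np.outer(y, x)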
Slide 9: (figure: features, critic value, action weights)
Lücke & Bouecke, Proc ICANN, 31-7 (2005)
→ relevant subspace discovered
Slide 10: Model 3: learning only the relevant inputs
1) top level: reinforcement learning model (SARSA)
2) lower level: feature-learning model (SOM / K-means)
3) modulate learning by δ (the temporal-difference error) in both layers
(figure: RL weights, feature weights, input, action)
Slide 11: Model 3: SARSA with SOM-like activation and update
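The slide's equations did not survive extraction. In standard SARSA notation, with the winning SOM unit serving as the discrete state s, updates of the kind the title refers to would take the form (a reconstruction from slides 10 and 12, not necessarily the paper's exact notation):

    \delta_t = r_t + \gamma \, Q(s_{t+1}, a_{t+1}) - Q(s_t, a_t)
    \Delta Q(s_t, a_t) = \eta_Q \, \delta_t
    \Delta \mathbf{w}_{s_t} = \eta_W \, \delta_t \, (\mathbf{x}_t - \mathbf{w}_{s_t})

i.e. the SOM winner's feature weights move toward the input only in proportion to the TD error. A runnable toy sketch of the two coupled layers, with an assumed stand-in environment, sizes and rates:

    import numpy as np

    rng = np.random.default_rng(2)
    n_in, n_units, n_actions = 16, 9, 4
    W_feat = rng.random((n_units, n_in))    # SOM-like feature weights (lower layer)
    Q = np.zeros((n_units, n_actions))      # SARSA action values (top layer)
    eta_w, eta_q, gamma, eps = 0.1, 0.1, 0.9, 0.1

    def state_of(x):
        # SOM-like activation: the best-matching unit is the discrete state
        return int(np.argmin(((W_feat - x) ** 2).sum(axis=1)))

    def policy(s):
        # epsilon-greedy action selection on the learned Q values
        return int(rng.integers(n_actions)) if rng.random() < eps else int(np.argmax(Q[s]))

    x = rng.random(n_in); s = state_of(x); a = policy(s)
    for t in range(1000):
        x_next = rng.random(n_in)                  # toy environment: random inputs,
        r = 1.0 if x_next[0] > 0.9 else 0.0        # sparse reward (stand-in only)
        s_next = state_of(x_next); a_next = policy(s_next)

        delta = r + gamma * Q[s_next, a_next] - Q[s, a]   # SARSA TD error
        Q[s, a] += eta_q * delta                          # top-layer update

        # delta-modulated SOM update: only surprising (rewarded) inputs
        # pull the winner's feature weights toward them
        W_feat[s] += eta_w * delta * (x - W_feat[s])

        x, s, a = x_next, s_next, a_next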
Slide 12: (figure: RL action weights and feature weights; the feature weights cover the relevant subspace)
Slide 13: Learning the 'long bars' data
(figure: RL action weights, feature weights, input, reward; 2 actions, not shown)
Slide 14: Learning the 'short bars' data
(figure: RL action weights, feature weights, input, reward, action)
Input data: bars controlled by the actions 'up', 'down', 'left', 'right'.
Slide 15: Short bars in a 12×12 grid; average number of steps to goal: 11.
Slide 16: Biological interpretation
(figure: cortex → striatum → GPi (output of the basal ganglia); feature/subspace detection feeding action selection)
- no direct feedback from striatum to cortex
- convergent mapping → little receptive field overlap, consistent with subspace discovery
Slide 17: Discussion
- models 1 and 2 learn all features and then identify the relevant ones
- model 1 requires a homogeneous feature distribution
- model 2 can do only subspace detection (no real feature detection)
- model 3 is very simple: SARSA on a SOM with δ-feedback
- model 3 learns only the relevant subspace or features in the first place
- link between unsupervised and reinforcement learning

Sponsors: Bernstein Focus Neurotechnology; EU project 231722 "IM-CLeVeR", call FP7-ICT-2007-3; Frankfurt Institute for Advanced Studies (FIAS)
Slide 18: Relevant features change during learning
Jog et al., Science 286, 1158-61 (1999): in a T-maze decision task (rat), units in the basal ganglia are active at the junction during early task acquisition but not at a later stage.
(figure: early learning vs. late learning)
Slide 19: Evidence for reward/action-modulated learning in the visual system
- Shuler & Bear, "Reward timing in the primary visual cortex", Science 311, 1606-9 (2006)
- Schoups et al., "Practising orientation identification improves orientation coding in V1 neurons", Nature 412, 549-53 (2001)