
1 Studies on Goal-Directed Feature Learning Cornelius Weber, FIAS presented at: “Machine Learning Approaches to Representational Learning and Recognition in Vision” Workshop at the Frankfurt Institute for Advanced Studies (FIAS), November 27-28, 2008

2 for taking action, we need only the relevant features
(figure: variables x, y, z)

3 models’ background & overview:
- unsupervised feature learning models are enslaved by their bottom-up input
- reward-modulated activity leads to input selection: Nakahara, Neur Comp 14, 819-44 (2002)
- reward-modulated STDP: Izhikevich, Cereb Cortex 17, 2443-52 (2007); Florian, Neur Comp 19(6), 1468-502 (2007); Farries & Fairhall, J Neurophysiol 98, 3648-65 (2007); ...
- RL models learn a partitioning of the input space: e.g. McCallum, PhD thesis, Rochester, NY, USA (1996)
- reward-modulated Hebbian learning: Triesch, Neur Comp 19, 885-909 (2007); Roelfsema & van Ooyen, Neur Comp 17, 2176-214 (2005); Franz & Triesch, ICDL (2007) (model 3 presented here extends this to delayed reward)
- feature-pruning models learn all features but forget the irrelevant ones (models 1 & 2 presented here)

4 purely sensory data, in which one feature type is linked to reward; the action is not controlled by the network
(figure: sensory input, reward, action)

5 model 1: obtaining the relevant features (see the sketch below)
1) build a feature-detecting model
2) learn associations between features
3) register the features’ average reward
4) spread value along the associative connections
5) check whether actions increase or decrease value
6) remove features where the action doesn’t matter
(figure: irrelevant vs. relevant features)
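
A minimal sketch of steps 3-6, assuming feature activations F (features × time), an associative matrix A, and recorded rewards and actions; the array names, the value-iteration form, and the action-sensitivity test are illustrative assumptions, not the original model’s exact equations:

```python
import numpy as np

def relevant_features(F, A, rewards, actions, n_actions,
                      gamma=0.9, n_iter=50, threshold=0.05):
    """Sketch of model 1, steps 3-6: average reward per feature, value
    spreading along associative connections, and pruning of features
    whose outcome value does not depend on the action."""
    K, T = F.shape
    active = F > 0

    # step 3: average reward observed while each feature is active
    r_mean = (active * rewards).sum(axis=1) / np.maximum(active.sum(axis=1), 1)

    # step 4: spread value along associative connections, v <- r + gamma * A v
    v = r_mean.copy()
    for _ in range(n_iter):
        v = r_mean + gamma * A @ v

    # step 5: value of the state reached when action a is taken while k is active
    state_value = v @ F                    # value of the feature state at each t
    q = np.full((K, n_actions), np.nan)
    for k in range(K):
        for a in range(n_actions):
            sel = active[k, :-1] & (actions[:-1] == a)
            if sel.any():
                q[k, a] = state_value[1:][sel].mean()

    # step 6: keep a feature only if its outcome value varies with the action
    spread = np.nanmax(q, axis=1) - np.nanmin(q, axis=1)
    return np.flatnonzero(spread > threshold)
```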

6 Földiák, Biol Cybern 64, 165-70 (1990) → homogeneous activity distribution
(figure: features, thresholds, lateral weights (decorrelation), selected features, associative weights, action effect)
Weber & Triesch, Proc ICANN, 740-9 (2008); Witkowski, Adapt Behav 15(1), 73-97 (2007); Toussaint, Proc NIPS, 929-36 (2003); Weber, Proc ICANN, 1147-52 (2001)
→ relevant features identified (a sketch of the Földiák-style layer follows below)
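
A compact sketch of a Földiák (1990)-style layer as used in step 1: Hebbian feedforward weights, anti-Hebbian lateral weights for decorrelation, and adaptive thresholds that push every unit toward the same target activity (the homogeneous activity distribution above). The learning rates and the settling scheme are illustrative:

```python
import numpy as np

class FoldiakLayer:
    """Feature layer after Foldiak, Biol Cybern 64 (1990): feedforward
    Hebbian weights W, decorrelating anti-Hebbian lateral weights V,
    and thresholds t adapted toward a target activity p per unit."""
    def __init__(self, n_in, n_units, p=0.1, seed=0):
        rng = np.random.default_rng(seed)
        self.W = rng.normal(scale=0.1, size=(n_units, n_in))
        self.V = np.zeros((n_units, n_units))    # lateral, kept inhibitory
        self.t = np.zeros(n_units)
        self.p = p

    def activate(self, x, n_settle=20):
        y = np.zeros_like(self.t)
        for _ in range(n_settle):                # settle the recurrent circuit
            y = (self.W @ x + self.V @ y - self.t > 0).astype(float)
        return y

    def learn(self, x, y, alpha=0.02, beta=0.02, gamma=0.02):
        self.W += alpha * y[:, None] * (x[None, :] - self.W)   # Hebbian
        self.V -= beta * (np.outer(y, y) - self.p ** 2)        # anti-Hebbian
        np.fill_diagonal(self.V, 0.0)
        self.V = np.minimum(self.V, 0.0)                       # inhibition only
        self.t += gamma * (y - self.p)    # drives homogeneous activity
```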

7 motor-sensory data (again, one feature type is linked to reward); the network selects the action (to get reward)
(figure: sensory input, reward; irrelevant subspace vs. relevant subspace)

8 model 2: removing the irrelevant inputs (see the sketch below)
1) initialize a feature-detecting model (but continue learning)
2) perform actor-critic RL, taking the features’ outputs as the state representation
   - works despite irrelevant features
   - challenge: relevant features will occur at different frequencies
   - nevertheless, features may remain stable
3) observe the critic: after long training, it puts negative value on the irrelevant features
4) modulate (multiply) the feature learning by the critic’s value
(figure: feature frequency vs. value)
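
A sketch of one learning step of model 2, assuming linear features F as the state representation, a linear critic, and a per-action actor; the last line multiplies each feature’s Hebbian update by the critic’s value for it, so negatively valued (irrelevant) features are unlearned. All names and rates are assumptions:

```python
import numpy as np

def model2_update(F, w_critic, W_actor, x, a, r, x_next,
                  gamma=0.95, lr_rl=0.05, lr_f=0.01):
    """One step of model 2 (sketch): actor-critic RL on the feature outputs,
    with feature learning multiplied by the critic's value per feature."""
    y, y_next = F @ x, F @ x_next                  # features as state repr.
    delta = r + gamma * (w_critic @ y_next) - w_critic @ y   # TD error

    w_critic += lr_rl * delta * y                  # critic update
    W_actor[a] += lr_rl * delta * y                # actor update, chosen action

    # step 4: value-modulated feature learning; features the critic values
    # negatively (step 3: the irrelevant ones) drift away from the data
    F += lr_f * (w_critic * y)[:, None] * (x[None, :] - F)
    return delta
```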

9 Lücke & Bouecke, Proc ICANN, 31-7 (2005)
(figure: features, critic value, action weights)
→ relevant subspace discovered

10 model 3: learning only the relevant inputs (see the sketch below)
1) top level: reinforcement learning model (SARSA)
2) lower level: feature learning model (SOM / K-means)
3) modulate the learning in both layers by δ
(figure: RL weights, feature weights, input, action)
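
A minimal sketch of model 3, assuming a winner-take-all SOM state code and epsilon-greedy SARSA; the class layout, sizes, and rates are illustrative, but the key point matches the slide: the TD error δ multiplies the update in both layers:

```python
import numpy as np

class Model3:
    """SARSA on top of a SOM/K-means feature layer; the TD error delta
    modulates learning in both layers (sketch, sizes and rates assumed)."""
    def __init__(self, n_in, n_feat, n_act, gamma=0.9, eps=0.1,
                 lr_q=0.1, lr_w=0.05, seed=0):
        self.rng = np.random.default_rng(seed)
        self.W = self.rng.random((n_feat, n_in))   # feature (SOM) weights
        self.Q = np.zeros((n_act, n_feat))         # RL action weights
        self.gamma, self.eps, self.lr_q, self.lr_w = gamma, eps, lr_q, lr_w

    def state(self, x):
        """SOM-like activation: index of the best-matching feature unit."""
        return int(np.argmin(((self.W - x) ** 2).sum(axis=1)))

    def act(self, s):
        if self.rng.random() < self.eps:           # epsilon-greedy exploration
            return int(self.rng.integers(self.Q.shape[0]))
        return int(np.argmax(self.Q[:, s]))

    def learn(self, x, s, a, r, s_next, a_next):
        delta = r + self.gamma * self.Q[a_next, s_next] - self.Q[a, s]
        self.Q[a, s] += self.lr_q * delta                 # SARSA update (top)
        self.W[s] += self.lr_w * delta * (x - self.W[s])  # delta-modulated SOM
        return delta
```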

11 model 3: SARSA with SOM-like activation and update
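
The slide’s equations are not preserved in this transcript; a standard formulation consistent with the description above would be (s is the winning feature unit, α and η are learning rates):

```latex
s = \arg\min_k \,\lVert \mathbf{x} - \mathbf{w}_k \rVert          % SOM-like winner state
\delta = r + \gamma\, Q(s', a') - Q(s, a)                         % SARSA TD error
\Delta Q(s, a) = \alpha\, \delta                                  % top-layer (RL) update
\Delta \mathbf{w}_s = \eta\, \delta\, (\mathbf{x} - \mathbf{w}_s) % delta-modulated feature update
```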

12 (figure: RL action weights and feature weights; the feature weights cover the relevant subspace)

13 learning the ‘long bars’ data
(figure: RL action weights, feature weights, input, reward; 2 actions not shown)

14 learning the ‘short bars’ data: bars controlled by the actions ‘up’, ‘down’, ‘left’, ‘right’
(figure: RL action weights, feature weights, input, reward, action)

15 short bars in a 12x12 grid; average number of steps to goal: 11 (a stand-in environment sketch follows below)
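
The exact task specification is not in the transcript; a hypothetical stand-in environment consistent with slides 14-15 (a short bar in a 12x12 image, moved by four actions, rewarded at a goal position) might look like:

```python
import numpy as np

class ShortBarsEnv:
    """Hypothetical 'short bars' task: a short horizontal bar in a 12x12
    image, moved by 'up'/'down'/'left'/'right', rewarded at a goal position.
    The bar length, goal, and reward scheme are assumptions."""
    ACTIONS = ('up', 'down', 'left', 'right')
    MOVES = {'up': (-1, 0), 'down': (1, 0), 'left': (0, -1), 'right': (0, 1)}

    def __init__(self, size=12, bar_len=4, seed=0):
        self.size, self.bar_len = size, bar_len
        self.rng = np.random.default_rng(seed)
        self.goal = (0, 0)
        self.reset()

    def reset(self):
        self.pos = tuple(self.rng.integers(0, self.size, size=2))
        return self.observe()

    def observe(self):
        img = np.zeros((self.size, self.size))
        r, c = self.pos
        img[r, c:min(c + self.bar_len, self.size)] = 1.0   # draw the bar
        return img.ravel()                                  # 144-dim input

    def step(self, action):
        dr, dc = self.MOVES[self.ACTIONS[action]]
        r = min(max(self.pos[0] + dr, 0), self.size - 1)
        c = min(max(self.pos[1] + dc, 0), self.size - 1)
        self.pos = (r, c)
        reward = 1.0 if self.pos == self.goal else 0.0
        return self.observe(), reward, reward > 0.0
```

Paired with the Model3 sketch above, an episode loop alternates env.step with Model3.state, act, and learn in the usual SARSA order.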

16 biological interpretation
(figure: cortex → striatum → GPi (output of the basal ganglia); feature/subspace detection and action selection)
- no direct feedback from striatum to cortex
- convergent mapping → little receptive field overlap, consistent with subspace discovery

17 Discussion
- models 1 and 2 learn all features and identify the relevant ones afterwards
- model 1 requires a homogeneous feature distribution
- model 2 can do only subspace detection (no real feature detection)
- model 3 is very simple: SARSA on a SOM with δ-feedback
- model 3 learns only the relevant subspace or features in the first place
- link between unsupervised and reinforcement learning
Sponsors: Bernstein Focus Neurotechnology; EU project 231722 “IM-CLeVeR”, call FP7-ICT-2007-3; Frankfurt Institute for Advanced Studies (FIAS)

18 relevant features change during learning
Jog et al., Science 286, 1158-61 (1999): T-maze decision task (rat); units in the basal ganglia are active at the junction during early task acquisition but not at a later stage
(figure: early learning vs. late learning)

19 evidence for reward/action-modulated learning in the visual system:
Shuler & Bear, “Reward timing in the primary visual cortex”, Science 311, 1606-9 (2006)
Schoups et al., “Practising orientation identification improves orientation coding in V1 neurons”, Nature 412, 549-53 (2001)

