Jochen Triesch, UC San Diego
Part 3: Hebbian Learning and the Development of Maps
Outline:
- kinds of plasticity
- Hebbian learning rules
- weight normalization
- intrinsic plasticity
- spike-timing dependent plasticity
- developmental models based on Hebbian learning
- self-organizing maps
A Taxonomy of Learning Settings
- Unsupervised
- Self-supervised
- Reinforcement
- Imitation
- Instruction
- Supervised
Listed in order of increasing amount of “help” from the environment.
Unsupervised Hebbian Learning
Hebb’s Idea (1949): If firing of neuron A contributes to firing of neuron B, strengthen the connection from A to B.
Possible uses (after Hertz, Krogh, Palmer):
- Familiarity detection
- Principal Component Analysis
- Clustering
- Encoding
(Pictured: Donald Hebb)
Network Self-organization
Three ingredients for self-organization:
- positive feedback loops (self-amplification)
- limited resources leading to competition between elements
- cooperation between some elements possible
Applied to Hebbian learning:
- correlated activity leads to weight growth (Hebb)
- weight growth leads to more correlated activity
- weight growth is limited due to competition
(Diagram: feedback loop between connection weights and activity patterns)
Long-term potentiation (LTP) and long-term depression (LTD)
- observed in neocortex, cerebellum, hippocampus, …
- requires paired pre- and postsynaptic activity
Local Learning
Biological plausibility of learning rules: we want a “local” learning rule. The weight change should be computed only from information that is locally available at the synapse:
- pre-synaptic activity
- post-synaptic activity
- strength of this synapse
- …
But not:
- any other pre-synaptic activity
- any other weight
- …
Also, the weight change should depend only on information available at that time. Knowledge of the detailed history of pre- and post-synaptic activity is not plausible (locality in the time domain).
These requirements are a problem for many learning rules derived from information-theoretic ideas.
Single linear unit
- inputs are drawn from some probability distribution
- simple Hebbian learning: $\Delta w_i = \eta V \xi_i$, where $V = \sum_j w_j \xi_j$ is the output, $\xi_i$ are the inputs, and η is the learning rate
- the simple Hebb rule moves the weight vector in the direction of the current input
- frequent inputs have a bigger impact on the resulting weight vector: familiarity
- Problem: weights can grow without bound, so we need competition …
Called correlation-based learning, because the average weight change is proportional to the correlation between pre- and post-synaptic activity: $\langle \Delta w_i \rangle = \eta \langle V \xi_i \rangle$
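The unbounded growth under the simple Hebb rule can be seen in a few lines of NumPy. This is a minimal sketch, assuming the standard form of the rule (weight change = η · V · input, with V the linear output); the input distribution, the mixing matrix, and the names `hebb_step`, `xi`, and `eta` are illustrative assumptions, not taken from the slides.

```python
import numpy as np

def hebb_step(w, xi, eta=0.01):
    """One simple Hebb update for a linear unit with output V = w . xi."""
    V = w @ xi
    return w + eta * V * xi

rng = np.random.default_rng(0)
# Hypothetical zero-mean 2-D inputs, strongly correlated along (1, 1).
mixing = np.array([[1.0, 0.9], [0.9, 1.0]])
inputs = rng.normal(size=(500, 2)) @ mixing

w = np.array([0.1, 0.0])
norms = [np.linalg.norm(w)]
for xi in inputs:
    w = hebb_step(w, xi)
    norms.append(np.linalg.norm(w))

# The direction settles along the dominant input correlation,
# while the length of the weight vector keeps growing.
direction = w / np.linalg.norm(w)
alignment = abs(direction @ np.array([1.0, 1.0]) / np.sqrt(2))
print(f"|w|: {norms[0]:.2f} -> {norms[-1]:.3g}, alignment with (1,1): {alignment:.3f}")
```

The weight direction becomes informative (it follows the most frequent input direction), but nothing in the rule stops the norm from growing, which is the motivation for the competition and normalization schemes.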
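Reminder: Correlation and Covariance
For an input vector $\boldsymbol{\xi}$, with $\langle \cdot \rangle$ denoting the average over the stimulus ensemble:
- correlation: $Q_{ij} = \langle \xi_i \xi_j \rangle$
- covariance: $C_{ij} = \langle (\xi_i - \mu_i)(\xi_j - \mu_j) \rangle = Q_{ij} - \mu_i \mu_j$
- mean: $\mu_i = \langle \xi_i \rangle$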
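As a quick numerical reminder, the sketch below computes the mean, the correlation matrix, and the covariance matrix of a sample of inputs and checks how they are related. The symbols `Q`, `C`, `mean`, and the Gaussian test data are assumptions chosen for illustration; ensemble averages are approximated by sample averages with 1/N normalization.

```python
import numpy as np

rng = np.random.default_rng(1)
# Hypothetical inputs with non-zero mean: 5000 samples of a 2-D vector.
xi = rng.normal(loc=[2.0, -1.0], scale=[1.0, 0.5], size=(5000, 2))

mean = xi.mean(axis=0)                   # <xi>
Q = xi.T @ xi / len(xi)                  # correlation matrix <xi xi^T>
centered = xi - mean
C = centered.T @ centered / len(xi)      # covariance matrix

# Correlation and covariance differ only by the outer product of the mean.
identity_holds = np.allclose(Q, C + np.outer(mean, mean))
print("Q == C + mean mean^T:", identity_holds)
```

For zero-mean data the two matrices coincide, which is why the correlation and covariance rules behave identically in that case.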
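Continuous time formulation
- $\tau_w \, d\mathbf{w}/dt = V \boldsymbol{\xi}$, with output $V = \mathbf{w} \cdot \boldsymbol{\xi}$
- averaging across the stimulus ensemble: $\tau_w \, d\mathbf{w}/dt = \langle V \boldsymbol{\xi} \rangle = Q \mathbf{w}$, where $Q = \langle \boldsymbol{\xi} \boldsymbol{\xi}^T \rangle$ is the input correlation matrix
- another good reason to call this a correlation-based rule!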
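Covariance rules
- The simple Hebbian rule only allows for weight growth (pre- and post-synaptic firing rates are non-negative numbers): no account of LTD.
- Covariance rules are one way of fixing this, e.g. by introducing a threshold $\boldsymbol{\theta}$ on the input: $\Delta \mathbf{w} = \eta V (\boldsymbol{\xi} - \boldsymbol{\theta})$
- Averaging the RHS over the stimulus ensemble and using the mean input as the threshold, $\boldsymbol{\theta} = \langle \boldsymbol{\xi} \rangle$, gives $\langle \Delta \mathbf{w} \rangle = \eta C \mathbf{w}$, where $C$ is the input covariance matrix, justifying the name “covariance” rule!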
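The averaging step can be checked numerically. The sketch below assumes one common form of the covariance rule, with the threshold applied to the input (weight change proportional to V · (input − threshold)) and the threshold set to the mean input; the data, the weight vector, and all variable names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(2)
# Hypothetical non-negative "firing rates", made correlated by mixing.
xi = rng.uniform(0.0, 1.0, size=(4000, 2))
xi[:, 1] = 0.5 * xi[:, 0] + 0.5 * xi[:, 1]

w = np.array([0.3, 0.7])
theta = xi.mean(axis=0)                  # threshold = mean input

V = xi @ w                               # linear output for every stimulus
dw_samples = V[:, None] * (xi - theta)   # per-stimulus update (eta = 1)
avg_dw = dw_samples.mean(axis=0)         # average over the stimulus ensemble

C = np.cov(xi, rowvar=False, bias=True)  # input covariance matrix (1/N)
print("average update:", avg_dw)
print("C @ w         :", C @ w)
print("some updates are negative (LTD-like):", bool((dw_samples < 0).any()))
```

Unlike the simple Hebb rule, individual updates can be negative even though all rates are non-negative, and the ensemble-averaged update equals the covariance matrix applied to the weight vector.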
Illustration of correlation and covariance rules:
A. either rule on zero-mean data
B. correlation rule for non-zero mean
C. covariance rule for non-zero mean
Note: the weight vector will grow without bound in both cases.
The need to limit weight growth
Simple Hebb and covariance rules are unstable: they lead to unbounded weight growth.
- Remedy 1: weight clipping: don’t let weights grow above/below certain limits $w_{max}$/$w_{min}$
- Remedy 2: some form of weight normalization
  - Multiplicative: subtract something from each weight $w_i$ that is proportional to that weight $w_i$
  - Subtractive: subtract something from each weight $w_i$ that is the same for all weights. This leads to strong competition between weights, typically resulting in all weights going to $w_{max}$ or $w_{min}$.
Weight normalization
Idea: force the weight vector to lie on a “constraint surface”.
Two most popular schemes:
- sum of weights equals one, $\sum_i w_i = 1$ (all weights assumed positive)
- sum of squared weights equals one, $\sum_i w_i^2 = 1$
Subtractive vs. Multiplicative
Different ways of going back to the constraint surface.
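The two ways of returning to the constraint surface can be sketched as follows, for the constraint that the weights sum to one. This is an illustrative sketch, not the slides' own formulation: the function names, the example weights, and the clipping limits are assumptions.

```python
import numpy as np

def multiplicative_step(w, dw):
    """Apply the update, then rescale all weights so they sum to one again.
    Each weight loses an amount proportional to its own size."""
    w_new = w + dw
    return w_new / w_new.sum()

def subtractive_step(w, dw, w_min=0.0, w_max=1.0):
    """Apply the update, then subtract the same amount from every weight.
    Clipping keeps weights inside [w_min, w_max]; once clipping is active
    the sum is no longer exactly conserved."""
    w_new = w + dw - dw.mean()
    return np.clip(w_new, w_min, w_max)

w = np.array([0.2, 0.3, 0.5])       # starts on the constraint surface
dw = np.array([0.03, 0.01, 0.02])   # hypothetical Hebbian update

w_mult = multiplicative_step(w, dw)
w_sub = subtractive_step(w, dw)
print("multiplicative:", w_mult, "sum =", w_mult.sum())
print("subtractive   :", w_sub, "sum =", w_sub.sum())
```

In the subtractive case, a weight with a below-average update shrinks regardless of its current size, which is what drives the strong competition and the eventual saturation at the clipping limits.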
Activity Dependent Synaptic Normalization
- Idea: a neuron may try to maintain a certain average firing rate (homeostatic plasticity)
- Can be combined with both multiplicative and subtractive constraints
Turrigiano and Nelson, Nature Rev Neuro. 5: (2004)
Observation:
- reducing activity leads to changes causing increased spontaneous activity
- increasing activity leads to changes causing reduced spontaneous activity
Turrigiano and Nelson, Nature Rev Neuro. 5: (2004)
Observation: synaptic strengths are scaled multiplicatively (measured via mEPSCs: miniature excitatory postsynaptic currents, each caused by the release of a single transmitter vesicle).
Turrigiano and Nelson, Nature Rev Neuro. 5: (2004)
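A toy version of multiplicative synaptic scaling toward a target firing rate might look like the sketch below. Everything here is an assumption made for illustration: the linear rate model, the specific update formula in `scale_weights`, the gain `gamma`, and the numbers are not taken from Turrigiano and Nelson.

```python
import numpy as np

def scale_weights(w, avg_rate, target_rate, gamma=0.1):
    """Multiply all weights by one common factor that nudges the
    average rate toward the target (hypothetical update rule)."""
    factor = 1.0 + gamma * (target_rate - avg_rate) / target_rate
    return w * factor

x = np.array([1.0, 2.0, 3.0])   # fixed hypothetical input rates
w = np.array([2.0, 4.0, 6.0])   # initial weights, in ratio 1 : 2 : 3
target = 5.0

for _ in range(300):
    rate = w @ x                # linear rate model (assumption)
    w = scale_weights(w, rate, target)

final_rate = w @ x
print("final rate:", final_rate)
print("weight ratios:", w / w[0])
```

Because every synapse is multiplied by the same factor, the rate returns to its target while the relative strengths of the synapses are preserved.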
Observation: similar changes to inhibitory synapses, consistent with the homeostasis idea.
Turrigiano and Nelson, Nature Rev Neuro. 5: (2004)
Applications of Hebbian Learning: Ocular Dominance
- fixed lateral weights
- Hebbian learning of feedforward weights
- the exact form of weight competition is very important!
- Such models can qualitatively predict the effects of blocking input from one eye, etc.
Oja’s and Yuille’s rules
- idea: subtract a term proportional to $V^2$ to limit weight growth
- special form of multiplicative normalization
- leads to extraction of the first principal component
- Oja: $\Delta w_i = \eta V (\xi_i - V w_i)$
- Yuille: $\Delta w_i = \eta (V \xi_i - |\mathbf{w}|^2 w_i)$ (“non-local”)
- some more…
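Oja's rule is easy to try out numerically. The sketch below assumes the standard textbook form of the rule (weight change = η · V · (input − V · w)) for a single linear unit; the data distribution, learning rate, and variable names are illustrative assumptions. It compares the learned weight vector with the leading eigenvector of the sample correlation matrix.

```python
import numpy as np

def oja_step(w, xi, eta=0.005):
    """One Oja update: Hebbian term V*xi minus the decay term V^2 * w."""
    V = w @ xi
    return w + eta * V * (xi - V * w)

rng = np.random.default_rng(3)
# Hypothetical zero-mean, correlated 2-D inputs.
A = np.array([[2.0, 0.0], [1.2, 0.5]])
data = rng.normal(size=(5000, 2)) @ A

w = np.array([0.5, 0.5])
for xi in data:
    w = oja_step(w, xi)

# Leading eigenvector of the sample correlation matrix for comparison.
Q = data.T @ data / len(data)
eigvals, eigvecs = np.linalg.eigh(Q)     # eigenvalues in ascending order
pc1 = eigvecs[:, -1]

w_norm = np.linalg.norm(w)
alignment = abs(w @ pc1) / w_norm
print(f"|w| = {w_norm:.3f}, alignment with first PC = {alignment:.3f}")
```

In contrast to the plain Hebb rule, the weight norm stays close to one without any explicit renormalization step, and the weight direction approaches the first principal component.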
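Hebbian Learning Rules for Principal Component Analysis
For $M$ linear output units $V_i = \sum_j w_{ij} \xi_j$:
- Oja: $\Delta w_{ij} = \eta V_i (\xi_j - \sum_{k=1}^{M} V_k w_{kj})$
- Sanger: $\Delta w_{ij} = \eta V_i (\xi_j - \sum_{k=1}^{i} V_k w_{kj})$
Both rules recover the leading principal directions of the correlation matrix: Sanger’s rule yields the first $M$ eigenvectors themselves, in order, while Oja’s rule yields weight vectors that span the same subspace.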
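Sanger's rule can be sketched in matrix form as below. The sketch assumes the standard textbook form in which output unit i subtracts the reconstruction from units 1 to i only; the 3-D test data, the random rotation, the learning rate, and the names `sanger_step` and `W` are illustrative assumptions.

```python
import numpy as np

def sanger_step(W, xi, eta=0.002):
    """One Sanger update. W has one row of weights per output unit."""
    V = W @ xi
    # Lower-triangular mask: row i only sees outputs of units 0..i.
    lower = np.tril(np.outer(V, V))
    return W + eta * (np.outer(V, xi) - lower @ W)

rng = np.random.default_rng(4)
# Hypothetical zero-mean 3-D inputs with distinct variances along
# three randomly rotated orthogonal axes.
R, _ = np.linalg.qr(rng.normal(size=(3, 3)))
scales = np.array([3.0, 1.5, 0.5])
data = (rng.normal(size=(20000, 3)) * scales) @ R.T

W = rng.normal(scale=0.1, size=(2, 3))   # two output units
for xi in data:
    W = sanger_step(W, xi)

# Compare with the top two eigenvectors of the sample correlation matrix.
Q = data.T @ data / len(data)
eigvals, eigvecs = np.linalg.eigh(Q)     # ascending eigenvalues
pcs = eigvecs[:, ::-1].T                 # rows sorted by descending eigenvalue

alignments = [abs(W[i] @ pcs[i]) / np.linalg.norm(W[i]) for i in range(2)]
print("alignment of unit 0 / unit 1 with PC 1 / PC 2:", alignments)
```

Replacing `np.tril(np.outer(V, V))` with the full outer product would give the symmetric multi-unit variant, which finds the same subspace but not the individual eigenvectors.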
Trace rules, slowness, and temporal coherence
- Alternative goal for coding: find slowly varying sources
- Motivation: pixel brightness changes fast on the retina due to shifts, rotation, changes in lighting, etc., but the identity of the person in front of you stays the same for a prolonged time
- Invariance learning: we want filters that are invariant with respect to such transformations, so the output should vary slowly. Note: these invariances always require non-linear filters (linear ICA won’t help) …
- Trace rule (Foldiak, 1990): force the unit to be slow, plus Hebbian learning
- Models for the development of complex cells in V1: “keep the unit active while the edge moves across the receptive field”
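A minimal sketch of a trace rule is given below, assuming one common formulation: the Hebbian update uses a running average (trace) of the output instead of the instantaneous output. The exact update form, the parameters `alpha` and `delta`, the linear unit, and the two-pattern sequence are assumptions for illustration.

```python
import numpy as np

def trace_rule_run(inputs, w, alpha=0.1, delta=0.5):
    """Run a trace-rule learner over a temporal sequence of inputs.
    delta = 1 removes the memory and gives a purely instantaneous rule."""
    y_trace = 0.0
    for x in inputs:
        y = w @ x                                         # instantaneous output
        y_trace = (1.0 - delta) * y_trace + delta * y     # slow trace of output
        w = w + alpha * y_trace * (x - w)                 # Hebbian step on trace
    return w

# Two successive "views" of the same object: the unit responds to the
# first view only, but the trace is still active when the second arrives.
sequence = np.array([[1.0, 0.0], [0.0, 1.0]])
w0 = np.array([1.0, 0.0])

w_trace = trace_rule_run(sequence, w0, delta=0.5)
w_instant = trace_rule_run(sequence, w0, delta=1.0)
print("with trace   :", w_trace)
print("without trace:", w_instant)
```

With the trace, the unit starts to acquire a weight for the second view even though its instantaneous response to that view is zero, which is the mechanism for linking temporally adjacent inputs.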
Spike timing dependent plasticity
- pre-synaptic spike before post-synaptic spike: potentiation
- pre-synaptic spike after post-synaptic spike: depression
- “predictive learning”
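The timing dependence is often modeled with an exponential learning window. The sketch below uses that common modeling choice as an assumption; the amplitudes, time constants, and the function name `stdp_dw` are illustrative and not taken from the slides.

```python
import numpy as np

def stdp_dw(dt, a_plus=0.01, a_minus=0.012, tau_plus=20.0, tau_minus=20.0):
    """Weight change for a spike pair with dt = t_post - t_pre (in ms).
    dt > 0 (pre before post): potentiation; dt <= 0: depression.
    Treating dt == 0 as depression is an arbitrary choice here."""
    dt = np.asarray(dt, dtype=float)
    return np.where(dt > 0,
                    a_plus * np.exp(-dt / tau_plus),
                    -a_minus * np.exp(dt / tau_minus))

dts = np.array([-40.0, -10.0, 10.0, 40.0])
changes = stdp_dw(dts)
for dt, dw in zip(dts, changes):
    print(f"dt = {dt:+.0f} ms -> dw = {dw:+.5f}")
```

The window is asymmetric in sign and decays with the size of the timing difference, so an input that reliably fires shortly before the post-synaptic spike is strengthened, which is the sense in which the rule supports predictive learning.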
A: simulated place field shift in model (light gray curve to heavy black curve)
B: predictive learning in rat hippocampal place cell: during successive laps, the place field of a neuron shifts to earlier and earlier locations along the track