

1  8. Auto-associative memory and network dynamics
Lecture Notes on Brain and Computation
Byoung-Tak Zhang, Biointelligence Laboratory
School of Computer Science and Engineering
Graduate Programs in Cognitive Science, Brain Science and Bioinformatics
Brain-Mind-Behavior Concentration Program, Seoul National University
E-mail: btzhang@bi.snu.ac.kr
This material is available online at http://bi.snu.ac.kr/
Based on Fundamentals of Computational Neuroscience, T. P. Trappenberg, 2002.

2 (C) 2012 SNU CSE Biointelligence Lab, http://bi.snu.ac.kr
Outline
8.1 Short-term memory and reverberating network activity
8.2 Long-term memory and auto-associators
8.3 Point-attractor networks: the Grossberg-Hopfield model
8.4 The phase diagram and the Grossberg-Hopfield model
8.5 Sparse attractor neural networks
8.6 Chaotic networks: a dynamic systems view
8.7 Biologically more realistic variations of attractor networks

3  8.1 Short-term memory and reverberating network activity
8.1.1 Short-term memory
- Short-term memory is the ability to hold information temporarily.
- The recency effect is the tendency to remember recently presented items.
- At the physiological level, this corresponds to holding the corresponding neural activity over a certain duration.
- Working memory is another form of short-term memory, discussed in Chapter 11.

4  8.1.2 Maintenance of neural activity
Monkey experiment: a monkey was trained to keep its eyes on a central fixation spot until a 'go' signal (a tone) was given. The target location for each trial had to be remembered during the delay period. Recordings were made in the dorsolateral prefrontal cortex.
Fig. 8.1: The neurons shown were sensitive to a particular target direction of 270 degrees.
Fig. 8.1 Maintenance of delay activity in physiological experiments.

5  8.1.3 Recurrences
Delay activity can be used to store information over some period of time. The question is how such activity can be maintained by a neural network. The answer is recurrences:
1. A recurrent node is able to maintain its firing as long as the recurrent pathway is strong enough;
2. There is some delay in the feedback so that the re-entry signal does not fall within the refractory time of the node; and
3. Any leakage in the recurrent pathway is small enough that the node can be fired again.
Fig. 8.2 (A) Schematic illustration of an auto-associative node, distinguished from the associative node of Fig. 7.1A in that it has, in addition, a recurrent feedback connection. (B) An auto-associative network consisting of associative nodes that not only receive external input from other neural layers but also have many recurrent collateral connections between the nodes in the layer.

6  8.2 Long-term memory and auto-associators
- Auto-associative memory: the output of each node is fed back to all of the other nodes in the network.
- A recurrent network model: the recurrent connections are tuned by learning.
- The back-projections in this associative network enhance the pattern-completion ability.
- The recurrent model is anatomically faithful: it reflects collateral connections within cortical areas and intercortical back-projections.

7  8.2.1 The hippocampus and episodic memory
- The hippocampus has been associated with the form of long-term memory (LTM) called episodic memory: the storage of events.
- Area CA3 has well-developed axon collaterals connecting its neurons to each other, so it can act as a recurrent auto-associative network supporting the acquisition of episodic long-term memory.
- Lesion experiments: hippocampal damage produces amnesia, an inability to form new long-term memories.
- The hippocampus has also been characterized as an intermediate-term memory.

8  8.2.2 Learning and retrieval phases
A difficulty occurs when combining associative Hebbian mechanisms with recurrences in the network: a training phase and a retrieval phase must be distinguished. Switching between the learning and retrieval phases could be accomplished in the brain by:
- Mossy fibres, which command the firing of specific CA3 neurons
- Chemical agents such as acetylcholine (ACh) and noradrenaline, acting as neuromodulators

9  8.3 Point-attractor networks: the Grossberg-Hopfield model
- The Grossberg-Hopfield model was central to the understanding and popularization of recurrent auto-associative memory, in particular its dynamic properties.
- Attractor states are states that a dynamic system reaches asymptotically.
- The brain does not rely on fully settling into an attractor state; the rapid path towards an attractor state is sufficient to use such networks as associative memory devices.

10  8.3.1 Network dynamics
Considering the time domain, we take a network of sigma nodes governed by leaky-integrator dynamics. The change of the internal state of node i in the continuous system is

    τ dh_i(t)/dt = −h_i(t) + Σ_j w_ij r_j(t) + I_i^ext(t),    r_i(t) = g(h_i(t))    (8.1, 8.2)

A discrete version is obtained by replacing the derivative with a finite difference; setting Δt = τ = 1 gives

    h_i(t+1) = Σ_j w_ij r_j(t) + I_i^ext(t)    (8.3)

The stationary states, dh_i/dt = 0, of the continuous system (eqn 8.1) are also the fixpoints of the discrete system (eqn 8.3); only the transient response dynamics of the two models differ.
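The discrete update rule above can be sketched in a few lines of Python. This is a minimal illustration of the discretized leaky-integrator step, with function and variable names of my own choosing:

```python
import numpy as np

def discrete_update(h, w, ext, g, dt=1.0, tau=1.0):
    """One step of the discretized leaky-integrator dynamics:
    h(t+dt) = (1 - dt/tau) * h(t) + (dt/tau) * (w @ g(h(t)) + ext).
    With dt = tau the leak term vanishes, recovering eqn 8.3."""
    r = g(h)                                # firing rates from internal states
    return (1 - dt / tau) * h + (dt / tau) * (w @ r + ext)
```

A state h* with h* = w @ g(h*) + ext is a fixpoint of this update for any choice of dt and tau, mirroring the remark that the stationary states of the two formulations coincide.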

11  8.3.2 Hebbian auto-correlation learning
The weights are not random but are self-organized by Hebbian learning; this permits an analysis of the capacity and recall abilities. Consider a binary system with rates r_i ∈ {0, 1}, or equivalently the variables s_i = 2r_i − 1 ∈ {−1, 1} (8.4). Train the system on a set of random patterns ξ_i^μ ∈ {−1, 1}, where the index μ labels the pattern, μ = 1, …, P. The Hebbian rule is

    w_ij = (1/N) Σ_μ ξ_i^μ ξ_j^μ    (8.5)

and the node states are updated with a threshold gain function of the net input h_i,

    s_i(t+1) = sign(h_i(t)) = sign(Σ_j w_ij s_j(t))    (8.6)
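A minimal sketch of the Hebbian auto-correlation rule of eqn 8.5, assuming ±1 patterns stored row-wise; the function name is illustrative:

```python
import numpy as np

def hebbian_weights(patterns):
    """Hebbian auto-correlation rule: w_ij = (1/N) * sum_mu xi_i^mu * xi_j^mu,
    with self-connections removed. `patterns` has shape (P, N), entries in {-1, +1}."""
    P, N = patterns.shape
    w = patterns.T @ patterns / N   # sum of outer products over all patterns
    np.fill_diagonal(w, 0)          # no self-coupling
    return w
```

The resulting matrix is symmetric with zero diagonal, which matters later for the Lyapunov-function arguments in Section 8.6.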

12  8.3.3 Signal-to-noise analysis
How does the network behave on its own? Set the external input to zero and initialize the network with pattern μ = 1 for demonstration. The net input to node i is then

    h_i = Σ_j w_ij ξ_j^1 = (1/N) Σ_j Σ_μ ξ_i^μ ξ_j^μ ξ_j^1
        = ξ_i^1 + (1/N) Σ_j Σ_{μ≠1} ξ_i^μ ξ_j^μ ξ_j^1    (8.7–8.10)

The first term is the signal; the second term is the cross-talk term, which acts as 'noise'.

13  8.3.4 One pattern
With only one imprinted pattern there is no cross-talk term, and the dynamics of this network are easy to analyze. If we do not start the network with the trained pattern itself but with a noisy version of it, the network retrieves the learned pattern, provided the initial state is only a moderately noisy version of the trained pattern (8.11). The trained pattern is thus a point attractor.
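The point-attractor behavior described above can be demonstrated with a small simulation: one imprinted pattern, a noisy start, and a few threshold updates. This is a sketch with illustrative sizes, not the book's simulation:

```python
import numpy as np

rng = np.random.default_rng(0)
N = 200
xi = rng.choice([-1, 1], size=N)           # single imprinted pattern
w = np.outer(xi, xi) / N                   # Hebbian weights for one pattern
np.fill_diagonal(w, 0)

s = xi.copy()
flip = rng.choice(N, size=20, replace=False)
s[flip] *= -1                              # moderately noisy version (10% flipped)

for _ in range(5):                         # synchronous threshold updates
    s = np.sign(w @ s)
```

After the updates, s coincides with the trained pattern xi: the network has completed the pattern from the noisy cue.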

14  8.3.5 Many patterns
The cross-talk term is a random variable whose variance we can estimate. Each term in the cross-talk sum is a random number taking the values 1 or −1, so the cross-talk term is approximately Gaussian with zero mean and variance

    σ² = α = P/N    (8.12, 8.14)

where α is the load parameter. The probability that the cross-talk changes the state of a node is given by the error function,

    P_error = ½ [1 − erf(1/√(2α))]    (8.13)

Fig. 8.3 The probability distribution of the cross-talk term is well approximated by a Gaussian with zero mean and variance σ². The shaded area marked P_error is the probability that the cross-talk term changes the state of the node. The table lists examples of this probability for different values of the load parameter α.
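Under the Gaussian approximation, the flip probability can be evaluated directly. This sketch assumes the standard signal-to-noise form P_error = ½[1 − erf(1/√(2α))]; the specific values tabulated in Fig. 8.3 may use slightly different conventions:

```python
import math

def p_error(alpha):
    """Probability that the Gaussian cross-talk term (variance alpha = P/N)
    flips a node's state: P_error = 0.5 * (1 - erf(1 / sqrt(2*alpha)))."""
    return 0.5 * (1.0 - math.erf(1.0 / math.sqrt(2.0 * alpha)))

for alpha in (0.05, 0.1, 0.138, 0.2):
    print(f"alpha = {alpha:5.3f}  ->  P_error = {p_error(alpha):.4f}")
```

The flip probability grows rapidly with the load parameter, which is why recall breaks down beyond a critical α.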

15  8.3.6 Point attractors
The pattern-completion ability of the associative nodes can be quantified with a distance measure between a trained pattern and the network state, with a = ξ^1 and b = s (8.15).
Fig. 8.4 Simulation results for an auto-associative network with quasi-continuous leaky-integrator dynamics, N = 1000 nodes, and a time constant of τ = 10 ms. The time step was set to Δt = 0.01 ms. (A) Distance between a trained pattern and the state of the network after the network was initialized with a noisy version of one of the 100 trained patterns. (B) Dependence of the distance at t = 1 ms on the initial distance at time t = 0 ms.

16  8.4 The phase diagram and the Grossberg-Hopfield model
8.4.1 The load capacity α_c
With sparse connectivity, let C be the number of connections per node; the load capacity is then defined as α_c = N_pat^max / C (8.16).
Fig. 8.5 Simulation results for an auto-associative network equivalent to the network used in Fig. 8.4 but with different numbers of patterns N_pat in the training set. The distance of the network state from the first training pattern at time t = 1 ms is shown. The network was initialized with a version of the first trained pattern with 1% of the components reversed.

17  8.4.2 The spin model analogy
Spin models, developed in statistical physics, correspond closely to recurrent networks; central to the correspondence is the binary state of each element. Noise is characterized by a temperature T (thermal noise). There is a sharp transition between
- a paramagnetic phase, with no dominant direction of the elementary magnets, and
- a ferromagnetic phase, with a dominating direction of the elementary magnets.
Such a transition is called a phase transition.

18  8.4.3 Frustration and spin glasses
The situation in auto-associative networks is further complicated because the force between the nodes is not consistently positive: the Hebbian rule produces both positive and negative weights. This leads to complicated spin states of the system, known as frustrated systems or spin glasses. Such systems are analyzed with mean-field theory and the replica method.

19  8.4.4 The phase diagram
The phase diagram of the Grossberg-Hopfield model characterizes its behavior under noise in the network, described by a probabilistic version of the update rule (eqns 8.17 and 8.18, not transcribed here).
Fig. 8.6 Phase diagram of the attractor network trained on binary patterns with Hebbian imprinting. The abscissa represents the values of the load parameter α = N_pat/C, where N_pat is the number of trained patterns and C is the number of connections per node. The ordinate represents the amount of noise in the system. The shaded region is where point attractors proportional to the trained patterns exist. The network in this region can therefore function as an associative memory.

20  8.4.5 Spurious states
Noise can actually help memory performance. Consider initializing the network with a pattern that has the sign of the majority of the first three trained patterns,

    ξ_i^mix = sign(ξ_i^1 + ξ_i^2 + ξ_i^3)    (8.19)

If the components ξ_i^1, ξ_i^2, ξ_i^3 all have the same value, which happens with probability 1/4, then we can pull out this value from the sum in the signal term; if one component, say ξ_i^3, has a different sign, the majority still determines the state (8.20–8.22). On average, updating such a mixture state yields a signal of about half the strength of the signal when updating a trained pattern. The possible component combinations are:

    ξ^1  ξ^2  ξ^3 | ξ^1 + ξ^2 + ξ^3
     1    1    1  |  3
     1    1   −1  |  1
     1   −1    1  |  1
    −1    1    1  |  1
     1   −1   −1  | −1
    −1    1   −1  | −1
    −1   −1    1  | −1
    −1   −1   −1  | −3
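The overlap of such a majority (mixture) state with a single trained pattern can be checked numerically; this sketch verifies that it averages 1/2, consistent with the 1/4 probability argument above:

```python
import numpy as np

rng = np.random.default_rng(1)
N = 10_000
xi = rng.choice([-1, 1], size=(3, N))      # three imprinted random patterns

# Spurious mixture state: the majority sign of the first three patterns.
mix = np.sign(xi.sum(axis=0))              # sum is always odd, so never zero

# The mixture agrees with xi^1 unless the other two patterns both disagree
# (probability 1/4), giving an expected overlap of 3/4 - 1/4 = 1/2.
overlap = (mix * xi[0]).mean()
print(overlap)                              # close to 0.5 for large N
```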

21  8.4.6 The advantage of noise
- Spurious states are attractors under the network dynamics.
- The average strength of the signal for spurious states is less than the signal for a trained pattern, so spurious states under normal conditions are less stable than attractors related to trained patterns.
- An appropriate level of noise can kick the system out of the basin of attraction of a spurious state and into the basin of attraction of another attractor: noise can help to destabilize undesired memory states.
- Statements about the average behavior of the network are a good assumption for large networks, but the behavior of a particular network depends strongly on the specific realization of the training patterns; the phase diagram is therefore specific to the choice of training patterns.

22  8.5 Sparse attractor neural networks
The load capacity of the noiseless Grossberg-Hopfield model with standard Hebbian learning is about α_c ≈ 0.138, but this holds only when the training patterns are uncorrelated. Sensory signals are often correlated (for example, an image of a fish and an image of water), and for correlated patterns the cross-talk term can take high values (8.23). One solution is to modify the training patterns to yield orthogonal patterns, which raises the capacity to α_c = 1 (8.24).

23  8.5.1 Expansion coding
Orthogonalization can be achieved with expansion coding.
Fig. 8.7 Example of expansion coding that can orthogonalize a pattern representation with a single-layer perceptron. The nodes in the perceptron are threshold units, and we have included a bias via a separate node with constant input. The orthogonal output can be fed into a recurrent attractor network where all these inputs are fixpoints of the attractor dynamics.

24  8.5.2 Sparse patterns
Expansion coding suggests that the load capacities of attractor networks can be larger for patterns with sparse representations. With sparseness a (8.25), the storage capacity of attractor networks scales as

    P_max ≈ k C / (a ln(1/a))    (8.26)

where k is a constant, roughly on the order of 0.2–0.3. With sparseness a = 0.1 and 10,000 synapses per node, the number of patterns that can be stored can exceed 20,000.
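The scaling of eqn 8.26 can be evaluated directly. The prefactor k = 0.25 below is an assumption within the stated 0.2–0.3 range, so the resulting pattern count is only an order-of-magnitude estimate:

```python
import math

def sparse_capacity(C, a, k=0.25):
    """Rough estimate of the number of storable sparse patterns,
    P_max ~ k * C / (a * ln(1/a)), with k a constant of order 0.2-0.3."""
    return k * C / (a * math.log(1.0 / a))

print(sparse_capacity(C=10_000, a=0.1))    # on the order of 10^4 patterns
```

Note how the capacity grows as the patterns get sparser (smaller a), which is the main point of the section.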

25  8.5.3 Alternative learning schemes
The cross-talk term can be minimized with different choices of the weight matrix. Using the overlap matrix of the training patterns (8.27), the pseudo-inverse method sets the weights so that the inverse of the overlap matrix orthogonalizes the input patterns (8.28); the storage capacity is then α_c = 1. However, this scheme is not very plausible biologically.

26  8.5.4 The best storage capacity with sparse patterns
There are many possible learning algorithms, and each learning rule can lead to a different weight matrix. For a given training set we can ask what the best achievable load capacity of the network is. Trying out all possible weight matrices is a daunting task, but the maximal storage capacity of an auto-associative network with binary patterns can be derived analytically (8.29). The simplest Hebbian rule comes close to giving this maximum value.

27  8.5.5 Control of sparseness in attractor networks
Training recurrent networks on patterns with sparseness a using the basic Hebbian covariance rule, δw_ij = (r_i − a)(r_j − a), is not enough to ensure that the retrieved state also has sparseness a_ret = a. The possible weight changes and their probabilities for random patterns are:

    r_i  r_j | δw       | P(δw)
    0    0   | a²       | (1−a)²
    0    1   | −a(1−a)  | a(1−a)
    1    0   | −a(1−a)  | a(1−a)
    1    1   | (1−a)²   | a²

From these one can compute the mean and variance of the weight distribution (eqns 8.30–8.38, not transcribed here). A weight matrix with zero-mean Gaussian-distributed components means that the amount of inhibition is equal to the amount of excitation.
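A sketch of the covariance rule for 0/1 patterns, matching the δw table above; the function name and conventions are mine:

```python
import numpy as np

def covariance_weights(patterns, a):
    """Hebbian covariance rule for sparse binary patterns r in {0, 1}:
    w_ij = sum_mu (r_i^mu - a) * (r_j^mu - a), where a is the sparseness
    (mean activity) of the pattern set. Self-connections are removed."""
    centered = patterns - a                # subtract the mean activity
    w = centered.T @ centered              # sum of outer products over patterns
    np.fill_diagonal(w, 0)
    return w
```

For a single pattern, the off-diagonal entries take exactly the four values in the table: a² for two silent nodes, (1−a)² for two active nodes, and −a(1−a) for a mixed pair.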

28  8.5.6 Global inhibition (1)
Activity-dependent global inhibition can be used to control the retrieval sparseness (eqns 8.39–8.42, not transcribed here).
Fig. 8.8 (A) A Gaussian function centered at a value −c. Such a curve describes the distribution of Hebbian weight values trained on random patterns and includes some global inhibition with strength c. The shaded area is given by the Gaussian error function described in Appendix B. (B) Theoretical retrieval sparseness a_ret as a function of global inhibition c, plotted as thin lines for different values of the sparseness of the pattern set a, from a = 0.05 (lower curve) to a = 0.5 (upper curve) in steps of Δa = 0.05. We assumed 40 imprinted patterns with Gaussian-distributed components of the weight matrix. The thick line shows where the retrieval sparseness matches the sparseness of the imprinted patterns (a_ret = a).

29  8.5.6 Global inhibition (2)
Fig. 8.9 (A) The simulated curve for patterns with sparseness a = 0.1. The plateau is due to the attractor dynamics, which were not taken into account in the analysis that led to Fig. 8.8B. The lower curve indicates the average Hamming distance between the imprinted pattern and the network state after updating the network. Correct recalls were indeed achieved for inhibition constants that coincide with the plateau in the retrieval sparseness. (B) Normalized histogram of weight components for 40 patterns trained with the Hebbian covariance rule. (C) Normalized histogram of weight components for 400 patterns trained with the Hebbian covariance rule.

30  8.6 Chaotic networks: a dynamic systems view
In the theory of dynamic systems, auto-associative memories correspond to 'point attractors'. The equations of motion of a recurrent network with continuous dynamics can be written as

    dx/dt = f(x)    (8.43)

The dimensionality of the system, the number of equations, equals the number of nodes in the network. The vector x is the state vector; the space of its possible values is the state space, and the path it traces over time is a trajectory.

31  8.6.1 Attractors
The Lorenz system can be viewed as a recurrent network of three nodes:

    dx/dt = a(y − x)
    dy/dt = bx − y − xz
    dz/dt = xy − cz    (8.44–8.48)

Fig. 8.10 Example of a trajectory of the Lorenz system from a numerical integration over the time interval 0 ≤ t ≤ 100. The parameters used were a = 10, b = 28, and c = 8/3.
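A trajectory like the one in Fig. 8.10 can be generated with a simple Euler integration. The parameters come from the figure caption; the step size and initial condition are my own assumptions:

```python
import numpy as np

def lorenz_trajectory(x0=1.0, y0=1.0, z0=1.0, a=10.0, b=28.0, c=8.0 / 3.0,
                      dt=0.001, t_max=100.0):
    """Euler integration of the Lorenz system:
    dx/dt = a*(y - x),  dy/dt = b*x - y - x*z,  dz/dt = x*y - c*z."""
    steps = int(t_max / dt)
    traj = np.empty((steps, 3))
    x, y, z = x0, y0, z0
    for i in range(steps):
        dx = a * (y - x)
        dy = b * x - y - x * z
        dz = x * y - c * z
        x, y, z = x + dt * dx, y + dt * dy, z + dt * dz
        traj[i] = (x, y, z)
    return traj
```

For these parameter values the trajectory stays bounded but never settles into a point attractor; it wanders on the well-known butterfly-shaped chaotic attractor.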

32  8.6.2 Lyapunov functions (1)
A system has a point attractor if a Lyapunov function (energy function) exists; the dynamics can then be pictured as descending a 'landscape'. A Lyapunov function is a function V(x) that never increases under the dynamics of the system,

    dV(x)/dt ≤ 0    (8.49)

Fig. 8.11 A ball in an 'energy' landscape.

33  8.6.2 Lyapunov functions (2)
A Lyapunov function can be constructed explicitly for recurrent networks with symmetric weights; it contains a quadratic interaction term of the form −½ Σ_ij w_ij r_i r_j together with input and leakage contributions (eqns 8.50–8.54, not transcribed here).
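For the special case of binary states, the quadratic energy term can be tracked under asynchronous threshold updates. This sketch illustrates the Lyapunov property (the energy never increases) rather than the book's full expression:

```python
import numpy as np

def energy(s, w):
    """Hopfield-style energy E = -1/2 * s^T w s. With symmetric w, zero
    diagonal, and asynchronous threshold updates, E never increases."""
    return -0.5 * s @ w @ s

rng = np.random.default_rng(2)
N = 100
xi = rng.choice([-1, 1], size=N)
w = np.outer(xi, xi) / N                   # symmetric Hebbian weights
np.fill_diagonal(w, 0)

s = rng.choice([-1, 1], size=N)            # random initial state
energies = [energy(s, w)]
for i in rng.permutation(N):               # asynchronous single-node updates
    h = w[i] @ s
    if h != 0:
        s[i] = int(np.sign(h))
    energies.append(energy(s, w))
```

Each single-node flip changes the energy by −2|h_i| or leaves it unchanged, so the recorded sequence is monotonically non-increasing: the network descends the energy landscape of Fig. 8.11.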

34  8.6.3 The Cohen-Grossberg theorem
The Cohen-Grossberg theorem covers general systems with continuous dynamics of the form

    dx_i/dt = a_i(x_i) [ b_i(x_i) − Σ_j w_ij g_j(x_j) ]    (8.55)

and guarantees the existence of a Lyapunov function under the conditions:
- Positivity, a_i ≥ 0: the dynamics must be those of a leaky integrator rather than an amplifying integrator.
- Symmetry, w_ij = w_ji: the influence of one node on another has to be the same as the reverse influence.
- Monotonicity, sign(dg(x)/dx) = const: the activation function has to be monotonic.

35  8.7 Biologically more realistic variations of attractor networks
The synaptic weights between neurons in the nervous system cannot be expected to fulfill the symmetry condition on the weight matrix that is required to guarantee stable attractors. Neurons receive a mixture of input from excitatory and inhibitory presynaptic neurons, and by Dale's principle a given neuron makes either excitatory or inhibitory connections, not both. Under the symmetry condition of the Cohen-Grossberg theorem, an inhibitory node could only receive inhibitory connections and vice versa, which is not what is observed.

36  8.7.1 Asymmetric networks
A simple way to treat non-symmetric weight matrices is to decompose them into a symmetric and an anti-symmetric part,

    w = w^s + w^a,    w^s = ½(w + wᵀ),    w^a = ½(w − wᵀ)    (8.56, 8.57)

Convergence can then be monitored via the difference between the network states at two consecutive time steps (8.58–8.60).
Fig. 8.12 (A) Convergence indicator for networks with an asymmetric weight matrix where the individual components of the matrix are chosen randomly with unit strength. (B) Similar to (A) except that the individual components of the weight matrix are chosen from a Gaussian distribution. (C) Overlap of the network state with a trained pattern in a Hebbian auto-associative network that satisfies Dale's principle.
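The symmetric/anti-symmetric decomposition can be written in a few lines; the matrix here is an arbitrary random example, not one from the book:

```python
import numpy as np

rng = np.random.default_rng(3)
w = rng.standard_normal((5, 5))            # arbitrary asymmetric weight matrix

w_sym = 0.5 * (w + w.T)                    # symmetric part
w_anti = 0.5 * (w - w.T)                   # anti-symmetric part
# The decomposition is exact and unique: w = w_sym + w_anti.
```

Only the symmetric part enters the Lyapunov-function arguments above, which is why networks with a small anti-symmetric component can still behave much like symmetric attractor networks.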

37  8.7.2 Random and Hebbian matrices with asymmetries
Examples studied in Fig. 8.12 include weight matrices with components of fixed strength |w_ij| = 1 and random signs, matrices with Gaussian-distributed components, and Hebbian auto-associative matrices constrained by Dale's principle.

38  8.7.3 Non-monotonic networks
The Cohen-Grossberg theorem indicates that networks can also behave chaotically when its other constraints are violated. In networks trained with Hebbian learning but using non-monotonic activation functions, point attractors still exist, and the storage capacities of such networks are profoundly enhanced; their basins of attraction appear to be surrounded by chaotic regimes. An effective non-monotonicity can arise from an appropriate combination of excitatory and inhibitory connections, and some neurons may themselves have non-monotonic gain functions.

39  Conclusion
- Short-term memory and long-term memory
- Recurrent neural networks
- Point-attractor networks: dynamics and Hebbian learning
- Phase diagram
- Sparse attractor neural networks
- Dynamical systems: chaotic networks and Lyapunov functions
- Asymmetric networks

