Hopfield Networks (MacKay, Chapter 42)
The Story So Far
Feedforward networks: all connections are directed; the activity of each neuron only influences downstream neurons.
Feedback Networks
Feedback networks: any network that is not a feedforward network. Connections may be bidirectional.
Hopfield Networks
Consists of I fully connected neurons. All connections are bidirectional, such that $w_{ij} = w_{ji}$, and there are no self-connections (i.e. $w_{ii} = 0$).
Activation: $a_i = \sum_j w_{ij} x_j$
State updates: $x_i = \tanh(a_i)$
[Diagram: four neurons with bidirectional connections $w_{12} = w_{21}$ and $w_{34} = w_{43}$.]
Hopfield Energy Function
$E = -\sum_{i<j} x_i x_j w_{ij} - \sum_i x_i w_0$   (the second term is an optional bias)
The objective of the Hopfield network is to minimize this energy function. The effect of a single $x_i$ on the global energy can be computed as:
$\Delta E_i = E(x_i{=}0) - E(x_i{=}1) = w_0 + \sum_j x_j w_{ij}$
i.e. the energy when $x_i$ is off minus the energy when $x_i$ is on.
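This identity can be checked numerically. The Python sketch below uses an arbitrary 3-unit weight matrix (hypothetical values, zero bias) and compares $\Delta E_i$ against the local input $w_0 + \sum_j x_j w_{ij}$:

```python
import numpy as np

# A small check of the Delta-E identity on a hypothetical 3-unit network
# (weights chosen arbitrarily for illustration; bias w0 = 0).
W = np.array([[0., 2., -1.],
              [2., 0., 3.],
              [-1., 3., 0.]])

def energy(x, w0=0.0):
    # E = -sum_{i<j} x_i x_j w_ij - w0 * sum_i x_i
    # (0.5 * x W x counts each i<j pair once, since W is symmetric)
    return -0.5 * x @ W @ x - w0 * np.sum(x)

x_on = np.array([1., 1., 1.])    # a state with unit i = 1 on
x_off = np.array([1., 0., 1.])   # the same state with unit i = 1 off
delta_E = energy(x_off) - energy(x_on)
local_input = W[1] @ x_on        # w0 + sum_j w_ij x_j  (w0 = 0, w_ii = 0)
```

Because $w_{ii} = 0$, the unit's own state does not affect its local input, so both sides agree exactly.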
Settling to an Energy Minimum
[Diagram: a small binary network with connection weights -4, 3, 2, 3, 3, -1, -1.]
To find an energy minimum in this net, start from a random state and then update the units one at a time in random order. Update each unit to whichever of its two states (0 or 1) gives the lowest global energy.
Settling to an Energy Minimum
Start from a random state: here, two units are on.
-E = goodness = 3 (look at all pairs of units that are on and add up the weights between them).
Settling to an Energy Minimum
What state should this unit be in, given the states of all the other units?
Input = 1×(-4) + 0×3 + 0×3 = -4. Since -4 < 0, we turn the unit off.
-E = goodness = 3.
Settling to an Energy Minimum
Probe the next unit: Input = 1×3 + 0×(-1) = 3. Since 3 > 0, the unit stays on (it was already on), so the goodness is unchanged.
-E = goodness = 3.
Settling to an Energy Minimum
Probe the next unit: Input = 1×2 + 1×(-1) + 0×3 + 0×(-1) = 1. Since 1 > 0, we turn the unit on.
-E = goodness = 4.
Settling to an Energy Minimum
Now the network has settled to a minimum (none of the units want to change states).
-E = goodness = 4.
A Deeper Energy Minimum
This network has a second, deeper stable energy minimum.
-E = goodness = 5.
A Deeper Energy Minimum
The net has two triangles in which the three units mostly support each other. Each triangle mostly dislikes the other triangle. The triangle on the left differs from the one on the right by having a weight of 2 where the other one has a weight of 3. So turning on the units in the triangle on the right gives the deepest minimum.
-E = goodness = 5.
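The settling walkthrough above can be reproduced in a few lines of Python. The weight matrix below is a reconstruction of the slide's diagram (two triangles sharing the middle unit), inferred from the arithmetic on the previous slides, so treat the exact topology as an assumption:

```python
import numpy as np

# The example network: weights -4, 3, 2, 3, 3, -1, -1 over five units.
# The topology (two triangles sharing the middle unit) is inferred
# from the slide arithmetic, not given explicitly in the deck.
W = np.zeros((5, 5))
for i, j, w in [(0, 1, -4), (0, 2, 3), (0, 3, 2),
                (1, 3, 3), (1, 4, 3), (2, 3, -1), (3, 4, -1)]:
    W[i, j] = W[j, i] = w    # symmetric weights, zero diagonal

def goodness(x):
    # -E: sum of the weights between all pairs of units that are on
    return 0.5 * x @ W @ x

def settle(x, order):
    # Asynchronous updates: a probed unit turns on iff its input > 0
    x = x.copy()
    for i in order:
        x[i] = 1 if W[i] @ x > 0 else 0
    return x

start = np.array([1, 0, 1, 0, 0])   # two units on: goodness 3
local = settle(start, [1, 2, 3])    # probe order from the slides
deeper = np.array([0, 1, 0, 1, 1])  # the right-hand triangle: goodness 5
```

Running the three probes from the slides takes the goodness from 3 to 4, and the right-triangle state is a stable minimum with goodness 5.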
Hopfield Networks
Weights are initialized using the Hebb rule. For $K$ patterns:
$w_{ij} = \sum_k x_i^{(k)} x_j^{(k)}$, where $x_i, x_j \in \{-1, 1\}$.
E.g. for $x_i^{(1\ldots4)} = [1\ 1\ {-1}\ 1]$ and $x_j^{(1\ldots4)} = [1\ 1\ {-1}\ 1]$:
$w_{ij} = (1 \cdot 1) + (1 \cdot 1) + (-1 \cdot -1) + (1 \cdot 1) = 4$
For $x_j^{(1\ldots4)} = [-1\ {-1}\ 1\ {-1}]$:
$w_{ij} = (1 \cdot -1) + (1 \cdot -1) + (-1 \cdot 1) + (1 \cdot -1) = -4$
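The same construction in a short Python/NumPy sketch, using the pattern values from the example above (`hebb_weights` is an illustrative helper name, not from the slides):

```python
import numpy as np

# Hebb-rule weight construction for patterns with entries in {-1, +1}.
# Units that agree across all four patterns get w_ij = 4;
# units that disagree on every pattern get w_ij = -4.
xi = np.array([1, 1, -1, 1])            # x_i across patterns k = 1..4
xj_agree = np.array([1, 1, -1, 1])
xj_disagree = np.array([-1, -1, 1, -1])

w_agree = np.sum(xi * xj_agree)         # 4
w_disagree = np.sum(xi * xj_disagree)   # -4

# Full weight matrix from a pattern matrix X (K patterns x I units):
def hebb_weights(X):
    W = X.T @ X                 # w_ij = sum_k x_i^(k) x_j^(k)
    np.fill_diagonal(W, 0)      # no self-connections
    return W

X = np.array([[1, -1,  1],
              [1,  1, -1]])     # two patterns over three units
W = hebb_weights(X)
```

Note that the outer-product form `X.T @ X` automatically yields a symmetric matrix, so $w_{ij} = w_{ji}$ holds by construction.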
Hopfield Networks
For a single corrupted pattern $X^{(k)}$, convergence to one of the memorized patterns proceeds as follows:
$a_i = \sum_j w_{ij} x_j$
$x_i = \tanh(a_i)$
$\epsilon_i = x_i^{new} - x_i^{old}$
$\partial w = x^{new} \epsilon^T$
$\partial w \leftarrow \partial w + \partial w^T$
$W \leftarrow W + \eta\,(\partial w - \alpha W)$
Repeat until the summed error $\sum_i |\epsilon_i|$ reaches some desired minimum.
Matlab Code
M = 5;            % Number of rows in memories
N = 5;            % Number of columns in memories
MN = M*N;         % Number of nodes in the network
eta = 1/MN;       % Learning rate
alpha = 1;        % Regularizer constant
max_iter = 100;   % Maximum number of iterations
epsilon = 1e-4;   % Tolerance on convergence
Matlab Code
% Memories ----------------------------------------
% These are the images we want the network to remember and
% to be attracted to on future presentations of noisy stimuli
Mems = ones(M,N,4);
% Character 'D':
Mems(1,1:4,1) = -1;  Mems(2:4,[2,5],1) = -1;  Mems(5,2:4,1) = -1;
% Character 'J':
Mems(1,:,2) = -1;  Mems(2:4,4,2) = -1;  Mems(4:5,1,2) = -1;  Mems(5,2:3,2) = -1;
% Character 'C':
Mems([1,5],2:5,3) = -1;  Mems(2:4,1,3) = -1;
% Character 'M':
Mems(:,[1,5],4) = -1;  Mems(2,[2,4],4) = -1;  Mems(3,3,4) = -1;
nMem = size(Mems,3);
Matlab Code
% Plot the memories
figure('Color','w');
for n = 1:nMem
    subplot(1,nMem,n);
    imagesc(Mems(:,:,n));
    colormap('gray');
    axis equal; axis tight; axis off;
end
Matlab Code
tmpWi = zeros(MN,MN,nMem);
tmpWj = zeros(MN,MN,nMem);
for k = 1:nMem
    [tmpWj(:,:,k),tmpWi(:,:,k)] = meshgrid(reshape(Mems(:,:,k),MN,1));
end
% -------------------------------------------------
% Initialize the weights using the Hebb rule
W = sum(tmpWi .* tmpWj,3);
% Set diagonal to zero:
W = W .* ~eye(MN,MN);   % Ensure the self-weights are zero
Matlab Code
% The letter number to corrupt
num_to_corr = 1;
% Number of bits to flip
num_to_flip = 5;
% Randomly flip num_to_flip pixels in the num_to_corr image memory to
% create Xinit, the initial state matrix:
Xinit = Mems(:,:,num_to_corr);
if num_to_flip > 0
    randinds = randperm(MN);
    flipinds = randinds(1:num_to_flip);
    Xinit(flipinds) = -1 * Xinit(flipinds);
end
Matlab Code
% Plot the corrupted letter that we're going to pass to the network
figure;
imagesc(Xinit);
axis off; axis image; colormap gray;
title('X at iteration 0');
Matlab Code
% Initialize variables
converged = 0;
iter = 1;
Xhat = Xinit;
Xold = ones(M,N);
Matlab Code
while ~converged && iter <= max_iter
    % a(i) = sum_j w(i,j)*x(j)
    Activations = W * Xhat(:);               % compute all activations
    % x(i) = tanh(a(i))
    Xhat = reshape(tanh(Activations),M,N);   % compute all outputs
Matlab Code
    % Show the current state of the network
    figure;
    imagesc(Xhat);
    axis off; axis image; colormap gray;
    title(['X at iteration ',num2str(iter)]);
Matlab Code
    % Measure how much the state has changed since the last iteration
    err = sum(abs(Xhat(:)-Xold(:)));
    if err <= epsilon
        converged = 1;
        fprintf('Network has converged at iteration %i.\n',iter);
    else
        fprintf('Difference between current and previous state: %f.\n',err);
    end
Matlab Code
    % epsilon_i = x_i_new - x_i_old
    e = Xhat(:)-Xold(:);                  % compute all errors
    % dw = x_new * epsilon'
    gw = Xhat(:)*e(:)';                   % gradient of the errors w.r.t. the weights
    % dw <- dw + dw'
    gw = gw + gw';                        % symmetrize gradient
    % W <- W + eta*(dw - alpha*W)
    W = W + eta * ( gw - alpha * W );     % take a step
    iter = iter + 1;
    Xold = Xhat;
end
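For comparison, here is a condensed Python/NumPy version of the same recall demo, using the classic binary sign-update Hopfield dynamics rather than the tanh and weight-adaptation steps above (the two 25-unit test patterns are illustrative, not the letter memories):

```python
import numpy as np

# Hebbian weights, as in the MATLAB initialization
def hebb_weights(X):
    W = X.T @ X                # w_ij = sum_k x_i^(k) x_j^(k)
    np.fill_diagonal(W, 0)     # no self-connections
    return W

# Recall by repeated synchronous sign updates until a fixed point
def recall(W, x, max_iter=10):
    for _ in range(max_iter):
        x_new = np.where(W @ x >= 0, 1, -1)   # threshold the activations
        if np.array_equal(x_new, x):          # fixed point reached
            break
        x = x_new
    return x

p1 = np.ones(25, dtype=int)
p2 = np.array([1] * 13 + [-1] * 12)   # nearly orthogonal to p1
W = hebb_weights(np.stack([p1, p2]))

corrupted = p1.copy()
corrupted[:3] = -1                    # flip 3 of the 25 "pixels"
restored = recall(W, corrupted)       # settles back onto p1
```

With only two nearly orthogonal memories in 25 units, a 3-bit corruption is well within the basin of attraction, so recall recovers the stored pattern exactly.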
Hopfield Networks Can easily be adapted to match a degraded pattern to a given pattern from a set.
Hopfield Networks
Hopfield & Tank (1985)
[Figure: a traveling salesman tour; dBD = distance between cities B and D.]
Hopfield Networks Hopfield and Tank (1985) showed how Hopfield networks can be used to solve the traveling salesman problem.
Homework
Try adapting the Hopfield network code to handle a 5th letter (e.g. 'L').
[Pattern: rows 1-4 are [-1 1 1 1 1]; row 5 is [-1 -1 -1 -1 -1].]