Neural Networks Lecture 8: Backpropagation Learning (September 30, 2010)

Sigmoidal Neurons
In backpropagation networks, we typically choose $\tau = 1$ and $\theta = 0$ in the sigmoid
$$f_i(net_i(t)) = \frac{1}{1 + e^{-(net_i(t) - \theta)/\tau}}.$$
[Figure: sigmoid output $f_i(net_i(t))$ plotted against $net_i(t)$ for $\tau = 1$ and $\tau = 0.1$]
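As an illustration (not from the slides), a minimal Python sketch of this sigmoid shows how $\tau$ controls the steepness of the transition around $\theta$; the function name and sample values are purely illustrative:

    import numpy as np

    def sigmoid(net, tau=1.0, theta=0.0):
        """General sigmoid: 1 / (1 + exp(-(net - theta)/tau))."""
        return 1.0 / (1.0 + np.exp(-(net - theta) / tau))

    # A smaller tau makes the transition around theta steeper:
    for net in (-1.0, -0.1, 0.0, 0.1, 1.0):
        print(net, sigmoid(net, tau=1.0), sigmoid(net, tau=0.1))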
Sigmoidal Neurons
This leads to a simplified form of the sigmoid function:
$$S(net) = \frac{1}{1 + e^{-net}}$$
We do not need a modifiable threshold $\theta$, because we will use "dummy" inputs as we did for perceptrons. The choice $\tau = 1$ works well in most situations and results in a very simple derivative of $S(net)$.
Sigmoidal Neurons
$$S'(net) = \frac{d}{d\,net}\left(\frac{1}{1 + e^{-net}}\right) = \frac{e^{-net}}{(1 + e^{-net})^2} = S(net)\,(1 - S(net))$$
This result will be very useful when we develop the backpropagation algorithm.
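A minimal Python sketch of the simplified sigmoid and its derivative, with a quick finite-difference check of the formula $S'(net) = S(net)(1 - S(net))$; the names S and S_prime are just illustrative:

    import numpy as np

    def S(net):
        """Simplified sigmoid with tau = 1 and theta = 0."""
        return 1.0 / (1.0 + np.exp(-net))

    def S_prime(net):
        """Derivative of the sigmoid: S'(net) = S(net) * (1 - S(net))."""
        s = S(net)
        return s * (1.0 - s)

    # Quick numerical check of the derivative formula at net = 0.5:
    net, eps = 0.5, 1e-6
    finite_diff = (S(net + eps) - S(net - eps)) / (2 * eps)
    print(S_prime(net), finite_diff)   # both approximately 0.235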
Backpropagation Learning
Similar to the Adaline, the goal of the backpropagation learning algorithm is to modify the network's weights so that its output vector $\mathbf{o}_p = (o_{p,1}, o_{p,2}, \dots, o_{p,K})$ is as close as possible to the desired output vector $\mathbf{d}_p = (d_{p,1}, d_{p,2}, \dots, d_{p,K})$ for $K$ output neurons and input patterns $p = 1, \dots, P$. The set of input-output pairs (exemplars) $\{(\mathbf{x}_p, \mathbf{d}_p) \mid p = 1, \dots, P\}$ constitutes the training set.
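As a side note (not from the slides), the training set can simply be stored as two arrays; the sizes and values below are purely illustrative:

    import numpy as np

    P, I, K = 4, 3, 2            # number of patterns, inputs, outputs (example sizes)
    X = np.random.rand(P, I)     # X[p] is the input pattern x_p
    D = np.random.randint(0, 2, size=(P, K)).astype(float)   # D[p] is the desired output d_p
    # The training set is the collection of pairs (X[p], D[p]) for p = 0, ..., P-1.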
Backpropagation Learning
Also similar to the Adaline, we need a cumulative error function that is to be minimized. We can choose the mean square error (MSE) once again (but the 1/P factor does not matter):
$$E = \frac{1}{P} \sum_{p=1}^{P} E_p,$$
where
$$E_p = \frac{1}{2} \sum_{k=1}^{K} (d_{p,k} - o_{p,k})^2$$
(the factor 1/2 merely simplifies the derivatives below).
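A short Python sketch of these error functions (illustrative names; D and O are the arrays of desired and actual output vectors):

    import numpy as np

    def pattern_error(d_p, o_p):
        """Error contribution E_p of one pattern (with the 1/2 factor)."""
        return 0.5 * np.sum((d_p - o_p) ** 2)

    def mse(D, O):
        """Cumulative error E as the mean of E_p over all P patterns."""
        return np.mean([pattern_error(d_p, o_p) for d_p, o_p in zip(D, O)])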
Terminology
Example: network function $f: \mathbb{R}^3 \rightarrow \mathbb{R}^2$
[Figure: feedforward network with an input layer holding the input vector $(x_1, x_2, x_3)$, a hidden layer, and an output layer producing the output vector $(o_1, o_2)$]
Backpropagation Learning
For input pattern p, the i-th input layer node holds $x_{p,i}$.
Net input to the j-th node in the hidden layer:
$$net_{p,j}^{(1)} = \sum_{i} w_{j,i}^{(1,0)} x_{p,i}$$
Output of the j-th node in the hidden layer:
$$o_{p,j} = S\!\left(net_{p,j}^{(1)}\right)$$
Net input to the k-th node in the output layer:
$$net_{p,k}^{(2)} = \sum_{j} w_{k,j}^{(2,1)} o_{p,j}$$
Output of the k-th node in the output layer:
$$o_{p,k} = S\!\left(net_{p,k}^{(2)}\right)$$
Network error for p:
$$E_p = \frac{1}{2} \sum_{k=1}^{K} (d_{p,k} - o_{p,k})^2$$
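A Python sketch of this feedforward computation; W1 and W2 stand for the weight matrices $w^{(1,0)}$ and $w^{(2,1)}$, their shapes are illustrative, and dummy/bias inputs are omitted for brevity:

    import numpy as np

    def S(net):
        return 1.0 / (1.0 + np.exp(-net))

    def feedforward(x_p, W1, W2):
        """One forward pass; W1 has shape (J, I), W2 has shape (K, J)."""
        net_hidden = W1 @ x_p          # net_{p,j}^{(1)}
        o_hidden = S(net_hidden)       # hidden outputs o_{p,j}
        net_out = W2 @ o_hidden        # net_{p,k}^{(2)}
        o_out = S(net_out)             # network outputs o_{p,k}
        return o_hidden, o_out

    def pattern_error(d_p, o_out):
        """E_p = 1/2 * sum_k (d_{p,k} - o_{p,k})^2"""
        return 0.5 * np.sum((d_p - o_out) ** 2)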
Backpropagation Learning
As E is a function of the network weights, we can use gradient descent to find those weights that result in minimal error. For individual weights feeding the hidden and output layers, we should move against the error gradient (omitting the index p):
Output layer (derivative easy to calculate):
$$\Delta w_{k,j}^{(2,1)} = -\eta \frac{\partial E}{\partial w_{k,j}^{(2,1)}}$$
Hidden layer (derivative difficult to calculate):
$$\Delta w_{j,i}^{(1,0)} = -\eta \frac{\partial E}{\partial w_{j,i}^{(1,0)}}$$
Backpropagation Learning
When computing the derivative with regard to $w_{k,j}^{(2,1)}$, we can disregard any output units except $o_k$: in
$$E = \frac{1}{2} \sum_{k'=1}^{K} (d_{k'} - o_{k'})^2,$$
only the term with $k' = k$ depends on $w_{k,j}^{(2,1)}$.
Remember that $o_k$ is obtained by applying the sigmoid function S to $net_k^{(2)}$, which is computed by:
$$net_k^{(2)} = \sum_{j} w_{k,j}^{(2,1)} o_j$$
Therefore, we need to apply the chain rule twice.
Backpropagation Learning
Since
$$net_k^{(2)} = \sum_{j} w_{k,j}^{(2,1)} o_j,$$
we have:
$$\frac{\partial net_k^{(2)}}{\partial w_{k,j}^{(2,1)}} = o_j.$$
We know that:
$$o_k = S\!\left(net_k^{(2)}\right), \qquad \frac{\partial o_k}{\partial net_k^{(2)}} = S'\!\left(net_k^{(2)}\right).$$
Which gives us:
$$\frac{\partial E}{\partial w_{k,j}^{(2,1)}} = \frac{\partial E}{\partial o_k} \cdot \frac{\partial o_k}{\partial net_k^{(2)}} \cdot \frac{\partial net_k^{(2)}}{\partial w_{k,j}^{(2,1)}} = -(d_k - o_k)\, S'\!\left(net_k^{(2)}\right) o_j.$$
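A small numerical sanity check of this result in Python (the values are illustrative; the analytic derivative uses $S' = S(1 - S)$ from above and is compared against a finite-difference estimate):

    import numpy as np

    def S(net):
        return 1.0 / (1.0 + np.exp(-net))

    # Tiny example: two hidden outputs o_j, one output unit k, desired output d_k.
    o_hidden = np.array([0.3, 0.8])
    w_k = np.array([0.5, -0.2])        # weights w_{k,j}^{(2,1)}
    d_k = 1.0

    def E(w):
        o_k = S(w @ o_hidden)
        return 0.5 * (d_k - o_k) ** 2

    # Analytic derivative for weight j = 0: -(d_k - o_k) * S'(net_k) * o_j
    net_k = w_k @ o_hidden
    o_k = S(net_k)
    analytic = -(d_k - o_k) * o_k * (1 - o_k) * o_hidden[0]

    # Finite-difference estimate of the same derivative:
    eps = 1e-6
    w_plus, w_minus = w_k.copy(), w_k.copy()
    w_plus[0] += eps
    w_minus[0] -= eps
    numeric = (E(w_plus) - E(w_minus)) / (2 * eps)
    print(analytic, numeric)   # the two values should agree closely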
Backpropagation Learning
For the derivative with regard to $w_{j,i}^{(1,0)}$, notice that E depends on it through $net_j^{(1)}$, which influences each $o_k$ with $k = 1, \dots, K$. Using the chain rule of derivatives again:
$$\frac{\partial E}{\partial w_{j,i}^{(1,0)}} = \sum_{k=1}^{K} \frac{\partial E}{\partial o_k} \cdot \frac{\partial o_k}{\partial net_k^{(2)}} \cdot \frac{\partial net_k^{(2)}}{\partial o_j} \cdot \frac{\partial o_j}{\partial net_j^{(1)}} \cdot \frac{\partial net_j^{(1)}}{\partial w_{j,i}^{(1,0)}} = -\left[\sum_{k=1}^{K} (d_k - o_k)\, S'\!\left(net_k^{(2)}\right) w_{k,j}^{(2,1)}\right] S'\!\left(net_j^{(1)}\right) x_i.$$
Backpropagation Learning
This gives us the following weight changes at the output layer:
$$\Delta w_{k,j}^{(2,1)} = \eta\, \delta_k\, o_j \quad \text{with} \quad \delta_k = (d_k - o_k)\, S'\!\left(net_k^{(2)}\right)$$
… and at the inner layer:
$$\Delta w_{j,i}^{(1,0)} = \eta\, \delta_j\, x_i \quad \text{with} \quad \delta_j = S'\!\left(net_j^{(1)}\right) \sum_{k=1}^{K} \delta_k\, w_{k,j}^{(2,1)}$$
Backpropagation Learning
As you surely remember from a few minutes ago:
$$S'(net) = S(net)\,(1 - S(net))$$
Then we can simplify the generalized error terms:
$$\delta_k = (d_k - o_k)\, o_k\, (1 - o_k)$$
And:
$$\delta_j = o_j\, (1 - o_j) \sum_{k=1}^{K} \delta_k\, w_{k,j}^{(2,1)}$$
Backpropagation Learning
The simplified error terms $\delta_k$ and $\delta_j$ use variables that are calculated in the feedforward phase of the network and can thus be calculated very efficiently. Now let us state the final equations again and reintroduce the subscript p for the p-th pattern:
$$\Delta w_{k,j}^{(2,1)} = \eta\, \delta_{p,k}\, o_{p,j}, \qquad \delta_{p,k} = (d_{p,k} - o_{p,k})\, o_{p,k}\, (1 - o_{p,k})$$
$$\Delta w_{j,i}^{(1,0)} = \eta\, \delta_{p,j}\, x_{p,i}, \qquad \delta_{p,j} = o_{p,j}\, (1 - o_{p,j}) \sum_{k=1}^{K} \delta_{p,k}\, w_{k,j}^{(2,1)}$$
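These final equations translate directly into a short backward-pass sketch in Python (the function and variable names are illustrative; eta is the learning rate $\eta$, and W2 holds the hidden-to-output weights $w_{k,j}^{(2,1)}$):

    import numpy as np

    def backprop_updates(x_p, d_p, o_hidden, o_out, W2, eta=0.5):
        """Weight changes for one pattern, using the simplified error terms."""
        delta_out = (d_p - o_out) * o_out * (1 - o_out)                # delta_{p,k}
        delta_hidden = o_hidden * (1 - o_hidden) * (W2.T @ delta_out)  # delta_{p,j}
        dW2 = eta * np.outer(delta_out, o_hidden)   # Delta w_{k,j}^{(2,1)}
        dW1 = eta * np.outer(delta_hidden, x_p)     # Delta w_{j,i}^{(1,0)}
        return dW1, dW2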
Backpropagation Learning
Algorithm Backpropagation;
    Start with randomly chosen weights;
    while MSE is above desired threshold and computational bounds are not exceeded, do
        for each input pattern x_p, 1 ≤ p ≤ P,
            Compute hidden node inputs;
            Compute hidden node outputs;
            Compute inputs to the output nodes;
            Compute the network outputs;
            Compute the error between output and desired output;
            Modify the weights between hidden and output nodes;
            Modify the weights between input and hidden nodes;
        end-for
    end-while.
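Putting the whole algorithm together, a minimal Python implementation might look as follows; the hidden layer size J, learning rate eta, stopping thresholds, and the toy XOR-style data are illustrative choices, and the constant third input component plays the role of a dummy input for the hidden layer:

    import numpy as np

    def S(net):
        return 1.0 / (1.0 + np.exp(-net))

    def train_backprop(X, D, J=4, eta=0.5, mse_threshold=0.01, max_epochs=10000):
        """Train a two-layer network with backpropagation.
        X: (P, I) input patterns, D: (P, K) desired outputs."""
        P, I = X.shape
        K = D.shape[1]
        rng = np.random.default_rng(0)
        W1 = rng.uniform(-0.5, 0.5, size=(J, I))    # input-to-hidden weights
        W2 = rng.uniform(-0.5, 0.5, size=(K, J))    # hidden-to-output weights
        for epoch in range(max_epochs):
            errors = []
            for x_p, d_p in zip(X, D):
                o_hidden = S(W1 @ x_p)               # hidden node outputs
                o_out = S(W2 @ o_hidden)             # network outputs
                errors.append(0.5 * np.sum((d_p - o_out) ** 2))
                delta_out = (d_p - o_out) * o_out * (1 - o_out)
                delta_hidden = o_hidden * (1 - o_hidden) * (W2.T @ delta_out)
                W2 += eta * np.outer(delta_out, o_hidden)
                W1 += eta * np.outer(delta_hidden, x_p)
            if np.mean(errors) < mse_threshold:      # stop once the MSE is low enough
                break
        return W1, W2

    # Example call on XOR-like data (how fast, and whether, training converges
    # depends on the random initialization and the chosen parameters):
    X = np.array([[0, 0, 1], [0, 1, 1], [1, 0, 1], [1, 1, 1]], dtype=float)
    D = np.array([[0], [1], [1], [0]], dtype=float)
    W1, W2 = train_backprop(X, D)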