1
ERROR BACK-PROPAGATION LEARNING ALGORITHM
Zohreh B. Irannia
2
Single-Layer Perceptron
x_i: input vector components; t = c(x): the target value; o: the perceptron output; η: the learning rate (a small constant, assume η = 1). Update rule: w_i ← w_i + Δw_i, where Δw_i = η (t − o) x_i.
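A minimal sketch of this update rule in code (NumPy, the thresholded output, and the variable names are illustrative assumptions, not specified on the slide):

```python
import numpy as np

def perceptron_update(w, x, t, eta=1.0):
    """One learning step: w_i <- w_i + eta * (t - o) * x_i."""
    o = 1.0 if np.dot(w, x) > 0 else 0.0   # thresholded perceptron output
    return w + eta * (t - o) * x

# one update on a single training pair
w = np.zeros(3)                  # weights (w[0] acts as the bias weight)
x = np.array([1.0, 0.5, -1.0])   # input vector (x[0] = 1 is the bias input)
t = 1.0                          # target value t = c(x)
w = perceptron_update(w, x, t)
```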
3
Single-Layer Perceptron
Sigmoid function as the activation function:
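The formula itself survives only as an image in the original slide; assuming the standard logistic sigmoid used throughout the deck, it and its derivative are:

```latex
\sigma(v) = \frac{1}{1 + e^{-v}}, \qquad \sigma'(v) = \sigma(v)\bigl(1 - \sigma(v)\bigr)
```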
4
Delta Rule and Gradient Descent. The delta rule says: Δw_i = η (t − o) x_i. But why?
5
Steepest Descent Method
(w1, w2) → (w1 + Δw1, w2 + Δw2)
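The gradient formula on this slide is not in the extracted text; the standard steepest-descent step it illustrates is:

```latex
\Delta w_i = -\eta\,\frac{\partial E}{\partial w_i}, \qquad
(w_1, w_2) \;\leftarrow\; (w_1 + \Delta w_1,\; w_2 + \Delta w_2)
```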
6
Delta Rule (continued)
7
Delta Rule (continued): define the error as a function of the weights; finally, differentiating gives the weight update.
8
Delta Rule: output neuron j has net input v_j and output y_j = f(v_j); y_j is compared with the desired target …
9
Delta Rule: so we have:
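The equations on slides 6 through 9 are preserved only as images; the standard delta-rule result they derive, in the notation above, is:

```latex
E = \tfrac{1}{2}\sum_j \bigl(d_j - y_j\bigr)^2, \qquad
\delta_j = \bigl(d_j - y_j\bigr)\, f'(v_j), \qquad
\Delta w_{ji} = \eta\, \delta_j\, x_i
```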
10
Perceptron learning problem
The perceptron is only suitable if the inputs are linearly separable. Consider the XOR problem:
11
Non-linearly separable problems
12
Solution: Multi-layer Networks
New problem: how do we train the weights of the different layers in such networks?
13
Idea of Error Back-Propagation
Update of the weights in the output layer: the delta rule. The delta rule is not applicable to hidden layers, because we do not know the desired values for the hidden nodes. Solution: propagate the errors at the output nodes back to the hidden nodes, as summarized below.
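A compact statement of that idea in standard back-propagation notation (not reproduced verbatim from the slides): an output node's error signal comes from its own target, while a hidden node's error signal is the weighted sum of the error signals of the nodes it feeds.

```latex
\delta_k = (d_k - y_k)\, f'(v_k) \quad \text{(output node } k), \qquad
\delta_j = f'(v_j) \sum_k w_{kj}\, \delta_k \quad \text{(hidden node } j)
```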
14
Intuition by Illustration
A three-layer network with two inputs and one output:
15
Intuition by Illustration
Each neuron is composed of two units: a summing unit that forms the weighted sum of its inputs, and a unit that applies the non-linear activation function.
16
Intuition by Illustration
Training starts by propagating the input signals through the input layer (computing y1); the same happens for y2 and y3.
17
Intuition by Illustration
Propagation of signals through the hidden layer (computing y4); the same happens for y5.
18
Intuition by Illustration
Propagation of signals through the output layer:
19
Intuition by Illustration
The error signal of the output-layer neuron is the difference between the desired target value and the computed network output:
20
Intuition by Illustration
The error signal is then propagated back to all neurons, through the same weights that were used in the forward pass.
21
Intuition by Illustration
If the propagated errors arriving at a neuron come from several neurons, they are added (as for neuron 1 here); the same happens for neuron 2 and neuron 3.
22
Intuition by Illustration
Weight updating then starts; the same happens for all neurons.
23
Intuition by Illustration
Weight updating terminates at the last (output) neuron; the complete pass is sketched in code below.
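To tie the illustration together, here is a minimal NumPy sketch of one forward pass, error back-propagation, and weight update for a small fully connected network. It uses a single hidden layer, sigmoid activations, and no biases for brevity; the layer sizes, variable names, and the use of NumPy are illustrative assumptions rather than the exact network drawn in the slides.

```python
import numpy as np

def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))

def backprop_step(x, d, W1, W2, eta=0.5):
    """One training step on a single sample (x: inputs, d: desired output)."""
    # forward pass: input layer -> hidden layer -> output layer
    v1 = W1 @ x            # net inputs of the hidden neurons
    y1 = sigmoid(v1)       # hidden-layer outputs
    v2 = W2 @ y1           # net input of the output neuron
    y2 = sigmoid(v2)       # network output

    # error signal at the output layer: delta = (d - y) * f'(v)
    delta2 = (d - y2) * y2 * (1.0 - y2)
    # propagate the error back to the hidden layer through the same weights
    delta1 = (W2.T @ delta2) * y1 * (1.0 - y1)

    # weight updates: Delta_w = eta * delta * (input to that layer)
    W2 += eta * np.outer(delta2, y1)
    W1 += eta * np.outer(delta1, x)
    return y2, W1, W2

# example: 2 inputs, 3 hidden neurons, 1 output (shapes are illustrative)
rng = np.random.default_rng(0)
W1 = rng.uniform(-0.5, 0.5, size=(3, 2))
W2 = rng.uniform(-0.5, 0.5, size=(1, 3))
y, W1, W2 = backprop_step(np.array([1.0, 0.0]), np.array([1.0]), W1, W2)
```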
24
Some Questions. How often to update: after each training case, or after a full sweep through the training data? How many epochs? How much to update: a fixed or a variable learning rate? Is steepest descent the right method to use? Does it necessarily converge to the global minimum? How long does it take to converge to some minimum? Etc.
25
Batch Mode Training. In batch mode, the weights are updated once per epoch, with the changes accumulated over all P samples. This smooths the effect of outliers in the training samples and makes learning independent of the order of sample presentation, but it is usually slower than sequential (pattern) mode and is sometimes more likely to get stuck in local minima.
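In symbols, using the delta-rule notation from earlier (the superscript p indexes the P training samples), the batch update accumulated over one epoch is:

```latex
\Delta w_{ji} = \eta \sum_{p=1}^{P} \delta_j^{(p)}\, x_i^{(p)}
```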
26
Major Problems of EBP. A constant learning rate causes problems: too small, and convergence is slow; too large, and the minimum is overshot.
27
Steepest-Descent’s Problems
Convergence to local minima rather than the global minimum.
28
Steepest-Descent’s Problems
Slow convergence (zigzag path). One solution: replace steepest descent with the conjugate-gradient method.
29
Modifications to EBP Learning
30
Modifications to EBP Learning
31
Speed It Up: Momentum. Momentum adds a fraction of the previous weight movement to the current movement (gradient descent with momentum).
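The usual form of the momentum update, with momentum coefficient α (the symbol that reappears in the experiment settings later in the deck):

```latex
\Delta w(t) = -\eta\,\nabla E\bigl(w(t)\bigr) + \alpha\,\Delta w(t-1)
```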
32
Speed It Up: Momentum. The direction of the weight change is a combination of the current and previous gradients. Advantage: it reduces the role of outliers (a smoother search), but it does not adjust the learning rate directly (it is an indirect method). Disadvantages: it may result in overshooting, and it does not always reduce the number of iterations.
33
But some problems remain!
Remaining problem: equal learning rates for all weights!
34
Delta-Bar-Delta allows each weight to have its own learning rate and lets the learning rates vary with time. Two heuristics determine the appropriate changes: if a weight changes in the same direction for several time steps, the learning rate for that weight should be increased; if the direction of the weight change alternates, the learning rate should be decreased. Note: these heuristics do not always improve performance.
35
Delta-Bar-Delta: the learning rates increase linearly and decrease exponentially.
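The standard Jacobs delta-bar-delta rule behind this statement, written with the symbols κ, β, ξ that reappear in the experiment settings later in the deck, is:

```latex
\Delta\eta_{ji}(t) =
\begin{cases}
\kappa & \text{if } \bar{\delta}_{ji}(t-1)\,\delta_{ji}(t) > 0 \\
-\beta\,\eta_{ji}(t) & \text{if } \bar{\delta}_{ji}(t-1)\,\delta_{ji}(t) < 0 \\
0 & \text{otherwise}
\end{cases}
\qquad
\bar{\delta}_{ji}(t) = (1-\xi)\,\delta_{ji}(t) + \xi\,\bar{\delta}_{ji}(t-1)
```

Here δ_ji(t) is the current gradient component ∂E/∂w_ji and δ̄_ji(t) is its exponential average: the learning rate grows by a constant κ (linear increase) and shrinks by a factor β (exponential decrease).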
36
Training Samples. The quality and quantity of the training samples determine the quality of the learning results. The samples must represent the problem space well: random sampling, or proportional sampling (using prior knowledge of the problem). How many training patterns are needed? There is no theoretically ideal number. Baum and Haussler (1989) suggest P = W/e, where W is the total number of weights and e is the acceptable classification error rate. If the net can be trained to correctly classify (1 − e/2)P of the P training samples, then the classification accuracy of the net is 1 − e for input patterns drawn from the same sample space.
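A small worked example of that rule of thumb (the numbers are illustrative, not from the slides):

```latex
W = 100 \text{ weights}, \quad e = 0.1
\;\Rightarrow\;
P = \frac{W}{e} = \frac{100}{0.1} = 1000 \text{ training samples}
```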
37
Activation Functions. The sigmoid activation function has saturation regions: when some incoming weights become very large, the input to a node may fall into a saturation region during learning. Possible remedies: use non-saturating activation functions, or periodically normalize all weights.
38
Activation Functions. Another sigmoid function with a slower saturation rate: change the range of the logistic function from (0, 1) to (a, b).
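One common way to write such a range-shifted logistic function (a standard form; the exact expression on the slide is not in the extracted text):

```latex
f(v) = a + \frac{b - a}{1 + e^{-v}}
```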
39
Activation Functions. Change the slope of the logistic function. A larger slope moves quickly into the saturation regions but gives faster convergence; a smaller slope is slow to reach the saturation regions and allows refined weight adjustment, but converges slowly. Solution: an adaptive slope (each node has a learned slope).
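With an explicit slope parameter λ, the logistic function and its derivative become (a standard parameterization, assumed here):

```latex
f(v) = \frac{1}{1 + e^{-\lambda v}}, \qquad f'(v) = \lambda\, f(v)\bigl(1 - f(v)\bigr)
```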
40
Practical Considerations
Many parameters must be carefully selected to ensure good performance. Although the deficiencies of BP nets cannot be completely cured, some of them can be eased by practical means. Two important issues: hidden layers and hidden nodes, and the effect of the initial weights.
41
Hidden Layers & Hidden Nodes
Theoretically, one hidden layer (possibly with many hidden nodes) is sufficient to represent any function, but there are no theoretical results on the minimum necessary number of hidden nodes. Practical rule of thumb, with n the number of input nodes and m the number of hidden nodes: for binary/bipolar data, m = 2n; for real-valued data, m >> 2n. Multiple hidden layers with fewer nodes may train faster for similar quality in some applications.
42
Effect of Initial Weights (and Biases)
Fully random initialization, e.g. in [-0.05, 0.05], [-0.1, 0.1], or [-1, 1]. Problems: small values give slow learning; large values drive nodes into saturation (f'(x) ≈ 0), which also gives slow learning. Alternative: normalize the hidden-layer weights (Widrow). Choose random initial weights for all hidden nodes in [-0.5, 0.5], then for each hidden node j normalize its weight vector (with m the number of input neurons and n the number of hidden nodes), and choose a random value for the bias; a sketch follows below.
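The normalization formula itself is not in the extracted text; the description matches the Nguyen-Widrow initialization, so here is a sketch under that assumption (the scale factor 0.7·n^(1/m) and the bias range are the standard Nguyen-Widrow choices, not confirmed by the slides):

```python
import numpy as np

def nguyen_widrow_init(m, n, rng=None):
    """Initialize hidden-layer weights for m inputs and n hidden nodes."""
    rng = rng or np.random.default_rng()
    beta = 0.7 * n ** (1.0 / m)                 # scale factor
    W = rng.uniform(-0.5, 0.5, size=(n, m))     # random initial weights
    # normalize each hidden node's weight vector to length beta
    W *= beta / np.linalg.norm(W, axis=1, keepdims=True)
    b = rng.uniform(-beta, beta, size=n)        # random biases in [-beta, beta]
    return W, b

W, b = nguyen_widrow_init(m=2, n=3)
```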
43
Effect of Initial Weights (and Biases)
Initialization of the output weights should not result in small weights. If they are small, the contribution of the hidden-layer neurons to the output error is small, and the effect of the hidden-layer weights is not visible enough. If they are small, the deltas of the hidden layer become very small, leading to only small changes in the hidden-layer weights.
44
Now: A Comparison of different BP variants
45
Comparison of different BP variants
The compared versions: the BP pattern-mode learning algorithm, the BP batch-mode learning algorithm, and the BP delta-bar-delta learning algorithm. Problem: classification of breast-cancer data with 9 attributes and 699 examples (458 benign, 241 malignant); 16 instances with missing attribute values were rejected, and the attributes were normalized with respect to their highest value.
46
BP pattern mode results for different η
47
BP pattern mode results for different η
48
BP pattern mode results for different α
49
BP pattern mode results for different α
50
BP pattern mode results for different network structures
51
BP pattern mode results for different range values
52
BP pattern mode results for 9-2-1 net & range [-0.1,0.1]
53
BP pattern mode results for 9-2-1 net & range [-1,1]
54
BP batch mode results for different η and α
55
BP batch mode results for different network structures
56
BP batch mode results for different range values
57
BP batch mode results for 9-3-1 net, η = α = 0.1, range [-1, 1]
58
BP Delta-Bar-Delta results
For the network: α = ξ = 0.1, κ = β = 0.2, 100 training epochs. Range of the random values for the synaptic weights and thresholds: [-0.1, 0.1]. Range for the learning-rate parameters η_ji of the synaptic weights and thresholds: [0, 0.2].
59
BP Delta-Bar-Delta results
60
Conclusions on the Error-Back-Propagation Learning Algorithm
61
Summary of BP Nets. Architecture: multi-layer feed-forward (full connection between nodes in adjacent layers, no connections within a layer), with one or more hidden layers using a non-linear activation function (most commonly sigmoid functions).
62
Summary of BP Nets. The back-propagation learning algorithm is supervised learning. Approach: gradient descent to reduce the total error (which is why it is also called the generalized delta rule), with error terms at the output nodes and error terms at the hidden nodes (which is why it is called error back-propagation). Ways to speed up the learning process: adding momentum terms, adaptive learning rates (delta-bar-delta), and Quickprop.
63
Conclusions
64
Conclusions. Strengths of EBP learning: wide practical applicability, easy to implement, good generalization power, great representation power, etc.
65
Conclusions. Problems of EBP learning: it often takes a long time to converge; the gradient-descent approach only guarantees a local minimum of the error; the learning parameters can only be selected by trial and error; network paralysis may occur (learning stops in the saturation case); and BP learning is non-incremental (to include new training samples, the network must be re-trained with all old and new samples).
66
References
Dilip Sarkar, "Methods to Speed Up Error Back-Propagation Learning Algorithm", ACM Computing Surveys, Vol. 27, No. 4, December 1995.
Sergios Theodoridis and Konstantinos Koutroumbas, Pattern Recognition, 2nd Edition.
Laurene Fausett, Fundamentals of Neural Networks.
M. Jiang, G. Gielen, B. Zhang, and Z. Luo, "Fast Learning Algorithms for Feedforward Neural Networks", Applied Intelligence 18, 37-54, 2003.
Konstantinos Adamopoulos, "Application of Back Propagation Learning Algorithms on Multi-Layer Perceptrons", Final Year Project, Department of Computing, University of Bradford.
And many more related articles.
67
Any questions? Thanks for your attention.