1 Back-Propagation
2 Objectives A generalization of the LMS algorithm, called backpropagation, can be used to train multilayer networks. Backpropagation is an approximate steepest descent algorithm, in which the performance index is mean square error. In order to calculate the derivatives, we need to use the chain rule of calculus.
3 Motivation The perceptron learning rule and the LMS algorithm were designed to train single-layer perceptron-like networks. They are only able to solve linearly separable classification problems. The multilayer perceptron, trained by the backpropagation algorithm (popularized by the Parallel Distributed Processing group), is currently the most widely used neural network.
4 Three-Layer Network Number of neurons in each layer: R: number of inputs; S^n: number of neurons in layer n.
5 Pattern Classification: XOR gate The limitations of the single-layer perceptron (Minsky & Papert, 1969)
6 Two-Layer XOR Network Two-layer, 2-2-1 network: the two first-layer perceptrons make individual decisions, which are combined by an AND operation in the second layer.
7 Solved Problem P11.1 Design a multilayer network to distinguish these categories. Class I Class II There is no hyperplane that can separate these two categories.
8 Solution of Problem P11.1 AND OR
9 Function Approximation Two-layer, 1-2-1 network
10 Function Approximation The centers of the steps occur where the net input to a neuron in the first layer is zero. The steepness of each step can be adjusted by changing the network weights.
11 Effect of Parameter Changes
12 Effect of Parameter Changes
13 Effect of Parameter Changes
14 Effect of Parameter Changes
15 Function Approximation Two-layer networks, with sigmoid transfer functions in the hidden layer and linear transfer functions in the output layer, can approximate virtually any function of interest to any degree of accuracy, provided sufficiently many hidden units are available.
16 Backpropagation Algorithm For multilayer networks the outputs of one layer become the inputs to the following layer.
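The layer-to-layer recursion can be written compactly (a sketch using the textbook-style notation, with f^m, W^m, and b^m the transfer function, weight matrix, and bias vector of layer m, and M the number of layers):

    a^0 = p
    a^{m+1} = f^{m+1}( W^{m+1} a^m + b^{m+1} ),   m = 0, 1, ..., M-1
    a = a^M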
17 Performance Index Training Set: Mean Square Error: Vector Case: Approximate Mean Square Error: Approximate Steepest Descent Algorithm
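For reference, the quantities listed above can be written out as follows (a sketch, where x denotes the vector of all weights and biases and k indexes the training iteration):

    Training set: {p_1, t_1}, {p_2, t_2}, ..., {p_Q, t_Q}
    Mean square error: F(x) = E[ e^2 ] = E[ (t - a)^2 ]
    Vector case: F(x) = E[ e^T e ] = E[ (t - a)^T (t - a) ]
    Approximate mean square error: F̂(x) = e^T(k) e(k) = (t(k) - a(k))^T (t(k) - a(k))
    Approximate steepest descent: w_{i,j}^m(k+1) = w_{i,j}^m(k) - α ∂F̂/∂w_{i,j}^m,   b_i^m(k+1) = b_i^m(k) - α ∂F̂/∂b_i^m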
18 Chain Rule If f(n) = e^n and n = 2w, then f(n(w)) = e^{2w}. Approximate mean square error:
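The chain rule then gives the derivative with respect to w (a one-line worked example for the functions above):

    d f(n(w)) / dw = (df/dn) · (dn/dw) = e^n · 2 = 2 e^{2w}

Applied to the approximate mean square error, the same idea yields

    ∂F̂/∂w_{i,j}^m = (∂F̂/∂n_i^m) · (∂n_i^m/∂w_{i,j}^m),   ∂F̂/∂b_i^m = (∂F̂/∂n_i^m) · (∂n_i^m/∂b_i^m)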
19 Sensitivity & Gradient The net input to the ith neuron of layer m: The sensitivity of F̂ to changes in the ith element of the net input at layer m: Gradient:
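Written out (a sketch; n_i^m is the net input and s_i^m the sensitivity of neuron i in layer m):

    n_i^m = Σ_j w_{i,j}^m a_j^{m-1} + b_i^m,  so  ∂n_i^m/∂w_{i,j}^m = a_j^{m-1}  and  ∂n_i^m/∂b_i^m = 1
    s_i^m ≡ ∂F̂/∂n_i^m
    Gradient: ∂F̂/∂w_{i,j}^m = s_i^m a_j^{m-1},   ∂F̂/∂b_i^m = s_i^m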
20 Steepest Descent Algorithm The steepest descent algorithm for the approximate mean square error: Matrix form: The sensitivity vector of layer m is

    s^m = ∂F̂/∂n^m = [ ∂F̂/∂n_1^m ;  ∂F̂/∂n_2^m ;  ... ;  ∂F̂/∂n_{S^m}^m ]
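In terms of these sensitivities, the approximate steepest descent updates take the matrix form used later in the summary:

    W^m(k+1) = W^m(k) - α s^m (a^{m-1})^T
    b^m(k+1) = b^m(k) - α s^m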
21 Backpropagating the Sensitivity Backpropagation: a recurrence relationship in which the sensitivity at layer m is computed from the sensitivity at layer m+1. Jacobian matrix:
22 Matrix Representation The i,j element of the Jacobian matrix:
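A sketch of that element, using the definitions above (ḟ^m denotes the derivative of the layer-m transfer function):

    ∂n_i^{m+1}/∂n_j^m = w_{i,j}^{m+1} ḟ^m(n_j^m),   so in matrix form   ∂n^{m+1}/∂n^m = W^{m+1} Ḟ^m(n^m),
    where Ḟ^m(n^m) = diag( ḟ^m(n_1^m), ..., ḟ^m(n_{S^m}^m) )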
23 Recurrence Relation The recurrence relation for the sensitivity The sensitivities are propagated backward through the network from the last layer to the first layer.
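Explicitly, the recurrence is

    s^m = Ḟ^m(n^m) (W^{m+1})^T s^{m+1},   m = M-1, ..., 2, 1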
24 Backpropagation Algorithm At the final layer:
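The starting point of the recurrence, at the final layer M, follows directly from F̂ = (t - a)^T (t - a):

    s^M = -2 Ḟ^M(n^M) (t - a)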
25 Summary The first step is to propagate the input forward through the network: The second step is to propagate the sensitivities backward through the network: Output layer: Hidden layer: The final step is to update the weights and biases:
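The three steps map directly onto code. Below is a minimal NumPy sketch (not taken from the slides) of one training iteration for a two-layer network with a log-sigmoid hidden layer and a linear output layer; the names logsig, backprop_step, W1, b1, W2, b2, and alpha are illustrative choices.

    import numpy as np

    def logsig(n):
        # log-sigmoid transfer function
        return 1.0 / (1.0 + np.exp(-n))

    def backprop_step(p, t, W1, b1, W2, b2, alpha=0.1):
        # Step 1: propagate the input forward through the network.
        a0 = np.atleast_2d(p)          # a^0 = p
        a1 = logsig(W1 @ a0 + b1)      # hidden layer output
        a2 = W2 @ a1 + b2              # linear output layer, a = a^2
        e = t - a2                     # error

        # Step 2: propagate the sensitivities backward.
        s2 = -2.0 * e                  # s^2 = -2 F'^2(n^2)(t - a), F'^2 = 1 for a linear layer
        s1 = (1.0 - a1) * a1 * (W2.T @ s2)  # s^1 = F'^1(n^1) (W^2)^T s^2, logsig derivative via a1

        # Step 3: update the weights and biases (approximate steepest descent).
        W2 = W2 - alpha * s2 @ a1.T
        b2 = b2 - alpha * s2
        W1 = W1 - alpha * s1 @ a0.T
        b1 = b1 - alpha * s1
        return W1, b1, W2, b2, e.item()

Repeatedly calling backprop_step with random p in [-2, 2] and target t = 1 + sin(π/4 · p) reproduces the function-approximation experiment sketched on the following slides.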
26 BP Neural Network
27 Ex: Function Approximation 1-2-1 network; the input p produces an output that is compared with the target t to form the error e.
28 Network Architecture 1-2-1 network with input p and output a.
29 G(p) = 1 + sin(π/4 · p) for -2 ≤ p ≤ 2; the training input is p = 1.
30 Initial Values Initial Network Response:
31 Forward Propagation Initial input: Output of the 1st layer: Output of the 2nd layer: error:
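A symbolic sketch of these computations (log-sigmoid hidden layer, linear output layer, as stated earlier in the deck; the specific initial weights are those shown on the preceding slide):

    a^0 = p = 1
    a^1 = logsig( W^1 a^0 + b^1 ) = 1 / (1 + e^{-(W^1 a^0 + b^1)})
    a^2 = purelin( W^2 a^1 + b^2 ) = W^2 a^1 + b^2
    e = t - a^2 = (1 + sin(π/4 · p)) - a^2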
32 Transfer Func. Derivatives
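For the two transfer functions used in this example, the derivatives can be expressed in terms of the layer outputs:

    ḟ^1(n) = d/dn [ 1/(1 + e^{-n}) ] = (1 - a^1)(a^1)    (log-sigmoid)
    ḟ^2(n) = d/dn [ n ] = 1                              (linear)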
33 Backpropagation The second layer sensitivity: The first layer sensitivity:
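Symbolically, using the recurrence from the earlier slides:

    s^2 = -2 Ḟ^2(n^2) (t - a) = -2 (1) (t - a^2)
    s^1 = Ḟ^1(n^1) (W^2)^T s^2,   with Ḟ^1(n^1) = diag( (1 - a_1^1) a_1^1 , (1 - a_2^1) a_2^1 )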
34 Weight Update Learning rate
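With learning rate α, the updates for this first iteration are (sketch):

    W^2(1) = W^2(0) - α s^2 (a^1)^T,   b^2(1) = b^2(0) - α s^2
    W^1(1) = W^1(0) - α s^1 (a^0)^T,   b^1(1) = b^1(0) - α s^1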
35 Example
36 Choice of Network Structure Multilayer networks can be used to approximate almost any function, if we have enough neurons in the hidden layers. We cannot say, in general, how many layers or how many neurons are necessary for adequate performance.
37 Illustrated Example 1 1-3-1 Network
38 Illustrated Example 2 1-2-1, 1-3-1, 1-4-1, and 1-5-1 networks
39 Convergence Two trajectories are shown: convergence to the global minimum and convergence to a local minimum. The numbers along each curve (0 through 5) indicate the sequence of iterations.
40 Generalization In most cases the multilayer network is trained with a finite number of examples of proper network behavior: This training set is normally representative of a much larger class of possible input/output pairs. Can the network successfully generalize what it has learned to the total population?
41 Generalization Example Comparison of a 1-2-1 network (generalizes well) and a 1-9-1 network (does not generalize well). For a network to be able to generalize, it should have fewer parameters than there are data points in the training set.
42 Objectives Neural networks trained in a supervised manner require a target signal to define correct network behavior. Unsupervised learning rules give networks the ability to learn associations between patterns that occur together frequently. Associative learning allows networks to perform useful tasks such as pattern recognition (instar) and recall (outstar).
43 What is an Association? An association is any link between a system’s input and output such that when a pattern A is presented to the system it will respond with pattern B. When two patterns are linked by an association, the input pattern is referred to as the stimulus and the output pattern is referred to as the response.
44 Classic Experiment Ivan Pavlov He trained a dog to salivate at the sound of a bell, by ringing the bell whenever food was presented. When the bell is repeatedly paired with the food, the dog is conditioned to salivate at the sound of the bell, even when no food is present. B. F. Skinner He trained a rat to press a bar in order to obtain a food pellet.
45 Associative Learning Anderson and Kohonen independently developed the linear associator in the late 1960s and early 1970s. Grossberg introduced nonlinear continuous-time associative networks during the same time period.
46 Simple Associative Network Single-Input Hard Limit Associator Restrict the value of p to be either 0 or 1, indicating whether a stimulus is absent or present. The output a indicates the presence or absence of the network’s response.
47 Two Types of Inputs Unconditioned Stimulus: analogous to the food presented to the dog in Pavlov’s experiment. Conditioned Stimulus: analogous to the bell in Pavlov’s experiment. The dog salivates only when food is presented; this is an innate response that does not have to be learned.
48 Banana Associator An unconditioned stimulus (banana shape) and a conditioned stimulus (banana smell). Initially the network responds to the shape of a banana, but not to the smell; training should associate the smell with the response as well.
49 Associative Learning Both animals and humans tend to associate things that occur simultaneously. If a banana smell stimulus occurs simultaneously with a banana concept response (activated by some other stimulus such as the sight of a banana shape), the network should strengthen the connection between them so that later it can activate its banana concept in response to the banana smell alone.
50 Unsupervised Hebb Rule Increase the weight w_ij between a neuron’s input p_j and output a_i in proportion to their product: The Hebb rule uses only signals available within the layer containing the weight being updated, so it is a local learning rule. Vector form: Learning is performed in response to the training sequence.
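Written out (a sketch, with α the learning rate and q indexing presentations of the training sequence p(1), p(2), ..., p(Q)):

    w_ij(q) = w_ij(q-1) + α a_i(q) p_j(q)
    Vector form: W(q) = W(q-1) + α a(q) p^T(q)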
51 Ex: Banana Associator Initial weights: Training sequence: Learning rule: (Network diagram: banana shape (sight) and banana smell inputs to the fruit network; output: banana?)
52 Ex: Banana Associator First iteration (sight fails): (no response) Second iteration (sight works): (banana)
53 Ex: Banana Associator Third iteration (sight fails): (banana) From now on, the network is capable of responding to bananas that are detected by either sight or smell. Even if both detection systems suffer intermittent faults, the network will be correct most of the time.
54 Problems of Hebb Rule Weights will become arbitrarily large, yet synapses cannot grow without bound. There is no mechanism for weights to decrease. If the inputs or outputs of a Hebb network experience any noise, every weight will grow (however slowly) until the network responds to any stimulus.
55 Hebb Rule with Decay The decay rate γ is a positive constant less than one. This keeps the weight matrix from growing without bound; the maximum weight value, which can be found by setting both a_i and p_j to 1, is determined by the decay rate γ.
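Explicitly (a sketch; the decay term subtracts a fraction γ of the old weight at every step):

    W(q) = W(q-1) + α a(q) p^T(q) - γ W(q-1)
    Setting a_i = p_j = 1 at steady state:  w_ij^max = (1 - γ) w_ij^max + α   ⇒   w_ij^max = α / γ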
56 Ex: Banana Associator First iteration (sight fails): no response Second iteration (sight works): banana Third iteration (sight fails): banana
57 Ex: Banana Associator Hebb Rule vs. Hebb Rule with Decay
58 Prob. of Hebb Rule with Decay Associations will decay away if stimuli are not occasionally presented. If a_i = 0, then w_ij(q) = (1 - γ) w_ij(q-1). If γ = 0.1, this reduces to w_ij(q) = 0.9 w_ij(q-1). The weight decays by 10% at each iteration for which a_i = 0 (no stimulus).
59 Instar (Recognition Network) A neuron that has a vector input and a scalar output is referred to as an instar. This neuron is capable of pattern recognition. The instar is similar to the perceptron, the ADALINE, and the linear associator.
60 Instar Operation Input-output expression: a = hardlim( w^T p + b ). The instar is active when w^T p ≥ -b, or equivalently ||w|| ||p|| cos θ ≥ -b, where θ is the angle between the weight vector and the input vector. For fixed vector lengths, the inner product is maximized when the angle is 0. Assume that all input vectors have the same length (norm).
61 Vector Recognition If b = -||w|| ||p||, then the instar will only be active when θ = 0. If b > -||w|| ||p||, then the instar will be active for a range of angles. The larger the value of b, the more patterns there are that can activate the instar, making it less discriminatory. Forgetting problem in the Hebb rule with decay: it requires stimuli to be repeated, or associations will be lost.
62 Instar Rule Hebb rule: Hebb rule with decay: Instar rule: to overcome the forgetting problem, a decay term proportional to a_i(q) is added, so that weight decay occurs only when the instar is active (a_i ≠ 0). If the decay rate is set equal to the learning rate, the rule simplifies as shown below.
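A sketch of the three rules for comparison (α is the learning rate, γ the decay rate):

    Hebb rule:             w_ij(q) = w_ij(q-1) + α a_i(q) p_j(q)
    Hebb rule with decay:  w_ij(q) = (1 - γ) w_ij(q-1) + α a_i(q) p_j(q)
    Instar rule:           w_ij(q) = w_ij(q-1) + α a_i(q) p_j(q) - γ a_i(q) w_ij(q-1)
    With γ = α:            w_ij(q) = w_ij(q-1) + α a_i(q) ( p_j(q) - w_ij(q-1) ),
    or, in vector form for the ith row of W:  w_i(q) = w_i(q-1) + α a_i(q) ( p(q) - w_i(q-1) )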
63 Graphical Representation For the case where the instar is active (a_i = 1), w_i(q) = w_i(q-1) + α ( p(q) - w_i(q-1) ): the weight vector moves toward the input vector. For the case where the instar is inactive (a_i = 0), w_i(q) = w_i(q-1): the weight vector is unchanged.
64 Ex: Orange Recognizer The elements of p will be constrained to ±1 values. (Network diagram: sight and measurement inputs to the fruit network; output: orange?) b = -2, a value slightly more positive than -||p||^2 = -3.
65 Initialization & Training Initial weights: The instar rule (α = 1): Training sequence: First iteration:
66 Second Training Iteration Second iteration: The network can now recognize the orange by its measurements.
67 Third Training Iteration Third iteration: Orange will now be detected if either set of sensors works.
68 P13.5 Consider the instar network shown in slide 64. The training sequence for this network will consist of the following inputs: {p0(1) = 0, p(1) = …}, {p0(2) = 1, p(2) = …}. These two sets of inputs are repeatedly presented to the network until the weight matrix W converges. 1. Perform the first four iterations of the instar rule, with learning rate α = 0.5. Assume that the initial weight matrix W is set to all zeros. 2. Display the results of each iteration of the instar rule in graphical form.
69 Kohonen Rule Kohonen rule: Learning occurs when the neuron’s index i is a member of the set X(q). The Kohonen rule can be made equivalent to the instar rule by defining X(q) as the set of all i such that a_i(q) = 1. The Kohonen rule allows the weights of a neuron to learn an input vector and is therefore suitable for recognition applications.
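Written out (sketch):

    w_i(q) = w_i(q-1) + α ( p(q) - w_i(q-1) ),   for i ∈ X(q)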
70 Outstar (Recall Network) The outstar network has a scalar input and a vector output. It can perform pattern recall by associating a stimulus with a vector response.
71 Outstar Operation Input-output expression: a = satlins(Wp). If we would like the outstar network to associate a stimulus (an input of 1) with a particular output vector a*, set W = a*. Then, if p = 1, a = satlins(Wp) = satlins(a* p) = a*, and the pattern is correctly recalled. The column of the weight matrix represents the pattern to be recalled.
72 Outstar Rule In the instar rule, the weight decay term of the Hebb rule is proportional to the output of the network, a_i. In the outstar rule, the weight decay term of the Hebb rule is proportional to the input of the network, p_j. If the decay rate γ equals the learning rate α, learning occurs whenever p_j is nonzero (instead of a_i). When learning occurs, column w_j moves toward the output vector (complementary to the instar rule).
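A sketch of the resulting rule, element-wise and for column j of the weight matrix:

    w_ij(q) = w_ij(q-1) + α ( a_i(q) - w_ij(q-1) ) p_j(q)
    w_j(q) = w_j(q-1) + α ( a(q) - w_j(q-1) ) p_j(q)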
73 Ex: Pineapple Recaller Any set of p0 (with ±1 values) will be copied to a. (Network diagram: sight and measurement inputs to the fruit network; output: measurements?)
74 Initialization The outstar rule (α = 1): Training sequence: Pineapple measurements:
75 First Training Iteration First iteration:
76 Second Training Iteration Second iteration: The network forms an association between the sight and the measurements.
77 Third Training Iteration Third iteration: Even if the measurement system fails, the network is now able to recall the measurements of the pineapple when it sees it.