Slide 1
Concept Map for Ch. 3: feedforward networks are nonlayered or layered; the layered case includes the single-layer networks of the earlier chapters (ALC, perceptron; Chs. 1-2) and the multilayer perceptron of this chapter, y = F(x, W) with sigmoid units, which approximates f(x). Learning by backpropagation (BP): {(xi, f(xi)) | i = 1 ~ N} → W, i.e. from the training data find W by gradient descent, minimizing the error E(W) between the actual output and the desired output and updating the old W to a new W (scalar wij or matrix-vector W form).
Slide 2
Chapter 3. Multilayer Perceptron
MLP architecture: an extension of the perceptron to many layers with sigmoidal activation functions, used for real-valued mapping and classification.
Slide 3
Learning: from the discrete training samples, find W* such that the continuous function F(x, W*) ≈ f(x).
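Restated as an optimization (a minimal sketch; the argmin form and the symbol E(W) for the training error follow the concept map rather than this slide):

```latex
% Learning as optimization over the weights W:
% choose W* that minimizes the training error E(W), so that the
% continuous network function matches f on the training samples.
W^{*} = \arg\min_{W} E(W),
\qquad
F(x_i, W^{*}) \approx f(x_i), \quad i = 1, \dots, N
```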
Slide 6
Figure: the sigmoidal activation functions s(uj) – the logistic (output between 0 and 1) and the hyperbolic tangent (output between -1 and 1); a smaller slope gives a flatter curve.
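A minimal numerical sketch of these two activation functions and their derivatives (the slope parameter a and the function names are illustrative assumptions, not taken from the slides):

```python
import numpy as np

def logistic(u, a=1.0):
    """Logistic sigmoid: output in (0, 1); a smaller slope a gives a flatter curve."""
    return 1.0 / (1.0 + np.exp(-a * u))

def tanh_sigmoid(u, a=1.0):
    """Hyperbolic-tangent sigmoid: output in (-1, 1)."""
    return np.tanh(a * u)

# Derivatives expressed through the outputs (these reappear in backpropagation), for a = 1.
def logistic_deriv(y):
    return y * (1.0 - y)      # s'(u) = s(u) (1 - s(u))

def tanh_deriv(y):
    return 1.0 - y ** 2       # s'(u) = 1 - tanh(u)^2

u = np.linspace(-4.0, 4.0, 9)
print(logistic(u))
print(tanh_sigmoid(u, a=0.5))  # the smaller slope saturates more slowly
```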
Slide 7
2. Weight Learning Rule – Backpropagation of Error
Training data {(xi, f(xi))} → weights W: curve (data) fitting (modeling, nonlinear regression). Figure: the NN approximating function plotted against the true function.
(2) Mean squared error E, illustrated for a 1-D function.
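A hedged reconstruction of the mean-squared-error criterion for the 1-D example (the 1/2N normalization and the symbol di for the desired value are common conventions and may differ from the slide's exact form):

```latex
% Mean squared error over N training samples of a 1-D function,
% with desired value d_i = f(x_i) and network output y_i = F(x_i, W).
E(W) = \frac{1}{2N} \sum_{i=1}^{N} (d_i - y_i)^{2}
     = \frac{1}{2N} \sum_{i=1}^{N} \bigl( f(x_i) - F(x_i, W) \bigr)^{2}
```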
Slide 10
(3) Gradient Descent Learning
(4) Learning Curve: the error E{W(n)} along the weight track, plotted against the number of iterations n. Iteration = one scan of the training set (epoch).
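The weight update behind this learning curve can be sketched as follows (the learning-rate symbol η is an assumption; n counts epochs, i.e. scans of the training set):

```latex
% Gradient descent on the training error E(W), with learning rate eta:
W(n+1) = W(n) - \eta \, \frac{\partial E}{\partial W}\bigg|_{W = W(n)},
\qquad n = 0, 1, 2, \dots
% The learning curve plots E\{W(n)\} against the iteration (epoch) number n.
```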
Slide 13
(5) Backpropagation Learning Rule
A. Output layer weights: Δwjk = η δk yj, where δk = (dk − yk) s'(uk) and yj is the output of hidden node j.
B. Inner layer weights: Δwij = η δj xi, where δj = s'(uj) Σk δk wjk (credit assignment).
Features: locality of computation, no centralized control, two-pass (forward and backward).
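A minimal sketch of one backpropagation step for a one-hidden-layer MLP with tanh units, written to mirror rules A and B above (the array shapes, learning rate, and helper names are illustrative assumptions, not the lecture's code):

```python
import numpy as np

rng = np.random.default_rng(0)
eta = 0.1                                           # learning rate

# MLP with n_in inputs, n_hid hidden tanh units, and n_out tanh outputs.
n_in, n_hid, n_out = 2, 3, 2
W1 = rng.normal(scale=0.5, size=(n_hid, n_in))      # inner-layer weights w_ij
W2 = rng.normal(scale=0.5, size=(n_out, n_hid))     # output-layer weights w_jk

def bp_step(x, d):
    """One forward/backward pass and weight update for a single pair (x, d)."""
    global W1, W2
    # Forward pass: function signals.
    u_hid = W1 @ x
    y_hid = np.tanh(u_hid)
    u_out = W2 @ y_hid
    y_out = np.tanh(u_out)
    # Backward pass: error signals.
    delta_out = (d - y_out) * (1.0 - y_out ** 2)         # rule A: (d_k - y_k) s'(u_k)
    delta_hid = (W2.T @ delta_out) * (1.0 - y_hid ** 2)  # rule B: credit assignment
    # Local weight updates (locality of computation, no centralized control).
    W2 += eta * np.outer(delta_out, y_hid)               # Δw_jk = η δ_k y_j
    W1 += eta * np.outer(delta_hid, x)                   # Δw_ij = η δ_j x_i
    return 0.5 * np.sum((d - y_out) ** 2)                # squared error for this sample

print(bp_step(np.array([0.5, -0.3]), np.array([0.9, -0.9])))
```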
Slide 15
Water Flow Analogy to Backpropagation
Figure: an object is dropped into the river at the input ("drop object here") and fetched at the output ("fetch object here"); the many flows of the river (w1, ..., wl) correspond to the many weights. If the error is very sensitive to a weight change, then change that weight a lot, and vice versa → gradient descent, minimum disturbance principle.
Slide 16
(6) Computation Example: MLP(2-1-2)
A. Forward Processing: compute the function signals layer by layer. No desired response is needed for the hidden node h. The activation's derivative must exist, so a sigmoid (tanh or logistic) is used. For classification, set the desired outputs to d = ±0.9 for tanh and d = 0.1, 0.9 for logistic.
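For concreteness, the forward pass of the MLP(2-1-2) example can be written as below (the weight symbols vi for input-to-hidden and wk for hidden-to-output are assumptions chosen to match the figure, and biases are omitted):

```latex
% Forward processing for MLP(2-1-2): 2 inputs, 1 hidden node h, 2 outputs.
u_h = v_1 x_1 + v_2 x_2, \qquad h = s(u_h)
u_k = w_k \, h, \qquad y_k = s(u_k), \qquad k = 1, 2
% s is the sigmoid (tanh or logistic); no desired response is needed for h.
```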
Slide 17
B. Backward Processing: compute the error signals. At each output node, e = d − y; the error is propagated back through the output weights (w21, w22) and summed at the hidden node, then through the input weights (v1, v2). The hidden output h has already been computed in forward processing.
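Continuing the same sketch, the backward pass for MLP(2-1-2) with the symbols assumed above (not necessarily the slide's exact labels):

```latex
% Backward processing for MLP(2-1-2): error signals and weight updates.
e_k = d_k - y_k, \qquad \delta_k = e_k \, s'(u_k), \qquad \Delta w_k = \eta \, \delta_k \, h, \qquad k = 1, 2
\delta_h = s'(u_h) (\delta_1 w_1 + \delta_2 w_2), \qquad \Delta v_i = \eta \, \delta_h \, x_i, \qquad i = 1, 2
```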
Slide 20
If we knew f(x,y), it would be a lot faster to use it to calculate the output than to use the NN.
Slide 24
Student Questions:
Does the output error become more uncertain for a complex multilayer network than for a single-layer one?
Should we use only up to 3 layers?
Why can oscillation occur in the learning curve?
Do we use the old weights for calculating the error signal δ?
What does ANN mean?
Considering the equation for the weight change, which makes more sense: the error gradient or the weight gradient?
What becomes the error signal for training the weights in forward mode?