Slide 1
Concept Map for Ch. 3: feedforward networks are nonlayered or layered; the layered case includes the single-layer networks of the earlier chapters (ALC, perceptron; Chs. 1-2) and the multilayer perceptron of this chapter, y = F(x, W) with sigmoid units, which approximates f(x). Learning by backpropagation (BP): {(xi, f(xi)) | i = 1 ~ N} → W, i.e. from the training data find W by gradient descent, minimizing the error E(W) between the actual output and the desired output and updating the old W to a new W (scalar wij or matrix-vector W form).
Slide 2
Chapter 3. Multilayer Perceptron
MLP architecture: an extension of the perceptron to many layers with sigmoidal activation functions, used for real-valued mapping and classification.
Slide 3
Learning: from the discrete training samples, find W* such that the continuous function F(x, W*) ≈ f(x).
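Restated as an optimization (a minimal sketch; the argmin form and the symbol E(W) for the training error follow the concept map rather than this slide):

```latex
% Learning as optimization over the weights W:
% choose W* that minimizes the training error E(W), so that the
% continuous network function matches f on the training samples.
W^{*} = \arg\min_{W} E(W),
\qquad
F(x_i, W^{*}) \approx f(x_i), \quad i = 1, \dots, N
```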
Slide 6
Figure: the sigmoidal activation functions s(uj) – the logistic (output between 0 and 1) and the hyperbolic tangent (output between -1 and 1); a smaller slope gives a flatter curve.
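A minimal numerical sketch of these two activation functions and their derivatives (the slope parameter a and the function names are illustrative assumptions, not taken from the slides):

```python
import numpy as np

def logistic(u, a=1.0):
    """Logistic sigmoid: output in (0, 1); a smaller slope a gives a flatter curve."""
    return 1.0 / (1.0 + np.exp(-a * u))

def tanh_sigmoid(u, a=1.0):
    """Hyperbolic-tangent sigmoid: output in (-1, 1)."""
    return np.tanh(a * u)

# Derivatives expressed through the outputs (these reappear in backpropagation), for a = 1.
def logistic_deriv(y):
    return y * (1.0 - y)      # s'(u) = s(u) (1 - s(u))

def tanh_deriv(y):
    return 1.0 - y ** 2       # s'(u) = 1 - tanh(u)^2

u = np.linspace(-4.0, 4.0, 9)
print(logistic(u))
print(tanh_sigmoid(u, a=0.5))  # the smaller slope saturates more slowly
```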
Slide 7
2. Weight Learning Rule – Backpropagation of Error
Training data {(xi, f(xi))} → weights W: curve (data) fitting (modeling, nonlinear regression). Figure: the NN approximating function plotted against the true function.
(2) Mean squared error E, illustrated for a 1-D function.
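A hedged reconstruction of the mean-squared-error criterion for the 1-D example (the 1/2N normalization and the symbol di for the desired value are common conventions and may differ from the slide's exact form):

```latex
% Mean squared error over N training samples of a 1-D function,
% with desired value d_i = f(x_i) and network output y_i = F(x_i, W).
E(W) = \frac{1}{2N} \sum_{i=1}^{N} (d_i - y_i)^{2}
     = \frac{1}{2N} \sum_{i=1}^{N} \bigl( f(x_i) - F(x_i, W) \bigr)^{2}
```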
Slide 10
(3) Gradient Descent Learning
(4) Learning Curve: the error E{W(n)} along the weight track, plotted against the number of iterations n. Iteration = one scan of the training set (epoch).
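The weight update behind this learning curve can be sketched as follows (the learning-rate symbol η is an assumption; n counts epochs, i.e. scans of the training set):

```latex
% Gradient descent on the training error E(W), with learning rate eta:
W(n+1) = W(n) - \eta \, \frac{\partial E}{\partial W}\bigg|_{W = W(n)},
\qquad n = 0, 1, 2, \dots
% The learning curve plots E\{W(n)\} against the iteration (epoch) number n.
```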
Slide 13
(5) Backpropagation Learning Rule
A. Output layer weights: Δwjk = η δk yj, where δk = (dk − yk) s'(uk) and yj is the output of hidden node j.
B. Inner layer weights: Δwij = η δj xi, where δj = s'(uj) Σk δk wjk (credit assignment).
Features: locality of computation, no centralized control, two-pass (forward and backward).
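A minimal sketch of one backpropagation step for a one-hidden-layer MLP with tanh units, written to mirror rules A and B above (the array shapes, learning rate, and helper names are illustrative assumptions, not the lecture's code):

```python
import numpy as np

rng = np.random.default_rng(0)
eta = 0.1                                           # learning rate

# MLP with n_in inputs, n_hid hidden tanh units, and n_out tanh outputs.
n_in, n_hid, n_out = 2, 3, 2
W1 = rng.normal(scale=0.5, size=(n_hid, n_in))      # inner-layer weights w_ij
W2 = rng.normal(scale=0.5, size=(n_out, n_hid))     # output-layer weights w_jk

def bp_step(x, d):
    """One forward/backward pass and weight update for a single pair (x, d)."""
    global W1, W2
    # Forward pass: function signals.
    u_hid = W1 @ x
    y_hid = np.tanh(u_hid)
    u_out = W2 @ y_hid
    y_out = np.tanh(u_out)
    # Backward pass: error signals.
    delta_out = (d - y_out) * (1.0 - y_out ** 2)         # rule A: (d_k - y_k) s'(u_k)
    delta_hid = (W2.T @ delta_out) * (1.0 - y_hid ** 2)  # rule B: credit assignment
    # Local weight updates (locality of computation, no centralized control).
    W2 += eta * np.outer(delta_out, y_hid)               # Δw_jk = η δ_k y_j
    W1 += eta * np.outer(delta_hid, x)                   # Δw_ij = η δ_j x_i
    return 0.5 * np.sum((d - y_out) ** 2)                # squared error for this sample

print(bp_step(np.array([0.5, -0.3]), np.array([0.9, -0.9])))
```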
Slide 15
Water Flow Analogy to Backpropagation
Figure: an object is dropped into the river at the input ("drop object here") and fetched at the output ("fetch object here"); the many flows of the river (w1, ..., wl) correspond to the many weights. If the error is very sensitive to a weight change, then change that weight a lot, and vice versa → gradient descent, minimum disturbance principle.
Slide 16
(6) Computation Example: MLP(2-1-2)
A. Forward Processing: compute the function signals layer by layer. No desired response is needed for the hidden node h. The activation's derivative must exist, so a sigmoid (tanh or logistic) is used. For classification, set the desired outputs to d = ±0.9 for tanh and d = 0.1, 0.9 for logistic.
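For concreteness, the forward pass of the MLP(2-1-2) example can be written as below (the weight symbols vi for input-to-hidden and wk for hidden-to-output are assumptions chosen to match the figure, and biases are omitted):

```latex
% Forward processing for MLP(2-1-2): 2 inputs, 1 hidden node h, 2 outputs.
u_h = v_1 x_1 + v_2 x_2, \qquad h = s(u_h)
u_k = w_k \, h, \qquad y_k = s(u_k), \qquad k = 1, 2
% s is the sigmoid (tanh or logistic); no desired response is needed for h.
```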
Slide 17
B. Backward Processing: compute the error signals. At each output node, e = d − y; the error is propagated back through the output weights (w21, w22) and summed at the hidden node, then through the input weights (v1, v2). The hidden output h has already been computed in forward processing.
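Continuing the same sketch, the backward pass for MLP(2-1-2) with the symbols assumed above (not necessarily the slide's exact labels):

```latex
% Backward processing for MLP(2-1-2): error signals and weight updates.
e_k = d_k - y_k, \qquad \delta_k = e_k \, s'(u_k), \qquad \Delta w_k = \eta \, \delta_k \, h, \qquad k = 1, 2
\delta_h = s'(u_h) (\delta_1 w_1 + \delta_2 w_2), \qquad \Delta v_i = \eta \, \delta_h \, x_i, \qquad i = 1, 2
```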
Slide 20
If we knew f(x,y), it would be a lot faster to use it to calculate the output than to use the NN.
Slide 24
Student Questions:
Does the output error become more uncertain for a complex multilayer network than for a single-layer one?
Should we use only up to 3 layers?
Why can oscillation occur in the learning curve?
Do we use the old weights for calculating the error signal δ?
What does ANN mean?
Considering the equation for the weight change, which makes more sense: the error gradient or the weight gradient?
What becomes the error signal for training the weights in forward mode?