Neural Networks – lecture 5: Multi-layer neural networks (presentation transcript)

Slide 1. Multi-layer neural networks
Outline:
- Motivation
- Choosing the architecture
- Functioning: the FORWARD algorithm
- Neural networks as universal approximators
- The Backpropagation algorithm

Slide 2. Multi-layer neural networks – Motivation
- One-layer neural networks have a limited approximation capacity.
- Example: XOR (the parity function) cannot be represented by a one-layer network, but it can be represented by a two-layer network.
- The slide illustrates two different architectures that solve the same problem; one concrete two-layer realisation of XOR is sketched below.
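A minimal sketch of one such two-layer solution, using threshold units. The particular weights and thresholds below are illustrative choices, not values taken from the slide:

    import numpy as np

    def step(x):
        return (x >= 0).astype(float)       # Heaviside threshold activation

    def xor_net(x1, x2):
        x = np.array([x1, x2])
        # Hidden layer: unit 1 computes "x1 OR x2", unit 2 computes "x1 AND x2"
        W1 = np.array([[1.0, 1.0],          # OR unit
                       [1.0, 1.0]])         # AND unit
        b1 = np.array([-0.5, -1.5])
        h = step(W1 @ x + b1)
        # Output layer: OR AND NOT(AND) -> XOR
        w2 = np.array([1.0, -2.0])
        b2 = -0.5
        return step(w2 @ h + b2)

    for a in (0, 1):
        for b in (0, 1):
            print(a, b, int(xor_net(a, b)))  # prints the XOR truth table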

Slide 3. Choosing the architecture
Consider an association problem (classification or approximation) characterized by:
- N input values
- M output values
The neural network will have:
- N input units
- M output units
How many hidden units? This is a difficult problem. Heuristic hint: use as few hidden units as possible.
Example: one hidden layer, with the number of units chosen heuristically (the formula is given on the slide).

Slide 4. Architecture and notations
Feedforward network with K layers:
- Layer 0 is the input layer; its output is the input signal itself: Y_0 = X_0.
- Layers 1, ..., K-1 are the hidden layers; layer K is the output layer.
- Layer k (k = 1, ..., K) is characterized by its weight matrix W_k, its aggregated input X_k, its output Y_k and its activation function F_k.

Slide 5. Functioning – computation of the output vector
FORWARD algorithm (propagation of the input signal toward the output layer):

    Y[0] := X                    (X is the input signal)
    FOR k := 1, K DO
        X[k] := W[k] * Y[k-1]
        Y[k] := F(X[k])
    ENDFOR

Remark: Y[K] is the output of the network.
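A minimal NumPy sketch of this FORWARD pass, assuming the network is stored as a Python list of weight matrices (weights[k-1] playing the role of W[k]) and a single sigmoidal activation for all layers; biases are omitted, as in the pseudocode above:

    import numpy as np

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    def forward(weights, x, activation=sigmoid):
        """FORWARD algorithm: propagate the input through layers 1..K."""
        y = x                       # Y[0] := X
        for W in weights:           # k = 1, ..., K
            x_k = W @ y             # X[k] := W[k] * Y[k-1]
            y = activation(x_k)     # Y[k] := F(X[k])
        return y                    # Y[K] is the output of the network

    # Usage example: 3 inputs, 4 hidden units, 2 outputs (random weights)
    rng = np.random.default_rng(0)
    weights = [rng.normal(size=(4, 3)), rng.normal(size=(2, 4))]
    print(forward(weights, np.array([0.5, -1.0, 2.0])))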

Slide 6. A particular case: one hidden layer
Adaptive parameters: the weight matrices W1 and W2.

Slide 7. Neural networks – universal approximators
Theoretical result: any continuous function T : D ⊆ R^N -> R^M can be approximated with arbitrary accuracy by a neural network with the following architecture:
- N input units
- M output units
- "enough" hidden units, with monotonically increasing and bounded activation functions (e.g. sigmoidal functions)
The accuracy of the approximation depends on the number of hidden units.
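For one hidden layer and a single output (M = 1) the approximating network has the familiar form below; this is the standard textbook statement of the result, written here for concreteness rather than copied from the slide:

    \[
      g(x) \;=\; \sum_{j=1}^{H} v_j\,\sigma\!\Bigl(\sum_{i=1}^{N} w_{ji} x_i + b_j\Bigr),
      \qquad
      \sup_{x \in D} \lvert T(x) - g(x) \rvert < \varepsilon ,
    \]
    where $H$ is the number of hidden units, $\sigma$ is a bounded, monotonically increasing
    activation function (e.g. the sigmoid), and $v_j, w_{ji}, b_j$ are the network parameters.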

Slide 8. Neural networks – universal approximators
Typical problems when solving approximation tasks with neural networks:
- Representation problem: "can the network represent the desired function?" (see the previous result)
- Learning problem: "is it possible to find values for the adaptive parameters such that the desired function is approximated with the desired accuracy?" (a training set and a learning algorithm are needed)
- Generalization problem: "is the neural network able to extrapolate the knowledge extracted from the training set?" (the training process should be carefully controlled in order to avoid overtraining and to enhance the generalization ability)

Slide 9. Neural networks – universal approximators
Applications which can be interpreted as association (approximation) problems:
- Classification problems (association between a pattern and a class label)
  Architecture: input size = pattern size; output size = number of classes; hidden layer size = problem dependent
- Prediction problems (estimate the next value of a time series from a set of previous values)
  Architecture: input size = number of previous values (predictors); output size = 1 (one-dimensional prediction); hidden layer size = problem dependent
  Example: y(t) = T(y(t-1), y(t-2), ..., y(t-N)); a sketch of how the corresponding training set is built follows below.
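A small sketch of how such a prediction training set can be built from a one-dimensional series; the helper name and the toy series are illustrative assumptions, not taken from the slide:

    import numpy as np

    def windowed_training_set(series, N):
        """Build (input, target) pairs for prediction: each input consists of
        N consecutive values, the target is the value that follows them."""
        X = np.array([series[t - N:t] for t in range(N, len(series))])
        d = np.array([series[t] for t in range(N, len(series))])
        return X, d

    series = np.sin(np.linspace(0, 10, 200))   # toy time series
    X, d = windowed_training_set(series, N=5)
    print(X.shape, d.shape)                    # (195, 5) (195,)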

Slide 10. Neural networks – universal approximators
Compression problems (compress and decompress vectorial data):
- The network maps input data -> compressed data (hidden layer) -> output data, using the weight matrices W1 (compression) and W2 (decompression).
- Input size = output size; hidden layer size = input size * compression ratio.
- Example: for a compression ratio of 1:2 the hidden layer has half the size of the input layer.
- Training set: {(X_1, X_1), ..., (X_L, X_L)} (each vector is its own target).
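A minimal sketch of the shapes involved for a 1:2 compression ratio; the random weights and data and the tanh activation are illustrative assumptions only:

    import numpy as np

    n_in = 8                      # input size = output size
    n_hidden = n_in // 2          # 1:2 compression ratio -> half the input size
    rng = np.random.default_rng(1)

    W1 = rng.normal(size=(n_hidden, n_in))    # compression weights
    W2 = rng.normal(size=(n_in, n_hidden))    # decompression weights

    X = rng.normal(size=n_in)     # a training vector; its target is X itself
    code = np.tanh(W1 @ X)        # compressed representation (hidden layer)
    X_hat = W2 @ code             # reconstruction; training minimizes ||X - X_hat||
    print(code.shape, X_hat.shape)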

Slide 11. Learning process
- Learning is based on minimizing an error function.
- Training set: {(x^1, d^1), ..., (x^L, d^L)}
- Error function (one hidden layer): given as a formula on the slide (a standard reconstruction is shown below).
- Aim of the learning process: find the weights W which minimize the error function.
- Minimization method: the gradient method.
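The formula itself is an image in the transcript; a standard sum-of-squared-errors form consistent with the notation of the following slides (a reconstruction, not a verbatim copy) is:

    \[
      E(W) \;=\; \sum_{l=1}^{L} E_l(W),
      \qquad
      E_l(W) \;=\; \frac{1}{2} \sum_{i=1}^{M} \bigl( d_i^{\,l} - y_i^{\,l} \bigr)^{2},
    \]
    where $y^{l}$ is the network output for the input $x^{l}$ and $d^{l}$ is the corresponding desired output.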

Slide 12. Learning process – gradient-based adjustment
Each weight is adjusted in the direction opposite to the gradient of the per-example error:

    w_{ik} := w_{ik} − η · ∂E_l(W)/∂w_{ik}

where w_{ik} is the weight between unit k (with output y_k) and unit i (with aggregated input x_i and output y_i), and η > 0 is the learning rate.

Slide 13. Learning process – computation of the partial derivatives
The partial derivatives of E_l(W) with respect to the weights are obtained by applying the chain rule layer by layer, using the quantities x_i, y_i of each unit; the resulting formulas define the error signals ("deltas") used by the Backpropagation algorithm (a standard reconstruction is given below).
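The formulas on this slide are images in the transcript; their standard textbook form, written in the notation of slide 4 (a reconstruction rather than a verbatim copy, and assuming the same activation f for all units), is:

    \[
      \frac{\partial E_l}{\partial w^{k}_{ij}} = \delta^{k}_{i}\, y^{k-1}_{j},
      \qquad
      \delta^{K}_{i} = f'\!\bigl(x^{K}_{i}\bigr)\,\bigl(y^{K}_{i} - d_{i}\bigr),
      \qquad
      \delta^{k}_{i} = f'\!\bigl(x^{k}_{i}\bigr)\sum_{j} w^{k+1}_{ji}\,\delta^{k+1}_{j},
    \]
    where $w^{k}_{ij}$ is the weight from unit $j$ of layer $k-1$ to unit $i$ of layer $k$
    and $f'$ is the derivative of the activation function.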

Slide 14. The Backpropagation algorithm – main idea
For each example in the training set:
- compute the output signal (the FORWARD stage);
- compute the error corresponding to the output layer;
- propagate the error back into the network (the BACKWARD stage) and store the corresponding delta values for each layer;
- adjust each weight by using the error signal and the input signal of its layer.

Slide 15. The Backpropagation algorithm – general structure (serial/incremental variant)

    Random initialization of the weights
    REPEAT                                   (one iteration = one epoch)
        FOR l := 1, L DO
            FORWARD stage
            BACKWARD stage
            weight adjustment
        ENDFOR
        Error (re)computation
    UNTIL <stopping condition>

Remarks:
- The weight adjustment depends on the learning rate.
- The error computation requires recomputing the output signal for the new values of the weights.
- The stopping condition depends on the value of the error and on the number of epochs.
- This is the so-called serial (incremental) variant: the adjustment is applied separately for each example of the training set. A runnable sketch is given below.
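A minimal NumPy sketch of this incremental variant for a network with one hidden layer, sigmoid activations, and a bias input appended to each layer; the biases, learning rate, and stopping values are illustrative assumptions, since the slides do not specify them:

    import numpy as np

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    def train_incremental(X, D, n_hidden, eta=0.7, max_epochs=10000, target_error=1e-2):
        """Serial (incremental) Backpropagation for one hidden layer."""
        rng = np.random.default_rng(0)
        n_in, n_out = X.shape[1], D.shape[1]
        # Random initialization of the weights; the extra column acts as a bias input
        W1 = rng.normal(scale=0.5, size=(n_hidden, n_in + 1))
        W2 = rng.normal(scale=0.5, size=(n_out, n_hidden + 1))
        for epoch in range(1, max_epochs + 1):           # one iteration = one epoch
            for x, d in zip(X, D):                       # FOR l = 1, L
                x1 = np.append(x, 1.0)                   # FORWARD stage
                y1 = np.append(sigmoid(W1 @ x1), 1.0)
                y2 = sigmoid(W2 @ y1)
                delta2 = (y2 - d) * y2 * (1 - y2)        # BACKWARD stage (delta values)
                delta1 = (W2[:, :-1].T @ delta2) * y1[:-1] * (1 - y1[:-1])
                W2 -= eta * np.outer(delta2, y1)         # weight adjustment
                W1 -= eta * np.outer(delta1, x1)
            error = 0.0                                  # Error (re)computation
            for x, d in zip(X, D):
                out = sigmoid(W2 @ np.append(sigmoid(W1 @ np.append(x, 1.0)), 1.0))
                error += 0.5 * np.sum((out - d) ** 2)
            if error < target_error:                     # stopping condition
                break
        return W1, W2, error, epoch

    # Usage example: learn XOR (the problem from slide 2)
    X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
    D = np.array([[0], [1], [1], [0]], dtype=float)
    W1, W2, err, epochs = train_incremental(X, D, n_hidden=4)
    print(f"error = {err:.4f} after {epochs} epochs")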

Slide 16. The Backpropagation algorithm – batch variant

    Random initialization of the weights
    REPEAT                                   (one iteration = one epoch)
        initialize the variables which will contain the adjustments
        FOR l := 1, L DO
            FORWARD stage
            BACKWARD stage
            cumulate the adjustments
        ENDFOR
        apply the cumulated adjustments
        Error (re)computation
    UNTIL <stopping condition>

Remarks:
- The incremental variant can be sensitive to the presentation order of the training examples.
- The batch variant is not sensitive to this order and is more robust to errors in the training examples.
- It is the starting point for more elaborate variants, e.g. the momentum variant.
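Relative to the incremental sketch above (and reusing its names sigmoid, W1, W2, X, D, eta), only the body of an epoch changes in the batch variant: the per-example adjustments are accumulated and applied once at the end of the epoch. A sketch of the modified epoch body:

    dW1 = np.zeros_like(W1)                 # initialize the accumulated adjustments
    dW2 = np.zeros_like(W2)
    for x, d in zip(X, D):                  # FOR l = 1, L
        x1 = np.append(x, 1.0)              # FORWARD stage
        y1 = np.append(sigmoid(W1 @ x1), 1.0)
        y2 = sigmoid(W2 @ y1)
        delta2 = (y2 - d) * y2 * (1 - y2)   # BACKWARD stage
        delta1 = (W2[:, :-1].T @ delta2) * y1[:-1] * (1 - y1[:-1])
        dW2 += np.outer(delta2, y1)         # cumulate the adjustments
        dW1 += np.outer(delta1, x1)
    W2 -= eta * dW2                         # apply the cumulated adjustments
    W1 -= eta * dW1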

Slide 17. Problems of Backpropagation
- Low convergence rate (the error decreases too slowly).
- Oscillations (the error value oscillates instead of decreasing continuously).
- Local minima (the learning process gets stuck in a local minimum of the error function).
- Stagnation (the learning process stagnates even if it is not in a local minimum).
- Overtraining and limited generalization.

Slide 18. Generalization capacity
The generalization capacity of a neural network depends on:
- The network architecture (e.g. the number of hidden units): a large number of hidden units can lead to overtraining (the network extracts not only the useful knowledge but also the noise in the data).
- The size of the training set: too few examples are not enough to train the network.
- The number of epochs (i.e. the accuracy on the training set): too many epochs can lead to overtraining (one common way of controlling this is sketched below).
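One common way of controlling the number of epochs, mentioned here only as an illustration (it is not described on the slide), is early stopping: monitor the error on a separate validation set and keep the weights that performed best on it. The helpers train_one_epoch and total_error are hypothetical placeholders standing in for the training and error routines:

    import copy

    best_val_error = float("inf")
    best_weights = None
    bad_epochs, patience = 0, 20             # stop after 20 epochs without improvement

    for epoch in range(max_epochs):
        train_one_epoch(weights, X_train, D_train)       # hypothetical helper
        val_error = total_error(weights, X_val, D_val)   # hypothetical helper
        if val_error < best_val_error:
            best_val_error = val_error
            best_weights = copy.deepcopy(weights)        # remember the best network so far
            bad_epochs = 0
        else:
            bad_epochs += 1
            if bad_epochs >= patience:                   # validation error keeps growing
                break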

