Published by Leslie Collins. Modified over 6 years ago.
1
One-layer neural networks
Approximation problems
Architecture and functioning (ADALINE, MADALINE)
Learning based on error minimization
The gradient algorithm
Widrow-Hoff and "delta" algorithms
Neural Networks - lecture 4
2
Approximation problems
Approximation (regression):
Problem: estimate a functional dependence between two variables.
The training set contains pairs of corresponding values.
[Figures: linear approximation; nonlinear approximation]
3
Architecture
One-layer NN = one layer of input units and one layer of functional units.
[Figure: N input units (input vector X) plus a fictive unit with constant value -1, fully connected through the weights W to M functional (output) units that produce the output vector Y.]
4
Functioning
Computing the output signal: each functional unit applies its activation function to the weighted sum of its inputs (including the fictive unit).
Usually the activation function is linear.
Examples: ADALINE (ADAptive LINear Element), MADALINE (Multiple ADAptive LINear Element).
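As a rough illustration of the computation above, the following Python sketch evaluates a one-layer network with NumPy. The function name `forward`, the weight layout (one row per output unit, last column holding the weight of the fictive unit) and the sample numbers are assumptions made for this example, not taken from the slides.

```python
import numpy as np

def forward(W, x, f=lambda u: u):
    """Output of a one-layer network.

    W: (M, N+1) weights; the last column multiplies the fictive input -1
       (assumed layout). x: (N,) input vector. f: activation, linear by default.
    """
    x_ext = np.append(x, -1.0)   # extend the input with the fictive unit
    return f(W @ x_ext)          # weighted sum, then activation

# Hypothetical network with N=2 inputs and M=1 output (ADALINE-like unit)
W = np.array([[0.5, -1.0, 0.25]])
y = forward(W, np.array([2.0, 1.0]))
```

Passing a different `f` (for example `np.tanh`) gives the nonlinear case without changing the rest of the code.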
5
Learning based on error minimization
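Training set: {(X^1,d^1),…,(X^L,d^L)}, where X^l is a vector from R^N and d^l is a vector from R^M.
Error function: a measure of the "distance" between the output produced by the network and the desired output.
Notations: x_j(l) is the j-th component of the input X^l, y_i(l) is the i-th output produced by the network for X^l, d_i(l) is the corresponding desired output, and E(W) is the error for the weights W.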
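One common way to make this "distance" concrete is the sum of squared errors. The form below is an assumption (the factor 1/2 and the name w_{i0} for the weight of the fictive unit are conventions chosen here), but it is consistent with the weight adjustment used by the Widrow-Hoff algorithm later in the lecture.

```latex
E(W) = \frac{1}{2}\sum_{l=1}^{L}\sum_{i=1}^{M}\bigl(d_i^{(l)} - y_i^{(l)}\bigr)^2,
\qquad
y_i^{(l)} = f\Bigl(\sum_{j=1}^{N} w_{ij}\,x_j^{(l)} - w_{i0}\Bigr)
```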
6
Learning based on error minimization
Learning = optimization task = find W which minimizes E(W).
Variants:
1. In the case of linear activation functions, W can be computed by using tools from linear algebra.
2. In the case of nonlinear functions, the minimum can be estimated by using a numerical method.
7
Learning based on error minimization
First variant. Particular case: M=1 (one output unit with linear activation function), L=1 (one example).
8
Learning based on error minimization
First variant:
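For a linear network with a squared error, "tools from linear algebra" typically means solving a linear least-squares problem. The sketch below is one way to do that, assuming a squared-error function and using `numpy.linalg.lstsq`; the data and variable names are made up for illustration.

```python
import numpy as np

# Hypothetical training set: L=4 examples, N=1 input, M=1 output, d = 2*x + 1
X = np.array([[0.0], [1.0], [2.0], [3.0]])
D = np.array([[1.0], [3.0], [5.0], [7.0]])

# Append the fictive input -1 to every example
X_ext = np.hstack([X, -np.ones((X.shape[0], 1))])

# Least-squares solution of X_ext @ W.T ~= D
W_T, residuals, rank, _ = np.linalg.lstsq(X_ext, D, rcond=None)
W = W_T.T   # shape (M, N+1): [slope, weight of the fictive unit]
```

Because the fictive input is -1, an intercept of +1 shows up as a weight of -1 in the last column.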
9
Learning based on error minimization
Second variant: use of a numerical minimization method.
Gradient method: an iterative method based on the idea that the gradient of a function indicates the direction in which the function increases. To estimate the minimum of a function, the current position is moved in the direction opposite to the gradient.
10
Learning based on error minimization
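Gradient method:
[Figure: one-dimensional illustration. Where f'(x)<0 the current point moves to the right, where f'(x)>0 it moves to the left, i.e. always in the direction opposite to the gradient; the iterates x0, x1, ..., xk-1 approach the minimum.]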
11
Learning based on error minimization
Algorithm to minimize E(W) based on the gradient method:
Initialization: W(0) := initial values, k := 0 (iteration counter)
Iterative process:
REPEAT
  W(k+1) := W(k) - eta*grad(E(W(k)))
  k := k+1
UNTIL a stopping condition is satisfied
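The iterative scheme above can be sketched in Python as follows. The stopping condition (step shorter than `tol`, or `k_max` iterations) and the quadratic test function are assumptions; the slide leaves both open.

```python
import numpy as np

def gradient_descent(grad, w0, eta=0.1, tol=1e-8, k_max=10_000):
    """Iterate W(k+1) = W(k) - eta * grad(W(k)).

    Stops when the step is shorter than tol or after k_max iterations
    (both stopping rules are assumptions made for this sketch).
    """
    w = np.asarray(w0, dtype=float)
    for k in range(k_max):
        step = eta * grad(w)
        w = w - step
        if np.linalg.norm(step) < tol:
            break
    return w, k + 1

# Example: E(w) = (w1 - 3)^2 + (w2 + 1)^2, with its minimum at (3, -1)
grad_E = lambda w: 2.0 * (w - np.array([3.0, -1.0]))
w_min, iterations = gradient_descent(grad_E, w0=[0.0, 0.0])
```

A larger `eta` takes bigger steps but can overshoot; for this quadratic any `eta` below 1 still converges.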
12
Learning based on error minimization
Remark: the gradient method is a local optimization method, so it can easily get trapped in local minima.
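A small experiment makes the remark concrete. The function below is a hypothetical example with two minima of different depth; which one gradient descent reaches depends only on the starting point.

```python
def descend(x0, eta=0.01, steps=2000):
    """Plain gradient descent on f(x) = (x^2 - 1)^2 + 0.3*x."""
    x = x0
    for _ in range(steps):
        x -= eta * (4.0 * x * (x * x - 1.0) + 0.3)   # subtract eta * f'(x)
    return x

f = lambda x: (x * x - 1.0) ** 2 + 0.3 * x

x_left = descend(-2.0)    # ends in the deeper minimum near x = -1
x_right = descend(2.0)    # ends in the shallower minimum near x = +1
```

Starting from the right, the method stops in the shallower minimum even though a better one exists on the left.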
13
Widrow-Hoff algorithm
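A learning algorithm for a linear network. It minimizes E(W) by applying a gradient-like adjustment for each example from the training set.
Gradient computation: for example l, the derivative of the error with respect to w_ij is proportional to -(d_i(l) - y_i(l))*x_j(l), so moving against the gradient adds a multiple of (d_i(l) - y_i(l))*x_j(l) to w_ij.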
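A sketch of where the per-example adjustment comes from, assuming the squared error for a single example and linear outputs. The factor 1/2 and the convention of treating the fictive unit as input x_0 = -1 are assumptions chosen for this derivation.

```latex
\begin{aligned}
E_l(W) &= \frac{1}{2}\sum_{i=1}^{M}\bigl(d_i^{(l)} - y_i^{(l)}\bigr)^2,
\qquad y_i^{(l)} = \sum_{j=0}^{N} w_{ij}\,x_j^{(l)}, \quad x_0^{(l)} = -1 \\
\frac{\partial E_l}{\partial w_{ij}}
  &= -\bigl(d_i^{(l)} - y_i^{(l)}\bigr)\,\frac{\partial y_i^{(l)}}{\partial w_{ij}}
   = -\bigl(d_i^{(l)} - y_i^{(l)}\bigr)\,x_j^{(l)}
   = -\delta_i^{(l)}\,x_j^{(l)}
\end{aligned}
```

Substituting this into W := W - eta*grad gives the adjustment w_ij := w_ij + eta*delta_i(l)*x_j(l).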
14
Widrow-Hoff algorithm
Algorithm's structure:
Initialization: w_ij(0) := rand(-1,1) (the weights are randomly initialized in [-1,1]), k := 0 (iteration counter)
Iterative process:
REPEAT
  FOR l := 1,L DO
    Compute y_i(l) and delta_i(l) = d_i(l) - y_i(l), i = 1,M
    Adjust the weights: w_ij := w_ij + eta*delta_i(l)*x_j(l)
  Compute E(W) for the new values of the weights
  k := k+1
UNTIL E(W) < E* OR k > kmax
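The structure above translates fairly directly into NumPy. This is a minimal sketch: the squared-error form of E(W), the default values of `eta`, `e_star` and `k_max`, and the toy data are all assumptions made for illustration.

```python
import numpy as np

def widrow_hoff(X, D, eta=0.05, e_star=1e-6, k_max=5000, seed=0):
    """Per-example (online) Widrow-Hoff training of a linear one-layer network.

    X: (L, N) inputs, D: (L, M) desired outputs.
    Returns the (M, N+1) weight matrix, the final error and the epoch count.
    """
    rng = np.random.default_rng(seed)
    X_ext = np.hstack([X, -np.ones((X.shape[0], 1))])      # fictive unit -1
    W = rng.uniform(-1.0, 1.0, size=(D.shape[1], X_ext.shape[1]))
    k = 0
    while True:
        for x, d in zip(X_ext, D):
            delta = d - W @ x                # delta_i(l) = d_i(l) - y_i(l)
            W += eta * np.outer(delta, x)    # w_ij += eta * delta_i(l) * x_j(l)
        E = 0.5 * np.sum((D - X_ext @ W.T) ** 2)   # assumed squared-error E(W)
        k += 1
        if E < e_star or k > k_max:
            return W, E, k

# Hypothetical data generated by d = 2*x + 1
X = np.array([[0.0], [1.0], [2.0], [3.0]])
D = 2.0 * X + 1.0
W, E, epochs = widrow_hoff(X, D)
```

On this noise-free data the weights approach [2, -1]; the -1 is the weight of the fictive unit, which encodes the intercept +1.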
15
Widrow-Hoff algorithm
Remarks:
If the error function has only one optimum, the algorithm converges to the optimal values of W (but not in a finite number of steps), provided the learning rate is small enough.
The convergence speed is influenced by the value of the learning rate (eta).
The value E* is a measure of the accuracy we expect to obtain.
It is one of the simplest learning algorithms, but it can be applied only to one-layer networks with linear activation functions.
16
Delta algorithm
An algorithm similar to Widrow-Hoff, but for networks with nonlinear activation functions. The only difference is in the gradient computation.
Gradient computation: the Widrow-Hoff term (d_i(l) - y_i(l))*x_j(l) is additionally multiplied by the derivative of the activation function, evaluated at the weighted input of unit i.
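Written out, and assuming the same per-example squared error as in the linear case, the chain rule gives the following. The symbol u_i for the weighted input of unit i and the convention x_0 = -1 for the fictive unit are introduced here for the derivation.

```latex
\begin{aligned}
y_i^{(l)} &= f\bigl(u_i^{(l)}\bigr), \qquad
u_i^{(l)} = \sum_{j=0}^{N} w_{ij}\,x_j^{(l)}, \quad x_0^{(l)} = -1 \\
\frac{\partial E_l}{\partial w_{ij}}
  &= -\bigl(d_i^{(l)} - y_i^{(l)}\bigr)\,f'\bigl(u_i^{(l)}\bigr)\,x_j^{(l)} \\
w_{ij} &:= w_{ij} + \eta\,\bigl(d_i^{(l)} - y_i^{(l)}\bigr)\,f'\bigl(u_i^{(l)}\bigr)\,x_j^{(l)}
\end{aligned}
```

With a linear activation, f'(u) = 1 and the rule reduces to the Widrow-Hoff adjustment.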
17
Delta algorithm
Particularities:
1. The error function can have many minima, so the algorithm can be trapped in one of them (meaning that the learning is not complete).
2. For sigmoidal functions the derivatives can be computed efficiently from the function values themselves: f'(u) = f(u)*(1 - f(u)) for the logistic function f(u) = 1/(1 + exp(-u)), and f'(u) = 1 - f(u)^2 for f(u) = tanh(u).
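The two standard identities for sigmoidal functions, f'(u) = f(u)(1 - f(u)) for the logistic function and f'(u) = 1 - f(u)^2 for tanh, can be checked numerically. The sketch below compares them with central finite differences; the grid and the step `h` are arbitrary choices.

```python
import numpy as np

def logistic(u):
    return 1.0 / (1.0 + np.exp(-u))

u = np.linspace(-3.0, 3.0, 13)
h = 1e-6

# Derivatives expressed through the function values
d_logistic = logistic(u) * (1.0 - logistic(u))
d_tanh = 1.0 - np.tanh(u) ** 2

# Central finite differences for comparison
num_logistic = (logistic(u + h) - logistic(u - h)) / (2 * h)
num_tanh = (np.tanh(u + h) - np.tanh(u - h)) / (2 * h)

max_err = max(np.max(np.abs(d_logistic - num_logistic)),
              np.max(np.abs(d_tanh - num_tanh)))
```

In a training loop this means the derivative costs one extra multiplication, since the output y_i has already been computed.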
18
Limits of one-layer networks
One-layer networks have limited capability. They are only able to:
solve simple (e.g. linearly separable) classification problems;
approximate simple (e.g. linear) dependences.
Solution: include hidden layers.
Remark: the hidden units should have nonlinear activation functions.
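XOR is a standard example of a problem that is not linearly separable. The sketch below fits the best single linear unit to it in the least-squares sense (an assumed error measure); the fitted unit produces the same output for all four inputs, so it cannot distinguish the two classes.

```python
import numpy as np

# XOR: not linearly separable
X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])
d = np.array([0.0, 1.0, 1.0, 0.0])

X_ext = np.hstack([X, -np.ones((4, 1))])          # fictive unit -1
w, *_ = np.linalg.lstsq(X_ext, d, rcond=None)     # best linear unit (least squares)
y = X_ext @ w                                     # outputs for the four inputs
```

Every entry of `y` equals 0.5, halfway between the two desired values, which is why hidden units with nonlinear activation functions are needed for this problem.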