Presentation transcript: "Disadvantages of Discrete Neurons"

1 Disadvantages of Discrete Neurons
- Only boolean-valued functions can be computed
- A simple learning algorithm for multi-layer discrete-neuron perceptrons is lacking
- The computational capabilities of single-layer discrete-neuron perceptrons are limited
These disadvantages disappear when we consider multi-layer continuous-neuron perceptrons.

2 Preliminaries
A continuous-neuron perceptron with n inputs and m outputs computes:
- a function R^n → [0,1]^m when the sigmoid activation function is used
- a function R^n → R^m when a linear activation function is used
The learning rules for continuous-neuron perceptrons are based on optimization techniques for error functions. This requires a continuous and differentiable error function.
Here [0,1] denotes an interval. Single-layer continuous-neuron perceptrons are also limited; two layers can approximate any continuous function.
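To make the two cases concrete, here is a minimal sketch (my own illustration, not from the slides; the weight matrix W, bias vector b and the numbers are made up) of a single-layer continuous-neuron perceptron with n = 3 inputs and m = 2 outputs:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical single-layer perceptron: n = 3 inputs, m = 2 outputs.
rng = np.random.default_rng(0)
W = rng.normal(size=(2, 3))   # weights (m x n)
b = rng.normal(size=2)        # biases (thresholds)

x = np.array([0.5, -1.0, 2.0])   # an input vector in R^n
y_sigmoid = sigmoid(W @ x + b)   # output lies in [0,1]^m
y_linear = W @ x + b             # output lies in R^m
print(y_sigmoid, y_linear)
```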

3 Sigmoid transfer function
The sigmoid σ(z) = 1/(1 + e^(-z)) has the property that its derivative can be expressed in the function itself: σ'(z) = σ(z)(1 − σ(z)). A similar property holds for tanh, whose derivative can also be expressed in the original function: d tanh(z)/dz = 1 − tanh²(z). Moreover tanh(z/2) = 2 σ(z) − 1, so there is only a small practical advantage in using tanh.
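A quick numerical check of these identities (my own sketch, not part of the slides):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

z = np.linspace(-4, 4, 9)
h = 1e-6  # step for a central finite-difference derivative

# sigma'(z) == sigma(z) * (1 - sigma(z))
num_dsig = (sigmoid(z + h) - sigmoid(z - h)) / (2 * h)
print(np.allclose(num_dsig, sigmoid(z) * (1 - sigmoid(z)), atol=1e-6))

# d tanh(z)/dz == 1 - tanh(z)**2
num_dtanh = (np.tanh(z + h) - np.tanh(z - h)) / (2 * h)
print(np.allclose(num_dtanh, 1 - np.tanh(z) ** 2, atol=1e-6))

# tanh(z/2) == 2*sigma(z) - 1
print(np.allclose(np.tanh(z / 2), 2 * sigmoid(z) - 1))
```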

4 Computational Capabilities
Let g: [0,1]^n → R be a continuous function and let ε > 0. Then there exists a two-layer perceptron with:
- a first layer built from neurons with threshold and the standard sigmoid activation function
- a second layer built from one neuron without threshold and a linear activation function
such that the function G computed by this network satisfies |g(x) − G(x)| < ε for all x in [0,1]^n.
As an analogy, the truncated Taylor series g(x) ≈ Σ_n g^(n)(0) x^n / n! writes g as a weighted sum G(x) = Σ_n w_n g_n(x) of basis functions g_n(x) = x^n. Other basis functions are possible: sine and cosine (Fourier), orthogonal polynomials. How many neurons are needed? We start with single-layer (single-neuron) networks.
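A small numerical sketch of the basis-expansion analogy in the notes (my own example, using g = exp; the weights w_n = g^(n)(0)/n! = 1/n! play the role of the output-layer weights):

```python
import math

def taylor_exp(x, N):
    """Truncated Taylor series of exp around 0: sum_{n=0}^{N} x**n / n!.
    This is G(x) = sum_n w_n * g_n(x) with basis g_n(x) = x**n and w_n = 1/n!."""
    return sum(x ** n / math.factorial(n) for n in range(N + 1))

for N in (2, 5, 10):
    approx = taylor_exp(1.0, N)
    print(N, approx, abs(approx - math.e))   # error shrinks as N grows
```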

5 Single-layer networks
- Computes a function from R^n to [0,1]^m
- It is sufficient to consider a single neuron, which computes a function f(w_0 + Σ_{1 ≤ j ≤ n} w_j x_j)
- Assume x_0 = 1; the neuron then computes a function f(Σ_{0 ≤ j ≤ n} w_j x_j)
Single-layer networks have limited capabilities.
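The bias trick mentioned here, as a sketch with made-up numbers: prepending a constant component x_0 = 1 turns the threshold w_0 into an ordinary weight, so the two forms compute the same value.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

w0 = 0.3                      # bias / threshold
w = np.array([0.7, -1.2])     # weights for the real inputs
x = np.array([2.0, 0.5])      # an input vector

y_explicit = sigmoid(w0 + w @ x)            # bias handled separately
w_ext = np.concatenate(([w0], w))           # extended weight vector
x_ext = np.concatenate(([1.0], x))          # x_0 = 1 prepended
y_trick = sigmoid(w_ext @ x_ext)            # same value
print(np.isclose(y_explicit, y_trick))
```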

6 Error function
Again the weights are extended with the bias w_0 and the inputs with the component x_0 = 1. We no longer use the prime notation. The factor ½ is for computational convenience.
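The error function itself is only in the slide image; from the notes (factor ½, least mean squares) it is presumably the sum of squared errors over the training pairs, E(w) = ½ Σ_q (t^(q) − y^(q))². A minimal sketch under that assumption:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def error(w, X, t):
    """Sum-of-squares error E(w) = 1/2 * sum_q (t_q - y_q)^2.
    X holds extended inputs (x_0 = 1) as rows; w is the extended weight vector.
    The factor 1/2 cancels the 2 that appears when differentiating."""
    y = sigmoid(X @ w)
    return 0.5 * np.sum((t - y) ** 2)

X = np.array([[1.0, 0.0], [1.0, 1.0]])   # two training inputs, x_0 = 1 prepended
t = np.array([0.0, 1.0])                  # targets
print(error(np.zeros(2), X, t))
```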

7 Gradient Descent
Least mean square error function (LMS)

8 Update of Weight i by Training Pair q
Hence Δw is in the direction of x. Simple cases arise when f is the sigmoid or tanh. It is even simpler when f is the identity function f(z) = z, since then f'(z) = 1.
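The update formula on this slide is only in the image; reading it off the pseudocode on the next slide, it is presumably the per-pattern gradient step obtained by differentiating E^(q)(w) = ½ (t^(q) − y^(q))². A sketch of that derivation, in my own notation:

```latex
\frac{\partial E^{(q)}}{\partial w_i}
  = -\bigl(t^{(q)} - y^{(q)}\bigr)\, f'\!\bigl(z^{(q)}\bigr)\, x_i^{(q)},
\qquad
\Delta w_i = -\alpha \frac{\partial E^{(q)}}{\partial w_i}
  = \alpha\,\bigl(t^{(q)} - y^{(q)}\bigr)\, f'\!\bigl(z^{(q)}\bigr)\, x_i^{(q)},
```

where z^(q) = Σ_j w_j x_j^(q) and y^(q) = f(z^(q)).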

9 Delta Rule Learning (incremental version, arbitrary transfer function)
In the lecture notes the vector manipulation is replaced by a repetition:
  for i := 0 to n do w_i := w_i + α (t − y) dy x_i
where dy denotes the derivative f'(z) of the transfer function.
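A runnable sketch of that loop (my own illustration; the transfer function, learning rate, number of epochs and the training data are made up):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def dsigmoid(z):
    s = sigmoid(z)
    return s * (1.0 - s)

def delta_rule_incremental(X, t, f=sigmoid, df=dsigmoid, alpha=0.5, epochs=1000):
    """Incremental (per-pattern) delta rule.
    X: training inputs with x_0 = 1 already prepended (one row per pattern q).
    t: target outputs, one per pattern."""
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        for x_q, t_q in zip(X, t):                 # one update per training pair
            z = w @ x_q
            y = f(z)
            w += alpha * (t_q - y) * df(z) * x_q   # w_i := w_i + alpha (t - y) f'(z) x_i
    return w

# Example: learn the OR function (illustrative data, not from the slides)
X = np.array([[1, 0, 0], [1, 0, 1], [1, 1, 0], [1, 1, 1]], dtype=float)
t = np.array([0, 1, 1, 1], dtype=float)
w = delta_rule_incremental(X, t)
print(sigmoid(X @ w))   # outputs close to the targets [0, 1, 1, 1]
```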

10 Stop criteria
- The mean square error becomes small enough
- The mean square error does not decrease anymore, i.e. the gradient has become very small or even changes sign
- The maximum number of iterations has been exceeded
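A sketch of how these three criteria typically combine into one training loop (illustrative skeleton only; `compute_error_and_gradient`, the thresholds and the learning rate are hypothetical names, not from the slides):

```python
import numpy as np

def train(w, compute_error_and_gradient, alpha=0.1,
          eps_error=1e-4, eps_grad=1e-6, max_iter=10_000):
    """Stop when the error is small enough, when it no longer decreases
    (gradient nearly zero), or when the iteration budget is exhausted."""
    for _ in range(max_iter):
        E, grad = compute_error_and_gradient(w)
        if E < eps_error:                       # criterion 1: error small enough
            break
        if np.linalg.norm(grad) < eps_grad:     # criterion 2: no further decrease
            break
        w = w - alpha * grad                    # gradient-descent step
    return w                                    # criterion 3: max_iter exceeded
```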

11 Remarks
- Delta rule learning is also called L(east) M(ean) S(quare) learning or Widrow-Hoff learning
- Note that the incremental version of the delta rule is strictly speaking not a gradient descent algorithm, because in each step a different error function E^(q) is used
- Convergence of the incremental version can only be guaranteed if the learning parameter α goes to 0 during learning

12 Perceptron Learning Rule (batch version, arbitrary transfer function)

13 Perceptron Learning Delta Rule (batch version, sigmoidal transfer function)

14 Perceptron Learning Rule (batch version, linear transfer function)

15 Convergence of the batch version
For a small enough learning parameter the batch version of the delta rule always converges. The resulting weights, however, may correspond to a local minimum of the error function instead of the global minimum. For the linear neuron we will analyze this convergence further.
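For the linear neuron the batch update can be written compactly. A sketch of that batch version (my own code; the data, learning rate and epoch count are illustrative), which connects to the least-squares analysis on the next slides:

```python
import numpy as np

def delta_rule_batch_linear(X, t, alpha=0.01, epochs=2000):
    """Batch delta rule for a linear neuron y = w . x (f(z) = z, so f'(z) = 1).
    X: extended inputs (x_0 = 1) as rows; t: targets."""
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        y = X @ w                      # outputs for all training pairs
        w += alpha * X.T @ (t - y)     # sum of all per-pattern updates
    return w

# Illustrative data: targets generated by t = 1 + 2*x plus a little noise
rng = np.random.default_rng(1)
x = rng.uniform(-1, 1, size=20)
t = 1.0 + 2.0 * x + 0.05 * rng.normal(size=20)
X = np.column_stack([np.ones_like(x), x])
print(delta_rule_batch_linear(X, t))   # close to [1, 2]
```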

16 Linear Neurons and Least Squares

17 Linear Neurons and Least Squares
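The formulas on these two slides are not in the transcript; presumably they set the gradient of the linear-neuron error to zero, yielding the normal equations with the correlation matrix C built from the training inputs. A sketch of that closed-form solution with NumPy (inputs stored as rows, so C = X^T X here; the data are the same illustrative points as above):

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.uniform(-1, 1, size=20)
t = 1.0 + 2.0 * x + 0.05 * rng.normal(size=20)
X = np.column_stack([np.ones_like(x), x])   # rows are extended input vectors

# Normal equations: (X^T X) w = X^T t
C = X.T @ X
w_closed = np.linalg.solve(C, X.T @ t)

# Equivalent, numerically safer: a least-squares solver
w_lstsq, *_ = np.linalg.lstsq(X, t, rcond=None)
print(w_closed, w_lstsq)    # both close to [1, 2]
```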

18 C is non-singular

19 Linear Least Squares Convergence

20
The gradient is a linear operator. Recall α' = P α. Inspect the batch version with X = <x^(1), …, x^(P)>.
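The derivation on this slide is only in the image; based on the notes (the gradient is a linear operator, X = <x^(1), …, x^(P)>), it presumably rewrites the batch gradient of the linear-neuron error in matrix form, something like

```latex
\nabla E(w)
  = -\sum_{q=1}^{P}\bigl(t^{(q)} - w^{\top}x^{(q)}\bigr)\,x^{(q)}
  = X X^{\top} w - X\,t
  = C\,w - X\,t ,
```

so the batch step w := w − α ∇E(w) converges for a small enough learning parameter (in the standard analysis, 0 < α < 2/λ_max(C), with λ_max the largest eigenvalue of C); the remark α' = P α presumably relates the learning rate used with the summed error to the one used with the averaged error.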

21 Linear Least Squares Convergence

22 Find the line:

23 Solution:
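The data points and the fitted line of this example are only in the slide images. As a stand-in, here is a small worked example with made-up points, fitted with the same least-squares machinery (normal equations for a linear neuron with x_0 = 1):

```python
import numpy as np

# Hypothetical data points (x, t); the actual points are in the slide image
pts_x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
pts_t = np.array([1.1, 2.9, 5.2, 6.8, 9.1])

X = np.column_stack([np.ones_like(pts_x), pts_x])   # extended inputs, x_0 = 1
w = np.linalg.solve(X.T @ X, X.T @ pts_t)           # normal equations
print(f"line: t = {w[0]:.3f} + {w[1]:.3f} * x")     # roughly t = 1.0 + 2.0*x
```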

