Presentation transcript: "Disadvantages of Discrete Neurons"

1 Disadvantages of Discrete Neurons
- Only boolean-valued functions can be computed
- A simple learning algorithm for multi-layer discrete-neuron perceptrons is lacking
- The computational capabilities of single-layer discrete-neuron perceptrons are limited
These disadvantages disappear when we consider multi-layer continuous-neuron perceptrons.

2 Preliminaries
A continuous-neuron perceptron with n inputs and m outputs computes:
- a function R^n → [0,1]^m when the sigmoid activation function is used
- a function R^n → R^m when a linear activation function is used
The learning rules for continuous-neuron perceptrons are based on optimization techniques for error functions. This requires a continuous and differentiable error function.
Here [0,1] denotes an interval. Single-layer continuous-neuron perceptrons are also limited; two layers can approximate any continuous function.
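To make the two cases concrete, here is a minimal sketch (my own illustration, not from the slides; the weight matrix W, bias vector b and the numbers are made up) of a single-layer continuous-neuron perceptron with n = 3 inputs and m = 2 outputs:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical single-layer perceptron: n = 3 inputs, m = 2 outputs.
rng = np.random.default_rng(0)
W = rng.normal(size=(2, 3))   # weights (m x n)
b = rng.normal(size=2)        # biases (thresholds)

x = np.array([0.5, -1.0, 2.0])   # an input vector in R^n
y_sigmoid = sigmoid(W @ x + b)   # output lies in [0,1]^m
y_linear = W @ x + b             # output lies in R^m
print(y_sigmoid, y_linear)
```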

3 Sigmoid transfer function
The sigmoid σ(z) = 1/(1 + e^(-z)) has the property that its derivative can be expressed in the function itself: σ'(z) = σ(z)(1 − σ(z)). A similar property holds for tanh, whose derivative can also be expressed in the original function: d tanh(z)/dz = 1 − tanh²(z). Moreover tanh(z/2) = 2 σ(z) − 1, so there is only a small practical advantage in using tanh.
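A quick numerical check of these identities (my own sketch, not part of the slides):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

z = np.linspace(-4, 4, 9)
h = 1e-6  # step for a central finite-difference derivative

# sigma'(z) == sigma(z) * (1 - sigma(z))
num_dsig = (sigmoid(z + h) - sigmoid(z - h)) / (2 * h)
print(np.allclose(num_dsig, sigmoid(z) * (1 - sigmoid(z)), atol=1e-6))

# d tanh(z)/dz == 1 - tanh(z)**2
num_dtanh = (np.tanh(z + h) - np.tanh(z - h)) / (2 * h)
print(np.allclose(num_dtanh, 1 - np.tanh(z) ** 2, atol=1e-6))

# tanh(z/2) == 2*sigma(z) - 1
print(np.allclose(np.tanh(z / 2), 2 * sigmoid(z) - 1))
```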

4 Computational Capabilities
Let g: [0,1]^n → R be a continuous function and let ε > 0. Then there exists a two-layer perceptron with:
- a first layer built from neurons with threshold and the standard sigmoid activation function
- a second layer built from one neuron without threshold and a linear activation function
such that the function G computed by this network satisfies |g(x) − G(x)| < ε for all x in [0,1]^n.
As an analogy, the truncated Taylor series g(x) ≈ Σ_n g^(n)(0) x^n / n! writes g as a weighted sum G(x) = Σ_n w_n g_n(x) of basis functions g_n(x) = x^n. Other basis functions are possible: sine and cosine (Fourier), orthogonal polynomials. How many neurons are needed? We start with single-layer (single-neuron) networks.
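A small numerical sketch of the basis-expansion analogy in the notes (my own example, using g = exp; the weights w_n = g^(n)(0)/n! = 1/n! play the role of the output-layer weights):

```python
import math

def taylor_exp(x, N):
    """Truncated Taylor series of exp around 0: sum_{n=0}^{N} x**n / n!.
    This is G(x) = sum_n w_n * g_n(x) with basis g_n(x) = x**n and w_n = 1/n!."""
    return sum(x ** n / math.factorial(n) for n in range(N + 1))

for N in (2, 5, 10):
    approx = taylor_exp(1.0, N)
    print(N, approx, abs(approx - math.e))   # error shrinks as N grows
```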

5 Single-layer networks
- Computes a function from R^n to [0,1]^m
- It is sufficient to consider a single neuron, which computes a function f(w_0 + Σ_{1 ≤ j ≤ n} w_j x_j)
- Assume x_0 = 1; the neuron then computes a function f(Σ_{0 ≤ j ≤ n} w_j x_j)
Single-layer networks have limited capabilities.
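The bias trick mentioned here, as a sketch with made-up numbers: prepending a constant component x_0 = 1 turns the threshold w_0 into an ordinary weight, so the two forms compute the same value.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

w0 = 0.3                      # bias / threshold
w = np.array([0.7, -1.2])     # weights for the real inputs
x = np.array([2.0, 0.5])      # an input vector

y_explicit = sigmoid(w0 + w @ x)            # bias handled separately
w_ext = np.concatenate(([w0], w))           # extended weight vector
x_ext = np.concatenate(([1.0], x))          # x_0 = 1 prepended
y_trick = sigmoid(w_ext @ x_ext)            # same value
print(np.isclose(y_explicit, y_trick))
```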

6 Error function
Again the weights are extended with the bias w_0 and the inputs with the component x_0 = 1. We no longer use the prime notation. The factor ½ is for computational convenience.
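The error function itself is only in the slide image; from the notes (factor ½, least mean squares) it is presumably the sum of squared errors over the training pairs, E(w) = ½ Σ_q (t^(q) − y^(q))². A minimal sketch under that assumption:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def error(w, X, t):
    """Sum-of-squares error E(w) = 1/2 * sum_q (t_q - y_q)^2.
    X holds extended inputs (x_0 = 1) as rows; w is the extended weight vector.
    The factor 1/2 cancels the 2 that appears when differentiating."""
    y = sigmoid(X @ w)
    return 0.5 * np.sum((t - y) ** 2)

X = np.array([[1.0, 0.0], [1.0, 1.0]])   # two training inputs, x_0 = 1 prepended
t = np.array([0.0, 1.0])                  # targets
print(error(np.zeros(2), X, t))
```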

7 Gradient Descent
Least mean square error function (LMS)

8 Update of Weight i by Training Pair q
Hence Δw is in the direction of x. Simple cases arise when f is the sigmoid or tanh. It is even simpler when f is the identity function f(z) = z, since then f'(z) = 1.
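The update formula on this slide is only in the image; reading it off the pseudocode on the next slide, it is presumably the per-pattern gradient step obtained by differentiating E^(q)(w) = ½ (t^(q) − y^(q))². A sketch of that derivation, in my own notation:

```latex
\frac{\partial E^{(q)}}{\partial w_i}
  = -\bigl(t^{(q)} - y^{(q)}\bigr)\, f'\!\bigl(z^{(q)}\bigr)\, x_i^{(q)},
\qquad
\Delta w_i = -\alpha \frac{\partial E^{(q)}}{\partial w_i}
  = \alpha\,\bigl(t^{(q)} - y^{(q)}\bigr)\, f'\!\bigl(z^{(q)}\bigr)\, x_i^{(q)},
```

where z^(q) = Σ_j w_j x_j^(q) and y^(q) = f(z^(q)).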

9 Delta Rule Learning (incremental version, arbitrary transfer function)
In the lecture notes the vector manipulation is replaced by a repetition:
  for i := 0 to n do w_i := w_i + α (t − y) dy x_i
where dy denotes the derivative f'(z) of the transfer function.
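A runnable sketch of that loop (my own illustration; the transfer function, learning rate, number of epochs and the training data are made up):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def dsigmoid(z):
    s = sigmoid(z)
    return s * (1.0 - s)

def delta_rule_incremental(X, t, f=sigmoid, df=dsigmoid, alpha=0.5, epochs=1000):
    """Incremental (per-pattern) delta rule.
    X: training inputs with x_0 = 1 already prepended (one row per pattern q).
    t: target outputs, one per pattern."""
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        for x_q, t_q in zip(X, t):                 # one update per training pair
            z = w @ x_q
            y = f(z)
            w += alpha * (t_q - y) * df(z) * x_q   # w_i := w_i + alpha (t - y) f'(z) x_i
    return w

# Example: learn the OR function (illustrative data, not from the slides)
X = np.array([[1, 0, 0], [1, 0, 1], [1, 1, 0], [1, 1, 1]], dtype=float)
t = np.array([0, 1, 1, 1], dtype=float)
w = delta_rule_incremental(X, t)
print(sigmoid(X @ w))   # outputs close to the targets [0, 1, 1, 1]
```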

10 Stop criteria
- The mean square error becomes small enough
- The mean square error does not decrease anymore, i.e. the gradient has become very small or even changes sign
- The maximum number of iterations has been exceeded
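A sketch of how these three criteria typically combine into one training loop (illustrative skeleton only; `compute_error_and_gradient`, the thresholds and the learning rate are hypothetical names, not from the slides):

```python
import numpy as np

def train(w, compute_error_and_gradient, alpha=0.1,
          eps_error=1e-4, eps_grad=1e-6, max_iter=10_000):
    """Stop when the error is small enough, when it no longer decreases
    (gradient nearly zero), or when the iteration budget is exhausted."""
    for _ in range(max_iter):
        E, grad = compute_error_and_gradient(w)
        if E < eps_error:                       # criterion 1: error small enough
            break
        if np.linalg.norm(grad) < eps_grad:     # criterion 2: no further decrease
            break
        w = w - alpha * grad                    # gradient-descent step
    return w                                    # criterion 3: max_iter exceeded
```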

11 Remarks
- Delta rule learning is also called L(east) M(ean) S(quare) learning or Widrow-Hoff learning
- Note that the incremental version of the delta rule is strictly speaking not a gradient descent algorithm, because in each step a different error function E^(q) is used
- Convergence of the incremental version can only be guaranteed if the learning parameter α goes to 0 during learning

12 Perceptron Learning Rule (batch version, arbitrary transfer function)

13 Perceptron Learning Delta Rule (batch version, sigmoidal transfer function)

14 Perceptron Learning Rule (batch version, linear transfer function)

15 Convergence of the batch version
For a small enough learning parameter the batch version of the delta rule always converges. The resulting weights, however, may correspond to a local minimum of the error function instead of the global minimum. For the linear neuron we will analyze this convergence further.
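For the linear neuron the batch update can be written compactly. A sketch of that batch version (my own code; the data, learning rate and epoch count are illustrative), which connects to the least-squares analysis on the next slides:

```python
import numpy as np

def delta_rule_batch_linear(X, t, alpha=0.01, epochs=2000):
    """Batch delta rule for a linear neuron y = w . x (f(z) = z, so f'(z) = 1).
    X: extended inputs (x_0 = 1) as rows; t: targets."""
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        y = X @ w                      # outputs for all training pairs
        w += alpha * X.T @ (t - y)     # sum of all per-pattern updates
    return w

# Illustrative data: targets generated by t = 1 + 2*x plus a little noise
rng = np.random.default_rng(1)
x = rng.uniform(-1, 1, size=20)
t = 1.0 + 2.0 * x + 0.05 * rng.normal(size=20)
X = np.column_stack([np.ones_like(x), x])
print(delta_rule_batch_linear(X, t))   # close to [1, 2]
```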

16 Linear Neurons and Least Squares

17 Linear Neurons and Least Squares
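The formulas on these two slides are not in the transcript; presumably they set the gradient of the linear-neuron error to zero, yielding the normal equations with the correlation matrix C built from the training inputs. A sketch of that closed-form solution with NumPy (inputs stored as rows, so C = X^T X here; the data are the same illustrative points as above):

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.uniform(-1, 1, size=20)
t = 1.0 + 2.0 * x + 0.05 * rng.normal(size=20)
X = np.column_stack([np.ones_like(x), x])   # rows are extended input vectors

# Normal equations: (X^T X) w = X^T t
C = X.T @ X
w_closed = np.linalg.solve(C, X.T @ t)

# Equivalent, numerically safer: a least-squares solver
w_lstsq, *_ = np.linalg.lstsq(X, t, rcond=None)
print(w_closed, w_lstsq)    # both close to [1, 2]
```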

18 C is non-singular

19 Linear Least Squares Convergence

20
The gradient is a linear operator. Recall α' = P α. Inspect the batch version with X = <x^(1), …, x^(P)>.
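The derivation on this slide is only in the image; based on the notes (the gradient is a linear operator, X = <x^(1), …, x^(P)>), it presumably rewrites the batch gradient of the linear-neuron error in matrix form, something like

```latex
\nabla E(w)
  = -\sum_{q=1}^{P}\bigl(t^{(q)} - w^{\top}x^{(q)}\bigr)\,x^{(q)}
  = X X^{\top} w - X\,t
  = C\,w - X\,t ,
```

so the batch step w := w − α ∇E(w) converges for a small enough learning parameter (in the standard analysis, 0 < α < 2/λ_max(C), with λ_max the largest eigenvalue of C); the remark α' = P α presumably relates the learning rate used with the summed error to the one used with the averaged error.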

21 Linear Least Squares Convergence

22 Find the line:

23 Solution:
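The data points and the fitted line of this example are only in the slide images. As a stand-in, here is a small worked example with made-up points, fitted with the same least-squares machinery (normal equations for a linear neuron with x_0 = 1):

```python
import numpy as np

# Hypothetical data points (x, t); the actual points are in the slide image
pts_x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
pts_t = np.array([1.1, 2.9, 5.2, 6.8, 9.1])

X = np.column_stack([np.ones_like(pts_x), pts_x])   # extended inputs, x_0 = 1
w = np.linalg.solve(X.T @ X, X.T @ pts_t)           # normal equations
print(f"line: t = {w[0]:.3f} + {w[1]:.3f} * x")     # roughly t = 1.0 + 2.0*x
```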

