Neural Networks: Learning Processes
Biological and Artificial Neuron
[Figure: a biological neuron beside an artificial neuron. In the artificial neuron, the weights and the bias are the parameters that need to be determined.]

Application of Neural Networks
- Function approximation and prediction
- Pattern recognition
- Signal processing
- Modeling and control
- Machine learning

Building a Neural Network
- Select structure: design the way the neurons are interconnected.
- Select weights: decide the strengths with which the neurons are interconnected.
  - Weights are selected to get a “good match” between the network output and the desired output of a training set.
  - A training set is a set of inputs and desired outputs.
  - The weight selection is carried out by a learning algorithm.

Learning Process
Stage 1: Network Training
Training data (input and output sets with adequate coverage) are presented to the artificial neural network. The learning process yields knowledge in the form of a set of optimized synaptic weights and biases.
Stage 2: Network Validation
In the implementation phase, unseen data (from the same range as the training data) are presented to the trained artificial neural network, which produces the output prediction.

Learning Process
Learning is a process by which the free parameters of a neural network are adapted through stimulation by the environment in which the network is embedded.
In most cases the optimization landscape is complex, so the optimized weights and biases are obtained only after a number of learning iterations:
Iteration (0), initialization: parameters $[w,b]_0$, input $x$, output $y(0)$
Iteration (1): parameters $[w,b]_1$, input $x$, output $y(1)$
…
Iteration (n): parameters $[w,b]_n$, input $x$, output $y(n) \approx d$
Here $d$ is the desired output.

Learning Rules
- Error-correction learning: delta rule, or Widrow-Hoff rule
- Memory-based learning: nearest neighbor rule
- Hebbian learning: synchronous activation increases the synaptic strength; asynchronous activation decreases the synaptic strength
- Competitive learning
- Boltzmann learning
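
Error-Correction Learning
[Figure: neuron $k$ at iteration $n$. The inputs $x_1, x_2, \ldots, x_m$ are scaled by the synaptic weights $w_{k1}(n), w_{k2}(n), \ldots, w_{km}(n)$; the bias $b_k(n)$ enters through a fixed input of 1. The summation $\Sigma$ followed by the activation function $f(\cdot)$ gives the output $y_k(n)$. Subtracting $y_k(n)$ from the desired output $d_k(n)$ gives the error signal $e_k(n) = d_k(n) - y_k(n)$, which drives the learning rule.]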

Delta Rule (Widrow-Hoff Rule)
The delta rule adjusts the weights so as to minimize a cost function (or performance index), typically the instantaneous squared error $E(n) = \tfrac{1}{2} e_k^2(n)$, where $e_k(n) = d_k(n) - y_k(n)$.
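
Delta Rule (Widrow-Hoff Rule)
Also called the “Least Mean Square” rule. The procedure is:
1. Initialize $w_{kj}(0) = 0$ and $n = 0$.
2. Compute the output $y_k(n) = \sum_j w_{kj}(n)\, x_j(n)$.
3. Update the weights: $w_{kj}(n+1) = w_{kj}(n) + \eta\,[d_k(n) - y_k(n)]\, x_j(n)$, where $\eta$ is the learning rate, chosen in $[0, 1]$.
4. Set $n = n + 1$ and repeat from step 2.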
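
The delta rule update can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `delta_rule`, the toy data set, and the fixed number of epochs are hypothetical and not part of the lecture.

```python
def delta_rule(samples, eta=0.1, epochs=50):
    """Train one linear neuron k with the delta (Widrow-Hoff) rule.

    samples: list of (x, d) pairs, where x is a list of inputs and
    d is the desired output for that input.
    """
    m = len(samples[0][0])
    w = [0.0] * m                                            # w_kj(0) = 0
    for _ in range(epochs):
        for x, d in samples:
            y = sum(wj * xj for wj, xj in zip(w, x))         # y_k(n)
            e = d - y                                        # e_k(n) = d_k(n) - y_k(n)
            w = [wj + eta * e * xj for wj, xj in zip(w, x)]  # w_kj(n+1)
    return w


# Hypothetical noise-free data generated by d = 2*x1 - 1*x2.
data = [([1.0, 0.0], 2.0), ([0.0, 1.0], -1.0), ([1.0, 1.0], 1.0)]
weights = delta_rule(data, eta=0.1, epochs=200)
print([round(wj, 3) for wj in weights])
```

Because the toy data are consistent with a single linear neuron, the weights approach the generating coefficients; with noisy data they would only approach a least-squares compromise.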
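
Learning Paradigm
- Supervised: the environment (data) feeds both a teacher (expert) and the ANN. The teacher supplies the desired output, which is compared with the actual output of the ANN; the resulting error is fed back to the ANN.
- Unsupervised: the ANN learns from the environment (data) alone, without a teacher.
- Delayed reinforcement learning: the ANN interacts with the environment (data), and a delayed signal is evaluated through a cost function to guide learning.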

Neural Networks: Single Layer Perceptrons
Single Layer Perceptrons
A single-layer perceptron network is a network in which all the inputs are connected directly to the output(s).
Each output unit is independent of the others, so the analysis can be limited to a single-output perceptron.
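
The independence of the output units can be illustrated with a small sketch. The function `single_layer_forward`, the identity default for the activation `f`, and the numbers are assumptions chosen only for illustration.

```python
def single_layer_forward(W, b, x, f=lambda v: v):
    """Forward pass of a single-layer network.

    W[k] holds the weights of output unit k, b[k] its bias.
    f is the activation function (identity by default, an assumption).
    """
    return [f(sum(wkj * xj for wkj, xj in zip(row, x)) + bk)
            for row, bk in zip(W, b)]


x = [1.0, 2.0]
b = [0.1, -0.2]
W = [[0.5, -1.0],
     [2.0, 1.0]]
y_before = single_layer_forward(W, b, x)

# Change only the weights of output unit 1; output unit 0 is unaffected.
W_changed = [[0.5, -1.0],
             [0.0, 0.0]]
y_after = single_layer_forward(W_changed, b, x)
print(y_before, y_after)
```

Each output depends only on its own row of weights and its own bias, which is why a single-output unit can be analyzed in isolation.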

Derivation of a Learning Rule for Perceptrons
Key idea: learning is performed by adjusting the weights in order to minimize the sum of squared errors on a training set.
- Weights are updated repeatedly (in each epoch/iteration).
- The sum of squared errors is a classical error measure (commonly used, for example, in linear regression).
- Learning can be viewed as an optimization search problem in weight space.
[Figure: error surface $E(w)$ over the weights $w_1$ and $w_2$.]

Derivation of a Learning Rule for Perceptrons
The learning rule performs a search within the solution's vector space towards a global minimum.
The error surface itself is a hyper-paraboloid, but it is seldom as smooth as idealized plots depict. In most problems the solution space is quite irregular, with numerous pits and hills that may cause the network to settle in a local minimum (not the best overall solution).
Epochs are repeated until a stopping criterion is reached (error magnitude, number of iterations, change of weights, etc.).
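
The three stopping criteria named above can be combined in one training loop. The following is a minimal sketch for a single linear unit trained by batch gradient descent; the function name `train_until_stopped`, the tolerances, and the toy data are hypothetical choices, not values from the lecture.

```python
def train_until_stopped(samples, eta=0.05, max_epochs=1000,
                        err_tol=1e-6, dw_tol=1e-9):
    """Batch gradient descent on E(w) = 1/2 * sum (d - y)^2 for a linear unit.

    Returns (weights, epochs_used, reason_for_stopping).
    """
    w = [0.0] * len(samples[0][0])
    for epoch in range(1, max_epochs + 1):
        grad = [0.0] * len(w)
        sse = 0.0
        for x, d in samples:
            e = d - sum(wj * xj for wj, xj in zip(w, x))
            sse += 0.5 * e * e
            for j, xj in enumerate(x):
                grad[j] -= e * xj              # dE/dw_j = -sum of e * x_j
        if sse < err_tol:                      # criterion: error magnitude
            return w, epoch, "error magnitude"
        step = [-eta * g for g in grad]
        w = [wj + s for wj, s in zip(w, step)]
        if max(abs(s) for s in step) < dw_tol:  # criterion: change of weights
            return w, epoch, "change of weights"
    return w, max_epochs, "number of iterations"  # criterion: iteration budget


# Hypothetical noise-free data generated by d = 2*x1 - 1*x2.
toy_data = [([1.0, 0.0], 2.0), ([0.0, 1.0], -1.0), ([1.0, 1.0], 1.0)]
w_final, epochs_used, reason = train_until_stopped(toy_data)
print(w_final, epochs_used, reason)
```

Which criterion fires first depends on the tolerances; in practice they are usually set together so that the iteration budget acts as a safety net.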
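
Derivation of a Learning Rule for Perceptrons
Adaline (Adaptive Linear Element), Widrow [1962]
[Figure: inputs $x_1, x_2, \ldots, x_m$ with weights $w_{k1}, w_{k2}, \ldots, w_{km}$ feeding a linear summing unit.]
Goal: find the weights for which the linear output $y_k = \sum_{j=1}^{m} w_{kj} x_j$ matches the desired output $d_k$.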
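
Least Mean Squares (LMS)
The following cost function (error function) should be minimized:
$$E(w_k) = \frac{1}{2} \sum_{p} \left(d_k^p - y_k^p\right)^2$$
where $p$ indexes the training patterns, $d_k^p$ is the desired output and $y_k^p$ the actual output for pattern $p$.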
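
Least Mean Squares (LMS)
Letting $f(w_k) = f(w_{k1}, w_{k2}, \ldots, w_{km})$ be a function over $\mathbb{R}^m$, then
$$df = \frac{\partial f}{\partial w_{k1}}\, dw_{k1} + \frac{\partial f}{\partial w_{k2}}\, dw_{k2} + \cdots + \frac{\partial f}{\partial w_{km}}\, dw_{km}$$
Defining the gradient $\nabla f = \left(\frac{\partial f}{\partial w_{k1}}, \ldots, \frac{\partial f}{\partial w_{km}}\right)^T$ and $dw = (dw_{k1}, \ldots, dw_{km})^T$, this can be written as $df = (\nabla f)^T dw$.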

Gradient Operator
[Figure: three plots of $f$ against $w$, one for each case below.]
- $df$ positive: go uphill
- $df$ zero: plain
- $df$ negative: go downhill
To minimize $f$, we choose $\Delta w = -\eta\, \nabla f$ with $\eta > 0$. Then $df = (\nabla f)^T \Delta w = -\eta\, \|\nabla f\|^2$ is guaranteed to be negative wherever $\nabla f \neq 0$.
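
The sign argument can be checked numerically. The sketch below uses a hypothetical two-variable quadratic `f` (not from the lecture) and records, at every step, the value of $f$ and the first-order change $df = (\nabla f)^T \Delta w$ for the choice $\Delta w = -\eta \nabla f$.

```python
def f(w):
    # Hypothetical smooth test function with its minimum at (1.0, -0.5).
    return (w[0] - 1.0) ** 2 + 2.0 * (w[1] + 0.5) ** 2


def grad_f(w):
    # Analytic gradient of f.
    return [2.0 * (w[0] - 1.0), 4.0 * (w[1] + 0.5)]


eta = 0.1
w = [3.0, 2.0]
history = []                       # (f value, first-order df) per step
for _ in range(20):
    g = grad_f(w)
    dw = [-eta * gj for gj in g]   # step against the gradient
    df = sum(gj * dwj for gj, dwj in zip(g, dw))   # equals -eta * |grad f|^2
    history.append((f(w), df))
    w = [wj + dwj for wj, dwj in zip(w, dw)]

for value, df in history[:5]:
    print(round(value, 4), round(df, 4))
```

The first-order change is never positive, and for a sufficiently small learning rate the function value itself decreases at every step. A learning rate that is too large can still overshoot, since $df$ is only a linear approximation.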
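
Adaline Learning Rule
With
$$E(w_k) = \frac{1}{2} \sum_{p} \left(d_k^p - y_k^p\right)^2$$
then
$$\frac{\partial E}{\partial w_{kj}} = -\sum_{p} \left(d_k^p - y_k^p\right) \frac{\partial y_k^p}{\partial w_{kj}}$$
As already obtained before, $y_k^p = \sum_j w_{kj} x_j^p$, so $\partial y_k^p / \partial w_{kj} = x_j^p$.
Weight modification rule:
$$\Delta w_{kj} = -\eta\, \frac{\partial E}{\partial w_{kj}} = \eta \sum_{p} \left(d_k^p - y_k^p\right) x_j^p$$
Defining $\delta_k^p = d_k^p - y_k^p$, we can write
$$\Delta w_{kj} = \eta \sum_{p} \delta_k^p\, x_j^p$$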
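
Adaline Learning Modes
- Batch learning mode: the weights are updated once per epoch, using the error accumulated over all training patterns, $\Delta w_{kj} = \eta \sum_{p} \delta_k^p\, x_j^p$.
- Incremental learning mode: the weights are updated after each training pattern, $\Delta w_{kj} = \eta\, \delta_k^p\, x_j^p$.
Here $\delta_k^p = d_k^p - y_k^p$ is the error for pattern $p$.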
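
The difference between the two modes lies only in when the weight change is applied. A minimal sketch, assuming a single linear unit; the function names `batch_epoch` and `incremental_epoch` and the toy data are hypothetical.

```python
def output(w, x):
    # Linear Adaline output y = sum_j w_j * x_j.
    return sum(wj * xj for wj, xj in zip(w, x))


def batch_epoch(w, samples, eta):
    """One epoch in batch mode: accumulate over all patterns, then update once."""
    dw = [0.0] * len(w)
    for x, d in samples:
        delta = d - output(w, x)          # all deltas use the same old weights
        for j, xj in enumerate(x):
            dw[j] += eta * delta * xj
    return [wj + dwj for wj, dwj in zip(w, dw)]


def incremental_epoch(w, samples, eta):
    """One epoch in incremental mode: update after every single pattern."""
    w = list(w)
    for x, d in samples:
        delta = d - output(w, x)          # delta uses the latest weights
        w = [wj + eta * delta * xj for wj, xj in zip(w, x)]
    return w


# Hypothetical noise-free data generated by d = 2*x1 - 1*x2.
patterns = [([1.0, 0.0], 2.0), ([0.0, 1.0], -1.0), ([1.0, 1.0], 1.0)]

w_batch_1 = batch_epoch([0.0, 0.0], patterns, eta=0.05)
w_incr_1 = incremental_epoch([0.0, 0.0], patterns, eta=0.05)

w_batch, w_incr = [0.0, 0.0], [0.0, 0.0]
for _ in range(300):
    w_batch = batch_epoch(w_batch, patterns, eta=0.05)
    w_incr = incremental_epoch(w_incr, patterns, eta=0.05)
print(w_batch_1, w_incr_1)
print(w_batch, w_incr)
```

After a single epoch the two modes give different weights, because the incremental mode already uses the updated weights within the epoch. On this consistent toy problem both end up near the same solution after many epochs.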

Adaline Learning Rule
The Adaline learning rule is also known as:
- δ-learning rule
- LMS algorithm
- Widrow-Hoff learning rule

Generalization and Early Stopping
- Generalization: with proper training, a neural network may produce reasonable output for inputs not seen during training.
- Generalization is particularly useful for the analysis of “noisy” data (e.g. time series).
- “Overtraining” will not improve the ability of a neural network to produce good output. On the contrary, the network will try to fit the noise as if it were real data and lose its generality.
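
One common way to avoid overtraining is to monitor the error on a separate validation set and keep the weights that scored best there. The sketch below shows that idea under stated assumptions: the `patience` mechanism, the function names, and the small noisy data sets are hypothetical and are not taken from the lecture.

```python
def train_with_early_stopping(w, train_epoch, val_error,
                              max_epochs=500, patience=10):
    """Keep the weights with the lowest validation error seen so far.

    train_epoch(w) returns the weights after one more training epoch.
    val_error(w) returns the error on data not used for training.
    Training stops after `patience` epochs without improvement.
    """
    best_w, best_err, waited, epoch = list(w), val_error(w), 0, 0
    for epoch in range(1, max_epochs + 1):
        w = train_epoch(w)
        err = val_error(w)
        if err < best_err:
            best_w, best_err, waited = list(w), err, 0
        else:
            waited += 1
            if waited >= patience:
                break
    return best_w, best_err, epoch


def predict(w, x):
    return sum(wj * xj for wj, xj in zip(w, x))


# Hypothetical data: first input is a constant 1 (bias), targets are noisy.
train_set = [([1.0, 0.0], 0.3), ([1.0, 1.0], 1.8),
             ([1.0, 2.0], 4.4), ([1.0, 3.0], 5.7)]
val_set = [([1.0, 0.5], 1.0), ([1.0, 1.5], 3.0), ([1.0, 2.5], 5.0)]


def lms_epoch(w, eta=0.02):
    # One incremental LMS pass over the training set.
    w = list(w)
    for x, d in train_set:
        delta = d - predict(w, x)
        w = [wj + eta * delta * xj for wj, xj in zip(w, x)]
    return w


def val_error(w):
    return 0.5 * sum((d - predict(w, x)) ** 2 for x, d in val_set)


best_w, best_err, stopped_epoch = train_with_early_stopping(
    [0.0, 0.0], lms_epoch, val_error)
print(best_w, best_err, stopped_epoch)
```

The returned weights are those with the lowest validation error, not necessarily the last ones computed, so any later fitting of training noise is discarded.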

Generalization and Early Stopping
[Figure: overfitting vs. generalization.]

Homework 2
Given the function y = 4x^2, find the value of x that results in y = 2 by using the Least Mean Squares method.
Use the initial estimate x0 = 1 and learning rate η = 0.01. Write down the results of the first 10 epochs/iterations.
Draw a conclusion from your results.
Note: the calculation can be done manually or using Matlab.

Homework 2A
Given the function y = 2x^3 + cos2x, find the value of x that results in y = 5 by using the Least Mean Squares method.
Use the initial estimate x0 = 0.2*Student ID and learning rate η = 0.01. Write down the results of the first 10 epochs/iterations.
Draw a conclusion from your results.
Note: the calculation can be done manually or using Matlab/Excel.