Neural Networks
What are they? Models of the human brain used for computational purposes. The brain is made up of many interconnected neurons.
What is a neuron?
Components of a biological neuron: Dendrites – serve as inputs. Soma – the cell body of the neuron, which contains the nucleus. Nucleus – the processing component of the neuron. Axon – along which the output travels. Synapses – the terminals across whose gaps connections are made to other neurons.
How does it work? Signals move from neuron to neuron via electrochemical reactions. The synapses release a chemical transmitter which enters the dendrite. This raises or lowers the electrical potential of the cell body. The soma sums the inputs it receives and once a threshold level is reached an electrical impulse is sent down the axon (often known as firing). These impulses eventually reach synapses and the cycle continues.
Synapses Synapses which raise the potential within a cell body are called excitatory. Synapses which lower the potential are called inhibitory. It has been found that synapses exhibit plasticity. This means that long-term changes in the strengths of the connections can be formed depending on the firing patterns of other neurons. This is thought to be the basis for learning in our brains.
Artificial model of neuron
Diagram. aj: activation value of unit j. wj,i: weight on the link from unit j to unit i. ini: weighted sum of inputs to unit i. ai: activation value of unit i (also known as the output value). g: activation function.
How does this work A neuron is connected to other neurons via its input and output links. Each incoming neuron has an activation value and each connection has a weight associated with it. The neuron sums the incoming weighted values and this value is input to an activation function. The output of the activation function is the output from the neuron.
Common Activation Functions
Some common activation functions in more detail. These functions can be defined as follows: Stept(x) = 1 if x >= t, else 0. Sign(x) = +1 if x >= 0, else –1. Sigmoid(x) = 1/(1 + e^-x). On occasion an identity function is also used (i.e. where the input to the neuron becomes the output). This function is normally used in the input layer, where the inputs to the neural network are passed into the network unchanged.
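As a sketch, the unit just described can be written directly in code; the input values and weights below are illustrative, not taken from the text:

```python
import math

# Common activation functions (t is the step threshold)
def step(x, t=0.0):
    return 1 if x >= t else 0

def sign(x):
    return 1 if x >= 0 else -1

def sigmoid(x):
    return 1 / (1 + math.exp(-x))

# A single artificial neuron: weighted sum of inputs, then activation.
def neuron(inputs, weights, g=sigmoid):
    in_i = sum(a * w for a, w in zip(inputs, weights))  # weighted sum in_i
    return g(in_i)                                      # activation value a_i

print(neuron([1, 0, 1], [0.5, -0.3, 0.2]))  # sigmoid(0.7) ≈ 0.668
```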
A brief history of Neural Networks. In 1943 two scientists, Warren McCulloch and Walter Pitts, proposed the first artificial model of a biological neuron [McC]. This synthetic neuron is still the basis for most of today’s neural networks. Rosenblatt later came up with his two-layered perceptron, which was subsequently shown to be limited by Minsky and Papert; this led to a huge decline in funding and interest in neural networks.
The bleak years. During this period, even though there was a lack of funding and interest in neural networks, a small number of researchers continued to investigate the potential of neural models. A number of papers were published, but none had any great impact. Many of these reports concentrated on the potential of neural networks for aiding in the explanation of biological behaviour (e.g. [Mal], [Bro], [Mar], [Bie], [Coo]). Others focused on real world implementations. In 1972 Teuvo Kohonen and James A. Anderson independently proposed the same model for associative memory [Koh], [An1]. In 1976 Marr and Poggio applied a neural network to a realistic problem in computational vision, stereopsis [Mar]. Other projects included [Lit], [Gr1], [Gr2], [Ama], [An2], [McC].
The Discovery of Backpropagation. The backpropagation learning algorithm was developed independently by Rumelhart [Ru1], [Ru2], Le Cun [Cun] and Parker [Par] in the mid-1980s. It was subsequently discovered that the algorithm had also been described by Paul Werbos in his Harvard Ph.D. thesis in 1974 [Wer]. Error backpropagation networks are the most widely used neural network model, as they can be applied to almost any problem that requires pattern mapping. It was the discovery of this paradigm that brought neural networks out of the research arena and into real-world implementation.
Interest in neural networks differs according to profession. Neurobiologists and psychologists – understanding our brain. Engineers and physicists – a tool to recognise patterns in noisy data. Business analysts and engineers – a tool for modelling data. Computer scientists and mathematicians – networks offer an alternative model of computing: machines that may be taught rather than programmed. Artificial intelligentsia, cognitive scientists and philosophers – subsymbolic processing (reasoning with patterns, not symbols).
Backpropagation Network Architecture. A backpropagation network typically consists of three or more layers of nodes. The first layer is known as the input layer and the last layer is known as the output layer. Any layers of nodes in between the input and output layers are known as hidden layers. Each unit in a layer is connected to every unit in the next layer. There are no connections between units within the same layer.
Backpropagation
Operation of the network. The operation of the network consists of a forward pass of the input through the network (forward propagation), followed by a backward pass of an error value, which is used in the weight modification (backward propagation).
Forward Propagation. A forward propagation step is initiated when an input pattern is presented to the network. No processing is performed at the input layer. The pattern is propagated forward to the next layer, and each node in this layer performs a weighted sum of all its inputs. After this sum has been calculated, a function is used to compute the unit’s output.
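The forward pass can be sketched as follows, assuming sigmoid units throughout; the 2-2-1 weight matrices are illustrative values, not taken from the example network:

```python
import math

def sigmoid(x):
    return 1 / (1 + math.exp(-x))

def forward(pattern, layers):
    """Propagate a pattern through the network.
    `layers` is a list of weight matrices, one per layer of connections;
    W[j][i] is the weight on the link from unit j to unit i of the next layer."""
    activation = pattern  # no processing at the input layer
    for W in layers:
        n_out = len(W[0])
        sums = [sum(activation[j] * W[j][i] for j in range(len(W)))
                for i in range(n_out)]           # weighted sums
        activation = [sigmoid(s) for s in sums]  # units' outputs
    return activation

# Illustrative 2-2-1 network (weights chosen arbitrarily)
W_hidden = [[0.5, -0.5], [0.5, -0.5]]
W_output = [[1.0], [-1.0]]
print(forward([1, 0], [W_hidden, W_output]))
```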
Example XOR
Layers of the Network The Input Layer The input layer of a backpropagation network acts solely as a buffer to hold the patterns being presented to the network. Each node in the input layer corresponds to one entry in the pattern. No processing is done at the input layer. The pattern is fed forward from the input layer to the next layer.
The Hidden Layers. It is the hidden layers which give the backpropagation network its exceptional computational abilities. The units in the hidden layers act as “feature detectors”. They extract information from the input patterns which can be used to distinguish between particular classes. The network creates its own internal representation of the data.
The Output Layer The output layer of a network uses the response of the feature detectors in the hidden layer. Each unit in the output layer emphasises each feature according to the values of the connecting weights. The pattern of activation at this layer is taken as the network’s response.
The sigmoid function. The function used to perform this operation is the sigmoid function, F(x) = 1/(1 + e^-x). The main reason why this particular function is chosen is that its derivative, which is used in the learning law, is easily computed: F'(x) = F(x)(1 - F(x)). The result obtained after applying this function to the net input is taken to be the node’s output value. This process is continued until the pattern has been propagated through the entire network and reaches the output layer. The activation pattern at the output layer is taken as the network’s result.
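That the derivative really is easy to compute can be checked numerically; this sketch compares the analytic form F'(x) = F(x)(1 - F(x)) with a finite-difference estimate at a few arbitrary points:

```python
import math

def F(x):
    return 1 / (1 + math.exp(-x))

def F_prime(x):
    # The analytic derivative used in the learning law
    return F(x) * (1 - F(x))

# Compare against a central finite difference at a few sample points
h = 1e-6
for x in [-2.0, 0.0, 1.5]:
    numeric = (F(x + h) - F(x - h)) / (2 * h)
    print(x, F_prime(x), numeric)
```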
Linear Separability and the XOR Problem. Consider two-input patterns being classified into two classes. Each point, plotted with one of two symbols, represents a pattern with a particular set of input values, and each pattern is classified into one of two classes. Notice that these classes can be separated with a single line. They are known as linearly separable patterns. Linear separability refers to the fact that classes of patterns represented as n-dimensional vectors can be separated with a single decision surface. In the case above, the line represents the decision surface.
Diagram
XOR. The classic example of a linearly inseparable pattern is the logical exclusive-OR (XOR) function. The next figure illustrates the XOR function: the two classes, 0 (black dots) and 1 (white dots), cannot be separated with a single line.
XOR linearly inseparable
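The inseparability can be illustrated by brute force: no line w1·x1 + w2·x2 >= t, with weights drawn from a coarse grid of candidates, classifies all four XOR points, whereas AND is separated easily. (A grid search is only illustrative; the general result is proved geometrically.)

```python
def separable(patterns, grid=None):
    """Return True if some line w1*x1 + w2*x2 >= t classifies every pattern."""
    grid = grid or [x / 2 for x in range(-10, 11)]  # candidates -5.0 .. 5.0
    for w1 in grid:
        for w2 in grid:
            for t in grid:
                if all((w1 * x1 + w2 * x2 >= t) == (y == 1)
                       for (x1, x2), y in patterns):
                    return True
    return False

XOR = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]
AND = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]

print(separable(AND))  # a separating line exists, e.g. x1 + x2 >= 1.5
print(separable(XOR))  # no line on the grid works
```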
The significance of XOR. XOR is separable in 3 dimensions but obviously not in 2, so many classifiers will need more than 2 layers to classify such patterns. Minsky and Papert pointed out that, as far as they could see, two-layer perceptrons could not learn problems of this kind. Because so many problems are like XOR, according to these stars of AI neural networks had limited applicability.
But they were wrong. Backpropagation showed that neural networks could learn in 3 and more dimensions. However, such was the stature of this pair that their critique impacted negatively on research in neural networks for two decades. The work of Werbos, Parker and Rumelhart proved them wrong, and by 1987 working multilayer networks were learning successfully; they have since become a huge industry.
Backward Propagation. The first step in the backpropagation stage is the calculation of the error between the network’s result and the desired response; this occurs when the forward propagation phase is completed. Each processing unit in the output layer is compared to its corresponding entry in the desired pattern, and an error is calculated for each node in the output layer. The weights are then modified for all of the connections going into the output layer. Next, the error is backpropagated to the hidden layers and, using the generalised delta rule, the weights are adjusted for all connections going into the hidden layer. The procedure continues until the last layer of weights has been modified. The forward and backward propagation phases are repeated until the network’s output is sufficiently close to the desired result.
The Backpropagation Learning Law. The learning law used is known as the Generalised Delta Rule. It allows for the adjustment of the weights in the hidden layer, a feat deemed impossible by Minsky and Papert. It uses the derivative of the activation function of nodes (which in most cases is the sigmoid function) to determine the extent of the adjustment to the weights connecting to the hidden layers. In other words, the network learns from its errors and uses the difference between expected and actual results (the error) to make adjustments.
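The whole procedure can be sketched as a minimal training loop on XOR; the 2-2-1 architecture, bias units, learning rate and random seed below are illustrative choices, not taken from the text:

```python
import math
import random

def sigmoid(x):
    return 1 / (1 + math.exp(-x))

def train_xor(epochs=5000, lr=0.5, seed=1):
    """Train a 2-2-1 network on XOR with plain backpropagation
    (generalised delta rule, no momentum). Returns total squared error."""
    rng = random.Random(seed)
    # W[j][i]: weight from unit j to unit i; the last row is a bias weight
    W1 = [[rng.uniform(-1, 1) for _ in range(2)] for _ in range(3)]
    W2 = [[rng.uniform(-1, 1)] for _ in range(3)]
    data = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]
    for _ in range(epochs):
        for (x1, x2), target in data:
            xs = [x1, x2, 1.0]                          # inputs plus bias
            h = [sigmoid(sum(xs[j] * W1[j][i] for j in range(3)))
                 for i in range(2)]                     # hidden outputs
            hb = h + [1.0]                              # hidden plus bias
            o = sigmoid(sum(hb[j] * W2[j][0] for j in range(3)))
            d = o * (1 - o) * (target - o)              # output delta
            e = [h[i] * (1 - h[i]) * W2[i][0] * d for i in range(2)]
            for j in range(3):                          # weight updates
                W2[j][0] += lr * hb[j] * d
                for i in range(2):
                    W1[j][i] += lr * xs[j] * e[i]
    err = 0.0
    for (x1, x2), target in data:
        xs = [x1, x2, 1.0]
        h = [sigmoid(sum(xs[j] * W1[j][i] for j in range(3)))
             for i in range(2)] + [1.0]
        o = sigmoid(sum(h[j] * W2[j][0] for j in range(3)))
        err += (target - o) ** 2
    return err

print(train_xor())  # the error should shrink well below its starting value
```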
Example. Calculate the weight adjustments in the following network, where the expected (target) outputs are {1, 1} and the learning rate is 1.
Sample Neural Network
Input i = {1, 0}; hidden-layer weights W1 = {{1, -1}, {-1, 1}}; output-layer weights W2 = {{-1, 0}, {0, -1}}, where Wjk is the weight on the link from unit j to unit k of the next layer.
Hidden Layer Computation
Xi = iW1 = {1*1 + 0*(-1), 1*(-1) + 0*1} = {1, -1} = {Xi1, Xi2}
h = F(Xi)
h1 = F(Xi1) = F(1) = 0.73
h2 = F(Xi2) = F(-1) = 0.27
Output Layer Computation
X = hW2 = {0.73*(-1) + 0.27*0, 0.73*0 + 0.27*(-1)} = {-0.73, -0.27} = {X1, X2}
O = F(X)
O1 = F(X1) = F(-0.73) = 0.33
O2 = F(X2) = F(-0.27) = 0.43
Error
dk = Ok(1 - Ok)(Tk - Ok)
d1 = 0.33(0.67)(1 - 0.33) = 0.33(0.67)(0.67) = 0.148
d2 = 0.43(0.57)(1 - 0.43) = 0.43(0.57)(0.57) = 0.139
Error Calculation
e = h(1 - h)W2d
Another way to write the error
ej = hj(1 - hj)(Wj1d1 + Wj2d2)
e1 = 0.73(0.27)((-1)(0.148) + (0)(0.139)) = 0.197(-0.148) = -0.029
e2 = 0.27(0.73)((0)(0.148) + (-1)(0.139)) = 0.197(-0.139) = -0.027
Weight Adjustment
△W2(t) = α h d + Θ △W2(t-1), where α = 1 and the momentum term Θ △W2(t-1) is zero on the first pass
Weight Change
△W2jk = α hj dk:
△W211 = 0.73(0.148) = 0.108   △W212 = 0.73(0.139) = 0.101
△W221 = 0.27(0.148) = 0.040   △W222 = 0.27(0.139) = 0.038
△W1jk = α ij ek:
△W111 = 1(-0.029) = -0.029   △W112 = 1(-0.027) = -0.027
△W121 = 0(-0.029) = 0   △W122 = 0(-0.027) = 0
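The worked example can be checked in code, assuming the input {1, 0}, weights W1 = {{1, -1}, {-1, 1}} and W2 = {{-1, 0}, {0, -1}}, and targets {1, 1} inferred from the example’s arithmetic:

```python
import math

def F(x):
    return 1 / (1 + math.exp(-x))

# Assumed values, inferred from the example's arithmetic
inp = [1, 0]
W1 = [[1, -1], [-1, 1]]      # W1[j][k]: input j -> hidden unit k
W2 = [[-1, 0], [0, -1]]      # W2[j][k]: hidden j -> output unit k
T = [1, 1]                   # target outputs

Xi = [sum(inp[j] * W1[j][k] for j in range(2)) for k in range(2)]
h = [F(x) for x in Xi]                                       # hidden outputs
X = [sum(h[j] * W2[j][k] for j in range(2)) for k in range(2)]
O = [F(x) for x in X]                                        # network outputs
d = [O[k] * (1 - O[k]) * (T[k] - O[k]) for k in range(2)]    # output deltas
e = [h[j] * (1 - h[j]) * sum(W2[j][k] * d[k] for k in range(2))
     for j in range(2)]                                      # hidden errors
dW2 = [[h[j] * d[k] for k in range(2)] for j in range(2)]    # alpha = 1

print([round(v, 2) for v in h])  # [0.73, 0.27]
print([round(v, 3) for v in d])  # [0.148, 0.139]
print([round(v, 3) for v in e])  # [-0.029, -0.027]
```

Run with α = 1 and no momentum, this reproduces the hand calculation to rounding.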