1 Neural Networks AI – Week 23 Sub-symbolic AI Multi-Layer Neural Networks Lee McCluskey, room 3/10 Email lee@hud.ac.uk http://scom.hud.ac.uk/scomtlm/cha2555/

2 RECAP: Simple Model of an Artificial Neuron (McCulloch and Pitts, 1943) A set of synapses (i.e. connections) brings in activations (inputs) from other neurons. A processing unit sums the weighted inputs and then applies a transfer function with a "threshold value" to decide whether the neuron "fires". An output line transmits the result to other neurons (the output can be binary or continuous). If the sum does not reach the threshold, the output is 0.
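A minimal Python sketch of a McCulloch-Pitts-style neuron; the weights, inputs, and threshold below are illustrative values, not taken from the slides:

```python
def mcp_neuron(inputs, weights, threshold):
    """Weighted sum of inputs; fire (output 1) if the sum reaches the threshold."""
    total = sum(x * w for x, w in zip(inputs, weights))
    return 1 if total >= threshold else 0

# Illustrative values: with these weights and threshold the neuron acts as a 2-input AND gate.
print(mcp_neuron([1, 1], [1.0, 1.0], threshold=1.5))  # prints 1
print(mcp_neuron([1, 0], [1.0, 1.0], threshold=1.5))  # prints 0
```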

3 Another Look at XOR We showed in a previous lecture that the XOR truth table cannot be realised by a single-layer perceptron network, because it is not linearly separable. Multi-layer networks / multi-layer perceptrons (MLNs) are able to deal with non-linearly separable problems. We can use an MLN to classify the XOR data using two separating lines (and the step function).

4 Constructing the Network Consider the following feed-forward, fully connected 2-2-1 network. The two hidden nodes construct the two required separating lines, i.e. I1 = -x1 - x2 + 1.5 and I2 = -x1 - x2 + 0.5, and the output node combines this information into a single output.

5 Evaluating the Network We can calculate the activations of the hidden layer for the network I1 = -x1 - x2 + 1.5, I2 = -x1 - x2 + 0.5:

Inputs (x1, x2) | Total input to hidden layer (I1, I2) | Output from hidden layer (f(I1), f(I2))
(1, 1)          | (-0.5, -1.5)                         | (0, 0)
(1, 0)          | (0.5, -0.5)                          | (1, 0)
(0, 1)          | (0.5, -0.5)                          | (1, 0)
(0, 0)          | (1.5, 0.5)                           | (1, 1)
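A small Python sketch of this 2-2-1 network using the step function. The slides give only the hidden-layer lines; the output-node combination step(h1 - h2 - 0.5) below is one possible choice that completes the XOR mapping, assumed here for illustration:

```python
def step(x):
    """Step (threshold) activation: fire if the total input is positive."""
    return 1 if x > 0 else 0

def xor_network(x1, x2):
    # Hidden layer: the two separating lines from the slide.
    h1 = step(-x1 - x2 + 1.5)
    h2 = step(-x1 - x2 + 0.5)
    # Output layer: an assumed combination (not given on the slide) that fires
    # only for points lying between the two separating lines.
    return step(h1 - h2 - 0.5)

for x1, x2 in [(1, 1), (1, 0), (0, 1), (0, 0)]:
    print(x1, x2, "->", xor_network(x1, x2))   # 0, 1, 1, 0
```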

6 Perceptrons To determine whether the j-th output node should fire, we calculate its total input: the weighted sum I_j = sum over i of w_ij * o_i of the outputs o_i feeding into node j. If this value exceeds 0 the neuron will fire; otherwise it will not fire.
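Written out in full (the slide's own formula image is not in the transcript, so this is a reconstruction from the surrounding slides):

```latex
I_j = \sum_i w_{ij}\, o_i,
\qquad
o_j =
\begin{cases}
1 & \text{if } I_j > 0,\\
0 & \text{otherwise.}
\end{cases}
```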

7 Multi-layer Perceptrons (MLPs) In general, MLPs use the sigmoid activation function f(x) = 1 / (1 + e^(-x)). The sigmoid function is mathematically more "user friendly" than the step function. Due to the asymptotic nature of the sigmoid function it is unrealistic to expect values of 0 and 1 to be realised exactly, so it is usual to relax the output requirements to target values of 0.1 and 0.9. By adopting the sigmoid function with a more complex architecture, the multi-layer perceptron is able to solve complicated problems.
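A short Python sketch of the sigmoid; the sample x values are illustrative, and the printout shows why exact 0/1 targets are relaxed to 0.1/0.9:

```python
import math

def sigmoid(x):
    """Logistic sigmoid: a smooth, differentiable alternative to the step function."""
    return 1.0 / (1.0 + math.exp(-x))

# The function only approaches 0 and 1 asymptotically, which is why
# target outputs are usually relaxed to 0.1 and 0.9.
for x in (-5, -2, 0, 2, 5):
    print(x, round(sigmoid(x), 3))
```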

8 Backpropagation Learning Pseudo code (assume all weights have been initialised randomly in [-1, 1]):

REPEAT
  NOCHANGES = TRUE
  For each input pattern:
    1. Perform a forward sweep to find the actual output
    2. Calculate the network errors t_j - o_j
    3. If any |t_j - o_j| > TOLERANCE, set NOCHANGES = FALSE
    4. Do backpropagation to determine the weight changes
    5. Update the weights
UNTIL NOCHANGES
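A self-contained Python sketch of this outer loop. For brevity it trains a single sigmoid unit (on OR, with illustrative targets, learning rate, and tolerance) rather than a full MLP; the backpropagation step for hidden layers follows on the next slides. An epoch cap is added as a safety net, which the slide's pseudo code does not have:

```python
import math, random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

# Illustrative training set: a single unit learning OR with relaxed targets.
patterns = [((0, 0), 0.1), ((0, 1), 0.9), ((1, 0), 0.9), ((1, 1), 0.9)]
weights = [random.uniform(-1, 1) for _ in range(3)]   # w1, w2, bias weight
eta, tolerance = 0.5, 0.2

no_changes, epoch = False, 0
while not no_changes and epoch < 10_000:               # REPEAT ... UNTIL NOCHANGES
    no_changes = True
    for (x1, x2), t in patterns:
        o = sigmoid(weights[0] * x1 + weights[1] * x2 + weights[2])  # 1. forward sweep
        error = t - o                                                # 2. network error
        if abs(error) > tolerance:                                   # 3. tolerance check
            no_changes = False
        delta = error * o * (1 - o)                                  # 4. delta value
        for i, xi in enumerate((x1, x2, 1.0)):                       # 5. update weights
            weights[i] += eta * xi * delta
    epoch += 1

print("trained weights:", [round(w, 2) for w in weights])
```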

9 The Backpropagation Algorithm The change Δw_ij to make to a weight is obtained by "gradient descent". It is based on the "delta value" δ_j for an output node j, which represents the error at output j, defined by

δ_j = (difference between the output required and the output observed) × (gradient of the threshold function)
    = (t_j - o_j) * df/dx

where f is the threshold function 1/(1 + e^(-x)) and o_j = f(input) is the output at j. Hence (after some differentiation)

δ_j = (t_j - o_j) * o_j * (1 - o_j)   for an output node j.
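The differentiation step, written out (a standard derivation of the sigmoid's derivative; it is not shown on the slide itself):

```latex
f(x) = \frac{1}{1 + e^{-x}},
\qquad
\frac{df}{dx} = \frac{e^{-x}}{(1 + e^{-x})^{2}} = f(x)\bigl(1 - f(x)\bigr)
\qquad\Rightarrow\qquad
\delta_j = (t_j - o_j)\, o_j (1 - o_j).
```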

10 The Backpropagation Algorithm For a weight w_ij on a link into an output node j:

(new weight) w_ij = (old weight) w_ij' + o_i * (learning rate) * δ_j

For a weight w_ij on a link into a hidden node j, the delta is built from the deltas of the nodes k that j feeds into (i.e. j - k is a link out of j):

δ_j = o_j * (1 - o_j) * (sum over k of w_jk * δ_k)

and the weight is then updated in the same way:

(new weight) w_ij = (old weight) w_ij' + o_i * (learning rate) * δ_j
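A minimal numpy sketch of one forward and backward pass for a 2-2-1 sigmoid network using these update rules. The network shape, learning rate, training pattern, and random initial weights are illustrative, and biases are omitted for brevity:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(0)
W_ih = rng.uniform(-1, 1, (2, 2))    # weights input -> hidden (w_ij)
W_ho = rng.uniform(-1, 1, (2, 1))    # weights hidden -> output (w_jk)
eta = 0.5                            # learning rate

x = np.array([1.0, 0.0])             # one training pattern (illustrative)
t = np.array([1.0])                  # its target output

# Forward sweep.
h = sigmoid(x @ W_ih)                # hidden-layer outputs o_j
o = sigmoid(h @ W_ho)                # output-layer outputs o_k

# Delta at the output node(s): (t - o) * o * (1 - o).
delta_o = (t - o) * o * (1 - o)
# Delta at the hidden nodes: o_j * (1 - o_j) * sum_k w_jk * delta_k.
delta_h = h * (1 - h) * (W_ho @ delta_o)

# Weight updates: new w_ij = old w_ij + o_i * eta * delta_j.
W_ho += eta * np.outer(h, delta_o)
W_ih += eta * np.outer(x, delta_h)
```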

11 Hidden Layers and Hidden Nodes The question of how many hidden layers to use, and how many nodes each layer should contain, needs to be addressed. Consider first an m-1-n network with m input nodes, n output nodes and just a single node in the hidden layer. This produces m + n weights. It is useful to regard the weights as degrees of freedom in the network. Adding a second node to the hidden layer doubles the freedom in the network, producing 2(m + n) weights. It is not difficult to see the effect that adding a single node has on the size of the problem to be solved: an m-k-n MLP has k(m + n) = km + kn weights, i.e. k(m + n) degrees of freedom in the network. For example, a 4-3-2 network has 3 × (4 + 2) = 18 weights.

12 Hidden Layers and Hidden Nodes If we assume that training time is proportional to the number of weights in the network, then we can see a need to balance effectiveness (reasonable accuracy) with efficiency (reasonable training time). A good "rule of thumb" is to start with a small hidden layer and increase the number of nodes if the network has trouble training; experience counts for a lot here. Further hidden layers are added only when we have tried everything else (varying the number of hidden nodes, the activation function, data scaling, etc.) and failed.

13 Conclusion: RL v ANN
Types of learning:
- ANN: learning by example (supervised learning).
- RL: learning by observation; low-level cognition.
Characterisation of applications:
- ANN: learning an approximation to a function where lots of training data are available. Particularly good at classification with noisy data, e.g. diagnosis or object recognition.
- RL: learning low-level reactive behaviour, such as in lower forms of animals; good for low-level cognitive tasks. It has also been used for learning in high-level tasks (e.g. games) where rewards are available and reasoning with actions (moves) is too complex.

14 Conclusion: RL v ANN
Similarities:
- Both are classed as "sub-symbolic": they make heavy use of numbers and are rather opaque when functioning.
- Both are learning approaches that require repeated trials.
- Both are inspired by natural learning.
- Both are resistant to noise and degrade more gracefully with degraded inputs.

15 Conclusion: RL v ANN
Differences:
- ANNs have a fixed architecture of layers of neurons with a simple firing mechanism, weights randomly assigned at the start, and a fixed set of inputs.
- ANNs need supervised TRAINING, i.e. data classified a priori, in the form of input values paired with the correct output.
- RL needs to perform trial-and-error interactions with the environment.
- RL learns a mapping from a situation to an action by trial and error: it learns to perform actions that maximise the sum of reinforcements, so it is more of a real-time, "hands-off" approach than ANNs; it aims to learn policies by assigning blame and learning which situations to avoid.

16 Summary of MLPs Feed forward. Fully connected. Sigmoid activation function. The restriction to 0/1 outputs is relaxed to 0.1/0.9 to accommodate the asymptotic properties of the sigmoid function. Backpropagation learning is used to train the network. The number of hidden nodes (units) can be chosen using a "rule of thumb". Outputs are continuous rather than binary.

17 Example MLP [Network diagram: inputs x1 and x2 plus a bias node (1), one hidden node h1, and two output nodes o1 and o2, with weights 1, 0.5, -0.3, -0.5, 0.4 and 0.3 on the connections.] Inputs: x1 = 0.1, x2 = 0.5. Required outputs: o1 = 0.1, o2 = 0.9.
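A Python sketch of a forward sweep through such a 2-1-2 sigmoid network, using the inputs and required outputs from the slide. The diagram's exact weight placement is not recoverable from the transcript, so the weight values and their positions below are assumptions for illustration only:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

# Inputs and required outputs from the slide.
x1, x2 = 0.1, 0.5
t1, t2 = 0.1, 0.9

# Assumed weights and placement (illustrative; not the slide's actual diagram).
w_x1_h1, w_x2_h1, w_b_h1 = 0.5, 1.0, -0.3   # into the hidden node h1 (bias included)
w_h1_o1, w_h1_o2 = -0.5, 0.4                # hidden node to the two outputs

# Forward sweep.
h1 = sigmoid(w_x1_h1 * x1 + w_x2_h1 * x2 + w_b_h1)
o1 = sigmoid(w_h1_o1 * h1)
o2 = sigmoid(w_h1_o2 * h1)

print("h1 =", round(h1, 3), " o1 =", round(o1, 3), " o2 =", round(o2, 3))
print("errors t - o:", round(t1 - o1, 3), round(t2 - o2, 3))
```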

