Chapter 3. Artificial Neural Networks - Introduction
Overview
- Biological inspiration
- Artificial neurons and neural networks
- Applications
Biological Neuron
Animals react adaptively to changes in their external and internal environment, and they use their nervous system to produce these behaviours. An appropriate model or simulation of the nervous system should be able to produce similar responses and behaviours in artificial systems.
Biological Neuron
Information transmission between neurons happens at the synapses.
Artificial neurons
[Figure: schematic of a neuron]
Artificial neurons
[Figure: one possible model of the artificial neuron: inputs x1, x2, …, xn, each multiplied by a synaptic weight w1, w2, …, wn, combined into a single unit that produces the output y]
Artificial neurons
Nonlinear generalization of the neuron: y = f(x, w), where y is the neuron's output, x is the vector of inputs, and w is the vector of synaptic weights. Examples:
- sigmoidal neuron: y = 1 / (1 + e^(-w·x / a))
- Gaussian neuron: y = e^(-||x - w||^2 / (2a^2))
where a is a steepness (respectively width) parameter.
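To make the two example neurons concrete, here is a minimal Python sketch of both; the function names and the default value of the parameter a are illustrative choices, not part of the slides:

```python
import numpy as np

def sigmoidal_neuron(x, w, a=1.0):
    """Sigmoidal neuron: y = 1 / (1 + exp(-w.x / a))."""
    return 1.0 / (1.0 + np.exp(-np.dot(w, x) / a))

def gaussian_neuron(x, w, a=1.0):
    """Gaussian neuron: y = exp(-||x - w||^2 / (2 a^2))."""
    return np.exp(-np.linalg.norm(x - w) ** 2 / (2.0 * a ** 2))

x = np.array([0.5, -1.0, 2.0])   # example input vector
w = np.array([0.4, 0.3, -0.2])   # example synaptic weight vector
print(sigmoidal_neuron(x, w))    # output in (0, 1)
print(gaussian_neuron(x, w))     # output in (0, 1], peaks when x == w
```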
Other models
Hopfield networks, backpropagation networks.
Artificial neural networks
[Figure: many interconnected neurons between an input side and an output side]
An artificial neural network is composed of many artificial neurons that are linked together according to a specific network architecture. The objective of the neural network is to transform the inputs into meaningful outputs.
Artificial neural networks
Tasks to be solved by artificial neural networks:
- controlling the movements of a robot based on self-perception and other information (e.g., visual information);
- deciding the category of potential food items (e.g., edible or non-edible) in an artificial world;
- recognizing a visual object (e.g., a familiar face);
- predicting where a moving object goes when a robot wants to catch it.
Neural network mathematics
[Figure: layered network, inputs on the left, outputs on the right]
Neural network mathematics
Neural network as an input/output transformation: y_out = F(x, W), where W is the matrix of all weight vectors. The transformation is built up layer by layer, each layer applying its neurons to the outputs of the previous one.
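As a minimal sketch (not from the slides), one layer of sigmoidal neurons can be written as a single matrix-vector product, with each row of W holding one neuron's weight vector:

```python
import numpy as np

def layer_output(x, W):
    """One layer of sigmoidal neurons: each row of W is the weight
    vector of one neuron, so the whole layer computes y = f(W x)."""
    return 1.0 / (1.0 + np.exp(-(W @ x)))

x = np.array([1.0, 0.5])          # input vector
W = np.array([[0.2, -0.4],        # weights of neuron 1
              [0.7,  0.1]])       # weights of neuron 2
print(layer_output(x, W))         # vector of both neurons' outputs
```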
Learning principle for artificial neural networks
ENERGY MINIMIZATION
We need an appropriate definition of energy for artificial neural networks; having that, we can use mathematical optimisation techniques to find how to change the weights of the synaptic connections between neurons.
ENERGY = measure of task performance error
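One common concrete choice of energy (an assumption here, not stated on the slide) is the summed squared error over the task; `predict` stands in for whatever input/output transformation the network implements:

```python
def energy(W, inputs, targets, predict):
    """Energy as task performance error:
    E(W) = sum_k (target_k - predict(x_k, W))^2.
    Learning means changing W to drive this quantity down."""
    return sum((t - predict(x, W)) ** 2 for x, t in zip(inputs, targets))
```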
Perceptrons
- First studied in the late 1950s.
- Also known as Layered Feed-Forward Networks.
- The only efficient learning element at that time was for single-layered networks.
- Today, used as a synonym for a single-layer, feed-forward network.
Perceptrons
[Figure: perceptron structure, input units connected by weighted links to output units]
Sigmoid Perceptron
[Figure: perceptron with sigmoid activation, output = 1 / (1 + e^(-in))]
Perceptron learning rule
Teacher specifies the desired output for a given input. The network calculates what it thinks the output should be, then changes its weights in proportion to the error between the desired and calculated results:
Δwi,j = α * [teacheri - outputi] * inputj
where α is the learning rate, [teacheri - outputi] is the error term, and inputj is the input activation. The update is
wi,j = wi,j + Δwi,j
(the delta rule).
Adjusting perceptron weights
Δwi,j = α * missi * inputj, where missi is the error (teacheri - outputi). Adjust each wi,j based on inputj and missi. [Table: worked adaptation examples] This is incremental learning: the weights are updated after every presented example, as in the sketch below.
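A minimal sketch of incremental delta-rule learning for a single threshold unit; the fixed firing threshold 0.5, the learning rate 0.1, and the OR task are illustrative assumptions:

```python
import numpy as np

def train_perceptron(samples, alpha=0.1, theta=0.5, epochs=20):
    """Incremental learning: weights are adjusted after every example
    by the delta rule, w_j += alpha * (teacher - output) * input_j."""
    w = np.zeros(len(samples[0][0]))
    for _ in range(epochs):
        for x, teacher in samples:
            output = 1 if np.dot(w, x) >= theta else 0       # threshold unit
            w += alpha * (teacher - output) * np.asarray(x)  # delta rule
    return w

# Learn the OR function from <input, teacher> pairs
samples = [([0, 0], 0), ([0, 1], 1), ([1, 0], 1), ([1, 1], 1)]
print(train_perceptron(samples))   # converges to weights separating OR
```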
Node biases
A node's output is a weighted function of its inputs. What is a bias? How can we learn the bias value? Answer: treat it like just another weight.
Training biases (θ)
A node's output:
1 if w1x1 + w2x2 + … + wnxn >= θ
0 otherwise
Rewrite:
w1x1 + w2x2 + … + wnxn - θ >= 0
w1x1 + w2x2 + … + wnxn + θ(-1) >= 0
Hence, the bias is just another weight whose activation is always -1. Just add one more input unit, the bias unit, to the network topology.
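The rewrite above can be checked mechanically; this small sketch (the function names are illustrative) shows that thresholding at θ is the same as thresholding the augmented sum at 0:

```python
import numpy as np

def fires(x, w, theta):
    # original form: compare the weighted sum against the threshold
    return np.dot(w, x) >= theta

def fires_augmented(x, w, theta):
    # rewritten form: theta becomes the last weight, paired with a
    # constant extra input of -1
    return np.dot(np.append(w, theta), np.append(x, -1.0)) >= 0.0

w, theta = np.array([0.5, 0.5]), 0.3
x = np.array([1.0, 0.0])
print(fires(x, w, theta), fires_augmented(x, w, theta))  # True True
```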
Perceptron convergence theorem
If a set of <input, output> pairs is learnable (representable), the delta rule will find the necessary weights, in a finite number of steps, independent of the initial weights. However, a single-layer perceptron can only learn linearly separable concepts: it works iff gradient descent works.
Linear separability
Consider a perceptron: its output is 1 if W1X1 + W2X2 > Θ, and 0 otherwise. In terms of feature space, the decision boundary is the line W1X1 + W2X2 = Θ; hence, the perceptron can only classify examples if a line (a hyperplane, more generally) can separate the positive examples from the negative examples.
What can Perceptrons Represent?
- Some complex Boolean functions can be represented, for example the majority function (covered in this lecture).
- Perceptrons are limited in the Boolean functions they can represent.
The Separability Problem and XOR Trouble: Linear Separability in Perceptrons
AND and OR linear separators
[Figure: 2D feature space with lines separating the AND and OR truth-table points]
Separation in n-1 dimensions
[Figure: the majority function as an example of separation by a plane in 3-dimensional space]
Perceptrons & XOR
XOR function:
X1  X2  output
0   0   0
0   1   1
1   0   1
1   1   0
There is no way to draw a line to separate the positive from the negative examples.
How do we compute XOR?
Not with a single perceptron: a multi-layer network is needed, for example one that combines OR and AND units, as in the sketch below.
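One classical construction, sketched here with hand-picked (not learned) weights and thresholds: XOR(a, b) = OR(a, b) AND NOT AND(a, b), computed by a two-layer network of threshold units:

```python
def step(z, theta):
    """Threshold unit: fires (1) when the weighted sum reaches theta."""
    return 1 if z >= theta else 0

def xor(x1, x2):
    h_or  = step(x1 + x2, 0.5)       # hidden unit computing OR
    h_and = step(x1 + x2, 1.5)       # hidden unit computing AND
    return step(h_or - h_and, 0.5)   # output unit: OR AND NOT AND

for a, b in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(a, b, xor(a, b))           # last column prints 0, 1, 1, 0
```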
Perceptron application
[Figure: scatter plot of classified data points]
Multi-Layer Perceptron
- One or more hidden layers
- Sigmoid activation functions
[Figure: input data feeding a 1st hidden layer, a 2nd hidden layer, and an output layer]
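A minimal forward-pass sketch of such a network; the layer sizes and random weights are illustrative assumptions:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def mlp_forward(x, weights):
    """Forward pass: `weights` holds one matrix per layer (here two
    hidden layers and an output layer), each followed by a sigmoid."""
    a = x
    for W in weights:
        a = sigmoid(W @ a)
    return a

rng = np.random.default_rng(0)
shapes = [(4, 3), (3, 4), (1, 3)]   # (neurons, inputs) per layer
weights = [rng.normal(size=s) for s in shapes]
print(mlp_forward(np.array([0.2, -0.7, 1.0]), weights))  # network output
```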
Multi-Layer Perceptron: Application
Types of decision regions by network structure:
Structure    | Types of decision regions
Single-layer | Half plane bounded by a hyperplane
Two-layer    | Convex open or closed regions
Three-layer  | Arbitrary (complexity limited by the number of nodes)
[Figure: example decision regions for classes A and B beside each structure]
Conclusion
Neural networks have some disadvantages, such as:
- preprocessing of the data is required;
- results are hard to interpret in high dimensions;
- a learning phase must be chosen (supervised or unsupervised).