Artificial Neural Networks
Artificial Neural Networks. Other terms/names: connectionism, parallel distributed processing, neural computation, adaptive networks.
History of Neural Networks. McCulloch & Pitts (1943) are generally recognised as the designers of the first neural network. Many of their ideas are still used today (e.g. many simple units combining to give increased computational power, and the idea of a threshold). Hebb (1949) developed the first learning rule, on the premise that if two neurons are active at the same time, the strength of the connection between them should be increased.
History of Neural Networks. During the 50's and 60's many researchers worked on the perceptron amidst great excitement. 1969 saw the death of neural network research for about 15 years (Minsky & Papert). The 1980's brought the re-emergence of ANNs as multi-layer networks: only in the mid 80's (Parker and LeCun) was interest revived (in fact Werbos had discovered the algorithm in 1974).
Brain and Machine. The Brain: pattern recognition, association, complexity, noise tolerance. The Machine: calculation, precision, logic.
The contrast in architecture. The Von Neumann architecture uses a single processing unit: tens of millions of operations per second, with absolute arithmetic precision. The brain uses many slow, unreliable processors acting in parallel.
Neural networks to the rescue Neural network: information processing paradigm inspired by biological nervous systems, such as our brain Structure: large number of highly interconnected processing elements (neurons) working together Like people, they learn from experience (by example)
Neural networks to the rescue. Ten billion (10^10) neurons. On average, several thousand connections each; a neuron may connect to as many as 100,000 other neurons. Hundreds of operations per second.
Neural networks to the rescue. Neural networks are configured for a specific application, such as pattern recognition or data classification, through a learning process. In a biological system, learning involves adjustments to the synaptic connections between neurons; the same holds for artificial neural networks (ANNs).
Where can neural network systems help? When we can't formulate an algorithmic solution; when we can get lots of examples of the behaviour we require ('learning from experience'); when we need to pick out the structure from existing data.
Inspiration from Neurobiology. A neuron: a many-inputs / one-output unit. Output can be excited or not excited. Incoming signals from other neurons determine whether the neuron shall excite ("fire"). Output is subject to attenuation in the synapses, which are junction parts of the neuron. Inputs: dendrites. Processing: soma. Outputs: axons. Synapses: electrochemical contact between neurons. Basically, a biological neuron receives inputs from other sources, combines them in some way, performs a generally nonlinear operation on the result, and then outputs the final result.
The Structure of Neurons
A Simple Model of a Neuron (Perceptron). [Figure: inputs y1…yi feed unit j through weights w1j…wij, producing output O.] Each neuron has a threshold value. Each neuron has weighted inputs from other neurons. The input signals form a weighted sum. If the activation level exceeds the threshold, the neuron "fires".
A Simple Model of a Neuron (Perceptron). [Figure: the same unit with activation function f(x).] Each hidden or output neuron has weighted input connections from each of the units in the preceding layer. The unit computes the weighted sum of its inputs and subtracts its threshold value to give its activation level. The activation level is passed through a sigmoid activation function to determine the output.
Modelling a Neuron. a_j: activation value of unit j. w_j,i: weight on the link from unit j to unit i. in_i: weighted sum of inputs to unit i, in_i = Σ_j w_j,i a_j. a_i: activation value of unit i, a_i = g(in_i). g: activation function.
Activation functions. Transform a neuron's input into its output. Step_t(x) = 1 if x >= t, else 0. Sign(x) = +1 if x >= 0, else -1. Sigmoid(x) = 1/(1 + e^-x). Identity(x) = x.
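The activation functions listed above can be written out directly; a minimal sketch in Python (the function names are our own):

```python
import math

def step(x, t=0.0):
    """Threshold (step) function: 1 if x >= t, else 0."""
    return 1 if x >= t else 0

def sign(x):
    """Sign function: +1 if x >= 0, else -1."""
    return 1 if x >= 0 else -1

def sigmoid(x):
    """Logistic sigmoid: squashes any real input into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def identity(x):
    """Identity: passes the input through unchanged."""
    return x
```

The step and sign functions give hard (binary/bipolar) decisions; the sigmoid is the smooth alternative used later for back-propagation, since it is differentiable.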
The First Artificial Neural Networks Neurons in a McCulloch-Pitts network are connected by directed, weighted paths If the weight on a path is positive the path is excitatory, otherwise it is inhibitory Each neuron has a fixed threshold. If the net input into the neuron is greater than or equal to the threshold, the neuron fires The threshold is set such that any non-zero inhibitory input will prevent the neuron from firing It takes one time step for a signal to pass over one connection
The First Artificial Neural Networks. The activation of a neuron is binary: the neuron either fires (activation of one) or does not fire (activation of zero). [Figure: inputs X1 and X2 connect to Y with weight 2 each; X3 connects to Y with weight -1.] For the network shown here the activation function for unit Y is f(y_in) = 1 if y_in > θ, else 0, where y_in is the total input signal received and θ is the threshold for Y.
The First Artificial Neural Networks. AND function: Y = X1 AND X2, with weight 1 on each input and Threshold(Y) = 1.
X1 X2 | Y
 1  1 | 1
 1  0 | 0
 0  1 | 0
 0  0 | 0
The First Artificial Neural Networks. OR function: Y = X1 OR X2, with weight 2 on each input and Threshold(Y) = 1.
X1 X2 | Y
 1  1 | 1
 1  0 | 1
 0  1 | 1
 0  0 | 0
The First Artificial Neural Networks. AND NOT function: Y = X1 AND NOT X2, with weight 2 on X1, weight -1 on X2, and Threshold(Y) = 1.
X1 X2 | Y
 1  1 | 0
 1  0 | 1
 0  1 | 0
 0  0 | 0
The First Artificial Neural Networks. XOR function: X1 XOR X2 = (X1 AND NOT X2) OR (X2 AND NOT X1), so XOR needs a hidden layer: Z1 = X1 AND NOT X2 (weights 2 and -1), Z2 = X2 AND NOT X1 (weights -1 and 2), and Y = Z1 OR Z2 (weights 2 and 2), all with threshold 1.
X1 X2 | Y
 1  1 | 0
 1  0 | 1
 0  1 | 1
 0  0 | 0
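The McCulloch-Pitts gates above can be sketched with a single firing rule, using the strict threshold f(y_in) = 1 if y_in > θ from the earlier slide (the helper names are illustrative):

```python
def mp_fire(inputs, weights, theta=1):
    """McCulloch-Pitts unit: fires (1) iff the weighted sum exceeds theta."""
    y_in = sum(w * x for w, x in zip(weights, inputs))
    return 1 if y_in > theta else 0

def AND(x1, x2):      return mp_fire([x1, x2], [1, 1])    # fires only for (1, 1)
def OR(x1, x2):       return mp_fire([x1, x2], [2, 2])    # fires for any active input
def AND_NOT(x1, x2):  return mp_fire([x1, x2], [2, -1])   # x1 and not x2

def XOR(x1, x2):
    """XOR needs a hidden layer: Z1 = X1 AND NOT X2, Z2 = X2 AND NOT X1."""
    return OR(AND_NOT(x1, x2), AND_NOT(x2, x1))
```

For example, `XOR(1, 0)` fires (Z1 = 1) while `XOR(1, 1)` does not, since each hidden unit's weighted sum 2 - 1 = 1 fails the strict threshold.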
Simple network: output = 1 if Σ_i w_i x_i > t, else 0. AND with a biased input: a constant input of -1 with weight W1 = 1.5 plays the role of the threshold (so t can be 0), with W2 = 1 and W3 = 1 on inputs X and Y.
Training Algorithms. Adjust neural network weights to map inputs to outputs. Use a set of sample patterns for which the desired output (given the inputs presented) is known. The purpose is to learn to generalise: recognise features which are common to good and bad exemplars.
Training a perceptron. Aim: find weight values that give the desired output for every training pattern.
Training Perceptrons. What are the weight values? Initialise with random weight values. [Figure: inputs x and y, a bias input of -1 with weight W1, weights W2 and W3, threshold t = 0.0.] For AND:
A B | Output
0 0 | 0
0 1 | 0
1 0 | 0
1 1 | 1
Training Perceptrons. [Figure: the same network with random initial weights W1 = 0.3, W2 = 0.5, W3 = -0.4.] With these weights the network does not yet compute the AND function, so learning is needed. For AND:
A B | Output
0 0 | 0
0 1 | 0
1 0 | 0
1 1 | 1
Learning. From experience: examples / training data. Strength of connection between the neurons is stored as a weight-value for the specific connection. Learning the solution to a problem = changing the connection weights. The brain basically learns from experience. Neural networks are sometimes called machine learning algorithms, because changing their connection weights (training) causes the network to learn the solution to a problem. The strength of connection between the neurons is stored as a weight-value for the specific connection. The system learns new knowledge by adjusting these connection weights. The learning ability of a neural network is determined by its architecture and by the algorithmic method chosen for training.
Learning in Neural Networks. Learn values of weights from I/O pairs. Start with random weights. Load a training example's input. Observe the computed output. Modify weights to reduce the difference. Iterate over all training examples. Terminate when weights stop changing OR when the error is very small.
Learning algorithm
While epoch produces an error
    Present network with next inputs from epoch
    Error = T - O
    If Error ≠ 0 then
        Wj = Wj + LR * Ij * Error
    End If
End While
Learning algorithm Epoch : Presentation of the entire training set to the neural network. In the case of the AND function an epoch consists of four sets of inputs being presented to the network (i.e. [0,0], [0,1], [1,0], [1,1]) Error: The error value is the amount by which the value output by the network differs from the target value. For example, if we required the network to output 0 and it output a 1, then Error = -1
Learning algorithm. Target (training) value, T: when we are training a network we present it not only with the input but also with the value that we require the network to produce. For example, if we present the network with [1,1] for the AND function, the target value will be 1. Output, O: the output value from the neuron. Ij: inputs being presented to the neuron. Wj: weight from input neuron (Ij) to the output neuron. LR: the learning rate; this dictates how quickly the network converges. It is set by experimentation, typically 0.1.
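Putting the algorithm and the definitions together, a minimal sketch of the learning loop for the AND function (the bias is handled as a constant -1 input whose weight plays the role of the threshold; the seed and initial weight range are illustrative):

```python
import random

random.seed(0)

LR = 0.1                                    # learning rate, typically 0.1
# Each pattern: inputs [A, B, bias] and target T; the bias input is a constant -1.
epoch = [([0, 0, -1], 0), ([0, 1, -1], 0),
         ([1, 0, -1], 0), ([1, 1, -1], 1)]
w = [random.uniform(-0.5, 0.5) for _ in range(3)]  # random initial weights

def output(inputs, weights):
    """Threshold unit: fire if the weighted sum is positive."""
    return 1 if sum(wi * xi for wi, xi in zip(weights, inputs)) > 0 else 0

# While the epoch produces an error, present each pattern and apply
# Wj = Wj + LR * Ij * Error, where Error = T - O.
while any(t - output(x, w) != 0 for x, t in epoch):
    for x, t in epoch:
        error = t - output(x, w)
        if error != 0:
            w = [wj + LR * ij * error for wj, ij in zip(w, x)]

print([output(x, w) for x, _ in epoch])  # [0, 0, 0, 1]: the AND function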
Supervised Learning (Back-propagation). Desired output of the training examples. Error = difference between actual & desired output. Change each weight relative to the error size. Calculate the output-layer error, then propagate it back to the previous layer. Improved performance; very common! Back-propagation has proven highly successful in training multilayered neural nets. The network is not just given reinforcement for how it is doing on a task: information about errors is also filtered back through the system and used to adjust the connections between the layers, thus improving performance. A form of supervised learning.
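A sketch of the back-propagation procedure described above for a single hidden layer of sigmoid units, trained on XOR; the number of hidden units, learning rate, epoch count, and seed are illustrative choices, not values from the slides:

```python
import math
import random

random.seed(1)

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

# XOR training data: (inputs, desired output)
data = [([0, 0], 0), ([0, 1], 1), ([1, 0], 1), ([1, 1], 0)]
H, LR = 8, 0.5  # hidden units and learning rate

# w_h[j] = [w1, w2, bias] for hidden unit j; w_o = hidden->output weights + bias
w_h = [[random.uniform(-1, 1) for _ in range(3)] for _ in range(H)]
w_o = [random.uniform(-1, 1) for _ in range(H + 1)]

def forward(x):
    h = [sigmoid(w[0] * x[0] + w[1] * x[1] + w[2]) for w in w_h]
    o = sigmoid(sum(w_o[j] * h[j] for j in range(H)) + w_o[H])
    return h, o

for _ in range(20000):
    for x, t in data:
        h, o = forward(x)
        # error at the output layer...
        d_o = (t - o) * o * (1 - o)
        # ...propagated back to the hidden layer
        d_h = [d_o * w_o[j] * h[j] * (1 - h[j]) for j in range(H)]
        # weight changes are proportional to the error (delta) and the input
        for j in range(H):
            w_o[j] += LR * d_o * h[j]
            w_h[j][0] += LR * d_h[j] * x[0]
            w_h[j][1] += LR * d_h[j] * x[1]
            w_h[j][2] += LR * d_h[j]
        w_o[H] += LR * d_o

print([round(forward(x)[1]) for x, _ in data])
```

Note how the hidden-layer delta reuses the output-layer delta weighted by the connection strength: this is the "propagate back" step, and it is what lets the network learn XOR, which a single perceptron cannot represent.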
Supervised Learning. Training and test data sets. Training set: inputs & targets.
Decision boundaries. In simple cases, divide the feature space by drawing a hyperplane across it, known as a decision boundary (a straight line in two dimensions). Discriminant function: returns different values on opposite sides of the boundary. Problems which can be thus classified are linearly separable.
Decision Surface of a Perceptron. [Figures: a linearly separable set of +/- points in the (x1, x2) plane, and a non-linearly separable one.] A perceptron is able to represent some useful functions: for AND(x1, x2), choose weights w0 = -1.5, w1 = 1, w2 = 1. But functions that are not linearly separable (e.g. XOR) are not representable.
Linear Separability. [Figure: a linear decision boundary in the (X1, X2) plane separating class A points from class B points.]
Rugby players & Ballet dancers. [Figure: height (1-2 m) plotted against weight (50-120 kg), with a decision boundary separating the 'Ballet?' region from the rugby players.]
Different Non-Linearly Separable Problems. Types of decision regions by network structure, illustrated on the exclusive-OR problem and on classes with meshed regions:
Single-layer: half plane bounded by a hyperplane.
Two-layer: convex open or closed regions.
Three-layer: arbitrary region shapes (complexity limited by the number of nodes).
Multilayer Perceptron (MLP). [Figure: input signals (external stimuli) enter the input layer, pass through layers of adjustable weights, and emerge as output values from the output layer.]
Multilayer Perceptron (MLP). Adaptive interaction between individual neurons. Power: collective behavior of interconnected neurons. The hidden layer learns to recode (or to provide a representation of) the inputs: associative mapping.

Biologically, neural networks are constructed in a three-dimensional way from microscopic components. These neurons seem capable of nearly unrestricted interconnections. This is not true in any man-made network. Artificial neural networks are simple clusterings of primitive artificial neurons. This clustering occurs by creating layers, which are then connected to one another. How these layers connect may also vary. Basically, all artificial neural networks have a similar topology. Some of the neurons interface with the real world to receive inputs, and other neurons provide the real world with the network's outputs. All the rest of the neurons are hidden from view.

As the figure above shows, the neurons are grouped into layers. The input layer consists of neurons that receive input from the external environment. The output layer consists of neurons that communicate the output of the system to the user or external environment. There are usually a number of hidden layers between these two layers; the figure above shows a simple structure with only one hidden layer. When the input layer receives the input, its neurons produce output, which becomes input to the other layers of the system. The process continues until a certain condition is satisfied or until the output layer is invoked and fires its output to the external environment.

Inter-layer connections. There are different types of connections used between layers; these are called inter-layer connections. Fully connected: each neuron on the first layer is connected to every neuron on the second layer. Partially connected: a neuron of the first layer does not have to be connected to all neurons on the second layer.
Feed forward: the neurons on the first layer send their output to the neurons on the second layer, but they do not receive any input back from the neurons on the second layer. Bi-directional: there is another set of connections carrying the output of the neurons of the second layer into the neurons of the first layer. Feed-forward and bi-directional connections can each be fully or partially connected. Hierarchical: if a neural network has a hierarchical structure, the neurons of a lower layer may only communicate with neurons in the next layer up. Resonance: the layers have bi-directional connections, and they can continue sending messages across the connections a number of times until a certain condition is achieved.
Types of Layers. The input layer: introduces input values into the network; no activation function or other processing. The hidden layer(s): perform classification of features; two hidden layers are sufficient to solve any problem; features imply more layers may be better. The output layer: functionally just like the hidden layers; outputs are passed on to the world outside the neural network.
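The layer types can be sketched as follows: the input layer does no processing, while hidden and output layers are functionally identical weighted sums passed through an activation function (all sizes, weights, and biases here are made up for illustration):

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def layer(inputs, weights, biases):
    """One fully connected layer: weights[j] holds unit j's input weights."""
    return [sigmoid(sum(w * x for w, x in zip(ws, inputs)) + b)
            for ws, b in zip(weights, biases)]

x = [0.5, 0.2]                                        # input layer: values pass straight in
h = layer(x, [[1.0, -1.0], [0.5, 0.5]], [0.0, -0.1])  # hidden layer: 2 units
y = layer(h, [[1.0, 1.0]], [-0.5])                    # output layer: same mechanics, 1 unit
```

The only difference between the hidden and output calls is where their results go: hidden activations feed the next layer, while `y` is passed to the world outside the network.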
Where are ANN used? Recognizing and matching complicated, vague, or incomplete patterns; unreliable data; problems with noisy data. Prediction; classification; data association; data conceptualization; filtering; planning.

Where are Neural Networks being used? Neural networks are performing successfully where other methods do not: recognizing and matching complicated, vague, or incomplete patterns. Neural networks have been applied in solving a wide variety of problems. Basically, most applications of neural networks fall into the following five categories. Prediction: uses input values to predict some output, e.g. pick the best stocks in the market, predict weather, identify people with cancer risk. Classification: use input values to determine the classification, e.g. is the input the letter A; is the blob in the video data a plane, and what kind of plane is it. Data association: like classification, but it also recognizes data that contains errors, e.g. not only identify the characters that were scanned but identify when the scanner is not working properly. Data conceptualization: analyze the inputs so that grouping relationships can be inferred, e.g. extract from a database the names of those most likely to buy a particular product. Data filtering: smooth an input signal, e.g. take the noise out of a telephone signal.
Strengths of a Neural Network. Power: model complex functions; nonlinearity is built into the network. Ease of use: learn by example; very little user domain-specific expertise needed. Intuitively appealing: based on a model of biology; will it lead to genuinely intelligent computers/robots? Neural networks cannot do anything that cannot be done using traditional computing techniques, BUT they can do some things which would otherwise be very difficult.

Power. Neural networks are very sophisticated modeling techniques capable of modeling extremely complex functions. In particular, neural networks are nonlinear (a term which is discussed in more detail later in this section). For many years linear modeling has been the commonly used technique in most modeling domains since linear models have well-known optimization strategies. Where the linear approximation was not valid (which was frequently the case) the models suffered accordingly. Neural networks also keep in check the curse of dimensionality problem that bedevils attempts to model nonlinear functions with large numbers of variables.

Ease of use. Neural networks learn by example. The neural network user gathers representative data, and then invokes training algorithms to automatically learn the structure of the data. Although the user does need to have some heuristic knowledge of how to select and prepare data, how to select an appropriate neural network, and how to interpret the results, the level of user knowledge needed to successfully apply neural networks is much lower than would be the case using (for example) some more traditional nonlinear statistical methods.

Neural networks are also intuitively appealing, based as they are on a crude low-level model of biological neural systems. In the future, the development of this neurobiological modeling may lead to genuinely intelligent computers.
General Advantages. Advantages: adapt to unknown situations; robustness (fault tolerance due to network redundancy); autonomous learning and generalisation. Disadvantages: not exact; large complexity of the network structure.
Applications. Prediction: learning from past experience (pick the best stocks in the market, predict weather, identify people with cancer risk). Classification: image processing, predicting bankruptcy for credit card companies, risk assessment.

Prediction. The most common use for neural networks is to project what will most likely happen. There are many areas where prediction can help in setting priorities. For example, the emergency room at a hospital can be a hectic place; knowing who needs the most critical help can enable a more successful operation. Basically, all organizations must establish priorities, which govern the allocation of their resources. Neural networks have been used as a mechanism of knowledge acquisition for expert systems in stock market forecasting with astonishingly accurate results. Neural networks have also been used for bankruptcy prediction for credit card institutions.

Classification. Although one may apply neural network systems for interpretation, prediction, diagnosis, planning, monitoring, debugging, repair, instruction, and control, the most successful applications of neural networks are in categorization and pattern recognition. Such a system classifies the object under investigation (e.g. an illness, a pattern, a picture, a chemical compound, a word, the financial profile of a customer) as one of numerous possible categories that, in turn, may trigger the recommendation of an action (such as a treatment plan or a financial plan). A company called Nestor has used neural networks for financial risk assessment for mortgage insurance decisions, categorizing the risk of loans as good or bad. Neural networks have also been applied to convert text to speech; NETtalk is one of the systems developed for this purpose. Image processing and pattern recognition form an important area of neural networks, probably one of the most actively researched areas of neural networks.
Applications. Recognition. Pattern recognition: SNOOPE (bomb detector in U.S. airports). Character recognition. Handwriting: processing checks. Data association: not only identify the characters that were scanned but identify when the scanner is not working properly.

Another area of research for the application of neural networks is character recognition and handwriting recognition. This area has uses in banking, credit card processing, and other financial services, where reading and correctly recognizing handwriting on documents is of crucial significance. The pattern recognition capability of neural networks has been used to read handwriting in processing checks, where the amount must normally be entered into the system by a human. A system that could automate this task would expedite check processing and reduce errors. One such system has been developed by HNC (Hecht-Nielsen Co.) for BankTec. One of the best-known applications is the bomb detector installed in some U.S. airports. This device, called SNOOPE, determines the presence of certain compounds from the chemical configurations of their components. In the proceedings of an International Joint Conference, one can find reports on using neural networks in areas ranging from robotics, speech, signal processing, vision, and character recognition to musical composition, detection of heart malfunction and epilepsy, fish detection and classification, optimization, and scheduling. One should bear in mind that most of the reported applications are still in the research stage.
Applications. Data Conceptualization: infer grouping relationships, e.g. extract from a database the names of those most likely to buy a particular product. Data Filtering: e.g. take the noise out of a telephone signal, signal smoothing. Planning: unknown environments; sensor data is noisy; a fairly new approach to planning.
Neural network for OCR: a feedforward network trained using back-propagation.
OCR for 8x10 characters. NNs are able to generalise. Learning involves generating a partitioning of the input space. For a single-layer network the input space must be linearly separable. What is the dimension of this input space? How many points are in the input space? This network is binary (uses binary values); networks may also be continuous.
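For a binary 8x10 character grid, the slide's questions have concrete answers, sketched below (assuming one input per pixel):

```python
# Each pixel of the 8x10 grid is one dimension of the input space,
# and a binary pixel takes two values, so the counts follow directly.
rows, cols = 10, 8
dimension = rows * cols      # dimension of the input space
points = 2 ** dimension      # number of points in a binary input space
print(dimension)  # 80
```

So the input space has 80 dimensions and 2^80 possible binary patterns, which is why the network must generalise rather than memorise every input.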
Engine management The behaviour of a car engine is influenced by a large number of parameters temperature at various points fuel/air mixture lubricant viscosity. Major companies have used neural networks to dynamically tune an engine depending on current settings.
ALVINN. Drives at 70 mph on a public highway. 30x32 pixels as inputs; 30x32 weights into each of 4 hidden units; 4 hidden units into 30 outputs for steering.
Signature recognition Each person's signature is different. There are structural similarities which are difficult to quantify. One company has manufactured a machine which recognizes signatures to within a high level of accuracy. Considers speed in addition to gross shape. Makes forgery even more difficult.
Sonar target recognition Distinguish mines from rocks on sea-bed The neural network is provided with a large number of parameters which are extracted from the sonar signal. The training set consists of sets of signals from rocks and mines.
Recognition
Stock market prediction “Technical trading” refers to trading based solely on known statistical parameters; e.g. previous price Neural networks have been used to attempt to predict changes in prices. Difficult to assess success since companies using these techniques are reluctant to disclose information.
Mortgage assessment Assess risk of lending to an individual. Difficult to decide on marginal cases. Neural networks have been trained to make decisions, based upon the opinions of expert underwriters. Neural network produced a 12% reduction in delinquencies compared with human experts.
Neural Network Problems. Many parameters to be set; overfitting; long training times; ...
Parameter setting. Number of layers. Number of neurons (too many neurons require more training time). Learning rate (from experience, the value should be small, ~0.1). Momentum term. ...
Over-fitting. With sufficient nodes the network can classify any training set exactly, but it may have poor generalisation ability. Cross-validation with some patterns: typically 30% of the training patterns form a validation set; the validation-set error is checked each epoch; stop training if the validation error goes up.
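The early-stopping scheme above can be sketched as follows (the training and validation callables are placeholders, not a real network):

```python
def train_with_early_stopping(train_epoch, validation_error, max_epochs=1000):
    """Train epoch by epoch; stop as soon as validation error rises."""
    best_err = float('inf')
    for epoch in range(max_epochs):
        train_epoch()                # one pass over the training patterns
        err = validation_error()     # check the held-out (~30%) patterns
        if err > best_err:           # validation error went up: stop
            return epoch
        best_err = err
    return max_epochs

# Toy usage: a fake validation error that improves, then worsens.
errors = iter([0.9, 0.6, 0.4, 0.45, 0.5])
stopped = train_with_early_stopping(lambda: None, lambda: next(errors))
print(stopped)  # 3: training stops when the error rises from 0.4 to 0.45
```

In practice one would also keep a copy of the weights from the best-validation epoch, since the final weights belong to the epoch where the error had already started rising.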
Training time How many epochs of training? Stop if the error fails to improve (has reached a minimum) Stop if the rate of improvement drops below a certain level Stop if the error reaches an acceptable level Stop when a certain number of epochs have passed