1
Neural Networks
COMP3710 Artificial Intelligence, Computing Science, Thompson Rivers University
2
Course Outline
Part I – Introduction to Artificial Intelligence
Part II – Classical Artificial Intelligence
Part III – Machine Learning
    Introduction to Machine Learning
    Neural Networks
    Probabilistic Reasoning and Bayesian Belief Networks
    Artificial Life: Learning through Emergent Behavior
Part IV – Advanced Topics
3
A new sort of computer
What are (everyday) computer systems good at, and not so good at? How about humans?
Good at: rule-based systems – doing what the programmer wants them to do.
Not so good at: dealing with noisy data, dealing with unknown environment data, massive parallelism, fault tolerance, adapting to circumstances.
4
References
Artificial Intelligence Illuminated, Ben Coppin, Jones and Bartlett Illuminated Series
Many tutorials from the Internet
Neural Networks with Java
Joone – Java Object Oriented Neural Engine
…
5
Learning Outcomes
Train a perceptron with a training data set.
Update the weights in a feed-forward network with error backpropagation.
Update the weights in a feed-forward network with error backpropagation and the delta rule.
Implement a feed-forward network for a simple pattern recognition application.
List two examples of recurrent networks.
List applications of Hopfield networks, Bidirectional Associative Memories, and Kohonen Maps.
Update the weights in a Hebbian network.
6
Chapter Contents
Biological Neurons
How to model biological neurons – artificial neurons
The first neural network – Perceptrons
How to overcome the problems in Perceptron networks – Multilayer Neural Networks: feed-forward, backpropagation, backpropagation with the delta rule
Can an ANN remember? – Recurrent Networks: Hopfield Networks, Bidirectional Associative Memories
How to learn without using a training data set – Kohonen Maps, Hebbian Learning
Fuzzy Neural Networks
Evolving Neural Networks
7
1. Biological Neurons
The human brain is made up of about 100 billion simple processing units – neurons. -> massive parallelism; emergent behavior; ...
Inputs are received on dendrites, and if the input level is over a threshold, the neuron fires, passing a signal through the axon to a synapse, which then connects to another neuron.
The human brain has a property known as plasticity, which means that neurons can change the nature and number of their connections to other neurons in response to events that occur. In this way, the brain is able to learn.
[Q] How to model this?
8
2. Artificial Neurons
Artificial neurons are based on biological neurons. McCulloch and Pitts (1943)
Each neuron in the network receives one or more inputs, which are summed.
An activation function is applied to this sum, which determines the output of the neuron – the activation level.
Weights are associated with the synapses (connections).
9
Three typical activation functions.
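The figure from the slide is not reproduced here. As a rough sketch, three functions commonly used in this role are the step, sigmoid, and hyperbolic tangent functions (which three the slide actually shows is an assumption):

    import math

    def step(x, t=0.0):
        # Fires (outputs 1) only when the input exceeds the threshold t.
        return 1 if x > t else 0

    def sigmoid(x):
        # Smooth, differentiable squashing function, used later for backpropagation.
        return 1.0 / (1.0 + math.exp(-x))

    def tanh(x):
        # Like the sigmoid, but its output lies in (-1, 1).
        return math.tanh(x)

    for x in (-2.0, 0.0, 2.0):
        print(x, step(x), round(sigmoid(x), 3), round(tanh(x), 3))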
10
E.g., a typical activation function, called the step function, works as follows:
Each previous node i has a weight, wi, associated with its connection and supplies an input xi; t is the threshold. The neuron outputs Y = 1 if w1x1 + w2x2 + … + wnxn > t, and Y = 0 otherwise.
So if the weighted sum of the inputs to the neuron is above the threshold, then the neuron fires.
11
There is no central processing or control mechanism – the entire network is involved in every piece of computation that takes place.
The processing time of each artificial neuron is small, unlike a real biological neuron. -> parallelism of a massive number of neurons
The weight associated with each connection (equivalent to a synapse in the biological brain) can be changed in response to particular sets of inputs and events. In this way, an artificial neural network is able to learn.
[Q] How and when to change the weights?
12
3. Perceptrons
A perceptron is a single neuron that classifies a set of inputs into one of two categories (usually 1 or -1). Rosenblatt (1958)
13
If the inputs are in the form of a grid, a perceptron can be used to recognize visual images of shapes. [Q] How? An image can be converted to a 1-D array, and the values are fed into the input nodes. The perceptron usually uses a step function, which returns 1 if the weighted sum of inputs exceeds a threshold, and 0 otherwise.
14
[Q] What will be trained?
The perceptron is trained as follows:
First, random weights (usually between –0.5 and 0.5) are assigned.
An item of training data is presented, and the output classification is observed.
If the perceptron misclassifies it, each weight is modified: wi ← wi + (a × e × xi), where a is the learning rate (between 0 and 1) and e is the size of the error, e = expected_value – output.
Training continues until all errors become zero, or after some number of training epochs. Then the weights are not changed any more.
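A minimal sketch of this training loop, assuming the step activation and the update rule above (names and data layout are illustrative):

    import random

    def step(x, t=0.0):
        return 1 if x > t else 0

    def train_perceptron(data, a=0.2, t=0.0, epochs=100):
        # data: list of ((x1, ..., xn), expected) pairs with expected in {0, 1}
        n = len(data[0][0])
        w = [random.uniform(-0.5, 0.5) for _ in range(n)]    # random initial weights
        for _ in range(epochs):
            all_correct = True
            for xs, expected in data:
                y = step(sum(wi * xi for wi, xi in zip(w, xs)), t)
                e = expected - y                              # size of the error
                if e != 0:
                    all_correct = False
                    w = [wi + a * e * xi for wi, xi in zip(w, xs)]
            if all_correct:                                   # stop once every item is classified correctly
                break
        return w

    # Logical OR example from the slides (t = 0, a = 0.2)
    or_data = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]
    print(train_perceptron(or_data))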
15
Example of logical OR for two inputs, with t = 0 and a = 0.2:
Random initial weights: w1 = –0.2, w2 = 0.4.
For the first training data item (0, 0), the expected output is 0 (= 0 OR 0). The output is correct, so the weights are not changed: Δw1 = a × 0 × 0 = 0; Δw2 = a × 0 × 0 = 0.
Once training is complete, the system is used for queries.
16
For the next training data item, the expected value is compared with the actual output, and the weights are updated if they differ.
17
Training data set (epoch-by-epoch trace):
Epoch | x1 | x2 | Expected Y | Actual Y | Error | w1 | w2
(initial weights)                                  –0.2   0.4
1     |  0 |  0 |     0      |    0     |   0   | –0.2 | 0.4
1     |  0 |  1 |     1      |    1     |   0   | –0.2 | 0.4
1     |  1 |  0 |     1      |    0     |   1   |  0.0 | 0.4
1     |  1 |  1 |     1      |    1     |   0   |  0.0 | 0.4
2     |  1 |  0 |     1      |    0     |   1   |  0.2 | 0.4
3     | all four inputs classified correctly – training stops with w1 = 0.2, w2 = 0.4
18
[Q] Can you test the previous system with some noisy inputs?
(0.1, 1) -> 0.2 × 0.1 + 0.4 × 1 = 0.42 > 0 -> 1
(0.1, 0.3) -> …
(0, -0.2) -> …
(0.8, 0.6) -> …
[Q] Can you develop a system for the Boolean AND operation?
[Q] What is learning in Perceptrons?
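A small follow-up sketch, assuming the weights w1 = 0.2, w2 = 0.4 reached in the worked OR example above, for trying the noisy queries:

    def classify(xs, w=(0.2, 0.4), t=0.0):
        # Weighted sum followed by the step activation used in the OR example.
        total = sum(wi * xi for wi, xi in zip(w, xs))
        return 1 if total > t else 0

    for query in [(0.1, 1), (0.1, 0.3), (0, -0.2), (0.8, 0.6)]:
        print(query, "->", classify(query))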
19
Perceptrons can only classify linearly separable functions (e.g., AND and OR).
The first of the following graphs shows a linearly separable function (OR). The second is not linearly separable (Exclusive-OR).
Demo – Perceptron Learning Applet
[Q] What problems do Perceptron systems have? How can they be improved?
20
4. Multilayer Neural Networks
Multilayer neural networks can classify a range of functions, including ones that are not linearly separable.
Each input-layer neuron connects to all neurons in the hidden layer, and the neurons in the hidden layer connect to all neurons in the output layer (weights Weight_IH and Weight_HO; layer outputs Output_I, Output_H, Output_O, each of the form Y(Inputᵀ × Weight)).
This is a feed-forward network – not cellular automata; the neurons work synchronously.
[Q] How to train it?
21
Demo – XOR, AND, OR
XOR can be written: y = x1 ⊕ x2 = (x1 ∨ x2) ∧ ¬(x1 ∧ x2)
Split:
y1 = x1 ∨ x2
y2 = ¬(x1 ∧ x2)
y = y1 ∧ y2
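As an illustration of this decomposition, here is a sketch of a two-layer network with hand-picked (not learned) weights and thresholds, one hidden node for each of y1 and y2:

    def step(x, t):
        return 1 if x > t else 0

    def xor_net(x1, x2):
        # Hidden node 1 computes y1 = x1 OR x2.
        y1 = step(1 * x1 + 1 * x2, 0.5)
        # Hidden node 2 computes y2 = NOT (x1 AND x2).
        y2 = step(-1 * x1 + -1 * x2, -1.5)
        # Output node computes y = y1 AND y2.
        return step(1 * y1 + 1 * y2, 1.5)

    for a in (0, 1):
        for b in (0, 1):
            print(a, b, "->", xor_net(a, b))   # 00->0, 01->1, 10->1, 11->0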
22
Example – Digits Recognition
Example of digits recognition: images of 0, 1, 2, …, 9. An image is of size n × m squares; each square either is or is not used by the digit, encoded as 1 or 0.
The number of neurons in the input layer -> n × m
The number of neurons in the output layer -> 4 (2³ = 8 < 10 ≤ 2⁴ = 16)
Prepare many images of digits with the correct labels 0, 1, …, 9, plus non-digit images.
Train the system with the training data set. All the weights are updated together at once.
23
Example of digits recognition: how to implement it with matrix multiplication?
N: the number of input nodes; M: the number of hidden nodes; K: the number of output nodes
Input: N × 1 matrix
Weight_IH: N × M matrix
Y_H: M × 1 matrix – activation function outputs for the hidden nodes
Weight_HO: M × K matrix
Y_O: K × 1 matrix – activation function outputs for the output nodes
Output: K × 1 matrix
[Q] How to obtain Output?
Output = (Y_O( (Y_H( Inputᵀ × Weight_IH ))ᵀ × Weight_HO ))ᵀ   (matrix multiplication)
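A minimal sketch of this feed-forward computation in NumPy (the sigmoid activation and the toy layer sizes are assumptions; the dimensions follow the slide):

    import numpy as np

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    N, M, K = 6, 4, 2                                 # input, hidden, output node counts (toy sizes)
    rng = np.random.default_rng(0)
    weight_ih = rng.uniform(-0.5, 0.5, (N, M))        # N x M
    weight_ho = rng.uniform(-0.5, 0.5, (M, K))        # M x K
    x = rng.integers(0, 2, (N, 1)).astype(float)      # Input: N x 1 column vector

    # Output = (Y_O((Y_H(Input^T x W_IH))^T x W_HO))^T, as in the slide.
    output_h = sigmoid(x.T @ weight_ih).T             # M x 1
    output_o = sigmoid(output_h.T @ weight_ho).T      # K x 1
    print(output_o.ravel())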
24
Example: Voice Recognition
Task: learn to discriminate between two different voices saying “Hello”. [Q] How to implement this?
Data sources: Steve Simpson, David Raubenheimer
Format: frequency distribution (60 bins); 60 values -> 60 input nodes. The number of output nodes?
25
Network architecture: a feed-forward network
60 input nodes (one for each frequency bin) 6 hidden nodes 2 output nodes (0-1 for “Steve”, 1-0 for “David”) [Q] What is the total number of nodes? What is the total number of links?
26
Presenting the data (untrained network)
Steve: 0.43, 0.26
David: 0.73, 0.55
27
Calculate error
Steve: |0.43 – 0| = 0.43, |0.26 – 1| = 0.74
David: |0.73 – 1| = 0.27, |0.55 – 0| = 0.55
28
Backpropagate the error and adjust the weights
Steve: |0.43 – 0| = 0.43, |0.26 – 1| = 0.74 -> overall error 1.17
David: |0.73 – 1| = 0.27, |0.55 – 0| = 0.55 -> overall error 0.82
29
Repeat the process (sweep) for all training pairs:
Present data
Calculate error
Backpropagate error
Adjust weights
Repeat the process multiple times.
30
Presenting the data (trained network)
Steve: 0.01, 0.99
David: 0.99, 0.01
31
Results – Voice Recognition
Performance of trained network:
Discrimination accuracy between known “Hello”s: 100%
Discrimination accuracy between new “Hello”s: …
32
Results – Voice Recognition (cont'd)
The network has learnt to generalise from the original data.
Networks with different weight settings can have the same functionality.
Trained networks ‘concentrate’ on the lower frequencies.
The network is robust against non-functioning nodes.
33
Backpropagation
Multilayer neural networks learn in the same way as perceptrons, but training takes much longer. [Q] Why? There are many more weights, and it is important to assign credit (or blame) correctly when changing the weights (note that there are multiple output nodes).
Backpropagation networks use the sigmoid activation function rather than a step function, because it is easy to differentiate. (The derivative gives the rate of change of the output.)
Weights are updated layer by layer, starting from the output layer.
34
For node j, Xj is the total input and yj is the output; n is the number of inputs to node j, and θj is the threshold for node j:
Xj = Σi wij xi – θj
yj = 1 / (1 + e^(–Xj))   (sigmoid activation)
After values are fed forward through the network, errors are fed back to modify the weights in order to train the network. For each node, we calculate an error gradient.
35
For each node k in the output layer, the error ek is the difference between the desired output and the actual output. The error gradient for node k is:
δk = yk (1 – yk) ek
Similarly, for each node j in the hidden layer:
δj = yj (1 – yj) Σk wjk δk
Now the weights are updated as follows:
wjk ← wjk + α yj δk   and   wij ← wij + α xi δj
where α is the learning rate (a positive number below 1). This is known as gradient descent.
36
Note that δj should be calculated before the output-layer weights wjk and wjl are updated, so that δj uses the old weights.
37
Example of digits recognition:
How to change the weights?
N: the number of input nodes; M: the number of hidden nodes; K: the number of output nodes
Input: N × 1 matrix
Weight_IH: N × M matrix
Y_H: M × 1 matrix – activation function outputs for the hidden nodes
Weight_HO: M × K matrix
Y_O: K × 1 matrix – activation function outputs for the output nodes
Output: K × 1 matrix
Expected: K × 1 matrix
38
Example of digits recognition:
How to change the weights?
Error = Expected – Output
Delta_O = Output_O × (1 – Output_O) × Error   (element-wise)
Delta_H = Output_H × (1 – Output_H) × (Weight_HO × Delta_O)
(Note: a delta for the input nodes is not necessary.)
Weight_HO[j,k] = Weight_HO[j,k] + α × Output_H[j] × Delta_O[k]
Weight_IH[j,k] = Weight_IH[j,k] + α × Output_I[j] × Delta_H[k]
39
Algorithm for training:
Initialize weights;            // normalized weights
For each epoch,
    For each training image, Image,
        Input = Image;  Output_I = Input;
        Decide Output_H;  Decide Output_O;  Output = Output_O;
        Decide Error;
        Decide Delta_O;  Decide Delta_H;
        Change Weight_HO with Delta_O;
        Change Weight_IH with Delta_H;
40
Algorithm for training:
For each training image, Image,
    Output_I = Input = Image;
    For each node, j, in the hidden layer
        X[j] = sum of (Output_I[i] * Weight_IH[i][j]);
        Output_H[j] = activate(X[j]);
    For each node, j, in the output layer
        X[j] = sum of (Output_H[i] * Weight_HO[i][j]);
        Output_O[j] = activate(X[j]);
    Output = Output_O;
    For each node, j, in the output layer
        Error[j] = Expected[j] – Output[j];
        Delta_O[j] = Output_O[j] * (1 – Output_O[j]) * Error[j];
    For each node, j, in the hidden layer
        D[j] = sum of (Weight_HO[j][k] * Delta_O[k]);
        Delta_H[j] = Output_H[j] * (1 – Output_H[j]) * D[j];
    For each node, j, in the hidden layer
        For each node, k, in the output layer
            Weight_HO[j][k] += Alpha * Output_H[j] * Delta_O[k];
    For each node, j, in the input layer
        For each node, k, in the hidden layer
            Weight_IH[j][k] += Alpha * Output_I[j] * Delta_H[k];
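A compact runnable sketch of this training procedure in NumPy (a single hidden layer with sigmoid activations, one pattern at a time, as described above; the XOR data set is only a stand-in for the digit images):

    import numpy as np

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    def train(data, n_hidden=4, alpha=0.5, epochs=5000, seed=0):
        rng = np.random.default_rng(seed)
        n_in, n_out = data[0][0].size, data[0][1].size
        w_ih = rng.uniform(-0.5, 0.5, (n_in, n_hidden))
        w_ho = rng.uniform(-0.5, 0.5, (n_hidden, n_out))
        for _ in range(epochs):
            for x, expected in data:
                out_h = sigmoid(x @ w_ih)                    # hidden-layer outputs
                out_o = sigmoid(out_h @ w_ho)                # output-layer outputs
                error = expected - out_o
                delta_o = out_o * (1 - out_o) * error
                delta_h = out_h * (1 - out_h) * (w_ho @ delta_o)
                w_ho += alpha * np.outer(out_h, delta_o)     # update output-layer weights
                w_ih += alpha * np.outer(x, delta_h)         # update input-layer weights
        return w_ih, w_ho

    # Toy data: XOR, standing in for a real image training set.
    data = [(np.array([a, b], float), np.array([a ^ b], float))
            for a in (0, 1) for b in (0, 1)]
    w_ih, w_ho = train(data)
    for x, expected in data:
        y = sigmoid(sigmoid(x @ w_ih) @ w_ho)
        print(x, expected, np.round(y, 2))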
41
Algorithm for query:
Input = Query_image;  Output_I = Input;
Decide Output_H;  Decide Output_O;  Output = Output_O;
Classify using Output;
42
Let’s try this example, which recognizes digits.
Backpropagation is not likely to occur in the human brain.
Learning is still slow: with some simple problems it can take hundreds or even thousands of epochs to reach a satisfactorily low level of error. [Q] Why? The weights are changed too easily. Can we use something like the idea of simulated annealing?
[Q] How to improve it?
43
Backpropagation with Delta Rule
Generalized delta rule:
Inclusion of momentum – the extent to which a weight was changed on the previous iteration.
Hyperbolic tangent instead of the sigmoid activation function.
…
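A sketch of how the momentum term is typically added to the weight update (the momentum coefficient β and the variable names are assumptions; the slide does not give them):

    # Weight update with momentum: a fraction of the previous change is re-applied,
    # which smooths the descent and can speed up learning.
    def update_with_momentum(w, prev_dw, alpha, beta, y, delta):
        dw = alpha * y * delta + beta * prev_dw   # current change plus a fraction of the last one
        return w + dw, dw                          # new weight, and the change kept for the next step

    w, dw = 0.1, 0.0
    for y, delta in [(0.8, 0.2), (0.7, 0.1), (0.9, -0.05)]:
        w, dw = update_with_momentum(w, dw, alpha=0.5, beta=0.9, y=y, delta=delta)
        print(round(w, 4))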
44
Backpropagation [with Delta Rule]
How to implement it?
Advantages: it works, and it is relatively fast.
Downsides: it requires a training set (is that really a disadvantage?); training can be slow; it is probably not biologically realistic.
Alternatives to backpropagation:
Hebbian learning – not successful in feed-forward nets
Reinforcement learning – only limited success
Artificial evolution – more general, but can be even slower than backpropagation
45
Example: Digits Recognition
Let’s try this example, which recognizes digits.
[Q] The number of nodes in the input layer?
[Q] The output layer? (0, 1) -> ‘1’, (1, 0) -> ‘2’
46
Applications of Feed-forward nets
Pattern recognition
    Character recognition
    Face recognition
Sonar mine/rock recognition (Gorman & Sejnowski, 1988)
Navigation of a car (Pomerleau, 1989)
Stock-market prediction
Pronunciation (NETtalk) (Sejnowski & Rosenberg, 1987)
47
Project idea: backpropagation with the delta rule for alphabets and exceptional cases.
48
5. Recurrent Networks
Feed-forward networks do not have memory. [Q] What does this mean? They are acyclic: once a feed-forward network is trained, its state is fixed and does not alter as new input data is presented to it.
Recurrent networks, also called feedback networks, can have arbitrary connections between nodes in any layer, even backward from output nodes to input nodes. The internal state can alter as sets of input data are presented – a memory (such feedback nodes are also called memory units). [Q] What does this mean?
49
Biological nervous systems show high levels of recurrency (but feed-forward structures exist too).
Recurrent networks can be used to solve problems where the solution depends on previous inputs as well as current inputs (e.g., predicting stock market movements).
Inputs are fed through the network, including feeding data back from outputs to inputs, and this process is repeated until the values of the outputs do not change – a state of equilibrium or stability. The stable values of the network are called fundamental memories.
However, it is not always the case that a recurrent network reaches a stable state. Recurrent networks are also called attractor networks.
50
Examples of recurrent networks
Hopfield networks
BAM (Bidirectional Associative Memory)
51
Hopfield Networks
A Hopfield network is a recurrent network that uses the sign activation function: the output is +1 if the weighted sum of inputs is greater than 0, and –1 if it is less than 0.
If a neuron receives a 0 as its weighted sum, it does not change state – in other words, it continues to output its previous value.
Weights are usually represented as matrices. They look like…
52
Three states (patterns) to learn: [Q] How many input values?
The learning (storage) step builds the weight matrix from the training patterns; in the usual Hopfield formulation, W = X1·X1ᵀ + X2·X2ᵀ + X3·X3ᵀ – 3I, so that the diagonal is zero.
The three states will then be the stable states (fundamental memories) for the network.
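A small sketch of this storage step and the iterative recall procedure, assuming bipolar (±1) patterns (the patterns below are illustrative, not the slide’s own example):

    import numpy as np

    def train_hopfield(patterns):
        # Storage step: W = sum of outer products, with a zero diagonal.
        n = patterns[0].size
        w = np.zeros((n, n))
        for p in patterns:
            w += np.outer(p, p)
        np.fill_diagonal(w, 0)
        return w

    def recall(w, x, steps=10):
        # Repeatedly apply the sign activation until the state stops changing.
        y = x.copy()
        for _ in range(steps):
            s = w @ y
            new_y = np.where(s > 0, 1, np.where(s < 0, -1, y))   # 0 input: keep previous value
            if np.array_equal(new_y, y):
                break
            y = new_y
        return y

    patterns = [np.array([1, -1, 1, -1, 1]), np.array([-1, -1, 1, 1, 1])]
    w = train_hopfield(patterns)
    noisy = np.array([1, 1, 1, -1, 1])       # first pattern with one element flipped
    print(recall(w, noisy))                   # ideally converges to a stored pattern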
53
Worked example (figure): computing the output vector.
54
Worked example (figure): [Q] Let’s try a query – then, what is the result, Y4? [Q] Let’s try another query; the network is applied, then applied again, until the output no longer changes. [Q] What does this mean?
55
Three steps in using a Hopfield network:
1. Training the weights with the attractor states as inputs – a storage or memorization stage.
2. Testing with the training data set.
3. Using the network as a memory to retrieve data.
The network is trained to represent a set of attractors, or stable states. Any input will usually be mapped to the output state that is the attractor closest to the input. The measure of distance is the Hamming distance, which counts the number of elements in which the vectors differ. Hence, the Hopfield network is a memory that usually maps an input vector to the memorized vector whose Hamming distance from the input vector is least. In fact, it does not always converge to the state closest to the original input.
56
[Q] Applications of Hopfield networks?
Pattern recognition. [Q] How? Demo – pattern recognition. [Q] How would you implement the above demo? [Q] How is pattern recognition using Hopfield networks different from pattern recognition using a backpropagation network?
57
Project idea: a Hopfield network to recognize alphabets.
58
A Hopfield network is autoassociative – it can only associate an item with itself or a similar one.
However, the human brain is fully associative, or heteroassociative, which means one item is able to cause the brain to recall an entirely different item. E.g., fall makes me think of autumn colours.
[Q] How can Hopfield networks be improved?
59
Bidirectional Associative Memories
A BAM (Bidirectional Associative Memory) is a heteroassociative memory: like the brain, it can learn to associate one item from one set with another, completely unrelated item in another set.
It is similar in structure to the Hopfield network: the network consists of two fully connected layers of nodes. Every node in one layer is connected to every node in the other layer, but not to any node in the same layer. (In a Hopfield network, there is only one layer and all nodes are interconnected.)
The BAM is guaranteed to produce a stable output for any given inputs and for any training data.
60
The BAM uses a sign activation function.
Two sets of data are to be learned, so that when an item from set X is presented to the network, it will recall the corresponding item from set Y.
The weight matrix is built from the paired training vectors; in the standard (Kosko) formulation, W = Σi Xiᵀ·Yi, the sum of the outer products of each associated pair (with Xi and Yi as row vectors).
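A minimal sketch of this construction and of bidirectional recall, assuming bipolar row-vector pairs (the values are illustrative):

    import numpy as np

    def sign(v, prev):
        # Sign activation; a 0 keeps the previous value, as in the Hopfield network.
        return np.where(v > 0, 1, np.where(v < 0, -1, prev))

    # Associated pairs: presenting an X pattern should recall its Y partner, and vice versa.
    X = np.array([[1, -1, 1, -1], [-1, -1, 1, 1]])
    Y = np.array([[1, 1, -1], [-1, 1, 1]])

    W = sum(np.outer(x, y) for x, y in zip(X, Y))     # W = sum of X_i^T Y_i

    x = X[0]
    y = sign(x @ W, np.ones(Y.shape[1]))              # forward pass: recall the Y partner
    x_back = sign(y @ W.T, x)                         # backward pass: recall the X partner
    print(y, x_back)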
61
[Q] Can you prove it? Neural Network
62
[Q] Applications of BAM ?
Pattern recognition. Demo – pattern recognition.
63
Project idea: a BAM to recognize alphabets.
64
6. Kohonen Maps Also called SOM (Self-Organizing Feature Map)
An unsupervised learning system: there is no labelled training data. It finds the natural structure inherent in the input data.
The objective of a Kohonen network is to map input vectors (patterns) of arbitrary dimension N onto a discrete map with 1, 2 or 3 dimensions.
Patterns close to one another in the input space should be close to one another in the map: they should be topologically ordered.
65
Two layers of nodes: an input layer and a cluster (output) layer.
Uses competitive learning (winner-take-all):
N – the number of input nodes – equals the size of the input vectors.
M – the number of cluster nodes – equals the number of clusters we would like to represent.
Every input node is connected to every node in the cluster layer.
Every input is compared with the weight vector of each node in the cluster layer; the node that most closely matches the input fires. That node is called the winner. This is the clustering of the input. Euclidean distance is used.
The winning node has its weight vector modified to be closer to the input vector.
66
Learning process:
initialize the weights for each cluster unit
loop until weight changes are negligible
    for each input pattern
        present the input pattern
        find the winning cluster unit (i.e., the one most similar to the input pattern)
        find all units in the neighborhood of the winner
        update the weight vectors for all those units
    reduce the size of neighborhoods if required
67
An input x is chosen at random from the training data set. For each cluster node j, the distance dj = Σi (xi – wij)² is computed; the cluster node for which dj is smallest is the winner.
The weights of the winner are updated as wij ← wij + α(xi – wij), so the weights become more similar to the input, i.e., the winner becomes more similar to the input. The learning rate α decreases over time.
In fact, a neighborhood of neurons around the winner is usually updated together; the radius defining the neighborhood decreases over time.
The training phase terminates when the modification of weights becomes very small.
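A simplified runnable sketch of this procedure (a 1-D neighborhood with linearly decaying learning rate and radius; the random 3-D points echo the project idea later in the deck, and all parameter values are assumptions):

    import numpy as np

    def train_som(data, n_clusters=10, epochs=50, alpha0=0.5, radius0=3):
        rng = np.random.default_rng(1)
        w = rng.uniform(data.min(), data.max(), (n_clusters, data.shape[1]))
        for t in range(epochs):
            alpha = alpha0 * (1 - t / epochs)              # learning rate decreases over time
            radius = int(round(radius0 * (1 - t / epochs)))
            for x in rng.permutation(data):
                d = ((w - x) ** 2).sum(axis=1)             # squared Euclidean distance to each cluster node
                winner = int(np.argmin(d))                 # winner-take-all
                lo, hi = max(0, winner - radius), min(n_clusters, winner + radius + 1)
                w[lo:hi] += alpha * (x - w[lo:hi])         # move the winner and its neighbors toward x
        return w

    data = np.random.default_rng(0).uniform(0, 255, (200, 3))   # 200 random (x, y, z) points
    print(np.round(train_som(data), 1))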
68
High-dimensional data mapped to 2D (note that 2D values can be shown as coordinates).
69
It has been shown that while self-organizing maps with a small number of nodes behave in a way that is similar to K-means, larger self-organizing maps rearrange data in a way that is fundamentally topological in character.
Applications? Clustering; dimension reduction; visualization of high-dimensional data.
70
Project idea: an SOM mapping 3D numerical data into 2D, with visualization.
Each data item has (x, y, z), where x, y, z are in [0, 255].
71
7. Hebbian Learning
Hebb’s law: “When an axon of cell A is near enough to excite a cell B and repeatedly or persistently takes part in firing it, some growth process or metabolic change takes place in one or both cells such that A’s efficiency, as one of the cells firing B, is increased.”
Hence, if two neurons that are connected together fire at the same time, the weight of the connection between them is strengthened. Conversely, if the neurons fire at different times, the weight of the connection between them is decreased.
72
The activity product rule is used to modify the weight of the connection between two nodes i and j that fire at the same time:
Δwij = α · xi · yj
where α is the learning rate, xi is the input to node j from node i, and yj is the output of node j.
Hebbian networks usually also use a forgetting factor, which decreases the weight of the connection between the two nodes if they fire at different times.
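A small sketch of a Hebbian update with a forgetting factor (the exact form of the forgetting term is an assumption; a common generalized rule is shown):

    def hebbian_update(w, x_i, y_j, alpha=0.1, phi=0.05):
        # Activity product rule plus a generalized forgetting term:
        # the weight grows when node i's input and node j's output are active together,
        # and decays in proportion to the current weight otherwise.
        return w + alpha * x_i * y_j - phi * y_j * w

    w = 0.2
    for x_i, y_j in [(1, 1), (1, 1), (0, 1), (1, 0)]:
        w = hebbian_update(w, x_i, y_j)
        print(round(w, 4))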
73
More interesting demo: BrainyAliens.exe, params.ini
74
8. Fuzzy Neural Networks
Weights are fuzzy sets.
Takagi-Sugeno-Kang fuzzy rules
75
9. Evolving Neural Networks
Gradient-descent training of neural networks can be susceptible to local optima (getting stuck in local minima of the error function). Evolutionary methods (genetic algorithms) can be used to determine the starting weights for a neural network, thus avoiding these kinds of problems.
76
Other Learning Methods
Clustering: the K-Means, the Fuzzy C-Means
Classification: Naïve Bayes classifier, support vector machine, decision tree
Reinforcement learning
…