Neural Networks Lecture 17: Self-Organizing Maps


About Assignment #3
Two approaches to backpropagation learning:
1. "Per-pattern" learning: Update weights after every exemplar presentation.
2. "Per-epoch" (batch-mode) learning: Update weights after every epoch. During the epoch, compute the sum of the required changes for each weight across all exemplars; after the epoch, update each weight using the respective sum.

About Assignment #3
Per-pattern learning often approaches near-optimal network error quickly, but may then take longer to reach the error minimum. During per-pattern learning, it is important to present the exemplars in random order. Reducing the learning rate between epochs usually leads to better results.

About Assignment #3
Per-epoch learning involves less frequent weight updates, which makes the initial approach to the error minimum rather slow. However, per-epoch learning computes the actual network error and its gradient for each weight so that the network can make more informed decisions about weight updates. Two of the most effective algorithms that exploit this information are Quickprop and Rprop.
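To make the two schedules concrete, here is a minimal Python/NumPy sketch (not from the lecture; the function names and the helper grad(w, x, y), assumed to return the error gradient for a single exemplar, are illustrative):

import numpy as np

def train_per_pattern(weights, exemplars, grad, eta=0.1):
    # Update the weights immediately after each exemplar, in random order.
    for idx in np.random.permutation(len(exemplars)):
        x, y = exemplars[idx]
        weights -= eta * grad(weights, x, y)
    return weights

def train_per_epoch(weights, exemplars, grad, eta=0.1):
    # Accumulate the gradient over the whole epoch, then apply a single update.
    total = np.zeros_like(weights)
    for x, y in exemplars:
        total += grad(weights, x, y)
    weights -= eta * total
    return weights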

The Quickprop Learning Algorithm
The assumption underlying Quickprop is that the network error, as a function of each individual weight, can be approximated by a parabola. Based on this assumption, whenever we find that the gradient for a given weight has switched its sign between successive epochs, we should fit a parabola through these data points and use its minimum as the next weight value.

The Quickprop Learning Algorithm
[Illustration: the assumed parabolic error function E plotted over the weight w, showing the measurements (w(t−1), E(t−1)) with slope E'(t−1) and (w(t), E(t)) with slope E'(t), and the parabola's minimum at the next weight value w(t+1).]

The Quickprop Learning Algorithm
Newton's method for minimizing E along a single weight: w(t+1) = w(t) − E'(t) / E''(t).
Under the parabola assumption, the second derivative is constant and can be estimated from the two most recent gradients:
E''(t) ≈ (E'(t) − E'(t−1)) / (w(t) − w(t−1)) = (E'(t) − E'(t−1)) / Δw(t−1)

The Quickprop Learning Algorithm
For the minimum of E we must have E'(w(t+1)) = 0. Inserting the estimate of E''(t) into Newton's step gives the Quickprop weight update:
Δw(t) = w(t+1) − w(t) = [E'(t) / (E'(t−1) − E'(t))] · Δw(t−1)

The Quickprop Learning Algorithm
Notice that this method cannot be applied if the error gradient has neither decreased in magnitude nor changed its sign at the preceding time step. In that case, we would ascend in the error function or make an infinitely large weight modification. In most cases, Quickprop converges several times faster than standard backpropagation learning.
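A minimal sketch of this update rule for a single weight; the fallback to a plain gradient step and the growth limit mu are illustrative safeguards in the spirit of Fahlman's Quickprop, not the exact formulation from the lecture:

def quickprop_step(w, dw_prev, grad, grad_prev, eta=0.1, mu=1.75):
    # The quadratic step is applicable when the gradient shrank in magnitude
    # or changed its sign since the previous epoch.
    applicable = dw_prev != 0.0 and (abs(grad) < abs(grad_prev) or grad * grad_prev < 0.0)
    denom = grad_prev - grad
    if applicable and denom != 0.0:
        dw = (grad / denom) * dw_prev              # jump to the parabola's minimum
        limit = mu * abs(dw_prev)                  # keep the step from exploding
        dw = max(-limit, min(limit, dw))
    else:
        dw = -eta * grad                           # ordinary gradient-descent step
    return w + dw, dw

# Usage: keep (dw_prev, grad_prev) per weight and call once per epoch
# with the gradient summed over all exemplars.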

Resilient Backpropagation (Rprop)
The Rprop algorithm takes a very different approach to improving backpropagation as compared to Quickprop. Instead of making more use of gradient information for better weight updates, Rprop only uses the sign of the gradient, because its magnitude can be a poor and noisy estimator of the required weight updates. Furthermore, Rprop assumes that different weights need different step sizes for updates, which vary throughout the learning process.

Resilient Backpropagation (Rprop)
The basic idea is that if the error gradient for a given weight wij had the same sign in two consecutive epochs, we increase its step size Δij, because the weight's optimal value may be far away. If, on the other hand, the sign switched, we decrease the step size. Weights are always changed by adding or subtracting the current step size, regardless of the absolute value of the gradient. This way we do not "get stuck" with extreme weights that are hard to change because of the shallow slope of the sigmoid function.

Resilient Backpropagation (Rprop)
Formally, the step size update rules are:
Δij(t) = η+ · Δij(t−1)  if ∂E/∂wij(t−1) · ∂E/∂wij(t) > 0
Δij(t) = η− · Δij(t−1)  if ∂E/∂wij(t−1) · ∂E/∂wij(t) < 0
Δij(t) = Δij(t−1)       otherwise,
with the step sizes clipped to the interval [Δmin, Δmax].
Empirically, the best results were obtained with an initial step size of 0.1, η+ = 1.2, η− = 0.5, Δmax = 50, and Δmin = 10^−6.

Resilient Backpropagation (Rprop)
Weight updates are then performed as follows:
Δwij(t) = −Δij(t)  if ∂E/∂wij(t) > 0
Δwij(t) = +Δij(t)  if ∂E/∂wij(t) < 0
Δwij(t) = 0        otherwise,
that is, each weight moves against the sign of its current gradient by its own step size.
It is important to remember that, as in Quickprop, the gradient in Rprop needs to be computed across all samples (per-epoch learning).
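A minimal NumPy sketch of these rules (a simplified variant without the sign-change backtracking of the original Rprop; the parameter defaults follow the values quoted above):

import numpy as np

def rprop_step(w, grad, grad_prev, step, eta_plus=1.2, eta_minus=0.5,
               step_max=50.0, step_min=1e-6):
    # Grow the step where the gradient kept its sign, shrink it where it flipped.
    same_sign = grad * grad_prev
    step = np.where(same_sign > 0, np.minimum(step * eta_plus, step_max), step)
    step = np.where(same_sign < 0, np.maximum(step * eta_minus, step_min), step)
    # Move each weight against the sign of its current gradient by its own step size.
    w = w - np.sign(grad) * step
    return w, step

# Usage: start with step = np.full_like(w, 0.1) and grad_prev = np.zeros_like(w),
# then call rprop_step once per epoch with the gradient summed over all exemplars.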

Resilient Backpropagation (Rprop)
The performance of Rprop is comparable to Quickprop; it also considerably accelerates backpropagation learning. Compared to both the standard backpropagation algorithm and Quickprop, Rprop has one advantage: it does not require the user to estimate or empirically determine a step size parameter and its change over time. Rprop determines appropriate step size values by itself and can thus be applied "as is" to a variety of problems without significant loss of efficiency.

The Counterpropagation Network
Let us look at the CPN structure again. How can this network determine its hidden-layer winner unit?
[Diagram: input layer (X1, X2), hidden layer (H1, H2, H3), and output layer (Y1, Y2), with additional connections among the hidden-layer units.]

The Solution: Maxnet
A maxnet is a recurrent, one-layer network that uses competition to determine which of its nodes has the greatest initial input value. All pairs of nodes have inhibitory connections with the same weight −ε, where typically ε ≤ 1/(# nodes). In addition, each node has a self-excitatory connection to itself whose weight θ is typically 1. The nodes update their net input and their output by the following equations:
net_i(t+1) = θ·o_i(t) − ε·Σ_{k≠i} o_k(t)
o_i(t+1) = max(0, net_i(t+1))

Maxnet
All nodes update their output simultaneously. With each iteration, the neurons' activations will decrease until only one neuron remains active. This is the "winner" neuron that had the greatest initial input. Maxnet is a biologically plausible implementation of a maximum-finding function. In parallel hardware, it can be more efficient than a corresponding serial function. We can add maxnet connections to the hidden layer of a CPN to find the winner neuron.

Maxnet Example
[Diagram: a maxnet with five neurons, θ = 1 and ε = 0.2; after a few simultaneous update steps, the activations of all but one neuron drop to zero, and the neuron with the greatest initial input remains as the winner.]
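A minimal NumPy sketch of the maxnet iteration using the parameters from the example (θ = 1, ε = 0.2); the initial activations below are made up for illustration, since the values in the original figure did not survive transcription:

import numpy as np

def maxnet(activations, theta=1.0, epsilon=0.2, max_iter=100):
    # Iterate the competition until at most one neuron remains active.
    a = np.array(activations, dtype=float)
    for _ in range(max_iter):
        # Self-excitation minus inhibition from all other nodes.
        net = theta * a - epsilon * (a.sum() - a)
        a = np.maximum(net, 0.0)
        if np.count_nonzero(a) <= 1:
            break
    return int(np.argmax(a))   # index of the winner neuron

# Hypothetical example with five neurons:
print(maxnet([0.5, 0.9, 1.0, 0.24, 0.07]))   # prints 2: the largest initial input wins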

Self-Organizing Maps (Kohonen Maps)
As you may remember, the counterpropagation network employs a combination of supervised and unsupervised learning. We will now study Self-Organizing Maps (SOMs) as an example of completely unsupervised learning (Kohonen, 1980). This type of artificial neural network is particularly similar to biological systems (as far as we understand them).

Self-Organizing Maps (Kohonen Maps)
In the human cortex, multi-dimensional sensory input spaces (e.g., visual input, tactile input) are represented by two-dimensional maps. The projection from sensory inputs onto such maps is topology conserving. This means that neighboring areas in these maps represent neighboring areas in the sensory input space. For example, neighboring areas in the sensory cortex are responsible for the arm and hand regions.

Self-Organizing Maps (Kohonen Maps)
Such a topology-conserving mapping can be achieved by SOMs:
Two layers: input layer and output (map) layer
Input and output layers are completely connected.
Output neurons are interconnected within a defined neighborhood.
A topology (neighborhood relation) is defined on the output layer.

Self-Organizing Maps (Kohonen Maps)
Network structure:
[Diagram: an input vector x = (x1, ..., xn) is fully connected to the output (map) layer neurons O1, O2, ..., Om, which produce the output vector o.]

Self-Organizing Maps (Kohonen Maps)
Common output-layer structures:
[Diagram: a one-dimensional output layer (completely interconnected) and a two-dimensional output layer (connections omitted, only neighborhood relations shown), each highlighting the neighborhood of a neuron i.]

Self-Organizing Maps (Kohonen Maps)
A neighborhood function c(i, k) indicates how closely neurons i and k in the output layer are connected to each other. Usually, a Gaussian function of the distance between the two neurons in the layer is used:
c(i, k) = exp(−||p_i − p_k||² / (2σ²)),
where p_i and p_k are the positions of neurons i and k in the output layer, and σ determines the width of the neighborhood.
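A small sketch of such a neighborhood function (the name neighborhood and the default width sigma are illustrative; positions can be 1-D or 2-D map coordinates):

import numpy as np

def neighborhood(p_i, p_k, sigma=2.0):
    # Gaussian coupling strength between neurons at map positions p_i and p_k.
    d2 = np.sum((np.asarray(p_i, dtype=float) - np.asarray(p_k, dtype=float)) ** 2)
    return np.exp(-d2 / (2.0 * sigma ** 2))

# Example: neighborhood((0, 0), (1, 2)) gives the coupling between two grid neurons.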

Unsupervised Learning in SOMs
For an n-dimensional input space and m output neurons:
(1) Choose a random weight vector wi for each neuron i, i = 1, ..., m.
(2) Choose a random input x.
(3) Determine the winner neuron k: ||wk − x|| = mini ||wi − x|| (Euclidean distance).
(4) Update the weight vectors of all neurons i in the neighborhood of neuron k: wi := wi + η·c(i, k)·(x − wi), so that wi is shifted towards x.
(5) If the convergence criterion is met, STOP. Otherwise, narrow the neighborhood function c and the learning parameter η, and go to (2).
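A minimal NumPy sketch of this training loop for a one-dimensional map (the function name som_train and the linear decay schedule for η and σ are illustrative choices, not prescribed by the lecture):

import numpy as np

def som_train(data, m=20, epochs=1000, eta0=0.5, sigma0=5.0, seed=0):
    rng = np.random.default_rng(seed)
    n = data.shape[1]
    w = rng.random((m, n))                    # (1) random weight vectors
    positions = np.arange(m)                  # neuron positions on the 1-D map
    for t in range(epochs):
        x = data[rng.integers(len(data))]     # (2) random input
        k = np.argmin(np.linalg.norm(w - x, axis=1))            # (3) winner neuron
        eta = eta0 * (1.0 - t / epochs)                          # (5) shrink learning rate ...
        sigma = sigma0 * (1.0 - t / epochs) + 1e-3               # ... and neighborhood width
        c = np.exp(-((positions - k) ** 2) / (2.0 * sigma ** 2)) # (4) Gaussian c(i, k)
        w += eta * c[:, None] * (x - w)                          # (4) shift weights towards x
    return w

# Usage: weights = som_train(np.random.rand(500, 3))   # 500 random 3-D inputs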