Unsupervised learning: simple competitive learning Biological background: Neurons are wired topographically, nearby neurons connect to nearby neurons.


Biological background: neurons are wired topographically; nearby neurons connect to nearby neurons. In the visual cortex, neurons are organised in functional columns. Ocular dominance columns: one region responds to input from one eye. Orientation columns: one region responds to one orientation.

Self-organization as a principle of neural development

Competitive learning
--finite resources: output units 'compete' to see which will win, via inhibitory connections between them
--aim is to automatically discover statistically salient features of the pattern vectors in the training data set: feature detectors
--can find clusters in the training-data pattern space, which can be used to classify new patterns
--basic structure: a layer of inputs fully connected to a layer of outputs

The input layer is fully connected to the output layer, and the input-to-output connections are feedforward. The output layer compares the activations of its units following presentation of a pattern vector x, via (sometimes virtual) inhibitory lateral connections. The winner is selected as the unit with the largest activation: winner-takes-all (WTA). Output units have linear or binary activation functions. This is very different from the previous (supervised) learning, where we paid attention to the input-output relationship; here we look at the pattern of connections (weights).

Simple competitive learning algorithm
Initialise all weights to random values and normalise (so that ||w|| = 1)
Loop until stopping criteria satisfied:
  choose a pattern vector x from the training set
  compute the distance between the pattern and each weight vector: ||x - w_i||
  find the output unit with the largest activation, i.e. the 'winner' i* with the property that ||x - w_i*|| <= ||x - w_i|| for all i
  update the weight vector of the winning unit only: w_i*(t+1) = w_i*(t) + η(t)(x - w_i*(t))
End loop

NB: choosing the largest output is the same as choosing the weight vector w that is nearest to x, since:
a) w·x = wᵀx = ||w|| ||x|| cos(angle between x and w) = ||w|| ||x|| if the angle is 0
b) ||w - x||² = (w₁ - x₁)² + (w₂ - x₂)² = w₁² + w₂² + x₁² + x₂² - 2(x₁w₁ + x₂w₂) = ||w||² + ||x||² - 2wᵀx
Since ||w|| = 1 and ||x|| is fixed, minimising ||w - x||² is equivalent to maximising wᵀx.
Therefore, as we only really want the angle, WLOG consider only inputs with ||x|| = 1.
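As a quick numerical sanity check of this equivalence (a sketch, not from the slides), unit-norm weight vectors give the same winner whether it is chosen by largest dot product or by smallest Euclidean distance:

```python
# Check: for unit-norm weight vectors, argmax_i w_i.x equals
# argmin_i ||w_i - x||, as derived in (a) and (b) above.
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(5, 3))
W /= np.linalg.norm(W, axis=1, keepdims=True)  # normalise so ||w_i|| = 1
x = rng.normal(size=3)

winner_by_dot = int(np.argmax(W @ x))
winner_by_dist = int(np.argmin(np.linalg.norm(W - x, axis=1)))
assert winner_by_dot == winner_by_dist
```

Note that x need not be normalised for the winners to coincide; normalising x only matters if we interpret the activation as the cosine of the angle.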

Example: η = 0.5, w₁ = (-1, 0), w₂ = (0, 1), x = (1, 0)
y₁ = -1, y₂ = 0, so unit 2 wins: w₂ -> (0.5, 0.5)
y₁ = -1, y₂ = 0.5, so unit 2 wins: w₂ -> (0.75, 0.25)
y₁ = -1, y₂ = 0.75, so unit 2 wins: w₂ -> (0.875, 0.125)
etc.
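The worked example above can be reproduced in a few lines of Python (a minimal sketch of the winner-takes-all update, using the same numbers):

```python
# Reproduce the slide's worked example with eta = 0.5: the winner is
# the unit with the largest activation y_i = w_i . x, and only the
# winner's weight vector is updated.
import numpy as np

eta = 0.5
W = np.array([[-1.0, 0.0],   # w1
              [0.0, 1.0]])   # w2
x = np.array([1.0, 0.0])

for _ in range(3):
    y = W @ x                      # activations y1, y2
    winner = int(np.argmax(y))     # unit 2 wins every time here
    W[winner] += eta * (x - W[winner])

# w2 moves (0,1) -> (0.5,0.5) -> (0.75,0.25) -> (0.875,0.125),
# rotating towards the input x = (1,0); w1 never moves.
```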

How does competitive learning work?
Since the vectors are normalised, the points of all input vectors can be viewed as lying on the surface of a hypersphere (in 2D: a circle). The distance between points on the surface of the hypersphere reflects the degree of similarity between the patterns.

With respect to the incoming signal, the weight vector of the winning unit (the yellow line in the figure) is updated so that it is rotated towards the incoming signal:
w(t+1) = w(t) + η(t)(x - w(t))
Thus the weight vector becomes more and more similar to the input, i.e. a feature detector for that input.

When there are more input patterns, each weight vector can be seen to migrate from its initial (randomly determined) position to the centre of gravity of a cluster of the input vectors. Thus the network discovers clusters. Cf. on-line k-means, where the nearest centre is updated by:
centre_i(t+1) = centre_i(t) + η(t)(x - centre_i(t))
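This migration to cluster centres of gravity can be sketched as on-line k-means; the per-centre learning rate η_i = 1/n_i used here is one common choice (an assumption, not fixed by the slides), which makes each centre the running mean of the patterns it has won:

```python
# On-line k-means sketch: each centre migrates towards the centre of
# gravity of the cluster of patterns for which it is nearest.
import numpy as np

rng = np.random.default_rng(1)
# Two well-separated 2D clusters around (0,0) and (5,5).
data = np.vstack([rng.normal((0.0, 0.0), 0.1, size=(50, 2)),
                  rng.normal((5.0, 5.0), 0.1, size=(50, 2))])
rng.shuffle(data)

centres = np.array([[0.0, 1.0], [5.0, 4.0]])  # rough initial positions
counts = np.zeros(len(centres))

for x in data:
    i = int(np.argmin(np.linalg.norm(centres - x, axis=1)))  # nearest centre wins
    counts[i] += 1
    centres[i] += (x - centres[i]) / counts[i]               # running-mean update

# Each centre now sits at the centre of gravity of its cluster.
```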

Error minimisation viewpoint: consider minimising the error E across all N patterns in the training set,
E = Σ ||x_i - w(t)||²
For the winning unit k when the pattern is x_i, the direction in which the weights need to change so as to perform gradient descent on E is given (from previous lectures) by:
w(t+1) = w(t) + η(t)(x_i - w(t))
which is the update rule for supervised learning (remembering that in supervised learning w plays the role of O, the output of the neurons), i.e. replace w by O and we recover the adaline/simple gradient descent learning rule.

Enforcing a fairer competition
The initial position of an output unit's weight vector may lie in a region with few, if any, patterns (cf. the problems of k-means). Such a unit may never, or only rarely, become a winner, so its weight vector may never be updated, preventing it from finding a richer part of pattern space: a DEAD UNIT.
Alternatively, the initial position of a weight vector may be close to a large number of patterns while most other units' weights are more distant: a CONTINUAL WINNER, whose weights change little over time while preventing other units from competing.
It is more efficient to ensure a fairer competition in which each unit has an equal chance of representing some part of the training data.

Leaky learning
Modify the weights of both winning and losing units, but at different learning rates, where η_W(t) >> η_L(t). This has the effect of slowly moving losing units towards denser regions of pattern space. There are many other approaches, which we will discuss later on.
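A minimal sketch of one leaky-learning step (the learning-rate values here are illustrative assumptions): the winner learns quickly, while the losers "leak" slowly towards the same input, so a dead unit gradually drifts into denser regions of pattern space.

```python
# Leaky learning: winner updated with eta_win, all losers with the
# much smaller eta_lose, so even non-winning units move slightly.
import numpy as np

eta_win, eta_lose = 0.5, 0.01   # eta_W(t) >> eta_L(t)
W = np.array([[-1.0, 0.0],      # a potential dead unit
              [0.0, 1.0]])
x = np.array([1.0, 0.0])

winner = int(np.argmax(W @ x))  # unit 2 wins (index 1)
for i in range(len(W)):
    rate = eta_win if i == winner else eta_lose
    W[i] += rate * (x - W[i])

# Winner jumps to (0.5, 0.5); the loser creeps from (-1, 0) to (-0.98, 0).
```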

Vector quantisation: an application of competitive learning
Idea: categorise a given set of input vectors into M classes using a competitive learning algorithm, and then represent any vector just by the class into which it falls. An important use of competitive learning (especially in data compression):
--divides the entire pattern space into a number of separate subspaces
--the set of M units represents a set of prototype vectors: the CODEBOOK (cf. k-means)
--a new pattern x is assigned to a class based on its closeness to a prototype vector, using Euclidean distance: it is LABELLED (cf. k-nearest neighbours, kernel density estimation)
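The codebook idea can be sketched as follows (the codebook values here are illustrative, not learned): each vector is transmitted as the index of its nearest prototype and decoded back to that prototype, which is what makes the scheme useful for compression.

```python
# Vector quantisation with an M-entry codebook: encode a vector as a
# class label, decode the label back to its prototype (lossy).
import numpy as np

codebook = np.array([[0.0, 0.0],
                     [1.0, 1.0],
                     [0.0, 1.0]])          # M = 3 prototype vectors

def encode(x, codebook):
    """Label x with the index of the nearest prototype (Euclidean)."""
    return int(np.argmin(np.linalg.norm(codebook - x, axis=1)))

def decode(index, codebook):
    """Reconstruct a vector from its class label alone."""
    return codebook[index]

x = np.array([0.9, 1.2])
label = encode(x, codebook)        # nearest prototype is (1, 1)
x_hat = decode(label, codebook)    # lossy reconstruction
```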

Example (VQ) See:

Topographic maps
Extend the ideas of competitive learning to incorporate a neighbourhood around the inputs and neurons. We want a nonlinear transformation of the input pattern space onto the output feature space which preserves the neighbourhood relationships between the inputs: a feature map, where nearby neurons respond to similar inputs. E.g. place cells, orientation columns, somatosensory cells, etc. The idea is that neurons selectively tune to particular input patterns in such a way that the neurons become ordered with respect to each other, so that a meaningful coordinate system for different input features is created.

This is known as a topographic map: spatial locations are indicative of the intrinsic statistical features of the input patterns, i.e. close in the input => close in the output. E.g. when the yellow input is active, the yellow neuron is the winner; so when the orange input is active, we want the orange neuron to be the winner.

E.g. activity-based self-organisation (von der Malsburg, 1973): an incorporation of competitive and cooperative mechanisms to generate feature maps using unsupervised learning networks. Biologically motivated: how can activity-based learning in highly interconnected circuits lead to an orderly mapping of visual stimulus space onto the cortical surface? (the visual-tectum map)
A 2-layer network: each cortical unit is fully connected to the visual space via Hebbian units. The interconnections between cortical units are described by a 'Mexican-hat' function: short-range excitation and long-range inhibition.

After learning (see the original paper for details), a topographic map appears. However, the input dimension is the same as the output dimension. Kohonen simplified this model into what is called Kohonen's self-organising map (SOM) algorithm, which is more general as it can perform dimensionality reduction. The SOM can be viewed as a vector-quantisation-type algorithm.

Kohonen's self-organising map (SOM) algorithm
Set time t = 0
Initialise the learning rate η(0)
Initialise the size of the neighbourhood function
Initialise all weights to small random values

Loop until stopping criteria satisfied:
  choose a pattern vector x from the training set
  compute the distance between the pattern and the weight vector of each output unit: ||x - w_i(t)||
  find the winning unit i* from the minimum distance: ||x - w_i*(t)|| = min_i ||x - w_i(t)||
  update the weights of the winning and neighbouring units using the neighbourhood function:
    w_ij(t+1) = w_ij(t) + η(t) h(i, i*, t) [x_j - w_ij(t)]
Note that when h(i, i*) = 1 if i = i* and 0 otherwise, we recover simple competitive learning.

  decrease the size of the neighbourhood: when t is large, h(i, i*, t) = 1 if i = i* and 0 otherwise
  decrease the learning rate η(t): when t is large, η(t) ≈ 0
  increment time: t = t + 1
End loop
Generally, a LARGE number of iterations is needed.
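The loop above can be sketched end-to-end for a 1-D chain of units learning a 1-D uniform distribution (the schedules for η(t) and the neighbourhood width below are illustrative assumptions, not values from the slides):

```python
# SOM sketch: winner by minimum distance, Gaussian neighbourhood over
# the lattice, and both the learning rate and the neighbourhood width
# decreased over a large number of iterations.
import numpy as np

rng = np.random.default_rng(0)
data = rng.uniform(0.0, 1.0, size=200)

n_units = 10
pos = np.arange(n_units)                 # unit positions in the 1-D lattice
w = rng.uniform(0.0, 1.0, size=n_units)  # random initial weights

def qerror(w, data):
    """Mean distance from each pattern to its nearest weight."""
    return np.mean([np.min(np.abs(w - x)) for x in data])

err_before = qerror(w, data)

T = 2000
for t in range(T):
    x = data[t % len(data)]
    i_star = int(np.argmin(np.abs(w - x)))            # winning unit
    eta = 0.5 * (1.0 - t / T)                         # decreasing learning rate
    sigma = max(0.1, 3.0 * (1.0 - t / T))             # shrinking neighbourhood
    h = np.exp(-(pos - pos[i_star]) ** 2 / (2 * sigma ** 2))
    w += eta * h * (x - w)                            # update winner + neighbours

err_after = qerror(w, data)   # organised map quantises the data better
```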

Neighbourhood function
Relates the degree of weight update to the distance from the winning unit i* to the other units in the lattice; typically a Gaussian function. When i = i*, the distance is zero, so h = 1. Note that h decreases monotonically with distance from the winning unit and is symmetrical about it. It is important that the size (width) of the neighbourhood decreases over time to stabilise the mapping, otherwise we would get a noisy version of competitive learning.
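The Gaussian neighbourhood function and its stated properties can be checked directly (the widths below are chosen purely for illustration):

```python
# Gaussian neighbourhood h(d, sigma): h = 1 at the winner (d = 0),
# decreases monotonically with lattice distance d, and a smaller
# width sigma (late in training) gives a tighter neighbourhood.
import numpy as np

def h(dist, sigma):
    """Degree of weight update as a function of distance to the winner."""
    return np.exp(-dist ** 2 / (2 * sigma ** 2))

d = np.arange(5)            # lattice distances 0..4 from the winner
early = h(d, sigma=2.0)     # wide neighbourhood early in training
late = h(d, sigma=0.5)      # narrow neighbourhood late in training
```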

Biologically, development (self-organisation) is governed by chemical gradients, which generally preserve topological relations. Kohonen (and others) has shown that the neighbourhood function could be implemented by a diffusing neuromodulatory gas such as nitric oxide (which has been shown to be necessary for learning, especially spatial learning). Thus there is some biological motivation for the SOM.

Weight adaptation
There are two phases during training: ordering and convergence.
Ordering phase: topographic ordering of the weight vectors of the output units, untangling the map so that the weight vectors are evenly spread.
Example: W1 is closest to the input x, so unit 1 is the winner; W3 is closer to x than W2.

W2 moves closer to x since h(1, 2) > 0; W3 does not move, as h(1, 3) = 0. After x has been shown many times, W2 moves closer to W1. Finally, the weights have the correct order: completion of the ordering phase.

Convergence phase
The topographic map adapts to the probability density of the input patterns in the training data. However, it is important that the ordering is maintained, so the learning rate should be small, to prevent large movements of the weight vectors, but non-zero, to avoid pathological states (getting caught in odd minima, etc.). The neighbourhood should also start off relatively small, containing only close neighbours, and eventually shrink to nearest neighbours (or less).

Problems
The SOM tends to overrepresent regions of low density and underrepresent regions of high density, where p(x) is the distribution density of the input data: the density of the weight vectors does not exactly match p(x). It can also have problems when there is no natural 2D ordering of the data.

Visualising network training
Consider a simple 2D lattice of units: six output units in 2D space. Set up the neighbourhood function according to the two-dimensional structure, and connect units which are close together with a line.

Initial random distribution of the weight vectors of each unit in 2D. Ordered weight vectors after training.

Example 2D to 2D

Example (JAVA) See: bochum.de/ini/VDM/research/gsn/DemoGNG/GNG.html

Semantic maps
In the previous demos, we viewed the feature map only by visualising the weight vectors of the neurons. Another method of visualisation is to view class labels assigned to the neurons in a 2D lattice, depending on how they respond to (unseen) test patterns: contextual or semantic maps. In this way the SOM creates coherent regions by grouping sets of labels with similar features together. This is a useful tool in data mining and for visualising complex high-dimensional data.

Example: animals and attributes, 16 animals with 13 attributes (see Haykin's book).
Attributes: is small / medium / big; has 2 legs / 4 legs / hair / hooves / mane / feathers; likes to hunt / run / fly / swim.
Animals: Dove, Hen, Duck, Goose, Owl, Hawk, Eagle, Fox, Dog, Wolf, Cat, Tiger, Lion, Horse, Zebra, Cow.

lion = (0,0,0,0,0,0,0,0,0, 0.2, 0,0,0, 0,0,1,0,1,1,0,1,0,1,1,0,0), where (0,0,0,0,0,0,0,0,0, 0.2, 0,0,0) is the animal code (all zeros except for a 0.2 in the animal's own position) and the remainder is the attribute code.
Train for 2000 iterations over all animals (until convergence). Next, the code for each animal is presented, but with the animal code set to all zeros. We then assign each neuron to the animal for which it gives the largest response.
Network: 10x10 output neurons, 29-D inputs.