Ch. 9 Unsupervised Learning. Stephen Marsland, Machine Learning: An Algorithmic Perspective, CRC 2009. Based on slides from Stephen Marsland and some slides from the Internet. Collected and modified by Longin Jan Latecki, Temple University.

Stephen Marsland Introduction  Suppose we don’t have good training data  Hard and boring to generate targets  Don’t always know target values  Biologically implausible to have targets?  Two cases:  Know when we’ve got it right  No external information at all

Stephen Marsland Unsupervised Learning  We have no external error information  No task-specific error criterion  Generate internal error  Must be general  Usual method is to cluster data together according to activation of neurons  Competitive learning

Stephen Marsland Competitive Learning  Set of neurons compete to fire  Neuron that ‘best matches’ the input (has the highest activation) fires  Winner-take-all  Neurons ‘specialise’ to recognise some input  Grandmother cells

Stephen Marsland The k-Means Algorithm  Suppose that you know the number of clusters, but not what the clusters look like  How do you assign each data point to a cluster?  Position k centers at random in the space  Assign each point to its nearest center according to some chosen distance measure  Move the center to the mean of the points that it represents  Iterate
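
The four steps above map almost line-for-line onto a short NumPy sketch. This is only an illustration; the function name kmeans and its arguments are my own, not code from the book.

```python
import numpy as np

def kmeans(data, k, n_iterations=100, rng=None):
    """Cluster data (n_points x n_dims) into k clusters; returns (centres, labels)."""
    rng = np.random.default_rng(rng)
    # Position k centres at random in the space (here: k distinct data points)
    centres = data[rng.choice(len(data), size=k, replace=False)]
    for _ in range(n_iterations):
        # Assign each point to its nearest centre (Euclidean distance)
        distances = np.linalg.norm(data[:, None, :] - centres[None, :, :], axis=2)
        labels = distances.argmin(axis=1)
        # Move each centre to the mean of the points it represents
        new_centres = np.array([data[labels == j].mean(axis=0)
                                if np.any(labels == j) else centres[j]
                                for j in range(k)])
        if np.allclose(new_centres, centres):
            break  # centres stopped moving: converged
        centres = new_centres
    return centres, labels
```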

3.6 k-means Clustering

Stephen Marsland Euclidean Distance  For two points (x1, y1) and (x2, y2), the distance is sqrt((x1 - x2)^2 + (y1 - y2)^2) (figure: right triangle with sides x1 - x2 and y1 - y2).

Stephen Marsland The k-Means Algorithm  (figure: a set of data points with the k cluster means marked).

Stephen Marsland The k-Means Algorithm  These are local minima solutions (figure: the same data with the means settled in two different local minima).

Stephen Marsland The k-Means Algorithm  More perfectly valid, but wrong, solutions (figure: further placements of the means).

Stephen Marsland The k-Means Algorithm  If you don't know the number of means, the problem is worse (figure: the same data with a different choice of means).

Stephen Marsland The k-Means Algorithm  One solution is to run the algorithm for many values of k  Pick the one with lowest error  Up to overfitting  Run the algorithm from many starting points  Avoids local minima?  What about noise?  Median instead of mean?
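
A hedged sketch of that recipe, reusing the kmeans function from the block above: try several values of k and several random restarts, score each solution by its sum-of-squared error, and keep the lowest. Bear in mind the error always falls as k grows, so comparing across k naively overfits; the helper names below are mine.

```python
import numpy as np

def sum_squared_error(data, centres, labels):
    """Total squared distance of every point to its assigned centre."""
    return sum(np.sum((data[labels == j] - centres[j]) ** 2)
               for j in range(len(centres)))

def best_kmeans(data, k_values=(2, 3, 4, 5), restarts=10):
    """For each k, keep the best of several random restarts (lowest error)."""
    results = {}
    for k in k_values:
        best = None
        for seed in range(restarts):
            centres, labels = kmeans(data, k, rng=seed)  # from the sketch above
            err = sum_squared_error(data, centres, labels)
            if best is None or err < best[0]:
                best = (err, centres, labels)
        results[k] = best
    return results
```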

Stephen Marsland k-Means Neural Network Neuron activation measures distance between input and neuron position in weight space

Stephen Marsland Weight Space  Imagine we plot neuronal positions according to their weights (figure: two plots with axes w1, w2, w3).

Stephen Marsland k-Means Neural Network  Use winner-take-all neurons  Winning neuron is the one closest to input  Best-matching cluster  How do we do training?  Update weights - move neuron positions  Move winning neuron towards current input  Ignore the rest

Stephen Marsland Normalisation  Suppose the weights are (0.2, 0.2, -0.1), (0.15, -0.15, 0.1), and (10, 10, 10), and the input is (0.2, 0.2, -0.1).

Stephen Marsland Normalisation  The input is a perfect match for the first neuron, but compare the activations (dot products of weights and input):  0.2*0.2 + 0.2*0.2 + (-0.1)*(-0.1) = 0.09  0.15*0.2 + (-0.15)*0.2 + 0.1*(-0.1) = -0.01  10*0.2 + 10*0.2 + 10*(-0.1) = 3  We can only compare activations if the weights are about the same size.

Stephen Marsland Normalisation  Make the distance between each neuron and the origin be 1  All neurons lie on the unit hypersphere  Need to stop the weights growing unboundedly

Stephen Marsland k-Means Neural Network  Normalise the inputs too  Then use the winner-take-all update, moving the winning neuron towards the input: w := w + η (x - w)  That's it  Simple and easy
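
Putting the last few slides together, a minimal sketch of this on-line winner-take-all (k-means) network: weights and inputs are normalised to unit length, the winner is the neuron with the largest dot-product activation, and only the winner is moved towards the input and re-normalised. The learning rate eta, epoch count, and function names are my own choices, not the book's.

```python
import numpy as np

def normalise(v):
    """Scale vectors (rows, or a single vector) to unit length."""
    return v / np.linalg.norm(v, axis=-1, keepdims=True)

def train_winner_take_all(inputs, n_neurons, eta=0.1, n_epochs=20, seed=0):
    rng = np.random.default_rng(seed)
    inputs = normalise(np.asarray(inputs, dtype=float))
    weights = normalise(rng.normal(size=(n_neurons, inputs.shape[1])))
    for _ in range(n_epochs):
        for x in inputs:
            activations = weights @ x        # comparable because of normalisation
            winner = activations.argmax()    # winner-take-all
            # Move only the winning neuron towards the current input
            weights[winner] += eta * (x - weights[winner])
            weights[winner] = normalise(weights[winner])
    return weights
```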

Stephen Marsland Vector Quantisation (VQ)  Think about the problem of data compression  We want to store a set of data (say, sensor readings) in as small an amount of memory as possible  We don't mind some loss of accuracy  We could make a codebook of typical data and index each data point by reference to a codebook entry  Thus, VQ is a coding method that maps each data point x to its closest codeword, i.e. we encode x by replacing it with (the index of) its closest codeword.

3.21 Outline of Vector Quantization of Images (based on slides by S. R. Subramanya)

Stephen Marsland Vector Quantisation  The codebook … is sent to the receiver: at least 30 bits.

Stephen Marsland Vector Quantisation  The data … is encoded … and sent as codeword index 1 (3 bits).

Stephen Marsland Vector Quantisation  The data … is encoded … and sent as codeword index 3 (3 bits).

Stephen Marsland Vector Quantisation  The data … is encoded … ? (the next data point does not match any codeword exactly).

Stephen Marsland Vector Quantisation  The data … is encoded … ? Pick the nearest codeword according to some distance measure.

Stephen Marsland Vector Quantisation  The data … is encoded … Pick the nearest codeword according to some distance measure and send its index … 3 bits, but information is lost.

Stephen Marsland Vector Quantisation  The data … is sent as … which takes 15 bits instead of 30. Of course, sending the codebook is inefficient for this little data, but if there were a lot more data the overall cost would be reduced.

Stephen Marsland  The problem is that we have only sent 2 different pieces of data and 00101, instead of the 5 we had.  If the codebook had been picked more carefully, this would have been a lot better  How can you pick the codebook?  Usually k-means is used for Vector Quantisation Learning Vector Quantisation

Stephen Marsland Voronoi Tessellation  Join neighbouring points  Draw lines equidistant to each pair of points  These are perpendicular to the lines joining the points.

3.31 Codewords in 2-dimensional space. Input vectors are marked with an x, codewords are marked with red circles, and the Voronoi regions are separated with boundary lines. Two Dimensional Voronoi Diagram

3.32 Self Organizing Maps Self-organizing maps (SOMs) are a data visualization technique invented by Professor Teuvo Kohonen Also called Kohonen Networks, Competitive Learning, Winner-Take-All Learning Generally reduces the dimensions of data through the use of self-organizing neural networks Useful for data visualization; humans cannot visualize high dimensional data so this is often a useful technique to make sense of large data sets

3.33 Neurons in the Brain Although heterogeneous, at a low level the brain is composed of neurons A neuron receives input from other neurons (generally thousands) through its synapses Inputs are approximately summed When the input exceeds a threshold the neuron sends an electrical spike that travels from the body, down the axon, to the next neuron(s)

Stephen Marsland Feature Maps  (figure: a line of neurons responding, in order, to low pitch, higher pitch, and high pitch sounds).

Stephen Marsland  Sounds that are similar (‘close together’) excite neurons that are near to each other  Sounds that are very different excite neurons that are a long way off  This is known as topology preservation  The ordering of the inputs is preserved  If possible (perfectly topology-preserving) Feature Maps

Stephen Marsland Topology Preservation  (figure: the mapping from the input space onto the grid of output neurons).

Stephen Marsland Topology Preservation

3.38 Self-Organizing Maps (Kohonen Maps)  Common output-layer structures:  One-dimensional (completely interconnected for determining the "winner" unit)  Two-dimensional (connections omitted, only neighborhood relations shown)  (figure: the neighborhood of neuron i in each structure).

Stephen Marsland The Self-Organising Map  (figure: the inputs connected to the map of output neurons).

Stephen Marsland Neuron Connections?  We don't actually need the inhibitory connections  Just use a neighbourhood of positive connections  How large should this neighbourhood be?  Early in learning the network is unordered, so use a big neighbourhood  Later on, the network just needs fine-tuning, so use a small neighbourhood

Stephen Marsland  The weight vectors are randomly initialised  Input vectors are presented to the network  The neurons are activated proportional to the Euclidean distance between the input and the weight vector  The winning node has its weight vector moved closer to the input  So do the neighbours of the winning node  Over time, the network self-organises so that the input topology is preserved The Self-Organising Map

Stephen Marsland Self-Organisation  Global ordering from local interactions  Each neuron sees its neighbours  The whole network becomes ordered  Understanding self-organisation is part of complexity science  It appears all over the place

3.43 Basic "Winner Take All" Network  A two-layer network: input units and output units, with each input unit connected to each output unit by a weight W_ij (figure: input layer I1, I2, I3 fully connected to output layer O1, O2).

3.44 Basic Algorithm (the same as the k-Means Neural Network)  Initialize the map (randomly assign weights)  Loop over training examples:  Assign input unit values according to the values in the current example  Find the "winner", i.e. the output unit that most closely matches the input units, using some distance metric: for output units j = 1 to m and input units i = 1 to n, find the j that minimizes sum_i (x_i - w_ij)^2  Modify the weights on the winner to more closely match the input: w_ij := w_ij + c (x_i - w_ij), where c is a small positive learning constant that usually decreases as the learning proceeds

3.45 Result of Algorithm Initially, some output nodes will randomly be a little closer to some particular type of input These nodes become “winners” and the weights move them even closer to the inputs Over time nodes in the output become representative prototypes for examples in the input Note there is no supervised training here Classification: Given new input, the class is the output node that is the winner

3.46 Typical Usage: 2D Feature Map  In typical usage the output nodes form a 2D "map" organized in a grid-like fashion, and we update weights in a neighborhood around the winner (figure: input units I1, I2, I3 fully connected to a 5x5 grid of output units O11 … O55).

3.47 Modified Algorithm  Initialize the map (randomly assign weights)  Loop over training examples:  Assign input unit values according to the values in the current example  Find the "winner", i.e. the output unit that most closely matches the input units, using some distance metric such as sum_i (x_i - w_ij)^2  Modify weights on the winner to more closely match the input  Modify weights in a neighborhood around the winner so the neighbors on the 2D map also become closer to the input  Over time this will tend to cluster similar items closer on the map

3.48 Unsupervised Learning in SOMs  For an n-dimensional input space and m output neurons:  (1) Choose a random weight vector w_i for each neuron i, i = 1, ..., m  (2) Choose a random input x  (3) Determine the winner neuron k: ||w_k - x|| = min_i ||w_i - x|| (Euclidean distance)  (4) Update the weight vectors of all neurons i in the neighborhood of neuron k: w_i := w_i + η·h(i, k)·(x - w_i), so w_i is shifted towards x  (5) If the convergence criterion is met, STOP. Otherwise, narrow the neighborhood function h and the learning parameter η and go to (2).
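
Steps (1)-(5) fit into a compact sketch for a 2D map. I assume a Gaussian neighbourhood h(i, k) that depends on grid distance to the winner, and linearly shrinking η and σ; the schedules and parameter values are illustrative, not the ones from the original lecture.

```python
import numpy as np

def train_som(data, rows=10, cols=10, n_iterations=2000,
              eta0=0.5, sigma0=None, seed=0):
    """Train a 2D self-organising map on data (n_points x n_dims)."""
    rng = np.random.default_rng(seed)
    n_dims = data.shape[1]
    sigma0 = sigma0 if sigma0 is not None else max(rows, cols) / 2.0
    # (1) random weight vector for each of the rows*cols map neurons
    weights = rng.random((rows * cols, n_dims))
    # grid coordinates of each neuron, used only by the neighbourhood function
    grid = np.array([(r, c) for r in range(rows) for c in range(cols)], dtype=float)
    for t in range(n_iterations):
        frac = t / n_iterations
        eta = eta0 * (1.0 - frac)            # shrinking learning rate
        sigma = sigma0 * (1.0 - frac) + 0.5  # shrinking neighbourhood width
        x = data[rng.integers(len(data))]    # (2) choose a random input
        # (3) winner k: neuron whose weight vector is closest to x
        k = np.linalg.norm(weights - x, axis=1).argmin()
        # (4) update every neuron, scaled by a Gaussian of grid distance to the winner
        d2 = np.sum((grid - grid[k]) ** 2, axis=1)
        h = np.exp(-d2 / (2.0 * sigma ** 2))
        weights += eta * h[:, None] * (x - weights)
    return weights.reshape(rows, cols, n_dims)
```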

Stephen Marsland The Self-Organising Map Before training (large neighbourhood)

Stephen Marsland The Self-Organising Map After training (small neighbourhood)

3.51 Updating the Neighborhood  Node O44 is the winner; the color indicates the scaling used to update its neighbors (figure: 5x5 output grid with c = 1 at the winner, and c = 0.75 and c = 0.5 for progressively more distant neighbors).

3.52 Selecting the Neighborhood  Typically, a "Sombrero Function" or Gaussian function is used  Neighborhood size usually decreases over time to allow initial "jockeying for position" and then "fine-tuning" as the algorithm proceeds (figure: neighborhood strength as a function of distance from the winner).
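
For concreteness, a Gaussian version of the strength-versus-distance curve in the figure, with a width that shrinks as training proceeds. The decay schedule and the default sigma0 are assumptions; a "sombrero" (Mexican-hat) profile would add a negative surround.

```python
import numpy as np

def neighbourhood_strength(distance, t, n_iterations, sigma0=3.0):
    """Gaussian neighbourhood: strength falls off with grid distance from the
    winner, and the width sigma shrinks as training proceeds."""
    sigma = sigma0 * (1.0 - t / n_iterations) + 0.5
    return np.exp(-distance ** 2 / (2.0 * sigma ** 2))
```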

3.53 Color Example

3.54 Kohonen Network Examples Document Map: html/root.html

3.55 Poverty Map arch/som- research/worldmap.html

3.56 SOM for Classification  A generated map can also be used for classification  A human can assign a class to a data point, or the strongest weight can be used as the prototype for the data point  For a new test case, calculate the winning node and assign the class that it is closest to
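
One hedged way to implement that labelling scheme (the helper names are mine, and data/labels are assumed to be NumPy arrays): give each map node the majority class of the training points it wins, then classify a new point by the class attached to its winning node. The weights array can be, for example, the one returned by the train_som sketch above.

```python
import numpy as np

def label_map(weights, data, labels):
    """Assign each map node the majority class of the training points it wins."""
    flat = weights.reshape(-1, weights.shape[-1])
    winners = np.linalg.norm(data[:, None, :] - flat[None, :, :], axis=2).argmin(axis=1)
    node_labels = {}
    for node in np.unique(winners):
        classes, counts = np.unique(labels[winners == node], return_counts=True)
        node_labels[int(node)] = classes[counts.argmax()]
    return node_labels

def classify(x, weights, node_labels):
    """Classify a new point by the class attached to its winning node."""
    flat = weights.reshape(-1, weights.shape[-1])
    winner = int(np.linalg.norm(flat - x, axis=1).argmin())
    return node_labels.get(winner)  # None if no training point ever won this node
```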

Stephen Marsland Network Size  We have to predetermine the network size  Big network:  Each neuron represents an exact feature  Not much generalisation  Small network:  Too much generalisation  No differentiation  Try different sizes and pick the best