3.1 159.302 Stephen Marsland: Ch. 9 Unsupervised Learning
From Stephen Marsland, Machine Learning: An Algorithmic Perspective, CRC 2009. Based on slides from Stephen Marsland and some slides from the Internet. Collected and modified by Longin Jan Latecki, Temple University, latecki@temple.edu

3.2 Introduction
- Suppose we don’t have good training data
  - Hard and boring to generate targets
  - Don’t always know target values
  - Biologically implausible to have targets?
- Two cases:
  - Know when we’ve got it right
  - No external information at all

3.3 Unsupervised Learning
- We have no external error information
  - No task-specific error criterion
- Generate internal error
  - Must be general
- Usual method is to cluster data together according to activation of neurons
  - Competitive learning

3.4 Competitive Learning
- Set of neurons compete to fire
- Neuron that ‘best matches’ the input (has the highest activation) fires
  - Winner-take-all
- Neurons ‘specialise’ to recognise some input
  - Grandmother cells

3.5 The k-Means Algorithm
- Suppose that you know the number of clusters, but not what the clusters look like
- How do you assign each data point to a cluster?
  - Position k centers at random in the space
  - Assign each point to its nearest center according to some chosen distance measure
  - Move each center to the mean of the points that it represents
  - Iterate
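The four steps above can be sketched in a few lines of Python. This is a minimal illustration, not the book's code: the function name `kmeans`, the choice of Euclidean distance, and placing the initial centers on randomly chosen data points are all assumptions.

```python
import math
import random

def kmeans(points, k, iterations=100, seed=0):
    """Plain k-means on a list of equal-length tuples."""
    rng = random.Random(seed)
    # Position k centers at random: here, on k distinct data points (an assumption).
    centers = rng.sample(points, k)
    assignment = None
    for _ in range(iterations):
        # Assign each point to its nearest center (Euclidean distance).
        new_assignment = [
            min(range(k), key=lambda j: math.dist(p, centers[j])) for p in points
        ]
        if new_assignment == assignment:
            break  # no point changed cluster, so the centers will not move again
        assignment = new_assignment
        # Move each center to the mean of the points it represents.
        for j in range(k):
            members = [p for p, a in zip(points, assignment) if a == j]
            if members:  # an empty cluster keeps its old center
                centers[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return centers, assignment

# Two well-separated groups of three points each.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (9.0, 9.0), (9.0, 10.0), (10.0, 9.0)]
centers, assignment = kmeans(points, k=2)
```

Stopping when no assignment changes is one common convergence test; a fixed iteration cap guards against slow convergence.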

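3.6 k-means Clustering
[Figure: k-means clustering.]

3.7 Euclidean Distance
[Figure: two points in the x-y plane, with the horizontal difference x1 - x2 and the vertical difference y1 - y2 marked.]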

3.8 The k-Means Algorithm
[Figure: a set of data points clustered with 4 means.]

3.9 The k-Means Algorithm
[Figure: two other clusterings of the same points.]
- These are local minima solutions

3.10 The k-Means Algorithm
[Figure: two further clusterings of the same points.]
- More perfectly valid, wrong solutions

3.11 The k-Means Algorithm
[Figure: the same points clustered with different numbers of means.]
- If you don’t know the number of means the problem is worse

3.12 The k-Means Algorithm
- One solution is to run the algorithm for many values of k
  - Pick the one with lowest error
  - Up to overfitting
- Run the algorithm from many starting points
  - Avoids local minima?
- What about noise?
  - Median instead of mean?
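A small sketch of what "lowest error" means here, assuming the error is the sum of squared distances from each point to its nearest center (the slides do not name the error measure). The data and the two candidate solutions are made up for illustration; both are fixed points of k-means on this data, but only one is the intended clustering.

```python
import statistics

def sum_squared_error(points, centers):
    """Sum over points of the squared distance to the nearest center."""
    return sum(
        min(sum((a - b) ** 2 for a, b in zip(p, c)) for c in centers)
        for p in points
    )

points = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]

# Two solutions that k-means can converge to from different starting points.
solutions = {
    "left/right": [(0.0, 0.5), (10.0, 0.5)],  # the intended clustering
    "top/bottom": [(5.0, 0.0), (5.0, 1.0)],   # a stable local minimum
}
errors = {name: sum_squared_error(points, c) for name, c in solutions.items()}
best = min(errors, key=errors.get)  # keep the run with the lowest error

# "Up to overfitting": one center per point drives the error to zero.
overfit_error = sum_squared_error(points, points)

# Noise: one outlier drags the mean but barely moves the median.
noisy = [1.0, 2.0, 3.0, 100.0]
mean_center = statistics.mean(noisy)
median_center = statistics.median(noisy)
```

Because the error keeps falling as k grows, comparing errors across different k needs some penalty or held-out check; comparing restarts at a fixed k does not.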

3.13 k-Means Neural Network
- Neuron activation measures distance between input and neuron position in weight space

3.14 Weight Space
- Imagine we plot neuronal positions according to their weights
[Figure: neurons plotted in weight space with axes w1, w2, w3.]

3.15 k-Means Neural Network
- Use winner-take-all neurons
  - Winning neuron is the one closest to input
  - Best-matching cluster
- How do we do training?
  - Update weights - move neuron positions
  - Move winning neuron towards current input
  - Ignore the rest

3.16 Normalisation
- Suppose the weights are:
  - (0.2, 0.2, -0.1)
  - (0.15, -0.15, 0.1)
  - (10, 10, 10)
- The input is (0.2, 0.2, -0.1)
[Figure: weight space with axes w1, w2, w3.]

3.17 Normalisation
- The input is a perfect match with the first neuron, yet the third neuron has the largest activation:
  - 0.2*0.2 + 0.2*0.2 + -0.1*-0.1 = 0.09
  - 0.15*0.2 + -0.15*0.2 + 0.1*-0.1 = -0.01
  - 10*0.2 + 10*0.2 + 10*-0.1 = 3
- Can only compare activations if the weights are about the same size
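The arithmetic above can be checked directly, and the same snippet shows what changes once every vector is scaled to unit length. The helper names `dot` and `normalise` are illustrative only.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def normalise(v):
    """Scale v so that its distance from the origin is 1."""
    length = math.sqrt(dot(v, v))
    return tuple(a / length for a in v)

weights = [(0.2, 0.2, -0.1), (0.15, -0.15, 0.1), (10, 10, 10)]
x = (0.2, 0.2, -0.1)

# Raw activations: the large third weight vector wins despite the perfect match.
raw = [dot(w, x) for w in weights]
raw_winner = max(range(len(weights)), key=lambda i: raw[i])

# Activations after normalising weights and input: the first neuron wins.
unit = [dot(normalise(w), normalise(x)) for w in weights]
unit_winner = max(range(len(weights)), key=lambda i: unit[i])
```

With unit-length vectors the activation is the cosine of the angle between weight and input, so the largest activation and the smallest distance pick the same neuron.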

3.18 Normalisation
- Make the distance between each neuron and the origin be 1
  - All neurons lie on the unit hypersphere
- Need to stop the weights growing unboundedly

3.19 k-Means Neural Network
- Normalise inputs too
- Then use: [update rule shown as an image on the original slide]
- That’s it
- Simple and easy
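The update rule on this slide did not survive in the transcript. As a hedged sketch, the step below assumes the common competitive-learning rule w <- w + eta * (x - w) applied to the winner only, followed by renormalisation; the function name `competitive_step` and the learning rate `eta` are assumptions, not taken from the slides.

```python
import math

def normalise(v):
    length = math.sqrt(sum(a * a for a in v))
    return [a / length for a in v]

def competitive_step(weights, x, eta=0.1):
    """Update the winning neuron only; returns the winner's index."""
    x = normalise(x)
    # With unit-length weights and inputs, the largest dot product is the closest neuron.
    activations = [sum(w_j * x_j for w_j, x_j in zip(w, x)) for w in weights]
    winner = max(range(len(weights)), key=lambda i: activations[i])
    # Assumed rule: w <- w + eta * (x - w), then back onto the unit hypersphere.
    moved = [w_j + eta * (x_j - w_j) for w_j, x_j in zip(weights[winner], x)]
    weights[winner] = normalise(moved)
    return winner

weights = [normalise(w) for w in ([1.0, 0.0], [0.0, 1.0])]
winner = competitive_step(weights, [0.9, 0.1])
```

Renormalising after every step is one way to keep the weights from growing; the losing neuron is left untouched.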

3.20 Vector Quantisation (VQ)
- Think about the problem of data compression
  - Want to store a set of data (say, sensor readings) in as small an amount of memory as possible
  - We don’t mind some loss of accuracy
- Could make a codebook of typical data and index each data point by reference to a codebook entry
- Thus, VQ is a coding method that maps each data point x to the closest codeword, i.e., we encode x by replacing it with the closest codeword.

3.21 Outline of Vector Quantization of Images
[Figure: outline of vector quantization of images. Slide credit: S.R. Subramanya.]

3.22 Vector Quantisation
The codebook...
  Index:    0     1     2     3     4
  Codeword: 10110 01001 11010 11100 11001
… is sent to the receiver (at least 30 bits)

3.23 Vector Quantisation
The data... 01001 11100 11101 00101 11110
… is encoded using the codebook...
…and sent: 1 (3 bits)

3.24 Vector Quantisation
The data... 01001 11100 11101 00101 11110
… is encoded using the codebook...
…and sent: 3 (3 bits)

3.25 Vector Quantisation
The data... 01001 11100 11101 00101 11110
… is encoded... ?

3.26 Vector Quantisation
The data... 01001 11100 11101 00101 11110
… is encoded... ?
- Pick the nearest according to some measure

3.27 Vector Quantisation
The data... 01001 11100 11101 00101 11110
… is encoded... ?
- Pick the nearest according to some measure
- And send … 3 bits, but information is lost

3.28 Vector Quantisation
The data... 01001 11100 11101 00101 11110
… is sent as 13313
… which takes 15 bits instead of 30
- Of course, sending the codebook is inefficient for this data, but if there was a lot more information, the cost would have been reduced
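The encoding walk-through can be reproduced with a nearest-codeword lookup. The slides say only "some measure", so Hamming distance is an assumption here, as are the function names. Some data items are equally close to several codewords (11110 is one bit away from three of them), and this sketch breaks ties toward the lowest index, so a tied item may receive a different index than the one shown on the slide.

```python
def hamming(a, b):
    """Number of bit positions in which two equal-length bit strings differ."""
    return sum(x != y for x, y in zip(a, b))

def encode(data, codebook):
    # Replace each item by the index of its nearest codeword.
    # min() keeps the first minimum, so ties go to the lowest index.
    return [
        min(range(len(codebook)), key=lambda i: hamming(item, codebook[i]))
        for item in data
    ]

def decode(indices, codebook):
    """What the receiver reconstructs: codewords, not the original data."""
    return [codebook[i] for i in indices]

codebook = ["10110", "01001", "11010", "11100", "11001"]
data = ["01001", "11100", "11101", "00101", "11110"]

indices = encode(data, codebook)
received = decode(indices, codebook)
bits_per_index = (len(codebook) - 1).bit_length()  # bits needed to address 5 codewords
```

Comparing `received` with `data` shows where information is lost: every item not in the codebook comes back as its nearest codeword.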

3.29 Learning Vector Quantisation
- The problem is that the receiver only ever reconstructs 2 different pieces of data, the codewords 01001 and 11100, instead of the 5 we had.
- If the codebook had been picked more carefully, this would have been a lot better
- How can you pick the codebook?
  - Usually k-means is used for Vector Quantisation

3.30 Voronoi Tessellation
- Join neighbouring points
- Draw lines equidistant to each pair of points
- These are perpendicular to the lines joining the points

3.31 Two Dimensional Voronoi Diagram
[Figure: codewords in 2-dimensional space. Input vectors are marked with an x, codewords are marked with red circles, and the Voronoi regions are separated with boundary lines.]

3.32 Self Organizing Maps
- Self-organizing maps (SOMs) are a data visualization technique invented by Professor Teuvo Kohonen
  - Also called Kohonen Networks, Competitive Learning, Winner-Take-All Learning
  - Generally reduces the dimensions of data through the use of self-organizing neural networks
- Useful for data visualization; humans cannot visualize high dimensional data so this is often a useful technique to make sense of large data sets

3.33 Neurons in the Brain
- Although heterogeneous, at a low level the brain is composed of neurons
- A neuron receives input from other neurons (generally thousands) through its synapses
- Inputs are approximately summed
- When the input exceeds a threshold the neuron sends an electrical spike that travels from the body, down the axon, to the next neuron(s)

3.34 Feature Maps
[Figure: feature map with regions labelled Low pitch, Higher pitch, High pitch.]

3.35 Feature Maps
- Sounds that are similar (‘close together’) excite neurons that are near to each other
- Sounds that are very different excite neurons that are a long way off
- This is known as topology preservation
  - The ordering of the inputs is preserved
  - If possible (perfectly topology-preserving)

3.36 Topology Preservation
[Figure: mapping from Inputs to Outputs.]

3.37 Topology Preservation
[Figure.]

3.38 Self-Organizing Maps (Kohonen Maps)
(Slide from Introduction to Cognitive Science, Lecture 21: Self-Organizing Maps, November 24, 2009)
Common output-layer structures:
- One-dimensional (completely interconnected for determining “winner” unit)
- Two-dimensional (connections omitted, only neighborhood relations shown)
[Figure: neighborhood of neuron i in each structure.]

3.39 The Self-Organising Map
[Figure: inputs connected to the map.]

3.40 Neuron Connections?
- We don’t actually need the inhibitory connections
  - Just use a neighbourhood of positive connections
- How large should this neighbourhood be?
  - Early in learning, network is unordered: big neighbourhood
  - Later on, just fine-tuning network: small neighbourhood

3.41 The Self-Organising Map
- The weight vectors are randomly initialised
- Input vectors are presented to the network
- Each neuron's activation depends on the Euclidean distance between the input and its weight vector; the closest neuron wins
- The winning node has its weight vector moved closer to the input
  - So do the neighbours of the winning node
- Over time, the network self-organises so that the input topology is preserved

3.42 Self-Organisation
- Global ordering from local interactions
  - Each neuron sees its neighbours
  - The whole network becomes ordered
- Understanding self-organisation is part of complexity science
  - Appears all over the place

3.43 Basic “Winner Take All” Network
- Two layer network
- Input units, output units; each input unit is connected to each output unit
[Figure: input layer I1, I2, I3 fully connected to output layer O1, O2 through weights W i,j.]

3.44 Basic Algorithm (the same as k-Means Neural Network)
- Initialize Map (randomly assign weights)
- Loop over training examples
  - Assign input unit values according to the values in the current example
  - Find the “winner”, i.e. the output unit that most closely matches the input units, using some distance metric, e.g. Euclidean distance: over all output units j = 1 to m, with input units i = 1 to n, find the j that minimizes sqrt(sum_i (I_i - W_i,j)^2)
  - Modify weights on the winner to more closely match the input: W_i,j := W_i,j + c·(I_i - W_i,j), where c is a small positive learning constant that usually decreases as the learning proceeds

3.45 Result of Algorithm
- Initially, some output nodes will randomly be a little closer to some particular type of input
- These nodes become “winners” and the weights move them even closer to the inputs
- Over time nodes in the output become representative prototypes for examples in the input
- Note there is no supervised training here
- Classification: Given new input, the class is the output node that is the winner
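A minimal sketch of the winner-take-all loop and of classification by winner. Several details are assumptions rather than something the slides specify: Euclidean distance as the metric, a learning constant that decays as c0 / (1 + epoch), and initial weights copied from randomly chosen training examples. The names `winner_of` and `train_wta` are illustrative.

```python
import math
import random

def winner_of(weights, x):
    """Index of the output unit whose weight vector is closest to x."""
    return min(range(len(weights)), key=lambda j: math.dist(weights[j], x))

def train_wta(data, m, epochs=20, c0=0.5, seed=0):
    rng = random.Random(seed)
    # Assumption: start each output unit on a randomly chosen training example.
    weights = [list(x) for x in rng.sample(data, m)]
    for epoch in range(epochs):
        c = c0 / (1 + epoch)  # learning constant decreases as learning proceeds
        for x in data:
            j = winner_of(weights, x)
            # Move only the winner towards the current input.
            weights[j] = [w + c * (xi - w) for w, xi in zip(weights[j], x)]
    return weights

data = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (9.0, 9.0), (9.0, 10.0), (10.0, 9.0)]
weights = train_wta(data, m=2)

# Classification: the class of a new input is the index of its winning unit.
new_class = winner_of(weights, (0.5, 0.5))
```

No targets are used anywhere: the "class" is just the index of a prototype, so any meaningful class names have to be attached afterwards.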

3.46 Typical Usage: 2D Feature Map
- In typical usage the output nodes form a 2D “map” organized in a grid-like fashion and we update weights in a neighborhood around the winner
Input layer: I1 I2 I3 …
Output layer:
O11 O12 O13 O14 O15
O21 O22 O23 O24 O25
O31 O32 O33 O34 O35
O41 O42 O43 O44 O45
O51 O52 O53 O54 O55

3.47 Modified Algorithm
- Initialize Map (randomly assign weights)
- Loop over training examples
  - Assign input unit values according to the values in the current example
  - Find the “winner”, i.e. the output unit that most closely matches the input units, using some distance metric, e.g. Euclidean distance
  - Modify weights on the winner to more closely match the input
  - Modify weights in a neighborhood around the winner so the neighbors on the 2D map also become closer to the input
- Over time this will tend to cluster similar items closer on the map

3.48 Unsupervised Learning in SOMs
For n-dimensional input space and m output neurons:
(1) Choose random weight vector w_i for neuron i, i = 1, ..., m
(2) Choose random input x
(3) Determine winner neuron k: ||w_k - x|| = min_i ||w_i - x|| (Euclidean distance)
(4) Update all weight vectors of all neurons i in the neighborhood of neuron k: w_i := w_i + η·h(i, k)·(x - w_i) (w_i is shifted towards x)
(5) If convergence criterion met, STOP. Otherwise, narrow neighborhood function h and learning parameter η and go to (2).
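Steps (1) to (5) can be sketched as follows for a 2D map. This is an illustration under stated assumptions: h is a Gaussian of the grid distance between neurons, η and the neighbourhood width decay linearly, and a fixed number of steps stands in for the convergence criterion. The function name `train_som` and all parameter defaults are made up for the example.

```python
import math
import random

def train_som(data, rows, cols, steps=2000, eta0=0.5, sigma0=None, seed=0):
    rng = random.Random(seed)
    dim = len(data[0])
    sigma0 = sigma0 or max(rows, cols) / 2
    # (1) random weight vector for every neuron on the rows x cols grid
    weights = {(r, c): [rng.random() for _ in range(dim)]
               for r in range(rows) for c in range(cols)}
    for t in range(steps):  # (5) assumption: stop after a fixed number of steps
        frac = t / steps
        eta = eta0 * (1 - frac)                # learning parameter shrinks
        sigma = max(sigma0 * (1 - frac), 0.5)  # neighbourhood narrows
        x = rng.choice(data)                   # (2) random input
        # (3) winner k minimises ||w_k - x||
        k = min(weights, key=lambda i: math.dist(weights[i], x))
        # (4) shift each neuron towards x, scaled by h(i, k)
        for i in list(weights):
            grid_d2 = (i[0] - k[0]) ** 2 + (i[1] - k[1]) ** 2
            h = math.exp(-grid_d2 / (2 * sigma ** 2))
            weights[i] = [wj + eta * h * (xj - wj) for wj, xj in zip(weights[i], x)]
    return weights

# Example: organise four RGB colours on a 4 x 4 map.
colours = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0), (1.0, 1.0, 0.0)]
som = train_som(colours, rows=4, cols=4, steps=500)
```

The neighbourhood is measured on the grid, not in input space; that is what pulls neighbouring neurons towards similar inputs and gives the map its ordering.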

3.49 The Self-Organising Map
[Figure: the map before training (large neighbourhood).]

3.50 The Self-Organising Map
[Figure: the map after training (small neighbourhood).]

3.51 Updating the Neighborhood
- Node O44 is the winner
- Color indicates scaling to update neighbors (c=1, c=0.75, c=0.5)
Output layer:
O11 O12 O13 O14 O15
O21 O22 O23 O24 O25
O31 O32 O33 O34 O35
O41 O42 O43 O44 O45
O51 O52 O53 O54 O55

3.52 Selecting the Neighborhood
- Typically, a “Sombrero Function” or Gaussian function is used
- Neighborhood size usually decreases over time to allow initial “jockeying for position” and then “fine-tuning” as algorithm proceeds
[Figure: neighborhood function, Strength plotted against Distance.]
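For reference, the two neighbourhood shapes can be written as functions of grid distance d. The sombrero is taken here to be the Mexican-hat (Ricker) profile, and the exponential shrinking schedule is only one possible choice; both are assumptions, as are the parameter names.

```python
import math

def gaussian(d, sigma):
    """Strength 1 at the winner, falling smoothly towards 0 with distance."""
    return math.exp(-d * d / (2 * sigma * sigma))

def sombrero(d, sigma):
    """Mexican-hat profile: positive near the winner, negative in a ring beyond sigma."""
    r2 = d * d / (sigma * sigma)
    return (1 - r2) * math.exp(-r2 / 2)

def sigma_at(t, sigma0=3.0, tau=1000.0):
    """Neighbourhood width after t updates (assumed exponential decay)."""
    return sigma0 * math.exp(-t / tau)

# Strength at grid distances 0..4 early in training and later on.
early = [gaussian(d, sigma_at(0)) for d in range(5)]
late = [gaussian(d, sigma_at(3000)) for d in range(5)]
```

Early in training the wide neighbourhood moves large parts of the map together; later only the winner and its closest neighbours change noticeably.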

3.53 Color Example http://davis.wpi.edu/~matt/courses/soms/applet.html

3.54 Kohonen Network Examples
Document Map: http://websom.hut.fi/websom/milliondemo/html/root.html

3.55 Poverty Map
http://www.cis.hut.fi/research/som-research/worldmap.html

3.56 SOM for Classification
- A generated map can also be used for classification
- A human can assign a class to a data point, or use the strongest weight as the prototype for the data point
- For a new test case, calculate the winning node and classify it as the class it is closest to
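One way to turn a trained map into a classifier is sketched below. It assumes each node takes the majority class of the labelled examples it wins, which is one reading of the slide rather than a method it spells out; the prototypes, examples and labels are invented for the illustration.

```python
import math
from collections import Counter

def winner_of(prototypes, x):
    """Index of the node whose weight vector is closest to x."""
    return min(range(len(prototypes)), key=lambda j: math.dist(prototypes[j], x))

def label_nodes(prototypes, examples, labels):
    """Give each node the majority class of the labelled examples it wins."""
    votes = [Counter() for _ in prototypes]
    for x, y in zip(examples, labels):
        votes[winner_of(prototypes, x)][y] += 1
    # A node that wins no labelled example stays unlabelled (None).
    return [v.most_common(1)[0][0] if v else None for v in votes]

def classify(prototypes, node_labels, x):
    """Class of a new test case: the label of its winning node."""
    return node_labels[winner_of(prototypes, x)]

prototypes = [(0.0, 0.0), (1.0, 1.0), (5.0, 5.0)]  # weights of an already trained map
examples = [(0.1, 0.2), (0.9, 1.2), (1.1, 0.8), (5.2, 4.9)]
labels = ["a", "a", "a", "b"]

node_labels = label_nodes(prototypes, examples, labels)
prediction = classify(prototypes, node_labels, (4.0, 4.5))
```

The map itself is still trained without targets; the labels are only attached to the finished nodes.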

3.57 Network Size
- We have to predetermine the network size
- Big network
  - Each neuron represents exact feature
  - Not much generalisation
- Small network
  - Too much generalisation
  - No differentiation
- Try different sizes and pick the best
