Self-organizing map
Clustering Methods: Part 9
Pasi Fränti
Speech and Image Processing Unit, Department of Computer Science, University of Joensuu, FINLAND
SOM main principles
The self-organizing map (SOM) is a clustering method suitable especially for visualization. The clustering is represented by centroids organized in a 1-d or 2-d network. Dimensionality reduction and a visualization possibility are obtained as a side product. Clustering is performed by the competitive learning principle.
Self-organizing map: initial configuration
M nodes, one for each cluster. Initial locations are not important. Nodes are connected by the network topology (1-d or 2-d).
Self-organizing map: final configuration
Node locations adapt during the learning stage. The network keeps neighboring vectors close to each other, and it limits the movement of the vectors during learning.
SOM pseudo code (1/2): Learning stage
SOM pseudo code (2/2): Update centroids
(pseudo code shown as figures in the original slides)
Competitive learning
Each data vector x is processed once. Find the nearest centroid:
  j = arg min_i ||x − c_i||
The centroid is updated by moving it towards the data vector:
  c_j ← c_j + α(t)·(x − c_j)
The learning stage is similar to k-means, but the centroid update follows a different principle.
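The competitive update above can be sketched as follows; a minimal example assuming Euclidean distance, with illustrative names not taken from the slides:

```python
import numpy as np

def competitive_update(x, centroids, alpha):
    """One competitive-learning step: move the nearest centroid towards x.

    x: data vector, centroids: (M, dim) array, alpha: learning rate in (0, 1].
    """
    j = int(np.argmin(np.linalg.norm(centroids - x, axis=1)))  # nearest centroid
    centroids[j] += alpha * (x - centroids[j])                 # move it towards x
    return j
```

Unlike k-means, which recomputes each centroid from its whole partition, this moves only one centroid a fraction of the way towards a single data vector.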
Learning rate α(t)
The learning rate decreases with time: movement is large in the beginning but eventually stabilizes.
Linear decrease of weighting: α(t) = A·(1 − t/T)
Exponential decrease of weighting: e.g. α(t) = A·e^(−t/T)
Neighborhood (d)
Neighboring centroids are also updated, where d is the distance between nodes in the network topology. The effect is stronger for nearby centroids: the weighting decreases with d.
Weighting of the neighborhood
The weighting decreases exponentially with the network distance d, e.g. w(d) = 2^(−d).
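Combining the winner search, learning rate, and neighborhood weighting gives one full SOM update step. A sketch for a 1-d chain topology, assuming the w(d) = 2^(−d) decay (the slides only state that the weighting decreases exponentially):

```python
import numpy as np

def som_step(x, centroids, alpha, D):
    """Update the winning centroid and its neighbors in a 1-d chain network.

    alpha: current learning rate, D: current neighborhood size.
    Neighbor weight halves per step of network distance (assumed 2**-d decay).
    """
    j = int(np.argmin(np.linalg.norm(centroids - x, axis=1)))  # winner
    for i in range(len(centroids)):
        d = abs(i - j)                 # distance in the network topology
        if d <= D:
            w = 2.0 ** (-d)            # exponential neighborhood weighting
            centroids[i] += alpha * w * (x - centroids[i])
    return j
```

Note that the neighborhood distance d is measured along the network, not in the data space; this is what keeps topologically adjacent centroids close to each other.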
Parameter setup
Number of iterations T:
–Convergence of SOM is rather slow, so T should be set as high as possible.
–Roughly … iterations at minimum.
Size of the initial neighborhood D max:
–Small enough to allow local adaptation.
–The value D=0 indicates no neighbor structure.
Maximum learning rate A:
–Higher values have a mostly random effect.
–Most critical are the final stages (D ≤ 2).
–Optimal choices of A and D max are highly correlated.
Difficulty of parameter setup
Fixing the total number of iterations (T·D max) to 20, 40 and 80: the optimal parameter combination is non-trivial.
Adaptive number of iterations
To reduce the effect of parameter setup, T should be as high as possible: enough time to adapt, at the cost of high time complexity. Instead, use an adaptive number of iterations per neighborhood size:
  T i = max(1, round(T max / 2^(D max − i)))
For D max = 10 and T max = 100: T i = {1, 1, 1, 1, 2, 3, 6, 13, 25, 50, 100}
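The adaptive schedule halves the iteration count for each step away from the final neighborhood size. A sketch that reproduces the sequence on the slide (using half-up rounding, which matches the listed values):

```python
def adaptive_iterations(d_max, t_max):
    """Iterations T_i for neighborhood sizes i = 0 .. d_max.

    T_i = max(1, round_half_up(t_max / 2**(d_max - i))): few iterations
    while the neighborhood is large, full t_max at the final stage.
    """
    return [max(1, int(t_max / 2 ** (d_max - i) + 0.5))  # +0.5 rounds half up
            for i in range(d_max + 1)]
```

Most of the total work is thus spent on the final stages (small D), which the parameter-setup slide identifies as the most critical ones.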
Example of SOM (1-d)
(figure in the original slides: one cluster too many; one cluster missing)
Example of SOM (2-d) (to appear sometime in future)
Literature
1. T. Kohonen, Self-Organization and Associative Memory. Springer-Verlag, New York.
2. N.M. Nasrabadi and Y. Feng, "Vector quantization of images based upon the Kohonen self-organization feature maps", Neural Networks, 1 (1), 518.
3. P. Fränti, "On the usefulness of self-organizing maps for the clustering problem in vector quantization", 11th Scandinavian Conf. on Image Analysis (SCIA'99), Kangerlussuaq, Greenland, vol. 1.