Feature mapping: Self-Organizing Maps. Unsupervised learning designed to achieve dimensionality reduction by topographical ordering of input patterns. The basic aim of a SOM is similar to data compression: given a large set of input vectors, find a smaller set of prototypes that provide a good approximation to the whole input data space.
Structure of SOM neural network: input nodes connected directly to a 2D lattice of output nodes. wk = weight vector connecting the input nodes to output node k.
Creating prototypes: if the input space contains many instances like x, i(x) will be the topographical location of this feature in the output space, and wi will become a prototype of instances of this feature.
SOM algorithm: initialize weights on all connections to small random numbers. 3 aspects of training: output-node competition based on a discriminant function; the winning node becomes the center of a cooperative neighborhood; neighborhoods adapt to input patterns because cooperation increases the susceptibility of neighboring nodes to large values of the discriminant function.
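A minimal initialization sketch in numpy (the 10 x 10 lattice size matches the animal example later; the names W, coords, and rng are illustrative, not from the original):

```python
import numpy as np

rng = np.random.default_rng(0)

d = 13                    # input dimension (e.g., 13 animal attributes)
rows, cols = 10, 10       # 2D output lattice
L = rows * cols           # number of output nodes

# Weight vector wk for each output node k: small random numbers.
W = rng.uniform(-0.1, 0.1, size=(L, d))

# Fixed lattice coordinates of each output node, used later to
# measure the lattice separation di,k during cooperation.
coords = np.array([(r, c) for r in range(rows) for c in range(cols)], dtype=float)
```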
Competitive learning: given a randomly selected input vector x, the best-matching (winning) output node is i(x) = arg mink ||x – wk||, k = 1, 2, …, L, where L = # of output nodes in the lattice, all assumed to have the same bias, and wk = weight vector connecting the input nodes to output node k. Equivalent to saying wi has the largest dot product with x (for weight vectors of equal norm). The output node whose current weight vector is most like the randomly selected input vector is the winner.
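A sketch of the competition step under the definitions above (the helper name winner is assumed):

```python
def winner(x, W):
    """Competition: return i(x), the index of the output node whose
    weight vector wk is closest to x in Euclidean distance."""
    return int(np.argmin(np.linalg.norm(W - x, axis=1)))
```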
Gaussian cooperative neighborhood: the probability that output node k belongs to the neighborhood of winning output node i(x) is exp(-di,k² / 2s(n)²), where di,k is the lattice separation between k and i(x) and s(n) = s0 exp(-n/n0). Initially the neighborhood is large; it decreases on successive iterations.
Adaptation: applied to all output nodes in the neighborhood of winning node i(x): wk(n+1) = wk(n) + h(n) exp(-di,k² / 2s(n)²) (x – wk(n)). Makes the winning node and its neighbors more like x. Learning rate h(n) = h0 exp(-n/n0).
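One full training iteration (competition, cooperation, adaptation) might look like this sketch; the decay constant n0 and the initial values s0 and h0 are illustrative hyperparameters, not values from the original:

```python
def som_step(x, W, coords, n, s0=5.0, h0=0.1, n0=1000.0):
    """One SOM iteration for input vector x at iteration count n."""
    i = winner(x, W)                                # competition: i(x)
    s = s0 * np.exp(-n / n0)                        # neighborhood width s(n)
    h = h0 * np.exp(-n / n0)                        # learning rate h(n)
    d2 = np.sum((coords - coords[i]) ** 2, axis=1)  # squared lattice separations di,k^2
    nbhd = np.exp(-d2 / (2.0 * s ** 2))             # Gaussian cooperative neighborhood
    W += h * nbhd[:, None] * (x - W)                # pull winner and neighbors toward x
    return W
```

Training then repeats this step, each time drawing a random input vector x from the training set, e.g. for n in range(2000).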
Colorful illustration of SOM performance: randomly colored pixels become bands of color as prototypes develop in neighborhoods of output nodes. The neighborhood width shrinks from s ~ size of the lattice to s ~ nearest neighbors.
SOM as an elastic net covering input space: each wi points to a prototypical instance in input space. Connect these points to reflect the horizontal and vertical lines in the lattice and you get a net in input space whose nodes are a compressed representation of the input space.
Twist-free topographical ordering: all rectangles, no butterflies. A low twist index facilitates interpretation.
Refinement distorts the net to reflect input statistics (figure panels: input distribution, initial weights, ordering phase, refined map).
Contextual map: classification by SOM. SOM training should produce coherent regions in the output lattice whose weight vectors are prototypes of the attributes of distinct classes in the input. A contextual map labels these regions based on the responses of the lattice to “test patterns”: labeled instances, not used in training, that characterize a class.
Example: SOM classification of animals. 13 attributes of 16 animals; 10 x 10 lattice; 2000 iterations of the SOM algorithm.
Input vectors for training are the concatenation of a label (16-component Boolean vector identifying the animal) and attributes (13-component Boolean vector). Test patterns are the concatenation of the label with a 13-component null vector.
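A sketch of this encoding, continuing the setup above; the attribute matrix here is a random stand-in, not Kohonen's actual animal data:

```python
labels = np.eye(16)                                      # one-hot animal labels
attrs = rng.integers(0, 2, size=(16, 13)).astype(float)  # stand-in Boolean attributes

train_vecs = np.hstack([labels, attrs])                  # 29-component training vectors
test_vecs = np.hstack([labels, np.zeros((16, 13))])      # label + 13-component null vector
# (a SOM trained on these vectors needs weight vectors of dimension 29)
```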
Lattice sites with the strongest response for each animal type (figure regions: prey, predators, birds).
Semantic map: all lattice sites labeled by the animal type inducing the strongest response.
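Building the semantic map could be sketched as below, approximating “strongest response” by the closeness of each site's prototype to each test pattern (assumes W of shape (100, 29) trained on the vectors above):

```python
# Distance from every prototype to every test pattern; each lattice
# site gets the label of the nearest (strongest-responding) animal.
dists = np.linalg.norm(W[:, None, :] - test_vecs[None, :, :], axis=2)
site_labels = np.argmin(dists, axis=1).reshape(rows, cols)
```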
Semantic maps resemble maps of the brain that pinpoint area of strong response to specific inputs
Unified distance matrix (U-matrix or UMAT): the standard approach for clustering applications of SOM. If the SOM is twist-free, the Euclidean distance between the prototype vectors of neighboring output nodes approximates the distance between the corresponding parts of the underlying input data space. The UMAT is usually displayed as a heat map with darker colors for larger distances. Clusters appear as groups of light colors; darker colors show the boundaries between the clusters.
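A minimal U-matrix sketch (mean distance to 4-connected lattice neighbors; the function name u_matrix is assumed):

```python
def u_matrix(W, rows, cols):
    """Mean Euclidean distance from each node's prototype to the
    prototypes of its 4-connected lattice neighbors."""
    Wg = W.reshape(rows, cols, -1)
    U = np.zeros((rows, cols))
    for r in range(rows):
        for c in range(cols):
            nbrs = [Wg[rr, cc]
                    for rr, cc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1))
                    if 0 <= rr < rows and 0 <= cc < cols]
            U[r, c] = np.mean([np.linalg.norm(Wg[r, c] - w) for w in nbrs])
    return U
```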
UMAT of the famous animal-clustering SOM by Kohonen (light cluster regions: birds, mammals).
Fine structure revealed by an enhanced heat map: stars show local minima of the distance matrix and the connectedness of clusters. Each output node is connected to one and only one local minimum.