The Dynamics of Learning Vector Quantization, RUG

Michael Biehl, Anarta Ghosh
Rijksuniversiteit Groningen, Mathematics and Computing Science

Barbara Hammer
TU Clausthal-Zellerfeld, Institute of Computing Science
- Introduction: prototype-based learning from example data (representation, classification); Vector Quantization (VQ); Learning Vector Quantization (LVQ)
- The dynamics of learning: a model situation with randomized data; learning algorithms for VQ and LVQ; analysis and comparison of the dynamics and the success of learning
- Summary
- Outlook
Vector Quantization (VQ). Aim: representation of large amounts of data by (few) prototype vectors. Example: identification and grouping of similar data in clusters. Each feature vector $\xi$ is assigned to the closest prototype $w$ according to a similarity or distance measure, e.g. the (squared) Euclidean distance $d(w, \xi) = (\xi - w)^2$.
Unsupervised competitive learning:
- initialize K prototype vectors
- present a single example
- identify the closest prototype, i.e. the so-called winner
- move the winner even closer towards the example
This intuitively clear, plausible procedure
- places prototypes in areas with a high density of data,
- identifies the most relevant combinations of features,
- is a (stochastic) on-line gradient descent with respect to a cost function, the quantization error.
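The steps above can be sketched as follows. The initialization on randomly chosen examples, the constant learning rate `eta` and the function name `wta_vq` are our assumptions for illustration, not part of the slide.

```python
import numpy as np

def wta_vq(data, k, eta=0.05, epochs=10, seed=0):
    """On-line 'winner takes all' VQ: only the closest prototype moves towards each example."""
    rng = np.random.default_rng(seed)
    # Assumed initialization: K distinct examples picked at random.
    protos = data[rng.choice(len(data), size=k, replace=False)].astype(float)
    for _ in range(epochs):
        for i in rng.permutation(len(data)):
            xi = data[i]
            # Winner: prototype with the smallest squared Euclidean distance to xi.
            j = np.argmin(((protos - xi) ** 2).sum(axis=1))
            # Move the winner a fraction eta of the way towards the example.
            protos[j] += eta * (xi - protos[j])
    return protos
```

For two well-separated clusters, the two prototypes end up close to the two cluster centers.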
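Quantization error: $H_{VQ} = \sum_{\mu=1}^{P} \sum_{j=1}^{K} d(w_j, \xi^\mu) \prod_{k \neq j} \Theta\left(d(w_k, \xi^\mu) - d(w_j, \xi^\mu)\right)$, where the product of Heaviside step functions $\Theta$ selects, for each example $\xi^\mu$, the term in which $w_j$ is the winner. Here $d$ is the Euclidean distance.
Aim: a faithful representation of the data (in general, this is not the same as clustering). The result depends on
- the number of prototype vectors,
- the distance measure / metric used.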
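For a finite data set, the quantization error is the sum over all examples of the squared Euclidean distance to the respective winner. A minimal sketch (the function name is hypothetical):

```python
import numpy as np

def quantization_error(data, protos):
    """Sum over all examples of the squared Euclidean distance to the closest prototype."""
    # d[mu, j]: squared distance between example mu and prototype j, shape (P, K)
    d = ((data[:, None, :] - protos[None, :, :]) ** 2).sum(axis=2)
    # Only the winner's term survives for each example, hence the row-wise minimum.
    return float(d.min(axis=1).sum())
```

Adding prototypes can only lower this value, which is one reason why the result depends on the number of prototypes.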
Learning Vector Quantization (LVQ). Aim: classification of data by learning from examples. Learning: choice of prototypes according to the example data. Classification: a vector is assigned to the class of the closest prototype $w$. Example situation: 3 classes, 3 prototypes. Aim: generalization ability, i.e. correct classification of novel data after training.
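The nearest-prototype classification rule can be sketched in a few lines, assuming the squared Euclidean distance and one label per prototype (names are hypothetical):

```python
import numpy as np

def nearest_prototype_classify(x, protos, labels):
    """Assign each row of x the label of its closest prototype (squared Euclidean distance)."""
    # d[i, j]: squared distance between input i and prototype j
    d = ((x[:, None, :] - protos[None, :, :]) ** 2).sum(axis=2)
    return labels[np.argmin(d, axis=1)]
```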
A prominent example [Kohonen]: "LVQ 2.1"
- initialize prototype vectors (for different classes)
- present a single example
- identify the closest correct and the closest wrong prototype
- move the corresponding winners towards / away from the example
Known convergence / stability problems, e.g. for infrequent classes.
Mostly: heuristically motivated variations of competitive learning.
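A single LVQ 2.1 step, sketched under simplifying assumptions: the window rule of Kohonen's original prescription is omitted, the distance is squared Euclidean, and the function name is hypothetical.

```python
import numpy as np

def lvq21_step(protos, proto_labels, xi, label, eta=0.05):
    """Closest correct prototype moves towards xi, closest wrong one moves away (in place).

    Assumes float prototypes and at least one prototype of another class; no window rule.
    """
    d = ((protos - xi) ** 2).sum(axis=1)
    correct = proto_labels == label
    # Index of the closest prototype with the correct label, and of the closest wrong one.
    j = np.flatnonzero(correct)[np.argmin(d[correct])]
    k = np.flatnonzero(~correct)[np.argmin(d[~correct])]
    protos[j] += eta * (xi - protos[j])  # attraction
    protos[k] -= eta * (xi - protos[k])  # repulsion
    return protos
```

The repulsion term has no counterpart that limits how far a wrong prototype can be pushed, which hints at the stability problems mentioned above.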
LVQ algorithms
- appear plausible, intuitive, flexible
- are fast and easy to implement
- are frequently applied to a variety of problems involving the classification of structured data; a few examples:
  - real-time speech recognition
  - medical diagnosis, e.g. from histological data
  - texture recognition and classification
  - gene expression data analysis
  - ...
Illustration: microscopic images of (pig) semen cells after freezing and storage, c/o Lidia Sanchez-Gonzalez, Leon, Spain.
Healthy cells and damaged cells, with the prototypes obtained by LVQ (1).
LVQ algorithms
- are often based on purely heuristic arguments, or derived from a cost function with an unclear relation to the generalization ability,
- almost exclusively use the Euclidean distance measure, which is inappropriate for heterogeneous data,
- in general lack a thorough theoretical understanding of their dynamics, convergence properties, performance w.r.t. generalization, etc.
In the following: analysis of LVQ algorithms w.r.t.
- the dynamics of the learning process
- performance, i.e. generalization ability
- asymptotic behavior in the limit of many examples
Typical behavior in a model situation:
- randomized, high-dimensional data
- essential features of LVQ learning
Aim:
- contribute to the theoretical understanding
- develop efficient LVQ schemes
- test them in applications
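Model situation: two clusters of N-dimensional data. Random vectors $\xi \in \mathbb{R}^N$ are drawn according to a mixture of two Gaussians,
$P(\xi) = \sum_{\sigma=\pm 1} p_\sigma\, P(\xi \mid \sigma)$, $P(\xi \mid \sigma) = (2\pi)^{-N/2} \exp\left[-\frac{1}{2}\left(\xi - \ell B_\sigma\right)^2\right]$,
- orthonormal center vectors $B_+, B_- \in \mathbb{R}^N$: $(B_\pm)^2 = 1$, $B_+ \cdot B_- = 0$
- prior weights of the classes $p_+, p_-$ with $p_+ + p_- = 1$
- separation $\ell$: the cluster centers are $\ell B_+$ and $\ell B_-$
- independent components: $\langle \xi_j \rangle_\sigma = \ell\,(B_\sigma)_j$, $\langle \xi_j^2 \rangle_\sigma - \langle \xi_j \rangle_\sigma^2 = 1$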
High-dimensional data (formally: N → ∞). Example: 400 examples $\xi^\mu \in \mathbb{R}^N$ with N = 200, ℓ = 1, p_+ = 0.6 (240 examples of class +, 160 of class -).
[Figure: projections into the plane of the center vectors, $(B_+ \cdot \xi^\mu,\, B_- \cdot \xi^\mu)$, and projections onto two independent random directions, $(w_1 \cdot \xi^\mu,\, w_2 \cdot \xi^\mu)$.]
Note: this is a model for studying the typical behavior of LVQ algorithms, not for density-estimation based classification.
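A sketch of how such a sample and its two kinds of projections can be generated, assuming unit-variance clusters and taking $B_+, B_-$ as the first two Cartesian unit vectors (an arbitrary choice, since the noise is isotropic). Function and variable names are hypothetical.

```python
import numpy as np

def sample_model(n_examples, n_dim, ell, p_plus, rng):
    """Draw xi = ell * B_sigma + noise with i.i.d. unit-variance Gaussian components."""
    # Orthonormal center vectors B_+ and B_-: the first two unit vectors (arbitrary choice).
    b = np.zeros((2, n_dim))
    b[0, 0] = 1.0
    b[1, 1] = 1.0
    # Fixed class sizes: round(p_plus * n_examples) examples of class +1, the rest of class -1.
    n_plus = int(round(p_plus * n_examples))
    sigma = np.concatenate([np.ones(n_plus, dtype=int), -np.ones(n_examples - n_plus, dtype=int)])
    centers = np.where(sigma[:, None] == 1, b[0], b[1])
    xi = ell * centers + rng.standard_normal((n_examples, n_dim))
    return xi, sigma, b

rng = np.random.default_rng(0)
xi, sigma, b = sample_model(400, 200, ell=1.0, p_plus=0.6, rng=rng)
proj_b = xi @ b.T                      # columns: B_+ . xi and B_- . xi
w = rng.standard_normal((2, 200))      # two independent random directions w_1, w_2
w /= np.linalg.norm(w, axis=1, keepdims=True)
proj_w = xi @ w.T                      # columns: w_1 . xi and w_2 . xi
```

Scatter plots of `proj_b` and `proj_w`, colored by `sigma`, correspond to the two panels. Along random directions the class means nearly coincide, because a random unit vector has an overlap of order $1/\sqrt{N}$ with $B_\pm$.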
Dynamics of on-line training: a sequence of independent random data $\xi^\mu$, $\mu = 1, 2, \ldots$, drawn according to $P(\xi)$. Update of the prototype vectors:
$w_s^\mu = w_s^{\mu-1} + \frac{\eta}{N}\, f_s[\ldots]\, \left(\xi^\mu - w_s^{\mu-1}\right)$
- $\eta$: learning rate, step size
- $f_s[\ldots]$: competition, direction of the update, etc.
- $(\xi^\mu - w_s^{\mu-1})$: change of the prototype towards or away from the current data
Examples: unsupervised Vector Quantization, "The Winner Takes It All" (classes irrelevant/unknown); Learning Vector Quantization "2.1" (here: two prototypes, no explicit competition).
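The generic step with a pluggable modulation function, for two prototypes labeled $s = \pm 1$ and an example of class $\sigma$. The $f_s$ below are our reading of the algorithms discussed in this talk in the two-prototype model: $\Theta(d_{-s} - d_s)$ for VQ, $s\sigma$ for LVQ 2.1 without window, $\Theta(d_{-s} - d_s)\, s\sigma$ for LVQ 1, and $(1 + s\sigma)/2$ for LVQ+ (with one prototype per class, the closest correct prototype is $w_\sigma$). All names are hypothetical.

```python
import numpy as np

def modulation(alg, s, sigma, d):
    """f_s for prototype label s in {+1, -1}; d maps label -> squared distance to the example."""
    win = 1.0 if d[s] < d[-s] else 0.0  # Theta(d_{-s} - d_s); ties count as 'not winner'
    if alg == "VQ":       # winner takes all, class labels ignored
        return win
    if alg == "LVQ2.1":   # correct prototype attracted, wrong one repelled (no window)
        return float(s * sigma)
    if alg == "LVQ1":     # only the winner moves, towards or away according to its label
        return win * s * sigma
    if alg == "LVQ+":     # only the correct prototype moves, always towards the example
        return (1 + s * sigma) / 2
    raise ValueError(f"unknown algorithm: {alg}")

def online_step(w, xi, sigma, eta, alg):
    """w_s <- w_s + (eta / N) f_s (xi - w_s) for both prototypes w[+1], w[-1]."""
    n = xi.shape[0]
    d = {s: float(((xi - w[s]) ** 2).sum()) for s in (1, -1)}
    # All f_s are evaluated with the old prototypes before any of them is moved.
    f = {s: modulation(alg, s, sigma, d) for s in (1, -1)}
    for s in (1, -1):
        w[s] = w[s] + (eta / n) * f[s] * (xi - w[s])
    return w
```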
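Mathematical analysis of the learning dynamics
1. Description in terms of a few characteristic quantities (here: $\mathbb{R}^{2N} \to \mathbb{R}^7$), the projections
$R_{s\sigma} = w_s \cdot B_\sigma$, $Q_{st} = w_s \cdot w_t$ ($s, t, \sigma = \pm 1$),
which describe the length and relative position of the prototypes. They obey the recursions
$R_{s\sigma}^\mu = R_{s\sigma}^{\mu-1} + \frac{\eta}{N} f_s \left(b_\sigma^\mu - R_{s\sigma}^{\mu-1}\right)$,
$Q_{st}^\mu = Q_{st}^{\mu-1} + \frac{\eta}{N}\left[f_s \left(h_t^\mu - Q_{st}^{\mu-1}\right) + f_t \left(h_s^\mu - Q_{st}^{\mu-1}\right)\right] + \frac{\eta^2}{N^2} f_s f_t \left(\xi^\mu\right)^2$,
where terms of order $1/N^2$ are neglected. The random vector $\xi^\mu$ enters only in the form of the projections $h_s^\mu = w_s^{\mu-1} \cdot \xi^\mu$ and $b_\sigma^\mu = B_\sigma \cdot \xi^\mu$ (and via $(\xi^\mu)^2$), since the distances are $d_s = (\xi^\mu - w_s^{\mu-1})^2 = (\xi^\mu)^2 - 2 h_s^\mu + Q_{ss}^{\mu-1}$.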
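Computing the seven characteristic quantities from the $2N$ prototype components is a pair of matrix products. A sketch with hypothetical names, including the identity that expresses a distance through projections only:

```python
import numpy as np

def order_parameters(w_plus, w_minus, b_plus, b_minus):
    """R[s, sigma] = w_s . B_sigma and Q[s, t] = w_s . w_t (index 0 <-> +, 1 <-> -)."""
    w = np.stack([w_plus, w_minus])
    b = np.stack([b_plus, b_minus])
    r = w @ b.T  # R_{++}, R_{+-} / R_{-+}, R_{--}
    q = w @ w.T  # symmetric: Q_{++}, Q_{+-}, Q_{--}, i.e. 4 + 3 = 7 independent numbers
    return r, q

def distance_from_projections(xi_sq, h_s, q_ss):
    """d_s = xi^2 - 2 h_s + Q_ss, with h_s = w_s . xi."""
    return xi_sq - 2.0 * h_s + q_ss
```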
2. Average over the current example: in the thermodynamic limit N → ∞, the projections $h_s$, $b_\sigma$ (indices μ omitted) become correlated Gaussian random quantities, completely specified by their first and second moments. The averaged recursions are then closed in $\{R_{s\sigma}, Q_{st}\}$.
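For unit-variance clusters the moments follow directly from $\xi = \ell B_\sigma + $ noise with independent unit-variance components, together with the fact that $w_s^{\mu-1}$ does not depend on the current example. A sketch of the resulting conditional averages $\langle \cdot \rangle_\sigma$ over examples of class $\sigma$ (our derivation under that assumption):

```latex
\begin{aligned}
&\langle b_\tau \rangle_\sigma = \ell\,\delta_{\tau\sigma}, \qquad
 \langle h_s \rangle_\sigma = \ell\,R_{s\sigma}, \\
&\langle h_s h_t \rangle_\sigma - \langle h_s \rangle_\sigma \langle h_t \rangle_\sigma = Q_{st}, \\
&\langle h_s b_\tau \rangle_\sigma - \langle h_s \rangle_\sigma \langle b_\tau \rangle_\sigma = R_{s\tau}, \\
&\langle b_\rho b_\tau \rangle_\sigma - \langle b_\rho \rangle_\sigma \langle b_\tau \rangle_\sigma = \delta_{\rho\tau}.
\end{aligned}
```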
3. Self-averaging properties: the characteristic quantities depend on the random sequence of example data, but their variance vanishes with N (here: ∝ $N^{-1}$), so the learning dynamics is completely described in terms of averages.
4. Continuous learning time $\alpha = \mu / N$, the number of examples (= number of learning steps) per degree of freedom: the recursions become coupled ordinary differential equations for the evolution of the projections.
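Averaging the recursions for $R_{s\sigma}$ and $Q_{st}$ over the current example, with $\Delta\alpha = 1/N$ per step and $(\xi^\mu)^2 / N \to 1$ for unit-variance clusters, gives ODEs of the following form (a sketch of the derivation, not copied from the slides):

```latex
\begin{aligned}
\frac{dR_{s\sigma}}{d\alpha} &= \eta\left(\langle b_\sigma f_s \rangle - \langle f_s \rangle R_{s\sigma}\right), \\
\frac{dQ_{st}}{d\alpha} &= \eta\left(\langle h_s f_t + h_t f_s \rangle - \langle f_s + f_t \rangle Q_{st}\right)
 + \eta^2 \sum_{\sigma=\pm 1} p_\sigma \langle f_s f_t \rangle_\sigma ,
\end{aligned}
\qquad \langle \cdot \rangle = \sum_{\sigma=\pm 1} p_\sigma \langle \cdot \rangle_\sigma .
```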
5. Learning curve: the generalization error $\varepsilon_g(\alpha)$, i.e. the probability for misclassification of a novel example after training with $\alpha N$ examples.
Investigation and comparison of given algorithms:
- repulsive/attractive fixed points of the dynamics
- asymptotic behavior for α → ∞
- dependence on learning rate, separation, initialization
- ...
Optimization and development of new prescriptions:
- time-dependent learning rate $\eta(\alpha)$
- variational optimization w.r.t. $f_s[\ldots]$ (maximize ...)
- ...
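In the two-prototype model, $\varepsilon_g$ can be written in closed form. An example of class $\sigma$ is misclassified if it is closer to $w_{-\sigma}$, i.e. if $2(h_\sigma - h_{-\sigma}) < Q_{\sigma\sigma} - Q_{-\sigma-\sigma}$, and $h_+ - h_-$ is Gaussian with mean $\ell(R_{+\sigma} - R_{-\sigma})$ and variance $Q_{++} - 2Q_{+-} + Q_{--}$. A sketch under the unit-variance assumption (our derivation; names are hypothetical):

```python
import math

def phi(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * math.erfc(-x / math.sqrt(2.0))

def generalization_error(r, q, ell, p_plus):
    """eps_g of the nearest-prototype classifier with w_+, w_- in the unit-variance model.

    r[(s, sigma)] = R_{s sigma}, q[(s, t)] = Q_{st} with keys (1, 1), (1, -1), (-1, -1).
    """
    # Variance of h_+ - h_- equals (w_+ - w_-)^2.
    spread = q[(1, 1)] - 2.0 * q[(1, -1)] + q[(-1, -1)]
    eps = 0.0
    for sigma, p in ((1, p_plus), (-1, 1.0 - p_plus)):
        arg = (q[(sigma, sigma)] - q[(-sigma, -sigma)]
               - 2.0 * ell * (r[(sigma, sigma)] - r[(-sigma, sigma)])) / (2.0 * math.sqrt(spread))
        eps += p * phi(arg)  # probability that a class-sigma example lands closer to w_{-sigma}
    return eps
```

For prototypes placed exactly at the cluster centers, $w_\sigma = \ell B_\sigma$, the result is $\Phi(-\ell/\sqrt{2})$ for any prior weights, about 0.24 for ℓ = 1.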
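Optimal classification with minimal generalization error. In the model situation (equal variances of the clusters), the classes are optimally separated by the plane
$\ell\,(B_+ - B_-) \cdot \xi + \ln(p_+ / p_-) = 0$,
which is orthogonal to $B_+ - B_-$ and, for $p_- > p_+$, shifted towards the center of the less frequent class +. Excess error: the difference between the $\varepsilon_g$ achieved by an algorithm and this minimum.
[Figure: the optimal decision plane in the $(B_+, B_-)$-plane for $p_- > p_+$; minimal $\varepsilon_g$ as a function of the prior weight $p_+$ for ℓ = 0, 1, 2.]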
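The minimal error follows from the projection $b_+ - b_-$, which for examples of class $\sigma$ is Gaussian with mean $\sigma\ell$ and variance 2 in the unit-variance model. Inserting the optimal plane gives, as a sketch (our derivation):

```latex
\varepsilon_g^{\rm opt}
 = p_+\,\Phi\!\left(-\frac{\ell^2 + \ln(p_+/p_-)}{\sqrt{2}\,\ell}\right)
 + p_-\,\Phi\!\left(-\frac{\ell^2 - \ln(p_+/p_-)}{\sqrt{2}\,\ell}\right),
\qquad \Phi(x) = \int_{-\infty}^{x} \frac{e^{-z^2/2}}{\sqrt{2\pi}}\,dz .
```

For $p_+ = p_- = 1/2$ this reduces to $\Phi(-\ell/\sqrt{2})$, and for ℓ → 0 it approaches $\min\{p_+, p_-\}$, in line with the ℓ = 0 curve.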
"LVQ 2.1": update the correct and the wrong winner. (Analytical) integration of the ODEs for $w_s(0) = 0$, with $p_+ = (1 + m)/2$ ($m > 0$).
[Seo & Obermayer]: LVQ 2.1 ↔ cost function (likelihood ratios).
[Figure: theory and simulation (N = 100), p_+ = 0.8, ℓ = 1, η = 0.5, averages over 100 independent runs.]
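In the two-prototype model, LVQ 2.1 without a window corresponds to $f_s = s\sigma$. Inserting this into the averaged ODE for $R_{s\sigma}$ (unit-variance model) gives the linear equation $dR_{s\sigma}/d\alpha = \eta s(\sigma\ell p_\sigma - m R_{s\sigma})$ with $m = p_+ - p_-$, which for $w_s(0) = 0$ integrates to $R_{s\sigma}(\alpha) = (\sigma\ell p_\sigma / m)(1 - e^{-\eta s m \alpha})$. For $s = -1$ and $m > 0$ the exponent is positive, so the prototype of the less frequent class runs away. The sketch below (our derivation; function names are hypothetical) evaluates this closed form and checks it against a plain Euler integration.

```python
import math

def r_closed_form(alpha, s, sigma, ell, p_plus, eta):
    """R_{s sigma}(alpha) for LVQ 2.1 (f_s = s * sigma) from w_s(0) = 0; needs p_plus != 1/2."""
    m = 2.0 * p_plus - 1.0                      # p_+ = (1 + m) / 2
    p_sigma = p_plus if sigma == 1 else 1.0 - p_plus
    return (sigma * ell * p_sigma / m) * (1.0 - math.exp(-eta * s * m * alpha))

def r_euler(alpha, s, sigma, ell, p_plus, eta, steps=100_000):
    """Euler integration of dR/dalpha = eta * s * (sigma * ell * p_sigma - m * R), R(0) = 0."""
    m = 2.0 * p_plus - 1.0
    p_sigma = p_plus if sigma == 1 else 1.0 - p_plus
    h = alpha / steps
    r = 0.0
    for _ in range(steps):
        r += h * eta * s * (sigma * ell * p_sigma - m * r)
    return r

# Example parameters p_+ = 0.8, ell = 1, eta = 0.5: R_{+,+} saturates, R_{-,+} grows without bound.
for s in (1, -1):
    print(s, [round(r_closed_form(a, s, 1, 1.0, 0.8, 0.5), 3) for a in (0.0, 5.0, 10.0)])
```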
Problem: instability of the algorithm due to the repulsion of the wrong prototypes. For $p_+ > p_-$ this leads to a trivial classification for α → ∞: $\varepsilon_g = \min\{p_+, p_-\}$.
Strategies:
- selection of data in a window close to the current decision boundary: slows down the repulsion, but the system remains unstable
- Robust Soft Learning Vector Quantization [Seo & Obermayer]: density-estimation based cost function; limiting case "learning from mistakes": an LVQ 2.1 step is performed only if the example is currently misclassified, resulting in slow learning and poor generalization
I) LVQ 1 [Kohonen], "The winner takes it all": only the winner $w_S$ is updated, towards or away from the example according to its class membership. Numerical integration of the ODEs for $w_s(0) = 0$.
[Figure: theory and simulation (N = 200), p_+ = 0.2, ℓ = 1.2, η = 1.2, averaged over 100 independent runs. Panels: order parameters $Q_{++}, Q_{--}, Q_{+-}$ and $R_{S\pm}$ as functions of α; trajectories of $w_+, w_-$ in the $(B_+, B_-)$-plane (markers at α = 20, 40, ...) with the cluster centers $\ell B_+, \ell B_-$, the asymptotic positions, and the optimal decision boundary.]
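Learning curve $\varepsilon_g(\alpha)$ for η = 1.2 (p_+ = 0.2, ℓ = 1.2).
Role of the learning rate:
- stationary state: $\varepsilon_g(\alpha \to \infty)$ grows linearly with η
- well-defined asymptotics for η → 0, α → ∞ with (ηα) → ∞: to leading order the ODEs are linear in η, so the dynamics depends on α only through ηα; the resulting minimal $\varepsilon_g$ is still suboptimal
- variable rate η(α)!?
[Figure: $\varepsilon_g$ vs. α; $\varepsilon_g(\alpha \to \infty)$ vs. η; $\varepsilon_g$ vs. ηα in the limit η → 0.]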
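The slide leaves the choice of η(α) open. One common, here purely hypothetical, choice is a rate decaying like 1/α, $\eta(\alpha) = \eta_0 \tau / (\tau + \alpha)$. It tends to zero while its integral $\int_0^\alpha \eta\, d\alpha'$ grows without bound; to leading order in η this integral takes over the role of the rescaled time ηα of the constant-rate case. A minimal sketch (names and the parameters $\eta_0$, $\tau$ are assumptions):

```python
import math

def eta_schedule(alpha, eta0, tau):
    """Hypothetical decreasing learning rate eta(alpha) = eta0 * tau / (tau + alpha)."""
    return eta0 * tau / (tau + alpha)

def effective_time(alpha, eta0, tau):
    """Integral of eta(alpha') from 0 to alpha: eta0 * tau * ln(1 + alpha / tau), unbounded in alpha."""
    return eta0 * tau * math.log1p(alpha / tau)
```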
II) LVQ+, "The winner takes it all" with only positive steps (no repulsion): only the closest correct prototype ("winner_correct") is updated, so $w_s$ is updated only from examples of class $s$. LVQ+ ≈ VQ within the classes.
For α → ∞ the asymptotic configuration is symmetric about $\ell(B_+ + B_-)/2$.
[Figure: $w_+, w_-$ and $\ell B_+, \ell B_-$ for p_+ = 0.2, ℓ = 1.2, η = 1.2.]
Hence the classification scheme and the achieved generalization error are independent of the prior weights $p_\pm$ (and optimal for $p_\pm = 1/2$).
Asymptotic generalization error (η → 0, (ηα) → ∞) as a function of $p_+$:
- LVQ 2.1: trivial assignment to the more frequent class, $\varepsilon_g = \min\{p_+, p_-\}$
- LVQ 1: here close to the optimal classification
- LVQ+: min-max solution, $p_\pm$-independent classification
[Figure: these asymptotic $\varepsilon_g(p_+)$ together with the optimal classification; learning curves $\varepsilon_g(\alpha)$ of LVQ+ and LVQ1 for p_+ = 0.2, ℓ = 1.0, η = 1.0.]
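Two of these asymptotic curves have simple closed forms in the unit-variance model: LVQ 2.1 yields $\min\{p_+, p_-\}$, and a prior-independent classifier that is optimal for $p_\pm = 1/2$ (as stated for LVQ+) must use the plane $b_+ = b_-$, giving $\Phi(-\ell/\sqrt{2})$ for every $p_+$. The sketch below compares both with the Bayes-optimal error; the LVQ 1 curve requires the full ODE solution and is not reproduced. Function names are hypothetical.

```python
import math

def phi(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * math.erfc(-x / math.sqrt(2.0))

def eps_trivial(p_plus):
    """LVQ 2.1 asymptotics: everything assigned to the more frequent class."""
    return min(p_plus, 1.0 - p_plus)

def eps_lvq_plus(ell):
    """Prior-independent boundary b_+ = b_- (LVQ+ asymptotics), optimal only for p_+ = 1/2."""
    return phi(-ell / math.sqrt(2.0))

def eps_optimal(p_plus, ell):
    """Bayes-optimal error of the unit-variance two-cluster model (0 < p_plus < 1, ell > 0)."""
    lr = math.log(p_plus / (1.0 - p_plus))
    return (p_plus * phi(-(ell**2 + lr) / (math.sqrt(2.0) * ell))
            + (1.0 - p_plus) * phi(-(ell**2 - lr) / (math.sqrt(2.0) * ell)))

for p in (0.1, 0.2, 0.5, 0.8):
    print(f"p+={p:.1f}  LVQ2.1: {eps_trivial(p):.3f}  LVQ+: {eps_lvq_plus(1.0):.3f}  "
          f"optimal: {eps_optimal(p, 1.0):.3f}")
```

The LVQ+ value is the worst case of the optimal error over $p_+$ (reached at $p_+ = 1/2$), which is what "min-max solution" refers to.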
Vector Quantization: competitive learning in which only the winner $w_S$ is updated; the class membership is unknown or identical for all data. Numerical integration for $w_s(0) \approx 0$ (p_+ = 0.2, ℓ = 1.0, η = 1.2).
[Figure: learning curves $\varepsilon_g(\alpha)$ of VQ, LVQ+ and LVQ1; order parameters $R_{++}, R_{+-}, R_{-+}, R_{--}$ as functions of α.]
The system is invariant under exchange of the prototypes, and the dynamics exhibits weakly repulsive fixed points.
Interpretations:
- VQ: unsupervised learning from unlabelled data
- LVQ with two prototypes of the same class (identical labels)
- LVQ with different classes, but the labels are not used in training
[Figure: asymptotic $\varepsilon_g$ (α → ∞, η → 0, ηα → ∞) as a function of $p_+$.]
For $p_+ \approx 0$, $p_- \approx 1$: low quantization error, but high generalization error $\varepsilon_g$.
Summary
- prototype-based learning: Vector Quantization and Learning Vector Quantization
- a model scenario: two clusters, two prototypes
- dynamics of on-line training
- comparison of algorithms:
  - LVQ 2.1: instability, trivial (stationary) classification
  - LVQ 1: close to optimal asymptotic generalization
  - LVQ+: min-max solution w.r.t. asymptotic generalization
  - VQ: symmetry breaking, representation
Work in progress, outlook
- regularization of LVQ 2.1, Robust Soft LVQ [Seo, Obermayer]
- model: different cluster variances, more clusters/prototypes
- optimized procedures: learning rate schedules, variational approach / density estimation / Bayes-optimal on-line learning
- several classes and prototypes
Perspectives
- Self-Organizing Maps (SOM): (many) N-dimensional prototypes form a (low) d-dimensional grid; representation of data in a topology-preserving map
- neighborhood-preserving learning: SOM (grid-based), Neural Gas (distance-based)
- Generalized Relevance LVQ [Hammer & Villmann]: adaptive metrics, e.g. a distance measure with adaptive relevance factors; training; applications
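Relevance-based schemes of the GRLVQ type replace the Euclidean distance by a weighted version, $d_\lambda(w, \xi) = \sum_j \lambda_j (\xi_j - w_j)^2$ with $\lambda_j \ge 0$ and $\sum_j \lambda_j = 1$, where the relevance factors are adapted during training. A minimal sketch of the distance only (the update of λ is not shown; the function name is hypothetical):

```python
import numpy as np

def relevance_distance(x, w, lam):
    """Weighted squared distance sum_j lam_j (x_j - w_j)^2 with non-negative, normalized lam."""
    lam = np.asarray(lam, dtype=float)
    if np.any(lam < 0) or not np.isclose(lam.sum(), 1.0):
        raise ValueError("relevance factors must be non-negative and sum to 1")
    return float(np.sum(lam * (np.asarray(x, dtype=float) - np.asarray(w, dtype=float)) ** 2))
```

Setting a relevance factor to zero removes the corresponding feature from the comparison, which is how such metrics can handle heterogeneous data.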