
1 The Dynamics of Learning Vector Quantization (RUG, 10.01.2005). Michael Biehl, Anarta Ghosh (Rijksuniversiteit Groningen, Mathematics and Computing Science); Barbara Hammer (TU Clausthal-Zellerfeld, Institute of Computing Science).

2 Outline. Introduction: prototype-based learning from example data (representation, classification); Vector Quantization (VQ); Learning Vector Quantization (LVQ). The dynamics of learning: a model situation with randomized data; learning algorithms for VQ and LVQ; analysis and comparison of their dynamics and the success of learning. Summary and outlook.

3 Vector Quantization (VQ). Aim: representation of large amounts of data by (few) prototype vectors. Example: identification and grouping of similar data in clusters. Each feature vector ξ is assigned to the closest prototype w, according to a similarity or distance measure, e.g. the Euclidean distance.
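As an illustration added here (not part of the slides), a minimal nearest-prototype assignment with squared Euclidean distance; the function and variable names are hypothetical:

```python
import numpy as np

def winner_index(xi, prototypes):
    """Index of the prototype closest to the feature vector xi (squared Euclidean distance)."""
    return int(np.argmin(np.sum((prototypes - xi) ** 2, axis=1)))

# toy usage: three prototypes in two dimensions
prototypes = np.array([[0.0, 0.0], [1.0, 1.0], [2.0, 0.0]])
print(winner_index(np.array([0.9, 1.2]), prototypes))  # prints 1
```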

4 Unsupervised competitive learning: initialize K prototype vectors; present a single example; identify the closest prototype, i.e. the so-called winner; move the winner even closer towards the example. An intuitively clear, plausible procedure: it places prototypes in areas with a high density of data, identifies the most relevant combinations of features, and corresponds to (stochastic) on-line gradient descent with respect to the quantization error defined on the next slide.
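A minimal sketch of one such competitive learning step, assuming the 1/N scaling of the step size used in the analysis below (names are hypothetical):

```python
import numpy as np

def vq_online_step(prototypes, xi, eta):
    """Unsupervised winner-takes-all step: move the closest prototype towards the example xi."""
    N = xi.shape[0]
    j = np.argmin(np.sum((prototypes - xi) ** 2, axis=1))  # identify the winner
    prototypes[j] += (eta / N) * (xi - prototypes[j])       # move it closer to the example
    return prototypes
```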

5 Quantization error. The cost function sums, over all data points ξ^μ, the (here: Euclidean) distance to the respective winner w_j, i.e. the closest prototype (see the formula below). Aim: faithful representation of the data (in general this is not the same as clustering). The result depends on the number of prototype vectors and on the distance measure / metric used.
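The formula itself is not legible in the transcript; a standard way of writing the quantization error that matches the description above (the 1/2 prefactor is an assumption) is:

```latex
E\big(\{\mathbf{w}_j\}\big) \;=\; \sum_{\mu=1}^{P}\;\min_{j=1,\dots,K}\,\tfrac{1}{2}\big(\boldsymbol{\xi}^{\mu}-\mathbf{w}_j\big)^{2},
```

i.e. every example ξ^μ contributes its squared Euclidean distance to the respective winner w_j.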

6 Learning Vector Quantization (LVQ). Aim: classification of data; learning from examples. Learning: choice of the prototypes according to the example data. Example situation: 3 classes, 3 prototypes. Classification: assignment of a vector ξ to the class of the closest prototype w. Aim: generalization ability, i.e. correct classification of novel data after training.

7 Mostly, LVQ schemes are heuristically motivated variations of competitive learning. A prominent example [Kohonen]: "LVQ 2.1". Initialize the prototype vectors (for the different classes); present a single example; identify the closest correct and the closest wrong prototype; move these two winners towards and away from the example, respectively. Known convergence / stability problems, e.g. for infrequent classes.
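A minimal sketch of the LVQ 2.1 update described above, omitting Kohonen's window rule; the names and the 1/N scaling are assumptions:

```python
import numpy as np

def lvq21_step(prototypes, proto_labels, xi, label, eta):
    """Move the closest correct prototype towards xi, the closest wrong prototype away from it."""
    N = xi.shape[0]
    d = np.sum((prototypes - xi) ** 2, axis=1)
    correct = np.flatnonzero(proto_labels == label)
    wrong = np.flatnonzero(proto_labels != label)
    j_c = correct[np.argmin(d[correct])]                   # closest correct prototype
    j_w = wrong[np.argmin(d[wrong])]                       # closest wrong prototype
    prototypes[j_c] += (eta / N) * (xi - prototypes[j_c])
    prototypes[j_w] -= (eta / N) * (xi - prototypes[j_w])
    return prototypes
```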

8 LVQ algorithms appear plausible, intuitive and flexible; they are fast and easy to implement; and they are frequently applied in a variety of problems involving the classification of structured data. A few examples: real-time speech recognition; medical diagnosis, e.g. from histological data; texture recognition and classification; gene expression data analysis; ...

9 Illustration: microscopic images of (pig) semen cells after freezing and storage, courtesy of Lidia Sanchez-Gonzalez, León, Spain.

10 Same illustration: healthy cells, damaged cells, and the prototypes obtained by LVQ1 from the microscopic images of (pig) semen cells after freezing and storage, courtesy of Lidia Sanchez-Gonzalez, León, Spain.

11 However, LVQ algorithms are often based on purely heuristic arguments, or derived from a cost function whose relation to the generalization ability is unclear; they almost exclusively use the Euclidean distance measure, which is inappropriate for heterogeneous data; and they lack, in general, a thorough theoretical understanding of their dynamics, convergence properties, performance w.r.t. generalization, etc.

12 In the following: analysis of LVQ algorithms w.r.t. the dynamics of the learning process, the performance (i.e. generalization ability), and the asymptotic behavior in the limit of many examples. We study the typical behavior in a model situation with randomized, high-dimensional data that retains the essential features of LVQ learning. Aims: contribute to the theoretical understanding, develop efficient LVQ schemes, and test them in applications.

13 Model situation: two clusters of N-dimensional data. Random vectors ξ ∈ ℝ^N are generated according to a mixture of two Gaussians (see below) with orthonormal center vectors B+, B- ∈ ℝ^N, (B±)² = 1, B+ · B- = 0, separation ℓ between the cluster centers ℓB+ and ℓB-, prior weights of the classes p+, p- with p+ + p- = 1, and independent components around the centers.
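The density formula on the slide is not preserved in the transcript; written out, the two-Gaussian mixture described here (with cluster variances v_±, taken equal in the equal-variance case discussed later) reads:

```latex
P(\boldsymbol{\xi}) \;=\; \sum_{\sigma=\pm 1} p_{\sigma}\, P(\boldsymbol{\xi}\mid\sigma),
\qquad
P(\boldsymbol{\xi}\mid\sigma) \;=\; \frac{1}{(2\pi v_{\sigma})^{N/2}}\,
\exp\!\left[-\,\frac{\big(\boldsymbol{\xi}-\ell\,\mathbf{B}_{\sigma}\big)^{2}}{2\,v_{\sigma}}\right],
```

with orthonormal centers B_± and prior weights p_+ + p_- = 1.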

14 High-dimensional data (formally: N → ∞). Example data set: 400 examples ξ^μ ∈ ℝ^N with N = 200, ℓ = 1, p+ = 0.6 (240 and 160 examples from the two clusters). [Figure: projections of the data into the plane of the center vectors B+, B-, where the cluster structure is apparent, and projections onto two independent random directions w_1, w_2, where it is not.] Note: this is a model for studying the typical behavior of LVQ algorithms, not a model for density-estimation based classification.
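A minimal sketch (with unit cluster variances assumed, names hypothetical) of how such data can be generated and projected onto the (B+, B-) plane, reproducing the setting N = 200, ℓ = 1, p+ = 0.6:

```python
import numpy as np

def generate_data(P, N, ell, p_plus, rng=None):
    """Draw P examples from the two-cluster model with centers ell*B_plus and ell*B_minus."""
    rng = np.random.default_rng(rng)
    B_plus, B_minus = np.eye(N)[0], np.eye(N)[1]                # orthonormal center vectors
    labels = np.where(rng.random(P) < p_plus, 1, -1)            # class sigma = +/-1 with priors p_plus, p_minus
    centers = np.where(labels[:, None] == 1, B_plus, B_minus)
    xi = ell * centers + rng.standard_normal((P, N))            # independent, unit-variance components
    return xi, labels, B_plus, B_minus

xi, labels, B_plus, B_minus = generate_data(P=400, N=200, ell=1.0, p_plus=0.6, rng=0)
proj = xi @ np.stack([B_plus, B_minus], axis=1)                 # projections into the (B_plus, B_minus) plane
print(proj.shape, (labels == 1).sum(), (labels == -1).sum())    # (400, 2) and roughly 240 / 160 examples
```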

15 Dynamics of on-line training: a sequence of independent random data, generated according to the model density, is presented one example at a time. Update of the prototype vectors (see the generic form below): the learning rate sets the step size, while a modulation factor accounts for the competition, the direction of the update, etc., i.e. for the change of a prototype towards or away from the current data point. The above examples: unsupervised Vector Quantization ("the winner takes it all", classes irrelevant/unknown) and Learning Vector Quantization "2.1" (here: two prototypes, no explicit competition).
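The update equation itself did not survive the transcript; in the literature on the on-line dynamics of (L)VQ it is commonly written in the following generic form, which the quantities on the next slides refer to (a reconstruction, not necessarily the slide's exact notation):

```latex
\mathbf{w}_s^{\mu} \;=\; \mathbf{w}_s^{\mu-1} \;+\; \frac{\eta}{N}\,
f_s\!\big[d_{+}^{\mu}, d_{-}^{\mu}, \sigma^{\mu}, \dots\big]\,
\big(\boldsymbol{\xi}^{\mu}-\mathbf{w}_s^{\mu-1}\big),
\qquad
d_{s}^{\mu} \;=\; \big(\boldsymbol{\xi}^{\mu}-\mathbf{w}_s^{\mu-1}\big)^{2},
```

where η is the learning rate and the modulation function f_s encodes the competition and the direction (towards/away) of the update; for LVQ 2.1 with one prototype per class, for instance, f_s = σ^μ s.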

16 Mathematical analysis of the learning dynamics. Step 1: description in terms of a few characteristic quantities (here: ℝ^{2N} → ℝ^7), namely the projections R_sσ of the prototypes onto the center vectors and their mutual overlaps Q_st, which encode the lengths and relative positions of the prototypes (definitions below). The update rule yields recursions for these quantities, and the random vector ξ^μ enters only in the form of its projections onto the prototypes and onto the (B+, B-)-plane, and of the distances to the prototypes.
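Spelled out (the {R_sσ, Q_st} notation is taken from the next slide; the symbols h and b for the example's projections are my own shorthand):

```latex
R_{s\sigma} \;=\; \mathbf{w}_s\cdot\mathbf{B}_{\sigma},
\qquad
Q_{st} \;=\; \mathbf{w}_s\cdot\mathbf{w}_t,
\qquad
h_{s}^{\mu} \;=\; \mathbf{w}_s^{\mu-1}\cdot\boldsymbol{\xi}^{\mu},
\qquad
b_{\sigma}^{\mu} \;=\; \mathbf{B}_{\sigma}\cdot\boldsymbol{\xi}^{\mu},
\qquad s,t,\sigma\in\{+,-\},
```

so that for two prototypes and two centers the 2N prototype components reduce to the seven quantities R_{++}, R_{+-}, R_{-+}, R_{--}, Q_{++}, Q_{--}, Q_{+-}.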

17 Step 2: average over the current example. In the thermodynamic limit N → ∞, the projections of a random vector drawn according to the model density are correlated Gaussian random quantities, completely specified in terms of their first and second moments (which no longer carry the index μ). The averaged recursions therefore close in {R_sσ, Q_st}.

18 Step 3: self-averaging properties. The characteristic quantities depend on the random sequence of example data, but their variance vanishes as N → ∞ (here: ∝ 1/N), so the learning dynamics is completely described in terms of averages. Step 4: continuous learning time α = μ/N, i.e. the number of examples (learning steps) per degree of freedom. The recursions become a set of coupled ordinary differential equations for the evolution of the projections.

19 Step 5: learning curve. The generalization error ε_g(α), i.e. the probability of misclassifying a novel example after training with αN examples (N → ∞), follows from the order parameters. This allows the investigation and comparison of given algorithms: repulsive/attractive fixed points of the dynamics, asymptotic behavior for α → ∞, dependence on the learning rate, separation and initialization, etc. It also allows the optimization and development of new prescriptions that maximize the generalization ability: time-dependent learning rates η(α), variational optimization w.r.t. the modulation function f_s[...], etc.
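The closed-form expression for ε_g through {R_sσ, Q_st} is not reproduced here; purely as an illustration, a Monte-Carlo estimate of the generalization error of a given nearest-prototype classifier in this model (unit cluster variances assumed, names hypothetical) could look like:

```python
import numpy as np

def generalization_error_mc(w_plus, w_minus, B_plus, B_minus, ell, p_plus, n_test=20000, rng=None):
    """Estimate eps_g: probability that a fresh example is assigned to the prototype of the wrong class."""
    rng = np.random.default_rng(rng)
    N = B_plus.shape[0]
    labels = np.where(rng.random(n_test) < p_plus, 1, -1)
    centers = np.where(labels[:, None] == 1, B_plus, B_minus)
    xi = ell * centers + rng.standard_normal((n_test, N))
    d_plus = np.sum((xi - w_plus) ** 2, axis=1)
    d_minus = np.sum((xi - w_minus) ** 2, axis=1)
    predicted = np.where(d_plus < d_minus, 1, -1)       # nearest-prototype classification
    return float(np.mean(predicted != labels))
```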

20 Optimal classification, i.e. classification with minimal generalization error. In the model situation (equal variances of the clusters), the classes are optimally separated by a plane whose orientation is given by the center vectors and whose position depends on the prior weights (see the sketch below). [Figure: left, the cluster centers B+ and B- (with p- > p+) and the optimal separating plane; right, the minimal ε_g, i.e. the excess error, as a function of the prior weight p+ for separations ℓ = 0, 1, 2, with ε_g between 0 and 0.5 and p+ between 0 and 1.]
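For reference, a short reconstruction of the plane referred to above, under the equal-variance assumption stated on the slide (unit variances): comparing p_+ P(ξ|+1) with p_- P(ξ|-1) and using (B_±)² = 1 gives the optimal rule

```latex
\text{assign } \boldsymbol{\xi} \text{ to class } +1
\quad\Longleftrightarrow\quad
\ell\,\big(\mathbf{B}_{+}-\mathbf{B}_{-}\big)\cdot\boldsymbol{\xi} \;>\; \ln\frac{p_{-}}{p_{+}},
```

i.e. a plane orthogonal to (B_+ - B_-) whose offset from the midpoint of the cluster centers is controlled by the prior weights.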

21 "LVQ 2.1": update the closest correct and the closest wrong prototype. (Analytical) integration of the ODEs for w_s(0) = 0, with prior weights parametrized as p_± = (1 ± m)/2, m > 0. [Seo, Obermayer]: LVQ 2.1 corresponds to a cost function based on likelihood ratios. [Figure: theory and simulation (N = 100) for p+ = 0.8, ℓ = 1, η = 0.5; simulations averaged over 100 independent runs.]

22 Problem: instability of the algorithm due to the repulsion of wrong prototypes; for α → ∞ the classification becomes trivial (all data assigned to the more frequent class, cf. slide 26), with ε_g = min{p+, p-}. Strategies: selection of data in a window close to the current decision boundary slows down the repulsion, but the system remains unstable; Robust Soft Learning Vector Quantization [Seo & Obermayer], based on a density-estimation cost function, whose limiting case is learning from mistakes (an LVQ2.1 step is performed only if the example is currently misclassified), which gives slow learning and poor generalization. [Figure: sketch of the two clusters with prior weights p+ > p-.]

23 "The winner takes it all", I) LVQ1 [Kohonen]: only the winner w_s is updated, towards or away from the example according to the class membership. Numerical integration of the ODEs for w_s(0) = 0; theory and simulation (N = 200) for p+ = 0.2, ℓ = 1.2, η = 1.2, averaged over 100 independent runs. [Figure: left, the order parameters R_{S+}, R_{S-} and Q_{++}, Q_{--}, Q_{+-} as functions of α; right, trajectories of w+ and w- in the (B+, B-)-plane, with markers at α = 20, 40, ..., 140, the optimal decision boundary, the asymptotic positions, and the cluster centers ℓB+, ℓB-.]
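A minimal sketch of the LVQ1 step described above (names and the 1/N scaling are assumptions):

```python
import numpy as np

def lvq1_step(prototypes, proto_labels, xi, label, eta):
    """LVQ1: only the winner is updated, towards xi if its class matches, away from xi otherwise."""
    N = xi.shape[0]
    j = np.argmin(np.sum((prototypes - xi) ** 2, axis=1))      # the winner
    sign = 1.0 if proto_labels[j] == label else -1.0
    prototypes[j] += sign * (eta / N) * (xi - prototypes[j])
    return prototypes
```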

24 Learning curve and the role of the learning rate (p+ = 0.2, ℓ = 1.2). In the stationary state, ε_g(α → ∞) grows linearly with η, which suggests a variable rate η(α). Well-defined asymptotics are obtained in the simultaneous limit η → 0, α → ∞ with (ηα) → ∞, since the ODEs are linear in η; the resulting asymptotic ε_g is, however, still suboptimal. [Figure: left, learning curves ε_g(α) for η = 0.2, 0.4 and 2.0, with ε_g between 0.14 and 0.26 and α up to 300; right, ε_g as a function of (ηα) in the limit η → 0, compared with the minimal ε_g.]

25 "The winner takes it all", II) LVQ+ (only positive steps, without repulsion): the winner is updated only if it is a correct prototype, i.e. w_s learns only from examples of class S; LVQ+ ≈ VQ within the classes. For α → ∞ the asymptotic configuration is symmetric about ℓ(B+ + B-)/2, hence the classification scheme and the achieved generalization error are independent of the prior weights p_± (and optimal for p_± = 1/2). [Figure: trajectories of w+ and w- relative to the cluster centers ℓB+, ℓB- for p+ = 0.2, ℓ = 1.2, η = 1.2.]
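For comparison, the corresponding LVQ+ step (only positive updates, each prototype learns only from its own class), again as a sketch with assumed names:

```python
import numpy as np

def lvq_plus_step(prototypes, proto_labels, xi, label, eta):
    """LVQ+: the closest prototype of the example's own class is moved towards xi; no repulsion."""
    N = xi.shape[0]
    same = np.flatnonzero(proto_labels == label)
    j = same[np.argmin(np.sum((prototypes[same] - xi) ** 2, axis=1))]   # closest correct prototype
    prototypes[j] += (eta / N) * (xi - prototypes[j])
    return prototypes
```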

26 Learning curves and asymptotics (η → 0, (ηα) → ∞), here for p+ = 0.2, ℓ = 1.0, η = 1.0. LVQ 2.1: trivial assignment to the more frequent class, ε_g = min{p+, p-}. LVQ 1: here, close to optimal classification. LVQ+: min-max solution, classification independent of p_±. [Figure: left, learning curves ε_g(α) for LVQ+ and LVQ1; right, asymptotic ε_g as a function of p+ for the three algorithms, compared with the optimal classification.]

27 Vector Quantization (competitive learning): the winner w_s is updated; the class membership is unknown, or identical for all data. Numerical integration for w_s(0) ≈ 0 (p+ = 0.2, ℓ = 1.0, η = 1.2). The system is invariant under exchange of the prototypes, which leads to weakly repulsive fixed points of the dynamics. [Figure: left, ε_g(α) for VQ compared with LVQ+ and LVQ1; right, the order parameters R_{++}, R_{+-}, R_{-+}, R_{--} as functions of α up to α = 300.]

28 Interpretations: VQ as unsupervised learning from unlabelled data; LVQ with two prototypes of the same class (identical labels); LVQ with different classes whose labels are not used in training. [Figure: asymptotic ε_g (η → 0, α → ∞, ηα → ∞) as a function of p+.] For strongly unequal priors (p+ ≈ 0, p- ≈ 1) the result combines a low quantization error with a high generalization error ε_g.

29 Summary: prototype-based learning; Vector Quantization and Learning Vector Quantization; a model scenario with two clusters and two prototypes; dynamics of on-line training; comparison of algorithms: LVQ 2.1 (instability, trivial stationary classification), LVQ 1 (close to optimal asymptotic generalization), LVQ+ (min-max solution w.r.t. asymptotic generalization), VQ (symmetry breaking, representation). Work in progress / outlook: regularization of LVQ 2.1 and Robust Soft LVQ [Seo, Obermayer]; the model with different cluster variances and more clusters/prototypes; several classes and prototypes; optimized procedures: learning rate schedules, variational approach / density estimation / Bayes-optimal on-line learning.

30 Perspectives: Self-Organizing Maps (SOM), in which (many) N-dimensional prototypes form a (low) d-dimensional grid and provide a representation of the data in a topology-preserving, neighborhood-preserving map; Neural Gas (distance-based); Generalized Relevance LVQ [Hammer & Villmann] with adaptive metrics, e.g. training of the distance measure; applications.

