3) Vector Quantization (VQ) and Learning Vector Quantization (LVQ)

1 3) Vector Quantization (VQ) and Learning Vector Quantization (LVQ)
References:
M. Biehl, A. Freking, G. Reents, Dynamics of on-line competitive learning, Europhysics Letters 38 (1997) 73-78
M. Biehl, A. Ghosh, B. Hammer, Dynamics and generalization ability of LVQ algorithms, J. Machine Learning Research 8 (2007)
and references in the latter

2 Vector Quantization (VQ)
aim: representation of large amounts of data by (few) prototype vectors
example: identification and grouping of similar data in clusters
assignment of a feature vector ξ to the closest prototype w (based on a similarity or distance measure, e.g. the Euclidean distance d(ξ, w) = (ξ − w)²)
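A minimal sketch of this assignment step (the function and array names below are illustrative, not taken from the slides):

```python
import numpy as np

def assign_to_prototype(xi, prototypes):
    """Return the index of the closest prototype (squared Euclidean distance)."""
    dists = np.sum((prototypes - xi) ** 2, axis=1)   # d(xi, w_j) for all prototypes j
    return int(np.argmin(dists))                     # index of the winner
```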

3 unsupervised competitive learning
• initialize K prototype vectors
• present a single example
• identify the closest prototype, i.e. the so-called winner
• move the winner even closer towards the example
intuitively clear, plausible procedure:
- places prototypes in areas with high density of data
- identifies the most relevant combinations of features
- (stochastic) on-line gradient descent with respect to the cost function ...
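A minimal sketch of this winner-takes-all procedure (learning rate, epoch count and initialization are assumptions for illustration):

```python
import numpy as np

def competitive_learning(data, K, eta=0.1, epochs=10, rng=None):
    """Unsupervised winner-takes-all VQ: move only the winner towards each example."""
    rng = np.random.default_rng(rng)
    # initialize K prototypes, e.g. on randomly chosen examples
    prototypes = data[rng.choice(len(data), size=K, replace=False)].astype(float)
    for _ in range(epochs):
        for xi in data[rng.permutation(len(data))]:                 # present single examples
            j = np.argmin(np.sum((prototypes - xi) ** 2, axis=1))   # identify the winner
            prototypes[j] += eta * (xi - prototypes[j])             # move the winner closer
    return prototypes
```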

4 quantization error
E = Σ_μ Σ_j d(ξ^μ, w_j) Π_{k≠j} Θ( d(ξ^μ, w_k) − d(ξ^μ, w_j) ), here with the Euclidean distance d(ξ, w) = (ξ − w)²; the product of Θ-functions singles out the closest prototype, i.e. w_j is the winner!
(figure: data points and prototype vectors)
aim: faithful representation of the data (in general: ≠ clustering)
the result depends on
- the number of prototype vectors
- the distance measure / metric used
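As a sketch, the quantization error can be estimated directly from data and prototypes (illustrative names):

```python
import numpy as np

def quantization_error(data, prototypes):
    """Mean squared Euclidean distance of each example to its winning prototype."""
    # pairwise squared distances, shape (number of examples, number of prototypes)
    d = np.sum((data[:, None, :] - prototypes[None, :, :]) ** 2, axis=2)
    return np.mean(np.min(d, axis=1))   # only the winner contributes for each example
```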

5 Learning Vector Quantization
∙ identification of prototype vectors from labelled example data
∙ distance-based classification (e.g. Euclidean, Manhattan, …) in the N-dim. feature space
classification: assignment of a vector ξ to the class of the closest prototype w → piecewise linear decision boundaries
basic, heuristic LVQ scheme: LVQ1 [Kohonen]
• initialize prototype vectors for the different classes
• present a single example
• identify the closest prototype, i.e. the so-called winner
• move the winner
  - closer towards the data (same class)
  - away from the data (different class)
aim: generalization ability, i.e. classification of novel data after learning from examples
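A minimal LVQ1 training sketch along these lines (prototype initialization, labels and parameters are assumptions for illustration):

```python
import numpy as np

def lvq1(data, labels, proto_init, proto_labels, eta=0.5, epochs=20, rng=None):
    """Basic LVQ1: move the winning prototype towards (same class) or
    away from (different class) each labelled example."""
    rng = np.random.default_rng(rng)
    prototypes = proto_init.astype(float).copy()
    for _ in range(epochs):
        for idx in rng.permutation(len(data)):
            xi, label = data[idx], labels[idx]
            j = np.argmin(np.sum((prototypes - xi) ** 2, axis=1))   # winner
            sign = 1.0 if proto_labels[j] == label else -1.0        # same / different class
            prototypes[j] += eta * sign * (xi - prototypes[j])
    return prototypes
```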

6 LVQ algorithms ... plausible, intuitive, flexible
- fast, easy to implement
- frequently applied in a variety of practical problems
- often based on heuristic arguments or on cost functions with unclear relation to generalization
limited theoretical understanding of
- dynamics and convergence properties
- achievable generalization ability
here: analysis of LVQ algorithms w.r.t.
- dynamics of the learning process
- performance, i.e. generalization ability
- typical properties in a model situation

7 Model situation: two clusters of N-dimensional data
random vectors ξ ∈ ℝN generated according to a mixture of two Gaussians: P(ξ) = Σ_{σ=±1} p_σ P(ξ | σ)
orthonormal center vectors B+, B- ∈ ℝN with (B±)² = 1, B+ · B- = 0
prior weights of the classes p+, p- with p+ + p- = 1
P(ξ | σ): independent components with mean ⟨ξ⟩_σ = ℓ B_σ (cluster distance ∝ ℓ) and variance v_σ
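A sketch of how such model data could be generated (parameter values follow the example on the next slide; all names are illustrative):

```python
import numpy as np

def sample_two_clusters(P, N, ell=1.0, p_plus=0.4, v_plus=1.44, v_minus=0.64, rng=None):
    """Draw P examples from the mixture of two Gaussians with orthonormal centers B+, B-."""
    rng = np.random.default_rng(rng)
    B_plus = np.zeros(N);  B_plus[0] = 1.0     # orthonormal center vectors
    B_minus = np.zeros(N); B_minus[1] = 1.0
    labels = rng.choice([+1, -1], size=P, p=[p_plus, 1.0 - p_plus])   # prior weights p+, p-
    data = np.empty((P, N))
    for mu, sigma in enumerate(labels):
        center = ell * (B_plus if sigma == +1 else B_minus)           # cluster distance ∝ ℓ
        var = v_plus if sigma == +1 else v_minus
        data[mu] = center + np.sqrt(var) * rng.standard_normal(N)     # indep. components
    return data, labels
```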

8 high-dimensional data (formally: N → ∞)
example data set: ξ^μ ∈ ℝN, N = 200, ℓ = 1, p+ = 0.4, v+ = 1.44, v- = 0.64 (● 240, ○ 160 examples)
figure: scatter plots of the projections
- onto two independent random directions w1,2:  x_{1,2}^μ = w_{1,2} · ξ^μ
- into the plane of the center vectors B+, B-:  y_±^μ = B_± · ξ^μ
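The two kinds of projections shown on this slide could be computed as follows (a sketch continuing the illustrative names from above):

```python
import numpy as np

def projections(data, B_plus, B_minus, rng=None):
    """Project the data onto the (B+, B-) plane and onto two random unit directions."""
    rng = np.random.default_rng(rng)
    y_plus, y_minus = data @ B_plus, data @ B_minus          # plane of the center vectors
    w1 = rng.standard_normal(data.shape[1]); w1 /= np.linalg.norm(w1)
    w2 = rng.standard_normal(data.shape[1]); w2 /= np.linalg.norm(w2)
    x1, x2 = data @ w1, data @ w2                            # independent random directions
    return (y_plus, y_minus), (x1, x2)
```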

9 Dynamics of on-line training
sequence of new, independent random examples ξ^μ drawn according to P(ξ)
update of the two prototype vectors w+, w- :
w_s^μ = w_s^{μ-1} + (η/N) f_s(d_+^μ, d_-^μ, σ^μ) (ξ^μ − w_s^{μ-1}),  s = ±1
η: learning rate, step size
f_s: modulation function; encodes competition, direction of the update etc., i.e. the change of the prototype towards or away from the current data (d_s^μ: squared Euclidean distance of ξ^μ from w_s)
example: LVQ1, original formulation [Kohonen], a Winner-Takes-All (WTA) algorithm: f_s = Θ( d_{-s}^μ − d_s^μ ) s σ^μ
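A sketch of a single on-line step in this modulation-function form, together with the LVQ1 (WTA) modulation (the 1/N scaling follows the update formula above; names are illustrative):

```python
import numpy as np

def online_step(w, proto_labels, xi, sigma, eta, f):
    """One on-line update w_s <- w_s + (eta/N) * f_s * (xi - w_s) for both prototypes."""
    N = xi.shape[0]
    d = np.sum((w - xi) ** 2, axis=1)          # squared distances d_s of both prototypes
    for s in range(len(w)):
        w[s] += (eta / N) * f(s, d, sigma, proto_labels) * (xi - w[s])
    return w

def f_lvq1(s, d, sigma, proto_labels):
    """LVQ1 / winner-takes-all: only the winner moves, towards the data for the
    correct class, away from it otherwise."""
    is_winner = float(d[s] == d.min())
    return is_winner * (1.0 if proto_labels[s] == sigma else -1.0)
```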

10 Mathematical analysis of the learning dynamics
1. description in terms of a few characteristic quantities (here: ℝ^{2N} → ℝ^7):
- projections of the prototypes into the (B+, B-)-plane: R_{sσ} = w_s · B_σ
- lengths and relative position of the prototypes: Q_{st} = w_s · w_t
algorithm → recursions for R_{sσ}, Q_{st}
2. average over the current example (random vector according to P(ξ)):
in the thermodynamic limit N → ∞ the projections h_s = w_s · ξ and b_σ = B_σ · ξ (together with the average length of ξ) are correlated Gaussian random quantities, completely specified in terms of their first and second moments (w/o indices μ)
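Computing these characteristic quantities from given prototypes is straightforward (a sketch, names as before):

```python
import numpy as np

def order_parameters(w, B_plus, B_minus):
    """Projections R and overlaps Q that summarize the 2N prototype components."""
    B = np.stack([B_plus, B_minus])
    R = w @ B.T       # R[s, sigma] = w_s . B_sigma  (4 numbers)
    Q = w @ w.T       # Q[s, t]     = w_s . w_t      (3 independent numbers, Q is symmetric)
    return R, Q
```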

11 averaged recursions, closed in the characteristic quantities R_{sσ}, Q_{st}
3. self-averaging property of the characteristic quantities:
- they depend on the random sequence of example data
- but their fluctuations vanish with N → ∞
- the learning dynamics is completely described in terms of the averages
computer simulations (LVQ1): mean results approach the theoretical prediction, the variance vanishes as N → ∞
(figure: mean and variance of R++ at α = 10 vs. 1/N)

12 4. continuous learning time
α = (# of examples)/N = # of learning steps per degree of freedom
stochastic recursions → deterministic ODE in the limit N → ∞
integration yields the evolution of the projections R_{sσ}(α), Q_{st}(α)
5. learning curve: generalization error εg(α), i.e. the probability for misclassification of a novel example after training with αN examples
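For comparison with the analytical learning curve, εg can also be estimated empirically by classifying fresh examples drawn from the model (a sketch; sample_fn stands for a data generator such as the one above):

```python
import numpy as np

def generalization_error(w, proto_labels, sample_fn, P_test=10000):
    """Monte Carlo estimate of eps_g: misclassification rate on novel examples."""
    data, labels = sample_fn(P_test)                            # fresh data from the model
    d = np.sum((data[:, None, :] - w[None, :, :]) ** 2, axis=2)
    predicted = np.asarray(proto_labels)[np.argmin(d, axis=1)]
    return float(np.mean(predicted != labels))
```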

13 LVQ1: The winner takes it all
only the winner w_s is updated according to the class label
initialization w_s(0) = 0
theory and simulation (N = 100): p+ = 0.8, v+ = 4, v- = 9, ℓ = 2.0, η = 1.0, averaged over 100 indep. runs
(figures: Q++, Q--, Q+- and R_{sσ} vs. α; inset: self-averaging property, mean and variances of R++ at α = 10 vs. 1/N)

14 LVQ1: The winner takes it all
only the winner w_s is updated according to the class label
initialization w_s(0) ≈ 0
theory and simulation (N = 100): p+ = 0.8, v+ = 4, v- = 9, ℓ = 2.0, η = 1.0, averaged over 100 indep. runs
(figures: Q++, Q--, Q+- and R_{sσ} vs. α; trajectories of w+, w- in the (B+, B-)-plane, (•) α = 20, 40, …, with the cluster centers ℓB+, ℓB-, the optimal decision boundary and ____ the asymptotic position)

15 Learning curve
η = 2.0, 1.0, 0.2;  p+ = 0.2, ℓ = 1.0, v+ = v- = 1.0
- suboptimal, non-monotonic behavior of εg(α) for small η
- stationary state: εg(α → ∞) grows linearly with η
- well-defined asymptotics: η → 0, α → ∞ with (η α) → ∞
achievable generalization error εg as a function of p+, for v+ = v- = 1.0 and for v+ = 0.25, v- = 0.81
(legend: .... best linear boundary, ― LVQ1)

16 LVQ 2.1 [Kohonen] here: update correct and wrong winner
theory and simulation (N = 100): p+ = 0.8, ℓ = 1, v+ = v- = 1, η = 0.5, averages over 100 independent runs (figure: RS+, RS- vs. α)
problem: instability of the algorithm due to the repulsion of wrong prototypes
trivial classification for α → ∞: εg = min { p+, p- }
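In the modulation-function sketch used above, an LVQ2.1-style rule for this two-prototype setup updates both prototypes in every step, attracting the correct one and repelling the wrong one (illustrative):

```python
def f_lvq21(s, d, sigma, proto_labels):
    """LVQ2.1-style modulation for one prototype per class: the correct prototype
    is attracted and the wrong one repelled, for every example."""
    return 1.0 if proto_labels[s] == sigma else -1.0
```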

17 suggested strategy: selection of data in a window close to the current decision boundary
- slows down the repulsion, but the system remains unstable
early stopping: end the training process at minimal εg (idealized)
- pronounced minimum in εg(α), depends on initialization and cluster geometry
- here: lowest minimum value reached for η → 0
(figures: εg(α) for η = 2.0, 1.0, 0.5; εg vs. p+ for v+ = 0.25, v- = 0.81, ― LVQ1, __ early stopping)

18 Learning From Mistakes (LFM)
LVQ2.1 update, performed only if the current classification is wrong
crisp limit version of Soft Robust LVQ [Seo and Obermayer, 2003]
learning curves (p+ = 0.8, ℓ = 3.0, v+ = 4.0, v- = 9.0; η = 2.0, 1.0, 0.5): η-independent asymptotic εg
projected trajectory in the (B+, B-)-plane for p+ = 0.8, ℓ = 1.2, v+ = v- = 1.0 (figure: RS+, RS-, cluster centers ℓB+, ℓB-)
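In the same sketch, the learning-from-mistakes rule performs the LVQ2.1-style step only when the current example would be misclassified (illustrative):

```python
def f_lfm(s, d, sigma, proto_labels):
    """Learning From Mistakes: no update if the winner already carries the correct
    label; otherwise attract the correct and repel the wrong prototype."""
    winner = int(d.argmin())
    if proto_labels[winner] == sigma:     # classification is already correct: no update
        return 0.0
    return 1.0 if proto_labels[s] == sigma else -1.0
```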

19 Comparison: achievable generalization ability
εg as a function of p+ for equal cluster variances v+ = v- = 1.0 and for unequal variances v+ = 0.25, v- = 0.81
(legend: best linear boundary, ― LVQ1, --- LVQ2.1 (early stopping), ·-· LFM, ― trivial classification)

20 Vector Quantization: competitive learning, only the winner w_s is updated
class membership is unknown or identical for all data
numerical integration for w_s(0) ≈ 0 ( p+ = 0.2, ℓ = 1.0, η = 1.2 )
(figures: εg(α) for VQ, LVQ+, LVQ1; order parameters R++, R+-, R-+, R-- vs. α up to α ≈ 300)
the system is invariant under exchange of the prototypes → weakly repulsive fixed points

21 interpretations of this scenario:
- VQ, unsupervised learning from unlabelled data
- LVQ, two prototypes of the same class, identical labels
- LVQ, different classes, but labels are not used in training
asymptotics (η → 0, α → ∞, η α → ∞): for p+ ≈ 0, p- ≈ 1 the configuration yields a low quantization error but a high generalization error εg
(figure: εg vs. p+)

22 Summary
a model scenario of LVQ training: two clusters, two prototypes
dynamics of on-line training
comparison of algorithms (within the model):
- LVQ1: original formulation of LVQ, close to optimal asymptotic generalization
- LVQ2.1: intuitive extension, creates instability, trivial (stationary) classification
- LVQ2.1 + early stopping: potentially good performance, but practical difficulties, depends on initialization
- LFM: crisp limit of Soft Robust LVQ, stable behavior, but far from optimal generalization
- VQ: description of in-class competition

23 Outlook
multi-class, multi-prototype problems
optimized procedures: learning rate schedules, variational approach / Bayes optimal on-line learning
Generalized Relevance LVQ [e.g. Hammer & Villmann]: adaptive metrics, e.g. training of the distance measure
Self-Organizing Maps (SOM): neighborhood preserving
Neural Gas: distance rank based
applications

