The Dynamics of Learning Vector Quantization. Rijksuniversiteit Groningen (RUG), Mathematics and Computing Science, 10.01.2005

Presentation transcript:

The Dynamics of Learning Vector Quantization
Michael Biehl, Anarta Ghosh (Rijksuniversiteit Groningen, Mathematics and Computing Science)
Barbara Hammer (TU Clausthal-Zellerfeld, Institute of Computing Science)

Outline
- Introduction: prototype-based learning from example data (representation, classification); Vector Quantization (VQ) and Learning Vector Quantization (LVQ)
- The dynamics of learning: a model situation with randomized data; learning algorithms for VQ and LVQ; analysis and comparison (dynamics, success of learning)
- Summary
- Outlook

Vector Quantization (VQ)
Aim: representation of large amounts of data by (few) prototype vectors.
Example: identification and grouping of similar data in clusters.
Assignment of a feature vector ξ to the closest prototype w, according to a similarity or distance measure, e.g. the (squared) Euclidean distance d(w, ξ) = (w - ξ)².

Unsupervised competitive learning:
- initialize K prototype vectors
- present a single example
- identify the closest prototype, i.e. the so-called winner
- move the winner even closer towards the example
An intuitively clear, plausible procedure that
- places prototypes in areas with high density of data
- identifies the most relevant combinations of features
- performs (stochastic) on-line gradient descent with respect to the cost function... (a minimal sketch of the update follows below)
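A minimal numpy sketch of this winner-takes-all step, assuming squared Euclidean distance; the function name, array layout and learning rate `eta` are illustrative, not prescribed by the slides:

```python
import numpy as np

def vq_step(prototypes, xi, eta):
    """One unsupervised winner-takes-all VQ step.

    prototypes: (K, N) array of prototype vectors, updated in place.
    xi:         (N,) single example.
    eta:        learning rate (step size).
    """
    # identify the winner: the prototype closest to the example
    d2 = np.sum((prototypes - xi) ** 2, axis=1)
    j = np.argmin(d2)
    # move the winner even closer towards the example
    prototypes[j] += eta * (xi - prototypes[j])
    return j
```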

Quantization error: E = Σ_μ min_j (ξ^μ - w_j)², i.e. each data point ξ^μ contributes its distance to the winner w_j (here: squared Euclidean distance). Aim: faithful representation of the data (in general: ≠ clustering). The result depends on
- the number of prototype vectors
- the distance measure / metric used
(a sketch of the computation follows below)
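A direct numpy computation of this error over a data set, assuming the same array layout as above:

```python
import numpy as np

def quantization_error(prototypes, data):
    """Sum over all examples of the squared Euclidean distance
    to the respective winner (closest prototype)."""
    # pairwise squared distances, shape (P, K)
    d2 = np.sum((data[:, None, :] - prototypes[None, :, :]) ** 2, axis=2)
    # each example contributes the distance to its winner
    return np.sum(np.min(d2, axis=1))
```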

Learning Vector Quantization (LVQ)
Aim: classification of data by learning from examples.
Learning: choice of prototypes according to example data.
Example situation: 3 classes, 3 prototypes.
Classification: assignment of a vector ξ to the class of the closest prototype w.
Aim: generalization ability, i.e. correct classification of novel data after training.

Mostly: heuristically motivated variations of competitive learning. A prominent example [Kohonen]: "LVQ 2.1"
- initialize prototype vectors (for the different classes)
- present a single example
- identify the closest correct and the closest wrong prototype
- move the two winners towards and away from the example, respectively
Known convergence / stability problems, e.g. for infrequent classes (a sketch of the update step follows below).
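A sketch of the attraction/repulsion step in numpy. Note that Kohonen's original prescription additionally restricts updates to examples falling into a window around the decision boundary; that condition is omitted here, and all names are illustrative:

```python
import numpy as np

def lvq21_step(prototypes, proto_labels, xi, y, eta):
    """One LVQ 2.1 step (window condition omitted).

    prototypes:   (K, N) array, updated in place.
    proto_labels: (K,) class label of each prototype.
    xi, y:        single example and its class label.
    """
    d2 = np.sum((prototypes - xi) ** 2, axis=1)
    correct = np.where(proto_labels == y)[0]
    wrong = np.where(proto_labels != y)[0]
    j_c = correct[np.argmin(d2[correct])]  # closest correct prototype
    j_w = wrong[np.argmin(d2[wrong])]      # closest wrong prototype
    prototypes[j_c] += eta * (xi - prototypes[j_c])  # attract
    prototypes[j_w] -= eta * (xi - prototypes[j_w])  # repel
```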

LVQ algorithms...
- appear plausible, intuitive, flexible
- are fast and easy to implement
- are frequently applied in a variety of problems involving the classification of structured data, a few examples:
  - real-time speech recognition
  - medical diagnosis, e.g. from histological data
  - texture recognition and classification
  - gene expression data analysis
  - ...

The Dynamics of Learning Vector Quantization, RUG, illustration: microscopic images of (pig) semen cells after freezing and storage, c/o Lidia Sanchez-Gonzalez, Leon/Spain

Illustration (cont.): microscopic images of (pig) semen cells after freezing and storage, c/o Lidia Sanchez-Gonzalez, Leon/Spain.
[Figure: healthy cells vs. damaged cells, with the prototypes obtained by LVQ1.]

LVQ algorithms...
- are often based on purely heuristic arguments, or derived from a cost function with unclear relation to the generalization ability
- almost exclusively use the Euclidean distance measure, which is inappropriate for heterogeneous data
- lack, in general, a thorough theoretical understanding of dynamics, convergence properties, performance w.r.t. generalization, etc.

In the following: analysis of LVQ algorithms w.r.t.
- dynamics of the learning process
- performance, i.e. generalization ability
- asymptotic behavior in the limit of many examples
Typical behavior in a model situation:
- randomized, high-dimensional data
- essential features of LVQ learning
Aims:
- contribute to the theoretical understanding
- develop efficient LVQ schemes
- test in applications

Model situation: two clusters of N-dimensional data. Random vectors ξ ∈ ℝᴺ are drawn from a mixture of two Gaussians with independent components:

P(ξ) = Σ_{σ=±1} p_σ P(ξ|σ), where P(ξ|σ) is centered at ℓ B_σ with unit variance in each component.

Orthonormal center vectors B₊, B₋ ∈ ℝᴺ with (B_σ)² = 1 and B₊ · B₋ = 0; separation ℓ of the cluster centers from the origin; prior weights p₊, p₋ of the classes with p₊ + p₋ = 1. (A sampling sketch follows below.)
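A sketch for sampling from this model. The choice B₊ = e₁, B₋ = e₂ is an assumption for illustration (any orthonormal pair works), and the interface is invented for these notes:

```python
import numpy as np

def generate_data(P, N, ell, p_plus, rng):
    """Draw P examples from the two-Gaussian model.

    Cluster sigma = +/-1 is chosen with prior p_plus / (1 - p_plus);
    examples have mean ell * B_sigma and unit-variance i.i.d. components.
    """
    B_plus = np.zeros(N); B_plus[0] = 1.0    # assumed: B+ = e1
    B_minus = np.zeros(N); B_minus[1] = 1.0  # assumed: B- = e2
    sigma = np.where(rng.random(P) < p_plus, 1, -1)
    centers = ell * np.where(sigma[:, None] == 1, B_plus, B_minus)
    xi = centers + rng.standard_normal((P, N))
    return xi, sigma

# example: 400 examples, N = 200, ell = 1, p+ = 0.6 (as on the next slide)
xi, sigma = generate_data(400, 200, 1.0, 0.6, np.random.default_rng(0))
```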

High-dimensional data (formally: N → ∞). Example: 400 examples ξ^μ ∈ ℝᴺ with N = 200, ℓ = 1, p₊ = 0.6 (240 vs. 160 examples per class).
[Figure: left panel, projections y_± = B_± · ξ^μ into the plane of the center vectors B₊, B₋ reveal the two clusters; right panel, projections x_{1,2} = w_{1,2} · ξ^μ onto two independent random directions w₁, w₂ show no apparent structure.]
Note: this is a model for studying the typical behavior of LVQ algorithms, not density-estimation based classification.

Dynamics of on-line training: a sequence of independent random data ξ^μ, μ = 1, 2, ..., drawn according to the model density. Update of the prototype vectors:

w_s^μ = w_s^{μ-1} + (η/N) f_s[...] (ξ^μ - w_s^{μ-1})

where η is the learning rate (step size) and the modulation function f_s[...] specifies the competition and the direction of the update, i.e. the change of the prototype towards or away from the current data.
The above examples: unsupervised Vector Quantization, "The Winner Takes It All" (classes irrelevant/unknown), and Learning Vector Quantization "2.1" (here: two prototypes, no explicit competition).

Mathematical analysis of the learning dynamics.
1. Description in terms of a few characteristic quantities (here: ℝ^{2N} → ℝ⁷): the projections R_{sσ} = w_s · B_σ and Q_{st} = w_s · w_t, which encode the positions of the prototypes relative to the cluster centers as well as the lengths and relative position of the prototypes. The update of w_s translates into recursions for these projections and distances, and the random vector ξ^μ enters only in the form of its projections into the (B₊, B₋)-plane and onto the prototypes.
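Making this explicit (a reconstruction consistent with the generic update rule above; s, t index the prototypes, σ labels the cluster centers):

$$
R_{s\sigma}=\vec w_s\cdot\vec B_\sigma,\qquad
Q_{st}=\vec w_s\cdot\vec w_t,\qquad
h_s^{\mu}=\vec w_s^{\,\mu-1}\cdot\vec\xi^{\,\mu},\qquad
b_\sigma^{\mu}=\vec B_\sigma\cdot\vec\xi^{\,\mu}
$$

$$
R_{s\sigma}^{\mu}=R_{s\sigma}^{\mu-1}+\frac{\eta}{N}\,f_s\left(b_\sigma^{\mu}-R_{s\sigma}^{\mu-1}\right)
$$

$$
Q_{st}^{\mu}=Q_{st}^{\mu-1}
+\frac{\eta}{N}\left[f_s\left(h_t^{\mu}-Q_{st}^{\mu-1}\right)+f_t\left(h_s^{\mu}-Q_{st}^{\mu-1}\right)\right]
+\frac{\eta^{2}}{N^{2}}\,f_s f_t\left(\vec\xi^{\,\mu}-\vec w_s^{\,\mu-1}\right)\cdot\left(\vec\xi^{\,\mu}-\vec w_t^{\,\mu-1}\right)
$$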

2. Average over the current example: in the thermodynamic limit N → ∞, for a random vector ξ drawn according to the model, the projections h_s and b_σ become correlated Gaussian random quantities whose joint density is completely specified in terms of first and second moments (w/o indices μ). The averaged recursions are closed in { R_{sσ}, Q_{st} }.

3. Self-averaging properties: the characteristic quantities
- depend on the random sequence of example data
- but their variance vanishes with N → ∞ (here: ∝ N⁻¹)
so the learning dynamics is completely described in terms of averages.
4. Continuous learning time α = μ/N: the number of examples, i.e. learning steps, per degree of freedom. The recursions become coupled ordinary differential equations for the evolution of the projections.

5. Learning curve: the generalization error ε_g(α), i.e. the probability for misclassification of a novel example after training with αN examples (N → ∞).
Investigation and comparison of given algorithms:
- repulsive/attractive fixed points of the dynamics
- asymptotic behavior for α → ∞
- dependence on learning rate, separation, initialization
- ...
Optimization and development of new prescriptions:
- time-dependent learning rate η(α)
- variational optimization w.r.t. the modulation function f_s[...]
- ...
(A simulation-side counterpart, a Monte Carlo estimate of ε_g, is sketched below.)
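In simulations, ε_g can simply be estimated on fresh data from the model; a sketch reusing the hypothetical generate_data from above:

```python
import numpy as np

def generalization_error(prototypes, proto_labels, ell, p_plus,
                         rng, n_test=10_000):
    """Monte Carlo estimate of eps_g: fraction of fresh examples from the
    two-Gaussian model that the nearest-prototype rule misclassifies."""
    xi, sigma = generate_data(n_test, prototypes.shape[1], ell, p_plus, rng)
    d2 = np.sum((xi[:, None, :] - prototypes[None, :, :]) ** 2, axis=2)
    pred = proto_labels[np.argmin(d2, axis=1)]
    return np.mean(pred != sigma)
```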

Optimal classification, i.e. minimal generalization error: in the model situation (equal variances of the clusters), the classes are separated by a plane orthogonal to B₊ - B₋; for unequal priors the plane is shifted away from the center of the more frequent class. The excess error of a given algorithm is measured relative to this optimum.
[Figure: minimal ε_g as a function of the prior weight p₊, for separations ℓ = 0, 1, 2.]

"LVQ 2.1": update the closest correct and the closest wrong prototype. [Seo, Obermayer]: LVQ 2.1 ↔ a cost function based on likelihood ratios.
(Analytical) integration for w_s(0) = 0, with priors parametrized as p_± = (1 ± m)/2, m > 0.
[Figure: theory and simulation (N = 100) for p₊ = 0.8, ℓ = 1, η = 0.5; averages over 100 independent runs.]

Problem: instability of the algorithm due to the repulsion of wrong prototypes; trivial classification for α → ∞ with ε_g = min{ p₊, p₋ } (all data assigned to the more frequent class).
Strategies:
- selection of data in a window close to the current decision boundary: slows down the repulsion, but the system remains unstable
- Soft Robust Learning Vector Quantization [Seo & Obermayer]: density-estimation based cost function; LVQ 2.1 is recovered as a limiting case
- learning from mistakes: perform the LVQ2.1 step only if the example is currently misclassified; slow learning, poor generalization
[Figure: the wrong prototype is repelled by the more frequent class (p₊ > p₋).]

"The winner takes it all", I) LVQ 1 [Kohonen]: only the winner w_s is updated; the class membership determines whether it moves towards or away from the example (a sketch follows below).
Numerical integration for w_s(0) = 0; theory and simulation (N = 200) for p₊ = 0.2, ℓ = 1.2, η = 1.2, averaged over 100 independent runs.
[Figures: order parameters R_{s±} and Q₊₊, Q₋₋, Q₊₋ as functions of α; trajectories of w₊, w₋ in the (B₊, B₋)-plane at α = 20, 40, with the optimal decision boundary and the asymptotic positions marked.]
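The LVQ1 step differs from the unsupervised VQ step above only in the sign of the update; a sketch with the same illustrative interface as before:

```python
import numpy as np

def lvq1_step(prototypes, proto_labels, xi, y, eta):
    """One LVQ1 step: the overall winner moves towards the example
    if its label is correct, away from it otherwise."""
    d2 = np.sum((prototypes - xi) ** 2, axis=1)
    j = np.argmin(d2)
    sign = 1.0 if proto_labels[j] == y else -1.0
    prototypes[j] += sign * eta * (xi - prototypes[j])
```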

Learning curve and the role of the learning rate (p₊ = 0.2, ℓ = 1.2, η = 1.2):
- stationary state: ε_g(α → ∞) grows linearly with η
- well-defined asymptotics for η → 0, α → ∞ with (ηα) → ∞, since the ODEs are linear in η
- suggests a variable, time-dependent rate η(α)!?
[Figures: ε_g(α); stationary ε_g as a function of η, extrapolated to η → 0; ε_g as a function of (ηα), approaching its minimum, which remains suboptimal.]

"The winner takes it all", II) LVQ+: only positive steps, without repulsion; the winner is updated only if it is correct. LVQ+ ≈ VQ within the classes (w_s is updated only from examples of class s); a sketch follows below.
For α → ∞ the asymptotic configuration is symmetric about ℓ(B₊ + B₋)/2: the classification scheme and the achieved generalization error are independent of the prior weights p_± (and optimal for p_± = 1/2).
[Figure: trajectories of w₊, w₋ in the (B₊, B₋)-plane for p₊ = 0.2, ℓ = 1.2, η = 1.2.]
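One plausible reading of the LVQ+ step in code (update the winner only when it carries the correct label; the slides note the near-equivalence to class-wise VQ):

```python
import numpy as np

def lvq_plus_step(prototypes, proto_labels, xi, y, eta):
    """One LVQ+ step: a purely attractive update, applied only
    when the winner's label matches the example's class."""
    d2 = np.sum((prototypes - xi) ** 2, axis=1)
    j = np.argmin(d2)
    if proto_labels[j] == y:
        prototypes[j] += eta * (xi - prototypes[j])
```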

Comparison of the asymptotics (η → 0, (ηα) → ∞):
- LVQ 2.1: trivial assignment to the more frequent class, ε_g = min{ p₊, p₋ }
- LVQ 1: here, close to optimal classification
- LVQ+: min-max solution, p_±-independent classification
[Figures: learning curves ε_g(α) for LVQ1 and LVQ+ (p₊ = 0.2, ℓ = 1.0, η = 1.0); asymptotic ε_g as a function of p₊ for the three prescriptions, compared with the optimal classification.]

Vector Quantization: competitive learning when the class membership is unknown, or identical for all data; only the winner w_s is updated.
Numerical integration for w_s(0) ≈ 0 (p₊ = 0.2, ℓ = 1.0, η = 1.2). The system is invariant under exchange of the prototypes → weakly repulsive fixed points.
[Figures: ε_g(α) for VQ compared with LVQ+ and LVQ1; order parameters R₊₊, R₊₋, R₋₊, R₋₋ as functions of α.]

Interpretations:
- VQ: unsupervised learning from unlabelled data
- LVQ with two prototypes of the same class, i.e. identical labels
- LVQ with different classes, but labels not used in training
[Figure: asymptotic ε_g (η → 0, ηα → ∞) as a function of p₊; for strongly unbalanced priors (p₊ ≈ 0, p₋ ≈ 1) the result combines a low quantization error with a high generalization error ε_g.]

Summary
- prototype-based learning: Vector Quantization and Learning Vector Quantization
- a model scenario: two clusters, two prototypes
- dynamics of on-line training
- comparison of algorithms:
  LVQ 2.1: instability, trivial (stationary) classification
  LVQ 1: close to optimal asymptotic generalization
  LVQ+: min-max solution w.r.t. asymptotic generalization
  VQ: symmetry breaking, representation
Work in progress, outlook
- regularization of LVQ 2.1, Robust Soft LVQ [Seo, Obermayer]
- model: different cluster variances, more clusters/prototypes, several classes and prototypes
- optimized procedures: learning rate schedules, variational approach / density estimation / Bayes optimal on-line

Perspectives
- Self-Organizing Maps (SOM): (many) N-dim. prototypes form a (low) d-dimensional grid; representation of data in a topology-preserving map (neighborhood-preserving SOM vs. distance-based Neural Gas)
- Generalized Relevance LVQ [Hammer & Villmann]: adaptive metrics, e.g. training of the distance measure
- applications