Dynamical analysis of LVQ type learning rules (WSOM 2005). Rijksuniversiteit Groningen, Mathematics and Computing Science.

Presentation transcript:

Dynamical analysis of LVQ type learning rules
Michael Biehl, Anarta Ghosh (Rijksuniversiteit Groningen, Mathematics and Computing Science)
Barbara Hammer (Clausthal University of Technology, Institute of Computing Science)

Learning Vector Quantization (LVQ)
- identification of prototype vectors from labelled example data
- parameterization of distance-based classification schemes

classification: assignment of a vector ξ to the class of the closest prototype w
aim: generalization ability, i.e. classification of novel data after learning from examples

example: basic LVQ scheme [Kohonen], "LVQ 1" (a sketch in code follows):
- initialize prototype vectors for the different classes
- present a single example
- identify the closest prototype, i.e. the so-called winner
- move the winner closer towards the data (same class) or away from the data (different class)

often: heuristically motivated variations of competitive learning
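To make the scheme concrete, here is a minimal LVQ1 sketch; the NumPy implementation, the function names lvq1_train / lvq1_classify, and the default learning rate are ours, not taken from the slides:

import numpy as np

def lvq1_train(X, y, prototypes, proto_labels, eta=0.5, epochs=1):
    # Basic LVQ1: for each presented example, find the closest prototype
    # (the winner) and move it towards the example if the class labels
    # agree, away from it otherwise.
    W = prototypes.copy()
    for _ in range(epochs):
        for xi, label in zip(X, y):
            d = np.sum((W - xi) ** 2, axis=1)   # squared Euclidean distances
            j = int(np.argmin(d))               # index of the winner
            sign = 1.0 if proto_labels[j] == label else -1.0
            W[j] += eta * sign * (xi - W[j])    # attract or repel the winner
    return W

def lvq1_classify(X, prototypes, proto_labels):
    # Assign each vector to the class of its closest prototype.
    proto_labels = np.asarray(proto_labels)
    d = np.sum((X[:, None, :] - prototypes[None, :, :]) ** 2, axis=2)
    return proto_labels[np.argmin(d, axis=1)]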

LVQ algorithms ...
- frequently applied in a variety of practical problems
- plausible, intuitive, flexible
- fast, easy to implement
- but often based on heuristic arguments, or on cost functions with an unclear relation to generalization
- limited theoretical understanding of dynamics and convergence properties and of the achievable generalization ability

here: analysis of LVQ algorithms w.r.t.
- dynamics of the learning process
- performance, i.e. generalization ability
- typical properties in a model situation

Model situation: two clusters of N-dimensional data.
Random vectors ξ ∈ ℝ^N are drawn from a mixture of two Gaussians,

P(ξ) = Σ_{σ=±1} p_σ P(ξ | σ) with P(ξ | σ) ∝ exp[ −(ξ − ℓ B_σ)² / (2 v_σ) ],

i.e. the components of ξ are independent with variance v_σ about the cluster center ℓ B_σ; the separation of the centers is ∝ ℓ. Orthonormal center vectors: B_+, B_- ∈ ℝ^N with (B_σ)² = 1 and B_+ · B_- = 0. Prior weights of the classes: p_+, p_- with p_+ + p_- = 1.
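The model is easy to sample from; a sketch under the stated assumptions, choosing the first two canonical basis vectors as the orthonormal centers B_+, B_- (the function name sample_mixture is ours):

import numpy as np

def sample_mixture(n, N, ell, p_plus, v_plus, v_minus, seed=0):
    # Draw n examples: class sigma = +/-1 with priors (p_plus, 1 - p_plus),
    # mean ell * B_sigma, independent components of variance v_sigma.
    rng = np.random.default_rng(seed)
    B_plus, B_minus = np.eye(N)[0], np.eye(N)[1]      # orthonormal centers
    sigma = rng.choice([1, -1], size=n, p=[p_plus, 1.0 - p_plus])
    centers = ell * np.where(sigma[:, None] == 1, B_plus, B_minus)
    v = np.where(sigma == 1, v_plus, v_minus)
    xi = centers + rng.normal(size=(n, N)) * np.sqrt(v)[:, None]
    return xi, sigma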

Dynamics of on-line training: a sequence of new, independent random examples ξ^μ, μ = 1, 2, ..., is drawn according to P(ξ). Generic update of the two prototype vectors w_+, w_-:

w_s^μ = w_s^{μ−1} + (η / N) f_s (ξ^μ − w_s^{μ−1}), s = ±1,

with learning rate (step size) η; the modulation function f_s controls the competition, the direction of the update (towards or away from the current data), etc.

example: LVQ1 in its original formulation [Kohonen], a Winner-Takes-All (WTA) algorithm: only the winner is updated, f_s = Θ(d_{−s} − d_s) · s σ^μ, where d_s = (ξ^μ − w_s)² and σ^μ is the class label of ξ^μ.
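In this notation, one on-line LVQ1 step for the two prototypes might look as follows; a sketch in which the dictionary representation of w_{+1}, w_{-1} and the variable names are ours:

import numpy as np

def lvq1_step(w, xi, sigma, eta, N):
    # One on-line step: w[s] <- w[s] + (eta / N) * f_s * (xi - w[s]),
    # with f_s = Theta(d_{-s} - d_s) * s * sigma, i.e. only the winner
    # moves, towards xi for the correct class and away from it otherwise.
    d = {s: float(np.sum((xi - w[s]) ** 2)) for s in (+1, -1)}
    for s in (+1, -1):
        f_s = (1.0 if d[-s] > d[s] else 0.0) * s * sigma
        w[s] = w[s] + (eta / N) * f_s * (xi - w[s])
    return w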

Mathematical analysis of the learning dynamics

1. description in terms of a few characteristic quantities (here: ℝ^{2N} → ℝ^7)
The recursions for the prototypes translate into recursions for the projections R_{sτ} = w_s · B_τ (positions relative to the cluster centers) and the overlaps Q_{st} = w_s · w_t (lengths and relative position of the prototypes). The random vector ξ^μ enters only through its length and the projections h_s^μ = w_s^{μ−1} · ξ^μ and b_τ^μ = B_τ · ξ^μ.
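For reference, the characteristic quantities are straightforward to compute from a concrete configuration; a sketch in which w and B are dictionaries keyed by +1/-1 as above:

import numpy as np

def order_parameters(w, B):
    # R[s, tau] = w_s . B_tau : four projections onto the cluster centers
    # Q[s, t]   = w_s . w_t   : overlaps Q++, Q--, Q+- (Q is symmetric),
    # i.e. the seven numbers that summarize the 2N prototype components.
    R = {(s, tau): float(np.dot(w[s], B[tau])) for s in (+1, -1) for tau in (+1, -1)}
    Q = {(s, t): float(np.dot(w[s], w[t])) for s in (+1, -1) for t in (+1, -1)}
    return R, Q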

2. average over the current example
In the thermodynamic limit N → ∞, the projections h_s^μ and b_τ^μ of a random vector drawn according to P(ξ) become correlated Gaussian random quantities, completely specified in terms of their first and second moments (see the backup slide below); together with the average length of ξ^μ this yields averaged recursions which are closed in the characteristic quantities {R_{sτ}, Q_{st}}.

3. self-averaging property
The characteristic quantities depend on the random sequence of example data, but their variance vanishes with N → ∞ (here: ∝ N^{−1}): the learning dynamics is completely described in terms of averages.

4. continuous learning time
α = μ/N: the number of examples, i.e. of learning steps, per degree of freedom. The stochastic recursions become deterministic ODEs; integration yields the evolution of the projections {R_{sτ}(α), Q_{st}(α)}.

5. learning curve
generalization error ε_g(α): the probability for misclassification of a novel example after training with αN examples.
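In a simulation, ε_g can be estimated by simply counting misclassified fresh examples; a sketch of our own, where sample is any generator such as the hypothetical sample_mixture above:

import numpy as np

def generalization_error(w, sample, n_test=100000):
    # Monte Carlo estimate of eps_g: the probability that a novel
    # example is assigned to the prototype of the wrong class.
    xi, sigma = sample(n_test)
    d_plus = np.sum((xi - w[+1]) ** 2, axis=1)
    d_minus = np.sum((xi - w[-1]) ** 2, axis=1)
    predicted = np.where(d_plus < d_minus, 1, -1)
    return float(np.mean(predicted != sigma))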

LVQ1: the winner takes it all. Only the winner w_s is updated according to the class label.

theory and simulation (N = 100): p_+ = 0.8, v_+ = 4, v_- = 9, ℓ = 2.0, η = 1.0; initialization w_s(0) ≈ 0; averaged over 100 independent runs.

[Figures: evolution of the order parameters Q_{++}, Q_{--}, Q_{+-} and R_{sτ} with α; trajectories of w_+ and w_- in the (B_+, B_-)-plane at α = 20, 40, ..., together with the cluster centers ℓB_+, ℓB_-, the optimal decision boundary, and the asymptotic prototype positions.]

Learning curves ε_g(α) for several learning rates η (here: p_+ = 0.2, ℓ = 1.0, v_+ = v_- = 1.0):
- suboptimal, non-monotonic behavior for small η
- ε_g(α → ∞) grows linearly with η
- stationary state: well-defined asymptotics in the simultaneous limit η → 0, α → ∞ with (η α) → ∞

achievable generalization error: ε_g as a function of p_+ (― LVQ1, ..... best linear boundary), for equal variances v_+ = v_- = 1.0 and for unequal variances v_+ = 0.25, v_- = 0.81.

“LVQ 2.1” [Kohonen]: here, both the correct and the wrong winner are updated, i.e. the closest prototype of the correct class is attracted by the example while the closest prototype of the wrong class is repelled.

theory and simulation (N = 100): p_+ = 0.8, ℓ = 1, v_+ = v_- = 1, η = 0.5; averages over 100 independent runs. [Figure: evolution of R_{S+} and R_{S-}.]

problem: instability of the algorithm due to the repulsion of wrong prototypes; trivial classification for α → ∞: ε_g = min{p_+, p_-}.
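For the two-prototype model this update reads, in the same sketch notation as before, under the assumption that both prototypes are updated for every example with f_s = s·σ (which matches the slide's "update correct and wrong winner"):

def lvq21_step(w, xi, sigma, eta, N):
    # LVQ2.1 in the two-prototype setting: the prototype of the correct
    # class is attracted, the prototype of the wrong class is repelled.
    # Unmodified, this drives the wrong prototype away without bound,
    # which is the instability described above.
    for s in (+1, -1):
        w[s] = w[s] + (eta / N) * (s * sigma) * (xi - w[s])
    return w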

suggested strategy: selection of data in a window close to the current decision boundary (see the sketch below). This slows down the repulsion, but the system remains unstable.

Early stopping: end the training process at minimal ε_g (idealized):
- pronounced minimum in ε_g(α) whose position and depth depend on the initialization and the cluster geometry (learning curves for η = 2.0, 1.0, 0.5)
- the lowest minimum is attained for η → 0

[Figure: achievable ε_g vs. p_+ for v_+ = 0.25, v_- = 0.81: ― LVQ1, __ early stopping.]
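The window selection can be written as Kohonen's LVQ2.1 criterion: the example is used only if it falls close to the midplane between the two prototypes. A sketch (the parameter name width is ours):

def in_window(d_plus, d_minus, width=0.3):
    # Kohonen's window rule: update only if
    # min(d+/d-, d-/d+) > (1 - width) / (1 + width),
    # i.e. the example lies near the current decision boundary.
    threshold = (1.0 - width) / (1.0 + width)
    return min(d_plus / d_minus, d_minus / d_plus) > threshold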

“Learning From Mistakes” (LFM): the LVQ2.1 update is performed only if the current classification is wrong; this is the crisp limit of Soft Robust LVQ [Seo and Obermayer, 2003].

[Figures: projected trajectory in the (B_+, B_-)-plane with the centers ℓB_+, ℓB_- and the projections R_{S+}, R_{S-} (p_+ = 0.8, ℓ = 1.2, v_+ = v_- = 1.0); learning curves ε_g(α) for η = 2.0, 1.0, 0.5 (p_+ = 0.8, ℓ = 3.0, v_+ = 4.0, v_- = 9.0): η-independent asymptotic ε_g.]
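LFM amounts to gating the LVQ2.1 step with the current classification; a sketch in the notation used above:

import numpy as np

def lfm_step(w, xi, sigma, eta, N):
    # Learning From Mistakes: perform the LVQ2.1-type update only when
    # the nearest-prototype classification of xi is currently wrong.
    d = {s: float(np.sum((xi - w[s]) ** 2)) for s in (+1, -1)}
    predicted = +1 if d[+1] < d[-1] else -1
    if predicted != sigma:
        for s in (+1, -1):
            w[s] = w[s] + (eta / N) * (s * sigma) * (xi - w[s])
    return w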

Comparison: achievable generalization ability

[Figure: ε_g vs. p_+ for equal cluster variances (v_+ = v_- = 1.0) and for unequal variances (v_+ = 0.25, v_- = 0.81): ..... best linear boundary, ― LVQ1, --- LVQ2.1 (early stopping), ·-· LFM.]

Summary
- prototype-based learning: Vector Quantization and Learning Vector Quantization
- a model scenario: two clusters, two prototypes
- dynamics of on-line training
- comparison of algorithms:
  LVQ 1: close to optimal asymptotic generalization
  LVQ 2.1: instability, trivial (stationary) classification
  LVQ 2.1 + early stopping: potentially very good performance
  LFM: far from optimal generalization behavior

Work in progress, outlook
- multi-class, multi-prototype problems
- optimized procedures: learning rate schedules, variational approach / Bayes optimal on-line

Perspectives
- Self-Organizing Maps (SOM): (many) N-dim. prototypes form a (low) d-dimensional grid; representation of the data in a topology-preserving map; neighborhood-preserving SOM and (distance-based) Neural Gas
- Generalized Relevance LVQ [e.g. Hammer & Villmann]: adaptive metrics, e.g. the weighted distance measure d_λ(ξ, w) = Σ_j λ_j (ξ_j − w_j)², with the relevances λ_j trained along with the prototypes; applications
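For illustration, the adaptive metric of relevance learning is a weighted squared Euclidean distance; a sketch (in GRLVQ the relevance factors λ_j are adapted during training, which is not shown here):

import numpy as np

def relevance_distance(xi, w, lam):
    # d_lambda(xi, w) = sum_j lam_j * (xi_j - w_j)^2,
    # with lam_j >= 0 and sum_j lam_j = 1 adapted during training.
    return float(np.sum(lam * (xi - w) ** 2))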

Outlook:


Backup: first and second moments (w/o indices μ). For an example ξ from cluster σ, the projections h_s = w_s · ξ and b_τ = B_τ · ξ have

⟨h_s⟩ = ℓ R_{sσ},  ⟨b_τ⟩ = ℓ δ_{τσ},
⟨h_s h_t⟩ − ⟨h_s⟩⟨h_t⟩ = v_σ Q_{st},  ⟨h_s b_τ⟩ − ⟨h_s⟩⟨b_τ⟩ = v_σ R_{sτ},  ⟨b_ρ b_τ⟩ − ⟨b_ρ⟩⟨b_τ⟩ = v_σ δ_{ρτ},

and the average length is ⟨ξ²⟩ = v_σ N + ℓ². In the thermodynamic limit N → ∞ they become correlated Gaussian random quantities, completely specified by these moments; hence the averaged recursions close in {R_{sτ}, Q_{st}}.

In the limit N → ∞, the theory allows for:

investigation and comparison of given algorithms
- repulsive/attractive fixed points of the dynamics
- asymptotic behavior for α → ∞
- dependence on learning rate, separation, initialization
- ...

optimization and development of new prescriptions
- time-dependent learning rate η(α)
- variational optimization w.r.t. the modulation function f_s[...], e.g. maximize the decrease of the generalization error per learning step
- ...

Backup: LVQ1, self-averaging property. Mean and variance of the characteristic quantities, illustrated for R_{++}(α = 10) as a function of 1/N; theory and simulation, p_+ = 0.8, v_+ = 4, v_- = 9, ℓ = 2.0, η = 1.0, initialization w_s(0) = 0, averaged over 100 independent runs. [Figure: the fluctuations of R_{++}(α = 10) vanish with increasing N, ∝ 1/N.]

Dynamical Analysis of LVQ type algorithms, WSOM 2005 high-dimensional data (formally: N  ∞) ξ μ ∈ℝ N, N=200, ℓ=1, p + =0.4, v + =0.44, v - =0.44 μ By ξ   (● 240) (○ 160) projections into the plane of center vectors B +, B - μ By ξ   μ 2 2 x ξ w   projections on two independent random directions w 1,2 μ 11 x ξw  