Almost Random Projection Machine with Margin Maximization and Kernel Features
Tomasz Maszczyk and Włodzisław Duch
Department of Informatics, Nicolaus Copernicus University, Toruń, Poland
ICANN 2010

Plan
Main idea
Inspirations
Description of our approach
Influence of margin maximization
Results
Conclusions

Main idea
The backpropagation algorithm for training MLPs handles non-linear systems and non-separable problems, but learns slowly.
MLPs are much simpler than biological neural networks.
Backpropagation is hard to justify from a neurobiological perspective.
Algorithms capable of solving problems of similar complexity are an important challenge, needed to create ambitious ML applications.

Inspirations
Neurons in the association cortex form strongly connected microcircuits, found in cortical minicolumns, that resonate at different frequencies.
A perceptron observing these minicolumns learns to react to specific signals around a particular frequency.
Resonators do not get excited when overall activity is high, but rather when specific levels of activity are reached.

Main idea
Following these inspirations: aRPM, a single-hidden-layer constructive network based on random projections, each added only if it is useful; it easily solves highly non-separable problems.
Kernel-based features extend the hypothesis space (Cover's theorem).

Main idea
Linear kernels define new features t(X;W) = K_L(X,W) = X·W, based on a projection onto the direction W. Gaussian kernels g(X,W) = exp(-||X-W||²/(2σ²)) evaluate similarity between two vectors using a weighted radial distance function. The final discriminant function is constructed as a linear combination of such kernels. aRPM focuses on margin maximization: new projections are added only if they increase the probability of correct classification for examples that are wrongly classified or lie close to the decision border.
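For concreteness, a minimal NumPy sketch of the two kinds of candidate features described above; the function names (linear_feature, gaussian_feature) are illustrative, not taken from the paper:

```python
import numpy as np

def linear_feature(X, W):
    """Linear kernel feature t(X;W) = X . W, i.e. a projection onto direction W."""
    return float(np.dot(X, W))

def gaussian_feature(X, Xi, sigma):
    """Gaussian kernel feature g(X;Xi,sigma) = exp(-||X - Xi||^2 / (2 sigma^2))."""
    d2 = np.sum((np.asarray(X) - np.asarray(Xi)) ** 2)
    return float(np.exp(-d2 / (2.0 * sigma ** 2)))
```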

aRPM
Some projections may not be very useful as a whole, but the distribution of the training data along a direction may contain a range of values holding a pure cluster of projected patterns. This creates binary features b_i(X) = t(X; W_i, [t_a, t_b]) ∈ {0, 1}, based on a linear projection in the direction W_i restricted to an interval. A good candidate feature should cover some minimal number η of training vectors.
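A hedged sketch of such a restricted-projection feature and its minimal-coverage check; the interval bounds t_a, t_b and the helper names are assumed for illustration only:

```python
import numpy as np

def interval_feature(X, W, t_a, t_b):
    """Binary feature b(X): 1 if the projection X . W falls inside [t_a, t_b], else 0."""
    z = float(np.dot(X, W))
    return 1 if t_a <= z <= t_b else 0

def covers_enough(X_train, W, t_a, t_b, eta):
    """Minimal-coverage condition: the interval must contain at least eta
    projected training vectors for the candidate to be considered at all."""
    z = np.asarray(X_train) @ np.asarray(W)
    return int(np.sum((z >= t_a) & (z <= t_b))) >= eta
```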

aRPM
Features based on kernels: here only Gaussian kernels with several values of dispersion σ are used, g(X;X_i,σ) = exp(-||X_i - X||²/(2σ²)). Local kernel features have values close to zero except around their support vectors X_i, therefore their usefulness is limited to the neighborhood O(X_i) in which g_i(X) exceeds some threshold.

aRPM
Kernel features are created at multiple resolutions: first with large σ (smooth decision borders), then with smaller σ (more localized features). A candidate feature is converted into a permanent node only if it increases the classification margin. This is an incremental algorithm: the feature space is expanded until no further improvement is found, moving vectors away from the decision border.
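A rough sketch of this incremental, coarse-to-fine loop, under the assumptions that every training vector is tried as a kernel centre and that each feature is tied to the class of its centre; improves_margin stands for the margin-based acceptance test defined on the later slides (and sketched there), so the loop structure is only indicative:

```python
def grow_feature_space(X_train, y_train, sigmas=(4.0, 2.0, 1.0, 0.5)):
    """Incrementally grow the hidden layer, trying smooth (large sigma) kernel
    features before localized (small sigma) ones."""
    features = []                                  # permanent hidden nodes
    for sigma in sigmas:                           # coarse-to-fine resolution
        for Xi, yi in zip(X_train, y_train):       # candidate centred on a training vector
            candidate = (Xi, sigma, yi)            # feature tied to the class of its centre
            if improves_margin(features, candidate, X_train, y_train):
                features.append(candidate)         # accepted: becomes a permanent node
    return features
```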

aRPM
Winner-Takes-All (WTA): the decision is based on the summed activation of the hidden nodes. Projections with added intervals give binary activations b_i(X), while the values of kernel features g(X;X_i,σ) are summed, giving a total activation A(C|X) for each class. Plotting A(C|X) versus A(¬C|X) for each vector produces scatterograms that give an idea of how far a given vector is from the decision border.
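Continuing the same toy scaffolding (and reusing gaussian_feature from the earlier sketch), a simplified view of the per-class activation and the WTA decision; for brevity only kernel features are summed here, the binary interval activations are omitted:

```python
def class_activation(X, features, cls):
    """Total activation A(C|X): sum of the kernel-feature activations of the
    hidden nodes associated with class C."""
    return sum(gaussian_feature(X, Xi, sigma)
               for (Xi, sigma, c) in features if c == cls)

def wta_predict(X, features, classes=(0, 1)):
    """Winner-Takes-All decision: the class with the largest total activation."""
    return max(classes, key=lambda c: class_activation(X, features, c))
```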

[Scatterograms of WTA activations A(C|X) vs A(¬C|X), two panels: without margin maximization and with margin maximization.]

aRPM
In the WTA scheme, |A(C|X) - A(¬C|X)| estimates the distance from the decision border. The confidence of the model for a vector X ∈ C is specified using a logistic function: F(X) = 1/(1 + exp(-(A(C|X) - A(¬C|X)))). It gives values close to 1 if X is on the correct side and far from the border, and goes to zero if X is on the wrong side. The total confidence in the model may then be estimated by summing over all vectors.
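A sketch of this confidence measure, building on class_activation above; treating A(¬C|X) as the strongest competing class activation is an assumption made here for the multi-class case:

```python
import numpy as np

def confidence(X, y_true, features, classes=(0, 1)):
    """Logistic confidence F(X) = 1 / (1 + exp(-(A(C|X) - A(notC|X)))),
    with C the true class of X."""
    a_true = class_activation(X, features, y_true)
    a_other = max(class_activation(X, features, c) for c in classes if c != y_true)
    return 1.0 / (1.0 + np.exp(-(a_true - a_other)))

def total_confidence(X_train, y_train, features, classes=(0, 1)):
    """Total model confidence: the sum of F(X) over all training vectors."""
    return sum(confidence(X, y, features, classes)
               for X, y in zip(X_train, y_train))
```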

aRPM
The final effect of adding a new feature h(X) to the total confidence measure is therefore: U(H,h) = Σ_X (F(X; H+h) - F(X; H)). If U(H,h) > α then the new feature is accepted, providing a larger margin. To make the final decision, aRPM with margin maximization uses the WTA mechanism or linear discrimination (LD).
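Finally, a sketch of the acceptance test U(H,h) > α, reusing total_confidence from the previous sketch; the default α = 0 is an assumption, not a value taken from the paper:

```python
def improves_margin(features, candidate, X_train, y_train, classes=(0, 1), alpha=0.0):
    """Acceptance test U(H,h) > alpha: keep the candidate feature only if it
    raises the total logistic confidence over the training set."""
    u = (total_confidence(X_train, y_train, features + [candidate], classes)
         - total_confidence(X_train, y_train, features, classes))
    return u > alpha
```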

Datasets

Results

Conclusions
The aRPM algorithm has been improved in two ways: by adding selection of network nodes to ensure wide margins, and by adding kernel features. Relations to kernel SVMs and biological plausibility make aRPM a very interesting subject of study. In contrast to typical neural networks that learn by parameter adaptation, there is no learning involved, just generation and selection of features. Shifting the focus to generation of new features followed by WTA makes aRPM a candidate for a fast-learning neural algorithm that is simpler and learns faster than MLP or RBF networks.

Conclusions
Feature selection and construction, i.e. finding interesting views on the data, is the basis of natural categorization and learning processes. The scatterogram of the WTA output shows the effect of margin optimization and allows the confidence in the classification of given data to be estimated. Further improvements: admission of impure clusters instead of binary features, use of other kernel features, selection of candidate support vectors, and optimization of the algorithm.

Thank You!