Universal Learning Machines (ULM)
Włodzisław Duch and Tomasz Maszczyk
Department of Informatics, Nicolaus Copernicus University, Toruń, Poland
ICONIP 2009, Bangkok

Plan
- Meta-learning
- Learning from others
- ULM algorithm
- Types of features
- Illustrative results
- Conclusions
Learn from others, not only from your own mistakes! Then you will always have a free lunch!

Meta-learning
Meta-learning means different things to different people. Some call "meta" the learning of many models (e.g. in Weka), ranking them, arcing, boosting, bagging, or creating ensembles in many ways => optimization of parameters to integrate models.
Here meta-learning means learning how to learn.
Goal: replace the experts who search for the best models by running many experiments. There is no free lunch, but why cook it yourself?
The space of models is too large to explore exhaustively; design the system architecture to support knowledge-based search.
One direction towards universal learning: use any method you like, but take the best from your competition! Best what? The best fragments of models and combinations of features.

CI and "no free lunch"
The "no free lunch" theorem: no single system can reach the best results for all possible distributions of data.
Decision trees and rule-based systems are best for data with a logical structure that requires sharp decision borders, but fail on problems where linear discrimination provides an accurate solution.
SVMs in kernelized form work well when a complex topology is required, but may miss simple solutions that rule-based systems find, fail when sharp decision borders are needed, and fail on complex Boolean problems.
The key to general intelligence: specific information filters that make learning possible, and chunking mechanisms that combine partial results into higher-level mental representations.
More attention should be paid to the generation of useful features.

ULM idea
ULM is composed of two main modules: feature constructors and simple classifiers.
In machine learning, features are used to calculate:
- linear combinations of feature values,
- distances (dissimilarities), possibly scaled (this includes feature selection).
Is this sufficient? No: non-linear functions of features carry information that cannot be easily recovered by CI methods.
Kernel approaches give linear solutions in the kernel space, implicitly adding new features based on similarity, K(X, S_V).
=> Create a potentially useful, redundant set of features. How? Learn what other models do well!
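A minimal sketch of this two-module idea (my illustration in Python/scikit-learn, not the authors' code): similarity features K(X, S_V) are appended to the input space, with S_V taken here simply as a random subset of training vectors, and a plain linear classifier is trained on the extended representation.

```python
# Hypothetical sketch of the ULM idea: augment the input space with kernel
# (similarity) features K(x, s) for reference vectors S_V, then train a
# simple linear classifier on the extended representation.
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics.pairwise import rbf_kernel
from sklearn.model_selection import cross_val_score

X, y = load_breast_cancer(return_X_y=True)

rng = np.random.default_rng(0)
support = X[rng.choice(len(X), size=20, replace=False)]  # stand-in for S_V

Z = rbf_kernel(X, support, gamma=0.01)  # similarity features K(X, S_V)
X_ext = np.hstack([X, Z])               # original features + kernel features

clf = LogisticRegression(max_iter=5000)
print("10-fold CV accuracy:", cross_val_score(clf, X_ext, y, cv=10).mean())
```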

Binary features
B1: unrestricted projections; MAP classifiers, p(C|b); 2N_C regions; complexity O(1).
B2: binary, restricted by other binary features; complexes b_1 ∧ b_2 ∧ … ∧ b_k; complexity O(2^k).
B3: binary, restricted by distance; b ∧ r_1 ∈ [r_1-, r_1+] ∧ … ∧ r_k ∈ [r_k-, r_k+]; computed separately for each value of b. Example: for b = 1 and r_1 ∈ [0, 1], take only the vectors from this slice.
N1: nominal, treated like binary.
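A hedged sketch of B1/B2 construction: the slide does not prescribe how the thresholds are found, so a depth-1 decision tree is assumed here, and the conjunction used for B2 is a toy choice.

```python
# Illustrative sketch (not the authors' implementation): a B1 feature is a
# single threshold test on one input, found here with a depth-1 tree;
# a B2 feature is a conjunction of such binary tests.
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)

stump = DecisionTreeClassifier(max_depth=1).fit(X, y)
feat = stump.tree_.feature[0]    # index of the feature tested at the root
thr = stump.tree_.threshold[0]   # the learned split threshold

b1 = X[:, feat] < thr                        # B1: unrestricted projection
b2 = b1 & (X[:, 0] < np.median(X[:, 0]))     # B2: b1 ∧ (toy second test)

purity = max(y[b1].mean(), 1 - y[b1].mean()) # class purity of the b1 slice
print(f"B1: F{feat + 1} < {thr:.3f}, purity {purity:.2f}")
```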

Real features
R1: line, original features x_i with a sigmoidal transformation σ(x_i) for contrast enhancement; search for 1D patterns (k-separable intervals).
R2: line, like R1 but restricted by other features, for example z_i = x_i only for |x_j| < t_j.
R3: line, z_i = x_i as in R2, but restricted by distance.
R4: line, linear combinations z = W·X optimized by projection pursuit (PCA, ICA, QPC, ...).
P1: prototypes, weighted distance functions or specialized kernels, z_i = K(X, X_i).
M1: motifs, based on correlations between elements rather than raw input values.
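A short illustration of R1 and P1 features as I read the slide; the standardization step, the logistic sigmoid, and the use of class means as prototypes are my assumptions, not details given in the talk.

```python
# Hedged sketch of R1 and P1 feature construction (my reading of the slide):
# R1 applies a sigmoidal squashing sigma(x_i) for contrast enhancement,
# P1 adds distances to class prototypes (here simply the class means).
import numpy as np
from scipy.special import expit  # logistic sigmoid
from sklearn.datasets import load_breast_cancer

X, y = load_breast_cancer(return_X_y=True)
Xs = (X - X.mean(axis=0)) / X.std(axis=0)   # standardize before squashing

r1 = expit(Xs)                              # R1: sigma(x_i), enhanced contrast
prototypes = np.stack([Xs[y == c].mean(axis=0) for c in np.unique(y)])
p1 = np.stack([np.linalg.norm(Xs - p, axis=1) for p in prototypes], axis=1)

X_ext = np.hstack([Xs, r1, p1])             # redundant pool of candidate features
print(X_ext.shape)
```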

Datasets

Dataset        #Features   #Samples   #Samples per class
Australian         14         690     383 no, 307 yes
Appendicitis        7         106      85 C1, 21 C2
Heart              13         303     164 absence, 139 presence
Diabetes            8         768     500 C1, 268 C2
Wisconsin           9         699     458 benign, 241 malignant
Hypothyroid        21        3772      93 C1, 191 C2, 3488 C3

B1 features

Dataset        First B1 feature          Second B1 feature
Australian     F8 < 0.5                  F8 ≥ 0.5 ∧ F9 ≥ 0.5
Appendicitis   F7 ≥ …                    F7 < … ∧ F4 < 12
Heart          F13 < 4.5 ∧ F12 < 0.5     F13 ≥ 4.5 ∧ F3 ≥ 3.5
Diabetes       F2 < 123.5                F2 ≥ …
Wisconsin      F2 < 2.5                  F2 ≥ 4.5
Hypothyroid    F17 < …                   F17 ≥ … ∧ F21 < …

Examples of B1 features taken from segments of decision trees. Other features that frequently proved useful on these data: P1 prototypes and contrast-enhanced R1. Used in various learning systems, such features greatly simplify the models and increase their accuracy.
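Since the B1 features above are taken from segments of decision trees, here is a sketch of how such segments could be harvested from a fitted scikit-learn tree; the paths helper is a hypothetical utility, not the authors' code.

```python
# Sketch (assumed approach, not the authors' code): harvest candidate B1/B2
# features from a shallow decision tree by reading off root-to-leaf tests.
from sklearn.datasets import load_breast_cancer
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
tree = DecisionTreeClassifier(max_depth=2).fit(X, y).tree_

def paths(node=0, conds=()):
    """Yield the tuple of (feature, op, threshold) tests along each path."""
    if tree.children_left[node] == -1:  # -1 marks a leaf in sklearn trees
        yield conds
        return
    f, t = tree.feature[node], tree.threshold[node]
    yield from paths(tree.children_left[node], conds + ((f, "<=", t),))
    yield from paths(tree.children_right[node], conds + ((f, ">", t),))

for p in paths():
    print(" ∧ ".join(f"F{f + 1} {op} {t:.2f}" for f, op, t in p))
```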

Results: accuracy ± std (model size in parentheses)

Dataset        SVM (#SV)             SSV (#leaves)           NB
Australian     84.9±5.6 (203)        84.9±3.9 (4)            80.3±3.8
 + ULM         86.8±5.3 (166)        87.1±2.5 (4)            85.5±3.4
 features      B1(2) + P1(3)         B1(2) + R1(1) + P1(3)   B1(2)
Appendicitis   87.8±8.7 (31)         88.0±7.4 (4)            86.7±6.6
 + ULM         91.4±8.2 (18)         91.7±6.7 (3)            91.4±8.2
 features      B1(2)
Heart          82.1±6.7 (101)        76.8±9.6 (6)            84.2±6.1
 + ULM         83.4±3.5 (98)         79.2±6.3 (6)            84.5±6.8
 features      Data + R1(3)          Data + B1(2)
Diabetes       77.0±4.9 (361)        73.6±3.4 (4)            75.3±4.7
 + ULM         78.5±3.6 (338)        75.0±3.3 (3)            76.5±2.9
 features      Data + R1(3) + P1(4)  B1(2)                   Data + B1(2)
Wisconsin      96.6±1.6 (46)         95.2±1.5 (8)            96.0±1.5
 + ULM         97.2±1.8 (45)         97.4±1.6 (2)            97.2±2.0
 features      Data + R1(1) + P1(4)  R1(1)
Hypothyroid    94.1±0.6 (918)        99.7±0.5 (12)           41.3±8.3
 + ULM         99.5±0.4 (80)         99.6±0.4 (8)            98.1±0.7
 features      Data + B1(2)
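A sketch of the evaluation protocol I infer from the table: cross-validated accuracy of simple classifiers on the original data versus data extended with constructed features. SSV is not available in scikit-learn, so a plain decision tree stands in, and the P1-style extension is the toy one from the earlier sketch.

```python
# Evaluation sketch (my inference from the table, not the authors' setup):
# compare simple classifiers on original vs. feature-extended data via CV.
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)

# Toy P1-style extension: distances to the two class means.
protos = np.stack([X[y == c].mean(axis=0) for c in np.unique(y)])
p1 = np.stack([np.linalg.norm(X - p, axis=1) for p in protos], axis=1)
X_ext = np.hstack([X, p1])

models = [("SVM", SVC()), ("Tree", DecisionTreeClassifier()), ("NB", GaussianNB())]
for name, clf in models:
    for label, data in [("data", X), ("data + P1", X_ext)]:
        s = cross_val_score(clf, data, y, cv=10)
        print(f"{name:5s} {label:10s} {s.mean():.3f} ± {s.std():.3f}")
```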

Conclusions
Systematic exploration of feature transformations allows the discovery of simple models that more sophisticated learning systems may miss; results always improve and models become simpler.
Some benchmark problems turn out to be rather trivial: they are solved with a single binary feature, one constrained nominal feature, or one new feature constructed as a projection on the line connecting the means of two classes.
Kernel-based features offer an attractive alternative to current kernel-based SVM approaches, adding multiresolution and adaptive regularization possibilities, and can be combined with LDA or SVNT.
Analysis of images, multimedia streams or biosequences may require even more sophisticated ways of constructing features, starting from the available input features.
Learn from others, not only from your own errors!