Feature selection with Neural Networks Dmitrij Lagutin, T-61.6040 - Variable Selection for Regression 24.10.2006.

Feature selection with Neural Networks Dmitrij Lagutin, T-61.6040 Variable Selection for Regression

Contents
– Introduction
– Model independent feature selection
– Feature selection with neural networks
– Experimental comparison between different methods

Introduction
Feature selection usually consists of the following:
– A feature evaluation criterion to evaluate and select variable subsets
– A search procedure to explore a subspace of possible variable combinations
– A stop criterion or a model selection strategy

Model independent feature selection
These methods are not neural network oriented; they are mentioned here because they are used in the experimental comparison. They do not take the classification or regression model into account during variable selection.

Model independent feature selection
The Bonnlander method utilizes mutual information.
– The mutual information of variables a and b, with probability densities P(a) and P(b) and joint density P(a, b), is I(a, b) = ∫∫ P(a, b) log [ P(a, b) / (P(a) P(b)) ] da db
– It is a forward search: it selects the variable x_p that maximises the mutual information I(SV_{p-1} ∪ {x_p}, Y) between the candidate variable set and the output Y (a rough code sketch follows below)
– SV_{p-1} is the set of the p-1 already selected variables
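The sketch below illustrates this forward search, but it is not Bonnlander's original implementation: the original relies on nonparametric density estimation, whereas here the mutual information between the selected set and the output is estimated very crudely by discretising the variables, and the helper names (discretize, joint_code, forward_mi_selection) are purely illustrative.

```python
import numpy as np
from sklearn.metrics import mutual_info_score

def discretize(col, n_bins=8):
    """Map a continuous column to integer bin labels (equal-frequency bins)."""
    edges = np.quantile(col, np.linspace(0, 1, n_bins + 1)[1:-1])
    return np.digitize(col, edges)

def joint_code(X_disc):
    """Encode several discretised columns as one integer label per sample."""
    codes = np.zeros(X_disc.shape[0], dtype=np.int64)
    for col in X_disc.T:
        codes = codes * (int(col.max()) + 1) + col
    return codes

def forward_mi_selection(X, y_disc, n_select, n_bins=8):
    """Greedily add the variable x_p maximising the estimated I(SV_{p-1} U {x_p}; Y)."""
    X_disc = np.column_stack([discretize(X[:, j], n_bins) for j in range(X.shape[1])])
    selected, remaining = [], list(range(X.shape[1]))
    while len(selected) < n_select and remaining:
        # Score each candidate by the plug-in MI of the joint (selected + candidate) code with Y
        scores = {j: mutual_info_score(joint_code(X_disc[:, selected + [j]]), y_disc)
                  for j in remaining}
        best = max(scores, key=scores.get)
        selected.append(best)
        remaining.remove(best)
    return selected
```

Note that this histogram estimate degrades quickly as the selected set grows (the joint cells become sparse), which is why the original method uses a proper density estimator.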

Model independent feature selection Stepdisc is a stepwise feature selection method for classification

Feature selection with neural networks
Feature selection with neural networks mostly uses backward search: in the beginning all variables are present, and unnecessary variables are eliminated. Neural networks are usually non-linear models, so methods that assume a linear dependency between the input and output variables are not suited for neural networks.

Feature selection with neural networks
Feature selection algorithms using neural networks can be classified according to the following criteria:
– Zero order methods, which use only the network parameter values
– First order methods, which use first derivatives of the network parameters
– Second order methods, which use second derivatives of the network parameters

Zero order methods
Yacoub and Bennani have proposed a method with the following evaluation criterion, which uses both the weights and the network structure (I, H and O denote the input, hidden and output layers):
S_i = Σ_{o∈O} Σ_{h∈H} ( |w_{ih}| / Σ_{i'∈I} |w_{i'h}| ) · ( |w_{ho}| / Σ_{h'∈H} |w_{h'o}| )
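Assuming the criterion is reconstructed correctly above, the relevance scores can be computed directly from the two weight matrices of a one-hidden-layer network. This is a minimal sketch (bias weights are ignored, and the function name is illustrative):

```python
import numpy as np

def yacoub_bennani_relevance(W1, W2):
    """Zero order relevance of each input for a one-hidden-layer MLP.

    W1: (n_inputs, n_hidden) input-to-hidden weights
    W2: (n_hidden, n_outputs) hidden-to-output weights
    """
    A = np.abs(W1) / np.abs(W1).sum(axis=0, keepdims=True)  # |w_ih| / sum_i' |w_i'h|
    B = np.abs(W2) / np.abs(W2).sum(axis=0, keepdims=True)  # |w_ho| / sum_h' |w_h'o|
    return (A @ B).sum(axis=1)                              # S_i = sum_o sum_h A[i,h] * B[h,o]
```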

Zero order methods
This method uses a backward search, and the neural network is retrained after each variable deletion. The stop criterion is based on the evolution of the performance on a validation set: as soon as the performance decreases, the elimination is stopped. A simplified sketch of this loop follows below.
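The following sketch shows the backward search with retraining and a validation-based stop, using scikit-learn's MLPClassifier as a stand-in network; relevance_fn(net) is any function returning one relevance score per current input (for instance the zero order criterion from the previous sketch applied to net.coefs_).

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

def backward_selection(X_tr, y_tr, X_val, y_val, relevance_fn):
    """Backward search: drop the least relevant variable, retrain, and stop as
    soon as the validation performance decreases."""
    kept = list(range(X_tr.shape[1]))
    fit = lambda cols: MLPClassifier(hidden_layer_sizes=(10,), max_iter=2000).fit(
        X_tr[:, cols], y_tr)
    net = fit(kept)
    best = net.score(X_val[:, kept], y_val)
    while len(kept) > 1:
        worst = kept[int(np.argmin(relevance_fn(net)))]   # least relevant current input
        candidate = [j for j in kept if j != worst]
        new_net = fit(candidate)                          # retrain after each deletion
        score = new_net.score(X_val[:, candidate], y_val)
        if score < best:                                  # stop when validation performance drops
            break
        kept, net, best = candidate, new_net, score
    return kept
```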

First order methods
First order methods evaluate the relevance of a variable by computing the derivative of the error or of the output with respect to that variable. The method proposed by Moody and Utans uses the variation of the learning error as evaluation criterion:
S_i = ASE(x̄_i) − ASE
where ASE is the average squared error of the trained network over the N training examples and ASE(x̄_i) is the same error computed when variable x_i is replaced everywhere by its mean x̄_i.
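A minimal sketch of this un-approximated criterion, assuming a fitted regression network net exposing a predict method: each variable is clamped to its training-set mean and the increase in the average squared error is measured.

```python
import numpy as np

def moody_utans_sensitivity(net, X, y):
    """S_i = ASE with variable i replaced by its mean, minus the ASE of the trained network."""
    base_ase = np.mean((y - net.predict(X)) ** 2)
    sens = np.empty(X.shape[1])
    for i in range(X.shape[1]):
        X_mod = X.copy()
        X_mod[:, i] = X[:, i].mean()      # clamp variable i to its training mean
        sens[i] = np.mean((y - net.predict(X_mod)) ** 2) - base_ase
    return sens
```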

First order methods Because the computation of S i is difficult for large values of N, S i can be approximated

Comparison of first order methods
There are several first order methods that use output derivatives; they mainly differ in the derivative used. The next slide shows a comparison of these methods, and a generic sketch of an output-derivative criterion follows below.
– C/R describes the tasks on which the method can be used: C = classification, R = regression
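As an illustration of the output-derivative family (in the spirit of criteria such as Ruck's, but not any specific method from the comparison), an input can be scored by the average absolute derivative of the network output with respect to that input; the finite-difference estimate, the aggregation by the mean and the function name are all assumptions of this sketch.

```python
import numpy as np

def output_derivative_relevance(predict, X, eps=1e-4):
    """Average absolute sensitivity of the network output to each input,
    estimated with a forward finite difference over the data set."""
    base = predict(X)
    sens = np.empty(X.shape[1])
    for i in range(X.shape[1]):
        X_eps = X.copy()
        X_eps[:, i] += eps                 # perturb input i only
        sens[i] = np.mean(np.abs((predict(X_eps) - base) / eps))
    return sens
```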

Experiments

Second order methods
Second order methods use second derivatives of the network parameters. The Optimal Cell Damage method was proposed by Cibas; its evaluation criterion is
S_i = Σ_{j ∈ fan-out(i)} (1/2) (∂²E/∂w_j²) w_j²
where fan-out(i) is the set of weights leaving input i.
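Given the criterion as reconstructed above, the input saliencies reduce to a row-wise sum over the input-to-hidden weights once a diagonal Hessian estimate is available; the sketch below assumes that diagonal has already been computed (for example with an Optimal Brain Damage style approximation).

```python
import numpy as np

def ocd_saliency(W1, H1_diag):
    """Optimal Cell Damage saliency of each input.

    W1:      (n_inputs, n_hidden) input-to-hidden weights
    H1_diag: diagonal second derivatives of the error w.r.t. those same weights
             (assumed precomputed)
    The fan-out of input i is its row of W1, hence the row-wise sum.
    """
    return 0.5 * (H1_diag * W1 ** 2).sum(axis=1)
```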

Second order methods
The Early Cell Damage method is similar in spirit; Leray has proposed a modified evaluation criterion for it.

Experimental comparison between different methods
The neural networks used in the comparison are multilayer perceptrons with one hidden layer containing 10 neurons. The first problem is a three-class waveform classification problem with 21 noisy, dependent features (Breiman et al. 1984). In the first example, 19 pure noise variables were added to the 21 initial variables, so there were 40 input variables in total.
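A small sketch of this first setup, appending pure-noise columns to the 21 waveform features; the standard normal distribution for the noise is an assumption of the sketch, since the slide does not state it.

```python
import numpy as np

def add_noise_features(X, n_noise=19, seed=0):
    """Append pure-noise columns to a feature matrix (21 + 19 = 40 inputs in example 1)."""
    rng = np.random.default_rng(seed)
    noise = rng.standard_normal((X.shape[0], n_noise))  # assumed N(0, 1) noise
    return np.hstack([X, noise])
```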

Experimental comparison between different methods
Results table for the first example, with columns Method, p (number of selected variables), Selected variables and Performance, for the methods None, Stepdisc, Bonnlander, Yacoub, Moody, Ruck, Dorizzi, Czernichow, Cibas and Leray. [The numeric entries of the table are not preserved in this transcript.]

Experimental comparison between different methods
In the first example all methods removed the pure noise variables. The Bonnlander and Stepdisc methods performed quite well. The Ruck, Dorizzi and Czernichow methods did not remove enough variables, while the Cibas method removed too many variables.

Experimental comparison between different methods
In the second example the problem is the same, but now only the original 21 variables are present. The Leray method performed very well; the Yacoub method removed too few variables, while the Bonnlander and Czernichow methods removed too many variables and had poor performance.

Experimental comparison between different methods

The second problem is a two-class problem in a 20-dimensional space; the classes are distributed according to two Gaussians. Again, the Bonnlander method removed too many variables and its performance suffered, while the Yacoub method removed too few variables. In this example the Dorizzi and Ruck methods performed quite well: they removed many variables and achieved a good performance.

Experimental comparison between different methods

Conclusions
Methods using neural networks can be divided into three categories: zero order, first order and second order methods. The best method depends on the task: for example, in the first problem the second order methods performed poorly when noise was added, but without the additional noise their performance was very good.
– Non neural network methods (Stepdisc and Bonnlander) performed well in the original example, but quite poorly in the other examples.

References
P. Leray and P. Gallinari. Feature selection with neural networks. Behaviormetrika.
Breiman, L., Friedman, J., Olshen, R., and Stone, C. (1984). Classification and Regression Trees. Wadsworth International Group.