Feature Selection for Regression Problems


Feature Selection for Regression Problems
M. Karagiannopoulos, D. Anyfantis, S. B. Kotsiantis, P. E. Pintelas
Educational Software Development Laboratory and Computers and Applications Laboratory, Department of Mathematics, University of Patras, Greece

Scope: To investigate the most suitable wrapper feature selection technique (if any) for some well-known regression algorithms.

Contents: Introduction, Feature selection techniques, Wrapper algorithms, Experiments, Conclusions.

Introduction: What is the feature subset selection problem? It occurs prior to the learning (induction) algorithm and consists of selecting the relevant features (variables) that influence the predictions of the learning algorithm.

Why is feature selection important? It may improve the performance of the learning algorithm; the learning algorithm may not scale up to the full feature set, either in sample size or in running time; it allows us to better understand the domain; and it is cheaper to collect a reduced set of features.

Characterising features. Generally, features are characterised as: Relevant: features which have an influence on the output and whose role cannot be assumed by the rest. Irrelevant: features that have no influence on the output; their values can be viewed as generated at random for each example. Redundant: a redundancy exists whenever a feature can take the role of another (perhaps the simplest way to model redundancy).
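A minimal synthetic illustration of the three kinds of features (not from the paper; the coefficients and noise levels are arbitrary choices):

```python
# Sketch: two relevant features, one irrelevant feature, one redundant feature.
import numpy as np

rng = np.random.default_rng(0)
n = 200
x1 = rng.normal(size=n)                 # relevant: drives the target
x2 = rng.normal(size=n)                 # relevant: drives the target
x3 = rng.normal(size=n)                 # irrelevant: pure noise, no influence on y
x4 = x1 + 0.01 * rng.normal(size=n)     # redundant: (nearly) a copy of x1
y = 3.0 * x1 - 2.0 * x2 + rng.normal(scale=0.1, size=n)
X = np.column_stack([x1, x2, x3, x4])
```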

Typical Feature Selection – First step: Generation. [Diagram: Original Feature Set → (1) Generation → subset → (2) Subset Evaluation → goodness of the subset → (3) Stopping Criterion → if not met, generate another subset; if met → (4) Validation.] The generation procedure produces a subset of features for evaluation. It can start with: no features, all features, or a random subset of features.

Typical Feature Selection – Second step: Subset Evaluation. Measures the goodness of the subset and compares it with the previous best subset; if the new subset is found to be better, it replaces the previous best subset.

Typical Feature Selection – Third step: Stopping Criterion. Criteria based on the generation procedure: a pre-defined number of features, or a pre-defined number of iterations. Criteria based on the evaluation function: whether the addition or deletion of a feature no longer produces a better subset, or whether an optimal subset according to the evaluation function has been reached.

Typical Feature Selection – Fourth step: Validation. This step is basically not part of the feature selection process itself; it compares the results with already established results, or with results from competing feature selection methods.
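Taken together, the four steps form a simple search loop. A minimal sketch of that loop, where generate_next_subset and goodness are hypothetical placeholders for a concrete generation procedure and evaluation function (an illustration, not the paper's implementation):

```python
# Sketch of the generation / evaluation / stopping-criterion loop described above.
def feature_selection_loop(n_features, generate_next_subset, goodness, max_iters=100):
    best_subset, best_score = frozenset(), float("-inf")
    for _ in range(max_iters):                                    # stopping criterion: iteration budget
        candidate = generate_next_subset(best_subset, n_features) # 1. generation
        score = goodness(candidate)                               # 2. subset evaluation
        if score > best_score:                                    # keep the best subset found so far
            best_subset, best_score = candidate, score
    return best_subset, best_score                                # 4. validation happens outside the loop
```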

Categorization of feature selection techniques. Feature selection methods fall into two broad groups: Filter methods, which take the set of features, attempt to trim some of them, and then hand the reduced set to the learning algorithm; and Wrapper methods, which use the accuracy of the learning algorithm itself as the evaluation measure.
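For contrast, a minimal sketch of a filter method, assuming absolute Pearson correlation with the target as the relevance score; unlike a wrapper, no learning algorithm is involved in the scoring:

```python
# Sketch of a correlation-based filter (an assumption, not the paper's method).
import numpy as np

def correlation_filter(X, y, k):
    """Keep the k features most correlated with the target, ignoring the learner."""
    scores = np.abs([np.corrcoef(X[:, j], y)[0, 1] for j in range(X.shape[1])])
    return np.argsort(scores)[::-1][:k]   # indices of the k top-ranked features
```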

Argument for wrapper methods: The estimated accuracy of the learning algorithm is the best available heuristic for measuring the value of a feature set. Different learning algorithms may perform better with different feature sets, even when trained on the same training set.

Wrapper selection algorithms (1): The simplest method is forward selection (FS). It starts with the empty set and greedily adds features one at a time (without backtracking). Backward stepwise selection (BS) starts with all features in the feature set and greedily removes them one at a time (without backtracking).
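A minimal sketch of forward selection as a wrapper, assuming scikit-learn is available; DecisionTreeRegressor is only a stand-in for the WEKA learners used in the paper, and a 10-fold cross-validated score plays the role of the evaluation measure:

```python
# Sketch of greedy forward selection (FS) with a cross-validated wrapper score.
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeRegressor

def forward_selection(X, y, cv=10):
    remaining, selected = set(range(X.shape[1])), []
    best_score = float("-inf")
    while remaining:
        # Try adding each remaining feature and keep the best single expansion.
        scored = []
        for f in remaining:
            cols = selected + [f]
            model = DecisionTreeRegressor(random_state=0)
            scored.append((cross_val_score(model, X[:, cols], y, cv=cv).mean(), f))
        score, f = max(scored)
        if score <= best_score:          # no single addition improves the score: stop (no backtracking)
            break
        selected.append(f)
        remaining.remove(f)
        best_score = score
    return selected, best_score
```

Backward stepwise selection is the mirror image: start from the full feature set and, at each step, greedily remove the feature whose removal yields the best score.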

Wrapper selection algorithms (2): Best First search starts with an empty set of features and generates all possible single-feature expansions. The subset with the highest evaluation is chosen and expanded in the same manner by adding single features (with backtracking). Best First search can be combined with forward selection (BFFS) or backward selection (BFBS). Genetic algorithm selection: a solution is typically a fixed-length binary string representing a feature subset, where the value of each position in the string represents the presence or absence of a particular feature. The algorithm is an iterative process in which each successive generation is produced by applying genetic operators such as crossover and mutation to the members of the current generation.
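A minimal sketch of genetic-algorithm feature selection over fixed-length binary strings, one bit per feature. The population size, operator rates, truncation selection, and the cross-validated stand-in fitness are illustrative assumptions, not the settings used in the paper:

```python
# Sketch of GA-based feature subset selection with one-point crossover and bit-flip mutation.
import random
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeRegressor

def fitness(bits, X, y):
    cols = [j for j, b in enumerate(bits) if b]
    if not cols:
        return float("-inf")                     # empty subsets are never selected
    model = DecisionTreeRegressor(random_state=0)
    return cross_val_score(model, X[:, cols], y, cv=5).mean()

def ga_select(X, y, pop_size=20, generations=30, p_mut=0.05, seed=0):
    random.seed(seed)
    n = X.shape[1]
    pop = [[random.randint(0, 1) for _ in range(n)] for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(pop, key=lambda b: fitness(b, X, y), reverse=True)
        parents = ranked[: pop_size // 2]                         # truncation selection
        children = []
        while len(children) < pop_size - len(parents):
            a, b = random.sample(parents, 2)
            cut = random.randrange(1, n)                          # one-point crossover
            child = a[:cut] + b[cut:]
            child = [bit ^ 1 if random.random() < p_mut else bit  # bit-flip mutation
                     for bit in child]
            children.append(child)
        pop = parents + children
    best = max(pop, key=lambda b: fitness(b, X, y))
    return [j for j, bit in enumerate(best) if bit]
```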

Experiments: For the purpose of the present study, we used 4 well-known learning algorithms (RepTree, M5rules, K*, SMOreg), the feature selection algorithms presented above, and 12 datasets from the UCI repository.

Methodology of experiments: The whole training set was divided into ten mutually exclusive and equal-sized subsets, and for each subset the learner was trained on the union of all the other subsets. The best features are selected according to the feature selection algorithm, and the performance of the subset is measured by how well it predicts the values of the test instances. This cross-validation procedure was run 10 times for each algorithm, and the average value over the ten cross-validation runs was calculated.
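A sketch of this protocol, assuming scikit-learn: 10-fold cross-validation repeated 10 times, with feature selection performed on the training folds only and performance averaged over all test folds. The selector argument (for example, the forward_selection sketch above), the stand-in learner, and the error metric are assumptions, not the paper's exact setup:

```python
# Sketch of 10x10-fold cross-validated evaluation of a wrapper feature selector.
import numpy as np
from sklearn.model_selection import RepeatedKFold
from sklearn.metrics import mean_absolute_error
from sklearn.tree import DecisionTreeRegressor

def repeated_cv_score(X, y, select_features, n_splits=10, n_repeats=10, seed=0):
    rkf = RepeatedKFold(n_splits=n_splits, n_repeats=n_repeats, random_state=seed)
    errors = []
    for train_idx, test_idx in rkf.split(X):
        cols, _ = select_features(X[train_idx], y[train_idx])  # select on training folds only
        # Assumes the selector returns at least one feature index.
        model = DecisionTreeRegressor(random_state=0)
        model.fit(X[train_idx][:, cols], y[train_idx])
        pred = model.predict(X[test_idx][:, cols])
        errors.append(mean_absolute_error(y[test_idx], pred))
    return float(np.mean(errors))
```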

Experiment with regression tree – RepTree: BS is a slightly better feature selection method (on average) than the others for RepTree.

Experiment with rule learner – M5rules: BS, BFBS and GS are the best feature selection methods (on average) for the M5rules learner.

Experiment with instance-based learner – K*: BS and BFBS are the best feature selection methods (on average) for the K* algorithm.

Experiment with SMOreg: All feature selection methods gave similar results.

Conclusions: None of the described feature selection algorithms is superior to the others across all data sets, either for a specific learning algorithm or overall. Backward selection strategies are very inefficient for large-scale datasets, which may have hundreds of original features. Forward selection wrapper methods are less able to improve the performance of a given learner, but they are less expensive in terms of computational effort and use fewer features for the induction. Genetic selection typically requires a large number of evaluations to reach a minimum.

Future Work: We will use a light filter feature selection procedure as a preprocessing step, in order to reduce the computational cost of the wrapper procedure without harming accuracy.