Feature Selection for Regression Problems


1 Feature Selection for Regression Problems
M. Karagiannopoulos, D. Anyfantis, S. B. Kotsiantis, P. E. Pintelas Educational Software Development Laboratory and Computers and Applications Laboratory Department of Mathematics, University of Patras, Greece

2 Scope To investigate the most suitable wrapper feature selection technique (if any) for some well-known regression algorithms.

3 Contents
Introduction
Feature selection techniques
Wrapper algorithms
Experiments
Conclusions

4 Introduction What is the feature subset selection problem?
It occurs prior to the learning (induction) algorithm: the selection of the relevant features (variables) that influence the prediction of the learning algorithm.

5 Why is feature selection important?
It may improve the performance of the learning algorithm.
The learning algorithm may not scale up to the size of the full feature set, either in sample or in time.
It allows us to better understand the domain.
It is cheaper to collect a reduced set of features.

6 Characterising features
Generally, features are characterised as:
Relevant: features that have an influence on the output and whose role cannot be assumed by the rest.
Irrelevant: features that have no influence on the output; their values could as well be generated at random for each example.
Redundant: a redundancy exists whenever a feature can take the role of another (perhaps the simplest way to model redundancy).

7 Typical Feature Selection – First step
[Diagram: Original feature set → (1) Generation → (2) Subset evaluation → (3) Stopping criterion → (4) Validation]
The generation step produces a subset of features for evaluation. It can start with no features, with all features, or with a random subset of features.

8 Typical Feature Selection – Second step
The evaluation step measures the goodness of the generated subset and compares it with the previous best subset; if the new subset is found better, it replaces the previous best.

9 Typical Feature Selection – Third step
The stopping criterion can be based on the generation procedure (a pre-defined number of features or a pre-defined number of iterations) or on the evaluation function (stopping when the addition or deletion of a feature no longer produces a better subset, or when an optimal subset according to the evaluation function has been reached).

10 Typical Feature Selection - Fourth step
Validation is basically not part of the feature selection process itself: the selected subset is compared with already established results or with results from competing feature selection methods.

11 Categorization of feature selection techniques
Feature selection methods fall into two broad categories:
Filter methods take the set of features, attempt to trim some of them, and then hand this reduced set to the learning algorithm.
Wrapper methods use the accuracy of the learning algorithm itself as the evaluation measure.
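A small sketch of the contrast, assuming scikit-learn and the diabetes dataset purely as placeholders (neither is used in the paper): a filter scores features from the data alone, while a wrapper scores a candidate subset by the learner's own cross-validated performance.

```python
from sklearn.datasets import load_diabetes
from sklearn.feature_selection import f_regression
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeRegressor  # placeholder learner

X, y = load_diabetes(return_X_y=True)

# Filter view: rank features with a univariate F-test, no learner involved.
f_scores, _ = f_regression(X, y)
print(f_scores.argsort()[::-1][:3])       # three highest-ranked features

# Wrapper view: evaluate an (arbitrary) candidate subset with the learner itself.
subset = [2, 8, 3]
print(cross_val_score(DecisionTreeRegressor(random_state=0),
                      X[:, subset], y, cv=10).mean())
```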

12 Argument for wrapper methods
The estimated accuracy of the learning algorithm is the best available heuristic for measuring the value of features. Different learning algorithms may perform better with different feature sets, even when trained on the same training set.

13 Wrapper selection algorithms (1)
The simplest method is forward selection (FS). It starts with the empty set and greedily adds features one at a time (without backtracking). Backward stepwise selection (BS) starts with all features in the feature set and greedily removes them one at a time (without backtracking).
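A minimal sketch of forward selection as a wrapper, assuming scikit-learn; the learner is a stand-in for the regressors used in the study, and the helper names are illustrative only.

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeRegressor  # placeholder learner


def wrapper_score(est, X, y, subset):
    """Goodness of a subset = mean cross-validated score of the learner."""
    return cross_val_score(est, X[:, subset], y, cv=10).mean()


def forward_selection(est, X, y):
    """FS: start empty, greedily add the single best feature, no backtracking.
    BS is the mirror image: start with all features and greedily remove one."""
    remaining = list(range(X.shape[1]))
    selected, best = [], -np.inf
    while remaining:
        score, feat = max((wrapper_score(est, X, y, selected + [f]), f)
                          for f in remaining)
        if score <= best:            # no single addition improves the subset
            break
        selected, best = selected + [feat], score
        remaining.remove(feat)
    return selected, best
```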

14 Wrapper selection algorithms (2)
The best-first search starts with an empty set of features and generates all possible single-feature expansions. The subset with the highest evaluation is chosen and is expanded in the same manner by adding single features (with backtracking). Best-first search can be combined with forward selection (BFFS) or backward selection (BFBS).
Genetic algorithm selection: a solution is typically a fixed-length binary string representing a feature subset; the value of each position in the string represents the presence or absence of a particular feature. The algorithm is an iterative process in which each successive generation is produced by applying genetic operators such as crossover and mutation to the members of the current generation.
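A sketch of best-first search in the forward direction (BFFS), under the same scikit-learn assumptions as above. The stale-expansion limit stands in for the usual search-termination parameter; BFBS would start from the full feature set and expand by removing features, and a genetic search is omitted for brevity.

```python
import heapq
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeRegressor  # placeholder learner


def wrapper_score(est, X, y, subset):
    if not subset:
        return -np.inf                       # the empty subset cannot be evaluated
    return cross_val_score(est, X[:, list(subset)], y, cv=10).mean()


def best_first_forward(est, X, y, max_stale=5):
    n = X.shape[1]
    start = ()                               # subsets kept as sorted tuples of indices
    heap = [(-wrapper_score(est, X, y, start), start)]   # max-heap via negated score
    visited = {start}
    best_subset, best_score, stale = start, -np.inf, 0
    while heap and stale < max_stale:
        neg, subset = heapq.heappop(heap)    # backtrack to the most promising subset
        if -neg > best_score:
            best_subset, best_score, stale = subset, -neg, 0
        else:
            stale += 1                       # count non-improving expansions
        for f in range(n):                   # generate all single-feature expansions
            child = tuple(sorted(set(subset) | {f}))
            if child not in visited:
                visited.add(child)
                heapq.heappush(heap, (-wrapper_score(est, X, y, child), child))
    return list(best_subset), best_score
```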

15 Experiments For the purpose of the present study, we used four well-known learning algorithms (RepTree, M5rules, K*, SMOreg), the presented feature selection algorithms, and 12 datasets from the UCI repository.

16 Methodology of experiments
The whole training set was divided into ten mutually exclusive and equal-sized subsets, and for each subset the learner was trained on the union of all the other subsets. The best features are selected according to the feature selection algorithm, and the performance of the subset is measured by how well it predicts the values of the test instances. This cross-validation procedure was run 10 times for each algorithm, and the average of the 10 cross-validation runs was calculated.
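A minimal sketch of this evaluation protocol with scikit-learn's RepeatedKFold; the dataset and learner are placeholders (not the UCI sets or the four learners of the study), and the selected feature subset would replace the full X.

```python
from sklearn.datasets import load_diabetes
from sklearn.model_selection import RepeatedKFold, cross_val_score
from sklearn.tree import DecisionTreeRegressor  # placeholder learner

X, y = load_diabetes(return_X_y=True)

# Ten repetitions of 10-fold cross-validation, then the average over all folds.
cv = RepeatedKFold(n_splits=10, n_repeats=10, random_state=0)
scores = cross_val_score(DecisionTreeRegressor(random_state=0), X, y, cv=cv)
print(scores.mean())
```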

17 Experiment with regression tree - RepTree
BS is, on average, a slightly better feature selection method than the others for RepTree.

18 Experiment with rule learner- M5rules
BS, BFBS and GS (genetic selection) are, on average, the best feature selection methods for the M5rules learner.

19 Experiment with instance based learner - K*
BS and BFBS are, on average, the best feature selection methods for the K* algorithm.

20 Experiment with SMOreg
All feature selection methods give similar results for SMOreg.

21 Conclusions
None of the described feature selection algorithms is superior to the others on all data sets, whether for a specific learning algorithm or overall.
Backward selection strategies are very inefficient for large-scale data sets, which may have hundreds of original features.
Forward selection wrapper methods are less able to improve the performance of a given learner, but they are less expensive in computational effort and use fewer features for the induction.
Genetic selection typically requires a large number of evaluations to reach a minimum.

22 Future Work We will use a light filter feature selection procedure as a preprocessing step in order to reduce the computational cost of the wrapping procedure without harming accuracy.

