Feature Selection
Benjamin Biesinger - Manuel Maly - Patrick Zwickl
Agenda
- Introduction: What is feature selection? What is our contribution?
- Phases: What is the sequence of actions in our solution?
- Solution: How does it work in particular?
- Results: What is returned?
- Analysis: What to do with it? What can we conclude from it?
Introduction
- Not all features of a data set are useful for classification
- A large number of attributes negatively influences the computation time
- Only the most essential features should be used for classification
- Feature selection is an approach to identifying them
- Different search strategies and evaluation methods are available - but which is the best?
- Automatic feature selection: several algorithms are run, compared and analyzed for trends (sketched below) → implemented by us
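The technique names on the later slides (ClassifierSubsetEval, RaceSearch, WrapperSubsetEval, ...) are Weka classes, so the sketches in this transcript use Weka's Java API. A minimal sketch of the idea of running several techniques and comparing their selections; the dataset path and the two evaluator/search pairs are placeholder choices, not necessarily the ones we ran:

    import java.util.Arrays;
    import weka.attributeSelection.*;
    import weka.core.Instances;
    import weka.core.converters.ConverterUtils.DataSource;

    public class AutoFeatureSelection {
        public static void main(String[] args) throws Exception {
            Instances data = DataSource.read("data/wine-quality-red.arff"); // placeholder path
            data.setClassIndex(data.numAttributes() - 1);

            // Two of the many evaluator/search pairings Weka offers
            ASEvaluation[] evals = { new CfsSubsetEval(), new InfoGainAttributeEval() };
            ASSearch[] searches = { new BestFirst(), new Ranker() };

            for (int i = 0; i < evals.length; i++) {
                AttributeSelection sel = new AttributeSelection();
                sel.setEvaluator(evals[i]);
                sel.setSearch(searches[i]);
                sel.SelectAttributes(data); // Weka's capitalised method name
                System.out.println(evals[i].getClass().getSimpleName() + " / "
                        + searches[i].getClass().getSimpleName() + " selected "
                        + Arrays.toString(sel.selectedAttributes()));
            }
        }
    }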
Phases
- Phases: (I) Meta-classification - (II) Classification
- Before: file loading & preparation (see the loading sketch below)
- Afterwards: comparison + output generation
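A minimal sketch of the loading & preparation step, assuming ARFF input read through Weka's converters (the file path is a placeholder):

    import weka.core.Instances;
    import weka.core.converters.ConverterUtils.DataSource;

    public class LoadData {
        public static void main(String[] args) throws Exception {
            // DataSource also reads CSV and other formats Weka knows
            Instances data = DataSource.read("data/tic-tac-toe.arff"); // placeholder path
            // Weka needs to know which attribute is the class; assume it is the last one
            if (data.classIndex() == -1) {
                data.setClassIndex(data.numAttributes() - 1);
            }
            System.out.println("Loaded " + data.numInstances() + " instances with "
                    + data.numAttributes() + " attributes");
        }
    }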
Solution
Results
- Tested on 3 different datasets: Tic Tac Toe, Wine Quality (red), Balance Scale
- 2 comparisons per dataset were made:
  - for each feature selection technique individually
  - between different feature selection techniques
- Is there a trend in which features are selected by most techniques?
1st Comparison
- Influence of the number of selected features on:
  - runtime
  - classification error (measured as mean absolute error, MAE)
- Both quantities are measured as sketched below
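A sketch of one way to obtain both numbers: reduce the data to the top-ranked attributes, then time a cross-validated run and read off the MAE. The InfoGain ranking, the J48 classifier, the value k = 2 and the 10 folds are illustrative assumptions, not necessarily our exact setup:

    import java.util.Random;
    import weka.attributeSelection.InfoGainAttributeEval;
    import weka.attributeSelection.Ranker;
    import weka.classifiers.Evaluation;
    import weka.classifiers.trees.J48;
    import weka.core.Instances;
    import weka.core.converters.ConverterUtils.DataSource;
    import weka.filters.Filter;
    import weka.filters.supervised.attribute.AttributeSelection;

    public class MeasureRun {
        public static void main(String[] args) throws Exception {
            Instances data = DataSource.read("data/balance-scale.arff"); // placeholder path
            data.setClassIndex(data.numAttributes() - 1);

            // Keep only the k top-ranked attributes (filter variant of attribute selection)
            Ranker ranker = new Ranker();
            ranker.setNumToSelect(2); // k = 2, chosen arbitrarily here
            AttributeSelection filter = new AttributeSelection();
            filter.setEvaluator(new InfoGainAttributeEval());
            filter.setSearch(ranker);
            filter.setInputFormat(data);
            Instances reduced = Filter.useFilter(data, filter);

            // Time a 10-fold cross-validation and report the mean absolute error
            long start = System.nanoTime();
            Evaluation eval = new Evaluation(reduced);
            eval.crossValidateModel(new J48(), reduced, 10, new Random(1));
            long runtimeMs = (System.nanoTime() - start) / 1_000_000;
            System.out.println("Runtime: " + runtimeMs + " ms, MAE: " + eval.meanAbsoluteError());
        }
    }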
1st Comparison - Result
- Only search algorithms that implement the RankedOutputSearch interface were used, as these are capable of influencing the number of features to select (see the Ranker sketch below)
- The number of selected features and the MAE are directly proportional to each other, and inversely proportional to the runtime
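Weka's Ranker is one search method that implements RankedOutputSearch; its setNumToSelect method is what lets the number of selected features be varied. A sketch, with InfoGainAttributeEval and the range of k as illustrative choices:

    import weka.attributeSelection.AttributeSelection;
    import weka.attributeSelection.InfoGainAttributeEval;
    import weka.attributeSelection.Ranker;
    import weka.core.Instances;
    import weka.core.converters.ConverterUtils.DataSource;

    public class VaryNumFeatures {
        public static void main(String[] args) throws Exception {
            Instances data = DataSource.read("data/tic-tac-toe.arff"); // placeholder path
            data.setClassIndex(data.numAttributes() - 1);

            for (int k = 1; k <= 5; k++) {
                // Ranker implements RankedOutputSearch, so k can be set explicitly
                Ranker ranker = new Ranker();
                ranker.setNumToSelect(k);

                AttributeSelection sel = new AttributeSelection();
                sel.setEvaluator(new InfoGainAttributeEval());
                sel.setSearch(ranker);
                sel.SelectAttributes(data);
                System.out.println("k=" + k + ": "
                        + sel.numberAttributesSelected() + " attributes kept");
            }
        }
    }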
2nd Comparison
- A feature selection technique consists of:
  - a search algorithm
  - an evaluation algorithm
- Not all combinations are possible (illustrated below)!
- Different feature selection techniques are compared to each other concerning:
  - runtime
  - performance (measured as MAE)
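The restriction follows from Weka's type hierarchy: subset evaluators (e.g. CfsSubsetEval) pair with subset searches such as GreedyStepwise, while single-attribute evaluators pair with the Ranker, which rejects any other kind of evaluator. A sketch with illustrative pairings:

    import weka.attributeSelection.*;
    import weka.core.Instances;
    import weka.core.converters.ConverterUtils.DataSource;

    public class Combinations {
        public static void main(String[] args) throws Exception {
            Instances data = DataSource.read("data/wine-quality-red.arff"); // placeholder path
            data.setClassIndex(data.numAttributes() - 1);

            run(new CfsSubsetEval(), new GreedyStepwise(), data);  // valid: subset eval + subset search
            run(new GainRatioAttributeEval(), new Ranker(), data); // valid: attribute eval + ranking search
            try {
                run(new CfsSubsetEval(), new Ranker(), data);      // invalid: Ranker needs an attribute evaluator
            } catch (Exception e) {
                System.out.println("Rejected combination: " + e.getMessage());
            }
        }

        static void run(ASEvaluation eval, ASSearch search, Instances data) throws Exception {
            AttributeSelection sel = new AttributeSelection();
            sel.setEvaluator(eval);
            sel.setSearch(search);
            sel.SelectAttributes(data);
            System.out.println(sel.toResultsString());
        }
    }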
2nd Comparison - Result
- Different techniques select different numbers of attributes - and, to some extent, different attributes, too
- Some techniques are slower than others; there are huge runtime differences between the search algorithms
- Some techniques select too few attributes to give acceptable results
Trend
- In all tested datasets there was a trend in which features were selected (a tally sketch follows)
- The more often a feature is selected, the bigger its influence on the output
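Such a trend can be made visible by tallying how often each attribute index appears across the selections of all techniques. A hypothetical tally; the index arrays are made-up stand-ins for real selection results:

    import java.util.Arrays;
    import java.util.List;
    import java.util.Map;
    import java.util.TreeMap;

    public class SelectionTrend {
        public static void main(String[] args) {
            // Hypothetical: each array holds the attribute indices one technique selected
            List<int[]> selections = Arrays.asList(
                    new int[] {0, 2, 4},   // technique A
                    new int[] {0, 4},      // technique B
                    new int[] {0, 1, 4});  // technique C

            // Count how many techniques picked each attribute
            Map<Integer, Integer> counts = new TreeMap<>();
            for (int[] sel : selections)
                for (int attr : sel)
                    counts.merge(attr, 1, Integer::sum);

            // Most frequently selected attributes first
            counts.entrySet().stream()
                    .sorted((a, b) -> b.getValue() - a.getValue())
                    .forEach(e -> System.out.println("attribute " + e.getKey()
                            + " selected by " + e.getValue() + " technique(s)"));
        }
    }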
Analysis
- Different feature selection techniques have different characteristics
- ClassifierSubsetEval / RaceSearch: very good classification results (see the sketch below)
- Fewer attributes mean faster classification - algorithms that select fewer features are faster, e.g. GeneticSearch
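A sketch of wiring this best-performing pairing together. RaceSearch requires a hold-out-capable subset evaluator, which ClassifierSubsetEval is; note that RaceSearch ships with older Weka releases (it later moved into a separate package), and the NaiveBayes base classifier is an illustrative choice:

    import weka.attributeSelection.AttributeSelection;
    import weka.attributeSelection.ClassifierSubsetEval;
    import weka.attributeSelection.RaceSearch;
    import weka.classifiers.bayes.NaiveBayes;
    import weka.core.Instances;
    import weka.core.converters.ConverterUtils.DataSource;

    public class BestTechnique {
        public static void main(String[] args) throws Exception {
            Instances data = DataSource.read("data/tic-tac-toe.arff"); // placeholder path
            data.setClassIndex(data.numAttributes() - 1);

            // Judge candidate subsets by the performance of an actual classifier...
            ClassifierSubsetEval eval = new ClassifierSubsetEval();
            eval.setClassifier(new NaiveBayes());
            // ...and let candidate subsets race against each other
            RaceSearch search = new RaceSearch();

            AttributeSelection sel = new AttributeSelection();
            sel.setEvaluator(eval);
            sel.setSearch(search);
            sel.SelectAttributes(data);
            System.out.println(sel.toResultsString());
        }
    }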
Lowest error rate

Dataset            | Feature selection technique       | Runtime | Mean absolute error
Tic Tac Toe        | ClassifierSubsetEval / RaceSearch | 642     | 15,25
Wine Quality (red) | ClassifierSubsetEval / RaceSearch | 3594    | 50,8
Balance Scale      | many                              | 9-34    | 21,96
Lowest runtime

Dataset            | Feature selection technique       | Runtime | Mean absolute error
Tic Tac Toe        | x / RankSearch                    | 17      | 50,85
Wine Quality (red) | WrapperSubsetEval / GeneticSearch | 1732    | 63,57
Balance Scale      | many                              | 5-34    | -
Trend

Dataset            | First            | Second           | Third
Tic Tac Toe        | Top-left-square  | Top-right-square | Top-middle-square
Wine Quality (red) | Volatile acidity | Fixed acidity    | Chlorides
Balance Scale      | Right-weight     | Right-distance   | Left-distance
Feature Selection
Benjamin Biesinger - Manuel Maly - Patrick Zwickl
Any questions?