Feature Selection
Benjamin Biesinger - Manuel Maly - Patrick Zwickl

Agenda
Introduction: What is feature selection? What is our contribution?
Phases: In what order are the steps of our solution carried out?
Solution: How does it work in detail?
Results: What is returned?
Analysis: What can be done with the results, and what can we conclude from them?

Introduction
Not all features of a data set are useful for classification, and a large number of attributes increases computation time. Only the most essential features should therefore be used for classification. Feature selection is one approach to achieving this. Different search strategies and evaluation methods are available, but which one is best?
Automatic feature selection: several algorithms are run, compared and analyzed for trends → implemented by us.

Phases
Before: file loading and preparation
Phase I: meta-classification
Phase II: classification
Afterwards: comparison and output generation
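The phase sequence can be sketched as a small pipeline. This is a hypothetical Python illustration, not the authors' implementation (the technique names on these slides are Weka classes). Several simplifications are our own assumptions: file loading is replaced by in-memory lists, phase I is read as the step where a technique picks attribute indices, a lookup-table classifier stands in for a real one, and plain error rate stands in for MAE.

```python
from collections import Counter
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical toy format: (nominal attribute values, class label).
Row = Tuple[Tuple[str, ...], str]
# A "technique" here is any function returning the attribute indices it keeps.
Selector = Callable[[List[Row]], Sequence[int]]


def project(rows: List[Row], features: Sequence[int]) -> List[Row]:
    """Keep only the selected attribute columns of every row."""
    return [(tuple(x[i] for i in features), y) for x, y in rows]


def lookup_error_rate(train: List[Row], test: List[Row]) -> float:
    """Stand-in classifier: majority class of training rows with identical
    attribute values, else the overall majority. Returns the test error rate."""
    overall = Counter(y for _, y in train).most_common(1)[0][0]
    table: Dict[Tuple[str, ...], Counter] = {}
    for x, y in train:
        table.setdefault(x, Counter())[y] += 1
    wrong = 0
    for x, y in test:
        predicted = table[x].most_common(1)[0][0] if x in table else overall
        wrong += predicted != y
    return wrong / len(test)


def run_pipeline(train: List[Row], test: List[Row], techniques: Dict[str, Selector]):
    """Phase I selects attributes, phase II classifies, then results are compared."""
    results = []
    for name, select in techniques.items():
        features = sorted(select(train))                      # phase I
        error = lookup_error_rate(project(train, features),   # phase II
                                  project(test, features))
        results.append((name, features, error))
    return sorted(results, key=lambda r: r[2])                # comparison, best first


# Usage on a toy data set where the class equals the first attribute.
train = [(("x", "p"), "x"), (("x", "q"), "x"), (("y", "p"), "y"), (("y", "q"), "y")]
test = [(("x", "q"), "x"), (("y", "p"), "y")]
techniques = {
    "first only": lambda rows: [0],
    "second only": lambda rows: [1],
    "both": lambda rows: [0, 1],
}
for name, features, error in run_pipeline(train, test, techniques):
    print(name, features, error)
```

Passing techniques as plain functions keeps the comparison loop independent of how each technique searches or evaluates.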

Solution

Results
Tested on three datasets: Tic Tac Toe, Wine Quality (red) and Balance Scale.
Two comparisons were made per dataset: one for each feature selection technique individually, and one between different feature selection techniques.
Is there a trend in which features are selected by most techniques?

1st Comparison
Influence of the number of selected features on runtime and on classification quality (measured as mean absolute error, MAE)
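For reference, the mean absolute error over n test instances, with y_i the actual and ŷ_i the predicted value, is the first formula below. For a classifier that outputs a probability p_ic for each of k classes, a common variant (to our understanding the one Weka reports for nominal classes; treat this as an assumption) compares the probabilities against the 0/1 indicator a_ic of the true class, as in the second formula. Lower values are better.

```latex
\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\lvert y_i - \hat{y}_i\rvert
\qquad
\mathrm{MAE}_{\text{class}} = \frac{1}{n\,k}\sum_{i=1}^{n}\sum_{c=1}^{k}\lvert a_{ic} - p_{ic}\rvert
```

Example with n = k = 2: an instance of class 1 predicted as (0.8, 0.2) contributes 0.2 + 0.2 = 0.4, and an instance of class 2 predicted as (0.6, 0.4) contributes 0.6 + 0.6 = 1.2, so MAE_class = 1.6 / 4 = 0.4.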

1st Comparison Result
Only search algorithms implementing the RankedOutputSearch interface were used, because only these allow the number of features to select to be set.
The number of selected features and the MAE behaved directly proportional to each other, and inversely proportional to runtime.
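Because a ranked-output search produces an ordered list of attributes, the number of selected features becomes a parameter that can be swept. The following is a hypothetical Python sketch of that experiment, not Weka code: a toy single-attribute score stands in for the real attribute evaluator, and training (resubstitution) error stands in for cross-validated MAE, so the numbers are optimistic by construction.

```python
import time
from collections import Counter, defaultdict
from typing import Callable, List, Sequence, Tuple

Row = Tuple[Tuple[str, ...], str]  # hypothetical: (nominal attribute values, class)


def single_attribute_score(rows: List[Row], i: int) -> float:
    """Toy attribute evaluator: training accuracy of predicting the class
    from attribute i alone (majority class per attribute value)."""
    by_value = defaultdict(Counter)
    for x, y in rows:
        by_value[x[i]][y] += 1
    return sum(c.most_common(1)[0][1] for c in by_value.values()) / len(rows)


def rank_attributes(rows: List[Row]) -> List[int]:
    """Ranked output: all attribute indices, best-scoring first (ties keep index order)."""
    return sorted(range(len(rows[0][0])),
                  key=lambda i: single_attribute_score(rows, i), reverse=True)


def training_error(rows: List[Row], subset: Sequence[int]) -> float:
    """Resubstitution error of a majority-per-value lookup on the subset."""
    table = defaultdict(Counter)
    for x, y in rows:
        table[tuple(x[i] for i in subset)][y] += 1
    wrong = sum(sum(c.values()) - c.most_common(1)[0][1] for c in table.values())
    return wrong / len(rows)


def sweep_top_k(rows: List[Row],
                evaluate: Callable[[List[Row], Sequence[int]], float] = training_error):
    """For k = 1..n, keep the top-k ranked attributes; record error and runtime."""
    ranking = rank_attributes(rows)
    results = []
    for k in range(1, len(ranking) + 1):
        subset = ranking[:k]
        start = time.perf_counter()
        error = evaluate(rows, subset)
        results.append((k, subset, error, time.perf_counter() - start))
    return results


rows = [(("x", "p", "z"), "x"), (("x", "p", "z"), "x"),
        (("x", "q", "z"), "x"), (("y", "q", "z"), "y")]
for k, subset, error, seconds in sweep_top_k(rows):
    print(k, subset, error, f"{seconds:.6f}s")
```

Timing only the evaluation call keeps the ranking cost, which is paid once, out of the per-k runtime.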

2nd Comparison
A feature selection technique consists of a search algorithm and an evaluation algorithm. Not all combinations are possible!
Different feature selection techniques were compared with each other with respect to runtime and performance (measured as MAE).
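Which search/evaluation pairs are valid can be expressed as a simple compatibility check. This hypothetical Python sketch models only one constraint we are confident of in Weka: a ranking search such as Ranker works with single-attribute evaluators, while subset searches such as BestFirst and GeneticSearch need subset evaluators. The real rules involve further constraints (e.g. for RaceSearch) that are not modelled here.

```python
from itertools import product
from typing import Dict, List, Tuple

# Simplified compatibility model: each evaluator scores either single
# attributes ("attribute") or whole subsets ("subset"); each search needs one kind.
EVALUATORS: Dict[str, str] = {
    "InfoGainAttributeEval": "attribute",
    "CfsSubsetEval": "subset",
    "WrapperSubsetEval": "subset",
    "ClassifierSubsetEval": "subset",
}
SEARCHES: Dict[str, str] = {
    "Ranker": "attribute",
    "BestFirst": "subset",
    "GeneticSearch": "subset",
}


def valid_techniques(evaluators: Dict[str, str],
                     searches: Dict[str, str]) -> List[Tuple[str, str]]:
    """Return every (evaluator, search) pair whose kinds match."""
    return [
        (ev, se)
        for ev, se in product(evaluators, searches)
        if evaluators[ev] == searches[se]
    ]


pairs = valid_techniques(EVALUATORS, SEARCHES)
print(len(pairs), "of", len(EVALUATORS) * len(SEARCHES), "combinations are valid")
```

Enumerating only the valid pairs up front avoids running (and failing on) incompatible techniques during the comparison.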

2nd Comparison Result
Different techniques select different numbers of attributes and, to some extent, different attributes as well.
Some techniques are slower than others, with huge runtime differences between search algorithms.
Some techniques select attributes that are insufficient to give acceptable results.

Trend
In all tested datasets there was a trend in which features were selected. A feature that is selected more often has a bigger influence on the output.
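Finding such a trend amounts to counting how many techniques selected each feature. The sketch below is a hypothetical Python illustration with abstract feature names; the example selections are made up and are not the study's results.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple


def selection_trend(selections: Dict[str, Iterable[str]]) -> List[Tuple[str, int]]:
    """Count in how many techniques each feature was selected, most frequent first."""
    counts: Counter = Counter()
    for features in selections.values():
        counts.update(set(features))  # count each feature at most once per technique
    # Sort by descending count, then by name for a deterministic order.
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))


# Made-up selections of three hypothetical techniques.
example = {
    "technique A": ["f1", "f2"],
    "technique B": ["f1", "f3"],
    "technique C": ["f1", "f2"],
}
print(selection_trend(example))
```

The first, second and third entries of this ranking correspond to the kind of per-dataset trend reported in the results.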

Analysis
Different feature selection techniques have different characteristics.
ClassifierSubsetEval / RaceSearch gave very good classification results.
Fewer attributes mean faster classification, and algorithms that select fewer features are faster, e.g. GeneticSearch.

Lowest error rate
Dataset | Feature selection technique | Runtime / mean absolute error
Tic Tac Toe | ClassifierSubsetEval / RaceSearch | 64215,25
Wine Quality (red) | ClassifierSubsetEval / RaceSearch | ,8
Balance Scale | many | 9-3421,96

Lowest runtime
Dataset | Feature selection technique | Runtime / mean absolute error
Tic Tac Toe | x / RankSearch | 1750,85
Wine Quality (red) | WrapperSubsetEval / GeneticSearch | ,57
Balance Scale | many | 5-34-

Trend
Dataset | First | Second | Third
Tic Tac Toe | Top-left square | Top-right square | Top-middle square
Wine Quality (red) | Volatile acidity | Fixed acidity | Chlorides
Balance Scale | Right weight | Right distance | Left distance

Any questions? The essential features ;) Huh? Anything missed? Thanks!