Feature Selection Methods Part-I By: Dr. Rajeev Srivastava IIT(BHU), Varanasi

Introduction

A feature is a function of one or more measurements, each of which specifies some quantifiable property of an image, and is computed so that it quantifies some significant characteristic of the object. Feature selection is the process of selecting a subset of relevant features for use in model construction; the features removed should be useless, redundant, or of the least possible use. The goal of feature selection is to find the subset of features that produces the best target detection and recognition performance while requiring the least computational effort.

Reasons for Feature Selection

Feature selection is important to target detection and recognition systems mainly for three reasons:
- First, using more features increases system complexity, yet it does not always lead to higher detection/recognition accuracy. Many features may be available to a detection/recognition system, but these features are not independent and may be correlated, and a bad feature can greatly degrade the performance of the system. Selecting a subset of good features is therefore important.
- Second, selecting many features means a more complicated model is used to approximate the training data. According to the minimum description length principle (MDLP), a simple model is better than a complex model.
- Third, using fewer features reduces the computational cost, which is important for real-time applications; it may also lead to better classification accuracy owing to the finite sample size effect.

Feature selection techniques provide three main benefits when constructing predictive models:
- Improved model interpretability
- Shorter computation times
- Enhanced generalisation by reducing overfitting

Advantages of Feature Selection

- It reduces the dimensionality of the feature space, which limits storage requirements and increases algorithm speed.
- It removes redundant, irrelevant or noisy data.
- The immediate effects for data analysis tasks are speeding up the running time of the learning algorithms, improving the data quality, and increasing the accuracy of the resulting model.
- Feature set reduction saves resources in the next round of data collection or during utilisation.
- Performance improvement: a gain in predictive accuracy.
- Data understanding: gaining knowledge about the process that generated the data, or simply visualising the data.

Taxonomy of Feature Selection

[Taxonomy diagram of feature selection methods, after the statistical pattern recognition literature; one branch of the diagram is annotated "produce the same subset on a given problem every time".]

Feature Selection Approaches

There are two basic approaches in feature selection:
- Forward selection: start with no variables and add them one by one, at each step adding the one that decreases the error the most, until any further addition does not significantly decrease the error.
- Backward selection: start with all the variables and remove them one by one, at each step removing the one that decreases the error the most (or increases it only slightly), until any further removal increases the error significantly.
To reduce overfitting, the error referred to above is the error on a validation set that is distinct from the training set; a sketch of forward selection in this form follows.
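A rough Python sketch of this greedy forward selection (not from the original slides): make_model is an assumed factory returning a fresh scikit-learn-style estimator, X_train/X_val are NumPy arrays, and tol defines what counts as a significant decrease in validation error.

import numpy as np

def forward_selection(X_train, y_train, X_val, y_val, make_model, tol=1e-4):
    """Greedy forward selection driven by error on a held-out validation set."""
    remaining = list(range(X_train.shape[1]))
    selected, best_err = [], np.inf
    while remaining:
        errs = {}
        for f in remaining:
            cols = selected + [f]
            model = make_model().fit(X_train[:, cols], y_train)
            errs[f] = 1.0 - model.score(X_val[:, cols], y_val)  # validation error
        f_best = min(errs, key=errs.get)
        if best_err - errs[f_best] <= tol:   # no significant decrease: stop
            break
        selected.append(f_best)
        remaining.remove(f_best)
        best_err = errs[f_best]
    return selected, best_err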

Schemes for Feature Selection

The relationship between a feature selection algorithm (FSA) and the inducer chosen to evaluate the usefulness of the feature selection process can take three main forms:
Filter methods: these select features based on discriminating criteria that are relatively independent of classification. The minimum redundancy maximum relevance (mRMR) method is an example of a filter method: it supplements the maximum relevance criterion with a minimum redundancy criterion, choosing additional features that are maximally dissimilar to those already identified. A minimal filter-style example follows.
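A minimal filter-style sketch, assuming scikit-learn is available: features are ranked by their mutual information with the class label, independently of any classifier. The dataset and the top-10 cutoff are illustrative only (an arbitrary cutoff being a known drawback of filters).

import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import mutual_info_classif

X, y = load_breast_cancer(return_X_y=True)              # stand-in dataset
relevance = mutual_info_classif(X, y, random_state=0)   # MI(feature; class label)
ranking = np.argsort(relevance)[::-1]                   # most relevant first
print("Top-10 features by relevance:", ranking[:10])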

Wrapper methods: these select features using the performance of the classifier itself as the evaluation criterion, training and testing the inducer on each candidate subset (a sketch follows below).
Embedded methods: the inducer has its own FSA (either explicit or implicit). Methods that induce logical conjunctions provide an example of this embedding; other traditional machine learning tools, such as decision trees or artificial neural networks, are also included in this scheme.
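A hedged wrapper-style sketch, again assuming scikit-learn: each candidate subset is scored by cross-validating the actual classifier (here k-NN, chosen arbitrarily) on that subset, so the selection is tuned to that classifier.

from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = load_breast_cancer(return_X_y=True)

def wrapper_score(subset):
    """Mean 5-fold cross-validated accuracy of k-NN on the given feature subset."""
    return cross_val_score(KNeighborsClassifier(), X[:, subset], y, cv=5).mean()

print(wrapper_score([0, 1, 2]))   # evaluate one illustrative candidate subset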

Filters vs Wrappers

Filters:
- Fast execution (+): filters generally involve a non-iterative computation on the dataset, which can execute much faster than a classifier training session.
- Generality (+): since filters evaluate the intrinsic properties of the data, rather than their interactions with a particular classifier, their results exhibit more generality: the solution will be "good" for a larger family of classifiers.
- Tendency to select large subsets (-): since the filter objective functions are generally monotonic, the filter tends to select the full feature set as the optimal solution. This forces the user to select an arbitrary cutoff on the number of features to be selected.

Wrappers:
- Accuracy (+): wrappers generally achieve better recognition rates than filters since they are tuned to the specific interactions between the classifier and the dataset.
- Ability to generalize (+): wrappers have a mechanism to avoid overfitting, since they typically use cross-validation measures of predictive accuracy.
- Slow execution (-): since the wrapper must train a classifier for each feature subset (or several classifiers if cross-validation is used), the method can become unfeasible for computationally intensive methods.
- Lack of generality (-): the solution lacks generality since it is tied to the bias of the classifier used in the evaluation function. The "optimal" feature subset will be specific to the classifier under consideration.

Naïve Method and Sequential Methods

Sequential methods:
- Begin with a single solution (feature subset) and iteratively add or remove features until some termination criterion is met.
- Bottom-up (forward method): begin with an empty set and add features.
- Top-down (backward method): begin with a full set and delete features.
- These "greedy" methods do not examine all possible subsets, so there is no guarantee of finding the optimal subset.

Naïve method:
- Sort the given d features in order of their individual probability of correct recognition.
- Select the top m features from this sorted list.
- Disadvantage: feature correlation is not considered; the best pair of features may not even contain the best individual feature.
A small sketch of the naïve ranking follows.
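An illustrative sketch of the naïve ranking, assuming scikit-learn: each feature is scored on its own by the cross-validated accuracy of a simple classifier, and the top m are kept. The classifier and m are arbitrary choices, and feature correlation is ignored, as noted above.

import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = load_breast_cancer(return_X_y=True)
m = 5
# recognition rate of each feature taken individually
scores = [cross_val_score(KNeighborsClassifier(), X[:, [j]], y, cv=5).mean()
          for j in range(X.shape[1])]
top_m = np.argsort(scores)[::-1][:m]
print("Top-m individually best features:", top_m)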

Sequential Forward Selection (SFS)

1. Start with the empty set Y0 = {∅}.
2. Select the next best feature X+ (the feature that most improves the criterion J when added to Yk).
3. Update Yk+1 = Yk + X+; k = k + 1.
4. Go to 2.

SFS performs best when the optimal subset is small. When the search is near the empty set, a large number of states can potentially be evaluated; towards the full set, the region examined by SFS is narrower, since most features have already been selected. The search space is drawn like an ellipse to emphasise the fact that there are fewer states towards the full or empty sets.
Disadvantage: once a feature is retained, it cannot be discarded (the nesting problem). A compact sketch follows.
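A compact SFS sketch in Python; J is an assumed caller-supplied criterion function over a list of feature indices (higher is better, e.g. cross-validated accuracy), and m is the desired subset size.

def sfs(n_features, J, m):
    """Sequential Forward Selection against an arbitrary criterion J."""
    Y = []                                   # Y0 = {}
    while len(Y) < m:
        candidates = [f for f in range(n_features) if f not in Y]
        x_plus = max(candidates, key=lambda f: J(Y + [f]))  # next best feature
        Y.append(x_plus)                     # Y(k+1) = Yk + X+; k = k + 1
    return Y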

Sequential Backward Selection (SBS)

1. Start with the full set Y0 = Y (all features).
2. Remove the worst feature X- (the feature whose removal least degrades the criterion J).
3. Update Yk+1 = Yk - X-; k = k + 1.
4. Go to 2.

Sequential backward elimination works in the opposite direction to SFS. SBS works best when the optimal feature subset is large, since SBS spends most of its time visiting large subsets. The main limitation of SBS is its inability to re-evaluate the usefulness of a feature after it has been discarded. A compact sketch follows.
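The mirror-image SBS sketch, with the same assumed criterion J: start from the full set and repeatedly drop the feature whose removal hurts J the least.

def sbs(n_features, J, m):
    """Sequential Backward Selection against an arbitrary criterion J."""
    Y = list(range(n_features))              # Y0 = full set
    while len(Y) > m:
        x_minus = max(Y, key=lambda f: J([g for g in Y if g != f]))  # worst feature
        Y.remove(x_minus)                    # Y(k+1) = Yk - X-; k = k + 1
    return Y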

Generalized Sequential Selection

Generalized sequential forward selection:
- Start with the empty set, X = {∅}.
- Repeatedly add the most significant m-subset of (Y - X), found through exhaustive search over all m-subsets.

Generalized sequential backward selection:
- Start with the full set, X = Y.
- Repeatedly delete the least significant m-subset of X, found through exhaustive search.
A sketch of one generalized forward step follows.
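A sketch of one generalized forward step; r plays the role of the slide's m (the size of the subset added per step), and J is the assumed criterion. The exhaustive search over all r-subsets quickly becomes expensive as r grows.

from itertools import combinations

def gsfs_step(Y, remaining, J, r):
    """Add the most significant r-subset of the remaining features to Y."""
    best = max(combinations(remaining, r), key=lambda s: J(Y + list(s)))
    return Y + list(best)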

Bidirectional Search (BDS)

BDS is a parallel implementation of SFS and SBS: SFS is performed from the empty set, and SBS is performed from the full set. To guarantee that SFS and SBS converge to the same solution:
- Features already selected by SFS are not removed by SBS.
- Features already removed by SBS are not selected by SFS.

Algorithm:
1. Start SFS with YF = {∅}.
2. Start SBS with YB = Y (the full set).
3. Select the best feature X+ and update YF(k+1) = YFk + X+; k = k + 1.
4. Remove the worst feature X- and update YB(k+1) = YBk - X-; k = k + 1.
5. Repeat steps 3-4 until YF and YB meet.
A rough sketch of this interleaving follows.
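A rough interpretation of this interleaving in code; J is the assumed criterion, the forward search only considers features SBS has not yet removed, the backward search never removes features SFS has already selected, and with no other stopping rule the two searches simply meet near the middle of the feature lattice.

def bds(n_features, J):
    """Bidirectional search: SFS grows Y_F while SBS shrinks Y_B until they meet."""
    Y_F, Y_B = [], list(range(n_features))
    while len(Y_F) < len(Y_B):
        # forward step: only consider features SBS has not already removed
        x_plus = max((f for f in Y_B if f not in Y_F), key=lambda f: J(Y_F + [f]))
        Y_F.append(x_plus)
        if len(Y_F) >= len(Y_B):
            break
        # backward step: never remove a feature SFS has already selected
        x_minus = max((f for f in Y_B if f not in Y_F),
                      key=lambda f: J([g for g in Y_B if g != f]))
        Y_B.remove(x_minus)
    return Y_F                               # at this point Y_F == Y_B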

Sequential floating selection (SFFS & SFBS) There are two floating methods Sequential floating forward selection (SFFS) starts from the empty set After each forward step, SFFS performs backward steps as long as the objective function increases Sequential floating backward selection (SFBS) starts from the full set After each backward step, SFBS performs forward steps as long as the objective function increases SFFS algorithm: 1. Y 0 ={ ∅ } 2. Select the best feature X + Update Y k+1 =Y k + X + ; =+1 3.Select the best feature X - 4.If J(Y k -x - )>J(Y k ) then { J(x)=Criterion Function} Y k+1 =Y k -x - ;k=k+1 go to step 3 Else go to step 2 (We need to do some book-keeping to avoid infinite loop)

Genetic Algorithm Feature Selection

In a GA approach, a given feature subset is represented as a binary string, a "chromosome" of length n, with a zero or one in position i denoting the absence or presence of feature i in the set (n = total number of available features). A population of chromosomes is maintained. Each chromosome is evaluated with an evaluation function to determine its "fitness", which determines how likely the chromosome is to survive and breed into the next generation. New chromosomes are created from old chromosomes by two processes:
- Crossover, where parts of two different parent chromosomes are mixed to create offspring.
- Mutation, where the bits of a single parent are randomly perturbed to create a child.
Choosing an appropriate evaluation function is an essential step for successful application of GAs to any problem domain. A toy sketch follows.
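A toy GA sketch (truncation selection, one-point crossover, bit-flip mutation); the fitness function is supplied by the caller, e.g. cross-validated accuracy minus a penalty on subset size, and all parameters are illustrative. The fitness function must be able to handle an all-zero chromosome.

import random

def ga_select(n, fitness, pop_size=20, generations=50, p_mut=0.02, seed=0):
    """Evolve binary masks of length n; return indices selected by the fittest mask."""
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n)] for _ in range(pop_size)]
    for _ in range(generations):
        parents = sorted(pop, key=fitness, reverse=True)[:pop_size // 2]  # survivors
        children = []
        while len(children) < pop_size - len(parents):
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n)                       # one-point crossover
            child = [bit ^ (rng.random() < p_mut)           # bit-flip mutation
                     for bit in a[:cut] + b[cut:]]
            children.append(child)
        pop = parents + children
    best = max(pop, key=fitness)
    return [i for i, bit in enumerate(best) if bit]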

Minimum Redundancy Maximum Relevance (mRMR) Feature Selection

This approach is based on recognizing that combinations of individually good variables do not necessarily lead to good classification. To maximize the joint dependency of the top-ranking variables on the target variable, the redundancy among them must be reduced; we therefore select maximally relevant variables while avoiding redundant ones.
- First, the mutual information (MI) between the candidate variable and the target variable is calculated (the relevance term).
- Then, the average MI between the candidate variable and the variables already selected is computed (the redundancy term).
- The entropy-based mRMR score (the higher it is for a feature, the more that feature is needed) is obtained by subtracting the redundancy from the relevance.
Both relevance and redundancy estimation are low-dimensional problems (each involves only two variables). This is much easier than directly estimating a multivariate density or mutual information in the high-dimensional space. The method only measures the quantity of redundancy between the candidate variable and the selected variables; it does not deal with the type of this redundancy. A sketch of the greedy loop follows.
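A sketch of the greedy mRMR loop, using scikit-learn's mutual-information estimators as stand-ins for the relevance and redundancy terms; the dataset and the number of features selected are illustrative.

import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import mutual_info_classif, mutual_info_regression

X, y = load_breast_cancer(return_X_y=True)
relevance = mutual_info_classif(X, y, random_state=0)    # MI(candidate; target)

selected = [int(np.argmax(relevance))]                   # start with the most relevant feature
while len(selected) < 5:
    best_f, best_score = None, -np.inf
    for f in range(X.shape[1]):
        if f in selected:
            continue
        # redundancy: average MI between candidate f and the already-selected features
        redundancy = np.mean([mutual_info_regression(X[:, [s]], X[:, f],
                                                     random_state=0)[0]
                              for s in selected])
        score = relevance[f] - redundancy                # mRMR score (difference form)
        if score > best_score:
            best_f, best_score = f, score
    selected.append(best_f)
print("mRMR-selected features:", selected)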

References

1. L. Ladha, "Feature Selection Methods and Algorithms", Department of Computer Science, Sri Ramakrishna College of Arts and Science for Women, Coimbatore, Tamil Nadu, India.
2. A. Jain and D. Zongker, "Feature Selection: Evaluation, Application and Small Sample Performance", Michigan State University, USA.
3. O. Kurşun, C. O. Şakar and O. Favorov, "Using Covariates for Improving the Minimum Redundancy Maximum Relevance Feature Selection Method".