Feature Selection
Jamshid Shanbehzadeh, Samaneh Yazdani
Department of Computer Engineering, Faculty of Engineering, Kharazmi University (Tarbiat Moallem University of Tehran)

Outline
- Part 1: Dimension Reduction
- Part 2: Feature Selection
- Part 3: Applications of Feature Selection and Software

Part 1: Dimension Reduction
- Dimension
- Feature Space
- Definition & Goals
- Curse of Dimensionality
- Research and Application
- Grouping of Dimension Reduction Methods

Part 2: Feature Selection
- Parts of the Feature Set
- Feature Selection Approaches

Part 3: Applications of Feature Selection and Software

Part 1: Dimension Reduction

Dimension Reduction: Dimension
Dimension (feature or variable): a measurement of a certain aspect of an object.
Example: two features of a person are weight and height.

Dimension Reduction: Feature Space
Feature space: an abstract space in which each pattern sample is represented as a point.

Dimension Reduction: Introduction
Data are large and high-dimensional (web documents, etc.). A large amount of resources is needed for:
- Information retrieval
- Classification tasks
- Data preservation
- etc.

Dimension Reduction: Definition & Goals
Dimensionality reduction: the study of methods for reducing the number of dimensions describing the object.
General objectives of dimensionality reduction:
- Reduce the computational cost
- Improve the quality of data for efficient data-intensive processing tasks

Dimension Reduction: Definition & Goals
[Scatter plot: Weight (kg) vs. Height (cm); Class 1: overweight, Class 2: underweight]
Dimension reduction:
- preserves information relevant to the overweight/underweight classification as much as possible
- makes classification easier
- reduces data size (2 features -> 1 feature)

Dimension Reduction: Curse of Dimensionality
As the number of dimensions increases, a fixed-size data sample becomes exponentially sparse.
Example: observe that the same data points become more and more sparse in higher dimensions.
An effective solution to the problem of the "curse of dimensionality" is dimensionality reduction.
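The sparsity claim can be made concrete with a small numerical experiment. This is an illustrative sketch, not part of the original slides (it assumes NumPy is available): the number of points stays fixed while the dimension grows, and the mean distance to the nearest neighbour increases accordingly.

```python
# Sketch: fixed sample size, growing dimension -> data become sparse.
import numpy as np

rng = np.random.default_rng(0)
n_samples = 200

for d in (1, 2, 5, 10, 50):
    X = rng.random((n_samples, d))                 # n_samples points in [0,1]^d
    diff = X[:, None, :] - X[None, :, :]           # pairwise differences
    dist = np.sqrt((diff ** 2).sum(axis=-1))       # pairwise Euclidean distances
    np.fill_diagonal(dist, np.inf)                 # ignore distance to itself
    print(d, round(dist.min(axis=1).mean(), 3))    # mean nearest-neighbour distance
```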

Dimension Reduction: Research and Application
Why has dimension reduction recently become a subject of so much research? Because of massive, high-dimensional data in:
- Knowledge discovery
- Text mining
- Web mining
- and more

Dimension Reduction: Grouping of Dimension Reduction Methods
Dimensionality reduction approaches include:
- Feature selection
- Feature extraction

Dimension Reduction: Grouping of Dimension Reduction Methods (Feature Selection)
Feature selection: the problem of choosing a small subset of features that, ideally, is necessary and sufficient to describe the target concept.
Example:
- Feature set = {X, Y}, two classes, goal: classification
- Use feature X or feature Y?
- Answer: feature X

Dimension Reduction: Grouping of Dimension Reduction Methods (Feature Selection)
Feature selection (FS) selects a subset of the original features, e.g. it keeps weight (and drops height).

Dimension Reduction: Grouping of Dimension Reduction Methods (Feature Extraction)
Feature extraction: creates new features based on transformations or combinations of the original feature set.
Example: original features {X1, X2} are transformed into a new feature.

Dimension Reduction: Grouping of Dimension Reduction Methods (Feature Extraction)
Feature extraction (FE) generates new features, e.g. a combination that preserves information from both weight and height.

Dimension Reduction: Grouping of Dimension Reduction Methods (Feature Extraction)
Feature extraction creates new features based on transformations or combinations of the original feature set, where:
- N: number of original features
- M: number of extracted features
- M < N

Dimension Reduction: Feature Selection or Feature Extraction?
It depends on the problem. Examples:
- Pattern recognition: the dimensionality reduction problem is to extract a small set of features that recovers most of the variability of the data.
- Text mining: the problem is to select a small subset of words or terms (not new features that are combinations of words or terms).
- Image compression: the problem is to find the best extracted features to describe the image.

Part 2: Feature Selection

Feature Selection
With thousands to millions of low-level features, select the most relevant ones to build better, faster, and easier-to-understand learning machines.
[Figure: a data matrix X with N samples and n features, from which m features are selected]

Feature Selection: Parts of the Feature Set (Irrelevant or Relevant)
Three disjoint categories of features:
- Irrelevant
- Weakly relevant
- Strongly relevant

Parts of the Feature Set: Irrelevant or Relevant
Goal: classification with two classes: {lion, deer}. We use some features to classify a new instance: to which class does this animal belong?

Parts of the Feature Set: Irrelevant or Relevant
Goal: classification with two classes: {lion, deer}.
- Feature 1: number of legs
Q: How many legs does it have? A: 4. Both lions and deer have four legs, so the number of legs is an irrelevant feature.

Parts of the Feature Set: Irrelevant or Relevant
Goal: classification with two classes: {lion, deer}.
- Feature 1: number of legs
- Feature 2: color
Q: What is its color? A: ... So color is an irrelevant feature (for distinguishing lions from deer).

Parts of the Feature Set: Irrelevant or Relevant
Goal: classification with two classes: {lion, deer}.
- Feature 1: number of legs
- Feature 2: color
- Feature 3: type of food
Q: What does it eat? A: Grass. So Feature 3 is a relevant feature.

Parts of the Feature Set: Irrelevant or Relevant
Goal: classification with three classes: {lion, deer, leopard}. We use some features to classify a new instance: to which class does this animal belong?

Parts of the Feature Set: Irrelevant or Relevant
Goal: classification with three classes: {lion, deer, leopard}.
- Feature 1: number of legs
Q: How many legs does it have? A: 4. So the number of legs is an irrelevant feature.

Parts of the Feature Set: Irrelevant or Relevant
Goal: classification with three classes: {lion, deer, leopard}.
- Feature 1: number of legs
- Feature 2: color
Q: What is its color? A: ... So color is a relevant feature.

Parts of the Feature Set: Irrelevant or Relevant
Goal: classification with three classes: {lion, deer, leopard}.
- Feature 1: number of legs
- Feature 2: color
- Feature 3: type of food
Q: What does it eat? A: Meat. So Feature 3 is a relevant feature.

Parts of the Feature Set: Irrelevant or Relevant
Goal: classification with three classes: {lion, deer, leopard}.
- Feature 1: number of legs
- Feature 2: color
- Feature 3: type of food
- New feature: Felidae (whether the animal belongs to the cat family)
Felidae is a weakly relevant feature. Optimal set: {color, type of food} or {color, Felidae}.

Parts of the Feature Set: Irrelevant or Relevant
Traditionally, feature selection research has focused on searching for relevant features.
[Figure: the feature set split into irrelevant and relevant features]

Parts of the Feature Set: An Example of the Problem
Data set with five Boolean features F1, ..., F5 and class C, where:
- C = F1 ∨ F2
- F3 = ¬F2, F5 = ¬F4
Optimal subset: {F1, F2} or {F1, F3}
[Table: the data set listing F1, F2, F3, F4, F5 and C for each instance]
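The claim about the optimal subset can be checked mechanically. The sketch below is illustrative and not from the slides (the consistency-style check and all names are mine): it enumerates the truth table implied by C = F1 ∨ F2, F3 = ¬F2, F5 = ¬F4 and reports a smallest feature subset that still determines C.

```python
# Sketch: verify that a two-feature subset such as {F1, F2} determines C.
from itertools import combinations, product

rows = []
for f1, f2, f4 in product([0, 1], repeat=3):
    rows.append({"F1": f1, "F2": f2, "F3": 1 - f2, "F4": f4, "F5": 1 - f4,
                 "C": f1 | f2})

def determines_class(subset):
    """True if no two rows agree on `subset` but disagree on C."""
    seen = {}
    for r in rows:
        key = tuple(r[f] for f in subset)
        if key in seen and seen[key] != r["C"]:
            return False
        seen[key] = r["C"]
    return True

features = ["F1", "F2", "F3", "F4", "F5"]
determining = [s for k in range(1, 6) for s in combinations(features, k)
               if determines_class(s)]
print(min(determining, key=len))  # prints ('F1', 'F2'); {F1, F3} is equally small
```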

Parts of the Feature Set: Irrelevant or Relevant
Definition 1 (Irrelevance), informally: irrelevance indicates that the feature is not necessary at all.
In the previous example, F4 and F5 are irrelevant.

Parts of the Feature Set: Irrelevant or Relevant
Notation: let F be the full set of features, Fi a feature, and Si = F − {Fi}.
Definition 1 (Irrelevance): a feature Fi is irrelevant iff
  for all Si' ⊆ Si : P(C | Fi, Si') = P(C | Si')
Irrelevance indicates that the feature is not necessary at all.

Parts of the Feature Set: Irrelevant or Relevant
Categories of relevant features:
- Strongly relevant
- Weakly relevant
[Figure: the feature set split into irrelevant, weakly relevant, and strongly relevant features]

Parts of the Feature Set: An Example of the Problem (recalled)
Data set with five Boolean features and class C, where C = F1 ∨ F2, F3 = ¬F2, F5 = ¬F4.
[Table: the data set listing F1, F2, F3, F4, F5 and C for each instance]

Parts of the Feature Set: Irrelevant or Relevant
Definition 2 (Strong relevance), informally: strong relevance indicates that the feature is always necessary for an optimal subset; it cannot be removed without affecting the original conditional class distribution.
In the previous example, feature F1 is strongly relevant.

Parts of the Feature Set: Irrelevant or Relevant
Definition 2 (Strong relevance): a feature Fi is strongly relevant iff
  P(C | Fi, Si) ≠ P(C | Si)
A strongly relevant feature cannot be removed without affecting the original conditional class distribution.

Parts of the Feature Set: Irrelevant or Relevant
Definition 3 (Weak relevance), informally: weak relevance suggests that the feature is not always necessary, but may become necessary for an optimal subset under certain conditions.
In the previous example, F2 and F3 are weakly relevant.

Parts of the Feature Set: Irrelevant or Relevant
Definition 3 (Weak relevance): a feature Fi is weakly relevant iff
  P(C | Fi, Si) = P(C | Si) and there exists Si' ⊂ Si such that P(C | Fi, Si') ≠ P(C | Si')
Weak relevance suggests that the feature is not always necessary, but may become necessary for an optimal subset under certain conditions.

Parts of the Feature Set: Optimal Feature Subset
Example: to determine the target concept C = g(F1, F2):
- F1 is indispensable
- One of F2 and F3 can be disposed of
- Both F4 and F5 can be discarded
Optimal subset: either {F1, F2} or {F1, F3}. The goal of feature selection is to find either of them.

Parts of the Feature Set: Optimal Feature Subset
Conclusion: an optimal subset should include all strongly relevant features, none of the irrelevant features, and a subset of the weakly relevant features (here, either {F1, F2} or {F1, F3}).
Open question: which weakly relevant features should be selected, and which removed?

Parts of the Feature Set: Redundancy
Solution: define feature redundancy.

Parts of the Feature Set: Redundancy
Redundancy: it is widely accepted that two features are redundant to each other if their values are completely correlated.
In the previous example, F2 and F3 are redundant (F3 = ¬F2).

Parts of the Feature Set: Redundancy
Markov blanket: used when one feature is correlated with a set of features.
Definition: given a feature Fi, let Mi ⊆ F with Fi ∉ Mi; Mi is said to be a Markov blanket for Fi iff
  P(F − Mi − {Fi}, C | Fi, Mi) = P(F − Mi − {Fi}, C | Mi)
The Markov blanket condition requires that Mi subsume not only the information that Fi has about C, but also about all of the other features.

Parts of the Feature Set: Redundancy
The redundancy definition further divides the weakly relevant features into redundant and non-redundant ones:
- II: weakly relevant and redundant features
- III: weakly relevant but non-redundant features
Optimal subset = strongly relevant features + weakly relevant but non-redundant features.
[Figure: the feature set partitioned into irrelevant, weakly relevant (redundant / non-redundant), and strongly relevant features]

Feature Selection: Approaches
Feature selection approaches:
- Subset evaluation
- Individual evaluation
- New framework

Feature Selection Approaches: Subset Evaluation (Feature Subset Selection)
Framework of feature selection via subset evaluation.
[Flowchart: Original feature set → Subset generation → Subset → Evaluation (goodness of the subset) → Stopping criterion (No: generate another subset / Yes: result validation)]

Subset Evaluation: Subset Generation
Subset generation produces subsets of features for evaluation. The search can start with:
- no features
- all features
- a random subset of features

Subset Evaluation: Subset Search Method, Exhaustive Search
Examine all combinations of feature subsets.
Example: {f1, f2, f3} => { {f1}, {f2}, {f3}, {f1,f2}, {f1,f3}, {f2,f3}, {f1,f2,f3} }
The search space has order O(2^d), where d is the number of features. The optimal subset is achievable, but the search is too expensive if the feature space is large. A sketch is shown below.
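A minimal sketch of exhaustive subset search. The scoring function is a hypothetical stand-in (here it just counts features so the example is self-contained); in practice it would be one of the evaluation criteria discussed in the following slides.

```python
# Sketch: enumerate all 2^d - 1 non-empty subsets and keep the best-scoring one.
from itertools import chain, combinations

def all_subsets(features):
    """Yield every non-empty subset of `features`."""
    return chain.from_iterable(combinations(features, k)
                               for k in range(1, len(features) + 1))

def exhaustive_search(features, score):
    return max(all_subsets(features), key=score)

if __name__ == "__main__":
    features = ["f1", "f2", "f3"]
    best = exhaustive_search(features, score=len)   # toy criterion: subset size
    print(best)                                     # ('f1', 'f2', 'f3')
```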

Subset Evaluation: Subset Evaluation Step
Subset evaluation measures the goodness of the subset and compares it with the previous best subset; if the new subset is better, it replaces the previous best.

Subset Evaluation: Evaluation Criteria
Each feature or feature subset needs to be evaluated for importance by a criterion. Based on the criterion functions used to search for informative features, existing feature selection algorithms can be broadly categorized as:
- Filter model
- Wrapper model
- Embedded methods
Note: different criteria may select different features.

Subset Evaluation: Filter Model
The filter approach uses the data alone to decide which features should be kept, without running the learning algorithm: it pre-selects the features and then applies the selected feature subset to the learning (e.g. clustering) algorithm.
Evaluation function ≠ classifier: the effect of the selected subset on classifier performance is ignored.

Subset Evaluation: Filter Model, Independent Criteria
Some popular independent criteria are:
- Distance measures (e.g. the Euclidean distance measure)
- Information measures (entropy, information gain, etc.)
- Dependency measures (correlation coefficient)
- Consistency measures
A ranking sketch using an information measure follows.
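As an illustration of a filter criterion, the sketch below (assuming scikit-learn is installed; the dataset and parameters are illustrative, not from the slides) ranks features by estimated mutual information with the class, using the data alone and no classifier.

```python
# Sketch: filter-style ranking with an information measure.
import numpy as np
from sklearn.datasets import load_iris
from sklearn.feature_selection import mutual_info_classif

X, y = load_iris(return_X_y=True)
scores = mutual_info_classif(X, y, random_state=0)   # one score per feature
ranking = np.argsort(scores)[::-1]                   # best feature first
print(ranking, np.round(scores[ranking], 3))
```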

Subset Evaluation: Wrapper Model
In wrapper methods, the performance of a learning algorithm is used to evaluate the goodness of selected feature subsets:
- Evaluation function = classifier
- The classifier is taken into account.

Subset Evaluation: Wrapper Model (continued)
Wrappers use a learning machine as a "black box" to score subsets of features according to their predictive power, as in the sketch below.
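A minimal wrapper sketch, assuming scikit-learn; the greedy forward search and the choice of logistic regression are illustrative and not prescribed by the slides. Each candidate subset is scored by cross-validating the classifier itself.

```python
# Sketch: greedy forward wrapper -- score subsets with the actual classifier.
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = load_iris(return_X_y=True)
clf = LogisticRegression(max_iter=1000)

selected, remaining = [], list(range(X.shape[1]))
while remaining:
    # cross-validated accuracy of every subset obtained by adding one feature
    scores = {f: cross_val_score(clf, X[:, selected + [f]], y, cv=5).mean()
              for f in remaining}
    best, best_score = max(scores.items(), key=lambda kv: kv[1])
    if selected and best_score <= cross_val_score(clf, X[:, selected], y, cv=5).mean():
        break                               # adding more features no longer helps
    selected.append(best)
    remaining.remove(best)
print(selected)
```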

Subset Evaluation: Filters vs. Wrappers (Filters)
Advantages:
- Fast execution: filters generally involve a non-iterative computation on the dataset, which can execute much faster than a classifier training session.
- Generality: since filters evaluate the intrinsic properties of the data rather than their interactions with a particular classifier, their results exhibit more generality; the solution will be "good" for a larger family of classifiers.
Disadvantages:
- The main disadvantage of the filter approach is that it totally ignores the effect of the selected feature subset on the performance of the induction algorithm.

Subset Evaluation: Filters vs. Wrappers (Wrappers)
Advantages:
- Accuracy: wrappers generally achieve better recognition rates than filters, since they are tuned to the specific interactions between the classifier and the dataset.
Disadvantages:
- Slow execution: since the wrapper must train a classifier for each feature subset (or several classifiers if cross-validation is used), the method can become infeasible for computationally intensive methods.
- Lack of generality: the solution lacks generality, since it is tied to the bias of the classifier used in the evaluation function.

Subset Evaluation: Embedded Methods
Recursive Feature Elimination (RFE) with an SVM (Guyon-Weston, US patent 7,117,188):
start from all features, train the SVM, eliminate useless feature(s), train the SVM again, and check for performance degradation; if there is none, continue eliminating, otherwise stop.
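A sketch of RFE with a linear SVM, assuming scikit-learn's RFE implementation is available (the dataset and parameters are illustrative). The estimator's weights decide which features are dropped at each round.

```python
# Sketch: recursive feature elimination driven by a linear SVM.
from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import RFE
from sklearn.svm import LinearSVC

X, y = load_breast_cancer(return_X_y=True)
svm = LinearSVC(C=1.0, max_iter=10000, dual=False)
rfe = RFE(estimator=svm, n_features_to_select=10, step=1)  # drop 1 feature per round
rfe.fit(X, y)
print(rfe.support_)   # boolean mask of the retained features
print(rfe.ranking_)   # 1 = kept; larger values were eliminated earlier
```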

Subset Evaluation: Stopping Criterion
Based on the generation procedure:
- a pre-defined number of features
- a pre-defined number of iterations
Based on the evaluation function:
- whether addition or deletion of a feature no longer produces a better subset
- whether an optimal subset according to some evaluation function has been reached

Subset Evaluation: Result Validation
Result validation is basically not part of the feature selection process itself: compare the results with already established results, or with results from competing feature selection methods.

Subset Evaluation: Advantage
A feature subset selected by this approach approximates the optimal subset:
Optimal subset = strongly relevant features + weakly relevant but non-redundant features.
[Figure: feature-set partition as before; II: weakly relevant and redundant, III: weakly relevant but non-redundant]

Subset Evaluation: Disadvantages
The high computational cost of the subset search makes the subset evaluation approach inefficient for high-dimensional data.

Feature Selection: Approaches
Feature selection approaches:
- Subset evaluation
- Individual evaluation
- New framework

Feature Selection Approaches: Individual Evaluation (Feature Weighting / Ranking)
Individual methods (feature ranking / feature weighting) evaluate each feature individually according to a criterion. They then select the features that either satisfy a condition or are top-ranked.
(By contrast, exhaustive, greedy and random searches are subset search methods, because they evaluate each candidate subset.)

Individual Evaluation: Advantage
Individual evaluation has linear time complexity in the dimensionality N, so it is efficient for high-dimensional data.

Individual Evaluation: Disadvantages
It is incapable of removing redundant features. For high-dimensional data, which may contain a large number of redundant features, this approach may produce results far from optimal.
[Figure: individual evaluation selects the strongly relevant plus all weakly relevant features, including the redundant ones]

Feature Selection: Approaches
Feature selection approaches:
- Subset evaluation
- Individual evaluation
- New framework

Feature Selection: New Framework
The new framework of feature selection is composed of two steps:
- Step 1 (relevance analysis): determine the subset of relevant features by removing the irrelevant ones.
- Step 2 (redundancy analysis): determine and eliminate redundant features from the relevant ones, producing the final subset.
A sketch of this two-step idea is shown below.
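A minimal sketch of the two-step idea using a correlation-based stand-in; this is not the exact algorithm from the references, and the thresholds, function names, and toy data are illustrative.

```python
# Sketch: step 1 removes irrelevant features, step 2 removes redundant ones.
import numpy as np

def two_step_selection(X, y, rel_thresh=0.2, red_thresh=0.95):
    def abs_corr(a, b):
        c = np.corrcoef(a, b)[0, 1]
        return 0.0 if np.isnan(c) else abs(c)

    # Step 1: relevance analysis -- keep features correlated with the class
    relevant = [i for i in range(X.shape[1]) if abs_corr(X[:, i], y) > rel_thresh]
    # Step 2: redundancy analysis -- keep a relevant feature only if it is not
    # highly correlated with a feature that has already been selected
    selected = []
    for i in sorted(relevant, key=lambda i: -abs_corr(X[:, i], y)):
        if all(abs_corr(X[:, i], X[:, j]) < red_thresh for j in selected):
            selected.append(i)
    return selected

# toy usage: feature 0 drives the class, feature 1 duplicates it, feature 2 is noise
rng = np.random.default_rng(0)
f0 = rng.random(200)
X = np.column_stack([f0, f0 * 2.0, rng.random(200)])
y = (f0 > 0.5).astype(int)
print(two_step_selection(X, y))   # expected: [0]
```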

Part 3: Applications of Feature Selection and Software

Feature Selection: Applications of Feature Selection

Applications of Feature Selection: Text Categorization (Importance)
- The Internet: an information explosion
- About 80% of information is stored in text documents: journals, web pages, e-mails, ...
- It is difficult to extract specific information with current technologies.

Applications of Feature Selection: Text Categorization
Text categorization: assigning documents to a fixed set of categories.
Example: a news article categorizer with categories such as sports, culture, health, politics, economics, vacations.

Applications of Feature Selection: Text Categorization
Documents are represented by a vector of word frequency counts whose dimension is the size of the vocabulary (i.e. each document is represented by a vocabulary-sized vector).
Typical tasks:
- Automatic sorting of documents into web directories
- Detection of spam e-mail

Applications of Feature Selection: Text Categorization
The major characteristic, and difficulty, of text categorization is the high dimensionality of the feature space.
Goal: reduce the original feature space without sacrificing categorization accuracy.
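For illustration, the sketch below (assuming scikit-learn; the toy documents, labels, and the choice of k are made up) reduces a bag-of-words representation with a chi-squared filter before any classifier is trained.

```python
# Sketch: chi-squared feature selection on a tiny bag-of-words example.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.feature_selection import SelectKBest, chi2

docs = ["the team won the match", "the election results were announced",
        "the striker scored a goal", "parliament passed the new budget"]
labels = ["sports", "politics", "sports", "politics"]

X = CountVectorizer().fit_transform(docs)         # document-term count matrix
selector = SelectKBest(chi2, k=5).fit(X, labels)  # keep the 5 most informative terms
X_reduced = selector.transform(X)
print(X.shape, "->", X_reduced.shape)
```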

Applications of Feature Selection: Image Retrieval
Importance: the rapid increase in the size and number of image collections from both civilian and military equipment.
Problem: the information cannot be accessed or used unless it is organized.
Content-based image retrieval: instead of being manually annotated with text-based keywords, images are indexed by their own visual contents (features), such as color, texture, shape, etc.
One of the biggest obstacles to making content-based image retrieval truly scalable to large image collections is still the "curse of dimensionality".

Applications of Feature Selection: Image Retrieval
Paper: "ReliefF Based Feature Selection in Content-Based Image Retrieval", A. Sarrafzadeh, Habibollah Agh Atabay, Mir Mosen Pedram, Jamshid Shanbehzadeh.
Image dataset: COIL-20, containing 1440 grayscale pictures from 20 classes of objects.

Applications of Feature Selection: Image Retrieval
In this paper, the authors use:
- Legendre moments to extract features
- the ReliefF algorithm to select the most relevant and non-redundant features
- a support vector machine to classify the images
and study the effect of the selected features on classification accuracy.
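A simplified sketch of the Relief idea behind that selection step: two classes, a single nearest hit and miss, plain NumPy. This is not the exact ReliefF variant used in the paper (ReliefF handles multiple classes and k nearest neighbours); the data are illustrative.

```python
# Sketch: basic Relief weighting -- features that separate an instance from its
# nearest miss gain weight, features that differ from its nearest hit lose weight.
import numpy as np

def relief(X, y, n_iter=100, seed=0):
    rng = np.random.default_rng(seed)
    lo, hi = X.min(axis=0), X.max(axis=0)
    X = (X - lo) / (hi - lo + 1e-12)                     # scale features to [0, 1]
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(n_iter):
        i = rng.integers(n)
        dist = np.abs(X - X[i]).sum(axis=1)
        dist[i] = np.inf
        same, other = (y == y[i]), (y != y[i])
        hit = np.argmin(np.where(same, dist, np.inf))    # nearest same-class instance
        miss = np.argmin(np.where(other, dist, np.inf))  # nearest other-class instance
        w += (np.abs(X[i] - X[miss]) - np.abs(X[i] - X[hit])) / n_iter
    return w                                             # higher weight = more relevant

rng = np.random.default_rng(1)
X = rng.random((200, 3))
y = (X[:, 0] > 0.5).astype(int)                          # only feature 0 matters
print(np.round(relief(X, y), 3))
```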

Feature Selection: What Can We Do with the Weka Software?
Weka is a piece of software, written in Java, that provides an array of machine learning tools, many of which can be used for data mining:
- Pre-processing data
- Feature selection
- Feature extraction
- Regression
- Classification
- Clustering
- Association rules
- More functions: creating random data sets, connecting to data sets in other formats, visualizing data, ...

References
[1] M. Dash and H. Liu, "Dimensionality Reduction," in Encyclopedia of Computer Science and Engineering, John Wiley & Sons, Inc.
[2] H. Liu and L. Yu, "Toward Integrating Feature Selection Algorithms for Classification and Clustering," IEEE Transactions on Knowledge and Data Engineering, vol. 17, no. 4, 2005.
[3] I. Guyon and A. Elisseeff, "An Introduction to Variable and Feature Selection," Journal of Machine Learning Research, vol. 3, pp. 1157-1182, 2003.
[4] L. Yu and H. Liu, "Efficient Feature Selection via Analysis of Relevance and Redundancy," Journal of Machine Learning Research, vol. 5, 2004.
[5] H. Liu and H. Motoda, "Computational Methods of Feature Selection," Chapman and Hall/CRC Press, 2007.
[6] I. Guyon, Lecture 2: Introduction to Feature Selection.
[7] M. Dash and H. Liu, Feature Selection for Classification.

[8] Makoto Miwa, A Survey on Incremental Feature Extraction.
[9] Lei Yu, Feature Selection and Its Application in Genomic Data Analysis.