1 Feature Selection: Algorithms and Challenges
Joint work with Yanglan Gang, Hao Wang & Xuegang Hu
Xindong Wu, University of Vermont, USA; Hefei University of Technology, China (Changjiang Scholar Chair Professor in Computer Applications)

2 My Research Background: from deduction to induction, Expert Systems (1988) through 2004 [timeline slide]

3 Outline
1. Why feature selection
2. What is feature selection
3. Components of feature selection
4. Some of my research efforts
5. Challenges in feature selection

4 1. Why Feature Selection?
• High-dimensional data often contain irrelevant or redundant features, which:
- reduce the accuracy of data mining algorithms
- slow down the mining process
- pose problems for storage and retrieval
- make the results hard to interpret

5 2. What Is Feature Selection?
• Select the most “relevant” subset of attributes according to some selection criteria.
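Stated as an optimization problem (notation ours, not from the slides): given the full feature set F, a subset-scoring criterion J, and a size budget k, feature selection looks for

    S^{*} = \operatorname*{arg\,max}_{S \subseteq F,\ |S| \le k} \; J(S)

where J may be a data-intrinsic measure (filter approach) or a learner's estimated accuracy (wrapper approach), as the following slides elaborate.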

6 Outline
1. Why feature selection
2. What is feature selection
3. Components of feature selection
4. Some of my research efforts
5. Challenges in feature selection

7 Traditional Taxonomy
• Wrapper approach: features are selected as part of the mining algorithm
• Filter approach: features are selected before the mining algorithm runs, using heuristics based on general characteristics of the data, rather than a learning algorithm, to evaluate the merit of feature subsets
• The wrapper approach is generally more accurate but also more computationally expensive.
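A minimal sketch of the two approaches, assuming scikit-learn and synthetic data (the learner, scoring function, and subset sizes are illustrative choices, not from the talk):

    # Filter vs. wrapper, sketched with scikit-learn on synthetic data.
    from sklearn.datasets import make_classification
    from sklearn.feature_selection import SelectKBest, mutual_info_classif
    from sklearn.model_selection import cross_val_score
    from sklearn.tree import DecisionTreeClassifier

    X, y = make_classification(n_samples=300, n_features=30,
                               n_informative=5, random_state=0)

    # Filter: score features from data characteristics alone and keep the
    # k best, without consulting any learning algorithm.
    X_filtered = SelectKBest(score_func=mutual_info_classif, k=5).fit_transform(X, y)

    # Wrapper: evaluate a candidate subset by the mining algorithm's own
    # cross-validated accuracy (more accurate, but one model fit per candidate).
    def wrapper_score(feature_idx):
        clf = DecisionTreeClassifier(random_state=0)
        return cross_val_score(clf, X[:, feature_idx], y, cv=5).mean()

    print(wrapper_score([0, 2, 5]))  # score one hypothetical subset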

8 Components of Feature Selection
• Feature selection is actually a search problem, with four basic components:
1. an initial subset
2. one or more selection criteria (*)
3. a search strategy (*)
4. some given stopping conditions
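The four components map directly onto a generic search loop; a hedged sketch (function and parameter names are ours):

    # Generic feature-selection search loop with the four components explicit.
    def feature_selection(all_features, initial_subset, criterion, successors, stop):
        best, best_score = initial_subset, criterion(initial_subset)  # 1. initial subset
        while not stop(best, best_score):                             # 4. stopping condition
            candidates = successors(best, all_features)               # 3. search strategy
            if not candidates:
                break
            subset = max(candidates, key=criterion)                   # 2. selection criterion
            score = criterion(subset)
            if score <= best_score:
                break  # no candidate improves on the current best
            best, best_score = subset, score
        return best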

9 Feature Selection Criteria
• Selection criteria generally use “relevance” to estimate the goodness of a selected feature subset in one way or another:
- distance measure
- information measure
- inconsistency measure
- relevance estimation
- selection criteria related to learning algorithms (wrapper approach)
• Unified frameworks for relevance have also been proposed recently.
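As one concrete instance of an information measure (the standard definition, not specific to this talk), the information gain of a feature X about the class C:

    IG(C; X) = H(C) - H(C \mid X),
    \qquad H(C) = -\sum_{c} p(c) \log p(c)

A filter method can rank features by IG and keep the top scorers.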

10 Search Strategy
• Exhaustive search: every possible subset is evaluated and the best one is chosen
- guarantees the optimal solution
- low efficiency
• A modified approach: branch and bound (B&B)
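A sketch of the exhaustive strategy (ours); branch and bound prunes this enumeration when the criterion is monotonic:

    # Exhaustive search: evaluates all 2^M - 1 non-empty subsets, so it is
    # optimal but infeasible beyond a few dozen features.
    from itertools import combinations

    def exhaustive_search(features, criterion):
        best, best_score = None, float("-inf")
        for r in range(1, len(features) + 1):
            for subset in combinations(features, r):
                score = criterion(subset)
                if score > best_score:
                    best, best_score = subset, score
        return best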

11 Search Strategy (2)
• Heuristic search: sequential search, including SFS, SFFS, SBS, and SBFS
- SFS: start with the empty attribute set, add the “best” attribute, and keep adding the “best” of the remaining attributes until maximum performance is reached (sketched below)
- SBS: start with the entire attribute set, remove the “worst” attribute, and keep removing until maximum performance is reached.
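A sketch of SFS as just described, where criterion is any subset scorer; SBS is the mirror image, starting from the full set and removing:

    # Sequential forward selection: greedily grow the subset until the
    # criterion stops improving.
    def sfs(features, criterion):
        selected, remaining = [], list(features)
        best_score = float("-inf")
        while remaining:
            f = max(remaining, key=lambda g: criterion(selected + [g]))
            score = criterion(selected + [f])
            if score <= best_score:
                break  # maximum performance reached
            selected.append(f)
            remaining.remove(f)
            best_score = score
        return selected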

12 Search Strategy (3)
• Random search proceeds in two different ways:
- inject randomness into classical sequential approaches (simulated annealing, beam search, genetic algorithms, and random-start hill climbing)
- generate the next subset entirely at random
• Randomness can help the search escape local optima; the quality of the selected subset then depends on the available resources.
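A sketch of one such strategy, random-start hill climbing (ours; the restart count and inclusion probability are illustrative):

    # Random-start hill climbing: random restarts help escape local optima.
    import random

    def random_start_hill_climb(features, criterion, restarts=10, seed=0):
        rng = random.Random(seed)
        features = list(features)
        best, best_score = None, float("-inf")
        for _ in range(restarts):
            # Random initial subset; fall back to a singleton if empty.
            subset = {f for f in features if rng.random() < 0.5} or {rng.choice(features)}
            improved = True
            while improved:
                improved = False
                for f in features:
                    candidate = subset ^ {f}  # toggle one feature in or out
                    if candidate and criterion(candidate) > criterion(subset):
                        subset, improved = candidate, True
            score = criterion(subset)
            if score > best_score:
                best, best_score = set(subset), score
        return best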

13 Outline
1. Why feature selection
2. What is feature selection
3. Components of feature selection
4. Some of my research efforts
5. Challenges in feature selection

14 RITIO: Rule Induction Two In One
• Feature selection using information gain in reverse order (a sketch of the idea follows)
• Deletes the features that are least informative
• Results are significant compared to forward selection
• [Wu et al., 1999, TKDE]
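A sketch of the deletion loop only, with scikit-learn's mutual information standing in for the paper's information gain; this is not the published RITIO algorithm:

    # Reverse-order elimination: repeatedly drop the least informative feature.
    import numpy as np
    from sklearn.feature_selection import mutual_info_classif

    def reverse_elimination(X, y, n_keep):
        keep = list(range(X.shape[1]))
        while len(keep) > n_keep:
            gains = mutual_info_classif(X[:, keep], y, random_state=0)
            keep.pop(int(np.argmin(gains)))  # delete the least informative one
        return keep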

15 Induction as Pre-processing
• Use one induction algorithm to select attributes for another induction algorithm: a decision-tree method can select for rule induction, or vice versa
• Accuracy results are not as good as expected
• Reason: feature selection normally causes information loss
• Details: [Wu, 1999, PAKDD]

16 Subspacing with Asymmetric Bagging
• For when the number of examples is less than the number of attributes
• And when the number of positive examples is smaller than the number of negative examples
• An example application: content-based image retrieval
• Details: [Tao et al., 2006, TPAMI]
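The general recipe, sketched under our own assumptions (binary labels, negatives outnumber positives, an SVM as the base learner); this is not the exact method of [Tao et al., 2006]:

    # Asymmetric bagging (balance classes per round) + random subspacing
    # (handle n << d). Predict by majority vote over the returned models.
    import numpy as np
    from sklearn.svm import SVC

    def asym_bagging_subspace(X, y, n_rounds=25, subspace=0.3, seed=0):
        rng = np.random.default_rng(seed)
        pos, neg = np.where(y == 1)[0], np.where(y == 0)[0]
        d = X.shape[1]
        models = []
        for _ in range(n_rounds):
            neg_s = rng.choice(neg, size=len(pos), replace=False)  # assumes |neg| >= |pos|
            feats = rng.choice(d, size=max(1, int(subspace * d)), replace=False)
            idx = np.concatenate([pos, neg_s])
            models.append((SVC().fit(X[np.ix_(idx, feats)], y[idx]), feats))
        return models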

17 Outline
1. Why feature selection
2. What is feature selection
3. Components of feature selection
4. Some of my research efforts
5. Challenges in feature selection

18 Challenges in Feature Selection (1)
• Dealing with ultra-high-dimensional data and feature interactions
Traditional feature selection encounters two major problems when the dimensionality runs into tens or hundreds of thousands:
1. the curse of dimensionality
2. the relative shortage of instances

19 Challenges in Feature Selection (2)
• Dealing with active instances (Liu et al., 2005)
When the dataset is huge, feature selection performed on the whole dataset is inefficient, so instance selection is necessary (a crude sketch follows):
- random sampling (pure random sampling without exploiting any data characteristics)
- active feature selection (selective sampling using data characteristics achieves better or equally good results with a significantly smaller number of instances)
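A crude stand-in for the idea (ours; Liu et al.'s selective sampling exploits richer data characteristics than the class labels used here):

    # Subsample instances first, then run any feature selection on the sample.
    # Stratifying by class is the simplest data characteristic one can exploit.
    from sklearn.model_selection import train_test_split

    def subsample_then_select(X, y, frac=0.1, seed=0):
        X_s, _, y_s, _ = train_test_split(X, y, train_size=frac,
                                          stratify=y, random_state=seed)
        return X_s, y_s  # run the feature selector on (X_s, y_s)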

20 Challenges in Feature Selection (3)
• Dealing with new data types (Liu et al., 2005)
Traditional data type: an N × M data matrix
With the growth of computer and Internet/Web technology, new data types are emerging:
- text-based data (e.g., e-mails, online news, newsgroups)
- semi-structured data (e.g., HTML, XML)
- data streams

21 Challenges in Feature Selection (4)
• Unsupervised feature selection
- Feature selection vs. classification: almost every classification algorithm involves some form of feature selection
- Subspace methods for coping with the curse of dimensionality in classification
- Subspace clustering
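One simple label-free criterion, sketched under our own assumptions (non-constant features; the thresholds are illustrative), just to make "unsupervised" concrete:

    # Keep high-variance features, then drop near-duplicates by correlation;
    # no class labels are used anywhere.
    import numpy as np

    def unsupervised_select(X, k=10, corr_thresh=0.95):
        order = np.argsort(X.var(axis=0))[::-1]      # most variable first
        corr = np.abs(np.corrcoef(X, rowvar=False))  # feature-by-feature
        keep = []
        for j in order:
            if all(corr[j, i] < corr_thresh for i in keep):
                keep.append(j)
            if len(keep) == k:
                break
        return keep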

22 Challenges in Feature Selection (5)
• Dealing with predictive-but-unpredictable attributes in noisy data
- Attribute noise is difficult to process, and removing noisy instances is dangerous
- Predictive attributes: essential to classification
- Unpredictable attributes: cannot be predicted by the class and other attributes
• Noise identification, cleansing, and measurement need special attention [Yang et al., 2004]

23 Challenges in Feature Selection (6)
• Dealing with inconsistent and redundant features
- Redundancy can indicate reliability
- Inconsistency can also indicate a problem worth handling
• Questions for researchers in Rough Set Theory: What is the purpose of feature selection? Can you really demonstrate the usefulness of reduction, in data mining accuracy or otherwise?
- Removing attributes can well result in information loss
- When the data is very noisy, removals can cause a very different data distribution
- Discretization can possibly bring new issues.

24 Concluding Remarks
• Feature selection is and will remain an important issue in data mining, machine learning, and related disciplines
• Feature selection pays a price in accuracy for its gains in efficiency
• Researchers need to keep the bigger picture in mind, not just do selection for the sake of feature selection.