Feature Grouping-Based Fuzzy-Rough Feature Selection
Richard Jensen, Neil Mac Parthaláin, Chris Cornelis

Outline
– Motivation/Feature Selection (FS)
– Rough set theory
– Fuzzy-rough feature selection
– Feature grouping
– Experimentation

The problem: too much data
– The amount of data is growing exponentially: a staggering 4300% annual growth in global data
– Therefore, there is a need for FS and other data reduction methods; the curse of dimensionality is a problem for machine learning techniques
– The complexity of the problem is vast: for FS, the search space is the powerset of the features (2^n candidate subsets for n features)

Feature selection
– Remove features that are noisy, irrelevant, or misleading
– Task: find a subset that optimises a measure of subset goodness and has small/minimal cardinality
– In rough set theory, this is a search for reducts; there is much research in this area

Rough set theory (RST)
[Diagram: for a subset of features P, a target set X is approximated from below and above by unions of the equivalence classes [x]_P]
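The approximations appear on the slide only as a diagram; for reference, the standard rough set formulation the diagram depicts is:

```latex
\underline{P}X = \{\, x \in U \mid [x]_P \subseteq X \,\},
\qquad
\overline{P}X = \{\, x \in U \mid [x]_P \cap X \neq \emptyset \,\}
```

The lower approximation collects objects certainly in X (their whole equivalence class lies inside X), while the upper approximation collects objects possibly in X.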

Rough set feature selection
– By considering more features, concepts become easier to define…

Rough set theory
Problems:
– Rough set methods (usually) require data discretisation beforehand
– Extensions require thresholds, e.g. tolerance rough sets
– There is no flexibility in the approximations: objects either belong fully to the lower (or upper) approximation, or not at all

Fuzzy-rough sets
– Extends rough set theory: fuzzy tolerance relations are used instead of crisp equivalence, and the approximations themselves are fuzzified
– Collapses to traditional RST when the data is crisp
– New definitions: fuzzy lower and upper approximations (reconstructed below; the slide shows them only as images)
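The formulas did not survive transcription; the usual fuzzy-rough definitions (in the style used in Jensen and Cornelis's work, with I a fuzzy implicator and T a t-norm) are:

```latex
\mu_{\underline{R_P}X}(x) = \inf_{y \in U} I\big(\mu_{R_P}(x,y),\, \mu_X(y)\big),
\qquad
\mu_{\overline{R_P}X}(x) = \sup_{y \in U} T\big(\mu_{R_P}(x,y),\, \mu_X(y)\big)
```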

Fuzzy-rough feature selection
– Search for reducts: minimal subsets of features that preserve the fuzzy lower approximations for all decision concepts
– Traditional approach: a greedy hill-climbing algorithm; other search techniques have also been applied (e.g. PSO)
– Problems: the complexity becomes prohibitive for large data (e.g. over several thousand features), and redundancy is not handled explicitly
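The hill-climbing search is conventionally guided by the fuzzy-rough dependency degree; in its standard formulation (not shown on the slide) it is:

```latex
\gamma_P(D) = \frac{\sum_{x \in U} \mu_{POS_P}(x)}{|U|},
\qquad
\mu_{POS_P}(x) = \sup_{X \in U/D} \mu_{\underline{R_P}X}(x)
```

A reduct is then a minimal subset P whose dependency degree equals that of the full feature set.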

Feature grouping
– Idea: we don't need to consider all features; those that are highly correlated with each other carry the same or similar information
– Therefore, we can group such features and work on a group-by-group basis
– This paper: a group-then-rank approach based on greedy hill-climbing
– Relevancy and redundancy are handled by correlation: similar features are grouped together (redundancy), and each group is internally ranked by correlation with the decision feature (relevancy)

Forming groups of features
[Diagram: pairwise correlations are calculated between features f1 … fn; a correlation measure with threshold τ groups redundant features together, and each group is internally ranked by correlation with the decision feature (relevancy)]
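A minimal sketch of this grouping step, assuming Pearson correlation as the correlation measure; the function and parameter names (form_feature_groups, tau) are illustrative, and the paper's actual measure and tie-handling may differ:

```python
import numpy as np

def form_feature_groups(X, y, tau):
    """Group features whose pairwise |correlation| exceeds tau, then rank
    each group by |correlation| with the decision feature y.
    Sketch only; the paper leaves the correlation measure configurable."""
    n_features = X.shape[1]
    # Absolute Pearson correlations between all pairs of features (redundancy).
    corr = np.abs(np.corrcoef(X, rowvar=False))
    # Each feature seeds a group of all features sufficiently correlated with it.
    groups = [set(np.flatnonzero(corr[i] >= tau)) for i in range(n_features)]
    # Relevancy: |correlation| of each feature with the decision feature.
    relevance = np.array([abs(np.corrcoef(X[:, i], y)[0, 1])
                          for i in range(n_features)])
    # Internal ranking: order each group by decreasing relevance.
    ranked = [sorted(g, key=lambda f: -relevance[f]) for g in groups]
    # Order the groups themselves by the relevance of their best member.
    ranked.sort(key=lambda g: -relevance[g[0]])
    return ranked
```

With six features and a suitable τ, this produces the kind of per-feature groups and group ordering shown in the "Simple example" slides below.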

Selecting features
[Diagram: feature subset search and selection, with a search mechanism feeding a subset evaluation and producing the selected subset(s)]

Fuzzy-rough feature grouping (FRFG)
[Algorithm shown as a flowchart; a worked walkthrough appears in the "Simple example" slides below]

Initial experimentation
Setup:
– 10 datasets (… features)
– 3 classifiers
– Stratified 5 x 10-fold cross-validation
Performance evaluated in terms of:
– Subset size
– Classification accuracy
– Execution time
FRFG compared with:
– Traditional greedy hill-climber (GHC)
– GA and PSO (200 generations, population size 40)

Results: average subset size
[Chart not captured in the transcript]

Results: classification accuracy
[Charts per classifier: JRip, IBk (k=3), …]

Results: execution times (s)
[Chart not captured in the transcript]

Conclusion
FRFG:
– Motivation: reduce computational overhead and improve the handling of redundancy
– Group-then-rank approach
– A parameter τ determines the granularity of the grouping
– Weka implementation available
Future work:
– Automatic determination of the parameter τ
– Experimentation with much larger data, other FS methods, etc.
– Clustering of features
– Unsupervised selection?

Thank you!

Simple example
– Dataset of six features; after initialisation, groups F1, F2, F3, F4, … are formed
– Within each group, rank determines relevance: e.g. f4 is more relevant than f3
– Ordering of groups: F = {F4, F1, F3, F5, F2, F6}
– A greedy hill-climber then works through the groups in this order

Simple example (continued)
– First group to be considered: F4; feature f4 is preferable over the others, so it is added to the current (initially empty) subset R
– Evaluate M(R + {f4}): if this scores better than the current best evaluation, store f4 and set the current best evaluation to M(R + {f4})
– The set of features appearing in F4, ({f1, f4, f5}), is added to the set Avoids
– The next group considered is the next one with elements that do not appear in Avoids: F1
– And so on…
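Putting the walkthrough together, a hedged sketch of the selection loop; M stands for the subset evaluation measure (e.g. the fuzzy-rough dependency degree), the Avoids bookkeeping follows the example above, and all names here are illustrative rather than taken from the paper:

```python
def frfg_select(groups, evaluate, stop_threshold=1.0):
    """Greedy group-by-group selection, following the worked example.
    groups: feature groups, each internally ranked by relevance (best first)
            and ordered by group relevance.
    evaluate: subset evaluation measure M (e.g. fuzzy-rough dependency).
    Sketch only: tie-breaking and the exact stopping rule may differ."""
    R = set()            # current subset, initially empty
    avoids = set()       # features already covered by considered groups
    best = evaluate(R)
    for group in groups:
        # Skip groups whose members have all been seen already.
        if set(group) <= avoids:
            continue
        candidate = group[0]              # top-ranked (most relevant) feature
        score = evaluate(R | {candidate})
        if score > best:                  # keep the feature only if M improves
            R.add(candidate)
            best = score
        avoids.update(group)              # mark the whole group as considered
        if best >= stop_threshold:        # e.g. full dependency reached
            break
    return R
```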