

Interval-valued Fuzzy-Rough Feature Selection in Datasets with Missing Values
Richard Jensen and Qiang Shen
Dr. Richard Jensen, Aberystwyth University, UK
Prof Qiang Shen, Aberystwyth University, UK
FUZZ-IEEE 2009

Outline
- The importance of feature selection
- Rough set theory
- Fuzzy-rough feature selection (FRFS)
- Interval-valued FRFS
- Experimentation
- Conclusion

Why dimensionality reduction/feature selection?
- Growth of information: need to manage this effectively
- Curse of dimensionality: a problem for machine learning
- Data visualisation: graphing data
[Diagram labels: High dimensional data; Dimensionality Reduction; Low dimensional data; Processing System; Intractable; Feature selection]

Feature selection
- Feature selection (FS) is a dimensionality reduction (DR) technique that preserves data semantics (the meaning of the data)
- Subset generation: forwards, backwards, random…
- Evaluation function: determines the 'goodness' of subsets
- Stopping criterion: decides when to stop the subset search
[Diagram labels: Feature set; Generation; Subset; Evaluation; Subset suitability; Stopping Criterion (Continue / Stop); Validation]

Rough set theory
Rx is the set of all points that are indiscernible from point x in terms of feature subset B.
[Diagram labels: Upper approximation; Set A; Lower approximation; Equivalence class Rx]
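
The diagram on this slide corresponds to the usual textbook definitions of rough set theory, written out below as a sketch. The universe symbol U and the notation a(x) for the value of feature a on object x are assumptions; they do not appear in the transcript.

```latex
% Equivalence class of x under feature subset B (the slide's Rx)
\[
  Rx = [x]_B = \{\, y \in U : a(y) = a(x) \ \text{for all } a \in B \,\}
\]
% Lower and upper approximations of a set A
\[
  \underline{B}A = \{\, x \in U : [x]_B \subseteq A \,\}, \qquad
  \overline{B}A = \{\, x \in U : [x]_B \cap A \neq \emptyset \,\}
\]
```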

Rough set feature selection
- Attempts to remove unnecessary or redundant features
- Evaluation: function based on the rough set concept of lower approximation
- Generation: greedy hill-climbing algorithm employed
- Stopping criterion: when the maximum evaluation value is reached
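
The three components listed on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: it assumes discrete feature values, uses the standard rough set dependency degree (the fraction of objects in the positive region) as the evaluation function, and the names `partition`, `dependency` and `greedy_reduct` are hypothetical.

```python
from collections import defaultdict


def partition(rows, features):
    """Group row indices into equivalence classes by their values on `features`."""
    classes = defaultdict(list)
    for i, row in enumerate(rows):
        classes[tuple(row[f] for f in features)].append(i)
    return list(classes.values())


def dependency(rows, labels, features):
    """Evaluation: fraction of objects whose equivalence class has a single label.

    Those objects form the union of the lower approximations of the decision
    classes (the positive region).
    """
    if not rows:
        return 0.0
    positive = 0
    for block in partition(rows, features):
        if len({labels[i] for i in block}) == 1:
            positive += len(block)
    return positive / len(rows)


def greedy_reduct(rows, labels):
    """Generation: greedy hill-climbing, adding the best single feature each step.

    Stopping criterion: the subset reaches the dependency of the full feature set.
    """
    all_features = list(range(len(rows[0])))
    target = dependency(rows, labels, all_features)
    selected, best = [], 0.0
    while best < target:
        candidates = [f for f in all_features if f not in selected]
        scores = {f: dependency(rows, labels, selected + [f]) for f in candidates}
        choice = max(candidates, key=lambda f: scores[f])
        selected.append(choice)
        best = scores[choice]
    return selected, best


# Toy data (made up): the label is fully determined by feature 1.
rows = [(0, 0, 1), (0, 1, 1), (1, 0, 0), (1, 1, 0)]
labels = ["no", "yes", "no", "yes"]
subset, score = greedy_reduct(rows, labels)
print(subset, score)
```

Greedy search of this kind is cheap but is not guaranteed to return a smallest possible subset.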

Fuzzy-rough sets
[Slide formulas: Fuzzy-rough set; Fuzzy similarity]

Fuzzy-rough sets
- Fuzzy-rough feature selection
- Evaluation: function based on the fuzzy-rough lower approximation
- Generation: greedy hill-climbing
- Stopping criterion: when maximal 'goodness' is reached (or to degree α)
- Problem #1: how to choose fuzzy similarity?
- Problem #2: how to handle missing values?
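
To make the evaluation step concrete, here is a small sketch of a dependency measure built on a fuzzy-rough lower approximation. The transcript does not preserve the slide's formulas, so every choice below is an assumption picked for illustration: a range-normalised similarity per feature, the minimum t-norm to combine features, the Łukasiewicz implicator, and crisp decision classes. The function names are hypothetical.

```python
def similarity(a, b, value_range):
    """Fuzzy similarity of two real values: 1 when equal, 0 when a full range apart."""
    if value_range == 0:
        return 1.0
    return max(0.0, 1.0 - abs(a - b) / value_range)


def implicator(a, b):
    """Lukasiewicz implicator I(a, b) = min(1, 1 - a + b)."""
    return min(1.0, 1.0 - a + b)


def fuzzy_dependency(rows, labels, features):
    """Average membership of the objects in the fuzzy positive region."""
    n = len(rows)
    ranges = {f: max(r[f] for r in rows) - min(r[f] for r in rows) for f in features}

    def sim(x, y):
        # Combine per-feature similarities with the minimum t-norm.
        return min(
            (similarity(rows[x][f], rows[y][f], ranges[f]) for f in features),
            default=1.0,
        )

    total = 0.0
    for x in range(n):
        # Lower approximation membership of x in its own decision class.
        # With crisp decisions every other class gives 0 (take y = x),
        # so this is also the positive region membership of x.
        total += min(
            implicator(sim(x, y), 1.0 if labels[y] == labels[x] else 0.0)
            for y in range(n)
        )
    return total / n


# Toy data (made up): feature 0 separates the classes, feature 1 is constant.
rows = [(0.0, 0.5), (0.1, 0.5), (0.9, 0.5), (1.0, 0.5)]
labels = ["a", "a", "b", "b"]
print(fuzzy_dependency(rows, labels, [0]), fuzzy_dependency(rows, labels, [1]))
```

Different similarity relations give different dependency values for the same data, which is what Problem #1 on the slide refers to.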

Interval-valued FRFS
[Slide formulas: IV fuzzy-rough set; IV fuzzy similarity]
- Answer #1: Model uncertainty in fuzzy similarity by interval-valued similarity

Interval-valued FRFS: missing values
- When comparing two object values for a given attribute, what should be done if at least one is missing?
- Answer #2: Model missing values via the unit interval
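
One way to read Answer #2 is sketched below: a similarity degree becomes a pair of bounds, and a comparison that involves a missing value returns the whole unit interval [0, 1], meaning that nothing is known about how similar the two objects are on that attribute. The details are assumptions made for illustration (missing values encoded as `None`, degenerate intervals for known values, the minimum t-norm applied to each bound), and the function names are hypothetical.

```python
MISSING = None  # assumed encoding of a missing attribute value


def iv_similarity(a, b, value_range):
    """Return (lower, upper) bounds on the similarity of two attribute values."""
    if a is MISSING or b is MISSING:
        # Unknown value: the similarity could be anything in the unit interval.
        return (0.0, 1.0)
    s = 1.0 if value_range == 0 else max(0.0, 1.0 - abs(a - b) / value_range)
    return (s, s)  # known values give a degenerate interval


def iv_min(intervals):
    """Combine intervals with the minimum t-norm applied to each bound."""
    lows, highs = zip(*intervals)
    return (min(lows), min(highs))


def object_similarity(x, y, ranges):
    """Interval-valued similarity of two objects over all their attributes."""
    return iv_min([iv_similarity(a, b, r) for a, b, r in zip(x, y, ranges)])


# Toy objects (made up): the second attribute of x is missing.
x = (0.2, MISSING)
y = (0.4, 0.7)
low, high = object_similarity(x, y, ranges=(1.0, 1.0))
print(low, high)
```

The known attribute caps the upper bound, while the missing one pulls the lower bound down to 0, so the width of the interval reflects how much is unknown.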

Other measures
- Boundary region
- Discernibility function

Experimentation
- Datasets corrupted with noise
- 10-fold cross validation with JRip
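
For readers unfamiliar with the protocol, 10-fold cross validation splits the data into ten disjoint folds and uses each fold once for testing while training on the other nine. The sketch below only generates the index splits; it is a generic illustration rather than the experimental code behind these slides, and `k_fold_indices` is a hypothetical helper. JRip itself is a WEKA rule learner and is not reproduced here.

```python
import random


def k_fold_indices(n, k=10, seed=0):
    """Yield (train, test) index lists for k-fold cross validation over n objects."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices into k folds of near-equal size.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [j for m, fold in enumerate(folds) if m != i for j in fold]
        yield train, test


splits = list(k_fold_indices(25, k=10))
print(len(splits), [len(test) for _, test in splits])
```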

Results: lower approximation

Results: boundary region

Results: discernibility function

Conclusion
- New approaches to fuzzy-rough feature selection based on IVFS
  - Can handle missing values effectively
  - Allows greater flexibility w.r.t. similarity relations
- Future work
  - Further investigations
  - Development and extension of other fuzzy-rough methods to handle missing values: classifiers, clusterers, etc.

WEKA implementations of all fuzzy-rough feature selectors and classifiers can be downloaded from:


RSAR approximations
- Approximating a concept X using knowledge in P
- Lower approximation: contains objects that definitely belong to X
- Upper approximation: contains objects that possibly belong to X
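
The two approximations can be computed directly from the equivalence classes induced by P. The following is a minimal sketch with made-up toy data; the function names and the attribute names `colour` and `size` are hypothetical.

```python
from collections import defaultdict


def equivalence_classes(objects, P):
    """Group object names that share the same values on every attribute in P."""
    classes = defaultdict(set)
    for name, values in objects.items():
        classes[tuple(values[a] for a in P)].add(name)
    return list(classes.values())


def approximations(objects, P, X):
    """Return the (lower, upper) approximations of concept X using attributes P."""
    lower, upper = set(), set()
    for block in equivalence_classes(objects, P):
        if block <= X:   # the whole class lies inside X: definitely in X
            lower |= block
        if block & X:    # the class overlaps X: possibly in X
            upper |= block
    return lower, upper


objects = {
    "o1": {"colour": "red", "size": "big"},
    "o2": {"colour": "red", "size": "big"},
    "o3": {"colour": "blue", "size": "big"},
    "o4": {"colour": "blue", "size": "small"},
}
X = {"o1", "o3"}
lower, upper = approximations(objects, ["colour", "size"], X)
print(sorted(lower), sorted(upper))
```

Here o1 and o2 are indiscernible, so o1 cannot be placed in X with certainty: both objects appear only in the upper approximation.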

FRFS
- Based on fuzzy similarity
- Lower/upper approximations