Fuzzy-Rough Instance Selection
Richard Jensen, Aberystwyth University, UK
Chris Cornelis, Ghent University, Belgium
Outline
- The importance of instance selection
- Rough set theory
- Fuzzy-rough sets
- Fuzzy-rough instance selection
- Experimentation
- Conclusion
Knowledge discovery
- The problem of too much data
  - Requires storage
  - Intractable for data mining algorithms
- Instance selection: removing data that is noisy or irrelevant
Rough set theory
- Rx is the set of all points that are indiscernible from point x (the equivalence class of x)
[Figure: a set A with its lower approximation and upper approximation, built from equivalence classes Rx]
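The lower and upper approximations named on this slide can be illustrated with a small sketch. It assumes the textbook crisp definitions: the lower approximation is the union of equivalence classes fully contained in A, and the upper approximation is the union of classes that overlap A. The toy objects, feature names, and function names are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def equivalence_classes(objects, features):
    """Group object ids that have identical values on the chosen features."""
    classes = defaultdict(set)
    for obj_id, values in objects.items():
        key = tuple(values[f] for f in features)
        classes[key].add(obj_id)
    return list(classes.values())


def approximations(objects, features, target):
    """Return the (lower, upper) approximations of the set `target`."""
    lower, upper = set(), set()
    for eq_class in equivalence_classes(objects, features):
        if eq_class <= target:   # class lies entirely inside the target set
            lower |= eq_class
        if eq_class & target:    # class shares at least one object with it
            upper |= eq_class
    return lower, upper


# Hypothetical toy data: x1 and x2 are indiscernible on both features.
objects = {
    "x1": {"colour": "red", "size": "small"},
    "x2": {"colour": "red", "size": "small"},
    "x3": {"colour": "blue", "size": "large"},
    "x4": {"colour": "blue", "size": "small"},
}
A = {"x1", "x3"}

lower, upper = approximations(objects, ["colour", "size"], A)
print(sorted(lower))  # ['x3']
print(sorted(upper))  # ['x1', 'x2', 'x3']
```

Because x1 and x2 cannot be told apart and only x1 is in A, their class falls in the upper approximation but not the lower one.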
Fuzzy-rough sets
- Approximate equality
- Handle real-valued features via fuzzy tolerance relations instead of crisp equivalence
- Better noise and uncertainty handling
- Focus has been on feature selection, not instance selection
Fuzzy-rough sets
- Parameterized relation
- Fuzzy-rough definitions:
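The slide's formulas are not reproduced in this text, so the following is only a sketch under stated assumptions, not necessarily the definitions the authors used. It assumes a per-feature tolerance relation of the form max(0, 1 - alpha * |a(x) - a(y)| / range(a)), combined across features with min, the Łukasiewicz implicator for the lower approximation, and the min t-norm for the upper approximation. The parameter name `alpha` and all function names are hypothetical.

```python
def similarity(x, y, ranges, alpha=1.0):
    """Fuzzy tolerance degree between two real-valued feature vectors.

    ASSUMPTION: per-feature degree max(0, 1 - alpha * |xi - yi| / range),
    combined with min. Larger alpha makes the relation stricter.
    """
    return min(
        max(0.0, 1.0 - alpha * abs(xi - yi) / r)
        for xi, yi, r in zip(x, y, ranges)
    )


def implicator(a, b):
    """Lukasiewicz implicator (an assumed choice)."""
    return min(1.0, 1.0 - a + b)


def lower_approx(x, data, member, ranges, alpha=1.0):
    """Membership of x in the fuzzy-rough lower approximation of a concept.

    member[i] is the membership of data[i] in the concept (0 or 1 if crisp).
    """
    return min(
        implicator(similarity(x, y, ranges, alpha), m)
        for y, m in zip(data, member)
    )


def upper_approx(x, data, member, ranges, alpha=1.0):
    """Membership of x in the fuzzy-rough upper approximation (min t-norm)."""
    return max(
        min(similarity(x, y, ranges, alpha), m)
        for y, m in zip(data, member)
    )


# Hypothetical one-feature data; the first two objects belong to the concept.
data = [[0.0], [0.2], [1.0]]
in_concept = [1.0, 1.0, 0.0]
ranges = [1.0]

lower_loose = lower_approx(data[1], data, in_concept, ranges, alpha=1.0)
lower_sharp = lower_approx(data[1], data, in_concept, ranges, alpha=2.0)
upper_far = upper_approx(data[2], data, in_concept, ranges, alpha=1.0)
print(lower_loose, lower_sharp, upper_far)  # roughly 0.8, 1.0, 0.2
```

With alpha = 1 the object at 0.2 is still slightly similar to the non-member at 1.0, so its lower-approximation membership drops below 1; with alpha = 2 that similarity vanishes and the membership is 1.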
Instance selection: basic idea
- Remove objects to keep the underlying approximations unchanged
[Figure: objects labelled "Not needed"]
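A minimal sketch of one way to decide which objects to drop: score each object by its membership in the fuzzy-rough lower approximation of its own class, and discard objects that score below a threshold. This is an illustrative assumption, not the FRIS-I, FRIS-II, or FRIS-III procedure, whose definitions are not reproduced in this text. The similarity relation, the Łukasiewicz implicator, and the parameters `alpha` and `tau` are all hypothetical choices.

```python
def similarity(x, y, ranges, alpha):
    """ASSUMED fuzzy tolerance relation, combined across features with min."""
    return min(
        max(0.0, 1.0 - alpha * abs(a - b) / r)
        for a, b, r in zip(x, y, ranges)
    )


def own_class_membership(i, X, labels, ranges, alpha):
    """Degree to which object i lies in the lower approximation of its class.

    Uses the Lukasiewicz implicator min(1, 1 - sim + same_class), an assumption.
    """
    return min(
        min(1.0, 1.0
            - similarity(X[i], X[j], ranges, alpha)
            + (1.0 if labels[j] == labels[i] else 0.0))
        for j in range(len(X))
    )


def select_instances(X, labels, alpha=1.0, tau=1.0):
    """Return indices of objects whose membership is at least tau."""
    # Feature ranges; fall back to 1.0 for constant features to avoid /0.
    ranges = [(max(col) - min(col)) or 1.0 for col in zip(*X)]
    return [
        i for i in range(len(X))
        if own_class_membership(i, X, labels, ranges, alpha) >= tau
    ]


# Hypothetical data: the objects at 0.45 and 0.55 sit on the class boundary.
X = [[0.0], [0.1], [0.45], [0.55], [0.9], [1.0]]
labels = ["a", "a", "a", "b", "b", "b"]

kept = select_instances(X, labels, alpha=4.0, tau=1.0)
print(kept)  # [0, 1, 4, 5]
```

Here the two boundary objects are similar to an object of the other class, so their membership falls below the threshold and they are dropped, while the four objects that sit clearly inside their class are kept.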
FRIS-I
FRIS-II
FRIS-III
Experimentation: setup
Results: FRIS-I (heart) (214 objects, 9 features)
Results: FRIS-II (heart)
Results: FRIS-III (heart)
Conclusion
- Proposed new techniques for instance selection based on fuzzy-rough sets
- Managed to reduce the number of instances significantly while retaining classification accuracy
Future work
- Many possibilities for novel fuzzy-rough instance selection methods
- Comparisons with non-rough techniques
- Improving the complexity of FRIS-III
- Combined instance/feature selection
WEKA implementations of all fuzzy-rough methods can be downloaded from: