
1 Redundant Feature Elimination for Multi-Class Problems
Annalisa Appice, Michelangelo Ceci (Dipartimento di Informatica, Università degli Studi di Bari, Italy)
Simon Rawles, Peter Flach (Department of Computer Science, University of Bristol, UK)

2 Redundant feature reduction
REFER: an efficient, scalable, logic-based method for eliminating Boolean features which are redundant for multi-class classifier learning.
– Why? Size of hypothesis space, predictive performance, model comprehensibility.
– Distinct from feature selection.

3 Overview of this talk
Redundant feature reduction
– What is feature redundancy?
– Doing multi-class reduction
Related approaches
Theoretical and experimental results
Summary
Current and future work

4 Example: Redundancy of features
A fixed number of Boolean features. One of several class labels (‘multiclass’).

5 Discriminating a against b
True values in examples of class a make the feature better for distinguishing a from b in a classification rule.

6 Discriminating a against b
False values in examples of class b make the feature better for distinguishing a from b in a rule.

7 Discriminating a against b
f2 covers f1, and f3 is useless. f1 and f3 are redundant. Negated features are not automatically considered.

8 More formally...
For discriminating class a examples from class b, f covers g if Ta(g) ⊆ Ta(f) and Fb(g) ⊆ Fb(f). A feature is redundant if another feature covers it.
Here Ta(f2) = {e1, e2}, Ta(f1) = {e1}, Fb(f2) = {e4, e5} and Fb(f1) = {e5}, so f2 covers f1 (a is the ‘positive class’ here).
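To make the coverage test concrete, here is a minimal Python sketch; the sets and example names come from the slide, while the function name, the dict representation of features, and the example lists per class are illustrative assumptions.

```python
def covers(f, g, examples_a, examples_b):
    """f covers g (for discriminating class a from class b) iff
    Ta(g) is a subset of Ta(f) and Fb(g) is a subset of Fb(f)."""
    Ta = lambda h: {e for e in examples_a if h[e]}      # where h is true on class-a examples
    Fb = lambda h: {e for e in examples_b if not h[e]}  # where h is false on class-b examples
    return Ta(g) <= Ta(f) and Fb(g) <= Fb(f)

# The slide's example: f2 covers f1, so f1 is redundant.
f1 = {"e1": True, "e2": False, "e4": True,  "e5": False}
f2 = {"e1": True, "e2": True,  "e4": False, "e5": False}
print(covers(f2, f1, ["e1", "e2"], ["e4", "e5"]))  # True: {e1} <= {e1,e2} and {e5} <= {e4,e5}
```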

9 Neighbourhoods of examples
A way to upgrade to multi-class data. Each class is partitioned into subsets of similar examples (neighbourhoods).
– REFER-N finds the non-redundant features between each neighbourhood pair of differing class in turn, building up the list of non-redundant features as it goes.
– Efficient, more reduction, logic-based.

10–25 Neighbourhood construction [Animation over sixteen slides: the examples are assigned one by one to neighbourhoods 1–5, yielding groups of similar examples with the same class label.]
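The slides do not spell out the grouping criterion, so the following is only a sketch under an assumed one: an example joins the first same-class neighbourhood whose seed is within a Hamming-distance threshold, otherwise it starts a new neighbourhood. All names and the threshold are illustrative, not REFER's actual rule.

```python
def build_neighbourhoods(examples, labels, max_dist=1):
    """Greedy neighbourhood construction (illustrative only).
    examples: list of Boolean tuples; labels: parallel list of class labels."""
    neighbourhoods = []  # list of (label, seed, members) triples
    for x, y in zip(examples, labels):
        for label, seed, members in neighbourhoods:
            hamming = sum(a != b for a, b in zip(seed, x))
            if label == y and hamming <= max_dist:
                members.append(x)                    # similar enough: join this neighbourhood
                break
        else:
            neighbourhoods.append((y, x, [x]))       # otherwise start a new neighbourhood
    return neighbourhoods
```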

26–28 Neighbourhood comparison [Animation over three slides: all neighbourhoods (1–5) of differing class are compared pairwise.]
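Putting the pieces together, a simplified sketch of the pairwise loop; it omits tie-breaking between mutually covering features and the preference for already-kept features mentioned on slide 29. All names are illustrative, and `covers` is the function sketched earlier.

```python
def refer_n(neighbourhoods, features, covers):
    """neighbourhoods: {class_label: [list of example-id lists]}
    features: {feature_name: {example_id: bool}}
    Returns the union of features found non-redundant on some pair."""
    kept = set()
    for ca, groups_a in neighbourhoods.items():
        for cb, groups_b in neighbourhoods.items():
            if ca == cb:
                continue                              # only pairs of differing class
            for na in groups_a:
                for nb in groups_b:
                    for name, f in features.items():
                        others = (g for n, g in features.items() if n != name)
                        if not any(covers(g, f, na, nb) for g in others):
                            kept.add(name)            # nothing covers f on this pair
    return kept
```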

29 Ancestry of REFER
REDUCE (Lavrač et al. 1999)
– Feature reduction for propositionalised ILP datasets
– Preserves learnability of a complete and consistent hypothesis
REFER uses a variant of REDUCE
– Redundant features are found between the examples in each neighbourhood pair
– Prefers features already found non-redundant

30 Related multiclass filters
FOCUS for noise-free Boolean data (Almuallim & Dietterich 1991)
– Exhaustive evaluation of all feature subsets (a toy sketch follows this slide)
– Time complexity of O(n^p)
SCRAP relevance filter (Raman 2003)
– Also uses a neighbourhood approach
– No guarantee that selected features (still) discriminate among all classes
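For contrast with REFER's pairwise coverage test, a toy version of the exhaustive strategy the slide attributes to FOCUS; `consistent` is an assumed predicate that checks whether a feature subset still separates all classes.

```python
from itertools import combinations

def focus_style_search(features, consistent):
    """Try all subsets of size 1, 2, ... and return the first (smallest)
    one that separates the classes; exhaustive, hence the high complexity."""
    for k in range(1, len(features) + 1):
        for subset in combinations(features, k):
            if consistent(subset):
                return subset
    return tuple(features)  # fall back to the full feature set
```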

31 Theoretical results
REFER preserves the learnability of a complete and consistent theory.
– If a complete and consistent hypothesis could be learned from the original data, it can still be learned from the reduced data.
REFER is efficient. Time complexity is
– linear in the number of examples
– quadratic in the number of features

32 Experimental results
Mutagenesis data from SINUS
– Feature set greatly reduced (13118 → 44)
– Accuracy still competitive (approx. 85%)

33 Experimental results
Thirteen UCI benchmark datasets
– Compared with LVF, CFS and Relief using discrete/discretised data
– Generally conservative
– Faster on 8 of the 13 datasets, and very close on 3
– Competitive predictive accuracy using several classifiers

34 Experimental results
Reuters-21578: large-scale, high-dimensional, sparse data
– 16,582 preprocessed features were reduced to 1450.
– REFER supports parallel execution well: it runs in parallel on subsets of the feature set and again on the combination (see the sketch below).
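A hedged sketch of the parallel scheme described on the slide: reduce disjoint feature chunks independently, then run the reduction once more on the combined survivors. `reduce_features` stands in for a single REFER pass and is an assumption, not the paper's API.

```python
from concurrent.futures import ProcessPoolExecutor

def parallel_refer(feature_names, reduce_features, n_workers=4):
    """Split the feature set, reduce each chunk in parallel, then reduce
    the union of the survivors (the 'again on the combination' step)."""
    chunks = [feature_names[i::n_workers] for i in range(n_workers)]
    with ProcessPoolExecutor(max_workers=n_workers) as pool:
        survivors = [f for kept in pool.map(reduce_features, chunks) for f in kept]
    return reduce_features(survivors)
```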

35 Summary
A method for eliminating redundant Boolean features for multi-class classification tasks.
– Uses logical coverage of examples
– Efficient and scalable, requiring less time than the three feature selection algorithms we used
– Amenable to parallel execution

36 Current and future investigations
Interaction between feature selection and feature reduction
– Benefits of combination
Noise handling using non-pure neighbourhoods (‘relaxed REFER’)
– Overcoming sensitivity to noise
REFER for example reduction

37 Questions


39 Average reduction on UCI data

40 Effect of choice of starting point [Chart: number of reduced features (0–120) and number of neighbourhoods constructed (0–1000) per UCI dataset: Aud, Car, Brid, F1M, F1C, Post, F3M, Nur, Mus, F3C, Yea, Pim, Tic.]

41 Comparison of running times. Machine spec: Pentium IV 1.4 GHz PC running Windows XP.

42 Full accuracy results

43 REFER for propositionalisation

Setting                      M1        M2        M3        M4
Instances produced           1692      1692      1692      1692
Features produced            1016      2114      3986      13118
SINUS parameters (L, V, T)   3, 3, 20  3, 3, 20  4, 4, 20  4, 4, 20
inda and ind1                yes       yes       yes       yes
bonds                        yes       yes       yes       yes
atom element and type        yes       yes       yes       yes
atom charge                  no        yes       yes       yes
lumo and logp                no        yes       yes       yes
2D molecular structures      yes       no        no        yes

44 REFER for propositionalisation

45 REFER for propositionalisation

46 Neighbourhoods of examples [Figure: a) an R² analogy of neighbourhood construction, with examples of classes c1, c2, c3 grouped into neighbourhoods E1–E5; b) comparison between neighbourhood pairs.]

47 Another simple example
f2 is a useless feature: any feature can cover it.

48 Introducing negated features
… but its negation is a perfectly non-redundant feature. REFER assumes that the user will provide negated features if the language for rules requires it.

49 Introducing negated features
If all features are considered together, f2 is chosen...

50 Introducing negated features
… but REFER considers positive against positive and negative against negative only.

