FCTA 2016, Porto, Portugal, 9-11 November 2016
Classification confusion within NEFCLASS caused by feature value skewness in multi-dimensional datasets
Jamileh Yousefi & Dr. Andrew Hamilton-Wright
University of Guelph, Guelph, ON, Canada
Mount Allison University, Sackville, NB, Canada
Research Questions
Does the skewness of feature values affect the classifier's accuracy?
How does the NEFCLASS classifier's behavior change when dealing with skewed data?
Does changing the discretization method result in higher classification accuracy for the NEFCLASS classifier when dealing with skewed data?
Motivation
Skewed feature values are commonly observed in biological datasets.
Some EMG features are highly skewed, which leads to lower accuracy in the diagnosis of neuromuscular disorders.
The Problem
Skewed data cause a higher misclassification percentage in many classifiers.
NEFCLASS: Neuro-Fuzzy Classifier
A NEFCLASS model with two inputs, three rules, and two output classes.
[Figure: input neurons W and X feed fuzzy sets, which are connected by unweighted connections to the fuzzy rules Rule1, Rule2, and Rule3, which in turn activate the output neurons (class labels) ClassA and ClassB]
Constructing Fuzzy Sets
Heuristic: a fuzzy set (P, Q, or R) is shifted and its support is reduced (or enlarged) in order to reduce (or enlarge) its degree of membership.
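A minimal sketch of this heuristic, assuming triangular fuzzy sets described by a left foot a, peak b, and right foot c; the update rule and learning rate below are illustrative stand-ins, not the exact NEFCLASS adaptation step.

```python
def triangular(x, a, b, c):
    """Degree of membership of x in the triangular fuzzy set (a, b, c)."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)

def adjust_fuzzy_set(a, b, c, x, error, rate=0.1):
    """Shift the set and rescale its support for the observed value x.

    A positive error moves the set away from x and shrinks its support,
    reducing the membership degree of x; a negative error does the opposite.
    """
    delta = rate * error * (c - a)
    shift = delta if x < b else -delta   # error > 0: move the peak away from x
    a, b, c = a + shift, b + shift, c + shift
    return a + delta, b, c - delta       # error > 0: shrink the support
```

For example, starting from (0, 1, 2) with x = 1.5 and a positive error, the set moves left and narrows, so the membership degree returned by triangular(1.5, ...) drops.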
Datasets
Three synthesized datasets were used for the experiments.
The datasets were generated by drawing random numbers from the F-distribution.
Three pairs of degrees of freedom were used to produce low, medium, and high skewness in the feature values.
The datasets have a similar degree of overlap between the three classes.
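A sketch of how such datasets might be generated with NumPy: the degrees-of-freedom pairs follow the slides (low 100/100, medium 100/20, high 35/8), while the sample size, number of features, and per-class offsets are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(42)

def make_dataset(dfn, dfd, n_per_class=100, n_features=4):
    """Draw F-distributed features for three overlapping classes (A, B, C)."""
    X, y = [], []
    for label, offset in enumerate((0.0, 0.5, 1.0)):   # class-specific shift
        X.append(rng.f(dfn, dfd, size=(n_per_class, n_features)) + offset)
        y.extend([label] * n_per_class)
    return np.vstack(X), np.array(y)

low_X, low_y = make_dataset(100, 100)    # low skew
med_X, med_y = make_dataset(100, 20)     # medium skew
high_X, high_y = make_dataset(35, 8)     # high skew
```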
Degree of Skewness for Each Class and Feature
[Figure: skewness of each feature, per class, for the Low-100,100, Medium-100,20, and High-35,8 datasets]
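The per-class, per-feature skewness shown in the figure can be measured as below; the data here is regenerated F-distributed noise standing in for the actual study datasets, and the feature names W, X, Y, Z follow the slides.

```python
import numpy as np
from scipy.stats import skew

rng = np.random.default_rng(0)
X = rng.f(35, 8, size=(300, 4))          # highly skewed stand-in features
y = np.repeat(["A", "B", "C"], 100)      # three classes

for label in np.unique(y):
    per_feature = skew(X[y == label], axis=0)
    print(label, dict(zip("WXYZ", np.round(per_feature, 2))))
```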
NEFCLASS Results Summary
[Figure: misclassification percentage and number of rules for each dataset]
Discretization Methods
Equal-Width
Maximum Marginal Entropy (MME)
Class-Attribute Interdependence Maximization (CAIM)
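For orientation, a sketch of the first two schemes: equal-width cuts the value range into equal-sized bins, while maximum marginal entropy is approximated here with equal-frequency (quantile) cuts; that approximation is an assumption, not a description of the exact implementation used in the study.

```python
import numpy as np

def equal_width_cuts(x, n_bins=3):
    """Cut points that split the value range into equal-width bins."""
    return np.linspace(x.min(), x.max(), n_bins + 1)[1:-1]

def equal_frequency_cuts(x, n_bins=3):
    """Quantile cut points, so each bin holds roughly the same number of values."""
    return np.quantile(x, np.linspace(0, 1, n_bins + 1)[1:-1])

x = np.random.default_rng(0).f(35, 8, size=500)   # a highly skewed feature
print(equal_width_cuts(x))       # upper bins cover the long tail and stay nearly empty
print(equal_frequency_cuts(x))   # cut points follow where the data mass actually lies
```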
Equal-Width
Fuzzy sets and membership functions for feature X
[Figure: resulting fuzzy sets for the Low-100,100, Medium-100,20, and High-35,8 datasets]
MME
[Figure: fuzzy sets and membership functions for feature X for the Low-100,100, Medium-100,20, and High-35,8 datasets]
CAIM
[Figure: fuzzy sets and membership functions for feature X for the Low-100,100, Medium-100,20, and High-35,8 datasets]
Summary of Number of Rules
Summary of Misclassification %
M-W-W Test Results
Mann-Whitney-Wilcoxon results comparing the misclassification percentages by level of skew (* significant at 99.9% confidence, p < .001):
Discretization | Low-100,100 vs Medium-100,20 | Low-100,100 vs High-35,8 | Medium-100,20 vs High-35,8
EQUAL-WIDTH    | 2.9 x 10^-11*                |                          | .9300
MME            | 2.9 x 10^-11*                |                          | .0001*
CAIM           | 2.9 x 10^-11*                |                          | .0001*
M-W-W Test Results
Mann-Whitney-Wilcoxon results comparing the misclassification percentages by discretization method (* significant at 99.9% confidence, p < .001):
Dataset       | EQUAL-WIDTH vs MME | EQUAL-WIDTH vs CAIM | MME vs CAIM
Low-100,100   |                    |                     |
Medium-100,20 |                    |                     | .5000
High-35,8     | .0001*             |                     | .9100
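A sketch of one such pairwise comparison with SciPy; the misclassification arrays below are placeholders, where the real inputs would be the per-run misclassification percentages of the two configurations being compared.

```python
import numpy as np
from scipy.stats import mannwhitneyu

# Placeholder misclassification percentages for two configurations.
equal_width_low  = np.array([22.1, 23.5, 21.8, 24.0, 22.9])
equal_width_high = np.array([40.2, 43.5, 38.9, 44.1, 41.7])

stat, p = mannwhitneyu(equal_width_low, equal_width_high, alternative="two-sided")
print(f"U = {stat:.1f}, p = {p:.4g}")   # flag with * when p < .001
```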
Conclusion
In general, all classifiers perform significantly better on low-skewed data distributions.
The mean misclassification percentage increases as the level of skewness increases.
The NEFCLASS classifier:
is sensitive to skewed distributions;
generates a smaller number of rules for highly skewed data;
performs significantly better when MME or CAIM is used.
Thank you for listening!
Supporting Slides and Additional Information
CAIM Discretization Algorithm
Maximizes the interdependence between the class labels and the generated discrete intervals.
Generates the smallest number of intervals for a given continuous attribute.
Automatically selects the number of intervals, in contrast to many other discretization algorithms.
CAIM Discretization Criterion

CAIM(C, D | F) = (1/n) * Σ_{r=1..n} (max_r)^2 / M_{+r}

where:
n is the number of intervals,
r iterates through all intervals, i.e. r = 1, 2, ..., n,
max_r is the maximum value among all q_ir values (the maximum in the r-th column of the quanta matrix), i = 1, 2, ..., S,
M_{+r} is the total number of continuous values of attribute F that are within the interval (d_{r-1}, d_r].

Quanta matrix:
Class          | [d_0, d_1] | ... | (d_{r-1}, d_r] | ... | (d_{n-1}, d_n] | Class total
C_1            | q_11       | ... | q_1r           | ... | q_1n           | M_1+
...            | ...        | ... | ...            | ... | ...            | ...
C_i            | q_i1       | ... | q_ir           | ... | q_in           | M_i+
...            | ...        | ... | ...            | ... | ...            | ...
C_S            | q_S1       | ... | q_Sr           | ... | q_Sn           | M_S+
Interval total | M_+1       | ... | M_+r           | ... | M_+n           | M
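A minimal sketch of evaluating this criterion from a quanta matrix whose rows are classes and whose columns are candidate intervals; the example counts are made up for illustration.

```python
import numpy as np

def caim(quanta):
    """CAIM criterion for a quanta matrix: (1/n) * sum_r max_r^2 / M_+r."""
    quanta = np.asarray(quanta, dtype=float)
    n = quanta.shape[1]                 # number of intervals
    max_r = quanta.max(axis=0)          # largest class count in each interval
    m_plus_r = quanta.sum(axis=0)       # total count in each interval
    return np.sum(max_r ** 2 / m_plus_r) / n

example = [[20,  3,  1],                # class C1 counts per interval
           [ 4, 18,  2],                # class C2
           [ 1,  5, 16]]                # class C3
print(caim(example))                    # larger values mean purer intervals
```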
Selection of a Rule Base
Order rules by performance.
Either select the best r rules or the best r/m rules per class.
r is either given or determined automatically such that all patterns are covered.
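A small sketch of that selection step; the rule representation (a dict with "performance" and "class" keys) is a placeholder, not the NEFCLASS data structure.

```python
def select_rules(rules, r, per_class=False, n_classes=None):
    """Keep the best r rules overall, or the best r/n_classes rules per class."""
    ranked = sorted(rules, key=lambda rule: rule["performance"], reverse=True)
    if not per_class:
        return ranked[:r]
    per = max(1, r // n_classes)
    return [rule for c in range(n_classes)
            for rule in [q for q in ranked if q["class"] == c][:per]]
```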
Computing the Error Signal
Rule error and fuzzy error (of the j-th output) are computed for each training pattern.
[Figure: network with inputs x and y, rules R1, R2, R3, and output classes c1 and c2, showing the error signal propagated back from the outputs to the rules]
Training of Fuzzy Sets
Observing the error on a validation set.
Algorithm:
repeat
  for all patterns do
    accumulate parameter updates;
    accumulate error;
  end;
  modify parameters;
until no change in error;
[Figure: error curve illustrating a local minimum]
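A minimal batch-update loop matching the pseudocode above; compute_updates and apply_updates stand in for the NEFCLASS-specific steps and are assumptions, not part of the original algorithm description.

```python
def train_fuzzy_sets(patterns, params, compute_updates, apply_updates, tol=1e-6):
    """Repeat batch passes until the accumulated error stops changing."""
    prev_error = float("inf")
    while True:
        updates, error = [], 0.0
        for x, target in patterns:                # for all patterns do
            delta, err = compute_updates(params, x, target)
            updates.append(delta)                 # accumulate parameter updates
            error += err                          # accumulate error
        params = apply_updates(params, updates)   # modify parameters
        if abs(prev_error - error) < tol:         # until no change in error
            return params
        prev_error = error
```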
Constraints for Training Fuzzy Sets
Valid parameter values
Non-empty intersection of adjacent fuzzy sets
Keep relative positions
Maintain symmetry
Complete coverage (degrees of membership add up to 1 for each element)
[Figure: correcting a partition after modifying the parameters]
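A sketch of one way such a correction could look for triangular sets (a, b, c), restoring the ordering of the peaks and the overlap between neighbours; this particular repair rule is an assumption, not the NEFCLASS implementation.

```python
def correct_partition(sets):
    """Repair a list of triangular fuzzy sets (a, b, c) after an update."""
    ordered = sorted([list(s) for s in sets], key=lambda s: s[1])  # keep peak order
    for left, right in zip(ordered, ordered[1:]):
        left[2] = max(left[2], right[1])    # right foot reaches the next peak
        right[0] = min(right[0], left[1])   # left foot reaches the previous peak
    return [tuple(s) for s in ordered]
```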
Skewness of Feature Values
[Table 3: minimum and maximum skewness observed for each synthetic dataset, per feature W, X, Y, Z]
Low Skewed Data
Figure 11: Pairplots showing degrees of skew and overlap for LOW-100,100
Medium Skewed Data
Figure 10: Pairplots showing degrees of skew and overlap for MED-100,20
High Skewed Data
Figure 10: Pairplots showing degrees of skew and overlap for HIGH-35,8
Figure 10: Pairplots showing different degrees of skew for the various datasets (LOW-100,100, MED-100,3, HIGH-35,8)
Datasets Characteristics
Table 1: Randomly generated F-distribution datasets with different degrees of freedom
Dataset     | Degrees of freedom | Level of skewness | Class A mean         | Class B mean         | Class C mean
LOW-100,100 | 100, 100           | Low               | [1.0, 1.8, 2.3, 2.5] | [1.5, 2.3, 2.8, 3.0] | [2.0, 2.8, 3.3, 3.5]
MED-100,3   | 100, 3             | Medium            | [1.0, 1.9, 3.3, 2.6] | [1.6, 2.4, 2.8, 3.0] | [2.1, 2.9, 3.4, 3.6]
HIGH-35,8   | 35, 8              | High              | [1.3, 2.0, 2.5, 2.7] | [1.8, 2.6, 3.0, 3.3] | [2.2, 3.0, 3.5, 3.8]
[Table 2: median of each feature (W, X, Y, Z) for classes A, B, and C of the generated F-distribution data]
Summary of Misclassification Rates
Discretization | LOW-100,100 | MED-100,20 | HIGH-35,8
EQUAL-WIDTH    | 22.63 ±     |     ±      |     ± 6.77
MME            | 25.67 ±     |     ±      |     ± 2.78
CAIM           | 24.13 ±     |     ±      |     ± 3.08
EQUAL-WIDTH shows a higher mean and variability of misclassification rates for the medium and high skewed datasets.
Summary of Number of Rules
[Table: number of rules for EQUAL-WIDTH, MME, and CAIM on each dataset]