
FCTA 2016, Porto, Portugal, 9-11 November 2016

Classification confusion within NEFCLASS caused by feature value skewness in multi-dimensional datasets

Jamileh Yousefi & Dr. Andrew Hamilton-Wright
University of Guelph, Guelph, ON, Canada
Mount Allison University, Sackville, NB, Canada

Research Questions

- Does the skewness of feature values affect a classifier's accuracy?
- How does the NEFCLASS classifier's behavior change when dealing with skewed data?
- Does changing the discretization method result in higher classification accuracy for the NEFCLASS classifier on skewed data?

Motivation

- Skewed feature values are commonly observed in biological datasets.
- Some EMG features are highly skewed, which leads to lower accuracy in the diagnosis of neuromuscular disorders.

The Problem

Skewed data cause a higher misclassification percentage in many classifiers.

NEFCLASS: Neuro-Fuzzy Classifier

[Figure: a NEFCLASS model with two inputs (W, X), three rules (Rule 1, Rule 2, Rule 3), and two output classes (ClassA, ClassB); layers, bottom to top: input neurons, fuzzy sets, fuzzy rules with unweighted connections, and output neurons (class labels).]
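To make the architecture concrete, here is a minimal sketch of a NEFCLASS-style forward pass. This is not the authors' implementation: the triangular partitions, the rule table, and the specific values are illustrative assumptions; only the structure (min for rule activation, class-wise aggregation of rule activations over unweighted connections) follows the figure.

```python
def tri(x, a, b, c):
    # Triangular membership function: support [a, c], peak at b.
    return max(min((x - a) / (b - a), (c - x) / (c - b)), 0.0)

# Hypothetical fuzzy partition shared by both inputs W and X.
SETS = {
    "low":    (-1.0, 0.5, 2.0),
    "medium": ( 1.0, 2.5, 4.0),
    "high":   ( 3.0, 4.5, 6.0),
}

# Three rules: (fuzzy set for W, fuzzy set for X, consequent class).
RULES = [
    ("low",    "low",    "ClassA"),
    ("medium", "medium", "ClassA"),
    ("high",   "high",   "ClassB"),
]

def classify(w, x):
    scores = {"ClassA": 0.0, "ClassB": 0.0}
    for set_w, set_x, cls in RULES:
        # Rule neuron: activation is the minimum of the antecedent memberships.
        act = min(tri(w, *SETS[set_w]), tri(x, *SETS[set_x]))
        # Output neuron: strongest rule of the class (unweighted connections).
        scores[cls] = max(scores[cls], act)
    return max(scores, key=scores.get), scores

print(classify(1.2, 1.5))   # favours ClassA
```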

Constructing Fuzzy Sets

Heuristic: a fuzzy set (e.g. P, Q, R) is shifted, and its support is reduced (or enlarged), in order to reduce (or enlarge) the degree of membership.
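One plausible way to code this heuristic for a triangular fuzzy set (a, b, c) is sketched below; the learning rate and the exact update magnitudes are assumptions, and the real NEFCLASS update rules differ in detail.

```python
def adapt(a, b, c, x, err, lr=0.05):
    # Heuristic from the slide: for err > 0, shift the set toward x and
    # enlarge its support to raise the membership degree of x; for err < 0,
    # shift away and reduce the support to lower it.
    width = c - a
    b = b + lr * err * (x - b)        # shift the peak
    a = a - lr * err * 0.5 * width    # enlarge (err > 0) or reduce (err < 0)
    c = c + lr * err * 0.5 * width    # the support symmetrically
    return min(a, b), b, max(b, c)    # keep a <= b <= c

print(adapt(1.0, 2.5, 4.0, x=3.2, err=0.4))
```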

Datasets

- Three synthesized datasets were used for the experiments.
- The datasets were generated from random numbers drawn from the F-distribution.
- Three pairs of degrees of freedom were used to produce low, medium, and high levels of skewness in the feature values.
- The datasets have a similar degree of overlap among the three classes.
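A minimal sketch of how such datasets could be generated with NumPy follows. The degrees-of-freedom pairs (100, 100), (100, 20), and (35, 8) come from the dataset names in the deck; the sample size, feature count, and per-class offsets used to create overlap are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(42)
DF_PAIRS = {"LOW": (100, 100), "MED": (100, 20), "HIGH": (35, 8)}

def make_dataset(dfn, dfd, n_per_class=150, n_features=4):
    # Draw F-distributed values for features W, X, Y, Z and shift each
    # class slightly so the three classes overlap.
    X, y = [], []
    for cls, offset in enumerate((0.0, 0.5, 1.0)):
        X.append(rng.f(dfn, dfd, size=(n_per_class, n_features)) + offset)
        y += [cls] * n_per_class
    return np.vstack(X), np.array(y)

X_high, y_high = make_dataset(*DF_PAIRS["HIGH"])   # highly skewed dataset
```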

Degree of Skewness for Each Class and Feature

[Figure: skewness per class and feature for the Low-100_100, Medium-100_20, and High-35_8 datasets.]

NEFCLASS Results Summary

[Figure: misclassification % and number of rules for each dataset.]

Discretization Methods

- Equal-Width (sketched below)
- Maximum Marginal Entropy (MME)
- Class-Attribute Interdependence Maximization (CAIM)
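Equal-width is the simplest of the three: it ignores both the class labels and the shape of the distribution, so with a long right tail almost all observations land in the first one or two bins. A minimal sketch (illustrative, not the paper's code):

```python
import numpy as np

def equal_width_bins(values, k):
    # Split the observed range into k equal-width intervals and return
    # the bin index (0 .. k-1) of each value.
    edges = np.linspace(values.min(), values.max(), k + 1)
    return np.digitize(values, edges[1:-1])   # compare against interior cut points

x = np.random.default_rng(0).f(35, 8, size=1000)   # highly skewed feature
print(np.bincount(equal_width_bins(x, 5)))          # mass piles up in the first bins
```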

Equal-Width

[Figure: fuzzy sets and membership functions for feature X on the Low-100_100, Medium-100_20, and High-35_8 datasets.]

MME

[Figure: fuzzy sets and membership functions for feature X on the Low-100_100, Medium-100_20, and High-35_8 datasets.]

CAIM

[Figure: fuzzy sets and membership functions for feature X on the Low-100_100, Medium-100_20, and High-35_8 datasets.]

Summary of Number of Rules

[Figure: number of rules generated by NEFCLASS for each discretization method and dataset.]

Summary of Misclassification %

[Figure: misclassification percentage for each discretization method and dataset.]

M-W-W Test Results

M-W-W results comparing the misclassification percentages based on level of skew. * = significant at 99.9% confidence (p < .001).

Discretization   Low-100_100 vs Medium-100_20   Low-100_100 vs High-35_8   Medium-100_20 vs High-35_8
EQUAL-WIDTH      2.9 x 10^-11*                  .9300                      …
MME              2.9 x 10^-11*                  .0001*                     …
CAIM             2.9 x 10^-11*                  .0001*                     …

M-W-W Test Results

M-W-W results comparing the misclassification percentages based on discretization method. * = significant at 99.9% confidence (p < .001).

Dataset         EQUAL-WIDTH vs MME   EQUAL-WIDTH vs CAIM   MME vs CAIM
Low-100_100     …                    …                     …
Medium-100_20   …*                   .5000                 …
High-35_8       .0001*               .9100                 …

Conclusion

- In general, all classifiers perform significantly better on low-skewed data.
- The mean misclassification percentage increases as the level of skewness increases.
- The NEFCLASS classifier:
  - is sensitive to skewed distributions;
  - generates fewer rules for highly skewed data;
  - performs significantly better when MME or CAIM is used.

Thank you for listening!

Supporting Slides and Additional Information

CAIM Discretization Algorithm

- Maximizes the interdependence between the class labels and the generated discrete intervals.
- Generates the smallest number of intervals for a given continuous attribute.
- Automatically selects the number of intervals, in contrast to many other discretization algorithms.

CAIM Discretization Criterion

$$\mathrm{CAIM}(C, D \mid F) = \frac{1}{n} \sum_{r=1}^{n} \frac{\max_r^{2}}{M_{+r}}$$

where:
- n is the number of intervals;
- r iterates through all intervals, i.e. r = 1, 2, ..., n;
- max_r is the maximum value among all q_ir values (the maximum in the r-th column of the quanta matrix), i = 1, 2, ..., S;
- M_+r is the total number of continuous values of attribute F that are within the interval (d_{r-1}, d_r].

Quanta matrix:

Class            [d_0, d_1]   ...   (d_{r-1}, d_r]   ...   (d_{n-1}, d_n]   Class Total
C_1              q_11         ...   q_1r             ...   q_1n             M_1+
...
C_i              q_i1         ...   q_ir             ...   q_in             M_i+
...
C_S              q_S1         ...   q_Sr             ...   q_Sn             M_S+
Interval Total   M_+1         ...   M_+r             ...   M_+n             M
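As a concrete check of the criterion, here is a small sketch that computes the CAIM value directly from a quanta matrix (rows are classes C_1..C_S, columns are intervals); the example matrix is made up:

```python
import numpy as np

def caim(quanta):
    # CAIM = (1/n) * sum over intervals r of (max_r ** 2 / M_+r),
    # where max_r is the largest class count in column r and M_+r the column total.
    quanta = np.asarray(quanta, dtype=float)
    n = quanta.shape[1]            # number of intervals
    max_r = quanta.max(axis=0)
    m_plus_r = quanta.sum(axis=0)
    return float((max_r ** 2 / m_plus_r).sum() / n)

# Two intervals, three classes: the purer the intervals, the larger the value.
print(caim([[20, 1],
            [2, 18],
            [3, 2]]))   # ~15.71
```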

Selection of a Rule Base

- Order rules by performance.
- Either select the best r rules or the best r/m rules per class.
- r is either given or is determined automatically such that all patterns are covered.

Computing the Error Signal

[Figure: the error signal is propagated from the output neurons (c1, c2) back through the rule neurons (R1, R2, R3) to the fuzzy sets over the inputs x and y; the slide gives the rule error and the fuzzy error for the j-th output.]

Training of Fuzzy Sets

Observing the error on a validation set.

Algorithm:

repeat
    for (all patterns) do
        accumulate parameter updates;
        accumulate error;
    end;
    modify parameters;
until (no change in error);

(Training may stop in a local minimum.)
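A direct Python rendering of the loop above might look as follows; the three callables are hypothetical stand-ins for the NEFCLASS-specific update and error computations, passed in so the sketch stays self-contained:

```python
def train(fuzzy_sets, patterns, validation,
          compute_update, apply_update, validation_error):
    # Batch loop from the slide: accumulate updates over all patterns,
    # apply them, and stop once the validation error no longer changes.
    prev_err = None
    while True:
        updates = [0.0] * len(fuzzy_sets)
        for x, target in patterns:                      # for (all patterns) do
            for i, fs in enumerate(fuzzy_sets):
                updates[i] += compute_update(fs, x, target)   # accumulate updates
        fuzzy_sets = [apply_update(fs, u)               # modify parameters
                      for fs, u in zip(fuzzy_sets, updates)]
        err = validation_error(fuzzy_sets, validation)  # observe validation error
        if prev_err is not None and abs(err - prev_err) < 1e-9:
            break                                       # until (no change in error)
        prev_err = err
    return fuzzy_sets
```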

Constraints for Training Fuzzy Sets

- Valid parameter values.
- Non-empty intersection of adjacent fuzzy sets.
- Keep relative positions.
- Maintain symmetry.
- Complete coverage (degrees of membership add up to 1 for each element).

[Figure: correcting a partition after modifying the parameters.]

Skewness of Feature Values

Table 3: Minimum and maximum skewness observed for each synthetic dataset:

Dataset       Minimum skewness (W, X, Y, Z)   Maximum skewness (W, X, Y, Z)
LOW-100,100   …                               …
MED-100,20    …                               …
HIGH-35,8     …                               …
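Skewness per feature can be measured with the standard sample skewness; a minimal sketch using SciPy (the 35, 8 degrees of freedom are those of the HIGH dataset from the slides, the sample size is an assumption):

```python
import numpy as np
from scipy.stats import skew

sample = np.random.default_rng(1).f(35, 8, size=10_000)   # HIGH-35,8-style feature
print(skew(sample))   # strongly positive: long right tail
```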

Low Skewed Data

[Figure: pairplots showing degrees of skew and overlap for LOW-100,100.]

Medium Skewed Data

[Figure: pairplots showing degrees of skew and overlap for MED-100,20.]

High Skewed Data

[Figure: pairplots showing degrees of skew and overlap for HIGH-35,8.]

[Figure: pairplots showing the different degrees of skew for the LOW-100,100, MED-100,20, and HIGH-35,8 datasets.]

Datasets Characteristics

Table 1: Randomly generated F-distribution datasets with different degrees of freedom:

Dataset       Degrees of Freedom   Level of Skewness   Class A Mean           Class B Mean           Class C Mean
LOW-100,100   100, 100             Low                 [1.0, 1.8, 2.3, 2.5]   [1.5, 2.3, 2.8, 3.0]   [2.0, 2.8, 3.3, 3.5]
MED-100,20    100, 20              Medium              [1.0, 1.9, 3.3, 2.6]   [1.6, 2.4, 2.8, 3.0]   [2.1, 2.9, 3.4, 3.6]
HIGH-35,8     35, 8                High                [1.3, 2.0, 2.5, 2.7]   [1.8, 2.6, 3.0, 3.3]   [2.2, 3.0, 3.5, 3.8]

Table 2: Median per feature (W, X, Y, Z) for the generated F-distribution datasets:

Label     W   X   Y   Z
Class A   …   …   …   …
Class B   …   …   …   …
Class C   …   …   …   …

Summary of Misclassification Rates

Mean ± standard deviation of the misclassification rate (%):

Discretization   LOW-100,100   MED-100,20   HIGH-35,8
EQUAL-WIDTH      22.63 ± …     … ± …        … ± 6.77
MME              25.67 ± …     … ± …        … ± 2.78
CAIM             24.13 ± …     … ± …        … ± 3.08

EQUAL-WIDTH shows a higher mean and variability of misclassification rates for the medium and high skewed datasets.

Summary of Number of Rules

Number of rules generated:

Discretization   LOW-100,100   MED-100,20   HIGH-35,8
EQUAL-WIDTH      …             …            …
MME              …             …            …
CAIM             …             …            …