
FCTA 2016, Porto, Portugal, 9-11 November 2016

Classification confusion within NEFCLASS caused by feature value skewness in multi-dimensional datasets

Jamileh Yousefi & Dr. Andrew Hamilton-Wright
University of Guelph, Guelph, ON, Canada
Mount Allison University, Sackville, NB, Canada

Research Questions

- Does the skewness of feature values affect a classifier's accuracy?
- How does the NEFCLASS classifier's behavior change when dealing with skewed data?
- Does changing the discretization method result in higher classification accuracy for the NEFCLASS classifier on skewed data?

Motivation

- Skewed feature values are commonly observed in biological datasets.
- Some EMG features are highly skewed, which leads to lower accuracy in the diagnosis of neuromuscular disorders.

The Problem

Skewed data cause a higher misclassification percentage in many classifiers.

NEFCLASS: Neuro-Fuzzy Classifier

[Figure: a NEFCLASS model with two inputs (W, X), three rules (Rule 1, Rule 2, Rule 3), and two output classes (ClassA, ClassB); layers, bottom to top: input neurons, fuzzy sets, fuzzy rules with unweighted connections, and output neurons (class labels).]
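To make the architecture concrete, here is a minimal sketch of a NEFCLASS-style forward pass. This is not the authors' implementation: the triangular partitions, the rule table, and the specific values are illustrative assumptions; only the structure (min for rule activation, class-wise aggregation of rule activations over unweighted connections) follows the figure.

```python
def tri(x, a, b, c):
    # Triangular membership function: support [a, c], peak at b.
    return max(min((x - a) / (b - a), (c - x) / (c - b)), 0.0)

# Hypothetical fuzzy partition shared by both inputs W and X.
SETS = {
    "low":    (-1.0, 0.5, 2.0),
    "medium": ( 1.0, 2.5, 4.0),
    "high":   ( 3.0, 4.5, 6.0),
}

# Three rules: (fuzzy set for W, fuzzy set for X, consequent class).
RULES = [
    ("low",    "low",    "ClassA"),
    ("medium", "medium", "ClassA"),
    ("high",   "high",   "ClassB"),
]

def classify(w, x):
    scores = {"ClassA": 0.0, "ClassB": 0.0}
    for set_w, set_x, cls in RULES:
        # Rule neuron: activation is the minimum of the antecedent memberships.
        act = min(tri(w, *SETS[set_w]), tri(x, *SETS[set_x]))
        # Output neuron: strongest rule of the class (unweighted connections).
        scores[cls] = max(scores[cls], act)
    return max(scores, key=scores.get), scores

print(classify(1.2, 1.5))   # favours ClassA
```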

Constructing Fuzzy Sets

Heuristic: a fuzzy set (e.g. P, Q, R) is shifted, and its support is reduced (or enlarged), in order to reduce (or enlarge) the degree of membership.
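One plausible way to code this heuristic for a triangular fuzzy set (a, b, c) is sketched below; the learning rate and the exact update magnitudes are assumptions, and the real NEFCLASS update rules differ in detail.

```python
def adapt(a, b, c, x, err, lr=0.05):
    # Heuristic from the slide: for err > 0, shift the set toward x and
    # enlarge its support to raise the membership degree of x; for err < 0,
    # shift away and reduce the support to lower it.
    width = c - a
    b = b + lr * err * (x - b)        # shift the peak
    a = a - lr * err * 0.5 * width    # enlarge (err > 0) or reduce (err < 0)
    c = c + lr * err * 0.5 * width    # the support symmetrically
    return min(a, b), b, max(b, c)    # keep a <= b <= c

print(adapt(1.0, 2.5, 4.0, x=3.2, err=0.4))
```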

Datasets

- Three synthesized datasets were used for the experiments.
- The datasets were generated from random numbers drawn from the F-distribution.
- Three pairs of degrees of freedom were used to produce low, medium, and high levels of skewness in the feature values.
- The datasets have a similar degree of overlap among the three classes.
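A minimal sketch of how such datasets could be generated with NumPy follows. The degrees-of-freedom pairs (100, 100), (100, 20), and (35, 8) come from the dataset names in the deck; the sample size, feature count, and per-class offsets used to create overlap are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(42)
DF_PAIRS = {"LOW": (100, 100), "MED": (100, 20), "HIGH": (35, 8)}

def make_dataset(dfn, dfd, n_per_class=150, n_features=4):
    # Draw F-distributed values for features W, X, Y, Z and shift each
    # class slightly so the three classes overlap.
    X, y = [], []
    for cls, offset in enumerate((0.0, 0.5, 1.0)):
        X.append(rng.f(dfn, dfd, size=(n_per_class, n_features)) + offset)
        y += [cls] * n_per_class
    return np.vstack(X), np.array(y)

X_high, y_high = make_dataset(*DF_PAIRS["HIGH"])   # highly skewed dataset
```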

Degree of Skewness for Each Class and Feature

[Figure: skewness per class and feature for the Low-100_100, Medium-100_20, and High-35_8 datasets.]

NEFCLASS Results Summary

[Figure: misclassification % and number of rules for each dataset.]

Discretization Methods

- Equal-Width (sketched below)
- Maximum Marginal Entropy (MME)
- Class-Attribute Interdependence Maximization (CAIM)
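Equal-width is the simplest of the three: it ignores both the class labels and the shape of the distribution, so with a long right tail almost all observations land in the first one or two bins. A minimal sketch (illustrative, not the paper's code):

```python
import numpy as np

def equal_width_bins(values, k):
    # Split the observed range into k equal-width intervals and return
    # the bin index (0 .. k-1) of each value.
    edges = np.linspace(values.min(), values.max(), k + 1)
    return np.digitize(values, edges[1:-1])   # compare against interior cut points

x = np.random.default_rng(0).f(35, 8, size=1000)   # highly skewed feature
print(np.bincount(equal_width_bins(x, 5)))          # mass piles up in the first bins
```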

Equal-Width

[Figure: fuzzy sets and membership functions for feature X on the Low-100_100, Medium-100_20, and High-35_8 datasets.]

MME

[Figure: fuzzy sets and membership functions for feature X on the Low-100_100, Medium-100_20, and High-35_8 datasets.]

CAIM

[Figure: fuzzy sets and membership functions for feature X on the Low-100_100, Medium-100_20, and High-35_8 datasets.]

Summary of Number of Rules

[Figure: number of rules generated by NEFCLASS for each discretization method and dataset.]

Summary of Misclassification %

[Figure: misclassification percentage for each discretization method and dataset.]

M-W-W Test Results

M-W-W results comparing the misclassification percentages based on level of skew. * = significant at 99.9% confidence (p < .001).

Discretization   Low-100_100 vs Medium-100_20   Low-100_100 vs High-35_8   Medium-100_20 vs High-35_8
EQUAL-WIDTH      2.9 x 10^-11*                  .9300                      …
MME              2.9 x 10^-11*                  .0001*                     …
CAIM             2.9 x 10^-11*                  .0001*                     …

M-W-W Test Results

M-W-W results comparing the misclassification percentages based on discretization method. * = significant at 99.9% confidence (p < .001).

Dataset         EQUAL-WIDTH vs MME   EQUAL-WIDTH vs CAIM   MME vs CAIM
Low-100_100     …                    …                     …
Medium-100_20   …*                   .5000                 …
High-35_8       .0001*               .9100                 …

Conclusion

- In general, all classifiers perform significantly better on low-skewed data.
- The mean misclassification percentage increases as the level of skewness increases.
- The NEFCLASS classifier:
  - is sensitive to skewed distributions;
  - generates fewer rules for highly skewed data;
  - performs significantly better when MME or CAIM is used.

Thank you for listening!

Supporting Slides and Additional Information

CAIM Discretization Algorithm

- Maximizes the interdependence between the class labels and the generated discrete intervals.
- Generates the smallest number of intervals for a given continuous attribute.
- Automatically selects the number of intervals, in contrast to many other discretization algorithms.

CAIM Discretization Criterion

$$\mathrm{CAIM}(C, D \mid F) = \frac{1}{n} \sum_{r=1}^{n} \frac{\max_r^{2}}{M_{+r}}$$

where:
- n is the number of intervals;
- r iterates through all intervals, i.e. r = 1, 2, ..., n;
- max_r is the maximum value among all q_ir values (the maximum in the r-th column of the quanta matrix), i = 1, 2, ..., S;
- M_+r is the total number of continuous values of attribute F that are within the interval (d_{r-1}, d_r].

Quanta matrix:

Class            [d_0, d_1]   ...   (d_{r-1}, d_r]   ...   (d_{n-1}, d_n]   Class Total
C_1              q_11         ...   q_1r             ...   q_1n             M_1+
...
C_i              q_i1         ...   q_ir             ...   q_in             M_i+
...
C_S              q_S1         ...   q_Sr             ...   q_Sn             M_S+
Interval Total   M_+1         ...   M_+r             ...   M_+n             M
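As a concrete check of the criterion, here is a small sketch that computes the CAIM value directly from a quanta matrix (rows are classes C_1..C_S, columns are intervals); the example matrix is made up:

```python
import numpy as np

def caim(quanta):
    # CAIM = (1/n) * sum over intervals r of (max_r ** 2 / M_+r),
    # where max_r is the largest class count in column r and M_+r the column total.
    quanta = np.asarray(quanta, dtype=float)
    n = quanta.shape[1]            # number of intervals
    max_r = quanta.max(axis=0)
    m_plus_r = quanta.sum(axis=0)
    return float((max_r ** 2 / m_plus_r).sum() / n)

# Two intervals, three classes: the purer the intervals, the larger the value.
print(caim([[20, 1],
            [2, 18],
            [3, 2]]))   # ~15.71
```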

Selection of a Rule Base

- Order rules by performance.
- Either select the best r rules or the best r/m rules per class.
- r is either given or is determined automatically such that all patterns are covered.

Computing the Error Signal

[Figure: the error signal is propagated from the output neurons (c1, c2) back through the rule neurons (R1, R2, R3) to the fuzzy sets over the inputs x and y; the slide gives the rule error and the fuzzy error for the j-th output.]

Training of Fuzzy Sets

Observing the error on a validation set.

Algorithm:

repeat
    for (all patterns) do
        accumulate parameter updates;
        accumulate error;
    end;
    modify parameters;
until (no change in error);

(Training may stop in a local minimum.)
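A direct Python rendering of the loop above might look as follows; the three callables are hypothetical stand-ins for the NEFCLASS-specific update and error computations, passed in so the sketch stays self-contained:

```python
def train(fuzzy_sets, patterns, validation,
          compute_update, apply_update, validation_error):
    # Batch loop from the slide: accumulate updates over all patterns,
    # apply them, and stop once the validation error no longer changes.
    prev_err = None
    while True:
        updates = [0.0] * len(fuzzy_sets)
        for x, target in patterns:                      # for (all patterns) do
            for i, fs in enumerate(fuzzy_sets):
                updates[i] += compute_update(fs, x, target)   # accumulate updates
        fuzzy_sets = [apply_update(fs, u)               # modify parameters
                      for fs, u in zip(fuzzy_sets, updates)]
        err = validation_error(fuzzy_sets, validation)  # observe validation error
        if prev_err is not None and abs(err - prev_err) < 1e-9:
            break                                       # until (no change in error)
        prev_err = err
    return fuzzy_sets
```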

Constraints for Training Fuzzy Sets

- Valid parameter values.
- Non-empty intersection of adjacent fuzzy sets.
- Keep relative positions.
- Maintain symmetry.
- Complete coverage (degrees of membership add up to 1 for each element).

[Figure: correcting a partition after modifying the parameters.]

Skewness of Feature Values

Table 3: Minimum and maximum skewness observed for each synthetic dataset:

Dataset       Minimum skewness (W, X, Y, Z)   Maximum skewness (W, X, Y, Z)
LOW-100,100   …                               …
MED-100,20    …                               …
HIGH-35,8     …                               …
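Skewness per feature can be measured with the standard sample skewness; a minimal sketch using SciPy (the 35, 8 degrees of freedom are those of the HIGH dataset from the slides, the sample size is an assumption):

```python
import numpy as np
from scipy.stats import skew

sample = np.random.default_rng(1).f(35, 8, size=10_000)   # HIGH-35,8-style feature
print(skew(sample))   # strongly positive: long right tail
```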

Low Skewed Data

[Figure: pairplots showing degrees of skew and overlap for LOW-100,100.]

Medium Skewed Data

[Figure: pairplots showing degrees of skew and overlap for MED-100,20.]

High Skewed Data

[Figure: pairplots showing degrees of skew and overlap for HIGH-35,8.]

[Figure: pairplots showing the different degrees of skew for the LOW-100,100, MED-100,20, and HIGH-35,8 datasets.]

Datasets Characteristics

Table 1: Randomly generated F-distribution datasets with different degrees of freedom:

Dataset       Degrees of Freedom   Level of Skewness   Class A Mean           Class B Mean           Class C Mean
LOW-100,100   100, 100             Low                 [1.0, 1.8, 2.3, 2.5]   [1.5, 2.3, 2.8, 3.0]   [2.0, 2.8, 3.3, 3.5]
MED-100,20    100, 20              Medium              [1.0, 1.9, 3.3, 2.6]   [1.6, 2.4, 2.8, 3.0]   [2.1, 2.9, 3.4, 3.6]
HIGH-35,8     35, 8                High                [1.3, 2.0, 2.5, 2.7]   [1.8, 2.6, 3.0, 3.3]   [2.2, 3.0, 3.5, 3.8]

Table 2: Median per feature (W, X, Y, Z) for the generated F-distribution datasets:

Label     W   X   Y   Z
Class A   …   …   …   …
Class B   …   …   …   …
Class C   …   …   …   …

Summary of Misclassification Rates

Mean ± standard deviation of the misclassification rate (%):

Discretization   LOW-100,100   MED-100,20   HIGH-35,8
EQUAL-WIDTH      22.63 ± …     … ± …        … ± 6.77
MME              25.67 ± …     … ± …        … ± 2.78
CAIM             24.13 ± …     … ± …        … ± 3.08

EQUAL-WIDTH shows a higher mean and variability of misclassification rates for the medium and high skewed datasets.

Summary of Number of Rules

Number of rules generated:

Discretization   LOW-100,100   MED-100,20   HIGH-35,8
EQUAL-WIDTH      …             …            …
MME              …             …            …
CAIM             …             …            …