Classification of Class-Imbalanced Data
Advisor: Dr. Vahidipour
Zahra Salimian, Shaghayegh Jalali
Dec 2017
Class Imbalance
Class Imbalance Problem
What is the Class Imbalance Problem? It is the situation where the total number of examples of one class (the positive class) is far smaller than the total number of examples of another class (the negative class). This problem is extremely common in practice and can be observed in various disciplines, including fraud detection, anomaly detection, medical diagnosis, oil spillage detection, and facial recognition.
Why is it a problem? Most machine learning algorithms work best when the number of instances of each class is roughly equal. When the number of instances of one class far exceeds the other, problems arise.
Solutions to imbalanced learning: sampling methods, cost-sensitive methods, and kernel and active learning methods.
Sampling methods: Oversampling adds more of the minority class so it has more effect on the machine learning algorithm; under-sampling removes some of the majority class so it has less effect on the machine learning algorithm.
Oversampling: simply duplicating minority-class examples can lead the classifier to overfit to those few examples, as illustrated below:
The left-hand side shows the data before oversampling, whereas the right-hand side shows the data after oversampling has been applied. On the right side, the thick positive signs indicate multiple repeated copies of that data instance. The machine learning algorithm then sees these cases many times and tends to overfit to these examples specifically, resulting in the blue line boundary shown above.
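A minimal sketch of random oversampling by duplication; the toy dataset and class ratio here are illustrative assumptions, not taken from the slides:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy imbalanced data: 95 negatives, 5 positives (illustrative only)
X = rng.normal(size=(100, 2))
y = np.array([0] * 95 + [1] * 5)

# Random oversampling: duplicate minority examples until the classes match
minority_idx = np.where(y == 1)[0]
n_needed = (y == 0).sum() - (y == 1).sum()
extra_idx = rng.choice(minority_idx, size=n_needed, replace=True)

X_over = np.vstack([X, X[extra_idx]])
y_over = np.concatenate([y, y[extra_idx]])
# The duplicated positives now carry as much weight as the negatives,
# which is exactly what can push a classifier to overfit to them.
```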
Under-sampling: by undersampling, we risk removing majority-class instances that are representative, thus discarding useful information. This can be illustrated as follows:
Here the green line is the ideal decision boundary we would like to have, and the blue line is the actual result. The left side shows the result of applying a general machine learning algorithm without undersampling. On the right, we undersampled the negative class but removed some informative negative examples, which slanted the blue decision boundary and caused some negative examples to be wrongly classified as positive.
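A minimal sketch of random under-sampling; again the dataset and class ratio are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy imbalanced data: 95 negatives, 5 positives (illustrative only)
X = rng.normal(size=(100, 2))
y = np.array([0] * 95 + [1] * 5)

# Random under-sampling: keep only as many negatives as there are positives
pos_idx = np.where(y == 1)[0]
neg_idx = np.where(y == 0)[0]
kept_neg = rng.choice(neg_idx, size=len(pos_idx), replace=False)

keep = np.concatenate([pos_idx, kept_neg])
X_under, y_under = X[keep], y[keep]
# The discarded negatives may include informative points near the true
# boundary, which is how undersampling can distort the learned boundary.
```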
Sampling and cost-sensitive methods require a performance measure to be specified a priori, before learning. An alternative is a so-called threshold-moving method that changes the decision threshold of a model a posteriori to counteract the imbalance, and thus has the potential to adapt to the performance measure of interest. Surprisingly, little attention has been paid to combining bagging ensembles with threshold-moving. This approach preserves the natural class distribution of the data, resulting in well-calibrated posterior probabilities.
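A minimal sketch of threshold-moving on top of a bagging ensemble in scikit-learn; the dataset, the F1 measure, and the threshold grid are illustrative assumptions:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split

# Imbalanced toy data: roughly 5% positives
X, y = make_classification(n_samples=2000, weights=[0.95, 0.05], random_state=0)
X_tr, X_val, y_tr, y_val = train_test_split(X, y, stratify=y, random_state=0)

# Bagging is trained on the natural class distribution (no resampling),
# so the ensemble's probability estimates stay close to calibrated.
bag = BaggingClassifier(n_estimators=50, random_state=0).fit(X_tr, y_tr)
proba = bag.predict_proba(X_val)[:, 1]

# Threshold-moving: pick, a posteriori, the threshold that maximizes the
# performance measure of interest (F1 here) instead of the default 0.5.
thresholds = np.linspace(0.05, 0.95, 19)
best_t = max(thresholds, key=lambda t: f1_score(y_val, proba >= t))
y_pred = proba >= best_t
```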
Cost-Sensitive Methods: utilize cost-sensitive methods for imbalanced learning by considering the cost of misclassification, instead of modifying the data.
Cost-Sensitive Learning Framework
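As a brief sketch of the standard framework (the slide gives only the title): given a cost matrix C(i, j), the cost of predicting class i when the true class is j, a cost-sensitive classifier minimizes the conditional risk rather than the error rate:

```latex
R(i \mid x) = \sum_{j} P(j \mid x)\, C(i, j),
\qquad
\hat{y}(x) = \arg\min_{i} R(i \mid x)
```

In the two-class case with zero cost for correct predictions, this reduces to predicting positive whenever P(+ | x) > C_FP / (C_FP + C_FN); the decision-tree sketch later uses exactly this threshold.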
Cost-Sensitive Dataspace Weighting with Adaptive Boosting
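A minimal sketch of cost-sensitive dataspace weighting: misclassification costs are folded into the initial sample weights before boosting. The cost values and dataset are illustrative assumptions, in the spirit of AdaCost-style methods rather than a faithful reproduction of any specific one:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier

X, y = make_classification(n_samples=2000, weights=[0.95, 0.05], random_state=0)

# Dataspace weighting: give minority (positive) examples a larger initial
# weight, proportional to their assumed misclassification cost.
cost = np.where(y == 1, 10.0, 1.0)        # assumed costs: FN = 10, FP = 1
init_weights = cost / cost.sum()

clf = AdaBoostClassifier(n_estimators=100, random_state=0)
clf.fit(X, y, sample_weight=init_weights)
```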
Cost-Sensitive Decision Trees
Cost-sensitive adjustments for the decision threshold: the final decision threshold should yield the most dominant point on the ROC curve.
Cost-sensitive considerations for split criteria: the impurity function should be insensitive to unequal costs.
Cost-sensitive pruning schemes: the probability estimate at each node needs improvement to reduce the removal of leaves describing the minority concept; Laplace smoothing and Laplace pruning techniques can be used.
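A minimal sketch of the decision-threshold adjustment on a decision tree, with the threshold derived from an assumed cost ratio (the costs, dataset, and tree depth are illustrative):

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=2000, weights=[0.95, 0.05], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

tree = DecisionTreeClassifier(max_depth=5, random_state=0).fit(X_tr, y_tr)

# Cost-sensitive threshold: with cost(FN) = 10 and cost(FP) = 1, predict
# positive whenever P(+|x) > C_FP / (C_FP + C_FN) instead of 0.5.
c_fp, c_fn = 1.0, 10.0
threshold = c_fp / (c_fp + c_fn)
proba = tree.predict_proba(X_te)[:, 1]
y_pred = (proba >= threshold).astype(int)
```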
Cost-Sensitive Neural Networks
Four ways of applying cost sensitivity in neural networks:
Modifying the probability estimates of the outputs: applied only at the testing stage, so the original neural network is kept unchanged.
Altering the outputs directly: bias the neural network during training to focus on the expensive class.
Modifying the learning rate: set η higher for costly examples and lower for low-cost examples.
Replacing the error-minimizing function: use an expected-cost minimization function instead.
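A minimal sketch of the last approach, replacing the plain error function with a cost-weighted loss in PyTorch; the costs, network architecture, and data shapes are illustrative assumptions:

```python
import torch
import torch.nn as nn

# Toy network and data (shapes and class ratio are illustrative)
net = nn.Sequential(nn.Linear(20, 16), nn.ReLU(), nn.Linear(16, 2))
X = torch.randn(256, 20)
y = (torch.rand(256) < 0.05).long()   # roughly 5% positives

# Cost-weighted cross-entropy: errors on the minority class cost 10x more
loss_fn = nn.CrossEntropyLoss(weight=torch.tensor([1.0, 10.0]))
opt = torch.optim.SGD(net.parameters(), lr=0.01)

for _ in range(100):
    opt.zero_grad()
    loss = loss_fn(net(X), y)
    loss.backward()
    opt.step()
```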
Kernel-Based Learning Framework
Based on statistical learning theory and Vapnik-Chervonenkis (VC) dimensions.
Problems with kernel-based support vector machines (SVMs) on imbalanced data: support vectors drawn from the minority concept may contribute less to the final hypothesis, and because the learner minimizes the total error, the optimal hyperplane is also biased toward the majority class.
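A minimal sketch of one common remedy, weighting the SVM's misclassification penalty per class in scikit-learn; the slides only describe the bias, so the weighting scheme and dataset here are illustrative assumptions:

```python
from sklearn.datasets import make_classification
from sklearn.svm import SVC

X, y = make_classification(n_samples=1000, weights=[0.95, 0.05], random_state=0)

# class_weight='balanced' scales each class's penalty C inversely to its
# frequency, counteracting the hyperplane's drift toward the minority region.
clf = SVC(kernel="rbf", class_weight="balanced").fit(X, y)
```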
Thank you