A particle swarm based hybrid system for imbalanced medical data sampling (InCob 2009). Pengyi Yang, School of Information Technologies.


1 InCob 2009: A particle swarm based hybrid system for imbalanced medical data sampling
Pengyi Yang (yangpy@it.usyd.edu.au)
School of Information Technologies, The University of Sydney, NSW 2006, Australia
NICTA, Australian Technology Park, Eveleigh NSW 2015, Australia

2 Road map
- Imbalanced class distribution in medical data
- Sampling
  - Over-sampling
  - Under-sampling
- Converting feature selection techniques into a sampling strategy
- System overview
- Results
- Conclusion

3 Imbalanced class distribution in medical data
Medical data commonly have an imbalanced class distribution. Why?
- Positive samples are special (rare) cases while negative samples are abundant (or, conversely, only positive samples are collected).
- Data contain subtypes, each with a limited number of samples.

4 Problem
Building a classification model on an imbalanced dataset causes the under-represented class to be overlooked or even ignored. Yet the rare classes often carry important biological implications. The difficulty becomes how to remedy the imbalanced class distribution.

5 Remedy
- Via sampling (before the model building process)
  - Over-sampling: increase the sample size of the minority class (could introduce noise and redundancy)
  - Under-sampling: decrease the sample size of the majority class (could remove representative samples)
- Via cost-sensitive learning (within the model building process)
  - Need to choose an appropriate cost metric (hard to determine a priori)

6 Current methods
- The most straightforward way: random over-sampling and under-sampling
  - A naive method, but it works well in a range of situations
- Clustering and sampling
  - Cluster the dataset and sample according to the characteristics of each cluster
- Synthesizing new examples
  - The most popular is SMOTE, which creates "artificial" samples to increase the size of the minority class
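The sampling approaches listed on this slide can be sketched in a few lines. This is a minimal illustration only, not the code used in the study: the function names and the toy data are assumptions, and `interpolate_sample` shows only the interpolation idea behind SMOTE (the full method picks the second point among the k nearest minority neighbours of the first).

```python
import random

def random_under_sample(majority, minority, seed=0):
    """Drop majority samples at random until both classes have the same size."""
    rng = random.Random(seed)
    kept = rng.sample(majority, k=len(minority))
    return kept, list(minority)

def random_over_sample(majority, minority, seed=0):
    """Duplicate minority samples at random until both classes have the same size."""
    rng = random.Random(seed)
    n_extra = len(majority) - len(minority)
    extra = [rng.choice(minority) for _ in range(n_extra)]
    return list(majority), list(minority) + extra

def interpolate_sample(a, b, rng):
    """SMOTE-style synthetic point on the segment between two minority samples."""
    gap = rng.random()  # position along the segment, in [0, 1)
    return [x + gap * (y - x) for x, y in zip(a, b)]

# Hypothetical toy data: 10 majority samples and 2 minority samples.
majority = [[float(i), 0.0] for i in range(10)]
minority = [[0.5, 1.0], [1.5, 1.0]]

under_majority, under_minority = random_under_sample(majority, minority)
over_majority, over_minority = random_over_sample(majority, minority)
synthetic = interpolate_sample(minority[0], minority[1], random.Random(1))
```

Both random variants assume the majority class is at least as large as the minority class. Under-sampling discards information, and over-sampling repeats it, which matches the trade-offs noted on the Remedy slide.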

7 Our contribution: proposing a novel sampling strategy
- Convert a feature selection technique into a sampling strategy
  - Select a subset of "optimal" samples from the majority class
[Diagram: imbalanced dataset -> supervised sample selection -> balanced dataset]

8 Conceptual representation
[Figure: conceptual representation of the system. Labels: imbalanced and balanced datasets, majority and minority samples, particle swarm optimization, classifier, train, predict, test set, prediction, ranking (high to low).]

9 Particle swarm optimization: problem encoding
Each particle is a subset of samples from the majority class.
[Diagram: sample 1, sample 2, …, sample m]
m is the sample size of the majority class.
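One way to read this encoding is as a bit mask of length m, where bit j says whether majority sample j is kept. The sketch below uses the standard binary PSO update (velocity passed through a sigmoid to give the probability of a 1 bit). The slides do not state which PSO variant, parameters, or fitness function were used, so the variant, every parameter value, and `toy_fitness` are assumptions. In the system described, the fitness would presumably come from classifier performance on the resulting balanced set; here a made-up per-sample score stands in for it.

```python
import math
import random

def binary_pso(fitness, m, n_particles=20, n_iter=50,
               w=0.7, c1=1.5, c2=1.5, v_max=4.0, seed=0):
    """Maximise fitness(mask) over 0/1 masks of length m (binary PSO sketch)."""
    rng = random.Random(seed)
    pos = [[rng.randint(0, 1) for _ in range(m)] for _ in range(n_particles)]
    vel = [[0.0] * m for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_fit = [fitness(p) for p in pos]
    g = max(range(n_particles), key=lambda i: pbest_fit[i])
    gbest, gbest_fit = pbest[g][:], pbest_fit[g]

    for _ in range(n_iter):
        for i in range(n_particles):
            for j in range(m):
                r1, r2 = rng.random(), rng.random()
                v = (w * vel[i][j]
                     + c1 * r1 * (pbest[i][j] - pos[i][j])
                     + c2 * r2 * (gbest[j] - pos[i][j]))
                vel[i][j] = max(-v_max, min(v_max, v))  # clamp the velocity
                prob_one = 1.0 / (1.0 + math.exp(-vel[i][j]))
                pos[i][j] = 1 if rng.random() < prob_one else 0
            fit = fitness(pos[i])
            if fit > pbest_fit[i]:
                pbest[i], pbest_fit[i] = pos[i][:], fit
                if fit > gbest_fit:
                    gbest, gbest_fit = pos[i][:], fit
    return gbest, gbest_fit

# Hypothetical per-sample usefulness scores for m = 8 majority samples.
scores = [0.9, 0.1, 0.8, 0.2, 0.7, 0.3, 0.6, 0.4]
n_minority = 3  # hypothetical minority class size to balance against

def toy_fitness(mask):
    """Placeholder fitness: reward useful samples, penalise an unbalanced subset size."""
    n_selected = sum(mask)
    gain = sum(s for s, bit in zip(scores, mask) if bit)
    return gain - abs(n_selected - n_minority)

best_mask, best_fit = binary_pso(toy_fitness, m=len(scores))
selected = [j for j, bit in enumerate(best_mask) if bit]
```

The mask representation mirrors how wrapper-style feature selection encodes a feature subset, which is the analogy the talk draws between feature selection and sample selection.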

10 Final schema

11 Results
(1) PSO achieved better classification results.
(2) Different evaluation metrics can give different indications.

12 Results (continued)
(3) Different classifiers also perform differently under the same sampling method.

13 Key observation
The evaluation of a data sampling strategy is compounded by the type of classifier applied and the evaluation metric used. Therefore, caution should be exercised when a conclusion is drawn on the basis of a single type of classifier or a single evaluation metric.
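A small worked example shows why a single metric can mislead on imbalanced data. The confusion-matrix counts and the choice of metrics (accuracy, sensitivity, specificity, and their geometric mean) are illustrative assumptions, not results or settings from the study.

```python
import math

def metrics(tp, fn, tn, fp):
    """Compute a few common metrics from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0  # recall on positives
    specificity = tn / (tn + fp) if (tn + fp) else 0.0  # recall on negatives
    g_mean = math.sqrt(sensitivity * specificity)
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "g_mean": g_mean,
    }

# Hypothetical test set: 10 positive (minority) and 90 negative (majority) samples.
always_negative = metrics(tp=0, fn=10, tn=90, fp=0)   # ignores the minority class
balanced_model = metrics(tp=8, fn=2, tn=72, fp=18)    # finds most positives

for name, result in [("always negative", always_negative),
                     ("balanced model", balanced_model)]:
    print(name, {k: round(v, 2) for k, v in result.items()})
```

With these made-up counts, accuracy ranks the model that never predicts the minority class higher (0.9 against 0.8), while the geometric mean ranks it lower (0.0 against 0.8), so the two metrics point in opposite directions.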

14 Conclusion
The study shows that, with proper modification, feature selection techniques can be applied to the sampling of imbalanced data. Applying such a technique in the medical domain demonstrates that it can help to increase classification accuracy, which is valuable to prediction and decision support systems.

15 Publication
P. Yang, L. Hsu, B. Zhou, Z. Zhang, A. Zomaya. A particle swarm based hybrid system for imbalanced medical data sampling. Accepted by BMC Genomics.

16 Questions!

