
1 Intelligent Database Systems Lab, 國立雲林科技大學 (National Yunlin University of Science and Technology)
Learning from imbalanced data in surveillance of nosocomial infection
Advisor: Dr. Hsu
Presenter: Ai-Chen Liao
Authors: Gilles Cohen, Melanie Hilario, Hugo Sax, Stephane Hugonnet, Antoine Geissbuhler
Source: Artificial Intelligence in Medicine, 2006, pages 7-18
Class distribution: positive or infected (11%) and negative (89%) cases

2 Outline
- Motivation
- Objective
- Method
  - Strategies for handling imbalanced data
  - Prototype-based resampling
  - Overview of support vector classification
  - Asymmetrical margin support vector classification
- Experimental Results
- Conclusion
- Comments

3 Motivation
- An important problem that arises in hospitals is the monitoring and detection of nosocomial, or hospital-acquired, infections (NIs).
- The gold standard is hospital-wide prospective surveillance, but the method is labor-intensive and infeasible at a hospital level.

4 Objective
- Our goal is to identify patients with one or more NIs on the basis of clinical and other data collected during the prevalence survey.

5 Method: Strategies for handling imbalanced data
- Preprocess the data beforehand (resampling)
  - upsizing the minority class (oversampling)
  - downsizing the majority class (undersampling)
- Modify the learning algorithm to handle imbalanced data
The first strategy is aimed at eliminating or at least attenuating class imbalance before the learning process.
The second adjusts the learning algorithm's bias to allow it to learn despite the handicap of imbalanced data.
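
The two classical random resampling schemes named above can be sketched as follows. This is a minimal illustration, not the authors' code: the function names `random_oversample` and `random_undersample`, the seed, and the toy data (sized to echo the 11% / 89% split on the title slide) are all assumptions made for the example.

```python
import random

def random_oversample(minority, target_size, seed=0):
    """Upsize the minority class by drawing extra cases with replacement."""
    rng = random.Random(seed)
    extra = [rng.choice(minority) for _ in range(target_size - len(minority))]
    return list(minority) + extra

def random_undersample(majority, target_size, seed=0):
    """Downsize the majority class by keeping a random subset (no replacement)."""
    rng = random.Random(seed)
    return rng.sample(list(majority), target_size)

# Hypothetical toy data: 11 positive and 89 negative cases.
minority = [("pos", i) for i in range(11)]
majority = [("neg", i) for i in range(89)]

oversampled = random_oversample(minority, len(majority))    # 89 positives
undersampled = random_undersample(majority, len(minority))  # 11 negatives
```

Oversampling keeps every original minority case and only adds duplicates, whereas undersampling discards majority cases, which is the information loss that prototype-based downsizing tries to soften.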

6 Method: Prototype-based resampling
- A selected class is subclustered and the resulting prototypes are reintroduced as synthetic cases.
- The key difference between the variants is that, in the downsizing approach, the synthetic cases are used to replace all the original majority class members.
- We ran K-means clustering on the training instances of the majority class with K = N_min, the size of the minority class.
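
A rough sketch of the downsizing variant, assuming plain Lloyd's K-means with randomly chosen initial centroids (the slide does not say which K-means variant or initialization was used). The helper names, the seed, and the Gaussian toy data are hypothetical.

```python
import random

def mean(points):
    """Component-wise mean of a non-empty list of equal-length tuples."""
    n = len(points)
    return tuple(sum(p[d] for p in points) / n for d in range(len(points[0])))

def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans_prototypes(points, k, iters=50, seed=0):
    """Plain Lloyd's k-means; returns the k cluster centroids."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: sq_dist(p, centroids[j]))
            clusters[nearest].append(p)
        # Keep the old centroid if a cluster ends up empty.
        updated = [mean(c) if c else centroids[j] for j, c in enumerate(clusters)]
        if updated == centroids:
            break
        centroids = updated
    return centroids

# Hypothetical toy data: 89 majority and 11 minority cases in two dimensions.
data_rng = random.Random(1)
majority = [(data_rng.gauss(0, 1), data_rng.gauss(0, 1)) for _ in range(89)]
minority = [(data_rng.gauss(3, 1), data_rng.gauss(3, 1)) for _ in range(11)]

# K = N_min: the prototypes replace all original majority cases.
prototypes = kmeans_prototypes(majority, k=len(minority))
balanced_train = [(p, 0) for p in prototypes] + [(p, 1) for p in minority]
```

The resulting training set is balanced by construction: one synthetic majority prototype per minority case.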

7 Method: Prototype-based resampling
- The second variant involves oversampling the minority class using agglomerative hierarchical clustering (AHC).
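
One possible reading of the oversampling variant is sketched below. The slide gives no details, so two choices here are assumptions rather than the authors' stated method: single linkage is used as the cluster distance, and the centroid of every cluster formed by a merge is added to the original minority cases as a synthetic case. The function names and the four toy points are hypothetical.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def centroid(points):
    """Component-wise mean of a non-empty list of equal-length tuples."""
    n = len(points)
    return tuple(sum(p[d] for p in points) / n for d in range(len(points[0])))

def ahc_prototypes(points):
    """Single-linkage AHC; returns the centroid of every merged cluster."""
    clusters = [[p] for p in points]
    prototypes = []
    while len(clusters) > 1:
        # Find the pair of clusters with the smallest single-linkage distance.
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = min(sq_dist(a, b) for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        merged = clusters[i] + clusters[j]
        # Delete j first (j > i) so that index i stays valid.
        del clusters[j]
        del clusters[i]
        clusters.append(merged)
        prototypes.append(centroid(merged))
    return prototypes

# Hypothetical minority class with two tight pairs of cases.
minority = [(0, 0), (0, 1), (5, 5), (5, 6)]
prototypes = ahc_prototypes(minority)
oversampled_minority = minority + prototypes
```

Unlike the downsizing variant, the original minority cases are kept and the prototypes are added alongside them, so n cases yield n - 1 extra synthetic cases under these assumptions.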

8 Overview of support vector classification

9 Overview of support vector classification
Training data often overlap, so a hard margin cannot be used; a soft margin is used instead to handle the linearly non-separable case.
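
For reference, the standard soft-margin primal problem is usually written as below. The slide's own equations were not captured in this transcript, so the notation is an assumption: training pairs $(x_i, y_i)$ with $y_i \in \{-1, +1\}$, weight vector $w$, bias $b$, slack variables $\xi_i$, and a single error weight $C$.

```latex
\min_{w,\,b,\,\xi} \quad \frac{1}{2}\lVert w \rVert^{2} + C \sum_{i=1}^{n} \xi_i
\qquad \text{subject to} \qquad
y_i \,(w \cdot x_i + b) \ge 1 - \xi_i, \quad \xi_i \ge 0, \quad i = 1, \dots, n.
```

Each $\xi_i$ measures how far case $i$ falls on the wrong side of its margin, and $C$ trades margin width against the total amount of such violation.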

10 Overview of support vector classification
For nonlinear problems, mapping the original data through a nonlinear function Φ into a higher-dimensional feature space (Φ: R^d → F) and then performing linear classification in that feature space can yield better accuracy.
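
A small illustration of the idea, using an XOR-style toy problem that no straight line separates in R^2. The feature map `phi`, the weights `w` and `b`, and the data are hypothetical choices made for the example; they are not taken from the presentation.

```python
def phi(x):
    """Hypothetical feature map from R^2 to R^3: append the product x1*x2."""
    x1, x2 = x
    return (x1, x2, x1 * x2)

def linear_classify(z, w, b):
    """Linear decision rule sign(w . z + b), applied in feature space."""
    score = sum(wi * zi for wi, zi in zip(w, z)) + b
    return 1 if score >= 0 else -1

# XOR-style labels: positive when both coordinates share a sign.
points = [(1, 1), (-1, -1), (1, -1), (-1, 1)]
labels = [1, 1, -1, -1]

# A hand-picked hyperplane in feature space that looks only at x1*x2.
w, b = (0, 0, 1), 0
predictions = [linear_classify(phi(x), w, b) for x in points]
```

After the mapping, a single hyperplane classifies all four points correctly, which is the effect the slide describes.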

11 Asymmetrical margin support vector classification
The above formulation of the SVM is inappropriate in two common situations:
- In the case of unbalanced distributions
- Whenever misclassifications must be penalized more heavily for one class than for the other
The basic idea is to introduce different error weights C+ and C- for the positive and the negative class, respectively.
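
Written out, the asymmetric objective is commonly formulated as follows. This is a sketch in assumed notation (training pairs $(x_i, y_i)$ with $y_i \in \{-1, +1\}$, weight vector $w$, bias $b$, slack variables $\xi_i$), since the slide's own equation is not in the transcript; only the weights $C^{+}$ and $C^{-}$ come from the slide.

```latex
\min_{w,\,b,\,\xi} \quad \frac{1}{2}\lVert w \rVert^{2}
  + C^{+} \sum_{i \,:\, y_i = +1} \xi_i
  + C^{-} \sum_{i \,:\, y_i = -1} \xi_i
\qquad \text{subject to} \qquad
y_i \,(w \cdot x_i + b) \ge 1 - \xi_i, \quad \xi_i \ge 0, \quad i = 1, \dots, n.
```

Choosing $C^{+} > C^{-}$ makes margin violations by positive (infected) cases more expensive than violations by negative cases, which pushes the decision boundary away from the minority class.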

12 Experimental Results
In two-class problems:
- the accuracy rate on the positives, called sensitivity, is defined as sensitivity = TP/(TP+FN)
- the accuracy rate on the negatives, also known as specificity, is specificity = TN/(TN+FP)
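
The two definitions translate directly into code. The sketch below applies them to a hypothetical classifier that labels every case negative on data with the 11% / 89% split from the title slide; the function names and the toy labels are assumptions for illustration.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (TP, FN, TN, FP) for a two-class problem."""
    tp = fn = tn = fp = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == positive:
            if pred == positive:
                tp += 1
            else:
                fn += 1
        else:
            if pred == positive:
                fp += 1
            else:
                tn += 1
    return tp, fn, tn, fp

def sensitivity(tp, fn):
    """Accuracy rate on the positives: TP / (TP + FN)."""
    return tp / (tp + fn)

def specificity(tn, fp):
    """Accuracy rate on the negatives: TN / (TN + FP)."""
    return tn / (tn + fp)

# Hypothetical data: 11 infected and 89 non-infected cases.
y_true = [1] * 11 + [0] * 89
y_pred = [0] * 100  # a trivial classifier that always predicts "negative"

tp, fn, tn, fp = confusion_counts(y_true, y_pred)
accuracy = (tp + tn) / len(y_true)
sens = sensitivity(tp, fn)
spec = specificity(tn, fp)
```

Here overall accuracy is 0.89 and specificity is 1.0, yet sensitivity is 0.0: the trivial classifier misses every infection, which is why sensitivity and specificity are reported separately on imbalanced data.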

13 Experimental Results

14 Experimental Results

15 Conclusion
- Our novel resampling strategies perform remarkably better than classical random resampling.
- However, they are outperformed by asymmetrical soft margin support vector machines, which attained a sensitivity rate of 92%, significantly better than the highest sensitivity (87%) obtained via prototype-based resampling.

16 Comments
Advantage
- …
Drawback
- …
Application
- Handling imbalanced data

