
1 K-Means+ID3: A Novel Method for Supervised Anomaly Detection by Cascading K-Means Clustering and ID3 Decision Tree Learning Methods
Authors: Shekhar R. Gaddam, Vir V. Phoha (Senior Member, IEEE), and Kiran S. Balagani
Reporter: Tze Ho-Lin
2007/7/4
TKDE, 2007

2 Outline: Motivation; Objectives; Methodology (K-Means+ID3); Experiments; Conclusion; Personal Comments

3 Motivation
Existing anomaly detection system (ADS) studies have two drawbacks:
1) they evaluate anomaly detection methods on measurements drawn from only one application domain;
2) they build anomaly detection methods from a single machine learning technique, such as artificial neural networks or pattern matching.
However, recent advances in machine learning show that fusing, selecting, and cascading multiple machine learning methods can outperform the individual methods.

4 Objectives
We present "K-Means+ID3", a method that cascades k-Means clustering and ID3 decision tree learning to classify anomalous and normal activities in three domains: a computer network, an active electronic circuit, and a mechanical mass-beam system.
[Figures: K-Means clustering; ID3 decision tree]

5 Methodology: K-Means+ID3
1. Training
   1. Partition the training space into k disjoint clusters C_1, C_2, …, C_k using K-Means.
   2. Train an ID3 decision tree on the instances in each K-Means cluster.
2. Testing
   1. Candidate Selection phase
   2. Candidate Combination phase
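The two training steps can be sketched in pure Python under a few stated assumptions. K-Means here is plain Lloyd's algorithm with Euclidean distance. ID3 is a textbook information-gain tree over discrete attribute values, so continuous measurements would need discretizing first. Each cluster's K-Means anomaly score is assumed to be the fraction of anomalous training instances it contains, with labels 1 = anomaly and 0 = normal. The function names (`kmeans`, `id3`, `predict`, `train_kmeans_id3`) are hypothetical and do not come from the authors' code.

```python
import math
import random
from collections import Counter


def kmeans(points, k, iters=100, seed=0):
    """Lloyd's k-means with Euclidean distance; returns (centroids, assignments)."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    assign = None
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid.
        new_assign = [min(range(k), key=lambda j: math.dist(p, centroids[j]))
                      for p in points]
        if new_assign == assign:
            break  # converged: no point changed cluster
        assign = new_assign
        # Update step: move each centroid to the mean of its members (keep it if empty).
        for j in range(k):
            members = [p for p, a in zip(points, assign) if a == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, assign


def entropy(labels):
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())


def id3(rows, labels, attrs):
    """ID3 over discrete attribute values. A leaf is a class label; an inner node
    is (attribute index, {value: subtree}, majority label)."""
    majority = Counter(labels).most_common(1)[0][0]
    if len(set(labels)) == 1 or not attrs:
        return majority

    def gain(a):
        # Information gain = parent entropy minus weighted entropy of the children.
        remainder = 0.0
        for v in set(r[a] for r in rows):
            sub = [lab for r, lab in zip(rows, labels) if r[a] == v]
            remainder += len(sub) / len(labels) * entropy(sub)
        return entropy(labels) - remainder

    best = max(attrs, key=gain)
    branches = {}
    for v in set(r[best] for r in rows):
        idx = [i for i, r in enumerate(rows) if r[best] == v]
        branches[v] = id3([rows[i] for i in idx], [labels[i] for i in idx],
                          [a for a in attrs if a != best])
    return (best, branches, majority)


def predict(tree, row):
    # Walk down the tree; an unseen attribute value falls back to the node's majority label.
    while isinstance(tree, tuple):
        attr, branches, majority = tree
        tree = branches.get(row[attr], majority)
    return tree


def train_kmeans_id3(X, y, k, seed=0):
    """Training phase: partition X into k clusters, then fit one ID3 tree per cluster."""
    centroids, assign = kmeans(X, k, seed=seed)
    clusters = []
    for j in range(k):
        idx = [i for i, a in enumerate(assign) if a == j]
        labels = [y[i] for i in idx]
        # Assumed anomaly score: share of anomalous training instances in the cluster.
        score = sum(labels) / len(labels) if labels else 0.0
        tree = id3([X[i] for i in idx], labels, list(range(len(X[0])))) if labels else 0
        clusters.append({"centroid": centroids[j], "score": score, "tree": tree})
    return clusters


# Usage on a tiny hypothetical data set with two well-separated groups.
X = [(0, 0), (0, 1), (1, 0), (9, 9), (9, 8), (8, 9)]
y = [0, 0, 1, 1, 1, 1]
clusters = train_kmeans_id3(X, y, k=2)
```

A cluster whose training instances all share one label simply becomes a single-leaf tree, so ID3 only does real work in clusters that mix normal and anomalous instances.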

6 Methodology: Candidate Selection Phase
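This slide's figure did not survive extraction. As we understand the original paper, candidate selection picks, for each test instance, the f K-Means clusters nearest to it. Each candidate cluster then contributes a K-Means decision and the decision of its own ID3 tree. The sketch below covers only the selection step and assumes Euclidean distance to the cluster centroids. The function `select_candidates` and the parameter name `f` are illustrative assumptions.

```python
import math


def select_candidates(z, centroids, f):
    """Return the indices of the f clusters whose centroids are nearest to test
    instance z, ordered from nearest to farthest (Euclidean distance)."""
    ranked = sorted(range(len(centroids)), key=lambda j: math.dist(z, centroids[j]))
    return ranked[:f]


# Usage: three hypothetical cluster centroids in a 2-D feature space.
centroids = [(0.0, 0.0), (5.0, 5.0), (10.0, 10.0)]
candidates = select_candidates((4.0, 4.0), centroids, f=2)  # [1, 0]
```

Keeping the candidates ordered by distance matters because the combination rules that follow prefer the nearest cluster.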

7 Methodology: Candidate Combination Phase
1. Harden the anomaly scores of the K-Means method using the Threshold Rule (in the authors' experiments, the threshold is set to 0.5).
2. Nearest-Consensus Rule
3. Nearest-Neighbor Rule (ID3)
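A minimal sketch of how the three rules could fit together. It assumes each candidate is a (K-Means anomaly score, ID3 decision) pair, listed nearest cluster first, with 1 = anomaly and 0 = normal.
- The Threshold Rule hardens a score with τ = 0.5, as in the authors' experiments.
- The Nearest-Consensus Rule takes the nearest candidate whose hardened K-Means decision agrees with its ID3 decision.
- If no candidate agrees, the Nearest-Neighbor Rule falls back on the ID3 decision of the nearest candidate.

Treating a score exactly equal to τ as normal is an assumption of this sketch, and the function names are hypothetical.

```python
def threshold_rule(score, tau=0.5):
    """Harden a K-Means anomaly score: 1 (anomaly) if score > tau, else 0 (normal)."""
    return 1 if score > tau else 0


def combine(candidates, tau=0.5):
    """candidates: (kmeans_score, id3_decision) pairs, ordered nearest cluster first."""
    # Nearest-Consensus Rule: first (nearest) candidate where both methods agree.
    for score, id3_decision in candidates:
        if threshold_rule(score, tau) == id3_decision:
            return id3_decision
    # Nearest-Neighbor Rule: no consensus, so trust ID3 in the nearest cluster.
    return candidates[0][1]


# Usage with hypothetical candidate pairs.
agree_second = combine([(0.4, 1), (0.9, 1)])  # consensus found at the 2nd candidate -> 1
no_consensus = combine([(0.2, 1), (0.8, 0)])  # no agreement -> ID3 of nearest -> 1
```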

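8 Experiments
Performance measures (anomalies are the positive class; a = true positives, b = false negatives, c = false positives, d = true negatives):
1. Detection accuracy, or true positive rate (TPR): a/(a+b)
2. False positive rate (FPR): c/(c+d)
3. Precision: a/(a+c)
4. Total accuracy (or accuracy): (a+d)/(a+b+c+d)
5. F-measure: 2a/(2a+b+c)
6. Receiver operating characteristic (ROC) curves and areas under the ROC curves (AUCs)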
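The measures can be computed directly from the four confusion-matrix counts. This sketch reads a as true positives, b as false negatives, c as false positives, and d as true negatives, which is the only reading consistent with the slide's precision and F-measure formulas. The AUC helper uses the pairwise-ranking (Mann-Whitney) form, which equals the area under the empirical ROC curve when ties count one half. The function names and all numbers are hypothetical, not results from the paper.

```python
def performance_measures(a, b, c, d):
    """a: anomalies detected (TP), b: anomalies missed (FN),
    c: normal instances flagged (FP), d: normal instances passed (TN)."""
    return {
        "TPR": a / (a + b),  # detection accuracy
        "FPR": c / (c + d),
        "Precision": a / (a + c),
        "Accuracy": (a + d) / (a + b + c + d),
        "F-measure": 2 * a / (2 * a + b + c),
    }


def auc(scores, labels):
    """Probability that a random anomaly (label 1) scores higher than a random
    normal instance (label 0); ties count one half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Usage with hypothetical counts and scores.
m = performance_measures(a=90, b=10, c=5, d=95)
area = auc([0.9, 0.8, 0.3, 0.1], [1, 0, 1, 0])  # 3 of 4 pairs ranked correctly -> 0.75
```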

9 Experiments: Data Sets
1. Network Anomaly Data (NAD)
2. Duffing Equation Data (DED)
3. Mechanical Systems Data (MSD)

10 Conclusion
1. NAD-1998: K-Means+ID3 outperforms the individual k-Means and ID3 methods on all six performance measures.
2. NAD-1999: K-Means+ID3 achieves a very high detection accuracy (99.12%) and AUC (0.96).
3. NAD-2000: K-Means+ID3 shows better FPR and precision than k-Means and ID3.
4. DED: the FPR, precision, and F-measure of K-Means+ID3 are higher than those of k-Means and lower than those of ID3.
5. MSD: K-Means+ID3 has the highest precision and F-measure values.

11 Personal Comments
Application: anomaly detection systems.
Advantage: in most of the reported experiments, it performs better than the individual k-Means and ID3 methods.
Disadvantage: parameter selection, e.g., choosing the number of clusters k and the threshold.

12 NAD-1998

13 NAD-1999

14 NAD-2000

15 DED

16 MSD
