Published by Kerry Porter. Modified over 9 years ago.
1 K-Means+ID3: A Novel Method for Supervised Anomaly Detection by Cascading K-Means Clustering and ID3 Decision Tree Learning Methods
Authors: Shekhar R. Gaddam, Vir V. Phoha, Senior Member, IEEE, and Kiran S. Balagani
Reporter: Tze Ho-Lin, 2007/7/4
TKDE, 2007
2 Outline
Motivation
Objectives
Methodology: K-Means+ID3
Experiments
Conclusion
Personal Comments
3 Motivation
The studies on anomaly detection systems (ADS) cited in the paper have two drawbacks:
1) They evaluate anomaly detection methods on measurements drawn from a single application domain.
2) They build anomaly detection methods from a single machine learning technique, such as artificial neural networks or pattern matching.
Recent advances in machine learning, however, show that fusing, selecting, and cascading multiple machine learning methods yields better performance than individual methods.
4 Objectives
We present “K-Means+ID3”, a method that cascades k-Means clustering and ID3 decision tree learning to classify anomalous and normal activities in a computer network, an active electronic circuit, and a mechanical mass-beam system.
5 Methodology-K-Means+ID3
1. Training
   1. Partition the training space into k disjoint clusters C_1, C_2, …, C_k.
   2. Train an ID3 decision tree on the instances in each k-Means cluster.
2. Testing
   1. Candidate Selection phase
   2. Candidate Combination phase
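The two training steps can be sketched in plain Python as below. This is a minimal illustration, not the authors' implementation. Several details are assumptions that the slide does not state: the centroids are seeded with the first k points, features are binned by a hypothetical `discretize` helper so that a textbook ID3 can split on them, labels are 0 (normal) or 1 (anomalous), and each cluster's anomaly score is taken as the fraction of anomalous training instances it contains.

```python
import math
from collections import Counter

def kmeans(points, k, iters=20):
    """Lloyd's algorithm; seeding with the first k points is an assumption."""
    centroids = [list(p) for p in points[:k]]
    assign = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid.
        assign = [min(range(k), key=lambda j: math.dist(p, centroids[j]))
                  for p in points]
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, a in zip(points, assign) if a == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, assign

def entropy(labels):
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())

def id3(rows, labels, attrs):
    """Textbook ID3 on discrete attributes; returns a label or a nested dict."""
    majority = Counter(labels).most_common(1)[0][0]
    if len(set(labels)) == 1 or not attrs:
        return majority  # leaf node
    def gain(a):
        remainder = 0.0
        for v in set(r[a] for r in rows):
            sub = [l for r, l in zip(rows, labels) if r[a] == v]
            remainder += len(sub) / len(rows) * entropy(sub)
        return entropy(labels) - remainder
    best = max(attrs, key=gain)  # attribute with the highest information gain
    rest = [a for a in attrs if a != best]
    node = {"attr": best, "default": majority, "children": {}}
    for v in set(r[best] for r in rows):
        idx = [i for i, r in enumerate(rows) if r[best] == v]
        node["children"][v] = id3([rows[i] for i in idx],
                                  [labels[i] for i in idx], rest)
    return node

def predict(tree, row):
    """Walk the tree; unseen attribute values fall back to the node majority."""
    while isinstance(tree, dict):
        tree = tree["children"].get(row[tree["attr"]], tree["default"])
    return tree

def discretize(point, width=1.0):
    """Hypothetical binning helper (assumption): fixed-width bins per feature."""
    return tuple(int(x // width) for x in point)

def train(points, labels, k):
    """Step 1: k-Means partition. Step 2: one ID3 tree per cluster."""
    centroids, assign = kmeans(points, k)
    trees, scores = [], []
    for j in range(k):
        idx = [i for i, a in enumerate(assign) if a == j]
        ys = [labels[i] for i in idx]
        rows = [discretize(points[i]) for i in idx]
        trees.append(id3(rows, ys, list(range(len(rows[0])))) if idx else 0)
        # Assumed anomaly score: fraction of anomalous instances in the cluster.
        scores.append(sum(ys) / len(ys) if ys else 0.0)
    return centroids, trees, scores
```

Calling `train(points, labels, k)` returns the centroids, one tree per cluster, and one anomaly score per cluster, which is what the testing phases would need as input.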
6 Methodology-Candidate Selection phase
7 Methodology-Candidate Combination phase
1. Harden the anomaly scores of the K-Means method by using the Threshold Rule. In their experiments, the threshold is set to 0.5.
2. Nearest-Consensus Rule
3. Nearest-Neighbor Rule (ID3)
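The slide names the three rules without defining them, so the sketch below is one plausible reading rather than the paper's exact procedure. It assumes each candidate cluster is a dictionary with hypothetical keys `dist` (distance from the test instance to the cluster centroid), `score` (k-Means anomaly score in [0, 1]), and `id3` (the 0/1 decision of that cluster's ID3 tree). It further assumes that the Threshold Rule uses a strict comparison, that the Nearest-Consensus Rule picks the nearest cluster where the hardened k-Means decision and the ID3 decision agree, and that the Nearest-Neighbor Rule takes the ID3 decision of the nearest cluster.

```python
def threshold_rule(score, tau=0.5):
    """Harden a k-Means anomaly score to 0/1 (strict '>' is an assumption)."""
    return 1 if score > tau else 0

def nearest_neighbor_rule(candidates):
    """Return the ID3 decision of the nearest candidate cluster."""
    return min(candidates, key=lambda c: c["dist"])["id3"]

def nearest_consensus_rule(candidates, tau=0.5):
    """Return the decision of the nearest cluster where k-Means and ID3 agree.

    Returns None when no cluster shows agreement; how to handle that case
    is not given on the slide, so it is left to the caller (assumption).
    """
    for c in sorted(candidates, key=lambda c: c["dist"]):
        if threshold_rule(c["score"], tau) == c["id3"]:
            return c["id3"]
    return None

# Hypothetical candidates for one test instance.
candidates = [
    {"dist": 1.0, "score": 0.8, "id3": 0},  # k-Means says 1, ID3 says 0
    {"dist": 2.0, "score": 0.9, "id3": 1},  # both say 1: consensus
    {"dist": 3.0, "score": 0.1, "id3": 0},
]
consensus = nearest_consensus_rule(candidates)
nearest = nearest_neighbor_rule(candidates)
```

With these example candidates the two rules disagree: the nearest cluster's tree votes normal, while the nearest cluster with agreement votes anomalous.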
8 Experiments
Detection accuracy, or true positive rate (TPR)
False positive rate (FPR)
Precision: a/(a+c)
Total accuracy (or accuracy): (a+d)/(a+b+c+d)
F-measure: 2a/(2a+b+c)
Receiver operating characteristic (ROC) curves and areas under ROC curves (AUCs)
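The measures above can be computed as in the short sketch below. The slide does not define a, b, c, and d; the code assumes the assignment that is consistent with the precision formula a/(a+c), namely a = true positives, b = false negatives, c = false positives, and d = true negatives. Under that assumption TPR is a/(a+b) and FPR is c/(c+d), which are the standard definitions rather than formulas taken from the slide. The `auc` helper uses the trapezoidal rule over (FPR, TPR) points and is likewise only an illustration; all denominators are assumed to be nonzero.

```python
def metrics(a, b, c, d):
    """Confusion-matrix measures; a=TP, b=FN, c=FP, d=TN (assumed mapping)."""
    return {
        "tpr": a / (a + b),                  # detection accuracy
        "fpr": c / (c + d),
        "precision": a / (a + c),
        "accuracy": (a + d) / (a + b + c + d),
        "f_measure": 2 * a / (2 * a + b + c),
    }

def auc(roc_points):
    """Area under an ROC curve given (FPR, TPR) points, by the trapezoidal rule."""
    pts = sorted(roc_points)
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(pts, pts[1:]))

# Hypothetical counts: 8 detected anomalies, 2 missed, 4 false alarms, 86 true normals.
m = metrics(8, 2, 4, 86)
```

For these made-up counts the detection rate is 0.8 and the total accuracy is 0.94, which shows how a detector can look accurate overall while still missing a fifth of the anomalies.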
9 Experiments-Data Sets
Network Anomaly Data (NAD)
Duffing Equation Data (DED)
Mechanical Systems Data (MSD)
10 Conclusion
The K-Means+ID3 method outperforms the individual k-Means and ID3 methods on all six performance measures over the NAD-1998 data sets.
The K-Means+ID3 method has a very high detection accuracy (99.12%) and AUC (0.96) over the NAD-1999 data sets.
The K-Means+ID3 method shows better FPR and precision than the k-Means and ID3 methods over the NAD-2000 data sets.
The FPR, precision, and F-measure of the K-Means+ID3 method are higher than those of the k-Means method and lower than those of the ID3 method over the NAD.
The K-Means+ID3 method has the highest precision and F-measure values over the MSD.
11 Personal Comments
Application: Anomaly detection systems
Advantage: It certainly has better performance than the individual methods.
Disadvantage: Parameter selection problem
12 NAD-1998
13 NAD-1999
14 NAD-2000
15 DED
16 MSD