
Selecting Features for Intrusion Detection: A Feature Relevance Analysis on KDD 99 Benchmark
H. Güneş Kayacık, Nur Zincir-Heywood, Malcolm I. Heywood



2 Motivation
- Machine learning in intrusion detection.
- From raw data to high-level events: this needs a set of features.
- Not “any” features, but “good” features.
- How do we quantify “good”?

3 The Data
- DARPA 98 and 99 datasets.
- Simulated activity.
- Network traffic is summarized into connection records.
- 41 features per connection.
[Figure: DoS1, DoS2, Normal]

4 The Data
- 494,000 connections in the dataset.
- 23 class labels:
  - 22 attacks (DoS, probe, content-based)
  - “Normal”
- 41 features; a few examples:
  - Duration
  - Service
  - Protocol
  - Data transfer
  - Failed login attempts
  - FTP commands
  - Root shells
  - “su” attempts
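Each connection record in the KDD 99 files is one comma-separated line: 41 feature values followed by the class label. The sketch below assumes that layout and the standard feature names from the dataset's `kddcup.names` file; `parse_record` is a hypothetical helper, and the sample line is only illustrative.

```python
from __future__ import annotations

# Standard KDD Cup 99 feature names in column order (feature 1 .. feature 41).
FEATURE_NAMES = [
    "duration", "protocol_type", "service", "flag", "src_bytes",
    "dst_bytes", "land", "wrong_fragment", "urgent", "hot",
    "num_failed_logins", "logged_in", "num_compromised", "root_shell",
    "su_attempted", "num_root", "num_file_creations", "num_shells",
    "num_access_files", "num_outbound_cmds", "is_host_login",
    "is_guest_login", "count", "srv_count", "serror_rate",
    "srv_serror_rate", "rerror_rate", "srv_rerror_rate", "same_srv_rate",
    "diff_srv_rate", "srv_diff_host_rate", "dst_host_count",
    "dst_host_srv_count", "dst_host_same_srv_rate",
    "dst_host_diff_srv_rate", "dst_host_same_src_port_rate",
    "dst_host_srv_diff_host_rate", "dst_host_serror_rate",
    "dst_host_srv_serror_rate", "dst_host_rerror_rate",
    "dst_host_srv_rerror_rate",
]

# Only protocol_type, service and flag are symbolic; the rest are numeric.
SYMBOLIC = {"protocol_type", "service", "flag"}


def parse_record(line: str) -> tuple[dict[str, object], str]:
    """Split one connection record into (named features, class label)."""
    fields = line.strip().split(",")
    if len(fields) != len(FEATURE_NAMES) + 1:
        raise ValueError(f"expected 42 fields, got {len(fields)}")
    *values, label = fields
    features = {
        name: raw if name in SYMBOLIC else float(raw)
        for name, raw in zip(FEATURE_NAMES, values)
    }
    # Labels in the raw files carry a trailing period, e.g. "normal."
    return features, label.rstrip(".")


if __name__ == "__main__":
    sample = (
        "0,tcp,http,SF,181,5450,"
        "0,0,0,0,0,1,"
        "0,0,0,0,0,0,0,0,0,0,"
        "8,8,0.00,0.00,0.00,0.00,"
        "1.00,0.00,0.00,9,9,"
        "1.00,0.00,0.11,0.00,"
        "0.00,0.00,0.00,0.00,"
        "normal."
    )
    features, label = parse_record(sample)
    print(len(features), label, features["service"])  # 41 normal http
```

Keeping the three symbolic columns as strings leaves the choice of encoding (one-hot, integer codes, or direct use as discrete split values) to the later relevance analysis.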

5 Previous IDS Work
- Decision trees, neural networks, clustering, SVMs, evolutionary computation (EC).
- High detection rate (98%) and low false positive rate (0.5%).
- Some attacks are detected better than others.
- Our task: substantiate the performance of these detectors.
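The detection and false positive rates quoted for earlier detectors are conventionally computed from confusion-matrix counts, with attacks as the positive class. A minimal sketch under that assumption follows; the counts are hypothetical and were chosen only to reproduce the 98% and 0.5% figures.

```python
def detection_rate(tp: int, fn: int) -> float:
    """Fraction of attack connections flagged as attacks: TP / (TP + FN)."""
    return tp / (tp + fn)


def false_positive_rate(fp: int, tn: int) -> float:
    """Fraction of normal connections wrongly flagged: FP / (FP + TN)."""
    return fp / (fp + tn)


if __name__ == "__main__":
    # Hypothetical confusion counts, not taken from any published detector.
    print(detection_rate(tp=980, fn=20))      # 0.98
    print(false_positive_rate(fp=5, tn=995))  # 0.005
```

Computing `detection_rate` separately for the connections of each attack type is one way to see that some attacks are detected better than others, even when the overall rate is high.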

6 Information Gain
- Used in decision trees.
- Which feature leads to the purest branching?
  - Gain(“Temperature”) = 0.571
  - Gain(“Windy”) = 0.02
  - Gain(“Humidity”) = 0.971
- Humidity has the highest gain, so it gives the purest branching.
- From the Data Mining Course at the KDNuggets site [http://www.kdnuggets.com/dmcourse/data_mining_course]
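The three gains on this slide match the textbook weather (“play”) data restricted to the five days with outlook = sunny. The sketch below recomputes them, assuming that subset is the slide's example; the rows are reproduced from the standard weather data rather than from the slide, and `information_gain` is a hypothetical helper name.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (bits) of a sequence of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(rows, feature, target="play"):
    """Entropy of the target minus the weighted entropy after splitting on `feature`."""
    base = entropy([r[target] for r in rows])
    remainder = 0.0
    for value in {r[feature] for r in rows}:
        subset = [r[target] for r in rows if r[feature] == value]
        remainder += len(subset) / len(rows) * entropy(subset)
    return base - remainder


# The five "outlook = sunny" days of the classic weather data.
SUNNY = [
    {"temperature": "hot", "humidity": "high", "windy": False, "play": "no"},
    {"temperature": "hot", "humidity": "high", "windy": True, "play": "no"},
    {"temperature": "mild", "humidity": "high", "windy": False, "play": "no"},
    {"temperature": "cool", "humidity": "normal", "windy": False, "play": "yes"},
    {"temperature": "mild", "humidity": "normal", "windy": True, "play": "yes"},
]

if __name__ == "__main__":
    for f in ("temperature", "windy", "humidity"):
        print(f, round(information_gain(SUNNY, f), 3))
    # temperature 0.571
    # windy 0.02
    # humidity 0.971
```

Humidity splits the sunny days into an all-“no” branch and an all-“yes” branch, so its remainder is zero and its gain equals the full entropy of the subset (0.971 bits).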

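7 Methodology
- Classes: 22 attacks + 1 normal.
- Binary classification (why?): each class in turn is labelled 1 and all other classes 0.
- This gives 23 information gains per feature (vs. 1 information gain per feature).
- Example records (feature values, class) and their binary labels for Class A:

  Features         Class     Label for Class A
  1, 0.5, 90, 8    Class A   1
  3, 0.01, 7, 9    Class B   0
  2, 0.1, 7, 10    Class A   1
  5, 0.2, 10, 1    Class C   0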
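The per-class (one-vs-rest) analysis can be sketched on the four example records from this slide: each class is relabelled 1 against 0 for all others, and every feature receives one gain per class. The slide does not say how continuous features were split, so this sketch assumes a C4.5-style binary split at the best threshold; `best_split_gain` and `per_class_gains` are hypothetical names.

```python
import math


def entropy(pos: int, total: int) -> float:
    """Binary entropy (bits) of a node with `pos` positives out of `total`."""
    if total == 0 or pos in (0, total):
        return 0.0
    p = pos / total
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))


def best_split_gain(values, targets):
    """Highest information gain over all binary splits `value <= threshold`."""
    pairs = sorted(zip(values, targets))
    n, n_pos = len(pairs), sum(targets)
    base = entropy(n_pos, n)
    best, left_pos = 0.0, 0
    for i in range(1, n):
        left_pos += pairs[i - 1][1]  # pairs[0..i-1] fall on the left
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold fits between equal values
        right_pos = n_pos - left_pos
        remainder = (i / n) * entropy(left_pos, i) \
            + ((n - i) / n) * entropy(right_pos, n - i)
        best = max(best, base - remainder)
    return best


def per_class_gains(records, labels):
    """One gain per (class, feature): each class against all the others."""
    gains = {}
    for cls in sorted(set(labels)):
        targets = [1 if y == cls else 0 for y in labels]  # 1 0 1 0 for "A"
        gains[cls] = [best_split_gain([r[j] for r in records], targets)
                      for j in range(len(records[0]))]
    return gains


RECORDS = [(1, 0.5, 90, 8), (3, 0.01, 7, 9), (2, 0.1, 7, 10), (5, 0.2, 10, 1)]
LABELS = ["A", "B", "A", "C"]

if __name__ == "__main__":
    for cls, g in per_class_gains(RECORDS, LABELS).items():
        print(cls, [round(x, 3) for x in g])
    # A [1.0, 0.311, 0.311, 0.311]
    # B [0.311, 0.811, 0.311, 0.311]
    # C [0.811, 0.311, 0.311, 0.811]
```

With three classes each feature gets three gains; on the full data the same loop over 23 classes yields the 23 gains per feature. Here the first feature separates Class A perfectly, which a single multi-class gain would not reveal as directly.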

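8 Max. Information Gain
- Maximum information gain per feature.
- Some features are relevant, some are not.
- Features 20 and 21 (num_outbound_cmds and is_host_login).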
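Collapsing each feature's per-class gains to their maximum gives one relevance score per feature, together with the class that attains it. The sketch below assumes discrete (categorical) feature values and uses hypothetical toy columns; it also shows that a feature which never varies scores zero for every class, so its maximum gain is zero and no class is reported for it.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (bits) of a list of labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def gain(column, targets):
    """Information gain of splitting `targets` on the distinct values of `column`."""
    n = len(targets)
    remainder = 0.0
    for value in set(column):
        subset = [t for x, t in zip(column, targets) if x == value]
        remainder += len(subset) / n * entropy(subset)
    return entropy(targets) - remainder


def max_gain_per_feature(columns, labels):
    """For each feature column: (max one-vs-rest gain, class attaining it)."""
    classes = sorted(set(labels))
    results = []
    for column in columns:
        scores = [(gain(column, [y == c for y in labels]), c) for c in classes]
        best = max(scores)
        # A zero maximum means no class benefits from this feature at all.
        results.append(best if best[0] > 0 else (0.0, None))
    return results


# Hypothetical toy data: six connections, two feature columns.
LABELS = ["normal", "smurf", "neptune", "normal", "smurf", "neptune"]
PROTOCOL = ["tcp", "icmp", "tcp", "tcp", "icmp", "tcp"]
CONSTANT = [0, 0, 0, 0, 0, 0]

if __name__ == "__main__":
    for name, col in (("protocol", PROTOCOL), ("constant", CONSTANT)):
        best_gain, best_class = max_gain_per_feature([col], LABELS)[0]
        print(name, round(best_gain, 3), best_class)
    # protocol 0.918 smurf
    # constant 0.0 None
```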

9 For each class…
- Neptune (DoS), smurf (DoS) and normal together account for 98% of the connections.
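The 98% figure is the combined share of the three most frequent labels. A minimal sketch of that computation with `collections.Counter` follows; `top_k_share` is a hypothetical helper and the label list is toy data, not the KDD 99 class counts.

```python
from collections import Counter


def top_k_share(labels, k):
    """Fraction of records whose label is among the k most frequent labels."""
    counts = Counter(labels)
    covered = sum(count for _, count in counts.most_common(k))
    return covered / len(labels)


if __name__ == "__main__":
    # Hypothetical toy labels, not real KDD 99 class counts.
    labels = ["smurf"] * 5 + ["neptune"] * 3 + ["normal"] + ["ftp_write"]
    print(top_k_share(labels, 3))  # 0.9
```

A share this skewed means any per-class analysis is dominated by a few classes, which is why looking at each class separately matters.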

10 Relevant Classes
- 31 of the 41 features are most relevant for the 3 major classes.
- 9 features contributed very little.
- Relevant features:
  - Connection size
  - Diff. service rate
  - Connection state
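Given each feature's maximum gain and the class that attains it, the two counts on this slide (features most relevant to the major classes, and features that contribute very little) reduce to simple filters. The sketch below is hypothetical throughout: the feature names f1 to f5, their gain values and the 0.01 cut-off are made up, since the slide does not state what threshold counts as “very little”.

```python
def summarize_relevance(max_gain, best_class, major_classes, threshold):
    """Group features by where their maximum gain comes from and how large it is.

    max_gain[f]   -- maximum one-vs-rest information gain of feature f
    best_class[f] -- class attaining that maximum (None if the gain is zero)
    """
    for_major = sorted(f for f in max_gain if best_class[f] in major_classes)
    weak = sorted(f for f in max_gain if max_gain[f] < threshold)
    kept = sorted((f for f in max_gain if max_gain[f] >= threshold),
                  key=lambda f: max_gain[f], reverse=True)
    return for_major, weak, kept


# Hypothetical values for illustration only.
MAX_GAIN = {"f1": 0.72, "f2": 0.55, "f3": 0.31, "f4": 0.004, "f5": 0.0}
BEST_CLASS = {"f1": "smurf", "f2": "neptune", "f3": "normal",
              "f4": "ftp_write", "f5": None}
MAJOR = {"smurf", "neptune", "normal"}

if __name__ == "__main__":
    for_major, weak, kept = summarize_relevance(MAX_GAIN, BEST_CLASS, MAJOR, 0.01)
    print(for_major, weak, kept)
    # ['f1', 'f2', 'f3'] ['f4', 'f5'] ['f1', 'f2', 'f3']
```

The `kept` list, ordered by gain, is one possible starting point for a reduced feature set; the threshold choice is a judgment call that the slide leaves open.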

11 Conclusions
- Relevance analysis on the KDD 99 dataset.
- Relevance is measured by information gain.
- Key points:
  - The 3 major classes are easy to classify.
  - A few features are highly useful.
  - A few features are completely useless.
- New measures and extended analysis.

12 Thank You!
You can find more information about our research at: www.cs.dal.ca/projectx

