Active Learning Intrusion Detection using k-Means Clustering Selection

Steven McElwee IEEE SoutheastCon 2017 April 1, 2017

Problem
Too much data for intrusion detection:
Human fatigue and accuracy
Unlabeled data
Overfitting
Evolving adversarial tactics
Evasion

Machine Learning in Intrusion Detection
1985 Seminal work in intrusion detection [1][2]
1999 Decision trees [3]
2001 Anomaly detection with ADAM [4]
2002 Support Vector Machines (SVM) [5]
2003 Clustering IDS alarms [6]; Bayesian classification [7]
2004 Genetic algorithms [8]
2006 Neural networks for worm detection [9]

Challenges of Machine Learning for Intrusion Detection
Evasion: can reduce detection accuracy to 33% [10]
Overfitting: makes machine learning behave like signature detection [11]

Random Forest Classification
[Figure: original dataset; sampling of random records and features; prediction]
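The sampling idea on this slide can be sketched in a few lines of plain Python. This is an illustration only, not the ALIDS code: each "tree" is reduced to a one-split rule on a single randomly chosen feature, and the names `bootstrap_sample`, `train_stump`, `train_forest`, `forest_predict` and the toy records are hypothetical.

```python
import random
from collections import Counter

def bootstrap_sample(records, rng):
    """Draw len(records) records with replacement (the random records)."""
    return [rng.choice(records) for _ in records]

def train_stump(sample, n_features, rng):
    """Fit a one-split 'tree' on one randomly chosen feature.

    Simplified stand-in for a full decision tree: split at the feature's
    mean and predict the majority label on each side of the split.
    """
    f = rng.randrange(n_features)  # the random feature
    threshold = sum(x[f] for x, _ in sample) / len(sample)
    left = Counter(y for x, y in sample if x[f] <= threshold)
    right = Counter(y for x, y in sample if x[f] > threshold)
    overall = Counter(y for _, y in sample).most_common(1)[0][0]
    left_label = left.most_common(1)[0][0] if left else overall
    right_label = right.most_common(1)[0][0] if right else overall
    return f, threshold, left_label, right_label

def train_forest(records, n_features, n_trees=25, seed=0):
    """Train each tree on its own bootstrap sample."""
    rng = random.Random(seed)
    return [train_stump(bootstrap_sample(records, rng), n_features, rng)
            for _ in range(n_trees)]

def forest_predict(forest, x):
    """Majority vote; returns (label, share of trees that voted for it)."""
    votes = Counter(l if x[f] <= t else r for f, t, l, r in forest)
    label, count = votes.most_common(1)[0]
    return label, count / len(forest)

# Toy (features, label) records, invented for this sketch.
records = [
    ((0.10, 0.20), "normal"), ((0.20, 0.10), "normal"), ((0.15, 0.30), "normal"),
    ((0.90, 0.80), "attack"), ((0.80, 0.90), "attack"), ((0.95, 0.70), "attack"),
]
forest = train_forest(records, n_features=2)
label, vote_share = forest_predict(forest, (0.05, 0.05))
```

The vote share returned by `forest_predict` is one simple way to obtain a confidence value from an ensemble, which is the kind of signal an active learner needs to decide whether a record is uncertain.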

Active Learning
[Figure: the learning algorithm sends a query (?) to the oracle; the oracle returns a response (A)]
Objective: minimize the number of queries to the oracle while maximizing the amount of labels
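A minimal sketch of that objective, assuming the learner reports a confidence with each prediction: confident predictions are accepted without a query, uncertain ones go to the oracle until a query budget is used up. The function name `active_label`, the toy learner and oracle, and the default threshold and budget are assumptions made for illustration.

```python
from typing import Callable, Hashable, Iterable, Tuple

def active_label(
    pool: Iterable,
    predict: Callable[[object], Tuple[Hashable, float]],
    oracle: Callable[[object], Hashable],
    min_confidence: float = 0.95,
    budget: int = 300,
):
    """Label a pool, asking the oracle only about uncertain items.

    Returns (labels, deferred, n_queries). Uncertain items that arrive
    after the budget is spent are deferred rather than labeled.
    """
    labels, deferred, n_queries = [], [], 0
    for item in pool:
        guess, confidence = predict(item)
        if confidence > min_confidence:
            labels.append((item, guess))         # learner is sure: no query
        elif n_queries < budget:
            labels.append((item, oracle(item)))  # ask the oracle
            n_queries += 1
        else:
            deferred.append(item)                # budget exhausted
    return labels, deferred, n_queries

# Toy learner: confident far from 4.5, unsure close to it (hypothetical).
def toy_predict(x):
    label = "attack" if x > 4.5 else "normal"
    return label, min(1.0, 0.5 + abs(x - 4.5) / 5)

def toy_oracle(x):
    return "attack" if x >= 5 else "normal"

labels, deferred, n_queries = active_label(
    range(10), toy_predict, toy_oracle, budget=3
)
```

In this toy run the items 3, 4, 5 and 6 are uncertain; the first three consume the budget and item 6 is deferred, so only three of ten items cost a query.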

Active Learning Intrusion Detection System (ALIDS) Prototype
Input: unlabeled KDD Cup 99 dataset; labeled KDD Cup 99 dataset
ALIDS: the random forest classifier passes uncertain results to the active learning trainer; the trainer queries the simulated "oracle" and returns an updated model
Output: classified dataset
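One way to simulate an oracle when a labeled copy of the dataset is available is to hold the true labels back and reveal one only when it is asked for, counting each request. The sketch below assumes that reading of the diagram; the class name `SimulatedOracle`, its methods, and the four toy records are hypothetical and are not taken from the prototype.

```python
class SimulatedOracle:
    """Stands in for a human analyst by revealing held-back true labels."""

    def __init__(self, true_labels):
        self._true_labels = dict(true_labels)  # record id -> true label
        self.queries = 0                       # how often we were asked

    def label(self, record_id):
        """Answer one query and count it."""
        self.queries += 1
        return self._true_labels[record_id]

    def query_rate(self, total_records):
        """Fraction of all records that needed a label from the oracle."""
        return self.queries / total_records

# Toy labeled copy; "smurf" and "neptune" are KDD Cup 99 attack labels.
oracle = SimulatedOracle({0: "normal", 1: "smurf", 2: "normal", 3: "neptune"})
answer = oracle.label(1)
rate = oracle.query_rate(total_records=4)
```

Counting queries inside the oracle keeps the labeling cost measurable independently of the classifier.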

Testing Approach
KDD Cup 99 contains 49 days of labeled network data. The full dataset is split into simulated days 1 to 49.
Start loop (one pass per simulated day)
Load records in simulated day
More records exist? Y: load and classify record
Confidence > 95%? Y: add to master training dataset; N: add to candidate dataset
More records exist? N: select sample using k-Means
Query the oracle
Retrain random forest
End loop
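The slide states only that a sample is selected using k-Means. A plausible reading, shown here as an assumption, is to cluster the low-confidence candidate records and send the record nearest each centroid to the oracle, so that each query represents a different group of records. The function names, the deterministic centroid initialization (practical implementations usually initialize randomly or with k-means++), and the toy points are all hypothetical.

```python
import math

def kmeans(points, k, iterations=20):
    """Plain Lloyd's algorithm; returns the k centroids."""
    # Simplification: start from evenly spaced points, not random ones.
    centroids = [points[i * len(points) // k] for i in range(k)]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Move each centroid to its cluster mean; keep it if the cluster is empty.
        centroids = [
            tuple(sum(col) / len(c) for col in zip(*c)) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
    return centroids

def select_for_oracle(candidates, k):
    """Pick the candidate nearest each centroid: at most one query per cluster."""
    centroids = kmeans(candidates, k)
    picks = {min(candidates, key=lambda p: math.dist(p, c)) for c in centroids}
    return sorted(picks)

# Toy candidate dataset: three well-separated groups of uncertain records.
candidates = [
    (0.0, 0.1), (0.1, 0.0), (0.1, 0.1),
    (5.0, 5.1), (5.1, 5.0), (5.1, 5.1),
    (9.0, 0.1), (9.1, 0.0), (9.1, 0.1),
]
picks = select_for_oracle(candidates, k=3)
```

With nine candidates and `k=3`, three records are queried instead of nine. In the loop above, `k` would play the role of the per-day query limit.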

Results Summary
Up to 300 records per simulated day sent to oracle
0.31% of total records sent to oracle for labeling
91% of normal records identified
Identified 15 of 23 possible labels

Contributions
1. Active learning can be used to reduce manual human labeling of intrusion datasets
2. Enhanced resilience through human review and ensemble learning
3. Active learning IDS is most applicable to separating normal from abnormal records
4. Working prototype for building upon active learning and evasion of machine learners
Code and results available at:

References
[1] D. E. Denning and P. G. Neumann, “Requirements and model for IDES—a real-time intrusion detection expert system,” Document A005, SRI International, 333,
[2] D. E. Denning, “An intrusion-detection model,” IEEE Trans. on Software Engineering, vol. 2, pp ,
[3] C. Sinclair, L. Pierce, and S. Matzner, “An application of machine learning to network intrusion detection,” IEEE Proc. 15th Annual Computer Security Applications Conf., pp ,
[4] D. Barbará, J. Couto, S. Jajodia, L. Popyack, and N. Wu, “ADAM: Detecting intrusions by data mining,” Proc. IEEE Workshop on Information Assurance and Security, pp. 11-16, 2001.
[5] S. Mukkamala, G. Janoski, and A. Sung, “Intrusion detection using neural networks and support vector machines,” Proc. Intl. Joint Conf. Neural Networks, vol. 2, pp ,
[6] K. Julisch, “Clustering intrusion detection alarms to support root cause analysis,” ACM Trans. Information and System Security, vol. 6, no. 4, pp ,
[7] C. Kruegel, D. Mutz, W. Robertson, and F. Valeur, “Bayesian event classification for intrusion detection,” IEEE Proc. 19th Annual Computer Security Applications Conf., pp , 2003.
[8] W. Li, “Using genetic algorithm for network intrusion detection,” Proc. of the U.S. Department of Energy Cyber Security Group, pp. 1-8, 2004.

References (continued)
[9] D. Stopel, Z. Boger, R. Moskovitch, Y. Shahar, and Y. Elovici, “Application of artificial neural networks techniques to computer worm detection,” Intl. Joint Conf. Neural Networks, pp ,
[10] N. Šrndić and P. Laskov, “Practical evasion of a learning-based classifier: A case study,” IEEE Symp. Security and Privacy, pp , 2014.
[11] R. Sommer and V. Paxson, “Outside the closed world: On using machine learning for network intrusion detection,” IEEE Symp. Security and Privacy, pp , 2010.
