Active Learning Intrusion Detection using k-Means Clustering Selection


Active Learning Intrusion Detection using k-Means Clustering Selection
Steven McElwee
IEEE SoutheastCon 2017
April 1, 2017

Problem
Too much data for intrusion detection…
Human fatigue and accuracy
Unlabeled data
Overfitting
Evolving adversarial tactics
Evasion

Machine Learning in Intrusion Detection
Timeline (1980s to 2010s):
1985: Seminal work in intrusion detection [1][2]
1999: Decision trees [3]
2001: Anomaly detection with ADAM [4]
2002: Support Vector Machines (SVM) [5]
2003: Clustering IDS alarms [6]; Bayesian classification [7]
2004: Genetic algorithms [8]
2006: Neural networks for worm detection [9]

Challenges of Machine Learning for Intrusion Detection
Evasion: evasion can reduce detection accuracy to 33% [10]
Overfitting: overfitting makes machine learning like signature detection [11]

Random Forest Classification
Figure: Original dataset -> Sampling random records and features -> Prediction
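The sampling idea on this slide can be illustrated with a small standard-library sketch. It is not the ALIDS implementation: one-split decision stumps stand in for full trees, and the two features, the toy records, and the names `train_stump`, `train_forest`, and `predict` are hypothetical. Each stump sees a bootstrap sample of records and a random subset of features, and the forest predicts by majority vote.

```python
import random
from collections import Counter

def train_stump(rows, labels, feature_ids):
    """Pick the (feature, threshold) split with the fewest training errors."""
    best = None
    for f in feature_ids:
        for t in sorted({r[f] for r in rows}):
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            left_label = Counter(left).most_common(1)[0][0]
            right_label = Counter(right).most_common(1)[0][0] if right else left_label
            errors = sum(y != left_label for y in left) + sum(y != right_label for y in right)
            if best is None or errors < best[0]:
                best = (errors, f, t, left_label, right_label)
    _, f, t, left_label, right_label = best
    return lambda row: left_label if row[f] <= t else right_label

def train_forest(rows, labels, n_trees=25, n_features=1, seed=0):
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        # Sample records with replacement (bootstrap) ...
        idx = [rng.randrange(len(rows)) for _ in rows]
        # ... and a random subset of features for this stump.
        feats = rng.sample(range(len(rows[0])), n_features)
        forest.append(train_stump([rows[i] for i in idx], [labels[i] for i in idx], feats))
    return forest

def predict(forest, row):
    """Return the majority label and the share of stumps that voted for it."""
    votes = Counter(tree(row) for tree in forest)
    label, count = votes.most_common(1)[0]
    return label, count / len(forest)

# Hypothetical toy records: (duration, bytes).
rows = [(1, 10), (2, 12), (3, 11), (4, 13), (20, 90), (21, 95), (22, 91), (23, 99)]
labels = ["normal"] * 4 + ["attack"] * 4
forest = train_forest(rows, labels)
print(predict(forest, (30, 120)))
```

The vote share returned by `predict` is one simple way to obtain a per-record confidence, which is the kind of signal an active learner can use to decide which records need human review.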

Active Learning
Figure: Learning Algorithm -> query (?) -> Oracle; Oracle -> response (A) -> Learning Algorithm
Objective: minimize the number of queries to the oracle while maximizing the number of labels
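A minimal sketch of the query/response exchange and its objective, under stated assumptions: records are single numbers, the learner reuses the label of a nearby labeled record when one lies within a hypothetical `radius`, and it queries the oracle only otherwise. The function `active_label`, the toy values, and the oracle rule are all illustrative and do not come from the presentation.

```python
def active_label(records, oracle, radius=2.0):
    """Label every record while counting how often the oracle is asked."""
    labeled = []   # (value, label) pairs known so far
    queries = 0
    for x in records:
        nearest = min(labeled, key=lambda p: abs(p[0] - x), default=None)
        if nearest is not None and abs(nearest[0] - x) <= radius:
            label = nearest[1]      # confident: reuse the neighbour's label
        else:
            label = oracle(x)       # uncertain: send a query to the oracle
            queries += 1
        labeled.append((x, label))
    return labeled, queries

def oracle(x):
    # Hypothetical ground truth standing in for a human analyst.
    return "attack" if x > 5 else "normal"

records = [1.0, 1.5, 2.0, 9.0, 9.5, 1.2, 10.0, 8.8]
labeled, queries = active_label(records, oracle)
print(f"{len(labeled)} labels obtained with {queries} oracle queries")
```

Reusing neighbour labels keeps the query count low, but an early mistake can propagate, which is one reason to keep a human in the loop for uncertain records.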

Active Learning Intrusion Detection System (ALIDS) Prototype
Input: Unlabeled KDD Cup 99 Dataset, Labeled KDD Cup 99 Dataset
ALIDS components: Random Forest Classifier, Active Learning Trainer, Simulated "Oracle"
Flow labels: uncertain results, query, updated model
Output: Classified Dataset

Testing Approach
Full dataset: KDD Cup 99, which contains 49 days of labeled network data, split into simulated days 1, 2, 3, …, 49
Loop over the simulated days (Start Loop to End Loop):
  Load records in simulated day
  More records exist?
    Y: Load and classify record, then check: Confidence > 95%?
      Y: Add to master training dataset
      N: Add to candidate dataset
    N: Select sample using k-Means
       Query the oracle
       Retrain random forest
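The per-day loop on this slide can be sketched as follows. This is an illustration under several assumptions, not the ALIDS code: a k-nearest-neighbour vote stands in for the random forest, `kmeans_select` picks the candidate nearest each k-means centroid (the slide does not say how the sample is drawn), the branch order follows one reading of the flowchart, and the toy data, `oracle`, and `BUDGET` are hypothetical.

```python
import math
import random
from collections import Counter

def classify(train, x, k=3):
    """Stand-in for the random forest: label plus k-NN vote share as confidence."""
    if not train:
        return None, 0.0
    nearest = sorted(train, key=lambda item: math.dist(item[0], x))[:k]
    label, votes = Counter(lbl for _, lbl in nearest).most_common(1)[0]
    return label, votes / k  # fewer than k neighbours can never look fully confident

def kmeans_select(points, k, rng, iters=10):
    """Run plain k-means and return the distinct points nearest each centroid."""
    if not points:
        return []
    k = min(k, len(points))
    centroids = rng.sample(points, k)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            best = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[best].append(p)
        # Move each centroid to the mean of its cluster (keep it if the cluster is empty).
        centroids = [
            tuple(sum(dim) / len(c) for dim in zip(*c)) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
    return list({min(points, key=lambda p: math.dist(p, c)) for c in centroids})

def run_simulated_days(days, oracle, budget, threshold=0.95, seed=0):
    rng = random.Random(seed)
    master, queries = [], 0  # master training dataset of (record, label) pairs
    for day in days:
        confident, candidates = [], []
        for record in day:
            label, confidence = classify(master, record)
            if confidence > threshold:
                confident.append((record, label))  # confident: destined for master dataset
            else:
                candidates.append(record)          # uncertain: add to candidate dataset
        sample = kmeans_select(candidates, budget, rng)
        queries += len(sample)                     # one oracle query per selected record
        # "Retrain": the k-NN stand-in simply reads the updated master dataset next day.
        master += confident + [(r, oracle(r)) for r in sample]
    return master, queries

# Hypothetical toy data: two well-separated groups standing in for network records.
data_rng = random.Random(1)

def make_day(n=10):
    day = [(data_rng.gauss(0, 1), data_rng.gauss(0, 1)) for _ in range(n // 2)]
    day += [(data_rng.gauss(10, 1), data_rng.gauss(10, 1)) for _ in range(n // 2)]
    data_rng.shuffle(day)
    return day

def oracle(record):
    return "attack" if record[0] > 5 else "normal"

BUDGET = 3
days = [make_day() for _ in range(4)]
master, queries = run_simulated_days(days, oracle, budget=BUDGET)
print(f"{queries} oracle queries for {sum(len(d) for d in days)} records")
```

In this sketch, candidates that are not selected for the oracle are simply dropped at the end of each day; the slide does not state what the prototype does with them. Capping the sample at `BUDGET` per day mirrors the idea of a fixed daily labeling limit.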

Results Summary
Up to 300 records per simulated day sent to oracle
0.31% of total records sent to oracle for labeling
91% of normal records identified
Identified 15 of 23 possible labels

Contributions
1. Active learning can be used to reduce manual human labeling of intrusion datasets
2. Enhanced resilience through human review and ensemble learning
3. Active learning IDS is most applicable to separating normal from abnormal records
4. Working prototype for building upon active learning and evasion of machine learners
Code and results available at: https://github.com/stevenmcelwee/alids

References
[1] D. E. Denning and P. G. Neumann, “Requirements and model for IDES - a real-time intrusion detection expert system,” Document A005, SRI International, 333, 1985.
[2] D. E. Denning, “An intrusion-detection model,” IEEE Trans. on Software Engineering, vol. SE-13, no. 2, pp. 222-232, 1987.
[3] C. Sinclair, L. Pierce, and S. Matzner, “An application of machine learning to network intrusion detection,” IEEE Proc. 15th Annual Computer Security Applications Conf., pp. 371-377, 1999.
[4] D. Barbará, J. Couto, S. Jajodia, L. Popyack, and N. Wu, “ADAM: Detecting intrusions by data mining,” Proc. IEEE Workshop on Information Assurance and Security, pp. 11-16, 2001.
[5] S. Mukkamala, G. Janoski, and A. Sung, “Intrusion detection using neural networks and support vector machines,” Proc. Intl. Joint Conf. Neural Networks, vol. 2, pp. 1702-1707, 2002.
[6] K. Julisch, “Clustering intrusion detection alarms to support root cause analysis,” ACM Trans. Information and System Security, vol. 6, no. 4, pp. 443-471, 2003.
[7] C. Kruegel, D. Mutz, W. Robertson, and F. Valeur, “Bayesian event classification for intrusion detection,” IEEE Proc. 19th Annual Computer Security Applications Conf., pp. 14-23, 2003.
[8] W. Li, “Using genetic algorithm for network intrusion detection,” Proc. of the U.S. Department of Energy Cyber Security Group, pp. 1-8, 2004.
[9] D. Stopel, Z. Boger, R. Moskovitch, Y. Shahar, and Y. Elovici, “Application of artificial neural networks techniques to computer worm detection,” Intl. Joint Conf. Neural Networks, pp. 2362-2369, 2006.
[10] N. Šrndić and P. Laskov, “Practical evasion of a learning-based classifier: A case study,” IEEE Symp. Security and Privacy, pp. 197-211, 2014.
[11] R. Sommer and V. Paxson, “Outside the closed world: On using machine learning for network intrusion detection,” IEEE Symp. Security and Privacy, pp. 305-316, 2010.