1 Survey Presentation on Four Selected Research Papers on Data Mining Based Intrusion Detection System 60-564: Security and Privacy on the Internet Instructor:

Presentation transcript:

1 Survey Presentation on Four Selected Research Papers on Data Mining Based Intrusion Detection System
60-564: Security and Privacy on the Internet
Instructor: Dr. A. K. Aggarwal
Presented By: Ahmedur Rahman, Zillur Rahman, Lawangeen Khan
Date: April 05, 2006

2 Selected Research Papers
• Exploiting Efficient Data Mining Techniques to Enhance Intrusion Detection Systems
• Packet- vs. Session-Based Modeling for Intrusion Detection System
• Detecting Denial-of-Service Attacks with Incomplete Audit Data
• ADAM: Detecting Intrusions by Data Mining

3 Table of Contents
• Introduction
• Paper 1
• Paper 2
• Paper 3
• Paper 4
• Testing Methodology
• Conclusion
• Bibliography

4 Introduction
• Intrusion detection is the process of gathering intrusion-related knowledge by monitoring events and analyzing them for signs of intrusion.
• Intrusions are detected using two common practices: misuse detection and anomaly detection.
• To apply data mining techniques to intrusion detection, the collected data must first be preprocessed and converted to a format suitable for mining. The reformatted data is then used to develop a clustering or classification model.

5 Introduction (Cont.)
• Paper 1 discusses various types of data mining based intrusion detection system models.
• Paper 2 compares packet-based and session-based modeling for intrusion detection.
• Paper 3 discusses a model named SCAN, which can work with good accuracy even with an incomplete dataset.
• Paper 4 discusses ADAM, which was one of the leading data mining based intrusion detection systems.

6 Paper - 1
• The main motivation behind using data mining in intrusion detection is automation. Patterns of normal behavior and patterns of intrusion can be computed using data mining.
• Two steps for applying data mining techniques to intrusion detection:
– The collected monitoring data needs to be preprocessed and converted to a format suitable for mining.
– The reformatted data is used to develop a clustering or classification model.
• Data mining helps to detect new vulnerabilities and intrusions, discover previously unknown patterns of attacker behavior, and provide decision support for intrusion management.

7 Paper - 1 (Cont.)
• Data Mining and Intrusion Detection
– Different data mining approaches are frequently used to analyze network data to gain intrusion-related knowledge:
  Clustering
  Classification
  Outlier Detection
  Association Rule

8 Paper - 1 (Cont.)
• Clustering
– Most clustering techniques follow the same basic steps to identify intrusions:
  1. Find the largest cluster, i.e., the one with the largest number of instances, and label it normal.
  2. Sort the remaining clusters in ascending order of their distance to the largest cluster.
  3. Select the first K1 clusters so that the number of data instances in these clusters sums to approximately ηN, where η is the percentage of normal instances, and label them as normal.
  4. Label all the other clusters as attacks.
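The labeling steps above can be sketched in a few lines of Python. This is only an illustration, not the surveyed paper's implementation: the function name `label_clusters`, the use of Euclidean distance between centroids, the reading of N as the total number of instances, and the choice to count the largest cluster toward the ηN budget are all assumptions.

```python
from math import dist


def label_clusters(centroids, sizes, eta):
    """Label each cluster 'normal' or 'attack'.

    centroids: one coordinate tuple per cluster (assumed representation)
    sizes:     number of instances in each cluster
    eta:       assumed fraction of normal instances, between 0 and 1
    """
    n_total = sum(sizes)
    # Step 1: the largest cluster is labeled normal.
    largest = max(range(len(sizes)), key=lambda i: sizes[i])
    labels = ["attack"] * len(sizes)
    labels[largest] = "normal"
    covered = sizes[largest]  # assumption: it counts toward eta * N

    # Step 2: remaining clusters, nearest to the largest cluster first.
    others = sorted(
        (i for i in range(len(sizes)) if i != largest),
        key=lambda i: dist(centroids[i], centroids[largest]),
    )
    # Step 3: keep labeling clusters normal until about eta * N is covered.
    for i in others:
        if covered >= eta * n_total:
            break
        labels[i] = "normal"
        covered += sizes[i]
    # Step 4: everything not relabeled stays 'attack'.
    return labels
```

With three clusters of 60, 25, and 15 instances and η = 0.8, the two clusters closest together are labeled normal and the distant small cluster is labeled an attack.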

9 Paper - 1 (Cont.)
• Classification
– Classification is similar to clustering in that it also partitions records into distinct segments called classes.
– Unlike clustering, classification analysis requires that the end user/analyst know ahead of time how the classes are defined.
– Classification algorithms fall into three types:
  Extensions to linear discrimination
  Decision trees
  Rule-based methods

10 Paper - 1 (Cont.)
• Association
– The association rule is specifically designed for use in data analyses.
– The objective of association rule based data mining is to derive multi-feature (attribute) correlations from a database table.
– Association rule algorithms are classified into two categories:
  Candidate-generation-and-test approach, such as Apriori
  Pattern-growth approach
– Basic steps for incorporating association rules into intrusion detection:
  First, network data needs to be formatted into a database table where each row is an audit record and each column is a field of the audit records.
  There is evidence that intrusions and user activities show frequent correlations among network data.
  Rules mined from network data can also be merged continuously: the rules from each new run are added to the aggregate rule set.
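As an illustration of the candidate-generation-and-test approach named above, the following is a minimal Apriori-style sketch for finding frequent itemsets in a table of audit records. It is not taken from the surveyed paper: the function name `apriori`, the representation of each record as a set of `(field, value)` pairs, and the field names in the usage note are assumptions.

```python
from itertools import combinations


def apriori(records, min_support):
    """Return {itemset: support} for every itemset with support >= min_support.

    records: list of sets of (field, value) items, one set per audit record
    """
    n = len(records)

    def support(itemset):
        # Fraction of records that contain every item of the itemset.
        return sum(itemset <= record for record in records) / n

    singles = {frozenset([item]) for record in records for item in record}
    level = {s for s in singles if support(s) >= min_support}
    frequent = {}
    k = 1
    while level:
        for itemset in level:
            frequent[itemset] = support(itemset)
        k += 1
        # Generate: join frequent (k-1)-itemsets into k-item candidates.
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Prune: drop candidates that have an infrequent (k-1)-subset.
        candidates = {
            c for c in candidates
            if all(frozenset(sub) in level for sub in combinations(c, k - 1))
        }
        # Test: keep only candidates that meet the support threshold.
        level = {c for c in candidates if support(c) >= min_support}
    return frequent
```

For example, with four records of which two contain both `("service", "http")` and `("flag", "SF")`, a `min_support` of 0.5 yields the two single items and their pair as frequent itemsets.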

11 Paper - 1 (Cont.)
• Data Mining Based IDS:
– Data mining is becoming one of the popular techniques for detecting intrusions. IDSs can be classified on the basis of their detection strategy. There are two categories under this classification:
  Misuse Detection Based IDS
  Anomaly Detection Based IDS

12 Paper - 1 (Cont.)
• IDS Using both Misuse and Anomaly Detection:
– Some IDSs use both misuse and anomaly detection techniques and are therefore capable of detecting both known and unknown intrusions:
  IIDS (Intelligent Intrusion Detection System Architecture)
  RIDS-100 (Rising Intrusion Detection System)

13 Paper - 2
• In this paper the authors report the findings of their research on anomaly-based intrusion detection systems. They use data mining techniques to create a decision tree model of their network, based on the 1999 DARPA Intrusion Detection Evaluation data set.

14 Paper - 2 (Cont.)
• Types of IDS:
– Misuse Detectors
– Anomaly-Based Detectors
• Problems with IDS:
– False Negatives
– False Positives

15 Paper - 2 (Cont.)
• Dynamic Modeling:
– Data Preparation
  The authors studied the data sets' patterns and modeled the traffic patterns around a generated target variable, "TGT".
  They used this variable as the predictor target variable.
– Two separate data sets:
  Packet-Based Modeling
  Session-Based Modeling

16 Paper - 2 (Cont.)
• Classification Tree Modeling:
– Advantage: compared with traditional pattern-recognition methods, the classification tree is an intuitive tool and relatively easy to interpret.

17 Paper - 2 (Cont.)
• Data Modeling and Assessment:
– The authors accomplished their data mining goals through supervised learning techniques on the binary target variable, "TGT".
– From the revised data sets they further deleted a few extraneous variables: date/time, source IP address, and destination IP address.
– They partitioned both data sets using stratified random sampling.
– The TGT variable was then specified to form subsets of the original data to improve the classification precision of their model.
– They created a decision tree using the chi-square criterion.
– After five leaves of depth, the tree maximizes its profit.
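The slide does not say how the chi-square criterion was applied. One common use in decision tree construction is to score each candidate split by Pearson's chi-square statistic on the contingency table of split branch versus the target (here, TGT) and to prefer the split with the largest statistic. The sketch below shows only that statistic; the function name `chi_square` and the table layout are assumptions.

```python
def chi_square(table):
    """Pearson chi-square statistic for a contingency table.

    table: list of rows; table[i][j] is the number of records that fall in
           branch i of a candidate split and have target class j.
    Assumes every row total and column total is non-zero.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Count expected if branch and target class were independent.
            expected = row_totals[i] * col_totals[j] / total
            statistic += (observed - expected) ** 2 / expected
    return statistic
```

A split that separates the two TGT classes perfectly, such as `[[20, 0], [0, 20]]`, scores far higher than an uninformative split such as `[[10, 10], [10, 10]]`, which scores zero.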

18 Paper - 3
• Detecting Denial-of-Service Attacks with Incomplete Audit Data
• SCAN (Stochastic Clustering Algorithm for Network anomaly detection)
• Improved version of the Expectation-Maximization (EM) algorithm
• Can handle missing data in the audit dataset

19 Paper - 3 (Cont.)
• Features of SCAN:
– Improvement in speed of convergence: a combination of data summaries, Bloom filters, and arrays
– Ability to detect anomalies in the absence of complete audit data: EM computes the maximum likelihood estimates in a parametric model based on prior information
• Components of SCAN:
– Online Sampling and Flow Aggregation
– Clustering and Data Reduction: EM-based clustering algorithm, data summaries for data reduction, handling missing data in a dataset
– Anomaly Detection

20 Paper - 3 Cont.  System Model Overview:

21 Paper - 3 (Cont.)
• Online Sampling and Flow Aggregation:
– Traffic is sampled and classified into flows. A flow is the set of all connections to a particular destination IP and destination port.
– Connection records provide the following fields: (SrcIP, SrcPort, DstIP, DstPort, ConnStatus, Duration)
– First, identify the flow that the packet belongs to.
– If not found: generate a new flow ID based on the connection information.
– If found: update the corresponding flow array.
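The flow lookup described above can be sketched as a dictionary keyed by destination IP and port. This is a hedged illustration rather than SCAN's actual data structure: the function name `aggregate_flows` and the statistics kept per flow (connection count, total duration, status counts) are assumptions, since the slide does not say what the flow array stores.

```python
def aggregate_flows(connections):
    """Group connection records into flows keyed by (DstIP, DstPort).

    connections: iterable of
        (src_ip, src_port, dst_ip, dst_port, conn_status, duration) tuples
    """
    flows = {}
    for src_ip, src_port, dst_ip, dst_port, status, duration in connections:
        flow_id = (dst_ip, dst_port)
        if flow_id not in flows:
            # Not found: create a new flow from the connection information.
            flows[flow_id] = {"connections": 0, "total_duration": 0.0, "statuses": {}}
        # Found (or just created): update the per-flow statistics.
        flow = flows[flow_id]
        flow["connections"] += 1
        flow["total_duration"] += duration
        flow["statuses"][status] = flow["statuses"].get(status, 0) + 1
    return flows
```

Two connections from different sources to the same destination IP and port end up in one flow, while a connection to another port on the same host starts a new flow.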

22 Paper - 3 (Cont.)
• Clustering and Data Reduction
– Uses an EM-based clustering algorithm
– Input: dataset D and initial estimates
– Output: clustered set of data points
– The EM algorithm has two steps:
  Expectation step: finds the expected value of the complete-data log-likelihood
  Maximization step: maximizes the expectation computed in the E step
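To make the two steps concrete, here is a minimal EM sketch for a one-dimensional Gaussian mixture. It is plain textbook EM, not SCAN's improved variant: the function names, the one-dimensional data, the fixed iteration count, and the small variance floor are all assumptions chosen to keep the example short.

```python
import math


def gaussian_pdf(x, mean, var):
    return math.exp(-((x - mean) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)


def em_gmm_1d(data, means, variances, weights, iterations=50):
    """Fit a 1-D Gaussian mixture from initial estimates; returns new lists."""
    means, variances, weights = list(means), list(variances), list(weights)
    k = len(means)
    for _ in range(iterations):
        # E step: responsibility of each component for each data point.
        resp = []
        for x in data:
            p = [weights[j] * gaussian_pdf(x, means[j], variances[j]) for j in range(k)]
            total = sum(p)
            resp.append([pj / total for pj in p])
        # M step: re-estimate parameters to maximize the expected log-likelihood.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            weights[j] = nj / len(data)
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            var = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data)) / nj
            variances[j] = max(var, 1e-6)  # floor avoids a collapsed component
    return means, variances, weights
```

Given points grouped around 1 and 5 and rough initial means of 0 and 6, the estimated means move to the centers of the two groups.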

23 Paper - 3 (Cont.)
• Clustering and Data Reduction
– Data summaries for data reduction
– A summary is built for each time slice
– This enhancement reduces the dataset
– It consists of three stages:
  1st stage: At the end of each time slice, the authors make a pass over the connection dataset and build data summaries.
  2nd stage: They make another pass over the dataset to build cluster candidates using the data summaries collected in the previous stage. After the frequently appearing data have been identified, the algorithm builds all cluster candidates during this pass.
  3rd stage: Clusters are selected from the set of candidates.

24 Paper - 3 (Cont.)
• Clustering and Data Reduction
– Handling missing data in a dataset
– At the E step the expected value is computed
– At the M step that expected value is inserted into the dataset
• Anomaly Detection:
– Anomalous distribution A
– Normal distribution N
– Determine which group the traffic belongs to
– The likelihood of the two cases is calculated to determine the distribution to which the incoming flow belongs
– Statistical functions
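The likelihood comparison between the normal distribution N and the anomalous distribution A can be sketched as follows. The slide does not give the distributions or the statistical functions used, so this example assumes, purely for illustration, that each distribution is a one-dimensional Gaussian over a single flow feature and that a prior probability of anomaly is known; the names `log_likelihood`, `classify_flow`, and `prior_anomalous` are hypothetical.

```python
import math


def log_likelihood(x, mean, var):
    """Log density of x under a Gaussian with the given mean and variance."""
    return -0.5 * math.log(2 * math.pi * var) - (x - mean) ** 2 / (2 * var)


def classify_flow(x, normal, anomalous, prior_anomalous=0.01):
    """Assign a flow feature x to whichever distribution makes it more likely.

    normal, anomalous: (mean, variance) pairs for N and A (assumed Gaussian)
    prior_anomalous:   assumed prior probability that a flow is anomalous
    """
    score_n = math.log(1 - prior_anomalous) + log_likelihood(x, *normal)
    score_a = math.log(prior_anomalous) + log_likelihood(x, *anomalous)
    return "anomalous" if score_a > score_n else "normal"
```

For instance, if normal flows average 10 connections per time slice and anomalous flows average 100, a flow with 12 connections is assigned to N and a flow with 90 connections to A.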

25 Paper - 4
• ADAM: Detecting Intrusions by Data Mining
– ADAM stands for Audit Data Analysis and Mining
– It combines association rules and classification rules
– First, ADAM collects known frequent datasets
– Second, ADAM runs an online algorithm that:
  Finds the latest frequent connection records
  Compares them with the known mined data
  Discards those that appear to be normal
  Forwards the suspicious ones to the classifier
– The trained classifier then classifies the suspicious data as one of the following:
  Known type of attack
  Unknown type of attack
  False alarm
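The online filtering step can be illustrated with a small sketch. It assumes that the "known frequent datasets" are frequent itemsets mined from attack-free traffic, that each connection record is a dictionary of field names to string values, and that itemsets of up to two items are enough for the example; the function names and field names are hypothetical, and the classifier that would receive the suspicious itemsets is left out.

```python
from collections import Counter
from itertools import combinations


def frequent_itemsets(window, min_support, max_size=2):
    """Itemsets of up to max_size (field, value) pairs that are frequent in window."""
    counts = Counter()
    for record in window:
        items = sorted(record.items())
        for size in range(1, max_size + 1):
            counts.update(combinations(items, size))
    threshold = min_support * len(window)
    return {itemset for itemset, count in counts.items() if count >= threshold}


def suspicious_itemsets(window, normal_profile, min_support):
    """Frequent itemsets of the current window that the normal profile lacks."""
    return frequent_itemsets(window, min_support) - normal_profile
```

A profile is built once from attack-free records with `frequent_itemsets`; each later window is passed to `suspicious_itemsets`, and only the itemsets that are frequent now but absent from the profile would go on to the classifier.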

26 Paper - 4 (Cont.)
• ADAM's model has two phases
• 1st Phase: Train the classifier
– Offline process
– Takes place only once
– Before the main experiment
• 2nd Phase: Use the trained classifier
– The trained classifier is then used to detect anomalies
– Online process

27 Paper - 4  Phase 1: Cont.

28 Paper - 4  Phase 2: Cont.

29 Testing Methodology
• Paper 1:
– The authors of this paper did not use any testing methodology. They described different kinds of data mining techniques and rules that can be implemented in various kinds of data mining based IDS.
• Paper 2:
– The authors of this paper used the MIT Lincoln Lab 1999 intrusion detection evaluation (IDEVAL) data sets.
– From this data set they over-sampled and created a new data set that contained a mix of 43.6% attack sessions and 56.4% non-attack sessions.
– They randomly sampled both data sets and allocated 67% of the observations to the training data set and 33% to the validation data set.
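A 67%/33% split that preserves the attack/non-attack mix (the stratified random sampling mentioned for Paper 2) might look like the sketch below. The function name `stratified_split`, the `label_of` callback, and the fixed random seed are assumptions made for the illustration; the authors' actual tooling is not described in the slides.

```python
import random
from collections import defaultdict


def stratified_split(rows, label_of, train_fraction=0.67, seed=0):
    """Split rows into training and validation sets, class by class.

    label_of: function returning the class label (e.g. the TGT value) of a row
    """
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for row in rows:
        by_label[label_of(row)].append(row)
    train, validation = [], []
    for group in by_label.values():
        group = group[:]       # copy so the caller's rows are not reordered
        rng.shuffle(group)
        cut = round(train_fraction * len(group))
        train.extend(group[:cut])
        validation.extend(group[cut:])
    return train, validation
```

Because each class is split separately, a data set with 43.6% attack sessions keeps roughly that proportion in both the training and the validation set.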

30 Testing Methodology Cont.  Paper 2: (Continued) –Packet-Based Results They scored the UCF data with the packet-based network model and found that approximately 2.5% of the packets having a probability of of being an attack packet. Conversely, a packet that scored a probability of does not necessarily mean that packet is a “good” packet and poses no threat to their campus networks. The following figure shows that more than 70% of the packets captured have an attack probability of and 97% of the packets have an attack probability of or less.

31 Testing Methodology  Paper 2: (Continued) –Packet-Based Results (Continued) Overall, out of the approximately 500,000 packets with a probability, there are at least 50,000 packets that require further study. Retraining of their model and readjusting the model’s prior probabilities will allow to see if those remaining packets are truly attack packets or just simply false alarms. Cont.

32 Testing Methodology Cont.  Paper 2: (Continued) –Session-Based Results They also scored the UCF data with the session-based network model and found that approximately 32.9% of the sessions were identified as having a probability of of being an attack session. Conversely, a session that scored a probability lower than does not also necessarily mean that session is a “good” session and poses no threat to their campus networks. The vast majority of the sessions captured had a low or non- existent probability of being an attack session. Their studies showed that more than 66% of the sessions captured have an attack probability of

33 Testing Methodology (Cont.)
• Paper 3:
– The authors of this paper evaluated SCAN using the 1999 DARPA intrusion detection evaluation data.
– The dataset consists of five weeks of TCPDump data.
– Data from weeks 1 and 3 consist of normal, attack-free network traffic.
– Week 2 data consists of network traffic with labeled attacks.
– The week 4 and 5 data are the "Test Data" and contain 201 instances of 58 different unlabelled attacks, 177 of which are visible in the inside TCPDump data.
– They trained SCAN on the DARPA dataset using week 1 and 3 data, then evaluated the detector on weeks 4 and 5.
– Evaluation was done in an off-line manner.

34 Testing Methodology (Cont.)
• Paper 3: (Continued)
– Simulation Results with Complete Audit Data
  An IDS is evaluated on the basis of accuracy and efficiency.
  To judge the efficiency and accuracy of SCAN, the authors used Receiver Operating Characteristic (ROC) curves. An ROC curve graphs the detection rate against the false positive rate.
  The performance of SCAN at detecting an SSH Process Table attack is shown in Fig. 5. The attack is similar to the Process Table attack in that the goal of the attacker is to cause the SSHD daemon to spawn the maximum number of processes that the operating system will allow.
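The points of an ROC curve can be computed from anomaly scores and ground-truth labels as in the sketch below. It is a generic illustration, not the evaluation code used for SCAN: the function name `roc_points` and the convention that a higher score means "more likely an attack" are assumptions, and the input must contain at least one attack and one normal instance.

```python
def roc_points(scores, labels):
    """Return (false_positive_rate, detection_rate) pairs, one per threshold.

    scores: anomaly score per instance (higher means more suspicious)
    labels: 1 for attack, 0 for normal
    """
    positives = sum(labels)
    negatives = len(labels) - positives
    points = [(0.0, 0.0)]
    # Lower the threshold step by step; everything at or above it is flagged.
    for threshold in sorted(set(scores), reverse=True):
        flagged = [y for s, y in zip(scores, labels) if s >= threshold]
        true_pos = sum(flagged)
        false_pos = len(flagged) - true_pos
        points.append((false_pos / negatives, true_pos / positives))
    return points
```

A detector whose curve rises toward a detection rate of 1 while the false positive rate is still small is the better one, which is how two detectors are compared on the same plot.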

35 Testing Methodology (Cont.)
• Paper 3: (Continued)
– Simulation Results with Complete Audit Data (Continued)
  In Fig. 6 the performance of SCAN at detecting a SYN flood attack is evaluated. A SYN flood attack is a type of DoS attack. It causes the data structure in the 'tcpd' daemon on the server to overflow.
  The authors compared the performance of SCAN with that of the K-Nearest Neighbors (KNN) clustering algorithm.
  Comparison of the ROC curves in Fig. 5 and 6 suggests that SCAN outperforms the KNN algorithm. The two techniques are similar in that both are unsupervised techniques that work with unlabelled data.

36 Testing Methodology (Cont.)
• Paper 4:
– The authors of this paper report that ADAM participated in the DARPA 1999 intrusion detection evaluation.
– It focused on detecting DoS and PROBE attacks from tcpdump data and performed quite well.
– Figures 3 and 4 show the results on the DARPA 1999 test data.

37 Testing Methodology Cont.  Paper 4: (Continued)

38 Conclusion
• In this report we have studied the details of four papers in this area.
• We have tried to summarize those four papers, their system models, their technologies, and their validation methods.
• We did not go through all the cross-references given in those papers; rather, we kept the scope of this report limited to these four papers only.
• We strongly believe that this report will give the reader an overview of current developments in this area and of how data mining is evolving in the field of network intrusion detection.

39 Bibliography
• [1] Chang-Tien Lu, Arnold P. Boedihardjo, Prajwal Manalwar, "Exploiting Efficient Data Mining Techniques to Enhance Intrusion Detection Systems", Information Reuse and Integration (IRI), IEEE International Conference on.
• [2] B. D. Caulkins, Joohan Lee, M. Wang, "Packet- vs. session-based modeling for intrusion detection system", Information Technology: Coding and Computing (ITCC), International Conference on, Volume 1, 4-6 April 2005.
• [3] A. Patcha, J.-M. Park, "Detecting denial-of-service attacks with incomplete audit data", Computer Communications and Networks (ICCCN), Proceedings, 14th International Conference on, Oct.
• [4] Daniel Barbara, Julia Couto, Sushil Jajodia, Leonard Popyack, Ningning Wu, "ADAM: Detecting Intrusions by Data Mining", Proceedings of the 2001 IEEE Workshop on Information Assurance and Security, United States Military Academy, West Point, NY, 5-6 June 2001.

40 Questions