1
Unsupervised Intrusion Detection Using Clustering Approach Muhammet Kabukçu Sefa Kılıç Ferhat Kutlu Teoman Toraman 1/29
2
Outline
- Introduction
- Using Clustering for Intrusion Detection
- Methodology
- Overall Summary
- Conclusion
- References
2/29
3
Introduction Incidents are violations or imminent threats of violation of: * computer security policies, * acceptable use policies, * standard security practices. Intrusion detection is the process of monitoring the events occurring in a computer system or network and analyzing them for signs of possible incidents. 3/29
4
Introduction An intrusion detection system (IDS) is software that automates the intrusion detection process. IDSs primarily focus on identifying possible incidents and on detecting when an attacker has successfully compromised a system by exploiting a vulnerability in it. 4/29
5
Introduction
Methodologies of IDS Technologies:
- Signature-Based Detection
- Anomaly-Based Detection
- Stateful Protocol Analysis
5/29
6
Signature-Based Detection A signature is a pattern that corresponds to a known threat (e.g. a telnet attempt with a username of "root", which is a violation of an organization's security policy). Signature-based detection is the process of comparing signatures against observed events to identify possible incidents. Advantage: Very effective at detecting known threats. Disadvantage: Ineffective at detecting previously unknown threats. 6 /29
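The comparison of signatures against observed events can be sketched in a few lines. This is only an illustration: the event format (a dictionary with `service` and `username` fields) and the signature name `telnet-root-login` are hypothetical, chosen to mirror the telnet example on the slide.

```python
# Hypothetical event format: a dict with "service" and "username" keys.
# A signature lists field values that must all be present in an event.
SIGNATURES = [
    {"name": "telnet-root-login",
     "match": {"service": "telnet", "username": "root"}},
]

def match_signatures(event, signatures=SIGNATURES):
    """Return the names of all signatures whose fields all match the event."""
    hits = []
    for sig in signatures:
        if all(event.get(key) == value for key, value in sig["match"].items()):
            hits.append(sig["name"])
    return hits

known = {"service": "telnet", "username": "root"}
unknown = {"service": "ssh", "username": "root"}
print(match_signatures(known))    # matches the known pattern
print(match_signatures(unknown))  # no signature exists for this event
```

The second event shows the disadvantage named on the slide: anything without a matching signature passes unnoticed.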
7
Anomaly-Based Detection The process of comparing definitions of what activity is considered normal against observed events to identify significant deviations. Capable of detecting previously unknown threats. Uses host or network-specific profiles. 7 /29
8
Detection by Stateful Protocol Analysis The process of comparing predetermined profiles of generally accepted definitions of benign protocol activity for each protocol state against observed events to identify deviations. Relies on vendor-developed universal profiles that specify how particular protocols should and should not be used. 8 /29
9
Using Clustering for Intrusion Detection
Methods other than signature-based detection use data mining and machine learning algorithms to train on labeled network data. For training data, there are two major paradigms:
- Misuse Detection
- Anomaly Detection
Which one to use?
9/29
10
Using Clustering for Intrusion Detection - Misuse Detection - In misuse detection, machine learning algorithms are trained on labeled data. Network data is classified using features extracted from labeled network traffic. Detection models are retrained when new data containing new types of attacks becomes available. 10/29
11
Using Clustering for Intrusion Detection - Anomaly Detection - In anomaly detection, models are built by training on normal data, and observed events are then checked for deviations from that normal model. Generating purely normal data is very difficult and costly in practice: it is very hard to guarantee that no attacks occur while the traffic is being collected from the network. 11/29
12
Using Clustering for Intrusion Detection (Misuse Detection vs. Anomaly Detection) Use a mechanism that detects intrusions by training a model on unlabeled data, finding the intrusions buried within that data. 12/29
13
Using Clustering for Intrusion Detection
[Diagram labels: A Set of Unlabeled Data; Unsupervised Anomaly Detection Algorithm; Connection; Comparison with Detected Clusters; Detected Intrusion Clusters; Detected malicious attacks]
Assumptions for the unsupervised anomaly detection algorithm:
1. The intrusions are rare with respect to normal network traffic.
2. The intrusions are different from normal network traffic.
As a result, the intrusions will appear as outliers in the data.
13/29
14
Using Clustering for Intrusion Detection The unsupervised anomaly detection algorithm groups the unlabeled data instances into clusters using a simple distance-based metric. 14/29
15
Once the data is clustered, all instances that appear in small clusters are labeled as anomalies, because:
- Normal instances should form large clusters compared to the intrusions.
- Malicious intrusions and normal instances are qualitatively different, so they do not fall into the same cluster.
[Figure labels: Normal cluster, Intrusion cluster]
15/29
16
Methodology
1. Description of the dataset
2. Metric & Normalization
3. Clustering Algorithm
   a) Portnoy et al.
   b) Y-means Algorithm
4. Labeling Clusters
5. Intrusion Detection
16/29
17
Description of the dataset
KDD Cup 1999 Data
Main attack categories:
- DOS: Denial of Service (e.g. SYN flood)
- R2L: Unauthorized access from a remote machine (e.g. guessing password)
- U2R: Unauthorized access to local superuser (root) privileges (e.g. various buffer overflow attacks)
- Probing: Surveillance and other probing (e.g. port scanning)
In total, 24 attack types in training data; 14 additional ones in test data...
17/29
18
Metric & Normalization Euclidean Metric (for distance computation) Feature Normalization (to eliminate the difference in the scale of features) 18/29
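The slide does not say which normalization is used. As a minimal sketch, assuming z-score normalization (subtract the per-feature mean, divide by the per-feature standard deviation) followed by the Euclidean metric, the two steps could look like this. The function names are illustrative only.

```python
import math

def fit_normalizer(rows):
    """Compute the per-feature mean and standard deviation of the training rows."""
    n = len(rows)
    dims = len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(dims)]
    stds = []
    for j in range(dims):
        var = sum((r[j] - means[j]) ** 2 for r in rows) / n
        # A constant feature has zero deviation; use 1.0 to avoid dividing by zero.
        stds.append(math.sqrt(var) or 1.0)
    return means, stds

def normalize(row, means, stds):
    """Express each feature as a number of standard deviations from its mean."""
    return [(x - m) / s for x, m, s in zip(row, means, stds)]

def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Two features on very different scales (e.g. a count and a byte total).
rows = [[0.0, 1000.0], [2.0, 3000.0]]
means, stds = fit_normalizer(rows)
a, b = (normalize(r, means, stds) for r in rows)
print(euclidean(rows[0], rows[1]))  # dominated by the large-scale feature
print(euclidean(a, b))              # both features contribute equally
```

Without normalization, the second feature dominates the distance simply because its values are larger.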
19
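Clustering Algorithm (Portnoy et al.)
Start with the training set and an empty set of clusters. For each instance X_i:
- Compute the distances d1, d2, d3, ... from X_i to the existing clusters and select the smallest one (d1 in the figure).
- If d1 < W (a predefined threshold value), X_i is assigned to that cluster.
- Otherwise, a new cluster is created and X_i is assigned to it.
19/29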
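A minimal sketch of this single-pass, fixed-width procedure follows. It assumes that each cluster is represented by the first instance assigned to it and that this centre never moves; the slide does not state how the cluster centre is defined, and the function and field names are illustrative.

```python
import math

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def fixed_width_clustering(instances, width):
    """Single pass: join the nearest cluster if it is closer than `width`,
    otherwise start a new cluster centred on the instance."""
    clusters = []  # each cluster: {"center": point, "members": [points]}
    for x in instances:
        nearest, nearest_dist = None, math.inf
        for cluster in clusters:
            d = euclidean(x, cluster["center"])
            if d < nearest_dist:
                nearest, nearest_dist = cluster, d
        if nearest is not None and nearest_dist < width:
            nearest["members"].append(x)
        else:
            # Assumption: the first member serves as the fixed cluster centre.
            clusters.append({"center": x, "members": [x]})
    return clusters

data = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.2), (5.0, 5.0)]
clusters = fixed_width_clustering(data, width=1.0)
print([len(c["members"]) for c in clusters])
```

The number of clusters is not fixed in advance; it follows from the data and from the chosen `width` (W on the slide).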
20
Clustering Algorithm (Portnoy et al.) Advantage: no need to know the initial number of clusters. Disadvantage: the threshold W must be known in advance, and an unsuitable W may cause instances to be labeled wrongly in some cases. However… 20/29
21
Clustering Algorithm (Y-means Algorithm) 3 main parts: 1.assigning instances to k clusters 2.splitting clusters 3.merging clusters 21/29
22
Clustering Algorithm (Y-means Algorithm)
1. Assigning instances to k clusters
Dataset: k = no. of clusters, n = no. of instances, 1 < k < n
... redefine cluster centroid
22/29
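The assign-and-update loop in this step is the one used by standard k-means. A minimal sketch is shown below, assuming Euclidean distance and assuming that a cluster left without members simply keeps its old centroid; the slide does not say how Y-means treats empty clusters.

```python
import math

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def assign(instances, centroids):
    """Return, for each instance, the index of its nearest centroid."""
    return [min(range(len(centroids)), key=lambda c: euclidean(x, centroids[c]))
            for x in instances]

def recompute_centroids(instances, labels, centroids):
    """Redefine each centroid as the mean of the instances assigned to it."""
    updated = []
    for c, old in enumerate(centroids):
        members = [x for x, lab in zip(instances, labels) if lab == c]
        if not members:
            updated.append(old)  # assumption: an empty cluster keeps its centroid
            continue
        updated.append(tuple(sum(m[j] for m in members) / len(members)
                             for j in range(len(old))))
    return updated

def kmeans(instances, centroids, max_iter=100):
    """Alternate assignment and centroid update until the labels stop changing."""
    labels = assign(instances, centroids)
    for _ in range(max_iter):
        centroids = recompute_centroids(instances, labels, centroids)
        new_labels = assign(instances, centroids)
        if new_labels == labels:
            break
        labels = new_labels
    return labels, centroids

data = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]
labels, centroids = kmeans(data, centroids=[(0.0, 0.0), (10.0, 0.0)])
print(labels, centroids)
```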
23
Clustering Algorithm (Y-means Algorithm)
2. Splitting clusters
- d_i: distance of instance X_i from its cluster (as drawn in the figure).
- t (normal threshold) = 2.32σ, where σ is the standard deviation; the region within t is the confident area.
- If d_i > t, X_i is an outlier.
- New clusters are created starting with the farthest outliers.
23/29
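The outlier test can be sketched as follows. The slide only says that σ is a standard deviation; this sketch assumes σ is the root-mean-square distance of a cluster's members from its centroid, which may differ from the definition used in the original Y-means paper.

```python
import math

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def cluster_sigma(members, centroid):
    """Assumed definition: root-mean-square distance of members to the centroid."""
    return math.sqrt(sum(euclidean(x, centroid) ** 2 for x in members) / len(members))

def find_outliers(members, centroid, factor=2.32):
    """Return (distance, instance) pairs with d_i > t = factor * sigma, farthest first."""
    t = factor * cluster_sigma(members, centroid)
    scored = [(euclidean(x, centroid), x) for x in members]
    return sorted([pair for pair in scored if pair[0] > t], reverse=True)

# Nine instances at the origin and one far away; the centroid is their mean.
members = [(0.0, 0.0)] * 9 + [(10.0, 0.0)]
centroid = (1.0, 0.0)
outliers = find_outliers(members, centroid)
print(outliers)
```

Because the result is sorted farthest first, its first element is the natural seed for the next new cluster.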
24
Clustering Algorithm (Y-means Algorithm)
3. Merging clusters
If an instance X_i is in the confident area of two clusters, these clusters are merged back together.
24/29
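One way to find the cluster pairs that qualify for merging is sketched below. It assumes each cluster is described by a centroid and a threshold t, and that "in the confident area" means within distance t of the centroid; both the representation and the function name are hypothetical.

```python
import math

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def clusters_to_merge(instances, clusters):
    """clusters: list of (centroid, threshold) pairs.
    Return sorted index pairs of clusters whose confident areas share an instance."""
    pairs = set()
    for x in instances:
        inside = [i for i, (c, t) in enumerate(clusters) if euclidean(x, c) <= t]
        for a in range(len(inside)):
            for b in range(a + 1, len(inside)):
                pairs.add((inside[a], inside[b]))
    return sorted(pairs)

clusters = [((0.0, 0.0), 3.0), ((4.0, 0.0), 3.0), ((20.0, 0.0), 3.0)]
instances = [(2.0, 0.0), (20.0, 1.0)]
print(clusters_to_merge(instances, clusters))
```

Here the first instance lies within the threshold of clusters 0 and 1, so only that pair is reported.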
25
Labeling Clusters
Our first assumption: # of normal instances >> # of intrusions
- Label instances in large clusters as normal.
- Label instances in small clusters as intrusions.
Starting with the largest clusters, label clusters as normal until 99% of the data is labeled as normal; label the rest as intrusions.
[Figure labels: Normal cluster, Intrusion cluster]
25/29
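The labeling rule can be sketched from cluster sizes alone. This sketch assumes clusters are processed from largest to smallest and that the cluster which crosses the 99% mark is still labeled normal; the slide does not specify how that boundary case is handled.

```python
def label_clusters(sizes, normal_fraction=0.99):
    """Label clusters 'normal' from largest to smallest until the requested
    fraction of all instances is covered; label the remaining ones 'intrusion'."""
    total = sum(sizes)
    order = sorted(range(len(sizes)), key=lambda i: sizes[i], reverse=True)
    labels = [None] * len(sizes)
    covered = 0
    for i in order:
        # Assumption: the cluster that crosses the threshold counts as normal.
        labels[i] = "normal" if covered < normal_fraction * total else "intrusion"
        covered += sizes[i]
    return labels

sizes = [600, 5, 392, 2, 1]  # instances per cluster, 1000 in total
print(label_clusters(sizes))
```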
26
Intrusion Detection For a test instance x: measure the distance to each cluster and select the nearest cluster C. If C is a normal cluster, label x as normal; otherwise, label x as an intrusion. 26/29
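The detection step is a nearest-cluster lookup. A minimal sketch, assuming each labeled cluster is represented by a centroid and that the test instance has already been normalized in the same way as the training data:

```python
import math

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def classify(x, clusters):
    """clusters: list of (centroid, label) pairs. Return the nearest cluster's label."""
    _, label = min(clusters, key=lambda c: euclidean(x, c[0]))
    return label

clusters = [((0.0, 0.0), "normal"), ((10.0, 10.0), "intrusion")]
print(classify((1.0, 1.0), clusters))
print(classify((9.0, 8.0), clusters))
```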
27
Overall Summary
- IDS & IDS Technologies
- Using Clustering for Intrusion Detection
- Methodology
  1. Description of the dataset
  2. Metric & Normalization
  3. Clustering Algorithm
  4. Labeling Clusters
  5. Intrusion Detection
Conclusion
- Unsupervised clustering is chosen.
- KDD Cup 1999 Data
- Y-means Algorithm is used for creating the ID system.
27/29
28
References
[1] KDD Cup 1999 data. http://kdd.ics.uci.edu/databases/kddcup99/kddcup99.html.
[2] Y. Guan and A. A. Ghorbani. Y-means: A clustering method for intrusion detection. In Proceedings of Canadian Conference on Electrical and Computer Engineering, pages 1083-1086, 2003.
[3] L. Portnoy, E. Eskin, and S. Stolfo. Intrusion detection with unlabeled data using clustering. In Proceedings of ACM CSS Workshop on Data Mining Applied to Security (DMSA-2001), 2001.
[4] K. Scarfone and P. Mell. Guide to intrusion detection and prevention systems (IDPS), 2007.
28/29
29
Questions? 29/29