
Unsupervised Intrusion Detection Using Clustering Approach
Muhammet Kabukçu, Sefa Kılıç, Ferhat Kutlu, Teoman Toraman


1 Unsupervised Intrusion Detection Using Clustering Approach Muhammet Kabukçu Sefa Kılıç Ferhat Kutlu Teoman Toraman 1/29

2 Outline
* Introduction
* Using Clustering for Intrusion Detection
* Methodology
* Overall Summary
* Conclusion
* References

3 Introduction
Incidents are violations or imminent threats of violation of:
* computer security policies,
* acceptable use policies,
* standard security practices.
Intrusion detection is the process of monitoring the events occurring in a computer system or network and analyzing them for signs of possible incidents.

4 Introduction
An intrusion detection system (IDS) is software that automates the intrusion detection process. IDSs primarily focus on identifying possible incidents and on detecting when an attacker has successfully compromised a system by exploiting a vulnerability in it.

5 Introduction: Methodologies of IDS Technologies
* Signature-Based Detection
* Anomaly-Based Detection
* Stateful Protocol Analysis

6 Signature-Based Detection
* A signature is a pattern that corresponds to a known threat (e.g. a telnet attempt with a username of "root", which is a violation of an organization's security policy).
* Signature-based detection is the process of comparing signatures against observed events to identify possible incidents.
Advantage: Very effective at detecting known threats.
Disadvantage: Ineffective at detecting previously unknown threats.
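
As a minimal sketch of the comparison described on this slide, the Python snippet below checks an observed event against a small signature table. The `Event` fields, the `SIGNATURES` table, and `match_signatures` are hypothetical names chosen for illustration; real IDS products use much richer rule languages.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    """A simplified observed event (hypothetical schema)."""
    service: str
    username: str


# Each signature is a name plus a predicate describing a known threat.
SIGNATURES = {
    "telnet-root-login": lambda e: e.service == "telnet" and e.username == "root",
}


def match_signatures(event):
    """Return the names of all signatures that the event matches."""
    return [name for name, matches in SIGNATURES.items() if matches(event)]


if __name__ == "__main__":
    print(match_signatures(Event("telnet", "root")))
    print(match_signatures(Event("ssh", "alice")))
```

An event that matches no entry in the table goes unreported, which is the disadvantage named above: a threat without a signature is invisible to this method.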

7 Anomaly-Based Detection
* The process of comparing definitions of what activity is considered normal against observed events to identify significant deviations.
* Capable of detecting previously unknown threats.
* Uses host or network-specific profiles.
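
One simple way to picture "comparing a definition of normal against observed events" is a statistical profile. The sketch below assumes a single numeric feature (for example, connections per minute) and flags values that lie more than a chosen number of standard deviations from the profile mean. The feature, the function names, and the 3-sigma default are assumptions for illustration, not taken from the slides.

```python
from statistics import mean, stdev


def build_profile(normal_values):
    """Summarize observations believed to be normal as (mean, std)."""
    return mean(normal_values), stdev(normal_values)


def is_anomalous(value, profile, max_sigmas=3.0):
    """Flag a value that deviates significantly from the profile."""
    mu, sigma = profile
    if sigma == 0:
        return value != mu
    return abs(value - mu) / sigma > max_sigmas


if __name__ == "__main__":
    # Hypothetical connections-per-minute counts for one host.
    profile = build_profile([10, 12, 11, 9, 13, 10, 11])
    print(is_anomalous(12, profile))
    print(is_anomalous(40, profile))
```

Because the profile is built from one host's (or one network's) own history, the same value can be normal on one system and anomalous on another.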

8 Detection by Stateful Protocol Analysis
* The process of comparing predetermined profiles of generally accepted definitions of benign protocol activity for each protocol state against observed events to identify deviations.
* Relies on vendor-developed universal profiles that specify how particular protocols should and should not be used.

9 Using Clustering for Intrusion Detection
* Methods other than signature-based detection use data mining and machine learning algorithms to train on labeled network data.
* For training data, there are two major paradigms: Misuse Detection and Anomaly Detection.
Which one to use?

10 Using Clustering for Intrusion Detection: Misuse Detection
* In misuse detection, machine learning algorithms are used with labeled data.
* Network data is classified using features extracted from labeled network traffic.
* Detection models are retrained on new data that includes new types of attacks.

11 Using Clustering for Intrusion Detection: Anomaly Detection
* In anomaly detection, models are built by training on normal data; deviations from the normal model are then searched for.
* Generating purely normal data is very difficult and costly in practice.
* It is very hard to guarantee that no attacks occur while the traffic is being collected from the network.

12 Using Clustering for Intrusion Detection
* Use a mechanism that detects intrusions by training a model on unlabeled data.
* Find the intrusions buried within that data.
[Diagram labels: Misuse Detection, Anomaly Detection]

13 Using Clustering for Intrusion Detection
Assumptions for the unsupervised anomaly detection algorithm:
1. The intrusions are rare with respect to normal network traffic.
2. The intrusions are different from normal network traffic.
As a result, the intrusions will appear as outliers in the data.
[Diagram labels: A Set of Unlabeled Data; Unsupervised Anomaly Detection Algorithm; Connection Comparison with Detected Clusters; Detected Intrusion Clusters; Detected malicious attacks]

14 Using Clustering for Intrusion Detection
* The unsupervised anomaly detection algorithm groups the unlabeled data instances into clusters using a simple distance-based metric.

15 Once the data is clustered, all instances that appear in small clusters are labeled as anomalies because:
* normal instances should form large clusters compared to the intrusions,
* malicious intrusions and normal instances are qualitatively different, so they do not fall into the same cluster.
[Figure labels: Normal cluster, Intrusion cluster]

16 Methodology
1. Description of the dataset
2. Metric & Normalization
3. Clustering Algorithm
   a) Portnoy et al.
   b) Y-means Algorithm
4. Labeling Clusters
5. Intrusion Detection

17 Description of the dataset
KDD Cup 1999 Data
Main attack categories:
* DOS: Denial of Service (e.g. SYN flood)
* R2L: Unauthorized access from a remote machine (e.g. guessing password)
* U2R: Unauthorized access to local superuser (root) privileges (e.g. various buffer overflow attacks)
* Probing: Surveillance and other probing (e.g. port scanning)
In total, 24 attack types in training data; 14 additional ones in test data...
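
To make the four categories concrete, the sketch below maps a handful of attack labels from the KDD Cup 1999 data to the categories listed on this slide. Only five well-known labels are shown, so the table is deliberately incomplete; the `categorize` helper and the "unknown" fallback are illustrative assumptions.

```python
# A few attack labels from the KDD Cup 1999 data and their categories.
# This table is intentionally partial and only serves as an illustration.
ATTACK_CATEGORY = {
    "neptune": "DOS",          # SYN flood
    "smurf": "DOS",
    "guess_passwd": "R2L",
    "buffer_overflow": "U2R",
    "portsweep": "Probing",
}


def categorize(label):
    """Map a raw record label (e.g. 'smurf.') to a category name."""
    name = label.strip().rstrip(".")  # labels in the data files end with '.'
    if name == "normal":
        return "normal"
    return ATTACK_CATEGORY.get(name, "unknown")


if __name__ == "__main__":
    for raw in ["normal.", "smurf.", "guess_passwd.", "something_new."]:
        print(raw, "->", categorize(raw))
```

The "unknown" fallback mirrors the situation described above: the test data contains attack types that never appear in the training data.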

18 Metric & Normalization
* Euclidean Metric (for distance computation)
* Feature Normalization (to eliminate the difference in the scale of features)
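
The slide does not say which normalization is used. As a sketch, the code below assumes z-score normalization (subtract the feature mean, divide by the feature standard deviation) followed by the Euclidean distance. All function names are hypothetical.

```python
import math


def column_stats(rows):
    """Per-feature mean and (population) standard deviation."""
    n, dims = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(dims)]
    stds = [
        math.sqrt(sum((r[j] - means[j]) ** 2 for r in rows) / n)
        for j in range(dims)
    ]
    return means, stds


def normalize(row, means, stds):
    """Z-score each feature; a constant feature maps to 0.0."""
    return [(x - m) / s if s > 0 else 0.0 for x, m, s in zip(row, means, stds)]


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


if __name__ == "__main__":
    # Two features on very different scales (e.g. a count and a byte total).
    rows = [[0.0, 1000.0], [2.0, 3000.0]]
    means, stds = column_stats(rows)
    a, b = (normalize(r, means, stds) for r in rows)
    print(euclidean(rows[0], rows[1]))  # dominated by the second feature
    print(euclidean(a, b))              # both features contribute equally
```

Without normalization the large-scale feature dominates the distance, which is the scale difference that this step is meant to eliminate.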

19 Clustering Algorithm (Portnoy et al.)
[Diagram: an instance X_i from the training set, an initially empty set of clusters, and the distances d1, d2, d3 from X_i to the existing clusters]
- d1, the shortest of these distances, is selected.
- If d1 < W (a predefined threshold value), then X_i is assigned to that cluster.
- Otherwise, a new cluster is created and X_i is assigned to it.
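
The steps on this slide can be sketched as a single pass over the training set. The sketch assumes that each cluster is represented by the instance that created it and that this center stays fixed; the function name and the dictionary layout are hypothetical.

```python
import math


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def fixed_width_clustering(instances, width):
    """Single-pass clustering with a predefined width threshold W.

    Returns a list of clusters, each a dict with a 'center' and 'members'.
    """
    clusters = []  # start from an empty set of clusters
    for x in instances:
        # Find the nearest existing cluster and its distance.
        nearest, best = None, math.inf
        for c in clusters:
            d = euclidean(x, c["center"])
            if d < best:
                nearest, best = c, d
        if nearest is not None and best < width:
            nearest["members"].append(x)  # close enough: join that cluster
        else:
            # Too far from every cluster: x defines a new one.
            clusters.append({"center": list(x), "members": [x]})
    return clusters


if __name__ == "__main__":
    data = [[0, 0], [0.1, 0], [5, 5], [0, 0.2], [5.1, 5]]
    for c in fixed_width_clustering(data, width=1.0):
        print(c["center"], len(c["members"]))
```

The number of clusters is never specified; it follows from the data and from the choice of `width`.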

20 Clustering Algorithm (Portnoy et al.)
Advantage: No need to know the initial number of clusters.
Disadvantage: W must be chosen in advance, and an unsuitable value may cause instances to be labeled wrongly in some cases.
However…

21 Clustering Algorithm (Y-means Algorithm)
3 main parts:
1. assigning instances to k clusters
2. splitting clusters
3. merging clusters

22 Clustering Algorithm (Y-means Algorithm)
1. Assigning instances to k clusters
k: number of clusters
n: number of instances in the dataset
1 < k < n
After the instances are assigned, each cluster centroid is redefined.
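
One assign-and-update round of this first part can be sketched as follows. The sketch follows the usual k-means step (assign each instance to the nearest centroid, then recompute each centroid as the mean of its members); keeping the old centroid of an empty cluster is a choice made here for simplicity, not something stated on the slide.

```python
import math


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def assign_and_update(instances, centroids):
    """Assign instances to their nearest centroid, then redefine centroids."""
    groups = [[] for _ in centroids]
    for x in instances:
        nearest = min(
            range(len(centroids)), key=lambda i: euclidean(x, centroids[i])
        )
        groups[nearest].append(x)

    new_centroids = []
    for members, old in zip(groups, centroids):
        if members:
            dims = len(old)
            new_centroids.append(
                [sum(m[j] for m in members) / len(members) for j in range(dims)]
            )
        else:
            new_centroids.append(list(old))  # assumption: keep empty cluster as is
    return groups, new_centroids


if __name__ == "__main__":
    data = [[0, 0], [0, 2], [10, 10], [10, 12]]
    groups, centroids = assign_and_update(data, [[1, 1], [9, 9]])
    print(centroids)
```

In practice this round is repeated until the assignments stop changing.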

23 Clustering Algorithm (Y-means Algorithm)
2. Splitting clusters
[Diagram: a cluster's confident area with threshold t, an instance X_i, and its distance d_i]
t (normal threshold) = 2.32σ, where σ = standard deviation
If d_i > t, X_i is an outlier.
New clusters are created first from the farthest outliers.
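
The outlier test on this slide can be sketched as below. The slide does not define how σ is computed; the sketch assumes σ is the root-mean-square distance of a cluster's members from its centroid, and it only finds and orders the outliers rather than performing the full split. Function names are hypothetical. (For a normally distributed quantity, roughly 99% of values lie below the mean plus 2.32σ, which may explain the constant.)

```python
import math


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def find_outliers(members, centroid, factor=2.32):
    """Return (t, outliers) with outliers sorted farthest first.

    Assumption: sigma is the RMS distance of the members to the centroid.
    """
    dists = [euclidean(m, centroid) for m in members]
    sigma = math.sqrt(sum(d * d for d in dists) / len(dists))
    t = factor * sigma  # normal threshold: radius of the confident area
    flagged = [(d, m) for d, m in zip(dists, members) if d > t]
    flagged.sort(key=lambda pair: pair[0], reverse=True)
    return t, [m for _, m in flagged]


if __name__ == "__main__":
    members = [[1, 0]] * 10 + [[-1, 0]] * 10 + [[10, 0]]
    t, outliers = find_outliers(members, centroid=[0, 0])
    print(round(t, 3), outliers)
```

Returning the outliers farthest first matches the slide's note that new clusters are created first from the farthest outliers.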

24 Clustering Algorithm (Y-means Algorithm)
3. Merging clusters
If an instance X_i is in the confident area of two clusters, merge these clusters back together.
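
The merge condition can be sketched as a pairwise check. The sketch assumes each cluster's confident area is the ball of radius t around its centroid, with t computed as in the splitting step; `should_merge` and its parameters are hypothetical names.

```python
import math


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def should_merge(instances, centroid_a, t_a, centroid_b, t_b):
    """True if some instance lies inside the confident area of both clusters."""
    return any(
        euclidean(x, centroid_a) <= t_a and euclidean(x, centroid_b) <= t_b
        for x in instances
    )


if __name__ == "__main__":
    shared = [[2, 0]]
    print(should_merge(shared, [0, 0], 3.0, [4, 0], 3.0))   # overlapping areas
    print(should_merge(shared, [0, 0], 3.0, [10, 0], 3.0))  # far apart
```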

25 Labeling Clusters
Our first assumption: # of normal instances >> # of intrusions
* Label instances in large clusters: normal
* Label instances in small clusters: intrusion
Starting with the largest clusters, label clusters as normal until 99% of the data is labeled as normal; label the rest as intrusion.
[Figure labels: Normal cluster, Intrusion cluster]
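
A minimal sketch of this labeling rule, reading it as "walk through the clusters from largest to smallest and mark them normal until the normal share reaches 99%". The function name and the integer-percent parameter are illustrative assumptions.

```python
def label_clusters(cluster_sizes, normal_percent=99):
    """Label each cluster 'normal' or 'intrusion' based on its size."""
    total = sum(cluster_sizes)
    # Visit clusters from largest to smallest.
    order = sorted(
        range(len(cluster_sizes)), key=lambda i: cluster_sizes[i], reverse=True
    )
    labels = ["intrusion"] * len(cluster_sizes)
    covered = 0
    for i in order:
        # Stop once the required share of the data is labeled normal.
        if covered * 100 >= normal_percent * total:
            break
        labels[i] = "normal"
        covered += cluster_sizes[i]
    return labels


if __name__ == "__main__":
    print(label_clusters([600, 5, 390, 3, 2]))
```

With the sizes in the example, the two large clusters already cover 99% of the 1000 instances, so the three small clusters are labeled as intrusions.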

26 Intrusion Detection
For a test instance x:
* Measure the distance to each cluster.
* Select the nearest cluster C.
* If C is a normal cluster, label x as normal.
* Otherwise, label x as intrusion.
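
The detection step above can be sketched as a nearest-cluster lookup. The sketch assumes each cluster is summarized by a centroid and a label produced by the labeling step, and that x has been normalized the same way as the training data; the dictionary layout is hypothetical.

```python
import math


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def detect(x, clusters):
    """Label a test instance with the label of its nearest cluster."""
    nearest = min(clusters, key=lambda c: euclidean(x, c["centroid"]))
    return "normal" if nearest["label"] == "normal" else "intrusion"


if __name__ == "__main__":
    clusters = [
        {"centroid": [0.0, 0.0], "label": "normal"},
        {"centroid": [8.0, 8.0], "label": "intrusion"},
    ]
    print(detect([0.5, -0.2], clusters))
    print(detect([7.0, 9.0], clusters))
```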

27 Overall Summary
* IDS & IDS Technologies
* Using Clustering for Intrusion Detection
* Methodology
  1. Description of the dataset
  2. Metric & Normalization
  3. Clustering Algorithm
  4. Labeling Clusters
  5. Intrusion Detection
Conclusion
* Unsupervised clustering is chosen.
* KDD Cup 1999 Data
* The Y-means Algorithm is used for creating the ID system.

28 References
[1] KDD Cup 1999 data. http://kdd.ics.uci.edu/databases/kddcup99/kddcup99.html
[2] Y. Guan and A. A. Ghorbani. Y-means: A clustering method for intrusion detection. In Proceedings of Canadian Conference on Electrical and Computer Engineering, pages 1083-1086, 2003.
[3] L. Portnoy, E. Eskin, and S. Stolfo. Intrusion detection with unlabeled data using clustering. In Proceedings of ACM CSS Workshop on Data Mining Applied to Security (DMSA-2001), 2001.
[4] K. Scarfone and P. Mell. Guide to intrusion detection and prevention systems (IDPS), 2007.

29 Questions?

