Data Mining for Intrusion Detection: A Critical Review
Klaus Julisch
From: Applications of Data Mining in Computer Security (Eds. D. Barbará and S. Jajodia)
Knowledge Discovery from Databases (KDD)
Five steps:
– (1) Understanding the application domain
– (2) Data integration and selection
– (3) Data mining
– (4) Pattern evaluation
– (5) Knowledge representation
Data Mining Meets Intrusion Detection
IDS: Misuse detection and anomaly detection
– Misuse detection: Requires a collection of known attacks
– Anomaly detection: Requires a user or system profile
IDS: Host-based and network-based
– Host-based: Analyzes host-bound audit sources such as audit trails, system logs, or application logs
– Network-based: Analyzes packets captured on a network
MADAM ID (Columbia University): Learns classifiers that distinguish between intrusions and normal activities
– (i) Training connection records are partitioned into normal connection records and intrusion connection records
– (ii) Frequent episode rules are mined separately for the two categories of training data; the patterns found only in the intrusion data form the intrusion-only patterns
– (iii) Intrusion-only patterns are used to derive additional attributes that are indicative of intrusive behavior
– (iv) The initial training records are augmented with the new attributes
– (v) A classifier is learned that distinguishes normal records from intrusion records; this classifier, a misuse IDS, is the end product of MADAM ID
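Steps (ii) to (iv) of MADAM ID can be illustrated with a minimal sketch. It makes two loud simplifications: plain frequent attribute-value itemsets stand in for frequent episode rules (which also capture temporal order), and the derived attribute is a simple count of matching intrusion-only patterns. The field names, the toy records, and the helper names `frequent_patterns` and `augment` are hypothetical and not taken from MADAM ID.

```python
from collections import Counter
from itertools import combinations

def frequent_patterns(records, min_support):
    """Return attribute-value itemsets of size 1 or 2 with support >= min_support.
    Simplification: itemsets replace the frequent episode rules used by MADAM ID."""
    counts = Counter()
    for rec in records:
        items = sorted(rec.items())
        for k in (1, 2):
            for combo in combinations(items, k):
                counts[combo] += 1
    n = len(records)
    return {p for p, c in counts.items() if c / n >= min_support}

# Step (i): training data already partitioned (toy, hypothetical records).
normal = [
    {"service": "http", "flag": "SF"},
    {"service": "http", "flag": "SF"},
    {"service": "smtp", "flag": "SF"},
]
intrusion = [
    {"service": "http", "flag": "S0"},
    {"service": "http", "flag": "S0"},
    {"service": "telnet", "flag": "S0"},
]

# Step (ii): mine each category separately, keep patterns seen only in intrusions.
intrusion_only = frequent_patterns(intrusion, 0.6) - frequent_patterns(normal, 0.6)

def augment(rec):
    """Steps (iii)-(iv): add a derived attribute counting matched intrusion-only patterns."""
    items = set(rec.items())
    hits = sum(1 for p in intrusion_only if set(p) <= items)
    return {**rec, "intrusion_pattern_hits": hits}

augmented = [augment(r) for r in normal + intrusion]
```

The augmented records would then be handed to any standard classifier for step (v); that step is omitted here because the slide does not specify the learner.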
ADAM
Network-based anomaly detection system
– Learns normal network behavior from attack-free training data and represents it as a set of association rules (the profile)
– At runtime, the records of the past δ seconds are continuously mined for new association rules that are not contained in the profile; these are sent to a classifier that separates false positives from true positives
Its association rules are of the form ∏ Ai = vi
– Each association rule must have the source host, destination host, and destination port among its attributes
– Multi-level association rules have been introduced to capture coordinated and distributed attacks
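The profile-versus-window idea behind ADAM can be sketched as follows. This is an illustration under stated assumptions, not ADAM's implementation: frequent (source, destination, destination port) itemsets stand in for full association rules, the downstream classifier is left out, and the record fields `t`, `src`, `dst`, `dport` as well as the function names are hypothetical.

```python
from collections import Counter

# The slide states that every rule must contain these three attributes.
KEY_ATTRS = ("src", "dst", "dport")

def frequent_itemsets(records, min_support):
    """Frequent (src, dst, dport) itemsets; a stand-in for association rule mining."""
    counts = Counter(tuple(r[a] for a in KEY_ATTRS) for r in records)
    return {k for k, c in counts.items() if c / len(records) >= min_support}

def new_itemsets(profile, stream, now, delta, min_support):
    """Mine the records of the past `delta` seconds and drop what the profile already holds."""
    window = [r for r in stream if now - delta < r["t"] <= now]
    return frequent_itemsets(window, min_support) - profile

# Profile learned from attack-free training data (toy records).
training = [{"t": i, "src": "10.0.0.1", "dst": "10.0.0.9", "dport": 80} for i in range(4)]
profile = frequent_itemsets(training, min_support=0.5)

# Runtime stream: the telnet connections are not part of the profile.
stream = [
    {"t": 100, "src": "10.0.0.1", "dst": "10.0.0.9", "dport": 80},
    {"t": 101, "src": "10.0.0.7", "dst": "10.0.0.9", "dport": 23},
    {"t": 102, "src": "10.0.0.7", "dst": "10.0.0.9", "dport": 23},
    {"t": 103, "src": "10.0.0.1", "dst": "10.0.0.9", "dport": 80},
]
candidates = new_itemsets(profile, stream, now=103, delta=5, min_support=0.5)
```

In ADAM the resulting candidates are not alarms yet; they are passed to a classifier that separates false positives from true positives.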
Clustering of Unlabeled ID Data
Main focus: Training anomaly detection systems over noisy data
– The number of normal elements in the training data is assumed to be significantly larger than the number of anomalous elements
– Anomalous elements are assumed to be qualitatively different from normal ones
– Thus, anomalies appear as outliers standing out from the normal data, and explicit modeling of outliers results in anomaly detection
Use of clustering
– Normal data is expected to cluster into groups of similar elements and intrusive data into others; the intrusive clusters will be small because intrusions are rare
– Real-time data is compared with the clusters to determine a classification
A network-based anomaly detection system has been built
– In addition to the intrinsic attributes (e.g., source host, destination host, start time), connection records also include derived attributes such as the number of failed login attempts, the number of file-creation operations, and various counts and averages over temporally adjacent connection records
– Euclidean distance is used to determine the similarity between connection records
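The clustering approach can be sketched in a few lines. The slide only says that clustering with Euclidean distance is used and that small clusters are treated as intrusive; the specific fixed-width clustering scheme, the `width` and `normal_fraction` parameters, and the two-dimensional toy feature vectors below are assumptions chosen for illustration.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def fixed_width_clusters(points, width):
    """Assign each point to the first cluster whose center is within `width`,
    otherwise start a new cluster centered on that point."""
    clusters = []
    for p in points:
        for c in clusters:
            if euclidean(p, c["center"]) <= width:
                c["members"].append(p)
                break
        else:
            clusters.append({"center": p, "members": [p]})
    return clusters

def label_clusters(clusters, normal_fraction):
    """Large clusters are labeled normal, small ones anomalous (intrusions are rare)."""
    total = sum(len(c["members"]) for c in clusters)
    for c in clusters:
        share = len(c["members"]) / total
        c["label"] = "normal" if share >= normal_fraction else "anomalous"
    return clusters

def classify(point, clusters):
    """Classify a new record by the label of its nearest cluster center."""
    nearest = min(clusters, key=lambda c: euclidean(point, c["center"]))
    return nearest["label"]

# Unlabeled, noisy training data: four similar records and one outlier.
training = [(0.0, 0.1), (0.1, 0.0), (0.2, 0.1), (0.1, 0.2), (5.0, 5.0)]
clusters = label_clusters(fixed_width_clusters(training, width=1.0), normal_fraction=0.5)
```

In practice the attributes would need to be normalized first, since Euclidean distance is dominated by whichever attribute has the largest numeric range.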
Mining the Alarm Stream
Applying data mining to alarms triggered by an IDS
– (i) Model the normal alarm stream so as to henceforth raise the severity of “abnormal alarms”
– (ii) Extract predominant alarm patterns that a human expert can understand and act upon, e.g., by writing filters or patching a weak IDS signature
Manganaris et al.:
– Alarms are modeled as tuples (t, A), where t is a timestamp and A is an alarm type
– All other attributes of an alarm are ignored
– The profile of normal alarm behavior is learned as follows:
  » The time-ordered alarm stream is partitioned into bursts
  » Association rules are mined from the bursts
  » This results in a profile of normal alarms
– At run time, various tests are carried out to determine whether an alarm burst is normal
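The burst partitioning step can be sketched as below. The slide does not say how a burst is delimited; this sketch assumes that a new burst starts whenever the gap to the previous alarm exceeds a threshold `max_gap`, which is a hypothetical parameter. The alarm type names are invented for the example.

```python
def split_into_bursts(alarms, max_gap):
    """Partition a time-ordered list of (t, A) tuples into bursts.
    Assumption: a gap larger than `max_gap` between consecutive alarms ends a burst."""
    bursts = []
    for t, alarm_type in alarms:
        if bursts and t - bursts[-1][-1][0] <= max_gap:
            bursts[-1].append((t, alarm_type))
        else:
            bursts.append([(t, alarm_type)])
    return bursts

# Toy alarm stream of (timestamp, alarm type) tuples; all other attributes are ignored.
alarms = [(0, "PORTSCAN"), (2, "PING"), (3, "PORTSCAN"), (60, "FTP_FAIL"), (61, "FTP_FAIL")]
bursts = split_into_bursts(alarms, max_gap=10)

# Each burst reduced to its set of alarm types, the input for association rule mining.
burst_types = [frozenset(a for _, a in burst) for burst in bursts]
```

Each element of `burst_types` plays the role of one transaction when association rules are mined from the bursts.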
Clifton and Gengo; Julisch:
– Mine historical alarm logs to find new knowledge that reduces the future alarm load, e.g., to write filtering rules that discard false positives
Tools:
– Frequent episode rules
– Attribute-oriented induction
  » Repeatedly replaces attribute values by more abstract values
  » E.g., IP addresses by networks, timestamps by weekdays, and ports by port ranges; the generalization hierarchies are provided by the user
  » Generalization merges previously distinct alarms into a few classes, so huge alarm logs are condensed into short and comprehensible summaries; this reduces the alarm load by 80%
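A single generalization pass of attribute-oriented induction might look like the sketch below. It is a simplification: real attribute-oriented induction generalizes repeatedly until the summary is small enough, while this example applies each hierarchy once. The hierarchies themselves (/24 networks, weekday names, privileged versus non-privileged ports) and the alarm fields are assumptions standing in for the user-provided ones.

```python
from collections import Counter
from datetime import datetime, timezone
import ipaddress

WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"]

def generalize(alarm):
    """Replace each attribute value by a more abstract value (one level up its hierarchy)."""
    network = ipaddress.ip_network(alarm["src_ip"] + "/24", strict=False)  # IP -> network
    day = datetime.fromtimestamp(alarm["ts"], tz=timezone.utc).weekday()   # timestamp -> weekday
    port_range = "privileged" if alarm["dport"] < 1024 else "non-privileged"  # port -> range
    return (alarm["type"], str(network), WEEKDAYS[day], port_range)

# Toy alarm log; timestamps are seconds since the Unix epoch (1970-01-01 was a Thursday).
log = [
    {"type": "SYN_FLOOD", "src_ip": "192.168.1.17", "ts": 0, "dport": 80},
    {"type": "SYN_FLOOD", "src_ip": "192.168.1.42", "ts": 3600, "dport": 22},
    {"type": "SYN_FLOOD", "src_ip": "192.168.1.99", "ts": 7200, "dport": 443},
    {"type": "SYN_FLOOD", "src_ip": "10.0.0.5", "ts": 86400, "dport": 8080},
]

# Previously distinct alarms collapse into a few generalized classes with counts.
summary = Counter(generalize(a) for a in log)
```

Here three of the four alarms merge into one generalized alarm, which is the kind of short summary an analyst could turn into a filtering rule.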
Isolated application of data mining techniques can be a dangerous activity, leading to the discovery of meaningless or misleading patterns
Data mining without a proper understanding of the application domain should be avoided
The validation step is extremely important