Application of Association Rules in Intrusion Detection
Xiangyang Li, Dept. of Industrial Engineering, ASU


Association rules (1)
Objective: learning rules and associations from transaction data, e.g.:
Record  Items
1       soda, milk
2       detergent, soda, cleanser
3       cleanser, soda
...
Correlation among items; 2-way, 3-way, ..., k-way rules, e.g., i, j ⇒ k.
A famous association rule: diapers ⇒ beers.

Data
Shell command records:
time  hostname  command  arg1  arg2
am    pascal    mkdir    dir1
am    pascal    cd       dir1
am    pascal    vi       tex

Network connection records:
time  duration  service  src_bytes  dst_bytes  flag
...   ...       telnet   ...        ...        SF
...   ...       ftp      ...        ...        SF
...   ...       smtp     ...        ...        SF
...   ...       telnet   ...        ...        SF

Association rules (2)
Why? Program executions and user activities exhibit frequent correlations among system features.
Definition: let A be a set of attributes, and I be a set of values on A, called items. Any subset of I is called an itemset; the number of items in an itemset is its length. Let D be a database (a set of transactions). An association rule is the expression X ⇒ Y [confidence, support], where X and Y are itemsets and X ∩ Y = ∅. support is the percentage of transactions (records) in D that contain X ∪ Y, and confidence is the percentage of transactions that contain X and also contain Y, i.e., confidence = support(X ∪ Y) / support(X).
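To make the definitions concrete, here is a minimal Python sketch (not from the original slides) that computes support and confidence for a candidate rule X ⇒ Y over a small transaction list; the toy transactions echo the shopping example above.

```python
def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)

def rule_stats(X, Y, transactions):
    """Return (support, confidence) of the rule X => Y, with X and Y disjoint itemsets."""
    assert not (X & Y), "X and Y must be disjoint"
    sup_xy = support(X | Y, transactions)
    sup_x = support(X, transactions)
    confidence = sup_xy / sup_x if sup_x > 0 else 0.0
    return sup_xy, confidence

# Toy transactions echoing the slide's shopping example.
transactions = [
    {"soda", "milk"},
    {"detergent", "soda", "cleanser"},
    {"cleanser", "soda"},
]
print(rule_stats({"soda"}, {"cleanser"}, transactions))   # -> (0.666..., 0.666...)
```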

Association rules (3)
For example, an association rule from the shell command history file (which is a stream of commands and their arguments) of a user is trn ⇒ rec.humor [0.3, 0.1], which indicates that 30% of the time when the user invokes trn, he or she is reading the news in rec.humor, and reading this newsgroup accounts for 10% of the activities recorded in his or her command history file.

Association rules (4)
Control on the number of produced rules: for example, with 100 different items the number of possible 2-way rules is 100 * 99 ≈ 10K. Minimum_support and minimum_confidence requirements are used to find rules only on frequent itemsets. Any subset of a frequent itemset must also be a frequent itemset. The algorithm starts by finding the frequent itemsets of length 1, then iteratively computes the frequent itemsets of length k+1 from those of length k.
Rule generation: if confidence = support(X) / support(subset of X) >= minimum_confidence, then output the rule (subset of X) ⇒ (X − subset of X) with support = support(X).
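The level-wise search and the rule generation step can be sketched as follows (an illustrative Apriori-style implementation, not the code behind the slides); the function name and parameters are placeholders.

```python
from itertools import combinations

def apriori(transactions, min_support, min_confidence):
    n = len(transactions)

    def sup(itemset):
        return sum(1 for t in transactions if itemset <= t) / n

    # frequent itemsets of length 1
    singletons = {frozenset([item]) for t in transactions for item in t}
    level = {s for s in singletons if sup(s) >= min_support}
    frequent = set(level)
    k = 1

    # frequent itemsets of length k+1 from those of length k
    while level:
        k += 1
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        level = {c for c in candidates if sup(c) >= min_support}
        frequent |= level

    # rule generation: every non-empty proper subset of a frequent itemset
    # is tried as the rule's left-hand side
    rules = []
    for itemset in frequent:
        for r in range(1, len(itemset)):
            for lhs in map(frozenset, combinations(itemset, r)):
                confidence = sup(itemset) / sup(lhs)
                if confidence >= min_confidence:
                    rules.append((set(lhs), set(itemset - lhs), sup(itemset), confidence))
    return rules
```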

Frequent episodes (1)
Why? There is a need to study the frequent sequential patterns of events.
The problem of finding frequent episodes is based on minimal occurrences. Briefly, given an event database where each transaction is associated with a timestamp, an interval [t1, t2] is the sequence of transactions that starts at timestamp t1 and ends at t2; the width of the interval is t2 − t1. Given an itemset A in D, an interval is a minimal occurrence of A if it contains A and none of its proper sub-intervals contains A.
A frequent episode rule is the expression X, Y ⇒ Z [confidence, support, window], where X, Y and Z are itemsets. support is the percentage of minimal occurrences of X ∪ Y ∪ Z (that is, the ratio between the number of such occurrences and the number of records in D), and confidence is the percentage of minimal occurrences that contain X ∪ Y and also contain Z.

Frequent episodes (2)
Here the width of each occurrence must be less than window. A serial episode rule has the additional constraint that X, Y and Z must occur in transactions in partial time order, i.e., Z follows Y and Y follows X. The implementation of the frequent episodes algorithm utilizes the data structures and library functions of the association rules algorithm; instead of finding correlations across attributes, we look for correlations across records. A temporal join function that considers minimal occurrences is used to create the interval vector of a candidate length-k itemset from the interval vectors of two length k−1 frequent itemsets.
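As a rough illustration of minimal occurrences, the sketch below greedily finds, for each possible starting record, the earliest completion of a serial episode within the window and then discards intervals that properly contain another occurrence. This is a simplification for illustration, not the temporal-join implementation described above.

```python
def minimal_occurrences(episode, records, window):
    """episode: list of itemsets (sets of attribute-value pairs) in serial order.
    records: list of (timestamp, set_of_attribute_value_pairs), sorted by time.
    Returns (t_start, t_end) intervals no wider than `window` that contain the
    episode's itemsets in order and contain no narrower occurrence inside them."""
    occurrences = []
    for i, (t_start, rec) in enumerate(records):
        if not (episode[0] <= rec):
            continue
        t_end, j = t_start, i
        ok = True
        for element in episode[1:]:
            # advance to the next (strictly later) record containing this element
            j += 1
            while j < len(records) and not (element <= records[j][1]):
                j += 1
            if j == len(records):
                ok = False
                break
            t_end = records[j][0]
        if ok and t_end - t_start <= window:
            occurrences.append((t_start, t_end))
    # keep only minimal intervals: drop any interval that contains another occurrence
    minimal = [iv for iv in occurrences
               if not any(o != iv and o[0] >= iv[0] and o[1] <= iv[1] for o in occurrences)]
    return minimal
```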

System knowledge (1)
The basic algorithms do not consider any domain knowledge and as a result they can generate many "irrelevant" rules. For example, the basic association rules algorithm may generate rules such as src_bytes=200 ⇒ flag=SF.
Axis attributes: intuitively, the axis attribute(s) is the essential attribute(s) of a record (transaction). We consider correlations among non-axis attributes as not interesting. During candidate generation, an itemset must contain value(s) of the axis attribute(s). Since the most important information of a connection is its service, we use it as the axis attribute. The resulting association rules then describe only the patterns related to the services of the connections.

System knowledge (1) - continued
It is even more important to use the axis attribute(s) to constrain item generation for frequent episodes. The basic algorithm can generate serial episode rules that contain "non-essential" attribute values, for example src_bytes=200, src_bytes=200 ⇒ dst_bytes=300, src_bytes=200. Compared with the association rules, the total number of serial rules is large, and so is the number of such useless rules.
Instead, first find the frequent associations using the axis attribute(s) and then generate the frequent serial patterns from these associations. An example of such a rule is (service=smtp, src_bytes=200, dst_bytes=300, flag=SF), (service=telnet, flag=SF) ⇒ (service=http, src_bytes=200). Here we in effect have combined the associations (among attributes) and the sequential patterns (among the records) into a single rule. This rule formalism provides rich and useful information.
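A hedged sketch of the axis-attribute constraint on candidate generation; the choice of `service` as the axis attribute follows the slide, while the helper names and the tuple-of-pairs representation are illustrative.

```python
AXIS_ATTRIBUTE = "service"

def respects_axis(itemset, axis=AXIS_ATTRIBUTE):
    """An itemset is a valid candidate only if it mentions the axis attribute."""
    return any(attr == axis for attr, _ in itemset)

def filter_candidates(candidates, axis=AXIS_ATTRIBUTE):
    return [c for c in candidates if respects_axis(c, axis)]

# Example: the uninteresting candidate (src_bytes=200, flag=SF) is dropped,
# while (service=smtp, src_bytes=200) is kept.
cands = [frozenset({("src_bytes", 200), ("flag", "SF")}),
         frozenset({("service", "smtp"), ("src_bytes", 200)})]
print(filter_candidates(cands))
```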

System knowledge (2)
Reference attributes: some essential attributes can serve as references for other attributes. They act as a "subject", and the other attributes describe "actions" that refer to the same "subject". When a reference attribute is used, the frequent episodes algorithm ensures that, within each episode's minimal occurrences, the records covered by its constituent itemsets have the same reference attribute value.
Example: a "syn flood" attack, where the attacker sends a lot of "half-open" connections (i.e., flag is S0) to a port (e.g., http) of the same victim dst_host (the reference attribute):
(service=http, flag=S0), (service=http, flag=S0) ⇒ (service=http, flag=S0) [0.93, 0.03, 2]
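A minimal sketch of the reference-attribute check, assuming connection records are represented as dictionaries and dst_host is the reference attribute as in the syn-flood example above.

```python
def same_reference(records_in_occurrence, ref_attr="dst_host"):
    """True if every record in one episode occurrence shares the reference value."""
    values = {rec.get(ref_attr) for rec in records_in_occurrence}
    return len(values) == 1

# Example: three half-open http connections to one host satisfy the constraint.
occurrence = [
    {"service": "http", "flag": "S0", "dst_host": "victim"},
    {"service": "http", "flag": "S0", "dst_host": "victim"},
    {"service": "http", "flag": "S0", "dst_host": "victim"},
]
print(same_reference(occurrence))  # True
```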

System knowledge (3)
Low frequency patterns: sometimes it is important to discover the low frequency patterns. In daily network traffic, some services, for example gopher, account for very low percentages, yet their patterns still need to be included in the network traffic profile (so that there are representative patterns for each supported service). But if a single very low support value is used, an unnecessarily large number of patterns related to the high frequency services, for example smtp, are produced.

System knowledge (3) - continued
Level-wise approximate mining procedure: the idea is to first find the episodes related to high frequency axis attribute values, then iteratively lower the support threshold to find the episodes related to the low frequency axis values, restricting the participation of the "old" axis values that already have output episodes. More specifically, when an episode is generated, it must contain at least one "new" (low frequency) axis value. The procedure terminates when a very low support value is reached; in practice, this can be the lowest frequency of all axis values. Examples:
1) (service=smtp, src_bytes=200), (service=smtp, src_bytes=200) ⇒ (service=smtp, dst_bytes=300)
2) (service=smtp, src_bytes=200), (service=http, src_bytes=200) ⇒ (service=smtp, src_bytes=300)
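A sketch of the level-wise approximate mining loop. The mining routine `mine_episodes`, the `axis_values` attribute on returned episodes, and the way "old" axis values are tracked are all assumptions made for illustration, not the slides' implementation.

```python
def level_wise_mining(data, axis_value_frequencies, mine_episodes,
                      initial_support, decay=0.5):
    """Iteratively lower min_support; at each level, keep only episodes that
    mention at least one axis value not yet covered by earlier output."""
    covered = set()          # axis values that already have output episodes
    output = []
    support = initial_support
    floor = min(axis_value_frequencies.values())   # lowest axis-value frequency
    while support >= floor:
        for episode in mine_episodes(data, support):
            new_values = episode.axis_values - covered
            if new_values:                          # must contain a "new" axis value
                output.append(episode)
        # axis values frequent at this level become "old" for later levels
        covered |= {v for v, f in axis_value_frequencies.items() if f >= support}
        support *= decay
    return output
```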

System knowledge (3) - continued
Note that for a high frequency axis value, this method in effect omits its very low frequency episodes (which would only be generated in the runs with low support values) because they are not as interesting (representative). Hence this procedure is "approximate" mining. All the old (high frequency) axis values are still allowed to form episodes with the new axis values, because it is important to capture the sequential context of the new axis values. For example, although used infrequently, auth normally co-occurs with other services such as smtp and login; it is therefore imperative to include these high frequency services in the episode rules about auth.

Application (1): Anomaly detection
The patterns discovered from the audit data of a protected target (e.g., a network, system program, or user) correspond to the target's behavior. While gathering audit data about the target, the patterns from each new audit data set are computed, and the new rules are merged into the existing aggregate rule set. The added new rules represent (new) variations of the normal behavior. When the aggregate rule set stabilizes, i.e., no new rules from the new audit data can be added, the data gathering can stop, since the aggregate audit data set has covered sufficient variations of the normal behavior.

Application (1) - continued
Merge process (new rule set into the aggregate rule set). Two rules are merged when:
1) their right and left sides are exactly the same, or can be combined;
2) their support and confidence values are close, i.e., within a user-defined threshold.
A match_count can be used to control the final rule output. This approach of merging rules is based on the fact that even the same type of behavior will show slight differences across audit data sets, so we should not expect perfect (exact) matches of the mined patterns; instead, similar patterns need to be combined into more generalized ones. The discovered patterns from (the extensively gathered) audit data can be used directly for anomaly detection.
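A hedged sketch of the merge step; the rule representation, the tolerance thresholds, and the use of match_count as a simple counter are assumptions, not the original design.

```python
def merge(aggregate, new_rules, sup_tol=0.05, conf_tol=0.05):
    """aggregate, new_rules: lists of dicts with keys lhs, rhs, support, confidence, match_count."""
    added = 0
    for rule in new_rules:
        for old in aggregate:
            if (old["lhs"] == rule["lhs"] and old["rhs"] == rule["rhs"]
                    and abs(old["support"] - rule["support"]) <= sup_tol
                    and abs(old["confidence"] - rule["confidence"]) <= conf_tol):
                old["match_count"] += 1           # same behavior seen again
                break
        else:                                     # no similar rule found: a new variation
            aggregate.append(dict(rule, match_count=1))
            added += 1
    return added                                  # 0 added over several runs => profile is stable
```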

Application (2): Signature recognition
"Intrusion only" patterns: given the normal patterns and the patterns from an intrusion dataset, with the chosen axis attribute(s), reference attribute(s), support, confidence, and window requirements, intrusion-only patterns can be identified:
1) For each pattern from the intrusion dataset, calculate a difference score against each normal pattern, and keep the lowest score as the "intrusion" score for this pattern.
2) Output all patterns that have non-zero "intrusion" scores, or a user-specified percentage of the patterns with the highest "intrusion" scores.
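The two steps above can be sketched as follows, assuming a difference function `diff` between encoded patterns (see the encoding slides later); the function and parameter names are illustrative.

```python
def intrusion_only(intrusion_patterns, normal_patterns, diff, top_percent=None):
    scored = []
    for p in intrusion_patterns:
        # the intrusion score is the distance to the *closest* normal pattern
        score = min(diff(p, q) for q in normal_patterns)
        scored.append((score, p))
    if top_percent is None:
        return [p for score, p in scored if score > 0]     # all non-zero scores
    scored.sort(key=lambda sp: sp[0], reverse=True)
    keep = max(1, int(len(scored) * top_percent))
    return [p for _, p in scored[:keep]]
```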

Application (3): Feature selection
An important use of the mined patterns is as the basis for feature selection. When the axis attribute is used as the class label attribute, the features (attributes) appearing in the association rules should be included in the classification models. The time window information and the features in the frequent episodes suggest that their statistical measures, e.g., the average, the count, etc., should also be considered as additional features. An example: a large number of "rejected" network connections in a very short time span is strong evidence of some intrusions.

Application (3) - continued
Each of the intrusion-only patterns is used for constructing additional features:
1) When the same value of an attribute is repeated several times in a frequent episode rule, it suggests a corresponding count feature.
2) When an attribute (with different values) is repeated several times in the rule, add a corresponding average feature.
3) When the same value appears in all the itemsets of an episode, add a feature for the percentage of records that share that value.
These statistical and temporal features are used to construct classifiers with rule induction algorithms such as RIPPER and are claimed to improve the classification greatly.
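A minimal sketch of the kind of count/average/rate features this suggests, computed over a time window of connection records; the feature names and the 2-second window are assumptions loosely modeled on the traffic features mentioned later in the slides.

```python
def traffic_features(connections, i, window=2.0):
    """connections: list of dicts sorted by 'time'; i: index of the current connection."""
    cur = connections[i]
    recent = [c for c in connections[:i] if cur["time"] - c["time"] <= window]
    same_service = [c for c in recent if c["service"] == cur["service"]]
    rejected = [c for c in recent if c["flag"] in ("REJ", "S0")]
    return {
        "count": len(recent),                     # connections in the past `window` seconds
        "same_srv_count": len(same_service),      # ... to the same service
        "rej_rate": len(rejected) / len(recent) if recent else 0.0,
        "avg_src_bytes": (sum(c["src_bytes"] for c in recent) / len(recent)) if recent else 0.0,
    }
```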

Window value
Window size: experience shows that, when plotting the number of patterns generated with different window values w, the count tends to stabilize after an initial jump. The smallest value in the stable region is called w0. Experiments showed that the plot of the accuracy of classifiers that use the temporal and statistical features computed with different w also stabilizes for w >= w0. Intuitively, the requirement for a good window size is that its set of sequential patterns is stable, that is, sufficient patterns are captured and noise is small.
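A tiny sketch of choosing w0 by this stabilization criterion; the relative-change tolerance and the helper names are assumptions.

```python
def pick_w0(window_values, count_patterns, tol=0.05):
    """window_values: increasing candidate window sizes;
    count_patterns(w): number of episode rules mined with window w."""
    counts = [count_patterns(w) for w in window_values]
    for i in range(1, len(counts)):
        # first point where the pattern count stops changing by more than `tol`
        if counts[i - 1] > 0 and abs(counts[i] - counts[i - 1]) / counts[i - 1] <= tol:
            return window_values[i - 1]
    return window_values[-1]
```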

Encoding of association rules (1)
"Complete and ordered" associations: the records have n attributes; an association (A1=v1, A2=v2, ..., Ak=vk) is "complete and ordered" if k=n and the attributes A1, A2, ..., Ak are in a user-defined decreasing "order of importance". It can then be recorded as a number e_v1 e_v2 ... e_vn, where e_vi is 0 if vi is null, or otherwise the order of appearance of vi among all the values of Ai processed so far in the encoding process. An episode rule X, Y ⇒ Z is mapped to a 3-d data point (encoding of X, encoding of Y, encoding of Z). For pattern comparison, this 3-d encoding is converted into a 1-d value by interleaving digits as x1 z1 y1 x2 z2 y2 ... xn zn yn. This representation preserves the "order of importance" of the attributes and accounts for the rule structure of an episode.

Encoding of association rules (2)
Two episodes that have a similar first "body" (i.e., X) and "head" (i.e., Z) are mapped to closer numbers. Example:
"syn flood": (service=http, flag=S0), (service=http, flag=S0) ⇒ (service=http, flag=S0)
"normal": (flag=SF, service=http), (flag=SF, service=icmp_echo) ⇒ (flag=SF, service=http)
A difference score is then defined from the absolute differences in corresponding digits of the encodings.
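A hedged sketch of the encoding and comparison. The digit-per-attribute encoding and the x1 z1 y1 x2 z2 y2 ... interleaving follow the slides; the attribute order and the use of a plain sum of digit differences as the score are assumptions.

```python
ATTR_ORDER = ["service", "flag", "src_bytes", "dst_bytes"]   # assumed "order of importance"

def encode_itemset(itemset, seen):
    """Map an itemset (dict attr -> value) to one digit per attribute: 0 for null,
    otherwise the order in which that value was first seen for that attribute."""
    digits = []
    for attr in ATTR_ORDER:
        v = itemset.get(attr)
        if v is None:
            digits.append(0)
        else:
            codes = seen.setdefault(attr, {})
            digits.append(codes.setdefault(v, len(codes) + 1))
    return digits

def encode_episode(X, Y, Z, seen):
    x, y, z = (encode_itemset(s, seen) for s in (X, Y, Z))
    # interleave as x1 z1 y1 x2 z2 y2 ... to keep important attributes in front
    return [d for triple in zip(x, z, y) for d in triple]

def diff(enc_a, enc_b):
    """Difference score: sum of absolute differences of corresponding digits (assumed aggregation)."""
    return sum(abs(a - b) for a, b in zip(enc_a, enc_b))
```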

Real-time detection (1)
Low cost "necessary" conditions. Features fall into three "cost" levels: level 1 features can be computed from the first packet; level 2 features can be computed at the end of the connection, using only information about the current connection; level 3 features can be computed at the end of the connection, but require access to data of other prior connections.
Ideally, a few tests involving the low cost features eliminate the majority of the rules that would otherwise need to be checked, thus eliminating the need to compute some high cost features. An example: port_scan ⇒ src_bytes=0, i.e., src_bytes=0 is a necessary association for the port scan intrusion. If this condition fails, the features of the rules for this intrusion need not be computed, unless they are needed for other rules.

Real-time detection (2)
Rule filtering:
1) For n RIPPER rules: an n-bit "remaining" vector indicates which rules still need to be checked; initially all bits are 1's.
2) Each rule has an "invalidating" n-bit vector, where only the bit corresponding to that rule is 0 and all other bits are 1's.
3) Each high cost feature has a "computing" n-bit vector, where only the bits corresponding to the rules that require this feature are 1's.
When examining a packet or a connection, if a "necessary" condition of an intrusion is violated, the corresponding "invalidating" bit vectors of that intrusion's RIPPER rules are ANDed with the "remaining" vector and with all the "computing" vectors of the high cost features. After all necessary conditions are checked, only the features whose "computing" vectors are still non-zero need to be computed.
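A small sketch of this bit-vector bookkeeping, using Python integers as n-bit vectors (bit i corresponds to RIPPER rule i); the rule indices and feature names are made up for illustration.

```python
n_rules = 4
remaining = (1 << n_rules) - 1                 # all rules still need checking

# each rule's "invalidating" vector: its own bit is 0, all others are 1
invalidating = [((1 << n_rules) - 1) ^ (1 << i) for i in range(n_rules)]

# each high-cost feature's "computing" vector: bits of the rules that need it
computing = {"num_failed_logins": 0b0110,      # needed by rules 1 and 2 (assumed)
             "same_srv_rate":     0b1000}      # needed by rule 3 (assumed)

def violate_necessary_condition(rule_ids):
    """Called when a low-cost necessary condition of an intrusion fails:
    drop that intrusion's rules from `remaining` and from every computing vector."""
    global remaining
    for i in rule_ids:
        remaining &= invalidating[i]
        for feat in computing:
            computing[feat] &= invalidating[i]

violate_necessary_condition([1, 2])
# after all necessary conditions are checked, compute only features whose
# computing vector still has a remaining rule set
needed = [f for f, vec in computing.items() if vec & remaining]
print(needed)                                  # ['same_srv_rate']
```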

Experiments
This data mining method has been applied to different data sets. In its application to the 1998 DARPA Intrusion Detection Evaluation Program, three models were constructed from different feature sets of the data:
Content model - for suspicious behavior in the data portion of connections, based on domain knowledge rather than association rule analysis.
Traffic model - time-based "traffic" features of the connection records in the past 2 seconds, including "same host" and "same service" features.
Host traffic model - a mirror set of the "traffic" features computed over a "connection" window of 100 connections, targeting "slow" probing attacks.
The three base models and a meta-level classifier are built using RIPPER, a rule induction algorithm; very good performance is claimed.

Summary
Contributions: an automatic feature construction method based on frequent patterns; a simple and useful pattern encoding and comparison technique to assist model construction; a strategy for minimizing the cost of model execution.
Disadvantages: it only handles nominal variables in the data, and a lot of domain knowledge is needed to achieve good performance.

Questions?