Ensemble-based Adaptive Intrusion Detection
Wei Fan, IBM T.J. Watson Research
Salvatore J. Stolfo, Columbia University
Data Mining for Intrusion Detection
Connection Records → Feature Construction → Training Data → Inductive Learner → Intrusion Detection Model
Label Existing Connections, e.g., (telnet, 10, 3, ...), (ftp, 10, 20, ...)
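The pipeline above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions. The slide does not say what the numbers in (telnet, 10, 3, ...) stand for, so the record fields (service, duration, bytes) and the labels are hypothetical. The learner is a toy OneR-style majority rule standing in for an inductive learner such as RIPPER. It is not RIPPER itself.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Tuple

# Hypothetical labeled connection record: (service, duration, bytes, label).
# The real feature set is not given on the slide; these fields are illustrative.
Record = Tuple[str, int, int, str]
Features = Tuple[str, int, int]


def construct_features(record: Record) -> Tuple[Features, str]:
    """Feature construction: split a labeled record into (feature vector, label)."""
    service, duration, nbytes, label = record
    return (service, duration, nbytes), label


def train_one_rule(data: List[Tuple[Features, str]]) -> Dict[str, str]:
    """Toy inductive learner (OneR-style stand-in, not RIPPER):
    predict the majority label seen for each service value."""
    counts: Dict[str, Counter] = defaultdict(Counter)
    for features, label in data:
        counts[features[0]][label] += 1
    return {service: c.most_common(1)[0][0] for service, c in counts.items()}


def label_connection(model: Dict[str, str], features: Features) -> str:
    # Services never seen during training fall back to "unknown".
    return model.get(features[0], "unknown")


raw = [("telnet", 10, 3, "normal"), ("telnet", 12, 5, "normal"),
       ("ftp", 10, 20, "attack_A"), ("ftp", 9, 18, "attack_A"),
       ("ftp", 11, 2, "normal")]
training = [construct_features(r) for r in raw]
model = train_one_rule(training)
print(label_connection(model, ("telnet", 8, 4)))  # normal
print(label_connection(model, ("ftp", 10, 22)))   # attack_A
```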
Some interesting requirements
• New types of intrusions are constantly invented by hackers. The most recent example is the coordinated attacks on many e-business websites in …
• Hackers tend to use new types of intrusions that an intrusion detection system is unaware of or is weak at detecting.
• Data mining for intrusion detection is a very data-intensive process: very large data, evolving patterns, and real-time detection.
Question
• When new types of intrusions are invented, can we quickly adapt our existing model to detect them before they cause more damage?
• Without a solution, the new attack will cause significant damage. For this kind of problem, an imperfect solution is better than no solution at all.
Naive Approach: Complete Retraining
Existing Training Data + New Data → Merged Training Data → Inductive Learner → NEW Intrusion Detection Model
Problem with the Naive Approach
• Since the merged data (existing plus new) is very large, computing a detection model takes a long time.
• By the time the model is constructed, the new attack will probably already have done significant damage to our system.
New Approach
New Data → Learner → NEW Model; Existing Model + NEW Model → Combined Model
Key point: we compute the new model only from the data on the new types of intrusions.
How do we label connections?
A new connection → Existing Model:
• connection type recognized: normal or a previously known intrusion type;
• connection type unrecognized → NEW Model: normal or a new intrusion type.
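The labeling flow can be written as a small combinator. This sketch assumes that both models are plain functions from a connection to a label string. It also assumes the existing model signals an unrecognized connection with the label "anomaly". The toy models and service names in the usage lines are hypothetical.

```python
from typing import Callable, Dict

Connection = Dict[str, str]
Classifier = Callable[[Connection], str]

ANOMALY = "anomaly"  # assumed label for connections the existing model cannot recognize


def combine(existing: Classifier, new: Classifier) -> Classifier:
    """Route each connection through the existing model first; only the
    connections it cannot recognize are passed on to the new model."""
    def combined(connection: Connection) -> str:
        prediction = existing(connection)
        if prediction == ANOMALY:
            return new(connection)  # normal or a new intrusion type
        return prediction           # normal or a previously known intrusion type
    return combined


# Hypothetical toy models keyed on the service field.
def existing_model(c: Connection) -> str:
    return {"telnet": "normal", "ftp": "attack_A"}.get(c["service"], ANOMALY)


def new_model(c: Connection) -> str:
    return "new_attack_B" if c["service"] == "smtp" else "normal"


model = combine(existing_model, new_model)
print(model({"service": "ftp"}))   # attack_A
print(model({"service": "smtp"}))  # new_attack_B
print(model({"service": "http"}))  # normal
```

The new model sees only the connections the existing model cannot place. It can therefore be trained on the small new dataset alone.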
Basic Idea
• The existing model is built to identify THREE classes: normal, known types of intrusions, and anomaly (a connection that is neither normal nor any known type of intrusion).
• For anomaly detection, we use the artificial anomaly generation method (Fan et al., ICDM 2001).
Anomaly Detection
• Generate "artificial anomalies" from the training data, similar to "near misses".
• Artificial anomalies are data points that are different from the training data.
• The algorithm concentrates on feature values that are infrequent in the training data.
• Distribution-based Artificial Anomaly (Fan et al., ICDM 2001)
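The generation step can be illustrated with a simplified sketch in the spirit of distribution-based artificial anomalies. This is one reading of the idea and not necessarily the exact procedure in Fan et al. (ICDM 2001). It assumes each connection is a tuple of discrete feature values. The rarer a feature value is, the more artificial points are built around it. The function name and the toy data are hypothetical.

```python
import random
from collections import Counter
from typing import List, Tuple

Row = Tuple[str, ...]  # one connection as a tuple of discrete feature values


def artificial_anomalies(data: List[Row], seed: int = 0) -> List[Row]:
    """Distribution-based artificial anomalies (simplified sketch).

    For each feature f, c_max is the count of its most frequent value.
    Every rarer value v (count c_v) gets c_max - c_v attempts: copy a random
    row whose f-value is not v, overwrite f with v, and keep the result only
    if it does not already occur in the training data.
    """
    if not data:
        return []
    rng = random.Random(seed)
    seen = set(data)
    anomalies: List[Row] = []
    for f in range(len(data[0])):
        counts = Counter(row[f] for row in data)
        c_max = max(counts.values())
        for v, c_v in counts.items():
            if c_v == c_max:
                continue  # the most frequent values get no extra points
            donors = [row for row in data if row[f] != v]
            for _ in range(c_max - c_v):
                donor = rng.choice(donors)
                candidate = donor[:f] + (v,) + donor[f + 1:]
                if candidate not in seen:  # must differ from all training data
                    anomalies.append(candidate)
    return anomalies


training = [("telnet", "SF"), ("telnet", "SF"), ("telnet", "REJ"), ("ftp", "SF")]
print(artificial_anomalies(training))
```

In the toy data, the only unseen combination the procedure can produce is ("ftp", "REJ"). Every generated anomaly therefore equals it, and how many appear depends on the random draws.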
Four Configurations
• H1(x): existing model.
• H2(x): new model.
• The configurations differ in how H2(x) is computed, how H1(x) and H2(x) are combined, and how a connection is processed and classified.
Configuration I
Configuration II
Configuration III
Configuration IV
Experiment
• 1998 DARPA Intrusion Detection Evaluation Dataset
• 22 different types of intrusions.
Experiment
• Intrusion types are introduced into the training data in a sequence, simulating new intrusions being invented and launched by hackers. With 22 intrusion types there are 22! possible sequences; we randomly chose 3 unique sequences.
• The results are averaged.
• Learner: RIPPER with unordered rulesets.
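The size of the sequence space and the random choice of three orderings can be reproduced as follows. This sketch assumes intrusion types are numbered 0 to 21. The function name `sample_sequences` and the seed are hypothetical.

```python
import math
import random
from typing import List

NUM_TYPES = 22  # intrusion types in the 1998 DARPA data, per the slide

print(math.factorial(NUM_TYPES))  # 1124000727777607680000 possible sequences


def sample_sequences(n_types: int, k: int, seed: int = 0) -> List[List[int]]:
    """Draw k distinct random orderings of intrusion types 0..n_types-1.
    (k must not exceed n_types!, or the loop would never finish.)"""
    rng = random.Random(seed)
    sequences: List[List[int]] = []
    while len(sequences) < k:
        order = list(range(n_types))
        rng.shuffle(order)
        if order not in sequences:  # keep only unique sequences
            sequences.append(order)
    return sequences


sequences = sample_sequences(NUM_TYPES, 3)
print(len(sequences))  # 3
```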
3 Unique Sequences
Measurements
• All results are reported on the new intrusion types.
• Precision: if I catch a potential thief, what is the probability that it is a real thief?
• Recall: what is the probability that real thieves are caught?
• Anomaly detection rate: the fraction of new intrusions classified as anomaly.
• Other detection rate: the fraction of new intrusions classified as other types of intrusions.
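The four measurements can be computed from true and predicted labels as follows. This minimal sketch makes three assumptions: one new intrusion type is evaluated at a time, the labels "normal" and "anomaly" are used literally, and "other" means any intrusion label other than the new type. The sample labels are invented.

```python
from typing import Dict, List


def _rate(hits: int, total: int) -> float:
    return hits / total if total else 0.0


def detection_metrics(true: List[str], pred: List[str], new_type: str) -> Dict[str, float]:
    """Precision, recall, anomaly and other detection rates for one new intrusion type."""
    # True labels of the connections the model flagged as the new type (for precision).
    flagged = [t for t, p in zip(true, pred) if p == new_type]
    # Predictions made for the connections that really are the new type.
    on_new = [p for t, p in zip(true, pred) if t == new_type]
    return {
        "precision": _rate(sum(t == new_type for t in flagged), len(flagged)),
        "recall": _rate(sum(p == new_type for p in on_new), len(on_new)),
        "anomaly_rate": _rate(sum(p == "anomaly" for p in on_new), len(on_new)),
        # "Other": labeled as some other intrusion type (not normal, not anomaly).
        "other_rate": _rate(sum(p not in (new_type, "anomaly", "normal") for p in on_new), len(on_new)),
    }


true = ["new_B", "new_B", "new_B", "new_B", "normal", "normal"]
pred = ["new_B", "new_B", "anomaly", "attack_A", "new_B", "normal"]
print(detection_metrics(true, pred, "new_B"))
# {'precision': 0.6666666666666666, 'recall': 0.5, 'anomaly_rate': 0.25, 'other_rate': 0.25}
```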
Precision Results
Recall Results
Anomaly Detection Rate
Other Detection Rate Results
Summary of results
• The most accurate is Configuration I. In it, the new model is trained on normal data and the new intrusion type, and every connection the old model predicts as normal or anomaly is examined by the new model.
• Reason: the existing model's precision in detecting normal connections influences the combined model's accuracy. The new data is limited in amount, and so are the artificial anomalies generated from it.
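Configuration I, as described in the summary, can be sketched as a training-set filter plus a routing rule. This sketch assumes that models are plain functions returning label strings and that the existing model uses the labels "normal" and "anomaly". All model, service, and attack names are hypothetical.

```python
from typing import Callable, Dict, List, Tuple

Connection = Dict[str, str]
Classifier = Callable[[Connection], str]


def config_one_training_set(data: List[Tuple[Connection, str]],
                            new_type: str) -> List[Tuple[Connection, str]]:
    """Training data for the new model H2: normal connections plus the new intrusion type."""
    return [(x, y) for x, y in data if y in ("normal", new_type)]


def config_one(h1: Classifier, h2: Classifier) -> Classifier:
    """H2 re-examines every connection H1 labels normal or anomaly;
    known-intrusion labels from H1 are kept unchanged."""
    def combined(connection: Connection) -> str:
        prediction = h1(connection)
        if prediction in ("normal", "anomaly"):
            return h2(connection)
        return prediction
    return combined


# Hypothetical toy models: H1 wrongly accepts the new smtp attack as normal.
def h1(c: Connection) -> str:
    return {"telnet": "normal", "smtp": "normal", "ftp": "attack_A"}.get(c["service"], "anomaly")


def h2(c: Connection) -> str:
    return "new_attack_B" if c["service"] == "smtp" else "normal"


data = [({"service": "telnet"}, "normal"), ({"service": "ftp"}, "attack_A"),
        ({"service": "smtp"}, "new_attack_B")]
train_h2 = config_one_training_set(data, "new_attack_B")
model = config_one(h1, h2)
print(model({"service": "smtp"}))  # new_attack_B
print(model({"service": "ftp"}))   # attack_A
```

Passing H1's normal predictions through H2 lets this configuration recover new attacks that the existing model mistakes for normal traffic. This illustrates why the existing model's precision on normal connections matters for the combined accuracy.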
Training Efficiency
Related Work (incomplete list)
• Anomaly detection: SRI's IDES uses probability distributions of past activities to measure the abnormality of host events; we measure network events. Forrest et al. use the absence of subsequences to measure abnormality. Lane and Brodley employ a similar approach but use incremental learning to update stored sequences of UNIX shell commands. Ghosh and Schwartzbard use a neural network to learn a profile of normality and a distance function to detect abnormality.
• Generating artificial data: Nigam et al. assign labels to unlabeled data using a classifier trained on labeled data. Chang and Lippmann applied voice transformation techniques to add artificial training talkers and increase variability.
• Multiple classifiers: Asker and Maclin, "Ensembles as a sequence of classifiers".
Summary and Future Work
• Proposed a two-step, two-classifier approach for efficient training and fast model deployment.
• Tested empirically in the intrusion detection domain.
• Future work: test whether the approach works well in other domains.