Supporting a Real-time Distributed Intrusion Detection Application on GATES. Qian Zhu, Liang Chen and Gagan Agrawal, Department of Computer Science and Engineering, The Ohio State University. Euro-Par 2006 Conference, Aug 30th, 2006, Dresden, Germany
Roadmap: Introduction; Anomaly Detection Algorithm; Overview of GATES; Distributed Anomaly Detection Algorithm; Experiments; Conclusion
Introduction: Growing rate of interconnections among computer systems; network security is a challenge. Intrusion prevention techniques: user authentication, avoiding programming errors, information protection. Intrusion detection is needed to protect the system.
Introduction: Intrusion Detection Techniques. Anomaly detection: detect intrusions by determining whether a record deviates from an established normal behavior profile. Misuse detection: detect intrusions by comparing records against patterns of known intrusions.
Anomaly Detection Algorithm: Many anomaly detection algorithms train models over clean data. Drawbacks: clean data is NOT always easy to obtain; training over noisy data has serious consequences; it is difficult to train the model "online" since clean data must be guaranteed.
Anomaly Detection Algorithm: An approach from Eskin (ICML 2000) detects intrusions without clean data. Assumption: the number of normal elements is significantly larger than the number of intrusion elements.
Anomaly Detection Algorithm: Explaining anomalies by a mixture model; modeling probability distributions. D: the data set. M_t: the set of normal data at time t. A_t: the set of anomalous data at time t.
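The formulas on this slide did not survive extraction. As a hedged reconstruction, the mixture model in Eskin (ICML 2000) can be written as below. The symbol \lambda (the prior probability that a record is anomalous) and the names P_{M_t} and P_{A_t} (the distributions fitted to the normal and anomalous sets) are notation assumed here, not taken from the slide.

```latex
% Generative mixture: a record is normal with probability (1 - \lambda), anomalous with probability \lambda
D = (1 - \lambda)\, M + \lambda\, A

% Log-likelihood of the data set at time t under the current split into M_t and A_t
LL_t(D) = |M_t| \log(1 - \lambda) + \sum_{x_i \in M_t} \log P_{M_t}(x_i)
        + |A_t| \log \lambda + \sum_{x_j \in A_t} \log P_{A_t}(x_j)
```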
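Anomaly Detection Algorithm: Detecting anomalies. Each record x_t is tentatively moved from the normal set to the anomalous set and the log-likelihood LL is recomputed. IF (LL_t - LL_{t-1}) > c, x_t is declared an anomaly and stays in the anomalous set; ELSE x_t is treated as normal and returned to the normal set.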
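The detection rule can be sketched in a few lines. This is a minimal one-dimensional illustration under stated assumptions, not the authors' implementation: the normal set is modeled by a single Gaussian refitted on each trial, the anomalous set by a uniform density over an assumed value range, and the names `detect`, `lam`, `c` and `value_range` are hypothetical.

```python
import math


def gaussian_logpdf(x, mean, var):
    """Log density of a 1-D Gaussian."""
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)


def log_likelihood(normal, anomalous, lam, log_p_anom):
    """LL = |M| log(1-lam) + sum log P_M(x) + |A| log(lam) + sum log P_A(x)."""
    mean = sum(normal) / len(normal)
    # Floor the variance so that near-identical points do not give log(0).
    var = max(sum((x - mean) ** 2 for x in normal) / len(normal), 1e-6)
    ll = len(normal) * math.log(1 - lam)
    ll += sum(gaussian_logpdf(x, mean, var) for x in normal)
    ll += len(anomalous) * (math.log(lam) + log_p_anom)
    return ll


def detect(data, lam=0.05, c=1.0, value_range=100.0):
    """Return the records whose move to the anomalous set raises LL by more than c."""
    log_p_anom = -math.log(value_range)  # assumed uniform density for anomalies
    normal = list(data)
    anomalous = []
    ll_prev = log_likelihood(normal, anomalous, lam, log_p_anom)
    for x in data:
        if len(normal) <= 2:
            break  # keep enough points to fit the normal model
        trial_normal = list(normal)
        trial_normal.remove(x)
        trial_anomalous = anomalous + [x]
        ll_new = log_likelihood(trial_normal, trial_anomalous, lam, log_p_anom)
        if ll_new - ll_prev > c:
            # The move explains the data much better: keep x as an anomaly.
            normal, anomalous = trial_normal, trial_anomalous
            ll_prev = ll_new
    return anomalous
```

Refitting the normal model on every trial is what makes the method computation intensive, which motivates the distributed version discussed later in the talk.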
Anomaly Detection Algorithm: Problems. Computation intensive; processing data on one single node; real-time constraint; fast detection; need for self-adaptation.
Overview of GATES: GATES (Grid-based AdapTive Execution on Stream) is a middleware that supports distributed data stream processing. (Figure: layered architecture diagram; surviving labels: Internet, Globus-OGSA, GATES, Applications, Web service.)
Overview of GATES: An application built on GATES is automatically distributed to proper computing nodes and automatically adapts itself to a varying environment, without the developer implementing specific adaptation algorithms or multiple versions. The self-adaptation algorithm aims for the highest level of accuracy while meeting the real-time constraint.
Overview of GATES: Building an application. Break the task down into several sub-tasks so that the sub-tasks form a pipeline. Implement each sub-task in Java. Write an XML configuration file so that the sub-tasks are deployed automatically, i.e. specify how many stages the pipeline has and where the code that processes each sub-task resides. Launch the application by running a Java program (StreamClient.class) provided by GATES.
Distributed Anomaly Detection Algorithm: Network data come in streams. How to maintain an accurate model for the data? Incremental maintenance of a data model over a data stream. The maintenance has to be quick for fast streams and robust for noisy data.
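The slide does not say how the model is maintained incrementally; the talk itself uses Gaussian mixture models fitted with EM. To illustrate the general idea of constant-time updates over a stream, here is a sketch that swaps in a much simpler technique: Welford's online algorithm for the mean and variance of a single Gaussian. The class name `RunningGaussian` is hypothetical.

```python
class RunningGaussian:
    """Single-Gaussian summary of a stream, updated in O(1) per record (Welford)."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self._m2 = 0.0  # running sum of squared deviations from the mean

    def update(self, x):
        """Fold one new record into the summary without storing the stream."""
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self._m2 += delta * (x - self.mean)

    @property
    def variance(self):
        """Population variance of the records seen so far."""
        return self._m2 / self.n if self.n > 0 else 0.0


model = RunningGaussian()
for value in [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]:
    model.update(value)
```

Each update touches only three numbers, so the cost per record stays fixed no matter how long the stream runs.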
Distributed Anomaly Detection Algorithm: Producer: generates data streams. Collector: generates a local model (GMM) and sends it together with sample data to the next stage; performs anomaly detection based on the global model (GMM). Combiner: combines local models into a global model; sends the global model back to the Collector.
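The slide does not state how the Combiner merges local models. One simple possibility, shown here purely as an assumption, is to pool the components of all local GMMs and rescale each component weight by the share of records its Collector observed. The types `Component` and `LocalModel` and the function `combine` are hypothetical names, and the components are one-dimensional for brevity.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Component:
    weight: float  # mixing weight within its own model
    mean: float
    var: float


@dataclass
class LocalModel:
    n_samples: int  # number of records the Collector fitted the model on
    components: List[Component]


def combine(local_models: List[LocalModel]) -> List[Component]:
    """Pool local GMM components into one global mixture whose weights sum to 1."""
    total = sum(m.n_samples for m in local_models)
    if total == 0:
        raise ValueError("no samples in any local model")
    merged = []
    for m in local_models:
        share = m.n_samples / total  # this Collector's share of all records
        for c in m.components:
            merged.append(Component(c.weight * share, c.mean, c.var))
    return merged
```

A real Combiner would probably also merge or prune similar components (for example by re-running EM on the sample data the Collectors forward) so that the global model does not grow with the number of Collectors.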
Distributed Anomaly Detection Algorithm: Adjustable parameters: the sampling rate on the Collector stage; the convergence threshold for the EM algorithm. Fix one of them while letting GATES adjust the other.
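The self-adaptation algorithm inside GATES is not described on these slides. To make the idea of an adjustable parameter concrete, here is a hypothetical feedback rule, not the GATES algorithm: lower the sampling rate when the stage's input queue is longer than a target length, raise it when the queue is shorter. All names and default values are assumptions.

```python
def adjust_sampling_rate(rate, queue_len, target_len, step=0.1, lo=0.01, hi=1.0):
    """Return a new sampling rate nudged toward keeping the queue near target_len."""
    if queue_len > target_len:
        rate *= 1 - step  # falling behind: process fewer records
    elif queue_len < target_len:
        rate *= 1 + step  # spare capacity: process more records for accuracy
    return min(hi, max(lo, rate))  # clamp to the allowed range
```

A lower sampling rate trades accuracy for speed, which is the trade-off the experiments in this talk measure.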
Experiments: Data set (KDD Cup 99). # of records: 335,892; % of normal data: 91%; # of attributes: 41; # of intrusion types: 22. Note: only 10 attributes (7 continuous and 3 categorical) out of 41 were used for the algorithm.
Experiments: Adjustable EM threshold vs. fixed sampling rate. Producing rate varies over 100k/sec, 80k/sec, 50k/sec, 30k/sec and 10k/sec. Sampling rate varies over 40%, 20%, 16%, 13% and 10%.
Experiments: Logistic Regression. Input variables: continuous, categorical or both. Response variables: 0/1 value. Use three categorical attributes for logistic regression. Combine results for final detection.
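As a rough sketch of logistic regression over categorical attributes, the example below one-hot encodes three categorical fields and fits the weights with plain stochastic gradient descent. The toy records, their field values and the function names are all illustrative assumptions; the slides do not say which three attributes were used or how the output was combined with the mixture-model result.

```python
import math


def build_vocab(records):
    """Collect every (field index, value) pair seen in the training records."""
    return sorted({(i, v) for r in records for i, v in enumerate(r)})


def one_hot(record, vocab):
    """Encode a record as 0/1 indicators, one per (field index, value) pair."""
    return [1.0 if record[i] == v else 0.0 for i, v in vocab]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train(records, labels, epochs=500, lr=0.5):
    """Fit weights and bias by stochastic gradient descent on the log-loss."""
    vocab = build_vocab(records)
    features = [one_hot(r, vocab) for r in records]
    w = [0.0] * len(vocab)
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(features, labels):
            p = sigmoid(b + sum(wi * xi for wi, xi in zip(w, x)))
            err = p - y  # gradient of the log-loss with respect to the logit
            b -= lr * err
            w = [wi - lr * err * xi for wi, xi in zip(w, x)]
    return vocab, w, b


def predict_proba(record, vocab, w, b):
    """Probability that the record is an intrusion (label 1)."""
    x = one_hot(record, vocab)
    return sigmoid(b + sum(wi * xi for wi, xi in zip(w, x)))


# Hypothetical toy data: three categorical fields per record, 1 = intrusion.
records = [
    ("tcp", "http", "SF"), ("tcp", "smtp", "SF"), ("udp", "domain_u", "SF"),
    ("icmp", "ecr_i", "SF"), ("tcp", "private", "S0"), ("tcp", "private", "REJ"),
]
labels = [0, 0, 0, 1, 1, 1]
vocab, w, b = train(records, labels)
```

Values never seen in training simply contribute no active indicator, so the prediction for such a record falls back on the bias and the remaining fields.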
Experiments: Detection performance improved by using Logistic Regression.
Experiments: Adjustable sampling rate vs. fixed EM threshold. Producing rate varies over 100k/sec, 80k/sec, 50k/sec, 30k/sec and 10k/sec. EM threshold varies over 0.0001, 0.00005 and 0.00001.
Conclusion: We converted the Eskin anomaly detection algorithm into a distributed version and deployed the application on GATES. GATES can effectively adjust the trade-off between meeting the real-time constraint and achieving the highest accuracy (95.36% vs. 97.63%).
Thank you!