WAC/ISSCI
Automated Anomaly Detection Using Time-Variant Normal Profiling
Jung-Yeop Kim, Utica College
Rex E. Gantenbein, University of Wyoming
Automated intrusion detection
  Intrusion detection determines that a system has been accessed by unauthorized parties
  Detection can be manual or automated
    Manual intrusion detection usually requires viewing of logs or user activity: labor-intensive, long reaction time
    Automated detection relies on continuous monitoring of system behavior within the system itself
Automated intrusion detection
  Automated detection based on one of two mechanisms
    Misuse detection: define a set of “unacceptable” behaviors and raise alert when system behavior matches some member(s) of that set
    Anomaly detection: create a profile of typical (“normal”) user behavior and raise alert when a user attempts an activity that does not match his/her profile
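The two mechanisms can be contrasted in a minimal sketch. The signature set, the user profile, and both function names are hypothetical and chosen only for illustration; real detectors match far richer patterns than exact command strings.

```python
# Hypothetical data for illustration; neither set comes from the slides.
MISUSE_SIGNATURES = {"cat /etc/shadow", "nc -l -p 4444"}
USER_PROFILE = {"ls", "cd src", "make", "vi main.c"}


def misuse_alert(command):
    """Misuse detection: alert when the command matches a known-bad pattern."""
    return command in MISUSE_SIGNATURES


def anomaly_alert(command, profile):
    """Anomaly detection: alert when the command falls outside the profile."""
    return command not in profile
```

A command that appears in neither set, such as a previously unseen download, passes the misuse check but raises an anomaly alert, which is the trade-off between the two mechanisms.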
Defining “normal” behavior
  To determine normal user behavior, we must:
    Identify individual users
    Monitor their behavior over time to create a profile of expected activity
    Define measures for determining deviation from “normal”
      Quantitative: network traffic < 20% of capacity
      Qualitative: file transfer remains within internal network
Defining “normal” behavior
  Using machine intelligence to detect intrusion
    Observe sequences of user commands and save as a profile
    Analyze new user commands using statistical similarity measures to compare with observed sequences
    Classify new behavior as anomalous or consistent with past behavior
  This approach does not deal with “concept drift” – the varying of command sequences over time
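The observe, analyze, classify steps above can be sketched as follows. The slides do not name a specific similarity measure, so this sketch assumes the ratio from Python's standard `difflib.SequenceMatcher`; the `classify` function and the 0.6 threshold are likewise illustrative assumptions.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """Similarity in [0, 1] between two command sequences (lists of tokens)."""
    return SequenceMatcher(None, a, b).ratio()


def classify(new_seq, profile, threshold=0.6):
    """Label new_seq by its best match against the stored profile sequences.

    The threshold is an assumed value, not one taken from the slides.
    """
    best = max(similarity(new_seq, known) for known in profile)
    return "consistent" if best >= threshold else "anomalous"


# Hypothetical observed sequences saved as a user's profile.
profile = [
    ["ls", "cd", "vi", "make"],
    ["ssh", "ls", "scp"],
]
```

Because the profile here is fixed once it is saved, a user whose habits change will increasingly be labeled anomalous, which is the concept-drift limitation noted on the slide.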
Time-variant profiling
  Assumes that a user will change “normal” activities over time
    Profile is dynamically updated as activity changes
    Should detect anomalies with fewer false alerts
  Necessary activities
    Continuous monitoring of activity => profile
    Partitioning of profile data into meaningful clusters
    Characterizing deviation among clusters
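One simple way to picture a dynamically updated profile is an exponentially decayed command-frequency table, sketched below. This is an assumption made for illustration, not the mechanism the slides propose; the class name `DecayingProfile` and the decay factor of 0.9 are hypothetical.

```python
class DecayingProfile:
    """Command-frequency profile in which older observations fade over time."""

    def __init__(self, decay=0.9):
        self.decay = decay    # assumed forgetting factor, 0 < decay < 1
        self.weights = {}     # command -> decayed count

    def update(self, command):
        """Age every existing weight, then credit the command just observed."""
        for known in self.weights:
            self.weights[known] *= self.decay
        self.weights[command] = self.weights.get(command, 0.0) + 1.0

    def score(self, command):
        """Share of the current profile mass held by this command (0 if unseen)."""
        total = sum(self.weights.values())
        if total == 0:
            return 0.0
        return self.weights.get(command, 0.0) / total
```

After a run of `ls` commands followed by a run of `git` commands, `score("git")` exceeds `score("ls")`: the profile has followed the user's change in activity instead of flagging it indefinitely.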
Time-variant profiling
  Representing user commands as tokens in an input stream allows the use of string-matching algorithms to characterize patterns over time
  FLORA (and variations) uses supervised incremental learning to update knowledge about a pattern
    Examines moving windows of token strings to determine pattern matches
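The moving-window idea can be illustrated with a short sketch that remembers only the token n-grams seen inside a fixed-size window and forgets older ones. This is not an implementation of FLORA; the class `WindowedProfile`, the n-gram length, and the window size are all assumptions made for illustration.

```python
from collections import Counter, deque


class WindowedProfile:
    """Remembers the n-grams of command tokens seen in a sliding window."""

    def __init__(self, ngram=2, window=50):
        self.ngram = ngram
        self.window = window              # number of n-grams retained
        self.tail = deque(maxlen=ngram)   # most recent tokens
        self.recent = deque()             # n-grams in arrival order
        self.counts = Counter()           # n-gram -> occurrences in window

    def observe(self, token):
        """Add one token; drop the oldest n-gram once the window is full."""
        self.tail.append(token)
        if len(self.tail) < self.ngram:
            return
        gram = tuple(self.tail)
        self.recent.append(gram)
        self.counts[gram] += 1
        if len(self.recent) > self.window:
            old = self.recent.popleft()
            self.counts[old] -= 1
            if self.counts[old] == 0:
                del self.counts[old]

    def knows(self, gram):
        """True if this n-gram occurred within the current window."""
        return tuple(gram) in self.counts
```

Feeding the tokens `ls`, `cd`, `vi`, `make`, `gcc` into a profile with `ngram=2` and `window=3` leaves `("make", "gcc")` known while `("ls", "cd")` has already been forgotten.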
Time-variant profiling
  Clustering is accomplished through regression analysis
    Defines cluster “value” as a function of multiple independent variables
    Independent variables represent user command sequences from observed behavior
Time-variant profiling
  Detecting deviation uses probabilistic reasoning
    Markov modeling
    Sequence alignment algorithms (bioinformatics)
      Needleman-Wunsch (global alignment)
      Smith-Waterman (local similarity)
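The two alignment algorithms named above can be sketched over command tokens rather than nucleotides. The scoring values (match +1, mismatch -1, gap -1), the `deviates` helper, and its 0.5 threshold are illustrative assumptions; the slides do not specify a scoring scheme or a threshold.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-1):
    """Global alignment score: both sequences are aligned end to end."""
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap
    for i in range(1, rows):
        for j in range(1, cols):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            score[i][j] = max(diag, score[i - 1][j] + gap, score[i][j - 1] + gap)
    return score[-1][-1]


def smith_waterman(a, b, match=1, mismatch=-1, gap=-1):
    """Local alignment score: best-scoring pair of subsequences."""
    best = 0
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Scores are floored at zero so an alignment can restart anywhere.
            cur[j] = max(0, diag, prev[j] + gap, cur[j - 1] + gap)
            best = max(best, cur[j])
        prev = cur
    return best


def deviates(new_seq, profile, threshold=0.5):
    """Flag new_seq when its best local score, normalized by its length,
    falls below an assumed threshold."""
    if not new_seq:
        return False
    best = max(smith_waterman(new_seq, known) for known in profile)
    return best / len(new_seq) < threshold
```

Global alignment penalizes every unmatched token, so it suits comparing whole sessions; local alignment rewards a shared run of commands even when the surrounding activity differs.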
Current project status
  Evaluating functionality of string-matching algorithms
  Developing regression analysis formulae
  Determining how sequencing algorithms can be matched to a threshold value
  Future work includes implementing the system and measuring its effect on overall performance