Effective Anomaly Detection with Scarce Training Data
Presenter: 葉倚任
Authors: W. Robertson, F. Maggi, C. Kruegel, and G. Vigna
NDSS
Outline
– Introduction
– Training Data Scarcity
– Exploiting Global Knowledge
– Evaluation
Properties of Anomaly Detection
– Pros
  – Unknown attacks can be identified automatically, without any a priori knowledge about the application
  – Applications composed of hundreds of components need not be analyzed manually
– Cons
  – Tends to produce a non-negligible number of false positives
  – Depends critically on the quality and quantity of the training data used to construct the models
Motivation
– Web application component invocations are non-uniformly distributed
– For rarely invoked components, it is often impossible to gather enough training data to accurately model their normal behavior
– No existing proposal satisfactorily addresses this problem
Contributions
– Provide evidence that web application traffic is distributed in a non-uniform fashion
– Propose an approach that addresses the problem of undertraining by using global knowledge
– Evaluate the proposed approach on a large data set of real-world traffic from many web applications
Summary of Notation
– Notation
  – A: a set of web applications
  – R: a set of resource paths or components
  – P: a set of parameters
  – Q: a set of requests
– Each request is represented by the tuple
Summary of Notation (cont’d)
– The set of models associated with each unique parameter instance can be represented as a tuple:
– The knowledge base of an anomaly detection system trained on a web application is denoted by
Multi-model Approach
– A profile for a given parameter is a tuple of models:
  – Models that describe normal intervals for integers and string lengths
  – A model of character strings as a ranked frequency histogram, or Idealized Character Distribution (ICD)
  – A model of sets of character strings, obtained by inducing a Hidden Markov Model (HMM)
  – A model of parameter values as a set of legal tokens
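To make the idea of a multi-model profile concrete, here is a minimal sketch that trains three of the models listed above on the observed values of one parameter. The function names and the dictionary layout are assumptions made for illustration, the HMM-based model is omitted, and this is not the authors' implementation.

```python
from collections import Counter


def ranked_frequencies(value):
    """Relative character frequencies of one string, sorted from most to least frequent."""
    counts = sorted(Counter(value).values(), reverse=True)
    total = sum(counts)
    return [c / total for c in counts] if total else []


def train_profile(values):
    """Build a toy profile from a non-empty list of string values of one parameter."""
    lengths = [len(v) for v in values]
    ranked = [ranked_frequencies(v) for v in values]
    width = max(len(r) for r in ranked)
    # ICD: average the ranked histograms over all training values
    # (shorter histograms are implicitly padded with zeros).
    icd = [sum(r[i] for r in ranked if i < len(r)) / len(ranked) for i in range(width)]
    return {
        "length_interval": (min(lengths), max(lengths)),  # normal string lengths
        "icd": icd,                                       # idealized character distribution
        "tokens": set(values),                            # set of legal tokens
    }


profile = train_profile(["aab", "ab"])
```

With only two training values the length interval and token set are clearly too narrow, which is the undertraining problem the rest of the talk addresses.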
The Problem
– Non-uniform training data
– In the case of low-traffic applications
  – The rate of client requests is inadequate to allow models to train in a timely manner
– In the case of high-traffic applications
  – A large subset of resource paths might fail to receive enough requests
Non-uniform training data
Exploiting Global Knowledge
– Parameters of the same type tend to induce model compositions that are similar to each other
– The goal is to substitute the profile of a similar, well-trained parameter of the same type for an undertrained profile
– The proposed method is composed of three phases
  – Enhanced training
  – Building profile knowledge bases
  – Mapping undertrained profiles to well-trained profiles
Phase I: Enhanced training
– Generate undertrained profiles
  – Start from the sequence of client requests containing parameter p for application a_i
  – Train profiles on randomly sampled κ-sequences, for a range of values of κ
– Each of the resulting profiles is then added to a knowledge base of undertrained profiles
– Each model monitors its stability during the training phase
– A well-trained, or stable, profile is stored in a knowledge base of well-trained profiles
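The sampling step in Phase I can be sketched as follows. This is a minimal illustration under stated assumptions: a κ-sequence is taken to be a contiguous window of κ requests starting at a random offset, and the list of κ values passed in is hypothetical, since the actual set is not given in the slide.

```python
import random


def sample_kappa_sequences(requests, kappas, rng):
    """Return {kappa: subsequence} for every kappa that fits in `requests`.

    Assumption: each kappa-sequence is a contiguous window chosen at a
    random offset; other sampling schemes are equally possible.
    """
    samples = {}
    for kappa in kappas:
        if kappa > len(requests):
            continue  # not enough traffic for this sequence length
        start = rng.randrange(len(requests) - kappa + 1)
        samples[kappa] = requests[start:start + kappa]
    return samples


requests = list(range(10))  # stand-in for requests carrying parameter p
samples = sample_kappa_sequences(requests, [1, 2, 4, 8, 16], random.Random(0))
```

Each sampled subsequence would then be used to train one deliberately undertrained profile, while the full sequence trains the stable one.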
Phase II: Building profile knowledge bases
– Merge a set of knowledge bases into the undertrained profile database
– Cluster the profiles in this database to reduce query execution time
– The result is a set of clusters of undertrained profiles
– An agglomerative hierarchical clustering algorithm with group average linkage was applied
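For readers unfamiliar with group average linkage, the following is a small pure-Python sketch of agglomerative clustering that repeatedly merges the two clusters with the smallest average pairwise distance. The `threshold` stopping rule and the use of plain numbers as stand-ins for profiles are assumptions; the slide does not say how the hierarchy is cut.

```python
def average_linkage(points, distance, threshold):
    """Agglomerative clustering with group average linkage.

    Returns clusters as lists of indices into `points`. Merging stops
    when the closest pair of clusters is farther apart than `threshold`
    (an assumed stopping rule).
    """
    clusters = [[i] for i in range(len(points))]

    def linkage(a, b):
        # Group average: mean distance over all cross-cluster pairs.
        total = sum(distance(points[i], points[j]) for i in a for j in b)
        return total / (len(a) * len(b))

    while len(clusters) > 1:
        best, x, y = min(
            (linkage(a, b), x, y)
            for x, a in enumerate(clusters)
            for y, b in enumerate(clusters)
            if x < y
        )
        if best > threshold:
            break
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]
    return clusters


result = average_linkage([0.0, 0.1, 5.0, 5.2], lambda p, q: abs(p - q), threshold=1.0)
```

This naive version recomputes every linkage on each pass, which is fine for illustration but would be too slow for a knowledge base of realistic size.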
Distance Measure
– More formally, the distance between the profiles c_i and c_j is defined in terms of distance functions over their component models
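One plausible form of such a profile distance is the mean of per-model distances over the model types that both profiles contain. The sketch below implements that reading; the averaging rule, the dictionary-based profile layout, and the two example model distances are all assumptions for illustration and not the authors' definition.

```python
def jaccard_distance(a, b):
    """Distance between two token sets: 1 minus their Jaccard similarity."""
    union = a | b
    return 1.0 - len(a & b) / len(union) if union else 0.0


def profile_distance(profile_a, profile_b, model_distances):
    """Mean per-model distance over the model types shared by both profiles."""
    shared = [name for name in model_distances if name in profile_a and name in profile_b]
    if not shared:
        raise ValueError("profiles share no model types")
    total = sum(model_distances[name](profile_a[name], profile_b[name]) for name in shared)
    return total / len(shared)


# Hypothetical per-model distance functions, each returning a value in [0, 1].
model_distances = {
    "tokens": jaccard_distance,
    "length": lambda p, q: abs(p - q) / max(p, q),
}
c_i = {"tokens": {"x", "y"}, "length": 4}
c_j = {"tokens": {"y", "z"}, "length": 6}
d = profile_distance(c_i, c_j, model_distances)
```

Keeping each per-model distance in [0, 1] means no single model type dominates the average.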
Distance Functions
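Phase III: Mapping undertrained profiles to well-trained profiles
– The mapping is implemented as follows
  – A nearest-neighbor match is performed between the undertrained profile and the clusters of the undertrained profile database
  – A nearest-neighbor match is performed between the undertrained profile and the members of the nearest cluster, to discover the stored undertrained profile at minimum distance from it
  – The well-trained profile associated with that match is substituted for the undertrained profile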
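The two-stage lookup can be sketched as below. The data layout (each cluster as a list of `(undertrained, well_trained)` pairs) and the use of the average member distance to pick the nearest cluster are assumptions made for this illustration; plain numbers stand in for profiles.

```python
def map_to_well_trained(query, clusters, distance):
    """Return the well-trained profile to substitute for `query`.

    `clusters` is a list of clusters, each a non-empty list of
    (undertrained_profile, well_trained_profile) pairs.
    """
    def cluster_distance(cluster):
        # Assumed cluster-level distance: mean distance to the members.
        return sum(distance(query, under) for under, _ in cluster) / len(cluster)

    # Stage 1: nearest cluster. Stage 2: nearest member inside that cluster.
    nearest_cluster = min(clusters, key=cluster_distance)
    _, well_trained = min(nearest_cluster, key=lambda pair: distance(query, pair[0]))
    return well_trained


clusters = [
    [(1.0, "A"), (1.5, "B")],
    [(9.0, "C")],
]
substitute = map_to_well_trained(1.4, clusters, lambda p, q: abs(p - q))
```

Searching clusters first means only the members of one cluster are compared individually, which is the query-time saving that clustering is meant to provide.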
Mapping Quality
Mapping Quality (cont’d)
– Consider the mapping from each undertrained cluster to the maximum number of elements in that cluster that map to the same cluster in C
– The robustness metric ρ is then defined in terms of this mapping
– ρ is compared against a minimum robustness threshold
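A minimal sketch of one way to compute such a robustness score: for each undertrained cluster, count the largest group of members that map to the same target cluster, then normalize by the total number of undertrained profiles. The normalization and the names used here are assumptions, since the exact definition of ρ is not reproduced in the slide text.

```python
from collections import Counter


def robustness(undertrained_clusters, target_cluster_of):
    """Fraction of undertrained profiles that agree with their cluster's majority mapping.

    `undertrained_clusters` is a list of non-empty lists of profile ids;
    `target_cluster_of` maps each profile id to the id of the well-trained
    cluster it maps to.
    """
    total = sum(len(cluster) for cluster in undertrained_clusters)
    agreeing = sum(
        max(Counter(target_cluster_of[p] for p in cluster).values())
        for cluster in undertrained_clusters
    )
    return agreeing / total


clusters = [["a", "b", "c"], ["d"]]
targets = {"a": 0, "b": 0, "c": 1, "d": 2}
rho = robustness(clusters, targets)
rho_min = 0.5  # hypothetical minimum robustness threshold
is_robust = rho >= rho_min
```

A score of 1.0 would mean that every undertrained cluster maps as a whole into a single well-trained cluster.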
Experimental Setting
– HTTP connections were observed over a period of approximately three months
– A portion of the resulting flows was then filtered using Snort to remove known attacks
– The data set contains 823 distinct web applications, 36,392 unique components, 16,671 unique parameters, and 58,734,624 HTTP requests
Profile clustering quality
Profile mapping robustness
Detection accuracy
– 100,000 attacks
Conclusion
– Identified that non-uniform web client access distributions cause model undertraining
– Proposed the use of global knowledge bases of well-trained profiles to remediate a local scarcity of training data