Active Learning for Network Intrusion Detection ACM CCS 2009 Nico Görnitz, Technische Universität Berlin Marius Kloft, Technische Universität Berlin Konrad Rieck, Technische Universität Berlin Ulf Brefeld, Technische Universität Berlin
Agenda 2 Introduction Methodology Empirical evaluation Conclusion
Introduction 3 Conventional defenses against network threats rest on the concept of misuse detection That is, attacks are identified in network traffic using known patterns of misuse A prominent technique for anomaly detection is to map network data to a vector space and learn a hypersphere enclosing the normal data
Introduction (cont.) 4 This work rephrases anomaly detection as an active learning task
Methodology 5 From network payload to feature spaces A network payload x ∈ X (the data contained in a packet) is mapped to a vector space using a set of strings S and an embedding function φ. For each string s ∈ S, the function φ_s(x) returns 1 if s is contained in x and 0 otherwise By applying φ_s(x) for all elements of S, we obtain the following map Eq. 1: φ(x) = (φ_s(x))_{s∈S} ∈ {0,1}^{|S|} (mapping from network payload to feature spaces)
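The embedding φ can be sketched as follows; the concrete string set S below is a stand-in for illustration, not one used in the paper.

```python
import numpy as np

def embed(payload: bytes, strings: list[bytes]) -> np.ndarray:
    """phi(x) from Eq. 1: component phi_s(x) is 1 iff string s occurs in x."""
    return np.array([1.0 if s in payload else 0.0 for s in strings])

# Hypothetical string set S, for illustration only.
S = [b"GET", b"POST", b"/etc/passwd", b"%u9090"]
x = b"GET /index.html HTTP/1.1"
print(embed(x, S))  # -> [1. 0. 0. 0.]
```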
Methodology (cont.) 6 Assuming payloads are mapped to a vector space as just described, the hypersphere can be computed with an SVDD (Support Vector Domain Description) classifier Given the function f in Eq. 2, the boundary of the hypersphere is the set of points x with f(x) = 0 Eq. 2: f(x) = ||φ(x) − c||² − R² (function of the hypersphere)
Methodology (cont.) 7 Fig. 1: An exemplary solution of the SVDD Eq. 2: Function of hypersphere
Methodology (cont.) 8 The center c and radius R of the hypersphere are found by solving Eq. 3, where η is a trade-off parameter adjusting point-wise violations of the hypersphere. Discarded data points induce slack that is absorbed by the variables ε_i Eq. 3: min_{c,R,ε} R² + η Σ_i ε_i subject to ||φ(x_i) − c||² ≤ R² + ε_i, ε_i ≥ 0 (SVDD: find a concise center c and radius R by discarding some points)
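A minimal numeric sketch of the hypersphere idea in Eqs. 2-3; this is not the actual SVDD solver: the center is simply the mean and the radius discards a fixed fraction of points, mimicking the role of the trade-off η.

```python
import numpy as np

def fit_simple_svdd(X: np.ndarray, keep: float = 0.95):
    """Simplified stand-in for Eq. 3: center = data mean; radius chosen so
    that roughly `keep` of the points fall inside. The discarded fraction
    plays the role of the slack variables eps_i."""
    c = X.mean(axis=0)
    d = np.linalg.norm(X - c, axis=1)
    R = np.quantile(d, keep)
    return c, R

def score(x, c, R):
    """f(x) from Eq. 2: negative inside the hypersphere, positive outside."""
    return np.linalg.norm(x - c) ** 2 - R ** 2

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
c, R = fit_simple_svdd(X)
print(score(np.zeros(2), c, R) < 0)       # point near the center: inside
print(score(np.full(2, 10.0), c, R) > 0)  # far-away point: outside
```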
Methodology (cont.) 9 Here, we devise an active learning strategy that queries low-confidence events, thereby guiding the security expert in the labeling process Our strategy takes both unlabeled and labeled data into account. We denote Unlabeled examples by x_1, …, x_n Labeled ones by x_{n+1}, …, x_{n+m}, where n ≫ m Every labeled example x_i is annotated with a label y_i ∈ {+1, −1}, depending on whether it is classified as benign (y_i = +1) or malicious (y_i = −1) data x' denotes the point the user is asked to label
Methodology (cont.) 10 We first use a common learning strategy, the margin strategy, which simply queries borderline points using Eq. 4 Eq. 4: x' = argmin_{x ∈ unlabeled} |f(x)| (query the point closest to the margin)
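The margin strategy of Eq. 4 reduces to a one-liner over the detector scores:

```python
import numpy as np

def margin_query(scores: np.ndarray) -> int:
    """Eq. 4 (margin strategy): pick the unlabeled point whose score f(x)
    lies closest to the decision boundary f(x) = 0."""
    return int(np.argmin(np.abs(scores)))

f = np.array([-3.0, 0.2, -0.1, 4.0])
print(margin_query(f))  # -> 2
```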
Methodology (cont.) 11 However, novel attacks will not necessarily be found near the margin. We therefore extend this into an active learning strategy as follows Let A = (a_st), s,t = 1, …, n+m, be the adjacency matrix of the k-nearest-neighbor graph, where a_ij = 1 if x_i is among the k nearest neighbors of x_j and 0 otherwise. Eq. 5 implements this idea Eq. 5: Query a point that is not near the margin but lies in a suspicious, largely unlabeled neighborhood
Methodology (cont.) 12 Our final active learning strategy, Eq. 6, combines both criteria Eq. 4: Query the point at the margin Eq. 5: Query the point that is not near the margin but suspicious Eq. 6: Final active learning strategy to query a point
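The combined strategy of Eqs. 4-6 can be sketched as below. The neighborhood term favors candidates whose k nearest neighbors contain few labeled examples, so queries also reach suspicious clusters away from the margin. The blending weight `tau` and the exact form of the neighborhood score are assumptions for illustration, not the paper's Eq. 6.

```python
import numpy as np

def knn_adjacency(X: np.ndarray, k: int) -> np.ndarray:
    """A = (a_ij) with a_ij = 1 iff x_i is among the k nearest neighbors of x_j."""
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    A = np.zeros_like(D)
    for j in range(len(X)):
        nn = np.argsort(D[:, j])[1:k + 1]  # skip x_j itself
        A[nn, j] = 1.0
    return A

def combined_query(scores, X, labeled_mask, k=2, tau=0.5):
    """Blend the margin criterion |f(x)| with an 'explored-ness' term that
    counts labeled points among each candidate's k nearest neighbors."""
    A = knn_adjacency(X, k)
    explored = A[labeled_mask, :].sum(axis=0) / k
    crit = tau * np.abs(scores) + (1 - tau) * explored
    crit[labeled_mask] = np.inf  # never re-query an already labeled point
    return int(np.argmin(crit))
```

With two clusters and all labels in the first, the query lands in the unexplored second cluster even when several points sit at comparable distance to the margin.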
Methodology (cont.) 13 Now, we can query only low-confidence points for labeling instead of all points Unfortunately, plain SVDD cannot make use of labeled data. We therefore extend SVDD to support active learning and propose the integrated method ActiveSVDD
Methodology (cont.) 14 The optimization problem gains additional constraints requiring the labeled examples to fulfill the margin criterion with margin γ κ, η_u, and η_l are trade-off parameters balancing margin maximization against the impact of unlabeled and labeled examples ε_j are slack variables allowing point-wise relaxation of margin violations by labeled examples Eq. 3: SVDD objective, shown for comparison Eq. 7: min_{c,R,γ,ε} R² − κγ + η_u Σ_{i=1}^{n} ε_i + η_l Σ_{j=n+1}^{n+m} ε_j subject to ||φ(x_i) − c||² ≤ R² + ε_i for unlabeled points, y_j(||φ(x_j) − c||² − R²) ≤ −γ + ε_j for labeled points, and ε ≥ 0 (ActiveSVDD)
Methodology (cont.) 15 Since Eq. 7 makes the optimization problem non-convex and optimizing in the dual is prohibitive, we translate it into Eq. 8, an unconstrained primal formulation in which the losses are smoothed with the differentiable Huber loss Eq. 7: ActiveSVDD objective Eq. 8: ActiveSVDD objective with Huber loss
Methodology (cont.) 16 Fig. 2: Comparison of SVDD and ActiveSVDD with unlabeled data (green) and labeled data of the normal class (red) and attacks (blue) Eq. 3: SVDD objective Eq. 7: ActiveSVDD objective
Empirical evaluation 17 Data set HTTP traffic recorded at the Fraunhofer Institute FIRST over 10 days; unmodified connections with an average length of 489 bytes The FIRST data serve as the normal pool The malicious pool contains 27 real attack classes generated with the Metasploit framework, covering 15 buffer overflows, 8 injection attacks, and 4 other attacks including HTTP tunnels and XSS. Every attack is recorded in 2-6 different variants The malicious pool is obfuscated by adding common HTTP headers while the malicious body remains unaltered; the results are saved as the cloaked pool Each connection is mapped to a vector space using 3-grams
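The 3-gram mapping of the last bullet can be sketched as follows: each connection's byte 3-grams form the string set over which the binary embedding of Eq. 1 is evaluated.

```python
def ngrams(payload: bytes, n: int = 3) -> set[bytes]:
    """Extract the set of byte n-grams of a connection payload."""
    return {payload[i:i + n] for i in range(len(payload) - n + 1)}

print(sorted(ngrams(b"GET /")))  # -> [b'ET ', b'GET', b'T /']
```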
Empirical evaluation (cont.) 18 Experiment 1 Comparison for three cases ▪ SVDD vs. ActiveSVDD with random sampling on uncloaked malicious data ▪ SVDD vs. ActiveSVDD with random sampling on cloaked malicious data ▪ SVDD vs. ActiveSVDD with active learning on cloaked malicious data The training set holds 966 examples from the normal pool and 34 attacks from the malicious or cloaked pool The holdout and test sets hold 795 normal connections and 27 attacks each We make sure that the same attack class occurs either in the training set or in the test set, but not in both
Empirical evaluation (cont.) 19 Fig. 3: SVDD vs. ActiveSVDD with random sampling on uncloaked malicious data
Empirical evaluation (cont.) 20 Fig. 3: SVDD vs. ActiveSVDD with random sampling on uncloaked malicious data Fig. 4: SVDD vs. ActiveSVDD with random sampling on cloaked malicious data
Empirical evaluation (cont.) 21 Fig. 4: SVDD vs. ActiveSVDD with random sampling on cloaked malicious data Fig. 5: SVDD vs. ActiveSVDD with active learning on cloaked malicious data
Empirical evaluation (cont.) 22 Fig. 6: Number of attacks found by different active learning strategies
Empirical evaluation (cont.) 23 Experiment 2 Investigate ActiveSVDD in an online learning scenario, i.e., when the normal data pool grows steadily 3750 events from the normal pool, of which 1250 form the test set; the rest is decomposed into five chunks of equal size for training Cloaked attacks are mixed into all samples, and the same attack class occurs either in the training set or in the test set, but not in both For each chunk we adjust the active learning strategy such that only 10 data points need to be labeled
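The online scenario above can be sketched as a loop over chunks: per chunk, a budget of 10 label queries is spent and the detector is refit on all data seen so far. The toy `fit`/`score` below stand in for ActiveSVDD and are not the paper's implementation.

```python
import numpy as np

BUDGET = 10  # label queries per chunk

def fit(X):
    """Toy detector: center = mean, squared radius = 95% quantile."""
    c = X.mean(axis=0)
    R2 = np.quantile(((X - c) ** 2).sum(axis=1), 0.95)
    return c, R2

def score(X, model):
    c, R2 = model
    return ((X - c) ** 2).sum(axis=1) - R2

def run(chunks):
    seen = chunks[0]
    model = fit(seen)
    for chunk in chunks[1:]:
        f = score(chunk, model)
        queries = np.argsort(np.abs(f))[:BUDGET]  # margin-style queries
        # (here the expert would label `queries`; labels steer ActiveSVDD)
        seen = np.vstack([seen, chunk])
        model = fit(seen)  # refit on the grown normal pool
    return model

rng = np.random.default_rng(0)
chunks = [rng.normal(size=(500, 3)) for _ in range(5)]
c, R2 = run(chunks)
print(np.allclose(c, 0, atol=0.2))  # center stays near the true mean
```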
Empirical evaluation (cont.) 24 Fig. 8: ROC curve for all chunks using ActiveSVDD in online application Fig. 7: ActiveSVDD progress over chunks in online application
Empirical evaluation (cont.) 25 Experiment 3 Threshold adaptation of the SVDD with three methods ▪ Original SVDD: no threshold adaptation ▪ Adapt using the average of randomly chosen labeled instances ▪ Adapt using the average of actively queried labeled instances 3750 connections from the normal pool, split into a training set of 2500 connections and a test set of 1250 connections Cloaked attacks are mixed into all samples
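The adaptation step can be sketched as shifting the decision threshold on f(x) toward the average score of the labeled instances, while the learned center stays fixed; this particular averaging rule is an assumption for illustration, not necessarily the paper's exact formula.

```python
import numpy as np

def adapt_threshold(scores_labeled: np.ndarray) -> float:
    """Assumed adaptation rule: new threshold on f(x) = mean score of the
    labeled points, so the boundary moves toward where labels actually lie."""
    return float(scores_labeled.mean())

f_labeled = np.array([-0.4, 0.2, 0.1, -0.1])
t = adapt_threshold(f_labeled)
print(t)  # -> -0.05
```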
Empirical evaluation (cont.) 26 Fig. 9: Comparison of threshold adaptation methods
Conclusion 27 To reduce the labeling effort, we devise an active learning strategy that queries instances that are not only close to the boundary but also likely to be novel attacks To use labeled and unlabeled instances in the training process, we propose ActiveSVDD as a generalization of SVDD Rephrasing the unsupervised problem setting as an active learning task is worth the effort
The End 28