
1 Active Learning for Network Intrusion Detection
ACM CCS 2009
Nico Görnitz, Marius Kloft, Konrad Rieck, Ulf Brefeld (Technische Universität Berlin)

2 Agenda
- Introduction
- Methodology
- Empirical evaluation
- Conclusion

3 Introduction
- Conventional defenses against network threats rest on the concept of misuse detection; that is, attacks are identified in network traffic using known patterns of misuse
- A prominent technique for anomaly detection is to map network data to a vector space and learn a hypersphere enclosing the normal data

4 Introduction (cont.)
- Conventional defenses against network threats rest on the concept of misuse detection; that is, attacks are identified in network traffic using known patterns of misuse
- A prominent technique for anomaly detection is to map network data to a vector space and learn a hypersphere enclosing the normal data
- This work rephrases anomaly detection as an active learning task

5 Methodology
- From network payloads to feature spaces: a network payload x ∈ X (the data contained in a packet) is mapped to a vector space using a set of strings S and an embedding function φ
- For each string s ∈ S, the function φ_s(x) returns 1 if s is contained in x and 0 otherwise
- Applying φ_s(x) for all elements of S yields the map in Eq. 1
Eq. 1 (mapping from network payloads to the feature space): φ(x) = (φ_s(x))_{s ∈ S} ∈ {0,1}^{|S|}
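To make the embedding concrete, here is a minimal Python sketch of the binary string embedding, assuming (as in the evaluation later) that S is a set of 3-grams drawn from a payload corpus; the function names and the toy corpus are ours, not the paper's.

```python
import numpy as np

def ngrams(payload: bytes, n: int = 3) -> set:
    """All contiguous n-byte substrings (n-grams) of a payload."""
    return {payload[i:i + n] for i in range(len(payload) - n + 1)}

def embed(payload: bytes, S: list) -> np.ndarray:
    """Binary embedding: phi_s(x) = 1 iff string s occurs in payload x."""
    grams = ngrams(payload)
    return np.array([1.0 if s in grams else 0.0 for s in S])

# Toy usage: build S from a small corpus, then map each payload (Eq. 1).
corpus = [b"GET /index.html HTTP/1.1", b"POST /login HTTP/1.1"]
S = sorted(set().union(*(ngrams(p) for p in corpus)))
X = np.vstack([embed(p, S) for p in corpus])   # one row per payload
```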

6 Methodology (cont.)
- Assuming payloads are mapped to a vector space as just described, the hypersphere can be computed with an SVDD (Support Vector Domain Description) classifier
- Given the function f in Eq. 2, the boundary of the hypersphere is the set of points x with f(x) = 0
Eq. 2 (hypersphere function, with center c and radius R): f(x) = ‖φ(x) − c‖² − R²
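A small sketch of the score from Eq. 2 in Python (names are ours):

```python
import numpy as np

def svdd_score(phi_x: np.ndarray, c: np.ndarray, R: float) -> float:
    """f(x) = ||phi(x) - c||^2 - R^2: negative inside the hypersphere
    (normal), positive outside (anomalous), zero exactly on the boundary."""
    return float(np.sum((phi_x - c) ** 2) - R ** 2)
```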

7 Methodology (cont.)
Fig. 1: An exemplary solution of the SVDD
Eq. 2: f(x) = ‖φ(x) − c‖² − R²

8 Methodology (cont.)
- The center c and radius R of the hypersphere are found by solving Eq. 3, where η is a trade-off parameter adjusting point-wise violations of the hypersphere; discarded data points induce slack that is absorbed by the variables ε_i
Eq. 3 (SVDD; find a concise center c and radius R by discarding some points): min over R, c, ε of R² + η Σ_i ε_i, subject to ‖φ(x_i) − c‖² ≤ R² + ε_i and ε_i ≥ 0 for all i
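The paper solves the SVDD problem itself; as a rough stand-in for experimentation, scikit-learn's OneClassSVM is known to yield an equivalent boundary for RBF-type kernels. The substitution and the synthetic data are our choices, not the authors':

```python
import numpy as np
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(0)
X_train = rng.normal(size=(200, 8))                     # stand-in for embedded payloads
X_test = np.vstack([rng.normal(size=(20, 8)),
                    rng.normal(loc=4.0, size=(5, 8))])  # last 5 rows: outliers

# nu upper-bounds the fraction of training points left outside the sphere,
# playing a role analogous to the trade-off parameter eta in Eq. 3.
oc = OneClassSVM(kernel="rbf", gamma="scale", nu=0.05).fit(X_train)
f = -oc.decision_function(X_test)                       # f > 0 => flagged anomalous
```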

9 Methodology (cont.)
- Here we devise an active learning strategy that queries low-confidence events, thereby guiding the security expert in the labeling process
- Our strategy takes both unlabeled and labeled data into account; we denote unlabeled examples by x_1,…,x_n and labeled ones by x_{n+1},…,x_{n+m}, where n ≫ m
- Every labeled example x_i is annotated with a label y_i ∈ {+1,−1}, depending on whether it is classified as benign (y_i = +1) or malicious (y_i = −1)
- x′ denotes the point the user is asked to label

10 Methodology (cont.)
- We first use a common active learning strategy, the margin strategy, which simply queries borderline points (Eq. 4)
Eq. 4 (margin strategy, query a point at the margin): x′ = argmin over unlabeled x_i of |f(x_i)|
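In code, the margin strategy reduces to one line (a sketch; `scores` holds the f(x_i) values of the unlabeled points):

```python
import numpy as np

def query_margin(scores: np.ndarray) -> int:
    """Eq. 4, margin strategy: query the unlabeled point whose score
    f(x) is closest to the hypersphere boundary f(x) = 0."""
    return int(np.argmin(np.abs(scores)))
```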

11 Methodology (cont.)
- Novel attacks, however, will not necessarily be found around the margin, so we translate this intuition into an active learning strategy as follows
- Let A = (a_st), s,t = 1,…,n+m, be the adjacency matrix of a k-nearest-neighbor graph, where a_ij = 1 if x_i is among the k nearest neighbors of x_j and 0 otherwise; Eq. 5 implements this idea
Eq. 5: query a point that is not around the margin but whose labeled neighborhood makes it look suspicious

12 Methodology (cont.)
- Our final active learning strategy, Eq. 6, combines the margin criterion of Eq. 4 with the neighborhood criterion of Eq. 5 (a sketch of one possible combination follows below)
Eq. 6: final active learning strategy for querying points
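Eq. 5 and Eq. 6 are images on the original slides, so the sketch below is our reading of them, with an assumed mixing weight tau: a k-NN adjacency matrix measures how benign each unlabeled point's labeled neighborhood looks, and the final criterion trades this off against the margin term. The exact weighting in the paper may differ.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def combined_query(X_all, f_scores, y_labeled, n_unlabeled, k=10, tau=0.5):
    """One possible reading of the combined strategy (Eq. 6).

    X_all:      (n+m, d) array, unlabeled points first, labeled points last
    f_scores:   SVDD scores f(x_i) for the n unlabeled points
    y_labeled:  labels for the m labeled points (+1 benign, -1 malicious)
    """
    knn = NearestNeighbors(n_neighbors=k).fit(X_all)
    # a_ij = 1 iff x_j is among the k nearest neighbors of x_i (Eq. 5's A)
    A = knn.kneighbors_graph(X_all[:n_unlabeled]).toarray()
    # fraction of each unlabeled point's neighbors that are labeled benign:
    # low values indicate a suspicious, possibly novel attack neighborhood
    benign = np.zeros(X_all.shape[0])
    benign[n_unlabeled:] = (np.asarray(y_labeled) + 1) / 2.0
    neighborhood = A @ benign / k
    # normalized margin term: low means close to the boundary
    margin = np.abs(f_scores) / (np.abs(f_scores).max() + 1e-12)
    return int(np.argmin(tau * margin + (1.0 - tau) * neighborhood))
```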

13 Methodology (cont.)
- We can now query only low-confidence points for labeling instead of all points
- Unfortunately, plain SVDD cannot make use of labeled data; we therefore extend SVDD to support active learning and propose the integrated method ActiveSVDD

14 Methodology (cont.)
- The optimization problem gains additional constraints for the labeled examples, which have to fulfill the margin criterion with margin γ
- κ, η_u, and η_l are trade-off parameters balancing margin maximization against the impact of unlabeled and labeled examples
- ε_j is a slack variable allowing point-wise relaxation of margin violations by labeled examples
Eq. 3 (SVDD): find a concise center c and radius R by discarding some points
Eq. 7 (ActiveSVDD): the same objective with additional margin constraints on labeled data
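The ActiveSVDD problem itself is only an image on the slide; the following reconstruction is consistent with the quantities named above (R, γ, c, κ, η_u, η_l, ε), but it is our transcription, so treat signs and index ranges with care:

```latex
\begin{aligned}
\min_{R,\,\gamma,\,c,\,\varepsilon}\quad
  & R^2 \;-\; \kappa\gamma
    \;+\; \eta_u \sum_{i=1}^{n} \varepsilon_i
    \;+\; \eta_l \sum_{j=n+1}^{n+m} \varepsilon_j \\
\text{s.t.}\quad
  & \lVert \phi(x_i) - c \rVert^2 \;\le\; R^2 + \varepsilon_i
    \qquad i = 1,\dots,n \\
  & y_j\bigl(\lVert \phi(x_j) - c \rVert^2 - R^2\bigr) \;\le\; -\gamma + \varepsilon_j
    \qquad j = n+1,\dots,n+m \\
  & \varepsilon \ge 0
\end{aligned}
```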

15 Methodology (cont.)
- Since Eq. 7 makes the optimization problem non-convex and optimization in the dual is prohibitive, we translate it into Eq. 8, which substitutes a differentiable Huber loss
Eq. 7 (ActiveSVDD): as on the previous slide
Eq. 8 (ActiveSVDD with Huber loss): the same problem with the slack penalties replaced by a Huber loss
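The slide does not reproduce Eq. 8 either; it only states that a Huber loss is substituted. For reference, the standard Huber loss with threshold Δ, which smooths the kink of hinge-style penalties so that gradient-based optimization becomes feasible, is:

```latex
\ell_\Delta(t) \;=\;
\begin{cases}
  \tfrac{1}{2}\,t^2 & \text{if } \lvert t \rvert \le \Delta,\\[2pt]
  \Delta\bigl(\lvert t \rvert - \tfrac{1}{2}\Delta\bigr) & \text{otherwise.}
\end{cases}
```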

16 Methodology (cont.)
Fig. 2: Comparison of SVDD and ActiveSVDD on unlabeled data (green) and labeled data of the normal class (red) and attacks (blue)
Eq. 3 (SVDD) and Eq. 7 (ActiveSVDD): as given above

17 Empirical evaluation
- Data set: HTTP traffic recorded at Fraunhofer Institute FIRST over 10 days; 145,069 unmodified connections with an average length of 489 bytes
- The FIRST data serve as the normal pool
- The malicious pool contains 27 real attack classes generated with the Metasploit framework, covering 15 buffer overflows, 8 injection attacks, and 4 other attacks including HTTP tunnels and XSS; every attack is recorded in 2 to 6 different variants
- The malicious pool is additionally obfuscated by adding common HTTP headers while the malicious body remains unaltered; the results are saved as the cloaked pool
- Each connection is mapped to a vector space using 3-grams

18 Empirical evaluation (cont.)
- Experiment 1: comparison for three cases
  - SVDD vs. ActiveSVDD with random sampling on uncloaked malicious data
  - SVDD vs. ActiveSVDD with random sampling on cloaked malicious data
  - SVDD vs. ActiveSVDD with active learning on cloaked malicious data
- Training set: 966 examples from the normal pool and 34 attacks from the malicious or cloaked pool
- Holdout and test sets: 795 normal connections and 27 attacks each
- The same attack class occurs either in the training set or in the test set, but never in both

19 Empirical evaluation (cont.)
Fig. 3: SVDD vs. ActiveSVDD with random sampling on uncloaked malicious data

20 Empirical evaluation (cont.)
Fig. 3: SVDD vs. ActiveSVDD with random sampling on uncloaked malicious data
Fig. 4: SVDD vs. ActiveSVDD with random sampling on cloaked malicious data

21 Empirical evaluation (cont.)
Fig. 4: SVDD vs. ActiveSVDD with random sampling on cloaked malicious data
Fig. 5: SVDD vs. ActiveSVDD with active learning on cloaked malicious data

22 Empirical evaluation (cont.)
Fig. 6: Number of attacks found by different active learning strategies

23 Empirical evaluation (cont.)
- Experiment 2: investigate ActiveSVDD in an online learning scenario, i.e., when the normal data pool steadily increases (see the sketch below)
- 3750 events from the normal pool, of which 1250 form the test set and the rest are decomposed into five chunks of equal size for training
- Cloaked attacks are mixed into all samples, and the same attack class occurs either in the training set or in the test set, but not in both
- For each chunk, the active learning strategy is adjusted such that only 10 data points need to be labeled
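A minimal Python sketch of this protocol; the chunking and the 10-query budget come from the slide, while the hooks `fit`, `score`, and `ask_expert` are assumed placeholders for the ActiveSVDD trainer, the scoring function, and the human expert:

```python
import numpy as np

QUERIES_PER_CHUNK = 10   # labeling budget per chunk, as stated on the slide

def online_active_learning(chunks, fit, score, ask_expert):
    """Experiment-2 protocol sketch: for each incoming chunk, query the
    expert on the lowest-confidence points, then retrain on everything
    seen so far (unlabeled pool plus accumulated labels)."""
    unlabeled, labeled, model = [], [], None
    for chunk in chunks:
        unlabeled.extend(chunk)
        model = fit(unlabeled, labeled)            # retrain on all data so far
        confidence = np.abs(score(model, chunk))   # distance to the boundary
        for idx in np.argsort(confidence)[:QUERIES_PER_CHUNK]:
            labeled.append((chunk[idx], ask_expert(chunk[idx])))  # y in {+1, -1}
    return model
```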

24 Empirical evaluation (cont.)
Fig. 7: ActiveSVDD progress over chunks in the online application
Fig. 8: ROC curves for all chunks using ActiveSVDD in the online application

25 Empirical evaluation (cont.)
- Experiment 3: threshold adaptation of the SVDD, compared across three methods (a sketch of the recalibration follows below)
  - Original SVDD: no threshold adaptation
  - Adapt the threshold using the average of randomly sampled labeled instances
  - Adapt the threshold using the average of instances labeled via active learning
- 3750 connections from the normal pool, split into a training set of 2500 connections and a test set of 1250 connections
- Cloaked attacks are mixed into all samples
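A sketch of the recalibration in the two adaptive variants, reading "the average of labeled instances" as the average anomaly score of the points the expert labeled (our interpretation of the slide):

```python
import numpy as np

def adapt_threshold(labeled_scores: np.ndarray) -> float:
    """Move the decision threshold from f(x) = 0 to the mean score of the
    labeled instances, whether sampled at random or by active learning."""
    return float(np.mean(labeled_scores))

def predict(scores: np.ndarray, threshold: float = 0.0) -> np.ndarray:
    """Classify +1 (benign) if f(x) <= threshold, else -1 (attack)."""
    return np.where(scores <= threshold, 1, -1)
```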

26 Empirical evaluation (cont.)
Fig. 9: Comparison of threshold adaptation methods

27 Conclusion
- To reduce the labeling effort, we devise an active learning strategy that queries instances which are not only close to the boundary but also likely to be novel attacks
- To use labeled and unlabeled instances in the training process, we propose ActiveSVDD as a generalization of SVDD
- Rephrasing the unsupervised problem setting as an active learning task is worth the effort

28 The End

