Active Learning for Network Intrusion Detection ACM CCS 2009 Nico Görnitz, Technische Universität Berlin Marius Kloft, Technische Universität Berlin Konrad Rieck, Technische Universität Berlin Ulf Brefeld, Technische Universität Berlin
Agenda 2 Introduction Methodology Empirical evaluation Conclusion
Introduction 3 Conventional defenses against network threats rest on the concept of misuse detection That is, attacks are identified in network traffic using known patterns of misuse A prominent technique for anomaly detection is to map network data to a vector space and learn a hypersphere enclosing the normal data
Introduction (cont.) 4 This work rephrases anomaly detection as an active learning task
Methodology 5 From network payload to feature spaces A network payload x ∈ X (the data contained in a packet) is mapped to a vector space using a set of strings S and an embedding function φ. For each string s ∈ S, the function φ_s(x) returns 1 if s is contained in x and 0 otherwise By applying φ_s(x) for all elements of S, we obtain the following map Eq. 1: φ(x) = (φ_s(x))_{s∈S} ∈ {0,1}^{|S|} (mapping from network payload to feature spaces)
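The embedding φ can be sketched as follows; the concrete string set S below is a stand-in for illustration, not one used in the paper.

```python
import numpy as np

def embed(payload: bytes, strings: list[bytes]) -> np.ndarray:
    """phi(x) from Eq. 1: component phi_s(x) is 1 iff string s occurs in x."""
    return np.array([1.0 if s in payload else 0.0 for s in strings])

# Hypothetical string set S, for illustration only.
S = [b"GET", b"POST", b"/etc/passwd", b"%u9090"]
x = b"GET /index.html HTTP/1.1"
print(embed(x, S))  # -> [1. 0. 0. 0.]
```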
Methodology (cont.) 6 Assuming payloads are mapped to a vector space as just described, the hypersphere can be computed with an SVDD (Support Vector Domain Description) classifier Given the function f in Eq. 2, the boundary of the hypersphere is the set of points x with f(x) = 0 Eq. 2: f(x) = ||φ(x) − c||² − R² (function of the hypersphere)
Methodology (cont.) 7 Fig. 1: An exemplary solution of the SVDD Eq. 2: Function of hypersphere
Methodology (cont.) 8 The center c and radius R of the hypersphere are found by solving Eq. 3, where η is a trade-off parameter adjusting point-wise violations of the hypersphere. Discarded data points induce slack that is absorbed by the variables ε_i Eq. 3: min_{c,R,ε} R² + η Σ_i ε_i subject to ||φ(x_i) − c||² ≤ R² + ε_i, ε_i ≥ 0 (SVDD: find a concise center c and radius R by discarding some points)
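A minimal numeric sketch of the hypersphere idea in Eqs. 2-3; this is not the actual SVDD solver: the center is simply the mean and the radius discards a fixed fraction of points, mimicking the role of the trade-off η.

```python
import numpy as np

def fit_simple_svdd(X: np.ndarray, keep: float = 0.95):
    """Simplified stand-in for Eq. 3: center = data mean; radius chosen so
    that roughly `keep` of the points fall inside. The discarded fraction
    plays the role of the slack variables eps_i."""
    c = X.mean(axis=0)
    d = np.linalg.norm(X - c, axis=1)
    R = np.quantile(d, keep)
    return c, R

def score(x, c, R):
    """f(x) from Eq. 2: negative inside the hypersphere, positive outside."""
    return np.linalg.norm(x - c) ** 2 - R ** 2

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
c, R = fit_simple_svdd(X)
print(score(np.zeros(2), c, R) < 0)       # point near the center: inside
print(score(np.full(2, 10.0), c, R) > 0)  # far-away point: outside
```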
Methodology (cont.) 9 Here, we devise an active learning strategy that queries low-confidence events, thereby guiding the security expert in the labeling process Our strategy takes both unlabeled and labeled data into account. We denote Unlabeled examples by x_1, …, x_n Labeled ones by x_{n+1}, …, x_{n+m}, where n ≫ m Every labeled example x_i is annotated with a label y_i ∈ {+1, −1}, depending on whether it is classified as benign (y_i = +1) or malicious (y_i = −1) data x' denotes the point the user is asked to label
Methodology (cont.) 10 We first use a common learning strategy, the margin strategy, which simply queries borderline points using Eq. 4 Eq. 4: x' = argmin_{x ∈ unlabeled} |f(x)| (query the point closest to the margin)
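The margin strategy of Eq. 4 reduces to a one-liner over the detector scores:

```python
import numpy as np

def margin_query(scores: np.ndarray) -> int:
    """Eq. 4 (margin strategy): pick the unlabeled point whose score f(x)
    lies closest to the decision boundary f(x) = 0."""
    return int(np.argmin(np.abs(scores)))

f = np.array([-3.0, 0.2, -0.1, 4.0])
print(margin_query(f))  # -> 2
```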
Methodology (cont.) 11 However, novel attacks will not necessarily be found near the margin. We therefore extend this into an active learning strategy as follows Let A = (a_st), s,t = 1, …, n+m, be the adjacency matrix of the k-nearest-neighbor graph, where a_ij = 1 if x_i is among the k nearest neighbors of x_j and 0 otherwise. Eq. 5 implements this idea Eq. 5: Query a point that is not near the margin but lies in a suspicious, largely unlabeled neighborhood
Methodology (cont.) 12 Our final active learning strategy, Eq. 6, combines both criteria Eq. 4: Query the point at the margin Eq. 5: Query the point that is not near the margin but suspicious Eq. 6: Final active learning strategy to query a point
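The combined strategy of Eqs. 4-6 can be sketched as below. The neighborhood term favors candidates whose k nearest neighbors contain few labeled examples, so queries also reach suspicious clusters away from the margin. The blending weight `tau` and the exact form of the neighborhood score are assumptions for illustration, not the paper's Eq. 6.

```python
import numpy as np

def knn_adjacency(X: np.ndarray, k: int) -> np.ndarray:
    """A = (a_ij) with a_ij = 1 iff x_i is among the k nearest neighbors of x_j."""
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    A = np.zeros_like(D)
    for j in range(len(X)):
        nn = np.argsort(D[:, j])[1:k + 1]  # skip x_j itself
        A[nn, j] = 1.0
    return A

def combined_query(scores, X, labeled_mask, k=2, tau=0.5):
    """Blend the margin criterion |f(x)| with an 'explored-ness' term that
    counts labeled points among each candidate's k nearest neighbors."""
    A = knn_adjacency(X, k)
    explored = A[labeled_mask, :].sum(axis=0) / k
    crit = tau * np.abs(scores) + (1 - tau) * explored
    crit[labeled_mask] = np.inf  # never re-query an already labeled point
    return int(np.argmin(crit))
```

With two clusters and all labels in the first, the query lands in the unexplored second cluster even when several points sit at comparable distance to the margin.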
Methodology (cont.) 13 Now, we can query only low-confidence points for labeling instead of all points Unfortunately, plain SVDD cannot make use of labeled data. We therefore extend SVDD to support active learning and propose the integrated method ActiveSVDD
Methodology (cont.) 14 The optimization problem gains additional constraints requiring the labeled examples to fulfill the margin criterion with margin γ κ, η_u, and η_l are trade-off parameters balancing margin maximization against the impact of unlabeled and labeled examples ε_j are slack variables allowing point-wise relaxation of margin violations by labeled examples Eq. 3: SVDD objective, shown for comparison Eq. 7: min_{c,R,γ,ε} R² − κγ + η_u Σ_{i=1}^{n} ε_i + η_l Σ_{j=n+1}^{n+m} ε_j subject to ||φ(x_i) − c||² ≤ R² + ε_i for unlabeled points, y_j(||φ(x_j) − c||² − R²) ≤ −γ + ε_j for labeled points, and ε ≥ 0 (ActiveSVDD)
Methodology (cont.) 15 Since Eq. 7 makes the optimization problem non-convex and optimizing in the dual is prohibitive, we translate it into Eq. 8, an unconstrained primal formulation in which the losses are smoothed with the differentiable Huber loss Eq. 7: ActiveSVDD objective Eq. 8: ActiveSVDD objective with Huber loss
Methodology (cont.) 16 Fig. 2: Comparison of SVDD and ActiveSVDD with unlabeled data (green) and labeled data of the normal class (red) and attacks (blue) Eq. 3: SVDD objective Eq. 7: ActiveSVDD objective
Empirical evaluation 17 Data set HTTP traffic recorded at the Fraunhofer Institute FIRST over 10 days; unmodified connections with an average length of 489 bytes The FIRST data serve as the normal pool The malicious pool contains 27 real attack classes generated with the Metasploit framework, covering 15 buffer overflows, 8 injection attacks, and 4 other attacks including HTTP tunnels and XSS. Every attack is recorded in 2-6 different variants The malicious pool is obfuscated by adding common HTTP headers while the malicious body remains unaltered; the results are saved as the cloaked pool Each connection is mapped to a vector space using 3-grams
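The 3-gram mapping of the last bullet can be sketched as follows: each connection's byte 3-grams form the string set over which the binary embedding of Eq. 1 is evaluated.

```python
def ngrams(payload: bytes, n: int = 3) -> set[bytes]:
    """Extract the set of byte n-grams of a connection payload."""
    return {payload[i:i + n] for i in range(len(payload) - n + 1)}

print(sorted(ngrams(b"GET /")))  # -> [b'ET ', b'GET', b'T /']
```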
Empirical evaluation (cont.) 18 Experiment 1 Comparison for three cases ▪ SVDD vs. ActiveSVDD with random sampling on uncloaked malicious data ▪ SVDD vs. ActiveSVDD with random sampling on cloaked malicious data ▪ SVDD vs. ActiveSVDD with active learning on cloaked malicious data The training set holds 966 examples from the normal pool and 34 attacks from the malicious or cloaked pool The holdout and test sets hold 795 normal connections and 27 attacks each We make sure that the same attack class occurs either in the training set or in the test set, but not in both
Empirical evaluation (cont.) 19 Fig. 3: SVDD vs. ActiveSVDD with random sampling on uncloaked malicious data
Empirical evaluation (cont.) 20 Fig. 3: SVDD vs. ActiveSVDD with random sampling on uncloaked malicious data Fig. 4: SVDD vs. ActiveSVDD with random sampling on cloaked malicious data
Empirical evaluation (cont.) 21 Fig. 4: SVDD vs. ActiveSVDD with random sampling on cloaked malicious data Fig. 5: SVDD vs. ActiveSVDD with active learning on cloaked malicious data
Empirical evaluation (cont.) 22 Fig. 6: Number of attacks found by different active learning strategies
Empirical evaluation (cont.) 23 Experiment 2 Investigate ActiveSVDD in an online learning scenario, i.e., when the normal data pool grows steadily 3750 events from the normal pool, of which 1250 form the test set; the rest is decomposed into five chunks of equal size for training Cloaked attacks are mixed into all samples, and the same attack class occurs either in the training set or in the test set, but not in both For each chunk we adjust the active learning strategy such that only 10 data points need to be labeled
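The online scenario above can be sketched as a loop over chunks: per chunk, a budget of 10 label queries is spent and the detector is refit on all data seen so far. The toy `fit`/`score` below stand in for ActiveSVDD and are not the paper's implementation.

```python
import numpy as np

BUDGET = 10  # label queries per chunk

def fit(X):
    """Toy detector: center = mean, squared radius = 95% quantile."""
    c = X.mean(axis=0)
    R2 = np.quantile(((X - c) ** 2).sum(axis=1), 0.95)
    return c, R2

def score(X, model):
    c, R2 = model
    return ((X - c) ** 2).sum(axis=1) - R2

def run(chunks):
    seen = chunks[0]
    model = fit(seen)
    for chunk in chunks[1:]:
        f = score(chunk, model)
        queries = np.argsort(np.abs(f))[:BUDGET]  # margin-style queries
        # (here the expert would label `queries`; labels steer ActiveSVDD)
        seen = np.vstack([seen, chunk])
        model = fit(seen)  # refit on the grown normal pool
    return model

rng = np.random.default_rng(0)
chunks = [rng.normal(size=(500, 3)) for _ in range(5)]
c, R2 = run(chunks)
print(np.allclose(c, 0, atol=0.2))  # center stays near the true mean
```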
Empirical evaluation (cont.) 24 Fig. 8: ROC curve for all chunks using ActiveSVDD in online application Fig. 7: ActiveSVDD progress over chunks in online application
Empirical evaluation (cont.) 25 Experiment 3 Threshold adaptation of the SVDD with three methods ▪ Original SVDD: no threshold adaptation ▪ Adapt using the average of randomly chosen labeled instances ▪ Adapt using the average of actively queried labeled instances 3750 connections from the normal pool, split into a training set of 2500 connections and a test set of 1250 connections Cloaked attacks are mixed into all samples
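The adaptation step can be sketched as shifting the decision threshold on f(x) toward the average score of the labeled instances, while the learned center stays fixed; this particular averaging rule is an assumption for illustration, not necessarily the paper's exact formula.

```python
import numpy as np

def adapt_threshold(scores_labeled: np.ndarray) -> float:
    """Assumed adaptation rule: new threshold on f(x) = mean score of the
    labeled points, so the boundary moves toward where labels actually lie."""
    return float(scores_labeled.mean())

f_labeled = np.array([-0.4, 0.2, 0.1, -0.1])
t = adapt_threshold(f_labeled)
print(t)  # -> -0.05
```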
Empirical evaluation (cont.) 26 Fig. 9: Comparison of threshold adaptation methods
Conclusion 27 To reduce the labeling effort, we devise an active learning strategy that queries instances that are not only close to the boundary but also likely to be novel attacks To use labeled and unlabeled instances in the training process, we propose ActiveSVDD as a generalization of SVDD Rephrasing the unsupervised problem setting as an active learning task is worth the effort
The End 28