
A Geometric Framework for Unsupervised Anomaly Detection: Detecting Intrusions in Unlabeled Data
Authors: Eleazar Eskin, Andrew Arnold, Michael Prerau, Leonid Portnoy, Sal Stolfo
Presenter: Marbin Pazos-Revilla, Cognitive Radio Group, TTU

Motivation
– Machine learning algorithms: clustering, k-NN, SVM
– Dataset: KDD Cup intrusion detection data
– Among the best ROC curves and overall IDS performance

Contributions
– The authors propose three improved methods, based on clustering, k-NN, and SVM, for unsupervised intrusion detection
– The methods are shown to have very good performance (ROC curves)

Introduction
– Commercially available intrusion detection methods employ signature-based detection
– The signature database has to be manually revised for each newly discovered signature; until an update is applied, systems are left vulnerable to new attacks

IDS Types
Misuse detection
– Each instance in a set of data is labeled as normal or intrusion, and a machine learning algorithm is trained over the labeled data
– Produces classification rules
– Manual updates are needed
Anomaly detection
– A set of normal data is given
– New data is tested, and the system must decide whether it is normal or anomalous
– Can detect new types of attacks

Supervised Anomaly Detection
– Supervised anomaly detection requires a set of purely normal data from which the model is trained
– If intrusions are present in the "normal" data, those intrusions will not be detected
– In practice it is hard to obtain labeled or purely normal data
– If labeled data were obtained by simulating intrusions, we would be limited to the set of known attacks in the simulation

Unsupervised Anomaly Detection
– Goal is to differentiate normal elements from anomalous elements buried in the data
– Does not require a purely normal training set
– No need for labeled data
– Raw data is much easier to obtain

Geometric Framework
– Maps data to a d-dimensional feature space
– Intrusions are better captured in this feature space
– Can represent and map different types of data: data-dependent normalization feature map, spectrum kernel feature map
– Points can be classified as outliers (anomalies) based on their position in this space
– In general, anomalies tend to be distant from other points, i.e., they lie in sparse regions
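As an illustration, here is a minimal sketch of one plausible data-dependent normalization feature map: each numerical attribute is standardized by its mean and standard deviation computed over the data itself, so that attributes with different scales become comparable. The exact map used in the paper may differ; the function name and example values below are assumptions for illustration.

```python
import numpy as np

def data_dependent_normalization(X):
    """Map each record to a feature space by standardizing every numerical
    attribute with statistics computed from the data itself.

    X: array of shape (n_records, d). Feature i of each record becomes
    (x_i - mean_i) / std_i, where mean_i and std_i are taken over all records.
    """
    mean = X.mean(axis=0)
    std = X.std(axis=0)
    std[std == 0.0] = 1.0  # avoid division by zero for constant features
    return (X - mean) / std

# Example: rare, qualitatively different records end up far from the bulk.
rng = np.random.default_rng(0)
normal = rng.normal(0.0, 1.0, size=(1000, 5))
attacks = rng.normal(8.0, 1.0, size=(5, 5))
phi = data_dependent_normalization(np.vstack([normal, attacks]))
print(phi.shape)
```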

Datasets and Algorithms
Datasets
– KDD Cup 99 data (IDS dataset)
– Lincoln Labs DARPA intrusion detection evaluation
Algorithms
– Clustering
– k-NN
– SVM

Unsupervised Anomaly Detection
– Intrusions are buried in the data
– Can help in forensic analysis
– Assumptions:
  – Most of the elements (a significant majority) are normal
  – Anomalies are qualitatively different from normal instances
– Under these assumptions, anomalies are rare and different from normal elements, and therefore show up as outliers

Geometric Framework for Unsupervised Anomaly Detection
– Records from the audit stream are mapped to a feature space via a map $\phi$
– The distance between two elements in the feature space then becomes
  $d(x_1, x_2) = \|\phi(x_1) - \phi(x_2)\|$
  or, expanded in terms of dot products,
  $d(x_1, x_2) = \sqrt{\langle\phi(x_1),\phi(x_1)\rangle - 2\langle\phi(x_1),\phi(x_2)\rangle + \langle\phi(x_2),\phi(x_2)\rangle}$

– In many cases it is difficult to map data instances to a feature space and calculate distances explicitly:
  – High dimensionality of the feature space (memory considerations)
  – An explicit map might be difficult to determine
– We can instead define a kernel function that computes these dot products in the (Hilbert) feature space
– Distances can then be obtained through kernel evaluations alone

Radial Basis Kernel Function
– $K(x_1, x_2) = e^{-\|x_1 - x_2\|^2 / \sigma^2}$
– Defined over input spaces which are vector spaces
– Using convolution kernels, arbitrary input spaces can be used
– The authors suggest convolution kernels to avoid converting audit data into a vector in $\mathbb{R}^d$
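To make the kernel-distance idea concrete, here is a minimal sketch (not code from the paper) that computes feature-space distances using only RBF kernel evaluations; the kernel width sigma is an illustrative assumption.

```python
import numpy as np

def rbf_kernel(x1, x2, sigma=1.0):
    """Radial basis kernel K(x1, x2) = exp(-||x1 - x2||^2 / sigma^2)."""
    diff = x1 - x2
    return np.exp(-np.dot(diff, diff) / sigma**2)

def kernel_distance(x1, x2, kernel=rbf_kernel):
    """Feature-space distance computed purely from dot products (kernel trick):
    d(x1, x2) = sqrt(K(x1, x1) - 2*K(x1, x2) + K(x2, x2))."""
    return np.sqrt(kernel(x1, x1) - 2.0 * kernel(x1, x2) + kernel(x2, x2))

# For the RBF kernel K(x, x) = 1, so the distance simplifies to sqrt(2 - 2*K(x1, x2)).
x1 = np.array([0.0, 1.0, 2.0])
x2 = np.array([0.5, 1.0, 1.5])
print(kernel_distance(x1, x2))
```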

Detecting Outliers
– Detect points that are distant from other points, or that lie in relatively sparse regions of the feature space

Cluster-based Estimation
– Count the number of points within a sphere of radius w around each point
– Sort the clusters by size
– Points in the small clusters are labeled anomalous

Cluster-based Estimation
– Two points $x_1, x_2$ are considered near if their distance is less than or equal to the fixed width: $d(x_1, x_2) \le w$
– Define $N(x)$ to be the number of points that are within w of point x
– Since we have to compute the pairwise distances among all points, computing $N(x)$ for every point has complexity $O(n^2)$
– We are interested only in the outliers

To reduce computation, an approximation can be made via fixed-width clustering (see the sketch after this list):
– The first point becomes the center of the first cluster
– Each subsequent point that is within w of an existing cluster center is added to that cluster; otherwise it becomes the center of a new cluster
– Points may be added to several clusters
– The complexity is $O(cn)$, with c the number of clusters and n the number of data points
– A threshold on cluster size is used to find outliers
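The following is a minimal sketch of fixed-width clustering as described above (a single pass that adds each point to every cluster whose center is within w). The width w and the size threshold are illustrative assumptions, not values from the paper, and labeling a point by the size of its nearest cluster is one plausible reading of the slide.

```python
import numpy as np

def fixed_width_clustering(X, w):
    """Single-pass fixed-width clustering.

    Returns a list of (center, size) pairs. Each point is added to every
    cluster whose center is within w; if none exists, it starts a new cluster.
    """
    centers, sizes = [], []
    for x in X:
        assigned = False
        for i, c in enumerate(centers):
            if np.linalg.norm(x - c) <= w:
                sizes[i] += 1
                assigned = True
        if not assigned:
            centers.append(x.copy())
            sizes.append(1)
    return list(zip(centers, sizes))

def label_outliers(X, w, size_threshold):
    """Label a point anomalous if its nearest cluster is a small one."""
    clusters = fixed_width_clustering(X, w)
    labels = []
    for x in X:
        nearest = min(clusters, key=lambda cs: np.linalg.norm(x - cs[0]))
        labels.append(nearest[1] < size_threshold)  # True = anomalous
    return np.array(labels)
```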

K-Nearest Neighbor
– Find points that lie in a sparse region of the feature space by computing the distances to the k nearest neighbors of each point (the k-NN score); a brute-force sketch follows
– Points in dense regions have many points near them and therefore a small k-NN score
– If k exceeds the frequency of any given attack, and the images of the attack elements are far from the images of the normal elements, then the k-NN score can be used to detect attacks
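For clarity, the k-NN score itself can be computed with a brute-force pass before any pruning. This is a hedged sketch: taking the score as the sum of the k smallest distances is one common reading of the slide, not necessarily the paper's exact definition.

```python
import numpy as np

def knn_score_bruteforce(X, x, k):
    """k-NN score of x with respect to data set X: the sum of distances to
    its k nearest neighbors (large score = sparse region = likely anomaly).
    If x itself is a row of X, its zero self-distance is excluded."""
    dists = np.sort(np.linalg.norm(X - x, axis=1))
    if dists.size and dists[0] == 0.0:
        dists = dists[1:]
    return float(sum(dists[:k]))
```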

– k-NN is computationally expensive
– Since we are interested only in the k nearest points to a given point, we can reduce the computational cost with canopy clustering
– Canopy clustering reduces the search space into smaller subsets, avoiding the need to check every data point

Modified Canopy Clustering
– Cluster the data with the fixed-width approach, with the variation that each element is placed in only one cluster
– For any two points $x_1, x_2$ in the same cluster: $d(x_1, x_2) \le 2w$
– And in all cases, if c is the center of the cluster containing $x_2$: $d(x_1, c) - w \le d(x_1, x_2) \le d(x_1, c) + w$

– Let C be the set of clusters, initially containing all clusters in the data
– At any step, we have a set of points that are potentially among the k nearest neighbors; this set is denoted P
– We also have a set of points that are confirmed to be among the k nearest; this set is denoted K
– Initially K and P are empty

– Pre-compute the distance from x to each cluster center
– For the cluster with center closest to x, remove it from C and add all of its points to P ("opening" the cluster)
– We can use the lower bound on the distance from x to any point still inside a cluster in C: $d_{min} = \min_{c \in C}\big(d(x, c) - w\big)$
– For each point $x_i$ in P we compare $d(x, x_i)$ with $d_{min}$
– If $d(x, x_i) < d_{min}$, we can guarantee that $x_i$ is closer to x than all points in the clusters remaining in C

– In this case we remove $x_i$ from P and add it to K
– If $d(x, x_i) > d_{min}$, we open the closest remaining cluster, add all of its points to P, and remove that cluster from C
– Every time we remove a cluster from C, $d_{min}$ increases
– Once K has k elements, we terminate

– Computation is spent checking distances between points in the data set and the cluster centers, which is more efficient than computing pairwise distances among all points
– The choice of w affects only the efficiency, not the k-NN score
– Intuitively, we want a w that splits the data into reasonably sized clusters
A sketch of the whole pruned k-NN score computation appears below.
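Below is a minimal, hedged sketch of the cluster-pruned k-NN score described on the last few slides. It reuses a one-cluster-per-point fixed-width clustering and the $d(x, c) - w$ lower bound; the function names and details such as the final fallback are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def canopy_clusters(X, w):
    """Fixed-width clustering where each point joins only its first matching
    cluster. Returns a list of (center, member_indices) pairs."""
    clusters = []
    for i, x in enumerate(X):
        for center, members in clusters:
            if np.linalg.norm(x - center) <= w:
                members.append(i)
                break
        else:
            clusters.append((x.copy(), [i]))
    return clusters

def knn_score_pruned(X, x, k, w):
    """Sum of distances from x to its k nearest neighbors, opening clusters
    in order of center distance and using d(x, center) - w as a lower bound
    so that most clusters never need to be opened.
    (If x is a row of X, its zero self-distance will be counted.)"""
    clusters = canopy_clusters(X, w)
    order = sorted(range(len(clusters)),
                   key=lambda j: np.linalg.norm(x - clusters[j][0]))
    K_dists = []  # confirmed k-nearest distances
    P = []        # candidate distances from opened clusters
    for pos, j in enumerate(order):
        center, members = clusters[j]
        P.extend(np.linalg.norm(x - X[m]) for m in members)  # open the cluster
        P.sort()
        # lower bound on the distance to any point in a still-closed cluster
        if pos + 1 < len(order):
            d_min = np.linalg.norm(x - clusters[order[pos + 1]][0]) - w
        else:
            d_min = np.inf
        # candidates provably among the k nearest move from P into K
        while P and len(K_dists) < k and P[0] < d_min:
            K_dists.append(P.pop(0))
        if len(K_dists) == k:
            break
    while P and len(K_dists) < k:  # fallback if all clusters were opened
        K_dists.append(P.pop(0))
    return float(sum(K_dists))
```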

One-Class SVM
– Map the feature space into a second feature space with a radial basis kernel
– The standard SVM is a supervised learning algorithm: it requires labeled data

– A modified SVM is adapted into an unsupervised learning algorithm
– It attempts to separate the entire set of data from the origin with maximal margin
– The two classes are labeled +1 and -1

– The hyperplane is specified by its normal vector w in the feature space and its offset $\rho$ from the origin
– Decision function: $f(x) = \operatorname{sgn}\big(\langle w, \phi(x)\rangle - \rho\big)$
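For reference, the standard one-class SVM formulation (Schölkopf et al.), from which this decision function comes, can be written as follows; the slide itself does not show the optimization problem, so this is supplied here as background:

$$
\begin{aligned}
\min_{w,\;\xi,\;\rho}\quad & \tfrac{1}{2}\|w\|^2 \;+\; \frac{1}{\nu \ell}\sum_{i=1}^{\ell}\xi_i \;-\; \rho\\
\text{subject to}\quad & \langle w, \phi(x_i)\rangle \;\ge\; \rho - \xi_i,\qquad \xi_i \ge 0,\quad i = 1,\dots,\ell,
\end{aligned}
$$

where $\nu$ controls the fraction of points allowed to fall on the anomalous side of the hyperplane.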

– The optimization is solved with a variant of Sequential Minimal Optimization (SMO)
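As an illustration only (scikit-learn's OneClassSVM rather than the authors' SMO-variant implementation), an unsupervised one-class SVM with an RBF kernel can be run as below; nu and gamma are assumed example values, not parameters from the paper.

```python
import numpy as np
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(0)
normal = rng.normal(0.0, 1.0, size=(1000, 5))   # bulk of the (unlabeled) data
attacks = rng.normal(6.0, 1.0, size=(10, 5))    # rare, qualitatively different points
X = np.vstack([normal, attacks])

# nu bounds the fraction of points treated as outliers; gamma is the RBF width parameter.
model = OneClassSVM(kernel="rbf", nu=0.01, gamma=0.1).fit(X)
pred = model.predict(X)                          # +1 = normal side, -1 = anomalous side
print("flagged anomalies:", int((pred == -1).sum()))
```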

Feature Spaces and Data Sets
– Network records with 41 features and approximately 4,900,000 instances (KDD Cup 1999 data)
– System call traces (per process) covering 5 weeks, from the Basic Security Module (BSM) of the MIT Lincoln Labs IDS evaluation

Experimental Results

ROC Curves

Questions