Anomaly Detection Presented by: Anupam Das CS 568MCC Spring 2013

Presentation transcript:

Anomaly Detection
Presented by: Anupam Das
CS 568MCC Spring 2013, Network Security
Department of Computer Science, UIUC, April 21, 2017
Some of the contents were taken from the authors of "Anomaly Detection: A Survey", ACM Computing Surveys, Vol. 41(3), Article 15, July 2009.

Introduction
We are drowning in the deluge of data being collected worldwide, while starving for knowledge at the same time.
Anomalous events occur relatively infrequently.
However, when they do occur, their consequences can be quite dramatic, and quite often negative.
“Mining needle in a haystack. So much hay and so little time”

What are Anomalies?
An anomaly is a pattern in the data that does not conform to the expected behaviour.
Anomalies are also referred to as outliers, exceptions, peculiarities, surprises, etc.
Anomalies translate to significant (often critical) real-life entities.

Real World Anomalies
Credit Card Fraud
Cyber Intrusions
Healthcare Informatics / Medical Diagnostics
Industrial Damage Detection
Image Processing / Video Surveillance
Novel Topic Detection in Text Mining

Simple Example
[Figure: X-Y plot with regions N1 and N2, points o1 and o2, and region O3.]
N1 and N2 are regions of normal behavior.
Points o1 and o2 are anomalies.
Points in region O3 are anomalies.

Key Challenges
Defining a representative normal region is challenging
The boundary between normal and outlying behavior is often not precise
The exact notion of an outlier is different for different application domains
Availability of labeled data for training/validation
Malicious adversaries
Data might contain noise
Normal behavior keeps evolving

Aspects of Anomaly Detection Problem
Nature of input data
Availability of supervision
Type of anomaly: point, contextual, structural
Output of anomaly detection
Evaluation of anomaly detection techniques

Input Data
Input data could be:
  Univariate: single variable
  Multivariate: multiple variables
Nature of attributes:
  Binary
  Categorical
  Continuous
  Hybrid
[Figure: example table with columns labeled continuous, categorical, continuous, categorical, binary.]

Input Data – Complex Data Types
Relationship among data instances:
  Sequential
  Temporal
  Spatial
  Spatio-temporal
  Graph

Data Labels
Supervised Anomaly Detection
  Labels available for both normal data and anomalies
  Similar to rare class mining
Semi-supervised Anomaly Detection
  Labels available only for normal data
Unsupervised Anomaly Detection
  No labels assumed
  Based on the assumption that anomalies are very rare compared to normal data

Type of Anomaly
Point Anomalies
Contextual Anomalies
Collective Anomalies

Point Anomalies
An individual data instance is anomalous if it deviates significantly from the rest of the data set.
[Figure: X-Y plot with regions N1 and N2, points o1 and o2, and region O3; label: Anomaly.]

Contextual Anomalies
An individual data instance is anomalous within a context.
Requires a notion of context.
Also referred to as conditional anomalies*
[Figure labels: Anomaly, Normal]

Collective Anomalies
A collection of related data instances is anomalous.
Requires a relationship among data instances:
  Sequential data
  Spatial data
  Graph data
The individual instances within a collective anomaly are not anomalous by themselves.
[Figure label: Anomalous Subsequence]

Output of Anomaly Detection
Label
  Each test instance is given a normal or anomaly label
  This is especially true of classification-based approaches
Score
  Each test instance is assigned an anomaly score
  Allows the output to be ranked
  Requires an additional threshold parameter
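The two output styles can be related with a small sketch. The scores and the threshold of 0.5 below are made-up values for illustration; the function names are hypothetical and not taken from the slides.

```python
def label_by_threshold(scores, threshold):
    """Turn anomaly scores into labels using a user-chosen threshold."""
    return ["anomaly" if s > threshold else "normal" for s in scores]


def rank_by_score(scores):
    """Return instance indices ordered from most to least anomalous."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)


# Hypothetical anomaly scores for four test instances.
scores = [0.1, 0.9, 0.3, 0.75]
labels = label_by_threshold(scores, threshold=0.5)
ranking = rank_by_score(scores)
```

A score-based detector leaves the choice of threshold to the analyst, who can also simply inspect the top few entries of the ranking.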

Evaluation of Anomaly Detection
Accuracy is not a sufficient metric for evaluation.
  Example: a network traffic data set with 99.9% normal data and 0.1% intrusions
  A trivial classifier that labels everything as the normal class achieves 99.9% accuracy!
[Figure labels: anomaly class – C, normal class – NC, AUC]
Focus on both recall and precision:
  Recall / Detection rate (R) = TP/(TP + FN)
  Precision (P) = TP/(TP + FP)
  False alarm rate (F) = FP/(TN + FP)
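A minimal sketch of why accuracy misleads on skewed data. The data set size of 10,000 connections is an assumption chosen to match the 99.9% / 0.1% split in the example; the function name is hypothetical.

```python
def detection_metrics(tp, fp, tn, fn):
    """Compute accuracy, recall, precision and false alarm rate from confusion counts."""
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total
    # Guard against empty denominators (e.g. a detector that never raises an alarm).
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    false_alarm_rate = fp / (tn + fp) if (tn + fp) else 0.0
    return accuracy, recall, precision, false_alarm_rate


# Assumed data set: 10,000 connections, of which 10 (0.1%) are intrusions.
# The trivial classifier labels every connection as normal.
accuracy, recall, precision, false_alarm_rate = detection_metrics(
    tp=0, fp=0, tn=9990, fn=10
)
```

The trivial classifier scores 99.9% accuracy while its recall is zero, which is why recall and precision are reported instead.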

Taxonomy*
Anomaly Detection
  Point Anomaly Detection
    Classification Based: Rule Based, Neural Networks Based, SVM Based
    Nearest Neighbor Based: Density Based, Distance Based
    Clustering Based
    Statistical: Parametric, Non-parametric
    Others: Information Theory Based, Spectral Decomposition Based, Visualization Based
  Contextual Anomaly Detection
  Collective Anomaly Detection
  Online Anomaly Detection
  Distributed Anomaly Detection
* Outlier Detection – A Survey, Varun Chandola, Arindam Banerjee, and Vipin Kumar, Technical Report TR07-17, University of Minnesota (Under Review)

Classification Based Techniques
Main idea: build a classification model for normal (and anomalous) events based on labelled training data, and use it to classify each new unseen event.
Classification models must be able to handle skewed (imbalanced) class distributions.
Categories:
  Supervised classification techniques
    Require knowledge of both the normal and the anomaly class
    Build a classifier to distinguish between normal events and known anomalies
  Semi-supervised classification techniques
    Require knowledge of the normal class only!
    Use a modified classification model to learn the normal behavior, then flag any deviation from it as anomalous
Pros and Cons

Classification Based Techniques
Some techniques:
  Neural network based approaches
  Support Vector Machine (SVM) based approaches
  Bayesian network based approaches
  Rule based techniques
  Fuzzy Logic
  Genetic Algorithms
  Principal Component Analysis

Using Neural Networks
Multi-layer NNs
Create hyperplanes that separate the various classes
Good: work well with huge data sets and handle noisy data well
Bad: learning takes a long time

Neural Networks
INPUT LAYER: X = {x1, x2, ..., xn}, where n is the number of attributes. There are as many nodes as inputs.
HIDDEN LAYER: the number of nodes in the hidden layer and the number of hidden layers depend on the implementation.
OUTPUT LAYER: corresponds to the class attribute. There are as many nodes as classes.
Back propagation learns by iteratively processing a set of training data (samples).

SVM
A support vector machine constructs a hyperplane, or a set of hyperplanes, in a high- or infinite-dimensional space, which can be used for classification.
[Figure: two classes of points, denoted +1 and -1, in the x1-x2 plane.]
How would you classify these points using a linear discriminant function in order to minimize the error rate?
Infinite number of answers! Which one is the best?

Linear SVM
[Figure: two classes of points, denoted +1 and -1.]
The linear discriminant function (classifier) with the maximum margin is the best.
Support vectors are those data points that the margin pushes up against.
Why is it the best? It is robust to outliers and thus has strong generalization ability.

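Linear SVM
[Figure: x1-x2 plane with classes +1 and -1, the hyperplanes $w^T x + b = 1$, $w^T x + b = 0$ and $w^T x + b = -1$, the normal vector n, the support vectors x+ and x-, and the margin.]
The margin width is:
  $M = \frac{2}{\|w\|}$
Goal: maximize $\frac{2}{\|w\|}$ such that
  $w^T x_i + b \ge 1$ for $y_i = +1$
  $w^T x_i + b \le -1$ for $y_i = -1$
Formulation: minimize $\frac{1}{2}\|w\|^2$ such that
  $y_i (w^T x_i + b) \ge 1$
This is quadratic programming with linear constraints.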
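As a small illustration of the geometry, the sketch below evaluates a candidate hyperplane rather than solving the quadratic program. It assumes the standard hard-margin constraint $y_i (w^T x_i + b) \ge 1$ and margin width $2/\|w\|$; the weights, bias and points are made-up values.

```python
import math


def margin_width(w):
    """Width of the margin between w.x + b = 1 and w.x + b = -1."""
    return 2.0 / math.sqrt(sum(wi * wi for wi in w))


def satisfies_constraints(w, b, points, labels):
    """Check y_i * (w.x_i + b) >= 1 for every training point."""
    return all(
        y * (sum(wi * xi for wi, xi in zip(w, x)) + b) >= 1
        for x, y in zip(points, labels)
    )


# Hypothetical separable data and a hand-picked hyperplane x1 + x2 - 3 = 0.
points = [(0.0, 0.0), (1.0, 1.0), (2.0, 2.0), (3.0, 3.0)]
labels = [-1, -1, +1, +1]
w, b = (1.0, 1.0), -3.0

feasible = satisfies_constraints(w, b, points, labels)
width = margin_width(w)
```

Here (1, 1) and (2, 2) lie exactly on the two margin hyperplanes, so they play the role of support vectors for this particular choice of w and b.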

Taxonomy: Point Anomaly Detection, Nearest Neighbor Based

Nearest Neighbor Based Techniques
Key assumption: normal points have close neighbors, while anomalies are located far from other points.
General two-step approach:
  Compute the neighborhood of each data record
  Analyze the neighborhood to determine whether the data record is an anomaly or not
Categories:
  Distance based methods: anomalies are the data points most distant from other points
  Density based methods: anomalies are data points in low-density regions

Distance Based Anomaly Detection
For each object o, examine the number of other objects in the r-neighborhood of o, where r is a user-specified distance threshold.
An object o is an outlier if most of the objects in D (taking π as a fraction threshold) are far away from o, i.e., not in the r-neighborhood of o.
An object o is a DB(r, π) outlier if
  $\frac{|\{o' \mid dist(o, o') \le r\}|}{|D|} \le \pi$
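A brute-force sketch of the DB(r, π) test. It assumes Euclidean distance and counts only the other objects in the r-neighborhood, dividing by the size of the whole data set; the sample points and parameter values are made up.

```python
import math


def db_outliers(data, r, pi):
    """Return the objects whose r-neighborhood holds at most a fraction pi of the data."""
    n = len(data)
    outliers = []
    for i, o in enumerate(data):
        # Count the other objects that lie within distance r of o.
        in_neighborhood = sum(
            1 for j, other in enumerate(data)
            if j != i and math.dist(o, other) <= r
        )
        if in_neighborhood / n <= pi:
            outliers.append(o)
    return outliers


# Hypothetical 2-D data: a tight group of five points and one distant point.
data = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5), (10, 10)]
found = db_outliers(data, r=2.0, pi=0.1)
```

This nested loop costs O(n^2) distance computations, which is the main practical drawback of the basic distance-based approach.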

Density Based Anomaly Detection
Compute local densities of particular regions and declare instances in low-density regions as potential anomalies.
Approach: Local Outlier Factor (LOF)
[Figure: points p1, p2 and p3, with the distances from p2 and from p3 to their nearest neighbors marked.]
In the NN approach, p2 is not considered an outlier, while the LOF approach finds both p1 and p2 as outliers.
The NN approach may consider p3 an outlier, but the LOF approach does not.

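Local Outlier Factor (LOF)
For each data point o, find the points within its k-distance $dist_k(o)$, the distance from o to its k-th nearest neighbor:
  $N_k(o) = \{o' \mid o' \in D \setminus \{o\},\ dist(o, o') \le dist_k(o)\}$
Compute the reachability distance (reachdist) of each data example o with respect to data example o' as:
  $reachdist_k(o, o') = \max\{dist_k(o'),\ dist(o, o')\}$
Compute the local reachability density (lrd):
  $lrd_k(o) = \frac{|N_k(o)|}{\sum_{o' \in N_k(o)} reachdist_k(o, o')}$
LOF is the average ratio of the local reachability density of o's k-nearest neighbors to the local reachability density of o:
  $LOF_k(o) = \frac{1}{|N_k(o)|} \sum_{o' \in N_k(o)} \frac{lrd_k(o')}{lrd_k(o)}$
The higher the LOF, the more likely the point is an outlier.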
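The LOF steps can be sketched in plain Python as follows. This is a brute-force illustration that assumes Euclidean distance, the standard definitions of k-distance, reachability distance and local reachability density, and data without duplicate points (duplicates would make a density infinite). The sample points and k = 2 are made up.

```python
import math


def lof_scores(points, k):
    """Compute the Local Outlier Factor of every point (brute force, O(n^2))."""
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]

    k_dist, neighbors = [], []
    for i in range(n):
        others = sorted((dist[i][j], j) for j in range(n) if j != i)
        kd = others[k - 1][0]            # distance to the k-th nearest neighbor
        k_dist.append(kd)
        # All points within the k-distance (ties can give more than k neighbors).
        neighbors.append([j for d, j in others if d <= kd])

    def reach_dist(i, j):
        # Reachability distance of point i with respect to point j.
        return max(k_dist[j], dist[i][j])

    # Local reachability density: inverse of the mean reachability distance.
    lrd = [
        len(neighbors[i]) / sum(reach_dist(i, j) for j in neighbors[i])
        for i in range(n)
    ]
    # LOF: average ratio of the neighbors' density to the point's own density.
    return [
        sum(lrd[j] for j in neighbors[i]) / (len(neighbors[i]) * lrd[i])
        for i in range(n)
    ]


# Hypothetical data: a small dense group and one isolated point (index 5).
points = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5), (10, 10)]
scores = lof_scores(points, k=2)
```

Points inside the dense group get scores close to 1, while the isolated point gets a much larger score because its neighbors are far denser than it is.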

Taxonomy: Point Anomaly Detection

Clustering Based Techniques
Key assumption: normal data records belong to large and dense clusters, while anomalies do not belong to any of the clusters or form very small clusters.
Categorization according to labels:
  Semi-supervised: cluster normal data to create modes of normal behavior. If a new instance does not belong to any of the clusters, or is not close to any cluster, it is an anomaly.
  Unsupervised: post-processing is needed after the clustering step to determine the size of the clusters and the distance from the clusters required for a point to be an anomaly.
An object detected as an anomaly by clustering based methods:
  Does not belong to any cluster, or
  Has a large distance to its closest cluster, or
  Belongs to a small or sparse cluster
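The semi-supervised variant can be sketched as a distance check against cluster centers learned from normal data. The centroids and the distance limit below are made-up values standing in for the output of a real clustering step, and the function name is hypothetical.

```python
import math


def is_anomaly(point, centroids, max_dist):
    """Flag a point that is not close to any cluster of normal behavior."""
    nearest = min(math.dist(point, c) for c in centroids)
    return nearest > max_dist


# Hypothetical modes of normal behavior, e.g. produced by clustering normal data.
centroids = [(0.0, 0.0), (10.0, 10.0)]

near_cluster = is_anomaly((0.5, 0.5), centroids, max_dist=2.0)
between_clusters = is_anomaly((5.0, 5.0), centroids, max_dist=2.0)
```

Choosing the distance limit is the post-processing decision the slide refers to; a per-cluster limit based on cluster spread is a common refinement.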

Cluster Based Local Outlier Factor
FindCBLOF: detect outliers in small clusters
  Find clusters, and sort them in decreasing size
  To each data point, assign a cluster-based local outlier factor (CBLOF):
    If object p belongs to a large cluster, CBLOF = cluster size × similarity between p and the cluster
    If p belongs to a small cluster, CBLOF = cluster size × similarity between p and the closest large cluster
Example: in the figure, o is an outlier since its closest large cluster is C1, but the similarity between o and C1 is small. For any point in C3, its closest large cluster is C2 but its similarity to C2 is low, and |C3| = 3 is small.

Taxonomy: Point Anomaly Detection

Statistics Based Techniques
Statistical approaches assume that the objects in a data set are generated by a stochastic process (a generative model).
Idea: learn a generative model fitting the given data set, and then identify the objects in low-probability regions of the model as outliers.
Advantage:
  Utilize existing statistical modelling techniques to model various types of distributions
Challenges:
  With high dimensions, it is difficult to estimate distributions
  Parametric assumptions often do not hold for real data sets

Types of Statistical Techniques
Parametric Techniques
  Assume that the normal data is generated from an underlying parametric distribution
  Learn the parameters from the normal sample
  Determine the likelihood that a test instance was generated from this distribution in order to detect anomalies
Non-parametric Techniques
  Do not assume any knowledge of parameters
  Not completely parameter free, but the number and nature of the parameters are flexible and not fixed in advance
  Examples: histogram and kernel density estimation

Parametric Techniques
Univariate data: a data set involving only one attribute or variable
Often assume that the data are generated from a normal distribution, learn the parameters from the input data, and identify the points with low probability as outliers.
Example: average temperatures: {24.0, 28.9, 29.0, 29.1, 29.1, 29.2, 29.2, 29.3, 29.4}
  Compute μ and σ from the samples
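The temperature example can be worked through as a sketch. It assumes maximum likelihood estimates (population standard deviation) for μ and σ, and the cutoff of 2.5 standard deviations is an arbitrary illustrative choice, not a value from the slides.

```python
import statistics

# Average temperatures exactly as listed in the example.
temps = [24.0, 28.9, 29.0, 29.1, 29.1, 29.2, 29.2, 29.3, 29.4]

# Maximum likelihood estimates of the normal distribution parameters.
mu = statistics.fmean(temps)
sigma = statistics.pstdev(temps)


def z_scores(values, mu, sigma):
    """Signed number of standard deviations each value lies from the mean."""
    return [(x - mu) / sigma for x in values]


CUTOFF = 2.5  # assumed cutoff, chosen only for illustration
flagged = [x for x, z in zip(temps, z_scores(temps, mu, sigma)) if abs(z) > CUTOFF]
```

Note that the outlier itself inflates the estimate of σ, so with so few samples the suspicious value can fall short of a stricter cutoff such as 3 standard deviations.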

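Parametric Techniques
Univariate outlier detection: Grubbs' test (another statistical method under the normal distribution assumption)
For each object x in a data set, compute its z-score:
  $z = \frac{|x - \bar{x}|}{s}$, where $\bar{x}$ is the mean and s is the standard deviation of the input data
x is an outlier if
  $z \ge \frac{N - 1}{\sqrt{N}} \sqrt{\frac{t^2_{\alpha/(2N),\,N-2}}{N - 2 + t^2_{\alpha/(2N),\,N-2}}}$
where $t_{\alpha/(2N),\,N-2}$ is the value taken by a t-distribution with N - 2 degrees of freedom at a significance level of α/(2N), and N is the number of objects in the data set.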

Non-Parametric Techniques
The model of normal data is learned from the input data without any a priori structure.
Outlier detection using a histogram:
  [Figure: histogram of purchase amounts in transactions.]
  A transaction in the amount of $7,500 is an outlier, since only 0.2% of transactions have an amount higher than $5,000.
Problem: it is hard to choose an appropriate bin size for the histogram.
Solution: adopt kernel density estimation to estimate the probability density distribution of the data.
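A minimal sketch of kernel density estimation for this setting, assuming a Gaussian kernel. The purchase amounts and the bandwidth of 200 are made-up values; in practice the bandwidth has to be tuned, much like the bin size of a histogram.

```python
import math


def gaussian_kde(samples, bandwidth):
    """Return a function estimating the probability density at a point."""
    norm = 1.0 / (len(samples) * bandwidth * math.sqrt(2.0 * math.pi))

    def density(x):
        # Sum of Gaussian bumps centered on each observed sample.
        return norm * sum(
            math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples
        )

    return density


# Hypothetical purchase amounts in dollars.
amounts = [120, 250, 300, 410, 480, 520, 600, 750, 900, 1100]
density = gaussian_kde(amounts, bandwidth=200.0)

typical = density(500)     # inside the bulk of the data
unusual = density(7500)    # far from every observed amount
```

A transaction is then treated as an outlier when its estimated density falls below a chosen threshold.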

Anomaly Detection on Real Network Data
Anomaly detection was used at the U of Minnesota and the Army Research Lab to detect various intrusive/suspicious activities.
Many of these could not be detected using widely used intrusion detection tools like SNORT.
Anomalies/attacks picked by MINDS:
  Scanning activities
  Non-standard behavior
  Policy violations
  Worms
MINDS – Minnesota Intrusion Detection System
[Figure: MINDS architecture. Labeled components: network, data capturing device, net flow tools, tcpdump, filtering, feature extraction, known attack detection, detected known attacks, anomaly detection, anomaly scores, association pattern analysis, summary and characterization of attacks, detected novel attacks, human analyst, labels, MINDSAT.]

Feature Extraction
Three groups of features
Basic features of individual TCP connections
  Source & destination IP: Features 1 & 2
  Source & destination port: Features 3 & 4
  Protocol: Feature 5
  Duration: Feature 6
  Bytes per packet: Feature 7
  Number of bytes: Feature 8
Time based features
  For the same source (destination) IP address, number of unique destination (source) IP addresses inside the network in the last T seconds: Features 9 (13)
  Number of connections from the source (destination) IP to the same destination (source) port in the last T seconds: Features 11 (15)
Connection based features
  For the same source (destination) IP address, number of unique destination (source) IP addresses inside the network in the last N connections: Features 10 (14)
  Number of connections from the source (destination) IP to the same destination (source) port in the last N connections: Features 12 (16)
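The two source-side time based features (Features 9 and 11) can be sketched as below. The `Flow` record, its field names and the sample traffic are hypothetical; the sketch counts the current connection itself and ignores the "inside the network" restriction, and it is not the MINDS implementation.

```python
from collections import namedtuple

# Hypothetical flow record; field names are assumptions for illustration.
Flow = namedtuple("Flow", "time src_ip dst_ip dst_port")


def time_window_features(flows, T):
    """For each flow (sorted by time), look back T seconds at the same source IP.

    Returns (unique destination IPs, connections to the same destination port).
    """
    features = []
    for i, cur in enumerate(flows):
        dst_ips = set()
        same_port = 0
        for prev in flows[: i + 1]:
            if prev.src_ip == cur.src_ip and cur.time - prev.time <= T:
                dst_ips.add(prev.dst_ip)
                if prev.dst_port == cur.dst_port:
                    same_port += 1
        features.append((len(dst_ips), same_port))
    return features


# Made-up traffic: one ordinary client and one host probing port 445.
flows = [
    Flow(0.0, "192.0.2.1", "198.51.100.10", 80),
    Flow(1.0, "192.0.2.9", "198.51.100.1", 445),
    Flow(1.5, "192.0.2.9", "198.51.100.2", 445),
    Flow(2.0, "192.0.2.9", "198.51.100.3", 445),
    Flow(2.5, "192.0.2.9", "198.51.100.4", 445),
    Flow(20.0, "192.0.2.1", "198.51.100.10", 80),
]
features = time_window_features(flows, T=5.0)
```

A scanning host shows up as a steadily growing count of unique destinations, which is the kind of signal these features hand to the anomaly detector. The connection based features follow the same pattern with a window of the last N connections instead of T seconds.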

Typical Anomaly Detection Output
48 hours after the “slammer” worm

Conclusions
Anomaly detection can detect critical information in data.
It is highly applicable in various application domains.
The nature of the anomaly detection problem depends on the application domain.
Different approaches are needed to solve a particular problem formulation.

Related problems
Rare Class Mining
Chance Discovery
Novelty Detection
Exception Mining
Noise Removal
Black Swan*