Jeff Howbert, Introduction to Machine Learning, Winter 2014


Anomaly Detection

Some slides taken or adapted from: "Anomaly Detection: A Tutorial" by Arindam Banerjee, Varun Chandola, Vipin Kumar, and Jaideep Srivastava (University of Minnesota), and Aleksandar Lazarevic (United Technologies Research Center).

Anomaly detection

Anomalies and outliers are essentially the same thing: objects that are different from most other objects. The techniques used to detect them are the same.

Anomaly detection

- Historically, the field of statistics tried to find and remove outliers as a way to improve analyses.
- There are now many fields where the outliers / anomalies are the objects of greatest interest.
  - The rare events may be the ones with the greatest impact, and often in a negative way.

Causes of anomalies

- Data from a different class of object or underlying mechanism
  - disease vs. non-disease
  - fraud vs. not fraud
- Natural variation
  - tails of a Gaussian distribution
- Data measurement and collection errors

Structure of anomalies

- Point anomalies
- Contextual anomalies
- Collective anomalies

Point anomalies

- An individual data instance is anomalous with respect to the data.

Contextual anomalies

- An individual data instance is anomalous within a context
- Requires a notion of context
- Also referred to as conditional anomalies*

* Song et al., "Conditional Anomaly Detection", IEEE Transactions on Knowledge and Data Engineering.

[figure: time series with points labeled normal and anomaly in context]

Collective anomalies

- A collection of related data instances is anomalous
- Requires a relationship among data instances
  - Sequential data
  - Spatial data
  - Graph data
- The individual instances within a collective anomaly are not anomalous by themselves

[figure: anomalous subsequence within a sequence]

Applications of anomaly detection

- Network intrusion
- Insurance / credit card fraud
- Healthcare informatics / medical diagnostics
- Industrial damage detection
- Image processing / video surveillance
- Novel topic detection in text mining
- …

Intrusion detection

- Intrusion detection
  - Monitor events occurring in a computer system or network and analyze them for intrusions
  - Intrusions are defined as attempts to bypass the security mechanisms of a computer or network
- Challenges
  - Traditional intrusion detection systems are based on signatures of known attacks and cannot detect emerging cyber threats
  - Substantial latency in deployment of newly created signatures across the computer system
- Anomaly detection can alleviate these limitations

Fraud detection

- Detection of criminal activities occurring in commercial organizations
- Malicious users might be:
  - Employees
  - Actual customers
  - Someone posing as a customer (identity theft)
- Types of fraud
  - Credit card fraud
  - Insurance claim fraud
  - Mobile / cell phone fraud
  - Insider trading
- Challenges
  - Fast and accurate real-time detection
  - Misclassification cost is very high

Healthcare informatics

- Detect anomalous patient records
  - May indicate disease outbreaks, instrumentation errors, etc.
- Key challenges
  - Only normal labels available
  - Misclassification cost is very high
  - Data can be complex: spatio-temporal

Industrial damage detection

- Detect faults and failures in complex industrial systems, structural damage, intrusions in electronic security systems, suspicious events in video surveillance, abnormal energy consumption, etc.
  - Example: aircraft safety
    - anomalous aircraft (engine) / fleet usage
    - anomalies in engine combustion data
    - total aircraft health and usage management
- Key challenges
  - Data is extremely large, noisy, and unlabeled
  - Most applications exhibit temporal behavior
  - Detected anomalous events typically require immediate intervention

Image processing

- Detecting outliers in an image monitored over time
- Detecting anomalous regions within an image
- Used in
  - mammography image analysis
  - video surveillance
  - satellite image analysis
- Key challenges
  - Detecting collective anomalies
  - Data sets are very large

Use of data labels in anomaly detection

- Supervised anomaly detection
  - Labels available for both normal data and anomalies
  - Similar to classification with high class imbalance
- Semi-supervised anomaly detection
  - Labels available only for normal data
- Unsupervised anomaly detection
  - No labels assumed
  - Based on the assumption that anomalies are very rare compared to normal data

Output of anomaly detection

- Label
  - Each test instance is given a normal or anomaly label
  - Typical output of classification-based approaches
- Score
  - Each test instance is assigned an anomaly score
    - allows outputs to be ranked
    - requires an additional threshold parameter

Variants of the anomaly detection problem

- Given a dataset D, find all the data points x ∈ D with anomaly scores greater than some threshold t.
- Given a dataset D, find all the data points x ∈ D having the top-n largest anomaly scores.
- Given a dataset D containing mostly normal data points, and a test point x, compute the anomaly score of x with respect to D.
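The three problem variants can be sketched with any scoring function; the `score` used here (distance from the sample mean) is a hypothetical stand-in for the methods covered in later slides:

```python
# Sketch of the three problem variants, assuming a generic anomaly score.
# score(x, D) here is a placeholder: distance of x from the mean of D.

def score(x, D):
    mean = sum(D) / len(D)
    return abs(x - mean)

D = [1.0, 1.1, 0.9, 1.2, 0.8, 5.0]   # 5.0 is the planted anomaly

# Variant 1: all points with score greater than a threshold t
t = 2.0
over_threshold = [x for x in D if score(x, D) > t]

# Variant 2: the points with the top-n largest anomaly scores
n = 2
top_n = sorted(D, key=lambda x: score(x, D), reverse=True)[:n]

# Variant 3: anomaly score of a single test point with respect to D
x_test = 4.5
s_test = score(x_test, D)
```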

Unsupervised anomaly detection

- No labels available
- Based on the assumption that anomalies are very rare compared to "normal" data
- General steps
  - Build a profile of "normal" behavior
    - summary statistics for the overall population
    - model of the multivariate data distribution
  - Use the "normal" profile to detect anomalies
    - anomalies are observations whose characteristics differ significantly from the normal profile

Techniques for anomaly detection

- Statistical
- Proximity-based
- Density-based
- Clustering-based

[The following slides illustrate these techniques for unsupervised detection of point anomalies.]

Statistical outlier detection

Outliers are objects that are fit poorly by a statistical model.

- Estimate a parametric model describing the distribution of the data
- Apply a statistical test that depends on
  - Properties of the test instance
  - Parameters of the model (e.g., mean, variance)
  - Confidence limit (related to the number of expected outliers)

Statistical outlier detection

- Univariate Gaussian distribution
  - Outlier defined by |z-score| > threshold, where z = (x − μ) / σ
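A minimal sketch of the univariate Gaussian approach, using only the standard library; the threshold of 3 is a common convention, not something fixed by the slides:

```python
import math

def z_score_outliers(data, threshold=3.0):
    """Flag points whose |z-score| exceeds the threshold."""
    n = len(data)
    mean = sum(data) / n
    std = math.sqrt(sum((x - mean) ** 2 for x in data) / n)
    return [x for x in data if abs((x - mean) / std) > threshold]

data = [10.0] * 30 + [10.5, 9.5, 25.0]   # 25.0 is the planted outlier
outliers = z_score_outliers(data)
```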

Statistical anomaly detection

- Multivariate Gaussian distribution
  - Outlier defined by Mahalanobis distance > threshold:
    distance(x) = sqrt( (x − μ)ᵀ Σ⁻¹ (x − μ) )
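For two dimensions the Mahalanobis distance can be computed by hand, since the 2x2 covariance matrix inverts in closed form. This is an illustrative sketch (the data and function names are assumptions, not from the slides):

```python
import math

def mahalanobis_2d(x, data):
    """Mahalanobis distance of a 2-D point x from the sample in data."""
    n = len(data)
    mx = sum(p[0] for p in data) / n
    my = sum(p[1] for p in data) / n
    # sample covariance matrix [[sxx, sxy], [sxy, syy]]
    sxx = sum((p[0] - mx) ** 2 for p in data) / (n - 1)
    syy = sum((p[1] - my) ** 2 for p in data) / (n - 1)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in data) / (n - 1)
    det = sxx * syy - sxy * sxy
    # closed-form inverse of the 2x2 covariance
    ixx, iyy, ixy = syy / det, sxx / det, -sxy / det
    dx, dy = x[0] - mx, x[1] - my
    return math.sqrt(dx * dx * ixx + 2 * dx * dy * ixy + dy * dy * iyy)

data = [(0.0, 0.1), (0.2, -0.1), (-0.1, 0.0), (0.1, 0.2), (-0.2, -0.2)]
d_near = mahalanobis_2d((0.0, 0.0), data)   # near the center: small distance
d_far = mahalanobis_2d((3.0, 3.0), data)    # far from the cloud: large distance
```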

Grubbs' test

- Detects outliers in univariate data
- Assumes data come from a normal distribution
- Detects one outlier at a time; remove the outlier, and repeat
  - H0: There is no outlier in the data
  - HA: There is at least one outlier
- Grubbs' test statistic: G = max_i |x_i − x̄| / s
- Reject H0 if:
  G > ((N − 1) / sqrt(N)) · sqrt( t² / (N − 2 + t²) ),
  where t = t_(α/(2N), N−2) is the critical value of the t-distribution with N − 2 degrees of freedom at significance level α/(2N).
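One round of the test statistic can be sketched as follows; computing the t-distribution critical value would need a stats library, so this sketch (an assumption, the slides prescribe no implementation) only identifies the most suspect point and its G value:

```python
import math

def grubbs_statistic(data):
    """G = max |x_i - mean| / s, with s the sample standard deviation."""
    n = len(data)
    mean = sum(data) / n
    s = math.sqrt(sum((x - mean) ** 2 for x in data) / (n - 1))
    suspect = max(data, key=lambda x: abs(x - mean))
    return abs(suspect - mean) / s, suspect

data = [9.8, 10.1, 10.0, 9.9, 10.2, 14.0]
G, suspect = grubbs_statistic(data)
# In the full test, G is compared to the tabulated critical value for the
# chosen alpha and n; if G exceeds it, remove the suspect and repeat.
```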

Likelihood approach

- Assume the dataset D contains samples from a mixture of two probability distributions:
  - M (majority distribution)
  - A (anomalous distribution)
- General approach:
  - Initially, assume all the data points belong to M
  - Let L_t(D) be the log likelihood of D at time t
  - For each point x_t that belongs to M, move it to A
    - Let L_{t+1}(D) be the new log likelihood
    - Compute the difference Δ = L_t(D) − L_{t+1}(D)
    - If Δ > c (some threshold), then x_t is declared an anomaly and moved permanently from M to A

Likelihood approach

- Data distribution: D = (1 − λ)·M + λ·A
- M is a probability distribution estimated from the data
  - Can be based on any modeling method (naive Bayes, maximum entropy, etc.)
- A is initially assumed to be a uniform distribution
- Likelihood at time t:
  L_t(D) = (1 − λ)^|M_t| · ∏_{x_i ∈ M_t} P_M(x_i) · λ^|A_t| · ∏_{x_j ∈ A_t} P_A(x_j)
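A small sketch of the scheme, assuming M is a single Gaussian fit to the data and A is uniform over the observed range (both modeling choices are assumptions; the method allows any estimator for M). A point is flagged when moving it from M to A raises the log likelihood by more than c (sign conventions vary across statements of the method):

```python
import math

def log_gauss(x, mean, std):
    return -0.5 * math.log(2 * math.pi * std * std) \
           - (x - mean) ** 2 / (2 * std * std)

def anomalies_by_likelihood(data, lam=0.05, c=3.0):
    n = len(data)
    mean = sum(data) / n
    std = math.sqrt(sum((x - mean) ** 2 for x in data) / n)
    log_uniform = -math.log(max(data) - min(data))  # density of uniform A
    flagged = []
    for x in data:
        # log-likelihood contribution of x under M vs. under A
        delta = (math.log(1 - lam) + log_gauss(x, mean, std)) \
                - (math.log(lam) + log_uniform)
        if -delta > c:  # moving x to A raises the log likelihood by > c
            flagged.append(x)
    return flagged

data = [5.0, 5.1, 4.9, 5.2, 4.8] * 4 + [50.0]
flagged = anomalies_by_likelihood(data)
```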

Statistical outlier detection

- Pros
  - Statistical tests are well understood and well validated.
  - Quantitative measure of the degree to which an object is an outlier.
- Cons
  - Data may be hard to model parametrically.
    - multiple modes
    - variable density
  - In high dimensions, data may be insufficient to estimate the true distribution.

Proximity-based outlier detection

Outliers are objects far away from other objects.

- Common approach:
  - Outlier score is the distance to the k-th nearest neighbor.
  - Score is sensitive to the choice of k.
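The k-th nearest neighbor score can be sketched with brute-force distances (which also shows where the O(n²) cost noted later comes from); the data here is an assumed toy example:

```python
# Sketch of the k-th nearest neighbor outlier score for 2-D points.

def knn_score(p, points, k=2):
    """Distance from p to its k-th nearest neighbor among the other points."""
    dists = sorted(((p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2) ** 0.5
                   for q in points if q is not p)
    return dists[k - 1]

points = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10)]   # (10, 10) is isolated
scores = {p: knn_score(p, points) for p in points}
top_outlier = max(points, key=lambda p: scores[p])
```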

Jeff Howbert Introduction to Machine Learning Winter Proximity-based outlier detection

Jeff Howbert Introduction to Machine Learning Winter Proximity-based outlier detection

Jeff Howbert Introduction to Machine Learning Winter Proximity-based outlier detection

Jeff Howbert Introduction to Machine Learning Winter Proximity-based outlier detection

Proximity-based outlier detection

- Pros
  - Easier to define a proximity measure for a dataset than to determine its statistical distribution.
  - Quantitative measure of the degree to which an object is an outlier.
  - Deals naturally with multiple modes.
- Cons
  - O(n^2) complexity.
  - Score is sensitive to the choice of k.
  - Does not work well if data has widely variable density.

Density-based outlier detection

Outliers are objects in regions of low density.

- Outlier score is the inverse of the density around the object.
- Scores are usually based on proximities.
- Example scores:
  - Reciprocal of the average distance to the k nearest neighbors:
    density(x, k) = ( (1/k) · Σ_{y ∈ N(x, k)} distance(x, y) )⁻¹
  - Number of objects within a fixed radius d (as in DBSCAN).
  - These two example scores work poorly if data has variable density.

Density-based outlier detection

- Relative density outlier score (Local Outlier Factor, LOF)
  - Reciprocal of the average distance to the k nearest neighbors, relative to that of those k neighbors.
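A simplified relative-density score in the spirit of LOF can be sketched for 1-D data. Note this is not the full LOF definition, which uses reachability distances; it only compares each point's average k-NN distance to that of its neighbors:

```python
# Simplified relative-density (LOF-style) score for 1-D points.

def avg_knn_dist(p, points, k=2):
    """Average distance from p to its k nearest neighbors."""
    dists = sorted(abs(p - q) for q in points if q != p)
    return sum(dists[:k]) / k

def relative_density_score(p, points, k=2):
    neighbors = sorted((q for q in points if q != p),
                       key=lambda q: abs(p - q))[:k]
    own = avg_knn_dist(p, points, k)
    neighborhood = sum(avg_knn_dist(q, points, k) for q in neighbors) / k
    return own / neighborhood   # ~1 in uniform regions, >> 1 for outliers

points = [1.0, 1.1, 1.2, 1.3, 1.4, 8.0]   # 8.0 is isolated
score_inlier = relative_density_score(1.2, points)
score_outlier = relative_density_score(8.0, points)
```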

Density-based outlier detection

[figure: relative density (LOF) outlier scores on example data]

Density-based outlier detection

- Pros
  - Quantitative measure of the degree to which an object is an outlier.
  - Can work well even if data has variable density.
- Cons
  - O(n^2) complexity.
  - Must choose parameters
    - k for the number of nearest neighbors
    - d for the distance threshold

Cluster-based outlier detection

Outliers are objects that do not belong strongly to any cluster.

- Approaches:
  - Assess the degree to which an object belongs to any cluster.
  - Eliminate object(s) to improve the objective function.
  - Discard small clusters far from other clusters.
- Issue:
  - Outliers may affect the initial formation of clusters.

Cluster-based outlier detection

Assess the degree to which an object belongs to any cluster.

- For prototype-based clustering (e.g., k-means), use distance to cluster centers.
  - To deal with variable-density clusters, use relative distance:
    relative distance(x) = distance(x, centroid) / median distance of the cluster's own points to the centroid
- Similar concepts apply to density-based or connectivity-based clusters.
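The relative-distance idea can be sketched for a single 1-D cluster; the cluster membership is assumed given (e.g., from a k-means run), and the data is an illustrative toy example:

```python
# Sketch of the cluster-based relative-distance score, assuming cluster
# assignments are already available.

def centroid(cluster):
    return sum(cluster) / len(cluster)

def median(values):
    s = sorted(values)
    return s[len(s) // 2]

def relative_distance(x, cluster):
    """Distance of x to the cluster centroid, divided by the median
    centroid distance of the cluster's own points."""
    c = centroid(cluster)
    med = median([abs(p - c) for p in cluster])
    return abs(x - c) / med

tight_cluster = [10.0, 10.1, 9.9, 10.05, 9.95]
rel_member = relative_distance(10.1, tight_cluster)   # a cluster member
rel_outlier = relative_distance(12.0, tight_cluster)  # far from the cluster
```

Dividing by the cluster's own spread is what makes the score comparable across tight and loose clusters.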

Cluster-based outlier detection

[figure: distance of points from the nearest centroid]

Cluster-based outlier detection

[figure: relative distance of points from the nearest centroid]

Cluster-based outlier detection

Eliminate object(s) to improve the objective function.

1) Form an initial set of clusters.
2) Remove the object that most improves the objective function.
3) Repeat step 2) until …

Cluster-based outlier detection

Discard small clusters far from other clusters.

- Need to define thresholds for "small" and "far".

Cluster-based outlier detection

- Pros:
  - Some clustering techniques have O(n) complexity.
  - Extends the concept of outlier from single objects to groups of objects.
- Cons:
  - Requires thresholds for minimum size and distance.
  - Sensitive to the number of clusters chosen.
  - Hard to associate an outlier score with objects.
  - Outliers may affect the initial formation of clusters.

One-class support vector machines

- Data is unlabeled, unlike the usual SVM setting.
- Goal: find a hyperplane (in a higher-dimensional kernel space) which encloses as much data as possible with minimum volume.
  - Tradeoff between the amount of data enclosed and the tightness of the enclosure, controlled by regularization of the slack variables.
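A small sketch using scikit-learn's `OneClassSVM` (an assumption on tooling; the slides demo LIBSVM, which implements the same one-class formulation). `nu` bounds the fraction of training points treated as outliers, and `gamma` sets the RBF kernel width; the toy data is made up:

```python
# One-class SVM sketch: fit on unlabeled "normal" data, then classify
# new points as inside (+1) or outside (-1) the learned region.
from sklearn.svm import OneClassSVM

X_train = [[0.0, 0.0], [0.1, 0.1], [-0.1, 0.0], [0.0, -0.1],
           [0.1, -0.1], [-0.1, 0.1], [0.05, 0.0], [0.0, 0.05]]
clf = OneClassSVM(kernel="rbf", gamma=2.0, nu=0.1).fit(X_train)

pred_in = clf.predict([[0.0, 0.05]])[0]   # near the training cloud
pred_out = clf.predict([[5.0, 5.0]])[0]   # far outside it
```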

One-class SVM vs. Gaussian envelope

[figures comparing a one-class SVM boundary with a fitted Gaussian envelope]

One-class SVM demo

LIBSVM -s 2 -t 2 -g 50 -n 0.35
(-s 2: one-class SVM; -t 2: RBF kernel; -g: kernel parameter gamma; -n: parameter nu)

Anomaly detection on real network data

- Three groups of features
  - Basic features of individual TCP connections
    - source & destination IP (Features 1 & 2)
    - source & destination port (Features 3 & 4)
    - protocol (Feature 5)
    - duration (Feature 6)
    - bytes per packet (Feature 7)
    - number of bytes (Feature 8)
  - Time-based features
    - For the same source (destination) IP address, number of unique destination (source) IP addresses inside the network in the last T seconds (Features 9 (13))
    - Number of connections from source (destination) IP to the same destination (source) port in the last T seconds (Features 11 (15))
  - Connection-based features
    - For the same source (destination) IP address, number of unique destination (source) IP addresses inside the network in the last N connections (Features 10 (14))
    - Number of connections from source (destination) IP to the same destination (source) port in the last N connections (Features 12 (16))

Typical anomaly detection output

- Anomalous connections that correspond to the "slammer" worm
- Anomalous connections that correspond to the ping scan
- Connections corresponding to Univ. Minnesota machines connecting to "half-life" game servers

Real-world issues in anomaly detection

- Data is often streaming, not static
  - Credit card transactions
- Anomalies can be bursty
  - Network intrusions

Quote of the day

An excerpt from advice given by a machine learning veteran on StackOverflow:

"… you are training and testing on the same data. A kitten dies every time this happens."