EL 933 Final Project Presentation: Combining Filtering and Statistical Methods for Anomaly Detection. Augustin Soule, Kavé Salamatian, Nina Taft.



Motivation
Traffic anomalies are a fact of life in computer networks: outages, attacks, flash crowds, DoS, alpha events, etc.
Detecting and identifying anomalies in a timely fashion is challenging:
– Operators typically monitor traffic per link by eye, using SNMP or IP flow data. Is that effective?
– Characterization: building a model of what constitutes normal behavior.
– Efficient anomaly detection methods: are there any?

Overview
Introduction: aim and general discussion of the methodology
Approach: what tools are used and why
Sequence of analysis steps for anomaly detection: modeling the network, obtaining the residual via filtering, analyzing the residual with all the methods, and comparing the methods along the way
Results (analysis)
Conclusion
Future work, plus some thoughts on what I like about the paper

Introduction
Aim: develop an approach for anomaly detection in large-scale networks using the traffic matrix (TM).
The main idea is to predict the normal behavior of the network (the TM approach) and then filter this "normal" traffic out of the actual traffic matrix, which is obtained from more recent measurement data than the data used for prediction.
The residual traffic is then examined for anomalies using the four methods proposed in the paper, two of which are new and two of which are already in use.

Approach
What is a traffic matrix, and how do we obtain it here? (SNMP data)
What time interval should be considered?
What kinds of anomalies are in focus, and why only those? A few quick examples of such anomalies.
What is a Kalman filter, and what is it used for here?
What are the four different methods proposed?

Sequencing the different analysis procedures
SNMP data -> traffic matrix (composed of OD flows) -> filter out normal traffic to obtain the residual -> anomaly analysis.
What are OD flows, and why are they significant here? They are preferred over directly using monitored data captured at the target granularity level.
What does the anomaly analysis include? The methods used for the analysis, and validation of those methods using both actual data and a synthetic anomaly generator.

Continued..
Using a network-wide perspective (the TM) for volume anomaly detection is justified, but the amount of data is large, so the approach has to scale.
By projecting the traffic onto a small number of principal components we can filter out the normal traffic; the traffic projected onto the remaining components is analyzed for anomalies.
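For the projection step described above, here is a minimal Python sketch (not the paper's code) of splitting a traffic matrix into a normal part spanned by the top-k principal components and a residual part; k = 4 is an illustrative choice, not a value from the paper.

```python
import numpy as np

def pca_residual(traffic, k=4):
    """Split a (time x flows) traffic matrix into 'normal' and residual parts.

    The top-k principal components span the normal subspace; the traffic
    projected onto the remaining components is what gets analyzed for anomalies.
    """
    centered = traffic - traffic.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    normal_basis = vt[:k].T                       # top-k principal directions
    normal_part = centered @ normal_basis @ normal_basis.T
    residual = centered - normal_part             # analyzed for anomalies
    return normal_part, residual
```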

Modeling the Network
Obtain per-link byte-count statistics (SNMP today) and infer a TM containing all the OD flows (the hidden states!).
The total traffic on a link is the sum of all OD flows traversing that link, which can be expressed as
Y_t = A_t X_t + V_t
where Y_t is the vector of link counts at time t, X_t is the vector of OD flows, V_t captures the measurement errors, and A_t is the routing matrix.
To capture the dynamics of the OD flows we need a model that specifies X_{t+1} in terms of X_t:
X_{t+1} = C_t X_t + W_t
where C_t is the state transition matrix and W_t is noise that accounts for randomness in the fluctuations of the flows.
For traffic estimation, take the total byte count per link and partition it over the OD flows traversing that link. When an anomaly occurs on a link it could get spread across all the OD flows on that link; to avoid that, C_t is taken to be a diagonal matrix.
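As a rough illustration of the two equations above, the following simulation uses hypothetical dimensions, a random 0/1 routing matrix, a diagonal C_t, and assumed noise levels; none of these values come from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
n_od, n_links, T = 6, 4, 100                    # hypothetical problem sizes

A = rng.integers(0, 2, size=(n_links, n_od)).astype(float)  # routing matrix A_t (fixed here)
C = np.diag(rng.uniform(0.8, 1.0, n_od))                    # diagonal state-transition matrix C_t

X = np.zeros((T, n_od))                         # hidden OD flows
Y = np.zeros((T, n_links))                      # observed SNMP link counts
X[0] = rng.uniform(50, 100, n_od)

for t in range(T - 1):
    W = rng.normal(0, 1.0, n_od)                # state noise W_t
    V = rng.normal(0, 0.5, n_links)             # measurement noise V_t
    X[t + 1] = C @ X[t] + W                     # X_{t+1} = C_t X_t + W_t
    Y[t + 1] = A @ X[t + 1] + V                 # Y_t = A_t X_t + V_t
```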

Continued..
The assumption is that V_t and W_t are uncorrelated.
The task is then to estimate the next state X_{t+1}, which is done with a Kalman filter. The Kalman filter is a robust filter that estimates the system state process using a two-step (predict/update) approach that iterates at each time t. From it we obtain the estimate of X_t given the observations available up to time i (t >= i), and hence the estimated OD flows.
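A minimal sketch of one predict/update iteration of a standard Kalman filter for this state-space model, assuming the noise covariances Q and R are known; the paper's full estimation procedure and its initialization are more involved. The innovation returned here is the kind of residual that the detection methods below operate on.

```python
import numpy as np

def kalman_step(x_hat, P, y, A, C, Q, R):
    """One predict/update iteration of the Kalman filter.

    x_hat, P : previous state estimate and its covariance
    y        : new vector of link counts Y_t
    A, C     : routing and state-transition matrices
    Q, R     : covariances of the state noise W_t and measurement noise V_t
    """
    # Predict: propagate the state estimate through the dynamic model.
    x_pred = C @ x_hat
    P_pred = C @ P @ C.T + Q

    # Update: correct the prediction with the new link-count observation.
    innovation = y - A @ x_pred                 # prediction error (residual)
    S = A @ P_pred @ A.T + R
    K = P_pred @ A.T @ np.linalg.inv(S)         # Kalman gain
    x_new = x_pred + K @ innovation
    P_new = (np.eye(len(x_hat)) - K @ A) @ P_pred
    return x_new, P_new, innovation
```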

Continued ....
The methods used and their details:
Method I: based on comparing the residual traffic to a threshold.
– Advantage: it triggers an alarm very quickly, since the test is evaluated as soon as the Kalman filter processes a new observation.
– Disadvantage: the test is performed independently of past results, which creates a high false positive rate; an anomaly is flagged on the basis of a single observation, which is not the right approach.
Method II: based on comparing the local and global variance of the filtered residual signal.
– Advantage: it uses a cumulative-summation (CUSUM) approach, which addresses the disadvantage of the previous method. It is a very powerful method, and it has been shown to be the best estimator when the variance and the level change are unknown.
– Disadvantage: it adds some delay to detection, since it needs a few observations after the anomaly to estimate the deviation level.
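Illustrative sketches of the two tests on the residual signal; the threshold, the drift allowance, and the exact CUSUM variant are assumptions for illustration, not the paper's parameterization.

```python
import numpy as np

def basic_threshold(residual, thresh):
    """Method I (sketch): flag every time bin where |residual| exceeds a fixed threshold."""
    return np.abs(residual) > thresh

def cusum_alarms(residual, drift, thresh):
    """A simple one-sided CUSUM on the residual, in the spirit of Method II.

    drift  : deviation tolerated at each step before evidence accumulates
    thresh : alarm threshold on the cumulative sum
    """
    s, alarms = 0.0, []
    for r in residual:
        s = max(0.0, s + abs(r) - drift)        # accumulate evidence of a level change
        alarms.append(s > thresh)
    return np.array(alarms)
```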

Continued..
Method III: based on multi-scale analysis and variance.
– Advantage: the rationale behind multi-scale analysis is that anomalies should appear at several time scales, so monitoring multiple scales should reduce the false positive rate, because a change at only one time scale will not trigger an alarm.
– Disadvantage: a detection lag is involved, since wavelet analysis is required.
Method IV: based on a multi-scale variance shift.
– In this method an alarm is triggered when the ratio between the local and the global variance exceeds a threshold.
– The analysis uses two scales: one at which the global variance is computed and one at which the local variance is computed.
– Again a detection lag is incurred, since wavelet analysis of the signal is performed.
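An illustrative local-to-global variance-ratio test in the spirit of Method IV, using simple sliding windows instead of the paper's wavelet scales; the window lengths and the ratio threshold are assumptions.

```python
import numpy as np

def variance_shift_alarms(residual, local_win=12, global_win=288, ratio_thresh=3.0):
    """Alarm when the local variance of the residual is much larger than its
    global variance (e.g. 1 hour vs. 1 day of 5-minute bins; values assumed)."""
    residual = np.asarray(residual, dtype=float)
    alarms = np.zeros(len(residual), dtype=bool)
    for t in range(global_win, len(residual)):
        local_var = residual[t - local_win:t].var()
        global_var = residual[t - global_win:t].var()
        if global_var > 0 and local_var / global_var > ratio_thresh:
            alarms[t] = True
    return alarms
```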

Validation of the methods
Validation of the methods is done using ROC (receiver operating characteristic) curves. These curves are a graphical representation of the trade-off between false positives and false negatives as the decision threshold varies. They are used because they allow the methods to be compared across the whole range of decision thresholds; the ROC analysis also characterizes the performance of the methods in terms of detection time.
An algorithm is considered good if its ROC curve climbs rapidly towards the upper-left corner (a very high fraction of the true anomalies detected with only a few false positives). How quickly the curve climbs is quantified by the area under it: the larger the area, the better the algorithm. Each algorithm has its own ROC curve, and the comparison is done between them.
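A sketch of how an ROC curve and its area can be computed by sweeping the decision threshold over per-time-bin detector scores, assuming ground-truth anomaly labels are available for each bin.

```python
import numpy as np

def roc_curve(scores, labels, n_thresholds=100):
    """Sweep the decision threshold over the detector scores and return the
    false-positive rates, true-positive rates, and the area under the curve."""
    scores, labels = np.asarray(scores, float), np.asarray(labels, bool)
    thresholds = np.linspace(scores.max(), scores.min(), n_thresholds)
    fpr, tpr = [], []
    for th in thresholds:
        pred = scores >= th
        tpr.append(np.sum(pred & labels) / max(labels.sum(), 1))
        fpr.append(np.sum(pred & ~labels) / max((~labels).sum(), 1))
    auc = np.trapz(tpr, fpr)                    # larger area = better detector
    return np.array(fpr), np.array(tpr), auc
```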

Continued..
Validating these methods also involves testing them to determine the operating range of each, and this can be done in two ways.
Approach I: collect live data as a packet- or flow-level trace and have the trace "labeled". Labeling is the procedure of identifying each anomalous event along with its start and finish time.
– Advantage of labeling: the methods are tested against real-world anomalies.
– Disadvantage of labeling: the parameters of the anomalies cannot be varied to probe the limits of each method.
Approach II: synthetically generate attacks and test the methods on them.
– Advantage: the parameters of the attacks can be varied, so the limits of each method can be identified.
– Disadvantage: the attacks are not real-world, so they might not reflect the attacks of greatest concern.

Continued..
Both of these approaches are used in the paper so that the methods can be analyzed more thoroughly.
The Abilene network is used for testing. It has 11 PoPs, and data is collected from every PoP, yielding a TM of 121 OD flows.
Synthetic anomaly generator: select one OD flow or a set of OD flows, then add anomalies on top of the baseline traffic level of those flows. Each anomaly is characterized by four factors: volume, duration, number of OD flows, and shape function (Table 1).
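An illustrative sketch of injecting a synthetic anomaly characterized by the four factors; the smooth Hanning-window shape and the argument handling are assumptions, not the shape functions of Table 1.

```python
import numpy as np

def inject_anomaly(od_flows, start, duration, volume, n_flows, rng=None):
    """Add a synthetic anomaly on top of baseline OD-flow traffic.

    od_flows : (time x flows) baseline traffic matrix
    start    : time bin where the anomaly begins
    duration : anomaly length in time bins
    volume   : peak extra volume added to each affected flow
    n_flows  : how many randomly chosen OD flows carry the anomaly
    """
    rng = rng or np.random.default_rng()
    out = od_flows.copy()
    flows = rng.choice(od_flows.shape[1], size=n_flows, replace=False)
    shape = np.hanning(duration)                # assumed smooth ramp-up/ramp-down
    end = min(start + duration, out.shape[0])
    for f in flows:
        out[start:end, f] += volume * shape[:end - start]
    return out, flows
```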

Continued..

Results

Continued..
These are the results with the actual data from the network. The Abilene data contained 27 anomalies, and the graph above shows that the basic method performed best (7% FPR at 100% TPR), i.e., it missed no anomalies. The second-best method caught about 85% of the anomalies at the same FPR. The wavelet method did not reach an FNR of 0 even with a huge threshold.

Continued...
The results with the synthetic anomaly generator are as follows: 500 anomalies were generated, with durations varying between 5 minutes and 2 hours and 1 to 7 OD flows per anomaly (chosen randomly).
The basic method and the second method performed equally well and best overall. The second method does not perform very well on the actual data, whereas it does on the synthetic data; the reason may lie in the properties of the anomalies themselves.

Detection Time Comparison
The onset of attacks on the Internet is rapid, so the methods should be fast and have very short detection times.

Continued..
The second method detected 90% of the anomalies with no lag, and the basic method detected 95% with no lag. The wavelet method takes about half an hour to detect an anomaly, so it is too slow, and it also did not perform well. It is interesting that the vshift method performs better on the synthetic data than on the Abilene network.
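One simple way to quantify the detection lags quoted above, given labeled anomaly start times; the alignment window max_lag is an assumption.

```python
import numpy as np

def detection_lags(alarms, anomaly_starts, max_lag=12):
    """For each labeled anomaly start, count how many bins pass before the
    first alarm fires, or return None if nothing fires within max_lag bins."""
    lags = []
    for s in anomaly_starts:
        hits = np.flatnonzero(np.asarray(alarms[s:s + max_lag]))
        lags.append(int(hits[0]) if hits.size else None)
    return lags
```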

Conclusion and what I feel
The granularity level chosen for anomaly detection, namely the TM, is interesting. The estimation and prediction step using the Kalman filter, together with the wavelet-based analysis, was interesting. The idea of filtering out the normal traffic and analyzing the residual traffic for anomalies was nice.
The detection schemes were tested well, since both actual data and the synthetic anomaly generator were used. The anomaly generator came close to depicting anomalies such as DoS attacks and flash crowds, which helped in validating and evaluating the four methods, but it was not very effective at depicting attacks such as worms.
The wavelet-based method did not do very well because of the detection time involved.