End-to-end Anomalous Event Detection in Production Networks

End-to-end Anomalous Event Detection in Production Networks
Les Cottrell, Connie Logg, Felipe Haro, Mahesh Chhaparia (SLAC), Maxim Grigoriev (FNAL), Mark Sandford (Loughborough University)
Site Visit by Thomas Ndousse, April 27, 2005
http://www.slac.stanford.edu/grp/scs/net/talk05/anomaly-apr05.ppt

Network administrators and others need ways to be notified when there are significant, persistent anomalous changes in network performance that may require intervention. We have successfully implemented techniques (the plateau algorithm and Kolmogorov-Smirnov) to find such step changes in time-series measurements of network performance. However, if there are large seasonal changes (e.g. day/night, weekday/weekend), the number of false positives or misses can be larger than desirable. We are thus working with FNAL to evaluate seasonal-effect algorithms (e.g. Holt-Winters) on real production network performance measurements. We will present results on the performance of such algorithms, as well as plans to evaluate methods that look at multiple metrics (e.g. capacity, available bandwidth, RTT, multiple paths) simultaneously using Principal Component Analysis. Partially funded by DOE/MICS for Internet End-to-end Performance Monitoring (IEPM).

Outline
- Why?
- Input data
- How? First approaches; the real world
- Results
- Conclusions & futures

Uses of Techniques
- Automated problem identification:
  - Alerts for network administrators, e.g. bandwidth changes in time series, iperf, SNMP
  - Alerts for systems people: OS/host metrics
  - Anomalies for security
- Forecasts (a fallout of the techniques) for Grid middleware, e.g. replica manager, data placement

Data
- Uses packet pair dispersion of 20 packets to provide: capacity, cross-traffic, available bandwidth
- Measured at 3-minute intervals
- Very noisy time-series data, so moving-averaged over 1 hour (see the sketch below)
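
The smoothing step could look like the minimal sketch below, which averages the noisy 3-minute samples over a 1-hour window (20 points at 3-minute spacing). This is an illustration only, not the production IEPM-BW code.

```python
import numpy as np

def hourly_moving_average(samples, interval_minutes=3, window_minutes=60):
    """Smooth noisy bandwidth samples with a trailing moving average.

    samples: 1-D array of measurements taken every `interval_minutes`.
    Early points are averaged over however much history is available.
    """
    window = max(1, window_minutes // interval_minutes)  # 20 points for 3-min data
    smoothed = np.empty(len(samples), dtype=float)
    for i in range(len(samples)):
        start = max(0, i - window + 1)
        smoothed[i] = np.mean(samples[start:i + 1])
    return smoothed
```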

Plateau algorithm (the most intuitive)
For each observation:
- If it lies outside the history-buffer mean m_h ± b·s_h, add it to the trigger buffer
- Else add it to the history buffer and remove the oldest point from the trigger buffer
- When the trigger buffer holds more than t points, a trigger is issued
- Check: if (m_h − m_t) / m_h > D and 90% of the trigger points arrived in the last T minutes, report an event and move the trigger buffer into the history buffer
Parameters: history length = 1 day; t = trigger length = 3 hours; b = standard deviations = 2.
We set the history-buffer length to one day in order to minimize the lag between the history mean and the observations due to diurnal changes. A sketch of the logic follows.
[Figure: observations with the detected event, trigger-buffer % full, history mean, and history mean − 2·stdev]
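
A minimal sketch of the plateau logic described above, in Python for illustration (the production version is in Perl). The buffer sizes, trigger-fill fraction, and exact bookkeeping are assumptions based on the slide, not the SLAC implementation.

```python
from collections import deque
import statistics

def plateau_detector(observations, b=2.0, history_len=480, trigger_len=60,
                     threshold_D=0.1, trigger_fill=0.9):
    """Flag an event when the trigger buffer fills with points outside
    m_h +/- b * s_h of the history buffer and the relative drop in the
    mean exceeds threshold_D. history_len ~ 1 day and trigger_len ~ 3 hours
    of samples (values assumed)."""
    history = deque(maxlen=history_len)
    trigger = deque(maxlen=trigger_len)
    events = []

    for i, x in enumerate(observations):
        if len(history) < 2:
            history.append(x)
            continue
        m_h = statistics.mean(history)
        s_h = statistics.stdev(history)
        if abs(x - m_h) > b * s_h:
            trigger.append(x)            # outside the envelope: candidate anomaly
        else:
            history.append(x)            # inside the envelope: normal behaviour
            if trigger:
                trigger.popleft()        # decay the trigger buffer
        if len(trigger) >= trigger_fill * trigger_len:
            m_t = statistics.mean(trigger)
            if m_h > 0 and (m_h - m_t) / m_h > threshold_D:
                events.append(i)         # report an event at this index
                history.extend(trigger)  # move trigger buffer into history
                trigger.clear()
    return events
```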

Kolmogorov-Smirnov (K-S)
For each observation, compare the previous 100 observations with the next 100 observations:
- Take the maximum vertical difference between the two empirical CDFs
- Ask how much it differs from what random CDFs would give
- Express the result as a % difference
The plateau trigger buffer reporting the event well after the start of the step-down is partly an artifact: it could, for example, report the start of the event as the time the trigger buffer reached 10% full. However, K-S is still more accurate at pinpointing the time when the change was greatest. A sketch of the sliding comparison follows.
[Figure: comparison of K-S with plateau]
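
A sketch of the sliding two-sample K-S comparison using scipy's ks_2samp. Expressing the statistic as a percentage and comparing it with a threshold (the 70% figure used later in the talk) is an assumption about how the method was applied.

```python
import numpy as np
from scipy.stats import ks_2samp

def ks_change_points(observations, half_window=100, threshold_pct=70.0):
    """For each index, compare the previous `half_window` observations with
    the next `half_window` using the two-sample K-S statistic (the maximum
    vertical CDF difference), expressed as a percentage. Indices exceeding
    the threshold are reported as candidate change points."""
    obs = np.asarray(observations, dtype=float)
    hits = []
    for i in range(half_window, len(obs) - half_window):
        before = obs[i - half_window:i]
        after = obs[i:i + half_window]
        stat, _pvalue = ks_2samp(before, after)   # stat lies in [0, 1]
        if 100.0 * stat > threshold_pct:
            hits.append((i, 100.0 * stat))
    return hits
```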

Comparison
- Results from K-S and plateau are very similar, using a K-S threshold of 70%
- The current plateau implementation only finds negative changes; detecting positive changes is useful for seeing when the condition returns to normal
- K-S is implemented in C and executes faster than plateau (in Perl), though this depends on the parameters
- K-S is more formalized
- Plateau and K-S both work well for non-seasonal observations (e.g. small day/night changes)
Timing for 100 days of data:
- Plateau: ~14 min
- H-W (FNAL): ~7 min
- K-S: 1.08 min on ±100 points, 3.07 min on ±200 points, 14.75 min on ±400 points

Seasons & false alerts
Congestion on a Monday following a quiet weekend produces a high forecast and gives a false alert. A history-buffer length other than one day also causes the history mean to fall out of sync with the observations.

Effect on events
A change (drop) in bandwidth between 19:00 and 20:00 causes more anomalous events to be reported around this time.

Seasonal Changes
Use the Holt-Winters (H-W) technique:
- Triple exponentially weighted moving average; the basic EWMA is EWMA(i) = α·Obs(i) + (1−α)·EWMA(i−1)
- Three terms, each with its own parameter (α, β, γ), accounting for local smoothing, long-term seasonal smoothing, and trend
- The trend component for our data is flat
A sketch of the triple-smoothing recursion follows.
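
A sketch of additive triple exponential smoothing (level, trend, seasonal), roughly following the NIST-style recursions referenced later in the talk. The initialization and variable names are assumptions, not the SLAC or FNAL code.

```python
import numpy as np

def holt_winters_additive(y, season_len, alpha, beta, gamma):
    """Additive triple exponential smoothing; returns one-step-ahead forecasts.

    y          : regularly spaced observations (at least two full seasons)
    season_len : number of bins in one season (e.g. one week of bins)
    alpha, beta, gamma : weights for level, trend, and seasonal smoothing
    """
    y = np.asarray(y, dtype=float)
    level = y[:season_len].mean()
    trend = (y[season_len:2 * season_len].mean() - y[:season_len].mean()) / season_len
    season = y[:season_len] - level              # initial seasonal indices
    forecast = np.zeros_like(y)

    for t in range(len(y)):
        s = season[t % season_len]
        forecast[t] = level + trend + s          # one-step-ahead forecast F_t
        new_level = alpha * (y[t] - s) + (1 - alpha) * (level + trend)
        trend = beta * (new_level - level) + (1 - beta) * trend
        season[t % season_len] = gamma * (y[t] - new_level) + (1 - gamma) * s
        level = new_level
    return forecast
```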

Example
- Local smoothing: 99% of the weight on the last 24 hours
- Linear trend: 50% on the last 24 hours
- Seasonal: 99% on the last week
- Event ≡ within an 80-minute window, 80% of points fall outside the deviation envelope (see the sketch below)
- Deviations are smoothed absolute residuals
- Note the difference between weekend and weekday behaviour
[Figure: observations, forecast, and deviations, with weekend vs. weekday periods marked]
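
The deviation-envelope event rule described above could be sketched as below. The width of the envelope (k times the smoothed absolute residual) and the residual-smoothing weight are assumptions; the window of ~27 points corresponds to 80 minutes of 3-minute samples.

```python
import numpy as np

def envelope_events(obs, forecast, window=27, frac=0.8, k=2.0, alpha=0.1):
    """Flag an event when >= `frac` of the points in a sliding window fall
    outside forecast +/- k * (exponentially smoothed absolute residual)."""
    obs = np.asarray(obs, dtype=float)
    forecast = np.asarray(forecast, dtype=float)
    resid = np.abs(obs - forecast)
    dev = np.zeros_like(resid)
    dev[0] = resid[0]
    for t in range(1, len(resid)):          # smooth the absolute residuals
        dev[t] = alpha * resid[t] + (1 - alpha) * dev[t - 1]
    outside = resid > k * dev               # points outside the envelope
    events = []
    for t in range(window, len(obs)):
        if outside[t - window:t].mean() >= frac:
            events.append(t)
    return events
```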

Evaluation
- Created a library of time series covering 100 days, June through September 2004, for 40 hosts
- Analyzed with plateau and saved all events where the trigger buffer filled (no filter on step size); 23 hosts had 120 candidate events
- Event types: steps; diurnal changes; congestion from cron jobs, bandwidth tests, flash crowds
- Classified the ~120 events as to whether they were interesting: a large, sharp drop in bandwidth that persists for much longer than 3 hours
- Plateau was the easiest to understand and tune, and was also the first to be developed
- Classification is subjective: "large" means (m_h − m_t)/m_h > 10% (we also looked at 30%); "sharp" means the step occurs in under 4 hours

Results
- K-S shows results similar to plateau
- As the parameters are adjusted to reduce false positives, the number of missed events increases
- Example: plateau with trigger buffer = 3 hrs filled to 90% in < 220 minutes, history buffer = 1 day, b = 2, compared with K-S using ±100 observations, varying the threshold D = (m_h − m_t)/m_h:

  D     False   Miss
  10%   16%     8%
  30%   2%      32%

Conclusions
- A few paths (~10%) have strong seasonal effects
- Plateau & K-S work well when seasonal effects are only weak
- K-S detects both step-downs and step-ups, and gives an accurate time estimate of the event (good for correlations)
- H-W is promising for seasonal effects, but it is more complex and requires more parameters, which may not be easy to estimate, and it requires regular data (an interpolation step)
- CPU time can depend critically on the parameters chosen; e.g. increasing the K-S range from ±100 to ±400 increases CPU time by a factor of 14
- H-W works, but we still need to quantify its effectiveness
- Looking at PCA to evaluate multiple metrics simultaneously (e.g. forward & backward traffic, RTT) and multiple paths

Future Work
- Improve event detection for the Holt-Winters (H-W) method: we tried applying K-S to the residuals of the H-W technique, but this does not appear to work well; next we plan to apply plateau to the H-W residuals
- Further development of PCA: enable looking at multiple measurements simultaneously, e.g. RTT, loss, capacity, and multiple routes
- Neural networks to interpolate heavyweight/infrequent measurements from lightweight, more frequent ones

More information
- SLAC plateau implementation: www.acm.org/sigs/sigcomm/sigcomm2004/workshop_papers/nts26-logg1.pdf
- SLAC H-W implementation: www-iepm.slac.stanford.edu/monitoring/forecast/hw.html
- Engineering Statistics Handbook: http://www.itl.nist.gov/div898/handbook/pmc/section4/pmc435.htm
- Comparison between the Mark Burgess method & K-S: http://www-iepm.slac.stanford.edu/monitoring/forecast/ksvsmb/ksvsmb.htm

Diurnal Variation
People arriving at work between 19:00 and 20:00 PDT (7:00 and 8:00 Pakistan time) cause a sudden drop in dynamic capacity.

H-W Implementation
Needs regularly spaced data (otherwise going back one season is difficult and gets out of sync). Interpolate the data:
- Select a bin size and average the points in each bin
- If there are no points in a first-week bin, take data from future weeks
- For following weeks, missing bins are filled from the previous week
Initial values for the smoothing come from the NIST Engineering Statistics Handbook.
Choose the parameters by minimizing (1/N)·Σ(F_t − y_t)², where F_t is the forecast for time t as a function of the parameters and y_t is the observation at time t (see the sketch after this slide).
A week is special and defines a cycle of seasons; we do nothing special with the day. Note that we need a week's worth of data to get going.
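
The parameter choice by minimizing (1/N)·Σ(F_t − y_t)² could be done with a simple grid search, as in the sketch below. The grid and the use of the holt_winters_additive sketch from earlier as forecast_fn are assumptions, not the SLAC implementation.

```python
import itertools
import numpy as np

def fit_hw_parameters(y, season_len, forecast_fn, grid=None):
    """Choose (alpha, beta, gamma) by minimizing (1/N) * sum((F_t - y_t)^2).

    forecast_fn(y, season_len, alpha, beta, gamma) must return the
    one-step-ahead forecasts F_t (e.g. the holt_winters_additive sketch above).
    """
    y = np.asarray(y, dtype=float)
    if grid is None:
        grid = np.linspace(0.05, 0.95, 10)      # coarse grid; an assumption
    best_params, best_mse = None, np.inf
    for alpha, beta, gamma in itertools.product(grid, repeat=3):
        forecast = forecast_fn(y, season_len, alpha, beta, gamma)
        mse = np.mean((forecast - y) ** 2)      # (1/N) * sum of squared errors
        if mse < best_mse:
            best_params, best_mse = (alpha, beta, gamma), mse
    return best_params, best_mse
```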

H-W Implementations
Three implementations evaluated (two of them new):
- FNAL (Maxim Grigoriev): the inspiration for evaluating this method
- Part of RRDtool (Brutlag): limited control over what it produces and how it works
- SLAC: implements the NIST formulation, with a different formulation and parameter values from Brutlag/FNAL; also adds minimization of the sum of squares to choose the parameters

Events
Can look at the residuals (F_t − y_t), or at χ². Could apply K-S or plateau to the residuals, or to the local smoothing (i.e. after removing long-term seasonal effects).

Mark Burgess Method
A two-dimensional time-series approach that computes a periodic, adaptive threshold for service-level anomaly detection. An iterative algorithm performs history analysis over this periodic time to provide a smooth roll-off in the significance of the data with age. The method was originally designed to detect anomalous behaviour on a single host.

Comparison with K-S
Both techniques were applied to iperf measurements from SLAC to Caltech, February & March 2005. The K-S technique works very well for long-term anomalous variations in Internet end-to-end traffic. The Mark Burgess technique detects every unwanted large spike or variation, in real time.
[Figures: K-S result and Mark Burgess result]

PCA
PCA is a coordinate-transformation method that maps a given set of data points onto new axes, called the principal axes or principal components. For network anomaly detection, PCA divides the data into a normal and an abnormal subspace.
Procedure:
- Arrange the data into matrix form
- Zero-mean the matrix data
- Calculate the covariance matrix
- Calculate the principal components
- Apply (I − P·Pᵀ) to the data matrix to obtain the result, where P is the matrix of principal components (see the sketch below)
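
A sketch of the residual-subspace projection (I − P·Pᵀ)·X described above. The choice of k, the number of principal components kept as the normal subspace, is an assumption. A time bin would then be flagged as anomalous when the norm of its residual column is unusually large.

```python
import numpy as np

def pca_residuals(X, k):
    """Project data onto the abnormal (residual) subspace: (I - P P^T) X_centered.

    X : array of shape (n_metrics, n_samples), one row per metric/path.
    k : number of principal components treated as the normal subspace.
    Returns the residual matrix; large column norms suggest anomalies.
    """
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=1, keepdims=True)      # zero-mean each metric
    cov = np.cov(Xc)                            # covariance matrix (rows = variables)
    eigvals, eigvecs = np.linalg.eigh(cov)      # eigenvalues in ascending order
    P = eigvecs[:, -k:]                         # top-k principal components
    residual = Xc - P @ (P.T @ Xc)              # (I - P P^T) applied to the data
    return residual
```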

PCA Results on SLAC-BINP (June-September 2004)
- Caught all the events that were detected by H-W, plateau and K-S
- Can work on multiple parameters
- Tested PCA on six routes so far: SLAC-FZK, SLAC-DESY, SLAC-CALTECH, SLAC-NIIT, SLAC-BINP, SLAC-UMICH
[Figure: anomalous vs. good events, with annotations for a 10% rise in dbcap and a 10% rise in RTT]