1 D-Trigger: A General Framework for Efficient Online Detection
Ling Huang, University of California, Berkeley

2 Outline Motivation & Introduction My Decentralized Detection Framework Detection of Network-Wide Anomalies  Centralized Algorithm  Decentralized Detection Summary & Future work

3 Traditional Distributed Monitoring
Large-scale network monitoring and detection systems:
 Distributed and collaborative monitoring boxes
 Continuously generating time series data
Existing research focuses on data streaming [1][2]:
 Centrally collect, store and aggregate network state
 Well suited to answering approximate queries and continuously recording system state
 Incurs high overhead!
[Figure: monitors 1-3 in local networks 1-3 continuously push data to the operation center, which is overloaded; its inbound link is a bandwidth bottleneck.]
[1] Olston et al. Adaptive Filters for Continuous Queries over Distributed Data Streams. In ACM SIGMOD (2003).
[2] Cormode et al. Sketching Streams Through the Net: Distributed Approximate Query Tracking. In VLDB (2005).

4 Detection Problems in Enterprise Networks
Do machines in my network participate in a botnet to attack other machines?
Data: byte rate on a network link.
[Figure: a command-and-control host directs compromised hosts to send at 0.95× the allowed rate to victim V; the operation center must run a coordinated detection across links.]
We can't afford to do the detection continuously!

5 Detection Problems in Enterprise Networks
Data: byte rate on a network link.
For efficient and scalable detection, push data processing to the edge of the network: approximate, decentralized detection!
[Figure: monitors near victim V process data locally and send only summaries to the operation center.]

6 Detection Problems in Sensor Networks
Is there any vehicle traversing my battlefield? We need an efficient method for continuous, coordinated detection!

7 Moving Towards Decentralized Detection
Today: distributed monitoring & centralized computation
 Stream-based data collection (push)
 Evaluates a sophisticated detection function over periodic monitoring data
 Doesn't scale well in the number of nodes or to smaller timescales – high bandwidth and/or central overhead
Tomorrow: distributed monitoring & decentralized computation
 Evaluates a sophisticated detection function over continuous monitoring data in a decentralized way
 Provides low overhead, rapid response, high accuracy, and scalability

8 Outline Motivation & Introduction My Decentralized Detection Framework Detection of Network-Wide Anomalies  Centralized Algorithm  Decentralized Detection Summary & Future work

9 Research Goals and Achievements
Decentralized detection system:
 Distributed information processing & central decision making
 Detects violations/anomalies on micro/small timescales
 Open platform supporting a wide range of applications
General detection functions:
 SUM, MIN, MAX, PCA, SVM, TOP-K, ...
Operates on general time series.
Detection accuracy controllable by a "tuning knob":
 Provides a user-controllable tradeoff between accuracy and overhead
Communication-efficient:
 Minimizes communication at a given detection accuracy

10 The System Setup
A set of distributed monitors:
 Each produces a time series signal Dat_i(t)
 Sends minimal information to the coordinator
 No communication among monitors
A coordinator X:
 Serves as the aggregation, correlation and detection center
 Performs the detection
 Tells the monitors the level of accuracy required for signal updates

11 New Ideas for Efficiency and Scalability
Local processing – push tracking capabilities to the edges:
 Monitors do "continuous" monitoring and local processing.
 A filtering scheme decides when to push information to the operation center (i.e., only "when needed").
Dealing with approximations – algorithms resident at the operation center perform accurate detection in the face of limited data/information.
A framework putting the above together in a single system:
 Implemented as protocols that govern the actions between the monitors and the operation center, and manage adaptivity.

12 Local Processing by Filtering
[Figure: hosts/monitors apply local filters and "push" only filtered values; the operation center adjusts the filters to trade off detection accuracy.]
Don't send all the data! Most of it isn't needed anyway:
 Our applications are anomaly detectors.
 Most of the traffic is normal, so it need not be sent; ideally hosts should only be sending rare events.

13 My Decentralized Detection Framework
[Architecture diagram: distributed monitors observe the original monitored time series Dat_1(t), ..., Dat_n(t) and forward filtered versions Mod_1(t), ..., Mod_n(t) to the coordinator. The coordinator evaluates the detection function (linear function, queue overflow detection, classification function, or outlier detection), raises alarms, and runs a perturbation analysis driven by the user-specified detection error to feed filter sizes back to the monitors.]

14 My Dissertation Work: The Components
[Diagram of the components:
 Decentralized detection with linear functions and fixed thresholds (SUM + Top-K queries, queueing model): detection of instantaneous violations (MineNet'06) and detection of cumulative violations (ICDCS'07).
 Sophisticated detection (principal component analysis, support vector machines): detection of network-wide anomalies (NIPS'06, INFOCOM'07) and online continuous classification (ongoing).]

15 Outline Motivation & Introduction My Decentralized Detection Framework Detection of Network-Wide Anomalies  Centralized Algorithm  Decentralized Detection Summary & Future work

16 Detection of Network-Wide Anomalies
An anomaly is a sudden change of link measurement in an Origin-Destination flow:
 Caused by DDoS, device failures, misconfigurations, etc.
Given link traffic measurements, detect the volume anomalies.
[Figure: a backbone network connecting regional networks 1 and 2; an OD flow from H1 to H2 traverses multiple backbone links.]

17 Detection via Correlation
Observation and correlation across multiple links increase detection capability:
 Anomalies occur in flows that traverse multiple links.
 Capturing the spatial correlation across links makes anomalies stand out.
Examine the traffic space spanned by all links:
 Traffic time series across different links are highly correlated.
 Normal traffic can be approximated as occupying a low-dimensional subspace.
 The analysis and detection method is based on Principal Component Analysis (PCA).

18 Principal Component Analysis (PCA)
[Figure: scatter plot of traffic on link 1 vs. traffic on link 2; a traffic vector y splits into a component along the principal axis and a residual off it.]
The principal components (the top eigenvectors of the covariance matrix) and the minor components (the remaining eigenvectors) form the subspace projection matrices C_no and C_ab. Anomalous traffic usually results in a large value of the residual ||C_ab y||.

19 The PCA Method [3][4]
An approach to separate normal from anomalous traffic:
 Normal subspace S_no: the space spanned by the top k principal components.
 Anomalous subspace S_ab: the space spanned by the remaining components.
Then decompose the traffic on all links by projecting onto S_no and S_ab:
y(t) = y_no(t) + y_ab(t),
where y(t) is the traffic vector of all links at a particular point in time, y_no(t) = C_no y(t) is the normal traffic vector, and y_ab(t) = C_ab y(t) is the residual traffic vector.
[3] Lakhina et al. Diagnosing Network-Wide Traffic Anomalies. In ACM SIGCOMM (2004).
[4] Zhang et al. Network Anomography. In IMC (2005).

20 Detection Illustration
[Figure: value of ||y(t)||² over time (all traffic) vs. value of ||y_ab(t)||² over time (the squared prediction error, SPE). The SPE values at the anomaly time points clearly stand out.]

21 The Centralized Algorithm
[Figure: the network's n links periodically push their time series Dat_1(t), ..., Dat_n(t) to the operation center, which assembles an m × n data matrix Dat (m timesteps × n node IDs) and computes the eigenvalues and eigenvectors of its covariance.]
1) Each link produces a column of m measurements over time. 2) The n links produce a row vector y at each time instant. Detection flags time t when the residual ||y_ab(t)||² exceeds a threshold.
This doesn't scale well to large networks or to smaller timescales:
 The number of monitoring devices may grow to thousands.
 Anomalies may occur on second or sub-second timescales.
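
A minimal sketch of this centralized pipeline in Python/NumPy, for illustration only: the matrix name Dat, the subspace dimension k, and the threshold Q_alpha are assumptions, not values from the slides.

```python
import numpy as np

def fit_residual_projector(Dat, k):
    """C_ab: projector onto the anomalous (residual) subspace."""
    X = Dat - Dat.mean(axis=0)        # center each link's time series
    cov = (X.T @ X) / X.shape[0]      # n x n covariance matrix
    _, vecs = np.linalg.eigh(cov)     # eigenvectors, ascending eigenvalues
    P = vecs[:, -k:]                  # top-k principal components
    return np.eye(Dat.shape[1]) - P @ P.T

def spe(y, C_ab):
    """Squared prediction error ||C_ab y||^2 for one measurement row y."""
    r = C_ab @ y
    return float(r @ r)

# Usage (hypothetical data matrix and threshold):
# C_ab = fit_residual_projector(Dat, k=4)
# alarms = [t for t, y in enumerate(Dat) if spe(y, C_ab) > Q_alpha]
```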

22 Outline Motivation & Introduction My Decentralized Detection Framework Detection of Network-Wide Anomalies  The Centralized Algorithm  The Decentralized Detection Summary & Future work

23 My In-Network Detection Framework
[Architecture diagram: distributed monitors filter the original monitored time series Dat_1(t), ..., Dat_n(t) into Mod_1(t), ..., Mod_n(t); the coordinator runs the PCA-based detection, raises alarms, and uses a perturbation analysis driven by the user-specified detection error to feed filter sizes back to the monitors.]

24 The Protocol at the Monitors
Each monitor i sends an update to the coordinator only if its incoming signal drifts outside the filter, i.e. |Dat_i(t) − Mod_i(t)| > δ_i, where the filtering parameters δ_i are adaptively computed by the coordinator.
Mod_i(t) can be based on any prediction model built by monitor i on its data at an update time:
 e.g., the average of the last 5 signal values observed locally.
 Simple, but enough to achieve a 10x data reduction.
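
A sketch of this monitor-side filtering, under the assumption (consistent with the slide's example) that the model is the average of the last 5 local values; the class and field names are invented for illustration.

```python
from collections import deque

class Monitor:
    def __init__(self, delta, window=5):
        self.delta = delta                  # slack, set by the coordinator
        self.recent = deque(maxlen=window)  # local history for prediction
        self.model = 0.0                    # value the coordinator predicts

    def observe(self, value):
        """Process one sample; return an update for the coordinator or None."""
        self.recent.append(value)
        if abs(value - self.model) > self.delta:
            self.model = sum(self.recent) / len(self.recent)
            return self.model               # push new model to coordinator
        return None                         # filtered out: no communication
```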

25 The Protocol at the Coordinator
The coordinator assembles a new row ŷ(t), using the updated value for each monitor that reported and the predicted value Mod_i(t) for each monitor that stayed silent.
Perform the detection using ŷ(t). If there are big changes in the eigen-structure:
 Update the principal components.
 Compute new filtering parameters δ_i.
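
A matching coordinator-side sketch; the update format and the policy for recomputing slacks are assumptions and are left abstract here.

```python
import numpy as np

class Coordinator:
    def __init__(self, n, C_ab, threshold):
        self.models = np.zeros(n)   # last Mod_i received from each monitor
        self.C_ab = C_ab            # residual projector from PCA
        self.threshold = threshold  # detection threshold

    def step(self, updates):
        """updates: {monitor_id: new model value} received this epoch."""
        for i, v in updates.items():
            self.models[i] = v
        y_hat = self.models         # approximate row y(t)
        r = self.C_ab @ y_hat
        return float(r @ r) > self.threshold   # fire an alarm?
```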

26 The Tradeoff
[Figure: PCA on the original data matrix Dat with the original constraint vs. PCA on the filtered data matrix with the modified constraint; how different are the results?]
The bigger the filtering parameter δ_i, the lower the communication overhead, but the larger the detection error!

27 Parameter Design and Error Control (I)
Users specify an upper bound on the detection error; we then determine the monitor parameters δ's.
[Diagram: detection error → tolerable eigen-error (via Monte Carlo and fast binary search) → monitor slacks (via stochastic matrix perturbation theory), bridging data vs. model.]
An algorithmic analysis of the tradeoff between detection accuracy and data communication cost:
 the monitor parameters are determined from the target accuracy
 the technical tool relies on stochastic matrix perturbation theory

28 Parameter Design and Error Control (II)
Let λ_i and λ̂_i be the eigenvalues of the covariance matrices of the original and the filtered data, A and Â. Define the perturbation matrix E = Â − A, and define the eigen-error as the gap between λ̂_i and λ_i. From matrix perturbation theory, we have
|λ̂_i − λ_i| ≤ ||E||₂.
So the key point is to estimate ||E||₂ in terms of the parameters δ's.
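
A quick numerical sanity check of this perturbation bound (Weyl's inequality for symmetric matrices); the matrices below are random stand-ins, not network data.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((6, 6)); A = A @ A.T               # symmetric PSD "covariance"
E = 0.01 * rng.standard_normal((6, 6)); E = (E + E.T) / 2  # symmetric perturbation

lam = np.linalg.eigvalsh(A)          # eigenvalues, ascending order
lam_hat = np.linalg.eigvalsh(A + E)
# Weyl: every eigenvalue moves by at most the spectral norm of E
assert np.all(np.abs(lam_hat - lam) <= np.linalg.norm(E, 2) + 1e-12)
```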

29 Eigen-Error ⇐ Monitor Slacks δ's
Matrix perturbation theory bounds ||E||₂, and hence the eigen-error, as a function of the slacks δ_1, ..., δ_n, where n is the number of monitors and m is the number of data points (Huang'06). Under reasonable assumptions, the coordinator has all the information it needs to compute the δ_i's for all monitors.

30 Detection Error ⇐ Eigen-Error
Basic idea: study how the eigen-error impacts the detection error (Huang'07).
 With full data, the false alarm rate is known exactly.
 With approximate data, we only have a perturbed version of the detection statistic.
 Given the eigen-error, we can compute the false alarm rate (though not in closed form).
Inverse dependency: given a desired false alarm rate, we can determine the tolerable eigen-error by fast binary search, as sketched below.
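
A sketch of that binary search, assuming (as the slide implies) that the false alarm rate is a monotone function of the eigen-error; `false_alarm_rate` is a hypothetical callable supplied by the perturbation analysis.

```python
def tolerable_eigen_error(false_alarm_rate, target, lo=0.0, hi=1.0, iters=30):
    """Largest eigen-error whose predicted false alarm rate stays <= target."""
    for _ in range(iters):
        mid = (lo + hi) / 2
        if false_alarm_rate(mid) <= target:
            lo = mid   # still acceptable: tolerate a larger eigen-error
        else:
            hi = mid   # too many false alarms: shrink the eigen-error
    return lo
```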

31 Evaluation
Given a tolerable deviation of the false alarm rate, we can determine the system parameters. Using those parameters, we evaluate the actual detection accuracy with data-driven emulation.
Experiment setup:
 Abilene backbone network data
 Traffic matrices of size 1008 × 41
 Uniform slack for all monitors

32 Results
Data used: Abilene traffic matrix, 2 weeks, 41 links. The error tolerance is an upper bound on the error vs. the centralized approach.

error tolerance | Missed Detections (Wk 1 / Wk 2) | False Alarms (Wk 1 / Wk 2) | Data Reduction (Wk 1 / Wk 2)
0.01 | 0 / 0 | 0 / 0 | 75% / 70%
0.03 | 0 / 1 | 1 / 0 | 82% / 76%
0.06 | 0 / 1 | 0 / 0 | 90% / 79%

33 Outline Motivation & Introduction My Decentralized Detection Framework Detection of Network-Wide Anomalies  The Centralized Algorithm  The Decentralized Detection Summary & Future work

34 Key Contributions (1/2)
Designed decentralized detection systems that scale well:
 You don't need all the data!
 Detection works with 80+% less data than centralized approaches.
 Enables detection on very small timescales.
Provable mathematical guarantees on errors:
 The detection accuracy can always be bounded mathematically.

35 Key Contributions (2/2)
A general framework for continuous online detection:
 For a wide range of applications.
 The framework is broad:
Distributed information processing: filtering, prediction, adaptive learning, ...
Central decision making: SUM, PCA, Top-K, SVM classification, ...
Constraint definition: fixed value, threshold function, advanced statistics, ...
Adaptive system:
 Algorithms guide the tradeoff between communication overhead and detection accuracy.

36 Capabilities and Future Work

Application | Constraint | Query
Hot-spot detection on server farm (web, DNS) | On sum of workload or traffic rate | SUM
PCA anomaly detection | On quadratic function of traffic volumes | PCA
SVM classification | Online classification | SVM

Future work: an efficient detection system for operational networks; machine learning in systems research; ...

37 My Other Research Work
Tapestry: a scalable and resilient peer-to-peer network infrastructure (JSAC'04):
 System implementation and wide-area deployment
Novel network applications on Tapestry:
 Efficient mobility via overlay indirection (IPTPS'04)
 Fault-tolerant routing (ICNP'03)
 Collaborative spam filtering (Middleware'03)
 Landmark routing on overlay networks (IPTPS'02)

38 References
[Huang'07] Communication-Efficient Tracking of Distributed Cumulative Triggers. L. Huang, M. Garofalakis, A. Joseph and N. Taft. To appear in ICDCS'07.
[Huang'07] Communication-Efficient Online Detection of Network-Wide Anomalies. L. Huang, X. Nguyen, M. Garofalakis, J. Hellerstein, M. Jordan, A. Joseph and N. Taft. To appear in INFOCOM'07.
[Huang'06] In-Network PCA and Anomaly Detection. L. Huang, X. Nguyen, M. Garofalakis, M. Jordan, A. Joseph and N. Taft. In NIPS 19, 2006.
[Huang'06] Toward Sophisticated Detection with Distributed Triggers. L. Huang, M. Garofalakis, A. Joseph and N. Taft. In MineNet 2006.
Questions?

39 Backup Slides

40 Distributed Trigger Contributions
Continuous, online detection and triggering:
 Provides decentralized data processing and management.
 Detects and reacts to constraint violations.
 Detects anomalies on micro/small timescales.
A practical model for specifying detection accuracy.
High performance gain:
 High accuracy with a 10x reduction in communication overhead.

41 Dealing with Approximation
Intuition: the operation center has an "approximate" view of the global data.
 Approximations can lead to errors (false positives/negatives).
 We want to bound these errors.
 We trade off accuracy against communication cost, and make the tradeoff tunable.
We use different statistical analysis tools to explore the tradeoffs and achieve the bounds:
 Idea: use matrix perturbation theory when the global condition is captured by matrix data.
 Idea: use queueing theory, in which queues measure the size of the "violation".

42 Efficient Detection of Network-Wide Anomalies
Trigger an alarm at each time t when the residual statistic ||y_ab(t)||² exceeds its threshold (NIPS'06, INFOCOM'07).

43 An Illustration
Observed network link data = an aggregate of application-level flows; each link is a dimension. The anomalies in the flow data are unobserved. Finding anomalies in high-dimensional, noisy data is difficult!

44 Efficient Detection of Distributed Cumulative Violations
Detection of volume-based anomalies: trigger an alarm when the accumulated penalty (a queue) overflows (ICDCS'07).

45 Sum Problem Setup
Constraints on aggregate conditions over a subset of nodes:
 Accrue a volume penalty when the value exceeds the threshold C.
 Fire the trigger whenever the volume penalty exceeds the error tolerance ε.
Aggregate functions:
 Current work supports simple queries; we focus on SUM and AVG here.
 Extending to MIN, MAX, TOP-K is ongoing work.
 Supporting general and complex functions is future work.

46 Applications
Distributed SUM exceeding a threshold:
 Hot-spot detection for a server farm: trigger an alarm if any set of 20 servers has more than 80% of the total workload of the 100 servers.
 Botnet detection for an enterprise network: trigger an alarm if the packet rate from all hosts in my network to a destination exceeds 200 MB/second.
Cumulative (persistent) violations:
 Server load is spiky, so it makes more sense to look at persistent overload over time.
 Network traffic is shaped and routed according to a queueing model.
 In environment monitoring, radiation exposure accumulates over time.

47 Problem Statement
User inputs:
 Constraint violation threshold: C
 Tolerable error zone around the constraint: ε
 Tolerable false alarm rate: α
 Tolerable missed detection rate: β
GOAL: fire the trigger whenever the penalty exceeds the error tolerance ε
 with the required accuracy level,
 AND with minimum communication overhead (monitor updates).

48 Three Types of Violations
Let V(t, τ) be the size of the penalty at time t over the past window τ.
 Instantaneous violation [5][6]: τ = 1.
 Fixed-window violation: for a user-given fixed τ.
 Cumulative violation: for any τ in [1, t]. Key point: one needs to find the special τ* that maximizes V(t, τ).
[Figure: a time series plotted against the threshold C and the error zone C+ε, with the windows over which each violation type is evaluated.]
[5] Keralapura et al. Communication-Efficient Distributed Monitoring of Thresholded Counts. In ACM SIGMOD (2006).
[6] Sharfman et al. A Geometric Approach to Monitoring Threshold Functions over Distributed Data Streams. In ACM SIGMOD (2006).

49 Detection of Cumulative Violations
Key insight: a cumulative trigger is equivalent to a queue overflow problem.
[Figure: the centralized queueing model. The inputs Dat_1(t), ..., Dat_n(t) feed a queue drained at the threshold rate; the accumulated penalty is the queue backlog, and the trigger fires when the queue overflows.]
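
A toy sketch of that equivalence, assuming discrete time, a drain rate equal to the threshold C, and a queue capacity equal to the tolerance ε; resetting the backlog after a firing is an additional assumption.

```python
def cumulative_trigger(values, C, epsilon):
    """Yield the times at which the penalty queue overflows."""
    backlog = 0.0
    for t, v in enumerate(values):
        backlog = max(0.0, backlog + (v - C))  # enqueue the excess over C
        if backlog > epsilon:
            yield t                            # trigger fires
            backlog = 0.0                      # assumed: reset after firing
```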

50 Outline Motivation & Introduction Detection of Network-Wide Anomalies  The Centralized Algorithm  The Decentralized Detection Detection of Distributed Cumulative Violation  The Problem Definition  The Decentralized Detection Summary & Future work

51 My In-Network Detection Framework
[Architecture diagram: distributed monitors filter the original monitored time series Dat_1(t), ..., Dat_n(t) into Mod_1(t), ..., Mod_n(t); the coordinator performs aggregation/queueing and constraint checking, raises alarms, and adjusts the filter slacks; user inputs: C, ε.]

52 The Distributed Queueing Model
[Figure: (a) the distributed queueing model for cumulative triggers: each monitor i simulates a local queue of size δ_i over its input Dat_i(t) (here, the number of TCP requests) and forwards Mod_i(t) to a coordinator queue; (b) queue-based filtering: between updates the coordinator's view may under-estimate or over-estimate the true input. δ_1, ..., δ_n: monitor queue sizes; the coordinator maintains its own queue.]

53 Queueing Analysis: The Model
Each input is decomposed into two parts:
 continuous enqueueing at the reported rate, and
 discrete enqueueing/dequeueing in chunks bounded by the monitor slack.
How does the detection behavior of the solution model differ from the centralized model?
[Figure: (a) the centralized model; (b) the distributed solution model.]

54 Queueing Analysis: Missed Detection
A missed detection occurs when the centralized model overflows but the solution model does not overflow.

55 Queueing Analysis: False Alarm
A false alarm occurs when the solution model overflows but the centralized model does not overflow.

56 Results: Model Validation
Desired vs. achieved detection performance: missed detection rate β and false alarm rate α.

ε (fraction of C) | Desired β | Achieved β | Desired α | Achieved α
0.1 | 0.01 | 0.007 | 0.02 | 0.010
0.1 | 0.02 | 0.000 | 0.02 | 0.008
0.1 | 0.02 | 0.000 | 0.04 | 0.011
0.2 | 0.01 | 0.000 | 0.02 | 0.016
0.2 | 0.02 | 0.000 | 0.02 | 0.013
0.2 | 0.02 | 0.000 | 0.04 | 0.020

The achieved α and β are always less than the desired α and β, indicating that the analytical model finds upper bounds on the detection performance.

57 Results: The Tradeoff
Parameter design and the tradeoff between false alarms, missed detections and communication overhead.
[Figure: communication overhead vs. false alarm rate and vs. missed detection rate.]
Error tolerance ε = 0.2C. Overhead = # of messages sent / total # of monitoring epochs.

58 Detection Illustration
[Figure: value of ||y(t)||² over time (all traffic) vs. value of ||y_ab(t)||² over time (the squared prediction error, SPE). The SPE values at the anomaly time points clearly stand out.]

59 My Distributed Processing Approach
The user provides:
 a 0-1 detection function (SUM, MAX, PCA, ...)
 a target accuracy level
My approach provides:
 a communication-efficient framework
 a distributed protocol for in-network decision making
 an algorithmic analysis of the tradeoff between detection accuracy and data communication cost: the monitor parameters are determined from the target accuracy, and the technical tool relies on stochastic matrix perturbation theory

60 Parameter Design and Error Control (II)
Let λ_i and λ̂_i be the eigenvalues of the covariance matrices of the original and the filtered data, A and Â. Define the perturbation matrix E = Â − A, and define the eigen-error as the gap between λ̂_i and λ_i. From matrix perturbation theory, we have
|λ̂_i − λ_i| ≤ ||E||₂.
So the key point is to estimate ||E||₂ in terms of the slacks δ's.

61 Eigen-Error ⇐ Monitor Slacks δ's
Matrix perturbation theory bounds ||E||₂, and hence the eigen-error, as a function of the slacks δ_1, ..., δ_n, where n is the number of monitors and m is the number of data points (Huang'06). Under reasonable assumptions, the coordinator has all the information it needs to compute the δ_i's for all monitors.

62 Detection Error ⇐ Eigen-Error
Basic idea: study how the eigen-error impacts the detection error (Huang'07).
 With full data, the false alarm rate is known exactly.
 With approximate data, we only have a perturbed version of the detection statistic.
 Given the eigen-error, we can compute the false alarm rate (though not in closed form).
Inverse dependency: given a desired false alarm rate, we can determine the tolerable eigen-error by fast binary search.

63 Data Acquisition with Statistical Prediction
The prediction model can be any of: 1) last value, 2) simple averaging, 3) ARMA, 4) multi-level prediction, 5) Kalman filtering, etc.
[Flowchart: the dual-module data acquisition mechanism. Monitor side: the streaming source feeds a prediction model; if the prediction falls outside the slack bound, the monitor sends an update to the coordinator and calibrates the model, otherwise it drops the data. Coordinator side: if an update is available from the monitor, use the update value; otherwise request a prediction from a mirrored prediction model; either value feeds aggregation/queueing.]
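
Two minimal predictors matching the first two entries in that list; the `update`/`predict` interface is an assumption, chosen so either class could be plugged into the monitor sketch shown earlier.

```python
class LastValue:
    """Predict the most recently seen value."""
    def __init__(self):
        self.last = 0.0
    def update(self, y):
        self.last = y
    def predict(self):
        return self.last

class MovingAverage:
    """Predict the average of the last w values."""
    def __init__(self, w=5):
        self.buf, self.w = [], w
    def update(self, y):
        self.buf.append(y)
        if len(self.buf) > self.w:
            self.buf.pop(0)
    def predict(self):
        return sum(self.buf) / len(self.buf) if self.buf else 0.0
```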

64 Eigen-Error ⇐ Monitor Slacks δ's (I)
Let W be the filtering error matrix, i.e. the difference between the filtered and the original data. Under standard assumptions on W (its entries are zero-mean and bounded by the slacks), we can bound the resulting perturbation of the covariance matrix, and hence the eigen-error, in terms of the δ's.

65 Detection Error ⇐ Eigen-Error (I)
Consider the normalized detection statistic, a standardized random variable. With approximate data, we only obtain its perturbed version. Given an upper bound on the eigen-error, the deviation of the false alarm rate can be approximated in terms of that bound, which yields an upper bound on the false alarm rate.

66 My In-Network Detection Framework
[Architecture diagram: distributed monitors turn the original monitored time series into processed time series for the coordinator; the user inputs set the accuracy; the coordinator raises anomaly alarms.]

67 Parameter Design and Error Control (I)
Users specify an upper bound on the detection error; we then determine the monitor slacks δ's.
Perturbation analysis: from the detection error to the monitor slacks.

68 Parameter Design and Error Control (II)
Detection error ⇐ eigen-error (Huang'07):
 Study how the eigen-error impacts the detection error.
 Inverse dependency: given the detection error, we can determine the tolerable eigen-error by fast binary search.
Eigen-error ⇐ monitor slacks δ's (Huang'06):
 The coordinator has all the information needed to compute the slacks for the monitors.

69 Results (II)
[Figure: monitor slacks vs. communication cost and detection error.]

70 Parameter Design and Error Control (I)
Given an upper bound α on the false alarm rate, determine the monitor slacks δ's.
Perturbation analysis: from the deviation of the false alarm rate to the monitor slacks.

71 Problem Space and Current Status
[Diagram: the problem space spans violation types (instantaneous triggers, sliding-window triggers, cumulative triggers), query support (SUM, AVG, MIN, MAX; quantile, entropy, histogram, ...), and architecture (one-level, multi-level, P2P distributed), with the current status (yes/no) marked per combination.]

72 Outline Motivation & Introduction Detection of Network-Wide Anomalies  The Centralized Algorithm  The Decentralized Detection Detection of Distributed Cumulative Violation  The Problem Definition  The Decentralized Detection Summary & Future work

73 Adaptive Protocol for Cumulative Triggers
Each monitor simulates a virtual queue of size δ_i. Whenever its local queue under/over-flows, the monitor:
 predicts a new rate,
 sends an update to the coordinator,
 resets its queue and repeats the virtual queue simulation.
The coordinator simulates a virtual queue of its own. On receiving an update, the coordinator:
 en-queues the update (chunks) and adjusts the inflow (rate),
 fires the alarm if the queue gets full,
 resets the queue to 0 if the queue goes below 0,
 updates the parameters.
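
A sketch of the monitor side of this protocol; the payload of an update (flushed chunk plus new rate) and the reset policy follow the slide's description, while the names and the last-value rate predictor are assumptions.

```python
class TriggerMonitor:
    def __init__(self, delta, initial_rate=0.0):
        self.delta = delta        # local virtual queue size
        self.rate = initial_rate  # rate last reported to the coordinator
        self.q = 0.0              # local queue backlog

    def observe(self, value):
        """Return (chunk, new_rate) on local under/over-flow, else None."""
        self.q += value - self.rate
        if abs(self.q) > self.delta:     # local queue over/under-flows
            chunk, self.q = self.q, 0.0  # flush and reset the local queue
            self.rate = value            # assumed: last-value rate predictor
            return chunk, self.rate      # update sent to the coordinator
        return None
```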

74 Queueing Analysis: The Setup
Let us start the analysis with a uniform δ, which is easy to analyze and carries over to the non-uniform case.
 We want δ as large as possible to reduce the communication overhead.
 However, a large δ brings large bursts into the system, which requires a large ε to absorb them.
 The values of δ and ε are constrained by the error targets.
Using queueing theory, we can analyze the overflow probability of the queue, and thus determine the values of δ and ε.

75 Adaptivity and Heterogeneous δ's
Adaptivity: the slacks are recomputed as the data changes. Heterogeneous δ's: after computing the total slack budget, set each monitor's δ_i individually; the optimal allocation is solved by Olston & Widom using a convex optimization approach.

76 Scalability Issues of the Centralized Approach
As the number of monitoring devices grows (up to hundreds or thousands of network data features):
 the central processing site is overloaded;
 certain networks do not overprovision inter-site connectivity.
When anomalies occur on smaller timescales (down to second or sub-second scales):
 the "periodic push" has to run on second or sub-second scales;
 the volume of data transmitted through the network would explode.

