Download presentation
Presentation is loading. Please wait.
Published byAmberlynn Cameron Modified over 9 years ago
1
1 D-Trigger: A General Framework for Efficient Online Detection Ling Huang* XuanLong Nguyen* Minos Garofalakis ◊ Joe Hellerstein* Michael Jordan* Anthony Joseph* Nina Taft § *UC Berkeley § Intel Research ◊ Yahoo! Research
2
2 Overview slide DC spec compiler logical config physical config Policy-aware Switching Layer monitoring data Ruby on Rails interpreter VM monitor local OS functions Trace/data collection web svc APIs Web 2.0 apps per node SW stack AWE drivers policy verification D-Trigger Trace/data collection D-Trigger monitoring data
3
3 RIOT Framework RADLab Integrated Observation via Tracing Trace Data Instrumented Calls libAsync Applications Distributed Reporting RoRlibLog Distributed Replay Policy Verification Performance XMLdb Queries Instrumented VMs Troubleshooting Augmented Logging Distributed Triggering Real Time Alarms / Actuation libAsynclibLog Instrumented VMs Instrumented Calls RoR Applications Augmented Logging Trace Data Distributed Triggering Distributed Reporting Distributed Replay Performance XMLdb Queries Real Time Alarms / Actuation Troubleshooting Policy Verification
4
4 Outline Motivation & Introduction Our Decentralized Detection Framework Efficient Detection of Cumulative Violations –Problem definition –Decentralized Detection Summary & Future work
5
5 Operation Center Traditional Distributed Monitoring Large-scale network monitoring and detection systems –Distributed and collaborative monitoring boxes Netflow modules, Snort sensors, firewalls, … –Continuously generating time series data Existing research focuses on data streaming –Centrally collect, store and aggregate network state –Well suited to answering approximate queries and continuously recording system state –Incur high overhead! Monitor 1 Monitor 2 Monitor 3 Local Network 2 Local Network 3 Local Network 1 Bandwidth Bottleneck! Overloaded!
6
6 Do machines in my network participate in a Botnet to attack other machines? victim command & control Send at rate.95*allowed to victim X V Operation Center Coordinated Detection! Data: Byte rate on a network link. CAN’T AFFORD TO DO THE DETECTION CONTINUOUSLY ! For efficient and scalable detection, push data processing to the edge of network! Approximate but accurate detection! Detection Problems in Enterprise Network
7
7 Moving Towards Decentralized Detection Today: Distributed Monitoring & Centralized Computation –Stream-based data collection (push) –Evaluates sophisticated detection function over periodical monitoring data –Doesn’t scale well in numbers of nodes or to smaller timescales – high bandwidth and/or central overhead Tomorrow: Distributed Monitoring & Decentralized Computation –Evaluates sophisticated detection function over continuous monitoring data in a decentralized way –Provides low-overhead, rapid response, high accuracy, and scalability
8
8 Decentralized detection system –Distributed information processing & central decision making –Detects violations/anomalies on micro/small timescale –Open platform to support a wide-range of applications General detection functions –SUM, MIN, MAX, PCA, SVM, TOP-K,…… Operation on general time series –Detection accuracy controllable by a “tuning knob” Provide user-controllable tradeoff between accuracy and overhead –Communication-efficient Minimize communication at given detection accuracy Research Goals and Achievements
9
9 Outline Motivation & Introduction Our Decentralized Detection Framework Efficient Detection of Cumulative Violations –Problem Definition –Decentralized Detection Summary & Future work
10
10 The System Setup A set of distributed monitors –Each produces a time series signals Dat i (t) –Send minimal information to coordinator –No communication among monitors A coordinator X –Is aggregation, correlation and detection center –Performs detection –X tells monitors the level of accuracy for signal updates
11
11 New Ideas for Efficiency and Scalability Local Processing – push tracking capabilities to edges –Monitors do ‘continuous’ monitoring and local processing –Use filtering scheme to decide when to push information to operation center (i.e. only ‘when needed’). Dealing with approximations – algorithms resident at operation center for doing accurate detection in the face of limited data/information A framework for putting the above together in a single system –Implemented as protocols that govern actions between monitors and operation center, and manage adaptivity.
12
12 Local Processing by Filtering Filters x “push” Filters x adjust Hosts/Monitors Operation Center Don’t send all the data! Don’t need most of them anyway –Our applications are anomaly detectors –Most of the traffic is normal so don’t need to send. Ideally hosts should only be sending rare events Detection Accuracy
13
13 Our Decentralized Detection Framework Dat 1 (t) Dat 2 (t) Dat n (t) Detection Function Perturbation Analysis Feedback: Filter Sizes original monitored time series Coordinator Alarms user inputs: detection error Distr. Monitors Mod 1 (t) Mod 2 (t) Mod n (t) Linear Function Queue Overflow Detection Classificati- on Function Outlier Detection
14
14 Outline Motivation & Introduction Our Decentralized Detection Framework Efficient Detection of Cumulative Violations –Problem Definition –Decentralized Detection Summary & Future work
15
15 Efficient Detection of Distributed Cumulative Violations Detection of volume-based anomalies; trigger an alarm when overflows (ICDCS’07)
16
16 Problem Setup Constraints on aggregate conditions of subset of nodes –Accrue volume penalty when value exceeds threshold C –Fire trigger whenever volume penalty exceeds error tolerance Aggregate function –Current work supports simple queries Focus on SUM and AVG here Extending to MIN, MAX, TOP-K as ongoing work –Future work to support general and complex functions Sum
17
17 Applications Distributed SUM exceeding threshold –Hot-spot detection for server farm Trigger an alarm if any set of 20 servers have workload more than 80% of the workload of the total 100 servers –Botnet detection for enterprise network Trigger an alarm if the packet rate from all hosts in my network to a destination exceeds 200 MB/second Cumulative (persistent) violation –Sever load are spiky, and it makes more sense to look at persistent overload over time –Network traffic is shaped and routed according to queueing model –In environment monitoring, radiation exposure are accumulated over time
18
18 Problem Statement User Inputs: – Constraint violation threshold: C – Tolerable error zone around constraint: – Tolerable false alarm rate: – Tolerable missed detection rate: GOAL: fire trigger whenever penalty exceeds error tolerance –with required accuracy level –AND with minimum communication overhead (monitor updates)
19
19 Let V(t, ) be size of penalty, at time t, over past window Instantaneous violation [3][4] Fixed-window violation Cumulative violation 4 Three Types of Violations for a any in [1, t] for a user given fixed > < > Key point: One needs to find a special * which maximize V(t, ) 1 5 t C C+ > < >
20
20 Detection of Cumulative Violation Key insight: Cumulative trigger is equivalent to a queue overflow problem The centralized queuing model Dat 1 (t) Dat 2 (t) Dat n (t)
21
21 Outline Motivation & Introduction Our Decentralized Detection Framework Efficient Detection of Cumulative Violations –Problem Definition –Decentralized Detection Summary & Future work
22
22 Aggreg./ Queueing Our In-Network Detection Framework Dat 1 (t) Dat 2 (t) Dat n (t) Constraint Checking Adjust Filter Slacks original monitored time series Mod 1 (t) Mod 2 (t) Mod n (t) Coordinator Alarms user inputs: C, Distr. Monitors
23
23 Mod 1 (t) Mod 2 (t) Mod n (t) Dat 1 (t) Dat 2 (t) Dat n (t) The Distributed Queuing Model Distributed queuing model for cumulative triggers (b) Queue-based filtering (a) Distributed queuing model under-estimate over-estimate 1,..., n : monitor queue size; coordinator queue size Number of TCP Requests Dat 1 (t) Dat 2 (t) Dat n (t)
24
24 Queuing Analysis: The Model Each input is decomposed into two parts Continuous enqueuing with rate Discrete enqueuing/dequeuing with size How is the detection behavior of solution model different from centralized model? (a) The centralized idea model(b) The Distributed solution model
25
25 Queuing Analysis: Missed Detection The centralized model overflows … The solution model does not overflow! Queueing model and assumptions! Tell people there is one equation.
26
26 Queuing Analysis: False Alarm The solution model overflows! The centralized model does not overflow …
27
27 Experimental Setup Collecting trace data from 200 Planetlab nodes using SNORT sensors Evaluation carried out for using time series of “number of active TCP connections” on 1-minute interval Randomly selecting 40 data streams, each with 4000 data points Threshold C = 90 percentile of the aggregate data
28
28 Results: Model Validation Desired vs. achieved detection performance missed detection rate false alarm rate Achieved and are always less than desired and indicating that analytical model find upper bounds on the detection performance. 0.10.010.0070.020.010 0.10.020.0000.020.008 0.10.020.0000.040.011 0.20.010.0000.020.016 0.20.020.0000.020.013 0.20.020.0000.040.020 DesiredAchievedDesiredAchieved
29
29 Results: The Tradeoff Parameters design and tradeoff between false alarm, missed detection and communication overhead Error tolerance = 0.2C Overhead = # of messages sent / total # of monitoring epochs False Alarm Rate Missed Det. Rate False Alarm Rate Missed Det. Rate
30
30 Outline Motivation & Introduction Our Decentralized Detection Framework Efficient Detection of Cumulative Violations –Problem Definition –Decentralized Detection Summary & Future work
31
31 Summary A communication-efficient framework that –detects anomalies at desired accuracy level –with minimal communication cost A distributed protocol for data processing –local monitors decide when to update data to coordinator –coordinator makes global decision and feedback to monitors An algorithmic framework to guide the tradeoff between detection accuracy and overhead –Detection accuracy is always provable mathematically
32
32 Future Work Efficient detection framework for distributed systems –Extension to more sophisticate classification and detection functions –Development better prediction algorithms –Exploration of spatial correlation among data streams between monitors –Detection on hierarchical topology Going beyond detection –Fault identification, diagnosis, etc.
33
33 http://radlab.cs.berkeley.edu/wiki/Dtrigger Questions
34
34 Backup Slides
35
35 The Components of D-Triggers Decentralized Detection Linear Functions, Fixed Thresholds Detection of Cumulative Violations (ICDCS’07) Detection of Inst. Violations (MINENET’06) Sophisticated Detection Detection of Network- Wide Anomalies (NIPS’06, INFOCOM’07) Online Continuous Classification (Ongoing …) Detection of Inst. Violations (MINENET’06) Detection of Cumulative Violations (ICDCS’07) Detection of Network- Wide Anomalies (NIPS’06, INFOCOM’07) Online Continuous Classification (Ongoing …) Principal Component Analysis Support Vector Machine SUM + Top-K Queueing Model
36
36 Let start the analysis with all monitors having same local queue size easy for analysis, applicable to the non-uniform case We want as large as possible to reduce the communication overhead However, large brings large bursts into the system, which requires a large to absorb the burst Values of and are constrained by the error Using queuing theory, we can analyze the overflow probability of the queue, thus determining the values of and Queuing Analysis: The Intuition
37
37 Key Contributions (1/2) Designed decentralized detection systems that scale well –You don’t need all the data! –Can do detection with 80+% less than others –Enable the detection on very small time scales Provable mathematical guarantees on errors –Detection accuracy is always provable mathematically
38
38 Key Contributions (2/2) General Framework for continuous online detection –For wide-range of applications –Framework is broad: Distributed information processing –Filtering, prediction, adaptive learning, … Central decision making –Cumulative Sum, PCA and Top-k, SVM Classification, … Constraint definition –Fixed value, threshold function, advanced statistics, … Adaptive system –Algorithms guide the tradeoff between comm. overhead and detection accuracy
Similar presentations
© 2024 SlidePlayer.com. Inc.
All rights reserved.