1
A New Paradigm for Distributed Monitoring. Ling Huang (hling@cs.berkeley.edu), Minos Garofalakis and Nina Taft ({minos.garofalakis, nina.taft}@intel.com), and Anthony Joseph (adj@cs.berkeley.edu). Sys Lunch, Feb 2006.
2
Outline: Introduction & Motivation; The Problem Definition; Related Work; The Proposed Solution; The Platform Extensions; The Research Plan.
3
Introduction: Network Monitoring. Large-scale network monitoring and intrusion detection systems consist of distributed and collaborative monitoring boxes that continuously generate time series data. Existing research focuses on data streaming: collecting, storing and aggregating network state, and monitoring and correlating data for trend analysis. This is well suited to answering approximate queries and continuously recording system state. [Figure: an operation center with Monitor 1, Monitor 2 and Monitor 3.]
4
The Need for Distributed Triggers. Streaming-protocol-based approaches suffer from excessive query overhead: they always compute an approximation regardless of system conditions, which wastes resources if applications only care about 0-1 information. I aim to design distributed triggering protocols that trigger alarms based on aggregate conditions and thresholds. Monitoring systems call for a triggering component [Ankur04] that detects and reacts to constraint violations and system anomalies and maintains system-wide logical predicates/invariants. No guarantee is provided.
5
A Typical Example. A set of distributed monitors: each produces a time series signal and sends a filtered version of its signal to the coordinator; there is no communication among monitors. A coordinator X: it is the aggregation, detection and coordination center; it fires the trigger upon violations and informs monitors of the level of accuracy required for signal updates.
6
Streaming vs. Triggering.
Streaming protocols: aim at approximation; accurate system state; rich information for detailed analysis; always incur overhead.
Triggering protocols: aim at detection; 0-1 system state; concise information indicating anomalies; incur overhead only when necessary; provide strong detection guarantee.
8
Problem Setup. Constraints on aggregate: conditions on a subset of nodes; accrue penalty when the aggregate exceeds threshold C; fire the trigger whenever the penalty exceeds the error tolerance. Aggregate function: current work supports simple queries; the focus here is on SUM and AVG; extending to MIN and MAX is ongoing work; supporting general and complex queries is future work.
9
Problem Statement. User inputs: the constraint violation threshold C; the tolerable error zone around the constraint; the tolerable false alarm rate; the tolerable missed detection rate. GOAL: fire the trigger whenever the penalty exceeds the error tolerance, with the required accuracy level AND with minimum communication overhead (monitor updates).
10
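Three Types of Violations. Let V(t, τ) be the size of the penalty, at time t, over the past window τ. Instantaneous violation: the penalty at the current time instant exceeds the error tolerance. Fixed-window violation: the penalty exceeds the error tolerance for a user-given fixed τ. Varying-window violation: the penalty exceeds the error tolerance for any τ in [1, t].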
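The three violation types can be sketched in a few lines of Python. This is a minimal illustration, not the authors' definition: it assumes the penalty V(t, tau) is the total excess of the aggregate over threshold `C` across the last `tau` time steps, and that a violation means the penalty exceeds an error tolerance `eps`. The names `penalty`, `agg`, `eps` and `tau` are hypothetical.

```python
def penalty(agg, C, t, tau):
    """Assumed penalty V(t, tau): sum of (aggregate - C) over the tau steps ending at t.

    agg is a 0-indexed list of aggregate values; t is a 1-indexed time step.
    """
    return sum(agg[k] - C for k in range(t - tau, t))


def instantaneous_violation(agg, C, eps, t):
    # Window of a single time step.
    return penalty(agg, C, t, 1) > eps


def fixed_window_violation(agg, C, eps, t, tau):
    # Window length tau is fixed by the user.
    return penalty(agg, C, t, tau) > eps


def varying_window_violation(agg, C, eps, t):
    # Violation if ANY window length in [1, t] accrues too much penalty.
    return any(penalty(agg, C, t, tau) > eps for tau in range(1, t + 1))


agg = [3, 6, 6, 2]
print(instantaneous_violation(agg, C=4, eps=3, t=3))
print(fixed_window_violation(agg, C=4, eps=3, t=3, tau=2))
print(varying_window_violation(agg, C=4, eps=3, t=3))
```

Under these assumptions the fixed-window and instantaneous checks are special cases of the same penalty function, differing only in how the window length is chosen.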
11
Detection of Varying-Window Violation. Key insight: the varying-window trigger is equivalent to a queue overflow problem. [Figure: the centralized queuing model, showing value, penalty and queue; the trigger fires when the queue overflows.]
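The queue-overflow view can be illustrated with a short sketch. It rests on assumptions not stated on the slide: the queue is drained at rate `C` per time step, filled by the aggregate value, never goes below zero, and "overflow" means its content exceeds a tolerance `eps`. The function name `queue_trigger_times` is hypothetical, and the queue is not reset after firing.

```python
def queue_trigger_times(agg, C, eps):
    """Return the 1-indexed time steps at which the virtual queue exceeds eps.

    Each step enqueues the aggregate value and dequeues C; the queue content
    is clamped at zero, so it tracks the worst penalty over any recent window.
    """
    q = 0.0
    fired = []
    for t, s in enumerate(agg, start=1):
        q = max(0.0, q + s - C)
        if q > eps:
            fired.append(t)
    return fired


print(queue_trigger_times([3, 6, 6, 2], C=4, eps=3))  # fires only at step 3
```

With a non-negative `eps`, the queue content at time t equals the largest windowed excess over all window lengths ending at t (or zero), which is why a single running counter can stand in for checking every window.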
12
The Relationship Between Violation Types. General problem: detecting whether the penalty over a window exceeds the error tolerance. 1) If the window length τ is given, it is the fixed-window version. 2) If the window is a single time instant, it is the instantaneous version. 3) If τ can be any value, it is the varying-window version: penalty violation independent of time, with a strong and strict guarantee.
13
Proposed Research: a distributed triggering system. An open platform to support general queries with general constraints (SUM, MIN, MAX, Quantile, ...), operating on general time series. Controllable detection performance via the user-specified error tolerance, false alarm rate and missed detection rate. Communication-efficient: minimize communication at a given detection performance, and provide flexibility to trade off performance against overhead. Applicable to a broad range of applications.
15
Related work: Database. Data streaming: adaptive filtering from Olston & Widom gives accuracy-bounded answers to simple queries, using adaptive local thresholds to achieve optimal results; sketching streams from Cormode & Garofalakis gives accuracy-bounded answers to general and complex queries. Key difference: I focus on detection instead of approximation. TAG and its follow-ons focus on tree-based in-network processing. PIER brings DB-style queries to Internet scale.
16
Related work: Monitoring and Detection. Lots of progress in distributed monitoring, profiling and intrusion detection: sharing information and fostering collaboration between distributed boxes; systematic coordination for security operations. Little consideration of efficient management of distributed data. Provide examples of why a triggering tool would be useful.
18
Key Contributions Achieved. The first distributed triggering protocol that achieves controllable detection performance and minimizes communication overhead for SUM and AVG queries. A mathematical definition of the distributed triggering problem. A queuing framework, analytical solution and probabilistic guarantee for varying-window triggers. An adaptive protocol and deterministic guarantee for instantaneous and fixed-window triggers. System implementation of instantaneous and varying-window triggers; deployment and evaluation on PlanetLab.
19
Problem Space and Current Status. [Figure: problem space with axes for query support (SUM, AVG, MIN, MAX; Quantile, Entropy, Hist., ...), violation types (instantaneous, fixed-window and varying-window triggers) and architecture (one-level, multi-level, P2P distributed), with Yes/No entries marking current status.]
20
Distributed Trigger Tracking Framework. [Figure labels: user inputs; original monitored time series; distributed monitors; filtered time series; coordinator; alarms.]
21
Solution Overview. Minimize communication cost by: having monitors send as few updates as possible; carefully managing the discrepancy between the coordinator's view of the global state and the actual global state; providing the coordinator with an accurate enough view so that it fires the trigger with the prescribed accuracy. Key idea: filter the monitored signal and don't send an update unless a surprising change has occurred. When far away from the trigger threshold, monitors can afford to be less accurate; the coordinator informs them when they can do this, and by how much.
22
1) Varying-window Triggers: fire an alarm when the queue overflows.
23
The Distributed Queuing Model: distributed queuing model for varying-window triggers. [Figure: (a) distributed queuing model with monitors 1, ..., n, the monitor queue sizes and the coordinator queue size; (b) queue-based filtering of a signal (number of TCP requests), showing under-estimate and over-estimate.]
24
Adaptive Protocol for Varying-win. Triggers. Each monitor simulates a virtual queue of a given local size. Whenever its local queue under- or overflows, the monitor predicts a new parameter value, sends an update to the coordinator, then resets and repeats the virtual queue simulation. The coordinator simulates a virtual queue of a given global size. On getting an update, the coordinator dequeues an amount that depends on the time elapsed since the last update, enqueues or dequeues the reported amount, updates its state, fires the alarm if the queue gets full and, if necessary, re-computes the queue parameters.
25
Queuing Analysis: The Model. Each input is decomposed into two parts: continuous enqueuing at a given rate, and discrete enqueuing/dequeuing of a given size. How is the detection behavior of the solution model different from that of the centralized model? [Figure: (a) the centralized model; (b) the distributed solution model.]
26
Queuing Analysis: The Setup. Let us start the analysis with uniform monitor queue sizes, which are easy to analyze and applicable to the non-uniform case. We want the monitor queue size to be as large as possible to reduce communication overhead. However, a large monitor queue size brings large bursts into the system, which requires a large coordinator queue to absorb them. The values of these parameters are constrained by the error tolerance. Using queuing theory, we can analyze the overflow probability of the queue and thus determine the parameter values.
27
Queuing Analysis: Missed Detection. The centralized model overflows ... the solution model does not overflow!
28
Queuing Analysis: False Alarm. The centralized model does not overflow ... the solution model overflows!
29
Adaptivity and Heterogeneous Monitor Parameters. Adaptivity. Heterogeneous parameters: the optimal setting is solved by Olston & Widom using a convex optimization approach.
30
Results for Varying-Window Triggers: desired vs. achieved detection performance (missed detection rate and false alarm rate). The achieved missed detection and false alarm rates are always less than their targets, indicating that the analytical model finds upper bounds on the detection performance.
31
Results for Varying-Window Triggers: parameter design and the tradeoff between false alarms, missed detections and communication overhead. Error tolerance = 0.2C. Overhead = # of messages sent / total # of monitoring epochs.
32
2) Instantaneous Triggers: fire an alarm if the aggregate violates the threshold C at the current time instant.
33
Adaptive Protocol for Inst. Triggers. Each monitor sends an update to the coordinator if its value moves outside its local slack bound, where the bound is determined by the coordinator. Coordinator X checks the trigger condition, in which the global slack is adaptively computed and optimally split among the monitors. Simply fixing the slack is the data streaming approach.
34
Results for Instantaneous Triggers. [Figure: communication cost of our schemes compared with existing approaches.]
35
The Detection Performance Guarantee. We guarantee a band of uncertainty around threshold C. Theorem: the described protocol guarantees that the trigger (1) always fires if the aggregate lies above the band and (2) never fires if the aggregate lies below the band. Key decision: the tradeoff between communication cost and triggering performance.
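A small simulation can make the band-of-uncertainty guarantee concrete. This is a sketch under stated assumptions, not the protocol from the talk: each monitor reports only when its value drifts more than a fixed local slack from its last report, the coordinator fires when the sum of last-reported values exceeds `C`, and the band is taken to be `C` plus or minus the total slack. The function name and all parameter names are hypothetical.

```python
def run_instantaneous_trigger(signals, slacks, C):
    """Simulate filtered updates and an instantaneous SUM trigger.

    signals[i][t] is the value of monitor i at step t.
    Returns (results, messages) where results is a list of
    (true_sum, fired) pairs and messages counts updates sent.
    """
    n = len(signals)
    steps = len(signals[0])
    reported = [None] * n          # coordinator's last-known value per monitor
    results = []
    messages = 0
    for t in range(steps):
        for i in range(n):
            v = signals[i][t]
            # Monitor-side filter: stay silent while within the local slack.
            if reported[i] is None or abs(v - reported[i]) > slacks[i]:
                reported[i] = v
                messages += 1
        estimate = sum(reported)   # coordinator's view of the aggregate
        true_sum = sum(signals[i][t] for i in range(n))
        results.append((true_sum, estimate > C))
    return results, messages


signals = [[1, 2, 3, 8, 9], [1, 1, 2, 2, 7]]
results, messages = run_instantaneous_trigger(signals, slacks=[1, 1], C=10)
print(results, messages)
```

Because each unreported value is within its local slack of the last report, the coordinator's estimate is within the total slack of the true sum, so in this sketch the trigger cannot miss a sum above `C + total_slack` and cannot fire on a sum below `C - total_slack`. Larger slacks widen the band but cut the number of messages.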
36
Benefit of Adaptive Global Slack. [Figure: input signals; adaptive global slack vs. fixed global slack; band of uncertainty.] Key observation: the adaptive slack is substantially larger than the fixed slack.
38
Extensions. Platform: probabilistic guarantee for instantaneous and fixed-window triggers; supporting general queries with general constraints. Applications: distributed workload alarming system; coordinated end-host profiling & detection system.
40
State-of-the-Art of Profiling & Detection. Profiling network traffic at the gateway using entropy: initial success with entropy metrics on packet headers; has not been applied to end-host profiling. Profiling end-hosts using graphlets: anomalies show up as distinct perturbations in the graph; initial success in detecting scanning, DDoS, ICMP attacks and web service attacks. [Figure: graphlet over the columns srcIP, protocolID, dstIP, srcPort, dstPort, dstIP.]
41
Coordinated End-Host Profiling & Detection. Limitations of the graphlet model: graphlets currently do not support time series; interaction between host and group profiles is thin. Integrating end-host profiling with the triggering system to enable coordinated detection: build time series profiles to facilitate anomaly detection; extend profiling systems by providing underlying triggering support; identify new functionalities for the triggering system. How can profiling for security be improved? How should the triggering system be extended?
43
The Research Plan. Complete solution for simple queries (month 0-3): providing a probabilistic guarantee for inst. triggers; supporting triggers with min and max operations. Applications (month 4-8): end-host profiling to facilitate anomaly detection; triggering on profiles to enable coordinated detection. Solution to support complex queries (month 9-12): sketching techniques; prediction models. Write dissertation and apply for jobs (month 12-18).
44
Thank You! http://www.cs.berkeley.edu/~hling/ hling@cs.berkeley.edu
45
Backup Slides
46
Handle Data Loss: Overview. Local filtering is data loss! Data loss is due to: filtering (voluntary); network delay (involuntary); network congestion (involuntary). Mechanisms: QoS, i.e. priority delivery for monitoring data, whose bandwidth consumption is small and affordable; statistical estimation, i.e. data interpolation and extrapolation with a dual prediction model at both monitors and coordinator.
47
Data Acquisition with Statistical Estimation: The Dual-Module Data Acquisition Mechanism. The prediction model can be any of: 1) last value, 2) simple averaging, 3) ARMA, 4) multi-level prediction, 5) Kalman filtering, etc. [Figure: flowchart. Monitor: the streaming source feeds a prediction model; is the prediction outside the slack bound? Yes: send an update to the coordinator and calibrate the model. No: drop the data. Coordinator: is an update available from the monitors? Yes: use the update value and calibrate the model. No: request a prediction and use the prediction value. The result goes to aggregation/queuing.]
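The dual-module idea can be sketched with the simplest listed model, "last value". This is an illustrative sketch only: the class names `LastValueModel`, `Monitor` and `Coordinator`, the `slack` parameter and the message-passing by return value are all assumptions, and both sides are assumed to run the same model and to calibrate it only on transmitted updates.

```python
class LastValueModel:
    """Predicts that the signal stays at the last calibrated value."""

    def __init__(self):
        self.last = None

    def predict(self):
        return self.last

    def calibrate(self, value):
        self.last = value


class Monitor:
    def __init__(self, model, slack):
        self.model = model
        self.slack = slack

    def step(self, value):
        """Return the value to transmit, or None if the data is dropped."""
        pred = self.model.predict()
        if pred is None or abs(value - pred) > self.slack:
            self.model.calibrate(value)   # prediction outside slack bound
            return value
        return None                       # prediction is good enough


class Coordinator:
    def __init__(self, model):
        self.model = model

    def step(self, update):
        """Return the value used for aggregation at this step."""
        if update is not None:
            self.model.calibrate(update)
            return update
        return self.model.predict()       # no update: fall back to prediction


monitor = Monitor(LastValueModel(), slack=1.5)
coordinator = Coordinator(LastValueModel())
view = [coordinator.step(monitor.step(v)) for v in [5, 6, 8, 8.5, 4]]
print(view)
```

Since both models see exactly the same calibration events, the coordinator's prediction matches the one the monitor tested against, so every silent step is known to be within the slack bound. Swapping in a richer model only requires the same `predict` and `calibrate` methods on both sides.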
48
Handle Network Failure. Detect failure: heartbeats to keep alive. Handle failure: multiple paths to the coordinator; multiple coordinators (a backup coordinator; different triggers on different coordinators); a P2P protocol to maintain a resilient topology. P2P has an embedded tree, gracefully handles nodes joining and leaving, and can exploit alternative paths for fault-tolerant routing.
49
A Paradox. The triggering protocol uses more resources when the system is in a critical state, in which fewer resources are available. Separate the resources for monitoring data and normal traffic. When the system is persistently in a critical state, the coordinator tells monitors that they should not update information unless their states change substantially.
50
3) Fixed-window Triggers: fire an alarm if the penalty over a given fixed window exceeds the error tolerance.
51
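The Transformation. Define the window-based local sum at each monitor; the fixed-window condition then becomes an instantaneous condition on these local sums. So protocols for instantaneous triggers work for fixed-window triggers.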
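The reduction can be illustrated with a sliding-window sum per monitor. This is a sketch under assumptions not given on the slide: the fixed-window penalty is taken to be the excess of the aggregate over `C`, summed across the last `tau` steps, so the trigger becomes an instantaneous check of the summed window sums against `C * tau + eps`. The function names and parameters are hypothetical.

```python
from collections import deque


def window_sums(series, tau):
    """Running sum of the last tau values of one monitor's series."""
    window = deque(maxlen=tau)   # oldest value is evicted automatically
    out = []
    for v in series:
        window.append(v)
        out.append(sum(window))
    return out


def fixed_window_fire_times(signals, C, eps, tau):
    """1-indexed steps at which the fixed-window trigger fires.

    Each monitor transforms its signal into window sums locally; the
    coordinator then applies an instantaneous threshold test to their total.
    """
    local = [window_sums(s, tau) for s in signals]
    fired = []
    for t in range(tau - 1, len(signals[0])):   # skip incomplete windows
        total = sum(sums[t] for sums in local)
        if total > C * tau + eps:
            fired.append(t + 1)
    return fired


signals = [[1, 2, 3, 4], [2, 2, 2, 2]]
print(fixed_window_fire_times(signals, C=5, eps=0.5, tau=2))
```

The windowing happens entirely at the monitors, so any filtering scheme built for instantaneous values can be applied to the window sums unchanged.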
52
Framework for Fixed-window Triggers: window-based local sum, then filtering.
53
Examples. Enterprise security operations: the distributed monitors are IDS boxes; the coordinator is the global log repository and analysis center inside the security operations center. ISP IT teams: monitors on each link; a network operation center pulls data to detect hot spots, failures and attacks, and to check when upgrades are needed. The monitored time series can be: number of TCP requests; number of DNS transactions; traffic volume on port 80; ...
54
Queuing Analysis: Some Intuitions. A large monitor queue size enables more local smoothing, which reduces the communication cost. However, it may absorb too much update "space", thus causing missed detections, or make the system globally bursty, thus causing false alarms. A missed detection happens when the queue in the centralized model overflows (real violation) but the queue in the solution model does not (no alarm). A false alarm happens when the queue in the centralized model does not overflow (no violation) but the queue in the solution model overflows (fires alarm).