Published by Damian Tyler. Modified over 9 years ago.
1
RelSamp: Preserving Application Structure in Sampled Flow Measurements
Myungjin Lee, Mohammad Hajjat, Ramana Rao Kompella, Sanjay Rao
2
A plethora of Internet applications
Objectives:
- Re-provision networks
- Detect undesirable behaviors of applications
- Prepare network better against major application trends
[Figure: Internet, with a cycle of 1) Emergence of new applications, 2) Measure/Monitor, 3) Characterization]
3
Monitoring applications at an edge
Goal: Monitoring application behavior
- Identify number of flows
- Identify number of packets
Current solution: Sampled NetFlow
- Supported by most modern routers
Key limitation: Application session structure gets distorted
- Small # of flows per application session
- Small # of packets per application session
[Figure: Enterprise Network, Edge Router running Sampled NetFlow, Internet]
4
Preserving application structure in flow measurements
Benefit 1: Enables continuous monitoring of applications
- Better understanding about communication patterns
- Better understanding of characteristics (# of flows, packets)
Benefit 2: Application classification becomes easier
- Statistical machine learning techniques: SVM, C4.5, etc.
- Social behavior-based classifier: BLINC
Benefit 3: Detecting undesirable traffic patterns of an application
5
Contributions
Introduce the notion of related sampling
- Flows belonging to the same application session are sampled with higher probability
Propose the RelSamp architecture for realizing related sampling
- Uses three stages of sampling to preserve application structure
Show efficacy in preserving application structure
- Captures more flows per application session
- Significant increase in application classification accuracy
6
Related sampling
[Figure: flows of App1, App2, and App3 shown for the original application structure, Sampled NetFlow, and related sampling]
Key idea: Sample more flows from fewer application sessions
7
Realizing related sampling
Question 1: How to sample an application session?
Question 2: How to sample packets within an application session?
8
Defining application session
A sequence of packets from an application on a given host with inter-arrival time ≤ τ seconds
- Packets may belong to different flows to different destinations
Example 1: BitTorrent connections to several destinations within a short span of time constitute an application session
Example 2: Web connections from a browser several seconds apart constitute different application sessions
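The inter-arrival rule in this definition can be illustrated with a short sketch. It is not the authors' implementation: the function name `split_sessions` and the `(timestamp, host, flow_key)` packet tuple are hypothetical, and it assumes each host runs a single application, so that grouping by host stands in for grouping by application.

```python
from collections import defaultdict


def split_sessions(packets, tau):
    """Group (timestamp, host, flow_key) packets into per-host sessions.

    Assumption: one application per host. A new session starts whenever
    the gap since the host's previous packet exceeds tau seconds.
    """
    sessions = defaultdict(list)  # host -> list of sessions (lists of packets)
    last_seen = {}                # host -> timestamp of its previous packet
    for ts, host, flow_key in sorted(packets):
        if host not in last_seen or ts - last_seen[host] > tau:
            sessions[host].append([])  # gap too large: open a new session
        sessions[host][-1].append((ts, flow_key))
        last_seen[host] = ts
    return dict(sessions)


# Hypothetical trace: h1 is silent between t=1.0 and t=10.0.
trace = [(0.0, "h1", "f1"), (1.0, "h1", "f2"), (10.0, "h1", "f3"), (0.5, "h2", "f4")]
by_host = split_sessions(trace, tau=5.0)
```

With τ = 5 seconds, the two early packets of h1 fall into one session even though they belong to different flows, and the packet at t = 10.0 opens a second session.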
9
Sampling an application session
One possible approach: Similar to Sampled NetFlow
- Sample packets with some probability
- Create an application session record if no record exists
- Update the application session record
Problem: Hard to do in an online fashion
- No application session identifier (like flow key)
- Need to know all flows that constitute an application session
- DPI-based techniques are both difficult and incomplete
10
Our approach: sampling hosts
Observation: A host is a superset of an application session
- Sample more flows from the same host
- Flows originating at the same host close together in time typically belong to only a few application sessions
- About 80% of hosts run fewer than 2 applications in our study
More details in the paper
11
RelSamp design
Three-stage sampling process consisting of host, flow, and packet selection stages
Host stage: hash-based sampling
- No state maintained on a per-application basis
- Many application sessions for a given host are possibly sampled
- Change hash function periodically to track different hosts
Flow and packet stages: random packet sampling
- Controls fraction of flows sampled in an application session and packets sampled in a flow
Post processing:
- Can separate flow records into application sessions using port-based/statistical classifiers
12
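RelSamp architecture
[Figure: packets pass through a host-level bias stage, a flow-level bias stage, and a pkt-level bias stage, then reach flow memory; paths labeled 1 and 2, with a "Copy" label]
Tunable parameters: P_h, P_f, P_p
Host-level bias stage: H(SrcIP) is compared against a selection range in the hash space; P_h = selection range / hash space
Flow-level bias stage: if (random no. ≤ P_f && no flow record) create a flow record
Pkt-level bias stage: if (random no. ≤ P_p && flow record) update the flow record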
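The three stages on this slide can be put together in a minimal sketch. Everything beyond the two conditions quoted on the slide is an assumption: the class name `RelSampSketch`, the use of CRC32 over a seeded string as H(SrcIP), the 2^32 hash space, the record layout, and the choice to count the packet that creates a flow record. The `seed` argument is a hypothetical stand-in for changing the hash function periodically.

```python
import random
import zlib

HASH_SPACE = 2 ** 32  # assumed hash space; zlib.crc32 returns values in [0, 2**32)


class RelSampSketch:
    """Illustrative three-stage sampler: host, flow, then packet selection."""

    def __init__(self, p_h, p_f, p_p, seed=0, rng=None):
        self.selection_range = int(p_h * HASH_SPACE)  # P_h = range / hash space
        self.p_f = p_f
        self.p_p = p_p
        self.seed = seed                   # change to track a different host set
        self.rng = rng or random.Random()
        self.flow_memory = {}              # flow_key -> {"packets", "bytes"}

    def host_selected(self, src_ip):
        # Host stage: stateless and deterministic, so every packet from a
        # selected host passes while the seed stays the same.
        h = zlib.crc32(f"{self.seed}:{src_ip}".encode())
        return h < self.selection_range

    def process(self, src_ip, flow_key, size):
        """Return True if the packet created or updated a flow record."""
        if not self.host_selected(src_ip):
            return False
        record = self.flow_memory.get(flow_key)
        if record is None:
            # Flow stage: random no. <= P_f and no flow record -> create one.
            if self.rng.random() <= self.p_f:
                self.flow_memory[flow_key] = {"packets": 1, "bytes": size}
                return True
            return False
        # Packet stage: random no. <= P_p and a record exists -> update it.
        if self.rng.random() <= self.p_p:
            record["packets"] += 1
            record["bytes"] += size
            return True
        return False


sampler = RelSampSketch(p_h=0.1, p_f=0.9, p_p=0.05, seed=1)
sampler.process("10.0.0.1", ("10.0.0.1", "192.0.2.7", 6881), 1500)
```

Because the host stage is a hash rather than a coin flip, a low P_h with a high P_f concentrates the sampling budget on many flows from few hosts, which is the bias that related sampling aims for.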
13
Exploring parametric space
Router sampling budget: P_e = f(P_h, P_f, P_p)
- Trade-off between accuracy of flow statistics and # flows/application session
Parameters can be tuned depending on:
- Objective
- Network environment
Examples of tuning parameters by objective:
- Application classification: low P_h, high P_f, low P_p
- Application characterization: lower P_h, high P_f, high P_p
- Flow statistics of all flows: P_h = P_f = P_p = P_e
14
Evaluation goals
Application characterization
- Question 1: Is RelSamp effective at sampling more flows in an application session?
- Question 2: Can RelSamp estimate statistics of an application session?
Application classification
- Question 3: Is sampling more flows in an application session beneficial for application classification?
15
Experimental setup
16
Flows per application session
[Figure: CDF of # captured flows / # total flows in an app session; annotation: more # of flows per app session]
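The metric plotted on this slide, the fraction of an application session's flows that the sampler captured, can be computed along the following lines. This is an illustrative sketch rather than the evaluation code behind the figure: the function names and the dictionary layout (session identifier mapped to a set of flow keys) are assumptions.

```python
def capture_fractions(total_flows, captured_flows):
    """Per session, return # captured flows / # total flows.

    Both arguments map a session id to a set of flow keys; sessions
    absent from captured_flows count as having no captured flows.
    """
    return [
        len(captured_flows.get(session, set()) & flows) / len(flows)
        for session, flows in total_flows.items()
        if flows  # skip empty sessions to avoid division by zero
    ]


def empirical_cdf(values):
    """Return (x, F(x)) pairs, where F(x) is the share of values <= x."""
    xs = sorted(values)
    n = len(xs)
    return [(x, (i + 1) / n) for i, x in enumerate(xs)]


# Hypothetical data: session s1 has 4 flows, 2 of them captured.
total = {"s1": {"a", "b", "c", "d"}, "s2": {"e", "f"}}
captured = {"s1": {"a", "b"}}
cdf = empirical_cdf(capture_fractions(total, captured))
```

A sampler that preserves more of each session shifts this CDF to the right, since fewer sessions end up with a small captured fraction.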
17
Accuracy of BLINC classifier
[Figure: accuracy (%) vs. sampling rate; annotation: ~50% increase]
Note: classification results on flows using non-standard port
18
Related work
Flow Sampling [ToN ’06]
- Samples flows once flow record is created
Flow Slices [IMC ’05]
- Focuses on controlling router resources (CPU and memory)
cSamp [NSDI ’08]
- Supports sampling of all traffic by coordinating various vantage points in a network
FlexSample [IMC ’08]
- Supports monitoring of traffic subpopulations, but needs to maintain extra state for approximate checking of predicates
19
Summary
Introduced the notion of related sampling
- Samples related flows in the same application session with higher probability
Proposed the RelSamp architecture
- Preserves application structure in sampled flow records
Effective at preserving application session structure
- 5-10x more flows per application session compared to Sampled NetFlow
- Up to 50% higher classification accuracy than Sampled NetFlow
20
Thank you! Questions?
21
Evaluation method of classification techniques
[Figure labels: Packet Trace; DPI-based Classifier; Tstat; Ground Truth; RelSamp, Sampled NetFlow, Flow Sampling; Flow Record1, Flow Record2, Flow Record3; Classification Algorithm (e.g., BLINC, SVM, C4.5); Report]
22
Comparison with other solutions using BLINC
[Figure: # of accurately classified flows vs. sampling rate]
Note: classification results on flows using non-standard port