Published by Lana Queen. Modified over 9 years ago.
1
Data Streaming Algorithms for Accurate and Efficient Measurement of Traffic and Flow Matrices
Qi Zhao*, Abhishek Kumar*, Jia Wang+ and Jun (Jim) Xu*
*College of Computing, Georgia Tech
+AT&T Labs - Research
2
Traffic and flow matrices
Traffic matrix TM: TM[i, j] = traffic volume from node i to node j
Useful in:
  Capacity planning and forecasting
  Routing configuration
  Network fault/reliability diagnoses
  Provisioning for SLA
Flow matrix FM: FM[i, j, f] = the size of flow f from node i to node j
Useful in:
  Computing the usage patterns of ISPs
  Detecting flapping routes
  Detecting DDoS attacks
3
Existing approaches
Traffic matrix
  Indirect inference (holistic): link counts from SNMP, routing matrix, network model
  Direct measurement: sampling, our approach
Flow matrix
  Not well studied yet
  Straightforward approach: sampling
4
Data streaming algorithms
Data streaming: processing a long stream of data items in one pass, using a small working memory, in order to answer a class of queries regarding the stream.
Our context:
  Packet arrival rate is high (e.g., 10-40 Gbps).
  Small but fast memory, SRAM (10 ns per access), will be used.
Challenge: how to fully use SRAM to remember as much information pertinent to the traffic/flow matrix as possible?
5
Two data streaming schemes
The bitmap-based scheme: traffic matrix
The counter array-based scheme: flow matrix, traffic matrix
6
System model
[Figure: node i and node j each run an online streaming module; a server runs the data analysis module.]
7
The bitmap-based scheme Online streaming module Data analysis module
8
Online streaming module The data digest data-structure is a bit array (bitmap) initially set to all 0’s. It is updated upon each packet arrival. Measurement proceeds in epochs.
9
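Example
[Figure: an arriving packet is hashed by H(.) to one of the bitmap positions 0, 1, 2, ..., i, ..., b-1, and the bit at that position changes from 0 to 1.]
Hash input: the invariant packet header + the first 8 bytes of the payload. [Snoeren et al. SIGCOMM'01] shows that these 28 bytes are sufficient to differentiate almost all non-identical packets.
Update: U := U-1. If U/b < Threshold, save the bitmap and start a new epoch.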
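The update rule on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `BitmapDigest` and its fields are hypothetical names, SHA-256 from `hashlib` stands in for the unspecified hash H(.), U is taken to be the number of bits still 0 (so it is decremented only when a bit actually flips), and the caller is assumed to pass in the 28 invariant bytes of each packet.

```python
import hashlib


class BitmapDigest:
    """Hypothetical sketch of the per-node bitmap digest (not the authors' code)."""

    def __init__(self, b, threshold):
        self.b = b                  # bitmap size in bits
        self.threshold = threshold  # epoch ends when U/b drops below this
        self.saved = []             # bitmaps of completed epochs
        self._reset()

    def _reset(self):
        # One byte per bit keeps the sketch readable; real SRAM would pack bits.
        self.bits = bytearray(self.b)
        self.zeros = self.b         # U: assumed to count the bits still 0

    def update(self, invariant_bytes):
        # Assumption: SHA-256 plays the role of H(.).
        digest = hashlib.sha256(invariant_bytes).digest()
        idx = int.from_bytes(digest[:8], "big") % self.b
        if self.bits[idx] == 0:
            self.bits[idx] = 1
            self.zeros -= 1         # U := U-1
        if self.zeros / self.b < self.threshold:
            self.saved.append(bytes(self.bits))  # save the bitmap
            self._reset()                        # start a new epoch
```

Each packet costs one hash computation and at most one memory write, which matches the per-packet cost the scheme aims for.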
10
Complexities
Computational complexity (per packet):
  One hash function computation
  One write to the memory
Storage complexity:
  Each packet only produces a little more than one bit as its digest.
  This can be further reduced using sampling.
11
The bitmap-based scheme Online streaming module Data analysis module
12
What do we have so far (for TM[i, j])?
BM_i, generated by the traffic at node i (T_i), and BM_j, generated by the traffic at node j (T_j).
What we want to estimate: $TM[i, j] = |T_i \cap T_j|$
13
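Estimation based on BM_i and BM_j
[Whang et al. 1990] proposed a method to infer |T| from BM, i.e., $\widehat{|T|} = b \ln(b / U_T)$, where b is the size of BM in bits and $U_T$ is the number of "0"s in BM.
$|T_i \cup T_j|$ can be inferred from the bitwise-OR of BM_i and BM_j.
An estimator of TM[i, j] is given by $\widehat{TM}[i, j] = \widehat{|T_i|} + \widehat{|T_j|} - \widehat{|T_i \cup T_j|}$.
We derive the variance of the estimator.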
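As a rough illustration of this estimation step, the sketch below applies the linear-counting estimate b ln(b/U) of [Whang et al. 1990] to each bitmap and to their bitwise-OR, then combines the three by inclusion-exclusion. The function names are hypothetical, bitmaps are represented as lists of 0/1 integers for readability, and both nodes are assumed to use the same hash function and the same bitmap size b.

```python
import math


def estimate_cardinality(bitmap):
    """Linear-counting estimate b * ln(b / U), with U the number of 0 bits."""
    b = len(bitmap)
    zeros = bitmap.count(0)
    if zeros == 0:
        raise ValueError("bitmap is full; the estimate is undefined")
    return b * math.log(b / zeros)


def estimate_tm(bm_i, bm_j):
    """Estimate |T_i ∩ T_j| as |T_i| + |T_j| - |T_i ∪ T_j|."""
    if len(bm_i) != len(bm_j):
        raise ValueError("bitmaps must have the same size")
    # The bitwise-OR of the two bitmaps is the bitmap of the union T_i ∪ T_j.
    union = [x | y for x, y in zip(bm_i, bm_j)]
    return (estimate_cardinality(bm_i)
            + estimate_cardinality(bm_j)
            - estimate_cardinality(union))
```

Because the three terms are independent estimates, the result can come out slightly negative when the true overlap is small relative to the bitmap noise; a caller may want to clamp it at zero.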
14
Multipaging
[Figure: timelines for node i and node j, each split into numbered bitmap epochs, with time points t1 and t2 marked.]
15
Eliminating the effects of clock offset and packets in transit
[Figure: timelines for node i and node j, each split into numbered epochs, with an interval t marked.]
T1: a tight upper bound on the clock offset (e.g., 50 ms in an NTP-enabled network). If t < T1, then overlap(1, 2) = 1.
Combining with packets in transit: T2 is a tight upper bound on the packet traversal time. If t < T1 + T2, then overlap(1, 2) = 1.
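The overlap rule can be written as a small predicate. The slide does not define t precisely, so this sketch assumes t is the time gap between two epochs as measured by each node's local clock (zero if the intervals intersect); the function name and the `(start, end)` tuple representation are hypothetical.

```python
def overlap(epoch_a, epoch_b, t1, t2):
    """Return 1 if two epochs should be treated as overlapping, else 0.

    epoch_a, epoch_b: (start, end) times in seconds on the local clocks.
    t1: upper bound on clock offset; t2: upper bound on packet traversal time.
    Assumption: t is the gap between the two intervals (0 if they intersect).
    """
    start_a, end_a = epoch_a
    start_b, end_b = epoch_b
    gap = max(0.0, start_b - end_a, start_a - end_b)
    return 1 if gap < t1 + t2 else 0
```

For example, with T1 = 50 ms and T2 = 20 ms, two epochs separated by 30 ms are still treated as overlapping, while epochs separated by half a second are not.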
16
Counter array based scheme Online streaming module Data analysis module
17
Online streaming module The data digest data-structure is a counter array. It is updated upon each packet arrival. Measurement proceeds in epochs.
18
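Example
[Figure: the flow label of an arriving packet is hashed by H(.) to one of the counters 0, 1, 2, ..., i, ..., b-1, and that counter is incremented from n to n+1.]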
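A minimal sketch of this per-packet update follows. `CounterArrayDigest` is a hypothetical name, SHA-256 again stands in for the unspecified hash H(.), and the flow label is assumed to be supplied as bytes (for instance, an encoded 5-tuple, which is an assumption rather than something the slide states).

```python
import hashlib


class CounterArrayDigest:
    """Hypothetical sketch of the per-node counter array (not the authors' code)."""

    def __init__(self, b):
        self.counters = [0] * b  # one counter per hash bucket

    def index_of(self, flow_label):
        # Assumption: SHA-256 plays the role of H(.).
        digest = hashlib.sha256(flow_label).digest()
        return int.from_bytes(digest[:8], "big") % len(self.counters)

    def update(self, flow_label):
        """Count one packet of the given flow: n becomes n+1."""
        idx = self.index_of(flow_label)
        self.counters[idx] += 1
        return idx
```

If ingress and egress nodes share the same hash function, packets of one flow land at the same index everywhere, which is what makes counter values comparable across nodes in the analysis step.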
19
Counter array based scheme Online streaming module Data analysis module
20
Principle: find a good counter-value matching between ingress nodes and egress nodes.
Challenge: hash collisions make one-to-one matching fail.
Method: iterative elephant-first matching.
Accuracy: works well for the medium-to-large flow matrix elements, due to the Zipfian nature of Internet traffic.
21
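Elephant-first matching
Counter index K holds value a1 at node i and value a2 at node j.
If a1 > a2: FM[i, j, f] = a2; the counter at node i becomes a1 - a2 and the counter at node j becomes 0.
If a1 <= a2: FM[i, j, f] = a1; the counter at node i becomes 0 and the counter at node j becomes a2 - a1.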
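The matching step above, applied repeatedly at one counter index K, might look like the following sketch. The slide shows only a single step between node i and node j; choosing the largest remaining ingress and egress counters at each iteration is an assumption about what "elephant-first" means, and the function name and dictionary layout are hypothetical.

```python
def elephant_first_match(ingress, egress):
    """Match counter values at one index K across ingress and egress nodes.

    ingress, egress: dicts mapping node name -> counter value at index K.
    Returns a dict mapping (ingress_node, egress_node) -> estimated flow size.
    Assumption: each round pairs the largest remaining counters on both sides.
    """
    ingress = dict(ingress)  # work on copies so the inputs are not modified
    egress = dict(egress)
    estimates = {}
    while ingress and egress:
        i = max(ingress, key=ingress.get)
        j = max(egress, key=egress.get)
        a1, a2 = ingress[i], egress[j]
        if a1 == 0 or a2 == 0:
            break  # nothing left to match on one side
        matched = min(a1, a2)  # a2 if a1 > a2, otherwise a1
        estimates[(i, j)] = estimates.get((i, j), 0) + matched
        ingress[i] = a1 - matched
        egress[j] = a2 - matched
    return estimates
```

Each round zeroes at least one counter, so the loop terminates after at most as many rounds as there are nonzero counters.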
22
Evaluation
Ideally, the evaluation would require packet-level traces collected simultaneously at hundreds of ingress and egress routers in an ISP during a certain period of time.
Instead, we construct synthetic experiments based on 16 publicly available packet-level traces from NLANR.
23
Evaluation: traffic matrix
[Figure: two panels, one for the bitmap scheme and one for the counter array scheme.]
24
Metric
25
RMSRE: traffic matrix
26
RMSRE: flow matrix
27
Conclusion
A novel data streaming algorithm that produces traffic matrix estimates much more accurate than those of existing approaches.
Another data streaming algorithm that very accurately estimates the flow matrix, a finer-grained characterization than the traffic matrix.
Both algorithms are designed to operate in very high-speed networks.
28
Thank You! Questions?