Data Streaming Algorithms for Accurate and Efficient Measurement of Traffic and Flow Matrices
Qi Zhao*, Abhishek Kumar*, Jia Wang+ and Jun (Jim) Xu*
*College of Computing, Georgia Tech; +AT&T Labs - Research
Traffic and flow matrices: Traffic matrix TM: TM[i, j] = traffic volume from node i to node j. Useful in capacity planning and forecasting, routing configuration, network fault/reliability diagnosis, and provisioning for SLAs. Flow matrix FM: FM[i, j, f] = the size of flow f from node i to node j. Useful in computing the usage patterns of ISPs, detecting route flapping, and detecting DDoS attacks.
Existing approaches: Traffic matrix: indirect inference (holistic), from SNMP link counts, the routing matrix, and a network model; direct measurement, by sampling or by our approach. Flow matrix: not well studied yet; the straightforward approach is sampling.
Data streaming algorithms: Data streaming is processing a long stream of data items in one pass, using a small working memory, in order to answer a class of queries regarding the stream. Our context: the packet arrival rate is high (e.g., Gbps). Small but fast memory, SRAM (10 ns per access), will be used. Challenge: how to fully use SRAM to remember as much information pertinent to the traffic/flow matrix as possible?
Two data streaming schemes: The bitmap-based scheme: traffic matrix. The counter array-based scheme: flow matrix and traffic matrix.
System model: [Figure: an online streaming module at node i, an online streaming module at node j, and a data analysis module at a server]
The bitmap-based scheme Online streaming module Data analysis module
Online streaming module: The data digest data structure is a bit array (bitmap), initially set to all 0s. It is updated upon each packet arrival. Measurement proceeds in epochs.
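Example: [Figure: a packet and a bitmap with positions 0, 1, 2, ..., i, ..., b-1.] The invariant packet header plus the first 8 bytes of the payload are hashed by H(.) to a position i, and the bit at position i changes from 0 to 1. [Snoeren et al. SIGCOMM’01] shows that these 28 bytes are sufficient to differentiate almost all non-identical packets. U := U - 1. If U/b < Threshold, save the bitmap and start a new epoch.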
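The per-packet update can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the class name BitmapDigest, the use of BLAKE2 as a stand-in for H(.), and the one-byte-per-bit storage are all assumptions made for readability. It also assumes that U counts the bits that are still 0, so U is decremented only when a bit actually flips from 0 to 1.

```python
import hashlib


class BitmapDigest:
    """Illustrative bitmap digest; names and hash choice are assumptions."""

    def __init__(self, b, threshold):
        self.b = b                  # number of bits in the bitmap
        self.threshold = threshold  # minimum fraction of 0 bits allowed
        self.bits = bytearray(b)    # one byte per bit, for clarity only
        self.zeros = b              # U: number of bits that are still 0
        self.saved = []             # bitmaps of completed epochs

    def _index(self, invariant_bytes):
        # Stand-in for H(.): any roughly uniform hash works for this sketch.
        digest = hashlib.blake2b(invariant_bytes, digest_size=8).digest()
        return int.from_bytes(digest, "big") % self.b

    def update(self, invariant_bytes):
        """Process one packet, given its invariant header + first 8 payload bytes."""
        i = self._index(invariant_bytes)
        if self.bits[i] == 0:
            self.bits[i] = 1
            self.zeros -= 1         # U := U - 1
        if self.zeros / self.b < self.threshold:
            # Save the bitmap and start a new epoch.
            self.saved.append(bytes(self.bits))
            self.bits = bytearray(self.b)
            self.zeros = self.b
```

A real deployment would pack the bits into machine words in SRAM; the bytearray here only keeps the logic easy to follow.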
Complexities: Computational complexity: one hash function computation and one write to memory per packet. Storage complexity: each packet produces only a little more than one bit as its digest; this can be further reduced using sampling.
The bitmap-based scheme Online streaming module Data analysis module
What do we have so far (for TM[i, j])? BM_i, generated by the traffic at node i (T_i), and BM_j, generated by the traffic at node j (T_j). What we want to estimate: TM[i, j] = |T_i ∩ T_j|.
Estimation based on BM_i and BM_j: [Whang et al. 1990] proposed a method to infer |T| from BM, i.e., |T| ≈ b ln(b/U), where b is the size of BM in bits and U is the number of "0"s in BM. |T_i ∪ T_j| can be inferred from the bitwise OR of BM_i and BM_j. An estimator of TM[i, j] is given by |T_i| + |T_j| - |T_i ∪ T_j|, with each term replaced by its bitmap estimate. We derive the variance of the estimator.
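A small Python sketch of this estimation step follows. It assumes the linear-counting form b ln(b/U) for the method of [Whang et al. 1990] and an inclusion-exclusion combination of the three estimates. The function names linear_count and estimate_tm are hypothetical, and bitmaps are modelled as plain lists of 0/1 values. Both nodes must use the same bitmap size and the same hash function for the bitwise OR to be meaningful.

```python
import math


def linear_count(bitmap):
    """Estimate the number of distinct items hashed into a 0/1 bitmap."""
    b = len(bitmap)
    u = bitmap.count(0)  # U: number of 0 bits
    if u == 0:
        raise ValueError("bitmap is saturated; the estimate is undefined")
    return b * math.log(b / u)


def estimate_tm(bm_i, bm_j):
    """Estimate |T_i intersect T_j| as |T_i| + |T_j| - |T_i union T_j|."""
    if len(bm_i) != len(bm_j):
        raise ValueError("bitmaps must have the same size")
    bm_or = [x | y for x, y in zip(bm_i, bm_j)]  # bitwise OR of the two bitmaps
    return linear_count(bm_i) + linear_count(bm_j) - linear_count(bm_or)
```

Because the estimate is a difference of noisy terms, it can come out slightly negative when the true overlap is small, which is one reason the variance of the estimator matters.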
Multipaging: [Figure: epoch timelines at node i and node j, with times t1 and t2 marked]
Eliminating the effects of clock offset and packets in transit: [Figure: timelines at node i and node j, with an interval t marked.] T1: a tight upper bound on the clock offset (e.g., 50 ms in an NTP-enabled network). If t < T1, then overlap(1,2) = 1. Combining with packets in transit: T2: a tight upper bound on the packet traversal time. If t < T1 + T2, then overlap(1,2) = 1.
Counter array based scheme Online streaming module Data analysis module
Online streaming module: The data digest data structure is a counter array. It is updated upon each packet arrival. Measurement proceeds in epochs.
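Example: [Figure: a packet and a counter array with positions 0, 1, 2, ..., i, ..., b-1.] The flow label of the packet is hashed by H(.) to a position i, and the counter at position i is incremented from n to n+1.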
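The counter array update can be sketched the same way. This is an illustrative Python fragment under stated assumptions: the function names are hypothetical, BLAKE2 stands in for H(.), and each packet adds exactly one to its counter, matching the n to n+1 step in the example.

```python
import hashlib


def counter_index(flow_label, b):
    """Map a flow label (bytes) to a counter position in [0, b)."""
    # Stand-in for H(.): any roughly uniform hash works for this sketch.
    digest = hashlib.blake2b(flow_label, digest_size=8).digest()
    return int.from_bytes(digest, "big") % b


def update_counters(counters, flow_label):
    """Process one packet: increment the counter its flow label hashes to."""
    i = counter_index(flow_label, len(counters))
    counters[i] += 1  # n becomes n + 1
    return i
```

All packets of one flow land on the same counter, so a counter holds the size of a flow plus the sizes of any other flows that collide with it.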
Counter array based scheme Online streaming module Data analysis module
Principle: find a good counter-value matching between ingress nodes and egress nodes. Challenge: hash collisions make one-to-one matching fail. Method: iterative elephant-first matching. Accuracy: works well for the medium-to-large flow matrix elements, due to the Zipfian nature of Internet traffic.
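Elephant-first matching: Let a1 be the counter value at position K at node i, and a2 the counter value at position K at node j. If a1 > a2: FM[i, j, f] = a2, and the counters become a1 - a2 at node i and 0 at node j. If a1 <= a2: FM[i, j, f] = a1, and the counters become 0 at node i and a2 - a1 at node j.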
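The matching rule can be sketched in Python as follows. The function match_pair implements the two cases shown above for a single pair of counters. The function elephant_first is an assumed reading of "iterative elephant-first": for one counter position, it repeatedly pairs the largest remaining ingress counter with the largest remaining egress counter. Both function names and the dictionary-based interface are hypothetical.

```python
def match_pair(a1, a2):
    """Match counter a1 (ingress node i) against a2 (egress node j).

    Returns (flow size estimate, leftover at node i, leftover at node j).
    """
    matched = min(a1, a2)  # a2 if a1 > a2, otherwise a1
    return matched, a1 - matched, a2 - matched


def elephant_first(ingress, egress):
    """Iteratively match counters at one position, largest values first.

    ingress and egress map node names to counter values at that position.
    Returns a dict mapping (ingress node, egress node) to an estimated size.
    """
    ingress, egress = dict(ingress), dict(egress)  # do not mutate the inputs
    estimates = {}
    while True:
        i = max(ingress, key=ingress.get, default=None)
        j = max(egress, key=egress.get, default=None)
        if i is None or j is None or ingress[i] == 0 or egress[j] == 0:
            break
        matched, ingress[i], egress[j] = match_pair(ingress[i], egress[j])
        estimates[(i, j)] = estimates.get((i, j), 0) + matched
    return estimates
```

Each round zeroes at least one counter, so the loop ends after at most as many rounds as there are counters. Matching the largest values first reflects the observation that a few large flows dominate the counter values.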
Evaluation: Ideally, the evaluation would require packet-level traces collected simultaneously at hundreds of ingress and egress routers in an ISP during a certain period of time. Instead, we construct synthetic experiments based on 16 publicly available packet-level traces from NLANR.
Evaluation: traffic matrix [Figure: two panels, bitmap scheme and counter array scheme]
Metric
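The formula on this slide did not survive extraction. As a hedged illustration only, the sketch below assumes that RMSRE denotes the usual root mean square relative error, computed over the matrix elements whose true value is nonzero. The function name rmsre is hypothetical, and any size threshold the authors may have applied when selecting elements is not reproduced here.

```python
import math


def rmsre(estimates, truths):
    """Root mean square relative error over elements with nonzero true value."""
    pairs = [(e, t) for e, t in zip(estimates, truths) if t != 0]
    if not pairs:
        raise ValueError("no nonzero true values to compare against")
    squared = sum(((e - t) / t) ** 2 for e, t in pairs)
    return math.sqrt(squared / len(pairs))
```

For example, estimates of 110 and 90 against true values of 100 and 100 are each off by 10 percent, so the function returns about 0.1.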
RMSRE: traffic matrix
RMSRE: flow matrix
Conclusion: A novel data streaming algorithm that produces traffic matrix estimates much more accurate than those of existing approaches. Another data streaming algorithm that very accurately estimates the flow matrix, a finer-grained characterization than the traffic matrix. Both algorithms are designed to operate in very high-speed networks.
Thank You! Questions?