Network Traffic Modeling Punit Shah CSE581 Internet Technologies OGI, OHSU 2002, March 6
03/06/2002 CSE581, Winter 2002 | Punit Shah
Papers
  Generating Representative Web Workloads for Network and Server Performance Evaluation
    – Paul Barford, Mark Crovella. Comp Sci Department, Boston University.
  Self-Similarity in WWW Traffic: Evidence and Possible Causes
    – Mark Crovella, Azer Bestavros. Comp Sci Department, Boston University.
  On the Self-Similar Nature of Ethernet Traffic
    – Will Leland et al. IEEE members. Funded by Boston University.
Traffic Modeling
  Understand the nature of network traffic
    – Establish a traffic pattern
    – Characteristics and metrics vary by network stack layer
Why Model Traffic?
  Understand the behavior of servers, networks, etc. under workload conditions.
    – Capacity management
    – Infrastructure planning
    – Performance improvement
    – Design of software and services
    – Testing and validation
  Develop simulators (workload generators), e.g. ns (CMU), SURGE, SpecWeb96, and many commercially available simulators.
Model Parameters
  Application layer (HTTP)
    – Server file size distribution
    – Request size distribution (file size + protocol headers)
    – Temporal locality (caching), etc.
  Data link layer (Ethernet)
    – Packets per second
    – Mean time between two consecutive packets
    – Bandwidth utilization
    – Effect of the number of hosts, etc.
Time Series Analysis Primer
  Correlation
    – If two events exhibit an identical (opposite) pattern under similar circumstances, they are called positively (negatively) correlated.
    – The degree of correlation lies in the range [-1, 1].
    – Correlation models.
  Long-range dependence
    – The current event is positively correlated with events far in the future.
  Heavy tail
    – Non-negligible probability mass in the tail of the distribution, e.g. a hyperbolic CDF plot. The simplest such distribution is the Pareto: $P[X > x] \sim x^{-\alpha}$, $0 < \alpha < 2$.
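The heavy-tail definition above can be checked numerically. The following is a minimal sketch, assuming a Pareto distribution with minimum value k = 1 and an illustrative shape of alpha = 1.2 (neither value comes from the papers); the helper names are hypothetical. It draws samples by inverse-transform sampling and compares the empirical tail with $x^{-\alpha}$.

```python
import random


def pareto_sample(alpha, k=1.0, rng=random):
    """Draw one Pareto value with shape alpha and minimum k (inverse transform)."""
    u = 1.0 - rng.random()            # u lies in (0, 1], so we never divide by zero
    return k / u ** (1.0 / alpha)


def empirical_tail(samples, x):
    """Fraction of samples strictly greater than x, an estimate of P[X > x]."""
    return sum(1 for s in samples if s > x) / len(samples)


rng = random.Random(42)               # fixed seed so the run is repeatable
alpha = 1.2                           # illustrative value, assumed for this sketch
samples = [pareto_sample(alpha, rng=rng) for _ in range(100_000)]

# For k = 1 the exact tail is P[X > x] = x ** -alpha for x >= 1.
for x in (10, 100):
    print(x, empirical_tail(samples, x), x ** -alpha)
```

Even at x = 100 a noticeable fraction of the samples remains, which is what "non-negligible tail" means in practice.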
Self-Similarity
  Term introduced by Mandelbrot.
  Let $X = (X_t : t = 0, 1, 2, \dots)$ be a time series with mean $\mu$, variance $\sigma^2$, and autocorrelation function $r(k) \sim k^{-\beta}$ as $k \to \infty$, with $0 < \beta < 1$.
  For each $m = 1, 2, 3, \dots$, $X^{(m)} = (X_k^{(m)} : k = 1, 2, 3, \dots)$ is a new time series obtained by averaging the original series over non-overlapping blocks of size $m$, where $X_k^{(m)} = (X_{km-m+1} + \dots + X_{km})/m$. Its autocorrelation function is $r^{(m)}(k)$.
  If $r^{(m)}(k) = r(k)$, then $X$ is called (asymptotically) second-order self-similar with degree $H = 1 - \beta/2$.
  Also, since $\sum_k r(k) = \infty$, self-similarity implies long-range dependence.
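The aggregated series and its autocorrelation can be computed directly from the definitions. This is a small sketch with hypothetical helper names (`aggregate`, `autocorr`); it drops an incomplete final block, which is an assumption of the sketch rather than something the papers specify.

```python
def aggregate(x, m):
    """Average x over non-overlapping blocks of size m (incomplete last block dropped)."""
    n_blocks = len(x) // m
    return [sum(x[i * m:(i + 1) * m]) / m for i in range(n_blocks)]


def autocorr(x, k):
    """Sample autocorrelation of x at lag k."""
    n = len(x)
    mean = sum(x) / n
    var = sum((v - mean) ** 2 for v in x)
    cov = sum((x[i] - mean) * (x[i + k] - mean) for i in range(n - k))
    return cov / var


series = [1, 2, 3, 4, 5, 6, 7]
print(aggregate(series, 3))           # blocks (1,2,3) and (4,5,6); the 7 is dropped
print(autocorr(series, 1))
```

For a self-similar series, `autocorr(aggregate(x, m), k)` would stay close to `autocorr(x, k)` as m grows, instead of dropping toward zero.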
Self-similar
Self-similar
  Xi = i=1,m
  'Self-similarity' == burstiness
Ethernet Traffic Data Collection
  Data collected over four years (Aug 1989 to Feb 1992) to account for various network topologies.
  Main traffic at the time (1994): rlogin, e-mail, NFS, local radio station audio.
  Hosts ~27M packets.
  Each data collection run covered low, medium, and busy hours.
  Timestamps accurate to 20 μs.
Packets/unit time (empirical)
Packets/unit time (synthetic)
Statistical Tests for Self-Similarity
  Variance-time plot
    – $\log \mathrm{Var}(X^{(m)})$ is plotted against $\log m$; a straight line with slope $-\beta > -1$ indicates self-similarity, with $H = 1 - \beta/2$
  R/S plot (rescaled adjusted range statistic)
    – the R/S statistic grows according to a power law with exponent $H$ as a function of $n$, i.e. like $n^H$
  Periodogram
    – slope of the power spectrum of the series
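The variance-time test can be sketched in a few lines. The code below is an illustration under stated assumptions, not the procedure used in the papers: it fits an ordinary least-squares line to the points (log m, log Var(X^(m))) and reports H = 1 - beta/2. The function names and the choice of block sizes are hypothetical. It is run here on independent uniform noise, for which the slope should be close to -1 and H close to 0.5, i.e. the non-self-similar case.

```python
import math
import random


def block_means(x, m):
    """Aggregated series X^(m): means over non-overlapping blocks of size m."""
    n_blocks = len(x) // m
    return [sum(x[i * m:(i + 1) * m]) / m for i in range(n_blocks)]


def variance(x):
    """Population variance of a list of numbers."""
    mean = sum(x) / len(x)
    return sum((v - mean) ** 2 for v in x) / len(x)


def hurst_variance_time(x, block_sizes):
    """Fit log Var(X^(m)) against log m; return (H, beta) with slope = -beta."""
    pts = [(math.log10(m), math.log10(variance(block_means(x, m))))
           for m in block_sizes]
    mx = sum(p[0] for p in pts) / len(pts)
    my = sum(p[1] for p in pts) / len(pts)
    slope = (sum((px - mx) * (py - my) for px, py in pts)
             / sum((px - mx) ** 2 for px, _ in pts))
    beta = -slope
    return 1.0 - beta / 2.0, beta


rng = random.Random(1)                       # fixed seed for repeatability
iid = [rng.random() for _ in range(50_000)]  # independent samples: not self-similar
h, beta = hurst_variance_time(iid, [1, 2, 4, 8, 16, 32, 64, 128])
print(h, beta)
```

A trace of packet counts per time unit could be passed in place of `iid`; an estimate of H well above 0.5 would then point toward self-similarity.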
Ethernet Variance-Time Plot
  As m increases, the variance decreases slowly.
  If the traffic were not self-similar, the curve would cross the threshold line.
Ethernet Traffic Analysis
  Ethernet traffic is self-similar.
  Contrary to common belief, the degree of self-similarity (burstiness) increases during busy times.
  Well over 50% of the traffic is TCP packets, but the non-TCP packets have no apparent effect.
Web Traffic Data Collection
  Traces collected from real users accessing web documents (Nov 94 - May 95) using HTTP v0.9 and 1.0 (no parallel connections)
    – 4700 sessions
    – 591 users
    – 575,775 URL requests (46,830 unique)
    – 130,140 files transferred
  Each file request is logged
    – URL
    – Session, user, workstation ID
    – Timestamp
    – Size of document, file transfer time
Trace Analysis
  Web traffic is self-similar
Reasons for the Self-Similarity
  Web transmission times
    – The distribution is highly variable; the sizes of available files are heavy-tailed.
    – Multimedia files (image, audio, video) are largely to blame
  Quiet time
    – Active OFF and inactive OFF
Quiet Times
Quiet Time Distribution
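The link between heavy-tailed transmission and quiet times and bursty aggregate traffic can be illustrated with a toy ON/OFF model. The sketch below is only an illustration: the shape parameters (1.2 for ON, 1.5 for OFF), the one-slot minimum duration, and all function names are assumptions made for this example, not values taken from the traces.

```python
import random


def pareto(alpha, k, rng):
    """Pareto sample with shape alpha and minimum k (inverse transform)."""
    return k / (1.0 - rng.random()) ** (1.0 / alpha)


def on_off_source(n_slots, alpha_on, alpha_off, rng):
    """Per-slot activity (1 = sending, 0 = quiet) with Pareto ON and OFF durations."""
    activity = []
    on = rng.random() < 0.5                    # random initial state
    while len(activity) < n_slots:
        alpha = alpha_on if on else alpha_off
        duration = max(1, round(pareto(alpha, 1.0, rng)))
        duration = min(duration, n_slots - len(activity))  # never overshoot the trace
        activity.extend([1 if on else 0] * duration)
        on = not on
    return activity


def aggregate_load(n_sources, n_slots, rng, alpha_on=1.2, alpha_off=1.5):
    """Number of active sources in each time slot."""
    load = [0] * n_slots
    for _ in range(n_sources):
        for t, a in enumerate(on_off_source(n_slots, alpha_on, alpha_off, rng)):
            load[t] += a
    return load


load = aggregate_load(n_sources=20, n_slots=2000, rng=random.Random(11))
print(min(load), max(load), sum(load) / len(load))
```

Replacing the Pareto durations with short, light-tailed ones would be one way to see the burstiness smooth out under aggregation.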
Generating Web Workload: SURGE
  User Equivalence (UE)
    – Synthesized behavior should emulate real users
    – Multi-threaded program, HTTP v1.0, no parallel connections
  Distribution models
    – File sizes
    – Request sizes: file size + protocol headers; zero if already cached
  Popularity
    – Zipf's law: if files are ordered by decreasing popularity, the number of references to a file is inversely proportional to its rank, $P \propto 1/r$
    – Empirical data shows that popular web documents are extremely popular while the others receive only a few hits
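Zipf-distributed popularity is easy to reproduce in a few lines. The sketch below is not SURGE's implementation; it only shows one way to draw requests so that the file of rank r is chosen with probability proportional to 1/r. The file count, request count, and function names are illustrative assumptions.

```python
import random
from collections import Counter


def zipf_weights(n_files):
    """Unnormalized Zipf weights: rank r (1 = most popular) gets weight 1/r."""
    return [1.0 / r for r in range(1, n_files + 1)]


def draw_requests(n_files, n_requests, rng):
    """Sample file ranks for n_requests requests according to Zipf's law."""
    ranks = list(range(1, n_files + 1))
    return rng.choices(ranks, weights=zipf_weights(n_files), k=n_requests)


rng = random.Random(7)                         # fixed seed for repeatability
requests = draw_requests(n_files=1000, n_requests=100_000, rng=rng)
counts = Counter(requests)

# Rank 1 should receive roughly twice the hits of rank 2, ten times those of rank 10.
print(counts[1], counts[2], counts[10], counts[100])
```

The counts make the slide's point concrete: a handful of top-ranked files absorb a large share of the requests while most files are hit only rarely.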
Model Parameters (contd.)
  Embedded object count
    – Determines a quiet time, specifically 'active OFF'
  Temporal locality (caching)
    – Probability that the same object will be requested again
    – Effect on network access
    – Stack distance
  OFF times
    – Important parameter: self-similarity is lost if OFF times are ignored
  Matching problem: assign a popularity to each file, given the file size distribution and the empirical request size (count?) distribution
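Stack distance, the temporal locality measure listed above, can be computed from a request sequence with an LRU stack. This is a minimal sketch: the 1-based convention (distance 1 means the object was also the previous request) and the use of `None` for first references are assumptions of this example.

```python
def stack_distances(requests):
    """LRU stack distance of each request; None marks a first reference."""
    stack = []                       # most recently requested object at index 0
    distances = []
    for name in requests:
        if name in stack:
            distances.append(stack.index(name) + 1)   # depth at which it was found
            stack.remove(name)
        else:
            distances.append(None)
        stack.insert(0, name)        # the object moves to the top of the stack
    return distances


print(stack_distances(["a", "b", "a", "c", "b", "b"]))
# [None, None, 2, None, 3, 1]
```

Small distances dominate when temporal locality is strong, which is the property a workload generator needs to match for caching to behave realistically.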
SURGE Approach
  Use a different (well-known) model for each model parameter
SURGE Validation
  Compared with SpecWeb96 (specbench.org)
    – Number of HTTP requests per second (h)
    – Number of threads (t); h/t requests per thread
  Packets/sec as the baseline
    – Tests at 70, 300, and 500 packets/sec
Results
  Roughly similar numbers of TCP packets and requests in a 30-minute run
  Mean number of active TCP connections is … vs. 13.9 for SURGE, with a very high variance of 3.92 (0.18), indicating self-similarity
  Server CPU utilization and active TCP connections are considerably higher than with SpecWeb96
Active TCP Connections [figure labels: PPS; SpecWeb96, SURGE]
CPU Utilization [figure labels: PPS; SpecWeb96, SURGE]
Self-Similarity [figure labels: PPS; SpecWeb96, SURGE]
Conclusion
  Self-similarity (burstiness) is an integral part of network traffic behavior.
  The degree of self-similarity increases with the load.
  Server and network load under self-similar traffic is radically different from what non-self-similar models predict.
  The nature of the congestion produced by self-similar traffic is drastically different from that produced by non-self-similar traffic.