Spatio-Temporal Modeling of Traffic Workload in a Campus WLAN Felix Hernandez-Campos3 Merkouris Karaliopoulos2 Maria Papadopouli 1,2,3 Haipeng Shen2 1 Foundation for Research & Technology-Hellas (FORTH) & University of Crete 2 University of North Carolina at Chapel Hill 3 Google 1IBM Faculty Award 2005, EU Marie Curie IRG, GSRT “Cooperation with non-EU countries” grants
Motivation Growing demand for wireless access Mechanisms for better than best-effort service provision need to be deployed Examples: capacity planning, monitoring, AP selection, load balancing Evaluate these mechanisms via simulations & analytically Models for network & user activity are fundamental requirements
Wireless infrastructure [Diagram: the Internet, a router, and the wired network with a switch; AP 1, AP 2, and AP 3 in the wireless network; User A and User B; a disconnection event]
Wireless infrastructure [Diagram: the same topology, with users roaming between APs and a disconnection event, annotated with the hierarchy of session, associations (1, 2, 3), flows, and packets]
Modeling Traffic Demand Multi-level spatio-temporal nature. Different spatial scales: entire infrastructure, AP-level, client-level. Time granularities: packet-level, flow-level, session-level.
Modelling objectives Distinguish two important dimensions on wireless network modelling User demand (access & traffic) Topology (network, infrastructure, radio propagation) Find concepts that are well-behaved, robust to network dependencies & scalable
[Diagram: the same topology (Internet, router, wired network, switch, AP 1 to AP 3, User A, User B) with events marked, above a timeline showing a session, associations 1 to 3, and flow arrivals at times t1 to t7]
Our Models Session: arrival process, starting AP. Flow within a session: number of flows, size. Systems-wide & AP-level. Captures interaction between clients & network. Above packet level for traffic analysis & closed-loop traffic generation.
Wireless Infrastructure 488 APs, 26,000 students, 3,000 faculty, 9,000 staff over 729-acre campus SNMP data collected every 5 minutes Packet-header traces: 8-day period April 13th ‘05 – April 20th ‘05 175GB captured on the link between UNC & the rest of the Internet using a high-precision monitoring card (Endace DAG 4.3GE)
Time Series on Session Arrivals
Session Arrivals Time-varying Poisson Process The interarrival times of a Poisson process are i.i.d. exponential random variables. A natural generalization is a counting process whose interarrival times are i.i.d. with an arbitrary distribution; such a process is called a renewal process. The top figure shows an exponential quantile plot of the R_{i,j} during one randomly chosen hour. The bottom figure shows the autocorrelations of the R_{i,j} up to 20 lags. The sample autocorrelations are always within the confidence interval, so the R_{i,j} do not exhibit any significant correlations. We obtained similar results when repeating the same analysis for other one-hour intervals of the 8-day dataset. With a simulation envelope, each line plots the quantiles of one simulated dataset against the quantiles of the empirical data. The simulated data are generated from the distribution of interest using parameters estimated from the empirical data (not necessarily the mean and standard deviation, as in the case of the BiPareto and some other distributions). This is a relatively novel way to check a distributional fit, and it is a stronger test than the R^2 measure used in the CS literature.
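A minimal sketch of how a time-varying Poisson arrival process can be synthesized, assuming the rate is constant within each hour. The function name `time_varying_poisson` and the example rates are hypothetical and do not come from the trace. Restarting the exponential clock at each hour boundary is valid because the exponential distribution is memoryless.

```python
import random

def time_varying_poisson(hourly_rates, rng):
    """Arrival times (seconds) of a Poisson process with a piecewise-constant rate.

    hourly_rates[h] is the assumed arrival rate (sessions per second) in hour h.
    """
    arrivals = []
    for hour, rate in enumerate(hourly_rates):
        start, end = hour * 3600.0, (hour + 1) * 3600.0
        if rate <= 0:
            continue  # no arrivals in an hour with zero rate
        # Within the hour, interarrival times are i.i.d. exponential(rate).
        t = start + rng.expovariate(rate)
        while t < end:
            arrivals.append(t)
            t += rng.expovariate(rate)
    return arrivals

rng = random.Random(1)
rates = [0.05, 0.20, 0.10]  # hypothetical sessions/second for three hours
arrivals = time_varying_poisson(rates, rng)
# Hourly arrival counts; each should be close to rate * 3600.
counts = [
    sum(1 for t in arrivals if h * 3600 <= t < (h + 1) * 3600)
    for h in range(len(rates))
]
```

In practice the hourly rates would be estimated from the measured session counts, one rate per hour of the trace.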
AP Preference Distribution The original data, shown in red, lie within the natural variability of the lognormal model, since they remain within the blue simulation envelope. The only departure from lognormality is for the smallest values, i.e., for APs that rarely serve as session-starting APs and therefore contribute very few samples. Overall, the lognormal distribution is an excellent description of the data. We also considered other models, but they are clearly outperformed by the lognormal fit. For example, Zipf’s law, a classic way of describing popularity, is very far from the AP-preference distribution in our data.
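The simulation-envelope check can be sketched as follows for a lognormal fit. This is an illustration under stated assumptions, not the authors' code: the helper names are hypothetical, the parameters are estimated from the log of the data by sample mean and standard deviation, and the "measured" sample is a synthetic stand-in.

```python
import math
import random

def fit_lognormal(data):
    """Estimate (mu, sigma) of a lognormal from the logs of the data."""
    logs = [math.log(x) for x in data]
    mu = sum(logs) / len(logs)
    var = sum((v - mu) ** 2 for v in logs) / (len(logs) - 1)
    return mu, math.sqrt(var)

def simulation_envelope(data, n_sim, rng):
    """Pointwise min/max of sorted samples simulated from the fitted model."""
    mu, sigma = fit_lognormal(data)
    n = len(data)
    lower = [math.inf] * n
    upper = [0.0] * n
    for _ in range(n_sim):
        # Each simulated dataset has the same size as the empirical one.
        sim = sorted(rng.lognormvariate(mu, sigma) for _ in range(n))
        for i, v in enumerate(sim):
            lower[i] = min(lower[i], v)
            upper[i] = max(upper[i], v)
    return lower, upper

def fraction_inside(data, lower, upper):
    """Share of empirical order statistics that fall inside the envelope."""
    ordered = sorted(data)
    inside = sum(1 for x, lo, hi in zip(ordered, lower, upper) if lo <= x <= hi)
    return inside / len(ordered)

rng = random.Random(7)
sample = [rng.lognormvariate(2.0, 1.0) for _ in range(500)]  # stand-in data
lower, upper = simulation_envelope(sample, n_sim=100, rng=rng)
coverage = fraction_inside(sample, lower, upper)
```

Empirical quantiles that leave the envelope indicate a departure larger than the natural sampling variability of the fitted model.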
Number of Flows Per Session
Stationarity of the Distribution of Number of Flows within Session We found consistent tails for the eight days suggesting that weekly periodicities are not critical for modelling the number of flows per session.
Flow Inter-Arrivals within Session The simulation envelope is very narrow in this case, and shows that some deviations from the lognormal model in the upper part are significant. While more complex models, e.g., an ON/OFF model, may provide a better approximation, our lognormal fit provides a reasonable description of the data using only two parameters.
Flow Size Model We have also examined the stationarity of the flow size distributions over different days. We found consistent tails for the eight days, suggesting that weekly periodicities are not critical for modelling the flow sizes. The Pareto distribution has two parameters, a>0 and k>0, which are the decay exponent and the scale parameter, respectively. The scale parameter is also the minimum possible value of the random variable. The BiPareto distribution has similar parameters. Its tail distribution initially decays as a power law with exponent a>0. Then, in the vicinity of a breakpoint kb, the decay exponent gradually changes to b>0.
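A small sketch of the two tail shapes described above. The Pareto tail is P(X > x) = (x/k)^(-a) for x >= k. For the BiPareto, the slide does not give the formula, so the code assumes the commonly cited form P(X > x) = (x/k)^(-a) ((x/k + c)/(1 + c))^(a-b), with the breakpoint written as k*c; the parameter name `c` and all function names are hypothetical.

```python
import math
import random

def pareto_ccdf(x, a, k):
    """P(X > x) for a Pareto variable with decay exponent a and scale k."""
    return 1.0 if x <= k else (x / k) ** (-a)

def pareto_sample(a, k, rng):
    """Inverse-transform sampling: solve u = (x / k) ** (-a) for x."""
    u = 1.0 - rng.random()  # uniform on (0, 1], so u is never zero
    return k * u ** (-1.0 / a)

def bipareto_ccdf(x, a, b, c, k):
    """Assumed BiPareto tail: exponent a near k, exponent b far beyond k * c."""
    if x <= k:
        return 1.0
    r = x / k
    return r ** (-a) * ((r + c) / (1.0 + c)) ** (a - b)

def local_exponent(ccdf, x, factor=1.01):
    """Negative slope of the tail on a log-log plot, estimated around x."""
    return -(math.log(ccdf(x * factor)) - math.log(ccdf(x))) / math.log(factor)

rng = random.Random(3)
samples = [pareto_sample(1.2, 1.0, rng) for _ in range(20000)]
frac_above_2 = sum(1 for s in samples if s > 2.0) / len(samples)

# Illustrative BiPareto parameters (not fitted to any trace).
tail = lambda x: bipareto_ccdf(x, a=0.5, b=1.5, c=1000.0, k=1.0)
near = local_exponent(tail, 2.0)   # well below the breakpoint: close to a
far = local_exponent(tail, 1e7)    # well above the breakpoint: close to b
```

Evaluating the local exponent on both sides of the breakpoint shows the gradual change of the decay rate from a to b.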
Model Validation Methodology Produced synthetic data based on three approaches. Our models on session and flows-per-session: session arrivals are time-varying Poisson; flow interarrivals in a session are lognormal. Compound model (session, flows-per-session): flow interarrivals in a session are Weibull. Flat model: no session concept; flows form a renewal process. Given the heavy-tailed session duration, we impose simulation times on the order of days. In particular, we let the simulator synthesize traffic over a 3-day interval (simulation time), and process the measured traffic variables obtained on the third day.
Model Validation Methodology Simulations -- Synthetic data vs. original trace Metrics: Variables not explicitly addressed by our models Aggregate flow arrival count process Aggregate flow interarrival time-series (1st & 2nd order statistics) Systems-wide & AP-based Different tracing periods (in 2005 & 2006)
Simulations Produce synthetic data based on aforementioned models Synthesize sessions & flows for a 3-day period in simulations Consider flows generated during the third day (due to heavy-tailed session duration)
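The warm-up procedure can be sketched as below. This is a simplified illustration, not the simulator used for the results: sessions arrive at a constant Poisson rate for brevity, the number of flows per session uses a geometric placeholder, and every parameter value and function name is hypothetical. Only the lognormal flow interarrivals and the 3-day horizon with analysis of the third day follow the slides.

```python
import random

DAY = 86400.0  # seconds

def synthesize_flows(session_rate, mean_flows, mu, sigma, days, rng):
    """Return sorted flow start times (seconds) over `days` days.

    Assumptions: constant-rate Poisson session arrivals and a geometric
    number of flows per session with mean `mean_flows` (>= 1).
    Flow interarrivals within a session are lognormal(mu, sigma).
    """
    horizon = days * DAY
    flows = []
    t = rng.expovariate(session_rate)
    while t < horizon:
        flow_time = t  # the first flow starts with the session
        while flow_time < horizon:
            flows.append(flow_time)
            # Stop with probability 1 / mean_flows after each flow.
            if rng.random() < 1.0 / mean_flows:
                break
            flow_time += rng.lognormvariate(mu, sigma)
        t += rng.expovariate(session_rate)
    flows.sort()
    return flows

rng = random.Random(11)
flows = synthesize_flows(session_rate=0.01, mean_flows=5.0,
                         mu=1.0, sigma=1.5, days=3, rng=rng)
# Keep only the third day, so that long sessions started earlier contribute.
third_day = [f for f in flows if 2 * DAY <= f < 3 * DAY]
```

Discarding the first two days lets sessions that began earlier, and are still generating flows, be represented in the analyzed interval.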
Validation Number of Aggregate Flow Arrivals The figure depicts the number of aggregate flow arrivals within one-hour intervals. The 2-level model tracks the original trace closely in this respect, and certainly better than the other two approaches, although it overestimates the arrivals during the busy hours. The compound model matches less well, although it can respond to the non-stationarity of flow arrivals thanks to its provision for time-varying Poisson session arrivals. In contrast, the flat model cannot respond to the time variations of flow arrivals, since its empirical distribution is estimated over the full trace and averages out the hourly fluctuations of the traffic demand.
Validation Coefficient of Variation
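For reference, a sketch of how a coefficient of variation (standard deviation divided by mean) could be computed. The slide does not state which series is used, so the code assumes flow interarrival times, and the function names are hypothetical. For exponential interarrivals the coefficient is close to 1, which serves as a baseline.

```python
import math
import random

def interarrivals(times):
    """Gaps between consecutive event times."""
    ordered = sorted(times)
    return [b - a for a, b in zip(ordered, ordered[1:])]

def coefficient_of_variation(values):
    """Sample standard deviation divided by the sample mean."""
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / (n - 1)
    return math.sqrt(var) / mean

# Baseline: a constant-rate Poisson process (exponential interarrivals).
rng = random.Random(5)
t, times = 0.0, []
for _ in range(20000):
    t += rng.expovariate(2.0)
    times.append(t)
cv_poisson = coefficient_of_variation(interarrivals(times))
```

Values well above 1 indicate interarrivals that are burstier than a Poisson process would produce.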
Validation: Autocorrelation
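A minimal sketch of the sample autocorrelation up to 20 lags, together with the approximate 95% band of plus or minus 1.96/sqrt(n) expected for an uncorrelated series. The function names are hypothetical, and the input here is synthetic i.i.d. data rather than the trace.

```python
import math
import random

def sample_autocorrelation(series, max_lag):
    """Sample autocorrelation coefficients for lags 1..max_lag."""
    n = len(series)
    mean = sum(series) / n
    denom = sum((x - mean) ** 2 for x in series)
    acf = []
    for lag in range(1, max_lag + 1):
        num = sum((series[i] - mean) * (series[i + lag] - mean)
                  for i in range(n - lag))
        acf.append(num / denom)
    return acf

def white_noise_band(n, z=1.96):
    """Half-width of the approximate 95% band for an uncorrelated series."""
    return z / math.sqrt(n)

rng = random.Random(9)
series = [rng.expovariate(1.0) for _ in range(5000)]  # i.i.d. stand-in data
acf = sample_autocorrelation(series, max_lag=20)
band = white_noise_band(len(series))
# Lags whose coefficient falls outside the band; about 1 in 20 by chance.
outside = [lag + 1 for lag, r in enumerate(acf) if abs(r) > band]
```

A series with many lags outside the band, or a clear pattern across lags, would indicate significant correlation.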
Aggregate Flow Inter-arrivals We found that the flow interarrivals within a session follow a lognormal distribution; the compound model with the transformed Weibull variables cannot give an equally good fit for these interarrivals, and this is reflected in the aggregate flow interarrival data. 99.9th percentile
Related Work in Modeling Traffic in Wired Networks Flow-level in several protocols (mainly TCP) Session-level FTP, web traffic Session borders are heuristically defined by intervals of inactivity
Related work in Modeling Wireless Demand Flow-level modelling by Meng et al. [mobicom04] No session concept Flow interarrivals follow Weibull Modelling flows to specific APs over one-hour intervals Does not scale well
Conclusions First system-wide, multi-level parametric modelling of wireless demand Enables superimposition of models for demand on a given topology Focuses on the right level of detail Masks network-related dependencies that may not be relevant to a range of systems Makes the wireless networks amenable to statistical analysis & modeling
Future Work Explore the spatial distribution of flows & sessions at various scales of spatial aggregation Examples: building, building type, groups of buildings Model the client dynamics The collected SNMP data do not include all information required for our two-level modelling approach
UNC/FORTH Web Archive Login/password access after free registration Online repository of wireless measurement data, models, and tools Packet header, SNMP, SYSLOG, signal quality http://www.cs.unc.edu/Research/mobile/datatraces.htm Joint effort of Mobile Computing Groups @ UNC & FORTH
WitMeMo’06 2nd International Workshop on Wireless Traffic Measurements and Modeling August 5th, 2006 Boston http://www.witmemo.org