Modeling the Wireless Traffic Workload Maria Papadopouli Assistant Professor Department of Computer Science, University of Crete & Institute of Computer Science, Foundation for Research & Technology-Hellas (FORTH) Joint research with: F. Hernandez-Campos, M. Karaliopoulos, H. Shen, E. Raftopoulos IBM Faculty Award, EU Marie Curie IRG, GSRT “Cooperation with non-EU countries” grants
Research Projects @ UoC/FORTH Measurements on large-scale wireless networks Delays, packet losses, traffic characterization, impact of caching Measurement-based modelling of wireless networks Mechanisms for improving wireless access & spectrum utilization AP selection and caching mechanisms Evaluating user experience running streaming applications over wireless Location-sensing Mobile p2p computing Impact of caching in mobile social networking Design & evaluation of mobile applications
Empirical measurements Can be beneficial in revealing deficiencies of a wireless technology different phenomena of the wireless access & workload Impel modelling efforts to produce more realistic models & synthetic traces based on these models Enable meaningful performance analysis studies using such empirical and synthetic traces Highlight the ability of empirical-based models to capture the characteristics of the user-workload and provide a flexible framework for using them in performance analysis
Modelling and trace generation The definition of realism must be considered in the context of its usage, e.g., requirements for capacity planning vs. queue management Our motivation: Capacity planning, admission control, AP selection algorithms Modelling objectives: Accuracy, scalability, re-usability, tractability (easy to interpret)
Roadmap Background Proposed models Modelling methodology Model evaluation & validation Scalability vs. accuracy tradeoffs Conclusions On-going research
Related work Rich literature in traffic characterization in wired networks Willinger, Taqqu, Leland, Park on self-similarity of Ethernet LAN traffic Crovella, Barford on Web traffic Feldmann, Paxson on TCP Paxson, Floyd on WAN Jeffay, Hernandez-Campos, Smith on HTTP Traffic generators for wired traffic Hernandez-Campos, Vahdat, Barford, Ammar, Pescape, … P2P traffic Saroiu, Sen, Gummadi, He, Leibowitz, … On-line games Pescape, Zander, Lang, Chen, … Modelling of wireless traffic Meng et al.
Wireless infrastructure [Figure: Internet, router, wired network, switch, wireless network with AP 1, AP 2, AP 3; Users A and B roaming between APs; timeline levels: associations (1, 2, 3), flows, packets; disconnection event]
Dimensions in modeling wireless access Intended user demand User mobility patterns Arrival at APs Roaming across APs Link conditions Network topology
Main approaches for traffic generation Packet-level replay An exact reproduction of a collected trace in terms of packet arrival times, size, source, destination, content type Reflects specific traffic conditions Suffers from arbitrary delays e.g., interrupts, service mechanisms, scheduling processes difficult to incorporate feedback-loop characteristics Source-level generation Allows the underlying network, protocol, & application layer to specify & control the packet arrival process Simplest example: infinite source model
Our approach Inspired by the source-level (or network independent) modelling Assumptions: Client arrivals at an infrastructure (initiated by humans) are, to a large extent, not affected by the underlying network technology Very low % of packet loss at the network layer, so flow arrivals & sizes approximate intended user traffic demand
[Figure: Internet, router, wired network, switch, wireless network with AP 1, AP 2, AP 3; Users A and B. Events over time: a session spanning associations 1, 2, 3 until disconnection; flow arrivals at t1, t2, ..., t7]
Traffic Demand Parameters Session arrival process starting AP Flow within session number of flows size (in bytes) Captures interaction between clients & network Above packet-level analysis
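One way to hold the demand parameters listed above in code is a pair of small record types. This is only an illustrative sketch: the class and field names (Session, Flow, starting_ap, and so on) are hypothetical and do not come from the original tools.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Flow:
    start_offset_s: float  # flow arrival time relative to the session start
    size_bytes: int        # flow size in bytes


@dataclass
class Session:
    arrival_time_s: float  # session arrival time
    starting_ap: str       # AP at which the session starts
    flows: List[Flow] = field(default_factory=list)

    @property
    def num_flows(self) -> int:
        """Number of flows within the session."""
        return len(self.flows)

    @property
    def total_bytes(self) -> int:
        """Sum of the flow sizes within the session."""
        return sum(f.size_bytes for f in self.flows)


# Hypothetical example: one session with two flows starting at "AP1".
session = Session(arrival_time_s=12.5, starting_ap="AP1",
                  flows=[Flow(0.0, 1460), Flow(3.2, 52000)])
```

Nothing below the flow level is stored, which matches the intent of staying above packet-level analysis.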
Wireless infrastructure & acquisition 26,000 students, 3,000 faculty, 9,000 staff on a campus of over 729 acres 488 APs (April 2005), 741 APs (April 2006) SNMP data collected every 5 minutes Several months of SNMP & SYSLOG data from all APs Packet-header traces: Two weeks (in April 2005 and April 2006) Captured on the link between UNC and the rest of the Internet using a high-precision monitoring card (Endace DAG 4.3GE) 8-day period April 13th ‘05 – April 20th ’05 Custom SNMP-polling system relying on a non-blocking SNMP library. APs are polled independently so that delays incurred while slower APs process SNMP polls do not affect the other APs
Related modeling approaches Flow-level modeling by Meng [mobicom ‘04] No session concept Weibull for flow interarrivals Lognormal for flow sizes AP-level over hourly intervals Hierarchical modeling by Papadopouli [wicon ‘06] Time-varying Poisson process for session arrivals BiPareto for in-session flow numbers & flow sizes Lognormal for in-session flow interarrivals Sessions capture the non-stationarity of traffic workload
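The BiPareto distribution used for flow numbers and sizes can be sampled by numerically inverting its CCDF. The sketch below assumes the common parameterization P[X > x] = (x/k)^(-alpha) * ((x/k + c)/(1 + c))^(alpha - beta) for x >= k, which is not spelled out on the slide; the function names and the parameter values in the example are hypothetical, not fitted values.

```python
import math
import random


def bipareto_ccdf(x, alpha, beta, c, k):
    """P[X > x] under the assumed BiPareto form; equals 1 for x <= k."""
    if x <= k:
        return 1.0
    r = x / k
    return r ** (-alpha) * ((r + c) / (1.0 + c)) ** (alpha - beta)


def bipareto_sample(rng, alpha, beta, c, k, iters=200):
    """Draw one value by inverting the CCDF with bisection in log space.

    Assumes alpha > 0, beta > 0 and c > 0, so the CCDF is strictly decreasing.
    """
    u = 1.0 - rng.random()  # uniform in (0, 1]
    lo, hi = k, 2.0 * k
    # Grow the bracket until the CCDF at hi has dropped to u or below.
    while bipareto_ccdf(hi, alpha, beta, c, k) > u:
        hi *= 2.0
    for _ in range(iters):
        mid = math.exp(0.5 * (math.log(lo) + math.log(hi)))  # geometric midpoint
        if bipareto_ccdf(mid, alpha, beta, c, k) > u:
            lo = mid
        else:
            hi = mid
    return hi


# Hypothetical parameters, for illustration only.
rng = random.Random(42)
sizes = [bipareto_sample(rng, alpha=0.3, beta=1.5, c=50.0, k=1.0)
         for _ in range(20000)]
```

The tail decays roughly like x^(-alpha) for small x and like x^(-beta) for large x, which is what lets a single distribution cover both the body and the heavy tail.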
Modeling methodology Selection of models (e.g., various distributions) Fitting parameters using empirical traces Evaluation and comparison of models Visual inspection e.g., CCDFs & QQ plots of models vs. empirical data Statistical-based criteria e.g., QQ/simulation envelopes, Kullback-Leibler divergence Systems-based criteria e.g., throughput, delay, jitter, queue size Validation of models Generalization of models
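Two of the comparison criteria named above, QQ plots and Kullback-Leibler divergence, can be computed from samples as in the following sketch. The binning, the epsilon guard for empty bins, and the function names are assumptions made for illustration; the slide does not say how the divergence was estimated.

```python
import math


def histogram(samples, edges):
    """Relative frequencies over bins [edges[i], edges[i+1]).

    Assumes at least one sample falls inside the edges.
    """
    counts = [0] * (len(edges) - 1)
    for s in samples:
        for i in range(len(counts)):
            if edges[i] <= s < edges[i + 1]:
                counts[i] += 1
                break
    total = sum(counts)
    return [c / total for c in counts]


def kl_divergence(p, q, eps=1e-12):
    """D_KL(p || q) in nats; eps guards against empty bins in q."""
    return sum(pi * math.log(pi / max(qi, eps))
               for pi, qi in zip(p, q) if pi > 0)


def qq_pairs(empirical, synthetic, n_quantiles=99):
    """Matched quantiles of two samples, ready to plot against each other."""
    def quantile(sorted_xs, f):
        return sorted_xs[min(int(f * len(sorted_xs)), len(sorted_xs) - 1)]

    e, s = sorted(empirical), sorted(synthetic)
    fracs = [(i + 1) / (n_quantiles + 1) for i in range(n_quantiles)]
    return [(quantile(e, f), quantile(s, f)) for f in fracs]
```

Points of qq_pairs that lie on the diagonal indicate agreement between model and data; kl_divergence summarizes the mismatch as a single number, with 0 meaning identical histograms.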
Synthetic trace generation
Synthetic traces based on empirical ones (original data from the real-life infrastructure) Produced by this process: (1) generate session arrivals; (2) within each session, generate the number of flows; (3) for each flow, generate flow arrivals & sizes based on specific models. Session arrivals: using hourly, building-specific empirical traces Flow-related data: using empirical traces of different spatial scales
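The generation process above maps onto a nested loop. The sketch below is an assumption-laden illustration, not the original generator: session arrivals follow a Poisson process whose rate is constant within each hour, and the per-session and per-flow samplers are passed in as callables. In the example call, plain Pareto draws stand in for the BiPareto models, and all rates and parameters are made up.

```python
import random


def generate_trace(hourly_session_rates, n_flows, flow_gap, flow_size, rng):
    """Return a time-sorted list of (arrival_time_s, size_bytes) flows.

    hourly_session_rates[h] is the mean number of session arrivals in hour h
    (a piecewise-constant, time-varying Poisson process).
    n_flows, flow_gap and flow_size are callables taking the rng.
    """
    flows = []
    for hour, rate in enumerate(hourly_session_rates):
        if rate <= 0:
            continue
        t = hour * 3600.0
        while True:
            t += rng.expovariate(rate / 3600.0)  # next session arrival
            if t >= (hour + 1) * 3600.0:
                break
            flow_time = t  # the first flow starts with the session
            for _ in range(n_flows(rng)):
                flows.append((flow_time, flow_size(rng)))
                flow_time += flow_gap(rng)  # in-session flow interarrival
    flows.sort()
    return flows


# Illustrative call with stand-in samplers (not the fitted models).
trace = generate_trace(
    hourly_session_rates=[5.0, 20.0, 0.0],
    n_flows=lambda r: min(50, int(r.paretovariate(1.2))),
    flow_gap=lambda r: r.lognormvariate(0.0, 1.5),
    flow_size=lambda r: int(1000 * r.paretovariate(1.1)),
    rng=random.Random(1),
)
```

Passing the samplers as arguments keeps the loop independent of the spatial scale at which the flow-level distributions were fitted.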
Model validation Use empirical data from different tracing periods April 2005 & 2006 spatial scales AP-level < building-level < building-type-level < network-wide traffic conditions @ AP campus-wide wireless infrastructures UNC, Dartmouth Do the same distributions persist across these traces? YES! Compare their performance (empirical traces: “ground truth”)
Model evaluation Create synthetic data based on models Analysis with metrics not explicitly addressed by the models Statistical-based aggregate flow arrival count process aggregate flow interarrival (1st & 2nd order statistics) System-based: performance of an IEEE802.11 LAN traffic load and queue size in various time scales per-flow & hourly aggregate throughput per-flow delay and jitter Compare their performance (empirical traces: “ground truth”)
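The statistical metrics listed above can be computed directly from a list of flow arrival times. This sketch takes the first- and second-order statistics to be the mean, variance, and lag-1 autocorrelation of the interarrivals; that reading, and the function names, are assumptions.

```python
from statistics import mean, pvariance


def arrival_counts(arrival_times, bin_s, horizon_s):
    """Number of flow arrivals in each bin of length bin_s over [0, horizon_s)."""
    counts = [0] * int(horizon_s // bin_s)
    for t in arrival_times:
        i = int(t // bin_s)
        if 0 <= i < len(counts):
            counts[i] += 1
    return counts


def interarrival_stats(arrival_times):
    """Mean, variance and lag-1 autocorrelation of aggregate interarrivals.

    Needs at least two arrival times.
    """
    ts = sorted(arrival_times)
    gaps = [b - a for a, b in zip(ts, ts[1:])]
    m, v = mean(gaps), pvariance(gaps)
    if v == 0:
        return m, v, 0.0
    cov = sum((x - m) * (y - m) for x, y in zip(gaps, gaps[1:])) / len(gaps)
    return m, v, cov / v
```

Running both functions on an empirical trace and on a synthetic one gives numbers that can be compared side by side, with the empirical values as the reference.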
Modeling in Various Spatio-temporal Scales Sufficient spatial detail Scalable Amenable to analysis Hourly period @ AP Network-wide Objective Scales Tradeoff with respect to accuracy, scalability & reusability
Scalability vs. Accuracy: Flow Interarrivals [Plot across spatial/temporal scales; curves: EMPIRICAL, BDLG(DAY), BDLGTYPE(DAY), NETWORK(TRACE)]
Scalability vs. Accuracy: Number of Flow Arrivals in an Hour [Plot; curves: EMPIRICAL, BDLG(DAY), BDLGTYPE(TRACE), NETWORK(TRACE)]
Model evaluation Create synthetic data based on models Analysis with metrics not explicitly addressed by the models Statistical-based aggregate flow arrival count process aggregate flow interarrival (1st & 2nd order statistics) System-based: performance of an IEEE802.11 LAN traffic load and queue size in various time scales per-flow & hourly aggregate throughput per-flow delay and jitter Compare their performance (empirical traces: “ground truth”) Dominant parameters ? Impact of application mix?
Simulation/Emulation Testbed Internet Router Wired Network AP3 Switch User D Wireless Network User A AP 1 AP 2 User B User C Assign traffic demand Scenario of wireless access Various traffic conditions Scenario: User A generates a flow of size X @ T1 User B generates a flow of size Y @ T2 ▪
Simulation/Emulation testbed TCP flows UDP Wired clients: senders Wireless clients: receivers
Hourly aggregate throughput FLOW SIZE—FLOW (INTER)ARRIVAL EMPIRICAL Impact of flow size Fixed flow sizes & empirical flow arrivals (aggregate traffic as in EMPIRICAL) BIPARETO-LOGNORMAL-AP Pareto flow sizes, empirical flow arrivals BIPARETO-LOGNORMAL
Per-flow throughput FLOWSIZE—FLOWARRIVAL Pareto flow sizes & uniform flow arrivals BIPARETO-LOGNORMAL EMPIRICAL BIPARETO-LOGNORMAL-AP due to large % of small size flows (= MSS) Pareto flow sizes Fixed flow sizes & empirical number of flows
Aggregate hourly downloaded traffic
Impact of application mix on per-flow throughput TCP-based scenario AP with 85% web traffic AP with 80% p2p traffic AP with 50% web & 40% p2p traffic
Amount of Trx Bytes & Queue Size
Forwarded bytes @ router in various time scales (2^m ms) [Plot panels: m=4, m=8, m=12, m=14]
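Forwarded bytes at several time scales can be obtained by summing packet sizes over consecutive fixed-length windows. The sketch below assumes that the time scale for a given m is 2^m milliseconds and that packets arrive as (timestamp in ms, size in bytes) pairs; both the input format and the function name are hypothetical.

```python
def bytes_per_window(packets, m):
    """Sum packet sizes into consecutive windows of 2**m milliseconds.

    packets: iterable of (timestamp_ms, size_bytes) with timestamps >= 0.
    Returns one total per window, from window 0 up to the last non-empty one.
    """
    window_ms = 2 ** m
    totals = {}
    for ts_ms, size in packets:
        idx = int(ts_ms // window_ms)
        totals[idx] = totals.get(idx, 0) + size
    n = max(totals) + 1 if totals else 0
    return [totals.get(i, 0) for i in range(n)]


# Hypothetical packets: the same data viewed at a fine and a coarse scale.
packets = [(0, 100), (10, 200), (17, 50), (40, 25)]
fine = bytes_per_window(packets, 4)     # 16 ms windows
coarse = bytes_per_window(packets, 12)  # 4096 ms windows
```

Comparing the series for small and large m shows how burstiness at fine scales smooths out, or fails to, as the window grows.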
UDP traffic scenario Wireless hotspot AP Wireless clients downloading Wired traffic transmit at 25Kbps Total aggregate traffic sent in CBR and in empirical is the same Empirical: 1.4 Kbps Bipareto-Lognormal-AP: 2.4 Kbps Bipareto-Lognormal: 2.6 Kbps [Note: verify this with Elias and check whether there are also plots (e.g., the February ones, 15.6, 15.5) that would help] Empirical: 1.7 Kbps Bipareto-Lognormal-AP: 9.7 Kbps Bipareto-Lognormal: 10.3 Kbps Large differences in the distributions
Conclusions Model validation over two different periods (2005 and 2006) over two different campus-wide infrastructures (UNC & Dartmouth) BiPareto captures well the flow sizes over heavy & normal traffic conditions @ AP using statistical-based metrics using system-based metrics hourly aggregate throughput per-flow delay per-flow throughput Enables superimposition of models for demand on a given topology proposed statistical distributions valid over two different periods Explores spatial distribution of flows & sessions at various scales of spatial aggregation individual buildings / groups of buildings (clusters
Conclusions (con’t) Accurate and scalable models of wireless demand Accuracy: our models perform very close to the empirical traces popular models deviate substantially from the empirical traces Scalability: same distributions at various spatial & temporal scales group of APs per bldg addresses scalability-accuracy tradeoffs
Conclusions (con’t) Impact of various parameters Application mix of AP traffic mostly web: very accurate models both web & p2p: models are ok mostly p2p: large deviations from empirical data Modelling P2P traffic is challenging due to the increased number, diversity, complexity & unpredictability in user interaction Both flow size and flow interarrivals
In progress … Evaluate the performance of AP or channel selection, load balancing & admission control protocols under real-life traffic conditions IEEE802.11 Mesh & infrastructure-based testbeds Heterogeneous wireless networks
Revisiting modelling approach Physical meaning of the models and their parameters Client profile e.g., depending on the application-mix, amount of traffic Group mobility Multiple network interfaces Cooperative client models Dependencies among traffic demand & network conditions Impact of underlying network conditions on application & usage patterns
UNC/FORTH web archive Online repository of models, tools, and traces Packet header, SNMP, SYSLOG, synthetic traces, … http://netserver.ics.forth.gr/datatraces/ Free login/ password to access it Simulation & emulation testbeds that replay synthetic traces for various traffic conditions Mobile Computing Group @ University of Crete/FORTH http://www.ics.forth.gr/mobile/ maria@csd.uoc.gr