Measuring Service in Multi-Class Networks Aleksandar Kuzmanovic and Edward W. Knightly Rice Networks Group http://www.ece.rice.edu/networks
Background. QoS services: SLA guaranteed rate (e.g., class X is serviced at a minimum rate R); relative performance (e.g., class X has strict priority over class Y); statistical service (e.g., P(class X packet delay > 100 ms) < 0.001). QoS mechanisms: priority queues, rate-based, delay-based, ...; policing: rate limiting, ...; over-engineering: just add more bandwidth. Need: tools for network clients to assess the network's QoS capabilities.
Inverse QoS Problem. Is a class rate limited? What is the inter-class relationship (fair, weighted fair, or strict priority)? Is resource borrowing fully allowed or not? Is the service's upper bound identical to its lower bound? What are the service's parameters?
Applications: Network Example. Providers are reluctant to divulge their precise QoS policy (if any). SLA validation for VPNs: is the SLA fulfilled? Capacity planning: what is the relationship among classes? Edge-based admission control [CK00] and its implementation [SSYK01].
Performance Monitoring and Resource Management. Single web server: CPU resource sharing, listen-queue differentiation, admission control. Distributed web server: load balancing. Internet data center: machine migration. Goal: estimate a class's net “guaranteed rate”.
“Off-Line” Solution Is Simple: consider a router with unknown QoS mechanisms.
“On-Line” Case: Operational Network. It is undesirable to disrupt ongoing services: high-rate probes to detect inter-class relationships would degrade performance. It is impossible to force other classes to be idle … to detect policers.
System Model and Problem Formulation. Two-stage server: non-work-conserving elements and a multi-class scheduler. Observations: arrival and departure times, class ID, and packet size.
Determine: infer the service discipline (the most likely hypothesis among WFQ, EDF, and SP); detect the existence of non-work-conserving elements such as rate limiters (e.g., leaky-bucket policers); estimate the system parameters (WFQ guaranteed rates, EDF deadlines, rate-limiter values).
Remaining Outline: inter-class resource sharing theory; empirical arrival and service models; MLE of parameters; EDF/WFQ/SP hypothesis testing; simulation results and conclusions.
Theoretical Tool: Statistical Service Envelopes [QK99]. A general statistical characterization of the service received by a (virtual) minimally backlogged flow. Flows receive additional service beyond their minimum rate, as a function of other flows' demand and of the scheduler. This gives a general characterization of inter-class resource sharing and a framework for admission control for EDF, WFQ, and SP.
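To make "service beyond the minimum rate" concrete, here is a minimal sketch of fluid GPS (the idealized model behind WFQ) applied to a single measurement interval by water-filling: each class is guaranteed its weighted share of the link, and capacity left unused by classes with small demand is redistributed to the others. The function name, the weights, and the volume-per-interval simplification (timing inside the interval is ignored) are assumptions for illustration; this is not the statistical service envelope of [QK99] itself.

```python
def gps_interval_service(capacity, weights, demands):
    """Split `capacity` (bits available in one interval) among classes in
    proportion to `weights`, never giving a class more than its demand.
    Hypothetical helper: a volume-based water-filling approximation of GPS."""
    service = {k: 0.0 for k in weights}
    active = {k for k in weights if demands[k] > 0}
    remaining = float(capacity)
    while active and remaining > 0:
        total_weight = sum(weights[k] for k in active)
        # Weighted fair share of the capacity that is still unallocated.
        share = {k: remaining * weights[k] / total_weight for k in active}
        satisfied = {k for k in active if demands[k] <= share[k]}
        if not satisfied:
            # Every remaining class can absorb its full share: allocate and stop.
            for k in active:
                service[k] = share[k]
            break
        # Classes with small demand are fully served; their unused share
        # is redistributed among the other classes in the next round.
        for k in satisfied:
            service[k] = float(demands[k])
            remaining -= demands[k]
        active -= satisfied
    return service
```

For example, with a 1000-bit interval, weights 1:1:2 and demands of 100, 1000, and 1000 bits, class 2 is guaranteed 250 bits but receives 300, because class 1 leaves part of its share unused.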
Strategy: build on inter-class theory. Key technique: passively monitor arrivals and services at the edges, and devise hypothesis tests that jointly detect the most likely hypothesis and estimate the unknown parameters.
Empirical Arrival Model. [Figure: time axis with an interval from t to t + I; E*(I) = 3.] Envelopes characterize arrivals as a function of the interval length I. Statistical traffic envelope [QK99]. Empirical envelope: measure the first two moments of arrivals over multiple time scales. Goal: model the arrivals B over an interval of length I, assuming B is Gaussian.
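The "first two moments over multiple time scales" step can be sketched as follows, assuming arrivals are given as (timestamp, bytes) pairs and that non-overlapping consecutive windows are used for each interval length I (sliding windows would be another valid choice). The function name is hypothetical.

```python
from statistics import mean, pvariance

def empirical_arrival_moments(arrivals, interval_lengths, horizon):
    """For each interval length I, cut [0, horizon) into consecutive windows of
    length I, total the bytes arriving in each window, and return the first two
    moments (mean, variance) of those totals. Hypothetical helper."""
    moments = {}
    for I in interval_lengths:
        n_windows = int(horizon // I)
        if n_windows == 0:
            continue  # time scale longer than the trace: no complete window
        totals = [0.0] * n_windows
        for t, size in arrivals:
            k = int(t // I)  # index of the window containing this arrival
            if k < n_windows:
                totals[k] += size
        moments[I] = (mean(totals), pvariance(totals))
    return moments
```

Under the Gaussian assumption, each (mean, variance) pair fully specifies the model of B at that time scale.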
Empirical Service Model: a real-world counterpart of the statistical service envelope. Observation: service can be measured only while the class is backlogged.
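Because service is observable only while a class is backlogged, a monitor first has to find the class's backlogged periods from arrival and departure times and then measure service only inside them. A minimal sketch, assuming per-packet (arrival, departure, size) records for one class and non-overlapping windows of length I; both function names are hypothetical.

```python
def backlogged_periods(packets):
    """Return (start, end) intervals during which at least one packet of the
    class is in the system (queued or in service)."""
    events = []
    for arrival, departure, _size in packets:
        events.append((arrival, 1))
        events.append((departure, -1))
    # At equal timestamps, process arrivals first so a busy period is not split.
    events.sort(key=lambda e: (e[0], -e[1]))
    periods, level, start = [], 0, None
    for t, delta in events:
        if level == 0 and delta == 1:
            start = t
        level += delta
        if level == 0:
            periods.append((start, t))
    return periods

def service_samples(packets, I):
    """Bytes departing in each window of length I that lies entirely
    inside a backlogged period."""
    samples = []
    for start, end in backlogged_periods(packets):
        t = start
        while t + I <= end:
            served = sum(size for _a, dep, size in packets if t < dep <= t + I)
            samples.append(served)
            t += I
    return samples
```

Windows that straddle an idle period are discarded, since the service they would record reflects the class's lack of demand rather than the scheduler.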
Empirical Service Distributions, computed for each class and time scale. [Figure: expected service distributions, service measures (data), and empirical service distributions; panels WFQ (400 ms) and SP (400 ms).]
Parameter Estimation and Scheduler Inference. For each time scale, apply a GLRT: evaluate the likelihood of the service measures under each scheduler with its MLE parameters, and choose the most likely scheduler. Then apply a majority rule over all time scales.
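The per-time-scale test and the majority vote can be sketched as follows, assuming each candidate scheduler is summarized by a hypothetical model that maps a time scale I and a parameter theta to the mean and variance of the expected service, that service measures are Gaussian, and that the MLE is found by grid search. Picking the largest maximized log-likelihood is the GLRT decision with unit threshold (equal priors); the actual WFQ/EDF/SP service models are not reproduced here.

```python
import math
from collections import Counter

def gaussian_loglik(samples, mu, var):
    """Log-likelihood of the samples under N(mu, var)."""
    return sum(-0.5 * math.log(2 * math.pi * var) - (x - mu) ** 2 / (2 * var)
               for x in samples)

def infer_scheduler(samples_by_scale, models, param_grid):
    """samples_by_scale: {time scale I: [service measures]}
    models: {name: model(I, theta) -> (mean, variance)} (hypothetical models)
    param_grid: candidate values of theta for a grid-search MLE."""
    votes, estimates = [], {}
    for I, samples in samples_by_scale.items():
        best_name, best_ll, best_theta = None, -math.inf, None
        for name, model in models.items():
            # MLE of theta under this hypothesis (grid search).
            theta = max(param_grid,
                        key=lambda th: gaussian_loglik(samples, *model(I, th)))
            ll = gaussian_loglik(samples, *model(I, theta))
            # Generalized likelihood comparison: keep the largest maximum.
            if ll > best_ll:
                best_name, best_ll, best_theta = name, ll, theta
        votes.append(best_name)
        estimates[I] = best_theta
    # Majority rule over all time scales.
    return Counter(votes).most_common(1)[0][0], estimates
```

A grid search keeps the sketch simple; a closed-form or numerical MLE would replace it when the service model is differentiable in theta.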
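EDF/WFQ Testing. Correctness ratio: 94% when the true scheduler is WFQ and 100% when it is EDF. Importance of time scales: at short time scales, the fluid model departs from the packet model; at long time scales, the ratio of the delay shift to the time scale decreases as the time scale increases (d1 = 25 ms), so EDF and WFQ service become harder to tell apart.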
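A quick calculation illustrates how the ratio of the delay shift to the time scale shrinks with d1 = 25 ms. The time scales 50 ms and 2 s are illustrative choices; 400 ms is the time scale of the service-distribution plots.

```latex
\frac{d_1}{I} = \frac{25\,\mathrm{ms}}{50\,\mathrm{ms}} = 0.5, \qquad
\frac{25\,\mathrm{ms}}{400\,\mathrm{ms}} = 0.0625, \qquad
\frac{25\,\mathrm{ms}}{2\,\mathrm{s}} = 0.0125
```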
Measurable Regions. What if there is no traffic in a particular class? Which traffic loads “allow” inferences? The measurable region is where we can estimate the true value within 5%. For a 1.5 Mbps link, typical utilization should exceed 62%; otherwise, active probing is required.
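For the 1.5 Mbps link, the 62% utilization threshold corresponds to an aggregate offered load of:

```latex
0.62 \times 1.5\,\mathrm{Mbps} = 0.93\,\mathrm{Mbps}
```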
Conclusions. A framework for clients of multi-class services to assess a system's core QoS mechanisms: the scheduler type and its parameters (for both work-conserving and non-work-conserving elements). A general multiple-time-scale traffic and service model characterizes a broad set of behaviors within a unified framework.
Ongoing Work. Unknown cross-traffic: we cannot monitor all of the system's inputs and outputs, so cross-traffic statistics are treated as another unknown. Web servers: evaluation of the framework on a single web server through trace-driven simulations, where capacity is statistically characterized.
WFQ Parameter Estimation. Class 1: 65-68 flows; class 2: 25-28 flows. Larger measurement windows improve the confidence level: with T = 2 s, 95% of estimates lie within 11% of the true value; with T = 10 s, within 1.4%. Flow-level dynamics and non-stationarities must be considered.
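Reading "95% within 11% of the true value" as "95% of the estimates fall within 11% of the true rate" (an interpretation), the statistic can be computed from repeated estimates as an empirical quantile of the relative error. A minimal sketch; the function name is hypothetical.

```python
import math

def error_bound_at_coverage(estimates, true_value, coverage=0.95):
    """Smallest relative error e such that at least `coverage` of the
    estimates lie within e of `true_value` (empirical quantile)."""
    errors = sorted(abs(est - true_value) / true_value for est in estimates)
    k = max(math.ceil(coverage * len(errors)) - 1, 0)
    return errors[k]
```

Repeating this for several window lengths T shows directly how the bound tightens as T grows.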
Rate-Limited Class State Detection. The rate-limiter parameter r can be included in the service envelope equations for each class. Importance of time scales, for example with class-based fair queuing, C = 1.5 Mbps and r = 1 Mbps: the probability decreases with the time scale. Errors are higher when measuring multi-level leaky buckets.
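One way to see why time scales matter: a class policed by a single leaky bucket with rate r and bucket depth sigma can receive at most min(C * I, sigma + r * I) bits in an interval of length I. With C = 1.5 Mbps and r = 1 Mbps from the example above, and a hypothetical sigma = 15,000 bits, the limiter only becomes visible beyond I = sigma / (C - r) = 30 ms. This is the standard single-level leaky-bucket bound, used for illustration rather than as the detection test itself.

```python
def max_service_bits(interval_s, link_bps, rate_bps, bucket_bits):
    """Upper bound on the bits a leaky-bucket-policed class can receive in an
    interval: limited both by the link (C*I) and by the bucket (sigma + r*I)."""
    return min(link_bps * interval_s, bucket_bits + rate_bps * interval_s)

def crossover_interval_s(link_bps, rate_bps, bucket_bits):
    """Interval length beyond which the rate limiter, not the link, binds."""
    return bucket_bits / (link_bps - rate_bps)
```

Below the crossover, service measures look the same with or without the limiter, so they carry no information about r.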
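Generalized Likelihood Ratio Test: detection with unknown parameters. Note: we do not find a single value of the unknown parameters that maximizes the likelihood ratio; instead, the likelihood under each hypothesis is maximized separately. Under mild conditions (asymptotically), the GLRT is uniformly most powerful, i.e., it maximizes the probability of detection for a given false-alarm probability.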
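In standard notation (the symbols here are ours, not the slides'), with x the observed service measures and theta_0, theta_1 the unknown parameters under hypotheses H0 and H1, the GLRT replaces each unknown by its MLE under the corresponding hypothesis and decides H1 when the ratio exceeds a threshold gamma set by the desired false-alarm probability:

```latex
L_G(\mathbf{x}) =
\frac{\max_{\theta_1} p(\mathbf{x};\, \theta_1, \mathcal{H}_1)}
     {\max_{\theta_0} p(\mathbf{x};\, \theta_0, \mathcal{H}_0)} > \gamma
```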