Dependability Evaluation and Benchmarking of Network Function Virtualization Infrastructures
D. Cotroneo, L. De Simone, A.K. Iannillo, A. Lanzaro, R. Natella
Critiware s.r.l. and Federico II University of Naples, Italy
Towards Network Functions Virtualization
[Figure: physical network equipment (RGW, DPI, BRAS, IMS, EPC, ...) migrated to virtual network equipment]
- Telecom workloads have demanding requirements (99.99...% availability) and cannot afford outages
- NFV promises reduced costs, improved manageability, and faster innovation
- Can it deliver comparable performance and reliability?
Why is engineering reliable NFV challenging?
- Complex stack of hardware and software off-the-shelf components
- Exposure to several sources of hardware and software faults
- Lack of tools and methodologies for testing fault tolerance
[Figure: NFV stack with hardware, hypervisor, VM, and guest OS layers, each a potential source of faults]
As a result, it is hard to trust the reliability of NFV services
In this presentation:
- An experimental methodology for dependability benchmarking of NFV, based on fault injection
- A case study on a virtual IP Multimedia Subsystem (IMS), analyzing:
  - the impact of faults on performance and availability
  - the sensitivity to different types of faults
  - the pitfalls in the design of NFVIs
What is a dependability benchmark?
- A dependability benchmark evaluates a system in the presence of (deliberately injected) faults
- Are NFV services still available and high-performing even when a fault is injected?
- The dependability benchmark includes:
  - measures (KPIs) for characterizing performance and availability
  - procedures, tools, and conditions under which the measures are obtained
Overview of the benchmarking process
[Diagram: definition of workload, faultload, and measures → fault injection experiments (deployment of VNFs over the NFVI → workload and VNF execution → injection of the i-th fault → data collection → testbed clean-up; iterated over several different faults) → computation of measures and reporting]
The first part consists of the definition of the key performance indicators (KPIs), the faultload (i.e., a set of faults to inject in the NFVI), and the workload (i.e., inputs to submit to the NFVI) that will support the experimental evaluation of an NFVI.
Based on these elements, the second part of the methodology consists of the execution of a sequence of fault injection experiments. In each experiment, the NFVI under evaluation is first configured by deploying a set of VNFs to exercise it; the workload is then submitted to the VNFs running on the NFVI and, during their execution, faults are injected; at the end of the execution, performance and failure data are collected from the target NFVI; finally, the experimental testbed is cleaned up (e.g., by un-deploying VNFs) before starting the next experiment. This process is repeated several times, injecting a different fault in each experiment, while using the same workload and collecting the same performance and failure metrics. The execution of fault injection experiments can be supported by automated tools for configuring virtualization infrastructures, generating network workloads, and injecting faults.
Finally, performance and failure data from all experiments are processed to compute the KPIs and to identify performance/dependability bottlenecks in the target NFVI.
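To make the process concrete, below is a minimal sketch of a campaign driver that iterates the above steps over a list of faults. The helper functions and fault descriptions are hypothetical placeholders for infrastructure-specific tooling, not part of the methodology itself.

```python
# Sketch of a fault injection campaign driver. All helpers below are no-op
# placeholders standing in for infrastructure-specific tooling (e.g., VM
# deployment scripts, workload generators, fault injectors).
from dataclasses import dataclass

@dataclass
class Fault:
    target: str   # e.g., a host or a VM name
    kind: str     # e.g., "network_drop", "cpu_hog"

def deploy_vnfs():            print("deploying VNFs over the NFVI")
def start_workload():         print("submitting the workload to the VNFs")
def inject_fault(fault):      print(f"injecting {fault.kind} on {fault.target}")
def wait_for_workload_end():  print("waiting for the workload to complete")
def collect_data(fault, rep): return {"fault": fault, "repetition": rep}
def cleanup_testbed():        print("un-deploying VNFs and cleaning up the testbed")

def run_campaign(faults, repetitions=3):
    """Run one fault injection experiment per (fault, repetition) pair."""
    results = []
    for fault in faults:                      # a different fault in each experiment
        for rep in range(repetitions):
            deploy_vnfs()
            start_workload()
            inject_fault(fault)               # injected while the workload runs
            wait_for_workload_end()
            results.append(collect_data(fault, rep))
            cleanup_testbed()
    return results                            # later processed to compute the KPIs

if __name__ == "__main__":
    run_campaign([Fault("host1", "network_drop"), Fault("sprout-vm", "cpu_hog")])
```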
Benchmark measures
The dependability benchmark measures the quality of service as perceived by NFV users:
- VNF latency
- VNF throughput
- VNF experimental availability
- Risk Score
We compare fault-injected experiments against the QoS objectives and against the fault-free experiment (the benchmark baseline)
VNF Latency and Throughput
[Figure: end points send requests (t_request) to VNFs running on the virtualization layer over off-the-shelf hardware and software, and receive responses (t_response), while faults are injected]
- VNF latency: the time required to process a unit of traffic (such as a packet or a service request)
- VNF throughput: the rate of processed traffic (packets or service requests) per second
Characterization of VNF latency
Percentiles of the latency distribution are compared against QoS objectives, e.g.:
- 50th percentile ≤ 150 ms
- 90th percentile ≤ 250 ms
[Plot: 50th and 90th percentiles of response latency for the fault-free run, for faulty runs with good performance, and for faulty runs with bad performance, showing the gap from the QoS objectives]
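A minimal sketch of such a percentile check, using the example thresholds above (the latency samples in the usage example are made up):

```python
# Check latency percentiles of one experiment against QoS objectives (ms).
import statistics

QOS_OBJECTIVES_MS = {50: 150.0, 90: 250.0}   # percentile -> threshold in ms

def check_latency(latencies_ms):
    """Return {percentile: (measured value, objective met?)} for each objective."""
    cuts = statistics.quantiles(latencies_ms, n=100)   # 1st..99th percentiles
    return {p: (cuts[p - 1], cuts[p - 1] <= t) for p, t in QOS_OBJECTIVES_MS.items()}

# Usage with made-up samples from a fault-injected run
print(check_latency([120, 135, 140, 150, 160, 180, 220, 260, 310, 400]))
```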
VNF Experimental Availability
[Figure: end points exchanging traffic with VNFs on the virtualization layer over off-the-shelf hardware and software, while faults are injected]
Experimental availability: the percentage of traffic units that are successfully processed
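As a sketch, this can be computed directly from per-experiment request counts (the numbers in the example are illustrative):

```python
# Experimental availability: percentage of traffic units processed successfully.
def experimental_availability(successful, total):
    return 100.0 * successful / total if total else 0.0

print(experimental_availability(successful=9450, total=10000))  # 94.5 (%)
```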
Risk Score
- The Risk Score is a summary measure of the risk of experiencing service unavailability and/or performance failures
- It combines performance failures and availability failures as a weighted average over all injected faults
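The slide does not spell out the exact formula, so the following is only a plausible sketch: a weighted average of per-experiment failure indicators, where the weights and the failure predicates are assumptions.

```python
# Illustrative Risk Score: weighted average, over all injected faults, of
# experiments that exhibited an availability and/or performance failure.
# Weights and failure predicates are assumptions, not taken from the paper.
def risk_score(experiments, weights=None):
    """experiments: list of dicts with 'availability_failure'/'performance_failure' flags."""
    weights = weights or [1.0] * len(experiments)
    failed = [w for w, e in zip(weights, experiments)
              if e["availability_failure"] or e["performance_failure"]]
    return 100.0 * sum(failed) / sum(weights)

print(risk_score([
    {"availability_failure": True,  "performance_failure": False},
    {"availability_failure": False, "performance_failure": True},
    {"availability_failure": False, "performance_failure": False},
]))  # ~66.7 (%)
```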
Benchmark faultload
- Faults in virtualized environments include disruptions in network and storage I/O traffic, and in CPUs and memory
- I/O faults (injected at both host and VM level): corruption, drop, and delay of network frame receive/transmit and of storage block reads/writes
- Compute faults (injected at both host and VM level): CPU and memory hogs, termination, code corruption, data corruption
- A fault injector has been implemented as a set of kernel modules for VMware ESXi and Linux
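The kernel-module injector itself is not shown here; as an illustrative stand-in for one class of faults (network frame delay and drop inside a Linux VM), the standard tc/netem traffic control tool can emulate similar disruptions. The interface name and parameters below are placeholders, and root privileges are required.

```python
# Stand-in example (NOT the paper's injector): emulate network I/O faults on a
# Linux guest using the standard tc/netem queueing discipline.
import subprocess

def inject_network_fault(interface="eth0", delay_ms=100, loss_pct=10):
    """Add delay and packet loss on outgoing frames of the given interface."""
    subprocess.run(["tc", "qdisc", "add", "dev", interface, "root", "netem",
                    "delay", f"{delay_ms}ms", "loss", f"{loss_pct}%"], check=True)

def remove_network_fault(interface="eth0"):
    """Restore the default queueing discipline, removing the injected fault."""
    subprocess.run(["tc", "qdisc", "del", "dev", interface, "root"], check=True)
```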
Benchmark workload
- The VNFs should be exercised using a representative workload
- Our dependability benchmarking methodology is not tied to a specific choice of workload
- Realistic workloads can be generated using load testing and performance benchmarking tools (e.g., Netperf)
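As an example of workload automation with such a tool, a Netperf TCP stream test could be launched from a script as below; the target host and duration are placeholders, and netserver must be running on the target.

```python
# Example: drive a generic network workload with Netperf from a script.
import subprocess

def run_netperf(target_host="192.0.2.10", duration_s=60, test="TCP_STREAM"):
    # -H: remote netserver host, -l: test duration in seconds, -t: test type
    result = subprocess.run(["netperf", "-H", target_host, "-l", str(duration_s),
                             "-t", test], capture_output=True, text=True, check=True)
    return result.stdout   # Netperf prints throughput figures to stdout
```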
Case study: Clearwater IMS
- Clearwater: an open-source, NFV-oriented implementation of the IP Multimedia Subsystem (IMS)
- In a first round of experiments, we test a replicated, load-balanced deployment over several VMs
- In a second round of experiments, we add automated recovery of VMs (VMware HA cluster) to the setup
- We use SIPp to generate SIP call set-up requests
[Figure: replicated Clearwater servers deployed on VMware ESXi, with fault injection in the infrastructure]
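For illustration, SIPp's built-in UAC scenario can be launched as sketched below; the target address, call rate, and call count are placeholders, and the actual scenarios used against Clearwater (registration, authentication, call set-up) are more elaborate.

```python
# Example: generate SIP call set-up traffic with SIPp's built-in UAC scenario.
import subprocess

def run_sipp(target="192.0.2.20:5060", calls_per_second=50, total_calls=10000):
    # -sn uac: embedded User Agent Client scenario, -r: call rate, -m: total calls
    subprocess.run(["sipp", "-sn", "uac",
                    "-r", str(calls_per_second), "-m", str(total_calls), target],
                   check=True)
```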
Fault injection test plan
- We inject faults in one of the physical host machines, and in a subset of the VMs (Sprout and Homestead)
- We inject both I/O (network, storage) and compute (CPU, memory) faults, both intermittent and permanent
- Each fault injection experiment has been repeated three times
- In total, 93 fault injection experiments have been performed
Experimental availability
- We computed performance and availability KPIs from the logs of the SIPp workload generator
- Faults have a strong impact on availability
- Compute faults and faults on the Sprout VMs have the strongest impact
VNF latency (by fault type)
More than 10% of requests exhibit a latency much higher than 250 ms!
[Plot: latency distributions by fault type, with the QoS thresholds T50 = 150 ms and T90 = 250 ms marked]
Risk Score and problem determination
- The overall risk score (55%) is quite high and reflects the strong impact of faults
- The infrastructure was affected by a capacity problem: once a VM or host fails, the remaining replicas cannot handle the SIP traffic
- NFVI design choices have a big impact on reliability! e.g., placement of VMs across hosts, topology of virtual networks and storage, allocation of CPUs and memory for VMs, etc.
Evaluating automated recovery mechanisms
[Plot: timelines of the fault-free run, a faulty run with load-balancing only, and a faulty run with load-balancing + automated recovery; fault injection and VM recovery events are marked, roughly ~1 min apart]
- Fault tolerance mechanisms require careful tuning, based on experimentation
- In our experiments, automated VM recovery was too slow, and availability remained low
Conclusion
- Performance and availability are critical concerns for NFV
- NFVIs are very complex, and making design choices is difficult
- We proposed a dependability benchmark useful to point out dependability issues and to guide designers
- Future work will extend the evaluation to alternative virtualization technologies
Thank you! Questions?