GEN: A GPU-Accelerated Elastic Framework for NFV
Zhilong Zheng, Jun Bi, Chen Sun, Heng Yu, Hongxin Hu, Zili Meng, Shuhe Wang, Kai Gao, Jianping Wu
Network Function Virtualization (NFV)
  [Figure: dedicated hardware devices versus NFV on commodity hardware running VMs]
  - Service Function Chain (SFC): VPN, Monitor, Firewall, Load Balancer
  - Virtualization techniques
  - Low cost
  - Elasticity control
  - Service provisioning flexibility
CPU-based NFV
  - NFV platforms: OpenNetVM (HotMiddlebox’16), NetBricks (OSDI’16), NFP (SIGCOMM’17), Metron (NSDI’18), …
  - NFV infrastructure: general-purpose multi-core servers
  - Problems:
      - Low performance, with poor prospects for improvement
      - Coarse-grained scaling
Problems of CPU-based NFV
  - Low performance, with poor prospects for improvement
      - Hard to achieve high performance (e.g., 40~100 Gbps) for a wide range of NFs
      - The slowdown/end of Moore’s Law
      - [Figure: CPU throughput of IPSec (AES & SHA1) and NIDS (Aho-Corasick) on an E5-2650 v2 (8 cores, 2.6 GHz); value labels in extraction order: 2.6 ~ 7.7 Gbps, 4.2 ~ 10.4 Gbps. Source: Go, Younghwan, et al. "APUNet: Revitalizing GPU as Packet Processing Accelerator." NSDI. 2017.]
  - Coarse-grained scaling
      - [Figure: scaling in units of whole CPU cores (1 CPU core, 2 CPU cores) leaves capacity underutilized; value labels: 1 Mpps, 9 Mpps, 11 Mpps, 10 Mpps, 10 Mpps]
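The slide mixes link speeds (Gbps) and packet rates (Mpps). As a rough aid for relating the two, the sketch below computes the theoretical maximum packet rate of an Ethernet link for a given frame size. The function name and the choice of 64-byte frames are illustrative assumptions, not taken from the slides; the 20 bytes of per-frame wire overhead (preamble, start-of-frame delimiter, inter-frame gap) is standard Ethernet framing.

```python
# On-wire overhead per Ethernet frame:
# 7 B preamble + 1 B start-of-frame delimiter + 12 B inter-frame gap.
WIRE_OVERHEAD_BYTES = 20


def line_rate_mpps(link_gbps: float, frame_bytes: int) -> float:
    """Theoretical maximum packet rate (million packets/s) of an Ethernet link.

    Hypothetical helper for illustration; frame_bytes includes the FCS.
    """
    bits_per_frame = (frame_bytes + WIRE_OVERHEAD_BYTES) * 8
    return link_gbps * 1e9 / bits_per_frame / 1e6


if __name__ == "__main__":
    # Minimum-size (64 B) frames are the worst case for packet rate.
    for gbps in (10, 40, 100):
        print(f"{gbps} Gbps, 64 B frames: {line_rate_mpps(gbps, 64):.2f} Mpps")
```

Under these assumptions, a 40 Gbps link carrying minimum-size frames needs roughly 59.5 Mpps of processing capacity, which gives a sense of scale for the per-core Mpps figures on the slide.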
GPU as an Accelerator for NFV
  - Benefits of GPU
      - Massive processing cores: high-performance NFs
      - Fine-grained computing units: potential fine-grained resource
  - Existing work
      - Router (PacketShader, SIGCOMM’10)
      - SSL proxy (SSLShader, NSDI’11)
      - NIDS (Kargus, CCS’12)
      - IPSec (NBA, EuroSys’15)
      - NFV framework (G-NET, NSDI’18)
  - Problems unsolved
      - High-performance SFCs
      - Fine-grained fast scaling
GEN exploits GPU to support high-performance SFCs with fine-grained scaling
GEN Framework Overview
  [Figure: two servers, each with a CPU and GPUs, an SFC Manager, and SFC Controllers]
Infrastructure Design
  [Figure: architecture diagram; recoverable labels listed below]
  - High-performance NIC: 10 / 40 / 100 GbE ports
  - CPU (user space): SFC Manager, SFC Controller #1 … SFC Controller #n
  - Packet path labels: Rx, Chain Classifier, Adaptive Batcher, SFC Starter, Tx, Output Queuing, Packet Forwarder, Packet Dropper
  - GPU (2k~3k physical cores): Global Memory, SFC Agent #1 … SFC Agent #n
  - Chains: Chain #1 (NF #1, NF #2, NF #3) … Chain #n (NF #1 … NF #mn)
  - Elastic Scaling
Problem #1: SFC Model Selection
  - Pipelining: packets traverse NF1 and NF2 as separate instances (Instance #1, Instance #2)
  - Run-to-completion (RTC): packets traverse NF1 and NF2 inside one instance (Instance #1)
SFC Model Selection: Pipelining
Two potential ways to support pipelining in GPU:
  - Sequenced invocations (CPU: Worker-NF1, Worker-NF2; GPU: NF1, NF2)
      1. Packet copying (packet batch to packet buffer)
      2. Kernel invocation (NF1)
      3. Reading
      4. Synchronization
      5. Next NF
      6. Kernel invocation (NF2)
      7. Reading
      8. Synchronization, then Out
    Problem: high overhead from frequent kernel invocations (~5 us per invocation)
  - Persistent kernels (CPU: Worker-SFC; GPU: NF1 (persistent), NF2 (persistent)); kernel invocation at startup of the system
      1. Packet copying (packet batch to packet buffer)
      2. Reading
      3. Next NF
      4. Reading, then Out
    Problem: hard and costly scaling
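To make the cost of sequenced invocations concrete, here is a minimal back-of-the-envelope sketch. Only the ~5 us per-invocation figure comes from the slide; the function names, the chain length of 3, and the 50 us batch interval are hypothetical values chosen for illustration.

```python
KERNEL_LAUNCH_US = 5.0  # per-invocation overhead quoted on the slide (~5 us)


def sequenced_launch_overhead_us(chain_length: int, num_batches: int) -> float:
    """Total launch overhead when every NF is a separate kernel launched per batch."""
    return chain_length * num_batches * KERNEL_LAUNCH_US


def overhead_fraction(chain_length: int, batch_interval_us: float) -> float:
    """Share of each batch interval spent purely on kernel launches."""
    return chain_length * KERNEL_LAUNCH_US / batch_interval_us


if __name__ == "__main__":
    # Hypothetical workload: a 3-NF chain fed one batch every 50 us.
    print(sequenced_launch_overhead_us(chain_length=3, num_batches=1000))  # microseconds
    print(overhead_fraction(chain_length=3, batch_interval_us=50.0))
```

In this toy setting, launches alone consume 30% of every batch interval, and the share grows linearly with chain length.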
SFC Model Selection: RTC
RTC-based model (CPU: Worker-SFC; GPU: NF1 and NF2 executed for each packet within one kernel)
  - Steps shown: 1. Packet copying (packet batch to packet buffer); 2. Kernel invocation; 4. Synchronization; then Out
  - Fewer kernel invocations (once per SFC)
  - Easier scaling (not persistent)
  - NFs are integrated into a specific SFC Agent via kernel fusion
  - The SFC Agent (in GPU) is launched by the SFC Starter (in CPU)
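The idea of fusing a chain's NFs into one SFC Agent can be sketched on the CPU as plain function composition. This is only an analogy in Python, not GEN's GPU implementation: the `fuse` and `run_batch` helpers, the dict-based packet representation, and the toy `firewall` and `router` NFs are all hypothetical.

```python
from typing import Callable, Dict, Iterable, List, Optional

Packet = Dict[str, int]
# An NF takes a packet and returns it (possibly modified), or None to drop it.
NF = Callable[[Packet], Optional[Packet]]


def fuse(nfs: Iterable[NF]) -> NF:
    """Fuse a chain of NFs into one callable (the analogue of an SFC Agent)."""
    chain = list(nfs)

    def sfc_agent(pkt: Packet) -> Optional[Packet]:
        current: Optional[Packet] = pkt
        for nf in chain:
            current = nf(current)
            if current is None:  # dropped by this NF; skip the rest of the chain
                return None
        return current

    return sfc_agent


def run_batch(agent: NF, batch: List[Packet]) -> List[Packet]:
    """One 'launch' carries a whole batch through the entire chain."""
    outputs = (agent(pkt) for pkt in batch)
    return [pkt for pkt in outputs if pkt is not None]


def firewall(pkt: Packet) -> Optional[Packet]:
    # Toy rule: drop telnet traffic.
    return None if pkt["dst_port"] == 23 else pkt


def router(pkt: Packet) -> Optional[Packet]:
    # Toy forwarding decision: pick one of 4 output ports.
    return {**pkt, "out_port": pkt["dst_ip"] % 4}


agent = fuse([firewall, router])
batch = [{"dst_ip": 10, "dst_port": 80}, {"dst_ip": 7, "dst_port": 23}]
result = run_batch(agent, batch)
print(result)
```

Each packet runs to completion through the whole chain inside a single call, so the number of launches per batch is one regardless of chain length.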
Problem #2: Elastic Scaling
  - Avoid monitoring NF load for scaling
  - Avoid deciding when to scale
  - Avoid deciding to what extent an NF should be scaled
  - Avoid considering how to quickly carry out NF scaling
  - Avoid state management caused by scale out / in
  - Intuition: use scale up / down to avoid state management
  - Approach: Adaptive Batcher
Elastic Scaling: Adaptive Batcher
  - Design of the adaptive batcher: keep the buffer occupancy at a low level
  - Fetching and batching: all packets in the buffer are fetched and sent to the GPU as one batch
  - Scaling up/down: GPU resource provisioning through more (or fewer) mini-batches in GPU
  - State management avoidance
  - Load monitoring avoidance
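The behavior described above (fetch everything in the buffer, let the batch size follow the load) can be illustrated with a small deterministic simulation. This is a sketch under invented assumptions, not GEN's actual batcher: the timing model, the mini-batch size of 64, the 30 concurrent mini-batches, and the 20 us / 10 us cost constants are all hypothetical.

```python
import math
from typing import List, Tuple

MINI_BATCH = 64               # packets per mini-batch (hypothetical)
CONCURRENT_MINI_BATCHES = 30  # mini-batches the GPU runs at once (hypothetical)
FIXED_US = 20.0               # per-round copy + launch cost (hypothetical)
WAVE_US = 10.0                # time for one wave of concurrent mini-batches (hypothetical)


def batch_time_us(batch_size: int) -> float:
    """Toy timing model: fixed cost plus one time slice per wave of mini-batches."""
    mini_batches = math.ceil(batch_size / MINI_BATCH)
    waves = math.ceil(mini_batches / CONCURRENT_MINI_BATCHES)
    return FIXED_US + waves * WAVE_US


def simulate(rate_pkts_per_us: float, rounds: int = 20) -> List[Tuple[int, int]]:
    """Return (batch size, mini-batch count) for each round at a constant arrival rate."""
    buffered = 0
    history = []
    for _ in range(rounds):
        batch = buffered  # fetch ALL packets currently in the buffer
        elapsed = batch_time_us(batch)
        # Packets that arrive while the GPU is busy form the next batch.
        buffered = int(rate_pkts_per_us * elapsed)
        history.append((batch, math.ceil(batch / MINI_BATCH)))
    return history


if __name__ == "__main__":
    for rate in (10, 30):  # packets per microsecond, i.e. 10 and 30 Mpps
        batch, mini_batches = simulate(rate)[-1]
        print(f"{rate} Mpps: steady batch of {batch} packets in {mini_batches} mini-batches")
```

In this model the batch size settles within a few rounds and the number of mini-batches grows with the arrival rate, without any explicit load monitor or scaling decision.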
Preliminary Evaluation
  - Hardware
      - CPU: Two Intel Xeon E5-2650 v4 (10 physical cores)
      - GPU: NVIDIA TITAN Xp
      - NIC: Two Intel X520 (40 Gbps in total)
  - Software
      - DPDK 17.11 for network I/O
      - CUDA 8.0 for GPU programming
  - NFs & SFCs
      - IPV4Router (1k entries)
      - NIDS (3k rules)
      - IPSec (SHA1 & AES-128-CBC)
Performance of RTC vs. Pipelining
  [Figure: RTC vs. pipelining comparison; surviving labels: 95th, 33.7%, 29.2% and 28.1%]
Fast Elastic Scaling
  - Fast converging (< 100 ms)
Conclusion and Future Work
  - GEN: a GPU-accelerated elastic framework for NFV
      - High-performance SFC
      - Elastic scaling
  - Future work
      - More SFC performance enhancement in GPU
      - Coordination between CPU and GPU
      - Impact of dynamic traffic load
Thank You http://netarchlab.tsinghua.edu.cn