GEN: A GPU-Accelerated Elastic Framework for NFV

GEN: A GPU-Accelerated Elastic Framework for NFV
Zhilong Zheng Jun Bi Chen Sun Heng Yu Hongxin Hu Zili Meng Shuhe Wang Kai Gao Jianping Wu

Network Function Virtualization (NFV)
Dedicated Dedicated Dedicated Dedicated NFV: Commodity Hardware Devices VM VM VM VM Service Function Chain (SFC) VPN Monitor Firewall Load Balancer Virtualization Techniques Low cost Elasticity control Service provisioning flexibility

General-purpose Multi-core Servers
CPU-based NFV OpenNetVM (HotMiddlebox’16) NetBricks (OSDI’16) NFP (SIGCOMM’17) Metron (NSDI’18) … NFV Platforms NFV Infrastructure General-purpose Multi-core Servers Problems Low performance with negative improvement expectation Coarse-grained scaling

Problems of CPU-based NFV
Low performance with negative improvement expectation Hard to achieve high performance (e.g., 40~100Gbps) for a wide range of NFs The slow/end of Moore’s Law Coarse-grained scaling IPSec (AES & SHA1) NIDS (Aho-Corasick) E v2 (8 Cores, 2.6 GHz) Go, Younghwan, et al. "APUNet: Revitalizing GPU as Packet Processing Accelerator." NSDI 2.6 ~ 7.7 Gbps 4.2 ~ 10.4 Gbps Underutilized 1 Mpps 9 Mpps 11 Mpps 10 Mpps 10 Mpps 1 CPU core 2 CPU cores

GPU as An Accelerator for NFV
Benefits of GPU Massive processing cores Fine-grained computing units High-performance NFs Potential Fine-grained resource Existing work Router (PacketShader, SIGCOMM’10) SSL proxy (SSLShader, NSDI’11) NIDS (Kargus, CCS’12) IPSec (NBA, EuroSys’15) NFV framework (G-NET, NSDI’18) High-performance SFCs Problems Unsolved Fine-grained fast Scaling

GEN exploits GPU to support high-performance SFCs with fine-grained scaling

GEN Framework Overview
Server Server CPU GPU CPU GPU SFC Manager SFC Manager SFC Controllers SFC Controllers GPU GPU

Infrastructure Design
High Performance NIC 10 / 40 / 100 GbE Ports CPU (User Space) GPU (2k~3k physical cores) SFC Manager SFC Controller #1 Global Memory Tx Output Queuing Packet Forwarder Packet Dropper ① Chain #1 NF #1 Chain #1 NF #2 Chain #1 NF #3 SFC Agent #1 ② R Adaptive Batcher SFC Starter Rx …… Chain Classifier SFC Agent #n Chain #n NF #1 …… Chain #n NF #mn R SFC Controller #n Elastic Scaling

Problem #1: SFC Model Selection
Pipelining Run-to-completion (RTC) Packets Packets NF1 NF2 NF1 NF2 Instance #1 Instance #2 Instance #1

SFC Model Selection: Pipelining
Two potential ways to support pipelining in GPU Sequenced invocations Persistent kernels CPU GPU CPU GPU Packet batch Packet Buffer Packet batch Packet Buffer 1. Packet copying 3. Reading 1. Packet copying 2. Reading Worker-NF1 2. Kernel invocation NF1 Worker-SFC NF1 (persistent) 4. Synchronization 5. Next NF Out 3. Next NF Kernel invocation at startup of the system 6. Kernel invocation Worker-NF2 NF2 NF2 (persistent) 7. Reading 4. Reading 8. Synchronization Out High overhead from frequent kernel invocations (~5us per invocation) Hard and costly scaling

SFC Model Selection: RTC
RTC-based Model CPU GPU Less kernel invocations (once per SFC) Packet batch Packet Buffer 1. Packet copying Worker-SFC RTC Model 2. Kernel invocation NF1 4. Synchronization Easier scaling (not persistent) Out NF2 Packet NFs are integrated into a specific SFC Agent kernel fusion SFC Agent (in GPU) is Launched by SFC Starter (in CPU)

Problem #2: Elastic Scaling
Avoid monitoring NF load for scaling Avoid deciding when to scale Avoid deciding to what extent an NF should be scaled Avoid considering how to quickly carry out NF scaling Avoid state management caused by scale out / in Intuition: Use scale up / down to avoid state management Adaptive Batcher

Elastic Scaling – Adaptive Batcher
Design of the adaptive batcher Keeping the buffer occupancy at a low level Scaling up/in GPU resource provisioning State management avoidance Adaptive Batcher Buffer Packets All packets In the buffer Fetching Batching GPU Scaling up/in more mini-batches in GPU Load monitoring avoidance

Preliminary Evaluation
Hardware CPU: Two Intel Xeon E v4 (10 physical cores) GPU: NVIDIA TITAN Xp NIC: Two Intel X520 (40 Gbps in total) Software DPDK for networking IO CUDA 8.0 for GPU programming NFs & SFCs IPV4Router (1k entries)  NIDS (3k rules)  IPSec (SHA1 & AES-128-CBC)

Performance of RTC vs. Pipelining
95th 33.7% 29.2% and 28.1%

Fast Elastic Scaling Fast converging (< 100ms)

Conclusion and Future Work
Gen: a GPU-accelerated elastic framework for NFV High-performance SFC Elastic scaling Future work More SFC performance enhancement in GPU Coordination between CPU and GPU Impact of dynamic traffic load

Thank You

GEN: A GPU-Accelerated Elastic Framework for NFV

Similar presentations

Presentation on theme: "GEN: A GPU-Accelerated Elastic Framework for NFV"— Presentation transcript:

Similar presentations

About project

Feedback

Log in

Auth with social network:

GEN: A GPU-Accelerated Elastic Framework for NFV

Similar presentations

Presentation on theme: "GEN: A GPU-Accelerated Elastic Framework for NFV"— Presentation transcript:

Similar presentations

About project

Feedback