GPUNFV: a GPU-Accelerated NFV System
Xiaodong Yi, Jingpu Duan, Chuan Wu
The University of Hong Kong

Hello everyone! I am Duan Jingpu from HKU. I'm going to talk about GPUNFV. GPUNFV is a GPU-accelerated NFV system that focuses on stateful service chain processing.
Background: NFV (Network Function Virtualization)
NFV runs network functions (NFs) as virtualized applications on commodity servers.
A network service (service chain) typically consists of a sequence of NFs.

First I'm going to introduce some simple background. NFV has been a hot research trend in recent years. NFV means running network functions as virtualized applications on commodity servers. There is an important concept in NFV called the service chain. A service chain consists of a sequence of NFs, and traffic must traverse each NF on the chain in sequence. Service chains help network operators accomplish complicated network management tasks.
Background: GPU (Graphics Processing Unit)
Widely used in 3D graphics and machine learning.
Highly parallel architecture.
More efficient than a CPU when processing a huge block of data in parallel.

GPUs have been widely used in 3D graphics and deep learning, due to their highly parallel structure. Generally speaking, GPUs are more powerful than CPUs when processing large blocks of data in parallel.
Existing Work
NFV systems: Based on CPUs. Use DPDK/Netmap to accelerate packet I/O. Use RSS to scale to multiple cores. May not scale up to support 40Gbps/100Gbps NICs.
GPU acceleration for packet processing: Most of the systems are for stateless NFs. Stateful service chain processing is more practical.
Why? Existing systems lack a proper software abstraction that provides flow-level micro services on both CPU and GPU. Examples of such micro services: flow state management, and communication between a flow and important system modules.
GPUNFV
Supports stateful service chain processing on GPU.
Provides flow-level micro-services on both CPU and GPU: a flow actor on the CPU, a GPU thread on the GPU.
Provides a set of convenient APIs to implement GPU-based VNFs.
Architecture
[Figure: Architecture of GPUNFV. Components: input port, flow classifier, flow actors, batcher, GPU proxy, GPU threads, forwarder and output port, all driven by a CPU thread. Data items: flows, packets, flow states, packet batch and state batch.]
Packets from the input port are sent to the flow classifier.
The flow classifier creates a new flow actor for each new flow and forwards packets to the flow actors.
After polling several times, the batcher kicks in to create a packet batch and a state batch. Each batch has the correct memory layout to be processed by GPU threads.
The GPU proxy waits for the previous job to complete, updates the flow state on each flow actor, forwards the packets out from the forwarder, and launches the next GPU computation.
Each flow is processed by a single GPU thread.
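The classification step above can be sketched in a few lines. This is only an illustration, not the GPUNFV implementation: the slides do not say how a flow is identified, so the 5-tuple `FlowKey`, the `FlowClassifier` class and the use of a plain queue as a stand-in for a flow actor are all assumptions.

```python
from collections import deque, namedtuple

# Hypothetical flow identifier; the slides do not specify the key that is used.
FlowKey = namedtuple("FlowKey", "src_ip dst_ip src_port dst_port proto")


class FlowClassifier:
    """Maps each packet to a per-flow queue, creating one for every new flow."""

    def __init__(self):
        # FlowKey -> deque of packets (a stand-in for a flow actor)
        self.flows = {}

    def classify(self, key, packet):
        queue = self.flows.get(key)
        if queue is None:
            # First packet of a new flow: create its per-flow structure.
            queue = deque()
            self.flows[key] = queue
        queue.append(packet)
        return queue


classifier = FlowClassifier()
flow_a = FlowKey("10.0.0.1", "10.0.0.2", 1234, 80, "tcp")
flow_b = FlowKey("10.0.0.3", "10.0.0.2", 5678, 80, "tcp")
classifier.classify(flow_a, b"pkt-a1")
classifier.classify(flow_b, b"pkt-b1")
classifier.classify(flow_a, b"pkt-a2")
```

After these three calls the classifier holds two flows, and both packets of `flow_a` sit in the same queue in arrival order.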
Flow Actor
Lightweight software abstraction for flow-level micro-service on the CPU side.
One flow actor handles one flow.
[Figure: a flow actor, consisting of a message handler, a packet queue and flow state storage. Packets of a flow enter the packet queue; packets in this queue will be fetched by the batcher. After the GPU processing, the updated flow state is written back to the flow state storage.]

Finally, at the CPU side, the flow actor facilitates per-flow management by providing a flow-level micro service.
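A flow actor with the three parts named on this slide (message handler, packet queue, flow state storage) might look roughly like the sketch below. The class name, the message names `"packet"` and `"state_update"`, and the dictionary used as flow state are hypothetical choices made for illustration; the slides do not describe the actual interface.

```python
from collections import deque


class FlowActor:
    """One actor per flow: a packet queue plus that flow's state."""

    def __init__(self, flow_id):
        self.flow_id = flow_id
        self.packet_queue = deque()          # packets waiting for the batcher
        self.state = {"packets_seen": 0}     # hypothetical flow state

    def handle(self, message, payload=None):
        """Message handler: the single entry point into the actor."""
        if message == "packet":
            self.packet_queue.append(payload)
        elif message == "state_update":
            # Sent after GPU processing with the updated flow state.
            self.state = payload
        else:
            raise ValueError("unknown message: " + message)

    def drain(self):
        """Called by the batcher to fetch all queued packets of this flow."""
        packets = list(self.packet_queue)
        self.packet_queue.clear()
        return packets


actor = FlowActor(flow_id=1)
actor.handle("packet", b"p1")
actor.handle("packet", b"p2")
fetched = actor.drain()
actor.handle("state_update", {"packets_seen": len(fetched)})
```

Routing everything through one handler keeps each flow's queue and state private to its actor, which is the usual motivation for an actor-style design.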
Batcher
Suppose the number of all flows is m.
Construct the correct memory layout for GPU threads.
Use page-locked memory to avoid data copy.
[Figure: a packet batch (on page-locked memory) with slots numbered 1, 2, 3, ..., m+1, m+2, m+3, ..., and a flow state batch (on page-locked memory). The packet from flow a and the flow state of flow a belong to GPU thread index 1; the packet and flow state of flow b belong to GPU thread index 2; the packet and flow state of flow c belong to GPU thread index 3.]

The batcher fetches flow packets after several rounds of polling from the NIC. It guarantees that the packets of a flow are processed by a single GPU thread. Each GPU thread executes the same piece of kernel code and uses an index to identify itself. When accessing GPU memory positions, it should add ...
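The slot numbering in the figure (1, 2, 3, then m+1, m+2, m+3) suggests an interleaved layout in which consecutive packets of one flow are m slots apart. The sketch below is built on that reading, which is an assumption rather than something the slides state outright. It uses 0-based indices, plain Python lists instead of page-locked memory, and a hypothetical byte-counting NF as the stateful computation.

```python
def build_batches(flow_packets, flow_states, max_per_flow):
    """Lay out packets so that flow i owns slots i, m+i, 2m+i, ...

    flow_packets: list of per-flow packet lists; list position = thread index.
    Packets beyond max_per_flow are left out of this batch (simplification).
    """
    m = len(flow_packets)
    packet_batch = [None] * (m * max_per_flow)
    for thread_idx, packets in enumerate(flow_packets):
        for k, pkt in enumerate(packets[:max_per_flow]):
            packet_batch[k * m + thread_idx] = pkt
    state_batch = list(flow_states)  # one state slot per flow / thread
    return packet_batch, state_batch


def gpu_thread(thread_idx, m, packet_batch, state_batch):
    """What one simulated GPU thread does: walk its own slots with stride m."""
    pos = thread_idx
    while pos < len(packet_batch) and packet_batch[pos] is not None:
        # Hypothetical stateful NF: accumulate bytes seen for this flow.
        state_batch[thread_idx] += len(packet_batch[pos])
        pos += m


flows = [["aa", "bbb"], ["c"], ["dd", "e", "fff"]]
packet_batch, state_batch = build_batches(flows, [0, 0, 0], max_per_flow=3)
for idx in range(len(flows)):
    gpu_thread(idx, len(flows), packet_batch, state_batch)
```

With this layout, threads with neighbouring indices read neighbouring memory slots in the same step, which is the access pattern GPUs generally handle most efficiently.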
GPU Proxy
1. Check completion. (May block waiting.)
2. Update flow state.
3. Forward processed packets.
4. Run next GPU job.
[Figure: on the CPU side, the flow actor (flow state) and the forwarder; on the GPU side, the previous flow state batch, the previous flow packet batch and the current flow state batch.]
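The four steps can be mimicked on a CPU alone by letting a worker thread play the role of the GPU. The sketch below is an assumption-laden illustration: `fake_gpu_job`, the `GpuProxy` class and its `submit`/`flush` methods are hypothetical names, and a `ThreadPoolExecutor` future stands in for an asynchronous GPU kernel launch.

```python
from concurrent.futures import ThreadPoolExecutor


def fake_gpu_job(packet_batch, state_batch):
    """Stand-in for a GPU kernel: counts packets per flow index."""
    new_states = list(state_batch)
    for flow_idx, _pkt in packet_batch:
        new_states[flow_idx] += 1
    return packet_batch, new_states


class GpuProxy:
    def __init__(self, flow_states):
        self.flow_states = flow_states   # per-flow state kept on the CPU side
        self.forwarded = []              # stand-in for the forwarder
        self.pending = None              # the job currently "on the GPU"
        self.executor = ThreadPoolExecutor(max_workers=1)

    def _complete_previous(self):
        if self.pending is None:
            return
        packets, new_states = self.pending.result()  # 1. check completion (may block)
        self.flow_states[:] = new_states             # 2. update flow state
        self.forwarded.extend(packets)               # 3. forward processed packets
        self.pending = None

    def submit(self, packet_batch):
        self._complete_previous()
        state_batch = list(self.flow_states)
        # 4. run the next GPU job without waiting for it to finish
        self.pending = self.executor.submit(fake_gpu_job, packet_batch, state_batch)

    def flush(self):
        self._complete_previous()
        self.executor.shutdown()


proxy = GpuProxy([0, 0])
proxy.submit([(0, "a"), (1, "b")])
proxy.submit([(0, "c")])
proxy.flush()
```

Because the state batch for a new job is built only after the previous job's results are written back, each job sees the flow state left by the one before it.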
Dynamic Sizing of Packet Batching
The GPU proxy may block waiting when checking for GPU completion.
Dynamically adjust the size of the packet batch. The size of a packet batch is the maximum number of packets that the batch can hold.
Blocking time threshold: 0.1 ms.
Procedure: get an initial size, measure the blocking time, then adjust the size for the next round.
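One possible shape of the adjustment step is sketched below. Only the 0.1 ms threshold comes from the slide; the adjustment rule itself (grow the batch by a fixed step whenever the measured blocking time exceeds the threshold, otherwise keep it) and the parameters `step` and `max_size` are assumptions, since the slides do not spell out the rule or say whether the batch ever shrinks.

```python
BLOCKING_THRESHOLD_MS = 0.1  # threshold stated on the slide


def next_batch_size(current_size, blocking_ms, step=32, max_size=65536):
    """Return the batch size to use in the next round.

    Assumed rule: if the GPU proxy blocked longer than the threshold, the CPU
    finished its work too early, so let it gather a larger batch next time.
    step and max_size are hypothetical tuning parameters.
    """
    if blocking_ms > BLOCKING_THRESHOLD_MS:
        return min(current_size + step, max_size)
    return current_size


size = 320  # an example initial size
measured_blocking_ms = [0.5, 0.3, 0.05]  # made-up measurements for illustration
history = []
for blocking in measured_blocking_ms:
    size = next_batch_size(size, blocking)
    history.append(size)
```

In this run the size grows twice and then holds once the blocking time falls under the threshold.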
Dynamic Sizing of Packet Batching
The size of the packet batch affects the number of polling attempts.
When CPU working time >= GPU processing time, the GPU proxy does not block waiting.
Experiments
Environment: A Mac Pro server equipped with two 2.4GHz 6-core Intel Xeon E5645 processors and one NVIDIA TITAN X GPU.
Metrics: Packet processing throughput. Processing time. Dynamic sizing of packet batch.
Packet Processing Throughput
Service chain: flow monitor (FM) -> firewall (FW) -> load balancer (LB)
Flows generated: 50000 flows at a rate of 3 Mpps (packets per second)
(Figure note: vertical axis, should be maximum.)
Packet Processing Throughput
Service chain: flow monitor (FM) -> firewall (FW) -> load balancer (LB)
Flows generated: 50000 flows; the aggregate rate of all flows is about 3 Mpps, and the rate of a single flow ranges from 20 to 100 pps (packets per second)
Batch size: 40k
Processing Time
Service chain: FM -> FW (180 rules) -> LB
Flows generated: 50000 flows at a rate of about 3 Mpps (packets per second)
Running time: 20 s
Dynamic Sizing of Packet Batch
Service chains: "FM -> FW (60 rules) -> LB" on one runtime and "FM -> FW (180 rules) -> LB" on another.
Flows generated: 50000 flows at a total rate of 4 Mpps.
Initial batch size: 320 packets.
Conclusion & Discussion
GPUNFV is a GPU-based NFV system that provides flow-level micro services for stateful service chains.
Our prototype works best for stateful, computation-intensive service chains.
The current prototype may incur a longer packet processing delay.