GPUNFV: a GPU-Accelerated NFV System


GPUNFV: a GPU-Accelerated NFV System Xiaodong Yi, Jingpu Duan, Chuan Wu The University of Hong Kong Hello everyone! I am Jingpu Duan from HKU, and I am going to talk about GPUNFV, a GPU-accelerated NFV system that focuses on stateful service chain processing.

Background NFV (Network Function Virtualization): running network functions (NFs) as virtualized applications on commodity servers. A network service (service chain) typically consists of a sequence of NFs. First, let me introduce some background. NFV has been a hot research topic in recent years: it runs network functions as virtualized applications on commodity servers. An important concept in NFV is the service chain. A service chain consists of a sequence of NFs, and traffic must traverse each NF on the chain in order. Service chains help network operators accomplish complicated network management tasks.

Background GPU (Graphics Processing Unit): widely used in 3D graphics and machine learning; highly parallel architecture; more efficient than a CPU when processing a huge block of data in parallel. GPUs have been widely adopted in 3D graphics and deep learning thanks to their highly parallel architecture. Generally speaking, GPUs outperform CPUs when processing large blocks of data in parallel.

Existing Work NFV systems: based on CPUs; use DPDK/netmap to accelerate packet I/O; use RSS to scale to multiple cores; may not scale up to support 40Gbps/100Gbps NICs. GPU acceleration for packet processing: most existing systems target stateless NFs, while stateful service chain processing is more practical. Why? They lack a proper software abstraction that provides flow-level micro-services on both CPU and GPU, covering flow state management and communication between flows and important system modules.

GPUNFV Supports stateful service chain processing on the GPU. Provides flow-level micro-services on both CPU and GPU: a flow actor on the CPU side and a GPU thread on the GPU side. Provides a set of convenient APIs to implement GPU-based VNFs.

Architecture The architecture of GPUNFV: between the input port and the output port, a CPU thread hosts the flow classifier, the flow actors, the batcher, the GPU proxy and the forwarder. Incoming packets are sent to the flow classifier, which creates a new flow actor for each new flow and forwards packets to their flow actors. After polling several times, the batcher kicks in to create a packet batch and a flow state batch; each batch has the correct memory layout to be processed by GPU threads. The GPU proxy waits for the previous GPU job to complete, updates the flow states on the flow actors, forwards the processed packets out through the forwarder, and launches the next GPU computation. Each flow is processed by a single GPU thread.
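The classifier step above can be sketched as follows. This is an illustrative sketch, not GPUNFV's actual code: the 5-tuple flow key is simplified to a single integer, and all names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <unordered_map>

// Hypothetical per-flow actor; in GPUNFV this would hold the packet
// queue and flow state. Here it only counts packets for illustration.
struct FlowActor { int packets_seen = 0; };

// Flow classifier sketch: maps a flow key to its actor, creating a
// new actor the first time a flow is seen.
class FlowClassifier {
  std::unordered_map<uint64_t, std::unique_ptr<FlowActor>> actors_;
 public:
  // Returns the actor for this flow, creating one on the first packet.
  FlowActor& classify(uint64_t flow_key) {
    auto& a = actors_[flow_key];
    if (!a) a = std::make_unique<FlowActor>();
    return *a;
  }
  size_t flow_count() const { return actors_.size(); }
};
```

In this design the classifier is the only component that creates actors, so each flow's state lives in exactly one place on the CPU side.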

Flow Actor A lightweight software abstraction for the flow-level micro-service on the CPU side; one flow actor handles one flow. A flow actor contains a message handler, a packet queue and flow state storage. Packets of a flow are enqueued in the packet queue and later fetched by the batcher; after the GPU processing, the updated flow state is written back to the flow actor. In short, the flow actor facilitates per-flow management by providing a flow-level micro-service on the CPU side.
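A minimal sketch of the flow actor's queue and state handling, assuming hypothetical names (the talk does not give GPUNFV's actual API):

```cpp
#include <cassert>
#include <cstdint>
#include <deque>
#include <string>
#include <vector>

struct Packet { std::string payload; };
struct FlowState { uint64_t packet_count = 0; };

// One flow actor per flow: it queues the flow's packets for the
// batcher and stores the flow's state on the CPU side.
struct FlowActor {
    std::deque<Packet> queue;   // packets awaiting the next batch
    FlowState state;            // per-flow state

    // Message handler path: a packet of this flow arrives.
    void on_packet(Packet p) { queue.push_back(std::move(p)); }

    // Called by the batcher: drain queued packets into the batch.
    std::vector<Packet> fetch_for_batch() {
        std::vector<Packet> out(queue.begin(), queue.end());
        queue.clear();
        return out;
    }

    // Called by the GPU proxy after the kernel finishes.
    void update_state(const FlowState& s) { state = s; }
};
```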

Batcher Suppose the total number of flows is m. The batcher constructs the correct memory layout for the GPU threads and uses page-locked memory to avoid an extra data copy: both the packet batch and the flow state batch live in page-locked memory, with the packet batch organized in slots 1, 2, 3, …, m+1, m+2, m+3, …. The batcher fetches flow packets after several rounds of polling from the NIC and guarantees that all packets of a flow are processed by a single GPU thread. Every GPU thread executes the same piece of kernel code and uses an index to identify itself; when accessing GPU memory, it adds this index as an offset. For example, the packets and the state of flow a go to GPU thread index 1, flow b to index 2, and flow c to index 3.
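The slot numbering above (thread i touching slots i, m+i, 2m+i, …) suggests a strided layout where the j-th packet of flow i lives at slot j*m + i (zero-indexed). A small sketch under that assumption, with the per-thread kernel simulated on the CPU:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Assumed batch layout: packet j of flow i at slot j*m + i,
// so GPU thread i walks the batch with stride m.
size_t slot(size_t flow_idx, size_t pkt_no, size_t m) {
    return pkt_no * m + flow_idx;
}

// Simulated per-thread kernel: visit this flow's packets in the batch.
// Here each "packet" is just an int and we sum them for illustration.
int process_flow(const std::vector<int>& packet_batch,
                 size_t flow_idx, size_t m, size_t pkts_per_flow) {
    int sum = 0;
    for (size_t j = 0; j < pkts_per_flow; ++j)
        sum += packet_batch[slot(flow_idx, j, m)];
    return sum;
}
```

With this layout, neighboring GPU threads read neighboring slots in each round, which is the coalesced access pattern GPUs favor.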

GPU Proxy The GPU proxy coordinates the CPU and GPU sides in four steps: 1. Check completion of the previous GPU job (it may block waiting). 2. Update the flow state on each flow actor from the previous flow state batch. 3. Forward the processed packets of the previous flow packet batch through the forwarder. 4. Run the next GPU job on the current packet and state batches.

Dynamic Sizing of Packet Batching The GPU proxy may block waiting when checking for completion, so GPUNFV dynamically adjusts the size of the packet batch (the maximum number of packets a packet batch can hold). Starting from an initial size, the proxy measures the blocking time when checking for GPU completion and adjusts the size for the next round, using a blocking time threshold of 0.1 ms.

Dynamic Sizing of Packet Batching The size of the packet batch affects the number of polling attempts. The goal is to keep the CPU working time >= the GPU processing time, so that there is no block waiting.
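The control loop can be sketched as below. The talk only gives the 0.1 ms threshold, not the update rule, so the multiplicative 25% step here is an assumption:

```cpp
#include <algorithm>
#include <cassert>

// Blocking time threshold from the talk: 0.1 ms.
constexpr double kBlockThresholdMs = 0.1;

// Hypothetical adjustment rule: if the proxy blocked longer than the
// threshold, the GPU outlasted the CPU work, so grow the batch (more
// polling, more CPU work per round); otherwise shrink it to cut latency.
int next_batch_size(int current_size, double blocked_ms) {
    if (blocked_ms > kBlockThresholdMs)
        return current_size + current_size / 4;
    return std::max(1, current_size - current_size / 4);
}
```

A production controller would likely damp these steps (or add hysteresis) to avoid oscillating around the equilibrium size.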

Experiments Environment: a Mac Pro server equipped with two 2.4 GHz 6-core Intel Xeon E5645 processors and one NVIDIA TITAN X GPU. Metrics: packet processing throughput, processing time, and dynamic sizing of the packet batch.

Packet Processing Throughput Service chain: flow monitor (FM) -> firewall (FW) -> load balancer (LB). Flows generated: 50,000 flows at a rate of 3 Mpps (million packets per second).

Packet Processing Throughput Service chain: flow monitor (FM) -> firewall (FW) -> load balancer (LB). Flows generated: 50,000 flows; the total rate of all flows is about 3 Mpps, while the rate of a single flow varies from 20 to 100 pps (packets per second). Batch size: 40k packets.

Processing Time Service chain: FM -> FW (180 rules) -> LB. Flows generated: 50,000 flows at a rate of about 3 Mpps. Running time: 20 s.

Dynamic Sizing of Packet Batch Service chains: "FM -> FW (60 rules) -> LB" on one runtime and "FM -> FW (180 rules) -> LB" on another. Flows generated: 50,000 flows at a total rate of 4 Mpps. Initial batch size: 320 packets.

Conclusion & Discussion GPUNFV is a GPU-based NFV system that provides a flow-level micro-service for stateful service chains. Our prototype works best for stateful, computation-intensive service chains; the current prototype may incur a longer packet processing delay.