GEN: A GPU-Accelerated Elastic Framework for NFV

GEN: A GPU-Accelerated Elastic Framework for NFV
Zhilong Zheng, Jun Bi, Chen Sun, Heng Yu, Hongxin Hu, Zili Meng, Shuhe Wang, Kai Gao, Jianping Wu

Network Function Virtualization (NFV)
Before NFV: dedicated hardware devices, one per function.
NFV: commodity hardware running NFs in VMs by means of virtualization techniques.
Service Function Chain (SFC), for example: VPN, Monitor, Firewall, Load Balancer.
Benefits: low cost; elasticity control; service provisioning flexibility.

CPU-based NFV
NFV platforms: OpenNetVM (HotMiddlebox’16), NetBricks (OSDI’16), NFP (SIGCOMM’17), Metron (NSDI’18), …
NFV infrastructure: general-purpose multi-core servers.
Problems: low performance with little prospect of improvement; coarse-grained scaling.

Problems of CPU-based NFV
Low performance with little prospect of improvement: it is hard to achieve high performance (e.g., 40~100 Gbps) for a wide range of NFs, and Moore's Law is slowing or ending.
CPU measurements for IPSec (AES & SHA1) and NIDS (Aho-Corasick) on an E5-2650 v2 (8 cores, 2.6 GHz): 2.6 ~ 7.7 Gbps and 4.2 ~ 10.4 Gbps. Source: Go, Younghwan, et al. "APUNet: Revitalizing GPU as Packet Processing Accelerator." NSDI. 2017.
Coarse-grained scaling: [Figure: throughput with 1 CPU core vs. 2 CPU cores; surviving labels: Underutilized; 1 Mpps, 9 Mpps, 11 Mpps, 10 Mpps, 10 Mpps.]

GPU as an Accelerator for NFV
Benefits of GPU: massive processing cores; fine-grained computing units.
Potential: high-performance NFs; fine-grained resource.
Existing work: Router (PacketShader, SIGCOMM’10); SSL proxy (SSLShader, NSDI’11); NIDS (Kargus, CCS’12); IPSec (NBA, EuroSys’15); NFV framework (G-NET, NSDI’18), high-performance SFCs.
Problem unsolved: fine-grained, fast scaling.

GEN exploits the GPU to support high-performance SFCs with fine-grained scaling.

GEN Framework Overview
[Figure: two servers, each with a CPU and GPUs; labels on each server: SFC Manager, SFC Controllers.]

Infrastructure Design
[Figure: infrastructure diagram. Surviving labels, grouped as they appear:]
High Performance NIC: 10 / 40 / 100 GbE ports.
CPU (user space): SFC Manager (Rx, Chain Classifier, Elastic Scaling, Output Queuing, Packet Forwarder, Packet Dropper, Tx); SFC Controller #1 … SFC Controller #n (Adaptive Batcher, SFC Starter).
GPU (2k~3k physical cores): Global Memory; SFC Agent #1 (Chain #1 NF #1, Chain #1 NF #2, Chain #1 NF #3) … SFC Agent #n (Chain #n NF #1 … Chain #n NF #mn).

Problem #1: SFC Model Selection
Pipelining: packets pass through NF1 and NF2 in separate instances (Instance #1, Instance #2).
Run-to-completion (RTC): packets pass through NF1 and NF2 within a single instance (Instance #1).

SFC Model Selection: Pipelining
There are two potential ways to support pipelining in the GPU.
Sequenced invocations (CPU: packet batch, Worker-NF1, Worker-NF2; GPU: packet buffer, NF1, NF2): 1. Packet copying; 2. Kernel invocation; 3. Reading; 4. Synchronization; 5. Next NF; 6. Kernel invocation; 7. Reading; 8. Synchronization; then out.
Drawback: high overhead from frequent kernel invocations (~5 us per invocation).
Persistent kernels (CPU: packet batch, Worker-SFC; GPU: packet buffer, NF1 (persistent), NF2 (persistent)): kernel invocation at startup of the system; 1. Packet copying; 2. Reading; 3. Next NF; 4. Reading; then out.
Drawback: hard and costly scaling.
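The cost gap between the two pipelining variants can be made concrete with a back-of-the-envelope model. The sketch below is illustrative only: the ~5 us launch cost comes from the slide, while the function name, the chain length, and the batch count are assumptions chosen for the example. It counts kernel launches only and ignores copying, reading, and synchronization.

```python
# Rough launch-overhead model for the two pipelining variants.
# KERNEL_LAUNCH_US is the ~5 us figure quoted on the slide; everything else
# (function name, parameters) is hypothetical and only for illustration.
KERNEL_LAUNCH_US = 5.0


def launch_overhead_us(num_nfs: int, num_batches: int, model: str) -> float:
    """Return total kernel-launch overhead in microseconds."""
    if model == "sequenced":
        # One kernel invocation per NF for every packet batch.
        launches = num_nfs * num_batches
    elif model == "persistent":
        # One kernel per NF, invoked once at system startup.
        launches = num_nfs
    else:
        raise ValueError(f"unknown model: {model}")
    return launches * KERNEL_LAUNCH_US


if __name__ == "__main__":
    for model in ("sequenced", "persistent"):
        total = launch_overhead_us(num_nfs=3, num_batches=1000, model=model)
        print(f"{model}: {total} us of launch overhead")
```

Under these assumptions, launch overhead grows with every batch for sequenced invocations but stays constant for persistent kernels, which is why the latter trade launch cost for the scaling difficulty noted on the slide.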

SFC Model Selection: RTC
RTC-based model (CPU: packet batch, Worker-SFC; GPU: packet buffer, NF1 and NF2 applied to each packet): 1. Packet copying; 2. Kernel invocation; 4. Synchronization; then out.
Fewer kernel invocations (once per SFC).
Easier scaling (not persistent).
NFs are integrated into a specific SFC Agent (kernel fusion).
The SFC Agent (in GPU) is launched by the SFC Starter (in CPU).
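The idea of fusing the NFs of a chain into one SFC Agent can be sketched on the CPU as plain function composition. This is a minimal illustration, not GEN's implementation: the packet representation, the two toy NFs, and the names `fuse` and `launch_agent` are all hypothetical, and `launch_agent` merely stands in for a single GPU kernel invocation.

```python
from typing import Callable, Dict, List, Optional

# Hypothetical packet model: a dict of header fields.
Packet = Dict[str, int]
# An NF returns the (possibly modified) packet, or None to drop it.
NF = Callable[[Packet], Optional[Packet]]


def firewall(pkt: Packet) -> Optional[Packet]:
    """Toy NF: drop packets destined to port 23."""
    return None if pkt["dport"] == 23 else pkt


def router(pkt: Packet) -> Optional[Packet]:
    """Toy NF: decrement TTL and drop expired packets."""
    pkt["ttl"] -= 1
    return pkt if pkt["ttl"] > 0 else None


def fuse(nfs: List[NF]) -> NF:
    """Fuse a chain of NFs into one run-to-completion agent function."""
    def sfc_agent(pkt: Packet) -> Optional[Packet]:
        for nf in nfs:
            result = nf(pkt)
            if result is None:
                return None  # dropped mid-chain, later NFs are skipped
            pkt = result
        return pkt
    return sfc_agent


def launch_agent(agent: NF, batch: List[Packet]) -> List[Packet]:
    """Stand-in for one kernel invocation: each packet runs the whole chain."""
    results = (agent(dict(pkt)) for pkt in batch)  # copy so inputs stay intact
    return [pkt for pkt in results if pkt is not None]


if __name__ == "__main__":
    agent = fuse([firewall, router])
    batch = [{"dport": 80, "ttl": 64}, {"dport": 23, "ttl": 64}]
    print(launch_agent(agent, batch))
```

One call to `launch_agent` per batch mirrors the "once per SFC" invocation count, in contrast to one invocation per NF in the sequenced pipelining model.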

Problem #2: Elastic Scaling
Avoid monitoring NF load for scaling: avoid deciding when to scale, and avoid deciding to what extent an NF should be scaled.
Avoid considering how to quickly carry out NF scaling.
Avoid state management caused by scale out / in.
Intuition: use scale up / down to avoid state management.
Mechanism: Adaptive Batcher.

Elastic Scaling – Adaptive Batcher
Design of the adaptive batcher: keep the buffer occupancy at a low level.
Flow: packets arrive in the buffer; the adaptive batcher fetches all packets in the buffer and batches them for the GPU.
Scaling up/down: GPU resource provisioning by running more (or fewer) mini-batches in the GPU.
Results: state management avoidance; load monitoring avoidance.
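The batching behaviour described on this slide can be sketched as follows. This is a simplified model under stated assumptions: the class name, the fixed `mini_batch_size` parameter, and the in-memory deque are hypothetical, and dispatch to the GPU is omitted. The point it illustrates is that draining the whole buffer each round makes the number of mini-batches follow the offered load without any explicit load monitoring.

```python
from collections import deque
from typing import Deque, Iterable, List


class AdaptiveBatcher:
    """Toy adaptive batcher: drains the buffer and splits it into mini-batches.

    mini_batch_size is a hypothetical tuning knob, not a value from the slides.
    """

    def __init__(self, mini_batch_size: int) -> None:
        if mini_batch_size <= 0:
            raise ValueError("mini_batch_size must be positive")
        self.mini_batch_size = mini_batch_size
        self.buffer: Deque[int] = deque()

    def enqueue(self, packets: Iterable[int]) -> None:
        """Packets arriving from the NIC land in the buffer."""
        self.buffer.extend(packets)

    def next_round(self) -> List[List[int]]:
        """Fetch all buffered packets, keeping occupancy low, and batch them.

        More traffic since the last round yields more mini-batches (scale up);
        less traffic yields fewer (scale down).
        """
        packets = list(self.buffer)
        self.buffer.clear()
        size = self.mini_batch_size
        return [packets[i:i + size] for i in range(0, len(packets), size)]


if __name__ == "__main__":
    batcher = AdaptiveBatcher(mini_batch_size=4)
    batcher.enqueue(range(10))       # heavier load
    print(len(batcher.next_round()))
    batcher.enqueue(range(3))        # lighter load
    print(len(batcher.next_round()))
```

Because scaling happens inside one agent by varying the mini-batch count, no NF instance is added or removed, which is consistent with the slide's claim that state management is avoided.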

Preliminary Evaluation
Hardware: CPU: two Intel Xeon E5-2650 v4 (10 physical cores); GPU: NVIDIA TITAN Xp; NIC: two Intel X520 (40 Gbps in total).
Software: DPDK 17.11 for network I/O; CUDA 8.0 for GPU programming.
NFs & SFCs: IPV4Router (1k entries), NIDS (3k rules), IPSec (SHA1 & AES-128-CBC).

Performance of RTC vs. Pipelining
[Figure: performance comparison of RTC and pipelining; surviving labels: 95th; 33.7%, 29.2% and 28.1%.]

Fast Elastic Scaling
Fast convergence (< 100 ms).

Conclusion and Future Work
GEN: a GPU-accelerated elastic framework for NFV, providing high-performance SFCs and elastic scaling.
Future work: further SFC performance enhancement in the GPU; coordination between CPU and GPU; impact of dynamic traffic load.

Thank You http://netarchlab.tsinghua.edu.cn