memif - shared memory packet interface for container networking


memif - shared memory packet interface for container networking
FastData.io – VPP
Damjan Marion, lf-id/irc: damarion, damarion@cisco.com
Maciek Konstantynowicz, lf-id/irc: mackonstan, mkonstan@cisco.com

Agenda
- Context
  - SDN/NFV Evolution
  - Bare-metal performance limits
  - Moving "virtualization" to native
- Memif
  - Motivation
  - Security
  - Control channel
  - Shared memory layout
  - Connection lifecycle
- Performance

SDN/NFV Evolution to Cloud-native - Moving from VMs to Pods/Containers
- Network function workloads are moving from VMs to Containers
- Native code execution on compute nodes, much less execution overhead
- Lighter workloads, many more of them, in a much more dynamic environment
- Orchestration moving from OpenStack VMs to K8s Pods/Containers
- Pod/Container networking being addressed: Multus, Ligato, "Network Services Mesh"
- Pressing need for an optimized user-mode virtual packet interface
  - Equivalent of "virtio-vhostuser" for Containers, but much faster
  - Must be compatible with the Container orchestration stack
  - Opportunity to do it right!
This should allow us to get closer to the native bare-metal limits.

Baremetal Data Plane Performance Limit
FD.io benefits from increased Processor I/O

YESTERDAY: Intel® Xeon® E5-2699v4
- 22 Cores, 2.2 GHz, 55 MB Cache
- Network I/O: 160 Gbps
- Core ALU: 4-wide parallel µops
- Memory: 4 channels, 2400 MHz
- Max power: 145 W (TDP)

TODAY: Intel® Xeon® Platinum 8168
- 24 Cores, 2.7 GHz, 33 MB Cache
- Network I/O: 280 Gbps
- Core ALU: 5-wide parallel µops
- Memory: 6 channels, 2666 MHz
- Max power: 205 W (TDP)

PCIe Packet Forwarding Rate [Gbps] (+75%):
  Server           Intel® Xeon® v3, v4 Processors   Intel® Xeon® Platinum 8180 Processors
  [1 Socket]       160                              280
  [2 Sockets]      320                              560
  2x [2 Sockets]   640                              1,120*
* On compute platforms with all PCIe lanes from the Processors routed to PCIe slots.

FD.io takes full advantage of faster Intel® Xeon® Scalable Processors - no code change required.
Breaking the barrier of Software Defined Network Services: 1 Terabit services on a single Intel® Xeon® server!
https://goo.gl/UtbaHy

Cloud-network Services within a Compute Node - Diving Deeper
- Bare-metal
- Virtual Machines: virtio / vhost
- Containers: memif
Moving "virtualization" to native compute node execution - get closer to bare-metal speeds!

memif – Motivation
- Create a packet-based shared memory interface for user-mode applications
- Be container friendly (no privileged containers needed)
- Support both polling and interrupt mode operation
  - Interrupts simulated with the Linux eventfd infrastructure (see the sketch below)
  - Support for interrupt masking in polling mode
- Support vpp-to-vpp, vpp-to-3rd-party and 3rd-party-to-3rd-party operation
- Support for multiple queues (incl. asymmetric configurations)
- Jumbo frame support (chained buffers)
- Take security seriously
- Multiple operation modes: ethernet, ip, punt/inject
- Lightweight library for apps - allows easy creation of applications which communicate over memif
It needs to be fast, but performance is not the number one priority.
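As an illustration of the interrupt-mode idea, the sketch below shows how a ring "interrupt" can be simulated with a Linux eventfd: the producer kicks the peer after enqueueing descriptors, and the consumer sleeps in epoll_wait() until signalled. This is a minimal sketch of the mechanism, not memif source code; function names are illustrative only.

/* Illustrative sketch (not memif code): simulating ring interrupts with eventfd. */
#include <sys/eventfd.h>
#include <sys/epoll.h>
#include <stdint.h>
#include <unistd.h>

/* Producer side: kick the peer after new descriptors were written to the ring. */
static void ring_kick(int efd)
{
    uint64_t one = 1;
    write(efd, &one, sizeof(one));      /* increments the eventfd counter, wakes the peer */
}

/* Consumer side: block until the peer kicks, then drain the ring. */
static void ring_wait(int epfd, int efd)
{
    struct epoll_event ev;
    if (epoll_wait(epfd, &ev, 1, -1) > 0) {
        uint64_t cnt;
        read(efd, &cnt, sizeof(cnt));   /* clear the pending notification */
        /* ...dequeue and process descriptors from the shared ring... */
    }
}

In polling mode the consumer simply skips the epoll_wait() and reads the ring head/tail indices directly; interrupt masking amounts to the producer not writing to the eventfd while the peer is polling.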

memif – Security
- Point-to-point Master/Slave concept:
  - Master - never exposes its memory to the slave
  - Slave - responsible for allocating memory region(s) and sharing them with the master
  - Slave can decide whether to expose its internal buffers to the master or copy data into a separate shared memory buffer
- Shared memory data structures (rings, descriptors) are pointer-free
- Interfaces are always point-to-point, between a master-slave pair
- Shared memory is initialized on connect and freed on disconnect
- An interface is uniquely identified by the (unix socket filename, interface id) pair
- Optional shared secret support per interface
- Optionally, the master can get the PID, UID and GID of each connection to the socket listener
Memory copy is a MUST for security (see the bounds-checking sketch below).
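The sketch below shows why the pointer-free (region_index, offset, length) descriptors matter: the master can validate every descriptor against the region size it learned at ADD_REGION time before copying the payload out of shared memory, so a misbehaving slave cannot steer it outside the shared region. This is a hypothetical helper for illustration, not memif's actual code.

/* Illustrative sketch: bounds-checked copy out of a shared memory region. */
#include <stdint.h>
#include <string.h>

typedef struct {
    void    *base;   /* mmap()ed region base address */
    uint64_t size;   /* region size announced in ADD_REGION */
} region_t;

static int copy_from_shm(const region_t *regions, unsigned n_regions,
                         uint16_t region_index, uint32_t offset,
                         uint32_t length, void *dst)
{
    if (region_index >= n_regions)
        return -1;                                        /* unknown region */
    const region_t *r = &regions[region_index];
    if ((uint64_t)offset + length > r->size)
        return -1;                                        /* descriptor points out of bounds */
    memcpy(dst, (const uint8_t *)r->base + offset, length); /* copy into the master's private memory */
    return 0;
}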

memif – Control Channel
- Implemented as a Unix socket connection (AF_UNIX)
- Master is the socket listener (allows multiple connections on a single listener)
- Slave connects to the socket
- Communication is done with fixed-size messages (128 bytes), as sketched below:
  - HELLO (m2s): announces info about the Master
  - INIT (s2m): starts interface initialization
  - ADD_REGION (s2m): shares a memory region with the master (FD passed in ancillary data)
  - ADD_RING (s2m): shares ring information with the master (size, offset in memory region, interrupt eventfd)
  - CONNECT (s2m): requests the interface state be changed to connected
  - CONNECTED (m2s): notifies the slave that the interface is connected
  - DISCONNECT (m2s, s2m): disconnects the interface
  - ACK (m2s, s2m): acknowledge
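The following sketch shows the control-channel idea in C: a fixed 128-byte message sent over the AF_UNIX socket, with the memory-region file descriptor carried as SCM_RIGHTS ancillary data in ADD_REGION. The field layout and names are assumptions for illustration; they are not the actual memif wire format.

/* Illustrative sketch of a fixed-size control message and fd passing. */
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>

enum msg_type { HELLO, INIT, ADD_REGION, ADD_RING,
                CONNECT, CONNECTED, DISCONNECT, ACK };

typedef struct {
    uint16_t type;         /* one of msg_type */
    uint8_t  payload[126]; /* type-specific fields, zero-padded */
} ctrl_msg_t;              /* always 128 bytes on the wire */

/* Slave side: send ADD_REGION with the region fd in ancillary data. */
static int send_add_region(int sock, int region_fd, uint64_t region_size)
{
    ctrl_msg_t m = { .type = ADD_REGION };
    memcpy(m.payload, &region_size, sizeof(region_size));

    struct iovec iov = { .iov_base = &m, .iov_len = sizeof(m) };
    char cbuf[CMSG_SPACE(sizeof(int))];
    struct msghdr mh = { .msg_iov = &iov, .msg_iovlen = 1,
                         .msg_control = cbuf, .msg_controllen = sizeof(cbuf) };

    struct cmsghdr *c = CMSG_FIRSTHDR(&mh);
    c->cmsg_level = SOL_SOCKET;
    c->cmsg_type  = SCM_RIGHTS;                /* kernel duplicates the fd for the peer */
    c->cmsg_len   = CMSG_LEN(sizeof(int));
    memcpy(CMSG_DATA(c), &region_fd, sizeof(int));

    return (int) sendmsg(sock, &mh, 0);
}

Passing the fd (rather than a path) is what keeps the scheme container friendly: only the unix socket needs to be shared between the two containers, and no privileged access is required.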

memif – Shared Memory layout

memif – Shared Memory Layout
- Rings and buffers in shared memory are referenced with a (region_index, offset) pair
  - Makes it much easier to deal with SEGFAULTs caused by possible memory corruption
- Slave shares one or more memory regions with the master by passing an mmap() file descriptor and the region size (ADD_REGION message)
- Slave initializes rings and descriptors and shares their location (region_index, offset), size, direction and eventfd with the master (ADD_RING message)
- Each ring contains a header and an array of buffer descriptors
  - The number of descriptors is always a power of 2 for performance reasons (1024 by default)
- A buffer descriptor is a 16-byte data structure (see the struct sketch below) containing:
  - flags (2 bytes) – space for various flags, currently only used for buffer chaining
  - region_index (2 bytes) – memory region where the buffer is located
  - offset (4 bytes) – buffer start offset within that memory region
  - length (4 bytes) – length of actual data in the buffer
  - metadata (4 bytes) – custom use space
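Written out as a C struct, the 16-byte descriptor described above looks roughly like this (field names follow the slide; the exact declarations in the memif headers may differ):

#include <stdint.h>

typedef struct {
    uint16_t flags;        /* 2 bytes - e.g. a "next" flag for buffer chaining */
    uint16_t region_index; /* 2 bytes - which shared memory region holds the buffer */
    uint32_t offset;       /* 4 bytes - buffer start offset within that region */
    uint32_t length;       /* 4 bytes - length of valid data in the buffer */
    uint32_t metadata;     /* 4 bytes - free for custom use */
} memif_desc_t;            /* 2 + 2 + 4 + 4 + 4 = 16 bytes, no pointers */

Because the ring size is a power of 2, a slot index can be derived from a free-running head/tail counter with a simple mask, e.g. slot = head & (ring_size - 1), avoiding a division or modulo in the datapath.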

Memif Performance – L2
[Results table from the original slide: MRR throughput of the memif DUT ("vswitch, vrouter") on Haswell and Skylake.]
Note: packets pass through the "vswitch, vrouter" DUT twice per direction, so the external throughput numbers reported in the table should be doubled to get per-CPU-core throughput.
* Maximum Receive Rate (MRR) Throughput - measured packet forwarding rate under the maximum load offered by the traffic generator over a set trial duration, regardless of packet loss.
Hsw – Intel Xeon® Haswell, E5-2699v3, 2.3 GHz, noHT; results scaled up to 2.5 GHz with HT enabled.
Skx – Intel Xeon® Skylake, Platinum 8180, 2.5 GHz, HT enabled.
TB – TurboBoost enabled. noTB – TurboBoost disabled.

Memif Performance – IPv4
[Results table from the original slide: MRR throughput of the DUT on Haswell and Skylake.]
Note: packets pass through the "vswitch, vrouter" DUT twice per direction, so the external throughput numbers reported in the table should be doubled to get per-CPU-core throughput.
* Maximum Receive Rate (MRR) Throughput - measured packet forwarding rate under the maximum load offered by the traffic generator over a set trial duration, regardless of packet loss.
Hsw – Intel Xeon® Haswell, E5-2699v3, 2.3 GHz, noHT; results scaled up to 2.5 GHz with HT enabled.
Skx – Intel Xeon® Skylake, Platinum 8180, 2.5 GHz, HT enabled.
TB – TurboBoost enabled. noTB – TurboBoost disabled.

Summary
- FD.io memif is a clean-slate virtual packet interface for containers
  - Safe and secure
  - Zero memory copy on the Slave side
  - Optimized for performance (Mpps, Gbps, CPP* and IPC**)
- Implementation in VPP shows good performance
  - Getting closer to bare-metal limits..
- Memif library for apps is available
  - Allows easy integration into cloud-native apps for communicating over memif
- Potential to become a de facto standard based on adoption..
* CPP - Cycles Per Packet   ** IPC - Instructions Per Cycle

THANK YOU !