Software Routers: NetMap Hakim Weatherspoon Assistant Professor, Dept of Computer Science CS 5413: High Performance Systems and Networking October 8, 2014.


Software Routers: NetMap
Hakim Weatherspoon, Assistant Professor, Dept of Computer Science
CS 5413: High Performance Systems and Networking, October 8, 2014
Slides from “netmap: A Novel Framework for Fast Packet I/O,” presented at the USENIX Annual Technical Conference (ATC), June 2012.

Where are we in the semester?
Overview and Basics
Data Center Networks
– Basic switching technologies
– Data Center Network Topologies (today and Monday)
– Software Routers (e.g., Click, RouteBricks, NetMap, NetSlice)
– Alternative Switching Technologies
– Data Center Transport
Data Center Software Networking
– Software Defined Networking (overview, control plane, data plane, NetFPGA)
– Data Center Traffic and Measurements
– Virtualizing Networks
– Middleboxes
Advanced Topics

Goals for Today
Gates Data Center and Fractus tour
NetMap: A Novel Framework for Fast Packet I/O
– L. Rizzo. USENIX Annual Technical Conference (ATC), June 2012.

Fractus Cloud
Production side and experimental side: 1 rack of 19 machines each
Network
– 1x HP 10/100 Mb management switch
– 1x Top-of-Rack switch: Dell Force10 10 Gb Ethernet data switch; contains a bonded pair of 40 Gb links to the research rack (80 Gb total)
Servers
– 2x 8-core Intel Xeon E GHz CPUs (16 cores/32 threads total)
– 96 GB RAM
– A bonded pair of 10 Gb Ethernet links (20 Gb total)
– 4x 900 GB 10k RPM SAS drives in RAID 0

Fractus Cloud
[Topology diagram: Internet, border router, access router, Tier-2 switches, Tier-1 switches, TOR switches, server racks]

Fractus Cloud – Production side
Multiple virtual machines on one physical machine
Applications run unmodified, as on a real machine
VMs can migrate from one computer to another

Goals for Today
Gates Data Center and Fractus tour
NetMap: A Novel Framework for Fast Packet I/O
– L. Rizzo. USENIX Annual Technical Conference (ATC), June 2012.

Direct Packet I/O Options
Good old sockets (BPF, raw sockets)
– Flexible, portable, but slow

Direct Packet I/O Options
Good old sockets (BPF, raw sockets)
– Flexible, portable, but slow
– Raw socket: all traffic from all NICs goes to user space
– Too general, hence a complex network stack
– Hardware and software are loosely coupled
– Applications have no control over resources
[Figure: many applications, each with its own network stack instance, funneled through a single raw socket]

Direct Packet I/O Options
Good old sockets (BPF, raw sockets)
– Flexible, portable, but slow
Memory-mapped buffers (PF_PACKET, PF_RING)
– Efficient, if mbufs/skbufs do not get in the way

Direct Packet I/O Options
Good old sockets (BPF, raw sockets)
– Flexible, portable, but slow
Memory-mapped buffers (PF_PACKET, PF_RING)
– Efficient, if mbufs/skbufs do not get in the way
Run in the kernel (NETFILTER, PFIL, Netgraph, NDIS, Click)
– Can be fast, especially if bypassing mbufs
Custom libraries (OpenOnload, Intel DPDK)
– Vendor-specific: normally tied to vendor hardware
Can we find a better (fast, safe, HW-independent) solution?

Traditional Network Stack
How slow are the traditional raw socket and host network stack?

Traditional Network Stack
A significant amount of time is spent at all levels of the stack
The system call cannot be avoided (or can it?)
Device programming is extremely expensive
Complex mbufs are a bad idea
Data copies and mbufs can be saved in some cases
Headers should be cached and reused if possible

Motivation for a new design
Start from the easy problem: raw packet I/O
– Simpler to address (some layers are not involved)
– Useful to determine the maximum achievable performance
– Poorly supported by current APIs, so the task has a high return (both for me and the users)

Motivation for a new design
Design Guidelines:
– No requirement/reliance on special hardware features
– Amortize costs over large batches (syscalls)
– Remove unnecessary work (copies, mbufs, alloc/free)
– Reduce runtime decisions (a single frame format)
– Modifying device drivers is permitted, as long as the code can be maintained
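The "amortize costs over large batches" guideline can be made concrete with a toy cost model (the numbers below are illustrative, not measurements from the paper): if a syscall costs S cycles and per-packet work costs P cycles, then one syscall per batch of B packets gives a per-packet cost of S/B + P.

```c
#include <assert.h>

/* Toy model: per-packet cost in cycles when one syscall
 * (fixed cost syscall_cycles) covers a batch of `batch` packets.
 * Illustrative only -- not figures from the netmap paper. */
static double per_packet_cost(double syscall_cycles,
                              double per_pkt_cycles,
                              int batch)
{
    return syscall_cycles / batch + per_pkt_cycles;
}
```

With a 1000-cycle syscall and 60 cycles of per-packet work, batching 100 packets per syscall drops the per-packet cost from 1060 to 70 cycles; this is why the netmap syscalls operate on whole rings rather than single packets.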

NetMap summary
Framework for raw packet I/O from userspace
65 cycles/packet between the wire and userspace
– 14 Mpps on one 900 MHz core
Device and OS independent
Good scalability with CPU frequency and number of cores
libpcap emulation library for easy porting of applications

Netmap data structures and API
Packet buffers
– Numbered and fixed size
Netmap rings
– Device-independent copy of the NIC ring
– cur: current tx/rx position
– avail: available slots
– Pointers stored as offsets or indexes (so not dependent on virtual memory addresses)
netmap_if
– Contains references to all rings attached to an interface
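A minimal C sketch of these shared-memory structures (field names follow the slide; the real definitions in net/netmap_user.h carry more fields, and the slot array follows the ring header in shared memory):

```c
#include <assert.h>
#include <stdint.h>

/* Simplified sketch of a netmap slot: an index into the numbered,
 * fixed-size packet buffers, plus the packet length. */
struct nm_slot_sketch {
    uint32_t buf_idx;
    uint16_t len;
};

/* Simplified sketch of a netmap ring (device-independent copy of
 * the NIC ring). In shared memory it is followed by
 * slot[num_slots] entries. */
struct nm_ring_sketch {
    uint32_t num_slots; /* ring size */
    uint32_t cur;       /* current tx/rx position */
    uint32_t avail;     /* slots available to the application */
};

/* Advance a ring index by one, wrapping at num_slots
 * (netmap ships a similar helper macro). */
static uint32_t nm_ring_next(uint32_t i, uint32_t num_slots)
{
    return (i + 1 == num_slots) ? 0 : i + 1;
}
```

Because slots hold buffer indexes rather than virtual-memory pointers, the same ring can be mapped at different addresses in the kernel and in each process.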

Protection
Rely on standard OS mechanisms
The NIC is not exposed to userspace
The kernel validates the netmap ring before using its contents
Cannot crash the kernel or trash other processes' memory

Data Ownership
No races between kernel and user:
Global fields and slots [cur … cur+avail-1]
– Owned by the application, updated during system calls
Other slots/buffers
– Owned by the kernel
The interrupt handler never touches shared memory regions

NetMap API
Access
– open: returns a file descriptor (fd)
– ioctl: puts an interface into netmap mode
– mmap: maps buffers and rings into user address space
Transmit (TX)
– Fill up to avail bufs, starting at slot cur
– ioctl(fd, NIOCTXSYNC) queues the packets
Receive (RX)
– ioctl(fd, NIOCRXSYNC) reports newly received packets
– Process up to avail bufs, starting at cur
poll()/select() are also supported on the fd
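The TX-side bookkeeping ("fill up to avail bufs, starting at slot cur") can be sketched with a simplified ring struct. This is a self-contained illustration of the cur/avail discipline only; in a real netmap program the loop body would write packet data into the mapped buffers and the program would then issue ioctl(fd, NIOCTXSYNC):

```c
#include <assert.h>
#include <stdint.h>

/* Minimal stand-in for a netmap TX ring: only the fields the
 * cur/avail bookkeeping needs (the real struct has more). */
struct tx_ring_sketch {
    uint32_t num_slots;
    uint32_t cur;    /* next slot the application may fill */
    uint32_t avail;  /* slots the application currently owns */
};

/* Queue up to `want` packets: consume at most avail slots
 * starting at cur, wrapping at num_slots; returns the number
 * queued. A real program would follow this with
 * ioctl(fd, NIOCTXSYNC) to hand the filled slots to the kernel. */
static uint32_t tx_fill(struct tx_ring_sketch *r, uint32_t want)
{
    uint32_t n = want < r->avail ? want : r->avail;
    for (uint32_t i = 0; i < n; i++) {
        /* ...write packet data and length into slot r->cur here... */
        r->cur = (r->cur + 1 == r->num_slots) ? 0 : r->cur + 1;
    }
    r->avail -= n;
    return n;
}
```

Note that the application only ever touches cur and the avail slots it owns, matching the ownership rules on the previous slide.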

Multiqueue/multicore support
One netmap ring per physical NIC ring
By default, the fd is bound to all NIC rings, but:
– an ioctl can restrict the binding to a single NIC ring pair
– multiple fds can be bound to different rings on the same card
– the fds can be managed by different threads
– threads are mapped to cores with pthread_setaffinity()
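Pinning each ring-handling thread to its own core is done on Linux with pthread_setaffinity_np (the "_np" suffix marks it non-portable; the slide's pthread_setaffinity() refers to this mechanism). A minimal sketch:

```c
#define _GNU_SOURCE
#include <assert.h>
#include <pthread.h>
#include <sched.h>

/* Pin the calling thread to a single CPU so that one thread
 * services one NIC ring pair without migrating between cores.
 * Returns 0 on success. Linux-specific. */
static int pin_to_cpu(int cpu)
{
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(cpu, &set);
    return pthread_setaffinity_np(pthread_self(), sizeof(set), &set);
}
```

With one fd bound per ring pair and one pinned thread per fd, each core runs an independent packet-processing loop with no shared state on the fast path.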

NetMap and Host Stack
Netmap mode:
Data path
– The normal path between the NIC and the host stack is removed
Control path
– The OS believes the NIC is still there
– ifconfig, ioctl, etc. still work

NetMap and Host Stack
Netmap mode:
Data path
– Packets from the NIC end up in a netmap ring
– Packets from the TX netmap ring are sent to the NIC
Control path
– The OS believes the NIC is still there
– ifconfig, ioctl, etc. still work

Zero copy
Zero-copy forwarding by swapping buffer indexes
Zero copy also works with rings from/to the host stack
– Firewalls, NAT boxes, IDS mechanisms
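The swap itself is just an exchange of buffer indexes between an RX slot and a TX slot, so no packet bytes are copied. A sketch with simplified slot structs (the real code swaps struct netmap_slot fields and must also set the NS_BUF_CHANGED flag so the driver reloads the buffer address):

```c
#include <assert.h>
#include <stdint.h>

/* Simplified netmap slot: a buffer index plus a packet length.
 * The real struct netmap_slot also carries flags, including
 * NS_BUF_CHANGED, which must be set after a swap. */
struct slot_sketch {
    uint32_t buf_idx;
    uint16_t len;
};

/* Zero-copy forward: hand the received buffer to the TX ring by
 * swapping indexes; the TX ring's old (free) buffer goes back to
 * the RX slot for reuse. No packet data is copied. */
static void zcopy_forward(struct slot_sketch *rx, struct slot_sketch *tx)
{
    uint32_t spare = tx->buf_idx;
    tx->len = rx->len;        /* TX sends the received length */
    tx->buf_idx = rx->buf_idx;
    rx->buf_idx = spare;      /* RX gets a fresh buffer to fill */
}
```

Because the host-stack rings use the same slot format, the same swap forwards traffic between a NIC and the host stack, which is what makes netmap-based firewalls and NAT boxes cheap.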

Performance Results

Performance Results
[Figure: TX throughput vs. clock speed and number of cores]

Performance Results
[Figure: TX throughput vs. burst size]

Performance Results
[Figure: RX throughput vs. packet size]

Performance Results
[Figure: forwarding performance]

NetMap summary
Framework for raw packet I/O from userspace
65 cycles/packet between the wire and userspace
– 14 Mpps on one 900 MHz core
Device and OS independent
Good scalability with CPU frequency and number of cores
libpcap emulation library for easy porting of applications

Before Next Time
Project progress
– Need to set up environment as soon as possible
– And meet with groups, TA, and professor
Lab 3
– Packet filter/sniffer
– Due Thursday, October 16
– Use Fractus instead of Red Cloud
Required review and reading for Friday, October 15
– “NetSlices: Scalable Multi-Core Packet Processing in User-Space”, T. Marian, K. S. Lee, and H. Weatherspoon. ACM/IEEE Symposium on Architectures for Networking and Communications Systems (ANCS), October 2012.
Check Piazza and the course website for the updated schedule