PISCES: A Programmable, Protocol-Independent Software Switch

Presentation transcript:

PISCES: A Programmable, Protocol-Independent Software Switch P4 + OVS == Fast Forwarding! Muhammad Shahbaz, Sean Choi, Ben Pfaff, Changhoon Kim, Nick Feamster, Nick McKeown, and Jennifer Rexford Hi, my name is Muhammad Shahbaz. Today, I’m going to talk about our work on PISCES: A Programmable, Protocol-Independent Software Switch. In this talk, I will show that P4 with OVS can enable fast packet forwarding, and that there is hardly any cost of programmability on performance.

Also appears at SIGCOMM 2016! http://goo.gl/wmBmTu This work also appears at SIGCOMM 2016, and you can access the current draft here.

Importance of Software Switches [Slide: VMs attached to an OVS instance in each hypervisor, connected through ToR switches to the network core] When we think of Ethernet switches and routers, we tend to think of physical boxes sitting inside the core of a network or at the top of a rack of servers. However, in today’s virtualized data centers like Amazon EC2 and Rackspace, each hypervisor contains a software switch (such as Open vSwitch) for routing traffic to and from virtual machines and the outside world.

Importance of Software Switches [Slide: same diagram] Because these switches are written in software, they can be upgraded more easily than hardware switches.

Ease of Customization? [Slide: “Enable Rapid Development and Deployment of Network Features!” over the same diagram] That’s why it’s a common assumption that software switches enable rapid development and deployment of network features. But wait a minute, is this really the case?

Ease of Customization? What about adding new protocols? [Slide: same diagram] For example, OVS currently supports the following tunneling protocols: VXLAN (Virtual Extensible LAN), STT (Stateless Transport Tunneling), and NVGRE (Network Virtualization using Generic Routing Encapsulation). But what about adding new protocols?

Rapid Development & Deployment? [Slide: OVS layered over the kernel and DPDK for fast packet I/O] A software switch is built on a large body of complex code, such as the kernel and DPDK, needed to set up the machinery for fast packet I/O (or forwarding).

Rapid Development & Deployment? [Slide: requires domain expertise in network protocol design and software development; the packet processing logic (parser, match-action pipeline) must be developed, tested, and deployed on top of large, complex codebases (kernel, DPDK) with arcane APIs; getting a new feature in can take 3-6 months] At the same time, one has to specify the logic for packet processing, such as the parser and the match-action pipeline, using the methods and interfaces exposed by these complex codebases. Thus, making changes to these switches is a formidable undertaking: it requires expertise in network protocol design and software development, along with the ability to develop, test, and deploy large, complex codebases. It can take 3-6 months to push a new feature into the mainstream code. (A typical kernel release cycle is two months, and it then takes a year or so for the changes to reach Linux distributions.) Furthermore, customizations require not only incorporating changes into the switch code, but also maintaining those customizations across different versions of the switch.

Rapid Development & Deployment? [Slide: the parser and match-action pipeline inside OVS, over the kernel and DPDK] However, as a protocol designer, I’m only interested in specifying how to parse packet headers and the structure of the match-action tables (i.e., which header fields to match and which actions to perform on matching headers). I should not need to understand the complexities of the underlying codebases and the arcane APIs they expose to enable fast packet I/O.

Rapid Development & Deployment? [Slide: same diagram] So what should we do about this?

Rapid Development & Deployment? [Slide: the parser and match-action pipeline, specified in 341 lines of code, compile down to OVS, where the equivalent native logic spans 14,535 lines of code over the kernel and DPDK] How about we separate the packet processing logic from the switch, specify it in P4, and then compile it down to the underlying switch, letting the compilation process take care of the arcane APIs provided by these various large and complex codebases? This way, a programmer can specify the packet processing logic of a vanilla OVS in roughly 341 lines of P4 code, whereas writing the same logic directly against the APIs exposed by the underlying codebases takes about 40x more code. So, for a network programmer, separating the packet processing logic from the switch clearly has its benefits.
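To give a flavor of what such a specification looks like, here is a minimal sketch in P4_14-style syntax (the P4 dialect in use at the time of this work); the header, table, and action names are illustrative and are not taken from the actual PISCES benchmark programs:

// Minimal, illustrative P4_14-style sketch: declare a header, parse it,
// and match on it in one table. All names here are hypothetical.
header_type ethernet_t {
    fields {
        dstAddr   : 48;
        srcAddr   : 48;
        etherType : 16;
    }
}

header ethernet_t ethernet;

parser start {
    extract(ethernet);          // identify and extract the Ethernet header
    return ingress;
}

action forward(port) {
    modify_field(standard_metadata.egress_spec, port);
}

action _drop() {
    drop();
}

table l2_forward {
    reads   { ethernet.dstAddr : exact; }
    actions { forward; _drop; }
}

control ingress {
    apply(l2_forward);
}

The whole specification stays at this level of abstraction; the compiler is responsible for turning it into the parsing, matching, and action code that OVS, the kernel, and DPDK expect.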

Rapid Development & Deployment? [Slide: compiling the P4 parser and match-action pipeline down to OVS may introduce performance overhead] However, because we are compiling a P4 program to OVS, it can incur some overhead on performance.

What’s the cost of programmability on performance? So, in this work, we try to answer exactly this question: what is the cost of programmability on performance?

PISCES: A Protocol-Independent Software Switch [Slide: a vSwitch built on OVS] Our PISCES prototype uses P4 as the high-level DSL for expressing packet processing logic, and [click] a modified version of OVS as the underlying packet forwarding engine.

PISCES: A Protocol-Independent Software Switch [Slide: the compiler takes the parse, match, and action parts of the P4 program and produces the OVS switch executable, plus a flow rule checker for runtime flow rules] A P4-to-OVS compiler compiles the P4 code to OVS. [click] It generates the parse, match, and action code needed to build the corresponding switch executable. The compilation process also generates an independent type-checking program to ensure that any runtime configuration directives from the controller (e.g., insertion of flow rules) conform to the protocol specified in the P4 program.
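As a hypothetical illustration of what that checking means in practice, consider a small P4_14-style fragment with a single IPv4 routing table (names are illustrative, the parser is omitted for brevity, and the rule syntax in the comments is schematic rather than the actual PISCES or OVS command format):

// Illustrative P4_14-style fragment; parser omitted for brevity.
header_type ipv4_t {
    fields {
        version : 4; ihl : 4; diffserv : 8; totalLen : 16;
        identification : 16; flags : 3; fragOffset : 13;
        ttl : 8; protocol : 8; hdrChecksum : 16;
        srcAddr : 32; dstAddr : 32;
    }
}
header ipv4_t ipv4;

action forward(port) { modify_field(standard_metadata.egress_spec, port); }
action _drop() { drop(); }

table ipv4_route {
    reads   { ipv4.dstAddr : lpm; }
    actions { forward; _drop; }
}

control ingress { apply(ipv4_route); }

// A runtime rule such as
//   table=ipv4_route  match: ipv4.dstAddr=10.0.0.0/8  action: forward(port=2)
// conforms to the table above and passes the generated checker, whereas a rule
// that matches on a header the program never declares (say, an MPLS label)
// would be rejected.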

PISCES: A Protocol-Independent Software Switch [Slide outline: the P4 and OVS packet forwarding models; the performance overhead of a naïve mapping from P4 to OVS; PISCES compiler optimizations to reduce that overhead] In the remainder of this talk, I will describe the forwarding models that P4 and OVS use and show how a naïve mapping from P4 to OVS can lead to performance overhead. I will then discuss the different optimizations that we have implemented to reduce this overhead.

P4 Forwarding Model (or Post-Pipeline Editing) [Slide: ingress, packet parser, match-action tables operating on header fields, packet deparser, egress] In the P4 packet forwarding model, [click] a packet parser [click] identifies the headers and [click] extracts them as packet header fields (essentially a copy of the relevant contents of the packet). [click] The match-action tables operate on these header fields. [click] Finally, a packet deparser writes the changes from these header fields back onto the packet. We call this mode of operating on header fields “post-pipeline editing.”

OVS Forwarding Model (or Inline Editing) [Slide: ingress, packet parser, match-action tables operating directly on the packet, egress] In the OVS packet forwarding model, by contrast, [click] a packet parser only [click] identifies the headers, and [click] the match-action tables operate directly on those headers inside the packet. We call this mode of operating on packet header fields “inline editing.”

(Modified) OVS Forwarding Model [Slide: ingress, packet parser, match-action tables, packet deparser*, egress; supports both editing modes: inline editing and post-pipeline editing] So, in order to map P4 to OVS, we modified OVS to provide support for both of these editing modes.

Naïve Mapping from P4 to OVS [Slide: a naïve compilation of the L2L3-ACL benchmark application shows a performance overhead of ~40%] A naïve compilation of our benchmark application (essentially a router with an access-control list) shows [click] that PISCES has a performance overhead of about 40% compared to native OVS. We observe that there are two main aspects that significantly affect the performance of PISCES.

Causes of Performance Degradation [Slide: CPU cycles per packet across the packet parser, match-action pipeline, and deparser, from ingress to egress] The first one is the number of CPU cycles consumed in processing a single packet.

Causes of Performance Degradation. Factors affecting CPU cycles: an extra copy of headers in the post-pipeline editing mode, fully-specified checksum calculation, redundant parsing of header fields, and more.

Causes of Performance Degradation. Factor #1: Extra copy of headers. [Slide table: post-pipeline editing has the con of an extra copy of headers; inline editing has the pro of no extra copy. Post-pipeline editing consumes 2x more cycles than inline editing when parsing the VXLAN protocol.] The post-pipeline editing mode requires maintaining an extra copy of the headers, whereas the inline editing mode requires no extra copy. Maintaining these extra copies has a significant overhead. [click] For example, while using a tunneling protocol (like VXLAN), the post-pipeline editing mode consumed 2x more cycles than the inline editing mode.

Causes of Performance Degradation. Factor #1: Extra copy of headers (continued). [Slide table: post-pipeline editing adjusts the packet only once but keeps an extra copy of headers; inline editing keeps no extra copy but may adjust the packet multiple times.] So there is a trade-off: post-pipeline editing pays for the extra copy of headers but writes the packet only once at the deparser, whereas inline editing avoids the copy but may adjust the packet multiple times as the pipeline executes.

Causes of Performance Degradation. Factor #2: Fully-specified checksums. [Slide: the program specifies the IPv4 checksum over all header fields (version, ihl, diffserv, totalLen, identification, flags, fragOffset, ttl, protocol, hdrChecksum, srcAddr, dstAddr), yet the pipeline only performs decrement(ttl); an incremental checksum over ttl would suffice] A P4 program fully specifies the checksum as a computation over every field of the header. If the only modification the pipeline makes is decrementing the TTL, recomputing the checksum over the entire header wastes cycles; the checksum can instead be updated incrementally from the change to the TTL alone.
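For concreteness, here is a minimal sketch of how such a fully specified checksum is typically declared in P4_14-style syntax; the field names follow the standard IPv4 header and the ipv4 header instance is assumed from the earlier fragment, so this is illustrative rather than the exact PISCES benchmark code:

// Illustrative P4_14-style declaration of a fully specified IPv4 checksum.
// Assumes the ipv4 header instance declared in the earlier fragment.
field_list ipv4_checksum_list {
    ipv4.version;
    ipv4.ihl;
    ipv4.diffserv;
    ipv4.totalLen;
    ipv4.identification;
    ipv4.flags;
    ipv4.fragOffset;
    ipv4.ttl;
    ipv4.protocol;
    ipv4.srcAddr;
    ipv4.dstAddr;
}

field_list_calculation ipv4_checksum {
    input { ipv4_checksum_list; }
    algorithm : csum16;
    output_width : 16;
}

calculated_field ipv4.hdrChecksum {
    update ipv4_checksum;
}

action decrement_ttl() {
    add_to_field(ipv4.ttl, -1);   // only ttl changes in this pipeline, so
                                  // recomputing the full checksum is wasted work
}

With this declaration alone, a naïve compilation recomputes the checksum over the whole header for every packet; the incremental-checksum optimization recognizes that only ttl changes and updates hdrChecksum from that delta instead.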

Causes of Performance Degradation. Factor #3: Redundant parsing of headers. [Slide: the parser extracts L2, L3, and L4 headers, while the match-action pipeline only uses L2] Our third optimization is parser specialization. Parser specialization is the process of optimizing the parser so that it only extracts fields that are actually used by the match-action pipeline. [click] For example, if we are only using L2 headers in the match-action pipeline, then [click] the parser shouldn’t have to extract the L3 and L4 headers.
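A minimal sketch of what parser specialization does, again in P4_14-style syntax with illustrative names (the ethernet, ipv4, and tcp header declarations are assumed, as in the earlier fragments):

// Original parser: walks L2, L3, and L4 headers (illustrative names).
parser start {
    extract(ethernet);
    return select(ethernet.etherType) {
        0x0800  : parse_ipv4;
        default : ingress;
    }
}

parser parse_ipv4 {
    extract(ipv4);
    return select(ipv4.protocol) {
        6       : parse_tcp;
        default : ingress;
    }
}

parser parse_tcp {
    extract(tcp);
    return ingress;
}

// If the match-action pipeline reads only ethernet fields, the specialized
// parser the compiler emits is equivalent to:
//
// parser start {
//     extract(ethernet);
//     return ingress;
// }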

Optimizing for CPU Cycles [Slide: optimizations: inline vs. post-pipeline editing, incremental checksum, parser specialization, action specialization, and action coalescing] However, P4 enabled us to assess and evaluate this cost offline, and thus to optimize for it at compile time. We implement seven optimizations in total: five for per-packet cost and two for cache misses.

Optimizing for CPU Cycles [Slide: the inline vs. post-pipeline editing, incremental checksum, and parser specialization optimizations address, respectively, the extra copy of headers, the fully-specified checksum, and the redundant parsing described above]

Optimized Mapping from P4 to OVS [Slide: all optimizations together bring the performance overhead to < 2%] After applying all these optimizations, we see that the performance overhead of the optimized version of PISCES relative to native OVS is now less than 2%. So, this shows that with appropriate compiler optimizations, the cost of programmability on performance is negligible.

Another Cause of Performance Degradation [Slide: a match-action cache in front of the match-action tables, between the packet parser and the deparser*; optimizations: cached field modifications and stage assignment] And the second one is the number of cache misses.

Next Steps [Slide: support for stateful memories and INT; integration with mainline OVS; interning at VMware to make this happen!] In summary, we talked about the importance of software switches and the need for rapid development and deployment of network features. We presented PISCES, a programmable and protocol-independent software switch. We showed that with appropriate compiler optimizations, PISCES performs comparably to native software implementations, and there is hardly any cost of programmability on performance!

Summary [Slide: with appropriate compiler optimizations … P4 + OVS == Fast Forwarding!] So, in summary, we showed that with appropriate compiler optimizations, P4 and OVS can enable fast packet forwarding, and there is hardly any cost of programmability on performance.

Questions? Learn more and try PISCES here: https://github.com/P4-vSwitch (Muhammad Shahbaz, mshahbaz@cs.princeton.edu). With this I conclude my presentation. Thank you, and I’m happy to take any questions.