Programmable Switches
Lecture 13, Computer Networks (198:552)
SDN router: control plane and data plane
The data plane implements per-packet decisions on behalf of the control and management planes:
Forward packets at high speed
Manage contention for switch/link resources
[Figure: SDN router with a processor (part of the control plane) sitting above a data-plane switching fabric that connects the network interfaces]
Life of a packet: RMT architecture
Many modern switches share a similar architecture: FlexPipe, XPliant, Tofino, …
Pipelined packet processing with a 1 GHz clock
What data plane policies might we need?
Parsing. Ex: turn the raw bits 0x0a000104fe into the IP header fields address 10.0.1.4 and protocol 254
Stateless lookups. Ex: send all packets with protocol 254 through port 5
Stateful processing. Ex: if the number of packets sent from any IP in 10.0/16 exceeds 500, drop them (sketched below)
Traffic management. Ex: packets from 10.0/16 get high priority unless their rate exceeds 10 Kb/s
Buffer management. Ex: restrict all traffic outside 10.0/16 to 80% of the switch buffer
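As a concrete illustration of the stateful-processing example (not part of the original slides), here is a minimal C sketch of "drop once more than 500 packets have arrived from 10.0/16"; the function name and the single shared counter are assumptions made for clarity.

#include <stdint.h>
#include <stdbool.h>

/* Per-prefix state the switch would keep across packets. */
static uint32_t pkts_from_10_0 = 0;

/* Returns true if this packet should be dropped under the example policy. */
bool should_drop(uint32_t src_ip)
{
    if ((src_ip & 0xFFFF0000u) == 0x0A000000u) {   /* source in 10.0.0.0/16 */
        if (++pkts_from_10_0 > 500)
            return true;                           /* over budget: drop */
    }
    return false;                                  /* otherwise forward */
}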
Programmability
Allow network designers/operators to specify all of the above
Needs both hardware design and language design
Software packet processing could incorporate all of these features
However: limited throughput, low port density, high power
Key question: can we achieve programmability with high performance?
Programmability: topics today
(1) Packet parsing
(2) Flexible stateless processing
(3) Flexible stateful processing
(4) If we have time: complex policies without performance penalties
(1) Packet parsing: need to generalize
In the beginning, OpenFlow was simple: match-action
A single rule table on a fixed set of fields (12 fields in OF 1.0)
Then came new encapsulation formats, different versions of protocols, and additional measurement-related headers
The number of match fields ballooned to 41 in the OF 1.4 specification!
With multiple stages of heterogeneous tables
(1) Parsing abstractions
Goal: can we make transforming bits into headers more flexible?
A parser state machine where each state may emit headers
[Figure: parse graph with Ethernet leading either to IP (then TCP or UDP) or to a custom protocol, followed by the payload]
(1) Parsing implementation in hardware
Use a TCAM to store state-machine transitions and header bit locations
Extract fields into the packet header vector using a separate action RAM
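To make the TCAM-driven parser concrete, here is a small C model (a sketch, not the actual hardware): each row pairs the current parser state and a masked header field with the next state and the number of header bytes to copy into the packet header vector. The states, rows, and field choices are illustrative assumptions.

#include <stdint.h>
#include <string.h>

enum state { ETHERNET, IPV4, TCP, UDP, DONE };

struct tcam_row {
    enum state state;        /* current parser state                  */
    uint16_t   value, mask;  /* masked match on a field (TCAM-style)  */
    enum state next;         /* next parser state                     */
    uint8_t    hdr_len;      /* bytes of the current header to emit   */
};

static const struct tcam_row rows[] = {
    { ETHERNET, 0x0800, 0xFFFF, IPV4, 14 },   /* ethType == IPv4      */
    { IPV4,     6,      0x00FF, TCP,  20 },   /* IP proto == TCP      */
    { IPV4,     17,     0x00FF, UDP,  20 },   /* IP proto == UDP      */
    { IPV4,     0,      0x0000, DONE, 20 },   /* wildcard: stop here  */
};

/* One parsing step: emit the current header into the PHV and return the
 * next state. 'key' is the field extracted in the current state
 * (ethType for Ethernet, protocol for IPv4). */
enum state parse_step(enum state s, uint16_t key,
                      const uint8_t *hdr, uint8_t *phv, size_t *phv_len)
{
    for (size_t i = 0; i < sizeof(rows) / sizeof(rows[0]); i++) {
        if (rows[i].state == s && (key & rows[i].mask) == rows[i].value) {
            memcpy(phv + *phv_len, hdr, rows[i].hdr_len);
            *phv_len += rows[i].hdr_len;
            return rows[i].next;
        }
    }
    return DONE;   /* no transition matched */
}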
(2) How are the parsed headers used?
Headers are carried through the rest of the pipeline
To be used in general-purpose match-action tables
(2) Abstractions for stateless processing
Goal: specify a set of tables and the control flow between them
Actions: more general than OpenFlow 1.0's forward/drop/count
Copy, add, and remove headers
Arithmetic, logical, and bit-vector operations!
Set metadata on the packet header to steer control flow between tables
(2) Table dependency graph (TDG)
(2) Match-action table implementation
Mental model: match and action units supplied with the packet header vector (PHV)
Each pipeline stage accesses its own local memory
(2) Match-action table implementation
Hardware realization: separately configurable memory blocks
SRAM for exact match, action memory, and statistics; TCAM for ternary match
Match RAM blocks also contain pointers into action memory and the action instructions
[Figure: PHV flowing through a stage containing SRAM blocks (exact match, action memory, statistics) and TCAM blocks (ternary match)]
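Below is a minimal C sketch (not RMT's actual circuitry) of one match-action stage: a ternary (value/mask) match selects a pointer into action memory, and the action rewrites the packet header vector. The header fields, tiny tables, and the drop encoding are assumptions for illustration.

#include <stddef.h>
#include <stdint.h>

struct phv { uint32_t dst_ip; uint16_t egress_port; };

struct action_entry {                /* "action memory" */
    enum { FWD, DROP } op;
    uint16_t port;                   /* operand used by FWD */
};

struct tcam_entry {                  /* ternary match memory */
    uint32_t value, mask;            /* match dst_ip under mask     */
    uint16_t action_idx;             /* pointer into action memory  */
};

static const struct action_entry actions[] = { { FWD, 5 }, { DROP, 0 } };
static const struct tcam_entry   tcam[]    = {
    { 0x0A000000u, 0xFFFF0000u, 0 },   /* 10.0/16 -> forward to port 5 */
    { 0x00000000u, 0x00000000u, 1 },   /* wildcard -> drop             */
};

/* Apply the stage to one packet header vector; first match wins. */
void stage_apply(struct phv *h)
{
    for (size_t i = 0; i < sizeof(tcam) / sizeof(tcam[0]); i++) {
        if ((h->dst_ip & tcam[i].mask) == tcam[i].value) {
            const struct action_entry *a = &actions[tcam[i].action_idx];
            h->egress_port = (a->op == FWD) ? a->port : 0xFFFF;  /* 0xFFFF marks a drop */
            return;
        }
    }
}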
(1,2) Parse & pipeline specification with P4
High-level goals:
Allow reconfiguring packet processing in the field
Protocol independent
Target independent
Declarative: specify the parse graph and the TDG
Headers, parsing, metadata
Tables, actions, control flow
P4 separates table configuration from table population
(1,2) Parse & pipeline specification with P4
Header and parser state machine spec:

header_type ethernet_t {
    fields {
        dstMac  : 48;
        srcMac  : 48;
        ethType : 16;
    }
}

header ethernet_t ethernet;

parser start {
    extract(ethernet);
    return ingress;
}
(1,2) Parse & pipeline specification with P4
Actions and rule table:

action _drop() {
    drop();
}

action fwd(dport) {
    modify_field(standard_metadata.egress_spec, dport);
}

table forward {
    reads {
        ethernet.dstMac : exact;
    }
    actions {
        fwd;
        _drop;
    }
    size : 200;
}

Control flow:

control ingress {
    apply(forward);
}
(3) Flexible stateful processing
What if the action depends on previously seen (other) packets?
Example: send every 100th packet to a measurement server
Other examples: flowlet switching, DNS TTL change tracking, XCP, …
Actions in a single match-action table aren't expressive enough
Example: if (pkt.field1 + pkt.field2 == 10) { counter++; }
(3) An example: "Flowlet" load balancing
Consider the arrival time of the current packet and of the last packet of the same flow
If the current packet arrives more than 1 ms after the last one, consider rerouting it to balance load (sketched below)
Else, keep the packet on the same route as the last packet
Q: why might you want to do this?
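Here is a minimal C sketch of flowlet switching written as a packet transaction: per-flow state (last arrival time and chosen next hop) is read and updated for every packet. The table size, hash indexing, 1 ms threshold, and path-picking rule are illustrative assumptions, not the Domino paper's exact code.

#include <stdint.h>

#define NFLOWS          1024
#define FLOWLET_GAP_NS  1000000ull     /* 1 ms, in nanoseconds */

static uint64_t last_seen_ns[NFLOWS];  /* per-flow state kept in switch registers */
static uint16_t next_hop[NFLOWS];

/* Returns the output path for this packet; n_paths must be > 0. */
uint16_t flowlet_route(uint32_t flow_hash, uint64_t now_ns, uint16_t n_paths)
{
    uint32_t idx = flow_hash % NFLOWS;

    /* A large enough gap since the flow's last packet starts a new flowlet:
     * repicking the path now cannot reorder packets still in flight. */
    if (now_ns - last_seen_ns[idx] > FLOWLET_GAP_NS)
        next_hop[idx] = (uint16_t)(now_ns % n_paths);   /* cheap pseudo-random pick */

    last_seen_ns[idx] = now_ns;        /* state write visible to the next packet */
    return next_hop[idx];
}

The read-modify-write on last_seen_ns and next_hop is exactly the per-packet state update that a single stateless match-action table cannot express.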
(3) Abstraction: Packet transaction
A piece of code, along with state, that runs to completion on each packet before processing the next [Domino'16]
Why is this challenging to implement on switch hardware? Hint: the switch is clocked at 1 GHz!
(1) The switch must process a new packet every 1 ns
Transaction code may not run completely in one pipeline stage
(2) Reads and writes to state must happen in the same pipeline stage
Need atomic operations in hardware
(3) Insight #1: Stateful atoms
The atoms constitute the switch's action instruction set: each atom runs in under 1 ns
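As one example of what such an atom can express (an assumption for illustration, not the paper's exact instruction set), the earlier "send every 100th packet to a measurement server" policy needs only a single read-increment-compare-write on one register, which must complete within one stage. A C sketch:

#include <stdint.h>
#include <stdbool.h>

static uint32_t pkt_count;          /* register backing the atom */

/* Returns true when this packet should be mirrored to the server. */
bool sample_this_packet(void)
{
    pkt_count++;                    /* read-modify-write, one stage */
    if (pkt_count == 100) {
        pkt_count = 0;              /* wrap the counter */
        return true;
    }
    return false;
}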
(3) Insight #2: Pipeline the stateless actions
if (pkt.field1 + pkt.field2 == 10) { counter++; }
Have a compiler do this analysis for us
Stateless operations (whose results depend only on the current packet) can execute over multiple stages
Only the stateful operation must run atomically in one pipeline stage (see the sketch below)
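A minimal C sketch of how a compiler might split that transaction (stage boundaries and names are assumptions): the stateless sum runs in an earlier stage and writes a one-bit result into the packet's metadata, and only the guarded increment runs as a stateful atom in a later stage.

#include <stdbool.h>
#include <stdint.h>

struct pkt_md { uint32_t field1, field2; bool sum_is_10; /* scratch metadata */ };

static uint32_t counter;            /* switch register */

/* Stage k: stateless, depends only on this packet. */
void stage_compute(struct pkt_md *p)
{
    p->sum_is_10 = (p->field1 + p->field2 == 10);
}

/* Stage k+1: the only stateful step, a single predicated increment. */
void stage_update(const struct pkt_md *p)
{
    if (p->sum_is_10)
        counter++;
}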
(4) Implementing complex policies
What if you have a very large P4 (or Domino) program?
Ex: too many logical tables in the TDG
Ex: logical table keys are too wide
Sharing memory across stages leads to a shortage of physical tables
Ex: too many (stateless) actions per logical table
Sharing compute across stages leads to a shortage of physical tables
Solution in the RMT architecture: re-circulation
(4) Re-circulation "extends" the pipeline
Recirculate the packet back to ingress
But throughput drops by 2x!
(4) Decouple packet compute & memory access!
Allow packet processing to run to completion on separate physical processors [dRMT'17]
Aggregate per-stage memory clusters into a shared memory pool
A crossbar enables every processor to access each memory cluster
Schedule instructions on each processor to avoid contention
(4) RMT: compute and memory access
[Figure: RMT pipeline of stages 1..N between a parser and a deparser; each stage has a match unit and an action unit and exchanges keys and results with its own memory cluster; queues sit between input and output]
As background, I'll briefly review how programmable switches are architected in hardware. For concreteness, I'll focus on the RMT architecture because it is representative of many commercial products today. In the RMT architecture, a parser turns bytes from the wire into a bag of packet headers. In each pipeline stage, a programmable match unit extracts the relevant part of the packet header as a key to look up in the match-action table. It sends this key to the stage's memory cluster, which performs the lookup and returns a result. The result is used by a programmable action unit to transform the packet headers appropriately. This process then repeats in the next stage.
(4) dRMT: Memory disaggregation
[Figure: the same pipeline of stages 1..N, but the memory clusters now sit behind a crossbar so any stage can reach any cluster]
First, we disaggregate memory. We replace the stage-local memory in the pipeline with a shared array of memory clusters accessible from any stage through a crossbar. Now the stages in aggregate have access to all the memory in the system, because the memory doesn't belong to any one stage in particular.
(4) dRMT: Compute disaggregation
[Figure: parser and distributor feeding packets 1..N to match-action processors 1..N, which reach the memory clusters through the crossbar; the deparser and queues sit at the output]
Next, we disaggregate compute. We replace each pipeline stage, which is rigidly forced to always execute matches followed by actions, with a match-action processor that can execute matches and actions in any order that respects program dependencies. Once packets have been parsed, a distributor hands them to the processors in round-robin order: the first packet goes to processor 1, the second to processor 2, and so on. Once a processor receives a packet, it is responsible for carrying out all operations on that packet; unlike in the pipeline, packets don't move between processors. Consider packet 2 on processor 2: over the lifetime of this packet, the processor might access tables in different memory clusters.
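To make the run-to-completion model concrete, here is a minimal C sketch of the distributor described above; the packet layout, processor count, and function names are assumptions for illustration.

#include <stdint.h>

#define NPROC 4

struct packet { uint8_t phv[128]; };   /* parsed headers, simplified */

/* Stand-in for one processor executing all matches and actions for a
 * packet, in any order that respects program dependencies. */
static void run_to_completion(int proc_id, struct packet *p)
{
    (void)proc_id;
    (void)p;
    /* ... table lookups (via the crossbar) and header rewrites ... */
}

static int next_proc;                  /* round-robin pointer */

/* Hand each parsed packet to the next processor; the packet never
 * migrates between processors afterwards. */
void distribute(struct packet *p)
{
    int proc = next_proc;
    next_proc = (next_proc + 1) % NPROC;
    run_to_completion(proc, p);
}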