Your Programmable NIC Should Be a Programmable Switch


Your Programmable NIC Should Be a Programmable Switch
HotNets 2018
Brent Stephens, Aditya Akella, Mike Swift

Programmable (“Smart”) NICs
It is hard for CPUs to keep up with increasing line rates (5-120 ns per packet at 100 Gbps).
Offloading to programmable NICs can help drive increasing line rates (100 Gbps+).
[Figure: a conventional NIC (ports P1 … PN, app on the CPU) beside a programmable NIC (ports P1 … PN, offload on the NIC, app on the CPU)]
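The per-packet budget quoted on this slide can be reproduced with a short calculation. This is a minimal sketch: the 64 B and 1500 B packet sizes are assumptions chosen to match the two ends of the quoted range, and the function name `per_packet_ns` is hypothetical. It also ignores Ethernet preamble and inter-frame gap, which would lengthen each figure slightly.

```python
def per_packet_ns(packet_bytes: int, line_rate_gbps: float) -> float:
    """Time to serialize one packet at the given line rate, in nanoseconds."""
    bits = packet_bytes * 8
    # 1 Gbps is exactly 1 bit per nanosecond, so bits / Gbps yields ns.
    return bits / line_rate_gbps


if __name__ == "__main__":
    # Assumed sizes: a minimum-size frame and a typical MTU-size packet.
    for size in (64, 1500):
        print(f"{size:>5} B at 100 Gbps: {per_packet_ns(size, 100):.2f} ns")
```

Under these assumptions a small packet leaves the CPU roughly 5 ns of processing time and a large one roughly 120 ns.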

Use Cases: Programmable NICs can accelerate a wide range of cloud applications and services.
[Figure panels: Applications; Infrastructure]

Problem: Although there are many programmable NICs, no programmable NIC is good at running multiple offloads.
Examples shown: Mellanox Innova-2 Flex, Netronome Agilio LX, Cavium Liquid IO II, NetFPGA Sume, Azure SmartNIC

NIC Requirements
Chaining: It should be possible to send packets through offloads in any order.
Generality: The NIC should not restrict the types of offloads (e.g., FPGAs, ASICs, and CPUs).
High Performance: The NIC should forward at line rate without increasing latency.
Isolation: Competing offloads should fairly share resources.

Goal: Build a NIC that meets our requirements (and can support a wide range of diverse offloads).
Insight: Not every packet uses every offload.
Solution: PANIC, a programmable NIC that is a programmable switch.

Outline
Motivation
Limitations of Existing NIC Designs
PANIC Overview

NIC Design Overview
[Table: Pipelined NICs, Tiled NICs, RMT NICs, and PANIC compared on Chaining, Generality, Performance, and Isolation; the per-cell marks did not survive extraction]

Pipelined NICs
Example: Mellanox Innova-2 Smart NIC
Benefits:
Generality: Offload 1 may be an FPGA while Offload N may be an ASIC.
Problems:
Chaining: Static chaining.
Isolation: Head-of-line blocking.

Tiled NICs
Example: Cavium Liquid IO II
Benefits:
Chaining: On-chip network makes chaining easy.
Problems:
Performance: High latency, low per-flow throughput.
Generality: Requires CPUs.

RMT NICs
Example: FlexNIC
Benefits:
Performance: Predictable performance; protocol independence.
Problem:
Generality: Not all offloads can be supported (e.g., crypto, compression, and RDMA).
[Figure: ports P1 … PN feed Match + Action stages 1 … N, which lead to the CPU]

PANIC Overview
PANIC Components:
Heavyweight RMT Engines: Parse packets and determine the offload chain.
High-throughput on-chip network: Forwards packets between engines.
Independent Engines
Distributed Scheduling: Local priority queues.
[Figure: ports P1, P2, P3; RMT 1, RMT 2, RMT 3; engines FPGA 1, Core 2, Crypto/Zip 3, DMA, RDMA/TCP; path to CPU]
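The "Distributed Scheduling: Local priority queues" component can be pictured with a small model in which every engine owns its own queue and decides locally which packet to serve next. This is only an illustrative sketch: the `Engine` class, its methods, and the convention that a lower number means higher priority are all assumptions, not details from the slides.

```python
import heapq
from dataclasses import dataclass, field
from itertools import count


@dataclass
class Engine:
    """Hypothetical offload engine with its own local priority queue."""
    name: str
    _queue: list = field(default_factory=list)
    _seq: count = field(default_factory=count)

    def enqueue(self, packet_id: str, priority: int) -> None:
        # Assumed convention: lower number = higher priority.
        # The sequence number preserves FIFO order within one priority level.
        heapq.heappush(self._queue, (priority, next(self._seq), packet_id))

    def dequeue(self):
        # Serve the highest-priority packet waiting at this engine, if any.
        if not self._queue:
            return None
        _, _, packet_id = heapq.heappop(self._queue)
        return packet_id


if __name__ == "__main__":
    crypto = Engine("Crypto/Zip 3")
    crypto.enqueue("bulk-transfer", priority=2)
    crypto.enqueue("latency-sensitive", priority=1)
    print(crypto.dequeue())  # the priority-1 packet is served first
```

Because each engine schedules independently, no central arbiter has to see every packet, which is one plausible reading of "distributed" here.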

PANIC Satisfies Our Requirements
Chaining: RMT engines compute source routes.
Generality: Independent engines may be arbitrary.
High Performance: RMT engines and the on-chip network provide high performance.
Isolation: Packets are scheduled at every engine.

Life of a Packet in PANIC
Each packet carries headers (L2/L3/L4).
Packets: Search with HW-accel for machine learning. Engines: FPGA 1 -> DMA 1
Packets: VSwitch offload for container-to-container networking. Engines: DMA 2 -> RMT 2 -> DMA 1
Packets: Encrypted one-sided RDMA. Engines: Crypto -> RMT 3 -> RDMA -> RMT 3 -> P3
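The three chains on this slide can be modeled as source routes that a classifier attaches to a packet and the on-chip network then consumes hop by hop. This is a rough sketch under stated assumptions: the traffic-class keys, `rmt_classify`, and `forward` are hypothetical names, the lookup table stands in for an RMT match + action stage, and a mid-chain RMT engine is treated as an ordinary hop rather than something that recomputes the route.

```python
from dataclasses import dataclass, field

# Hypothetical traffic classes mapped to the engine chains listed on the slide.
CHAINS = {
    "ml_search": ["FPGA 1", "DMA 1"],
    "vswitch": ["DMA 2", "RMT 2", "DMA 1"],
    "encrypted_rdma": ["Crypto", "RMT 3", "RDMA", "RMT 3", "P3"],
}


@dataclass
class Packet:
    traffic_class: str
    route: list = field(default_factory=list)    # remaining hops
    visited: list = field(default_factory=list)  # hops already taken


def rmt_classify(pkt: Packet) -> Packet:
    """Stand-in for the RMT lookup: attach a source route to the packet."""
    pkt.route = list(CHAINS[pkt.traffic_class])  # copy so CHAINS is not mutated
    return pkt


def forward(pkt: Packet) -> list:
    """Stand-in for the on-chip network: pop hops until the route is empty."""
    while pkt.route:
        hop = pkt.route.pop(0)
        pkt.visited.append(hop)
    return pkt.visited


if __name__ == "__main__":
    print(" -> ".join(forward(rmt_classify(Packet("encrypted_rdma")))))
```

Packets of different classes traverse different subsets of engines, which reflects the insight that not every packet uses every offload.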


PANIC Feasibility
PANIC needs sufficient throughput from both the RMT pipeline and the on-chip network.
Reasonable RMT pipelines and on-chip networks provide high throughput and long chains!

Future
PANIC implementation and simulation
Topology design and engine placement
Build new offloads and languages

Conclusions
Supporting a wide range of diverse offloads is difficult on current NICs.
PANIC overcomes the limitations of existing designs with an on-NIC switch.