Support for Smart NICs
Ian Pratt

Outline
Xen I/O Overview
– Why network I/O is harder than block
Smart NIC taxonomy
– How Xen can exploit them
Enhancing the network device channel
– NetChannel2 proposal

I/O Architecture
[Architecture diagram: the Xen virtual machine monitor provides event channels, a virtual MMU and CPU, a control interface and a safe hardware interface on top of the hardware (SMP, MMU, physical memory, Ethernet, SCSI/IDE). VM0 runs the device manager and control software with native device drivers; VM1 is a driver domain running a native device driver with back-end drivers; VM2 (Linux) and VM3 (Windows) run applications over front-end device drivers.]

Grant Tables
Allows pages to be shared between domains
No hypercall needed by the granting domain (see the sketch below)
Grant_map, grant_copy and grant_transfer operations
Signalling via event channels
High-performance, secure inter-domain communication
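
A minimal sketch of why granting needs no hypercall, using the v1 grant-entry layout from Xen's public ABI (xen/include/public/grant_table.h). The table placement and the grant_page helper are illustrative assumptions, not Xen code:

    #include <stdint.h>

    /* Types and flag values follow Xen's public v1 grant-table ABI. */
    typedef uint16_t domid_t;

    struct grant_entry_v1 {
        uint16_t flags;   /* GTF_* type and access-state bits */
        domid_t  domid;   /* domain being granted access */
        uint32_t frame;   /* machine frame number being shared */
    };

    #define GTF_permit_access 1        /* remote domain may map/copy the frame */
    #define GTF_readonly      (1 << 2) /* restrict the grant to read-only */

    /* Grant table shared with Xen, mapped at guest setup time (in a real
     * guest this comes from a GNTTABOP_setup_table hypercall; the size
     * here is an assumption). */
    static struct grant_entry_v1 grant_table[512];

    /* Granting a page is just a table write plus a write barrier; the
     * mapping domain later issues grant_map/grant_copy/grant_transfer
     * operations against this grant reference. */
    static void grant_page(unsigned int ref, domid_t remote,
                           uint32_t frame, int readonly)
    {
        grant_table[ref].domid = remote;
        grant_table[ref].frame = frame;
        __sync_synchronize();          /* publish fields before the flag */
        grant_table[ref].flags = GTF_permit_access |
                                 (readonly ? GTF_readonly : 0);
    }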

Block I/O is easy
Block I/O is much easier to virtualize than network I/O:
– Lower number of operations per second
– The individual data fragments are bigger (a page)
– Block I/O tends to come in bigger batches
– The data typically doesn't need to be touched
  – Only needs to be mapped for DMA
  – DMA can deliver data to its final destination (no need to read a packet header to determine the destination)

Level 0: Modern conventional NICs
Single free-buffer, RX and TX queues
TX and RX checksum offload
Transmit Segmentation Offload (TSO)
Large Receive Offload (LRO)
Adaptive interrupt throttling
MSI support
(iSCSI initiator offload – export blocks to guests)
(RDMA offload – will help live relocation)

Level 1: Multiple RX queues
NIC supports multiple free-buffer and RX queues
– Queue chosen based on destination MAC and VLAN
– Default queue used for multicast/broadcast
Great opportunity to avoid data copies for high-throughput VMs
– Try to allocate free buffers from the buffers the guest is offering
– Still need to worry about broadcast, inter-domain traffic, etc. (see the demux sketch below)
Multiple TX queues with traffic shaping
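
A minimal sketch of the level-1 demux decision, assuming a hypothetical software model of the NIC's filter table (real NICs do this matching in hardware):

    #include <stdint.h>
    #include <string.h>

    #define NUM_QUEUES    8
    #define DEFAULT_QUEUE 0   /* used for multicast/broadcast and misses */

    /* Hypothetical filter entry: one (MAC, VLAN) pair per guest queue. */
    struct rx_filter {
        uint8_t  mac[6];
        uint16_t vlan;
        int      queue;
    };

    static struct rx_filter filters[NUM_QUEUES];
    static int num_filters;

    /* Pick the RX queue for an incoming frame: exact match on destination
     * MAC + VLAN, with multicast/broadcast frames (low bit of the first
     * MAC byte set) falling back to the default queue for dom0 to handle. */
    static int demux_rx_queue(const uint8_t dst_mac[6], uint16_t vlan)
    {
        if (dst_mac[0] & 1)            /* multicast or broadcast */
            return DEFAULT_QUEUE;

        for (int i = 0; i < num_filters; i++) {
            if (filters[i].vlan == vlan &&
                memcmp(filters[i].mac, dst_mac, 6) == 0)
                return filters[i].queue;
        }
        return DEFAULT_QUEUE;          /* no match: software demux path */
    }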

Level 2: Direct guest access
NIC allows queue pairs to be mapped into the guest in a safe and protected manner
– Unprivileged hardware driver in the guest
– Direct hardware access for most TX/RX operations (sketched below)
– Still need to use netfront for broadcast and inter-domain traffic
Memory pre-registration with the NIC via a privileged part of the driver (e.g. in dom0)
– Or rely on an architectural IOMMU in future
For TX, require traffic shaping and basic MAC/source-IP enforcement
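
A sketch of the level-2 split, with assumed names throughout: buffer registration is a privileged, device-specific operation (stubbed out here), while the per-packet transmit path is an ordinary write into the guest-mapped descriptor ring plus a doorbell, with no hypercall:

    #include <stdint.h>

    /* Hypothetical TX descriptor in a hardware queue mapped into the guest. */
    struct tx_desc {
        uint64_t dma_addr;   /* NIC-visible address from prior registration */
        uint32_t len;
        uint32_t flags;
    };

    struct guest_txq {
        volatile struct tx_desc *ring;      /* mapped directly into the guest */
        volatile uint32_t       *doorbell;  /* MMIO doorbell, also mapped */
        uint32_t prod, size;
    };

    /* Privileged slow path (dom0 part of the driver, or Xen with an
     * architectural IOMMU): validate ownership, pin the page, program the
     * NIC's translation table, return a NIC-visible handle. Stubbed here
     * because the real interface is device-specific. */
    uint64_t priv_register_buffer(int guest_id, void *page);

    /* Unprivileged fast path: what makes direct guest access fast. */
    static void guest_tx(struct guest_txq *q, uint64_t handle, uint32_t len)
    {
        q->ring[q->prod % q->size] = (struct tx_desc){ handle, len, 0 };
        __sync_synchronize();        /* descriptor visible before doorbell */
        q->prod++;
        *q->doorbell = q->prod;
    }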

Level 2 NICs, e.g. Solarflare / InfiniBand
Accelerated routes set up by Dom0
– Then the DomU can access the hardware directly
NIC has many Virtual Interfaces (VIs)
– VI = filter + DMA queue + event queue
Allows untrusted entities to access the NIC without compromising system integrity
– Grant tables used to pin pages for DMA
[Diagram: two configurations compared. Conventional path: DomU traffic traverses Dom0 and the hypervisor to reach the hardware. Accelerated path: the DomU accesses the hardware directly, bypassing Dom0 on the data path.]

Level 3: Full switch on NIC
NIC presents itself as multiple PCI devices, one per guest
– Still need to deal with the case where there are more VMs than virtual hardware NICs
– Same issue with a hardware-specific driver in the guest
Full L2+ switch functionality on the NIC
– Inter-domain traffic can go via the NIC
  – But it crosses the PCIe bus twice

NetChannel2 protocol
Time to implement a new, more extensible protocol (the backend can support both old and new)
– Variable-sized descriptors (sketched below)
  – No need for chaining
– Explicit fragment offset and length
  – Enables different-sized buffers to be queued
– Reinstate free-buffer identifiers to allow out-of-order RX return
  – Allows buffer-size selection and supports multiple RX queues
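
A sketch of what such a descriptor might look like. The field names and layout are assumptions for illustration, not the actual NetChannel2 wire format:

    #include <stdint.h>

    /* Hypothetical variable-sized descriptor: a fixed header followed by
     * an inline fragment array, so a multi-fragment packet takes one
     * descriptor instead of a chain. */
    struct nc2_frag {
        uint32_t grant_ref;   /* grant reference for the buffer page */
        uint16_t offset;      /* explicit offset within the buffer */
        uint16_t length;      /* explicit fragment length */
    };

    struct nc2_desc {
        uint16_t type;        /* e.g. TX packet, RX buffer post, ... */
        uint16_t size;        /* total descriptor size in bytes; the ring
                               * consumer advances by this amount, which is
                               * what makes descriptors variable-sized */
        uint32_t buffer_id;   /* free-buffer identifier, echoed back in the
                               * completion to allow out-of-order RX return */
        uint16_t num_frags;
        uint16_t flags;
        struct nc2_frag frag[];  /* num_frags entries follow inline */
    };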

NetChannel2 protocol
Allow longer-lived grant mappings
– Sticky bit when making grants; explicit un-grant operation
  – Backend is free to cache mappings of sticky grants
  – Backend advertises its current per-channel cache size
– Use for RX free buffers
  – Works great for Windows
  – Linux "alloc_skb_from_cache" patch to promote recycling
– Use for TX header fragments (see the sketch below)
  – Frontend copies the header (e.g. 64 bytes) into a pool of sticky mapped buffers
  – Typically no need for the backend to map the payload fragments into virtual memory, only for DMA
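
A sketch of the TX header-copy path under assumed names: the frontend keeps a pool of buffers granted once with the sticky bit, so the backend can cache its mappings, and copies only the packet header into one:

    #include <stdint.h>
    #include <string.h>

    #define HDR_COPY_BYTES 64   /* the slide's example header size */
    #define POOL_SIZE      256  /* pool size is an assumption */

    /* Buffer granted once with a sticky grant at pool setup; the backend
     * keeps it mapped rather than mapping and unmapping per packet. */
    struct sticky_buf {
        uint32_t grant_ref;
        uint8_t  data[HDR_COPY_BYTES];
        int      in_use;
    };

    static struct sticky_buf hdr_pool[POOL_SIZE];

    /* Copy the packet header into a sticky buffer and return its grant
     * reference for the descriptor's first fragment; returns -1 if the
     * pool is exhausted (fall back to an ordinary one-shot grant). The
     * payload fragments keep normal grants that the backend only uses
     * for DMA, never mapping them into virtual memory. */
    static int64_t tx_copy_header(const void *pkt, size_t len)
    {
        for (int i = 0; i < POOL_SIZE; i++) {
            if (!hdr_pool[i].in_use) {
                hdr_pool[i].in_use = 1;
                memcpy(hdr_pool[i].data, pkt,
                       len < HDR_COPY_BYTES ? len : HDR_COPY_BYTES);
                return hdr_pool[i].grant_ref;
            }
        }
        return -1;
    }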

NetChannel2 protocol
Try to defer the copy to the receiving guest
– Better for accounting and cache behaviour
– But need to be careful to prevent a slow receiving domain from stalling the TX domain
  – Use a timeout-driven grant_copy from dom0 if buffers are stalled (see the sketch below)
Need transitive grants to allow deferred copy for inter-domain communication
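
A sketch of the timeout fallback with hypothetical names and an illustrative timeout value; the point is only that dom0 bounds how long a transmit buffer can stay pinned behind a slow receiver:

    #include <stdint.h>
    #include <time.h>

    #define STALL_TIMEOUT_MS 100   /* illustrative bound, not from the talk */

    /* A TX buffer waiting for the receiving guest to perform the copy. */
    struct pending_tx {
        uint32_t grant_ref;         /* transmitter's grant on the payload */
        uint64_t queued_ms;         /* when it was offered to the receiver */
        int      done;              /* receiver has copied it */
    };

    /* Stub standing in for the grant_copy operation issued from dom0. */
    void dom0_grant_copy(uint32_t src_grant_ref);

    static uint64_t now_ms(void)
    {
        struct timespec ts;
        clock_gettime(CLOCK_MONOTONIC, &ts);
        return (uint64_t)ts.tv_sec * 1000 + ts.tv_nsec / 1000000;
    }

    /* Periodic scan in dom0: if the receiver has not copied a buffer
     * within the timeout, dom0 does the copy itself so the transmitting
     * domain's buffer can be released. */
    static void reap_stalled(struct pending_tx *q, int n)
    {
        uint64_t t = now_ms();
        for (int i = 0; i < n; i++) {
            if (!q[i].done && t - q[i].queued_ms > STALL_TIMEOUT_MS) {
                dom0_grant_copy(q[i].grant_ref);
                q[i].done = 1;
            }
        }
    }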

Conclusions
Maintaining good isolation while attaining high-performance network I/O is hard
NetChannel2 improves performance with traditional NICs and is designed to allow Smart NIC features to be fully utilized

Last talk

Smart L2 NIC features
Privileged/unprivileged NIC driver model
Free/RX/TX descriptor queues into the guest
Packet demux and TX enforcement
Validation of fragment descriptors
TX QoS
Checksum offload / TSO / LRO / interrupt coalescing

Smart L2 NIC features
Packet demux to queues
– MAC address (possibly multiple)
– VLAN tag
– L3/L4 demux useful in some environments
Filtering
– Source MAC address and VLAN enforcement
– More advanced filtering
TX rate limiting: x KB every y ms (see the token-bucket sketch below)
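
The "x KB every y ms" limiter is naturally a token bucket; a minimal sketch with illustrative structure and names:

    #include <stdint.h>

    /* Token bucket for "x KB every y ms": credit accrues at rate_bytes
     * per period_ms, and a packet is sent only if enough byte-tokens
     * are available. */
    struct tx_limiter {
        uint64_t tokens;         /* currently available bytes */
        uint64_t burst;          /* bucket depth: maximum accumulated credit */
        uint64_t rate_bytes;     /* x KB, expressed in bytes */
        uint64_t period_ms;      /* y ms */
        uint64_t last_refill_ms;
    };

    static void refill(struct tx_limiter *l, uint64_t now_ms)
    {
        uint64_t periods = (now_ms - l->last_refill_ms) / l->period_ms;
        if (periods) {
            l->tokens += periods * l->rate_bytes;
            if (l->tokens > l->burst)
                l->tokens = l->burst;
            l->last_refill_ms += periods * l->period_ms;
        }
    }

    /* Returns 1 if the packet may be transmitted now, 0 if it must wait. */
    static int tx_allowed(struct tx_limiter *l, uint64_t pkt_len,
                          uint64_t now_ms)
    {
        refill(l, now_ms);
        if (l->tokens < pkt_len)
            return 0;
        l->tokens -= pkt_len;
        return 1;
    }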

Design decisions
Inter-VM communication
– Bounce via the bridge on the NIC
– Bounce via a switch
– Short-circuit via netfront
Broadcast/multicast
Running out of contexts
– Fall back to netfront
Multiple PCI devices vs. a single device
Card IOMMU vs. architectural IOMMU

Memory registration
Pre-registering RX buffers is easy, as they are recycled
TX buffers can come from anywhere
– Register all guest memory
– Copy in the guest into a pre-registered buffer
– Batch, register and cache mappings (see the sketch below)
Pinning can be done in Xen for architectural IOMMUs, or in the dom0 driver for NIC IOMMUs
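
A sketch of the third option, a registration cache, with hypothetical names: look a TX page up in a small direct-mapped cache of live registrations and pay the pin/register cost only on a miss:

    #include <stdint.h>

    #define REG_CACHE_SLOTS 64   /* cache size is an assumption */

    struct reg_entry {
        uintptr_t page;       /* guest page address (page-aligned) */
        uint64_t  dma_addr;   /* NIC-visible address from registration */
        int       valid;
    };

    static struct reg_entry cache[REG_CACHE_SLOTS];

    /* Stubs for the privileged operations (dom0 driver for a NIC IOMMU,
     * or Xen for an architectural IOMMU). */
    uint64_t register_page(uintptr_t page);     /* pin + program IOMMU/NIC */
    void     unregister_page(uint64_t dma_addr);

    static uint64_t lookup_or_register(uintptr_t page)
    {
        unsigned int slot = (page >> 12) % REG_CACHE_SLOTS;

        if (cache[slot].valid && cache[slot].page == page)
            return cache[slot].dma_addr;        /* hit: no pinning cost */

        if (cache[slot].valid)
            unregister_page(cache[slot].dma_addr);  /* evict old mapping */

        cache[slot].page     = page;
        cache[slot].dma_addr = register_page(page);
        cache[slot].valid    = 1;
        return cache[slot].dma_addr;
    }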

VM Relocation
Privileged state is relocated via xend
– TX rate settings, firewall rules, credentials, etc.
Guest carries its state and can push unprivileged state down to the new device
– Promiscuous mode, etc.
Heterogeneous devices
– Need to change the driver
– Need a device-independent way of representing state (more of a challenge for RDMA / TOE)

Design options
Proxy device driver
– Simplest
– Requires the guest OS to have a driver
Driver in a stub domain, communicated with via a netchannel-like interface
– Overhead of accessing the driver
Driver supplied by the hypervisor in the guest address space
– Highest performance
"Architectural" definition of netchannel rings
– A way of kicking devices via Xen