Overview QEMU VM VMWare pvrdma Func1: RoCE Func0: ETH ETH

Slides:



Advertisements
Similar presentations
Device Layer and Device Drivers
Advertisements

Device Virtualization Architecture
Device Drivers. Linux Device Drivers Linux supports three types of hardware device: character, block and network –character devices: R/W without buffering.
I/O and Networking Fred Kuhns
Middleware Support for RDMA-based Data Transfer in Cloud Computing Yufei Ren, Tan Li, Dantong Yu, Shudong Jin, Thomas Robertazzi Department of Electrical.
Accelerating the Path to the Guest
VIA and Its Extension To TCP/IP Network Yingping Lu Based on Paper “Queue Pair IP, …” by Philip Buonadonna.
1 Device Management The von Neumann Architecture System Architecture Device Management Polling Interrupts DMA operating systems.
CSI 400/500 Operating Systems Spring 2009 Lecture #2 – Functional Parts of an Operating System Monday January 23, 2009.
虛擬化技術 Virtualization Techniques
CSE 451: Operating Systems Winter 2010 Module 15 I/O Mark Zbikowski Gary Kimura.
I/O Tanenbaum, ch. 5 p. 329 – 427 Silberschatz, ch. 13 p
Stan Smith Intel SSG/DPD June, 2015 Kernel Fabric Interface KFI Framework.
Hardware Definitions –Port: Point of connection –Bus: Interface Daisy Chain (A=>B=>…=>X) Shared Direct Device Access –Controller: Device Electronics –Registers:
Windows Operating System Internals - by David A. Solomon and Mark E. Russinovich with Andreas Polze Unit OS6: Device Management 6.1. Principles of I/O.
Ethernet Driver Changes for NET+OS V5.1. Design Changes Resides in bsp\devices\ethernet directory. Source code broken into more C files. Native driver.
I/O management is a major component of operating system design and operation Important aspect of computer operation I/O devices vary greatly Various methods.
RDMA Bonding Liran Liss Mellanox Technologies. Agenda Introduction Transport-level bonding RDMA bonding design Recovering from failure Implementation.
Datacenter Fabric Workshop August 22, 2005 Reliable Datagram Sockets (RDS) Ranjit Pandit SilverStorm Technologies
UDI Tutorial & Driver Walk-Through Part 1 Kurt Gollhardt SCO Core OS Architect
Infiniband and RoCEE Virtualization with SR-IOV
UDI Advanced Topics DMA and Interrupts Robert Lipe UDI Development Team Lead
OSes: 2. Structs 1 Operating Systems v Objective –to give a (selective) overview of computer system architectures Certificate Program in Software Development.
Input Output Techniques Programmed Interrupt driven Direct Memory Access (DMA)
OpenFabrics 2.0 rsockets+ requirements Sean Hefty - Intel Corporation Bob Russell, Patrick MacArthur - UNH.
MINIX Presented by: Clinton Morse, Joseph Paetz, Theresa Sullivan, and Angela Volk.
Stan Smith Intel SSG/DPD June, 2015 Kernel Fabric Interface Kfabric Framework.
Review ATA - IDE Project name : ATA – IDE Training Engineer : Minh Nguyen.
CSCE451/851 Introduction to Operating Systems
Input/Output (I/O) Important OS function – control I/O
I/O Systems Shmuel Wimer prepared and instructed by
Chapter 13: I/O Systems.
Input/Output Device Drivers
Module 12: I/O Systems I/O hardware Application I/O Interface
Chapter 13: I/O Systems Modified by Dr. Neerja Mhaskar for CS 3SH3.
Interrupts and Interrupts Handling
Why VT-d Direct memory access (DMA) is a method that allows an input/output (I/O) device to send or receive data directly to or from the main memory, bypassing.
Agenda About us Why para-virtualize RDMA Project overview Open issues
Chapter 13: I/O Systems I/O Hardware Application I/O Interface
Chapter 2: System Structures
Input/Output.
I/O system.
Key Terms By: Kelly, Jackson, & Merle
XenFS Sharing data in a virtualised environment
Chapter 8 Input/Output I/O basics Keyboard input Monitor output
OS Virtualization.
Chapter 13: I/O Systems.
Input Device Interrupt Latency of KVM on ARM using Passthrough
Computer System Overview
CPSC 457 Operating Systems
I/O Systems I/O Hardware Application I/O Interface
Operating System Concepts
13: I/O Systems I/O hardwared Application I/O Interface
CS703 - Advanced Operating Systems
vDPA for Vhost Acceleration
CS 140 Lecture Notes: Virtual Machines
Virtio/Vhost Status Quo and Near-term Plan
Accelerate Vhost with vDPA
CSE 451: Operating Systems Spring 2008 Module 15 I/O
CSE 451: Operating Systems Spring 2007 Module 15 I/O
Computer System Overview
© 2012 Elsevier, Inc. All rights reserved.
Chapter 13: I/O Systems I/O Hardware Application I/O Interface
System Calls System calls are the user API to the OS
Chapter 13: I/O Systems I/O Hardware Application I/O Interface
Chapter 1: Introduction CSS503 Systems Programming
Chapter 13: I/O Systems.
Module 12: I/O Systems I/O hardwared Application I/O Interface
Interrupt Message Store
III. Operating System Structures
Presentation transcript:

Overview QEMU VM VMWare pvrdma Func1: RoCE Func0: ETH ETH PVRDMA device TAP Kernel Ib device rxe PF VF VF VF netdev netdev HW RoCE device ETH ETH

PCI registers BAR 0: MSIX Vec0: RING EVT | Vec1: Async EVT | Vec3: Cmd EVT BAR 1: Regs VERSION | DSR | CTL | REQ | ERR | ICR | IMR | MAC BAR 2: UAR QP_NUM | SEND|RECV || CQ_NUM | ARM|POLL

PCI registers – BAR 1 BAR 1: Regs VERSION | DSR | CTL | REQ | ERR | ICR | IMR | MAC Version DSR: Device Shared Region (command channel) - CMD slot address - RESPONSE slot address - Device CQ ring CTL - Activate device - Quiesce - Reset REQ - Execute command at CMD slot - The result will be stored in RESPONSE slot ERR - Error details IMR: - Interrupt mask

PCI registers – BAR 2 BAR 2: UAR QP_NUM | SEND|RECV || CQ_NUM | ARM|POLL QP operations - Guest driver writes to QP offset (0) QP num and op mask - The write operation ends after the operation is sent to the backend device CQ operations - Guest driver writes to CQ offset (4) CQ num and op mask - The write operation ends after the command is sent to Only one command at a time

Resource manager All resources are “virtualized” (1-1 mapping with backend dev) PDs, QPs, CQs,… The pvrdma device shares the resources memory allocated by the guest driver by the means of a “shared page directory” Host VM HW QEMU Memory CQ HW CQ Virt CQ QP CQ Page directory CQ Page directory Driver Snd HW QP VirtQP QP Page directory Rcv QP Page directory CMD Channel (DSR) / Info passed in Create QP/CQ

Flows – Create CQ Guest driver Allocates pages for CQ ring Creates page directory to hold CQ ring's pages Initializes CQ ring Initializes 'Create CQ' command object (cqe, pdir etc) Copies the command to 'command' address (DSR) Writes 0 into REQ register Device Reads request object from 'command' address Allocates CQ object and initialize CQ ring based on pdir Creates backend(HW) CQ Writes operation status to ERR register Posts command-interrupt to guest Reads HW response code from ERR register

Flows – Create QP Guest driver Allocates pages for send and receive rings Creates page directory to hold the ring's pages Initializes 'Create QP' command object (max_send_wr, send_cq_handle, recv_cq_handle, pdir etc) Copies the object to 'command' address Writes 0 into REQ register Device Reads request object from 'command' address Allocates QP object and initialize Send and recv rings based on pdir Send and recv ring state Creates backend QP Writes operation status to ERR register Posts command-interrupt to guest Reads HW response code from ERR register

Flows – Post receive Guest driver Initializes a wqe and place it on recv ring Writes to qpn|qp_recv_bit (31) to QP offset in UAR Device Extracts qpn from UAR Walks through the ring and do the following for each wqe Prepares backend CQE context to be used when receiving completion from backend (wr_id, op_code, emu_cq_num) For each sge prepares backend sge; maps the dma address to qemu virtual address Calls backend's post_send

Flows – Process completion A dedicated thread is used to process backend events At initialization it attach to device and create communication channel Thread main loop: Polls completion Unmaps sge's virtual addresses Extracts emu_cq_num, wr_id and op_code from context Writes CQE to CQ ring Writes CQ number to device CQ Sends completion-interrupt to guest Deallocates context Acks the event (ibv_ack_cq_events)

Open issues QP1 – RDMA CM Work with multiple RXE instances Work with multiple GIDS How to pass MAC/GID to guest