OpenFabrics 2.0 rsockets+ requirements Sean Hefty - Intel Corporation Bob Russell, Patrick MacArthur - UNH.

Data Streaming
• Current: RDMA CM for connection setup
• Single wait object and event queue
  – CM and CQ use the same fd
  – In-band disconnect notification
• Associate transport resource with an fd
  – fstat, dup2
• Fork support
  – Migrate resources between user space and kernel
• chroot support
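The single-wait-object request can be modeled in a few lines of C. This is a minimal sketch, not the RDMA CM or verbs API: a ring buffer (hypothetical `struct event_queue`) stands in for the one fd-backed queue, and `EV_CONNECTED`, `EV_COMPLETION`, and `EV_DISCONNECTED` are illustrative event names. Because the disconnect travels in-band with the data completions, the application sees it only after every completion that preceded it.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative event types: connection management and completions
 * share one queue instead of two separate fds. */
enum ev_type { EV_CONNECTED, EV_COMPLETION, EV_DISCONNECTED };

struct event {
    enum ev_type type;
    size_t len;              /* bytes completed, for EV_COMPLETION */
};

#define EQ_DEPTH 16

/* Hypothetical single event queue (stands in for one fd). */
struct event_queue {
    struct event ring[EQ_DEPTH];
    size_t head, tail;       /* head: next to read, tail: next to write */
};

static int eq_push(struct event_queue *eq, enum ev_type type, size_t len)
{
    if (eq->tail - eq->head == EQ_DEPTH)
        return -1;                              /* queue full */
    eq->ring[eq->tail++ % EQ_DEPTH] = (struct event){ type, len };
    return 0;
}

static int eq_pop(struct event_queue *eq, struct event *ev)
{
    if (eq->head == eq->tail)
        return -1;                              /* nothing pending */
    *ev = eq->ring[eq->head++ % EQ_DEPTH];
    return 0;
}

/* Drain events in order. Returns the bytes received and sets *closed
 * once the in-band disconnect is reached. */
static size_t drain(struct event_queue *eq, int *closed)
{
    struct event ev;
    size_t total = 0;

    *closed = 0;
    while (!*closed && eq_pop(eq, &ev) == 0) {
        switch (ev.type) {
        case EV_COMPLETION:   total += ev.len; break;
        case EV_DISCONNECTED: *closed = 1;     break;
        case EV_CONNECTED:    break;
        }
    }
    return total;
}
```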

Data Streaming
• Current: RDMA write with immediate
• Eliminate address and rkey exchange
  – Receiver selects key
  – Sender uses offset
• Eliminate need for immediate data
  – Generate event based on write: location and length
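A sketch of the proposed write model, assuming the receiver exposes a region under a key it chose itself and the target reports the (offset, length) of each write. `struct target_region`, `struct write_event`, and `target_write` are hypothetical names for illustration, not an existing verbs interface; the memcpy stands in for the hardware placing data.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Receive-side region: the receiver picks the key and exposes only the
 * region size, so no remote address or rkey has to be exchanged. */
struct target_region {
    unsigned key;
    unsigned char *base;
    size_t size;
};

/* Event generated at the target for each write; location and length
 * replace immediate data. */
struct write_event {
    size_t offset;
    size_t len;
};

/* Apply a write addressed by (key, offset). Returns 0 and fills *ev, or
 * -1 if the key is wrong or the write would overrun the region. */
static int target_write(struct target_region *r, unsigned key, size_t offset,
                        const void *data, size_t len, struct write_event *ev)
{
    if (key != r->key || offset > r->size || len > r->size - offset)
        return -1;
    memcpy(r->base + offset, data, len);
    ev->offset = offset;
    ev->len = len;
    return 0;
}
```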

Data Streaming
• Eliminate posting receives
  – No buffer is provided
  – Concern is overrunning CQ, not RQ
• Replace RDMA write with send
  – Receiver posts single buffer that hardware packs multiple messages into
  – Eliminates RDMA header
• Count of completed sends
  – Full completion data unnecessary
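The packed-buffer and send-counter ideas can be modeled as follows. This is a hedged sketch: `struct multi_recv`, `struct send_counter`, and `deliver_send` are hypothetical, and packing is simulated in software where the slide expects hardware to do it. One posted buffer absorbs back-to-back messages, and the sender tracks only a count rather than a full completion entry per send.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* One posted buffer that absorbs many messages, packed back to back. */
struct multi_recv {
    unsigned char *buf;
    size_t size;
    size_t used;            /* bytes consumed so far */
};

/* Sender side keeps only a count of completed sends. */
struct send_counter {
    unsigned long completed;
};

/* Place one message; returns its offset, or -1 if it does not fit (the
 * buffer would then be handed to the application and a new one posted). */
static long multi_recv_place(struct multi_recv *mr, const void *msg, size_t len)
{
    if (len > mr->size - mr->used)
        return -1;
    long off = (long)mr->used;
    memcpy(mr->buf + mr->used, msg, len);
    mr->used += len;
    return off;
}

/* Deliver one send; the sender's counter advances only on success. */
static int deliver_send(struct multi_recv *mr, struct send_counter *sc,
                        const void *msg, size_t len)
{
    if (multi_recv_place(mr, msg, len) < 0)
        return -1;
    sc->completed++;
    return 0;
}
```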

Data Streaming
• Split received data into two buffers
  – Separate header and user data
  – Pack tightly, but use multiple buffers
• Partial completion event
  – Notification of partial transfer for large requests
  – Allow receive side to begin processing
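A sketch of header/data splitting with partial completions, assuming a fixed-size application header (hypothetical `HDR_LEN` of 8 bytes) and a provider that lands the payload in fixed chunks. `struct split_dest` and `deliver_split` are illustrative names; each chunk bumps an event count to show where a partial-completion notification would let the receiver start work before the whole transfer ends.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define HDR_LEN 8   /* hypothetical fixed application header size */

/* Headers and payloads land in separate buffers. */
struct split_dest {
    unsigned char hdr[HDR_LEN];
    unsigned char *data;
    size_t data_cap;
};

/* Deliver HDR_LEN header bytes plus payload, copying the payload in
 * `chunk`-byte pieces. Returns the number of (partial or final)
 * completion events generated, or -1 on error. */
static int deliver_split(struct split_dest *d, const unsigned char *msg,
                         size_t len, size_t chunk)
{
    if (len < HDR_LEN || chunk == 0 || len - HDR_LEN > d->data_cap)
        return -1;
    memcpy(d->hdr, msg, HDR_LEN);

    size_t payload = len - HDR_LEN, done = 0;
    int events = 0;
    while (done < payload) {
        size_t n = payload - done < chunk ? payload - done : chunk;
        memcpy(d->data + done, msg + HDR_LEN + done, n);
        done += n;
        events++;   /* receiver may process data[0..done) already */
    }
    return events;
}
```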

Data Streaming
• Nonblocking support
  – Signal when transport is ready to accept new data
  – Available QP and CQ resources, send credits
• Keepalive support
  – 0-byte send that does not generate a remote event
  – Similar to RDMA write, but eliminate header
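The readiness condition on this slide can be written as a predicate over the resources a send consumes. A minimal sketch with hypothetical names (`struct xport_state`, `can_send`, `post_keepalive`); modeling the keepalive as consuming local queue space but no peer credit is an assumption, following from the slide's "does not generate a remote event".

```c
#include <assert.h>
#include <stdbool.h>

/* Resources that gate a nonblocking send. */
struct xport_state {
    unsigned sq_free;   /* free send queue (QP) entries */
    unsigned cq_free;   /* free completion queue entries */
    unsigned credits;   /* send credits granted by the peer */
};

/* "Ready to accept new data": the condition a transport could signal. */
static bool can_send(const struct xport_state *s)
{
    return s->sq_free > 0 && s->cq_free > 0 && s->credits > 0;
}

/* Consume resources for one data send; -1 means it would block. */
static int post_send(struct xport_state *s)
{
    if (!can_send(s))
        return -1;
    s->sq_free--;
    s->cq_free--;
    s->credits--;
    return 0;
}

/* Keepalive: 0-byte send with no remote event, so no peer credit used. */
static int post_keepalive(struct xport_state *s)
{
    if (s->sq_free == 0 || s->cq_free == 0)
        return -1;
    s->sq_free--;
    s->cq_free--;
    return 0;
}
```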

Datagram
• User-selectable transport address (QPN)
  – High QPN lookup costs
  – Message backlog
• Multi-receive message buffer
  – Single buffer receives multiple messages
  – Split received data into two buffers
    – Separate header and user data
    – Pack tightly, but use multiple buffers
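One way to read the lookup-cost point: if the application picks small, dense transport addresses, demultiplexing becomes a direct array index. A sketch under that assumption; `MAX_ENDPOINTS`, `struct ep_table`, `ep_bind`, and `ep_lookup` are hypothetical and do not correspond to how any provider assigns QPNs today.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_ENDPOINTS 64   /* hypothetical bound on user-selected addresses */

struct endpoint { int id; };

/* User-selected addresses index the table directly, avoiding a search
 * over sparse, provider-assigned numbers. */
struct ep_table {
    struct endpoint *slot[MAX_ENDPOINTS];
};

static int ep_bind(struct ep_table *t, unsigned addr, struct endpoint *ep)
{
    if (addr >= MAX_ENDPOINTS || t->slot[addr] != NULL)
        return -1;          /* out of range or already in use */
    t->slot[addr] = ep;
    return 0;
}

static struct endpoint *ep_lookup(const struct ep_table *t, unsigned addr)
{
    return addr < MAX_ENDPOINTS ? t->slot[addr] : NULL;
}
```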

Datagram
• Fast address resolution
  – Compact address data
• Multicast support
  – Fast access to multicast group
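Compact address data can be illustrated with an address table: each full address is stored once and sends carry only a small integer handle. This is a sketch; the fields of the hypothetical `struct full_addr` are illustrative, and `av_insert`/`av_resolve` are not an existing library API.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical full datagram address (field sizes are illustrative). */
struct full_addr {
    uint8_t gid[16];
    uint32_t qpn;
    uint32_t qkey;
};

#define AV_SIZE 32

/* Resolved addresses are stored once; the application refers to them by
 * a compact handle on every send. */
struct addr_table {
    struct full_addr entry[AV_SIZE];
    unsigned count;
};

/* Insert an address and return its handle, or -1 if the table is full. */
static int av_insert(struct addr_table *av, const struct full_addr *a)
{
    if (av->count == AV_SIZE)
        return -1;
    av->entry[av->count] = *a;
    return (int)av->count++;
}

/* Resolving a handle is an array access, not a query. */
static const struct full_addr *av_resolve(const struct addr_table *av, int h)
{
    return (h >= 0 && (unsigned)h < av->count) ? &av->entry[h] : NULL;
}
```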

General Requests
• Increase size of immediate data
  – Provide easy mechanism to discover whether immediate data is supported, and its size
• Slab-based allocation for receive buffers
  – Eliminate wasted space dealing with max message size
• Eliminate posting of 'dummy' receive for immediate data
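A minimal slab sketch for receive buffers, assuming a few size classes each backed by one region carved into equal slots (`SLOTS`, `struct slab`, and `recv_buf_alloc` are hypothetical). A message takes a slot from the smallest class that fits instead of a buffer sized for the maximum message.

```c
#include <assert.h>
#include <stddef.h>

#define SLOTS 4                      /* slots per slab (illustrative) */

/* One slab: a region carved into equal slots; the free list is kept as
 * slot indices, with -1 ending the list. */
struct slab {
    size_t slot_size;
    unsigned char *mem;              /* SLOTS * slot_size bytes */
    int next[SLOTS];
    int free_head;
};

static void slab_init(struct slab *s, unsigned char *mem, size_t slot_size)
{
    s->slot_size = slot_size;
    s->mem = mem;
    for (int i = 0; i < SLOTS; i++)
        s->next[i] = (i + 1 < SLOTS) ? i + 1 : -1;
    s->free_head = 0;
}

static void *slab_alloc(struct slab *s)
{
    if (s->free_head < 0)
        return NULL;
    int i = s->free_head;
    s->free_head = s->next[i];
    return s->mem + (size_t)i * s->slot_size;
}

static void slab_free(struct slab *s, void *p)
{
    int i = (int)(((unsigned char *)p - s->mem) / s->slot_size);
    s->next[i] = s->free_head;
    s->free_head = i;
}

/* Slabs are ordered by increasing slot size; take the smallest class
 * that fits and has a free slot. */
static void *recv_buf_alloc(struct slab *classes, size_t n, size_t len,
                            struct slab **owner)
{
    for (size_t c = 0; c < n; c++) {
        if (len <= classes[c].slot_size) {
            void *p = slab_alloc(&classes[c]);
            if (p) {
                *owner = &classes[c];
                return p;
            }
        }
    }
    return NULL;
}
```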

General Requests
• Add timeout parameters to all CM operations
  – E.g. connect, accept, disconnect, join multicast
• Timeout parameters for reading events
• Ability to cancel a pending I/O
  – Including CM operations
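A timed wait on an event fd can already be built from POSIX `poll()`. The sketch below is a hedged illustration: `wait_event` is a hypothetical helper, and treating CM operations this way assumes their completion is exposed through a pollable fd.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <poll.h>
#include <unistd.h>

/* Wait up to timeout_ms for an event fd to become readable.
 * Returns 1 if an event is pending, 0 on timeout, or -errno on failure. */
static int wait_event(int fd, int timeout_ms)
{
    struct pollfd pfd = { .fd = fd, .events = POLLIN };
    int ret;

    do {
        ret = poll(&pfd, 1, timeout_ms);
    } while (ret < 0 && errno == EINTR);   /* retry; restarts the full timeout */

    if (ret < 0)
        return -errno;
    return ret > 0 ? 1 : 0;
}
```

A pipe is enough to exercise it: with nothing written, the call returns 0 after the timeout; after a write to the other end, it returns 1.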

General Requests
• Error handling must be consistent
  – Do not leave to providers
  – Document which error codes every call can return
    – Similar to POSIX error code documentation
  – Use a single error return convention
    – Return -1 and set errno?
    – Return -errno? (preferred)
    – Return +errno?
  – Consistent error values in events
    – Do not mix transport and errno values
  – Easy mechanism to display error text
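A sketch of the preferred "return -errno" convention paired with one text-display helper. `xfer_post` and `xfer_strerror` are hypothetical names for illustration; the error values themselves are standard errno codes and `strerror` is the standard C library call.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <string.h>

/* Hypothetical call using "return -errno": 0 on success, a negative
 * errno value on failure, and the global errno is left untouched. */
static int xfer_post(size_t len, size_t max_len)
{
    if (len == 0)
        return -EINVAL;
    if (len > max_len)
        return -EMSGSIZE;
    return 0;
}

/* A single way to turn any such return value into text. */
static const char *xfer_strerror(int ret)
{
    return ret == 0 ? "success" : strerror(-ret);
}
```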

General Requests
• Query current status of local queues
  – Generating an async event (e.g. SRQ limit) compounds the issue of dealing with multiple fds
  – Eliminate need for these events or provide in-band notification
• Support memory registration across multiple devices
  – Register at the system level, not per PD per HCA
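Querying queue status instead of waiting for an async event could look like the following sketch. `struct queue_status`, `below_limit`, and `refill_count` are hypothetical; the point is that the condition an SRQ limit event reports can be evaluated synchronously in the application's normal loop, with no extra fd.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical snapshot of a local queue, read on demand. */
struct queue_status {
    unsigned depth;     /* total entries */
    unsigned posted;    /* entries currently posted */
};

/* The condition an SRQ limit event signals, checked in-band. */
static bool below_limit(const struct queue_status *q, unsigned limit)
{
    return q->posted < limit;
}

/* Number of receives needed to refill the queue. */
static unsigned refill_count(const struct queue_status *q)
{
    return q->depth - q->posted;
}
```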

General Requests
• Need simple, programmatic way to detect memory alignment restrictions
  – Or avoid any alignment needs
• Need better way to discover supported 'inline' sizes
  – Providers should ensure that reported values actually improve performance
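If a provider reported its alignment and useful inline size as plain attributes, applications could act on them directly. A sketch assuming a hypothetical `struct provider_attr`; `posix_memalign` is the real POSIX call, and the alignment is raised to `sizeof(void *)` because `posix_memalign` requires at least that.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical provider attributes exposed programmatically. */
struct provider_attr {
    size_t buf_align;    /* required alignment, power of two; 0 = none */
    size_t max_inline;   /* largest send that benefits from inlining */
};

/* Allocate a buffer meeting the provider's alignment requirement. */
static void *alloc_aligned(const struct provider_attr *a, size_t len)
{
    size_t align = a->buf_align < sizeof(void *) ? sizeof(void *) : a->buf_align;
    void *p = NULL;
    return posix_memalign(&p, align, len) == 0 ? p : NULL;
}

static int is_aligned(const void *p, size_t align)
{
    return align == 0 || ((uintptr_t)p & (align - 1)) == 0;
}

/* Inline only when the provider reports that it helps at this size. */
static int use_inline(const struct provider_attr *a, size_t len)
{
    return len <= a->max_inline;
}
```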

General Requests
• Define reasonable minimum requirements on providers for:
  – Number of SGEs
  – Inline size
  – Immediate data size
  – CM private data length
• With a supported minimum for any message
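With guaranteed minimums, an application can check a provider once and then code against the floor. The values in `required_min` below are placeholders for illustration only; the slide asks for such minimums to be defined but does not give them, and `struct provider_caps` is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Capabilities a provider would report (hypothetical structure). */
struct provider_caps {
    size_t max_sge;
    size_t max_inline;
    size_t max_imm_data;
    size_t max_cm_private_data;
};

/* Placeholder minimums; not taken from any specification. */
static const struct provider_caps required_min = {
    .max_sge = 4,
    .max_inline = 64,
    .max_imm_data = 4,
    .max_cm_private_data = 32,
};

static bool meets_minimums(const struct provider_caps *c)
{
    return c->max_sge >= required_min.max_sge &&
           c->max_inline >= required_min.max_inline &&
           c->max_imm_data >= required_min.max_imm_data &&
           c->max_cm_private_data >= required_min.max_cm_private_data;
}
```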

General Requests
• Asynchronous interface can be source of races
  – E.g. completions before call returns
  – Have provider update user counters before generating completion
• Support multiple providers at run-time
• Provide test suite to verify provider conformance to API specifications
  – Example programs
  – Error conventions
  – Min/max values
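The "update counters before generating completion" rule maps onto C11 release/acquire ordering. In this sketch (hypothetical `struct op_state`, `provider_complete`, `app_poll`), the provider bumps the user counter and then publishes the completion with a release store; an application that observes the completion with an acquire load is guaranteed to see the updated counter. A single boolean flag is a simplification; a real completion queue would hold many entries.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical state shared between a provider and the application. */
struct op_state {
    atomic_ulong sent_count;   /* user-visible counter */
    atomic_bool  completion;   /* completion notification */
};

/* Provider: counter first, then release-publish the completion. */
static void provider_complete(struct op_state *s)
{
    atomic_fetch_add_explicit(&s->sent_count, 1, memory_order_relaxed);
    atomic_store_explicit(&s->completion, true, memory_order_release);
}

/* Application: acquire the completion, then read the counter. */
static bool app_poll(struct op_state *s, unsigned long *count)
{
    if (!atomic_load_explicit(&s->completion, memory_order_acquire))
        return false;
    *count = atomic_load_explicit(&s->sent_count, memory_order_relaxed);
    return true;
}
```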