Fabric Interfaces Architecture Sean Hefty - Intel Corporation.

Slides:



Advertisements
Similar presentations
Executional Architecture
Advertisements

Umut Girit  One of the core members of the Internet Protocol Suite, the set of network protocols used for the Internet. With UDP, computer.
KOFI Stan Smith Intel SSG/DPD January, 2015 Kernel OpenFabrics Interface.
28.2 Functionality Application Software Provides Applications supply the high-level services that user access, and determine how users perceive the capabilities.
VIA and Its Extension To TCP/IP Network Yingping Lu Based on Paper “Queue Pair IP, …” by Philip Buonadonna.
EEC-484/584 Computer Networks Lecture 3 Wenbing Zhao
Socket Programming.
CS3771 Today: network programming with sockets  Previous class: network structures, protocols  Next: network programming Sockets (low-level API) TODAY!
McGraw-Hill©The McGraw-Hill Companies, Inc., 2004 Application Layer PART VI.
Networks 1 CS502 Spring 2006 Network Input & Output CS-502 Operating Systems Spring 2006.
TCP/IP TCP/IP Basics Alvin Kwan. TCP/IP What is TCP/IP?  It is a protocol suite governing how data can be communicated in a network environment, both.
CS-3013 & CS-502, Summer 2006 Network Input & Output1 CS-3013 & CS-502, Summer 2006.
TCP: Software for Reliable Communication. Spring 2002Computer Networks Applications Internet: a Collection of Disparate Networks Different goals: Speed,
1 Version 3.0 Module 10 Routing Fundamentals and Subnetting.
 The Open Systems Interconnection model (OSI model) is a product of the Open Systems Interconnection effort at the International Organization for Standardization.
Stan Smith Intel SSG/DPD June, 2015 Kernel Fabric Interface KFI Framework.
IB ACM InfiniBand Communication Management Assistant (for Scaling) Sean Hefty.
Protocols and the TCP/IP Suite Chapter 4. Multilayer communication. A series of layers, each built upon the one below it. The purpose of each layer is.
New Direction Proposal: An OpenFabrics Framework for high-performance I/O apps OFA TAC, Key drivers: Sean Hefty, Paul Grun.
Open Fabrics Interfaces Architecture Introduction Sean Hefty Intel Corporation.
Module 10. Internet Protocol (IP) is the routed protocol of the Internet. IP addressing enables packets to be routed from source to destination using.
Chapter 17 Networking Dave Bremer Otago Polytechnic, N.Z. ©2008, Prentice Hall Operating Systems: Internals and Design Principles, 6/E William Stallings.
OpenFabrics 2.0 Sean Hefty Intel Corporation. Claims Verbs is a poor semantic match for industry standard APIs (MPI, PGAS,...) –Want to minimize software.
Protocol Architectures. Simple Protocol Architecture Not an actual architecture, but a model for how they work Similar to “pseudocode,” used for teaching.
Protocols and the TCP/IP Suite
OpenFabrics 2.0 or libibverbs 1.0 Sean Hefty Intel Corporation.
The OSI Model.
Scalable Fabric Interfaces Sean Hefty Intel Corporation OFI software will be backward compatible.
User Datagram Protocol (UDP) Chapter 11. Know TCP/IP transfers datagrams around Forwarded based on destination’s IP address Forwarded based on destination’s.
OFI SW - Progress Sean Hefty - Intel Corporation.
Fabric Interfaces Architecture Sean Hefty - Intel Corporation.
Internetworking Internet: A network among networks, or a network of networks Allows accommodation of multiple network technologies Universal Service Routers.
CCNA 3 Week 4 Switching Concepts. Copyright © 2005 University of Bolton Introduction Lan design has moved away from using shared media, hubs and repeaters.
Internetworking Internet: A network among networks, or a network of networks Allows accommodation of multiple network technologies Universal Service Routers.
Chapter 2 Applications and Layered Architectures Sockets.
The Socket Interface Chapter 21. Application Program Interface (API) Interface used between application programs and TCP/IP protocols Interface used between.
Scalable RDMA Software Solution Sean Hefty Intel Corporation.
The Socket Interface Chapter 22. Introduction This chapter reviews one example of an Application Program Interface (API) which is the interface between.
Storage Interconnect Requirements Chen Zhao, Frank Yang NetApp, Inc.
CSE 6590 Department of Computer Science & Engineering York University 111/9/ :26 AM.
PMI: A Scalable Process- Management Interface for Extreme-Scale Systems Pavan Balaji, Darius Buntinas, David Goodell, William Gropp, Jayesh Krishna, Ewing.
Chapter 2 Protocols and the TCP/IP Suite 1 Chapter 2 Protocols and the TCP/IP Suite.
Stan Smith Intel SSG/DPD February, 2015 Kernel OpenFabrics Interface Initialization.
Lecture 4 Overview. Ethernet Data Link Layer protocol Ethernet (IEEE 802.3) is widely used Supported by a variety of physical layer implementations Multi-access.
Network Protocols and Standards (Part 2). The OSI Model In 1984, the International Organization for Standardization (ISO) defined a standard, or set of.
A new thread support level for hybrid programming with MPI endpoints EASC 2015 Dan Holmes, Mark Bull, Jim Dinan
The Client-Server Model And the Socket API. Client-Server (1) The datagram service does not require cooperation between the peer applications but such.
OFI SW Sean Hefty - Intel Corporation. Target Software 2 Verbs 1.x + extensions 2.0 RDMA CM 1.x + extensions 2.0 Fabric Interfaces.
OpenFabrics Interface WG A brief introduction Paul Grun – co chair OFI WG Cray, Inc.
AMQP, Message Broker Babu Ram Dawadi. overview Why MOM architecture? Messaging broker like RabbitMQ in brief RabbitMQ AMQP – What is it ?
OpenFabrics 2.0 rsockets+ requirements Sean Hefty - Intel Corporation Bob Russell, Patrick MacArthur - UNH.
Inter-Process Communication 9.1 Unix Sockets You may regard a socket as being a communication endpoint. –For two processes to communicate, both must create.
UDP: User Datagram Protocol Chapter 12. Introduction Multiple application programs can execute simultaneously on a given computer and can send and receive.
Stan Smith Intel SSG/DPD June, 2015 Kernel Fabric Interface Kfabric Framework.
1 Network Communications A Brief Introduction. 2 Network Communications.
1 K. Salah Application Layer Module K. Salah Network layer duties.
SC’13 BoF Discussion Sean Hefty Intel Corporation.
SOCKET PROGRAMMING Presented By : Divya Sharma.
11/18/2016Basic TCP/IP Networking 1 TCP/IP Overview Basic Networking Concepts.
ARP and RARP Objectives Chapter 7 Upon completion you will be able to:
Layered Architectures
Fabric Interfaces Architecture – v4
Advancing open fabrics interfaces
Internet Protocol: Connectionless Datagram Delivery
Persistent memory support
Routing and Switching Essentials v6.0
Request ordering for FI_MSG and FI_RDM endpoints
Operating Systems Chapter 5: Input/Output Management
Application taxonomy & characterization
Chapter 13: I/O Systems I/O Hardware Application I/O Interface
Presentation transcript:

Fabric Interfaces Architecture Sean Hefty - Intel Corporation

Changes v2 –Remove interface object –Add open interface as base object –Add SRQ object –Add EQ group object 2

Overview Object Model –Do we have the right type of objects defines? –Do we have the correct object relationships? Interface Synopsis –High-level description of object operations –Is functionality missing? –Are interfaces associated with the right object? Architectural Semantics –Do the semantics match well with the apps? –What semantics are missing? 3

Object “Class” Model Objects represent collection of attributes and interfaces –I.e. object-oriented programming model Consider architectural model only at this point 4 Objects do not necessarily map directly to hardware or software objects

Conceptual Object Hierarchy 5 Fabric Descriptor Fabric Domain Address Vector Map Index Endpoint Msg Passive Active Datagram RDM Shared Receive Queue Event Queue Completion CM AV Domain EQ Group Counter Memory Region Interfaces Object “inheritance”

Object Relationships 6 Fabric Passive EP EQCM Domain AV Map Index Active EP Msg Datagram RDMShared RQ EQ Group EQ CQ CM AV Domain Counter MR Object “scope”

Fabric Represents a communication domain or boundary – Single IB or RoCE subnet, IP (iWarp) network, Ethernet subnet Multiple local NICs / ports Topology data, network time stamps Determines native addressing – Mapped addressing possible – GID/LID versus IP 7 Fabric Passive EP EQ Domain

Passive (Fabric) EP Listening endpoint – Connection-oriented protocols Wildcard listen across multiple NICs / ports Bind to address to restrict listen – Listen may migrate with address 8 Fabric Passive EP EQ Domain

Fabric EQ Associated with passive endpoint(s) Reports connection requests Could be used to report fabric events 9 Fabric Passive EP EQ Domain

Resource Domain Boundary for resource sharing – Physical or logical NIC – Command queue Container for data transfer resources A provider may define multiple domains for a single NIC – Dependent on resource sharing 10 Fabric Passive EP EQ Domain

Domain Address Vectors Maintains list of remote endpoint addresses – Map – native addressing – Index – ‘rank’-based addressing Resolves higher-level addresses into fabric addresses – Native addressing abstracted from user Handles address and route changes 11 Domain AV Active EP EQ EQ Group SRQ Counter MR

Domain Endpoints Data transfer portal – Send / receive queues – Command queues – Ring buffers Multiple types defined – Connection-oriented / connectionless – Reliable / unreliable – Message / stream 12 Domain AV Active EP EQ EQ Group SRQ Counter MR

Domain Event Queues Reports asynchronous events Unexpected errors reported ‘out of band’ Events separated into ‘EQ domains’ – CM, AV, completions – 1 EQ domain per EQ – Future support for merged EQ domains 13 Domain AV Active EP EQ EQ Group SRQ Counter MR

EQ Groups Collection of EQs Conceptually shares same wait object Grouping for progress and wait operations 14 Domain AV Active EP EQ EQ Group SRQ Counter MR

Shared Receive Queue Shares buffers among multiple endpoints Not addressable – Addressable SRQs are abstracted within the endpoint 15 Domain AV Active EP EQ EQ Group SRQ Counter MR

Domain Counters Provides a count of successful completions of asynchronous operations – Conceptual HW counter Count is independent from an actual event reported to the user through an EQ 16 Domain AV Active EP EQ EQ Group SRQ Counter MR

Domain Memory Regions Memory ranges accessible by fabric resources – Local and/or remote access Defines permissions for remote access 17 Domain AV Active EP EQ EQ Group SRQ Counter MR

Interface Synopsis Operations associated with identified ‘classes’ General functionality, versus detailed methods –The full set of methods are not defined here –Detailed behavior (e.g. blocking) is not defined Identify missing and unneeded functionality –Mapping of functionality to objects 18 Use timeboxing to limit scope of interfaces to refine by a target date

19 Base Class CloseDestroy / free object BindCreate an association between two object instances SyncFencing operation that completes only after previously issued asynchronous operations have completed Control(~fcntl) set/get low-level object behavior I/F OpenOpen provider extended interfaces

20 Fabric DomainOpen a resource domain EndpointCreate a listening EP for connection-oriented protocols EQ OpenOpen an event queue for listening EP or reporting fabric events

21 Resource Domain QueryObtain domain specific attributes Open AV, EQ, EP, SRQ, EQ Group Create an address vector, event or completion counter, event queue, endpoint, shared receive queue, or EQ group MR OpsRegister data buffers for access by fabric resources

22 Address Vector InsertInsert one or more addresses into the vector RemoveRemote one or more addresses from the vector LookupReturn a stored address StraddrConvert an address into a printable string

23 Base EP EnableEnables an active EP for data transfers CancelCancel a pending asynchronous operation Getopt(~getsockopt) get protocol specific EP options Setopt(~setsockopt) set protocol specific EP options

24 Passive EP Getname(~getsockname) return EP address ListenStart listening for connection requests RejectReject a connection request

25 Active EP CMConnection establishment ops, usable by connection-oriented and connectionless endpoints MSG2-sided message queue ops, to send and receive messages RMA1-sided RDMA read and write ops Tagged2-sided matched message ops, to send and receive messages (conceptual merge of messages and RMA writes) Atomic1-sided atomic ops TriggeredDeferred operations initiated on a condition being met

26 Shared Receive Queue ReceivePost buffer to receive data

27 Event Queue ReadRetrieve a completion event, and optional source endpoint address data for received data transfers Read ErrRetrieve event data about an operation that completed with an unexpected error WriteInsert an event into the queue ResetDirects the EQ to signal its wait object when a specified condition is met StrerrorConverts error data associated with a completion into a printable string

28 EQ Group PollCheck EQs for events WaitWait for an event on the EQ group

29 Completion Counter ReadRetrieve a counter’s value AddIncrement a counter SetSet / clear a counter’s value WaitWait until a counter reaches a desired threshold

30 Memory Region Desc(~lkey) Optional local memory descriptor associated with a data buffer Key(~rkey) Protection key against access from remote data transfers

Architectural Semantics Progress Ordering - completions and data delivery Multi-threading and locking model Buffering Function signatures and semantics 31 Once defined, object and interface semantics cannot change – semantic changes require new objects and interfaces Need refining

Progress Ability of the underlying implementation to complete processing of an asynchronous request Need to consider ALL asynchronous requests –Connections, address resolution, data transfers, event processing, completions, etc. HW/SW mix 32 All(?) current solutions require significant software components

Progress - Proposal Support two progress models –Automatic and implicit (name?) Separate operations as belonging to one of two progress domains –Data or control –Report progress model for each domain 33

Progress - Proposal Implicit progress –Occurs when reading or waiting on EQ(s) –Application can use separate EQs for control and data –Progress limited to objects associated with selected EQ(s) –App can request automatic progress E.g. app wants to wait on native wait object Implies provider allocated threading 34

Ordering - Completions Outbound –Is any ordering guarantee needed? –‘Sync’ call completion guarantees all selected, previous operations issued on an endpoint have completed Inbound –Ordering only guaranteed for message queue posted receives 35

Ordering – Data Delivery Interfaces may imply specific ordering rules –E.g. Tagged – “If a sender sends two messages in succession to the same destination, and both match the same receive, then this operation cannot receive the second message if the first one is still pending. If a receiver posts two receives in succession, and both match the same message, then the second receive operation cannot be satisfied by this message, if the first one is still pending.” 36

Ordering – Data Delivery Required ordering specified by application –[read | write | send] after [read | write | send] –RAR, RAW, WAR, WAW, SAW Ordering may differ based on message size –E.g. size ≥ MTU 37 Needs more analysis with a formal proposal

Multi-threading and Locking Support both thread safe and lockless models Lockless – based on MPI model –Single – single-threaded app –Funneled – only 1 thread calls into interfaces –Serialized – only 1 thread at a time calls into interfaces –Are all models needed? Thread safe –Multiple – multi-threaded app, with no restrictions 38

Buffering Support both application and network buffering –Zero-copy for high-performance –Network buffering for ease of use Buffering in local memory or NIC –In some case, buffered transfers may be higher- performing (e.g. “inline”) Registration option for local NIC access –Migration to fabric managed registration Required registration for remote access –Specify permissions 39