Fabric Interfaces Architecture – v4

Slides:



Advertisements
Similar presentations
Device Virtualization Architecture
Advertisements

Middleware Support for RDMA-based Data Transfer in Cloud Computing Yufei Ren, Tan Li, Dantong Yu, Shudong Jin, Thomas Robertazzi Department of Electrical.
RIP V1 W.lilakiatsakun.
MPI Requirements of the Network Layer OFA 2.0 Mapping MPI community feedback assembled by Jeff Squyres, Cisco Systems Sean Hefty, Intel Corporation.
KOFI Stan Smith Intel SSG/DPD January, 2015 Kernel OpenFabrics Interface.
Remote Procedure Call (RPC)
28.2 Functionality Application Software Provides Applications supply the high-level services that user access, and determine how users perceive the capabilities.
VIA and Its Extension To TCP/IP Network Yingping Lu Based on Paper “Queue Pair IP, …” by Philip Buonadonna.
Develop Application with Open Fabrics Yufei Ren Tan Li.
DISTRIBUTED CONSISTENCY MANAGEMENT IN A SINGLE ADDRESS SPACE DISTRIBUTED OPERATING SYSTEM Sombrero.
Realizing the Performance Potential of the Virtual Interface Architecture Evan Speight, Hazim Abdel-Shafi, and John K. Bennett Rice University, Dep. Of.
An overview of Infiniband Reykjavik, June 24th 2008 R E Y K J A V I K U N I V E R S I T Y Dept. Computer Science Center for Analysis and Design of Intelligent.
Stan Smith Intel SSG/DPD June, 2015 Kernel Fabric Interface KFI Framework.
IB ACM InfiniBand Communication Management Assistant (for Scaling) Sean Hefty.
New Direction Proposal: An OpenFabrics Framework for high-performance I/O apps OFA TAC, Key drivers: Sean Hefty, Paul Grun.
Open Fabrics Interfaces Architecture Introduction Sean Hefty Intel Corporation.
Discussing an I/O Framework SC13 - Denver. #OFADevWorkshop 2 The OpenFabrics Alliance has recently undertaken an effort to review the dominant paradigm.
OpenFabrics 2.0 Sean Hefty Intel Corporation. Claims Verbs is a poor semantic match for industry standard APIs (MPI, PGAS,...) –Want to minimize software.
1/29/2002 CS Distributed Systems 1 Infiniband Architecture Aniruddha Bohra.
© 2012 MELLANOX TECHNOLOGIES 1 The Exascale Interconnect Technology Rich Graham – Sr. Solutions Architect.
OpenFabrics 2.0 or libibverbs 1.0 Sean Hefty Intel Corporation.
Scalable Fabric Interfaces Sean Hefty Intel Corporation OFI software will be backward compatible.
OFI SW - Progress Sean Hefty - Intel Corporation.
Fabric Interfaces Architecture Sean Hefty - Intel Corporation.
IPSec IPSec provides the capability to secure communications across a LAN, across private and public wide area networks (WANs) and across the Internet.
CCNA 3 Week 4 Switching Concepts. Copyright © 2005 University of Bolton Introduction Lan design has moved away from using shared media, hubs and repeaters.
August 22, 2005Page 1 of (#) Datacenter Fabric Workshop Open MPI Overview and Current Status Tim Woodall - LANL Galen Shipman - LANL/UNM.
Scalable RDMA Software Solution Sean Hefty Intel Corporation.
Minimizing Communication Latency to Maximize Network Communication Throughput over InfiniBand Design and Implementation of MPICH-2 over InfiniBand with.
Storage Interconnect Requirements Chen Zhao, Frank Yang NetApp, Inc.
1 - Charlie Wiseman - 05/11/07 Design Review: XScale Charlie Wiseman ONL NP Router.
1 Public DAFS Storage for High Performance Computing using MPI-I/O: Design and Experience Arkady Kanevsky & Peter Corbett Network Appliance Vijay Velusamy.
Enterprise Integration Patterns CS3300 Fall 2015.
Fabric Interfaces Architecture Sean Hefty - Intel Corporation.
Stan Smith Intel SSG/DPD February, 2015 Kernel OpenFabrics Interface Initialization.
Intel Research & Development ETA: Experience with an IA processor as a Packet Processing Engine HP Labs Computer Systems Colloquium August 2003 Greg Regnier.
The Client-Server Model And the Socket API. Client-Server (1) The datagram service does not require cooperation between the peer applications but such.
IB Verbs Compatibility
OFI SW Sean Hefty - Intel Corporation. Target Software 2 Verbs 1.x + extensions 2.0 RDMA CM 1.x + extensions 2.0 Fabric Interfaces.
OpenFabrics Interface WG A brief introduction Paul Grun – co chair OFI WG Cray, Inc.
AMQP, Message Broker Babu Ram Dawadi. overview Why MOM architecture? Messaging broker like RabbitMQ in brief RabbitMQ AMQP – What is it ?
Prentice HallHigh Performance TCP/IP Networking, Hassan-Jain Chapter 13 TCP Implementation.
OpenFabrics 2.0 rsockets+ requirements Sean Hefty - Intel Corporation Bob Russell, Patrick MacArthur - UNH.
Sockets Direct Protocol for Hybrid Network Stacks: A Case Study with iWARP over 10G Ethernet P. Balaji, S. Bhagvat, R. Thakur and D. K. Panda, Mathematics.
Introduction Contain two or more CPU share common memory and peripherals. Provide greater system throughput. Multiple processor executing simultaneous.
Open Fabrics Interfaces Software Sean Hefty - Intel Corporation.
Stan Smith Intel SSG/DPD June, 2015 Kernel Fabric Interface Kfabric Framework.
SC’13 BoF Discussion Sean Hefty Intel Corporation.
Infiniband Architecture
A Brief Introduction to OpenFabrics Interfaces - libfabric
Chapter 3 Internet Applications and Network Programming
CS533 Concepts of Operating Systems
Other Important Synchronization Primitives
Advancing open fabrics interfaces
Persistent memory support
CSCI 315 Operating Systems Design
Request ordering for FI_MSG and FI_RDM endpoints
Xen Network I/O Performance Analysis and Opportunities for Improvement
I/O Systems I/O Hardware Application I/O Interface
Operating System Concepts
CS703 - Advanced Operating Systems
Basic Mechanisms How Bits Move.
Application taxonomy & characterization
Chapter 13: I/O Systems I/O Hardware Application I/O Interface
Chapter 4: Threads.
Ch 17 - Binding Protocol Addresses
Process-to-Process Delivery: UDP, TCP
Chapter 13: I/O Systems.
COT 5611 Operating Systems Design Principles Spring 2014
SocksDirect: Datacenter Sockets can be Fast and Compatible
Presentation transcript:

Fabric Interfaces Architecture – v4 Sean Hefty - Intel Corporation

Overview Do we have the right type of objects defines? Do we have the correct object relationships? Is functionality missing? Are interfaces associated with the right object? Do the semantics match well with the apps? What semantics are missing? www.openfabrics.org

High-Level Architecture Interfaces Fabric Topology Domain (Virtual) NIC Memory Descriptors Endpoint Active Passive Receive Event Groups Queues Counters Address Service Maps Tables Based on object-oriented programming concepts Derived objects define interfaces www.openfabrics.org

Fabric Object Model Fabric Software objects usable across multiple HW providers Fabric Interfaces Boundary of resource sharing Fabric Event Queue Passive Endpoint Domain Domain Event Queue Endpoint Event Queue Endpoint Event Counter Address Service Event Counter Address Service Provider abstracts multiple NICs NIC NIC NIC Physical Fabric www.openfabrics.org

Object / Interface Model Enlarged on following slides www.openfabrics.org

www.openfabrics.org

Endpoints www.openfabrics.org

Application Expectations Progress Ordering - completions and data delivery Multi-threading and locking model Buffering Compatibility www.openfabrics.org

All(?) current solutions require significant software components Progress Ability of the underlying implementation to complete processing of an asynchronous request Need to consider ALL asynchronous requests Connections, address resolution, data transfers, event processing, completions, etc. HW/SW mix All(?) current solutions require significant software components www.openfabrics.org

Progress Support two progress models Automatic and manual Separate operations as belonging to one of two progress domains Data or control Report progress model for each domain SAMPLE Manual Automatic Data Software Hardware offload Control Kernel services www.openfabrics.org

Automatic Progress Ideally hardware offload model Or standard kernel services / threads for control operations Once an operation is initiated, it will complete without further user intervention or calls into the API True if acting as either an initiator or target Automatic progress meets manual model by definition www.openfabrics.org

Manual Progress Implies significant software component Required to occur when reading or waiting on EQ(s) Requirement limited to objects associated with selected EQ(s) Not intended to restrict provider implementation Application can use separate EQs for control and data App can request automatic progress E.g. app wants to wait on native wait object Implies provider allocated threading www.openfabrics.org

Ordering Applies to a single initiator endpoint performing data transfers to one target endpoint over the same data flow Data flow may be a conceptual QoS level or path through the network Separate ordering domains Completions, message, data Fenced ordering may be obtained using fi_sync operation www.openfabrics.org

Completion Ordering Order in which operation completions are reported relative to their submission Unordered or ordered No application defined requirement for ordered completions Default: unordered Ordering may impose significant software overhead / latency dependent on endpoint type www.openfabrics.org

Message Ordering Order in which message (transport) headers are processed I.e. whether transport message are received in or out of order Determined by selection of ordering bits [Read | Write | Send] After [Read | Write | Send] RAR, RAW, RAS, WAR, WAW, WAS, SAR, SAW, SAS Example: fi_order = 0 // unordered fi_order = RAR | RAW | RAS | WAW | WAS | SAW | SAS // IB/iWarp like ordering www.openfabrics.org

Data Ordering Delivery order of transport data into target memory between messages Ordering per byte-addressable location I.e. access to the same byte in memory Ordering constrained by message ordering rules Must at least have message ordering first www.openfabrics.org

Data Ordering Ordering limited to message order size E.g. MTU In order data delivery if transfer <= message order size WAW, RAW, WAR sizes? Message order size = 0 No data ordering Message order size = -1 All data ordered www.openfabrics.org

Other Ordering Rules Ordering to different target endpoints not defined Per message ordering semantics implemented using different data flows Data flows may be less flexible, but easier to optimize for Endpoint aliases may be configured to use different data flows www.openfabrics.org

Multi-threading and Locking Support both thread safe and lockless models Compile time and run time support Run-time limited to compiled support Lockless (based on MPI model) Single – single-threaded app Funneled – only 1 thread calls into interfaces Serialized – only 1 thread at a time calls into interfaces Thread safe Multiple – multi-threaded app, with no restrictions www.openfabrics.org

Buffering Support both application and network buffering Zero-copy for high-performance Network buffering for ease of use Buffering in local memory or NIC In some case, buffered transfers may be higher-performing (e.g. “inline”) Registration option for local NIC access Migration to fabric managed registration Required registration for remote access Specify permissions www.openfabrics.org

Reverse mapping may differ from forward mapping Compatibility Reverse mapping may differ from forward mapping Verbs + RDMA CM Fabric Interface HCA + port Fabric PD Domain QP + cm_id Active EP cm_id Passive EP CQ + [comp_channel] EQ / Counter comp_channel EQ Group cm_channel EQ SRQ Receive EP MR | MW MR AH AV addr | AV index www.openfabrics.org

Sharable Interfaces Verbs + RDMA CM Interface FI Class QP Atomics Active EP QP / SRQ Msg Active / Receive EP QP / Tagged-SRQ Tagged RMA CQ EQ PD MR Domain PD (+ port) AV Proposed new object AV addr == AH www.openfabrics.org