A Study of iSCSI Extensions for RDMA (iSER)

Outline
Background: the who and where
Motivation and case for iSER: the why
Layering of iSCSI, iSER and iWARP: stack and functionality distribution
iSER design features: connection setup, transformation, data integrity management
Changes/extensions to iSCSI: what is changed and why
Enhancements in iWARP protocols: automatic invalidation
Enhancements to iWARP Verbs: efficient registration of STags
Next steps: standardization
Questions

Background
The authors of this paper are Mallikarjun Chadalapaka (HP), Uri Elzur (Broadcom), Michael Ko (IBM), Hemal Shah (Intel), and Patricia Thaler (Agilent). The iSER paper is based on recently concluded top-to-bottom protocol design work done by contributors from several companies in the RDMA Consortium. In other words, the paper belongs to the “Experience” category, the “E” in NICELI.
The paper explores the design process of iSCSI Extensions for RDMA (iSER), a protocol that maps the iSCSI protocol over the iWARP protocol suite (RDMA over TCP/IP). Its focus is two-fold: (1) how iSER enables efficient data movement for iSCSI using generic RDMA hardware, and (2) how and why certain iWARP architectural features were conceived during the iSER design.

iSCSI, TCP and the challenges therein
iSCSI is an “application protocol” designed to run on TCP/IP. It encapsulates SCSI protocol exchanges in order to perform SCSI I/O over TCP/IP. The designers of iSCSI realized early on that TCP copy overhead and TCP reassembly buffer requirements at high speeds would become critical factors in the wide acceptance and deployment of iSCSI. For this reason, the iSCSI protocol includes an optional mechanism called “markers”. Markers delineate iSCSI PDU boundaries via recurring pointers that appear at fixed intervals within the TCP data stream. In other words, markers aid an iSCSI-specific direct data placement mechanism in placing each iSCSI PDU directly into its final memory location. iSCSI-specific direct data placement can also be done without markers, albeit with more reassembly memory. The immediate consequence of either approach was that one needed an iSCSI-specific NIC to run the iSCSI protocol efficiently, avoiding TCP data copies.
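
The marker idea described on this slide can be sketched in a few lines. This is a simplified illustration, not the iSCSI wire format: the interval, the 4-byte marker layout, and the function names `insert_markers` and `next_pdu_start` are all assumptions made for the example. Each marker stores the distance from itself to the start of the next PDU, so a receiver that lands anywhere in the stream can resynchronize at the next marker.

```python
import struct

MARKER_INTERVAL = 16  # payload bytes between markers (illustrative value)
MARKER_SIZE = 4       # one 32-bit pointer per marker (simplified layout)


def insert_markers(pdus):
    """Serialize PDUs into one stream with a marker after every
    MARKER_INTERVAL payload bytes. Each marker holds the number of
    payload bytes between the marker and the start of the next PDU."""
    payload = b"".join(pdus)
    # Payload offsets where a PDU begins, plus the end of the stream.
    starts, pos = [], 0
    for pdu in pdus:
        starts.append(pos)
        pos += len(pdu)
    starts.append(pos)

    out = bytearray()
    for offset in range(0, len(payload), MARKER_INTERVAL):
        out += payload[offset:offset + MARKER_INTERVAL]
        marker_at = offset + MARKER_INTERVAL
        if marker_at <= len(payload):
            next_start = min(s for s in starts if s >= marker_at)
            out += struct.pack(">I", next_start - marker_at)
    return bytes(out)


def next_pdu_start(stream, marker_index):
    """Read marker number marker_index (0-based) and return the payload
    offset of the next PDU boundary it points to."""
    # Markers sit at fixed stream positions, so no parsing is needed to find one.
    stream_pos = (marker_index + 1) * MARKER_INTERVAL + marker_index * MARKER_SIZE
    (distance,) = struct.unpack_from(">I", stream, stream_pos)
    return (marker_index + 1) * MARKER_INTERVAL + distance
```

Because the markers sit at fixed positions, the receiver can find a PDU boundary without having parsed every preceding PDU header, which is what makes direct placement possible when TCP segments arrive out of order.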

The case for iSER
The designers of iSCSI and iSER pondered the following considerations. Shouldn’t generic RDMA over TCP/IP technology be sufficient for the data movement needs of iSCSI? When RDMA technology advances, so does iSCSI. Why tackle fundamental issues such as copy elimination via an iSCSI-specific protocol? Did iWARP say it offers CRC-level reliability on TCP/IP? Let iSCSI take the opportunity to stop playing transport! If nothing else, iSCSI needs iSER to run most efficiently on the RNICs (RDMA-enabled NICs) that are presumed to become pervasive in the future.
The iSCSI designers were ultimately convinced of the need for iSER, an “extension” to iSCSI that enables it to run on RDMA over TCP/IP (aka iWARP). The iSER protocol is thus designed with the explicit goal of letting iSCSI run on RNICs with no more interrupts than an iSCSI NIC requires, that is, to run most efficiently on generic RNICs.

iSCSI, iSER and iWARP
The iSER protocol is designed to run on the RDMAP protocol of the iWARP suite. The paper contains a discussion of why RDMAP was preferred over DDP. The iSER wire protocol depends only on RDMAP. However, the “iWARP Verbs” are a crucial part of the solution puzzle. During the iSER design, certain innovations in iWARP Verbs were also made to best meet the needs of iSER. The first step in the iSER design work was to define an architecture model, called the “Datamover Architecture”, that distilled the needs of iSCSI into generic data movement primitives. iSER was then designed as an instantiation of this Datamover Architecture that simply maps the primitives to RDMAP interactions.
[Figure: protocol stack, top to bottom: SCSI, iSCSI, Datamover Interface, iSER, iWARP Verbs, RDMAP, DDP, MPA, TCP. Labels: “iWARP protocol suite”, “RNIC”, “Generic RDMA over TCP/IP”.]

iSER design
The iSER protocol uses the well-known TCP port used for iSCSI connection establishment, rather than a new iSER well-known port. The iSCSI/iSER connection thus always starts in iSCSI “streaming” mode. A new iSCSI login key is used to turn the RDMA (iSER) mode on after login. The existing discovery and boot mechanisms work with no changes.
Transformation or encapsulation? This is a question not traditionally encountered in layered protocols. The iSER protocol simply encapsulates certain iSCSI PDUs (called “control-type” PDUs) in iSER RDMA Send Messages, while it transforms certain other iSCSI PDUs (called “data-type” PDUs) into RDMA Writes or RDMA Reads.
The iSER protocol relieves iSCSI of having to play the transport role. iSER mandates that iSCSI-level PDU digests must not be used, because iWARP guarantees CRC-level data integrity. iSCSI CRC generation, checking, retransmission requests, retransmissions, and timeout-based retransmissions are no longer needed, so a lot of complexity in iSCSI is gone.
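
The encapsulate-or-transform decision can be pictured as a small lookup. The sketch below is illustrative only. The slide does not say which PDUs are data-type, so the two entries in `DATA_TYPE_PDUS` (SCSI Data-In mapped to RDMA Write, R2T mapped to RDMA Read) are an assumption based on a reading of the iSER specification, and the names `RdmaOp` and `map_pdu` are hypothetical.

```python
from enum import Enum


class RdmaOp(Enum):
    SEND = "Send Message"
    RDMA_WRITE = "RDMA Write"
    RDMA_READ = "RDMA Read"


# Assumed classification, for illustration only: the slide names the two
# categories but does not list their members.
DATA_TYPE_PDUS = {
    "SCSI Data-In": RdmaOp.RDMA_WRITE,  # target writes read data to the initiator
    "R2T": RdmaOp.RDMA_READ,            # target fetches write data from the initiator
}


def map_pdu(pdu_type):
    """Return (action, RDMA operation) for an iSCSI PDU type.

    Data-type PDUs are transformed into RDMA operations; every other
    PDU is treated as control-type and encapsulated in a Send Message."""
    if pdu_type in DATA_TYPE_PDUS:
        return ("transform", DATA_TYPE_PDUS[pdu_type])
    return ("encapsulate", RdmaOp.SEND)
```

For example, `map_pdu("SCSI Command")` falls through to the control-type branch, while `map_pdu("R2T")` yields a transformation into an RDMA Read.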

Changes to iSCSI
The biggest set of changes to iSCSI in order to support iSER is in how iSCSI interfaces to its LLP (lower level protocol). Traditional iSCSI interfaces directly with TCP and is involved in a lot of data movement activity. In the new model, iSCSI simply yields the administration of data movement to iSER, and iSER and iWARP work together to move the data.
Wire protocol: iSCSI-level PDU digests (header and data) must not be used, so the PDU-level recovery features of iSCSI are not needed. Status is not piggybacked on the last read data PDU, because the receiving RNIC does not demultiplex during placement.
Other areas: iSCSI must negotiate the new login key that turns the RDMA (iSER) mode on after login. iSCSI must also “chunk” long unsolicited data sequences into PDUs so that each “mid-PDU” is exactly of the negotiated maximum size.
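
The chunking rule for unsolicited data is easy to state in code. This is a minimal sketch; the function name `chunk_unsolicited_data` and its parameter names are hypothetical, and `max_pdu_size` stands for whatever maximum size the two sides negotiated.

```python
def chunk_unsolicited_data(data, max_pdu_size):
    """Split an unsolicited data sequence into PDU payloads.

    Every payload except possibly the last is exactly max_pdu_size
    bytes, matching the rule that each mid-PDU has the negotiated
    maximum size."""
    if max_pdu_size <= 0:
        raise ValueError("max_pdu_size must be positive")
    return [data[i:i + max_pdu_size] for i in range(0, len(data), max_pdu_size)]
```

With a 10-byte sequence and a negotiated maximum of 4 bytes, this yields payloads of 4, 4, and 2 bytes; only the final payload may be short.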

Enhancement to RDMAP (automatic invalidation)
SCSI has a clearly defined transactional model: command (initiator to target), data (either way), status (target to initiator). The initiator iSER layer (client) exposes its STags to the target (server). After receiving the status, the initiator iSER layer invalidates the STag mapping before using those buffers. How about doing this invalidation automatically on receiving the status? That takes one hardware access out of the performance path.
[Figure: message flow across the iSCSI, iSER and RNIC layers, in two variants. Without automatic invalidation: the status arrives in a SendSE Message, iSER invalidates the exposed STag, then buffer usage is allowed. With automatic invalidation: the status arrives in a SendSE with Invalidate Message, iSER checks the invalidated STag, then buffer usage is allowed. Note: the red line is crossed only once.]
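
The saving from automatic invalidation can be illustrated with a toy model that counts how often the iSER layer has to access the hardware when a status arrives. Everything here is hypothetical: `ToyRnic` and the two `complete_*` functions are stand-ins invented for the example, not a real verbs interface.

```python
class ToyRnic:
    """Toy stand-in for an RNIC; not a real verbs interface."""

    def __init__(self):
        self.valid_stags = set()
        self.host_accesses = 0  # hardware accesses made by the iSER layer

    def expose(self, stag):
        self.valid_stags.add(stag)

    def invalidate(self, stag):
        """Explicit invalidation: the iSER layer must access the hardware."""
        self.host_accesses += 1
        self.valid_stags.discard(stag)

    def on_send_with_invalidate(self, stag):
        """The RNIC invalidates the STag itself while receiving the message
        and reports which STag it invalidated."""
        self.valid_stags.discard(stag)
        return stag


def complete_with_explicit_invalidate(rnic, exposed_stag):
    # Status arrived in a plain SendSE Message: iSER invalidates the STag.
    rnic.invalidate(exposed_stag)
    return exposed_stag not in rnic.valid_stags  # True: buffer may be reused


def complete_with_automatic_invalidate(rnic, exposed_stag, stag_in_message):
    # Status arrived in a SendSE with Invalidate Message.
    invalidated = rnic.on_send_with_invalidate(stag_in_message)
    # iSER only checks that the invalidated STag is the one it exposed.
    return invalidated == exposed_stag and exposed_stag not in rnic.valid_stags
```

In the explicit path `host_accesses` rises by one per completed command; in the automatic path it stays at zero, and the iSER layer only compares STag values.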

Enhancements to iWARP Verbs (fast register)
The initiator iSER layer (client) exposes its STags to the target (server). The initiator iSER layer must register the Command buffer locally with the RNIC. The registration process yields the STag, so it must precede the advertisement. This is a synchronous wait for a hardware response in the performance path. In the fast-register model, the STag is allocated to iSER a priori and is merely associated with the Command buffer at runtime. The “fast-registration” is now guaranteed to succeed. The initiator iSER layer can post the fast-register and command requests to the hardware back-to-back, with no more waiting. The paper also discusses automatic deregistration and Shared Receive Queues.
[Figure: SCSI Command flow across the iSCSI, iSER and RNIC layers, in two variants. Traditional registration: register the buffer to get an STag, then advertise the STag in the Command. Fast-register: fast-register with a known STag, then advertise the same STag in the Command.]
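
The difference between the two registration models can be sketched with a toy work queue that counts synchronous waits in the command path. The class `ToyQueue` and the functions below are hypothetical and only mimic the ordering described on this slide; they do not correspond to a real verbs API.

```python
class ToyQueue:
    """Toy stand-in for the initiator's interface to the RNIC."""

    def __init__(self):
        self.posted = []       # work requests handed to the hardware, in order
        self.sync_waits = 0    # blocking round trips in the performance path
        self._next_stag = 100

    def allocate_stag(self):
        """Done ahead of time, outside the performance path."""
        stag = self._next_stag
        self._next_stag += 1
        return stag

    def register_sync(self, buffer_id):
        """Traditional registration: wait for the hardware to return an STag."""
        self.sync_waits += 1
        return self.allocate_stag()

    def post(self, work_request):
        self.posted.append(work_request)


def send_command_traditional(queue, buffer_id):
    stag = queue.register_sync(buffer_id)  # must finish before advertising
    queue.post(("command", stag))
    return stag


def send_command_fast_register(queue, buffer_id, preallocated_stag):
    # The STag is already known, so both requests are posted back-to-back.
    queue.post(("fast_register", preallocated_stag, buffer_id))
    queue.post(("command", preallocated_stag))
    return preallocated_stag
```

The traditional path incurs one synchronous wait per command, while the fast-register path posts two work requests and returns immediately.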

Next Steps
The Datamover Architecture for iSCSI (DA) and iSCSI Extensions for RDMA (iSER) specifications were publicly released by the RDMA Consortium on July 21, 2003 (all specs are available at www.rdmaconsortium.org). Several member companies are working on productization of the iWARP protocol suite and iSER. Both the DA and iSER specs have been submitted to the IETF as Internet Drafts to pursue standardization.

Thank you! Questions?