OFED TCP Port Mapper Proposal, June 15, 2011

Overview
- The current NE020 Linux OFED driver uses the host TCP/IP stack's MAC and IP address for RDMA connections.
  - The hardware tags packets used for RDMA connection management so they are easy to identify.
  - Host TCP/IP stack services are used for address resolution and neighbor updates.
- The RDMA CM claims a TCP port by creating a kernel socket when the unified portspace patch is applied and support is enabled via a module option (link: tch;h=cfe f de9b8ba15f2e35b2997;hb=ofed_kernel_1_5).
- The unified portspace kernel patch is applied only when the OFED distribution is used intact.
  - At least one OSV is moving to a model where OFED kernel patches will not be applied: Red Hat, starting with RHEL 6.0.
- iSCSI hardware acceleration has moved to a separate MAC/IP address that is not visible to the Linux TCP/IP stack (a private interface).
- The Linux community has rather violently rejected previous pushes to include the portspace patch.
- The Linux community's suggestion is to do what iSCSI did.
- Goal of this presentation: describe a solution to the iWARP TCP portspace issue using the Sockets Direct Protocol Port Mapper and netlink sockets.

Current OFED iWARP CM Flows (Listen)
- The application issues rdma_listen.
  - For a userspace application, a kernel transition occurs.
  - The local IP address is the Linux IP address (IP0).
- The OFED CM selects an interface and selects a local port from the appropriate portspace.
  - Simple case: IP0 and TCP Port0.
  - The local IP can be ANY; the CM then issues the listen on all interfaces.
  - The local port can be ANY; the CM then picks a port.
  - If both the local IP and the port are ANY, the port must be accepted on all interfaces.
- The portspace patch issues socket and bind for iWARP providers.
  - This portion has not been accepted into the kernel; the patch exists in the OFED package.
  - By default, the kernel CM just picks a port independently of the host TCP/IP stack.
Diagram steps: 1. rdma_listen(Local IP0, Local Port0); 2. Transition to kernel CM; 3. Interface selected; 4. Port selected; 5. kern_socket, bind; 6. create_listen; 7. Set up hardware.
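To make the port-space conflict concrete, here is a small simulation. It is a sketch under simplifying assumptions, not OFED code: `PortSpace` and `cm_listen` are hypothetical names, a port space is just a set of reserved numbers, and the port range is arbitrary. It contrasts the default behavior (the CM picks a port independently of the host TCP/IP stack) with the portspace patch (the CM also does the equivalent of kern_socket and bind).

```python
class PortSpace:
    """One owner's view of reserved TCP ports (the host stack or the RDMA CM)."""

    def __init__(self, low=32768, high=60999):
        self.low, self.high = low, high
        self.used = set()

    def reserve(self, port=0):
        """Reserve `port`, or the lowest free port when port == 0 (ANY)."""
        if port == 0:
            port = next((p for p in range(self.low, self.high + 1)
                         if p not in self.used), None)
            if port is None:
                raise OSError("port space exhausted")
        if port in self.used:
            raise OSError(f"port {port} already in use")
        self.used.add(port)
        return port


def cm_listen(cm_space, host_space, port=0, portspace_patch=False):
    """Choose the local port for rdma_listen.

    With the portspace patch, the CM first does the equivalent of
    kern_socket + bind in the host stack, so both views agree.
    Without it, the CM picks a port on its own.
    """
    if portspace_patch:
        port = host_space.reserve(port)
    return cm_space.reserve(port)


host, cm = PortSpace(), PortSpace()
app_port = host.reserve()          # a sockets application gets 32768
rdma_port = cm_listen(cm, host)    # default: the CM also hands out 32768
patched = cm_listen(cm, host, portspace_patch=True)  # host stack picks 32769
```

Without the patch, nothing stops the CM from handing out a port that a sockets application already holds, which is the conflict this proposal sets out to remove.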

Current OFED iWARP CM Flows (Connect)
- The application issues rdma_connect.
  - For a userspace application, a kernel transition occurs.
  - The local and remote IP addresses are the Linux IP addresses (IP0, IP2).
- The OFED CM selects an interface and selects a local port from the appropriate portspace.
  - The local IP can be ANY; the CM uses the Linux stack to pick an interface, which usually handles the neighbour update before the request reaches the provider.
- The portspace patch issues socket and bind for iWARP providers.
- The kernel provider is informed of (and can trigger) neighbour updates to stay in sync with the Linux TCP/IP stack.
- The kernel provider mini-CM handles the TCP/IP three-way handshake and the MPA exchange through dev_queue_xmit and a private receive path.
Diagram steps: 1. rdma_connect(Local IP0, Local Port0, Remote IP2, Remote Port2); 2. Transition to kernel CM; 3. Interface selected; 4. Port selected; 5. kern_socket, bind; 6. connect; 7. Set up hardware; 8. Neighbour update; 9. CM packets.
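When the local IP is ANY, the connect flow leans on the Linux stack to pick the outgoing interface. The heart of that decision is a longest-prefix match over the routing table, sketched below with Python's standard `ipaddress` module. The interface table and `select_interface` are hypothetical, and real routing also weighs metrics, policy rules, and default routes, all omitted here.

```python
import ipaddress

# Hypothetical interface table: name -> (local IP, connected subnet).
INTERFACES = {
    "eth0": ("192.168.1.10", "192.168.1.0/24"),
    "eth1": ("10.0.0.5", "10.0.0.0/8"),
    "eth2": ("10.1.2.7", "10.1.2.0/24"),
}


def select_interface(remote_ip):
    """Return (ifname, local_ip) for the most specific subnet containing remote_ip.

    A simplified longest-prefix match standing in for the kernel route lookup.
    """
    remote = ipaddress.ip_address(remote_ip)
    best = None
    for name, (local_ip, subnet) in INTERFACES.items():
        net = ipaddress.ip_network(subnet)
        if remote in net and (best is None or net.prefixlen > best[0].prefixlen):
            best = (net, name, local_ip)
    if best is None:
        raise LookupError(f"no route to {remote_ip}")
    return best[1], best[2]
```

For example, 10.1.2.99 matches both 10.0.0.0/8 and 10.1.2.0/24, and the /24 wins, so the connection would leave through eth2 with local address 10.1.2.7.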

New OFED iWARP CM Architecture
- Similar to the current CM flow.
- OFED gains a new iWARP Port Mapper daemon in userspace.
- OFED gains a new netlink interface between userspace and the kernel.
  - It was introduced for statistics and is extended for iWARP providers and the new Port Mapper daemon.
  - The netlink interface is roughly modeled after iSCSI.
- Supports, but does not require (as with soft iWARP), second MAC/IP addresses on the local and on the remote peer.
- Netlink messages:
  - Port Mapper netlink upcalls: Query PID, Add/Remove Mapping, Query Mapping
  - Provider netlink upcalls: Query PID, Connect, Listen, Resolve
  - Provider netlink downcalls: Inbound Connect, Operation Complete (for upcalls), Interface Down
- Three RNIC models are supported:
  - RNICs with the CM in the kernel/adapter
  - RNICs with the CM in userspace
  - Hybrid RNICs with a userspace CM that requires adapter assistance
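The netlink message set on this slide can be written down as a small catalog, which makes the direction of each message explicit: upcalls travel from the kernel to userspace (the Port Mapper daemon or the userspace provider library), downcalls from userspace to the kernel provider. The grouping and names below are illustrative only; they are not the wire encoding.

```python
from enum import Enum


class Direction(Enum):
    UPCALL = "kernel -> userspace"
    DOWNCALL = "userspace -> kernel"


# Message catalog from the slide; grouping and names are illustrative.
NETLINK_MESSAGES = {
    "port_mapper_upcalls": (Direction.UPCALL,
                            ("Query PID", "Add Mapping", "Remove Mapping", "Query Mapping")),
    "provider_upcalls": (Direction.UPCALL,
                         ("Query PID", "Connect", "Listen", "Resolve")),
    "provider_downcalls": (Direction.DOWNCALL,
                           ("Inbound Connect", "Operation Complete", "Interface Down")),
}


def direction_of(group, message):
    """Return the direction of `message`, checking it belongs to `group`."""
    direction, names = NETLINK_MESSAGES[group]
    if message not in names:
        raise KeyError(f"{message!r} is not one of the {group}")
    return direction
```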

iWARP Port Mapper Concept
- The Port Mapper concept was introduced by the RDMA Consortium as part of the Sockets Direct Protocol (SDP) specification.
- It provides a mechanism to keep an iWARP port space separate from the Linux TCP port space.
  - The iWARP port space can be on an independent IP address or share a single IP address.
- The Port Mapper service runs over TCP on a well-known port (3935) on the Linux IP addresses; its listen is issued at service startup.
- Port Mapper service rdma_listen steps:
  - Register a mapping between the Linux IP address/TCP port and the iWARP IP address/TCP port with the Port Mapper service.
- Port Mapper service rdma_connect steps:
  - Receive a query request from a Port Mapper service client.
  - Connect to the remote peer on the well-known port.
  - Query the RDMA peer's iWARP IP address/TCP port using the SDP Port Mapper protocol (PMRequest).
  - Return the information from the PMAccept message to the Port Mapper service client.
- Port Mapper service peer query steps:
  - Accept the Port Mapper connection (port 3935 on the Linux IP address) from the node issuing the query.
  - Receive the PMRequest message.
  - Look up the IP address and port from the PMRequest in the local database populated by the rdma_listen step.
  - Return the mapped IP address and port information in a PMAccept message.
- After receiving the PMAccept message, the iWARP provider issues the iWARP connect using the iWARP local and remote IP address/TCP port "quad".
- Later slides show more detail.
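The three step lists above fit together as a request/response exchange between two Port Mapper services. The sketch below models it in-process, assuming a dictionary stands in for the network and for TCP connections to port 3935, and that PMRequest and PMAccept carry just an IP address and port (their real layout is defined by the SDP specification and is not reproduced here). `PortMapperService` and the negative reply are hypothetical.

```python
PM_PORT = 3935  # well-known Port Mapper port, from the slide


class PortMapperService:
    """Toy Port Mapper: one per node, reachable at (linux_ip, PM_PORT)."""

    def __init__(self, linux_ip, network):
        self.linux_ip = linux_ip
        self.network = network
        self.mappings = {}                   # (linux ip, tcp port) -> (iwarp ip, iwarp port)
        network[(linux_ip, PM_PORT)] = self  # "listen issued at service startup"

    def register(self, linux_addr, iwarp_addr):
        """rdma_listen step: record Linux address -> iWARP address."""
        self.mappings[linux_addr] = iwarp_addr

    def on_pm_request(self, request):
        """Peer query steps: look up the mapping and answer with a PMAccept."""
        iwarp = self.mappings.get((request["ip"], request["port"]))
        if iwarp is None:
            return {"type": "reject"}        # hypothetical negative reply
        return {"type": "PMAccept", "ip": iwarp[0], "port": iwarp[1]}

    def query(self, remote_ip, remote_port):
        """rdma_connect steps: ask the peer's Port Mapper for its iWARP address."""
        peer = self.network[(remote_ip, PM_PORT)]  # stands in for a TCP connect to 3935
        reply = peer.on_pm_request({"type": "PMRequest", "ip": remote_ip, "port": remote_port})
        if reply["type"] != "PMAccept":
            raise ConnectionRefusedError(f"no iWARP mapping for {remote_ip}:{remote_port}")
        return reply["ip"], reply["port"]


network = {}
a = PortMapperService("192.168.1.10", network)
b = PortMapperService("192.168.1.20", network)
b.register(("192.168.1.20", 5000), ("10.0.0.20", 40001))  # b's rdma_listen
remote_iwarp = a.query("192.168.1.20", 5000)               # a's rdma_connect
```

Node a would then issue the iWARP connect to `remote_iwarp` rather than to 192.168.1.20:5000.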

Pending Netlink Patch for OFED
- A patch was recently submitted to query RDMA connection information via netlink.
  - Roland has rolled this patch into the linux-next patch set for late May.
- This patch introduces a single OFED netlink port and an InfiniBand netlink infrastructure in ib_core.
  - It supports 32 clients within OFED and 1024 operations for each client.
  - Only a single client is currently defined (rdma_cm).
  - Components interested in adding netlink capabilities to OFED can register with the InfiniBand netlink infrastructure.
  - The Port Mapper daemon consumes one client; each iWARP provider consumes an additional client.
- The netlink dump operation is used to provide data back to the netlink client.
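The slide gives 32 clients with 1024 operations each, which fits in 15 bits of the 16-bit netlink message type. The sketch assumes the type is packed as `client << 10 | op` and adds a toy registry for components that register a client; the encoding, `NetlinkRegistry`, and the client and op indices in the usage lines are assumptions for illustration, not the patch's actual definitions.

```python
CLIENT_BITS = 5   # 32 clients
OP_BITS = 10      # 1024 operations per client


def nl_type(client, op):
    """Pack a (client, operation) pair into one message type value."""
    if not 0 <= client < (1 << CLIENT_BITS):
        raise ValueError(f"client {client} out of range")
    if not 0 <= op < (1 << OP_BITS):
        raise ValueError(f"op {op} out of range")
    return (client << OP_BITS) | op


def nl_split(msg_type):
    """Recover (client, operation) from a packed message type."""
    return msg_type >> OP_BITS, msg_type & ((1 << OP_BITS) - 1)


class NetlinkRegistry:
    """Toy dispatcher: each component registers a client index and op handlers."""

    def __init__(self):
        self.clients = {}

    def register(self, client, handlers):
        if client in self.clients:
            raise ValueError(f"client {client} already registered")
        self.clients[client] = handlers

    def dispatch(self, msg_type, payload):
        client, op = nl_split(msg_type)
        return self.clients[client][op](payload)


registry = NetlinkRegistry()
PORT_MAPPER_CLIENT = 2   # hypothetical client index
QUERY_PID_OP = 0         # hypothetical operation index
registry.register(PORT_MAPPER_CLIENT, {QUERY_PID_OP: lambda pid: {"pm_pid": pid}})
reply = registry.dispatch(nl_type(PORT_MAPPER_CLIENT, QUERY_PID_OP), 4242)
```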

New OFED iWARP CM Flows (Listen: Userspace provider CM)
- Similar to the current CM flow.
- The CM can now reserve ports independently, since the Port Mapper allows providers to use any provider-managed port number to represent the CM port number.
- A netlink message is used to issue the listen to the userspace library.
- The mini-CM or userspace TCP stack manages the provider "port space" to get local TCP Port1, which is related to the CM local Port0.
- The userspace library registers local IP1, Port1.
- For compatibility, a bind could also be made on the existing MAC/IP stack; soft iWARP requires this, as do some customers.
  - If the userspace provider library issues socket/bind to the Linux TCP/IP stack (as soft iWARP would), then IP0 = IP1 and Port0 != Port1.
Diagram steps: 1. rdma_listen(Local IP0, Local Port0); 2. Transition to kernel CM; 3. Interface selected; 4. Port selected; 5. create_listen; 6. Netlink: Listen; 7. Netlink: Register Port Map IP0, Port0 -> IP1, Port1; 8. Netlink: Complete; 9. Set up hardware (IP1, Port1).
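The listen handling in the userspace library comes down to two actions: take Port1 from the provider's own port space, then register IP0, Port0 -> IP1, Port1 with the Port Mapper. A minimal sketch, assuming a dictionary stands in for the Port Mapper daemon and that `ProviderPortSpace`, `on_netlink_listen`, and the port range are hypothetical; the second call shows the soft iWARP case, where IP0 = IP1 but the ports differ.

```python
class ProviderPortSpace:
    """Ports owned by the provider (mini-CM or userspace TCP stack), not the RDMA CM."""

    def __init__(self, low=49152, high=65535):
        self.low, self.high = low, high
        self.in_use = set()

    def allocate(self):
        for port in range(self.low, self.high + 1):
            if port not in self.in_use:
                self.in_use.add(port)
                return port
        raise OSError("provider port space exhausted")


def on_netlink_listen(port_map, provider_ports, ip0, port0, ip1):
    """Hypothetical userspace-library handler for the Netlink Listen upcall."""
    port1 = provider_ports.allocate()        # provider picks Port1 for CM Port0
    port_map[(ip0, port0)] = (ip1, port1)    # Netlink: Register Port Map
    return ip1, port1                        # reported back via Netlink: Complete


port_map, ports = {}, ProviderPortSpace()
separate_ip = on_netlink_listen(port_map, ports, "192.168.1.10", 5000, "10.0.0.10")
soft_iwarp = on_netlink_listen(port_map, ports, "192.168.1.10", 5001, "192.168.1.10")
```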

New OFED iWARP CM Flows (Connect: Userspace provider CM)
- Similar to the current CM flow.
- Netlink is used to issue the connect to the userspace library.
- The mini-CM or userspace TCP stack manages the provider "portspace" to get local TCP Port1, which is related to the CM local Port0.
- The userspace library resolves remote IP2, Port2 through the Port Mapper and gets the remote IP address and port number IP3, Port3.
- The userspace provider CM issues the iWARP connect to IP3, Port3, including the MPA handshake.
- The userspace mini-CM sends a Netlink Connect Complete call to the kernel provider indicating the new connection information: IP1:Port1, IP3:Port3.
- The kernel driver sets up the RNIC hardware, including transitioning the QP to RTS.
- The kernel CM issues the Connect Reply event.
Diagram steps: 1. rdma_connect(Local IP0, Local Port0, Remote IP2, Remote Port2); 2. Transition to kernel CM; 3. Interface selected; 4. Port selected; 5. connect; 6. Netlink: Connect; 7. Netlink: Resolve Remote Port IP2, Port2 -> IP3, Port3; 8. SDP Port Mapper protocol (between IP0 and IP2); 9. Netlink: Connect Complete; 10. Set up hardware; 11. Connect Reply event.
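The order of the netlink and wire exchanges in this flow can be captured by a handler that records each step. This is a sketch, not the provider library: `userspace_connect` is hypothetical, and `resolve` and `iwarp_connect` are placeholder callables for the Port Mapper query and the TCP plus MPA handshake.

```python
def userspace_connect(log, local_iwarp, remote_linux, resolve, iwarp_connect):
    """Hypothetical userspace-provider handling of a Netlink Connect upcall.

    local_iwarp: (IP1, Port1), already taken from the provider port space.
    remote_linux: (IP2, Port2) from rdma_connect.
    Returns the Netlink Connect Complete payload for the kernel provider.
    """
    log.append("Netlink: Connect")
    remote_iwarp = resolve(remote_linux)         # Port Mapper query -> (IP3, Port3)
    log.append("Netlink: Resolve Remote Port")
    iwarp_connect(local_iwarp, remote_iwarp)     # TCP handshake + MPA exchange
    log.append("iWARP connect + MPA")
    complete = {"local": local_iwarp, "remote": remote_iwarp}
    log.append("Netlink: Connect Complete")
    return complete


log = []
complete = userspace_connect(
    log, ("10.0.0.10", 49152), ("192.168.1.20", 5000),
    resolve=lambda addr: ("10.0.0.20", 40001),
    iwarp_connect=lambda local, remote: None,
)
```

The kernel provider only learns the mapped quad from the Connect Complete payload; everything before that happens in userspace.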

New OFED iWARP CM Flows (Accept: Userspace provider CM)
- The userspace provider CM receives a connect request on IP1, Port1: the TCP three-way handshake and the MPA request from the peer.
- The userspace library issues a Connect Request netlink downcall to the kernel provider library with:
  - Remote iWARP: IP3, Port3 (port mapped)
  - Remote TCP: unknown; use the port-mapped IP3, Port3
  - Local iWARP: IP1, Port1 (port mapped)
  - Local TCP: IP0, Port0 (from the listen)
- The kernel mini-CM sends a Connect Request event to the iWARP CM indicating the new connection information: IP0:Port0, IP3:Port3.
- The application is notified of the connection request and turns around with an rdma_accept call.
- The kernel CM issues an accept call to the kernel provider.
- The kernel provider then sets up the RNIC hardware, including sending the MPA response and transitioning the QP to RTS.
- The kernel provider issues an Established CM event.
Diagram steps: 1. Netlink: Connect Request; 2. Connect Request event; 3. Transition to userspace CM; 4. rdma_accept(Local IP0, Local Port0, Remote IP3, Remote Port3); 5. Transition to kernel CM; 6. CM accept; 7. Set up hardware; 8. Established event.
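The Connect Request downcall carries four addresses, and the kernel reports only two of them upward. A minimal sketch of that bookkeeping, assuming the listen-time registrations are kept in a dictionary keyed by (IP0, Port0); the field names and both functions are hypothetical.

```python
def build_connect_request(listen_map, local_iwarp, remote_iwarp, remote_tcp=None):
    """Assemble the fields of a hypothetical Connect Request downcall.

    listen_map: (IP0, Port0) -> (IP1, Port1), recorded when the listen was registered.
    remote_tcp: the peer's Linux address, normally unknown here, so the
    port-mapped remote (IP3, Port3) is used in its place.
    """
    local_tcp = next((linux for linux, iwarp in listen_map.items()
                      if iwarp == local_iwarp), None)
    if local_tcp is None:
        raise LookupError(f"no listen registered for {local_iwarp}")
    return {
        "remote_iwarp": remote_iwarp,
        "remote_tcp": remote_tcp if remote_tcp is not None else remote_iwarp,
        "local_iwarp": local_iwarp,
        "local_tcp": local_tcp,
    }


def connect_request_event(request):
    """(local TCP, remote TCP); with the remote TCP unknown: IP0:Port0, IP3:Port3."""
    return request["local_tcp"], request["remote_tcp"]
```

With a listen registered as 192.168.1.10:5000 -> 10.0.0.10:49152, a request arriving on 10.0.0.10:49152 from 10.0.0.20:40001 is reported to the application as 192.168.1.10:5000 and 10.0.0.20:40001, matching the rdma_accept arguments on the slide.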

New OFED iWARP CM Flows (Kernel provider CM)
- Changes to RNICs that support kernel-only connection management drivers are minimal.
- On listen requests, the kernel provider CM must issue the Register Port Map request to the iWARP Port Mapper daemon using netlink sockets.
- On connect requests, the kernel provider CM must:
  - Issue the Resolve Remote Port netlink message to the iWARP Port Mapper daemon.
  - On completion, use the local and remote iWARP IP addresses and port numbers to issue the iWARP connect request (instead of the Linux IP addresses and port numbers from the connect request).
- On Connect Request events and accept request handling, map the local iWARP IP address and port number back to the original listen IP address and port number.
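The three kernel-side changes can be sketched as one small class. `KernelProviderCM` is hypothetical, and `pm_register`, `pm_resolve`, and `hw_connect` are injected callables standing in for the Register Port Map and Resolve Remote Port netlink messages and for the provider's existing connect path; the point is only that the hardware sees the iWARP quad while events are mapped back to the listen address.

```python
class KernelProviderCM:
    """Sketch of the kernel-provider changes: register on listen, resolve on
    connect, and map the local iWARP address back on connect requests."""

    def __init__(self, pm_register, pm_resolve, hw_connect):
        self.pm_register = pm_register   # Register Port Map (netlink to the daemon)
        self.pm_resolve = pm_resolve     # Resolve Remote Port (netlink to the daemon)
        self.hw_connect = hw_connect     # provider's existing iWARP connect path
        self.listens = {}                # local iWARP (ip, port) -> listen (ip, port)

    def listen(self, linux_addr, iwarp_addr):
        self.pm_register(linux_addr, iwarp_addr)
        self.listens[iwarp_addr] = linux_addr

    def connect(self, local_iwarp, remote_linux):
        # Connect on the iWARP quad, not the Linux addresses from rdma_connect.
        remote_iwarp = self.pm_resolve(remote_linux)
        return self.hw_connect(local_iwarp, remote_iwarp)

    def on_connect_request(self, local_iwarp, remote):
        # Report the original listen address, not the mapped iWARP one.
        return self.listens[local_iwarp], remote
```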

New OFED iWARP CM Flows (Hybrid provider CM)
- A hybrid RNIC has a userspace connection manager or private TCP stack that manages the iWARP IP address and port space, but does not get involved with connection setup.
- The Listen flow for a hybrid RNIC is the same as the flow for the userspace stack.
- The Accept flow is the same as the flow for a kernel provider.
- The Connect flow is slightly different and is depicted on the following slide.
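The mix-and-match described on this slide reduces to a lookup table of which flow each RNIC model follows per operation. The table below only restates the slides; `Model`, `FLOW`, `flow_for`, and the flow labels are illustrative names.

```python
from enum import Enum


class Model(Enum):
    KERNEL = "CM in kernel/adapter"
    USERSPACE = "CM in userspace"
    HYBRID = "userspace CM with adapter assistance"


# Which flow each RNIC model follows, per operation, as summarized on the slides.
FLOW = {
    Model.KERNEL:    {"listen": "kernel",    "connect": "kernel",    "accept": "kernel"},
    Model.USERSPACE: {"listen": "userspace", "connect": "userspace", "accept": "userspace"},
    Model.HYBRID:    {"listen": "userspace", "connect": "hybrid",    "accept": "kernel"},
}


def flow_for(model, operation):
    """Return the name of the CM flow used for `operation` on `model`."""
    return FLOW[model][operation]
```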

New OFED iWARP CM Flows (Connect: Hybrid CM)
- Similar to the current CM flow.
- Netlink is used to issue a resolve message to the userspace library.
- The mini-CM or userspace TCP stack manages the provider "portspace" to get local TCP Port1, which is related to the CM local Port0.
- The userspace library resolves remote IP2, Port2 through the Port Mapper and gets the remote IP address and port number IP3, Port3.
- This information is returned to the kernel provider CM in a Resolve Complete netlink message.
- The kernel provider CM issues the iWARP connect to IP3:Port3 from IP1:Port1, including the MPA handshake.
- The kernel driver sets up the RNIC hardware, including transitioning the QP to RTS.
- The kernel CM issues the Connect Reply event, indicating IP0:Port0 and IP2:Port2 as the connection information.
Diagram steps: 1. rdma_connect(Local IP0, Local Port0, Remote IP2, Remote Port2); 2. Transition to kernel CM; 3. Interface selected; 4. Port selected; 5. connect; 6. Netlink: Resolve; 7. Netlink: Resolve Remote Port IP2, Port2 -> IP3, Port3; 8. SDP Port Mapper protocol (between IP0 and IP2); 9. Netlink: Resolve Complete; 10. Set up hardware; 11. Connect Reply event.
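What distinguishes this flow is that the kernel provider drives the iWARP connect on the mapped addresses while the application sees only the addresses it asked for. A sketch under that reading, where `hybrid_connect` is hypothetical and `resolve` and `hw_connect` are placeholders for the Resolve/Resolve Complete round trip and the kernel provider's iWARP connect with MPA.

```python
def hybrid_connect(cm_local, cm_remote, provider_local, resolve, hw_connect):
    """Sketch of the hybrid Connect flow.

    cm_local, cm_remote: (IP0, Port0) and (IP2, Port2) from rdma_connect.
    provider_local: (IP1, Port1) from the userspace port space manager.
    resolve: placeholder for Netlink Resolve / Resolve Complete, which runs
        the SDP Port Mapper query and returns (IP3, Port3).
    hw_connect: placeholder for the kernel provider's iWARP connect with MPA.
    Returns the quad used on the wire and the quad in the Connect Reply event.
    """
    remote_iwarp = resolve(cm_remote)
    hw_connect(provider_local, remote_iwarp)
    wire_quad = (provider_local, remote_iwarp)
    reported_quad = (cm_local, cm_remote)
    return wire_quad, reported_quad


wire, reported = hybrid_connect(
    ("192.168.1.10", 5000), ("192.168.1.20", 5000), ("10.0.0.10", 49152),
    resolve=lambda addr: ("10.0.0.20", 40001),
    hw_connect=lambda local, remote: None,
)
```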

Conclusions/Next Steps
- This proposal supports moving iWARP traffic to a port space independent of TCP/IP sockets applications, transparently to the RDMA verbs consumer.
- The iWARP port space can remain on the same IP address (like soft iWARP) or on a separate IP address (like iSCSI).
- Three different RNIC connection management models are supported.
- The RDMA Consortium published the wire protocol for mapping TCP port numbers to iWARP port numbers.
- This proposal also resolves a port space issue with iSER targets and iWARP in OFED.
- Backward compatibility can be ensured by using timeouts on the Port Mapper protocol to fall back to the current behavior.
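The fallback in the last point can be sketched as a Port Mapper query bounded by a timeout: if the peer answers, use the mapped iWARP address; if it times out or fails (for example, a peer without a Port Mapper), connect to the original address as today. `resolve_with_fallback` and the one-second default are assumptions, and a background thread stands in for whatever asynchronous mechanism a real daemon would use.

```python
import queue
import threading


def resolve_with_fallback(query, remote_ip, remote_port, timeout=1.0):
    """Ask the Port Mapper for the peer's iWARP address.

    On timeout or error, fall back to the original (unmapped) address,
    i.e. the current behavior.
    """
    result = queue.Queue(maxsize=1)

    def worker():
        try:
            result.put(query(remote_ip, remote_port))
        except Exception:
            result.put(None)            # treat any query failure as "no mapping"

    threading.Thread(target=worker, daemon=True).start()
    try:
        mapped = result.get(timeout=timeout)
    except queue.Empty:
        mapped = None                   # peer did not answer in time
    return mapped if mapped is not None else (remote_ip, remote_port)
```

A daemon thread is used so that a peer that never answers cannot keep the caller from returning once the timeout expires.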