RoCEE in OFED Update Liran Liss, Mellanox Technologies March 15, 2010

RoCEE in OFED Update Liran Liss, Mellanox Technologies March 15, 2010 www.openfabrics.org

Agenda
- What is RoCEE?
- Protocol stack
- Packet format
- Verbs implications
- Connection management
- Enabling RoCEE in OFED
- Development and availability
- RoCEE in action

What is RoCEE?
- InfiniBand transport over Ethernet
  - Efficient, lightweight transport, layered directly over Ethernet L2
  - FCoE equivalent for high-performance IPC traffic
- Takes advantage of DCB Ethernet
  - PFC, ETS, and QCN
- Rich communication services
  - Reliable/unreliable, connected/datagram
  - Unicast and multicast
  - Atomics
  - APM

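Protocol Stack
[Figure: layered protocol stack diagram; labels recovered from the slide]
- Applications: RDMA applications, socket applications
- ULP: IPoIB, RDS, SDP
- Verbs
- L4: IB transport, TCP
- L3: IB L3, IPv4
- L2: IB, Ethernet
- L1: IB (S/D/Q), XAUI, XFI, SGMII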

Packet Format
- InfiniBand: LRH (L2 Hdr) | GRH (L3 Hdr) | BTH+ (L4 Hdr) | IB Payload | ICRC | VCRC
- RoCEE: MAC | ET | GRH | BTH+ | IB Payload | ICRC | FCS
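The two frame layouts above differ only in their outermost fields, so the per-packet overhead is easy to compare. The sketch below is illustrative only: the byte sizes are the standard InfiniBand and untagged Ethernet header sizes, not values stated on the slide, and "BTH+" is counted as the 12-byte base transport header with no extended headers. The names `IB_FRAME`, `ROCEE_FRAME`, `overhead`, and `wire_efficiency` are hypothetical.

```python
# Per-packet header overhead for the two layouts on the slide.
# ASSUMPTION: sizes are standard header sizes (not taken from the slide);
# "BTH+" is counted as the bare 12-byte BTH without extended headers,
# and the Ethernet header is untagged (no 802.1Q VLAN tag).
IB_FRAME = [("LRH", 8), ("GRH", 40), ("BTH", 12), ("ICRC", 4), ("VCRC", 2)]
ROCEE_FRAME = [("MAC+ET", 14), ("GRH", 40), ("BTH", 12), ("ICRC", 4), ("FCS", 4)]


def overhead(frame):
    """Total bytes of headers and trailers around the IB payload."""
    return sum(size for _, size in frame)


def wire_efficiency(frame, payload_bytes):
    """Fraction of the bytes in one frame that carry payload."""
    return payload_bytes / (payload_bytes + overhead(frame))


if __name__ == "__main__":
    for name, frame in (("InfiniBand", IB_FRAME), ("RoCEE", ROCEE_FRAME)):
        print(name, overhead(frame), round(wire_efficiency(frame, 1024), 4))
```

Under these assumptions the only difference is the link-layer wrapper: the 8-byte LRH and 2-byte VCRC are replaced by a 14-byte Ethernet header and a 4-byte FCS, while GRH, BTH, payload, and ICRC are carried unchanged.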

Verbs Implications
- Address vectors
  - IB-compliant syntax
  - GID-based addressing
  - LID field is reserved
- GIDs
  - Populated with the link-local address corresponding to the port MAC
- Special QPs
  - QP0 is reserved
  - QP1 is used for connection management
  - Possibly other MAD services in the future
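The GID rule above can be sketched in a few lines. This is a reconstruction, not the driver's code: it assumes the GID is an IPv6 link-local address whose interface ID is the modified EUI-64 of the port MAC, and that a VLAN ID, when present, replaces the ff:fe filler bytes. Both assumptions are inferred from the GID values shown in the "RoCEE in Action (2)" slide; the MAC 00:02:c9:08:e7:99 is likewise inferred from those GIDs, and the function name `mac_to_link_local_gid` is hypothetical.

```python
import ipaddress


def mac_to_link_local_gid(mac, vlan_id=None):
    """Build a link-local GID from a port MAC (illustrative sketch).

    ASSUMPTION: modified EUI-64 mapping; a VLAN ID, if given, replaces
    the ff:fe filler in the middle of the interface ID.
    """
    octets = bytes(int(part, 16) for part in mac.split(":"))
    if len(octets) != 6:
        raise ValueError("expected a 48-bit MAC address")
    if vlan_id is None:
        middle = b"\xff\xfe"
    else:
        if not 0 <= vlan_id < 0x1000:
            raise ValueError("VLAN ID must fit in 12 bits")
        middle = vlan_id.to_bytes(2, "big")
    # Flip the universal/local bit of the first MAC octet.
    interface_id = bytes([octets[0] ^ 0x02]) + octets[1:3] + middle + octets[3:]
    # fe80::/64 prefix followed by the 64-bit interface ID.
    return ipaddress.IPv6Address(b"\xfe\x80" + bytes(6) + interface_id)


if __name__ == "__main__":
    print(mac_to_link_local_gid("00:02:c9:08:e7:99"))           # GID index 0 form
    print(mac_to_link_local_gid("00:02:c9:08:e7:99", 7))        # VLAN 7 form
    print(mac_to_link_local_gid("00:02:c9:08:e7:99").exploded)  # sysfs-style form
```

The `exploded` form matches the zero-padded notation read from sysfs, and the default string form matches the compressed notation printed by `ibv_rc_pingpong`.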

Connection Management
- SA is out
- Based on RDMACM
  - OS IP stack used to resolve the remote IP to a DMAC and bind to the outgoing Ethernet interface
  - VLAN determined according to the bound netdev
  - RoCEE device selected accordingly
  - Network parameters (MTU, SL, timeout) obtained locally according to kernel policy
  - Connection proceeds with CM as in IB
- Working only with Verbs is also possible

Enabling RoCEE in OFED
[Figure: OFED software stack diagram; labels recovered from the slide]
- User space: Application, uVerbs, uRDMACM, libmlx4
- Kernel: RDMA ULPs, RDMACM, CM, ib_core, mlx4_ib, mlx4_en, mlx4_core, TCP/IP stack
- Hardware: Ethernet hardware
- Annotations: address resolution; RoCEE device binding + address resolution; additional RoCEE port transport; synch state with Eth device

Development and Availability
- Kernel patches
  - v0: Initial version, RoCEE flows in SA handled locally
  - v3: Separate RoCEE SA emulation code from IB
  - v4: Removed all SA emulation code altogether; CMA enhanced to support RoCEE flows
  - v5: Code simplifications; remove user-space MAD interface
  - v7: Loopback support; introduce 'link-layer' port attribute
  - v8: Add VLAN support; rebase to 2.6.33-rc3
- OFED
  - Initially in a separate branch
  - Now part of OFED-1.5.1
  - GA quality! Well tested!

RoCEE in Action (1)

    sw419:~/OFED-1.5.1-20100316-0817 # ibv_devinfo
    hca_id: mlx4_0
            transport:                      InfiniBand (0)
            fw_ver:                         2.7.806
            node_guid:                      0002:c903:0008:e798
            sys_image_guid:                 0002:c903:0008:e79b
            vendor_id:                      0x02c9
            vendor_part_id:                 26428
            hw_ver:                         0xB0
            board_id:                       MT_0DD0120009
            phys_port_cnt:                  2
                    port:   1
                            state:                  PORT_INIT (2)
                            max_mtu:                2048 (4)
                            active_mtu:             2048 (4)
                            sm_lid:                 0
                            port_lid:               0
                            port_lmc:               0x00
                            link_layer:             IB
                    port:   2
                            state:                  PORT_ACTIVE (4)
                            max_mtu:                2048 (4)
                            active_mtu:             1024 (3)
                            sm_lid:                 0
                            port_lid:               0
                            port_lmc:               0x00
                            link_layer:             Ethernet
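The `link_layer` attribute in the output above is what distinguishes a RoCEE port from a native InfiniBand port on the same device. As a rough illustration, the sketch below picks that attribute out of `ibv_devinfo` text. It is a plain text parser, not a use of any verbs API; the function name `ports_by_link_layer` is hypothetical, and the sample is an abbreviated copy of the output shown on the slide.

```python
import re

# Abbreviated copy of the ibv_devinfo output from the slide.
SAMPLE = """\
hca_id: mlx4_0
        phys_port_cnt:                  2
                port:   1
                        state:                  PORT_INIT (2)
                        port_lid:               0
                        link_layer:             IB
                port:   2
                        state:                  PORT_ACTIVE (4)
                        port_lid:               0
                        link_layer:             Ethernet
"""


def ports_by_link_layer(devinfo_text):
    """Map each port number to its reported link layer."""
    ports = {}
    current = None
    for line in devinfo_text.splitlines():
        # "port:   N" opens a new port section; "port_lid:" and
        # "phys_port_cnt:" do not match because of the anchored pattern.
        match = re.match(r"\s*port:\s*(\d+)", line)
        if match:
            current = int(match.group(1))
            continue
        match = re.match(r"\s*link_layer:\s*(\S+)", line)
        if match and current is not None:
            ports[current] = match.group(1)
    return ports


if __name__ == "__main__":
    print(ports_by_link_layer(SAMPLE))
```

For the sample above, port 1 is reported as IB and port 2 as Ethernet, which is why the later examples pass `-i 2` to select the RoCEE port.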

RoCEE in Action (2)

    sw419:~ # ifconfig eth2 20.4.3.219
    sw419:~ # vconfig add eth2 7
    Added VLAN with VID == 7 to IF -:eth2:-
    sw419:~ # ifconfig eth2.7 20.4.3.219
    sw419:~ # cat /sys/class/infiniband/mlx4_0/ports/2/gids/0
    fe80:0000:0000:0000:0202:c9ff:fe08:e799
    sw419:~ # cat /sys/class/infiniband/mlx4_0/ports/2/gids/1
    fe80:0000:0000:0000:0202:c900:0708:e799
    sw419:~ # ibv_rc_pingpong -g 0 -i 2 sw420
      local address:  LID 0x0000, QPN 0x00004f, PSN 0xef4670, GID fe80::202:c9ff:fe08:e799
      remote address: LID 0x0000, QPN 0x00004f, PSN 0xd454d5, GID fe80::202:c9ff:fe08:e811
    8192000 bytes in 0.01 seconds = 4807.51 Mbit/sec
    1000 iters in 0.01 seconds = 13.63 usec/iter
    sw419:~ # ibv_rc_pingpong -g 1 -i 2 sw420
      local address:  LID 0x0000, QPN 0x04004f, PSN 0xe10208, GID fe80::202:c900:708:e799
      remote address: LID 0x0000, QPN 0x04004f, PSN 0x9b281b, GID fe80::202:c900:708:e811
    8192000 bytes in 0.01 seconds = 4857.40 Mbit/sec
    1000 iters in 0.01 seconds = 13.49 usec/iter

RoCEE in Action (3)

    sw419:~ # ifconfig eth2 20.4.3.219
    [root@mtlsqt124 ~]# rds-stress -s 11.4.5.125 -q 4096 -t 2 -d 2
    connecting to 11.4.5.125:4000
    negotiated options, tasks will start in 2 seconds
    Starting up....
    tsks   tx/s   rx/s  tx+rx K/s    mbi K/s    mbo K/s tx us/c   rtt us cpu %
       2  40137  40126  322928.84       0.00       0.00   10.91   156.89 -0.99
       2  39971  39987  324128.14       0.00       0.00   10.03   157.00 -1.00
       2  37488  37575  304354.64       0.00       0.00   10.59   168.45 -1.00
       2  38581  38604  312945.17       0.00       0.00   10.88   161.39 -1.00
       2  38429  38473  311815.57       0.00       0.00   10.54   163.22 -1.00
       2  39010  38856  315703.93       0.00       0.00   10.50   163.27 -1.00
       2  37104  37167  300838.65       0.00       0.00   10.27   170.97 -1.00
       2  39761  39826  322698.14       0.00       0.00   10.78   159.99 -1.00
       2  38787  38704  314205.64       0.00       0.00   10.69   161.82 -1.00
       2  40924  41002  332171.96       0.00       0.00   11.09   153.17 -1.00
       2  38844  39012  315659.80       0.00       0.00   10.53   162.44 -1.00

RoCEE in Action (4)
- RoCEE really rocks!!!