High Availability through the Linux bonding driver


Or Gerlitz, Voltaire (ogerlitz@voltaire.com)

agenda
- bonding driver background / concepts
- bonding driver high availability mode
- bonding IPoIB devices – status
- slaves requirements for a bond
- enabling High-Availability for native IB ULPs
- bonding IPoIB devices – code changes
  - ipoib HW address
  - bonding driver changes
  - ipoib HW address - revisited
  - ipoib driver changes

bonding driver background
- a bonding (master) device enslaves other devices
- the local system/stack (addressing, routing, multicast) interacts only with the bond device
- bonding supports both HA and LB; we focus on HA
- code path: drivers/net/bonding
- doc path: Documentation/networking/bonding.txt

bonding driver HA mode
- called Active-Backup
- the bond has one active slave and applies link detection mechanisms to trigger fail-over
- one HW (L2) address is used for the bond, typically that of the first slave, which is then assigned to the other slaves as well

bonding HA mode – cont’
- link detection mechanisms
  - local: uses the carrier bit of the slaves
  - path validation: implemented through an ARP target to which probes are sent
- fail-over
  - bonding sends a broadcast gratuitous ARP (originally to update the Ethernet switches’ tables)
  - bonding does a “replay” of the multicast joins
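The Active-Backup behaviour described on these two slides can be sketched as a small model. This is a toy illustration in Python, not the kernel code: the class and method names (`Slave`, `ActiveBackupBond`, `monitor`) and the event labels are hypothetical, only the local carrier-bit detection is modelled, and picking the first slave with carrier as the new active slave is an assumption made for simplicity.

```python
from dataclasses import dataclass


@dataclass
class Slave:
    name: str
    carrier: bool = True  # local link detection: the slave's carrier bit


class ActiveBackupBond:
    """Toy model of an Active-Backup bond (hypothetical names throughout)."""

    def __init__(self, slaves):
        self.slaves = list(slaves)
        self.active = self._first_with_carrier()
        self.events = []  # actions the bond would take after a fail-over

    def _first_with_carrier(self):
        # Assumption: the replacement is simply the first slave with carrier.
        return next((s for s in self.slaves if s.carrier), None)

    def monitor(self):
        """One link-detection pass; fails over if the active slave lost carrier."""
        if self.active is not None and self.active.carrier:
            return self.active
        self.active = self._first_with_carrier()
        if self.active is not None:
            # After fail-over: broadcast a gratuitous ARP and replay multicast joins.
            self.events.append(("gratuitous_arp", self.active.name))
            self.events.append(("replay_multicast_joins", self.active.name))
        return self.active
```

Calling `monitor()` while `ib0` has carrier leaves it active; once `ib0.carrier` is cleared, the next pass switches to `ib1` and records the two fail-over actions.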

bonding of IPoIB devices - status
- some changes were required in the bonding driver and some in the ipoib driver
- bonding changes – patch set passed two review cycles at netdev
- ipoib changes – patch accepted to OFED 1.2
  - some issues pending for upstream push
- configuration issues still persist
- the solution is integrated into OFED 1.2

slaves requirements for a bond
- slaves must be of the same ether type
  - you can’t bond ipoib and non-ipoib interfaces
- slaves must use the same partition (VLAN)
  - you can’t bond ib0.8003 with ib1.8004
- slaves can be of different modes (UD vs CM)
  - however, the slaves’ MTU must be normalized
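The three requirements above can be expressed as a pre-flight check. The sketch below is illustrative Python, not part of the bonding driver: `SlaveInfo`, `partition_of` and `check_slaves` are hypothetical names, the partition is read from the `ibN.PKEY` interface-name suffix as in the slide's example, and "normalizing" the MTU to the smallest slave MTU is an assumption about what normalization means here.

```python
from dataclasses import dataclass


@dataclass
class SlaveInfo:
    name: str     # e.g. "ib0" or "ib0.8003"
    hw_type: str  # illustrative label, e.g. "infiniband" or "ether"
    mtu: int


def partition_of(name):
    """Return the partition suffix of 'ib0.8003' ('8003'), or None if absent."""
    _, dot, suffix = name.partition(".")
    return suffix if dot else None


def check_slaves(slaves):
    """Validate the slide's requirements; return the MTU to use for all slaves."""
    if not slaves:
        raise ValueError("no slaves given")
    if len({s.hw_type for s in slaves}) != 1:
        raise ValueError("slaves must be of the same ether type")
    if len({partition_of(s.name) for s in slaves}) != 1:
        raise ValueError("slaves must use the same partition")
    # Assumption: normalize to the smallest MTU so every slave can carry it.
    return min(s.mtu for s in slaves)
```

For example, a UD slave with MTU 2044 and a CM slave with MTU 65520 on the same partition pass the check and yield 2044, while `ib0.8003` together with `ib1.8004` is rejected.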

high-availability for native IB ULPs
- bonding provides HA at the link (L2) level
- basically, layer separation means that TCP sessions should not break, but they can
- a HW failure would cause the IB RC session of a native IB ULP (SDP, RDS, iSER, Lustre, rNFS) to break
- bonding allows a new session to be established immediately (as ipoib is the ARP provider of the IB stack [rdma_cm])
- depending on the ULP, this session breakage may not even be seen by the user!

bonding/IPoIB code changes
- details follow

IPoIB HW address
- 20 bytes
  - 1 byte – supported IB transports (bitmap)
  - 3 bytes – the UD QP number
  - 16 bytes – the IB port GID (made of an eight-byte subnet prefix & an eight-byte port GUID)
- the GUID is unique and has to be distinct from the viewpoint of the SM
- the QP is a resource allocated by the HCA and is always distinct
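The 20-byte layout above can be made concrete with a short parser. This is a minimal sketch in Python: the names `IpoibHwAddr` and `parse_ipoib_hwaddr` are hypothetical, the fields are assumed to appear in the order listed on the slide, and the QP number is assumed to be stored in network (big-endian) byte order.

```python
from dataclasses import dataclass


@dataclass
class IpoibHwAddr:
    transports: int       # 1 byte: bitmap of supported IB transports
    qpn: int              # 3 bytes: UD QP number
    subnet_prefix: bytes  # first 8 bytes of the port GID
    port_guid: bytes      # last 8 bytes of the port GID


def parse_ipoib_hwaddr(raw: bytes) -> IpoibHwAddr:
    """Split a 20-byte IPoIB HW address into the fields listed on the slide."""
    if len(raw) != 20:
        raise ValueError(f"expected 20 bytes, got {len(raw)}")
    transports = raw[0]
    qpn = int.from_bytes(raw[1:4], "big")  # assumption: big-endian QPN
    gid = raw[4:20]
    return IpoibHwAddr(transports, qpn, gid[:8], gid[8:])


# Made-up example address: flags 0x80, QPN 0x48, link-local subnet prefix.
example = bytes.fromhex("80" "000048" "fe80000000000000" "0002c90300001234")
addr = parse_ipoib_hwaddr(example)
```

Because the QPN is part of the address, a slave's address cannot simply be copied onto another slave the way an Ethernet MAC can, which is what motivates the bonding changes on the next slide.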

bonding driver changes
- problem: enslave devices whose HW address can’t be assigned from the outside
  - solution: the bond HW address is the one of the active slave
- problem: enslave devices whose ether type is not ARPHRD_ETHER
  - solution: override some of the ether_setup settings with the slave ones (ether type, broadcast addr, HW addr len, HW header len, neighbour setup function, etc.)

IPoIB HW address - revisited
- an IB UD L2 address is made of AH & QPN
- hence the 20-byte HW neighbour address exposed by ipoib to the stack is not what the driver really uses
- ipoib uses a two-layer neighbouring scheme, such that for each struct neighbour there is a struct ipoib_neigh buddy
- ipoib installs a neighbour cleanup callback used to free the ipoib_neigh buddy resources

IPoIB driver changes
- under bonding, neighbours are created on behalf of the bond device, hence:
  - problem: under bonding the ipoib neighbour destructor can’t assume that n->dev is an ipoib device
  - solution: add a pointer to the device in struct ipoib_neigh and use this pointer in the cleanup func
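The problem and its fix can be illustrated with a toy model. The following Python sketch only mirrors the shape of the relationship between struct neighbour and struct ipoib_neigh; it is not the driver code, and every class, field and function name in it (`Device`, `IpoibNeigh`, `Neighbour`, `neigh_cleanup`, `freed`, `label`) is hypothetical.

```python
class Device:
    def __init__(self, name, is_ipoib):
        self.name = name
        self.is_ipoib = is_ipoib
        self.freed = []  # labels of the buddies this device has released


class IpoibNeigh:
    """Stand-in for struct ipoib_neigh, with the added device back-pointer."""

    def __init__(self, dev, label):
        self.dev = dev      # the ipoib device that owns the buddy's resources
        self.label = label


class Neighbour:
    """Stand-in for struct neighbour; under bonding, dev is the bond device."""

    def __init__(self, dev, buddy):
        self.dev = dev
        self.buddy = buddy


def neigh_cleanup(n):
    """Cleanup callback: rely on the buddy's own pointer, never on n.dev."""
    owner = n.buddy.dev
    if not owner.is_ipoib:
        raise TypeError("ipoib_neigh must point at an ipoib device")
    owner.freed.append(n.buddy.label)
    n.buddy = None


bond0 = Device("bond0", is_ipoib=False)
ib0 = Device("ib0", is_ipoib=True)
n = Neighbour(bond0, IpoibNeigh(ib0, "peer-a"))
neigh_cleanup(n)  # releases via ib0 even though n.dev is bond0
```

Had the callback used `n.dev`, it would have reached `bond0`, which holds none of the ipoib resources; the back-pointer makes the cleanup independent of which device the neighbour was created for.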

bonding/IPoIB changes - summary
- bonding: the bond HW address is the one of the active slave (if the slave doesn’t support assignment)
- bonding: override some of the ether_setup settings with the slave ones (if the slave is not of ARPHRD_ETHER type)
- ipoib: add a pointer to the device in struct ipoib_neigh and use this pointer in the cleanup func

open issues
- upstream push
- neighbour cleanup
  - after slave module unload
  - following a bonding fail-over: packet xmit over the new active slave can happen before the old slave has flushed the ipoib neighbours
- configuration tools
  - an old and deprecated user tool named ifenslave is used, which can now be replaced by a script using the bonding sysfs entries
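A script based on the bonding sysfs entries, as suggested in the last item, might look roughly like this. It is a hedged Python sketch: the function names `bond_sysfs_writes` and `apply_writes` are hypothetical, the sysfs paths and values follow the interface described in Documentation/networking/bonding.txt, and the 100 ms `miimon` interval is an arbitrary choice. Requirements on interface up/down state and IP configuration are not handled.

```python
def bond_sysfs_writes(bond, slaves, miimon_ms=100):
    """Return the ordered (path, value) writes that set up an Active-Backup bond."""
    base = f"/sys/class/net/{bond}/bonding"
    writes = [
        ("/sys/class/net/bonding_masters", f"+{bond}"),  # create the bond device
        (f"{base}/mode", "active-backup"),               # HA mode, set before enslaving
        (f"{base}/miimon", str(miimon_ms)),              # carrier-based link monitoring
    ]
    # Enslave each device in turn.
    writes += [(f"{base}/slaves", f"+{slave}") for slave in slaves]
    return writes


def apply_writes(writes):
    """Perform the writes; needs root and a loaded bonding module."""
    for path, value in writes:
        with open(path, "w") as f:
            f.write(value)


plan = bond_sysfs_writes("bond0", ["ib0", "ib1"])
```

Keeping the plan separate from `apply_writes` lets the sequence be inspected or logged before anything touches the system.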