Quality of Service in OFED 1.3


Quality of Service in OFED 1.3
Liran Liss, Mellanox Technologies Inc.

Agenda
- QoS motivation
- InfiniBand QoS overview
- Host software support
  - IB stack
  - ULPs
- QoS manager
  - Programming QoS levels in the fabric
  - Configuring a QoS policy
- Example configurations
- Future work

QoS Motivation
- Multiple data-center traffic types
- Each requires different service properties: bandwidth, latency, reliability
- QoS achieves these requirements on a unified wire
[Figure: unified I/O over one InfiniBand subnet. Elements shown: administrator, QoS manager, servers, filer, block storage, IB-Ethernet gateway, IB-Fibre Channel gateway. Traffic labels: storage, IPC, net.]
Notes: All sorts of traffic with different needs; these needs must be met in a single fabric.

QoS in InfiniBand – Overview
- InfiniBand fabrics support up to 15 Virtual Lanes (VLs) for data
- Each virtual lane has dedicated resources
- Virtual lanes are arbitrated at each host/switch using a dual-priority Weighted Round Robin (WRR) scheme
- Flows are classified into Service Levels (SLs) at end nodes
- Each packet sent is marked with the corresponding SL
- Packets are mapped to VLs in each link according to their SL
[Figure: VL arbitration. A high-priority WRR table and a low-priority WRR table feed a high/low priority select stage, which picks the packets to be transmitted.]
Notes: ConnectX: 8 + VL15? VLs can be configured.
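
The dual-priority WRR scheme above can be sketched in a few lines of Python. This is a simplified model, not the arbitration logic of any real HCA or switch: weights count whole packets instead of 64-byte credits, and `high_limit` is assumed to mean "at most this many consecutive high-priority packets before a low-priority packet gets a turn" (it must be at least 1 here). The names `WrrTable` and `arbitrate` are hypothetical.

```python
from collections import deque


class WrrTable:
    """Weighted round-robin cursor over (vl, weight) entries."""

    def __init__(self, entries):
        # Entries with weight 0 are never scheduled, so drop them up front.
        self.entries = [(vl, w) for vl, w in entries if w > 0]
        self.index = 0
        self.left = self.entries[0][1] if self.entries else 0

    def next_packet(self, queues):
        """Return (vl, packet) for the next eligible VL, or None if nothing is pending."""
        if not self.entries:
            return None
        for _ in range(len(self.entries) + 1):
            vl, _weight = self.entries[self.index]
            queue = queues.get(vl)
            if queue and self.left > 0:
                self.left -= 1
                return vl, queue.popleft()
            # Entry exhausted or idle: move on and refill the credit counter.
            self.index = (self.index + 1) % len(self.entries)
            self.left = self.entries[self.index][1]
        return None


def arbitrate(queues, high, low, high_limit):
    """Drain the queues and return the transmit order as (vl, packet) pairs."""
    order = []
    high_run = 0  # consecutive high-priority packets sent so far
    while True:
        packet = None
        if high_run < high_limit:
            packet = high.next_packet(queues)
            if packet is not None:
                high_run += 1
        if packet is None:
            packet = low.next_packet(queues)
            high_run = 0
            if packet is None:
                # Low priority is idle, so high priority may continue.
                packet = high.next_packet(queues)
                if packet is None:
                    break
                high_run = 1
        order.append(packet)
    return order
```

With VL0 in the high-priority table, VL1 and VL2 in the low-priority table, and `high_limit=2`, two VL0 packets go out before each low-priority packet as long as VL0 has traffic pending.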

QoS in InfiniBand (IB spec v1.2.1, A13)
- Administrator configures the fabric
  - Fabric QoS levels: SL-to-VL mappings, high/low VL arbitration
  - QoS policy
- Applications send PathRecord queries to the SA
  - May also include additional QoS fields: ServiceID, QoSClass
- SA consults the QoS manager before replying
  - In Active QoS Management, the fabric may be dynamically reconfigured
- Applications use PathRecord fields for sending traffic
- Fabric enforces QoS accordingly
Notes: The fabric can also be reconfigured at runtime. The SM, SA, and QoS manager are implemented by opensm.

QoS in InfiniBand (IB spec v1.2.1, A13), annotated
- Administrator configures the fabric
  - Fabric QoS levels: SL-to-VL mappings, high/low VL arbitration
  - QoS policy
- Applications send PathRecord queries to the SA
  - May also include additional QoS fields: ServiceID, QoSClass
- SA consults the QoS manager before replying
  - In Active QoS Management, the fabric may be dynamically reconfigured
- Clients use PathRecord fields for sending traffic
- Fabric enforces QoS accordingly
Callouts on the slide: "We will start with this…" and "…to know how to do this".
Notes: The fabric can also be reconfigured at runtime. The SM, SA, and QoS manager are implemented by opensm.

QoS in IB Stack
- SA Client
  - Fills in QoS-related components: Pkey, QoS class, Traffic Class, ServiceID
  - Interpretation is left to the QoS manager (opensm)
  - Returns the desired SL, MTU, rate, packet lifetime, etc.
- RDMA CM
  - Transport-neutral interface
  - Uses ServiceID, QoS class, and Traffic Class in path queries
  - ServiceID is port-space prefix + port
  - QoS class is used for IPv4: ToS value from 'rdma_set_service_type()'
  - Traffic Class is used for IPv6: taken from the sockaddr_in6 address
Notes: "Multicast already supports traffic class" (marked as an open question in the original notes). IPv4 ToS, IPv6 TC, DiffServ code point. rdma_set_service_type() must be called before address resolution and only affects new connections.
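
The "port-space prefix + port" rule can be illustrated with a short sketch. The slide does not give the prefix values; the constant below assumes the TCP port-space value 0x0106 used by the Linux RDMA CM, and the helper name `rdma_cm_service_id` is hypothetical.

```python
# Assumption: 0x0106 is the RDMA CM port-space value for TCP (not stated on the slide).
RDMA_PS_TCP = 0x0106


def rdma_cm_service_id(port, port_space=RDMA_PS_TCP):
    """Combine a port space and a 16-bit port into a 64-bit ServiceID."""
    if not 0 <= port <= 0xFFFF:
        raise ValueError("port must fit in 16 bits")
    return (port_space << 16) | port


# Port 3260 is the registered iSCSI port, which iSER reuses.
iser_service_id = rdma_cm_service_id(3260)
print(f"0x{iser_service_id:016X}")  # prints 0x0000000001060CBC
```

A QoS policy rule that matches on a ServiceID range therefore selects a port range within one port space.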

QoS in ULPs
- SRP
  - Based on target port GUID (ServiceID is currently vendor specific)
- IPoIB
  - Based on global multicast group settings
  - Provides Pkey in each path resolution
- SDP, iSER, RDS
  - Use the RDMA CM service, which provides the ServiceID
- MPI
  - Currently does not issue PathRecord queries (SM integration planned)
  - Uses the SL given at the command line directly and exchanges LIDs via TCP
Notes: Traffic class for IPoIB is good only for QoS based on connected nodes, not on flows. Open question: does IPoIB take the SL from the pkey?

SM Configuration
Relevant configuration files:
- Partitions (/etc/ofa/opensm-partitions.conf)
- SL/VL tables (/var/cache/opensm/opensm.opts)
- QoS policy (/etc/ofa/opensm-qos-policy.conf)

Configuring SL-to-VL and VL Arbitration
- Weights are specified in 64-byte credits
  - Use multiples of MTU/64 (e.g., 32 for a 2K MTU)
  - VLs with 0 credits are never scheduled
- Special high-limit values: 0 = single packet, 255 = no limit
- Device-specific configuration: CA (_ca_), router (_rtr_), switch port 0 (_sw0_), switch external ports (_swe_)
    # QoS default options
    qos_max_vls 15
    qos_high_limit 0
    qos_vlarb_high 0:32,1:0,2:0,3:0,4:0,5:0,6:0,7:0,8:0,9:0,10:0,11:0,12:0,13:0,14:0
    qos_vlarb_low 0:0,1:32,2:32,3:32,4:32,5:32,6:32,7:32,8:32,9:32,10:32,11:32,12:32,13:32,14:32
    qos_sl2vl 0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,7
    # QoS CA options
    qos_ca_max_vls 15
    qos_ca_high_limit 0
    qos_ca_vlarb_high 0:32,1:0,2:0,3:0,4:0,5:0,6:0,7:0,8:0,9:0,10:0,11:0,12:0,13:0,14:0
    qos_ca_vlarb_low 0:0,1:32,2:32,3:32,4:32,5:32,6:32,7:32,8:32,9:32,10:32,11:32,12:32,13:32,14:32
    qos_ca_sl2vl 0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,7
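
The MTU/64 rule is plain arithmetic, sketched here for convenience. The helper names are hypothetical, and the 255 upper bound assumes an 8-bit weight field in the arbitration table.

```python
def vlarb_weight(mtu_bytes, packets=1):
    """Weight, in 64-byte credits, that lets a VL send `packets` full-MTU packets per turn."""
    if mtu_bytes % 64:
        raise ValueError("MTU must be a multiple of 64 bytes")
    weight = packets * mtu_bytes // 64
    if weight > 255:
        raise ValueError("weight does not fit in an 8-bit table entry")
    return weight


def format_vlarb(packets_per_vl, mtu_bytes):
    """Build a 'vl:weight,...' string in the style of qos_vlarb_high / qos_vlarb_low."""
    return ",".join(
        f"{vl}:{vlarb_weight(mtu_bytes, count)}"
        for vl, count in sorted(packets_per_vl.items())
    )


print(vlarb_weight(2048))                         # 32, as in the slide's 2K MTU example
print(format_vlarb({0: 1, 1: 0, 2: 0}, 2048))     # 0:32,1:0,2:0
```

A VL given 0 packets ends up with weight 0 and is never scheduled from that table.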

QoS Policy Configuration
- The file consists of the following optional sections:
  - qos-ulps
  - port-groups
  - qos-levels
  - qos-match-rules
- Two configuration models
  - Simplified: only the qos-ulps section is required
  - Advanced: the advanced model takes precedence

Simple QoS Policy
- Assigns SLs according to:
  - IPoIB with default / specified pkey
  - SDP / iSER / RDS with (optional) port ranges
  - SRP with target port GUID
  - Any application with a specific ServiceID / pkey / target port GUID range
- First rule takes precedence
    qos-ulps
        sdp, port-num 10000-20000 : 2
        sdp : 0
        srp, target-port-guid 0x100000000000FFFF : 4
        rds, port-num 25000 : 2
        rds : 0
        iser : 4
        ipoib, pkey 0x0001 : 5
        ipoib : 6
        any, pkey 0x0ABC : 3
        default : 0
    end-qos-ulps
Notes: Order counts: IPoIB traffic to the SRP target port will be caught by the SRP rule. The default must be defined; its position in the list does not matter.
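
A minimal sketch of the first-rule-wins behaviour described above, using the rules from the slide. It is an illustration only, not opensm's matching code. Because SRP is recognised solely by its target port GUID, the SRP rule is modelled as matching any ULP, which reproduces the note about IPoIB traffic to an SRP target. The names `RULES` and `assign_sl` are hypothetical.

```python
# (ulp, conditions, sl) in file order; "any" matches every ULP.
RULES = [
    ("sdp", {"port": range(10000, 20001)}, 2),
    ("sdp", {}, 0),
    # SRP is identified only by target port GUID, so any ULP can hit this rule.
    ("any", {"target_port_guid": {0x100000000000FFFF}}, 4),
    ("rds", {"port": {25000}}, 2),
    ("rds", {}, 0),
    ("iser", {}, 4),
    ("ipoib", {"pkey": {0x0001}}, 5),
    ("ipoib", {}, 6),
    ("any", {"pkey": {0x0ABC}}, 3),
]
DEFAULT_SL = 0  # the mandatory "default" entry; its position does not matter


def assign_sl(ulp, **attrs):
    """Return the SL of the first rule whose ULP and conditions all match."""
    for rule_ulp, conditions, sl in RULES:
        if rule_ulp not in ("any", ulp):
            continue
        if all(attrs.get(key) in allowed for key, allowed in conditions.items()):
            return sl
    return DEFAULT_SL


print(assign_sl("sdp", port=15000))                            # 2
print(assign_sl("ipoib", target_port_guid=0x100000000000FFFF))  # 4, caught by the SRP rule
```

Swapping the order of the SRP and IPoIB entries would send that IPoIB traffic to SL 6 instead, which is why rule order matters.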

Advanced QoS Policy
- Define port groups
- Define QoS levels
  - A level specifies requirements for SL, MTU, rate, etc.
- Define matching rules that map PathRecord components to QoS levels
  - Rules use port groups and partition names to simplify the syntax

Advanced QoS Policy – port groups
- Groups are defined based on: GUID, node description/port, partition names, PKeys, type (CA/switch/etc.)
- A group is identified by its 'name' field
- The 'use' field is for logging only
    port-groups
        port-group
            name: Storage
            use: SRP storage targets
            port-guid: 0x100000000000FFFF
            port-guid: 0x100000000000FFFA
        end-port-group
        port-group
            name: Virtual Servers
            use: node desc and IB port num
            port-name: ws1 HCA-1/P1, ws2 HCA-1/P1
        end-port-group
        port-group
            name: Engineering
            partition: Part1
            pkey: 0x1234
        end-port-group
        port-group
            name: Switches and SM
            node-type: SWITCH, SELF
        end-port-group
    end-port-groups

Advanced QoS Policy – QoS Level
- A level is a subset of PathRecord attributes: SL, MTU, rate, packet life
- Values use the standard PathRecord encoding
- A level is identified by its 'name' field
- The 'use' field is for logging only
    qos-levels
        qos-level
            name: DEFAULT
            use: default QoS Level
            sl: 0
        end-qos-level
        qos-level
            name: Low Priority
            use: for the lowest prio
            sl: 14
        end-qos-level
        qos-level
            name: WholeSet
            sl: 1
            mtu-limit: 4
            rate-limit: 5
            packet-life: 4
        end-qos-level
    end-qos-levels
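
Since the level fields use PathRecord encodings rather than plain units, a small decoder helps when reading a policy file. The tables below are written from memory of the InfiniBand specification's PathRecord definitions and should be checked against the spec before being relied on; the function name `decode_level` is hypothetical.

```python
# Assumed PathRecord encodings (verify against the IB specification).
MTU_BYTES = {1: 256, 2: 512, 3: 1024, 4: 2048, 5: 4096}
RATE_GBPS = {2: 2.5, 3: 10, 4: 30, 5: 5, 6: 20, 7: 40, 8: 60, 9: 80, 10: 120}


def packet_life_us(code):
    """Packet lifetime in microseconds: 4.096 us * 2**code."""
    return 4.096 * 2 ** code


def decode_level(level):
    """Translate encoded qos-level fields into human-readable units."""
    decoded = {"sl": level["sl"]}
    if "mtu-limit" in level:
        decoded["mtu_bytes"] = MTU_BYTES[level["mtu-limit"]]
    if "rate-limit" in level:
        decoded["rate_gbps"] = RATE_GBPS[level["rate-limit"]]
    if "packet-life" in level:
        decoded["packet_life_us"] = packet_life_us(level["packet-life"])
    return decoded


whole_set = {"sl": 1, "mtu-limit": 4, "rate-limit": 5, "packet-life": 4}
print(decode_level(whole_set))
```

Under these assumed tables, the WholeSet level from the slide limits paths to a 2048-byte MTU, a 5 Gb/s rate, and a packet lifetime of about 65.5 microseconds.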

Advanced QoS Policy – Matching Rules
- A rule maps a subset of the following to a QoS level: QoS class, source port group, destination port group, ServiceID, Pkey
- First matched rule wins
    qos-match-rules
        qos-match-rule
            use: by class 7-9 or 11
            qos-class: 7-9,11
            qos-level-name: WholeSet
        end-qos-match-rule
        qos-match-rule
            use: Storage targets
            destination: Storage
            service-id: 22,4719-5000
            qos-level-name: DEFAULT
        end-qos-match-rule
        qos-match-rule
            use: match by all parameters (AND)
            source: Virtual Servers
            pkey: 0x0F00-0x0FFF
        end-qos-match-rule
    end-qos-match-rules
Notes (open questions): Can we add multiple range lists? Can we use hex and base 10 freely?
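
The matching semantics (all conditions in a rule are ANDed, first matching rule wins) can be sketched as follows. This is an illustration under assumptions, not opensm's parser: range lists are assumed to accept both decimal and hexadecimal values, which the slide notes leave as an open question, and only the two rules that name a QoS level are modelled. All function and key names are hypothetical.

```python
def parse_ranges(text):
    """Parse '7-9,11' or '0x0F00-0x0FFF' into inclusive (low, high) pairs."""
    ranges = []
    for part in text.split(","):
        low, _, high = part.strip().partition("-")
        low = int(low, 0)  # base 0 accepts both decimal and 0x-prefixed hex
        high = int(high, 0) if high else low
        ranges.append((low, high))
    return ranges


GROUP_KEYS = {"source", "destination"}  # matched against port-group names

RULES = [
    {"qos_class": "7-9,11", "level": "WholeSet"},
    {"destination": "Storage", "service_id": "22,4719-5000", "level": "DEFAULT"},
]


def rule_matches(rule, query):
    """True only if every condition in the rule holds for the query (logical AND)."""
    for key, wanted in rule.items():
        if key == "level":
            continue
        have = query.get(key)
        if have is None:
            return False
        if key in GROUP_KEYS:
            if wanted not in have:  # `have` is the set of groups the port belongs to
                return False
        elif not any(low <= have <= high for low, high in parse_ranges(wanted)):
            return False
    return True


def match_level(query, rules=RULES):
    """Return the QoS level name of the first matching rule, or None."""
    for rule in rules:
        if rule_matches(rule, query):
            return rule["level"]
    return None


print(match_level({"qos_class": 8}))                                     # WholeSet
print(match_level({"destination": {"Storage"}, "service_id": 4800}))     # DEFAULT
```

A query that satisfies both rules gets the level of the earlier one, so more specific rules belong at the top of the section.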

Use Case 1: HPC QoS Levels
- MPI
  - Separate from I/O load
  - Min BW of 70%
- Storage Control (Lustre MDS)
  - Low latency
- Storage Data (Lustre OST)
  - Min BW of 30%

HPC QoS Administration
- MPI: mpirun -sl 0
- OpenSM QoS policy file:
    qos-ulps
        default : 0 # default SL (for MPI)
        any, target-port-guid OST1,OST2,OST3,OST4 : 1 # SL for Lustre OST
        any, target-port-guid MDS1,MDS2 : 2 # SL for Lustre MDS
    end-qos-ulps
- Options file:
    qos_max_vls=8
    qos_high_limit=0
    qos_vlarb_high=2:1
    qos_vlarb_low=0:224,1:96
    qos_sl2vl=0,1,2,3,4,5,6,7,15,15,15,15,15,15,15,15
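
The link between the arbitration weights and the bandwidth targets of this use case is a simple ratio: each VL's minimum share within a table is its weight divided by the sum of the weights in that table. The sketch below checks this for the low-priority table; the function name `vlarb_shares` is hypothetical.

```python
def vlarb_shares(vlarb):
    """Map each VL in a 'vl:weight,...' table to its fraction of the table's total weight."""
    entries = [entry.split(":") for entry in vlarb.split(",")]
    weights = {int(vl): int(weight) for vl, weight in entries}
    total = sum(weights.values())
    return {vl: weight / total for vl, weight in weights.items()}


shares = vlarb_shares("0:224,1:96")
for vl, share in shares.items():
    print(f"VL{vl}: {share:.0%}")  # VL0: 70%, VL1: 30%
```

VL0 carries MPI (SL 0) and VL1 carries Lustre OST data (SL 1), which matches the 70% and 30% targets. Lustre MDS traffic (SL 2, VL2) is the only entry in the high-priority table, which is how it gets low latency.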

Use Case 2: EDC QoS Levels
- Management traffic (ssh)
  - IPoIB management VLAN (partition A)
  - Min BW of 10%
- Application traffic
  - IPoIB application VLAN (partition B)
  - Isolated from storage and database
  - Min BW of 30%
- Database cluster traffic
  - RDS
- SRP
  - Min BW of 30%
  - Bottleneck at storage nodes

EDC QoS Administration
- OpenSM QoS policy file:
    qos-ulps
        default : 0
        ipoib, pkey 0x8001 : 1
        ipoib, pkey 0x8002 : 2
        rds : 3
        srp, target-port-guid SRP1, SRP2, SRP3 : 4
    end-qos-ulps
- Options file:
    qos_max_vls=8
    qos_high_limit=0
    qos_vlarb_high=1:32,2:96,3:96,4:96
    qos_vlarb_low=0:1,
    qos_sl2vl=0,1,2,3,4,5,6,7,15,15,15,15,15,15,15,15
- Partition configuration file:
    Default=0x7fff,ipoib: ALL=full;
    PartA=0x8001, sl=1, ipoib: ALL=full;
    PartB=0x8002, sl=2, ipoib: ALL=full;
Notes: SLs in unicast IPoIB do not find their way to the multicast group. Open question: why do we need to assign SLs to partitions?
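
With three files involved, it is easy to assign an SL in the policy file that maps to a VL with no arbitration weight. The sketch below is a hypothetical consistency check, not an opensm feature: it follows each SL through the SL-to-VL table and reports SLs whose VL has zero weight in both arbitration tables. It tolerates the trailing comma that appears in the slide's qos_vlarb_low line.

```python
def parse_vlarb(text):
    """Parse 'vl:weight,...' into {vl: weight}, skipping empty entries."""
    weights = {}
    for entry in text.split(","):
        if not entry.strip():
            continue  # tolerate a trailing comma
        vl, weight = entry.split(":")
        weights[int(vl)] = int(weight)
    return weights


def unscheduled_sls(sls, sl2vl, vlarb_high, vlarb_low):
    """Return the SLs whose VL has no weight in either arbitration table."""
    mapping = [int(vl) for vl in sl2vl.split(",")]
    weights = parse_vlarb(vlarb_high)
    for vl, weight in parse_vlarb(vlarb_low).items():
        weights[vl] = weights.get(vl, 0) + weight
    return [sl for sl in sls if weights.get(mapping[sl], 0) == 0]


problems = unscheduled_sls(
    sls=[0, 1, 2, 3, 4],  # SLs used in the qos-ulps section above
    sl2vl="0,1,2,3,4,5,6,7,15,15,15,15,15,15,15,15",
    vlarb_high="1:32,2:96,3:96,4:96",
    vlarb_low="0:1,",
)
print(problems)  # [] means every SL in use lands on a scheduled VL
```

The high-priority weights 32:96:96:96 correspond to 10%, 30%, 30%, and 30% of that table, in line with the minimum bandwidth figures of this use case.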

Future Work
- Configuration file organization
  - Move port groups to a different file, used by both the partition and QoS files
  - Move SL/VL configuration to the QoS file
  - Remove QoS options from the partition file; IPoIB will obtain these from the MGID PathRecord
- Add wildcards for port-name matching
- Provide "user friendly" aliases for SA attribute encodings (e.g., MTU256)
- Add Traffic Class to matching rules
- Extend host-side QoS
  - BW limiting
  - WRR scheduling between QP groups sharing the same SL

Summary
- QoS in InfiniBand is simple and elegant
  - Centrally managed, consistent throughout the fabric
- Fully functional in OFED 1.3
  - All ULPs are QoS aware
  - QoS manager integrated in opensm
- Configuration is a piece of cake
  - Just assign each ULP the desired service level

Thank you!