
1/29/2002, CS 545 - Distributed Systems: Infiniband Architecture, Aniruddha Bohra





Slide 2: Distributed Applications and Data Transfer
- Traditional distributed applications
  - Need low-latency message delivery
  - Data volume in transfers between nodes is not too high
- Server applications
  - Need low-latency, high-bandwidth data transfers
  - Data volumes in transfers are high, e.g. in cluster-based storage or streaming multimedia servers
  - Need reliable and available services
  - Need easy maintenance

Slide 3: Traditional message send
- One kernel boundary crossing
- Two memory copies!
[Figure: application memory buffers → system call → kernel TCP sendmsg (copy from user space) → IP and lower layers (backup buffers) → to NIC]
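To make the copy count concrete, the toy Python model below (not kernel code) contrasts the path on the slide with a user-level path in which the adapter reads the application's buffer in place. The function names are hypothetical; the second transfer is normally performed by NIC DMA, but it is counted as a copy here, as the slide does.

```python
from __future__ import annotations


def traditional_send(user_buffer: bytes) -> tuple[bytes, int]:
    """Model of the kernel send path, counting whole-buffer copies."""
    copies = 0
    # Copy 1: sendmsg copies the payload from user space into a kernel buffer,
    # which TCP also keeps as a backup buffer until the data is acknowledged.
    kernel_buffer = bytearray(user_buffer)
    copies += 1
    # Copy 2: the payload is transferred from the kernel buffer to the NIC.
    nic_buffer = bytes(kernel_buffer)
    copies += 1
    return nic_buffer, copies


def user_level_send(user_buffer: bytearray) -> tuple[memoryview, int]:
    """User-level networking: the adapter reads the application's buffer directly."""
    # A memoryview shares the underlying memory, so no intermediate copy is made.
    return memoryview(user_buffer), 0
```

Because the memoryview aliases the application buffer, any change the application makes is visible to the "adapter" without copying, which is exactly why user-level schemes must also control when the buffer may be reused.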

Slide 4: Lessons from parallel computing
- Co-processors that access memory directly are used for communication
  - FLASH, J-Machine, Alewife
- User-level networking
- Virtual memory-mapped communication
- Examples: U-Net, VMMC, VIA

Slide 5: Interconnect bottleneck
- Servers require high data transfer rates
  - CPUs operate at GHz speeds
  - Gigabit Ethernet is commonly used in cluster-based servers
  - Data volumes are high
- The PCI bus is much slower
  - Operates at 32 bit/33 MHz or 64 bit/66 MHz
  - The next-generation bus, PCI-X, operates at 133 MHz
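The gap is easy to quantify with back-of-the-envelope arithmetic: a parallel bus peaks at (width in bytes) × (clock rate). The sketch assumes nominal clocks, a 64-bit PCI-X bus, and one transfer per cycle, and it ignores arbitration and protocol overhead and the fact that the bus is shared by all devices, so real throughput is lower.

```python
def peak_bus_mb_per_s(width_bits: int, clock_mhz: float) -> float:
    """Theoretical peak of a parallel bus: one width_bits transfer per clock cycle."""
    return width_bits / 8 * clock_mhz


# 1 Gb/s in each direction = 125 MB/s; full duplex doubles that.
GIGABIT_ETHERNET_MB_PER_S = 1000 / 8

buses = {
    "PCI 32 bit/33 MHz": (32, 33),
    "PCI 64 bit/66 MHz": (64, 66),
    "PCI-X 64 bit/133 MHz": (64, 133),
}

for name, (width, clock) in buses.items():
    peak = peak_bus_mb_per_s(width, clock)
    # How many full-duplex Gigabit Ethernet links the bus could feed at best.
    links = peak / (2 * GIGABIT_ETHERNET_MB_PER_S)
    print(f"{name}: {peak:.0f} MB/s peak, about {links:.1f} full-duplex GigE links")
```

Even at its theoretical peak, 32-bit/33 MHz PCI (132 MB/s) cannot keep a single full-duplex Gigabit Ethernet link busy.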

Slide 6: Some solutions
- HyperTransport
  - Runs at 800 MHz, full duplex
  - Bridges to current buses and to other HyperTransport buses
- 3GIO
  - Switch based
  - Provides a layered implementation
  - Promises transfer rates of more than 40 Gb/s

Slide 7: More problems with bus-based interconnects
- Cannot keep up with increasing CPU and peripheral speeds
- The bus is shared among all peripherals
- The pin count is high, and PCB space is limited!
- Buses cannot extend over long distances
- Buses do not support a large number of devices

Slide 8: Outline
- Motivation and background
- Infiniband architecture
- Infiniband components
- Infiniband operation
- Other Infiniband features
- Status
- Summary

Slide 9: Infiniband Architecture
- Provides a switch-based interconnect
  - Increased reliability
  - Scalable and easily maintainable
- Supports memory-to-memory communication
  - Low-latency communication
- Provides support for "out of box" components
  - Scalable
  - Easier to manage and operate
- Is complementary to the 3GIO and HyperTransport buses

Slide 10: What is Infiniband?
- The Infiniband Architecture (IBA) defines a System Area Network (SAN)
- The IBA SAN is a communications and management infrastructure for I/O and inter-process communication (IPC)
- IBA defines a switched communications fabric
  - High bandwidth and low latency
  - Protected, remotely managed environment
- IBA hardware offloads much of the I/O communication work from the CPU

Slide 11: An IBA SAN
[Figure: an IBA System Area Network]

Slide 12: Outline
- Motivation and background
- Infiniband architecture
- Infiniband components
- Infiniband operation
- Other Infiniband features
- Status
- Summary

Slide 13: Topologies and components
- IBA serves as an interconnect for endnodes
- A node can be a processor node, an I/O unit, and/or a router to another network
[Figure: nodes attached to an Infiniband fabric]

Slide 14: Topologies and Components
- An IBA network is subdivided into subnets interconnected by routers
- Endnodes can attach to a single subnet or to multiple subnets
- An IBA subnet is composed of endnodes, switches, routers and subnet managers
- Each IBT device may attach to a single switch, to multiple switches, and/or directly to other devices

Slide 15: IBT device: processor node
[Figure: a consumer uses the Message and Data Service, which drives two channel adapters (endnodes), each with a port, through the verbs]

Slide 16: Processor node
- Each channel adapter constitutes a node on the fabric
- The architecture supports multiple channel adapters per unit, with each adapter providing one or more ports to the fabric
- The Message and Data Service is an OS component
- Verbs describe the functions used to configure, manage and operate a host channel adapter
- Verbs are not an API; they provide the framework from which the OS specifies one

Slide 17: Channel Adapter
- An IBA channel adapter (CA) is a programmable DMA engine with special protection features that allow DMA operations to be initiated both locally and remotely
- A Host Channel Adapter (HCA) provides a consumer interface to the functions specified by the IBA verbs
- A Target Channel Adapter (TCA) provides an interface to an I/O device

Slide 18: Channel Adapter
[Figure: channel adapter]

Slide 19: Addressing in IBA
- Each endnode has one or more CAs, and each CA has one or more ports
- Each Queue Pair (QP) has a QP number (QPN) assigned by the CA
- Each port has a unique Local ID (LID) and at least one Global ID (GID), which is an IPv6 address
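A small Python model of the three identifiers may help. The field widths (16-bit LID, 24-bit QPN, and a 128-bit GID formed from a 64-bit subnet prefix plus a 64-bit port GUID) come from the IBA specification rather than the slide; the `IbAddress` and `make_gid` names and the GUID value are hypothetical, and fe80::/64 is used as the default subnet prefix.

```python
from __future__ import annotations

import ipaddress
from dataclasses import dataclass


def make_gid(subnet_prefix: int, port_guid: int) -> ipaddress.IPv6Address:
    """A GID is 128 bits: a 64-bit subnet prefix followed by a 64-bit port GUID."""
    if not (0 <= subnet_prefix < 2**64 and 0 <= port_guid < 2**64):
        raise ValueError("subnet prefix and GUID must each fit in 64 bits")
    return ipaddress.IPv6Address((subnet_prefix << 64) | port_guid)


@dataclass(frozen=True)
class IbAddress:
    lid: int                    # 16-bit Local ID, meaningful only inside one subnet
    gid: ipaddress.IPv6Address  # Global ID, needed once a packet crosses a router
    qpn: int                    # 24-bit Queue Pair number assigned by the CA

    def __post_init__(self) -> None:
        if not 0 <= self.lid < 2**16:
            raise ValueError("LID must fit in 16 bits")
        if not 0 <= self.qpn < 2**24:
            raise ValueError("QPN must fit in 24 bits")
```

For example, `make_gid(0xFE80000000000000, 0x0002C90300001234)` yields the IPv6-style address `fe80::2:c903:0:1234`; the LID locates the port within its subnet, and the QPN selects the queue pair on that port.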

Slide 20: Switches
- Do not generate or consume packets; they pass them along based on the destination address
- Are the routing components for intra-subnet routing and support unicast and multicast
- Every destination is configured with one or more unique Local IDs (LIDs)
- The Subnet Manager configures switches, including loading their forwarding tables
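The switch behaviour above reduces to a table lookup. This is a hypothetical Python model covering unicast only; the class name, numbering ports from 1, and dropping packets for unknown LIDs are illustrative assumptions.

```python
from __future__ import annotations


class Switch:
    """Hypothetical model of an IBA switch whose forwarding table is loaded by the SM."""

    def __init__(self, num_ports: int) -> None:
        self.num_ports = num_ports
        self.forwarding_table: dict[int, int] = {}  # destination LID -> output port

    def load_forwarding_table(self, table: dict[int, int]) -> None:
        # Called by the Subnet Manager; the switch does not compute routes itself.
        for port in table.values():
            if not 1 <= port <= self.num_ports:
                raise ValueError(f"port {port} does not exist")
        self.forwarding_table = dict(table)

    def forward(self, dlid: int) -> int | None:
        # The switch neither generates nor consumes the packet: it only picks an
        # output port from the destination LID. Unknown LIDs return None (dropped).
        return self.forwarding_table.get(dlid)
```

Keeping route computation in the Subnet Manager lets switches stay simple: after `load_forwarding_table({1: 1, 2: 3})`, a packet for LID 2 leaves through port 3.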

Slide 21: Routers
- Routers are the inter-subnet routing elements
- Routers forward packets based on the packet's Global Route Header (GRH)
- Routers expose one or more ports between which packets are relayed
- IPv6 specifies the protocol run between routers to derive their routing tables
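The Python sketch below shows how a router might choose an output port from the destination GID in the GRH by matching its upper 64 bits (the subnet prefix). It is a hypothetical model: the routes are added by hand here, whereas the slide notes that real routers derive their tables through an IPv6-defined protocol, and the prefix used below is a documentation example.

```python
from __future__ import annotations

import ipaddress


class Router:
    """Hypothetical router: routes on the subnet prefix of the GRH destination GID."""

    def __init__(self) -> None:
        self.routes: dict[int, int] = {}  # 64-bit subnet prefix -> output port

    def add_route(self, subnet_prefix: int, port: int) -> None:
        # Stand-in for a route learned from a router-to-router protocol.
        self.routes[subnet_prefix] = port

    def forward(self, dgid: ipaddress.IPv6Address) -> int | None:
        # The upper 64 bits of a GID identify the destination subnet.
        prefix = int(dgid) >> 64
        return self.routes.get(prefix)
```

Unlike a switch, the router never looks at the LID; the global address tells it which subnet to relay the packet toward.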

Slide 22: Subnet Managers
- A Subnet Manager (SM) is an entity attached to a subnet and responsible for its management
- Tasks
  - Discover the topology
  - Configure each CA port with a range of LIDs, GIDs, the subnet prefix and Partition_Keys
  - Maintain LID/GID resolution tables
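Topology discovery followed by LID assignment can be sketched as a breadth-first sweep outward from the SM's own node. This is a simplified Python model: the `links` dictionary stands in for the answers the SM would receive when probing each port, and handing out LIDs in discovery order is a hypothetical policy. Starting at 1 reflects that LID 0 is reserved.

```python
from __future__ import annotations

from collections import deque


def discover_and_assign_lids(links: dict[str, list[str]], sm_node: str) -> dict[str, int]:
    """Breadth-first sweep from the SM's node, assigning one LID per node found."""
    lids: dict[str, int] = {}
    next_lid = 1  # LID 0 is reserved
    queue = deque([sm_node])
    seen = {sm_node}
    while queue:
        node = queue.popleft()
        lids[node] = next_lid
        next_lid += 1
        # "Probe" each neighbour; each newly found node is visited exactly once.
        for neighbor in links.get(node, []):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    return lids
```

The returned node-to-LID map is the raw material for the SM's resolution tables and for the switch forwarding tables it loads afterwards.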

Slide 23: Outline
- Motivation and background
- Infiniband architecture
- Infiniband components
- Infiniband operation
- Other Infiniband features
- Status
- Summary

Slide 24: Communication
- Queuing
  - The consumer queues up a set of instructions for the hardware to execute (a work queue)
  - Work queues are created in pairs (queue pairs, QPs) for send and receive operations
  - Each work queue has an associated completion queue
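The queuing model can be illustrated with a pure-software Python sketch. The names `post_send`, `post_recv` and `poll_cq` echo common verbs terminology but are hypothetical here, and `hca_process_send` stands in for work the channel adapter would do in hardware. Both queues of a pair report to one completion queue in this model, which is a simplifying assumption.

```python
from __future__ import annotations

from collections import deque
from dataclasses import dataclass


@dataclass
class WorkCompletion:
    wr_id: int   # identifies the work request that finished
    opcode: str  # "SEND" or "RECV"


class QueuePair:
    """A send queue and a receive queue that report completions to one CQ."""

    def __init__(self, cq: deque) -> None:
        self.send_queue: deque = deque()
        self.recv_queue: deque = deque()
        self.cq = cq

    def post_send(self, wr_id: int, payload: bytes) -> None:
        self.send_queue.append((wr_id, payload))

    def post_recv(self, wr_id: int, buffer: bytearray) -> None:
        self.recv_queue.append((wr_id, buffer))


def hca_process_send(src: QueuePair, dst: QueuePair) -> None:
    """Stand-in for the hardware: run one SEND, consuming a receive WQE at the peer."""
    wr_id, payload = src.send_queue.popleft()
    recv_id, buffer = dst.recv_queue.popleft()
    if len(payload) > len(buffer):
        raise ValueError("posted receive buffer is too small")
    buffer[: len(payload)] = payload
    src.cq.append(WorkCompletion(wr_id, "SEND"))
    dst.cq.append(WorkCompletion(recv_id, "RECV"))


def poll_cq(cq: deque) -> WorkCompletion | None:
    """Return the oldest completion, or None if the queue is empty."""
    return cq.popleft() if cq else None
```

Note the ordering constraint the model makes visible: the receiver must post a buffer before the matching SEND arrives, and the consumer learns about finished work only by polling the completion queue.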

Slide 25: Work Queue Operations
- Send operations
  - SEND: a block in memory space is sent to the destination
  - RDMA: RDMA_READ, RDMA_WRITE, ATOMIC
  - Memory binding: alters the memory binding relationship and gives the R_KEY to components, which allows secure DMA
- Receive operation
  - Specifies a receive data buffer
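The role of the R_KEY in RDMA can be sketched in Python: a remote peer may read or write a registered region directly, without consuming a receive buffer at the target, but only if it presents the right key and stays within the region. `MemoryRegion`, `rdma_write` and `rdma_read` are hypothetical names; the 32-bit random key is a modelling choice.

```python
from __future__ import annotations

import secrets


class MemoryRegion:
    """Hypothetical registered memory region guarded by a remote key (R_KEY)."""

    def __init__(self, size: int) -> None:
        self.buffer = bytearray(size)
        self.r_key = secrets.randbits(32)  # handed only to trusted peers


def _check_access(mr: MemoryRegion, r_key: int, offset: int, length: int) -> None:
    if r_key != mr.r_key:
        raise PermissionError("R_KEY mismatch: access denied")
    if offset < 0 or offset + length > len(mr.buffer):
        raise IndexError("access outside the registered region")


def rdma_write(mr: MemoryRegion, r_key: int, offset: int, data: bytes) -> None:
    """Place data directly into remote memory; no receive WQE is consumed."""
    _check_access(mr, r_key, offset, len(data))
    mr.buffer[offset : offset + len(data)] = data


def rdma_read(mr: MemoryRegion, r_key: int, offset: int, length: int) -> bytes:
    """Fetch data directly from remote memory."""
    _check_access(mr, r_key, offset, length)
    return bytes(mr.buffer[offset : offset + length])
```

Both the key and the bounds check are enforced by the adapter rather than by the target's CPU, which is what makes remote DMA "secure" in the slide's sense.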

Slide 26: Work Queue Operations
[Figure: work queue operations]

Slide 27: Communication Stack
[Figure: IBA communication stack]

Slide 28: Keys
- Keys are used to provide isolation and protection
- M_KEY: enforces the control of a master Subnet Manager
- B_KEY: enforces control of a baseboard Subnet Manager
- P_KEY: enforces membership in a partition
- Q_KEY: enforces access rights for reliable or unreliable datagram service
- L_KEY and R_KEY: provide access rights to local and remote registered memory, respectively
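As one concrete case, the P_KEY check can be sketched in Python. Per the IBA specification (not the slide), a P_KEY is 16 bits: the low 15 bits name the partition, and the high bit marks full (1) or limited (0) membership. Two ports may communicate only if they are in the same partition and at least one is a full member. The function name is hypothetical.

```python
FULL_MEMBER_BIT = 0x8000  # high bit of the 16-bit P_KEY: 1 = full, 0 = limited member
PARTITION_MASK = 0x7FFF   # low 15 bits identify the partition


def pkeys_match(a: int, b: int) -> bool:
    """Ports may talk if they share a partition and at least one is a full member."""
    same_partition = (a & PARTITION_MASK) == (b & PARTITION_MASK)
    some_full_member = bool((a | b) & FULL_MEMBER_BIT)
    return same_partition and some_full_member
```

The limited-membership rule lets, for example, many client nodes share a partition with a storage server without being able to talk to one another.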

Slide 29: Outline
- Motivation and background
- Infiniband architecture
- Infiniband components
- Infiniband operation
- Other Infiniband features
- Status
- Summary

Slide 30: Virtual Lanes
- A virtual lane (VL) represents a set of transmit and receive buffers in a port
- VL15 is used for subnet management
- Each port must have at least one data VL
- Flow control is maintained separately for each VL
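Per-VL flow control can be sketched with one credit counter per data VL. This Python model is simplified and hypothetical: one credit stands for one packet's worth of receive buffer, and the management lane VL15 is left out. The point it shows is that exhausting the credits on one VL blocks only that VL.

```python
class VirtualLaneLink:
    """Hypothetical model of per-VL credit-based flow control on one link."""

    def __init__(self, data_vls: int, credits_per_vl: int) -> None:
        # Each data VL has its own receive buffers, so each gets its own credit count.
        self.credits = {vl: credits_per_vl for vl in range(data_vls)}

    def try_send(self, vl: int) -> bool:
        """Transmit one packet on `vl` only if the receiver has advertised space."""
        if self.credits[vl] == 0:
            return False  # this VL is blocked; other VLs keep flowing
        self.credits[vl] -= 1
        return True

    def return_credit(self, vl: int) -> None:
        """The receiver drained one buffer on `vl` and returns the credit."""
        self.credits[vl] += 1
```

Because credits are tracked per lane, a congested lane cannot stall traffic on the others sharing the same physical link.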

Slide 31: Service Levels
- Service levels (SLs) are implemented by mapping each SL to a VL
- IBA does not specify any QoS levels (e.g. best effort)
- The subnet management agent (SMA) must keep a mapping of service level to virtual lane and propagate it through the switch
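An SL-to-VL mapping is essentially a 16-entry lookup table, since the SL is a 4-bit field. In this Python sketch, the round-robin policy and both function names are hypothetical. Real switches index the table by input port, output port and SL, which is simplified here to a single table.

```python
NUM_SLS = 16  # the SL is a 4-bit field carried in the packet header


def build_sl_to_vl(num_data_vls: int) -> list:
    """Hypothetical policy: spread the 16 SLs round-robin over the data VLs."""
    if not 1 <= num_data_vls <= 15:
        raise ValueError("a port has between 1 and 15 data VLs")
    return [sl % num_data_vls for sl in range(NUM_SLS)]


def select_vl(sl_to_vl: list, sl: int) -> int:
    # The packet's SL stays fixed; each hop looks up which local VL carries it.
    return sl_to_vl[sl]
```

Since IBA defines no QoS classes itself, the meaning of each SL is entirely a matter of how the manager fills in these tables.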

Slide 32: Status
- The Intel Developer Forum had several status talks
  - http://www.intel.com/idf/us
- IBA-enabled network storage has been demonstrated at industry shows
  - Banderacom
  - Windriver
- The first products are expected to be on the market by the middle of 2002

Slide 33: Summary
- Future bandwidth requirements for servers would make the interconnect a bottleneck; IBA is an attempt to alleviate the problem
- IBA provides a thorough migration from a bus-based to a switch-based architecture while maintaining interoperability
- Further deployment is needed to uncover other issues that will arise in operation

