Towards a Common Communication Infrastructure for Clusters and Grids
Darius Buntinas, Argonne National Laboratory

Slide 2: Overview
- Cluster Computing vs. Distributed Grids
- InfiniBand
  – IB for WAN
- IP and Ethernet
  – Improving performance
- Other LAN/WAN Options
- Summary

Slide 3: Cluster Computing vs. Distributed Grids
- Typical clusters
  – Homogeneous architecture
  – Dedicated environments: compatibility is not a concern
  – Clusters can use high-speed LAN networks (e.g., VIA, Quadrics, Myrinet, InfiniBand)
  – And network-specific hardware accelerators (e.g., protocol offload, RDMA)

Slide 4: Cluster Computing vs. Distributed Grids (cont'd)
- Distributed environments
  – Heterogeneous architecture
  – Communication over the WAN
  – Multiple administrative domains: compatibility is critical
  – Most WAN stacks are IP/Ethernet
  – Popular grid communication protocols: TCP/IP/Ethernet and UDP/IP/Ethernet
- But what about performance?
  – TCP/IP/Ethernet latency: tens of µs
  – InfiniBand latency: a few µs
- How do you maintain high intra-cluster performance while enabling inter-cluster communication?

Slide 5: Solutions
- Use one network for the LAN and another for the WAN
  – You must manage two networks
  – Your communication library must be multi-network capable, which may hurt performance or resource utilization
- Perhaps a better solution: a common network subsystem
  – One network for both LAN and WAN
  – Two popular network families: InfiniBand and Ethernet

Slide 6: InfiniBand
- Initially introduced as a LAN technology
  – Now expanding onto the WAN
- Issues with using IB on the WAN
  – IB copper cables have limited lengths
  – IB uses end-to-end credit-based flow control

Slide 7: Cable Lengths
- IB copper cabling
  – Signal integrity decreases with length and data rate
  – IB 4x-QDR (32 Gbps) maximum cable length is under 1 m
- Solution: optical cabling for IB (e.g., Intel Connects Cables)
  – Optical cables with electrical-to-optical converters at each end
  – ~50 ps conversion delay
  – Plug into existing copper-based adapters

Slide 8: End-to-End Flow Control
- IB uses end-to-end credit-based flow control (sketched below)
  – One credit corresponds to one buffer unit at the receiver
  – The sender can send one unit of data per credit
  – Long one-way latencies limit achievable throughput, and WAN latencies are on the order of ms
- Solution: hop-by-hop flow control (e.g., Obsidian Networks Longbow switches)
  – Switches have internal buffering
  – Link-level flow control is performed between node and switch
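A minimal C sketch of the credit-based scheme described above. The structure and names (try_send, on_credit_update, RECV_BUFFERS) are illustrative assumptions, not the InfiniBand wire protocol:

#include <stdint.h>
#include <stdbool.h>

#define RECV_BUFFERS 16          /* buffer units the receiver pre-posts     */

typedef struct {
    uint32_t credits;            /* units the sender may still transmit     */
} sender_state_t;

/* Sender side: one credit is consumed per unit of data sent. */
bool try_send(sender_state_t *s)
{
    if (s->credits == 0)
        return false;            /* stall: no receive buffer is guaranteed  */
    s->credits--;
    /* ... transmit one buffer unit on the wire ... */
    return true;
}

/* Receiver side: when buffers are drained and re-posted, a credit update
 * travels all the way back to the sender. */
void on_credit_update(sender_state_t *s, uint32_t returned)
{
    s->credits += returned;
}

int main(void)
{
    sender_state_t s = { .credits = RECV_BUFFERS };
    while (try_send(&s))         /* sender stalls once credits run out      */
        ;
    on_credit_update(&s, 4);     /* receiver drained and re-posted 4 units  */
    return 0;
}

Because fresh credits only arrive after a receiver-to-sender trip, throughput is roughly capped at credits × unit_size / RTT; this is why millisecond-scale WAN latencies starve a sender tuned for LAN round trips, and why buffering in the switch (hop-by-hop flow control) helps.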

Slide 9: Effect of Delay on Bandwidth
[Figure: achievable bandwidth plotted against distance (km) and the corresponding one-way delay (µs)]
Source: S. Narravula et al., "Performance of HPC Middleware over InfiniBand WAN," Ohio State Technical Report OSU-CISRC-12/07-TR77.

Slide 10: IP and Ethernet
- Traditionally, IP/Ethernet is used for the WAN and as a low-cost alternative on the LAN
- Software-based TCP/IP stack implementations: software overhead limits performance
- Performance limitations
  – Small 1500-byte maximum transfer unit (MTU)
  – TCP/IP software stack overhead

Slide 11: Increasing the Maximum Transfer Unit
- The Ethernet standard specifies a 1500-byte MTU
  – Each packet requires hardware and software processing
  – This overhead is considerable at gigabit speeds
- The MTU can be increased (see the sketch below)
  – 9K jumbo frames reduce per-byte processing overhead
  – But jumbo frames are not compatible across the WAN
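A hedged sketch of raising the interface MTU to 9000 (jumbo frames) on Linux via the SIOCSIFMTU ioctl. The interface name "eth0" is an assumption; the call needs CAP_NET_ADMIN, and every switch and host on the LAN path must also accept jumbo frames:

#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <sys/socket.h>
#include <net/if.h>

int main(void)
{
    int fd = socket(AF_INET, SOCK_DGRAM, 0);   /* any socket works for the ioctl */
    if (fd < 0) { perror("socket"); return 1; }

    struct ifreq ifr;
    memset(&ifr, 0, sizeof(ifr));
    strncpy(ifr.ifr_name, "eth0", IFNAMSIZ - 1);  /* assumed interface name */
    ifr.ifr_mtu = 9000;                           /* 9K jumbo-frame MTU     */

    if (ioctl(fd, SIOCSIFMTU, &ifr) < 0)       /* same effect as `ip link set eth0 mtu 9000` */
        perror("SIOCSIFMTU");

    close(fd);
    return 0;
}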

Slide 12: Large Segment Offload
- Segmentation engine on the NIC, a.k.a. virtual MTU
  – Introduced by Intel and Broadcom
- Allows the TCP/IP software stack to use 9K or 16K MTUs, reducing software overhead
- Fragmentation is performed by the NIC
- The standard 1500-byte MTU stays on the wire
  – Compatible with upstream switches and routers (see the sketch below)
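A hedged sketch of toggling TCP segmentation offload, the Linux-kernel analogue of the large segment offload described above, using the legacy ETHTOOL_STSO ioctl. "eth0" is again an assumption; modern kernels expose the same knob through feature flags (`ethtool -K eth0 tso on`):

#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <sys/socket.h>
#include <net/if.h>
#include <linux/ethtool.h>
#include <linux/sockios.h>

int main(void)
{
    int fd = socket(AF_INET, SOCK_DGRAM, 0);
    if (fd < 0) { perror("socket"); return 1; }

    /* 1 = enable TSO: the stack hands the NIC large segments and the NIC
     * emits standard 1500-byte frames on the wire. */
    struct ethtool_value ev = { .cmd = ETHTOOL_STSO, .data = 1 };

    struct ifreq ifr;
    memset(&ifr, 0, sizeof(ifr));
    strncpy(ifr.ifr_name, "eth0", IFNAMSIZ - 1);  /* assumed interface name */
    ifr.ifr_data = (char *)&ev;

    if (ioctl(fd, SIOCETHTOOL, &ifr) < 0)
        perror("SIOCETHTOOL");

    close(fd);
    return 0;
}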

Slide 13: Offload Protocol Processing to the NIC
- Handling packets at gigabit speeds requires considerable processing
  – Even with a large MTU
  – This uses CPU time that would otherwise go to the application
- Protocol Offload Engines (POEs) perform communication processing on the NIC
  – E.g., Myrinet, Quadrics, IB
- A TCP Offload Engine (TOE) is a specific kind of POE, transparent to sockets applications (see the sketch below)
  – E.g., Chelsio, NetEffect
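One reason TOEs caught on is that they require no API change: ordinary sockets code like the hedged sketch below runs unmodified, and on a TOE NIC the kernel can steer the connection onto the adapter so that segmentation, checksums, and ACK processing happen in hardware. The address 192.0.2.1 and port 5001 are placeholders:

#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <arpa/inet.h>
#include <sys/socket.h>

int main(void)
{
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0) { perror("socket"); return 1; }

    struct sockaddr_in peer = { .sin_family = AF_INET,
                                .sin_port   = htons(5001) };
    inet_pton(AF_INET, "192.0.2.1", &peer.sin_addr);  /* placeholder peer */

    if (connect(fd, (struct sockaddr *)&peer, sizeof(peer)) == 0) {
        const char msg[] = "hello over (possibly offloaded) TCP";
        /* With a TOE, this send() bypasses most host TCP processing. */
        send(fd, msg, sizeof(msg), 0);
    }
    close(fd);
    return 0;
}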

Slide 14: TOE vs. Non-TOE: Latency
Source: P. Balaji, W. Feng, and D. K. Panda, "Bridging the Ethernet-Ethernot Performance Gap," IEEE Micro Special Issue on High-Performance Interconnects, vol. 26, no. 3, May/June 2006.

Slide 15: TOE vs. Non-TOE: Bandwidth and CPU Utilization

Slide 16: TOE vs. Non-TOE: Bandwidth and CPU Utilization (9K MTU)

Slide 17: Other LAN/WAN Options
- iWARP protocol offload
  – Runs over IP
  – Functionality similar to TCP's
  – Adds RDMA
- Myricom Myri-10G adapter
  – Uses the 10G Ethernet physical layer
  – POE; can handle both TCP/IP and MX
- Mellanox ConnectX adapter
  – Multiple ports, each configurable for IB or Ethernet
  – POE; can handle both TCP/IP and IB
- Convergence in the software stack: OpenFabrics
  – Supports IB and Ethernet adapters
  – Provides a common API to the upper layers (see the sketch below)
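A minimal sketch of that common API: the OpenFabrics verbs library (libibverbs) enumerates and opens RDMA devices identically whether the adapter underneath is an IB HCA or an RDMA-capable Ethernet NIC (iWARP/RoCE). Compile with -libverbs:

#include <stdio.h>
#include <infiniband/verbs.h>

int main(void)
{
    int n = 0;
    struct ibv_device **devs = ibv_get_device_list(&n);
    if (!devs) { perror("ibv_get_device_list"); return 1; }

    for (int i = 0; i < n; i++) {
        struct ibv_context *ctx = ibv_open_device(devs[i]);
        if (!ctx)
            continue;
        /* From here, the same queue-pair and verbs calls apply regardless
         * of the transport the device actually implements. */
        printf("device: %s\n", ibv_get_device_name(devs[i]));
        ibv_close_device(ctx);
    }
    ibv_free_device_list(devs);
    return 0;
}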

Slide 18: Summary
- Clusters can take advantage of high-performance LAN NICs, e.g., InfiniBand
- Grids need interoperability
  – TCP/IP is ubiquitous, but there is a performance gap
- Bridging the gap
  – IB over the WAN
  – POEs for Ethernet
- Alternatives: iWARP, Myricom's Myri-10G, Mellanox ConnectX