Evaluation of ConnectX Virtual Protocol Interconnect for Data Centers
Ryan E. Grant, Ahmad Afsahi, Pavan Balaji

Presentation transcript:

Evaluation of ConnectX Virtual Protocol Interconnect for Data Centers
Ryan E. Grant and Ahmad Afsahi, Department of Electrical and Computer Engineering, Queen's University
Pavan Balaji, Mathematics and Computer Science, Argonne National Laboratory
ICPADS (12/09/2009), Shenzhen, China

Data Centers: Towards a unified network stack
• High End Computing (HEC) systems are proliferating into all domains
  – Scientific computing has been the traditional “big customer”
  – Enterprise computing (large data centers) is increasingly becoming a competitor as well:
      · Google's data centers
      · Oracle's investment in high-speed networking stacks (mainly through DAPL and SDP)
      · Investment from financial institutions such as Credit Suisse in low-latency networks such as InfiniBand
• A change of domain always brings new requirements with it
  – A single unified network stack is the holy grail!
  – Maintaining density and power while achieving high performance

InfiniBand and Ethernet in Data Centers
• Ethernet has been the network of choice for data centers
  – Ubiquitous connectivity to all external clients due to backward compatibility
      · Internal communication, external communication, and management are all unified onto a single network
      · There has also been a push to distribute power over the same channel (using Power over Ethernet), but that is still not a reality
• InfiniBand (IB) in data centers
  – Ethernet is (arguably) lagging behind with respect to some of the features provided by other high-speed networks such as IB
      · Bandwidth (32 Gbps vs. 10 Gbps today) and features (scalability features such as shared queues while using zero-copy communication and RDMA)
  – The point of this paper is not which technology is better, but the fact that data centers are looking for ways to converge the two

Convergence of InfiniBand and Ethernet
• Researchers have been looking at different ways to build a converged InfiniBand/Ethernet fabric:
  – Virtual Protocol Interconnect (VPI)
  – InfiniBand over Ethernet (or RDMA over Ethernet)
  – InfiniBand over Converged Enhanced Ethernet (or RDMA over CEE)
• VPI is the first convergence model introduced by Mellanox Technologies, and it is the focus of this paper

Virtual Protocol Interconnect (VPI)
• Single network firmware to support both IB and Ethernet
• Autosensing of the layer-2 protocol (see the sketch below)
  – Can be configured to automatically work with either IB or Ethernet networks
• Multi-port adapters can use one port on IB and another on Ethernet
• Multiple use modes:
  – Data centers with IB inside the cluster and Ethernet outside
  – Clusters with an IB network and Ethernet management
[Diagram: per-port protocol stacks on a VPI adapter. The IB port carries the IB link, network, and transport layers exposed through IB verbs; the Ethernet port carries the Ethernet link layer with hardware TCP/IP support (IP, TCP) exposed through sockets; both stacks serve the same applications.]
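As an illustration of the autosensing described above, the sketch below (mine, not the authors') queries every port of the first RDMA device and prints whether it came up as InfiniBand or Ethernet. It assumes a libibverbs recent enough to expose the link_layer field of struct ibv_port_attr; on older OFED stacks that field may be absent.

    /* vpi_ports.c: report the sensed link layer of each port on the first
     * RDMA device. Minimal sketch; build with: gcc vpi_ports.c -libverbs */
    #include <stdio.h>
    #include <stdint.h>
    #include <infiniband/verbs.h>

    int main(void)
    {
        int num_devices = 0;
        struct ibv_device **devs = ibv_get_device_list(&num_devices);
        if (!devs || num_devices == 0) {
            fprintf(stderr, "no RDMA devices found\n");
            return 1;
        }

        struct ibv_context *ctx = ibv_open_device(devs[0]);
        if (!ctx) {
            fprintf(stderr, "cannot open %s\n", ibv_get_device_name(devs[0]));
            return 1;
        }

        struct ibv_device_attr dev_attr;
        ibv_query_device(ctx, &dev_attr);

        for (uint8_t port = 1; port <= dev_attr.phys_port_cnt; port++) {
            struct ibv_port_attr pattr;
            if (ibv_query_port(ctx, port, &pattr))
                continue;
            /* On a VPI adapter the same driver reports each port's sensed
             * layer-2 protocol, so one port can be IB and the other Ethernet. */
            printf("port %u: %s\n", port,
                   pattr.link_layer == IBV_LINK_LAYER_ETHERNET   ? "Ethernet" :
                   pattr.link_layer == IBV_LINK_LAYER_INFINIBAND ? "InfiniBand" :
                                                                   "unspecified");
        }

        ibv_close_device(ctx);
        ibv_free_device_list(devs);
        return 0;
    }

On a dual-port ConnectX configured for the mixed use modes above, the expected output is one line per port, for example one reporting InfiniBand and the other Ethernet.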

Goals of this paper
• To understand the performance and capabilities of VPI
• To compare VPI-IB with VPI-Ethernet under different software stacks:
  – OpenFabrics verbs
  – TCP/IP sockets (both traditional and through the Sockets Direct Protocol)
• Detailed studies with micro-benchmarks and an enterprise data center setup

Presentation Roadmap
• Introduction
• Micro-benchmark based Performance Evaluation
• Performance Analysis of Enterprise Data Centers
• Concluding Remarks and Future Work

Software Stack Layout
[Diagram: a sockets application uses the sockets API and can go either through the kernel TCP/IP sockets provider and TCP/IP transport driver, or through the Sockets Direct Protocol with (possible) kernel bypass and RDMA semantics; a verbs application uses the verbs API directly with zero-copy communication. All paths end at the driver for the VPI-capable network adapter, which connects to either Ethernet or InfiniBand.]

Software Stack Layout (details)
• Three software stacks: TCP/IP, SDP, and native verbs
  – VPI-Ethernet can only use TCP/IP
  – VPI-IB can use any one of the three
• TCP/IP and SDP provide transparent portability for existing data center applications over IB (see the sketch after this slide)
  – TCP/IP is more mature (preferable for conservative data centers)
  – SDP can (potentially) provide better performance: it can internally use more of IB's features than TCP/IP, since it natively utilizes IB's hardware-implemented network and transport protocols
      · But it is not as mature: parts of the stack are not as optimized as TCP/IP
• Native verbs is also a possibility, but requires modifications to existing data center applications (studied by Panda's group)
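To make the "transparent portability" point concrete, the sketch below (not from the paper) shows the explicit way a sockets application can be pointed at SDP: opening the socket with the AF_INET_SDP address family, which OFED's SDP module conventionally registers as protocol family 27. The fully transparent alternative the slide refers to preloads libsdp.so, typically with a libsdp.conf policy deciding which connections use SDP, so the application binary is not touched at all. The server address and port here are placeholders.

    /* sdp_connect.c: open a stream socket over SDP instead of kernel TCP/IP.
     * Minimal sketch; AF_INET_SDP (27) is defined as a fallback in case the
     * local headers do not provide it. Everything after socket() is the
     * ordinary BSD sockets code the data center tiers already use. */
    #include <stdio.h>
    #include <string.h>
    #include <unistd.h>
    #include <netinet/in.h>
    #include <arpa/inet.h>
    #include <sys/socket.h>

    #ifndef AF_INET_SDP
    #define AF_INET_SDP 27
    #endif

    int main(void)
    {
        int fd = socket(AF_INET_SDP, SOCK_STREAM, 0);
        if (fd < 0) {
            perror("socket(AF_INET_SDP)");      /* SDP module not loaded? */
            return 1;
        }

        struct sockaddr_in addr;
        memset(&addr, 0, sizeof(addr));
        addr.sin_family = AF_INET;              /* addressing stays IPv4 (an IPoIB address) */
        addr.sin_port = htons(7777);            /* placeholder port */
        inet_pton(AF_INET, "10.0.0.2", &addr.sin_addr);  /* placeholder server IP */

        if (connect(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
            perror("connect");
            close(fd);
            return 1;
        }

        const char msg[] = "hello over SDP\n";
        if (write(fd, msg, sizeof(msg) - 1) < 0)
            perror("write");
        close(fd);
        return 0;
    }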

Experimental Setup
• Four Dell PowerEdge R805 SMP servers
• Each server has two quad-core 2.0 GHz AMD Opteron processors
  – 12 KB instruction cache and 16 KB L1 data cache on each core
  – 512 KB L2 cache for each core
  – 2 MB L3 cache on chip
• 8 GB DDR2 SDRAM on an 1800 MHz memory controller
• Each node has one ConnectX VPI-capable adapter (4X DDR IB and 10 Gbps Ethernet) on a PCIe x8 bus
• Fedora Core 5 (Linux kernel) was used with OFED 1.4
• Compiler: gcc

One-way Latency and Bandwidth
[Results charts]
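For readers unfamiliar with how such numbers are produced: one-way latency in these micro-benchmarks is conventionally reported as half the average round-trip time of a small-message ping-pong. The sketch below is my own illustration of that pattern over plain sockets against a hypothetical echo peer; it is not the authors' benchmark code, which over verbs would post work requests with ibv_post_send/ibv_post_recv instead.

    /* pingpong_client.c: estimate one-way latency as half the average round
     * trip of a 1-byte ping-pong against an echo server. Illustrative only. */
    #include <stdio.h>
    #include <string.h>
    #include <time.h>
    #include <unistd.h>
    #include <netinet/in.h>
    #include <arpa/inet.h>
    #include <sys/socket.h>

    #define ITERS    10000
    #define MSG_SIZE 1               /* 1-byte payload, as in typical latency tests */

    int main(int argc, char **argv)
    {
        const char *server_ip = (argc > 1) ? argv[1] : "10.0.0.2";  /* placeholder */
        char buf[MSG_SIZE] = {0};

        int fd = socket(AF_INET, SOCK_STREAM, 0);
        struct sockaddr_in addr;
        memset(&addr, 0, sizeof(addr));
        addr.sin_family = AF_INET;
        addr.sin_port = htons(7777);               /* placeholder echo-server port */
        inet_pton(AF_INET, server_ip, &addr.sin_addr);
        if (connect(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
            perror("connect");
            return 1;
        }

        struct timespec t0, t1;
        clock_gettime(CLOCK_MONOTONIC, &t0);
        for (int i = 0; i < ITERS; i++) {
            send(fd, buf, MSG_SIZE, 0);            /* ping */
            recv(fd, buf, MSG_SIZE, MSG_WAITALL);  /* pong echoed back by the peer */
        }
        clock_gettime(CLOCK_MONOTONIC, &t1);

        double usec = (t1.tv_sec - t0.tv_sec) * 1e6 + (t1.tv_nsec - t0.tv_nsec) / 1e3;
        printf("one-way latency: %.2f us\n", usec / ITERS / 2.0);

        close(fd);
        return 0;
    }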

Multi-stream Bandwidth
[Results chart]

Simultaneous IB/10GE Communication
[Results chart]

Presentation Roadmap
• Introduction
• Micro-benchmark based Performance Evaluation
• Performance Analysis of Enterprise Data Centers
• Concluding Remarks and Future Work

Data Center Setup
• Three-tier data center
  – Apache 2 web server for static content
  – JBoss 5 application server for server-side Java processing
  – MySQL database system
• Trace workload: TPC-W benchmark, representing a real web-based bookstore
[Diagram: clients reach the Apache web server tier over 10GE; the web (Apache), application (JBoss), and database (MySQL) tiers are interconnected over 10GE, IPoIB, or SDP depending on the configuration.]

Data Center Throughput
[Chart: average data center throughput for each configuration]

Data Center Response Time (Itemized)
[Chart: itemized response times for the 10GE, 10GE/IPoIB, and 10GE/SDP configurations]

Presentation Roadmap
• Introduction
• Micro-benchmark based Performance Evaluation
• Performance Analysis of Enterprise Data Centers
• Concluding Remarks and Future Work

Concluding Remarks
• Increasing push for a converged network fabric
  – Enterprise data centers in HEC: power, density, and performance
• Different convergence technologies are emerging; VPI was one of the first, introduced by Mellanox
• We studied the performance and capabilities of VPI with micro-benchmarks and an enterprise data center setup
  – Performance numbers indicate that VPI can give data centers a reasonable performance boost without overly complicating the network infrastructure
  – What is still needed? Self-adapting switches
      · Current switches do either IB or 10GE, not both
      · On the roadmap for several switch vendors

Future Work
• Improvements to SDP (of course)
• We need to look at other convergence technologies as well
  – RDMA over Ethernet (or CEE) is upcoming
      · Already accepted into the OpenFabrics verbs
      · True convergence with respect to verbs
  – InfiniBand features such as RDMA will automatically migrate to 10GE
  – All of the SDP benefits will translate to 10GE as well

Funding Acknowledgments
• Natural Sciences and Engineering Research Council of Canada
• Canada Foundation for Innovation and Ontario Innovation Trust
• US Office of Advanced Scientific Computing Research (DOE ASCR)
• US National Science Foundation (NSF)
• Mellanox Technologies

Thank you! Contacts: Ryan Grant, Ahmad Afsahi, Pavan Balaji

Backup Slides

Data Center Response Time (itemized)
[Results chart]