Performance Evaluation of Gigabit Ethernet & Myrinet-2000
Presentation transcript:

1 Performance Evaluation of Gigabit Ethernet & Myrinet-2000

2 Performance Evaluation Methodology
Measurement-based evaluation using three classes of benchmarks:
1. Protocol-independent micro-benchmarks (NetPIPE)
2. Message Passing Interface (MPI) micro-benchmarks (SKaMPI)
3. Parallel applications using MPI communication (NAS Parallel Benchmarks)
Each class respectively targets:
1. Raw latency & bandwidth
2. Latency & bandwidth with MPI-added overhead
3. Overall effect on parallel application performance
(A minimal sketch of the kind of ping-pong measurement the micro-benchmarks perform follows.)
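To make the measurement concrete, here is a minimal ping-pong micro-benchmark in the spirit of what NetPIPE and SKaMPI report. This is a hypothetical sketch, not the benchmark code used in the study; the message size, repetition count, and derived latency/bandwidth formulas are illustrative assumptions (real tools sweep a range of message sizes).

    /* pingpong.c - compile with: mpicc pingpong.c -o pingpong
       run with exactly two ranks: mpirun -np 2 ./pingpong        */
    #include <mpi.h>
    #include <stdio.h>
    #include <stdlib.h>

    int main(int argc, char **argv) {
        const int reps = 1000;
        const int size = 1 << 20;            /* 1 MiB message (assumption) */
        char *buf = malloc(size);
        int rank;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        MPI_Barrier(MPI_COMM_WORLD);
        double t0 = MPI_Wtime();
        for (int i = 0; i < reps; i++) {
            if (rank == 0) {        /* send, then wait for the echo */
                MPI_Send(buf, size, MPI_CHAR, 1, 0, MPI_COMM_WORLD);
                MPI_Recv(buf, size, MPI_CHAR, 1, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            } else if (rank == 1) { /* echo everything back */
                MPI_Recv(buf, size, MPI_CHAR, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
                MPI_Send(buf, size, MPI_CHAR, 0, 0, MPI_COMM_WORLD);
            }
        }
        double t1 = MPI_Wtime();

        if (rank == 0) {
            double rtt  = (t1 - t0) / reps;                /* seconds per round trip */
            double mbps = 2.0 * size * 8.0 / (rtt * 1e6);  /* bits per microsecond = Mbps */
            printf("latency %.1f us, bandwidth %.1f Mbps\n", rtt / 2.0 * 1e6, mbps);
        }

        free(buf);
        MPI_Finalize();
        return 0;
    }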

3 Test Setup
- 9-node cluster with 1 NFS node
- Dell 2.2 GHz Intel Xeon nodes
- 3Com 3C996B Gigabit Ethernet NIC (copper)
- Cisco Catalyst T Switch
- Myrinet 1.2 Gbps LANai9 adapter
- Myrinet port crossbar switch

4 Raw Latency & Bandwidth Results
- Myrinet saturates at 1100 Mbps
- Ethernet saturates at 930 Mbps (900 Mbps with interrupt coalescing)
- The Ethernet NIC's interrupt coalescing (IC) feature reduces latency by 30 µs
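For context on the 930 Mbps figure, a back-of-the-envelope check (not from the slides, assuming a standard 1500-byte MTU and 20-byte IP and TCP headers with no options): the theoretical TCP payload ceiling of Gigabit Ethernet is

    \[
      \frac{1500 - 20_{\text{IP}} - 20_{\text{TCP}}}{1500 + 38_{\text{framing}}}
      = \frac{1460}{1538} \approx 0.949
      \quad\Rightarrow\quad
      1000\,\text{Mbps} \times 0.949 \approx 949\,\text{Mbps},
    \]

where the 38 bytes of per-frame framing are preamble (8) + Ethernet header (14) + FCS (4) + interframe gap (12). The measured 930 Mbps is thus within about 2% of the achievable wire rate, i.e., the link is genuinely saturated.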

5 MPI Latency & Bandwidth Results
- TCP-MPI is 50 µs slower than Myrinet
- TCP-ED-MPI is 65 µs slower than Myrinet

6 NAS Parallel Benchmark Results
- Name format: benchmark.data-size.processors (e.g., cg.B.8 denotes the CG kernel, class B data size, on 8 processors)
- Runtimes normalized to Myrinet runtimes

7 Analysis of Results
- Simple micro-benchmarks show that Myrinet consistently enables lower latency and higher bandwidth
- The MPI library using TCP-ED messaging outperforms or matches the Myrinet library version on 6 of 15 benchmark configurations
- TCP-ED outperforms Myrinet by effectively overlapping communication and computation (see the sketch below)
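The overlap effect can be illustrated with a hypothetical nonblocking-MPI pattern; this is not code from the study, and the variable names are invented for illustration. The exchange is posted first, independent computation proceeds while the messages are in flight, and the program blocks only when the data is needed. Whether the overlap actually happens depends on the MPI library's ability to progress messages asynchronously, which is exactly what an interrupt/event-driven (ED) TCP stack provides.

    /* overlap.c - compile with: mpicc overlap.c -o overlap
       run with exactly two ranks: mpirun -np 2 ./overlap         */
    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv) {
        enum { N = 4096, M = 1 << 16 };    /* sizes are illustrative */
        static double halo_send[N], halo_recv[N], interior[M];
        MPI_Request reqs[2];
        int rank;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        int peer = 1 - rank;               /* assumes exactly 2 ranks */

        /* Post the communication first... */
        MPI_Irecv(halo_recv, N, MPI_DOUBLE, peer, 0, MPI_COMM_WORLD, &reqs[0]);
        MPI_Isend(halo_send, N, MPI_DOUBLE, peer, 0, MPI_COMM_WORLD, &reqs[1]);

        /* ...then compute on data that does not depend on the transfer.
           If the library progresses messages in the background, this
           loop hides the communication time. */
        for (int i = 0; i < M; i++)
            interior[i] = 0.5 * (interior[i] + interior[(i + 1) % M]);

        /* Block only when the incoming halo data is actually needed. */
        MPI_Waitall(2, reqs, MPI_STATUSES_IGNORE);

        if (rank == 0) printf("exchange complete\n");
        MPI_Finalize();
        return 0;
    }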

8 Conclusions
- Optimized TCP/IP can, in some cases, outperform the raw performance of Myrinet
- Optimized MPI libraries with OS support can achieve better performance than MPI over user-level libraries such as Myrinet GM
- Gigabit Ethernet can serve as a cost-effective cluster computing solution IF aggressive TCP/IP optimizations are implemented