Protocols and software for exploiting Myrinet clusters Congduc Pham and the main contributors P. Geoffray, L. Prylli, B. Tourancheau, R. Westrelin.



Parallel machines and clusters (examples: Cplant, standalone workstations)

Pros for clusters
- Large supercomputers are expensive and have a short useful life span.
- The performance of workstations and PCs is improving rapidly.
- Communication bandwidth between workstations is increasing as new networking technologies and protocols are deployed in LANs and WANs.
- Workstation clusters are easier to integrate into existing networks than special-purpose parallel computers.
- Using clusters of workstations as a distributed computing resource is very cost effective, and allows incremental growth or update of the system.

No polemical discussion, just a statement… (Chart from R. Buyya: the evolution of computing platforms since 1984, from mainframes, vector supercomputers, and minicomputers to workstations and PCs.) Cluster interconnects include GigaEthernet, Giganet, SCI, Myrinet, …

The Myrinet technology
- Switch: full crossbar, wormhole source routing, small latency.
- Network interface: embedded programmable RISC processor, local memory, several DMA engines.
Current specifications: up to 200 MHz processor, up to 8 MB local memory, 64-bit/66 MHz PCI bus (528 MB/s peak), 250 MB/s full-duplex links.

The raw performance is here, but… traditional communication software fails to bring the hardware performance to the applications. (Car analogy: Myrinet hardware delivers the equivalent of 200 mph; traditional communication layers reach only 35-40 mph of it, while optimized communication layers reach 175-180 mph.)

Going faster by taking shortcuts

Our communication architecture
- Provides a complete suite for high-performance communications, with a focus on Myrinet-based clusters.
- Organized as layers (Myrinet physical layer, BIP, BIP-SMP, MPI-BIP), but bypasses the OS as much as possible.
- Programmable NICs break the traditional spatial distribution of tasks.

BIP, the lowest protocol level
- Basic Interface for Parallelism: a very basic API provided as a library, a kernel module, and an MCP (Myrinet Control Program); definitely not for the end user.
- Optimizes for latency, maximum throughput, and how quickly throughput rises with message size.
- The implementation performs: reduction of the data critical path; distinction between small and large messages; bursts or write combining for host-to-NIC transfers; optimal cache usage; cache snooping for NIC-to-host transfers (monitoring of the PCI bus); buffer alignment; optimal fragment size; …
(Layer diagram: Myrinet, BIP, BIP-SMP, MPI-BIP.)
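The small/large distinction above can be pictured as a simple dispatch on message size. This is only a sketch: the threshold value and function names are hypothetical, not BIP's actual API.

```python
# Illustrative dispatch between the two BIP transfer strategies.
# SMALL_MSG_MAX is an assumed constant; the real cutoff is an
# implementation-tuned value, not documented here.

SMALL_MSG_MAX = 1024  # bytes (assumption, for illustration only)

def choose_strategy(size: int) -> str:
    """PIO for small messages (lowest latency), DMA for large ones
    (highest bandwidth, offloads the host CPU)."""
    return "pio" if size <= SMALL_MSG_MAX else "dma"

print(choose_strategy(64))     # pio
print(choose_strategy(65536))  # dma
```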

BIP, small message strategy
- Avoids handshakes between the host and the NIC.
- Uses PIO (programmed I/O) to a NIC FIFO on the sending side, and an extra memory copy on the receiving side.
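A toy model of this path, assuming nothing beyond what the slide states: the host writes the payload straight into a NIC FIFO with no handshake, and the receiver performs the one extra copy into the user buffer. A Python deque stands in for the hardware FIFO; all names are illustrative.

```python
from collections import deque

nic_fifo = deque()  # stands in for the NIC's send FIFO (PIO target)

def pio_send(payload: bytes) -> None:
    # One PIO burst into the FIFO; no handshake, no send-side DMA setup.
    nic_fifo.append(bytes(payload))

def recv_small(user_buf: bytearray) -> int:
    # The single extra memory copy happens on the receive side.
    frame = nic_fifo.popleft()
    user_buf[:len(frame)] = frame
    return len(frame)

pio_send(b"hello")
buf = bytearray(16)
n = recv_small(buf)
print(bytes(buf[:n]))  # b'hello'
```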

BIP, large message strategy
- Uses DMA on both the send side and the receive side: higher bandwidth, offloads the CPU.
- Zero-copy mechanism, pipelined transmission.
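The pipelining relies on cutting the message into fragments so that send-side DMA, the wire transfer, and receive-side DMA overlap. A minimal sketch of the fragmentation and reassembly, where `memoryview` slices stand in for zero-copy references and the fragment size is an assumed constant (the real value is tuned per platform):

```python
FRAG_SIZE = 4096  # bytes; illustrative assumption, not BIP's actual value

def fragments(msg: bytes):
    # memoryview gives zero-copy slices of the user buffer.
    view = memoryview(msg)
    for off in range(0, len(msg), FRAG_SIZE):
        yield view[off:off + FRAG_SIZE]

def pipelined_send(msg: bytes) -> bytes:
    # The receiver reassembles each fragment directly into its final
    # destination buffer; in hardware, fragment k transfers on the wire
    # while fragment k+1 is being DMA'd out of host memory.
    dst = bytearray(len(msg))
    off = 0
    for frag in fragments(msg):
        dst[off:off + len(frag)] = frag
        off += len(frag)
    return bytes(dst)

data = bytes(range(256)) * 64  # a 16 KB message -> 4 fragments
assert pipelined_send(data) == data
```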

BIP-SMP: a low-level layer for SMP machines
- SMPs (2 or 4 processors) are viewed as the architectures with the best performance/price ratio.
- BIP-SMP provides: management of concurrent accesses to the NIC, low-latency intra-node communications, inter-node communication equivalent to BIP, and total transparency for applications and end users.

BIP-SMP: Moving data between processes
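The transparency claim above boils down to one dispatch: the same send call goes through a shared-memory segment when both processes are on the same node, and through the Myrinet NIC otherwise. A toy illustration (all names and the dict/list stand-ins are hypothetical, not BIP-SMP's data structures):

```python
shm_segments = {}  # (node, dst_pid) -> payload; stands in for a shared-memory mailbox
nic_queue = []     # stands in for the NIC send queue (regular BIP path)

def send(src_node: int, dst_node: int, dst_pid: int, payload: bytes) -> str:
    """Route transparently: same node -> shared memory, else -> NIC."""
    if src_node == dst_node:
        shm_segments[(dst_node, dst_pid)] = payload  # intra-node: memory copy only
        return "shm"
    nic_queue.append((dst_node, dst_pid, payload))   # inter-node: BIP over Myrinet
    return "nic"

print(send(0, 0, 42, b"local"))   # shm
print(send(0, 1, 7, b"remote"))   # nic
```

The application calls the same `send` either way; which path was taken is invisible to it, which is the point of the design.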

MPI-BIP: the communication middleware
- MPI-BIP adds high-level features on top of BIP.
- Based on the MPICH implementation; provides a portable and widely used API.
- Implements credit-based flow control for small messages.
- Maintains a request FIFO for multiple non-blocking operations.
- Provides segmentation/reassembly to avoid timeouts.
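Credit-based flow control can be simulated in a few lines: the sender spends one credit per small message and stalls at zero, and the receiver hands credits back once it has drained its buffers. The class name and credit count are illustrative, not MPI-BIP internals.

```python
class CreditedChannel:
    """Minimal simulation of credit-based flow control for small messages."""

    def __init__(self, credits: int = 4):
        self.credits = credits  # how many messages the receiver can buffer
        self.inflight = []      # messages sitting in the receiver's buffers

    def try_send(self, msg) -> bool:
        if self.credits == 0:
            return False        # would overrun the receiver: sender must wait
        self.credits -= 1
        self.inflight.append(msg)
        return True

    def receiver_drain(self) -> int:
        # The receiver consumes its buffered messages and returns the
        # freed credits to the sender.
        freed = len(self.inflight)
        self.inflight.clear()
        self.credits += freed
        return freed

ch = CreditedChannel(credits=2)
assert ch.try_send("a") and ch.try_send("b")
assert not ch.try_send("c")  # out of credits: sender stalls
ch.receiver_drain()
assert ch.try_send("c")      # credits returned, sending resumes
```

The design guarantees the receiver never drops a small message for lack of buffer space, without any per-message handshake on the critical path.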

Working with the BIP software suite
- Installation: run configure.
- Compilation and linkage: several libraries (bip, bip-smp, mpi); compile with bipcc.
- Submitting jobs and monitoring nodes: run myristat to see which nodes are available, run bipconf to configure the virtual machine, and use bipload to launch programs.

WebCM: a high-level management tool
- A web-based management tool.
- Integrates existing solutions into a common framework.

The WebCM user interface
- Graphical interface for myristat and bipconf.
- Allows submission of jobs through batch packages.
- Shows the user's virtual machine definition and the user's running processes.
- Functionality is added by incorporating new software packages.

Latency: BIP and MPI-BIP

Throughput: BIP and MPI-BIP

BIP-SMP: intra-node communications

BIP-SMP: inter-node communications

What runs on our clusters?
- Genomic simulation
- Fluid dynamics
- Discrete-event parallel simulation
- Distributed shared memory system
- Want to know more? Get the distribution and the documentation.