Protocols and software for exploiting Myrinet clusters
Congduc Pham and the main contributors: P. Geoffray, L. Prylli, B. Tourancheau, R. Westrelin
Parallel machines and clusters
[Figure: Cplant; standalone workstation]
Pros for clusters
• Large supercomputers are expensive and have a short useful life span
• The performance of workstations and PCs is improving rapidly
• The communication bandwidth between workstations is increasing as new networking technologies and protocols are deployed in LANs and WANs
• Workstation clusters are easier to integrate into existing networks than special-purpose parallel computers
• Using clusters of workstations as a distributed computing resource is very cost-effective: the system can be grown or upgraded incrementally
No polemical discussion, just a statement…
[Figure, from R. Buyya: evolution from mainframe, vector supercomputer and minicomputer to workstation and PC (timeline labeled from 1984); cluster interconnects: Gigabit Ethernet, Giganet, SCI, Myrinet, …]
The Myrinet technology
• Switch
  – full crossbar
  – wormhole source routing
  – low latency
• Network interface
  – embedded RISC processor
  – programmable
  – local memory
  – several DMA engines
Current specifications:
  – up to 200 MHz processor
  – up to 8 MB local memory
  – 64-bit/66 MHz PCI bus (528 MB/s peak)
  – 250 MB/s full-duplex links
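The PCI peak figure above follows from the bus width and clock. A quick check also shows that a 250 MB/s full-duplex link carrying traffic in both directions would need almost the entire theoretical PCI peak. Sustained DMA rates are, in practice, typically lower than that peak.

$$
B_{\text{PCI}} = \frac{64\ \text{bit}}{8\ \text{bit/byte}} \times 66\ \text{MHz} = 8\ \text{B} \times 66 \times 10^{6}\ \text{s}^{-1} = 528\ \text{MB/s}
$$

$$
B_{\text{link, both directions}} = 2 \times 250\ \text{MB/s} = 500\ \text{MB/s} \approx 0.95\, B_{\text{PCI}}
$$

With these numbers, the host I/O bus is at least as tight a constraint as the network link. This is one reason the communication software has to use the bus as efficiently as possible.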
The raw performance is here, but…
• traditional communication software fails to deliver the hardware performance to applications
[Figure: speed analogy comparing Myrinet hardware, traditional communication layers and optimized communication layers; labels: 200 mph, 40 mph, 180 mph, 35 mph, 175 mph]
Going faster by taking shortcuts
Our communication architecture
• Provides a complete suite for high-performance communication, focused on Myrinet-based clusters
• Viewed as layers, but bypasses the OS as much as possible
[Figure: layer stack: Myrinet physical layer, BIP, BIP-SMP, MPI-BIP]
Programmable NICs break the traditional spatial distribution of tasks
BIP, the lowest protocol level
• Basic Interface for Parallelism
  – very basic API
  – provides a library, a kernel module and an MCP (Myrinet Control Program, the firmware run by the NIC processor)
  – definitely not for the end user
• Optimized for
  – latency
  – maximum throughput
  – how quickly throughput rises with message size
• The implementation performs
  – reduction of the data critical path
  – distinction between small and large messages
  – burst or write-combining transfers for host → NIC
  – optimal cache usage
  – cache snooping for NIC → host (monitoring of the PCI bus)
  – buffer alignment
  – optimal fragment size…
[Figure: layer stack: Myrinet, BIP, BIP-SMP, MPI-BIP]
BIP, small message strategy
• Avoids handshakes between the host and the NIC
• Uses PIO into a NIC FIFO on the sending side and an extra memory copy on the receiving side
BIP, large message strategy
• Uses DMA on both the send and receive sides: higher bandwidth, and the CPU is offloaded
• Zero-copy mechanism, pipelined transmission
BIP-SMP: a low level for SMP machines
• SMPs (2 or 4 processors) are seen as the architectures with the best performance/price ratio
• BIP-SMP provides
  – management of concurrent accesses to the NIC
  – low-latency intra-node communication
  – inter-node communication equivalent to BIP
  – total transparency for applications and end users
BIP-SMP: Moving data between processes
MPI-BIP: the communication middleware
• MPI-BIP adds high-level features to BIP
  – based on the MPICH implementation
  – provides a portable and widely used API
  – implements credit-based flow control for small messages
  – uses a request FIFO for multiple non-blocking operations
  – provides segmentation/reassembly to avoid timeouts
Working with the BIP software suite
• Installation
  – run configure
• Compilation and linking
  – several libraries: bip, bip-smp, mpi
  – compile with bipcc
• Submitting jobs and monitoring nodes
  – run myristat to see which nodes are available
  – run bipconf to configure the virtual machine
  – use bipload to launch programs
WebCM: a high-level management tool
• web-based management tool
• integrates existing solutions into a common framework
The WebCM user interface
• graphical interface for myristat and bipconf
• allows jobs to be submitted through batch packages
• shows the user's virtual machine definition and running processes
• new functionality is added by incorporating new software packages
Latency: BIP and MPI-BIP
Throughput: BIP and MPI-BIP
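Latency and throughput plots like these are often read with the simple linear cost model usually attributed to Hockney. It is included here only as an interpretation aid, not as a fit to the measured BIP or MPI-BIP curves. In the model, $t_0$ is the zero-byte latency, $r_\infty$ the asymptotic throughput and $n$ the message size.

$$
T(n) = t_0 + \frac{n}{r_\infty}
\qquad
r(n) = \frac{n}{T(n)} = \frac{r_\infty}{1 + \dfrac{t_0\, r_\infty}{n}}
$$

$$
n_{1/2} = t_0\, r_\infty \quad\text{since}\quad r(n_{1/2}) = \frac{r_\infty}{1 + 1} = \frac{r_\infty}{2}
$$

A small $n_{1/2}$ means throughput rises quickly with message size. This is one reasonable reading of the "how quickly throughput rises" goal among BIP's optimizations. Lowering $t_0$ improves both the latency curve and $n_{1/2}$.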
BIP-SMP: intra-node communications
BIP-SMP: inter-node communications
What runs on our clusters?
• Genomic simulation
• Fluid dynamics
• Parallel discrete event simulation
• Distributed shared memory systems
• Want to know more?
  – getting the distribution
  – getting the documentation