Open MPI on the Cray XT
Presented by Richard L. Graham
Tech Integration, National Center for Computational Sciences
Why does Open MPI exist?
- Maximize all MPI expertise: research/academia, industry, …elsewhere.
- Capitalize on (literally) years of MPI research and implementation experience.
- The sum is greater than the parts.
[Figure: overlapping circles of research/academia and industry contributions]
Current membership
- 14 members, 6 contributors:
  - 4 US DOE labs
  - 7 vendors
  - 8 universities
  - 1 individual
Key design feature: components
- Formalized interfaces specify a "black box" implementation
- Different implementations available at run time
- Can compose different systems on the fly
[Figure: a caller selecting among Interface 1, Interface 2, and Interface 3]
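A minimal sketch of the component pattern in C (hypothetical names, not Open MPI's actual MCA declarations): each implementation fills in the same table of function pointers, and the caller picks one by name at run time without knowing what is behind it.

    #include <stddef.h>
    #include <string.h>

    /* Hypothetical component interface: every implementation provides
     * the same entry points behind the same "black box" contract. */
    typedef struct btl_component {
        const char *name;                          /* e.g. "tcp", "openib" */
        int (*init)(void);                         /* open the component   */
        int (*send)(const void *buf, size_t len);  /* component's send     */
    } btl_component_t;

    static int tcp_init(void) { return 0; }
    static int tcp_send(const void *buf, size_t len) { (void)buf; (void)len; return 0; }
    static int ib_init(void)  { return 0; }
    static int ib_send(const void *buf, size_t len)  { (void)buf; (void)len; return 0; }

    static btl_component_t components[] = {
        { "tcp",    tcp_init, tcp_send },
        { "openib", ib_init,  ib_send  },
    };

    /* Run-time composition: choose an implementation by name. */
    static btl_component_t *select_component(const char *name)
    {
        for (size_t i = 0; i < sizeof components / sizeof components[0]; i++)
            if (strcmp(components[i].name, name) == 0)
                return &components[i];
        return NULL;
    }

    int main(void)
    {
        btl_component_t *c = select_component("tcp");
        if (c && c->init() == 0)
            c->send("hi", 2);
        return 0;
    }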
Point-to-point architecture
- OB1/DR path: MPI → PML (OB1 or DR) → BML-R2 → BTLs, each with its memory pool and registration cache (BTL-GM + MPool-GM + Rcache; BTL-OpenIB + MPool-OpenIB + Rcache)
- CM path: MPI → PML-CM → MTLs (MTL-MX for Myrinet, MTL-Portals, MTL-PSM for QLogic)
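Which stack is used is a launch-time choice; application code is unchanged. For illustration, a minimal round-trip program that runs over either path, using only standard MPI calls (the mpirun lines in the comment use Open MPI's --mca selection flags):

    #include <mpi.h>
    #include <stdio.h>

    /* The same source runs unchanged over OB1+BTL or CM+MTL; the PML
     * is picked at launch time, e.g.:
     *   mpirun --mca pml ob1 ./a.out
     *   mpirun --mca pml cm  ./a.out */
    int main(int argc, char **argv)
    {
        int rank, token = 0;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        if (rank == 0) {
            MPI_Send(&token, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
            MPI_Recv(&token, 1, MPI_INT, 1, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            printf("round trip complete\n");
        } else if (rank == 1) {
            MPI_Recv(&token, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            MPI_Send(&token, 1, MPI_INT, 0, 0, MPI_COMM_WORLD);
        }
        MPI_Finalize();
        return 0;
    }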
Portals port: OB1 vs. CM

OB1
- Matching in main memory
- Short message: eager; buffered on the receive side
- Long message: rendezvous
  - Rendezvous packet carries a 0-byte payload
  - Data is fetched with get() after the match

CM
- Matching may happen on the NIC
- Short message: eager; buffered on the receive side
- Long message: eager; all data is sent
  - On match: delivered directly into the user buffer
  - No match: payload discarded; user data fetched with get() after the match
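A schematic contrast of the two send-side policies (hypothetical helper names and threshold; the real protocol engines are considerably more involved):

    #include <stddef.h>
    #include <stdio.h>

    #define EAGER_LIMIT 4096  /* assumed eager/rendezvous cutoff; tunable in practice */

    /* Stubs standing in for the transport; hypothetical, for shape only. */
    static void send_eager(const void *buf, size_t len) { (void)buf; printf("eager %zu\n", len); }
    static void send_rendezvous_header(void)            { printf("rndv header, 0-byte payload\n"); }
    static void send_all(const void *buf, size_t len)   { (void)buf; printf("eager %zu (any size)\n", len); }

    /* OB1: matching happens in main memory on the receive side. */
    static void ob1_send(const void *buf, size_t len)
    {
        if (len <= EAGER_LIMIT)
            send_eager(buf, len);      /* receiver buffers until matched */
        else
            send_rendezvous_header();  /* payload fetched with get() after the match */
    }

    /* CM over Portals: matching can happen on the NIC, so everything goes eagerly. */
    static void cm_send(const void *buf, size_t len)
    {
        send_all(buf, len);
        /* matched:   lands directly in the user buffer
         * unmatched: payload discarded; receiver get()s it after the match */
    }

    int main(void)
    {
        char msg[8192] = {0};
        ob1_send(msg, 64);
        ob1_send(msg, sizeof msg);
        cm_send(msg, sizeof msg);
        return 0;
    }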
Collective communications component structure
MPI Component Architecture (MCA), top-down:
- User application
- MPI API
- Frameworks and their components:
  - PML: OB1, CM, DR, CRCPW
  - BTL: TCP, Shared Mem., InfiniBand
  - MTL: Myrinet MX, Portals, PSM
  - Collective: Basic, Tuned, Hierarchical, Intercomm., Shared Mem., Non-blocking
  - Allocator: Basic, Bucket
  - Topology: Basic, Utility
  - I/O: Portals
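From the application's point of view, the collective framework is invisible: one MPI call, and the selected component (basic, tuned, hierarchical, shared-memory, …) supplies the algorithm behind it. A minimal example:

    #include <mpi.h>
    #include <stdio.h>

    /* The coll framework chooses the Allreduce algorithm at run time;
     * this user code is identical no matter which component runs. */
    int main(int argc, char **argv)
    {
        int rank, local, global;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        local = rank + 1;
        MPI_Allreduce(&local, &global, 1, MPI_INT, MPI_SUM, MPI_COMM_WORLD);
        if (rank == 0)
            printf("sum = %d\n", global);
        MPI_Finalize();
        return 0;
    }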
Benchmark Results
NetPIPE bandwidth
[Figure: bandwidth (MBytes/sec) vs. data size (KBytes) for Open MPI—CM, Open MPI—OB1, and Cray MPI]
Zero-byte ping-pong latency
- Open MPI—CM: 4.91 µsec
- Open MPI—OB1: … µsec
- Cray MPI: 4.78 µsec
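Latencies like these are typically measured with a ping-pong loop of the following shape (a sketch, not the actual benchmark harness used for the slide): time many zero-byte round trips between two ranks, then report half the mean round-trip time as the one-way latency.

    #include <mpi.h>
    #include <stdio.h>

    /* Zero-byte ping-pong latency sketch. */
    int main(int argc, char **argv)
    {
        const int iters = 10000;
        int rank;
        double t0, t1;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        MPI_Barrier(MPI_COMM_WORLD);
        t0 = MPI_Wtime();
        for (int i = 0; i < iters; i++) {
            if (rank == 0) {
                MPI_Send(NULL, 0, MPI_BYTE, 1, 0, MPI_COMM_WORLD);
                MPI_Recv(NULL, 0, MPI_BYTE, 1, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            } else if (rank == 1) {
                MPI_Recv(NULL, 0, MPI_BYTE, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
                MPI_Send(NULL, 0, MPI_BYTE, 0, 0, MPI_COMM_WORLD);
            }
        }
        t1 = MPI_Wtime();

        /* Half the average round-trip time, reported in microseconds. */
        if (rank == 0)
            printf("one-way latency: %.2f usec\n",
                   (t1 - t0) / iters / 2.0 * 1e6);
        MPI_Finalize();
        return 0;
    }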
VH-1: total runtime
[Figure: VH-1 wall-clock time (sec) vs. log2 processor count for Open MPI—CM, Open MPI—OB1, and Cray MPI]
GTC: total runtime
[Figure: GTC wall-clock time (sec) vs. log2 processor count for Open MPI—CM, Open MPI—OB1, and Cray MPI]
POP: time-step runtime
[Figure: POP time-step wall-clock time (sec) vs. log2 processor count for Open MPI—CM, Open MPI—OB1, and Cray MPI]
Summary and future directions
- Support for the XT (Catamount and Compute Node Linux) is in the standard distribution
- Performance (application and micro-benchmarks) is comparable to that of Cray MPI
- Support for recovery from process failure is being added
Contact
Richard L. Graham
Tech Integration, National Center for Computational Sciences
(865)