
1 Open MPI - A High Performance Fault Tolerant MPI Library
Richard L. Graham, Advanced Computing Laboratory, Group Leader (acting)

2 Overview
- Open MPI collaboration
- MPI
- Run-time
- Future directions

3 Collaborators
- Los Alamos National Laboratory (LA-MPI)
- Sandia National Laboratories
- Indiana University (LAM/MPI)
- The University of Tennessee (FT-MPI)
- High Performance Computing Center, Stuttgart (PACX-MPI)
- University of Houston
- Cisco Systems
- Mellanox
- Voltaire
- Sun
- Myricom
- IBM
- QLogic
URL: www.open-mpi.org

4 A Convergence of Ideas
[Diagram: LAM/MPI (IU), LA-MPI (LANL), FT-MPI (U of TN), PACX-MPI (HLRS), and robustness work (CSU) converge into Open MPI; fault detection (LANL, industry), grid computing (many), autonomous computing (many), and FDDP (semiconductor manufacturing industry) feed into OpenRTE; together they target resilient computing systems.]

5 Components
Formalized interfaces:
- Specify a "black box" implementation
- Different implementations available at run-time
- Can compose different systems on the fly
[Diagram: a caller invoking Interface 1, Interface 2, and Interface 3 through the same formalized interface.]
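
As a minimal sketch of this component pattern in C (the names component_t, select_component, and the tcp/shm implementations here are illustrative only, not the actual Open MPI MCA interfaces), a formalized interface can be a table of function pointers, with the concrete implementation chosen by name at run-time:

    /* Hypothetical "black box" interface: a table of function pointers.
       The caller programs against the interface, never a specific
       implementation. */
    #include <stdio.h>
    #include <string.h>

    typedef struct {
        const char *name;
        int (*init)(void);
        int (*send)(const void *buf, size_t len, int peer);
    } component_t;

    /* Two interchangeable implementations of the same interface. */
    static int tcp_init(void) { return 0; }
    static int tcp_send(const void *buf, size_t len, int peer) {
        printf("tcp: %zu bytes to peer %d\n", len, peer); return 0;
    }
    static int shm_init(void) { return 0; }
    static int shm_send(const void *buf, size_t len, int peer) {
        printf("shm: %zu bytes to peer %d\n", len, peer); return 0;
    }

    static component_t components[] = {
        { "tcp", tcp_init, tcp_send },
        { "shm", shm_init, shm_send },
    };

    /* Select an implementation by name at run-time. */
    static component_t *select_component(const char *name) {
        for (size_t i = 0; i < sizeof components / sizeof *components; i++)
            if (strcmp(components[i].name, name) == 0)
                return &components[i];
        return NULL;
    }

    int main(void) {
        component_t *c = select_component("shm");
        if (c && c->init() == 0)
            c->send("hello", 5, 1);
        return 0;
    }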

6 Performance Impact

7 MPI

8 Two-Sided Communications
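
The latency figures on the following slides are typically measured with a two-sided ping-pong of exactly this kind; a minimal sketch (warm-up iterations and error checking omitted):

    /* Ping-pong between ranks 0 and 1; half the average round-trip time
       is the usual one-way latency figure. Build with mpicc, run with
       2 ranks. */
    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv) {
        enum { ITERS = 1000, SIZE = 8 };
        char buf[SIZE];
        int rank;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        double t0 = MPI_Wtime();
        for (int i = 0; i < ITERS; i++) {
            if (rank == 0) {
                MPI_Send(buf, SIZE, MPI_CHAR, 1, 0, MPI_COMM_WORLD);
                MPI_Recv(buf, SIZE, MPI_CHAR, 1, 0, MPI_COMM_WORLD,
                         MPI_STATUS_IGNORE);
            } else if (rank == 1) {
                MPI_Recv(buf, SIZE, MPI_CHAR, 0, 0, MPI_COMM_WORLD,
                         MPI_STATUS_IGNORE);
                MPI_Send(buf, SIZE, MPI_CHAR, 0, 0, MPI_COMM_WORLD);
            }
        }
        double t1 = MPI_Wtime();

        if (rank == 0)
            printf("latency: %.2f usec\n",
                   (t1 - t0) / ITERS / 2.0 * 1e6);
        MPI_Finalize();
        return 0;
    }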

9 P2P Component Frameworks

10 Shared Memory - Bandwidth
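
Bandwidth curves like the ones plotted on these slides are typically produced by streaming large messages and dividing bytes moved by elapsed time; a minimal sketch, assuming two ranks and omitting warm-up:

    /* Streaming bandwidth: rank 0 sends ITERS large messages, rank 1
       acks once at the end so the sender's clock covers full delivery. */
    #include <mpi.h>
    #include <stdio.h>
    #include <stdlib.h>

    int main(int argc, char **argv) {
        enum { ITERS = 100 };
        const int size = 1 << 20;              /* 1 MiB per message */
        char *buf = malloc(size);
        int rank, ack = 0;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        double t0 = MPI_Wtime();
        if (rank == 0) {
            for (int i = 0; i < ITERS; i++)
                MPI_Send(buf, size, MPI_CHAR, 1, 0, MPI_COMM_WORLD);
            MPI_Recv(&ack, 1, MPI_INT, 1, 1, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
            double sec = MPI_Wtime() - t0;
            printf("bandwidth: %.1f MB/sec\n",
                   (double)size * ITERS / sec / 1e6);
        } else if (rank == 1) {
            for (int i = 0; i < ITERS; i++)
                MPI_Recv(buf, size, MPI_CHAR, 0, 0, MPI_COMM_WORLD,
                         MPI_STATUS_IGNORE);
            MPI_Send(&ack, 1, MPI_INT, 0, 1, MPI_COMM_WORLD);
        }
        free(buf);
        MPI_Finalize();
        return 0;
    }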

11 Shared Memory - Latency

12 IB Performance - Latency (usec)

    Message Size   Latency - Open MPI   Latency - MVAPICH
    0              3.09                 9.6 (anomaly?)
    1              3.48                 3.09
    32             3.60                 3.30
    128            4.48                 4.16
    2048           7.93                 8.67
    8192           15.72                22.86
    16384          27.14                29.37

13 IB Performance - Bandwidth

14 GM Performance Data - Ping-Pong Latency (usec)

    Data Size   Open MPI   MPICH-GM
    0 Byte      8.13       8.07
    8 Byte      8.32       8.22
    64 Byte     8.68       8.65
    256 Byte    12.52      12.11

15 GM Performance Data - Ping-Pong Latency (usec) - Data FT

    Data Size   Open MPI - OB1   Open MPI - FT   LA-MPI - FT
    0 Byte      5.24             8.65            9.2
    8 Byte      5.50             8.67            9.26
    64 Byte     6.00             9.07            9.45
    256 Byte    8.52             13.01           13.54

16 GM Performance Data - Ping-Pong Bandwidth

17 MX Ping-Pong Latency (usec)

    Message Size   Open MPI - MTL   MPICH - MX
    0              3.14             2.87
    8              3.22             2.89
    64             3.91             3.6
    256            5.76             5.25

18 MX Performance Data - Ping-Pong Bandwidth (MB/sec)

19 XT3 Performance - Latency

    Implementation   1-Byte Latency
    Native Portals   5.30 us
    MPICH-2          7.14 us
    Open MPI         8.50 us

20 XT3 Performance - Bandwidth

21 Collective Operations

22 MPI Reduce - Performance

23 MPI Broadcast - Performance
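
For reference, the reduce and broadcast operations timed on the preceding slides follow the standard MPI calling pattern; a minimal sketch:

    /* Every rank contributes a value; rank 0 receives the sum, then
       broadcasts the result back to all ranks. */
    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv) {
        int rank, sum = 0, value;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        value = rank;

        /* MPI_Reduce: combine 'value' from all ranks into 'sum' on root 0. */
        MPI_Reduce(&value, &sum, 1, MPI_INT, MPI_SUM, 0, MPI_COMM_WORLD);

        /* MPI_Bcast: root 0 distributes 'sum' to every rank. */
        MPI_Bcast(&sum, 1, MPI_INT, 0, MPI_COMM_WORLD);

        printf("rank %d: sum = %d\n", rank, sum);
        MPI_Finalize();
        return 0;
    }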

24 MPI Reduction - II

25 Open RTE

26 Open RTE - Design Overview
Seamless, transparent environment for high-performance applications:
- Inter-process communications within and across cells
- Distributed publish/subscribe registry
- Supports event-driven logic across applications and cells
- Persistent, fault tolerant
- Dynamic "spawn" of processes and applications, both within and across cells
[Diagram: cells ranging from a single computer to a cluster to a grid.]

27 Open RTE - Components
[Diagram: component stack spanning single computer, cluster, and grid cells.]

28 General Purpose Registry
Cached, distributed storage/retrieval system:
- All common data types plus user-defined
- Heterogeneity between storing process and recipient automatically resolved
Publish/subscribe:
- Supports event-driven coordination and notification
- Subscribe to individual data elements, groups of elements, or wildcard collections
- Specify actions that trigger notifications

29 Subscription Services
Subscribe to a container and/or keyval entry:
- Subscriptions can be entered before the data arrives
- Specifies the data elements to be monitored
  - Container tokens and/or data keys
  - Wildcards supported
- Specifies the action that generates an event
  - Data entered, modified, or deleted
  - Number of matching elements equals, exceeds, or falls below a specified level
  - Number of matching elements transitions (increases/decreases) through a specified level
Events generate a message to the subscriber:
- Includes the specified data elements
- Asynchronously delivered to a specified callback function on the subscribing process
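
To make the subscription model concrete, a short sketch in C of the kind of API the slide describes; every name in it (registry_subscribe, event_t, on_change) is hypothetical and illustrates the pattern only, not the actual Open RTE registry interfaces:

    /* Hypothetical subscription API: register interest in keys
       (wildcards allowed), name the triggering action, and receive
       events via a callback. */
    #include <stdio.h>

    typedef enum { DATA_ENTERED, DATA_MODIFIED, DATA_DELETED } trigger_t;

    typedef struct {
        const char *container;   /* container token that matched    */
        const char *key;         /* data key that matched           */
        const void *data;        /* specified data elements         */
        trigger_t   trigger;     /* action that generated the event */
    } event_t;

    typedef void (*event_cb)(const event_t *ev);

    /* Example callback: react when a matching element changes. */
    static void on_change(const event_t *ev) {
        printf("event: %s/%s trigger=%d\n",
               ev->container, ev->key, ev->trigger);
    }

    /* Stub implementation so the sketch is self-contained: a real
       registry would match patterns and deliver events asynchronously
       on the subscribing process; here one event fires synchronously. */
    int registry_subscribe(const char *container, const char *key_pattern,
                           trigger_t trigger, event_cb cb) {
        event_t ev = { container, key_pattern, NULL, trigger };
        cb(&ev);
        return 0;
    }

    int main(void) {
        /* Wildcard-subscribe to all keys in a hypothetical container. */
        registry_subscribe("job-42", "*", DATA_MODIFIED, on_change);
        return 0;
    }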

30 Future Directions

31 Revise MPI Standard
- Clarify the standard
- Standardize the interface
- Simplify the standard
- Make the standard more "H/W friendly"

32 Beyond Simple Performance Measures
Performance and scalability are important, but what about future HPC systems?
- Heterogeneity
  - Multi-core
  - Mix of processors
  - Mix of networks
- Fault tolerance

33 Focus on Programmability
Performance and scalability are important, but what about programmability?

