Open MPI - A High Performance Fault Tolerant MPI Library
Richard L. Graham, Advanced Computing Laboratory, Group Leader (acting)

Overview
- Open MPI Collaboration
- MPI
- Run-time
- Future directions

Collaborators
- Los Alamos National Laboratory (LA-MPI)
- Sandia National Laboratory
- Indiana University (LAM/MPI)
- The University of Tennessee (FT-MPI)
- High Performance Computing Center, Stuttgart (PACX-MPI)
- University of Houston
- Cisco Systems
- Mellanox
- Voltaire
- Sun
- Myricom
- IBM
- QLogic
URL:

A Convergence of Ideas
(diagram: LAM/MPI (IU), LA-MPI (LANL), FT-MPI (U of TN), PACX-MPI (HLRS), robustness work (CSU), fault detection (LANL, industry), grid computing (many), autonomous computing (many), and FDDP (semiconductor manufacturing industry) converge into Open MPI, resilient computing systems, and OpenRTE)

Components
- Formalized interfaces
  - Specifies “black box” implementation
  - Different implementations available at run-time
  - Can compose different systems on the fly
(diagram: Caller connected to Interface 1, Interface 2, Interface 3)
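To make the component idea concrete, here is a minimal sketch, assuming a C-style interface expressed as a struct of function pointers with two interchangeable implementations selected at run time. The type and function names (transport_component_t, shm_send, tcp_send) are invented for illustration and are not the actual Open MPI MCA interfaces.

```c
#include <stdio.h>
#include <string.h>

/* Illustrative "black box" interface: callers only see these entry points. */
typedef struct transport_component {
    const char *name;
    int  (*init)(void);
    int  (*send)(const void *buf, size_t len);
} transport_component_t;

/* Two interchangeable implementations of the same interface. */
static int shm_init(void) { printf("shared-memory transport ready\n"); return 0; }
static int shm_send(const void *buf, size_t len) { (void)buf; printf("shm send %zu bytes\n", len); return 0; }

static int tcp_init(void) { printf("tcp transport ready\n"); return 0; }
static int tcp_send(const void *buf, size_t len) { (void)buf; printf("tcp send %zu bytes\n", len); return 0; }

static transport_component_t components[] = {
    { "shm", shm_init, shm_send },
    { "tcp", tcp_init, tcp_send },
};

int main(int argc, char **argv)
{
    /* Select an implementation by name at run time, e.g. from a command-line flag. */
    const char *wanted = (argc > 1) ? argv[1] : "shm";
    for (size_t i = 0; i < sizeof(components) / sizeof(components[0]); i++) {
        if (strcmp(components[i].name, wanted) == 0) {
            components[i].init();
            return components[i].send("hello", 5);
        }
    }
    fprintf(stderr, "no such component: %s\n", wanted);
    return 1;
}
```

The caller never depends on a particular implementation, so new transports can be composed in without changing application code.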

Performance Impact

MPI

Two Sided Communications
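The latency results that follow come from ping-pong style microbenchmarks. Below is a minimal two-sided MPI ping-pong sketch (not the exact benchmark used for these slides): it times round trips between two ranks and reports half the round-trip time as the one-way latency. Run with two processes, e.g. mpirun -np 2.

```c
#include <mpi.h>
#include <stdio.h>

#define NITER 1000

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    char buf = 0;
    MPI_Barrier(MPI_COMM_WORLD);
    double t0 = MPI_Wtime();

    /* Bounce a 1-byte message between rank 0 and rank 1. */
    for (int i = 0; i < NITER; i++) {
        if (rank == 0) {
            MPI_Send(&buf, 1, MPI_CHAR, 1, 0, MPI_COMM_WORLD);
            MPI_Recv(&buf, 1, MPI_CHAR, 1, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        } else if (rank == 1) {
            MPI_Recv(&buf, 1, MPI_CHAR, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            MPI_Send(&buf, 1, MPI_CHAR, 0, 0, MPI_COMM_WORLD);
        }
    }

    if (rank == 0) {
        /* Half the round-trip time approximates the one-way latency. */
        double usec = (MPI_Wtime() - t0) * 1e6 / (2.0 * NITER);
        printf("1-byte latency: %.2f usec\n", usec);
    }

    MPI_Finalize();
    return 0;
}
```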

P2P Component Frameworks

Shared Memory - Bandwidth

Shared Memory - Latency

IB Performance - Latency
(table columns: Message Size, Latency - Open MPI, Latency - MVAPICH (anomaly?))

IB Performance Bandwidth

GM Performance Data - Ping-Pong Latency (usec)
(table columns: Data Size, Open MPI, MPICH-GM)

GM Performance Data - Ping-Pong Latency (usec), Data FT
(table columns: Data Size, Open MPI - OB1, Open MPI - FT, LA-MPI - FT)

GM Performance Data Ping-Pong Bandwidth

MX Ping-Pong Latency (usec)
(table columns: Message Size, Open MPI - MTL, MPICH - MX)

MX Performance Data Ping-Pong Bandwidth (MB/sec)

XT3 Performance - Latency
Implementation    1 Byte Latency
Native Portals    5.30 us
MPICH-2           7.14 us
Open MPI          8.50 us

XT3 Performance Bandwidth

Collective Operations
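The following slides benchmark the standard MPI collectives. As a reminder of what is being measured, here is a minimal usage sketch of MPI_Reduce and MPI_Bcast; the message sizes and algorithm choices behind the benchmark results are not shown here.

```c
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    /* MPI_Reduce: combine one value per process into a single sum on rank 0. */
    int local = rank, sum = 0;
    MPI_Reduce(&local, &sum, 1, MPI_INT, MPI_SUM, 0, MPI_COMM_WORLD);
    if (rank == 0)
        printf("sum of ranks 0..%d = %d\n", size - 1, sum);

    /* MPI_Bcast: distribute the result from rank 0 to every process. */
    MPI_Bcast(&sum, 1, MPI_INT, 0, MPI_COMM_WORLD);

    MPI_Finalize();
    return 0;
}
```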

MPI Reduce - Performance

MPI Broadcast - Performance

MPI Reduction - II

Open RTE

Open RTE - Design Overview
- Seamless, transparent environment for high-performance applications
- Inter-process communications within and across cells
- Distributed publish/subscribe registry
- Supports event-driven logic across applications and cells
- Persistent, fault tolerant
- Dynamic “spawn” of processes and applications, both within and across cells
(diagram: Grid, Cluster, Single Computer)
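The dynamic “spawn” capability is visible to MPI applications through the MPI-2 dynamic process interface. Below is a minimal sketch using MPI_Comm_spawn from a single parent process; the "worker" executable name and the process count are placeholders.

```c
#include <mpi.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    /* Spawn 4 copies of a separate "worker" executable (name is a placeholder);
     * the run-time locates resources and starts the new processes. */
    MPI_Comm intercomm;
    int errcodes[4];
    MPI_Comm_spawn("worker", MPI_ARGV_NULL, 4, MPI_INFO_NULL,
                   0, MPI_COMM_SELF, &intercomm, errcodes);

    /* Parent broadcasts over the inter-communicator; the workers would call the
     * matching MPI_Bcast on the communicator from MPI_Comm_get_parent(). */
    int token = 42;
    MPI_Bcast(&token, 1, MPI_INT, MPI_ROOT, intercomm);

    MPI_Comm_disconnect(&intercomm);
    MPI_Finalize();
    return 0;
}
```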

Open RTE - Components
(diagram: Grid, Cluster, Single Computer)

General Purpose Registry
- Cached, distributed storage/retrieval system
  - All common data types plus user-defined
  - Heterogeneity between storing process and recipient automatically resolved
- Publish/subscribe
  - Support event-driven coordination and notification
  - Subscribe to individual data elements, groups of elements, wildcard collections
  - Specify actions that trigger notifications

Subscription Services
- Subscribe to container and/or keyval entry
  - Can be entered before data arrives
  - Specifies data elements to be monitored
    - Container tokens and/or data keys
    - Wildcards supported
  - Specifies action that generates event
    - Data entered, modified, deleted
    - Number of matching elements equals, exceeds, or is less than specified level
    - Number of matching elements transitions (increases/decreases) through specified level
- Events generate message to subscriber
  - Includes specified data elements
  - Asynchronously delivered to specified callback function on subscribing process
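As a rough illustration of the subscription model, here is a hedged, single-process sketch: a subscriber registers a callback and a "count reaches N" trigger before any data arrives, and the registry delivers the event when the condition is met. The registry_subscribe/registry_put names and types are invented for this example and are not the OpenRTE general purpose registry API.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical, single-process registry sketch (not the OpenRTE API):
 * one watched container and one subscription whose trigger fires when the
 * number of entries reaches a threshold. */
typedef void (*reg_callback_t)(const char *container, int count, void *cbdata);

static struct {
    const char     *container;   /* container being watched            */
    int             threshold;   /* fire when entry count reaches this */
    reg_callback_t  callback;    /* delivered to the subscriber        */
    void           *cbdata;
    int             count;       /* current number of entries          */
} sub;

/* Subscribe before any data arrives. */
static void registry_subscribe(const char *container, int threshold,
                               reg_callback_t cb, void *cbdata)
{
    sub.container = container;
    sub.threshold = threshold;
    sub.callback  = cb;
    sub.cbdata    = cbdata;
    sub.count     = 0;
}

/* Enter data; generate an event if the trigger condition is now met. */
static void registry_put(const char *container, const char *key)
{
    (void)key;
    if (strcmp(container, sub.container) == 0 && ++sub.count == sub.threshold)
        sub.callback(container, sub.count, sub.cbdata);
}

static void all_procs_ready(const char *container, int count, void *cbdata)
{
    (void)cbdata;
    printf("trigger: %d entries in %s, launching next stage\n", count, container);
}

int main(void)
{
    /* Fire once 4 processes have checked in under the "job-1234" container. */
    registry_subscribe("job-1234", 4, all_procs_ready, NULL);
    for (int i = 0; i < 4; i++)
        registry_put("job-1234", "proc-ready");
    return 0;
}
```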

Future Directions

Revise MPI Standard
- Clarify the standard
- Standardize the interface
- Simplify the standard
- Make the standard more “H/W friendly”

Beyond Simple Performance Measures
Performance and scalability are important, but what about future HPC systems?
- Heterogeneity
  - Multi-core
  - Mix of processors
  - Mix of networks
- Fault tolerance

Focus on Programmability
Performance and scalability are important, but what about programmability?