Open MPI on the Cray XT
Presented by Richard L. Graham
Tech Integration, National Center for Computational Sciences

Slide 2: Why does Open MPI exist?
- Maximize all MPI expertise: research/academia, industry, …elsewhere.
- Capitalize on (literally) years of MPI research and implementation experience.
- The sum is greater than the parts.
[Figure: overlapping contributions from Research/academia and Industry]

Slide 3: Current membership
14 members, 6 contributors:
- 4 US DOE labs
- 8 universities
- 7 vendors
- 1 individual

Slide 4: Key design feature: components
Formalized interfaces:
- Specify a "black box" implementation
- Different implementations available at run time
- Can compose different systems on the fly
[Figure: a caller dispatching to Interface 1, Interface 2, or Interface 3]

Slide 5: Point-to-point architecture
[Figure: layered point-to-point stack]
- The MPI layer sits on one of two PML paths: PML-OB1/DR or PML-CM.
- PML-OB1/DR uses BML-R2 to multiplex BTLs such as BTL-GM and BTL-OpenIB, each paired with its memory pool (MPool-GM, MPool-OpenIB) and an Rcache.
- PML-CM drives a matching transport layer directly: MTL-MX (Myrinet), MTL-Portals, or MTL-PSM (QLogic).

Slide 6: Portals port: OB1 vs. CM
OB1:
- Matching in main memory
- Short message: eager, buffered on receive
- Long message: rendezvous
  - Rendezvous packet: 0-byte payload
  - Get message after match
CM:
- Matching may be on the NIC
- Short message: eager, buffered on receive
- Long message: eager
  - Send all data
  - If match: deliver directly to the user buffer
  - If no match: discard payload and get() the user data after match
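
To make the two code paths concrete, the sketch below (not taken from the slides) sends one short and one long message with plain MPI_Send/MPI_Recv. Which protocol a given message uses is decided inside the library: the 64-byte and 4 MB sizes are illustrative assumptions, and the actual eager/rendezvous crossover is a tunable implementation parameter. Under OB1 the long send completes via the rendezvous handshake; under CM on Portals the data is pushed eagerly and delivered or retrieved after the match, but the application code is identical.

/*
 * Illustrative protocol sketch; run with at least two ranks.
 * Message sizes below are assumptions, not implementation thresholds.
 */
#include <mpi.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    int rank;
    char small[64] = {0};                  /* short: eager, buffered at the receiver            */
    char *large = calloc(1, 1 << 22);      /* long: rendezvous under OB1, eager push under CM   */

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0) {
        MPI_Send(small, sizeof(small), MPI_CHAR, 1, 0, MPI_COMM_WORLD);
        MPI_Send(large, 1 << 22, MPI_CHAR, 1, 1, MPI_COMM_WORLD);
    } else if (rank == 1) {
        MPI_Recv(small, sizeof(small), MPI_CHAR, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        MPI_Recv(large, 1 << 22, MPI_CHAR, 0, 1, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
    }

    free(large);
    MPI_Finalize();
    return 0;
}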

Slide 7: Collective communications component structure
[Figure: user application -> MPI API -> MPI Component Architecture (MCA) -> component frameworks]
- PML: OB1, CM, DR, CRCPW
- Allocator: Basic, Bucket
- BTL: TCP, Shared Mem., InfiniBand
- MTL: Myrinet MX, Portals, PSM
- Topology: Basic
- Utility
- Collective: Basic, Tuned, Hierarchical, Intercomm., Shared Mem., Non-blocking I/O, Portals
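
The composition shown above is selected at run time, not at compile time. The commands below are an illustrative sketch of how that selection is typically done with MCA parameters: the --mca flag and ompi_info are standard Open MPI usage, but the exact component names available depend on the Open MPI version and how it was built for the target platform, and ./app and the process count are placeholders.

# List the components compiled into this installation
ompi_info | grep -E "pml|btl|mtl|coll"

# Run the same binary over the OB1 PML with explicitly chosen BTLs
mpirun -np 64 --mca pml ob1 --mca btl portals,sm,self ./app

# Run it over the CM PML and the Portals MTL instead
mpirun -np 64 --mca pml cm --mca mtl portals ./app

No relinking is required; the same executable can be pointed at different component stacks from one run to the next.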

Slide 8: Benchmark Results

Slide 9: NetPipe bandwidth data
[Figure: bandwidth (MBytes/sec) vs. data size (KBytes) for Open MPI-CM, Open MPI-OB1, and Cray MPI]

Slide 10: Zero-byte ping-pong latency
- Open MPI-CM: 4.91 µsec
- Open MPI-OB1: [value not captured]
- Cray MPI: 4.78 µsec
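
For reference, numbers like these are conventionally obtained from a zero-byte ping-pong, reporting half of the averaged round-trip time. The sketch below is a minimal, generic version of that measurement, not the benchmark used for the slide; the iteration count and the absence of warm-up iterations are simplifying assumptions.

/*
 * Zero-byte ping-pong sketch: latency = half the averaged round-trip time.
 * Intended for two ranks; any extra ranks simply idle until MPI_Finalize.
 */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    const int iters = 10000;
    int rank, i;
    double t0, t1;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    MPI_Barrier(MPI_COMM_WORLD);
    t0 = MPI_Wtime();
    for (i = 0; i < iters; i++) {
        if (rank == 0) {
            MPI_Send(NULL, 0, MPI_CHAR, 1, 0, MPI_COMM_WORLD);
            MPI_Recv(NULL, 0, MPI_CHAR, 1, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        } else if (rank == 1) {
            MPI_Recv(NULL, 0, MPI_CHAR, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            MPI_Send(NULL, 0, MPI_CHAR, 0, 0, MPI_COMM_WORLD);
        }
    }
    t1 = MPI_Wtime();

    if (rank == 0)
        printf("zero-byte latency: %.2f usec\n", 1.0e6 * (t1 - t0) / (2.0 * iters));

    MPI_Finalize();
    return 0;
}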

Slide 11: VH-1 total runtime
[Figure: VH-1 wall-clock time (sec) vs. log2 processor count for Open MPI-CM, Open MPI-OB1, and Cray MPI]

Slide 12: GTC total runtime
[Figure: GTC wall-clock time (sec) vs. log2 processor count for Open MPI-CM, Open MPI-OB1, and Cray MPI]

Slide 13: POP time-step runtime
[Figure: POP time-step wall-clock time (sec) vs. log2 processor count for Open MPI-CM, Open MPI-OB1, and Cray MPI]

Slide 14: Summary and future directions
- Support for the XT (Catamount and Compute Node Linux) within the standard distribution
- Performance (application and micro-benchmarks) comparable to that of Cray MPI
- Support for recovery from process failure is being added

Slide 15: Contact
Richard L. Graham
Tech Integration, National Center for Computational Sciences
(865) ...