Implementation and Optimization of MPI point-to-point communications on SMP-CMP clusters with RDMA capability

Presentation transcript:

Implementation and Optimization of MPI point-to-point communications on SMP-CMP clusters with RDMA capability

MPI point-to-point communication
Pairing MPI_Send with MPI_Recv, or MPI_Isend/MPI_Irecv completed by MPI_Wait.
There is an implicit synchronization: the receiver can complete only after the sender performs the send, and the communication operation cannot complete until both sender and receiver are ready.
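
A minimal sketch of the two pairings named on this slide, written against the standard MPI C API; the tag values and buffer sizes are arbitrary, and the program assumes it is launched with at least two ranks:

#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    int rank, buf[4] = {0, 1, 2, 3}, rbuf[4];
    MPI_Request req;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    /* Blocking pairing: MPI_Send on rank 0 matched by MPI_Recv on rank 1. */
    if (rank == 0)
        MPI_Send(buf, 4, MPI_INT, 1, 0, MPI_COMM_WORLD);
    else if (rank == 1)
        MPI_Recv(rbuf, 4, MPI_INT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);

    /* Non-blocking pairing: MPI_Isend/MPI_Irecv start the operation and
       MPI_Wait completes it; neither operation can complete until both
       sides have made their call. */
    if (rank == 0) {
        MPI_Isend(buf, 4, MPI_INT, 1, 1, MPI_COMM_WORLD, &req);
        MPI_Wait(&req, MPI_STATUS_IGNORE);
    } else if (rank == 1) {
        MPI_Irecv(rbuf, 4, MPI_INT, 0, 1, MPI_COMM_WORLD, &req);
        MPI_Wait(&req, MPI_STATUS_IGNORE);
        printf("rank 1 received %d %d %d %d\n", rbuf[0], rbuf[1], rbuf[2], rbuf[3]);
    }

    MPI_Finalize();
    return 0;
}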

MPI point-to-point communication
Different protocols are used for small and large messages.
Eager protocol for small messages: low-latency communication; the sender does not depend on the receiver.
Rendezvous protocols for large messages: no intermediate message copy.
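
How an MPI library might dispatch between the two protocol families can be pictured roughly as below; the threshold value and all names are illustrative assumptions (real implementations typically expose the eager/rendezvous cutoff as a tunable parameter):

#include <stdio.h>
#include <stddef.h>

#define EAGER_THRESHOLD 8192   /* assumed cutoff in bytes; tunable in practice */

typedef enum { PROTO_EAGER, PROTO_RENDEZVOUS } protocol_t;

/* Small messages go eager (low latency, sender independent of the receiver);
   large messages go rendezvous (handshake first, then copy-free transfer). */
protocol_t choose_protocol(size_t msg_bytes) {
    return (msg_bytes <= EAGER_THRESHOLD) ? PROTO_EAGER : PROTO_RENDEZVOUS;
}

int main(void) {
    printf("128 B -> %s\n", choose_protocol(128) == PROTO_EAGER ? "eager" : "rendezvous");
    printf("1 MB  -> %s\n", choose_protocol(1 << 20) == PROTO_EAGER ? "eager" : "rendezvous");
    return 0;
}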

Eager protocol
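
Only the slide title survives in the transcript. As a stand-in, here is a self-contained, single-process sketch of what the eager path does: the sender copies the payload into a library-owned buffer and returns immediately, and the receiver copies it out later once its receive is posted. The buffer size and helper names are illustrative, not the paper's implementation:

#include <stdio.h>
#include <string.h>

#define SLOT_SIZE 256

static char channel_slot[SLOT_SIZE];   /* pre-allocated, library-owned buffer */
static size_t channel_len = 0;         /* 0 means the slot is empty */

/* Sender side: copy into the channel buffer and return immediately. */
static void eager_send(const void *user_buf, size_t len) {
    memcpy(channel_slot, user_buf, len);   /* the extra copy is the price of eagerness */
    channel_len = len;                     /* the sender never waits for the receiver */
}

/* Receiver side: copy out of the channel buffer once the receive is posted. */
static size_t eager_recv(void *user_buf, size_t max_len) {
    size_t len = channel_len < max_len ? channel_len : max_len;
    memcpy(user_buf, channel_slot, len);
    channel_len = 0;
    return len;
}

int main(void) {
    const char msg[] = "small eager message";
    char out[SLOT_SIZE];
    eager_send(msg, sizeof msg);           /* completes before any receive exists */
    size_t n = eager_recv(out, sizeof out);
    printf("received %zu bytes: %s\n", n, out);
    return 0;
}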

Rendezvous protocol
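
Again only the slide title survives. A matching single-process sketch of a rendezvous-style transfer: the sender first announces the message with a descriptor (a "ready to send" notice), and the payload is later moved directly between the user buffers, avoiding the intermediate copy. The memcpy stands in for the actual network/RDMA transfer, and all names are illustrative:

#include <stdio.h>
#include <string.h>

typedef struct { const char *src_buf; size_t len; } rts_t;   /* "ready to send" descriptor */

static rts_t pending_rts;
static int rts_posted = 0;

/* Sender: publish a descriptor of the message; the payload is not copied yet. */
static void rendezvous_send_start(const char *user_buf, size_t len) {
    pending_rts.src_buf = user_buf;
    pending_rts.len = len;
    rts_posted = 1;
}

/* Receiver: once its receive is posted and an RTS has arrived, move the data
   straight from the sender's user buffer into its own (no intermediate copy). */
static size_t rendezvous_recv(char *user_buf, size_t max_len) {
    if (!rts_posted) return 0;                    /* handshake not complete yet */
    size_t len = pending_rts.len < max_len ? pending_rts.len : max_len;
    memcpy(user_buf, pending_rts.src_buf, len);   /* stands in for the RDMA transfer */
    rts_posted = 0;
    return len;
}

int main(void) {
    static char big[1024] = "large message moved without an intermediate copy";
    char out[1024];
    rendezvous_send_start(big, sizeof big);
    size_t n = rendezvous_recv(out, sizeof out);
    printf("pulled %zu bytes: %s\n", n, out);
    return 0;
}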

Existing RDMA-based small message channel – the MVAPICH design [Liu03]
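
The cited design associates a set of pre-registered buffers persistently with every sender/receiver pair: the sender RDMA-writes the message plus a completion flag into the next slot of the pair's ring, and the receiver discovers it by polling the flag. The sketch below mimics that idea inside one process; the structure names, sizes, and the memcpy standing in for the RDMA write are illustrative, not MVAPICH's actual code:

#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define RING_SLOTS 64
#define SLOT_BYTES 256

typedef struct {
    volatile uint8_t valid;        /* written last by the sender, polled by the receiver */
    uint32_t len;
    char payload[SLOT_BYTES];
} slot_t;

/* One ring per (sender, receiver) pair: the buffers are registered with the
   NIC once and reused forever ("persistent buffer association"), so memory
   grows with the number of pairs -- the cost the later designs try to remove. */
typedef struct {
    slot_t ring[RING_SLOTS];
    int head, tail;
} small_msg_channel_t;

static void channel_rdma_write(small_msg_channel_t *ch, const void *msg, uint32_t len) {
    slot_t *s = &ch->ring[ch->tail];
    memcpy(s->payload, msg, len);   /* stands in for the RDMA write of the payload */
    s->len = len;
    s->valid = 1;                   /* flag written last, so a polled slot is complete */
    ch->tail = (ch->tail + 1) % RING_SLOTS;
}

static int channel_poll(small_msg_channel_t *ch, void *out, uint32_t max_len) {
    slot_t *s = &ch->ring[ch->head];
    if (!s->valid) return 0;        /* nothing has arrived yet */
    uint32_t len = s->len < max_len ? s->len : max_len;
    memcpy(out, s->payload, len);
    s->valid = 0;                   /* recycle the persistent slot */
    ch->head = (ch->head + 1) % RING_SLOTS;
    return (int)len;
}

int main(void) {
    static small_msg_channel_t ch;  /* zero-initialized */
    char out[SLOT_BYTES];
    channel_rdma_write(&ch, "hi", 3);
    printf("polled %d bytes: %s\n", channel_poll(&ch, out, sizeof out), out);
    return 0;
}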

Our improved design – eliminating persistent buffer association

Further improvement – node-shared small message channels

Optimizing Rendezvous protocol – the ideal rendezvous protocol
SS – send start, SW – send wait, RS – receive start, RW – receive wait.
Once both the sender and the receiver have initiated the communication, the data transfer should start.
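
Put differently, if t_SS and t_RS denote the times at which the send and receive are initiated, the ideal protocol finishes the data movement at (a simple cost model added here for illustration, not a formula from the slides)

t_{\mathrm{done}} = \max(t_{SS},\, t_{RS}) + T_{\mathrm{data}}

where T_data is the raw transfer time of the payload.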

Optimizing Rendezvous protocol – the problem
Poor progress.
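
One common way the problem shows up (assuming a sender-push rendezvous and no asynchronous progress thread): the sender posts the send and then leaves the MPI library to compute, so even if the receiver becomes ready during that computation, the bulk transfer typically cannot start until the sender re-enters the library at MPI_Wait. A sketch of the pattern, with sleep standing in for application work:

#include <mpi.h>
#include <unistd.h>

#define N (1 << 20)
static double buf[N];

static void compute(void) { sleep(1); }   /* stand-in for a long computation */

int main(int argc, char **argv) {
    int rank;
    MPI_Request req;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    if (rank == 0) {
        MPI_Isend(buf, N, MPI_DOUBLE, 1, 0, MPI_COMM_WORLD, &req);
        compute();                          /* the receiver becomes ready here, but this
                                               rank is not driving the protocol */
        MPI_Wait(&req, MPI_STATUS_IGNORE);  /* the large transfer often starts only now */
    } else if (rank == 1) {
        MPI_Irecv(buf, N, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD, &req);
        MPI_Wait(&req, MPI_STATUS_IGNORE);  /* so the receiver ends up waiting on the
                                               sender's computation, not on the network */
    }
    MPI_Finalize();
    return 0;
}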

Optimizing Rendezvous protocol – the problem
Performance is heavily affected by the timing of these events. Is it possible to get near-optimal performance for all timing situations?

How to use these protocols
Dynamic protocol selection – design a mega-protocol that combines several of these protocols.
Profile-guided optimization – use profiling to determine the timing information, and use it to select the protocol.
Compiler-assisted optimization – use compiler analysis to determine the timing information, and use it to select the best-performing protocol.
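
As a purely hypothetical illustration of the selection idea (not the scheme of any particular paper): keep a per-call-site profile of which side usually arrives first, and for large messages pick the rendezvous variant driven by the side that usually arrives second, since that side is inside the MPI library when both are ready and can start the transfer immediately.

#include <stdio.h>
#include <stddef.h>

#define EAGER_THRESHOLD 8192   /* assumed small-message cutoff in bytes */

typedef enum { PROTO_EAGER, PROTO_SENDER_PUSH, PROTO_RECEIVER_PULL } protocol_t;

typedef struct {
    long recv_posted_first;    /* profiled: times the receive was posted before the send */
    long send_posted_first;    /* profiled: times the send was posted before the receive */
} site_profile_t;

static protocol_t select_protocol(const site_profile_t *p, size_t msg_bytes) {
    if (msg_bytes <= EAGER_THRESHOLD)
        return PROTO_EAGER;
    /* Let the side that usually arrives second drive the transfer, so progress
       is made as soon as both sides are ready. */
    return (p->recv_posted_first >= p->send_posted_first) ? PROTO_SENDER_PUSH
                                                          : PROTO_RECEIVER_PULL;
}

int main(void) {
    site_profile_t p = { .recv_posted_first = 90, .send_posted_first = 10 };  /* made-up data */
    printf("large message at this site -> %s\n",
           select_protocol(&p, 1 << 20) == PROTO_SENDER_PUSH ? "sender push" : "receiver pull");
    return 0;
}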