Endpoints Plenary
James Dinan
Hybrid Working Group
December 10, 2013

Status of Endpoints

1. Proposal text #380 ready
2. Explored interoperability story [EuroMPI '13]
3. Exploring performance story [IJHPCA in prep]
4. Working on implementation in MPICH
   - Will be open source
5. Target: formal reading in March

Motivation for Endpoints

1. Interoperability argument
   - On-node programming models
   - Multi-node models that use threads
2. Performance argument
   - Increase communication concurrency
     - Preserve shared-memory/node-level programming
     - Make the number of VA spaces a free parameter
   - Reduce synchronization penalties
     - Privatize thread communication state and resources

Achievable Network Performance (Dramatization)

- Network endpoint design is evolving to support many cores
- Not real data; represents my personal views
- Gathering real data for the paper; will present at the next meeting

Impact of Queue Depth on Message Rate

- Brian Barrett, et al. [EuroMPI '13]
- Threads sharing a rank increase the posted-receive queue depth (x-axis)

Mapping of Ranks to Processes

- MPI provides a 1-to-1 mapping of ranks to processes
- This was good in the past
- Usage models and systems have evolved
  - Hybrid MPI+threads programming
  - Ratio of core to network-endpoint performance is decreasing

[Figure: a conventional communicator — one rank per process, shared by all of that process's threads]

Endpoints Model

- Many-to-one mapping of ranks to processes
  - Threads act as first-class participants in MPI operations
  - Improves programmability of MPI + X
  - Threads drive independent network endpoints
- Endpoint: a set of resources that supports the independent execution of MPI communications
  - Endpoints have process semantics

[Figure: an endpoints communicator — several ranks per process, each driven by its own thread]

Current THREAD_MULTIPLE Usage

MPI message matching space: (communicator context, source rank, tag). Two approaches to using THREAD_MULTIPLE:

1. Match a specific thread using the tag
   - Partition the tag space to address individual threads
   - Limitations:
     - Collectives: multiple threads at a process can't participate concurrently
     - Wildcards: multiple threads receiving concurrently requires care
2. Match a specific thread using the communicator (sketch below)
   - Split threads across different communicators (e.g., dup and assign)
   - Can use wildcards and collectives
   - However, this limits the connectivity of threads with each other
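A minimal sketch of the second approach, assuming one OpenMP thread per duplicated communicator (NUM_THREADS and thread_comm are illustrative names, not from the proposal):

```c
/* Approach 2: give each thread its own duplicate of MPI_COMM_WORLD, so
 * each thread has a private matching space and can safely use wildcards
 * and collectives on its own communicator. */
#include <mpi.h>
#include <omp.h>

#define NUM_THREADS 4

int main(int argc, char **argv) {
    int provided;
    MPI_Comm thread_comm[NUM_THREADS];

    MPI_Init_thread(&argc, &argv, MPI_THREAD_MULTIPLE, &provided);

    /* MPI_Comm_dup is collective over the parent communicator, so the
     * duplications are done serially, in the same order on every process. */
    for (int i = 0; i < NUM_THREADS; i++)
        MPI_Comm_dup(MPI_COMM_WORLD, &thread_comm[i]);

    #pragma omp parallel num_threads(NUM_THREADS)
    {
        int tid = omp_get_thread_num();
        int buf = tid;

        /* Concurrent collectives are legal here because each thread uses
         * a distinct communicator; the drawback is that thread i on this
         * process can only be matched by thread i on other processes. */
        MPI_Bcast(&buf, 1, MPI_INT, 0, thread_comm[tid]);
    }

    for (int i = 0; i < NUM_THREADS; i++)
        MPI_Comm_free(&thread_comm[i]);
    MPI_Finalize();
    return 0;
}
```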

Implementation of Endpoints

Two basic implementation strategies, plus their combination:

1. Each rank is a network endpoint
2. Ranks are multiplexed on endpoints
   - Effectively adds the destination rank to matching (see the sketch below)
3. Combination of the above

Potential to reduce threading overheads:
- Separate resources per thread
  - A rank can represent distinct network resources
  - Increases HFI/NIC concurrency
- Separate software state per thread
  - Per-endpoint message queues/matching
  - Enables per-communicator threading levels

FG-MPI implements "static" endpoints
- A little different, but it still demonstrates the implementation and performance benefits

[Figure: a process hosting several ranks, with threads driving them]
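A conceptual sketch of what strategy 2 implies for receive-side matching: when several endpoint ranks share one network endpoint, the incoming message's destination rank must be compared as well. All names and types here are illustrative, not taken from MPICH or the proposal text:

```c
/* Match test for multiplexed endpoints: beyond the usual
 * (communicator context, source, tag) triple, the destination rank
 * selects among the endpoint ranks sharing this network endpoint. */
#include <stdbool.h>

#define ANY (-1)  /* stand-in for MPI_ANY_SOURCE / MPI_ANY_TAG */

typedef struct {
    int context_id;  /* identifies the communicator */
    int dest_rank;   /* endpoint rank the receive was posted on */
    int source;      /* may be ANY */
    int tag;         /* may be ANY */
} posted_recv_t;

static bool matches(const posted_recv_t *posted,
                    int context_id, int dest_rank, int source, int tag)
{
    return posted->context_id == context_id
        && posted->dest_rank  == dest_rank   /* the extra endpoints check */
        && (posted->source == ANY || posted->source == source)
        && (posted->tag    == ANY || posted->tag    == tag);
}
```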

Endpoints Interface

int MPI_Comm_create_endpoints(
        MPI_Comm parent_comm,
        int      my_num_ep,
        MPI_Info info,
        MPI_Comm out_comm_hdls[])

- The out-handle array takes TLS out of the implementation and off the critical path
- Each process requests an independent number of endpoints
- MPI_ERR_ENDPOINTS: endpoints could not be created

[Figure: my_ep_comm created from MPI_COMM_WORLD — each process requests its own number of endpoint ranks, one thread driving each]
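A minimal usage sketch of the interface above (the signature is the one proposed in ticket #380, not part of the MPI standard; MY_NUM_EP, ep_comm, and the OpenMP structure are illustrative):

```c
/* Each thread attaches to its own endpoint communicator handle and then
 * behaves as a first-class MPI rank, including in collectives. */
#include <mpi.h>
#include <omp.h>

#define MY_NUM_EP 4

int main(int argc, char **argv) {
    int provided;
    MPI_Comm ep_comm[MY_NUM_EP];

    MPI_Init_thread(&argc, &argv, MPI_THREAD_MULTIPLE, &provided);

    /* Collective over the parent communicator; this process requests
     * MY_NUM_EP endpoint ranks and receives one handle per endpoint,
     * keeping thread-local storage out of the MPI library. */
    MPI_Comm_create_endpoints(MPI_COMM_WORLD, MY_NUM_EP, MPI_INFO_NULL,
                              ep_comm);

    #pragma omp parallel num_threads(MY_NUM_EP)
    {
        int tid = omp_get_thread_num();
        int ep_rank, val;

        MPI_Comm_rank(ep_comm[tid], &ep_rank);

        /* Every endpoint rank participates independently in the
         * collective, which threads sharing a single rank cannot do. */
        val = ep_rank;
        MPI_Allreduce(MPI_IN_PLACE, &val, 1, MPI_INT, MPI_SUM,
                      ep_comm[tid]);

        /* Assuming process semantics, each endpoint handle is freed by
         * the thread that drives it. */
        MPI_Comm_free(&ep_comm[tid]);
    }

    MPI_Finalize();
    return 0;
}
```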

Endpoints Proposal

Ticket #380 (web/ticket/380)