MPJ Express: An Implementation of Message Passing Interface (MPI) in Java. Aamir Shafi, http://mpj-express.org, http://acet.rdg.ac.uk/projects/mpj

Presentation transcript:

MPJ Express: An Implementation of Message Passing Interface (MPI) in Java
Aamir Shafi
http://mpj-express.org
http://acet.rdg.ac.uk/projects/mpj
December 7, 2018

Writing Parallel Software
Parallel software is software that can be executed on parallel hardware to exploit its computational and memory resources. There are two main approaches to writing it:
- The first approach is to use messaging libraries (packages) written in existing languages such as C, Fortran, and Java: Message Passing Interface (MPI) and Parallel Virtual Machine (PVM).
- The second and more radical approach is to provide new languages. HPC has a history of novel parallel languages: High Performance Fortran (HPF) and Unified Parallel C (UPC).
This talk covers an implementation of MPI in Java called MPJ Express.

Introduction to Java for HPC
Java was released by Sun in 1996 and is a mainstream language in the software industry. Attractive features include:
- Portability
- Automatic garbage collection
- Type-safety at compile time and runtime
- Built-in support for multi-threading, a possible option for providing nested parallelism on multi-core systems
- Performance: the Java compiler converts source code to byte code, and the Just-In-Time compilers in modern JVMs compile byte code to native machine code on the fly
But Java has safety features that may limit performance.

Introduction to Java for HPC
Three existing approaches to Java messaging:
- Pure Java (sockets based)
- Java Native Interface (JNI)
- Remote Method Invocation (RMI)
mpiJava has been perhaps the most popular Java messaging system. Existing systems include:
- mpiJava (http://www.hpjava.org/mpiJava.html)
- MPJ/Ibis (http://www.cs.vu.nl/ibis/mpj.html)
Motivation for a new Java messaging system:
- Maintain compatibility with Java threads by providing thread-safety
- Reconcile the conflicting goals of high performance and portability
Outline the project

Distributed Memory Cluster
[Figure: eight processes (Proc 0 to Proc 7), each with its own CPU and memory, exchanging messages over a LAN (Ethernet, Myrinet, Infiniband, etc.)]

Write machines files

Bootstrap MPJ Express runtime

Write Parallel Program

Compile and Execute

Introduction to MPJ Express
MPJ Express is an implementation of a Java messaging system, based on Java bindings. It will eventually supersede mpiJava.
Authors: Aamir Shafi, Bryan Carpenter, and Mark Baker
- Thread-safe communication devices using Java NIO and Myrinet, maintaining compatibility with Java threads
- A buffering layer that provides explicit memory management instead of relying on the garbage collector
- A runtime system for portable bootstrapping
MPJ Express has 80K lines of source code, including test cases. It is the first messaging system that provides a “full” implementation of the mpiJava 1.2 bindings, for example communicators, topologies, and derived datatypes.
Thread-safety cannot be implemented simply by using the “synchronized” keyword or by putting locks around the send and/or recv methods. Careful analysis and fine-grain locking are required to implement thread-safety.

James Gosling Says…

Who is using MPJ Express?
First released in September 2005 under the LGPL (an open-source licence), with approximately 1000 users around the world.
Some projects using this software:
- Cartablanca is a simulation package that uses Jacobian-Free-Newton-Krylov (JFNK) methods to solve non-linear problems. The project is run at Los Alamos National Lab (LANL) in the US.
- Researchers at the University of Leeds, UK have used this software in the Modelling and Simulation in e-Social Science (MoSeS) project.
Teaching purposes:
- Parallel Programming using Java (PPJ): http://www.sc.rwth-aachen.de/Teaching/Labs/PPJ05/
- Parallel Processing SS 2006: http://tramberend.inform.fh-hannover.de/

MPJ Express Design

Presentation Outline
Implementation Details:
- Point-to-point communication
- Communicators, groups, and contexts
- Process topologies
- Derived datatypes
- Collective communications
MPJ Express Buffering Layer
Runtime System
Performance Evaluation

Java NIO Device
Uses non-blocking I/O functionality.
Implements two communication protocols:
- Eager-send protocol for small messages
- Rendezvous protocol for large messages
Placing locks around whole communication methods results in deadlocks:
- In Java, the synchronized keyword ensures that only one thread at a time can execute a synchronized method on a given object
- Example: a process sending a message to itself using synchronous send
Locks for thread-safety:
- Writing messages: a lock for send-communication-sets, plus locks for destination channels (one for every destination process), obtained one after the other
- Reading messages: a lock for receive-communication-sets
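The per-destination locking idea can be illustrated with a minimal sketch. This is not MPJ Express source code: the class `ChannelLocksSketch`, its fields, and the byte counter standing in for a socket channel are all hypothetical. The point is only that threads writing to different destinations do not block each other, while writes to the same destination are serialized.

```java
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical sketch: one lock per destination process, instead of one
// coarse lock (or a synchronized method) around the whole send call.
final class ChannelLocksSketch {
    private final ReentrantLock[] destLocks;
    private final int[] bytesWritten; // stand-in for a channel; guarded by destLocks[i]

    ChannelLocksSketch(int nprocs) {
        destLocks = new ReentrantLock[nprocs];
        bytesWritten = new int[nprocs];
        for (int i = 0; i < nprocs; i++) {
            destLocks[i] = new ReentrantLock();
        }
    }

    // Only senders targeting the same destination contend for the same lock.
    void send(int dest, int nbytes) {
        destLocks[dest].lock();
        try {
            bytesWritten[dest] += nbytes;
        } finally {
            destLocks[dest].unlock();
        }
    }

    int bytesWritten(int dest) {
        destLocks[dest].lock();
        try {
            return bytesWritten[dest];
        } finally {
            destLocks[dest].unlock();
        }
    }

    // Starts several sender threads that alternate between two destinations.
    static ChannelLocksSketch demo(int senders, int messagesEach) {
        ChannelLocksSketch device = new ChannelLocksSketch(2);
        Thread[] threads = new Thread[senders];
        for (int t = 0; t < senders; t++) {
            final int dest = t % 2;
            threads[t] = new Thread(() -> {
                for (int m = 0; m < messagesEach; m++) {
                    device.send(dest, 1);
                }
            });
            threads[t].start();
        }
        for (Thread thread : threads) {
            try {
                thread.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return device;
    }
}
```

With four senders and 1000 one-byte messages each, `demo(4, 1000)` leaves 2000 bytes recorded for each of the two destinations, because no update is lost under the per-destination lock.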

Standard mode with eager send protocol (small messages)

Standard mode with rendezvous protocol (large messages)
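A rough sketch of how a device might choose between the two protocols is given below. The threshold value and the control-message names are assumptions made for illustration; the slides do not state the actual cut-off or wire format used by MPJ Express.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of eager versus rendezvous selection by message size.
final class ProtocolChoiceSketch {
    enum Protocol { EAGER, RENDEZVOUS }

    // Assumed threshold; the real value is not given in the slides.
    static final int EAGER_LIMIT_BYTES = 128 * 1024;

    static Protocol choose(int messageBytes) {
        return messageBytes <= EAGER_LIMIT_BYTES ? Protocol.EAGER : Protocol.RENDEZVOUS;
    }

    // Messages that cross the wire, in order. Names are illustrative only.
    static List<String> wireSteps(int messageBytes) {
        if (choose(messageBytes) == Protocol.EAGER) {
            // Eager: the payload is sent immediately and buffered at the
            // receiver if no matching receive has been posted yet.
            return Arrays.asList("DATA");
        }
        // Rendezvous: the sender announces the message, waits until the
        // receiver has posted a matching receive, then sends the payload.
        return Arrays.asList("READY_TO_SEND", "READY_TO_RECEIVE", "DATA");
    }
}
```

The design trade-off is that eager sends save a round trip at the cost of receiver-side buffering, which is only acceptable for small messages.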

MPJ Express Buffering Layer
MPJ Express requires a buffering layer:
- To use Java NIO: SocketChannels use byte buffers for data transfer
- To use proprietary networks like Myrinet efficiently
- To implement derived datatypes
Various implementations are possible based on the actual storage medium: direct or indirect ByteBuffers.
An mpjbuf buffer object consists of:
- A static buffer to store primitive datatypes
- A dynamic buffer to store serialized Java objects
Creating ByteBuffers on the fly is costly:
- Memory management is based on Knuth’s buddy algorithm
- Two implementations of memory management

MPJ Express Buffering Layer
Frequent creation and destruction of communication buffers hurts performance. To tackle this, MPJ Express requires a buffering layer that:
- Provides two implementations of Knuth’s buddy algorithm
- Uses direct ByteBuffers, needed for Java NIO and proprietary networks
- Implements derived datatypes
Performance analysis revealed “choking behaviour”: we found that creating intermediate buffers hurt performance. There is no point in exhausting the garbage collector. Java provides automatic memory management, but it requires careful programming to get the best performance.
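To make the buddy idea concrete, here is a minimal sketch of a buddy allocator that hands out offsets inside a single direct ByteBuffer, so that buffers are reused rather than created per message. It is an illustration of the general buddy scheme only, not the mpjbuf implementation; the class `BuddyPoolSketch` and its method names are hypothetical.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;

// Hypothetical buddy allocator over one pre-allocated direct ByteBuffer.
final class BuddyPoolSketch {
    private final ByteBuffer region;
    private final int minBlock;                 // block size at order 0
    private final int maxOrder;                 // block size at order k is minBlock << k
    private final List<TreeSet<Integer>> free;  // free block offsets, per order
    private final Map<Integer, Integer> allocatedOrder = new HashMap<>();

    BuddyPoolSketch(int minBlock, int maxOrder) {
        this.minBlock = minBlock;
        this.maxOrder = maxOrder;
        this.region = ByteBuffer.allocateDirect(minBlock << maxOrder);
        this.free = new ArrayList<>();
        for (int k = 0; k <= maxOrder; k++) {
            free.add(new TreeSet<>());
        }
        free.get(maxOrder).add(0); // initially one block covering everything
    }

    // Returns the offset of a block of at least `bytes` bytes, or -1 if none fits.
    synchronized int allocate(int bytes) {
        int order = 0;
        while (order <= maxOrder && (minBlock << order) < bytes) {
            order++;
        }
        int k = order;
        while (k <= maxOrder && free.get(k).isEmpty()) {
            k++;
        }
        if (k > maxOrder) {
            return -1;
        }
        int offset = free.get(k).pollFirst();
        while (k > order) { // split: keep the lower half, free the upper half
            k--;
            free.get(k).add(offset + (minBlock << k));
        }
        allocatedOrder.put(offset, order);
        return offset;
    }

    // Frees a block and merges it with its buddy while the buddy is also free.
    synchronized void release(int offset) {
        Integer order = allocatedOrder.remove(offset);
        if (order == null) {
            throw new IllegalArgumentException("not allocated: " + offset);
        }
        int k = order;
        while (k < maxOrder) {
            int buddy = offset ^ (minBlock << k);
            if (!free.get(k).remove(buddy)) {
                break;
            }
            offset = Math.min(offset, buddy);
            k++;
        }
        free.get(k).add(offset);
    }

    // A window onto the shared region; no new memory is allocated.
    ByteBuffer view(int offset, int bytes) {
        ByteBuffer dup = region.duplicate();
        dup.position(offset);
        dup.limit(offset + bytes);
        return dup.slice();
    }
}
```

For example, with `new BuddyPoolSketch(64, 4)` (a 1024-byte region), two 100-byte requests are each rounded up to 128-byte blocks at offsets 0 and 128, and releasing both merges the region back into a single 1024-byte block.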


Communicators, groups, and contexts
MPI provides a higher-level abstraction for creating parallel libraries:
- Safe communication space
- Group scope for collective operations
- Process naming
Communicators + groups provide process naming (instead of IP address + ports).
Contexts provide safe communication.

What is a group?
A data structure that contains processes.
Main functionality: keep track of the ranks of processes.
Explanation of the figure:
- Group A contains eight processes
- Groups B and C are created from Group A
- All group operations are local (no communication with remote processes)

Example of a group operation (Union)
Explanation of the union operation:
- Two processes, a and d, are in both groups, so six processes are executing this operation
- Each group has its own view of this group operation: apply the theory of relativity
- Re-assigning ranks in the new group: process 0 in group A is re-assigned rank 0 in group C, and process 0 in group B is re-assigned rank 4 in group C
- If any existing process does not make it into the new group, it returns MPI.GROUP_EMPTY
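The rank re-assignment can be sketched in plain Java. The MPI standard defines a group union as all members of the first group, in order, followed by the members of the second group that are not in the first. The process names other than a and d, and their order within each group, are assumptions chosen to reproduce the ranks quoted in the slide.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of group union: groups are ordered lists, and a rank is a list index.
final class GroupUnionSketch {
    static List<String> union(List<String> first, List<String> second) {
        List<String> result = new ArrayList<>(first);
        for (String process : second) {
            if (!result.contains(process)) {
                result.add(process);
            }
        }
        return result;
    }

    // Rank of a process within a group, or -1 if it is not a member.
    static int rank(List<String> group, String process) {
        return group.indexOf(process);
    }
}
```

With group A as [a, b, c, d] and group B as [e, a, d, f], the union C is [a, b, c, d, e, f]: process 0 of A (a) keeps rank 0, and process 0 of B (e) gets rank 4.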

What are communicators?
A data structure that contains groups (and thus processes).
Why is it useful:
- Process naming: ranks are names for application programmers, easier than IP address + ports
- Group communications as well as point-to-point communication
There are two types of communicators:
- Intracommunicators: communication within a group
- Intercommunicators: communication between two groups (which must be disjoint)

What are contexts?
A unique integer: an additional tag on the messages.
Each communicator has a distinct context that provides a safe communication universe. A context is agreed upon by all processes when a communicator is built.
Intracommunicators have two contexts:
- One for point-to-point communications
- One for collective communications
Intercommunicators have two contexts: explained in the coming slides.
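A small sketch shows why the context acts as an extra tag. The `Envelope` class and the wildcard constants are hypothetical names for illustration; the idea is that a receive only matches a message carrying the same context, so a library using its own communicator cannot intercept application messages that happen to use the same source and tag.

```java
// Hypothetical sketch of message matching with a context id.
final class ContextMatchSketch {
    static final int ANY_SOURCE = -1; // assumed wildcard values for this sketch
    static final int ANY_TAG = -1;

    static final class Envelope {
        final int context;
        final int source;
        final int tag;

        Envelope(int context, int source, int tag) {
            this.context = context;
            this.source = source;
            this.tag = tag;
        }
    }

    // The context must match exactly; source and tag may be wildcards
    // on the receiving side.
    static boolean matches(Envelope posted, Envelope incoming) {
        return posted.context == incoming.context
            && (posted.source == ANY_SOURCE || posted.source == incoming.source)
            && (posted.tag == ANY_TAG || posted.tag == incoming.tag);
    }
}
```

A receive posted with context 7 does not match a message with context 8 even when source and tag are identical, which is the "safe communication universe" described above.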

Process topologies
Used to arrange processes in a geometric shape.
Virtual topologies have no necessary connection with the physical layout of machines, although it is possible to make use of the underlying machine architecture.
These virtual topologies can be assigned to processes in an Intracommunicator.
MPI provides:
- Cartesian topology
- Graph topology

Cartesian topology: Mapping four processes onto a 2x2 topology
Each process is assigned a coordinate:
- Rank 0: (0,0)
- Rank 1: (1,0)
- Rank 2: (0,1)
- Rank 3: (1,1)
Uses:
- Calculate rank by knowing grid (not the globus one!) position
- Calculate grid positions from ranks
- Easier to locate the rank of neighbours
- Applications may have communication patterns with lots of messaging between immediate neighbours

Periods in cartesian topology
- Axis 1 (y-axis) is periodic: processes in the top and bottom rows have valid neighbours towards the top and bottom respectively
- Axis 0 (x-axis) is non-periodic: processes in the right and left columns have undefined neighbours towards the right and left respectively
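The coordinate mapping and the effect of periodicity can be sketched without any MPI library. The class below follows the coordinate listing in these slides, where rank 1 maps to (1,0); note that the MPI standard's Cartesian routines number coordinates in row-major order, so a real implementation may order them differently. `NO_NEIGHBOUR` is a hypothetical stand-in for an undefined neighbour.

```java
// Sketch of a 2-D Cartesian topology with per-axis periodicity.
final class CartesianSketch {
    static final int NO_NEIGHBOUR = -1; // assumed marker for "undefined neighbour"

    private final int width;   // extent along axis 0 (x)
    private final int height;  // extent along axis 1 (y)
    private final boolean periodicX;
    private final boolean periodicY;

    CartesianSketch(int width, int height, boolean periodicX, boolean periodicY) {
        this.width = width;
        this.height = height;
        this.periodicX = periodicX;
        this.periodicY = periodicY;
    }

    // Grid position (x, y) from a rank.
    int[] coords(int rank) {
        return new int[] { rank % width, rank / width };
    }

    // Rank from a grid position.
    int rank(int x, int y) {
        return y * width + x;
    }

    // Neighbour of `rank` after moving `step` positions along `axis` (0 or 1).
    int neighbour(int rank, int axis, int step) {
        int[] c = coords(rank);
        int size = (axis == 0) ? width : height;
        boolean periodic = (axis == 0) ? periodicX : periodicY;
        int moved = c[axis] + step;
        if (moved < 0 || moved >= size) {
            if (!periodic) {
                return NO_NEIGHBOUR;   // fell off a non-periodic edge
            }
            moved = Math.floorMod(moved, size); // wrap around
        }
        c[axis] = moved;
        return rank(c[0], c[1]);
    }
}
```

For a 2x2 grid with a periodic y-axis and a non-periodic x-axis, rank 0 has no neighbour to its left, while moving one step down the y-axis from rank 0 wraps around to rank 2.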

Derived datatypes
Besides basic datatypes, it is possible to communicate heterogeneous, non-contiguous data using derived datatypes:
- Contiguous
- Indexed
- Vector
- Struct

Indexed datatype
The elements that form this datatype should be:
- Of the same type
- At non-contiguous locations
Displacements add flexibility.

    // requires: import mpi.*;
    int SIZE = 4;
    int[] blklen = new int[SIZE];
    int[] displ = new int[SIZE];
    for (int i = 0; i < SIZE; i++) {
        blklen[i] = SIZE - i;        // block i holds SIZE-i elements
        displ[i] = (i * SIZE) + i;   // block i starts at index i*SIZE+i
    }
    double[] params = new double[SIZE * SIZE];
    double[] rparams = new double[SIZE * SIZE];
    // Arguments: array_of_block_lengths, array_of_displacements, base type.
    // The base type is MPI.DOUBLE because the buffers are double arrays.
    Datatype indexed = Datatype.Indexed(blklen, displ, MPI.DOUBLE);
    indexed.Commit();
    MPI.COMM_WORLD.Send(params, 0, 1, indexed, dst, tag);   // 0 is offset, 1 is count
    MPI.COMM_WORLD.Recv(rparams, 0, 1, indexed, src, tag);
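To see which elements such an indexed layout selects, the following plain-Java sketch packs the described blocks out of a flat array. It does not use any MPI library; `IndexedPackSketch` and its methods are hypothetical helpers, and the block lengths and displacements follow the formulas in the slide (length SIZE-i at displacement i*SIZE+i), which pick the upper triangle of a SIZE x SIZE matrix stored row by row.

```java
// Sketch: gather the elements described by block lengths and displacements.
final class IndexedPackSketch {
    // Returns { blockLengths, displacements } for the upper triangle.
    static int[][] upperTriangleLayout(int size) {
        int[] blockLengths = new int[size];
        int[] displacements = new int[size];
        for (int i = 0; i < size; i++) {
            blockLengths[i] = size - i;        // row i contributes size-i elements
            displacements[i] = (i * size) + i; // starting at the diagonal element
        }
        return new int[][] { blockLengths, displacements };
    }

    // Copies each block into a contiguous output array, in block order.
    static double[] pack(double[] source, int[] blockLengths, int[] displacements) {
        int total = 0;
        for (int length : blockLengths) {
            total += length;
        }
        double[] packed = new double[total];
        int position = 0;
        for (int b = 0; b < blockLengths.length; b++) {
            System.arraycopy(source, displacements[b], packed, position, blockLengths[b]);
            position += blockLengths[b];
        }
        return packed;
    }
}
```

For SIZE = 4 and a source array whose element k holds the value k, the packed result contains the ten values at indices 0, 1, 2, 3, 5, 6, 7, 10, 11, and 15.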


Presentation Outline
Implementation Details:
- Point-to-point communication
- Communicators, groups, and contexts
- Process topologies
- Derived datatypes
- Collective communications
Runtime System
Thread-safety in MPJ Express
Performance Evaluation

Collective communications
Provided as a convenience for application developers:
- Save significant development time
- Efficient algorithms may be used
- Stable (tested)
Built on top of point-to-point communications.
These operations include Broadcast, Barrier, Reduce, Allreduce, Alltoall, Scatter, Scan, and Allgather, plus versions that allow displacements between the data.

Broadcast, scatter, gather, allgather, alltoall (image from the MPI standard document)

Reduce collective operations
MPI.PROD, MPI.SUM, MPI.MIN, MPI.MAX, MPI.LAND, MPI.BAND, MPI.LOR, MPI.BOR, MPI.LXOR, MPI.BXOR, MPI.MINLOC, MPI.MAXLOC
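What a few of these operations compute can be shown with a plain-Java sketch in which element r of an array stands for the value contributed by rank r. This models only the arithmetic of a reduction, not the communication; the class and constant names are hypothetical, and the MAXLOC tie rule (lowest rank wins) follows the MPI standard.

```java
import java.util.function.IntBinaryOperator;

// Sketch: combine one integer per rank with an associative operation.
final class ReduceSketch {
    static final IntBinaryOperator SUM = (a, b) -> a + b;
    static final IntBinaryOperator PROD = (a, b) -> a * b;
    static final IntBinaryOperator MIN = Math::min;
    static final IntBinaryOperator MAX = Math::max;
    static final IntBinaryOperator BXOR = (a, b) -> a ^ b;

    // contributions[r] is the value supplied by rank r.
    static int reduce(int[] contributions, IntBinaryOperator op) {
        int result = contributions[0];
        for (int r = 1; r < contributions.length; r++) {
            result = op.applyAsInt(result, contributions[r]);
        }
        return result;
    }

    // Returns { maximum value, rank that holds it }; ties go to the lowest rank.
    static int[] maxloc(int[] contributions) {
        int bestRank = 0;
        for (int r = 1; r < contributions.length; r++) {
            if (contributions[r] > contributions[bestRank]) {
                bestRank = r;
            }
        }
        return new int[] { contributions[bestRank], bestRank };
    }
}
```

For contributions {3, 9, 4, 9}, the sum is 25, the maximum is 9, and the MAXLOC result is the pair (9, 1).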

Barrier with Tree Algorithm

Execution of barrier with eight processes
- Eight processes, thus forming only one group
- Each process exchanges an integer 4 times
- Overlaps communications well

Intracomm.Bcast( … )
Sends data from one process to all the other processes.
Code from adlib, a communication library for HPJava.
The current implementation is based on an n-ary tree:
- Limitation: broadcasts only from rank=0
- Generated dynamically
- Cost: O( log2(N) )
MPICH1.2.5 uses a linear algorithm: cost O(N).
MPICH2 has much improved algorithms.
LAM/MPI uses n-ary trees: limitation, broadcast from rank=0.

Broadcasting algorithm, total processes=8, root=0
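The logarithmic cost of a tree broadcast can be illustrated with a short schedule generator. This sketch uses a binomial tree rooted at rank 0, which is an assumption: it is one common tree shape and is not necessarily the n-ary tree shown in the original figure or used in MPJ Express. In each round, every rank that already holds the data forwards it to one rank that does not.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch: binomial-tree broadcast schedule from root 0.
final class BroadcastTreeSketch {
    // rounds.get(k) lists the {sender, receiver} pairs of round k.
    static List<List<int[]>> schedule(int nprocs) {
        List<List<int[]>> rounds = new ArrayList<>();
        // `have` is the number of ranks (0 .. have-1) that hold the data.
        for (int have = 1; have < nprocs; have *= 2) {
            List<int[]> round = new ArrayList<>();
            for (int sender = 0; sender < have && sender + have < nprocs; sender++) {
                round.add(new int[] { sender, sender + have });
            }
            rounds.add(round);
        }
        return rounds;
    }
}
```

For eight processes this produces three rounds (0 to 1; then 0 to 2 and 1 to 3; then 0 to 4, 1 to 5, 2 to 6, and 3 to 7), compared with seven sequential sends from the root in a linear algorithm.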


The Runtime System

Thread-safety in MPI
The MPI 2.0 specification introduced the notion of a thread-compliant MPI implementation, with four levels of thread-safety:
- MPI_THREAD_SINGLE
- MPI_THREAD_FUNNELED
- MPI_THREAD_SERIALIZED
- MPI_THREAD_MULTIPLE
A blocked thread should not halt the execution of other threads.
See “Issues in Developing Thread-Safe MPI Implementation” by Gropp et al.


Latency on Fast Ethernet

Throughput on Fast Ethernet

Latency on Gigabit Ethernet

Throughput on GigE

Choking experience 1

Latency on Myrinet

Throughput on Myrinet

Questions?