multiprocessing and mpi4py: Python for computational science

multiprocessing and mpi4py: Python for computational science
16-18 October 2017, CINECA
m.cestari@cineca.it

Bibliography
multiprocessing:
http://docs.python.org/library/multiprocessing.html
http://www.doughellmann.com/PyMOTW/multiprocessing/
mpi4py:
http://mpi4py.scipy.org/docs/usrman/index.html

Introduction (1)
The Global Interpreter Lock (GIL) allows only a single thread at a time to execute Python bytecode in the interpreter; this clearly limits the performance of threaded code on multi-core machines.
There are different ways to overcome this limit and enhance the overall performance of the program:
multiprocessing (in the standard library since Python 2.6)
mpi4py (part of the SciPy ecosystem)

multiprocessing (1)
Basically it works by forking new processes and dividing the work among them, exploiting (all) the cores of the system.

import multiprocessing as mp

def worker(num):
    """process worker function"""
    print('Worker:', num)

if __name__ == '__main__':
    jobs = []
    for i in range(5):
        p = mp.Process(target=worker, args=(i,))
        jobs.append(p)
        p.start()
    for p in jobs:
        p.join()   # wait for all workers to finish

multiprocessing (2)
Processes can communicate with each other via queues and pipes.
Poison pills can be used to stop the processes.
Let's see an example...

multiprocessing (3)

import multiprocessing as mp
import os

def worker(num, input_queue):
    """process worker function"""
    print('Worker:', num)
    for inp in iter(input_queue.get, 'stop'):
        print('executing %s' % inp)
        os.popen('./mmul.x < ' + inp + ' > ' + inp + '.out')

if __name__ == '__main__':
    input_queue = mp.Queue()            # queue to allow communication
    for i in range(4):
        input_queue.put('in' + str(i))  # the queue contains the names of the inputs
    for i in range(4):
        input_queue.put('stop')         # add a poison pill for each process
    for i in range(4):
        p = mp.Process(target=worker, args=(i, input_queue))
        p.start()

Introduction (1)
mpi4py allows Python to sneak (its way) into the HPC field. For example:
the infrastructure of the program (MPI setup, error handling, ...) can be written in Python
resource-intensive kernels can still be programmed in compiled languages (C, Fortran)
new-generation massively parallel HPC systems (like Blue Gene) already have the Python interpreter on their thousands of compute nodes

Introduction (2)
Some codes already have this infrastructure, e.g. GPAW.

What is MPI
The Message Passing Interface (MPI) is a standardized and portable message-passing system designed to function on a wide variety of parallel computers. Since its release, the MPI specification has become the leading standard for message-passing libraries on parallel computers.
mpi4py wraps the native MPI library.

Performance
Message passing with mpi4py is generally close to C performance: for medium to large arrays the overhead is about 5% (near C speed).
This performance may entitle you to skip C/Fortran code in favor of Python.
To reach this communication performance you need to use a special syntax.

Performance (2) t1 = MPI.Wtime() - t0 t0 = MPI.Wtime() data = numpy.empty(10**7, dtype=float) if rank == 0: data.fill(7) # with 7 else: pass MPI.COMM_WORLD.Bcast([data, MPI.DOUBLE], root=0) t1 = MPI.Wtime() - t0 t0 = MPI.Wtime() if rank == 0: data = numpy.empty(10**7, dtype=float) data.fill(7) else: data = None data=MPI.COMM_WORLD.bcast(data, root=0) t1 = MPI.Wtime() - t0

Performance (3)
numpy array comm time: 0.068648815155
numpy array comm time: 0.0634961128235
numpy array cPickle time: 0.197563886642
numpy array cPickle time: 0.18874502182
The first example is about 3 times faster than the second one.
The faster example exploits direct communication of the array data via buffer-provider objects (e.g. NumPy arrays).
The slower example employs pickle-based communication of a generic Python object.

Error handling
Error handling is supported: errors originating in native MPI calls raise an instance of the exception class MPI.Exception, which derives from the standard exception RuntimeError.

mpi4py API
The mpi4py API reference can be found on the web site: http://mpi4py.scipy.org/docs/apiref/index.html

To summarize
mpi4py and multiprocessing can be a viable option to make your serial code parallel.
Message passing performance is close to that of compiled languages (like C, Fortran).
Massively parallel systems, like Blue Gene, already have Python on their compute nodes.