1
Exploring Parallelism in Python
Joseph Pantoga Jon Simington
2
Why bring parallelism to Python?
We love Python (and you should, too!)
- Interacts very well with C / C++ via Python.h and CPython
- Rapid development thanks to its interpreted nature
- Many open source packages from the community via PyPI (compare to C++ O.S.S. list)
- Easy-to-read syntax
- The SciPy family provides us with really cool scientific tools for easy data manipulation
- “If your language can do it, so can mine!”
3
Initial Problems
- Python is inherently slower than C
  - Especially when using libraries that take advantage of Python’s relationship with C / C++ code
  - Thanks to the interpreter and the dynamic typing scheme
  - Python 3 can be comparable to C in some respects, but is still slower in the average case (we use Python 2.7.6)
- Python too popular?
  - So many devs with so many ideas leads to many incomplete projects, but plenty of room for contribution
- Python’s Global Interpreter Lock (GIL)
  - Prevents more than one thread from executing Python bytecode at a time
4
The Global Interpreter Lock
    from threading import Thread

    def countdown(n):
        while n > 0:
            n -= 1

Sequential: 7.8s

    # count is the total number of iterations (its value is not given on the slide)
    countdown(count)

2 Threads: 15.4s

    # Each thread counts down half of the total
    t1 = Thread(target=countdown, args=(count//2,))
    t2 = Thread(target=countdown, args=(count//2,))
    t1.start(); t2.start()
    t1.join(); t2.join()

4 Threads: 15.7s

    # Each thread counts down a quarter of the total
    t1 = Thread(target=countdown, args=(count//4,))
    t2 = Thread(target=countdown, args=(count//4,))
    t3 = Thread(target=countdown, args=(count//4,))
    t4 = Thread(target=countdown, args=(count//4,))
    t1.start(); t2.start(); t3.start(); t4.start()
    t1.join(); t2.join(); t3.join(); t4.join()

- The GIL ruins everything!
- Thread-based parallelism is often not worth it
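The countdown comparison above can be reproduced with a small timing harness. The following is a minimal sketch, not the code used for the slide: it assumes Python 3 (for `time.perf_counter`), and the helper name `time_run` and the iteration count are illustrative choices. Absolute timings will differ by machine and interpreter version.

```python
import time
from threading import Thread


def countdown(n):
    # Pure-Python, CPU-bound loop: the thread holds the GIL while it runs.
    while n > 0:
        n -= 1


def time_run(total, num_threads):
    """Return wall-clock seconds to count down `total`, split evenly
    across `num_threads` threads (hypothetical helper, not from the slides)."""
    chunk = total // num_threads
    threads = [Thread(target=countdown, args=(chunk,))
               for _ in range(num_threads)]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.perf_counter() - start


if __name__ == "__main__":
    total = 10_000_000  # assumed value; the slide does not state its count
    for n in (1, 2, 4):
        print(n, "thread(s):", round(time_run(total, n), 2), "s")
```

If the GIL behaves as described, the multi-threaded runs should not be meaningfully faster than the single-threaded run, because only one thread executes Python bytecode at any moment.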
5
Different Parallel Approaches
Message Passing Interface (MPI)
- pyMPI
- mpi4py - uses the C MPI library directly
- Pypar
- Scientific Python (MPIlib)
- MYMPI
Bulk Synchronous Parallel (BSP)
- Scientific Python (BSPlib)
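For readers unfamiliar with the BSP model named above: a BSP program proceeds in supersteps, each consisting of local computation, communication, and a barrier synchronization after which the sent messages become visible. The sketch below illustrates only that structure, using threads inside a single Python 3 process. It does not use BSPlib or the Scientific Python API, and the function name `run_bsp` is hypothetical.

```python
import threading


def run_bsp(num_workers=4):
    """Run two supersteps: each worker sends rank**2 to its right-hand
    neighbour, then adds the value it received to its own."""
    barrier = threading.Barrier(num_workers)
    inbox = [None] * num_workers    # one message slot per worker
    result = [None] * num_workers

    def worker(rank):
        # Superstep 1: local computation, then "send" to the neighbour.
        local = rank * rank
        inbox[(rank + 1) % num_workers] = local
        # Barrier: nobody continues until every message is delivered.
        barrier.wait()
        # Superstep 2: messages from the previous superstep are visible.
        result[rank] = local + inbox[rank]

    threads = [threading.Thread(target=worker, args=(r,))
               for r in range(num_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return result


if __name__ == "__main__":
    print(run_bsp(4))  # [9, 1, 5, 13]
```

In MPI, by contrast, synchronization is implied by matching send and receive calls rather than by a global barrier at the end of each superstep.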
6
pyMPI
Pros:
- Almost-full MPI instruction set
- Fairly easy to use
- Allows for ‘interactive parallel runs’
Cons:
- Not clear if it’s still maintained
- Does not support numeric arrays
- Requires a modified Python interpreter
7
mpi4py
Pros:
- Still being maintained on Bitbucket
- Attempts to borrow ideas from other popular modules
Cons:
- Requires an installation of an MPI library on the machine
8
pypar
Pros:
- No modified interpreter needed!
- Numeric arrays supported
- Still maintained on GitHub
Cons:
- Minimal interface
- Can’t handle topologies well
9
Scientific Python
Pros:
- Detailed documentation
- Supports both MPI and BSP
Cons:
- No support for numerical arrays
- Requires more extra libraries than any of the other options, including an MPI library and a BSP library
10
MYMPI
Pros:
- Very small and lightweight
Cons:
- In-house module only
- Can’t be found on PyPI
- Only contains commonly used MPI calls
- No support for point-to-point communication
11
How does Python compare to C?
The following was tested on the Beowulf-class cluster `Geronimo` at CIMEC, with ten Intel P4 2.4GHz processors, each equipped with 1GB of DDR 333MHz RAM, connected through a 100Mbps Ethernet switch. The mpi4py library was compiled with MPICH 1.2.6.

    from mpi4py import mpi
    import numarray as na

    # Send buffer: 2**20 double-precision floats
    sbuff = na.array(shape=2**20, type=na.Float64)
    wt = mpi.Wtime()
    if mpi.even:
        # Even ranks send first, then receive from the next rank up
        mpi.WORLD.Send(sbuff, mpi.rank + 1)
        rbuff = mpi.WORLD.Recv(mpi.rank + 1)
    else:
        # Odd ranks receive first, then send to the next rank down
        rbuff = mpi.WORLD.Recv(mpi.rank - 1)
        mpi.WORLD.Send(sbuff, mpi.rank - 1)
    wt = mpi.Wtime() - wt
    # Collect every rank's elapsed time on rank 0
    tp = mpi.WORLD.Gather(wt, root=0)
    if mpi.zero:
        print tp
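For a sense of scale, the message size in the test above can be turned into a rough lower bound on transfer time. This is a back-of-the-envelope sketch, not part of the original benchmark: it assumes 8 bytes per Float64 element and the nominal 100Mbps link rate, and it ignores all protocol and software overhead. The helper name `min_transfer_seconds` is hypothetical.

```python
def min_transfer_seconds(num_elements, bytes_per_element, link_bits_per_second):
    """Lower bound on one-way transfer time, ignoring all overhead."""
    total_bits = num_elements * bytes_per_element * 8
    return total_bits / link_bits_per_second


if __name__ == "__main__":
    # 2**20 Float64 values is 8 MiB, sent over a nominal 100 Mbps link.
    seconds = min_transfer_seconds(2**20, 8, 100e6)
    print("at least %.2f s per one-way transfer" % seconds)
```

Under these assumptions a single one-way transfer of the buffer takes at least about 0.67 seconds, so for messages of this size the measured time is likely to be dominated by the network rather than by the language making the call.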
12
How does Python compare to C?
The remaining graphs show timing results from similar programs that differ only in the MPI instruction used.
13
How does Python compare to C?
14
How does Python compare to C?
- For large data sets, Python performs very similarly to C
- Python has less bandwidth available, as mpi4py uses an MPI library from C to perform general networking calls
- But, in general, Python is slower than C
15
Is Parallelism Fully Implemented?
- From our research so far, we have not found a publicly available Python package that implements the full MPI instruction set
- Not all popular languages have complete and extensive libraries for every task or use case!
16
Conclusion
- Is parallelism in Python something the community should bother with?
  - Should they try to keep it in future versions or even maintain the current implementations?
- From our research, it seems the community has done just about all it could to bring parallelism to Python, but some sacrifices have to be made, mainly a restriction on which data types can be supported
- The most ‘successful’ module(s) to date?
17
Questions?