Parallel Programming Languages Andrew Rau-Chaplin
Sources: D. Skillicorn, D. Talia, “Models and Languages for Parallel Computation”, ACM Computing Surveys. Warning: This is very much ONE practitioner's viewpoint! Little attempt has been made to capture the conventional wisdom.
Outline: Introduction to parallel programming; Example languages: Message passing in MPI, Data-parallel programming in *Lisp, Shared-address-space programming in OpenMP, Cilk
Historically: Supercomputers: highly structured numerical programs, parallelization of loops. Multicomputers: each machine had its own languages/compilers/libraries optimized for its architecture. Parallel computing was for REAL computer scientists: “Parallel programming is tough, but worth it.” Mostly numerical/scientific applications, written in Fortran with parallel numerical libraries. Little other parallel software was written!
Needed: Parallel programming abstractions that were Easy (provide help managing programming complexity) but general! Portable (across machines) but efficient!
Solution: yet another layer of abstraction! A parallel model/language sits between application software and system software, spanning the generic parallel architectures: SIMD, message passing, shared memory, dataflow, systolic arrays.
Layered Perspective: Parallel applications (CAD, database, scientific modeling) run on programming models (multiprogramming, shared address, message passing, data parallel), which define the communication abstraction at the user/system boundary; beneath that sit compilation or library, operating systems support, communication hardware, and the physical communication medium at the hardware/software boundary. [Language = Library = Model]
Programming Model: The conceptualization of the machine that the programmer uses in coding applications: how parts cooperate and coordinate their activities; specifies communication and synchronization operations. Multiprogramming: no communication or synchronization at the program level. Shared address space: like a bulletin board. Message passing: like letters or phone calls; explicit, point-to-point. Data parallel: more regimented, global actions on data; implemented with shared address space or message passing.
What does parallelism add? Decomposition How is the work divided into distinct parallel threads? Mapping Which thread should be executed on which processor? Communication How is non-local data acquired? Synchronization When must threads know that they have reached a common state?
Skillicorn’s Wish list What properties should a good model of parallel computation have? Note: desired properties may be conflicting Themes What does the programming model handle for the programmer? How abstract can the model be and still realize efficient programs? Six Desirable Features
1) Easy to program: Should conceal as much detail as possible. Example: 100 processors, each with 5 threads, each thread potentially communicating with any other, yields an enormous number of possible communication states! Hide decomposition, mapping, communications, and synchronization; as much as possible, rely on the translation process to produce the exact structure of the parallel program.
2) Software development methodology: A firm semantic foundation to permit reliable transformation from parallel model/language down to parallel architecture. Issues: correctness, efficiency, deadlock freedom.
3) Architecture-Independent: Should be able to migrate code easily to the next generation of an architecture (short cycle-times), and from one architecture to another (need to share code). Even in this space, people are more expensive and harder to maintain than hardware.
4) Easy to understand: For parallel computing to become mainstream, it must be easy to go from sequential to parallel, and easy to teach. Favor easy-to-understand tools with clear, if limited, goals over complex ones that may be powerful but are hard to use and master!
5) Guaranteed performance: Guaranteed performance on a useful variety of real machines. If T(n,p) = c f(n,p) + lower-order terms: preserve the order of the complexity and keep the constant small. A model that is good (not necessarily great) on a range of architectures is attractive!
6) Provide Cost Measures: Cost measures are needed to drive algorithmic design choices: estimated execution time, processor utilization, development costs. In sequential computing, execution times on different machines are roughly proportional (machine A is 5 times faster than machine B), permitting a two-step model: optimize algorithmically, then code and tune.
6) Provide Cost Measures, cont.: In parallel computing it is not so simple; there is no two-step model. The costs associated with decomposition, mapping, communications, and synchronization may vary independently! The model must make the estimated cost of operations available at design time: we need an accounting scheme, or cost model. Example: how should an algorithm trade off communication vs. local computation?
Summary: Desired Features Often contradictory Some features more realistic on some architectures Room for more than one Language/Model!
Six Classifications of Parallel Models: 1) Nothing explicit, parallelism implicit; 2) Parallelism explicit, decomposition implicit; 3) Decomposition explicit, mapping implicit; 4) Mapping explicit, communications implicit; 5) Communications explicit, synchronization implicit; 6) Everything explicit. The levels run from more abstract, less efficient (?) to less abstract, more efficient (?).
Within Each Classification: Dynamic structure: allows dynamic thread creation; unable to restrict communications; may overrun communication capacity. Static structure: no dynamic thread creation; may overrun communication capacity, but the static structure supports cost models for prediction of communication. Static and communication-limited structure: no dynamic thread creation; can guarantee performance by limiting the frequency and size of communications.
Models, Languages, Libraries: where do *Lisp and Cilk sit on this spectrum, and where should OpenMP go???
Recent Languages/Systems: Cilk++, MapReduce
Recent Languages: GPUs: OpenCL & CUDA; Grid Programming
Recent Languages: Cloud Computing; Cycle Scavenging