Parallel and Distributed Programming: A Brief Introduction Kenjiro Taura
Various environments we are talking about: multi-core processors, multiprocessors, clusters of PCs (workstations), hosts connected by the Internet, the cloud (Amazon EC2)
Multicore processors Nowadays very common desktop/laptop machines have dual cores (e.g., Intel Core2Duo) Undoubtedly, future processors will have more (Intel announced the eight-core Nehalem; Sun's T2 supports 32-way threading, ...)
Multiprocessors A single box hosts more than one processor (each of which is likely to be multicore) This has been common in servers (since long before the emergence of multicore processors)
Cluster of PCs PCs (servers) connected by a standard interconnect A cost-effective way to build high-performance machines
Cloud (Amazon EC2) "Pay-per-use" computing environments You can buy clusters "on-demand" For normal users who need extensive computation only occasionally, this is even cheaper than buying their own clusters
Elements of parallel/distributed programming environments Processors (CPUs) executing instructions individually Some way for processors to communicate There are several ways for HW to implement the latter
Communication implemented in HW (1): Shared memory Processors just issue regular load/store instructions If one processor writes a value to a location and another processor reads it, that is communication (Figure: one processor writes X to address A in memory; another reads from A and gets X)
Shared memory Communication is somewhat implicit (it happens as a by-product of regular loads/stores) and indirect: what kind of communication takes place at each load/store is a complex function of processor states Implementing it in HW needs close cooperation with the CPUs (e.g., invalidating other processors' caches upon writes), so it is normally implemented among CPUs within a box, not across boxes
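The load/store pattern above can be sketched with Python threads standing in for two processors sharing one address space (a minimal sketch; the `Event` is an assumption, added only so the read is ordered after the write):

```python
import threading

A = [None]                 # a shared memory "location" A
ready = threading.Event()  # stand-in for whatever ordering the program enforces

def writer():
    A[0] = "X"   # write X to A: a plain store
    ready.set()

result = []
def reader():
    ready.wait()
    result.append(A[0])  # read from A: a plain load; gets X

t1 = threading.Thread(target=writer)
t2 = threading.Thread(target=reader)
t2.start(); t1.start()
t1.join(); t2.join()
print(result[0])  # X
```

Note how neither thread issues any explicit "communicate" operation; the data flows purely through the shared location.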
Shared memory HW you can buy today Multi-core laptops (2 cores) Multi-core, multiprocessor servers (32 cores or more) Amazon EC2 sells 8-core virtual machines
Communication implemented in HW (2): Messages Processors issue explicit "send" instructions (normally I/O instructions that send commands to device drivers) Each message normally has a destination "network address" specifying which processor should receive it (Figure: messages travel between hosts through NICs)
Messages Communication is explicit and direct (the instruction specifies the destination of the message) Much simpler to implement than shared memory By design, it can be used to talk to processors outside the box (even on the other side of the earth)
Message HW you can buy today Ethernet (you surely have one) Infiniband, Myrinet Plug your PCs into the Internet and you can talk to any host connected to the Internet (at least technically) Two PCs connected to the Internet are already a parallel programming environment
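A minimal sketch of this kind of communication using sockets; here both endpoints live in one process over loopback, but the same code works between two PCs if 127.0.0.1 is replaced with a real host address:

```python
import socket
import threading

srv = socket.socket()
srv.bind(("127.0.0.1", 0))   # port 0: let the OS pick a free port
srv.listen(1)
port = srv.getsockname()[1]

def server():
    conn, _ = srv.accept()
    conn.sendall(b"hello")   # explicit send of a message
    conn.close()

t = threading.Thread(target=server)
t.start()

cli = socket.socket()
cli.connect(("127.0.0.1", port))  # the destination address is named explicitly
msg = cli.recv(1024)
cli.close()
t.join()
srv.close()
print(msg.decode())  # hello
```

Unlike the shared-memory case, every communication step here is an explicit operation naming a destination.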
Parallel/Distributed Programming Models
What are programming models anyway? Programming "models" generally refer to rules/specifications for reasoning about the correctness and performance of programs In some sense, each programming language or API defines its own programming model But usually, a programming "model" ignores details specific to a particular language (e.g., syntax)
Models of sequential programming Procedural Functional Logic Etc.
Models of parallel programming There are at least as many parallel models as sequential ones, but we mainly focus on issues related to parallelism
Why are they important? "Ideal" models would (a) make programming easier and less error-prone and (b) map efficiently onto real hardware These two goals are often (almost always) conflicting
Taxonomy Two main axes: how "logical threads of control" are created and mapped to processors (the model of processors), and how these "threads" communicate with each other (the model of communication)
An illustrative example problem: N-Queens Find the number of ways to arrange N queens on an N-by-N chess board so that no queen can capture another Common strategy Fix the queen positions of the first few rows and, for each such configuration, have a processor count the number of valid final positions
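The strategy can be sketched as follows; this sketch fixes only the first row, and a thread pool stands in for the processors (with CPython's GIL a real speedup would need processes, e.g. MPI ranks, so the parallel structure rather than the performance is the point):

```python
from concurrent.futures import ThreadPoolExecutor

def completions(n, cols, d1, d2, row):
    """Count valid ways to fill rows row..n-1 given the partial placement."""
    if row == n:
        return 1
    total = 0
    for c in range(n):
        # a square is safe if no placed queen shares its column or diagonals
        if c not in cols and row - c not in d1 and row + c not in d2:
            total += completions(n, cols | {c}, d1 | {row - c},
                                 d2 | {row + c}, row + 1)
    return total

def subproblem(n, c):
    # fix the queen of row 0 at column c; count completions of rows 1..n-1
    return completions(n, {c}, {-c}, {c}, 1)

def nqueens(n):
    # each fixed first-row configuration is an independent subproblem,
    # so the subproblems can be handed out to different processors
    with ThreadPoolExecutor() as pool:
        return sum(pool.map(lambda c: subproblem(n, c), range(n)))

print(nqueens(8))  # 92
```

Fixing more than one row simply yields more (and finer-grained) independent subproblems, which helps balance load across processors.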
Model of processors Some programming models assume there are N threads of control from the beginning to the end, and provide no way to increase/decrease them Often called "SPMD" models: MPI and OpenMP (for the most part) Others allow threads to be created dynamically: Pthreads (and Java threads, Python threads, etc.), the Java fork/join model
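An SPMD-style sketch: a fixed number of threads, each running the same code parameterized only by its rank (Python threads are used here as a stand-in for MPI processes; the names are illustrative):

```python
import threading

NPROCS = 4              # fixed number of threads, from start to end
data = list(range(100))
partial = [0] * NPROCS  # one slot per rank, so no locking is needed

def worker(rank):
    # every thread runs the same program; what it does depends on its rank
    partial[rank] = sum(data[rank::NPROCS])

threads = [threading.Thread(target=worker, args=(r,)) for r in range(NPROCS)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(sum(partial))  # 4950
```

MPI programs have exactly this shape: one program, with behavior branching on the value returned by `MPI_Comm_rank`.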
Issues with dynamically created threads The number of "threads" may get larger than the number of available processors The system must do a reasonable job in such cases Pthreads (and many similar packages in other languages) leave it to the operating system There are models that allow millions of threads and are good at scheduling them efficiently (e.g., the Java fork/join model)
Models of communication Message passing Shared memory
Message passing models Threads communicate with each other via messages (send()/recv()) A natural abstraction of hardware supporting communication by messages Sockets, MPI
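The send()/recv() pattern can be sketched with a queue standing in for the network channel between two threads (MPI itself would use `MPI_Send`/`MPI_Recv` between ranks; this is only an analogue):

```python
import queue
import threading

channel = queue.Queue()  # stands in for the network between two "processors"

def sender():
    channel.put(("rank 0", 42))     # send(): explicit, ships the data

received = []
def receiver():
    received.append(channel.get())  # recv(): blocks until a message arrives

t1 = threading.Thread(target=sender)
t2 = threading.Thread(target=receiver)
t2.start(); t1.start()
t1.join(); t2.join()
print(received[0])  # ('rank 0', 42)
```

Note that the receiver gets the data only because the sender explicitly shipped it; there is no shared location the receiver could have read instead.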
Shared memory models Threads access data, and modifications to the data somehow become visible to other threads A natural abstraction of hardware supporting communication by shared memory "Shared data" makes programming easier
"Models" and Hardware It is natural to map message passing models onto messaging HW and shared memory models onto multicores/multiprocessors, but it is not always necessary to do so It would even be desirable to have a "universal" portable model that is easy to program and maps onto whatever HW you have That is the ideal goal of parallel programming research, not yet realized Every parallel programming language is a compromise
In the following weeks we will see particular instantiations of these models Today's common/standard ones: sockets, MPI, OpenMP, Pthreads Emerging ones: Java fork/join, X10