COMP60611 Fundamentals of Parallel and Distributed Systems
Lecture 2: Introduction to Parallel Programs
John Gurd, Graham Riley
Centre for Novel Computing, School of Computer Science, University of Manchester
Overview

We focus on the higher of the two implementation-oriented Levels of Abstraction: the Program Level.
- the sequential state-transition programming model
- two fundamental ways to go parallel:
  - processes (message-passing)
  - threads (data-sharing)
- implications for parallel programming languages
- Summary

18/01/2019
On the Nature of Digital Systems

Programs and their hardware realisations, whether parallel or sequential, are essentially the same thing: programmed state-transition systems.

For example, a sequential system (abstract machine or concrete hardware) comprises a state-transition machine (the processor) attached to a memory that is divided into two logical sections: one fixed (the code state) and one changeable (the data state).

The code state contains instructions (or statements) that are executed in a data-dependent sequence, changing the contents of the data state so as to progress the required computation. The sequence is controlled by a program counter.

The data state contains the variables of the computation. These start in an initial state, which defines the input of the computation, and finish holding its logical output. In programming terms, the data state contains the data structures of the program.
Sequential Digital Systems

Performance in the above model is governed by a state-transition cycle. The program counter identifies the 'current' instruction. This instruction is fetched from the code state and then executed. Execution involves reading data from the data state, performing the appropriate operations, writing results back to the data state, and assigning a new value to the program counter.

To a first approximation, execution time depends on the exact number and sequence of these actions (we ignore, for the moment, the effect of any memory buffering schemes).

This is the programmer's model of what constitutes a sequential computation. It is predominantly a model of memory, and the programmer's art is essentially to map algorithms into memory in such a way that they execute with good performance.
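The state-transition cycle described above can be sketched as a toy interpreter. This is a minimal illustration, not a model of any real processor: the three-instruction set (`ADD`, `DEC`, `JNZ`) plus `HALT`, the `run` function, and the variable names are all assumptions made for this example. The code state is an immutable tuple, the data state is a dictionary, and the step count stands in for execution time.

```python
def run(code, data):
    """Execute `code` (fixed) against `data` (changeable) until HALT.

    Returns the number of instructions fetched, a crude proxy for time.
    """
    pc = 0      # program counter: identifies the 'current' instruction
    steps = 0
    while True:
        op, *args = code[pc]          # fetch from the code state
        steps += 1
        if op == "HALT":
            return steps
        if op == "ADD":               # data[dst] = data[a] + data[b]
            dst, a, b = args
            data[dst] = data[a] + data[b]
            pc += 1
        elif op == "DEC":             # data[var] = data[var] - 1
            (var,) = args
            data[var] -= 1
            pc += 1
        elif op == "JNZ":             # data-dependent choice of next instruction
            var, target = args
            pc = target if data[var] != 0 else pc + 1
        else:
            raise ValueError(f"unknown instruction: {op}")


# Code state: adds n + (n-1) + ... + 1 into total.
CODE = (
    ("ADD", "total", "total", "n"),
    ("DEC", "n"),
    ("JNZ", "n", 0),
    ("HALT",),
)

# Data state: its initial contents are the input, its final contents the output.
data = {"n": 4, "total": 0}
steps = run(CODE, data)
```

The number of steps depends on the input value of `n`, which is the sense in which the instruction sequence is data-dependent.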
Sequential Digital Systems

It will be convenient to think of memory in diagrammatic terms. In this sense, the model can be visualised as follows:

[Figure: memory image of a process. Code: fixed memory. Data: changeable memory.]

The Code has an associated, data-dependent locus of control, governed by the program counter; there is also some associated processor state, which we roll into the Data for the moment. This whole memory image is called a process (after Unix terminology; other names are used).
Parallel Execution

It is possible to execute more than one process concurrently, and to arrange for the processes to co-operate in solving some large problem using a message-passing protocol (cf. Unix pipes, forks, etc.).

However, the 'start-up' costs associated with each process are large, mainly due to the cost of protecting its data memory from access by any other process. As a consequence, a large parallel grain size is needed: each process must do enough work to repay its start-up cost.

An alternative is to exploit parallelism within a single process, using some form of 'lightweight' process, or thread. This should allow use of a smaller parallel grain size, but carries risks associated with the sharing of data.

We shall look at the case where just two processors are active. This can be readily generalised to a larger number of processors.
Two-fold Parallelism

In the message-passing scheme, two-fold parallelism is achieved by simultaneous activation of two 'co-operating' processes. Each process can construct messages (think of these as values of some abstract data type) and send them to other processes. A process has to receive incoming messages explicitly (this restriction can be overcome, but it is not a straightforward matter to do so).

The message-passing scheme is illustrated in the following diagram:

[Figure: Process A and Process B exchanging messages.]
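The message-passing scheme can be sketched with two Unix processes connected by pipes. This is a minimal illustration that assumes a POSIX system (it relies on `os.fork`); the function `sum_in_child`, the variable `counter`, and the one-number-per-line message format are hypothetical choices made for this example. Process A sends numbers, Process B explicitly receives them and sends back their sum.

```python
import os

counter = 0  # each process gets its own copy of this variable after the fork


def sum_in_child(values):
    """Process A sends `values` to Process B, which replies with their sum."""
    global counter
    a_to_b_read, a_to_b_write = os.pipe()
    b_to_a_read, b_to_a_write = os.pipe()
    pid = os.fork()                      # create Process B
    if pid == 0:
        # Process B (child)
        status = 1
        try:
            os.close(a_to_b_write)
            os.close(b_to_a_read)
            counter += 100               # changes B's private copy only
            with os.fdopen(a_to_b_read) as inbox:
                total = sum(int(line) for line in inbox)   # explicit receive
            with os.fdopen(b_to_a_write, "w") as outbox:
                outbox.write(f"{total}\n")                 # send the reply
            status = 0
        finally:
            os._exit(status)             # never fall through into A's code
    # Process A (parent)
    os.close(a_to_b_read)
    os.close(b_to_a_write)
    with os.fdopen(a_to_b_write, "w") as outbox:
        for v in values:
            outbox.write(f"{v}\n")       # send one message per value
    with os.fdopen(b_to_a_read) as inbox:
        reply = int(inbox.readline())    # explicit receive
    os.waitpid(pid, 0)                   # wait for Process B to finish
    return reply


result = sum_in_child([1, 2, 3, 4])
```

After the call, `counter` is still 0 in Process A: the update made by Process B touched only its own data state, so the sum can travel back only as a message.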
Two-fold Parallelism

Within a single process, an obvious way of allowing two-fold parallel execution is to allow two program counters to control progress through two separate, but related, code states. To a first approximation, the two streams of instructions will need to share the sequential data state.

[Figure: Thread A and Thread B within a single process.]

When, as frequently happens, Code A and Code B are identical, this scheme is termed single-program, multiple-data (SPMD).
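The SPMD case can be sketched with two threads that run the same function over one shared list. This is an illustrative sketch only: the function `spmd_square` and the use of a `rank` argument to select each thread's share of the data are assumptions made for this example. Note also that in standard CPython the two threads may interleave rather than run simultaneously; the structure of the program is the point here.

```python
import threading


def spmd_square(shared, n_threads=2):
    """Run the same code in `n_threads` threads over one shared list."""

    def body(rank):
        # Identical code in every thread; `rank` selects the data it works on.
        for i in range(rank, len(shared), n_threads):
            shared[i] = shared[i] * shared[i]

    threads = [threading.Thread(target=body, args=(r,)) for r in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return shared


data = spmd_square([1, 2, 3, 4, 5])
```

Both threads address the same list, but they write to disjoint elements, so this particular sketch needs no locking.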
Privatising Data

Each stream of instructions (from Code A and from Code B) will issue references to the shared data state using a global addressing scheme (i.e. the same address, issued from whichever stream of instructions, will access the same shared data memory location).

There are obvious problems of contention and propriety associated with this sharing arrangement; it will be necessary to use locks to protect any variable that might be shared, and these will affect performance.

Hence, it is usual to try to identify more precisely which parts of the data state really need to be shared; the use of locks can then be confined to those variables (and only those variables) that really need the protection.
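The need for locks on shared variables can be sketched as follows. This is a minimal illustration; the class `SharedCounter` and the function `count_with_two_threads` are hypothetical names introduced for this example. Each increment is a read-modify-write on shared data, so it is wrapped in a lock to stop the two instruction streams from interleaving inside it.

```python
import threading


class SharedCounter:
    """A variable in the shared data state, protected by a lock."""

    def __init__(self):
        self.value = 0
        self._lock = threading.Lock()

    def increment(self):
        with self._lock:      # only one thread at a time may hold the lock
            self.value += 1   # read-modify-write on shared data


def count_with_two_threads(per_thread):
    counter = SharedCounter()

    def body():
        for _ in range(per_thread):
            counter.increment()

    threads = [threading.Thread(target=body) for _ in range(2)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter.value


total = count_with_two_threads(10_000)
```

Every increment here pays for acquiring and releasing the lock, which is the performance cost the text refers to.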
Privatising Data

In general, there will be some variables that are only referenced from one instruction stream or the other. Assuming that these can be identified, we can segregate the data state into three segments, as follows:

[Figure: the data state divided into three segments, with Thread A and Thread B sharing only the Shared Data.]

We can then isolate the execution objects, thread A and thread B, within the process, which have the Shared Data as their only part in common.
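The benefit of privatising data can be sketched by letting each thread accumulate into a variable that only it references, and touching the shared segment once at the end. This is an illustrative sketch under stated assumptions: the function `parallel_sum`, the strided split of the input, and the `locked_updates` counter (included only to make the number of lock acquisitions visible) are all choices made for this example.

```python
import threading


def parallel_sum(values, n_threads=2):
    """Sum `values` using private partial sums and one locked update per thread."""
    shared_total = [0]        # Shared Data: referenced by every thread
    locked_updates = [0]      # bookkeeping for this example only
    lock = threading.Lock()

    def body(rank):
        private_sum = 0       # private to this thread, so no lock is needed
        for v in values[rank::n_threads]:
            private_sum += v
        with lock:            # the only access to the shared segment
            shared_total[0] += private_sum
            locked_updates[0] += 1

    threads = [threading.Thread(target=body, args=(r,)) for r in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return shared_total[0], locked_updates[0]


total, locked_updates = parallel_sum(list(range(1, 101)))
```

With two threads the lock is taken twice in total, rather than once per element as it would be if every addition went directly to the shared variable.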
Identifying Private Data

Determining which variables fall into which category (private-to-A; private-to-B; shared) is non-trivial. In particular, the required category for a certain variable may depend on the values of variables elsewhere in the data state.

In the general case (more than two threads) the procedure for identifying categories must distinguish the following:
- Shared variable: can potentially be accessed by more than one thread.
- Private variable (to thread X): can only ever be accessed by thread X.

How to achieve this distinction in acceptable time is an interesting research problem.
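The two categories can be made concrete with a small sketch. It assumes that the set of variables each thread may access is already known, which is precisely the hard part in practice; where accesses are data-dependent, such sets would have to be conservative over-approximations. The function `classify` and the thread and variable names are hypothetical.

```python
def classify(accesses):
    """Map each variable to 'shared' or 'private to <thread>'.

    `accesses` maps a thread name to the set of variable names that the
    thread may potentially access.
    """
    owners = {}
    for thread, variables in accesses.items():
        for var in variables:
            owners.setdefault(var, set()).add(thread)
    return {
        var: "shared" if len(threads) > 1 else f"private to {next(iter(threads))}"
        for var, threads in owners.items()
    }


categories = classify({
    "A": {"x", "total"},
    "B": {"y", "total"},
})
```

Here `total` is classified as shared because both threads may access it, while `x` and `y` are private to A and B respectively.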
Parallel Programming Language Requirements

Consider the additional programming language constructs that will be necessary to handle parallelism in either of the ways we have described.

Message-Passing (between processes):
- means to create new processes;
- means to place data in a process;
- means to send/receive messages;
- means to terminate 'dead' processes.

Data-Sharing (between threads in one process):
- means to create new threads;
- means to share/privatise data;
- means to synchronise shared accesses;
- means to terminate 'dead' threads.
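As one possible reading of the message-passing column, the four requirements can be mapped onto Python's standard `subprocess` module. This is a sketch, not the only way to meet the requirements: the worker program `WORKER`, the function `scale_in_worker`, and the one-number-per-line message format are assumptions made for this example. The data-sharing column is not covered here.

```python
import subprocess
import sys

# Program run by the new process (hypothetical worker for this example).
WORKER = """
import sys
factor = int(sys.argv[1])        # data placed in the process at creation
for line in sys.stdin:           # receive messages
    print(int(line) * factor)    # send replies
"""


def scale_in_worker(values, factor):
    # Create a new process and place data in it (the factor argument).
    worker = subprocess.Popen(
        [sys.executable, "-c", WORKER, str(factor)],
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
        text=True,
    )
    # Send the messages and receive the replies.
    out, _ = worker.communicate("".join(f"{v}\n" for v in values))
    # communicate() waits for the process to end; check that it ended cleanly.
    if worker.returncode != 0:
        raise RuntimeError("worker failed")
    return [int(line) for line in out.split()]


scaled = scale_in_worker([1, 2, 3], factor=10)
```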
Summary

The transition from a sequential programming model to a parallel programming model can be made in two distinct ways:
- message-passing between separate processes (process-based parallel programming); or
- data-sharing between separate threads within a single process.

Both models place new requirements on the constructs that programming languages must provide.

Neither model precludes use of the other; they are simply different ways of introducing parallelism at the program level.