COMP60611 Fundamentals of Parallel and Distributed Systems


COMP60611 Fundamentals of Parallel and Distributed Systems
Lecture 2: Introduction to Parallel Programs
John Gurd, Graham Riley
Centre for Novel Computing, School of Computer Science, University of Manchester

Overview
We focus on the higher of the two implementation-oriented Levels of Abstraction, the Program Level:
- the sequential state-transition programming model;
- two fundamental ways to go parallel:
  - processes (message-passing);
  - threads (data-sharing);
- implications for parallel programming languages;
- Summary.

On the Nature of Digital Systems
Programs and their hardware realisations, whether parallel or sequential, are essentially the same thing: programmed state-transition. For example, a sequential system (abstract machine or concrete hardware) comprises a state-transition machine (processor) attached to a memory which is divided into two logical sections: one fixed (the code state) and one changeable (the data state).
The code state contains instructions (or statements) that can be executed in a data-dependent sequence, changing the contents of the data state in such a way as to progress the required computation. The sequence is controlled by a program counter.
The data state contains the variables of the computation. These start in an initial state, defining the input of the computation, and finish by holding its logical output. In programming terms, the data state contains the data structures of the program.

Sequential Digital Systems
Performance in the above model is governed by a state-transition cycle. The program counter identifies the 'current' instruction. This is fetched from the code state and then executed. Execution involves reading data from the data state, performing appropriate operations, then writing results back to the data state and assigning a new value to the program counter. To a first approximation, execution time will depend on the exact number and sequence of these actions (we ignore, for the moment, the effect of any memory buffering schemes).
This is the programmer's model of what constitutes a sequential computation. It is predominantly a model of memory, and the programmer's art is essentially to map algorithms into memory in such a way that they will execute with good performance.
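As an illustration of the cycle just described, here is a minimal sketch of a state-transition machine in C. The toy instruction set (LOAD, ADD, STORE, JUMPNZ, HALT) and the fixed-size code and data arrays are invented purely for this example; they are not taken from the lecture.

/* Toy state-transition machine: a fixed code state, a mutable data state,
 * and a program counter driving the fetch-execute cycle.
 * The instruction set is invented for illustration only. */
#include <stdio.h>

typedef enum { LOAD, ADD, STORE, JUMPNZ, HALT } Op;
typedef struct { Op op; int a; } Instr;      /* one instruction in the code state */

int main(void) {
    Instr code[] = {                 /* code state: fixed */
        { LOAD,  0 },                /* acc = data[0]      */
        { ADD,   1 },                /* acc += data[1]     */
        { STORE, 2 },                /* data[2] = acc      */
        { HALT,  0 }
    };
    int data[3] = { 2, 3, 0 };       /* data state: initial values are the input */
    int pc = 0, acc = 0, running = 1;

    while (running) {                /* the state-transition cycle */
        Instr i = code[pc++];        /* fetch the current instruction, advance pc */
        switch (i.op) {              /* execute: read data, operate, write back */
            case LOAD:   acc = data[i.a];           break;
            case ADD:    acc += data[i.a];          break;
            case STORE:  data[i.a] = acc;           break;
            case JUMPNZ: if (acc != 0) pc = i.a;    break;
            case HALT:   running = 0;               break;
        }
    }
    printf("output: data[2] = %d\n", data[2]);  /* logical output left in the data state */
    return 0;
}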

Sequential Digital Systems
It will be convenient to think of memory in diagrammatic terms. In this sense, the model can be visualised as follows:
[Diagram: Code – fixed memory; Data – changeable memory]
The Code has an associated, data-dependent locus of control, governed by the program counter; there is also some associated processor state, which we roll into the Data for the moment. This whole memory image is called a process (after Unix terminology; other names are used).

Parallel Execution
It is possible to execute more than one process concurrently, and to arrange for the processes to co-operate in solving some large problem using a message-passing protocol (cf. Unix pipes, forks, etc.). However, the 'start-up' costs associated with each process are large, mainly due to the cost of protecting its data memory from access by any other process. As a consequence, a large parallel grain size is needed.
An alternative is to exploit parallelism within a single process, using some form of 'lightweight' process, or thread. This should allow use of a smaller parallel grain size, but carries risks associated with sharing of data.
We shall look at the case where just two processors are active. This can be readily generalised to a larger number of processors.

Two-fold Parallelism
In the message-passing scheme, two-fold parallelism is achieved by simultaneous activation of two 'co-operating' processes. Each process can construct messages (think of these as values of some abstract data type) and send them to other processes. A process has to receive incoming messages explicitly (this restriction can be overcome, but it is not a straightforward matter to do so). The message-passing scheme is illustrated in the following diagram:
[Diagram: two complete processes, Process A and Process B, communicating only by messages]

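As a concrete (if simplified) illustration of this scheme, the sketch below uses Unix fork() to create a second process and a pipe as the message channel, in the spirit of the Unix pipes and forks mentioned earlier. The message content and the use of a raw pipe are assumptions made only for this example; real systems would typically use a message-passing library.

/* Two co-operating processes: Process A (parent) constructs a message and
 * sends it; Process B (child) must explicitly receive it.
 * Sketch only; error handling omitted. */
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/wait.h>

int main(void) {
    int fd[2];
    pipe(fd);                                   /* fd[0]: read end, fd[1]: write end */

    if (fork() == 0) {                          /* Process B: explicit receive */
        char msg[64];
        close(fd[1]);
        ssize_t n = read(fd[0], msg, sizeof msg - 1);
        msg[n > 0 ? n : 0] = '\0';
        printf("Process B received: %s\n", msg);
        return 0;
    }

    /* Process A: construct a message and send it */
    const char *msg = "partial result: 42";
    close(fd[0]);
    write(fd[1], msg, strlen(msg) + 1);
    close(fd[1]);
    wait(NULL);                                 /* reap the child process */
    return 0;
}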

Two-fold Parallelism
Within a single process, an obvious way of allowing two-fold parallel execution is to allow two program counters to control progress through two separate, but related, code states. To a first approximation, the two streams of instructions will need to share the sequential data state.
[Diagram: one process containing Thread A (Code A) and Thread B (Code B), both sharing a single Data state]
When, as frequently happens, Code A and Code B are identical, this scheme is termed single-program, multiple-data (SPMD).

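A minimal sketch of this SPMD-style execution using POSIX threads: both threads run the same function (the same 'code state'), distinguished only by an id, and both operate directly on the shared data state. The array and the work performed are invented for illustration.

/* Two program counters (threads) executing the same code over a shared data
 * state. Each thread updates its own half of the array, so no lock is needed
 * here. Compile with -pthread. Sketch only. */
#include <pthread.h>
#include <stdio.h>

#define N 8
double data[N];                       /* shared data state */

void *work(void *arg) {               /* the single program both threads execute */
    long id = (long)arg;              /* thread id distinguishes the two streams */
    for (int i = (int)id * N/2; i < ((int)id + 1) * N/2; i++)
        data[i] = 2.0 * i;            /* disjoint indices for the two threads */
    return NULL;
}

int main(void) {
    pthread_t a, b;
    pthread_create(&a, NULL, work, (void *)0);   /* Thread A */
    pthread_create(&b, NULL, work, (void *)1);   /* Thread B */
    pthread_join(a, NULL);
    pthread_join(b, NULL);
    for (int i = 0; i < N; i++) printf("%.1f ", data[i]);
    printf("\n");
    return 0;
}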

Privatising Data
Each stream of instructions (from Code A and from Code B) will issue references to the shared data state using a global addressing scheme (i.e. the same address, issued from whichever stream of instructions, will access the same shared data memory location).
There are obvious problems of contention and propriety associated with this sharing arrangement; it will be necessary to use locks to protect any variable that might be shared, and these will affect performance. Hence, it is usual to try to identify more precisely which parts of the data state really need to be shared; then at least the use of locks can be confined to those variables (and only those variables) that really need the protection.
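A sketch of this locking discipline with POSIX threads: only the genuinely shared variable is protected by a lock, while each thread's working variables remain private (on its own stack) and need no protection. The counter being accumulated is hypothetical.

/* Confining locks to the truly shared data. Compile with -pthread. Sketch only. */
#include <pthread.h>
#include <stdio.h>

long shared_total = 0;                               /* shared: needs the lock */
pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

void *work(void *arg) {
    (void)arg;
    long local = 0;                                  /* private: each thread has its own copy */
    for (int i = 0; i < 100000; i++)
        local += 1;                                  /* no lock needed on private data */
    pthread_mutex_lock(&lock);                       /* only the shared update is locked */
    shared_total += local;
    pthread_mutex_unlock(&lock);
    return NULL;
}

int main(void) {
    pthread_t a, b;
    pthread_create(&a, NULL, work, NULL);
    pthread_create(&b, NULL, work, NULL);
    pthread_join(a, NULL);
    pthread_join(b, NULL);
    printf("shared_total = %ld\n", shared_total);    /* 200000 */
    return 0;
}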

Privatising Data
In general, there will be some variables that are only referenced from one instruction stream or the other. Assuming that these can be identified, we can segregate the data state into three segments, as follows:
[Diagram: the data state segregated into data private to Thread A, Shared Data, and data private to Thread B]
We can then isolate the execution objects, thread A and thread B, within the process, which have the Shared Data as their only part in common.


Identifying Private Data
Determining which variables fall into which category (private-to-A; private-to-B; shared) is non-trivial. In particular, the required category for a certain variable may depend on the values of variables elsewhere in the data state.
In the general case (more than two threads) the procedure for identifying categories must distinguish the following:
- Shared variable: can potentially be accessed by more than one thread.
- Private variable (to thread X): can only ever be accessed by thread X.
How to achieve this distinction in acceptable time is an interesting research problem.
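In practice, many programming models side-step the analysis problem by asking the programmer to declare the category of each variable. The sketch below uses OpenMP-style C purely to illustrate such explicit shared/private declarations; OpenMP itself is not part of this lecture.

/* The programmer states which variables are shared and which are private;
 * the system does not have to infer the categories.
 * Compile with -fopenmp (ignored pragma still gives a correct sequential run). */
#include <stdio.h>

int main(void) {
    int n = 100;
    double sum = 0.0;                 /* shared result, combined via a reduction */
    double x;                         /* declared private: each thread gets its own copy */

    #pragma omp parallel for private(x) shared(n) reduction(+:sum)
    for (int i = 0; i < n; i++) {
        x = i * 0.5;                  /* safe: x is private to each thread */
        sum += x;                     /* safe: the reduction manages the sharing */
    }
    printf("sum = %f\n", sum);        /* 0.5 * (0 + 1 + ... + 99) = 2475.0 */
    return 0;
}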

Parallel Programming Language Requirements
Consider the additional programming language constructs that will be necessary to handle parallelism in either of the ways we have described.
Message-Passing (between processes):
- means to create new processes;
- means to place data in a process;
- means to send/receive messages;
- means to terminate 'dead' processes.
Data-Sharing (between threads in one process):
- means to create new threads;
- means to share/privatise data;
- means to synchronise shared accesses;
- means to terminate 'dead' threads.
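For the message-passing side, a library such as MPI supplies most of these constructs: processes are created and terminated by the launcher together with MPI_Init/MPI_Finalize, MPI_Comm_rank identifies each process, and MPI_Send/MPI_Recv provide explicit send and receive. The sketch below is illustrative only and the value sent is made up; the data-sharing constructs (thread creation, joining, and locks) were already illustrated in the earlier pthread sketches.

/* Minimal MPI illustration of the message-passing constructs.
 * Run with, e.g.: mpirun -np 2 ./a.out   (sketch only) */
#include <stdio.h>
#include <mpi.h>

int main(int argc, char **argv) {
    int rank, value;
    MPI_Init(&argc, &argv);                       /* join the set of created processes */
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);         /* which process am I? */

    if (rank == 0) {
        value = 42;                               /* construct a message ... */
        MPI_Send(&value, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);   /* ... and send it */
    } else if (rank == 1) {
        MPI_Recv(&value, 1, MPI_INT, 0, 0, MPI_COMM_WORLD,
                 MPI_STATUS_IGNORE);              /* explicit receive */
        printf("process %d received %d\n", rank, value);
    }

    MPI_Finalize();                               /* terminate cleanly */
    return 0;
}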

Summary
The transition from a sequential programming model to a parallel programming model can be made in two distinct ways:
- Message-passing between separate processes (process-based parallel programming); or
- Data-sharing between separate threads within a single process.
Both models introduce new requirements for programming-language constructs. Neither model precludes use of the other; they are simply different ways of introducing parallelism at the program level.