1 Parallel Algorithms Dr. Stephen Tse Lesson 9

2 Parallel Algorithms
Main strategy: divide-conquer-combine.
Main issues:
1. Problem decomposition (data, control, data + control)
2. Process scheduling
3. Communication handling (interconnect topology, size and number of messages)
4. Synchronization
5. Performance analysis and algorithm improvement

3 Problems of Various Complexity
Embarrassingly parallel: no communication, no load imbalance, nearly 100% parallel efficiency. Possibly leading to scalability.
Synchronized parallel: communication is needed and loads are usually balanced, but both are predictable (natural synchronization), e.g. solving simple PDEs. Possibly leading to quasi-scalability.
Asynchronized parallel: communication is needed, loads are usually not balanced, and both are unpredictable (asynchronous), e.g. Monte Carlo methods to locate a global minimum. Rarely leading to quasi-scalability.

4 Illustration of Parallel Complexity

5 Programming Paradigms
Master-slave
Domain decomposition
– Data
– Control
Data parallel
Single program multiple data (SPMD)
Virtual-shared-memory model

6 Illustration of Programming Paradigms

7 Comparison of Programming Paradigms
Explicit vs. implicit
– Men vs. machines.
Virtual-shared-memory vs. message passing
– Shared-memory architectures tend to be very expensive: an m x n crossbar needs mn hardware switches, so such machines tend to be very small.
– Message passing is a powerful way of expressing parallelism. It has been called the "assembly language of parallel computing" because it forces the programmer to deal with the details. But programmers can design extremely sophisticated programs by calling sophisticated algorithms from the recent portable MPI libraries.
Data parallel vs. control parallel
– Many parallel algorithms cannot be efficiently converted into the current generation of High Performance Fortran (HPF) data-parallel language compilation.
Master-slave vs. the rest
– In master-slave mode, one slow processor (one not doing what it is supposed to do), with time t_max, can hold up the entire team. This shows that the master-slave scheme is usually very inefficient because of the load-imbalance issue, e.g. a slow master processor, and so it is usually avoided.

8 Illustration of Virtual Shared Memory

9 Allocation of Partitions on Space- and Time-Sharing Systems

10 Men vs. Machines

11 Distributed-Memory MIMD
In this system, each processor has its own private memory.
Each vertex corresponding to a processor/memory pair is called a node. In a network, some vertices are nodes and some are switches.
A network that connects only nodes is called a static network (e.g. a mesh network, where every node connects to at least two other nodes).
A network that connects both nodes and switches is called a dynamic network (e.g. a crossbar network, where the nodes on the left and the right are connected through switches).
Generic distributed-memory system: CPU/memory node pairs connected by an interconnection network.

12 Process A process is a fundamental building block of parallel computing. A process is an instance of a program that is executing on a physical processor. A program is parallel if, at any time during its execution, it can comprise more than one process. In a parallel program, processes can be created, specified, and destroyed. There are also ways to coordinate inter-process interaction.

13 Mutual Exclusion and Binary Semaphore
To ensure that only one process at a time can execute a certain sequence of statements, we arrange for mutual exclusion; the sequence of statements is called a critical section.
One approach is to use a binary semaphore: a shared variable s whose value indicates whether the critical section is free (s = 1) or cannot be entered (s = 0). The process may look like this:
    shared int s = 1;
    while (!s);              /* wait until s = 1  */
    s = 0;                   /* close down access */
    sum = sum + private_x;   /* critical section  */
    s = 1;                   /* re-open access    */
The problem is that, with multiple processes, the operations that manipulate s are not atomic actions, i.e. actions that are indivisible. While one process is fetching s = 1 into a register to test whether it is OK to enter the critical section, another process can be storing s = 0.
Binary semaphore: in addition to the shared variable, we have to arrange functions such that once a process starts to access s, no other process can access it until the original process is done with the access, including resetting its value:
    void P(int* s);   /* await (s > 0); s = s - 1 */
    void V(int* s);   /* s = s + 1 */
The function P has the same effect as "while (!s); s = 0;" but it prevents other processes from accessing s once one process gets out of the loop. The function V sets s to 1, but it does this atomically. A binary semaphore has value either 0 or 1, and a V operation on a binary semaphore is executed only when the semaphore has value 0.
The binary-semaphore idea is somewhat error-prone and forces serial execution of the critical section, so a number of alternatives have been devised.
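
A minimal sketch (not from the slides) of how P and V could be implemented for threads sharing memory, using GCC/Clang __atomic builtins and POSIX threads. The names P, V, s, and private_x follow the slide's notation; the compare-and-swap implementation and the two-thread harness are illustrative assumptions.

    /* Compile with: gcc -pthread semaphore_sketch.c */
    #include <pthread.h>
    #include <stdio.h>

    static int s = 1;                 /* binary semaphore: 1 = section free */
    static double shared_sum = 0.0;

    /* P: atomically wait until s == 1, then set it to 0 (claim the section). */
    static void P(int *sem) {
        int expected;
        do {
            expected = 1;
            /* compare-and-swap makes the test-and-set a single indivisible action */
        } while (!__atomic_compare_exchange_n(sem, &expected, 0, 0,
                                              __ATOMIC_ACQUIRE, __ATOMIC_RELAXED));
    }

    /* V: atomically set s back to 1 (release the section). */
    static void V(int *sem) {
        __atomic_store_n(sem, 1, __ATOMIC_RELEASE);
    }

    static void *worker(void *arg) {
        double private_x = *(double *)arg;
        P(&s);                                /* close down access */
        shared_sum = shared_sum + private_x;  /* critical section  */
        V(&s);                                /* re-open access    */
        return NULL;
    }

    int main(void) {
        pthread_t t[2];
        double x[2] = {1.0, 2.0};
        for (int i = 0; i < 2; i++) pthread_create(&t[i], NULL, worker, &x[i]);
        for (int i = 0; i < 2; i++) pthread_join(t[i], NULL);
        printf("shared_sum = %f\n", shared_sum);  /* always 3.0 */
        return 0;
    }

Because P and V are atomic, the update of shared_sum can never be interleaved, which is exactly the guarantee the busy-wait loop on s alone cannot give.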

14 Synchronous and Buffered Communication
The most commonly used method of programming distributed-memory MIMD systems is message passing. The basic idea is to coordinate the processes' activities by explicitly sending and receiving messages.
The number of processes is set at the beginning of program execution, and each process is assigned a unique integer rank in the range 0, 1, ..., p-1, where p is the number of processes.
Suppose the processes are running on different nodes. If process 0 sends a "request to send" to process 1 and waits until it receives a "ready to receive" from process 1 before it begins transmission of the actual message, this is synchronous communication.
If the system buffers the message, that is, the contents of the message are copied into a system-controlled block of memory, the sending process can continue to do useful work. When the receiving process arrives at the point where it is ready to receive the message, the system software simply copies the buffered message into the memory location controlled by the receiving process. This is called buffered communication.
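
MPI exposes both modes directly: MPI_Ssend completes only after the matching receive has started, while MPI_Bsend copies the message into a user-attached buffer and returns. The following two-process sketch is illustrative, not taken from the lesson; the buffer-size arithmetic and tag values are assumptions. Run with mpiexec -n 2.

    #include <mpi.h>
    #include <stdio.h>
    #include <stdlib.h>

    int main(int argc, char *argv[]) {
        int rank, x = 42, y = 0;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        if (rank == 0) {
            /* Synchronous mode: does not complete until process 1 starts receiving. */
            MPI_Ssend(&x, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);

            /* Buffered mode: copy into an attached buffer and return right away. */
            int size;
            MPI_Pack_size(1, MPI_INT, MPI_COMM_WORLD, &size);
            size += MPI_BSEND_OVERHEAD;
            void *buf = malloc(size);
            MPI_Buffer_attach(buf, size);
            MPI_Bsend(&x, 1, MPI_INT, 1, 1, MPI_COMM_WORLD);
            MPI_Buffer_detach(&buf, &size);   /* blocks until the buffered message is gone */
            free(buf);
        } else if (rank == 1) {
            MPI_Recv(&y, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            MPI_Recv(&y, 1, MPI_INT, 0, 1, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            printf("process 1 received %d twice\n", y);
        }
        MPI_Finalize();
        return 0;
    }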

15 Single-Program Multiple-Data (SPMD)
Suppose programming consists of the following steps:
1. The user issues a directive to the operating system that has the effect of placing a copy of the executable program on each processor.
2. Each processor begins execution of its copy of the executable.
3. Different processes can execute different statements by branching within the program based on their process ranks.
This approach to programming a distributed-memory MIMD system is called the Single-Program Multiple-Data (SPMD) approach. The effect of running different programs is obtained by the use of conditional branches within the source code.
Message Passing Interface (MPI) is a library of definitions and functions that can be used in C programs. All programs in our MPI studies use the SPMD paradigm.
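
A minimal SPMD sketch (an illustrative assumption, not taken from the lesson): every process runs the same executable, and conditional branches on the rank make process 0 behave differently from the rest.

    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char *argv[]) {
        int rank, p;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);   /* this process's rank        */
        MPI_Comm_size(MPI_COMM_WORLD, &p);      /* total number of processes  */

        if (rank == 0) {
            /* process 0 takes the coordinating branch */
            printf("I am process 0 of %d: coordinating\n", p);
        } else {
            /* every other process takes the worker branch */
            printf("I am process %d of %d: working\n", rank, p);
        }

        MPI_Finalize();
        return 0;
    }

Run, for example, with mpiexec -n 4 ./spmd: the same program text produces different behavior on each rank.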

16 Communicator and Rank
A communicator is a collection of processes that can send messages to each other.
The communicator MPI_COMM_WORLD is predefined in MPI and consists of all the processes running when program execution begins.
The flow of control of an SPMD program depends on the rank of a process. The function MPI_Comm_rank returns the rank of a process in its second parameter; the first parameter is the communicator.

17 MPI Message
The actual message passing in the program is carried out by the MPI functions MPI_Send (sends a message to a designated process) and MPI_Recv (receives a message from a process).
Issues involved in message passing:
1. A message must be composed and put in a buffer.
2. The message must be "dropped in a mailbox"; to know where to deliver it, the message must be enclosed in an envelope carrying the destination address.
3. The address alone isn't enough. Since the physical message is a sequence of electrical signals, the system needs to know where the message ends, i.e. the size of the message.
4. To take appropriate action on the message, the receiver needs the return address, i.e. the address of the source process.
5. A message type, or tag, helps the receiver take the proper action on the message.
6. The receiver also needs to know which communicator the message comes from.
Therefore, the message envelope contains:
1. The rank of the receiver
2. The rank of the sender
3. A tag (message type)
4. A communicator
The actual message is stored in a block of memory. The system needs a count and a datatype to determine how much storage is needed for the message:
1. The count value
2. The MPI datatype
The message also needs a message pointer so the system knows where to get the data:
1. The message pointer

18 Sending Message
The parameters for MPI_Send and MPI_Recv are:
    int MPI_Send(
        void*         message,   /* in  */
        int           count,     /* in  */
        MPI_Datatype  datatype,  /* in  */
        int           dest,      /* in  */
        int           tag,       /* in  */
        MPI_Comm      comm)      /* in  */
    int MPI_Recv(
        void*         message,   /* out */
        int           count,     /* in  */
        MPI_Datatype  datatype,  /* in  */
        int           source,    /* in  */
        int           tag,       /* in  */
        MPI_Comm      comm,      /* in  */
        MPI_Status*   status)    /* out */
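
A minimal sketch of these parameters in use (illustrative; the message contents and tag value are assumptions): process 0 sends a string to process 1, which receives it into a buffer of known maximum size.

    #include <mpi.h>
    #include <stdio.h>
    #include <string.h>

    int main(int argc, char *argv[]) {
        int rank;
        char message[100];
        const int tag = 0;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        if (rank == 0) {
            strcpy(message, "Greetings from process 0!");
            /* count includes the terminating '\0' */
            MPI_Send(message, strlen(message) + 1, MPI_CHAR, 1, tag, MPI_COMM_WORLD);
        } else if (rank == 1) {
            MPI_Status status;
            MPI_Recv(message, 100, MPI_CHAR, 0, tag, MPI_COMM_WORLD, &status);
            printf("process 1 received: %s\n", message);
        }

        MPI_Finalize();
        return 0;
    }

Note that the receive's count (100) is an upper bound on the message size, not the exact length of what was sent.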

19 Send and Receive Pair
The status argument returns information on the data that was actually received. It references a struct with at least three members:
    status -> MPI_SOURCE   /* the rank of the process that sent the message */
    status -> MPI_TAG      /* the tag of the message */
    status -> MPI_ERROR    /* the error code */
Diagram: a Send with tag A is matched by a Receive with tag B when the receive's source and tag match the send, or when the receive uses the wildcards MPI_ANY_SOURCE and MPI_ANY_TAG; the message received is identical to the message sent.
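
A receive-side sketch (assumed, not from the slides): process 0 collects one message from every other process in whatever order they arrive, using the wildcards and then reading the actual source and tag back from the status.

    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char *argv[]) {
        int rank, p;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &p);

        if (rank != 0) {
            int data = rank * rank;
            MPI_Send(&data, 1, MPI_INT, 0, rank /* tag */, MPI_COMM_WORLD);
        } else {
            for (int i = 1; i < p; i++) {
                int data;
                MPI_Status status;
                /* accept from any sender, with any tag, in arrival order */
                MPI_Recv(&data, 1, MPI_INT, MPI_ANY_SOURCE, MPI_ANY_TAG,
                         MPI_COMM_WORLD, &status);
                printf("got %d from rank %d (tag %d)\n",
                       data, status.MPI_SOURCE, status.MPI_TAG);
            }
        }
        MPI_Finalize();
        return 0;
    }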

20 In Summary
The count and datatype determine the size of the message.
The tag and comm are used to make sure that messages don't get mixed up.
Each message consists of two parts: the data being transmitted and the envelope of information.
Data: a pointer, a count, and a datatype.
Envelope: 1. the rank of the receiver, 2. the rank of the sender, 3. a tag, 4. a communicator, and 5. status (for a receive).
