1 TRAPEZOIDAL RULE IN MPI Copyright © 2010, Elsevier Inc. All rights Reserved

2 The Trapezoidal Rule Copyright © 2010, Elsevier Inc. All rights Reserved

3 The Trapezoidal Rule Copyright © 2010, Elsevier Inc. All rights Reserved

4 One trapezoid Copyright © 2010, Elsevier Inc. All rights Reserved

5 Pseudo-code for a serial program Copyright © 2010, Elsevier Inc. All rights Reserved
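The serial pseudo-code on this slide did not survive transcription. The sketch below is a minimal C version of the serial trapezoidal rule it describes; the names Trap and f are illustrative, not necessarily the book's exact code.

    #include <stdio.h>

    double f(double x) { return x * x; }   /* example integrand; the slide leaves f unspecified */

    /* Estimate the integral of f over [a, b] using n trapezoids of width h. */
    double Trap(double a, double b, int n, double h) {
       double approx = (f(a) + f(b)) / 2.0;
       for (int i = 1; i <= n - 1; i++)
          approx += f(a + i * h);
       return h * approx;
    }

    int main(void) {
       double a = 0.0, b = 3.0;
       int    n = 1024;
       double h = (b - a) / n;
       printf("With n = %d trapezoids, integral from %f to %f = %.15e\n",
              n, a, b, Trap(a, b, n, h));
       return 0;
    }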

6 Parallelizing the Trapezoidal Rule 1. Partition problem solution into tasks. 2. Identify communication channels between tasks. 3. Aggregate tasks into composite tasks. 4. Map composite tasks to cores. Copyright © 2010, Elsevier Inc. All rights Reserved

7 Parallel pseudo-code Copyright © 2010, Elsevier Inc. All rights Reserved

8 Tasks and communications for Trapezoidal Rule Copyright © 2010, Elsevier Inc. All rights Reserved

9 First version (1) Copyright © 2010, Elsevier Inc. All rights Reserved

10 First version (2) Copyright © 2010, Elsevier Inc. All rights Reserved

11 First version (3) Copyright © 2010, Elsevier Inc. All rights Reserved
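The code on the three "First version" slides is missing from the transcript. Below is a minimal MPI program consistent with the outline on the preceding slides, assuming rank 0 collects the partial integrals with point-to-point sends and receives and that n is evenly divisible by comm_sz; Trap and f are the same illustrative serial pieces sketched earlier.

    #include <stdio.h>
    #include <mpi.h>

    double f(double x) { return x * x; }   /* placeholder integrand */

    double Trap(double left, double right, int count, double h) {
       double est = (f(left) + f(right)) / 2.0;
       for (int i = 1; i <= count - 1; i++)
          est += f(left + i * h);
       return h * est;
    }

    int main(void) {
       int    my_rank, comm_sz, n = 1024, local_n;
       double a = 0.0, b = 3.0, h, local_a, local_b;
       double local_int, total_int;

       MPI_Init(NULL, NULL);
       MPI_Comm_rank(MPI_COMM_WORLD, &my_rank);
       MPI_Comm_size(MPI_COMM_WORLD, &comm_sz);

       h = (b - a) / n;          /* width of every trapezoid */
       local_n = n / comm_sz;    /* trapezoids per process   */
       local_a = a + my_rank * local_n * h;
       local_b = local_a + local_n * h;
       local_int = Trap(local_a, local_b, local_n, h);

       if (my_rank != 0) {
          /* every non-zero rank sends its partial result to rank 0 */
          MPI_Send(&local_int, 1, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD);
       } else {
          /* rank 0 accumulates the partial results and prints the estimate */
          total_int = local_int;
          for (int source = 1; source < comm_sz; source++) {
             MPI_Recv(&local_int, 1, MPI_DOUBLE, source, 0, MPI_COMM_WORLD,
                      MPI_STATUS_IGNORE);
             total_int += local_int;
          }
          printf("With n = %d trapezoids, integral from %f to %f = %.15e\n",
                 n, a, b, total_int);
       }

       MPI_Finalize();
       return 0;
    }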

12 PERFORMANCE EVALUATION Copyright © 2010, Elsevier Inc. All rights Reserved

13 Elapsed parallel time MPI_Wtime returns the number of seconds that have elapsed since some time in the past. Copyright © 2010, Elsevier Inc. All rights Reserved

14 Elapsed serial time In this case, you don’t need to link in the MPI libraries. Returns time in microseconds elapsed from some point in the past. Copyright © 2010, Elsevier Inc. All rights Reserved

15 Elapsed serial time Copyright © 2010, Elsevier Inc. All rights Reserved
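The timing code for the serial case is missing. One common approach, which the description above fits, is a macro built on gettimeofday; the macro name GET_TIME and its details here are an assumption, not necessarily the book's timer. gettimeofday has microsecond resolution, and the macro below converts the result to seconds.

    #include <stdio.h>
    #include <sys/time.h>

    /* Wall-clock timer based on gettimeofday (microsecond resolution),
       reported as a double in seconds. Name and form are illustrative. */
    #define GET_TIME(now) {                          \
       struct timeval t;                             \
       gettimeofday(&t, NULL);                       \
       now = t.tv_sec + t.tv_usec / 1000000.0;       \
    }

    int main(void) {
       double start, finish;
       GET_TIME(start);
       /* ... serial code to be timed ... */
       GET_TIME(finish);
       printf("Elapsed time = %e seconds\n", finish - start);
       return 0;
    }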

16 MPI_Barrier Ensures that no process will return from calling it until every process in the communicator has started calling it. Copyright © 2010, Elsevier Inc. All rights Reserved

17 MPI_Barrier Copyright © 2010, Elsevier Inc. All rights Reserved
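The MPI_Barrier-based timing code on this slide is missing. A minimal sketch of the usual pattern, assuming the quantity of interest is the time taken by the slowest process (collected with MPI_Reduce and MPI_MAX):

    #include <stdio.h>
    #include <mpi.h>

    int main(void) {
       int my_rank;
       double local_start, local_finish, local_elapsed, elapsed;

       MPI_Init(NULL, NULL);
       MPI_Comm_rank(MPI_COMM_WORLD, &my_rank);

       MPI_Barrier(MPI_COMM_WORLD);        /* try to start all processes together */
       local_start = MPI_Wtime();
       /* ... code to be timed ... */
       local_finish = MPI_Wtime();
       local_elapsed = local_finish - local_start;

       /* the parallel elapsed time is the maximum over all processes */
       MPI_Reduce(&local_elapsed, &elapsed, 1, MPI_DOUBLE, MPI_MAX, 0,
                  MPI_COMM_WORLD);
       if (my_rank == 0)
          printf("Elapsed time = %e seconds\n", elapsed);

       MPI_Finalize();
       return 0;
    }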

18 Run-times of serial and parallel matrix-vector multiplication Copyright © 2010, Elsevier Inc. All rights Reserved (times are in seconds)

19 Speedup Copyright © 2010, Elsevier Inc. All rights Reserved
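The formula on this slide did not survive transcription. The standard definition, consistent with Concluding Remarks (3) below, is

    S(n, p) = T_serial(n) / T_parallel(n, p)

where n is the problem size and p is the number of processes; linear speedup corresponds to S(n, p) = p.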

20 Efficiency Copyright © 2010, Elsevier Inc. All rights Reserved
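Likewise, the missing efficiency formula is

    E(n, p) = S(n, p) / p = T_serial(n) / (p * T_parallel(n, p))

so perfect (linear) speedup corresponds to an efficiency of 1.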

21 Speedups of Parallel Matrix-Vector Multiplication Copyright © 2010, Elsevier Inc. All rights Reserved

22 Efficiencies of Parallel Matrix-Vector Multiplication Copyright © 2010, Elsevier Inc. All rights Reserved

23 Scalability A program is scalable if the problem size can be increased at a rate such that the efficiency doesn't decrease as the number of processes increases. Copyright © 2010, Elsevier Inc. All rights Reserved

24 Scalability Programs that can maintain a constant efficiency without increasing the problem size are sometimes said to be strongly scalable. Programs that can maintain a constant efficiency if the problem size increases at the same rate as the number of processes are sometimes said to be weakly scalable. Copyright © 2010, Elsevier Inc. All rights Reserved

25 A PARALLEL SORTING ALGORITHM Copyright © 2010, Elsevier Inc. All rights Reserved

26 Sorting n keys with p = comm_sz processes. n/p keys are assigned to each process. No restrictions on which keys are assigned to which processes. When the algorithm terminates: The keys assigned to each process should be sorted in (say) increasing order. If 0 ≤ q < r < p, then each key assigned to process q should be less than or equal to every key assigned to process r. Copyright © 2010, Elsevier Inc. All rights Reserved

27 Serial bubble sort Copyright © 2010, Elsevier Inc. All rights Reserved
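The bubble sort listing is missing from the transcript. A minimal C sketch of the standard algorithm (details may differ from the book's listing):

    /* Sort a[0..n-1] into increasing order with bubble sort. */
    void Bubble_sort(int a[], int n) {
       for (int list_length = n; list_length >= 2; list_length--)
          for (int i = 0; i < list_length - 1; i++)
             if (a[i] > a[i + 1]) {      /* compare-swap adjacent elements */
                int temp = a[i];
                a[i] = a[i + 1];
                a[i + 1] = temp;
             }
    }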

28 Odd-even transposition sort A sequence of phases. Even phases, compare-swaps: (a[0], a[1]), (a[2], a[3]), (a[4], a[5]), … Odd phases, compare-swaps: (a[1], a[2]), (a[3], a[4]), (a[5], a[6]), … Copyright © 2010, Elsevier Inc. All rights Reserved

29 Example Start: 5, 9, 4, 3 Even phase: compare-swap (5,9) and (4,3) getting the list 5, 9, 3, 4 Odd phase: compare-swap (9,3) getting the list 5, 3, 9, 4 Even phase: compare-swap (5,3) and (9,4) getting the list 3, 5, 4, 9 Odd phase: compare-swap (5,4) getting the list 3, 4, 5, 9 Copyright © 2010, Elsevier Inc. All rights Reserved

30 Serial odd-even transposition sort Copyright © 2010, Elsevier Inc. All rights Reserved
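The serial odd-even transposition sort listing is also missing. A sketch that follows the phase description above (even phases compare-swap pairs starting at position 0, odd phases pairs starting at position 1); n phases are enough to sort n keys:

    /* Sort a[0..n-1] into increasing order with odd-even transposition sort. */
    void Odd_even_sort(int a[], int n) {
       for (int phase = 0; phase < n; phase++) {
          if (phase % 2 == 0) {          /* even phase: (a[0],a[1]), (a[2],a[3]), ... */
             for (int i = 1; i < n; i += 2)
                if (a[i - 1] > a[i]) {
                   int temp = a[i - 1]; a[i - 1] = a[i]; a[i] = temp;
                }
          } else {                       /* odd phase: (a[1],a[2]), (a[3],a[4]), ...  */
             for (int i = 1; i < n - 1; i += 2)
                if (a[i] > a[i + 1]) {
                   int temp = a[i]; a[i] = a[i + 1]; a[i + 1] = temp;
                }
          }
       }
    }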

31 Communications among tasks in odd-even sort Copyright © 2010, Elsevier Inc. All rights Reserved Tasks determining a[i] are labeled with a[i].

32 Parallel odd-even transposition sort Copyright © 2010, Elsevier Inc. All rights Reserved

33 Pseudo-code Copyright © 2010, Elsevier Inc. All rights Reserved

34 Compute_partner Copyright © 2010, Elsevier Inc. All rights Reserved
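The pseudo-code and Compute_partner listings are missing from the transcript. The sketch below reconstructs the partner computation and outlines the phase loop; passing comm_sz as a parameter and using MPI_PROC_NULL to mark an idle process are assumptions of this sketch, not necessarily the book's exact code.

    #include <mpi.h>

    /* Partner of my_rank in the given phase; MPI_PROC_NULL means
       this process sits the phase out (it has no partner). */
    int Compute_partner(int phase, int my_rank, int comm_sz) {
       int partner;
       if (phase % 2 == 0)                       /* even phase */
          partner = (my_rank % 2 != 0) ? my_rank - 1 : my_rank + 1;
       else                                      /* odd phase  */
          partner = (my_rank % 2 != 0) ? my_rank + 1 : my_rank - 1;
       if (partner == -1 || partner == comm_sz)
          partner = MPI_PROC_NULL;
       return partner;
    }

    /* Outline of the phase loop, with the local keys already sorted:
       for (phase = 0; phase < comm_sz; phase++) {
          partner = Compute_partner(phase, my_rank, comm_sz);
          if (partner != MPI_PROC_NULL) {
             exchange local keys with partner;
             if (my_rank < partner) keep the local_n smallest keys;
             else                   keep the local_n largest keys;
          }
       }                                                              */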

35 Safety in MPI programs The MPI standard allows MPI_Send to behave in two different ways: it can simply copy the message into an MPI-managed buffer and return, or it can block until the matching call to MPI_Recv starts. Copyright © 2010, Elsevier Inc. All rights Reserved

36 Safety in MPI programs Many implementations of MPI set a threshold at which the system switches from buffering to blocking. Relatively small messages will be buffered by MPI_Send. Larger messages will cause it to block. Copyright © 2010, Elsevier Inc. All rights Reserved

37 Safety in MPI programs If the MPI_Send executed by each process blocks, no process will be able to start executing a call to MPI_Recv, and the program will hang or deadlock. Each process is blocked waiting for an event that will never happen. Copyright © 2010, Elsevier Inc. All rights Reserved (see pseudo-code)
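As a hypothetical illustration of the hazard (not the book's exact pseudo-code, and with illustrative buffer and communicator names): in the key exchange, every process might send before it receives. If every MPI_Send blocks, no matching MPI_Recv is ever reached and the program deadlocks.

    /* Potentially unsafe exchange: every process sends first.
       If MPI_Send blocks (e.g., for large messages), the MPI_Recv
       calls are never reached and all processes hang. */
    MPI_Send(my_keys,   local_n, MPI_INT, partner, 0, comm);
    MPI_Recv(recv_keys, local_n, MPI_INT, partner, 0, comm, MPI_STATUS_IGNORE);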

38 Safety in MPI programs A program that relies on MPI provided buffering is said to be unsafe. Such a program may run without problems for various sets of input, but it may hang or crash with other sets. Copyright © 2010, Elsevier Inc. All rights Reserved

39 MPI_Ssend An alternative to MPI_Send defined by the MPI standard. The extra “s” stands for synchronous and MPI_Ssend is guaranteed to block until the matching receive starts. Copyright © 2010, Elsevier Inc. All rights Reserved
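Its signature matches MPI_Send (the const qualifier is the MPI-3 form; earlier versions omit it):

    int MPI_Ssend(const void* buf, int count, MPI_Datatype datatype,
                  int dest, int tag, MPI_Comm comm);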

40 Restructuring communication Copyright © 2010, Elsevier Inc. All rights Reserved
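The restructured code itself is missing. One standard restructuring (a sketch using the same illustrative variables as above): pair up the processes so that one side of each pair sends first while the other receives first.

    /* Safe exchange: even ranks send first, odd ranks receive first. */
    if (my_rank % 2 == 0) {
       MPI_Send(my_keys,   local_n, MPI_INT, partner, 0, comm);
       MPI_Recv(recv_keys, local_n, MPI_INT, partner, 0, comm, MPI_STATUS_IGNORE);
    } else {
       MPI_Recv(recv_keys, local_n, MPI_INT, partner, 0, comm, MPI_STATUS_IGNORE);
       MPI_Send(my_keys,   local_n, MPI_INT, partner, 0, comm);
    }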

41 MPI_Sendrecv An alternative to scheduling the communications ourselves. Carries out a blocking send and a receive in a single call. The dest and the source can be the same or different. Especially useful because MPI schedules the communications so that the program won’t hang or crash. Copyright © 2010, Elsevier Inc. All rights Reserved

42 MPI_Sendrecv Copyright © 2010, Elsevier Inc. All rights Reserved
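The prototype on this slide is missing; MPI_Sendrecv's signature is shown below (the const qualifier is the MPI-3 form), followed by a sketch of how the key exchange could use it, again with the same illustrative variables as above.

    int MPI_Sendrecv(const void*  sendbuf,  int sendcount, MPI_Datatype sendtype,
                     int dest,    int sendtag,
                     void*        recvbuf,  int recvcount, MPI_Datatype recvtype,
                     int source,  int recvtag,
                     MPI_Comm comm, MPI_Status* status);

    /* Exchange keys with partner in a single, deadlock-free call. */
    MPI_Sendrecv(my_keys,   local_n, MPI_INT, partner, 0,
                 recv_keys, local_n, MPI_INT, partner, 0,
                 comm, MPI_STATUS_IGNORE);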

43 Safe communication with five processes Copyright © 2010, Elsevier Inc. All rights Reserved

44 Parallel odd-even transposition sort Copyright © 2010, Elsevier Inc. All rights Reserved
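The final listing is missing from the transcript. The step it depends on, merging the sorted local keys with the sorted keys received from the partner and keeping only the smaller half, might look like the sketch below; the name Merge_low and the temp_keys scratch array are illustrative (a symmetric Merge_high would keep the larger half).

    #include <string.h>

    /* Keep the local_n smallest of the 2*local_n sorted keys held in
       local_keys and recv_keys; temp_keys is scratch space of size local_n. */
    void Merge_low(int local_keys[], int recv_keys[], int temp_keys[],
                   int local_n) {
       int l_i = 0, r_i = 0, t_i = 0;
       while (t_i < local_n) {
          if (local_keys[l_i] <= recv_keys[r_i])
             temp_keys[t_i++] = local_keys[l_i++];
          else
             temp_keys[t_i++] = recv_keys[r_i++];
       }
       memcpy(local_keys, temp_keys, local_n * sizeof(int));
    }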

45 Run-times of parallel odd-even sort Copyright © 2010, Elsevier Inc. All rights Reserved (times are in milliseconds)

46 Concluding Remarks (1) MPI, or the Message-Passing Interface, is a library of functions that can be called from C, C++, or Fortran programs. A communicator is a collection of processes that can send messages to each other. Many parallel programs use the single program, multiple data (SPMD) approach. Copyright © 2010, Elsevier Inc. All rights Reserved

47 Concluding Remarks (2) Most serial programs are deterministic: if we run the same program with the same input we’ll get the same output. Parallel programs often don’t possess this property. Collective communications involve all the processes in a communicator. Copyright © 2010, Elsevier Inc. All rights Reserved

48 Concluding Remarks (3) When we time parallel programs, we’re usually interested in elapsed time or “wall clock time”. Speedup is the ratio of the serial run-time to the parallel run-time. Efficiency is the speedup divided by the number of parallel processes. Copyright © 2010, Elsevier Inc. All rights Reserved

49 Concluding Remarks (4) If it’s possible to increase the problem size (n) so that the efficiency doesn’t decrease as p is increased, a parallel program is said to be scalable. An MPI program is unsafe if its correct behavior depends on the fact that MPI_Send is buffering its input. Copyright © 2010, Elsevier Inc. All rights Reserved