Pipelined Computations

Slides: Parallel Programming Techniques & Applications Using Networked Workstations & Parallel Computers, 2nd ed., by B. Wilkinson & M. Allen.

Chapter 5: Pipelined Computations

Pipelined Computations

Problem divided into a series of tasks that have to be completed one after the other (the basis of sequential programming). Each task executed by a separate process or processor.

Example

Add all the elements of array a to an accumulating sum:

    for (i = 0; i < n; i++)
        sum = sum + a[i];

The loop could be “unfolded” to yield:

    sum = sum + a[0];
    sum = sum + a[1];
    sum = sum + a[2];
    sum = sum + a[3];
    sum = sum + a[4];
    ...

Pipeline for an unfolded loop (figure)
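Each stage of the unfolded-loop pipeline performs one addition: it takes the partial sum s_in from the previous stage, adds its own element a[i], and passes the result s_out on. A minimal sketch of one stage in C (the name stage and its parameters are illustrative, not from the original slides):

    /* One pipeline stage for the unfolded summation loop:
       adds this stage's array element to the incoming partial sum. */
    int stage(int s_in, int a_i) {
        int s_out = s_in + a_i;   /* the single addition done by this stage */
        return s_out;             /* forwarded to the next stage */
    }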

Another Example

Frequency filter: the objective is to remove specific frequencies (f0, f1, f2, f3, etc.) from a digitized signal, f(t). The signal enters the pipeline from the left (figure).

Where pipelining can be used to good effect

Assuming the problem can be divided into a series of sequential tasks, a pipelined approach can provide increased execution speed under the following three types of computations:

1. If more than one instance of the complete problem is to be executed.

2. If a series of data items must be processed, each requiring multiple operations.

3. If information to start the next process can be passed forward before the process has completed all its internal operations.

“Type 1” Pipeline Space-Time Diagram (figure)

Alternative space-time diagram (figure)

“Type 2” Pipeline Space-Time Diagram (figure)

“Type 3” Pipeline Space-Time Diagram (figure)

Pipeline processing where information passes to the next stage before the previous stage has completed.

If the number of stages is larger than the number of processors in any pipeline, a group of stages can be assigned to each processor:

Computing Platform for Pipelined Applications

Multiprocessor system with a line configuration. Strictly speaking, a pipeline may not be the best structure for a cluster; however, a cluster with switched direct connections, as most have, can support simultaneous message passing.

Example Pipelined Solutions

(Examples of each type of computation)

Pipeline Program Examples: Adding Numbers

Type 1 pipeline computation.

Basic code for process Pi:

    recv(&accumulation, Pi-1);
    accumulation = accumulation + number;
    send(&accumulation, Pi+1);

except for the first process, P0, which is

    send(&number, P1);

and the last process, Pn-1, which is

    recv(&number, Pn-2);
    accumulation = accumulation + number;

SPMD program

    if (process > 0) {
        recv(&accumulation, Pi-1);
        accumulation = accumulation + number;
    }
    if (process < n-1)
        send(&accumulation, Pi+1);

The final result is in the last process. Instead of addition, other arithmetic operations could be done.
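A minimal MPI rendering of this SPMD pipeline, assuming each process already holds its own value in number; the function name pipeline_add is illustrative, not from the original slides:

    #include <mpi.h>

    /* Type 1 pipelined addition: each process adds its own value to the
       partial sum received from the left and passes the result right.
       The final sum ends up in the last process. */
    int pipeline_add(int number, MPI_Comm comm) {
        int rank, nprocs, accumulation = 0;
        MPI_Comm_rank(comm, &rank);
        MPI_Comm_size(comm, &nprocs);

        if (rank > 0)               /* all but P0 receive from the left */
            MPI_Recv(&accumulation, 1, MPI_INT, rank - 1, 0, comm,
                     MPI_STATUS_IGNORE);
        accumulation += number;     /* add this process's own value */
        if (rank < nprocs - 1)      /* all but the last send to the right */
            MPI_Send(&accumulation, 1, MPI_INT, rank + 1, 0, comm);
        return accumulation;        /* meaningful only on the last process */
    }

Initializing accumulation to zero and letting every process, including P0, add its own number avoids the special-case code for the first process.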

Pipelined addition of numbers: master process and ring configuration (figure)

Sorting Numbers

A parallel version of insertion sort.

Pipeline for sorting using insertion sort (figure)

Type 2 pipeline computation.

The basic algorithm for process Pi is

    recv(&number, Pi-1);
    if (number > x) {
        send(&x, Pi+1);
        x = number;
    } else
        send(&number, Pi+1);

With n numbers, the number of values the ith process accepts is n - i, and the number it passes onward is n - i - 1. Hence, a simple loop could be used, as in the sketch below.
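One way to write that loop for an interior stage in MPI (a sketch, assuming pipeline position i corresponds to MPI rank i and the first stage receives from a master process; sort_stage is an illustrative name):

    #include <mpi.h>

    /* One stage of the pipelined insertion sort: keep the largest number
       seen so far in x and pass every smaller one to the right.
       Stage i receives n - i numbers and forwards n - i - 1 of them. */
    int sort_stage(int i, int n, MPI_Comm comm) {
        int x, number, j;
        MPI_Recv(&x, 1, MPI_INT, i - 1, 0, comm, MPI_STATUS_IGNORE);
        for (j = 0; j < n - i - 1; j++) {
            MPI_Recv(&number, 1, MPI_INT, i - 1, 0, comm, MPI_STATUS_IGNORE);
            if (number > x) {   /* keep the larger value, forward the smaller */
                MPI_Send(&x, 1, MPI_INT, i + 1, 0, comm);
                x = number;
            } else {
                MPI_Send(&number, 1, MPI_INT, i + 1, 0, comm);
            }
        }
        return x;   /* after the loop, x is this stage's sorted value */
    }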

Insertion sort with results returned to the master process using a bidirectional line configuration (figure)

Insertion sort with results returned (figure)

Prime Number Generation: Sieve of Eratosthenes

A series of all integers is generated from 2. The first number, 2, is prime and is kept. All multiples of this number are deleted, as they cannot be prime. The process is repeated with each remaining number. The algorithm removes non-primes, leaving only primes.

Type 2 pipeline computation.

The code for a process, Pi, could be based upon

    recv(&x, Pi-1);
    /* repeat following for each number */
    recv(&number, Pi-1);
    if ((number % x) != 0)
        send(&number, Pi+1);

Each process will not receive the same quantity of numbers, and the quantity is not known beforehand. Use a “terminator” message, which is sent at the end of the sequence:

    recv(&x, Pi-1);
    for (i = 0; i < n; i++) {
        recv(&number, Pi-1);
        if (number == terminator)
            break;
        if ((number % x) != 0)
            send(&number, Pi+1);
    }
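In MPI, an interior sieve stage might look like the sketch below; the TERMINATOR value and the name sieve_stage are assumptions for illustration (the first stage would instead be fed by a generator process, and the last stage would omit the forwarding sends):

    #include <mpi.h>

    #define TERMINATOR -1   /* assumed end-of-sequence marker */

    /* One interior stage of the pipelined sieve: the first number received
       becomes this stage's prime x; every later number that is not a
       multiple of x is forwarded to the next stage. */
    int sieve_stage(int rank, MPI_Comm comm) {
        int x, number;
        MPI_Recv(&x, 1, MPI_INT, rank - 1, 0, comm, MPI_STATUS_IGNORE);
        for (;;) {
            MPI_Recv(&number, 1, MPI_INT, rank - 1, 0, comm, MPI_STATUS_IGNORE);
            if (number == TERMINATOR) {      /* pass the terminator on and stop */
                MPI_Send(&number, 1, MPI_INT, rank + 1, 0, comm);
                break;
            }
            if (number % x != 0)             /* survivor: may still be prime */
                MPI_Send(&number, 1, MPI_INT, rank + 1, 0, comm);
        }
        return x;   /* the prime kept by this stage */
    }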

Solving a System of Linear Equations

Upper-triangular form, where the a’s and b’s are constants and the x’s are the unknowns to be found:

    a_{n-1,0} x_0 + a_{n-1,1} x_1 + a_{n-1,2} x_2 + ... + a_{n-1,n-1} x_{n-1} = b_{n-1}
    ...
    a_{2,0} x_0 + a_{2,1} x_1 + a_{2,2} x_2 = b_2
    a_{1,0} x_0 + a_{1,1} x_1 = b_1
    a_{0,0} x_0 = b_0

Back Substitution

First, the unknown x_0 is found from the last equation:

    x_0 = b_0 / a_{0,0}

The value obtained for x_0 is substituted into the next equation to obtain x_1:

    x_1 = (b_1 - a_{1,0} x_0) / a_{1,1}

The values obtained for x_1 and x_0 are substituted into the next equation to obtain x_2:

    x_2 = (b_2 - a_{2,0} x_0 - a_{2,1} x_1) / a_{2,2}

and so on until all the unknowns are found.

Pipeline Solution

The first pipeline stage computes x_0 and passes x_0 onto the second stage, which computes x_1 from x_0 and passes both x_0 and x_1 onto the next stage, which computes x_2 from x_0 and x_1, and so on.

Type 3 pipeline computation.

The ith process (0 < i < n) receives the values x_0, x_1, x_2, …, x_{i-1} and computes x_i from the equation:

    x_i = (b_i - sum_{j=0}^{i-1} a_{i,j} x_j) / a_{i,i}

Sequential Code

Given constants a_{i,j} and b_k stored in arrays a[][] and b[], respectively, and values for the unknowns to be stored in array x[], sequential code could be

    x[0] = b[0]/a[0][0];          /* computed separately */
    for (i = 1; i < n; i++) {     /* for remaining unknowns */
        sum = 0;
        for (j = 0; j < i; j++)
            sum = sum + a[i][j]*x[j];
        x[i] = (b[i] - sum)/a[i][i];
    }
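A minimal runnable version of this code with an illustrative 3-equation system (the numeric values are made up for the example):

    #include <stdio.h>

    #define N 3

    int main(void) {
        /* Illustrative system in the slides' "upper-triangular" form:
           2 x0                = 4
           1 x0 + 3 x1         = 11
           2 x0 + 1 x1 + 4 x2  = 19  */
        double a[N][N] = {{2, 0, 0}, {1, 3, 0}, {2, 1, 4}};
        double b[N] = {4, 11, 19};
        double x[N], sum;
        int i, j;

        x[0] = b[0] / a[0][0];              /* computed separately */
        for (i = 1; i < N; i++) {           /* remaining unknowns */
            sum = 0;
            for (j = 0; j < i; j++)
                sum = sum + a[i][j] * x[j];
            x[i] = (b[i] - sum) / a[i][i];
        }
        for (i = 0; i < N; i++)
            printf("x%d = %g\n", i, x[i]);  /* prints x0 = 2, x1 = 3, x2 = 3 */
        return 0;
    }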

Parallel Code

Pseudocode of process Pi (0 < i < n) could be

    for (j = 0; j < i; j++) {
        recv(&x[j], Pi-1);
        send(&x[j], Pi+1);
    }
    sum = 0;
    for (j = 0; j < i; j++)
        sum = sum + a[i][j]*x[j];
    x[i] = (b[i] - sum)/a[i][i];
    send(&x[i], Pi+1);

Processes now have additional computations to do after receiving and resending values.
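An MPI sketch of this Type 3 stage, assuming row i of a and the element b[i] are already local to process i (the name back_sub_stage is illustrative); here the partial sum is accumulated as each x[j] arrives, rather than in a separate loop as in the pseudocode above:

    #include <mpi.h>

    /* Process i of the back-substitution pipeline: receive x[0..i-1],
       forwarding each value immediately, then compute and forward x[i]. */
    double back_sub_stage(int i, int n, const double a_row[], double b_i,
                          double x[], MPI_Comm comm) {
        double sum = 0.0;
        for (int j = 0; j < i; j++) {
            MPI_Recv(&x[j], 1, MPI_DOUBLE, i - 1, 0, comm, MPI_STATUS_IGNORE);
            if (i < n - 1)            /* pass x[j] straight on before using it */
                MPI_Send(&x[j], 1, MPI_DOUBLE, i + 1, 0, comm);
            sum += a_row[j] * x[j];   /* overlap computation with forwarding */
        }
        x[i] = (b_i - sum) / a_row[i];
        if (i < n - 1)
            MPI_Send(&x[i], 1, MPI_DOUBLE, i + 1, 0, comm);
        return x[i];
    }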

Pipeline processing using back substitution (figure)