COMPE575 Parallel & Cluster Computing, Chapter 5: Pipelined Computations

5.2 Pipelined Computations
Problem divided into a series of tasks that have to be completed one after the other (the basis of sequential programming). Each task executed by a separate process or processor.

5.3 Example
Add all the elements of array a to an accumulating sum:

    for (i = 0; i < n; i++)
        sum = sum + a[i];

The loop could be "unfolded" to yield:

    sum = sum + a[0];
    sum = sum + a[1];
    sum = sum + a[2];
    sum = sum + a[3];
    sum = sum + a[4];

5.4 Pipeline for an unfolded loop

5.5 Another Example
Frequency filter - the objective is to remove specific frequencies (f0, f1, f2, f3, etc.) from a digitized signal f(t). The signal enters the pipeline from the left.

5.6 Where pipelining can be used to good effect
Assuming the problem can be divided into a series of sequential tasks, a pipelined approach can provide increased execution speed under the following three types of computation:
1. If more than one instance of the complete problem is to be executed
2. If a series of data items must be processed, each requiring multiple operations
3. If information to start the next process can be passed forward before the process has completed all its internal operations

5.7 "Type 1" Pipeline Space-Time Diagram

5.8 Alternative space-time diagram

5.9 "Type 2" Pipeline Space-Time Diagram

5.10 "Type 3" Pipeline Space-Time Diagram
Pipeline processing where information passes to the next stage before the previous stage has completed.

5.11 If the number of stages is larger than the number of processors in any pipeline, a group of stages can be assigned to each processor:

5.12 Computing Platform for Pipelined Applications
Multiprocessor system with a line configuration. Strictly speaking, a pipeline may not be the best structure for a cluster; however, a cluster with switched direct connections, as most have, can support simultaneous message passing.

5.13 Example Pipelined Solutions (examples of each type of computation)

5.14 Pipeline Program Examples: Adding Numbers
Type 1 pipeline computation

5.15 Basic code for process Pi:

    recv(&accumulation, Pi-1);
    accumulation = accumulation + number;
    send(&accumulation, Pi+1);

except for the first process, P0, which is

    send(&number, P1);

and the last process, Pn-1, which is

    recv(&accumulation, Pn-2);
    accumulation = accumulation + number;

5.16 SPMD program:

    if (process > 0) {
        recv(&accumulation, Pi-1);
        accumulation = accumulation + number;
    }
    if (process < n-1)
        send(&accumulation, Pi+1);

The final result is in the last process. Instead of addition, other arithmetic operations could be done.
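
The generic send()/recv() pseudocode maps naturally onto MPI. Below is a minimal, hedged sketch of the SPMD pipelined addition as a complete MPI program; the tag value, the use of MPI_COMM_WORLD, initializing accumulation to zero, and each process contributing the value rank + 1 are illustrative assumptions, not part of the slides.

    /* Minimal MPI sketch of pipelined addition (assumptions noted above). */
    #include <stdio.h>
    #include <mpi.h>

    int main(int argc, char *argv[])
    {
        int rank, nprocs;
        double number, accumulation = 0.0;   /* assumed initial value */

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

        number = rank + 1.0;                 /* illustrative local value */

        if (rank > 0)                        /* receive running sum from left */
            MPI_Recv(&accumulation, 1, MPI_DOUBLE, rank - 1, 0,
                     MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        accumulation = accumulation + number;
        if (rank < nprocs - 1)               /* pass running sum to the right */
            MPI_Send(&accumulation, 1, MPI_DOUBLE, rank + 1, 0,
                     MPI_COMM_WORLD);
        else
            printf("Final sum = %f\n", accumulation);

        MPI_Finalize();
        return 0;
    }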

5.17 Pipelined addition of numbers
Master process and ring configuration

5.18 Sorting Numbers
A parallel version of insertion sort.

5.19 Pipeline for sorting using insertion sort
Type 2 pipeline computation

5.20 The basic algorithm for process Pi is

    recv(&number, Pi-1);
    if (number > x) {
        send(&x, Pi+1);
        x = number;
    } else
        send(&number, Pi+1);

With n numbers, the number of values the ith process is to accept = n - i. The number of passes onward = n - i - 1. Hence, a simple loop could be used, as sketched below.
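
One way that loop might look, written with MPI calls in place of the generic send()/recv(); the function name, tag 0, MPI_COMM_WORLD, and the restriction to an interior process rank (0 < rank < n) are assumptions for illustration.

    /* Hypothetical stage function: keeps the largest of the n - rank
       values received and forwards the remaining n - rank - 1. */
    #include <mpi.h>

    void pipeline_insertion_stage(int rank, int n)
    {
        int j, number, x;

        MPI_Recv(&x, 1, MPI_INT, rank - 1, 0, MPI_COMM_WORLD,
                 MPI_STATUS_IGNORE);          /* first value is held */
        for (j = 0; j < n - rank - 1; j++) {
            MPI_Recv(&number, 1, MPI_INT, rank - 1, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
            if (number > x) {                 /* keep the larger value, */
                MPI_Send(&x, 1, MPI_INT, rank + 1, 0, MPI_COMM_WORLD);
                x = number;                   /* pass the smaller on    */
            } else {
                MPI_Send(&number, 1, MPI_INT, rank + 1, 0, MPI_COMM_WORLD);
            }
        }
    }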

5.21 Insertion sort with results returned to master process using bidirectional line configuration

5.22 Insertion sort with results returned

5.23 Prime Number Generation: Sieve of Eratosthenes
Series of all integers generated from 2. First number, 2, is prime and kept. All multiples of this number deleted as they cannot be prime. Process repeated with each remaining number. The algorithm removes non-primes, leaving only primes.
Type 2 pipeline computation
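
For reference, a minimal sequential sieve in C (not from the slides; the limit of 50 is an arbitrary choice) shows the deletion of multiples that the pipeline distributes across processes:

    /* Sequential Sieve of Eratosthenes up to LIMIT (illustrative). */
    #include <stdio.h>
    #include <string.h>

    #define LIMIT 50

    int main(void)
    {
        char composite[LIMIT + 1];
        int i, j;

        memset(composite, 0, sizeof(composite));
        for (i = 2; i * i <= LIMIT; i++)      /* first kept number is 2 */
            if (!composite[i])
                for (j = i * i; j <= LIMIT; j += i)
                    composite[j] = 1;         /* delete multiples of i */

        for (i = 2; i <= LIMIT; i++)
            if (!composite[i])
                printf("%d ", i);             /* survivors are prime */
        printf("\n");
        return 0;
    }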

5.24 The code for a process, Pi, could be based upon

    recv(&x, Pi-1);
    /* repeat following for each number */
    recv(&number, Pi-1);
    if ((number % x) != 0)
        send(&number, Pi+1);

Each process will not receive the same quantity of numbers, and the quantity is not known beforehand. Use a "terminator" message, which is sent at the end of the sequence:

    recv(&x, Pi-1);
    for (i = 0; i < n; i++) {
        recv(&number, Pi-1);
        if (number == terminator) break;
        if ((number % x) != 0)
            send(&number, Pi+1);
    }

5.25 Solving a System of Linear Equations
Upper-triangular form, where the a's and b's are constants and the x's are unknowns to be found:

    a[n-1][0]*x[0] + a[n-1][1]*x[1] + ... + a[n-1][n-1]*x[n-1] = b[n-1]
    ...
    a[2][0]*x[0] + a[2][1]*x[1] + a[2][2]*x[2] = b[2]
    a[1][0]*x[0] + a[1][1]*x[1] = b[1]
    a[0][0]*x[0] = b[0]

5.26 Back Substitution
First, unknown x0 is found from the last equation; i.e.,

    x[0] = b[0] / a[0][0]

The value obtained for x0 is substituted into the next equation to obtain x1; i.e.,

    x[1] = (b[1] - a[1][0]*x[0]) / a[1][1]

The values obtained for x1 and x0 are substituted into the next equation to obtain x2:

    x[2] = (b[2] - a[2][0]*x[0] - a[2][1]*x[1]) / a[2][2]

and so on until all the unknowns are found.
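
As an illustrative example (not from the slides), take the 3-equation system in the above form:

    2*x[0] = 4
    x[0] + 2*x[1] = 8
    x[0] + x[1] + 2*x[2] = 15

Back substitution gives x[0] = 4/2 = 2, then x[1] = (8 - 2)/2 = 3, then x[2] = (15 - 2 - 3)/2 = 5.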

5.27 Pipeline Solution
The first pipeline stage computes x0 and passes x0 onto the second stage, which computes x1 from x0 and passes both x0 and x1 onto the next stage, which computes x2 from x0 and x1, and so on.
Type 3 pipeline computation

5.28 The ith process (0 < i < n) receives the values x0, x1, x2, ..., xi-1 and computes xi from the equation:

    x[i] = (b[i] - (a[i][0]*x[0] + ... + a[i][i-1]*x[i-1])) / a[i][i]

5.29 Sequential Code
Given constants a[i][j] and b[k] stored in arrays a[][] and b[], respectively, and values for unknowns to be stored in array x[], sequential code could be

    x[0] = b[0]/a[0][0];          /* computed separately */
    for (i = 1; i < n; i++) {     /* for remaining unknowns */
        sum = 0;
        for (j = 0; j < i; j++)
            sum = sum + a[i][j]*x[j];
        x[i] = (b[i] - sum)/a[i][i];
    }

5.30 Parallel Code
Pseudocode of process Pi (0 < i < n) could be

    for (j = 0; j < i; j++) {
        recv(&x[j], Pi-1);
        send(&x[j], Pi+1);
    }
    sum = 0;
    for (j = 0; j < i; j++)
        sum = sum + a[i][j]*x[j];
    x[i] = (b[i] - sum)/a[i][i];
    send(&x[i], Pi+1);

Now there are additional computations to do after receiving and resending values.
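
For comparison, a hedged MPI rendering of the same stage; the function name, the system size N, tag 0, MPI_COMM_WORLD, and the row a[i][*] and b[i] already being local are assumptions.

    /* Hypothetical pipeline stage i (0 < i < n-1): forwards each x[j]
       as soon as it arrives, then computes its own unknown x[i]. */
    #include <mpi.h>

    #define N 8                               /* illustrative system size */

    void back_substitution_stage(int i, double a[][N], double b[],
                                 double x[])
    {
        int j;
        double sum = 0.0;

        for (j = 0; j < i; j++) {
            MPI_Recv(&x[j], 1, MPI_DOUBLE, i - 1, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
            MPI_Send(&x[j], 1, MPI_DOUBLE, i + 1, 0, MPI_COMM_WORLD);
        }
        for (j = 0; j < i; j++)               /* compute after resending */
            sum = sum + a[i][j] * x[j];
        x[i] = (b[i] - sum) / a[i][i];
        MPI_Send(&x[i], 1, MPI_DOUBLE, i + 1, 0, MPI_COMM_WORLD);
    }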

5.31 Pipeline processing using back substitution