Pipeline Pattern ITCS 4/5145 Parallel Computing, UNC-Charlotte, B. Wilkinson, 2012 slides5.ppt Oct 24, 2013.


Pipeline pattern Computation is divided into a series of tasks that have to be performed one after the other, with the result of one task passed on to the next task. This is the basis of sequential programming. (Diagram: Task 1 → Task 2 → Task 3 → Result.) The pipeline pattern is rather specialized and applicable in certain problems and algorithms for good efficiency and speedup.

Effective Pipeline pattern To use as an effective parallel pattern, each task is executed by a separate process/processor, and the processes/processors have to operate at the same time to get increased speed. (Diagram: master process with slave processes forming the stages Task 1 → Task 2 → Task 3 → Result.) One approach is that seen in an automobile assembly line, where a series of cars is constructed; in this case, a series of identical computations but with different data.

Seeds Pipeline Pattern See "Pipeline Template Tutorial" by Jeremy Villalobos. http://coit-grid01.uncc.edu/seeds/docs/pipeline_tutorial.pdf Interface:

public abstract class PipeLine extends BasicLayerInterface {
    public abstract Data Compute(int stage, Data input);
    public abstract Data DiffuseData(int segment);
    public abstract void GatherData(int segment, Data dat);
    public abstract int getDataCount();
    public abstract int getStageCount();
}

Pipeline applications Broadly speaking, we can divide pipeline applications into two types: 1. More than one instance of the complete problem is executed. Partial results pass from one pipeline stage to the next. Example: an automobile assembly line building cars. 2. A single problem instance is executed that has a series of data items to be processed sequentially, each requiring multiple operations. Numbers pass from one stage to the next. A pipeline application may exhibit features of both types.

Pipeline Space-Time Diagram 1. More than one instance of the complete problem is to be executed. (Diagram: p stages processing m instances of the problem.) Execution time = p + m – 1 steps.

Speedup factor: mp/(p + m – 1), which approaches p as m → ∞. So if we provide a constant stream of instances of the problem, we get maximum speedup.
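Written out as a derivation (assuming p equal stages, each taking one time step, so one instance takes p sequential steps):

```latex
t_{\text{sequential}} = m\,p, \qquad
t_{\text{pipeline}} = p + m - 1, \qquad
S(p) = \frac{m\,p}{p + m - 1} \;\longrightarrow\; p
\quad \text{as } m \to \infty .
```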

Pipeline Space-Time Diagram 2. Single problem with a series of data items to be processed. (Diagram: p stages processing n data items.) Execution time = p + n – 1 steps.

Speedup factor: np/(p + n – 1), which approaches p as n → ∞. So if we provide a constant stream of data items, we get maximum speedup. Processors can use this form of pipeline for streams of data (video, etc.)

Example Pipelined Solutions

Matrix-Vector Multiplication, c = A × b. A is a matrix; b and c are vectors. Matrix-vector multiplication follows directly from the definition of matrix-matrix multiplication by making b an n × 1 matrix (vector). The result is an n × 1 matrix (vector).

Matrix-Vector Multiplication (Diagram: elements B[0]–B[3] of b enter and, with delays, pass through pipeline stages P0–P3; each stage holds the elements a of one row of A, connected by sin/sout links, and accumulates one result element C[0]–C[3]. C elements accumulate in pipeline stages; B array elements pass through.)
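As a concrete sequential sketch of this dataflow (a hypothetical simulation for illustration, not the slides' actual message-passing code), stage i can be modeled as holding row i of A and accumulating C[i] as the elements of B stream past, with B[j] reaching stage i at time step i + j:

```c
#define N 4

/* Simulate the matrix-vector pipeline: stage i holds row i of A and
   accumulates c[i].  Element b[j] reaches stage i at time step i + j,
   so the whole computation takes 2N - 1 steps. */
void pipeline_matvec(double a[N][N], double b[N], double c[N])
{
    for (int i = 0; i < N; i++)
        c[i] = 0.0;
    for (int t = 0; t < 2 * N - 1; t++) {   /* time steps */
        for (int i = 0; i < N; i++) {       /* stages acting "in parallel" */
            int j = t - i;                  /* b element now at stage i */
            if (j >= 0 && j < N)
                c[i] += a[i][j] * b[j];     /* C accumulates in the stage */
        }
    }
}
```

In a real message-passing implementation each stage would be a separate process and the inner loop would execute concurrently across processors.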

Might be easier to see drawn this way:

Space-Time Diagram (Diagram: elements B[0]–B[3] enter the pipeline over successive time steps; process Pi applies A[i][0]–A[i][3] to them as they arrive, so the stages' work overlaps in time.)

Matrix Multiplication Two-dimensional pipeline

Another Example Add together all the elements of each row of an array (row i of A[i][j] summed over the column index j to give B[i]):

for (i = 0; i < N; i++) {
    B[i] = 0;
    for (j = 0; j < M; j++) {
        B[i] += A[i][j];
    }
}

Pipeline (Diagram: stages P0–P4; P0 initializes B[i] = 0 (0 ≤ i < N), and each stage holds one element a of row i and computes sout = sin + a; the final results B[i] emerge from the last stage.) Note the value of i is needed in each stage. The series of values B[0], B[1], B[2], …, with partial accumulations of the rows of A, passes through the pipeline. While Pj is operating on B[i], Pj-1 is operating on B[i+1].

Unfolding loops We are essentially unfolding the inner for loop, and then each statement is done by a separate pipeline stage:

for (i = 0; i < N; i++) {
    B[i] = 0;
    for (j = 0; j < M; j++) {
        B[i] += A[i][j];
    }
}

Unfolded inner loop:

B[i] += A[i][0];
B[i] += A[i][1];
B[i] += A[i][2];
B[i] += A[i][3];
…
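The overlap described above (Pj working on B[i] while Pj-1 works on B[i+1]) can be sketched as a time-stepped sequential simulation, where each stage performs sout = sin + a. This is an illustrative sketch with made-up dimensions, not code from the slides:

```c
#define NROWS 3
#define NCOLS 4

/* Simulate the row-sum pipeline: stage j performs sout = sin + A[i][j].
   Row i enters stage j at time step i + j, so while stage j works on
   row i, stage j-1 works on row i+1. */
void pipeline_rowsums(double a[NROWS][NCOLS], double b[NROWS])
{
    for (int i = 0; i < NROWS; i++)
        b[i] = 0.0;                        /* B[i] initialized to 0 */
    for (int t = 0; t < NROWS + NCOLS - 1; t++) {
        for (int j = 0; j < NCOLS; j++) {  /* all stages step together */
            int i = t - j;                 /* row at stage j this step */
            if (i >= 0 && i < NROWS)
                b[i] += a[i][j];           /* sout = sin + a */
        }
    }
}
```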

Questions/discussion How does the previous pipeline compare to an automobile assembly line? Which of the following types is this pipeline: 1. More than one instance of the complete problem is executed. 2. A single problem instance is executed that has a series of data items to be processed sequentially, each requiring multiple operations, with numbers passing from one stage to the next.

Frequency filter - Objective: to remove specific frequencies (f0, f1, f2, f3, etc.) from a digitized signal f(t). The signal enters the pipeline from the left, and each stage filters out one frequency.

Sorting numbers with a pipeline pattern: a parallel version of insertion sort. Insertion sort: each number is inserted into the position such that all numbers before it are smaller, by comparing with all the numbers before it. (Diagram: the numbers 4, 3, 1, 2, 5 are sorted over ten time steps; at each step the next number is inserted into its place, ending with the sorted sequence 1, 2, 3, 4, 5.)

Pipeline for sorting using insertion sort The basic algorithm for process Pi is:

Receive x from Pi-1
if (x > stored_number) {
    send stored_number to Pi+1;
    stored_number = x;
} else send x to Pi+1;

Each process stores the largest number received so far and passes smaller numbers onward. (Diagram: the series of numbers to sort, xn-1, …, x1, x0, enters P0; larger numbers are retained at each compare stage, so the smallest number ends up in the last process, preceded by the next smallest, and so on.)
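A sequential sketch of this pipeline (hypothetical names; illustrative, not the slides' code): each stage stores the largest value it has seen and passes smaller values on, so after all n numbers have been fed in, the stages hold the numbers in descending order.

```c
/* Simulate the insertion-sort pipeline for n numbers and n stages.
   out[s] plays the role of the stored_number in stage s; after all
   numbers are fed in, out[] holds them largest first. */
void pipeline_sort(const int in[], int out[], int n)
{
    int count = 0;                 /* stages currently holding a value */
    for (int k = 0; k < n; k++) {
        int x = in[k];             /* number entering the pipeline */
        for (int s = 0; s < count; s++) {
            if (x > out[s]) {      /* stage keeps the larger value */
                int smaller = out[s];
                out[s] = x;
                x = smaller;       /* smaller value passed onward */
            }
        }
        out[count++] = x;          /* first value reaching a new stage */
    }
}
```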

Prime Number Generation: Sieve of Eratosthenes A series of all integers is generated from 2. The first number, 2, is prime and kept. All multiples of this number are deleted, as they cannot be prime. The process is repeated with each remaining number. The algorithm removes non-primes, leaving only primes.

The code for a process, Pi, could be based upon:

Receive x from Pi-1
/* repeat following for each number */
Receive number from Pi-1
if ((number % x) != 0) send number to Pi+1

Each process will not receive the same quantity of numbers, and the exact quantity is not known beforehand. Use a "terminator" message, which is sent at the end of the sequence:

Receive x from Pi-1
for (i = 0; i < n; i++) {
    Receive number from Pi-1
    if (number == terminator) break;
    if ((number % x) != 0) send number to Pi+1
}
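A sequential sketch of the pipelined sieve (illustrative, not the slides' message-passing code): stage s stores the s-th prime and forwards only numbers it does not divide; a number that passes every existing stage becomes the stored prime of a new stage.

```c
/* Simulate the sieve pipeline: primes[] plays the role of the stored
   x in each stage.  Returns the number of primes found up to limit. */
int pipeline_sieve(int limit, int primes[], int max_primes)
{
    int count = 0;
    for (int number = 2; number <= limit; number++) {
        int passed = 1;
        for (int s = 0; s < count; s++) {
            if (number % primes[s] == 0) {  /* multiple: deleted */
                passed = 0;
                break;
            }
        }
        if (passed && count < max_primes)
            primes[count++] = number;       /* new stage stores prime */
    }
    return count;
}
```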

Other situations where a pipeline can be used effectively: if the information needed to start the next process can be passed forward before all of the current process's internal operations have been completed.

Example: Solving an Upper-Triangular System of Linear Equations using Back Substitution (created after the Gaussian elimination step for a general system of linear equations). The a's and b's are constants, and the x's are the unknowns to be found.

Back Substitution First, the unknown x0 is found from the last equation; i.e., x0 = b0/a0,0. The value obtained for x0 is substituted into the next equation to obtain x1; i.e., x1 = (b1 – a1,0x0)/a1,1. The values obtained for x1 and x0 are substituted into the next equation to obtain x2: x2 = (b2 – a2,0x0 – a2,1x1)/a2,2, and so on until all the unknowns are found.

Pipeline Solution First pipeline stage computes x0 and passes x0 onto the second stage, which computes x1 from x0 and passes both x0 and x1 onto the next stage, which computes x2 from x0 and x1, and so on.
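The computation each stage performs can be sketched as follows (a sequential stand-in for illustration; in the pipeline, stage i would receive x0, …, xi-1 from stage i-1 rather than read them from an array). The system is the triangular form above, written so that the equation for row i involves only x0 through xi:

```c
#define NEQ 3

/* Back substitution for the triangular system a[i][j] = 0 for j > i:
   stage i computes x[i] from the previously found x[0] .. x[i-1]. */
void back_substitute(double a[NEQ][NEQ], double b[NEQ], double x[NEQ])
{
    for (int i = 0; i < NEQ; i++) {
        double sum = b[i];
        for (int j = 0; j < i; j++)
            sum -= a[i][j] * x[j];   /* values passed from earlier stages */
        x[i] = sum / a[i][i];
    }
}
```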

Pipeline processing using back substitution

More complex pipelines Non-linear pipelines are those that have features such as: multiple paths, multifunction stages, multiple outputs, and feedback and feedforward paths. These types of pipelines are found in logic design courses.

Example pipeline with two paths This is an application for a pattern operator

Structures that allow two-way movement between stages are not truly pipelines. We will look into these structures separately.

Questions