Pipelined Computations

Pipelined Computations
[Figure: a problem divided into a pipeline of processes P0 → P1 → P2 → P3 → P4 → P5, each passing its result to the next.]

Pipelined Computations
[Figure: the loop unfolded into five pipeline stages; stage i takes a[i] and s_in as inputs and produces s_out.]
The sequential loop

    for (i = 0; i < n; i++)
        sum = sum + a[i];

unfolds into one statement per pipeline stage:

    sum = sum + a[0];
    sum = sum + a[1];
    sum = sum + a[2];
    sum = sum + a[3];
    sum = sum + a[4];

Pipelined Computations
[Figure: a pipeline of frequency filters f0 … f4; each stage takes f and f_in and produces f_out, so the signal leaving stage i has frequency f_i removed ("signal without f0", "signal without f1", …).]

Pipelined Computations
Pipelining can provide increased speed under three types of computation:
1. If more than one instance of the complete problem is to be executed.
2. If a series of data items must be processed, each requiring multiple operations.
3. If information to start the next process can be passed forward before the current process has completed all its internal operations.

Pipelined Computations
[Figure: space-time diagram of processes P0 … P5 executing instances of a problem over successive pipeline cycles.]
With p processes and m instances of the problem, execution takes m + p − 1 pipeline cycles, so the average number of cycles per instance is (m + p − 1)/m.
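As an illustrative check (numbers assumed, not from the slides): with p = 6 pipeline stages and m = 100 instances, execution takes 100 + 6 − 1 = 105 cycles, an average of 1.05 cycles per instance; the average approaches one cycle per instance as m grows.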

Pipelined Computations
[Figure: Type 1 space-time diagram: instances I0, I1, I2, I3, I4, … flow through the processes P0 … P5 over time, one new instance entering each pipeline cycle.]

Pipelined Computations
[Figure: Type 2 space-time diagram: data items d0, d1, …, d9 flow through processes P0 … P9; with n items and p processes, execution takes n + p − 1 pipeline cycles.]

Pipelined Computations
[Figure: Type 3 space-time diagrams for processes P0 … P5: each process passes information forward before it has completed all its internal operations, so later stages can start early.]

Pipelined Computations
[Figure: when there are more pipeline stages than processors, groups of stages are mapped to each processor: P0–P3 on processor 0, P4–P7 on processor 1, P8–P11 on processor 2.]

Pipelined Computations
[Figure: a host computer connected to the end of the pipeline.]
An ideal interconnection structure for a pipeline is a line or ring structure.

Adding numbers
[Figure: a pipeline of processes P0 → P1 → P2 → P3 → P4 accumulating a sum.]
Basic code for process P_i:

    recv(&accumulation, Pi-1);
    accumulation = accumulation + number;
    send(&accumulation, Pi+1);

Adding numbers
The first and last processes differ:

    P0:   send(&number, P1);

    Pn-1: recv(&accumulation, Pn-2);
          accumulation = accumulation + number;
          send(&accumulation, P0);

A single program can cover all processes:

    if (process > 0) {
        recv(&accumulation, Pi-1);
        accumulation = accumulation + number;
    }
    if (process < n-1)
        send(&accumulation, Pi+1);
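A minimal MPI rendering of this pipelined addition is sketched below; the variable names number and accumulation follow the slides, while everything else (the tag, and how each process obtains its number) is assumed for illustration:

    /* pipelined sum: each process adds its number to the running
       total from the left and passes the result to the right */
    #include <stdio.h>
    #include <mpi.h>

    int main(int argc, char *argv[]) {
        int rank, nprocs, number, accumulation = 0;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

        number = rank + 1;            /* assumed: process i holds the number i+1 */

        if (rank > 0)                 /* receive the running total from the left */
            MPI_Recv(&accumulation, 1, MPI_INT, rank - 1, 0,
                     MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        accumulation += number;
        if (rank < nprocs - 1)        /* pass the new total to the right */
            MPI_Send(&accumulation, 1, MPI_INT, rank + 1, 0, MPI_COMM_WORLD);
        else                          /* the last process holds the final sum */
            printf("sum = %d\n", accumulation);

        MPI_Finalize();
        return 0;
    }

With p processes this computes 1 + 2 + … + p; the two if statements mirror the single-program form above.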

Adding numbers
[Figure: data items d0, d1, d2, …, d(n-1) entering the pipeline P0 → P1 → P2 → P3 → P4, with the sum emerging from the last process.]

Adding numbers
[Figure: the general arrangement with n processes P0, P1, P2, …, Pn-1: the items d0 … d(n-1) enter at P0 and the sum emerges from Pn-1.]

Adding numbers
Analysis:

    t_total = (time for one pipeline cycle) × (number of cycles)
    t_total = (t_comp + t_comm)(m + p − 1)
    t_a     = t_total / m

Single instance of the problem (m = 1, p = n):

    t_comp  = 1
    t_comm  = 2(t_startup + t_data)
    t_total = (2(t_startup + t_data) + 1) n

Adding numbers
Multiple instances of the problem:

    t_total = (2(t_startup + t_data) + 1)(m + n − 1)
    t_a     = t_total / m ≈ 2(t_startup + t_data) + 1   for large m

Adding numbers
Data partitioning with multiple instances of the problem (d numbers per process):

    t_comp  = d
    t_comm  = 2(t_startup + t_data)
    t_total = (2(t_startup + t_data) + d)(m + n/d − 1)
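As an illustrative check (numbers assumed, not from the slides): with n = 100 numbers partitioned d = 10 per process, the pipeline has n/d = 10 stages, and m = 50 instances need m + n/d − 1 = 59 pipeline cycles, each of duration 2(t_startup + t_data) + 10.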

Sorting Numbers
[Figure: pipelined insertion sort of the sequence 4, 3, 1, 2, 5 through processes P0 … P4; step by step the unsorted remainder shrinks (4,3,1,2,5 → 4,3,1,2 → 4,3,1 → 4, …) as each process keeps the largest number it has seen.]

Sorting Numbers
Basic compare-and-forward step for process P_i:

    recv(&number, Pi-1);
    if (number > x) {
        send(&x, Pi+1);
        x = number;
    } else
        send(&number, Pi+1);

With n numbers in total, process P_i knows how many numbers it must accept and pass on:

    right_procno = n - i - 1;      /* numbers to pass further right */
    recv(&x, Pi-1);
    for (j = 0; j < right_procno; j++) {
        recv(&number, Pi-1);
        if (number > x) {
            send(&x, Pi+1);
            x = number;
        } else
            send(&number, Pi+1);
    }

Sorting Numbers
[Figure: the series x(n-1) … x1 x0 entering the pipeline; each of P0, P1, P2 compares the incoming number with the maximum x_max it holds and passes the smaller numbers to the right.]

Sorting Numbers
[Figure: a master feeding d0 … d(n-1) into the pipeline P0 → P1 → P2 → … → Pn-1; the sorted numbers are returned to the master along the same line.]
Code for process P_i, including the return of the sorted numbers leftward along the pipeline:

    right_procno = n - i - 1;
    recv(&x, Pi-1);
    for (j = 0; j < right_procno; j++) {
        recv(&number, Pi-1);
        if (number > x) {
            send(&x, Pi+1);
            x = number;
        } else
            send(&number, Pi+1);
    }
    /* return phase: send the held number left, then relay the rest */
    send(&x, Pi-1);
    for (j = 0; j < right_procno; j++) {
        recv(&number, Pi+1);
        send(&number, Pi-1);
    }
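Put together as a complete program, a possible MPI version of the sorting phase (framing assumed; only the compare-and-forward step comes from the slides) looks like this, with rank 0 acting as the master that feeds the pipeline:

    #include <stdio.h>
    #include <mpi.h>

    /* pipelined insertion sort: rank 0 injects the numbers; each of the
       n worker ranks keeps the largest value it has seen and forwards
       the rest, so rank 1 ends up with the maximum and rank n with the
       minimum.  Worker rank r is pipeline stage i = r - 1, so the
       slides' right_procno = n - i - 1 becomes n - r here. */
    int main(int argc, char *argv[]) {
        int rank, nprocs;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &nprocs);
        int n = nprocs - 1;                  /* one kept number per worker */

        if (rank == 0) {                     /* master feeds the pipeline */
            int data[] = {4, 3, 1, 2, 5};    /* sequence from the slides */
            if (n != 5) {                    /* example data needs 6 processes */
                fprintf(stderr, "run with 6 processes\n");
                MPI_Abort(MPI_COMM_WORLD, 1);
            }
            for (int j = 0; j < n; j++)
                MPI_Send(&data[j], 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
        } else {
            int x, number;
            int right_procno = n - rank;     /* numbers to pass further right */
            MPI_Recv(&x, 1, MPI_INT, rank - 1, 0,
                     MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            for (int j = 0; j < right_procno; j++) {
                MPI_Recv(&number, 1, MPI_INT, rank - 1, 0,
                         MPI_COMM_WORLD, MPI_STATUS_IGNORE);
                if (number > x) {            /* keep the larger, forward the smaller */
                    MPI_Send(&x, 1, MPI_INT, rank + 1, 0, MPI_COMM_WORLD);
                    x = number;
                } else {
                    MPI_Send(&number, 1, MPI_INT, rank + 1, 0, MPI_COMM_WORLD);
                }
            }
            printf("rank %d holds %d\n", rank, x);
        }
        MPI_Finalize();
        return 0;
    }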

Sorting Numbers
Analysis:
[Figure: space-time diagram for P0 … P4 showing the sorting phase followed by returning the sorted numbers.]

    Sorting phase:            2n − 1 pipeline cycles
    Returning sorted numbers: n cycles
    Total:                    3n − 1 cycles

Prime Number Generation
The sieve of Eratosthenes on 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20.
To find the primes up to n, it is only necessary to sieve with numbers up to √n.
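Working through the example above: √20 ≈ 4.5, so sieving with 2 (striking 4, 6, 8, …, 20) and 3 (striking 9 and 15; 6, 12, and 18 are already struck) is enough, leaving the primes 2, 3, 5, 7, 11, 13, 17, 19.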

Prime Number Generation
Sequential code:

    for (i = 2; i < n; i++)
        prime[i] = 1;                       /* mark every candidate as prime */
    for (i = 2; i <= sqrt_n; i++)           /* sqrt_n = floor(sqrt(n)) */
        if (prime[i] == 1)
            for (j = i + i; j < n; j = j + i)   /* strike out multiples of i */
                prime[j] = 0;

Prime Number Generation
[Figure: pipeline stages P0, P1, P2 each holding a prime and comparing the incoming numbers x(n-1) … x1 x0 against it; numbers that are not multiples of the first stage's prime pass on to the next stage.]
Parallel code, basic step for process P_i:

    recv(&x, Pi-1);          /* first number received is this stage's prime */
    recv(&number, Pi-1);
    if ((number % x) != 0)
        send(&number, Pi+1);

With a terminator to mark the end of the stream:

    recv(&x, Pi-1);
    for (i = 0; i < n; i++) {
        recv(&number, Pi-1);
        if (number == terminator)
            break;
        if ((number % x) != 0)
            send(&number, Pi+1);
    }
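A complete MPI sketch of this pipelined sieve, under stated assumptions (the terminator convention, rank layout, and N are mine, not the slides'):

    #include <stdio.h>
    #include <mpi.h>

    /* rank 0 feeds 2..N into the pipeline; every other rank keeps the
       first number it receives as its prime and forwards only the
       non-multiples.  Needs one rank per prime up to N (16 ranks for
       N = 50); a smaller run silently drops the later primes. */
    #define N    50
    #define TERM (-1)

    int main(int argc, char *argv[]) {
        int rank, nprocs, x = 0, number;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

        if (rank == 0) {                       /* generator stage */
            for (number = 2; number <= N; number++)
                MPI_Send(&number, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
            number = TERM;                     /* flush the pipeline */
            MPI_Send(&number, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
        } else {
            for (;;) {
                MPI_Recv(&number, 1, MPI_INT, rank - 1, 0,
                         MPI_COMM_WORLD, MPI_STATUS_IGNORE);
                if (number == TERM)
                    break;
                if (x == 0) {                  /* first number seen is prime */
                    x = number;
                    printf("rank %d: prime %d\n", rank, x);
                } else if (number % x != 0 && rank < nprocs - 1) {
                    MPI_Send(&number, 1, MPI_INT, rank + 1, 0, MPI_COMM_WORLD);
                }
            }
            if (rank < nprocs - 1)             /* pass the terminator along */
                MPI_Send(&number, 1, MPI_INT, rank + 1, 0, MPI_COMM_WORLD);
        }
        MPI_Finalize();
        return 0;
    }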

Solving a System of Linear Equations
A special case: the system is in so-called upper-triangular form, where the first equation contains only x0, the second only x0 and x1, and so on. Such a system is solved by back substitution: x0 is found from the first equation, substituted into the second to find x1, and so on down the system.
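A small worked instance (coefficients assumed for illustration): 2x0 = 4 gives x0 = 2; then x0 + 3x1 = 7 gives x1 = (7 − 2)/3 = 5/3; then 4x0 + x1 + 5x2 = 21 gives x2 = (21 − 8 − 5/3)/5 = 34/15.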

Solving a System of Linear Equations
Sequential code:

    x[0] = b[0] / a[0][0];
    for (i = 1; i < n; i++) {
        sum = 0;
        for (j = 0; j < i; j++)
            sum = sum + a[i][j] * x[j];
        x[i] = (b[i] - sum) / a[i][i];
    }

Solving a System of Linear Equations
[Figure: a pipeline of four stages computing x0, x1, x2, and x3 in turn; each stage forwards all the unknowns it has received (x0; then x0, x1; then x0, x1, x2; …) to the next stage.]

Solving a System of Linear Equations
Parallel code for process P_i: first receive and forward the earlier unknowns, then compute x[i]:

    for (j = 0; j < i; j++) {
        recv(&x[j], Pi-1);
        send(&x[j], Pi+1);
    }
    sum = 0;
    for (j = 0; j < i; j++)
        sum = sum + a[i][j] * x[j];
    x[i] = (b[i] - sum) / a[i][i];
    send(&x[i], Pi+1);

Solving a System of Linear Equations
An improved version for P_i overlaps the multiply/add with the forwarding:

    sum = 0;
    for (j = 0; j < i; j++) {
        recv(&x[j], Pi-1);
        send(&x[j], Pi+1);
        sum = sum + a[i][j] * x[j];
    }
    x[i] = (b[i] - sum) / a[i][i];
    send(&x[i], Pi+1);
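An MPI sketch of this stage, reusing the worked instance above; the one-row-per-process arrangement and the 4×4 system are assumptions for illustration:

    #include <stdio.h>
    #include <mpi.h>

    /* pipelined back substitution: process i holds row i of a and entry
       b[i], computes x[i], and forwards every unknown it knows to
       process i+1.  Run with at least N processes. */
    #define N 4

    int main(int argc, char *argv[]) {
        double a[N][N] = {{2,0,0,0}, {1,3,0,0}, {4,1,5,0}, {2,2,1,4}};
        double b[N] = {4, 7, 21, 19};
        double x[N], sum = 0.0;
        int rank, nprocs, j;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &nprocs);
        if (rank >= N) { MPI_Finalize(); return 0; }   /* spare processes idle */

        int i = rank;                  /* process i computes x[i] */
        for (j = 0; j < i; j++) {      /* receive earlier unknowns, pass them on */
            MPI_Recv(&x[j], 1, MPI_DOUBLE, i - 1, 0,
                     MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            if (i + 1 < N)
                MPI_Send(&x[j], 1, MPI_DOUBLE, i + 1, 0, MPI_COMM_WORLD);
            sum += a[i][j] * x[j];     /* overlap multiply/add with forwarding */
        }
        x[i] = (b[i] - sum) / a[i][i];
        if (i + 1 < N)
            MPI_Send(&x[i], 1, MPI_DOUBLE, i + 1, 0, MPI_COMM_WORLD);
        printf("x[%d] = %f\n", i, x[i]);
        MPI_Finalize();
        return 0;
    }

Run with 4 processes, this prints x[0] = 2, x[1] ≈ 1.667, x[2] ≈ 2.267, x[3] = 2.35.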

Solving a System of Linear Equations
[Figure: time diagram for processes P0 … P4. P0 performs a divide and sends x0; each subsequent process Pi receives and forwards x0 … x(i-1), performing a multiply/add for each, then a divide/subtract to compute x_i, sends x_i onward, and ends.]

[Figure: the pipeline P0 … P5 with the first value passed onward and each process's final computed value marked.]