Pipeline Computations: Increasing Throughput by Using More Processes

Original Pipeline: [diagram: the six stages P1 through P6 processing a stream of packets over time]

Latency: 24 sec. Throughput: 1 every 8 sec.
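
These figures are consistent with the per-stage delays used in the MPI simulation later in the slides (2, 4, 2, 6, 8, and 2 seconds for P1 through P6); treating those delays as the stage service times is an assumption read off that code, and the numbers then work out as:

\[
\text{latency} = 2 + 4 + 2 + 6 + 8 + 2 = 24\ \text{s}, \qquad
\text{throughput} = \frac{1}{\max_i t_i} = \frac{1}{8\ \text{s}} .
\]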

Add More Processes: P2 takes twice as long as the fastest stage, so make 2 copies of it (P2-1 and P2-2). P4 takes three times as long, so make 3 copies (P4-1, P4-2, P4-3). P5 takes four times as long, so make 4 copies (P5-1 through P5-4).
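
In general, if the fastest stage takes \(t_{\min}\) and stage \(i\) takes \(t_i\), replicating stage \(i\) about \(k_i = t_i / t_{\min}\) times lets every logical stage accept work at the rate of the fastest one. With the stage times assumed above:

\[
k_{P2} = \tfrac{4}{2} = 2, \qquad k_{P4} = \tfrac{6}{2} = 3, \qquad k_{P5} = \tfrac{8}{2} = 4 .
\]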

Add More Processes: with two copies, each copy of P2 does half the work.

Add More Processes: P2-1 gets every second message (the even ones); P2-2 gets every second message (the odd ones).

Add Dispersers and Collectors: a disperser placed after P1 hands messages to P2-1 and P2-2 in turn, and a collector gathers their results in the same order before passing them on to P3.

New Pipeline, Time = 0

New Pipeline, Time = 2: can move from P1 to P2-1

New Pipeline, Time = 4: can move from P1 to P2-2

New Pipeline, Time = 6: can move from P1 to P2-1, and from P2-1 to P3

New Pipeline, Time = 8: can move from P1 to P2-2, from P2-2 to P3, and from P3 to P4-1

New Pipeline, Time = 10: can move from P1 to P2-1, from P2-1 to P3, and from P3 to P4-2

New Pipeline, Time = 12: can move from P1 to P2-2, from P2-2 to P3, and from P3 to P4-3

New Pipeline, Time = 14: can move from P1 to P2-1, from P2-1 to P3, from P3 to P4-1, and from P4-1 to P5-1

New Pipeline, Time = 16: can move from P1 to P2-2, from P2-2 to P3, from P3 to P4-2, and from P4-2 to P5-2

New Pipeline, Time = 18: can move from P2 to P3, from P3 to P4-3, and from P4-3 to P5-3

New Pipeline, Time = 20: can move from P2 to P3, from P3 to P4-1, and from P4-1 to P5-4

New Pipeline, Time = 22: can move from P3 to P4-2, from P4-2 to P5-1, and from P5-1 to P6

New Pipeline, Time = 24: can move from P4-3 to P5-2, and from P5-2 to P6

New Pipeline, Time = 26: can move from P4-1 to P5-3, and from P5-3 to P6

New Pipeline, Time = 28: can move from P4-2 to P5-4, and from P5-4 to P6

New Pipeline, Time = 30: can move from P5-1 to P6

New Pipeline, Time = 32: can move from P5-2 to P6

New Pipeline, Time = 34: can move from P5-3 to P6

New Pipeline, Time = 36: can move from P5-4 to P6

Result
Latency: 24 sec (same as before)
Throughput: 1 every 2 sec (was 1 every 8 sec)
Input rate: 1 every 2 sec (was 1 every 8 sec)
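
With those replication factors every logical stage can accept a new packet every 2 seconds, so the whole pipeline can as well, while the per-packet latency remains the sum of the stage times:

\[
\text{throughput} = \frac{1}{\max_i \, (t_i / k_i)} = \frac{1}{2\ \text{s}}, \qquad
\text{latency} = 2 + 4 + 2 + 6 + 8 + 2 = 24\ \text{s} .
\]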

Simulation in MPI. A process:

void proc(int delay, int from, int to) {
  while (true) {
    MPI_Recv(…, from, …);
    // work for 'delay' secs.
    MPI_Send(…, to, …);
  }
}

Simulation in MPI. A disperser:

void disperser(int noProc, int from, int procs[]) {
  while (true) {
    for (int i = 0; i < noProc; i++) {
      MPI_Recv(…, from, …);
      MPI_Send(…, procs[i], …);
    }
  }
}

Simulation in MPI. A collector:

void collector(int noProc, int procs[], int to) {
  while (true) {
    for (int i = 0; i < noProc; i++) {
      MPI_Recv(…, procs[i], …);
      MPI_Send(…, to, …);
    }
  }
}
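
For reference, a compilable version of the three routines might look like the sketch below. The slides leave the message payload, tag, and communicator elided; this sketch assumes a single-int payload, tag 0, MPI_COMM_WORLD, and sleep() standing in for 'delay' seconds of work. It illustrates the pattern rather than reproducing the original program.

#include <mpi.h>
#include <unistd.h>

/* One pipeline stage: receive from 'from', work for 'delay' seconds, send to 'to'. */
void proc(int delay, int from, int to) {
    int packet;
    while (1) {
        MPI_Recv(&packet, 1, MPI_INT, from, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        sleep(delay);                        /* simulate the stage's work */
        MPI_Send(&packet, 1, MPI_INT, to, 0, MPI_COMM_WORLD);
    }
}

/* Round-robin incoming packets across the replicated copies of a stage. */
void disperser(int noProc, int from, int procs[]) {
    int packet;
    while (1) {
        for (int i = 0; i < noProc; i++) {
            MPI_Recv(&packet, 1, MPI_INT, from, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            MPI_Send(&packet, 1, MPI_INT, procs[i], 0, MPI_COMM_WORLD);
        }
    }
}

/* Gather results from the replicated copies in the same round-robin order,
   so packets leave the logical stage in the order they arrived. */
void collector(int noProc, int procs[], int to) {
    int packet;
    while (1) {
        for (int i = 0; i < noProc; i++) {
            MPI_Recv(&packet, 1, MPI_INT, procs[i], 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            MPI_Send(&packet, 1, MPI_INT, to, 0, MPI_COMM_WORLD);
        }
    }
}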

New Pipeline Diagram

Building the rank-to-routine dispatch up one stage at a time gives the complete switch:

switch (rank) {
  1:           proc(2, input, 2); break;
  2:           disperser(2, 1, [3,4]); break;
  3,4:         proc(4, 2, 5); break;
  5:           collector(2, [3,4], 6); break;
  6:           proc(2, 5, 7); break;
  7:           disperser(3, 6, [8,9,10]); break;
  8,9,10:      proc(6, 7, 11); break;
  11:          collector(3, [8,9,10], 12); break;
  12:          disperser(4, 11, [13,14,15,16]); break;
  13,14,15,16: proc(8, 12, 17); break;
  17:          collector(4, [13,14,15,16], 18); break;
  18:          proc(2, 17, output); break;
}
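
A runnable version of this dispatch might look like the following sketch. The helper routines are the ones sketched earlier; the ranks chosen for the packet source (0) and sink (19) are assumptions read off the 20-process run shown later (mpirun -np 20), and the shorthand case labels are spelled out in full C.

#include <mpi.h>

/* Defined as in the earlier sketch. */
void proc(int delay, int from, int to);
void disperser(int noProc, int from, int procs[]);
void collector(int noProc, int procs[], int to);

int main(int argc, char *argv[]) {
    int rank;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    int p2[] = {3, 4};             /* the two copies of P2   */
    int p4[] = {8, 9, 10};         /* the three copies of P4 */
    int p5[] = {13, 14, 15, 16};   /* the four copies of P5  */
    int input = 0, output = 19;    /* assumed source and sink ranks */

    switch (rank) {
        case 1:  proc(2, input, 2);                           break;
        case 2:  disperser(2, 1, p2);                         break;
        case 3: case 4: proc(4, 2, 5);                        break;
        case 5:  collector(2, p2, 6);                         break;
        case 6:  proc(2, 5, 7);                               break;
        case 7:  disperser(3, 6, p4);                         break;
        case 8: case 9: case 10: proc(6, 7, 11);              break;
        case 11: collector(3, p4, 12);                        break;
        case 12: disperser(4, 11, p5);                        break;
        case 13: case 14: case 15: case 16: proc(8, 12, 17);  break;
        case 17: collector(4, p5, 18);                        break;
        case 18: proc(2, 17, output);                         break;
        default: /* rank 0 (input) generates packets and rank 19 (output)
                    consumes and times them in the full program; omitted here. */
            break;
    }

    MPI_Finalize();
    return 0;
}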

Improvement? In theory, we went from 1 packet every 8 seconds to 1 packet every 2 seconds. In practice: run the program and see.

Original Pipeline Results
donald-duck.cs.unlv.edu(5) mpirun -np 8 pipe 10
[per-packet transport times for packets 0 through 9, latency, and total time omitted]
Average Rate: 91072

Improved Pipeline Results
donald-duck.cs.unlv.edu(6) mpirun -np 20 pipe2 10
[per-packet transport times for packets 0 through 9, latency, and total time omitted]
Average Rate: 29991

Results
Original (1000 packets): Latency: 296,258; Total Time: 90,209,466; Average Transport Time: 569,236; Average Rate: 89,986
Improved Pipeline (1000 packets): Latency: 293,727; Total Time: 30,264,697; Average Transport Time: 329,841; Average Rate: 29,974
The total time and average rate both improve by roughly a factor of three, somewhat short of the theoretical factor of four, while the latency is essentially unchanged.