Dependence Precedence

Precedence & Dependence
Can we execute a 1000-line program with 1000 processors in one step?
What are the issues to deal with in various parallelizing situations:
–Parallel programming?
–Instruction-level parallelism?
What type of analysis is used to study concurrent database operations?

Dependence

Making Use of Processors
In parallelizing algorithms, we want to use as many processors as possible in an effort to finish in as little time as possible.
Often, it is not possible to make complete use of all processors in all time units:
–Some instructions (or sections of instructions) depend upon others
–Others have a different, related problem called precedence (next section)

Input and Output
Input and output cannot be parallelized in the strict sense because we’re dealing with a user.
For this analysis we assume multiple, parallel streams of input and output (modems, etc.), so read and print statements are scheduled like any other statement.

Read and Print Statements
  read(x) is written as x <- keyboard
  print(x) is written as screen <- x

Dependency Relationships
Dependencies are relationships between the steps of an algorithm such that one step depends upon another.
  (S1) read (a)      i.e. a <- keyboard
  (S2) b <- a * 3
  (S3) c <- b * a
Here, S2 is dependent on S1 to provide the appropriate value of a. Similarly, S3 is dependent on both S1 (for a’s value) and S2 (for b’s value).
Because S2 already depends on S1, the S1-to-S3 dependence is implied by the chain, so we can simply say that S3 is dependent on S2. The direct arrow from S1 to S3 is not needed.

Dependence
Defined by a “read after write”* relationship: a variable written on the left side of the assignment operator in one statement is read on the right side in a later statement.
  a <- 5
  b <- a + 2
*Note: “Read” and “Write” in this case refer to reading the value from a memory location and writing a value to a memory location, not to input/output.
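
The read-after-write test can be sketched in Python. This is a minimal illustration, assuming each statement is described by the set of variables it writes and the set it reads; the names `Stmt` and `depends_on` are hypothetical and do not come from the slides. It compares two statements only and does not account for a third statement overwriting the variable in between.

```python
from typing import NamedTuple


class Stmt(NamedTuple):
    """One statement: the variables it writes (left side) and reads (right side)."""
    name: str
    writes: frozenset
    reads: frozenset


def depends_on(later: Stmt, earlier: Stmt) -> bool:
    """True if `later` reads a memory location that `earlier` writes."""
    return bool(earlier.writes & later.reads)


s1 = Stmt("S1", frozenset({"a"}), frozenset())        # a <- 5
s2 = Stmt("S2", frozenset({"b"}), frozenset({"a"}))   # b <- a + 2

print(depends_on(s2, s1))  # True: S2 reads a after S1 writes it
print(depends_on(s1, s2))  # False: S1 reads nothing that S2 writes
```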

Graphing Dependence Relations
[Figure: dependence arrow from S1 to S2, drawn on axes labeled Processors and Time]

Dependency Graphs
  (S1) read (a)      i.e. a <- keyboard
  (S2) b <- a * 3
  (S3) c <- b * a
[Figure: S1 -> S2 -> S3 in a single column; axes: Processors, Time]
In this case, it does not matter how many processors we have; the chain of dependencies keeps only one processor busy at a time, and we finish in 3 time units.

What If There Are No Dependencies?
  (S1) read (a)
  (S2) b <- b + 3
  (S3) c <- c + 4
[Figure: S1, S2, S3 side by side in one time unit; axes: Processors, Time]
We can use three processors to get it done in a single time chunk.

A Dependency Example
  (S1) read (a)      i.e. a <- keyboard
  (S2) read (b)      i.e. b <- keyboard
  (S3) c <- a * 4
  (S4) d <- b / 3
  (S5) e <- c * d
  (S6) f <- d + 8
[Figure: graph built up one statement at a time, ending with three rows: S1 S2 / S3 S4 / S5 S6]
Using 2 processors, we finish 6 instructions in 3 units of time.
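
The schedule above can be reproduced with a short Python sketch. It assumes the dependence arrows are given by hand as a dictionary (`deps`, an illustrative name) mapping each statement to the statements it depends on, and it places each statement in the earliest time unit after all of those have finished. The function name `schedule` is hypothetical.

```python
def schedule(deps):
    """Give each statement the earliest time unit after everything it depends on."""
    level = {}

    def visit(s):
        if s not in level:
            level[s] = 1 + max((visit(p) for p in deps[s]), default=0)
        return level[s]

    for s in deps:
        visit(s)
    return level


# Dependence arrows for S1..S6 of the example (statement -> statements it needs).
deps = {
    "S1": [], "S2": [],
    "S3": ["S1"], "S4": ["S2"],
    "S5": ["S3", "S4"], "S6": ["S4"],
}

level = schedule(deps)
by_time = {}
for s, t in level.items():
    by_time.setdefault(t, []).append(s)

time_units = max(level.values())
processors = max(len(row) for row in by_time.values())

print(by_time)                 # {1: ['S1', 'S2'], 2: ['S3', 'S4'], 3: ['S5', 'S6']}
print(time_units, processors)  # 3 2
```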

Dependence and Iteration
Ignore steps that are not part of the loop body (treat them as overhead, similar to the cost of making parallelism work)
–Don’t worry about loop, exitif, counter variables, endloop, etc.
Use prime notation to indicate passes: ’ ” ”’
Unroll the loop, replacing the counter variable with a literal value.

An Iterative Example
  I <- 1
  loop
    exitif (I > MAX_ARRAY)
    (S1) read (A[I])
    (S2) B[I] <- A[I] + 4
    (S3) C[I] <- A[I] / 3
    (S4) D[I] <- B[I] / C[I]
    I <- I + 1
  endloop

  (S1) read (A[1])
  (S2) B[1] <- A[1] + 4
  (S3) C[1] <- A[1] / 3
  (S4) D[1] <- B[1] / C[1]
  (S1’) read (A[2])
  (S2’) B[2] <- A[2] + 4
  (S3’) C[2] <- A[2] / 3
  (S4’) D[2] <- B[2] / C[2]
  (S1”) read (A[3])
  (S2”) B[3] <- A[3] + 4
  (S3”) C[3] <- A[3] / 3
  (S4”) D[3] <- B[3] / C[3]
[Figure: one iteration, with S1 at the top, S2 and S3 side by side below it, and S4 at the bottom]

An Iterative Example
[Figure: the one-iteration graph repeated side by side for S1..S4, S1’..S4’ and S1”..S4”]

Limited Number of Processors
What if the number of processors is fixed? Some processors may be in use by another program or user.
If the number of processors available is less than the number that could be utilized, shift instructions into later time units (lower in the graph).

A Limited Processor Example
[Figure: the twelve statements S1..S4, S1’..S4’, S1”..S4” rescheduled onto a limited number of processors]
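
One way to shift instructions into later time units is a simple greedy schedule, sketched here in Python. The function `list_schedule` and the hand-written `deps` dictionary are illustrative assumptions, not the method used on the slide. In each time unit it runs up to the given number of ready statements in listing order, so the result is valid but not guaranteed to be the shortest possible schedule.

```python
def list_schedule(deps, processors):
    """Greedy schedule: each time unit runs up to `processors` ready statements."""
    done = set()
    timeline = []
    while len(done) < len(deps):
        # A statement is ready once everything it depends on ran in an earlier unit.
        ready = [s for s in deps
                 if s not in done and all(p in done for p in deps[s])]
        step = ready[:processors]
        timeline.append(step)
        done.update(step)
    return timeline


# Three unrolled iterations of the array example: S2 and S3 need S1, S4 needs both.
deps = {}
for mark in ("", "'", "''"):
    deps["S1" + mark] = []
    deps["S2" + mark] = ["S1" + mark]
    deps["S3" + mark] = ["S1" + mark]
    deps["S4" + mark] = ["S2" + mark, "S3" + mark]

for p in (1, 2, 6):
    print(p, "processors:", len(list_schedule(deps, p)), "time units")
# 1 processors: 12 time units
# 2 processors: 7 time units
# 6 processors: 3 time units
```

With six processors the schedule matches the unlimited case (3 time units); with fewer, statements slide into later time units.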

Questions?

Precedence

Precedence Relationships
A precedence relationship exists if a statement would contaminate the data needed by another, preceding instruction.
  (S1) read (a)      i.e. a <- keyboard
  (S2) print (a)     i.e. screen <- a
  (S3) a <- a * 7
  (S4) print (a)     i.e. screen <- a
S2 and S3 are dependent on S1 (for the initial value of a).
S4 is dependent on S3 (for the updated a).
There is also a precedence relationship between S2 and S3: S3 must follow S2, else S3 will corrupt the value that S2 prints.

Precedence
Defined by a “write after write” or “write after read” relationship. This means a variable is written on the left side of the assignment operator after it has appeared previously on the right or left side of an earlier statement.
  b <- a + 2
  a <- 7
  a <- 5
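
The three relationships can be told apart mechanically. The following Python sketch assumes each statement is given as a pair of sets (variables written, variables read); the function name `relations` and the statement order used in the examples are illustrative assumptions.

```python
def relations(earlier, later):
    """Classify how `later` relates to `earlier`; each is a (writes, reads) pair."""
    w1, r1 = earlier
    w2, r2 = later
    found = []
    if w1 & r2:
        found.append("dependence (read after write)")
    if r1 & w2:
        found.append("precedence (write after read)")
    if w1 & w2:
        found.append("precedence (write after write)")
    return found


b_gets_a_plus_2 = ({"b"}, {"a"})   # b <- a + 2
a_gets_7 = ({"a"}, set())          # a <- 7
a_gets_5 = ({"a"}, set())          # a <- 5
a_gets_a_plus_2 = ({"a"}, {"a"})   # a <- a + 2

print(relations(b_gets_a_plus_2, a_gets_7))
# ['precedence (write after read)']
print(relations(a_gets_7, a_gets_5))
# ['precedence (write after write)']
print(relations(a_gets_5, a_gets_a_plus_2))
# ['dependence (read after write)', 'precedence (write after write)']
```

The last call shows a pair with both a dependence and a precedence relation.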

Showing Precedence Relations
[Figure: precedence arrow from S1 to S2, drawn on axes labeled Processors and Time]

Precedence Graphs
  (S1) read (a)      i.e. a <- keyboard
  (S2) print (a)     i.e. screen <- a
  (S3) a <- a * 7
  (S4) print (a)     i.e. screen <- a
[Figure: graph of S1, S2, S3, S4 with dependency and precedence arrows]
The precedence arrow blocks S3 from executing until S2 is finished.
The dependency arrow between S1 and S3 is superfluous.

What if There Is No Precedence?
  (S1) read (a)
  (S2) b <- b + 3
  (S3) c <- c + 4
[Figure: S1, S2, S3 side by side]
We can use three processors to get it done in a single time chunk.

Precedence and Iteration
Ignore steps that are not part of the loop body (treat them as overhead, similar to the cost of making parallelism work)
–Don’t worry about loop, exitif, counter variables, endloop, etc.
Use prime notation to indicate passes: ’ ” ”’
Unroll the loop, replacing the counter variable with a literal value.

An Iterative Example
  i <- 1
  loop
    exitif (i > 3)
    (S1) read (a)      i.e. a <- keyboard
    (S2) print (a)     i.e. screen <- a
    (S3) a <- a * 7
    (S4) print (a)     i.e. screen <- a
    i <- i + 1
  endloop

  (S1) a <- keyboard
  (S2) screen <- a
  (S3) a <- a * 7
  (S4) screen <- a
  (S1’) a <- keyboard
  (S2’) screen <- a
  (S3’) a <- a * 7
  (S4’) screen <- a
  (S1”) a <- keyboard
  (S2”) screen <- a
  (S3”) a <- a * 7
  (S4”) screen <- a
[Figure: graph of S1, S2, S3, S4 continuing into S1’]

Iteration and Precedence Graphs
[Figure: precedence graph over S1..S4, S1’..S4’ and S1”..S4”]

Space vs. Time
We can optimize time performance by changing the shared variable to an array of independent variables.
  i <- 1
  loop
    exitif (i > 3)
    (S1) read (a[i])
    (S2) print (a[i])
    (S3) a[i] <- a[i] * 7
    (S4) print (a[i])
    i <- i + 1
  endloop

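Precedence Graphs
[Figure: three separate chains, S1..S4, S1’..S4’ and S1”..S4”, side by side]
We can use 3 processors to finish in 4 time units.
Note that product complexity is unchanged: 3 processors x 4 time units = 12, the same as 1 processor x 12 time units.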
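
The effect of trading space for time can be checked with a small Python sketch. It assumes each loop-body statement is described by the variables it writes and reads, and that keyboard and screen are not modeled as shared variables (following the assumption of parallel input and output streams). The helper names `unroll` and `time_units` are hypothetical.

```python
def unroll(body, n):
    """Unroll the loop n times, replacing the counter i with a literal value."""
    stmts = []
    for i in range(1, n + 1):
        for name, writes, reads in body:
            w = {v.replace("[i]", f"[{i}]") for v in writes}
            r = {v.replace("[i]", f"[{i}]") for v in reads}
            stmts.append((name + "'" * (i - 1), w, r))
    return stmts


def time_units(stmts):
    """Earliest-start schedule length, honoring dependence and precedence."""
    level = []
    for j, (_, w2, r2) in enumerate(stmts):
        start = 1
        for k in range(j):
            _, w1, r1 = stmts[k]
            # read after write, write after read, or write after write
            if (w1 & r2) or (r1 & w2) or (w1 & w2):
                start = max(start, level[k] + 1)
        level.append(start)
    return max(level)


# (name, variables written, variables read) for S1..S4 of the loop body.
shared = [("S1", {"a"}, set()), ("S2", set(), {"a"}),
          ("S3", {"a"}, {"a"}), ("S4", set(), {"a"})]
indexed = [("S1", {"a[i]"}, set()), ("S2", set(), {"a[i]"}),
           ("S3", {"a[i]"}, {"a[i]"}), ("S4", set(), {"a[i]"})]

print(time_units(unroll(shared, 3)))   # 12: the shared a serializes everything
print(time_units(unroll(indexed, 3)))  # 4: the three iterations are independent
```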

What if Both Precedence and Dependence?
If two instructions have both a precedence and a dependence relation
  (S1) a <- 5
  (S2) a <- a + 2
showing only dependence is sufficient.
[Figure: dependence arrow from S1 to S2]

Another Iterative Example
  i <- 1
  loop
    exitif (i > N)
    (S1) read (a[i])      i.e. a[i] <- keyboard
    (S2) a[i] <- a[i] * 7
    (S3) c <- a[i] / 3
    (S4) print (c)        i.e. screen <- c
    i <- i + 1
  endloop

  (S1) a[1] <- keyboard
  (S2) a[1] <- a[1] * 7
  (S3) c <- a[1] / 3
  (S4) screen <- c
  (S1’) a[2] <- keyboard
  (S2’) a[2] <- a[2] * 7
  (S3’) c <- a[2] / 3
  (S4’) screen <- c
  (S1”) a[3] <- keyboard
  (S2”) a[3] <- a[3] * 7
  (S3”) c <- a[3] / 3
  (S4”) screen <- c

[Figure: graphs for S1..S4, S1’..S4’ and S1”..S4”, with precedence arrows linking one iteration to the next]
We have precedence relationships between iterations because of the shared c variable.

Crossing Index Bounds Example
  I <- 1
  loop
    exitif( I > MAX ) // MAX is 3
    (S1) A[I] <- A[I] + B[I]
    (S2) read( B[I] )      i.e. B[I] <- keyboard
    (S3) C[I] <- A[I] * 3
    (S4) D[I] <- B[I] * A[I+1]
    I <- I + 1
  endloop

  (S1) A[1] <- A[1] + B[1]
  (S2) B[1] <- keyboard
  (S3) C[1] <- A[1] * 3
  (S4) D[1] <- B[1] * A[2]
  (S1’) A[2] <- A[2] + B[2]
  (S2’) B[2] <- keyboard
  (S3’) C[2] <- A[2] * 3
  (S4’) D[2] <- B[2] * A[3]
  (S1”) A[3] <- A[3] + B[3]
  (S2”) B[3] <- keyboard
  (S3”) C[3] <- A[3] * 3
  (S4”) D[3] <- B[3] * A[4]

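Precedence between iterations
[Figure: graphs for S1..S4 and S1’..S4’, with a precedence arrow crossing from the first iteration into the second]
S4 reads A[I+1], which the next iteration’s S1’ overwrites, so S1’ must follow S4.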
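
The relations that cross iteration boundaries can be found by unrolling the loop and comparing every pair of statements from different passes. This Python sketch assumes each array reference is encoded as an (array name, index offset) pair, so A[I+1] becomes ("A", 1); that encoding and the helper `unroll` are illustrative choices, not notation from the slides.

```python
def unroll(body, n):
    """Unroll n passes; a reference (array, offset) becomes (array, I + offset)."""
    stmts = []
    for i in range(1, n + 1):
        mark = "'" * (i - 1)
        for name, writes, reads in body:
            w = {(arr, i + off) for arr, off in writes}
            r = {(arr, i + off) for arr, off in reads}
            stmts.append((name + mark, i, w, r))
    return stmts


body = [
    ("S1", {("A", 0)}, {("A", 0), ("B", 0)}),   # A[I] <- A[I] + B[I]
    ("S2", {("B", 0)}, set()),                  # B[I] <- keyboard
    ("S3", {("C", 0)}, {("A", 0)}),             # C[I] <- A[I] * 3
    ("S4", {("D", 0)}, {("B", 0), ("A", 1)}),   # D[I] <- B[I] * A[I+1]
]

stmts = unroll(body, 3)
crossing = []
for k, (n1, pass1, w1, r1) in enumerate(stmts):
    for n2, pass2, w2, r2 in stmts[k + 1:]:
        # Keep only pairs from different passes that share a written location.
        if pass1 != pass2 and ((w1 & r2) or (r1 & w2) or (w1 & w2)):
            crossing.append((n1, n2))

print(crossing)  # [('S4', "S1'"), ("S4'", "S1''")]
```

Both pairs are write-after-read precedence relations on A[2] and A[3].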

Questions?

Practical Applications
We used the single assignments as easy illustrations of the principles.
There are additional real applications of this capability:
–Much bigger than one assignment
–Smaller than one assignment

Large Data Sets
Consider the SETI project.
What do you now know about the data that makes it practical to distribute across millions of processors?

Instruction Processing
Break the computer’s processing into steps:
  A - fetch instruction
  B - fetch data
  C - logical processing (math, test and branch)
  D - store result
The steps are independent for all sequential processing.
Dependency occurs when a branch “ruins” three instruction fetches.
[Figure: steps A B C D of successive instructions staggered along a time axis]
  I <- 0
  loop
    exitif( I > MAX)
    blah...
    I <- I + 1
  endloop
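
The cost of a branch in such a pipeline can be estimated with a back-of-the-envelope Python sketch. It assumes an idealized pipeline in which each of the four steps takes one time unit, one instruction completes per time unit once the pipeline is full, and every taken branch throws away the three fetches behind it, as stated above. The function `pipeline_cycles` and its parameters are hypothetical.

```python
def pipeline_cycles(instructions, taken_branches, stages=4, flush_penalty=3):
    """Time units to run `instructions` through an idealized pipeline.

    Assumes one instruction finishes per time unit once the pipeline is full,
    and each taken branch wastes `flush_penalty` instruction fetches.
    """
    fill = stages - 1  # time units before the first instruction completes
    return instructions + fill + taken_branches * flush_penalty


print(4 * 100)                   # 400: no overlap, 4 steps per instruction
print(pipeline_cycles(100, 0))   # 103: fully overlapped, no branches taken
print(pipeline_cycles(100, 10))  # 133: ten taken branches, 3 wasted fetches each
```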

Questions?