1
Parallel Processing & Parallel Algorithm
May 8, 2003
B4 Yuuki Horita
2
Chapter 3 Principles of Parallel Programming
3
Data Dependency
Processor Communication
Mapping
Granularity
4
Data Dependency
- data flow dependency
- data anti-dependency
- data output dependency
- data input dependency
- data control dependency
5
Data Flow Dependency & Data Anti-Dependency
- data flow dependency (S2 reads A after S1 writes it)
    S1: A = B + C
    S2: D = A + E
- data anti-dependency (S2 writes B after S1 reads it)
    S1: A = B + C
    S2: B = D + E
6
Data Output Dependency & Data Input Dependency
- data output dependency (S1 and S2 both write A)
    S1: A = B + C
    S2: A = D + E
- data input dependency (S1 and S2 both read B)
    S1: A = B + C
    S2: D = B + E
7
Data Control Dependency
    S1: A = B - C
        if ( A > 0 )
    S2:     then D = 1
    S3:     else D = 2
        endif
S1 ⇒ S2, S3 (which of S2 and S3 executes depends on the value computed in S1)
8
Dependency Graph
G = ( V, E )
V (vertices) : statements
E (edges) : dependencies

ex)
    S1: A = B + C
    S2: B = A * 3
    S3: A = 2 * C
    S4: P = B ≧ 0
        if ( P is True )
    S5:     then D = 1
    S6:     else D = 2
        endif

    S1 → S2 : flow dep. (A), anti-dep. (B)
    S1 → S3 : output dep. (A), input dep. (C)
    S2 → S3 : anti-dep. (A)
    S2 → S4 : flow dep. (B)
    S4 → S5, S6 : control dep.
9
Elimination of the Dependencies
- data output dependency
- data anti-dependency
⇒ renaming can remove these forms of dependency
10
Elimination of the Dependencies (2)
ex)
    S1: A = B + C
    S2: B = D + E
    S3: A = F + G
S1, S2 : anti-dep. (B)
S1, S3 : output dep. (A)

Renaming:
    S1: A  = B + C
    S2: B' = D + E
    S3: A' = F + G
No dependency!
11
Processor Communication
Message passing communication
- processors communicate via communication links
Shared memory communication
- processors communicate via common memory
12
Message Passing System
[Diagram: processing elements PE1 … PEm, each with its own local memory M1 … Mm, connected by an interconnection network]
13
Message Passing System (2)
Send and Receive operations
- blocking
- nonblocking
System
- synchronous : blocking send and blocking receive operations
- asynchronous : nonblocking send and blocking receive operations (the messages are buffered)
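A minimal sketch of the blocking case, assuming MPI as the message-passing library (the slides name no concrete API): MPI_Send blocks until its buffer can be reused, and MPI_Recv blocks until the message has arrived.

    /* blocking send/receive between two processes; build with mpicc,
       run with: mpirun -np 2 ./a.out */
    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv) {
        int rank, value;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        if (rank == 0) {
            value = 42;
            /* blocking send: returns once the send buffer is reusable */
            MPI_Send(&value, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
        } else if (rank == 1) {
            /* blocking receive: returns only after the message arrives */
            MPI_Recv(&value, 1, MPI_INT, 0, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
            printf("PE%d received %d\n", rank, value);
        }
        MPI_Finalize();
        return 0;
    }

The nonblocking variants (MPI_Isend / MPI_Irecv) return immediately and are completed later with MPI_Wait, which is what permits the asynchronous, buffered style described above.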
14
Shared Memory System
[Diagram: processing elements PE1 … PEm connected through an interconnection network to a global memory composed of modules M1 … Mm]
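For contrast, a minimal shared-memory sketch using POSIX threads (an assumption; the slides prescribe no API): two threads play the role of two PEs and communicate through a variable in common memory, with a mutex guarding access.

    #include <pthread.h>
    #include <stdio.h>

    static int shared = 0;                      /* cell in common memory */
    static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

    static void *pe1(void *arg) {               /* "PE1" writes */
        pthread_mutex_lock(&lock);
        shared = 42;
        pthread_mutex_unlock(&lock);
        return NULL;
    }

    int main(void) {                            /* "PE2" reads */
        pthread_t t;
        pthread_create(&t, NULL, pe1, NULL);
        pthread_join(t, NULL);                  /* wait until PE1 is done */
        printf("PE2 reads %d\n", shared);
        return 0;
    }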
15
Mapping
… matching parallel algorithms to parallel architectures
Mapping to
- asynchronous architecture
- synchronous architecture
- distributed architecture
16
Mapping to Asynchronous Architecture
Mapping a program to an asynchronous shared-memory computer has the following steps:
1. Allocate each statement in the program to a processor
2. Allocate each variable to a memory
3. Specify the control flow for each processor
The sender may send many messages without the receiver removing them from the channel
⇒ the need for buffering messages
17
Mapping to Synchronous Architecture
Mapping a program proceeds as for the asynchronous architecture, but:
- a common clock is used for synchronization purposes
- each processor executes an instruction at each step (at each clock tick)
- only one message exists at any one time on the channel ⇒ no buffering is needed
18
Mapping to Distributed Architecture
- each local memory is accessible only by the owning processor
- each channel connects only a pair of processors
Mapping a program is the same as in the asynchronous shared-memory architecture, except that each variable is allocated either to a processor or to a channel
19
Granularity
relates to the ratio of the amount of computation to the amount of communication
- fine : parallelism at the statement level
- medium : parallelism at the procedure level
- coarse : parallelism at the program level
20
Program Level Parallelism
A program creates a new process by creating a complete copy of itself
- fork() (UNIX)
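A minimal sketch of this in C on UNIX: fork() duplicates the calling process, and parent and child then run in parallel.

    #include <stdio.h>
    #include <sys/types.h>
    #include <sys/wait.h>
    #include <unistd.h>

    int main(void) {
        pid_t pid = fork();        /* complete copy of the running program */
        if (pid == 0) {
            printf("child:  pid %d\n", (int)getpid());
        } else if (pid > 0) {
            printf("parent: pid %d forked child %d\n", (int)getpid(), (int)pid);
            wait(NULL);            /* wait for the child to finish */
        } else {
            perror("fork");
            return 1;
        }
        return 0;
    }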
21
Statement Level Parallelism
・ Parbegin-Parend block
    Par-begin
        Statement 1
        Statement 2
        :
        Statement n
    Par-end
⇒ the statements Statement 1 ~ Statement n are executed in parallel
22
Statement Level Parallelism (2)
ex) (a + b) * (c + d) - (e / f)
    Par-begin
        begin
            Par-begin
                t1 = a + b
                t2 = c + d
            Par-end
            t4 = t1 * t2
        end
        t3 = e / f
    Par-end
    t5 = t4 - t3
(t1 and t2 are computed in parallel and then combined into t4, while t3 is computed concurrently with that whole branch; a thread-based sketch follows)
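Par-begin/Par-end is notation only; as a hedged sketch, the same dependence structure in C with POSIX threads (an assumption, since the slides target no concrete language), where the joins play the role of Par-end.

    #include <pthread.h>
    #include <stdio.h>

    static double a = 1, b = 2, c = 3, d = 4, e = 8, f = 2;  /* sample inputs */
    static double t1, t2, t3;

    static void *add_ab(void *arg) { t1 = a + b; return NULL; }
    static void *add_cd(void *arg) { t2 = c + d; return NULL; }
    static void *div_ef(void *arg) { t3 = e / f; return NULL; }

    int main(void) {
        pthread_t p1, p2, p3;
        pthread_create(&p3, NULL, div_ef, NULL);  /* t3 branch of outer block */
        pthread_create(&p1, NULL, add_ab, NULL);  /* inner Par-begin */
        pthread_create(&p2, NULL, add_cd, NULL);
        pthread_join(p1, NULL);                   /* inner Par-end */
        pthread_join(p2, NULL);
        double t4 = t1 * t2;
        pthread_join(p3, NULL);                   /* outer Par-end */
        double t5 = t4 - t3;
        printf("t5 = %g\n", t5);                  /* (1+2)*(3+4) - 8/2 = 17 */
        return 0;
    }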
23
Statement Level Parallelism (3)
・ Fork, Join, Quit
- Fork x : causes a new process to be created, starting execution at the instruction labeled x
- Join t, y :
      t = t - 1
      if ( t = 0 ) then go to y
- Quit : the process terminates
24
Statement Level Parallelism (4)
ex) (a + b) * (c + d) - (e / f)
    n = 2
    m = 2
        Fork P2
        Fork P3
    P1: t1 = a + b;   Join m, P4;  Quit
    P2: t2 = c + d;   Join m, P4;  Quit
    P3: t3 = e / f;   Join n, P5;  Quit
    P4: t4 = t1 * t2; Join n, P5;  Quit
    P5: t5 = t4 - t3
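The same program rendered as a hedged C sketch: pthreads stand in for Fork, and atomic counters implement Join t, y (the process that decrements the counter to zero continues at the label; the others Quit by returning).

    #include <pthread.h>
    #include <stdatomic.h>
    #include <stdio.h>

    static double a = 1, b = 2, c = 3, d = 4, e = 8, f = 2;  /* sample inputs */
    static double t1, t2, t3, t4, t5;
    static atomic_int m = 2, n = 2;                          /* join counters */

    static void *P5(void *arg) { t5 = t4 - t3; return NULL; }

    static void *P4(void *arg) {
        t4 = t1 * t2;
        if (atomic_fetch_sub(&n, 1) == 1) return P5(arg);  /* Join n, P5 */
        return NULL;                                       /* Quit */
    }
    static void *P1(void *arg) {
        t1 = a + b;
        if (atomic_fetch_sub(&m, 1) == 1) return P4(arg);  /* Join m, P4 */
        return NULL;                                       /* Quit */
    }
    static void *P2(void *arg) {
        t2 = c + d;
        if (atomic_fetch_sub(&m, 1) == 1) return P4(arg);  /* Join m, P4 */
        return NULL;                                       /* Quit */
    }
    static void *P3(void *arg) {
        t3 = e / f;
        if (atomic_fetch_sub(&n, 1) == 1) return P5(arg);  /* Join n, P5 */
        return NULL;                                       /* Quit */
    }

    int main(void) {
        pthread_t q2, q3;
        pthread_create(&q2, NULL, P2, NULL);   /* Fork P2 */
        pthread_create(&q3, NULL, P3, NULL);   /* Fork P3 */
        P1(NULL);                              /* this process continues as P1 */
        pthread_join(q2, NULL);
        pthread_join(q3, NULL);
        printf("t5 = %g\n", t5);               /* (1+2)*(3+4) - 8/2 = 17 */
        return 0;
    }

atomic_fetch_sub returns the counter's previous value, so comparing against 1 detects the decrement that reaches zero, matching the Join semantics on the previous slide.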