Synchronization Memory Consistency

1 Synchronization Memory Consistency
Parallel Processing: Shared Memory Multiprocessors – Tutorial: Synchronization and Memory Consistency. ACOE401 Shared Memory Architectures

2 Shared Memory Architectures
Question 1: The following code runs on a multiprocessor system in which Thread 1 and Thread 2 run on different processors. X, Y, A and B are shared memory locations (all initialized to 0), while r1 and r2 are registers. The system employs sequential consistency.
(a) Write down an instruction execution sequence that stores the values 0, 0 in A and B respectively, after both threads have executed.
(b) Write down an instruction execution sequence that stores the values 0, 1 in A and B respectively, after both threads have executed.
(c) Write down an instruction execution sequence that stores the values 2, 1 in A and B respectively, after both threads have executed.
(d) Determine the values stored in A and B after both threads have executed, if the instruction execution sequence is 1a2a1b2b1c2c1d2d.
(e) Justify why A and B can never hold the values 2, 0 respectively after both threads have executed.
Thread 1:
  1a: move r1,1;     /* r1 = 1 */
  1b: store (X),r1;  /* X = 1 */
  1c: move r2,2;     /* r2 = 2 */
  1d: store (Y),r2;  /* Y = 2 */
Thread 2:
  2a: load r1,(Y);   /* r1 = Y */
  2b: store (A),r1;  /* A = Y */
  2c: load r2,(X);   /* r2 = X */
  2d: store (B),r2;  /* B = X */

3 Shared Memory Architectures
Answer 1:
(a) A = 0, B = 0: 2a2b2c2d1a1b1c1d (Thread 2 reads Y and X before Thread 1 writes them).
(b) A = 0, B = 1: 1a1b2a2b2c2d1c1d, or 2a2b1a1b1c1d2c2d (2a reads Y before 1d writes it, and 2c reads X after 1b has written it).
(c) A = 2, B = 1: 1a1b1c1d2a2b2c2d (Thread 1 completes before Thread 2 starts).
(d) The sequence 1a2a1b2b1c2c1d2d stores A = 0 and B = 1: 2a reads Y before 1d writes it, while 2c reads X after 1b has written it.
(e) Thread 1 writes X first and Y last, while Thread 2 reads Y first and X last. Under sequential consistency, if 2a reads Y = 2, then 1d, and therefore 1b, has already executed, so the later load 2c must read X = 1. Hence, if A = 2 then B cannot be 0.

4 Shared Memory Architectures
Question 2: The following code runs on a multiprocessor system in which Thread 1 and Thread 2 run on different processors. X is a shared memory location (initialized to 0), while r1 and r2 are registers. The system employs sequential consistency.
(a) Write down an instruction execution sequence that stores the value 3 in X, after both threads have executed.
(b) Write down an instruction execution sequence that stores the value 4 in X, after both threads have executed.
(c) Write down an instruction execution sequence that stores the value 6 in X, after both threads have executed.
(d) Add the synchronization instructions (flags) needed to enforce the instruction sequence 1a1b2a2b2c1c1d1e1f.
Thread 1:
  1a: move r1,1
  1b: store (X),r1
  1c: move r1,2
  1d: load r2,(X)
  1e: add r1,r1,r2
  1f: store (X),r1
Thread 2:
  2a: load r1,(X)
  2b: add r2,r1,r1
  2c: store (X),r2

5 Answer 2:
(a) X = 3: 2a2b2c1a1b1c1d1e1f (Thread 2 reads X = 0 and writes 0; Thread 1 then writes 1, reads it back and writes 2 + 1 = 3).
(b) X = 4: 1a1b2a2b2c1c1d1e1f (Thread 2 reads X = 1 and writes 2; Thread 1 then reads 2 and writes 2 + 2 = 4).
(c) X = 6: 1a1b1c1d1e1f2a2b2c (Thread 1 writes 3; Thread 2 then reads 3 and writes 3 + 3 = 6).
(d) Use two flags, flag1 and flag2, both initially 0. Thread 1 raises flag1 after 1b and waits for flag2 before 1c; Thread 2 waits for flag1 before 2a and raises flag2 after 2c.
Thread 1:
  1a: move r1,1
  1b: store (X),r1
  1x1: move r3,1
  1x2: store (flag1),r3     /* flag1 = 1: X = 1 has been written */
  1x3: load r3,(flag2)
  1x4: if (r3==0) goto 1x3  /* wait until Thread 2 has finished */
  1c: move r1,2
  1d: load r2,(X)
  1e: add r1,r1,r2
  1f: store (X),r1
Thread 2:
  2x1: load r3,(flag1)
  2x2: if (r3==0) goto 2x1  /* wait until X = 1 has been written */
  2a: load r1,(X)
  2b: add r2,r1,r1
  2c: store (X),r2
  2x3: move r3,1
  2x4: store (flag2),r3     /* flag2 = 1: signal Thread 1 */

6 Shared Memory Architectures
Question 3: The following code runs on a multiprocessor system in which Thread 1 and Thread 2 run on different processors. X, Y, W, Z, A and B are shared memory locations (initialized to 0), while r1, r2 and r3 are registers.
(a) If the system employs sequential consistency, write down all possible values for A and B after both threads have executed. Justify your answer.
(b) If the system employs out-of-order execution without speculative execution, write down an instruction execution sequence that stores the values 1, 0 in A and B respectively, after both threads have executed. (Note: the processor can reorder instructions within branch boundaries, provided that data dependencies are maintained.)
(c) If the system employs out-of-order execution with speculative execution, write down an instruction execution sequence that stores the values 0, 0 in A and B respectively, after both threads have executed. (Note: the processor can reorder instructions even across branch boundaries, provided that data dependencies are maintained.)
Thread 1:
  1a: Move r1,1
  1b: Store (X),r1
  1c: Store (Y),r1
  1d: Load r2,(Z)
  1e: If (r2==0) goto 1d
  1f: Load r3,(W)
  1g: Store (A),r3
Thread 2:
  2a: Move r1,1
  2b: Store (W),r1
  2c: Store (Z),r1
  2d: Load r2,(Y)
  2e: If (r2==0) goto 2d
  2f: Load r3,(X)
  2g: Store (B),r3

7 Shared Memory Architectures
Answer 3:
(a) Only A = 1, B = 1. 1f executes only after Thread 1 has seen Z = 1; since 2b (W = 1) precedes 2c (Z = 1) in program order, 1f reads W = 1. Symmetrically, 2f executes only after Thread 2 has seen Y = 1, which 1c writes after 1b (X = 1), so 2f reads X = 1.
(b) A = 1, B = 0 without speculation: 1a1c2a2b2c2d2e2f2g1b1d1e1f1g. Thread 1 reorders the independent stores 1b and 1c, so Thread 2 sees Y = 1 and reads X before 1b writes it. (Symmetrically, 2a2c1a1b1c1d1e1f1g2b2d2e2f2g stores A = 0, B = 1.)
(c) A = 0, B = 0 with speculation: 1f2f1a1b1c2a2b2c2d2e2g1d1e1g. The loads 1f and 2f execute speculatively, before the branches 1e and 2e are resolved, while W and X are still 0; both branches later fall through, so the speculative results are kept.

8 Shared Memory Architectures
Question 4: If the Thread 1 / Thread 2 code of Question 3 runs on a system that employs sequential consistency, specify whether each of the following instruction sequences is valid. Justify your answer.
(a) 2a1a2b1c2c2d2e2f1b2g1d1e1f1g
(b) 1a2a2b2c1b1c2d1d1e2e2f2g1f1g
(c) 1a1b2a2b2c2d2e2f2g1c1d1e1f1g

9 Shared Memory Architectures
Answer 4:
(a) 2a1a2b1c2c2d2e2f1b2g1d1e1f1g is invalid: 1c executes before 1b. Sequential consistency requires each thread's instructions to execute in program order.
(b) 1a2a2b2c1b1c2d1d1e2e2f2g1f1g is valid: both threads execute in program order, and each loop exits on its first test because 2d reads Y after 1c has set it to 1 and 1d reads Z after 2c has set it to 1.
(c) 1a1b2a2b2c2d2e2f2g1c1d1e1f1g is invalid: 2d reads Y = 0 because 1c has not executed yet, so the branch in 2e jumps back to 2d. Thread 2 cannot reach 2f before Thread 1 sets Y = 1 in 1c.

10 Shared Memory Architectures
Question 5: If the Thread 1 / Thread 2 code of Question 3 runs on a system that employs out-of-order execution without speculative execution, specify whether each of the following instruction sequences is valid. Justify your answer.
(a) 1a1b1c2b2a2d2c1d2e2f2g1e1f1g
(b) 1a1c1b1f2a2c2b2d2e2f2g1d1e1g
(c) 1a1c2a2b2c2d2e1b2f2g1d1e1f1g

11 Shared Memory Architectures
Answer 5:
(a) 1a1b1c2b2a2d2c1d2e2f2g1e1f1g is invalid: 2b executes before 2a, but 2b reads register r1, which 2a writes (data dependency on r1).
(b) 1a1c1b1f2a2c2b2d2e2f2g1d1e1g is invalid: 1f executes before the branch 1e. Without speculative execution, instructions cannot be moved across a branch.
(c) 1a1c2a2b2c2d2e1b2f2g1d1e1f1g is valid: the only reordering (1b after 1c) swaps two independent stores within the same branch boundary, no data dependency is violated, and both loops exit on their first test (2d reads Y = 1, 1d reads Z = 1).

12 Shared Memory Architectures
Question 6: If the Thread 1 / Thread 2 code of Question 3 runs on a system that employs out-of-order execution with speculative execution, specify whether each of the following instruction sequences is valid. Justify your answer.
(a) 1a1b1c2b2a2d2c1d2g2e2f1e1f1g
(b) 1a1c1b1g2a2c2b2d2e2f1f1d1e2g
(c) 1f1a1c2a2b2c2d2e1b2f2g1d1e1g

13 Shared Memory Architectures
Answer 6:
(a) 1a1b1c2b2a2d2c1d2g2e2f1e1f1g is invalid: 2g executes before 2f, but 2g stores register r3, which 2f loads (data dependency on r3). In addition, 2b executes before 2a, violating the data dependency on r1.
(b) 1a1c1b1g2a2c2b2d2e2f1f1d1e2g is invalid: 1g executes before 1f, violating the data dependency on r3 between 1f and 1g.
(c) 1f1a1c2a2b2c2d2e1b2f2g1d1e1g is valid: 1f executes speculatively ahead of 1d and the branch 1e, which speculative execution allows, and the independent stores 1b and 1c are swapped. No data dependency is violated, and both branches fall through (1d reads Z = 1, 2d reads Y = 1), so the speculative load is kept.

14 Shared Memory Architectures
Question 6 (new version): If the Thread 1 / Thread 2 code of Question 3 runs on a system that employs out-of-order execution with speculative execution, specify three reasons for which a sequence can be invalid (possible reasons: data dependencies, out-of-order execution, branches (if), speculative execution). (Draft note: the sequences below still need to be reworked.)
(a) 1a1b1c2b2a2d2c1d2g2e2f1e1f1g
(b) 1a1c1b1g2a2c2b2d2e2f1f1d1e2g
(c) 1f1a1c2a2b2c2d2e1b2f2g1d1e1g

15 Shared Memory Architectures
Question 7: If the code given below runs on a system that employs out-of-order execution with speculative execution, specify whether each of the following instruction sequences is valid. Justify your answer. (MembarSS orders the stores before it with the stores after it, MembarLL orders loads with loads, and Membar orders all loads and stores.)
(a) 1a1b1e1c1d1f …
(b) 1a1c1d1b1e1f …
(c) 2b2a2c2d2e2f …
(d) 2a2b2c2e2d2f …
(e) 3a3b3e3c3d3f …
(f) 3a3e3c3b3d3f …
Thread 1:
  1a: Move r1,1
  1b: Store (X),r1
  1c: MembarSS
  1d: Store (Y),r1
  1e: Load r2,(Z)
  1f: If (r2==0) goto 1e
  1g: Load r3,(W)
  1h: Store (A),r3
Thread 2:
  2a: Move r1,1
  2b: Store (W),r1
  2c: MembarLL
  2d: Store (Z),r1
  2e: Load r2,(Y)
  2f: If (r2==0) goto 2e
  2g: Load r3,(X)
  2h: Store (B),r3
Thread 3:
  3a: Move r1,1
  3b: Store (W),r1
  3c: Membar
  3d: Store (Z),r1
  3e: Load r2,(Y)
  3f: If (r2==0) goto 3e
  3g: Load r3,(X)
  3h: Store (B),r3

16 Shared Memory Architectures
Solution 7:
(a) 1a1b1e1c1d1f … Yes. No data dependency is violated, and the store-store barrier 1c only orders stores: 1b still precedes 1d, and the load 1e may move ahead of 1c.
(b) 1a1c1d1b1e1f … No. The store 1d cannot execute before the store 1b because of the MembarSS at 1c.
(c) 2b2a2c2d2e2f … No. 2b reads r1, which 2a writes (data dependency between 2a and 2b).
(d) 2a2b2c2e2d2f … Yes. No data dependency is violated, and the load-load barrier 2c has no load before it to order; the load 2e may bypass the store 2d.
(e) 3a3b3e3c3d3f … Yes. No data dependency is violated, and the only memory operation before the full barrier 3c (the store 3b) still executes before every memory operation after it (3e and 3d).
(f) 3a3e3c3b3d3f … No. The load 3e cannot execute before the store 3b because of the full barrier 3c.

17 Shared Memory Architectures
Question 6: For the OpenMP program shown below, there are at least four reasons that could lead to wrong results when the program runs with 4 threads. Identify each reason and suggest a change in the code that corrects the result.
Sequential code:
  int i, k, m;
  double res = 0.0;
  double val;
  for (i = 0; i < 1000; i++)
      for (k = 0; k < 1000; k++) {
          m = (i + k) % (i + 1);
          res += sin(m);
      }
  val = sqrt((8 * res) / 3);
OpenMP code:
  double resa[4];
  double val;
  int i, k, m;
  #pragma omp parallel num_threads(4)
  {
      int id = omp_get_thread_num();
      if (id == 0)
          for (i = 0; i < 4; i++) resa[i] = 0.0;
      #pragma omp for schedule(static)
      for (i = 0; i < 1000; i++) {
          for (k = 0; k < 1000; k++) {
              m = (i + k) % (i + 1);
              resa[id] += sin(m);
          }
      }
      #pragma omp master
      {
          for (i = 1; i < 4; i++) resa[0] += resa[i];
      }
  }
  val = sqrt((8 * resa[0]) / 3);

18 Shared Memory Architectures
Answer 6: Problem 1
Synchronization error in the initialization of resa[]. Thread 0 initializes the array resa[] to 0, and every thread then assumes that its element is already 0 when it adds new values in the omp for loop. Because there is no barrier at the start of the omp for construct, a thread may use resa[] before thread 0 has initialized it.
This problem can be solved either by inserting a #pragma omp barrier directive before the omp for loop:
  if (id == 0) for (i = 0; i < 4; i++) resa[i] = 0.0;
  #pragma omp barrier
  #pragma omp for schedule(static)
or by letting each thread initialize its own element of resa[]:
  int id = omp_get_thread_num();
  resa[id] = 0.0;
  #pragma omp for schedule(static)

19 Shared Memory Architectures
Answer 6: Problem 2
Data race on variable k. k is a shared variable that every thread uses as its inner-loop index. While thread 0 is incrementing k, another thread may change its value (for example, to 300). This results in a wrong number of loop iterations and in wrong values of k being used to compute m, and therefore in a wrong result.
This problem can be solved by declaring k as a private variable, for example in the loop header:
  for (int k = 0; k < 1000; k++)
Note that there is no problem with variable i, since the loop iteration variable of an omp for loop is private by default.

20 Shared Memory Architectures
Answer 6: Problem 3
Data race on variable m. m is a shared variable used by every thread, so after one thread computes m, another thread can change it before the first thread uses it in sin(m).
This problem can be solved by declaring m as a private variable, for example inside the parallel region:
  #pragma omp parallel num_threads(4)
  {
      int m;
      int id = omp_get_thread_num();
If m had to remain shared, it could be protected from data races with an omp critical section; the section must cover both the computation of m and its use, otherwise another thread can still change m in between. This serializes the inner loop and affects performance significantly:
  for (k = 0; k < 1000; k++) {
      #pragma omp critical
      {
          m = (i + k) % (i + 1);
          resa[id] += sin(m);
      }
  }

21 Shared Memory Architectures
Answer 6: Problem 4
Synchronization error in the calculation of the global sum in resa[0]. The master thread (thread 0) adds resa[1] to resa[3] into resa[0], assuming that the other threads have finished computing their resa[id]. If a thread has not completed its part of the loop, thread 0 adds a partial value.
This problem can be solved by inserting a #pragma omp barrier directive before the omp master directive:
          resa[id] += sin(m);
      }}
  #pragma omp barrier
  #pragma omp master
  {for (i = 1; i < 4; i++) resa[0] += resa[i];}
Note that a #pragma omp for without a nowait clause already ends with an implicit barrier, so in this particular code the race only appears if nowait is added to the loop; the explicit barrier makes the requirement visible and keeps the code correct if the loop is changed.

23 Shared Memory Architectures
Answer 6: Problem 4 (alternative solution)
Synchronization error in the calculation of the global sum in resa[0]: the master thread may add up resa[] before every thread has finished its part of the loop. This problem can be solved by inserting a #pragma omp barrier directive before the omp master directive:
          resa[id] += sin(m);
      }}
  #pragma omp barrier
  #pragma omp master
or by moving the loop that calculates the global resa[0] outside the parallel region, thus relying on the implicit barrier at the end of the parallel region:
          resa[id] += sin(m);
      }}}
  for (i = 1; i < 4; i++) resa[0] += resa[i];
  val = sqrt((8 * resa[0]) / 3);

