Parallel Processing: Shared Memory Multiprocessors Tutorial
Synchronization, Memory Consistency
ACOE401 Shared Memory Architectures

Shared Memory Architectures
Question 1: The following code runs in a multi-processor system, where Thread 1 and Thread 2 run on different processors. X, Y, A, B are shared memory values (all initialized to 0), while r1 and r2 are register values. The system employs sequential consistency.

(a) Write down a sequence of instruction execution that will store in A and B the values 0, 0 respectively, after the execution of both threads.
(b) Write down a sequence of instruction execution that will store in A and B the values 0, 1 respectively, after the execution of both threads.
(c) Write down a sequence of instruction execution that will store in A and B the values 2, 1 respectively, after the execution of both threads.
(d) Determine the values stored in A and B, after the execution of both threads, if the instruction execution sequence is 1a2a1b2b1c2c1d2d.
(e) Justify why the values for A and B can never be 2, 0 respectively, after the execution of both threads.

Thread 1                              Thread 2
1a: move r1,1;     /* r1 = 1 */       2a: load r1,(Y);   /* r1 = Y */
1b: store (X),r1;  /* X = 1 */        2b: store (A),r1;  /* A = Y */
1c: move r2,2;     /* r2 = 2 */       2c: load r2,(X);   /* r2 = X */
1d: store (Y),r2;  /* Y = 2 */        2d: store (B),r2;  /* B = X */

Answer 1:

Thread 1                              Thread 2
1a: move r1,1;     /* r1 = 1 */       2a: load r1,(Y);   /* r1 = Y */
1b: store (X),r1;  /* X = 1 */        2b: store (A),r1;  /* A = Y */
1c: move r2,2;     /* r2 = 2 */       2c: load r2,(X);   /* r2 = X */
1d: store (Y),r2;  /* Y = 2 */        2d: store (B),r2;  /* B = X */

(a) Write down a sequence of instruction execution that will store in A and B the values 0, 0 respectively, after the execution of both threads.
    2a 2b 2c 2d 1a 1b 1c 1d
(b) Write down a sequence of instruction execution that will store in A and B the values 0, 1 respectively, after the execution of both threads.
    1a 1b 2a 2b 2c 2d 1c 1d
    or
    2a 2b 1a 1b 1c 1d 2c 2d
(c) Write down a sequence of instruction execution that will store in A and B the values 2, 1 respectively, after the execution of both threads.
    1a 1b 1c 1d 2a 2b 2c 2d
(d) Determine the values stored in A and B, after the execution of both threads, if the instruction execution sequence is 1a2a1b2b1c2c1d2d.
    The result stored in A and B will be 0, 1: 2a reads Y before 1d writes it, while 2c reads X after 1b has written it.
(e) Justify why the values for A and B can never be 2, 0 respectively, after the execution of both threads.
    Thread 1 sets the value of X first (1b) and the value of Y last (1d), while Thread 2 reads the value of Y first (2a) and then the value of X (2c). If A = 2, then 2a executed after 1d, so 2c also executed after 1b and reads X = 1. Hence, if A = 2 then B cannot be 0.

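The argument in (e) can also be checked by brute force. The following is a minimal C sketch, not part of the original tutorial, that enumerates every interleaving of the two threads that sequential consistency permits (each thread in program order) and records which (A, B) pairs occur. The names `State`, `step`, `explore`, and `reachable` are assumptions made for this illustration.

```c
#include <stdbool.h>
#include <string.h>

/* Shared memory plus the private registers of each thread (Question 1). */
typedef struct {
    int X, Y, A, B;
    int t1r1, t1r2;   /* Thread 1 registers r1, r2 */
    int t2r1, t2r2;   /* Thread 2 registers r1, r2 */
} State;

/* Execute instruction number pc (0..3, i.e. a..d) of thread tid (1 or 2). */
static void step(State *s, int tid, int pc) {
    if (tid == 1) {
        switch (pc) {
        case 0: s->t1r1 = 1;       break;  /* 1a: move r1,1    */
        case 1: s->X = s->t1r1;    break;  /* 1b: store (X),r1 */
        case 2: s->t1r2 = 2;       break;  /* 1c: move r2,2    */
        case 3: s->Y = s->t1r2;    break;  /* 1d: store (Y),r2 */
        }
    } else {
        switch (pc) {
        case 0: s->t2r1 = s->Y;    break;  /* 2a: load r1,(Y)  */
        case 1: s->A = s->t2r1;    break;  /* 2b: store (A),r1 */
        case 2: s->t2r2 = s->X;    break;  /* 2c: load r2,(X)  */
        case 3: s->B = s->t2r2;    break;  /* 2d: store (B),r2 */
        }
    }
}

/* Depth-first walk over all interleavings; each thread keeps program order.
   seen[a][b] is set when some interleaving ends with A == a and B == b. */
static void explore(State s, int pc1, int pc2, bool seen[3][2]) {
    if (pc1 == 4 && pc2 == 4) {
        seen[s.A][s.B] = true;
        return;
    }
    if (pc1 < 4) { State n = s; step(&n, 1, pc1); explore(n, pc1 + 1, pc2, seen); }
    if (pc2 < 4) { State n = s; step(&n, 2, pc2); explore(n, pc1, pc2 + 1, seen); }
}

/* True if the final values A == a, B == b can occur (a in 0..2, b in 0..1). */
bool reachable(int a, int b) {
    bool seen[3][2];
    memset(seen, 0, sizeof seen);
    State init = {0};
    explore(init, 0, 0, seen);
    return seen[a][b];
}
```

Under these assumptions, `reachable(2, 0)` is false while `reachable(0, 0)`, `reachable(0, 1)`, and `reachable(2, 1)` are true, which agrees with answers (a) to (e).
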
Question 2: The following code runs in a multi-processor system, where Thread 1 and Thread 2 run on different processors. X is a shared memory value (initialized to 0), while r1 and r2 are register values. The system employs sequential consistency.

(a) Write down a sequence of instruction execution that will store in X the value 3, after the execution of both threads.
(b) Write down a sequence of instruction execution that will store in X the value 4, after the execution of both threads.
(c) Write down a sequence of instruction execution that will store in X the value 6, after the execution of both threads.
(d) Add to the code the necessary synchronization instructions (flags) that will ensure the instruction sequence 1a1b2a2b2c1c1d1e1f.

Thread 1                  Thread 2
1a: move r1,1             2a: load r1,(X)
1b: store (X),r1          2b: add r2,r1,r1
1c: move r1,2             2c: store (X),r2
1d: load r2,(X)
1e: add r1,r1,r2
1f: store (X),r1

Answer 2:

(a) Write down a sequence of instruction execution that will store in X the value 3, after the execution of both threads.
    2a2b2c1a1b1c1d1e1f
(b) Write down a sequence of instruction execution that will store in X the value 4, after the execution of both threads.
    1a1b2a2b2c1c1d1e1f
(c) Write down a sequence of instruction execution that will store in X the value 6, after the execution of both threads.
    1a1b1c1d1e1f2a2b2c
(d) Add to the code the necessary synchronization instructions (flags) that will ensure the instruction sequence 1a1b2a2b2c1c1d1e1f.
    Use flag1 and flag2 and assume that both are initially 0.

Thread 1                           Thread 2
1a:  move r1,1                     2x1: load r3,(flag1)
1b:  store (X),r1                  2x2: if (r3==0) goto 2x1
1x1: move r3,1                     2a:  load r1,(X)
1x2: store (flag1),r3              2b:  add r2,r1,r1
1x3: load r3,(flag2)               2c:  store (X),r2
1x4: if (r3==0) goto 1x3           2x3: move r3,1
1c:  move r1,2                     2x4: store (flag2),r3
1d:  load r2,(X)
1e:  add r1,r1,r2
1f:  store (X),r1

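The three sequences in (a) to (c) can be replayed by hand, or with a small interpreter. The following C sketch is an illustration only: it replays a label string for the Question 2 program (without the flag instructions) and returns the final value of X. The function name `final_x` and the convention of returning -1 for an unknown label are assumptions, not part of the tutorial.

```c
#include <string.h>

/* Replays a sequence such as "1a1b2a2b2c1c1d1e1f" for the Question 2
   program and returns the final value of X, or -1 on an unknown label. */
int final_x(const char *seq) {
    int X = 0;                  /* shared memory value, initially 0 */
    int t1r1 = 0, t1r2 = 0;     /* Thread 1 registers r1, r2 */
    int t2r1 = 0, t2r2 = 0;     /* Thread 2 registers r1, r2 */
    size_t n = strlen(seq);

    if (n % 2 != 0) return -1;  /* labels are two characters each */

    for (size_t i = 0; i < n; i += 2) {
        char thread = seq[i];
        char op = seq[i + 1];
        if (thread == '1') {
            switch (op) {
            case 'a': t1r1 = 1;            break;  /* move r1,1     */
            case 'b': X = t1r1;            break;  /* store (X),r1  */
            case 'c': t1r1 = 2;            break;  /* move r1,2     */
            case 'd': t1r2 = X;            break;  /* load r2,(X)   */
            case 'e': t1r1 = t1r1 + t1r2;  break;  /* add r1,r1,r2  */
            case 'f': X = t1r1;            break;  /* store (X),r1  */
            default:  return -1;
            }
        } else if (thread == '2') {
            switch (op) {
            case 'a': t2r1 = X;            break;  /* load r1,(X)   */
            case 'b': t2r2 = t2r1 + t2r1;  break;  /* add r2,r1,r1  */
            case 'c': X = t2r2;            break;  /* store (X),r2  */
            default:  return -1;
            }
        } else {
            return -1;
        }
    }
    return X;
}
```

For example, `final_x("1a1b2a2b2c1c1d1e1f")` returns 4: Thread 2 doubles X from 1 to 2, and Thread 1 then adds 2 to it.
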
Question 3: The following code runs in a multi-processor system, where Thread 1 and Thread 2 run on different processors. X, Y, W, Z, A, and B are shared memory values (initialized to 0), while r1, r2, and r3 are register values.

(a) If the system employs sequential consistency, write down all possible values for A and B, after the execution of both threads. Justify your answer.
(b) If the system employs out-of-order execution with no speculative execution, write down a sequence of instruction execution that will store in A and B the values 1, 0, after the execution of both threads. (Note: the processor can reorder instructions within branch boundaries, given that data dependencies are maintained.)
(c) If the system employs out-of-order execution with speculative execution, write down a sequence of instruction execution that will store in A and B the values 0, 0, after the execution of both threads. (Note: the processor can reorder instructions even across branch boundaries, given that data dependencies are maintained.)

Thread 1                       Thread 2
1a: Move r1,1                  2a: Move r1,1
1b: Store (X),r1               2b: Store (W),r1
1c: Store (Y),r1               2c: Store (Z),r1
1d: Load r2,(Z)                2d: Load r2,(Y)
1e: If (r2==0) goto 1d         2e: If (r2==0) goto 2d
1f: Load r3,(W)                2f: Load r3,(X)
1g: Store (A),r3               2g: Store (B),r3

Answer 3:

(a) If the system employs sequential consistency, write down all possible values for A and B, after the execution of both threads. Justify your answer.
    The only possible result is 1, 1. Thread 1 reaches 1f only after it has read Z = 1; Thread 2 stores W (2b) before Z (2c), so W = 1 by then and A = 1. Symmetrically, Thread 2 reaches 2f only after it has read Y = 1; Thread 1 stores X (1b) before Y (1c), so X = 1 and B = 1.
(b) If the system employs out-of-order execution with no speculative execution, write down a sequence of instruction execution that will store in A and B the values 1, 0, after the execution of both threads. (Note: the processor can reorder instructions within branch boundaries, given that data dependencies are maintained.)
    1a1c2a2b2c2d2e2f2g1b1d1e1f1g
(c) If the system employs out-of-order execution with speculative execution, write down a sequence of instruction execution that will store in A and B the values 0, 0, after the execution of both threads. (Note: the processor can reorder instructions even across branch boundaries, given that data dependencies are maintained.)
    1f2f1a1b1c2a2b2c2d2e2g1d1e1g

Thread 1                       Thread 2
1a: Move r1,1                  2a: Move r1,1
1b: Store (X),r1               2b: Store (W),r1
1c: Store (Y),r1               2c: Store (Z),r1
1d: Load r2,(Z)                2d: Load r2,(Y)
1e: If (r2==0) goto 1d         2e: If (r2==0) goto 2d
1f: Load r3,(W)                2f: Load r3,(X)
1g: Store (A),r3               2g: Store (B),r3

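To see how the reordered sequences in (b) and (c) produce results that sequential consistency forbids, it can help to replay them. The C sketch below is an illustration under stated assumptions: it executes the labels of the Question 3 program in exactly the order given and reports the final A and B. It does not check whether the order is legal, and it assumes each branch falls through (which holds for the sequences shown). The names `Result` and `replay` are hypothetical.

```c
#include <stddef.h>

typedef struct { int A, B; } Result;

/* Replays labels such as "1a1c2a..." in the order given and returns A and B.
   Assumption: every branch (1e, 2e) falls through, so 'e' has no effect. */
Result replay(const char *seq) {
    int X = 0, Y = 0, W = 0, Z = 0, A = 0, B = 0;  /* shared memory */
    int r[2][4] = {{0}};      /* r[t][k] is register rk of thread t+1 */

    for (size_t i = 0; seq[i] && seq[i + 1]; i += 2) {
        int t = seq[i] - '1';             /* 0 for Thread 1, 1 for Thread 2 */
        if (t < 0 || t > 1) continue;     /* ignore unknown thread digits */
        switch (seq[i + 1]) {
        case 'a': r[t][1] = 1; break;                               /* Move r1,1 */
        case 'b': if (t == 0) X = r[t][1]; else W = r[t][1]; break; /* Store X/W */
        case 'c': if (t == 0) Y = r[t][1]; else Z = r[t][1]; break; /* Store Y/Z */
        case 'd': r[t][2] = (t == 0) ? Z : Y; break;                /* Load r2   */
        case 'e': break;                                            /* branch    */
        case 'f': r[t][3] = (t == 0) ? W : X; break;                /* Load r3   */
        case 'g': if (t == 0) A = r[t][3]; else B = r[t][3]; break; /* Store A/B */
        default:  break;
        }
    }
    Result res = { A, B };
    return res;
}
```

Replaying the sequence from (b) gives A = 1, B = 0 because 2f reads X before the delayed store 1b; replaying the sequence from (c) gives A = 0, B = 0 because both speculative loads run before any store.
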
Question 4: If the code given below runs on a system that employs sequential consistency, specify whether the following instruction sequences are valid. Justify your answer.

(a) 2a1a2b1c2c2d2e2f1b2g1d1e1f1g
(b) 1a2a2b2c1b1c2d1d1e2e2f2g1f1g
(c) 1a1b2a2b2c2d2e2f2g1c1d1e1f1g

Thread 1                       Thread 2
1a: Move r1,1                  2a: Move r1,1
1b: Store (X),r1               2b: Store (W),r1
1c: Store (Y),r1               2c: Store (Z),r1
1d: Load r2,(Z)                2d: Load r2,(Y)
1e: If (r2==0) goto 1d         2e: If (r2==0) goto 2d
1f: Load r3,(W)                2f: Load r3,(X)
1g: Store (A),r3               2g: Store (B),r3

Answer 4: If the code given below runs on a system that employs sequential consistency, specify whether the following instruction sequences are valid. Justify your answer.

(a) 2a1a2b1c2c2d2e2f1b2g1d1e1f1g
    This instruction sequence is invalid because 1c executes before 1b. Out-of-order execution is not allowed in sequential consistency.
(b) 1a2a2b2c1b1c2d1d1e2e2f2g1f1g
    This instruction sequence is valid because all instructions are executed in program order and conditional branches are not violated.
(c) 1a1b2a2b2c2d2e2f2g1c1d1e1f1g
    This instruction sequence is invalid because 2d, 2e, and 2f are executed before 1c. When 2d executes, Thread 1 has not yet set Y in 1c, so r2 = 0 and the branch in 2e must jump back to 2d; Thread 2 cannot proceed to 2f and 2g.

Thread 1                       Thread 2
1a: Move r1,1                  2a: Move r1,1
1b: Store (X),r1               2b: Store (W),r1
1c: Store (Y),r1               2c: Store (Z),r1
1d: Load r2,(Z)                2d: Load r2,(Y)
1e: If (r2==0) goto 1d         2e: If (r2==0) goto 2d
1f: Load r3,(W)                2f: Load r3,(X)
1g: Store (A),r3               2g: Store (B),r3

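The two rules used in these justifications (program order within each thread, and branches that must follow the value actually loaded) can be written down as a small checker. The following C sketch is only an illustration of that reasoning; the enum `ScVerdict`, the function `check_sc`, and the verdict names are assumptions introduced here. Because program order guarantees that r1 = 1 before any store, the stores are modelled as writing the constant 1, and only Y and Z are tracked since they alone influence the branches.

```c
#include <stddef.h>

typedef enum {
    SC_VALID,          /* sequence respects program order and branches      */
    SC_PROGRAM_ORDER,  /* a thread executed an instruction out of order     */
    SC_BRANCH,         /* a thread fell through a branch that had to loop   */
    SC_MALFORMED       /* unknown thread digit or instruction letter        */
} ScVerdict;

/* Checks a label sequence for the two-thread program of Questions 3 to 6
   against sequential consistency. */
ScVerdict check_sc(const char *seq) {
    int Y = 0, Z = 0;          /* the only locations the branches depend on */
    int r2[2] = {0, 0};        /* register r2 of each thread                */
    int next[2] = {0, 0};      /* next expected instruction, 0='a'..6='g'   */
    int looped[2] = {0, 0};    /* set when the last branch jumped back to d */

    for (size_t i = 0; seq[i] && seq[i + 1]; i += 2) {
        int t = seq[i] - '1';
        int op = seq[i + 1] - 'a';
        if (t < 0 || t > 1 || op < 0 || op > 6) return SC_MALFORMED;

        if (op != next[t])
            return looped[t] ? SC_BRANCH : SC_PROGRAM_ORDER;
        looped[t] = 0;
        next[t] = op + 1;

        switch (op) {
        case 2:                               /* 1c: Y = 1, 2c: Z = 1 */
            if (t == 0) Y = 1; else Z = 1;
            break;
        case 3:                               /* 1d loads Z, 2d loads Y */
            r2[t] = (t == 0) ? Z : Y;
            break;
        case 4:                               /* If (r2==0) goto d */
            if (r2[t] == 0) { next[t] = 3; looped[t] = 1; }
            break;
        default:
            break;
        }
    }
    return SC_VALID;
}
```

With this sketch, sequence (a) is rejected with `SC_PROGRAM_ORDER`, sequence (b) is accepted, and sequence (c) is rejected with `SC_BRANCH`. A sequence that repeats the loop body (for example `2d2e2d2e`) is accepted as long as the repetitions match the values loaded.
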
Question 5: If the code given below runs on a system that employs out-of-order execution with no speculative execution, specify whether the following instruction sequences are valid. Justify your answer.

(a) 1a1b1c2b2a2d2c1d2e2f2g1e1f1g
(b) 1a1c1b1f2a2c2b2d2e2f2g1d1e1g
(c) 1a1c2a2b2c2d2e1b2f2g1d1e1f1g

Thread 1                       Thread 2
1a: Move r1,1                  2a: Move r1,1
1b: Store (X),r1               2b: Store (W),r1
1c: Store (Y),r1               2c: Store (Z),r1
1d: Load r2,(Z)                2d: Load r2,(Y)
1e: If (r2==0) goto 1d         2e: If (r2==0) goto 2d
1f: Load r3,(W)                2f: Load r3,(X)
1g: Store (A),r3               2g: Store (B),r3

Answer 5: If the code given below runs on a system that employs out-of-order execution with no speculative execution, specify whether the following instruction sequences are valid. Justify your answer.

(a) 1a1b1c2b2a2d2c1d2e2f2g1e1f1g
    This instruction sequence is invalid because 2b executes before 2a, although there is a data dependency on r1 between 2a and 2b. Out-of-order execution must preserve data dependencies.
(b) 1a1c1b1f2a2c2b2d2e2f2g1d1e1g
    This instruction sequence is invalid because 1f executes before 1e. Non-speculative execution does not allow out-of-order execution beyond branches.
(c) 1a1c2a2b2c2d2e1b2f2g1d1e1f1g
    This instruction sequence is valid because the out-of-order execution (1c before 1b) violates neither data dependencies nor branch boundaries.

Thread 1                       Thread 2
1a: Move r1,1                  2a: Move r1,1
1b: Store (X),r1               2b: Store (W),r1
1c: Store (Y),r1               2c: Store (Z),r1
1d: Load r2,(Z)                2d: Load r2,(Y)
1e: If (r2==0) goto 1d         2e: If (r2==0) goto 2d
1f: Load r3,(W)                2f: Load r3,(X)
1g: Store (A),r3               2g: Store (B),r3

Question 6: If the code given below runs on a system that employs out-of-order execution with speculative execution, specify whether the following instruction sequences are valid. Justify your answer.

(a) 1a1b1c2b2a2d2c1d2g2e2f1e1f1g
(b) 1a1c1b1g2a2c2b2d2e2f1f1d1e2g
(c) 1f1a1c2a2b2c2d2e1b2f2g1d1e1g

Thread 1                       Thread 2
1a: Move r1,1                  2a: Move r1,1
1b: Store (X),r1               2b: Store (W),r1
1c: Store (Y),r1               2c: Store (Z),r1
1d: Load r2,(Z)                2d: Load r2,(Y)
1e: If (r2==0) goto 1d         2e: If (r2==0) goto 2d
1f: Load r3,(W)                2f: Load r3,(X)
1g: Store (A),r3               2g: Store (B),r3

Answer 6: If the code given below runs on a system that employs out-of-order execution with speculative execution, specify whether the following instruction sequences are valid. Justify your answer.

(a) 1a1b1c2b2a2d2c1d2g2e2f1e1f1g
    This instruction sequence is invalid because 2g executes before 2f, although there is a data dependency on r3 between 2f and 2g. (2b also executes before 2a, which violates the dependency on r1.)
(b) 1a1c1b1g2a2c2b2d2e2f1f1d1e2g
    This instruction sequence is invalid because 1g executes before 1f, although there is a data dependency on r3 between 1f and 1g.
(c) 1f1a1c2a2b2c2d2e1b2f2g1d1e1g
    This instruction sequence is valid: the out-of-order execution does not violate any data dependency, and speculative execution allows 1f to execute ahead of the branch in 1e.

Thread 1                       Thread 2
1a: Move r1,1                  2a: Move r1,1
1b: Store (X),r1               2b: Store (W),r1
1c: Store (Y),r1               2c: Store (Z),r1
1d: Load r2,(Z)                2d: Load r2,(Y)
1e: If (r2==0) goto 1d         2e: If (r2==0) goto 2d
1f: Load r3,(W)                2f: Load r3,(X)
1g: Store (A),r3               2g: Store (B),r3

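The reasoning used for out-of-order execution, with and without speculation, can be condensed into one checker with a flag. The C sketch below is an illustration, not the tutorial's method. It assumes that the only data dependencies are the register ones (1b and 1c need r1 from 1a, the branch needs r2 from the load before it, the final store needs r3 from the load before it), and that "within branch boundaries" means no instruction moves across the branch in either direction. It also rejects a sequence in which the branch would see r2 = 0. The names `OooVerdict` and `check_ooo` are hypothetical, and the checker reports only the first violation it meets.

```c
#include <stdbool.h>
#include <stddef.h>

typedef enum {
    OOO_VALID,
    OOO_DATA_DEP,         /* consumer ran before the producer of its register */
    OOO_BRANCH_BOUNDARY,  /* instruction crossed the branch without speculation */
    OOO_LOOP_NOT_EXITED,  /* branch saw r2 == 0 and would jump back */
    OOO_MALFORMED         /* unknown or repeated label */
} OooVerdict;

/* Checks a label sequence for the two-thread program of Questions 3 to 6.
   speculative == false models Question 5, speculative == true Question 6. */
OooVerdict check_ooo(const char *seq, bool speculative) {
    bool done[2][7] = {{false}};  /* done[t][op]: instruction already executed */
    int Y = 0, Z = 0;             /* locations read by the spin loops */
    int r2[2] = {0, 0};

    for (size_t i = 0; seq[i] && seq[i + 1]; i += 2) {
        int t = seq[i] - '1';
        int op = seq[i + 1] - 'a';            /* 0='a' .. 6='g' */
        if (t < 0 || t > 1 || op < 0 || op > 6 || done[t][op])
            return OOO_MALFORMED;

        /* Register dependencies: b,c need a; e needs d; g needs f. */
        if ((op == 1 || op == 2) && !done[t][0]) return OOO_DATA_DEP;
        if (op == 4 && !done[t][3]) return OOO_DATA_DEP;
        if (op == 6 && !done[t][5]) return OOO_DATA_DEP;

        if (!speculative) {
            /* f and g may not run before the branch e ... */
            if (op > 4 && !done[t][4]) return OOO_BRANCH_BOUNDARY;
            /* ... and (assumption) a, b, c may not be delayed past it. */
            if (op == 4 && !(done[t][0] && done[t][1] && done[t][2]))
                return OOO_BRANCH_BOUNDARY;
        }

        if (op == 2) { if (t == 0) Y = 1; else Z = 1; }  /* 1c / 2c */
        if (op == 3) r2[t] = (t == 0) ? Z : Y;           /* 1d / 2d */
        if (op == 4 && r2[t] == 0) return OOO_LOOP_NOT_EXITED;

        done[t][op] = true;
    }
    return OOO_VALID;
}
```

Under these assumptions, the sequence `1f1a1c2a2b2c2d2e1b2f2g1d1e1g` is accepted with `speculative` set to true and rejected with `OOO_BRANCH_BOUNDARY` when it is false, which is the difference between Questions 5 and 6.
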
Question 6 (new): If the code given below runs on a system that employs out-of-order execution with speculative execution, specify three reasons for which each sequence is invalid (possible reasons: data dependencies, out-of-order execution, branch conditions, speculative execution). Note: the sequences below still need to be reworked for this version of the question.

(a) 1a1b1c2b2a2d2c1d2g2e2f1e1f1g
(b) 1a1c1b1g2a2c2b2d2e2f1f1d1e2g
(c) 1f1a1c2a2b2c2d2e1b2f2g1d1e1g

Thread 1                       Thread 2
1a: Move r1,1                  2a: Move r1,1
1b: Store (X),r1               2b: Store (W),r1
1c: Store (Y),r1               2c: Store (Z),r1
1d: Load r2,(Z)                2d: Load r2,(Y)
1e: If (r2==0) goto 1d         2e: If (r2==0) goto 2d
1f: Load r3,(W)                2f: Load r3,(X)
1g: Store (A),r3               2g: Store (B),r3

Question 7: If the code given below runs on a system that employs out-of-order execution with speculative execution, specify whether the following instruction sequences are valid. Justify your answer.

(a) 1a1b1e1c1d1f ...
(b) 1a1c1d1b1e1f ...
(c) 2b2a2c2d2e2f ...
(d) 2a2b2c2e2d2f ...
(e) 3a3b3e3c3d3f ...
(f) 3a3e3c3b3d3f ...

Thread 1                     Thread 2                     Thread 3
1a: Move r1,1                2a: Move r1,1                3a: Move r1,1
1b: Store (X),r1             2b: Store (W),r1             3b: Store (W),r1
1c: MembarSS                 2c: MembarLL                 3c: Membar
1d: Store (Y),r1             2d: Store (Z),r1             3d: Store (Z),r1
1e: Load r2,(Z)              2e: Load r2,(Y)              3e: Load r2,(Y)
1f: If (r2==0) goto 1e       2f: If (r2==0) goto 2e       3f: If (r2==0) goto 3e
1g: Load r3,(W)              2g: Load r3,(X)              3g: Load r3,(X)
1h: Store (A),r3             2h: Store (B),r3             3h: Store (B),r3

Solution 7: If the code given below runs on a system that employs out-of-order execution with speculative execution, specify whether the following instruction sequences are valid. Justify your answer.

(a) 1a1b1e1c1d1f ... Yes. No data dependencies are violated and no store moves across 1c (only the load 1e moves ahead of it).
(b) 1a1c1d1b1e1f ... No. 1d cannot execute before 1b due to 1c.
(c) 2b2a2c2d2e2f ... No. There is a data dependency between 2a and 2b.
(d) 2a2b2c2e2d2f ... Yes. No data dependencies are violated and no load moves across 2c.
(e) 3a3b3e3c3d3f ... Yes. No data dependencies are violated, and 3e and 3d both still follow 3b, so no store or load is reordered across 3c.
(f) 3a3e3c3b3d3f ... No. 3e cannot execute before 3b due to 3c.

Thread 1                     Thread 2                     Thread 3
1a: Move r1,1                2a: Move r1,1                3a: Move r1,1
1b: Store (X),r1             2b: Store (W),r1             3b: Store (W),r1
1c: MembarSS                 2c: MembarLL                 3c: Membar
1d: Store (Y),r1             2d: Store (Z),r1             3d: Store (Z),r1
1e: Load r2,(Z)              2e: Load r2,(Y)              3e: Load r2,(Y)
1f: If (r2==0) goto 1e       2f: If (r2==0) goto 2e       3f: If (r2==0) goto 3e
1g: Load r3,(W)              2g: Load r3,(X)              3g: Load r3,(X)
1h: Store (A),r3             2h: Store (B),r3             3h: Store (B),r3

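The six verdicts above follow from two checks: register data dependencies, and the ordering that each kind of memory barrier imposes between memory operations on either side of it. The C sketch below illustrates one way to model this, under assumptions that are not stated in the tutorial: a barrier is treated purely as an ordering constraint between the memory operations before it and after it in program order (the position of the barrier label itself in the sequence is ignored), MembarSS orders stores against stores, MembarLL orders loads against loads, and Membar orders every memory operation. Only the first six instructions (a to f) are modelled, since the sequences are given only that far. The names `Barrier`, `MbVerdict`, and `check_membar` are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

typedef enum { BAR_SS, BAR_LL, BAR_FULL } Barrier;
typedef enum { MB_VALID, MB_DATA_DEP, MB_BARRIER, MB_MALFORMED } MbVerdict;

/* Checks a six-label prefix such as "1a1b1e1c1d1f" for one thread of
   Question 7. The thread digit is ignored; bar selects the barrier at c. */
MbVerdict check_membar(const char *seq, Barrier bar) {
    /* Kinds of a..f: move, store, membar, store, load, branch. */
    static const char kind[6] = { '-', 'S', 'M', 'S', 'L', '-' };
    int pos[6] = { -1, -1, -1, -1, -1, -1 };  /* execution position of a..f */
    int n = 0;

    for (size_t i = 0; seq[i] && seq[i + 1]; i += 2) {
        int op = seq[i + 1] - 'a';
        if (op < 0 || op > 5 || pos[op] != -1) return MB_MALFORMED;
        pos[op] = n++;
    }
    if (n != 6) return MB_MALFORMED;

    /* Register dependencies: b and d use r1 from a; f uses r2 from e. */
    if (pos[1] < pos[0] || pos[3] < pos[0] || pos[5] < pos[4])
        return MB_DATA_DEP;

    /* p ranges over instructions before the barrier, q over those after it. */
    for (int p = 0; p < 2; p++) {
        for (int q = 3; q < 6; q++) {
            bool both_mem = kind[p] != '-' && kind[q] != '-';
            bool ordered = both_mem &&
                (bar == BAR_FULL ||
                 (bar == BAR_SS && kind[p] == 'S' && kind[q] == 'S') ||
                 (bar == BAR_LL && kind[p] == 'L' && kind[q] == 'L'));
            if (ordered && pos[q] < pos[p]) return MB_BARRIER;
        }
    }
    return MB_VALID;
}
```

With this model, `check_membar("3a3b3e3c3d3f", BAR_FULL)` is accepted because 3b still precedes both 3e and 3d, while `check_membar("3a3e3c3b3d3f", BAR_FULL)` is rejected because the load 3e overtakes the store 3b.
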
Question 6: For the OpenMP program shown below, there are at least four reasons that could lead to wrong results when running the program with 4 threads. Identify each reason and suggest a change in the code that will correct the result.

Sequential Code

  int i, k, m;
  double res = 0.0;
  double val;
  for (i = 0; i < 1000; i++)
    for (k = 0; k < 1000; k++) {
      m = (i + k) % (i + 1);
      res += sin(m);
    }
  val = sqrt((8 * res) / 3);

OpenMP Code

  double resa[4];
  double val;
  int i, k, m;
  #pragma omp parallel num_threads(4)
  {
    int id = omp_get_thread_num();
    if (id == 0)
      for (i = 0; i < 4; i++) resa[i] = 0.0;
    #pragma omp for schedule(static)
    for (i = 0; i < 1000; i++) {
      for (k = 0; k < 1000; k++) {
        m = (i + k) % (i + 1);
        resa[id] += sin(m);
      }
    }
    #pragma omp master
    { for (i = 1; i < 4; i++) resa[0] += resa[i]; }
  }
  val = sqrt((8 * resa[0]) / 3);

Answer 6: Problem 1

  double resa[4];
  double val;
  int i, k, m;
  #pragma omp parallel num_threads(4)
  {
    int id = omp_get_thread_num();
    if (id == 0)
      for (i = 0; i < 4; i++) resa[i] = 0.0;
    #pragma omp for schedule(static)
    for (i = 0; i < 1000; i++) {
      for (k = 0; k < 1000; k++) {
        m = (i + k) % (i + 1);
        resa[id] += sin(m);
      }
    }
    #pragma omp master
    { for (i = 1; i < 4; i++) resa[0] += resa[i]; }
  }
  val = sqrt((8 * resa[0]) / 3);

Synchronization error on the initialization of resa[]. Thread 0 initializes the array resa[] to 0, and each thread then assumes that its element is initially 0 and adds new values to it in the omp for loop. It is possible that a thread uses resa[id] before it has been initialized.

This problem can be solved by:

(a) inserting an omp barrier directive before the omp for loop:

    if (id == 0)
      for (i = 0; i < 4; i++) resa[i] = 0.0;
    #pragma omp barrier
    #pragma omp for schedule(static)

(b) allowing each thread to initialize its own element of the array resa[]:

    int id = omp_get_thread_num();
    resa[id] = 0.0;
    #pragma omp for schedule(static)

Answer 6: Problem 2

  double resa[4];
  double val;
  int i, k, m;
  #pragma omp parallel num_threads(4)
  {
    int id = omp_get_thread_num();
    if (id == 0)
      for (i = 0; i < 4; i++) resa[i] = 0.0;
    #pragma omp for schedule(static)
    for (i = 0; i < 1000; i++) {
      for (k = 0; k < 1000; k++) {
        m = (i + k) % (i + 1);
        resa[id] += sin(m);
      }
    }
    #pragma omp master
    { for (i = 1; i < 4; i++) resa[0] += resa[i]; }
  }
  val = sqrt((8 * resa[0]) / 3);

Data race on variable k. k is a shared variable used by each thread as a loop iteration index. Thus, when thread 0 attempts to increment k, it is possible that its value has already been changed (for example to 300) by thread 1. This results in a wrong number of loop iterations and in a wrong result, because wrong values of k are used to determine m.

This problem can be solved by declaring k as a private variable:

    for (int k = 0; k < 1000; k++) {

Note that there is no problem for variable i, since the loop iteration variable of the omp for loop is private by default.

Answer 6: Problem 3

  double resa[4];
  double val;
  int i, k, m;
  #pragma omp parallel num_threads(4)
  {
    int id = omp_get_thread_num();
    if (id == 0)
      for (i = 0; i < 4; i++) resa[i] = 0.0;
    #pragma omp for schedule(static)
    for (i = 0; i < 1000; i++) {
      for (k = 0; k < 1000; k++) {
        m = (i + k) % (i + 1);
        resa[id] += sin(m);
      }
    }
    #pragma omp master
    { for (i = 1; i < 4; i++) resa[0] += resa[i]; }
  }
  val = sqrt((8 * resa[0]) / 3);

Data race on variable m. m is a shared variable used by each thread; thus, when a thread calculates m, another thread can change it before it is used by the first one.

This problem can be solved by declaring m as a private variable:

    #pragma omp parallel num_threads(4)
    {
      int m;
      int id = omp_get_thread_num();

Note that if it were necessary to keep m as a shared variable, we could protect it from data races using the omp critical directive. The critical section has to cover both the calculation of m and its use, and it will affect the performance significantly:

      for (k = 0; k < 1000; k++) {
        #pragma omp critical
        { m = (i + k) % (i + 1); resa[id] += sin(m); }

Answer 6: Problem 4

  double resa[4];
  double val;
  int i, k, m;
  #pragma omp parallel num_threads(4)
  {
    int id = omp_get_thread_num();
    if (id == 0)
      for (i = 0; i < 4; i++) resa[i] = 0.0;
    #pragma omp for schedule(static)
    for (i = 0; i < 1000; i++) {
      for (k = 0; k < 1000; k++) {
        m = (i + k) % (i + 1);
        resa[id] += sin(m);
      }
    }
    #pragma omp master
    { for (i = 1; i < 4; i++) resa[0] += resa[i]; }
  }
  val = sqrt((8 * resa[0]) / 3);

Synchronization error on the calculation of the global resa[0]. The master thread (thread 0) accumulates the partial sums into resa[0], assuming that the other threads have finished calculating their resa[id]. It is possible that a thread has not completed before thread 0 calculates the global sum.

This problem can be solved by inserting an omp barrier directive before the omp master directive, or by moving the for loop that calculates the global resa[0] outside the parallel region, thus using the implicit barrier at the end of the parallel region.

Note: in standard OpenMP the omp for construct itself ends with an implicit barrier unless the nowait clause is specified, so this hazard appears in practice only when that barrier is removed. Either change above keeps the code correct in both cases.

Answer 6: Problem 4 (continued)

  double resa[4];
  double val;
  int i, k, m;
  #pragma omp parallel num_threads(4)
  {
    int id = omp_get_thread_num();
    if (id == 0)
      for (i = 0; i < 4; i++) resa[i] = 0.0;
    #pragma omp for schedule(static)
    for (i = 0; i < 1000; i++) {
      for (k = 0; k < 1000; k++) {
        m = (i + k) % (i + 1);
        resa[id] += sin(m);
      }
    }
    #pragma omp master
    { for (i = 1; i < 4; i++) resa[0] += resa[i]; }
  }
  val = sqrt((8 * resa[0]) / 3);

Synchronization error on the calculation of the global resa[0]. The master thread adds resa[1] to resa[3] into resa[0], so every thread must have finished its part of the omp for loop before the master block runs.

This problem can be solved by:

(a) inserting an omp barrier directive before the omp master directive:

        resa[id] += sin(m);
      }
    }
    #pragma omp barrier
    #pragma omp master

(b) moving the for loop that calculates the global resa[0] outside the parallel region, thus using the implicit barrier of the parallel region:

        resa[id] += sin(m);
      }
    }
  }
  for (i = 1; i < 4; i++) resa[0] += resa[i];
  val = sqrt((8 * resa[0]) / 3);

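Putting the four corrections together, the program might look like the following C sketch. It is one possible combination of the fixes discussed above, not the only one: the initialization uses a barrier, k and m are declared inside the loop so that they are private, and the final summation is moved after the parallel region. To keep the sketch self-contained and checkable without the math library, the summed term is passed in as a function pointer; the parameter `term` and the names `sum_sequential`, `sum_parallel`, and `identity_term` are assumptions introduced for this illustration. The `#ifdef _OPENMP` guards let the same file compile, and run sequentially, when OpenMP is not enabled.

```c
#ifdef _OPENMP
#include <omp.h>
#endif

/* Example term used for checking; integer-valued so sums are exact. */
double identity_term(int m) { return (double)m; }

/* Sequential reference: sums term((i+k) % (i+1)) over the 1000 x 1000 grid. */
double sum_sequential(double (*term)(int)) {
    double res = 0.0;
    for (int i = 0; i < 1000; i++)
        for (int k = 0; k < 1000; k++)
            res += term((i + k) % (i + 1));
    return res;
}

/* Parallel version with the four corrections applied. */
double sum_parallel(double (*term)(int)) {
    double resa[4];                       /* one partial sum per thread */

    #pragma omp parallel num_threads(4)
    {
        int id = 0;
#ifdef _OPENMP
        id = omp_get_thread_num();
#endif
        if (id == 0) {
            for (int j = 0; j < 4; j++) resa[j] = 0.0;
        }
        /* Problem 1: nobody accumulates before the array is initialized. */
        #pragma omp barrier

        #pragma omp for schedule(static)
        for (int i = 0; i < 1000; i++) {
            for (int k = 0; k < 1000; k++) {   /* Problem 2: k is private */
                int m = (i + k) % (i + 1);     /* Problem 3: m is private */
                resa[id] += term(m);
            }
        }
    }   /* implicit barrier at the end of the parallel region */

    /* Problem 4: the partial sums are combined after all threads finished. */
    for (int j = 1; j < 4; j++) resa[0] += resa[j];
    return resa[0];
}
```

To reproduce the computation on the slides, one would pass a small wrapper around `sin` from `<math.h>` as `term` and then evaluate `sqrt((8 * res) / 3)` on the returned sum. With a floating-point term the parallel and sequential sums may differ in the last digits, because the additions are performed in a different order.
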