Published by Gavin Gilbert. Modified over 9 years ago.
1
Shared Memory Consistency Models: A Tutorial
Sarita V. Adve, Kourosh Gharachorloo
Western Research Laboratory, September 1995
2
Goals
Expand intuition about concurrent program behavior
Explore execution sequences due to compiler or hardware optimizations
Introduce shared memory consistency models
Explore execution sequences due to a particular memory model
Demonstrate memory barriers (“fences”)
3
What happens?
Example of mutual exclusion (“Dekker’s Algorithm”)
Global variables initially: Flag1 = 0, Flag2 = 0
P1: Flag1 = 1; if (Flag2 == 0) critical section
P2: Flag2 = 1; if (Flag1 == 0) critical section
4
Uniprocessor Hardware
P1: Flag1 = 1; if (Flag2 == 0) critical section
P2: Flag2 = 1; if (Flag1 == 0) critical section
T0: memory holds Flag1 == 0 and Flag2 == 0
T1: P1 writes Flag1 = 1
T2: P1 reads Flag2 == 0
T3: P2 writes Flag2 = 1
T4: P2 reads Flag1 == 1
Memory after T4: Flag1 == 1, Flag2 == 1
9
Uniprocessor Hardware Optimizations
Buffer (Cache)
Writes take about 100 cycles
Reads take about 1 cycle
Use Write Buffer Bypass
10
Uniprocessor Hardware Optimizations: Write Buffer Bypass
P1: Flag1 = 1; if (Flag2 == 0) critical section
P2: Flag2 = 1; if (Flag1 == 0) critical section
T0: memory holds Flag1 == 0 and Flag2 == 0
T1: P1 writes Flag1 = 1 (held in the write buffer)
T2: P1 reads Flag2 == 0
T3: P2 writes Flag2 = 1 (held in the write buffer)
T4: P2 reads Flag1 == 1
Write buffer: Flag1 = 1, Flag2 = 1
17
Multiprocessor Hardware Optimizations: Write Buffer Bypass
P1 and P2 each have their own write buffer and reach memory over a shared bus.
P1: Flag1 = 1; if (Flag2 == 0) critical section
P2: Flag2 = 1; if (Flag1 == 0) critical section
T0: memory holds Flag1 == 0 and Flag2 == 0
T1: P1 writes Flag1 = 1 (held in P1's write buffer)
T2: P1 reads Flag2 == 0
T3: P2 writes Flag2 = 1 (held in P2's write buffer)
T4: P2 reads Flag1 == 0
Both reads return 0, so both processors enter the critical section.
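The timeline above can be replayed with a toy model. This is only a sketch under stated assumptions: one private FIFO write buffer per processor, reads that bypass pending writes to other addresses, and hypothetical helper names (write, read, drain) that do not come from the slides.

```python
# Toy model (an assumption, not real hardware): each processor owns a private
# write buffer; a read looks in its own buffer first, then in shared memory.
memory = {"Flag1": 0, "Flag2": 0}
buffers = {"P1": [], "P2": []}
entered = []

def write(proc, var, value):
    """Queue the write; other processors cannot see it yet."""
    buffers[proc].append((var, value))

def read(proc, var):
    """Return the processor's own pending value if any, else shared memory."""
    for pending_var, pending_value in reversed(buffers[proc]):
        if pending_var == var:
            return pending_value
    return memory[var]

def drain(proc):
    """Flush the processor's pending writes to shared memory."""
    for var, value in buffers[proc]:
        memory[var] = value
    buffers[proc].clear()

write("P1", "Flag1", 1)           # T1
if read("P1", "Flag2") == 0:      # T2: bypasses the buffered Flag1 write
    entered.append("P1")
write("P2", "Flag2", 1)           # T3
if read("P2", "Flag1") == 0:      # T4: P1's write is still buffered
    entered.append("P2")
drain("P1")
drain("P2")

print(entered)  # ['P1', 'P2']
```

Both processors end up in the critical section even though memory eventually holds Flag1 == 1 and Flag2 == 1.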
24
Producer Consumer
Example of a Producer and Consumer
Global variables initially: Data = 0, Head = 0
P1: Data = 2; Head = 1
P2: while (Head == 0); print Data;
25
General Interconnect Multiprocessor Hardware Optimizations: Overlapped Writes
P1: Data = 2; Head = 1
P2: while (Head == 0); print Data
T0: memory holds Data == 0, Head == 0; P1 has issued Data = 2 and Head = 1
T1: the general interconnect (GI) delivers the write Head = 1
T2: P2 reads Head == 1
T3: P2 reads Data == 0
T4: GI delivers the write Data = 2
Memory after T4: Data == 2, Head == 1
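A small replay of the overlapped-writes timeline, as a sketch only. It assumes the interconnect may deliver P1's two writes in either order; the in_flight list and the deliver helper are hypothetical names chosen for illustration.

```python
# Toy model (an assumption): writes issued by P1 sit "in flight" until the
# interconnect delivers them, and delivery order need not match issue order.
memory = {"Data": 0, "Head": 0}
in_flight = [("Data", 2), ("Head", 1)]   # issued by P1 in program order

def deliver(var):
    """Let the in-flight write to `var` reach memory."""
    for i, (pending_var, value) in enumerate(in_flight):
        if pending_var == var:
            memory[pending_var] = value
            del in_flight[i]
            return

deliver("Head")                  # T1: Head = 1 arrives first
loop_exits = memory["Head"] != 0 # T2: P2 sees Head == 1 and leaves the loop
printed = memory["Data"]         # T3: P2 reads Data before it has arrived
deliver("Data")                  # T4: Data = 2 arrives too late

print(printed)  # 0
```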
30
What was expected?
Example of a Producer and Consumer
Global variables initially: Data = 0, Head = 0
P1: Data = 2; Head = 1
P2: while (Head == 0); print Data;
31
Simplify Example and the Operations
Simple Program
Global variables initially: A = 0, B = 0
P1: A = 1 (WX); B = 2 (WY)
P2: print A (RX); print B (RY)
32
Reason about possible sequences: Expected Output
P1: A = 1 (WX); B = 2 (WY)
P2: print A (RX); print B (RY)
Sequence: values printed for A and B
WX WY RX RY: 1 2
WX RX WY RY: 1 2
WX RX RY WY: 1 0
RX WX RY WY: 0 0
RX WX WY RY: 0 2
RX RY WX WY: 0 0
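The expected outputs can be checked by brute force. The sketch below assumes the slide's naming (WX and WY are P1's writes to A and B, RX and RY are P2's reads) and keeps only the interleavings that respect each processor's program order; the helper name output is hypothetical.

```python
from itertools import permutations

OPS = ("WX", "WY", "RX", "RY")

def output(seq):
    """Run one interleaving against memory and return what P2 prints."""
    mem = {"A": 0, "B": 0}
    printed = []
    for op in seq:
        if op == "WX":
            mem["A"] = 1
        elif op == "WY":
            mem["B"] = 2
        elif op == "RX":
            printed.append(mem["A"])
        else:  # RY
            printed.append(mem["B"])
    return tuple(printed)

# Keep program order on both processors: WX before WY, RX before RY.
sequences = [s for s in permutations(OPS)
             if s.index("WX") < s.index("WY") and s.index("RX") < s.index("RY")]
results = {" ".join(s): output(s) for s in sequences}

for name, value in results.items():
    print(name, value)
```

Six interleavings survive, and they produce four distinct outputs: (1, 2), (1, 0), (0, 2) and (0, 0).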
33
Reason about possible sequences. Do we get them all?
P1: A = 1 (WX); B = 2 (WY)
P2: print A (RX); print B (RY)
WX WY RX RY, WX RX WY RY, WX RX RY WY, RX WX RY WY, RX WX WY RY, RX RY WX WY
34
Similar Reasoning
Example of a Producer and Consumer
Global variables initially: Data = 0, Head = 0
P1: Data = 2 (WX); Head = 1 (WY)
P2: while (Head == 0); (RY) ... = Data; (RX)
35
Reason about possible sequences. Expected Outcomes
P1: Data = 2 (WX); Head = 1 (WY)
P2: while (Head == 0); (RY) print Data; (RX)
WX WY RY RX, WX RY WY RX, WX RY RX WY, RY WX RX WY, RY WX WY RX, RY RX WX WY
Printed values: 2, 0
37
General Interconnect Multiprocessor Hardware Optimizations: Overlapped Writes
P1: Data = 2 (WX); Head = 1 (WY)
P2: while (Head == 0); (RY) print Data (RX)
T0: memory holds Data == 0, Head == 0; P1 has issued Data = 2 and Head = 1
T1: the general interconnect (GI) delivers the write Head = 1
T2: P2 reads Head == 1
T3: P2 reads Data == 0
T4: GI delivers the write Data = 2
Resulting sequence: WY RY RX WX. P2 prints 0.
42
Compiler Optimizations
Constant Propagation
Register Allocation
Loop Transformation
Instruction Scheduling
Common Subexpression Elimination
Et cetera
43
More H/W Optimizations
Speculative Execution
Execution reordering (e.g. pipelining)
Speculative Store
Read to Write reordering
Write to Read reordering
Write to Write reordering
Read to Read reordering
Et cetera
44
Possible Outcomes
P1: Data = 2 (WX); Head = 1 (WY)
P2: while (Head == 0); (RY) print Data; (RX)
Sequences that print 0: WY RY RX WX, WY RX RY WX, WY RX WX RY, RX WX RY WY, RX WY RY WX, RX WY WX RY
Sequences considered earlier (printed value shown: 2): WX WY RY RX, WX RY WY RX, WX RY RX WY, RY WX RX WY, RY WX WY RX, RY RX WX WY
45
What’s missing?
P1: A = 1 (WX); B = 2 (WY)
P2: print A (RX); print B (RY)
WX WY RX RY, WX RX WY RY, WX RX RY WY, RX WX RY WY, RX WX WY RY, RX RY WX WY
46
Simple Program: All possible sequences
P1: A = 1 (WX); B = 2 (WY)
P2: print A (RX); print B (RY)
WX WY RX RY, WX WY RY RX, WX RX WY RY, WX RX RY WY, WX RY RX WY, WX RY WY RX
WY WX RX RY, WY WX RY RX, WY RX WX RY, WY RY WX RX, WY RX RY WX, WY RY RX WX
RX WX RY WY, RX WX WY RY, RX WY RY WX, RX WY WX RY, RX RY WX WY, RX RY WY WX
RY WX WY RX, RY WX RX WY, RY WY WX RX, RY WY RX WX, RY RX WX WY, RY RX WY WX
47
Dekker’s Algorithm: Simplify the Operations
Example of mutual exclusion (“Dekker’s Algorithm”)
Global variables initially: Flag1 = 0, Flag2 = 0
P1: Flag1 = 1 (WX); if (Flag2 == 0) (RY) critical section
P2: Flag2 = 1 (WY); if (Flag1 == 0) (RX) critical section
48
Dekker’s Algorithm: All possible sequences
Example of a synchronization (“Dekker’s Algorithm”)
WX WY RX RY, WX WY RY RX, WX RX WY RY, WX RX RY WY, WX RY RX WY, WX RY WY RX
WY WX RX RY, WY WX RY RX, WY RX WX RY, WY RY WX RX, WY RX RY WX, WY RY RX WX
RX WX RY WY, RX WX WY RY, RX WY RY WX, RX WY WX RY, RX RY WX WY, RX RY WY WX
RY WX WY RX, RY WX RX WY, RY WY WX RX, RY WY RX WX, RY RX WX WY, RY RX WY WX
Which of these sequences will prevent concurrent execution?
49
Dekker’s Algorithm: Sequences and Outcomes
P1: Flag1 = 1 (WX); if (Flag2 == 0) (RY) critical section
P2: Flag2 = 1 (WY); if (Flag1 == 0) (RX) critical section
Need to restrict certain sequences.
Works whenever WX precedes RX or WY precedes RY.
OK: WX WY RX RY, WX WY RY RX, WX RX WY RY, WX RX RY WY, WX RY RX WY, WX RY WY RX
OK: WY WX RX RY, WY WX RY RX, WY RX WX RY, WY RY WX RX, WY RX RY WX, WY RY RX WX
OK: RX WX WY RY, RX WY RY WX, RX WY WX RY, RY WX WY RX, RY WX RX WY, RY WY WX RX
Wrong: RX WX RY WY, RX RY WX WY, RX RY WY WX, RY WY RX WX, RY RX WX WY, RY RX WY WX
18 are OK, 6 are Wrong
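The 18/6 split follows directly from the stated rule and can be confirmed by enumeration. A minimal sketch, assuming the slide's operation names; the predicate name mutual_exclusion_holds is hypothetical.

```python
from itertools import permutations

def mutual_exclusion_holds(seq):
    """Slide rule: OK whenever WX precedes RX or WY precedes RY."""
    return (seq.index("WX") < seq.index("RX")
            or seq.index("WY") < seq.index("RY"))

all_seqs = list(permutations(("WX", "WY", "RX", "RY")))
ok = [s for s in all_seqs if mutual_exclusion_holds(s)]
wrong = [s for s in all_seqs if not mutual_exclusion_holds(s)]

print(len(ok), len(wrong))  # 18 6
for s in wrong:
    print(" ".join(s))
```

The six Wrong sequences are exactly those in which each processor reads the other's flag before that flag is written.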
53
Simple Program: All possible sequences
P1: A = 1 (WX); B = 2 (WY)
P2: print A (RX); print B (RY)
WX WY RX RY, WX WY RY RX, WX RX WY RY, WX RX RY WY, WX RY RX WY, WX RY WY RX
WY WX RX RY, WY WX RY RX, WY RX WX RY, WY RY WX RX, WY RX RY WX, WY RY RX WX
RX WX RY WY, RX WX WY RY, RX WY RY WX, RX WY WX RY, RX RY WX WY, RX RY WY WX
RY WX WY RX, RY WX RX WY, RY WY WX RX, RY WY RX WX, RY RX WX WY, RY RX WY WX
No ordering requirement
All 24 are “OK”, 0 are “Wrong”
55
Producer Consumer: All sequences
P1: Data = 2 (WX); Head = 1 (WY)
P2: while (Head == 0); (RY) print Data; (RX)
WX WY RX RY, WX WY RY RX, WX RX WY RY, WX RX RY WY, WX RY RX WY, WX RY WY RX
WY WX RX RY, WY WX RY RX, WY RX WX RY, WY RY WX RX, WY RX RY WX, WY RY RX WX
RX WX RY WY, RX WX WY RY, RX WY RY WX, RX WY WX RY, RX RY WX WY, RX RY WY WX
RY WX WY RX, RY WX RX WY, RY WY WX RX, RY WY RX WX, RY RX WX WY, RY RX WY WX
Require WX precede RX and WY precede RY and WY precede RX
5 are OK, 19 are Wrong
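The three ordering requirements can be applied mechanically to all 24 sequences. A sketch, assuming the slide's operation names; before and meets_requirements are hypothetical helper names.

```python
from itertools import permutations

def before(seq, a, b):
    """True if operation a appears earlier than operation b in seq."""
    return seq.index(a) < seq.index(b)

def meets_requirements(seq):
    # The three requirements stated on the slide.
    return (before(seq, "WX", "RX")
            and before(seq, "WY", "RY")
            and before(seq, "WY", "RX"))

all_seqs = list(permutations(("WX", "WY", "RX", "RY")))
ok = [s for s in all_seqs if meets_requirements(s)]

print(len(ok), len(all_seqs) - len(ok))  # 5 19
for s in ok:
    print(" ".join(s))
```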
57
Producer Consumer: All sequences
P1: Data = 2 (WX); Head = 1 (WY)
P2: while (Head == 0); (RY) print Data; (RX)
When RY precedes WY, the while-RY loop spins. Eventually we get WY < RY.
We have RY, RY, RY … Sequences with RY < WY will eventually end with RY.
We can remove the earlier RY in those sequences.
Remove all of the duplicated sequences:
WX WY RX RY, WX WY RY RX, WX RX WY RY, WY WX RX RY, WY WX RY RX, WY RX WX RY
WY RY WX RX, WY RX RY WX, WY RY RX WX, RX WX WY RY, RX WY RY WX, RX WY WX RY
5 are OK, 7 are Wrong
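The reduction from 24 sequences to 12 can be reproduced step by step. This sketch assumes the simplification used on the slides: if RY runs before WY, the loop keeps re-reading, so only a final RY placed at the end of the sequence counts. The helper name settle_spin is hypothetical.

```python
from itertools import permutations

OPS = ("WX", "WY", "RX", "RY")

def before(seq, a, b):
    return seq.index(a) < seq.index(b)

def settle_spin(seq):
    """If RY precedes WY, drop the early RY and append the final RY at the end."""
    if before(seq, "RY", "WY"):
        rest = tuple(op for op in seq if op != "RY")
        return rest + ("RY",)
    return seq

# A set removes the duplicates created by the rewrite.
unique = sorted({settle_spin(s) for s in permutations(OPS)})
ok = [s for s in unique if before(s, "WX", "RX") and before(s, "WY", "RX")]

print(len(unique), len(ok), len(unique) - len(ok))  # 12 5 7
```

Every surviving sequence has WY before RY, which is why that requirement no longer needs to be checked separately.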
64
Producer Consumer: Possible sequences w/write acknowledge
P1: Data = 2 (WX); Head = 1 (WY)
P2: while (Head == 0); (RY) print Data; (RX)
Some H/W provides write acknowledgment (i.e. wait for pending writes to complete).
Starting from the 12 possible sequences (5 are OK, 7 are Wrong), remove all sequences where WY < WX:
WX WY RX RY, WX WY RY RX, WX RX WY RY, RX WX WY RY
2 are OK, 2 are Wrong
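The effect of write acknowledgment on the sequence count can be sketched as one more filter. The code assumes the 12 sequences are exactly those with WY before RY, and models write acknowledgment as "WX completes before WY"; the names are hypothetical.

```python
from itertools import permutations

OPS = ("WX", "WY", "RX", "RY")

def before(seq, a, b):
    return seq.index(a) < seq.index(b)

# The 12 sequences left once the spinning loop is accounted for.
settled = [s for s in permutations(OPS) if before(s, "WY", "RY")]
# Write acknowledgment: P1 waits for Data = 2 to complete before Head = 1.
acked = [s for s in settled if before(s, "WX", "WY")]
ok = [s for s in acked if before(s, "WX", "RX") and before(s, "WY", "RX")]
wrong = [s for s in acked if s not in ok]

print(len(acked), len(ok), len(wrong))  # 4 2 2
```

The two Wrong sequences both read Data (RX) before the loop has observed Head, so ordering the writes alone is not enough; the reads on P2 must be ordered too.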
67
Review. What does the H/W provide?
Reordering of loads and stores: doesn’t help
Write acknowledge: almost helps
Memory Models
68
Sequential Consistency
Definition: “[A multiprocessor system is sequentially consistent if] the result of any execution is the same as if the operations of all the processors were executed in some sequential order, and the operations of each individual processor appear in this sequence in the order specified by its program.” [Lamport 1979]
Pros: Simple view of program; OK for Uniprocessor environments
Cons: Not OK for Multiprocessor environments; Too restrictive for processor performance
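The program-order half of the definition can be written as a small check over a candidate sequence. This is a simplified sketch: it only tests that each processor's operations keep their program order inside one global sequence, and the function name respects_program_order is hypothetical. The Dekker operations are used as the example input.

```python
from itertools import permutations

def respects_program_order(seq, programs):
    """True if every processor's operations appear in seq in program order."""
    for program in programs:
        positions = [seq.index(op) for op in program]
        if positions != sorted(positions):
            return False
    return True

# Dekker example: P1 issues WX then RY, P2 issues WY then RX.
dekker = [("WX", "RY"), ("WY", "RX")]
sc_sequences = [s for s in permutations(("WX", "WY", "RX", "RY"))
                if respects_program_order(s, dekker)]

print(len(sc_sequences))  # 6
for s in sc_sequences:
    print(" ".join(s))
```

Only 6 of the 24 orderings are sequentially consistent for this program, which illustrates how much reordering freedom the definition takes away from the hardware.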
69
Memory Models: Relaxed Consistency
Description: Relaxed memory consistency models are already implemented on the multiprocessors available. They specify which memory operations the hardware may reorder.
Relaxations:
Write to Read
Write to Write
Read to Read / Write
Read Others’ Write Early
Read Own Write Early
They all have methods to force a particular ordering; these are known as the Safety Net.
70
Available Relaxed Memory Models
Relaxation columns: W->R Order, W->W Order, R->RW Order, Read Others’ Write Early, Read Own Write Early
Model: Safety Net
IBM 370: serialization instructions
TSO: RMW
PC: RMW
PSO: RMW, STBAR
WO: synchronization
RCsc: release, acquire, nsync, RMW
RCpc: release, acquire, nsync, RMW
Alpha: MB, WMB
RMO: various MEMBARs
PowerPC: SYNC
71
Producer Consumer: Relaxed W->R memory model
WX WY RX RY, WX WY RY RX, WX RX WY RY, WX RX RY WY, WX RY RX WY, WX RY WY RX
WY WX RX RY, WY WX RY RX, WY RX WX RY, WY RY WX RX, WY RX RY WX, WY RY RX WX
RX WX RY WY, RX WX WY RY, RX WY RY WX, RX WY WX RY, RX RY WX WY, RX RY WY WX
RY WX WY RX, RY WX RX WY, RY WY WX RX, RY WY RX WX, RY RX WX WY, RY RX WY WX
Which of these sequences can be expected with all the memory models listed?
72
Producer Consumer with sequential consistency
P1: Data = 2 (WX); Head = 1 (WY)
P2: while (Head == 0); (RY) print Data; (RX)
Require WX precede RX and WY precede RY and WY precede RX
Possible sequences:
WX WY RX RY, WX WY RY RX, WX RX WY RY, WY WX RX RY, WY WX RY RX, WY RX WX RY
WY RY WX RX, WY RX RY WX, WY RY RX WX, RX WX WY RY, RX WY RY WX, RX WY WX RY
5 are OK, 7 are Wrong
Start with Sequential Consistency:
WX WY RY RX
1 is OK, 0 are Wrong
76
Producer Consumer: Relaxed W->R, and W->W ordering sequences
P1: Data = 2 (WX); Head = 1 (WY)
P2: while (Head == 0); (RY) print Data; (RX)
Add sequences due to the relaxation of W->R ordering: no change.
WX WY RY RX
1 is OK, 0 are Wrong
Most processors have relaxed W->W orderings also.
Started with sequential consistency, then added relaxed W->R and W->W orderings:
WX WY RY RX, WY WX RY RX, WY RY WX RX, WY RY RX WX
3 are OK, 1 is Wrong
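The four sequences and the 3/1 split can be derived by dropping only the W->W constraint. A sketch under stated assumptions: P2's read order (RY before RX) is still kept, the loop exits only after WY, and WX is free to land anywhere; the names are hypothetical.

```python
from itertools import permutations

OPS = ("WX", "WY", "RX", "RY")

def before(seq, a, b):
    return seq.index(a) < seq.index(b)

# Kept: RY before RX on P2, and the loop exit (WY before RY).
# Dropped: W->W program order on P1, so WX is unconstrained.
relaxed = [s for s in permutations(OPS)
           if before(s, "RY", "RX") and before(s, "WY", "RY")]
ok = [s for s in relaxed if before(s, "WX", "RX")]
wrong = [s for s in relaxed if not before(s, "WX", "RX")]

print(len(ok), len(wrong))  # 3 1
print(" ".join(wrong[0]))   # WY RY RX WX
```

The single Wrong sequence is the overlapped-writes case: Head becomes visible, the consumer reads Data, and only then does Data = 2 arrive.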
80
Dekker’s Algorithm: Relaxed W->R ordering sequences
P1: Flag1 = 1 (WX); if (Flag2 == 0) (RY) critical section
P2: Flag2 = 1 (WY); if (Flag1 == 0) (RX) critical section
Which of these sequences can be expected with all the memory models listed?
Works whenever WX precedes RX or WY precedes RY.
Start with Sequential Consistency:
WX WY RX RY, WX WY RY RX, WX RY WY RX, WY WX RX RY, WY WX RY RX, WY RX WX RY
6 are OK, 0 are Wrong
Add sequences due to relaxed memory model:
WX WY RX RY, WX WY RY RX, WX RX WY RY, WX RX RY WY, WX RY RX WY, WX RY WY RX
WY WX RX RY, WY WX RY RX, WY RX WX RY, WY RY WX RX, WY RX RY WX, WY RY RX WX
RX WX RY WY, RX WX WY RY, RX WY RY WX, RX WY WX RY, RX RY WX WY, RX RY WY WX
RY WX WY RX, RY WX RX WY, RY WY WX RX, RY WY RX WX, RY RX WX WY, RY RX WY WX
18 are OK, 6 are Wrong
86
Safety Nets
Atomic instruction (RMW)
Code delineation (serialization instructions)
Synchronization instructions (SYNC)
Identify Data and Synch operations (Weak Ordering model, and Release Consistency model)
Memory Bars (aka “fences”)
87
Producer Consumer w/Fence
Insert a memory barrier between the instructions we want ordered.
Global variables initially: Data = 0, Head = 0
P1: Data = 2 (WX); Head = 1 (WY)
P2: while (Head == 0); (RY) ... = Data; (RX)
88
Producer Consumer w/Fence
Example of a Producer and Consumer with a Memory Barrier applied.
Global variables initially: Data = 0, Head = 0
P1: Data = 2 (WX); memory_barrier; Head = 1 (WY)
P2: while (Head == 0); (RY) memory_barrier; ... = Data; (RX)
All memory operations before the memory barrier must complete before proceeding to memory operations after the memory barrier.
89
Producer Consumer w/Fence: Relaxed W->R, and W->W ordering sequences
P1: Data = 2 (WX); memory_barrier; Head = 1 (WY)
P2: while (Head == 0); (RY) memory_barrier; print Data; (RX)
Started with sequential consistency, then added relaxed W->R and W->W orderings:
WX WY RY RX, WY WX RY RX, WY RY WX RX, WY RY RX WX
3 are OK, 1 is Wrong
Add memory barriers to force WX < WY and RY < RX.
Due to MB, RY < RX is enforced.
WY < RY < MB: the while-RY loop waits for WY.
Due to MB, WX < WY is enforced.
WX < WY and WY < RY and RY < RX are enforced, therefore WX < RX is enforced.
With WX < WY and RY < RX enforced with memory barriers, RX < WX is not possible.
WX WY RY RX, WY WX RY RX, WY RY WX RX
3 are OK, 0 are Wrong
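The barrier argument can be checked by treating each memory barrier as an ordering constraint. This is a sketch of the reasoning, not of real fence instructions: the barrier on P1 is modeled as "WX before WY" and the one on P2 as "RY before RX". Under this strict reading only one sequence survives, whereas the slide keeps three; in both readings none is Wrong. The names are hypothetical.

```python
from itertools import permutations

OPS = ("WX", "WY", "RX", "RY")

def before(seq, a, b):
    return seq.index(a) < seq.index(b)

# Relaxed W->R and W->W: only RY before RX and the loop exit remain.
relaxed = [s for s in permutations(OPS)
           if before(s, "RY", "RX") and before(s, "WY", "RY")]
# P1's barrier restores WX before WY; P2's barrier keeps RY before RX.
fenced = [s for s in relaxed
          if before(s, "WX", "WY") and before(s, "RY", "RX")]
wrong = [s for s in fenced if not before(s, "WX", "RX")]

print(len(relaxed), len(fenced), len(wrong))  # 4 1 0
```

The chain WX < WY < RY < RX is what forces WX < RX, so the consumer can no longer read Data before it is written.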
99
Dekker’s Algorithm w/Fence
Example of mutual exclusion (“Dekker’s Algorithm”)
Global variables initially: Flag1 = 0, Flag2 = 0
P1: Flag1 = 1 (WX); memory_barrier; if (Flag2 == 0) (RY) critical section
P2: Flag2 = 1 (WY); memory_barrier; if (Flag1 == 0) (RX) critical section
All memory operations before the memory barrier must complete before proceeding to memory operations after the memory barrier.
100
Dekker’s Algorithm w/Fence: Relaxed W->R ordering sequences
Started with sequential consistency, then added relaxed W->R orderings:
WX WY RX RY, WX WY RY RX, WX RX WY RY, WX RX RY WY, WX RY RX WY, WX RY WY RX
WY WX RX RY, WY WX RY RX, WY RX WX RY, WY RY WX RX, WY RX RY WX, WY RY RX WX
RX WX RY WY, RX WX WY RY, RX WY RY WX, RX WY WX RY, RX RY WX WY, RX RY WY WX
RY WX WY RX, RY WX RX WY, RY WY WX RX, RY WY RX WX, RY RX WX WY, RY RX WY WX
18 are OK, 6 are Wrong
Add memory barriers to force WX < RY and WY < RX:
P1: Flag1 = 1 (WX); memory_barrier; if (Flag2 == 0) (RY) critical section
P2: Flag2 = 1 (WY); memory_barrier; if (Flag1 == 0) (RX) critical section
WX WY RX RY, WX WY RY RX, WX RY WY RX, WY WX RX RY, WY WX RY RX, WY RX WX RY
6 are OK, 0 are Wrong
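The before-and-after counts for Dekker's algorithm can be reproduced the same way. A sketch, assuming that relaxing W->R leaves all 24 orderings possible and that each barrier restores one write-before-read constraint; the helper names are hypothetical.

```python
from itertools import permutations

OPS = ("WX", "WY", "RX", "RY")

def before(seq, a, b):
    return seq.index(a) < seq.index(b)

def is_wrong(seq):
    """Both processors read the other's flag before it is written."""
    return before(seq, "RX", "WX") and before(seq, "RY", "WY")

# Relaxed W->R: neither processor's write-then-read order is guaranteed.
relaxed = list(permutations(OPS))
# Barriers restore WX before RY on P1 and WY before RX on P2.
fenced = [s for s in relaxed
          if before(s, "WX", "RY") and before(s, "WY", "RX")]

print(len(relaxed), sum(is_wrong(s) for s in relaxed))  # 24 6
print(len(fenced), sum(is_wrong(s) for s in fenced))    # 6 0
```

With both barriers in place a Wrong sequence would need RX < WX < RY < WY < RX, which is a cycle, so none can occur.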
103
Serialization of Writes (Fig 6) w/Fence
Insert a memory barrier between the instructions we want ordered.
Global variables initially: A = 0, B = 0, C = 0
P1: A = 1 (WX); B = 1 (WY)
P2: A = 2 (WX); C = 1 (WZ)
P3: while (B != 1); (RY) while (C != 1); (RZ) Register1 = A
P4: while (B != 1); (RY) while (C != 1); (RZ) Register2 = A
W1 W2
104
Higher Level Abstractions
Lower level of complexity
Explicit Parallel Constructs: Fortran 90, MPI
105
Conclusion
The Uniprocessor programming model is simple, but does not work on Multiprocessors.
Hardware and compilers make many optimizations that reorder loads and stores.
Memory models exist on the hardware and need to be considered for program correctness.
The Sequential Consistency model was considered for concurrent programs on the Uniprocessor.
Relaxed Memory Consistency models are considered on the Multiprocessor because SC is too restrictive for hardware performance.
Use memory barriers (fences) to override the relaxed memory model when ordering between memory operations must be maintained.
106
Other Processors