Shared Memory Consistency Models: A Tutorial. Sarita V. Adve and Kourosh Gharachorloo, Western Research Laboratory, September 1995.


Goals:
Expand intuition about concurrent program behavior
Explore execution sequences due to compiler or hardware optimizations
Introduce shared memory consistency models
Explore execution sequences due to a particular memory model
Demonstrate memory barriers (“fences”)

What happens?
Example of mutual exclusion (“Dekker’s Algorithm”). Global variables initially: Flag1 = 0, Flag2 = 0.
P1: Flag1 = 1; if (Flag2 == 0) critical section
P2: Flag2 = 1; if (Flag1 == 0) critical section

Uniprocessor Hardware
P1: Flag1 = 1; if (Flag2 == 0) critical section
P2: Flag2 = 1; if (Flag1 == 0) critical section
T0: Flag1 = 0 and Flag2 = 0
T1: P1 writes Flag1 = 1 (memory: Flag1 == 1, Flag2 == 0)
T2: P1 reads Flag2 == 0
T3: P2 writes Flag2 = 1 (memory: Flag1 == 1, Flag2 == 1)
T4: P2 reads Flag1 == 1

Uniprocessor Hardware Optimizations
Buffer (cache): writes take about 100 cycles, reads take about 1 cycle.
Use write buffer bypass.

Uniprocessor Hardware Optimizations: Write Buffer Bypass
P1: Flag1 = 1; if (Flag2 == 0) critical section
P2: Flag2 = 1; if (Flag1 == 0) critical section
Write buffer: Flag1 = 1, Flag2 = 1. Memory: Flag1 == 0, Flag2 == 0.
T0: Flag1 = 0 and Flag2 = 0
T1: P1 writes Flag1 = 1
T2: P1 reads Flag2 == 0
T3: P2 writes Flag2 = 1
T4: P2 reads Flag1 == 1

Multiprocessor Hardware Optimizations: Write Buffer Bypass
P1: Flag1 = 1; if (Flag2 == 0) critical section
P2: Flag2 = 1; if (Flag1 == 0) critical section
Write buffers: Flag1 = 1 and Flag2 = 1. Memory on the shared bus: Flag1 == 0, Flag2 == 0.
T0: Flag1 = 0 and Flag2 = 0
T1: P1 writes Flag1 = 1
T2: P1 reads Flag2 == 0
T3: P2 writes Flag2 = 1
T4: P2 reads Flag1 == 0
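
The T0 to T4 traces above can be replayed with a toy model. The following Python sketch is only an illustration: the `Processor` class, its methods, and `run_dekker` are hypothetical names, and the model assumes a FIFO write buffer whose contents are visible only to the processor that owns it. With one shared buffer (the uniprocessor case) P2 reads Flag1 == 1; with one buffer per processor P2 reads Flag1 == 0 and both sides enter the critical section.

```python
class Processor:
    """Toy processor with a private FIFO write buffer (hypothetical model)."""

    def __init__(self, memory):
        self.memory = memory        # shared dict standing in for main memory
        self.write_buffer = []      # pending (name, value) writes, oldest first

    def write(self, name, value):
        self.write_buffer.append((name, value))   # buffered, not yet in memory

    def read(self, name):
        # A read sees this processor's own buffered writes (newest first) ...
        for buffered_name, value in reversed(self.write_buffer):
            if buffered_name == name:
                return value
        # ... otherwise it bypasses the buffer and goes straight to memory.
        return self.memory[name]


def run_dekker(shared_buffer):
    memory = {"Flag1": 0, "Flag2": 0}                   # T0
    p1 = Processor(memory)
    p2 = p1 if shared_buffer else Processor(memory)     # one buffer or two
    p1.write("Flag1", 1)                                # T1
    p1_enters = p1.read("Flag2") == 0                   # T2
    p2.write("Flag2", 1)                                # T3
    p2_enters = p2.read("Flag1") == 0                   # T4
    return p1_enters, p2_enters


print(run_dekker(shared_buffer=True))    # prints "(True, False)"
print(run_dekker(shared_buffer=False))   # prints "(True, True)"
```

The buffers are never drained in this sketch, which is the worst case for the T0 to T4 interleaving shown on the slides.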

Producer Consumer
Example of a producer and consumer. Global variables initially: Data = 0, Head = 0.
P1: Data = 2; Head = 1
P2: while (Head == 0); print Data;

General Interconnect Multiprocessor Hardware Optimizations: Overlapped Writes
P1: Data = 2; Head = 1
P2: while (Head == 0); print Data
T0: Data = 0, Head = 0. P1 issues Data = 2 and Head = 1.
T1: Write Head = 1 reaches memory through the general interconnect (GI) (memory: Data == 0, Head == 1)
T2: P2 reads Head == 1
T3: P2 reads Data == 0
T4: Write Data = 2 reaches memory through the GI (memory: Data == 2, Head == 1)

What was expected?
Example of a producer and consumer. Global variables initially: Data = 0, Head = 0.
P1: Data = 2; Head = 1
P2: while (Head == 0); print Data;

Simplify Example and the Operations
Simple program. Global variables initially: A = 0, B = 0.
P1: A = 1 (WX); B = 2 (WY)
P2: print A (RX); print B (RY)

Reason about possible sequences: expected output
P1: A = 1 (WX); B = 2 (WY)
P2: print A (RX); print B (RY)
WX WY RX RY | WX RX WY RY | WX RX RY WY | RX WX RY WY | RX WX WY RY | RX RY WX WY
Do we get them all?

Similar Reasoning
Example of a producer and consumer. Global variables initially: Data = 0, Head = 0.
P1: Data = 2 (WX); Head = 1 (WY)
P2: while (Head == 0) (RY); ... = Data (RX)

Reason about possible sequences: expected outcomes
P1: Data = 2 (WX); Head = 1 (WY)
P2: while (Head == 0) (RY); print Data (RX)
WX WY RY RX | WX RY WY RX | WX RY RX WY | RY WX RX WY | RY WX WY RX | RY RX WX WY
Outputs shown: 2, 0

General Interconnect Multiprocessor Hardware Optimizations: Overlapped Writes
P1: Data = 2 (WX); Head = 1 (WY)
P2: while (Head == 0) (RY); print Data (RX)
T0: Data = 0, Head = 0
T1: Write Head = 1 reaches memory through the general interconnect (GI): WY
T2: P2 reads Head == 1: RY
T3: P2 reads Data == 0: RX
T4: Write Data = 2 reaches memory through the GI: WX
Resulting sequence: WY RY RX WX. P2 prints 0.

Compiler Optimizations:
Constant propagation
Register allocation
Loop transformation
Instruction scheduling
Common subexpression elimination
Et cetera

More H/W Optimizations:
Speculative execution
Execution reordering (e.g. pipelining)
Speculative store
Read to Write reordering
Write to Read reordering
Write to Write reordering
Read to Read reordering
Et cetera

Possible Outcomes
P1: Data = 2 (WX); Head = 1 (WY)
P2: while (Head == 0) (RY); print Data (RX)
WY RY RX WX | WY RX RY WX | WY RX WX RY | RX WX RY WY | RX WY RY WX | RX WY WX RY
WX WY RY RX | WX RY WY RX | WX RY RX WY | RY WX RX WY | RY WX WY RX | RY RX WX WY
Output shown: 2

What’s missing?
P1: A = 1 (WX); B = 2 (WY)
P2: print A (RX); print B (RY)
WX WY RX RY | WX RX WY RY | WX RX RY WY | RX WX RY WY | RX WX WY RY | RX RY WX WY

Simple Program: All possible sequences
P1: A = 1 (WX); B = 2 (WY)
P2: print A (RX); print B (RY)
WX WY RX RY | WX WY RY RX | WX RX WY RY | WX RX RY WY | WX RY RX WY | WX RY WY RX
WY WX RX RY | WY WX RY RX | WY RX WX RY | WY RY WX RX | WY RX RY WX | WY RY RX WX
RX WX RY WY | RX WX WY RY | RX WY RY WX | RX WY WX RY | RX RY WX WY | RX RY WY WX
RY WX WY RX | RY WX RX WY | RY WY WX RX | RY WY RX WX | RY RX WX WY | RY RX WY WX
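
The enumeration on this slide can be reproduced mechanically. A minimal Python sketch, assuming the four labels stand for A = 1 (WX), B = 2 (WY), print A (RX) and print B (RY); the helper name `respects` is made up for illustration:

```python
from itertools import permutations

OPS = ("WX", "WY", "RX", "RY")  # assumed labels for the four memory operations

def respects(seq, *pairs):
    """Return True if every (earlier, later) pair appears in that order in seq."""
    return all(seq.index(a) < seq.index(b) for a, b in pairs)

# Every ordering of the four operations, ignoring program order entirely.
all_sequences = list(permutations(OPS))

# Orderings that keep each processor's own program order
# (WX before WY on P1, RX before RY on P2).
in_program_order = [s for s in all_sequences
                    if respects(s, ("WX", "WY"), ("RX", "RY"))]

print(len(all_sequences), len(in_program_order))  # prints "24 6"
```

The six program-order sequences are the ones the earlier "expected output" slide listed; the other eighteen only appear once the compiler or hardware reorders operations.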

Dekker’s Algorithm: Simplify the Operations
Example of mutual exclusion (“Dekker’s Algorithm”). Global variables initially: Flag1 = 0, Flag2 = 0.
P1: Flag1 = 1 (WX); if (Flag2 == 0) (RY) critical section
P2: Flag2 = 1 (WY); if (Flag1 == 0) (RX) critical section

Dekker’s Algorithm: Sequences and Outcomes
Example of a synchronization (“Dekker’s Algorithm”).
P1: Flag1 = 1 (WX); if (Flag2 == 0) (RY) critical section
P2: Flag2 = 1 (WY); if (Flag1 == 0) (RX) critical section
All possible sequences. Which of these sequences will prevent concurrent execution?
WX WY RX RY | WX WY RY RX | WX RX WY RY | WX RX RY WY | WX RY RX WY | WX RY WY RX
WY WX RX RY | WY WX RY RX | WY RX WX RY | WY RY WX RX | WY RX RY WX | WY RY RX WX
RX WX RY WY | RX WX WY RY | RX WY RY WX | RX WY WX RY | RX RY WX WY | RX RY WY WX
RY WX WY RX | RY WX RX WY | RY WY WX RX | RY WY RX WX | RY RX WX WY | RY RX WY WX
Need to restrict certain sequences. Works whenever WX precedes RX or WY precedes RY.
18 are OK, 6 are Wrong.
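
The 18/6 split follows directly from the stated rule. A short Python sketch, assuming that P1 enters when RY comes before WY (it reads Flag2 == 0) and P2 enters when RX comes before WX; `both_enter` is a hypothetical helper name:

```python
from itertools import permutations

def both_enter(seq):
    """Mutual exclusion fails only if each read precedes the other side's write."""
    return (seq.index("RY") < seq.index("WY")
            and seq.index("RX") < seq.index("WX"))

sequences = list(permutations(("WX", "RY", "WY", "RX")))
wrong = [s for s in sequences if both_enter(s)]
ok = [s for s in sequences if not both_enter(s)]

print(len(ok), len(wrong))  # prints "18 6"
```

Negating `both_enter` gives exactly the slide's condition: WX precedes RX or WY precedes RY.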

Simple Program: All possible sequences
P1: A = 1 (WX); B = 2 (WY)
P2: print A (RX); print B (RY)
WX WY RX RY | WX WY RY RX | WX RX WY RY | WX RX RY WY | WX RY RX WY | WX RY WY RX
WY WX RX RY | WY WX RY RX | WY RX WX RY | WY RY WX RX | WY RX RY WX | WY RY RX WX
RX WX RY WY | RX WX WY RY | RX WY RY WX | RX WY WX RY | RX RY WX WY | RX RY WY WX
RY WX WY RX | RY WX RX WY | RY WY WX RX | RY WY RX WX | RY RX WX WY | RY RX WY WX
No ordering requirement. All 24 are “OK”, 0 are “Wrong”.

Producer Consumer: All sequences
P1: Data = 2 (WX); Head = 1 (WY)
P2: while (Head == 0) (RY); print Data (RX)
WX WY RX RY | WX WY RY RX | WX RX WY RY | WX RX RY WY | WX RY RX WY | WX RY WY RX
WY WX RX RY | WY WX RY RX | WY RX WX RY | WY RY WX RX | WY RX RY WX | WY RY RX WX
RX WX RY WY | RX WX WY RY | RX WY RY WX | RX WY WX RY | RX RY WX WY | RX RY WY WX
RY WX WY RX | RY WX RX WY | RY WY WX RX | RY WY RX WX | RY RX WX WY | RY RX WY WX
Require WX precede RX and WY precede RY and WY precede RX.
5 are OK, 19 are Wrong.
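
The three ordering requirements can be checked against all 24 sequences. A minimal Python sketch; the names `REQUIRED` and `is_ok` are illustrative only:

```python
from itertools import permutations

# The three orderings the slide requires, as (earlier, later) pairs.
REQUIRED = (("WX", "RX"), ("WY", "RY"), ("WY", "RX"))

def is_ok(seq):
    return all(seq.index(a) < seq.index(b) for a, b in REQUIRED)

sequences = list(permutations(("WX", "WY", "RY", "RX")))
ok = [" ".join(s) for s in sequences if is_ok(s)]

print(len(ok), len(sequences) - len(ok))  # prints "5 19"
```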

Producer Consumer: All sequences (accounting for the spin loop)
P1: Data = 2 (WX); Head = 1 (WY)
P2: while (Head == 0) (RY); print Data (RX)
When RY precedes WY, the while loop spins and we have RY, RY, RY, ... Eventually we get WY < RY, so every sequence with RY < WY will eventually end with another RY.
We can remove the earlier RY in those sequences, then remove all of the duplicated sequences. 12 sequences remain:
WX WY RX RY | WX WY RY RX | WX RX WY RY | WY WX RX RY | WY WX RY RX | WY RX WX RY
WY RY WX RX | WY RX RY WX | WY RY RX WX | RX WX WY RY | RX WY RY WX | RX WY WX RY
5 are OK, 7 are Wrong.

Producer Consumer: Possible sequences w/write acknowledge
P1: Data = 2 (WX); Head = 1 (WY)
P2: while (Head == 0) (RY); print Data (RX)
Some H/W provides write acknowledgment (i.e. wait for pending writes to complete). Starting from the 12 possible sequences (5 OK, 7 Wrong), remove all sequences where WY < WX. 4 sequences remain:
WX WY RX RY | WX WY RY RX | WX RX WY RY | RX WX WY RY
2 are OK, 2 are Wrong.
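
Both filtering steps can be sketched in a few lines of Python. This assumes the spin loop is modeled by keeping only sequences in which the final read of Head follows its write (WY before RY), and that write acknowledgment forces WX before WY; the helper names are hypothetical:

```python
from itertools import permutations

def before(seq, a, b):
    return seq.index(a) < seq.index(b)

sequences = list(permutations(("WX", "WY", "RY", "RX")))

# Spin loop: only the final, successful read of Head counts, so WY precedes RY.
after_spin = [s for s in sequences if before(s, "WY", "RY")]

# Write acknowledgment: P1 waits for Data = 2 to complete, so WX precedes WY.
with_ack = [s for s in after_spin if before(s, "WX", "WY")]

def is_ok(seq):
    # WY before RY already holds, so only the two Data orderings remain.
    return before(seq, "WX", "RX") and before(seq, "WY", "RX")

for name, group in (("spin loop only", after_spin),
                    ("with write acknowledge", with_ack)):
    ok = sum(is_ok(s) for s in group)
    print(name, len(group), ok, len(group) - ok)
# prints "spin loop only 12 5 7"
# prints "with write acknowledge 4 2 2"
```

The two sequences that stay wrong both place RX before WY, which is why write acknowledgment alone only "almost helps": it orders the producer but not the consumer's reads.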

Review. What does the H/W provide?
Reordering of loads and stores – doesn’t help
Write acknowledge – almost helps
Memory models

Sequential Consistency
Definition: [A multiprocessor system is sequentially consistent if] the result of any execution is the same as if the operations of all the processors were executed in some sequential order, and the operations of each individual processor appear in this sequence in the order specified by its program. [Lamport 1979]
Pros: Simple view of program. OK for uniprocessor environments.
Cons: Not OK for multiprocessor environments. Too restrictive for processor performance.
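
As a rough illustration of the definition, the Python sketch below treats an execution as a single total order of operations and accepts it only if each processor's operations appear in program order. It applies this to the Dekker example (P1: WX then RY, P2: WY then RX). The function name and the dictionary layout are assumptions made for this sketch, not part of the tutorial.

```python
from itertools import permutations

def sequentially_consistent(seq, program):
    """True if seq keeps every processor's operations in program order."""
    for ops in program.values():
        positions = [seq.index(op) for op in ops]
        if positions != sorted(positions):
            return False
    return True

dekker = {"P1": ["WX", "RY"], "P2": ["WY", "RX"]}
all_ops = [op for ops in dekker.values() for op in ops]

sc = [s for s in permutations(all_ops) if sequentially_consistent(s, dekker)]

# Mutual exclusion breaks only when both reads precede the other side's write.
wrong = [s for s in sc
         if s.index("RY") < s.index("WY") and s.index("RX") < s.index("WX")]

print(len(sc), len(wrong))  # prints "6 0"
```

Under sequential consistency a failing sequence would need WX < RY < WY < RX < WX, which is a cycle, so none of the six survivors is wrong.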

Memory Models: Relaxed Consistency
Description: Relaxed memory consistency models are already implemented on the multiprocessors available. They specify which memory operations may be reordered by the hardware.
Relaxations: Write to Read; Write to Write; Read to Read / Write; Read Others’ Write Early; Read Own Write Early.
They all have methods to force a particular ordering, and these are known as the Safety Net.

Available Relaxed Memory Models
Columns: Relaxation (model), W → R Order, W → W Order, R → RW Order, Read Others’ Write Early, Read Own Write Early, Safety Net.
Safety net for each model:
IBM 370: serialization instructions
TSO: RMW
PC: RMW
PSO: RMW, STBAR
WO: synchronization
RCsc: release, acquire, nsync, RMW
RCpc: release, acquire, nsync, RMW
Alpha: MB, WMB
RMO: various MEMBARs
PowerPC: SYNC

Producer Consumer: Relaxed W->R memory model
WX WY RX RY | WX WY RY RX | WX RX WY RY | WX RX RY WY | WX RY RX WY | WX RY WY RX
WY WX RX RY | WY WX RY RX | WY RX WX RY | WY RY WX RX | WY RX RY WX | WY RY RX WX
RX WX RY WY | RX WX WY RY | RX WY RY WX | RX WY WX RY | RX RY WX WY | RX RY WY WX
RY WX WY RX | RY WX RX WY | RY WY WX RX | RY WY RX WX | RY RX WX WY | RY RX WY WX
Which of these sequences can be expected with all the memory models listed?

Producer Consumer: Possible sequences
P1: Data = 2 (WX); Head = 1 (WY)
P2: while (Head == 0) (RY); print Data (RX)
Require WX precede RX and WY precede RY and WY precede RX. Of all 24 sequences, these 12 are possible:
WX WY RX RY | WX WY RY RX | WX RX WY RY | WY WX RX RY | WY WX RY RX | WY RX WX RY
WY RY WX RX | WY RX RY WX | WY RY RX WX | RX WX WY RY | RX WY RY WX | RX WY WX RY
5 are OK, 7 are Wrong.

Producer Consumer with sequential consistency
P1: Data = 2 (WX); Head = 1 (WY)
P2: while (Head == 0) (RY); print Data (RX)
Start with sequential consistency. Of the 12 possible sequences (5 OK, 7 Wrong), only one remains:
WX WY RY RX
1 is OK, 0 are Wrong.

Producer Consumer: Relaxed W->R ordering sequences
P1: Data = 2 (WX); Head = 1 (WY)
P2: while (Head == 0) (RY); print Data (RX)
Add sequences due to the relaxation of W->R ordering. No change, because neither processor issues a write followed by a read:
WX WY RY RX
1 is OK, 0 are Wrong.

Producer Consumer: Relaxed W->R and W->W ordering sequences
P1: Data = 2 (WX); Head = 1 (WY)
P2: while (Head == 0) (RY); print Data (RX)
Most processors have relaxed W->W orderings also. Started with sequential consistency, then added relaxed W->R and W->W orderings:
WX WY RY RX | WY WX RY RX | WY RY WX RX | WY RY RX WX
3 are OK, 1 is Wrong.
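
The progression from sequential consistency to relaxed W->R and W->W ordering can be sketched as dropping program-order constraints by kind. In the Python sketch below, the two-letter kinds ("WR", "WW"), the function names, and the extra WY-before-RY constraint that models the spin loop exiting are all assumptions made for illustration.

```python
from itertools import permutations

PROGRAM = {"P1": ["WX", "WY"], "P2": ["RY", "RX"]}  # producer / consumer

def kept_orders(program, relaxed):
    """Program-order pairs the hardware still enforces.

    relaxed holds two-letter kinds such as "WR" (a write followed by a read).
    """
    kept = []
    for ops in program.values():
        for i, first in enumerate(ops):
            for later in ops[i + 1:]:
                if first[0] + later[0] not in relaxed:
                    kept.append((first, later))
    return kept

def allowed(program, relaxed):
    # The spin loop only exits after Head = 1 is visible, so WY precedes RY.
    orders = kept_orders(program, relaxed) + [("WY", "RY")]
    ops = [op for seq in program.values() for op in seq]
    return [s for s in permutations(ops)
            if all(s.index(a) < s.index(b) for a, b in orders)]

for relaxed in (set(), {"WR"}, {"WR", "WW"}):
    seqs = allowed(PROGRAM, relaxed)
    ok = [s for s in seqs if s.index("WX") < s.index("RX")]
    print(sorted(relaxed), len(ok), len(seqs) - len(ok))
# prints "[] 1 0"
# prints "['WR'] 1 0"
# prints "['WR', 'WW'] 3 1"
```

The single wrong sequence is WY RY RX WX, the same order produced by the overlapped-writes trace earlier in the deck.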

Dekker’s Algorithm: Relaxed W->R memory model
WX WY RX RY | WX WY RY RX | WX RX WY RY | WX RX RY WY | WX RY RX WY | WX RY WY RX
WY WX RX RY | WY WX RY RX | WY RX WX RY | WY RY WX RX | WY RX RY WX | WY RY RX WX
RX WX RY WY | RX WX WY RY | RX WY RY WX | RX WY WX RY | RX RY WX WY | RX RY WY WX
RY WX WY RX | RY WX RX WY | RY WY WX RX | RY WY RX WX | RY RX WX WY | RY RX WY WX
Which of these sequences can be expected with all the memory models listed?

Dekker’s Algorithm: Relaxed W->R ordering sequences
P1: Flag1 = 1 (WX); if (Flag2 == 0) (RY) critical section
P2: Flag2 = 1 (WY); if (Flag1 == 0) (RX) critical section
Works whenever WX precedes RX or WY precedes RY. Of all 24 sequences, 18 are OK and 6 are Wrong.
Start with sequential consistency. 6 sequences remain:
WX WY RX RY | WX WY RY RX | WX RY WY RX | WY WX RX RY | WY WX RY RX | WY RX WX RY
6 are OK, 0 are Wrong.

Dekker’s Algorithm Relaxed W->R ordering sequences
P1: Flag1 = 1 [WX]; If(Flag2 == 0) [RY] Critical section
P2: Flag2 = 1 [WY]; If(Flag1 == 0) [RX] Critical section
Sequences: WX WY RX RY, WX WY RY RX, WX RY WY RX, WY WX RX RY, WY WX RY RX, WY RX WX RY
Add sequences due to relaxed memory model.
6 are OK, 0 are Wrong.

Dekker’s Algorithm Relaxed W->R ordering sequences
P1: Flag1 = 1 [WX]; If(Flag2 == 0) [RY] Critical section
P2: Flag2 = 1 [WY]; If(Flag1 == 0) [RX] Critical section
Sequences:
WX WY RX RY, WX WY RY RX, WX RX WY RY, WX RX RY WY, WX RY RX WY, WX RY WY RX
WY WX RX RY, WY WX RY RX, WY RX WX RY, WY RY WX RX, WY RX RY WX, WY RY RX WX
RX WX RY WY, RX WX WY RY, RX WY RY WX, RX WY WX RY, RX RY WX WY, RX RY WY WX
RY WX WY RX, RY WX RX WY, RY WY WX RX, RY WY RX WX, RY RX WX WY, RY RX WY WX
Add sequences due to relaxed memory model.
18 are OK, 6 are Wrong.

Safety Nets
- Atomic instruction (RMW)
- Code delineation (serialization instructions)
- Synchronization instructions (SYNC)
- Identify Data and Synch operations (Weak Ordering model, and Release Consistency model)
- Memory Barriers (aka “fences”)
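As an illustration of the first safety net, here is a minimal C++ sketch of a spinlock built on an atomic read-modify-write (std::atomic_flag::test_and_set). It is only one way an RMW instruction can be used to order accesses, not the mechanism of any particular processor. The function name count_with_spinlock and its parameter are hypothetical.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Two threads increment a plain counter; a test-and-set spinlock (an atomic
// read-modify-write) serializes the increments.
long count_with_spinlock(int iterations_per_thread) {
    assert(iterations_per_thread >= 0);
    std::atomic_flag lock = ATOMIC_FLAG_INIT;
    long counter = 0;  // protected by the lock, so it can be a plain variable

    auto worker = [&] {
        for (int i = 0; i < iterations_per_thread; ++i) {
            // RMW: atomically set the flag and learn its previous value.
            while (lock.test_and_set(std::memory_order_acquire)) {
                // spin until the previous value was "clear"
            }
            ++counter;
            lock.clear(std::memory_order_release);
        }
    };

    std::thread a(worker);
    std::thread b(worker);
    a.join();
    b.join();
    return counter;
}
```

With the lock in place, the returned count equals twice iterations_per_thread. Without it, the two unsynchronized increments would be a data race.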

Producer Consumer w/Fence
Insert a memory barrier between the instructions we want ordered.
Global variables initially: Data = 0, Head = 0
P1: Data = 2 [WX]; Head = 1 [WY]
P2: while(Head == 0); [RY] ... = Data; [RX]

Producer Consumer w/Fence
Example of a Producer and Consumer with a Memory Barrier applied.
Global variables initially: Data = 0, Head = 0
P1: Data = 2 [WX]; memory_barrier; Head = 1 [WY]
P2: while(Head == 0); [RY] memory_barrier; ... = Data; [RX]
All memory operations before the memory barrier must complete before proceeding to memory operations after the memory barrier.
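A minimal C++ sketch of the same producer and consumer, assuming std::atomic_thread_fence as a stand-in for the slide's memory_barrier. The function name run_producer_consumer is hypothetical, and C++ fences are a language-level approximation of a hardware barrier rather than the exact instruction the slides have in mind.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Runs P1 and P2 once and returns the value P2 read from Data.
int run_producer_consumer() {
    int data = 0;              // "Data" on the slide (plain variable)
    std::atomic<int> head{0};  // "Head" on the slide
    int observed = -1;

    std::thread producer([&] {
        data = 2;                                             // WX
        std::atomic_thread_fence(std::memory_order_release);  // memory_barrier
        head.store(1, std::memory_order_relaxed);             // WY
    });

    std::thread consumer([&] {
        while (head.load(std::memory_order_relaxed) == 0) {   // RY
            // spin until the producer publishes Head = 1
        }
        std::atomic_thread_fence(std::memory_order_acquire);  // memory_barrier
        observed = data;                                      // RX
    });

    producer.join();
    consumer.join();
    assert(observed != -1);  // the consumer always assigns before exiting
    return observed;
}
```

The release fence keeps WX ahead of WY and the acquire fence keeps RY ahead of RX, so the consumer reads Data = 2 once it has seen Head = 1.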

Producer Consumer w/Fence Relaxed W->R, and W->W ordering sequences
P1: Data = 2 [WX]; Head = 1 [WY]
P2: while(Head == 0); [RY] print Data; [RX]
Started with sequential consistency, then added relaxed W->R and W->W orderings.
Sequences: WX WY RY RX (OK), WY WX RY RX (OK), WY RY WX RX (OK), WY RY RX WX (Wrong)
3 are OK, 1 is Wrong.

Producer Consumer w/Fence Relaxed W->R, and W->W ordering sequences
P1: Data = 2 [WX]; memory_barrier; Head = 1 [WY]
P2: while(Head == 0); [RY] memory_barrier; print Data; [RX]
Add memory barriers to force WX < WY and RY < RX.
Sequences: WX WY RY RX (OK), WY WX RY RX (OK), WY RY WX RX (OK), WY RY RX WX (Wrong)
3 are OK, 1 is Wrong.

Producer Consumer w/Fence Relaxed W->R, and W->W ordering sequences
P1: Data = 2 [WX]; memory_barrier; Head = 1 [WY]
P2: while(Head == 0); [RY] memory_barrier; print Data; [RX]
Looks the same.
Sequences: WX WY RY RX (OK), WY WX RY RX (OK), WY RY WX RX (OK), WY RY RX WX (Wrong)
3 are OK, 1 is Wrong.

Producer Consumer w/Fence Relaxed W->R, and W->W ordering sequences
P1: Data = 2 [WX]; memory_barrier; Head = 1 [WY]
P2: while(Head == 0); [RY] memory_barrier; print Data; [RX]
With WX < WY and RY < RX enforced with memory barriers, RX < WX is not possible.
Sequences: WX WY RY RX (OK), WY WX RY RX (OK), WY RY WX RX (OK), WY RY RX WX (Wrong)
3 are OK, 1 is Wrong.

Producer Consumer w/Fence Relaxed W->R, and W->W ordering sequences
P1: Data = 2 [WX]; memory_barrier; Head = 1 [WY]
P2: while(Head == 0); [RY] memory_barrier; print Data; [RX]
Due to MB, RY < RX is enforced.
Sequences: WX WY RY RX (OK), WY WX RY RX (OK), WY RY WX RX (OK), WY RY RX WX (Wrong)
3 are OK, 1 is Wrong.

Producer Consumer w/Fence Relaxed W->R, and W->W ordering sequences
P1: Data = 2 [WX]; memory_barrier; Head = 1 [WY]
P2: while(Head == 0); [RY] memory_barrier; print Data; [RX]
WY < RY < MB. The while-RY-loop waits for WY.
Sequences: WX WY RY RX (OK), WY WX RY RX (OK), WY RY WX RX (OK), WY RY RX WX (Wrong)
3 are OK, 1 is Wrong.

Producer Consumer w/Fence Relaxed W->R, and W->W ordering sequences
P1: Data = 2 [WX]; memory_barrier; Head = 1 [WY]
P2: while(Head == 0); [RY] memory_barrier; print Data; [RX]
Due to MB, WX < WY is enforced.
Sequences: WX WY RY RX (OK), WY WX RY RX (OK), WY RY WX RX (OK), WY RY RX WX (Wrong)
3 are OK, 1 is Wrong.

Producer Consumer w/Fence Relaxed W->R, and W->W ordering sequences
P1: Data = 2 [WX]; memory_barrier; Head = 1 [WY]
P2: while(Head == 0); [RY] memory_barrier; print Data; [RX]
WX < WY, WY < RY, and RY < RX are enforced; therefore WX < RX is enforced.
Sequences: WX WY RY RX (OK), WY WX RY RX (OK), WY RY WX RX (OK), WY RY RX WX (Wrong)
3 are OK, 1 is Wrong.

Producer Consumer w/Fence Relaxed W->R, and W->W ordering sequences
P1: Data = 2 [WX]; memory_barrier; Head = 1 [WY]
P2: while(Head == 0); [RY] memory_barrier; print Data; [RX]
With WX < WY and RY < RX enforced with memory barriers, RX < WX is not possible.
Sequences: WX WY RY RX (OK), WY WX RY RX (OK), WY RY WX RX (OK)
3 are OK, 0 are Wrong.

Dekker’s Algorithm w/Fence
Example of mutual exclusion (“Dekker’s Algorithm”)
Global variables initially: Flag1 = 0, Flag2 = 0
P1: Flag1 = 1 [WX]; memory_barrier; If(Flag2 == 0) [RY] Critical section
P2: Flag2 = 1 [WY]; memory_barrier; If(Flag1 == 0) [RX] Critical section
All memory operations before the memory barrier must complete before proceeding to memory operations after the memory barrier.
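A minimal C++ sketch of the fenced flag protocol, assuming a sequentially consistent std::atomic_thread_fence as a stand-in for the slide's memory_barrier. The names Outcome and run_dekker_once are hypothetical. The sketch covers only the single entry attempt shown on the slide, not a full retrying lock.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Which processors entered the critical section in one run.
struct Outcome {
    bool p1_entered;
    bool p2_entered;
};

Outcome run_dekker_once() {
    std::atomic<int> flag1{0};
    std::atomic<int> flag2{0};
    bool p1_entered = false;
    bool p2_entered = false;

    std::thread p1([&] {
        flag1.store(1, std::memory_order_relaxed);            // WX
        std::atomic_thread_fence(std::memory_order_seq_cst);  // memory_barrier
        if (flag2.load(std::memory_order_relaxed) == 0) {     // RY
            p1_entered = true;                                // critical section
        }
    });

    std::thread p2([&] {
        flag2.store(1, std::memory_order_relaxed);            // WY
        std::atomic_thread_fence(std::memory_order_seq_cst);  // memory_barrier
        if (flag1.load(std::memory_order_relaxed) == 0) {     // RX
            p2_entered = true;                                // critical section
        }
    });

    p1.join();
    p2.join();
    // With both fences in place, at least one read must observe the other flag.
    assert(!(p1_entered && p2_entered));
    return {p1_entered, p2_entered};
}
```

Both processors may stay out of the critical section in a given run. The property being illustrated is only that both never enter together.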

Dekker’s Algorithm w/Fence Relaxed W->R ordering sequences
P1: Flag1 = 1 [WX]; If(Flag2 == 0) [RY] Critical section
P2: Flag2 = 1 [WY]; If(Flag1 == 0) [RX] Critical section
Sequences:
WX WY RX RY, WX WY RY RX, WX RX WY RY, WX RX RY WY, WX RY RX WY, WX RY WY RX
WY WX RX RY, WY WX RY RX, WY RX WX RY, WY RY WX RX, WY RX RY WX, WY RY RX WX
RX WX RY WY, RX WX WY RY, RX WY RY WX, RX WY WX RY, RX RY WX WY, RX RY WY WX
RY WX WY RX, RY WX RX WY, RY WY WX RX, RY WY RX WX, RY RX WX WY, RY RX WY WX
Started with sequential consistency, then added relaxed W->R orderings.
18 are OK, 6 are Wrong.

Dekker’s Algorithm w/Fence Relaxed W->R ordering sequences
P1: Flag1 = 1 [WX]; memory_barrier; If(Flag2 == 0) [RY] Critical section
P2: Flag2 = 1 [WY]; memory_barrier; If(Flag1 == 0) [RX] Critical section
Sequences:
WX WY RX RY, WX WY RY RX, WX RX WY RY, WX RX RY WY, WX RY RX WY, WX RY WY RX
WY WX RX RY, WY WX RY RX, WY RX WX RY, WY RY WX RX, WY RX RY WX, WY RY RX WX
RX WX RY WY, RX WX WY RY, RX WY RY WX, RX WY WX RY, RX RY WX WY, RX RY WY WX
RY WX WY RX, RY WX RX WY, RY WY WX RX, RY WY RX WX, RY RX WX WY, RY RX WY WX
Add memory barriers to force WX < RY and WY < RX.
18 are OK, 6 are Wrong.

Dekker’s Algorithm w/Fence Relaxed W->R ordering sequences
P1: Flag1 = 1 [WX]; memory_barrier; If(Flag2 == 0) [RY] Critical section
P2: Flag2 = 1 [WY]; memory_barrier; If(Flag1 == 0) [RX] Critical section
Add memory barriers to force WX < RY and WY < RX.
Sequences: WX WY RX RY, WX WY RY RX, WX RY WY RX, WY WX RX RY, WY WX RY RX, WY RX WX RY
6 are OK, 0 are Wrong.
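The 6/0 result can be reproduced by filtering the 24 orderings with the two barrier constraints. This is a minimal C++ sketch, assuming the barriers enforce exactly WX < RY and WY < RX and that a sequence is OK when WX precedes RX or WY precedes RY. The names Op, index_of and count_fenced_orderings are hypothetical.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <utility>

// Hypothetical operation codes; the names follow the slide labels.
enum Op { WX, WY, RX, RY };

// Index of operation op within one interleaving.
int index_of(const std::array<Op, 4>& seq, Op op) {
    return static_cast<int>(std::find(seq.begin(), seq.end(), op) - seq.begin());
}

// Returns {ok, wrong} over the orderings that respect both barriers.
std::pair<int, int> count_fenced_orderings() {
    std::array<Op, 4> seq = {WX, WY, RX, RY};  // ascending start for next_permutation
    int ok = 0;
    int wrong = 0;
    do {
        // Barrier in P1 forces WX < RY; barrier in P2 forces WY < RX.
        bool allowed = index_of(seq, WX) < index_of(seq, RY) &&
                       index_of(seq, WY) < index_of(seq, RX);
        if (!allowed) {
            continue;  // this ordering cannot occur once the barriers are in place
        }
        bool exclusive = index_of(seq, WX) < index_of(seq, RX) ||
                         index_of(seq, WY) < index_of(seq, RY);
        if (exclusive) {
            ++ok;
        } else {
            ++wrong;
        }
    } while (std::next_permutation(seq.begin(), seq.end()));
    assert(ok + wrong <= 24);
    return {ok, wrong};
}
```

No allowed ordering can be wrong, because RX < WX together with RY < WY would require WY < RX < WX < RY < WY, which is a cycle.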

Serialization of Writes (Fig 6) w/Fence
Insert a memory barrier between the instructions we want ordered.
Global variables initially: A = 0, B = 0, C = 0
P1: A = 1 [WX]; B = 1 [WY]
P2: A = 2 [WX]; C = 1 [WZ]
P3: while(B != 1); [RY] while(C != 1); [RZ] Register1 = A
P4: while(B != 1); [RY] while(C != 1); [RZ] Register2 = A
W1 W2

Higher Level Abstractions
Lower level of complexity
Explicit Parallel Constructs
- Fortran 90
- MPI

Conclusion
- The Uniprocessor programming model is simple, but does not work on Multiprocessors.
- Hardware and compilers make many optimizations that reorder loads and stores.
- Memory models exist on the hardware and need to be considered for program correctness.
- The Sequential Consistency model was considered for concurrent programs on the Uniprocessor.
- Relaxed Memory Consistency models are considered on the Multiprocessor because SC is too restrictive for hardware performance.
- Use memory barriers (fences) to override the relaxed memory model when ordering between memory operations must be maintained.

Other Processors