Computer Architecture II: Lecture 9


Today: Consistency models
– Program order
– Difference between coherence and consistency
– Sequential consistency
– Relaxing sequential consistency

Program order (an example)

    P1                P2
    (1a) A = 1;       (2a) print B;
    (1b) B = 2;       (2b) print A;

Program order: the order in which instructions appear in the source code
– May be changed by a compiler
– We will assume the order the programmer sees (what you see in the example above, not what the assembly code would look like)
Sequential program order
– P1: 1a->1b
– P2: 2a->2b
Parallel program order: an arbitrary interleaving of the sequential orders of P1 and P2, for example
– 1a->1b->2a->2b
– 1a->2a->1b->2b
– 2a->1a->1b->2b
– 2a->2b->1a->1b

Program order

    P1                P2
    (1a) A = 1;       (2a) print B;
    (1b) B = 2;       (2b) print A;
    Initially A=0, B=0

Which printed results of the program are intuitive?
A compiler, or out-of-order execution on a superscalar processor, may reorder 1a and 1b of P1 as long as this does not affect the result of the program on P1
– This could produce non-intuitive results
Now assume that the compiler/superscalar processor does not reorder
– P1 will “see” the results of the writes A=1 and B=2 in program order
– But when will P2 see the results of the writes A=1 and B=2?
– We say that a processor P2 “sees” the result of a write by P1, or equivalently that the write operation of P1 completes with respect to P2
– Coherence => writes to one location become visible to all processors in the same order
– But here we have 2 locations!

Setup for Memory Consistency

    P1                P2
    /* Assume initial value of A is 0 */
    A = 1;
    Barrier           Barrier
                      print A;

Coherence => writes to one location become visible to all processors in the same order
Nothing is said about
– when a write becomes visible to another processor (use event synchronization to ensure that)
– the order in which consecutive writes to different locations are seen by other processors

Second Example

    P1                P2
    /* Assume initial value of A and flag is 0 */
    1.a A = 1;        2.a while (flag == 0); /* spin idly */
    1.b flag = 1;     2.b print A;

The intuition (P2 prints 1) is not guaranteed by coherence
– Coherence refers to one location: return the last value written to A or to flag
– It does not say anything about the order in which the modifications of A and flag are seen by P2
Intuitively we expect memory to
– respect the order between accesses to different locations issued by a given process (1.b seen after 1.a)
Conclusion: coherence is not enough!
– it pertains only to a single location
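The flag example can be explored with a small Python sketch. This is a toy model under stated assumptions, not a description of any real machine: the name `printed_values` and the parameter `allow_write_reordering` are hypothetical, and the model only assumes that P2 leaves its spin loop at some point after it sees flag == 1 and then reads whatever value of A is visible to it.

```python
from itertools import permutations

def printed_values(allow_write_reordering):
    """Return the set of values P2 may print for A in the flag example.

    Toy model: P1's two writes become visible to P2 one at a time.
    Coherence constrains each location separately, so by itself it does
    not fix the relative visibility order of A and flag.
    """
    writes = [("A", 1), ("flag", 1)]  # program order of P1: 1.a, 1.b
    if allow_write_reordering:
        orders = list(permutations(writes))   # either write may become visible first
    else:
        orders = [tuple(writes)]              # visibility follows program order
    results = set()
    for order in orders:
        visible = {"A": 0, "flag": 0}         # what P2 currently sees
        for location, value in order:
            visible[location] = value
            if visible["flag"] == 1:
                # P2 may exit the spin loop here and execute 2.b
                results.add(visible["A"])
    return results

in_order = printed_values(allow_write_reordering=False)
reordered = printed_values(allow_write_reordering=True)
print(in_order, reordered)
```

When visibility follows program order the only printed value is 1; once the two writes may become visible in either order, 0 also appears, which is the non-intuitive result the slide warns about.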

Back to Second Example

    P1                P2
    /* Assume initial values of A and B are 0 */
    (1a) A = 1;       (2a) print B;
    (1b) B = 2;       (2b) print A;

– What’s the intuition? If 2a prints 2, will 2b print 1?
– We need an ordering model with clear semantics across different locations as well, so that programmers can reason about what results are possible
– This is the memory consistency model

Memory Consistency Model
Specifies constraints on the order in which memory operations (from any process) can appear to execute with respect to one another
– What orders are preserved?
– Given a load, what are the possible values it can return?
Without it, we can’t tell much about the execution of a shared address space (SAS) program
Implications for both programmer and system designer
– The programmer uses it to reason about correctness and possible results
– The system designer uses it to constrain how much accesses can be reordered by the compiler or hardware
Contract between programmer and system

Sequential Consistency
Total order achieved by interleaving accesses from different processes
– Maintains program order, and memory operations from all processes appear to [issue, execute, complete] atomically w.r.t. others
– As if there were no caches, and a single memory
“A multiprocessor is sequentially consistent if the result of any execution is the same as if the operations of all the processors were executed in some sequential order, and the operations of each individual processor appear in this sequence in the order specified by its program.” [Lamport, 1979]

SC Example

    P1                P2
    /* Assume initial values of A and B are 0 */
    (1a) A = 1;       (2a) print B;
    (1b) B = 2;       (2b) print A;

What matters is the order in which operations appear to execute, not the chronological order of events
Possible outcomes for (A,B): (0,0), (1,0), (1,2)
What about (0,2)?
– program order => 1a->1b and 2a->2b
– A = 0 implies 2b->1a, which implies 2a->1b
– B = 2 implies 1b->2a, which leads to a contradiction
What about the actual execution order 1b->1a->2b->2a?
– it appears just like 1a->1b->2a->2b => fine!
– execution order 1b->2a->2b->1a is not fine: it would produce (0,2)
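The outcome argument above can be checked by brute force. The following Python sketch is only an illustration: the names `OPS`, `run` and `respects_program_order` are hypothetical, and the model assumes that each operation takes effect atomically on a single shared memory.

```python
from itertools import permutations

# The four operations of the example (initially A = B = 0)
OPS = {
    "1a": ("write", "A", 1),
    "1b": ("write", "B", 2),
    "2a": ("print", "B"),
    "2b": ("print", "A"),
}

def run(order):
    """Execute the operations in the given order and return the printed (A, B)."""
    memory = {"A": 0, "B": 0}
    printed = {}
    for name in order:
        kind, location, *value = OPS[name]
        if kind == "write":
            memory[location] = value[0]
        else:
            printed[location] = memory[location]
    return printed["A"], printed["B"]

def respects_program_order(order):
    """True if 1a precedes 1b and 2a precedes 2b."""
    return (order.index("1a") < order.index("1b")
            and order.index("2a") < order.index("2b"))

# Every interleaving that keeps both program orders
interleavings = [o for o in permutations(OPS) if respects_program_order(o)]
sc_outcomes = {run(o) for o in interleavings}
print(len(interleavings), sorted(sc_outcomes))

# An execution order is acceptable under SC if its result matches some interleaving
fine = run(("1b", "1a", "2b", "2a")) in sc_outcomes
not_fine = run(("1b", "2a", "2b", "1a")) in sc_outcomes
print(fine, not_fine)
```

The enumeration finds six interleavings and the three outcomes listed on the slide. The reordered execution 1b->1a->2b->2a yields (1,2) and is therefore indistinguishable from an SC execution, while 1b->2a->2b->1a yields (0,2), which no interleaving produces.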

Back to the first example

    P1                P2
    (1a) A = 1;       (2a) print B;
    (1b) B = 2;       (2b) print A;

Sequential program order
– P1: 1a->1b
– P2: 2a->2b
Parallel program order: an arbitrary interleaving of the sequential orders of P1 and P2 (the intuitive orders)
– 1a->1b->2a->2b
– 1a->2a->1b->2b
– 1a->2a->2b->1b
– 2a->1a->1b->2b
– 2a->1a->2b->1b
– 2a->2b->1a->1b
– But 1a->1b->2b->2a is also acceptable for SC! (it produces the same result as 1a->1b->2a->2b)

Implementing SC
Two kinds of requirements
– Program order: memory operations issued by a process must appear to execute (become visible to others and to itself) in program order
– Atomicity: in the overall hypothetical total order, one memory operation should appear to complete with respect to all processes before the next one is issued; this guarantees that the total order is consistent across processes

Summary of Sequential Consistency
Maintain order between shared accesses in each thread
– reads or writes wait for previous reads or writes to complete
[Figure: ordering diagram of READ and WRITE operations]

Do we really need SC?
SC has strong requirements
SC may prevent compiler optimizations (code reorganization) and architectural optimizations (out-of-order execution in superscalar processors)
Many programs execute correctly even without “strong” ordering
– explicit synchronization operations order the key accesses

    initial: A, B = 0
    P1                P2
    A := 1;
    B :=
    barrier           barrier
                      ... = A;
                      ... = B;

Does SC eliminate synchronization?
No, it is still needed
– Critical sections (e.g. insert an element into a doubly-linked list)
– Barriers (e.g. enforce order on a variable access)
– Events (e.g. wait for a condition to become true)
SC only ensures interleaving semantics of individual memory operations

Is SC hardware enough?
No, the compiler can violate ordering constraints
– Register allocation to eliminate memory accesses
– Common subexpression elimination
– Instruction reordering
– Software pipelining
Unfortunately, programming languages and compilers are largely oblivious to memory consistency models

    Source program                    After register allocation
    P1          P2                    P1          P2
    B=0         A=0                   r1=0        r2=0
    A=1         B=1                   A=1         B=1
    u=B         v=A                   u=r1        v=r2
                                      B=r1        A=r2
    (u,v)=(0,0) disallowed under SC   (u,v)=(0,0) may occur here

What orderings are essential?

    initial: A, B = 0
    P1                P2
    A := 1;
    B :=
    unlock(L)         lock(L)
                      ... = A;
                      ... = B;

Stores to A and B must complete before unlock
Loads to A and B must be performed after lock
Conclusion: may relax the sequential consistency semantics

Hardware Centric Models
– Processor Consistency (Goodman 89)
– Total Store Ordering (Sindhu 90)
– Partial Store Ordering (Sindhu 90)
– Causal Memory (Hutto 90)
– Weak Ordering (Dubois 86)
[Figure: READ/WRITE ordering diagrams]

Relaxing write-to-read (PC, TSO)

    initial: A, flag, y == 0
    P1                P2
    (a) A = 1;        (c) while (flag == 0) {}
    (b) flag = 1;     (d) y = A;

Why?
– Hardware may hide the latency of a write: the write miss waits in the write buffer while later reads hit, and may even bypass the write
The write to flag is not visible until the write to A is visible
PC: non-atomic writes (a write does not complete wrt all other processors at the same time)
Ex: Sequent Balance, Encore Multimax, VAX 8800, SparcCenter, SGI Challenge, Pentium Pro

Comparing with SC
[Figure: code examples (a) to (d)]
Different results
– a, b: same for SC, TSO, PC
– c: PC allows A=0 (no write atomicity: A=1 may complete wrt P2 but not wrt P3)
– d: TSO and PC allow A=B=0 (reads execute before writes)
Mechanism for ensuring SC semantics: MEMBAR (Sun SPARC V9)
– A subsequent read waits until all previous writes complete
Initially A,B=0

Comparing with SC
Mechanism for ensuring SC semantics: MEMBAR (Sun SPARC V9)
– A subsequent read waits until all previous writes complete

    P1                P2
    /* initially A, B = 0 */
    A = 1;            B = 1;
    membar;           membar;
    print B;          print A;
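The effect of the write buffer and of MEMBAR can be illustrated with a small exhaustive search in Python. This is a sketch of a toy machine, not of SPARC hardware: the function `tso_outcomes` and the instruction encoding are hypothetical, each processor is assumed to have a FIFO store buffer, a read is assumed to look in its own buffer before memory, and `membar` is modelled as "wait until the own store buffer is empty".

```python
def tso_outcomes(programs, initial):
    """Return every set of printed values reachable on a toy store-buffer machine."""
    results, seen = set(), set()

    def explore(pcs, buffers, memory, printed):
        state = (pcs, buffers, memory, printed)
        if state in seen:
            return
        seen.add(state)
        if all(pc == len(prog) for pc, prog in zip(pcs, programs)):
            results.add(printed)  # stores still buffered cannot change what was printed
            return
        for p, prog in enumerate(programs):
            # Choice 1: the oldest buffered store of processor p reaches memory
            if buffers[p]:
                (loc, val), rest = buffers[p][0], buffers[p][1:]
                new_memory = tuple(sorted({**dict(memory), loc: val}.items()))
                explore(pcs, buffers[:p] + (rest,) + buffers[p + 1:], new_memory, printed)
            # Choice 2: processor p executes its next instruction
            if pcs[p] == len(prog):
                continue
            op, *args = prog[pcs[p]]
            next_pcs = pcs[:p] + (pcs[p] + 1,) + pcs[p + 1:]
            if op == "store":
                grown = buffers[p] + ((args[0], args[1]),)
                explore(next_pcs, buffers[:p] + (grown,) + buffers[p + 1:], memory, printed)
            elif op == "membar" and not buffers[p]:
                explore(next_pcs, buffers, memory, printed)  # only passes when drained
            elif op == "print":
                loc = args[0]
                pending = [v for l, v in buffers[p] if l == loc]
                value = pending[-1] if pending else dict(memory)[loc]
                explore(next_pcs, buffers, memory,
                        tuple(sorted(printed + ((loc, value),))))

    start_memory = tuple(sorted(initial.items()))
    explore((0,) * len(programs), ((),) * len(programs), start_memory, ())
    return results

without_membar = [
    [("store", "A", 1), ("print", "B")],
    [("store", "B", 1), ("print", "A")],
]
with_membar = [
    [("store", "A", 1), ("membar",), ("print", "B")],
    [("store", "B", 1), ("membar",), ("print", "A")],
]
both_zero = (("A", 0), ("B", 0))
relaxed = tso_outcomes(without_membar, {"A": 0, "B": 0})
fenced = tso_outcomes(with_membar, {"A": 0, "B": 0})
print(both_zero in relaxed, both_zero in fenced)
```

In this model the outcome A=B=0 is reachable without the barrier, because both reads can run while both writes are still buffered, and it disappears once each read has to wait for the preceding write to reach memory.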

Relaxing write-to-read and write-to-write (PSO)
Why?
– Bypass multiple write cache misses
– Overlap several write operations => good performance
But even example (a) breaks
– Use MEMBAR: a subsequent write waits until all previous writes have completed
Initially A,B=0

Relaxing all orders
Retain control and data dependences within each thread
Why?
– Allow multiple overlapping read operations
– Reads may be bypassed by writes
– Hide read latency (for read misses)
Two important models
– Weak ordering
– Release consistency
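The relaxations discussed so far can be summarized as a lookup table. The Python sketch below is a simplification made for illustration: the names `RELAXED` and `may_reorder` are hypothetical, only ordinary reads and writes are covered (no synchronization operations), and the difference in write atomicity between PC and TSO is not captured.

```python
ALL_PAIRS = {(a, b) for a in ("read", "write") for b in ("read", "write")}

# Program-order pairs (earlier, later) that each model allows to be reordered
RELAXED = {
    "SC": set(),                                      # every order is kept
    "TSO": {("write", "read")},                       # write-to-read relaxed
    "PC": {("write", "read")},                        # write-to-read relaxed
    "PSO": {("write", "read"), ("write", "write")},   # plus write-to-write
    "WO": ALL_PAIRS,                                  # all orders relaxed
    "RC": ALL_PAIRS,                                  # all orders relaxed
}

def may_reorder(model, earlier, later, same_location=False):
    """Can `later` complete before `earlier` (which precedes it in program order)?

    Accesses to the same location are treated as dependent and never
    reordered; this is a simplifying assumption of the sketch.
    """
    if same_location:
        return False
    return (earlier, later) in RELAXED[model]

for model in ("SC", "TSO", "PSO", "WO"):
    allowed = sorted(pair for pair in ALL_PAIRS if may_reorder(model, *pair))
    print(model, allowed)
```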

Weak ordering
Synchronization operations wait for all previous memory operations to complete
Arbitrary completion ordering of the memory operations between synchronization operations
[Figure: blocks of memory operations separated by synchronization operations]

Release consistency
Differentiate between synchronization operations
– acquire: read operation to gain access to a set of operations or variables
– release: write operation to grant access to other processors
– acquire must complete wrt all processors before the following accesses
  Lock(TaskQ) before newTask->next = Head; …, UnLock(TaskQ)
– release must wait until the accesses before it complete
  UnLock(TaskQ) waits for Lock(TaskQ), …, Head=newTask->next;
[Figure legend: acquire, release]
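The TaskQ pattern can be sketched in Python with a lock playing the role of the acquire and release operations. This only illustrates where the acquire and the release sit in the code: the class names `Task` and `TaskQueue` are hypothetical, and the memory-ordering details are handled by the Python runtime rather than shown here.

```python
import threading

class Task:
    def __init__(self, name):
        self.name = name
        self.next = None

class TaskQueue:
    """Singly linked list whose head is protected by a lock."""

    def __init__(self):
        self.head = None
        self.lock = threading.Lock()

    def push(self, name):
        new_task = Task(name)          # private work, may happen before the acquire
        with self.lock:                # acquire: Lock(TaskQ)
            new_task.next = self.head  # accesses inside the critical section
            self.head = new_task
        # leaving the with block is the release: UnLock(TaskQ)

    def __len__(self):
        with self.lock:
            count, node = 0, self.head
            while node is not None:
                count, node = count + 1, node.next
            return count

def worker(queue, worker_id, count):
    for i in range(count):
        queue.push(f"task-{worker_id}-{i}")

task_q = TaskQueue()
threads = [threading.Thread(target=worker, args=(task_q, w, 1000)) for w in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
queue_length = len(task_q)
print(queue_length)
```

Because every update of the head happens between an acquire and the matching release, no insertion is lost, whatever the interleaving of the four threads.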

Release consistency
Intuition:
– The programmer inserts acquire/release operations for code that shares variables
– acquire has to complete before the following instructions
  Because the other processes must know a critical section is entered
  Acquire and code before acquire can be reordered
– The code before the release has to complete
  Because the critical section modifications must become visible to the others
  Release and code after release can be reordered
[Figure legend: acquire, release]

Preserved Orderings
A block contains the instructions of one processor that may be reordered
Intuitive results and performance if data races are eliminated through synchronization

    Weak Ordering                    Release Consistency
    read/write ... read/write        read/write ... read/write
    Synch                            Acquire
    read/write ... read/write        read/write ... read/write
    Synch                            Release
    read/write ... read/write        read/write ... read/write
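The difference between the two diagrams can be written down as a small predicate. The Python sketch below is a hedged illustration: the name `may_cross` and the operation labels are hypothetical, data accesses inside a block are assumed to be independent of each other, and pairs of two synchronization operations are conservatively kept in order, which is an assumption of the sketch rather than a statement from the slides.

```python
DATA = {"read", "write"}

def may_cross(model, earlier, later):
    """Can `later` be performed before `earlier` (earlier comes first in program order)?

    model   : "WO" (weak ordering) or "RC" (release consistency)
    earlier : "read", "write", "synch", "acquire" or "release"
    later   : same set of labels
    """
    if earlier in DATA and later in DATA:
        return True                    # accesses inside one block may be reordered
    if model == "WO":
        return False                   # nothing moves across a synchronization operation
    if model == "RC":
        if earlier in DATA and later == "acquire":
            return True                # code before an acquire may slip after it
        if earlier == "release" and later in DATA:
            return True                # code after a release may slip before it
        return False                   # acquire->access and access->release are kept
    raise ValueError(f"unknown model: {model}")

print(may_cross("WO", "write", "synch"))      # False
print(may_cross("RC", "write", "acquire"))    # True
print(may_cross("RC", "acquire", "read"))     # False
print(may_cross("RC", "write", "release"))    # False
print(may_cross("RC", "release", "read"))     # True
```

Release consistency keeps only the two orderings that protect the critical section, which is why it leaves more room for overlap than weak ordering.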