Symmetric Multiprocessors: Synchronization and Sequential Consistency

Constructive Computer Architecture
Arvind
Computer Science & Artificial Intelligence Lab., Massachusetts Institute of Technology
November 23, 2015
http://www.csg.csail.mit.edu/6.175

Symmetric Multiprocessors
[Figure: processors and memory on a shared CPU-memory bus; a bridge connects it to an I/O bus with I/O controllers for graphics output and networks.]
Symmetric?
All memory is equally far away from all processors.
Any processor can do any I/O operation.

Synchronization Needed Even in Single-Processor Systems
The need for synchronization arises whenever there are parallel processes in a system.
[Figure: a fork into processes P1 and P2 followed by a join; a producer feeding a consumer.]
Forks and Joins: a parallel process may want to wait until several events have occurred.
Producer-Consumer: a consumer process must wait until the producer process has produced data.
Mutual Exclusion: the operating system has to ensure that a resource is used by only one process at a given time.

A Producer-Consumer Example
[Figure: a buffer with head and tail pointers; the producer uses register Rtail, the consumer uses Rhead and R.]
Producer posting item x:
      Load Rtail, tail
      Store (Rtail), x
      Rtail = Rtail + 1
      Store tail, Rtail
Consumer:
      Load Rhead, head
spin: Load Rtail, tail
      if Rhead == Rtail goto spin
      Load R, (Rhead)
      Rhead = Rhead + 1
      Store head, Rhead
      process(R)
The program is written assuming instructions are executed in order. Problems?
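
The two sequences above can be transliterated into Python and run one at a time, in program order, with no interleaving. This is a minimal sketch. Memory is modeled as a dict, and the buffer base address 100 and the helper names `make_memory`, `produce` and `try_consume` are hypothetical. `try_consume` returns `None` where the slide's consumer would keep spinning.

```python
# Hypothetical layout: memory is a dict from address to value, the head and
# tail pointers live at the symbolic addresses "head" and "tail", and the
# buffer starts at address 100 and grows upward (no wraparound).
BUF_BASE = 100

def make_memory():
    return {"head": BUF_BASE, "tail": BUF_BASE}

def produce(mem, x):
    r_tail = mem["tail"]      # Load Rtail, tail
    mem[r_tail] = x           # Store (Rtail), x
    r_tail = r_tail + 1       # Rtail = Rtail + 1
    mem["tail"] = r_tail      # Store tail, Rtail

def try_consume(mem):
    """One pass of the consumer; returns None where the slide's code would keep spinning."""
    r_head = mem["head"]      # Load Rhead, head
    r_tail = mem["tail"]      # spin: Load Rtail, tail
    if r_head == r_tail:      # if Rhead == Rtail goto spin
        return None
    r = mem[r_head]           # Load R, (Rhead)
    r_head = r_head + 1       # Rhead = Rhead + 1
    mem["head"] = r_head      # Store head, Rhead
    return r                  # value handed to process(R)
```

Run sequentially like this, the code behaves as intended. The open question is what happens when the two sequences run concurrently on processors that may not perform memory operations in order.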

A Producer-Consumer Example (continued)
Producer posting item x:
      Load Rtail, (tail)
      Store (Rtail), x              (1)
      Rtail = Rtail + 1
      Store tail, Rtail             (2)
Consumer:
      Load Rhead, head
spin: Load Rtail, tail              (3)
      if Rhead == Rtail goto spin
      Load R, (Rhead)               (4)
      Rhead = Rhead + 1
      Store head, Rhead
      process(R)
Can the tail pointer get updated before the item x is stored?
The programmer assumes that if 3 happens after 2, then 4 happens after 1.
Problem sequences are: 2, 3, 4, 1 and 4, 1, 2, 3.
Subtle question: what does "store completion" really mean?
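
The labeled orderings can be checked exhaustively. The sketch below enumerates all 24 orders of the four labeled events. It keeps the orders in which the consumer sees the new tail (3 after 2) but reads the buffer slot before x is stored (4 before 1). Optionally it requires each process to keep its own two events in program order. The event numbers follow the labels above; the function names are hypothetical.

```python
from itertools import permutations

# 1 = producer Store (Rtail), x    2 = producer Store tail, Rtail
# 3 = consumer Load Rtail, tail    4 = consumer Load R, (Rhead)
EVENTS = (1, 2, 3, 4)

def is_problem(order):
    """Consumer saw the new tail (2 before 3) yet read the slot before x was stored (4 before 1)."""
    pos = {e: i for i, e in enumerate(order)}
    return pos[2] < pos[3] and pos[4] < pos[1]

def problem_orders(producer_in_order, consumer_in_order):
    result = []
    for order in permutations(EVENTS):
        pos = {e: i for i, e in enumerate(order)}
        if producer_in_order and pos[1] > pos[2]:
            continue  # producer's two stores must stay in program order
        if consumer_in_order and pos[3] > pos[4]:
            continue  # consumer's two loads must stay in program order
        if is_problem(order):
            result.append(order)
    return result
```

With both processes in program order, no problem order remains. Letting only the producer's stores swap yields exactly 2, 3, 4, 1. Letting only the consumer's loads swap yields exactly 4, 1, 2, 3.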

Sequential Consistency: A Memory Model
[Figure: processors P sharing a single memory.]
“A system is sequentially consistent if the result of any execution is the same as if the operations of all the processors were executed in some sequential order, and the operations of each individual processor appear in the order specified by the program” (Leslie Lamport)
Sequential Consistency = arbitrary order-preserving interleaving of memory references of sequential programs
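
"Arbitrary order-preserving interleaving" can be made concrete by generating every merge of two sequential programs that keeps each program's own order. This is a minimal sketch with hypothetical names. For programs with n and m operations, the number of such interleavings is the binomial coefficient C(n+m, n).

```python
from math import comb

def interleavings(p, q):
    """Yield every merge of sequences p and q that keeps each sequence's own order."""
    if not p:
        yield tuple(q)
        return
    if not q:
        yield tuple(p)
        return
    # Either p's first operation goes next, or q's does.
    for rest in interleavings(p[1:], q):
        yield (p[0],) + rest
    for rest in interleavings(p, q[1:]):
        yield (q[0],) + rest

def count_interleavings(n, m):
    """Closed form for the number of order-preserving interleavings."""
    return comb(n + m, n)
```

For example, `interleavings(["a1", "a2"], ["b1"])` produces three orders, and each keeps a1 before a2.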

Sequential Consistency
Sequential concurrent tasks: T1, T2
Shared variables: X, Y (initially X = 0, Y = 0)
T1:                      T2:
Store X, 1   (X = 1)     Load R1, Y
Store Y, 2   (Y = 2)     Store Y', R1   (Y' = Y)
                         Load R2, X
                         Store X', R2   (X' = X)
What are the legitimate answers for X' and Y'?
(X', Y') ∈ {(1,2), (0,0), (1,0), (0,2)}?
If Y' is 2, then X' cannot be 0.
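
The question can be answered by brute force. The sketch below tries every global order of the six operations and keeps only those that preserve each task's program order, which are exactly the SC executions. It then collects the resulting (X', Y'). The operation strings are labels chosen for this sketch.

```python
from itertools import permutations

# Operations of each task, in program order.
T1 = ["St X,1", "St Y,2"]
T2 = ["Ld R1,Y", "St Y',R1", "Ld R2,X", "St X',R2"]

def execute(order):
    """Run one global order of operations against shared memory; return (X', Y')."""
    mem = {"X": 0, "Y": 0, "X'": None, "Y'": None}
    regs = {}
    for op in order:
        if op == "St X,1":
            mem["X"] = 1
        elif op == "St Y,2":
            mem["Y"] = 2
        elif op == "Ld R1,Y":
            regs["R1"] = mem["Y"]
        elif op == "St Y',R1":
            mem["Y'"] = regs["R1"]
        elif op == "Ld R2,X":
            regs["R2"] = mem["X"]
        elif op == "St X',R2":
            mem["X'"] = regs["R2"]
    return mem["X'"], mem["Y'"]

def respects(order, program):
    """True if the operations of `program` appear in `order` in program order."""
    positions = [order.index(op) for op in program]
    return positions == sorted(positions)

def sc_outcomes():
    results = set()
    for order in permutations(T1 + T2):
        if respects(order, T1) and respects(order, T2):
            results.add(execute(order))
    return results
```

The pair that never appears is (0, 2). Y' = 2 means T2 read Y after Store Y, 2, which follows Store X, 1 in T1. T2 reads X after Y, so X' must be 1.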

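Sequential Consistency
Sequential consistency imposes more memory ordering constraints than those imposed by uniprocessor program dependencies. What are these in our example?
T1:                      T2:
Store X, 1   (X = 1)     Load R1, Y
Store Y, 2   (Y = 2)     Store Y', R1   (Y' = Y)
                         Load R2, X
                         Store X', R2   (X' = X)
Uniprocessor dependencies: in T2, Store Y', R1 depends on Load R1, Y, and Store X', R2 depends on Load R2, X. T1's two stores go to different addresses and are independent.
Additional SC requirements: every pair of memory operations within a task stays in program order. In particular, Store X, 1 must precede Store Y, 2 in T1, and Load R1, Y must precede Load R2, X in T2.
High-performance processor implementations often violate SC. Example: the store buffer.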

Store Buffers
[Figure: two processors P, each with a store buffer in front of its cache; the caches share a memory.]
A processor considers a Store to have been executed as soon as it is stored in the store buffer, that is, before it is put in L1.
Stores can be moved from the store buffer to L1 in a different order.
A load can read values from the local store buffer (forwarding).
The net effect of store buffers is that Loads/Stores can appear to be ordered differently to different processors, which breaks SC.
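
The behaviors listed above can be illustrated with a minimal model of one processor's store buffer sitting in front of a shared memory. The class and method names are hypothetical. This sketch drains stores in FIFO order. A non-FIFO buffer would let `drain_one` pick any entry.

```python
from collections import deque

class Processor:
    """One processor with a store buffer in front of a shared memory (a dict)."""

    def __init__(self, memory):
        self.memory = memory          # shared: address -> value (stands in for L1/memory)
        self.store_buffer = deque()   # pending (address, value) pairs, oldest first

    def store(self, addr, value):
        # The store counts as executed once it is in the buffer, before memory sees it.
        self.store_buffer.append((addr, value))

    def load(self, addr):
        # Forwarding: the youngest buffered store to the same address wins.
        for a, v in reversed(self.store_buffer):
            if a == addr:
                return v
        return self.memory.get(addr, 0)

    def drain_one(self):
        # Move the oldest buffered store into the shared memory (FIFO order).
        addr, value = self.store_buffer.popleft()
        self.memory[addr] = value
```

After `p1.store("A", 200)`, `p1.load("A")` already returns 200 through forwarding. A second processor sharing the same memory still reads 0 until `p1.drain_one()` runs.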

Violations of SC, Example 1
Initially, all memory locations contain zeros.
Process 1              Process 2
Store flag1, 1;        Store flag2, 1;
Load r1, flag2;        Load r2, flag1;
Question: Is it possible that r1 = 0 and r2 = 0?
Sequential consistency: No.
Suppose Stores don't leave the store buffers before the Loads are executed: Yes!
Total Store Order (TSO): IBM 370, Sparc's TSO memory model, x86
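
The argument can be checked by enumeration. In this sketch, each store is split into two events: S, when it enters its own store buffer, and D, when it drains to memory and becomes visible to the other process. Requiring each store to drain before its own process's load models SC for this test. Dropping that requirement models the store-buffer (TSO) behavior. Forwarding never matters here, because each process loads a different address than it stores. The event and function names are hypothetical.

```python
from itertools import permutations

# S = store enters its own store buffer, D = that store drains to memory,
# L = load from memory. Suffix 1/2 = Process 1/2.
EVENTS = ("S1", "D1", "L1", "S2", "D2", "L2")

def outcomes(drain_before_next_load):
    """Possible (r1, r2) for P1: Store flag1,1; Load r1,flag2 and P2: Store flag2,1; Load r2,flag1."""
    results = set()
    for order in permutations(EVENTS):
        pos = {e: i for i, e in enumerate(order)}
        # Program order, and a store can only drain after it has been issued.
        ok = all(pos["S" + p] < pos["L" + p] and pos["S" + p] < pos["D" + p] for p in "12")
        if drain_before_next_load:
            ok = ok and all(pos["D" + p] < pos["L" + p] for p in "12")
        if not ok:
            continue
        r1 = 1 if pos["D2"] < pos["L1"] else 0   # was flag2 already in memory?
        r2 = 1 if pos["D1"] < pos["L2"] else 0   # was flag1 already in memory?
        results.add((r1, r2))
    return results
```

Only the relaxed variant produces (0, 0), for example with the order S1, L1, S2, L2, D1, D2.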

Violations of SC, Example 2: Non-FIFO Store Buffers
Process 1              Process 2
Store a, 1;            Load r1, flag;
Store flag, 1;         Load r2, a;
Question: Is it possible that r1 = 1 but r2 = 0?
Sequential consistency: No.
With non-FIFO store buffers: Yes.
Sparc's PSO memory model

Violations of SC, Example 3: Non-Blocking Caches
Process 1              Process 2
Store a, 1;            Load r1, flag;
Store flag, 1;         Load r2, a;
Question: Is it possible that r1 = 1 but r2 = 0?
Sequential consistency: No.
Assuming stores are ordered: Yes, because Loads can be reordered.
Sparc's RMO, PowerPC's WO, Alpha
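
Examples 2 and 3 use the same program, so a single enumeration covers both. In this sketch, Da and Dflag are the moments Process 1's stores reach memory, and Lflag and La are Process 2's loads. Setting `stores_ordered=False` models a non-FIFO store buffer. Setting `loads_ordered=False` models loads being performed out of order. The names are hypothetical.

```python
from itertools import permutations

def outcomes(stores_ordered, loads_ordered):
    """Possible (r1, r2) for P1: Store a,1; Store flag,1 and P2: Load r1,flag; Load r2,a."""
    results = set()
    for order in permutations(("Da", "Dflag", "Lflag", "La")):
        pos = {e: i for i, e in enumerate(order)}
        if stores_ordered and pos["Da"] > pos["Dflag"]:
            continue  # Store a must become visible before Store flag
        if loads_ordered and pos["Lflag"] > pos["La"]:
            continue  # Load flag must be performed before Load a
        r1 = 1 if pos["Dflag"] < pos["Lflag"] else 0
        r2 = 1 if pos["Da"] < pos["La"] else 0
        results.add((r1, r2))
    return results
```

When both orders are kept, (1, 0) is impossible. Relaxing either one alone is enough to allow it, which matches the two slides.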

Memory Model Issue
Architectural optimizations that are correct for uniprocessors often violate sequential consistency and result in a new memory model for multiprocessors.
Memory model issues are subtle and contentious because most ISA specifications (x86, ARM, PowerPC, Sparc, MIPS) are ambiguous.
For the rest of the lecture we will assume the architecture is SC and focus on synchronization issues.

Multiple Consumer Example
[Figure: one producer (Rtail) and two consumers (each with Rhead and R) sharing a buffer with head and tail pointers.]
Producer posting item x:
      Load Rtail, tail
      Store (Rtail), x
      Rtail = Rtail + 1
      Store tail, Rtail
Consumer:
      Load Rhead, head
spin: Load Rtail, tail
      if Rhead == Rtail goto spin
      Load R, (Rhead)
      Rhead = Rhead + 1
      Store head, Rhead
      process(R)
Critical section: the consumer's sequence from Load Rhead, head through Store head, Rhead needs to be executed atomically by one consumer ⇒ locks.
What is wrong with this code?
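
The problem shows up as soon as two consumers interleave. In this sketch, the consumer is written as a Python generator that yields after each memory operation, so a schedule can decide which consumer takes the next step. The generator, the `run` helper and the memory layout are hypothetical.

```python
def consumer(mem, out):
    """The slide's consumer as a generator: each yield follows one memory
    operation, marking a point where the other consumer may run."""
    r_head = mem["head"]            # Load Rhead, head
    yield
    while True:
        r_tail = mem["tail"]        # spin: Load Rtail, tail
        yield
        if r_head != r_tail:        # if Rhead == Rtail goto spin
            break
    r = mem[r_head]                 # Load R, (Rhead)
    yield
    mem["head"] = r_head + 1        # Rhead = Rhead + 1; Store head, Rhead
    yield
    out.append(r)                   # process(R)

def run(schedule, mem):
    """Advance consumer 1 or 2 one step at a time, in the order given by schedule."""
    out1, out2 = [], []
    procs = {1: consumer(mem, out1), 2: consumer(mem, out2)}
    for pid in schedule:
        next(procs[pid], None)      # a finished consumer simply ignores extra steps
    return out1, out2
```

With the buffer holding "a" at address 100 and "b" at address 101, the serial schedule `[1] * 5 + [2] * 5` gives each consumer one item. The alternating schedule `[1, 2] * 5` lets both consumers read head = 100. Item "a" is then processed twice and head advances by only one.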

Locks or Semaphores (E. W. Dijkstra, 1965)
Process i:
      lock(s)
      <critical section>
      unlock(s)
The execution of the critical section is protected by lock s. Only one process can hold the lock.
Suppose the lock s can have only two values:
      s = 0 means that no process has the lock
      s = 1 means that exactly one process has the lock and therefore can access the critical section
Once a process successfully acquires a lock, it executes the critical section and then sets s to zero by executing unlock(s).
Implementation of locks is quite difficult using just Loads and Stores. ISAs provide special atomic instructions to implement locks.

Atomic Read-Modify-Write Instructions
m is a memory location, R is a register.
Test&Set m, R:
      R ← M[m];
      if R == 0 then M[m] ← 1;
Location m can be set to one only if it contains a zero.
Swap m, R:
      Rt ← M[m];
      M[m] ← R;
      R ← Rt;
Location m is first read and then set to the new value; the old value is returned in a register.

Multiple Consumers Example Using the Test&Set Instruction
lock:   Test&Set mutex, Rtemp
        if (Rtemp == 1) goto lock
        Load Rhead, head
spin:   Load Rtail, tail
        if Rhead == Rtail goto spin
        Load R, (Rhead)
        Rhead = Rhead + 1
        Store head, Rhead
unlock: Store mutex, 0
        process(R)
The instructions between the lock and unlock labels form the critical section.
What if the process stops or is swapped out while in the critical section?

Nonblocking Synchronization
Load-reserve & Store-conditional: special register(s) hold the reservation flag and address, and the outcome of Store-conditional.
Load-reserve R, m:
      <flag, adr> ← <1, m>;
      R ← M[m];
Store-conditional m, R:
      if <flag, adr> == <1, m>
      then cancel other procs' reservation on m;
           M[m] ← R;
           status ← succeed;
      else status ← fail;
Consumer:
try:  Load-reserve Rhead, head
spin: Load Rtail, tail
      if Rhead == Rtail goto spin
      Load R, (Rhead)
      Rhead = Rhead + 1
      Store-conditional head, Rhead
      if (status == fail) goto try
      process(R)
The corresponding instructions in RISC-V are called lr and sc, respectively.
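
The definitions above can be modeled with one reservation register per processor. In this minimal sketch, the class and method names are hypothetical. Plain stores are omitted because the slide's definition cancels reservations only on a successful Store-conditional.

```python
class ReservationMemory:
    """Shared memory plus one reservation register <flag, adr> per processor,
    following the slide's definitions of Load-reserve and Store-conditional."""

    def __init__(self, contents):
        self.m = dict(contents)
        self.reservation = {}                       # pid -> (flag, adr)

    def load_reserve(self, pid, addr):
        self.reservation[pid] = (1, addr)           # <flag, adr> <- <1, m>
        return self.m[addr]                         # R <- M[m]

    def store_conditional(self, pid, addr, value):
        if self.reservation.get(pid) != (1, addr):
            return False                            # status <- fail
        for other, (flag, adr) in list(self.reservation.items()):
            if other != pid and adr == addr:
                self.reservation[other] = (0, adr)  # cancel other procs' reservation on m
        self.m[addr] = value                        # M[m] <- R
        return True                                 # status <- succeed
```

Suppose consumers 1 and 2 both Load-reserve head = 100. Consumer 1's Store-conditional succeeds and cancels consumer 2's reservation. Consumer 2's Store-conditional then fails, and it retries from try, now reading head = 101.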

Nonblocking Synchronization (continued)
Load-reserve R, (m):
      <flag, adr> ← <1, m>;
      R ← M[m];
Store-conditional (m), R:
      if <flag, adr> == <1, m>
      then cancel other procs' reservation on m;
           M[m] ← R;
           status ← succeed;
      else status ← fail;
The flag is cleared in other processors on a Store, using the cache-coherence (CC) protocol's invalidation mechanism.
Usually the address m is not remembered by Load-reserve; the flag is cleared on any invalidation. This works as long as Load-reserve instructions are not used in a nested manner.
These instructions won't work properly if Loads and Stores can be dynamically reordered.
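
The nesting restriction can be seen in a model where each processor keeps only a flag, with no address. As a simplifying assumption, this sketch clears the flag on every store by another processor. A real cache would clear it only when a line it holds is invalidated. The class and method names are hypothetical.

```python
class FlagOnlyMemory:
    """Each processor keeps only a reservation flag; any store by another
    processor clears it (stand-in for coherence invalidations)."""

    def __init__(self, contents, pids):
        self.m = dict(contents)
        self.flag = {pid: 0 for pid in pids}

    def _invalidate_others(self, pid):
        for other in self.flag:
            if other != pid:
                self.flag[other] = 0

    def load_reserve(self, pid, addr):
        self.flag[pid] = 1                # the address is not remembered
        return self.m[addr]

    def store(self, pid, addr, value):
        self._invalidate_others(pid)
        self.m[addr] = value

    def store_conditional(self, pid, addr, value):
        if self.flag[pid] != 1:
            return False                  # status <- fail
        self._invalidate_others(pid)
        self.m[addr] = value
        return True                       # status <- succeed
```

In non-nested use, a remote store between Load-reserve a and Store-conditional a makes the Store-conditional fail, as intended. Nesting breaks this. After Load-reserve a, a remote store to a clears the flag, and a second Load-reserve b sets it again. Store-conditional a then succeeds and silently overwrites the remote update.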

Memory Fences: Instructions to Sequentialize Memory Accesses
Processors with weak (non-sequentially-consistent) memory models need to provide memory fence instructions to force the serialization of memory accesses.
Producer posting item x:
      Load Rtail, (tail)
      Store (Rtail), x
      MembarSS
      Rtail = Rtail + 1
      Store tail, Rtail
Consumer:
      Load Rhead, (head)
spin: Load Rtail, (tail)
      if Rhead == Rtail goto spin
      MembarLL
      Load R, (Rhead)
      Rhead = Rhead + 1
      Store head, Rhead
      process(R)
MembarSS ensures that the tail pointer is not updated before x has been stored.
MembarLL ensures that R is not loaded before the tail pointer has been read, so, together with MembarSS, R cannot be loaded before x has been stored.
RISC-V has one instruction, called fence, whose predecessor and successor sets select the orderings it enforces (for example, fence w,w orders stores like MembarSS, and fence r,r orders loads like MembarLL).
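
The effect of the two fences can be checked by enumeration. The sketch assumes a hypothetical weak model in which any two memory operations of a process may be reordered unless a fence of the matching kind sits between them: MembarSS between two stores, or MembarLL between two loads. Dependences are ignored because the four events touch different addresses. Event numbers and all names are hypothetical.

```python
from itertools import permutations

# 1 = Store (Rtail), x       2 = Store tail, Rtail      (producer)
# 3 = Load Rtail, (tail)     4 = Load R, (Rhead)        (consumer)
# Each entry is (label, kind); fences have kind None.
PRODUCER = [(1, "S"), ("MembarSS", None), (2, "S")]
CONSUMER = [(3, "L"), ("MembarLL", None), (4, "L")]
FENCE_ORDERS = {"MembarSS": ("S", "S"), "MembarLL": ("L", "L")}

def ordered_pairs(program):
    """(a, b) pairs that a fence of the matching kind keeps in program order."""
    pairs = set()
    for i, (a, kind_a) in enumerate(program):
        for j in range(i + 1, len(program)):
            b, kind_b = program[j]
            if kind_a is None or kind_b is None:
                continue
            fences = [program[k][0] for k in range(i + 1, j) if program[k][1] is None]
            if any(FENCE_ORDERS[f] == (kind_a, kind_b) for f in fences):
                pairs.add((a, b))
    return pairs

def problem_orders(producer, consumer):
    """Orders where the consumer sees the new tail but reads the slot before x is stored."""
    required = ordered_pairs(producer) | ordered_pairs(consumer)
    bad = []
    for order in permutations((1, 2, 3, 4)):
        pos = {e: i for i, e in enumerate(order)}
        if any(pos[a] > pos[b] for a, b in required):
            continue
        if pos[2] < pos[3] and pos[4] < pos[1]:
            bad.append(order)
    return bad

def without_fences(program):
    return [op for op in program if op[1] is not None]
```

With both fences in place, no problem order remains. Removing either fence lets exactly one of the two problem sequences through again.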