1
Lecture 21: Synchronization
© S. J. Patel, 2001. ECE 412, Fall 2001, University of Illinois.
2
Directory Based Protocols
Broadcasting all accesses to potentially shared memory locations to all processors is not a very scalable proposition:
- computation and communication
- Amdahl's law
- costs of synchronization

For correct operation:
- hardware: cache coherence
- software: synchronization
- hardware and software: memory consistency
All three affect performance, too.
3
Where are the synchronization variables?
Distinct special memory
- Locks live in registers accessed by special instructions
- Fast access to a set of special registers, often via a separate bus
  - Examples: Alliant FX/80, CRAY X-MP, Sequent Balance SLIC (System Link and Interconnect) gates, and SGI MIPS-based machines
- Often implemented with distributed copies updating each other via a separate synch bus
- Restricted to privileged users, or allocated by the OS via library functions
- May provide library functions to access a set of virtual synch variables multiplexed on a small set of physical synch variables; this, however, defeats the original goal of fast access
4
Where are the synchronization variables? (cont.)
Mapped special memory
- Locks live in registers mapped into the user address space
  - Examples: Sequent Balance ALM (Atomic Lock Memory) and Alliant FX/2800
- Provides a limited number of physical synch variables, managed the same way as virtual memory
- Virtual memory provides protection among unprivileged users
- Removes the need for library functions to access locks
- Often implemented as a special memory module accessed via the normal memory bus
5
Where are the synchronization variables? (cont.)
Normal memory
- Potentially all of virtual memory could contain synch variables
- Locks live in normal memory, indistinguishable from normal data, and are manipulated with special atomic instructions (a software-level sketch follows below)
  - Examples: Encore Multimax and Sequent Symmetry
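A minimal sketch of what this looks like from software, using C11 <stdatomic.h> rather than the Multimax or Symmetry instruction sets (the type and function names here are illustrative, not from the slides): the lock is an ordinary word in cacheable memory, and only the instructions that touch it are special.

```c
#include <stdatomic.h>

/* The lock is an ordinary word in normal, cacheable memory; only the
 * instructions used to manipulate it (an atomic read-modify-write and
 * a store with release semantics) are special. */
typedef struct {
    atomic_flag flag;                   /* clear = free, set = held */
} spinlock_t;

#define SPINLOCK_INIT { ATOMIC_FLAG_INIT }

static void spin_lock(spinlock_t *l)
{
    /* Atomic test-and-set on a plain memory location. */
    while (atomic_flag_test_and_set_explicit(&l->flag, memory_order_acquire))
        ;                               /* spin until the holder releases */
}

static void spin_unlock(spinlock_t *l)
{
    atomic_flag_clear_explicit(&l->flag, memory_order_release);
}
```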
6
Where are the synchronization variables?
Tagged normal memory
- Locks live in the tag fields of normal memory, accessed by special synch instructions
  - Examples: HEP full/empty bits and the IBM 801
(a rough software analogue of full/empty bits is sketched below)
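In the HEP, the full/empty bit is a hardware tag that is checked and flipped atomically by the memory access itself. The following is only a rough single-producer/single-consumer analogue in C11 (the names and layout are assumptions for illustration), pairing an explicit flag with the data word:

```c
#include <stdatomic.h>
#include <stdint.h>

/* Software stand-in for a tagged word: the 'full' flag plays the role
 * of the hardware full/empty bit kept alongside the data. */
typedef struct {
    _Atomic int full;                   /* 0 = empty, 1 = full */
    uint64_t    value;
} tagged_word;

/* "Store when empty": wait until the word is empty, fill it, mark it full. */
static void write_when_empty(tagged_word *w, uint64_t v)
{
    while (atomic_load_explicit(&w->full, memory_order_acquire) != 0)
        ;                               /* wait for a reader to empty it */
    w->value = v;
    atomic_store_explicit(&w->full, 1, memory_order_release);
}

/* "Load when full": wait until the word is full, read it, mark it empty. */
static uint64_t read_when_full(tagged_word *w)
{
    while (atomic_load_explicit(&w->full, memory_order_acquire) != 1)
        ;                               /* wait for a writer to fill it */
    uint64_t v = w->value;
    atomic_store_explicit(&w->full, 0, memory_order_release);
    return v;
}
```

Unlike the hardware, which checks and updates the tag as part of one atomic memory access, this analogue is only safe with a single writer and a single reader.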
7
Example of an implicitly synchronized program
Processor 0:
    A = 1;
    if (B == 0) {
        /* critical section */
        A = 0;
    }

Processor 1:
    B = 1;
    if (A == 0) {
        /* critical section */
        B = 0;
    }

If Processor 0 reorders the write to A and the read from B, both processors can end up in the critical section at the same time. There is no local (single-processor) basis for knowing that the write to A and the read from B must be strictly ordered; the ordering matters only for correct multiprocessor execution. (A version with the ordering made explicit is sketched below.)
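A hedged C11 rendering of the same fragment, in which sequentially consistent atomics stand in for whatever ordering mechanism the hardware provides; with these semantics at least one processor is guaranteed to see the other's flag set, so both cannot enter the critical section at once:

```c
#include <stdatomic.h>

/* The shared flags from the slide. */
static _Atomic int A = 0, B = 0;

void processor0(void)
{
    atomic_store(&A, 1);                /* A = 1;  (seq_cst: not reordered */
    if (atomic_load(&B) == 0) {         /*  past the following load of B)  */
        /* critical section */
        atomic_store(&A, 0);
    }
}

void processor1(void)
{
    atomic_store(&B, 1);                /* B = 1; */
    if (atomic_load(&A) == 0) {
        /* critical section */
        atomic_store(&B, 0);
    }
}
```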
8
How do caches react to synchronization?
Non-cached
- Use memory management to prevent synch variables from entering the cache
- Often implemented by placing synch variables in non-cacheable pages

Bypass cache
- Does not require special OS and library functions to place synch variables in non-cacheable pages
- Allows synch variables to enter caches if accessed by normal instructions
- An RMW bypasses the cache: it goes to the bus even if the synch variable is in the local cache
- The RMW invalidates the address in all caches and does not place the synch variable into the local cache
  - Examples: 68030, 88X00, i860, 80486, SPARC
9
How do caches react to synchronization?
Exclusive copy
- A read-invalidate bus transaction acquires an exclusive copy
- Synchronization is then performed locally on the exclusive copy
- Problem with test-and-set: ping-pong effects due to spin locks (a common software mitigation is sketched below)
- Examples: Sequent Symmetry and the recent Encore Multimax
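One widely used software mitigation for the ping-pong effect is test-and-test-and-set: spin on ordinary loads of the locally cached copy and attempt the invalidating RMW only when the lock looks free. This is a well-known fix stated as an assumption here, not a claim about what the Symmetry or Multimax hardware did; a sketch in C11:

```c
#include <stdatomic.h>

typedef struct {
    _Atomic int held;                   /* 0 = free, 1 = held */
} ttas_lock;

static void ttas_acquire(ttas_lock *l)
{
    for (;;) {
        /* Attempt the RMW only when the lock looks free, so the
         * exclusive (invalidating) bus transaction stays rare. */
        if (!atomic_exchange_explicit(&l->held, 1, memory_order_acquire))
            return;
        /* Spin on plain loads: these hit the locally cached copy and
         * generate no bus traffic until the holder's release invalidates
         * or updates it. */
        while (atomic_load_explicit(&l->held, memory_order_relaxed))
            ;
    }
}

static void ttas_release(ttas_lock *l)
{
    atomic_store_explicit(&l->held, 0, memory_order_release);
}
```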
10
How do write buffers react to synchronization?
- Write buffers for write-through caches
- Write buffers for write-back caches
- Write buffers for bus arbitration
- Invalidation/update buffers for snooping caches

Invalidation/update buffers
- The cache often has only one access port, shared between the CPU and the bus snooper
- Normally the CPU has priority; invalidate requests are queued in a buffer to wait for a free cache cycle
11
Sender Synchronization
- Flush all local write buffers before releasing a lock
  - Ensures that all data generated before the lock release are visible to the other processors
- The sender CPU can continue execution in its local cache while its write buffers are being flushed; however, the write buffers will not accept new entries until they have all been emptied
- In this case, a processor simply makes sure its local invalidation/update buffer is empty before acquiring a lock
- Requires a special instruction to release a lock (rather than a normal memory write), e.g., release consistency in the Stanford DASH (see the sketch below)
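In current software terms, the "special instruction to release a lock" corresponds to a store with release semantics (or a release fence before a plain store). A minimal sketch, assuming illustrative names lock_word and shared_data that are not from the slides:

```c
#include <stdatomic.h>

static _Atomic int lock_word;           /* 0 = free, 1 = held (illustrative) */
static int shared_data;                 /* data produced in the critical section */

static void critical_section_then_release(void)
{
    shared_data = 42;                   /* writes made while holding the lock */

    /* The release store is the software-level analogue of "flush local
     * write buffers before releasing the lock": every write above must be
     * visible before another processor can observe the lock as free. */
    atomic_store_explicit(&lock_word, 0, memory_order_release);

    /* Equivalent form with an explicit fence:
     *     atomic_thread_fence(memory_order_release);
     *     atomic_store_explicit(&lock_word, 0, memory_order_relaxed);
     */
}
```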
12
Receiver Synchronization
- Flush all write buffers in the system when acquiring a lock
  - Ensures that every potential supplier of data to be used after acquiring the lock has flushed out its data
- The receiver cannot continue execution until all write buffers are empty
- Should be avoided due to its performance penalty (see the sketch below for the acquire-side alternative)
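On the acquiring side, the usual alternative to a system-wide flush is for the acquirer to synchronize only with the previous lock holder, via an acquire operation that pairs with the holder's release. A sketch continuing the previous example (the variables are redeclared here so the fragment stands alone; the names remain illustrative):

```c
#include <stdatomic.h>

static _Atomic int lock_word;           /* 0 = free, 1 = held (illustrative) */
static int shared_data;

static int acquire_and_read(void)
{
    /* The acquire RMW pairs with the holder's release store: once the lock
     * is obtained, all writes the previous holder made are visible, without
     * draining every write buffer in the system. */
    while (atomic_exchange_explicit(&lock_word, 1, memory_order_acquire))
        ;                               /* spin until the lock is obtained */

    return shared_data;                 /* sees data written before the
                                           matching release */
}
```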
13
How does the system bus support synchronization?
Exclusive bus lock
- The system bus is locked for the entire RMW transaction

Split RMW
- The system bus is freed between the read and the write
- The memory board locks the RMW location for the entire RMW transaction
- Requires an intelligent memory board controller
- Need to decide the granularity of locking: the whole board, individual locations, or somewhere in between
  - E.g., in the VAX 6200, a small associative store records the locked locations in each memory board
14
How does the system bus support synchronization? (cont.)
Exchange with memory (out-of-order RMW)
- RMW: addr → new data → old data
- Eliminates one address cycle
- The cache coherence protocol must be modified if synch variables are allowed in caches under update protocols

Load-store-fixed (implied write)
- Eliminates one more data transmission

(Both transaction shapes are sketched below at the instruction level.)
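At the instruction level these two bus transaction shapes correspond to an atomic exchange and a test-and-set with an implied store value. A hedged C11 sketch with illustrative names (this shows the software view of the operations, not the bus protocol itself):

```c
#include <stdatomic.h>
#include <stdbool.h>

static _Atomic unsigned lock_word;                 /* 0 = free (illustrative) */
static atomic_flag      tas_word = ATOMIC_FLAG_INIT;

/* Exchange with memory: the request carries the address and the new data,
 * and the reply carries the old data -- the
 * "RMW: addr -> new data -> old data" shape. */
static bool try_acquire_exchange(void)
{
    unsigned old = atomic_exchange_explicit(&lock_word, 1u,
                                            memory_order_acquire);
    return old == 0u;                   /* acquired iff it was free */
}

/* Load-store-fixed (implied write): the value to be stored is fixed by the
 * operation itself, so no data needs to accompany the request; atomic_flag's
 * test-and-set has this shape. */
static bool try_acquire_fixed(void)
{
    return !atomic_flag_test_and_set_explicit(&tas_word,
                                              memory_order_acquire);
}
```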
15
Where is the synchronization done?
Lock by holding the bus; the CPU does the computation
- Locus: processor (where the computation is done)
- Domain: bus and single-ported memory (exclusivity)
- Bypasses the local cache
- Invalidates all other caches
- Other processors still have access to their local caches

Lock the memory board; the memory board does the computation
- Locus: memory
- Domain: memory board
- Only one read-modify-write request
- Invalidates all caches
- Intelligence resides on the memory board
- No locking of the bus