Lecture 21 Synchronization

© S. J. Patel, 2001. ECE 412, Fall 2001, University of Illinois

Directory-Based Protocols
    All accesses to potentially shared memory locations are sent to all processors
    Not a very scalable proposition
    Computation and communication: Amdahl's law, costs of synchronization
    For correct operation:
        hardware: cache coherence
        software: synchronization
        hw and sw: memory consistency
    All three affect performance, too

Where are the synchronization variables? Distinct special memory
    Locks in registers accessed by special instructions
    Fast access to a set of special registers, often via a separate bus
    Examples: Alliant FX/80, CRAY X-MP, Sequent Balance SLIC (System Link and Interconnect) gates, and SGI MIPS-based machines
    Often implemented with distributed copies updating each other via a separate synch bus
    Restricted to privileged users, or allocated by the OS via library functions
    May provide library functions to access a set of virtual synch variables multiplexed on a small set of physical synch variables; this, however, defeats the original goal of fast access

Where are the synchronization variables? (cont.) Mapped special memory
    Locks in registers mapped into the user address space
    Examples: Sequent Balance ALM (Atomic Lock Memory) and Alliant FX/2800
    Provides a limited number of physical synch variables, managed the same way as virtual memory
    Virtual memory provides protection among unprivileged users
    Removes the need for library functions to access locks
    Often implemented as a special memory module accessed via the normal memory bus

Where are the synchronization variables? Normal memory
    Potentially all of virtual memory could contain synch variables
    Locks live in normal memory, indistinguishable from normal data, and are manipulated with special atomic instructions
    Examples: Encore Multimax and Sequent Symmetry
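
A minimal sketch of this "locks in normal memory" style, using a test-and-set spin lock built from C11 atomics; the lock variable and function names here are illustrative, not from the slides:

    #include <stdatomic.h>

    /* The lock is an ordinary word in normal memory; only the
       instruction used to update it (test-and-set) is special. */
    static atomic_flag lock = ATOMIC_FLAG_INIT;

    void acquire(void) {
        /* atomic_flag_test_and_set typically compiles to an atomic
           RMW instruction (e.g., an exchange on the lock word). */
        while (atomic_flag_test_and_set(&lock))
            ;  /* spin until the previous value was 0 (free) */
    }

    void release(void) {
        atomic_flag_clear(&lock);
    }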

Where are the synchronization variables? Tagged normal memory
    Locks in tag fields of normal memory, accessed by special synch instructions
    Examples: HEP full/empty bits and the IBM 801

Example of an implicitly synchronized program

    Processor 0:                 Processor 1:
        A = 1;                       B = 1;
        if (B == 0) {                if (A == 0) {
            /* critical sec. */          /* critical sec. */
            A = 0;                       B = 0;
        }                            }

    If Processor 0 reorders the write to A and the read from B, both processors can end up in the critical section at the same time. There is no local basis for the hardware to know that the write to A and the read from B must be strictly ordered to meet the needs of multiprocessor execution.
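
In modern terms, the missing ordering can be made explicit with sequentially consistent atomics or fences. The C11 sketch below is illustrative only; the fence placement is the point, and everything beyond the names A and B is an assumption, not part of the original example:

    #include <stdatomic.h>

    atomic_int A, B;                      /* both initially 0 */

    void processor0(void) {
        atomic_store(&A, 1);
        /* Full fence: the store to A must be visible before the
           load of B, so the flag test below is meaningful. */
        atomic_thread_fence(memory_order_seq_cst);
        if (atomic_load(&B) == 0) {
            /* critical section */
            atomic_store(&A, 0);
        }
    }

    void processor1(void) {
        atomic_store(&B, 1);
        atomic_thread_fence(memory_order_seq_cst);
        if (atomic_load(&A) == 0) {
            /* critical section */
            atomic_store(&B, 0);
        }
    }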

How do caches react to synchronization?

    Non-cached
        Use memory management to prevent synch variables from entering the cache
        Often implemented by placing synch variables in non-cacheable pages

    Bypass cache
        Does not require special OS and library functions to place synch variables into non-cacheable pages
        Allows synch variables to enter caches if accessed by normal instructions
        RMW bypasses the cache: it goes to the bus even if the synch variable is in the local cache
        RMW invalidates the address in all caches and does not place the synch variable into the local cache
        Examples: 68030, 88X00, i860, 80486, SPARC

How do caches react to synchronization? (cont.)

    Exclusive copy
        Read-invalidate bus transaction to acquire an exclusive copy
        Synch performed locally on the exclusive copy
        Problem with test-and-set: ping-pong effects due to spin locks
        Examples: Sequent Symmetry and recent Encore Multimax
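
A common software-side mitigation for the ping-pong effect is test-and-test-and-set: spin on an ordinary cached read of the lock, and issue the expensive atomic RMW only when the lock appears free. A hedged C11 sketch (the lock encoding and names are assumptions):

    #include <stdatomic.h>

    static atomic_int lock;               /* 0 = free, 1 = held */

    void acquire(void) {
        for (;;) {
            /* Spin on a plain load: it hits in the local cached copy,
               so there is no bus traffic while the lock stays held. */
            while (atomic_load_explicit(&lock, memory_order_relaxed) != 0)
                ;
            /* The lock looks free: now try the atomic RMW. */
            if (atomic_exchange(&lock, 1) == 0)
                return;                   /* acquired */
        }
    }

    void release(void) {
        atomic_store(&lock, 0);
    }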

How do write buffers react to synchronization?

    Kinds of buffers:
        Write buffers for write-through caches
        Write buffers for write-back caches
        Write buffers for bus arbitration
        Invalidation/update buffers for snooping caches

    Invalidation/update buffers
        The cache often has only one access port, shared between the CPU and the bus snooper
        Normally the CPU has priority, so invalidate requests are queued in a buffer to wait for a free cache cycle

Sender Synchronization
    Flush all local write buffers before releasing a lock
    Ensures that all data generated before the lock release are visible to the other processors
    The sender CPU can continue execution in its local cache while its write buffers are being flushed; however, the write buffers will not accept new entries until they have all been emptied
    In the invalidation/update-buffer case, a processor simply makes sure its local invalidation/update buffer is empty before acquiring a lock
    Requires a special instruction to release a lock (rather than a normal memory write), e.g., release consistency in the Stanford DASH
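
In current memory-model terms, the special lock-release operation corresponds to a store with release semantics: all earlier writes must become visible before the lock word appears free. A small illustrative C11 sketch (variable names are assumptions, not from the slides):

    #include <stdatomic.h>

    static atomic_int lock = 1;           /* 1 = held, 0 = free */
    static int shared_data;

    void sender_release(void) {
        shared_data = 42;                 /* written inside the critical section */

        /* Release store: conceptually "flush the write buffer", so that
           shared_data is visible before anyone can observe lock == 0. */
        atomic_store_explicit(&lock, 0, memory_order_release);
    }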

Receiver Synchronization
    Flush all write buffers in the system when acquiring a lock
    Ensures that all potential suppliers of data to be used after acquiring the lock have flushed out their data
    The receiver cannot continue execution until all write buffers are empty
    Should be avoided because of its performance penalty
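
The cheaper counterpart on the receiver side, in modern terms, is an acquire operation on the lock word: instead of draining every write buffer in the system, the acquiring processor only orders its own subsequent reads after the lock acquisition. An illustrative C11 sketch that plays the same roles as the sender sketch above:

    #include <stdatomic.h>

    static atomic_int lock;               /* 0 = free, 1 = held */
    static int shared_data;               /* written before the lock was released */

    int receiver_acquire(void) {
        /* Acquire RMW: once this exchange observes lock == 0 (the release
           store), the releasing processor's earlier writes, including
           shared_data, are guaranteed to be visible here. */
        while (atomic_exchange_explicit(&lock, 1, memory_order_acquire) != 0)
            ;                             /* spin */
        return shared_data;
    }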

How does the system bus support synchronization?

    Exclusive bus lock
        The system bus is held for the entire RMW transaction, locking out other traffic

    Split RMW
        The system bus is freed between the read and the write
        The memory board locks the RMW location for the entire RMW transaction
        Requires an intelligent memory board controller
        Need to decide the granularity of locking: the whole board, individual locations, or somewhere in between
        E.g., in the VAX 6200, a small associative store is used to record the locked locations in each memory board

How does the system bus support synchronization? (cont.)

    Exchange with memory (out-of-order RMW)
        RMW: addr → new data → old data
        Eliminates one address cycle
        The cache coherence protocol must be modified if synch variables are allowed in the cache, for update protocols

    Load-store-fixed (implied write)
        Eliminates one more data transmission
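
Viewed from software, "exchange with memory" is a swap primitive: one request carries the address and the new value, and the old value comes back from a single RMW. A tiny C11 illustration, not tied to any particular bus protocol:

    #include <stdatomic.h>
    #include <stdio.h>

    int main(void) {
        atomic_int synch_var = 0;

        /* One atomic RMW: ship the new value (1), receive the old value. */
        int old = atomic_exchange(&synch_var, 1);

        printf("old = %d, new = %d\n", old, atomic_load(&synch_var));
        return 0;
    }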

Where is the synchronization done?

    Lock by holding the bus; the CPU does the computation
        Locus: processor (where the computation is done)
        Domain: bus and single-ported memory (exclusivity)
        Bypass the local cache; invalidate all other caches
        Other processors still have access to their local caches

    Lock the memory board; the memory board does the computation
        Locus: memory
        Domain: memory board
        Only one read-modify-write request
        Invalidate all caches
        Intelligence on the memory board
        No locking on the bus