Lecture 11: Consistency Models

Slides:



Advertisements
Similar presentations
1 Episode III in our multiprocessing miniseries. Relaxed memory models. What I really wanted here was an elephant with sunglasses relaxing On a beach,
Advertisements

1 Lecture 20: Synchronization & Consistency Topics: synchronization, consistency models (Sections )
Memory Consistency Models Kevin Boos. Two Papers Shared Memory Consistency Models: A Tutorial – Sarita V. Adve & Kourosh Gharachorloo – September 1995.
CS 162 Memory Consistency Models. Memory operations are reordered to improve performance Hardware (e.g., store buffer, reorder buffer) Compiler (e.g.,
1 Lecture 20: Speculation Papers: Is SC+ILP=RC?, Purdue, ISCA’99 Coherence Decoupling: Making Use of Incoherence, Wisconsin, ASPLOS’04 Selective, Accurate,
1 Lecture 12: Hardware/Software Trade-Offs Topics: COMA, Software Virtual Memory.
1 Lecture 21: Transactional Memory Topics: consistency model recap, introduction to transactional memory.
CS 7810 Lecture 19 Coherence Decoupling: Making Use of Incoherence J.Huh, J. Chang, D. Burger, G. Sohi Proceedings of ASPLOS-XI October 2004.
By Sarita Adve & Kourosh Gharachorloo Review by Jim Larson Shared Memory Consistency Models: A Tutorial.
1 Lecture 23: Transactional Memory Topics: consistency model recap, introduction to transactional memory.
1 Lecture 7: Consistency Models Topics: sequential consistency, requirements to implement sequential consistency, relaxed consistency models.
Lecture 13: Consistency Models
Computer Architecture II 1 Computer architecture II Lecture 9.
1 Lecture 15: Consistency Models Topics: sequential consistency, requirements to implement sequential consistency, relaxed consistency models.
Memory Consistency Models
1 Lecture 12: Relaxed Consistency Models Topics: sequential consistency recap, relaxing various SC constraints, performance comparison.
Shared Memory Consistency Models: A Tutorial By Sarita V Adve and Kourosh Gharachorloo Presenter: Meenaktchi Venkatachalam.
1 Lecture 22: Synchronization & Consistency Topics: synchronization, consistency models (Sections )
1 Lecture 20: Protocols and Synchronization Topics: distributed shared-memory multiprocessors, synchronization (Sections )
Shared Memory Consistency Models: A Tutorial By Sarita V Adve and Kourosh Gharachorloo Presenter: Sunita Marathe.
Memory Consistency Models Some material borrowed from Sarita Adve’s (UIUC) tutorial on memory consistency models.
Shared Memory Consistency Models: A Tutorial Sarita V. Adve Kouroush Ghrachorloo Western Research Laboratory September 1995.
Memory Consistency Models Alistair Rendell See “Shared Memory Consistency Models: A Tutorial”, S.V. Adve and K. Gharachorloo Chapter 8 pp of Wilkinson.
1 Lecture 12: Hardware/Software Trade-Offs Topics: COMA, Software Virtual Memory.
By Sarita Adve & Kourosh Gharachorloo Slides by Jim Larson Shared Memory Consistency Models: A Tutorial.
Shared Memory Consistency Models. SMP systems support shared memory abstraction: all processors see the whole memory and can perform memory operations.
Memory Consistency Models. Outline Review of multi-threaded program execution on uniprocessor Need for memory consistency models Sequential consistency.
Memory Consistency Zhonghai Lu Outline Introduction What is a memory consistency model? Who should care? Memory consistency models Strict.
CS533 Concepts of Operating Systems Jonathan Walpole.
1 Lecture 19: Scalable Protocols & Synch Topics: coherence protocols for distributed shared-memory multiprocessors and synchronization (Sections )
1 Lecture 3: Coherence Protocols Topics: consistency models, coherence protocol examples.
1 Lecture 20: Speculation Papers: Is SC+ILP=RC?, Purdue, ISCA’99 Coherence Decoupling: Making Use of Incoherence, Wisconsin, ASPLOS’04.
1 Lecture: Coherence Topics: snooping-based coherence, directory-based coherence protocols (Sections )
Lecture 8: Snooping and Directory Protocols
Symmetric Multiprocessors: Synchronization and Sequential Consistency
Lecture 20: Consistency Models, TM
COSC6385 Advanced Computer Architecture
Memory Consistency Models
Lecture 19: Coherence and Synchronization
Lecture 11: Consistency Models
Memory Consistency Models
Lecture 18: Coherence and Synchronization
Example Cache Coherence Problem
Symmetric Multiprocessors: Synchronization and Sequential Consistency
Lecture 2: Snooping-Based Coherence
Shared Memory Consistency Models: A Tutorial
Lecture 12: TM, Consistency Models
Lecture 5: Snooping Protocol Design Issues
Lecture: Consistency Models, TM
Symmetric Multiprocessors: Synchronization and Sequential Consistency
Lecture 4: Update Protocol
Lecture 21: Synchronization and Consistency
Lecture 22: Consistency Models, TM
Lecture: Coherence and Synchronization
Shared Memory Consistency Models: A Tutorial
Lecture 25: Multiprocessors
Lecture 9: Directory-Based Examples
Lecture 10: Consistency Models
Lecture 4: Synchronization
Lecture 25: Multiprocessors
Lecture 24: Virtual Memory, Multiprocessors
Lecture 24: Multiprocessors
Lecture: Coherence, Synchronization
Lecture: Coherence Topics: wrap-up of snooping-based coherence,
Lecture 21: Synchronization & Consistency
Lecture: Coherence and Synchronization
Lecture: Consistency Models, TM
Lecture 19: Coherence and Synchronization
Lecture 11: Relaxed Consistency Models
Lecture 18: Coherence and Synchronization
Presentation transcript:

Lecture 11: Consistency Models Topics: sequential consistency, requirements to implement sequential consistency

Sequential Consistency A multiprocessor system is sequentially consistent if the result of any execution is the same as if the operations of all processors were executed in some sequential order, and the operations of each individual processor appear in this sequence in the order specified by its program Atomicity: each processor sees operations complete instantaneously in the same order Program order is preserved within each processor

Example Programs Initially, Flag1 = Flag2 = 0 P1 P2 if (Flag2 == 0) if (Flag1 == 0) critical section critical section Initially, A = B = 0 P1 P2 P3 A = 1 if (A == 1) B = 1 if (B == 1) register = A

Write Buffers with Bypassing Assume an architecture without caches Writes by a processor are inserted in the write buffer and the processor proceeds without waiting for the write to complete – subsequent reads have priority for mem-access This can result in both processors entering the critical section in example-1 Illustrates the importance of program order (wr  rd dependence)

Write Buffer P1 P2 Rd Flag2 t1 Rd Flag1 t2 Wr Flag1 t3 Wr Flag2 t4 Shared Bus Memory Flag1: 0 Flag2: 0

Overlapping Writes Architecture without caches, multiple memory modules, general interconnect (non-bus), writes are issued and the processor continues without waiting for them to finish Again, sequential consistency is violated in next example Illustrates the importance of program order (wr  wr dependence) To enforce ordering, processors must wait for write acknowledgments before proceeding

Overlapped Writes P1 P2 Read Data t3 Read Head t2 General Interconnect Write Head Write Data t1 t4 Memory Head: 0 Memory Data: 0 P1 P2 Data = 2000 while (Head == 0) { } Head = 1 … = Data

Non-Blocking Reads Assume writes complete atomically and in program order If reads issue (or complete) out of order, sequential consistency is violated Illustrates the importance of rdrd program order

Non-Blocking Reads P1 P2 Write Head t3 Write Data t2 General Interconnect Read Head Read Data t4 t1 Memory Head: 0 Memory Data: 0 P1 P2 Data = 2000 while (Head == 0) { } Head = 1 … = Data

Architectures with Caches The earlier examples only violated program order – writes were still atomic and seen by all processors in the same order The latter condition can be easily violated if each processor has a cache (in spite of cache coherence) Recall that cache coherence simply guarantees write propagation and write serialization to the same memory location – it does not guarantee that writes to different locations are seen in the same order

Maintaining Atomicity To preserve program order, we will not allow a processor to proceed unless it receives the write acknowledgment (all other processors have seen invalidates or updates) Two conditions can ensure the appearance of write atomicity: write serialization to each location stalling reads until all processors have seen the last update to that location

Write Serialization Example P1 P2 P3 P4 A = 1 A = 2 while (B != 1) { } while (B != 1) { } B = 1 C = 1 while (C != 1) { } while (C != 1) { } register1 = A register2 = A register1 and register2 having different values is a violation of sequential consistency – possible if updates to A appear in different orders Cache coherence guarantees write serialization to a single memory location

Non-Atomic Write Updates Initially, A = B = 0 P1 P2 P3 A = 1 if (A == 1) B = 1 if (B == 1) register = A P2 reads new A before update reaches P3 Update of B reaches P3 before update of A P3 reads B and then A before update of A arrives Assume each processor executes operations in program order (waiting for acks) and we have write serialization to the same memory location

Implementing Atomic Updates The above problem can be eliminated by not allowing a read to proceed unless all processors have seen the last update to that location Easy in an invalidate-based system: memory will not service the request unless it has received acks from all processors In an update-based system: a second set of messages is sent to all processors informing them that all acks have been received; reads cannot be serviced until the processor gets the second message

Summary To preserve sequential consistency: hardware must preserve program order for all memory operations (including waiting for acks) writes to a location must be serialized the value of a write cannot be read unless all have seen the write (it is ok if writes to different locations are not seen in the same order as long as conflicting reads do not happen)

Performance Optimizations Program order is a major constraint – the following try to get around this constraint without violating seq. consistency if a write has been stalled, prefetch the block in exclusive state to reduce traffic when the write happens allow out-of-order reads with the facility to rollback if the ROB detects a violation Get rid of sequential consistency in the common case and employ relaxed consistency models – if one really needs sequential consistency in key areas, insert fence instructions between memory operations Next class: consistency models by relaxing constraints

Title Bullet