1 Lecture 12: Relaxed Consistency Models Topics: sequential consistency recap, relaxing various SC constraints, performance comparison.

Slides:



Advertisements
Similar presentations
1 Episode III in our multiprocessing miniseries. Relaxed memory models. What I really wanted here was an elephant with sunglasses relaxing On a beach,
Advertisements

1 Lecture 20: Synchronization & Consistency Topics: synchronization, consistency models (Sections )
Memory Consistency Models Sarita Adve Department of Computer Science University of Illinois at Urbana-Champaign Ack: Previous tutorials.
Synchronization. How to synchronize processes? – Need to protect access to shared data to avoid problems like race conditions – Typical example: Updating.
Memory Consistency Models Kevin Boos. Two Papers Shared Memory Consistency Models: A Tutorial – Sarita V. Adve & Kourosh Gharachorloo – September 1995.
CS 162 Memory Consistency Models. Memory operations are reordered to improve performance Hardware (e.g., store buffer, reorder buffer) Compiler (e.g.,
© Krste Asanovic, 2014CS252, Spring 2014, Lecture 12 CS252 Graduate Computer Architecture Spring 2014 Lecture 12: Synchronization and Memory Models Krste.
1 Lecture 20: Speculation Papers: Is SC+ILP=RC?, Purdue, ISCA’99 Coherence Decoupling: Making Use of Incoherence, Wisconsin, ASPLOS’04 Selective, Accurate,
Is SC + ILP = RC? Presented by Vamshi Kadaru Chris Gniady, Babak Falsafi, and T. N. VijayKumar - Purdue University Spring 2005: CS 7968 Parallel Computer.
CS492B Analysis of Concurrent Programs Consistency Jaehyuk Huh Computer Science, KAIST Part of slides are based on CS:App from CMU.
1 Lecture 12: Hardware/Software Trade-Offs Topics: COMA, Software Virtual Memory.
Cache Coherence in Scalable Machines (IV) Dealing with Correctness Issues Serialization of operations Deadlock Livelock Starvation.
1 Lecture 21: Transactional Memory Topics: consistency model recap, introduction to transactional memory.
Memory consistency models Presented by: Gabriel Tanase.
1 Lecture 23: Transactional Memory Topics: consistency model recap, introduction to transactional memory.
1 Lecture 7: Consistency Models Topics: sequential consistency, requirements to implement sequential consistency, relaxed consistency models.
Lecture 13: Consistency Models
Computer Architecture II 1 Computer architecture II Lecture 9.
1 Lecture 15: Consistency Models Topics: sequential consistency, requirements to implement sequential consistency, relaxed consistency models.
Memory Consistency Models
Synchron. CSE 4711 The Need for Synchronization Multiprogramming –“logical” concurrency: processes appear to run concurrently although there is only one.
Shared Memory Consistency Models: A Tutorial By Sarita V Adve and Kourosh Gharachorloo Presenter: Meenaktchi Venkatachalam.
1 Lecture 22: Synchronization & Consistency Topics: synchronization, consistency models (Sections )
Shared Memory Consistency Models: A Tutorial By Sarita V Adve and Kourosh Gharachorloo Presenter: Sunita Marathe.
Memory Consistency Models Some material borrowed from Sarita Adve’s (UIUC) tutorial on memory consistency models.
Evaluation of Memory Consistency Models in Titanium.
Lecture 4. Memory Consistency Models
Shared Memory Consistency Models: A Tutorial Sarita V. Adve Kouroush Ghrachorloo Western Research Laboratory September 1995.
“Shared Memory Consistency Models: A Tutorial” By Sarita Adve, Kourosh Gharachorloo WRL Research Report, 1995 Presentation: Vince Schuster.
Memory Consistency Models Alistair Rendell See “Shared Memory Consistency Models: A Tutorial”, S.V. Adve and K. Gharachorloo Chapter 8 pp of Wilkinson.
1 Lecture 12: Hardware/Software Trade-Offs Topics: COMA, Software Virtual Memory.
1 Lecture: Large Caches, Virtual Memory Topics: cache innovations (Sections 2.4, B.4, B.5)
Shared Memory Consistency Models. SMP systems support shared memory abstraction: all processors see the whole memory and can perform memory operations.
Anshul Kumar, CSE IITD CSL718 : Multiprocessors Synchronization, Memory Consistency 17th April, 2006.
Anshul Kumar, CSE IITD ECE729 : Advance Computer Architecture Lecture 26: Synchronization, Memory Consistency 25 th March, 2010.
Memory Consistency Models. Outline Review of multi-threaded program execution on uniprocessor Need for memory consistency models Sequential consistency.
Release Consistency Yujia Jin 2/27/02. Motivations Place partial order on memory accesses for correct parallel program behavior Relax partial order for.
CS 295 – Memory Models Harry Xu Oct 1, Multi-core Architecture Core-local L1 cache L2 cache shared by cores in a processor All processors share.
Memory Consistency Zhonghai Lu Outline Introduction What is a memory consistency model? Who should care? Memory consistency models Strict.
CS533 Concepts of Operating Systems Jonathan Walpole.
1 Lecture 19: Scalable Protocols & Synch Topics: coherence protocols for distributed shared-memory multiprocessors and synchronization (Sections )
1 Lecture 3: Coherence Protocols Topics: consistency models, coherence protocol examples.
1 Lecture 20: Speculation Papers: Is SC+ILP=RC?, Purdue, ISCA’99 Coherence Decoupling: Making Use of Incoherence, Wisconsin, ASPLOS’04.
An Evaluation of Memory Consistency Models for Shared- Memory Systems with ILP processors Vijay S. Pai, Parthsarthy Ranganathan, Sarita Adve and Tracy.
Symmetric Multiprocessors: Synchronization and Sequential Consistency
Lecture 20: Consistency Models, TM
Lecture: Large Caches, Virtual Memory
Memory Consistency Models
Lecture 11: Consistency Models
Memory Consistency Models
Lecture 18: Coherence and Synchronization
Relaxed Consistency models and software distributed memory
Symmetric Multiprocessors: Synchronization and Sequential Consistency
Cache Coherence Protocols 15th April, 2006
Shared Memory Consistency Models: A Tutorial
Lecture 12: TM, Consistency Models
Symmetric Multiprocessors: Synchronization and Sequential Consistency
Lecture: Cache Innovations, Virtual Memory
Presented to CS258 on 3/12/08 by David McGrogan
Lecture 22: Consistency Models, TM
Background for Debate on Memory Consistency Models
Lecture 10: Consistency Models
Memory Consistency Models
Relaxed Consistency Part 2
Lecture 17 Multiprocessors and Thread-Level Parallelism
Lecture 21: Synchronization & Consistency
Lecture: Consistency Models, TM
Lecture 11: Relaxed Consistency Models
Lecture 18: Coherence and Synchronization
Lecture 11: Consistency Models
Presentation transcript:

1 Lecture 12: Relaxed Consistency Models Topics: sequential consistency recap, relaxing various SC constraints, performance comparison

2 Relaxed Memory Models Recall that sequential consistency has two requirements: program order and write atomicity Different consistency models can be defined by relaxing some of the above constraints  this can improve performance, but the programmer must have a good understanding of the program and the hardware

3 Potential Relaxations Program Order: (all refer to different memory locations)  Write to Read program order  Write to Write program order  Read to Read and Read to Write program orders Write Atomicity: (refers to same memory location)  Read others’ write early Write Atomicity and Program Order:  Read own write early

4 Write  Read Program Order Consider three example implementations that relax the write to read program order:  IBM 370: a read can complete before an earlier write to a different address, but a read cannot return the value of a write unless all processors have seen the write  SPARC V8 Total Store Ordering (TSO): a read can complete before an earlier write to a different address, but a read cannot return the value of a write by another processor unless all processors have seen the write (it returns the value of own write before others see it)  Processor Consistency (PC): a read can complete before an earlier write (by any processor to any memory location) has been made visible to all

5 Relaxations  IBM 370: a read can complete before an earlier write to a different address, but a read cannot return the value of a write unless all processors have seen the write  SPARC V8 Total Store Ordering (TSO): a read can complete before an earlier write to a different address, but a read cannot return the value of a write by another processor unless all processors have seen the write (it returns the value of own write before others see it)  Processor Consistency (PC): a read can complete before an earlier write (by any processor to any memory location) has been made visible to all RelaxationW  R Order W  W Order R  RW Order Rd others’ Wr early Rd own Wr early IBM 370X TSOXX PCXXX

6 Examples Initially, A=Flag1=Flag2=0 Initially, A=B=0 P1 P2 P1 P2 P3 Flag1=1 Flag2=1 A=1 A=1 A=2 if (A==1) register1=A register3=A B=1 register2=Flag2 register4=Flag1 if (B==1) register1=A Result: reg1=1;reg3=2;reg2=reg4=0 Result: B=1,reg1=0 RelaxationW  R Order W  W Order R  RW Order Rd others’ Wr early Rd own Wr early IBM 370X TSOXX PCXXX

7 Safety Nets To explicitly enforce sequential consistency, safety nets or fence instructions can be used Note that read-modify-write operations can double up as fence instructions – replacing the read or write with a r-m-w effectively achieves sequential consistency – the read and write of the r-m-w can have no intervening operations and successive reads or successive writes must be ordered in some of the memory models

8 Optimizations Enabled W  R : takes writes off the critical path W  W: memory parallelism (bandwidth utilization) R  WR: non-blocking caches, overlaps other useful work with a read miss

9 Weak Ordering An example of a model that relaxes all of the above constraints (except reading others’ write early) Operations are classified as data and synchronization A counter tracks the number of outstanding data operations and does not issue a synchronization until the counter is zero; data ops cannot begin unless the previous synchronization op has completed

10 Release Consistency RCsc relaxes constraints similar to WO, while RCpc also allows reading others’ writes early More distinctions among memory operations  RCsc maintains SC between special, while RCpc maintains PC between special ops  RCsc maintains orders: acquire  all, all  release, special  special  RCpc maintains orders: acquire  all, all  release, special  special, except for sp.wr followed by sp.rd shared specialordinary syncnsync acquirerelease

11 Programmer Viewpoint Weak ordering will yield high performance, but the programmer has to identify data and synch operations An operation is defined as a synch operation if it forms a race with another operation in any seq. consistent execution Given a seq. consistent execution, an operation forms a race with another operation if the two operations access the same location, at least one of them is a write, and there are no other intervening operations between them P1 P2 Data = 2000 while (Head == 0) { } Head = 1 … = Data

12 Performance Comparison Taken from Gharachorloo, Gupta, Hennessy, ASPLOS’91 Studies three benchmark programs and three different architectures:  MP3D: 3-D particle simulator  LU: LU-decomposition for dense matrices  PTHOR: logic simulator  LFC: aggressive; lockup-free caches, write buffer with bypassing  RDBYP: only write buffer with bypassing  BASIC: no write buffer, no lockup-free caches

13 Performance Comparison

14 Summary Sequential Consistency restricts performance (even more when memory and network latencies increase relative to processor speeds) Relaxed memory models relax different combinations of the five constraints for SC Most commercial systems are not sequentially consistent and rely on the programmer to insert appropriate fence instructions to provide the illusion of SC

15 Title Bullet