Memory Consistency Models Kevin Boos

Two Papers
- Shared Memory Consistency Models: A Tutorial – Sarita V. Adve & Kourosh Gharachorloo – September 1995 (all figures are taken from this paper)
- Memory Models: A Case for Rethinking Parallel Languages and Hardware – Sarita V. Adve & Hans-J. Boehm – August 2010

Roadmap
- Memory Consistency Primer
- Sequential Consistency
  - Implementation without caches
  - Implementation with caches
  - Compiler issues
- Relaxed Consistency

What is Memory Consistency?

Memory Consistency
- A formal specification of memory semantics
- Guarantees how shared memory will behave in the presence of multiple processors/nodes
- Ordering of reads and writes
- How does it appear to the programmer?

Why Bother?
- Memory consistency models affect everything
  - Programmability
  - Performance
  - Portability
- The model must be defined at all levels
- Programmers and system designers care

Uniprocessor Systems
- Memory operations occur:
  - One at a time
  - In program order
- A read returns the value of the last write
  - Only matters if the location is the same or dependent
- Many possible optimizations
- Intuitive!

Sequential Consistency

- The result of any execution is the same as if all operations were executed on a single processor
- Operations on each processor occur in the sequence specified by the executing program

[Figure: processors P1, P2, P3, …, Pn all connected to a single shared memory]

Why do we need S.C.?
Initially, Flag1 = Flag2 = 0

P1:                  P2:
Flag1 = 1            Flag2 = 1
if (Flag2 == 0)      if (Flag1 == 0)
    enter CS             enter CS
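The flag protocol above can be sketched with C++11 atomics (an assumption of mine; the slides use pseudocode, not C++). Default atomic loads and stores are `memory_order_seq_cst`, i.e., sequentially consistent, so the outcome where both processes read the other's flag as 0 and both enter the critical section is impossible. The names `run_once` and `in_cs` are illustrative.

```cpp
#include <atomic>
#include <thread>

std::atomic<int> flag1{0}, flag2{0};
std::atomic<int> in_cs{0};  // how many threads entered the critical section

void p1() {
    flag1.store(1);               // Flag1 = 1
    if (flag2.load() == 0)        // if (Flag2 == 0)
        in_cs.fetch_add(1);       //     enter critical section
}

void p2() {
    flag2.store(1);               // Flag2 = 1
    if (flag1.load() == 0)        // if (Flag1 == 0)
        in_cs.fetch_add(1);       //     enter critical section
}

// Run one race. Under sequential consistency, at most one thread can
// enter the critical section (possibly neither, if each sees the
// other's flag already set).
int run_once() {
    flag1 = 0; flag2 = 0; in_cs = 0;
    std::thread t1(p1), t2(p2);
    t1.join(); t2.join();
    return in_cs.load();
}
```

With weaker orderings (or on hardware with write buffers and no fences, as the later slides show), both flag reads could return 0 and both threads would enter the critical section.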

Why do we need S.C.?
Initially, A = B = 0

P1:       P2:             P3:
A = 1     if (A == 1)     if (B == 1)
              B = 1           register1 = A

Implementing Sequential Consistency (without caches)

Write Buffers

P1:                  P2:
Flag1 = 1            Flag2 = 1
if (Flag2 == 0)      if (Flag1 == 0)
    enter CS             enter CS

Overlapping Writes

P1:                P2:
Data = 2000        while (Head == 0) {;}
Head = 1           ... = Data

Non-Blocking Reads

P1:                P2:
Data = 2000        while (Head == 0) {;}
Head = 1           ... = Data
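As a hedged sketch (the structure and names are mine, not the papers'), C++11 atomics can express the Data/Head handshake so that it stays correct even on hardware that overlaps writes or lets reads bypass them: a release store on Head paired with an acquire load on Head forbids exactly the reorderings these slides describe.

```cpp
#include <atomic>
#include <thread>

int data = 0;              // ordinary memory ("Data" in the slides)
std::atomic<int> head{0};  // the flag ("Head" in the slides)

void producer() {
    data = 2000;                                // Data = 2000
    head.store(1, std::memory_order_release);   // Head = 1; publishes data
}

int consumer() {
    while (head.load(std::memory_order_acquire) == 0) { /* spin */ }
    return data;   // the acquire load guarantees we observe data == 2000
}

// Run the handshake once and return what the consumer observed.
int run_handshake() {
    data = 0; head = 0;
    int seen = 0;
    std::thread c([&] { seen = consumer(); });
    std::thread p(producer);
    p.join(); c.join();
    return seen;
}
```

If both accesses were plain (non-atomic) variables, the compiler or hardware could reorder them and the consumer could read stale Data, which is precisely the failure the slide illustrates.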

Implementing Sequential Consistency (with caches)

Cache Coherence
- A mechanism to propagate updates from one (local) cache copy to all other (remote) cache copies
  - Invalidate vs. Update
- Coherence vs. Consistency?
  - Coherence: ordering of operations at a single location
  - Consistency: ordering of operations across multiple locations
- The consistency model places bounds on propagation

Write Completion

P1 (write-through cache):    P2 (has "Data" in its cache):
Data = 2000                  while (Head == 0) {;}
Head = 1                     ... = Data

Write Atomicity
- Propagating changes among caches is non-atomic

P1:       P2:       P3:                    P4:
A = 1     A = 2     while (B != 1) {;}     while (B != 1) {;}
B = 1     C = 1     while (C != 1) {;}     while (C != 1) {;}
                    register1 = A          register2 = A

Does register1 == register2?

Write Atomicity
Initially, all caches contain A and B

P1:       P2:             P3:
A = 1     if (A == 1)     if (B == 1)
              B = 1           register1 = A

Compilers
- Compilers make many optimizations

P1:                P2:
Data = 2000        while (Head == 0) { }
Head = 1           ... = Data

Sequential Consistency … wrapping things up …

Overview of S.C.
- Program Order
  - A processor's previous memory operation must complete before the next one can begin
- Write Atomicity (cache systems only)
  - Writes to the same location must be seen by all other processors in the same order
  - A read must not return the value of a write until that write has been propagated to all processors
  - Write acknowledgements are necessary

S.C. Disadvantages
- Difficult to implement!
- Huge lost potential for optimizations
  - Hardware (cache) and software (compiler)
  - Must be conservative: err on the safe side
- Major performance hit

Relaxed Consistency

Relaxed Consistency
- Program Order relaxations (different locations)
  - W → R; W → W; R → R/W
- Write Atomicity relaxations
  - A read returns another processor's write early
- Combined relaxations
  - Read your own write (okay for S.C.)
- Safety net: available synchronization operations
- Note: assume one thread per core

Comparison of Models

Write → Read
- Can be reordered: same processor, different locations
- Hides write latency
- Different processors? Same location?
  1. IBM 370
     - Any write must be fully propagated before reading
  2. SPARC V8 – Total Store Ordering (TSO)
     - A processor can read its own write before that write is fully propagated
     - Cannot read other processors' writes before full propagation
  3. Processor Consistency (PC)
     - Any write can be read before being fully propagated

Example: Write → Read

P1:          P2:
F1 = 1       F2 = 1
A = 1        A = 2
Rg1 = A      Rg3 = A
Rg2 = F2     Rg4 = F1

Possible result (TSO and PC): Rg1 = 1, Rg3 = 2, Rg2 = 0, Rg4 = 0

P1:       P2:             P3:
A = 1     if (A == 1)     if (B == 1)
              B = 1           Rg1 = A

Possible result (PC only): Rg1 = 0, B = 1
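The left-hand outcome is the classic "store buffering" pattern. A hedged sketch (simplified to one variable per processor; names are mine): with default seq_cst atomics the outcome r1 == 0 && r2 == 0 is forbidden, exactly as S.C. demands, whereas with relaxed orderings, or plain stores on TSO hardware, it can occur.

```cpp
#include <atomic>
#include <thread>

std::atomic<int> x{0}, y{0};
int r1 = 0, r2 = 0;  // each written by exactly one thread, read after join

void writer_reader_1() { x.store(1); r1 = y.load(); }  // W x; R y (seq_cst)
void writer_reader_2() { y.store(1); r2 = x.load(); }  // W y; R x (seq_cst)

// Under sequential consistency, whichever store comes last in the total
// order has already been preceded by the other store, so at least one
// load must observe a 1: (r1, r2) == (0, 0) is impossible.
bool run_sb() {
    x = 0; y = 0; r1 = 0; r2 = 0;
    std::thread a(writer_reader_1), b(writer_reader_2);
    a.join(); b.join();
    return r1 + r2 >= 1;
}
```

Changing the operations to `memory_order_relaxed` reintroduces the W → R reordering this slide is about, and (0, 0) becomes a legal outcome.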

Write → Write
- Can be reordered: same processor, different locations
- Multiple writes can be pipelined/overlapped
  - May reach other processors out of program order
- Partial Store Ordering (PSO)
  - Similar to TSO
  - A processor can read its own write early
  - Cannot read other processors' writes early

Example: Write → Write

P1:                P2:
Data = 2000        while (Head == 0) {;}
Head = 1           ... = Data

PSO: not sequentially consistent … can we fix that?

P1:                     P2:
Data = 2000             while (Head == 0) {;}
STBAR  // write barrier
Head = 1                ... = Data
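SPARC's STBAR has a close C++ analogue: a release fence between two relaxed stores, paired with an acquire fence on the reader side, restores the Data-before-Head ordering under PSO-style reordering. A hedged sketch with names of my own choosing:

```cpp
#include <atomic>
#include <thread>

std::atomic<int> data2{0};
std::atomic<int> head2{0};

void producer() {
    data2.store(2000, std::memory_order_relaxed);          // Data = 2000
    std::atomic_thread_fence(std::memory_order_release);   // ~ STBAR
    head2.store(1, std::memory_order_relaxed);             // Head = 1
}

int consumer() {
    while (head2.load(std::memory_order_relaxed) == 0) { } // spin on Head
    std::atomic_thread_fence(std::memory_order_acquire);   // pairs with the
    return data2.load(std::memory_order_relaxed);          // release fence
}

// Run the fenced handshake once and return what the consumer observed.
int run_fenced() {
    data2 = 0; head2 = 0;
    int seen = 0;
    std::thread c([&] { seen = consumer(); });
    std::thread p(producer);
    p.join(); c.join();
    return seen;
}
```

The release fence before the Head store and the acquire fence after the Head load synchronize-with each other, so the consumer is guaranteed to see Data = 2000; without the fences, relaxed stores may reach P2 out of program order, just as under PSO.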

Relaxing All Program Orders

Read → Read/Write
- All program orders have been relaxed
- Hides both read and write latency
- The compiler can finally take advantage
- All models: a processor can read its own write early
- Some models: can read others' writes early
  - RCpc, PowerPC
- Most models ensure write atomicity
  - Except RCsc

Weak Ordering (WO)
- Classifies memory operations into two categories:
  - Data operations
  - Synchronization operations
- Can only enforce Program Order with sync operations:
  data, data, sync, data, data, sync
- Sync operations are effectively safety nets
- Write atomicity is guaranteed (to the programmer)

Release Consistency
- More classifications than Weak Ordering
- Sync operations access a shared location (lock)
  - Acquire: a read operation on a shared location
  - Release: a write operation on a shared location

[Figure: classification tree — shared ops split into ordinary and special; special splits into nsync and sync; sync splits into acquire and release]

R.C. Flavors

RCsc
- Maintains sequential consistency among "special" operations
- Program Order rules:
  - acquire → all
  - all → release
  - special → special

RCpc
- Maintains processor consistency among "special" operations
- Program Order rules:
  - acquire → all
  - all → release
  - special → special (except special W → special R)
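Acquire and release map directly onto C++'s memory_order_acquire and memory_order_release. A hedged sketch of a minimal spinlock (my own construction, not taken from the papers): acquiring the lock is a read-modify-write with acquire semantics, releasing it is a write with release semantics, matching R.C.'s "acquire = read on a shared location, release = write on a shared location".

```cpp
#include <atomic>
#include <thread>

class SpinLock {
    std::atomic<bool> locked{false};
public:
    void acquire() {  // read-modify-write on the shared lock location
        while (locked.exchange(true, std::memory_order_acquire)) { /* spin */ }
    }
    void release() {  // write on the shared lock location
        locked.store(false, std::memory_order_release);
    }
};

// Two threads increment an ordinary (non-atomic) counter under the lock.
// The acquire/release ordering on the lock makes the increments race-free,
// even though everything between acquire and release is plain data access.
int run_counter(int iters) {
    SpinLock lock;
    int counter = 0;
    auto work = [&] {
        for (int i = 0; i < iters; ++i) {
            lock.acquire();
            ++counter;
            lock.release();
        }
    };
    std::thread a(work), b(work);
    a.join(); b.join();
    return counter;  // always 2 * iters
}
```

This mirrors the R.C. idea that ordinary operations need no ordering of their own as long as the special (acquire/release) operations are ordered correctly.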

Other Relaxed Models
- Similar relaxations as WO and RC
- Different types of safety nets (fences)
  - Alpha: MB and WMB
  - SPARC V9 RMO: MEMBAR with a 4-bit encoding
  - PowerPC: SYNC
    - Like MEMBAR, but does not guarantee R → R (use isync)
- These models all guarantee write atomicity
  - Except PowerPC, the most relaxed model of all
  - Allows a write to be seen early by another processor's read

Relaxed Consistency … wrapping things up …

Relaxed Consistency Overview
- Sequential Consistency ruins performance
  - Why assume that the hardware knows better than the programmer?
- Less strict rules = more optimizations
- The compiler works best with all Program Order requirements relaxed
  - WO, RC, and others give it full flexibility
- Puts more power into the hands of programmers and compiler designers
  - With great power comes great responsibility

A Programmer’s View
- Sequential Consistency is (clearly) the easiest
- Relaxed Consistency is (dangerously) powerful
- Programmers must properly classify operations
  - Data/sync operations when using WO, RCsc, or RCpc
  - Can't classify? Use manual memory barriers
  - Must be conservative: forego optimizations
- High-level languages try to abstract the intricacies

P1:                P2:
Data = 2000        while (Head == 0) {;}
Head = 1           ... = Data

Final Thoughts

Concluding Remarks
- Memory consistency models affect everything
- Sequential Consistency
  - Ensures Program Order & Write Atomicity
  - Intuitive and easy to use
  - Constrains the implementation: few optimizations, bad performance
- Relaxed Consistency
  - Doesn't ensure Program Order
  - Added complexity for programmers and compilers
  - Allows more optimizations, better performance
  - A wide variety of models offers maximum flexibility

Modern Times
- Multiple threads per core
  - What can threads see, and when?
- Cache levels and optimizations

Questions?