Shared Memory Consistency Models

Quiz (1)
• Let’s define shared memory

We often use figures like this…
• But perhaps shared memory is not about how CPUs/memory are wired…
[figure: several CPUs wired to a single memory]

Is this shared memory?
[figure: CPUs and memory modules connected through a crossbar switch]

And we have a cache, too
• Is this still a “shared” memory?
[figure: CPUs, each with a cache ($), in front of a memory]

Observation
• Defining shared memory in terms of how CPUs and memory are physically organized does not seem feasible
• Moreover, it is not necessary either, at least from a program’s point of view

From a program’s point of view
• What matters is the behavior of the memory as observed by programs
• Roughly, if a value written by one process can be seen by another process, they share a memory, no matter how this behavior is implemented
• We try to define shared memory along this idea

Defining shared memory by its behavior
• We try to define the possible behaviors (the outcomes of read operations) of a memory system in the presence of processes concurrently accessing it
• We call such a set of allowed behaviors the “consistency model” of the shared memory

But why are we bothered? (1)
• Otherwise we can never (formally) reason about the correctness of shared-memory programs
• An implementation of a shared memory (whether in HW or SW) needs such a definition, too, to draw the boundary between legitimate optimizations and illegal ones

But why are we bothered? (2)
• What we (most of us) consider “the natural definition” of shared memory turns out to be very difficult to implement efficiently
– caches (replicas) make implementations far from trivial
– many optimizations violate the natural behavior
• Most parts of most shared-memory programs can work with more relaxed behaviors

But why are we bothered? (3)
• Therefore many definitions of consistency models have been invented and implemented
• They are called relaxed consistency models, relaxed memory models, etc.

Sequential consistency
• The first formally defined behavior of shared memory, by Lamport
– Lamport, “How to Make a Multiprocessor Computer That Correctly Executes Multiprocess Programs,” IEEE Trans. Computers, Vol. C-28, No. 9, Sept. 1979
• Presumably most of us consider it natural
• Before defining it, let’s see how natural it is

Quiz (2)
• What are the possible outputs? List all.
Initially: x = 0; y = 0;
P: x = 1; printf("y = %d\n", y);
Q: y = 1; printf("x = %d\n", x);

Which of the following four outcomes (x, y) are possible?
(0, 0), (0, 1), (1, 0), (1, 1)

(0, 0) seems impossible…
• P: x = 1; read y;
• Q: y = 1; read x;
• Possible orderings:
x = 1; read y; y = 1; read x; → (1, 0)
x = 1; y = 1; read y; read x; → (1, 1)
x = 1; y = 1; read x; read y; → (1, 1)
y = 1; x = 1; read y; read x; → (1, 1)
y = 1; x = 1; read x; read y; → (1, 1)
y = 1; read x; x = 1; read y; → (0, 1)

Or more concisely,
• P: x = 1; read y;
• Q: y = 1; read x;
• If P reads zero, then y = 1 by Q comes after P’s read of y. The only possible sequence in this case is: x = 1; read y; y = 1; read x → Q reads 1
• Thus (x, y) = (0, 0) cannot happen

By the way,
• This is the basis of a classical mutual exclusion algorithm found in OS textbooks

/* Entry section for P1 */
Q1 := True; TURN := 1;
wait while Q2 and TURN = 1;
/* Exit section for P1 */
Q1 := False;

/* Entry section for P2 */
Q2 := True; TURN := 2;
wait while Q1 and TURN = 2;
/* Exit section for P2 */
Q2 := False;

• Somewhat outdated material
– it no longer works under relaxed models
– today’s CPUs have more direct support for mutual exclusion (compare-and-swap, LL/SC, etc.)

Back to the subject
• The assumption underlying the above discussion is the very definition of “sequential consistency”

Definition of sequential consistency (preliminary)
• Processes access memory by issuing:
– a = x /* write x to variable a */
– a /* read from variable a */
• An execution of a program generates events of the following two kinds:
– WRITE_P(a, x) /* P writes x to variable a */
– READ_P(a, x) /* P reads x from variable a */
• We use “processes” and “processors” interchangeably

Definition
• A shared memory is sequentially consistent (SC) iff for any execution, there is a total order < among all READ/WRITE events such that:
– if a process P performs e1 before e2, then e1 < e2 (the order preserves program order)
– for each READ_P(a, x), if WRITE_Q(a, y) is the last write to a before it in the total order, then x = y (a read returns the last write)

Informally, it says:
• to reason about the possible outcomes of a program, interleave all reads/writes in all possible ways, and assume each read gets the value of the last write to the location it reads
[figure: P’s and Q’s accesses interleaved into a single sequence]

So far so good
• We will see that a reasonable optimization easily breaks SC
• Let’s assume we are implementing a shared-memory multiprocessor with two CPUs, each with a cache
[figure: two CPUs with caches ($) connected to memory]

• Recall the previous program, and assume both CPUs cache x and y
– main memory is not important in this example
[figure: both caches hold x=0, y=0]

• P writes 1 to x. It will need to update (or invalidate) the other cache
[figure: P’s cache: x=1, y=0; Q’s cache: x=0, y=0]

• A processor does not want to block while the update/invalidation is in progress (a reasonable optimization for an architect)
• P may then read 0 from y in its cache
[figure: P’s cache: x=1, y=0; Q’s cache: x=0, y=0]

• Q may experience a similar sequence and read 0 from x in its cache
[figure: P’s cache: x=1, y=0; Q’s cache: x=0, y=1]

• We ended up with both processors reading zeros
• This violates SC
[figure: both caches eventually hold x=1, y=1, yet both reads returned 0]

Looking back (1)
• P writes 1 to x (in P’s cache)
• P sends an update msg to Q
• P reads 0 from y (in P’s cache)
• Q writes 1 to y (in Q’s cache)
• Q sends an update msg to P
• Q reads 0 from x (in Q’s cache)
• P receives the update msg and writes 1 to y (in P’s cache)
• Q receives the update msg and writes 1 to x (in Q’s cache)

Looking back (2)
• In intuitive terms, “a write is not atomic,” because a single write must update multiple locations (caches)
• The definition of SC (a total order among R/W events) can be interpreted as saying “a write is atomic”

What if we do not have caches?
• Assume there are no caches, but there are multiple memory modules
• Assume there is no single bus that serializes every access
[figure: P and Q access x and y stored in separate memory modules]

How to fix it (one possibility)
• A write by processor P first gains exclusive access to the bus
• P sends an update/invalidate msg to the other cache
• The other cache replies with an acknowledgement after updating the variable
• P blocks (issues no further memory accesses) until it receives the acknowledgement
• P updates its cache and releases the bus
• Essentially, we really serialize all accesses

Illustrated
• During (1) and (2), P blocks (stalls)
– the bus is not granted for other accesses
[figure: P sends (1) update/invalidate to Q; Q replies with (2) ack]

Can you prove this implements SC?
• For simplicity, assume
– no main memory (caches only)
– data are always in both caches
– an update protocol: a write sends the new value to the other cache
• Reads never miss; a read immediately returns the value currently in the cache

Outline
• Model the protocol as a distributed-memory (asynchronous message-passing) program
– define the relevant events (acquire_bus, recv_update, recv_ack, release_bus, read); call them micro-events
– an execution of the protocol generates a total order of these micro-events
• From the execution, construct a total order of READs/WRITEs satisfying the definition of SC

Relaxed Memory Consistency Models
• Many “weaker” consistency models have been proposed
– for multiprocessors, software shared memory, programming languages, file systems,...
• They are generically called “relaxed memory consistency” models

Models in the literature
• processor consistency
• total store order, partial store order, relaxed memory ordering
• weak consistency
• release consistency
• lazy release consistency
• ...

How they are generally defined
• Which memory accesses may be reordered
– a processor Q may observe another processor P’s writes in an order different from the one in which P issued them
• Whether writes are atomic
– processors Q and R may observe another processor P’s writes differently from each other

Memory barrier
• Processors not supporting SC usually have separate instructions to enforce the ordering/completion of memory accesses
• usually called “memory barrier” or “fence” instructions
– sfence, lfence, mfence (Pentium)
– membar (SPARC)
– wmb, mb (Alpha)
– etc.

Variants
• Different instructions enforce ordering between different kinds (load/store) of memory accesses
– e.g., SPARC “membar #StoreLoad” ensures following loads do not bypass previous stores
– e.g., Pentium “lfence” ensures following loads do not bypass previous loads

Semantics of memory barrier
[figure: a stream of R/W accesses with a membar in the middle]
• if processor P issues “a; membar; b” in this order, another processor Q will observe a before b
• all membar events are totally ordered, and the order preserves program order

In implementation terms
• membar stalls the processor until all previous accesses have completed
– e.g., until in-transit loads have returned values and in-transit cache invalidations have been acknowledged

Memory consistency for programming languages
• So far we have been dealing with the semantics of “processors” (or machine languages)
• Ideally, all programming languages should define precise consistency models too, but they rarely do

Today’s common practice (1): C/C++
• “you know which expressions access memory”
– *p, p->x, p[0],...
• but they are not actually trivial at all!
– global variables
– non-pointer structs
– optimizations eliminating memory accesses
• Programmers somehow control/predict them by inserting volatile, etc.

Today’s common practice (2): most high-level languages
• Do not write programs for which subtle consistency semantics matter
– use only supported idioms (mutexes, condition variables,...) for synchronization, to guarantee there are no races
• What if there are races?
– undefined (rarely stated explicitly)

High-level languages
• What are races?
– conflicting accesses to the same data
• What are conflicting accesses?
– accesses not separated by supported synchronization idioms (unlock -> lock, cond_signal -> cond_wait), where at least one of them is a write

The third way: Java
• We will see a presentation on this in the last week
• Java has “synchronized” (locks) and wait/notify (condition variables)
– used for most synchronization operations
• At the same time, Java also defines the behavior under races (its memory consistency model)
– discussion in the community revealed how intricate it is