
1 Shared Memory – Consistency of Shared Variables
The ideal picture of shared memory: CPU0, CPU1, CPU2, CPU3 all read/write a single shared memory directly.
The actual architecture of shared memory systems, a Symmetric Multi-Processor (SMP): each CPU reads/writes through a local cache, and only misses and cache invalidations reach the shared memory.
Distributed Shared Memory (DSM): each CPU has a local memory module, and the modules communicate over a network.

2 The Million $$s Question: How/When Does One Process Read Another Process's Writes?
CPUi writes value x to its local copy of shared variable V: W V,x.
CPUj then reads V from its local copy: does it get R V,0 or R V,x? (Assumption: the initial value of every shared variable is 0.)
Why is this a question? Because temporal order relations like "before/after" do not necessarily hold in a distributed system.

3 Non-Atomic writes/reads (also called loads/stores)
A read by Pi is considered performed with respect to Pk at a point in time when the issuing of a write to the same address by Pk cannot affect the value returned by the read.
A write by Pi is considered performed with respect to Pk at a point in time when an issued read to the same address by Pk returns the value defined by this write (or a subsequent write to the same location).
An access is performed when it is performed with respect to all processors.
A read is globally performed if it is performed and if the write that is the source of the returned value has been performed.
In what follows we will think of atomic reads/writes, but these definitions generalize to the non-atomic case.

4 Why Memory Model?
Initially: a=0, b=0
P1: Print(b); a=1
P2: Print(a); b=1
Printed: 0,0? Printed: 1,0? Printed: 1,1?
Answers the question: "Which writes by a process are seen by which reads of the other processes?"
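
This question is easy to make concrete. Below is a minimal C++11 sketch of the slide's program; the variable names follow the slide, while the choice of std::memory_order_relaxed is mine, made to expose the weakest behavior a memory model may allow.

```cpp
#include <atomic>
#include <cstdio>
#include <thread>

std::atomic<int> a{0}, b{0};  // shared variables, initially 0

void p1() {
    std::printf("P1 printed b=%d\n", b.load(std::memory_order_relaxed));
    a.store(1, std::memory_order_relaxed);
}

void p2() {
    std::printf("P2 printed a=%d\n", a.load(std::memory_order_relaxed));
    b.store(1, std::memory_order_relaxed);
}

int main() {
    std::thread t1(p1), t2(p2);
    t1.join(); t2.join();
    // Interleaving alone yields 0,0 / 0,1 / 1,0; whether 1,1 is also
    // possible is exactly what the memory model decides.
}
```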

5 Memory Consistency Models
A consistency/memory model is an "agreement" between the execution environment (H/W, OS, middleware) and the processes: the runtime guarantees to the application certain properties on the way values written to shared variables become visible to reads. This determines the memory model: what's valid and what's not.
Example program:
Pi: R V; W V,7; R V; R V
Pj: R V; W V,13; R V; R V
Example execution:
Pi: R V,0; W V,7; R V,7; R V,13
Pj: R V,0; W V,13; R V,13; R V,7
Order of writes to V as seen by Pi: (1) W V,7; (2) W V,13
Order of writes to V as seen by Pj: (1) W V,13; (2) W V,7

6 Memory Model: Coherence
Coherence is the memory model in which (the runtime guarantees to the program that) the writes performed by the processes to each specific variable are viewed by all processes in the same full order.
Example program:
Pi: W V,7; R V
Pj: W V,13; R V
The Register Property: the view of a process consists of the values it "sees" in its reads and the writes it performs. If an R V in process P that is later than P's own W V,x sees a value different from x, then a later R V cannot see x.
All valid executions under Coherence:
(1) Pi: W V,7; R V,7 | Pj: W V,13; R V,13; R V,7
(2) Pi: W V,7; R V,7 | Pj: W V,13; R V,7
(3) Pi: W V,7; R V,7; R V,13 | Pj: W V,13; R V,13
(4) Pi: W V,7; R V,13 | Pj: W V,13; R V,13
(5) Pi: W V,7; R V,7 | Pj: W V,13; R V,13

7 Formal definition of Coherence
Program Order: the order in which instructions appear in each process. This is a partial order on all the instructions in the program.
A serialization: a full order on all the instructions (reads/writes) of all the processes which is consistent with the program order.
A legal serialization: a serialization in which each read of X returns the value written by the latest write to X in the full order.
Let P be a program; let P_X be the "sub-program" of P which contains only the read X/write X operations on X.
Coherence: P is said to be coherent if for every variable X there exists a legal serialization of P_X. (Note: a process cannot distinguish one such serialization from another for a given execution.)
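
The definition of a legal serialization is mechanical enough to check by program. The following helper is a hypothetical sketch of my own (the names Op and isLegal are not from the lecture): it scans a proposed full order of the operations of P_X and verifies that each read of X returns the value of the latest preceding write, with 0 as the initial value per the earlier assumption.

```cpp
#include <vector>

// One operation on the single variable X, as in the sub-program P_X.
struct Op {
    bool isWrite;  // true: "W X,value"; false: "R X" that returned 'value'
    int value;
};

// A serialization of P_X is legal iff every read returns the value
// written by the latest write that precedes it in the full order.
bool isLegal(const std::vector<Op>& serialization) {
    int current = 0;  // shared variables start at 0
    for (const Op& op : serialization) {
        if (op.isWrite) current = op.value;
        else if (op.value != current) return false;
    }
    return true;
}
```

For the non-coherent examples on the next slide, no full order consistent with the program orders passes this check.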

8 Examples
Example 1:
Process 1: read x,1; write y,1
Process 2: read y,1; write x,1
Coherent. Serializations: for x: write x,1; read x,1. For y: write y,1; read y,1.
Example 2:
Process 1: read x,1; write x,2
Process 2: read x,2; write x,1
Not Coherent: a cycle of dependencies; cannot be serialized.
Example 3:
Process 1: write x,1; write x,2
Process 2: read x,2; read x,1
Not Coherent: cannot be serialized.

9 Sequential Consistency [Lamport 1979]
Sequential Consistency is the memory model in which all reads/writes performed by the processes are viewed by all processes in the same full order.
Example 1 (Coherent, but not Sequentially Consistent):
Process 1: write x,1; write y,1
Process 2: read y,1; read x,0
Example 2 (Coherent, but not Sequentially Consistent):
Process 1: read x,1; write y,1
Process 2: read y,1; write x,1

10 Strict (Strong) Memory Models
Initially: a=0, b=0
P1: Print(b); a=1
P2: Print(a); b=1
Under Sequential Consistency the printed pair can be 0,0 or 0,1 or 1,0, but never 1,1.
Sequential Consistency: given an execution, there exists an order of reads/writes which is consistent with all program orders.
Coherence: for any variable x, there exists an order of the reads/writes of x consistent with all program orders.

11 Formal definition of Sequential Consistency
Let P be a program. Sequential Consistency: P is said to be sequentially consistent if there exists a legal serialization of all reads/writes in P.
Observation: every program which is sequentially consistent is also coherent.
Conclusion: Sequential Consistency has stronger requirements, and we thus say that it is stronger than Coherence.
In general: a consistency model A is said to be (strictly) stronger than B if all executions which are valid under A are also valid under B.
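
C++ exposes this model directly: operations on std::atomic default to std::memory_order_seq_cst, which guarantees a single total order of all such accesses, i.e., a legal serialization. A minimal sketch, with variable names of my own choosing:

```cpp
#include <atomic>
#include <cassert>
#include <thread>

std::atomic<int> x{0}, y{0};
int r1 = -1, r2 = -1;

int main() {
    // seq_cst (the default) orders all four accesses in one full order
    // consistent with each thread's program order.
    std::thread t1([] { x.store(1); r1 = y.load(); });
    std::thread t2([] { y.store(1); r2 = x.load(); });
    t1.join(); t2.join();
    // In any legal serialization, at least one load follows the other
    // thread's store, so r1 == 0 && r2 == 0 is impossible.
    assert(r1 == 1 || r2 == 1);
}
```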

12 The problem of strong consistency models
The runtime system should ensure the existence of a legal serialization and the same consistent view for all processes. This requires lots of expensive coordination ⇒ degrades performance!
P1: Print(U); Write V,1
P2: Print(V); Write U,1
Under SC the hardware cannot reorder locally within each thread, since that could make printing 1,1 possible. The hardware may reorder anyway and postpone writes, but if the effect must stay invisible, why reorder in the first place?

13 Coherence Forbids Reordering
Initially p.x = 0, and q.x is aliased to p.x (q == p).
Left thread: p.x = 1
Right thread: a = p.x; b = q.x; assert(a ≤ b)
Once a thread sees an update, it cannot "forget" it has seen it ⇒ two reads of the same memory location cannot be reordered. Reordering could make the assignment to b happen early (seeing 0) and that to a happen late (seeing 1); the right thread would then see an order of writes different from the left thread's.

14 Coherence makes reads prevent common compiler optimizations
p and q might point to the same object. Initially p.x = 0.
Left thread: p.x = 1
Right thread: a = p.x; b = q.x; c = p.x; assert(p==q ⇒ a ≤ b ≤ c)
The compiler cannot replace c = p.x with c = a: reads can make a process see writes by another process, so a read of a shared location "kills" later reuse of local values.
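
The forbidden optimization can be sketched as follows; struct S and the function reader are hypothetical, and the reasoning is about the slide's coherent-memory model (in real C++, such racy plain accesses would be undefined behavior and the field would have to be atomic).

```cpp
struct S { int x; };

// Another thread may concurrently execute p->x = 1.
void reader(S* p, S* q) {
    int a = p->x;   // may see 0 or 1
    int b = q->x;   // if q == p, b may not be "older" than a
    int c = p->x;   // the compiler must NOT rewrite this as c = a:
                    // the read into b may have observed another
                    // process's write, and a later read of the same
                    // location cannot step back to the earlier value.
    // Under coherence: q == p implies a <= b <= c.
    (void)a; (void)b; (void)c;
}
```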

15 Release Consistency [Gharachorloo et al. 1990, DASH]
Introduces a special type of variables, called synchronization variables or locks. Locks cannot be read or written; they can be acquired and released, denoted acquire(L) and release(L) for a lock L. A process that acquired a lock L but has not released it holds it. No more than one process can hold a lock L at a time, while others wait.
K. Gharachorloo, D. Lenoski, J. Laudon, P. Gibbons, A. Gupta, and J. L. Hennessy. Memory consistency and event ordering in scalable shared-memory multiprocessors. In Proceedings of the 17th Annual International Symposium on Computer Architecture, pages 15-26. IEEE, May 1990.

16 Using release and acquire to define execution-flow synchronization primitives
Let one set of processes release tokens by reaching the release operation in their program order, and let another set (possibly overlapping) acquire those tokens by performing the acquire operation, where an acquire can proceed only once the tokens from all releasing processes have arrived.
2-way synchronization = lock-unlock: 1 release, 1 acquire
n-way synchronization = barrier: n releases, n acquires
PARC's synch = k-way synchronization
A sketch of the 2-way case follows below.
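
Here is what the 2-way case can look like with C++ atomics, whose memory_order_release/acquire map closely onto RC's release and acquire; this is a minimal sketch, and the token flag is my own device.

```cpp
#include <atomic>

std::atomic<bool> token{false};  // the "token" passed from releaser to acquirer

void releaser() {
    // ... writes to shared data ...
    token.store(true, std::memory_order_release);  // release the token
}

void acquirer() {
    // proceed only once the token has arrived
    while (!token.load(std::memory_order_acquire)) { /* spin */ }
    // ... reads here see all writes that preceded the release ...
}
```

An n-way barrier follows the same pattern with a counter of arrived tokens in place of a single flag.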

17 Model of Atomicity
A read by Pi is considered performed with respect to process Pk at a point in time when the issuing of a write to the same address by Pk cannot affect the value returned by the read.
A write by Pi is considered performed with respect to process Pk at a point in time when an issued read to the same address by Pk returns the value defined by this write (or a later value).
An access is performed when it is performed with respect to all processes.
An acquire(L) by Pi is performed when Pi receives exclusive ownership of L (before any other requester).
A release(L) by Pi is performed when Pi gives away its exclusive ownership of L.

18 Formal Definition of Release Consistency
Conditions for Release Consistency:
(A) Before a read or write access is allowed to perform with respect to any other process, all previous acquire accesses must be performed, and
(B) Before a release access is allowed to perform with respect to any other process, all previous read or write accesses must be performed, and
(C) acquire and release accesses are sequentially consistent.

19 (Almost Formal) Definition of Release Consistency
Assuming atomic read/write/acquire/release (never the case in practice; assumed here for simplicity only):
(A) Before a read or write access is allowed to perform, all preceding (program order) acquire accesses must be performed, and
(B) Before a release access is allowed to perform, all preceding (program order) read or write accesses must be performed, and
(C) acquire and release accesses are sequentially consistent.

20 Understanding RC
[Timeline: process A performs w(x)1 and then rel(L1); process B performs r(x)0, then r(x)?, then acq(L1), then r(x)1.]
From the point where the release is performed, all processes must see the value 1 in x.
It is undefined what value the r(x)? read returns: it can be any value written by some process, here 0 or 1. By rule (B), 1 happens to be read in the current execution, but the programmer cannot be sure 1 will be read in all executions.
For the r(x)1 following acq(L1), rules (A) and (C) tell the programmer that in all executions this read returns 1.

21 Acquire and Release
release serves as a memory-synch operation, or a flush of the local modifications to all other processes.
acquire and release are used not only for synchronization of execution but also for synchronization of memory, i.e., for propagation of writes from/to other processes.
– This allows overlapping the two expensive kinds of synchronization.
– It also turns out to be semantically simpler for the programmer.

22 Acquire and Release (cont.)
A release followed by an acquire of the same lock guarantees to the programmer that all writes previous to the release will be seen by all reads following the acquire.
The idea is to let the programmer decide which blocks of operations need to be synchronized, and to put them between a matching pair of acquire/release operations.
In the absence of release/acquire pairs, there is no assurance that modifications will ever propagate between processes.

23 Consistency of synchronization operations
Note that the ordering of the release/acquire operations among themselves also defines an independent memory consistency scheme.
– Rule (C) defines it to be Sequential Consistency.
There are other flavors of RC in which the consistency of synchronization operations is defined to be some consistency x (e.g., Coherence); such a memory model is denoted RCx. RCx is weaker than RCy if x is weaker than y. For simplicity, we deal only with RCsc.

24 Happened-Before relation induced by acquire/release
Redefine the happened-before relation using acquire and release instead of receive and send, respectively. We say that event e happened before event e' (denoted e → e' or e < e') if one of the following properties holds:
Processor Order: e precedes e' in the same process
Release-Acquire: e is a release and e' is the following acquire of the same lock
Transitivity: there exists e'' such that e < e'' and e'' < e'
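
Since the relation is the transitive closure of processor-order and release-acquire edges, deciding whether e < e' reduces to graph reachability. A small hypothetical sketch (the names succ and happenedBefore are mine):

```cpp
#include <queue>
#include <vector>

// succ[e] lists e's direct happened-before successors:
// (1) the next event of the same process (processor order), and
// (2) for a release, the following acquire(s) of the same lock.
// Transitivity then becomes plain reachability.
bool happenedBefore(const std::vector<std::vector<int>>& succ, int e, int e2) {
    std::vector<bool> seen(succ.size(), false);
    std::queue<int> work;
    work.push(e);
    while (!work.empty()) {
        int cur = work.front(); work.pop();
        for (int next : succ[cur]) {
            if (next == e2) return true;
            if (!seen[next]) { seen[next] = true; work.push(next); }
        }
    }
    return false;
}
```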

25 Happened-Before relation induced by acquire/release
[Timeline figure: three processes A, B, and C perform interleaved reads/writes of x and y together with acq/rel of locks L1 and L2, illustrating the happened-before edges defined above.]

26 Competing Accesses
Two memory accesses are not synchronized if they are independent (unordered) events according to the happened-before relation defined above.
Two memory accesses are conflicting if they access the same memory location and at least one of them is a write.
Conflicting accesses are said to be competing if there exists an execution in which they are not synchronized. Competing accesses form a race condition, as they may be executed concurrently.

27 Data Races in RC
Release Consistency does not guarantee anything about ordered propagation of updates.
Initially: grades = oldDatabase; updated = false;
Thread T.A.: grades = newDatabase; updated = true;
Thread Lecturer: while (updated == false); X := grades.gradeOf(lecturersSon);
If the modification of the variable updated is passed to Lecturer while the modification of grades is not, then Lecturer looks at the old database! This is possible under Release Consistency, but not under Sequential Consistency.
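
In C++ terms, the cure is to give the flag release/acquire semantics, which forces the write to grades to propagate before the write to updated becomes visible. A sketch, where Database and gradeOf are hypothetical stand-ins for the slide's objects:

```cpp
#include <atomic>

struct Database { int gradeOf(int id) const { return 0; } };  // stand-in

Database oldDatabase, newDatabase;
Database* grades = &oldDatabase;
std::atomic<bool> updated{false};

void threadTA() {
    grades = &newDatabase;
    // release: the write to 'grades' must be visible before 'updated'
    updated.store(true, std::memory_order_release);
}

int threadLecturer(int lecturersSon) {
    // acquire: once 'true' is seen, the write to 'grades' is seen too
    while (!updated.load(std::memory_order_acquire)) { /* spin */ }
    return grades->gradeOf(lecturersSon);  // guaranteed the new database
}
```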

28 Expressiveness of Release Consistency [Gharachorloo et al. 1990]
Let a properly-labeled (PL) program be one that has no competing accesses.
Theorem: RCsc = SC for PL programs.
One should therefore make sure there are no data races.

29 Expressiveness of Release Consistency [Gharachorloo et al. 1990]
Theorem: RC = SC for programs having no data races. Given a data-race-free program P, the set of valid executions for P is the same on systems providing RC as on those providing SC.
Conclusion (OpenMP, Java, C++, etc.): the system provides RC (performance); the programmer avoids data races (program verification) ⇒ best of both worlds!

30 Lazy Release Consistency [Keleher et al., TreadMarks 1992]*
Postpone modifications until the remote process "really" needs them. More relaxed than RC.
(*) P. Keleher, A. L. Cox, S. Dwarkadas, and W. Zwaenepoel. TreadMarks: Distributed shared memory on standard workstations and operating systems. In Proceedings of the 1994 Winter Usenix Conference, pages 115-132, Jan. 1994.

31 Formal Definition of Lazy Release Consistency
(A) Before a read or write access is allowed to perform with respect to any other process, all previous acquire accesses must be performed with respect to that other process, and
(B) Before a release access is allowed to perform with respect to any other process, all previous read or write accesses must be performed with respect to that other process, and
(C) acquire and release accesses are sequentially consistent.

32 Understanding the LRC Memory Model
[Timeline: A performs w(x)1 and then rel(L1); B performs r(x)0 and r(x)?, then acq(L1), after which it reads r(x)1; C performs r(x)0 and r(x)?, then acq(L2), after which its read is still r(x)?.]
It is guaranteed that the acquirer of the same lock sees the modifications that precede the release in program order; C, which acquires a different lock, gets no such guarantee.

33 Understanding the LRC Memory Model: Transitivity
[Timeline: A performs w(x)1 and then rel(L1); B performs acq(L1), then w(y)1 and rel(L2); C performs acq(L2), then reads r(y)1 and r(x)1.]
Process C sees the modification of x by A, transitively through B: B's acq(L1) follows A's rel(L1), and B's rel(L2) precedes C's acq(L2).

34 Implementation of LRC
Satisfying the happened-before relation between all operations is enough to satisfy LRC.
– Maintaining and using such a detailed ordering would be expensive.
Instead, the ordering is applied to process intervals.
– Intervals are segments of time in the execution of a single process.
– A new interval begins each time a process executes a synchronization operation.

35 Intervals
[Timeline figure: processes P1, P2, and P3 each execute synchronization operations (acq/rel of L1, L2, L3); every such operation starts a new numbered interval within its process.]

36 Happened-before of Intervals
A happened-before partial order is defined between intervals. An interval i1 precedes an interval i2 according to happened-before of intervals if all accesses in i1 precede all accesses in i2 according to the happened-before of accesses.

37 Vector Timestamps
An interval is said to be performed at a process if all the interval's accesses have been performed at that process.
Each process p has a vector timestamp Vp that tracks which intervals have been performed at that process.
– A vector timestamp consists of a set of interval indices, one per process in the system.

38 Management of Vector Timestamps
Vector timestamps are managed like vector clocks.
– send and receive events are replaced by release and acquire (of the same lock), respectively.
– A lock grant message (sent from the releaser to the acquirer to hand over exclusive ownership) contains the current timestamp of the releaser.
1. Just before executing a release or acquire, p increments its own entry: Vp[p] := Vp[p] + 1
2. A lock grant message m is time-stamped with t(m) = Vp.
3. Upon acquire, for every q: Vp[q] := max{ Vp[q], t(m)[q] }
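
The three rules translate almost line by line into code. A minimal sketch, assuming a fixed number of processes and out-of-band delivery of lock-grant messages (the class name Process is mine):

```cpp
#include <algorithm>
#include <vector>

struct Process {
    int id;
    std::vector<int> V;  // V[q] = latest interval of process q performed here

    Process(int id_, int n) : id(id_), V(n, 0) {}

    // Rule 1: just before a release or acquire, start a new interval.
    void tick() { V[id] += 1; }

    // Rule 2: the lock grant message carries the releaser's timestamp.
    std::vector<int> release() { tick(); return V; }

    // Rule 3: on acquire, merge the incoming timestamp entrywise.
    void acquire(const std::vector<int>& t) {
        tick();
        for (std::size_t q = 0; q < V.size(); ++q)
            V[q] = std::max(V[q], t[q]);
    }
};
```

A release by A followed by an acquire by B of the same lock is then simply: auto t = A.release(); B.acquire(t);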

39 Vector Timestamps (cont.)
A process updates its vector timestamp at the end of an interval. We denote the vector timestamp of process p at interval i by Vp^i; the entry for process q ≠ p is denoted Vp^i[q].
– It specifies the most recent interval of process q that has been performed at process p.
– The entry Vp^i[p] is always equal to i.
An interval x of process q is said to be covered by Vp^i if Vp^i[q] ≥ x.
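
Continuing the sketch above, the covered predicate is a one-liner:

```cpp
#include <vector>

// Interval x of process q is covered by timestamp V iff V[q] >= x:
// everything q did up to and including interval x has been performed here.
bool covered(const std::vector<int>& V, int q, int x) { return V[q] >= x; }
```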

