LAWS OF ORDER: EXPENSIVE SYNCHRONIZATION IN CONCURRENT ALGORITHMS CANNOT BE ELIMINATED
POPL '11
Hagit Attiya, Rachid Guerraoui, Danny Hendler, Petr Kuznetsov, Maged M. Michael, Martin Vechev
Presenter: Michael Gorelik
CONTENTS
Motivation
RAW & AWAR patterns
Relaxed Memory Models
Mutual Exclusion (examples)
Linearizability (examples)
Relaxed semantics (examples)
MOTIVATION
Building correct and efficient concurrent algorithms is known to be difficult. To achieve efficiency, designers spend significant time trying to remove unnecessary and costly synchronization.
MOTIVATION: RAW & AWAR PATTERNS
Two common synchronization patterns that frequently arise in the design of concurrent algorithms are:
read after write (RAW)
atomic write after read (AWAR)
MOTIVATION: EXPENSIVE RMW OPERATIONS
We will see that many expensive synchronization operations, such as locks, CAS, and fences, use the RAW or AWAR pattern. These operations are much slower than regular reads and writes (sometimes 50 times slower).
MOTIVATION: MUTUAL EXCLUSION AND LINEARIZABILITY
If we are to build a mutual exclusion algorithm or a linearizable algorithm, then in certain sequential executions of that algorithm we must use either RAW or AWAR. If no execution of the algorithm uses RAW or AWAR, then the algorithm is incorrect.
MOTIVATION: AVOIDING EXPENSIVE SYNCHRONIZATION
When can we avoid the RAW and AWAR patterns? When do we have no choice other than to use them?
CONTENTS
Motivation
RAW & AWAR patterns
Sequential consistency
Relaxed Memory Models
Mutual Exclusion (examples)
Linearizability (examples)
Relaxed semantics (examples)
RAW AND AWAR PATTERN: RAW (READ AFTER WRITE)
The RAW pattern consists of a process writing to some shared variable A, followed by the same process reading a different shared variable B, without that process writing to B in between.
Timeline: write to A, then read from B.
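The RAW pattern can be sketched in C++ as follows. This is a minimal illustration, not code from the paper: the variables A and B follow the slide, and the function name write_a_then_read_b is hypothetical.

```cpp
#include <atomic>

// Hypothetical shared variables, named after the A and B on the slide.
std::atomic<int> A{0};
std::atomic<int> B{0};

// RAW: write to A, then read a different variable B, with no write to B in between.
// Both accesses use the default std::memory_order_seq_cst, so the load of B
// cannot appear to happen before the store to A.
int write_a_then_read_b(int value) {
    A.store(value);   // write to A
    return B.load();  // read from B
}
```

With weaker memory orders on the store and the load, the hardware would be free to let the read of B overtake the write to A, which is exactly the ordering that RAW-based algorithms rely on.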
RAW AND AWAR PATTERN: AWAR (ATOMIC WRITE AFTER READ)
The AWAR pattern consists of a process reading some shared variable, followed by the same process writing to the same shared variable. The entire read-write sequence is atomic (a read-modify-write, RMW).
Timeline: read from A, then write to A, as one atomic step.
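Two common C++ operations that follow the AWAR pattern are sketched below. The variable A follows the slide; the function names are hypothetical and chosen only for this illustration.

```cpp
#include <atomic>

// Hypothetical shared variable A.
std::atomic<int> A{0};

// AWAR via fetch_add: the old value of A is read and the new value is
// written in one atomic step.
int read_then_increment() {
    return A.fetch_add(1);
}

// AWAR via compare-and-swap: A is read and, only if it still holds
// `expected`, overwritten with `desired`. A successful call reads and
// writes A in one atomic step; a failed call only reads.
bool compare_and_swap(int expected, int desired) {
    return A.compare_exchange_strong(expected, desired);
}
```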
MEMORY CONSISTENCY MODEL
A memory consistency model for a shared address space specifies constraints on the order in which memory operations must appear to be performed.
(A and flag are initially zero)
P1: A = 1; flag = 1;
P2: while (flag == 0); print A;
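The P1/P2 example can be written in C++ roughly as below. This is a sketch under assumptions: flag is made a std::atomic with release/acquire ordering so that the program is well defined, P2 stores the value it reads instead of printing it, and the function name run_flag_example is hypothetical.

```cpp
#include <atomic>
#include <thread>

int A = 0;                  // plain data, as on the slide
std::atomic<int> flag{0};   // assumption: the flag is atomic

// Runs P1 and P2 once and returns the value of A that P2 observed.
int run_flag_example() {
    A = 0;
    flag.store(0);
    int seen = -1;

    std::thread p2([&seen] {
        // Spin until P1 raises the flag.
        while (flag.load(std::memory_order_acquire) == 0) { }
        seen = A;  // stands in for "print A"
    });
    std::thread p1([] {
        A = 1;
        flag.store(1, std::memory_order_release);  // publishes the write to A
    });

    p1.join();
    p2.join();
    return seen;
}
```

The release store and the acquire load constrain the order in which the two writes of P1 become visible to P2, so P2 reads 1. With a plain, non-atomic flag the C++ memory model gives no such guarantee.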
RELAXED MEMORY MODELS: PROGRAM ORDER
Intuitively, a read should return the value of the “last” write to the same memory location. In uniprocessors, “last” is precisely defined by program order, i.e., the order in which memory operations appear in the program. This is not the case in multiprocessors.
RELAXED MEMORY MODELS: SEQUENTIAL CONSISTENCY
An intuitive extension of the uniprocessor model can be applied to the multiprocessor case. This model is called sequential consistency. Sequential consistency requires that all memory operations appear to execute one at a time, and that the operations of a single processor appear to execute in the order described by that processor’s program.
RELAXED MEMORY MODELS: SEQUENTIAL CONSISTENCY
By enforcing a strict order among shared memory operations, sequential consistency disallows many hardware and compiler optimizations that are possible in uniprocessors.
VIOLATION OF SEQUENTIAL CONSISTENCY: EXAMPLES
We will see how familiar optimizations in use today can violate sequential consistency:
Buffer writing
Caching
Compiler optimization (not demonstrated here)
VIOLATION OF SEQUENTIAL CONSISTENCY: BUFFER WRITING
The write strategy is an important part of cache design. A buffering scheme is frequently used to reduce the overhead associated with write operations.
VIOLATION OF SEQUENTIAL CONSISTENCY: DEKKER’S ALGORITHM (BUFFER WRITING)
Each write is buffered, so both processes read the value 0. Mutual exclusion is violated.
VIOLATION OF SEQUENTIAL CONSISTENCY: DEKKER’S ALGORITHM (BUFFER WRITING)
Which pattern (AWAR or RAW) have we seen here? RAW.
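The buffered-write scenario can be replayed deterministically with a toy model. The sketch below is an assumption-laden simulation, not real hardware behavior: each processor has a one-entry write buffer, and all names (Machine, PendingWrite, dekker_entry) are hypothetical.

```cpp
#include <utility>

// One pending write per processor: a deliberately tiny model of a write buffer.
struct PendingWrite {
    bool valid;
    int addr;
    int value;
};

struct Machine {
    int memory[2];           // memory[0] = flag[0], memory[1] = flag[1]
    PendingWrite buffer[2];  // buffer[p] belongs to processor p

    Machine() : memory{0, 0}, buffer{{false, 0, 0}, {false, 0, 0}} {}

    // A write goes into the local buffer, not into memory.
    void write(int p, int addr, int value) {
        buffer[p] = PendingWrite{true, addr, value};
    }

    // A read sees the processor's own buffered write, otherwise memory.
    int read(int p, int addr) const {
        if (buffer[p].valid && buffer[p].addr == addr) return buffer[p].value;
        return memory[addr];
    }

    // Draining the buffer models a fence: the pending write reaches memory.
    void drain(int p) {
        if (buffer[p].valid) {
            memory[buffer[p].addr] = buffer[p].value;
            buffer[p].valid = false;
        }
    }
};

// Each processor sets its own flag and then reads the other one (RAW).
// Returns the values read by processor 0 and processor 1.
std::pair<int, int> dekker_entry(bool fence_after_write) {
    Machine m;
    m.write(0, 0, 1);
    if (fence_after_write) m.drain(0);
    m.write(1, 1, 1);
    if (fence_after_write) m.drain(1);
    int r0 = m.read(0, 1);  // processor 0 reads flag[1]
    int r1 = m.read(1, 0);  // processor 1 reads flag[0]
    m.drain(0);
    m.drain(1);
    return std::make_pair(r0, r1);
}
```

In this model, dekker_entry(false) yields (0, 0): both processors read 0 and would both enter the critical section. Draining the buffer between the write and the read, as in dekker_entry(true), makes each processor see the other's flag.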
VIOLATION OF SEQUENTIAL CONSISTENCY: CACHING
Updates for the writes of A by processors P1 and P2 may reach processors P3 and P4 in different orders. P3 and P4 can then return different values for their reads of A, making the writes of A appear non-atomic (write(1) followed by write(2) vs. write(2) followed by write(1)).
RELAXED MEMORY MODELS
Modern processor architectures use relaxed memory models, where guaranteeing RAW order among accesses to independent memory locations requires the execution of memory ordering instructions (often called memory fences or memory barriers) that enforce RAW order.
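In C++, such a fence can be requested with std::atomic_thread_fence. The sketch below shows a Dekker-style entry attempt in which the write and the read are relaxed and the fence between them enforces the RAW order. The names want, try_enter and leave are hypothetical, and the back-off policy is an assumption of this example.

```cpp
#include <atomic>

// want[i] is true while process i is trying to enter or is inside.
std::atomic<bool> want[2] = {{false}, {false}};

// One entry attempt for process `me` (0 or 1). Returns true on success.
bool try_enter(int me) {
    want[me].store(true, std::memory_order_relaxed);       // write own flag
    std::atomic_thread_fence(std::memory_order_seq_cst);   // enforce RAW order
    if (want[1 - me].load(std::memory_order_relaxed)) {    // read other flag
        want[me].store(false, std::memory_order_relaxed);  // back off
        return false;
    }
    return true;
}

void leave(int me) {
    want[me].store(false, std::memory_order_release);
}
```

When both processes run try_enter concurrently, the two fences ensure that at least one of them observes the other's flag, so they cannot both succeed. Without the fence, both relaxed loads could return false.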
RELAXED MEMORY CONSISTENCY MODELS
MUTUAL EXCLUSION
Mutual exclusion: multiple processes cannot be in their critical sections at the same time.
We will show that whenever a process has sequentially executed its lock section, this execution must use RAW or AWAR. Otherwise, the algorithm does not satisfy the mutual exclusion specification and is incorrect.
First we will show that a process has to write to shared memory.
Then we will show that mutual exclusion fails when RAW and AWAR are avoided.
N-PROCESS MUTUAL EXCLUSION: WITHOUT WRITING TO SHARED MEMORY
Process i: Lock_i: … CS_i: … Unlock_i: …
Process j: Lock_j: … CS_j: … Unlock_j: …
Process i does not write to shared memory.
Without writes to shared memory, process j has no way to know where process i is.
Mutual exclusion fails.
N-PROCESS MUTUAL EXCLUSION: WITHOUT USING RAW AND AWAR
Process i: Lock_i: … CS_i: … Unlock_i: …
Process j: Lock_j: … CS_j: … Unlock_j: …
Process i stops before writing to shared memory X (it is not using AWAR).
Process j performs a full sequential execution of its lock_j (process i still has not written to shared memory).
Process i resumes its lock_i section and performs the shared write to X, overwriting any changes to X made by process j. If process j used other shared memory, process i cannot read that location without using RAW.
Mutual exclusion fails.
N-PROCESS MUTUAL EXCLUSION: EXAMPLES
One of the most common lock implementations is based on the test-and-set atomic sequence.
Which obvious pattern (RAW or AWAR) can be seen here?
Lock_i: while (CAS(lock, free, busy) == false);
AWAR
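A CAS-based spin lock along the lines of the slide might look like this in C++. It is a minimal sketch: the names lock_word, acquire_lock, release_lock and count_with_two_threads are hypothetical, and the two-thread counter exists only to exercise the lock.

```cpp
#include <atomic>
#include <thread>

enum LockState { FREE = 0, BUSY = 1 };

std::atomic<int> lock_word{FREE};
long shared_counter = 0;  // protected by lock_word

// Spin until the CAS moves the lock from FREE to BUSY (the AWAR step).
void acquire_lock() {
    int expected = FREE;
    while (!lock_word.compare_exchange_weak(expected, BUSY,
                                            std::memory_order_acquire)) {
        expected = FREE;  // a failed CAS overwrote `expected` with the observed value
    }
}

void release_lock() {
    lock_word.store(FREE, std::memory_order_release);
}

// Two threads each increment the counter `per_thread` times under the lock.
long count_with_two_threads(int per_thread) {
    shared_counter = 0;
    auto work = [per_thread] {
        for (int n = 0; n < per_thread; ++n) {
            acquire_lock();
            ++shared_counter;
            release_lock();
        }
    };
    std::thread a(work);
    std::thread b(work);
    a.join();
    b.join();
    return shared_counter;
}
```

Every successful lock acquisition goes through one compare-and-swap, so every sequential execution of the lock section contains an AWAR.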
2-PROCESS MUTUAL EXCLUSION: EXAMPLES
Dekker’s mutual exclusion algorithm for 2 processes is also a type of lock implementation.
Which obvious pattern (RAW or AWAR) can be seen here?
Lock_i: flag[i] = true; while (flag[1-i]) {….}
RAW
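The same write-own-flag, read-other-flag shape appears in Peterson's algorithm, a close relative of Dekker's. The sketch below uses Peterson's algorithm rather than Dekker's, because the loop body on the slide is elided and is not reconstructed here. All names (victim, lock_2p, unlock_2p, count_with_peterson) are hypothetical, and every atomic access uses the default sequentially consistent ordering.

```cpp
#include <atomic>
#include <thread>

std::atomic<bool> flag[2] = {{false}, {false}};  // flag[i]: process i wants to enter
std::atomic<int> victim{0};                      // who waits on a tie (Peterson's addition)
long shared_counter = 0;                         // protected by the lock

void lock_2p(int i) {
    flag[i].store(true);  // write own flag ...
    victim.store(i);
    // ... then read the other process's flag: the RAW pattern.
    while (flag[1 - i].load() && victim.load() == i) {
        std::this_thread::yield();
    }
}

void unlock_2p(int i) {
    flag[i].store(false);
}

// Two threads (ids 0 and 1) each increment the counter `per_thread` times.
long count_with_peterson(int per_thread) {
    shared_counter = 0;
    auto work = [per_thread](int id) {
        for (int n = 0; n < per_thread; ++n) {
            lock_2p(id);
            ++shared_counter;
            unlock_2p(id);
        }
    };
    std::thread t0(work, 0);
    std::thread t1(work, 1);
    t0.join();
    t1.join();
    return shared_counter;
}
```

No atomic read-modify-write is used, so the lock section avoids AWAR, but the store to flag[i] followed by the load of flag[1 - i] is a RAW that must not be reordered.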
LINEARIZABILITY: AN INTUITIVE DEFINITION
An algorithm is linearizable with respect to a sequential specification if each execution of the algorithm is equivalent to some sequential execution of the specification, where the order between the non-overlapping methods is preserved. The equivalence is defined by comparing the arguments and results of method invocations.
LINEARIZABILITY: EXAMPLE (A QUEUE)
[Figure: a timeline of the operations q.enq(x), q.enq(y), q.deq(x) and q.deq(y), together with an equivalent sequential order; the execution is labeled linearizable.]
LINEARIZABILITY: USE OF RAW OR AWAR
In the case of linearizability, only some sequential executions of specific methods must use either RAW or AWAR. This is unlike mutual exclusion, where all sequential executions of a certain method (i.e., the lock section) must use either RAW or AWAR.
Two properties are defined:
Deterministic sequential specification
Strongly non-commutative methods
LINEARIZABILITY: DETERMINISTIC SEQUENTIAL SPECIFICATIONS
A sequential specification is deterministic if a method executed from the same state always produces the same result.
Many classic abstract data types have deterministic specifications: sets, queues, etc.
LINEARIZABILITY: STRONGLY NON-COMMUTATIVE METHODS
A method m1 is said to be strongly non-commutative if there exists some state in the specification from which m1, executed sequentially by process p, can influence the result of a method m2 executed sequentially by process q, q ≠ p, and vice versa: m2 can influence the result of m1 from the same state.
m1 and m2 are performed by different processes.
LINEARIZABILITY: STRONGLY NON-COMMUTATIVE METHODS, EXAMPLE (SET)
Sequential specification of Set:
Contains(k)
Add(k)
Remove(k)
LINEARIZABILITY: STRONGLY NON-COMMUTATIVE METHODS, EXAMPLE (SET)
Add(k): is it a strongly non-commutative method? Yes.
Starting from a set that does not contain 5:
If P1 runs first: P1: Add(5): true, P2: Add(5): false
If P2 runs first: P2: Add(5): true, P1: Add(5): false
There exists another method invocation (here a second Add(5)) such that both invocations influence each other’s result starting from some state.
LINEARIZABILITY: STRONGLY NON-COMMUTATIVE METHODS, EXAMPLE (SET)
Contains(k): is it a strongly non-commutative method? No.
Starting from a set that does not contain 5:
If P1 runs first: P1: Add(5): true, P2: Contains(5): true
If P2 runs first: P2: Contains(5): false, P1: Add(5): true
The result of the contains method can be influenced by a preceding add or remove. However, its execution cannot influence the result of the other methods.
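The two Set examples can be checked against a small sequential specification. The sketch below is illustrative only: SeqSet and the helper functions are hypothetical names, and std::set stands in for the abstract set state.

```cpp
#include <set>

// Sequential specification of the Set from the slides.
struct SeqSet {
    std::set<int> items;
    bool add(int k) { return items.insert(k).second; }  // true if k was absent
    bool remove(int k) { return items.erase(k) > 0; }   // true if k was present
    bool contains(int k) const { return items.count(k) > 0; }
};

// Result of Add(k) on an empty set, optionally preceded by another Add(k).
bool add_result(int k, bool other_add_first) {
    SeqSet s;
    if (other_add_first) s.add(k);
    return s.add(k);
}

// Result of Add(k) on an empty set, optionally preceded by Contains(k).
bool add_result_with_contains(int k, bool contains_first) {
    SeqSet s;
    if (contains_first) s.contains(k);
    return s.add(k);
}

// Result of Contains(k) on an empty set, optionally preceded by Add(k).
bool contains_result(int k, bool add_first) {
    SeqSet s;
    if (add_first) s.add(k);
    return s.contains(k);
}
```

A preceding Add(5) flips the result of another Add(5) from true to false, whichever process goes first, so Add is strongly non-commutative. A preceding Contains(5) never changes the result of Add(5), although Add(5) does change the result of Contains(5): the influence goes one way only.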
LINEARIZABILITY: USE OF RAW OR AWAR, INFORMAL PROOF
If a method is strongly non-commutative, then any of its sequential executions must perform a shared write. Why?
Otherwise, there is no way for the method to influence the result of any other method that is executed after it, and hence the method cannot be strongly non-commutative.
LINEARIZABILITY: ADD(K) (RAW AND AWAR ARE NOT PRESENT), CONT'D
Process i: Add(k): {… Return res}
Process j: Add(k): {… Return res}
Process i stops before writing to shared memory X (it is not using AWAR).
Process j performs a full sequential execution of Add(k) and returns true (process i still has not written to shared memory).
Process i resumes and performs the shared write to X, overwriting any changes to X made by process j. If process j used other shared memory, process i cannot read that location without using RAW. Process i returns true.
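LINEARIZABILITY: ADD(K) (RAW AND AWAR ARE NOT PRESENT), CONT'D
If the algorithm is linearizable, there can be only two valid linearizations of a concurrent execution of Add(k) by P1 and P2:
Linearization 1: P1's Add(k), then P2's Add(k)
Linearization 2: P2's Add(k), then P1's Add(k)
In either linearization the two calls return different results (one true, one false). An execution in which both calls return true matches neither, so Add() would not be deterministic.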
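The interleaving from the proof can be replayed step by step on a toy Add(k) that uses neither RAW nor AWAR. This is a hypothetical single-key set invented for the illustration (NaiveSet, read_phase and write_phase are not from the paper): Add reads the shared state and later writes it, but the read and the write are two separate, non-atomic steps.

```cpp
#include <utility>

// Hypothetical one-key set: `present` records whether k is in the set.
struct NaiveSet {
    bool present = false;

    // First half of Add(k): a plain read of the shared state.
    bool read_phase() const { return present; }

    // Second half of Add(k): a plain write, based on what was read earlier.
    bool write_phase(bool seen) {
        if (seen) return false;  // k was already there
        present = true;
        return true;
    }
};

// Process i stalls between its read and its write while j runs to completion.
// Returns (result of i, result of j).
std::pair<bool, bool> interleaved_adds() {
    NaiveSet s;
    bool seen_i = s.read_phase();        // i reads, then stops before writing
    bool seen_j = s.read_phase();        // j runs Add(k) start to finish
    bool res_j = s.write_phase(seen_j);
    bool res_i = s.write_phase(seen_i);  // i resumes and overwrites
    return std::make_pair(res_i, res_j);
}

// The same two calls, one after the other.
std::pair<bool, bool> sequential_adds() {
    NaiveSet s;
    bool res_i = s.write_phase(s.read_phase());
    bool res_j = s.write_phase(s.read_phase());
    return std::make_pair(res_i, res_j);
}
```

In the interleaved run both calls return true, which no sequential order of two Add(k) calls can produce. Making the read and write one atomic step (AWAR), or having i re-read shared state after its write (RAW), is what rules this execution out.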
LINEARIZABILITY: EXAMPLES (CAS)
CAS(m, o, n) can be implemented trivially with a linearizable algorithm that uses an atomic hardware instruction (also called CAS), and in that case it includes the AWAR pattern (just as was shown in the test-and-set case).
CAS can also be implemented by a linearizable algorithm that avoids AWAR but uses RAW (Luchangco et al.).
Bool WFCAS(Val ev, Val nv) {
    if (ev == nv) return WFRead() == ev;
    Blk b = L;
    b.X = p;
    if (b.Y) goto 27;
    ….
RELAXED SEMANTICS: PRACTICAL IMPLICATIONS
How is it still possible to avoid RAW and AWAR?
By relaxing one or more of the following dimensions:
Deterministic specification
Strong non-commutativity
Single-owner
Execution detectors
RELAXED SEMANTICS: EXAMPLES (IDEMPOTENT WORK STEALING)
Idempotent work stealing: from deterministic to non-deterministic specification.
Relaxing the deterministic specification is exemplified by the idempotent work stealing introduced by Michael et al., which allows each inserted item to be extracted at least once rather than exactly once.
This relaxation makes it possible to avoid RAW and AWAR in the owner’s methods.
RELAXED SEMANTICS: EXAMPLES (IDEMPOTENT WORK STEALING), CONT'D
WorkItem take() {
    h = head;
    t = tail;
    if (h == t) return EMPTY;            // nothing to take
    task = tasks.array[h % tasks.size];  // read the task at the head
    head = h + 1;                        // plain write, no CAS
    return task;
}
RELAXED SEMANTICS: EXAMPLES (FIFO QUEUE)
In examining concurrent algorithms for multi-consumer FIFO queues, one notes that either locking or CAS is used in the common path of nontrivial dequeue methods that return a dequeued item.
We have shown that locking uses either RAW or AWAR.
We have also shown that CAS uses either RAW or AWAR.
RELAXED SEMANTICS: EXAMPLES (FIFO QUEUE)
FIFO queue: from multi-consumer to single-consumer (the single-owner relaxation). Dequeue can be executed by only a single process, and therefore there is no need for RAW or AWAR.
Lamport’s FIFO queue:
Data dequeue() {
    if (tail == head) return EMPTY;
    Data data = Q[head mod m];
    head = (head + 1) mod m;
    return data;
}
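A single-producer, single-consumer ring buffer in the style of Lamport's queue can be sketched in C++ as follows. This is an illustration under assumptions, not the slide's code: the class name LamportQueue, the release/acquire orderings, the bool-returning interface and the helper transfer_in_order are all choices made for this example.

```cpp
#include <atomic>
#include <cstddef>
#include <thread>
#include <vector>

// Ring buffer for exactly one producer thread and one consumer thread.
// It holds at most M - 1 items.
template <typename T, std::size_t M>
class LamportQueue {
    T slots[M];
    std::atomic<std::size_t> head{0};  // written only by the consumer
    std::atomic<std::size_t> tail{0};  // written only by the producer

public:
    bool enqueue(const T& value) {
        std::size_t t = tail.load(std::memory_order_relaxed);
        std::size_t next = (t + 1) % M;
        if (next == head.load(std::memory_order_acquire)) return false;  // full
        slots[t] = value;
        tail.store(next, std::memory_order_release);  // publish the slot
        return true;
    }

    // Reads head and tail, reads the slot, then writes head:
    // no atomic read-modify-write and no read after the write.
    bool dequeue(T& out) {
        std::size_t h = head.load(std::memory_order_relaxed);
        if (h == tail.load(std::memory_order_acquire)) return false;  // empty
        out = slots[h];
        head.store((h + 1) % M, std::memory_order_release);
        return true;
    }
};

// Sends 1..n through the queue and checks that they arrive in order.
bool transfer_in_order(int n) {
    LamportQueue<int, 8> q;
    std::vector<int> received;
    std::thread producer([&q, n] {
        for (int i = 1; i <= n; ++i) {
            while (!q.enqueue(i)) std::this_thread::yield();
        }
    });
    std::thread consumer([&q, &received, n] {
        int v = 0;
        while (static_cast<int>(received.size()) < n) {
            if (q.dequeue(v)) received.push_back(v);
            else std::this_thread::yield();
        }
    });
    producer.join();
    consumer.join();
    for (int i = 0; i < n; ++i) {
        if (received[i] != i + 1) return false;
    }
    return true;
}
```

The design only works because head has a single writer. With several consumers, two dequeues could read the same head and return the same item, which is where locking or CAS, and with them RAW or AWAR, come back in.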
QUESTIONS?