LAWS OF ORDER: EXPENSIVE SYNCHRONIZATION IN CONCURRENT ALGORITHMS CANNOT BE ELIMINATED (POPL '11). Hagit Attiya, Rachid Guerraoui, Danny Hendler, Petr Kuznetsov, …


LAWS OF ORDER: EXPENSIVE SYNCHRONIZATION IN CONCURRENT ALGORITHMS CANNOT BE ELIMINATED. POPL '11. Hagit Attiya, Rachid Guerraoui, Danny Hendler, Petr Kuznetsov, Maged M. Michael, Martin Vechev. Presenter: Michael Gorelik. 1

CONTENTS: Motivation; RAW & AWAR patterns; Relaxed Memory Models; Mutual Exclusion: Examples; Linearizability: Examples; Relaxed semantics: Examples. 2

MOTIVATION. Building correct and efficient concurrent algorithms is known to be difficult. To achieve efficiency, designers spend significant time trying to remove unnecessary and costly synchronization. 3

MOTIVATION: RAW & AWAR PATTERNS. Two common synchronization patterns that frequently arise in the design of concurrent algorithms are read after write (RAW) and atomic write after read (AWAR). 4

MOTIVATION: EXPENSIVE RMW OPERATIONS. We will see that many of the expensive synchronization operations, such as locks, CAS, and fences, use RAW or AWAR patterns. These operations are much slower than regular reads and writes (sometimes 50 times slower). 5

MOTIVATION: MUTUAL EXCLUSION AND LINEARIZABILITY. If we are to build a mutual exclusion algorithm or a linearizable algorithm, then certain sequential executions of that algorithm must use either RAW or AWAR. If none of those executions uses RAW or AWAR, the algorithm is incorrect. 6

MOTIVATION: AVOIDING EXPENSIVE SYNCHRONIZATION. When can we avoid the RAW and AWAR patterns, and when do we have no choice other than to use them? 7

CONTENTS: Motivation; RAW & AWAR patterns; Sequential consistency; Relaxed Memory Models; Mutual Exclusion: Examples; Linearizability: Examples; Relaxed semantics: Examples. 8

RAW AND AWAR PATTERNS: RAW (READ AFTER WRITE). The RAW pattern consists of a process writing to some shared variable A, followed by the same process reading a different shared variable B, without that process writing to B in between. [Figure: timeline showing "Write to A" followed by "Read from B".] 9

RAW AND AWAR PATTERNS: AWAR (ATOMIC WRITE AFTER READ). The AWAR pattern consists of a process reading some shared variable followed by the same process writing to that variable, where the entire read-write sequence is atomic (a read-modify-write, RMW). [Figure: timeline showing "Read from A" followed by "Write to A".] 10
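To make the two definitions concrete, here is a minimal sketch that scans one process's sequence of shared-memory operations for each pattern. The `Op` record and the trace encoding (with `"rmw"` marking an atomic read-modify-write) are hypothetical, chosen only for illustration; the RAW check follows the definition literally: a write to A, a later read of a different B, and no write to B in between.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Op:
    kind: str  # "read", "write", or "rmw" (an atomic read-modify-write)
    var: str   # name of the shared variable

def has_raw(trace):
    """RAW: a write to some A, later a read of a different B, with no write
    to B by this process in between. Only plain reads/writes are considered."""
    last_write = {}  # variable -> index of this process's latest write to it
    for i, op in enumerate(trace):
        if op.kind == "read":
            # Latest write to any variable other than the one being read.
            others = [j for v, j in last_write.items() if v != op.var]
            if others and max(others) > last_write.get(op.var, -1):
                return True
        elif op.kind == "write":
            last_write[op.var] = i
    return False

def has_awar(trace):
    """AWAR: a read and a write of the same variable performed atomically."""
    return any(op.kind == "rmw" for op in trace)

# Lock entry sections from the later slides, as single-process traces.
dekker_entry = [Op("write", "flag0"), Op("read", "flag1")]
cas_lock_entry = [Op("rmw", "lock")]
```

Under this encoding, Dekker's entry (`flag[i] = true; while (flag[1-i]) …`) is flagged as RAW, while a CAS-based lock entry is flagged as AWAR. A non-atomic read followed by a write of the same variable is neither.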

MEMORY CONSISTENCY MODEL. A memory consistency model for a shared address space specifies constraints on the order in which memory operations must appear to be performed. Example (A and flag are initially zero):
    P1:            P2:
    A = 1;         while (flag == 0);
    flag = 1;      print A;
11
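One way to see what sequential consistency promises for the P1/P2 example (A and flag initially zero) is to enumerate every interleaving that respects each process's program order and run it against a single shared memory. The sketch below does that in Python; as a simplifying assumption, P2's spin loop is unrolled to a single read of flag, so an outcome with flag = 0 simply means P2 would keep spinning.

```python
from itertools import combinations

# The slide's example, with P2's spin loop unrolled to a single read of flag.
P1 = [("write", "A", 1), ("write", "flag", 1)]
P2 = [("read", "flag"), ("read", "A")]

def interleavings(p, q):
    """All merges of p and q that keep each process's program order."""
    n = len(p) + len(q)
    for slots in combinations(range(n), len(p)):
        it_p, it_q = iter(p), iter(q)
        yield [next(it_p) if i in slots else next(it_q) for i in range(n)]

def run_sc(schedule):
    """Execute one interleaving against a single shared memory (SC)."""
    mem = {"A": 0, "flag": 0}
    seen = []
    for op in schedule:
        if op[0] == "write":
            mem[op[1]] = op[2]
        else:
            seen.append(mem[op[1]])
    return tuple(seen)  # (flag as read by P2, A as read by P2)

sc_outcomes = {run_sc(s) for s in interleavings(P1, P2)}
```

The outcome (1, 0), in which P2 leaves the loop and still prints 0, never appears: under sequential consistency, seeing flag = 1 implies that the earlier write A = 1 is visible too.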

RELAXED MEMORY MODELS: PROGRAM ORDER. Intuitively, a read should return the value of the "last" write to the same memory location. In uniprocessors, "last" is precisely defined by program order, i.e., the order in which memory operations appear in the program. This is not the case in multiprocessors. 12

RELAXED MEMORY MODELS: SEQUENTIAL CONSISTENCY. An intuitive extension of the uniprocessor model can be applied to the multiprocessor case. This model is called sequential consistency. Sequential consistency requires that all memory operations appear to execute one at a time, and that the operations of a single processor appear to execute in the order described by that processor's program. 13

RELAXED MEMORY MODELS: SEQUENTIAL CONSISTENCY. Sequential consistency disallows many hardware and compiler optimizations that are possible in uniprocessors by enforcing a strict order among shared memory operations. 14

VIOLATION OF SEQUENTIAL CONSISTENCY: EXAMPLES. We will see violations of sequential consistency caused by familiar hardware optimizations that exist today: write buffering, caching, and compiler optimization (not demonstrated here). 15

VIOLATION OF SEQUENTIAL CONSISTENCY: WRITE BUFFERING. Write strategy is an important part of cache design. Write buffers are frequently used to reduce the overhead associated with write operations. 16

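VIOLATION OF SEQUENTIAL CONSISTENCY: DEKKER'S ALGORITHM (WRITE BUFFERING). Each process's write to its own flag is held in its write buffer, so both processes read the value 0 for the other process's flag. Both enter the critical section: mutual exclusion is violated. 17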
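The buffered-write scenario can be reproduced with a small exhaustive simulation of Dekker's entry step (write my flag, then read the other flag; enter if it was 0). The model is an assumption made for illustration, loosely TSO-like: each process has a one-entry store buffer, and its drain to memory ("F") may happen after its read unless a fence forces the drain first. The names `local_orders`, `run`, and `outcomes` are hypothetical.

```python
from itertools import combinations, product

def interleavings(p, q):
    """All merges of p and q that keep each process's own order."""
    n = len(p) + len(q)
    for slots in combinations(range(n), len(p)):
        it_p, it_q = iter(p), iter(q)
        yield [next(it_p) if i in slots else next(it_q) for i in range(n)]

def local_orders(pid, fence):
    """Per-process event orders. "W" buffers flag[pid] = 1, "F" drains the
    store buffer to memory, "R" reads flag[1 - pid] from memory.
    Without a fence the drain may happen after the read; a fence forbids it."""
    orders = [["W", "F", "R"]]
    if not fence:
        orders.append(["W", "R", "F"])
    return [[(pid, e) for e in order] for order in orders]

def run(schedule):
    memory = [0, 0]          # flag[0], flag[1], both initially 0
    pending = [None, None]   # each process's store buffer (one entry here)
    seen = [None, None]      # value of the other flag each process read
    for pid, e in schedule:
        if e == "W":
            pending[pid] = 1
        elif e == "F":
            memory[pid], pending[pid] = pending[pid], None
        else:
            # The other process's flag is never in our own buffer, so the
            # read always comes from shared memory.
            seen[pid] = memory[1 - pid]
    return tuple(seen)

def outcomes(fence):
    results = set()
    for o0, o1 in product(local_orders(0, fence), local_orders(1, fence)):
        for schedule in interleavings(o0, o1):
            results.add(run(schedule))
    return results
```

Without a fence, (0, 0) is reachable: both processes read 0 and both enter the critical section. Forcing the drain before the read (a fence between the write and the read, i.e., enforcing RAW order) removes that outcome.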

VIOLATION OF SEQUENTIAL CONSISTENCY: DEKKER'S ALGORITHM (WRITE BUFFERING). Which pattern (AWAR or RAW) have we seen here? RAW. 18

VIOLATION OF SEQUENTIAL CONSISTENCY: CACHING. 19

VIOLATION OF SEQUENTIAL CONSISTENCY: CACHING. Updates for the writes of A by processors P1 and P2 may reach processors P3 and P4 in different orders. Processors P3 and P4 can then return different values for their reads of A, making the writes of A appear non-atomic (write(1) followed by write(2) vs. write(2) followed by write(1)). 20

RELAXED MEMORY MODELS. Modern processor architectures use relaxed memory models, in which guaranteeing RAW order among accesses to independent memory locations requires executing memory ordering instructions (often called memory fences or memory barriers) that enforce RAW order. 21

RELAXED MEMORY CONSISTENCY MODELS 22

CONTENTS: Motivation; RAW & AWAR patterns; Sequential consistency; Relaxed Memory Models; Mutual Exclusion: Examples; Linearizability: Examples; Relaxed semantics: Examples. 23

MUTUAL EXCLUSION. Mutual exclusion: no two processes may be in their critical sections at the same time. We will show that whenever a process sequentially executes its lock section, that execution must use RAW or AWAR; otherwise, the algorithm does not satisfy the mutual exclusion specification and is incorrect. First we show that a process has to write to shared memory; then we show that mutual exclusion fails when RAW and AWAR are avoided. 24

N-PROCESS MUTUAL EXCLUSION WITHOUT WRITING TO SHARED MEMORY. Process i: Lock_i: … CS_i: … Unlock_i: …; process j: Lock_j: … CS_j: … Unlock_j: …. Suppose process i does not write to shared memory. Then there is no way for process j to know where process i is: both can complete their lock sections and be in their critical sections at the same time, so mutual exclusion fails. 25

N-PROCESS MUTUAL EXCLUSION WITHOUT USING RAW AND AWAR. Process i: Lock_i: … CS_i: … Unlock_i: …; process j: Lock_j: … CS_j: … Unlock_j: …. (1) Process i runs Lock_i and stops just before writing to a shared variable X (it does not use AWAR, so this write is not part of an atomic read-modify-write). (2) Process j performs a full sequential execution of Lock_j; process i has not yet written to shared memory, so j cannot detect it. (3) Process i resumes Lock_i and performs the shared write to X, overwriting any changes to X made by process j. If process j used other shared locations, process i cannot read them after its write without using RAW. Process i therefore also completes Lock_i, and both processes are in their critical sections: mutual exclusion fails. 26

N-PROCESS MUTUAL EXCLUSION: EXAMPLES. One of the most common lock implementations is based on the test-and-set atomic sequence. Which pattern (RAW or AWAR) is obvious here? Lock_i: while (CAS(lock, free, busy) == false) ; Answer: AWAR. 27
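A runnable sketch of the CAS-based lock above, assuming Python threads. Python exposes no user-level hardware CAS, so the hypothetical `AtomicWord` class stands in for one by guarding the read, compare, and write with an internal `threading.Lock`; that guard plays the role of the atomic read-modify-write (the AWAR pattern), and is not how a real CAS instruction is implemented.

```python
import threading
import time

class AtomicWord:
    """A shared word with compare-and-swap. A hardware CAS performs the
    read, compare, and write as one atomic instruction (AWAR); this sketch
    stands in for that with an internal threading.Lock."""
    def __init__(self, value):
        self._value = value
        self._guard = threading.Lock()

    def cas(self, expected, new):
        with self._guard:  # read + compare + write as one indivisible step
            if self._value == expected:
                self._value = new
                return True
            return False

    def store(self, value):
        with self._guard:
            self._value = value

FREE, BUSY = 0, 1

class SpinLock:
    def __init__(self):
        self._word = AtomicWord(FREE)

    def lock(self):
        # Lock_i from the slide: while (CAS(lock, free, busy) == false) ;
        while not self._word.cas(FREE, BUSY):
            time.sleep(0)  # yield so the lock holder can run

    def unlock(self):
        self._word.store(FREE)

def run_counter_demo(n_threads=4, n_iters=1000):
    """Increment a shared counter under the spin lock from several threads."""
    lock, count = SpinLock(), [0]

    def worker():
        for _ in range(n_iters):
            lock.lock()
            count[0] += 1  # critical section
            lock.unlock()

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return count[0]
```

`run_counter_demo(4, 1000)` should return 4000, since every increment happens inside the critical section. The `time.sleep(0)` in the spin loop is a CPython courtesy that lets the holder run; it is not part of the lock's logic.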

2-PROCESS MUTUAL EXCLUSION: EXAMPLES. Dekker's mutual exclusion algorithm for two processes is also a lock implementation. Which pattern (RAW or AWAR) is obvious here? Lock_i: flag[i] = true; while (flag[1-i]) {…} Answer: RAW. 28

CONTENTS: Motivation; RAW & AWAR patterns; Sequential consistency; Relaxed Memory Models; Mutual Exclusion: Examples; Linearizability: Examples; Relaxed semantics: Examples. 29

LINEARIZABILITY: AN INTUITIVE DEFINITION. An algorithm is linearizable with respect to a sequential specification if each execution of the algorithm is equivalent to some sequential execution of the specification in which the order between non-overlapping method invocations is preserved. Equivalence is defined by comparing the arguments and results of method invocations. 30

LINEARIZABILITY: EXAMPLE (A QUEUE). [Figure: timelines of overlapping q.enq(x), q.enq(y), q.deq(x), q.deq(y) invocations; the history is labeled linearizable.] 31
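For small histories, the definition can be checked by brute force: try every ordering of the completed operations, keep those that preserve the order of non-overlapping operations, and replay each against the sequential FIFO queue specification. The history encoding below (method, value, invocation time, response time) is a hypothetical choice for illustration, and the checker ignores dequeues of an empty queue.

```python
from collections import deque
from itertools import permutations

# An operation is (method, value, invoke_time, response_time); for "enq" the
# value is the argument, for "deq" it is the value returned.

def respects_real_time(order):
    """If operation a responded before b was invoked, a must come first."""
    for i, a in enumerate(order):
        for b in order[i + 1:]:
            if b[3] < a[2]:
                return False
    return True

def legal_fifo(order):
    """Replay the order against the sequential FIFO queue specification."""
    q = deque()
    for method, value, _, _ in order:
        if method == "enq":
            q.append(value)
        elif not q or q.popleft() != value:
            return False
    return True

def is_linearizable(history):
    return any(respects_real_time(p) and legal_fifo(p)
               for p in permutations(history))

# Overlapping enqueues followed by overlapping dequeues: linearizable.
h_ok = [("enq", "x", 0, 2), ("enq", "y", 1, 3),
        ("deq", "x", 4, 6), ("deq", "y", 5, 7)]
# Sequential history whose dequeues come out in LIFO order: not linearizable.
h_bad = [("enq", "x", 0, 1), ("enq", "y", 2, 3),
         ("deq", "y", 4, 5), ("deq", "x", 6, 7)]
```

Note that if the two enqueues overlap, dequeuing y before x is still linearizable, because the enqueues may be linearized in either order. The cost is factorial in the history length, so this is only a teaching aid.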

LINEARIZABILITY: USE OF RAW OR AWAR. In the case of linearizability, only some sequential executions of specific methods must use either RAW or AWAR. This is unlike mutual exclusion, where all sequential executions of a certain method (i.e., the lock section) must use either RAW or AWAR. Two properties are defined: deterministic sequential specifications and strongly non-commutative methods. 32

LINEARIZABILITY: DETERMINISTIC SEQUENTIAL SPECIFICATIONS. A sequential specification is deterministic if a method executed from the same state always produces the same result. Many classic abstract data types have deterministic specifications: sets, queues, etc. 33

LINEARIZABILITY: STRONGLY NON-COMMUTATIVE METHODS. A method m1 is strongly non-commutative if there exist a method m2 and some state in the specification from which m1, executed sequentially by process p, can influence the result of m2, executed sequentially by process q (q ≠ p), and vice versa: m2 can influence the result of m1 from the same state. 34

LINEARIZABILITY: STRONGLY NON-COMMUTATIVE METHODS, EXAMPLE (SET). Sequential specification of a set: Contains(k), Add(k), Remove(k). 35

LINEARIZABILITY: STRONGLY NON-COMMUTATIVE METHODS, EXAMPLE (SET). Is Add(k) a strongly non-commutative method? Yes. Starting from a set that does not contain 5: if P1's Add(5) runs first, P1 gets true and P2's Add(5) gets false; if P2's Add(5) runs first, P1 gets false and P2 gets true. So there exists a method (here Add itself, invoked by another process) such that both invocations influence each other's result starting from some state. 36

LINEARIZABILITY: STRONGLY NON-COMMUTATIVE METHODS, EXAMPLE (SET). Is Contains(k) a strongly non-commutative method? No. Starting from a set that does not contain 5: if P1's Add(5) runs first, P1 gets true and P2's Contains(5) gets true; if P2's Contains(5) runs first, it gets false and P1's Add(5) still gets true. The result of Contains can be influenced by a preceding Add or Remove, but its execution cannot influence the result of any other method. 37
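The two set examples can be checked mechanically against the sequential specification. In the sketch below, each method maps a state and a key to (new state, result); "m1 influences m2 from state s" is taken to mean that running m1 first changes the result m2 would return from s. As a simplifying assumption, both methods act on the same key, since operations on different keys never interact in a set.

```python
# Sequential specification of a set: each method maps a state and key to
# (new_state, result). States are frozensets.
def add(s, k):
    return s | {k}, k not in s

def remove(s, k):
    return s - {k}, k in s

def contains(s, k):
    return s, k in s

METHODS = {"add": add, "remove": remove, "contains": contains}

def influences(m1, m2, state, k):
    """Does running m1 first change the result m2 returns from this state?"""
    alone = m2(state, k)[1]
    after_m1 = m2(m1(state, k)[0], k)[1]
    return alone != after_m1

def strongly_non_commutative(name, states, k=5):
    """m1 qualifies if, from some state, some m2 and m1 influence each other."""
    m1 = METHODS[name]
    return any(influences(m1, m2, s, k) and influences(m2, m1, s, k)
               for s in states for m2 in METHODS.values())

# Only key 5 matters here, so these two states cover the cases.
STATES = [frozenset(), frozenset({5})]
```

Add and Remove come out strongly non-commutative (each influences a concurrent copy of itself), while Contains does not: it never changes the state, so it cannot influence any other method.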

LINEARIZABILITY: USE OF RAW OR AWAR, INFORMAL PROOF. If a method is strongly non-commutative, then any of its sequential executions must perform a shared write. Why? Otherwise, there is no way for the method to influence the result of any other method executed after it, and hence the method cannot be strongly non-commutative. 38

LINEARIZABILITY: ADD(K) WHEN RAW AND AWAR ARE NOT PRESENT (CONT.). Process i: Add(k): {… return res}; process j: Add(k): {… return res}. (1) Process i runs Add(k) and stops just before writing to a shared variable X (not using AWAR). (2) Process j performs a full sequential execution of Add(k) and returns true; process i has not yet written to shared memory. (3) Process i resumes and performs the shared write to X, overwriting any changes to X made by process j. If process j used other shared locations, process i cannot read them without using RAW, so process i also returns true. 39

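LINEARIZABILITY: ADD(K) WHEN RAW AND AWAR ARE NOT PRESENT (CONT.). If the algorithm is linearizable, there are only two valid linearizations of this concurrent execution of Add(k): P1's Add(k) before P2's, or P2's before P1's. [Figure: overlapping Add(k) invocations by P1 and P2, with Linearization 1 and Linearization 2.] In either linearization, the first Add(k) returns true and the second returns false. Both invocations returned true, which matches neither linearization; because the specification of Add is deterministic, the execution is not linearizable. 40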
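The final step of the argument can be replayed against the sequential specification: compute the results that each of the two linearizations would give, and compare them with what the constructed execution returned. This is a minimal sketch; the starting state (a set without key 5) and the helper names are assumptions for illustration.

```python
def add(s, k):
    """Sequential specification of Set.Add: returns True iff k was absent."""
    return s | {k}, k not in s

def replay(order, k=5, initial=frozenset()):
    """Results each process gets when the Add(k) calls run in this order."""
    state, results = initial, {}
    for proc in order:
        state, results[proc] = add(state, k)
    return results

# The only two linearizations of two overlapping Add(k) calls.
allowed = [replay(("P1", "P2")), replay(("P2", "P1"))]
# What the execution constructed in the argument above returns.
observed = {"P1": True, "P2": True}
```

`allowed` contains exactly {P1: true, P2: false} and {P1: false, P2: true}, so `observed` matches neither linearization.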

LINEARIZABILITY: EXAMPLES (CAS). CAS(m, o, n) can be implemented trivially by a linearizable algorithm that uses an atomic hardware instruction (also called CAS); in that case it includes the AWAR pattern (just as shown in the test-and-set case). CAS can also be implemented by a linearizable algorithm that avoids AWAR but uses RAW (Luchangco et al.). Code excerpt: Bool WFCAS(Val ev, Val nv){ if (ev==nv) return WFRead()==ev; Blk b = L; b.X = p; if (b.Y) goto 27; … 41

CONTENTS: Motivation; RAW & AWAR patterns; Sequential consistency; Relaxed Memory Models; Mutual Exclusion: Examples; Linearizability: Examples; Relaxed semantics: Examples. 42

RELAXED SEMANTICS: PRACTICAL IMPLICATIONS. How is it still possible to avoid RAW and AWAR? By relaxing one or more of the following dimensions: deterministic specification, strong non-commutativity, single owner, execution detectors. 43

RELAXED SEMANTICS: EXAMPLES (IDEMPOTENT WORK STEALING). Idempotent work stealing, from a deterministic to a non-deterministic specification: relaxing the deterministic specification is exemplified by the idempotent work stealing introduced by Michael et al., which allows each inserted item to be extracted at least once (rather than exactly once). This relaxation makes it possible to avoid RAW and AWAR in the owner's methods. 44

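RELAXED SEMANTICS: EXAMPLES (IDEMPOTENT WORK STEALING) (CONT.). The owner's take():
    WorkItem take() {
      h = head;
      t = tail;
      if (h == t) return EMPTY;
      task = tasks.array[h % tasks.size];
      head = h + 1;
      return task;
    }
take() uses only plain reads and one final plain write to head: there is no read after the write (no RAW) and no atomic read-modify-write (no AWAR). 45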
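A hypothetical Python sketch of an idempotent FIFO work-stealing queue built around the owner's take() above. The `put` and `steal` methods, the fixed capacity (no array growth), and all names are assumptions for illustration, not the published algorithm. The thief's CAS on head is only a sequential stand-in here. The demo replays by hand one interleaving in which the owner's plain write to head lets a task be extracted twice, which the "at least once" specification permits.

```python
class IdempotentFifoDeque:
    """Hypothetical sketch of an idempotent FIFO work-stealing queue: fixed
    capacity (no array growth), tasks may be extracted more than once."""
    EMPTY = None

    def __init__(self, size=16):
        self.array = [None] * size
        self.size = size
        self.head = 0  # next task to extract (owner writes it plainly)
        self.tail = 0  # next free slot (owner only)

    def put(self, task):  # owner only
        h, t = self.head, self.tail
        if t == h + self.size:
            raise OverflowError("queue full; this sketch does not grow the array")
        self.array[t % self.size] = task
        self.tail = t + 1

    def take(self):  # owner only: plain reads, one plain write, no CAS
        h, t = self.head, self.tail
        if h == t:
            return self.EMPTY
        task = self.array[h % self.size]
        self.head = h + 1
        return task

    def _cas_head(self, expected, new):
        # Sequential stand-in for a hardware CAS on head (thieves need AWAR);
        # it is NOT atomic if actually called from several threads.
        if self.head == expected:
            self.head = new
            return True
        return False

    def steal(self):  # any thief
        while True:
            h, t = self.head, self.tail
            if h == t:
                return self.EMPTY
            task = self.array[h % self.size]
            if self._cas_head(h, h + 1):
                return task

def demo_double_extraction():
    """Replay one interleaving by hand in which a task is extracted twice."""
    d = IdempotentFifoDeque()
    d.put("a")
    d.put("b")
    h = d.head                        # owner's take(): reads head ...
    owner_task = d.array[h % d.size]  # ... and the task, then is delayed
    thief_task = d.steal()            # thief's CAS on head succeeds
    d.head = h + 1                    # owner's delayed plain write to head
    return owner_task, thief_task
```

In the demo, both the owner and the thief obtain "a". The owner avoids RAW and AWAR only because duplicates like this are allowed; thieves still pay for a CAS.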

RELAXED SEMANTICS: EXAMPLES (FIFO QUEUE). In examining concurrent algorithms for multi-consumer FIFO queues, one notes that either locking or CAS is used in the common path of nontrivial dequeue methods that return a dequeued item. We have shown that locking uses either RAW or AWAR, and that CAS uses either RAW or AWAR as well. 46

RELAXED SEMANTICS: EXAMPLES (FIFO QUEUE). FIFO queue with a single consumer: dequeue is executed by a single process only, so there is no need for RAW or AWAR. Lamport's FIFO queue:
    Data dequeue() {
      if (tail == head) return EMPTY;
      Data data = Q[head mod m];
      head = (head + 1) mod m;
      return data;
    }
47
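A minimal Python sketch of a single-producer/single-consumer ring buffer in the style of Lamport's queue. The `enqueue` side, the convention of keeping one slot empty to distinguish "full" from "empty", and the demo harness are assumptions added for illustration. Only the producer writes `tail` and only the consumer writes `head`, so dequeue needs no atomic read-modify-write, and it performs no read after its single write. This sketch relies on CPython running the threads' bytecode one step at a time; a C version on relaxed hardware would still need the slot write ordered before the tail write.

```python
import threading
import time

class LamportQueue:
    """Single-producer/single-consumer ring buffer modeled on the slide's
    Lamport queue. One slot stays empty so "full" differs from "empty".
    Only the producer writes tail; only the consumer writes head."""
    EMPTY = None

    def __init__(self, m):
        self.Q = [None] * m
        self.m = m
        self.head = 0
        self.tail = 0

    def enqueue(self, data):  # producer only
        if (self.tail + 1) % self.m == self.head:
            return False  # full
        self.Q[self.tail] = data              # fill the slot first ...
        self.tail = (self.tail + 1) % self.m  # ... then publish it
        return True

    def dequeue(self):  # consumer only: reads, then one write, no RMW
        if self.tail == self.head:
            return self.EMPTY
        data = self.Q[self.head]
        self.head = (self.head + 1) % self.m
        return data

def run_spsc_demo(n_items=1000, m=8):
    """One producer thread and one consumer thread; returns what was received."""
    q, received = LamportQueue(m), []

    def producer():
        for i in range(n_items):
            while not q.enqueue(i):
                time.sleep(0)  # full: yield and retry

    def consumer():
        while len(received) < n_items:
            item = q.dequeue()
            if item is LamportQueue.EMPTY:
                time.sleep(0)  # empty: yield and retry
            else:
                received.append(item)

    threads = [threading.Thread(target=producer), threading.Thread(target=consumer)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return received
```

`run_spsc_demo(1000, 8)` should return the items 0 through 999 in order. Adding a second consumer would break the single-owner assumption, since two consumers could read the same `head` and return the same item.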

QUESTIONS? 48