Fence Complexity in Concurrent Algorithms
Petr Kuznetsov, TU Berlin/DT-Labs

STM is about ease of programming and efficiency.
What is “efficient” in a concurrent system?

4 Cost metrics
- Space: used memory
  - Cheap
  - Advanced garbage collection
- Time:
  - the number of reads and writes (per operation)
  - the number of stalls

5 Relaxed memory models
- Memory is much slower than the CPU
- Read: check the cache -> read the memory
- Write: invalidate the caches -> update the memory
- To overcome “stalled writes”: reorder operations
- Reordering may result in inconsistency

6 What is inconsistency?
Process P: Write(X,1); Read(Y)
Process Q: Write(Y,1); Read(X)
[Figure: timelines of P and Q showing W(X,1), R(Y) and W(Y,1), R(X)]

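7 Possible outcomes
- P: P reads before Q writes, or P reads after Q writes
- Q: Q reads before P writes, or Q reads after P writes
- Out-of-order: P reads before Q writes and Q reads before P writes. Each process writes before it reads, so no interleaving of the two programs produces this combination.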
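The outcome table can be checked by brute force. The sketch below is only an illustration, not part of the talk: it assumes X and Y start at 0 (the slides do not state initial values), and the names `run` and `outcomes` are made up for this example. It enumerates every total order of the four operations, first keeping only the orders that respect each process's program order, then allowing a read to overtake the preceding write.

```python
from itertools import permutations

# Operations as (process, kind, variable). Initial values of 0 are an assumption.
P = [("P", "W", "X"), ("P", "R", "Y")]
Q = [("Q", "W", "Y"), ("Q", "R", "X")]

def run(schedule):
    """Execute one total order of operations; return (P's read, Q's read)."""
    mem = {"X": 0, "Y": 0}
    reads = {}
    for proc, kind, var in schedule:
        if kind == "W":
            mem[var] = 1
        else:
            reads[proc] = mem[var]
    return reads["P"], reads["Q"]

def outcomes(allow_reorder):
    """Collect all read results; without reordering, program order is enforced."""
    result = set()
    for schedule in permutations(P + Q):
        if not allow_reorder:
            if schedule.index(P[0]) > schedule.index(P[1]):
                continue
            if schedule.index(Q[0]) > schedule.index(Q[1]):
                continue
        result.add(run(schedule))
    return result

in_order = outcomes(allow_reorder=False)
reordered = outcomes(allow_reorder=True)
print(sorted(in_order))         # [(0, 1), (1, 0), (1, 1)]
print((0, 0) in reordered)      # True
```

The pair (0, 0), where both processes read before the other's write, shows up only once reads may be reordered before writes.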

8 Fixing out-of-order
- Memory fences: read-after-write (RAW)
    write(X,1)
    fence() // enforce the order
    read(Y)
[Figure: timelines of P and Q showing W(X,1), R(Y) and W(Y,1), R(X)]
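To see what the fence buys, here is a rough sketch of a store-buffer model. The model is an assumption made for illustration (a simplified, TSO-like machine), not something the slides define: each process queues its writes in a private FIFO buffer, buffered writes reach memory at arbitrary later points, a read looks in the reader's own buffer first, and `fence` blocks until that buffer is empty. The function `explore` and the tuple encoding of instructions are hypothetical.

```python
def explore(programs):
    """Enumerate all (read by process 0, read by process 1) results under an
    assumed store-buffer model. X and Y start at 0 (also an assumption)."""
    results = set()

    def replace(tup, i, value):
        return tup[:i] + (value,) + tup[i + 1:]

    def step(pcs, bufs, mem, reads):
        progressed = False
        for i, prog in enumerate(programs):
            if bufs[i]:
                # Flush the oldest buffered write of process i to memory.
                var, val = bufs[i][0]
                step(pcs, replace(bufs, i, bufs[i][1:]), {**mem, var: val}, reads)
                progressed = True
            if pcs[i] == len(prog):
                continue
            op = prog[pcs[i]]
            next_pcs = replace(pcs, i, pcs[i] + 1)
            if op[0] == "write":
                # Writes go to the private buffer, not directly to memory.
                new_buf = bufs[i] + ((op[1], op[2]),)
                step(next_pcs, replace(bufs, i, new_buf), mem, reads)
            elif op[0] == "read":
                # A read sees the newest buffered write to the variable, if any.
                pending = [v for var, v in bufs[i] if var == op[1]]
                val = pending[-1] if pending else mem[op[1]]
                step(next_pcs, bufs, mem, {**reads, i: val})
            elif op[0] == "fence" and not bufs[i]:
                step(next_pcs, bufs, mem, reads)
            else:
                continue  # fence is blocked until the buffer drains
            progressed = True
        if not progressed:
            results.add((reads[0], reads[1]))

    step((0, 0), ((), ()), {"X": 0, "Y": 0}, {})
    return results

unfenced = [[("write", "X", 1), ("read", "Y")],
            [("write", "Y", 1), ("read", "X")]]
fenced = [[("write", "X", 1), ("fence",), ("read", "Y")],
          [("write", "Y", 1), ("fence",), ("read", "X")]]

print((0, 0) in explore(unfenced))   # True
print((0, 0) in explore(fenced))     # False
```

In this model the fence forces each write into memory before the following read, which rules out the out-of-order result while leaving the other three outcomes reachable.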

9 Fixing out-of-order
- Atomic operations: atomic-write-after-read (AWAR)
    atomic {
      read(Y)
      …
      write(X,1)
    }
  E.g., CAS, TAS, Fetch&Add, …
- RAW/AWAR fences take ~60 RMRs
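The AWAR pattern can be sketched with a compare-and-swap cell. Python has no hardware CAS, so the class below is a stand-in: `AtomicCell`, `compare_and_swap` and `fetch_and_add` are hypothetical names, and the internal lock only models the atomicity that real hardware provides in a single instruction.

```python
import threading

class AtomicCell:
    """Hypothetical cell; the lock merely stands in for hardware atomicity."""

    def __init__(self, value=0):
        self._value = value
        self._guard = threading.Lock()

    def load(self):
        with self._guard:
            return self._value

    def compare_and_swap(self, expected, new):
        # AWAR: the read of the current value and the write of the new one
        # happen in one indivisible step.
        with self._guard:
            if self._value == expected:
                self._value = new
                return True
            return False

def fetch_and_add(cell, delta):
    """Build Fetch&Add from CAS with the usual retry loop."""
    while True:
        old = cell.load()
        if cell.compare_and_swap(old, old + delta):
            return old

counter = AtomicCell(0)

def worker():
    for _ in range(1000):
        fetch_and_add(counter, 1)

threads = [threading.Thread(target=worker) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
total = counter.load()
print(total)   # 4000
```

No increment is lost because a CAS succeeds only if nothing was written between its read and its write.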

10 Our result
Any concurrent program in a certain class must use RAW/AWARs

11 What programs?
- Concurrent data types: queues, counters, hash tables, trees, …
  - Non-commutative operations
  - Linearizable solo-terminating implementations
- Mutual exclusion

12 Non-commutative operations
Operation A is non-commutative if there exists an operation B such that, applied to some state, A influences B and B influences A.
Here, A influences B if B's response depends on whether A is applied before it.

13 Example: Queue
- enq(v): adds v to the end of the queue
- deq(): dequeues the item at the head of the queue
Q = 1;2
- Q.deq():1; Q.deq():2 vs. Q.deq():2; Q.deq():1
  Two deq() operations influence each other
- Q.enq(3):ok; Q.deq():1 vs. Q.deq():1; Q.enq(3):ok
  enq() is commutative
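The queue example can be replayed mechanically. This sketch reads “influences” as “changes the other operation's response when ordered first”, which is an interpretation of the slide rather than its formal definition. The helpers `apply_ops` and `responses_differ` and the operation labels are hypothetical.

```python
from collections import deque

def apply_ops(initial, ops):
    """Apply ops in order to a fresh copy of the queue; map each label to its response."""
    q = deque(initial)
    responses = {}
    for label, name, arg in ops:
        if name == "enq":
            q.append(arg)
            responses[label] = "ok"
        else:  # deq
            responses[label] = q.popleft() if q else None
    return responses

def responses_differ(initial, a, b):
    """Labels of the operations whose response changes between orders a;b and b;a."""
    ab = apply_ops(initial, [a, b])
    ba = apply_ops(initial, [b, a])
    return {label for label in ab if ab[label] != ba[label]}

deq_p = ("p.deq", "deq", None)
deq_q = ("q.deq", "deq", None)
enq_p = ("p.enq(3)", "enq", 3)

print(sorted(responses_differ([1, 2], deq_p, deq_q)))   # ['p.deq', 'q.deq']
print(responses_differ([1, 2], enq_p, deq_q))           # set()
print(responses_differ([], enq_p, deq_q))               # {'q.deq'}
```

On the empty queue enq() changes what deq() returns, but deq() never changes enq()'s response, so the influence is one-way only. That is consistent with the slide treating enq() as commutative.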

14 Proof sketch
- A non-commutative operation must write
- Suppose not
[Figure: deq():1 applied to the queue 1;2; there must be a write w!]

15 Proof sketch
- Let w be the first write
- Suppose there are no AWARs
- A(w): the longest atomic construct containing w
- w must be the first base-object event in A(w)!
[Figure: deq():1 applied to the queue 1;2, with the write w marked]

16 Proof sketch
- Suppose there are no RAWs
- No RAW: no difference for deq()!
[Figure: two deq():1 operations on the queue 1;2, with A(w) marked]

17 Mutual exclusion
Lock(): acquire the lock
Unlock(): release the lock
- (Mutex) No two processes hold the lock at the same time
- (Deadlock-freedom) If at least one process executes Lock() and no active process fails, at least one process acquires the lock
Two Lock() operations influence each other!
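A toy two-process try-lock shows why the order of the write and the read matters. This is an illustrative sketch, not an algorithm from the talk: each process i either writes flag[i] = 1 and then reads flag[1-i], or does the two steps in the opposite order, and it enters the critical section only if it read 0. The function `try_lock_outcomes` is a hypothetical name, and all interleavings are enumerated under in-order execution. Both processes may fail to enter, so this is not a deadlock-free lock.

```python
from itertools import permutations

def try_lock_outcomes(first, second):
    """Each process runs `first` then `second`, where 'write' sets its own flag
    and 'read' inspects the other flag. Returns all (P0 entered, P1 entered) pairs."""
    ops = [(i, kind) for i in (0, 1) for kind in (first, second)]
    outcomes = set()
    for schedule in permutations(ops):
        # Keep only schedules that respect each process's program order.
        if any(schedule.index((i, first)) > schedule.index((i, second))
               for i in (0, 1)):
            continue
        flag = [0, 0]
        entered = [False, False]
        for i, kind in schedule:
            if kind == "write":
                flag[i] = 1
            else:
                entered[i] = (flag[1 - i] == 0)
        outcomes.add(tuple(entered))
    return outcomes

write_then_read = try_lock_outcomes("write", "read")   # read-after-write
read_then_write = try_lock_outcomes("read", "write")   # non-atomic write-after-read

print((True, True) in write_then_read)   # False
print((True, True) in read_then_write)   # True
```

With the write first, at most one process ever enters. With the read first and no atomicity, both can read 0 and enter together, which is the gap that a RAW order or an atomic write-after-read closes.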

18 Our result
In any implementation of mutual exclusion or of a concurrent data type with a non-commutative operation op, a complete execution of op or Lock() contains a RAW or AWAR.
Every successful lock acquire incurs a RAW/AWAR fence.

19 Why do we care?
- Hardware design: what primitives must be optimized?
- API design: returned values matter
  - Set with add returning fail vs. returning ok
- Verification: early catch of obviously incorrect algorithms

20 What’s next?
- Weaker primitives?
  - Idempotent Work Stealing [Michael et al., PPoPP’09]
- Tight lower bounds?
  - How many RAW/AWAR fences are incurred?
- Other patterns
  - Read-after-read
  - Write-after-write
  - Multi-RAW: write(X_i,1) collect(X_1,..,X_n)

21 References
- H. Attiya, R. Guerraoui, D. Hendler, P. Kuznetsov, M. Michael, M. Vechev. Laws of Order: Expensive Synchronization in Concurrent Algorithms Cannot be Eliminated. In POPL 2011.
- Srivatsan’s talk on STM fence complexity, TR on the way

22 QUESTIONS?