Wait-Free Multi-Word Compare-And-Swap using Greedy Helping and Grabbing. Håkan Sundell, PDPTA 2009.

Outline
- Synchronization Methods
- Multi-Word Compare-And-Swap
- Problems
- New Wait-Free Algorithm
- Experiments
- Conclusions

Synchronization
- Shared memory easily enables shared (i.e. accessible by multiple threads) data structures.
- Shared data structures need synchronization!
- Accesses, and especially updates, must be coordinated to establish consistency.
- Updates should be done as atomic transactions.
[Figure: threads T1, T2, T3]

Hardware Synchronization Primitives
- Consensus number 1
  - Atomic Read/Write
- Consensus number 2
  - Atomic Test-And-Set (TAS), Fetch-And-Add (FAA), Swap
- Consensus number infinite
  - Atomic Compare-And-Swap (CAS)
  - Atomic Load-Linked/Store-Conditional (LL/SC)
[Figure: Read, Write, and read-modify-write M=f(M,…)]

Universal and Conditional Synchronization Primitive
- Compare-And-Swap (CAS)

    bool CAS(int *p, int old, int new) {
      atomic {
        if (*p == old) { *p = new; return true; }
        else return false;
      }
    }

- This single-word transaction primitive (or an equivalent) is supported in hardware on all contemporary systems.
- However, multi-word transactions must be done in software.
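The `atomic { … }` block on the slide is pseudocode. As a minimal sketch, assuming a C11 compiler with `<stdatomic.h>`, the same contract can be written with the standard `atomic_compare_exchange_strong` call. The wrapper name `cas_int` is hypothetical and not taken from the talk.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Same contract as the CAS on the slide: if *p equals old, store new_value
   and report success; otherwise leave *p untouched and report failure. */
bool cas_int(atomic_int *p, int old, int new_value)
{
    /* On failure the C11 call overwrites its second argument with the value
       it observed. `old` is a by-value copy here, so the caller's variable
       is not affected. */
    return atomic_compare_exchange_strong(p, &old, new_value);
}
```

A call such as `cas_int(&x, 5, 7)` succeeds only while `x` still holds 5. A second identical call fails and leaves the 7 in place.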

Mutual Exclusion
- Mutual exclusion (e.g. locks) can be used for multi-word atomicity in software
  - Access to shared data will be atomic because of the lock
- Reduced parallelism by definition
- Blocking, with danger of priority inversion and deadlocks
  - Solutions exist, but with high overhead, especially for multi-processor systems
[Figure: threads T1, T2, T3]
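To make the lock-based alternative concrete, here is a hedged sketch of a two-word compare-and-swap guarded by a single spinlock built on the C11 `atomic_flag`. The names `dcas_lock` and `locked_dcas` are illustrative assumptions, as is the choice of one global lock. It shows both the atomicity and the blocking that the slide warns about.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* One global lock guarding every word that takes part in multi-word updates
   (hypothetical name; a real design might use finer-grained locks). */
atomic_flag dcas_lock = ATOMIC_FLAG_INIT;

/* Double-word compare-and-swap made atomic by mutual exclusion. */
bool locked_dcas(int *p1, int o1, int n1, int *p2, int o2, int n2)
{
    /* Acquire: spin until the flag was previously clear. A thread that is
       preempted while holding the lock blocks every other caller here. */
    while (atomic_flag_test_and_set_explicit(&dcas_lock, memory_order_acquire)) {
        /* busy-wait */
    }

    /* Critical section: compare both words, update both or neither. */
    bool ok = (*p1 == o1) && (*p2 == o2);
    if (ok) {
        *p1 = n1;
        *p2 = n2;
    }

    /* Release the lock so that other threads can proceed. */
    atomic_flag_clear_explicit(&dcas_lock, memory_order_release);
    return ok;
}
```

Every access to the protected words, including plain reads, would have to take the same lock for the update to appear atomic. That requirement is where the reduced parallelism comes from.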

Non-blocking Synchronization
- Avoids blocking by performing the operation/changes using atomic primitives
- Lock-Free Synchronization
  - Optimistic approach
  - Retries until succeeding
  - Guarantees progress of at least one operation
- Wait-Free Synchronization
  - Always finishes in a finite number of its own steps
  - Requires coordination with all concurrent operations
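The difference between the two progress guarantees can be illustrated with two small C11 routines. This is a sketch with hypothetical function names, not code from the talk. The first retries a CAS until it succeeds, which is lock-free. The second uses a single fetch-and-add, which is wait-free only on the assumption that the hardware provides fetch-and-add as one native atomic instruction.

```c
#include <stdatomic.h>

/* Lock-free: optimistic read, then CAS, and retry on interference.
   A failed CAS means some other thread changed the word, so the system as a
   whole makes progress, but this caller has no bound on its own retries.
   Returns the last value observed in *p before the function finished. */
int lockfree_store_max(atomic_int *p, int candidate)
{
    int seen = atomic_load(p);
    while (seen < candidate &&
           !atomic_compare_exchange_weak(p, &seen, candidate)) {
        /* On failure `seen` was refreshed with the current value of *p;
           the loop condition re-checks whether an update is still needed. */
    }
    return seen;
}

/* Wait-free (assuming a native fetch-and-add instruction): exactly one
   atomic step, regardless of what other threads do. Returns the old value. */
int waitfree_increment(atomic_int *p)
{
    return atomic_fetch_add(p, 1);
}
```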

Wait-Free Synchronization
- Wait-Free Algorithms
  - Usually very complex
  - Hard to design and prove correct
  - Offer strong real-time guarantees
  - Usually offer significantly worse average performance than lock-free algorithms
- Dynamic memory allocation needs wait-free memory management
  - By definition, all sub-operations of a wait-free operation also have to be wait-free
  - Atomic primitives are assumed to be wait-free

Multi-Word Compare-And-Swap
- Operations:
    bool CASN(int *p1, int o1, int n1, …, int *pN, int oN, int nN);
    int Read(int *p);
- Not supported by hardware
  - Contemporary hardware only supports atomic update of one memory word
- Achieved by lifting the abstraction level
  - All operations on the affected memory words have to go via the new abstraction layer
  - Uses the underlying hardware primitives

Multi-Word Compare-And-Swap
- Standard setup: assign a lock to each individual memory word.
- Standard algorithmic approach (CASN):
  1. Try to acquire a lock on all positions of interest.
  2. If a lock is already taken, help (i.e. perform) the corresponding operation.
  3. If all locks are taken and all values match, change the status of the operation.
  4. Remove the locks and possibly write the new values.
- A concurrent Read() must check whether the word is locked in order to decide its current value.

Conflict resolution
- Concurrent CASN operations may need to lock overlapping subsets of the same words
- If the locks are taken in different orders (i.e. without sorting the pointer arguments of the CASN call so that p1 < … < pN), this can lead to deadlock scenarios
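The sorting requirement mentioned above can be sketched in C as follows. The struct `casn_entry` and both function names are assumptions made for illustration. The idea is only that every operation visits its words in ascending address order, so no two operations can each hold a word that the other is waiting for.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* One (pointer, old, new) triple of a CASN call; the struct is hypothetical. */
typedef struct {
    int *addr;
    int  old_value;
    int  new_value;
} casn_entry;

/* Order entries by the numeric value of their addresses. The pointers are
   converted to uintptr_t because relational comparison of pointers into
   different objects is not defined in C. */
static int compare_by_address(const void *a, const void *b)
{
    uintptr_t x = (uintptr_t)((const casn_entry *)a)->addr;
    uintptr_t y = (uintptr_t)((const casn_entry *)b)->addr;
    return (x > y) - (x < y);
}

/* Sort the entries in place so that words are acquired in ascending
   address order. */
void casn_sort_entries(casn_entry *entries, size_t n)
{
    qsort(entries, n, sizeof entries[0], compare_by_address);
}
```

Sorting costs O(N log N) per call and forces every caller to go through it, which is the restriction that the new algorithm removes by allowing unsorted pointer arguments.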

Wait-Free Multi-Word Compare-And-Swap: New Approach
- Wait-free memory management (IPDPS 2005) for handling allocation of descriptors (used for representing the state of an ongoing CASN)
- Improved performance
  - Greedy helping
    - Never help more than absolutely necessary to continue
  - Fast look-up of a word's current value
    - Improves Read operation performance
    - Improves CASN operation performance
- Allows unsorted pointer arguments
  - Grabbing
    - Help until a definitive conflict, then apply deterministic lock stealing and lock hand-over to resolve the deadlock

Descriptor structure allowing fast look-up of current value
- Allows 31 bits of a 32-bit memory word to represent the actual value.
- The corresponding old or new value can be indexed (1…N) directly.
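The slide does not give the exact bit layout, so the following is only a sketch under stated assumptions. The top bit marks a word that currently refers to a descriptor. The remaining 31 bits hold either the plain value or a hypothetical packing of a descriptor number and an entry index. The names `descriptor`, `make_reference` and `logical_value`, the 5-bit index width, the descriptor table and the 0-based index (the slide counts 1…N) are all illustrative choices. The code is sequential and ignores the atomic accesses and memory management that the real algorithm needs.

```c
#include <stdbool.h>
#include <stdint.h>

#define TAG_BIT     0x80000000u              /* set: word refers to a descriptor */
#define INDEX_BITS  5u                       /* assumed: up to 32 words per CASN */
#define INDEX_MASK  ((1u << INDEX_BITS) - 1u)
#define MAX_WORDS   (1u << INDEX_BITS)

/* Hypothetical descriptor: the outcome of the CASN plus its old/new values. */
typedef struct {
    bool     succeeded;                      /* true once the CASN has succeeded */
    uint32_t old_value[MAX_WORDS];
    uint32_t new_value[MAX_WORDS];
} descriptor;

/* Pack a descriptor number and an entry index into a tagged word. */
uint32_t make_reference(uint32_t desc_id, uint32_t index)
{
    return TAG_BIT | (desc_id << INDEX_BITS) | (index & INDEX_MASK);
}

/* Logical value of a word: the word itself if untagged, otherwise the
   directly indexed old or new value stored in the descriptor. */
uint32_t logical_value(uint32_t word, const descriptor *table)
{
    if ((word & TAG_BIT) == 0)
        return word;                         /* plain 31-bit value */
    uint32_t index   = word & INDEX_MASK;
    uint32_t desc_id = (word & ~TAG_BIT) >> INDEX_BITS;
    const descriptor *d = &table[desc_id];
    return d->succeeded ? d->new_value[index] : d->old_value[index];
}
```

Because the index is stored in the word itself, a reader reaches the right old or new value with one array access instead of searching the descriptor, which is the kind of direct look-up the slide refers to.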

Experiments
- Micro benchmark using Read() and CASN()
  - Each thread repeatedly performs updates of N memory words
  - Each run lasts 5 seconds, and the number of successful updates is measured
- Parameters varied between experiments
  - N is fixed at 2, 4, 8, or 16 words
  - or N is selected (for each update) randomly from 2, 4, … words
  - Number of threads is 8, 16, or 32

Experiments
- Each micro benchmark compares against two of the latest (i.e. fastest) CASN implementations in the literature
  - Harris et al., lock-free. Allows 30 bits of each word to be used for the value. Requires the pointer arguments to be sorted.
  - Ha and Tsigas, lock-free. Needs an underlying LL/SC primitive implementation (e.g. Michael 2004). Allows (with the selected LL/SC) any number of bits for the value. Requires the pointer arguments to be sorted.

Experiments – Some results

Conclusions
- New wait-free algorithm for Multi-Word Compare-And-Swap
  - Greedy helping and grabbing
- Extraordinary performance
  - Even better average performance than the corresponding lock-free algorithms in many scenarios!
  - Especially under high contention