A Lock-Free Algorithm for Concurrent Bags

Slides:



Advertisements
Similar presentations
Bounded Model Checking of Concurrent Data Types on Relaxed Memory Models: A Case Study Sebastian Burckhardt Rajeev Alur Milo M. K. Martin Department of.
Advertisements

Fast and Lock-Free Concurrent Priority Queues for Multi-Thread Systems Håkan Sundell Philippas Tsigas.
Wait-Free Queues with Multiple Enqueuers and Dequeuers
On Dynamic Load Balancing on Graphics Processors Daniel Cederman and Philippas Tsigas Chalmers University of Technology.
Maged M. Michael, “Hazard Pointers: Safe Memory Reclamation for Lock- Free Objects” Presentation Robert T. Bauer.
Scalable and Lock-Free Concurrent Dictionaries
Wait-Free Reference Counting and Memory Management Håkan Sundell, Ph.D.
Håkan Sundell, Chalmers University of Technology 1 Space Efficient Wait-free Buffer Sharing in Multiprocessor Real-time Systems Based.
Scalable Synchronous Queues By William N. Scherer III, Doug Lea, and Michael L. Scott Presented by Ran Isenberg.
Elad Gidron, Idit Keidar, Dmitri Perelman, Yonathan Perez
E81 CSE 532S: Advanced Multi-Paradigm Software Development Chris Gill Department of Computer Science and Engineering Washington University in St. Louis.
Locality-Conscious Lock-Free Linked Lists Anastasia Braginsky & Erez Petrank 1.
Concurrent Data Structures in Architectures with Limited Shared Memory Support Ivan Walulya Yiannis Nikolakopoulos Marina Papatriantafilou Philippas Tsigas.
Lock-free Cuckoo Hashing Nhan Nguyen & Philippas Tsigas ICDCS 2014 Distributed Computing and Systems Chalmers University of Technology Gothenburg, Sweden.
1 Lecture 21: Transactional Memory Topics: consistency model recap, introduction to transactional memory.
Introduction to Lock-free Data-structures and algorithms Micah J Best May 14/09.
Computer Laboratory Practical non-blocking data structures Tim Harris Computer Laboratory.
CS510 Advanced OS Seminar Class 10 A Methodology for Implementing Highly Concurrent Data Objects by Maurice Herlihy.
CS510 Concurrent Systems Class 2 A Lock-Free Multiprocessor OS Kernel.
U NIVERSITY OF M ASSACHUSETTS, A MHERST Department of Computer Science Emery Berger University of Massachusetts, Amherst Operating Systems CMPSCI 377 Lecture.
SUPPORTING LOCK-FREE COMPOSITION OF CONCURRENT DATA OBJECTS Daniel Cederman and Philippas Tsigas.
Maria-Cristina Marinescu Martin Rinard Laboratory for Computer Science Massachusetts Institute of Technology A Synthesis Algorithm for Modular Design of.
Practical and Lock-Free Doubly Linked Lists Håkan Sundell Philippas Tsigas.
Chapter 12 Data Structure Associate Prof. Yuh-Shyan Chen Dept. of Computer Science and Information Engineering National Chung-Cheng University.
Parallel Programming Philippas Tsigas Chalmers University of Technology Computer Science and Engineering Department © Philippas Tsigas.
Simple Wait-Free Snapshots for Real-Time Systems with Sporadic Tasks Håkan Sundell Philippas Tsigas.
Håkan Sundell, Chalmers University of Technology 1 Using Timing Information on Wait-Free Algorithms in Real-Time Systems (2 papers)
Understanding Performance of Concurrent Data Structures on Graphics Processors Daniel Cederman, Bapi Chatterjee, Philippas Tsigas Distributed Computing.
Håkan Sundell, Chalmers University of Technology 1 NOBLE: A Non-Blocking Inter-Process Communication Library Håkan Sundell Philippas.
November 15, 2007 A Java Implementation of a Lock- Free Concurrent Priority Queue Bart Verzijlenberg.
Håkan Sundell, Chalmers University of Technology 1 Simple and Fast Wait-Free Snapshots for Real-Time Systems Håkan Sundell Philippas.
A Consistency Framework for Iteration Operations in Concurrent Data Structures Yiannis Nikolakopoulos A. Gidenstam M. Papatriantafilou P. Tsigas Distributed.
Challenges in Non-Blocking Synchronization Håkan Sundell, Ph.D. Guest seminar at Department of Computer Science, University of Tromsö, Norway, 8 Dec 2005.
Non-blocking Data Structures for High- Performance Computing Håkan Sundell, PhD.
Internet Software Development Controlling Threads Paul J Krause.
Maged M.Michael Michael L.Scott Department of Computer Science Univeristy of Rochester Presented by: Jun Miao.
11/18/20151 Operating Systems Design (CS 423) Elsa L Gunter 2112 SC, UIUC Based on slides by Roy Campbell, Sam.
A Methodology for Creating Fast Wait-Free Data Structures Alex Koganand Erez Petrank Computer Science Technion, Israel.
Wait-Free Multi-Word Compare- And-Swap using Greedy Helping and Grabbing Håkan Sundell PDPTA 2009.
Priority Queues Dan Dvorin Based on ‘The Art of Multiprocessor Programming’, by Herlihy & Shavit, chapter 15.
Scalable lock-free Stack Algorithm Wael Yehia York University February 8, 2010.
Review Array Array Elements Accessing array elements
Chapter 12 – Data Structures
Data Structure Interview Question and Answers
Background on the need for Synchronization
Data Structure and Algorithms
Håkan Sundell Philippas Tsigas
Stacks and Queues.
Queues Queues Queues.
Data Structures Interview / VIVA Questions and Answers
Distributed Algorithms (22903)
Distributed Algorithms (22903)
Data Structure Interview
Challenges in Concurrent Computing
CS510 Concurrent Systems Jonathan Walpole.
Distributed Algorithms (22903)
Practical Non-blocking Unordered Lists
Anders Gidenstam Håkan Sundell Philippas Tsigas
Arrays and Linked Lists
Concurrent Data Structures Concurrent Algorithms 2017
CS510 Concurrent Systems Jonathan Walpole.
CS510 - Portland State University
Yiannis Nikolakopoulos
Processes Hank Levy 1.
NOBLE: A Non-Blocking Inter-Process Communication Library
Dr. Mustafa Cem Kasapbaşı
A Concurrent Lock-Free Priority Queue for Multi-Thread Systems
CSE 153 Design of Operating Systems Winter 19
CS510 Concurrent Systems Jonathan Walpole.
Processes Hank Levy 1.
Presentation transcript:

A Lock-Free Algorithm for Concurrent Bags Håkan Sundell Anders Gidenstam Marina Papatriantafilou Philippas Tsigas School of business and informatics University of Borås Distributed Computing and Systems group, Department of Computer Science and Engineering, Chalmers University of Technology

Anders Gidenstam, University of Borås Outline Introduction Lock-free synchronization The Problem & Related work The new lock-free bag algorithm Experiments Conclusions 16/09/2018 Anders Gidenstam, University of Borås

Synchronization on a shared object Lock-free synchronization Allows concurrent operations without enforcing mutual exclusion Avoids: Blocking (or busy waiting), convoy effects, priority inversion and risk of deadlock Progress Guarantee At least one operation always makes progress P1 P2 P3 Op A Op B P4 16/09/2018 Anders Gidenstam, University of Borås

Correctness of a concurrent object Desired semantics of a shared data object Linearizability [Herlihy & Wing, 1990] For each operation invocation there must be one single time instant during its duration where the operation appears to take effect. The observed effects should be consistent with a sequential execution of the operations in that order. O2 O3 O1 16/09/2018 Anders Gidenstam, University of Borås

Correctness of a concurrent object Desired semantics of a shared data object Linearizability [Herlihy & Wing, 1990] For each operation invocation there must be one single time instant during its duration where the operation appears to take effect. The observed effects should be consistent with a sequential execution of the operations in that order. O2 O3 O1 16/09/2018 Anders Gidenstam, University of Borås

Correctness of a concurrent object Desired semantics of a shared data object Linearizability [Herlihy & Wing, 1990] For each operation invocation there must be one single time instant during its duration where the operation appears to take effect. The observed effects should be consistent with a sequential execution of the operations in that order. O2 O3 O1 16/09/2018 Anders Gidenstam, University of Borås

Anders Gidenstam, University of Borås System Model Processes can read/write single memory words Model: Sequential consistency Synchronization primitives Built into CPU and memory system Atomic read-modify-write (i.e. a critical section of one instruction) Examples: Compare-and-Swap, Load-Linked / Store-Conditional CPU CPU Shared Memory 16/09/2018 Anders Gidenstam, University of Borås

Anders Gidenstam, University of Borås Outline Introduction Lock-free synchronization The Problem & Related work The new lock-free bag algorithm Experiments Conclusions 16/09/2018 Anders Gidenstam, University of Borås

The Problem: Concurrent bag shared data object Basic operations: Add() and TryRemoveAny() Elements in the bag are unordered Desired Properties Linearizable and lock-free Linearizable means: Add(A) -> TryRemoveAny() returns A; if TryRemoveAny() returns EMPTY then the bag really was empty Dynamic size (maximum only limited by available memory) Bounded memory usage (in terms of live contents) Fast on current systems E C B D A F G 16/09/2018 Anders Gidenstam, University of Borås

The Problem: Concurrent bag shared data object Motivation Useful for communication/work distribution, e.g. Implementation of Parallel foreach / forall Between (unordered) pipeline stages Abstract data type available in some languages Bag / Producer-Consumer Collection .NET C# … A concurrent bag can be implemented with other data structures, e.g. queue, stack, … But “better” service comes at a price 16/09/2018 Anders Gidenstam, University of Borås

Related Work: Lock-free Multi-P/C Queues [Michael & Scott, 1996] Linked-list, one element/node Global shared head and tail pointers [Tsigas & Zhang, 2001] Static circular array of elements Two different NULL values for distinguishing initially empty from dequeued elements Global shared head and tail indices, lazily updated [Michael & Scott, 1996] + Elimination [Moir, Nussbaum, Shalev & Shavit, 2005] Same as the above + elimination of concurrent pairs of enqueue and dequeue when the queue is near empty [Hoffman, Shalev & Shavit, 2007] Baskets queue Reduces contention between concurrent enqueues after conflict Uses stronger memory management than M&S (SLFRC or Beware&Cleanup) N-1 16/09/2018 Anders Gidenstam, University of Borås

Related Work: Lock-free Multi-P/C Stacks and Pools L-F stacks [Michael, 2004] Linked-list, one element/node Global shared head pointer [Michael, 2004] + Elimination [Hendler, Shavit & Yerushalami, 2004] Same as the above + elimination of concurrent pairs of push and pop. L-F pool [Afek, Korland, Natanzon & Shavit, 2010] Tree of balancers with elimination + queues. 16/09/2018 Anders Gidenstam, University of Borås

Anders Gidenstam, University of Borås Outline Introduction Lock-free synchronization The Problem & Related work The new lock-free bag algorithm Experiments Conclusions 16/09/2018 Anders Gidenstam, University of Borås

The Algorithm Basic Idea Linked-lists of array blocks One list per thread Always used by Add() TryRemoveAny() looks there first Add()s by different threads do not contend A TryRemoveAny() has a large number of blocks to choose from Low risk for contention Static thread-local storage (TLS) Used to avoid reading/writing shared state 16/09/2018 Anders Gidenstam, University of Borås

The Algorithm Basic Idea Add() Item is inserted in an empty slot in the first array block in the thread’s list A new first block is added when all slots have been used The current slot index is stored in TLS 16/09/2018 Anders Gidenstam, University of Borås

The Algorithm Basic Idea TryRemoveAny() The thread first scans the first block in its list (from the current index in TLS) When an item is found it is removed via CAS If the block is empty it is removed from the list 16/09/2018 Anders Gidenstam, University of Borås

Anders Gidenstam, University of Borås The Algorithm Issues Finding items when own list is empty Detecting that the bag is empty Managing the linked lists 16/09/2018 Anders Gidenstam, University of Borås

Anders Gidenstam, University of Borås The Algorithm Finding items when the own list is empty Steal items from blocks belonging to other threads Hence, CAS needed to remove items Never leave a block until it is empty Help removing empty blocks 16/09/2018 Anders Gidenstam, University of Borås

Anders Gidenstam, University of Borås The Algorithm Detecting that the bag is empty No single place to look Scan all blocks of all threads Items may be added concurrently Items may be removed concurrently 16/09/2018 Anders Gidenstam, University of Borås

Anders Gidenstam, University of Borås The Algorithm Notification mechanism Per-block bit field One bit for each thread All bits cleared by Add() Thieves set their bit before scanning the block If the bit is still set for all blocks when the thief rescans the bag is empty? 16/09/2018 Anders Gidenstam, University of Borås

Anders Gidenstam, University of Borås The Algorithm Notification mechanism No, there can still be one pending Add per other thread Cleared the notify bits before the thief started scanning Items can show up and disappear (removed) during the scan 16/09/2018 Anders Gidenstam, University of Borås

Anders Gidenstam, University of Borås The Algorithm Notification mechanism No, there can still be one pending Add per other thread Cleared the notify bits before the thief started scanning Items can show up and disappear (removed) during the scan Rescan everything #threads+1 times if found empty in all scans it truly was empty Pending Add removed before rescan reaches here Pending Add after rescan was here 16/09/2018 Anders Gidenstam, University of Borås

The Algorithm Managing the linked lists Removing blocks When the block is scanned and found empty it is marked logically deleted, with mark1 By owner No problem By thief Must not be the first block in the linked-list since owner may add items there Mark the preceding block with mark2 first The block cannot be the first Prevents the block from becoming the first block Seeing mark1 or mark2 invokes helping Memory management (Modified) Hazard pointers scheme [Michael, 2002] 16/09/2018 Anders Gidenstam, University of Borås

Anders Gidenstam, University of Borås The Algorithm Managing the linked lists Properties Only the owner can remove the first block The last block of each linked-list cannot be removed Thieves can remove any other block found empty So After an linked list has been scanned by TryRemoveAny() there can be at most 2 empty blocks in it Hence, a thread finding the bag empty will have no more than 2*#threads blocks to traverse once it has helped any pending removals (at most 1 per thread) 16/09/2018 Anders Gidenstam, University of Borås

Anders Gidenstam, University of Borås Outline Introduction Lock-free synchronization The Problem & Related work The new lock-free bag algorithm Experiments Conclusions 16/09/2018 Anders Gidenstam, University of Borås

Experimental evaluation Micro benchmark Threads execute Add and TryRemoveAny operations on a shared bag High contention Test Configurations Random 50% / 50%, initial size 0 1 Producer / N-1 Consumers, initial size 0 N-1 Producers / 1 Consumer, initial size 0 N/2 Producers / N/2 Consumers, initial size 0 Measured throughput in items/sec #TryRemoveAny not returning EMPTY Application Parallel computation of Mandelbrot set Producer/Consumer pattern 16/09/2018 Anders Gidenstam, University of Borås

Experimental evaluation Algorithms L-F queue [Michael & Scott, 1996] L-F queue [Michael & Scott, 1996] + Elimination [Moir, Nussbaum, Shalev & Shavit, 2005] L-F queue [Tsigas & Zhang, 2001] L-F queue [Hoffman, Shalev & Shavit, 2007] L-F stack [Michael, 2004] L-F stack [Michael, 2004] + Elimination [Hendler, Shavit & Yerushalami, 2010] L-F pool [Afek, Korland, Natanzon & Shavit, 2010] The new L-F bag [Gidenstam, Sundell, Papatriantafilou & Tsigas, 2011] PC Platform CPU: 2x Intel Xeon X5660 @ 2.8 GHz 6 cores per CPU with 2 hardware threads each => 12 cores, 24 hw threads RAM: 12 GB DDR3 @ 1333 MHz Windows 7 64-bit 16/09/2018 Anders Gidenstam, University of Borås

Experimental evaluation (i) 16/09/2018 Anders Gidenstam, University of Borås

Experimental evaluation (ii) 16/09/2018 Anders Gidenstam, University of Borås

Experimental evaluation (iii) 16/09/2018 Anders Gidenstam, University of Borås

Experimental evaluation (iv) 16/09/2018 Anders Gidenstam, University of Borås

Anders Gidenstam, University of Borås Experimental evaluation (v) Parallel application for the Mandelbrot set 16x16 chunks: Large work units => Low contention on the shared data structure (bag) 2x2 chunks: Small work units => High contention. The bag implementation matters 16/09/2018 Anders Gidenstam, University of Borås

Anders Gidenstam, University of Borås Conclusions Lock free and linearizable algorithm for a concurrent bag producer/consumer collection data structure Distributed design, promoting access-parallelism. Exploiting thread-local static storage. Dynamic in size via lock-free memory management. Only requires atomic primitives available in contemporary systems. 16/09/2018 Anders Gidenstam, University of Borås

Thank you for listening! Questions? 16/09/2018 Anders Gidenstam, University of Borås