Scalable and Lock-Free Concurrent Dictionaries


Scalable and Lock-Free Concurrent Dictionaries Håkan Sundell Philippas Tsigas

Outline
- Synchronization Methods
- Dictionaries
- Concurrent Dictionaries
- Previous results
- New Lock-Free Algorithm
- Experiments
- Conclusions

Synchronization
- Shared data structures need synchronization
- Synchronization using locks: mutually exclusive access to the whole or parts of the data structure
[Figure: processes P1, P2, P3]

Blocking Synchronization
- Drawbacks: blocking, priority inversion, risk of deadlock
- Locks: semaphores, spinning, disabling interrupts, etc.
- Reduced efficiency because of reduced parallelism

Non-blocking Synchronization
- Lock-Free Synchronization
  - Optimistic approach (i.e. assumes no interference)
  - The operation is prepared to take effect later (unless interfered with) using hardware atomic primitives
  - Possible interference is detected via the atomic primitives and causes a retry
  - Can cause starvation
- Wait-Free Synchronization
  - Always finishes in a finite number of its own steps
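
The optimistic prepare-then-retry pattern described for lock-free synchronization can be sketched as follows. This is a minimal illustration in C++ using std::atomic, not code from the talk; the function name lock_free_add and the shared counter are assumptions made for the example.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical helper (not from the talk): adds `delta` to a shared counter
// without taking a lock. Returns the value seen just before the update.
long lock_free_add(std::atomic<long>& counter, long delta) {
    // Assumption: the platform offers a hardware atomic primitive for `long`.
    assert(counter.is_lock_free());
    long seen = counter.load();              // read the current state
    for (;;) {
        long desired = seen + delta;         // prepare the update locally
        // The CAS publishes `desired` only if the counter still equals `seen`.
        // On interference it fails, refreshes `seen`, and the loop retries.
        if (counter.compare_exchange_weak(seen, desired)) {
            return seen;
        }
    }
}
```

Because a thread may keep losing the CAS to other threads, the loop has no bound on its number of iterations, which is the starvation risk mentioned on the slide.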

Dictionaries (Sets)
- Fundamental data structure
- Works on a set of <key,value> pairs
- Three basic operations:
  - Insert(k,v): adds a new item
  - v=FindKey(k): finds the item <k,v>
  - v=DeleteKey(k): finds and removes the item <k,v>

Previous Non-blocking Dictionaries
- M. Michael: "High Performance Dynamic Lock-Free Hash Tables and List-Based Sets", SPAA 2002
- Based on a singly-linked list
  - Linear time complexity!
- Fast lock-free memory management
  - Causes retries of concurrent search operations!
- Building block of hash tables
  - Assumes each branch is of length <<10
  - However, hash tables might not be uniformly distributed

Randomized Algorithm: Skip Lists
- William Pugh: "Skip Lists: A Probabilistic Alternative to Balanced Trees", 1990
- Layers of ordered lists with different densities achieve a tree-like behavior
- Time complexity: O(log_2 N), probabilistic!
[Figure: skip list from Head to Tail over keys 1-7; layer densities 50%, 25%, …]
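
The layer densities come from how a node's height is drawn. A minimal sketch in C++, assuming a promotion probability of 1/2 per level (consistent with the 50% and 25% densities in the figure); the name random_level and the choice of random engine are illustrative, not taken from the talk.

```cpp
#include <cassert>
#include <random>

// Hypothetical helper: draws a node height in [1, max_level].
// Each extra level is kept with probability 1/2, so about 50% of the nodes
// reach level 2, about 25% reach level 3, and so on.
int random_level(std::mt19937& rng, int max_level) {
    assert(max_level >= 1);
    std::bernoulli_distribution coin(0.5);
    int level = 1;
    while (level < max_level && coin(rng)) {
        ++level;
    }
    return level;
}
```

With this distribution a search skips roughly half of the remaining nodes per level, which is where the expected logarithmic search time comes from.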

New Lock-Free Concurrent Skip List
- Define the node state to depend on the insertion status at the lowest level as well as a deletion flag
- Insert from the lowest level going upwards
- Set the deletion flag, then delete from the highest level going downwards
[Figure: skip list with keys 1-7, each node carrying a deletion flag D; node layout with levels 3, 2, 1, pointer p, and flag D]

Overlapping operations on shared data
- Example: Insert operation. Which of 2 or 3 gets inserted?
[Figure: list with nodes 1 and 4; concurrent operations Insert 2 and Insert 3]
- Solution: Compare-And-Swap atomic primitive:

  CAS(p: pointer to word, old: word, new: word): boolean
    atomic do
      if *p = old then
        *p := new;
        return true;
      else
        return false;
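
The two competing inserts can be sketched with std::atomic, whose compare_exchange_strong plays the role of CAS. This is a minimal illustration under assumptions: the ListNode type and the helper try_link_after are hypothetical names, and the sketch ignores deletion and memory reclamation.

```cpp
#include <atomic>
#include <cassert>

// Illustrative node type for a singly-linked ordered list.
struct ListNode {
    int key;
    std::atomic<ListNode*> next;
    ListNode(int k, ListNode* n) : key(k), next(n) {}
};

// Hypothetical helper: links `fresh` between `prev` and `expected_next`.
// Succeeds only if prev->next still equals `expected_next`, i.e. no other
// thread has changed the link since the caller read it.
bool try_link_after(ListNode* prev, ListNode* expected_next, ListNode* fresh) {
    assert(prev != nullptr && fresh != nullptr);
    fresh->next.store(expected_next);
    return prev->next.compare_exchange_strong(expected_next, fresh);
}
```

If both inserters read the link from node 1 to node 4 and then call try_link_after, exactly one call returns true. The loser sees false, searches for its position again, and retries, so neither node is lost.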

Concurrent Insert vs. Delete operations
- Problem: both nodes are deleted!
- Solution (Harris et al.): use bit 0 of the pointer to mark deletion status
[Figure: list 1, 2, 4 with a concurrent Delete and an Insert of node 3, steps a) to c); the marked pointer is shown as *]
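
A minimal sketch of the marking technique, assuming node addresses are at least 2-byte aligned so that bit 0 of a pointer is always free. The type MarkedNode and all helper names are illustrative, not from the talk. One reading of the problem on the slide is that a node is unlinked while another thread links a new node after it, so the new node disappears together with the deleted one; the mark makes that insert fail instead.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Illustrative node: `next` holds the successor address, bit 0 is the mark.
struct MarkedNode {
    int key;
    std::atomic<std::uintptr_t> next;
    explicit MarkedNode(int k) : key(k), next(0) {}
};

constexpr std::uintptr_t kMark = 1;

std::uintptr_t pack(MarkedNode* successor, bool marked) {
    auto bits = reinterpret_cast<std::uintptr_t>(successor);
    assert((bits & kMark) == 0);  // assumption: alignment leaves bit 0 free
    return bits | (marked ? kMark : 0);
}

MarkedNode* successor_of(std::uintptr_t link) {
    return reinterpret_cast<MarkedNode*>(link & ~kMark);
}

bool is_marked(std::uintptr_t link) { return (link & kMark) != 0; }

// Logical deletion: set the mark on node->next, keeping the successor.
// Returns false if another thread marked the node first.
bool try_mark_deleted(MarkedNode* node) {
    std::uintptr_t link = node->next.load();
    while (!is_marked(link)) {
        if (node->next.compare_exchange_weak(link, link | kMark)) return true;
    }
    return false;
}

// Insertion after `prev`: the CAS expects an unmarked link, so it fails if
// `prev` has been marked as deleted or its successor has changed.
bool try_insert_after(MarkedNode* prev, MarkedNode* expected_next,
                      MarkedNode* fresh) {
    fresh->next.store(pack(expected_next, false));
    std::uintptr_t expected = pack(expected_next, false);
    return prev->next.compare_exchange_strong(expected, pack(fresh, false));
}
```

Since the mark and the successor live in the same word, a single CAS checks both at once, and no insert can succeed after a node that is already logically deleted.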

New Lock-Free Dictionary - Techniques Summary
- Based on Skip Lists
  - Treated as layers of ordered lists
- Uses CAS atomic primitive
- Lock-free memory management
  - IBM freelists
  - Reference counting (Valois + Michael & Scott)
- Helping scheme
- Back-off strategy
- All together proved to be linearizable

Experiments
- Experiments with 1-30 threads, performed on systems with 2 and 64 CPUs respectively.
- Each thread performs 20000 operations. The first 50-10000 operations in total are Inserts; the remaining operations are distributed uniformly at random over Insert, FindKey and DeleteKey.
- Fixed skip list maximum level of 10.
- Compared with the implementation by Michael, using the same scenarios.
- Execution time averaged over 50 experiments.

SGI Origin 2000, 64 CPUs

Linux Pentium II, 2 CPUs

Conclusions
- Our lock-free implementation also includes the value-oriented operations FindValue and DeleteValue.
- Our lock-free algorithm is suitable for both pre-emptive systems and systems with full concurrency.
- Will be available as part of the NOBLE software library, http://www.noble-library.org
- See the Technical Report for full details, http://www.cs.chalmers.se/~phs

Questions?
Contact Information:
- Address: Håkan Sundell and Philippas Tsigas, Computing Science, Chalmers University of Technology
- Email: <phs , tsigas> @ cs.chalmers.se
- Web: http://www.cs.chalmers.se/~phs/warp

Dynamic Memory Management
- Problem: system memory allocation functionality is blocking!
- Solution (lock-free), IBM freelists: pre-allocate a number of nodes, link them into a dynamic stack structure, and allocate/reclaim using CAS
[Figure: free list Head, Mem 1, Mem 2, …, Mem n, with Allocate and Reclaim operations; reclaimed node Used 1]
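
A minimal sketch of such a free list as a CAS-based stack over pre-allocated nodes. The names FreeNode and FreeList and the payload field are assumptions made for the example. Note loudly that this sketch has no protection against the ABA problem, so on its own it is only safe when nodes cannot be recycled while another thread is in the middle of allocate().

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <vector>

// Illustrative pre-allocated memory block.
struct FreeNode {
    std::atomic<FreeNode*> next{nullptr};
    int payload = 0;
};

class FreeList {
public:
    // Pre-allocates `capacity` nodes and links them all into the stack.
    explicit FreeList(std::size_t capacity) : pool_(capacity) {
        for (FreeNode& node : pool_) reclaim(&node);
    }

    // Pops a node from the stack; returns nullptr when the pool is exhausted.
    // Caveat: reading top->next is exactly where ABA can strike.
    FreeNode* allocate() {
        FreeNode* top = head_.load();
        while (top != nullptr &&
               !head_.compare_exchange_weak(top, top->next.load())) {
            // The failed CAS refreshed `top`; retry with the new head.
        }
        return top;
    }

    // Pushes a node back onto the stack.
    void reclaim(FreeNode* node) {
        assert(node != nullptr);
        FreeNode* top = head_.load();
        do {
            node->next.store(top);
        } while (!head_.compare_exchange_weak(top, node));
    }

private:
    std::vector<FreeNode> pool_;            // fixed storage, never resized
    std::atomic<FreeNode*> head_{nullptr};  // top of the free stack
};
```

Neither operation calls the system allocator after construction, so a pre-empted thread can never hold an allocator lock that blocks the others.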

The ABA problem
- Problem: because of concurrency (pre-emption in particular), the same pointer value does not always mean the same node (i.e. CAS succeeds)!
[Figure: Step 1: nodes 1, 6, 7, 4; Step 2: nodes 2, 3, 7, 4]
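
One ABA interleaving can be replayed deterministically on a single thread. The sketch below is an illustration under assumptions, not the scenario drawn on the slide: it uses a two-node stack, and the names StackNode and replay_aba are hypothetical. The comments state which thread each step would belong to.

```cpp
#include <atomic>
#include <cassert>

// Illustrative stack node.
struct StackNode {
    int id;
    StackNode* next;
};

// Replays one ABA interleaving step by step on a single thread.
// Returns the node that the stale CAS leaves at the top of the stack.
StackNode* replay_aba(StackNode& a, StackNode& b) {
    a.next = &b;                       // stack: a -> b
    b.next = nullptr;
    std::atomic<StackNode*> head{&a};

    // Thread 1 starts a pop: reads the head and its successor, then is
    // pre-empted before its CAS.
    StackNode* seen_top = head.load();
    StackNode* seen_next = seen_top->next;

    // Thread 2 runs meanwhile: pops a, pops b, then pushes a back.
    head.store(a.next);                // pop a, stack: b
    head.store(b.next);                // pop b, stack: empty (b may be reused)
    a.next = head.load();              // push a, stack: a
    head.store(&a);

    // Thread 1 resumes: head holds the same pointer value as before, so the
    // CAS succeeds even though the stack changed in between.
    bool swapped = head.compare_exchange_strong(seen_top, seen_next);
    assert(swapped);
    return head.load();                // b, which is no longer in the stack
}
```

The CAS compares only the pointer value, so it cannot tell that node a left the stack and came back.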

The ABA problem
- Solution (Valois et al.): add reference counting to each node, in order to prevent nodes that are of interest to some thread from being reclaimed until all threads have left the node
[Figure: new Step 2 with reference counts on the nodes; the CAS fails!]

Helping Scheme
- Threads need to traverse safely
- Need to remove marked-to-be-deleted nodes while traversing: help!
- Find the previous node, finish the deletion, and continue traversing from the previous node
[Figure: alternative traversals over the list 1, 2, 4 where a node is marked (*) as deleted]

Back-Off Strategy
- For pre-emptive systems, helping is necessary for efficiency and lock-freeness
- For truly concurrent systems, overlapping CAS operations (caused by helping and others) on the same node can cause heavy contention
- Solution: for every failed CAS attempt, back off (i.e. sleep) for a certain duration, which increases exponentially
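
A minimal sketch of exponential back-off around a CAS retry loop. The class Backoff, the function add_with_backoff, and the delay values (1 to 1024 microseconds, doubling each time) are assumptions chosen for illustration; the talk does not state its actual parameters.

```cpp
#include <algorithm>
#include <atomic>
#include <cassert>
#include <chrono>
#include <thread>

// Hypothetical back-off helper; initial delay and limit are illustrative.
class Backoff {
public:
    Backoff(std::chrono::microseconds initial, std::chrono::microseconds limit)
        : delay_(initial), limit_(limit) {
        assert(initial.count() > 0 && limit >= initial);
    }

    // Sleeps for the current delay, then doubles it up to the limit.
    void pause() {
        std::this_thread::sleep_for(delay_);
        delay_ = std::min(delay_ * 2, limit_);
    }

    std::chrono::microseconds current_delay() const { return delay_; }

private:
    std::chrono::microseconds delay_;
    std::chrono::microseconds limit_;
};

// Lock-free update that backs off after every failed CAS attempt.
// Returns the value written by this call.
long add_with_backoff(std::atomic<long>& counter, long delta) {
    Backoff backoff(std::chrono::microseconds(1),
                    std::chrono::microseconds(1024));
    long seen = counter.load();
    while (!counter.compare_exchange_strong(seen, seen + delta)) {
        backoff.pause();          // give competing threads room to finish
        seen = counter.load();    // re-read after sleeping
    }
    return seen + delta;
}
```

Capping the delay is a design choice of this sketch: it keeps a thread that lost many rounds from sleeping far longer than the contention lasts.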

Non-blocking Synchronization
- Lock-Free Synchronization
  - Avoids problems with locks
  - Simple algorithms
  - Fast when contention is low
- Wait-Free Synchronization
  - Always finishes in a finite number of its own steps
  - Complex algorithms
  - Memory consuming
  - Less efficient on average than lock-free

Full SGI

Full Linux

The algorithm in more detail
Insert:
- Create node with random height
- Search position (remember drops)
- Insert or update on level 1
- Insert on level 2 to top (unless already deleted)
- If already deleted then HelpDelete(1)
All of this while keeping track of references, helping deleted nodes, etc.

The algorithm in more detail
DeleteKey:
- Search position (remember drops)
- Mark node at level 1 as deleted, otherwise fail
- Mark next pointers on level 1 to top
- Delete on level top to 1 while detecting helping, indicate success
- Free node
All of this while keeping track of references, helping deleted nodes, etc.

The algorithm in more detail
HelpDelete(level):
- Mark next pointer at level to top
- Find previous node (info in node)
- Delete on level unless already helped, indicate success
- Return previous node
All of this while keeping track of references, helping deleted nodes, etc.

Correctness
- Linearizability (Herlihy 1991)
- For an implementation to be linearizable, for every concurrent execution there should exist an equivalent sequential execution that respects the partial order of the operations in the concurrent execution

Correctness
- Define precise sequential semantics
- Define abstract state and its interpretation
  - Show that state is atomically updated
- Define linearizability points
  - Show that operations take effect atomically at these points with respect to sequential semantics
- Creates a total order using the linearizability points that respects the partial order
  - The algorithm is linearizable

Correctness
- Lock-freeness
  - At least one operation should always make progress
- There are no cyclic loop dependencies, and all potentially unbounded loops are "gate-kept" by CAS operations
  - The CAS operation guarantees that at least one CAS will always succeed
- The algorithm is lock-free