Locality in Concurrent Data Structures Hagit Attiya Technion.

Presentation transcript:

Locality in Concurrent Data Structures Hagit Attiya Technion

May 13, 2008 BGU 2 Abstract Data Types (ADT)
Abstract representation of data & set of methods (operations) for accessing it
–Signature
–Specification

May 13, 2008 BGU 3 Implementing High-Level ADT
From lower-level ADTs
High-level operations translate into primitives on base objects
–Obvious: read, write
–Common: compare&swap (CAS), LL/SC
–Double-CAS (DCAS)
–Generic: read-modify-write (RMW), kRMW, kCAS, …
Low-level operations can be implemented from more primitive operations
–A hierarchy of implementations

May 13, 2008 BGU 4 Example: Binary Operations
Atomically read and modify two memory locations (esp. DCAS)
Simplify the writing of concurrent ADTs → kCAS is even better
Virtual Locking: Simulate DCAS / kCAS with a single-item CAS
–Acquire virtual locks on nodes in the data set
–Perhaps in some order
–Handle conflicts with blocking operations (holding a required data item)
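The intended semantics of DCAS can be sketched as a sequential specification. This is only an illustration: the `Cell` class and the global `_atomic` lock are hypothetical helpers that stand in for hardware atomicity, and the block does not implement virtual locking itself.

```python
import threading


class Cell:
    """A single shared memory location (hypothetical helper)."""

    def __init__(self, value):
        self.value = value


# Assumption: one global lock stands in for the atomicity that hardware
# (or a virtual-locking implementation) would provide.
_atomic = threading.Lock()


def dcas(a, b, old_a, old_b, new_a, new_b):
    """Atomically: if a == old_a and b == old_b, write both new values.

    Returns True when both comparisons succeed and both writes are applied,
    and False (leaving both cells untouched) otherwise.
    """
    with _atomic:
        if a.value == old_a and b.value == old_b:
            a.value = new_a
            b.value = new_b
            return True
        return False
```

A caller typically retries on failure after re-reading both cells, in the same way a single-word CAS loop does.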

May 13, 2008 BGU 5 Virtual locking [Turek, Shasha, Prakash, 1992] [Barnes]
Acquire locks by increasing addresses
–Guarantees that the implementation is nonblocking
Help the blocking operation to complete (recursively)
May result in long helping chains
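Acquiring locks by increasing addresses can be illustrated with ordinary blocking locks. This sketch is an assumption-laden simplification: the `Item` class and its integer `address` field are hypothetical, real mutexes replace the virtual locks, and the helping mechanism that makes the cited algorithms nonblocking is omitted. It shows only why a fixed global order rules out circular waiting.

```python
import threading


class Item:
    """A data item with a stand-in 'address' used only for ordering."""

    def __init__(self, address, value):
        self.address = address
        self.value = value
        self.lock = threading.Lock()


def with_items_locked(items, action):
    """Lock all items in increasing address order, run action, unlock."""
    ordered = sorted(items, key=lambda it: it.address)
    for it in ordered:
        it.lock.acquire()
    try:
        return action()
    finally:
        for it in reversed(ordered):
            it.lock.release()


def transfer(src, dst, amount):
    """A binary operation: atomically move amount from src to dst."""

    def move():
        src.value -= amount
        dst.value += amount

    with_items_locked([src, dst], move)
```

Two threads that call `transfer(a, b, 1)` and `transfer(b, a, 1)` both lock the lower address first, so neither can hold one lock while waiting for the other in a cycle.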

May 13, 2008 BGU 6 Virtual Locking: Reducing Contention [Shavit, Touitou, 1995]
Release locks when the operation is blocked
Help an immediate neighbor (only) & retry…
Short helping chains
But long delay chains

May 13, 2008 BGU 7 Randomization: How Often this Happens? [Ha, Tsigas, Wattenhofer, Wattenhofer, 2005]
Operations choose locking order at random
Chains’ length depends on log n / loglog n
–Also experimentally
–Better, and yet…
Similar analysis for chains’ length when ops choose items at random [Dragojevic, Guerraoui & Kapalka, 2008]
–Depends on the operations’ density

May 13, 2008 BGU 8 Color-Based Virtual Locking (Binary) [Attiya, Dagan, 1996]
Operations on two data items (e.g., DCAS)
Colors define the locking order
–Inspired by the left-right dining philosophers algorithm [Lynch, 1980]
Color the items when the operation starts
–Non-trivial… [Cole, Vishkin, 1986]
Bound the length of delay chains
–But the coloring stage is complicated

May 13, 2008 BGU 9 Color-Based Virtual Locking (Fixed k) [Afek, Merritt, Taubenfeld, Touitou, 1997]
Implements operations on k items, for a fixed k
Based on the memory addresses, the conflict graph is decomposed into trees & items are legally colored [Goldberg, Plotkin, Shannon]
–Need to have the data set from the beginning
–Recursive, with A&D at the basis and in each step
–Even more complicated

May 13, 2008 BGU 10 Virtual Locking: More Concurrency, Simpler [Attiya, Hillel, 2008]
Acquire locks in arbitrary order
–No need to know the data set (or its size) in advance
–No pre-computation
Possible ways to handle conflicts between operations contending for a data item
–Wait for the other operation
–Help the other operation
–Reset the other operation

May 13, 2008 BGU 11 Conflict Resolution, How?
Depends on the operations’ progress
More advanced operation wins
1. How to gauge progress?
2. What to do on a tie?

May 13, 2008 BGU 12 Who’s More Advanced?
The operation that locked more data items
If a less advanced operation needs an item
➢ help the conflicting operation, or
➢ wait (blocking in a limited radius)
If a more advanced operation needs an item
➢ reset the conflicting operation and claim the item

May 13, 2008 BGU 13 What about Ties?
Happen when two transactions locked the same number of items
A transaction has a descriptor and a lock
Use DCAS to race for locking the two descriptors
–Winner calls the shots…
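The rule for resolving a conflict by progress, including the tie case, can be sketched as a pure decision function. The `Decision` enum and the function name are hypothetical; the sketch only classifies a conflict and does not perform the helping, resetting, or the DCAS race on descriptors.

```python
from enum import Enum


class Decision(Enum):
    HELP_OR_WAIT = "help the holder or wait"
    RESET_HOLDER = "reset the holder and claim the item"
    TIE_BREAK = "race for the two descriptors"


def resolve_conflict(requester_locked, holder_locked):
    """Classify a conflict over one item.

    Progress is gauged by the number of data items each operation has
    already locked; the more advanced operation wins.
    """
    if requester_locked < holder_locked:
        return Decision.HELP_OR_WAIT
    if requester_locked > holder_locked:
        return Decision.RESET_HOLDER
    return Decision.TIE_BREAK
```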

May 13, 2008 BGU 14 Measuring Concurrency: Data Items and Operations
A data structure is a collection of items
An operation accesses a data set
–not necessarily a pair
A set of operations induces a conflict graph
–Nodes represent items
–Edges connect items of the same operation

May 13, 2008 BGU 15 Spatial Relations between Operations
Disjoint access
–Non-adjacent edges
–Distance is infinite
Overlapping operations
–Adjacent edges
–Distance is 0
Chains of operations
–Paths
–Distance is length of path (here, 2)
d-neighborhood of an operation: all operations at distance ≤ d
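Distances and d-neighborhoods can be computed with a breadth-first search over operations. The sketch assumes one particular reading of the definitions: two operations that share an item are at distance 0, and otherwise the distance is the number of operations strictly between them on a shortest chain (infinite if no chain exists). The dictionary representation and function names are hypothetical.

```python
import math
from collections import deque


def operation_distance(ops, src, dst):
    """Distance between two operations.

    ops maps an operation name to the set of items it accesses.
    """
    if src == dst:
        return 0
    seen = {src}
    queue = deque([(src, 0)])  # (operation, operations passed so far)
    while queue:
        op, passed = queue.popleft()
        for other, items in ops.items():
            if other in seen or not (ops[op] & items):
                continue
            if other == dst:
                return passed
            seen.add(other)
            queue.append((other, passed + 1))
    return math.inf  # disjoint access: no connecting chain


def d_neighborhood(ops, op, d):
    """All other operations at distance <= d from op."""
    return {o for o in ops if o != op and operation_distance(ops, op, o) <= d}
```

For example, with `ops = {"A": {1, 2}, "X": {2, 3}, "B": {3, 4}}`, operations A and X overlap (distance 0), while A and B are connected only through X.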

May 13, 2008 BGU 16 Interference between Operations
Disjoint access: no interference
Overlapping operations: interference inevitable
Non-overlapping operations: should not interfere!
–Provides more concurrency & yields better throughput

May 13, 2008 BGU 17 Measuring Concurrency: Locality
d-failure locality [Choi, Singh, 1992]: some operation completes in the d-neighborhood unless a failure occurs in the d-neighborhood
d-local nonblocking: some operation completes in the d-neighborhood even if a failure occurs in the d-neighborhood

May 13, 2008 BGU 18 Quantitative Measures of Locality [Afek, Merritt, Taubenfeld, Touitou, 1997]
Distance in the conflict graph between overlapping operations that interfere
d-local step complexity: only operations at distance ≤ d delay each other
d-local contention: only operations at distance ≤ d access the same memory location

May 13, 2008 BGU 19 In Retrospect
Algorithm | Locality (step / memory) | Comments
Turek et al. | O(n) |
Shavit, Touitou | O(n) / O(1) |
Attiya, Dagan | O(log* n) | Binary
Afek et al. | O(k + log* n) | Fixed k
Attiya, Hillel | O(k + log* n) | Flexible k

May 13, 2008 BGU 20 Customized Virtual Locking
Can be viewed as software transactional memory
–High overhead
Or handled by specialized algorithms
–Ad-hoc and very delicate, mistakes happen
Instead, design algorithms in a systematic manner…
➢ Lock the items that have to be changed & apply a sequential implementation on these items
➢ Lock items by colors to increase concurrency
➢ No need to re-color at the start of each operation, since with a specific data structure
–We manage a data structure since its infancy
–The data sets of operations are predictable

May 13, 2008 BGU 21 Example: Doubly-Linked Lists
An important special case underlying many distributed data structures
–E.g., priority queue is used as job queue
Insert and Remove operations
–Sometimes only at the ends (deques)
–The data set is an item and its left / right neighbors (or left / right anchor)

May 13, 2008 BGU 22 Built-In Coloring for Doubly-Linked Lists [Attiya, Hillel, 2006]
➢ Always maintain the list items legally colored
–Adjacent items have different colors
–Adjust colors when inserting or removing items
–No need to color from scratch in each operation!
➢ Constant locality: operations that access disjoint data sets do not delay each other
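The invariant that adjacent list items always have different colors can be illustrated with a sequential sketch. Everything here is an assumption made for illustration: three colors, a single left anchor, and a simple recoloring step on removal. It is not the concurrent algorithm of the cited work, which locks items and assigns new items a temporary color.

```python
COLORS = (0, 1, 2)  # assumption: three colors suffice for a list


class Node:
    def __init__(self, key, color=None):
        self.key = key
        self.color = color
        self.prev = None
        self.next = None


def free_color(*neighbors):
    """Pick a color not used by any of the (at most two) given neighbors."""
    used = {n.color for n in neighbors if n is not None}
    return next(c for c in COLORS if c not in used)


def insert_after(left, key):
    """Insert a new node after left, colored differently from both neighbors."""
    right = left.next
    node = Node(key, free_color(left, right))
    node.prev, node.next = left, right
    left.next = node
    if right is not None:
        right.prev = node
    return node


def remove(node):
    """Unlink node; recolor its right neighbor if the two sides now clash."""
    left, right = node.prev, node.next
    if left is not None:
        left.next = right
    if right is not None:
        right.prev = left
    if left is not None and right is not None and left.color == right.color:
        right.color = free_color(left, right.next)


def is_legally_colored(head):
    """Check that no two adjacent nodes share a color."""
    node = head
    while node is not None and node.next is not None:
        if node.color == node.next.color:
            return False
        node = node.next
    return True
```

Because each node has at most two neighbors, one of three colors is always free, so the coloring is repaired locally and never recomputed from scratch.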

May 13, 2008 BGU 23 Insert Operation
New items are assigned a temporary color
Remove from the ends is similar to Insert
–Locks three items, one of them an anchor

May 13, 2008 BGU 24 Removing from the Middle
Complicated: Need to lock three list items
–Possibly two with same color
A chain of Remove operations may lead to a long delay chain in a symmetric situation

May 13, 2008 BGU 25 To the Rescue: DCAS Again
➢ Use DCAS to lock equally colored nodes

May 13, 2008 BGU 26 In Treatment
DCAS???
–Supported in hardware, or at least with hardware transactional memory
–Software implementation (e.g., A&D)
Software transactional memory?!
–Speculative execution
–Read accesses
Blocking?!!!
–Not so bad with low failure locality
–Can be translated to nonblocking locality


May 13, 2008 BGU 29 Implementing High-Level ADT
Using lower-level ADTs & procedures

May 13, 2008 BGU 30 Correctness: Linearizability [Herlihy & Wing, 1990]
For every concurrent execution there is a sequential execution that
–Contains the same operations
–Is legal (obeys the specification of the ADTs)
–Preserves the real-time order of non-overlapping operations
Each operation appears to take effect instantaneously at some point between its invocation and its response (atomicity)
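The three conditions above can be checked by brute force for very small histories. The sketch below assumes a single read/write register as the ADT and a hypothetical history format of `(invoke_time, respond_time, kind, value)` tuples; it tries every sequential order, so it is only practical for a handful of operations.

```python
from itertools import permutations


def respects_real_time(order):
    """An operation that responded before another was invoked must precede it."""
    for i, earlier in enumerate(order):
        for later in order[i + 1:]:
            if later[1] < earlier[0]:
                return False
    return True


def is_legal(order, initial):
    """Sequential register specification: a read returns the latest write."""
    current = initial
    for _, _, kind, value in order:
        if kind == "write":
            current = value
        elif value != current:
            return False
    return True


def is_linearizable(history, initial=0):
    """True if some legal sequential order preserves the real-time order."""
    return any(
        respects_real_time(order) and is_legal(order, initial)
        for order in permutations(history)
    )
```

For example, a read that overlaps a write of 1 may return either 0 or 1, but a read invoked after that write has responded must return 1.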

May 13, 2008 BGU 31 Liveness Conditions (Eventual)
Wait-free: every operation completes within a finite number of (its own) steps
→ no starvation for mutex
Nonblocking: some operation completes within a finite number of (some other process) steps
→ deadlock-freedom for mutex
Obstruction-free: an operation (eventually) running solo completes within a finite number of (its own) steps
–Also called solo termination
wait-free ⇒ nonblocking ⇒ obstruction-free
