Distributed Algorithms (22903)

Presentation transcript:

Distributed Algorithms (22903) Lock-free stack algorithms Lecturer: Danny Hendler

Treiber's stack algorithm

[Figure: the stack is a linked list of nodes, each holding a val and a next pointer, with Top pointing to the first node.]

Push(int v, Stack S)
  n := new NODE                      ; create node for new stack item
  n.val := v                         ; write item value
  do forever                         ; repeat until success
    node top := S.top
    n.next := top                    ; next points to current top (LIFO order)
    if compare&swap(S.top, top, n)   ; try to add new item
      return                         ; return if succeeded
  od

Treiber's stack algorithm (cont'd)

Pop(Stack S)
  do forever
    top := S.top
    if top = null
      return empty
    if compare&swap(S.top, top, top.next)   ; try to remove the top item
      return-val := top.val
      free top
      return return-val
  od

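For reference, here is a minimal Java sketch of Treiber's stack (class and method names are my own, not from the slides), with AtomicReference.compareAndSet standing in for compare&swap. In Java the garbage collector takes the place of the explicit "free top", which also masks the ABA problem discussed later in this lecture.

import java.util.concurrent.atomic.AtomicReference;

// A minimal sketch of Treiber's lock-free stack; illustrative only.
public class TreiberStack<T> {
    private static class Node<E> {
        final E val;
        Node<E> next;
        Node(E val) { this.val = val; }
    }

    private final AtomicReference<Node<T>> top = new AtomicReference<>(null);

    public void push(T v) {
        Node<T> n = new Node<>(v);                 // create a node for the new item
        while (true) {                             // repeat until success
            Node<T> oldTop = top.get();
            n.next = oldTop;                       // next points to the current top (LIFO order)
            if (top.compareAndSet(oldTop, n))      // try to swing Top to the new node
                return;
        }
    }

    public T pop() {
        while (true) {
            Node<T> oldTop = top.get();
            if (oldTop == null)
                return null;                       // empty stack
            if (top.compareAndSet(oldTop, oldTop.next))
                return oldTop.val;                 // the node is reclaimed by the GC
        }
    }
}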

Treiber's stack algorithm (cont'd)

It is easy to see that the algorithm is linearizable and non-blocking. A disadvantage of the algorithm is that it has a sequential bottleneck: every operation must apply CAS to the single Top pointer.

An elimination backoff stack algorithm

Key idea: pairs of push/pop operations may collide and eliminate each other without accessing the central stack.

[Figure: a central Treiber-style stack (Top and a linked list of nodes) alongside a collision array.]

Collision scenarios

[Figure: push and pop operations meeting in the collision array, next to the central stack.]

Data structures

Each stack operation is represented by a ThreadInfo structure:

struct ThreadInfo {
  id     ; the identifier of the thread performing the operation
  op     ; a PUSH/POP opcode
  cell   ; a cell structure
  spin   ; duration to spin
}

struct Cell {    ; a representation of a stack item, as in Treiber's stack
  pnext  ; pointer to the next cell
  pdata  ; stack item
}

[Figure: the location array maps thread ids p1 … pn to ThreadInfo records; the collision array holds ids of threads (e.g. p1, p7) waiting to collide.]

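For readers who prefer code to record diagrams, a rough Java rendering of these two records might look as follows; the pseudocode leaves the field types unspecified, so the types below are guesses.

// Hypothetical Java rendering of the ThreadInfo and Cell records above.
final class Cell {
    Cell pnext;     // pointer to the next cell in the central stack
    Object pdata;   // the stack item itself
}

final class ThreadInfo {
    int id;         // identifier of the thread performing the operation
    int op;         // a PUSH or POP opcode
    Cell cell;      // the cell being pushed, or the cell obtained by a collision
    int spin;       // how long to spin while waiting for a collision
}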

Elimination-backoff stack code

void StackOp(ThreadInfo* p)
  if (TryPerformStackOp(p) == FALSE)   ; if operation was not applied to the central stack
    LesOP(p)                           ; try to eliminate it by colliding with an opposite-type operation
  return

void LesOP(ThreadInfo* p)
  while (1)
    location[mypid] = p                                   ; announce arrival
    pos = GetPosition(p)                                  ; get a random position in the collision array
    him = collision[pos]                                  ; read the current value of that position
    while (!compare&swap(&collision[pos], him, mypid))    ; try to write own ID
      him = collision[pos]                                ; continue till success
    if (him != empty)                                     ; if read an ID of another thread
      q = location[him]                                   ; read a pointer to the other thread's info
      if (q != NULL && q->id == him && q->op != p->op)    ; if we may collide
        if (compare&swap(&location[mypid], p, NULL))      ; try to prevent unwanted collisions
          if (TryCollision(p, q, him) == TRUE)            ; if collided successfully
            return                                        ; return value is already in the ThreadInfo structure
          else
            goto stack                                    ; try to apply the operation to the central stack
        else
          FinishCollision(p), return                      ; extract information and finish
    delay(p->spin)                                        ; wait for another thread to collide with me
    if (!compare&swap(&location[mypid], p, NULL))         ; if someone collided with me
      FinishCollision(p), return                          ; extract information and finish
  stack:
    if (TryPerformStackOp(p) == TRUE)                     ; try to apply the operation to the central stack
      return

Elimination-backoff stack code (cont'd)

boolean TryCollision(ThreadInfo* p, ThreadInfo* q, int him)
  if (p->op == PUSH)
    if (compare&swap(&location[him], q, p))     ; give my record to the other thread
      return TRUE
    else
      return FALSE
  if (p->op == POP)
    if (compare&swap(&location[him], q, NULL))
      p->cell = q->cell                         ; get a pointer to the PUSH operation's cell
      location[mypid] = NULL
      return TRUE
    else
      return FALSE

void FinishCollision(ThreadInfo* p)
  if (p->op == POP)
    p->cell = location[mypid]->cell             ; extract the pushed cell
    location[mypid] = NULL

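The full protocol coordinates through the location and collision arrays. As a rough intuition only, here is a hypothetical, heavily simplified single-slot elimination sketch in Java: it drops both arrays, the opcode check and the adaptive spin, and it assumes every pushed item is a distinct, non-null object.

import java.util.concurrent.atomic.AtomicReference;

// A single elimination slot; illustrative only, not the algorithm above.
public final class EliminationSlot<T> {
    private final AtomicReference<T> slot = new AtomicReference<>(null);

    // A push that failed on the central stack offers its item here.
    public boolean tryEliminatePush(T item, int spins) {
        if (!slot.compareAndSet(null, item))
            return false;                    // slot occupied: retry on the central stack
        for (int i = 0; i < spins; i++)
            Thread.onSpinWait();             // wait briefly for a popper to collide
        if (slot.compareAndSet(item, null))
            return false;                    // no popper showed up: withdraw the offer
        return true;                         // a popper took the item: both operations eliminated
    }

    // A pop that failed on the central stack tries to grab a waiting item.
    public T tryEliminatePop() {
        T item = slot.get();
        if (item != null && slot.compareAndSet(item, null))
            return item;                     // collided with a waiting pusher
        return null;                         // nothing to take: retry on the central stack
    }
}

In the real algorithm the colliding parties instead publish thread ids in the collision array and exchange ThreadInfo records through the location array, which is what lets them check that the two operations have opposite types.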

Elimination-backoff stack code (cont'd)

Why is this implementation linearizable? Can a record be recycled once it has been popped from the stack?

Recycling: Simple Solution

- Each thread has a free list of unused queue nodes
- Allocate a node: pop it from the list
- Free a node: push it onto the list
- Use CAS for atomicity
- Deal with underflow somehow …

The most reasonable solution is to have each thread manage its own pool of unused nodes. When a thread needs a new Node, it pops one from its list; no synchronization is needed. When a thread frees a Node, it pushes the newly-freed Node onto its list. What do we do when a thread runs out of nodes? Perhaps we could just malloc() more, or perhaps we could devise a shared pool; we will not worry about that here. © Herlihy-Shavit 2007

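A minimal Java sketch of the per-thread pool, assuming the simple variant in which each thread allocates from and frees to its own pool only, so the pool itself needs no CAS; the names and the bare Node type are illustrative.

import java.util.ArrayDeque;

// Per-thread free list of nodes; underflow falls back to plain allocation.
final class NodePool {
    static final class Node {    // a bare node, as in the lock-free queue/stack
        Object val;
        Node next;
    }

    private static final ThreadLocal<ArrayDeque<Node>> FREE =
            ThreadLocal.withInitial(ArrayDeque::new);

    static Node allocate() {
        Node n = FREE.get().poll();              // reuse a pooled node if we have one
        return (n != null) ? n : new Node();     // underflow: just allocate a fresh one
    }

    static void free(Node n) {
        n.val = null;                            // drop stale references
        n.next = null;
        FREE.get().push(n);                      // push the freed node onto this thread's list
    }
}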

Why Recycling is Hard

[Figure: a lock-free queue with head and tail pointers and per-thread free pools; a thread wants to redirect tail from the grey node to the red node, then goes to sleep (zzz…).]

Now the green thread goes to sleep, and other threads dequeue the red object and the green object, sending the prior sentinel and the prior red Node to the respective free pools of the dequeuing threads. © Herlihy-Shavit 2007

Why Recycling is Hard

[Figure: the queue while the green thread sleeps (zzz); head, tail and the free pools have changed.]

Despite what you might think, we are perfectly safe, because any thread that tries to CAS the head field must fail, since that field has changed. Now assume that enough deq() and enq() calls occur that the original sentinel Node is recycled and again becomes the sentinel Node of an empty queue. © Herlihy-Shavit 2007

Why Recycling is Hard

[Figure: the green thread wakes up (Yawn!); the original sentinel is back in place as the sentinel of an empty queue.]

Now the green thread wakes up. It applies CAS to the queue's head field… © Herlihy-Shavit 2007

Why Recycling is Hard

[Figure: the green thread's CAS on head ("OK, here I go!") succeeds.]

Surprise! It works! The problem is that the bit-wise value of the head field is the same as before, even though its meaning has changed. This is a problem with the way the CAS operation is defined, nothing more. © Herlihy-Shavit 2007

Final State

[Figure: the corrupted final state of the queue.]

In the end, the tail pointer points to a sentinel node, while the head points to a node in some thread's free list. What went wrong? © Herlihy-Shavit 2007

The Dreaded ABA Problem

The head pointer has value A. A thread reads value A. © Herlihy-Shavit 2007

Dreaded ABA continued

The reading thread sleeps (zzz…). The head pointer now has value B; node A has been freed. © Herlihy-Shavit 2007

Dreaded ABA continued

The thread wakes up (Yawn!). The head pointer has value A again; node A has been recycled and reinitialized. © Herlihy-Shavit 2007

Dreaded ABA continued

The thread's CAS succeeds because the pointer matches, even though the pointer's meaning has changed. © Herlihy-Shavit 2007

The Dreaded ABA Problem

- Is a result of CAS() semantics (Sun, Intel, AMD)
- Does not arise with Load-Locked/Store-Conditional (IBM)

© Herlihy-Shavit 2007

Dreaded ABA – A Solution

- Tag each pointer with a counter
- Unique over the lifetime of a node
- Pointer size vs. word size issues
- Overflow? Don't worry, be happy? Bounded tags

© Herlihy-Shavit 2007

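One standard way to realize the counter tag in Java is AtomicStampedReference, which pairs a reference with an int stamp and compares both atomically in compareAndSet. The sketch below applies it to a Treiber-style stack; it illustrates the idea and is not the course's reference implementation. The int stamp can wrap around, which is exactly the "Overflow?" concern on the slide.

import java.util.concurrent.atomic.AtomicStampedReference;

// Treiber-style stack whose Top carries a counter (stamp) to defeat ABA.
public class StampedTreiberStack<T> {
    private static class Node<E> {
        final E val;
        Node<E> next;
        Node(E val) { this.val = val; }
    }

    // The stamp is bumped on every successful update, so a recycled node
    // with the same address can no longer pass the CAS.
    private final AtomicStampedReference<Node<T>> top =
            new AtomicStampedReference<>(null, 0);

    public void push(T v) {
        Node<T> n = new Node<>(v);
        int[] stamp = new int[1];
        while (true) {
            Node<T> oldTop = top.get(stamp);       // read reference and stamp together
            n.next = oldTop;
            if (top.compareAndSet(oldTop, n, stamp[0], stamp[0] + 1))
                return;
        }
    }

    public T pop() {
        int[] stamp = new int[1];
        while (true) {
            Node<T> oldTop = top.get(stamp);
            if (oldTop == null)
                return null;
            if (top.compareAndSet(oldTop, oldTop.next, stamp[0], stamp[0] + 1))
                return oldTop.val;                 // now safe to recycle oldTop
        }
    }
}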