Hazard Pointers: Safe Memory Reclamation for Lock-Free Objects
Maged M. Michael
Presented by Abdulai Sei
Recap
- So far we have seen many algorithms for lock-free dynamic objects
- How many of them reclaim the memory of reused nodes?
- How many of them return memory to the operating system?
- What happens if a thread frees a node while another thread still holds a pointer to that node?
Related paper - Type Stable Memory (TSM)
- Similar concept to HP, non-blocking
- Three basic extensions to a conventional type implementation
  - A type descriptor remains a valid instance of the type even when inactive (in the free list)
  - Allows multiple memory allocations for the same type
  - The type can change over time (a descriptor has to be inactive before reallocation)
- Guarantees that a pointer points to a descriptor of the type
  - Readers can continue reading, but the data is not guaranteed to be correct
Hazard Pointers (HP)
- A memory management methodology that allows memory reclamation for arbitrary reuse
- Memory is finite and needs to be reused
- Thread failures and deadlock leave chunks of memory unreclaimed and unavailable for reuse by the operating system
HP Claims
- Suitable for user-level applications as well as system-level programs, with no dependency on the kernel or scheduler
- Very efficient algorithm for dynamic lock-free objects
- Requires only single-word reads and writes
- Solves the ABA problem
- Wait-free
ABA Problem
- Common problem with CAS, CAS2, and DCAS
- Affects almost all lock-free algorithms
- Thread A reads a value XX from shared memory; another thread B then reads the same value XX, changes it to XY, and then changes it back to XX. When thread A reads the value a second time, it sees XX and assumes it was never changed
- Must be prevented regardless of memory reclamation
- Applying HP to an ABA-prone algorithm makes it safe
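
The interleaving described above can be replayed in a single thread to show why a compare-and-swap alone cannot detect it. This is a minimal sketch: the function name `aba_cas_succeeds` and the values 10, 20, and 30 (standing in for XX, XY, and thread A's new value) are made up for illustration.

```cpp
#include <atomic>
#include <cassert>

// Replays the ABA interleaving sequentially and reports whether thread A's
// stale CAS still succeeds.
bool aba_cas_succeeds() {
    std::atomic<int> shared{10};        // shared location holds "XX"

    int seen_by_a = shared.load();      // thread A reads XX
    assert(seen_by_a == 10);

    // Thread B runs while A is delayed.
    shared.store(20);                   // XX -> XY
    shared.store(10);                   // XY -> XX

    // Thread A resumes. CAS compares only the current value with the value
    // A saw earlier, so the two intermediate changes are invisible to it.
    return shared.compare_exchange_strong(seen_by_a, 30);
}
```

With plain integers the undetected change is harmless. With pointers, the same pattern means a node may have been removed, freed, and reallocated at the same address between A's read and A's CAS.
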
Methodology
- Based on the observation that
  - A thread holds only a small number of references
  - It may later use these references, without further validation, to access the contents of dynamic nodes or in ABA-prone operations
- Core ideas
  - Single-writer multi-reader shared pointers, called hazard pointers (HP)
  - The number of HPs per thread varies with the algorithm, typically one or two
  - The methodology communicates with the algorithm through the HPs
- Consists of two parts
  - An algorithm for processing retired nodes
  - A condition that guarantees safe memory reclamation and ABA prevention
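
The single-writer multi-reader idea can be sketched as a small table of atomic pointers in which each thread writes only its own row and any thread may read every row. The type `HazardTable`, its member names, and the sizes `kMaxThreads` and `kHpPerThread` are assumptions for illustration, not names from the paper.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>

// Hypothetical sizes chosen for illustration only.
constexpr std::size_t kMaxThreads = 4;   // number of participating threads
constexpr std::size_t kHpPerThread = 2;  // hazard pointers per thread

struct HazardTable {
    // slots[t][i] is written only by thread t and read by every thread.
    std::atomic<const void*> slots[kMaxThreads][kHpPerThread];

    HazardTable() {
        for (auto& row : slots)
            for (auto& s : row) s.store(nullptr);
    }

    // Called by the owning thread to announce that it may dereference p.
    void publish(std::size_t tid, std::size_t i, const void* p) {
        assert(tid < kMaxThreads && i < kHpPerThread);
        slots[tid][i].store(p, std::memory_order_seq_cst);
    }

    // Called by the owning thread once it no longer uses the reference.
    void clear(std::size_t tid, std::size_t i) {
        assert(tid < kMaxThreads && i < kHpPerThread);
        slots[tid][i].store(nullptr, std::memory_order_release);
    }

    // Called by any thread before it reclaims p.
    bool is_hazardous(const void* p) const {
        for (const auto& row : slots)
            for (const auto& s : row)
                if (s.load(std::memory_order_seq_cst) == p) return true;
        return false;
    }
};
```

Because every slot has exactly one writer, publishing and clearing need only single-word stores, and checking needs only single-word loads.
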
Algorithm (reader only)
[Diagram: shared pointer add1* refers to node add2]
- READER: dereferences add1
- HP List: (add2)
Algorithm (reader + writer)
[Diagram: shared pointer add1* refers to node add2; the writer prepares node add3]
- READER: dereferences add1
- HP List: (add2)
- WRITER
  - Allocates add3
  - Dereferences add1
  - Copies add2 to add3
  - Modifies add3
Algorithm (reader + writer)
[Diagram: shared pointer add1* is switched from node add2 to node add3]
- READER: has already dereferenced add1* before the writer starts
- HP List: (add2)
- WRITER
  - Allocates add3
  - Copies add2 to add3
  - Modifies add3
  - Performs the CAS
  - Retires add2
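
The reader and writer steps from the diagram can be replayed with one shared pointer and one hazard pointer. This is a hedged sketch assuming a single writer and a single reader; `Node`, `protect`, `replace_value`, and `can_free` are illustrative names rather than the paper's.

```cpp
#include <atomic>
#include <cassert>

struct Node { int value; };

// Reader: publish the node in the hazard pointer, then re-read the shared
// pointer to confirm the node was still reachable when it was published.
Node* protect(const std::atomic<Node*>& shared, std::atomic<Node*>& hp) {
    Node* n;
    do {
        n = shared.load();
        hp.store(n);
    } while (n != shared.load());
    return n;
}

// Writer (assumed to be the only writer): copy the current node, modify the
// copy, and swing the shared pointer with CAS. Returns the replaced node,
// which is now retired but not yet freed, or nullptr if the CAS failed.
Node* replace_value(std::atomic<Node*>& shared, int new_value) {
    Node* old_node = shared.load();      // dereference add1
    assert(old_node != nullptr);
    Node* fresh = new Node(*old_node);   // allocate add3, copy add2 to add3
    fresh->value = new_value;            // modify add3
    if (shared.compare_exchange_strong(old_node, fresh)) {
        return old_node;                 // retired add2
    }
    delete fresh;                        // never published, safe to free
    return nullptr;
}

// A retired node may be freed only while no hazard pointer names it.
bool can_free(const Node* retired, const std::atomic<Node*>& hp) {
    return hp.load() != retired;
}
```

In this replay the reader keeps seeing the old contents of add2 after the CAS, which is consistent but stale, and add2 cannot be freed until the reader clears its hazard pointer.
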
Questions for class
- Is the reader reading the right data after the writer performs the CAS?
  - See slide 13, the conditions
- Can the garbage collector free add2?
- Any new problems?
  - Yes, readers have to write to the HP list
  - Others?
Code
The Condition
- A thread announces to other threads when it assigns a reference to one of its HPs
- Other threads refrain from reclaiming or reusing the node
- At time t, each node n is in one of the following states
  - Allocated, reachable, removed, retired, free, unavailable, undefined
- A hazard pointer must point to the node from a time when the node was still safe for the thread
- Question: Can a thread create a new hazardous reference to a node once it has been retired?
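
The rule that other threads refrain from reclaiming a node while a hazard pointer names it can be sketched as a retire-and-scan routine. This is an illustration under stated assumptions, not the paper's listing: `Reclaimer`, `kSlots`, and `kThreshold` are made-up names, the retired list would be private to one thread in a real implementation, and the snapshot is sorted for simplicity where a hash table could be used instead.

```cpp
#include <algorithm>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

struct Node { int value; };

constexpr std::size_t kSlots = 4;      // total number of hazard pointers (assumed)
constexpr std::size_t kThreshold = 8;  // scan threshold, chosen larger than kSlots

struct Reclaimer {
    std::atomic<Node*> hazard[kSlots];
    std::vector<Node*> retired;        // nodes removed but not yet freed
    std::size_t freed = 0;

    Reclaimer() { for (auto& h : hazard) h.store(nullptr); }

    // Defer freeing; scan only once enough nodes have accumulated.
    void retire(Node* n) {
        assert(n != nullptr);
        retired.push_back(n);
        if (retired.size() >= kThreshold) scan();
    }

    void scan() {
        // Stage 1: snapshot every non-null hazard pointer.
        std::vector<Node*> snapshot;
        for (auto& h : hazard)
            if (Node* p = h.load()) snapshot.push_back(p);
        std::sort(snapshot.begin(), snapshot.end(), std::less<Node*>());

        // Stage 2: free each retired node that no hazard pointer names.
        std::vector<Node*> keep;
        for (Node* n : retired) {
            if (std::binary_search(snapshot.begin(), snapshot.end(), n,
                                   std::less<Node*>())) {
                keep.push_back(n);     // still protected, try again later
            } else {
                delete n;
                ++freed;
            }
        }
        retired.swap(keep);
    }
};
```

Since at most `kSlots` nodes can be protected at once, each scan triggered at the threshold frees at least `kThreshold - kSlots` nodes, which is what keeps the cost per retired node low when averaged over many retirements.
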
Applying HP
- Apply to existing lock-free algorithms, based on the condition
- Examine the target algorithm
  - Identify the hazards and the hazardous references they use
  - Determine where each hazardous reference is created and the last hazard that uses it
  - Determine the maximum number of references that can be hazardous at the same time
- Insert the following in the target algorithm
  - Write the address of the node that is the target of the reference to an available hazard pointer
  - Validate that the node is safe
- Examples of existing algorithms to which HP can be applied
  - FIFO queues
  - LIFO stacks
  - List-based sets and hash tables
  - Single-writer multi-reader dynamic structures
Applications – FIFO queues
Applications – Lock-free stack using HP
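
A minimal sketch of how a lock-free stack might use hazard pointers follows. It is not the paper's listing: the class `HpStack`, the bound `kThreads`, the explicit thread index `tid`, and the scan threshold of `2 * kThreads` are all assumptions made for illustration.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <vector>

constexpr std::size_t kThreads = 4;  // assumed upper bound on threads

struct Node { int value; Node* next; };

class HpStack {
    std::atomic<Node*> top_{nullptr};
    std::atomic<Node*> hp_[kThreads];       // one hazard pointer per thread
    std::vector<Node*> retired_[kThreads];  // per-thread retired lists

    bool hazardous(Node* n) const {
        for (const auto& h : hp_)
            if (h.load() == n) return true;
        return false;
    }

    // Free every node retired by this thread that no hazard pointer names.
    void scan(std::size_t tid) {
        std::vector<Node*> keep;
        for (Node* n : retired_[tid]) {
            if (hazardous(n)) keep.push_back(n); else delete n;
        }
        retired_[tid].swap(keep);
    }

public:
    HpStack() { for (auto& h : hp_) h.store(nullptr); }

    ~HpStack() {  // assumes no other thread is using the stack any more
        for (auto& list : retired_)
            for (Node* n : list) delete n;
        for (Node* n = top_.load(); n != nullptr;) {
            Node* next = n->next;
            delete n;
            n = next;
        }
    }

    void push(int v) {
        Node* n = new Node{v, top_.load()};
        // On failure the CAS reloads n->next with the current top.
        while (!top_.compare_exchange_weak(n->next, n)) {}
    }

    // Each calling thread must use its own distinct tid below kThreads.
    bool pop(std::size_t tid, int& out) {
        assert(tid < kThreads);
        Node* t;
        for (;;) {
            t = top_.load();
            if (t == nullptr) { hp_[tid].store(nullptr); return false; }
            hp_[tid].store(t);                // announce the reference
            if (top_.load() != t) continue;   // validate that t is still top
            Node* next = t->next;             // safe: t cannot be freed now
            if (top_.compare_exchange_strong(t, next)) break;
        }
        hp_[tid].store(nullptr);
        out = t->value;                       // t was removed by this thread
        retired_[tid].push_back(t);
        if (retired_[tid].size() >= 2 * kThreads) scan(tid);
        return true;
    }
};
```

The hazard pointer is written before the top pointer is re-read, so a node that passes validation cannot be freed, and therefore cannot be reallocated at the same address, before the CAS. That is also what keeps the CAS in `pop` from being fooled by ABA.
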
Experiment
- Compares the performance of the new methodology with other lock-free memory management methods
  - FIFO queues, LIFO stacks, and chaining hash tables, all implemented using HP
- Comparison with commonly used locks
  - Test-and-test-and-set with exponential backoff
  - Hash table with 100 separate locks
- IBM RS/6000 multiprocessor with 4 processors
- Data structures are cache-aligned
- Locks and single-word and double-word CAS are implemented using LL/SC
- All implementations compiled at the highest optimization level
- Each experiment was run 5 times and the average reported
Performance Comparison
Conclusion
- HP provides unrestricted memory reclamation for dynamic lock-free objects
- Takes constant amortized time per retired node
- Uses only single-word instructions
- Performance comparable to that of lock-based implementations under no contention and no multiprogramming
- Outperforms them significantly under moderate multiprogramming and/or contention
- Guarantees progress even if a thread fails or is delayed (prevents deadlock)