Published by Emma Owen. Modified over 9 years ago.
1
Wait-Free Multi-Word Compare-And-Swap using Greedy Helping and Grabbing
Håkan Sundell
PDPTA 2009
2
Outline
  Synchronization Methods
  Multi-Word Compare-And-Swap
  Problems
  New Wait-Free Algorithm
  Experiments
  Conclusions
3
Synchronization
  Shared memory easily enables shared (e.g. multi-thread accessible) data structures.
  Shared data structures need synchronization!
  Accesses, and especially updates, must be coordinated to establish consistency.
  Updates should be done as atomic transactions.
  [Figure: threads T1, T2, T3]
4
Hardware Synchronization Primitives
  Consensus number 1: Atomic Read/Write
  Consensus number 2: Atomic Test-And-Set (TAS), Fetch-And-Add (FAA), Swap
  Consensus number infinite: Atomic Compare-And-Swap (CAS), Atomic Load-Linked/Store-Conditional (LL/SC)
  [Figure: Read, Write, Read; M=f(M,…)]
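As a rough illustration of the consensus-number-2 primitives listed above, the following sketch expresses them with C11 atomics. The function and variable names (`tas_demo`, `faa_demo`, `swap_demo`, `flag`, `counter`, `slot`) are hypothetical and not taken from the presentation.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Shared state for the demo; all names here are illustrative only. */
static atomic_flag flag = ATOMIC_FLAG_INIT;
static atomic_int counter = 0;
static atomic_int slot = 7;

/* Test-And-Set: sets the flag and reports whether it was already set. */
bool tas_demo(void)
{
    return atomic_flag_test_and_set(&flag);
}

/* Fetch-And-Add: adds delta and returns the value held before the add. */
int faa_demo(int delta)
{
    return atomic_fetch_add(&counter, delta);
}

/* Swap: stores v unconditionally and returns the value held before. */
int swap_demo(int v)
{
    return atomic_exchange(&slot, v);
}
```

All three always perform their write, regardless of the current contents. CAS differs in that it writes only when the word still holds an expected value.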
5
Universal and Conditional Synchronization Primitive
  Compare-And-Swap (CAS)
    bool CAS(int *p, int old, int new) {
        atomic {
            if (*p == old) { *p = new; return true; }
            else return false;
        }
    }
  This single-word transaction primitive (or an equivalent) is supported in hardware on all contemporary systems.
  However, multi-word transactions must be done in software.
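The `atomic { … }` block above is pseudocode. One possible way to get the same contract in standard C is the C11 `atomic_compare_exchange_strong` call, sketched below. The wrapper name `cas` and the parameter names are assumptions made for this example.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Same contract as the slide's CAS: store new_val only if *p still holds old_val. */
bool cas(atomic_int *p, int old_val, int new_val)
{
    /* On failure the C11 call overwrites its second argument with the value
       it observed. old_val is a local copy, so the caller never sees that. */
    return atomic_compare_exchange_strong(p, &old_val, new_val);
}
```

A call such as `cas(&x, 5, 6)` returns true and stores 6 only if `x` held 5 at the instant of the operation. Otherwise it returns false and leaves `x` unchanged.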
6
Mutual Exclusion
  Mutual exclusion (e.g. locks) can be used for multi-word atomicity in software
    Access to shared data will be atomic because of the lock
    Reduced parallelism by definition
    Blocking; danger of priority inversion and deadlocks
      Solutions exist, but with high overhead, especially for multi-processor systems
  [Figure: threads T1, T2, T3]
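To make the lock-based alternative concrete, here is a minimal sketch of a two-word compare-and-swap guarded by a single global spinlock. The single-lock design and the names `big_lock` and `locked_cas2` are assumptions for illustration. The presentation does not prescribe a particular lock.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* One global test-and-set spinlock guards every multi-word update (assumed design). */
static atomic_flag big_lock = ATOMIC_FLAG_INIT;

/* Two-word compare-and-swap made atomic by the lock, not by the hardware. */
bool locked_cas2(int *p1, int o1, int n1, int *p2, int o2, int n2)
{
    while (atomic_flag_test_and_set_explicit(&big_lock, memory_order_acquire))
        ;  /* spin: all other callers are blocked while the lock is held */

    bool ok = (*p1 == o1) && (*p2 == o2);
    if (ok) {
        *p1 = n1;
        *p2 = n2;
    }

    atomic_flag_clear_explicit(&big_lock, memory_order_release);
    return ok;
}
```

In this sketch, readers of the protected words would have to take the same lock. A thread that is preempted while holding the lock stalls every other caller, which is the blocking behaviour the slide warns about.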
7
Non-blocking Synchronization
  Avoids blocking by performing the operation/changes using atomic primitives
  Lock-Free Synchronization
    Optimistic approach
    Retries until succeeding
    Guarantees progress of at least one operation
  Wait-Free Synchronization
    Always finishes in a finite number of its own steps
    Requires coordination with all concurrent operations
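The optimistic retry pattern of lock-free synchronization can be sketched as follows. The operation chosen here (raising a shared word to at least `v`) and the name `lockfree_max` are hypothetical and serve only to show the read, compute, CAS, retry loop.

```c
#include <assert.h>
#include <stdatomic.h>

/* Lock-free "store the maximum": returns the value held after the operation. */
int lockfree_max(atomic_int *p, int v)
{
    int seen = atomic_load(p);
    while (seen < v) {
        /* Try to publish v. On failure, seen is refreshed with the current
           value and the loop condition is re-checked. */
        if (atomic_compare_exchange_weak(p, &seen, v))
            return v;
    }
    return seen;  /* the word already held a value >= v */
}
```

A failed CAS here means some other thread changed the word, so the system as a whole made progress. Nothing bounds how often one particular thread may have to retry, and that is the gap a wait-free algorithm closes.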
8
Wait-Free Synchronization
  Wait-Free Algorithms
    Usually very complex
      Hard to design and prove correct
    Offer strong real-time guarantees
    Usually offer significantly worse average performance than lock-free
    Dynamic memory allocation needs wait-free memory management
      By definition, all sub-operations of a wait-free operation also have to be wait-free
      Atomic primitives are assumed to be wait-free
9
Multi-Word Compare-And-Swap
  Operations:
    bool CASN(int *p1, int o1, int n1, …, int *pN, int oN, int nN);
    int Read(int *p);
  Not supported by hardware
    Contemporary hardware only supports atomic update of one memory word
  Achieved by lifting the abstraction level
    All operations on affected memory words have to go via the new abstraction layer
    Using the underlying hardware primitives
10
Multi-Word Compare-And-Swap
  Standard setup: Assign a lock to each individual memory word.
  Standard algorithmic approach (CASN):
    1. Try to acquire a lock on all positions of interest.
    2. If already taken, help (i.e. perform) the corresponding operation.
    3. If all taken and all match, change the status of the operation.
    4. Remove locks and possibly write new values.
  A concurrent Read() must check whether the word is locked in order to decide the current value.
11
Conflict resolution
  Concurrent CASN operations may need to lock overlapping subsets of the same words
  If the locks are taken in different orders (i.e. the pointer arguments of the CASN call are not sorted so that p1 < … < pN), this can lead to deadlock scenarios
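The sorted-order remedy mentioned above can be sketched as a pre-pass that orders the (pointer, old, new) triples by address before any word is locked. The struct `casn_entry` and the function names are assumptions for this example, not the interface of any of the cited implementations.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* One (address, old, new) triple of a CASN call; hypothetical layout. */
typedef struct {
    int *addr;
    int old_val;
    int new_val;
} casn_entry;

/* qsort comparator: ascending by the numeric value of the address. */
static int by_address(const void *a, const void *b)
{
    uintptr_t x = (uintptr_t)((const casn_entry *)a)->addr;
    uintptr_t y = (uintptr_t)((const casn_entry *)b)->addr;
    return (x > y) - (x < y);
}

/* Order the entries so that every operation acquires its words in the
   same global order, which rules out circular waiting. */
void casn_sort_entries(casn_entry *entries, size_t n)
{
    qsort(entries, n, sizeof entries[0], by_address);
}
```

If two operations both need words A and B and both lock the lower address first, neither can hold B while waiting for A, so the circular dependency behind the deadlock cannot form.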
12
Wait-Free Multi-Word Compare-And-Swap
  New Approach
    Wait-free memory management (IPDPS 2005) for handling descriptor (used for representing the ongoing CASN state) allocation
    Improved performance
      Greedy helping
        Never help more than absolutely necessary to continue
      Fast look-up of word’s current value
        Improves Read operation performance
        Improves CASN operation performance
    Allow unsorted pointer arguments
      Grabbing
        Help until definitive conflict and then apply deterministic lock stealing and lock hand-over to resolve the deadlock
13
Descriptor structure allowing fast look-up of current value
  Allows 31 bits of a 32-bit memory word to represent the actual value.
  The corresponding old or new value can be indexed (1…N) directly.
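One way to picture such a layout is sketched below, under loudly stated assumptions: bit 0 of each word is taken as the "locked" tag, a plain value occupies the remaining 31 bits, and a locked word carries the index of its entry in the owning descriptor. The actual bit assignment and descriptor fields of the algorithm are not given on the slide, so every name and field here is hypothetical. The sketch is single-threaded and only shows how a reader could resolve the logical value.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

enum { MAX_N = 16 };
typedef enum { UNDECIDED, SUCCEEDED, FAILED } casn_status;

/* Hypothetical descriptor: operation status plus directly indexable
   old/new values (0-based here, 1…N on the slide). */
typedef struct {
    casn_status status;
    uint32_t old_val[MAX_N];
    uint32_t new_val[MAX_N];
} casn_descriptor;

/* Assumed encoding: bit 0 clear = plain 31-bit value in bits 1..31;
   bit 0 set = word is locked and bits 1..31 hold the entry index. */
uint32_t encode_value(uint32_t v)      { return v << 1; }  /* requires v < 2^31 */
uint32_t encode_locked(uint32_t index) { return (index << 1) | 1u; }
bool     is_locked(uint32_t word)      { return (word & 1u) != 0; }

/* Logical value of a word: a locked word reads as the new value only once
   the owning operation has succeeded, otherwise as the old value. */
uint32_t read_word(uint32_t word, const casn_descriptor *d)
{
    if (!is_locked(word))
        return word >> 1;
    uint32_t i = word >> 1;
    return d->status == SUCCEEDED ? d->new_val[i] : d->old_val[i];
}
```

In this sketch the descriptor is passed in explicitly to keep the example small. A real implementation would also need the locked word to identify which descriptor owns it. Because the old and new values are indexed directly, the reader decides the current value with one status check and one array access.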
14
Experiments
  Micro benchmark using Read() and CASN()
    Each thread repeatedly performs updates of N memory words
    Runs for 5 seconds, and the number of successful updates is measured
  Parameters varied for each experiment
    N is either 2, 4, 8, or 16 words
    The N words are selected (for each update) randomly from 2, 4, …, 16384 words
    Number of threads is 8, 16 or 32
15
Experiments
  In each micro benchmark, the new algorithm is compared with 2 of the latest (e.g. fastest) CASN implementations in the literature
    Harris et al, 2002. Lock-Free.
      Allows 30 bits of word to be used for value.
      Requires pointer arguments to be sorted.
    Ha and Tsigas, 2004. Lock-Free.
      Needs underlying LL/SC primitive implementation (e.g. Michael 2004).
      Allows (with selected LL/SC) any number of bits for value.
      Requires pointer arguments to be sorted.
16
Experiments – Some results
17
Conclusions
  New Wait-Free Algorithm for Multi-Word Compare-And-Swap
    Greedy helping and Grabbing
  Extraordinary Performance
    Even better average performance than corresponding lock-free in many scenarios!
    Especially in high contention