Multicore programming Expanding hash tables, and synchronization primitives Week 3 – Wednesday Trevor Brown http://tbrown.pro
Last time Efficiently expanding a hash table Marking buckets to prevent errors Approximate counters for detecting table > 50% full
Implementation sketch

With expansion:

struct hashmap {
    volatile char padding0[64];
    table * volatile currentTable;
    volatile char padding1[64];
    /* code for operations ... */
};

struct table {
    volatile char padding0[64];
    volatile uint64_t * data;
    volatile uint64_t * old;
    volatile int chunksClaimed;
    volatile int size;
    const int capacity;
    volatile char padding1[64];
};

Without expansion (for comparison):

struct hashmap {
    volatile char padding0[64];
    volatile uint64_t * data;
    const int capacity;
    volatile char padding1[64];
    /* code for operations ... */
};
Implementation sketch

Check if the table is too full and, if so, help any expansion in progress; otherwise create a new struct table and CAS it into currentTable. Likewise, a marked bucket means an expansion is in progress, so help it and retry.

int hashmap::insert(int key) {
    table * t = currentTable;
    int h = hash(key);
    for (int i = 0; i < t->capacity; ++i) {
        // check if expansion is needed or in progress, and help if so;
        // otherwise tryExpand() creates a new struct table and CASes it into currentTable
        if (isTooFull(t)) { tryExpand(); return insert(key); }
        int index = (h + i) % t->capacity;
        int found = t->data[index];
        // a marked bucket means an expansion is in progress: help it, then retry
        if (found & MARKED_MASK) { helpExpand(); return insert(key); }
        else if (found == key) return false;
        else if (found == NULL) {
            if (CAS(&t->data[index], NULL, key)) return true;
            else {
                // the CAS could fail because someone inserted here,
                // or because someone marked this bucket for expansion
                found = t->data[index];
                if (found & MARKED_MASK) { helpExpand(); return insert(key); }
                else if (found == key) return false;
            }
        }
    }
    assert(false);
}
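The slide only describes tryExpand() in prose ("create a new struct table and CAS it into currentTable"). Below is a minimal, hedged sketch of that step, assuming a hypothetical table constructor that takes the new capacity and the old data array; helpExpand() is summarized in comments only, since its chunk-claiming and marking details were covered last lecture and may differ from this sketch.

void hashmap::tryExpand() {
    table * t = currentTable;
    // create a new struct table, e.g., with double the capacity, remembering
    // t->data as the old array that still has to be migrated
    table * n = new table(2 * t->capacity, t->data);   // hypothetical constructor
    // try to install the new table; if another thread already installed one,
    // our CAS fails and we simply help with that thread's expansion instead
    if (!CAS(&currentTable, t, n)) delete n;   // (real code must reclaim memory more carefully)
    helpExpand();
}

// helpExpand() sketch (comments only): repeatedly fetch-and-add chunksClaimed
// to claim the next chunk of the old array, mark each bucket in that chunk so
// concurrent inserts cannot race with the copy, and copy any key it contains
// into currentTable->data; return once every chunk has been migrated.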
Making migration more efficient

Normal function to get a bucket index from a key: index = hash(key) % capacity. When the capacity doubles, the indexes of keys are scrambled. For example, 193857111 % 12 = 3 (bucket 3), but 193857111 % 24 = 15 (bucket 15): in general, a key's new bucket has no simple relationship to its old one.

Scaled index function: index = floor( hash(key) / |hash universe| * capacity ). When the capacity doubles, the indexes of keys are doubled: a key in old bucket i lands in new bucket 2i or 2i+1 (e.g., bucket 3 -> bucket 6, bucket 2 -> bucket 4, bucket 15 -> bucket 30).

With predictable indexes, we can expand much more efficiently!
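As a hedged, concrete sketch of the scaled index function: assuming hash values are uniform 64-bit integers, the hash universe has size 2^64, so floor(hash / 2^64 * capacity) is just the high 64 bits of the 128-bit product hash * capacity. The names below are illustrative, and __uint128_t is a GCC/Clang extension.

#include <cstdint>

// index = floor( hash(key) / |hash universe| * capacity )
// With a 64-bit hash universe this is floor(hash * capacity / 2^64),
// i.e., the upper 64 bits of the full 128-bit product.
static inline uint64_t scaledIndex(uint64_t hash, uint64_t capacity) {
    return (uint64_t) (((__uint128_t) hash * capacity) >> 64);
}

// Doubling the capacity (at most) doubles the index:
// scaledIndex(h, 2*C) equals either 2*scaledIndex(h, C) or 2*scaledIndex(h, C) + 1.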
Idea

[Figure: the keys stored in the old table are copied into the corresponding buckets of the new, larger table.] One thread can copy without synchronization.
More complex data structures
What else is worth understanding?

We've seen hash tables… What about node-based data structures? (That aren't just a single pointer like stacks, or two pointers like queues.) Singly-linked lists, doubly-linked lists, skip-lists, trees, tries, hash tries, …

New challenges:
Nodes get deleted when threads might be trying to work on them
Operations may require atomic changes to multiple nodes
Lock-free singly-linked lists: attempting to use CAS

Ordered set implemented with a singly-linked list; example list: head -> 7 -> 15 -> 20.

Delete(15): traverse the list, then CAS 7.next from 15 to 20.
Insert(17): traverse the list, create node 17, then CAS 15.next from 20 to 17.
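The CAS attempts on this slide can be written out as a short sketch (the node layout and names are assumptions, not the lecture's code); as the next slide shows, this CAS-only version is not actually safe under concurrency.

struct node { int key; node * volatile next; };

// Insert(17) after traversal has found pred = 15 and succ = 20:
bool tryInsert(node * pred, node * succ, int key) {
    node * n = new node();
    n->key = key;
    n->next = succ;
    // link n between pred and succ in one atomic step
    return CAS(&pred->next, succ, n);
}

// Delete(15) after traversal has found pred = 7 and curr = 15:
bool tryDelete(node * pred, node * curr) {
    node * succ = curr->next;
    // unlink curr by swinging pred's next pointer past it
    return CAS(&pred->next, curr, succ);
}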
The problem

What if the operations are concurrent?

Delete(15): pause just before the CAS of 7.next from 15 to 20.
Insert(17): traverse the list, create node 17, then CAS 15.next from 20 to 17.
Delete(15): resume and CAS 7.next from 15 to 20 (it still succeeds, since 7.next is still 15).

Erroneously deleted 17! Unlinking 15 also unlinked 17, which was only reachable through 15.
Solution: marking [Harris 2001]

Idea: prevent changes to nodes that will be deleted. Before deleting a node, mark its next pointer.

How does this fix the Insert(17), Delete(15) example?
Delete(15) marks 15.next before using CAS to delete 15.
Insert(17) cannot modify 15.next because it is marked (its CAS fails).

Okay. We can do lists.
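A minimal sketch of the marking idea (not Harris's exact algorithm): next pointers are stored as integers so their low bit can serve as the mark, and deletion marks the victim's next pointer before physically unlinking it. All names here are illustrative.

struct node { int key; volatile uintptr_t next; };   // low bit of next = mark
#define MARK 1ULL

// Delete curr (found after pred during traversal):
bool tryDelete(node * pred, node * curr) {
    uintptr_t succ = curr->next;
    if (succ & MARK) return false;                    // someone else is already deleting curr
    // step 1: mark curr's next pointer, freezing it against concurrent inserts
    if (!CAS(&curr->next, succ, succ | MARK)) return false;
    // step 2: physically unlink curr; if this CAS fails, a later traversal can
    //         observe the mark and finish the unlinking (helping)
    CAS(&pred->next, (uintptr_t) curr, succ);
    return true;
}

// Insert's CAS on pred->next uses the UNMARKED value as its expected value,
// so it fails (and retries) if pred has been marked for deletion in the meantime.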
What about removing several nodes?

Deleting consecutive nodes in a list… Delete(15 AND 20): mark 15, then mark 20? What can go wrong…
Or performing rotations in trees by replacing nodes…

[Figures: a list head -> 7 -> 15 -> 20 -> 27, and a tree rotation involving nodes/subtrees A, B, C, D.]
Or changing two pointers at once?

Doubly-linked list: Insert(17) into the example list 7 <-> 15 <-> 20.

If the two pointer changes are not atomic, insertions and deletions could happen between them. Example: after 15.next := 17, but before 20.prev := 17, someone inserts between 17 and 20 (see the sketch below).
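Writing the two pointer updates out makes the window explicit; a hedged sketch, assuming a doubly-linked node with prev/next fields (names illustrative):

struct node { int key; node * volatile prev; node * volatile next; };

// Insert n (key 17) between pred (15) and succ (20), with NO atomicity:
void insertBetween(node * pred, node * succ, node * n) {
    n->prev = pred;
    n->next = succ;
    pred->next = n;      // e.g., 15.next := 17  -- n is now reachable left-to-right
    // <-- another thread can insert or delete here, e.g., insert between 17 and 20,
    //     and its update to 20.prev is then overwritten by the next line
    succ->prev = n;      // e.g., 20.prev := 17  -- n only now reachable right-to-left
}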
Easy lock-based solution

Doubly-linked list: Insert(17) between pred = 15 and succ = 20.

Simplest locking discipline: never access anything without locking it first.
Correct, but at what cost? To respect the locking discipline, we have to lock while searching!
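Under this discipline, even contains() must hold locks while it walks the list. One standard way to do that is hand-over-hand locking (lock coupling); a hedged sketch, assuming per-node locks and a sentinel head node (names illustrative):

bool contains(int k) {
    node * pred = head;                // sentinel; head's key is ignored
    pred->lock();
    node * curr = pred->next;
    while (curr != NULL) {
        curr->lock();                  // acquire the next lock before releasing the previous one
        if (curr->key >= k) break;
        pred->unlock();
        pred = curr;
        curr = curr->next;
    }
    bool found = (curr != NULL && curr->key == k);
    if (curr != NULL) curr->unlock();
    pred->unlock();
    return found;
}

// Every node on the search path is locked at some point, which is exactly
// the cost the slide is pointing at.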
Can we search without locks?

Insert(k):
  Search without locking until we reach nodes pred & succ where pred.key < k <= succ.key
  If we found k, return false
  Lock pred, lock succ
  If pred.next != succ, unlock and retry
  Create new node n
  pred.next = n
  succ.prev = n
  Unlock all

Contains(k):
  pred = head; succ = head
  Loop:
    If succ == NULL or succ.key > k, return false
    If succ.key == k, return true
    succ = succ.next

Example: Insert(17) finds pred = 15 and succ = 20.

Where do we linearize? (A sketch of Insert(k) follows below.)
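A hedged C++-style sketch of this Insert(k), assuming sentinel head and tail nodes (so pred and succ always exist) and per-node locks; names and minor details are illustrative, not the lecture's exact code.

bool insert(int k) {
    while (true) {
        // search WITHOUT locking
        node * pred = head;
        node * succ = head->next;
        while (succ != tail && succ->key < k) { pred = succ; succ = succ->next; }
        if (succ != tail && succ->key == k) return false;    // k already present
        // lock, then validate that the search result is still accurate
        pred->lock(); succ->lock();
        if (pred->next != succ) {
            pred->unlock(); succ->unlock();
            continue;                                        // links changed underneath us: retry
        }
        node * n = new node(k);
        n->prev = pred; n->next = succ;
        pred->next = n;                                      // n becomes visible left-to-right
        succ->prev = n;                                      // ... and right-to-left
        pred->unlock(); succ->unlock();
        return true;
    }
}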
What if we have different types of searches?

Could imagine an application that wants a doubly-linked list in which:
Some threads can search left-to-right (containsLR)
Some threads can search right-to-left (containsRL)

When do we linearize insertions in such an algorithm?
Bi-directional searches really complicate linearization…

(Recall Insert(k) from the previous slide: search without locks until pred.key < k <= succ.key, lock pred and succ, validate pred.next == succ, create node n, then set pred.next = n and succ.prev = n. Example: Insert(17) with pred = 15 and succ = 20.)

Where should we linearize?
When we set pred.next? Suppose p is searching for 17 from left to right and q is searching for 17 from right to left: p sees 17 and returns true, but q does not, and q must see 17 if we've already linearized the insert.
When we set succ.prev? But then p shouldn't see 17 until succ.prev has been set…
Summary

Implementation of hash table insert with expansion
Making expansion more efficient (scaled index function)
Singly- and doubly-linked lists

Main challenges in node-based data structures:
Preventing changes to deleted nodes
Atomically modifying two or more variables (hard if you have searches that do not lock!)