Concurrent Search Structure Algorithms Dennis Shasha
What is a Search Structure? Data structure (typically a B tree, hash structure, R-tree, etc.) that supports a dictionary. Operations are insert key-value pair, delete key-value pair, and search for a key-value pair.
How to make a search structure algorithm concurrent? Naïve approach: use two phase locking (but then the root is at least read-locked so conflicts occur) Semi-naïve: use hierarchical tree locking: lock root; afterwards lock node n (Still tends to hold locks high in tree.)
How can we do better? Fundamental insight: In a search structure algorithm, all that we really care about is that we implement the dictionary operations correctly. Operations on structure need not even be serializable provided they maintain certain constraints.
Train your intuition: parable of the library Bob goes to a library (with books) and looks in the catalogue for a great puzzle book P. He is told it is on stack G. He walks towards G but sees a friend and they have a chat. Alice the librarian moves some books from G to H and leaves a note. She then changes the catalogue. Bob goes to G, sees the note, and finds P on stack H.
Observe: In no serial execution would Bob visit two stacks – if he had gone after Alice, he would have visited only H; if before, only G. But this is still ok. Why? Intuition: the search is always pointed towards a correct final position.
Using this for search structures KeySpace = the set of all possible keys (e.g. all possible integers) Inset of a node n = subset of KeySpace that is either in n, a node reachable from n or nowhere in the data structure. Outset of n towards n’ = The subset of KeySpace associated with the edge from n to n’. Depends on the data structure. Keyset of n = inset(n) – U outset(n,n’)
Binary search tree Root node is 20 Left child is 5 And maybe the left child has descendants. What is the inset of the left child? All values less than 20. Right child of 5 has value 13. What is the outset(5,13) and what is the inset(13)
Example Suppose the root of a binary search tree has the value 20 and a left child L and right child R. Inset(root) = KeySpace Outset(root,L) = {x| x < 20} Outset(root, R) = {x| x > 20} Keyset(root) = {20}
Let’s Return to the Library Suppose that, to start, the shelf G has as its inset all books with last name starting with “S”. Alice moves those between “S” and “Si” to shelf H and leaves a note to that effect. Then the keyset of G becomes {x | x begins with “S”} – {x|x <= “Si”} The keyset of G represents the set of books that are in G or nowhere in the structure.
The Key Invariants If x is in node n, then x is in keyset(n) The keysets partition the KeySpace. If the search for an item x is at node n, then x is in keyset(n) or there is a path from n to node m such that x is in keyset(m) and every edge along the path has x in its outset. The invariants assure that the search, if it terminates, will terminate in the correct place.
Application: link algorithm Recall splits in B trees. Split n into n and n’ and adjust the parent p to reflect the change. Here is how to do a split: lock(n), move the values from n to n’ as desired and leave a forward pointer to that effect, unlock(n), lock(p), and adjust p, then unlock(p). This is exactly the library scenario.
Application: give-up algorithm Instead of including a forwarding pointer, just asset explicitly the inset of each node. When splitting, reduce the asserted inset(n), denoted assertinset(n) and establish assertinset(n’). If a search for x arrives at n and x is not in assertinset(n), the search starts over. Happens rarely enough that performance is very good.
Application: multi-rooted structure Imagine a link B-tree structure with a root at the top of the tree and a root to the left of the leftmost leaf node. Same invariants hold and search can proceed from anywhere.
Conclusion Simple framework for all search structures. Handful of concepts: KeySpace, inset, outset, keyset. Performs well.