Combining HTM and RCU to Implement Highly Efficient Balanced Binary Search Trees Dimitrios Siakavaras, Konstantinos Nikas, Georgios Goumas and Nectarios.

Slides:



Advertisements
Similar presentations
Comp 122, Spring 2004 Binary Search Trees. btrees - 2 Comp 122, Spring 2004 Binary Trees  Recursive definition 1.An empty tree is a binary tree 2.A node.
Advertisements

Chapter 4: Trees Part II - AVL Tree
AVL Trees COL 106 Amit Kumar Shweta Agrawal Slide Courtesy : Douglas Wilhelm Harder, MMath, UWaterloo
AVL Trees Balancing. The AVL Tree An AVL tree is a balanced binary search tree. What does it mean for a tree to be balanced? It means that for every node.
Binary Trees, Binary Search Trees CMPS 2133 Spring 2008.
AA Trees another alternative to AVL trees. Balanced Binary Search Trees A Binary Search Tree (BST) of N nodes is balanced if height is in O(log N) A balanced.
10/31/20111 Relativistic Red-Black Trees Philip Howard 10/31/2011
Relativistic Red Black Trees. Relativistic Programming Concurrent reading and writing improves performance and scalability – concurrent readers may disagree.
Trevor Brown – University of Toronto B-slack trees: Space efficient B-trees.
Binary Trees, Binary Search Trees RIZWAN REHMAN CENTRE FOR COMPUTER STUDIES DIBRUGARH UNIVERSITY.
Chapter 13 B Advanced Implementations of Tables – Balanced BSTs.
1 Balanced Trees There are several ways to define balance Examples: –Force the subtrees of each node to have almost equal heights –Place upper and lower.
1 Trees 4: AVL Trees Section 4.4. Motivation When building a binary search tree, what type of trees would we like? Example: 3, 5, 8, 20, 18, 13, 22 2.
Lecture 11COMPSCI.220.FS.T Balancing an AVLTree Two mirror-symmetric pairs of cases to rebalance the tree if after the insertion of a new key to.
Range Queries in Non-blocking k-ary Search Trees Trevor Brown Hillel Avni.
AVL Trees AVL (Adel`son-Vel`skii and Landis) tree = – A BST – With the property: For every node, the heights of the left and right subtrees differ at most.
Priority Queues and Heaps Tom Przybylinski. Maps ● We have (key,value) pairs, called entries ● We want to store and find/remove arbitrary entries (random.
Balanced Search Trees 2-3 Trees AVL Trees Red-Black Trees
Lecture 23 Red Black Tree Chapter 10 of textbook
COMP261 Lecture 23 B Trees.
Algorithmic Improvements for Fast Concurrent Cuckoo Hashing
AA Trees.
CSCE 3110 Data Structures & Algorithm Analysis
BCA-II Data Structure Using C
CSCE 3110 Data Structures & Algorithm Analysis
Multiway Search Trees Data may not fit into main memory
Massively Concurrent Red-Black Trees with Hardware Transactional Memory Dimitrios Siakavaras, Konstantinos Nikas, Georgios Goumas and Nectarios Koziris.
Binary Search Trees A binary search tree is a binary tree
BST Trees
Lecture 15 AVL Trees Slides modified from © 2010 Goodrich, Tamassia & by Prof. Naveen Garg’s Lectures.
Balancing Binary Search Trees
UNIT III TREES.
Red Black Trees
Faster Data Structures in Transactional Memory using Three Paths
Binary Search Tree (BST)
CS202 - Fundamental Structures of Computer Science II
Extra: B+ Trees CS1: Java Programming Colorado State University
B+ Trees What are B+ Trees used for What is a B Tree What is a B+ Tree
CS202 - Fundamental Structures of Computer Science II
Binary Search Tree Chapter 10.
Height Balanced Trees CPSC 335 Dr. Marina Gavrilova Computer Science
Lecture 22 Binary Search Trees Chapter 10 of textbook
B+ Tree.
B-Trees Disk Storage What is a multiway tree? What is a B-tree?
AVL Trees "The voyage of discovery is not in seeking new landscapes but in having new eyes. " - Marcel Proust.
Lecture 25 Splay Tree Chapter 10 of textbook
Binary Trees, Binary Search Trees
Binary Tree and General Tree
Binary Search Trees.
Data Structures Lecture 4 AVL and WAVL Trees Haim Kaplan and Uri Zwick
Red-Black Trees 11/13/2018 2:07 AM AVL Trees v z AVL Trees.
TCSS 342, Winter 2006 Lecture Notes
Chapter 6 Transform and Conquer.
Concurrent Data Structures Concurrent Algorithms 2017
CS202 - Fundamental Structures of Computer Science II
B+ Trees What are B+ Trees used for What is a B Tree What is a B+ Tree
AVL Trees: AVL Trees: Balanced binary search tree
CS202 - Fundamental Structures of Computer Science II
v z Chapter 10 AVL Trees Acknowledgement: These slides are adapted from slides provided with Data Structures and Algorithms in C++, Goodrich,
B-Trees Disk Storage What is a multiway tree? What is a B-tree?
B-Trees Disk Storage What is a multiway tree? What is a B-tree?
CS202 - Fundamental Structures of Computer Science II
Red-Black Trees 2/24/ :17 AM AVL Trees v z AVL Trees.
Lecture 9: Self Balancing Trees
Richard Anderson Spring 2016
Red-Black Trees 5/19/2019 6:39 AM AVL Trees v z AVL Trees.
Chapter 12&13: Binary Search Trees (BSTs)
CS202 - Fundamental Structures of Computer Science II
CS202 - Fundamental Structures of Computer Science II
AVL Trees: AVL Trees: Balanced binary search tree
Presentation transcript:

Combining HTM and RCU to Implement Highly Efficient Balanced Binary Search Trees Dimitrios Siakavaras, Konstantinos Nikas, Georgios Goumas and Nectarios Koziris National Technical University of Athens (NTUA) School of Electrical and Computer Engineering (ECE) Computing Systems Laboratory (CSLab) {jimsiak,knikas,goumas,nkoziris}@cslab.ece.ntua.gr http://research.cslab.ece.ntua.gr Transact/WTTM 2017

Outline Binary Search Trees (BSTs) Concurrent BSTs RCU-HTM Experimental results Conclusions & Future work Siakavaras et. al cslab@ntua

Binary search trees Siakavaras et. al cslab@ntua

Binary Search Trees (BSTs) 8 3 10 1 6 14 4 7 13 A classic binary tree with an additional property: Nodes in left subtree have keys less than the key of the root, nodes in right subtree have keys greater than the root. Most commonly used to implement dictionaries: <key,value> pairs 3 operations: lookup(key), insert(key, value) delete(key) Siakavaras et. al cslab@ntua

Internal vs. External BSTs 8 8 3 10 3 10 1 6 10 14 1 6 14 1 3 6 8 Internal External Internal: <key,value> pairs in every node External: values only in leaves, internal nodes only contain keys. External trees simplify the delete() operation They require twice as much memory Longer traversal paths Siakavaras et. al cslab@ntua

Deletion in an Internal BST Deleting a node with one or zero children is easy Just change parent’s child pointer 8 Example: delete(10) 3 10 1 6 14 Siakavaras et. al cslab@ntua

Deletion in an Internal BST Deleting a node with one or zero children is easy Just change parent’s child pointer Deleting a node with two children is more complicated Need to find successor, swap keys and remove successor node Successor may be many links away 8 10 Example: delete(8) successor 3 10 1 6 14 Siakavaras et. al cslab@ntua

Deletion in an External BST Deletion is always simple 8 3 10 Example: delete(8) 1 6 10 14 1 3 6 8 Siakavaras et. al cslab@ntua

Unbalanced vs. Balanced BSTs 3 8 8 10 2 1 13 4 13 3 8 14 1 1 1 6 10 14 3 7 10 14 3 13 1 6 4 7 1 6 4 7 Unbalanced Tree Red-Black Tree AVL Tree Balanced trees limit the height of the tree (i.e., the length of maximum path) to provide bounded and predictable traversal times Rebalancing requires additional effort after insertions/deletions Siakavaras et. al cslab@ntua

Insertion in an Unbalanced BST traverse_bst(bst, key); int bst_insert(bst_t *bst, int key, void *value) { traverse_bst(bst, key); if (key was found) return 0; insert_node(bst, key, value); return 1; } insert_node(bst, key, value); 10 Example: bst_insert(key = 2) 8 14 3 13 1 6 2 4 7 Siakavaras et. al cslab@ntua

Insertion in a Balanced BST insert_node_and_rebalance(bbst, key, value); traverse_bbst(bbst, key); int bbst_insert(bbst_t *bst, int key, void *value) { traverse_bbst(bbst, key); if (key was found) return 0; insert_node_and_rebalance(bbst, key, value); return 1; } 3 8 3 8 Example: bst_insert(key = 2) 2 1 4 13 2 1 4 13 1 1 1 2 7 10 14 1 3 7 10 14 1 3 6 1 6 2 Siakavaras et. al cslab@ntua

Concurrent Binary search trees Siakavaras et. al cslab@ntua

Concurrent BSTs There are 2 challenges for concurrent internal and balanced BSTs: The deletion of a node with 2 children requires exclusive access to the whole path from the node to the successor. Rebalancing requires several modifications that need to be performed in a single atomic step. Proposed non-blocking and lock-based concurrent BSTs are either: Unbalanced [Natarajan PPoPP’14, Howley SPAA’12, Ellen PODC’10] Relaxed balanced [Bronson PPoPP’10, Drachsler PPoPP’14, Brown PPoPP’14] External [Natarajan PPoPP’14, Ellen PODC’10] Partially external [Bronson PPoPP’10] Siakavaras et. al cslab@ntua

Concurrent RCU-based BSTs Read-Copy-Update (RCU) Modifications are performed in copies and not in place. Copies are atomically installed in the shared data structure. Readers may proceed without any synchronization and without restarting Updaters need to be explicitly synchronized (most commonly only a single updater is allowed to operate) Single updater RCU tree: Multiple readers Single updater Citrus RCU tree [Arbel PODC’14]: Multiple updaters using fine-grain locks. Unbalanced tree to enable fine-grain locking Example: bst_insert(key = 2) 3 8 Old readers may still traverse old versions of nodes. New readers will see the new nodes. 2 1 4 13 1 1 2 3 7 10 14 Updaters can safely replace parts of the tree as only a single updater is allowed. 1’ 3’ 1 6 Siakavaras et. al cslab@ntua Siakavaras et. al cslab@ntua 14

Concurrent HTM-based BSTs Hardware Transactional Memory (HTM) Avoids STM’s huge overheads Allows the modification of multiple locations atomically → good fit for the rebalancing phase in a BBST HTM-based BSTs: Coarse-grained HTM (cg-htm): Each operation enclosed in a single transaction Easy to implement Large transactions (increased conflict probability) Consistency-Oblivious-Programming HTM (cop-htm) [Avni TRANSACT’14]: The traversal is performed outside the transaction The executed transaction includes 2 steps: Validate that the traversal ended at the correct node Insert/Delete the node and rebalance if necessary Shorter transactions than cg-htm Traversals (and consequently lookup operations) may need to restart Siakavaras et. al cslab@ntua

RCU-htm Siakavaras et. al cslab@ntua

Siakavaras et. al cslab@ntua RCU-HTM Combines RCU with HTM in an innovative way and provides trees with: Asynchronized traversals (thanks to RCU) Oblivious of concurrent updates in the tree No locks, no transactions or any other synchronization No restarts Concurrent updaters (thanks to HTM) All updates are performed in copies Modified copies are first validated and then installed in the tree An HTM transaction is used for the validation+installation phase HTM transaction includes several reads but only a single write → minimized conflict probability Siakavaras et. al cslab@ntua

RCU-HTM: insert operation Traverse the tree to locate the insertion point During traversal we maintain a stack of pointers to the traversed nodes Perform the insertion and rebalance using copies The reverse traversal uses the saved stack of pointers For each copied node we store the observed children pointers Validate the modified copy For each copied node check that children pointers haven’t been modified since we copied the node Also validate the access path followed during traversal Install the copy Change connection_point’s child 2 3 Steps 3 and 4 performed atomically inside an HTM transaction If the validation in step 3 fails we abort the transaction and restart the operation For the non-transactional fallback path we use a lock that allows only a single updater. 3’ 7 connection_point copy_root 3 ... 1 1 2 5’ 5 2’ 5’ copy_root 2 1 3’ 3 6 connection_point Example: insert(key = 1) copy_root 1 1 2’ 2 4 connection_point copy_root connection_point 1 Siakavaras et. al cslab@ntua

RCU-HTM: delete operation Similar to insert One difference: When we delete a node with two children we need to copy the whole path to its successor Siakavaras et. al cslab@ntua

Experimental results Siakavaras et. al cslab@ntua

Siakavaras et. al cslab@ntua Experimental Setup Intel Broadwell-EP Xeon E5-2699 v4 22 cores / 44 hyperthreads @ 2.2GHz 64 GB of RAM GCC 4.9.2, -O3 optimizations enabled Scalable memory allocator (jemalloc) No memory reclamation All threads pinned to hardware threads (hyperthreads enabled only at 44-threaded executions) Experiments: Threads run for 2 seconds, executing randomly chosen operations (lookups/inserts/deletes) 3 Workloads: 100%, 80% and 20% lookups, and the rest equally divide between insertions and deletions 3 tree sizes: 2K keys, 20K keys and 2M keys Siakavaras et. al cslab@ntua

Comparison with HTM-based approaches 2K keys 100% lookups Read-only workloads No conflict/capacity aborts → all HTM-based trees scale RCU-HTM is constantly better due to 2 reasons: In small trees the overhead of starting/ending transactions is visible in cg-htm and cop-htm. In large trees the transaction overhead is hidden but rcu-htm is faster because of the smaller size of its nodes (e.g., cop-htm also stores 3 more pointers: parent, prev, succ) 2M keys 100% lookups Siakavaras et. al cslab@ntua

Comparison with HTM-based approaches 2K keys 20% lookups Write-dominated workloads In small trees both cg-htm and cop-htm suffer from conflict aborts due to their larger transactions (see next slide). In large trees cop-htm also manages to avoid conflicts. 2M keys 20% lookups Siakavaras et. al cslab@ntua

Comparison with HTM-based approaches cg-htm cop-htm rcu-htm RCU-HTM executes much less transactions and suffers less aborts. Siakavaras et. al cslab@ntua

Comparison with RCU-based approaches 2K keys 100% lookups 2K keys 20% lookups avl-rcu-mrsw: writers synchronized using a single lock bst-citrus: unbalanced BST, RCU for readers, fine-grain locks for writers [Arbel PODC’14] Siakavaras et. al cslab@ntua

Comparison with state-of-the-art 2K keys 100% lookups 2K keys 20% lookups avl-lb: relaxed balance lock-based AVL tree [Bronson PPOPP’10] bst-lf: unbalanced lock-free (CAS-based) tree [Natarajan PPoPP’14] Siakavaras et. al cslab@ntua

Comparison with state-of-the-art 2M keys 100% lookups 2M keys 20% lookups Siakavaras et. al cslab@ntua

Conclusions & future work Siakavaras et. al cslab@ntua

Conclusions & Future Work RCU-HTM combines RCU with HTM and provides concurrent BSTs that are: Internal Strictly balanced Efficient both for readers and updaters Future work Memory reclamation Formal proof of correctness (linearizability) More BSTs (e.g., B+-trees, Splay trees, etc.) Siakavaras et. al cslab@ntua

THANK YOU! QUESTIONS? ACKNOWLEDGMENT Intel Corporation for kindly providing the Broadwell-EP server on which we executed our experiments. Siakavaras et. al cslab@ntua