Range Queries in Non-blocking k-ary Search Trees Trevor Brown Hillel Avni.


Problem Statement
● Want to store keys in a dynamic data structure supporting insertion, deletion, and:
  – RangeQuery(a, b): returns all keys of the data structure in the range [a, b]
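As a point of reference, the sequential meaning of RangeQuery(a, b) can be sketched on a plain sorted list. This is only a specification-style sketch: the function name `range_query` and the sorted-list representation are assumptions for illustration and are not the data structure from the talk.

```python
from bisect import bisect_left, bisect_right

def range_query(sorted_keys, a, b):
    """Return every key k with a <= k <= b from an ascending list of keys."""
    lo = bisect_left(sorted_keys, a)    # first index whose key is >= a
    hi = bisect_right(sorted_keys, b)   # first index whose key is > b
    return sorted_keys[lo:hi]

keys = [2, 3, 5, 8, 13, 25]
print(range_query(keys, 3, 14))  # [3, 5, 8, 13]
```

Both endpoints are inclusive, matching the closed interval [a, b] in the problem statement.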

Previous Solutions
● Software transactional memory
● Locks
● Persistence
[Figure: persistence example showing the root pointer and Insert(c)]

k-ary Search Tree (k-ST)
● Add or remove keys by replacing node(s)
● Related to persistent data structures
[Brown, Helga]

The Range Query Algorithm
RangeQuery(a, b):
  – Traverse the tree, skipping sub-trees which cannot contain a key in [a, b]
  – During this traversal, save a pointer to each leaf that contains a key in [a, b]
  – […]
  – Problem: how to efficiently tell if a key was added or removed during this traversal?

Extending the Data Structure
● Add a dirty bit to each leaf
● Each leaf has its dirty bit set just before it is replaced
  – Consequence: If a leaf's dirty bit is not set, then it has not been replaced
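The ordering above (set the dirty bit first, then replace the leaf) can be sketched as follows. This is a sequential illustration only: the class names `Leaf` and `Internal` and the helper `replace_leaf` are hypothetical, and a real non-blocking tree would carry out these steps with atomic operations rather than plain assignments.

```python
class Leaf:
    def __init__(self, keys):
        self.keys = tuple(keys)   # never modified after construction
        self.dirty = False        # set once, just before this leaf is replaced

class Internal:
    def __init__(self, routing_keys, children):
        self.routing_keys = list(routing_keys)
        self.children = list(children)

def replace_leaf(parent, index, new_keys):
    """Swap parent.children[index] for a fresh leaf holding new_keys."""
    old = parent.children[index]
    old.dirty = True                  # step 1: mark the old leaf as dirty
    new = Leaf(new_keys)              # new leaves start with a clear dirty bit
    parent.children[index] = new      # step 2: swing the child pointer
    return old, new
```

Because the bit is set before the pointer changes, a reader that still holds a pointer to the old leaf and finds its dirty bit clear can conclude that the leaf had not been replaced at that moment.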

The Range Query Algorithm
RangeQuery(a, b):
  – Traverse the tree, skipping sub-trees which cannot contain a key in [a, b]
  – During this traversal, save a pointer to each leaf that contains a key in [a, b]
  – After this traversal, check the dirty bits of these leaves, one by one
  – If no dirty bit is set, then return “the result”
  – Otherwise, retry
● Reading a dirty bit is far faster than re-traversing
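The collect, validate, and retry loop above might look roughly like the sketch below. It is single-threaded and makes several assumptions that are not from the slides: the class and function names, the routing-key convention stated in the comment, and the `on_traversal_done` hook, which exists only to simulate an update landing between the traversal and the dirty-bit check. For simplicity the sketch saves every leaf it reaches, which is a superset of the leaves that contain a key in [a, b].

```python
class Leaf:
    def __init__(self, keys):
        self.keys = tuple(sorted(keys))
        self.dirty = False

class Internal:
    # Assumed convention: children[i] holds keys k with
    # routing_keys[i-1] <= k < routing_keys[i].
    def __init__(self, routing_keys, children):
        self.routing_keys = list(routing_keys)
        self.children = list(children)

def collect_leaves(node, a, b, out):
    """Save each reachable leaf, skipping sub-trees that cannot hold a key in [a, b]."""
    if isinstance(node, Leaf):
        out.append(node)
        return
    bounds = [float("-inf")] + node.routing_keys + [float("inf")]
    for i, child in enumerate(node.children):
        lo, hi = bounds[i], bounds[i + 1]      # child covers [lo, hi)
        if lo <= b and a < hi:                 # intervals overlap
            collect_leaves(child, a, b, out)

def range_query(root, a, b, on_traversal_done=None):
    while True:
        leaves = []
        collect_leaves(root, a, b, leaves)
        if on_traversal_done is not None:
            on_traversal_done()                # simulated concurrent update
            on_traversal_done = None           # fire only once
        if not any(leaf.dirty for leaf in leaves):
            return [k for leaf in leaves for k in leaf.keys if a <= k <= b]
        # A saved leaf was replaced during the traversal: retry from the root.

root = Internal([8], [Leaf([2, 5]), Leaf([8, 13, 25])])

def insert_3():
    root.children[0].dirty = True              # mark the old leaf first
    root.children[0] = Leaf([2, 3, 5])         # then replace it

print(range_query(root, 3, 14, insert_3))      # [3, 5, 8, 13]
```

In the usage example the first attempt finds a dirty leaf and is discarded; the second attempt sees the replacement leaf and returns a result that includes the inserted key 3.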

Example: RangeQuery(3, 14)
[Figure: example tree, the saved leaf pointers, and a concurrent Insert(3)]
RangeQuery sees 4 is dirty… Retry!

Retrying RangeQuery(3, 14)
[Figure: example tree and the saved leaf pointers]
Success! Return the result…

Ctrie
● Taking a “snapshot”: atomically replace root
● Old tree no longer changes
● Future searches and updates copy nodes from the old tree
[Prokopec, Bronson, Bagwell, Odersky]
[Figure: Ctrie with its root pointer]

When our Algorithm is Good
● When workloads contain range queries over small ranges (i.e., where snapshots are bad)
  – Example: database applications such as an airline database of flights

When it might not be
● Very large ranges increase the chance that a range query will have to retry
  – Our experiments explore how much this matters
  – In extreme cases Ctrie or Snap might be better

Experiment: compare performance of
● k-ST: k=16, 32, 64
● Snap
● Ctrie
● Java’s Concurrent Skip List (SL)
  – NOT LINEARIZABLE!

Experiment
● Throughput vs. number of concurrent threads
● Each thread repeatedly chooses a random operation (Search, Insert, Delete, RangeQuery) with arguments chosen uniformly at random in [0, 10^6)
● Each experiment ran with a fixed amount of memory, for a fixed, sufficiently long amount of time
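The per-thread operation choice described above can be sketched as a small generator. Everything named here is hypothetical (`next_operation`, `KEY_RANGE`, the `mix` dictionary), and the way a range query's endpoints are derived (a uniform lower endpoint plus a fixed size) is an assumption; the slides give only the operation mix and the query size.

```python
import random

KEY_RANGE = 10**6  # arguments are drawn uniformly from [0, 10^6)

def next_operation(rng, mix, range_size):
    """Pick one operation according to mix (name -> weight) with random arguments."""
    names = list(mix)
    op = rng.choices(names, weights=[mix[n] for n in names])[0]
    key = rng.randrange(KEY_RANGE)
    if op == "range_query":
        # Assumption: the query covers range_size keys starting at a random key.
        return op, key, key + range_size
    return op, key

# Mix taken from the "many queries with small ranges" workload.
mix = {"search": 50, "insert": 5, "delete": 5, "range_query": 40}
rng = random.Random(0)
for _ in range(5):
    print(next_operation(rng, mix, 100))
```

Seeding the generator makes a run repeatable, which helps when comparing data structures on the same sequence of operations.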

Hardware
● Intel 4-chip, 40-core, 80-thread
● Sun 2-chip, 16-core, 128-thread

Many queries with small ranges
Workload: 50% search, 5% insert, 5% delete, 40% range query (size 100)
[Plot: throughput (millions) versus number of threads for Snap, Ctrie, 16-ST, 32-ST, 64-ST, and SL]

Many queries with bigger ranges
Workload: 50% search, 5% insert, 5% delete, 40% range query (size 10,000)
[Plot: throughput (hundred thousands) versus number of threads for Snap, Ctrie, 16-ST, 32-ST, 64-ST, and SL]

Few queries (with small ranges)
Workload: 59% search, 20% insert, 20% delete, 1% range query (size 100)
[Plot: throughput (ten millions) versus number of threads for Snap, Ctrie, 16-ST, 32-ST, 64-ST, and SL]

Throughput versus P(range query)
[Plot: throughput (ten millions) versus probability of range query for Snap, Ctrie, KST16, KST32, KST64, and SL; annotation: 1:10000 operations is RQ]

Throughput versus arity
[Plot: throughput (ten millions) versus degree of tree; series labels as extracted: 5i-5d-40r-size, i-20d-1r-size, i-5d-40r-size100, 20i-20d-1r-size100]

Conclusion
● Provably correct algorithm
● Searches can ignore concurrent updates
● Although dirty bits invalidate range queries, they do not invalidate searches
● Range queries are invisible
● No CAS, don’t change data structure
● Avoids excessive duplication of nodes
● Appears to be practical when workloads contain queries over small ranges

Future work
● Adding balance
  – (a, b)-tree, chromatic tree, relaxed AVL tree
● Wait-freedom?