Range Queries in Non-blocking k-ary Search Trees Trevor Brown Hillel Avni.


1 Range Queries in Non-blocking k-ary Search Trees Trevor Brown Hillel Avni

2 Problem Statement ● Want to store keys in a dynamic data structure supporting insertion, deletion, and: ● RangeQuery(a, b): returns all keys of the data structure in range [a, b]
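As a point of reference, the interface on this slide can be written down as a sequential (single-threaded) model. This is only a specification sketch built on `java.util.TreeSet`, not the concurrent structure from the talk; the class name `SequentialRangeSet` is made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

// Sequential reference model of the interface (NOT thread-safe, NOT the k-ST).
class SequentialRangeSet {
    private final TreeSet<Integer> keys = new TreeSet<>();

    boolean insert(int key) { return keys.add(key); }

    boolean delete(int key) { return keys.remove(key); }

    // Returns every stored key k with a <= k <= b, in ascending order.
    List<Integer> rangeQuery(int a, int b) {
        if (a > b) return new ArrayList<>();          // empty range
        return new ArrayList<>(keys.subSet(a, true, b, true));
    }
}
```

A concurrent implementation is expected to behave as if each RangeQuery ran atomically against such a model at some instant during the call.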

3 Previous Solutions ● Software transactional memory ● Locks ● Persistence [Figure: tree with nodes a, b, d, e, f and a root pointer; Insert(c) adds new nodes labeled e, b, d, c]
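The persistence approach can be illustrated with path copying on an immutable binary search tree. This is a minimal sketch that assumes path copying is what the slide's figure depicts; `PersistentBst`, `Node`, and `snapshot` are hypothetical names, and the tree is unbalanced for brevity.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicReference;

// Persistent (immutable) unbalanced BST updated by path copying.
class PersistentBst {
    static final class Node {
        final int key;
        final Node left, right;
        Node(int key, Node left, Node right) { this.key = key; this.left = left; this.right = right; }
    }

    private final AtomicReference<Node> root = new AtomicReference<>(null);

    Node snapshot() { return root.get(); }  // an old root stays valid forever

    // Copies only the nodes on the search path; every other subtree is shared.
    private static Node insert(Node n, int key) {
        if (n == null) return new Node(key, null, null);
        if (key < n.key) {
            Node l = insert(n.left, key);
            return l == n.left ? n : new Node(n.key, l, n.right);
        }
        if (key > n.key) {
            Node r = insert(n.right, key);
            return r == n.right ? n : new Node(n.key, n.left, r);
        }
        return n; // key already present: nothing to copy
    }

    void insert(int key) {
        while (true) {
            Node old = root.get();
            Node updated = insert(old, key);
            // Publish the new version by swinging the root pointer.
            if (updated == old || root.compareAndSet(old, updated)) return;
        }
    }

    // Reads the root once, then walks a version that can no longer change.
    List<Integer> rangeQuery(int a, int b) {
        List<Integer> out = new ArrayList<>();
        collect(root.get(), a, b, out);
        return out;
    }

    private static void collect(Node n, int a, int b, List<Integer> out) {
        if (n == null) return;
        if (a < n.key) collect(n.left, a, b, out);
        if (a <= n.key && n.key <= b) out.add(n.key);
        if (n.key < b) collect(n.right, a, b, out);
    }
}
```

Range queries are trivially consistent here, but every update copies a whole root-to-leaf path and all updates contend on the single root pointer.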

4 k-ary Search Tree (k-ST) ● Add or remove keys by replacing node(s) ● Related to persistent data structures [Brown, Helga]

5 The Range Query Algorithm RangeQuery(a, b): – Traverse the tree, skipping sub-trees which cannot contain a key in [a, b] – During this traversal, save a pointer to each leaf that contains a key in [a, b] – […] – Problem: how to efficiently tell if a key was added or removed during this traversal?

6 Extending the Data Structure ● Add a dirty bit to each leaf ● Each leaf has its dirty bit set just before it is replaced – Consequence: If a leaf's dirty bit is not set, then it has not been replaced
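The dirty-bit idea might look roughly like the following. This is a simplified sketch, not the authors' non-blocking update protocol: `Slot` is a hypothetical stand-in for a parent's child pointer, and a lock serializes updaters so that a leaf marked dirty is always actually replaced.

```java
import java.util.concurrent.atomic.AtomicReference;

class DirtyBitDemo {
    static final class Leaf {
        final int[] keys;                // never modified after construction
        volatile boolean dirty = false;  // set once, just before the leaf is replaced
        Leaf(int... keys) { this.keys = keys.clone(); }
    }

    // Hypothetical stand-in for one child pointer of an internal node.
    static final class Slot {
        final AtomicReference<Leaf> child;
        Slot(Leaf initial) { child = new AtomicReference<>(initial); }

        // Assumption: updaters are serialized by this lock; the real structure
        // coordinates updaters without locks.
        synchronized void replace(Leaf replacement) {
            Leaf old = child.get();
            old.dirty = true;        // mark first ...
            child.set(replacement);  // ... then swing the pointer
        }
    }
}
```

Because the bit is written before the pointer changes, a reader that sees `dirty == false` on a leaf knows the leaf had not been replaced at the moment of that read.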

7 The Range Query Algorithm RangeQuery(a, b): – Traverse the tree, skipping sub-trees which cannot contain a key in [a, b] – During this traversal, save a pointer to each leaf that contains a key in [a, b] – After this traversal, check the dirty bits of these leaves, one by one – If no dirty bit is set, then return “the result” – Otherwise, retry ● Reading dirty bit is far faster than re-traversing
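The collect, validate, retry loop above can be sketched as follows. The node layout, the routing-key convention, and the lock-based `replaceLeaf` helper are assumptions made for illustration; the root is assumed never to be replaced. The sketch saves every leaf it reaches in a subtree that was not skipped, which is a conservative reading of "each leaf that contains a key in [a, b]".

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicReferenceArray;

class DirtyBitRangeQuery {
    interface Node {}

    static final class Leaf implements Node {
        final int[] keys;          // sorted, immutable
        volatile boolean dirty;    // set just before the leaf is replaced
        Leaf(int... keys) { this.keys = keys.clone(); }
    }

    static final class Internal implements Node {
        // Assumed convention: child i holds keys in [routing[i-1], routing[i]).
        final int[] routing;
        final AtomicReferenceArray<Node> children;
        Internal(int[] routing, Node... children) {
            this.routing = routing.clone();
            this.children = new AtomicReferenceArray<>(children);
        }
    }

    // Traversal: skip subtrees that cannot intersect [a, b], save the leaves reached.
    static void collect(Node n, int a, int b, List<Leaf> saved) {
        if (n instanceof Leaf) { saved.add((Leaf) n); return; }
        Internal in = (Internal) n;
        int k = in.children.length();
        for (int i = 0; i < k; i++) {
            boolean below = i < k - 1 && in.routing[i] <= a;  // all keys < routing[i] <= a
            boolean above = i > 0 && in.routing[i - 1] > b;   // all keys >= routing[i-1] > b
            if (!below && !above) collect(in.children.get(i), a, b, saved);
        }
    }

    // Validation: one volatile read per saved leaf, no re-traversal.
    static boolean allClean(List<Leaf> saved) {
        for (Leaf leaf : saved) if (leaf.dirty) return false;
        return true;
    }

    static List<Integer> rangeQuery(Internal root, int a, int b) {
        while (true) {
            List<Leaf> saved = new ArrayList<>();
            collect(root, a, b, saved);
            if (!allClean(saved)) continue;   // some saved leaf was replaced: retry
            List<Integer> out = new ArrayList<>();
            for (Leaf leaf : saved)
                for (int key : leaf.keys)
                    if (a <= key && key <= b) out.add(key);
            return out;
        }
    }

    // Stand-in for an update (assumption: updaters are serialized by a lock).
    static synchronized void replaceLeaf(Internal parent, int i, Node replacement) {
        Node old = parent.children.get(i);
        if (old instanceof Leaf) ((Leaf) old).dirty = true;
        parent.children.set(i, replacement);
    }
}
```

The query itself writes nothing to shared memory; it only reads child pointers and dirty bits.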

8 Example: RangeQuery(3, 14) [Figure: k-ary search tree with root keys 8, 13, 25, showing the leaf pointers saved by the range query] Concurrent Insert(3): RangeQuery sees that leaf 4 is dirty… Retry!

9 Retrying RangeQuery(3, 14) [Figure: the same tree with root keys 8, 13, 25 after Insert(3), showing the newly saved leaf pointers] Success! Return the result…

10 Ctrie ● Taking a “snapshot”: atomically replace root ● Old tree no longer changes ● Future searches and updates copy nodes from the old tree [Prokopec, Bronson, Bagwell, Odersky] [Figure: trie reached through a root pointer]

11 When our Algorithm is Good ● When workloads contain range queries over small ranges (i.e., where snapshots are bad) – Example: database applications such as airline database of flights When it might not be ● Very large ranges increase the chance that a range query will have to retry – Our experiments explore how much this matters – In extreme cases Ctrie or Snap might be better

12 Experiment: compare performance of ● k-ST: k=16, 32, 64 ● Snap ● Ctrie ● Java’s Concurrent Skip List (SL) – NOT LINEARIZABLE!

13 Experiment ● Throughput vs. number of concurrent threads ● Each thread repeatedly chooses a random operation (Search, Insert, Delete, RangeQuery) with arguments chosen uniformly randomly in [0, 10^6) ● Each experiment ran with a fixed amount of memory, for a fixed, sufficiently long amount of time
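A workload of this shape could be driven by a harness along the following lines. This is a rough sketch, not the authors' benchmark code: it uses Java's `ConcurrentSkipListSet` as the structure under test, assumes a range query of size s covers [key, key + s), and omits details such as prefilling and the fixed memory budget. All names are illustrative.

```java
import java.util.concurrent.ConcurrentSkipListSet;
import java.util.concurrent.ThreadLocalRandom;
import java.util.concurrent.atomic.LongAdder;

class WorkloadSketch {
    enum Op { SEARCH, INSERT, DELETE, RANGE_QUERY }

    static final int KEY_RANGE = 1_000_000; // keys drawn uniformly from [0, 10^6)

    // Maps u in [0, 1) to an operation, given the percentages of the mix.
    static Op choose(double u, int insertPct, int deletePct, int rangePct) {
        double p = u * 100.0;
        if (p < insertPct) return Op.INSERT;
        if (p < insertPct + deletePct) return Op.DELETE;
        if (p < insertPct + deletePct + rangePct) return Op.RANGE_QUERY;
        return Op.SEARCH;
    }

    // Runs `threads` workers for about `millis` ms; returns the total operation count.
    static long run(int threads, long millis, int insertPct, int deletePct,
                    int rangePct, int rangeSize) {
        ConcurrentSkipListSet<Integer> set = new ConcurrentSkipListSet<>();
        LongAdder ops = new LongAdder();
        long deadline = System.nanoTime() + millis * 1_000_000L;
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                ThreadLocalRandom rnd = ThreadLocalRandom.current();
                while (deadline - System.nanoTime() > 0) {
                    int key = rnd.nextInt(KEY_RANGE);
                    switch (choose(rnd.nextDouble(), insertPct, deletePct, rangePct)) {
                        case INSERT: set.add(key); break;
                        case DELETE: set.remove(key); break;
                        case RANGE_QUERY:
                            // Weakly consistent iteration: not an atomic snapshot.
                            for (Integer ignored : set.subSet(key, true, key + rangeSize, false)) { }
                            break;
                        default: set.contains(key);
                    }
                    ops.increment();
                }
            });
            workers[t].start();
        }
        for (Thread w : workers) {
            try { w.join(); } catch (InterruptedException e) { Thread.currentThread().interrupt(); }
        }
        return ops.sum();
    }
}
```

Throughput is then the returned count divided by the run time, for example `run(8, 5000, 5, 5, 40, 100)` for the mix of 50% search, 5% insert, 5% delete, and 40% range queries of size 100.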

14 Hardware ● Intel 4-chip, 40-core, 80-thread ● Sun 2-chip, 16-core, 128-thread

15 Many queries with small ranges [Plot: throughput (millions) vs. number of threads; series: Snap, Ctrie, 16-ST, 32-ST, 64-ST, SL] Workload: 50% search, 5% insert, 5% delete, 40% range query of size 100

16 Many queries with bigger ranges [Plot: throughput (hundred thousands) vs. number of threads; series: Snap, Ctrie, 16-ST, 32-ST, 64-ST, SL] Workload: 50% search, 5% insert, 5% delete, 40% range query of size 10,000

17 Few queries (with small ranges) [Plot: throughput (ten millions) vs. number of threads; series: Snap, Ctrie, 16-ST, 32-ST, 64-ST, SL] Workload: 59% search, 20% insert, 20% delete, 1% range query of size 100

18 Throughput versus P(range query) [Plot: throughput (ten millions) vs. probability of range query; series: Snap, Ctrie, KST16, KST32, KST64, SL] Annotation: 1 in 10,000 operations is a range query

19 Throughput versus arity [Plot: throughput (ten millions) vs. degree of tree; series: 5i-5d-40r-size10000, 20i-20d-1r-size10000, 5i-5d-40r-size100, 20i-20d-1r-size100]

20 Conclusion ● Provably correct algorithm ● Searches can ignore concurrent updates – Although dirty bits invalidate range queries, they do not invalidate searches ● Range queries are invisible – No CAS, don’t change data structure ● Avoids excessive duplication of nodes ● Appears to be practical when workloads contain queries over small ranges

21 Future work ● Adding balance – (a, b)-tree, chromatic tree, relaxed AVL tree ● Wait-freedom?

