Trevor Brown – University of Toronto B-slack trees: Space efficient B-trees.

Slides:



Advertisements
Similar presentations
Chapter 4: Trees Part II - AVL Tree
Advertisements

I/O-Algorithms Lars Arge Fall 2014 September 25, 2014.
Quick Review of Apr 10 material B+-Tree File Organization –similar to B+-tree index –leaf nodes store records, not pointers to records stored in an original.
B+-Trees (PART 1) What is a B+ tree? Why B+ trees? Searching a B+ tree
Multiversion Access Methods - Temporal Indexing. Basics A data structure is called : Ephemeral: updates create a new version and the old version cannot.
I/O-Algorithms Lars Arge Aarhus University February 27, 2007.
CSE332: Data Abstractions Lecture 10: More B-Trees Tyler Robison Summer
I/O-Algorithms Lars Arge Spring 2011 March 8, 2011.
CSE332: Data Abstractions Lecture 9: B Trees Dan Grossman Spring 2010.
CSE332: Data Abstractions Lecture 9: BTrees Tyler Robison Summer
1 B trees Nodes have more than 2 children Each internal node has between k and 2k children and between k-1 and 2k-1 keys A leaf has between k-1 and 2k-1.
I/O-Algorithms Lars Arge Aarhus University February 13, 2007.
I/O-Algorithms Lars Arge Spring 2009 February 2, 2009.
6/14/2015 6:48 AM(2,4) Trees /14/2015 6:48 AM(2,4) Trees2 Outline and Reading Multi-way search tree (§3.3.1) Definition Search (2,4)
Advanced Tree Data Structures Nelson Padua-Perez Chau-Wen Tseng Department of Computer Science University of Maryland, College Park.
I/O-Algorithms Lars Arge University of Aarhus February 13, 2005.
I/O-Algorithms Lars Arge University of Aarhus March 1, 2005.
Temporal Indexing MVBT. Temporal Indexing Transaction time databases : update the last version, query all versions Queries: “Find all employees that worked.
I/O-Algorithms Lars Arge Spring 2009 March 3, 2009.
BST Data Structure A BST node contains: A BST contains
Tirgul 8 Universal Hashing Remarks on Programming Exercise 1 Solution to question 2 in theoretical homework 2.
B-Trees Disk Storage What is a multiway tree? What is a B-tree?
I/O-Algorithms Lars Arge Aarhus University February 14, 2008.
Trees and Red-Black Trees Gordon College Prof. Brinton.
B + -Trees (Part 1) Lecture 20 COMP171 Fall 2006.
1 Database indices Database Systems manage very large amounts of data. –Examples: student database for NWU Social Security database To facilitate queries,
B + -Trees (Part 1). Motivation AVL tree with N nodes is an excellent data structure for searching, indexing, etc. –The Big-Oh analysis shows most operations.
© 2004 Goodrich, Tamassia (2,4) Trees
B + -Trees (Part 1) COMP171. Slide 2 Main and secondary memories  Secondary storage device is much, much slower than the main RAM  Pages and blocks.
CSE 326: Data Structures B-Trees Ben Lerner Summer 2007.
CSE 326: Data Structures Lecture #13 Extendible Hashing and Splay Trees Alon Halevy Spring Quarter 2001.
General Trees and Variants CPSC 335. General Trees and transformation to binary trees B-tree variants: B*, B+, prefix B+ 2-4, Horizontal-vertical, Red-black.
B-Trees and B+-Trees Disk Storage What is a multiway tree?
(B+-Trees, that is) Steve Wolfman 2014W1
1 Database Tuning Rasmus Pagh and S. Srinivasa Rao IT University of Copenhagen Spring 2007 February 8, 2007 Tree Indexes Lecture based on [RG, Chapter.
B + -Trees COMP171 Fall AVL Trees / Slide 2 Dictionary for Secondary storage * The AVL tree is an excellent dictionary structure when the entire.
Advanced Data Structures and Algorithms COSC-600 Lecture presentation-6.
CSC 213 – Large Scale Programming. Today’s Goals  Review a new search tree algorithm is needed  What real-world problems occur with old tree?  Why.
CPSC 335 BTrees Dr. Marina Gavrilova Computer Science University of Calgary Canada.
Indexing. Goals: Store large files Support multiple search keys Support efficient insert, delete, and range queries.
B-Tree. B-Trees a specialized multi-way tree designed especially for use on disk In a B-tree each node may contain a large number of keys. The number.
Comp 249 Programming Methodology Chapter 15 Linked Data Structure - Part B Dr. Aiman Hanna Department of Computer Science & Software Engineering Concordia.
1 B Trees - Motivation Recall our discussion on AVL-trees –The maximum height of an AVL-tree with n-nodes is log 2 (n) since the branching factor (degree,
1 CPS216: Advanced Database Systems Notes 04: Operators for Data Access Shivnath Babu.
1 Road Map Associative Container Impl. Unordered ACs Hashing Collision Resolution Collision Resolution Open Addressing Open Addressing Separate Chaining.
CSE AU B-Trees1 B-Trees CSE 373 Data Structures.
1 B-Trees & (a,b)-Trees CS 6310: Advanced Data Structures Western Michigan University Presented by: Lawrence Kalisz.
Sorting with Heaps Observation: Removal of the largest item from a heap can be performed in O(log n) time Another observation: Nodes are removed in order.
Binary Trees, Binary Search Trees RIZWAN REHMAN CENTRE FOR COMPUTER STUDIES DIBRUGARH UNIVERSITY.
B-Trees And B+-Trees Jay Yim CS 157B Dr. Lee.
Balanced Search Trees Fundamental Data Structures and Algorithms Margaret Reid-Miller 3 February 2005.
B-Trees and Red Black Trees. Binary Trees B Trees spread data all over – Fine for memory – Bad on disks.
COSC 2007 Data Structures II Chapter 15 External Methods.
12.1 Chapter 12: Indexing and Hashing Spring 2009 Sections , , Problems , 12.7, 12.8, 12.13, 12.15,
1 CSE 326: Data Structures: Hash Tables Lecture 12: Monday, Feb 3, 2003.
B + -Trees. Motivation An AVL tree with N nodes is an excellent data structure for searching, indexing, etc. The Big-Oh analysis shows that most operations.
Starting at Binary Trees
A Study of Balanced Search Trees: Brainstorming a New Balanced Search Tree Anthony Kim, 2005 Computer Systems Research.
Symbol Tables and Search Trees CSE 2320 – Algorithms and Data Structures Vassilis Athitsos University of Texas at Arlington 1.
1 Tree Indexing (1) Linear index is poor for insertion/deletion. Tree index can efficiently support all desired operations: –Insert/delete –Multiple search.
1 Searching Searching in a sorted linked list takes linear time in the worst and average case. Searching in a sorted array takes logarithmic time in the.
Fall 2006 CSC311: Data Structures 1 Chapter 10: Search Trees Objectives: Binary Search Trees: Search, update, and implementation AVL Trees: Properties.
Lecture 11COMPSCI.220.FS.T Balancing an AVLTree Two mirror-symmetric pairs of cases to rebalance the tree if after the insertion of a new key to.
3.1. Binary Search Trees   . Ordered Dictionaries Keys are assumed to come from a total order. Old operations: insert, delete, find, …
Internal and External Sorting External Searching
Exam 3 Review Data structures covered: –Hashing and Extensible hashing –Priority queues and binary heaps –Skip lists –B-Tree –Disjoint sets For each of.
1 Binary Search Trees   . 2 Ordered Dictionaries Keys are assumed to come from a total order. New operations: closestKeyBefore(k) closestElemBefore(k)
B-Trees Katherine Gurdziel 252a-ba. Outline What are b-trees? How does the algorithm work? –Insertion –Deletion Complexity What are b-trees used for?
ITEC 2620M Introduction to Data Structures Instructor: Prof. Z. Yang Course Website: ec2620m.htm Office: TEL 3049.
Multiway Search Trees Data may not fit into main memory
Presentation transcript:

Trevor Brown – University of Toronto B-slack trees: Space efficient B-trees

Problem  Design an embedded device that implements a dictionary  Element = key & value  Operations  Search  Insert  Delete

 Goals:  Predictable running time for searches  Minimize amount of memory needed  Model:  Memory is allocated in blocks of one fixed size (e.g., 32 words)  In one cycle, can load and analyze a block  Pointers, keys and values each fit in one word

Naïve solutions  Sorted array  2n space (optimal)  search takes log 2 n loads  updates take ϴ (n) loads/stores  Hash table with linear probing  only expected running time  usually wastes 25-50% of space to avoid collisions k1k1 k2k2 k3k3 k4k4 k5k5 k6k6 k7k7 v1v1 v2v2 v3v3 v4v4 v5v5 v6v6 v7v7 k8k8 v8v8

Naïve solutions  Balanced BST  searches take log 2 n loads  updates take ϴ (log 2 n) loads/stores  but 50% of space is used for pointers d, v d b, v b f, v f a, v a c, v c e, v e g, v g

B-trees  All leaves have same depth, ϴ (log b n)  Root is a leaf or has between 2 and b children  Every non-root leaf contains between b/2 and b keys  Every non-root internal node has between b/2 and b children b = 8 iqy---- abcefghjklmnoprstuvwxzABCDF-

B-trees  Worst case:  Root has degree 2  Other internal nodes have b/2 children  Each leaf contains b/2 keys  ~50% of space is unused b = 8 iqw---- bdf----kmo----suv----wxz----

Leaf-oriented trees  All elements (keys & values) are stored in leaves  Internal nodes store pointers and routing keys, which direct searches to the correct leaf  Leaves store no pointers  Internal nodes store no values

Nodes in a leaf oriented B-tree  Leaf node  k i is a key, v i is its associated value  Internal node  p i is a child pointer k1k1 k2k2 k3k3 k4k4 k5k5 k6k6 k7k7 v1v1 v2v2 v3v3 v4v4 v5v5 v6v6 v7v7 k8k8 v8v8 16 words k1k1 k2k2 k3k3 k4k4 k5k5 k6k6 k7k7 p1p1 p2p2 p3p3 p4p4 p5p5 p6p6 p7p7 p8p8 16 words (1 wasted) b=8

Analysis of space complexity  Number of words of memory needed to store a dictionary containing n elements  These results are from the analysis

Analysis of space complexity  Number of words of memory needed to store a dictionary containing n elements  These results are from the analysis Root has degree 2 Every other internal node has degree b/2 Every leaf has b/2 keys Root has degree 2 Every other internal node has degree b/2 Every leaf has b/2 keys

Space efficient B-tree variants  Paper discusses many variants  B*-trees, Generalized B*-trees, B+trees with partial expansions, strongly dense multiway trees, compact B- trees, overflow trees, H-trees  Big problems:

Space efficient B-tree variants  Paper discusses many variants  B*-trees, Generalized B*-trees, B+trees with partial expansions, strongly dense multiway trees, compact B- trees, overflow trees, H-trees  Big problems: no deletion

Space efficient B-tree variants  Paper discusses many variants  B*-trees, Generalized B*-trees, B+trees with partial expansions, strongly dense multiway trees, compact B- trees, overflow trees, H-trees  Big problems: no deletion, multiple node sizes

Overflow trees  Satisfies B-tree properties  The leaves under each parent are partitioned into one or more groups  Each group gets one overflow leaf that contains between 0 and b keys/pointers  Every non-overflow leaf contains at least b-3 keys  Family with poor space complexity:  Root has degree 2, every other internal node has degree b/2, and every leaf contains b-3 keys, with one empty overflow node per group of b/2 leaves

Overflow trees  Satisfies B-tree properties  The leaves under each parent are partitioned into one or more groups  Each group gets one overflow leaf that contains between 0 and b keys/pointers  Every non-overflow leaf contains at least b-3 keys  Family with poor space complexity:  Root has degree 2, every other internal node has degree b/2, and every leaf contains b-3 keys, with one empty overflow node per group of b/2 leaves

H-trees  Satisfies B-tree properties  Each leaf contains at least b-3 keys  Each internal node has 0 grandchildren or at least b 2 /2 grandchildren  Family with poor space complexity:  Root has degree 2, every non-root internal node has degree b/ √ 2, and every leaf contains b-3 keys

H-trees  Satisfies B-tree properties  Each leaf contains at least b-3 keys  Each internal node has 0 grandchildren or at least b 2 /2 grandchildren  Family with poor space complexity:  Root has degree 2, every non-root internal node has degree b/ √ 2, and every leaf contains b-3 keys

B-slack trees (where b > 4)  P1: All leaves have the same depth  P2: Internal nodes have between 2 and b children  P3: Leaves contain between 0 and b keys  P4: a constraint on slack

Slack in a node  Leaf node  Slack is the number of unused spaces for keys  Internal node  Slack is the number of unused pointers k1k1 k2k2 k3k3 v1v1 v2v2 v3v3 5 slack k1k1 k2k2 k3k3 k4k4 k5k5 p1p1 p2p2 p3p3 p4p4 p5p5 p6p6 2 slack

B-slack trees (where b > 4)  P1: All leaves have the same depth  P2: Internal nodes have between 2 and b children  P3: Leaves contain between 0 and b keys  P4: For each internal node u, the total slack contained in the children of u is at most b – 1  P4 distinguishes B-slack trees from other B-tree variants  It limits aggregate space wasted by a number of nodes, instead limiting each node’s wasted space

Understanding P4  Example: b = 8, each node contains 4 slack  Children of the root contain a total of 16 slack  P4 says they can have at most b - 1 = 7 slack iqw bdfgkmopstuvwxyz

Understanding P4  Example: b = 8, each node contains 4 slack  Children of the root contain a total of 16 slack  P4 says they can have at most b - 1 = 7 slack  So, this is not a B-slack tree iqw bdfgkmopstuvwxyz

Understanding P4  Example: b = 8  Children of the root contain a total of 5 slack  P4 says they can have at most b - 1 = 7 slack  This is a B-slack tree iq bdfghijklmnoqrstuvp

Understanding P4  Example: b = 8  Children of the root contain a total of 5 slack  P4 says they can have at most b - 1 = 7 slack  This is a B-slack tree iq bdfghijklmnoqrstuvp

Understanding P4  Example: b = 8  Children of the root contain a total of 7 slack  P4 says they can have at most b - 1 = 7 slack  This is a B-slack tree iq aijklmnoqrstuvxpz

Understanding P4  Example: b = 8  Children of the root contain a total of 7 slack  P4 says they can have at most b - 1 = 7 slack  This is a B-slack tree iq aijklmnoqrstuvxpz

P4 implies large average degree  Example: b = 8, height = 2  What is the smallest possible number of pointers/keys at each level? o afimqsu 7 total slack (so 2*8-7 = 9 pointers) 7 total slack (so 5*8-7 = 33 keys) 7 total slack (so 4*8 - 7 = 25 keys) 2 pointers per node 3.5 pointers per node 6.4 keys per node

B-slack trees (where b > 4)  P1: All leaves have the same depth  P2: Internal nodes have between 2 and b children  P3: Leaves contain between 0 and b keys  P4: For each internal node u, the total slack contained in the children of u is at most b – 1  Family with worst case space complexity:  Root has degree 2, the total slack contained in the children of each internal node is exactly b - 1  Worse than possible: total slack contained in the children of each internal node is exactly b

B-slack trees (where b > 4)  P1: All leaves have the same depth  P2: Internal nodes have between 2 and b children  P3: Leaves contain between 0 and b keys  P4: For each internal node u, the total slack contained in the children of u is at most b – 1  Family with worst case space complexity:  Root has degree 2, the total slack contained in the children of each internal node is exactly b - 1  Worse than possible: total slack contained in the children of each internal node is exactly b

Relaxed B-slack trees  Idea: decouple rebalancing from insertion/deletion  Insertion and deletion are simple  All updates make small, localized changes  Any relaxed B-slack tree can be transformed into a B-slack tree by rebalancing  Rebalancing can be deferred

Relaxed B-slack trees  Goal:  Insertion and deletion routines that maintain a relaxed B-slack tree  Rebalancing steps that can turn any relaxed B-slack tree into a B-slack tree  Obtaining a B-slack tree:  After inserting or deleting, perform rebalancing steps until no rebalancing step is applicable

Nodes in a relaxed B-slack tree  Each node is given a weight of 0 or 1  Serves a similar purpose to the colours red & black in a red-black tree  Relaxed depth is one less than the sum of weights on a root-to-leaf path

Properties of relaxed B-slack trees  P0 ' : Every node with weight 0 has 2 children  P1 ' : All leaves have the same relaxed depth  P2 ' : Internal nodes have between 1 and b children  P3: Leaves contain between 0 and b keys  P4: For each internal node u, the total slack contained in the children of u is at most b – 1

Properties of relaxed B-slack trees  Three types of violations in a relaxed B-slack tree can be removed by rebalancing steps to produce a B-slack tree  A weight violation occurs at a node with weight zero (violating P1)  A degree violation occurs at an internal node that has only one child (violating P2)  A slack violation occurs at an internal node whose children contain a total of b or more slack (violating P4) P1 ' : All leaves have the same relaxed depth P2 ' : Internal nodes have between 1 and b children

Updates to relaxed B-slack trees  Insertion and deletion routines that preserve P0 ', P1 ', P2 ' and P3

Rebalancing: fix a weight violation Root-Zero Absorb Split

Rebalancing: fix a degree violation Root-Replace One-Child k ≥ 2 children with total slack s < b, and some child has ONE pointer Evenly distribute the keys & pointers

Rebalancing: fix a slack violation Compress k ≥ 2 children with total slack s ≥ b remove children until s < b and evenly distribute keys & pointers

Amortized complexity of rebalancing  Starting from a B-slack tree containing n keys, do i insertions and d deletions  The resulting relaxed B-slack tree will be transformed into a B-slack tree after at most rebalancing steps  O(log(n+i)) rebalancing steps per insertion  O(1/b) rebalancing steps per deletion

Space complexity  Number of words needed to store n > b 3 elements is less than 2n b/(b-3)  Close to the optimal 2n  More complicated bounds are known; they are much better when b is small

Experiments  B-slack tree implemented in Java  Experiments performed random operations (50% insertion and 50% deletion) on keys drawn uniformly from a fixed range

Experiments  Prefilling phase  Perform updates until the tree is approximately half full (i.e., contains approximately half of the keys in the key range)  Measurement phrase  Perform one million updates, and record: Number of rebalancing steps performed Number of nodes in the tree at the end Number of keys in the dictionary at the end

Experiment 1: small tree  B-slack tree with b = 16  Key range [0,2 12 ) = [0,4096)  Measurements:  0.6 rebalancing steps per update  Average degree of nodes was 15.4 (lower bound = 12.7, optimal = 16)  Space complexity 2.226n (upper bound = 2.727n, optimal = 2n)

Experiment 2: big tree  B-slack tree with b = 32  Key range [0,2 20 ) = [0, )  Measurements:  1.1 rebalancing steps per update  Average degree of nodes was 31.5 (lower bound = 30.8, optimal = 32)  Space complexity 2.097n (upper bound = 2.144n, optimal = 2n)

Rebalancing histogram number of rebalancing steps per insertion or deletion

Other experiments  Performed experiments for  b = 8, 16, 32  key range sizes 2 5, 2 6, 2 7, …, 2 20  workloads: 50% ins 50% del 90% ins 10% del 10% ins 90% del  Results  Average degrees always approximately b  At most 1.2 rebalancing steps per update

Improving rebalancing complexity  Can greatly improve amortized complexity of rebalancing at a small space cost by changing P4: For each internal node u, the total slack contained in the children of u is at most b – 1 + degree(u)  O(1) amortized rebalancing complexity  B-slack tree containing n elements requires less than 2n b/(b-4) words

Conclusion  B-slack trees have  Excellent worst-case space complexity  Even better space complexity in practice  Amortized logarithmic insertion and deletion  Only one node size  Well suited for hardware implementation  Rebalancing can be improved to amortized constant per update with a small increase in space  Easy to obtain good concurrent implementation of relaxed B-slack tree