B-trees. Eduardo Laber, David Sotelo.

What are B-trees? Balanced search trees designed for secondary storage devices. Similar to AVL trees, but better at minimizing disk I/O operations. The main data structure used by DBMSs to store and retrieve information.

What are B-trees? Nodes may have many children (from a few to thousands), so the branching factor can be quite large. Every B-tree of n keys has height O(log n). In practice, its height is smaller than the height of an AVL tree.

B-trees and Branching Factor

Definition of B-trees
A B-tree is a rooted tree with the following five properties:
1. Every node x has the following attributes:
a) x.n, the number of keys stored in node x;
b) the x.n keys themselves, in nondecreasing order: $x.key_1 \le x.key_2 \le \dots \le x.key_{x.n}$;
c) the boolean x.leaf, indicating whether x is a leaf or an internal node.

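Definition of B-trees
2. If x is an internal node, it contains x.n + 1 pointers $x.p_1, x.p_2, \dots, x.p_{x.n+1}$ to its children.
3. The keys $x.key_i$ separate the ranges of keys stored in the subtrees: every key in the subtree rooted at $x.p_i$ is at most $x.key_i$, and every key in the subtree rooted at $x.p_{i+1}$ is at least $x.key_i$.
4. All leaves have the same depth, which is the tree's height.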

Definition of B-trees
5. Bounds on the number of keys of a node: Let B be a positive integer representing the order of the B-tree. Every node (except the root) must have at least B keys. Every node (except the root) must have at most 2B keys. The root is free to contain between 1 and 2B keys (why?)
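The five properties can be checked mechanically. The following is a minimal Python sketch, not part of the original slides: the `Node` class, its `keys` and `children` fields, and the function name `is_valid_btree` are assumptions made for illustration, and a leaf is modelled as a node with no children.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    keys: List[int] = field(default_factory=list)
    children: List["Node"] = field(default_factory=list)  # empty for a leaf

    @property
    def leaf(self) -> bool:
        return not self.children


def is_valid_btree(root: Node, B: int) -> bool:
    """Check properties 1-5 of the definition for a tree of order B."""
    leaf_depths = set()

    def check(x: Node, depth: int, lo, hi, is_root: bool) -> bool:
        n = len(x.keys)
        # Property 5: key-count bounds (the root may hold as few as 1 key).
        if n > 2 * B or n < (1 if is_root else B):
            return False
        # Property 1b: keys in nondecreasing order.
        if any(a > b for a, b in zip(x.keys, x.keys[1:])):
            return False
        # Property 3: keys stay inside the range inherited from the ancestors.
        for k in x.keys:
            if (lo is not None and k < lo) or (hi is not None and k > hi):
                return False
        if x.leaf:
            leaf_depths.add(depth)
            return True
        # Property 2: an internal node has exactly n + 1 children.
        if len(x.children) != n + 1:
            return False
        bounds = [lo] + x.keys + [hi]
        return all(
            check(c, depth + 1, bounds[i], bounds[i + 1], False)
            for i, c in enumerate(x.children)
        )

    # Property 4: all leaves at the same depth.
    return check(root, 0, None, None, True) and len(leaf_depths) == 1


example = Node([4], [Node([1, 2, 3]), Node([5, 6, 7, 8])])
print(is_valid_btree(example, 2))
```

The example tree is one candidate for the set {1, 2, ..., 8} with B = 2; swapping in a child with a single key makes the check fail on property 5.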

Example of a B-tree

Exercise 1 Enumerate all valid B-trees of order 2 that represent the set {1, 2,..., 8}

Exercise 1 Solution:

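The height of a B-tree
Theorem: Let h be the height of a B-tree of n keys and order B > 1. Then $h \le \log_B \frac{n+1}{2}$.
Proof: The root contains at least one key and all other nodes contain at least B keys. Hence a root that is not a leaf has at least 2 children, and every other internal node has at least B + 1 children. This gives:
At least one key at depth 0.
At least $2B$ keys at depth 1.
At least $2B(B+1) = 2B^2 + 2B$ keys at depth 2.
At least $2B(B+1)^{i-1} \ge 2B^i$ keys at depth $i \ge 1$.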

Proof (continued) ■
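The body of this slide did not survive in the transcript. One way to complete the argument, offered as a sketch: assume, as the minimum-key bound implies, that depth $i \ge 1$ holds at least $2(B+1)^{i-1}$ nodes with at least B keys each, and sum over all depths.

```latex
\begin{align*}
n &\ge 1 + \sum_{i=1}^{h} 2B\,(B+1)^{i-1} \\
  &= 1 + 2B \cdot \frac{(B+1)^{h} - 1}{B} \\
  &= 2\,(B+1)^{h} - 1 \;\ge\; 2B^{h} - 1 .
\end{align*}
% Rearranging: B^h <= (n+1)/2, and taking logarithms base B > 1 gives
\[
h \;\le\; \log_B \frac{n+1}{2} .
\]
```

The intermediate line actually yields the slightly stronger bound $h \le \log_{B+1} \frac{n+1}{2}$; the theorem states the weaker base-B form.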

Searching a B-tree Similar to searching a binary search tree, except that each node makes a multiway branching decision according to the number of its children. Recursive procedure with a time complexity of $O(B \log_B n)$ for a tree of n keys and order B.

Searching a B-tree
B-TREE-SEARCH (x, k)
  i = 1
  while i ≤ x.n and k > x.key_i do i = i + 1
  if i ≤ x.n and k == x.key_i then return (x, i)
  if x.leaf then return NIL
  else DISK-READ(x.p_i)
       return B-TREE-SEARCH (x.p_i, k)
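The procedure translates almost line for line into Python. This is a sketch under stated assumptions: the `Node` class and the name `btree_search` are illustrative, indices are 0-based rather than 1-based, the whole tree is held in memory (so DISK-READ has no counterpart), and the sample tree is a small B = 2 tree built by hand.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Node:
    keys: List[str] = field(default_factory=list)
    children: List["Node"] = field(default_factory=list)  # empty for a leaf


def btree_search(x: Node, k: str) -> Optional[Tuple[Node, int]]:
    """Return (node, index) of key k, or None if k is not in the subtree of x."""
    i = 0
    # Linear scan: stop at the first key that is not smaller than k.
    while i < len(x.keys) and k > x.keys[i]:
        i += 1
    if i < len(x.keys) and k == x.keys[i]:
        return x, i
    if not x.children:          # x is a leaf: k is absent
        return None
    return btree_search(x.children[i], k)


# A hand-built tree of order B = 2.
root = Node(["N"], [
    Node(["E", "J"], [Node(["A", "C"]), Node(["F", "G"]), Node(["K", "M"])]),
    Node(["T", "X"], [Node(["O", "Q"]), Node(["U", "W"]), Node(["Y", "Z"])]),
])

hit = btree_search(root, "F")
miss = btree_search(root, "B")
```

Here `hit` refers to the leaf holding F together with its position inside that leaf, and `miss` is `None`.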

Searching a B-tree [Figure sequence: an example B-tree holding the keys A to U, searched step by step first for the key F and then for the key N.]

Searching a B-tree Lemma: The time complexity of procedure B-TREE-SEARCH is $O(B \log_B n)$. Proof: The number of recursive calls is at most the tree's height plus one. The height of a B-tree is $O(\log_B n)$. Each call costs at most 2B iterations (between B and 2B in the worst case for a non-root node). Total of $O(B \log_B n)$ steps. ■

Exercise 2 Suppose that B-TREE-SEARCH is implemented to use binary search rather than linear search within each node. Show that this change makes the time complexity O(lg n), independently of how B might be chosen as a function of n.

Exercise 2 Solution: By using binary search, the number of steps of the algorithm becomes $O(\lg B \cdot \log_B n)$. Observe that $\log_B n = \lg n / \lg B$. Therefore $O(\lg B \cdot \log_B n) = O(\lg n)$.
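A sketch of the binary-search variant in Python, using the standard-library `bisect_left`, which returns the first position whose key is not smaller than k (the same position the linear scan stops at). The `Node` class and the function name are assumptions for illustration, and the tree is assumed to be in memory.

```python
from bisect import bisect_left
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Node:
    keys: List[str] = field(default_factory=list)
    children: List["Node"] = field(default_factory=list)  # empty for a leaf


def btree_search_binary(x: Node, k: str) -> Optional[Tuple[Node, int]]:
    """Search for k using O(lg B) comparisons per visited node."""
    while True:
        i = bisect_left(x.keys, k)      # binary search inside the node
        if i < len(x.keys) and x.keys[i] == k:
            return x, i
        if not x.children:              # reached a leaf without finding k
            return None
        x = x.children[i]               # descend into the i-th subtree


tree = Node(["J"], [Node(["A", "C", "E", "G"]), Node(["K", "M", "O", "Q"])])
found = btree_search_binary(tree, "M")
absent = btree_search_binary(tree, "B")
```

The loop replaces the recursion of B-TREE-SEARCH but visits the same nodes, so the total is one binary search per level.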

Linear or binary B-tree search? Lemma: If 1 < B < n then $\lg n \le B \log_B n$. Proof:
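The proof itself is missing from the transcript. A short argument, offered as a sketch, reuses the change of base from Exercise 2 and the fact that $0 < \lg B < B$ for every integer $B \ge 2$:

```latex
\[
B \log_B n \;=\; B \cdot \frac{\lg n}{\lg B}
           \;=\; \frac{B}{\lg B}\,\lg n
           \;\ge\; \lg n ,
\]
% since B / lg B > 1 for B >= 2, and lg n > 0 because n > B > 1.
```

So linear search inside the nodes is never asymptotically better than binary search, and is worse by a factor of $B / \lg B$.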

Inserting a key into a B-tree The new key is always inserted into an existing leaf node (why?). First we search for the leaf position at which to insert the new key. If that leaf is full, we split it. A split operation splits an overfull node (2B + 1 keys) around its median key into two nodes having B keys each. The median key moves up into the parent of the split node (a recursive insertion call).

Split operation
Inserting key F into a full node (B = 2):
1. Initial tree: root [J]; leaves [A C E G] and [K M O Q].
2. Node found but already full: with F added the leaf holds [A C E F G].
3. Median key identified: E.
4. Splitting the node: root [E J]; leaves [A C], [F G] and [K M O Q].

Inserting a key into a B-tree
Insertion can be propagated upward (B = 2):
1. Initial tree: root [E J T X]; leaves [A C], [F G], [K M O Q], [U W], [Y Z].
2. Key N is inserted into the full leaf, which now holds [K M N O Q].
3. SPLIT of the leaf around its median N: leaves [K M] and [O Q]; the root becomes [E J N T X], which is overfull.
4. SPLIT of the root around its median N: new root [N]; internal nodes [E J] and [T X]; leaves [A C], [F G], [K M], [O Q], [U W], [Y Z].

Inserting a key into a B-tree
B-TREE-INSERT (x, k, y)
  i = 1
  while i ≤ x.n and k > x.key_i do i = i + 1   // first position whose key is not smaller than k
  x.n = x.n + 1
  for j = x.n downto i + 1 do                  // shift larger keys and their right pointers
    x.key_j = x.key_{j-1}
    x.p_{j+1} = x.p_j
  end-for
  x.key_i = k
  x.p_{i+1} = y
  DISK-WRITE(x)

Inserting a key into a B-tree
B-TREE-INSERT (x, k, y), continued
  if x.n > 2*B then
    [m, z] = SPLIT (x)
    if x.parent != NIL then
      DISK-READ (x.parent)
    else
      x.parent = ALLOCATE-NODE()
      x.parent.n = 0
      x.parent.p_1 = x          // the new root needs x as its first child
      DISK-WRITE (x)
      root = x.parent
    end-if
    B-TREE-INSERT (x.parent, m, z)
  end-if

Inserting a key into a B-tree
SPLIT (x)
  z = ALLOCATE-NODE()
  m = FIND-MEDIAN (x)
  COPY-GREATER-ELEMENTS(x, m, z)
  DISK-WRITE (z)
  COPY-SMALLER-ELEMENTS(x, m, x)
  DISK-WRITE (x)
  return [m, z]

Inserting a key into a B-tree
Function B-TREE-INSERT has three arguments:
– The node x at which an element of key k should be inserted.
– The key k to be inserted.
– A pointer y to the right child of k (the node z returned by SPLIT, which holds the keys greater than the median), to be used as one of the pointers of x during the insertion process. It is NIL when inserting into a leaf.
There is a global variable named root which is a pointer to the root of the B-tree. Observe that the field x.parent was not defined as an original B-tree attribute, but is considered just to simplify the process. The fields x.leaf should also be updated accordingly.
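The insertion scheme can be sketched in Python as follows. This is an illustration under assumptions rather than the slides' exact method: it keeps the tree in memory (no DISK-READ/DISK-WRITE), and it uses the recursion stack instead of an `x.parent` field, so a split reports its median and new right node to the caller. The names `Node`, `split`, `_insert` and `btree_insert` are hypothetical.

```python
from bisect import bisect_left
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Node:
    keys: List[str] = field(default_factory=list)
    children: List["Node"] = field(default_factory=list)  # empty for a leaf


def split(x: Node, B: int) -> Tuple[str, Node]:
    """Split an overfull node (2B + 1 keys) around its median key."""
    m = x.keys[B]
    z = Node(x.keys[B + 1:], x.children[B + 1:])   # keys greater than m
    x.keys = x.keys[:B]                            # keys smaller than m
    x.children = x.children[:B + 1]
    return m, z


def _insert(x: Node, k: str, B: int) -> Optional[Tuple[str, Node]]:
    """Insert k below x; return (median, right node) if x had to be split."""
    i = bisect_left(x.keys, k)
    if not x.children:
        x.keys.insert(i, k)                # new keys always land in a leaf
    else:
        promoted = _insert(x.children[i], k, B)
        if promoted is not None:           # child split: absorb its median
            m, z = promoted
            x.keys.insert(i, m)
            x.children.insert(i + 1, z)    # z is the right child of m
    return split(x, B) if len(x.keys) > 2 * B else None


def btree_insert(root: Node, k: str, B: int) -> Node:
    """Insert k and return the (possibly new) root."""
    promoted = _insert(root, k, B)
    if promoted is None:
        return root
    m, z = promoted
    return Node([m], [root, z])            # root split: the height grows by one


root = Node()
for key in "EHBAFGCJDI":
    root = btree_insert(root, key, 1)
```

The loop replays the key sequence of Exercise 3 with order 1. Returning the promoted pair keeps each node's update local, at the cost of holding the whole search path on the call stack.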

Inserting a key into a B-tree Lemma: The time complexity of B-TREE-INSERT is $O(B \log_B n)$. Proof: Recall that B-TREE-SEARCH is called first and costs O(lg n) by using binary search. Then B-TREE-INSERT starts at a leaf and proceeds upward. At most one node is visited per level, and only visited nodes can be split. At most one new node is created per split (plus a new root when the root splits). The cost of a split is proportional to 2B. The number of visited nodes is at most the tree's height plus one, and the height of a B-tree is $O(\log_B n)$. Each visited node costs between B and 2B iterations. Total of $O(B \log_B n)$ steps. ■

Some questions on insertion
Which split operation increases the tree's height? The split of the root of the tree.
How many DISK-READ operations are executed by the insertion algorithm? Each node on the key's path is read once during the search and may be read a second time on the way up, when one of its children splits.
Does binary search make sense here? Not exactly. We already pay O(B) to split a node (for finding the median).

Drawbacks of our insertion method Once the key's insertion node is found, it may be necessary to read its parent node again (due to splitting). DISK-READ/WRITE operations are expensive and may be executed twice for each node in the key's path. It is also necessary to store each node's parent or to use the recursion stack to keep its reference. Mond and Raz (1985) provide a solution that spends one DISK-READ/WRITE per visited node (see CLRS).

Exercise 3 Show the results of inserting the keys E, H, B, A, F, G, C, J, D, I in order into an empty B-tree of order 1.

Exercise 3 Solution (final configuration): root [E]; internal nodes [B] and [G I]; leaves [A] and [C D] under [B], and [F], [H] and [J] under [G I].

Exercise 4 Is a B-tree of order 1 a good choice for a balanced search tree? What about the expression $h \le \log_B \frac{n+1}{2}$ when B = 1?

Deleting a key from a B-tree Analogous to insertion but a little more complicated. A key can be deleted from any node (not just a leaf), and the deletion can affect both the node's parent and its children (insertion only affects parents). One must ensure that a node does not get too small during deletion (fewer than B keys). As a result, deleting a key is the most complex operation on B-trees. It will be considered in 4 particular cases.

Deleting a key from a B-tree Case 1: The key is in a leaf node with more than B elements. Procedure: Just remove the key from the node.

Deleting a key from a B-tree Case 1 example (B = 2): the tree has root [N], internal nodes [E J] and [T X], and leaves [A C D], [F G], [K M], [O Q], [U W], [Y Z]. Deleting C from the leaf [A C D], which has more than B elements, simply leaves [A D]; no other node changes.

Deleting a key from a B-tree
Case 2: The join procedure
The key k_1 to be deleted is in a leaf x with exactly B elements. Let y be a node that is an "adjacent brother" of x. Suppose that y has exactly B elements.
Procedure:
1. Remove the key k_1.
2. Let k_2 be the key that separates nodes x and y in their parent.
3. Join the nodes x and y and move the key k_2 from the parent to the new joined node.
4. If the parent of x is left with B-1 elements and also has an "adjacent brother" with B elements, apply the join procedure recursively to the parent of x (seen as x) and its adjacent brother (seen as y).

Deleting a key from a B-tree
Case 2 example: delete key Q (B = 2). Only the affected part of the tree is described.
1. Node x = [O Q] and its adjacent brother y = [U W] are children of the parent [K T X]; the key separating them is T.
2. Q is removed, leaving x = [O] with fewer than B elements.
3. Join: x, the separating key T and y form the new node [O T U W]; the parent becomes [K X].

Deleting a key from a B-tree
Case 3: join and split
The key k_1 to be deleted is in a leaf x with exactly B elements. Let y be a node that is an "adjacent brother" of x. Suppose that y has more than B elements.
Procedure:
1. Remove the key k_1.
2. Let k_2 be the key that separates nodes x and y in their parent.
3. Join the nodes x and y and move the key k_2 from the parent to the new joined node z.
4. Find the median key m of z.
5. Determine the new nodes x and y by splitting z around m.
6. Insert m into the parent of x and y.
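Cases 1 to 3 all act on a leaf, its chosen adjacent brother and their parent, so they fit in one small routine. The following Python sketch is an illustration under assumptions: the names `Node` and `delete_from_leaf` are hypothetical, the caller chooses which adjacent brother to use, the tree is in memory, and the recursive repair of a parent left with B-1 keys (the last step of Case 2) is deliberately not included.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    keys: List[str] = field(default_factory=list)
    children: List["Node"] = field(default_factory=list)  # empty for a leaf


def delete_from_leaf(parent: Node, i: int, j: int, k: str, B: int) -> None:
    """Delete k from the leaf parent.children[i], using the adjacent
    brother parent.children[j] (j = i - 1 or i + 1) if the leaf underflows."""
    assert abs(i - j) == 1
    x = parent.children[i]
    x.keys.remove(k)
    if len(x.keys) >= B:
        return                                   # Case 1: nothing else to do
    left, right = min(i, j), max(i, j)
    l, r = parent.children[left], parent.children[right]
    # Join both leaves through the separating key k_2 = parent.keys[left].
    z = l.keys + [parent.keys[left]] + r.keys
    if len(z) <= 2 * B:
        # Case 2: the joined node is kept; the parent loses k_2 and a pointer.
        l.keys = z
        del parent.keys[left]
        del parent.children[right]
    else:
        # Case 3: split the joined node around its median, which moves up.
        mid = len(z) // 2
        l.keys, parent.keys[left], r.keys = z[:mid], z[mid], z[mid + 1:]


# Case 3: delete F, joining with the left brother [A C D].
p3 = Node(["E", "J"], [Node(["A", "C", "D"]), Node(["F", "G"]), Node(["K", "M"])])
delete_from_leaf(p3, 1, 0, "F", 2)

# Case 2: delete Q, joining with the right brother [U W].
p2 = Node(["K", "T", "X"],
          [Node(["H", "I"]), Node(["O", "Q"]), Node(["U", "W"]), Node(["Y", "Z"])])
delete_from_leaf(p2, 1, 2, "Q", 2)
```

After these calls `p3` holds the keys D and J over the leaves [A C], [E G], [K M], and `p2` holds K and X over [H I], [O T U W], [Y Z]. Treating Case 3 as "join, then split" keeps it symmetrical with Case 2, which is how the slides present it.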

Deleting a key from a B-tree
Case 3 example: delete key F (B = 2).
1. Initial tree: root [N]; internal nodes [E J] and [T X]; leaves [A C D], [F G], [K M], [O Q], [U W], [Y Z].
2. F is removed from node x = [F G], leaving [G]. Its adjacent brother y = [A C D] has more than B elements; the parent is [E J] and the separating key is E.
3. Join: x, y and the separating key E form the node [A C D E G].
4. Median key: D.
5. Split: the joined node becomes [A C] and [E G], and D moves into the parent, which becomes [D J].
6. Final tree: root [N]; internal nodes [D J] and [T X]; leaves [A C], [E G], [K M], [O Q], [U W], [Y Z].

Deleting a key from a B-tree
Case 4: internal node
The key k_1 to be deleted is in a node x that is not a leaf or a root.
Procedure:
1. Let k_2 be the smallest key that is greater than k_1.
2. Let y be the node of k_2, which will be a leaf.
3. Replace k_1 by k_2 in x (insert k_2 into x and remove k_1 from x).
4. Now solve the problem of removing k_2 from the leaf y, as previously considered.
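The first three steps of Case 4 amount to finding the in-order successor of k_1 and copying it over k_1. A minimal Python sketch follows, with hypothetical names (`Node`, `replace_with_successor`), 0-based indices and an in-memory tree; the removal of k_2 from the leaf y is left to the leaf cases.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Node:
    keys: List[str] = field(default_factory=list)
    children: List["Node"] = field(default_factory=list)  # empty for a leaf


def replace_with_successor(x: Node, i: int) -> Tuple[Node, str]:
    """Overwrite x.keys[i] with its successor k_2 and return (y, k_2),
    where y is the leaf that still holds the original copy of k_2."""
    y = x.children[i + 1]        # subtree holding the keys greater than k_1
    while y.children:            # the smallest of them is in the leftmost leaf
        y = y.children[0]
    k2 = y.keys[0]
    x.keys[i] = k2
    return y, k2


x = Node(["T", "X"], [Node(["O", "Q"]), Node(["U", "W"]), Node(["Y", "Z"])])
y, k2 = replace_with_successor(x, 0)   # delete T: its successor is U
```

After the call `x` holds U and X, and `k2` still has to be deleted from `y`, which is a leaf deletion.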

Deleting a key from a B-tree
Case 4 example: delete key T (B = 2).
1. Initial tree: root [N]; internal nodes [D J] and [T X]; leaves [A C], [E G], [K M], [O Q], [U W], [Y Z].
2. Node x = [T X] holds the key T. The smallest key greater than T is U, stored in the leaf y = [U W].
3. U replaces T in x, so x = [U X], and U is removed from y, leaving y = [W] with fewer than B elements.
4. Join (Case 2): y, the separating key X and the adjacent brother [Y Z] form the leaf [W X Y Z]; node x becomes [U], with B-1 elements.
5. Join applied recursively: [U], the separating key N from the root and the adjacent brother [D J] form the node [D J N U], which becomes the new root.
6. Final tree: root [D J N U]; leaves [A C], [E G], [K M], [O Q], [W X Y Z]. The height of the tree decreases by one.