Spring 2006. Copyright (c) All rights reserved, Leonard Wesley. B-Trees, CMPE126 Data Structures.

1 B-Trees: CMPE126 Data Structures

2 Why B-Trees?
• The trees studied so far are designed for storing data in memory.
• B-Trees are well suited to storing data both in memory AND on secondary storage.
• They are better suited to keeping data balanced than some other tree ADTs.
• They can store multiple keys with the same value, unlike some other trees, such as AVL trees.

3 The Problem With Unbalanced Trees
[Figure: an unbalanced tree holding the keys 1, 2, 3, 4, 5]
The levels are sparsely filled, resulting in deep paths. This defeats the purpose of binary trees.

4 Possible Solutions To Unbalanced Trees
• Periodically balance the tree.
• Don’t let a tree get too unbalanced when inserting or deleting.
AVL Trees: Sometimes called HB[1] trees. Invented by Adel’son-Vel’skii and Landis in the early 1960s. (An in-memory solution, not ideally suited to secondary storage.)
B-Trees: Proposed by R. Bayer & E. M. McCreight (see pg. 542 of Main & Savitch for the reference).

5 What Is A B-Tree?
• It is a type of “multiway” tree.
• It is NOT a binary search tree, nor is it a binary tree.
• It provides a fast way to index into a multi-level set of nodes.
• Each node in the B-Tree contains a sorted array of key values.

6 Motivation For Multiway Tree
• Secondary storage (e.g., disks) is typically divided into equal-sized blocks (e.g., 512, 1024, …, 4096, … bytes).
• The basic I/O operation moves whole blocks, rather than single bytes, between secondary storage and memory.
• The goal is to devise a multiway search tree that minimizes file accesses by getting as much as possible out of each block read.
• Each access to secondary storage takes roughly as long as executing 250K instructions, depending on the speed of the CPU.

7 ISAM
• ISAM = Indexed Sequential Access Method.

8 ISAM: The Idea
[Figure: a disk platter with a track; the track is divided into blocks of 512, 1024, … bytes]

9 ISAM: Index & Keys
[Figure: a block on a track, labeled with its block #, its block key, and its data]
All data in the block will have keys ≤ the block key, or all will have keys ≥ the block key. Pick one inequality and stick with it.

10 ISAM: Block Index
Block index (this index could be stored in memory):
Block #   Key
0         G
1         K
2         N
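A block index like the one above can be searched in memory before any disk block is read. The following is a minimal sketch, assuming the "keys ≤ block key" convention and an index sorted by key; the names IndexEntry, find_block, and slide_index are illustrative and do not come from the slides.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// One row of a hypothetical in-memory block index: a block number and the
// block key (assumed here to be the largest key stored in that block).
struct IndexEntry {
    std::size_t block_number;
    std::string block_key;
};

// Return the number of the first block whose block key is >= target, or -1
// if target is larger than every block key. The index must be sorted by
// block_key in ascending order.
long find_block(const std::vector<IndexEntry>& index, const std::string& target) {
    std::size_t lo = 0, hi = index.size();
    while (lo < hi) {                          // binary search over the index
        std::size_t mid = lo + (hi - lo) / 2;
        if (index[mid].block_key < target) lo = mid + 1;
        else hi = mid;
    }
    if (lo == index.size()) return -1;         // no block can hold the target
    return static_cast<long>(index[lo].block_number);
}

// The index shown on the slide: block 0 -> G, block 1 -> K, block 2 -> N.
std::vector<IndexEntry> slide_index() {
    return {{0, "G"}, {1, "K"}, {2, "N"}};
}
```

With this index, a lookup for "H" would read only block 1, since H falls after G and not after K.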

11 ISAM: Disk Index
Disk index covering Disk 0 … Disk n (this index could be stored in memory also):
Disk #   Key
0        G
1        V
2        X

12 ISAM: Insertion/Deletion
• Insertion: Might involve moving data across blocks. Can leave extra space when inserting into a block.
• Deletion: Might involve contracting data across blocks. Need not contract every time, i.e., leave some space for possible future expansion.

13 Multiway Search Tree (order m)
• A generalization of a binary search tree.
• Each node has at most m children. If k ≤ m is the number of children, then the node has exactly k-1 keys. The tree is ordered.

14 Multiway Search Tree (cont.)
[Figure: nodes in a multiway tree. A node holds the keys k1 k2 k3 k4 k5; its subtrees hold keys < k1, …, k2 < keys < k3, …, k5 < keys]

15 Definition Of A B-Tree
• A B-Tree of order m is an m-way tree such that:
All leaves are on the same level.
All internal nodes except the root node are constrained to have at most m non-empty children and at least ⌈m/2⌉ non-empty children.
The root node has at most m non-empty children.

16 Three Important Properties Of B-Trees
• All nodes in the B-Tree are at least half full (the root node is an exception at times).
• The B-Tree is always balanced. That is, the same number of nodes must be read into memory to locate any key at a given level of the tree.
• A well-organized B-Tree will have just a small number of levels relative to the number of nodes.
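The "small number of levels" property can be checked with a little arithmetic. A completely full tree of order m with h levels has (m^h - 1)/(m - 1) nodes, each holding m - 1 keys, so it holds m^h - 1 keys. The sketch below (the function name min_levels is an assumption for illustration) finds the smallest h that can hold n keys.

```cpp
#include <cassert>
#include <cstdint>

// Smallest number of levels h such that a completely full multiway tree of
// order m (m - 1 keys and m children per node) can hold n keys.
// A full tree with h levels holds m^h - 1 keys.
unsigned min_levels(std::uint64_t n, std::uint64_t m) {
    assert(m >= 2);
    unsigned levels = 0;
    std::uint64_t power = 1;          // m^levels
    while (power - 1 < n) {           // capacity of the current height < n
        power *= m;
        ++levels;
    }
    return levels;
}
```

For one million keys, a binary tree (m = 2) needs at least 20 levels, while a tree of order 256 can fit them in 3, which is why a wide node that fills one disk block saves so many block reads.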

17 Where Are B-Trees Used?
• B-Trees are commonly found in databases and file systems.
• B-Trees allow logarithmic-time insertions and deletions.
• They generally grow from the bottom upwards as elements are inserted, whereas most binary trees grow downward.

18 The Six Rules Governing B-Trees
• R1: A B-Tree might be empty. If it is not, then each node holds at least some specified MINIMUM number of entries (the root node is an exception at times).
• R2: The MAXIMUM number of entries in a node is twice the MINIMUM.

19 The Six Rules Governing B-Trees (cont)
• R3: The entries of each B-Tree node are stored in a partially filled array, sorted from the smallest entry (at index 0) to the largest entry (at the final position of the array).
[Figure: a B-Tree node array with indexes 0 to n-1, holding h, k, k*, n, …]
The data in such an array can be stored in a block on a disk.
* B-Trees can support duplicate keys.

20 The Six Rules Governing B-Trees (cont)
• R4: The number of subtrees below a non-leaf node is always one more than the number of entries in the node.
Example: 4 entries in a non-leaf node, at indexes 0 to 3: 45 | 55 | 67 | 82. The node has 5 subtrees:
subtree 0: keys < 45
subtree 1: keys > 45 & < 55
subtree 2: keys > 55 & < 67
subtree 3: keys > 67 & < 82
subtree 4: keys > 82

21 The Six Rules Governing B-Trees (cont)
• R5: For any non-leaf node: an entry at index i is greater than all the entries in subtree i of the node, and an entry at index i is less than all the entries in subtree i+1 of the node.
• R6: Every leaf node in a B-Tree has the same depth (i.e., all leaves are at the same level).
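The six rules can be turned into a small checker. The sketch below is an illustration under stated assumptions, not the textbook's class: Node uses std::vector in place of the partially filled arrays, entries are ints with no duplicates, MINIMUM is fixed at 1, and the names check, obeys_rules, leaf, and example_tree are hypothetical.

```cpp
#include <cassert>
#include <climits>
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

const std::size_t MINIMUM = 1;            // assumed value for this sketch
const std::size_t MAXIMUM = 2 * MINIMUM;  // R2

struct Node {
    std::vector<int> data;                      // entries, sorted (R3)
    std::vector<std::unique_ptr<Node>> subset;  // children; empty for a leaf
    bool is_leaf() const { return subset.empty(); }
};

// Returns the depth of the leaves below n if the subtree obeys R1-R6 with
// every entry strictly between lo and hi; returns -1 otherwise.
int check(const Node& n, bool is_root, long long lo, long long hi) {
    if (n.data.size() > MAXIMUM) return -1;                  // R2
    if (!is_root && n.data.size() < MINIMUM) return -1;      // R1
    long long prev = lo;
    for (int x : n.data) {                                   // R3 plus bounds
        if (x <= prev || x >= hi) return -1;
        prev = x;
    }
    if (n.is_leaf()) return 0;
    if (n.subset.size() != n.data.size() + 1) return -1;     // R4
    int depth = -2;                                          // -2 = not set yet
    for (std::size_t i = 0; i < n.subset.size(); ++i) {
        long long child_lo = (i == 0) ? lo : n.data[i - 1];
        long long child_hi = (i == n.data.size()) ? hi : n.data[i];
        int d = check(*n.subset[i], false, child_lo, child_hi);  // R5
        if (d < 0) return -1;
        if (depth == -2) depth = d;
        else if (d != depth) return -1;                      // R6
    }
    return depth + 1;
}

bool obeys_rules(const Node& root) {
    return check(root, true, LLONG_MIN, LLONG_MAX) >= 0;
}

std::unique_ptr<Node> leaf(std::vector<int> keys) {
    auto n = std::make_unique<Node>();
    n->data = std::move(keys);
    return n;
}

// A root with the given keys above the leaves [4], [12], [19, 22].
Node example_tree(std::vector<int> root_keys) {
    Node root;
    root.data = std::move(root_keys);
    root.subset.push_back(leaf({4}));
    root.subset.push_back(leaf({12}));
    root.subset.push_back(leaf({19, 22}));
    return root;
}
```

A root [6, 17] over those three leaves passes, while a root [6] over the same three leaves fails R4 (three subtrees for one entry).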

22 Example B-Tree
MIN = 1, MAX = 2
[Figure: an example B-Tree; keys as they appear in the transcript: 30 80 50 60 35 40 20 90 95 72 82 85 55 25 10]

23 Searching For A Target In B-Trees
• Start with the root node and search for the target in the array at that node. If found, then done; return success.
• If the target is not in the root and there are no children, then also done, but return failure.
• If the target is not in the root node and there are children, then the target, if it exists, can only be in one subtree.
• Compare the target with the listed keys, searching key_array from left to right up to data_count, and traverse the first subtree i for which target < key_array[i]. If the target is greater than every key, traverse the last subtree. Repeat the process at the new root node.
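The search steps above can be sketched as a short loop. This is a simplified illustration: Node uses std::vector rather than the partially filled arrays, and the names contains, leaf, and make_slide_tree are assumptions made for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

struct Node {
    std::vector<int> data;                      // sorted keys of this node
    std::vector<std::unique_ptr<Node>> subset;  // children; empty for a leaf
    bool is_leaf() const { return subset.empty(); }
};

// Search the tree rooted at root for target, one node per level.
bool contains(const Node& root, int target) {
    const Node* node = &root;
    while (true) {
        // First index i such that data[i] is not less than target
        // (i == data.size() when every key is smaller).
        std::size_t i = 0;
        while (i < node->data.size() && node->data[i] < target) ++i;
        if (i < node->data.size() && node->data[i] == target) return true;
        if (node->is_leaf()) return false;   // nowhere left to look
        node = node->subset[i].get();        // the only subtree that can hold it
    }
}

std::unique_ptr<Node> leaf(std::vector<int> keys) {
    auto n = std::make_unique<Node>();
    n->data = std::move(keys);
    return n;
}

// Root [6, 17] with leaf children [4], [12], [19, 22].
std::unique_ptr<Node> make_slide_tree() {
    auto root = leaf({6, 17});
    root->subset.push_back(leaf({4}));
    root->subset.push_back(leaf({12}));
    root->subset.push_back(leaf({19, 22}));
    return root;
}
```

Each iteration visits one node, so the number of nodes read equals the depth at which the search stops.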

24 Inserting Into A B-Tree
[Flowchart] Add the new key to the appropriate leaf node. Overflow? If no, done. If yes, split the node into two nodes on the same level, and promote the median key.

25 Loose Insertion (pg. 551 Main & Savitch, one of several ways)
MIN = 1, MAX = 2
Before: root [6, 17]; children [4], [12], [19, 22]
Insert 18
After: root [6 | 17]; children [4], [12], [18 | 19 | 22]
The node [18 | 19 | 22] has an excess entry (problem child).

26 Fixing A Loose Insertion
MIN = 1, MAX = 2
Split the problem child, and promote the middle key to the parent node: root [6, 17, 19]; children [4], [12], [18], [22]. Still have excess.
Fix the excess by repeating the process. Split the node and promote the middle key to a new root node: root [17]; children [6] and [19]; leaves [4], [12] below [6] and [18], [22] below [19].

27 Pseudo Code For Loose Insert
1. Make a local variable, i, equal to the first index such that data[i] is not less than the new entry to insert. If there is no such index, then set i equal to data_count, indicating that all of the entries are less than the new entry.
2. If (we found the new entry at data[i])
   a) Return false with no further work (since the new entry is already in the tree).
   else if (the root has no children)
   b) Add the new entry to the root at data[i]. The original entries at data[i] and afterwards must be shifted right to make room for the new entry. Return true to indicate that we added the entry.
   else
   c) Save the value from this recursive call: subset[i]->loose_insert(entry); Then check whether the root of subset[i] now has an excess entry; if so, then fix that problem. Return the saved value from the recursive call.
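The pseudo code above can be sketched in C++ as follows. This is a simplified illustration, not the textbook's implementation: Node uses std::vector instead of partially filled arrays, entries are ints, and the helper names fix_excess, insert, leaf, and slide_tree_after_inserting_18 are assumptions for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

const std::size_t MINIMUM = 1;            // as in the slides' example
const std::size_t MAXIMUM = 2 * MINIMUM;

struct Node {
    std::vector<int> data;                      // sorted entries
    std::vector<std::unique_ptr<Node>> subset;  // children; empty for a leaf
    bool is_leaf() const { return subset.empty(); }
};

// Split the overfull child parent.subset[i] (MAXIMUM + 1 entries) into two
// nodes and promote its middle entry into parent.data[i].
void fix_excess(Node& parent, std::size_t i) {
    Node& left = *parent.subset[i];
    auto right = std::make_unique<Node>();
    int middle = left.data[MINIMUM];
    right->data.assign(left.data.begin() + MINIMUM + 1, left.data.end());
    left.data.resize(MINIMUM);
    if (!left.is_leaf()) {                      // move the right half of the children
        for (std::size_t k = MINIMUM + 1; k < left.subset.size(); ++k)
            right->subset.push_back(std::move(left.subset[k]));
        left.subset.resize(MINIMUM + 1);
    }
    parent.data.insert(parent.data.begin() + i, middle);
    parent.subset.insert(parent.subset.begin() + i + 1, std::move(right));
}

// May leave node itself with MAXIMUM + 1 entries; the caller repairs that.
bool loose_insert(Node& node, int entry) {
    std::size_t i = 0;                                        // step 1
    while (i < node.data.size() && node.data[i] < entry) ++i;
    if (i < node.data.size() && node.data[i] == entry)        // step 2a
        return false;
    if (node.is_leaf()) {                                     // step 2b
        node.data.insert(node.data.begin() + i, entry);
        return true;
    }
    bool added = loose_insert(*node.subset[i], entry);        // step 2c
    if (node.subset[i]->data.size() > MAXIMUM) fix_excess(node, i);
    return added;
}

// Public insert: if the root ends up with an excess entry, grow a new root.
bool insert(std::unique_ptr<Node>& root, int entry) {
    if (!loose_insert(*root, entry)) return false;
    if (root->data.size() > MAXIMUM) {
        auto new_root = std::make_unique<Node>();
        new_root->subset.push_back(std::move(root));
        fix_excess(*new_root, 0);
        root = std::move(new_root);
    }
    return true;
}

std::unique_ptr<Node> leaf(std::vector<int> keys) {
    auto n = std::make_unique<Node>();
    n->data = std::move(keys);
    return n;
}

// Root [6, 17] over [4], [12], [19, 22], followed by inserting 18.
std::unique_ptr<Node> slide_tree_after_inserting_18() {
    auto root = leaf({6, 17});
    root->subset.push_back(leaf({4}));
    root->subset.push_back(leaf({12}));
    root->subset.push_back(leaf({19, 22}));
    insert(root, 18);
    return root;
}
```

Running the slides' example through this sketch splits [18, 19, 22], promotes 19, then splits the root [6, 17, 19] and promotes 17 into a new root, so the tree grows upward by one level.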

28 Insert In Class Exercise
MIN = 1, MAX = 2
Tree: root [6, 17]; children [4], [12], [19, 22]
• Insert 5, then insert 7.

29 Deleting From A B-Tree

30 Deleting From A B-Tree Example #1
Min = 1, Max = 2
Before: root [6, 17]; children [4], [12], [19, 22]
Delete 17
After: root [6]; children [4], [12], [19, 22]
Violates # subtrees = # keys + 1 (B-Tree Rule 4).

31 Solution To Example #1
Min = 1, Max = 2
Root [6, 19]; children [4], [12], [22]

32 Deleting From A B-Tree Example #2
Min = 2, Max = 4
Before: root [6, 17]; children [2, 4], [10, 12], [19, 22]
Delete 22
After: root [6, 17]; children [2, 4], [10, 12], [19]
The node [19] violates the B-Tree property that the number of keys in a node is not less than MIN.

33 Solution #1 For Example #2
Min = 2, Max = 4
Case 3 solution: if excess entries in the siblings are not available, combine subset[i] with subset[i-1] (pg. 561 Main & Savitch).
Before: root [6, 17]; children [2, 4], [10, 12], [19]
After: root [6]; children [2, 4], [10, 12, 17, 19]

34 Solution #2 To Fix A Shortage
Min = 2, Max = 4
• Case 1: Transfer an extra entry from subset[i-1] to subset[i] (pg. 560 Main & Savitch).
Before: root [6, 17]; children [2, 4], [10, 12, 15], [19]
After: root [6, 15]; children [2, 4], [10, 12], [17, 19]

35 Solution #3 To Fix A Shortage
• Case 2: Transfer an extra entry from subset[i+1] (pg. 561 Main & Savitch).
Before: root [6, 17]; children [2, 4], [10], [19, 21, 22]
After: root [6, 19]; children [2, 4], [10, 17], [21, 22]

36 Deleting From A B-Tree (Loose Erase)
1. Make a local variable, i, equal to the first index such that data[i] is not less than the target to delete. If there is no such index, then set i equal to data_count, indicating that all of the entries are less than the target.
2. Deal with one of the following four possibilities:
   a. Root has no children, and we did not find the target (i.e., nothing to do).
   b. Root has no children, and we found the target. Just remove the target.
   c. Root has children, and we did not find the target in the root. Make a recursive call to search subset[i].
   d. Root has children, and we found the target in the root. Remove the largest entry from subset[i] and insert it into data[i].
Elaborate on 2c and 2d on the following slides …

37 Delete From B-Tree: Elaborate 2c
• The target is not found in the root node, but the target might be in subset[i]. Make the recursive call subset[i]->loose_erase(target).
• This will remove the target from subset[i] if it is in subset[i]. If so, then subset[i] might have < MIN entries. If so, then it needs to be fixed with fix_shortage(i), called on the current node because the fix uses the node's own data and the neighboring subsets. Will discuss later.

38 Delete From B-Tree: Elaborate 2d
• The target is found in the root node, but it cannot simply be removed because there are children.
• Go to subset[i] and remove the largest item in the subset. Create a copy of this largest item and place it in data[i] (which contains the target). In effect, this removes the target.
• However, removing the largest item can cause a shortage in subset[i]. If so, call fix_shortage(i). Will discuss NOW!!

39 Fix Shortage
• Case 1: If subset[i-1] has extra entries, then transfer an entry to subset[i] (pg. 560 Main & Savitch).
Transfer data[i-1] (i.e., 17) down to the front of subset[i]->data. Shift over as necessary and update data_count.
Transfer the final item of subset[i-1] (i.e., 15) up to replace data[i-1], and update data_count.
If subset[i-1] has children, transfer the final child of subset[i-1] over to the front of subset[i], and update the child counts.
Before: root [6, 17]; children [2, 4], [10, 12, 15], [19]
After: root [6, 15]; children [2, 4], [10, 12], [17, 19]
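Case 1 amounts to rotating one entry through the parent. The sketch below illustrates it under simplifying assumptions: Node uses std::vector instead of partially filled arrays, MINIMUM is 2 as in the slide's example, and the names transfer_from_left, leaf, and case1_example are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

const std::size_t MINIMUM = 2;   // Min = 2, Max = 4 as in the slide's example

struct Node {
    std::vector<int> data;                      // sorted entries
    std::vector<std::unique_ptr<Node>> subset;  // children; empty for a leaf
    bool is_leaf() const { return subset.empty(); }
};

// Case 1: parent.subset[i] is short and its left sibling parent.subset[i-1]
// has more than MINIMUM entries.
void transfer_from_left(Node& parent, std::size_t i) {
    assert(i > 0 && parent.subset[i - 1]->data.size() > MINIMUM);
    Node& left = *parent.subset[i - 1];
    Node& short_node = *parent.subset[i];
    // The separating entry data[i-1] moves down to the front of subset[i].
    short_node.data.insert(short_node.data.begin(), parent.data[i - 1]);
    // The final entry of subset[i-1] moves up to replace data[i-1].
    parent.data[i - 1] = left.data.back();
    left.data.pop_back();
    // If the siblings have children, the final child of subset[i-1] follows.
    if (!left.is_leaf()) {
        short_node.subset.insert(short_node.subset.begin(),
                                 std::move(left.subset.back()));
        left.subset.pop_back();
    }
}

std::unique_ptr<Node> leaf(std::vector<int> keys) {
    auto n = std::make_unique<Node>();
    n->data = std::move(keys);
    return n;
}

// The slide's example: root [6, 17] over [2, 4], [10, 12, 15], [19],
// with the shortage in subset[2] repaired.
Node case1_example() {
    Node root;
    root.data = {6, 17};
    root.subset.push_back(leaf({2, 4}));
    root.subset.push_back(leaf({10, 12, 15}));
    root.subset.push_back(leaf({19}));
    transfer_from_left(root, 2);
    return root;
}
```

Because vectors track their own sizes, the explicit data_count updates from the slide disappear in this version.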

40 Fix Shortage (cont.)
• Case 2: If subset[i+1] has extra entries, then transfer an entry to subset[i] (pg. 561 Main & Savitch). Similar to Case 1.
Before: root [6, 17]; children [2, 4], [10], [19, 21, 22]
After: root [6, 19]; children [2, 4], [10, 17], [21, 22]

41 Fix Shortage (cont.)
• Case 3: Combine subset[i] with subset[i-1] (pg. 561 Main & Savitch). This applies if subset[i-1] is present (i.e., i > 0) but subset[i-1] has only the minimum number of items/keys (i.e., no excess keys/items).
Transfer data[i-1] down to the end of subset[i-1]->data … (see a, pg. 562)
Transfer all of the items and children from subset[i] to the end of subset[i-1] … (see b, pg. 562)
Delete the node subset[i] and shift subset[i+1], subset[i+2], and so on left … (see c, pg. 562)
Example (after deleting 22):
Before: root [6, 17]; children [2, 4], [10, 12], [19]
After: root [6]; children [2, 4], [10, 12, 17, 19]
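The three steps of Case 3 map onto three container operations. This is a sketch under the same simplifying assumptions as before (a vector-based Node, int entries); merge_with_left, leaf, and case3_example are illustrative names, not functions from Main & Savitch.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

struct Node {
    std::vector<int> data;                      // sorted entries
    std::vector<std::unique_ptr<Node>> subset;  // children; empty for a leaf
    bool is_leaf() const { return subset.empty(); }
};

// Case 3: combine the short node parent.subset[i] with its left sibling
// parent.subset[i-1], which has no entries to spare.
void merge_with_left(Node& parent, std::size_t i) {
    assert(i > 0 && i < parent.subset.size());
    Node& left = *parent.subset[i - 1];
    Node& short_node = *parent.subset[i];
    // (a) data[i-1] moves down to the end of subset[i-1]->data.
    left.data.push_back(parent.data[i - 1]);
    parent.data.erase(parent.data.begin() + (i - 1));
    // (b) all entries and children of subset[i] move to the end of subset[i-1].
    left.data.insert(left.data.end(),
                     short_node.data.begin(), short_node.data.end());
    for (auto& child : short_node.subset)
        left.subset.push_back(std::move(child));
    // (c) delete subset[i]; erase shifts the later subsets left.
    parent.subset.erase(parent.subset.begin() + i);
}

std::unique_ptr<Node> leaf(std::vector<int> keys) {
    auto n = std::make_unique<Node>();
    n->data = std::move(keys);
    return n;
}

// The slide's example: root [6, 17] over [2, 4], [10, 12], [19],
// with subset[2] merged into subset[1].
Node case3_example() {
    Node root;
    root.data = {6, 17};
    root.subset.push_back(leaf({2, 4}));
    root.subset.push_back(leaf({10, 12}));
    root.subset.push_back(leaf({19}));
    merge_with_left(root, 2);
    return root;
}
```

Note that the merge removes one entry from the parent, so the parent itself can end up short; in a full implementation the shortage would then be handled one level up.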

42 In Class Delete Example #2
Go through the Loose Erase section in Main & Savitch, pg. 558.

