External Sorting and Searching


1 External Sorting and Searching
B-Trees, etc.

2 m-Way Search Trees In a binary search tree, there is one key value per node and two children. There is no reason why I couldn’t have (at most) m-1 key values per node and m children. Such trees are called m-way search trees.

3 m-Way Search Tree Example
Tree by level: [120, 240] / [97] [200] [360, 440]
Here is a 3-way search tree; each node has a maximum of 3 children.
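A minimal Python sketch of an m-way search tree node and its search, assuming a node is a sorted key list plus a list of subtrees (empty for a terminal node, with None marking a missing subtree). The class name `MWayNode` and the function `search` are illustrative, not from the slides. Inside a node, `bisect_left` finds either the matching key or the subtree to follow; the example tree is the 3-way tree above.

```python
from bisect import bisect_left


class MWayNode:
    """One node of an m-way search tree: up to m-1 sorted keys, up to m subtrees."""

    def __init__(self, keys, children=None):
        self.keys = keys                # sorted key values
        self.children = children or []  # [] for a terminal node; None marks an empty subtree


def search(node, key):
    """Return True if key occurs in the tree rooted at node."""
    while node is not None:
        # Binary search inside the node: i = number of keys smaller than key.
        i = bisect_left(node.keys, key)
        if i < len(node.keys) and node.keys[i] == key:
            return True
        # Subtree i holds the values between keys[i-1] and keys[i].
        node = node.children[i] if node.children else None
    return False


# The 3-way tree from the slide: root [120, 240] over [97], [200], [360, 440].
tree = MWayNode([120, 240], [MWayNode([97]), MWayNode([200]), MWayNode([360, 440])])
print(search(tree, 360), search(tree, 300))  # True False
```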

4 m-Way Search Tree Example II
Tree by level: [97] / [120, 240] / [360, 440] / [500]
Here is another one. Each node has only one non-empty subtree, so this tree is a chain.

5 m-Way Time Complexity Clearly, the worst-case search and insert time for an m-way search tree is still O(n): nothing keeps the tree balanced, so it can degenerate into a chain of nodes. The number of nodes visited is O(n/m), and in each we must look at up to m-1 values. We could search each node in O(log2(m)) time, yielding at best O((n/m) * log2(m)). Of course, since m is fixed, as n gets much larger than m this is still O(n).
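To see the degenerate case concretely, the sketch below uses a naive insertion with no rebalancing. This is an assumption about how the unbalanced tree is grown: a new key goes into the last node visited if that node has room, otherwise into a new child node hung at that position. The names `Node`, `insert`, and `height` are hypothetical. Inserting keys in sorted order then builds a chain of n/(m-1) nodes, one per level.

```python
from bisect import bisect_left


class Node:
    """m-way search tree node; children[i] is None where no subtree exists yet."""

    def __init__(self, key):
        self.keys = [key]
        self.children = [None, None]


def insert(root, key, m):
    """Naive m-way insertion with no rebalancing; returns the root."""
    if root is None:
        return Node(key)
    node = root
    while True:
        i = bisect_left(node.keys, key)
        if i < len(node.keys) and node.keys[i] == key:
            return root                    # duplicate: nothing to do
        if node.children[i] is not None:
            node = node.children[i]        # keep descending
        elif len(node.keys) < m - 1:
            node.keys.insert(i, key)       # room in this node
            node.children.insert(i, None)  # one more empty subtree slot
            return root
        else:
            node.children[i] = Node(key)   # node is full: hang a new node here
            return root


def height(node):
    """Number of levels in the tree (0 for an empty subtree)."""
    if node is None:
        return 0
    return 1 + max(height(child) for child in node.children)


m, n = 3, 100
root = None
for k in range(1, n + 1):  # keys arrive in sorted order
    root = insert(root, k, m)
print(height(root))  # 50, i.e. n / (m - 1) levels
```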

6 B-Trees What I want is a height-balanced m-way search tree to achieve the best search time. These are called B-Trees. As with height-balanced BSTs, we will have a re-balancing algorithm to run after every insert and delete.

7 B-Tree Properties The root may have between 2 and m children (unless the root is itself a terminal node).
All other nodes must have between $\lceil m/2\rceil$ and m children. A node that has k children will have k-1 key values. All terminal nodes are on the same level. Thus, the root may have only 2 children; all other nodes must be at least half full.

8 B-Tree Properties II If a B-Tree node has k children ($T_0, T_1, \dots, T_{k-1}$) and k-1 ordered key values ($D_1, D_2, \dots, D_{k-1}$), then all the key values in $T_i$ are greater than $D_i$ but less than $D_{i+1}$ for $i = 1, \dots, k-2$. All the key values in $T_0$ are less than $D_1$. All the key values in $T_{k-1}$ are greater than $D_{k-1}$. This simply means it is a search tree.
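The properties above can be checked mechanically. This hypothetical `is_btree` helper sketches one way to do it, assuming a node is a sorted key list plus a child list (empty for a terminal node). Besides the key counts and the search-tree ordering, it checks that all terminal nodes are on the same level, which is what makes the tree height-balanced.

```python
from math import ceil


class BNode:
    def __init__(self, keys, children=None):
        self.keys = keys                # sorted key values
        self.children = children or []  # [] for a terminal node


def is_btree(root, m):
    """Check the B-tree properties for a tree of order m."""
    leaf_levels = set()

    def check(node, lo, hi, level, is_root):
        n_keys = len(node.keys)
        # The root needs >= 1 key (2 children); other nodes need ceil(m/2)-1 keys.
        min_keys = 1 if is_root else ceil(m / 2) - 1
        if not min_keys <= n_keys <= m - 1:
            return False
        # Keys must be increasing and lie strictly between the bounds lo and hi.
        bounds = [lo] + node.keys + [hi]
        for a, b in zip(bounds, bounds[1:]):
            if a is not None and b is not None and a >= b:
                return False
        if not node.children:
            leaf_levels.add(level)
            return True
        if len(node.children) != n_keys + 1:  # k children <=> k-1 keys
            return False
        # Child i must hold only values between bounds[i] and bounds[i+1].
        return all(check(child, bounds[i], bounds[i + 1], level + 1, False)
                   for i, child in enumerate(node.children))

    return check(root, None, None, 1, True) and len(leaf_levels) == 1


ok = BNode([240], [BNode([120], [BNode([97]), BNode([200])]),
                   BNode([360], [BNode([280]), BNode([440])])])
lopsided = BNode([120], [BNode([97]),
                         BNode([200], [BNode([150]), BNode([300])])])
print(is_btree(ok, 3), is_btree(lopsided, 3))  # True False
```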

9 B-Tree Insertion All insertions are done at the terminal level.
First, search for the terminal-level node to insert the new key value into, and insert it there. If the node now has no more than m-1 key values (that is, no more than m children), stop. If it has more than m-1 key values...

10 B-Tree Node Splitting Split this node into two nodes:
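Take the middle value out. Create one node with the lower half of the key values and one with the upper half. Insert the middle value into the parent node. If the parent now has too many values, split it the same way, continuing recursively until either a node can hold the new key value or you split the root, in which case the middle value goes into a new root node.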
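Here is the split step on its own as a small sketch, assuming a node is represented by its sorted key list and its child list (the child list is empty at the terminal level). The helper name `split` is hypothetical. For an overfull node, the middle key is returned for insertion into the parent, and the remaining keys and children are divided between two nodes.

```python
def split(keys, children):
    """Split an overfull node into (lower node, middle value, upper node).

    Each node is returned as a (keys, children) pair; children is [] for a
    terminal node, otherwise it has one more entry than keys."""
    mid = len(keys) // 2
    lower = (keys[:mid], children[:mid + 1])
    upper = (keys[mid + 1:], children[mid + 1:])
    return lower, keys[mid], upper


print(split([120, 240, 360], []))  # (([120], []), 240, ([360], []))
```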

11 B-Tree Insert Example A B-Tree of order 3 (i.e. m=3) is the smallest possible. It is also the easiest to draw, so we’ll use this order for our example. This is also called a “2-3 Tree” because each node may have a maximum of 2 key values and 3 children.

12 B-Tree Example
Key values left to insert: 360, 240, 200, 97, 440, 280
Tree by level: [120]
Insert 120. A new root node is created and this value is placed into it.

13 B-Tree Example
Key values left to insert: 240, 200, 97, 440, 280
Tree by level: [120, 360]
Insert 360. It goes into the root. No further action is required.

14 B-Tree Example
Key values left to insert: 200, 97, 440, 280
Tree by level: [120, 240, 360]
Insert 240. It goes into the root. Since this node has 3 values, it must be split.

15 B-Tree Example
Key values left to insert: 200, 97, 440, 280
Tree by level: [240] / [120] [360]
This shows the result of the split: 120 and 360 go into nodes by themselves, and 240 is placed into a new root node.

16 B-Tree Example
Key values left to insert: 97, 440, 280
Tree by level: [240] / [120, 200] [360]
Insert value 200. It goes into the node with 120. No further action is required.

17 B-Tree Example
Key values left to insert: 440, 280
Tree by level: [240] / [97, 120, 200] [360]
Insert value 97. It goes into the node with 120 and 200. Since this node contains too many values, it must be split.

18 B-Tree Example
Key values left to insert: 440, 280
Tree by level: [120, 240] / [97] [200] [360]
This shows the result of the split. 97 and 200 are placed into their own nodes, and 120 is moved up to the parent. The parent node is OK.

19 B-Tree Example
Key values left to insert: 280
Tree by level: [120, 240] / [97] [200] [360, 440]
Insert 440. It goes into the node with 360. No further action is required.

20 B-Tree Example
Key values left to insert: none (done)
Tree by level: [120, 240] / [97] [200] [280, 360, 440]
Insert the value 280. It goes into the node with 360 and 440. Since this node has 3 values, it must be split.

21 B-Tree Example
Tree by level: [120, 240, 360] / [97] [200] [280] [440]
This shows the result of the split: 280 and 440 go into nodes by themselves, and 360 is moved up to the parent node.

22 B-Tree Example
Tree by level: [240] / [120] [360] / [97] [200] [280] [440]
The parent node must be split as well. Because it is the root, we must create a new root node.

23 Time Complexity What is the order of a B-tree search? To answer this, we need to determine the worst-case number of levels in a B-Tree of order m that has n key values. Let’s look at the minimum number of nodes per level: level 1 (the root) has 1 node; level 2 has at least 2 nodes; level 3 has at least $2\lceil m/2\rceil$ nodes; level 4 has at least $2\lceil m/2\rceil^2$ nodes; in general, level L has at least $2\lceil m/2\rceil^{L-2}$ nodes.

24 Time Complexity II Observation: in any list of n elements, there are n+1 ways for a search to fail. In a B-tree, all the ways to fail are at level L+1, just below the terminal nodes (these are sometimes called failure nodes). This gives a relationship between the number of key values and the height of the tree:

25 Time Complexity III Because the previous analysis is a worst case, level L+1 holds at least $2\lceil m/2\rceil^{L-1}$ failure nodes, and there are exactly n+1 of them, so:
$$2\lceil m/2\rceil^{L-1} \le n+1$$
$$\lceil m/2\rceil^{L-1} \le \frac{n+1}{2}$$
$$L-1 \le \log_{\lceil m/2\rceil}\frac{n+1}{2}$$
$$L \le \log_{\lceil m/2\rceil}\frac{n+1}{2} + 1$$
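The bound can be checked numerically. This sketch (hypothetical helper names `min_keys` and `max_levels`) counts the fewest keys a B-tree with a given number of levels can hold, which works out to $2\lceil m/2\rceil^{L-1} - 1$, and finds the largest L allowed by the inequality for a given n using exact integer arithmetic instead of floating-point logarithms.

```python
from math import ceil


def min_keys(levels, m):
    """Fewest key values a B-tree of order m can hold with the given number of levels."""
    c = ceil(m / 2)
    total, nodes = 1, 2            # the root holds >= 1 key; level 2 has >= 2 nodes
    for _ in range(2, levels + 1):
        total += nodes * (c - 1)   # each non-root node holds >= c-1 keys
        nodes *= c                 # ... and has >= c children on the next level
    return total


def max_levels(n, m):
    """Largest L with 2 * ceil(m/2)**(L-1) <= n + 1."""
    c = ceil(m / 2)
    levels = 1
    while 2 * c ** levels <= n + 1:  # would one more level still satisfy the bound?
        levels += 1
    return levels


print(min_keys(3, 3), max_levels(7, 3))              # 7 3
print(max_levels(10**6, 3), max_levels(10**6, 256))  # 19 3
```

For example, 7 keys are enough for a 2-3 tree with 3 levels, as in the insertion example, and for a million keys the bound allows 19 levels when m = 3 but only 3 levels when m = 256.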

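26 Time Complexity IV One node at each level must be accessed, so L gives the number of nodes to access. In this worst-case tree, each non-root node contains $\lceil m/2\rceil - 1$ key values, and a binary search within a node takes about $\log_2(\lceil m/2\rceil - 1)$ comparisons, so the total number of comparisons is about
$$\left(\log_{\lceil m/2\rceil}\frac{n+1}{2} + 1\right)\cdot\log_2\left(\lceil m/2\rceil - 1\right)$$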

27 Fun With Math Removing the constants, we may say this search is
$$O\left(\log_{\lceil m/2\rceil} n \cdot \log_2\lceil m/2\rceil\right) = O\left(\frac{\log_2 n}{\log_2\lceil m/2\rceil}\cdot\log_2\lceil m/2\rceil\right) = O(\log_2 n)$$

28 Summing It Up
WHAT??? ALL THIS WORK FOR THE SAME ORDER AS AN AVL-TREE!!! What’s going on here???

29 What Really Happens Remember this is external sorting, so accessing the information and doing comparisons have very different costs: reading from the disk is far slower than comparing keys in memory. Each node in the B-tree is stored in a “block” on the disk; a “block” is the minimum amount of information that can be retrieved with one disk access.
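How big can m be? If each node must fit in one block, the order follows from the block size. A sketch, assuming a node stores m child pointers and m-1 keys and ignoring any per-node header; the helper name `order_for_block` and the sizes used (4096-byte blocks, 8-byte keys and pointers) are hypothetical.

```python
def order_for_block(block_bytes, key_bytes, pointer_bytes):
    """Largest order m whose node (m pointers + m-1 keys) fits in one block."""
    # m * pointer + (m - 1) * key <= block   <=>   m <= (block + key) / (key + pointer)
    return (block_bytes + key_bytes) // (key_bytes + pointer_bytes)


m = order_for_block(4096, 8, 8)  # hypothetical: 4 KiB blocks, 8-byte keys and pointers
print(m)  # 256
```

With m = 256, the height bound from the time-complexity slides allows at most 3 levels for a million keys, so a search needs at most 3 block reads.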

30 What Really Happens II Thus, the number of disk accesses is the bottleneck; this is given by L. A B-tree is built on a field of a data file to speed access to that field. A “Clustered” or “Primary” B-tree stores the entire record of the file in the B-Tree. An “Unclustered” or “Secondary” B-tree stores the field’s value and the record number in the node.

31 What Really Happens III
It is the secondary B-trees that one usually means when one says “B-tree”. Thus, to do a search for a record on a field which has a B-tree: Search the B-tree for the key value. When found, retrieve its associated record number. Retrieve that record from the data file.
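The three steps of a secondary-index lookup can be sketched as follows. To keep the focus on the steps, a sorted list of (key, record number) pairs searched with `bisect` stands in for the B-tree, and an in-memory `io.BytesIO` stands in for the data file; the fixed-length record layout, the sample rows, and the names `write_records` and `lookup` are hypothetical.

```python
import io
import struct
from bisect import bisect_left

# Hypothetical fixed-length record: schedule number, course code, room.
RECORD = struct.Struct("<i8s8s")


def write_records(f, rows):
    """Append rows to the data file; row r is stored at byte offset r * RECORD.size."""
    for row in rows:
        f.write(RECORD.pack(*row))


def lookup(index, data_file, key):
    """Secondary-index search on the schedule number."""
    # Step 1: search the index for the key value. A sorted list of
    # (key, record number) pairs stands in for the B-tree here.
    i = bisect_left(index, (key,))
    if i == len(index) or index[i][0] != key:
        return None
    # Step 2: take the associated record number.
    record_number = index[i][1]
    # Step 3: retrieve that record from the data file with one seek and one read.
    data_file.seek(record_number * RECORD.size)
    return RECORD.unpack(data_file.read(RECORD.size))


rows = [(140, b"CS101", b"A12"), (100, b"MA201", b"B03"),
        (210, b"CS350", b"A07"), (120, b"PH110", b"C21")]  # hypothetical sample rows
data = io.BytesIO()
write_records(data, rows)
index = sorted((row[0], number) for number, row in enumerate(rows))
record = lookup(index, data, 210)
print(record[0], record[1].rstrip(b"\0"))  # 210 b'CS350'
```

Note that the data file stays in its original (unsorted) order; only the index is sorted on the field.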

32 A Real Example. What follows is a real example of how a B-tree is used.

33 Sample Data File

34 B-Tree on Schedule# This is the way we would normally view it:
(Figure: the B-tree on Schedule#.)

35 B-Tree on Schedule# This is how it really looks in a file:

36 Deleting in a B-tree To delete from a B-Tree, first locate the key value with the normal search routine. If the key value is not located in a terminal node, replace it with its in-order successor (the smallest key in the subtree to its right, which is always in a terminal node) and delete the in-order successor instead. Thus, all deletes that reduce the number of key values occur at the terminal level.
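A sketch of the successor step, assuming a node is a sorted key list plus a child list (empty for a terminal node); the helper name `replace_with_successor` is hypothetical. The successor of `keys[i]` in an internal node is found by stepping into child i+1 and then following leftmost children down to the terminal level. The example deletes the root key 240 from the 2-3 tree built in the insertion example.

```python
class BNode:
    def __init__(self, keys, children=None):
        self.keys = keys                # sorted key values
        self.children = children or []  # [] for a terminal node


def replace_with_successor(node, i):
    """Overwrite node.keys[i] (node is internal) with its in-order successor.

    Returns (leaf, key): the terminal node that still holds the successor key,
    which is the copy that must now be deleted."""
    leaf = node.children[i + 1]         # everything in this subtree is > keys[i]
    while leaf.children:
        leaf = leaf.children[0]         # smallest value: keep going left
    node.keys[i] = leaf.keys[0]
    return leaf, leaf.keys[0]


root = BNode([240], [BNode([120], [BNode([97]), BNode([200])]),
                     BNode([360], [BNode([280]), BNode([440])])])
leaf, key = replace_with_successor(root, 0)
leaf.keys.remove(key)                   # the actual delete happens at the terminal level
print(root.keys, key, leaf.keys)        # [280] 280 []
```

Here the terminal node ends up empty, which is exactly the situation the following slides handle.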

37 Deleting From the Terminal Level
Good news: because there are no children to worry about, we can just remove it from the list. Bad news: what if this removal reduces the number of children below $\lceil m/2\rceil$? Reality: at some point we will need to reduce the number of nodes...

38 The “Borrow” Algorithm
When a node is reduced below $\lceil m/2\rceil$ children, first try to borrow a key value from one of its neighbors. If a neighbor has more than the minimum, rotate: move the neighbor’s nearest key up into the parent, and move the parent key that separates the two nodes down into the reduced child.
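A sketch of the borrow (rotation) step, assuming a node is a sorted key list plus a child list and that `min_keys` is $\lceil m/2\rceil - 1$, the fewest key values a non-root node may hold. The function name `borrow` is hypothetical; it tries the left neighbor first, then the right, and when the nodes are internal it also moves the subtree that belongs with the rotated key.

```python
class BNode:
    def __init__(self, keys, children=None):
        self.keys = keys                # sorted key values
        self.children = children or []  # [] for a terminal node


def borrow(parent, i, min_keys):
    """Refill parent.children[i] by rotating a key through the parent.

    Returns True on success, False if neither neighbor has a key to spare."""
    child = parent.children[i]
    if i > 0 and len(parent.children[i - 1].keys) > min_keys:
        left = parent.children[i - 1]
        child.keys.insert(0, parent.keys[i - 1])  # parent key comes down
        parent.keys[i - 1] = left.keys.pop()      # neighbor's largest key goes up
        if left.children:                         # its last subtree moves across too
            child.children.insert(0, left.children.pop())
        return True
    if i + 1 < len(parent.children) and len(parent.children[i + 1].keys) > min_keys:
        right = parent.children[i + 1]
        child.keys.append(parent.keys[i])         # parent key comes down
        parent.keys[i] = right.keys.pop(0)        # neighbor's smallest key goes up
        if right.children:                        # its first subtree moves across too
            child.children.append(right.children.pop(0))
        return True
    return False


# Slide 39: delete 200 from a 2-3 tree (m = 3, so min_keys = ceil(3/2) - 1 = 1).
root = BNode([120, 240], [BNode([97]), BNode([200]), BNode([360, 440])])
root.children[1].keys.remove(200)
print(borrow(root, 1, 1))                          # True
print(root.keys, [c.keys for c in root.children])  # [120, 360] [[97], [240], [440]]
```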

39 Borrow Example
Tree by level: [120, 240] / [97] [200] [360, 440]
Suppose I want to delete 200 from this B-tree of order 3. To do so, rotate 240 into the middle child, and 360 up to the root:

40 Borrow Example
Tree by level: [120, 360] / [97] [240] [440]
This shows the result. Problem: what if I now want to delete 240? Borrowing won’t work, because both neighbors ([97] and [440]) have only the minimum number of key values...

41 Combining Nodes When borrowing won’t work, combine the node with the parent key value that separates them AND the neighbor node with minimum children. The parent has now lost a key, so repeat the deletion algorithm from the parent, looking first to borrow if possible. Now, let’s delete 240.

42 Combining Example
Tree by level: [120, 360] / [97] [240] [440]
First, remove 240.

43 Combining Example
Tree by level: [120, 360] / [97] [empty] [440]
Next, attempt to borrow. Borrowing fails. Combine the empty node with 360 and 440.

44 Combining Example
Tree by level: [120] / [97] [360, 440]
This shows the result. The parent is OK, so we are done...
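A sketch of the combine step, assuming a node is a sorted key list plus a child list; the function name `combine` is hypothetical. It merges the reduced node, the parent key that separates it from a neighbor, and that neighbor into one node, removing one key and one child from the parent. Preferring the right-hand neighbor is a choice made here; it happens to match the combine steps in these slides.

```python
class BNode:
    def __init__(self, keys, children=None):
        self.keys = keys                # sorted key values
        self.children = children or []  # [] for a terminal node


def combine(parent, i):
    """Merge parent.children[i], the parent key beside it, and a neighbor into one node."""
    k = i if i + 1 < len(parent.children) else i - 1  # prefer the right neighbor
    left, right = parent.children[k], parent.children[k + 1]
    left.keys += [parent.keys.pop(k)] + right.keys    # separating key comes down
    left.children += right.children
    parent.children.pop(k + 1)                         # the right-hand node disappears


# Slide 43: 240 has been removed and borrowing failed.
root = BNode([120, 360], [BNode([97]), BNode([]), BNode([440])])
combine(root, 1)
print(root.keys, [c.keys for c in root.children])  # [120] [[97], [360, 440]]
```

The parent may now be under-full itself, in which case the same borrow-or-combine check is repeated one level up.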

45 A Larger Example
Tree by level: [260] / [120, 180] [360] / [97] [150] [200] [280] [440, 500]
Delete 280. This is a “borrow” case:

46 A Larger Example
Tree by level: [260] / [120, 180] [440] / [97] [150] [200] [360] [500]
Delete 360. This is a “combine” case:

47 A Larger Example
Tree by level: [260] / [120, 180] [440] / [97] [150] [200] [empty] [500]
First, remove 360...

48 A Larger Example
Tree by level: [260] / [120, 180] [440] / [97] [150] [200] [empty] [500]
Next, combine the empty node with its neighbor (500) and 440 from the parent...

49 A Larger Example
Tree by level: [260] / [120, 180] [empty] / [97] [150] [200] [440, 500]
The parent now has a problem... This is a borrow case:

50 A Larger Example
Tree by level: [180] / [120] [260] / [97] [150] [200] [440, 500]
Children must now be considered. What do I do with the node with 200?

51 A Larger Example
Tree by level: [180] / [120] [260] / [97] [150] [200] [440, 500]
Link it under 260 as its left child, since 200 lies between 180 and 260. Now, delete 97...

52 A Larger Example
Tree by level: [180] / [120] [260] / [empty] [150] [200] [440, 500]
This is a combine case, so bring 120 down and combine it with 150.

53 A Larger Example
Tree by level: [180] / [empty] [260] / [120, 150] [200] [440, 500]
The parent now has a problem. This is a combine case:

54 A Larger Example
Tree by level: [empty] / [180, 260] / [120, 150] [200] [440, 500]
The old root is now empty; what to do with it?

55 A Larger Example
Tree by level: [180, 260] / [120, 150] [200] [440, 500]
Just dispose of it properly: its only child, [180, 260], becomes the new root.

