External Sorting and Searching

External Sorting and Searching B-Trees, etc.

m-Way Search Trees In a binary search tree, there is one key value per node and two children. There is no reason why I couldn’t have (at most) m-1 key values per node and m children. Such trees are called m-way search trees.

m-Way Search Tree Example Tree: root [120, 240]; children [97], [200], [360, 440]. Here is a 3-way search tree; each node has a maximum of 3 children.

m-Way Search Tree Example II 97 120, 240 360, 440 500 Here is another one.

m-Way Time Complexity Clearly, the worst-case search and insert time for an m-way search tree is still O(n), because nothing keeps the tree balanced. The number of nodes visited is O(n/m). For each, we must look at up to m-1 key values. With a binary search inside the node this takes O(log2(m)) time, yielding at best O(n/m * log2(m)). Of course, as n gets much larger than m, this is still O(n).

B-Trees What I want is a height-balanced m-way search tree to achieve the best search time. These are called B-Trees. As with height-balanced BSTs, we will have a re-balancing algorithm to run after every insert and delete.

B-Tree Properties The root may have between 2 and m children. All other nodes must have between $\lceil m/2 \rceil$ and m children. A node that has k children will have k-1 key values. Thus, the root may have only 2 children; all other nodes must be at least half full.

B-Tree Properties II If a B-Tree node has k children ($T_0, T_1, \ldots, T_{k-1}$) and k-1 ordered key values ($D_1, D_2, \ldots, D_{k-1}$), then all the key values in $T_i$ are greater than $D_i$ but less than $D_{i+1}$ for $i = 1, \ldots, k-2$. All the key values in $T_0$ are less than $D_1$. All the key values in $T_{k-1}$ are greater than $D_{k-1}$. This simply means it is a search tree.
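
The ordering property above is what makes a search possible. As a minimal sketch (the `Node` class and `search` function are illustrative names, not from the slides, and keys are assumed to be distinct), a search picks the child $T_i$ by finding where the key falls among the node's key values:

```python
from bisect import bisect_left

class Node:
    """One node of an m-way search tree: sorted keys plus len(keys)+1 children."""
    def __init__(self, keys, children=None):
        self.keys = keys
        self.children = children or []   # an empty list marks a terminal node

def search(node, key):
    """Return True if key is stored in the subtree rooted at node."""
    while True:
        i = bisect_left(node.keys, key)  # binary search inside the node
        if i < len(node.keys) and node.keys[i] == key:
            return True
        if not node.children:            # terminal node reached: the search fails
            return False
        node = node.children[i]          # child i holds the keys between D_i and D_(i+1)

# The 3-way search tree from the first example slide.
root = Node([120, 240], [Node([97]), Node([200]), Node([360, 440])])
print(search(root, 200), search(root, 300))  # True False
```

Each loop iteration corresponds to reading one node, which matters later when every node is a disk block.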

B-Tree Insertion All insertions are done at the terminal level. First search for the terminal-level node to insert the new key value into, and add the key there. If the node now holds at most m-1 key values (that is, its number of children does not exceed m), stop. If it now holds m key values, it has too many...

B-Tree Node Splitting Split this node into two nodes: Take the middle value out. Create one node with the lower half of the key values and one with the upper half. Insert middle value into the parent node. Continue recursively until either the node can hold the new key value, or you split the root.

B-Tree Insert Example A B-Tree of order 3 (i.e. m=3) is the smallest possible. It is also the easiest to draw, so we’ll use this order for our example. This is also called a “2-3 Tree” because each node may have a maximum of 2 key values and 3 children.

B-Tree Example Key values left to insert: 360, 240, 200, 97, 440, 280. Tree: root [120]. Insert 120. A new root node is created and this value is placed into it.

B-Tree Example Key values left to insert: 240, 200, 97, 440, 280. Tree: root [120, 360]. Insert 360. It goes into the root. No further action is required.

B-Tree Example Key values left to insert: 200, 97, 440, 280. Tree: root [120, 240, 360]. Insert 240. It goes into the root. Since this node has 3 values, it must be split.

B-Tree Example Key values left to insert: 200, 97, 440, 280. Tree: root [240]; children [120], [360]. This shows the result of the split. 120 and 360 go into nodes by themselves, and 240 is placed into a new root node.

B-Tree Example Key values left to insert: 97, 440, 280. Tree: root [240]; children [120, 200], [360]. Insert value 200. It goes into the node with 120. No further action is required.

B-Tree Example Key values left to insert: 440, 280. Tree: root [240]; children [97, 120, 200], [360]. Insert value 97. It goes into the node with 120 and 200. Since this node contains too many values, it must be split.

B-Tree Example Key values left to insert: 440, 280. Tree: root [120, 240]; children [97], [200], [360]. This shows the result of the split. 97 and 200 are placed into their own nodes, and 120 is moved up to the parent. The parent node is OK.

B-Tree Example Key values left to insert: 280. Tree: root [120, 240]; children [97], [200], [360, 440]. Insert 440. It goes into the node with 360. No further action is required.

B-Tree Example Key values left to insert: none (done). Tree: root [120, 240]; children [97], [200], [280, 360, 440]. Insert the value 280. It goes into the node with 360 and 440. Since this node has 3 values, it must be split.

B-Tree Example Tree: root [120, 240, 360]; children [97], [200], [280], [440]. This shows the result of the split. 280 and 440 go into nodes by themselves, and 360 is moved up to the parent node.

B-Tree Example Tree: root [240]; children [120], [360]; leaves [97], [200] under [120] and [280], [440] under [360]. The parent node must be split as well. Because it is the root, we must create a new root node.
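
The insertion walk-through above can be replayed in code. What follows is only a sketch under stated assumptions: the names `Node`, `insert`, `_insert` and `dump` are made up for illustration, keys are assumed to be distinct, and everything lives in memory rather than in disk blocks. A node that ends up with m key values is split around its middle value, and that value is handed to the parent.

```python
from bisect import bisect_left, insort

class Node:
    def __init__(self, keys=None, children=None):
        self.keys = keys or []
        self.children = children or []       # empty for terminal nodes

def _insert(node, key, m):
    """Insert key below node; return (middle, right) if node had to split."""
    if node.children:
        i = bisect_left(node.keys, key)      # choose the child to descend into
        promoted = _insert(node.children[i], key, m)
        if promoted is None:
            return None
        middle, right = promoted             # the child split: absorb its middle key
        node.keys.insert(i, middle)
        node.children.insert(i + 1, right)
    else:
        insort(node.keys, key)               # terminal level: just add the key
    if len(node.keys) <= m - 1:
        return None                          # node is within capacity
    mid = len(node.keys) // 2                # overfull: split around the middle key
    middle = node.keys[mid]
    right = Node(node.keys[mid + 1:], node.children[mid + 1:])
    node.keys = node.keys[:mid]
    node.children = node.children[:mid + 1]
    return middle, right

def insert(root, key, m=3):
    """Insert key and return the (possibly new) root."""
    if root is None:
        return Node([key])
    promoted = _insert(root, key, m)
    if promoted is None:
        return root
    middle, right = promoted                 # the root split: grow a new root
    return Node([middle], [root, right])

def dump(node):
    """Nested-list view of a tree: [keys, child, child, ...]."""
    return [node.keys] + [dump(c) for c in node.children]

root = None
for k in [120, 360, 240, 200, 97, 440, 280]:
    root = insert(root, k, m=3)
print(dump(root))
# [[240], [[120], [[97]], [[200]]], [[360], [[280]], [[440]]]]
```

The printed structure is the final tree of the example: root [240], children [120] and [360], and four single-key leaves.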

Time Complexity What is the order of a B-tree search? To answer this, we need to determine the worst-case number of levels in a B-Tree of order m that has N key values. Let’s look at the minimum number of nodes per level: Level 1 (the root) has 1 node; Level 2 has at least 2 nodes; Level 3 has at least $2 \lceil m/2 \rceil$ nodes; Level 4 has at least $2 \lceil m/2 \rceil^{2}$ nodes; Level L has at least $2 \lceil m/2 \rceil^{L-2}$ nodes.

Time Complexity II Observation: in any list of n elements, there are n+1 ways for the search to fail. In a B-tree with L levels, all the ways to fail are at level L+1 (these are sometimes called Failure Nodes). Thus, there is a relationship between the number of key values and the height of the tree:

Time Complexity III Because the previous analysis is a worst case, the number of nodes at level L+1 must be less than or equal to N+1:
$2 \lceil m/2 \rceil^{L-1} \le N + 1$
$\lceil m/2 \rceil^{L-1} \le (N + 1)/2$
$L - 1 \le \log_{\lceil m/2 \rceil} [(N + 1)/2]$
$L \le \log_{\lceil m/2 \rceil} [(N + 1)/2] + 1$

Time Complexity IV One node at each level must be accessed, so L gives the number of nodes to access. In this worst case each node contains $\lceil m/2 \rceil - 1$ key values, which can be binary searched, so the total number of comparisons is $\{\log_{\lceil m/2 \rceil} [(N + 1)/2] + 1\} \cdot \log_2(\lceil m/2 \rceil - 1)$.

Fun With Math Removing the constants, we may say this search is
$O(\log_{\lceil m/2 \rceil}(N) \cdot \log_2 \lceil m/2 \rceil)$
$= O(\log_2(N) / \log_2 \lceil m/2 \rceil \cdot \log_2 \lceil m/2 \rceil)$
$= O(\log_2(N))$
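
To get a feel for the bound on L, it can be evaluated for concrete values. This is a small sketch; `max_levels` is a made-up helper name, and it finds the largest L satisfying $2 \lceil m/2 \rceil^{L-1} \le N + 1$ with integer arithmetic instead of a floating-point logarithm. It assumes m >= 3 and N >= 1.

```python
def max_levels(n_keys, m):
    """Largest L with 2 * ceil(m/2)**(L-1) <= n_keys + 1 (worst-case tree height)."""
    half = -(-m // 2)                 # ceil(m/2) using integer arithmetic
    levels = 1                        # 2 * half**0 = 2 <= N + 1 whenever N >= 1
    while 2 * half ** levels <= n_keys + 1:
        levels += 1                   # one more level still fits under the bound
    return levels

print(max_levels(7, 3))               # 3
print(max_levels(1_000_000, 200))     # 3
```

The first call matches the 7-key example tree of order 3, which has 3 levels. The second shows that with a large order the number of levels stays tiny even though the big-O is the same as for a binary tree; a million keys in a balanced binary tree would need about 20 levels.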

Summing it up: WHAT??? ALL THIS WORK FOR THE SAME ORDER AS AN AVL-TREE!!! What’s going on here???

What Really Happens Remember this is external sorting and searching, so accessing the information and doing comparisons have very different costs: a disk access is far slower than a comparison in memory. Each node in the B-tree is stored in a “block” on the disk; a “block” is the minimum amount of information which can be retrieved with one disk access.

What Really Happens II Thus, the number of disk accesses is the bottleneck; this is given by L, the number of levels. A B-tree is built on a field of a data file to speed access to that field. A “Clustered” or “Primary” B-tree stores the entire record of the file in the B-Tree. An “Unclustered” or “Secondary” B-tree stores the field’s value and the record number in the node.

What Really Happens III It is the secondary B-trees that one usually means when one says “B-tree”. Thus, to do a search for a record on a field which has a B-tree: Search the B-tree for the key value. When found, retrieve its associated record number. Retrieve that record from the data file.
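
The three lookup steps above can be sketched as follows. Everything concrete here is an assumption made for illustration: the fixed-length record layout (`RECORD`), the sample rows, and the names `Node`, `find_recno` and `fetch`. An in-memory `io.BytesIO` stands in for the data file on disk.

```python
import io
import struct
from bisect import bisect_left

RECORD = struct.Struct("<i20s")       # hypothetical layout: int key + 20-byte name

class Node:
    """Secondary B-tree node: each key carries the record number of its row."""
    def __init__(self, keys, recnos, children=None):
        self.keys, self.recnos = keys, recnos
        self.children = children or []

def find_recno(node, key):
    """Steps 1 and 2: search the B-tree and return the record number, or None."""
    while True:
        i = bisect_left(node.keys, key)
        if i < len(node.keys) and node.keys[i] == key:
            return node.recnos[i]
        if not node.children:
            return None
        node = node.children[i]

def fetch(data_file, recno):
    """Step 3: seek straight to the record in the data file and decode it."""
    data_file.seek(recno * RECORD.size)
    key, name = RECORD.unpack(data_file.read(RECORD.size))
    return key, name.rstrip(b"\0").decode()

# Data file rows are stored in arrival order, not in key order.
rows = [(240, "gamma"), (97, "alpha"), (120, "beta")]
data = io.BytesIO(b"".join(RECORD.pack(k, n.encode()) for k, n in rows))
index = Node([120], [2], [Node([97], [1]), Node([240], [0])])

recno = find_recno(index, 240)
print(recno, fetch(data, recno))      # 0 (240, 'gamma')
```

Because the index stores only key values and record numbers, the data file itself never has to be sorted or moved.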

A Real Example. What follows is a real example of how a B-tree is used.

Sample Data File

B-Tree on Schedule# This is the way we would normally view it: root [100]; children [45], [120]; leaves [23], [46] under [45] and [110], [140, 210] under [120].

B-Tree on Schedule# This is how it really looks in a file :

Deleting in a B-tree To delete from a B-Tree, first locate the key value with the normal search routine. If the key value is not located in a terminal node, replace it with its in-order successor (which always lies in a terminal node) and delete the in-order successor. Thus, all deletes which reduce the number of key values occur at the terminal level.

Deleting From the Terminal Level Good news: because there are no children to worry about, we can just remove it from the list. Bad news: what if this removal reduces the number of children below $\lceil m/2 \rceil$? Reality: at some point we will need to reduce the number of nodes...

The “Borrow” Algorithm When a node is reduced below $\lceil m/2 \rceil$ children, first try to borrow a key value from one of its neighbors. If a neighbor has more than the minimum, then rotate the appropriate key from the neighbor up to the parent and the appropriate key from the parent down to the reduced child.

Borrow Example Tree: root [120, 240]; children [97], [200], [360, 440]. Suppose I want to delete 200 from this B-tree of order 3. To do so, rotate 240 into the middle child, and 360 up to the root:

Borrow Example Tree: root [120, 360]; children [97], [240], [440]. This shows the result. Problem: what if I now want to delete 240? Borrowing won’t work...

Combining Nodes When borrowing won’t work, combine the node with the key value from the parent AND the neighbor node with minimum children. Repeat the deletion algorithm from the parent, looking first to borrow if possible. Now, let’s delete 240...

Combining Example Tree: root [120, 360]; children [97], [240], [440]. First, remove 240.

Combining Example Tree: root [120, 360]; children [97], <empty>, [440]. Next, attempt to borrow. Borrowing fails. Combine empty node with 360 and 440.

Combining Example Tree: root [120]; children [97], [360, 440]. This shows the result. The parent is OK, so we are done...
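
The borrow and combine steps at the terminal level can be sketched as below. This is a deliberately limited illustration, not a full deletion routine: `delete_from_leaf` is a hypothetical helper that handles one parent and its terminal children only, and it does not go on to check whether the parent itself has dropped below the minimum (the recursive step shown in the larger example).

```python
class Node:
    def __init__(self, keys, children=None):
        self.keys = keys
        self.children = children or []

def delete_from_leaf(parent, i, key, m=3):
    """Delete key from terminal child i of parent, then borrow or combine."""
    min_keys = -(-m // 2) - 1                  # ceil(m/2) - 1 keys per node
    kids = parent.children
    leaf = kids[i]
    leaf.keys.remove(key)
    if len(leaf.keys) >= min_keys:
        return "ok"
    # Borrow: rotate a key through the parent from a neighbor with a spare key.
    if i > 0 and len(kids[i - 1].keys) > min_keys:
        leaf.keys.insert(0, parent.keys[i - 1])
        parent.keys[i - 1] = kids[i - 1].keys.pop()
        return "borrowed"
    if i + 1 < len(kids) and len(kids[i + 1].keys) > min_keys:
        leaf.keys.append(parent.keys[i])
        parent.keys[i] = kids[i + 1].keys.pop(0)
        return "borrowed"
    # Combine: merge with a neighbor plus the separating key from the parent.
    j = i if i + 1 < len(kids) else i - 1      # merge kids[j] with kids[j + 1]
    kids[j].keys += [parent.keys.pop(j)] + kids[j + 1].keys
    del kids[j + 1]
    return "combined"

def view(parent):
    return parent.keys, [c.keys for c in parent.children]

root = Node([120, 240], [Node([97]), Node([200]), Node([360, 440])])
print(delete_from_leaf(root, 1, 200), view(root))
# borrowed ([120, 360], [[97], [240], [440]])
print(delete_from_leaf(root, 1, 240), view(root))
# combined ([120], [[97], [360, 440]])
```

The two calls reproduce the borrow example and the combining example from the slides. After a "combined" result a complete implementation would repeat the same check on the parent.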

A Larger Example Tree: root [260]; children [120, 180], [360]; leaves [97], [150], [200] under [120, 180] and [280], [440, 500] under [360]. Delete 280. This is a “borrow” case:

A Larger Example Tree: root [260]; children [120, 180], [440]; leaves [97], [150], [200] under [120, 180] and [360], [500] under [440]. Delete 360. This is a “combine” case:

A Larger Example Tree: root [260]; children [120, 180], [440]; leaves [97], [150], [200] under [120, 180] and <empty>, [500] under [440]. First, remove 360...

A Larger Example Tree: root [260]; children [120, 180], [440]; leaves [97], [150], [200] under [120, 180] and <empty>, [500] under [440]. Next combine node with its neighbor (500) and 440 from the parent...

A Larger Example Tree: root [260]; children [120, 180], <empty>; leaves [97], [150], [200] under [120, 180] and [440, 500] under <empty>. Parent now has a problem... This is a borrow case:

A Larger Example Tree: root [180]; children [120], [260]; leaves [97], [150] under [120]; [200] not yet attached; [440, 500] under [260]. Children must now be considered. What do I do with the node with 200?

A Larger Example Tree: root [180]; children [120], [260]; leaves [97], [150] under [120] and [200], [440, 500] under [260]. Link it under 260. Now, delete 97...

A Larger Example Tree: root [180]; children [120], [260]; leaves <empty>, [150] under [120] and [200], [440, 500] under [260]. This is a combine case, so bring 120 down and combine with 150...

A Larger Example Tree: root [180]; children <empty>, [260]; leaves [120, 150] under <empty> and [200], [440, 500] under [260]. The parent now has a problem. This is a combine case:

A Larger Example Tree: root <empty>; child [180, 260]; leaves [120, 150], [200], [440, 500]. The old root is now empty; what to do with it?

A Larger Example Tree: root [180, 260]; children [120, 150], [200], [440, 500]. Just dispose of it properly.