B-Trees CSE 373 Data Structures CSE 373 -- AU 2004 -- B-Trees.

B-Trees
Topics: considerations for disk-based storage systems; the Indexed Sequential Access Method (ISAM); m-way search trees; B-trees.

Data Layout on Disk
Track: one ring. Sector: one pie-shaped piece. Block: the intersection of a track and a sector.

Disk Block Access Time
Seek time = the maximum of (a) the time for the disk head to move to the correct track and (b) the time for the beginning of the correct sector to spin round to the head. (Some authors use “latency” as the term for the second component, or they use latency to refer to all of what we are calling seek time.)
Transfer time = the time to read or write the data (approximately the time for the sector to spin past the head).
For a 7200 RPM hard disk with an 8 ms seek time, the average access time for a block is about 12 ms (see Albert Drayes and John Treder: http://www.tanstaafl-software.com/seektime.html).
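The 12 ms figure can be reproduced, as a rough sketch, by adding the 8 ms seek to the average rotational latency, which is half a revolution (60,000 / 7200 ≈ 8.33 ms per revolution, so about 4.17 ms). The helper name `average_access_ms` and its default of zero transfer time are assumptions made for illustration.

```python
def average_access_ms(rpm: float, seek_ms: float, transfer_ms: float = 0.0) -> float:
    """Hypothetical helper: seek + average rotational latency + transfer, in ms."""
    ms_per_revolution = 60_000.0 / rpm           # 7200 RPM -> about 8.33 ms per revolution
    avg_rotational_latency = ms_per_revolution / 2  # on average, wait half a revolution
    return seek_ms + avg_rotational_latency + transfer_ms

print(round(average_access_ms(7200, 8), 1))  # 12.2
```

Transfer time for a single block is small next to the mechanical delays, which is why it is left at zero here.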

Considerations for Disk-Based Dictionary Structures
Use a disk-based method when the dictionary is too big to fit in RAM at once. Minimize the expected or worst-case number of disk accesses for the essential operations (put, get, remove). Keep space requirements reasonable: O(n). Methods based on binary trees, such as AVL search trees, are not optimal for disk-based representations: each node holds a single key, so a search passes through at least about log2 n nodes, each potentially a separate disk access. The number of disk accesses can be greatly reduced by using m-way search trees.
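A rough way to see the gap, assuming one node per disk block: a full m-way search tree of height h holds at most m^h - 1 keys (m - 1 keys in each node), so any m-way search tree over n keys is at least as tall as the smallest h with m^h - 1 ≥ n. The hypothetical `min_height` below computes that lower bound with integer arithmetic to avoid floating-point rounding in a logarithm.

```python
def min_height(n_keys: int, m: int) -> int:
    """Hypothetical helper: smallest h with m**h - 1 >= n_keys.
    A full m-way search tree of height h holds at most m**h - 1 keys,
    so no m-way search tree over n_keys keys can be shorter than this."""
    h = 0
    while m ** h - 1 < n_keys:
        h += 1
    return h

print(min_height(1_000_000, 2))    # binary tree: 20 levels
print(min_height(1_000_000, 128))  # 128-way tree: 3 levels
```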

Indexed Sequential Access Method (ISAM)
Store m records in each disk block. Use an index that consists of an array with one element for each disk block, holding a copy of the largest key that occurs in that block.

ISAM (Continued)
[Figure: index array with entries 1.7, 5.1, 21.2, 26.8, . . ., one per disk block.]

ISAM (Continued)
To perform a get(k) operation, search the index (with either a sequential search or a binary search) to determine which disk block should hold the desired record. Then perform one disk access to read that block, and extract the desired record if it exists.
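A minimal in-memory sketch of get(k), assuming each block is a sorted list of (key, value) records and `index[i]` holds the largest key in block i; `isam_get` and the sample records are hypothetical, and only the index keys come from the slide above. A binary search (`bisect_left`) over the index finds the first block whose largest key is at least k, and reading that one block stands in for the single disk access.

```python
from bisect import bisect_left

def isam_get(index, blocks, k):
    """Hypothetical helper. index[i] is the largest key in blocks[i];
    each block is a sorted list of (key, value) records.
    Returns the value stored under k, or None."""
    i = bisect_left(index, k)    # first block whose largest key is >= k
    if i == len(index):
        return None              # k exceeds every key in the file
    block = blocks[i]            # stands in for the single disk access
    for key, value in block:
        if key == k:
            return value
    return None

# Hypothetical file: 4 blocks of 2 records each.
blocks = [[(0.5, "a"), (1.7, "b")], [(3.0, "c"), (5.1, "d")],
          [(9.9, "e"), (21.2, "f")], [(25.0, "g"), (26.8, "h")]]
index = [blk[-1][0] for blk in blocks]   # [1.7, 5.1, 21.2, 26.8]
print(isam_get(index, blocks, 5.1))      # d
```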

ISAM Limitations
Problems with ISAM: What if the index itself is too large to fit entirely in RAM at the same time? Insertion and deletion could be very expensive if all records after the inserted or deleted one have to shift up or down, crossing block boundaries.
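To make the insertion cost concrete, here is a hypothetical helper that counts the blocks that must be written when one record is inserted into a packed sequential file, assuming m records per block, no free space, and that every later record shifts by one slot.

```python
def blocks_rewritten_on_insert(n_records: int, m: int, position: int) -> int:
    """Hypothetical helper. Records are packed m per block with no free space.
    Inserting at 0-based `position` shifts every later record one slot, so every
    block from the one holding `position` through the new last block is written."""
    first_block = position // m
    last_block = n_records // m   # index of the last block once there are n_records + 1 records
    return last_block - first_block + 1

print(blocks_rewritten_on_insert(1000, 10, 0))     # 101: every block, plus a new one
print(blocks_rewritten_on_insert(1000, 10, 1000))  # 1: appending touches one block
```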

A Solution: B-Trees
Idea 1: Use m-way search trees. (ISAM uses a root and one level under the root.) m-way search trees can be as high as we need.
Idea 2: Don’t require that each node always be full. Empty space permits insertion without rebalancing, and allowing empty space after a deletion can also avoid rebalancing.
Idea 3: Rebalancing will sometimes be necessary; figure out how to do it in time proportional to the height of the tree.

B-Tree Example with m = 5
[Figure: the root holds key 12; its two children are leaves holding 2 3 8 and 13 27.]
The root has between 2 and m children. Each non-root internal node has between ⌈m/2⌉ and m children. All external nodes are at the same level. (External nodes are actually represented by null pointers in implementations.)
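The structural rules above can be written down as a checker. This is a sketch under assumptions: a hypothetical `Node` type whose `children` list is empty for a leaf (standing in for the null external pointers), distinct keys, and a non-empty tree. Child counts include external nodes, so a node with k keys has k + 1 children; this matches how the slides count a minimum of 3 children for m = 5.

```python
from dataclasses import dataclass, field
from math import ceil

@dataclass
class Node:  # hypothetical node type
    keys: list
    children: list = field(default_factory=list)  # [] for a leaf (null external pointers)

def is_valid_btree(root: Node, m: int) -> bool:
    """Hypothetical checker for the order-m rules on a non-empty tree."""
    leaf_depths = set()

    def ok(node, depth, lo, hi, is_root):
        degree = len(node.keys) + 1  # children, counting external nodes
        min_degree = 2 if is_root else ceil(m / 2)
        if not (min_degree <= degree <= m):
            return False
        if node.keys != sorted(node.keys):
            return False
        # every key must lie strictly between the parent's separating keys
        if any((lo is not None and k <= lo) or (hi is not None and k >= hi)
               for k in node.keys):
            return False
        if not node.children:
            leaf_depths.add(depth)
            return True
        if len(node.children) != degree:
            return False
        bounds = [lo] + node.keys + [hi]
        return all(ok(c, depth + 1, bounds[i], bounds[i + 1], False)
                   for i, c in enumerate(node.children))

    # all external nodes on one level <=> all leaves at the same depth
    return ok(root, 0, None, None, True) and len(leaf_depths) == 1

example = Node([12], [Node([2, 3, 8]), Node([13, 27])])  # the m = 5 tree above
print(is_valid_btree(example, 5))                         # True
```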

Insert 10
[Figure: root 12; children 2 3 8 10 and 13 27.]
We find the location for 10 by following a path from the root, using the stored key values to guide the search. The search falls out of the tree at the 4th child of the 1st child of the root. The 1st child of the root has room for the new element, so we store it there.
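A sketch of that search, assuming a hypothetical `Node` type with an empty `children` list for a leaf. At each node, `bisect_left` picks the child to descend into; if the key is absent, the search falls out of a leaf at external child `i`. When nodes are stored one per block, each visited node corresponds to one disk access.

```python
from bisect import bisect_left
from dataclasses import dataclass, field

@dataclass
class Node:  # hypothetical node type
    keys: list
    children: list = field(default_factory=list)  # [] for a leaf

def search_path(root, k):
    """Hypothetical helper. Follows the path the stored keys select for k.
    Returns (found, path); path holds (node_keys, child_index) pairs, where
    child_index is where the search descends or falls out of the tree."""
    path = []
    node = root
    while True:
        i = bisect_left(node.keys, k)
        path.append((node.keys, i))
        if i < len(node.keys) and node.keys[i] == k:
            return True, path
        if not node.children:
            return False, path  # fell out at external child i
        node = node.children[i]

root = Node([12], [Node([2, 3, 8]), Node([13, 27])])
print(search_path(root, 10))  # (False, [([12], 0), ([2, 3, 8], 3)])
```

Index 3 is 0-based, so the search for 10 falls out at the 4th child of the root's 1st child, as the slide describes.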

Insert 11
[Figure: root 12; children 2 3 8 10 11 and 13 27.]
We fall out of the tree at the child to the right of key 10. But there is no more room in the left child of the root to hold 11. Therefore, we must split this node...

Insert 11 (Continued)
[Figure: root 8 12; children 2 3, 10 11, and 13 27.]
The m + 1 children are divided evenly between the old and new nodes, and the middle key (8) moves up into the parent. The parent gets one new child. (If the parent becomes overfull, then it, too, will have to be split.)
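The split can be sketched on plain key lists. The helper `split_overfull` is hypothetical; it assumes the node has just received an m-th key (one more than an order-m node can hold), keeps the lower half in the old node, moves the upper half to a new node, and returns the middle key for the parent. Any children are divided at the same point, so the m + 1 children split evenly.

```python
from bisect import insort

def split_overfull(keys, children=None):
    """Hypothetical helper. Splits a node holding m keys.
    Returns (left_keys, middle_key, right_keys, left_children, right_children)."""
    mid = len(keys) // 2
    children = children or []  # a leaf has no stored children
    left_c, right_c = children[:mid + 1], children[mid + 1:]
    return keys[:mid], keys[mid], keys[mid + 1:], left_c, right_c

# The slide's case (m = 5): the leaf 2 3 8 10 11 is overfull.
left, up, right, _, _ = split_overfull([2, 3, 8, 10, 11])
parent_keys = [12]
insort(parent_keys, up)               # the parent gains one key (and one child)
print(left, up, right, parent_keys)   # [2, 3] 8 [10, 11] [8, 12]
```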

Remove 8
[Figure: root 8 12; children 2 3, 10 11, and 13 27.]
Removing 8 might force us to move another key up from one of the children: either the 3 from the 1st child or the 10 from the 2nd child. However, neither child has more than the minimum number of children (3), so the two nodes will have to be merged. Nothing moves up.

Remove 8 (Continued)
[Figure: root 12; children 2 3 10 11 and 13 27.]
The root contains one fewer key and has one fewer child.

Remove 13
[Figure: root 12; children 2 3 10 11 and 13 27.]
Removing 13 would cause the node containing it to become underfull. To fix this, we try to reassign one key from a sibling that has spares.

Remove 13 (Continued)
[Figure: root 11; children 2 3 10 and 12 27.]
The 13 is replaced by the parent’s key 12. The parent’s key 12 is replaced by the spare key 11 from the left sibling. The sibling has one fewer element.
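A sketch of this redistribution for leaves, assuming nodes are plain sorted key lists and `parent_keys[sep_index]` separates the two siblings; `borrow_from_left` is a hypothetical name. For internal nodes, the left sibling's last child would also have to move across, which this sketch omits.

```python
def borrow_from_left(parent_keys, sep_index, left, right):
    """Hypothetical helper (leaf case only). The separating key moves down
    into the underfull `right` node, and left's largest key replaces it."""
    right.insert(0, parent_keys[sep_index])
    parent_keys[sep_index] = left.pop()

parent, left, right = [12], [2, 3, 10, 11], [13, 27]
right.remove(13)                          # right is now [27]: below the minimum for m = 5
borrow_from_left(parent, 0, left, right)
print(parent, left, right)                # [11] [2, 3, 10] [12, 27]
```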

Remove 11
[Figure: root 11; children 2 3 10 and 12 27.]
11 is in a non-leaf, so replace it by the value immediately preceding it: 10. Since 10 is in a leaf and that node has spares, simply delete it there.
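A sketch of the predecessor replacement, assuming a hypothetical `Node` type (empty `children` for a leaf) and that the leaf holding the predecessor has a spare key; otherwise a borrow or merge would have to follow. The predecessor is the rightmost key in the subtree immediately to the left of the deleted key.

```python
from dataclasses import dataclass, field

@dataclass
class Node:  # hypothetical node type
    keys: list
    children: list = field(default_factory=list)  # [] for a leaf

def replace_with_predecessor(node, i):
    """Hypothetical helper. Deletes node.keys[i] from an internal node by
    overwriting it with its in-order predecessor and removing that key from
    its leaf. Assumes the leaf keeps enough keys afterwards."""
    leaf = node.children[i]          # subtree just left of keys[i]
    while leaf.children:
        leaf = leaf.children[-1]     # keep going right to the largest key
    node.keys[i] = leaf.keys.pop()

root = Node([11], [Node([2, 3, 10]), Node([12, 27])])
replace_with_predecessor(root, 0)
print(root.keys, [c.keys for c in root.children])  # [10] [[2, 3], [12, 27]]
```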

Remove 11 (Continued)
[Figure: root 10; children 2 3 and 12 27.]

Remove 2
[Figure: root 10; children 2 3 and 12 27.]
Although 2 is at leaf level, removing it leads to an underfull node. The node has no left sibling. It does have a right sibling, but that node is already at its minimum occupancy. Therefore, the node must be merged with its right sibling.

Remove 2 (Continued)
[Figure: a root with no keys whose single child holds 3 10 12 27.]
The result is illegal, because the root does not have at least 2 children. Therefore, we must remove the root, making its child the new root.

Remove 2 (Continued)
[Figure: a single node holding 3 10 12 27.]
The new B-tree has only one node, the root.
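The merge-and-collapse steps of Remove 2 can be sketched as follows, assuming a hypothetical `Node` type with `keys` and `children` lists. `merge_children` pulls the separating key down from the parent and concatenates the two siblings; `collapse_root` removes a root that has been left with no keys and a single child.

```python
from dataclasses import dataclass, field

@dataclass
class Node:  # hypothetical node type
    keys: list
    children: list = field(default_factory=list)  # [] for a leaf

def merge_children(parent, i):
    """Hypothetical helper. Merges parent.children[i] and parent.children[i + 1],
    pulling the separating key parent.keys[i] down between them."""
    left, right = parent.children[i], parent.children[i + 1]
    left.keys += [parent.keys.pop(i)] + right.keys
    left.children += right.children
    del parent.children[i + 1]

def collapse_root(root):
    """Hypothetical helper. A key-less root with one child is dropped."""
    if not root.keys and root.children:
        return root.children[0]
    return root

root = Node([10], [Node([3]), Node([12, 27])])  # after deleting 2 from the leaf 2 3
merge_children(root, 0)
root = collapse_root(root)
print(root.keys, root.children)  # [3, 10, 12, 27] []
```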

Insert 49
[Figure: a single node holding 3 10 12 27.]
Let’s put an element into this B-tree.

Insert 49 (Continued)
[Figure: a single node holding 3 10 12 27 49.]
Adding this key makes the node overfull, so it must be split into two. But this node was the root, so we must construct a new root and make the two halves its children.

Insert 49 (Continued)
[Figure: root 12; children 3 10 and 27 49.]
The middle key (12) is moved up into the root. The result is a B-tree with one more level.
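The insertion steps from these slides can be combined into a compact sketch of bottom-up insertion for a B-tree of order m, assuming distinct keys and a hypothetical `Node` type. Keys go into a leaf; any node that reaches m keys is split and its middle key is passed up; if the root itself splits, a new root is created, which is the only way the tree grows taller.

```python
from bisect import bisect_left, insort
from dataclasses import dataclass, field

@dataclass
class Node:  # hypothetical node type
    keys: list
    children: list = field(default_factory=list)  # [] for a leaf

def _insert(node, k, m):
    """Insert k below node. Returns None, or (middle_key, new_right_node)
    if node reached m keys and had to be split."""
    if not node.children:
        insort(node.keys, k)
    else:
        i = bisect_left(node.keys, k)
        split = _insert(node.children[i], k, m)
        if split is not None:                 # child split: adopt its middle key
            up, right = split
            node.keys.insert(i, up)
            node.children.insert(i + 1, right)
    if len(node.keys) < m:                    # an order-m node holds at most m - 1 keys
        return None
    mid = len(node.keys) // 2
    right = Node(node.keys[mid + 1:], node.children[mid + 1:])
    up = node.keys[mid]
    node.keys, node.children = node.keys[:mid], node.children[:mid + 1]
    return up, right

def insert(root, k, m):
    """Hypothetical entry point: returns the (possibly new) root."""
    split = _insert(root, k, m)
    if split is None:
        return root
    up, right = split
    return Node([up], [root, right])          # root split: height grows by one

root = insert(Node([3, 10, 12, 27]), 49, 5)
print(root.keys, [c.keys for c in root.children])  # [12] [[3, 10], [27, 49]]
```

Replaying the earlier slides with m = 5 gives the same shapes: inserting 10 and then 11 into the tree with root 12 and leaves 2 3 8 and 13 27 produces root 8 12 with leaves 2 3, 10 11, and 13 27.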

B-Tree Performance
Let h = the height of the B-tree.
get(k): at most h disk accesses, O(h).
put(k): at most 3h + 1 disk accesses, O(h).
remove(k): at most 3h disk accesses, O(h).
$h \le \log_d \frac{n+1}{2} + 1$, where $d = \lceil m/2 \rceil$ (Sahni, p. 641).
An important point is that the constant factors are relatively low. m should be chosen so that the maximum node size matches the block size on the disk. Example: m = 128, d = 64, and $n \le 64^3 = 262144$ give $h \le 4$ (the bound evaluates to about 3.83).
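The height bound can be evaluated with integer arithmetic. The hypothetical `max_height` below uses the counting argument behind the bound: with d = ⌈m/2⌉, a B-tree of height h has at least 2d^(h-1) - 1 keys (one key in the root, two nodes on level 2, and every non-root node at its minimum of d - 1 keys), so it returns the largest h with 2d^(h-1) - 1 ≤ n.

```python
from math import ceil

def max_height(n: int, m: int) -> int:
    """Hypothetical helper: largest possible height of an order-m B-tree
    holding n keys, i.e. the integer form of h <= log_d((n + 1) / 2) + 1."""
    if n == 0:
        return 0
    d = ceil(m / 2)
    h = 1
    while 2 * d ** h - 1 <= n:  # enough keys for a tree of height h + 1
        h += 1
    return h

h = max_height(262144, 128)
print(h, 3 * h + 1, 3 * h)  # 3 10 9: worst-case accesses for get, put, remove
```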

2-3 Trees
A B-tree of order m is a kind of m-way search tree. A B-tree of order 3 is called a 2-3 tree: each internal node has either 2 or 3 children. In practical applications, however, B-trees of large order (e.g., m = 128) are more common than low-order B-trees such as 2-3 trees.
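A small sketch of the occupancy limits for a non-root internal node of order m, using the ⌈m/2⌉ to m child range stated earlier; `occupancy` is a hypothetical helper. For m = 3 it gives the 2 or 3 children of a 2-3 tree.

```python
from math import ceil

def occupancy(m: int) -> dict:
    """Hypothetical helper: child and key limits for a non-root internal node."""
    return {"children": (ceil(m / 2), m), "keys": (ceil(m / 2) - 1, m - 1)}

print(occupancy(3))    # {'children': (2, 3), 'keys': (1, 2)}
print(occupancy(128))  # {'children': (64, 128), 'keys': (63, 127)}
```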