CSE 373 -- AU 2004 -- B-Trees1 B-Trees CSE 373 Data Structures.

Slides:



Advertisements
Similar presentations
Chapter 4: Trees Part II - AVL Tree
Advertisements

Advanced Database Discussion B Trees. Motivation for B-Trees So far we have assumed that we can store an entire data structure in main memory What if.
CSE332: Data Abstractions Lecture 9: B Trees Dan Grossman Spring 2010.
B-Trees. Motivation for B-Trees Index structures for large datasets cannot be stored in main memory Storing it on disk requires different approach to.
Other time considerations Source: Simon Garrett Modifications by Evan Korth.
FALL 2006CENG 351 Data Management and File Structures1 External Sorting.
B-Trees Disk Storage What is a multiway tree? What is a B-tree?
B-Trees Disk Storage What is a multiway tree? What is a B-tree?
©Silberschatz, Korth and Sudarshan12.1Database System Concepts Chapter 12: Part B Part A:  Index Definition in SQL  Ordered Indices  Index Sequential.
CS 206 Introduction to Computer Science II 12 / 01 / 2008 Instructor: Michael Eckmann.
B + -Trees (Part 1) Lecture 20 COMP171 Fall 2006.
1 B-Trees Disk Storage What is a multiway tree? What is a B-tree? Why B-trees? Comparing B-trees and AVL-trees Searching a B-tree Insertion in a B-tree.
1 Database indices Database Systems manage very large amounts of data. –Examples: student database for NWU Social Security database To facilitate queries,
B + -Trees (Part 1). Motivation AVL tree with N nodes is an excellent data structure for searching, indexing, etc. –The Big-Oh analysis shows most operations.
B + -Trees (Part 1) COMP171. Slide 2 Main and secondary memories  Secondary storage device is much, much slower than the main RAM  Pages and blocks.
CSE 326: Data Structures B-Trees Ben Lerner Summer 2007.
B-Trees. CSM B-Trees 2 Motivation for B-Trees So far we have assumed that we can store an entire data structure in main memory What if we have so.
B-Trees and B+-Trees Disk Storage What is a multiway tree?
Balanced Trees. Binary Search tree with a balance condition Why? For every node in the tree, the height of its left and right subtrees must differ by.
B + -Trees COMP171 Fall AVL Trees / Slide 2 Dictionary for Secondary storage * The AVL tree is an excellent dictionary structure when the entire.
B-Trees Large degree B-trees used to represent very large dictionaries that reside on disk. Smaller degree B-trees used for internal-memory dictionaries.
CS4432: Database Systems II
Database Management Systems, R. Ramakrishnan and J. Gehrke1 Tree-Structured Indexes Chapter 9.
Tree-Structured Indexes. Range Searches ``Find all students with gpa > 3.0’’ –If data is in sorted file, do binary search to find first such student,
1 Multiway trees & B trees & 2_4 trees Go&Ta Chap 10.
IntroductionIntroduction  Definition of B-trees  Properties  Specialization  Examples  2-3 trees  Insertion of B-tree  Remove items from B-tree.
B-Tree. B-Trees a specialized multi-way tree designed especially for use on disk In a B-tree each node may contain a large number of keys. The number.
ICS 220 – Data Structures and Algorithms Week 7 Dr. Ken Cosh.
Spring 2006 Copyright (c) All rights reserved Leonard Wesley0 B-Trees CMPE126 Data Structures.
Database Management 8. course. Query types Equality query – Each field has to be equal to a constant Range query – Not all the fields have to be equal.
ALGORITHMS FOR ISNE DR. KENNETH COSH WEEK 6.
Multi-way Trees. M-way trees So far we have discussed binary trees only. In this lecture, we go over another type of tree called m- way trees or trees.
B-Trees. CSM B-Trees 2 Motivation for B-Trees So far we have assumed that we can store an entire data structure in main memory What if we have so.
1 B-Trees & (a,b)-Trees CS 6310: Advanced Data Structures Western Michigan University Presented by: Lawrence Kalisz.
B-Trees And B+-Trees Jay Yim CS 157B Dr. Lee.
COSC 2007 Data Structures II Chapter 15 External Methods.
12.1 Chapter 12: Indexing and Hashing Spring 2009 Sections , , Problems , 12.7, 12.8, 12.13, 12.15,
B + -Trees. Motivation An AVL tree with N nodes is an excellent data structure for searching, indexing, etc. The Big-Oh analysis shows that most operations.
B-Trees. Motivation for B-Trees So far we have assumed that we can store an entire data structure in main memory What if we have so much data that it.
CMSC 341 B- Trees D. Frey with apologies to Tom Anastasio.
B-Trees. CSM B-Trees 2 Motivation for B-Trees So far we have assumed that we can store an entire data structure in main memory What if we have so.
Starting at Binary Trees
© 2010 Pearson Addison-Wesley. All rights reserved. Addison Wesley is an imprint of CHAPTER 12: Multi-way Search Trees Java Software Structures: Designing.
IKI 10100: Data Structures & Algorithms Ruli Manurung (acknowledgments to Denny & Ade Azurat) 1 Fasilkom UI Ruli Manurung (Fasilkom UI)IKI10100: Lecture17.
Lecture 11COMPSCI.220.FS.T Balancing an AVLTree Two mirror-symmetric pairs of cases to rebalance the tree if after the insertion of a new key to.
Marwan Al-Namari Hassan Al-Mathami. Indexing What is Indexing? Indexing is a mechanisms. Why we need to use Indexing? We used indexing to speed up access.
 B-tree is a specialized multiway tree designed especially for use on disk  B-Tree consists of a root node, branch nodes and leaf nodes containing the.
B+ tree & B tree Extracted from Garcia Molina
B-TREE. Motivation for B-Trees So far we have assumed that we can store an entire data structure in main memory What if we have so much data that it won’t.
Internal and External Sorting External Searching
Indexing Database Management Systems. Chapter 12: Indexing and Hashing Basic Concepts Ordered Indices B + -Tree Index Files File Organization 2.
1 Tree-Structured Indexes Chapter Introduction  As for any index, 3 alternatives for data entries k* :  Data record with key value k   Choice.
8/3/2007CMSC 341 BTrees1 CMSC 341 B- Trees D. Frey with apologies to Tom Anastasio.
More Trees. Outline Tree B-Tree 2-3 Tree Tree Red-Black Tree.
COMP261 Lecture 23 B Trees.
B-Trees B-Trees.
Multiway Search Trees Data may not fit into main memory
B-Trees B-Trees.
B+-Trees.
B+-Trees.
B-Trees CSE 373 Data Structures CSE AU B-Trees.
Other time considerations
B-Trees CSE 373 Data Structures CSE AU B-Trees.
CSIT 402 Data Structures II With thanks to TK Prasad
CSE 373, Copyright S. Tanimoto, 2002 B-Trees -
Credit for some of the slides in this lecture goes to
B-Trees CSE 373 Data Structures CSE AU B-Trees.
B-Trees.
Credit for some of the slides in this lecture goes to
B-Trees.
Presentation transcript:

CSE AU B-Trees1 B-Trees CSE 373 Data Structures

CSE AU B-Trees2 Readings Reading –Goodrich and Tamassia, Chapter 9, pp in the 3 rd edition.

CSE AU B-Trees3 B-Trees Considerations for disk-based storage systems. Indexed Sequential Access Method (ISAM) m-way search trees B-trees

CSE AU B-Trees4 Data Layout on Disk Track: one ring Sector: one pie-shaped piece. Block: intersection of a track and a sector.

CSE AU B-Trees5 Disk Block Access Time Seek time = maximum of Time for the disk head to move to the correct track. Time for the beginning of the correct sector to spin round to the head. (Some authors use “latency” as the term for this component, or they use latency to refer to all of what we are calling seek time.) Transfer time = Time to read or write the data. (Approximately the time for the sector to spin by the head). For a 7200 RPM hard disk with 8 millisec seek time, average access time for a block is about 12 millisec. (see Albert Drayes and John Treder:

CSE AU B-Trees6 Considerations for Disk Based Dictionary Structures Use a disk-based method when the dictionary is too big to fit in RAM at once. Minimize the expected or worst-case number of disk accesses for the essential operations (put, get, remove). Keep space requirements reasonable -- O(n). Methods based on binary trees, such as AVL search trees, are not optimal for disk-based representations. The number of disk accesses can be greatly reduced by using m-way search trees.

CSE AU B-Trees7 Indexed Sequential Access Method (ISAM) Store m records in each disk block. Use an index that consists of an array with one element for each disk block, holding a copy of the largest key that occurs in that block.

CSE AU B-Trees8 ISAM (Continued)

CSE AU B-Trees9 ISAM (Continued) To perform a get(k) operation: Look in the index using, say, either a sequential search or a binary search, to determine which disk block should hold the desired record. Then perform one disk access to read that block, and extract the desired record, if it exists.

CSE AU B-Trees10 ISAM Limitations Problems with ISAM: What if the index itself is too large to fit entirely in RAM at the same time? Insertion and deletion could be very expensive if all records after the inserted or deleted one have to shift up or down, crossing block boundaries.

CSE AU B-Trees11 A Solution: B-Trees Idea 1: Use m-way search trees. (ISAM uses a root and one level under the root.) m-way search trees can be as high as we need. Idea 2: Don’t require that each node always be full. Empty space will permit insertion without rebalancing. Allowing empty space after a deletion can also avoid rebalancing. Idea 3: Rebalancing will sometimes be necessary: figure out how to do it in time proportional to the height of the tree.

CSE AU B-Trees12 B-Tree Example with m = The root has been 2 and m children. Each non-root internal node has between  m/2  and m children. All external nodes are at the same level. (External nodes are actually represented by null pointers in implementations.)

CSE AU B-Trees13 Insert We find the location for 10 by following a path from the root using the stored key values to guide the search. The search falls out the tree at the 4th child of the 1st child of the root. The 1st child of the root has room for the new element, so we store it there. 10

CSE AU B-Trees14 Insert We fall out of the tree at the child to the right of key 10. But there is no more room in the left child of the root to hold 11. Therefore, we must split this node

CSE AU B-Trees15 Insert 11 (Continued) The m + 1 children are divided evenly between the old and new nodes. The parent gets one new child. (If the parent become overfull, then it, too, will have to be split)

CSE AU B-Trees16 Remove Removing 8 might force us to move another key up from one of the children. It could either be the 3 from the 1st child or the 10 from the second child. However, neither child has more than the minimum number of children (3), so the two nodes will have to be merged. Nothing moves up

CSE AU B-Trees17 Remove 8 (Continued) The root contains one fewer key, and has one fewer child. 1011

CSE AU B-Trees18 Remove Removing 13 would cause the node containing it to become underfull. To fix this, we try to reassign one key from a sibling that has spares. 1011

CSE AU B-Trees19 Remove 13 (Cont) The 13 is replaced by the parent’s key 12. The parent’s key 12 is replaced by the spare key 11 from the left sibling. The sibling has one fewer element. 10

CSE AU B-Trees20 Remove is in a non-leaf, so replace it by the value immediately preceding: is at leaf, and this node has spares, so just delete it there. 10

CSE AU B-Trees21 Remove 11 (Cont)

CSE AU B-Trees22 Remove Although 2 is at leaf level, removing it leads to an underfull node. The node has no left sibling. It does have a right sibling, but that node is at its minimum occupancy already. Therefore, the node must be merged with its right sibling.

CSE AU B-Trees23 Remove 2 (Cont) The result is illegal, because the root does not have at least 2 children. Therefore, we must remove the root, making its child the new root.

CSE AU B-Trees24 Remove 2 (Cont) The new B-tree has only one node, the root.

CSE AU B-Trees25 Insert Let’s put an element into this B-tree.

CSE AU B-Trees26 Insert 49 (Cont) 310 Adding this key make the node overfull, so it must be split into two. But this node was the root. So we must construct a new root, and make these its children

CSE AU B-Trees27 Insert 49 (Cont) 310 The middle key (12) is moved up into the root. The result is a B-tree with one more level

CSE AU B-Trees28 B-Tree performance Let h = height of the B-tree. get(k): at most h disk accesses. O(h) put(k): at most 3h + 1 disk accesses. O(h) remove(k): at most 3h disk accesses. O(h) h < log d (n + 1)/2 + 1 where d =  m/2  (Sahni, p.641). An important point is that the constant factors are relatively low. m should be chosen so as to match the maximum node size to the block size on the disk. Example: m = 128, d = 64, n  64 3 = , h = 4.

CSE AU B-Trees Trees A B-tree of order m is a kind of m-way search tree. A B-Tree of order 3 is called a 2-3 Tree. In a 2-3 tree, each internal node has either 2 or 3 children. In practical applications, however, B-Trees of large order (e.g., m = 128) are more common than low-order B-Trees such as 2-3 trees.