B-Trees
CMPE126 Data Structures
Spring 2006. Copyright (c) Leonard Wesley. All rights reserved.


Why B-Trees?
- The trees studied so far are for storing data in memory.
- B-Trees are better suited for storing data both in memory AND on secondary storage.
- They are easier to keep balanced than some other tree ADTs.
- They can store multiple keys with the same value, unlike some other trees, such as AVL trees.

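The Problem With Unbalanced Trees
The levels are sparsely filled, resulting in deep paths. This defeats the purpose of binary search trees: in the worst case (for example, keys inserted in sorted order) the tree degenerates into a linked list, and a search takes O(n) steps instead of O(log n).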

Possible Solutions To Unbalanced Trees
- Periodically rebalance the tree.
- Don't let the tree get too unbalanced when inserting or deleting:
  - AVL Trees: sometimes called HB[1] (height-balanced) trees. Invented by Adel’son-Vel’skii and Landis in the early 1960s. (An in-memory solution, not ideally suited to secondary storage.)
  - B-Trees: proposed by R. Bayer & E. M. McCreight (see pg. 542 Main & Savitch for the reference).

What Is A B-Tree?
- It is a type of “multiway” tree.
- It is NOT a binary search tree, nor is it a binary tree.
- It provides a fast way to index into a multi-level set of nodes.
- Each node in the B-Tree contains a sorted array of key values.

Motivation For Multiway Trees
- Secondary storage (e.g., disks) is typically divided into equal-sized blocks (e.g., 512, 1024, …, 4096, … bytes).
- The basic I/O operation transfers whole blocks, rather than single bytes, between secondary storage and memory.
- The goal is to devise a multiway search tree that minimizes the number of disk accesses by packing many keys into each block that is read.
- Each access to secondary storage takes roughly as long as executing 250K instructions, depending on the speed of the CPU.
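To see how block size translates into branching, here is a small C++ calculation. It assumes a node stores only child pointers and keys (no per-node header or record data), and the sizes used below (8 bytes per key and per pointer) are hypothetical. An order-m node needs m pointers and m - 1 keys, so m·P + (m - 1)·K ≤ B, i.e., m ≤ (B + K) / (K + P).

```cpp
#include <cassert>
#include <cstddef>

// Largest order m such that one node (m child pointers plus m - 1 keys)
// fits in a single disk block: m*ptr_bytes + (m - 1)*key_bytes <= block_bytes.
// Assumes the node holds nothing else (no header, no record data).
std::size_t max_order(std::size_t block_bytes, std::size_t key_bytes,
                      std::size_t ptr_bytes) {
    assert(key_bytes + ptr_bytes > 0);
    // Integer division takes the floor of (B + K) / (K + P).
    return (block_bytes + key_bytes) / (key_bytes + ptr_bytes);
}
```

With B = 4096 and K = P = 8 this gives m = 256, so one block read can choose among 256 subtrees instead of 2.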

ISAM
- ISAM = Indexed Sequential Access Method.

ISAM: The Idea
[Figure: a disk platter divided into tracks; each track is divided into blocks of 512, 1024, … bytes.]

ISAM: Index & Keys
[Figure: a block on a track, labeled with its block #, its block key, and its data.]
All data in the block will have keys ≤ the block key, or all will have keys ≥ the block key. Pick one inequality and stick with it.

ISAM: Block Index
This index could be stored in memory.
Block #   Key
0         G
1         K
2         N
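A minimal C++ sketch of an in-memory block-index lookup, assuming the "all keys in the block ≤ the block key" convention from the previous slide and single-character keys as in the table. The function name find_block is hypothetical. Because the block keys are sorted, a binary search (std::lower_bound) finds the first block whose key is not less than the target, so only that one block has to be read from disk.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical in-memory ISAM block index: block_keys[b] is the largest key
// stored in disk block b, and block_keys is in ascending order.
// Returns the only block that could hold `target`, or block_keys.size()
// if target is larger than every block key (so it is not in the file).
std::size_t find_block(const std::vector<char>& block_keys, char target) {
    // First block whose key is not less than the target.
    auto it = std::lower_bound(block_keys.begin(), block_keys.end(), target);
    return static_cast<std::size_t>(it - block_keys.begin());
}
```

With the table above, looking up 'H' yields block 1. The same lookup can be applied first to a disk index and then to the chosen disk's block index.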

ISAM: Disk Index
This index could also be stored in memory.
Disk #   Key
0        G
1        V
2        X
[Figure: disks 0 through n.]

ISAM: Insertion/Deletion
- Insertion: might involve moving data across blocks. Extra space can be left when inserting into a block.
- Deletion: might involve contracting data across blocks. There is no need to contract every time; leave some space for possible future expansion.

Multiway Search Tree (order m)
- A generalization of a binary search tree.
- Each node has at most m children. If k ≤ m is the number of children, then the node has exactly k - 1 keys.
- The tree is ordered.

Multiway Search Tree (cont.)
[Figure: nodes in a multiway tree. A node with keys k1 < k2 < k3 < k4 < k5 has six subtrees: the leftmost holds keys < k1, the one between k2 and k3 holds keys with k2 < key < k3, and the rightmost holds keys > k5.]
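The ordering in the figure can be sketched as a small routing step for one node, assuming int keys stored in a sorted std::vector; Route and route are hypothetical names. The index of the first key that is not less than the target is either the position of a matching key or the child whose key range contains the target.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Result of examining one node of an m-way search tree.
struct Route {
    bool found;         // true if target equals one of this node's keys
    std::size_t index;  // key position if found, otherwise the child to descend into
};

// keys must be sorted; child j holds the keys between keys[j-1] and keys[j].
Route route(const std::vector<int>& keys, int target) {
    auto it = std::lower_bound(keys.begin(), keys.end(), target);
    std::size_t i = static_cast<std::size_t>(it - keys.begin());
    if (it != keys.end() && *it == target) return {true, i};
    return {false, i};  // keys[i-1] < target < keys[i] (missing bounds are open)
}
```

For keys {10, 20, 30, 40, 50}, a target of 25 routes to child 2, the subtree between the second and third keys.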

Definition Of A B-Tree
A B-Tree of order m is an m-way search tree such that:
- All leaves are on the same level.
- All internal nodes except the root are constrained to have at most m non-empty children and at least ⌈m/2⌉ non-empty children.
- The root node has at most m non-empty children (and at least 2, unless it is also a leaf).

Three Important Properties Of B-Trees
- All nodes in the B-Tree are at least half-full (the root node is sometimes an exception).
- The B-Tree is always balanced: every leaf is at the same depth, so locating any key at a given level requires reading the same number of nodes into memory.
- A B-Tree has just a small number of levels relative to the number of nodes, because each node branches many ways.
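To put a number on "a small number of levels": assuming the MIN/MAX rules from the Six Rules slides, every non-root internal node has at least t = MIN + 1 children and the root has at least 2, so a tree whose leaves sit at depth h (root at depth 0) holds at least 2t^h - 1 keys. Hence h ≤ log_t((n + 1) / 2). The helper below (a hypothetical name) returns the largest depth h that this bound allows for n keys.

```cpp
#include <cassert>
#include <cstddef>

// Largest leaf depth h with 2*t^h - 1 <= n, i.e. the deepest a B-Tree with
// n keys can be when every non-root node has at least t = MIN + 1 children.
// Assumes n and t are small enough that t^(h+1) does not overflow.
std::size_t max_btree_height(unsigned long long n, unsigned long long t) {
    assert(n >= 1 && t >= 2);
    std::size_t h = 0;
    unsigned long long power = t;  // invariant: power == t^(h + 1)
    while (2 * power - 1 <= n) {   // depth h + 1 is still possible
        ++h;
        power *= t;
    }
    return h;
}
```

For example, with MIN = 50 (t = 51) and one million keys the bound gives h = 3, i.e., at most 4 levels, whereas a perfectly balanced binary search tree holding the same keys has 20 levels.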

Where Are B-Trees Used?
- B-Trees are commonly found in database and file systems.
- B-Trees allow insertions and deletions in logarithmic time.
- They grow from the bottom upwards as elements are inserted (new entries go into leaves, and the tree gets taller only when the root splits), whereas most binary trees grow downward.

The Six Rules Governing B-Trees
- R1: A B-Tree might be empty. If it is not, the root has at least one entry, and every other node has at least some specified MINIMUM number of entries.
- R2: The MAXIMUM number of entries in a node is twice the MINIMUM.

The Six Rules Governing B-Trees (cont.)
- R3: The entries of each B-Tree node are stored in a partially filled array, sorted from the smallest entry (at index 0) to the largest entry (at the final used position, index data_count - 1).
[Figure: a B-Tree node whose entries occupy array positions 0 through n - 1. The data in such an array can be stored in a block on a disk.]
*B-Trees can support duplicate keys.

The Six Rules Governing B-Trees (cont.)
- R4: The number of subtrees below a non-leaf node is always one more than the number of entries in the node.
[Figure: a non-leaf node with entries 45, 55, 67, 82 has 5 subtrees: subtree 0 holds keys < 45; subtree 1, keys > 45 and < 55; subtree 2, keys > 55 and < 67; subtree 3, keys > 67 and < 82; subtree 4, keys > 82.]

The Six Rules Governing B-Trees (cont.)
- R5: For any non-leaf node, the entry at index i is greater than all the entries in subtree i of the node, and less than all the entries in subtree i+1 of the node.
- R6: Every leaf node in a B-Tree has the same depth (i.e., all leaves are at the same level).
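The six rules can be checked mechanically. This is a minimal C++ sketch, assuming int keys with no duplicates (so R3 and R5 use strict comparisons) and a node layout that uses std::vector in place of Main & Savitch's fixed arrays; Node, make_node, check, and is_valid_btree are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical node layout: data.size() stands in for data_count and
// subset.size() for child_count.
struct Node {
    std::vector<int> data;                      // entries
    std::vector<std::unique_ptr<Node>> subset;  // children; empty for a leaf
};

std::unique_ptr<Node> make_node(std::vector<int> data) {
    auto n = std::make_unique<Node>();
    n->data = std::move(data);
    return n;
}

// Checks R1 to R6 for the subtree rooted at n. lo/hi are exclusive bounds
// inherited from ancestors (R5); leaf_depth remembers the first leaf depth
// seen so every other leaf can be compared with it (R6).
bool check(const Node& n, std::size_t min, bool is_root, int depth,
           const int* lo, const int* hi, int& leaf_depth) {
    const std::size_t count = n.data.size();
    if (!is_root && count < min) return false;                     // R1
    if (count > 2 * min) return false;                             // R2
    if (std::adjacent_find(n.data.begin(), n.data.end(),
                           std::greater_equal<int>()) != n.data.end())
        return false;                                              // R3 (strictly sorted)
    for (int x : n.data)                                           // R5 (ancestor bounds)
        if ((lo && x <= *lo) || (hi && x >= *hi)) return false;
    if (n.subset.empty()) {                                        // leaf: R6
        if (leaf_depth < 0) leaf_depth = depth;
        return depth == leaf_depth;
    }
    if (count == 0 || n.subset.size() != count + 1) return false;  // R1, R4
    for (std::size_t i = 0; i <= count; ++i) {
        const int* child_lo = (i == 0) ? lo : &n.data[i - 1];
        const int* child_hi = (i == count) ? hi : &n.data[i];
        if (!n.subset[i] || !check(*n.subset[i], min, false, depth + 1,
                                   child_lo, child_hi, leaf_depth))
            return false;
    }
    return true;
}

bool is_valid_btree(const Node& root, std::size_t min) {
    int leaf_depth = -1;
    return check(root, min, true, 0, nullptr, nullptr, leaf_depth);
}
```

For example, a root [6, 17] with leaves [2, 4], [10, 12], [19, 22] passes with MIN = 2, but fails once the last leaf shrinks to [19].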

Example B-Tree (MIN = 1, MAX = 2)
[Figure: an example B-Tree.]

Searching For A Target In B-Trees
- Start with the root node and search for the target in the array at that node. If it is found, then done: return success.
- If the target is not in the root and there are no children, then also done, but return failure.
- If the target is not in the root node and there are children, then if the target exists, it can only be in one subtree.
- To choose that subtree, scan key_array from left to right, up to data_count, and find the first index i for which target < key_array[i] (if there is no such index, i = data_count, the rightmost subtree). Traverse subtree i and repeat the process with its root as the new root node.
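The search steps translate almost line for line into C++. This is a minimal sketch, assuming a hypothetical vector-based Node (data.size() in place of data_count) and int keys. It replaces the left-to-right scan with a binary search (std::lower_bound), which picks the same index i because each node's array is sorted.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

struct Node {
    std::vector<int> data;                      // sorted entries (R3)
    std::vector<std::unique_ptr<Node>> subset;  // data.size() + 1 children, or none for a leaf
};

std::unique_ptr<Node> make_node(std::vector<int> data) {
    auto n = std::make_unique<Node>();
    n->data = std::move(data);
    return n;
}

// Iterative form of the search: descend until the target is found or a leaf is exhausted.
bool contains(const Node* root, int target) {
    while (root != nullptr) {
        // i = first index with data[i] >= target, or data.size() if there is none
        auto it = std::lower_bound(root->data.begin(), root->data.end(), target);
        if (it != root->data.end() && *it == target) return true;  // found in this node
        if (root->subset.empty()) return false;                     // leaf: nowhere else to look
        std::size_t i = static_cast<std::size_t>(it - root->data.begin());
        root = root->subset[i].get();                               // only subtree i can hold it
    }
    return false;
}
```

Each loop iteration reads exactly one node, so the number of node reads equals the number of levels visited.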

Inserting Into A B-Tree
[Flowchart] Add the new key to the appropriate leaf node. Overflow?
- No: done.
- Yes: split the node into two nodes on the same level, and promote the median key.

Loose Insertion (pg. 551 Main & Savitch; one of several ways)
MIN = 1, MAX = 2. Insert 18.
[Figure: the new entry goes into a leaf, which now holds one entry more than MAX: the excess entry (the "problem child").]

Fixing A Loose Insertion (MIN = 1, MAX = 2)
- Split the problem child, and promote its middle key to the parent node.
- If the parent still has an excess entry, fix it by repeating the process.
- When the root has the excess, split it and promote its middle key to a new root node.

Pseudo Code For Loose Insert
1. Make a local variable, i, equal to the first index such that data[i] is not less than the new entry to insert. If there is no such index, then set i equal to data_count, indicating that all of the entries are less than the new entry.
2. Then:
   a) If (we found the new entry at data[i]): return false with no further work (the new entry is already in the tree).
   b) Else if (the root has no children): add the new entry to the root at data[i]. The original entries at data[i] and afterwards must be shifted right to make room for the new entry. Return true to indicate that we added the entry.
   c) Else: save the value from the recursive call subset[i]->loose_insert(entry). Then check whether the root of subset[i] now has an excess entry; if so, fix that problem. Return the saved value from the recursive call.
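A minimal C++ sketch of loose insertion plus the fix-up steps described for fixing a loose insertion. It assumes int keys, MIN = 1 (as in the slides' examples), and a vector-based node instead of Main & Savitch's fixed arrays. fix_excess follows the slides' description (split the problem child and promote its middle entry); insert is a hypothetical top-level wrapper that grows a new root when the old root overflows. Duplicate keys are rejected, as in step 2a.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

constexpr std::size_t MINIMUM = 1;            // assumption: as in the slides' examples
constexpr std::size_t MAXIMUM = 2 * MINIMUM;  // R2

// Hypothetical vector-based node (data.size() stands in for data_count).
struct Node {
    std::vector<int> data;
    std::vector<std::unique_ptr<Node>> subset;
    bool is_leaf() const { return subset.empty(); }
};

// subset[i] holds MAXIMUM + 1 entries: split it into two nodes of MINIMUM
// entries each and promote its middle entry into data[i] of the parent.
void fix_excess(Node& parent, std::size_t i) {
    Node& child = *parent.subset[i];
    const std::size_t mid = MINIMUM;  // middle index of 2*MINIMUM + 1 entries
    auto right = std::make_unique<Node>();
    right->data.assign(child.data.begin() + mid + 1, child.data.end());
    if (!child.is_leaf()) {  // the upper MINIMUM + 1 children move to the new node
        for (std::size_t c = mid + 1; c < child.subset.size(); ++c)
            right->subset.push_back(std::move(child.subset[c]));
        child.subset.resize(mid + 1);
    }
    parent.data.insert(parent.data.begin() + i, child.data[mid]);
    child.data.resize(mid);
    parent.subset.insert(parent.subset.begin() + i + 1, std::move(right));
}

// Steps 1 and 2; the root of this subtree may end with MAXIMUM + 1 entries.
bool loose_insert(Node& root, int entry) {
    auto it = std::lower_bound(root.data.begin(), root.data.end(), entry);  // step 1
    std::size_t i = static_cast<std::size_t>(it - root.data.begin());
    if (it != root.data.end() && *it == entry) return false;  // 2a: already present
    if (root.is_leaf()) {                                     // 2b: shift right and insert
        root.data.insert(it, entry);
        return true;
    }
    bool added = loose_insert(*root.subset[i], entry);        // 2c
    if (root.subset[i]->data.size() > MAXIMUM) fix_excess(root, i);
    return added;
}

// Top-level insert: if the root overflows, put a new empty root above it and split.
bool insert(std::unique_ptr<Node>& root, int entry) {
    if (!root) root = std::make_unique<Node>();
    bool added = loose_insert(*root, entry);
    if (root->data.size() > MAXIMUM) {
        auto new_root = std::make_unique<Node>();
        new_root->subset.push_back(std::move(root));
        fix_excess(*new_root, 0);
        root = std::move(new_root);
    }
    return added;
}
```

Inserting 1 through 7 in order produces root [4] with children [2] and [6], each of which has two leaf children. The tree gains height only when the root splits, which is why B-Trees grow upward.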

In-Class Insert Exercise (MIN = 1, MAX = 2)
[Figure: starting B-Tree.]
- Insert 5, then insert 7.

Deleting From A B-Tree

Deleting From A B-Tree: Example #1 (MIN = 1, MAX = 2)
[Figure: deleting an entry leaves a node that violates B-Tree Rule 4: # subtrees = # keys + 1.]

Solution To Example #1 (MIN = 1, MAX = 2)
[Figure: the repaired tree.]

Deleting From A B-Tree: Example #2 (MIN = 2, MAX = 4)
Start: root [6, 17] with children [2, 4], [10, 12], [19, 22].
Delete 22: the rightmost leaf becomes [19], which violates the B-Tree property that the number of keys in a node is not less than MIN.

Solution #1 For Example #2 (MIN = 2, MAX = 4)
Case 3 solution: if excess entries are not available in the siblings, combine subset[i] with subset[i-1] (pg. 561 Main & Savitch).
Result: root [6] with children [2, 4] and [10, 12, 17, 19].

Solution #2 To Fix A Shortage (MIN = 2, MAX = 4)
- Case 1: Transfer an extra entry from subset[i-1] to subset[i] (pg. 560 Main & Savitch).
  Before: root [6, 17] with children [2, 4], [10, 12, 15], [19]. After: root [6, 15] with children [2, 4], [10, 12], [17, 19].

Solution #3 To Fix A Shortage (MIN = 2, MAX = 4)
- Case 2: Transfer an extra entry from subset[i+1] to subset[i] (pg. 561 Main & Savitch).
  Before: root [6, 17] with children [2, 4], [10], [19, 21, 22]. After: root [6, 19] with children [2, 4], [10, 17], [21, 22].

Deleting From A B-Tree (Loose Erase)
1. Make a local variable, i, equal to the first index such that data[i] is not less than the target to delete. If there is no such index, then set i equal to data_count, indicating that all of the entries are less than the target.
2. Deal with one of the following four possibilities:
   a. The root has no children, and we did not find the target: there is nothing to do.
   b. The root has no children, and we found the target: just remove the target.
   c. The root has children, and we did not find the target in the root: make a recursive call to search subset[i].
   d. The root has children, and we found the target in the root: remove the largest entry from subset[i] and place it in data[i].
Cases 2c and 2d are elaborated on the following slides.

Delete From B-Tree: Elaborate 2c
- The target was not found in the root node, but it might be in subset[i]. Make the recursive call subset[i]->loose_erase(target).
- This removes the target from subset[i] if it is there. If so, subset[i] might be left with fewer than MIN entries, and then it needs to be fixed by calling fix_shortage(i) on the current root (the parent of subset[i]). Fixing shortages is discussed later.

Delete From B-Tree: Elaborate 2d
- The target is found in the root node, but it cannot simply be removed because there are children.
- Instead, go to subset[i] and remove the largest item in that subtree. Copy this largest item into data[i] (which contains the target). In effect, this removes the target.
- However, removing the largest item can cause a shortage in subset[i]. If so, call fix_shortage(i) on the current root. Discussed now!

Fix Shortage
- Case 1: If subset[i-1] has extra entries, transfer an entry to subset[i] (pg. 560 Main & Savitch):
  - Transfer data[i-1] (i.e., 17) down to the front of subset[i]->data, shifting over as necessary, and update data_count.
  - Transfer the final item of subset[i-1] (i.e., 15) up to replace data[i-1], and update data_count.
  - If subset[i-1] has children, transfer the final child of subset[i-1] over to the front of subset[i], and update child_count.
  (Example: see Solution #2 To Fix A Shortage.)
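Case 1 as a C++ sketch, assuming a hypothetical vector-based node (so the data_count and child_count updates happen automatically through vector sizes), int keys, and MIN = 2 to match the example. transfer_from_left is a hypothetical name for this branch of fix_shortage(i), which operates on the parent of subset[i].

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

constexpr std::size_t MINIMUM = 2;  // assumption: matches the example (MIN = 2)

// Hypothetical vector-based node: data.size() stands in for data_count,
// subset.size() for child_count.
struct Node {
    std::vector<int> data;
    std::vector<std::unique_ptr<Node>> subset;
};

std::unique_ptr<Node> make_node(std::vector<int> data) {
    auto n = std::make_unique<Node>();
    n->data = std::move(data);
    return n;
}

// Case 1 of fix_shortage(i), applied to the parent: subset[i] is short and
// subset[i-1] has more than MINIMUM entries.
void transfer_from_left(Node& parent, std::size_t i) {
    assert(i > 0 && i < parent.subset.size());
    Node& left = *parent.subset[i - 1];
    Node& child = *parent.subset[i];
    assert(left.data.size() > MINIMUM);
    // Move data[i-1] down to the front of subset[i]->data (the rest shift right).
    child.data.insert(child.data.begin(), parent.data[i - 1]);
    // Move the final item of subset[i-1] up to replace data[i-1].
    parent.data[i - 1] = left.data.back();
    left.data.pop_back();
    // If subset[i-1] has children, its final child moves to the front of subset[i].
    if (!left.subset.empty()) {
        child.subset.insert(child.subset.begin(), std::move(left.subset.back()));
        left.subset.pop_back();
    }
}
```

Applied with i = 2 to root [6, 17] with children [2, 4], [10, 12, 15], [19], this yields root [6, 15] with children [2, 4], [10, 12], [17, 19].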

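Fix Shortage (cont.)
- Case 2: If subset[i+1] has extra entries, transfer an entry to subset[i] (pg. 561 Main & Savitch). This is similar to Case 1: data[i] moves down to the end of subset[i]->data, the first item of subset[i+1] moves up to replace data[i], and if subset[i+1] has children, its first child moves to the end of subset[i].
  (Example: see Solution #3 To Fix A Shortage.)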
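Case 2 mirrors Case 1. This is a minimal sketch under the same assumptions (hypothetical vector-based node, int keys, MIN = 2); transfer_from_right is a hypothetical name for this branch of fix_shortage(i).

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

constexpr std::size_t MINIMUM = 2;  // assumption: matches the example (MIN = 2)

// Hypothetical vector-based node: data.size() stands in for data_count,
// subset.size() for child_count.
struct Node {
    std::vector<int> data;
    std::vector<std::unique_ptr<Node>> subset;
};

std::unique_ptr<Node> make_node(std::vector<int> data) {
    auto n = std::make_unique<Node>();
    n->data = std::move(data);
    return n;
}

// Case 2 of fix_shortage(i), applied to the parent: subset[i] is short and
// subset[i+1] has more than MINIMUM entries.
void transfer_from_right(Node& parent, std::size_t i) {
    assert(i + 1 < parent.subset.size());
    Node& right = *parent.subset[i + 1];
    Node& child = *parent.subset[i];
    assert(right.data.size() > MINIMUM);
    // Move data[i] down to the end of subset[i]->data.
    child.data.push_back(parent.data[i]);
    // Move the first item of subset[i+1] up to replace data[i].
    parent.data[i] = right.data.front();
    right.data.erase(right.data.begin());
    // If subset[i+1] has children, its first child moves to the end of subset[i].
    if (!right.subset.empty()) {
        child.subset.push_back(std::move(right.subset.front()));
        right.subset.erase(right.subset.begin());
    }
}
```

Applied with i = 1 to root [6, 17] with children [2, 4], [10], [19, 21, 22], this yields root [6, 19] with children [2, 4], [10, 17], [21, 22].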

Fix Shortage (cont.)
- Case 3: Combine subset[i] with subset[i-1] (pg. 561 Main & Savitch). This applies if subset[i-1] is present (i.e., i > 0) but has only the minimum number of items/keys (i.e., no excess keys/items):
  a. Transfer data[i-1] down to the end of subset[i-1]->data, shifting the remaining entries of data left (see a, pg. 562).
  b. Transfer all of the items and children from subset[i] to the end of subset[i-1] (see b, pg. 562).
  c. Delete the node subset[i] and shift subset[i+1], subset[i+2], and so on left (see c, pg. 562).
  (Example, after deleting 22: see Solution #1 For Example #2.)
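Case 3 as a C++ sketch under the same assumptions (hypothetical vector-based node, int keys, MIN = 2). merge_with_left is a hypothetical name, and steps a to c are marked in the comments.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

constexpr std::size_t MINIMUM = 2;  // assumption: matches the example (MIN = 2)

// Hypothetical vector-based node: data.size() stands in for data_count,
// subset.size() for child_count.
struct Node {
    std::vector<int> data;
    std::vector<std::unique_ptr<Node>> subset;
};

std::unique_ptr<Node> make_node(std::vector<int> data) {
    auto n = std::make_unique<Node>();
    n->data = std::move(data);
    return n;
}

// Case 3 of fix_shortage(i), applied to the parent: i > 0 and subset[i-1]
// has exactly MINIMUM entries, so nothing can be borrowed from it.
void merge_with_left(Node& parent, std::size_t i) {
    assert(i > 0 && i < parent.subset.size());
    Node& left = *parent.subset[i - 1];
    Node& right = *parent.subset[i];
    assert(left.data.size() == MINIMUM);
    // (a) Move data[i-1] down to the end of subset[i-1]->data;
    //     erase shifts the parent's later entries left.
    left.data.push_back(parent.data[i - 1]);
    parent.data.erase(parent.data.begin() + (i - 1));
    // (b) Append all items and children of subset[i] to subset[i-1].
    left.data.insert(left.data.end(), right.data.begin(), right.data.end());
    for (auto& child : right.subset) left.subset.push_back(std::move(child));
    // (c) Delete subset[i]; erase shifts subset[i+1], subset[i+2], and so on left.
    parent.subset.erase(parent.subset.begin() + i);
}
```

Merging removes one entry from the parent, so the parent may now be short itself and must be fixed one level up. If the parent is the root and ends up with no entries and a single child, the usual remedy is to make that child the new root, which is how a B-Tree loses height.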

In-Class Delete Example #2
Go through the Loose Erase section in Main & Savitch, pg. 558.