B + -Trees (Part 1) Lecture 20 COMP171 Fall 2006.

Slides:



Advertisements
Similar presentations
Chapter 4: Trees Part II - AVL Tree
Advertisements

Advanced Database Discussion B Trees. Motivation for B-Trees So far we have assumed that we can store an entire data structure in main memory What if.
B+-Trees (PART 1) What is a B+ tree? Why B+ trees? Searching a B+ tree
CSE332: Data Abstractions Lecture 9: B Trees Dan Grossman Spring 2010.
CSE332: Data Abstractions Lecture 9: BTrees Tyler Robison Summer
B-Trees. Motivation for B-Trees Index structures for large datasets cannot be stored in main memory Storing it on disk requires different approach to.
1 B trees Nodes have more than 2 children Each internal node has between k and 2k children and between k-1 and 2k-1 keys A leaf has between k-1 and 2k-1.
Other time considerations Source: Simon Garrett Modifications by Evan Korth.
CS 206 Introduction to Computer Science II 12 / 03 / 2008 Instructor: Michael Eckmann.
B-Trees Disk Storage What is a multiway tree? What is a B-tree?
B-Trees Disk Storage What is a multiway tree? What is a B-tree?
©Silberschatz, Korth and Sudarshan12.1Database System Concepts Chapter 12: Part B Part A:  Index Definition in SQL  Ordered Indices  Index Sequential.
CS 206 Introduction to Computer Science II 12 / 01 / 2008 Instructor: Michael Eckmann.
1 B-Trees Disk Storage What is a multiway tree? What is a B-tree? Why B-trees? Comparing B-trees and AVL-trees Searching a B-tree Insertion in a B-tree.
1 Database indices Database Systems manage very large amounts of data. –Examples: student database for NWU Social Security database To facilitate queries,
B + -Trees (Part 1). Motivation AVL tree with N nodes is an excellent data structure for searching, indexing, etc. –The Big-Oh analysis shows most operations.
B + -Trees (Part 1) COMP171. Slide 2 Main and secondary memories  Secondary storage device is much, much slower than the main RAM  Pages and blocks.
CSE 326: Data Structures B-Trees Ben Lerner Summer 2007.
CS 206 Introduction to Computer Science II 11 / 24 / 2008 Instructor: Michael Eckmann.
B + -Trees (Part 2) Lecture 21 COMP171 Fall 2006.
B-Trees. CSM B-Trees 2 Motivation for B-Trees So far we have assumed that we can store an entire data structure in main memory What if we have so.
B-Trees and B+-Trees Disk Storage What is a multiway tree?
Balanced Trees. Binary Search tree with a balance condition Why? For every node in the tree, the height of its left and right subtrees must differ by.
B + -Trees COMP171 Fall AVL Trees / Slide 2 Dictionary for Secondary storage * The AVL tree is an excellent dictionary structure when the entire.
Splay Trees and B-Trees
1 B-Trees Section AVL (Adelson-Velskii and Landis) Trees AVL tree is binary search tree with balance condition –To ensure depth of the tree is.
Advanced Data Structures and Algorithms COSC-600 Lecture presentation-6.
Indexing. Goals: Store large files Support multiple search keys Support efficient insert, delete, and range queries.
IntroductionIntroduction  Definition of B-trees  Properties  Specialization  Examples  2-3 trees  Insertion of B-tree  Remove items from B-tree.
B-Tree. B-Trees a specialized multi-way tree designed especially for use on disk In a B-tree each node may contain a large number of keys. The number.
ICS 220 – Data Structures and Algorithms Week 7 Dr. Ken Cosh.
More Trees Multiway Trees and 2-4 Trees. Motivation of Multi-way Trees Main memory vs. disk ◦ Assumptions so far: ◦ We have assumed that we can store.
ALGORITHMS FOR ISNE DR. KENNETH COSH WEEK 6.
1 B Trees - Motivation Recall our discussion on AVL-trees –The maximum height of an AVL-tree with n-nodes is log 2 (n) since the branching factor (degree,
CSE373: Data Structures & Algorithms Lecture 15: B-Trees Linda Shapiro Winter 2015.
Oct 29, 2001CSE 373, Autumn External Storage For large data sets, the computer will have to access the disk. Disk access can take 200,000 times longer.
B-Trees. CSM B-Trees 2 Motivation for B-Trees So far we have assumed that we can store an entire data structure in main memory What if we have so.
1 B-Trees & (a,b)-Trees CS 6310: Advanced Data Structures Western Michigan University Presented by: Lawrence Kalisz.
INTRODUCTION TO MULTIWAY TREES P INTRO - Binary Trees are useful for quick retrieval of items stored in the tree (using linked list) - often,
COSC 2007 Data Structures II Chapter 15 External Methods.
B + -Trees. Motivation An AVL tree with N nodes is an excellent data structure for searching, indexing, etc. The Big-Oh analysis shows that most operations.
B-Trees. Motivation for B-Trees So far we have assumed that we can store an entire data structure in main memory What if we have so much data that it.
DBMS 2001Notes 4.1: B-Trees1 Principles of Database Management Systems 4.1: B-Trees Pekka Kilpeläinen (after Stanford CS245 slide originals by Hector Garcia-Molina,
B-Trees. CSM B-Trees 2 Motivation for B-Trees So far we have assumed that we can store an entire data structure in main memory What if we have so.
Starting at Binary Trees
1 Tree Indexing (1) Linear index is poor for insertion/deletion. Tree index can efficiently support all desired operations: –Insert/delete –Multiple search.
AVL-Trees (Part 1) COMP171. AVL Trees / Slide 2 * Data, a set of elements * Data structure, a structured set of elements, linear, tree, graph, … * Linear:
IKI 10100: Data Structures & Algorithms Ruli Manurung (acknowledgments to Denny & Ade Azurat) 1 Fasilkom UI Ruli Manurung (Fasilkom UI)IKI10100: Lecture17.
CS 206 Introduction to Computer Science II 04 / 22 / 2009 Instructor: Michael Eckmann.
Lecture 11COMPSCI.220.FS.T Balancing an AVLTree Two mirror-symmetric pairs of cases to rebalance the tree if after the insertion of a new key to.
B+ Trees  What if you have A LOT of data that needs to be stored and accessed quickly  Won’t all fit in memory.  Means we have to access your hard.
Marwan Al-Namari Hassan Al-Mathami. Indexing What is Indexing? Indexing is a mechanisms. Why we need to use Indexing? We used indexing to speed up access.
B-TREE. Motivation for B-Trees So far we have assumed that we can store an entire data structure in main memory What if we have so much data that it won’t.
Internal and External Sorting External Searching
8/3/2007CMSC 341 BTrees1 CMSC 341 B- Trees D. Frey with apologies to Tom Anastasio.
COMP261 Lecture 23 B Trees.
Unit 9 Multi-Way Trees King Fahd University of Petroleum & Minerals
B-Trees B-Trees.
Multiway Search Trees Data may not fit into main memory
B-Trees B-Trees.
B+-Trees.
B+-Trees.
B+-Trees.
B-Trees Disk Storage What is a multiway tree? What is a B-tree?
Trees 4 The B-Tree Section 4.7
B+-Trees (Part 1).
B-Trees Disk Storage What is a multiway tree? What is a B-tree?
CSIT 402 Data Structures II With thanks to TK Prasad
B-Trees Disk Storage What is a multiway tree? What is a B-tree?
B-Trees.
Presentation transcript:

B + -Trees (Part 1) Lecture 20 COMP171 Fall 2006

AVL Trees / Slide 2 Contents * Why B + Tree? * B + Tree Introduction * Searching and Insertion in B + Tree

AVL Trees / Slide 3 Motivation * AVL tree with N nodes is an excellent data structure for searching, indexing, etc. n The Big-Oh analysis shows most operations finishes within O(logN) time * The theoretical conclusion works as long as the entire structure can fit into the main memory * When the data size is too large and has to reside on disk, the performance of AVL tree may deteriorate rapidly

AVL Trees / Slide 4 A Practical Example * A 500-MIPS machine, with 7200 RPM hard disk n 500 million instruction executions, and approximately 120 disk accesses each second * The machine is shared by 20 users n Thus for each user, can handle 120/20=6 disk access/sec * A database with 10,000,000 items, n 256 bytes/item (assume it doesn’t fit in main memory) n The typical searching time for one user  A successful search need log_{base 2} 10,000,000 = 24 disk access,  Takes around 24/6=4 sec.  This is way too slow!! * We want to reduce the number of disk access to a very small constant

AVL Trees / Slide 5 From Binary to M-ary * Idea: allow a node in a tree to have many children n Less disk access = smaller tree height = more branching * As branching increases, the depth decreases * An M-ary tree allows M-way branching n Each internal node has at most M children * A complete M-ary tree has height that is roughly log M N instead of log 2 N n if M = 20, then log < 5 n Thus, we can speedup the search significantly

AVL Trees / Slide 6 M-ary Search Tree * Binary search tree has one key to decide which of the two branches to take * M-ary search tree needs M-1 keys to decide which branch to take * M-ary search tree should be balanced in some way too n We don’t want an M-ary search tree to degenerate to a linked list, or even a binary search tree  Thus, require that each node is at least ½ full!

AVL Trees / Slide 7 B + Tree * A B + -tree of order M (M>3) is an M-ary tree with the following properties: 1. The data items are stored in leaves 2. The root is either a leaf or has between two and M children 3. The non-leaf nodes store up to M-1 keys to guide the searching; key i represents the smallest key in subtree i+1 4. All non-leaf nodes (except the root) have between  M/2  and M children 5. All leaves are at the same depth and have between  L/2  and L data items, for some L (usually L << M, but we will assume M=L in most examples) Note there are vairous defintions of B-trees, but mostly in minor ways. The above definsion is one of the popular forms.

AVL Trees / Slide 8 Keys in Internal Nodes * Which keys are stored at the internal nodes? n There are several ways to do it. Different books adopt different conventions. * We will adopt the following convention: n key i in an internal node is the smallest key in its i+1 subtree (i.e. right subtree of key i) * Even following this convention, there is no unique B + -tree for the same set of records.

AVL Trees / Slide 9 B + Tree Example 1 (M=L=5) * Records are stored at the leaves (we only show the keys here) * Since L=5, each leaf has between 3 and 5 data items * Since M=5, each nonleaf nodes has between 3 to 5 children * Requiring nodes to be half full guarantees that the B+ tree does not degenerate into a simple binary tree

AVL Trees / Slide 10 B + Tree Example 2 (M=L=4) * We can still talk about left and right child pointers * E.g. the left child pointer of N is the same as the right child pointer of J * We can also talk about the left subtree and right subtree of a key in internal nodes

AVL Trees / Slide 11 B+ Tree in Practical Usage * Each internal node/leaf is designed to fit into one I/O block of data. An I/O block usually can hold quite a lot of data. Hence, an internal node can keep a lot of keys, i.e., large M. This implies that the tree has only a few levels and only a few disk accesses can accomplish a search, insertion, or deletion. * B + -tree is a popular structure used in commercial databases. To further speed up the search, the first one or two levels of the B + -tree are usually kept in main memory. * The disadvantage of B + -tree is that most nodes will have less than M-1 keys most of the time. This could lead to severe space wastage. Thus, it is not a good dictionary structure for data in main memory. * The textbook calls the tree B-tree instead of B + -tree. In some other textbooks, B-tree refers to the variant where the actual records are kept at internal nodes as well as the leaves. Such a scheme is not practical. Keeping actual records at the internal nodes will limit the number of keys stored there, and thus increasing the number of tree levels.

AVL Trees / Slide 12 Searching Example * Suppose that we want to search for the key K. The path traversed is shown in bold.

AVL Trees / Slide 13 Searching Algorithm * Let x be the input search key. * Start the searching at the root * If we encounter an internal node v, search (linear search or binary search) for x among the keys stored at v n If x < K min at v, follow the left child pointer of K min n If K i ≤ x < K i+1 for two consecutive keys K i and K i+1 at v, follow the left child pointer of K i+1 n If x ≥ K max at v, follow the right child pointer of K max * If we encounter a leaf v, we search (linear search or binary search) for x among the keys stored at v. If found, we return the entire record; otherwise, report not found.

AVL Trees / Slide 14 Insertion Procedure * Suppose that we want to insert a key K and its associated record. * Search for the key K using the search procedure * This will bring us to a leaf x * Insert K into x n Splitting (instead of rotations in AVL trees) of nodes is used to maintain properties of B + -trees [next slide]

AVL Trees / Slide 15 Insertion into a Leaf * If leaf x contains < L keys, then insert K into x (at the correct position in node x) * If x is already full (i.e. containing L keys). Split x n Cut x off from its parent n Insert K into x, pretending x has space for K. Now x has L+1 keys. n After inserting K, split x into 2 new leaves x L and x R, with x L containing the  (L+1)/2  smallest keys, and x R containing the remaining  (L+1)/2  keys. Let J be the minimum key in x R n Make a copy of J to be the parent of x L and x R, and insert the copy together with its child pointers into the old parent of x.

AVL Trees / Slide 16 Inserting into a Non-full Leaf (L=3)

AVL Trees / Slide 17 Splitting a Leaf: Inserting T

AVL Trees / Slide 18 Splitting Example 1

AVL Trees / Slide 19 l Two disk accesses to write the two leaves, one disk access to update the parent l For L=32, two leaves with 16 and 17 items are created. We can perform 15 more insertions without another split

AVL Trees / Slide 20 Splitting Example 2

AVL Trees / Slide 21 Cont’d => Need to split the internal node

AVL Trees / Slide 22 Splitting an Internal Node To insert a key K into a full internal node x: * Cut x off from its parent * Insert K and its left and right child pointers into x, pretending there is space. Now x has M keys. * Split x into 2 new internal nodes x L and x R, with x L containing the (  M/2  - 1 ) smallest keys, and x R containing the  M/2  largest keys. Note that the (  M/2  )th key J is not placed in x L or x R * Make J the parent of x L and x R, and insert J together with its child pointers into the old parent of x.

AVL Trees / Slide 23 Example: Splitting Internal Node (M=4)

AVL Trees / Slide 24 Cont’d

AVL Trees / Slide 25 Termination * Splitting will continue as long as we encounter full internal nodes * If the split internal node x does not have a parent (i.e. x is a root), then create a new root containing the key J and its two children