Multiway Search Trees Data may not fit into main memory

Slides:



Advertisements
Similar presentations
Copyright © 2011 Pearson Education, Inc. Publishing as Pearson Addison-Wesley Chapter 18 Indexing Structures for Files.
Advertisements

B+-Trees (PART 1) What is a B+ tree? Why B+ trees? Searching a B+ tree
CPSC 231 B-Trees (D.H.)1 LEARNING OBJECTIVES Problems with simple indexing. Multilevel indexing: B-Tree. –B-Tree creation: insertion and deletion of nodes.
©Silberschatz, Korth and Sudarshan12.1Database System Concepts Chapter 12: Part B Part A:  Index Definition in SQL  Ordered Indices  Index Sequential.
B + -Trees (Part 1) Lecture 20 COMP171 Fall 2006.
1 Database indices Database Systems manage very large amounts of data. –Examples: student database for NWU Social Security database To facilitate queries,
B + -Trees (Part 1). Motivation AVL tree with N nodes is an excellent data structure for searching, indexing, etc. –The Big-Oh analysis shows most operations.
B + -Trees (Part 1) COMP171. Slide 2 Main and secondary memories  Secondary storage device is much, much slower than the main RAM  Pages and blocks.
CSE 326: Data Structures B-Trees Ben Lerner Summer 2007.
B-Trees and B+-Trees Disk Storage What is a multiway tree?
Balanced Trees. Binary Search tree with a balance condition Why? For every node in the tree, the height of its left and right subtrees must differ by.
Homework #3 Due Thursday, April 17 Problems: –Chapter 11: 11.6, –Chapter 12: 12.1, 12.2, 12.3, 12.4, 12.5, 12.7.
B + -Trees COMP171 Fall AVL Trees / Slide 2 Dictionary for Secondary storage * The AVL tree is an excellent dictionary structure when the entire.
Indexing. Goals: Store large files Support multiple search keys Support efficient insert, delete, and range queries.
B-Tree. B-Trees a specialized multi-way tree designed especially for use on disk In a B-tree each node may contain a large number of keys. The number.
B+ Trees COMP
1 B Trees - Motivation Recall our discussion on AVL-trees –The maximum height of an AVL-tree with n-nodes is log 2 (n) since the branching factor (degree,
COSC 2007 Data Structures II Chapter 15 External Methods.
12.1 Chapter 12: Indexing and Hashing Spring 2009 Sections , , Problems , 12.7, 12.8, 12.13, 12.15,
B + -Trees. Motivation An AVL tree with N nodes is an excellent data structure for searching, indexing, etc. The Big-Oh analysis shows that most operations.
CMSC 341 B- Trees D. Frey with apologies to Tom Anastasio.
Starting at Binary Trees
IKI 10100: Data Structures & Algorithms Ruli Manurung (acknowledgments to Denny & Ade Azurat) 1 Fasilkom UI Ruli Manurung (Fasilkom UI)IKI10100: Lecture17.
8/3/2007CMSC 341 BTrees1 CMSC 341 B- Trees D. Frey with apologies to Tom Anastasio.
COMP261 Lecture 23 B Trees.
Unit 9 Multi-Way Trees King Fahd University of Petroleum & Minerals
TCSS 342, Winter 2006 Lecture Notes
AA Trees.
Multilevel Indexing and B+ Trees
CS 728 Advanced Database Systems Chapter 18
B-Trees B-Trees.
Extra: B+ Trees CS1: Java Programming Colorado State University
Chapter 11: Multiway Search Trees
B+-Trees.
B+ Trees What are B+ Trees used for What is a B Tree What is a B+ Tree
B+-Trees.
B+-Trees.
Lecture 22 Binary Search Trees Chapter 10 of textbook
B+ Tree.
B-Trees Disk Storage What is a multiway tree? What is a B-tree?
Chapter Trees and B-Trees
Chapter Trees and B-Trees
B Tree Adhiraj Goel 1RV07IS004.
CMSC 341 Lecture 10 B-Trees Based on slides from Dr. Katherine Gibson.
Database Applications (15-415) DBMS Internals- Part III Lecture 15, March 11, 2018 Mohammad Hammoud.
Topics covered (since exam 1):
Lecture 26 Multiway Search Trees Chapter 11 of textbook
B+ Trees What are B+ Trees used for What is a B Tree What is a B+ Tree
(2,4) Trees /26/2018 3:48 PM (2,4) Trees (2,4) Trees
B- Trees D. Frey with apologies to Tom Anastasio
B- Trees D. Frey with apologies to Tom Anastasio
B-Tree.
B+-Trees (Part 1).
(2,4) Trees (2,4) Trees (2,4) Trees.
Multiway Trees Searching and B-Trees Advanced Tree Structures
B-Trees Disk Storage What is a multiway tree? What is a B-tree?
Topics covered (since exam 1, excluding PQ):
B-Trees Disk Storage What is a multiway tree? What is a B-tree?
(2,4) Trees 2/15/2019 (2,4) Trees (2,4) Trees.
B- Trees D. Frey with apologies to Tom Anastasio
(2,4) Trees /24/2019 7:30 PM (2,4) Trees (2,4) Trees
(edited by Nadia Al-Ghreimil)
(2,4) Trees (2,4) Trees (2,4) Trees.
CSE 373: Data Structures and Algorithms
Multiway Search Tree (MST)
CSE 373 Data Structures and Algorithms
CSE 373: Data Structures and Algorithms
B-Trees.
CSE 326: Data Structures Lecture #10 B-Trees
CS210- Lecture 20 July 19, 2005 Agenda Multiway Search Trees 2-4 Trees
Presentation transcript:

Multiway Search Trees Data may not fit into main memory BigO model no longer applies for performance analysis Cutting disk access by half will cut running time by half

Multiway Search Trees Very large database -- assume 10 million records key = 32 bytes record = 32 bytes Performance Unbalanced binary search tree depth is O(n) in worst case (10 million disk accesses) in average case, 32 disk accesses 5 sec. processing time, assuming 160 msec per accesss Random search tree for nodes that are very deep, 100 disk accesses required (16 sec.) More branching => less height Need multiple children per node

M-ary trees General non-binary trees used as search trees, especially in external storage Each node can contain one or more keys Order n tree: each node contains at most n subtrees and contains at most n-1keys Ex: order 3

M-ary trees If nodes are not full, space is wasted Advantage of using external storage I/O is slowest operation once data is read into memory, search cost is minimal time efficiency is improved at the expense of external space utilization size of each node is maximized (order 200 not uncommon) Question that must be addressed: how and where are records kept store records in tree nodes? associate pointers with keys to records and store pointers in nodes? store records in leaves? Variations of multiway search trees are used to index large databases A B-tree guarantees a search path never longer than (log n) + 1 (efficiency of a full binary tree) B+-tree B*-tree

B-Trees In general, a B-tree of order m is an m-way search tree that is either empty or is of height > 1 and satisfies the following properties: The root node is a leaf or it has at least 2 children All nodes other than the root node and leaves have between m/2 and m children. All leaf nodes are at the same level An m-way search tree, T, is a tree in which all nodes are of degree < m. A non empty T has the following properties: Interior nodes are of the type: P0, (K1,P1),(K2,P2), ...,(Kn,Pn ) where the Pi , 0  i  n are pointers to the subtrees of T and the Ki , 1i  n are key values; and 1  n  m Ki < Ki+1 , 1  i < n (i.e. keys are ordered) All key values in the subtree Pi are less than Ki+1 and greater than Ki-1 , 0 < i < n. All key values in the subtree Pn are greater than Kn and those in P0 are less than K1. The subtrees Pi , 0  i  n are also m-way search trees. Internal nodes store keys and pointers to child nodes; leaf nodes store actual records or keys and pointers

2-3 Tree A specialization of m-way search trees is the 2-3 tree (B-tree of order 3) Definition: A 2-3 tree is a B-tree with the following properties: A node has room to store at most two keys Each internal node has either two children (if it contains one key) or three children (if it contains two keys) All leaves are on the same level, so the tree is always height balanced For every node, values of all descendants in the left subtree are less than the value of the first key; values in the center subtree are greater than or equal to the value of the first key. If two keys exists, values of all descendant nodes in the center subtree are less than the value of the second key; values in the right subtree are greater or equal the value of the second key Example: Each node must have room for 2 keys and three pointers The 2-3 tree is seldom implemented

B+ Tree B+ tree is the most commonly implemented variant B+-trees store records only in the leaf nodes internal nodes store key values; keys guide the search to the correct leaf node a leaf node in a B+-tree of order m can store more or less than m records depends on the size of a record vs the size of a key leaf pages must maintain at least 50% utilization (i.e., remain 1/2 full) leaf nodes are linked to allow sequential access Example B+-tree of order m= 3 with maximum page size L = 5:

Operations on B-Trees Find Insertion Starting at root, do a tree walk; branch according to value of keys stored in each node. Insertion Always add the new element to a leaf node If the leaf node is full, the node must be split; a split may propagate splits all the way to the root General rules for insertion in a B-tree of order m If node contains m keys, node is split into two nodes with (m+1)/2 and (m+1)/2, respectively. If parent node has m children, it is split in same manner. Moving up tree, nodes are split if necessary until a parent with less than m children is found. If root is split, a new root with two children is created.

Insertion, M = 3, L = 5 Case 1: No split Case 2: Overflow in leaf node Find insertion point by following appropriate branches to leaf node where new key belongs. Ex. insert 48 Case 2: Overflow in leaf node Create a new node. Distribute keys between the two nodes, two to each node. Move smallest key of new node up to parent node. Ex. Insert key 107 in (b) Tree is now one level deeper

Insertion [continued] Case 3: Overflow in leaf node and in index node. Create a new leaf node and insert new key as in Case 2. If parent node is full, create new index node. Place smallest index key in old node, largest index key in new node, and move middle index key up to parent. Ex. insert key 95 in (d) Only way the tree can gain height: a node split propagates up the tree and splits the root

B* Trees B* trees differ from B+ trees in insertion algorithm Alternate method if leaf node overflows Redistribute keys between siblings and update parent node. If sibling is also full, split the two pages into three Index nodes handled the same way Method costs more in coding, but uses less space; nodes tend to be at least 2/3 full

Deletion Algorithm Locate leaf containing record to be deleted Remove the record if leaf node is more than half full Otherwise (underflow) if sibling is more than half full, borrow a key or keys from sibling so that each node has the same number of records If no sibling has enough keys to spare, merge the current node's keys into a sibling and remove node from the tree Update keys in parent nodes--if necessary, up to root If root must be deleted due to merging, tree loses one level When collapsing a pair of leaf nodes together, the right node should be removed, since the left node is pointed to by its left neighbor in the linked list of leaf nodes

Deletion: Example, L = 5, M = 3 Simple deletion: delete 75 Before Deletion After Deletion

Deletion: Example Deletion by barrowing: delete 81 Before Deletion After Deletion

Deletion: Example Deletion by merging : delete 83 Before Deletion After Deletion

Analysis of B-Trees Asymptotic cost of search, insertion and deletion from B-trees, B*-trees and B+-trees is (log n), where n=number of records. The maximum depth of the tree is log m/2 n Only half the available pointers are used at all levels At each node on path, O(log m) work is done to determine which branch to take (using binary search) Insert or Delete can require O(m) work to update all node information. Hence, worst case asymptotic cost for insert/delete is O(m logm n) = O((m (log2n/log2 m) = (log n) Find operation takes O(log n) time, but finding next record is only O(1) (involves accessing one additional leaf at most)

The Efficiency of B-Trees Consider a B+-tree of order 100 and leaf nodes with up to 100 records A 1-level B+-tree can have at most 100 records A 2-level B+-tree must have at least 100 records (2 leaves w/50 records each) and at most 10,000 records (100 leaves w/100 records each) A 3-level B+-tree must have at least 5000 records (2 second-level nodes w/ 50 children containing 50 records each) and at most 1 million records (100 second level nodes with 100 full children each) A 4-level B+-tree must have at least 250k records and at most 100 million records. (not many databases bigger than this)

Search Tree Summary and Comparisons Binary search trees are efficient index structures Balanced search trees provide means to search, insert, and delete entries using at most O(log n) time. Data can be accessed in order by traversing the tree Have important dynamic insertion and deletion capabilities B-trees compared with other trees Like BST, finding an element requires searching only a single path between root & leaf Like AVL, insertion & deletion algorithms guarantee that longest path between root and leaf is O(log n) Unlike AVL, height always of uniform depth Unlike Splay, never allowed to get deep Unlike Binary Tree, each node may contain many elements and may have many children

Example: M = 5, L = 5, Insert 85, 97 Before Insert 43, 75, 89, 110 12, 32, 34, 38,40 43, 54, 59, 71, 73 75, 78, 82, 84, 86 89, 92, 94, 96, 99 110, 112, 113, 120, 121 43, 75, 89, 110 After Insert 85 12, 32, 34, 38,40 43, 54, 59, 71, 73 84, 85, 86 89, 92, 94, 96, 99 110, 112, 113, 120, 121 43, 75, 84 110 75, 78, 82 89

Design Problem Design an efficient B+ tree data structure for the following database: 10 million records key = 32 bytes record = 256 bytes Disk: Block/page size = 4096 bytes Every disk access returns one block of data. Goal: Place data and keys in nodes such that every disk access returns at least one complete node of the tree Leaf Nodes: Every leaf node should be contained in one disk block, 256 bytes/record and 8192 bytes/block and yields 32 records per block, therefore L = 32. Every internal node contains (m-1) keys and m pointers. Since each key is 32 bytes, and assuming a pointer takes 4 bytes, each nodes will consist of 32(m-1) + 4m bytes. At least one node per disk block yields m = 228. Since each leaf may contain only L/2 records, maximum number of leaves = 625,000. Each internal node may have at least m/2 children, therefore 625,000 leaves can be grouped into B-tree of log114(625000) height, which is 4.