Presentation is loading. Please wait.

Presentation is loading. Please wait.

Multiway Search Trees Data may not fit into main memory

Similar presentations


Presentation on theme: "Multiway Search Trees Data may not fit into main memory"— Presentation transcript:

1 Multiway Search Trees Data may not fit into main memory
BigO model no longer applies for performance analysis Cutting disk access by half will cut running time by half

2 Multiway Search Trees Very large database -- assume 10 million records
key = 32 bytes record = 32 bytes Performance Unbalanced binary search tree depth is O(n) in worst case (10 million disk accesses) in average case, 32 disk accesses 5 sec. processing time, assuming 160 msec per accesss Random search tree for nodes that are very deep, 100 disk accesses required (16 sec.) More branching => less height Need multiple children per node

3 M-ary trees General non-binary trees used as search trees, especially in external storage Each node can contain one or more keys Order n tree: each node contains at most n subtrees and contains at most n-1keys Ex: order 3

4 M-ary trees If nodes are not full, space is wasted
Advantage of using external storage I/O is slowest operation once data is read into memory, search cost is minimal time efficiency is improved at the expense of external space utilization size of each node is maximized (order 200 not uncommon) Question that must be addressed: how and where are records kept store records in tree nodes? associate pointers with keys to records and store pointers in nodes? store records in leaves? Variations of multiway search trees are used to index large databases A B-tree guarantees a search path never longer than (log n) + 1 (efficiency of a full binary tree) B+-tree B*-tree

5 B-Trees In general, a B-tree of order m is an m-way search tree that is either empty or is of height > 1 and satisfies the following properties: The root node is a leaf or it has at least 2 children All nodes other than the root node and leaves have between m/2 and m children. All leaf nodes are at the same level An m-way search tree, T, is a tree in which all nodes are of degree < m. A non empty T has the following properties: Interior nodes are of the type: P0, (K1,P1),(K2,P2), ...,(Kn,Pn ) where the Pi , 0  i  n are pointers to the subtrees of T and the Ki , 1i  n are key values; and 1  n  m Ki < Ki+1 , 1  i < n (i.e. keys are ordered) All key values in the subtree Pi are less than Ki+1 and greater than Ki-1 , 0 < i < n. All key values in the subtree Pn are greater than Kn and those in P0 are less than K1. The subtrees Pi , 0  i  n are also m-way search trees. Internal nodes store keys and pointers to child nodes; leaf nodes store actual records or keys and pointers

6 2-3 Tree A specialization of m-way search trees is the 2-3 tree (B-tree of order 3) Definition: A 2-3 tree is a B-tree with the following properties: A node has room to store at most two keys Each internal node has either two children (if it contains one key) or three children (if it contains two keys) All leaves are on the same level, so the tree is always height balanced For every node, values of all descendants in the left subtree are less than the value of the first key; values in the center subtree are greater than or equal to the value of the first key. If two keys exists, values of all descendant nodes in the center subtree are less than the value of the second key; values in the right subtree are greater or equal the value of the second key Example: Each node must have room for 2 keys and three pointers The 2-3 tree is seldom implemented

7 B+ Tree B+ tree is the most commonly implemented variant
B+-trees store records only in the leaf nodes internal nodes store key values; keys guide the search to the correct leaf node a leaf node in a B+-tree of order m can store more or less than m records depends on the size of a record vs the size of a key leaf pages must maintain at least 50% utilization (i.e., remain 1/2 full) leaf nodes are linked to allow sequential access Example B+-tree of order m= 3 with maximum page size L = 5:

8 Operations on B-Trees Find Insertion
Starting at root, do a tree walk; branch according to value of keys stored in each node. Insertion Always add the new element to a leaf node If the leaf node is full, the node must be split; a split may propagate splits all the way to the root General rules for insertion in a B-tree of order m If node contains m keys, node is split into two nodes with (m+1)/2 and (m+1)/2, respectively. If parent node has m children, it is split in same manner. Moving up tree, nodes are split if necessary until a parent with less than m children is found. If root is split, a new root with two children is created.

9 Insertion, M = 3, L = 5 Case 1: No split Case 2: Overflow in leaf node
Find insertion point by following appropriate branches to leaf node where new key belongs. Ex. insert 48 Case 2: Overflow in leaf node Create a new node. Distribute keys between the two nodes, two to each node. Move smallest key of new node up to parent node. Ex. Insert key 107 in (b) Tree is now one level deeper

10 Insertion [continued]
Case 3: Overflow in leaf node and in index node. Create a new leaf node and insert new key as in Case 2. If parent node is full, create new index node. Place smallest index key in old node, largest index key in new node, and move middle index key up to parent. Ex. insert key 95 in (d) Only way the tree can gain height: a node split propagates up the tree and splits the root

11 B* Trees B* trees differ from B+ trees in insertion algorithm
Alternate method if leaf node overflows Redistribute keys between siblings and update parent node. If sibling is also full, split the two pages into three Index nodes handled the same way Method costs more in coding, but uses less space; nodes tend to be at least 2/3 full

12 Deletion Algorithm Locate leaf containing record to be deleted
Remove the record if leaf node is more than half full Otherwise (underflow) if sibling is more than half full, borrow a key or keys from sibling so that each node has the same number of records If no sibling has enough keys to spare, merge the current node's keys into a sibling and remove node from the tree Update keys in parent nodes--if necessary, up to root If root must be deleted due to merging, tree loses one level When collapsing a pair of leaf nodes together, the right node should be removed, since the left node is pointed to by its left neighbor in the linked list of leaf nodes

13 Deletion: Example, L = 5, M = 3
Simple deletion: delete 75 Before Deletion After Deletion

14 Deletion: Example Deletion by barrowing: delete 81 Before Deletion
After Deletion

15 Deletion: Example Deletion by merging : delete 83 Before Deletion
After Deletion

16 Analysis of B-Trees Asymptotic cost of search, insertion and deletion from B-trees, B*-trees and B+-trees is (log n), where n=number of records. The maximum depth of the tree is log m/2 n Only half the available pointers are used at all levels At each node on path, O(log m) work is done to determine which branch to take (using binary search) Insert or Delete can require O(m) work to update all node information. Hence, worst case asymptotic cost for insert/delete is O(m logm n) = O((m (log2n/log2 m) = (log n) Find operation takes O(log n) time, but finding next record is only O(1) (involves accessing one additional leaf at most)

17 The Efficiency of B-Trees
Consider a B+-tree of order 100 and leaf nodes with up to 100 records A 1-level B+-tree can have at most 100 records A 2-level B+-tree must have at least 100 records (2 leaves w/50 records each) and at most 10,000 records (100 leaves w/100 records each) A 3-level B+-tree must have at least 5000 records (2 second-level nodes w/ 50 children containing 50 records each) and at most 1 million records (100 second level nodes with 100 full children each) A 4-level B+-tree must have at least 250k records and at most 100 million records. (not many databases bigger than this)

18 Search Tree Summary and Comparisons
Binary search trees are efficient index structures Balanced search trees provide means to search, insert, and delete entries using at most O(log n) time. Data can be accessed in order by traversing the tree Have important dynamic insertion and deletion capabilities B-trees compared with other trees Like BST, finding an element requires searching only a single path between root & leaf Like AVL, insertion & deletion algorithms guarantee that longest path between root and leaf is O(log n) Unlike AVL, height always of uniform depth Unlike Splay, never allowed to get deep Unlike Binary Tree, each node may contain many elements and may have many children

19 Example: M = 5, L = 5, Insert 85, 97 Before Insert 43, 75, 89, 110
12, 32, 34, 38,40 43, 54, 59, 71, 73 75, 78, 82, 84, 86 89, 92, 94, 96, 99 110, 112, 113, 120, 121 43, 75, 89, 110 After Insert 85 12, 32, 34, 38,40 43, 54, 59, 71, 73 84, 85, 86 89, 92, 94, 96, 99 110, 112, 113, 120, 121 43, 75, 84 110 75, 78, 82 89

20 Design Problem Design an efficient B+ tree data structure for the following database: 10 million records key = 32 bytes record = 256 bytes Disk: Block/page size = 4096 bytes Every disk access returns one block of data. Goal: Place data and keys in nodes such that every disk access returns at least one complete node of the tree Leaf Nodes: Every leaf node should be contained in one disk block, 256 bytes/record and 8192 bytes/block and yields 32 records per block, therefore L = 32. Every internal node contains (m-1) keys and m pointers. Since each key is 32 bytes, and assuming a pointer takes 4 bytes, each nodes will consist of 32(m-1) + 4m bytes. At least one node per disk block yields m = 228. Since each leaf may contain only L/2 records, maximum number of leaves = 625,000. Each internal node may have at least m/2 children, therefore 625,000 leaves can be grouped into B-tree of log114(625000) height, which is 4.

21


Download ppt "Multiway Search Trees Data may not fit into main memory"

Similar presentations


Ads by Google