Published by Philomena Daniel. Modified over 9 years ago.
1
Comp 335 File Structures B - Trees
2
Introduction Simple indexes provided a way to directly access a record in an entry-sequenced file, thereby decreasing the number of seeks to disk. WE ASSUMED THE INDEX COULD BE LOADED INTO MEMORY! What if an index is too large to be loaded entirely into memory?
3
Introduction Assume a data file with 1,000,000 records and an associated index file containing the 1,000,000 primary keys. Observations: The index is too large to store in memory. Finding a key using a binary search will take 20 accesses in the worst case. Since the index is not loaded into memory, each access could require a seek to disk. If a record is added to the file, a new primary key entry must be inserted into the index at the correct location, which requires much seeking to move the index entries around. The same happens for a deletion: many entries are moved around, requiring much seeking.
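The figure of 20 accesses can be checked with a short calculation. The sketch below is illustrative only; the function name `worst_case_probes` is a hypothetical helper, not something from the slides. It relies on the fact that a binary search over n sorted keys examines at most ceil(log2(n + 1)) keys, which for a positive integer n equals the number of bits needed to write n.

```python
def worst_case_probes(num_keys: int) -> int:
    """Worst-case number of keys a binary search examines among num_keys sorted keys.

    Hypothetical helper for illustration. Each probe halves the remaining
    range, so the worst case is ceil(log2(num_keys + 1)), which for a
    positive integer equals its bit length.
    """
    if num_keys < 1:
        return 0
    return num_keys.bit_length()


# The index from the slide: 1,000,000 primary keys.
probes = worst_case_probes(1_000_000)
print(probes)
```

If every probe touches a different part of an index that lives on disk, each of those probes may cost a seek.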
4
Introduction The previous scenario highlights two major problems with standard indexes: 1) A binary search requires too many accesses to be acceptable if each access requires a seek. 2) It is very expensive to keep an index in order when records are added and deleted.
5
Possible Solution Storing an Index as a BST Instead of “ordering” the index so that the logical and physical ordering are the same, store the index as a binary search tree. Advantage: Less expensive to maintain the index (do not have to move records around). Disadvantage: If the tree gets out of balance, search efficiency decreases, resulting in more accesses, which could mean more seeks.
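The balance problem is easy to reproduce. The following is a minimal sketch, assuming a plain unbalanced BST with no rebalancing; the `Node`, `insert`, `build`, and `height` names are illustrative and not from the slides. Inserting keys in sorted order produces a chain as tall as the number of keys, while a favorable insertion order produces a short tree.

```python
class Node:
    """One BST node holding a key and two child links."""

    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None


def insert(root, key):
    """Insert key into an unbalanced BST (iteratively) and return the root."""
    if root is None:
        return Node(key)
    node = root
    while True:
        if key < node.key:
            if node.left is None:
                node.left = Node(key)
                return root
            node = node.left
        else:
            if node.right is None:
                node.right = Node(key)
                return root
            node = node.right


def build(keys):
    """Build a BST by inserting keys in the given order."""
    root = None
    for key in keys:
        root = insert(root, key)
    return root


def height(root):
    """Number of levels in the tree, counted level by level."""
    levels = 0
    frontier = [root] if root is not None else []
    while frontier:
        levels += 1
        frontier = [child for node in frontier
                    for child in (node.left, node.right) if child is not None]
    return levels


# Same 15 keys, two insertion orders.
sorted_height = height(build(range(1, 16)))
balanced_height = height(build([8, 4, 12, 2, 6, 10, 14, 1, 3, 5, 7, 9, 11, 13, 15]))
print(sorted_height, balanced_height)
```

With sorted input, a search degenerates into a walk down a linked list, so the number of accesses grows linearly instead of logarithmically.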
6
Possible Solution Store index in an AVL tree An AVL tree keeps the BST property but maintains balance. Balanced tree: for every node, the heights of its two subtrees differ by at most 1. An AVL tree maintains balance by doing rotations. Advantage: guarantees logarithmic search efficiency. Disadvantage: can still call for too many accesses (even though the number is logarithmic). Visualize an AVL Tree
7
Possible Solution Store index in a B Tree WE MUST DECREASE THE SEEKS TO GET TO OUR KEY IN THE INDEX. Bayer and McCreight solved the problem by developing what is now known as the B Tree. (What the B stands for: nobody knows!) Solution: Determine how many key/reference pairs can be loaded into an operating system page. (Page: the amount of disk memory which can be swapped in and out of main memory; sector size.) Build the tree from the bottom up instead of using the traditional top-down approach. This technique brought about “self balancing”.
8
B Tree Terminology Page size: the number of key/reference pairs which can be stored in a page. Order: the number of page pointers stored in a page. Page Size = Order - 1. Split: what occurs when a page overflows; its contents are divided between the original page and a new page. Promotion: on a split, a key is moved up a level in the tree.
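Split and promotion can be sketched on a single page. This is a minimal illustration, assuming the common convention that the middle key of the overflowing page is the one promoted; the slides do not say which key is chosen, and `split_page`, `ORDER`, and `PAGE_SIZE` are hypothetical names. Only keys are shown; the references paired with them are omitted.

```python
ORDER = 4              # page pointers per page (assumed for the example)
PAGE_SIZE = ORDER - 1  # key/reference pairs per page


def split_page(keys):
    """Split an overflowing, sorted list of keys.

    Returns (left_keys, promoted_key, right_keys). Assumption: the middle
    key is promoted to the parent page.
    """
    mid = len(keys) // 2
    return keys[:mid], keys[mid], keys[mid + 1:]


page = [10, 20, 30]                 # a full page: PAGE_SIZE keys
overflowing = sorted(page + [25])   # inserting 25 makes it overflow
left, promoted, right = split_page(overflowing)
print(left, promoted, right)
```

Because the promoted key goes into the parent page, the tree grows upward from the leaves, which is what keeps every leaf at the same level.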
9
B Tree Terminology Minimum page size: all pages in a B tree (except for the root) are guaranteed to hold at least a minimum number of key/reference pairs; the minimum page size is trunc(p/2), where p = page size. Redistribution: occurs mainly during deletion (it can also happen during insertion); when a page falls below its minimum number of keys, keys can be rotated into the page from its parent page and a sibling page. Concatenation: occurs when a page underflows and no sibling has more than the minimum number of keys; pages are combined, thus removing one page from the tree.
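The definitions above reduce to a little arithmetic. The sketch below is illustrative only, with hypothetical function names; it computes the page size and minimum page size from the order, and picks between redistribution and concatenation from the key counts of the adjacent siblings, following the rule stated in the definitions.

```python
def page_size(order: int) -> int:
    """Key/reference pairs per page: Page Size = Order - 1."""
    return order - 1


def min_page_size(order: int) -> int:
    """Minimum pairs in a non-root page: trunc(p / 2) where p is the page size."""
    return page_size(order) // 2


def underflow_action(sibling_key_counts, order: int) -> str:
    """Choose how to repair a page that has fallen below the minimum.

    Redistribute if some sibling holds more than the minimum number of
    keys; otherwise concatenate.
    """
    minimum = min_page_size(order)
    if any(count > minimum for count in sibling_key_counts):
        return "redistribute"
    return "concatenate"


print(page_size(512), min_page_size(512))
print(underflow_action([255, 300], order=512))
print(underflow_action([255, 255], order=512))
```

Note that trunc(p/2) with p = order - 1 is the same number as ceil(order/2) - 1, the form in which the minimum is often stated.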
10
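Depth of a B Tree Worst case scenario: each page has the minimum number of keys and the root has one key. Formula: $d \le 1 + \log_{\lceil m/2 \rceil}\left(\frac{N+1}{2}\right)$, where d is the depth in levels, m is the order, and N is the number of keys.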
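A sketch of where this bound comes from, using the standard counting argument (it is not spelled out on the slide). In the worst case the root has 2 descendants and every other page has the minimum of $\lceil m/2 \rceil$ descendants, so the minimum number of pages per level is:

$$
\begin{aligned}
\text{level } 1 &: 1 \\
\text{level } 2 &: 2 \\
\text{level } 3 &: 2\,\lceil m/2 \rceil \\
\text{level } \ell &: 2\,\lceil m/2 \rceil^{\,\ell-2}
\end{aligned}
$$

A tree holding $N$ keys has $N+1$ empty pointers hanging below its lowest level $d$, and these sit where level $d+1$ would be, so

$$
N + 1 \;\ge\; 2\,\lceil m/2 \rceil^{\,d-1}
\quad\Longrightarrow\quad
d \;\le\; 1 + \log_{\lceil m/2 \rceil}\!\left(\frac{N+1}{2}\right).
$$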
11
Comparing B Tree vs Full Binary Tree Example: 2,000,000 keys, order = 512 B Tree Analysis
$d \le 1 + \log_{\lceil 512/2 \rceil}\left((2{,}000{,}000 + 1)/2\right)$
$d \le 1 + \log_{256}(1{,}000{,}000.5)$
$d \le 1 + \log(1{,}000{,}000.5)/\log(256)$
$d \le 1 + 2.49$
$d \le 3.49$
d is an upper bound and must be a whole number of levels, therefore the worst case depth of the tree is 3 levels. This means 3 disk accesses in the worst case.
12
Comparing B Tree vs Full Binary Tree Example: 2,000,000 keys, order = 2 Complete (Balanced) Binary Tree Analysis
$d \le 1 + \log_{2}(2{,}000{,}000)$
$d \le 1 + \log(2{,}000{,}000)/\log(2)$
$d \le 1 + 20.93$
$d \le 21.93$
d is an upper bound and must be a whole number of levels, therefore the depth of the tree is 21 levels, which could mean 21 disk accesses in the worst case.
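Both results can be reproduced with integer arithmetic, which avoids rounding questions in the logarithms. This is a minimal sketch with hypothetical function names: the B tree bound is taken as the largest whole d satisfying 2 * ceil(m/2)^(d-1) <= N + 1, and the complete binary tree depth as floor(log2 N) + 1, which for a positive integer N is its bit length.

```python
def btree_worst_case_depth(num_keys: int, order: int) -> int:
    """Largest whole depth d with 2 * ceil(order/2)**(d - 1) <= num_keys + 1."""
    if order < 3:
        # With order < 3 the minimum fan-out is 1 and the bound is undefined.
        raise ValueError("order must be at least 3")
    min_children = -(-order // 2)  # ceil(order / 2) using integers only
    depth = 1
    while 2 * min_children ** depth <= num_keys + 1:
        depth += 1
    return depth


def complete_binary_tree_depth(num_keys: int) -> int:
    """Levels in a complete binary tree of num_keys nodes: floor(log2 n) + 1."""
    return num_keys.bit_length()


keys = 2_000_000
btree_depth = btree_worst_case_depth(keys, order=512)
binary_depth = complete_binary_tree_depth(keys)
print(btree_depth, binary_depth)
```

The difference comes entirely from the fan-out: each B tree page read narrows the search by a factor of at least 256 instead of 2.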