btrees21 B-trees: The rest of the story
btrees22 Review of B-tree rules
All nodes except the root must have at least MINIMUM data entries
No node may exceed MAXIMUM data entries (MAXIMUM is MINIMUM * 2)
Entries in each node are sorted
The number of subtrees below any non-leaf node is one more than the number of entries in that node
btrees23 Review of B-tree rules
In any non-leaf node, for any index n:
–the entry at n is greater than all entries in subset[n]
–the entry at n is less than all entries in subset[n+1]
Every leaf node has the same depth
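The rules above can be turned into a small checker. The following is only a sketch: `Node` and `check` are hypothetical names, the node uses `std::vector` instead of the fixed arrays in these slides, and MINIMUM is assumed to be 1.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a B-tree node (not the Set class from the slides).
struct Node {
    std::vector<int> data;      // entries, expected to be sorted
    std::vector<Node*> subset;  // child pointers; empty for a leaf
};

const std::size_t MINIMUM = 1;            // assumed value for illustration
const std::size_t MAXIMUM = 2 * MINIMUM;

// Returns the depth of the leaves below n, or -1 if any rule is broken.
// lo and hi are exclusive bounds inherited from ancestors (null = no bound).
int check(const Node* n, bool is_root, const int* lo, const int* hi) {
    std::size_t c = n->data.size();
    if (c > MAXIMUM) return -1;                   // too many entries
    if (!is_root && c < MINIMUM) return -1;       // too few entries
    if (c == 0 && !n->subset.empty()) return -1;  // empty node with children
    for (std::size_t i = 0; i < c; ++i) {
        if (lo && n->data[i] <= *lo) return -1;
        if (hi && n->data[i] >= *hi) return -1;
        if (i > 0 && n->data[i - 1] >= n->data[i]) return -1;  // not sorted
    }
    if (n->subset.empty()) return 0;              // leaf
    if (n->subset.size() != c + 1) return -1;     // subtrees = entries + 1
    int depth = -2;                               // -2 means "not yet known"
    for (std::size_t i = 0; i <= c; ++i) {
        const int* sub_lo = (i == 0) ? lo : &n->data[i - 1];
        const int* sub_hi = (i == c) ? hi : &n->data[i];
        int d = check(n->subset[i], false, sub_lo, sub_hi);
        if (d < 0) return -1;
        if (depth == -2) depth = d;
        else if (d != depth) return -1;           // leaves at different depths
    }
    return depth + 1;
}
```

Calling `check(root, true, nullptr, nullptr)` on a valid tree returns the depth of its leaves; any violation of the rules gives -1.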
btrees24 Private member variables describe root node
data: an array holding from 0 to MAXIMUM + 1 data entries (the tree is valid when there are 1..MAXIMUM entries)
count: a tally of the number of entries in the data array
btrees25 Private member variables describe root node
subset: an array of pointers to from 0 to MAXIMUM + 2 subtrees (the tree is valid when there are MAXIMUM + 1 or fewer subtrees and the number of subtrees is one greater than the number of data entries)
children: a tally of the number of entries in the subset array
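A minimal sketch of how these four members might be declared. The template parameter name `Item`, the placement of the MINIMUM and MAXIMUM constants inside the class, and the two small accessors are assumptions made for illustration; the slides do not show the class declaration.

```cpp
#include <cstddef>

template <class Item>
class Set {
public:
    Set() : count(0), children(0) {}
    bool is_leaf() const { return children == 0; }
    bool empty() const { return count == 0; }

private:
    static const std::size_t MINIMUM = 1;            // assumed value
    static const std::size_t MAXIMUM = 2 * MINIMUM;

    Item data[MAXIMUM + 1];     // one spare slot for the temporary excess
    std::size_t count;          // number of entries in use in data
    Set* subset[MAXIMUM + 2];   // one spare slot for the temporary excess
    std::size_t children;       // number of pointers in use in subset
};
```

The spare slot in each array is what lets loose insertion overfill a node before the excess is repaired.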
btrees26 Insertion
The insertion function temporarily relaxes the rules, allowing the root node to end up with MAXIMUM + 1 data entries
If that happens, the node is split into three nodes: a new root node with a single entry (the middle one), and two subtrees, each holding MINIMUM entries (and half the subtrees) of the original root node
btrees27 Insertion example
MINIMUM = 1, MAXIMUM = 2
Data entered in this order: 0, 1, 2, 3, 4, 5, 6, 7, 8
Root holds 0, 1, 2: entries exceed MAXIMUM; split node and grow tree upward
Child node has too many entries; split node in two, sending middle entry up to root
Continue adding entries, splitting nodes and growing tree upward when necessary
Regardless of data entry order, tree will remain balanced
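The example can be replayed with a compact, self-contained sketch. It is not the Set class from these slides: `Node` is a hypothetical vector-based node, the split keeps the left half in the existing node instead of allocating two new ones, and the root is replaced through a pointer reference rather than copied into a child.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

const std::size_t MINIMUM = 1;
const std::size_t MAXIMUM = 2 * MINIMUM;

struct Node {                       // hypothetical node type
    std::vector<int> data;
    std::vector<Node*> subset;
    bool is_leaf() const { return subset.empty(); }
};

// Split the over-full child parent->subset[x]; its middle entry moves up.
void fix_excess(Node* parent, std::size_t x) {
    Node* left = parent->subset[x];
    assert(left->data.size() == MAXIMUM + 1);
    Node* right = new Node;
    int middle = left->data[MINIMUM];
    right->data.assign(left->data.begin() + MINIMUM + 1, left->data.end());
    left->data.resize(MINIMUM);
    if (!left->is_leaf()) {         // hand the upper half of the children over
        right->subset.assign(left->subset.begin() + MINIMUM + 1,
                             left->subset.end());
        left->subset.resize(MINIMUM + 1);
    }
    parent->data.insert(parent->data.begin() + x, middle);
    parent->subset.insert(parent->subset.begin() + x + 1, right);
}

// Adds entry below n; n itself may be left with MAXIMUM + 1 entries.
bool loose_insert(Node* n, int entry) {
    std::size_t t = static_cast<std::size_t>(
        std::lower_bound(n->data.begin(), n->data.end(), entry) -
        n->data.begin());
    if (t < n->data.size() && n->data[t] == entry) return false;  // duplicate
    if (n->is_leaf()) {
        n->data.insert(n->data.begin() + t, entry);
        return true;
    }
    bool added = loose_insert(n->subset[t], entry);
    if (n->subset[t]->data.size() > MAXIMUM) fix_excess(n, t);
    return added;
}

// root is taken by reference because the tree grows upward.
bool insert(Node*& root, int entry) {
    if (!loose_insert(root, entry)) return false;
    if (root->data.size() > MAXIMUM) {
        Node* new_root = new Node;
        new_root->subset.push_back(root);
        fix_excess(new_root, 0);
        root = new_root;
    }
    return true;
}

int depth(const Node* n) { return n->is_leaf() ? 0 : 1 + depth(n->subset[0]); }

void destroy(Node* n) {
    for (Node* c : n->subset) destroy(c);
    delete n;
}
```

Starting from an empty `Node` and inserting 0 through 8 in order, this sketch ends with the single entry 3 in the root and every leaf two levels below it.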
btrees28 Code for insertion
template <class item>
bool Set<item>::insert(const item& entry)
{
    // do loose_insert; if entry added, check for excess
    if (loose_insert(entry))   // returns false if entry already in tree
    {
        if (count > MAXIMUM)
        {
            // copy info from root
            Set* child = new Set;
            for (int x = 0; x < count; x++)
                child->data[x] = data[x];
            for (int y = 0; y < children; y++)
                child->subset[y] = subset[y];
btrees29 Code for insertion
            child->children = children;
            child->count = count;
            // clear root node
            count = 0;
            children = 1;
            // former root becomes child of new root
            subset[0] = child;
            fix_excess(0);   // split node to restore B-tree
        } // ends inner if
        return true;   // insertion succeeded
    } // ends outer if
    return false;   // if loose_insert failed, so did insert
}
btrees210 Helper functions: loose_insert and fix_excess
loose_insert actually adds a data entry to the B-tree; may leave the root with too many entries
fix_excess takes care of a problem child node by splitting it into two subtrees and sending the middle data item up to the root of the current subtree
btrees211 Code for loose_insert
template <class item>
bool Set<item>::loose_insert(const item& entry)
{
    int t;
    // find first item in data >= entry; save the index
    for (t = 0; (t < count && data[t] < entry); t++)
        ;
    if (t < count && data[t] == entry)
    {
        // entry already in set -- not inserted
        cout << data[t] << " already in set" << endl;
        return false;
    }
btrees212 Code for loose_insert
    if (is_leaf())   // entry not found, root has no subtrees
    {
        // add new entry at root --
        // shift data right to make room for new entry
        for (int x = count; x > t; x--)
            data[x] = data[x-1];
        count++;
        data[t] = entry;
        return true;   // entry was inserted
    }
btrees213 Code for loose_insert
    else   // entry wasn't found and node has children
    {
        // do loose_insert on appropriate subtree
        bool added = subset[t]->loose_insert(entry);
        // if loose_insert results in excess entries
        // in subtree, split node in two and add
        // middle data entry to subtree’s root
        if (subset[t]->count > MAXIMUM)
            fix_excess(t);
        return added;
    } // end else
} // end loose_insert function
btrees214 Code for fix_excess
template <class item>
void Set<item>::fix_excess(int x)   // x is the index of the over-full child
{
    int ct;
    // copy middle entry of child to root,
    // first making room in data array
    for (ct = count; ct > x; ct--)
        data[ct] = data[ct-1];
    data[x] = subset[x]->data[MINIMUM];
    count++;
btrees215 Code for fix_excess
    // split node in 2:
    Set *left, *right;   // will hold child's old entries
    left = new Set;      // allocate memory for
    right = new Set;     // new sets
    left->count = MINIMUM;
    right->count = MINIMUM;
    for (ct = 0; ct < MINIMUM; ct++)   // copy data to new nodes
    {
        left->data[ct] = subset[x]->data[ct];
        right->data[ct] = subset[x]->data[ct+MINIMUM+1];
    }
btrees216 Code for fix_excess
    if (!(subset[x]->is_leaf()))   // copy subsets if any exist
    {
        int chct = (subset[x]->children) / 2;
        for (ct = 0; ct < chct; ct++)
        {
            left->subset[ct] = subset[x]->subset[ct];
            right->subset[ct] = subset[x]->subset[ct+chct];
        }
        left->children = MINIMUM + 1;
        right->children = MINIMUM + 1;
    }
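btrees217 Code for fix_excess
    // make room for new subset in root’s array of subsets
    // (no allocation needed here: the shift fills the new slot)
    Set* old_child = subset[x];   // the node that was just split
    for (ct = children; ct > x; ct--)
        subset[ct] = subset[ct-1];
    children++;
    // attach new subtrees to root node
    subset[x] = left;
    subset[x+1] = right;
    // release the split node; zero its tallies first, as fix_shortage does,
    // on the assumption that the destructor (not shown) may free subtrees
    old_child->count = 0;
    old_child->children = 0;
    delete old_child;
} // ends fix_excess function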
btrees218 Removing a B-tree entry
Four functions involved; three are analogous to insertion functions:
–remove: public function -- performs “loose” remove, then other functions as necessary to restore B-tree
–loose_remove: performs actual removal of data entry; may leave B-tree invalid, with root node having 0 entries or a subtree root having MINIMUM-1 entries
btrees219 Removing a B-tree entry
Additional removal functions:
–fix_shortage: deals with the problem of a subtree’s root having MINIMUM-1 entries
–remove_largest: helper function called by loose_remove to ensure that root node contains children-1 data entries; works by copying largest data value from a subtree into root
btrees220 Pseudocode for public remove function
template <class item>
bool Set<item>::remove(const item& target)
{
    if (!loose_remove(target))
        return false;   // target not found
    if (count == 0 && children == 1)
    {
        // root was emptied by loose_remove: shrink the tree by:
        // - setting temporary pointer to the only child, subset[0]
        // - copying all member variables from temp to root
        // - deleting original child node
    }
    return true;   // target was removed
}
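The shrinking step can be sketched as follows. This is an illustration under different assumptions from the pseudocode: `Node` is a hypothetical vector-based node, and the root is held through a pointer, so the only child is promoted by swapping pointers instead of copying its members into the root.

```cpp
#include <cstddef>
#include <vector>

struct Node {                       // hypothetical node type
    std::vector<int> data;
    std::vector<Node*> subset;
};

// If a loose removal left the root with no entries and exactly one child,
// promote that child. Returns true when the tree lost a level.
bool shrink_root(Node*& root) {
    if (!root->data.empty() || root->subset.size() != 1) return false;
    Node* only_child = root->subset[0];
    delete root;        // this Node owns nothing, so only one node is freed
    root = only_child;
    return true;
}
```

Either approach removes one level from the top of the tree, so all leaves stay at the same depth.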
btrees221 Pseudocode for loose_remove
template <class item>
bool Set<item>::loose_remove(const item& target)
{
    find first index such that data[index] >= target;
    if no such index found, index = count
    if (target not found and is_leaf())
        return false;
    if (target found and is_leaf())
    {
        remove target from data array;
        shift contents to the left and decrement count
        return true;
    }
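btrees222 Pseudocode for loose_remove
    if (target not found and root has children)
    {
        // the target may still be missing from the subtree,
        // so pass the recursive result back to the caller
        bool removed = subset[index]->loose_remove(target);
        if (subset[index]->count < MINIMUM)
            fix_shortage(index);
        return removed;
    }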
btrees223 Pseudocode for loose_remove
    if (target found and root has children)
    {
        // overwrite data[index] with the largest entry of subset[index]
        subset[index]->remove_largest(data[index]);
        if (subset[index]->count < MINIMUM)
            fix_shortage(index);
        return true;
    }
} // end loose_remove function
btrees224 Action of fix_shortage function
In order to remedy a shortage of entries in subset[n], do one of the following:
–borrow an entry from the node’s left neighbor (subset[n-1]) or right neighbor (subset[n+1]) if either of these two has more than MINIMUM entries
–combine subset[n] with either of its neighbors if they don’t have excess entries to give
btrees225 Pseudocode for fix_shortage
template <class item>
void Set<item>::fix_shortage(int x)
{
    if (x > 0 and subset[x-1]->count > MINIMUM)   // left neighbor exists and can spare an entry
    {
        shift existing entries in subset[x] over one, copy data[x-1] to subset[x]->data[0] and increment subset[x]->count
        data[x-1] = last item in subset[x-1]->data and decrement subset[x-1]->count
        if (!(subset[x-1]->is_leaf()))
            transfer last child of subset[x-1] to front of subset[x], incrementing subset[x]->children and decrementing subset[x-1]->children
    }
btrees226 Example 1 for fix_shortage MINIMUM = 2 x = 1
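One concrete instance of this case, as a sketch. The entry values are invented for illustration, and `Node` and `borrow_from_left` are hypothetical names; the node uses vectors rather than the arrays and tallies of the Set class.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Node {                       // hypothetical node type
    std::vector<int> data;
    std::vector<Node*> subset;
    bool is_leaf() const { return subset.empty(); }
};

// subset[x] is one entry short; its left neighbor has more than MINIMUM.
void borrow_from_left(Node* parent, std::size_t x) {
    assert(x > 0 && x < parent->subset.size());
    Node* left = parent->subset[x - 1];
    Node* shorty = parent->subset[x];
    // the parent's separating entry moves down to the front of the short node
    shorty->data.insert(shorty->data.begin(), parent->data[x - 1]);
    // the left neighbor's last entry moves up to replace it
    parent->data[x - 1] = left->data.back();
    left->data.pop_back();
    if (!left->is_leaf()) {
        // the left neighbor's last child follows its entry across
        shorty->subset.insert(shorty->subset.begin(), left->subset.back());
        left->subset.pop_back();
    }
}
```

With MINIMUM = 2, a parent holding 10 and 20 over children (1, 2, 3), (15) and (25, 26), and x = 1, the call leaves the parent holding 3 and 20 over (1, 2), (10, 15) and (25, 26).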
btrees231 Pseudocode for fix_shortage
    else if (x+1 < children and subset[x+1]->count > MINIMUM)   // right neighbor exists and can spare an entry
    {
        increment subset[x]->count and copy data[x] to subset[x]->data[subset[x]->count-1]
        data[x] = subset[x+1]->data[0] and shift entries in subset[x+1]->data to the left and decrement subset[x+1]->count
        if (!(subset[x+1]->is_leaf()))
            transfer first child of subset[x+1] to end of subset[x], incrementing subset[x]->children and decrementing subset[x+1]->children
    }
btrees232 Example 2 for fix_shortage MINIMUM = 2 x = 1
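A sketch of this case with invented entry values. `Node` and `borrow_from_right` are hypothetical names, and the node is vector-based rather than the Set class itself.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Node {                       // hypothetical node type
    std::vector<int> data;
    std::vector<Node*> subset;
    bool is_leaf() const { return subset.empty(); }
};

// subset[x] is one entry short; its right neighbor has more than MINIMUM.
void borrow_from_right(Node* parent, std::size_t x) {
    assert(x + 1 < parent->subset.size());
    Node* shorty = parent->subset[x];
    Node* right = parent->subset[x + 1];
    // the parent's separating entry moves down to the end of the short node
    shorty->data.push_back(parent->data[x]);
    // the right neighbor's first entry moves up to replace it
    parent->data[x] = right->data.front();
    right->data.erase(right->data.begin());
    if (!right->is_leaf()) {
        // the right neighbor's first child follows its entry across
        shorty->subset.push_back(right->subset.front());
        right->subset.erase(right->subset.begin());
    }
}
```

With MINIMUM = 2, a parent holding 10 and 20 over children (1, 2), (15) and (25, 26, 27), and x = 1, the call leaves the parent holding 10 and 25 over (1, 2), (15, 20) and (26, 27).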
btrees237 Pseudocode for fix_shortage
    else if (x > 0 and subset[x-1]->count == MINIMUM)   // left neighbor exists but has nothing to spare
    {
        add data[x-1] to the end of subset[x-1]->data
        shift root's data array leftward to close the gap at data[x-1], decrementing count and incrementing subset[x-1]->count
        transfer all data items and children from subset[x] to end of subset[x-1]; update values of subset[x-1]->count and subset[x-1]->children, and set subset[x]->count and subset[x]->children to 0
        delete subset[x] and shift subset array to the left and decrement children
    }
btrees238 Example 3 for fix_shortage MINIMUM = 2 x = 1
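A sketch of combining with the left neighbor, using invented entry values. `Node` and `merge_with_left` are hypothetical names; because this vector-based `Node` has no destructor that frees children, the emptied node can be deleted without first zeroing any tallies.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Node {                       // hypothetical node type
    std::vector<int> data;
    std::vector<Node*> subset;
};

// subset[x] is one entry short and its left neighbor has exactly MINIMUM.
void merge_with_left(Node* parent, std::size_t x) {
    assert(x > 0 && x < parent->subset.size());
    Node* left = parent->subset[x - 1];
    Node* shorty = parent->subset[x];
    // the separating entry moves down to the end of the left neighbor
    left->data.push_back(parent->data[x - 1]);
    parent->data.erase(parent->data.begin() + (x - 1));
    // everything in the short node is appended to the left neighbor
    left->data.insert(left->data.end(),
                      shorty->data.begin(), shorty->data.end());
    left->subset.insert(left->subset.end(),
                        shorty->subset.begin(), shorty->subset.end());
    // the short node is now empty and is dropped from the parent
    parent->subset.erase(parent->subset.begin() + x);
    delete shorty;
}
```

With MINIMUM = 2, a parent holding 10 and 20 over children (1, 2), (15) and (25, 26), and x = 1, the call leaves the parent holding 20 over (1, 2, 10, 15) and (25, 26). The parent itself may now be short, which its own caller has to repair.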
btrees243 Pseudocode for fix_shortage
    else
    {
        combine subset[x] with subset[x+1] -- work is similar to previous combination operation:
        borrow an entry (data[x]) from root and add to end of subset[x]
        transfer all private members from subset[x+1] to subset[x], and zero out subset[x+1]’s children and count variables
        delete subset[x+1] and update root’s subset information
    }
} // ends fix_shortage function
btrees244 Example 4 for fix_shortage MINIMUM = 2 x = 0
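A sketch of combining with the right neighbor when x = 0, so no left neighbor exists. The entry values are invented, and `Node` and `merge_with_right` are hypothetical names for a vector-based node.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Node {                       // hypothetical node type
    std::vector<int> data;
    std::vector<Node*> subset;
};

// subset[x] is one entry short and its right neighbor has exactly MINIMUM.
void merge_with_right(Node* parent, std::size_t x) {
    assert(x + 1 < parent->subset.size());
    Node* shorty = parent->subset[x];
    Node* right = parent->subset[x + 1];
    // the separating entry moves down to the end of the short node
    shorty->data.push_back(parent->data[x]);
    parent->data.erase(parent->data.begin() + x);
    // everything in the right neighbor is appended to the short node
    shorty->data.insert(shorty->data.end(),
                        right->data.begin(), right->data.end());
    shorty->subset.insert(shorty->subset.end(),
                          right->subset.begin(), right->subset.end());
    // the right neighbor is now empty and is dropped from the parent
    parent->subset.erase(parent->subset.begin() + x + 1);
    delete right;
}
```

With MINIMUM = 2, a parent holding 10 and 20 over children (5), (15, 16) and (25, 26), and x = 0, the call leaves the parent holding 20 over (5, 10, 15, 16) and (25, 26).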