CSE 214 – Computer Science II
B-Trees
Source: http://paulmirocha.com/resources/images/sketchbook/beetreecover.jpg

Coding Exam 3
Friday, 12/4, starting at 2:15
Topics:
- Stacks (using linked lists & arrays)
- Queues (using linked lists & arrays)
- Hash Tables (open addressing & chained)
- Heaps
A sample exam is posted to the schedule page
- Note: past exams have not covered identical topics
- The sample exam only has stacks & queues

Exam Review Session
Thursday, 8 – 9:30 pm
CS 2129

We’ve studied many types of trees already
- Binary Search Trees
- K-ary Trees
- Complete Trees
- Full Trees
- Octrees

There are many, many more
- B-Trees
- Red-Black Trees
- 2-3 Trees
- etc.
There are custom trees for solving many problems

Let’s examine one such tree: B-Trees
- There will be final exam questions on B-Trees (conceptual only)
- Note: this is perhaps the toughest topic we’ll cover this semester
Why is it tough?
- complexity of implementation
- at first glance, the benefits aren’t obvious
Why are we covering it?
- they are an important technology (DBMSs love them)

B-Tree applications
- Foundation for database and file system data management
Why are they used?
- O(log N) amortized time for access, insertion, and deletion
What’s amortized time?
- the average time per operation, measured over a large number of operations

A B-Tree with Letters for data
What characteristics do you notice about this B-Tree?
- it’s sorted
- each node has multiple data points
- nodes may have many children
- the number of children is related to the amount of data a node has
- all leaves are on the same level
Ref: http://cis.stvincent.edu/html/tutorials/swd/btree/btree1.gif

The 6 B-Tree Rules
1. The root can have as few as one element (or no elements if it has no children). Every other node has at least MINIMUM elements.
2. The MAXIMUM number of elements in a node is twice the value of MINIMUM.
3. The elements of each B-Tree node are stored in a partially filled array, sorted from the smallest element (at index 0) to the largest element (at the final used position of the array).
4. The number of sub-trees below a non-leaf node is always one more than the number of elements in the node.
5. For any non-leaf node: (a) an element at index i is greater than all the elements in sub-tree number i of the node, and (b) an element at index i is less than all the elements in sub-tree number i + 1 of the node.
6. Every leaf in a B-Tree has the same depth.

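The six rules can be read as a checklist. Below is a minimal sketch of a checker that walks a tree and tests each rule in turn. It is an illustration only: the class names RuleNode and BTreeRules, the int-only data, and the constructor that copies its arguments into the arrays are all assumptions made for this example, not part of the course code.

```java
import java.util.Arrays;

// Hypothetical minimal node for illustration: int data only.
class RuleNode {
    int[] data;          // partially filled, sorted array (rule 3)
    int dataCount;       // number of used slots in data
    RuleNode[] children; // sub-trees; childCount == 0 means this is a leaf
    int childCount;

    // Arrays are sized for MAXIMUM = 2 * minimum elements, but grow if the
    // caller passes more, so rule-breaking nodes can be built and tested.
    RuleNode(int minimum, int... keys) {
        data = Arrays.copyOf(keys, Math.max(keys.length, 2 * minimum));
        dataCount = keys.length;
        children = new RuleNode[2 * minimum + 1];
    }

    RuleNode withChildren(RuleNode... kids) {
        children = Arrays.copyOf(kids, Math.max(kids.length, children.length));
        childCount = kids.length;
        return this;
    }
}

class BTreeRules {
    // Returns the depth of the leaves below n, or -1 if any rule is broken.
    // lo and hi are exclusive bounds inherited from ancestors (rule 5);
    // null means "no bound on that side".
    static int check(RuleNode n, int minimum, boolean isRoot, Integer lo, Integer hi) {
        boolean leaf = n.childCount == 0;

        // Rule 1: root needs 1 element (0 if it has no children); others need MINIMUM.
        int least = isRoot ? (leaf ? 0 : 1) : minimum;
        if (n.dataCount < least) return -1;

        // Rule 2: at most MAXIMUM = 2 * MINIMUM elements.
        if (n.dataCount > 2 * minimum) return -1;

        // Rule 3 (sorted, smallest first) and the bounds handed down by rule 5.
        for (int i = 0; i < n.dataCount; i++) {
            if (i > 0 && n.data[i - 1] >= n.data[i]) return -1;
            if (lo != null && n.data[i] <= lo) return -1;
            if (hi != null && n.data[i] >= hi) return -1;
        }
        if (leaf) return 0;

        // Rule 4: one more sub-tree than elements.
        if (n.childCount != n.dataCount + 1) return -1;

        int depth = -2; // -2 means "no child checked yet"
        for (int i = 0; i < n.childCount; i++) {
            // Rule 5: sub-tree i lies strictly between data[i - 1] and data[i].
            Integer childLo = (i == 0) ? lo : Integer.valueOf(n.data[i - 1]);
            Integer childHi = (i == n.dataCount) ? hi : Integer.valueOf(n.data[i]);
            int d = check(n.children[i], minimum, false, childLo, childHi);
            if (d < 0) return -1;
            // Rule 6: every leaf has the same depth.
            if (depth == -2) depth = d;
            else if (d != depth) return -1;
        }
        return depth + 1;
    }

    static boolean isValid(RuleNode root, int minimum) {
        return check(root, minimum, true, null, null) >= 0;
    }
}
```

With MINIMUM = 1, a root holding 6 with leaf children [2, 4] and [9] passes every rule; changing the left leaf to [2, 7] breaks rule 5, because 7 is not less than 6.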
A note about MAXIMUM & MINIMUM
These values are selected through tuning
What’s tuning?
- examining performance at runtime
- making appropriate changes to optimize
MINIMUM is selected based on:
- the amount of data for the B-Tree
- the likely operations to perform on the tree
MINIMUM may be in the hundreds or thousands in a database
Selecting MINIMUM affects:
- size of nodes
- depth of tree
- efficiency of various operations

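To see how MINIMUM affects depth, here is a rough sketch that computes the deepest a B-Tree can get for a given number of elements. It assumes the root is at depth 0 and derives its bound from the rules: the sparsest legal tree has a root with 1 element and 2 children, and every other node has MINIMUM elements and MINIMUM + 1 children, which gives 2 * (MINIMUM + 1)^depth - 1 elements. The class and method names are hypothetical.

```java
// Hypothetical helper for illustration; not part of the course code.
class BTreeDepth {
    // Fewest elements a non-empty B-Tree of the given depth can hold:
    // 2 * (minimum + 1)^depth - 1 (root holds 1, every other node holds MINIMUM).
    static long minElements(int minimum, int depth) {
        long power = 1;
        for (int i = 0; i < depth; i++) {
            power *= (minimum + 1);
        }
        return 2 * power - 1;
    }

    // Largest depth (root = depth 0) that a tree with n >= 1 elements can reach.
    // A tree can only be one level deeper if n covers that level's minimum size.
    static int worstCaseDepth(long n, int minimum) {
        int depth = 0;
        while (minElements(minimum, depth + 1) <= n) {
            depth++;
        }
        return depth;
    }
}
```

For one million elements, worstCaseDepth gives 18 when MINIMUM is 1, but only 2 when MINIMUM is 100. The trade-off is that larger nodes cost more to search and shift, which is part of why these values are tuned.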
What would each node need?
- an array of data
- a counter for the data
- an array of child nodes
- a counter for the children
What should our B-Tree’s data be?
- any sortable object
- we can specify our own comparison criteria
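One possible shape for such a node is sketched below. This is an assumption-laden illustration, not the course's implementation: the class name BTreeNode, the field names, the choice of MINIMUM = 2, and the use of a java.util.Comparator for the comparison criteria are all choices made for this example. The addToNode method only shows how the sorted, partially filled array is maintained inside a single node; it does not split full nodes or descend into children.

```java
import java.util.Comparator;

// Hypothetical node layout for illustration only.
class BTreeNode<E> {
    static final int MINIMUM = 2;           // illustrative value; normally tuned
    static final int MAXIMUM = 2 * MINIMUM;

    final Object[] data = new Object[MAXIMUM]; // an array of data
    int dataCount;                             // a counter for the data

    @SuppressWarnings("unchecked")
    final BTreeNode<E>[] children = new BTreeNode[MAXIMUM + 1]; // an array of child nodes
    int childCount;                                             // a counter for the children

    final Comparator<? super E> order; // our own comparison criteria

    BTreeNode(Comparator<? super E> order) {
        this.order = order;
    }

    boolean isLeaf() {
        return childCount == 0;
    }

    @SuppressWarnings("unchecked")
    E get(int i) {
        return (E) data[i];
    }

    // Index of the first element that is not less than target
    // (equals dataCount if every element is smaller).
    int firstIndexNotLess(E target) {
        int i = 0;
        while (i < dataCount && order.compare(get(i), target) < 0) {
            i++;
        }
        return i;
    }

    // Adds element to this node's array, keeping it sorted.
    // Returns false if the node is full or already holds an equal element.
    boolean addToNode(E element) {
        if (dataCount == MAXIMUM) {
            return false;
        }
        int i = firstIndexNotLess(element);
        if (i < dataCount && order.compare(get(i), element) == 0) {
            return false;
        }
        // Shift larger elements one slot right to open a gap at index i.
        System.arraycopy(data, i, data, i + 1, dataCount - i);
        data[i] = element;
        dataCount++;
        return true;
    }
}
```

Passing a different Comparator changes what "sorted" means without touching the node code. For example, new BTreeNode<String>(String.CASE_INSENSITIVE_ORDER) treats "C" and "c" as the same element.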