CSC 213 – Large Scale Programming Lecture 37: External Caching & (a,b)-Trees
Today’s Goal
  Look at advanced Tree structures
    Part of most databases, operating systems
    Anywhere there is a lot of data to be held
  Already examined related (2,4) trees
    Now look at more general definition
    Also examine why we should care
Lies My Professor Told Me
  Big-Oh notation not always accurate
    For example, treats all memory accesses equally
  But many different memories inside machine
    Organized in a pyramid
    Higher == faster
    Lower == cheaper (cheaper also means more memory available)
  Pyramid, top to bottom
    register
    L1 cache
    L2 cache
    main memory (RAM)
    hard drive
Hierarchy In Perspective
  Suppose the processor needs a beverage
    Registers -- Drink from the mug in its hand
    L1 Cache -- Get from a case in the fridge
    L2 Cache -- Get from tapped barrel in the cellar
    Main memory -- Purchase at the corner Wilson Farms
    Hard drive -- Drive to closest brewery & buy vat
    Network -- Go to Germany & buy Bavaria
Waiting Is a Pain
Not All Accesses Are Equal
  Want to keep accesses from reaching the lowest (slowest) levels
    Easy when we are only using a few objects
    Difficult when working with non-trivial data sets
  Two common approaches to avoid the wait
    Caching -- hold data from hard drive in RAM
      Usually stores most recently or frequently used data
    Locality -- organize data to limit amount used
      Store related data together to improve cache effectiveness
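The caching approach can be sketched in a few lines. This is a minimal sketch, assuming a least-recently-used eviction policy; the `LRUCache` class and the `slow_read` callback are hypothetical names standing in for RAM and a hard-drive read.

```python
from collections import OrderedDict

class LRUCache:
    """Tiny least-recently-used cache (hypothetical stand-in for RAM over disk)."""

    def __init__(self, capacity, slow_read):
        self.capacity = capacity
        self.slow_read = slow_read       # assumed function that fetches from "disk"
        self.data = OrderedDict()        # oldest entry first, newest entry last
        self.misses = 0

    def get(self, key):
        if key in self.data:
            self.data.move_to_end(key)   # mark as most recently used
            return self.data[key]
        self.misses += 1                 # had to take the slow trip
        value = self.slow_read(key)
        self.data[key] = value
        if len(self.data) > self.capacity:
            self.data.popitem(last=False)  # evict the least recently used entry
        return value

cache = LRUCache(capacity=2, slow_read=lambda k: k * 10)
results = [cache.get(k) for k in (1, 2, 1, 3, 2)]
print(results, cache.misses)  # [10, 20, 10, 30, 20] 4
```

Only the third access is a hit: key 2 gets evicted when key 3 arrives, so asking for 2 again means another slow read.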
Virtual Memory
  “Extends” RAM by using space on hard drive
    Big win if we rarely access the material on disk
    Incredibly slow if always stuck driving to brewery
  Works by dividing memory into pages
    Each page is a constant size (usually 4096 bytes)
    Operating system handles memory at page level
    Limits overhead and maximizes efficiency
  Evicts unused pages to the hard drive for storage
    Reloads a page when it is next accessed
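To see why layout matters at the page level, here is a rough sketch that counts how many pages a set of accesses touches. The helper names and the sample addresses are made up for illustration; only the 4096-byte page size comes from the slide.

```python
PAGE_SIZE = 4096  # bytes, the usual page size

def page_of(address):
    """Return (page number, offset within that page) for a byte address."""
    return divmod(address, PAGE_SIZE)

def pages_touched(addresses):
    """Count the distinct pages a sequence of byte addresses falls in."""
    return len({addr // PAGE_SIZE for addr in addresses})

nearby = [8192 + 16 * i for i in range(100)]       # 100 accesses packed together
scattered = [PAGE_SIZE * 7 * i for i in range(100)]  # 100 accesses spread far apart

print(page_of(4100))             # (1, 4)
print(pages_touched(nearby))     # 1
print(pages_touched(scattered))  # 100
```

Same number of accesses, but the scattered ones could need up to 100 page reloads if those pages had been evicted.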
Problems with Binary Trees
  Good way to organize information
    Provides consistent O(log n) processing times
  Organization is very bad for locality, however
    Nodes contain only 1 piece of data
    Must then jump to one of its two children
    Nodes can get randomly spread over heap
    Good torture test for roommate’s computer
  (2,4) trees provide some improvement
    Still have at most 3 elements & 4 children
    Does not use anything like the 4096 bytes in a page
(a,b) Trees to the Rescue!
  Real-world solution to killing disks by paging
    Used by Linux & macOS to track files & directories
    Organization used by MySQL & other databases
    Found in many other places where paging occurs
  (2,4) trees are one example of these
  Can also create others, just follow the rules
    All leaves are found at same level of the tree
    All internal nodes but root have at least a children
    All internal nodes have at most b children
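The three rules can be checked mechanically. The sketch below is one possible checker, not a full tree implementation; the `Node` class is hypothetical, and it assumes a leaf is simply a node with no children.

```python
class Node:
    """Hypothetical node: sorted keys plus child list (empty list == leaf)."""
    def __init__(self, keys, children=()):
        self.keys = list(keys)
        self.children = list(children)

def leaf_depths(node, depth=0):
    """Collect the set of depths at which leaves appear."""
    if not node.children:
        return {depth}
    depths = set()
    for child in node.children:
        depths |= leaf_depths(child, depth + 1)
    return depths

def is_ab_tree(root, a, b):
    """Check the three (a,b)-tree rules listed above."""
    def check(node, is_root):
        if not node.children:
            return True                       # leaves have no child-count rule
        n = len(node.children)
        if n > b or (not is_root and n < a):  # too many, or too few for non-root
            return False
        return all(check(c, False) for c in node.children)
    return len(leaf_depths(root)) == 1 and check(root, True)

good = Node([10], [Node([5]), Node([15])])
uneven = Node([10], [Node([5]), Node([15], [Node([12]), Node([20])])])
too_wide = Node([1, 2, 3, 4], [Node([0]) for _ in range(5)])

print(is_ab_tree(good, 2, 4))      # True
print(is_ab_tree(uneven, 2, 4))    # False: leaves at two different levels
print(is_ab_tree(too_wide, 2, 4))  # False: 5 children exceeds b == 4
```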
Improving Locality
  For (2,4) trees, a == 2 and b == 4
    Process of splitting and merging nodes still holds
    We only vary the number of children in a Node
  Minimize paging using good size for a & b
    Store all the elements in an additional dictionary
    Make sure a full node, including dictionary and child references, fills a page
    Limit number of nearly empty pages by selecting reasonable value for a
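As a back-of-the-envelope sketch of picking b so a full node fills a page: the 8-byte key and 8-byte child reference sizes below are assumptions, any per-node overhead is ignored, and taking a as half of b is just one plausible choice.

```python
def max_children(page_bytes, key_bytes, ref_bytes):
    """Largest b such that a full node fits in one page.

    A node with b children stores b - 1 keys, so we need
    (b - 1) * key_bytes + b * ref_bytes <= page_bytes.
    """
    return (page_bytes + key_bytes) // (key_bytes + ref_bytes)

b = max_children(page_bytes=4096, key_bytes=8, ref_bytes=8)  # assumed sizes
a = b // 2   # assumption: keep every non-root node at least about half full

full_node_bytes = (b - 1) * 8 + b * 8
print(a, b, full_node_bytes)  # 128 256 4088
```

A full node uses 4088 of the 4096 bytes, and one more child would not fit.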
Insertion
  Always insert data into a leaf node
  Once inserted, check for overflow!
    Node now holds more elements than allowed
  Example: insert(30)
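A minimal sketch of the leaf step, assuming a leaf is just a sorted Python list of keys and that a node with at most b children holds at most b - 1 keys. The keys already in the leaf are made up; the slide's figure is not reproduced here.

```python
import bisect

def insert_into_leaf(keys, key, b):
    """Insert key into a leaf's sorted key list; return True if it now overflows."""
    bisect.insort(keys, key)      # keep the leaf's keys in sorted order
    return len(keys) > b - 1      # at most b - 1 keys are allowed

leaf = [27, 32, 35]               # hypothetical leaf contents in a (2,4) tree
overflow = insert_into_leaf(leaf, 30, b=4)
print(leaf, overflow)  # [27, 30, 32, 35] True
```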
Split In Case Of Overflow
  Split overflowing Node into 2 new Nodes
    Promote median element to the parent Node
    Divide remaining elements between the two new Nodes
  This may cause parent Node to overflow
    So must repeat the process until we hit the root
    If the root node overflows, we create a new root!
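The split step might look roughly like this. It is a sketch for leaves only, with a parent represented as two parallel lists (a hypothetical layout); splitting an internal node would also have to divide its children between the two new nodes.

```python
def split_leaf(parent_keys, parent_children, i):
    """Split the overflowing leaf parent_children[i] and promote its median."""
    keys = parent_children[i]
    mid = len(keys) // 2
    left, median, right = keys[:mid], keys[mid], keys[mid + 1:]
    parent_keys.insert(i, median)             # median now separates the halves
    parent_children[i:i + 1] = [left, right]  # one node becomes two

parent_keys = [20, 40]                        # made-up example data
parent_children = [[5, 10], [27, 30, 32, 35], [50]]
split_leaf(parent_keys, parent_children, 1)

print(parent_keys)      # [20, 32, 40]
print(parent_children)  # [[5, 10], [27, 30], [35], [50]]
```

The parent gained a key and a child, so the caller would check it for overflow next and repeat upward if needed.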
Parent Overflow Example: insert(29)
Underflow and Fusion
  Deleting Entry may cause underflow
  Two possible solutions depending on situation
  Example: remove(15)
Case 1: Transfer
  Has adjacent sibling with elements to spare
    Steal closest Entry from parent & sibling’s child
    Parent takes sibling’s closest Entry
    We’re done
  Example: remove(10)
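One way to picture a transfer, as a sketch for leaves with the sibling on the right. The list-based layout and the sample keys are assumptions; for internal nodes the sibling's closest child would move over as well.

```python
def transfer_from_right(parent_keys, children, i):
    """Refill underflowed leaf children[i] using its right sibling."""
    node, sibling = children[i], children[i + 1]
    node.append(parent_keys[i])        # separator Entry comes down from parent
    parent_keys[i] = sibling.pop(0)    # sibling's closest Entry moves up

parent_keys = [10]                     # made-up example data
children = [[], [12, 15]]              # left leaf was just emptied by a removal
transfer_from_right(parent_keys, children, 0)

print(parent_keys, children)  # [12] [[10], [15]]
```

The parent keeps the same number of keys, which is why no further fix-up is needed.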
Case 2: Fusion
  Emptied node has siblings of minimum size
    Merge node & sibling into one
    Steal Entry from parent that was between siblings
    May propagate underflow to parent!
  Example: remove(15)
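Fusion can be sketched the same way, again for leaves only and with a hypothetical list-based layout and made-up keys. The function returns the parent's remaining key count so the caller can decide whether the underflow has propagated.

```python
def fuse_with_right(parent_keys, children, i):
    """Merge leaf children[i] with its right sibling, pulling down the separator."""
    separator = parent_keys.pop(i)                     # Entry between the siblings
    merged = children[i] + [separator] + children[i + 1]
    children[i:i + 2] = [merged]                       # two nodes become one
    return len(parent_keys)                            # caller checks for underflow

parent_keys = [10, 20]                                 # made-up example data
children = [[5], [], [25]]                             # middle leaf was emptied
remaining = fuse_with_right(parent_keys, children, 1)

print(parent_keys, children, remaining)  # [10] [[5], [20, 25]] 1
```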
For Next Lecture
  Look at most popular version of (a,b) Tree
    How a BTree is implemented
    Ways of reading and writing these trees to disk