Presentation is loading. Please wait.

Presentation is loading. Please wait.

CSE 326 Killer Bee-Trees David Kaplan Dept of Computer Science & Engineering Autumn 2001 Where was that root?

Similar presentations


Presentation on theme: "CSE 326 Killer Bee-Trees David Kaplan Dept of Computer Science & Engineering Autumn 2001 Where was that root?"— Presentation transcript:

1 CSE 326 Killer Bee-Trees David Kaplan Dept of Computer Science & Engineering Autumn 2001 Where was that root?

2 B-TreesCSE 326 Autumn 20012 Beyond Binary Trees  One of the most important applications for search trees is databases  If the DB is small enough to fit into RAM, almost any scheme for balanced trees (e.g. AVL) is okay 1980 RAM – 1MB DB – 100 MB 2000 (WalMart) RAM – 1,000 MB (gigabyte) DB – 1,000,000 MB (terabyte) gap between disk and main memory growing!

3 B-TreesCSE 326 Autumn 20013 Time Gap  For many corporate and scientific databases, the search tree must mostly be on disk  Accessing disk 200,000x slower than RAM!  Visiting node  accessing disk  Even perfectly balanced binary trees a disaster! log 2 ( 10,000,000 ) = 24 disk accesses  What matters here? The relative speeds of memory and disk operations Goal: Decrease Height of Tree

4 B-TreesCSE 326 Autumn 20014 Food Pyramid for Silicon-Based Lifeforms ferro-magnetic off-chip silicon on-chip silicon speed space memory cache disk “seams”

5 B-TreesCSE 326 Autumn 20015 Interpreting the Food Pyramid Speed becomes plentiful as we move up the pyramid Space gets (dirt) cheap as we move down the pyramid Physical seams  Disk access ~10 5 x slower than memory access  Memory access ~10 2 x slower than cache access Computational seam  Serialization/de-serialization can impose ~10x penalty Seams argue for data structures like B-Trees, because seams make the constants matter!

6 B-TreesCSE 326 Autumn 20016 M-ary Search Tree  Maximum branching factor of M  Complete tree has depth = log M N  Each internal node in a complete tree has M - 1 keys runtime:

7 B-TreesCSE 326 Autumn 20017 B-Trees  B-Trees are specialized M-ary search trees  Each node has many keys  subtree between two keys x and y contains values v such that x  v < y  binary search within a node to find correct subtree  Each node takes one full {page, block, line} of memory 371221 x<3 3  x<77  x<1212  x<2121  x

8 B-TreesCSE 326 Autumn 20018 B-Tree Properties ‡  Properties  maximum branching factor of M  the root has between 2 and M children or at most L keys  other internal nodes have between  M/2  and M children  internal nodes contain only search keys (no data)  smallest datum between search keys x and y equals x  each (non-root) leaf contains between  L/2  and L keys  all leaves are at the same depth  Result  tree is  (log n) deep  all operations run in  (log n) time  operations pull in about M or L items at a time ‡ Technically, B + -Trees

9 B-TreesCSE 326 Autumn 20019 B-Tree Properties  Properties  maximum branching factor of M  the root has between 2 and M children or at most L keys  other internal nodes have between  M/2  and M children  internal nodes contain only search keys (no data)  smallest datum between search keys x and y equals x  each (non-root) leaf contains between  L/2  and L keys  all leaves are at the same depth  Result  tree is  (log n) deep  all operations run in  (log n) time  operations pull in about M or L items at a time

10 B-TreesCSE 326 Autumn 200110 B-Tree Properties  Properties  maximum branching factor of M  the root has between 2 and M children or at most L keys  other internal nodes have between  M/2  and M children  internal nodes contain only search keys (no data)  smallest datum between search keys x and y equals x  each (non-root) leaf contains between  L/2  and L keys  all leaves are at the same depth  Result  tree is  (log n) deep  all operations run in  (log n) time  operations pull in about M or L items at a time

11 B-TreesCSE 326 Autumn 200111 B-Tree Properties  Properties  maximum branching factor of M  the root has between 2 and M children or at most L keys  other internal nodes have between  M/2  and M children  internal nodes contain only search keys (no data)  smallest datum between search keys x and y equals x  each (non-root) leaf contains between  L/2  and L keys  all leaves are at the same depth  Result  tree is  (log n) deep  all operations run in  (log n) time  operations pull in about M or L items at a time

12 B-TreesCSE 326 Autumn 200112 When constants matter, or … When Big-O is Not Enough B-Tree is about log M/2 n/(L/2) deep = log M/2 n - log M/2 L/2 = O(log M/2 n) = O(log n) steps per operation (same as BST!) Where’s the beef?! log 2 ( 10,000,000 ) = 24 disk accesses log 200/2 ( 10,000,000 ) < 4 disk accesses Recall: disk access ~ 200,000x slower than RAM access, so 4 vs. 24 disk accesses is huge!

13 B-TreesCSE 326 Autumn 200113 … __ k1k1 k2k2 … kiki B-Tree Node Structure Internal node  i search keys; i+1 subtrees; M - i - 1 inactive entries Leaf  j data keys; L - j inactive entries k1k1 k2k2 … kjkj … __ 12M - 1 12L i j

14 B-TreesCSE 326 Autumn 200114 Example B-Tree B-Tree with M = 4 and L = 4 12 3569101112 1517 202526 303233364042 506070 1040 3 15203050

15 B-TreesCSE 326 Autumn 200115 Making a B-Tree The empty B-Tree M = 3 L = 2 3 Insert(3) 314 Insert(14) Now, Insert(1)?

16 B-TreesCSE 326 Autumn 200116 Splitting the Root And create a new root 13 14 13 13 3 Insert(1) Too many keys in a leaf! So, split the leaf.

17 Insertions and Split Ends Insert(59) 14 13 59 14 13 Insert(26) 14 13 26 59 142659 1459 13142659 And add a new child Too many keys in a leaf! So, split the leaf.

18 Propagating Splits 1459 13142659 1459 13142659 5 135 Insert(5) 514 2659 135 5 5 135 142659 14 Add new child Create a new root Too many keys in an internal node! So, split the node.

19 B-TreesCSE 326 Autumn 200119 B-Tree Insertion  Insert the key in its leaf  If the leaf ends up with L+1 items, overflow!  Split the leaf into two nodes:  original with  (L+1)/2  items  new one with  (L+1)/2  items  Add the new child to the parent  If the parent ends up with M+1 items, overflow!  If an internal node ends up with M+1 items, overflow!  Split the node into two nodes:  original with  (M+1)/2  items  new one with  (M+1)/2  items  Add the new child to the parent  If the parent ends up with M+1 items, overflow!  If root overflows, split it in two, hang new nodes under a new root  Only way tree height can increase

20 B-TreesCSE 326 Autumn 200120 After More Routine Inserts 5 135 142659 14 5 135 265979 5989 14 89 Insert(89) Insert(79)

21 B-TreesCSE 326 Autumn 200121 Deletion 5 135 14265979 5989 14 89 5 135 142679 89 14 89 Delete(59)

22 Deletion and Adoption 5 135 142679 8914 89 Delete(5) ? 13 142679 8914 89 3 133 142679 89 14 89 A leaf has too few keys! So, borrow from a neighbor

23 Deletion with Propagation 3 1 3 142679 8914 89 Delete(3) ? 1 142679 8914 89 1 142679 8914 89 A leaf has too few keys! And no neighbor with surplus! So, delete the leaf But now a node has too few subtrees!

24 B-TreesCSE 326 Autumn 200124 Adopt a neighbor 1 142679 89 14 89 14 1 2679 89 79 89 Finishing the Propagation (More Adoption)

25 B-TreesCSE 326 Autumn 200125 Delete(1) (adopt a neighbor) 14 1 2679 89 79 89 A Bit More Adoption 26 14 26 79 89 79 89

26 Delete(26) 26 14 26 79 89 79 89 Pulling out the Root 14 79 89 79 89 A leaf has too few keys! And no neighbor with surplus! 14 79 89 79 89 So, delete the leaf A node has too few subtrees and no neighbor with surplus! 14 79 89 Delete the leaf But now the root has just one subtree!

27 14 79 89 The root has just one subtree! But that’s silly! 14 79 89 Just make the one child the new root! Pulling out the Root (cont.)

28 B-TreesCSE 326 Autumn 200128 B-Tree Deletion  Remove the key from its leaf  If the leaf ends up with fewer than  L/2  items, underflow!  Adopt data from a neighbor; update the parent  If borrowing won’t work, delete node and divide keys between neighbors  If the parent ends up with fewer than  M/2  items, underflow! Why will dumping keys always work if borrowing doesn’t?

29 B-TreesCSE 326 Autumn 200129 B-Tree Deletion, part Bee  If a node ends up with fewer than  M/2  items, underflow!  Adopt subtrees from a neighbor; update the parent  If borrowing won’t work, delete node and divide subtrees between neighbors  If the parent ends up with fewer than  M/2  items, underflow!  If the root ends up with only one child, make the child the new root of the tree  Only way tree height can decrease

30 B-TreesCSE 326 Autumn 200130 Thinking about B-Trees  B-Tree insertion can cause (expensive) splitting and propagation  Deletion can cause (cheap) borrowing or (expensive) propagation  Repeated insertions and deletion can cause thrashing  Propagation is rare if M and L are large (Why?)  Disk reads/writes occur on:  Find, but finds will often get filled from virtual memory  Insert or Delete, but usually affects only the leaf node (unless we over/underflow)  Overflow or underflow, but we size M and L to make these rare Here’s the beef! M = L = 128, h = 4  128 5 = 30+ billion items! M = L = 128, h = 5  128 6 = 4+ trillion items! … and, disk accesses are bounded by height!

31 B-TreesCSE 326 Autumn 200131 Flavors of B- B-Trees with M=3, L=x are called 2-3 trees B-Trees with M=4, L=x are called 2-3-4 trees (internal nodes have 2, 3 or 4 children) Useful for achieving log M (n) bound, prototyping Disk-based systems typically require much larger M, L Base M, L on {page, block, line} and key sizes Branching factors from 50 – 2000 are common Where was that root? Paged out to disk, I think …

32 B-TreesCSE 326 Autumn 200132 Dictionary Alternatives  BST: fast finds, inserts, and deletes O(log n) on average (if data is random!)  AVL trees: guaranteed O(log n) operations  Splay trees: guaranteed amortized O(log n) operations  B-Trees: guaranteed O(log M n)  Shallower depth better for disk-based databases  Hairy 1 in practice: many cases, intricate code  What would be even better?  How about O(1) finds and inserts? 1 hairy - Slang. Fraught with difficulties; hazardous: a hairy escape; hairy problems.


Download ppt "CSE 326 Killer Bee-Trees David Kaplan Dept of Computer Science & Engineering Autumn 2001 Where was that root?"

Similar presentations


Ads by Google