CPSC 221: Algorithms and Data Structures Lecture #7 Sweet, Sweet Tree Hives (B+-Trees, that is) Steve Wolfman 2010W2.

Slides:



Advertisements
Similar presentations
Advanced Database Discussion B Trees. Motivation for B-Trees So far we have assumed that we can store an entire data structure in main memory What if.
Advertisements

0 Course Outline n Introduction and Algorithm Analysis (Ch. 2) n Hash Tables: dictionary data structure (Ch. 5) n Heaps: priority queue data structures.
Chapter 14 Multi-Way Search Trees
Chapter 23 Multi-Way Search Trees. Chapter Scope Examine 2-3 and 2-4 trees Introduce the concept of a B-tree Example specialized implementations of B-trees.
B+-Trees (PART 1) What is a B+ tree? Why B+ trees? Searching a B+ tree
Data Structures Haim Kaplan and Uri Zwick November 2012 Lecture 5 B-Trees.
CSE332: Data Abstractions Lecture 10: More B-Trees Tyler Robison Summer
CSE332: Data Abstractions Lecture 9: B Trees Dan Grossman Spring 2010.
CSE332: Data Abstractions Lecture 9: BTrees Tyler Robison Summer
CPSC 231 B-Trees (D.H.)1 LEARNING OBJECTIVES Problems with simple indexing. Multilevel indexing: B-Tree. –B-Tree creation: insertion and deletion of nodes.
CSE 326: Data Structures Lecture #11 B-Trees Alon Halevy Spring Quarter 2001.
©Silberschatz, Korth and Sudarshan12.1Database System Concepts Chapter 12: Part B Part A:  Index Definition in SQL  Ordered Indices  Index Sequential.
CS 206 Introduction to Computer Science II 12 / 01 / 2008 Instructor: Michael Eckmann.
B + -Trees (Part 1) Lecture 20 COMP171 Fall 2006.
1 Database indices Database Systems manage very large amounts of data. –Examples: student database for NWU Social Security database To facilitate queries,
B + -Trees (Part 1). Motivation AVL tree with N nodes is an excellent data structure for searching, indexing, etc. –The Big-Oh analysis shows most operations.
Tirgul 6 B-Trees – Another kind of balanced trees Problem set 1 - some solutions.
B + -Trees (Part 1) COMP171. Slide 2 Main and secondary memories  Secondary storage device is much, much slower than the main RAM  Pages and blocks.
CSE 326: Data Structures B-Trees Ben Lerner Summer 2007.
Balanced Trees. Binary Search tree with a balance condition Why? For every node in the tree, the height of its left and right subtrees must differ by.
(B+-Trees, that is) Steve Wolfman 2014W1
Homework #3 Due Thursday, April 17 Problems: –Chapter 11: 11.6, –Chapter 12: 12.1, 12.2, 12.3, 12.4, 12.5, 12.7.
B + -Trees COMP171 Fall AVL Trees / Slide 2 Dictionary for Secondary storage * The AVL tree is an excellent dictionary structure when the entire.
Tirgul 6 B-Trees – Another kind of balanced trees.
CS4432: Database Systems II
Splay Trees and B-Trees
1 B-Trees Section AVL (Adelson-Velskii and Landis) Trees AVL tree is binary search tree with balance condition –To ensure depth of the tree is.
Indexing. Goals: Store large files Support multiple search keys Support efficient insert, delete, and range queries.
IntroductionIntroduction  Definition of B-trees  Properties  Specialization  Examples  2-3 trees  Insertion of B-tree  Remove items from B-tree.
B-Tree. B-Trees a specialized multi-way tree designed especially for use on disk In a B-tree each node may contain a large number of keys. The number.
ICS 220 – Data Structures and Algorithms Week 7 Dr. Ken Cosh.
Spring 2006 Copyright (c) All rights reserved Leonard Wesley0 B-Trees CMPE126 Data Structures.
B-trees (Balanced Trees) A B-tree is a special kind of tree, similar to a binary tree. However, It is not a binary search tree. It is not a binary tree.
B+ Trees COMP
CSE373: Data Structures & Algorithms Lecture 15: B-Trees Linda Shapiro Winter 2015.
1 B-Trees & (a,b)-Trees CS 6310: Advanced Data Structures Western Michigan University Presented by: Lawrence Kalisz.
B-Trees And B+-Trees Jay Yim CS 157B Dr. Lee.
COSC 2007 Data Structures II Chapter 15 External Methods.
B + -Trees. Motivation An AVL tree with N nodes is an excellent data structure for searching, indexing, etc. The Big-Oh analysis shows that most operations.
© 2010 Pearson Addison-Wesley. All rights reserved. Addison Wesley is an imprint of CHAPTER 12: Multi-way Search Trees Java Software Structures: Designing.
IKI 10100: Data Structures & Algorithms Ruli Manurung (acknowledgments to Denny & Ade Azurat) 1 Fasilkom UI Ruli Manurung (Fasilkom UI)IKI10100: Lecture17.
CSE 326 Killer Bee-Trees David Kaplan Dept of Computer Science & Engineering Autumn 2001 Where was that root?
Lecture 11COMPSCI.220.FS.T Balancing an AVLTree Two mirror-symmetric pairs of cases to rebalance the tree if after the insertion of a new key to.
B+ Trees  What if you have A LOT of data that needs to be stored and accessed quickly  Won’t all fit in memory.  Means we have to access your hard.
B-Trees ( Rizwan Rehman) Large degree B-trees used to represent very large dictionaries that reside on disk. Smaller degree B-trees used for internal-memory.
 B-tree is a specialized multiway tree designed especially for use on disk  B-Tree consists of a root node, branch nodes and leaf nodes containing the.
B-TREE. Motivation for B-Trees So far we have assumed that we can store an entire data structure in main memory What if we have so much data that it won’t.
Internal and External Sorting External Searching
CSE 326: Data Structures Lecture #9 Big, Bad B-Trees Steve Wolfman Winter Quarter 2000.
COMP261 Lecture 23 B Trees.
Multiway Search Trees Data may not fit into main memory
Btrees Deletion.
CSE 332 Data Abstractions B-Trees
B+-Trees.
B+ Trees What are B+ Trees used for What is a B Tree What is a B+ Tree
Btrees Insertion.
B+-Trees.
B+-Trees.
B+ Tree.
(2,4) Trees (2,4) Trees 1 (2,4) Trees (2,4) Trees
BTrees.
B+ Trees What are B+ Trees used for What is a B Tree What is a B+ Tree
CSE 326: Data Structures Lecture #9 AVL II
B-Tree.
Multiway Trees Searching and B-Trees Advanced Tree Structures
CSE 373: Data Structures and Algorithms
CSE 373 Data Structures and Algorithms
CSE 373: Data Structures and Algorithms
CSE 326: Data Structures Lecture #10 B-Trees
CS210- Lecture 20 July 19, 2005 Agenda Multiway Search Trees 2-4 Trees
Presentation transcript:

CPSC 221: Algorithms and Data Structures Lecture #7 Sweet, Sweet Tree Hives (B+-Trees, that is) Steve Wolfman 2010W2

Learning Goals After this unit, you should be able to: Describe the structure, navigation and complexity of an order m B-tree. Insert and delete elements from a B+-tree, maintaining the half- full principle. Explain the relationship among the order of a B+-tree, the number of nodes, and the minimum and maximum elements of internal and external nodes. Compare and contrast B+-trees with other data structures. Justify why the number of I/Os becomes a more appropriate complexity measure (than the number of operations/steps) when dealing with larger datasets and their indexing structures (e.g., B+-trees). Describe a B+-Tree and explain the difference between a B-tree and a B+ Tree 2

Today’s Outline Addressing our other problem B+-tree properties Implementing B+-tree insertion and deletion Some final thoughts on B+-trees

Cost of a Database Query (10 years ago… more skewed now!) I/O to CPU ratio is 300!

M-ary Search Tree Maximum branching factor of M Complete tree has depth = log M N Each internal node in a complete tree has M - 1 keys runtime:

Incomplete M-ary Search Tree  Just like a binary tree, though, complete m-ary trees have m 0 nodes, m 0 + m 1 nodes, m 0 + m 1 + m 2 nodes, … What about numbers in between??

B-Trees B-Trees are specialized M-ary search trees Each node has many keys –subtree between two keys x and y contains values v such that x  v < y –binary search within a node to find correct subtree Each node takes one full {page, block, line} of memory ALL the leaves are at the same depth! x<3 3  x<77  x<1212  x<2121  x

Today’s Outline Addressing our other problem B+-tree properties Implementing B+-tree insertion and deletion Some final thoughts on B+-trees

B-Tree Properties Properties –maximum branching factor of M –the root has between 2 and M children or at most L keys/values –other internal nodes have between  M/2  and M children –internal nodes contain only search keys (no data) –smallest datum between search keys x and y equals x –each (non-root) leaf contains between  L/2  and L keys/values –all leaves are at the same depth Result –tree is  (log M n) deep (between log M/2 n and log M n ) –all operations run in  (log M n) time –operations get about M/2 to M or L/2 to L items at a time

B-Tree Properties ‡ Properties –maximum branching factor of M –the root has between 2 and M children or at most L keys/values –other internal nodes have between  M/2  and M children –internal nodes contain only search keys (no data) –smallest datum between search keys x and y equals x –each (non-root) leaf contains between  L/2  and L keys/values –all leaves are at the same depth Result –tree is  (log M n) deep (between log M/2 n and log M n ) –all operations run in  (log M n) time –operations get about M/2 to M or L/2 to L items at a time ‡ These are technically B + -Trees. B-Trees store data at internal nodes.

B-Tree Properties Properties –maximum branching factor of M –the root has between 2 and M children or at most L keys/values –other internal nodes have between  M/2  and M children –internal nodes contain only search keys (no data) –smallest datum between search keys x and y equals x –each (non-root) leaf contains between  L/2  and L keys/values –all leaves are at the same depth Result –tree is  (log M n) deep (between log M/2 n and log M n ) –all operations run in  (log M n) time –operations get about M/2 to M or L/2 to L items at a time

B-Tree Properties Properties –maximum branching factor of M –the root has between 2 and M children or at most L keys/values –other internal nodes have between  M/2  and M children –internal nodes contain only search keys (no data) –smallest datum between search keys x and y equals x –each (non-root) leaf contains between  L/2  and L keys/values –all leaves are at the same depth Result –tree is  (log M n) deep (between log M/2 n and log M n ) –all operations run in  (log M n) time –operations get about M/2 to M or L/2 to L items at a time

Today’s Outline Addressing our other problem B+-tree properties Implementing B+-tree insertion and deletion Some final thoughts on B+-trees

… __ k1k1 k2k2 … kiki B-Tree Nodes Internal node –i search keys; i+1 subtrees; M - i - 1 inactive entries Leaf –j data keys; L - j inactive entries k1k1 k2k2 … kjkj … __ 12M L i j

Example B-Tree with M = 4 and L =

Making a B-Tree The empty B-Tree M = 3 L = 2 3 Insert(3) 314 Insert(14) Now, Insert(1)? B-Tree with M = 3 and L = 2

Splitting the Root And create a new root Insert(1) Too many keys in a leaf! So, split the leaf.

Insertions and Split Ends Insert(59) Insert(26) And add a new child Too many keys in a leaf! So, split the leaf.

Propagating Splits Insert(5) Add new child Create a new root Too many keys in an internal node! So, split the node.

Insertion in Boring Text Insert the key in its leaf If the leaf ends up with L+1 items, overflow! –Split the leaf into two nodes: original with  (L+1)/2  items new one with  (L+1)/2  items –Add the new child to the parent –If the parent ends up with M+1 items, overflow! If an internal node ends up with M+1 items, overflow! –Split the node into two nodes: original with  (M+1)/2  items new one with  (M+1)/2  items –Add the new child to the parent –If the parent ends up with M+1 items, overflow! Split an overflowed root in two and hang the new nodes under a new root This makes the tree deeper!

After More Routine Inserts Insert(89) Insert(79)

Deletion Delete(59)

Deletion and Adoption Delete(5) ? A leaf has too few keys! So, borrow from a neighbor P.S. Parent + neighbour pointers. Expensive? a.Definitely yes b.Maybe yes c.Not sure d.Maybe no e.Definitely no

Deletion with Propagation Delete(3) ? A leaf has too few keys! And no neighbor with surplus! So, delete the leaf But now a node has too few subtrees! WARNING: with larger L, can drop below L/2 without being empty! (Ditto for M.)

Adopt a neighbor Finishing the Propagation (More Adoption)

Delete(1) (adopt a neighbor) A Bit More Adoption

Delete(26) Pulling out the Root A leaf has too few keys! And no neighbor with surplus! So, delete the leaf A node has too few subtrees and no neighbor with surplus! Delete the leaf But now the root has just one subtree!

Pulling out the Root (continued) The root has just one subtree! But that’s silly! Just make the one child the new root! Note: The root really does only get deleted when it has just one subtree (no matter what M is).

Deletion in Two Boring Slides of Text Remove the key from its leaf If the leaf ends up with fewer than  L/2  items, underflow! –Adopt data from a neighbor; update the parent –If borrowing won’t work, delete node and divide keys between neighbors –If the parent ends up with fewer than  M/2  items, underflow! Will dumping keys always work if adoption does not? a.Yes b.It depends c.No

Deletion Slide Two If a node ends up with fewer than  M/2  items, underflow! –Adopt subtrees from a neighbor; update the parent –If borrowing won’t work, delete node and divide subtrees between neighbors –If the parent ends up with fewer than  M/2  items, underflow! If the root ends up with only one child, make the child the new root of the tree This reduces the height of the tree!

Today’s Outline Addressing our other problem B+-tree properties Implementing B+-tree insertion and deletion Some final thoughts on B+-trees

Thinking about B-Trees B-Tree insertion can cause (expensive) splitting and propagation (could we do something like borrowing?) B-Tree deletion can cause (cheap) borrowing or (expensive) deletion and propagation Propagation is rare if M and L are large (Why?) Repeated insertions and deletion can cause thrashing If M = L = 128, then a B-Tree of height 4 will store at least 30,000,000 items

A Tree with Any Other Name FYI: –B-Trees with M = 3, L = x are called 2-3 trees –B-Trees with M = 4, L = x are called trees Why would we ever use these?

Coming Up Graph Theory and Counting