Haim Kaplan and Uri Zwick November 2014

Presentation transcript:

Data Structures, Lecture 5: B-Trees. Haim Kaplan and Uri Zwick, November 2014

Idealized computation model: CPU and RAM. Each instruction takes one unit of time. Each memory access takes one unit of time.

A more realistic model: CPU, Cache, RAM, Disk. Each level is much larger but much slower. Information is moved in blocks.

A simplified I/O model: CPU, RAM, Disk. Each block is of size B. Count both operations and I/O operations. I/O operations are much more expensive.

Data structures in the I/O model: Linked lists and binary search trees behave poorly in the I/O model, because each pointer followed may cause a cache miss. We need an alternative to binary search trees that is better suited to the I/O model: B-Trees!

A 4-node: 3 keys (10, 25, 42) and a 4-way branch: key < 10, 10 < key < 25, 25 < key < 42, key > 42.

An r-node: r−1 keys $k_0, k_1, \dots, k_{r-2}$ and an r-way branch to children $c_0, c_1, \dots, c_{r-1}$.

B-Trees / (d,2d)-Trees [Bayer-McCreight (1972)]. d is the minimum degree. Each node holds between d−1 and 2d−1 keys (each non-leaf node has between d and 2d children). The root is special: it has between 1 and 2d−1 keys (and between 2 and 2d children, if it is not a leaf). All leaves are at the same depth.

B-Tree with minimal degree d=2: a (2,4)-tree. Example: root (13); internal nodes (4 6 10) and (15 28); leaves (1 3), (5), (7), (11), (14), (16 17), (30 40 50).

Node structure: room for 2d−1 keys and 2d child pointers. r – the actual degree; key[0],…,key[r−2] – the keys; item[0],…,item[r−2] – the associated items; child[0],…,child[r−1] – the children; leaf – is the node a leaf? Possibly a different representation for leaves.
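The node layout described on this slide can be sketched in Python as follows. This is only an illustration: the class name `BTreeNode` and the helper `has_valid_size` are hypothetical, and Python lists stand in for the fixed-size arrays a disk-based implementation would use.

```python
from dataclasses import dataclass, field
from typing import Any, List


@dataclass
class BTreeNode:
    """One B-Tree node; field names follow the slide (key, item, child, leaf)."""
    leaf: bool = True
    key: List[Any] = field(default_factory=list)    # key[0..r-2], kept sorted
    item: List[Any] = field(default_factory=list)   # item[j] is associated with key[j]
    child: List["BTreeNode"] = field(default_factory=list)  # child[0..r-1]; empty in a leaf

    @property
    def r(self) -> int:
        """The actual degree: number of keys plus one."""
        return len(self.key) + 1


def has_valid_size(node: BTreeNode, d: int, is_root: bool = False) -> bool:
    """Check the key-count bounds of a (d,2d)-tree for a single node (hypothetical helper)."""
    min_keys = 1 if is_root else d - 1
    if not (min_keys <= len(node.key) <= 2 * d - 1):
        return False
    # An internal node with r-1 keys must have exactly r children.
    return node.leaf or len(node.child) == node.r


four_node = BTreeNode(key=[10, 25, 42], item=["a", "b", "c"])
print(four_node.r)                         # 4
print(has_valid_size(four_node, d=2))      # True
```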

The height of B-Trees: At depth 1 there are at least 2 nodes. At depth 2 there are at least $2d$ nodes. At depth 3 there are at least $2d^2$ nodes. In general, at depth $h$ there are at least $2d^{h-1}$ nodes.
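Summing these per-level counts gives a bound on the height. The following is a sketch of that standard calculation, assuming $n$ denotes the number of keys, the root holds at least one key, and every other node holds at least $d-1$ keys:

```latex
n \;\ge\; 1 + (d-1)\sum_{i=1}^{h} 2d^{\,i-1}
  \;=\; 1 + 2\,(d^{h}-1)
  \;=\; 2d^{h}-1
\quad\Longrightarrow\quad
h \;\le\; \log_d \frac{n+1}{2}.
```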

B-Trees – What are they good for? Large degree B-trees are used to represent very large dictionaries stored on disks. The minimum degree d is chosen according to the size of a disk block. Smaller degree B-trees are used for internal-memory dictionaries to reduce the number of cache misses. B-trees with d=2, i.e., (2,4)-trees, are very similar to Red-Black trees.

WAVL Trees vs. B-Trees: $n = 2^{30} \approx 10^9$. 30 ≤ height of WAVL Tree ≤ 60, so up to 60 pages are read from disk. The height of a B-Tree with $d = 2^{10} = 1024$ is only 3. Each B-Tree node resides in a block/page, so only 3 (or 4) pages are read from disk. Disk access ≈ 1 millisecond ($10^{-3}$ sec). Memory access ≈ 100 nanoseconds ($10^{-7}$ sec).
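The numbers on this slide can be reproduced with a few lines of arithmetic. This is a rough sketch under stated assumptions: one disk access per node visited, the 1 ms figure from the slide, and node levels counted along a root-to-leaf path using the minimum-fill bound $n \ge 2d^h - 1$. The function name `btree_max_levels` is hypothetical.

```python
n = 2 ** 30          # number of items, about 10**9
d = 2 ** 10          # minimum degree of the B-Tree
DISK_ACCESS_MS = 1   # approximate cost of one disk access, taken from the slide


def btree_max_levels(n: int, d: int) -> int:
    """Largest number of node levels a B-Tree with n keys can have.

    Uses the minimum-fill bound: a tree of height h holds at least 2*d**h - 1 keys.
    """
    h = 0
    while 2 * d ** (h + 1) - 1 <= n:
        h += 1
    return h + 1  # height h means h + 1 levels of nodes


wavl_min_height = n.bit_length() - 1    # log2(n) = 30
wavl_max_height = 2 * wavl_min_height   # the slide's upper bound of 60

btree_levels = btree_max_levels(n, d)

print(wavl_max_height * DISK_ACCESS_MS)  # 60 (ms, worst case for the binary tree)
print(btree_levels * DISK_ACCESS_MS)     # 3 (ms for the B-Tree)
```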

Searching: to look for k in the subtree of node x, first look for k in node x; if it is not there, continue in the appropriate child subtree. Number of I/Os: $\log_d n$. Number of operations: $O(d \log_d n)$. Number of operations with binary search inside each node: $O(\log_2 d \cdot \log_d n) = O(\log_2 n)$.
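A minimal sketch of this search, using binary search inside each node. The `Node` class is a simplified in-memory stand-in (no items, no disk blocks), and the small (2,4)-tree built below is only an illustration.

```python
from bisect import bisect_left


class Node:
    """Simplified B-Tree node: sorted keys plus child pointers (empty for a leaf)."""

    def __init__(self, keys, children=None):
        self.keys = keys
        self.children = children if children is not None else []

    @property
    def leaf(self):
        return not self.children


def search(x, k):
    """Return (node, index) such that node.keys[index] == k, or None if k is absent."""
    while True:
        i = bisect_left(x.keys, k)        # binary search in the node: O(log d) operations
        if i < len(x.keys) and x.keys[i] == k:
            return x, i
        if x.leaf:
            return None
        x = x.children[i]                 # descending one level costs one I/O


# A small (2,4)-tree built by hand.
root = Node([13], [
    Node([4, 6, 10], [Node([1, 3]), Node([5]), Node([7]), Node([11])]),
    Node([15, 28], [Node([14]), Node([16, 17]), Node([30, 40, 50])]),
])

node, index = search(root, 17)
print(node.keys, index)   # [16, 17] 1
print(search(root, 18))   # None
```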

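Splitting an overflowing node (i.e., a node with 2d keys / 2d+1 children): the middle key B moves up into the parent, between the keys A and C, and the remaining keys form two nodes, one with d−1 keys and one with d keys.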

Insert: Insert(T,2) and Insert(T,4) in a (2,4)-tree (d=2). Initial tree: root (13); internal nodes (5 10) and (15 28); leaves (1 3), (6), (11), (14), (16 17), (30 40 50).

Insert(T,2): the leaf (1 3) becomes (1 2 3).

Insert(T,4): the leaf (1 2 3) becomes (1 2 3 4), which overflows (2d = 4 keys).

Split: the key 2 moves up into the parent, which becomes (2 5 10), and the leaf splits into (1) and (3 4).


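Splitting an overflowing root: the middle key C becomes the only key of a new root T.root, whose two children hold the remaining d−1 and d keys. The height of the tree grows by one. Number of I/Os: O(1). Number of operations: O(d).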

Another insert: Insert(T,7). Tree before: root (13); internal nodes (2 5 10) and (15 28); leaves (1), (3 4), (6), (11), (14), (16 17), (30 40 50). The leaf (6) becomes (6 7).

And another insert: Insert(T,8). The leaf (6 7) becomes (6 7 8).

And the last for today: Insert(T,9). The leaf (6 7 8) becomes (6 7 8 9), which overflows.

Split: the key 7 moves up and the leaf splits into (6) and (8 9). The parent becomes (2 5 7 10), which overflows in turn.

Split: the key 5 moves up into the root and the internal node splits into (2) and (7 10). Final tree: root (5 13); internal nodes (2), (7 10), (15 28); leaves (1), (3 4), (6), (8 9), (11), (14), (16 17), (30 40 50).

Insert – Bottom-up: Find the insertion point by a downward search. Insert the key in the appropriate place. If the current node is overflowing, split it. If its parent is now overflowing, split it, etc. Disadvantages: we need both a downward scan and an upward scan; nodes are temporarily overflowing; we need to keep the parents on a stack. Note: We do not maintain parent pointers. (Why?)
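The bottom-up procedure can be sketched recursively, with the recursion stack playing the role of the stack of parents. This is an illustrative in-memory sketch, not the lecture's code: the names `Node`, `split`, `_insert` and `insert` are hypothetical, keys are assumed distinct, and the split puts d−1 keys on the left and d keys on the right, which is one possible choice.

```python
from bisect import bisect_left, insort


class Node:
    def __init__(self, keys=None, children=None):
        self.keys = keys if keys is not None else []
        self.children = children if children is not None else []

    @property
    def leaf(self):
        return not self.children


def split(node, d):
    """Split an overflowing node (2d keys); return (middle key, new right node)."""
    mid = node.keys[d - 1]
    right = Node(node.keys[d:], node.children[d:])   # d keys, d+1 children
    node.keys = node.keys[:d - 1]                    # d-1 keys stay on the left
    node.children = node.children[:d]
    return mid, right


def _insert(x, k, d):
    """Insert k below x; return (mid, right) if x had to be split, else None."""
    if x.leaf:
        insort(x.keys, k)
    else:
        i = bisect_left(x.keys, k)
        promoted = _insert(x.children[i], k, d)      # downward scan
        if promoted is not None:                     # upward scan: absorb the split
            mid, right = promoted
            x.keys.insert(i, mid)
            x.children.insert(i + 1, right)
    return split(x, d) if len(x.keys) == 2 * d else None


def insert(root, k, d):
    """Insert k and return the (possibly new) root."""
    promoted = _insert(root, k, d)
    if promoted is None:
        return root
    mid, right = promoted
    return Node([mid], [root, right])                # the tree grows by one level


# A (2,4)-tree with root (13) and internal nodes (5 10) and (15 28).
root = Node([13], [
    Node([5, 10], [Node([1, 3]), Node([6]), Node([11])]),
    Node([15, 28], [Node([14]), Node([16, 17]), Node([30, 40, 50])]),
])
for key in [2, 4, 7, 8, 9]:
    root = insert(root, key, d=2)
print(root.keys, [c.keys for c in root.children])
# [5, 13] [[2], [7, 10], [15, 28]]
```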

Exercise: (d,2d−1)-Trees. Show that essentially the same bottom-up insertion technique also works for (d,2d−1)-Trees. (d,2d)-Trees are better than (d,2d−1)-Trees for at least two reasons: they allow top-down insertions and deletions, and the amortized number of split/fuse operations per insertion/deletion is O(1).

Insert – Top-down: If the root is full, split it before starting this process. While conducting the search, split full children on the search path before descending to them! When the appropriate leaf is reached, it is not full, so the new key may be added!

Split-Root(T): the full root (2d−1 keys) is split into two nodes with d−1 keys each, and its middle key C becomes the only key of the new root T.root. Number of I/Os: O(1). Number of operations: O(d).

Split-Child(x,i): the full child x.child[i] (2d−1 keys) is split into two nodes with d−1 keys each, and its middle key B moves up into x as key[i], between A and C. Number of I/Os: O(1). Number of operations: O(d).
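A minimal sketch of Split-Child on an in-memory node. The `Node` class and the function name `split_child` are illustrative stand-ins, and the function assumes that x itself is not full.

```python
class Node:
    def __init__(self, keys=None, children=None):
        self.keys = keys if keys is not None else []
        self.children = children if children is not None else []


def split_child(x, i, d):
    """Split the full child x.children[i] (2d-1 keys) around its middle key."""
    y = x.children[i]
    assert len(y.keys) == 2 * d - 1, "only full children are split"
    z = Node(y.keys[d:], y.children[d:])   # new right node: last d-1 keys (and last d children)
    middle = y.keys[d - 1]
    y.keys = y.keys[:d - 1]                # left node keeps the first d-1 keys
    y.children = y.children[:d]
    x.keys.insert(i, middle)               # the middle key becomes x.keys[i]
    x.children.insert(i + 1, z)


x = Node([10], [Node([1, 2, 3]), Node([20])])
split_child(x, 0, d=2)
print(x.keys, [c.keys for c in x.children])   # [2, 10] [[1], [3], [20]]
```

Each of the O(d) list operations touches only x, the child and the new node, which is consistent with the O(1) I/O count on the slide.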

Insert – Top-down: While conducting the search, split full children on the search path before descending to them! Number of I/Os: $O(\log_d n)$. Number of operations: $O(d \log_d n)$. Amortized number of splits ≤ 1/(d−1) (see bonus material).
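Putting the pieces together, a top-down insertion might look like the sketch below. It is an in-memory illustration under stated assumptions: distinct keys, d ≥ 2, no stored items, and hypothetical names (`BTree`, `Node`, `keys_in_order`, the `splits` counter).

```python
from bisect import bisect_right, insort


class Node:
    def __init__(self, keys=None, children=None):
        self.keys = keys if keys is not None else []
        self.children = children if children is not None else []

    @property
    def leaf(self):
        return not self.children


class BTree:
    def __init__(self, d):
        self.d = d            # minimum degree, assumed >= 2
        self.root = Node()
        self.splits = 0       # number of splits performed so far

    def _split_child(self, x, i):
        d, y = self.d, x.children[i]
        z = Node(y.keys[d:], y.children[d:])
        x.keys.insert(i, y.keys[d - 1])
        x.children.insert(i + 1, z)
        y.keys, y.children = y.keys[:d - 1], y.children[:d]
        self.splits += 1

    def insert(self, k):
        full = 2 * self.d - 1
        if len(self.root.keys) == full:          # Split-Root before starting
            self.root = Node([], [self.root])
            self._split_child(self.root, 0)
        x = self.root
        while not x.leaf:
            i = bisect_right(x.keys, k)
            if len(x.children[i].keys) == full:  # split a full child before descending
                self._split_child(x, i)
                if k > x.keys[i]:
                    i += 1
            x = x.children[i]
        insort(x.keys, k)                        # the leaf reached is not full

    def keys_in_order(self):
        def walk(x):
            if x.leaf:
                yield from x.keys
                return
            for j, c in enumerate(x.children):
                yield from walk(c)
                if j < len(x.keys):
                    yield x.keys[j]
        return list(walk(self.root))


t = BTree(d=2)
for key in [13, 5, 10, 15, 28, 1, 3]:
    t.insert(key)
print(t.keys_in_order())   # [1, 3, 5, 10, 13, 15, 28]
```

Because no node on the search path is full when it is entered, a split never propagates upward, so a single downward pass suffices and no stack of parents is needed.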

Bonus material: Number of splits (insertions only). If n items are inserted into an initially empty (d,2d)-tree, then the total number of splits is at most n/(d−1). Amortized number of splits per insert ≤ 1/(d−1).
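One way to see this bound is a simple counting argument. It is offered here as a sketch and is not necessarily the argument used in the bonus material. The symbols $S$ (number of splits) and $N$ (number of nodes in the final tree) are introduced only for this illustration.

```latex
% The tree starts as a single empty root, and every split creates at least one new node:
N \;\ge\; 1 + S .
% The root holds at least one key and each of the other N-1 nodes holds at least d-1 keys:
n \;\ge\; 1 + (N-1)(d-1) \;\ge\; 1 + S\,(d-1) .
% Therefore
S \;\le\; \frac{n-1}{d-1} \;<\; \frac{n}{d-1} .
```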

Deletions from B-Trees: As always, similar to, but slightly more complicated than, insertions. To delete an item in an internal node, replace it by its successor and delete the successor. Deletion is slightly simpler for B+-Trees.

B-Trees vs. B+-Trees: In a B-tree, each node contains items and keys. In a B+-tree, leaves contain items and keys, while internal nodes contain only keys to direct the search. Keys in internal nodes are either keys of existing items or keys of items that were deleted. Internal nodes may contain more keys. When d is large, the extra space needed is negligible. We continue with B-trees.

Delete: delete(T,26) in a (2,4)-tree. Tree before: root (7 15); internal nodes (3), (10 13), (22 28); leaves (1 2), (4 6), (8 9), (11 12), (14), (20), (24 26), (30 40 50). The leaf (24 26) becomes (24).

Delete: delete(T,13). The key 13 is in an internal node.

Delete (replace with predecessor): 13 is replaced by its predecessor 12, so the internal node becomes (10 12), and 12 is then deleted from its leaf, which becomes (11).

Delete: delete(T,24). The leaf (24) becomes empty and underflows.

Delete (borrow from sibling): 28 moves down from the parent into the empty leaf, and 30 moves up from the right sibling. Tree after: root (7 15); internal nodes (3), (10 12), (22 30); leaves (1 2), (4 6), (8 9), (11), (14), (20), (28), (40 50).

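Borrow from left / Borrow from right: the keys A and B rotate through the parent. The separating key in the parent moves down into the underflowing node, and the adjacent key of the sibling moves up to take its place. If the nodes are not leaves, the adjacent child subtree of the sibling moves over as well.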
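The two borrow operations can be sketched as follows. The `Node` class and the function names are illustrative. Here `x` is the parent and `i` is the index of the underflowing child, and the functions assume that the chosen sibling exists and has a key to spare.

```python
class Node:
    def __init__(self, keys=None, children=None):
        self.keys = keys if keys is not None else []
        self.children = children if children is not None else []

    @property
    def leaf(self):
        return not self.children


def borrow_from_left(x, i):
    """Move one key from x.children[i-1], through x, into x.children[i]."""
    left, node = x.children[i - 1], x.children[i]
    node.keys.insert(0, x.keys[i - 1])      # separating key moves down
    x.keys[i - 1] = left.keys.pop()         # last key of the left sibling moves up
    if not left.leaf:
        node.children.insert(0, left.children.pop())


def borrow_from_right(x, i):
    """Move one key from x.children[i+1], through x, into x.children[i]."""
    node, right = x.children[i], x.children[i + 1]
    node.keys.append(x.keys[i])             # separating key moves down
    x.keys[i] = right.keys.pop(0)           # first key of the right sibling moves up
    if not right.leaf:
        node.children.append(right.children.pop(0))


# A leaf has just lost its only key, 24; its right sibling has keys to spare.
x = Node([22, 28], [Node([20]), Node([]), Node([30, 40, 50])])
borrow_from_right(x, 1)
print(x.keys, [c.keys for c in x.children])   # [22, 30] [[20], [28], [40, 50]]
```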

Delete: delete(T,20). Tree before: root (7 15); internal nodes (3), (10 12), (22 30); leaves (1 2), (4 6), (8 9), (11), (14), (20), (28), (40 50). The leaf (20) becomes empty, and its sibling (28) has no key to spare.

Delete (Fuse): the empty leaf, the parent key 22 and the sibling (28) are fused into the leaf (22 28). The parent becomes (30), with leaves (22 28) and (40 50).

Few more: delete(T,22). The leaf (22 28) becomes (28).

Few more: delete(T,28). The leaf (28) becomes empty.

Borrowing again: 30 moves down from the parent and 40 moves up from the right sibling. The parent becomes (40), with leaves (30) and (50).

Another one: delete(T,30). The leaf (30) becomes empty, and its sibling (50) has no key to spare.

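Fuse: an underflowing node with d−2 keys, the separating key B from the parent, and a sibling with d−1 keys are merged into a single node with 2d−2 keys. The parent loses the key B, leaving A and C adjacent, and may underflow in turn.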
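A minimal sketch of Fuse on in-memory nodes. The `Node` class and the function name `fuse` are illustrative. Here `x` is the parent, and the children at positions `i` and `i+1` are merged together with the key that separates them.

```python
class Node:
    def __init__(self, keys=None, children=None):
        self.keys = keys if keys is not None else []
        self.children = children if children is not None else []


def fuse(x, i):
    """Merge x.children[i], the key x.keys[i], and x.children[i+1] into one node."""
    left, right = x.children[i], x.children[i + 1]
    left.keys.append(x.keys.pop(i))        # the separating key moves down
    left.keys.extend(right.keys)
    left.children.extend(right.children)   # no-op for leaves
    del x.children[i + 1]                  # x now has one key and one child fewer


# A leaf has just lost its only key, 20; its sibling (28) has no key to spare.
x = Node([22, 30], [Node([]), Node([28]), Node([40, 50])])
fuse(x, 0)
print(x.keys, [c.keys for c in x.children])   # [30] [[22, 28], [40, 50]]
```

After the call the caller still has to check whether `x` itself is now underflowing.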

After Fuse: the empty leaf, the parent key 40 and the sibling (50) are fused into the leaf (40 50). The parent is left with no keys and a single child, so it underflows. Tree: root (7 15); internal nodes (3), (10 12), ( ); leaves (1 2), (4 6), (8 9), (11), (14), (40 50).

Now we can borrow: the underflowing internal node borrows from its left sibling (10 12). The key 15 moves down from the root, 12 moves up into the root, and the leaf (14) moves over with it. Tree after delete(T,30): root (7 12); internal nodes (3), (10), (15); leaves (1 2), (4 6), (8 9), (11), (14), (40 50).

Delete – Bottom-up: If the item to be deleted is not in a leaf, replace it by its successor and delete the successor. Delete an item from a leaf. If the current node is underflowing, i.e., has fewer than d−1 keys, either borrow an item from a sibling or fuse with a sibling. Borrowing fixes the problem. Fusing may make the parent underflow.


Delete – Top-down: Assume, at first, that the item to be deleted is in a leaf. While conducting the search, make sure that each child descended into contains at least d keys. How? Use Borrow or Fuse. When the item is located, it resides in a leaf containing at least d keys, so it can be removed.

Delete – Top down: While conducting the search, make sure that each child you descend to contains at least d keys. If the child has only d−1 keys and an adjacent sibling has at least d keys, Borrow. If the adjacent sibling also has only d−1 keys, Fuse.

Delete – Top down: What if the item to be deleted is in an internal node? Descend as before from the root until the item to be deleted is located. Keep a pointer to the node containing the item. Carry on descending towards the successor, making sure that nodes contain at least d keys. When the successor is found, delete it from its leaf and use it to replace the item to be deleted.
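The top-down deletion can be sketched as below. This is an illustrative in-memory sketch, not the lecture's code: all names are hypothetical and keys are assumed distinct. One detail the slide leaves implicit is handled by an assumption of this sketch: a Borrow or Fuse may move the item itself down one level, so the sketch looks for it again at every node and updates the remembered pointer.

```python
from bisect import bisect_left


class Node:
    def __init__(self, keys=None, children=None):
        self.keys = keys if keys is not None else []
        self.children = children if children is not None else []

    @property
    def leaf(self):
        return not self.children


def _borrow_left(x, i):
    left, node = x.children[i - 1], x.children[i]
    node.keys.insert(0, x.keys[i - 1])
    x.keys[i - 1] = left.keys.pop()
    if not left.leaf:
        node.children.insert(0, left.children.pop())


def _borrow_right(x, i):
    node, right = x.children[i], x.children[i + 1]
    node.keys.append(x.keys[i])
    x.keys[i] = right.keys.pop(0)
    if not right.leaf:
        node.children.append(right.children.pop(0))


def _fuse(x, i):
    left, right = x.children[i], x.children[i + 1]
    left.keys.append(x.keys.pop(i))
    left.keys.extend(right.keys)
    left.children.extend(right.children)
    del x.children[i + 1]


def _ensure(x, i, d):
    """Make x.children[i] hold at least d keys; return its (possibly shifted) index."""
    if len(x.children[i].keys) >= d:
        return i
    if i > 0 and len(x.children[i - 1].keys) >= d:
        _borrow_left(x, i)
    elif i + 1 < len(x.children) and len(x.children[i + 1].keys) >= d:
        _borrow_right(x, i)
    elif i > 0:
        _fuse(x, i - 1)
        i -= 1
    else:
        _fuse(x, i)
    return i


def delete(root, k, d):
    """Delete k if present and return the (possibly new) root."""
    x, holder = root, None
    while not x.leaf:
        i = bisect_left(x.keys, k)
        if i < len(x.keys) and x.keys[i] == k:
            holder, i = x, i + 1            # remember the node, head for the successor
        i = _ensure(x, i, d)
        child = x.children[i]
        if x is root and not x.keys:        # the root lost its last key in a fuse
            root = child
        x = child
    i = bisect_left(x.keys, k)
    if i < len(x.keys) and x.keys[i] == k:
        x.keys.pop(i)                       # k is (or has moved) in this leaf
    elif holder is not None:
        holder.keys[holder.keys.index(k)] = x.keys.pop(0)   # successor replaces k
    return root
```

For example, calling `delete(root, 7, d=2)` on a (2,4)-tree whose root contains 7 replaces 7 by its successor, taken from the leftmost leaf of the subtree to the right of 7. The shapes produced by this top-down sketch can differ from those of a bottom-up deletion, although the set of keys is the same.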

Bonus material: Number of fuses/splits (with bottom-up Insert/Delete). The number of split and fuse operations in a sequence of m insert and delete operations on an initially empty (d,2d)-tree is O(m), so the amortized number of splits/fuses per update is O(1). The analysis uses a potential that assigns 2 to each 2d-node and 1 to each d-node. With top-down insertions and deletions, the amortized number of splits/fuses may be on the order of $\log_d n$.