B-Trees (continued) Analysis of worst-case and average number of disk accesses for an insert. Delete and analysis. Structure for B-tree node.

Slides:



Advertisements
Similar presentations
B+-trees. Model of Computation Data stored on disk(s) Minimum transfer unit: a page = b bytes or B records (or block) N records -> N/B = n pages I/O complexity:
Advertisements

1 B trees Nodes have more than 2 children Each internal node has between k and 2k children and between k-1 and 2k-1 keys A leaf has between k-1 and 2k-1.
B + -Trees Same structure as B-trees. Dictionary pairs are in leaves only. Leaves form a doubly-linked list. Remaining nodes have following structure:
Indexing (cont.). Insertion in a B+ Tree Another B+ Tree
B-Trees (continued) Analysis of worst-case and average number of disk accesses for an insert. Delete and analysis. Structure for B-tree node.
B-Trees Large degree B-trees used to represent very large dictionaries that reside on disk. Smaller degree B-trees used for internal-memory dictionaries.
CS4432: Database Systems II
CPSC 335 BTrees Dr. Marina Gavrilova Computer Science University of Calgary Canada.
B-Tree. B-Trees a specialized multi-way tree designed especially for use on disk In a B-tree each node may contain a large number of keys. The number.
 B+ Tree Definition  B+ Tree Properties  B+ Tree Searching  B+ Tree Insertion  B+ Tree Deletion.
2-3 Trees Extended tree.  Tree in which all empty subtrees are replaced by new nodes that are called external nodes.  Original nodes are called internal.
INTRODUCTION TO MULTIWAY TREES P INTRO - Binary Trees are useful for quick retrieval of items stored in the tree (using linked list) - often,
B + -Trees Same structure as B-trees. Dictionary pairs are in leaves only. Leaves form a doubly-linked list. Remaining nodes have following structure:
2-3 Trees Extended tree.  Tree in which all empty subtrees are replaced by new nodes that are called external nodes.  Original nodes are called internal.
B-Tree – Delete Delete 3. Delete 8. Delete
B-Trees ( Rizwan Rehman) Large degree B-trees used to represent very large dictionaries that reside on disk. Smaller degree B-trees used for internal-memory.
B+-Tree Deletion Underflow conditions B+ tree Deletion Algorithm
COMP261 Lecture 23 B Trees.
Unit 9 Multi-Way Trees King Fahd University of Petroleum & Minerals
Lecture 15 Nov 3, 2013 Height-balanced BST Recall:
File Organization and Processing Week 3
Red Black Trees Colored Nodes Definition Binary search tree.
B-Trees B-Trees.
Multiway Search Trees Data may not fit into main memory
B-Trees B-Trees.
B+-Trees j a0 k1 a1 k2 a2 … kj aj j = number of keys in node.
B-Trees Large degree B-trees used to represent very large dictionaries that reside on disk. Smaller degree B-trees used for internal-memory dictionaries.
Splay Trees Binary search trees.
Chapter 11: Multiway Search Trees
Lecture 16 Multiway Search Trees
Binary Search Tree Chapter 10.
Red-Black Trees Bottom-Up Deletion.
Dynamic Dictionaries Primary Operations: Additional operations:
Splay Trees Binary search trees.
CMSC 341 Lecture 10 B-Trees Based on slides from Dr. Katherine Gibson.
Haim Kaplan and Uri Zwick November 2014
Chapter 6 Transform and Conquer.
Wednesday, April 18, 2018 Announcements… For Today…
Topics covered (since exam 1):
Lecture 26 Multiway Search Trees Chapter 11 of textbook
Height Balanced Trees 2-3 Trees.
Multi-Way Search Trees
(2,4) Trees /26/2018 3:48 PM (2,4) Trees (2,4) Trees
B-Trees CSE 373 Data Structures CSE AU B-Trees.
B-Trees.
B- Trees D. Frey with apologies to Tom Anastasio
B- Trees D. Frey with apologies to Tom Anastasio
B-Tree.
Mainly from Chapter 7 Adam Drozdek
B-Trees CSE 373 Data Structures CSE AU B-Trees.
CSIT 402 Data Structures II With thanks to TK Prasad
B- Trees D. Frey with apologies to Tom Anastasio
B-TREE ________________________________________________________
Red-Black Trees Bottom-Up Deletion.
(2,4) Trees /24/2019 7:30 PM (2,4) Trees (2,4) Trees
B-Trees Large degree B-trees used to represent very large dictionaries that reside on disk. Smaller degree B-trees used for internal-memory dictionaries.
CSE 373, Copyright S. Tanimoto, 2002 B-Trees -
Red Black Trees (Guibas Sedgewick 78)
2-3 Trees Extended tree. Tree in which all empty subtrees are replaced by new nodes that are called external nodes. Original nodes are called internal.
(2,4) Trees /6/ :26 AM (2,4) Trees (2,4) Trees
B+-Trees j a0 k1 a1 k2 a2 … kj aj j = number of keys in node.
B-Trees CSE 373 Data Structures CSE AU B-Trees.
B-Trees.
Tree-Structured Indexes
B-Trees.
Heaps & Multi-way Search Trees
B-Trees Large degree B-trees used to represent very large dictionaries that reside on disk. Smaller degree B-trees used for internal-memory dictionaries.
Binary Search Trees < > = Dictionaries
Splay Trees Binary search trees.
CS210- Lecture 20 July 19, 2005 Agenda Multiway Search Trees 2-4 Trees
Presentation transcript:

B-Trees (continued) Analysis of worst-case and average number of disk accesses for an insert. Delete and analysis. Structure for B-tree node

Worst-Case Disk Accesses 15 20 4 1 3 5 6 30 40 13 16 17 7 12 9 8 10 Insert 14. Insert 2. Insert 18.

Worst-Case Disk Accesses Assume enough memory to hold all h nodes accessed on way down. h read accesses on way down. 2s+1 write accesses on way up, s = number of nodes that split. Total h+2s+1 disk accesses. Max is 3h+1.

Average Disk Accesses Start with empty B-tree. Insert n pairs. Resulting B-tree has p nodes. # splits <= p –2, p > 2. # pairs >= 1+(ceil(m/2) – 1)(p – 1). savg <= (p – 2)/(1+(ceil(m/2) – 1)(p – 1)). So, savg < 1/(ceil(m/2) – 1). m = 200 => savg < 1/99. Average disk accesses < h + 2/99 + 1 ~ h + 1. Nearly minimum. Whenever the root splits, the number of nodes increases by 2. When any other node splits, the increase is by 1. The root splits at least once when p > 2. An insert must make at least h read and 1 write access. So, h+1 is the minimum number of disk accesses for each insert. Therefore, the average is at least h+1.

Delete (2-3 tree) Delete the pair with key = 8. 15 20 8 1 2 4 5 6 30 40 9 16 17 3 Delete the pair with key = 8. Transform deletion from interior into deletion from a leaf. Replace by largest in left subtree. Largest is in a leaf. In a BST, largest is in a leaf or a degree 1 node.

Delete From A Leaf Delete the pair with key = 16. 15 20 8 1 2 4 5 6 30 40 9 16 17 3 Delete the pair with key = 16. 3-node becomes 2-node.

Delete From A Leaf Delete the pair with key = 17. 8 2 4 15 20 1 3 5 6 9 30 40 17 Delete the pair with key = 17. Deletion from a 2-node. Check an adjacent sibling and determine if it is a 3-node. If so borrow a pair and a subtree via parent node.

Delete From A Leaf Delete the pair with key = 20. 8 2 4 15 30 1 3 5 6 9 40 20 Delete the pair with key = 20. Deletion from a 2-node. Check an adjacent sibling and determine if it is a 3-node. If not, combine with sibling and parent pair.

Delete From A Leaf Delete the pair with key = 30. 8 2 4 15 1 3 5 6 9 30 40 Delete the pair with key = 30. Deletion from a 3-node. 3-node becomes 2-node.

Delete From A Leaf Delete the pair with key = 3. 8 2 4 15 1 3 5 6 9 40 Delete the pair with key = 3. Deletion from a 2-node. Check an adjacent sibling and determine if it is a 3-node. If so borrow a pair and a subtree via parent node.

Delete From A Leaf Delete the pair with key = 6. 8 2 5 15 1 4 6 9 40 Delete the pair with key = 6. Deletion from a 2-node. Check an adjacent sibling and determine if it is a 3-node. If not, combine with sibling and parent pair.

Delete From A Leaf Delete the pair with key = 40. 8 2 15 1 4 5 9 40 Delete the pair with key = 40. Deletion from a 2-node. Check an adjacent sibling and determine if it is a 3-node. If not, combine with sibling and parent pair.

Delete From A Leaf Parent pair was from a 2-node. 8 2 1 4 5 9 15 Parent pair was from a 2-node. Check an adjacent sibling and determine if it is a 3-node. If not, combine with sibling and parent pair.

Delete From A Leaf Parent pair was from a 2-node. 2 8 1 4 5 9 15 Parent pair was from a 2-node. Check an adjacent sibling and determine if it is a 3-node. No sibling, so must be the root. Discard root. Left child becomes new root.

Delete From A Leaf 2 8 1 4 5 9 15 Height reduces by 1.

Delete A Pair Deletion from interior node is transformed into a deletion from a leaf node. Deficient leaf triggers bottom-up borrowing and node combining pass. Deficient node is combined with an adjacent sibling who has exactly ceil(m/2) – 1 pairs. After combining, the node has [ceil(m/2) – 2] (original pairs) + [ceil(m/2) – 1] (sibling pairs) + 1 (from parent) <= m –1 pairs.

Disk Accesses Minimum. Borrow. Combine. 15 20 4 5 6 30 40 13 16 17 15 20 4 5 6 30 40 13 16 17 7 12 9 8 10 3 Minimum. Minimum when deleting from a leaf is when you delete 5, for example. This min is h+1 accesses. When deleting from the interior, minimum is h+2. Borrow: read sibling, write 2 siblings and parent. Borrow terminates the delete. Combine: read sibling, write combined node, propagate up as parent degree has reduced by 1. Borrow. Combine.

Worst-Case Disk Accesses Assume enough memory to hold all h nodes accessed on way down. h read accesses on way down. h – 1 sibling read accesses on way up. h – 2 writes of combined nodes on way up. 3 writes of root and level 2 nodes for sibling borrowing at level 2. Total is 3h. Combine at level 21 write of level 2 node, 1 write of root.

Average Disk Accesses Start with B-tree that has n pairs and p nodes. Delete the pairs one by one. n >= 1+(ceil(m/2) – 1)(p – 1). p <= 1 + (n – 1)/(ceil(m/2) – 1). Upper bound on total number of disk accesses. Each delete does a borrow. The deletes together do at most p –1 combines/merges. # accesses <= n(h+4) + 2(p – 1). Actually, only p-2 combines are possible as when the number of nodes is down to 3, a combine reduces this to 1. H+4 => h reads on way down, a sibling read, 3 writes. Note that a borrow terminates the operation and so combines, if any, are done before the assumed borrow and cost 3 accesses each (sibling read and write of combined operation; write of parent is part of next level combine or terminating borrow). Or n(h+5) if you account for deletes from the interior which require you to write the interior node as well (note that an interior delete may take 1 more write than a delete from a leaf because restructuring may not propagate all the way back to the interior node from which the delete occurred). Merges can only be done until the number of nodes becomes 3. So, number of merges is at most p-3. We use p-1 for simplicity. 2(p-1) => p-1 sibling reads and p-1 writes

Average Disk Accesses Average # accesses <= [n(h+4) + 2(p – 1)]/ n Nearly minimum. A delete must make at least h read and 1 write access. So, h+1 is the minimum number of disk accesses for each delete. Therefore, the average is at least h+1.

Worst Case Alternating sequence of inserts and deletes. Each insert does h splits at a cost of 3h + 1 disk accesses. Each delete moves back up to root at a cost of 3h disk accesses. Average for this sequence is 3h + 1 for an insert and 3h for a delete. To avoid worst-case, do lazy deletion. Each delete causes deleted item to be flagged. When enough deletes have been flagged, we do real deletes. Doesn’t affect search time as tree height grows very slowly.

Internal Memory B-Trees Cache access time vs main memory access time. Reduce main memory accesses using a B-tree.

Node Structure q a0 p1 a1 p2 a2 … pq aq Node operations during a search. Search the node for a given key. a’s are subtree pointers, p’s are dictionary pairs

Node Operations For Insert Insert a dictionary pair and a pointer (p, a). m a0 p1 a1 p2 a2 … pm am ceil(m/2)-1 a0 p1 a1 p2 a2 … pceil(m/2)-1 aceil(m/2)-1 m-ceil(m/2) aceil(m/2) pceil(m/2)+1 aceil(m/2)+1 … pm am Find middle pair. 3-way split around middle pair.

Node Operations For Delete Delete a dictionary pair. Borrow. Delete, replace, insert. Combine. 3-way join.

Node Structure Each B-tree node is an array partitioned into indexed red-black tree nodes that will keep one dictionary pair each. Indexed red-black tree is built using simulated pointers (integer pointers).

Complexity Of B-Tree Node Operations Search a B-tree node … O(log m). Find middle pair … O(log m). Insert a pair … O(log m). Delete a pair … O(log m). Split a B-tree node … O(log m). Join 2 B-tree nodes … O(m). Need to copy indexed red-black tree that represents one B-tree node into the array space of the other B-tree node. For internal memory applications, split will need to copy one part to new memory using O(m) time. Note must use arrays and integer pointers for internal memory as well so as to fetch a B-tree node (that is comprised of many red-black nodes) with a single cache miss. For internal memory operations, the optimal m is in the 30 to 50 range and the array representation of a node works well.