B-Trees (continued) Analysis of worst-case and average number of disk accesses for an insert. Delete and analysis. Structure for B-tree node.


1 B-Trees (continued) Analysis of worst-case and average number of disk accesses for an insert. Delete and analysis. Structure for B-tree node

2 Worst-Case Disk Accesses
[Figure: example B-tree.] Insert 14. Insert 2. Insert 18.

3 Worst-Case Disk Accesses
Assume enough memory to hold all h nodes accessed on the way down. h read accesses on the way down. 2s+1 write accesses on the way up, where s = number of nodes that split. Total: h+2s+1 disk accesses. Since s <= h, the maximum is 3h+1.
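The access count on this slide can be written out as a small calculation. This is only a sketch of the slide's formula; the function names are made up for illustration.

```python
def insert_disk_accesses(h: int, s: int) -> int:
    """Disk accesses for one insert into a B-tree of height h when s nodes split.

    h reads on the way down, 2s + 1 writes on the way up.
    """
    if not 0 <= s <= h:
        raise ValueError("number of splits must satisfy 0 <= s <= h")
    reads = h
    writes = 2 * s + 1
    return reads + writes


def worst_case_insert_accesses(h: int) -> int:
    """Every node on the search path splits, so s = h."""
    return insert_disk_accesses(h, h)
```

With h = 3, an insert that causes no split costs 4 accesses, and the worst case costs 3h + 1 = 10.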

4 Average Disk Accesses Start with empty B-tree. Insert n pairs.
Resulting B-tree has p nodes.
# splits <= p – 2, p > 2. (Whenever the root splits, the number of nodes increases by 2. When any other node splits, the increase is by 1. The root splits at least once when p > 2.)
# pairs >= 1 + (ceil(m/2) – 1)(p – 1).
s_avg <= (p – 2)/(1 + (ceil(m/2) – 1)(p – 1)). So, s_avg < 1/(ceil(m/2) – 1).
m = 200 => s_avg < 1/99.
Average disk accesses = h + 2 s_avg + 1 < h + 2/99 + 1 ~ h + 1.
Nearly minimum: an insert must make at least h read accesses and 1 write access, so h+1 is the minimum number of disk accesses for each insert. Therefore, the average is at least h+1.
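The bounds above are easy to check numerically. The sketch below evaluates them for given m, p and h; the function names are illustrative and not part of the slides.

```python
def min_pairs(m: int, p: int) -> int:
    """Fewest pairs a B-tree of order m with p nodes can hold."""
    d = (m + 1) // 2  # ceil(m/2) using integer arithmetic
    return 1 + (d - 1) * (p - 1)


def avg_splits_bound(m: int, p: int) -> float:
    """Upper bound on the average number of splits per insert (requires p > 2)."""
    if p <= 2:
        raise ValueError("the bound on the number of splits needs p > 2")
    return (p - 2) / min_pairs(m, p)


def avg_insert_accesses_bound(h: int, m: int) -> float:
    """h + 2 * s_avg + 1, using the limit s_avg < 1 / (ceil(m/2) - 1)."""
    d = (m + 1) // 2
    return h + 2 / (d - 1) + 1
```

For m = 200 and p = 1000 the split bound is about 0.0101, just under 1/99, so the average insert cost for h = 3 stays below about 4.02 accesses.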

5 Delete (2-3 tree) Delete the pair with key = 8.
[Figure: 2-3 tree diagram.] Transform the deletion from an interior node into a deletion from a leaf: replace the pair by the largest pair in its left subtree. In a 2-3 tree the largest is in a leaf. (In a BST, the largest is in a leaf or a degree-1 node.)

6 Delete From A Leaf Delete the pair with key = 16.
[Figure: 2-3 tree diagram.] The 3-node becomes a 2-node.

7 Delete From A Leaf Delete the pair with key = 17.
[Figure: 2-3 tree diagram.] Deletion from a 2-node. Check an adjacent sibling and determine whether it is a 3-node. If so, borrow a pair and a subtree via the parent node.

8 Delete From A Leaf Delete the pair with key = 20.
[Figure: 2-3 tree diagram.] Deletion from a 2-node. Check an adjacent sibling and determine whether it is a 3-node. If not, combine with the sibling and the parent pair.
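The borrow and combine cases for a leaf can be sketched as follows. This is a simplified illustration, not the slides' code: leaves are plain Python lists of keys, `parent_keys[j]` is assumed to separate `leaves[j]` from `leaves[j + 1]`, and the function name is hypothetical.

```python
def fix_empty_leaf(parent_keys: list, leaves: list, i: int) -> str:
    """Repair leaves[i], which became empty after deleting from a 2-node.

    Mutates parent_keys and leaves in place and returns which case applied.
    """
    # Borrow from the left sibling if it is a 3-node (holds 2 keys).
    if i > 0 and len(leaves[i - 1]) == 2:
        leaves[i].insert(0, parent_keys[i - 1])   # parent pair moves down
        parent_keys[i - 1] = leaves[i - 1].pop()  # sibling's largest moves up
        return "borrow"
    # Otherwise borrow from the right sibling if it is a 3-node.
    if i < len(leaves) - 1 and len(leaves[i + 1]) == 2:
        leaves[i].append(parent_keys[i])           # parent pair moves down
        parent_keys[i] = leaves[i + 1].pop(0)      # sibling's smallest moves up
        return "borrow"
    # No 3-node sibling: combine with a sibling and the in-between parent pair.
    if i > 0:
        leaves[i - 1].append(parent_keys.pop(i - 1))
        del leaves[i]
    else:
        leaves[1].insert(0, parent_keys.pop(0))
        del leaves[0]
    return "combine"
```

After a combine the parent has lost a pair and may itself be deficient, so a full implementation would repeat the same check one level up. A borrow ends the operation.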

9 Delete From A Leaf Delete the pair with key = 30.
[Figure: 2-3 tree diagram.] Deletion from a 3-node. The 3-node becomes a 2-node.

10 Delete From A Leaf Delete the pair with key = 3.
[Figure: 2-3 tree diagram.] Deletion from a 2-node. Check an adjacent sibling and determine whether it is a 3-node. If so, borrow a pair and a subtree via the parent node.

11 Delete From A Leaf Delete the pair with key = 6.
[Figure: 2-3 tree diagram.] Deletion from a 2-node. Check an adjacent sibling and determine whether it is a 3-node. If not, combine with the sibling and the parent pair.

12 Delete From A Leaf Delete the pair with key = 40.
[Figure: 2-3 tree diagram.] Deletion from a 2-node. Check an adjacent sibling and determine whether it is a 3-node. If not, combine with the sibling and the parent pair.

13 Delete From A Leaf Parent pair was from a 2-node.
[Figure: 2-3 tree diagram.] The parent is now deficient. Check an adjacent sibling of the parent and determine whether it is a 3-node. If not, combine with the sibling and the parent pair.

14 Delete From A Leaf Parent pair was from a 2-node.
[Figure: 2-3 tree diagram.] Check an adjacent sibling and determine whether it is a 3-node. There is no sibling, so the deficient node must be the root. Discard the root. Its left child becomes the new root.

15 Delete From A Leaf [Figure: 2-3 tree diagram.] Height reduces by 1.

16 Delete A Pair Deletion from an interior node is transformed into a deletion from a leaf node. A deficient leaf triggers a bottom-up pass of borrowing and node combining. A deficient node is combined with an adjacent sibling that has exactly ceil(m/2) – 1 pairs. After combining, the node has [ceil(m/2) – 2] (original pairs) + [ceil(m/2) – 1] (sibling pairs) + 1 (from parent) <= m – 1 pairs.
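The claim that a combined node never overflows can be checked by brute force over m. The helper name below is illustrative.

```python
def combined_pairs(m: int) -> int:
    """Pairs in a node formed by combining a deficient node, a minimal sibling,
    and the separating pair from the parent, in a B-tree of order m."""
    d = (m + 1) // 2           # ceil(m/2)
    deficient = d - 2          # one pair below the minimum
    sibling = d - 1            # exactly the minimum
    from_parent = 1
    return deficient + sibling + from_parent


# A node of order m may hold at most m - 1 pairs.
overflow_orders = [m for m in range(3, 501) if combined_pairs(m) > m - 1]
```

The total is 2 ceil(m/2) – 2, which equals m – 1 for odd m and m – 2 for even m, so `overflow_orders` is empty.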

17 Disk Accesses Minimum. Borrow. Combine.
[Figure: example B-tree.] Minimum: when deleting from a leaf (deleting 5, for example), the minimum is h+1 accesses. When deleting from the interior, the minimum is h+2. Borrow: read the sibling, then write both siblings and the parent. A borrow terminates the delete. Combine: read the sibling, write the combined node, and propagate up, because the parent's degree has reduced by 1.

18 Worst-Case Disk Accesses
Assume enough memory to hold all h nodes accessed on the way down. h read accesses on the way down. h – 1 sibling read accesses on the way up. h – 2 writes of combined nodes on the way up. 3 writes of root and level 2 nodes for sibling borrowing at level 2. Total is 3h. (A combine at level 2 instead costs 1 write of the level 2 node and 1 write of the root.)
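The itemized worst-case delete cost adds up as shown in this small sketch, which assumes h >= 2 so that there is a level 2 at which to borrow. The function name is illustrative.

```python
def worst_case_delete_accesses(h: int) -> int:
    """Sum the worst-case disk accesses for a delete in a B-tree of height h."""
    if h < 2:
        raise ValueError("the breakdown assumes a tree of height at least 2")
    reads_down = h           # one read per level on the way down
    sibling_reads = h - 1    # one sibling read per level below the root
    combined_writes = h - 2  # combines at levels 3..h
    borrow_writes = 3        # root plus two level 2 nodes for the final borrow
    return reads_down + sibling_reads + combined_writes + borrow_writes
```

For every h >= 2 the sum is 3h, matching the slide's total.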

19 Average Disk Accesses Start with B-tree that has n pairs and p nodes.
Delete the pairs one by one.
n >= 1 + (ceil(m/2) – 1)(p – 1), so p <= 1 + (n – 1)/(ceil(m/2) – 1).
Upper bound on the total number of disk accesses: assume each delete does a borrow, and the deletes together do at most p – 1 combines/merges. Then # accesses <= n(h+4) + 2(p – 1).
The h+4 per delete is h reads on the way down, a sibling read, and 3 writes. A borrow terminates the operation, so combines, if any, are done before the assumed borrow and cost 2 accesses each (a sibling read and a write of the combined node; the write of the parent is part of the next-level combine or the terminating borrow). Use n(h+5) to account for deletes from the interior, which also require writing the interior node. An interior delete may take 1 more write than a delete from a leaf because restructuring may not propagate all the way back to the interior node from which the delete occurred.
The 2(p – 1) term is p – 1 sibling reads and p – 1 writes. Actually, only p – 2 combines are possible: each combine removes one node until 3 nodes remain, and the final combine reduces 3 nodes to 1. We use p – 1 for simplicity.

20 Average Disk Accesses Average # accesses <= [n(h+4) + 2(p – 1)]/ n
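Since p – 1 <= (n – 1)/(ceil(m/2) – 1), this bound is less than h + 4 + 2/(ceil(m/2) – 1), which is about h + 4 for large m. Nearly minimum: a delete must make at least h read accesses and 1 write access, so h+1 is the minimum number of disk accesses for each delete. Therefore, the average is at least h+1.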
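The average bound can be evaluated directly from n, h and m by substituting the largest possible p. A minimal sketch with an illustrative function name:

```python
def avg_delete_accesses_bound(n: int, h: int, m: int) -> float:
    """Upper bound [n(h+4) + 2(p-1)] / n on average accesses per delete,
    using the largest p allowed by n >= 1 + (ceil(m/2) - 1)(p - 1)."""
    d = (m + 1) // 2                  # ceil(m/2)
    p_max = 1 + (n - 1) / (d - 1)     # upper bound on the number of nodes
    total = n * (h + 4) + 2 * (p_max - 1)
    return total / n
```

For n = 1,000,000 pairs, h = 3 and m = 200, the bound is a little over 7.02, that is h + 4 plus less than 2/99.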

21 Worst Case Alternating sequence of inserts and deletes.
Each insert does h splits at a cost of 3h + 1 disk accesses. Each delete moves back up to the root at a cost of 3h disk accesses. The average for this sequence is 3h + 1 for an insert and 3h for a delete. To avoid this worst case, do lazy deletion: each delete only flags the deleted item, and when enough deletes have been flagged, the real deletes are done. This barely affects search time, because tree height grows very slowly.
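The lazy deletion idea can be sketched independently of the B-tree itself. In the illustration below a Python dict stands in for the B-tree, and the class name, the `threshold` parameter and the `real_deletes` counter are all assumptions made for the example.

```python
class LazyDeleteIndex:
    """Flags deleted keys and performs the real deletes in batches."""

    def __init__(self, threshold: int = 4):
        self._store = {}            # stand-in for the B-tree (assumption)
        self._flagged = set()       # keys deleted logically but still stored
        self._threshold = threshold
        self.real_deletes = 0       # how many physical deletes have run

    def insert(self, key, value):
        self._flagged.discard(key)  # re-inserting revives a flagged key
        self._store[key] = value

    def search(self, key):
        if key in self._flagged:
            return None             # flagged pairs are invisible to searches
        return self._store.get(key)

    def delete(self, key):
        if key in self._store and key not in self._flagged:
            self._flagged.add(key)
            if len(self._flagged) >= self._threshold:
                self._purge()

    def _purge(self):
        # The real deletes happen here, once enough flags have accumulated.
        for key in self._flagged:
            del self._store[key]
            self.real_deletes += 1
        self._flagged.clear()
```

In an actual disk-based B-tree the flag would live inside the stored pair, so flagging still costs a write of that node, but it never triggers borrowing or combining.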

22 Internal Memory B-Trees
Cache access time vs main memory access time. Reduce main memory accesses using a B-tree.

23 Node Structure q a0 p1 a1 p2 a2 … pq aq
Node operations during a search: search the node for a given key. The a's are subtree pointers, the p's are dictionary pairs, and q is the number of pairs in the node.
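Searching within one node can be sketched with a binary search over its sorted keys. This sketch assumes the keys of p1..pq are held in a sorted Python list, which is a simplification of the node layout; the function name is illustrative.

```python
from bisect import bisect_left


def search_node(keys: list, target):
    """Search one B-tree node.

    keys holds the keys of p1..pq in ascending order (keys[0] is p1's key).
    Returns (True, i) if keys[i] == target, otherwise (False, i) where i is
    the index of the subtree pointer a_i to follow next.
    """
    i = bisect_left(keys, target)  # number of keys smaller than target
    if i < len(keys) and keys[i] == target:
        return True, i
    return False, i
```

For keys [10, 20, 30], searching for 20 succeeds at index 1, while searching for 25 fails and points to subtree a2, which lies between 20 and 30.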

24 Node Operations For Insert
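Insert a dictionary pair and a pointer (p, a). Find the middle pair, then do a 3-way split around the middle pair.
Overfull node (m pairs): m a0 p1 a1 p2 a2 … pm am
Left node after the split (ceil(m/2) – 1 pairs): ceil(m/2)-1 a0 p1 a1 p2 a2 … p_{ceil(m/2)-1} a_{ceil(m/2)-1}
Right node after the split (m – ceil(m/2) pairs): m-ceil(m/2) a_{ceil(m/2)} p_{ceil(m/2)+1} a_{ceil(m/2)+1} … pm am
The middle pair is p_{ceil(m/2)}.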
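The 3-way split can be sketched with list slicing. This sketch assumes the pairs and subtree pointers are kept in two Python lists, which is a simplification; the function name is illustrative.

```python
def split_node(pairs: list, children: list):
    """3-way split of an overfull node around its middle pair.

    pairs is [p1, ..., pm] and children is [a0, ..., am].
    Returns (left, middle, right), where left and right are
    (pairs, children) tuples for the two resulting nodes.
    """
    m = len(pairs)
    if len(children) != m + 1:
        raise ValueError("a node with m pairs needs m + 1 subtree pointers")
    d = (m + 1) // 2                      # ceil(m/2)
    middle = pairs[d - 1]                 # p_d, the middle pair
    left = (pairs[:d - 1], children[:d])  # p1..p_{d-1} with a0..a_{d-1}
    right = (pairs[d:], children[d:])     # p_{d+1}..pm with a_d..am
    return left, middle, right
```

For m = 5 the left node keeps 2 pairs, the right node keeps 2 pairs, and p3 is the middle pair.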

25 Node Operations For Delete
Delete a dictionary pair. Borrow: delete, replace, insert. Combine: 3-way join.

26 Node Structure Each B-tree node is an array partitioned into indexed red-black tree nodes that will keep one dictionary pair each. Indexed red-black tree is built using simulated pointers (integer pointers).
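Simulated pointers can be illustrated with a much simpler structure than an indexed red-black tree. The sketch below swaps in a plain unbalanced binary search tree stored in parallel arrays, where child links are integer indices and -1 plays the role of null. The class and its fields are assumptions for illustration only.

```python
class ArrayBST:
    """Unbalanced BST whose nodes live in fixed-size arrays (no balancing)."""

    NIL = -1  # simulated null pointer

    def __init__(self, capacity: int):
        self.key = [None] * capacity
        self.left = [self.NIL] * capacity    # integer pointer to left child
        self.right = [self.NIL] * capacity   # integer pointer to right child
        self.root = self.NIL
        self.size = 0

    def insert(self, k) -> None:
        if self.size == len(self.key):
            raise OverflowError("node array is full")
        slot = self.size                     # next free array position
        self.key[slot] = k
        self.size += 1
        if self.root == self.NIL:
            self.root = slot
            return
        cur = self.root
        while True:
            links = self.left if k < self.key[cur] else self.right
            if links[cur] == self.NIL:
                links[cur] = slot
                return
            cur = links[cur]

    def contains(self, k) -> bool:
        cur = self.root
        while cur != self.NIL:
            if k == self.key[cur]:
                return True
            cur = self.left[cur] if k < self.key[cur] else self.right[cur]
        return False
```

Because every link is an index into the same arrays, the whole structure occupies one contiguous block, which is what lets a B-tree node built this way be moved or fetched as a unit.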

27 Complexity Of B-Tree Node Operations
Search a B-tree node … O(log m).
Find middle pair … O(log m).
Insert a pair … O(log m).
Delete a pair … O(log m).
Split a B-tree node … O(log m).
Join 2 B-tree nodes … O(m). The indexed red-black tree that represents one B-tree node must be copied into the array space of the other B-tree node.
For internal memory applications, a split must copy one part to new memory, which takes O(m) time. Arrays and integer pointers must be used for internal memory as well, so that a B-tree node (which comprises many red-black nodes) can be fetched with a single cache miss. For internal memory operations, the optimal m is in the 30 to 50 range, and the array representation of a node works well.

