CSE 4101/5101 Prof. Andy Mirzaian B-trees 2-3-4 trees.

Slides:



Advertisements
Similar presentations
Simplifications of Context-Free Grammars
Advertisements

Mathematical Preliminaries
TK1924 Program Design & Problem Solving Session 2011/2012
한양대학교 정보보호 및 알고리즘 연구실 이재준 담당교수님 : 박희진 교수님
Chapter 13. Red-Black Trees
COSC 2007 Data Structures II Chapter 12 Advanced Implementation of Tables II.
1
Copyright © 2003 Pearson Education, Inc. Slide 1 Computer Systems Organization & Architecture Chapters 8-12 John D. Carpinelli.
Properties Use, share, or modify this drill on mathematic properties. There is too much material for a single class, so you’ll have to select for your.
On-line Construction of Suffix Trees Chairman : Prof. R.C.T. Lee Speaker : C. S. Wu ( ) June 10, 2004 Dept. of CSIE National Chi Nan University.
David Burdett May 11, 2004 Package Binding for WS CDL.
CALENDAR.
FACTORING ax2 + bx + c Think “unfoil” Work down, Show all steps.
The 5S numbers game..
CSE 4101/5101 Prof. Andy Mirzaian Augmenting Data Structures.
CS16: Introduction to Data Structures & Algorithms
Binary Tree Structure a b fe c a rightleft g g NIL c ef b left right pp p pp left key.
Break Time Remaining 10:00.
Turing Machines.
PP Test Review Sections 6-1 to 6-6
Chapter 17 Linked Lists.
COMP171 Fall 2005 Lists.
Chapter 10: Applications of Arrays and the class vector
Abstract Data Types and Algorithms
Data structure is concerned with the various ways that data files can be organized and assembled. The structures of data files will strongly influence.
1 Linked Lists A linked list is a sequence in which there is a defined order as with any sequence but unlike array and Vector there is no property of.
Briana B. Morrison Adapted from William Collins
Hash Tables.
Outline Minimum Spanning Tree Maximal Flow Algorithm LP formulation 1.
CSE 373 Data Structures and Algorithms
David Luebke 1 8/25/2014 CS 332: Algorithms Red-Black Trees.
Copyright © 2012, Elsevier Inc. All rights Reserved. 1 Chapter 7 Modeling Structure with Blocks.
CSE 4101/5101 Prof. Andy Mirzaian. Lists Move-to-Front Search Trees Binary Search Trees Multi-Way Search Trees B-trees Splay Trees Trees Red-Black.
© 2012 National Heart Foundation of Australia. Slide 2.
Adding Up In Chunks.
MaK_Full ahead loaded 1 Alarm Page Directory (F11)
1 10 pt 15 pt 20 pt 25 pt 5 pt 10 pt 15 pt 20 pt 25 pt 5 pt 10 pt 15 pt 20 pt 25 pt 5 pt 10 pt 15 pt 20 pt 25 pt 5 pt 10 pt 15 pt 20 pt 25 pt 5 pt Synthetic.
© 2010 Pearson Addison-Wesley. All rights reserved. Addison Wesley is an imprint of CHAPTER 11: Priority Queues and Heaps Java Software Structures: Designing.
B Tree.
1 hi at no doifpi me be go we of at be do go hi if me no of pi we Inorder Traversal Inorder traversal. n Visit the left subtree. n Visit the node. n Visit.
10 -1 Chapter 10 Amortized Analysis A sequence of operations: OP 1, OP 2, … OP m OP i : several pops (from the stack) and one push (into the stack)
Binary Search Trees Ravi Chugh March 28, Review: Linked Lists Goal: Program that keeps track of friends Problem: Arrays have fixed length Solution:
Splay Trees Binary search trees.
Essential Cell Biology
Chapter 12 Binary Search Trees
CSE Lecture 17 – Balanced trees
Rizwan Rehman Centre for Computer Studies Dibrugarh University
PSSA Preparation.
Foundations of Data Structures Practical Session #7 AVL Trees 2.
Animated demo: B- Trees Slide Credit.
Copyright © 2007 Pearson Education, Inc. Publishing as Pearson Addison-Wesley Chapter 13 Pointers and Linked Lists.
Physics for Scientists & Engineers, 3rd Edition
COL 106 Shweta Agrawal, Amit Kumar
EECS 4101/5101 Prof. Andy Mirzaian. Lists Move-to-Front Search Trees Binary Search Trees Multi-Way Search Trees B-trees Splay Trees Trees Red-Black.
EECS 4101/5101 Prof. Andy Mirzaian. Lists Move-to-Front Search Trees Binary Search Trees Multi-Way Search Trees B-trees Splay Trees Trees Red-Black.
AVL Tree Smt Genap Outline AVL Tree ◦ Definition ◦ Properties ◦ Operations Smt Genap
1 B trees Nodes have more than 2 children Each internal node has between k and 2k children and between k-1 and 2k-1 keys A leaf has between k-1 and 2k-1.
Tirgul 6 B-Trees – Another kind of balanced trees.
CPSC 335 BTrees Dr. Marina Gavrilova Computer Science University of Calgary Canada.
IntroductionIntroduction  Definition of B-trees  Properties  Specialization  Examples  2-3 trees  Insertion of B-tree  Remove items from B-tree.
B-Tree. B-Trees a specialized multi-way tree designed especially for use on disk In a B-tree each node may contain a large number of keys. The number.
Binary Trees, Binary Search Trees RIZWAN REHMAN CENTRE FOR COMPUTER STUDIES DIBRUGARH UNIVERSITY.
2-3 Tree. Slide 2 Outline  Balanced Search Trees 2-3 Trees Trees.
COMP261 Lecture 23 B Trees.
B+ Tree.
Wednesday, April 18, 2018 Announcements… For Today…
Multi-Way Search Trees
(2,4) Trees /6/ :26 AM (2,4) Trees (2,4) Trees
Presentation transcript:

CSE 4101/5101 Prof. Andy Mirzaian B-trees 2-3-4 trees

DICTIONARIES SELF ADJUSTING WORST-CASE EFFICIENT Lists Move-to-Front Search Trees Binary Search Trees Multi-Way Search Trees B-trees Splay Trees 2-3-4 Trees Red-Black Trees SELF ADJUSTING WORST-CASE EFFICIENT competitive competitive? Linear Lists Multi-Lists Hash Tables

References: [CLRS] chapter 18

B-trees R. Bayer, E.M. McCreight, “Organization and maintenance of large ordered indexes,” Acta Informatica 1(3), 173-189, 1972. Boeing Company B…

Definition B-trees are a special class of multi-way search trees. Node size: d[x] = degree of node x, i.e., number of subtrees of x. d[x] - 1 = number of keys stored in node x. (This is n[x] in [CLRS].) Definition: Suppose d  2 is a fixed integer. T is a B-tree of order d if it satisfies the following properties: 1. Search property: T is a multi-way search tree. 2. Perfect balance: All external nodes of T have the same depth (h). 3. Node size range: d  d[x]  2d  nodes x  root[T] 2  d[x]  2d for x = root[T].

Example A B-tree of order d = 3 with n = 27 keys. h = 3 height, 36 12 18 26 42 48 2 4 6 8 10 14 16 20 22 24 28 30 32 34 38 40 44 46 50 52 54 h = 3 height, including external nodes A B-tree of order d = 3 with n = 27 keys.

Warning Perfect Balance: All external nodes of T have the same depth. [CLRS] says: All leaves have the same depth. The only leaf in this tree

Applications External Memory Dictionary: Large dictionaries stored in external memory. Each node occupies a memory page. Each node access triggers a slow page I/O. Keep height low. Make d large. A memory page should hold the largest possible node (of degree 2d). Typical d is in the range 50 .. 2000 depending on page size. Internal Memory Dictionary: 2-3-4 tree = B-tree of order 2.

h = Q ( log d n) How small or large can a B-tree of order d and height h be? Maximum n occurs when every node is full, i.e., has degree 2d: Minimum n occurs when every node has degree d, except the root that has degree 2: So, height grows logarithmic in n and inverse logarithmic in d:

Height in B-trees & 2-3-4 trees This includes the level of external nodes.

SEARCH Algorithm: Simply follow the search path for the given key. Complexity: At most O(h) nodes probed along the search path. Each node has O(d) keys. # I/O operations: O(h) = O( logd n) = O(log n / log d). So, for external memory use keep d high. Page size is the limiting factor. Search time: binary search probing within node: O(h log d) = O(log n). sequential probing within node: O(hd) = O(d log n / log d). INSERT & DELETE (to be shown) also take O(hd) time. So, for internal memory use keep d low (e.g., 2-3-4 tree).

Local Restructuring INSERT and DELETE need the following local operations: Node splitting Node fusing Key sharing (or key borrowing) The first one is used by INSERT. The last 2 are used by DELETE. Each of them takes O(d) time.

Node Splitting & Fusing Split(x) Parent p gains a new key. (If p was full, it needs repair too.) If x was the root, p becomes the new root (of degree 2) & tree height grows by 1. Fuse (x’, ad, x’’) Parent p loses a key. (If non-root p was a d-node, it needs repair too.) If p was the root & becomes empty, x becomes the new root & tree height shrinks by 1. O(d) time p … bi-1 bi … T1 Td Td+1 T2d a1 … ad-1 ad ad+1 … a2d-1 x … a … bi-1 ad bi … x’ …… ……. a1 … … ad-1 ad+1 … a2d-1 x” a’ a”

Key Sharing …… ... … KeyShare(x,y) T1 Td x p Td+1 Td+2 y Td+3 Node y is the (left or right) immediate sibling of x. d[x]=d, d[y]>d. x “borrows” a key from y to make d[x] > d. Note: the inorder sequence of keys is not disturbed. O(d) time … ci-1 ci ci+1 … T1 Td x p Td+1 Td+2 …… ... a1 … … ad-1 b1 b2 b3 … y Td+3 … ci-1 b1 ci+1 … … a1 … … ad-1 ci b2 b3 …

Bottom-Up INSERT Bottom-Up INSERT (K,T) Follow the search path of K down B-tree T. if K is found then return Otherwise, we are at a leaf x where K needs to be inserted. K’  K (* the insertion key at the current node x *) while d[x] = 2d do (* repeated node splitting up the search path *) Split x into x’ and x’’ with middle key keyd[x] if K’ < keyd[x] then insert K’ into x’ else insert K’ into x” K’  keyd[x] x  p[x] end-while if x  nil then insert K’ into x else do create a new node r root[T]  r insert K’ into r make x’ and x’’ the two children of r end-else end O(d logd n) time

Top-Down INSERT Top-Down INSERT (K,T) As you follow the search path of K down B-tree T, keep splitting every full node along the way and insert its middle key into its (now non-full) parent. If the root was full and got split, a new degree 2 root would be created during this process. if K is found then return else insert K into the (now non-full) leaf you end up. end O(d logd n) time Pros and Cons: Top-Down Insert tends to over split. It could potentially split O(h) nodes, while the Bottom-Up Insert may not split at all! Top-Down makes only one pass down the search path and does not need parent pointers, while Bottom-Up may need to climb up to repair by repeated node splitting.

Top-Down DELETE Top-Down DELETE (K,T) The idea is to move the current node x down the search path of K in T and maintain: Loop Invariant: d[x] > d or x=root[T]. By LI, x can afford to lose a key. At the start x = root[T] and LI is maintained. Case 0: Kx and x is a leaf. Return Case 1: Kx and we have to move to child y of x. Case 1a: d[y] > d: x  y (*LI is maintained *) Case 1b: d[y] = d, d[z] > d, (z=left/right imm. sib. of y) KeyShare(y,z); x  y (*LI is maintained *) Case 1c: d[y] = d & d[z]=d separated by key D at x: x  Fuse(y,D,z) (* LI is maintained *) Case 2: Kx and we have to remove K from x: see next page x root[T] … D … y z Continued

Top-Down DELETE Top-Down DELETE (K,T) Case 2: Kx and we have to remove K from x: x root[T] … K … y z K’ K” Case 2a: x is a leaf: (* LI *) remove K from x If x=root[T] & becomes empty, set root[T]  nil Case 2b: x is not a leaf: Let y and z be children of x imm. left/right of K Case 2b1: d[y] > d (* LI *) Recursively remove predecessor K’ of K from subtree rooted at y and replace K in x by K’. Case 2b2: d[y] = d, d[z] > d (* LI *) Recursively remove successor K’’ of K from subtree rooted at z and replace K in x by K”. Case 2b3: d[y] = d[z] = d x  Fuze(y,K,z) (* LI *) Repeat Case 2 (one level lower in T). O(d logd n) time Bottom-Up Delete left as exercise

2-3-4 trees

2-3-4 tree This includes the level of external nodes. 2-3-4 tree is a B-tree of order 2, i.e., a multi-way search tree with 2-nodes, 3-nodes, 4-nodes, and all external nodes at the same depth. This includes the level of external nodes. 36 12 18 26 42 48 6 8 10 14 16 20 22 24 28 30 32 38 40 44 46 50 52 54 h = 3 height, including external nodes

Example: Bottom-Up Insert C D E F G’ H I J K L M G” i f, g l h, j e, k c d, m a b h=5 h=4 c, d, e j, k, l a, b, m A B C D E F f, h, i G H I J K L M Insert g Insert g

Exercises

Bottom-Up Delete: Design and analyze an efficient implementation of bottom-up Delete on B-trees. Insert & Delete Examples: Demonstrate the following processes, each time starting with the B-tree of order 3 example on page 6 of these slides. (a) Insert 1. (b) Delete 48 top-down. (c) Delete 48 bottom-up. B-tree Insertion sequence: Draw the B-tree of order d resulting from inserting the following keys, in the given order, using bottom-up insert, into an initially empty tree: 4, 40, 23, 50, 11, 34, 62, 78, 66, 22, 90, 59, 25, 72, 64, 77, 39, 12, (a) with d = 3. (b) with d = 4. 2-3-4 tree Insertion sequence: Insert integers 1..32 in that order into an initially empty 2-3-4 tree. (a) Show some intermediate snapshots as well as the final 2-3-4 tree. (b) Would you get a different resulting tree with top-down versus bottom-up Insert? (c) Can you describe the general pattern for the insertion sequence 1..n? [Hint: any connection with the Binary Counter with the Increment operation?] Split and Join on 2-3-4 trees: These are cut and paste operations on dictionaries. The Split operation takes as input a dictionary (a set of keys) A and a key value K (not necessarily in A), and splits A into two disjoint dictionaries B = { xA | key[x]  K } and C = { xA | key[x] > K }. (Dictionary A is destroyed as a result of this operation.) The Join operation is essentially the reverse; it takes two input dictionaries A and B such that every key in A < every key in B, and replaces them with their union dictionary C = AB. (A and B are destroyed as a result of this operation.) Design and analyze efficient Split and Join on 2-3-4 trees. [Note: Definition of Split and Join here is the same we gave on BST’s and slightly different than the one in [CLRS, Problem 18-2, pp: 503-504].]

B. -trees: A B. -tree T (of order d>0) is a variant of a B-tree B*-trees: A B*-tree T (of order d>0) is a variant of a B-tree. The only difference is the node size range. More specifically, for each non-root node x, 2d  d[x]  3d ( i.e., at least 2/3 full. ) (a) Specify appropriate lower and upper bounds on d[root[T]] of a B*-tree. (b) Describe an insertion procedure for B*-trees. What is the running time? [Hint: Before splitting a node see whether its sibling is full. Avoid splitting if possible.) (c) What are the advantages of a B*-tree over a standard B-tree?

END