File Organization and Processing Week Tree Tree
2-3 Tree
Outline Balanced Search Trees 2-3 Trees Trees
Why care about advanced implementations? Same entries, different insertion sequence: Not good! Would like to keep tree balanced.
2-3 Trees each internal node has either 2 or 3 children all leaves are at the same level Features
2-3 Trees with Ordered Nodes 2-node3-node leaf node can be either a 2-node or a 3-node
Example of 2-3 Tree
What did we gain? What is the time efficiency of searching for an item?
Gain: Ease of Keeping the Tree Balanced Binary Search Tree 2-3 Tree both trees after inserting items 39, 38,... 32
Inserting Items Insert 39
Inserting Items Insert 38 insert in leaf divide leaf and move middle value up to parent result
Inserting Items Insert 37
Inserting Items Insert 36 insert in leaf divide leaf and move middle value up to parent overcrowded node
Inserting Items... still inserting 36 divide overcrowded node, move middle value up to parent, attach children to smallest and largest result
Inserting Items After Insertion of 35, 34, 33
Inserting so far
Inserting Items How do we insert 32?
Inserting Items creating a new root if necessary tree grows at the root
Inserting Items Final Result
70 Deleting Items Delete 70 80
Deleting Items Deleting 70: swap 70 with inorder successor (80)
Deleting Items Deleting 70:... get rid of 70
Deleting Items Result
Deleting Items Delete 100
Deleting Items Deleting 100
Deleting Items Result
Deleting Items Delete 80
Deleting Items Deleting 80...
Deleting Items Deleting 80...
Deleting Items Deleting 80...
Deleting Items Final Result comparison with binary search tree
Deletion Algorithm I 1.Locate node n, which contains item I 2.If node n is not a leaf swap I with inorder successor deletion always begins at a leaf 3.If leaf node n contains another item, just delete item I else try to redistribute nodes from siblings (see next slide) if not possible, merge node (see next slide) Deleting item I:
Deletion Algorithm II A sibling has 2 items: redistribute item between siblings and parent No sibling has 2 items: merge node move item from parent to sibling Redistribution Merging
Deletion Algorithm III Internal node n has no item left redistribute Redistribution not possible: merge node move item from parent to sibling adopt child of n If n's parent ends up without item, apply process recursively Redistribution Merging
Deletion Algorithm IV If merging process reaches the root and root is without item delete root
Operations of 2-3 Trees all operations have time complexity of log n
2-3-4 Tree
39 Introduction Multi-way Trees are trees that can have up to four children and three data items per node Trees: Very nice features –Are balanced, like RB Trees. –Slightly less efficient but easier to program. – Serve as an introduction to the understanding of B- Trees!! B-Trees: another kind of multi-way tree particularly useful in organizing external storage. –B-Trees can have dozens or hundreds of children.
40 Introduction to Trees Shape of nodes is a lozenge-shaped node. In a tree, all leaf nodes are at the same level. (but data can appear in all nodes)
41 Introduction The 2, 3, and 4 in the name refer to how many links to child nodes can potentially be contained in a given node. For non-leaf nodes, three arrangements are possible: –A node with only one data item always has two children –A node with two data items always has three children –A node with three data items always has four children.
Introduction Non-leaf nodes must always have one more child than it has data items (see below); –Equivalently, if the number of child links is L and the number of data items is D, then L = D
More Introductory stuff This critical relationship determines the structure of trees. A leaf node has no children, but can still contain one, two, or three data items; cannot be empty. Because a tree can have nodes with up to four children, it’s called a multiway tree of order
44 More Introductory stuff Binary and RB trees may be referred to as multiway trees of order 2 - each node can have up to two children. But note: in a binary tree, a node may have up to two child links (one or more may be null). In a tree, nodes with a single link are NOT permitted; a node with one data item must have two links (unless it is a leaf); nodes with two data items must have three children; nodes with three data items must have four children. (You will see this more clearly once we talk about how they are actually built.) These numbers are important. If a node has one data item, then it also points to lower level nodes that have values less than the value of this item and a pointer to a node that has values greater than or equal to this value. Nodes with two links is called a 2-node; a node with three links is called a 3-node; with four links, a 4-node. (no such thing as a 1-node).
Do you see any 2-nodes? 3-nodes? 4-nodes? Do you see: a node with one data item that has two links? a node with two data items having three children; a node with three data items having four children? node 4 node More Introductory Stuff
Nodes in a Tree
Tree Organization Very different organization than for a binary tree. First, we number the data items in a node 0,1,2 and the child links: 0,1,2,3. Very Important. Data items are always ascending: left to right in a node. Relationships between data items and child links is critical:
More on Tree’s Organization A B C Nodes w/key C See below: (Equal keys not permitted; leaves all on same level; upper level nodes often not full; tree balanced! Its construction always maintains its balance, even if you add additional data items. (ahead) All children in the subtree rooted at child 0 have key values less than key 0. All children in the subtree rooted at child 1 have key values greater than key 0 but less than key 1. All children in the subtree rooted at child 2 have key values greater than key 1 but less than key 2. All children in the subtree rooted at child 3 have key values greater than key
49 Searching a Tree A very nice feature of these trees. You have a search key; Go to root. If hit, –done. Else –Select the link that leads to the subtree with the appropriate range of values. –If you don’t find your target here, go to next child. (notice items are sequential – VIP later) – etc. Perhaps data will be ‘not found.’
50 Try it: search for 64, 40, node 4 node Start at the root. You search the root, but don't find the item. Because 64 is larger than 50, you go to child 1, which we will represent as 60/70/80. (Remember that child 1 is on the right, because the numbering of children and links starts at 0 on the left.) You don't find the data item in this node either, so you must go to the next child. Here, because 64 is greater than 60 but less than 70, you go again to child 1. This time you find the specified item in the 62/64/66 link.
51 Node Insertion Can be quite easy; sometimes very complex. –Can do a top-down or a bottom-up approach… But structure of tree must be maintained at all costs. Easy Approach: –Start with searching to find a spot. We like to insert at the leaf level. So, –This may very likely involve moving a data item around to maintain the sequential nature of the data in a leaf. –We will take the top down approach.
52 Node Split We will use a top-down tree. Full nodes are split on the way down, if we encounter a full node in looking for the insertion point. This approach keeps the tree balanced.
53 Node Split – Insertion Upon encountering a full node (searching for a place to insert…) 1. split that node at that time. 2. move highest data item from the current (full) node into new node to the right. 3. move middle value of node undergoing the split up to parent node (Know we can do all this because parent node was not full) 4. Retain lowest item in node. 5. Note: new node (to the right) only has one data item (the highest value)
54 Node Split – Insertion Upon encountering a full node: 6. Original (formerly full) node contains the lowest of the three values. 7. Rightmost two children of original full node are disconnected and connected to the new sibling of the original full node (to the right) (They must be disconnected, since their parent data is changed) Hooked to new sibling with links ‘lower than’ first data and >= first (and only) data item 7. Insert new data item into the proper leaf node. Please note: there can be multiple splits encountered en route to finding the insertion point.
55 Insert: Split is NOT the Root Node Insert … other stuff Split this node… 99 to be inserted … Want to add a data value of 99
56 Case 1 Insert: Split is NOT the root node Insert … other stuff moves up stays starts a new node 4. Two rightmost children of split node are reconnected to new node. 5. New data item moved in.
57 Splitting the Root Let’s label the 3 items say A, B and C respectively. A new node is created that becomes the new root and the parent of the node being split. A second new node is created that becomes a sibling of the node being split. Data item C is moved into the new sibling. Data item B is moved into the new root. Data item A remains where it is. The two rightmost children of the node being split are disconnected from it and connected to the new right-hand node. This process creates a new root that's at a higher level than the old one. Thus the overall height of the tree is increased by one.
58 Root Node Split
59 Splitting on the Way Down Note: once we hit a node that must be split (on the way down), we know that when we move a data value ‘up’ that ‘that’ node was not full. –May be full ‘now,’ but it wasn’t on way down. Algorithm is reasonably straightforward. Just remember: –1. You are splitting a 4-node. Node being split has three data values. Data on the right goes to a new node. Data on the left remains; data in middle is promoted upward; new data item is inserted appropriately. –2. We do a node split any time we encounter a full node and when we are trying to insert a new data value.
2-3-4 Tree: Insertion Inserting 60, 30, 10, 20, 50, 40, 70, 80, 15, 90, 100
2-3-4 Tree: Insertion Inserting 60, 30, 10, , 40...
2-3-4 Tree: Insertion Inserting 50, ,...
2-3-4 Tree: Insertion Inserting , 15...
2-3-4 Tree: Insertion Inserting 80,
2-3-4 Tree: Insertion Inserting
2-3-4 Tree: Insertion Inserting
2-3-4 Tree: Insertion Procedure Splitting 4-nodes during Insertion
2-3-4 Tree: Insertion Procedure Splitting a 4-node whose parent is a 2-node during insertion
2-3-4 Tree: Insertion Procedure Splitting a 4-node whose parent is a 3-node during insertion
Trees and Red-Black Trees
Trees and Red-Black Trees As we shall see later, these two data structures have very much in common. One of the main common features is they provide for balanced trees One can be converted into the other very easily. Even the operations applied to the two trees are equivalent. –We keep the tree balanced via node splits; in RB Trees, we balance via color flips and rotations.
Trees and Red-Black Trees - more One thing to remember, though, (we will see a lot about this later) is that the Red Black tree has nodes that contain a single item; trees contain nodes that can contain three data items and can have four children. –Thus, the number of levels of the two equivalent tree structures is quite different and –The amount of data stored at each node is quite different. –These factors affect search times and storage requirements. –These things come to bear later!!
73 Efficiency Considerations for Trees Searching: RB Trees: one node at each level must be visited Trees: one node must be visited here too, but –More data per node / level. –Searches are faster. recognize all data items at node must be checked in a tree, but this is very fast. Overall height of the RB Tree is generally twice tree, But all nodes in the tree are NOT always full Yet overall speed is slightly better in trees. Overall, for trees, the increased number of items (which increases processing / search times) per node processing tends to cancel out the increases gained from the decreased height of the tree – fewer node retrievals. So, the search times for a tree and for a balanced binary tree are approximately equal and both are O(log 2 n)
74 Efficiency Considerations for Trees Storage Trees: a node can have three data items and up to four references. –Can be an array of references or four specific variables. –IF not all of it is used there can be considerable waste. – In trees, quite common to see many nodes not full. RB Trees: balanced; contain few nodes w/ only one child. Almost all references are used. –Each node contains max number of data items: one. RB Trees: more efficient use of storage than trees for storage.