1431227-3 File Organization and Processing Week 3 2-3 Tree 2-3-4 Tree.

1 1431227-3 File Organization and Processing Week 3 2-3 Tree 2-3-4 Tree

2 2-3 Tree

3 Outline: Balanced Search Trees; 2-3 Trees; 2-3-4 Trees

4 Why care about advanced implementations? The same entries inserted in a different sequence can produce a badly unbalanced binary search tree. Not good! We would like to keep the tree balanced.

5 2-3 Trees. Features: each internal node has either 2 or 3 children; all leaves are at the same level.

6 2-3 Trees with Ordered Nodes: 2-node and 3-node. A leaf node can be either a 2-node or a 3-node.

7 Example of 2-3 Tree

8 What did we gain? What is the time efficiency of searching for an item?

9 Gain: Ease of Keeping the Tree Balanced. [Figure: a binary search tree and a 2-3 tree, both after inserting items 39, 38, ..., 32]

10 Inserting Items Insert 39

11 Inserting Items: Insert 38. [Figure panels: insert in leaf; divide leaf and move middle value up to parent; result]

12 Inserting Items Insert 37

13 Inserting Items: Insert 36. [Figure panels: insert in leaf; divide leaf and move middle value up to parent; overcrowded node]

14 Inserting Items: ... still inserting 36. Divide the overcrowded node, move its middle value up to the parent, and attach its children to the nodes holding the smallest and largest values. [Figure panel: result]

15 Inserting Items After Insertion of 35, 34, 33

16 Inserting so far

17

18 Inserting Items How do we insert 32?

19 Inserting Items: create a new root if necessary; the tree grows at the root.
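
The insertion steps on slides 10 to 19 can be sketched roughly as follows. This is a minimal illustration, not the course's reference code: the names `Node23`, `insert`, `inorder`, and `leaf_depths` are invented here, keys are assumed to be distinct, and the sketch starts from an empty tree, whereas the slides insert into an existing tree whose figures are not reproduced in this transcript.

```python
class Node23:
    """A 2-3 tree node (hypothetical layout): 1 or 2 keys, 0 or keys+1 children."""

    def __init__(self, keys, children=None):
        self.keys = list(keys)
        self.children = list(children or [])  # empty list means leaf

    def is_leaf(self):
        return not self.children


def _insert(node, key):
    """Insert key below node. Returns (middle, new_right) if node split, else None."""
    i = sum(1 for k in node.keys if k < key)  # slot or child the key belongs to
    if node.is_leaf():
        node.keys.insert(i, key)
    else:
        promoted = _insert(node.children[i], key)
        if promoted is None:
            return None
        middle, right = promoted
        node.keys.insert(i, middle)           # middle value moves up into this node
        node.children.insert(i + 1, right)
    if len(node.keys) <= 2:
        return None
    # Overcrowded node (3 keys): keep the smallest here, give the largest to a
    # new right sibling, and hand the middle value to the caller (the parent).
    middle = node.keys[1]
    right = Node23(node.keys[2:], node.children[2:])
    node.keys, node.children = node.keys[:1], node.children[:2]
    return middle, right


def insert(root, key):
    """Insert key and return the (possibly new) root."""
    if root is None:
        return Node23([key])
    promoted = _insert(root, key)
    if promoted is None:
        return root
    middle, right = promoted
    return Node23([middle], [root, right])    # the tree grows at the root


def inorder(node):
    """All keys in ascending order."""
    if node.is_leaf():
        return list(node.keys)
    out = []
    for i, child in enumerate(node.children):
        out += inorder(child)
        if i < len(node.keys):
            out.append(node.keys[i])
    return out


def leaf_depths(node, depth=1):
    """Set of levels at which leaves occur (a single value when balanced)."""
    if node.is_leaf():
        return {depth}
    return set().union(*(leaf_depths(c, depth + 1) for c in node.children))


root = None
for key in range(39, 31, -1):  # 39, 38, ..., 32, as on the slides
    root = insert(root, key)
print(root.keys, inorder(root), leaf_depths(root))
```

Because a split only ever pushes one value upward, the only place the tree can gain a level is above the old root, which is why every leaf stays at the same depth.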

20 Inserting Items Final Result

21 Deleting Items: Delete 70

22 Deleting Items Deleting 70: swap 70 with inorder successor (80)

23 Deleting Items Deleting 70:... get rid of 70

24 Deleting Items Result

25 Deleting Items Delete 100

26 Deleting Items Deleting 100

27 Deleting Items Result

28 Deleting Items Delete 80

29 Deleting Items Deleting 80...

30 Deleting Items Deleting 80...

31 Deleting Items Deleting 80...

32 Deleting Items Final Result comparison with binary search tree

33 Deletion Algorithm I. Deleting item I:
1. Locate node n, which contains item I.
2. If node n is not a leaf, swap I with its inorder successor, so that deletion always begins at a leaf.
3. If leaf node n contains another item, just delete item I; otherwise try to redistribute items from siblings (see next slide); if that is not possible, merge nodes (see next slide).

34 Deletion Algorithm II. Redistribution: if a sibling has 2 items, redistribute items between the siblings and the parent. Merging: if no sibling has 2 items, merge the node with a sibling and move an item from the parent down into that sibling.
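
A rough sketch of the leaf-level choice between redistribution and merging is given below. It is deliberately simplified and rests on assumptions not in the slides: the parent is modelled as a plain list of keys plus a list of leaf key-lists, only adjacent siblings are considered for redistribution, and the function name `repair_empty_leaf` is hypothetical. The recursive step for a parent that ends up empty is not handled here.

```python
def repair_empty_leaf(parent_keys, leaves, i):
    """Repair the empty leaf leaves[i] in place; return which action was taken.

    parent_keys: ascending keys of the parent node (1 or 2 keys).
    leaves:      the parent's children, each a list of 0, 1, or 2 keys.
    """
    assert leaves[i] == []
    left_has_two = i > 0 and len(leaves[i - 1]) == 2
    right_has_two = i + 1 < len(leaves) and len(leaves[i + 1]) == 2

    if left_has_two:
        # Rotate through the parent: separator comes down, sibling's largest goes up.
        leaves[i].append(parent_keys[i - 1])
        parent_keys[i - 1] = leaves[i - 1].pop()
        return "redistributed"
    if right_has_two:
        # Mirror image: separator comes down, sibling's smallest goes up.
        leaves[i].append(parent_keys[i])
        parent_keys[i] = leaves[i + 1].pop(0)
        return "redistributed"

    # No adjacent sibling can spare an item: merge. The separating item moves
    # from the parent into the sibling and the empty leaf is removed.
    if i > 0:
        leaves[i - 1].append(parent_keys.pop(i - 1))
    else:
        leaves[i + 1].insert(0, parent_keys.pop(0))
    del leaves[i]
    # If parent_keys is now empty, the parent itself needs the same repair.
    return "merged"


keys, kids = [30, 60], [[10], [], [70]]
print(repair_empty_leaf(keys, kids, 1), keys, kids)
```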

35 Deletion Algorithm III. Redistribution: if internal node n has no item left, redistribute. Merging: if redistribution is not possible, merge the node, move an item from the parent down to the sibling, and have the sibling adopt n's child. If n's parent ends up without an item, apply the process recursively.

36 Deletion Algorithm IV. If the merging process reaches the root and the root is left without an item, delete the root.

37 Operations of 2-3 Trees: all operations (search, insertion, deletion) have time complexity O(log n).
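
The logarithmic bound follows from the shape rules on slide 5. As a short supporting derivation (the symbols h for the number of levels and n for the number of items are introduced here, not taken from the slides): a tree made only of 2-nodes holds the fewest items for its height, and one made only of 3-nodes holds the most.

```latex
% h = number of levels, n = number of items stored in the 2-3 tree
2^{h} - 1 \;\le\; n \;\le\; 3^{h} - 1
\quad\Longrightarrow\quad
\log_{3}(n + 1) \;\le\; h \;\le\; \log_{2}(n + 1)
```

Each operation touches a bounded number of nodes per level, so its cost is proportional to h, which is O(log n).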

38 2-3-4 Tree

39 Introduction. 2-3-4 trees are multiway trees that can have up to four children and three data items per node. 2-3-4 trees have very nice features: they are balanced, like red-black (RB) trees; they are slightly less efficient than RB trees but easier to program; and they serve as an introduction to B-trees. B-trees are another kind of multiway tree, particularly useful for organizing external storage; a B-tree node can have dozens or hundreds of children.

40 Introduction to 2-3-4 Trees. Nodes are drawn as lozenge shapes. In a 2-3-4 tree, all leaf nodes are at the same level (but data can appear in all nodes). [Figure: example 2-3-4 tree. Root: 50. Children of the root: 30 and 60/70/80. Leaves: 10/20, 40, 55, 62/64/66, 75, 83/86.]

41 Introduction. The 2, 3, and 4 in the name refer to how many links to child nodes a given node can contain. For non-leaf nodes, three arrangements are possible: a node with one data item always has two children; a node with two data items always has three children; a node with three data items always has four children.

42 Introduction. A non-leaf node must always have one more child than it has data items. Equivalently, if the number of child links is L and the number of data items is D, then L = D + 1. [Figure: the example 2-3-4 tree from slide 40]
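
The L = D + 1 rule is easy to state as a check. The following is only an illustrative sketch; `is_valid_node` is a made-up helper that takes a node's keys and child links as plain lists.

```python
def is_valid_node(keys, children):
    """True if a node obeys the 2-3-4 shape rules described on the slides."""
    d, l = len(keys), len(children)
    if not 1 <= d <= 3:
        return False                      # a node holds one, two, or three items
    if any(a >= b for a, b in zip(keys, keys[1:])):
        return False                      # items ascend left to right, no equal keys
    return l == 0 or l == d + 1           # leaf, or exactly one more link than items


print(is_valid_node([60, 70, 80], ["c0", "c1", "c2", "c3"]))  # a 4-node
print(is_valid_node([30], ["c0"]))                            # single link: not allowed
```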

43 More Introductory Stuff. This critical relationship determines the structure of 2-3-4 trees. A leaf node has no children but still contains one, two, or three data items; it cannot be empty. Because a node in a 2-3-4 tree can have up to four children, the tree is called a multiway tree of order 4. [Figure: the example 2-3-4 tree from slide 40]

44 More Introductory Stuff. Binary and RB trees may be referred to as multiway trees of order 2: each node can have up to two children. But note: in a binary tree, a node may have up to two child links, and either or both may be null. In a 2-3-4 tree, nodes with a single link are NOT permitted: a node with one data item must have two links (unless it is a leaf), a node with two data items must have three children, and a node with three data items must have four children. (You will see this more clearly once we talk about how the trees are actually built.) These numbers are important. If a node has one data item, it has one link to a lower-level node holding values less than that item and one link to a node holding values greater than it (slide 48 notes that equal keys are not permitted). A node with two links is called a 2-node; a node with three links, a 3-node; a node with four links, a 4-node. (There is no such thing as a 1-node.)

45 More Introductory Stuff. Do you see any 2-nodes? 3-nodes? 4-nodes? Do you see a node with one data item that has two links, a node with two data items that has three children, and a node with three data items that has four children? [Figure: the example 2-3-4 tree from slide 40, with a 2-node and a 4-node labeled]

46 Nodes in a 2-3-4 Tree

47 2-3-4 Tree Organization. The organization is very different from that of a binary tree. First, we number the data items in a node 0, 1, 2 and the child links 0, 1, 2, 3. Very important: data items are always in ascending order, left to right, within a node. The relationships between data items and child links are critical:

48 More on the 2-3-4 Tree's Organization. [Figure: a node with keys A, B, C (keys 0, 1, 2) and child links 0, 1, 2, 3, leading to nodes with keys less than A, between A and B, between B and C, and greater than C] Equal keys are not permitted; leaves are all on the same level; upper-level nodes are often not full; the tree is balanced. Its construction always maintains its balance, even as you add data items (see ahead). All items in the subtree rooted at child 0 have key values less than key 0. All items in the subtree rooted at child 1 have key values greater than key 0 but less than key 1. All items in the subtree rooted at child 2 have key values greater than key 1 but less than key 2. All items in the subtree rooted at child 3 have key values greater than key 2.

49 Searching a 2-3-4 Tree. A very nice feature of these trees. You have a search key; go to the root. If the key is in the root, you are done. Otherwise, select the link that leads to the subtree with the appropriate range of values. If you don't find your target in that node, go on to the appropriate child of that node (notice that the items are in sequence; this is very important later), and so on. If you reach a leaf without a match, the data is 'not found.'

50 Try it: search for 64, 40, 65. [Figure: the example 2-3-4 tree from slide 40, with a 2-node and a 4-node labeled] Start at the root. You search the root but don't find the item. Because 64 is larger than 50, you go to child 1, which we will represent as 60/70/80. (Remember that child 1 is on the right, because the numbering of children and links starts at 0 on the left.) You don't find the data item in this node either, so you must go to the next child. Here, because 64 is greater than 60 but less than 70, you go again to child 1. This time you find the specified item in the 62/64/66 node.
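
The walk-through above can be traced in code. This is a sketch under stated assumptions: the tree is rebuilt from the garbled figure on the slide (root 50; children 30 and 60/70/80; leaves 10/20, 40, 55, 62/64/66, 75, 83/86), and the `Node` class and `search` function are names invented for the illustration.

```python
class Node:
    """Hypothetical 2-3-4 node: ascending keys, children empty for a leaf."""

    def __init__(self, keys, children=()):
        self.keys = list(keys)
        self.children = list(children)


def search(root, target):
    """Return (found, path), where path lists the keys of each node visited."""
    path = []
    node = root
    while True:
        path.append(node.keys)
        if target in node.keys:
            return True, path
        if not node.children:
            return False, path            # reached a leaf without a match
        # Child i holds the values between key i-1 and key i, so the index is
        # simply the number of keys in this node that are smaller than target.
        i = sum(1 for k in node.keys if k < target)
        node = node.children[i]


# Example tree as reconstructed from the slide's figure (an assumption).
tree = Node([50], [
    Node([30], [Node([10, 20]), Node([40])]),
    Node([60, 70, 80], [Node([55]), Node([62, 64, 66]), Node([75]), Node([83, 86])]),
])

for key in (64, 40, 65):
    print(key, search(tree, key))
```

Searching for 65 follows the same path as 64 and stops in the 62/64/66 leaf, which is how 'not found' is detected.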

51 Node Insertion. Can be quite easy; sometimes very complex. We can take a top-down or a bottom-up approach, but the structure of the tree must be maintained at all costs. Easy approach: start by searching to find a spot. We like to insert at the leaf level, so this will very likely involve moving data items around within the leaf to keep its data in sequence. We will take the top-down approach.

52 Node Split. We will use a top-down 2-3-4 tree: while looking for the insertion point, we split every full node we encounter on the way down. This approach keeps the tree balanced.

53 Node Split: Insertion. Upon encountering a full node (while searching for a place to insert):
1. Split that node at that time.
2. Move the highest data item from the current (full) node into a new node to the right.
3. Move the middle value of the node undergoing the split up to the parent node. (We know we can do this because the parent node was not full.)
4. Retain the lowest item in the node.
5. Note: the new node (to the right) has only one data item (the highest value).

54 Node Split: Insertion. Upon encountering a full node (continued):
6. The original (formerly full) node contains the lowest of the three values.
7. The rightmost two children of the original full node are disconnected from it and connected to its new sibling (to the right). (They must be disconnected, since their parent's data has changed.) They become the new sibling's two links: one for values lower than its first (and only) data item and one for values greater than it.
8. Insert the new data item into the proper leaf node.
Please note: multiple splits can be encountered en route to the insertion point.

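55 Insert: Split Is NOT the Root Node. We want to add a data value of 99. [Figure: before the split. Node 62 (… other stuff not shown) has the full child 83/92/104, whose children are 74, 87/89, 97, and 112. The node 83/92/104 is to be split; 99 is to be inserted.]

56 Case 1 Insert: Split Is NOT the Root Node. Insert 99. 1. 104 starts a new node. 2. 92 moves up. 3. 83 stays. 4. The two rightmost children of the split node are reconnected to the new node. 5. The new data item is moved in. [Figure: after the split. Parent: 62/92 (… other stuff not shown). Children: 83 (with children 74 and 87/89) and 104 (with children 97/99 and 112).]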
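
The split in this example can be reproduced with a small helper. This is a sketch, not the presenter's code: `Node` and `split_child` are hypothetical names, and because the slide elides the rest of the tree ("… other stuff"), the left child of 62 is represented by a placeholder leaf whose key (50) is invented purely so the parent has a valid shape.

```python
class Node:
    """Hypothetical 2-3-4 node: ascending keys, children empty for a leaf."""

    def __init__(self, keys, children=None):
        self.keys = list(keys)
        self.children = list(children or [])


def split_child(parent, i):
    """Split the full node parent.children[i]; the parent must not be full."""
    full = parent.children[i]
    assert len(full.keys) == 3 and len(parent.keys) < 3
    low, middle, high = full.keys
    # The highest item starts a new node and takes the two rightmost children.
    new_right = Node([high], full.children[2:])
    # The lowest item stays put and keeps the two leftmost children.
    full.keys, full.children = [low], full.children[:2]
    # The middle item moves up; the new node becomes the next child link.
    parent.keys.insert(i, middle)
    parent.children.insert(i + 1, new_right)
    return new_right


placeholder = Node([50])  # stand-in for the elided part of the tree (assumption)
parent = Node([62], [
    placeholder,
    Node([83, 92, 104], [Node([74]), Node([87, 89]), Node([97]), Node([112])]),
])

split_child(parent, 1)

# Continue the descent for 99 and place it in the leaf it reaches.
node = parent
while node.children:
    node = node.children[sum(1 for k in node.keys if k < 99)]
node.keys.append(99)
node.keys.sort()

print(parent.keys, [c.keys for c in parent.children], node.keys)
```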

57 Splitting the Root. Let's label the three items in the root A, B, and C respectively. A new node is created that becomes the new root and the parent of the node being split. A second new node is created that becomes a sibling of the node being split. Data item C is moved into the new sibling. Data item B is moved into the new root. Data item A remains where it is. The two rightmost children of the node being split are disconnected from it and connected to the new right-hand node. This process creates a new root that's at a higher level than the old one. Thus the overall height of the tree is increased by one.

58 Root Node Split

59 Splitting on the Way Down. Note: once we hit a node that must be split (on the way down), we know that the parent its middle value moves up into was not full. It may be full now, but it wasn't on the way down. The algorithm is reasonably straightforward. Just remember: 1. You are splitting a 4-node. The node being split has three data values. The value on the right goes to a new node; the value on the left remains; the value in the middle is promoted upward; the new data item is then inserted appropriately. 2. We split a node any time we encounter a full node while trying to insert a new data value.

60 2-3-4 Tree: Insertion Inserting 60, 30, 10, 20, 50, 40, 70, 80, 15, 90, 100

61 2-3-4 Tree: Insertion Inserting 60, 30, 10, 20...... 50, 40...

62 2-3-4 Tree: Insertion Inserting 50, 40...... 70,...

63 2-3-4 Tree: Insertion Inserting 70...... 80, 15...

64 2-3-4 Tree: Insertion Inserting 80, 15...... 90...

65 2-3-4 Tree: Insertion Inserting 90...... 100...

66 2-3-4 Tree: Insertion Inserting 100...
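
For readers who want to replay this sequence, here is a compact sketch of top-down insertion with splits on the way down, including the root-split case. It is one possible reading of the procedure, not the slides' own code: the class and function names are invented, keys are assumed distinct, and the intermediate shapes may differ from the figures, which are not reproduced in this transcript.

```python
class Node:
    """Hypothetical 2-3-4 node: ascending keys, children empty for a leaf."""

    def __init__(self, keys=(), children=()):
        self.keys = list(keys)
        self.children = list(children)

    def is_full(self):
        return len(self.keys) == 3


def _split_child(parent, i):
    """Split the full child i: low stays, middle moves up, high starts a new node."""
    full = parent.children[i]
    low, middle, high = full.keys
    new_right = Node([high], full.children[2:])
    full.keys, full.children = [low], full.children[:2]
    parent.keys.insert(i, middle)
    parent.children.insert(i + 1, new_right)


def insert(root, key):
    """Insert key top-down and return the (possibly new) root."""
    if root is None:
        return Node([key])
    if root.is_full():
        root = Node([], [root])     # new root above the old one; height grows by one
        _split_child(root, 0)
    node = root
    while node.children:
        i = sum(1 for k in node.keys if k < key)
        if node.children[i].is_full():
            _split_child(node, i)   # safe: node itself is known not to be full
            if key > node.keys[i]:  # the promoted key may redirect the descent
                i += 1
        node = node.children[i]
    node.keys.append(key)           # leaf has room, because full nodes were split
    node.keys.sort()
    return root


def inorder(node):
    """All keys in ascending order."""
    if not node.children:
        return list(node.keys)
    out = []
    for i, child in enumerate(node.children):
        out += inorder(child)
        if i < len(node.keys):
            out.append(node.keys[i])
    return out


def leaf_depths(node, depth=1):
    """Set of levels at which leaves occur (a single value when balanced)."""
    if not node.children:
        return {depth}
    return set().union(*(leaf_depths(c, depth + 1) for c in node.children))


root = None
for key in [60, 30, 10, 20, 50, 40, 70, 80, 15, 90, 100]:
    root = insert(root, key)
print(root.keys, inorder(root), leaf_depths(root))
```

Splitting every full node before stepping into it is what guarantees that a promoted middle value always finds room in the parent, so no split ever has to propagate back up.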

67 2-3-4 Tree: Insertion Procedure Splitting 4-nodes during Insertion

68 2-3-4 Tree: Insertion Procedure Splitting a 4-node whose parent is a 2-node during insertion

69 2-3-4 Tree: Insertion Procedure Splitting a 4-node whose parent is a 3-node during insertion

70 2-3-4 Trees and Red-Black Trees

71 2-3-4 Trees and Red-Black Trees. As we shall see later, these two data structures have very much in common. One of their main common features is that both provide balanced trees. One can be converted into the other very easily. Even the operations applied to the two trees are equivalent: we keep the 2-3-4 tree balanced via node splits; in RB trees, we balance via color flips and rotations.
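
The conversion mentioned above can be sketched using the standard correspondence: a 2-node becomes a black node, a 3-node becomes a black node with one red child, and a 4-node becomes a black node with two red children. The slides do not spell out this mapping at this point, so treat the code as an illustration; the class names are invented, the 3-node is arbitrarily drawn leaning left, and the example tree is the one reconstructed from slide 40.

```python
class Node234:
    def __init__(self, keys, children=()):
        self.keys = list(keys)
        self.children = list(children)


class RBNode:
    def __init__(self, key, color, left=None, right=None):
        self.key, self.color, self.left, self.right = key, color, left, right


def to_red_black(node):
    """Convert a 2-3-4 subtree into an equivalent red-black subtree."""
    k = node.keys
    # Convert the children first; a leaf contributes null links instead.
    kids = [to_red_black(c) for c in node.children] or [None] * (len(k) + 1)
    if len(k) == 1:   # 2-node: a single black node
        return RBNode(k[0], "black", kids[0], kids[1])
    if len(k) == 2:   # 3-node: black parent with one red child (left-leaning here)
        red = RBNode(k[0], "red", kids[0], kids[1])
        return RBNode(k[1], "black", red, kids[2])
    # 4-node: black middle key with a red child on each side
    left = RBNode(k[0], "red", kids[0], kids[1])
    right = RBNode(k[2], "red", kids[2], kids[3])
    return RBNode(k[1], "black", left, right)


def inorder(n):
    return [] if n is None else inorder(n.left) + [n.key] + inorder(n.right)


def black_height(n):
    """Black nodes on every path down to a null link; raises if paths disagree."""
    if n is None:
        return 0
    lh, rh = black_height(n.left), black_height(n.right)
    if lh != rh:
        raise ValueError("unequal black heights")
    return lh + (1 if n.color == "black" else 0)


tree = Node234([50], [
    Node234([30], [Node234([10, 20]), Node234([40])]),
    Node234([60, 70, 80], [Node234([55]), Node234([62, 64, 66]),
                           Node234([75]), Node234([83, 86])]),
])
rb = to_red_black(tree)
print(rb.key, rb.color, black_height(rb), inorder(rb))
```

Each 2-3-4 node contributes exactly one black node, so the black height of the result equals the number of levels in the 2-3-4 tree, while red nodes add the extra levels that make the red-black tree taller.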

72 2-3-4 Trees and Red-Black Trees: more. One thing to remember, though (we will see a lot about this later), is that a red-black tree has nodes that contain a single item, whereas a 2-3-4 tree has nodes that can contain up to three data items and have up to four children. Thus the number of levels of the two equivalent tree structures is quite different, and the amount of data stored at each node is quite different. These factors affect search times and storage requirements. These things come to bear later!

73 Efficiency Considerations for 2-3-4 Trees: Searching. RB trees: one node at each level must be visited. 2-3-4 trees: one node per level must be visited here too, but there is more data per node and per level, so there are fewer levels to visit. Recognize that all data items at a node may have to be checked in a 2-3-4 tree, but this is very fast. The height of an RB tree can be up to about twice that of the equivalent 2-3-4 tree, although the nodes in a 2-3-4 tree are NOT always full. Any overall speed advantage of 2-3-4 trees is slight: the increased number of items per node (which increases processing time at each node) tends to cancel out the gain from the decreased height of the tree (fewer node retrievals). So the search times for a 2-3-4 tree and for a balanced binary tree are approximately equal, and both are O(log n).
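
The trade-off can be made concrete with standard height bounds. The symbols below are introduced only for this note: n is the number of items, h_234 is the number of levels in the 2-3-4 tree, and h_RB is the height of the red-black tree counted in links.

```latex
% Height bounds: all 4-nodes (shortest tree) versus all 2-nodes (tallest tree)
\log_{4}(n + 1) \;\le\; h_{234} \;\le\; \log_{2}(n + 1),
\qquad
h_{RB} \;\le\; 2\log_{2}(n + 1)

% Key comparisons per search: up to 3 per 2-3-4 node, 1 per red-black node
C_{234} \;\le\; 3\,h_{234} \;\le\; 3\log_{2}(n + 1) = O(\log n),
\qquad
C_{RB} \;\le\; h_{RB} + 1 = O(\log n)
```

Both searches are logarithmic; the two structures differ only in constant factors, which is why the measured search times come out roughly equal.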

74 Efficiency Considerations for 2-3-4 Trees: Storage. 2-3-4 trees: a node can have up to three data items and up to four references. The references can be held in an array or in four specific variables. If not all of this space is used, there can be considerable waste, and in 2-3-4 trees it is quite common to see many nodes that are not full. RB trees: balanced, with few nodes that have only one child, so almost all references are used; each node contains the maximum number of data items: one. RB trees therefore make more efficient use of storage than 2-3-4 trees.

