Data Structures 2-3-4 Trees Phil Tayco Slide version 1.0 Apr. 23, 2015.



Binary trees revisited
Binary trees combine the best of both worlds: dynamic memory usage and the ability to perform a binary search as you could with a sorted array.
Search in a binary tree only achieves O(log n) as long as the tree is balanced.
The balance of a tree depends on the order of inserts and deletes, which can lead to imbalance.
Imbalance degrades search toward O(n); in the worst case the tree is basically a linked list.

Advanced tree ideas
As with other data structures, we try to address the cons.
For trees, we want to efficiently maintain balance as inserts and deletes are performed.
There are tree algorithms that already look at ways to do this:
– AVL trees
– Red-black trees
These trees keep the basic structure of a binary tree node.
As you would guess, their insert and delete algorithms are more complex than those of the standard tree.

Multiway tree
What if we modified the tree node instead?
[Figure: a multiway tree whose nodes each hold several data items and several child links]
Notice that each node here contains multiple data elements and multiple child links.
The modified structure is interesting, but it needs to work within a set of rules to guarantee balance.

Multiway tree rules
– A non-leaf node with 1 data item always has 2 children
– A non-leaf node with 2 data items always has 3 children
– A non-leaf node with 3 data items always has 4 children
– Leaf nodes have no children and can hold 1, 2 or 3 data items
– As before, the child subtrees to the left and right of a data item hold values less than and greater than that item, which maintains order

Similarities to binary trees
While the number of items and children per node has increased, the basic order is the same.
Search starts at the root, compares the search value against the node’s data items and descends into the appropriate child.
Insert adds new data items at the appropriate leaf.
The algorithms will show that balance is always maintained, which makes search and insert perform at O(log n), like a balanced binary tree.
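The search described above can be sketched as follows. This is a minimal illustration rather than the slides' code: the names SearchNode, SearchSketch, contains, leaf and sample are assumptions made for this example, and the sample tree is built by hand instead of by insert.

```java
// Hypothetical minimal node, used only to illustrate search.
class SearchNode {
    int[] items;            // 1 to 3 data items in ascending order
    SearchNode[] children;  // null for a leaf, otherwise items.length + 1 children

    SearchNode(int[] items, SearchNode[] children) {
        this.items = items;
        this.children = children;
    }
}

class SearchSketch {
    // Returns true if value is stored somewhere in the subtree rooted at node.
    static boolean contains(SearchNode node, int value) {
        while (node != null) {
            int i = 0;
            // Skip past the data items that are smaller than the search value.
            while (i < node.items.length && value > node.items[i]) {
                i++;
            }
            if (i < node.items.length && node.items[i] == value) {
                return true;
            }
            // Descend into child i; a leaf has nowhere further to go.
            node = (node.children == null) ? null : node.children[i];
        }
        return false;
    }

    static SearchNode leaf(int... items) {
        return new SearchNode(items, null);
    }

    // Hand-built example: the root holds 40 and 60 and has three leaf children.
    static SearchNode sample() {
        return new SearchNode(new int[] {40, 60},
            new SearchNode[] {leaf(10, 20), leaf(50), leaf(70, 80, 90)});
    }

    public static void main(String[] args) {
        SearchNode root = sample();
        System.out.println(contains(root, 50));
        System.out.println(contains(root, 55));
    }
}
```

Each level costs at most 3 comparisons, so the work per search is proportional to the height of the tree.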

Insert
New data items are inserted at the leaf level.
To maintain balance, we add a rule to the normal search for the appropriate leaf:
– When visiting any node, if it is full, “split” the node
– Whether or not a split has occurred, continue down the path using the standard search until a leaf node is reached
– Once a leaf is reached, split it first if it is full, then add the new data element to the appropriate leaf

Split
Splitting a node requires creating a new sibling node and either modifying the existing parent node or, if the full node is the root, creating a new parent.
Data elements are moved and child pointers are readjusted as follows:
– A new node is created as a sibling to the full node
– The 3rd data item of the full node is moved to the sibling node as its 1st data item
– The 2nd data item of the full node is added to the parent node
– The 1st data item of the full node remains where it is
– The 3rd and 4th child pointers of the full node move to the sibling node as its 1st and 2nd child pointers
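The split steps listed above can be sketched on a small list-based node. This is an illustration under assumptions, not the slides' array-based Node234: SplitNode, SplitSketch, split and example are hypothetical names, and the example values (a parent holding 14 whose full child holds 3, 6 and 10) are taken from the split example in the slides, with a second child holding 20 added only so the parent has two children.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical list-based node, used only to illustrate the split steps.
class SplitNode {
    List<Integer> items = new ArrayList<>();
    List<SplitNode> children = new ArrayList<>();

    SplitNode(int... values) {
        for (int v : values) items.add(v);
    }
}

class SplitSketch {
    // Splits the full child found at parent.children.get(index); returns the new sibling.
    static SplitNode split(SplitNode parent, int index) {
        SplitNode full = parent.children.get(index);
        SplitNode sibling = new SplitNode();        // new sibling node

        sibling.items.add(full.items.remove(2));    // 3rd item becomes the sibling's 1st item
        int middle = full.items.remove(1);          // 2nd item will go to the parent
        // The 1st item stays where it is.

        if (!full.children.isEmpty()) {
            // 3rd and 4th child pointers become the sibling's 1st and 2nd.
            sibling.children.add(full.children.remove(2));
            sibling.children.add(full.children.remove(2));
        }

        // The middle item lands at the split child's position in the parent,
        // and the sibling is attached immediately to the right of that child.
        parent.items.add(index, middle);
        parent.children.add(index + 1, sibling);
        return sibling;
    }

    static SplitNode example() {
        SplitNode parent = new SplitNode(14);
        parent.children.add(new SplitNode(3, 6, 10)); // full child
        parent.children.add(new SplitNode(20));       // assumed second child
        split(parent, 0);
        return parent;
    }

    public static void main(String[] args) {
        SplitNode parent = example();
        System.out.println("parent items: " + parent.items);
        for (SplitNode child : parent.children) {
            System.out.println("child items: " + child.items);
        }
    }
}
```

Using lists lets `add(index, ...)` do the shifting that an array-based node has to do by hand.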

Split example 1
We want to add 5 to the tree below. We start at the root; 5 is less than its 1st data item, 14, so we go down the 1st child pointer. That child (3, 6, 10) is full, so we must split it.
[Figure: tree whose root contains 14 and whose 1st child is a full node]

Step 1: Create new sibling node
Notice that the parent node in this case is the root, and the sibling is not yet attached to the parent (the 2nd child pointer of the root still points to its original child).

Step 2: Move the 3rd item to the new node as its 1st item
10 moves from current to the new sibling node.

Step 3: Move the 2nd item to the parent
Notice that 6 is inserted into the data item list of the parent. This shifts 14, as well as its 2 child pointers, to the right.

Step 5: Move the 3rd and 4th child pointers to the sibling as its 1st and 2nd child pointers
This keeps the parent-child relationships and ordering intact and balanced.

Split analysis
The split keeps the non-leaf and leaf rules intact: non-leaf nodes with 1, 2 or 3 data items still have 2, 3 or 4 child nodes.
The split is performed as full nodes are encountered on the way down.
In the previous example, the insert of 5 has still not been performed.
The insert process resumes at the parent. Note that if the parent has become full as a result of the split, it is not split at this point.

Resume insert at parent
5 is less than 6, so we go down the 1st child pointer. 5 is greater than 3 and that node has only 1 data item, so we go down its 2nd child pointer. The node with data item 4 is a leaf and is not full, so we add 5 there.

Insert analysis
The algorithm keeps the tree balanced.
New nodes are created as needed by adding siblings before adding levels.
A level is added only when the root node is the one that requires splitting.
When splitting the root, the same split algorithm applies, but instead of adding the 2nd data item to an existing parent node, a new parent node is created (and becomes the new root).

Splitting the root
Here, we will insert 15. Before we even go down to a child node, we must split the root because it is full.
[Figure: tree with a full root]

Step 1: Create the sibling node
The algorithm works the same as before, except there is no “parent” node (yet).

Step 2: Create new root as parent
Since the current node is the root, we create another new node to be the parent (and the new root).

Step 3: Move data items
The normal split occurs: the 3rd item of current moves to the 1st position of sibling, and the 2nd item of current moves to the 1st position of parent.

Step 4: Update pointers
The 3rd and 4th child pointers of current become the 1st and 2nd child pointers of sibling. The 1st and 2nd child pointers of the new parent are set to the current and sibling nodes respectively.

Step 5: New root and continue
Make the parent the new root of the tree. Resume the insert from the root (15 will end up going down to the leaf node with 10 and being added there).
Notice that the full leaf node 30, 31, 32 is not split. This is because it is never visited.

Insert analysis
Splitting only occurs when a visited node is full, keeping the tree rules intact.
Levels of the tree increase “upward” when the root node is full (because the new parent is created at that moment and becomes the new root).
Splitting a leaf node will never give a parent node more than 4 children (if the parent node already had 4 children, it would be full and would have been split before any of its child leaf nodes was reached).
Balance is maintained because a new level is only ever added above the root, so every leaf stays at the same depth even if one side gets “heavy” with data items.
The best way to understand the algorithm is to insert a series of numbers and draw the resulting tree.

public class Node234 {
    private int numItems;
    private Node234 parent;
    private Node234[] children;
    private int[] dataItems;

    public Node234() {
        numItems = 0;
        parent = null;
        children = new Node234[4];
        dataItems = new int[3];
        for (int n = 0; n < 4; n++)
            children[n] = null;
        for (int n = 0; n < 3; n++)
            dataItems[n] = -1;
    }
    // (further Node234 methods appear on later slides)
}

public class Tree234 {
    private Node234 root;

    public Tree234() {
        root = new Node234();
    }
    // (further Tree234 methods appear on later slides)
}

Node234 and Tree234 code
The node needs more properties here:
– numItems to keep track of how many data items are in the node
– A reference to the parent node (useful for handling splits)
– An array of child pointers
– An array of data items
The arrays are allocated in the constructor and their elements initialized to null (for children) and -1 (for data items). Using -1 to mark an empty slot means that -1 itself cannot be stored as a data item.
We could also use a linked list for the child and data arrays, but they are so small that we don’t necessarily need to, and arrays keep the code simple to start.
The tree is just the root node. Note that it is not initialized to null, but to a new Node234 object with no data items.

public void insert(int value) {
    Node234 current = root;
    while (true) {
        if (current.isFull()) {
            split(current);
            current = current.getParent();
            current = getNextChild(current, value);
        }
        // ... (the rest of the loop is not shown on this slide)

Tree234 insert code
We start with a current node at the root.
The loop goes down the child nodes of the tree until we reach a leaf.
Along the way, if current.isFull() returns true, we have to split that node.
After the split, we set current to its parent and then find the appropriate child to go to based on the value to be inserted.
Several methods are used here: isFull, split, getParent and getNextChild.

// These methods appear in the Node234 class (split and getNextChild are in Tree234)
public boolean isFull() {
    return (numItems == 3);
}

public Node234 getParent() {
    return parent;
}

private void split(Node234 n) {
    int thirdItem = n.removeItem();
    int secondItem = n.removeItem();
    Node234 fourthChild = n.removeChild(3);
    Node234 thirdChild = n.removeChild(2);
    Node234 sibling = new Node234();
    Node234 parent;
    // ... (split continues on the following slides)

Tree234 split code
If you haven’t been drawing pictures while going through the code, it is important that you do so now.
Split begins by removing the 3rd and 2nd data items from the full node and storing their values; these will be transferred to the sibling and parent nodes respectively.
We do the same with the 4th and 3rd child pointers of the node, disconnecting them so we can transfer them to the sibling.
We then create a new sibling node and a parent reference (parent is not a new node yet, as we haven’t determined at this point whether the full node is the root).
The setup is complete, but there are 2 new methods in Node234 to review: removeItem and removeChild.

// Removes the last data item in the data array (setting its slot to -1),
// decrements numItems and returns the value that was removed
public int removeItem() {
    int lastItem = dataItems[numItems - 1];
    dataItems[--numItems] = -1;
    return lastItem;
}

// Sets the given child of the node to null while returning a reference to that child
public Node234 removeChild(int n) {
    Node234 child = children[n];
    children[n] = null;
    return child;
}
// Now we can look at the next part of the split function…

    // If the node being split is the root, create a new node as parent and root
    // and set its first child to the current node
    if (n == root) {
        parent = new Node234();
        root = parent;
        root.setChild(0, n);
    }
    // Otherwise, a parent exists and we just get it
    else
        parent = n.getParent();

    int itemLocation = parent.insertItem(secondItem);
    int parentItems = parent.getNumItems();
    int c = parentItems - 1;
    while (c > itemLocation) {
        Node234 temp = parent.removeChild(c);
        parent.setChild(c + 1, temp);
        c--;
    }
    parent.setChild(itemLocation + 1, sibling);

Tree234 split code: adjusting the parent
The second item from the full node being split is inserted into the parent node using Node234’s insertItem function.
The location of that insert can vary, so it is returned in order to determine how to adjust the child pointers of the parent.
This is done by getting the number of items in the parent and looping from the parent’s last child pointer down to the location of the newly inserted item:
– At each iteration, we remove the child pointer at index c and set it at index c + 1; this shifts every child pointer after the inserted item one position to the right
Once that shift is complete, there is a “hole” immediately to the right of the item that was inserted into the parent.
This hole is filled by connecting it to the new sibling node just created.
Notice that we have more Node234 functions: insertItem and getNumItems.
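The index arithmetic of the child-pointer shift is easy to get wrong, so here is a small worked sketch. It uses strings as stand-ins for child nodes; ShiftSketch, attachSibling, example and the labels A, B, C and S are all hypothetical, and the example parent (three children, promoted item inserted at index 0) is an assumption chosen to show a two-step shift.

```java
import java.util.Arrays;

class ShiftSketch {
    // children holds the parent's 4 child slots; numItems is the parent's item
    // count AFTER the promoted item was inserted at itemLocation.
    static void attachSibling(String[] children, int numItems, int itemLocation, String sibling) {
        for (int c = numItems - 1; c > itemLocation; c--) {
            children[c + 1] = children[c];  // move child c one slot to the right
            children[c] = null;
        }
        children[itemLocation + 1] = sibling;  // fill the hole with the new sibling
    }

    static String[] example() {
        // The parent had 2 items and children A, B, C. Child A was split and its
        // middle item was inserted at index 0, so the parent now has 3 items.
        String[] children = {"A", "B", "C", null};
        attachSibling(children, 3, 0, "S");
        return children;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(example()));
    }
}
```

The sibling S ends up directly to the right of A, the child it was split from, with B and C each moved one slot over.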

// A standard get function of a class, returning the numItems property
public int getNumItems() {
    return numItems;
}
// insertItem is not as simple…

// Assumes the node is not full (a full node must be split before inserting)
public int insertItem(int data) {
    numItems++;
    // From right to left of the data items array, look for non-empty data items
    // (those not equal to -1); if a spot is empty, ignore it
    for (int n = 2; n >= 0; n--) {
        if (dataItems[n] == -1)
            continue;
        else {
            int d = dataItems[n];
            if (data < d)
                dataItems[n + 1] = dataItems[n];
            else {
                dataItems[n + 1] = data;
                return n + 1;
            }
        }
    }
    dataItems[0] = data;
    return 0;
}

Node234 code: inserting a data item
The “else” branch deals with encountering a data item as we go right to left in the data array looking for the correct place to insert the new data item.
When a data item is found, compare it to the new item:
– If the new item is less than it, the new item belongs to its left, so we shift the existing data item 1 position to the right
– Otherwise, the new data item belongs immediately to the right of this item, so we set it there and return that index
If the loop ends without returning, all data items in the array have shifted to the right and the new item belongs in the first spot (index 0). We insert it there and return that index.
A lot of bouncing back and forth between Node234 and Tree234! We’re almost done though.
At this point, we’ve created the sibling node and inserted the 2nd data item of the full node into the parent (created or existing).
All that is left in the split function is to give the sibling its data item and child pointers.

    // Using the Node234 functions previously discussed, insert the 3rd data item
    // from the full node into the sibling and set its 1st and 2nd child pointers
    // to what were once the full node’s 3rd and 4th children
    sibling.insertItem(thirdItem);
    sibling.setChild(0, thirdChild);
    sibling.setChild(1, fourthChild);
}
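Putting the pieces together, the insert and split logic walked through above can be condensed into one self-contained sketch. It is not the slides' code. SketchNode and SketchTree are illustrative names, item counts are tracked with numItems alone rather than -1 markers, and setChild, isLeaf, getNextChild, inOrder and leafDepth are written here as assumptions because the slides do not show them.

```java
// Sketch only: SketchNode and SketchTree are illustrative names, not the slides' classes.
class SketchNode {
    int numItems = 0;
    int[] items = new int[3];
    SketchNode[] children = new SketchNode[4];
    SketchNode parent;
    boolean isFull() { return numItems == 3; }
    boolean isLeaf() { return children[0] == null; }
    // Attaches c as child i and records this node as its parent.
    void setChild(int i, SketchNode c) {
        children[i] = c;
        if (c != null) c.parent = this;
    }
    // Inserts into a non-full node, keeping items sorted; returns the index used.
    int insertItem(int data) {
        int i = numItems - 1;
        for (; i >= 0 && items[i] > data; i--) items[i + 1] = items[i];
        items[i + 1] = data;
        numItems++;
        return i + 1;
    }
}

class SketchTree {
    SketchNode root = new SketchNode();

    void insert(int value) {
        SketchNode current = root;
        while (true) {
            if (current.isFull()) {
                split(current);                                  // split on the way down
                current = getNextChild(current.parent, value);   // resume at the parent
            } else if (current.isLeaf()) {
                break;                                           // non-full leaf reached
            } else {
                current = getNextChild(current, value);
            }
        }
        current.insertItem(value);
    }
    // Child to descend into: skip every item that is not greater than value.
    SketchNode getNextChild(SketchNode n, int value) {
        int i = 0;
        while (i < n.numItems && value >= n.items[i]) i++;
        return n.children[i];
    }
    void split(SketchNode n) {
        int third = n.items[2];
        int second = n.items[1];
        n.numItems = 1;                          // only the 1st item stays
        SketchNode sibling = new SketchNode();
        sibling.insertItem(third);
        sibling.setChild(0, n.children[2]);
        sibling.setChild(1, n.children[3]);
        n.children[2] = n.children[3] = null;
        SketchNode parent = n.parent;
        if (n == root) {                         // the tree grows upward here
            parent = new SketchNode();
            root = parent;
            parent.setChild(0, n);
        }
        int loc = parent.insertItem(second);
        for (int c = parent.numItems - 1; c > loc; c--) parent.setChild(c + 1, parent.children[c]);
        parent.setChild(loc + 1, sibling);
    }
    // Appends the items of the subtree at n in ascending order and returns out.
    StringBuilder inOrder(SketchNode n, StringBuilder out) {
        if (n == null) return out;
        inOrder(n.children[0], out);
        for (int c = 0; c < n.numItems; c++) {
            out.append(n.items[c]).append(' ');
            inOrder(n.children[c + 1], out);
        }
        return out;
    }
    // Common depth of all leaves below n, or -1 if the leaves are at different depths.
    int leafDepth(SketchNode n) {
        if (n.isLeaf()) return 0;
        int d = leafDepth(n.children[0]);
        for (int c = 1; c <= n.numItems; c++)
            if (d < 0 || leafDepth(n.children[c]) != d) return -1;
        return d + 1;
    }
    public static void main(String[] args) {
        SketchTree tree = new SketchTree();
        for (int v = 10; v <= 90; v += 10) tree.insert(v);
        System.out.println(tree.inOrder(tree.root, new StringBuilder()) + "| leaf depth " + tree.leafDepth(tree.root));
    }
}
```

Inserting values in ascending order is the case that turns a plain binary search tree into a linked list; here leafDepth stays non-negative, meaning every leaf is at the same depth.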

Efficiency
The insert algorithm and its splits guarantee balance.
The balance leads to O(log n) performance.
Each node has room for 3 data items, which implies extra data usage and an impact on performance.
Question 1: is the performance impact of traversing each node’s data array significant?
Question 2: is the array allocation of 3 elements per node a significant amount of data storage?

Performance
In the worst case, the entire data array of each node visited is traversed before the element is found or the next level to descend to is determined (this is what happens when searching for the tree’s maximum value).
Because of the way the insert and split algorithms work, it is rare to find a full node that hasn’t been split on every level.
Even if every node visited were full, the number of data item comparisons is at most 3 per level, so it is still O(log n) in the total number of data elements.
This makes the search performance ultimately comparable to a balanced binary search tree.
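The O(log n) claim can be made concrete with a short bound. The symbols h (number of levels) and n (number of data items) are introduced here and are not from the slides. Assuming every leaf is on the last level, the fewest items occur when every node holds 1 item and the most when every node holds 3:

```latex
2^{h} - 1 \;\le\; n \;\le\; 4^{h} - 1
\quad\Longrightarrow\quad
\log_{4}(n+1) \;\le\; h \;\le\; \log_{2}(n+1)
```

A search visits one node per level and compares against at most 3 data items in each, so it makes at most 3 log2(n+1) comparisons.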

Data storage
With most nodes in the tree not usually full, there is an amount of unused data space.
The math works out to about 2/7 of the space being unused, based on the number of elements inserted into the tree.
Compared to self-balancing trees like red-black trees and AVL trees, the unused space is comparable to their balancing overhead (you get a little better performance than with the balancing trees, at a relative price in data storage).
Why not use a linked list instead of an array? That adds overhead as well, but it can be used if it is necessary to relieve the unused space.

Tree traversal
Displaying data in order with a binary search tree used simple recursion: display the subtree on the left, print the current element and then display the subtree on the right.
The same concept applies to a 2-3-4 tree, except that you must now account for the multiple data items and child pointers:
– If the current node is not null, print the child[0] subtree, print data item[0] and print the child[1] subtree
– If the current node has at least 2 data items, also print data item[1] and then print the child[2] subtree
– If the current node is full, also print data item[2] and then print the child[3] subtree

private void displayInOrder(Node234 current) {
    if (current != null) {
        displayInOrder(current.getChild(0));
        int n = current.getNumItems();
        for (int c = 0; c < n; c++) {
            System.out.println(current.getItem(c));
            displayInOrder(current.getChild(c + 1));
        }
    }
}

Delete
As you can imagine, the delete function is quite challenging:
– Removing an item at the leaf level is not hard
– Removing an item at a non-leaf level requires rearranging nodes and child pointers
The “cop out” discussed with binary trees is even more useful here:
– Make each data item a class with an additional “isDeleted” property
– Set isDeleted to true on a data item when it is removed
– Rebuild the tree as needed by walking through it and inserting the elements that are not flagged for deletion into a new tree
– The new tree will still be a balanced tree
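The flag-and-rebuild idea can be sketched without the tree itself. In this illustration a plain list stands in for the items met during an in-order walk of the old tree, and the returned list stands in for the values that would be inserted into the new tree. Item, LazyDeleteSketch, delete, rebuild and example are hypothetical names, not part of the slides.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical data item carrying the isDeleted flag described above.
class Item {
    final int value;
    boolean isDeleted = false;

    Item(int value) {
        this.value = value;
    }
}

class LazyDeleteSketch {
    // Flags the first live item equal to value; returns false if there is none.
    static boolean delete(List<Item> items, int value) {
        for (Item item : items) {
            if (item.value == value && !item.isDeleted) {
                item.isDeleted = true;
                return true;
            }
        }
        return false;
    }

    // Collects the values a rebuild would re-insert: everything not flagged.
    static List<Integer> rebuild(List<Item> items) {
        List<Integer> live = new ArrayList<>();
        for (Item item : items) {
            if (!item.isDeleted) {
                live.add(item.value);
            }
        }
        return live;
    }

    static List<Integer> example() {
        List<Item> items = new ArrayList<>();
        for (int v : new int[] {3, 6, 10, 14}) {
            items.add(new Item(v));
        }
        delete(items, 6);
        return rebuild(items);
    }

    public static void main(String[] args) {
        System.out.println(example());
    }
}
```

Until a rebuild happens, search and traversal code also has to skip items whose isDeleted flag is set.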

Applications
Guaranteed balance is a big advantage that you get with a 2-3-4 tree over a binary search tree.
Minimized node count and balance also reduce the number of node visits.
Reduced node visits are useful in applications where each node represents a significant unit of data:
– With disk blocks as nodes, fewer visits mean less time spent locating blocks of data on disk, which is slow
– Disk storage is a popular use of this data structure

Other multiway trees
2-3 trees are similar to 2-3-4 trees:
– Up to 2 data items and 3 child pointers per node
– The same non-leaf node rules apply
Trees with larger numbers of data items per node follow the same rule for the number of data items and children (links = data items + 1), so the insert and split algorithms are the same.
2-3 trees split only when the leaf is full and recursively split full parents up the tree (this keeps the number of splits necessary per insert to a minimum).

Summary
Whether with a self-balancing binary search tree or a 2-3-4 type tree, balance is the theme for keeping performance at O(log n).
Self-balancing trees reduce memory usage and make it more dynamic, while the algorithms for 2-3-4 trees are not as complex.
Search is optimized by storing the data in some determined order.
Search can reach O(1) performance if that order is not as significant and the data elements can be mapped in a different way…