
Nirmalya Roy School of Electrical Engineering and Computer Science Washington State University Cpt S 223 – Advanced Data Structures Trees.


1 Nirmalya Roy School of Electrical Engineering and Computer Science Washington State University Cpt S 223 – Advanced Data Structures Trees

2 Motivation Accessing elements in a linear linked list can be prohibitively slow, especially for large inputs. Trees are simple data structures for which the running time of most operations is O(log N) on average. For example, if N = 1 million: searching for an element in a linear linked list requires O(N) comparisons in the worst case (i.e. up to 1 million comparisons); searching for an element in a binary search tree (a kind of tree) requires O(log_2 N) comparisons on average (about 20 comparisons)
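The 1 million versus 20 figures can be checked with a small sketch. The function names are illustrative, and the tree count assumes a perfectly balanced BST, which is a best-case shape rather than something the slide guarantees.

```cpp
#include <cassert>
#include <cstdint>

// Worst-case comparisons for a sequential scan of an n-element linked list.
std::uint64_t listComparisons(std::uint64_t n) { return n; }

// Worst-case comparisons in a perfectly balanced BST with n nodes:
// one comparison per level, and there are floor(log2(n)) + 1 levels.
std::uint64_t balancedBstComparisons(std::uint64_t n) {
    assert(n > 0);
    std::uint64_t levels = 0;
    while (n > 0) {   // count how many times n can be halved
        ++levels;
        n /= 2;
    }
    return levels;
}
```

For N = 1,000,000 the list needs up to 1,000,000 comparisons, while the balanced tree needs 20.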

3 Motivation (cont’d) Trees are one of the most important and extensively used data structures in computer science File systems of several popular operating systems are implemented as trees  For example My Documents/ My Pictures/ 0001.jpg 0002.jpg My Videos/ Nice.mpg My Music/ School Files/ CptS223/ FunStuffs /

4 Overview Tree data structure. Binary search trees (BSTs): support O(log_2 N) operations; balanced trees (AVL trees, splay trees). B-trees for accessing secondary storage. STL set and map classes. Applications of trees

5 Trees An example tree A generic tree G is parent of N and child of A M is a child of F and grandchild of A

6 Definitions A tree T is a set of nodes. Each non-empty tree has a root node and zero or more subtrees T_1, T_2, …, T_k. Each subtree is a tree. The root of a tree is connected to the root of each subtree by a directed edge. If node n_1 connects to the subtree rooted at n_2, then n_1 is the parent of n_2 and n_2 is a child of n_1. Each node in a tree has only one parent, except the root node, which has no parent

7 Definitions (cont’d) Nodes with no children are leaves. Nodes with children are non-leaves. Nodes with the same parent are siblings. A path from node n_1 to n_k is a sequence of nodes n_1, n_2, n_3, …, n_k such that n_i is the parent of n_(i+1) for 1 ≤ i < k. The length of a path is the number of edges on the path (i.e. k − 1). Each node has a path of length 0 to itself. There is exactly one path from the root to each node in the tree

8 Definitions (cont’d) A path from node n_1 to n_k is a sequence of nodes n_1, n_2, n_3, …, n_k such that n_i is the parent of n_(i+1) for 1 ≤ i < k. Nodes n_i, n_(i+1), …, n_k are descendants of n_i and ancestors of n_k. Nodes n_(i+1), …, n_k are proper descendants of n_i. Nodes n_i, n_(i+1), …, n_(k−1) are proper ancestors of n_k

9 Definitions (cont’d)  B, C, D, E, F, G are siblings  B, C, H, I, P, Q, K, L, M, N are leaves (or leaf nodes)  K, L, M are siblings  The path from A to Q is A—E—J—Q  A, E, J are proper ancestors of Q  E, J, Q (as well as I and P) are proper descendants of A

10 Definitions (cont’d) The depth of a node n_i is the length of the unique path from the root to n_i. The root node has a depth of 0. The depth of the tree is the depth of its deepest leaf. The height of a node n_i is the length of the longest path from n_i to a leaf. All leaves have a height of 0. The height of a tree is the height of its root node. The height of a tree equals its depth

11 Definitions (cont’d)  Height of each node?  Height of tree?  Depth of each node?  Depth of tree?

12 Implementation of Trees Solution 1: Vector of children struct TreeNode { Object element; vector<TreeNode*> children; }; Solution 2: List of children struct TreeNode { Object element; list<TreeNode*> children; };

13 Implementation of Trees (cont’d) Solution 3: First child, next-sibling struct TreeNode { Object element; TreeNode *firstChild; TreeNode *nextSibling; }
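A minimal sketch of how the first-child/next-sibling layout is walked, using the height definition from the earlier slides (a leaf has height 0). The element type `std::string` and the helper names `height` and `countNodes` are assumptions made for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <string>

struct TreeNode {
    std::string element;
    TreeNode* firstChild = nullptr;   // leftmost child, or nullptr for a leaf
    TreeNode* nextSibling = nullptr;  // next child of the same parent
};

// Height = length of the longest path from t down to a leaf.
// An empty tree gets -1 so that a leaf comes out as 0.
int height(const TreeNode* t) {
    if (t == nullptr) return -1;
    int tallestChild = -1;
    for (const TreeNode* c = t->firstChild; c != nullptr; c = c->nextSibling)
        tallestChild = std::max(tallestChild, height(c));
    return tallestChild + 1;
}

// Number of nodes in the tree rooted at t.
int countNodes(const TreeNode* t) {
    if (t == nullptr) return 0;
    int total = 1;
    for (const TreeNode* c = t->firstChild; c != nullptr; c = c->nextSibling)
        total += countNodes(c);
    return total;
}
```

Each node needs only two pointers regardless of how many children it has, at the cost of a linear walk along the sibling chain to reach a particular child.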

14 Binary Tree A binary tree is a tree where each node has no more than two children If a node is missing one or both children, then that child pointer is NULL struct BinaryTreeNode { Object element; BinaryTreeNode *leftChild; BinaryTreeNode *rightChild; }

15 Example: Expression Trees Store expressions in a binary tree  Leaves of trees are operands (e.g. constants, variables)  Other internal nodes are unary or binary operators Used by compilers to parse and evaluate expressions  Arithmetic, logic, relational, etc. E.g. (a + b * c) + ((d * e + f) * g)

16 Example: Expression Trees (cont’d) E.g. (a + b * c) + ((d * e + f) * g)

17 Example: Expression Trees (cont’d) Evaluate expression  Recursively evaluate left and right subtrees  Apply operator at root node to results from subtrees  Use post-order traversal: left, right, root Tree traversals  Pre-order traversal: root, left, right The work at a node is performed before (pre) its children are processed  In-order traversal: left, root, right  Post-order traversal: left, right, root The work at a node is performed after (post) its children are evaluated

18 Traversals Pre-order traversal: print out the operator first and then recursively print out the left and right subtrees: + + a * b c * + * d e f g. In-order traversal: recursively produce a parenthesized left expression, then print out the operator at the root, and finally recursively produce a parenthesized right expression: (a + (b * c)) + (((d * e) + f) * g). Post-order traversal: recursively print out the left subtree, the right subtree, and then the operator: a b c * + d e * f + g * +

19 Example: Expression Trees (cont’d) Constructing an expression tree from postfix notation: use a stack of pointers to trees and read the postfix expression from left to right. If the symbol is an operand, create a one-node tree and push a pointer to it onto the stack. If it is an operator: create a BinaryTreeNode with the operator as the element; pop the top two items off the stack; make them the right and left children of the new node (the first item popped is the right child); push a pointer to the new node onto the stack
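The stack-based construction can be sketched as follows. This is an illustration under stated assumptions, not the course's code: tokens are single characters, operands are single digits (so the tree can be evaluated), only the four binary operators are handled, and the names `ExprNode`, `buildFromPostfix`, `evaluate`, and `toInfix` are hypothetical.

```cpp
#include <cassert>
#include <cctype>
#include <memory>
#include <stack>
#include <string>

struct ExprNode {
    char symbol;                       // operand digit or operator character
    std::unique_ptr<ExprNode> left;
    std::unique_ptr<ExprNode> right;
    explicit ExprNode(char s) : symbol(s) {}
};

// Builds an expression tree from a postfix string of single-character tokens.
std::unique_ptr<ExprNode> buildFromPostfix(const std::string& postfix) {
    std::stack<std::unique_ptr<ExprNode>> trees;
    for (char token : postfix) {
        if (std::isspace(static_cast<unsigned char>(token))) continue;
        auto node = std::make_unique<ExprNode>(token);
        if (token == '+' || token == '-' || token == '*' || token == '/') {
            assert(trees.size() >= 2);
            node->right = std::move(trees.top());  // first pop is the right operand
            trees.pop();
            node->left = std::move(trees.top());
            trees.pop();
        }
        trees.push(std::move(node));
    }
    assert(trees.size() == 1);  // a well-formed expression leaves one tree
    auto root = std::move(trees.top());
    trees.pop();
    return root;
}

// Post-order evaluation: evaluate both subtrees, then apply the operator.
int evaluate(const ExprNode* t) {
    if (!t->left && !t->right) return t->symbol - '0';
    int l = evaluate(t->left.get());
    int r = evaluate(t->right.get());
    switch (t->symbol) {
        case '+': return l + r;
        case '-': return l - r;
        case '*': return l * r;
        default:  return l / r;
    }
}

// In-order traversal producing a fully parenthesized infix string.
std::string toInfix(const ExprNode* t) {
    if (!t->left && !t->right) return std::string(1, t->symbol);
    return "(" + toInfix(t->left.get()) + t->symbol + toInfix(t->right.get()) + ")";
}
```

With the input `"12+345+**"` (the slide's `a b + c d e + * *` with digits substituted), `toInfix` gives `((1+2)*(3*(4+5)))` and `evaluate` gives 81.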

20 E.g., a b + c d e + * * Example: Expression Trees a b (1) a b (2) + a b (3) + e d c top a b (4) + e d c top +

21 E.g., a b + c d e + * * Example: Expression Trees a b (5) + e d c top + * a b (6) + e d c top + * *

22 Binary Search Trees Application of binary trees: searching. Each node in the tree stores an item. Assume items are integers (arbitrarily complex items are also possible). Items are distinct, with no duplicates. Property that makes a binary tree a binary search tree: for every node X in the tree, the values of all the items in its left subtree are smaller than the item in X, and the values of all the items in its right subtree are larger than the item in X. All the elements in the tree are ordered in some consistent manner

23 Binary Search Trees A binary search tree (BST) is a tree where, for any node n: items in left subtree of n < item in node n < items in right subtree of n. Average depth of a BST is O(log N). Which of the two trees is a BST?

24 Operations on a BST search(T, x) or contains(T, x): O(?)  Returns true if x is in BST T; returns false, otherwise findMin(T) : O(?)  Returns the minimum element or smallest element in the left subtree of BST T findMax(T) : O(?)  Returns the maximum element or largest element in the right subtree of BST T printTree(T) : O(?)  Pre-order, in-order, or post-order traversals

25 Operations on a BST (cont’d) insert(T, x): O(?)  Inserts x into the BST T remove(T, x): O(?)  Removes x from the BST T makeEmpty(T): O(?)  Deletes all the nodes of the BST T

26 Operations on a BST (cont’d) search(T, x) or contains(T, x): T is a pointer to the root node of the BST; x is the element we want to search for. contains(T, x) { if (T == NULL) return false; else if (T->element == x) return true; else if (x < T->element) return contains(T->leftChild, x); else return contains(T->rightChild, x); }

27 Operations on a BST (cont’d) search(T, x) or contains(T, x): returns a pointer to x in the BST if x is in T; returns NULL otherwise. contains(T, x) { if (T == NULL) return NULL; else if (T->element == x) return T; else if (x < T->element) return contains(T->leftChild, x); else return contains(T->rightChild, x); }

28 Operations on a BST (cont’d) search(T, x) or contains(T, x)  Typically, we assume no duplicate elements in the BST  If duplicates, then store counts in nodes, or each node has a list of objects

29 Operations on a BST (cont’d) search(T, x) or contains(T, x)  Complexity of searching a BST with N nodes is O(?)  Complexity of searching a BST of height h is O(h)  h = f(N)? 1 2 3 4 6 8 4 1 2 36 8

30 Operations on a BST (cont’d) findMin(T)  Returns the minimum element or smallest element in the left subtree of BST T  Start at the root and go left as long as there is a left child The stopping point is the smallest element  Complexity: O(?) findMin(T) { if (T == NULL) return NULL; else if (T->leftChild == NULL) return T; else return findMin(T->leftChild); }

31 Operations on a BST (cont’d) findMax(T)  Returns the maximum element or largest element in the right subtree of BST T  Complexity: O(?) findMax(T) { if (T == NULL) return NULL; else if (T->rightChild == NULL) return T; else return findMax(T->rightChild); }

32 Operations on a BST (cont’d) preorder(T): prints the elements of BST T in pre-order (root, left, right). Complexity: O(?) preorder(T) { if (T != NULL) { cout << T->element; preorder(T->leftChild); preorder(T->rightChild); } } Output for the example tree: 6 2 1 4 3 8

33 Operations on a BST (cont’d) inorder(T): prints the elements of BST T in in-order (left, root, right). Complexity: O(?) inorder(T) { if (T != NULL) { inorder(T->leftChild); cout << T->element; inorder(T->rightChild); } } Output for the example tree: 1 2 3 4 6 8

34 Operations on a BST (cont’d) postorder(T): prints the elements of BST T in post-order (left, right, root). Complexity: O(?) postorder(T) { if (T != NULL) { postorder(T->leftChild); postorder(T->rightChild); cout << T->element; } } Output for the example tree: 1 3 4 2 8 6
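The three traversals differ only in where the node itself is visited. A minimal sketch, assuming `int` elements and collecting into a `std::vector` instead of printing so the order is easy to inspect:

```cpp
#include <cassert>
#include <vector>

struct BinaryTreeNode {
    int element;
    BinaryTreeNode* leftChild;
    BinaryTreeNode* rightChild;
};

// Root, then left subtree, then right subtree.
void preorder(const BinaryTreeNode* t, std::vector<int>& out) {
    if (t == nullptr) return;
    out.push_back(t->element);
    preorder(t->leftChild, out);
    preorder(t->rightChild, out);
}

// Left subtree, then root, then right subtree (sorted order for a BST).
void inorder(const BinaryTreeNode* t, std::vector<int>& out) {
    if (t == nullptr) return;
    inorder(t->leftChild, out);
    out.push_back(t->element);
    inorder(t->rightChild, out);
}

// Left subtree, then right subtree, then root.
void postorder(const BinaryTreeNode* t, std::vector<int>& out) {
    if (t == nullptr) return;
    postorder(t->leftChild, out);
    postorder(t->rightChild, out);
    out.push_back(t->element);
}
```

Each function visits every node exactly once, so all three run in O(N) time on a tree of N nodes.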

35 Operations on a BST (cont’d) insert(T, x)  Inserts x into the BST T  E.g. insert 5

36 Operations on a BST (cont’d) insert(T, x): “search” for x until we reach the end of the tree; insert x there. insert(T, x) { if (T == NULL) T = new Node(x); else if (x < T->element) { if (T->leftChild == NULL) T->leftChild = new Node(x); else insert(T->leftChild, x); } else { if (T->rightChild == NULL) T->rightChild = new Node(x); else insert(T->rightChild, x); } } Complexity: O(?)
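A compact sketch of insertion and search, assuming `int` elements and a root pointer passed by reference so that an empty subtree can be replaced in place. Unlike the slide's pseudocode, this version ignores duplicates, in line with the earlier no-duplicates assumption; `makeEmpty` is included only to release the nodes.

```cpp
#include <cassert>

struct Node {
    int element;
    Node* leftChild = nullptr;
    Node* rightChild = nullptr;
    explicit Node(int x) : element(x) {}
};

// The pointer is passed by reference so the NULL link that ends the
// search can be overwritten with the new node.
void insert(Node*& t, int x) {
    if (t == nullptr)
        t = new Node(x);
    else if (x < t->element)
        insert(t->leftChild, x);
    else if (x > t->element)
        insert(t->rightChild, x);
    // x == t->element: already present, nothing to do
}

// Iterative search: follow one branch per level, O(height) comparisons.
bool contains(const Node* t, int x) {
    while (t != nullptr) {
        if (x < t->element) t = t->leftChild;
        else if (x > t->element) t = t->rightChild;
        else return true;
    }
    return false;
}

// Post-order deletion: children are freed before their parent.
void makeEmpty(Node*& t) {
    if (t == nullptr) return;
    makeEmpty(t->leftChild);
    makeEmpty(t->rightChild);
    delete t;
    t = nullptr;
}
```

Inserting 6, 2, 8, 1, 4, 3 and then 5 places 5 as the right child of 4, which is where an unsuccessful search for 5 would have ended.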

37 Operations on a BST (cont’d) remove(T, x): removes x from the BST T. Case 1: node to remove has 0 or 1 child. If the node is a leaf, just delete it. If the node has one child, the node can be deleted after its parent adjusts a link to bypass the node. E.g. remove 4

38 Operations on a BST (cont’d) remove(T, x): removes x from the BST T. Case 2: node to remove has 2 children. Replace the data of this node with the smallest data of the right subtree and recursively delete that node (which is now empty). The smallest node in the right subtree cannot have a left child, so the second remove is an easy one. First replace the node element with its successor, then remove the successor (Case 1). E.g. remove 2

39 Operations on a BST (cont’d) remove(T, x) { if (T == NULL) return; else if (x < T->element) remove(T->leftChild, x); else if (x > T->element) remove(T->rightChild, x); else if (T->leftChild != NULL && T->rightChild != NULL) { /* Case 2: two children */ successor = findMin(T->rightChild); T->element = successor->element; remove(T->rightChild, T->element); } else { /* Case 1: zero or one child */ old = T; T = (T->leftChild != NULL) ? T->leftChild : T->rightChild; delete old; } } Complexity: O(?)
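The two removal cases can be sketched in compilable form as follows, assuming `int` elements and a root pointer passed by reference. A leaf is handled by the same branch as a one-child node, since replacing the link with the (null) child deletes it cleanly. The helpers `insert` and `inorder` are only there to build and inspect a tree.

```cpp
#include <cassert>
#include <vector>

struct Node {
    int element;
    Node* leftChild = nullptr;
    Node* rightChild = nullptr;
    explicit Node(int x) : element(x) {}
};

void insert(Node*& t, int x) {
    if (t == nullptr) t = new Node(x);
    else if (x < t->element) insert(t->leftChild, x);
    else if (x > t->element) insert(t->rightChild, x);
}

// Smallest element: keep going left until there is no left child.
Node* findMin(Node* t) {
    while (t != nullptr && t->leftChild != nullptr) t = t->leftChild;
    return t;
}

void remove(Node*& t, int x) {
    if (t == nullptr) return;                       // x not found
    if (x < t->element) remove(t->leftChild, x);
    else if (x > t->element) remove(t->rightChild, x);
    else if (t->leftChild != nullptr && t->rightChild != nullptr) {
        // Case 2: copy the in-order successor, then remove it below.
        t->element = findMin(t->rightChild)->element;
        remove(t->rightChild, t->element);
    } else {
        // Case 1: zero or one child, so the parent's link bypasses the node.
        Node* old = t;
        t = (t->leftChild != nullptr) ? t->leftChild : t->rightChild;
        delete old;
    }
}

void inorder(const Node* t, std::vector<int>& out) {
    if (t == nullptr) return;
    inorder(t->leftChild, out);
    out.push_back(t->element);
    inorder(t->rightChild, out);
}
```

On the tree built from 6, 2, 8, 1, 4, 3, removing 4 exercises Case 1 (3 moves up), and removing 2 afterwards exercises Case 2 (2 is overwritten by its successor 3).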

40 Operations on a BST (cont’d) makeEmpty(T)  Deletes all the nodes in the BST T  Complexity: O(?) makeEmpty(T) { if (T != NULL) { makeEmpty(T->leftChild); makeEmpty(T->rightChild); delete T; } T = NULL; } What kind of traversal?

41 Implementation of BST Why “Comparable”?

42 Pointer to tree node passed by reference so it can be reassigned within function Implementation of BST (cont’d)

43 Public member functions calling private recursive member functions Implementation of BST (cont’d)

44 Implementation of BST: contains

45 Implementation of BST: findMin and findMax

46 Implementation of BST: Insert

47 Case 2: Copy successor data Delete successor Case 1: Just delete it

48 Post-order traversal Implementation of BST: Destructor

49 Pre-order or post-order traversal? Copy Assignment Operator

50 Implementation of BST See Figures 4.16 to 4.28 on pages 126 to 134

51 Analysis of BST printTree(T), preorder(T), inorder(T), postorder(T), makeEmpty(T)  Always O(N) contains(T, x), findMin(T), findMax(T), insert(T, x), remove(T, x)  O(d), where d = depth of tree  Worst case: d = ?  Best case: d = ? (not when N = 0)  Average case: d = ?

52 BST Average-Case Analysis Internal path length: sum of the depths of all nodes in the tree. Compute the average internal path length over all possible insertion sequences, assuming all insertion sequences are equally likely. E.g., “1 2 3 4 5 6 7”, “7 6 5 4 3 2 1”, …, “4 2 6 1 3 5 7”. Result: O(N log_2 N). Thus, average depth = O(N log_2 N) / N = O(log_2 N)

53 Randomly Generated 500-node BST (Insert Only) Average node depth = 9.98; log_2 500 = 8.97

54 Previous BST After 500^2 Random Insert/Remove Pairs Average node depth = 12.51; log_2 500 = 8.97

55 BST Average-Case Analysis After randomly inserting N nodes into an empty BST: average depth = O(log_2 N). After Θ(N^2) random insert/remove pairs into an N-node BST: average depth = Θ(N^(1/2)). Why? Solutions? Overcome problematic average cases? Overcome worst case?

56 Problem with BST In the worst case, accessing elements in a BST is O(N). Solution: use balanced BSTs such that the depth of the tree is O(log N). Strict condition: every node must have left and right subtrees of the same height, which gives perfectly balanced trees of 2^k − 1 nodes. This balance condition is too rigid to be useful and should be relaxed: AVL trees, splay trees

57 Balanced BSTs AVL trees: the heights of the left and right subtrees at every node in the BST differ by at most 1; maintained via rotation operations; depth always O(log_2 N). Splay trees: after a node is accessed, push it to the root via AVL rotations; amortized cost per operation is O(log_2 N)

58 AVL Trees AVL (Adelson-Velskii and Landis, 1962). For every node in the BST, the heights of its left and right subtrees differ by at most 1. Height is always O(log_2 N); more precisely, at most about 1.44 log_2(N + 2) − 1.328. Minimum number of nodes S(h) in an AVL tree of height h: S(h) = S(h − 1) + S(h − 2) + 1, similar to the Fibonacci recurrence
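The recurrence can be evaluated directly. This sketch assumes the usual base cases S(0) = 1 (a single node) and S(1) = 2 (a root with one child), which the slide does not state; the function name is illustrative.

```cpp
#include <cassert>
#include <cstdint>

// Minimum number of nodes in an AVL tree of height h:
// S(h) = S(h - 1) + S(h - 2) + 1, with S(0) = 1 and S(1) = 2.
std::uint64_t minAvlNodes(int h) {
    assert(h >= 0);
    std::uint64_t prev = 1;  // S(0)
    std::uint64_t curr = 2;  // S(1)
    if (h == 0) return prev;
    for (int i = 2; i <= h; ++i) {
        std::uint64_t next = curr + prev + 1;
        prev = curr;
        curr = next;
    }
    return curr;
}
```

Because S(h) grows exponentially in h (4, 7, 12, 20, … for h = 2, 3, 4, 5), a tree with N nodes cannot be taller than O(log N).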

59 AVL Trees (cont’d) Which of the following is an AVL tree? AVL tree?

60 Maintaining Balance Condition If we can maintain the balance condition, then all AVL operations are O(log_2 N). Maintain height h(t) at each node t: h(t) = max(h(t->left), h(t->right)) + 1; h(empty tree) = −1. Which operations can upset the balance condition?
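The height rule and the balance condition can be written down as a checker. This sketch recomputes heights from scratch, which is exactly the cost that storing h(t) in each node avoids; the struct and function names are assumptions.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdlib>

struct AvlNode {
    int element;
    AvlNode* left;
    AvlNode* right;
};

// h(t) = max(h(left), h(right)) + 1, with h(empty tree) = -1.
int height(const AvlNode* t) {
    if (t == nullptr) return -1;
    return std::max(height(t->left), height(t->right)) + 1;
}

// True if every node's subtrees differ in height by at most 1.
bool isBalanced(const AvlNode* t) {
    if (t == nullptr) return true;
    return std::abs(height(t->left) - height(t->right)) <= 1
        && isBalanced(t->left)
        && isBalanced(t->right);
}
```

A right-leaning chain 1 → 2 → 3 fails the check at its root (heights −1 and 1), while the same keys arranged as 2 with children 1 and 3 pass.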

61 AVL Remove Assume remove is accomplished using lazy deletion  Removed nodes are only marked as deleted, but not actually removed from the AVL tree  Unmarked when same object is re-inserted Re-allocation time is avoided  Does not affect O(log2 N) height as long as deleted nodes are not in the majority  Does require additional memory per node Can also accomplish remove without lazy deletion

62 AVL Insert Insert can violate the AVL balance condition Can be fixed by a rotation Inserting 6 violates AVL balance condition Rotating 7-8 restores balance

63 AVL Insert (cont’d) Only nodes along the path to insertion have their balance altered Follow path back to root, looking for violations Fix violations using single or double rotations root inserted node Check for violations x Fix at the violated node

64 AVL Insert (cont’d) Assume node k needs to be rebalanced Four cases leading to violation 1. An insertion into the left subtree of the left child of k 2. An insertion into the right subtree of the left child of k 3. An insertion into the left subtree of the right child of k 4. An insertion into the right subtree of the right child of k  Cases 1 and 4 are handled by single rotation  Cases 2 and 3 are handled by double rotation

65 Identifying Cases for AVL Insert k Let this be the node with the violation (i.e, imbalance) (nearest to the last insertion site) Insert CASE 1 Insert CASE 2 Insert CASE 3 Insert CASE 4 right child left child left subtree right subtree

66 Case 1: AVL insert k Let this be the node with the violation (i.e, imbalance) (nearest to the last insertion site) Insert CASE 1 left child left subtree

67 AVL Insert (cont’d) Case 1: Single rotation to the right X, Y, Z could be empty trees or single node trees or multiple node trees Violation AVL balance condition okay BST ordering okay inserted Invariant: X < k 1 < Y < k 2 < Z

68 AVL Insert (cont’d) Case 1: Example ImbalanceBalanced inserted

69 Rules: Fixing violations after AVL tree insertions Locate the deepest node with the height imbalance Locate which part of its subtree caused the imbalance  This will be same as locating the subtree site of insertion Identify the Case (1 or 2 or 3 or 4) Do the corresponding rotation

70 Case 4: AVL insert k Let this be the node with the violation (i.e, imbalance) (nearest to the last insertion site) Insert CASE 4 right child right subtree

71 AVL Insert (cont’d) Case 4: Single rotation to the left (mirror case of Case 1) Violation AVL balance condition okay BST ordering okay inserted Invariant: X < k 1 < Y < k 2 < Z Balanced Imbalance

72 AVL Insert (cont’d) Case 4 example inserted Imbalance 4 2 5 6 7 4 2 6 7 5 Fix this node Automatically fixed will this be true always? balanced

73 AVL Insert (cont’d) Case 4 example Start with an empty AVL tree and insert the items 3, 2, 1 and then 4 through 7 in sequential order

74 Case 2: AVL insert k Let this be the node with the violation (i.e, imbalance) (nearest to the last insertion site) Insert CASE 2 left child right subtree

75 AVL Insert (cont’d) Case 2: Single rotation fails  X, Z can be empty trees or single node trees or multiple node trees  But Y should have at least one or more nodes in it because of insertion Violation Imbalance remains Imbalance Think of Y as = Single rotation does not fix the imbalance

76 AVL Insert (cont’d) Case 2: Left-right double rotation  Bring k2 at k3’s place Violation AVL balance condition okay BST ordering okay Balanced Invariant: A < k1 < B < k2 < C < k3 < D inserted #1 #2

77 AVL Insert (double rotation) Case 2 example inserted Imbalance Approach: Bring 3 to 5’s place Bring k2 at k3’s place 3 2 5 4 1 6 Balanced #2 5 3 6 4 1 2 5 2 6 3 4 1 #1 k2 k3 k1 k2 k3k1

78 Case 3: AVL insert k Let this be the node with the violation (i.e, imbalance) (nearest to the last insertion site) Insert CASE 3 right child left subtree

79 AVL Insert (cont’d) Case 3: Right-left double rotation  Mirror case of Case 2 Violation AVL balance condition okay BST ordering okay inserted Balanced Invariant: A < k1 < B < k2 < C < k3 < D #1 #2
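The four insertion cases can be sketched as one recursive insert that rebalances on the way back up. This is an illustration with `int` keys and a stored height per node; the rotation names follow the `rotateWithRightChild` naming mentioned later in the slides, while `heightOf`, `updateHeight`, and the double-rotation names are assumptions.

```cpp
#include <algorithm>
#include <cassert>

struct AvlNode {
    int element;
    AvlNode* left = nullptr;
    AvlNode* right = nullptr;
    int height = 0;
    explicit AvlNode(int x) : element(x) {}
};

int heightOf(const AvlNode* t) { return t ? t->height : -1; }

void updateHeight(AvlNode* t) {
    t->height = std::max(heightOf(t->left), heightOf(t->right)) + 1;
}

// Case 1: single rotation to the right; the left child k1 becomes the root.
void rotateWithLeftChild(AvlNode*& k2) {
    AvlNode* k1 = k2->left;
    k2->left = k1->right;
    k1->right = k2;
    updateHeight(k2);
    updateHeight(k1);
    k2 = k1;
}

// Case 4: mirror image of Case 1.
void rotateWithRightChild(AvlNode*& k1) {
    AvlNode* k2 = k1->right;
    k1->right = k2->left;
    k2->left = k1;
    updateHeight(k1);
    updateHeight(k2);
    k1 = k2;
}

// Case 2: left-right double rotation.
void doubleWithLeftChild(AvlNode*& k3) {
    rotateWithRightChild(k3->left);
    rotateWithLeftChild(k3);
}

// Case 3: right-left double rotation.
void doubleWithRightChild(AvlNode*& k1) {
    rotateWithLeftChild(k1->right);
    rotateWithRightChild(k1);
}

// Insert first, then fix the deepest violated node while unwinding.
void insert(AvlNode*& t, int x) {
    if (t == nullptr) { t = new AvlNode(x); return; }
    if (x < t->element) {
        insert(t->left, x);
        if (heightOf(t->left) - heightOf(t->right) == 2) {
            if (x < t->left->element) rotateWithLeftChild(t);   // Case 1
            else doubleWithLeftChild(t);                        // Case 2
        }
    } else if (x > t->element) {
        insert(t->right, x);
        if (heightOf(t->right) - heightOf(t->left) == 2) {
            if (x > t->right->element) rotateWithRightChild(t); // Case 4
            else doubleWithRightChild(t);                       // Case 3
        }
    }
    updateHeight(t);
}

void makeEmpty(AvlNode*& t) {
    if (t == nullptr) return;
    makeEmpty(t->left);
    makeEmpty(t->right);
    delete t;
    t = nullptr;
}
```

Inserting 3, 2, 1 and then 4 through 7 yields a perfectly balanced tree rooted at 4; continuing with 16 down to 10, then 8 and 9, leaves 7 at the root with height 4.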

80 Exercise for AVL deletion/remove 10 7 5 2 8 15 13 19 11 14 17 25 16 18 imbalance Delete(2); Can be fixed by Case 4 How much time will it take to identify the case?

81 Exercise AVL Insert Case 3 + Case 4 + Case 1 + Case 2: Example Start with an empty AVL tree and insert the items 3, 2, 1 and then 4 through 7 in sequential order  Insert 10 through 16 in reverse order, followed by 8 and then 9 CASE 3: Right-left double rotation 4 2 6 3 7 1 5 CASE 1: Single right rotation CASE 2: Left-right double rotation

82 AVL Tree Implementation

83

84 Case 1 Case 2 Case 4 Case 3 Insert first and then fix Locate insertion site relative to the imbalanced node (if any)

85 Similarly, write rotateWithRightChild() for Case 4 No change New 2 New 1 // New 1 // New 2

86 #1 #2 // #1 // #2

87 AVL Tree Implementation (cont’d) See Figures 4.40 to 4.46 on pages 146 to 149

88 Splay Tree After a node is accessed, push it to the root via a series of AVL rotations. If a node is deep, there are also many other nodes on the path which are relatively deep. Restructuring helps to make future accesses cheaper. Studies have shown that if a node is accessed, it is likely to be accessed again. Guarantees that any M consecutive operations on an empty tree will take at most O(M log_2 N) time. Amortized cost per operation is O(log_2 N). Still, some operations may take O(N) time. Does not require maintaining height or balance information

89 Splay Tree (cont’d) Solution 1 (a bad solution)  Perform single rotations bottom-up with the accessed/new node and parent until the accessed/new node is the root  Problem Pushes other nodes deep in the tree In general, can result in O(M * N) time for M operations E.g. Insert 1, 2, 3, 4, …, N

90 Splay Tree (cont’d) Solution 1 (a bad solution) example N

91 Splay Tree: Splaying Solution 2 (a better solution): “splaying”. Still rotate the tree on the path from the accessed/new node X to the root, but choose the rotations more selectively, based on the node, its parent, and its grandparent. If X is a child of the root, then rotate X with the root. Otherwise, see the next options (zig-zag and zig-zig)

92 Splay Tree: Splaying (Zig-zag) Solution 2 (a better solution)  If node X is a right-child of parent P, which is a left-child of grandparent G (or vice-versa)  Perform double rotation (left, right)  “Zig-zag”

93 Splay Tree: Splaying (Zig-zig) Solution 2 (a better solution)  If node X is a left-child of parent P, which is a left-child of grandparent G (or right-right)  Perform double rotation (right-right)  “Zig-zig”
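A compact recursive sketch of splaying, covering the zig, zig-zag, and zig-zig steps. It is a commonly used recursive formulation rather than the bottom-up procedure drawn on the slides: it pairs rotations starting from the root, so on paths of odd length the single zig step may land at a different level. All names here are assumptions.

```cpp
#include <cassert>

struct SplayNode {
    int key;
    SplayNode* left = nullptr;
    SplayNode* right = nullptr;
    explicit SplayNode(int k) : key(k) {}
};

// Rotate p's left child above p and return the new subtree root.
SplayNode* rotateRight(SplayNode* p) {
    SplayNode* x = p->left;
    p->left = x->right;
    x->right = p;
    return x;
}

// Rotate p's right child above p and return the new subtree root.
SplayNode* rotateLeft(SplayNode* p) {
    SplayNode* x = p->right;
    p->right = x->left;
    x->left = p;
    return x;
}

// Moves key (or the last node on its search path) to the root.
SplayNode* splay(SplayNode* root, int key) {
    if (root == nullptr || root->key == key) return root;
    if (key < root->key) {
        if (root->left == nullptr) return root;
        if (key < root->left->key) {              // zig-zig (left-left)
            root->left->left = splay(root->left->left, key);
            root = rotateRight(root);             // grandparent first
        } else if (key > root->left->key) {       // zig-zag (left-right)
            root->left->right = splay(root->left->right, key);
            if (root->left->right != nullptr)
                root->left = rotateLeft(root->left);
        }
        return root->left == nullptr ? root : rotateRight(root);
    }
    if (root->right == nullptr) return root;
    if (key > root->right->key) {                 // zig-zig (right-right)
        root->right->right = splay(root->right->right, key);
        root = rotateLeft(root);
    } else if (key < root->right->key) {          // zig-zag (right-left)
        root->right->left = splay(root->right->left, key);
        if (root->right->left != nullptr)
            root->right = rotateRight(root->right);
    }
    return root->right == nullptr ? root : rotateLeft(root);
}

void destroy(SplayNode* t) {
    if (t == nullptr) return;
    destroy(t->left);
    destroy(t->right);
    delete t;
}
```

Starting from the left chain 5, 4, 3, 2, 1, splaying 1 brings it to the root and leaves 4 as its right child with 2 and 5 below, so the longest path shrinks from 4 edges to 3 instead of staying linear.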

94 Splay Tree: Splaying (cont’d) Example: Consider the previous worst-case scenario: insert 1, 2, …, N; then access 1 CASE: Zig-zig

95 Splay Tree Remove Access the node to be removed (so now it is at the root) Remove node leaving two subtrees T L and T R Access largest element in T L  Now at root; no right child Make T R right child of root of T L

96 Splay Tree: Remove (cont’d) 6 7 4 2 1 5 6 7 4 2 1 5 4 2 1 5 7 5 7 4 2 1 Example: Remove 6

97 Balanced BSTs AVL trees  Guarantees O(log 2 N) behavior  Requires maintaining height information Splay trees  Guarantees amortized O(log 2 N) behavior  Moves frequently-accessed elements closer to the root of the tree Both assume N-node tree can fit in the main memory  What if all nodes cannot fit in main memory?

98 Motivation: Top 10 Largest Databases (organization: database size) WDCC: 6,000 TBs; NERSC: 2,800 TBs; AT&T: 323 TBs; Google: 33 trillion rows/database entries (91 million insertions/searches per day); Sprint: 3 trillion rows (100 million insertions per day); ChoicePoint: 250 TBs; Yahoo!: 100 TBs; YouTube: 45 TBs; Amazon: 42 TBs; Library of Congress: 20 TBs. Source: www.businessintelligencelowdown.com, 2007

99 Use a BST? Google: 33 trillion items, indexed by IP (duplicates). Access time: h = log_2(33 × 10^12) ≈ 44.9. Assuming 120 disk accesses per second, each search takes about 0.37 second

100 Idea Use a 3-way search tree (instead of 2-way) Each node stores 2 keys and has at most 3 children Each node access brings in 2 keys and 3 child pointers Height of a balanced 3-way search tree? 36 452718

101 Better and Bigger Idea Use an M-ary search tree (instead of just 2 or 3) Each node access brings in (M – 1) keys and M child pointers Choose M so node size = disk page size Height of tree = log M N  N is the number of keys in the tree

102 Example Standard disk page size = 8,192 bytes. Assume keys use 32 bytes and pointers use 4 bytes (keys uniquely identify data elements). 32 × (M − 1) + 4 × M = 8,192 bytes; solving for M gives M = 228. Google example: log_228(33 × 10^12) ≈ 5.7 disk accesses. Assuming 120 disk accesses per second, each search takes about 0.047 second
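The arithmetic on this slide can be reproduced with a short sketch. The function names are illustrative, and the height is the idealized log_M N figure used on the slide rather than an exact node count.

```cpp
#include <cassert>
#include <cmath>

// Largest M satisfying keyBytes * (M - 1) + pointerBytes * M <= pageBytes.
int maxBranchingFactor(int pageBytes, int keyBytes, int pointerBytes) {
    assert(keyBytes > 0 && pointerBytes > 0);
    return (pageBytes + keyBytes) / (keyBytes + pointerBytes);
}

// Idealized height (disk accesses per search) of an M-ary tree on n keys.
double treeHeight(double n, double m) {
    return std::log(n) / std::log(m);
}

// Seconds per search at a given number of disk accesses per second.
double searchSeconds(double accesses, double accessesPerSecond) {
    return accesses / accessesPerSecond;
}
```

With an 8,192-byte page, 32-byte keys, and 4-byte pointers this gives M = 228, and for 33 × 10^12 items roughly 5.7 accesses instead of about 44.9 for a binary tree.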

103 B-tree A B-tree (also called a B+ tree) of order M is an M-ary tree with the following properties: 1. Data items are stored at the leaves 2. Non-leaf nodes store up to M − 1 keys; key i represents the smallest key in subtree i + 1 3. The root node is either a leaf or has between 2 and M children 4. Non-leaf nodes (other than the root) have between ⌈M/2⌉ and M children 5. All leaves are at the same depth and have between ⌈L/2⌉ and L items. Requiring nodes to be half full avoids degeneration into a binary tree

104 B-tree (cont’d) B-tree of order 5 (i.e. M = 5)  Node has 2 to 4 keys and 3 to 5 children  Leaves have 3 to 5 data elements

105 B-tree: Choosing L Assuming a data element requires 256 bytes, a leaf node capacity of 8,192 bytes implies L = 32. Each leaf node has between 16 and 32 data elements. Worst case for Google: leaves = (33 × 10^12) / 16 ≈ 2 × 10^12; log_(M/2)(2 × 10^12) = log_114(2 × 10^12) ≈ 5.98

106 B-tree: Insertion Case 1: Insert into a non-full leaf node  E.g., insert 57 into the previous order 5 tree

107 B-tree: Insertion (cont’d) Case 2: Insert into a full leaf node, but parent has room  Split leaf and promote middle element to parent  E.g., insert 55 into previous tree

108 B-tree: Insertion (cont’d) Case 3: Insert into full leaf, parent has no room  Split parent, promote parent’s middle element to grandparent  Continue until non-full parent or split root  E.g., insert 40 into previous tree

109 B-tree: Insertion (cont’d) E.g., try inserting 43 and 45 into previous tree

110 B-tree: Deletion Case 1: Leaf node containing item not at minimum  E.g., remove 16 from previous tree

111 B-tree: Deletion (cont’d) Case 2: Leaf node containing item has minimum elements, neighbor not at minimum  Adopt element from neighbor  E.g., remove 6 from previous tree 10 8

112 B-tree: Deletion (cont’d) Case 3: Leaf node containing item has minimum elements, neighbors have minimum elements  Merge with neighbor and intermediate key  If parent now below minimum, continue up the tree  E.g., remove 99 from previous tree

113 B-tree: Case 3 Deletion (cont’d)

114 B-trees B-trees are ordered search trees optimized for large N and secondary storage. B-trees are M-ary trees with height log_M N. M = O(10^2) based on disk page sizes. E.g., trillions of elements stored in a tree of height 6. Worst case for Google: leaves = (33 × 10^12) / 16 ≈ 2 × 10^12; log_(M/2)(2 × 10^12) = log_114(2 × 10^12) ≈ 5.98. Basis of many database architectures

115 C++ STL Sets and Maps The vector and list STL classes are inefficient for search. The C++ standard requires that the STL set and map classes guarantee logarithmic insert, delete, and search. The implementation is a balanced BST: typically an AVL tree is not used; top-down red-black trees are often used instead

116 STL set Class An ordered container that does not allow duplicates Like lists and vectors, sets provide iterators and related methods: begin, end, empty, and size Sets also support insert, erase, and find

117 set Insertion insert adds an item to the set and returns an iterator to it. Because a set does not allow duplicates, insert may fail; in this case, insert returns an iterator to the item causing the failure. To distinguish between success and failure, insert actually returns a pair of results: an iterator and a Boolean indicating success or failure. pair<iterator, bool> insert(const Object & x);
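A short sketch of reading the pair returned by `std::set::insert`. The wrapper name `insertUnique` is hypothetical; the `insert` call and its `pair<iterator, bool>` result are standard.

```cpp
#include <cassert>
#include <set>
#include <utility>

// Returns true if x was newly inserted, false if it was already present.
bool insertUnique(std::set<int>& s, int x) {
    std::pair<std::set<int>::iterator, bool> result = s.insert(x);
    // In both outcomes the iterator refers to the element equal to x.
    assert(*result.first == x);
    return result.second;
}
```

Calling it twice with the same value returns true and then false, and the set's size stays at 1.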

118 Sidebar: STL pair Class pair<Type1, Type2> Members: first, second, first_type, second_type #include <utility> pair<iterator, bool> insert(const Object & x) { iterator it; bool found; … return pair<iterator, bool>(it, found); }

119 set Insertion (cont’d) Giving insert a hint: for good hints, insert is O(1); otherwise, it reverts to the one-parameter insert. E.g. iterator insert(iterator hint, const Object & x); set<int> s; for (int i = 0; i < 1000000; i++) s.insert(s.end(), i);

120 set Deletion int erase(const Object & x);  Remove x, if found  Return number of items deleted (0 or 1) iterator erase(iterator itr);  Remove object at position given by iterator  Return iterator for object after deleted object iterator erase(iterator start, iterator end);  Remove objects from start up to (but not including) end  Returns iterator for object after last deleted object

121 set Search iterator find(const Object & x) const; Returns an iterator to the object (or end() if not found), unlike contains, which returns a Boolean. find runs in logarithmic time

122 STL map Class Stores items, where an item consists of a key and a value Like a set instantiated with a key/value pair Keys must be unique Different keys can map to the same value map keeps items in order by key

123 STL map Class (cont’d) Methods: begin, end, size, empty; insert, erase, find. Iterators reference items of type pair<KeyType, ValueType>. Inserted elements are also of type pair<KeyType, ValueType>

124 STL map Class (cont’d.) Main benefit: overloaded operator[]. If the key is present in the map, it returns a reference to the corresponding value. If the key is not present in the map, the key is inserted into the map with a default value, and a reference to the default value is returned. ValueType & operator[](const KeyType & key); map<string, double> salaries; salaries["Pat"] = 75000.0;

125 STL map Class (cont’d) Example struct ltstr { bool operator()(const char* s1, const char* s2) const { return strcmp(s1, s2) < 0; } }; int main() { map<const char*, int, ltstr> months; months["january"] = 31; months["february"] = 28; months["march"] = 31; months["april"] = 30; ... (ltstr is the comparator, needed if the key type is not primitive)

126 STL map Class (cont’d) Example (cont’d) ... months["may"] = 31; months["june"] = 30; months["july"] = 31; months["august"] = 31; months["september"] = 30; months["october"] = 31; months["november"] = 30; months["december"] = 31; cout << "june -> " << months["june"] << endl; map<const char*, int, ltstr>::iterator cur = months.find("june"); map<const char*, int, ltstr>::iterator prev = cur; map<const char*, int, ltstr>::iterator next = cur; ++next; --prev; cout << "Previous (in alphabetical order) is " << (*prev).first << endl; cout << "Next (in alphabetical order) is " << (*next).first << endl; }
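The same example can be sketched with `std::string` keys, which order alphabetically without a custom comparator. This is a variation on the slide's code, not a transcription of it; `makeMonths` and `neighbours` are hypothetical helper names.

```cpp
#include <cassert>
#include <iterator>
#include <map>
#include <string>
#include <utility>

// Builds the month -> days map; operator[] inserts each missing key.
std::map<std::string, int> makeMonths() {
    std::map<std::string, int> months;
    months["january"] = 31;   months["february"] = 28;
    months["march"] = 31;     months["april"] = 30;
    months["may"] = 31;       months["june"] = 30;
    months["july"] = 31;      months["august"] = 31;
    months["september"] = 30; months["october"] = 31;
    months["november"] = 30;  months["december"] = 31;
    return months;
}

// Keys immediately before and after `key` in sorted order.
// Assumes key is present and is neither the first nor the last key.
std::pair<std::string, std::string> neighbours(
        const std::map<std::string, int>& m, const std::string& key) {
    auto cur = m.find(key);
    assert(cur != m.end() && cur != m.begin() && std::next(cur) != m.end());
    return std::make_pair(std::prev(cur)->first, std::next(cur)->first);
}
```

Because the map keeps its items ordered by key, the neighbours of "june" are "july" (previous) and "march" (next), and stepping the iterator is all it takes to reach them.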

127 Implementation of set and map Support insertion, deletion, and search in worst-case logarithmic time. Use a balanced binary search tree. Support for iterators: the hard part is advancing to the next node, so each tree node points to its predecessor and successor. A BST with N nodes has N + 1 NULL pointers, so about half the space allocated for pointers is wasted. If a node has a NULL left child, make the left child link point to its in-order predecessor. If a node has a NULL right child, make the right child link point to its in-order successor. This is known as a “threaded tree”

128 Summary: Trees Trees are ubiquitous in software Search trees important for fast search  Support logarithmic searches  Must be kept balanced (AVL, Splay, B-tree) STL set and map classes use balanced trees to support logarithmic insert, delete, and search

