
1 Optimal Binary Search Tree We now want to focus on the construction of binary search trees for a static set of identifiers, in which only searches are performed. To find an optimal binary search tree for a given static file, a cost measure must be determined for search trees. It's reasonable to use the level number of a node as its cost.

2 Binary Search Tree Example Two binary search trees for the identifiers do, for, if, return, while: one requires 4 comparisons in the worst case, the other only 3.

3 Extended Binary Tree Example The two search trees of the previous slide extended with external (failure) nodes, shown as (a) and (b).

4 External Path Length and Internal Path Length The external path length of a binary tree is the sum, over all external nodes, of the lengths of the paths from the root to those nodes. The internal path length is the sum, over all internal nodes, of the lengths of the paths from the root to those nodes. Let the internal path length be I and the external path length be E; the binary tree of (a) has I = 0+1+1+2+3 = 7 and E = 2+2+4+4+3+2 = 17.

5 External Path Length and Internal Path Length (Cont.) It can be shown that E = I + 2n. Binary trees with maximum E also have maximum I. For all binary trees with n internal nodes, –maximum I = n(n-1)/2 (skewed tree) –minimum I = Σ 1≤k≤n ⌊log 2 k⌋ (complete binary tree)
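As a quick check of E = I + 2n, here is a small C++ sketch (the node layout and helper names are illustrative, not from the slides) that computes both path lengths by traversal. The sample tree happens to reproduce the I = 7, E = 17 values quoted for tree (a), although the exact shape of (a) is not recoverable from the transcript.

    #include <iostream>

    // Minimal sketch: verify E = I + 2n on a small binary tree.
    struct Node {
        Node *left = nullptr, *right = nullptr;
    };

    // Accumulate the depth of every internal node (I) and of every
    // external/null position (E); n counts internal nodes.
    void pathLengths(Node* t, int depth, long& I, long& E, long& n) {
        if (!t) { E += depth; return; }     // external (failure) node
        ++n;
        I += depth;
        pathLengths(t->left,  depth + 1, I, E, n);
        pathLengths(t->right, depth + 1, I, E, n);
    }

    int main() {
        // A 5-node tree with internal depths 0, 1, 1, 2, 3.
        Node a, b, c, d, e;
        a.left = &b; a.right = &c; b.left = &d; d.right = &e;
        long I = 0, E = 0, n = 0;
        pathLengths(&a, 0, I, E, n);
        std::cout << "I=" << I << " E=" << E << " I+2n=" << I + 2 * n << '\n';
        // Prints I=7 E=17 I+2n=17, i.e. E = I + 2n.
    }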

6 Binary Search Tree Containing A Symbol Table Let's look at the problem of representing a symbol table as a binary search tree. If a binary search tree contains the identifiers a 1, a 2, …, a n with a 1 < a 2 < … < a n, and the probability of searching for each a i is p i, then the total cost of any binary search tree is Σ 1≤i≤n p i · level(a i ) when only successful searches are made.

7 Binary Search Tree Containing A Symbol Table For unsuccessful searches, let's partition the identifiers not in the binary search tree into n+1 classes E i, 0 ≤ i ≤ n. If q i is the probability that the identifier being sought is in E i, then the cost of the failure node for E i is q i · (level(E i ) - 1). Therefore, the total cost of a binary search tree is Σ 1≤i≤n p i · level(a i ) + Σ 0≤i≤n q i · (level(E i ) - 1). An optimal binary search tree for the identifier set a 1, …, a n is one that minimizes this quantity over all possible binary search trees for the identifier set. Since all searches must terminate either successfully or unsuccessfully, we have Σ 1≤i≤n p i + Σ 0≤i≤n q i = 1.

8 Binary Search Tree With Three Identifiers Example The five possible binary search trees, (a) through (e), for the three identifiers do, if, while.

9 Cost of Binary Search Tree In The Example With equal probabilities, p i = q j = 1/7 for all i and j, we have cost(tree a) = 15/7; cost(tree b) = 13/7; cost(tree c) = 15/7; cost(tree d) = 15/7; cost(tree e) = 15/7. Tree b is optimal. With p 1 = 0.5, p 2 = 0.1, p 3 = 0.05, q 0 = 0.15, q 1 = 0.1, q 2 = 0.05, and q 3 = 0.05 we have cost(tree a) = 2.65; cost(tree b) = 1.9; cost(tree c) = 1.5; cost(tree d) = 2.05; cost(tree e) = 1.6. Tree c is optimal.
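The following sketch evaluates the cost formula above for one concrete tree. The assumption that tree (c) is the tree with do at the root and if, while forming a right chain is mine (it is consistent with the cost 1.5 quoted above); node layout and names are illustrative.

    #include <cstdio>

    // Sketch of the cost formula: sum of p[i]*level(a_i) over internal nodes
    // plus q[i]*(level(E_i)-1) over failure nodes.
    struct Node { int id; Node *left, *right; };       // id = i for identifier a_i

    double cost(Node* t, int level, const double p[], const double q[], int lo, int hi) {
        // Subtree t holds identifiers a_{lo+1}..a_{hi}; an empty subtree is
        // the failure node E_lo at this level.
        if (!t) return q[lo] * (level - 1);
        return p[t->id] * level
             + cost(t->left,  level + 1, p, q, lo, t->id - 1)
             + cost(t->right, level + 1, p, q, t->id, hi);
    }

    int main() {
        double p[] = {0, 0.5, 0.1, 0.05};              // p[1..3] for do, if, while
        double q[] = {0.15, 0.1, 0.05, 0.05};          // q[0..3]
        Node c3 = {3, nullptr, nullptr};               // while
        Node c2 = {2, nullptr, &c3};                   // if
        Node c1 = {1, nullptr, &c2};                   // do at the root
        std::printf("cost = %.2f\n", cost(&c1, 1, p, q, 0, 3));  // prints cost = 1.50
    }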

10 Determine Optimal Binary Search Tree To determine the optimal binary search tree, it is not practical to follow the brute-force approach of trying every tree, since the complexity of doing so is O(n 4^n / n^(3/2)). So let's take another approach. Let T ij denote an optimal binary search tree for a i+1, …, a j, i ≤ j. Let c ij be the cost of the search tree T ij, let r ij be the root of T ij, and let w ij be the weight of T ij, where w ij = q i + Σ i<k≤j (p k + q k ). By definition, r ii = 0, w ii = q i, and c ii = 0, 0 ≤ i ≤ n. T 0n is an optimal binary search tree for a 1, …, a n ; its cost is c 0n, its weight is w 0n, and its root is r 0n.

11 Determine Optimal Binary Search Tree (Cont.) If T ij is an optimal binary search tree for a i+1, …, a j and r ij = k, then i < k ≤ j. T ij has two subtrees L and R: L contains a i+1, …, a k-1, and R contains a k+1, …, a j. So the cost c ij of T ij is c ij = p k + cost(L) + cost(R) + weight(L) + weight(R) = p k + c i,k-1 + c kj + w i,k-1 + w kj = w ij + c i,k-1 + c kj. Since T ij is optimal, the root k must minimize this expression, so c ij = w ij + min i<l≤j { c i,l-1 + c lj }.

12 Example 10.2 Let n = 4 and (a 1, a 2, a 3, a 4 ) = (do, if, return, while). Let (p 1, p 2, p 3, p 4 ) = (3, 3, 1, 1) and (q 0, q 1, q 2, q 3, q 4 ) = (2, 3, 1, 1, 1). Then w ii = q i, c ii = 0, and r ii = 0, 0 ≤ i ≤ 4. w 01 = p 1 + w 00 + w 11 = p 1 + q 1 + w 00 = 8; c 01 = w 01 + min{c 00 + c 11 } = 8; r 01 = 1. w 12 = p 2 + w 11 + w 22 = p 2 + q 2 + w 11 = 7; c 12 = w 12 + min{c 11 + c 22 } = 7; r 12 = 2. w 23 = p 3 + w 22 + w 33 = p 3 + q 3 + w 22 = 3; c 23 = w 23 + min{c 22 + c 33 } = 3; r 23 = 3. w 34 = p 4 + w 33 + w 44 = p 4 + q 4 + w 33 = 3; c 34 = w 34 + min{c 33 + c 44 } = 3; r 34 = 4.

13 Example 10.2 Computation
j-i = 0:  w 00 =2, c 00 =0, r 00 =0 | w 11 =3, c 11 =0, r 11 =0 | w 22 =1, c 22 =0, r 22 =0 | w 33 =1, c 33 =0, r 33 =0 | w 44 =1, c 44 =0, r 44 =0
j-i = 1:  w 01 =8, c 01 =8, r 01 =1 | w 12 =7, c 12 =7, r 12 =2 | w 23 =3, c 23 =3, r 23 =3 | w 34 =3, c 34 =3, r 34 =4
j-i = 2:  w 02 =12, c 02 =19, r 02 =1 | w 13 =9, c 13 =12, r 13 =2 | w 24 =5, c 24 =8, r 24 =3
j-i = 3:  w 03 =14, c 03 =25, r 03 =2 | w 14 =11, c 14 =19, r 14 =2
j-i = 4:  w 04 =16, c 04 =32, r 04 =2

14 Computation Complexity of Optimal Binary Search Tree To evaluate the optimal binary search tree we compute the c ij for j-i = 1, 2, …, n, in that order. When j-i = m, there are n-m+1 values c ij to compute, and each can be computed in O(m) time, so the total time for all c ij with j-i = m is O(nm - m 2 ). The total time to evaluate all the c ij and r ij is therefore Σ 1≤m≤n (nm - m 2 ) = O(n 3 ). The complexity can be reduced to O(n 2 ) by limiting the search for the optimal l to the range r i,j-1 ≤ l ≤ r i+1,j, as shown by D. E. Knuth.
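Here is a compact sketch of the dynamic program, including Knuth's range restriction on l. Array names are illustrative; the data are those of Example 10.2, so the run reproduces c 04 = 32 and r 04 = 2 from the table above.

    #include <cstdio>

    // Sketch of the O(n^2) optimal-BST dynamic program. w, c, r follow the
    // slides' definitions; restricting l to r[i][j-1]..r[i+1][j] is Knuth's
    // optimization.
    int main() {
        const int n = 4;
        int p[n + 1] = {0, 3, 3, 1, 1};        // p[1..n]
        int q[n + 1] = {2, 3, 1, 1, 1};        // q[0..n]
        int w[n + 1][n + 1] = {}, c[n + 1][n + 1] = {}, r[n + 1][n + 1] = {};

        for (int i = 0; i <= n; ++i) { w[i][i] = q[i]; c[i][i] = 0; r[i][i] = 0; }

        for (int m = 1; m <= n; ++m)           // m = j - i
            for (int i = 0; i + m <= n; ++i) {
                int j = i + m;
                w[i][j] = w[i][j - 1] + p[j] + q[j];
                // Knuth's bound: the optimal root lies in [r[i][j-1], r[i+1][j]].
                int lo = (m == 1) ? i + 1 : r[i][j - 1];
                int hi = (m == 1) ? j     : r[i + 1][j];
                int best = lo;
                for (int l = lo; l <= hi; ++l)
                    if (c[i][l - 1] + c[l][j] < c[i][best - 1] + c[best][j]) best = l;
                c[i][j] = w[i][j] + c[i][best - 1] + c[best][j];
                r[i][j] = best;
            }
        std::printf("c04=%d r04=%d\n", c[0][n], r[0][n]);   // prints c04=32 r04=2
    }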

15 AVL Trees Dynamic tables may also be maintained as binary search trees. Depending on the order in which the symbols are put into the table, the resulting binary search trees differ, and so does the average number of comparisons needed to access a symbol.

16 Binary Search Tree for The Months of The Year Input sequence: JAN, FEB, MAR, APR, MAY, JUNE, JULY, AUG, SEPT, OCT, NOV, DEC. Max comparisons: 6. Average comparisons: 3.5.

17 A Balanced Binary Search Tree For The Months of The Year Input sequence: JULY, FEB, MAY, AUG, DEC, MAR, OCT, APR, JAN, JUNE, SEPT, NOV. Max comparisons: 4. Average comparisons: 3.1.

18 Degenerate Binary Search Tree Input sequence: APR, AUG, DEC, FEB, JAN, JULY, JUNE, MAR, MAY, NOV, OCT, SEPT. Max comparisons: 12. Average comparisons: 6.5.

19 Minimize The Search Time of Binary Search Tree In Dynamic Situation From the above three examples, we see that the average and maximum search times are minimized if the binary search tree is maintained as a complete binary search tree at all times. However, to achieve this in a dynamic situation we would have to pay a high price in restructuring the tree into a complete binary tree after every update. In 1962, Adelson-Velskii and Landis introduced a binary tree structure that is balanced with respect to the heights of subtrees. As a result of this balance, dynamic retrievals can be performed in O(log n) time if the tree has n nodes, and the tree remains height-balanced after insertions and deletions. Such a tree is called an AVL tree.

20 AVL Tree Definition: An empty tree is height-balanced. If T is a nonempty binary tree with T L and T R as its left and right subtrees respectively, then T is height-balanced iff (1) T L and T R are height-balanced, and (2) |h L – h R | ≤ 1, where h L and h R are the heights of T L and T R, respectively. Definition: The balance factor, BF(T), of a node T in a binary tree is defined to be h L – h R, where h L and h R, respectively, are the heights of the left and right subtrees of T. For any node T in an AVL tree, BF(T) = -1, 0, or 1.
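A minimal sketch of these definitions (node layout and function names are illustrative, not the book's code):

    #include <algorithm>
    #include <cstdlib>

    // Sketch: a node with left/right subtrees, plus height, balance factor
    // BF(T) = h_L - h_R, and a recursive height-balance check.
    struct AvlNode {
        int key;
        AvlNode *left = nullptr, *right = nullptr;
    };

    int height(const AvlNode* t) {              // height of an empty tree is 0
        if (!t) return 0;
        return 1 + std::max(height(t->left), height(t->right));
    }

    int balanceFactor(const AvlNode* t) {       // BF(T) = h_L - h_R
        return height(t->left) - height(t->right);
    }

    bool heightBalanced(const AvlNode* t) {     // AVL condition: |BF| <= 1 everywhere
        if (!t) return true;
        return std::abs(balanceFactor(t)) <= 1
            && heightBalanced(t->left) && heightBalanced(t->right);
    }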

21 Balanced Trees Obtained for The Months of The Year (a) Insert MARCH. (b) Insert MAY. (c) Insert NOVEMBER (rebalanced by an RR rotation). (d) Insert AUGUST.

22 Balanced Trees Obtained for The Months of The Year (Cont.) (e) Insert APRIL (rebalanced by an LL rotation). (f) Insert JANUARY (rebalanced by an LR rotation).

23 Balanced Trees Obtained for The Months of The Year (Cont.) (g) Insert DECEMBER. (h) Insert JULY.

24 Balanced Trees Obtained for The Months of The Year (Cont.) (i) Insert FEBRUARY (rebalanced by an RL rotation).

25 Balanced Trees Obtained for The Months of The Year (Cont.) (j) Insert JUNE (rebalanced by an LR rotation).

26 Balanced Trees Obtained for The Months of The Year (Cont.) (k) Insert OCTOBER (rebalanced by an RR rotation).

27 Balanced Trees Obtained for The Months of The Year (Cont.) (l) Insert SEPTEMBER.

28 Rebalancing Rotation of Binary Search Tree LL: the new node Y is inserted in the left subtree of the left subtree of A. LR: Y is inserted in the right subtree of the left subtree of A. RR: Y is inserted in the right subtree of the right subtree of A. RL: Y is inserted in the left subtree of the right subtree of A. If a height-balanced binary tree becomes unbalanced as a result of an insertion, then these are the only four cases possible for rebalancing.
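The four cases reduce to two primitive rotations, as the following sketch shows (the node type mirrors the earlier sketch; all names are illustrative and this is not the book's code):

    // Sketch of the rebalancing rotations. LL and RR are single rotations;
    // LR and RL are obtained by composing two single rotations.
    struct AvlNode {
        int key;
        AvlNode *left = nullptr, *right = nullptr;
    };

    AvlNode* rotateRight(AvlNode* a) {   // LL case: the left child rises
        AvlNode* b = a->left;
        a->left = b->right;
        b->right = a;
        return b;                        // b is the new subtree root
    }

    AvlNode* rotateLeft(AvlNode* a) {    // RR case: the right child rises
        AvlNode* b = a->right;
        a->right = b->left;
        b->left = a;
        return b;
    }

    AvlNode* rotateLR(AvlNode* a) {      // LR: rotate the left child left, then a right
        a->left = rotateLeft(a->left);
        return rotateRight(a);
    }

    AvlNode* rotateRL(AvlNode* a) {      // RL: rotate the right child right, then a left
        a->right = rotateRight(a->right);
        return rotateLeft(a);
    }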

29 Rebalancing Rotation LL Node A has BF = +1 with left child B (BF = 0) and subtrees B L, B R, A R ; the whole subtree has height h+2. An insertion into B L raises its height to h+1, so BF(A) becomes +2 and BF(B) becomes +1. The LL rotation makes B the root of the subtree with A as its right child; both balance factors return to 0 and the subtree height is again h+2.

30 Rebalancing Rotation RR The mirror image of LL: A has right child B and subtrees B R, B L, A L. An insertion into B R raises its height to h+1, making BF(A) = -2. The RR rotation makes B the subtree root with A as its left child; both balance factors return to 0.

31 Rebalancing Rotation LR(a) The special case in which the inserted node C itself causes the imbalance: A has BF = +1 with left child B (BF = 0); inserting C as B's right child makes BF(A) = +2. The LR rotation makes C the subtree root with B and A as its left and right children, all with BF = 0.

32 Rebalancing Rotation LR(b) A has BF = +1 and left child B, whose right child C has subtrees C L and C R ; the insertion goes into C L. After the insertion BF(A) = +2, BF(B) = -1, BF(C) = +1. The LR rotation makes C the subtree root with B and A as its children; afterwards BF(C) = 0, BF(B) = 0, BF(A) = -1, and the subtree height returns to h+2.

33 Rebalancing Rotation LR(c) The symmetric case in which the insertion goes into C R : after the insertion BF(A) = +2, BF(B) = -1, BF(C) = -1. The LR rotation again makes C the subtree root; afterwards BF(C) = 0, BF(B) = +1, BF(A) = 0.

34 AVL Trees (Cont.) Once rebalancing has been carried out on the subtree in question, examining the remaining tree is unnecessary. Insertion into an ordinary binary search tree with n nodes can take O(n) time in the worst case, but for an AVL tree the insertion time is O(log n).

35 AVL Insertion Complexity Let N h be the minimum number of nodes in a height-balanced tree of height h. In the worst case, the height of one of the subtrees is h-1 and that of the other is h-2, and both subtrees must also be height-balanced. So N h = N h-1 + N h-2 + 1, with N 0 = 0, N 1 = 1, and N 2 = 2. This recursive definition for N h is similar to that for the Fibonacci numbers, F n = F n-1 + F n-2, F 0 = 0, F 1 = 1, and it can be shown that N h = F h+2 – 1. Since the Fibonacci numbers grow exponentially, it follows that h = O(log n), roughly 1.44 log 2 n. So the worst-case insertion time for a height-balanced tree with n nodes is O(log n).
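The relation N h = F h+2 - 1 is easy to check numerically; the following sketch iterates both recurrences side by side (the range of h is arbitrary):

    #include <cstdio>

    // Quick check of N_h = F_{h+2} - 1 using the recurrences
    // N_h = N_{h-1} + N_{h-2} + 1 and F_n = F_{n-1} + F_{n-2}.
    int main() {
        const int H = 20;
        long long N[H + 1], F[H + 3];
        N[0] = 0; N[1] = 1;
        F[0] = 0; F[1] = 1;
        for (int i = 2; i <= H + 2; ++i) F[i] = F[i - 1] + F[i - 2];
        for (int h = 2; h <= H; ++h)     N[h] = N[h - 1] + N[h - 2] + 1;
        for (int h = 0; h <= H; ++h)
            std::printf("h=%2d  N_h=%8lld  F_{h+2}-1=%8lld\n", h, N[h], F[h + 2] - 1);
        // Every row shows N_h == F_{h+2} - 1.
    }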

36 Probability of Each Type of Rebalancing Rotation Research has shown that a random insertion requires no rebalancing with probability 0.5349, a rebalancing rotation of type LL or RR with probability 0.2327, and a rebalancing rotation of type LR or RL with probability 0.2324.

37 Comparison of Various Structures
Operation             Sequential List    Linked List    AVL Tree
Search for x          O(log n)           O(n)           O(log n)
Search for kth item   O(1)               O(k)           O(log n)
Delete x              O(n)               O(1) [1]       O(log n)
Delete kth item       O(n - k)           O(k)           O(log n)
Insert x              O(n)               O(1) [2]       O(log n)
Output in order       O(n)               O(n)           O(n)
[1] Doubly linked list and position of x known. [2] Position for insertion known.

38 2-3 Trees If search trees of degree greater than 2 are used, we'll have simpler insertion and deletion algorithms than those of AVL trees, and the algorithms' complexity is still O(log n). Definition: A 2-3 tree is a search tree that either is empty or satisfies the following properties: (1) Each internal node is a 2-node or a 3-node. A 2-node has one element; a 3-node has two elements. (2) Let LeftChild and MiddleChild denote the children of a 2-node. Let dataL be the element in this node, and let dataL.key be its key. All elements in the 2-3 subtree with root LeftChild have key less than dataL.key, whereas all elements in the 2-3 subtree with root MiddleChild have key greater than dataL.key. (3) Let LeftChild, MiddleChild, and RightChild denote the children of a 3-node. Let dataL and dataR be the two elements in this node. Then, dataL.key < dataR.key; all keys in the 2-3 subtree with root LeftChild are less than dataL.key; all keys in the 2-3 subtree with root MiddleChild are less than dataR.key and greater than dataL.key; and all keys in the 2-3 subtree with root RightChild are greater than dataR.key. (4) All external nodes are at the same level.

39 2-3 Tree Example A 2-3 tree whose root A is a 2-node containing 40; its left child B is a 3-node containing 10 and 20, and its middle child C is a 2-node containing 80.

40 The Height of A 2-3 Tree As with leftist trees, external nodes are introduced only to make it easier to define and talk about 2-3 trees; they are not physically represented inside a computer. The number of elements in a 2-3 tree of height h is between 2^h - 1 and 3^h - 1. Hence, the height of a 2-3 tree with n elements is between ⌈log 3 (n+1)⌉ and ⌈log 2 (n+1)⌉.

41 2-3 Tree Data Structure
template <class KeyType> class Two3;          // forward declaration

template <class KeyType>
class Two3Node {
friend class Two3<KeyType>;
private:
    Element<KeyType> dataL, dataR;
    Two3Node<KeyType> *LeftChild, *MiddleChild, *RightChild;
};

template <class KeyType>
class Two3 {
public:
    Two3(KeyType max, Two3Node<KeyType>* init = 0) : MAXKEY(max), root(init) {}  // constructor
    Boolean Insert(const Element<KeyType>&);
    Boolean Delete(const Element<KeyType>&);
    Two3Node<KeyType>* Search(const Element<KeyType>&);
private:
    Two3Node<KeyType>* root;
    KeyType MAXKEY;
};

42 Searching A 2-3 Tree The search algorithm for binary search trees can easily be extended to obtain the search function of a 2-3 tree (Two3<KeyType>::Search()). The search function calls a function compare that compares a key x with the keys in a given node p. It returns the value 1, 2, 3, or 4, depending on whether x is less than the first key, between the first key and the second key, greater than the second key, or equal to one of the keys in node p.
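The transcript does not show compare itself, so here is a self-contained sketch of the comparison logic just described, using a simplified node with plain integer keys; in the class above it would be the member function invoked as p->compare(x).

    #include <limits>

    // Sketch only: a simplified 2-3 node. By the convention of the class
    // above, a 2-node stores MAXKEY in its second key slot, so case 3 can
    // only arise in a 3-node.
    struct Node23 {
        int keyL, keyR;                       // keyR == MAXKEY marks a 2-node
    };
    const int MAXKEY = std::numeric_limits<int>::max();

    int compare23(const Node23& p, int x) {
        if (x == p.keyL || (p.keyR != MAXKEY && x == p.keyR)) return 4;  // x is in p
        if (x < p.keyL) return 1;             // descend to LeftChild
        if (x < p.keyR) return 2;             // descend to MiddleChild
        return 3;                             // descend to RightChild (3-node only)
    }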

43 Searching Function of a 2-3 Tree
template <class KeyType>
Two3Node<KeyType>* Two3<KeyType>::Search(const Element<KeyType>& x)
// Search the 2-3 tree for an element x. If the element is not in the tree, then return 0.
// Otherwise, return a pointer to the node that contains this element.
{
    for (Two3Node<KeyType>* p = root; p;)
        switch (p->compare(x)) {
            case 1: p = p->LeftChild; break;
            case 2: p = p->MiddleChild; break;
            case 3: p = p->RightChild; break;
            case 4: return p;                 // x is one of the keys in p
        }
    return 0;                                 // fell off the tree; x is not present
}

44 Insertion Into A 2-3 Tree First we use the search function to search the 2-3 tree for the key that is to be inserted. If the key being searched for is already in the tree, then the insertion fails, as all keys in a 2-3 tree are distinct. Otherwise, we reach a unique leaf node U. The node U may be in one of two states: –U contains only one element: the new element can simply be inserted in this node. –U already contains two elements: a new node is created. The newly created node receives the element with the largest key from among the two elements initially in U and the element x being inserted. The element with the smallest key stays in the original node, and the element with the median key, together with a pointer to the newly created node, is inserted into the parent of U.

45 Insertion Into A 2-3 Tree Example (a) 70 inserted: leaf C now holds 70 and 80. (b) 30 inserted: leaf B splits into B (holding 10) and a new node D (holding 30), and the median key 20 moves up into the root A, which now holds 20 and 40.

46 Insertion of 60 Into Figure 10.15(b) Inserting 60 into leaf C (which held 70 and 80) splits it: 60 stays in C, 80 moves to a new node E, and 70 is passed up to the root A. Since A already held 20 and 40, it splits as well (70 goes to a new node F), and a new root G containing 40 is created; the height of the tree increases by one.

47 Node Split From the above examples, we find that each time an attempt is made to add an element into a 3-node p, a new node q is created. This is referred to as a node split.

48 2-3 Tree Insertion Function
template <class KeyType>
Boolean Two3<KeyType>::Insert(const Element<KeyType>& y)
{
    Two3Node<KeyType>* p;
    Element<KeyType> x = y;
    if (x.key >= MAXKEY) return FALSE;                  // invalid key
    if (!root) { NewRoot(x, 0); return TRUE; }          // empty 2-3 tree
    if (!(p = FindNode(x))) { InsertionError(); return FALSE; }  // x is already in the tree
    for (Two3Node<KeyType>* a = 0; ; )
        if (p->dataR.key == MAXKEY) {                   // p is a 2-node
            p->PutIn(x, a);
            return TRUE;
        }
        else {                                          // p is a 3-node
            Two3Node<KeyType>* olda = a;
            a = new Two3Node<KeyType>;
            x = Split(p, x, olda, a);
            if (root == p) {                            // root has been split
                NewRoot(x, a);
                return TRUE;
            }
            else p = p->parent();
        }
}

49 Deletion From a 2-3 Tree If the element to be deleted is not in a leaf node, the deletion can be transformed into a deletion from a leaf node: the deleted element is replaced by either the element with the largest key in its left subtree or the element with the smallest key in its right subtree. So we can focus on deletion from a leaf node.

50 Deletion From A 2-3 Tree Example (a) Initial 2-3 tree. (b) 70 deleted. (c) 90 deleted.

51 Deletion From A 2-3 Tree Example (Cont.) (d) 60 deleted. (e) 95 deleted. (f) 50 deleted. (g) 10 deleted.

52 Rotation and Combine As shown in the example, deletion may invoke a rotation or a combine operation. For a rotation, there are three cases: –the leaf node p is the left child of its parent r. –the leaf node p is the middle child of its parent r. –the leaf node p is the right child of its parent r.

53 Three Rotation Cases (a) p is the left child of r. (b) p is the middle child of r. (c) p is the right child of r.

54 Steps in Deletion From a Leaf Of a 2-3 Tree Step 1: Modify node p as necessary to reflect its status after the desired element has been deleted. Step 2: for (; p has zero elements && p != root; p = r) { let r be the parent of p, and let q be the left or right sibling of p; if (q is a 3-node) perform a rotation else perform a combine; } Step 3: If p has zero elements, then p must be the root. The left child of p becomes the new root, and node p is deleted.

55 Combine When p is the Left Child of r The deficient node p, the element x from its parent r, and the single element y of its 2-node sibling q are merged into one node; q is deleted and r loses x. Cases (a) and (b) show r as a 3-node and as a 2-node, respectively.

56 2-3-4 Trees Definition: A 2-3-4 tree is a search tree that either is empty or satisfies the following properties: (1) Each internal node is a 2-, 3-, or 4-node. A 2-node has one element, a 3-node has two elements, and a 4-node has three elements. (2) Let LeftChild and LeftMidChild denote the children of a 2-node. Let dataL be the element in this node, and let dataL.key be its key. All elements in the 2-3-4 subtree with root LeftChild have key less than dataL.key, whereas all elements in the 2-3-4 subtree with root LeftMidChild have key greater than dataL.key. (3) Let LeftChild, LeftMidChild, and RightMidChild denote the children of a 3-node. Let dataL and dataM be the two elements in this node. Then, dataL.key < dataM.key; all keys in the 2-3-4 subtree with root LeftChild are less than dataL.key; all keys in the 2-3-4 subtree with root LeftMidChild are less than dataM.key and greater than dataL.key; and all keys in the 2-3-4 subtree with root RightMidChild are greater than dataM.key.

57 2-3-4 Trees (Cont.) (4) Let LeftChild, LeftMidChild, RightMidChild, and RightChild denote the children of a 4-node. Let dataL, dataM, and dataR be the three elements in this node. Then, dataL.key < dataM.key < dataR.key; all keys in the 2-3-4 subtree with root LeftChild are less than dataL.key; all keys in the 2-3-4 subtree with root LeftMidChild are less than dataM.key and greater than dataL.key; all keys in the 2-3-4 subtree with root RightMidChild are greater than dataM.key but less than dataR.key; and all keys in the 2-3-4 subtree with root RightChild are greater than dataR.key. (5) All external nodes are at the same level.

58 2-3-4 Tree Example A 2-3-4 tree whose root is the 2-node 50; its children are the 2-node 10 and the 3-node 70, 80, and its leaves are the nodes 5 7 8; 30 40; 60; 75; and 85 90 92.

59 2-3-4 Trees (Cont.) Similar to the 2-3 tree, the height h of a 2-3-4 tree with n elements is bounded between ⌈log 4 (n+1)⌉ and ⌈log 2 (n+1)⌉. A 2-3-4 tree has an advantage over 2-3 trees in that insertion and deletion can be performed by a single root-to-leaf pass rather than by a root-to-leaf pass followed by a leaf-to-root pass, so the corresponding algorithms for 2-3-4 trees are simpler than those for 2-3 trees. Furthermore, 2-3-4 trees can be represented efficiently as binary trees (called red-black trees), which results in a more efficient utilization of space.

60 Top-Down Insertion If the leaf node into which the element is to be inserted is a 2-node or a 3-node, the insertion is easy: simply insert the element into the leaf node. If the leaf is a 4-node, the node splits and a backward (leaf-to-root) pass is initiated; this backward pass terminates when either a 2-node or a 3-node is encountered, or when the root is split. To avoid the backward pass, we split 4-nodes on the way down the tree. As a result, the leaf node into which the insertion is to be made is guaranteed to be a 2- or 3-node. There are three different cases to consider for a 4-node: (1) It is the root of the 2-3-4 tree. (2) Its parent is a 2-node. (3) Its parent is a 3-node.

61 Transformation When the 4-Node Is The Root The root 4-node x, y, z splits into three 2-nodes: y becomes the new root, with x and z as its children. This increases the height of the tree by one.

62 Transformation When the 4-Node is the Child of a 2-Node The middle element of the 4-node moves up into the 2-node parent (which becomes a 3-node), and the remaining two elements become separate 2-nodes. Cases (a) and (b) show the 4-node as the left child and as the right child of its parent.

63 Transformation When the 4-Node is the Left Child of a 3-Node The 4-node v, w, x is the left child of the 3-node y, z; its middle element w moves up into the parent (which becomes the 4-node w, y, z), and v and x become 2-nodes.

64 Transformation When the 4-Node is the Left Middle Child of a 3-Node The 4-node w, x, y is the left middle child of the 3-node v, z; its middle element x moves up into the parent (which becomes the 4-node v, x, z), and w and y become 2-nodes.

65 Transformation When the 4-Node is the Right Middle Child of a 3-Node The 4-node x, y, z is the right middle child of the 3-node v, w; its middle element y moves up into the parent (which becomes the 4-node v, w, y), and x and z become 2-nodes.

66 Top-Down Deletion The deletion of an arbitrary element from a 2-3-4 tree may be reduced to the deletion of an element that is in a leaf node. If the element to be deleted is in a leaf that is a 3-node or a 4-node, then its deletion leaves behind a 2-node or a 3-node, and no restructuring is required. To avoid a backward restructuring pass, it is necessary to ensure that, at the time of deletion, the element to be deleted is in a 3-node or a 4-node. This is accomplished by restructuring the 2-3-4 tree during the downward pass.

67 Top-Down Deletion (Cont.) Suppose the search is presently at node p and will move next to node q. The following cases need to be considered: (1) p is a leaf: the element to be deleted is either in p or not in the tree. (2) q is not a 2-node: the search moves to q, and no restructuring is needed. (3) q is a 2-node, and its nearest sibling, r, is also a 2-node. –If p is a 2-node, p must be the root, and p, q, and r are combined by performing, in reverse, the transformation that splits a root 4-node. –If p is a 3-node or a 4-node, perform, in reverse, the appropriate 4-node splitting transformation. (4) q is a 2-node, and its nearest sibling, r, is a 3-node. (5) q is a 2-node, and its nearest sibling, r, is a 4-node.

68 Deletion Transformation When the Nearest Sibling is a 3-Node (a) q is the left child of a 3-node. (b) q is the left child of a 4-node.

69 Red-Black Trees A red-black tree is a binary tree representation of a 2-3-4 tree. The child pointers of a node in a red-black tree are of two types: red and black. –If the child pointer was present in the original 2-3-4 tree, it is a black pointer. –Otherwise, it is a red pointer. A node in a 2-3-4 tree is transformed into its red-black representation as follows: (1) A 2-node p is represented by the RedBlackNode q with both its color data members black, data = dataL, q->LeftChild = p->LeftChild, and q->RightChild = p->LeftMidChild. (2) A 3-node p is represented by two RedBlackNodes connected by a red pointer; there are two ways in which this may be done. (3) A 4-node is represented by three RedBlackNodes, one of which is connected to the remaining two by red pointers.
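A possible node layout matching this description (a sketch only, not the book's RedBlackNode class, which is not shown in the transcript) keeps one color per child pointer rather than one color per node:

    // Sketch: the colors live on the child pointers, so a node has one color
    // data member per child. Field and function names are illustrative.
    enum PointerColor { BLACK, RED };

    template <class KeyType>
    struct RedBlackNode {
        KeyType key;                                     // the single element stored here
        RedBlackNode *LeftChild = nullptr, *RightChild = nullptr;
        PointerColor LeftColor = BLACK, RightColor = BLACK;   // color of each child pointer
    };

    // Build the red-black representation of a 3-node holding (small, large):
    // here 'large' becomes the parent and 'small' hangs off a red left
    // pointer (the other of the two possible orientations is symmetric).
    template <class KeyType>
    RedBlackNode<KeyType>* makeThreeNode(KeyType small, KeyType large) {
        RedBlackNode<KeyType>* parent = new RedBlackNode<KeyType>;
        RedBlackNode<KeyType>* child  = new RedBlackNode<KeyType>;
        parent->key = large;
        child->key  = small;
        parent->LeftChild = child;
        parent->LeftColor = RED;                         // red pointer joins the pair
        return parent;
    }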

70 Transforming a 3-Node into Two RedBlackNodes The 3-node holding x and y (x < y) becomes either y with x attached by a red left pointer, or x with y attached by a red right pointer.

71 Transforming a 4-Node into Three RedBlackNodes The 4-node holding x, y, z becomes y with x and z attached by red left and right pointers.

72 Red-Black Trees (Cont.) One may verify that a red-black tree satisfies the following properties: (P1) It is a binary search tree. (P2) Every root-to-external-node path has the same number of black links. (P3) No root-to-external-node path has two or more consecutive red pointers. An alternate definition of a red-black tree is the following: (Q1) It is a binary search tree. (Q2) The rank of each external node is 0. (Q3) Every internal node that is the parent of an external node has rank 1. (Q4) For every node x that has a parent p(x), rank(x) ≤ rank(p(x)) ≤ rank(x) + 1. (Q5) For every node x that has a grandparent gp(x), rank(x) < rank(gp(x)).

73 Red-Black Trees (Cont.) Each node x of a 2-3-4 tree T is represented by a collection of nodes in its corresponding red-black tree. All nodes in this collection have a rank equal to height(T) – level(x) +1. Each time there is a rank change in a path from the root of the red-black tree, there is a level change in the corresponding 2-3-4 tree. Black pointers go from a node of a certain rank to one whose rank is one less. Red pointers connect two nodes of the same rank.

74 Lemma 10.1 Lemma 10.1: Every red-black tree RB with n (internal) nodes satisfies the following: (1) (2) (3)

75 Searching a Red-Black Tree Since a red-black tree is a binary search tree, the search operation can be done by following the same search algorithm used in a binary search tree.

76 Red-Black Tree Insertion An insertion into a red-black tree can be done in two ways: top-down or bottom-up. In a top-down insertion, a single root-to-leaf pass is made over the red-black tree. A bottom-up insertion makes both a root-to-leaf and a leaf-to-root pass.

77 Top-Down Insertion We can detect a 4-node simply by looking for nodes q for which both color data members are red. Such a node, together with its two children, forms a 4-node. When such a 4-node q is detected, the following transformations are needed: (1) Change both colors of q to black. (2) If q is the left (right) child of its parent, then change the left (right) color of its parent to red. (3) If we now have two consecutive red pointers, then one goes from the grandparent, gp, of q to the parent, p, of q, and the other from p to q. Let the direction of the first of these be X and that of the second be Y. Depending on whether XY = LL, LR, RL, or RR, transformations are performed to remove the violation.

78 Transformation for a Root 4-Node The 4-node at the root (y with x and z joined by red pointers) is recolored: both red pointers become black, and the structure of the tree is unchanged.

79 Transformation for a 4-Node That is the Child of a 2-Node Only colors change: the two red pointers inside the 4-node become black, and the pointer from the 2-node parent to the 4-node becomes red. The two panels show the 4-node as the left child and as the right child of its parent.

80 Transformation for a 4-Node That is the Left Child of a 3-Node (a) LL rotation. (b) color change.

81 Transformation for a 4-Node That is the Left Middle Child of a 3-Node (a) LR rotation. (b) RL rotation.

82 Bottom-Up Insertion In bottom-up insertion, the element to be inserted is added as the appropriate child of the node last encountered, and a red pointer is used to join the new node to its parent. However, this may violate the red-black tree definition, since there may now be two consecutive red pointers on the path. To resolve this problem, we classify the violation and perform a color change or a rotation. Let s be the sibling of node q. The violation is classified as an XYZ violation, where X = L if the upper of the two consecutive red pointers is a left pointer, and X = R otherwise; Y = L if the lower of the two is a left pointer, and Y = R otherwise; and Z = r if s ≠ 0 and the pointer to s is a red pointer, and Z = b otherwise.

83 Bottom-Up Insertion (Cont.) The color changes potentially propagate the violation up the tree and may need to be reapplied several times. Note that a color change does not affect the number of black pointers on any root-to-external-node path.

84 LLr and LRr Color Changes for Bottom-Up Insertion (a) LLr color change. (b) LRr color change.

85 LLb and LRb Rotations for Bottom-Up Insertion (a) LLb rotation. (b) LRb rotation.

86 Comparison of Top-Down and Bottom-Up In the top-down insertion method, O(log n) rotations may be performed, whereas at most one rotation is performed in the bottom-up method. Both methods may perform O(log n) color changes. However, the top-down method can be used in pipelined mode to perform several insertions in sequence; the bottom-up method cannot be used in this way.

87 Deletion from a Red-Black Tree If the node to be deleted is the root, then the result is an empty red-black tree. If the leaf node to be deleted has a red pointer to its parent, then it can be deleted directly, because it is part of a 3-node or a 4-node. If the leaf node to be deleted has a black pointer, then the leaf is a 2-node; deletion from a 2-node requires a backward restructuring pass, which is not desirable. To avoid deleting a 2-node, the insertion transformations are used in the reverse direction to ensure that the search for the element to be deleted moves down a red pointer. Since most of the insertion and deletion transformations can be accomplished by color changes and require no pointer changes or data shifts, these operations take less time on red-black trees than when a 2-3-4 tree is represented using nodes of type Two34Node.

88 Joining and Splitting Red-Black Trees For binary search trees we defined the operations ThreeWayJoin, TwoWayJoin, and Split. These operations can also be performed on red-black trees in logarithmic time.

89 Large Search Tree That Does Not Fit in Memory The aforementioned balanced search trees (AVL trees, 2-3 trees, 2-3-4 trees) work well only when the table fits in internal memory. If the table is larger than the internal memory, then a search may require O(h) disk accesses, where h is the height of the tree. Since a disk access takes a significant amount of time compared to an internal memory access, it is desirable to develop a structure that minimizes the number of disk accesses.

90 M-Way Search Tree Definition: An m-way search tree either is empty or satisfies the following properties: (1) The root has at most m subtrees and has the following structure: n, A 0, (K 1, A 1 ), (K 2, A 2 ), …, (K n, A n ), where the A i, 0 ≤ i ≤ n < m, are pointers to subtrees, and the K i, 1 ≤ i ≤ n < m, are key values. (2) K i < K i+1, 1 ≤ i < n. (3) All key values in the subtree A i are less than K i+1 and greater than K i, 1 ≤ i < n. (4) All key values in the subtree A n are greater than K n, and those in A 0 are less than K 1. (5) The subtrees A i, 0 ≤ i ≤ n, are also m-way search trees.

91 Searching an m-Way Search Tree Suppose we wish to search an m-way search tree T for the key value x, and assume T resides on a disk. By searching the keys of the root, we determine i such that K i ≤ x < K i+1 (taking K 0 = -∞ and K n+1 = +∞). –If x = K i, the search is complete. –If x ≠ K i, then x must be in the subtree A i if it is in T at all. –We then retrieve the root of the subtree A i from disk and continue the search until we find x or determine that x is not in T.
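A sketch of this search loop on a simple in-memory node layout (M, the field names, and the use of int keys are illustrative; in the disk-based setting each iteration of the outer loop corresponds to one disk access):

    #include <cstddef>

    // Sketch of m-way search: keys K[1..n] in increasing order, subtree
    // pointers A[0..n], with A[i] holding keys strictly between K[i] and K[i+1].
    const int M = 4;                            // order of the tree (illustrative)

    struct MWayNode {
        int n;                                  // number of keys in this node (n < M)
        int K[M];                               // K[1..n] used; K[0] unused
        MWayNode* A[M];                         // A[0..n] used
    };

    // Returns a pointer to the node containing x, or nullptr if x is absent.
    MWayNode* mSearch(MWayNode* root, int x) {
        for (MWayNode* p = root; p != nullptr; ) {
            int i = p->n;
            while (i >= 1 && x < p->K[i]) --i;  // largest i with K[i] <= x (i = 0 if none)
            if (i >= 1 && x == p->K[i]) return p;
            p = p->A[i];                        // descend into subtree A[i]
        }
        return nullptr;
    }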

92 Searching an m-Way Search Tree The maximum number of nodes in a tree of degree m and height h is (m^h - 1)/(m - 1). Therefore, the maximum number of keys an m-way search tree of height h can hold is m^h - 1. To achieve performance close to that of the best m-way search trees for a given number of keys n, the search tree must be balanced.

93 B-Tree Definition: A B-tree of order m is an m-way search tree that either is empty or satisfies the following properties: (1) The root node has at least two children. (2) All nodes other than the root node and failure nodes have at least ⌈m/2⌉ children. (3) All failure nodes are at the same level.

94 B-Tree (Cont.) Note that a 2-3 tree is a B-tree of order 3 and a 2-3-4 tree is a B-tree of order 4. Also, all B-trees of order 2 are full binary trees. A B-tree of order m and height l has at most m^l - 1 keys, and the minimum number of keys N in such a tree is 2⌈m/2⌉^(l-1) - 1. If there are N key values in a B-tree of order m, then all nonfailure nodes are at levels less than or equal to l, where l ≤ log ⌈m/2⌉ ((N+1)/2) + 1. The maximum number of disk accesses that have to be made for a search is l. For example, for a B-tree of order m = 200, an index with N ≤ 2×10^6 - 2 will have l ≤ 3.

95 The Choice of m B-trees of high order are desirable, since they result in a reduction in the number of disk accesses. If the index has N entries, then a B-tree of order m = N+1 has only one level; but this is not reasonable, since all N entries cannot fit in internal memory. In selecting a reasonable choice for m, we need to keep in mind that we are really interested in minimizing the total amount of time needed to search the B-tree for a value x. This time has two components: (1) the time to read the node from disk, and (2) the time needed to search this node for x.

96 The Choice of m (Cont.) Assume a node of a B-tree of order m has a fixed size and is large enough to accommodate n, A 0, and m-1 triples (K i, A i, B i ), 1 ≤ i < m. If the K i are at most α characters long and the A i and B i are each β characters long, then the size of a node is about m(α + 2β). The time to access a node is then t s + t l + m(α + 2β)t c = a + bm, where a = t s + t l = seek time + latency time, b = (α + 2β)t c, and t c = transmission time per character. If binary search is used to search each node of the B-tree, then the internal processing time per node is c log 2 m + d for some constants c and d. The total processing time per node is τ = a + bm + c log 2 m + d. Since the number of levels is proportional to log ⌈m/2⌉ N, the maximum search time is approximately f (a + bm + c log 2 m + d)/log 2 m, where f is some constant.

97 Figure 10.36: Values of (35+0.06m)/log 2 m
m       Search time (sec)
2       35.12
4       17.62
8       11.83
16      8.99
32      7.38
64      6.47
128     6.10
256     6.30
512     7.30
1024    9.64
2048    14.35
4096    23.40
8192    40.50
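The table can be reproduced directly from the formula, as in the following sketch (the constants a = 35 and b = 0.06 are taken from the figure title):

    #include <cstdio>
    #include <cmath>

    // Reproduce Figure 10.36: estimated search time (35 + 0.06*m) / log2(m)
    // for powers of two from 2 to 8192.
    int main() {
        for (int m = 2; m <= 8192; m *= 2) {
            double t = (35.0 + 0.06 * m) / std::log2(m);
            std::printf("m = %5d   search time = %6.2f\n", m, t);
        }
        // The minimum lies in a broad, flat region around m = 64..256,
        // consistent with the plot in Figure 10.37.
    }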

98 Figure 10.37: Plot of (35+0.06m)/log 2 m The total maximum search time stays roughly between 5.7 and 6.8 for m in the range of about 50 to 400.

99 Insertion into a B-Tree Instead of using the 2-3-4 tree's top-down insertion, we generalize the two-pass insertion algorithm for 2-3 trees, because top-down insertion splits many nodes, and each time we change a node it has to be written back to disk, which increases the number of disk accesses. The insertion algorithm for B-trees of order m first performs a search to determine the leaf node p into which the new key is to be inserted. –If the insertion of the new key into p results in p having m keys, the node p is split. –Otherwise, the new p is written to disk, and the insertion is complete. Assuming the h nodes read in during the top-down pass are saved in memory, so that they need not be retrieved from disk during the bottom-up pass, the number of disk accesses for an insertion is at most h (downward pass) + 2(h-1) (nonroot splits) + 3 (root split) = 3h+1. The average number of disk accesses is, however, approximately h+1 for large m.

100 Figure 10.38: B-Trees of Order 3 (a) p = 1, s = 0. (b) p = 3, s = 1. (c) p = 4, s = 2. Here p is the number of nonfailure nodes in the final B-tree with N entries, and s is the number of splits.

101 Deletion from a B-Tree The deletion algorithm for B-trees is also a generalization of the deletion algorithm for 2-3 trees. First, we search for the key x to be deleted. –If x is found in a node z that is not a leaf, then the position occupied by x in z is filled by a key from a leaf node of the B-tree. –Suppose that x is the ith key in z (x = K i ). Then x may be replaced by either the smallest key in the subtree A i or the largest key in the subtree A i-1. Since both of these keys are in leaf nodes, the deletion of x from a nonleaf node is transformed into a deletion from a leaf.

102 Deletion from a B-Tree (Cont.) There are four possible cases when deleting from a leaf node p. –In the first case, p is also the root. If the root is left with at least one key, the changed root is written back to disk; otherwise, the B-tree is empty following the deletion. –In the second case, following the deletion, p has at least ⌈m/2⌉ - 1 keys. The modified leaf is written back to disk. –In the third case, p has ⌈m/2⌉ - 2 keys, and its nearest sibling, q, has at least ⌈m/2⌉ keys. (We check only one of p's nearest siblings.) p is deficient, as it has one less than the minimum number of keys required, while q has more keys than the minimum required. As in the case of a 2-3 tree, a rotation is performed: the number of keys in q decreases by one, and the number in p increases by one. –In the fourth case, p has ⌈m/2⌉ - 2 keys and q has ⌈m/2⌉ - 1 keys. p is deficient and q has the minimum number of keys permitted for a nonroot node. Nodes p and q and the key K i that separates them in their parent are combined to form a single node.

103 Figure 10.39 B-Tree of Order 5 A B-tree of order 5 whose root contains the keys 20 and 35 and whose three leaves contain 10, 15; 25, 30; and 40, 45, 50 (the leading number in each node is the key count n).

104 Splay Trees If we are only interested in amortized complexity rather than worst-case complexity, simpler structures can be used for search trees. Using splay trees, we can achieve O(log n) amortized time per operation. A splay tree is a binary search tree in which each search, insert, delete, and join operation is performed in the same way as in an ordinary binary search tree, except that each of these operations is followed by a splay. Before a split, a splay is performed; this makes the split very easy to perform. A splay consists of a sequence of rotations.

105 Starting Node of Splay Operation The start node for a splay is obtained as follows: (1) search: The splay starts at the node containing the element being sought. (2) insert: The start node for the splay is the newly inserted node. (3) delete: The parent of the physically deleted node is used as the start node for the splay. If this node is the root, then no splay is done. (4) ThreeWayJoin: No splay is done. (5) split: Suppose that we are splitting with respect to the key i and that key i is actually present in the tree. We first perform a splay at the node that contains i and then split the tree.

106 Splay Operation Splay rotations are performed along the path from the start node to the root of the binary search tree. They are similar to the rotations performed for AVL trees and red-black trees. Let q be the node at which the splay is being performed. The following steps define a splay: (1) If q either is 0 or is the root, then the splay terminates. (2) If q has a parent p but no grandparent, then the rotation of Figure 10.42 is performed, and the splay terminates. (3) If q has a parent p and a grandparent gp, then the rotation is classified as LL (p is the left child of gp, and q is the left child of p), LR (p is the left child of gp, and q is the right child of p), RR, or RL. The appropriate rotation is performed, and the splay is repeated at the new location of q.
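A compact sketch of this bottom-up splay (node layout and helper names are illustrative, not the book's code):

    // Sketch of the splay described above: rotate q up until it is the root.
    // LL/RR rotates the parent first and then q; LR/RL rotates q twice;
    // a node whose parent is the root gets a single rotation.
    struct SplayNode {
        int key;
        SplayNode *left = nullptr, *right = nullptr, *parent = nullptr;
    };

    // Rotate q above its parent, preserving the binary-search-tree ordering.
    void rotateUp(SplayNode*& root, SplayNode* q) {
        SplayNode* p = q->parent;
        SplayNode* gp = p->parent;
        if (p->left == q) {                    // right rotation
            p->left = q->right;
            if (q->right) q->right->parent = p;
            q->right = p;
        } else {                               // left rotation
            p->right = q->left;
            if (q->left) q->left->parent = p;
            q->left = p;
        }
        p->parent = q;
        q->parent = gp;
        if (!gp) root = q;
        else if (gp->left == p) gp->left = q;
        else gp->right = q;
    }

    void splay(SplayNode*& root, SplayNode* q) {
        if (!q) return;
        while (q->parent) {
            SplayNode* p = q->parent;
            SplayNode* gp = p->parent;
            if (!gp) rotateUp(root, q);                       // single rotation (Fig. 10.42)
            else if ((gp->left == p) == (p->left == q)) {     // LL or RR
                rotateUp(root, p);
                rotateUp(root, q);
            } else {                                          // LR or RL
                rotateUp(root, q);
                rotateUp(root, q);
            }
        }
    }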

107 Splay Amortized Cost Note that all rotations move q up the tree and that, following a splay, q becomes the new root of the search tree. As a result, splitting the tree with respect to a key i is done simply by performing a splay at i and then splitting at the root. The analysis for splay trees uses a potential technique. Let P 0 be the initial potential of the search tree, and let P i be its potential following the ith operation in a sequence of m operations. The amortized time for the ith operation is defined as (actual time for the ith operation) + P i – P i-1. So the actual time for the ith operation is (amortized time for the ith operation) + P i-1 – P i. Hence, the actual time needed to perform the m operations in the sequence is Σ 1≤i≤m (amortized time for the ith operation) + P 0 – P m.

108 Figure 10.42: Rotation when q is the Right Child and Has no Grandparent q is rotated above its parent p; a, b, and c are subtrees.

109 Figure 10.43 RR and RL Rotations (a) Type RR. (b) Type RL.

110 Figure 10.44 Rotations In A Splay Beginning At Shaded Node (a) Initial search tree. (b) After RR rotation.

111 Figure 10.44: Rotations In A Splay Beginning At Shaded Node (Cont.) (c) After LL rotation. (d) After LR rotation.

112 Figure 10.44: Rotations In A Splay Beginning At Shaded Node (Cont.) (e) After RL rotation.

113 Upper Bound of Splay's Amortized Cost Let the size s(i) of the subtree with root i be the total number of nodes in it. The rank r(i) of node i is equal to ⌊log 2 s(i)⌋. The potential of the tree is Σ i r(i). Lemma 10.2: Consider a binary search tree that has n elements/nodes. The amortized cost of a splay operation that begins at node q is at most 3(⌊log 2 n⌋ - r(q)) + 1.

114 Splay Tree Complexity Theorem 10.1: The total time needed for a sequence of m search, insert, delete, join, and split operations performed on a collection of initially empty splay trees is O( m log n ), where n, n > 0, is the number of inserts in the sequence.

115 Digital Search Trees A digital search tree is a binary tree in which each node contains one element. The element-to-node assignment is determined by the binary representation of the element keys. Suppose we number the bits in the binary representation of a key from left to right beginning at one; then bit one of 1000 is 1. All keys in the left subtree of a node at level i have bit i equal to 0, whereas those in the right subtree of nodes at this level have bit i equal to 1.

116 Figure 10.45 Digital Search Trees (a) Initial tree containing the keys 1000, 0010, 0001, 0000, 1001, 1100 (nodes a through f). (b) After 0011 is inserted (node g).

117 Digital Search Trees (Cont.) The digital search tree functions to search and insert are quite similar to the corresponding functions for binary search trees. During an insert or search, the subtree to move to is determined by a bit in the search key rather than by the result of comparing the search key with the key in the current node. Deleting an item in a leaf node is easy: simply remove the node. To delete the key in a non-leaf node, the deleted item is replaced by a value from any leaf in its subtree, and that leaf is removed. Each of these operations can be performed in O(h) time, where h is the height of the digital search tree. If each key in a digital search tree has KeySize bits, then the height of the digital search tree is at most KeySize + 1.
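A sketch of digital-search-tree search and insertion as just described, for the 4-bit keys used in Figure 10.45 (KEY_SIZE, the node layout, and the function names are illustrative):

    // Sketch: bit i of the key (numbered from the most significant bit,
    // starting at 1) decides whether to go left (0) or right (1) at level i.
    const int KEY_SIZE = 4;                       // 4-bit keys, as in Figure 10.45

    struct DigitalNode {
        unsigned key;
        DigitalNode *left = nullptr, *right = nullptr;
    };

    // Bit i (1 = leftmost) of a KEY_SIZE-bit key.
    int bitAt(unsigned key, int i) { return (key >> (KEY_SIZE - i)) & 1u; }

    DigitalNode* dstSearch(DigitalNode* root, unsigned key) {
        int i = 1;
        for (DigitalNode* p = root; p; ++i) {
            if (p->key == key) return p;          // full key comparison at each node
            p = bitAt(key, i) ? p->right : p->left;
        }
        return nullptr;                           // not present
    }

    void dstInsert(DigitalNode*& root, unsigned key) {
        DigitalNode** link = &root;
        for (int i = 1; *link; ++i) {
            if ((*link)->key == key) return;      // already present; keys are distinct
            link = bitAt(key, i) ? &(*link)->right : &(*link)->left;
        }
        DigitalNode* node = new DigitalNode;      // attach where the search fell off
        node->key = key;
        *link = node;
    }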

118 Binary Tries When we are dealing with very long keys, the cost of a key comparison is high. The number of comparisons can be reduced to one by using a related structure called Patricia (practical algorithm to retrieve information coded in alphanumeric). We develop the structure in three steps: –First, introduce a structure called a binary trie. –Then, transform binary tries into compressed binary tries. –Finally, from compressed binary tries obtain Patricia. A binary trie is a binary tree that has two kinds of nodes: branch nodes and element nodes. –A branch node has two data members, LeftChild and RightChild, but no data data member. –An element node has a single data member, data. Branch nodes are used to build a binary tree search structure similar to that of a digital search tree.

119 Figure 10.46: Example of A Binary Trie A binary trie containing the elements 0000, 0001, 0010, 1000, 1001, 1100.

120 Compressed Binary Trie Observe that a successful search in a binary trie always ends at an element node; once this element node is reached, a key comparison is performed. Observe from Figure 10.46 that there are some degree-one branch nodes in the tree. We can add a data member BitNumber to eliminate all degree-one branch nodes from the trie; BitNumber gives the bit number of the key that is to be used for branching at this node. A binary trie that has been modified to contain no branch nodes of degree one is called a compressed binary trie.

121 Figure 10.47: Binary Trie of Figure 10.46 With Degree-One Nodes Eliminated The compressed trie holds 0000, 0001, 0010, 1000, 1001, 1100; its branch nodes carry the BitNumbers 1, 2, 3, 4, 4.

122 Patricia Compressed binary tries may be represented using nodes of a single type. The new nodes, called augmented branch nodes, are the original branch nodes augmented by the data member data. The resulting structure is called Patricia and is obtained from a compressed binary trie in the following way: (1) Replace each branch node by an augmented branch node. (2) Eliminate the element nodes. (3) Store the data previously in the element nodes in the data data members of the augmented branch nodes. Since every nonempty compressed binary trie has one less branch node than it has element nodes, it is necessary to add one augmented branch node, called the head node; the remaining structure is the left subtree of the head node, and the head node has BitNumber equal to 0. The assignment of data to the augmented branch nodes is done in such a way that the BitNumber in the augmented branch node is less than or equal to that in the parent of the element node that contained this data. (4) Replace the original pointers to element nodes by pointers to the respective augmented branch nodes.

123 Figure 10.48: An Example of Patricia A Patricia containing 0000, 0001, 0010, 1000, 1001, 1100; the head node has BitNumber 0, and the remaining nodes have BitNumbers 1, 2, 3, 4, 4.

124 Figure 10.49: Insertion Into Patricia (a) 1000 inserted. (b) 0010 inserted. (c) 1001 inserted. (d) 1100 inserted. (e) 0000 inserted. (f) 0001 inserted.

125 Analysis of Patricia Insertion The complexity of Patricia insertion is O(h) where h is the height of the Patricia. h can be as large as min{KeySize+1, n}, where KeySize is the number of bits in a key and n is the number of elements. When the keys are uniformly distributed, the height is O(log n).

126 Tries A trie is an index structure that is particularly useful when key values are of varying size. It is a generalization of the binary trie. A trie is a tree of degree m ≥ 2 in which the branching at any level is determined not by the entire key value but by a portion of it. A trie contains two types of nodes: element nodes and branch nodes. –An element node has only a data data member. –A branch node contains pointers to subtrees. If each character is one of the 26 letters of the alphabet, a branch node has 27 pointer data members; the extra data member is used for the blank character (denoted b), which is used to terminate all keys.
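A sketch of such a branch node and of the search loop it supports (the 27-way layout follows the description above; field and function names are illustrative):

    #include <cstring>

    // Sketch of a trie over lowercase letters plus a blank terminator.
    const int ALPHABET = 27;                      // 'a'..'z' plus the blank

    struct TrieNode {
        bool isElement;                           // true: element node, false: branch node
        const char* data;                         // key stored here (element nodes only)
        TrieNode* child[ALPHABET];                // branch nodes only
    };

    // Map a character to its branch index; the blank terminator gets slot 0.
    int slot(char c) { return c == ' ' ? 0 : c - 'a' + 1; }

    // Follow one character of the key per level until an element node is
    // reached, then confirm with a single full-key comparison.
    const char* trieSearch(TrieNode* root, const char* key) {
        TrieNode* p = root;
        int i = 0;
        while (p && !p->isElement) {
            char c = key[i] ? key[i++] : ' ';     // branch on blank once the key is exhausted
            p = p->child[slot(c)];
        }
        if (p && std::strcmp(p->data, key) == 0) return p->data;
        return nullptr;
    }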

127 Figure 10.50: Trie Created Using Characters Of Key Value From Left To Right, One At A Time The trie contains bluebird, bunting, cardinal, chickadee, godwit, goshawk, gull, oriole, wren, thrasher, and thrush.

128 Figure 10.51: Trie Showing Need For A Terminal Character The keys to and together share the prefix "to", so the blank terminator is needed to distinguish them.

129 Sampling Strategies Given a set of key values to be represented in an index, the number of levels in the trie will depend on the strategy or key-sampling technique used to determine the branching at each level. The trie structure we just discussed uses sample(x, i) = the ith character of x. We could choose different sampling functions, which would result in different trie structures. Ideally, for a fixed set of keys, we would like to find the trie structure with the fewest levels; in reality, this is very difficult. If we consider the dynamic situation, with keys being added and deleted, it is even more difficult. In that case we wish to optimize average performance, and without knowledge of future key values the best sampling function is probably a randomized one.

130 Sampling Strategies (Cont.) The sampling strategy is not limited to one character at a time; multiple characters can be used in one sample. In some cases we want to limit the number of levels. We can achieve this by allowing nodes to hold more than one key value: if the maximum number of levels allowed is l, then all key values that are synonyms up to level l-1 are entered into the same element node.

131 Figure 10.52: Trie Constructed For Data Of Figure 10.50 Sampling One Character At A Time, From Right To Left The same eleven keys as in Figure 10.50, now branched on their characters from right to left.

132 Figure 10.53: An Optimal Trie For The Data Of Figure 10.50 Sampling On The First Level Done By The Fourth Character Of The Key Values Sampling the fourth character separates all eleven keys at the first level.

133 Figure 10.54: Trie Obtained For Data Of Figure 10.50 When Number Of Levels Is Limited To 3; Keys Have Been Sampled From Left To Right, One Character At A Time Keys that still collide after two characters (godwit and goshawk; thrasher and thrush) share an element node.

134 Figure 10.55: Section of Trie of Figure 10.50 Showing Changes Resulting from Inserting Bobwhite and Bluejay The figure shows the branch nodes (labeled δ1, δ2, δ3, σ, ρ) needed to distinguish bluebird, bluejay, bobwhite, and bunting after the insertions.

