Binary Search Trees. Cormen (Chapter 12, 3rd edition); Estruturas de Dados e seus Algoritmos (Chapter 4)


Dictionary Data Structures
Goal:
- Design a data structure to store a small set of keys $S = \{k_1, k_2, \dots, k_n\}$ drawn from a large universe $U$.
- It should efficiently support:
  - Query(x): determine whether a key x is in S
  - Insert(x): add x to S if it is not there
  - Delete(x): remove x from S if it is there
Additional goals:
- Low memory consumption
- Efficient construction

Dictionary Data Structures: Linked Lists
- Query(x): O(n) time
- Insert(x): insert at the beginning of the list, O(1) time (if x must first be checked for absence, that search costs O(n))
- Delete(x): find x and then remove it, O(n) time
- Construction time: O(n)
- Space consumption: O(n)

Binary Search Trees
A binary search tree (BST) T for a list $K = (k_1 < \dots < k_n)$ of n keys is a rooted tree that satisfies the following properties:
- It has n internal nodes and n+1 leaves.
- Each internal node of T is associated with a distinct key.
- [Search property] If node v is associated with key $k_i$, then:
  - all nodes in the left subtree of v are associated with keys smaller than $k_i$;
  - all nodes in the right subtree of v are associated with keys larger than $k_i$.

How does a BST work? Search property: if a node holds key x, every key y in its left subtree satisfies y < x, and every key z in its right subtree satisfies x < z.

Binary Search Trees
[Figure: example BST with root $k_2$; $k_1$ is its left child and $k_4$ its right child; $k_3$ and $k_5$ are the children of $k_4$.]

Binary Search Trees
Keys are elements of a totally ordered set U:
- U can be the set of integers.
- U can be the set of students of a university (ordered, for example, by student ID).

Binary Search Trees: additional properties
- The key with the minimum value is stored in the leftmost node of the tree.
- The key with the maximum value is stored in the rightmost node of the tree.
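Both properties translate directly into loops that follow a single pointer direction. A minimal Python sketch, assuming a hypothetical `Node` class with `key`, `left` and `right` fields and a non-empty tree:

```python
class Node:
    def __init__(self, key, left=None, right=None):
        self.key, self.left, self.right = key, left, right


def tree_min(root):
    """Smallest key: keep following left pointers. Assumes a non-empty tree."""
    while root.left is not None:
        root = root.left
    return root.key


def tree_max(root):
    """Largest key: keep following right pointers. Assumes a non-empty tree."""
    while root.right is not None:
        root = root.right
    return root.key


# Example tree with 1..5 standing in for k1 < ... < k5 (k2 at the root).
example = Node(2, Node(1), Node(4, Node(3), Node(5)))
```

Each loop visits one node per level, so both run in O(height of T).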

Binary Search Trees: basic operations
- Query(x): determine whether x belongs to T
- Insert(x): if x is not in T, insert x into T
- Delete(x): if x is in T, remove x from T

BST: Query(x)
Algorithm Query(x, T)
    If T is a leaf (empty) then
        Return "element was not found"
    End If
    If x = root(T).key then
        Return "element was found"
    Else if x < root(T).key then
        Query(x, left subtree of T)
    Else
        Query(x, right subtree of T)
    End If
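The query procedure can be sketched in Python as follows. This is a minimal illustration, not the course's own code: `Node` and `query` are hypothetical names, and a missing child (`None`) plays the role of the slides' external leaves.

```python
from typing import Optional


class Node:
    """BST node; a missing child (None) plays the role of an external leaf."""

    def __init__(self, key, left=None, right=None):
        self.key = key
        self.left = left
        self.right = right


def query(root: Optional[Node], x) -> bool:
    """Return True if key x is stored in the BST rooted at root."""
    if root is None:          # reached a leaf: x is not in T
        return False
    if x == root.key:         # x is at the root of this subtree
        return True
    if x < root.key:          # search property: smaller keys lie to the left
        return query(root.left, x)
    return query(root.right, x)


# The example tree from the slides, with 1..5 standing in for k1 < ... < k5:
# k2 at the root, k1 and k4 as its children, k3 and k5 below k4.
example = Node(2, Node(1), Node(4, Node(3), Node(5)))
```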

Binary Search Trees: Query(x)
[Figure: a query traced on the example BST with root $k_2$.]

Inserting a new key
Add the new element to the tree at the correct position, so that the search property is preserved.
Algorithm Insert(x, T)
    If root(T) is a leaf then
        Associate the leaf with x
        Return
    End If
    If x = root(T).key then
        Return "x is already in T"
    End If
    If x < root(T).key then
        Insert(x, left subtree of T)
    Else
        Insert(x, right subtree of T)
    End If

Inserting a new key: example with Insert(50), Insert(20), Insert(39), Insert(8), Insert(79), Insert(26)
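A minimal recursive sketch of Insert(x, T) in Python, replaying the example insertions. The names `Node`, `insert` and `inorder` are hypothetical; `None` stands for a leaf, and a key that is already present is simply left alone.

```python
class Node:
    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None


def insert(root, x):
    """Insert x into the BST rooted at root and return the (possibly new) root.

    A key that is already present is left alone ("x is already in T")."""
    if root is None:                  # a leaf: associate it with x
        return Node(x)
    if x < root.key:
        root.left = insert(root.left, x)
    elif x > root.key:
        root.right = insert(root.right, x)
    return root


def inorder(root):
    """Keys of the tree in increasing order (in-order traversal)."""
    if root is None:
        return []
    return inorder(root.left) + [root.key] + inorder(root.right)


root = None
for key in [50, 20, 39, 8, 79, 26]:
    root = insert(root, key)
```

With these six keys, 50 ends up at the root with children 20 and 79; 8 and 39 are the children of 20, and 26 is the left child of 39.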

Removing a node from a BST
Situations:
- Removing a leaf
- Removing an internal node with a unique child
- Removing an internal node with two children

Removing a leaf: the node is simply detached from its parent.

Removing an internal node with a unique child: the parent's pointer must be corrected to "jump over" the node, so that the node's only child (the parent's grandchild) becomes the parent's right (or left) child.

Removing an internal node with two children: find the element that precedes the one to be removed in the key order (the rightmost element of the left subtree) and swap the information of the two nodes. Then remove the node that was found; since it has no right child, this falls into one of the two previous cases.
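The three removal cases fit in one recursive function. A minimal sketch, assuming the same kind of hypothetical `Node` class as before; like the slides, it replaces a node with two children by its predecessor (the rightmost node of the left subtree) and then deletes that predecessor from the left subtree.

```python
class Node:
    def __init__(self, key, left=None, right=None):
        self.key, self.left, self.right = key, left, right


def delete(root, x):
    """Remove key x from the BST rooted at root (if present); return the new root."""
    if root is None:                       # x is not in the tree
        return None
    if x < root.key:
        root.left = delete(root.left, x)
    elif x > root.key:
        root.right = delete(root.right, x)
    else:
        # Leaf or unique child: returning the other child makes the
        # parent's pointer "jump over" this node.
        if root.left is None:
            return root.right
        if root.right is None:
            return root.left
        # Two children: copy the predecessor's key here, then delete the
        # predecessor (it has no right child) from the left subtree.
        pred = root.left
        while pred.right is not None:
            pred = pred.right
        root.key = pred.key
        root.left = delete(root.left, pred.key)
    return root


def inorder(root):
    return [] if root is None else inorder(root.left) + [root.key] + inorder(root.right)


# Usage on the tree built by Insert(50), Insert(20), Insert(39), Insert(8), Insert(79), Insert(26)
tree = Node(50, Node(20, Node(8), Node(39, Node(26))), Node(79))
tree = delete(tree, 20)   # two children: 20 is replaced by its predecessor 8
```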

Binary Search Trees: Operations Complexity
Basic operations, each taking O(height of T) operations:
- Query(x): determine whether x belongs to T
- Insert(x): if x is not in T, insert x into T
- Delete(x): if x is in T, remove x from T
- Max(T) and Min(T)
Shallow trees are therefore desirable.


Example: 50, 20, 39, 8, 79, 26, 58, 15, 88, 4, 85, 96, 71, 42,

Binary Search Trees: Construction
Simple approach: let $k_1, \dots, k_n$ be the keys, not necessarily sorted, and perform Insert($k_1$), Insert($k_2$), ..., Insert($k_n$).
- If the keys arrive in sorted order, the tree degenerates into a path of height $\Theta(n)$.
- For a uniformly random permutation of the first n integers, the expected height is $O(\log n)$ (Cormen, Section 12.4).
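The effect of insertion order is easy to observe experimentally. This sketch (hypothetical helper names; heights counted in nodes; iterative code so that the degenerate tree does not hit Python's recursion limit) builds one BST from sorted keys and one from a shuffled copy. The random height depends on the seed, so only the sorted case has a fixed value.

```python
import random


class Node:
    def __init__(self, key):
        self.key, self.left, self.right = key, None, None


def insert(root, x):
    """Iterative BST insertion (no recursion, so degenerate trees are fine)."""
    new = Node(x)
    if root is None:
        return new
    cur = root
    while True:
        if x < cur.key:
            if cur.left is None:
                cur.left = new
                return root
            cur = cur.left
        elif x > cur.key:
            if cur.right is None:
                cur.right = new
                return root
            cur = cur.right
        else:                          # x is already in the tree
            return root


def height(root):
    """Number of nodes on the longest root-to-leaf path (iterative DFS)."""
    best = 0
    stack = [(root, 1)] if root is not None else []
    while stack:
        node, depth = stack.pop()
        best = max(best, depth)
        for child in (node.left, node.right):
            if child is not None:
                stack.append((child, depth + 1))
    return best


def build_by_insertion(keys):
    root = None
    for k in keys:
        root = insert(root, k)
    return root


n = 1000
sorted_height = height(build_by_insertion(range(n)))   # keys in increasing order
shuffled = list(range(n))
random.Random(0).shuffle(shuffled)
random_height = height(build_by_insertion(shuffled))
```

The sorted run produces a path of 1000 nodes, while the shuffled run stays within a small multiple of $\log_2 n$.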

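Binary Search Trees: Construction
Balanced approach: let $k_1, \dots, k_n$ be the keys, not necessarily sorted. Proceed as follows:
    Sort the keys
    BST(i, j):                     (builds a tree for $k_i, \dots, k_j$)
        If i > j then return an empty tree
        m ← ⌊(i + j)/2⌋
        root(T) ← $k_m$            (the median key)
        left(root) ← BST(i, m - 1)
        right(root) ← BST(m + 1, j)
    End
Call BST(1, n).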
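The median construction can be sketched in Python as below (hypothetical names `build_balanced` and `height`; heights counted in nodes). Because each half receives at most half of the remaining keys, the result has the minimum possible height $\lceil \log_2(n+1) \rceil$.

```python
class Node:
    def __init__(self, key, left=None, right=None):
        self.key, self.left, self.right = key, left, right


def build_balanced(keys):
    """Build a BST of minimum height: sort the keys, then make the median the root."""
    ks = sorted(keys)

    def build(lo, hi):                # subtree for ks[lo..hi], inclusive
        if lo > hi:
            return None
        mid = (lo + hi) // 2          # median position
        return Node(ks[mid], build(lo, mid - 1), build(mid + 1, hi))

    return build(0, len(ks) - 1)


def height(root):
    """Number of nodes on the longest root-to-leaf path."""
    return 0 if root is None else 1 + max(height(root.left), height(root.right))
```

For example, `build_balanced([5, 1, 3])` puts 3 at the root with 1 and 5 as its children.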

Relation between #nodes and height of a binary tree
Counting height in levels, level i (with the root at level 1) holds at most $2^{i-1}$ nodes, so a binary tree of height h has at most $1 + 2 + 4 + \dots + 2^{h-1} = 2^h - 1$ nodes.
Equivalently, the height of every binary search tree with n nodes is at least $\log_2(n+1)$.

The tree may become unbalanced
[Figure: removing node 8, followed by another removal, leaves the example BST unbalanced.]
After a sequence of insertions and removals, the binary tree may even degenerate, for example into a list.

Balanced Trees. Cormen (Chapter 13, 3rd edition); Estruturas de Dados e seus Algoritmos (Chapter 5)

AVL Trees (Adelson-Velskii and Landis, 1962)
BSTs that keep the tree reasonably balanced at all times.
Key idea: if an insertion or deletion takes the tree out of balance, fix it immediately.
All operations (insert, delete, ...) can be done on an AVL tree with N nodes in O(log N) worst-case time.

AVL TREES AVL Tree Property: It is a BST in which the heights of the left and right subtrees of the root differ by at most 1 and in which the right and left subtrees are also AVL trees Height: length of the longest path from the root to a leaf.
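The recursive definition can be checked with a single post-order traversal that returns each subtree's height, or -1 as soon as some node violates the property. A minimal sketch with hypothetical names; it assumes the input already satisfies the BST search property and only checks balance. Heights here count nodes (empty tree = 0), which does not affect the differences being compared.

```python
class Node:
    def __init__(self, key, left=None, right=None):
        self.key, self.left, self.right = key, left, right


def avl_height(root):
    """Height of root if its subtree is an AVL tree, otherwise -1."""
    if root is None:
        return 0
    hl = avl_height(root.left)
    hr = avl_height(root.right)
    if hl < 0 or hr < 0 or abs(hl - hr) > 1:   # a subtree failed, or this node does
        return -1
    return 1 + max(hl, hr)


def is_avl(root):
    return avl_height(root) >= 0
```

The recursion matters: a root whose two subtrees have equal heights is still not AVL if some deeper node is unbalanced.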

AVL TREES: Example: An example of an AVL tree where the heights are shown next to the nodes:

AVL TREES Other Examples:


Relation between #nodes and height of an AVL tree
Let r be the root of an AVL tree T of height h, and let $N_h$ denote the minimum number of nodes in an AVL tree of height h.
[Figure: root r with left subtree $T_e$ and right subtree $T_d$; one subtree has height h-1 and the other has height h-1 or h-2.]
Hence $N_h \ge 1 + N_{h-1} + N_{h-2}$.
This grows faster than the Fibonacci sequence: $N_h \ge \varphi^{h-2}$, where $\varphi = (1+\sqrt{5})/2 \approx 1.618$.
Consequently, the height of an AVL tree is at most about $1.44 \log_2 N$ (N is the number of nodes).


Relation between #nodes and height of an AVL tree
Here heights count levels, so the base cases are $N_1 = 1$ and $N_2 = 2$. Since $N_{h-1} \ge N_{h-2}$,
$N_h \ge 1 + N_{h-1} + N_{h-2} > 2N_{h-2} > 2^2 N_{h-4} > 2^3 N_{h-6} > \dots > 2^i N_{h-2i}$.
For $h \ge 3$, choosing $i = \lceil h/2 \rceil - 1$ makes $h - 2i \in \{1, 2\}$, so $N_{h-2i} \ge 1$ and $N_h > 2^{\lceil h/2 \rceil - 1} \ge 2^{h/2 - 1}$; the base cases also satisfy $N_h > 2^{h/2 - 1}$.
Taking logarithms, $h < 2\log_2 N_h + 2$. Thus the height of an AVL tree with n nodes is O(log n).
We can also get to this limit through the Fibonacci numbers ($F_h = F_{h-1} + F_{h-2}$), which gives the sharper constant 1.44.
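The recurrence can be evaluated exactly, since a minimal AVL tree of height h has one subtree of height h-1 and one of height h-2, giving equality $N_h = 1 + N_{h-1} + N_{h-2}$. This sketch (hypothetical names `min_avl_nodes` and `fib`; heights count levels, with $N_0 = 0$ for the empty tree) computes $N_h$ alongside the Fibonacci numbers with $F_1 = F_2 = 1$.

```python
def min_avl_nodes(h):
    """N_h: minimum number of nodes in an AVL tree of height h (levels counted)."""
    if h <= 0:
        return 0
    prev, cur = 0, 1                      # N_0 = 0 (empty tree), N_1 = 1
    for _ in range(h - 1):
        prev, cur = cur, 1 + cur + prev   # N_h = 1 + N_{h-1} + N_{h-2}
    return cur


def fib(m):
    """Fibonacci numbers with F_1 = F_2 = 1."""
    a, b = 0, 1
    for _ in range(m):
        a, b = b, a + b
    return a
```

The values 1, 2, 4, 7, 12, 20, ... satisfy $N_h = F_{h+2} - 1$, which is the Fibonacci connection behind the 1.44 constant, and they stay above the cruder bound $2^{h/2-1}$.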

Height of an AVL Tree
The height of the tree is O(log N), where N is the number of elements in the tree.
This implies that the tree search operations Query(), Max(), and Min() take O(log N) time.

Insertion in an AVL Tree
Insertion is done as in a binary search tree (always by expanding an external node).
[Figure: example in which inserting a node leaves the AVL tree unbalanced.]

How does the AVL tree work?
- After each insertion or deletion, we examine the tree structure and check whether any node violates the AVL tree property.
- If the AVL property is violated at node x, the heights of left(x) and right(x) differ by exactly 2.
- If the property is violated, we modify the tree structure using "rotations" to restore it.

Rotations
Two types of rotations:
- Single rotations: two nodes are "rotated".
- Double rotations: three nodes are "rotated".

Localizing the problem Two principles: Imbalance will only occur on the path from the inserted/deleted node to the root (only these nodes have had their subtrees altered - local problem) Rebalancing should occur at the deepest unbalanced node (local solution too)

Single Rotation (Right): Case I
Rotate x with its left child y. Both x and y satisfy the AVL property after the rotation.

Single Rotation (Left): Case II
Rotate x with its right child y. Both x and y satisfy the AVL property after the rotation.
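Each single rotation only relinks a constant number of pointers and recomputes two heights. A minimal Python sketch, assuming hypothetical `Node`, `h`, `update`, `rotate_right` and `rotate_left` names, with each node storing its height (empty subtree = 0, single node = 1):

```python
def h(node):
    """Height of a possibly empty subtree (empty = 0, single node = 1)."""
    return node.height if node is not None else 0


class Node:
    def __init__(self, key, left=None, right=None):
        self.key, self.left, self.right = key, left, right
        self.height = 1 + max(h(left), h(right))


def update(node):
    node.height = 1 + max(h(node.left), h(node.right))


def rotate_right(x):
    """Case I: rotate x with its left child y; returns y, the new subtree root."""
    y = x.left
    x.left = y.right    # keys between y and x move from y's right to x's left
    y.right = x
    update(x)           # x is now below y, so recompute x first
    update(y)
    return y


def rotate_left(x):
    """Case II: mirror image, rotate x with its right child y."""
    y = x.right
    x.right = y.left
    y.left = x
    update(x)
    update(y)
    return y


# Keys inserted as 3, 2, 1 give a left chain; one right rotation fixes it.
chain = Node(3, Node(2, Node(1)))
fixed = rotate_right(chain)
```

A rotation preserves the in-order sequence of keys, so the search property survives; only the heights change.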

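Single Rotation - Example
The tree is an AVL tree by definition: the subtrees of the root have heights h and h+1.
After node 02 is added, the heights become h and h+2, so the tree violates the AVL definition and a rotation is performed.
[Figure: the tree has this form: nodes x and y with subtrees A, B, C; two subtrees have height h and one has height h+1.]
[Figure: the same tree after the rotation, with subtrees A, B, C reattached below x and y.]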

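Single Rotation
Sometimes a single rotation fails to solve the problem.
[Figure: $k_2$ with left child $k_1$ and subtrees X, Y, Z; when the extra height lies in the inner subtree Y, rotating $k_2$ with $k_1$ leaves a node whose subtrees still have heights h+2 and h.]
In such cases, we need to use a double rotation.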

Double Rotations: Case IV

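Double Rotations - Example
The tree is an AVL tree by definition: the subtrees of the root have heights h and h+1. Delete node 94.
After the deletion, the AVL property is violated: the subtree heights are h and h+2.
[Figure: the tree has this form: nodes x, y, z with subtrees A, B1, B2, C.]
[Figure: the tree after the double rotation, with subtrees A, B1, B2, C in order below x, y, z.]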

Insertion
We store the height of each node x to check the AVL property.
Part 1. Perform a normal BST insertion.
Part 2. Check the AVL property and restore it if necessary.
- To check whether the AVL property still holds, we only need to check the nodes on the path from the new leaf to the root of the BST, because the balance of the other nodes is not affected.
- Compute heights with the identity Height(x) = 1 + max{Height(left(x)), Height(right(x))} and check whether node x is balanced by comparing the heights of left(x) and right(x).
- We should update the heights of the visited nodes in this process.

Insertion: Part 2 Detailed
For each x on the path from the inserted leaf towards the root:
    If the heights of left(x) and right(x) differ by at most 1 then
        Do nothing
    Else (one of the subtrees of x has height h and the other h+2)
        If the height of left(x) is h+2 then
            If the height of left(left(x)) is h+1 then
                single rotate with the left child (case 1)
            Else (the height of right(left(x)) is h+1)
                double rotate with the left child (case 3)
        Else (the height of right(x) is h+2)
            If the height of right(right(x)) is h+1 then
                single rotate with the right child (case 2)
            Else (the height of left(right(x)) is h+1)
                double rotate with the right child (case 4)
        Break (after one rebalancing, the whole tree satisfies the AVL property again)
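Parts 1 and 2 can be combined in a recursive insertion that rebalances each node on the way back up. This is a minimal sketch with hypothetical names (`insert`, `rebalance`, `is_balanced`, ...), heights counted in levels. Unlike the loop in the slide, it does not break after the first rotation: it keeps recomputing heights up to the root, but once the deepest unbalanced node is fixed, the remaining `rebalance` calls only refresh heights.

```python
import random


class Node:
    def __init__(self, key):
        self.key, self.left, self.right = key, None, None
        self.height = 1            # heights count levels; an empty subtree has height 0


def height(node):
    return node.height if node is not None else 0


def update(node):
    node.height = 1 + max(height(node.left), height(node.right))


def rotate_right(x):               # single rotation with the left child
    y = x.left
    x.left, y.right = y.right, x
    update(x)
    update(y)
    return y


def rotate_left(x):                # single rotation with the right child
    y = x.right
    x.right, y.left = y.left, x
    update(x)
    update(y)
    return y


def rebalance(x):
    """Restore the AVL property at x (its subtrees differ in height by at most 2)."""
    update(x)
    if height(x.left) - height(x.right) == 2:
        if height(x.left.left) < height(x.left.right):    # case 3: double rotation
            x.left = rotate_left(x.left)
        return rotate_right(x)                             # case 1 (or end of case 3)
    if height(x.right) - height(x.left) == 2:
        if height(x.right.right) < height(x.right.left):  # case 4: double rotation
            x.right = rotate_right(x.right)
        return rotate_left(x)                              # case 2 (or end of case 4)
    return x


def insert(root, key):
    """Part 1: ordinary BST insertion. Part 2: rebalance on the way back up."""
    if root is None:
        return Node(key)
    if key < root.key:
        root.left = insert(root.left, key)
    elif key > root.key:
        root.right = insert(root.right, key)
    else:
        return root                # already present: nothing changes
    return rebalance(root)


def inorder(node):
    return [] if node is None else inorder(node.left) + [node.key] + inorder(node.right)


def is_balanced(node):
    """Check stored heights and the AVL property at every node."""
    if node is None:
        return True
    hl, hr = height(node.left), height(node.right)
    return (abs(hl - hr) <= 1 and node.height == 1 + max(hl, hr)
            and is_balanced(node.left) and is_balanced(node.right))


# Sorted input, the worst case for a plain BST:
root = None
for k in range(1, 8):
    root = insert(root, k)

# A larger shuffled run:
keys = list(range(1000))
random.Random(1).shuffle(keys)
big = None
for k in keys:
    big = insert(big, k)
```

Inserting 1..7 in order yields the perfectly balanced tree with 4 at the root, instead of the 7-node path a plain BST would build.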

Insertion: Correctness (case 2)
Let x be the deepest node that does not satisfy the AVL property, and assume that case 2 occurs (the new element is inserted into subtree C).
- x and y satisfy the property after the rotation.
- The ancestors of x satisfy the property, because the height of x before the insertion is h+2 and the height of y after the rotation is also h+2.
- The nodes on the path between the new element and y also satisfy the AVL property, by the assumption that x is the deepest node at which the AVL property does not hold.
- Nodes that are not on the path from the root to the new element are not affected.

Insertion: Correctness (case 4)
Let x be the deepest node that does not satisfy the AVL property, and assume that case 4 occurs (the new element is inserted into subtree B1).
- x, y and z satisfy the property after the rotation.
- The ancestors of x are balanced after the rotation, because the height of x is h+2 before the insertion and the height of z is h+2 after the rotation.
- The remaining nodes on the path between the new element and x also satisfy the property, by the assumption that x is the deepest node that does not satisfy the AVL property.
- Nodes that are not on the path between the new element and x are not affected.

Insertion: Complexity
Performing a rotation takes O(1) time, since only a few pointers are updated.
Finding a node that violates the AVL property takes time proportional to the height of the tree, which is O(log N).

Deletion
Perform a normal BST deletion, then perform checks similar to those used for insertion to restore the AVL property. Unlike insertion, a deletion may require rotations at several nodes on the path back to the root.
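A deletion sketch along these lines, with hypothetical names and heights counted in levels. It performs the usual BST removal (a node with two children takes its predecessor's key) and then calls `rebalance` on every node along the path back to the root. One difference from insertion shows up in `rebalance`: after a deletion, the two children of the taller subtree may have equal heights, and then a single rotation is enough. To keep the example self-contained, the starting tree is built from sorted keys with the median as root, which is perfectly balanced and therefore an AVL tree.

```python
class Node:
    def __init__(self, key, left=None, right=None):
        self.key, self.left, self.right = key, left, right
        self.height = 1 + max(height(left), height(right))


def height(node):
    return node.height if node is not None else 0


def update(node):
    node.height = 1 + max(height(node.left), height(node.right))


def rotate_right(x):
    y = x.left
    x.left, y.right = y.right, x
    update(x)
    update(y)
    return y


def rotate_left(x):
    y = x.right
    x.right, y.left = y.left, x
    update(x)
    update(y)
    return y


def rebalance(x):
    update(x)
    if height(x.left) - height(x.right) == 2:
        # After a deletion both grandchildren may have the same height;
        # a single rotation is enough in that case.
        if height(x.left.left) < height(x.left.right):
            x.left = rotate_left(x.left)
        return rotate_right(x)
    if height(x.right) - height(x.left) == 2:
        if height(x.right.right) < height(x.right.left):
            x.right = rotate_right(x.right)
        return rotate_left(x)
    return x


def delete(root, key):
    """Normal BST deletion, then rebalance every node on the path back to the root."""
    if root is None:
        return None                       # key not present
    if key < root.key:
        root.left = delete(root.left, key)
    elif key > root.key:
        root.right = delete(root.right, key)
    else:
        if root.left is None:             # leaf or unique (right) child
            return root.right
        if root.right is None:            # unique (left) child
            return root.left
        pred = root.left                  # two children: use the predecessor
        while pred.right is not None:
            pred = pred.right
        root.key = pred.key
        root.left = delete(root.left, pred.key)
    return rebalance(root)                # may rotate at several levels


def build(sorted_keys):
    """Perfectly balanced (hence AVL) starting tree: the median becomes the root."""
    if not sorted_keys:
        return None
    m = len(sorted_keys) // 2
    return Node(sorted_keys[m], build(sorted_keys[:m]), build(sorted_keys[m + 1:]))


def inorder(node):
    return [] if node is None else inorder(node.left) + [node.key] + inorder(node.right)


def is_balanced(node):
    if node is None:
        return True
    hl, hr = height(node.left), height(node.right)
    return (abs(hl - hr) <= 1 and node.height == 1 + max(hl, hr)
            and is_balanced(node.left) and is_balanced(node.right))


root = build(list(range(100)))
for k in range(0, 100, 3):                # delete every third key
    root = delete(root, k)
```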

Summary: AVL Trees
- Maintain a balanced tree.
- Modify the insertion and deletion routines, performing single or double rotations to restore the structure.
- Guarantee that the height of the tree is O(log n).
- This guarantee directly implies that find(), min(), and max() run in O(log n) time.

Other Balanced Trees
- Red-Black Trees (Cormen, Chapter 13; Jayme, Chapter 6)
- 2-3 Trees (Hopcroft)

Dictionary Problem: Non-uniform Access Probabilities
We want to maintain a data structure that supports a sequence of INSERT, QUERY, and DELETE operations.
- Some elements are accessed much more often than others: non-uniform access probabilities.

Dictionary Problem: Non-uniform Access Probabilities
Consider the following AVL tree:
[Figure: an AVL tree.]
Suppose we want to search for the following sequence of elements: 48, 48, 48, 48, 62, 62, 62, 48, 62.
In this case, is this a good structure?
[Figure: an alternative tree for the same keys.]
This structure is much better!

Dictionary Problem: Non-uniform Access Probabilities
Application: building inverted indexes
- Given a text T, we want to design an inverted index S for T, that is, a structure that maintains, for every word x of T, the list of positions where x occurs.
  Text T: ALO ALO MEU AMIGO … ALO AMIGO MEU
  Positions: ALO: 1, 4, 30; AMIGO: 12, 34; MEU: 9, 40
- We do not know the list of words beforehand, and some words may occur much more frequently than others.
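A minimal sketch of what an inverted index stores, assuming a hypothetical `inverted_index` helper. Two substitutions are worth flagging: Python's built-in `dict` (a hash table) stands in for the dictionary structure discussed in these slides, and positions are 1-based word numbers, which may differ from the unit used in the slide. The sample text is made up from the slide's words because the original text is elided.

```python
from collections import defaultdict


def inverted_index(text):
    """Map each word of text to the list of positions (1-based word numbers) where it occurs."""
    index = defaultdict(list)
    for position, word in enumerate(text.split(), start=1):
        index[word].append(position)
    return dict(index)


index = inverted_index("ALO ALO MEU AMIGO ALO AMIGO MEU")
```

In a real text, a few frequent words are looked up far more often than the rest, which is what motivates structures tuned to non-uniform access.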

Dictionary Problem: Non-uniform Access Probabilities
- Static case: the access probability distribution is known beforehand.
  - Lists
  - Optimal binary search trees
- Dynamic case: the access probability distribution is not known beforehand.
  - Self-adjusting lists
  - Self-adjusting binary search trees: splay trees

Dictionary Problem with Non-uniform Access Probabilities
Problem:
- Given a sequence $K = k_1 < k_2 < \dots < k_n$ of n sorted keys, with a search probability $p_i$ for each key $k_i$.
  - We assume that we always search for an element that belongs to K; this assumption can easily be removed.
- We want to design a data structure with minimum expected search cost.
- Actual cost = number of items examined: for key $k_i$, the number of elements accessed until $k_i$ is found (including $k_i$ itself).

Optimal Binary Search Trees. Cormen (Section 15.5, 3rd edition); Estruturas de Dados e seus Algoritmos (Chapter 4)

Dictionary Problem: Non-uniform Access Probabilities
Approach 1: linked lists
- Put the elements with the highest probabilities of being accessed at the beginning of the list.
- Keys (1, 2, 3, 4, 5); p = (0.1, 0.3, 0.2, 0.05, 0.15) (as given, these probabilities sum to 0.8 rather than 1).
- Best possible linked list: 2 → 3 → 5 → 1 → 4
  Expected cost of accessing an element = 1×0.3 + 2×0.2 + 3×0.15 + 4×0.1 + 5×0.05 = 1.8
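The best list order and its expected cost can be computed in a few lines. A minimal sketch with hypothetical helper names, using the probabilities given in the slide; keys are numbered from 1 and list positions also start at 1, so a key in position i costs i accesses.

```python
def best_list_order(probs):
    """Keys (numbered from 1) sorted by decreasing access probability."""
    return sorted(range(1, len(probs) + 1), key=lambda k: -probs[k - 1])


def expected_list_cost(order, probs):
    """Sum over the list of (position, starting at 1) * (probability of that key)."""
    return sum(pos * probs[k - 1] for pos, k in enumerate(order, start=1))


p = [0.1, 0.3, 0.2, 0.05, 0.15]
order = best_list_order(p)
cost = expected_list_cost(order, p)
```

Sorting by decreasing probability is optimal for a static list: swapping any adjacent pair that is out of order can only lower the sum.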

Dictionary Problem with Non-uniform Access Probabilities
Approach 2: binary search tree
- Given a sequence $K = k_1 < k_2 < \dots < k_n$ of n sorted keys, with a search probability $p_i$ for each key $k_i$.
- We want to build a binary search tree (BST) with minimum expected search cost.
- Actual cost = number of items examined.
- For key $k_i$, cost = $\mathrm{depth}_T(k_i) + 1$, where $\mathrm{depth}_T(k_i)$ is the depth of $k_i$ in BST T (the root is at depth 0).

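Expected Search Cost
$E[\text{search cost in } T] = \sum_{i=1}^{n} (\mathrm{depth}_T(k_i) + 1)\, p_i = 1 + \sum_{i=1}^{n} \mathrm{depth}_T(k_i)\, p_i$, since the probabilities sum to 1. (Identity (1))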

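Example
Consider 5 keys with these search probabilities: $p_1 = 0.25$, $p_2 = 0.2$, $p_3 = 0.05$, $p_4 = 0.2$, $p_5 = 0.3$.
[Figure: BST with root $k_2$; $k_1$ and $k_4$ are its children; $k_3$ and $k_5$ are the children of $k_4$.]

i | depth_T(k_i) | depth_T(k_i)·p_i
1 | 1 | 0.25
2 | 0 | 0
3 | 2 | 0.10
4 | 1 | 0.20
5 | 2 | 0.60
Total | | 1.15

Therefore, by Identity (1), E[search cost] = 1 + 1.15 = 2.15.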

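Example
$p_1 = 0.25$, $p_2 = 0.2$, $p_3 = 0.05$, $p_4 = 0.2$, $p_5 = 0.3$.
[Figure: BST with root $k_2$; $k_1$ and $k_5$ are its children; $k_4$ is the left child of $k_5$, and $k_3$ is the left child of $k_4$.]

i | depth_T(k_i) | depth_T(k_i)·p_i
1 | 1 | 0.25
2 | 0 | 0
3 | 3 | 0.15
4 | 2 | 0.40
5 | 1 | 0.30
Total | | 1.10

Therefore, E[search cost] = 1 + 1.10 = 2.10.
This tree turns out to be optimal for this set of keys.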
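Both expected costs can be checked with a few lines of Python. The depth lists are read off the two example trees (root at depth 0); `expected_cost` is a hypothetical helper that applies Identity (1) in its (depth + 1) form.

```python
def expected_cost(depths, probs):
    """E[search cost] = sum_i (depth_T(k_i) + 1) * p_i, with the root at depth 0."""
    return sum((d + 1) * p for d, p in zip(depths, probs))


p = [0.25, 0.2, 0.05, 0.2, 0.3]
# depth_T(k_i) for i = 1..5 in each example tree
first_tree = [1, 0, 2, 1, 2]    # k2 root; k1, k4 its children; k3, k5 below k4
second_tree = [1, 0, 3, 2, 1]   # k2 root; k1, k5 its children; k4 below k5; k3 below k4
```

The second tree is deeper (4 levels instead of 3) yet cheaper on average, because the heavy key $k_5$ moves up to depth 1.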

Example: observations
- An optimal BST may not have the smallest height.
- An optimal BST may not have the highest-probability key at the root.
Build it by exhaustive checking?
- Construct each n-node BST.
- For each, assign the keys and compute the expected search cost.
- But there are $\Theta(4^n / n^{3/2})$ different BSTs with n nodes.
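The number of BSTs on n keys is the Catalan number $\frac{1}{n+1}\binom{2n}{n}$, which grows like $4^n / (n^{3/2}\sqrt{\pi})$. This sketch (hypothetical function names) computes it both from the closed form, using Python's `math.comb`, and by choosing each key $k_r$ as the root, the same decomposition the optimal-substructure argument uses.

```python
from math import comb


def num_bsts(n):
    """Number of distinct BSTs on n keys: the Catalan number C(2n, n) / (n + 1)."""
    return comb(2 * n, n) // (n + 1)


def num_bsts_by_root(n):
    """Same count by trying each key k_r as the root of m keys:
    r - 1 keys go to the left subtree and m - r keys to the right."""
    count = [1] + [0] * n              # count[0] = 1: the empty tree
    for m in range(1, n + 1):
        count[m] = sum(count[r - 1] * count[m - r] for r in range(1, m + 1))
    return count[n]
```

Already for n = 20 there are more than six billion trees, which rules out exhaustive checking.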

Optimal Substructure
Any subtree of a BST contains keys in a contiguous range $k_i, \dots, k_j$ for some $1 \le i \le j \le n$.
If T is an optimal BST and T contains a subtree T' with keys $k_i, \dots, k_j$, then T' must be an optimal BST for keys $k_i, \dots, k_j$.
Proof: otherwise, we could obtain a tree better than T by replacing T' with an optimal BST for keys $k_i, \dots, k_j$.

Optimal Substructure
One of the keys in $k_i, \dots, k_j$, say $k_r$ with $i \le r \le j$, must be the root of an optimal subtree for these keys.
- The left subtree of $k_r$ contains $k_i, \dots, k_{r-1}$.
- The right subtree of $k_r$ contains $k_{r+1}, \dots, k_j$.
To find an optimal BST:
- Examine all candidate roots $k_r$, for $i \le r \le j$.
- Determine all optimal BSTs containing $k_i, \dots, k_{r-1}$ and containing $k_{r+1}, \dots, k_j$.
[Figure: root $k_r$ with a left subtree on $k_i, \dots, k_{r-1}$ and a right subtree on $k_{r+1}, \dots, k_j$.]

Recursive Solution
When the optimal subtree becomes a subtree of a node:
- The depth of every node in the optimal subtree goes up by 1.
- By Identity (1), the expected search cost increases by $w(i, j) = \sum_{l=i}^{j} p_l$.
[Figure: a subtree on keys $k_1, \dots, k_5$, before and after it is attached below a new node $k_0$.]

Recursive Solution
Let e[i, j] be the cost of the optimal BST for $k_i, \dots, k_j$. If $k_r$ is the root of an optimal BST for $k_i, \dots, k_j$:
$e[i, j] = p_r + (e[i, r-1] + w(i, r-1)) + (e[r+1, j] + w(r+1, j)) = e[i, r-1] + e[r+1, j] + w(i, j)$.
But we don't know $k_r$. Hence,
$e[i, j] = 0$ if $j = i - 1$, and $e[i, j] = \min_{i \le r \le j} \{ e[i, r-1] + e[r+1, j] + w(i, j) \}$ if $i \le j$.

Computing an Optimal Solution
For each subproblem (i, j), store:
- the expected search cost in a table e[1..n+1, 0..n]; only the entries e[i, j] with $j \ge i - 1$ are used;
- root[i, j] = root of the subtree with keys $k_i, \dots, k_j$, for $1 \le i \le j \le n$;
- w[1..n+1, 0..n] = sums of probabilities:
  - w[i, i-1] = 0 for $1 \le i \le n+1$;
  - w[i, j] = w[i, j-1] + $p_j$ for $1 \le i \le j \le n$.

Pseudo-code
OPTIMAL-BST(p, n)
    for i ← 1 to n + 1
        e[i, i-1] ← 0
        w[i, i-1] ← 0
    for len ← 1 to n                      (consider all trees with len keys)
        for i ← 1 to n - len + 1          (fix the first key)
            j ← i + len - 1               (fix the last key)
            e[i, j] ← ∞
            w[i, j] ← w[i, j-1] + p_j
            for r ← i to j                (determine the root of the optimal (sub)tree)
                t ← e[i, r-1] + e[r+1, j] + w[i, j]
                if t < e[i, j]
                    e[i, j] ← t
                    root[i, j] ← r
    return e and root
Time: O(n^3). Space: O(n^2).
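A Python transcription of the bottom-up procedure, offered as a sketch: it follows the recurrence for e[i, j] with successful searches only (no dummy-key probabilities), uses 1-based tables like the pseudocode, and adds a hypothetical `build_tree` helper that reads the tree structure back out of `root`. The usage line passes the example probabilities scaled by 100 as integers, so that ties between candidate roots compare exactly.

```python
import math


def optimal_bst(p):
    """Bottom-up O(n^3) DP for an optimal BST (successful searches only).

    p[r - 1] is the probability (or any non-negative weight) of key k_r.
    Returns (e, root): e[i][j] is the minimum expected cost for keys k_i..k_j,
    and root[i][j] is the index r of the root chosen for that range.
    """
    n = len(p)
    # Rows 1..n+1 and columns 0..n are used; e[i][i-1] = w[i][i-1] = 0 already.
    e = [[0] * (n + 1) for _ in range(n + 2)]
    w = [[0] * (n + 1) for _ in range(n + 2)]
    root = [[0] * (n + 1) for _ in range(n + 2)]
    for length in range(1, n + 1):            # trees with `length` keys
        for i in range(1, n - length + 2):    # first key
            j = i + length - 1                # last key
            w[i][j] = w[i][j - 1] + p[j - 1]
            e[i][j] = math.inf
            for r in range(i, j + 1):         # try every candidate root k_r
                t = e[i][r - 1] + e[r + 1][j] + w[i][j]
                if t < e[i][j]:
                    e[i][j] = t
                    root[i][j] = r
    return e, root


def build_tree(root, i, j):
    """Optimal subtree on k_i..k_j as nested (r, left, right) tuples."""
    if i > j:
        return None
    r = root[i][j]
    return (r, build_tree(root, i, r - 1), build_tree(root, r + 1, j))


# Slide example, with probabilities scaled by 100 so that costs are exact integers
e, root = optimal_bst([25, 20, 5, 20, 30])
tree = build_tree(root, 1, 5)
```

The result, cost 210 (that is, 2.10) with root $k_2$, right child $k_5$, then $k_4$ and $k_3$ down the left spine, matches the optimal tree of the example.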

Speeding up the Algorithm
Knuth principle: let $k_r$ be the root of an optimal BST for the set of keys $k_i, \dots, k_j$, and let $k_{i-1} < k_i$. Then:
(i) there is an optimal BST for the set of keys $k_{i-1}, k_i, \dots, k_j$ with root smaller than or equal to $k_r$;
(ii) there is an optimal BST for the set of keys $k_i, k_{i+1}, \dots, k_{j+1}$ with root larger than or equal to $k_r$.

Knuth principle: Example
$p_1 = 0.25$, $p_2 = 0.2$, $p_3 = 0.05$, $p_4 = 0.2$, $p_5 = 0.3$.
[Figure: the optimal BST for $k_1, \dots, k_5$, with root $k_2$.]
- Let $k_0$ be a key with probability $p_0$. Then there is an optimal BST for the set $(k_0, \dots, k_5)$ with root smaller than or equal to $k_2$.
- Let $k_6$ be a key with probability $p_6$. Then there is an optimal BST for the set $(k_1, \dots, k_6)$ with root larger than or equal to $k_2$.

Speeding up the Algorithm
OPTIMAL-BST-Revised(p, n)
    for i ← 1 to n + 1
        e[i, i-1] ← 0
        w[i, i-1] ← 0
    for i ← 1 to n                        (one-key trees: the key is the root)
        w[i, i] ← p_i;  e[i, i] ← p_i;  root[i, i] ← i
    for len ← 2 to n                      (consider all trees with len keys)
        for i ← 1 to n - len + 1
            j ← i + len - 1
            e[i, j] ← ∞
            w[i, j] ← w[i, j-1] + p_j
            for r ← root[i, j-1] to root[i+1, j]     (optimization: Knuth principle)
                t ← e[i, r-1] + e[r+1, j] + w[i, j]
                if t < e[i, j]
                    e[i, j] ← t
                    root[i, j] ← r
    return e and root
Time: O(n^2). Space: O(n^2).
For a fixed len, the ranges of r telescope, so all trees with len keys are handled in O(n) total time.
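The same dynamic program with Knuth's restriction on the candidate roots can be sketched in Python as follows (hypothetical name `optimal_bst_knuth`; successful searches only; 1-based tables). One-key ranges are initialized separately, because the bounds root[i][j-1] and root[i+1][j] only exist for ranges of two or more keys. With strict `<` and increasing r, each range keeps its leftmost optimal root, which is a consistent choice for Knuth's monotonicity.

```python
import math


def optimal_bst_knuth(p):
    """O(n^2) optimal-BST DP: root candidates limited to root[i][j-1]..root[i+1][j]."""
    n = len(p)
    e = [[0] * (n + 1) for _ in range(n + 2)]
    w = [[0] * (n + 1) for _ in range(n + 2)]
    root = [[0] * (n + 1) for _ in range(n + 2)]
    for i in range(1, n + 1):                 # one-key trees: the key is the root
        w[i][i] = p[i - 1]
        e[i][i] = p[i - 1]
        root[i][i] = i
    for length in range(2, n + 1):
        for i in range(1, n - length + 2):
            j = i + length - 1
            w[i][j] = w[i][j - 1] + p[j - 1]
            e[i][j] = math.inf
            # Knuth principle: only roots between root[i][j-1] and root[i+1][j]
            for r in range(root[i][j - 1], root[i + 1][j] + 1):
                t = e[i][r - 1] + e[r + 1][j] + w[i][j]
                if t < e[i][j]:
                    e[i][j] = t
                    root[i][j] = r
    return e, root


# Slide example, with probabilities scaled by 100
e, root = optimal_bst_knuth([25, 20, 5, 20, 30])
```

On the example, it returns cost 210 (that is, 2.10) with root $k_2$; with seven equally likely keys, it finds the perfectly balanced tree, of cost 1 + 2·2 + 3·4 = 17 in unit weights.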

Lower Bound on the expected search cost