Static Dictionaries
Collection of items. Each item is a (key, element) pair. Pairs have different keys.
Operations are: initialize/create, get (search).

Hashing
Perfect hashing (no collisions). Minimal perfect hashing (space = n).
CHD (compress, hash, and displace) algorithm: expected O(n) time to construct the perfect or minimal perfect hash function, O(1) search time.
Belazzougui, Botelho & Dietzfelbinger. Hash, displace, and compress. 17th European Symposium on Algorithms, 2009.

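To make the idea of a minimal perfect hash function concrete, here is a minimal Python sketch of a simplified hash-and-displace scheme. It is not the CHD algorithm itself: it omits the compression step, and the helper names (`_h`, `build`, `get`), the bucket count, and the use of BLAKE2 as a seeded hash are all assumptions made for illustration.

```python
import hashlib

def _h(key, seed, size):
    """Seeded hash of a string key into range(size) (illustrative helper)."""
    digest = hashlib.blake2b(key.encode(), digest_size=8,
                             salt=seed.to_bytes(8, "little")).digest()
    return int.from_bytes(digest, "little") % size

def build(items, max_tries=10_000):
    """Build a minimal perfect hash table for a non-empty dict with string keys."""
    n = len(items)
    num_buckets = max(1, n // 2)
    buckets = [[] for _ in range(num_buckets)]
    for key in items:
        buckets[_h(key, 0, num_buckets)].append(key)
    table = [None] * n                 # space = n, one slot per item
    displacement = [0] * num_buckets   # 0 marks an empty bucket
    # Place the largest buckets first, while most slots are still free.
    for b in sorted(range(num_buckets), key=lambda b: -len(buckets[b])):
        if not buckets[b]:
            continue
        for d in range(1, max_tries + 1):
            slots = [_h(key, d, n) for key in buckets[b]]
            if len(set(slots)) == len(slots) and all(table[s] is None for s in slots):
                for key, s in zip(buckets[b], slots):
                    table[s] = (key, items[key])
                displacement[b] = d
                break
        else:
            raise RuntimeError("no displacement found; increase max_tries")
    return displacement, table

def get(displacement, table, key):
    """Return the element stored for key, or None if key is absent."""
    d = displacement[_h(key, 0, len(displacement))]
    if d == 0:
        return None
    stored_key, element = table[_h(key, d, len(table))]
    return element if stored_key == key else None

items = {"ant": 1, "bee": 2, "cat": 3, "dog": 4, "eel": 5, "fox": 6}
displacement, table = build(items)
print(get(displacement, table, "cat"))   # 3
print(get(displacement, table, "gnu"))   # None
```

A search evaluates two hash functions and makes one key comparison, which is where the O(1) search time comes from. The table has exactly n slots, so the function is minimal.
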
Search Tree
Hashing is not efficient for extended operations such as range search and nearest match.
Will examine a binary search tree structure for static dictionaries.
Each item/key/element has an estimated access frequency (or probability).
If indexed search is to be supported, then, since the dictionary is static, could keep a sorted array as well as the binary search tree, or add an index to each node.
Don't need leftSize: there are no inserts/deletes, so the index of an item does not change.

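As a small illustration of the extended operations on ordered keys, the sketch below runs range search, nearest match, and indexed search on a sorted Python list using the standard `bisect` module. The function names and the sample keys are assumptions chosen for this example, and `nearest_match` assumes a non-empty list of numeric keys.

```python
from bisect import bisect_left, bisect_right

def range_search(keys, lo, hi):
    """Return all keys k with lo <= k <= hi from a sorted list."""
    return keys[bisect_left(keys, lo):bisect_right(keys, hi)]

def nearest_match(keys, x):
    """Return the key closest to x (ties go to the smaller key)."""
    i = bisect_left(keys, x)
    candidates = keys[max(i - 1, 0):i + 1]
    return min(candidates, key=lambda k: abs(k - x))

keys = [3, 8, 15, 21, 34]
print(range_search(keys, 5, 21))   # [8, 15, 21]
print(nearest_match(keys, 17))     # 15
print(keys[2])                     # indexed search: third smallest key, 15
```

Because the dictionary is static, the array never needs to be reorganized, so position i always holds the (i + 1)-th smallest key.
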
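Example
Key  Probability
a    0.8
b    0.1
c    0.1
a < b < c
[Figure: two binary search trees on a, b, c. In the first, a is the root, b is the right child of a, and c is the right child of b. In the second, b is the root with children a and c.]
Cost of a search equals number of key comparisons.
First tree: Cost = 0.8 * 1 + 0.1 * 2 + 0.1 * 3 = 1.3
Second tree: Cost = 0.8 * 2 + 0.1 * 1 + 0.1 * 2 = 1.9
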
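The two costs can be checked with a few lines of Python. This is only a sketch: the function name `search_cost` is an assumption, the probability of c is taken as 0.1 (as the cost expressions imply), and each tree is described simply by the level of each key.

```python
def search_cost(levels, prob):
    """Expected number of key comparisons for a successful search."""
    return sum(prob[k] * levels[k] for k in prob)

prob = {"a": 0.8, "b": 0.1, "c": 0.1}
chain = {"a": 1, "b": 2, "c": 3}       # a at the root, then b, then c
balanced = {"b": 1, "a": 2, "c": 2}    # b at the root, a and c below it

print(round(search_cost(chain, prob), 2))     # 1.3
print(round(search_cost(balanced, prob), 2))  # 1.9
```

The skewed tree wins here because the most frequently accessed key sits at the root.
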
Search Types
Successful. Search for a key that is in the dictionary. Terminates at an internal node.
Unsuccessful. Search for a key that is not in the dictionary. Terminates at an external/failure node.
[Figure: search tree on keys b and c with external nodes f1, f2, f3.]

Internal And External Nodes
[Figure: extended binary search tree with internal nodes a, b, c and external nodes f0, f1, f2, f3.]
A binary tree with n internal nodes has n + 1 external nodes.
Let $s_1, s_2, \dots, s_n$ be the internal nodes, in inorder. key($s_1$) < key($s_2$) < … < key($s_n$).
Let key($s_0$) = $-\infty$ and key($s_{n+1}$) = $\infty$.
Let $f_0, f_1, \dots, f_n$ be the external nodes, in inorder.
$f_i$ is reached iff key($s_i$) < search key < key($s_{i+1}$).
An inorder traversal of the extended tree visits $f_0, s_1, f_1, s_2, f_2, s_3, f_3, \dots$

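The rule for which external node an unsuccessful search reaches depends only on the sorted key order, not on the shape of the tree. A minimal sketch, assuming the keys are held in a sorted Python list (the function name `failure_index` is hypothetical):

```python
from bisect import bisect_left

def failure_index(keys, x):
    """Index i of the external node f_i reached by an unsuccessful search for x."""
    i = bisect_left(keys, x)           # number of keys smaller than x
    if i < len(keys) and keys[i] == x:
        raise ValueError("x is in the dictionary; the search is successful")
    return i

keys = ["a", "b", "c"]                 # key(s_1) < key(s_2) < key(s_3)
print(failure_index(keys, "A"))        # 0: smaller than every key
print(failure_index(keys, "bb"))       # 2: between key(s_2) and key(s_3)
print(failure_index(keys, "d"))        # 3: larger than every key
```
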
Cost Of Binary Search Tree
Let $p_i$ = probability for key($s_i$).
Let $q_i$ = probability for key($s_i$) < search key < key($s_{i+1}$).
Sum of ps and qs = 1.
Cost of tree = $\sum_{0 \le i \le n} q_i \, (\mathrm{level}(f_i) - 1) + \sum_{1 \le i \le n} p_i \cdot \mathrm{level}(s_i)$
Cost = weighted path length.
Note that Huffman trees minimize the weighted external path length. Now we want to minimize the sum of weighted external path length and weighted internal path length.

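The cost formula can be evaluated directly on an explicit tree. The sketch below assumes a tree is written as nested tuples `(i, left, right)` for the internal node $s_i$, with `None` for an external node. The representation, the function name `tree_cost`, and the sample weights are assumptions for illustration; the weights are integer frequencies rather than probabilities, which only scales the cost.

```python
def tree_cost(tree, p, q):
    """Weighted path length of an extended binary search tree.

    tree: None for an external node, or (i, left, right) for internal node s_i.
    p[i]: weight of key(s_i) for 1 <= i <= n (p[0] is unused padding).
    q[i]: weight of external node f_i for 0 <= i <= n.
    """
    next_external = 0   # an inorder walk meets f_0, f_1, ... in this order
    total = 0

    def visit(node, level):
        nonlocal next_external, total
        if node is None:
            total += q[next_external] * (level - 1)
            next_external += 1
            return
        i, left, right = node
        visit(left, level + 1)
        total += p[i] * level
        visit(right, level + 1)

    visit(tree, 1)
    return total

p = [0, 3, 3, 1, 1]     # hypothetical frequencies for a_1 .. a_4
q = [2, 3, 1, 1, 1]     # hypothetical frequencies for f_0 .. f_4
tree = (2, (1, None, None), (3, None, (4, None, None)))
print(tree_cost(tree, p, q))   # 32
```
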
Brute Force Algorithm
Generate all binary search trees with n internal nodes.
Compute the weighted path length of each.
Determine tree with minimum weighted path length.
Number of trees to examine is $O(4^n / n^{1.5})$.
Brute force approach is impractical for large n.

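The number of binary search trees on n keys is the n-th Catalan number, which is where the $4^n / n^{1.5}$ growth comes from. A short sketch of how quickly the count grows (the function name `num_bsts` is an assumption; `math.comb` needs Python 3.8 or later):

```python
from math import comb

def num_bsts(n):
    """Number of distinct binary search trees with n internal nodes (Catalan number)."""
    return comb(2 * n, n) // (n + 1)

for n in (5, 10, 20):
    print(n, num_bsts(n))
# 5 42
# 10 16796
# 20 6564120420
```
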
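Dynamic Programming
Keys are $a_1 < a_2 < \dots < a_n$.
Let $T_{ij}$ = least cost tree for $a_{i+1}, a_{i+2}, \dots, a_j$.
$T_{0n}$ = least cost tree for $a_1, a_2, \dots, a_n$.
$T_{ij}$ includes $p_{i+1}, p_{i+2}, \dots, p_j$ and $q_i, q_{i+1}, \dots, q_j$.
[Figure: example $T_{2,7}$ with internal nodes $a_3, \dots, a_7$ and external nodes $f_2, \dots, f_7$.]
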
Terminology
$T_{ij}$ = least cost tree for $a_{i+1}, a_{i+2}, \dots, a_j$.
$c_{ij}$ = cost of $T_{ij}$ = $\sum_{i \le u \le j} q_u \, (\mathrm{level}(f_u) - 1) + \sum_{i < u \le j} p_u \cdot \mathrm{level}(s_u)$.
$r_{ij}$ = root of $T_{ij}$.
$w_{ij}$ = weight of $T_{ij}$ = sum of ps and qs in $T_{ij}$ = $p_{i+1} + p_{i+2} + \dots + p_j + q_i + q_{i+1} + \dots + q_j$.

i = j
$T_{ij}$ includes $p_{i+1}, p_{i+2}, \dots, p_j$ and $q_i, q_{i+1}, \dots, q_j$.
$T_{ii}$ includes $q_i$ only; it is the single external node $f_i$.
$c_{ii}$ = cost of $T_{ii}$ = 0.
$r_{ii}$ = root of $T_{ii}$ = 0.
$w_{ii}$ = weight of $T_{ii}$ = sum of ps and qs in $T_{ii}$ = $q_i$.

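The weights for larger trees follow from this base case. Extending $T_{i,j-1}$ by the key $a_j$ adds exactly $p_j$ and $q_j$ to the sum, so the definition of the weight can be restated as a recurrence:

$$
w_{ii} = q_i, \qquad w_{ij} = w_{i,j-1} + p_j + q_j \quad (i < j).
$$

With this form each weight is obtained in O(1) time from one that was computed earlier.
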
i < j
$T_{ij}$ = least cost tree for $a_{i+1}, a_{i+2}, \dots, a_j$.
$T_{ij}$ includes $p_{i+1}, p_{i+2}, \dots, p_j$ and $q_i, q_{i+1}, \dots, q_j$.
Let $a_k$, $i < k \le j$, be in the root of $T_{ij}$.
[Figure: $T_{ij}$ with root $a_k$, left subtree L, and right subtree R.]
L includes $p_{i+1}, p_{i+2}, \dots, p_{k-1}$ and $q_i, q_{i+1}, \dots, q_{k-1}$.
R includes $p_{k+1}, p_{k+2}, \dots, p_j$ and $q_k, q_{k+1}, \dots, q_j$.

cost(L)
L includes $p_{i+1}, p_{i+2}, \dots, p_{k-1}$ and $q_i, q_{i+1}, \dots, q_{k-1}$.
cost(L) = weighted path length of L when viewed as a stand-alone binary search tree.

Contribution To $c_{ij}$
$c_{ij} = \sum_{i \le u \le j} q_u \, (\mathrm{level}(f_u) - 1) + \sum_{i < u \le j} p_u \cdot \mathrm{level}(s_u)$.
When L is viewed as a subtree of $T_{ij}$, the level of each node is 1 more than when L is viewed as a stand-alone tree.
Each p and q in L is therefore counted one extra time, and these extra counts sum to $w_{i,k-1}$.
So, contribution of L to $c_{ij}$ is cost(L) + $w_{i,k-1}$.

$c_{ij}$
Contribution of L to $c_{ij}$ is cost(L) + $w_{i,k-1}$.
Contribution of R to $c_{ij}$ is cost(R) + $w_{kj}$.
Contribution of the root $a_k$ (at level 1) is $p_k$.
$c_{ij}$ = cost(L) + $w_{i,k-1}$ + cost(R) + $w_{kj}$ + $p_k$ = cost(L) + cost(R) + $w_{ij}$

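The last step uses the fact that the weights of L and R, together with $p_k$, make up the weight of $T_{ij}$. Expanding the definitions term by term:

$$
\begin{aligned}
w_{i,k-1} + p_k + w_{kj}
&= \Big(q_i + \sum_{u=i+1}^{k-1} (p_u + q_u)\Big) + p_k + \Big(q_k + \sum_{u=k+1}^{j} (p_u + q_u)\Big) \\
&= q_i + \sum_{u=i+1}^{j} (p_u + q_u) = w_{ij}.
\end{aligned}
$$
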
$c_{ij}$
$c_{ij}$ = cost(L) + cost(R) + $w_{ij}$
Since $T_{ij}$ has least cost, L and R must themselves be least cost trees for their keys:
cost(L) = $c_{i,k-1}$
cost(R) = $c_{kj}$
$c_{ij} = c_{i,k-1} + c_{kj} + w_{ij}$
Don't know k.
$c_{ij} = \min_{i < k \le j} \{ c_{i,k-1} + c_{kj} \} + w_{ij}$

$c_{ij}$
$c_{ij} = \min_{i < k \le j} \{ c_{i,k-1} + c_{kj} \} + w_{ij}$
$r_{ij}$ = k that minimizes right side.

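A small worked instance may help. Assume the hypothetical weights $p_1 = 3$, $p_2 = 3$ and $q_0 = 2$, $q_1 = 3$, $q_2 = 1$ (integer frequencies chosen for illustration, not taken from the slides). The one-key trees give $c_{01} = w_{01} = 8$ and $c_{12} = w_{12} = 7$, and $w_{02} = 2 + 3 + 3 + 3 + 1 = 12$. Then

$$
c_{02} = \min\{\, c_{00} + c_{12},\; c_{01} + c_{22} \,\} + w_{02}
       = \min\{\, 0 + 7,\; 8 + 0 \,\} + 12 = 19, \qquad r_{02} = 1.
$$

The first term corresponds to k = 1 and the second to k = 2, so $a_1$ is the root of $T_{02}$.
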
Computation Of $c_{0n}$ And $r_{0n}$
Start with $c_{ii} = 0$, $r_{ii} = 0$, $w_{ii} = q_i$, $0 \le i \le n$ (zero-key trees).
Use $c_{ij} = \min_{i < k \le j} \{ c_{i,k-1} + c_{kj} \} + w_{ij}$ to compute $c_{i,i+1}$, $r_{i,i+1}$, $0 \le i \le n - 1$ (one-key trees).
Now use the equation to compute $c_{i,i+2}$, $r_{i,i+2}$, $0 \le i \le n - 2$ (two-key trees).
Now use the equation to compute $c_{i,i+3}$, $r_{i,i+3}$, $0 \le i \le n - 3$ (three-key trees).
Continue until $c_{0n}$ and $r_{0n}$ (n-key tree) have been computed.

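The computation by increasing tree size can be sketched in Python as follows. The function name `optimal_bst`, the table layout, and the sample weights are assumptions for illustration. The weights are integer frequencies rather than probabilities, which the recurrence handles unchanged.

```python
def optimal_bst(p, q):
    """Return the tables (c, r, w) for keys a_1 .. a_n.

    p[1..n]: weights of successful searches (p[0] is unused padding).
    q[0..n]: weights of unsuccessful searches.
    """
    n = len(q) - 1
    c = [[0] * (n + 1) for _ in range(n + 1)]
    r = [[0] * (n + 1) for _ in range(n + 1)]
    w = [[0] * (n + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        w[i][i] = q[i]                      # zero-key trees: c = 0, r = 0, w = q_i
    for size in range(1, n + 1):            # number of keys in the tree
        for i in range(n - size + 1):
            j = i + size
            w[i][j] = w[i][j - 1] + p[j] + q[j]
            # Try every root a_k with i < k <= j; min keeps the smallest k on ties.
            best_k = min(range(i + 1, j + 1),
                         key=lambda k: c[i][k - 1] + c[k][j])
            c[i][j] = c[i][best_k - 1] + c[best_k][j] + w[i][j]
            r[i][j] = best_k
    return c, r, w

p = [0, 3, 3, 1, 1]     # hypothetical frequencies for a_1 .. a_4
q = [2, 3, 1, 1, 1]     # hypothetical frequencies for f_0 .. f_4
c, r, w = optimal_bst(p, q)
print(c[0][4], r[0][4], w[0][4])   # 32 2 16
```

Only entries with i <= j are used; the rest of each table stays at its initial value of 0.
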
Computation Of $c_{0n}$ And $r_{0n}$
[Figure: upper-triangular table of $c_{ij}$, $r_{ij}$ for $i \le j$, annotated 1, 2, 3, 4.]

Complexity
$c_{ij} = \min_{i < k \le j} \{ c_{i,k-1} + c_{kj} \} + w_{ij}$
O(n) time to compute one $c_{ij}$.
$O(n^2)$ $c_{ij}$s to compute.
Total time is $O(n^3)$.
May be reduced to $O(n^2)$ by using, for $j > i + 1$,
$c_{ij} = \min_{r_{i,j-1} \le k \le r_{i+1,j}} \{ c_{i,k-1} + c_{kj} \} + w_{ij}$

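A sketch of the faster variant, which restricts the candidate roots to the range between the roots of the two neighbouring subproblems (a result due to Knuth). The function name `optimal_bst_fast` and the sample weights are assumptions for illustration. One-key trees are handled separately because their only possible root is k = j.

```python
def optimal_bst_fast(p, q):
    """Tables (c, r, w) for keys a_1 .. a_n, using the restricted root range."""
    n = len(q) - 1
    c = [[0] * (n + 1) for _ in range(n + 1)]
    r = [[0] * (n + 1) for _ in range(n + 1)]
    w = [[0] * (n + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        w[i][i] = q[i]
    for size in range(1, n + 1):
        for i in range(n - size + 1):
            j = i + size
            w[i][j] = w[i][j - 1] + p[j] + q[j]
            if size == 1:
                lo = hi = j                       # a one-key tree has root a_j
            else:
                lo, hi = r[i][j - 1], r[i + 1][j]  # restricted range of roots
            best_k = min(range(lo, hi + 1),
                         key=lambda k: c[i][k - 1] + c[k][j])
            c[i][j] = c[i][best_k - 1] + c[best_k][j] + w[i][j]
            r[i][j] = best_k
    return c, r, w

p = [0, 3, 3, 1, 1]     # hypothetical frequencies for a_1 .. a_4
q = [2, 3, 1, 1, 1]     # hypothetical frequencies for f_0 .. f_4
c, r, w = optimal_bst_fast(p, q)
print(c[0][4], r[0][4])   # 32 2
```

For a fixed tree size, the candidate ranges of consecutive subproblems overlap only at their endpoints, so the work per size is O(n) and the total is $O(n^2)$.
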
Construct $T_{0n}$
Root is $r_{0n}$.
Suppose that $r_{0n} = 10$. Then the left subtree is $T_{0,9}$ and the right subtree is $T_{10,n}$.
Construct $T_{0,9}$ and $T_{10,n}$ recursively.
Time is O(n).

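The recursive construction can be sketched as below, assuming the root table `r` has already been computed. The function name `build_tree`, the nested-tuple representation, and the sample table (for four keys with hypothetical weights) are assumptions for illustration.

```python
def build_tree(r, i, j):
    """Return T_ij as nested tuples (k, left, right); None is an external node."""
    if i == j:
        return None                      # T_ii is a single external node
    k = r[i][j]                          # a_k is the root of T_ij
    return (k, build_tree(r, i, k - 1), build_tree(r, k, j))

# Hypothetical root table for n = 4; only entries with i < j are meaningful.
r = [
    [0, 1, 1, 2, 2],
    [0, 0, 2, 2, 2],
    [0, 0, 0, 3, 3],
    [0, 0, 0, 0, 4],
    [0, 0, 0, 0, 0],
]
print(build_tree(r, 0, 4))
# (2, (1, None, None), (3, None, (4, None, None)))
```

Each call creates one internal or external node, which gives the O(n) bound. The recursion depth equals the height of the tree, so a very skewed tree on many keys may need an iterative version in Python.
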