Multidimensional Range Search Static collection of records. No inserts, deletes, changes. Only queries. Each record has k key fields. Multidimensional range query. Given k ranges [l_i, u_i], 1 <= i <= k, report all records in the collection such that l_i <= k_i <= u_i for 1 <= i <= k, where k_i is the record's i'th key. Note: in a priority search tree, the search rectangle has yb = 0, so a priority search tree supports only a limited 2-d range search.
Multidimensional Range Search All employees whose age is between 30 and 40 and whose salary is between $40K and $70K. All cities with an annual rainfall between 40 and 60 inches, population between 100K and 200K, average temperature >= 70F, and number of horses between 1025 and 2500.
Data Structures For Range Search Unordered sequential list. Sorted tables: k tables, where table i is sorted by the i’th key. Cells. k-d trees. Range trees. k-fold trees. k-ranges. A sequential list takes O(n) time per query. With sorted tables, use one table to get the records that satisfy the range on one field, then reject those that don’t satisfy the remaining ranges. Packet classification is often modeled as a static multidimensional search (the data are multidimensional rectangles and the query is not a range query). A multidimensional trie may be used.
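The two simplest structures above, the sequential list and the sorted tables, can be sketched as follows. This is a minimal illustration, not a reference implementation: records are assumed to be tuples of k numeric keys, and the names `scan_query`, `build_sorted_tables`, and `table_query` are hypothetical.

```python
from bisect import bisect_left, bisect_right

def in_ranges(record, ranges):
    """True if l_i <= k_i <= u_i holds for every key field i."""
    return all(lo <= record[i] <= hi for i, (lo, hi) in enumerate(ranges))

def scan_query(records, ranges):
    """Unordered sequential list: test every record, O(n) record tests."""
    return [r for r in records if in_ranges(r, ranges)]

def build_sorted_tables(records, k):
    """k tables; table i is (sorted keys of field i, records in that order)."""
    tables = []
    for i in range(k):
        rows = sorted(records, key=lambda r: r[i])
        tables.append(([r[i] for r in rows], rows))
    return tables

def table_query(tables, ranges, i=0):
    """Binary search table i for the range on field i, then reject
    candidates that fail any of the remaining ranges."""
    keys, rows = tables[i]
    lo, hi = ranges[i]
    candidates = rows[bisect_left(keys, lo):bisect_right(keys, hi)]
    return [r for r in candidates if in_ranges(r, ranges)]

# Employees as (age, salary): age in [30, 40], salary in [40K, 70K].
employees = [(35, 50000), (45, 60000), (32, 80000), (30, 40000)]
query = [(30, 40), (40000, 70000)]
print(scan_query(employees, query))
print(table_query(build_sorted_tables(employees, 2), query, i=0))
```

The sorted-table query still inspects every record that satisfies the range on the chosen field, so it helps only when that one range is selective.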
Performance Measures P(n,k). Preprocessing time to construct the search structure for n records, each with k key fields. For many applications, this time needs only to be reasonable. S(n,k). Total space needed by the search structure. Q(n,k). Time needed to answer a query.
k-d Tree Binary tree. At each node of tree, pick a key field to partition records in that subtree into two approximately equal groups. Pick field i with max spread in values. Select median key value, m. Node stores i and m. Records with ki <= m in left subtree. Records with ki > m in right subtree. Stop when partition size <= 8 or 16 (say).
2-d Example [Figure: 2-d point set with cut lines a-g, and the corresponding k-d tree with internal nodes a-g.] Blue nodes are buckets that contain the records. The stopping criterion is that a partition has at most 3 records. Nodes are labeled by cut line. In practice, they would be labeled by cut coordinate and median/cut value. So, the node labeled a would actually be labeled (x, x.value), where x is the cut coordinate, and the node labeled e would be labeled (y, y.value). The leftmost bucket has the 3 leftmost points shown in the top figure. The next bucket has 2 points/records. A range search/query in 2-d is defined by a rectangle. The example query is shown by the yellow rectangle. For the example query, two buckets are examined, and the records in these two buckets that fall within the rectangle are reported.
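The construction rule (cut on the field with maximum spread, at the median) and the bucket search can be sketched as below. This is a rough sketch under stated assumptions: records are tuples of numbers, the names `Node`, `build`, and `query` are hypothetical, and the median is found by sorting rather than by linear-time selection, so the preprocessing bound is not matched exactly.

```python
from dataclasses import dataclass
from typing import Optional

BUCKET_SIZE = 3  # the 2-d example stops at 3 records; 8 or 16 is typical

@dataclass
class Node:
    field: int = -1                 # cut coordinate i
    median: float = 0.0             # cut value m
    left: Optional["Node"] = None   # records with k_i <= m
    right: Optional["Node"] = None  # records with k_i > m
    bucket: Optional[list] = None   # records, set only in leaf buckets

def build(records, k):
    if len(records) <= BUCKET_SIZE:
        return Node(bucket=list(records))
    # Pick the field with the maximum spread in values.
    def spread(f):
        return max(r[f] for r in records) - min(r[f] for r in records)
    i = max(range(k), key=spread)
    ordered = sorted(records, key=lambda r: r[i])
    m = ordered[(len(ordered) - 1) // 2][i]
    left = [r for r in ordered if r[i] <= m]
    right = [r for r in ordered if r[i] > m]
    if not right:
        # Many ties on field i: no useful cut, so keep an oversized bucket.
        return Node(bucket=ordered)
    return Node(i, m, build(left, k), build(right, k))

def query(node, ranges, out=None):
    """Report all records with l_i <= k_i <= u_i for every field i."""
    if out is None:
        out = []
    if node.bucket is not None:
        out.extend(r for r in node.bucket
                   if all(lo <= r[i] <= hi for i, (lo, hi) in enumerate(ranges)))
        return out
    lo, hi = ranges[node.field]
    if lo <= node.median:   # query range reaches the left side of the cut
        query(node.left, ranges, out)
    if hi > node.median:    # query range reaches the right side of the cut
        query(node.right, ranges, out)
    return out
```

A subtree is skipped whenever the query range lies entirely on the other side of its parent's cut, which is why only a few buckets are examined for a small query rectangle.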
Performance P(n,k) = O(kn log n). O(kn) time to select partition keys at each level. O(n) time to find all medians and split at each level of the tree. O(log n) levels. Alternatively, sort on x and y to get 2 sorted lists (one on x, one on y), then split the lists in two as you go down.
Performance S(n,k) = O(n). Actually O(n|record|), where |record| is the size of a record. O(n) is needed for the n records, and the tree takes O(n) space.
Performance Q(n,k) depends on the shape of the query. O(n^{1-1/k} + s) for k > 1, where s is the number of records that satisfy the query: bound on worst-case query time. O(log n + s): average time when the query is almost cubical and a small fraction of the n records satisfy the query. O(s): average time when the query is almost cubical and a large fraction of the n records satisfy the query.
Range Trees—k=1 Sorted array on single key. 10 12 15 20 24 26 27 29 35 40 50 55 P(n,1) = O(n log n). S(n,1) = O(n). Q(n,1) = O(log n + s).
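For k = 1 the query is two binary searches on the sorted array followed by a slice. A minimal sketch using Python's standard `bisect` module (the function name `range_query_1d` is hypothetical):

```python
from bisect import bisect_left, bisect_right

def range_query_1d(sorted_keys, lo, hi):
    """Return all keys with lo <= key <= hi in O(log n + s) time."""
    first = bisect_left(sorted_keys, lo)    # first position with key >= lo
    last = bisect_right(sorted_keys, hi)    # first position with key > hi
    return sorted_keys[first:last]

keys = [10, 12, 15, 20, 24, 26, 27, 29, 35, 40, 50, 55]
print(range_query_1d(keys, 15, 29))  # -> [15, 20, 24, 26, 27, 29]
```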
Range Trees—k=2 Let the two key fields be x and y. Binary search tree on x. x value used in a node is the median x value for all records in that subtree. Records with x value <= median are in left subtree. Records with larger x value in right subtree.
Range Trees—k=2 Each node has a sorted array on y of all records in the subtree. Root has sorted array of all n records. Left and right subtrees, each have a sorted array of about n/2 records. Stop partitioning when # records in a partition is small enough (say 8).
Example [Figure: binary search tree on x with nodes a-g; SA marks the sorted array attached to a node.] a-g are x values. The x-range of a node begins at the min x value in its subtree and ends at the max x value in its subtree. In practice, it is sufficient to compute a super-range of a node’s x-range from that of its parent. So, if the x (super) range of a node is [L,R] and the partitioning value is m, the super range for the left child is [L,m] and that for the right child is (m,R]. Alternatively, one may store the exact range in each node during construction.
Example—Search If the x-range of the root is contained in the x-range of the query, search the root's SA for records that satisfy the y-range of the query. Done. [Figure: tree with the query x-range and the root x-range marked.]
Example—Search If the entire x-range of the query is <= (>) the x value in the root, recursively search the left (right) subtree. [Figure: tree with the query x-range and the root x-value marked.]
Example—Search If the x-range of the query contains the x value in the root, recursively search the left and right subtrees. [Figure: tree with the query x-range and the root x-value marked.]
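The three search cases for k = 2 can be put together in one short sketch. It is an illustration under assumptions rather than the definitive structure: records are (x, y) tuples, each node stores its exact x-range (the alternative to super-ranges), medians are found by sorting for brevity, and the names `RangeNode` and `search` are hypothetical.

```python
from bisect import bisect_left, bisect_right

LEAF_SIZE = 8  # stop partitioning when a partition is this small

class RangeNode:
    def __init__(self, by_y):
        """by_y: non-empty list of (x, y) records already sorted by y."""
        self.sa = by_y                      # the node's sorted array (SA)
        self.ys = [r[1] for r in by_y]      # y keys for binary search
        xs = sorted(r[0] for r in by_y)
        self.xmin, self.xmax = xs[0], xs[-1]
        self.left = self.right = None
        if len(by_y) > LEAF_SIZE and self.xmin < self.xmax:
            m = xs[(len(xs) - 1) // 2]      # median x value
            if m == self.xmax:              # ties: keep both sides non-empty
                m = xs[bisect_left(xs, m) - 1]
            self.m = m
            # Filtering keeps the y order, so each child's SA is obtained
            # from the parent's SA in linear time.
            self.left = RangeNode([r for r in by_y if r[0] <= m])
            self.right = RangeNode([r for r in by_y if r[0] > m])

def search(node, x1, x2, y1, y2, out):
    if x2 < node.xmin or x1 > node.xmax:
        return out                          # x-ranges are disjoint
    if x1 <= node.xmin and node.xmax <= x2:
        # Node x-range is contained in the query x-range: search the SA on y.
        out.extend(node.sa[bisect_left(node.ys, y1):bisect_right(node.ys, y2)])
        return out
    if node.left is None:                   # small leaf partition: scan it
        out.extend(r for r in node.sa
                   if x1 <= r[0] <= x2 and y1 <= r[1] <= y2)
        return out
    if x1 <= node.m:                        # query reaches the left subtree
        search(node.left, x1, x2, y1, y2, out)
    if x2 > node.m:                         # query reaches the right subtree
        search(node.right, x1, x2, y1, y2, out)
    return out

points = [(3, 9), (7, 2), (1, 5), (8, 8), (4, 4), (6, 1), (2, 7), (9, 3),
          (5, 6), (10, 10)]
root = RangeNode(sorted(points, key=lambda r: r[1]))
print(search(root, 2, 7, 3, 8, []))
```

Storing exact x-ranges costs two extra values per node but lets the containment test fire as early as possible.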
Performance P(n,2) = O(n log n). O(n log n) time to sort all records by y for the SAs. O(n) time to find all medians at each level of the tree.
Performance P(n,2) = O(n log n). O(n) time to construct the SAs at each level of the tree from the SAs at the preceding level. O(log n) levels.
Performance S(n,2) = O(n log n). O(n) space needed for the SAs and nodes at each level. O(log n) levels.
Performance Q(n,2) = O(log^2 n + s). Suppose that the query x-range contains the x values a through g and that the x-ranges of nodes a, b, and c are not entirely contained in the query x-range. No SAs at levels 1 and 2 are searched. The x-ranges of e and f must be entirely contained in the x-range of the query, so the SAs for e and f are searched. Now suppose that d has the children h and i and that the x-range of the query overlaps h. The x-range of i must be entirely in the x-range of the query, and i’s SA is searched. Similarly, if g has children j and k and the x-range of the query overlaps k, the SA of j is searched. The subtrees h and k are searched recursively. At most two SAs are searched at each level: all SAs searched at a level belong to contiguous nodes, so if 3 or more were searched, at least two would be siblings; the x-range of the parent of these two siblings would then be contained in the query range, and the parent's SA, not its children's, would have been searched. There are O(log n) levels, and each SA search is a binary search on y that takes O(log n) time plus the time to report its matching records, which gives O(log^2 n + s) in total.
Range Trees—k=3 Let the three key fields be w, x and y. Binary search tree on w. w value used in a node is the median w value for all records in that subtree. Records with w value <= median in left subtree. Records with larger w value in right subtree.
Range Trees—k=3 Each node has a 2-d range tree on x and y of all records in the subtree. Stop partitioning when # records in a partition is small enough (say 8).
Example [Figure: binary search tree on w with nodes a-g; each node holds a 2-d range tree.] a-g are w values. The w-range of a node begins at the min w value in its subtree and ends at the max w value in its subtree.
Example—Search If the w-range of the root is contained in the w-range of the query, search the 2-d range tree in the root for records that satisfy the x- and y-ranges of the query. Done. If the entire w-range of the query is <= (>) the w value in the root, recursively search the left (right) subtree.
Example—Search If the w-range of the query contains the w value in the root, recursively search the left and right subtrees.
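The nesting used for k = 3 (a binary search tree on the first key whose nodes each hold a range tree on the remaining keys) extends to any k. The sketch below is one possible way to write that recursion, assuming records are non-empty tuples of k numbers. The names `SortedArray`, `RangeTree`, `make`, and `range_query` are hypothetical, and every node re-sorts its records, so the construction is simpler but slower than the preprocessing bounds quoted for range trees.

```python
from bisect import bisect_left, bisect_right

class SortedArray:
    """Base case: the last key field is kept as a sorted array."""
    def __init__(self, records, dim):
        self.dim = dim
        self.recs = sorted(records, key=lambda r: r[dim])
        self.keys = [r[dim] for r in self.recs]

    def query(self, ranges, out):
        lo, hi = ranges[self.dim]
        out.extend(self.recs[bisect_left(self.keys, lo):
                             bisect_right(self.keys, hi)])

class RangeTree:
    """Binary search tree on field `dim`; each node also holds a
    structure over fields dim+1 .. k-1 for the records in its subtree."""
    def __init__(self, records, dim, k, leaf_size=8):
        self.dim, self.records = dim, records
        keys = sorted(r[dim] for r in records)
        self.lo, self.hi = keys[0], keys[-1]      # exact range on `dim`
        self.assoc = make(records, dim + 1, k, leaf_size)
        self.left = self.right = None
        if len(records) > leaf_size and self.lo < self.hi:
            m = keys[(len(keys) - 1) // 2]        # median on field `dim`
            if m == self.hi:                      # ties: keep both sides non-empty
                m = keys[bisect_left(keys, m) - 1]
            self.left = RangeTree([r for r in records if r[dim] <= m],
                                  dim, k, leaf_size)
            self.right = RangeTree([r for r in records if r[dim] > m],
                                   dim, k, leaf_size)

    def query(self, ranges, out):
        lo, hi = ranges[self.dim]
        if hi < self.lo or lo > self.hi:
            return                                # disjoint on this field
        if lo <= self.lo and self.hi <= hi:
            self.assoc.query(ranges, out)         # only later fields remain
        elif self.left is None:                   # small partition: scan it
            out.extend(r for r in self.records
                       if all(a <= r[i] <= b
                              for i, (a, b) in enumerate(ranges)))
        else:
            self.left.query(ranges, out)
            self.right.query(ranges, out)

def make(records, dim, k, leaf_size=8):
    """Structure over fields dim .. k-1 of a non-empty record list."""
    if dim == k - 1:
        return SortedArray(records, dim)
    return RangeTree(records, dim, k, leaf_size)

def range_query(tree, ranges):
    out = []
    tree.query(ranges, out)
    return out

# 3-d usage with keys (w, x, y).
records = [(i % 7, (3 * i) % 11, (5 * i) % 13) for i in range(50)]
tree = make(records, 0, 3)
print(len(range_query(tree, [(1, 4), (2, 8), (0, 6)])))
```

With k = 2 this reduces to a binary search tree on x with a sorted array on y at every node, which is the 2-d range tree.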
Performance—3-d Range Tree P(n,3) = O(n log^2 n). O(n) time to find all medians at each level of the tree.
Performance—3-d Range Tree P(n,3) = O(n log^2 n). O(n log n) time to construct the 2-d range trees at each level of the tree from the data at the preceding level. O(log n) levels.
Performance—3-d Range Tree S(n,3) = O(n log^2 n). O(n log n) space needed for the 2-d range trees and nodes at each level. O(log n) levels.
Performance—3-d Range Tree Q(n,3) = O(log^3 n + s). At each level of the binary search tree, at most two 2-d range trees are searched. O(log^2 n + s_i) time to search each 2-d range tree, where s_i is the number of records in the searched 2-d range tree that satisfy the query. O(log n) levels.
Performance—k-d Range Tree P(n,k) = O(n log^{k-1} n), k > 1. S(n,k) = O(n log^{k-1} n). Q(n,k) = O(log^k n + s).
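One way to see where the space and query bounds come from is induction on k, following the level-by-level arguments used for k = 2 and k = 3. The derivation below is a sketch under assumed notation: level j of the top-level tree has 2^j nodes holding about n/2^j records each, and s_v is the number of records reported from the structure stored at a searched node v.

```latex
% Sketch only; base cases are the sorted-array bounds for k = 1.
\begin{align*}
S(n,1) &= O(n), \qquad Q(n,1) = O(\log n + s) \\
S(n,k) &= \sum_{j=0}^{\log n} 2^{j}\, S\!\left(\tfrac{n}{2^{j}},\, k-1\right)
        \le \sum_{j=0}^{\log n} 2^{j} \cdot O\!\left(\tfrac{n}{2^{j}} \log^{k-2} n\right)
        = O\!\left(n \log^{k-1} n\right) \\
Q(n,k) &\le \sum_{v \text{ searched}} O\!\left(\log^{k-1} n + s_v\right)
        \le 2(\log n + 1) \cdot O\!\left(\log^{k-1} n\right) + O(s)
        = O\!\left(\log^{k} n + s\right)
\end{align*}
```

The query bound uses the fact that at most two nodes per level have their associated structure searched and that the searched nodes cover disjoint sets of records, so the s_v sum to at most s.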