Multidimensional Range Search


Multidimensional Range Search Static collection of records: no inserts, deletes, or changes, only queries. Each record has k key fields. Multidimensional range query: given k ranges [l_i, u_i], 1 <= i <= k, report all records in the collection such that l_i <= k_i <= u_i for 1 <= i <= k. Note that in a priority search tree the search rectangle has y_b = 0, so a priority search tree supports only a limited 2-d range search.

Multidimensional Range Search All employees whose age is between 30 and 40 and whose salary is between $40K and $70K. All cities with an annual rainfall between 40 and 60 inches, population between 100K and 200K, average temperature >= 70F, and number of horses between 1025 and 2500.

Data Structures For Range Search Unordered sequential list. Sorted tables: k tables, where table i is sorted by the i'th key. Cells. k-d trees. Range trees. k-fold trees. k-ranges. A sequential list takes O(n) time per query. With sorted tables, use one table to get the records that satisfy the range query on one field, then reject those that don't satisfy the remaining ranges. Packet classification is often modeled as a static multidimensional search (the data are multidimensional rectangles and the query is not a range query); a multidimensional trie may be used.
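The two simplest structures on this slide can be sketched in a few lines of Python. This is only an illustration: records are assumed to be tuples of k comparable keys, and the names in_range, sequential_query, build_sorted_tables and sorted_table_query are made up for the example.

```python
from bisect import bisect_left, bisect_right

def in_range(record, ranges):
    """True if l_i <= k_i <= u_i for every key field i of the record."""
    return all(lo <= key <= hi for key, (lo, hi) in zip(record, ranges))

def sequential_query(records, ranges):
    """Unordered sequential list: test every record, O(n) per query."""
    return [r for r in records if in_range(r, ranges)]

def build_sorted_tables(records, k):
    """k tables; table i holds all records sorted by the i'th key."""
    return [sorted(records, key=lambda r, i=i: r[i]) for i in range(k)]

def sorted_table_query(tables, ranges, i=0):
    """Use table i to satisfy range i, then reject on the remaining ranges."""
    table = tables[i]
    keys = [r[i] for r in table]  # assumption: a real table would keep this column
    lo, hi = ranges[i]
    candidates = table[bisect_left(keys, lo):bisect_right(keys, hi)]
    return [r for r in candidates if in_range(r, ranges)]

# (age, salary in $K): age in [30, 40] and salary in [40, 70]
employees = [(35, 50), (25, 60), (38, 80), (31, 45)]
query = [(30, 40), (40, 70)]
print(sequential_query(employees, query))  # [(35, 50), (31, 45)]
print(sorted_table_query(build_sorted_tables(employees, 2), query))  # [(31, 45), (35, 50)]
```

Which table to search first is a design choice; picking the field whose range is expected to match the fewest records keeps the candidate list short.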

Performance Measures P(n,k): preprocessing time to construct the search structure for n records, each with k key fields. For many applications, this time needs only to be reasonable. S(n,k): total space needed by the search structure. Q(n,k): time needed to answer a query.

k-d Tree Binary tree. At each node of tree, pick a key field to partition records in that subtree into two approximately equal groups. Pick field i with max spread in values. Select median key value, m. Node stores i and m. Records with ki <= m in left subtree. Records with ki > m in right subtree. Stop when partition size <= 8 or 16 (say).
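A minimal Python sketch of the construction and of a range query on the resulting tree. Records are assumed to be tuples of k numeric keys; KDNode, build_kd and query_kd are hypothetical names. For brevity the median is found by sorting rather than by linear-time selection, and a partition is left as an oversized bucket if the cut would put every record on one side.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Record = Tuple[float, ...]

@dataclass
class KDNode:
    field: int = -1                       # cut coordinate i (unused in buckets)
    median: float = 0.0                   # cut value m
    left: Optional["KDNode"] = None       # records with k_i <= m
    right: Optional["KDNode"] = None      # records with k_i > m
    bucket: Optional[List[Record]] = None # set only at leaves

def build_kd(records, k, bucket_size=8):
    if len(records) <= bucket_size:
        return KDNode(bucket=list(records))
    # Pick the field with the maximum spread in values.
    field = max(range(k), key=lambda i: max(r[i] for r in records)
                                        - min(r[i] for r in records))
    ordered = sorted(records, key=lambda r: r[field])
    median = ordered[(len(ordered) - 1) // 2][field]
    left = [r for r in ordered if r[field] <= median]
    right = [r for r in ordered if r[field] > median]
    if not right:  # too many keys equal to the median: stop splitting here
        return KDNode(bucket=list(records))
    return KDNode(field, median,
                  build_kd(left, k, bucket_size),
                  build_kd(right, k, bucket_size))

def query_kd(node, ranges, out=None):
    """Report records with l_i <= k_i <= u_i; ranges is a list of (l_i, u_i)."""
    if out is None:
        out = []
    if node.bucket is not None:
        out.extend(r for r in node.bucket
                   if all(lo <= x <= hi for x, (lo, hi) in zip(r, ranges)))
        return out
    lo, hi = ranges[node.field]
    if lo <= node.median:   # query range reaches into the left part
        query_kd(node.left, ranges, out)
    if hi > node.median:    # query range reaches into the right part
        query_kd(node.right, ranges, out)
    return out
```

A query visits only the subtrees whose side of the cut intersects the query range, and filters records only when it reaches a bucket.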

2-d Example [Figure: points partitioned by cut lines a-g (top) and the corresponding k-d tree.] Blue nodes are buckets that contain the records. The stopping criterion is that a partition has at most 3 records. Nodes are labeled by cut line. In practice, they would be labeled by cut coordinate and median/cut value. So, the node labeled a would actually be labeled (x, x.value), where x is the cut coordinate. The node labeled e would be labeled (y, y.value). The leftmost bucket has the 3 leftmost points shown in the top figure. The next bucket has 2 points/records. A range search/query in 2-d is defined by a rectangle. The example query is shown by the yellow rectangle. For the example query, two buckets are examined, and the records in these two buckets that fall within the rectangle are reported.

Performance P(n,k) = O(kn log n). O(kn) time to select partition keys at each level. O(n) time to find all medians and split at each level of the tree. O(log n) levels. Alternatively, sort on x and y to get 2 sorted lists (one on x, one on y), then split the lists in two as you go down.

Performance S(n,k) = O(n); more precisely O(n |record|), where |record| is the size of a record. O(n) is needed for the n records, and the tree takes O(n) space.

Performance Q(n,k) depends on the shape of the query. Worst case: O(n^(1-1/k) + s) for k > 1, where s is the number of records that satisfy the query. Average case: O(log n + s) when the query is almost cubical and a small fraction of the n records satisfy the query; O(s) when the query is almost cubical and a large fraction of the n records satisfy the query.

Range Trees—k=1 Sorted array on single key. 10 12 15 20 24 26 27 29 35 40 50 55 P(n,1) = O(n log n). S(n,1) = O(n). Q(n,1) = O(log n + s).
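For k = 1 the query is two binary searches on the sorted array followed by a slice, which gives the O(log n + s) bound. A short Python sketch using the standard bisect module (the function name range_query_1d is made up for the example):

```python
from bisect import bisect_left, bisect_right

def range_query_1d(sorted_keys, lo, hi):
    """Return all keys with lo <= key <= hi from a sorted array."""
    start = bisect_left(sorted_keys, lo)   # first position with key >= lo
    end = bisect_right(sorted_keys, hi)    # first position with key > hi
    return sorted_keys[start:end]

keys = [10, 12, 15, 20, 24, 26, 27, 29, 35, 40, 50, 55]
print(range_query_1d(keys, 14, 27))  # [15, 20, 24, 26, 27]
```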

Range Trees—k=2 Let the two key fields be x and y. Binary search tree on x. x value used in a node is the median x value for all records in that subtree. Records with x value <= median are in left subtree. Records with larger x value in right subtree.

Range Trees—k=2 Each node has a sorted array on y of all records in the subtree. Root has sorted array of all n records. Left and right subtrees, each have a sorted array of about n/2 records. Stop partitioning when # records in a partition is small enough (say 8).

Example [Figure: binary search tree on x with nodes a-g; SA marks a node's sorted array on y.] a-g are x values. The x-range of a node begins at the min x value in its subtree and ends at the max x value in its subtree. In practice, it is sufficient to compute a super-range of a node's x-range from that of its parent. So, if the x (super) range of a node is [L,R] and the partitioning value is m, the super range for the left child is [L,m] and that for the right child is (m,R]. Alternatively, one may store the exact range in each node during construction.

Example: Search If the x-range of the root is contained in the x-range of the query, search the root's SA for records that satisfy the y-range of the query. Done.

Example: Search If the entire x-range of the query is <= (>) the x value in the root, recursively search the left (right) subtree.

Example: Search If the x-range of the query contains the x value in the root, recursively search the left and right subtrees.
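The three search cases can be put together in a small Python sketch of a 2-d range tree. Several things here are assumptions made for illustration: points are (x, y) tuples, the input to build is non-empty and already sorted by x, each node stores its exact x-range, and the names RTNode, build and search are hypothetical. Each node's sorted array is built by sorting, which is simpler than deriving it from the parent's array but costs more preprocessing time. The early return for a disjoint x-range is an added shortcut, not one of the three cases.

```python
from bisect import bisect_left, bisect_right
from dataclasses import dataclass
from typing import List, Optional, Tuple

Point = Tuple[float, float]

@dataclass
class RTNode:
    xmin: float                      # exact x-range of the subtree
    xmax: float
    by_y: List[Point]                # SA: subtree's records sorted by y
    ys: List[float]                  # y keys of by_y, for binary search
    split: float = 0.0               # median x value (internal nodes only)
    left: Optional["RTNode"] = None
    right: Optional["RTNode"] = None

def build(points_by_x, leaf_size=8):
    """points_by_x: non-empty list of (x, y) sorted by x."""
    by_y = sorted(points_by_x, key=lambda p: p[1])
    node = RTNode(points_by_x[0][0], points_by_x[-1][0],
                  by_y, [p[1] for p in by_y])
    if len(points_by_x) > leaf_size:
        split = points_by_x[(len(points_by_x) - 1) // 2][0]
        left = [p for p in points_by_x if p[0] <= split]
        right = [p for p in points_by_x if p[0] > split]
        if right:  # skip the split if all x values equal the median
            node.split = split
            node.left = build(left, leaf_size)
            node.right = build(right, leaf_size)
    return node

def search(node, x1, x2, y1, y2, out):
    if x2 < node.xmin or x1 > node.xmax:
        return out                   # added shortcut: no overlap in x
    if x1 <= node.xmin and node.xmax <= x2:
        # Node's x-range is inside the query x-range: search the SA on y.
        out.extend(node.by_y[bisect_left(node.ys, y1):bisect_right(node.ys, y2)])
        return out
    if node.left is None:            # small partition: scan it
        out.extend(p for p in node.by_y
                   if x1 <= p[0] <= x2 and y1 <= p[1] <= y2)
        return out
    if x2 <= node.split:             # query lies entirely at or below the x value
        search(node.left, x1, x2, y1, y2, out)
    elif x1 > node.split:            # query lies entirely above the x value
        search(node.right, x1, x2, y1, y2, out)
    else:                            # query contains the x value
        search(node.left, x1, x2, y1, y2, out)
        search(node.right, x1, x2, y1, y2, out)
    return out
```

A typical call would be search(build(sorted(points)), x1, x2, y1, y2, []), which returns the list of points inside the query rectangle.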

Performance P(n,2) = O(n log n). O(n log n) time to sort all records by y for the SAs. O(n) time to find all medians at each level of the tree.

Performance P(n,2) = O(n log n). O(n) time to construct the SAs at each level of the tree from the SAs at the preceding level. O(log n) levels.

Performance S(n,2) = O(n log n). O(n) needed for the SAs and nodes at each level. O(log n) levels.

Performance Q(n,2) = O(log^2 n + s). At each level of the binary search tree, at most 2 SAs are searched, each by binary search on y, and there are O(log n) levels. To see why, suppose that the query x-range contains the x values of nodes a through g and that the x-ranges of a, b, and c are not entirely contained in the query x-range. No SAs at levels 1 and 2 are searched. The x-ranges of e and f must be entirely contained in the x-range of the query, so the SAs for e and f are searched. Now suppose that d has the children h and i and that the x-range of the query overlaps h. The x-range of i must be entirely in the x-range of the query, and i's SA is searched. Similarly, if g has children j and k and the x-range of the query overlaps k, the SA of j is searched. The subtrees h and k are searched recursively. All SAs that are searched at a level must be for contiguous nodes. So, if 3 or more were searched, at least two would be siblings; the x-range of the parent of these two siblings would be contained in the query range, and therefore the parent, and not its children, would have been searched.

Range Trees—k=3 Let the three key fields be w, x and y. Binary search tree on w. w value used in a node is the median w value for all records in that subtree. Records with w value <= median in left subtree. Records with larger w value in right subtree.

Range Trees—k=3 Each node has a 2-d range tree on x and y of all records in the subtree. Stop partitioning when # records in a partition is small enough (say 8).

Example [Figure: binary search tree on w with nodes a-g; each node holds a 2-d range tree.] a-g are w values. The w-range of a node begins at the min w value in its subtree and ends at the max w value in its subtree.

Example: Search If the w-range of the root is contained in the w-range of the query, search the 2-d range tree in the root for records that satisfy the x- and y-ranges of the query. Done. If the entire w-range of the query is <= (>) the w value in the root, recursively search the left (right) subtree.

Example: Search If the w-range of the query contains the w value in the root, recursively search the left and right subtrees.

Performance: 3-d Range Tree P(n,3) = O(n log^2 n). O(n) time to find all medians at each level of the tree.

Performance: 3-d Range Tree P(n,3) = O(n log^2 n). O(n log n) time to construct the 2-d range trees at each level of the tree from the data at the preceding level. O(log n) levels.

Performance: 3-d Range Tree S(n,3) = O(n log^2 n). O(n log n) needed for the 2-d range trees and nodes at each level. O(log n) levels.

Performance: 3-d Range Tree Q(n,3) = O(log^3 n + s). At each level of the binary search tree, at most two 2-d range trees are searched. O(log^2 n + s_i) time to search each 2-d range tree, where s_i is the number of records in the searched 2-d range tree that satisfy the query. O(log n) levels.

Performance: k-d Range Tree P(n,k) = O(n log^(k-1) n), k > 1. S(n,k) = O(n log^(k-1) n). Q(n,k) = O(log^k n + s).
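These bounds follow from repeating the level-by-level argument used for k = 2 and k = 3. A sketch of the recurrences, written loosely: the factor O(log n) stands for the number of levels of the top-level binary search tree, and the (k-1)-d term is the total cost over all nodes at one level.

```latex
\begin{align*}
S(n,1) &= O(n), &
S(n,k) &= O(\log n)\cdot S(n,k-1) = O(n\log^{k-1} n),\\
P(n,2) &= O(n\log n), &
P(n,k) &= O(\log n)\cdot P(n,k-1) = O(n\log^{k-1} n), \quad k > 2,\\
Q(n,1) &= O(\log n + s), &
Q(n,k) &= O(\log n)\cdot O(\log^{k-1} n) + O(s) = O(\log^{k} n + s).
\end{align*}
```

In the query bound, at most two (k-1)-d range trees are searched per level, and each reported record is counted once, so the output terms add up to s.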