Multidimensional Access Methods Ho Hoang Nguyen Nguyen Thanh Trong Dao Vu Quoc Trung Ngo Phuoc Huong Thien DATABASE SYSTEM
Outline
- Spatial Data and Multidimensional Access Methods
- Point Access Methods
- Spatial Access Methods
- Conclusion
What is special about Spatial Data?
Basic properties:
- Complex structure: an object could be a single point or thousands of polygons, so tuple size is variable
- Dynamic: different operations (insert, delete, update) are interleaved
- Large: e.g., gigabytes for geographic maps; requires integration of secondary and tertiary memory
- No standard algebra: several proposals exist, but there is no standard set of operators; operators depend on the domain (application specific)
- Operators are not closed: the result of an operator can be any kind of object
- Expensive computational costs
What is special about Spatial Data?...
Special physical-layer support is required for “search” operators (used by search, update, ...).
Requirements for multidimensional access methods:
- Dynamic: keep track of changes (inserts, deletes, updates, ...)
- Secondary/tertiary storage management: it is not possible to keep everything in main memory
- Broad range of supported operations: do not sacrifice one for the others
- Independence of the input data (distribution) and insertion sequence
- Simplicity
- Scalability
- Time efficiency of search: the goal is to match the one-dimensional B-tree
- Space efficiency
- Concurrency and recovery: multiple concurrent accesses
- Minimum impact: integration should affect existing parts of the system as little as possible
Definitions
- Point access methods (PAMs): designed to perform spatial search in point databases; a point can have 2, 3, ... dimensions, but no spatial extension
- Spatial access methods (SAMs): can manage extended objects (lines, polygons, ...)
Point Access Methods
- Previous access methods were designed for main memory
- They can be used for secondary memory, but performance is far below optimum (no control over how the OS accesses the disk)
- Different approaches for PAMs:
  - hashing (extendible, linear)
  - hierarchical (tree based)
  - space-filling curves
K-d tree example (figure: points a, b, c, d, e, f and the corresponding k-d tree)
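A k-d tree of this kind can be sketched in a few lines. This is a minimal main-memory sketch, assuming a static point set split at the median of one axis per level; the names `KDNode`, `build_kdtree`, and `range_search` are hypothetical and do not come from the slides.

```python
from typing import List, Optional, Tuple

Point = Tuple[float, ...]


class KDNode:
    """One tree node: a stored point plus the axis it splits on (hypothetical layout)."""

    def __init__(self, point: Point, axis: int,
                 left: "Optional[KDNode]", right: "Optional[KDNode]"):
        self.point = point
        self.axis = axis
        self.left = left
        self.right = right


def build_kdtree(points: List[Point], depth: int = 0) -> Optional[KDNode]:
    """Build a k-d tree by splitting on the median, cycling through the axes."""
    if not points:
        return None
    axis = depth % len(points[0])
    ordered = sorted(points, key=lambda p: p[axis])
    mid = len(ordered) // 2
    return KDNode(ordered[mid], axis,
                  build_kdtree(ordered[:mid], depth + 1),
                  build_kdtree(ordered[mid + 1:], depth + 1))


def range_search(node: Optional[KDNode], lo: Point, hi: Point) -> List[Point]:
    """Return all points p with lo[i] <= p[i] <= hi[i] on every axis."""
    if node is None:
        return []
    found = []
    if all(l <= c <= h for l, c, h in zip(lo, node.point, hi)):
        found.append(node.point)
    # Descend only into the half-spaces that the query box can reach.
    if lo[node.axis] <= node.point[node.axis]:
        found += range_search(node.left, lo, hi)
    if hi[node.axis] >= node.point[node.axis]:
        found += range_search(node.right, lo, hi)
    return found
```

Calling `range_search(build_kdtree(points), lo, hi)` visits only the subtrees whose half-space overlaps the query box. That pruning is what makes the structure useful for point data.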
3D k-d tree
Examples of applications
Spatial Access Methods
- Previous methods are for points, not for objects with extension
- How to handle objects with extension? By modifying point access methods
- Classification of methods:
  - based on technique: transformation (object mapping), overlapping regions (object bounding), clipping (object duplication), multiple layers
  - based on “base type”: the primarily supported spatial data type, mostly intervals
Spatial Access Methods... Transformation
- transform the object to a different representation
- then use PAMs or one-dimensional access methods
- possible options:
  - transform each object to a higher-dimensional point
  - transform each object to one-dimensional intervals using space-filling curves
Spatial Access Methods... Transformation: mapping to higher-dimensional space
- e.g., a rectangle is given by four numbers, which correspond to a point in four-dimensional space
- use one of the PAMs for this new point
- options:
  - endpoint transformation: x and y coordinates of two diagonal corners
  - midpoint transformation: x and y coordinates of the center, plus height and width
- more complex objects: approximate with a rectangle or sphere; as a result, the PAM provides only a partial (approximate) result
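The two mappings can be written down directly. This is a minimal sketch, assuming a rectangle is stored as `(x_min, y_min, x_max, y_max)`; that tuple layout and the function names are illustrative assumptions, not part of the slides.

```python
from typing import Tuple

Rect = Tuple[float, float, float, float]      # assumed layout: (x_min, y_min, x_max, y_max)
Point4D = Tuple[float, float, float, float]


def endpoint_transform(rect: Rect) -> Point4D:
    """Map a rectangle to a 4-D point made of two diagonal corners."""
    x_min, y_min, x_max, y_max = rect
    return (x_min, y_min, x_max, y_max)


def midpoint_transform(rect: Rect) -> Point4D:
    """Map a rectangle to a 4-D point (center x, center y, width, height)."""
    x_min, y_min, x_max, y_max = rect
    return ((x_min + x_max) / 2, (y_min + y_max) / 2,
            x_max - x_min, y_max - y_min)
```

For example, `midpoint_transform((1, 2, 5, 4))` gives `(3.0, 3.0, 4, 2)`. Either 4-D point can then be stored in any PAM.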
Spatial Access Methods... Transformation (mapping to higher-dimensional space): cons
- formulation of point and range queries is more difficult in the new (dual) space
- finite search regions may map to infinite search regions in dual space
- more complex queries with spatial predicates may not be expressible at all
- depending on the mapping, the distribution of points in dual space may be highly non-uniform, even if the data in the original space is uniform
- the images of two close objects may be far apart in dual space
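A one-dimensional case makes the "infinite search region" point concrete. In this sketch an interval `[l, u]` is mapped to the 2-D point `(l, u)` by the endpoint transformation. The function name and the example values are assumptions chosen for illustration.

```python
from typing import Tuple

Interval = Tuple[float, float]   # (lower, upper) in the original space
DualPoint = Tuple[float, float]  # the same pair read as a 2-D point (l, u)


def intersects_in_dual(point: DualPoint, query: Interval) -> bool:
    """Intersection test expressed in dual space.

    The stored interval [l, u] intersects the query [q_lo, q_hi]
    exactly when l <= q_hi and u >= q_lo. In the (l, u) plane this is
    a quadrant-shaped region: it is bounded on two sides only, so a
    finite query in the original space becomes an unbounded search
    region in dual space.
    """
    l, u = point
    q_lo, q_hi = query
    return l <= q_hi and u >= q_lo
```

With the query `(4, 6)`, both `(1, 5)` and `(0, 100)` qualify. The second point lies far from the query's own image `(4, 6)` in dual space, so a bounded window around that image would miss it.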
Spatial Access Methods... Transformation: space-filling curves for extended objects
- has fewer drawbacks
- represent extended objects with grid cells
- equivalent to representing an extended object as the union of several simpler objects
- equivalent to a list of one-dimensional intervals that define the positions of the grid cells
- variations: z-ordering, Hilbert R-tree, UB-tree
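The z-ordering variant can be sketched as bit interleaving followed by merging consecutive values into intervals. This is an illustrative sketch, assuming non-negative integer grid coordinates and a fixed number of bits per axis; `z_value` and `cells_to_intervals` are hypothetical helper names.

```python
from typing import Iterable, List, Tuple


def z_value(x: int, y: int, bits: int = 16) -> int:
    """Interleave the bits of x and y (x in even positions, y in odd positions)."""
    z = 0
    for i in range(bits):
        z |= ((x >> i) & 1) << (2 * i)
        z |= ((y >> i) & 1) << (2 * i + 1)
    return z


def cells_to_intervals(cells: Iterable[Tuple[int, int]],
                       bits: int = 16) -> List[Tuple[int, int]]:
    """Turn the grid cells covering an object into maximal runs of z-values."""
    zs = sorted({z_value(x, y, bits) for x, y in cells})
    intervals: List[List[int]] = []
    for z in zs:
        if intervals and z == intervals[-1][1] + 1:
            intervals[-1][1] = z          # extend the current run
        else:
            intervals.append([z, z])      # start a new run
    return [(start, end) for start, end in intervals]
```

The resulting intervals are one-dimensional, so they can be stored in an ordinary one-dimensional index such as a B-tree.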
Spatial Access Methods... Overlapping regions
- idea: different data buckets correspond to mutually overlapping subspaces
- any object can be put into a single bucket
- regions are extended to accommodate new data
- overlap increases the number of search paths, even for point queries
- problem: performance, especially when objects are large compared to the universe
  - very large objects lead to an ineffective index: the whole index may have to be searched
- minor problem: ambiguity during insertion, since any subspace could be picked for enlargement
  - solution: pick the subspace that causes minimal additional overlap, or the one that requires the least enlargement, or the one that takes the least time
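Of the insertion heuristics listed, "least enlargement" is the easiest to show. The sketch below assumes 2-D regions stored as `(x_min, y_min, x_max, y_max)` and breaks ties in favor of the smaller region. The tie-break and all function names are assumptions made for this example.

```python
from typing import List, Tuple

Rect = Tuple[float, float, float, float]  # assumed layout: (x_min, y_min, x_max, y_max)


def area(r: Rect) -> float:
    return (r[2] - r[0]) * (r[3] - r[1])


def union(a: Rect, b: Rect) -> Rect:
    """Smallest rectangle covering both a and b."""
    return (min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3]))


def enlargement(region: Rect, obj: Rect) -> float:
    """Extra area the region needs in order to cover obj."""
    return area(union(region, obj)) - area(region)


def choose_region(regions: List[Rect], obj: Rect) -> int:
    """Index of the region needing the least enlargement (ties: smaller area)."""
    return min(range(len(regions)),
               key=lambda i: (enlargement(regions[i], obj), area(regions[i])))
```

The "minimal additional overlap" criterion would replace `enlargement` with a function that also looks at the sibling regions, which costs more per insertion.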
Spatial Access Methods... Overlapping regions: R-tree
- hierarchy of nested intervals; nodes correspond to intervals
- the intervals of a node's descendants are contained in the interval of that node
- nodes at the same level may overlap
- leaf node: MBB (minimum bounding box) and a reference to the actual data
- each node has between m (lower threshold) and M (upper threshold) entries; m ensures efficient storage utilization
- the R-tree is height-balanced
- search is similar to the B-tree, but several intervals at each level may satisfy the search
- search provides candidate results, which require refinement
- insertion: only one path is traversed; at each node, pick the child that requires the least enlargement to cover the object
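The search behavior described here, where several children may qualify at each level, can be sketched as a short recursion. This is a simplified in-memory model and not a disk-based implementation: the `Node` class is hypothetical, and a node without children stands in for a leaf entry holding a reference to the actual data.

```python
from dataclasses import dataclass, field
from typing import Any, List, Tuple

Rect = Tuple[float, float, float, float]  # assumed layout: (x_min, y_min, x_max, y_max)


@dataclass
class Node:
    mbb: Rect                                              # minimum bounding box
    children: List["Node"] = field(default_factory=list)   # empty for leaf entries
    ref: Any = None                                        # data reference (leaf entries only)


def intersects(a: Rect, b: Rect) -> bool:
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def search(node: Node, query: Rect) -> List[Any]:
    """Return references of all leaf entries whose MBB intersects the query.

    These are candidates only: the MBB is an approximation, so the
    actual geometry still has to be checked in a refinement step.
    """
    if not intersects(node.mbb, query):
        return []
    if not node.children:
        return [node.ref]
    hits: List[Any] = []
    for child in node.children:   # unlike a B-tree, several children may qualify
        hits += search(child, query)
    return hits
```

When sibling MBBs overlap, a single query can descend into more than one subtree. This is the extra cost attributed to overlapping regions.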
Spatial Access Methods... Overlapping regions: R-tree (figure)
R-tree, add node (figure)
R-tree, add node: solution 1 (figure)
R-tree, add node: solution 2 (figure)
R-tree, add node: incorrect (figure)
Spatial Access Methods... Overlapping regions: R-tree, delete node
- deletion may require adjusting the size of the covering interval
- example: delete R8 (figure)
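The adjustment of the covering interval after a deletion amounts to recomputing the bounding box over the remaining entries. This is a minimal sketch, assuming the entries of one node are held in a dictionary keyed by name. The coordinates and the sibling names `R9` and `R10` are invented for illustration, and handling a node that falls below the lower threshold m is left out.

```python
from typing import Dict, Iterable, Optional, Tuple

Rect = Tuple[float, float, float, float]  # assumed layout: (x_min, y_min, x_max, y_max)


def mbb_of(rects: Iterable[Rect]) -> Rect:
    """Minimum bounding box of a non-empty collection of rectangles."""
    x0, y0, x1, y1 = zip(*rects)
    return (min(x0), min(y0), max(x1), max(y1))


def delete_entry(entries: Dict[str, Rect],
                 name: str) -> Tuple[Dict[str, Rect], Optional[Rect]]:
    """Remove one entry and return the remaining entries with the tightened MBB."""
    remaining = {k: v for k, v in entries.items() if k != name}
    new_mbb = mbb_of(remaining.values()) if remaining else None
    return remaining, new_mbb


# Hypothetical node contents; only the name R8 appears on the slide.
node_entries = {"R8": (6, 0, 9, 3), "R9": (1, 1, 3, 2), "R10": (2, 2, 4, 4)}
remaining_entries, shrunk_mbb = delete_entry(node_entries, "R8")
```

In this made-up example the covering box shrinks from `(1, 0, 9, 4)` to `(1, 1, 4, 4)`. The change then has to be propagated to the parent's entry for this node.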
Spatial Access Methods... Overlapping regions: R*-tree
- similar to the R-tree
- forced reinsert policy:
  - if a node overflows, do not split it right away
  - remove some entries (30% of M) from the node and reinsert them
- deletion and search are the same as in the R-tree
- splitting policy:
  - all R-tree policies
  - minimize overlap between nodes at the same level (lower probability of multiple search paths)
  - minimize region perimeters (regions should become square-like)
  - maximize storage utilization
- pro: performance improvements of up to 50%
- con: CPU overhead for reinsertion
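The forced reinsert step can be sketched as choosing which entries to take out of an overflowing node. The slides only state that about 30% of M entries are removed. Selecting the entries whose centers lie farthest from the center of the node's bounding box is an assumption made here, as are the function names.

```python
import math
from typing import List, Tuple

Rect = Tuple[float, float, float, float]  # assumed layout: (x_min, y_min, x_max, y_max)


def mbb_of(rects: List[Rect]) -> Rect:
    """Minimum bounding box of a non-empty list of rectangles."""
    x0, y0, x1, y1 = zip(*rects)
    return (min(x0), min(y0), max(x1), max(y1))


def center(r: Rect) -> Tuple[float, float]:
    return ((r[0] + r[2]) / 2, (r[1] + r[3]) / 2)


def pick_for_reinsert(entries: List[Rect], max_entries: int,
                      fraction: float = 0.3) -> Tuple[List[Rect], List[Rect]]:
    """Split the entries of an overflowing node into (kept, to_reinsert).

    Assumption: the entries farthest from the node's center are the
    ones removed; about fraction * max_entries of them are taken.
    """
    cx, cy = center(mbb_of(entries))

    def distance(r: Rect) -> float:
        ex, ey = center(r)
        return math.hypot(ex - cx, ey - cy)

    ordered = sorted(entries, key=distance)          # nearest first
    p = max(1, int(fraction * max_entries))          # number of entries to remove
    return ordered[:-p], ordered[-p:]
```

The removed entries are then inserted again from the root, which gives them a chance to land in a better-fitting node. A split is performed only if the node overflows again.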
Spatial Access Methods... Overlapping regions: R*-tree (figure)
Comparative Studies: Experimental Results
- Search performance for R-tree, k-d-B-tree, and R+-tree (10,000 uniformly distributed rectangles of varying size):
  - the k-d-B-tree can never compete with the R-tree variants
  - not much difference between the R-tree and the R+-tree (the R+-tree is significantly more difficult to code)
  - the R+-tree performs better when there is less overlap between rectangles
- R*-tree compared with several variants of the R-tree:
  - the R*-tree is the winner for queries, with the best storage utilization and insertion time
  - (again, only disk accesses were measured)
Conclusions
- There are many different point and spatial access methods
- No method is superior to the others in every sense
- A method can be a clear winner on one benchmark and inferior on another
- Reason:
  - so many different criteria for optimality
  - so many parameters that define performance
- Examples:
  - a good access method for dense data may not be good for sparse data
  - an index method optimized for point queries may be inefficient for region queries
  - a good method for a static environment may not be good for an environment with many insertions and deletions
Conclusions... Technology transfer
- Pick a method that is easy to understand and implement, and that is robust
- Raw performance is less important
- Try to optimize performance through a highly tuned implementation
- Examples:
  - Quadtree in SICAD and SmallWorld GIS
  - R-tree in Informix
  - Z-ordering in Oracle
References
Volker Gaede (IC-Parc, Imperial College, London) and Oliver Günther (Humboldt-Universität, Berlin): Multidimensional Access Methods
THANKS FOR WATCHING