Multidimensional Access Methods 12073125 Ho Hoang Nguyen 51204112 Nguyen Thanh Trong 51204119 Dao Vu Quoc Trung 51203574 Ngo Phuoc Huong Thien DATABASE.


Multidimensional Access Methods Ho Hoang Nguyen Nguyen Thanh Trong Dao Vu Quoc Trung Ngo Phuoc Huong Thien DATABASE SYSTEM

2 Outline
– Spatial Data and Multidimensional Access Methods
– Point Access Methods
– Spatial Access Methods
– Conclusion


4 What is special about Spatial Data?
Basic properties:
– Complex structure: an object could be a single point or thousands of polygons, so tuple size is variable
– Dynamic: different operations (insert, delete, update) are interleaved
– Large: e.g., gigabytes for geographic maps, which calls for integration of secondary and tertiary memory
– No standard algebra: several proposals exist, but there is no standard set of operators; operators depend on the domain (application specific)
– Operators are not closed: the result of an operator can be any kind of object
– Expensive computational costs

5 What is special about Spatial Data?...
Special physical-layer support is required for "search" operators (used by search, update, …).
Requirements for multidimensional access methods:
– Dynamic: keep track of changes (inserts, deletes, updates, ...)
– Secondary/tertiary storage management: it is not possible to keep everything in main memory
– Broad range of supported operations: do not sacrifice one operation for others
– Independence of the input data (distribution) and insertion sequence
– Simplicity
– Scalability
– Time efficiency of search: the goal is to match the one-dimensional B-tree
– Space efficiency
– Concurrency and recovery: support multiple concurrent accesses
– Minimum impact: integrating the method should affect the existing system as little as possible

6 Definitions
– Point access methods (PAMs): designed to perform spatial search in point databases; a point may have 2, 3, … dimensions, but no spatial extent
– Spatial access methods (SAMs): can manage extended objects (lines, polygons, ...)


8 Point Access Methods
– Earlier access methods were designed for main memory
– They can be used for secondary memory, but performance is far below optimum (no control over how the OS accesses disks)
– Different approaches for PAMs:
  – hashing (extendible, linear)
  – hierarchical (tree based)
  – space-filling curves

K-d tree example [figure: a k-d tree built over the points a, b, c, d, e, f; image not reproduced]
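Since the figure did not survive, a minimal sketch of a k-d tree may help. It assumes the common textbook variant that splits on the median and cycles through the axes by depth; the names `KDNode`, `build_kdtree` and `range_search` are illustrative and do not come from the slides.

```python
from dataclasses import dataclass
from typing import Optional, Sequence

Point = tuple[float, ...]


@dataclass
class KDNode:
    point: Point                      # the point stored at this node
    axis: int                         # dimension used to split at this node
    left: Optional["KDNode"] = None   # points with coordinate <= point[axis]
    right: Optional["KDNode"] = None  # points with coordinate >= point[axis]


def build_kdtree(points: Sequence[Point], depth: int = 0) -> Optional[KDNode]:
    """Build a k-d tree by splitting on the median, cycling through the axes."""
    if not points:
        return None
    axis = depth % len(points[0])
    pts = sorted(points, key=lambda p: p[axis])
    mid = len(pts) // 2
    return KDNode(
        point=pts[mid],
        axis=axis,
        left=build_kdtree(pts[:mid], depth + 1),
        right=build_kdtree(pts[mid + 1:], depth + 1),
    )


def range_search(node: Optional[KDNode], lo: Point, hi: Point) -> list[Point]:
    """Return all points p with lo[i] <= p[i] <= hi[i] in every dimension i."""
    if node is None:
        return []
    found = []
    if all(l <= c <= h for l, c, h in zip(lo, node.point, hi)):
        found.append(node.point)
    # Descend only into subtrees whose half-space can intersect the query box.
    if lo[node.axis] <= node.point[node.axis]:
        found += range_search(node.left, lo, hi)
    if node.point[node.axis] <= hi[node.axis]:
        found += range_search(node.right, lo, hi)
    return found
```

For example, `range_search(build_kdtree(points), (3, 1), (8, 5))` visits only the branches whose splitting coordinate allows a match. The same code works unchanged for 3D points, because the split axis is derived from the tuple length.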

3D k-d tree

Examples of applications


14 Spatial Access Methods
The previous methods handle points, not objects with spatial extent. How can extended objects be handled? By modifying point access methods.
Classification of methods:
– based on the technique used:
  – transformation (object mapping)
  – overlapping regions (object bounding)
  – clipping (object duplication)
  – multiple layers
– based on the "base type": the primarily supported spatial data type, mostly intervals

15 Spatial Access Methods...

16 Spatial Access Methods... Transformation
– transform the object to a different representation
– then use PAMs or one-dimensional access methods
– possible options:
  – transform each object to a higher-dimensional point
  – transform each object to one-dimensional intervals using space-filling curves

17 Spatial Access Methods... Transformation …
– mapping to a higher-dimensional space
– e.g., the four numbers that describe a rectangle are treated as a point in four-dimensional space
– use one of the PAMs for this new point
– options:
  – endpoint transformation: x and y coordinates of two diagonal corners
  – midpoint transformation: x and y coordinates of the center, plus height and width
– more complex objects: approximate with a rectangle or sphere; as a result, the PAM provides only a partial result
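The two mappings named on this slide can be sketched in a few lines. The rectangle layout `(x_min, y_min, x_max, y_max)` and the function names are assumptions made for illustration; the slide only states which four numbers each transformation uses.

```python
# A rectangle is assumed to be stored as (x_min, y_min, x_max, y_max).
Rect = tuple[float, float, float, float]
Point4D = tuple[float, float, float, float]


def endpoint_transform(r: Rect) -> Point4D:
    """Map a rectangle to the 4D point made of its two diagonal corners."""
    x_min, y_min, x_max, y_max = r
    return (x_min, y_min, x_max, y_max)


def midpoint_transform(r: Rect) -> Point4D:
    """Map a rectangle to the 4D point (center_x, center_y, width, height)."""
    x_min, y_min, x_max, y_max = r
    return (
        (x_min + x_max) / 2,
        (y_min + y_max) / 2,
        x_max - x_min,
        y_max - y_min,
    )
```

Either 4D point can then be stored in any PAM; the choice between the two mainly affects how queries have to be rewritten in the transformed space.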

18 Spatial Access Methods... Transformation (mapping to a higher-dimensional space) …
– cons:
  – formulating point and range queries is more difficult in the new (dual) space
  – finite search regions may map to infinite search regions in dual space
  – more complex queries with spatial predicates may not be expressible at all
  – depending on the mapping, the distribution of points in dual space may be highly non-uniform, even if the data in the original space is uniform
  – the images of two close objects may be far apart in dual space
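A one-dimensional case makes the "finite becomes infinite" drawback concrete. This sketch assumes the endpoint transformation applied to intervals: an interval [lo, hi] becomes the 2D point (lo, hi), and an intersection query with the finite interval [q_lo, q_hi] becomes a region that is unbounded on two sides. The helper names are hypothetical.

```python
import math

Interval = tuple[float, float]
Region = tuple[tuple[float, float], tuple[float, float]]


def intersection_query_region(q_lo: float, q_hi: float) -> Region:
    """Dual-space region holding every interval that intersects [q_lo, q_hi].

    An interval (lo, hi) intersects the query iff lo <= q_hi and hi >= q_lo,
    so the region is unbounded toward small lo and toward large hi.
    """
    return ((-math.inf, q_hi), (q_lo, math.inf))


def in_region(point: Interval, region: Region) -> bool:
    """Check whether a dual-space point lies inside an axis-parallel region."""
    return all(low <= c <= high for c, (low, high) in zip(point, region))


intervals = [(0, 2), (1, 5), (6, 9), (-100, 100)]
region = intersection_query_region(3, 4)
matches = [iv for iv in intervals if in_region(iv, region)]
```

Here `matches` contains `(1, 5)` and `(-100, 100)`. The second interval lies far from the query endpoints in dual space, which is why the search region cannot be bounded.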

19 Spatial Access Methods... Transformation … Space-Filling Curves for Extended Objects
– has fewer drawbacks
– represents extended objects with grid cells
– equivalent to representing an extended object as the union of several simpler objects
– equivalent to a list of one-dimensional intervals that define the positions of the grid cells
– variations: z-ordering, Hilbert R-tree, UB-tree
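As a rough illustration of z-ordering, the sketch below interleaves the bits of integer cell coordinates and then merges consecutive z-values into one-dimensional intervals. Putting the x bit in the lower position of each pair is an arbitrary convention chosen here, and `z_value` and `cells_to_intervals` are illustrative names.

```python
def z_value(x: int, y: int, bits: int = 16) -> int:
    """Interleave the low `bits` bits of x and y into one z-order key."""
    z = 0
    for i in range(bits):
        z |= ((x >> i) & 1) << (2 * i)      # x bit goes to the even position
        z |= ((y >> i) & 1) << (2 * i + 1)  # y bit goes to the odd position
    return z


def cells_to_intervals(cells, bits: int = 16) -> list[tuple[int, int]]:
    """Turn the grid cells covering an object into sorted z-value intervals."""
    intervals: list[list[int]] = []
    for z in sorted({z_value(x, y, bits) for x, y in cells}):
        if intervals and z == intervals[-1][1] + 1:
            intervals[-1][1] = z          # extend the current run
        else:
            intervals.append([z, z])      # start a new run
    return [(lo, hi) for lo, hi in intervals]
```

The resulting intervals can be stored in an ordinary one-dimensional index such as a B+-tree, keyed on the interval start.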

20 Spatial Access Methods... Overlapping regions
– idea: different data buckets correspond to mutually overlapping subspaces
– any object can be placed in a single bucket
– regions are extended to accommodate new data
– the number of search paths increases (due to overlap), even for point queries
– problem: performance, especially when objects are large compared to the universe
  – very large objects lead to an ineffective index; the whole index may have to be searched
– minor problem: ambiguity during insertion, since any subspace could be picked for enlargement
  – solution:
    » pick the subspace that causes minimal additional overlap
    » or the one that requires the least enlargement
    » or the one that takes the least time

21 Spatial Access Methods... Overlapping regions
– R-tree: hierarchy of nested intervals
  – nodes correspond to intervals
  – the intervals of a node's descendants are contained in the interval of that node
  – nodes on the same level may overlap
  – leaf node: MBB (minimum bounding box) and a reference to the actual data
  – each node has between m (lower threshold) and M (upper threshold) entries
  – m ensures efficient storage utilization
  – the R-tree is height-balanced
  – search is similar to the B-tree, but several intervals on each level may satisfy the search
  – search provides candidate results, which require refinement
  – insertion: only one path is traversed; at each node, pick the child that requires the least enlargement to cover the object
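The search and child-selection rules above can be sketched as follows. This is a simplified in-memory model, not a full R-tree: node splitting, the m/M thresholds and disk pages are left out, leaf entries are modeled as nodes without children, and all names are assumptions made for illustration. Breaking ties by smallest area follows Guttman's original description rather than the slide.

```python
from dataclasses import dataclass, field

Rect = tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


def area(r: Rect) -> float:
    return (r[2] - r[0]) * (r[3] - r[1])


def union(a: Rect, b: Rect) -> Rect:
    """Smallest rectangle covering both a and b."""
    return (min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3]))


def intersects(a: Rect, b: Rect) -> bool:
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def enlargement(mbb: Rect, r: Rect) -> float:
    """Extra area needed for mbb to also cover r."""
    return area(union(mbb, r)) - area(mbb)


@dataclass
class Node:
    mbb: Rect
    children: list["Node"] = field(default_factory=list)
    obj_id: object = None  # set only on leaf entries


def search(node: Node, query: Rect) -> list:
    """Return ids of candidate objects whose MBB intersects the query."""
    if not intersects(node.mbb, query):
        return []
    if not node.children:          # leaf entry: report a candidate
        return [node.obj_id]
    hits = []
    for child in node.children:    # several children may qualify
        hits += search(child, query)
    return hits


def choose_child(node: Node, r: Rect) -> Node:
    """Pick the child needing least enlargement; ties go to the smaller MBB."""
    return min(node.children, key=lambda c: (enlargement(c.mbb, r), area(c.mbb)))
```

Because `search` only compares bounding boxes, its output is a candidate set; a refinement step against the exact geometry is still needed, as the slide notes.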

22 Spatial Access Methods... Overlapping regions –R-tree

23 Spatial Access Methods... Overlapping regions –R-tree + Add node

24 Spatial Access Methods... Overlapping regions –R-tree + Add node Solution 1

25 Spatial Access Methods... Overlapping regions –R-tree + Add node Solution 2

26 Spatial Access Methods... Overlapping regions –R-tree + Add node Incorrect

27 Spatial Access Methods... Overlapping regions –R-tree + Delete node Delete R8: deletion may require adjusting the size of the covering interval

28 Spatial Access Methods... Overlapping regions –R-tree + Delete node Delete R8: deletion may require adjusting the size of the covering interval

29 Spatial Access Methods... Overlapping regions
– R*-tree: similar to the R-tree
– forced reinsert policy:
  – if a node overflows, don't split right away
  – remove some entries (30% of M) from the node and reinsert them
– deletion and search are the same as in the R-tree
– splitting policy:
  – all R-tree policies
  – minimize overlap between nodes on the same level (lower probability of multiple search paths)
  – minimize region perimeters (regions should be close to squares)
  – maximize storage utilization
– pro: 50% performance improvement
– con: CPU overhead for reinsertion
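The forced reinsert step can be sketched as a selection function. The slide only says that about 30% of M entries are removed; choosing the entries whose centers lie farthest from the center of the node's bounding box follows the original R*-tree paper and is an assumption relative to the slide, as are the function names.

```python
import math

Rect = tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


def center(r: Rect) -> tuple[float, float]:
    return ((r[0] + r[2]) / 2, (r[1] + r[3]) / 2)


def bounding_box(rects: list[Rect]) -> Rect:
    """Smallest rectangle covering all given rectangles."""
    return (
        min(r[0] for r in rects),
        min(r[1] for r in rects),
        max(r[2] for r in rects),
        max(r[3] for r in rects),
    )


def pick_for_reinsert(
    entries: list[Rect], max_entries: int, fraction: float = 0.3
) -> tuple[list[Rect], list[Rect]]:
    """Split an overflowing node's entries into (to_reinsert, to_keep).

    Removes about `fraction` of M entries, farthest from the node center first.
    """
    p = max(1, round(fraction * max_entries))
    node_center = center(bounding_box(entries))
    by_distance = sorted(
        entries,
        key=lambda r: math.dist(center(r), node_center),
        reverse=True,
    )
    return by_distance[:p], by_distance[p:]
```

The removed entries are then inserted again from the root, which gives them a chance to land in a better-fitting node before any split is attempted.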

30 Spatial Access Methods... Overlapping regions –R*-tree:

31 Comparative Studies
Experimental results
Search performance for R-tree, k-d-B-tree, and R+-tree (10,000 uniformly distributed rectangles of varying size):
– the k-d-B-tree can never compete with the R-tree variants
– not much difference between the R-tree and the R+-tree (the R+-tree is significantly more difficult to code)
– the R+-tree performs better when there is less overlap between rectangles
R*-tree compared with several variants of the R-tree:
– the R*-tree is the winner for queries, with the best storage utilization and insertion time
– (again, only disk accesses were measured)


33 Conclusions
– There are many different point and spatial access methods
– No method is superior to the others in every sense
– A method can be a clear winner on one benchmark and inferior on another!
– Reason:
  – so many different criteria for optimality
  – so many parameters that define performance
– Examples:
  – a good access method for dense data may not be good for sparse data
  – an index method optimized for point queries may be inefficient for region queries
  – a good method for a static environment may not be good for an environment with many insertions and deletions

34 Conclusions... Technology transfer
– Pick a method that is easy to understand, easy to implement, and robust
– Raw performance is not that important
– Try to optimize performance through a highly tuned implementation
– Examples:
  – Quadtree in SICAD and Smallworld GIS
  – R-tree in Informix
  – Z-ordering in Oracle

References
– Volker Gaede (IC-Parc, Imperial College, London) and Oliver Günther (Humboldt-Universität, Berlin), "Multidimensional Access Methods"

THANKS FOR WATCHING