Rethinking Choices for Multi-dimensional Point Indexing You Jung Kim and Jignesh M. Patel University of Michigan.

Presentation transcript:

Rethinking Choices for Multi-dimensional Point Indexing You Jung Kim and Jignesh M. Patel University of Michigan

Outline
- Motivation
- Index structures
- Experimental evaluation
- Conclusion

Motivation
- Need for multi-dimensional point indexing in low- to medium-dimensional space
  - Inherent nature of problems
  - Use of dimensionality reduction techniques, e.g. PCA
- Examples
  - Spectral/image search (in feature space)
  - Similarity search in sequence and structure databases
  - Subsequence matching in time-series databases
- Frequent choice: R*-tree
- Is this the right choice?

Index Structures
- R*-tree: data partition; balanced tree
- Quadtree: balanced/disjoint space partition; unbalanced tree
- Pyramid-Technique: unbalanced/disjoint space partition; balanced tree
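To make the Quadtree's "balanced/disjoint space partition, unbalanced tree" property concrete, here is a minimal 2-D sketch. It is not the implementation evaluated in the talk: the class name `QuadNode`, the leaf `capacity` of 4, and the in-memory layout are all assumptions chosen for illustration. Each cell is split into four equal, non-overlapping quadrants, so the tree grows deeper only where points are dense.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

Point = Tuple[float, float]


@dataclass
class QuadNode:
    """Hypothetical 2-D quadtree node covering a square cell of side `size`."""
    x0: float
    y0: float
    size: float
    capacity: int = 4  # assumed leaf capacity, not taken from the talk
    points: List[Point] = field(default_factory=list)
    children: Optional[List["QuadNode"]] = None

    def _child_for(self, p: Point) -> "QuadNode":
        half = self.size / 2
        ix = 1 if p[0] >= self.x0 + half else 0
        iy = 1 if p[1] >= self.y0 + half else 0
        return self.children[iy * 2 + ix]

    def _split(self) -> None:
        # Regular split: four equal quadrants that never overlap.
        half = self.size / 2
        self.children = [
            QuadNode(self.x0 + dx * half, self.y0 + dy * half, half, self.capacity)
            for dy in (0, 1) for dx in (0, 1)
        ]
        pts, self.points = self.points, []
        for p in pts:
            self._child_for(p).insert(p)

    def insert(self, p: Point) -> None:
        if self.children is not None:
            self._child_for(p).insert(p)
            return
        self.points.append(p)
        # The size guard stops endless splitting on duplicate points.
        if len(self.points) > self.capacity and self.size > 1e-9:
            self._split()

    def range_query(self, lo: Point, hi: Point) -> List[Point]:
        # Prune cells that do not intersect the closed rectangle [lo, hi].
        if (lo[0] > self.x0 + self.size or hi[0] < self.x0 or
                lo[1] > self.y0 + self.size or hi[1] < self.y0):
            return []
        if self.children is None:
            return [p for p in self.points
                    if lo[0] <= p[0] <= hi[0] and lo[1] <= p[1] <= hi[1]]
        out: List[Point] = []
        for child in self.children:
            out.extend(child.range_query(lo, hi))
        return out

    def depth(self) -> int:
        if self.children is None:
            return 1
        return 1 + max(c.depth() for c in self.children)


root = QuadNode(0.0, 0.0, 1.0)
pts = [(0.1, 0.1), (0.12, 0.11), (0.13, 0.14), (0.11, 0.13), (0.15, 0.12), (0.9, 0.9)]
for p in pts:
    root.insert(p)
```

With these clustered points the subtree around (0.1, 0.1) becomes several levels deep while the quadrant holding (0.9, 0.9) stays a single leaf, which is the unbalanced shape the slide refers to.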

Packed Quadtree
- Reduced disk footprint for the index
- Clustering sibling nodes
[Figure: Regular Quadtree vs. Packed Quadtree]
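The slide does not spell out the packing layout, so the following is only a rough sketch of why clustering sibling nodes can shrink the index footprint. It assumes, purely for illustration, that an unpacked tree spends one disk page per node and that a packed tree stores each group of siblings together in one page. A node is modelled as a plain list of its children, and both function names are hypothetical.

```python
def count_pages_unpacked(node):
    """Assumed baseline: every node occupies its own page."""
    return 1 + sum(count_pages_unpacked(child) for child in node)


def count_pages_packed(node):
    """Assumed packed layout: the root takes one page and each
    group of sibling nodes shares a single page."""
    def sibling_groups(n):
        if not n:  # a leaf has no child group
            return 0
        return 1 + sum(sibling_groups(child) for child in n)
    return 1 + sibling_groups(node)


# Root with four children; only the first child is split further.
tree = [[[], [], [], []], [], [], []]
unpacked = count_pages_unpacked(tree)
packed = count_pages_packed(tree)
```

For this small tree the assumed layouts need 9 pages unpacked and 3 pages packed. Real page capacities and node sizes would change the numbers, but not the direction of the effect.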

Experimental Setup
- Three indices and a file scan in SHORE
- Synthetic and real datasets
  - Uniformly distributed point data
  - MAPS Catalog data
- Query workload
  - Random and skewed queries following the underlying data distribution
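One plausible reading of the query workload above is sketched below. It is an assumption, not the authors' generator: "random" queries are taken to be windows centred uniformly in the unit cube, and "skewed" queries are windows centred on points sampled from the dataset, so that they follow the data distribution. The function name `make_queries` and its parameters are hypothetical.

```python
import random


def make_queries(data, n_queries, side, skewed, seed=0):
    """Return n_queries axis-aligned windows (lo, hi) inside [0, 1]^d."""
    rng = random.Random(seed)
    dims = len(data[0])
    queries = []
    for _ in range(n_queries):
        if skewed:
            # Centre the window on an existing data point.
            center = rng.choice(data)
        else:
            # Centre the window uniformly at random.
            center = [rng.random() for _ in range(dims)]
        lo = tuple(max(0.0, c - side / 2) for c in center)
        hi = tuple(min(1.0, c + side / 2) for c in center)
        queries.append((lo, hi))
    return queries


data = [(0.10, 0.10), (0.11, 0.12), (0.12, 0.10)]
skewed_q = make_queries(data, 5, side=0.05, skewed=True)
random_q = make_queries(data, 5, side=0.05, skewed=False)
```

Under this reading, every skewed window contains at least one data point, whereas a random window over clustered data will often be empty.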

Experiments with uniform data
[Figure: Total execution time for varying data dimensionality. Panels: Uniform-2D, Uniform-4D, Uniform-8D]

Experiments with skewed data
[Figure: Total execution time for varying data dimensionality. Panels: MAPS-2D, MAPS-4D, MAPS-8D]

Analysis with skewed data
- The relatively poor performance of the R*-tree
  - High overlap amongst MBRs
  - Skewed data points are spread under several non-leaf nodes
- The relatively poor performance of the Pyramid-Technique
  - The unbalanced space split is adversarial for skewed data
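The overlap argument can be illustrated with a small calculation. This sketch is not from the talk; `mbr` and `overlap_volume` are hypothetical helper names. It computes the minimum bounding rectangle (MBR) of a point group and the volume shared by two MBRs. Any positive overlap means a query falling in the shared region has to visit both subtrees, whereas disjoint cells such as quadtree quadrants always share zero volume.

```python
def mbr(points):
    """Minimum bounding rectangle of a non-empty list of d-dimensional points."""
    dims = len(points[0])
    lo = tuple(min(p[i] for p in points) for i in range(dims))
    hi = tuple(max(p[i] for p in points) for i in range(dims))
    return lo, hi


def overlap_volume(a, b):
    """Volume of the intersection of two rectangles given as (lo, hi)."""
    vol = 1.0
    for a_lo, a_hi, b_lo, b_hi in zip(a[0], a[1], b[0], b[1]):
        side = min(a_hi, b_hi) - max(a_lo, b_lo)
        if side <= 0:
            return 0.0  # separated or merely touching in this dimension
        vol *= side
    return vol


# Two data-driven MBRs that overlap, and two disjoint grid cells.
group_a = mbr([(0, 0), (2, 2)])
group_b = mbr([(1, 1), (3, 3)])
cell_a = ((0, 0), (1, 1))
cell_b = ((1, 0), (2, 1))
```

Here `overlap_volume(group_a, group_b)` is 1.0, while `overlap_volume(cell_a, cell_b)` is 0.0.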

Quadtree
- Uses the buffer pool very efficiently
- Better spatial locality with skewed queries
[Figure panels: R*-tree, Quadtree]

Effect of packing in Quadtree
[Figure: Total execution time of packed and unpacked Quadtree. Panels: MAPS-2D, MAPS-4D, MAPS-8D]

Conclusion
- Quadtree outperforms R*-tree and Pyramid-Technique, especially for skewed (real) datasets
- Efficiency of the Quadtree comes from
  - Packing technique
  - Regular and disjoint partitioning
  - Better spatial locality and an efficient use of buffer
- Analytical cost model agrees with experimental results
  - i.e. our claims are not due to implementation differences or dataset peculiarities

Questions?