A Simple and Efficient Algorithm for R-Tree Packing (STR)
Scott T. Leutenegger, Mario A. Lopez, Jeffrey Edgington
Sunho Cho, Jeonghun Ahn



Overview
- R-Tree Packing
- Packing Algorithms
  - Nearest-X
  - Hilbert Sort
  - Sort-Tile-Recursive
- Experimental Methodology and Results
  - Synthetic
  - GIS
  - VLSI
  - CFD
- Conclusions

Packing
- R-trees are dynamic structures: their contents can be modified without reconstructing the entire tree.
- Disadvantages of inserting one element at a time into an R-tree:
  - High load time
  - Suboptimal space utilization
  - Poor R-tree structure
- Preprocessing is advantageous for static data:
  - Nearly 100% space utilization and improved query times

Basic Algorithm
1. Preprocess the data file so that the r rectangles are ordered in ⌈r/b⌉ consecutive groups of b rectangles, where each group of b is intended to be placed in the same leaf-level node (the last group may contain fewer than b rectangles).
2. Load the ⌈r/b⌉ groups of rectangles into pages and output the (MBR, page-number) pair for each leaf-level page into a temporary file.
3. Recursively pack these MBRs into nodes at the next level, proceeding upwards, until the root node is created.
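The three steps above can be sketched in memory as follows. This is a minimal illustration, not the authors' implementation: the names `mbr` and `pack`, the tuple layout `(xmin, ymin, xmax, ymax)`, and the dict-based node are all assumptions, and the temporary file of step 2 is replaced by a Python list. The `order` argument stands in for whichever ordering (NX, HS, or STR) is applied at each level.

```python
def mbr(rects):
    """Minimum bounding rectangle of (xmin, ymin, xmax, ymax) tuples."""
    return (min(r[0] for r in rects), min(r[1] for r in rects),
            max(r[2] for r in rects), max(r[3] for r in rects))


def pack(rects, b, order):
    """Build a packed tree bottom-up and return the root node.

    rects: list of (xmin, ymin, xmax, ymax) tuples
    b:     node capacity (must be at least 2)
    order: hypothetical callback that takes a list of (mbr, payload)
           entries and returns them in packing order
    """
    if not rects or b < 2:
        raise ValueError("need at least one rectangle and b >= 2")
    # Leaf entries pair each rectangle with its object id.
    entries = [(r, i) for i, r in enumerate(rects)]
    level = 0
    while True:
        entries = order(entries)                        # step 1: order
        groups = [entries[i:i + b]                      # step 2: fill nodes
                  for i in range(0, len(entries), b)]
        nodes = [{"level": level, "entries": g} for g in groups]
        if len(nodes) == 1:
            return nodes[0]                             # root reached
        # Step 3: the (MBR, node) pairs become the input of the next level.
        entries = [(mbr([e[0] for e in n["entries"]]), n) for n in nodes]
        level += 1
```

For example, `pack(rects, 3, lambda es: sorted(es, key=lambda e: e[0][0] + e[0][2]))` packs with a Nearest-X style ordering; ten rectangles with b = 3 give 4 leaves, 2 internal nodes, and a root.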

R-Tree Packing Algorithms
- Nearest-X (NX)
- Hilbert Sort (HS)
- Sort-Tile-Recursive (STR)
The three algorithms differ only in how the rectangles are ordered at each level.

Nearest-X
- Rectangles are sorted by the x-coordinate of their centers.
- The sorted rectangles are then split into consecutive groups of size b.
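A minimal sketch of the Nearest-X ordering, assuming rectangles are `(xmin, ymin, xmax, ymax)` tuples; the function name `nearest_x_groups` is hypothetical.

```python
def nearest_x_groups(rects, b):
    """Sort rectangles by the x-coordinate of their center, then cut the
    sorted list into consecutive groups of at most b rectangles."""
    ordered = sorted(rects, key=lambda r: (r[0] + r[2]) / 2.0)
    return [ordered[i:i + b] for i in range(0, len(ordered), b)]


rects = [(4, 0, 5, 1), (0, 0, 1, 1), (2, 5, 3, 6), (1, 9, 2, 10), (3, 2, 4, 3)]
groups = nearest_x_groups(rects, 2)
# groups[0] holds (0, 0, 1, 1) and (1, 9, 2, 10): neighbors in x, far apart in y.
```

Because the y-coordinate is ignored, a group can span the full height of the data space, which produces long, thin leaf MBRs.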

Hilbert Sort
- Rectangles are ordered using the Hilbert space-filling curve: the center points of the rectangles are sorted by their distance from the origin, measured along the Hilbert curve.
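One way to realize this ordering is sketched below. It assumes rectangle centers lie in the unit square and snaps them to a 2^k by 2^k grid before computing the curve position; the grid resolution, the function names, and the parameter `k` are illustrative choices, not details from the slides. The index computation is the standard iterative cell-to-distance conversion for the Hilbert curve.

```python
def hilbert_index(n, x, y):
    """Distance along the Hilbert curve of cell (x, y) in an n-by-n grid,
    where n is a power of two and 0 <= x, y < n."""
    d = 0
    s = n // 2
    while s > 0:
        rx = 1 if (x & s) else 0
        ry = 1 if (y & s) else 0
        d += s * s * ((3 * rx) ^ ry)
        # Rotate or flip the quadrant so the sub-curve is oriented correctly.
        if ry == 0:
            if rx == 1:
                x = n - 1 - x
                y = n - 1 - y
            x, y = y, x
        s //= 2
    return d


def hilbert_groups(rects, b, k=10):
    """Order (xmin, ymin, xmax, ymax) rectangles by the Hilbert index of
    their centers (assumed to lie in [0, 1) x [0, 1)), then cut the
    ordered list into groups of at most b rectangles."""
    n = 1 << k

    def key(r):
        cx = (r[0] + r[2]) / 2.0
        cy = (r[1] + r[3]) / 2.0
        gx = min(n - 1, max(0, int(cx * n)))  # clamp to the grid
        gy = min(n - 1, max(0, int(cy * n)))
        return hilbert_index(n, gx, gy)

    ordered = sorted(rects, key=key)
    return [ordered[i:i + b] for i in range(0, len(ordered), b)]
```

Consecutive cells along the curve are always grid neighbors, so rectangles that end up in the same group tend to be close in both x and y.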

Sort-Tile-Recursive
- Let P = ⌈r/b⌉ be the number of leaf pages and S = ⌈√P⌉.
- Sort the rectangles by x-coordinate and partition them into S vertical slices.
- A slice consists of a run of S×b consecutive rectangles from the sorted list.
- Sort the rectangles of each slice by y-coordinate.
- Pack them into nodes by grouping them into runs of size b.
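The tiling step for one level can be sketched as follows. The slide does not say how S is chosen; this sketch assumes S = ⌈√P⌉ with P = ⌈r/b⌉ leaf pages, as in the published description of STR. The function name `str_groups` and the tuple layout `(xmin, ymin, xmax, ymax)` are illustrative.

```python
from math import ceil, sqrt


def str_groups(rects, b):
    """One level of Sort-Tile-Recursive: return leaf groups of at most b
    rectangles, each rectangle being (xmin, ymin, xmax, ymax)."""
    if not rects:
        return []
    pages = ceil(len(rects) / b)      # P: number of leaf pages
    slices = ceil(sqrt(pages))        # S: number of vertical slices (assumed)
    by_x = sorted(rects, key=lambda r: (r[0] + r[2]) / 2.0)
    groups = []
    run = slices * b                  # rectangles per vertical slice
    for i in range(0, len(by_x), run):
        # Within a slice, order by the y-coordinate of the center.
        by_y = sorted(by_x[i:i + run], key=lambda r: (r[1] + r[3]) / 2.0)
        groups.extend(by_y[j:j + b] for j in range(0, len(by_y), b))
    return groups


# 16 points on a 4x4 grid with b = 4 give P = 4 pages and S = 2 slices,
# so every leaf covers a 2x2 tile of points.
points = [(x, y, x, y) for x in range(4) for y in range(4)]
leaves = str_groups(points, 4)
```

Applying the same function to the MBRs of the resulting leaves, level by level, gives the upper levels of the tree.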

Classes of Data
- Synthetic
  - Uniformly distributed point and region data
- Geographic Information System (GIS)
  - Mildly skewed line segment data
- VLSI
  - Region data, highly skewed in location and size
- Computational Fluid Dynamics (CFD)
  - Point data, highly skewed in location

Synthetic Data - Uniformly Distributed Data
- Hilbert Sort requires 42% more disk accesses than STR for both point and range queries.
- NX performs as well as STR for point queries.

GIS TIGER Data - Mildly Skewed Data
- HS requires up to 49% more disk accesses than STR for both point and region queries.
- As the region size increases, the difference between STR and HS becomes smaller.
[Figures: areas and perimeters; number of disk accesses as a function of query and buffer sizes]

VLSI - Highly Skewed Data
- For region data, HS performed 3% - 11% faster than STR for point queries and roughly the same for region queries.
[Figures: areas and perimeters; number of disk accesses as a function of query and buffer sizes]

CFD - Highly Skewed Data
- For point data, HS required % more disk accesses than STR for point queries, and roughly the same for region queries.
[Figures: CFD data (51,510 nodes), areas and perimeters; CFD data (52,510 nodes), disk accesses as a function of query and buffer sizes]

Conclusions
- All algorithms are based on heuristics.
- None of them is best for all datasets.
- NX is not competitive.
- The choice between HS and STR depends on the type of dataset.
- The importance of the choice of packing algorithm diminishes as either the query size or the buffer size increases.