R-tree Analysis. R-trees - performance analysis How many disk (=node) accesses we’ll need for range nn spatial joins why does it matter?

Slides:



Advertisements
Similar presentations
Ranking Outliers Using Symmetric Neighborhood Relationship Wen Jin, Anthony K.H. Tung, Jiawei Han, and Wei Wang Advances in Knowledge Discovery and Data.
Advertisements

CMU SCS : Multimedia Databases and Data Mining Lecture #11: Fractals: M-trees and dim. curse (case studies – Part II) C. Faloutsos.
Spatial Indexing SAMs. Spatial Indexing Point Access Methods can index only points. What about regions? Z-ordering and quadtrees Use the transformation.
Spatial and Temporal Data Mining V. Megalooikonomou Spatial Access Methods (SAMs) II (some slides are based on notes by C. Faloutsos)
CMU SCS : Multimedia Databases and Data Mining Lecture #7: Spatial Access Methods - Metric trees C. Faloutsos.
CMU SCS : Multimedia Databases and Data Mining Lecture #10: Fractals - case studies - I C. Faloutsos.
Spatial Join Queries. Spatial Queries Given a collection of geometric objects (points, lines, polygons,...) organize them on disk, to answer point queries.
1 Top-k Spatial Joins
Fast Algorithms For Hierarchical Range Histogram Constructions
Indexing and Range Queries in Spatio-Temporal Databases
Danzhou Liu Ee-Peng Lim Wee-Keong Ng
CMU SCS : Multimedia Databases and Data Mining Lecture #6: Spatial Access Methods Part III: R-trees C. Faloutsos.
CMU SCS : Multimedia Databases and Data Mining Lecture #11: Fractals: M-trees and dim. curse (case studies – Part II) C. Faloutsos.
CMU SCS : Multimedia Databases and Data Mining Lecture #11: Fractals - case studies Part III (regions, quadtrees, knn queries) C. Faloutsos.
1 CSIS 7101: CSIS 7101: Spatial Data (Part 2) Efficient Processing of Spatial Joins Using R-trees Rollo Chan Chu Chung Man Mak Wai Yip Vivian Lee Eric.
2-dimensional indexing structure
Quantile-Based KNN over Multi- Valued Objects Wenjie Zhang Xuemin Lin, Muhammad Aamir Cheema, Ying Zhang, Wei Wang The University of New South Wales, Australia.
Spatial Indexing SAMs. Spatial Indexing Point Access Methods can index only points. What about regions? Z-ordering and quadtrees Use the transformation.
Spatial Indexing for NN retrieval
Spatial Indexing SAMs. Spatial Access Methods PAMs Grid File kd-tree based (LSD-, hB- trees) Z-ordering + B+-tree R-tree Variations: R*-tree, Hilbert.
Spatio-Temporal Databases
Spatial Indexing SAMs.
Chapter 3: Data Storage and Access Methods
An Incremental Refining Spatial Join Algorithm for Estimating Query Results in GIS Wan D. Bae, Shayma Alkobaisi, Scott T. Leutenegger Department of Computer.
R-tree Analysis. R-trees - performance analysis How many disk (=node) accesses we’ll need for range nn spatial joins why does it matter?
R-Trees 2-dimensional indexing structure. R-trees 2-dimensional version of the B-tree: B-tree of maximum degree 8; degree between 3 and 8 Internal nodes.
Trip Planning Queries F. Li, D. Cheng, M. Hadjieleftheriou, G. Kollios, S.-H. Teng Boston University.
Advanced Data Structures NTUA 2007 R-trees and Grid File.
Spatial Indexing. Spatial Queries Given a collection of geometric objects (points, lines, polygons,...) organize them on disk, to answer point queries.
CMU SCS : Multimedia Databases and Data Mining Lecture #6: Spatial Access Methods Part III: R-trees C. Faloutsos.
Fast Subsequence Matching in Time-Series Databases Christos Faloutsos M. Ranganathan Yannis Manolopoulos Department of Computer Science and ISR University.
CMU SCS : Multimedia Databases and Data Mining Lecture #10: Fractals: M-trees and dim. curse (case studies – Part II) C. Faloutsos.
AAU A Trajectory Splitting Model for Efficient Spatio-Temporal Indexing Presented by YuQing Zhang  Slobodan Rasetic Jorg Sander James Elding Mario A.
The X-Tree An Index Structure for High Dimensional Data Stefan Berchtold, Daniel A Keim, Hans Peter Kriegel Institute of Computer Science Munich, Germany.
Towards Robust Indexing for Ranked Queries Dong Xin, Chen Chen, Jiawei Han Department of Computer Science University of Illinois at Urbana-Champaign VLDB.
Top-k Similarity Join over Multi- valued Objects Wenjie Zhang Jing Xu, Xin Liang, Ying Zhang, Xuemin Lin The University of New South Wales, Australia.
Handling Spatial Data In P2P Systems Verena Kantere, Timos Sellis, Yannis Kouvaras.
CMU SCS : Multimedia Databases and Data Mining Lecture #12: Fractals - case studies Part III (quadtrees, knn queries) C. Faloutsos.
Advanced Data Structures NTUA 2007 R-trees and Grid File.
Nearest Neighbor Queries Chris Buzzerd, Dave Boerner, and Kevin Stewart.
Efficient Processing of Top-k Spatial Preference Queries
Spatio-temporal Pattern Queries M. Hadjieleftheriou G. Kollios P. Bakalov V. J. Tsotras.
Bin Yao (Slides made available by Feifei Li) R-tree: Indexing Structure for Data in Multi- dimensional Space.
August 30, 2004STDBM 2004 at Toronto Extracting Mobility Statistics from Indexed Spatio-Temporal Datasets Yoshiharu Ishikawa Yuichi Tsukamoto Hiroyuki.
R-trees: An Average Case Analysis. R-trees - performance analysis How many disk (=node) accesses we ’ ll need for range nn spatial joins why does it matter?
Efficient OLAP Operations in Spatial Data Warehouses Dimitris Papadias, Panos Kalnis, Jun Zhang and Yufei Tao Department of Computer Science Hong Kong.
1 CSIS 7101: CSIS 7101: Spatial Data (Part 1) The R*-tree : An Efficient and Robust Access Method for Points and Rectangles Rollo Chan Chu Chung Man Mak.
1 Complex Spatio-Temporal Pattern Queries Cahide Sen University of Minnesota.
The Power-Method: A Comprehensive Estimation Technique for Multi- Dimensional Queries Yufei Tao U. Hong Kong Christos Faloutsos CMU Dimitris Papadias Hong.
23 1 Christian Böhm 1, Florian Krebs 2, and Hans-Peter Kriegel 2 1 University for Health Informatics and Technology, Innsbruck 2 University of Munich Optimal.
STR: A Simple and Efficient Algorithm for R-Tree Packing.
R* Tree By Rohan Sadale Akshay Kulkarni.  Motivation  Optimization criteria for R* Tree  High level Algorithm  Example  Performance Agenda.
A Spatial Index Structure for High Dimensional Point Data Wei Wang, Jiong Yang, and Richard Muntz Data Mining Lab Department of Computer Science University.
CMU SCS : Multimedia Databases and Data Mining Lecture #7: Spatial Access Methods - Metric trees C. Faloutsos.
Rethinking Choices for Multi-dimensional Point Indexing You Jung Kim and Jignesh M. Patel University of Michigan.
Jeremy Iverson & Zhang Yun 1.  Chapter 6 Key Concepts ◦ Structures and access methods ◦ R-Tree  R*-Tree  Mobile Object Indexing  Questions 2.
1 Spatial Query Processing using the R-tree Donghui Zhang CCIS, Northeastern University Feb 8, 2005.
1 R-Trees Guttman. 2 Introduction Range queries in multiple dimensions: Computer Aided Design (CAD) Geo-data applications Support special data objects.
Strategies for Spatial Joins
Spatial Indexing.
15-826: Multimedia Databases and Data Mining
15-826: Multimedia Databases and Data Mining
15-826: Multimedia Databases and Data Mining
Distributed Probabilistic Range-Aggregate Query on Uncertain Data
Spatial Indexing I R-trees
15-826: Multimedia Databases and Data Mining
R-trees: An Average Case Analysis
Efficient Processing of Top-k Spatial Preference Queries
Efficient Aggregation over Objects with Extent
C. Faloutsos Spatial Access Methods - R-trees
Presentation transcript:

R-tree Analysis

R-trees - performance analysis How many disk (=node) accesses we’ll need for range nn spatial joins why does it matter?

R-trees - performance analysis A: because we can design split etc algorithms accordingly; also, do query- optimization motivating question: on, e.g., split, should we try to minimize the area (volume)? the perimeter? the overlap? or a weighted combination? why?

R-trees - performance analysis How many disk accesses for range queries? query distribution wrt location? “ “ wrt size?

R-trees - performance analysis How many disk accesses for range queries? query distribution wrt location? uniform; (biased) “ “ wrt size? uniform

R-trees - performance analysis easier case: we know the positions of parent MBRs, eg:

R-trees - performance analysis How many times will P1 be retrieved (unif. queries)? P1 x1 x2

R-trees - performance analysis How many times will P1 be retrieved (unif. POINT queries)? P1 x1 x

R-trees - performance analysis How many times will P1 be retrieved (unif. POINT queries)? A: x1*x2 P1 x1 x

R-trees - performance analysis How many times will P1 be retrieved (unif. queries of size q1xq2)? P1 x1 x q1 q2

R-trees - performance analysis Minkowski sum q1 q2 q1/2 q2/2

R-trees - performance analysis How many times will P1 be retrieved (unif. queries of size q1xq2)? A: (x1+q1)*(x2+q2) P1 x1 x q1 q2

R-trees - performance analysis Thus, given a tree with n nodes (i=1,... n) we expect

R-trees - performance analysis Thus, given a tree with n nodes (i=1,... n) we expect ‘volume’ ‘surface area’ count

R-trees - performance analysis Observations: for point queries: only volume matters for horizontal-line queries: (q2=0): vertical length matters for large queries (q1, q2 >> 0): the count N matters

R-trees - performance analysis Observations (cont’ed) overlap: does not seem to matter formula: easily extendible to n dimensions (for even more details: [Pagel +, PODS93], [Kamel+, CIKM93])

R-trees - performance analysis Conclusions: splits should try to minimize area and perimeter ie., we want few, small, square-like parent MBRs rule of thumb: shoot for queries with q1=q2 = 0.1 (or =0.05 or so).

R-trees - performance analysis Range queries - how many disk accesses, if we just now that we have - N points in n-d space? A: ?

R-trees - performance analysis Range queries - how many disk accesses, if we just now that we have - N points in n-d space? A: can not tell! need to know distribution

R-trees - performance analysis What are obvious and/or realistic distributions?

R-trees - performance analysis What are obvious and/or realistic distributions? A: uniform A: Gaussian / mixture of Gaussians A: self-similar / fractal. Fractal dimension ~ intrinsic dimension

R-trees - performance analysis Formulas for range queries and k-nn queries: use fractal dimension [Kamel+, PODS94], [Korn+ ICDE2000] [Kriegel+, PODS97]

R-trees–performance analysis Assuming Uniform distribution: where And D is the density of the dataset, f the fanout [TS96], N the number of objects

Project Deadlines Phase 1 : Proposal Oct 11, 2002 Phase 2 : Progress Report Nov 11, 2002 Phase 3: Final Report Dec 10, 2002