Published by Ridwan Kusumo. Modified over 6 years ago.
1
Efficient Cost Models for Spatial Queries Using R-Trees
Reference: Y. Theodoridis, E. Stefanakis, T. Sellis, "Efficient cost models for spatial queries using R-trees," IEEE Transactions on Knowledge and Data Engineering, 2000.
Speaker: Kai-Yun Ho
2019/2/25 MCSE LAB
2
Outline
Introduction
Background
Analytical Cost Models for Spatial Queries
  Selection Queries
  Join Queries
  Introducing a Path Buffer
Evaluation of the Cost Models
Conclusions
3
Introduction
Spatial queries issued by users of a spatial DBMS (SDBMS) usually involve selection (point or range) and join operations.
We present analytical models that estimate the cost of selection and join queries on R-tree-based structures.
4
Background (1/2)
The processing of any type of spatial query can be accelerated when a spatial index exists.
Selection queries: find all data rectangles that overlap the query window q.
Join queries: find all pairs of rectangles that overlap each other.
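The two query types above can be illustrated without any index at all. The following is a minimal brute-force sketch, not the R-tree algorithm from the paper; the tuple layout `(xmin, ymin, xmax, ymax)` and the function names are assumptions made for this example.

```python
from itertools import product

# Illustrative only: a rectangle is a tuple (xmin, ymin, xmax, ymax).

def overlaps(a, b):
    """True if the axis-aligned rectangles a and b share at least one point."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]

def selection_query(data, q):
    """All data rectangles that overlap the query window q."""
    return [r for r in data if overlaps(r, q)]

def join_query(data1, data2):
    """All pairs (r, s), r from data1 and s from data2, that overlap."""
    return [(r, s) for r, s in product(data1, data2) if overlaps(r, s)]

data1 = [(0, 0, 2, 2), (3, 3, 4, 4), (5, 0, 6, 1)]
data2 = [(1, 1, 3.5, 3.5), (7, 7, 8, 8)]
window = (1.5, 1.5, 3.2, 3.2)

print(selection_query(data1, window))  # two rectangles overlap the window
print(len(join_query(data1, data2)))   # two overlapping pairs
```

A spatial index such as an R-tree returns the same answers while avoiding the exhaustive comparison, which is why the cost of a query is measured in index node accesses.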
5
Background (2/2)
For both operations, the total cost is measured by the number of page accesses in the R-tree index.
By definition, the number of node accesses is always greater than or equal to the number of actual disk accesses.
Equality holds only when no buffering scheme exists.
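The gap between node accesses and disk accesses can be seen in a small simulation. This is a sketch under assumptions: the buffer modelled here simply keeps the most recently visited root-to-leaf path (one node per level), and the node names and paths are hypothetical.

```python
def count_accesses(paths, use_path_buffer):
    """Count accesses for a sequence of root-to-leaf traversals.

    paths: list of paths, each a list of node ids from root to leaf.
    Returns (node_accesses, disk_accesses).
    """
    node_accesses = 0
    disk_accesses = 0
    buffered = []  # most recently visited path, one node id per level
    for path in paths:
        for level, node in enumerate(path):
            node_accesses += 1
            hit = (use_path_buffer
                   and level < len(buffered)
                   and buffered[level] == node)
            if not hit:
                disk_accesses += 1  # node must be fetched from disk
        if use_path_buffer:
            buffered = list(path)
    return node_accesses, disk_accesses

# Hypothetical traversals of a three-level tree.
paths = [["root", "n1", "leaf1"], ["root", "n1", "leaf2"], ["root", "n2", "leaf3"]]

print(count_accesses(paths, use_path_buffer=False))  # (9, 9): equality without a buffer
print(count_accesses(paths, use_path_buffer=True))   # (9, 6): fewer disk accesses
```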
6
Background: R-tree
[Figure: example R-tree with nodes labelled r and A to G]
7
Analytical Cost Models for Spatial Queries
What is sought? A formula that estimates the average number NA of node accesses using only knowledge about data properties.
8
For Selection Queries (1/2)
The model estimates the number of nodes at level l that are intersected by the query window q.
9
For Selection Queries (2/2)
The estimate is a function of the data properties only: the number of data rectangles N_R1 and their density D_R1.
[Figure: terms labelled f0, f1, ..., fh-1 combined by a sum]
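The formulas on these two slides did not survive in this transcript. The sketch below shows how a cost estimate of this kind can be computed from N and D alone. Every expression in it is an assumption to be checked against the paper: a unit workspace, an assumed average node fanout `f`, per-level node counts N / f^l, a per-level density recurrence, and square node extents. The function and parameter names are made up for this example.

```python
import math

def selection_cost(N, D, q, f, n=2):
    """Estimated node accesses for a window query (assumed model form).

    N: number of data rectangles, D: data density,
    q: query window extents per dimension (length n), in a unit workspace,
    f: assumed average node fanout, n: dimensionality.
    """
    h = 1 + math.ceil(math.log(N / f, f))  # assumed tree height
    total = 1.0                            # the root is always accessed
    D_l = D
    for l in range(1, h):                  # levels 1 (leaves) to h - 1
        N_l = N / f ** l                   # assumed number of nodes at level l
        # Assumed density of node rectangles at level l, from the level below.
        D_l = (1 + (D_l ** (1 / n) - 1) / f ** (1 / n)) ** n
        s_l = (D_l / N_l) ** (1 / n)       # assumed average node extent
        # Expected number of level-l nodes intersected by the window q.
        total += N_l * math.prod(s_l + q_k for q_k in q)
    return total

point = selection_cost(10000, 0.1, (0.0, 0.0), f=50)
window = selection_cost(10000, 0.1, (0.05, 0.05), f=50)
print(point, window)
```

Under these assumptions, a larger query window always yields a larger estimate, because each level contributes a term that grows with the window extents.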
10
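For Join Queries
[Equation lost in extraction; only fragments of the conditions on the level index l and the height h survive]
For the upper levels of the two R-trees:
[Equation lost in extraction; surviving fragments mention R, h, and l]
Speaker note: because the number of accessed nodes is the same for both trees, NA(R1, R2, l1) = NA(R2, R1, l2).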
11
Introducing a Path Buffer (1/2)
The existence of such a buffer mainly affects the performance of the tree index that plays the role of the query set.
12
Disk Access (DA)
[Figure: disk accesses for the data set and the query set]
13
Introducing a Path Buffer (2/2)
With "query" tree R2 and "data" tree R1, two cases arise:
The propagation of R1 down to its leaf level adds no extra cost (disk accesses) to R2, which has already reached its leaf level.
Each propagation of R2 down to its lower levels adds an equal cost to R1.
14
Evaluation of the Cost Models
Synthetic and real data sets were used.
LBeach data set: 53,143 line segments (stored as rectangles) representing roads of Long Beach, California.
MGcounty data set: 39,221 line segments (stored as rectangles) representing roads of Montgomery County, Maryland.
[Figure panels: synthetic random, synthetic skewed, LBeach real data, MGcounty real data]
15
Evaluation of the Cost Models
Selection queries on the synthetic random data set.
[Figure: density D = 0.1]
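The slides quote a density D without defining it. A common definition, assumed here, is the total area of the data rectangles divided by the area of the workspace. The helper below is a hypothetical illustration for two dimensions, not code from the paper.

```python
def density(rects, workspace_area=1.0):
    """Total rectangle area divided by the workspace area (assumed definition).

    rects: iterable of (xmin, ymin, xmax, ymax) tuples.
    """
    total = sum((x2 - x1) * (y2 - y1) for x1, y1, x2, y2 in rects)
    return total / workspace_area

# Two hypothetical rectangles in a unit workspace: areas 0.04 and 0.06.
rects = [(0.0, 0.0, 0.2, 0.2), (0.5, 0.5, 0.8, 0.7)]
print(density(rects))  # approximately 0.1
```

Under this definition, D > 1 means the rectangles overlap on average, as in the join experiments with D = 1 and D = 2.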
16
Evaluation of the Cost Models
Join queries on the synthetic random data sets: node and disk accesses.
[Figure panels: density D = 2, density D = 1]
17
Evaluation of the Cost Models
Performance comparison for selection queries on (a) skewed and (b) real data.
[Figure panels: (a), (b)]
18
Evaluation of the Cost Models
Performance comparison for join queries on (a) skewed and (b) real data.
[Figure panels: (a), (b)]
19
Conclusions
For query optimization purposes, efficient cost models should also be available so that accurate cost estimates can be made under various data distributions.
The proposed cost formulae are functions of data properties only and can therefore be used without any knowledge of the R-tree index properties.
Experimental results on both synthetic and real data sets showed that the proposed analytical model is very accurate.
20
Q & A