Published by Della Andrews. Modified over 9 years ago.
1
A Rule-Based Optimizer for Spatial Join Operations
Miguel Fornari, João Luiz Comba, Cirano Iochpe
Instituto de Informática, Universidade Federal do Rio Grande do Sul
Porto Alegre, Brazil
2
Outline
1. Introduction and motivation
2. Spatial Hash Algorithms
3. The Validation System Architecture
4. The Rules
5. Conclusions
3
Introduction and Motivation
The spatial join is a fundamental and expensive operation in GIS.
It combines two sets of spatial features based on a spatial predicate.
A DBMS traditionally improves performance through heuristic rules and cost expressions.
A spatial DBMS can include a module dedicated to spatial operations.
4
Goal
Reduce the response time of spatial join algorithms in the filter step.
Define a set of rules that optimize the performance of some well-known algorithms.
Which parameters are relevant?
What is the best value for each important parameter?
5
SJ Algorithms
● According to the file organization
6
SJ Algorithms
For each algorithm, two cost expressions are important: I/O cost and CPU cost.
Some costs are already known, but not all.
All expressions are written in a similar notation.
7
The System Architecture
The performance analysis, although correct, simplifies many cases.
Real cases are more complex.
Real data sets were obtained from the Internet.
8
Plane-sweep algorithm
All SJ algorithms load objects into memory and run a plane-sweep algorithm to check whether pairs of objects satisfy the spatial predicate.
The traditional performance bound is O(k + n log n), where k is the number of object intersections.
However, O(c + n log n), where c is the number of comparisons performed, is more exact.
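To make the difference between k and c concrete, here is a minimal sketch of a plane-sweep over two sets of rectangles (MBRs) that returns both the intersecting pairs and the number of comparisons performed. It is an illustration only, not the authors' implementation: the name `plane_sweep_join`, the `(xmin, ymin, xmax, ymax)` tuple layout, and the choice of the x axis as sweep direction are assumptions.

```python
from typing import List, Tuple

Rect = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax), assumed layout


def plane_sweep_join(r: List[Rect], s: List[Rect]):
    """Return (pairs, comparisons) for intersecting rectangle pairs from r and s.

    Hypothetical helper: sweeps along the x axis and counts one comparison
    per y-overlap test, which corresponds to c in O(c + n log n).
    """
    # Sorting both inputs by xmin is the n log n term.
    r_sorted = sorted(r)
    s_sorted = sorted(s)
    pairs = []
    comparisons = 0
    i = j = 0
    while i < len(r_sorted) and j < len(s_sorted):
        # The rectangle with the smaller xmin becomes the anchor.
        if r_sorted[i][0] <= s_sorted[j][0]:
            anchor, others, start, anchor_is_r = r_sorted[i], s_sorted, j, True
            i += 1
        else:
            anchor, others, start, anchor_is_r = s_sorted[j], r_sorted, i, False
            j += 1
        # Scan the other set while its xmin falls inside the anchor's x extent.
        k = start
        while k < len(others) and others[k][0] <= anchor[2]:
            comparisons += 1  # one y-overlap test
            o = others[k]
            if anchor[1] <= o[3] and o[1] <= anchor[3]:
                pairs.append((anchor, o) if anchor_is_r else (o, anchor))
            k += 1
    return pairs, comparisons
```

Every reported pair costs one comparison, but a comparison can also fail the y test, so c is never smaller than k. That gap is why the slide treats c as the more exact measure.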
9
Plane-sweep algorithm
Divide the space into strips.
Count the number of objects in each strip.
The size of the strips is the average size of the objects.
Estimate of c = sum of all values.
10
Rule 1 - Plane-sweep
The DBMS can estimate c for each axis and choose the axis with the smaller value of c, optimizing the plane-sweep.
The shape of the objects affects the response time.
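A rough sketch of how Rule 1 could be applied is shown below. The slides do not say which per-strip value is summed to estimate c, so this sketch assumes it is the product of the two inputs' object counts in each strip (the candidate pairs in that strip). That choice, the function names `estimate_c` and `choose_sweep_axis`, and the `max_strips` cap are all assumptions made for illustration.

```python
import math
from typing import List, Tuple

Rect = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax), assumed layout


def estimate_c(r: List[Rect], s: List[Rect], axis: int, max_strips: int = 1024) -> int:
    """Estimate the number of comparisons when sweeping along `axis` (0 = x, 1 = y).

    ASSUMPTION: the per-strip value is count_r * count_s; the slides only
    say that c is estimated as the sum of the per-strip values.
    """
    if not r or not s:
        return 0
    lo_i, hi_i = (0, 2) if axis == 0 else (1, 3)
    objs = list(r) + list(s)
    space_lo = min(o[lo_i] for o in objs)
    extent = max(o[hi_i] for o in objs) - space_lo
    # Strip size is the average object size along the axis, as in the slides.
    avg_size = sum(o[hi_i] - o[lo_i] for o in objs) / len(objs)
    if extent <= 0:
        n_strips, width = 1, 1.0
    else:
        target = avg_size if avg_size > 0 else extent / max_strips
        n_strips = max(1, min(max_strips, math.ceil(extent / target)))
        width = extent / n_strips

    def histogram(rects):
        counts = [0] * n_strips
        for o in rects:
            first = min(n_strips - 1, int((o[lo_i] - space_lo) / width))
            last = min(n_strips - 1, int((o[hi_i] - space_lo) / width))
            for k in range(first, last + 1):
                counts[k] += 1  # the object is counted in every strip it touches
        return counts

    return sum(a * b for a, b in zip(histogram(r), histogram(s)))


def choose_sweep_axis(r: List[Rect], s: List[Rect]) -> int:
    """Return 0 (x) or 1 (y), whichever axis has the smaller estimated c."""
    return 0 if estimate_c(r, s, 0) <= estimate_c(r, s, 1) else 1
```

With tall, thin objects, almost every pair overlaps in its y projection, so the estimate for the y axis is large and the x axis is chosen. This is one way to see why object shape affects the response time.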
11
Synchronized Tree Traversal (STT)
A well-known algorithm for R-Trees.
Performance depends on the height of the R-Trees and the average size of the nodes.
The space division reduces the number of object comparisons (c).
Available memory is not important.
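The idea behind synchronized traversal can be sketched as follows: both trees are descended together, and only pairs of entries whose bounding rectangles intersect are followed. The `Node` class, the helper names, and the in-memory representation are hypothetical simplifications. A real R-Tree stores nodes in disk pages and typically restricts the search space further, which this sketch omits.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

Rect = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax), assumed layout


@dataclass
class Node:
    """Hypothetical R-Tree node; a node without children is a data entry."""
    mbr: Rect
    children: List["Node"] = field(default_factory=list)


def intersects(a: Rect, b: Rect) -> bool:
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def leaf(rect: Rect) -> Node:
    return Node(rect)


def parent(children: List[Node]) -> Node:
    """Build an inner node whose MBR encloses all of its children."""
    return Node(
        (
            min(c.mbr[0] for c in children),
            min(c.mbr[1] for c in children),
            max(c.mbr[2] for c in children),
            max(c.mbr[3] for c in children),
        ),
        children,
    )


def _join(a: Node, b: Node, out: list) -> None:
    # Precondition: a.mbr intersects b.mbr.
    if not a.children and not b.children:
        out.append((a.mbr, b.mbr))  # two data entries: report the pair
        return
    # If the heights differ, keep the data entry fixed and descend the other tree.
    left = a.children or [a]
    right = b.children or [b]
    for ca in left:
        for cb in right:
            if intersects(ca.mbr, cb.mbr):
                _join(ca, cb, out)


def synchronized_traversal(root_a: Node, root_b: Node) -> list:
    """Return all pairs of data rectangles (one from each tree) that intersect."""
    out: list = []
    if intersects(root_a.mbr, root_b.mbr):
        _join(root_a, root_b, out)
    return out
```

The nested loop over `left` and `right` shows where node size matters: each pair of intersecting nodes costs a number of comparisons proportional to the product of their entry counts.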
12
Rule 2 - STT
The STT algorithm is optimized by defining nodes with a low number of entries.
However, the total number of nodes will then be greater, which sets a lower limit for the rule.
Optimal value: between 50 and 75.
13
Rule 2 - STT
The performance of the STT algorithm stays constant as the memory buffer size increases, except for very small values.
Set any value greater than 4 * height of the R-Trees.
14
Iterative Stripped SJ (ISSJ)
Iterative SJ (Jacox & Samet) + strips.
Strips divide the space and reduce c.
Objects that cross strip boundaries can occur.
The sorting can be either internal or external.
Performance depends on the available memory, the number of strips, and the number of replicas.
15
Rule 3 - ISSJ
The ISSJ algorithm is optimized by defining a large number of strips.
The number of objects in each strip will be small, but the rule is limited by the addition of replicas.
16
Rule 3 - ISSJ
It is important to allocate enough memory to sort each set internally.
17
Partition Based Spatial-Merge Join (PBSM)
Calculates the number of partitions.
Uses a regular grid to divide the space.
Partitions can overflow.
Performance depends on the number of replicas and of overflowed partitions.
The number of object comparisons (c) is reduced by the space subdivision.
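The link between the number of partitions and the number of replicas can be illustrated with a small sketch. It assumes the simplest possible mapping, one partition per grid cell; the original PBSM maps grid tiles to partitions, which is not modeled here. The names `grid_partition` and `count_replicas` and the `(xmin, ymin, xmax, ymax)` layout are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Rect = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax), assumed layout


def grid_partition(
    rects: List[Rect], space: Rect, nx: int, ny: int
) -> Dict[Tuple[int, int], List[Rect]]:
    """Assign each rectangle to every cell of an nx-by-ny regular grid it overlaps.

    ASSUMPTION: one partition per grid cell; `space` must have positive
    width and height.
    """
    sx0, sy0, sx1, sy1 = space
    cell_w = (sx1 - sx0) / nx
    cell_h = (sy1 - sy0) / ny
    cells: Dict[Tuple[int, int], List[Rect]] = defaultdict(list)
    for rect in rects:
        x0, y0, x1, y1 = rect
        # Clamp indices so objects on the upper border stay in the last cell.
        cx0 = min(nx - 1, max(0, int((x0 - sx0) / cell_w)))
        cx1 = min(nx - 1, max(0, int((x1 - sx0) / cell_w)))
        cy0 = min(ny - 1, max(0, int((y0 - sy0) / cell_h)))
        cy1 = min(ny - 1, max(0, int((y1 - sy0) / cell_h)))
        for cx in range(cx0, cx1 + 1):
            for cy in range(cy0, cy1 + 1):
                cells[(cx, cy)].append(rect)  # a copy in each overlapped cell
    return cells


def count_replicas(rects: List[Rect], space: Rect, nx: int, ny: int) -> int:
    """Number of extra copies created by objects that span several cells."""
    cells = grid_partition(rects, space, nx, ny)
    return sum(len(v) for v in cells.values()) - len(rects)
```

A finer grid leaves fewer objects per partition, which lowers c, but each object that spans several cells is copied into all of them, so the number of processed objects grows with the grid resolution.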
18
Rule 4 - PBSM
PBSM is improved by setting a high number of partitions, either by using a small memory size or simply by setting a lower bound on the number of partitions.
19
Rule 4 - PBSM
This rule is limited by the number of replicas, which increase the number of processed objects.
20
Histogram Hash Stripped Join (HHSJ)
A histogram of the object distribution guides the space partitioning to avoid overflow.
Replicas are counted in the histogram.
The objects are kept in a hash file and are loaded into memory only once.
Performance is affected by the number of replicas and by the space subdivision.
21
Rule 5 - HHSJ
HHSJ is improved by setting a large value for both the number of partitions and the number of strips.
22
Rule 5 - HHSJ
This rule is also limited by the number of replicas, which increase the number of processed objects.
23
Conclusions & Future Work
Our main contribution:
The use of rules can reduce the response time of individual algorithms, in some cases by more than 50%.
The rules can be incorporated into real GDBMS.
Future work:
Use 3D data sets to perform the tests.
Include other spatial operations.
Implement in PostGIS.
24
Contact & questions
Miguel Fornari, fornari@ieee.org, www.inf.ufrgs.br/~fornari
João Comba, comba@inf.ufrgs.br, www.inf.ufrgs.br/~comba
Cirano Iochpe, ciochpe@inf.ufrgs.br, www.inf.ufrgs.br/~ciochpe