Approximate Spatial Query Processing Using Raster Signatures
Leonardo Guerreiro Azevedo, Rodrigo Salvador Monteiro, Geraldo Zimbrão & Jano Moreira de Souza


COPPE – Graduate School of Engineering
Institute of Mathematics – Computer Science Department
Federal University of Rio de Janeiro

Common Spatial Queries
- Area of polygon
- Area of polygon within window
- Spatial joins: polygon x polygon, polygon x polyline & polyline x polyline
- Distance
- Buffer
- Perimeter
- Topological queries

Common Spatial Queries: Approximate Versions
- Approximate area of polygon
- Approximate area of polygon within window
- Approximate spatial joins: polygon x polygon, polygon x polyline & polyline x polyline
- Approximate distance
- Approximate buffer
- Approximate perimeter
- Approximate topological queries

Approximate Answers to Spatial Queries
- What is an approximate answer?
  - If the exact result is a number, the approximate result is a number together with a confidence interval.
  - Otherwise, the approximate answer can be displayed graphically as something like a fuzzy map.

Motivation
- The increase in storage capacity
- The decrease in hardware costs
- Disk access time is still high
- Complex queries
- Data stored on devices that are not online
A query may take minutes or hours to be processed.

Motivation
- An approximate answer may be enough
  - Exact answers are themselves approximations
- Approximate answers can be computed quickly
- Sources of approximation in spatial query processing:
  - Scale
  - Quality
  - Round-off errors

Scenarios and Applications
- Decision support systems
  - Increasing business competitiveness
  - Greater use of accumulated data
- Data mining
  - During drill-down query sequences in ad-hoc data mining, approximate answers to earlier queries in a sequence can be used to find the interesting queries.
- Data warehouses
  - Performance and scalability when accessing very large volumes of data during the analysis process

Scenarios and Applications
- Query optimization
  - Defining the most efficient access plan for a given query
- Distributed data recording and warehousing environments
  - Data may be remote, and may even be unavailable
  - Old data may be discarded to make room for new data, which makes it impossible to answer queries on the deleted information

Scenarios and Applications
- Mobile computing
  - An approximate answer may be an alternative:
    - when the data is not available
    - to save storage space

Data environment set-up for providing approximate answers
[Figure: A framework for approximate query processing, showing the Database and the Approx. Query Engine, with flows labeled New data, Queries and Responses]

Four Color Raster Signature (4CRS)
- Raster approximation (VLDB98)
  - Objects are represented on a grid of cells.
  - Each cell stores relevant information using few bits.
  - The grid resolution can be changed, trading precision against storage requirements.
- 4 types of cells:

  Bit value   Cell type   Description
  00          Empty       The cell is not intersected by the polygon
  01          Weak        The polygon covers 50% or less of the cell
  10          Strong      The polygon covers more than 50% and less than 100% of the cell
  11          Full        The cell is fully occupied by the polygon
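A minimal Python sketch of the cell classification in the table above. It assumes the fraction of each cell covered by the polygon has already been computed from the exact geometry (not shown). The byte layout used by the hypothetical `pack_signature` helper (four 2-bit codes per byte, lowest bits first) is an illustrative choice, not necessarily the storage format of the original 4CRS work.

```python
from enum import IntEnum


class CellType(IntEnum):
    """The four 4CRS cell types, valued by their 2-bit codes."""
    EMPTY = 0b00   # not intersected by the polygon
    WEAK = 0b01    # polygon covers 50% or less of the cell
    STRONG = 0b10  # polygon covers more than 50% and less than 100%
    FULL = 0b11    # cell fully occupied by the polygon


def classify_cell(fraction: float) -> CellType:
    """Map the covered fraction of a cell (0..1) to its 4CRS type."""
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must lie in [0, 1]")
    if fraction == 0.0:
        return CellType.EMPTY
    if fraction <= 0.5:
        return CellType.WEAK
    if fraction < 1.0:
        return CellType.STRONG
    return CellType.FULL


def pack_signature(cells: list[CellType]) -> bytes:
    """Hypothetical layout: pack four 2-bit cell codes into each byte."""
    out = bytearray((len(cells) + 3) // 4)
    for i, cell in enumerate(cells):
        out[i // 4] |= int(cell) << (2 * (i % 4))
    return bytes(out)
```

At 2 bits per cell, a grid of R x C cells needs about R·C/4 bytes, which is why a finer grid buys precision at the cost of storage.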

Construction of Signatures
[Figure: a polygon and its 4CRS approximation (24th VLDB Conference, New York, USA)]

Polygon approximate area
- The algorithm sums the expected area of each grid cell:
  - Empty cells: 0%
  - Full cells: 100%
  - Weak and Strong cells, assuming a uniform distribution:
    - Weak cells: interval (0, 0.5], mean 0.25
    - Strong cells: interval (0.5, 1), mean 0.75
- Count the number of cells of each type in the polygon's 4CRS, and multiply each count by the corresponding presumed (expected) cell area.
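The area estimate can be sketched in Python as below. The function names and the choice to pass per-type counts directly are illustrative assumptions; the expected fractions (0, 0.25, 0.75, 1) are the ones listed above.

```python
from collections import Counter

# Expected covered fraction per cell type (uniform assumption for Weak/Strong).
EXPECTED_FRACTION = {"empty": 0.0, "weak": 0.25, "strong": 0.75, "full": 1.0}


def count_cells(grid: list[list[str]]) -> Counter:
    """Count how many cells of each type appear in a 4CRS grid."""
    return Counter(cell for row in grid for cell in row)


def approximate_area(n_weak: int, n_strong: int, n_full: int, cell_area: float) -> float:
    """Sum of expected cell areas; Empty cells contribute nothing."""
    return (EXPECTED_FRACTION["weak"] * n_weak
            + EXPECTED_FRACTION["strong"] * n_strong
            + EXPECTED_FRACTION["full"] * n_full) * cell_area
```

For example, `approximate_area(100, 120, 400, 1.0)` returns 515.0 cell areas.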

Confidence interval
- A measure of answer accuracy
- The polygon area inside a Weak or Strong cell is assumed to be uniformly distributed.
  - Weak cells
  - Strong cells
- Confidence interval computed using the Central Limit Theorem:
  - 95%
  - 99%
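A sketch of the per-cell statistics behind this interval, taking the uniform assumption above literally (the covered fraction X_w of a Weak cell is uniform on (0, 0.5] and X_s of a Strong cell is uniform on (0.5, 1)) and adding one assumption of our own: cells are independent. For n_w Weak cells:

```latex
\begin{aligned}
E[X_w] &= 0.25, \qquad E[X_s] = 0.75 \\
\operatorname{Var}[X_w] &= \operatorname{Var}[X_s] = \frac{0.5^2}{12} = \frac{1}{48} \approx 0.0208 \\
\sum_{i=1}^{n_w} X_{w,i} &\approx \mathcal{N}\!\left(0.25\,n_w,\; \frac{n_w}{48}\right) \\
A_w &\approx \left(0.25\,n_w \pm z\sqrt{n_w/48}\right) \cdot cellArea,
  \qquad z = 1.96\ (95\%),\ 2.576\ (99\%)
\end{aligned}
```

The same form applies to n_s Strong cells with mean 0.75; Full cells add no uncertainty.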

Confidence interval (example)
- Query results:
  - # weak cells: 100
  - # strong cells: 120
  - # full cells: 400
- Confidence level: 95%
  - Weak cells:
  - Strong cells:
  - Full cells: 400 (full cells contribute their exact area!)
  - Total:
- Error between -1.15% and 1.15%
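The following Python sketch computes an interval for the counts in this example. It assumes independent cells, applies the Central Limit Theorem separately to Weak and Strong cells, and adds the two half-widths; with these choices it reproduces the ±1.15% bound above. The function name is hypothetical, and adding the half-widths is inferred from the slide's numbers rather than a documented step.

```python
import math
from statistics import NormalDist

# Variance of a uniform distribution on an interval of width 0.5, i.e. the
# covered fraction of a Weak cell (0, 0.5] or a Strong cell (0.5, 1): 1/48.
CELL_VARIANCE = 0.5 ** 2 / 12


def area_confidence_interval(n_weak, n_strong, n_full, level=0.95, cell_area=1.0):
    """Return (estimate, half_width) of the approximate polygon area.

    Assumes independent cells; Weak and Strong half-widths are computed
    separately and added (a conservative combination). Full cells are exact.
    """
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for 95%, 2.576 for 99%
    estimate = 0.25 * n_weak + 0.75 * n_strong + n_full
    half_width = z * (math.sqrt(n_weak * CELL_VARIANCE)
                      + math.sqrt(n_strong * CELL_VARIANCE))
    return estimate * cell_area, half_width * cell_area


est, hw = area_confidence_interval(100, 120, 400)
print(f"{est:.0f} +/- {hw:.2f} cell areas ({100 * hw / est:.2f}%)")
# prints: 515 +/- 5.93 cell areas (1.15%)
```

Pooling the two variances into a single normal approximation instead would give a narrower bound of about ±0.81% for the same counts.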

Cell Area Distribution
[Figure: distribution of the polygon area inside Weak and Strong cells]
- Comparable to a uniform distribution (U)
- Variance: (U: 1/48 ≈ 0.0208)
- Mean: (U: 0.25)

Example
- # empty cells: 55
- # weak cells: 27
- # strong cells: 26
- # full cells: 79
- Approximate area: $(0.25\,\Sigma_{weak} + 0.75\,\Sigma_{strong} + \Sigma_{full}) \times cellArea$
- Exact area:
- Appr. area:
- Error: 1.07%
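Plugging these counts into the polygon approximate area rule (0.25 per Weak cell, 0.75 per Strong cell, 1 per Full cell, 0 per Empty cell) gives the estimate below. The cell area itself is not stated here, so the result is left in units of cell area:

```latex
\begin{aligned}
A_{approx} &= (0.25 \cdot 27 + 0.75 \cdot 26 + 79) \times cellArea \\
           &= (6.75 + 19.5 + 79) \times cellArea = 105.25 \times cellArea
\end{aligned}
```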

Approximate area of polygon x window intersection
- This algorithm is similar to the polygon approximate area algorithm.
- There are two kinds of cell overlap:
  - The cell may be completely contained in the window.
  - The cell may be partially contained in the window; its contribution is then proportional to its overlapping area.
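A Python sketch of the window variant. It assumes an axis-aligned grid with a known origin and square cells of side `cell_size`, an axis-aligned rectangular window, and that the polygon area in a partially overlapped cell is spread evenly over the cell, so its contribution scales with the overlap. `Rect`, `overlap_area` and `approximate_window_area` are hypothetical names.

```python
from dataclasses import dataclass

# Expected covered fraction per cell type (uniform assumption for Weak/Strong).
EXPECTED = {"empty": 0.0, "weak": 0.25, "strong": 0.75, "full": 1.0}


@dataclass(frozen=True)
class Rect:
    xmin: float
    ymin: float
    xmax: float
    ymax: float


def overlap_area(a: Rect, b: Rect) -> float:
    """Area of the intersection of two axis-aligned rectangles."""
    w = min(a.xmax, b.xmax) - max(a.xmin, b.xmin)
    h = min(a.ymax, b.ymax) - max(a.ymin, b.ymin)
    return w * h if w > 0 and h > 0 else 0.0


def approximate_window_area(cells: dict, origin: tuple, cell_size: float, window: Rect) -> float:
    """cells maps (col, row) -> cell type name; returns the estimated area inside window."""
    ox, oy = origin
    total = 0.0
    for (col, row), kind in cells.items():
        cell = Rect(ox + col * cell_size, oy + row * cell_size,
                    ox + (col + 1) * cell_size, oy + (row + 1) * cell_size)
        # A fully contained cell overlaps by its whole area; a partially
        # contained one contributes in proportion to the overlapping area.
        total += EXPECTED[kind] * overlap_area(cell, window)
    return total
```

Because the overlap of a fully contained cell equals the whole cell area, one formula covers both kinds of overlap.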

Experimental tests
- Computer: PC Pentium IV 1.8 GHz, 512 MB RAM
- Page size: 2,048 bytes
- Target: to evaluate the use of 4CRS for approximate query processing against exact query processing with respect to:
  - Response time
  - Storage requirements
  - Accuracy
- Algorithms tested:
  - Polygon approximate area
  - Approximate area of polygon x window intersection
    - 100 random windows for each data set (different sizes and positions)

Experimental tests
- R*-trees are used to reduce the search space.
[Figure: two-step query processing. Exact: Relation A and Relation B are filtered through SAMs (spatial access methods) to obtain candidate pairs (Step 1), which the exact geometry processor ($) turns into the response set (Step 2). Approximate: the same SAM-based Step 1 produces candidate pairs, and Step 2 uses approximate query processing with 4CRS to produce the response set.]
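The two-step scheme can be sketched in Python as follows. A linear scan over minimum bounding rectangles (MBRs) stands in for the R*-tree filter here, which is a simplification rather than the structure used in the experiments; `refine` is a hypothetical callback that would be either the exact geometry processor or a 4CRS-based estimator.

```python
from typing import Callable

BBox = tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)


def mbrs_intersect(a: BBox, b: BBox) -> bool:
    """True if two axis-aligned bounding boxes intersect (touching counts)."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def filter_step(mbrs: dict[str, BBox], window: BBox) -> list[str]:
    """Step 1: candidates whose MBR intersects the window (linear scan, not an R*-tree)."""
    return [oid for oid, mbr in mbrs.items() if mbrs_intersect(mbr, window)]


def window_query(mbrs: dict[str, BBox], window: BBox,
                 refine: Callable[[str, BBox], float]) -> dict[str, float]:
    """Step 2: run the refinement only on the candidates from Step 1."""
    return {oid: refine(oid, window) for oid in filter_step(mbrs, window)}
```

Only the candidates that survive Step 1 reach the costly refinement step, which is where replacing exact geometry with 4CRS saves time.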

Experimental tests
- The real polygon data sets used in the experiments consist of township boundaries, census block groups, topography, geologic maps and hydrographic maps from Iowa (USA), and Brazilian municipalities.

Approximate polygon area
[Figures: experimental results]

Approximate polygon window area
[Figures: experimental results]

Conclusion
- The experimental results demonstrated the efficiency of using 4CRS for approximate query processing.
- Storage requirements
  - 4CRS takes on average 3.75% of the real data set size
- Accuracy
  - Approximate area: average error of 2.62%
  - Window query approximate area: average error of 1%
- Response time (relative to exact query processing)
  - Approximate area: average 28.41%
  - Window query approximate area: average 7.22%
- Disk access (relative to exact query processing)
  - Approximate area: average 1.90%
  - Window query approximate area: average 7.04%

Future work
- Algorithms for the other operations
  - An approximate area of polygon x polygon intersection algorithm is being evaluated
- Use of approximations for mobile computing