Introduction to Spatial Databases

Introduction to Spatial Databases Donghui Zhang CCIS Northeastern University

What is a spatial database?
A database system that is optimized to store and query spatial objects:
  Point: a hotel, a car
  Line: a road segment
  Polygon: landmarks, layout of VLSI
Examples shown: road network, satellite image, VLSI layout.

Are spatial databases useful?
  Geographical Information Systems
    e.g. data: road network and places of interest.
    e.g. usage: driving directions, emergency calls, standalone applications.
  Environmental Systems
    e.g. data: land cover, climate, rainfall, and forest fire.
    e.g. usage: find total rainfall.
  Corporate Decision-Support Systems
    e.g. data: store locations and customer locations.
    e.g. usage: determine the optimal location for a new store.
  Battlefield Soldier Monitoring Systems
    e.g. data: locations of soldiers (with or without medical equipment).
    e.g. usage: for each soldier with medical equipment, monitor the soldiers that may need help from that soldier.

Shortest-path query and fastest-path query (example: MapQuest.com).

NN query: driving directions as you go; find the nearest Wal-Mart or hospital.

Range query (example: ArcGIS 9.2, ESRI).

Aggregation query

Optimal Location query

NN query
[Figure: five people plotted as points: George, John, Bob, Bill, Mike.] NN(Bob) = George.

RNN query
Who will seek help from me? [Figure: the same five points.] RNN(Bob) = {John, Mike}.

And beyond the “space” …
2004 NBA dataset*: each player has 17 attributes.
“Spatial data”: an object is a point in a 17-dimensional space.
Who are the best players? I.e. those not “dominated” by any other player. This is a skyline query.

Name              Points  Rebounds  Assists  Steals  ……
Tracy McGrady       2003       484      448     135
Kobe Bryant         1819       392      398      86
Shaquille O'Neal    1669       760      200      36
Yao Ming            1465       669       61      34
Dwyane Wade         1854       397      520     121
Steve Nash          1165       249      861      74

* www.databaseBasketball.com
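The dominance test behind the skyline query can be sketched in a few lines of Python. This is only an illustration: it uses the four attributes shown in the table (the real dataset has 17), assumes that larger is better in every attribute, and the names `dominates` and `skyline` are made up for this example.

```python
from typing import Dict, List, Tuple

Stats = Tuple[int, int, int, int]  # (points, rebounds, assists, steals)

# Values copied from the table above; only 4 of the 17 attributes are shown there.
PLAYERS: Dict[str, Stats] = {
    "Tracy McGrady": (2003, 484, 448, 135),
    "Kobe Bryant": (1819, 392, 398, 86),
    "Shaquille O'Neal": (1669, 760, 200, 36),
    "Yao Ming": (1465, 669, 61, 34),
    "Dwyane Wade": (1854, 397, 520, 121),
    "Steve Nash": (1165, 249, 861, 74),
}


def dominates(p: Stats, q: Stats) -> bool:
    """p dominates q if p is at least as good in every attribute
    and strictly better in at least one (assumption: larger is better)."""
    return all(a >= b for a, b in zip(p, q)) and any(a > b for a, b in zip(p, q))


def skyline(players: Dict[str, Stats]) -> List[str]:
    """Naive skyline: keep every player that no other player dominates."""
    return [
        name
        for name, stats in players.items()
        if not any(dominates(other, stats)
                   for other_name, other in players.items()
                   if other_name != name)
    ]


print(skyline(PLAYERS))
```

On these four attributes, Kobe Bryant is dominated by Tracy McGrady and Yao Ming by Shaquille O'Neal, so the remaining four players form the skyline. With all 17 attributes the result could differ.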

Research goals in spatial databases
Support spatial database queries efficiently: range query, aggregation query, NN query, RNN query, optimal-location query, fastest-path query, skyline query, …
Which statement is the best in a large spatial database?
  (a) Both an O(n^2) algorithm and an O(n) algorithm are efficient.
  (b) An O(n^2) algorithm is not efficient, but an O(n) algorithm is.
  (c) Neither an O(n^2) algorithm nor an O(n) algorithm is efficient.
Answer: (c)! Even a linear algorithm is not efficient!

Research goals in spatial databases
Example of a linear algorithm: to find my nearest Wal-Mart, compare my location with all Wal-Marts in the world.
Example of a quadratic algorithm: to find the skyline of NBA players, compare every player against all other players (to see if that player is dominated).
Sample scenario:
  Disk page size: 8 KB. Database size: 1 GB = 131,072 disk pages.
  Let each disk I/O take 10^-3 second.
  O(n): 131 seconds ≈ 2 minutes. (Not efficient!)
  O(n^2): ≈ 200 days! (Out of the question!)
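The numbers in the sample scenario can be checked with a few lines of arithmetic. The sketch below assumes one I/O per page for the linear case and one I/O per pair of pages for the quadratic case; the variable names are illustrative only.

```python
PAGE_SIZE = 8 * 1024   # 8 KB per disk page
DB_SIZE = 1024 ** 3    # 1 GB database
IO_TIME = 1e-3         # seconds per disk I/O

pages = DB_SIZE // PAGE_SIZE                     # number of disk pages, n
linear_seconds = pages * IO_TIME                 # O(n): read every page once
quadratic_days = pages ** 2 * IO_TIME / 86_400   # O(n^2): n * n I/Os, in days

print(pages)                      # number of pages
print(round(linear_seconds))      # seconds for the linear scan
print(round(quadratic_days, 1))   # days for the quadratic algorithm
```

The linear scan comes to about 131 seconds and the quadratic algorithm to just under 200 days, matching the rounded figures on the slide.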

How can you do better than O(n)? Answer: use (disk-based) index structures! However, 1-dim index structures, e.g. the B+-tree, are not efficient. E.g. to search for hotels in Boston…

A 1-dim index is not good enough
Suppose a B+-tree exists on X. A range query can use it to narrow down the X extent only, so every object inside that vertical strip must still be checked against the Y range.

Content
  The R-tree
    Range Query
    Aggregation Query
  NN Query
  Skyline Query
  Highlights of Our Research

R-Tree Motivation
Range query: find the objects in a given range. E.g. find all hotels in Boston.
No index: scan through all objects. NOT EFFICIENT!
[Figure: objects a–m plotted in the plane; the x axis and y axis are marked from 2 to 10.]

R-Tree: Clustering by Proximity

R-Tree

Range Query
[Figure: objects a–m plotted on the same grid, with the R-tree over them: Root → E1, E2; E1 → E3, E4, E5; E2 → E6, E7; the leaf entries E3–E7 hold the objects a–m.]
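A range query on an R-tree descends only into entries whose MBR intersects the query rectangle. The following is a minimal in-memory sketch of that idea, not the disk-based structure itself; the `Node` class, the helper names, and all coordinates are hypothetical and are not taken from the figure.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

Rect = Tuple[float, float, float, float]  # (x_lo, y_lo, x_hi, y_hi)
Point = Tuple[str, float, float]          # (name, x, y)


@dataclass
class Node:
    mbr: Rect                                              # minimum bounding rectangle
    children: List["Node"] = field(default_factory=list)   # non-empty for index nodes
    points: List[Point] = field(default_factory=list)      # non-empty for leaf nodes


def intersects(a: Rect, b: Rect) -> bool:
    """True if the two rectangles overlap (touching counts as overlap)."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def range_query(node: Node, q: Rect) -> List[str]:
    """Return the names of all points inside q, skipping non-intersecting subtrees."""
    if not intersects(node.mbr, q):
        return []  # the whole subtree is pruned
    hits = [name for name, x, y in node.points
            if q[0] <= x <= q[2] and q[1] <= y <= q[3]]
    for child in node.children:
        hits.extend(range_query(child, q))
    return hits


# Hypothetical two-leaf tree (coordinates are made up for illustration).
leaf1 = Node((1, 1, 3, 4), points=[("a", 1, 3), ("b", 2, 4), ("c", 3, 1)])
leaf2 = Node((6, 5, 9, 9), points=[("f", 6, 7), ("g", 8, 9), ("h", 9, 5)])
root = Node((1, 1, 9, 9), children=[leaf1, leaf2])

print(range_query(root, (5, 4, 10, 8)))
```

In this example the MBR of `leaf1` does not intersect the query rectangle, so its three points are never examined.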

Aggregation Query
Given a range, find some aggregate value of objects in this range.
Aggregates: COUNT, SUM, AVG, MIN, MAX.
E.g. find the total number of hotels in Massachusetts.
Straightforward approach: reduce to a range query.
Better approach: along with each index entry, store the aggregate of its sub-tree.

Aggregation Query
[Figure: the same R-tree with a COUNT stored in each index entry: E1:8, E2:5, E3:3, E4:2, E5:3, E6:3, E7:2. Annotation: "Subtree pruned!" An entry whose MBR lies entirely inside the query range contributes its stored count directly, so its subtree is not visited.]
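The "better approach" can be sketched for COUNT as follows. This is a minimal in-memory illustration under the assumption that every node stores the number of objects in its subtree; the `Node` class, `range_count`, and the coordinates are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

Rect = Tuple[float, float, float, float]  # (x_lo, y_lo, x_hi, y_hi)


@dataclass
class Node:
    mbr: Rect
    count: int                                             # objects stored in this subtree
    children: List["Node"] = field(default_factory=list)
    points: List[Tuple[float, float]] = field(default_factory=list)


def intersects(a: Rect, b: Rect) -> bool:
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def covers(outer: Rect, inner: Rect) -> bool:
    """True if inner lies entirely inside outer."""
    return (outer[0] <= inner[0] and outer[1] <= inner[1]
            and inner[2] <= outer[2] and inner[3] <= outer[3])


def range_count(node: Node, q: Rect) -> int:
    """COUNT of objects inside q, using stored counts to avoid descending."""
    if not intersects(node.mbr, q):
        return 0                 # disjoint: nothing to count
    if covers(q, node.mbr):
        return node.count        # fully covered: use the stored aggregate, prune subtree
    inside = sum(1 for x, y in node.points
                 if q[0] <= x <= q[2] and q[1] <= y <= q[3])
    return inside + sum(range_count(child, q) for child in node.children)


# Hypothetical tree; coordinates are made up for illustration.
leaf1 = Node((1, 1, 3, 4), 3, points=[(1, 3), (2, 4), (3, 1)])
leaf2 = Node((6, 5, 9, 9), 3, points=[(6, 7), (8, 9), (9, 5)])
root = Node((1, 1, 9, 9), 6, children=[leaf1, leaf2])

print(range_count(root, (0, 0, 7, 8)))
```

Here `leaf1` is fully covered by the query, so its stored count is used without looking at its points; only `leaf2` is scanned.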

Nearest Neighbor (NN) Query
Given a query location q, find the nearest object. E.g.: given a hotel, find its nearest bar.
[Figure: query point q and its nearest object a.]

A Useful Metric: MINDIST
MINDIST(q, E1): the minimum distance between q and the MBR E1. It is a lower bound of d(o, q) for every object o in E1.
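For axis-aligned rectangles and Euclidean distance, MINDIST can be computed per axis. The sketch below assumes 2-dimensional points and an MBR given as (x_lo, y_lo, x_hi, y_hi); the function name `mindist` is chosen for this example.

```python
import math
from typing import Tuple

Rect = Tuple[float, float, float, float]  # (x_lo, y_lo, x_hi, y_hi)


def mindist(q: Tuple[float, float], mbr: Rect) -> float:
    """Smallest Euclidean distance from point q to any point of the rectangle."""
    qx, qy = q
    x_lo, y_lo, x_hi, y_hi = mbr
    dx = max(x_lo - qx, 0.0, qx - x_hi)  # 0 when q lies within the x extent
    dy = max(y_lo - qy, 0.0, qy - y_hi)  # 0 when q lies within the y extent
    return math.hypot(dx, dy)


print(mindist((0, 0), (3, 4, 6, 8)))  # nearest corner is (3, 4)
print(mindist((5, 0), (3, 4, 6, 8)))  # nearest point is on the bottom edge
print(mindist((4, 5), (3, 4, 6, 8)))  # q is inside the rectangle
```

Because every object in the entry lies inside its MBR, no object can be closer to q than this value, which is what makes it a safe lower bound for pruning.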

NN Basic Algorithm
Keep a heap H of index entries and objects, ordered by MINDIST.
Initially, H contains the root.
While H ≠ ∅
  Extract the element with minimum MINDIST.
  If it is an index entry, insert its children into H.
  If it is an object, return it as NN.
End while
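The heap-based loop above can be sketched in Python with the standard `heapq` module. This is a minimal in-memory illustration: the `Node` class, the counter used as a tie-breaker, and the coordinates are assumptions made for the example.

```python
import heapq
import itertools
import math
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

Rect = Tuple[float, float, float, float]  # (x_lo, y_lo, x_hi, y_hi)
Point = Tuple[str, float, float]          # (name, x, y)


@dataclass
class Node:
    mbr: Rect
    children: List["Node"] = field(default_factory=list)
    points: List[Point] = field(default_factory=list)


def mindist(q: Tuple[float, float], mbr: Rect) -> float:
    dx = max(mbr[0] - q[0], 0.0, q[0] - mbr[2])
    dy = max(mbr[1] - q[1], 0.0, q[1] - mbr[3])
    return math.hypot(dx, dy)


def nearest_neighbor(root: Node, q: Tuple[float, float]) -> Optional[Tuple[str, float]]:
    """Best-first search: always expand the heap element with the smallest MINDIST."""
    tie = itertools.count()  # tie-breaker so heapq never compares nodes or points
    heap = [(mindist(q, root.mbr), next(tie), root)]
    while heap:
        dist, _, item = heapq.heappop(heap)
        if isinstance(item, Node):
            # Index entry: insert its children (sub-nodes or objects) into the heap.
            for child in item.children:
                heapq.heappush(heap, (mindist(q, child.mbr), next(tie), child))
            for p in item.points:
                d = math.hypot(p[1] - q[0], p[2] - q[1])
                heapq.heappush(heap, (d, next(tie), p))
        else:
            # Object: nothing left in the heap can be closer, so report it.
            return item[0], dist
    return None


# Hypothetical tree; coordinates are made up for illustration.
leaf1 = Node((1, 1, 3, 4), points=[("a", 1, 3), ("b", 2, 4), ("c", 3, 1)])
leaf2 = Node((6, 5, 9, 9), points=[("f", 6, 7), ("g", 8, 9), ("h", 9, 5)])
root = Node((1, 1, 9, 9), children=[leaf1, leaf2])

print(nearest_neighbor(root, (5, 6)))
```

For the query point (5, 6), `leaf1` stays in the heap unexpanded because its MINDIST is larger than the distance to the object that is reported.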

NN Query Example
[Figure: objects a–m and the query point plotted on the grid, with the R-tree Root → E1 (E3, E4, E5), E2 (E6, E7). MINDIST values from the query: E1 = 1, E2 = 2, E3 = 5, E4 = 9, E5 = 5, E6 = 2, E7 = 13; objects in E6: i = 2, j = 10, k = 13.]

Action        Heap
Visit Root    E1:1, E2:2
follow E1     E2:2, E3:5, E5:5, E4:9
follow E2     E6:2, E3:5, E5:5, E4:9, E7:13
follow E6     i:2, E3:5, E5:5, E4:9, j:10, E7:13, k:13
Report i and terminate.

Skyline of Manhattan
Which buildings can we see? Those that are not dominated, where a dominated building is one that is both further away and shorter than some other building.

A skyline example: best hotels
Which one is better, i or h? (i, because its price and distance dominate those of h.)
And i or k? (Neither dominates the other.)

A skyline example: best hotels The skyline: a, i, k.

Branch-and-Bound Skyline (BBS)
Assume all points are indexed in an R-tree.
mindist(MBR) = the L1 distance between its lower-left corner and the origin.

Branch-and-Bound Skyline (BBS)
Each heap entry keeps the mindist of the MBR.

Example of BBS Process entries in ascending order of their mindists.
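The processing order described above can be sketched as follows. This is a simplified in-memory illustration of the BBS idea, assuming two attributes where smaller is better (e.g. price and distance); the `Node` class, the helper names, and all coordinates are hypothetical and are not the data from the slides.

```python
import heapq
import itertools
from dataclasses import dataclass, field
from typing import List, Tuple

Rect = Tuple[float, float, float, float]  # (x_lo, y_lo, x_hi, y_hi)
Point = Tuple[str, float, float]          # (name, x, y); smaller x and y are better


@dataclass
class Node:
    mbr: Rect
    children: List["Node"] = field(default_factory=list)
    points: List[Point] = field(default_factory=list)


def dominates(p: Tuple[float, float], q: Tuple[float, float]) -> bool:
    """p dominates q if p is no worse in both attributes and strictly better in one."""
    return p[0] <= q[0] and p[1] <= q[1] and (p[0] < q[0] or p[1] < q[1])


def bbs(root: Node) -> List[str]:
    skyline: List[Point] = []
    tie = itertools.count()  # tie-breaker so heapq never compares nodes or points
    # mindist of an MBR = L1 distance from its lower-left corner to the origin.
    heap = [(root.mbr[0] + root.mbr[1], next(tie), root)]
    while heap:
        _, _, item = heapq.heappop(heap)
        corner = (item.mbr[0], item.mbr[1]) if isinstance(item, Node) else (item[1], item[2])
        if any(dominates((s[1], s[2]), corner) for s in skyline):
            continue  # dominated entry: prune it (and, for a node, its whole subtree)
        if isinstance(item, Node):
            for child in item.children:
                heapq.heappush(heap, (child.mbr[0] + child.mbr[1], next(tie), child))
            for p in item.points:
                heapq.heappush(heap, (p[1] + p[2], next(tie), p))
        else:
            skyline.append(item)  # not dominated by anything seen so far
    return [name for name, _, _ in skyline]


# Hypothetical hotels as (name, price, distance); values are made up.
n1 = Node((1, 6, 5, 9), points=[("a", 1, 9), ("g", 5, 6)])
n2 = Node((3, 2, 4, 3), points=[("h", 4, 3), ("i", 3, 2)])
n3 = Node((7, 1, 9, 5), points=[("f", 7, 5), ("k", 9, 1)])
n4 = Node((8, 8, 9, 9), points=[("l", 8, 8), ("m", 9, 9)])
root = Node((1, 1, 9, 9), children=[n1, n2, n3, n4])

print(bbs(root))
```

With this made-up data the node `n4` is discarded without reading its points, because its lower-left corner is already dominated by a skyline point when it reaches the top of the heap.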

The Compressed Skycube [SIGMOD’06]
Goal: support skyline queries for an arbitrary subset of dimensions.
Pre-computing all skylines takes too much space and makes updates expensive.
The Compressed Skycube is a very compact representation of all skylines, with efficient query and update support.

The Optimal-Location Query [SSTD’05, VLDB’06]
The optimal location of a potential new store can be defined as a location that maximizes the number of customers who will be “attracted”, or one that maximizes the combined saving for the customers in their traveling distance to the nearest store.
There seems to be an infinite number of candidate locations to check.
We developed efficient algorithms that find exact answers.

Continuous RNN Monitoring [ICDE’06, ICDE’07]
In a battlefield, the RNNs of a soldier with medical equipment are the soldiers that may need to receive help from him.
Continuously monitoring the RNNs in real time while all objects are moving is challenging.
We proposed a solution for the monochromatic case, and cooperated with the Univ. of Minnesota to solve the bichromatic case.

Fastest-path computation [ICDE’06]
MapQuest provides driving directions without asking for the leaving time, yet during rush hour the best route may be different.
Suppose each road segment has a speed pattern. We provide solutions for finding the fastest paths over a leaving-time INTERVAL.
“I may leave for work some time between 7 and 9. Suggest all fastest paths, e.g. if leaving during [7:43, 8:06], take route A, otherwise take route B”.

Summary
Spatial databases have many practical applications.
Spatial database research aims to design efficient algorithms for various queries.
The talk mentioned a few (range query, aggregation query, NN query, RNN query, optimal-location query, fastest-path query, and skyline query).
There are many more; this is an ongoing research field.