Published by Theodore Strickland. Modified over 8 years ago.
1 Introduction to Spatial Databases
Donghui Zhang, CCIS, Northeastern University
2 What is a spatial database?
A database system that is optimized to store and query spatial objects:
– Point: a hotel, a car
– Line: a road segment
– Polygon: landmarks, layout of VLSI
[Figures: VLSI layout, road network, satellite image]
3 Are spatial databases useful?
Geographical Information Systems
– e.g. data: road network and places of interest.
– e.g. usage: driving directions, emergency calls, standalone applications.
Environmental Systems
– e.g. data: land cover, climate, rainfall, and forest fire.
– e.g. usage: find total rainfall.
Corporate Decision-Support Systems
– e.g. data: store locations and customer locations.
– e.g. usage: determine the optimal location for a new store.
Battlefield Soldier Monitoring Systems
– e.g. data: locations of soldiers (with or without medical equipment).
– e.g. usage: for each soldier with medical equipment, monitor the soldiers who may need help from that soldier.
4 Shortest-path query, fastest-path query
[Screenshot: MapQuest.com]
5 NN query
Driving directions as you go. Find the nearest Wal-Mart or hospital.
6 Range query
[Screenshot: ArcGIS 9.2, ESRI]
8 Aggregation query
10 Optimal Location query
12 NN query
[Figure: soldiers Bob, John, George, Bill, Mike]
NN(Bob) = George
13 RNN query
[Figure: soldiers Bob, John, George, Bill, Mike]
Who will seek help from me?
RNN(Bob) = {John, Mike}
14 And beyond the “space” …
Skyline query
2004 NBA dataset*: each player has 17 attributes.
“Spatial Data”: an object is a point in a 17-dimensional space.
Who are the best players?
– i.e. not “dominated” by any other player.

Name              Points  Rebounds  Assists  Steals  …
Tracy McGrady       2003       484      448     135  …
Kobe Bryant         1819       392      398      86  …
Shaquille O'Neal    1669       760      200      36  …
Yao Ming            1465       669       61      34  …
Dwyane Wade         1854       397      520     121  …
Steve Nash          1165       249      861      74  …

* www.databaseBasketball.com
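To make “not dominated” concrete, here is a minimal Python sketch of a brute-force skyline over the six players shown in the table. It is an illustration under stated assumptions, not the method used in the slides: only the four listed attributes are used (the other 13 are elided in the source), larger values are assumed to be better, and the names `dominates` and `skyline` are hypothetical helpers.

```python
from typing import Dict, List, Tuple

# (points, rebounds, assists, steals) as listed in the table above.
# The remaining 13 attributes are elided in the slides and ignored here.
players: Dict[str, Tuple[int, ...]] = {
    "Tracy McGrady": (2003, 484, 448, 135),
    "Kobe Bryant": (1819, 392, 398, 86),
    "Shaquille O'Neal": (1669, 760, 200, 36),
    "Yao Ming": (1465, 669, 61, 34),
    "Dwyane Wade": (1854, 397, 520, 121),
    "Steve Nash": (1165, 249, 861, 74),
}


def dominates(a: Tuple[int, ...], b: Tuple[int, ...]) -> bool:
    """True if a is at least as good as b everywhere and strictly better somewhere."""
    at_least_as_good = all(x >= y for x, y in zip(a, b))
    strictly_better = any(x > y for x, y in zip(a, b))
    return at_least_as_good and strictly_better


def skyline(data: Dict[str, Tuple[int, ...]]) -> List[str]:
    """Brute force: compare every object against every other object."""
    return [
        name
        for name, stats in data.items()
        if not any(dominates(other, stats)
                   for other_name, other in data.items()
                   if other_name != name)
    ]


print(skyline(players))
```

On these four attributes alone, Kobe Bryant is dominated by Tracy McGrady and Yao Ming by Shaquille O'Neal; with all 17 attributes the result could differ.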
20 Research goals in spatial databases
Support spatial database queries efficiently!
– range query, aggregation query, NN query, RNN query, optimal-location query, fastest-path query, skyline query, …
Which statement best describes a large spatial database?
(a) Both an O(n^2) algorithm and an O(n) algorithm are efficient.
(b) An O(n^2) algorithm is not efficient, but an O(n) algorithm is.
(c) Neither an O(n^2) algorithm nor an O(n) algorithm is efficient.
Answer: (c)! Even a linear algorithm is not efficient!
21 Research goals in spatial databases
Example of a linear algorithm: to find my nearest Wal-Mart, compare my location with all Wal-Marts in the world.
Example of a quadratic algorithm: to find the skyline of NBA players, compare every player against all other players (to see whether that player is dominated).
Sample scenario:
– Disk page size: 8 KB.
– Database size: 1 GB = 131,072 disk pages.
– Let each disk I/O take 10^-3 second.
O(n): 131 seconds ≈ 2 minutes. (Not efficient!)
O(n^2): about 200 days! (Out of the question!)
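The figures in the sample scenario can be checked with a few lines of Python. This is only a worked calculation; it assumes, as the scenario appears to, that n is the number of disk pages and that each page access costs one I/O. The constant names are chosen here for illustration.

```python
PAGE_SIZE_BYTES = 8 * 1024   # 8 KB disk page
DB_SIZE_BYTES = 1024 ** 3    # 1 GB database
IO_SECONDS = 1e-3            # assumed cost of one disk I/O
SECONDS_PER_DAY = 86_400

# n = number of disk pages in the database.
pages = DB_SIZE_BYTES // PAGE_SIZE_BYTES

# O(n): read every page once.
linear_seconds = pages * IO_SECONDS

# O(n^2): one page I/O for every pair of pages.
quadratic_days = pages ** 2 * IO_SECONDS / SECONDS_PER_DAY

print(pages)                      # 131072
print(round(linear_seconds))      # 131 (seconds, roughly 2 minutes)
print(round(quadratic_days))      # 199 (days, roughly 200)
```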
22 How can you do better than O(n)?
Answer: use (disk-based) index structures!
However, 1-dim index structures, e.g. the B+-tree, are not efficient for spatial queries.
E.g. to search for hotels in Boston…
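23 A 1-dim index is not good enough
Suppose a B+-tree exists on X. The index narrows the search to the vertical strip of matching X values, but every object in that strip must still be checked against the Y range.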
24 B+-tree
[Figure: example B+-tree; index keys 5, 13, 17, 24, 30; leaf entries 2*, 3*, 5*, 7*, 8*, 14*, 16*, 19*, 20*, 22*, 24*, 27*, 29*, 33*, 34*, 38*, 39*]
– disk-based: stored on disk; only the needed part is loaded into memory.
– paginated: every node is a disk page of fixed size (e.g. 8 KB).
– balanced: all leaf nodes are at the same distance from the root.
– dynamically updateable: supports dynamic insertion and deletion.
– leaf-storage: all records are stored in leaf nodes.
– min-capacity: every node (except the root) is at least half full.
25 B+-tree
[Figure: example B+-tree]
Exact match query: find the record with key=15.
Load into memory the nodes along a single path from root to leaf.
26 B+-tree
[Figure: example B+-tree]
Range search query: find the records with keys in [15, 25].
Note that leaf nodes are linked together, so a range search = exact match + horizontal scan.
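The “exact match + horizontal scan” idea can be sketched in Python as follows. This is a simplified model, not a real B+-tree: the leaf level is a linked list of small sorted pages, and the root-to-leaf descent is replaced by a binary search over each leaf's first key (`find_leaf`). The `Leaf` class, the helper names, and the page capacity of 3 are assumptions made for illustration; the keys are the leaf entries from the example figure.

```python
from bisect import bisect_right
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Leaf:
    keys: List[int]
    next: Optional["Leaf"] = None  # link to the right sibling leaf


def build_leaves(sorted_keys: List[int], capacity: int) -> List[Leaf]:
    """Pack sorted keys into leaves and link neighbouring leaves together."""
    leaves = [Leaf(sorted_keys[i:i + capacity])
              for i in range(0, len(sorted_keys), capacity)]
    for left, right in zip(leaves, leaves[1:]):
        left.next = right
    return leaves


def find_leaf(leaves: List[Leaf], key: int) -> Leaf:
    """Stand-in for the root-to-leaf descent: the leaf that could hold `key`."""
    first_keys = [leaf.keys[0] for leaf in leaves]
    index = bisect_right(first_keys, key) - 1
    return leaves[max(index, 0)]


def range_search(leaves: List[Leaf], lo: int, hi: int) -> List[int]:
    """Exact-match descent for `lo`, then scan right along the leaf links."""
    leaf: Optional[Leaf] = find_leaf(leaves, lo)
    result: List[int] = []
    while leaf is not None:
        for key in leaf.keys:
            if key > hi:
                return result  # passed the end of the range: stop scanning
            if key >= lo:
                result.append(key)
        leaf = leaf.next
    return result


keys = [2, 3, 5, 7, 8, 14, 16, 19, 20, 22, 24, 27, 29, 33, 34, 38, 39]
leaves = build_leaves(keys, capacity=3)
print(range_search(leaves, 15, 25))  # [16, 19, 20, 22, 24]
print(range_search(leaves, 15, 15))  # [] (no record has key 15)
```

Only the leaves that overlap the range are touched, which is what makes the scan cheaper than reading every page.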
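29 A 1-dim index is not good enough
Suppose a B+-tree exists on Y. The index narrows the search to the horizontal strip of matching Y values, but every object in that strip must still be checked against the X range.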
30 Solution: spatial index!
E.g. the R-tree, the HB-tree.
Similar to the B+-tree: disk-based, paginated, balanced, dynamically updateable, leaf-storage, min-capacity.
Different from the B+-tree: clusters objects that are close to each other in multiple dimensions (vs. one).
31 A leaf node in the B+-tree
32 A leaf node in the R-tree
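An R-tree leaf groups objects that are close together in all dimensions and summarizes them by a minimum bounding rectangle (MBR). The Python sketch below shows, under simplifying assumptions, how an MBR lets a range query skip a whole leaf. The leaves are hand-made lists of points, the MBR is recomputed on the fly (a real R-tree stores it in the parent node), and all function names are hypothetical.

```python
from typing import List, Tuple

Point = Tuple[float, float]
Rect = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)


def mbr(points: List[Point]) -> Rect:
    """Minimum bounding rectangle of a non-empty group of points."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return (min(xs), min(ys), max(xs), max(ys))


def intersects(a: Rect, b: Rect) -> bool:
    """True if the two rectangles overlap in both dimensions."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def contains(rect: Rect, p: Point) -> bool:
    return rect[0] <= p[0] <= rect[2] and rect[1] <= p[1] <= rect[3]


def range_query(leaves: List[List[Point]], query: Rect) -> Tuple[List[Point], int]:
    """Return the matching points and the number of leaves actually visited."""
    hits: List[Point] = []
    visited = 0
    for leaf in leaves:
        if not intersects(mbr(leaf), query):
            continue  # the whole leaf is pruned without looking inside it
        visited += 1
        hits.extend(p for p in leaf if contains(query, p))
    return hits, visited


# Hypothetical data: three leaves, each holding spatially close points.
leaves = [
    [(1.0, 1.0), (2.0, 2.0), (1.0, 3.0)],
    [(8.0, 8.0), (9.0, 7.0), (7.0, 9.0)],
    [(1.0, 8.0), (2.0, 9.0)],
]
hits, visited = range_query(leaves, (0.0, 0.0, 3.0, 4.0))
print(hits, visited)  # three points found, only one leaf visited
```

Because each leaf is compact in both X and Y, a query rectangle overlaps few MBRs, which is the advantage over the strip-shaped candidates of a 1-dim index.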
33 Selected spatial queries (I)
– Range query: find the objects in a given range.
– Aggregation query: find some aggregate value (e.g. COUNT) of the objects in a given range.
– NN query: find the nearest neighbor of a query location.
– RNN query: find the objects that are closer to a given location than to any other object, i.e. the objects whose nearest neighbor is that location.
– Shortest/fastest-path query: find the shortest/fastest path in a road network.
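As a baseline for the definitions above, the following Python sketch implements range, aggregation (COUNT), NN, and RNN queries by linear scan, i.e. exactly the kind of O(n) processing that an index is meant to avoid. The coordinates are hypothetical and were chosen only so that the results match the Bob example from the earlier slides; the function names are illustrative, and NN here is computed for an object in the dataset rather than for an arbitrary query location.

```python
from math import dist
from typing import Dict, List, Tuple

Point = Tuple[float, float]

# Hypothetical coordinates; the slides give no positions for the soldiers.
people: Dict[str, Point] = {
    "Bob": (0.0, 0.0),
    "George": (1.0, 0.0),
    "Bill": (1.5, 0.0),
    "John": (-2.0, 0.0),
    "Mike": (0.0, -2.0),
}


def range_query(objs: Dict[str, Point], xmin: float, ymin: float,
                xmax: float, ymax: float) -> List[str]:
    """Objects inside the axis-aligned query rectangle."""
    return [name for name, (x, y) in objs.items()
            if xmin <= x <= xmax and ymin <= y <= ymax]


def count_query(objs: Dict[str, Point], xmin: float, ymin: float,
                xmax: float, ymax: float) -> int:
    """Aggregation query: COUNT of the objects in the range."""
    return len(range_query(objs, xmin, ymin, xmax, ymax))


def nearest_neighbor(objs: Dict[str, Point], name: str) -> str:
    """The other object closest to `name`."""
    return min((other for other in objs if other != name),
               key=lambda other: dist(objs[other], objs[name]))


def reverse_nearest_neighbors(objs: Dict[str, Point], name: str) -> List[str]:
    """Objects whose nearest neighbor is `name`."""
    return [other for other in objs
            if other != name and nearest_neighbor(objs, other) == name]


print(nearest_neighbor(people, "Bob"))            # George
print(reverse_nearest_neighbors(people, "Bob"))   # ['John', 'Mike']
print(count_query(people, -0.5, -0.5, 2.0, 0.5))  # 3
```

Note that NN and RNN are not symmetric: George is Bob's nearest neighbor, but George's own nearest neighbor is Bill.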
34 Selected spatial queries (II)
– Optimal-location query: find the optimal location in a given region to build a new franchise store.
– Skyline query: find the objects not dominated by any other object (an object is dominated if another object is at least as good in every dimension and strictly better in at least one).
– Join query: find all pairs of intersecting objects, one from each dataset (e.g. find cities near lakes).
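The join query can be illustrated with a nested-loop sketch in Python. This is the naive quadratic baseline, not an index-based join; objects are approximated by axis-aligned rectangles, and the city and lake names and extents are made up for the example.

```python
from typing import Dict, List, Tuple

Rect = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)


def intersects(a: Rect, b: Rect) -> bool:
    """True if the two rectangles overlap in both dimensions."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def spatial_join(left: Dict[str, Rect],
                 right: Dict[str, Rect]) -> List[Tuple[str, str]]:
    """Nested-loop join: every pair (one object per dataset) that intersects."""
    return [(a, b)
            for a, rect_a in left.items()
            for b, rect_b in right.items()
            if intersects(rect_a, rect_b)]


# Hypothetical datasets.
cities = {"City A": (0.0, 0.0, 2.0, 2.0), "City B": (5.0, 5.0, 6.0, 6.0)}
lakes = {"Lake 1": (1.0, 1.0, 3.0, 3.0), "Lake 2": (10.0, 10.0, 11.0, 11.0)}

print(spatial_join(cities, lakes))  # [('City A', 'Lake 1')]
```

To express “near” rather than “intersecting”, one option is to enlarge each city rectangle by the desired distance before running the join.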