Approximate querying about the Past, the Present, and the Future in Spatio-Temporal Databases Jimeng Sun, Dimitris Papadias, Yufei Tao, Bin Liu
2 Motivation Spatio-temporal databases vs. data streams Monitoring applications: –Traffic supervision –Mobile user monitoring –Weather forecasting Example: –Find the number of vehicles in the city center now The challenge is to provide fast query responses in an update-intensive environment
3 Problems and methods Problems: –How to efficiently store/summarize the spatio-temporal information? –How to approximately answer the query about the past, the present, and the future? Methods: –Adaptive multi-dimensional histogram (AMH) –Historical synopsis –Stochastic prediction method
4 Related work Histograms –Static multi-dimensional histograms Equi-depth, Mhist, Minskew, Genhist, SQ –Query-adaptive multi-dimensional histograms STGrid, STHoles, SASH Other approximation methods –DCT, Wavelet, Sketch Spatio-temporal databases –Historical retrieval –Future prediction
5 Outline Introduction Problem and proposed methods –Adaptive multi-dimensional histogram –Historical synopsis –Prediction model Experiment Conclusion
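6 Query types Present Time (PT) Historical Time (HT) Future Time (FT) [Figure: queries plotted by location against time (past, current, future)]
7 System Overview [Figure: system architecture with the components AMH, Past Index, Historical Synopsis, and Prediction Model; the inputs are spatio-temporal updates and PT, HT, and FT queries]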
8 Histogram Partition the space into buckets Data within a bucket is summarized by its mean Properties of a good histogram: –Uniformity within each bucket –Incrementally updatable [Figure: a bad and a good partitioning]
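A minimal sketch of why uniformity within a bucket matters: a range-count query can be estimated by scaling each bucket's count by the fraction of the bucket that the query rectangle covers. The `Bucket` class, the function names, and the sample numbers are illustrative assumptions, not taken from the slides.

```python
from dataclasses import dataclass


@dataclass
class Bucket:
    # Hypothetical bucket record: an axis-aligned rectangle plus an object count.
    x_lo: float
    x_hi: float
    y_lo: float
    y_hi: float
    count: float

    def area(self) -> float:
        return (self.x_hi - self.x_lo) * (self.y_hi - self.y_lo)


def overlap_area(bucket, query):
    """Area shared by a bucket and a query rectangle (x_lo, x_hi, y_lo, y_hi)."""
    width = min(bucket.x_hi, query[1]) - max(bucket.x_lo, query[0])
    height = min(bucket.y_hi, query[3]) - max(bucket.y_lo, query[2])
    return max(width, 0.0) * max(height, 0.0)


def estimate_count(buckets, query):
    """Estimate the objects in the query, assuming uniformity inside each bucket."""
    return sum(b.count * overlap_area(b, query) / b.area() for b in buckets)


# Two made-up buckets: a dense one on the left and a sparse one on the right.
buckets = [Bucket(0, 2, 0, 2, 40), Bucket(2, 4, 0, 2, 8)]
estimate = estimate_count(buckets, (1, 3, 0, 2))
print(estimate)
```

The estimate is only as good as the uniformity assumption, which is why the partitioning tries to keep the data inside each bucket as even as possible.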
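9 Adaptive Multi-dimensional Histogram (AMH) Regular cells Objective: minimize the weighted variance sum $\mathrm{WVS} = \sum_i \mathrm{area}_i \cdot \mathrm{var}_i$ (as in Minskew [Acharya, Poosala, Ramaswamy 99]) [Figure: buckets b1 to b6 over the regular cells, and the corresponding binary partition tree (BPT) with internal nodes n1 to n5]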
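The WVS objective can be illustrated with a short sketch. It assumes that each bucket is described by the frequencies of the regular cells it covers, that the area of a bucket is its number of cells times a cell area, and that the variance is the population variance of those frequencies. The function names and data are hypothetical.

```python
def variance(values):
    """Population variance of the cell frequencies inside one bucket."""
    n = len(values)
    mean = sum(values) / n
    return sum((v - mean) ** 2 for v in values) / n


def wvs(buckets, cell_area=1.0):
    """Sum of area * variance over all buckets (the quantity AMH minimizes)."""
    return sum(len(cells) * cell_area * variance(cells) for cells in buckets)


# The same six cells partitioned in two ways (made-up frequencies).
good = [[10, 10, 10, 10], [2, 2]]    # each bucket is internally uniform
bad = [[10, 10, 2], [10, 10, 2]]     # each bucket mixes dense and sparse cells

print(wvs(good), wvs(bad))
```

A partitioning whose buckets are internally uniform has a WVS of zero, while mixing dense and sparse cells in one bucket raises it.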
10 Dynamic Maintenance of AMH Our scheme: record the information during construction and modify the structure as needed. –1. Information update: update the bucket counts –2. Bucket reorganization: merge (to reclaim buckets) and split (to reduce WVS)
11 Information update of AMH [Figure: buckets b1 to b6, the BPT with nodes n1 to n5, and a mapping illustrated for b1, n2, and n1]
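A minimal sketch of the information-update step, assuming a lookup table from regular cells to buckets. The slides do not specify how the mapping is stored, so the grid size, the two-bucket layout, and all names here are hypothetical.

```python
CELLS_PER_SIDE = 4   # assumed 4 x 4 grid of regular cells
CELL_SIZE = 1.0      # assumed side length of one cell

# Hypothetical layout: the left half of the grid is bucket b1, the right half b2.
cell_to_bucket = {
    (cx, cy): ("b1" if cx < CELLS_PER_SIDE // 2 else "b2")
    for cx in range(CELLS_PER_SIDE)
    for cy in range(CELLS_PER_SIDE)
}
bucket_count = {"b1": 0, "b2": 0}


def cell_of(x, y):
    """Map a location to the regular cell that contains it."""
    return (int(x // CELL_SIZE), int(y // CELL_SIZE))


def apply_update(old_pos, new_pos):
    """Apply one location update; old_pos is None for a newly appearing object."""
    if old_pos is not None:
        bucket_count[cell_to_bucket[cell_of(*old_pos)]] -= 1
    bucket_count[cell_to_bucket[cell_of(*new_pos)]] += 1


apply_update(None, (0.5, 0.5))         # new object inside b1
apply_update(None, (3.2, 1.0))         # new object inside b2
apply_update((0.5, 0.5), (2.5, 0.5))   # the first object moves from b1 to b2
print(bucket_count)
```

Each update touches at most two bucket counts, which keeps this step cheap; bucket extents only change during reorganization.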
12 Bucket reorganization - Merge Merge the subtree that leads to the minimal WVS increase. Bucket info: 1. region [x-, x+][y-, y+] 2. frequency: count/area 3. 2nd moment (for variance calculation) [Figure: BPT and bucket list before and after the merge; the subtree under n4 (buckets b3, b4, b6) is replaced by a single bucket b*, leaving buckets b1, b2, b*, b5]
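The bucket information listed above is enough to evaluate a merge without revisiting the cells. The sketch below assumes that the 2nd moment is the sum of squared cell frequencies and that area is measured in cells, so that area times variance equals `sum_sq - count**2 / cells`. The `Stats` class and the sample buckets are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Stats:
    cells: int      # area of the bucket, in regular cells
    count: float    # sum of the cell frequencies
    sum_sq: float   # sum of squared cell frequencies (assumed 2nd moment)

    def weighted_variance(self) -> float:
        # area * variance = sum_sq - count^2 / cells
        return self.sum_sq - self.count ** 2 / self.cells


def from_cells(freqs):
    """Build the summary for a bucket from its cell frequencies."""
    return Stats(len(freqs), sum(freqs), sum(f * f for f in freqs))


def merge(a, b):
    """Summaries of disjoint buckets combine by simple addition."""
    return Stats(a.cells + b.cells, a.count + b.count, a.sum_sq + b.sum_sq)


def merge_cost(a, b):
    """WVS increase caused by replacing buckets a and b with their union."""
    merged = merge(a, b).weighted_variance()
    return merged - a.weighted_variance() - b.weighted_variance()


b3 = from_cells([5, 5])
b4 = from_cells([5, 6])
b6 = from_cells([20, 20])
print(merge_cost(b3, b4), merge_cost(b3, b6))
```

Merging two buckets with similar frequencies costs little, while merging a sparse bucket with a dense one costs a lot. In the slides the merge candidates are BPT subtrees rather than arbitrary pairs; the pairwise comparison here only illustrates the cost function.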
13 Bucket reorganization - Split Split the bucket that leads to the maximal WVS decrease. [Figure: the BPT before and after the split; bucket b1 is replaced by b*1, b*2, b*3, and b*4]
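A sketch of how a split candidate might be chosen, assuming binary axis-aligned cuts along cell boundaries and per-cell frequencies available for each bucket. The slides only state the selection criterion, so the search procedure, names, and data below are assumptions.

```python
def weighted_variance(freqs):
    """area * variance for a set of cell frequencies (area counted in cells)."""
    n = len(freqs)
    if n == 0:
        return 0.0
    total = sum(freqs)
    return sum(f * f for f in freqs) - total * total / n


def best_split(grid):
    """Return (gain, axis, cut) for the cut that lowers WVS the most.

    grid is a list of rows of cell frequencies; axis is "x" for a cut between
    columns and "y" for a cut between rows. (0.0, None, None) means no cut helps.
    """
    rows, cols = len(grid), len(grid[0])
    whole = weighted_variance([f for row in grid for f in row])
    best = (0.0, None, None)
    for cut in range(1, cols):
        left = [f for row in grid for f in row[:cut]]
        right = [f for row in grid for f in row[cut:]]
        gain = whole - weighted_variance(left) - weighted_variance(right)
        if gain > best[0]:
            best = (gain, "x", cut)
    for cut in range(1, rows):
        top = [f for row in grid[:cut] for f in row]
        bottom = [f for row in grid[cut:] for f in row]
        gain = whole - weighted_variance(top) - weighted_variance(bottom)
        if gain > best[0]:
            best = (gain, "y", cut)
    return best


# Two made-up buckets: b1 mixes sparse and dense cells, b2 is almost uniform.
buckets = {
    "b1": [[1, 1, 9, 9], [1, 1, 9, 9]],
    "b2": [[4, 4], [4, 5]],
}
victim = max(buckets, key=lambda name: best_split(buckets[name])[0])
gain, axis, cut = best_split(buckets[victim])
print(victim, gain, axis, cut)
```

Repeating binary cuts inside the chosen bucket would produce several new buckets, as in the figure, provided a preceding merge has freed enough of them.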
14 Features of AMH Bucket information is updated as new data arrive Bucket extents continuously adapt to changes in the data distribution Maintenance does not interfere with normal query processing –It can be interrupted at any time –It is performed during CPU idle time
15 Outline Introduction Problem and proposed methods –Adaptive multi-dimensional histogram –Historical synopsis –Prediction model Experiment Conclusion
16 Historical Synopsis AMH maintains the current buckets. Past index stores the obsolete buckets. Past index: –Packed B-tree –3D R-tree
17 Prediction Model Prediction based on velocity doesn’t work! –It is unrealistic to assume that velocity remains constant between the current time and the query time –Velocity is highly dynamic We suggest using only past and present location information for prediction.
18 Prediction Model (cont.) [Figure: an FT query enters the prediction model, which retrieves HT and PT results from the historical synopsis and parses them] Forecast the future using any time-series prediction method; we use AR (an autoregressive model).
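A minimal sketch of forecasting with an autoregressive model, assuming the historical synopsis has already been queried to produce a series of past and present counts for the query region. The least-squares fit, the intercept term, the model order, and the sample series are illustrative choices, not the authors' exact procedure.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def fit_ar(series, p):
    """Least-squares fit of y_t = c + a_1 * y_(t-1) + ... + a_p * y_(t-p).

    Returns [c, a_1, ..., a_p]. A constant series makes the system singular.
    """
    rows = [[1.0] + [series[t - k] for k in range(1, p + 1)]
            for t in range(p, len(series))]
    targets = [series[t] for t in range(p, len(series))]
    dim = p + 1
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(dim)] for i in range(dim)]
    atb = [sum(r[i] * y for r, y in zip(rows, targets)) for i in range(dim)]
    return solve(ata, atb)


def forecast(series, coeffs, steps):
    """Roll the fitted model forward, feeding each prediction back in."""
    p = len(coeffs) - 1
    history = list(series)
    for _ in range(steps):
        history.append(coeffs[0] + sum(coeffs[k] * history[-k] for k in range(1, p + 1)))
    return history[len(series):]


# Made-up counts for one query region at past timestamps up to the present.
counts = [100, 52, 28, 16, 10, 7, 5.5]
coeffs = fit_ar(counts, 1)
predicted = forecast(counts, coeffs, 2)
print(coeffs, predicted)
```

Only past and present counts enter the model, which matches the suggestion to avoid velocity-based extrapolation.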
19 Outline Introduction Related work Problem and proposed methods –Adaptive multi-dimensional histogram –Historical synopsis –Prediction model Experiment Conclusion
20 Experiment settings Datasets –2.5M updates for each dataset –spatial: 50K mobile objects from 2 spatial datasets –road: from a spatio-temporal generator (described in [Brinkhoff 2002]) [Figures: the road network; the initial, median, and final data distributions]
21 Robustness with time Query: qlength = 6% of the data space; 25K queries uniformly distributed over space and time [Figure: panels for the spatial and road datasets]
22 Comparison with conventional histogram Minskew (a static spatial histogram) is rebuilt every 50K location updates tp is the ratio between the cost of AMH and that of Minskew The reorganization operations of AMH are uniformly distributed among the 50K location updates. [Figure: Minskew vs. AMH, panels for the spatial and road datasets]
23 The effect of update intensity The B-tree performs better at high update rates. The R-tree provides much faster query response. In general, when the query/update ratio is large (>30%), the R-tree performs better. [Figure: 3D R-tree vs. B-tree by query type, panels for the spatial and road datasets]
24 Conclusion We present a comprehensive approach for processing queries that refer to any time in history. The proposed architecture maintains –an incremental multi-dimensional histogram; –a past index structure for storing the outdated buckets. Future queries are answered by a stochastic method that uses the recent history to predict the future.
25 Q+A
26 Summary AMH: 0. Goal: min(WVS) 1. Info update 2. Reorganization happens when the CPU is idle. Past Index (stores old buckets): 1. Recent buckets are kept in memory 2. Old buckets are dumped to disk. Prediction Model: forecast based on the present and past. [Figure: AMH, Past Index, Historical Synopsis, and Prediction Model]
27 Related work Static multi-dimensional histograms Query-adaptive multi-dimensional histograms Other multi-dimensional approximation methods Spatio-temporal prediction methods Spatio-temporal aggregation methods
28 Evaluation over different query types [Figure: panels for the spatial and road datasets]
29 Motivation (cont.) Spatio-temporal database (STDB) research: –historical retrieval –future prediction
30 Bucket reorganization - Split [Figure: BPT and bucket list before and after the split; bucket b1 in the list b1, b2, b*, b5 is replaced by b*1, b*2, b*3, and b*4]