Published by Posy Carter. Modified over 9 years ago.
1
Efficient OLAP Operations in Spatial Data Warehouses
Dimitris Papadias, Panos Kalnis, Jun Zhang and Yufei Tao
Department of Computer Science, Hong Kong University of Science and Technology
Clear Water Bay, Hong Kong
2
Motivating Scenario
The spatial dimension at the finest granularity consists of a set of regions (e.g., road segments in traffic supervision systems, areas covered by cells in mobile communication systems).
The raw data provide the set of objects that fall in each region at every timestamp (e.g., cars in a road segment, users serviced by a cell).
Queries ask for aggregate data over regions that satisfy some spatio-temporal condition (e.g., find the current traffic in all areas within a 1 km range around each hospital).
Unlike traditional OLAP, there are no pre-defined hierarchies.
3
The aggregate R-tree
An R-tree with aggregate data for every entry.
The same idea can be applied to other access methods (e.g., quadtrees).
Other aggregate functions may be used (e.g., avg, max).
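As a minimal sketch of the idea (not the authors' implementation), an entry can be modelled as an MBR plus an aggregate, and a parent entry can be derived from its child node by enclosing the child MBRs and summing their counts. The names `Entry`, `Node`, `union` and `parent_entry` are hypothetical, and the aggregate is assumed to be a count.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

Rect = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)


@dataclass
class Entry:
    mbr: Rect
    count: int                      # aggregate stored with the entry (assumed: a count)
    child: Optional["Node"] = None  # None for a leaf entry, i.e. a finest-granularity region


@dataclass
class Node:
    entries: List[Entry] = field(default_factory=list)


def union(rects: List[Rect]) -> Rect:
    """Smallest rectangle enclosing all the given rectangles."""
    xmins, ymins, xmaxs, ymaxs = zip(*rects)
    return (min(xmins), min(ymins), max(xmaxs), max(ymaxs))


def parent_entry(node: Node) -> Entry:
    """Summarise a node as one entry: enclosing MBR plus the summed count."""
    return Entry(
        mbr=union([e.mbr for e in node.entries]),
        count=sum(e.count for e in node.entries),
        child=node,
    )


# Two road segments with 3 and 5 cars, grouped under one intermediate entry.
leaf = Node([Entry((0, 0, 1, 1), 3), Entry((2, 0, 3, 1), 5)])
top = parent_entry(leaf)
```

Swapping `sum` for `max` would give a max-aggregate tree. An average is not directly composable from child averages, so a sketch like this would need to keep a sum and a count per entry instead.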
4
Why keep spatiotemporal aggregate information
Aggregate information is all that we need or know for some applications (e.g., traffic systems record the number of cars in an area, not their ids).
Storing historical information about individual objects may raise privacy issues (e.g., keeping all locations of mobile phone users through history may be illegal).
Although the actual data may be highly volatile and involve extreme space requirements, the summarized data are less voluminous and may remain rather constant for long intervals.
Aggregates enable efficient query processing (e.g., the number of objects inside an area can be found by a window query instead of a spatial join).
5
aR-trees and OLAP operations
The aR-tree corresponds to a lattice.
There may be multiple dimensions.
6
Query Processing - Single Window
"Find the total number of cars on all road segments inside a query window."
Start from the root of the aR-tree. For every entry, one of the following three conditions holds:
The entry is disjoint from the query window; the corresponding node cannot contain any cars contributing to the answer and is not retrieved.
The entry is inside the query window; all the aggregate information needed is stored with the entry, so the corresponding node does not need to be accessed.
The entry partially overlaps the query window; the corresponding node must be followed recursively.
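The three cases above can be sketched as a short recursive function. This is an illustrative sketch under stated assumptions, not the authors' code: entries hold a count aggregate, child entries are stored inline in a list, and a leaf region that only partially overlaps the window is excluded because the query asks for segments inside the window. All names (`Entry`, `disjoint`, `inside`, `window_count`) are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Rect = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)


@dataclass
class Entry:
    mbr: Rect
    count: int                                # aggregate for everything below this entry
    children: Optional[List["Entry"]] = None  # None for a leaf entry (a road segment)


def disjoint(a: Rect, b: Rect) -> bool:
    return a[2] < b[0] or b[2] < a[0] or a[3] < b[1] or b[3] < a[1]


def inside(a: Rect, w: Rect) -> bool:
    return w[0] <= a[0] and w[1] <= a[1] and a[2] <= w[2] and a[3] <= w[3]


def window_count(entries: List[Entry], window: Rect) -> int:
    total = 0
    for e in entries:
        if disjoint(e.mbr, window):
            continue                          # case 1: node is not retrieved
        if inside(e.mbr, window):
            total += e.count                  # case 2: use the stored aggregate
        elif e.children is not None:
            total += window_count(e.children, window)  # case 3: descend
        # Assumption: a partially overlapping leaf region is not "inside", so it is skipped.
    return total


seg1, seg2 = Entry((0, 0, 1, 1), 3), Entry((2, 0, 3, 1), 5)
seg3, seg4 = Entry((5, 5, 6, 6), 7), Entry((8, 8, 9, 9), 2)
root = [Entry((0, 0, 3, 1), 8, [seg1, seg2]), Entry((5, 5, 9, 9), 9, [seg3, seg4])]

print(window_count(root, (0, 0, 6, 6)))  # 15: first subtree answered from its aggregate
```

In the example, the first intermediate entry lies entirely inside the window, so its two segments are never visited. Only the second subtree is descended.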
7
Query Processing - Multiple Windows
"Find the total number of cars on road segments inside each city suburb."
Without aR-trees, the query can be processed as a multiway spatial join (suburbs, cars, road segments).
With aR-trees, it is processed as a pairwise join (suburbs, aR-tree).
If the query windows (i.e., suburbs) fit in memory, we propose an extension of the single-window technique that considers all windows in parallel.
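One plausible way to consider all windows in parallel is to traverse the tree once and carry, for each entry, only the windows that partially overlap it. The sketch below illustrates that idea under the same assumptions as a count-based aR-tree with inline children. It is not taken from the slides, and `multi_window_counts` and the other names are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

Rect = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)


@dataclass
class Entry:
    mbr: Rect
    count: int
    children: Optional[List["Entry"]] = None  # None for a leaf entry


def disjoint(a: Rect, b: Rect) -> bool:
    return a[2] < b[0] or b[2] < a[0] or a[3] < b[1] or b[3] < a[1]


def inside(a: Rect, w: Rect) -> bool:
    return w[0] <= a[0] and w[1] <= a[1] and a[2] <= w[2] and a[3] <= w[3]


def multi_window_counts(entries: List[Entry], windows: Dict[str, Rect]) -> Dict[str, int]:
    totals = {name: 0 for name in windows}

    def visit(level: List[Entry], active: List[str]) -> None:
        for e in level:
            pending = []                       # windows that still need this subtree
            for name in active:
                w = windows[name]
                if disjoint(e.mbr, w):
                    continue
                if inside(e.mbr, w):
                    totals[name] += e.count    # answered from the stored aggregate
                elif e.children is not None:
                    pending.append(name)
            if pending:
                visit(e.children, pending)     # one descent serves all pending windows

    visit(entries, list(windows))
    return totals


seg1, seg2 = Entry((0, 0, 1, 1), 3), Entry((2, 0, 3, 1), 5)
seg3, seg4 = Entry((5, 5, 6, 6), 7), Entry((8, 8, 9, 9), 2)
root = [Entry((0, 0, 3, 1), 8, [seg1, seg2]), Entry((5, 5, 9, 9), 9, [seg3, seg4])]

suburbs = {"A": (0, 0, 6, 6), "B": (4, 4, 10, 10)}
result = multi_window_counts(root, suburbs)
```

Each node is read at most once regardless of how many windows need it, which is the benefit over issuing one single-window query per suburb.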
8
Experimental Settings
Tiger Dataset (130,000 road segments).
We randomly selected 5,000 seed points located on roads.
For each seed point, we generated a cluster of 250 points (i.e., car positions) with Gaussian distribution; therefore the total number of cars was 1.25M.
The distribution of the queries follows the distribution of the roads.
9
Evaluation for Single-Window Queries
Fact table approach: an R-tree indexes the fact table (i.e., similar to aR-trees, but with no aggregate information in the intermediate nodes).
Raw data approach: join the cars and streets datasets.
10
Evaluation for Multiple-Window Queries
aR-tree (single queries): a set of single-window queries processed using the single_aggregation algorithm of aR-trees.
Fact table (join): join between the R-tree index of the fact table and the query windows, which fit in memory.
Fact table (single): indexed nested loops using the R-tree index of the fact table.
11
Applications to spatio-temporal data
Query: "find the total number of objects in the regions intersecting some window q_s during a time interval q_t"
12
The aggregate 3DR-tree (a3DR-tree)
For each region, an entry keeps the aggregate value and the time interval during which this value is valid.
Whenever the aggregate information about a region changes, a new entry is created.
Advantage: the a3DR-tree integrates spatial and temporal dimensions in the same structure (and is, therefore, expected to be more efficient than column scanning for queries that involve both conditions).
Disadvantage: it wastes space by storing the MBR each time there is an aggregate change.
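The entry-creation rule and the resulting space cost can be illustrated with a flat log of entries, leaving out the tree itself. This is only a sketch: the entry layout (MBR, value, start and end timestamps), the half-open interval convention, and the names `Entry3D` and `A3DRLog` are assumptions made for illustration rather than details given on the slide.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

Rect = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)


@dataclass
class Entry3D:
    mbr: Rect                    # repeated in every entry of the same region
    value: int                   # aggregate value valid during the interval
    t_start: int
    t_end: Optional[int] = None  # None while the value is still current (assumed half-open interval)


class A3DRLog:
    """Flat stand-in for the entries an a3DR-tree would index."""

    def __init__(self, regions: Dict[str, Rect]) -> None:
        self.regions = regions
        self.entries: List[Entry3D] = []
        self.open: Dict[str, Entry3D] = {}

    def observe(self, region: str, t: int, value: int) -> None:
        current = self.open.get(region)
        if current is not None and current.value == value:
            return                          # unchanged aggregate: no new entry
        if current is not None:
            current.t_end = t               # close the previous validity interval
        entry = Entry3D(self.regions[region], value, t)
        self.entries.append(entry)          # changed aggregate: new entry, MBR stored again
        self.open[region] = entry


log = A3DRLog({"r1": (0, 0, 1, 1)})
for t, cars in [(1, 3), (2, 3), (3, 5), (4, 5), (5, 2)]:
    log.observe("r1", t, cars)
```

Five observations produce three entries here, each carrying its own copy of the same MBR, which is the space overhead named as the disadvantage.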
13
The aggregate RB tree
14
Query Example
Find all objects in some region overlapping the query window q_s during the time interval [1-3].
15
The aggregate 3DRB-tree
16
Conclusions and directions for future work
Spatio-temporal OLAP is a very promising direction of work.
Incorporation of multi-version structures for dynamic dimensions.
Formalization: analysis of when aggregation multi-trees are preferable.