Efficient OLAP Operations in Spatial Data Warehouses Dimitris Papadias, Panos Kalnis, Jun Zhang and Yufei Tao Department of Computer Science Hong Kong.

Presentation transcript:

Efficient OLAP Operations in Spatial Data Warehouses Dimitris Papadias, Panos Kalnis, Jun Zhang and Yufei Tao Department of Computer Science Hong Kong University of Science and Technology Clear Water Bay, Hong Kong

Motivating Scenario
- The spatial dimension at the finest granularity consists of a set of regions (e.g., road segments in traffic supervision systems, areas covered by cells in mobile communication systems).
- The raw data provide the set of objects that fall in each region at every timestamp (e.g., cars in a road segment, users serviced by a cell).
- Queries ask for aggregate data over regions that satisfy some spatio-temporal condition (e.g., find the current traffic in all areas within a 1 km range of each hospital).
- Unlike traditional OLAP, there are no predefined hierarchies.

The aggregate R-tree (aR-tree)
- An R-tree that stores aggregate data with every entry.
- The same idea can be applied to other access methods (e.g., quadtrees).
- Other aggregate functions may be used (e.g., avg, max).
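To make "aggregate data for every entry" concrete, here is a minimal Python sketch. It is not the authors' implementation: the names `Entry`, `union` and `make_parent`, the choice of a count (sum) as the aggregate, and the toy rectangles are all assumptions made for illustration. Each entry carries its MBR plus the aggregate of everything beneath it, so a parent's value is just the sum of its children's values.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

Rect = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)


def union(rects: List[Rect]) -> Rect:
    """Smallest rectangle enclosing all given rectangles."""
    xmins, ymins, xmaxs, ymaxs = zip(*rects)
    return (min(xmins), min(ymins), max(xmaxs), max(ymaxs))


@dataclass
class Entry:
    mbr: Rect      # minimum bounding rectangle of everything under this entry
    count: int     # aggregate value (assumed here: number of objects)
    children: List["Entry"] = field(default_factory=list)  # empty at the leaf level


def make_parent(children: List[Entry]) -> Entry:
    """Build an intermediate entry whose aggregate summarizes its children."""
    return Entry(
        mbr=union([c.mbr for c in children]),
        count=sum(c.count for c in children),
        children=list(children),
    )


# Toy data (hypothetical): three road segments with their current car counts.
segments = [Entry((0, 0, 1, 1), 3), Entry((1, 0, 2, 1), 5), Entry((4, 4, 5, 5), 2)]
node_a = make_parent(segments[:2])
node_b = make_parent(segments[2:])
root = make_parent([node_a, node_b])
```

A real R-tree would also need node capacity limits and an insertion/split policy; the sketch only shows where the aggregate lives.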

Why keep spatio-temporal aggregate information?
- Aggregate information is all that we need or know for some applications (e.g., traffic systems record the number of cars in an area, not their ids).
- Storing historical information about individual objects may raise privacy issues (e.g., keeping all past locations of mobile phone users may be illegal).
- Although the actual data may be highly volatile and involve extreme space requirements, the summarized data are less voluminous and may remain rather constant for long intervals.
- It enables efficient query processing (e.g., the number of objects inside an area can be found by a window query instead of a spatial join).

aR-trees and OLAP operations
- The aR-tree corresponds to a lattice.
- There may be multiple dimensions.

Query Processing - Single Window
"Find the total number of cars on all road segments inside a query window."
Start from the root of the aR-tree. For each entry, one of the following three conditions holds:
- The entry is disjoint from the query window; the corresponding node cannot contain any cars contributing to the answer and is not retrieved.
- The entry is inside the query window; all aggregate information is stored with the entry, so the corresponding node does not need to be accessed.
- The entry partially overlaps the query window; the corresponding node must be followed recursively.
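The three cases above can be sketched as a short recursive traversal. This is an illustrative Python sketch under stated assumptions, not the paper's code: `Entry`, `disjoint`, `contains` and `window_count` are hypothetical names, the aggregate is a car count, and a leaf-level region that only partially overlaps the window is treated as not "inside" and therefore contributes nothing.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

Rect = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)


@dataclass
class Entry:
    mbr: Rect
    count: int
    children: List["Entry"] = field(default_factory=list)  # empty at the leaf level


def disjoint(a: Rect, b: Rect) -> bool:
    return a[2] < b[0] or b[2] < a[0] or a[3] < b[1] or b[3] < a[1]


def contains(outer: Rect, inner: Rect) -> bool:
    return (outer[0] <= inner[0] and outer[1] <= inner[1]
            and inner[2] <= outer[2] and inner[3] <= outer[3])


def window_count(entries: List[Entry], window: Rect, stats: Dict[str, int]) -> int:
    total = 0
    for e in entries:
        if disjoint(e.mbr, window):
            continue                      # case 1: prune, node not retrieved
        if contains(window, e.mbr):
            total += e.count              # case 2: use the stored aggregate
        elif e.children:
            stats["node_accesses"] += 1   # case 3: descend into the child node
            total += window_count(e.children, window, stats)
        # A partially overlapping leaf region is not inside the window (assumption).
    return total


# Toy tree (hypothetical): two nodes, each holding two road segments.
s1, s2 = Entry((0, 0, 1, 1), 3), Entry((1, 0, 2, 1), 5)
s3, s4 = Entry((4, 4, 5, 5), 2), Entry((5, 4, 6, 5), 4)
node_a = Entry((0, 0, 2, 1), 8, [s1, s2])
node_b = Entry((4, 4, 6, 5), 6, [s3, s4])

stats = {"node_accesses": 0}
total = window_count([node_a, node_b], (0, 0, 5, 6), stats)
```

In this toy run `node_a` lies entirely inside the window, so its stored count is used without visiting its children; only `node_b` is opened.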

Query Processing - Multiple Windows
"Find the total number of cars on road segments inside each city suburb."
- Without aR-trees, the query can be processed as a multiway spatial join (suburbs, cars, road segments).
- With aR-trees, it is processed as a pairwise join (suburbs, aR-tree).
- If the query windows (i.e., suburbs) fit in memory, we propose an extension of the single-window technique that considers all windows in parallel.
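One plausible reading of "considers all windows in parallel" is a single traversal that carries, for each node, only the windows that still partially overlap it. The Python sketch below illustrates that reading; it is an assumption about the idea, not the algorithm as published, and all names (`Entry`, `multi_window_count`, the suburb labels) are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

Rect = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)


@dataclass
class Entry:
    mbr: Rect
    count: int
    children: List["Entry"] = field(default_factory=list)


def disjoint(a: Rect, b: Rect) -> bool:
    return a[2] < b[0] or b[2] < a[0] or a[3] < b[1] or b[3] < a[1]


def contains(outer: Rect, inner: Rect) -> bool:
    return (outer[0] <= inner[0] and outer[1] <= inner[1]
            and inner[2] <= outer[2] and inner[3] <= outer[3])


def multi_window_count(entries: List[Entry], windows: Dict[str, Rect],
                       totals: Optional[Dict[str, int]] = None) -> Dict[str, int]:
    if totals is None:
        totals = {name: 0 for name in windows}
    for e in entries:
        pending = {}                          # windows that force a descent into e
        for name, w in windows.items():
            if disjoint(e.mbr, w):
                continue                      # this window ignores the entry
            if contains(w, e.mbr):
                totals[name] += e.count       # stored aggregate answers this window
            elif e.children:
                pending[name] = w
        if pending:                           # the node is read once for all pending windows
            multi_window_count(e.children, pending, totals)
    return totals


# Toy tree and suburbs (hypothetical).
s1, s2 = Entry((0, 0, 1, 1), 3), Entry((1, 0, 2, 1), 5)
s3, s4 = Entry((4, 4, 5, 5), 2), Entry((5, 4, 6, 5), 4)
node_a = Entry((0, 0, 2, 1), 8, [s1, s2])
node_b = Entry((4, 4, 6, 5), 6, [s3, s4])

suburbs = {"north": (3.5, 3.5, 6.5, 5.5), "west": (0, 0, 5, 6), "far": (10, 10, 11, 11)}
result = multi_window_count([node_a, node_b], suburbs)
```

The point of the design is that a node shared by several windows is fetched once rather than once per window, which is what separates this from running the single-window procedure repeatedly.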

Experimental Settings
- TIGER dataset (130,000 road segments).
- We randomly selected 5,000 seed points located on roads.
- For each seed point, we generated a cluster of 250 points (i.e., car positions) with a Gaussian distribution; the total number of cars was therefore 1.25M.
- The distribution of the queries follows the distribution of the roads.
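The data generation described above can be approximated with a few lines of Python. This is only a sketch under assumptions: the slides do not give the spread of the Gaussian clusters or how seed points were placed on a segment, so `sigma`, the uniform placement along a randomly chosen segment, and the two toy segments are all hypothetical. The run uses 50 seeds to stay small; the slide's setting of 5,000 seeds with 250 cars each gives 5,000 × 250 = 1,250,000 cars.

```python
import random


def generate_cars(segments, n_seeds, cluster_size, sigma, rng):
    """Pick seed points on road segments and scatter a Gaussian cluster around each."""
    cars = []
    for _ in range(n_seeds):
        (x1, y1), (x2, y2) = rng.choice(segments)   # a random road segment
        t = rng.random()                             # position along the segment
        sx, sy = x1 + t * (x2 - x1), y1 + t * (y2 - y1)
        cars.extend(
            (rng.gauss(sx, sigma), rng.gauss(sy, sigma)) for _ in range(cluster_size)
        )
    return cars


rng = random.Random(0)  # fixed seed so the toy run is repeatable
toy_segments = [((0.0, 0.0), (1.0, 0.0)), ((1.0, 0.0), (1.0, 2.0))]
cars = generate_cars(toy_segments, n_seeds=50, cluster_size=250, sigma=0.05, rng=rng)
```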

Evaluation for Single-Window Queries
- Fact table approach: an R-tree indexes the fact table (i.e., similar to aR-trees, but with no aggregate information in the intermediate nodes).
- Raw data approach: join the cars and streets datasets.

Evaluation for Multiple-Window Queries
- aR-tree (single queries): a set of single-window queries processed using the single_aggregation algorithm of aR-trees.
- Fact table (join): join between the R-tree index of the fact table and the query windows, which fit in memory.
- Fact table (single): indexed nested loops using the R-tree index of the fact table.

Applications to spatio-temporal data
Query: "find the total number of objects in the regions intersecting some window q_s during a time interval q_t"

The aggregate 3DR-tree (a3DR-tree)
- Each entry keeps, for one region, the aggregate value and the time interval during which this value is valid.
- Whenever the aggregate information about a region changes, a new entry is created.
- Advantage: the a3DR-tree integrates the spatial and temporal dimensions in the same structure (and is therefore expected to be more efficient than column scanning for queries that involve both conditions).
- Disadvantage: it wastes space by storing the MBR each time there is an aggregate change.
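The entry layout and its space cost can be illustrated with a flat list of versioned entries. This Python sketch is not the a3DR-tree itself (there is no tree, only a linear scan), and several things are assumptions: the names `VersionedEntry` and `st_count`, inclusive integer timestamps, and the interpretation of "total number of objects during q_t" as the sum of per-timestamp counts.

```python
from dataclasses import dataclass
from typing import List, Tuple

Rect = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)


@dataclass(frozen=True)
class VersionedEntry:
    mbr: Rect      # region extent, repeated in every version (the space drawback)
    t_start: int   # first timestamp at which the count is valid (inclusive)
    t_end: int     # last timestamp at which the count is valid (inclusive)
    count: int     # aggregate value during [t_start, t_end]


def intersects(a: Rect, b: Rect) -> bool:
    return not (a[2] < b[0] or b[2] < a[0] or a[3] < b[1] or b[3] < a[1])


def st_count(entries: List[VersionedEntry], q_s: Rect, q_t: Tuple[int, int]) -> int:
    """Sum per-timestamp counts of regions intersecting q_s over the interval q_t."""
    q_from, q_to = q_t
    total = 0
    for e in entries:
        if not intersects(e.mbr, q_s):
            continue
        overlap = min(e.t_end, q_to) - max(e.t_start, q_from) + 1
        if overlap > 0:
            total += e.count * overlap
    return total


# Toy history (hypothetical): region r1 changes its count at t=3, so it needs
# two entries that both store the same MBR; region r2 never changes.
r1, r2 = (0, 0, 1, 1), (4, 4, 5, 5)
history = [
    VersionedEntry(r1, 1, 2, 3),
    VersionedEntry(r1, 3, 5, 5),
    VersionedEntry(r2, 1, 5, 2),
]
near = st_count(history, (0, 0, 2, 2), (1, 3))
everywhere = st_count(history, (0, 0, 5, 5), (1, 3))
```

Here `r1` contributes 3 + 3 + 5 over timestamps 1 to 3, and its MBR is stored twice because its count changed once.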

The aggregate RB tree

Query Example
Find all objects in some region overlapping the query window q_s during the time interval [1, 3].

The aggregate 3DRB-tree

Conclusions and directions for future work
- Spatio-temporal OLAP is a very promising direction of work.
- Incorporation of multi-version structures for dynamic dimensions.
- Formalization and analysis of when aggregation multi-trees are preferable.