Tian Xia and Donghui Zhang Northeastern University

Refreshing the Sky: The Compressed Skycube with Efficient Support for Frequent Updates
June 29, 2006, SIGMOD 2006, Chicago, IL

Skyline Query
A classic example revisited: hotels in Nassau, compared by price and distance to the beach. The smaller, the better.
[Figure: hotels t1 to t7 plotted by price and distance to the beach.]
Object  Price  Dist. to Beach
t1      3      2
t2      4      7
t3      9      5
t4      4      6
t5      2      3
t6      6      1
t7      1      4
An object ti is a skyline object if no other object dominates it, i.e., no other object is at least as good as ti in every dimension and strictly better in at least one.
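A minimal Python sketch of the skyline definition on the hotel data above, assuming the standard dominance test (no worse in every dimension, strictly better in at least one). The names `dominates`, `skyline`, and `HOTELS` are illustrative, and the quadratic nested loop only makes the definition concrete; it is not how the paper computes skylines.

```python
from typing import Dict, Set, Tuple

Point = Tuple[int, ...]


def dominates(a: Point, b: Point) -> bool:
    """a dominates b: no worse in every dimension, strictly better in at least one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def skyline(objects: Dict[str, Point]) -> Set[str]:
    """Nested-loop skyline: keep every object that no other object dominates.

    Comparing an object with itself is harmless, since no point dominates itself.
    """
    return {
        name
        for name, p in objects.items()
        if not any(dominates(q, p) for q in objects.values())
    }


# Hotels from the slide as (price, distance to beach); smaller is better.
HOTELS = {
    "t1": (3, 2), "t2": (4, 7), "t3": (9, 5), "t4": (4, 6),
    "t5": (2, 3), "t6": (6, 1), "t7": (1, 4),
}

print(sorted(skyline(HOTELS)))  # ['t1', 't5', 't6', 't7']
```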

Subspace Skyline Query
What if users issue skyline queries on arbitrary subsets of the dimensions? The results of subspace skylines can be very different!
Objects in 4 dimensions:
Object  u1  u2  u3  u4
t1      3   4   2   5
t2      4   6   7   2
t3      9   7   5   6
t4      4   3   6   1
t5      2   2   3   1
t6      6   1   1   3
t7      1   3   4   1
[Figure: the skyline in subspace {u1, u3} (t1, t5, t6, t7) and the skyline in subspace {u3, u4} (t5, t6).]
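The same dominance test can be restricted to a subset of the dimensions. In this sketch (helper names are hypothetical), a subspace skyline is simply the skyline of the objects projected onto the chosen dimensions, which reproduces the two different results for {u1, u3} and {u3, u4}.

```python
from typing import Dict, Sequence, Set, Tuple

# The 4-dimensional objects of this slide; positions 0..3 hold u1..u4.
OBJECTS: Dict[str, Tuple[int, ...]] = {
    "t1": (3, 4, 2, 5), "t2": (4, 6, 7, 2), "t3": (9, 7, 5, 6),
    "t4": (4, 3, 6, 1), "t5": (2, 2, 3, 1), "t6": (6, 1, 1, 3),
    "t7": (1, 3, 4, 1),
}
DIM = {"u1": 0, "u2": 1, "u3": 2, "u4": 3}


def dominates_in(a, b, idx: Sequence[int]) -> bool:
    """Dominance restricted to the dimension positions in idx."""
    return all(a[i] <= b[i] for i in idx) and any(a[i] < b[i] for i in idx)


def subspace_skyline(objects, subspace: Sequence[str]) -> Set[str]:
    """Skyline of the objects projected onto the named dimensions."""
    idx = [DIM[u] for u in subspace]
    return {
        name
        for name, p in objects.items()
        if not any(dominates_in(q, p, idx) for q in objects.values())
    }


print(sorted(subspace_skyline(OBJECTS, ["u1", "u3"])))  # ['t1', 't5', 't6', 't7']
print(sorted(subspace_skyline(OBJECTS, ["u3", "u4"])))  # ['t5', 't6']
```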

Skycube (Yuan et al., VLDB 2005)
A d-dimensional space contains 2^d - 1 non-empty subspaces, and which subspaces users will be interested in is unpredictable. On-the-fly computation cannot deliver fast response times for an online system. The Skycube is the collection of the skyline results of all subspaces.
[Figure: the lattice of the 15 cuboids of a 4-dimensional Skycube, from u1, u2, u3, u4 up to u1u2u3u4, whose cuboid holds the full-space skyline.]
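To make the 2^d - 1 count concrete, the sketch below enumerates the non-empty subsets of the dimensions, i.e. the cuboids a complete Skycube has to materialize. The function name `subspaces` is an illustrative assumption.

```python
from itertools import combinations
from typing import List, Sequence, Tuple


def subspaces(dims: Sequence[str]) -> List[Tuple[str, ...]]:
    """All non-empty subsets of dims, i.e. the cuboids a Skycube must store."""
    return [c for k in range(1, len(dims) + 1) for c in combinations(dims, k)]


cuboids = subspaces(("u1", "u2", "u3", "u4"))
print(len(cuboids))  # 15
print(cuboids[:5])   # [('u1',), ('u2',), ('u3',), ('u4',), ('u1', 'u2')]

# The count grows as 2^d - 1 across the dimensionalities used in the experiments.
for d in (4, 6, 8):
    print(d, len(subspaces([f"u{i}" for i in range(1, d + 1)])))  # "4 15", "6 63", "8 255"
```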

Our Motivations (1)
In many subspace skyline applications, the data change constantly. Example: in an online hotel-booking system, room prices change with availability. The previous Skycube paper focused only on the initial computation of the Skycube. Straightforward re-computation upon each update is extremely inefficient!

Our Motivations (2)
The complete Skycube contains a huge number of duplicates.
Drawback 1: wasted storage.
Drawback 2: difficult to maintain.
The large size of the Skycube and its many duplicates make updating the Skycube inherently expensive.

Our Motivations (2)
Objects:
Object  u1  u2  u3  u4
t1      3   4   2   5
t2      4   6   7   2
t3      9   7   5   6
t4      4   3   6   1
t5      2   2   3   1
t6      6   1   1   3
t7      1   3   4   1
t8      6   5   3   8
t9      2   2   3   7
Corresponding Skycube:
Cuboid            Skyline
u1                t7
u2                t6
u3                t6
u4                t5, t7, t4
u1, u2            t5, t6, t7, t9
u1, u3            t1, t5, t6, t7, t9
u1, u4            t7
u2, u3            t6
u2, u4            t5, t6
u3, u4            t5, t6
u1, u2, u3        t1, t5, t6, t7, t9
u1, u2, u4        t5, t6, t7
u1, u3, u4        t1, t5, t6, t7
u2, u3, u4        t5, t6
u1, u2, u3, u4    t1, t5, t6, t7
Full-space skyline objects: t1, t5, t6, t7. Other skyline objects (not in the full-space skyline): t4, t9.
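A brute-force sketch that recomputes every cuboid of this table from the nine objects and counts how often objects are repeated. It assumes the dominance definition from the first slide; `skycube` is a hypothetical helper that computes each subspace skyline independently, which is exactly the costly, duplicate-heavy representation the CSC is designed to avoid.

```python
from itertools import combinations

# The nine objects of this slide; positions 0..3 hold u1..u4, smaller is better.
OBJECTS = {
    "t1": (3, 4, 2, 5), "t2": (4, 6, 7, 2), "t3": (9, 7, 5, 6),
    "t4": (4, 3, 6, 1), "t5": (2, 2, 3, 1), "t6": (6, 1, 1, 3),
    "t7": (1, 3, 4, 1), "t8": (6, 5, 3, 8), "t9": (2, 2, 3, 7),
}
DIMS = ("u1", "u2", "u3", "u4")


def dominates_in(a, b, idx):
    """Dominance restricted to the dimension positions in idx."""
    return all(a[i] <= b[i] for i in idx) and any(a[i] < b[i] for i in idx)


def skycube(objects, dims=DIMS):
    """Naive complete Skycube: an independent skyline for every non-empty subspace."""
    cube = {}
    for k in range(1, len(dims) + 1):
        for idx in combinations(range(len(dims)), k):
            key = frozenset(dims[i] for i in idx)
            cube[key] = {
                n for n, p in objects.items()
                if not any(dominates_in(q, p, idx) for q in objects.values())
            }
    return cube


cube = skycube(OBJECTS)
entries = sum(len(sky) for sky in cube.values())  # stored object entries
distinct = set().union(*cube.values())            # objects that appear at all
print(len(cube), entries, len(distinct))  # 15 39 6
```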

Our Motivations (3): Tradeoffs
The complete Skycube: fast query response, high update cost.
On-the-fly computation: slow query response, low update cost.

Our Solution: The Compressed Skycube
We propose a new storage model for the Skycube that greatly reduces storage.
We propose a new object-aware update scheme that avoids unnecessary disk accesses and cuboid computations.
By taking advantage of the compact structure and the update scheme, the Compressed Skycube achieves both fast query response and very efficient updates.

Outline: Background and Motivations; The Compressed Skycube; Experimental Results; Conclusions

Minimum Subspace
DEFINITION: Given an object t, the minimum subspaces of t, denoted mss(t), are the subspaces U that satisfy both of the following conditions: (1) t is in the skyline of U; and (2) for every proper subspace V ⊂ U, t is not in the skyline of V.

Minimum Subspace
Object t6 appears in the skylines of 12 cuboids of the Skycube, but it has only 2 minimum subspaces.
Object  Minimum subspaces
t1      {u1, u3}
t4      {u4}
t5      {u4}, {u1, u2}, {u1, u3}
t6      {u2}, {u3}
t7      {u1}, {u4}
t9      {u1, u2}, {u1, u3}
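The minimum subspaces can be derived by brute force straight from the definition: visit subspaces from smallest to largest and keep a subspace only if the object is in its skyline and in the skyline of none of its proper subsets. This is an illustrative sketch (the helper `mss` is hypothetical), not the paper's computation procedure.

```python
from itertools import combinations

# The nine example objects; positions 0..3 hold u1..u4, smaller is better.
OBJECTS = {
    "t1": (3, 4, 2, 5), "t2": (4, 6, 7, 2), "t3": (9, 7, 5, 6),
    "t4": (4, 3, 6, 1), "t5": (2, 2, 3, 1), "t6": (6, 1, 1, 3),
    "t7": (1, 3, 4, 1), "t8": (6, 5, 3, 8), "t9": (2, 2, 3, 7),
}
DIMS = ("u1", "u2", "u3", "u4")


def dominates_in(a, b, idx):
    """Dominance restricted to the dimension positions in idx."""
    return all(a[i] <= b[i] for i in idx) and any(a[i] < b[i] for i in idx)


def mss(name, objects, dims=DIMS):
    """Minimum subspaces of one object, by brute force over all subspaces.

    Subspaces are visited by increasing size, so when U is reached every proper
    subset of U has already been tested; U is minimal if the object is in sky(U)
    and in the skyline of none of those subsets.
    """
    p = objects[name]
    in_sky, minimal = [], set()
    for k in range(1, len(dims) + 1):
        for idx in combinations(range(len(dims)), k):
            if any(dominates_in(q, p, idx) for q in objects.values()):
                continue  # the object is not in the skyline of this subspace
            u = frozenset(dims[i] for i in idx)
            if not any(v < u for v in in_sky):  # no proper subset already qualifies
                minimal.add(u)
            in_sky.append(u)
    return minimal


for name in ("t5", "t6", "t7"):
    print(name, sorted(sorted(u) for u in mss(name, OBJECTS)))
```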

The Compressed Skycube
DEFINITION: The Compressed Skycube (CSC) consists of the non-empty cuboids U such that an object t is stored in cuboid U if and only if U ∈ mss(t).
Cuboids of the CSC:
Cuboid    Stored objects
u1        t7
u2        t6
u3        t6
u4        t5, t7, t4
u1, u2    t5, t9
u1, u3    t1, t5, t9
(Minimum subspaces: t1 {u1, u3}; t4 {u4}; t5 {u4}, {u1, u2}, {u1, u3}; t6 {u2}, {u3}; t7 {u1}, {u4}; t9 {u1, u2}, {u1, u3}.)
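Given minimum subspaces, building the CSC amounts to placing each object only in the cuboids of mss(t). The sketch below reuses the brute-force minimum-subspace idea (all names are hypothetical) and reproduces the six cuboids listed on this slide; on this example it stores 11 object entries, versus 39 in the complete Skycube table of the nine objects.

```python
from collections import defaultdict
from itertools import combinations

OBJECTS = {
    "t1": (3, 4, 2, 5), "t2": (4, 6, 7, 2), "t3": (9, 7, 5, 6),
    "t4": (4, 3, 6, 1), "t5": (2, 2, 3, 1), "t6": (6, 1, 1, 3),
    "t7": (1, 3, 4, 1), "t8": (6, 5, 3, 8), "t9": (2, 2, 3, 7),
}
DIMS = ("u1", "u2", "u3", "u4")


def dominates_in(a, b, idx):
    """Dominance restricted to the dimension positions in idx."""
    return all(a[i] <= b[i] for i in idx) and any(a[i] < b[i] for i in idx)


def mss(name, objects, dims=DIMS):
    """Brute-force minimum subspaces of one object (smallest subspaces first)."""
    p = objects[name]
    in_sky, minimal = [], set()
    for k in range(1, len(dims) + 1):
        for idx in combinations(range(len(dims)), k):
            if any(dominates_in(q, p, idx) for q in objects.values()):
                continue
            u = frozenset(dims[i] for i in idx)
            if not any(v < u for v in in_sky):
                minimal.add(u)
            in_sky.append(u)
    return minimal


def compressed_skycube(objects):
    """Store each object only in the cuboids of its minimum subspaces."""
    csc = defaultdict(set)
    for name in objects:
        for u in mss(name, objects):
            csc[u].add(name)
    return dict(csc)  # objects with no minimum subspace are not stored at all


csc = compressed_skycube(OBJECTS)
for u in sorted(csc, key=lambda s: (len(s), sorted(s))):
    print(sorted(u), sorted(csc[u]))
print(sum(len(s) for s in csc.values()))  # 11
```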

Querying CSC
Overview example: query space Uq = {u2, u3, u4}.
Object  u1  u2  u3  u4
t1      3   4   2   5
t4      4   3   6   1
t5      2   2   3   1
t6      6   1   1   3
t7      1   3   4   1
t9      2   2   3   7
Cuboid    Stored objects
u1        t7
u2        t6
u3        t6
u4        t5, t7, t4
u1, u2    t5, t9
u1, u3    t1, t5, t9
Search only within the cuboids that are subsets of Uq, and compare objects only within a candidate cuboid to filter out false positives.

Querying CSC
LEMMA 1: Given a query space Uq and an object t, if Ui ⊄ Uq for every subspace Ui ∈ mss(t), then t is not in the skyline of Uq.
Lemma 1 implies two important facts:
Only the existing cuboids that are subsets of Uq need to be searched.
No other cuboids need to be accessed or computed during query processing.
Example: for Uq = {u2, u3, u4}, t9 can be safely pruned, because neither of its minimum subspaces, {u1, u2} and {u1, u3}, is a subset of Uq.
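Lemma 1 reduces candidate selection to a subset test on cuboid keys. A minimal sketch over the CSC cuboids listed on the previous slides (hard-coded here; `candidate_cuboids` is a hypothetical name):

```python
# CSC cuboids as listed on the slides: subspace -> objects stored in it.
CSC = {
    frozenset({"u1"}): {"t7"},
    frozenset({"u2"}): {"t6"},
    frozenset({"u3"}): {"t6"},
    frozenset({"u4"}): {"t4", "t5", "t7"},
    frozenset({"u1", "u2"}): {"t5", "t9"},
    frozenset({"u1", "u3"}): {"t1", "t5", "t9"},
}


def candidate_cuboids(csc, uq):
    """Lemma 1: only stored cuboids U with U a subset of Uq can hold answers."""
    return {u: objs for u, objs in csc.items() if u <= uq}


uq = frozenset({"u2", "u3", "u4"})
cands = candidate_cuboids(CSC, uq)
candidates = set().union(*cands.values())
print(sorted(sorted(u) for u in cands))  # [['u2'], ['u3'], ['u4']]
print(sorted(candidates))                # ['t4', 't5', 't6', 't7']
```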

Querying CSC
LEMMA 2 (Local Comparison): To check a candidate t in a cuboid V ⊆ Uq, we only need to compare t with the other objects in the same cuboid.
Example: for Uq = {u2, u3, u4}, the skyline of Uq is {t5, t6}. No comparison is needed for t6, which is the only object in cuboids u2 and u3. The objects t5, t7, t4 of cuboid u4 are compared only with each other, and t5 dominates t7 and t4 in Uq.
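Putting Lemma 1 and Lemma 2 together suggests a simple query procedure: scan the stored cuboids that are subsets of Uq and, inside each one, drop any object dominated in Uq by another object of the same cuboid. The sketch below is an illustrative reading of the two lemmas (function names are hypothetical, and the CSC is hard-coded from the slides); it also cross-checks the answer against a full scan for every subspace of the example.

```python
from itertools import combinations

OBJECTS = {
    "t1": (3, 4, 2, 5), "t2": (4, 6, 7, 2), "t3": (9, 7, 5, 6),
    "t4": (4, 3, 6, 1), "t5": (2, 2, 3, 1), "t6": (6, 1, 1, 3),
    "t7": (1, 3, 4, 1), "t8": (6, 5, 3, 8), "t9": (2, 2, 3, 7),
}
DIM = {"u1": 0, "u2": 1, "u3": 2, "u4": 3}
CSC = {
    frozenset({"u1"}): {"t7"}, frozenset({"u2"}): {"t6"},
    frozenset({"u3"}): {"t6"}, frozenset({"u4"}): {"t4", "t5", "t7"},
    frozenset({"u1", "u2"}): {"t5", "t9"},
    frozenset({"u1", "u3"}): {"t1", "t5", "t9"},
}


def dominates_in(a, b, idx):
    """Dominance restricted to the dimension positions in idx."""
    return all(a[i] <= b[i] for i in idx) and any(a[i] < b[i] for i in idx)


def query_csc(csc, objects, uq):
    """Subspace skyline answered from the CSC.

    Lemma 1: only stored cuboids V that are subsets of Uq are searched.
    Lemma 2: a candidate in V is compared only with the objects of V,
    using all dimensions of Uq.
    """
    idx = [DIM[u] for u in uq]
    result = set()
    for v, members in csc.items():
        if v <= uq:
            result |= {
                t for t in members
                if not any(dominates_in(objects[s], objects[t], idx) for s in members)
            }
    return result


def scan_skyline(objects, uq):
    """Reference answer: compare every object with every other object."""
    idx = [DIM[u] for u in uq]
    return {n for n, p in objects.items()
            if not any(dominates_in(q, p, idx) for q in objects.values())}


print(sorted(query_csc(CSC, OBJECTS, frozenset({"u2", "u3", "u4"}))))  # ['t5', 't6']

# Cross-check on this example: the CSC answer equals a full scan for all 15 subspaces.
all_match = all(
    query_csc(CSC, OBJECTS, frozenset(sub)) == scan_skyline(OBJECTS, frozenset(sub))
    for k in range(1, 5) for sub in combinations(DIM, k)
)
print(all_match)  # True
```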

Updating CSC
Intuitions: not every object update needs to access the dataset, and not every object update needs to recompute the skyline of a cuboid. These intuitions are supported by our theorems.
Notation: D is the full space and sky(D) the full-space skyline; t is the object before the update and tnew the object after the update.
Case t ∉ sky(D): no dataset (disk) access.
  tnew ∉ sky(D): no cuboid computation; existing CSC objects are not changed.
  tnew ∈ sky(D): existing CSC objects may be removed or move to other cuboids.
Case t ∈ sky(D): the dataset (disk) may be accessed to insert new skyline objects.
Since full-space skyline objects make up only a small proportion of the whole dataset, the cases with t ∉ sky(D) cover most updates.
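The first intuition can be checked by brute force on the example data: deleting an object that is not in sky(D) never lets a new object into any cuboid (any object it dominated in some subspace is also dominated there by the full-space skyline object that dominates the deleted one), while deleting a full-space skyline object can. This is a verification sketch with hypothetical helper names, not the paper's update algorithm.

```python
from itertools import combinations

OBJECTS = {
    "t1": (3, 4, 2, 5), "t2": (4, 6, 7, 2), "t3": (9, 7, 5, 6),
    "t4": (4, 3, 6, 1), "t5": (2, 2, 3, 1), "t6": (6, 1, 1, 3),
    "t7": (1, 3, 4, 1), "t8": (6, 5, 3, 8), "t9": (2, 2, 3, 7),
}
DIMS = ("u1", "u2", "u3", "u4")


def dominates_in(a, b, idx):
    """Dominance restricted to the dimension positions in idx."""
    return all(a[i] <= b[i] for i in idx) and any(a[i] < b[i] for i in idx)


def skycube(objects, dims=DIMS):
    """Naive complete Skycube: one independent skyline per subspace."""
    cube = {}
    for k in range(1, len(dims) + 1):
        for idx in combinations(range(len(dims)), k):
            cube[frozenset(dims[i] for i in idx)] = {
                n for n, p in objects.items()
                if not any(dominates_in(q, p, idx) for q in objects.values())
            }
    return cube


def newcomers_after_delete(objects, victim):
    """Objects that enter some cuboid's skyline once `victim` is deleted."""
    before = skycube(objects)
    after = skycube({n: p for n, p in objects.items() if n != victim})
    return {n for u, sky in after.items() for n in sky - before[u]}


sky_d = skycube(OBJECTS)[frozenset(DIMS)]  # full-space skyline
for victim in sorted(OBJECTS):
    print(victim, victim in sky_d, sorted(newcomers_after_delete(OBJECTS, victim)))
```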

Updating CSC: t ∉ sky(D) and tnew ∉ sky(D)
Key points: compare tnew only with the existing full-space skyline objects, sky(D). mss(tnew) is determined by which objects of sky(D) dominate tnew in each subspace.
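A sketch of this case under an assumed, hypothetical update: t8 = (6, 5, 3, 8) changes to tnew = (2, 2, 3, 5), and neither value is in sky(D). mss(tnew) is computed by comparing tnew only with the objects of sky(D), and the result is checked against a comparison with the whole dataset; the code also confirms that no other object's minimum subspaces change. Helper names are hypothetical.

```python
from itertools import combinations

OBJECTS = {
    "t1": (3, 4, 2, 5), "t2": (4, 6, 7, 2), "t3": (9, 7, 5, 6),
    "t4": (4, 3, 6, 1), "t5": (2, 2, 3, 1), "t6": (6, 1, 1, 3),
    "t7": (1, 3, 4, 1), "t8": (6, 5, 3, 8), "t9": (2, 2, 3, 7),
}
DIMS = ("u1", "u2", "u3", "u4")
FULL = range(len(DIMS))


def dominates_in(a, b, idx):
    """Dominance restricted to the dimension positions in idx."""
    return all(a[i] <= b[i] for i in idx) and any(a[i] < b[i] for i in idx)


def mss_against(p, others, dims=DIMS):
    """Minimum subspaces of point p when it is compared only with `others`."""
    in_sky, minimal = [], set()
    for k in range(1, len(dims) + 1):
        for idx in combinations(range(len(dims)), k):
            if any(dominates_in(q, p, idx) for q in others):
                continue  # p is dominated in this subspace
            u = frozenset(dims[i] for i in idx)
            if not any(v < u for v in in_sky):
                minimal.add(u)
            in_sky.append(u)
    return minimal


def all_mss(objects):
    """Brute-force minimum subspaces of every object in a dataset."""
    return {n: mss_against(p, [q for m, q in objects.items() if m != n])
            for n, p in objects.items()}


sky_d = [p for p in OBJECTS.values()
         if not any(dominates_in(q, p, FULL) for q in OBJECTS.values())]

# Hypothetical update: t8 = (6, 5, 3, 8) becomes tnew = (2, 2, 3, 5).
updated = dict(OBJECTS, t8=(2, 2, 3, 5))
tnew = updated["t8"]
fast = mss_against(tnew, sky_d)  # compared with sky(D) only, no dataset scan
full_scan = mss_against(tnew, [p for n, p in updated.items() if n != "t8"])
print(sorted(sorted(u) for u in fast), fast == full_scan)  # [['u1', 'u2'], ['u1', 'u3']] True

before, after = all_mss(OBJECTS), all_mss(updated)
unchanged = all(before[n] == after[n] for n in OBJECTS if n != "t8")
print(unchanged)  # True
```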

Updating CSC: t ∉ sky(D) and tnew ∈ sky(D)
Key points: existing objects may be removed or move to superset cuboids. Determining mss(tnew) is not straightforward in this case, so a new recursion-based approach is proposed to avoid unnecessary computations.
Example: the new object t10 = (1, 3, 1, 3) has minimum subspaces {u1} and {u3} and is added to those cuboids:
Cuboid    Stored objects
u1        t7, t10
u2        t6
u3        t6, t10
u4        t5, t7, t4
u1, u2    t5, t9
u1, u3    t1, t5, t9
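The recursion-based approach itself is not described on this slide, so the sketch below does not attempt it. Instead it recomputes every object's minimum subspaces by brute force before and after adding t10 = (1, 3, 1, 3), treated here as a newly inserted object (an assumption), to show the effect described above: t10 lands in {u1} and {u3}, t1 drops out of the CSC, t5 and t9 lose {u1, u3}, and every new minimum subspace of an existing object contains one of its old ones. Helper names are hypothetical.

```python
from itertools import combinations

OBJECTS = {
    "t1": (3, 4, 2, 5), "t2": (4, 6, 7, 2), "t3": (9, 7, 5, 6),
    "t4": (4, 3, 6, 1), "t5": (2, 2, 3, 1), "t6": (6, 1, 1, 3),
    "t7": (1, 3, 4, 1), "t8": (6, 5, 3, 8), "t9": (2, 2, 3, 7),
}
DIMS = ("u1", "u2", "u3", "u4")


def dominates_in(a, b, idx):
    """Dominance restricted to the dimension positions in idx."""
    return all(a[i] <= b[i] for i in idx) and any(a[i] < b[i] for i in idx)


def mss(name, objects, dims=DIMS):
    """Brute-force minimum subspaces of one object (smallest subspaces first)."""
    p = objects[name]
    in_sky, minimal = [], set()
    for k in range(1, len(dims) + 1):
        for idx in combinations(range(len(dims)), k):
            if any(dominates_in(q, p, idx) for q in objects.values()):
                continue
            u = frozenset(dims[i] for i in idx)
            if not any(v < u for v in in_sky):
                minimal.add(u)
            in_sky.append(u)
    return minimal


def show(m):
    """Readable, deterministic form of a set of subspaces."""
    return sorted(sorted(u) for u in m)


before = {n: mss(n, OBJECTS) for n in OBJECTS}
inserted = dict(OBJECTS, t10=(1, 3, 1, 3))  # assumption: t10 arrives as a new object
after = {n: mss(n, inserted) for n in inserted}

print("t10", show(after["t10"]))
for n in sorted(OBJECTS):
    if before[n] != after[n]:
        print(n, show(before[n]), "->", show(after[n]))

# Existing objects only leave the CSC or move up to superset cuboids.
moved_up_only = all(any(old <= new for old in before[n]) for n in OBJECTS for new in after[n])
print(moved_up_only)  # True
```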


Storage Comparison
Settings: full-space dimensionality from 4 to 8 (default 6); cardinality from 100K to 500K (default 300K); distributions: independent, correlated, and anti-correlated. Note the logarithmic scale of the storage plots.

Query Performance
Since queries on the complete Skycube involve no computation, their time is not reported. This set of experiments verifies that the CSC indeed answers queries very fast.

Update Performance
General update: updates are applied to random objects from the whole dataset, and the Skycube is recomputed from scratch.
Full-space skyline update: updates are applied to random full-space skyline objects, and the Skycube is recomputed from the existing skylines plus new candidates.


Conclusions
We addressed update support for the Skycube in dynamic environments and provided an efficient, scalable solution for online skyline query systems.
We proposed a compact structure, the Compressed Skycube, which takes about 10% of the disk space of the complete Skycube and offers fast query response.
We proposed an object-aware update scheme in which different updates trigger different amounts of computation.
The CSC outperforms the Skycube in update performance by several orders of magnitude.

Thank you! Questions?
Tian Xia and Donghui Zhang. Refreshing the Sky: The Compressed Skycube with Efficient Support for Frequent Updates. SIGMOD 2006.