Tian Xia and Donghui Zhang Northeastern University


1 Refreshing the Sky: The Compressed Skycube with Efficient Support for Frequent Updates
Tian Xia and Donghui Zhang, Northeastern University. June 29, 2006, SIGMOD 2006, Chicago, IL

2 Skyline Query
A classic example revisited: hotels in Nassau, each described by its distance to the beach and its price. The smaller, the better. If no object is better than ti in all dimensions, ti is a skyline object.
[Figure: hotels t1 to t7 plotted by distance to beach and price.]
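The skyline definition on this slide can be checked by pairwise comparison. The following is a minimal Python sketch, not the authors' code. The hotel coordinates are made up for illustration (the slide does not give them), and `dominates` uses the common convention that one object dominates another if it is no worse in every dimension and strictly better in at least one, which is an assumption beyond the slide's wording.

```python
from typing import Dict, List, Tuple

Point = Tuple[float, ...]


def dominates(a: Point, b: Point) -> bool:
    """True if a is no worse than b everywhere and strictly better somewhere
    (smaller values are better)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def skyline(points: Dict[str, Point]) -> List[str]:
    """Names of all objects that no other object dominates."""
    return sorted(
        name
        for name, p in points.items()
        if not any(dominates(q, p) for other, q in points.items() if other != name)
    )


# Hypothetical hotels: (distance to beach, price)
hotels = {
    "t1": (1.0, 200.0),
    "t2": (2.0, 220.0),
    "t3": (3.0, 250.0),
    "t4": (2.5, 150.0),
    "t5": (4.0, 160.0),
    "t6": (5.0, 100.0),
    "t7": (6.0, 120.0),
}

print(skyline(hotels))  # ['t1', 't4', 't6']
```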

3 Subspace Skyline Query
What if users may issue skyline queries based on arbitrary subsets of dimensions? Results of subspace skylines can be very different!
[Figure: objects t1 to t7 of 4 dimensions (u1 to u4), plotted once in the subspace (u1, u3) and once in the subspace (u3, u4), with the skyline marked in each.]
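A subspace skyline is the ordinary skyline computed on the chosen dimensions only. The sketch below illustrates this in Python under stated assumptions: the 4-dimensional objects are invented for the example, dimensions are addressed by 0-based index (u1 is index 0), and `subspace_skyline` is a hypothetical helper name.

```python
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, ...]


def dominates(a: Point, b: Point) -> bool:
    """No worse in every dimension and strictly better in at least one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def subspace_skyline(data: Dict[str, Point], dims: Sequence[int]) -> List[str]:
    """Skyline of the data projected onto the dimensions listed in dims."""
    proj = {name: tuple(p[d] for d in dims) for name, p in data.items()}
    return sorted(
        name
        for name, p in proj.items()
        if not any(dominates(q, p) for other, q in proj.items() if other != name)
    )


# Hypothetical 4-dimensional objects (u1, u2, u3, u4); smaller is better.
objects = {
    "t1": (1, 5, 2, 9),
    "t2": (2, 6, 4, 8),
    "t3": (6, 1, 7, 3),
    "t4": (7, 2, 8, 1),
    "t5": (4, 4, 1, 5),
}

print(subspace_skyline(objects, (0, 2)))  # subspace (u1, u3): ['t1', 't5']
print(subspace_skyline(objects, (2, 3)))  # subspace (u3, u4): ['t3', 't4', 't5']
```

The two results share only t5, which is the point of the slide: the answer depends heavily on which dimensions the user picks.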

4 Skycube (Yuan, et al., VLDB 2005)
A d-dimensional space contains 2^d - 1 non-empty subspaces, and the subspaces that users are interested in are unpredictable. On-the-fly computation does not achieve fast response time for an online system. The Skycube is the collection of all subspace skyline results.
[Figure: lattice of the 15 subspaces of the 4-dimensional space u1u2u3u4, from the single dimensions u1 to u4 up to the full space; the cuboid of u1u2u3u4 is the full-space skyline.]
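To make the 2^d - 1 count concrete, the sketch below materializes a complete Skycube by brute force: it enumerates every non-empty subset of dimensions and stores its skyline. This is an illustrative Python sketch on invented 3-dimensional data, not the computation method of the Skycube paper; the function names are hypothetical.

```python
from itertools import combinations
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, ...]
Subspace = Tuple[int, ...]


def dominates(a: Point, b: Point) -> bool:
    """No worse in every dimension and strictly better in at least one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def subspace_skyline(data: Dict[str, Point], dims: Sequence[int]) -> List[str]:
    """Skyline of the data projected onto dims."""
    proj = {name: tuple(p[d] for d in dims) for name, p in data.items()}
    return sorted(
        name
        for name, p in proj.items()
        if not any(dominates(q, p) for other, q in proj.items() if other != name)
    )


def skycube(data: Dict[str, Point]) -> Dict[Subspace, List[str]]:
    """One cuboid (subspace skyline) per non-empty subset of dimensions."""
    d = len(next(iter(data.values())))
    return {
        dims: subspace_skyline(data, dims)
        for k in range(1, d + 1)
        for dims in combinations(range(d), k)
    }


# Hypothetical 3-dimensional data; smaller is better.
data = {"t1": (1, 4, 3), "t2": (2, 2, 2), "t3": (3, 1, 5), "t4": (4, 3, 4)}
cube = skycube(data)

print(len(cube))  # 7, i.e. 2**3 - 1 cuboids
for dims, sky in cube.items():
    print(dims, sky)
```

Every update to `data` would force all 2^d - 1 cuboids to be rebuilt in this naive form, which is the cost the following slides set out to avoid.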

5 Our Motivations (1)
In many subspace skyline applications, the data change constantly. Example: in an online hotel-booking system, room prices change with availability. The previous Skycube paper focused only on the initial computation of the Skycube. A straightforward re-computation upon each update is extremely inefficient!

6 Our Motivations (2)
The complete Skycube contains a huge number of duplicates. Drawback 1: waste of storage. Drawback 2: difficult to maintain. The large size of the Skycube and the large number of duplicates make updating the Skycube inherently expensive.

7 Our Motivations (2): Corresponding Skycube
Cuboid            Skyline
u1                t7
u2                t6
u3                t6
u4                t5, t7, t4
u1, u2            t5, t6, t7, t9
u1, u3            t1, t5, t6, t7, t9
u1, u4            t7
u2, u3            t6
u2, u4            t5, t6
u3, u4            t5, t6
u1, u2, u3        t1, t5, t6, t7, t9
u1, u2, u4        t5, t6, t7
u1, u3, u4        t1, t5, t6, t7
u2, u3, u4        t5, t6
u1, u2, u3, u4    t1, t5, t6, t7
Full-space skyline objects: t1, t5, t6, t7. Other skyline objects (not in the full-space skyline): t4, t9.

8 Our Motivations (3) – Tradeoffs
The complete Skycube: fast query response, high update cost.
On-the-fly computation: slow query response, low update cost.

9 Our Solution – The Compressed Skycube
We propose a new storage model for the Skycube, which greatly reduces the storage. We propose a new object-aware update scheme, which avoids unnecessary disk access and cuboid computation. By taking advantage of the compact structure and our update scheme, the Compressed Skycube achieves both fast query response and very efficient updates.

10 Outline
Background and Motivations; The Compressed Skycube; Experimental Results; Conclusions

11 Minimum Subspace
DEFINITION: Given an object t, the minimum subspaces of t, denoted as mss(t), satisfy the following two conditions: for any subspace U in mss(t), t is in the skyline of U; and for any subspace V ⊂ U, t is not in the skyline of V.

12 Minimum Subspace
Object t6 appears in the skylines of 12 cuboids. The minimum subspaces of t6 are only 2 cuboids.
Cuboid            Skyline
u1                t7
u2                t6
u3                t6
u4                t5, t7, t4
u1, u2            t5, t6, t7, t9
u1, u3            t1, t5, t6, t7, t9
u1, u4            t7
u2, u3            t6
u2, u4            t5, t6
u3, u4            t5, t6
u1, u2, u3        t1, t5, t6, t7, t9
u1, u2, u4        t5, t6, t7
u1, u3, u4        t1, t5, t6, t7
u2, u3, u4        t5, t6
u1, u2, u3, u4    t1, t5, t6, t7
Minimum subspaces:
t1: {u1, u3}
t4: {u4}
t5: {u4}, {u1, u2}, {u1, u3}
t6: {u2}, {u3}
t7: {u1}, {u4}
t9: {u1, u2}, {u1, u3}
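The minimum subspaces can be derived mechanically from a complete Skycube: keep the cuboids whose skyline contains the object and drop any that has a proper subset also containing it. The Python sketch below applies this to the Skycube table of this slide; the table is transcribed from the slide, while the names `TABLE`, `SKYCUBE` and `minimum_subspaces` are hypothetical.

```python
from typing import Dict, FrozenSet, List, Set

# Skycube of the running example, transcribed from the slide (cuboid -> skyline).
TABLE = {
    ("u1",): "t7",
    ("u2",): "t6",
    ("u3",): "t6",
    ("u4",): "t5 t7 t4",
    ("u1", "u2"): "t5 t6 t7 t9",
    ("u1", "u3"): "t1 t5 t6 t7 t9",
    ("u1", "u4"): "t7",
    ("u2", "u3"): "t6",
    ("u2", "u4"): "t5 t6",
    ("u3", "u4"): "t5 t6",
    ("u1", "u2", "u3"): "t1 t5 t6 t7 t9",
    ("u1", "u2", "u4"): "t5 t6 t7",
    ("u1", "u3", "u4"): "t1 t5 t6 t7",
    ("u2", "u3", "u4"): "t5 t6",
    ("u1", "u2", "u3", "u4"): "t1 t5 t6 t7",
}
SKYCUBE: Dict[FrozenSet[str], Set[str]] = {
    frozenset(dims): set(objs.split()) for dims, objs in TABLE.items()
}


def minimum_subspaces(obj: str, cube: Dict[FrozenSet[str], Set[str]]) -> List[FrozenSet[str]]:
    """Cuboids containing obj that have no proper subset also containing obj."""
    containing = [u for u, sky in cube.items() if obj in sky]
    # For frozensets, v < u tests whether v is a proper subset of u.
    return [u for u in containing if not any(v < u for v in containing)]


occurrences = sum(1 for sky in SKYCUBE.values() if "t6" in sky)
print(occurrences)                                          # 12 cuboids contain t6
print([sorted(u) for u in minimum_subspaces("t6", SKYCUBE)])  # two single-dimension cuboids
```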

13 The Compressed Skycube
DEFINITION: The Compressed Skycube (CSC) consists of non-empty cuboids U, such that an object t is stored in a cuboid U if and only if U ∈ mss(t).
Cuboids of the CSC:
Cuboid            Skyline
u1                t7
u2                t6
u3                t6
u4                t5, t7, t4
u1, u2            t5, t9
u1, u3            t1, t5, t9
Minimum subspaces:
t1: {u1, u3}
t4: {u4}
t5: {u4}, {u1, u2}, {u1, u3}
t6: {u2}, {u3}
t7: {u1}, {u4}
t9: {u1, u2}, {u1, u3}
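Read this way, the CSC is an inverted index over the minimum subspaces: each object is filed under exactly the cuboids in mss(t). The Python sketch below builds the CSC of the running example from its minimum subspaces. The object-to-subspace assignment is reconstructed from the CSC cuboid listing on the slides, and `MSS` and `build_csc` are hypothetical names.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Subspace = Tuple[str, ...]

# Minimum subspaces of the running example (object -> mss).
MSS: Dict[str, List[Subspace]] = {
    "t1": [("u1", "u3")],
    "t4": [("u4",)],
    "t5": [("u4",), ("u1", "u2"), ("u1", "u3")],
    "t6": [("u2",), ("u3",)],
    "t7": [("u1",), ("u4",)],
    "t9": [("u1", "u2"), ("u1", "u3")],
}


def build_csc(mss: Dict[str, List[Subspace]]) -> Dict[Subspace, List[str]]:
    """Store object t in cuboid U if and only if U is in mss(t)."""
    csc: Dict[Subspace, List[str]] = defaultdict(list)
    for obj, subspaces in mss.items():
        for u in subspaces:
            csc[u].append(obj)
    # Only cuboids that received at least one object exist in the result.
    return {u: sorted(objs) for u, objs in csc.items()}


csc = build_csc(MSS)
for cuboid, objs in sorted(csc.items(), key=lambda kv: (len(kv[0]), kv[0])):
    print(cuboid, objs)
print(sum(len(objs) for objs in csc.values()))  # 11 stored object references
```

Counting the entries of the complete Skycube table of the same example gives 39 object references in 15 cuboids, against 11 references in 6 cuboids here.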

14 Querying CSC
Overview example: query space Uq = {u2, u3, u4}. Search within the cuboids which are the subsets of Uq. Compare the objects only within a candidate cuboid to filter out false positives.
[Figure: the CSC cuboids of the running example; the cuboids that are subsets of Uq are u2 (t6), u3 (t6) and u4 (t5, t7, t4).]

15 Querying CSC
LEMMA 1: Given a query space Uq and an object t, if for every subspace Ui in mss(t), Ui ⊈ Uq, then t is not in the skyline of Uq. Lemma 1 implies two important facts: only the existing cuboids that are subsets of Uq need to be searched, and no other cuboids need to be accessed or computed in the query process. Example: Uq = {u2, u3, u4}, and t9 can be safely pruned, because neither of its minimum subspaces {u1, u2} and {u1, u3} is a subset of Uq.

16 Querying CSC
LEMMA 2 (Local Comparison): To check a candidate t in a cuboid V ⊆ Uq, we only need to compare t with the objects within the same cuboid. Example: Uq = {u2, u3, u4}, and the skyline of Uq is {t5, t6}. No comparison is needed for t6, and t5, t7, t4 are only locally compared to each other.
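The query procedure of the last three slides can be sketched end to end in Python. The block below is an illustration under stated assumptions, not the authors' implementation: the data are invented (with ties, so that false positives actually occur), the CSC is built by brute force, and the false-positive filter compares all candidates with each other, which is a simpler and more conservative check than the local comparison of Lemma 2.

```python
from itertools import combinations
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, ...]
Subspace = Tuple[int, ...]


def dominates(a: Point, b: Point) -> bool:
    """No worse in every dimension and strictly better in at least one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def skyline(data: Dict[str, Point], dims: Sequence[int]) -> List[str]:
    """Skyline of the data projected onto dims."""
    proj = {n: tuple(p[d] for d in dims) for n, p in data.items()}
    return sorted(
        n for n, p in proj.items()
        if not any(dominates(q, p) for m, q in proj.items() if m != n)
    )


def build_csc(data: Dict[str, Point]) -> Dict[Subspace, List[str]]:
    """Brute-force CSC: keep t in cuboid U only if U is a minimum subspace of t."""
    d = len(next(iter(data.values())))
    subspaces = [u for k in range(1, d + 1) for u in combinations(range(d), k)]
    cube = {u: set(skyline(data, u)) for u in subspaces}
    csc: Dict[Subspace, List[str]] = {}
    for u, sky in cube.items():
        for t in sky:
            # Skip t if some proper subset of u already has t in its skyline.
            if not any(set(v) < set(u) and t in cube[v] for v in subspaces):
                csc.setdefault(u, []).append(t)
    return csc


def query_csc(csc: Dict[Subspace, List[str]], data: Dict[str, Point], uq: Subspace) -> List[str]:
    """Lemma 1: only cuboids that are subsets of uq can hold skyline objects of uq."""
    candidates = {t for u, objs in csc.items() if set(u) <= set(uq) for t in objs}
    # Filter false positives by comparing the candidates with each other on uq.
    return skyline({t: data[t] for t in candidates}, uq)


# Hypothetical data; t1 and t4 tie on dimension 0, t1 and t2 tie on dimension 2.
data = {"t1": (1, 5, 3), "t2": (2, 2, 3), "t3": (3, 1, 4), "t4": (1, 6, 5)}
csc = build_csc(data)
print(query_csc(csc, data, (0, 1)))  # ['t1', 't2', 't3']
```

For the query space (0, 1), object t4 is a candidate because it is stored in cuboid (0,), but it is dominated by t1 once dimension 1 is taken into account, so the filter removes it.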

17 Updating CSC
Intuitions: Not all updates of objects need to access the dataset. Not all updates of objects need to re-compute the skyline of a cuboid. These intuitions are supported by our theorems.
Notation: D: full space; sky(D): full-space skyline; t: object before the update; tnew: object after the update.
Case t ∉ sky(D): no dataset (disk) access.
  tnew ∉ sky(D): no cuboid computation; existing CSC objects are not changed.
  tnew ∈ sky(D): existing CSC objects may be removed or move to other cuboids.
Case t ∈ sky(D): may access the dataset (disk) to insert new skyline objects.
Considering the proportion of full-space skyline objects in the whole dataset, the t ∉ sky(D) cases cover most of the updates.
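The case analysis of this slide amounts to a small dispatch step that runs before any CSC maintenance. The Python sketch below shows only that dispatch, assuming the three cases read as (t ∉ sky(D), tnew ∉ sky(D)), (t ∉ sky(D), tnew ∈ sky(D)) and (t ∈ sky(D)). The enum, the function name and the sample data are hypothetical, and the maintenance work behind each case is not shown.

```python
from enum import Enum
from typing import Dict, Tuple

Point = Tuple[float, ...]


class UpdateCase(Enum):
    STAYS_OUTSIDE = "no disk access, no cuboid computation, existing CSC objects unchanged"
    ENTERS_SKYLINE = "no disk access, existing CSC objects may be removed or moved"
    LEAVES_OR_MOVES_IN_SKYLINE = "may need disk access to insert new skyline objects"


def dominates(a: Point, b: Point) -> bool:
    """No worse in every dimension and strictly better in at least one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def classify_update(name: str, t_new: Point, full_skyline: Dict[str, Point]) -> UpdateCase:
    """Decide the update case using only the full-space skyline sky(D)."""
    if name in full_skyline:
        # The old object was in sky(D): objects it used to dominate may surface.
        return UpdateCase.LEAVES_OR_MOVES_IN_SKYLINE
    # The old object was outside sky(D), so sky(D) itself is unaffected by its
    # removal; t_new joins the skyline iff no current skyline object dominates it.
    if any(dominates(s, t_new) for s in full_skyline.values()):
        return UpdateCase.STAYS_OUTSIDE
    return UpdateCase.ENTERS_SKYLINE


# Hypothetical full-space skyline of a 3-dimensional dataset; t4 is not in it.
sky_d = {"t1": (1, 4, 3), "t2": (2, 2, 2), "t3": (3, 1, 5)}

print(classify_update("t4", (5, 5, 5), sky_d).name)  # STAYS_OUTSIDE
print(classify_update("t4", (0, 9, 9), sky_d).name)  # ENTERS_SKYLINE
print(classify_update("t2", (9, 9, 9), sky_d).name)  # LEAVES_OR_MOVES_IN_SKYLINE
```

Only the last case can touch the dataset on disk, which matches the slide's observation that most updates are handled from the in-memory skyline alone.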

18 Updating CSC: t ∉ sky(D) and tnew ∉ sky(D)
Key points: Compare tnew with the existing full-space skyline objects (sky(D)). mss(tnew) is determined by any dominating object in sky(D).
[Figure: minimum subspaces and CSC cuboids of the running example.]

19 Updating CSC: t ∉ sky(D) and tnew ∈ sky(D)
Key points: Existing objects may be removed or move to super-set cuboids. Determining mss(tnew) is not intuitive in this case. A new recursion-based approach is proposed to avoid unnecessary computations.
[Figure: minimum subspaces and CSC cuboids of the running example after the update; the new object t10 is stored in cuboids u1 (t7, t10) and u3 (t6, t10).]

20 Outline
Background and Motivations; The Compressed Skycube; Experimental Results; Conclusions

21 Storage Comparison
Settings: dimensionality (full space) in [4, 8], default 6; cardinality in [100K, 500K], default 300K; distributions: independent, correlated, anti-correlated.
[Charts: storage comparison, plotted on a logarithmic scale.]

22 Query Performance
Queries on the complete Skycube do not involve computations, so their time is not reported. This set of experiments verifies that the query response of the CSC is indeed very fast.

23 Update Performance
General update: updates are from random objects in the whole dataset. The Skycube is re-computed from scratch.
Full-space skyline update: updates are from random full-space skyline objects. The Skycube is re-computed from existing skylines plus new candidates.

24 Outline
Background and Motivations; The Compressed Skycube; Experimental Results; Conclusions

25 Conclusions
We addressed update support for the Skycube in dynamic environments, and provided an efficient and scalable solution for online skyline query systems. We proposed a compact structure, the Compressed Skycube, which takes about 10% of the disk space of the complete Skycube and offers fast query response. We proposed an object-aware update scheme, such that different updates trigger different amounts of computation. Our CSC outperforms the Skycube in update by several orders of magnitude.

26 Thank you! Tian Xia and Donghui Zhang. Refreshing the Sky: the Compressed Skycube with Efficient Support for Frequent Updates. SIGMOD 2006. Questions?

