Efficient Computation of the Skyline Cube Yidong Yuan School of Computer Science & Engineering The University of New South Wales & NICTA Sydney, Australia.


1 Efficient Computation of the Skyline Cube
  Yidong Yuan, School of Computer Science & Engineering, The University of New South Wales & NICTA, Sydney, Australia
  Joint work: Xuemin Lin (UNSW), Qing Liu (UNSW), Wei Wang (UNSW), Jeffrey Xu Yu (CUHK), Qing Zhang (UNSW & CSIRO)

2 Outline (VLDB 2005, Yidong Yuan @DBG.UNSW)
  -> Introduction
  Skycube Computation Techniques
  Experiments
  Summary

3 Skyline Query
  A real estate example: the skyline returns the data points not dominated by others.
  Properties and values:
          price (100K)  dist  age  …
      P1  3             3     5    …
      P2  5             1     1    …
      P3  1             4     4    …
      P4  4             5     2    …
      P5  2             2     3    …
  Dominance: $(x_1, x_2, \dots, x_d)$ dominates $(y_1, y_2, \dots, y_d)$ iff $\forall i,\ x_i \le y_i$ and $\exists k,\ x_k < y_k$.
  [Figures: skyline on price & dist; skyline on price & age; each plots P1 to P5]
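
The dominance test and the two skylines on this slide can be sketched as follows. This is a minimal illustration, not the authors' code; the names `POINTS`, `dominates`, and `skyline` are chosen here, and smaller values are assumed to be better on every dimension.

```python
# Table from the slide: (price in 100K, dist, age); smaller is assumed better.
POINTS = {
    "P1": (3, 3, 5),
    "P2": (5, 1, 1),
    "P3": (1, 4, 4),
    "P4": (4, 5, 2),
    "P5": (2, 2, 3),
}
PRICE, DIST, AGE = 0, 1, 2


def dominates(x, y, dims):
    """x dominates y on dims: no worse everywhere, strictly better somewhere."""
    return all(x[i] <= y[i] for i in dims) and any(x[i] < y[i] for i in dims)


def skyline(points, dims):
    """Names of the points that no other point dominates on dims."""
    return sorted(
        name
        for name, p in points.items()
        if not any(dominates(q, p, dims) for q in points.values())
    )


print(skyline(POINTS, (PRICE, DIST)))  # ['P2', 'P3', 'P5']
print(skyline(POINTS, (PRICE, AGE)))   # ['P2', 'P3', 'P4', 'P5']
```

A point never dominates itself here, because the strict inequality fails, so no special case is needed for comparing a point with itself.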

4 Skyline Cube (Skycube)
  Skycube: a union of the skyline results of all the non-empty subsets of the d-dimensional set ($2^d - 1$ subsets), e.g. skyline on price & dist & age, skyline on price & dist, skyline on price & age, ……
  Dataset:
          A  B  C
      P1  3  3  5
      P2  5  1  1
      P3  1  4  4
      P4  4  5  2
      P5  2  2  3
  Skycube example:
      cuboid  skyline
      ABC     {P2, P3, P4, P5}
      AB      {P2, P3, P5}
      AC      {P2, P3, P4, P5}
      BC      {P2}
      A       {P3}
      B       {P2}
      C       {P2}
  Lattice structure of a Skycube: ABC; AB, AC, BC; A, B, C
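
As a baseline, the Skycube of the example dataset can be computed by running an independent skyline query on each of the $2^d - 1$ cuboids. The sketch below is that naive approach with no sharing between cuboids; the helper names are chosen here for illustration.

```python
from itertools import combinations

DIMS = "ABC"
POINTS = {
    "P1": (3, 3, 5),
    "P2": (5, 1, 1),
    "P3": (1, 4, 4),
    "P4": (4, 5, 2),
    "P5": (2, 2, 3),
}


def dominates(x, y, idx):
    return all(x[i] <= y[i] for i in idx) and any(x[i] < y[i] for i in idx)


def skyline(points, idx):
    return sorted(
        name
        for name, p in points.items()
        if not any(dominates(q, p, idx) for q in points.values())
    )


def naive_skycube(points, dims=DIMS):
    """One independent skyline per non-empty subset of dims."""
    cube = {}
    for k in range(1, len(dims) + 1):
        for idx in combinations(range(len(dims)), k):
            cube["".join(dims[i] for i in idx)] = skyline(points, idx)
    return cube


for cuboid, result in naive_skycube(POINTS).items():
    print(cuboid, result)
```

Every cuboid rescans the whole dataset, which is the redundancy the sharing strategies in the later slides aim to remove.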

5 Motivation
  How to compute the Skycube efficiently?
  Existing skyline techniques are applicable, but with no sharing of computation => Not efficient!

6 Motivation (cont.)
  Nested-loop-based algorithms
    BNL [ICDE 01]: redundant comparisons => Not efficient!
    SFS [ICDE 03]: presorts the dataset to keep the candidate list minimal; repeated sorting => Not efficient!
  Example on dimensions A and B (points P1 to P5):
      point  Candidate List  Comparison of Skyline on A  Comparison of Skyline on A and B
      P1     ∅               --
      P2     P1              P1(A) vs. P2(A)             P1(B) vs. P2(B)
      ……

7 Motivation (cont.)
  Divide-and-conquer-based algorithm (DC [ICDE 01])
    repeats the same divide/merge steps => Not efficient!
  [Figures: divide step of skyline on A and B; divide step of skyline on A, B, and C; both split P1 to P5 on A at m_A, m'_A, and m''_A]

8 Outline
  Introduction
  -> Skycube Computation Techniques
       Bottom-Up Skycube Algorithm (BUS)
       Top-Down Skycube Algorithm (TDS)
  Experiments
  Summary

9 Property of Skycube
  Distinct value condition: no two data points have the same value on the same dimension.
    $SKY_U(S)$: skyline of S on the sub-dimension set U
    $SKY_U(S) \subseteq SKY_V(S)$ whenever $U \subseteq V$
  General case: keep track of the "bad guys".
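
Why the containment holds under the distinct value condition can be sketched with a short argument. This is a reconstruction of the standard reasoning, not text from the slides; U is assumed non-empty.

```latex
\begin{aligned}
&\text{Let } U \subseteq V,\ p \in SKY_U(S), \text{ and suppose some } q \in S,\ q \ne p, \text{ dominates } p \text{ on } V.\\
&\Rightarrow\ \forall i \in V:\ q_i \le p_i \ \Rightarrow\ \forall i \in U:\ q_i \le p_i\\
&\Rightarrow\ \forall i \in U:\ q_i < p_i \quad (\text{distinct values: } q_i \ne p_i)\\
&\Rightarrow\ q \text{ dominates } p \text{ on } U, \text{ contradicting } p \in SKY_U(S).\\
&\text{Hence } p \in SKY_V(S), \text{ i.e. } SKY_U(S) \subseteq SKY_V(S).
\end{aligned}
```

The third step is the only place the distinct value condition is used, which is why the containment can fail when duplicate values are allowed.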

10 Basic Idea (Bottom-Up Skycube)
  Compute the Skycube in a level-wise, bottom-up manner.
  Each skyline is computed by a nested-loop-based algorithm.
  [Figure: Skycube lattice ABC; AB, AC, BC; A, B, C]

11 Sharing Strategies
  share-results: $SKY_U(S) \subseteq SKY_V(S)$
    reduces the size of the input
    reduces the number of dominance tests
  share-sorting: sort the dataset on each dimension
    keeps the candidate list minimal
    reduces the number of sorts from $2^d - 1$ to $d$
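
One way to read share-results is sketched below: cuboids are visited level by level, and the skyline points of the child cuboids are copied into the parent's result without any dominance test. This is an illustrative simplification that assumes the distinct value condition; it does not implement share-sorting, and the function names are chosen here.

```python
from itertools import combinations

POINTS = {
    "P1": (3, 3, 5),
    "P2": (5, 1, 1),
    "P3": (1, 4, 4),
    "P4": (4, 5, 2),
    "P5": (2, 2, 3),
}


def dominates(x, y, idx):
    return all(x[i] <= y[i] for i in idx) and any(x[i] < y[i] for i in idx)


def bottom_up_skycube(points, d):
    """Level-wise Skycube; cuboids are keyed by tuples of dimension indexes.

    Assumes distinct values on every dimension, so that child skylines
    are contained in the parent skyline.
    """
    cube = {}
    for k in range(1, d + 1):
        for v in combinations(range(d), k):
            # share-results: children's skyline points need no test on v.
            seed = set()
            for u in combinations(v, k - 1):
                if u:
                    seed |= cube[u]
            result = set(seed)
            for name, p in points.items():
                if name in seed:
                    continue
                if not any(dominates(q, p, v) for q in points.values()):
                    result.add(name)
            cube[v] = result
    return cube


cube = bottom_up_skycube(POINTS, 3)
print(sorted(cube[(0, 1)]))     # ['P2', 'P3', 'P5']
print(sorted(cube[(0, 1, 2)]))  # ['P2', 'P3', 'P4', 'P5']
```

On the example, the top cuboid ABC only has to test P1, since the other four points already appear in some child skyline.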

12 Filtering
  Effective dominance test
    filter function: $f(p)$ = sum of p's coordinates
    no false negatives: $f(p) \le f(q) \Rightarrow$ q does not dominate p
    maintain the candidate list in non-decreasing order of filter values (e.g. in an AVL tree)
  Example: skyline on A and B, dataset sorted on B
      point      P2  P5  P1  P3  P4
      f_AB(p)    6   4   6   5   9
  Processing P5 with candidate list {P2}:
      comparison (without filter): P2(A) vs. P5(A), P2(B) vs. P5(B)
      comparison (with filter): f_AB(P2) vs. f_AB(P5)
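
The filter can be sketched with a sorted candidate list: a candidate q whose coordinate sum is not smaller than that of p cannot dominate p, so only the prefix of candidates with a smaller sum needs a full dominance test. The sketch below uses `bisect` on plain lists instead of an AVL tree and assumes distinct values on the sort dimension; the names are illustrative.

```python
import bisect

POINTS = {
    "P1": (3, 3, 5),
    "P2": (5, 1, 1),
    "P3": (1, 4, 4),
    "P4": (4, 5, 2),
    "P5": (2, 2, 3),
}


def dominates(x, y, dims):
    return all(x[i] <= y[i] for i in dims) and any(x[i] < y[i] for i in dims)


def filtered_skyline(points, dims, sort_dim):
    """Skyline on dims, scanning points in ascending order of sort_dim.

    Assumes sort_dim is in dims and its values are distinct, so a later
    point can never dominate an earlier one and candidates are final.
    """
    order = sorted(points, key=lambda n: points[n][sort_dim])
    keys, names = [], []  # parallel lists, sorted by filter value
    for name in order:
        p = points[name]
        fp = sum(p[i] for i in dims)
        # Only candidates with a strictly smaller sum can dominate p.
        cut = bisect.bisect_left(keys, fp)
        if any(dominates(points[c], p, dims) for c in names[:cut]):
            continue
        pos = bisect.bisect_right(keys, fp)
        keys.insert(pos, fp)
        names.insert(pos, name)
    return sorted(names)


print(filtered_skyline(POINTS, (0, 1), sort_dim=1))  # ['P2', 'P3', 'P5']
```

With the slide's data, P5 (sum 4) is accepted without any coordinate comparison against P2 (sum 6), which is the saving shown in the "with filter" column.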

13 DC Algorithm
  [Figures: points P1 to P5 on dimensions A and B. Divide step: split at m_A into S1 and S2, then at m_B into S11, S12, S21, S22. Merge step.]
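
A simplified divide-and-conquer skyline is sketched below. It splits at the median of the first dimension and merges by filtering the right half against the left half's skyline. The original DC algorithm merges recursively on further dimensions (the m_B split in the figure), which this sketch omits, and it assumes distinct values on the split dimension.

```python
POINTS = {
    "P1": (3, 3, 5),
    "P2": (5, 1, 1),
    "P3": (1, 4, 4),
    "P4": (4, 5, 2),
    "P5": (2, 2, 3),
}


def dominates(x, y, dims):
    return all(x[i] <= y[i] for i in dims) and any(x[i] < y[i] for i in dims)


def dc_skyline(names, points, dims):
    """Skyline of names on dims by splitting on dims[0] at the median."""
    if len(names) <= 1:
        return list(names)
    ordered = sorted(names, key=lambda n: points[n][dims[0]])
    mid = len(ordered) // 2
    s1 = dc_skyline(ordered[:mid], points, dims)  # smaller dims[0] values
    s2 = dc_skyline(ordered[mid:], points, dims)  # larger dims[0] values
    # Merge: with distinct split values nothing in s2 can dominate s1,
    # so only s2 has to be checked against s1.
    survivors = [
        n for n in s2
        if not any(dominates(points[m], points[n], dims) for m in s1)
    ]
    return s1 + survivors


print(sorted(dc_skyline(list(POINTS), POINTS, (0, 1))))     # ['P2', 'P3', 'P5']
print(sorted(dc_skyline(list(POINTS), POINTS, (0, 1, 2))))  # ['P2', 'P3', 'P4', 'P5']
```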

14 Sharing Opportunities
  share-partitioning
  [Figures: skyline on A and B, and skyline on A, B, and C; in both, P1 to P5 are split on A at m_A into S1 and S2, with further split points m_i, m_j, ……]

15 Sharing Opportunities (cont.)
  share-merging
    Skyline on A and B: merge {P3, P5} with {P1, P2} on B.
    Skyline on A, B, and C: merge {P3, P5} with {P1, P2, P4} on BC.
    Decompose the merge step: {P3, P5} vs. {{P1, P2}, {P4}} on BC
      {P3, P5} vs. {P1, P2} on BC: {P3, P5} vs. {P1, P2} on B; {P3, P5} vs. the above result on C
      {P3, P5} vs. {P4} on BC
  [Figures: P1 to P5 split on A at m_A into S1 and S2, for dimensions A, B and for A, BC]

16 TDS Algorithm
  Basic idea
    compute the skylines on a path simultaneously
    find a minimal set of paths
    share-parent: use the parent's skyline result as the input
  [Figures: Skycube lattice ABC; AB, AC, BC; A, B, C, with a path ABC, AB, A; S -> SKY_ABC(S)]
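
The share-parent idea can be sketched as follows: each cuboid is computed over the skyline of one of its parents rather than over the full dataset. This sketch assumes the distinct value condition, picks an arbitrary parent for each cuboid, and does not implement the path-based simultaneous computation mentioned on the slide; all names are illustrative.

```python
from itertools import combinations

POINTS = {
    "P1": (3, 3, 5),
    "P2": (5, 1, 1),
    "P3": (1, 4, 4),
    "P4": (4, 5, 2),
    "P5": (2, 2, 3),
}


def dominates(x, y, dims):
    return all(x[i] <= y[i] for i in dims) and any(x[i] < y[i] for i in dims)


def skyline_of(names, points, dims):
    """Skyline on dims restricted to the given input names."""
    names = list(names)
    return sorted(
        n for n in names
        if not any(dominates(points[m], points[n], dims) for m in names)
    )


def top_down_skycube(points, d):
    """Top-down Skycube; each cuboid reads one parent's skyline as input."""
    full = tuple(range(d))
    cube = {full: skyline_of(points, points, full)}
    for k in range(d - 1, 0, -1):
        for u in combinations(range(d), k):
            extra = next(i for i in range(d) if i not in u)
            parent = tuple(sorted(u + (extra,)))
            cube[u] = skyline_of(cube[parent], points, u)
    return cube


cube = top_down_skycube(POINTS, 3)
print(cube[(0, 1)])  # ['P2', 'P3', 'P5']
print(cube[(0,)])    # ['P3']
```

The input shrinks as the traversal moves down: AB is computed from the four points of SKY_ABC(S) and A from the three points of SKY_AB(S).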

17 Outline
  Introduction
  Skycube Computation Techniques
  -> Experiments
  Summary

18 Experiment Setting
  Algorithms (* our sharing strategies applied)
    BNLS: BNL-Skycube algorithm *
    SFSS: SFS-Skycube algorithm *
    DCS: DC-Skycube algorithm *
    BUS: Bottom-Up Skycube algorithm
    TDS: Top-Down Skycube algorithm
  Dataset: correlated, independent, anti-correlated
  Dimensionality: $d \in [4, 10]$
  Cardinality: $n \in [100\text{k}, 500\text{k}]$

19 Effect of Dimensionality
  [Plot: independent data, varying dimensionality, n = 500k]

20 Effect of Dimensionality (cont.)
  [Plots: correlated and anti-correlated data, varying dimensionality, n = 500k]

21 Effect of Cardinality
  [Plot: anti-correlated data, varying cardinality (x100K), d = 8]

22 Effect of Duplicate Values
  [Plot: independent data, d = 8]

23 Outline
  Introduction
  Skycube Computation Techniques
  Experiments
  -> Summary

24 Summary
  A novel concept: Skycube
  Skycube computation techniques
    Bottom-Up Skycube algorithm: share-results, share-sorting
    Top-Down Skycube algorithm: share-partition-and-merging, share-parent
  Future work
    I/O-based techniques
    multiple skyline queries

25 Q&A
  Thank you.

26 Preliminaries: Existing Skyline Computation Algorithms
  nested-loop-based
    Block-Nested-Loop (BNL) algorithm [BKS, ICDE 01]
    Sort-Filter-Skyline (SFS) algorithm [CGG+, ICDE 03]
  divide-and-conquer-based
    Divide-and-Conquer (DC) algorithm [BKS, ICDE 01]
  index-based
    Bitmap, Index-Method [TEO, VLDB 01]
    R-tree index based [KRR, VLDB 02; PTF+, SIGMOD 03]

27 Preliminaries: BNL and SFS Algorithms
  BNL algorithm (example: points P1 to P5 on dimensions A and B)
      Current  Cand. List  Results
      P1       ∅           P1
      P2       P1          P1, P2
      P3                   P1, P2, P3
      P4
      P5                   P2, P3, P5
  SFS algorithm
    entropy value (an indicator of dominance power)
    pre-sort the dataset (e.g., {P5, P2, P3, P1, P4})
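
The BNL trace can be reproduced with a small in-memory sketch. It keeps a single candidate window and ignores the block and temporary-file handling of the actual BNL algorithm; the A and B values are assumed to be those of the running example (price and dist), which yields the final result shown on the slide.

```python
# (A, B) values taken from the running example; an assumption for this slide.
POINTS = {
    "P1": (3, 3),
    "P2": (5, 1),
    "P3": (1, 4),
    "P4": (4, 5),
    "P5": (2, 2),
}


def dominates(x, y, dims):
    return all(x[i] <= y[i] for i in dims) and any(x[i] < y[i] for i in dims)


def bnl_skyline(order, points, dims):
    """Nested-loop skyline with one in-memory candidate window."""
    window = []
    for name in order:
        p = points[name]
        # Discard the incoming point if a current candidate dominates it.
        if any(dominates(points[c], p, dims) for c in window):
            continue
        # Otherwise evict the candidates it dominates, then add it.
        window = [c for c in window if not dominates(p, points[c], dims)]
        window.append(name)
        print(name, "->", window)
    return window


result = bnl_skyline(["P1", "P2", "P3", "P4", "P5"], POINTS, (0, 1))
print(result)  # ['P2', 'P3', 'P5']
```

P1 enters the window first and is only evicted when P5 arrives, which illustrates why a presort such as the one SFS uses helps keep the candidate list small.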

28 Preliminaries: DC Algorithm
  [Figures: points P1 to P5 on dimensions A and B. Divide step: split at m_A into S1 and S2, then at m_B into S11, S12, S21, S22. Merge step.]

29 General Case
  Issue: $SKY_U(S) \subseteq SKY_V(S)$ does not necessarily hold.
  Solution: share-results, re-examining $SKY_U(S)$ on V.
  Example: $SKY_B(S)$ = {P3, P4, P5}, $SKY_{AB}(S)$ = {P3}
  [Figure: points P1 to P5 on dimensions A and B]
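
The re-examination step can be illustrated with duplicate values on B. The coordinates below are hypothetical, chosen only so that the two skylines match the sets on the slide; the figure's actual values are not in the transcript. The sketch relies on one observation: a point in SKY_U(S) can only be dominated on V by a point that ties with it on every dimension of U, and such a point is itself in SKY_U(S), so comparing SKY_U(S) against itself on V is enough to decide which of its members stay.

```python
# Hypothetical (A, B) values with ties on B; not taken from the slides.
POINTS = {
    "P1": (4, 2),
    "P2": (5, 3),
    "P3": (1, 1),
    "P4": (3, 1),
    "P5": (2, 1),
}
A, B = 0, 1


def dominates(x, y, dims):
    return all(x[i] <= y[i] for i in dims) and any(x[i] < y[i] for i in dims)


def skyline_of(names, points, dims):
    """Skyline on dims restricted to the given input names."""
    names = list(names)
    return sorted(
        n for n in names
        if not any(dominates(points[m], points[n], dims) for m in names)
    )


sky_b = skyline_of(POINTS, POINTS, (B,))
# Re-examine the child skyline on the parent cuboid AB.
kept = skyline_of(sky_b, POINTS, (A, B))
sky_ab = skyline_of(POINTS, POINTS, (A, B))

print(sky_b)   # ['P3', 'P4', 'P5']
print(kept)    # ['P3']
print(sky_ab)  # ['P3']
```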

30 Motivation (cont.)
  Other techniques: Index method [VLDB 01], R-tree based index [VLDB 02; SIGMOD 03]
    pre-computation (e.g. index) is not reusable
    repeated pre-computation => Not efficient!
  Goal: maximize the sharing of computation!

