Published by Olivia Stanley. Modified over 9 years ago.
1
Efficient Computation of the Skyline Cube
Yidong Yuan
School of Computer Science & Engineering, The University of New South Wales & NICTA, Sydney, Australia
Joint work with: Xuemin Lin (UNSW), Qing Liu (UNSW), Wei Wang (UNSW), Jeffrey Xu Yu (CUHK), Qing Zhang (UNSW & CSIRO)
2
VLDB 2005, Yidong Yuan @ DBG.UNSW
Outline
- Introduction
- Skycube Computation Techniques
- Experiments
- Summary
3
Skyline Query
A real estate example: the skyline returns the data points that are not dominated by any other point.

Properties and Values
      price (100K)  dist  age  …
P1    3             3     5    …
P2    5             1     1    …
P3    1             4     4    …
P4    4             5     2    …
P5    2             2     3    …

Dominance: $(x_1, x_2, \dots, x_d)$ dominates $(y_1, y_2, \dots, y_d)$ if $\forall i,\ x_i \le y_i$ and $\exists k,\ x_k < y_k$.

[Figure: two scatter plots of P1 to P5, "Skyline on price & dist" (axes price, dist) and "Skyline on price & age" (axes price, age)]
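The dominance definition on this slide can be checked directly against the five example points. The following is a minimal sketch, not code from the talk: the names `POINTS`, `DIMS`, `dominates`, and `skyline` are hypothetical, and smaller values are assumed to be better on every property.

```python
# Example data from the slide: (price in 100K, dist, age).
POINTS = {
    "P1": (3, 3, 5),
    "P2": (5, 1, 1),
    "P3": (1, 4, 4),
    "P4": (4, 5, 2),
    "P5": (2, 2, 3),
}
DIMS = {"price": 0, "dist": 1, "age": 2}


def dominates(x, y, dims):
    """x dominates y: no worse on every dimension, strictly better on one."""
    return all(x[i] <= y[i] for i in dims) and any(x[i] < y[i] for i in dims)


def skyline(points, dim_names):
    """Names of the points that no other point dominates on dim_names."""
    dims = [DIMS[name] for name in dim_names]
    return sorted(
        name
        for name, p in points.items()
        if not any(dominates(q, p, dims) for q in points.values())
    )


print(skyline(POINTS, ["price", "dist"]))  # ['P2', 'P3', 'P5']
print(skyline(POINTS, ["price", "age"]))   # ['P2', 'P3', 'P4', 'P5']
```

P1 drops out of both skylines because P5 is better on all three properties.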
4
Skyline Cube (Skycube)
- Skyline on price & dist & age, skyline on price & dist, skyline on price & age, ……
- The union of the skyline results of all the non-empty subsets of the d-dimensional set ($2^d - 1$ subspaces)

Dataset
      A  B  C
P1    3  3  5
P2    5  1  1
P3    1  4  4
P4    4  5  2
P5    2  2  3

Skycube Example
subspace  skyline
ABC       {P2, P3, P4, P5}
AB        {P2, P3, P5}
AC        {P2, P3, P4, P5}
BC        {P2}
A         {P3}
B         {P2}
C         {P2}

Lattice Structure of a Skycube
      ABC
  AB  AC  BC
  A   B   C
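As a baseline, the skycube of the example dataset can be built by running an independent skyline query on each of the $2^d - 1$ subspaces. This is a sketch of that naive approach with no sharing, not the algorithm proposed in the talk; `DATA`, `dominates`, `skyline`, and `skycube` are hypothetical names.

```python
from itertools import combinations

# Dataset from the slide; columns are A, B, C.
DATA = {
    "P1": (3, 3, 5),
    "P2": (5, 1, 1),
    "P3": (1, 4, 4),
    "P4": (4, 5, 2),
    "P5": (2, 2, 3),
}


def dominates(x, y, dims):
    return all(x[i] <= y[i] for i in dims) and any(x[i] < y[i] for i in dims)


def skyline(data, dims):
    return sorted(
        name
        for name, p in data.items()
        if not any(dominates(q, p, dims) for q in data.values())
    )


def skycube(data, d):
    """One independent skyline per non-empty subset of the d dimensions."""
    cube = {}
    for k in range(1, d + 1):
        for dims in combinations(range(d), k):
            cube[dims] = skyline(data, dims)
    return cube


cube = skycube(DATA, 3)
for dims, result in cube.items():
    print("".join("ABC"[i] for i in dims), result)
```

Every subspace rescans the whole dataset here, which is the redundancy the sharing strategies later in the deck are meant to remove.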
5
Motivation
How to compute the Skycube efficiently?
- Existing skyline techniques are applicable
- but no computation is shared: not efficient!
6
Motivation (cont.)
Nested-loop-based algorithms
- BNL [ICDE 01]: redundant comparisons. Not efficient!
- SFS [ICDE 03]: presorts the dataset and keeps the candidate list minimal, but repeats the sorting. Not efficient!

Point  Candidate List  Comparison of Skyline on A  Comparison of Skyline on A and B
P1     --
P2     P1              P1(A) vs. P2(A)             P1(B) vs. P2(B)
……

[Figure: P1 to P5 plotted on axes A and B]
7
Motivation (cont.)
Divide-and-conquer-based algorithm (DC [ICDE 01])
- repeats the same divide/merge steps. Not efficient!

[Figure: "Divide Step of Skyline on A and B" and "Divide Step of Skyline on A, B, and C"; both split P1 to P5 along A at the same medians m_A, m'_A, and m''_A]
8
Outline
- Introduction
- Skycube Computation Techniques
  - Bottom-Up Skycube Algorithm (BUS)
  - Top-Down Skycube Algorithm (TDS)
- Experiments
- Summary
9
Property of Skycube
Distinct Value Condition: no two data points have the same value on the same dimension
- $SKY_U(S)$: skyline on the sub-dimension set $U$
- $SKY_U(S) \subseteq SKY_V(S)$ if $U \subseteq V$
General Case
- Keep track of the "bad guys"
10
Basic Idea
- compute the Skycube in a level-wise, bottom-up manner
- each skyline is computed by a nested-loop-based algorithm

[Figure: skycube lattice ABC / AB, AC, BC / A, B, C]
11
Sharing Strategies
- share-results: $SKY_U(S) \subseteq SKY_V(S)$
  - reduces the size of the input
  - reduces the number of dominance tests
- share-sorting: sort the dataset on each dimension
  - keeps the candidate list minimal
  - reduces the number of sorts from $2^d - 1$ to $d$
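The share-results strategy can be illustrated with a level-wise loop that seeds each subspace with the skylines of its child subspaces. This is a simplified sketch rather than the BUS algorithm itself: it assumes the distinct value condition holds (it does for the example dataset), it omits share-sorting, and `bottom_up_skycube` and the other names are hypothetical.

```python
from itertools import combinations

# Dataset from the Skycube example slide; columns are A, B, C.
DATA = {
    "P1": (3, 3, 5),
    "P2": (5, 1, 1),
    "P3": (1, 4, 4),
    "P4": (4, 5, 2),
    "P5": (2, 2, 3),
}


def dominates(x, y, dims):
    return all(x[i] <= y[i] for i in dims) and any(x[i] < y[i] for i in dims)


def bottom_up_skycube(data, d):
    """Level-wise skycube; assumes distinct values on every dimension."""
    cube = {}
    for k in range(1, d + 1):
        for dims in combinations(range(d), k):
            # share-results: child skylines are already part of this skyline,
            # so their points skip the dominance test entirely.
            seed = set()
            for child in combinations(dims, k - 1):
                if child:  # skip the empty subset at level 1
                    seed |= cube[child]
            window = set(seed)
            for name, p in data.items():
                if name in seed:
                    continue
                if any(dominates(data[q], p, dims) for q in window):
                    continue
                # Drop non-seed candidates that the new point dominates.
                window = {
                    q for q in window
                    if q in seed or not dominates(p, data[q], dims)
                }
                window.add(name)
            cube[dims] = window
    return cube


cube = bottom_up_skycube(DATA, 3)
print(sorted(cube[(0, 1)]))     # skyline on A and B
print(sorted(cube[(0, 1, 2)]))  # skyline on A, B, and C
```

For the subspace ABC the seed already holds P2, P3, P4, and P5, so only P1 still needs a dominance test.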
12
Filtering: Effective Dominance Test
- filter function: $f(p)$ = sum of p's coordinates
- no false negatives: $f(p) \le f(q) \Rightarrow$ q does not dominate p
- maintain the candidate list in non-decreasing order of filter values (e.g., in an AVL tree)

Skyline on A and B
Sort on B:          P2  P5  P1  P3  P4
$f_{AB}(p)$:        6   4   6   5   9

Point  Candidate List  Comparison (without filter)       Comparison (with filter)
P5     P2              P2(A) vs. P5(A), P2(B) vs. P5(B)  $f_{AB}(P_2)$ vs. $f_{AB}(P_5)$

[Figure: P1 to P5 plotted on axes A and B]
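The filter idea can be sketched with a candidate list kept sorted by coordinate sum, so that a full dominance test is only run against candidates whose filter value is strictly smaller. This is an illustration under assumptions: a sorted Python list with `bisect` stands in for the AVL tree mentioned on the slide, points are read in dictionary order rather than sorted on a dimension, and `skyline_with_filter` is a hypothetical name.

```python
import bisect

# Dataset from the Skycube example slide; columns are A, B, C.
DATA = {
    "P1": (3, 3, 5),
    "P2": (5, 1, 1),
    "P3": (1, 4, 4),
    "P4": (4, 5, 2),
    "P5": (2, 2, 3),
}


def dominates(x, y, dims):
    return all(x[i] <= y[i] for i in dims) and any(x[i] < y[i] for i in dims)


def skyline_with_filter(data, dims):
    """Nested-loop skyline that prunes dominance tests by coordinate sum."""
    cand = []  # (filter value, name), kept in non-decreasing filter order
    for name, p in data.items():
        fp = sum(p[i] for i in dims)
        # Index of the first candidate whose filter value is >= fp.
        cut = bisect.bisect_left(cand, (fp, ""))
        # Only candidates with a smaller filter value can dominate p.
        if any(dominates(data[q], p, dims) for _, q in cand[:cut]):
            continue
        # Only candidates with a larger filter value can be dominated by p;
        # candidates with an equal value pass the test trivially.
        kept = cand[:cut] + [
            (fq, q) for fq, q in cand[cut:] if not dominates(p, data[q], dims)
        ]
        bisect.insort(kept, (fp, name))
        cand = kept
    return sorted(name for _, name in cand)


print(skyline_with_filter(DATA, (0, 1)))  # ['P2', 'P3', 'P5']
```

The pruning is safe because a dominating point is no larger on every coordinate and smaller on at least one, so its coordinate sum is strictly smaller.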
13
DC Algorithm
[Figure: P1 to P5 plotted on axes A and B. Divide Step: the median m_A splits the points into S1 and S2. Merge Step: the median m_B further splits them into S11, S12, S21, and S22.]
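The divide and merge steps can be sketched in a few lines. This is a simplified variant, not the DC algorithm as published: it splits by position in a lexicographically sorted list instead of by the medians m_A and m_B, and its merge step is a plain nested loop. The names `dc_skyline` and `skyline_dc` are hypothetical.

```python
# Dataset from the Skycube example slide; columns are A, B, C.
DATA = {
    "P1": (3, 3, 5),
    "P2": (5, 1, 1),
    "P3": (1, 4, 4),
    "P4": (4, 5, 2),
    "P5": (2, 2, 3),
}


def dominates(x, y, dims):
    return all(x[i] <= y[i] for i in dims) and any(x[i] < y[i] for i in dims)


def dc_skyline(items, dims):
    """items: (name, point) pairs sorted lexicographically on dims."""
    if len(items) <= 1:
        return list(items)
    mid = len(items) // 2
    # Divide: solve the two halves independently.
    left = dc_skyline(items[:mid], dims)
    right = dc_skyline(items[mid:], dims)
    # Merge: a point that sorts later can never dominate one that sorts
    # earlier, so only the right half has to be checked against the left.
    survivors = [
        (n, p) for n, p in right
        if not any(dominates(q, p, dims) for _, q in left)
    ]
    return left + survivors


def skyline_dc(data, dims):
    items = sorted(data.items(), key=lambda item: tuple(item[1][i] for i in dims))
    return sorted(name for name, _ in dc_skyline(items, dims))


print(skyline_dc(DATA, (0, 1)))     # ['P2', 'P3', 'P5']
print(skyline_dc(DATA, (0, 1, 2)))  # ['P2', 'P3', 'P4', 'P5']
```

Sorting on the projected coordinates is what makes the one-directional merge valid: if q dominates p, then q precedes p in lexicographic order.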
14
Sharing Opportunities
share-partitioning
[Figure: the skyline on A and B and the skyline on A, B, and C use the same partition of P1 to P5 along A, split at the median m_A into S1 and S2; further split points are marked …, m_i, m_j, …]
15
Sharing Opportunities (cont.)
share-merging
[Figure: merge steps for the skyline on A and B and for the skyline on A, B, and C, with P1 to P5 partitioned at m_A into S1 and S2. For A and B, {P3, P5} is merged with {P1, P2} on B. For A, B, and C, the merge of {P3, P5} with {P1, P2, P4} on BC is decomposed into a merge with {P1, P2} on BC and a merge with {P4} on BC; the merge with {P1, P2} on BC is decomposed again into the merge on B, followed by a merge of {P3, P5} with that result on C.]
16
TDS Algorithm
Basic Idea
- compute the skylines on a path simultaneously
- find a minimal set of paths
- share-parent: use the parent's skyline result as the input

[Figure: skycube lattice ABC / AB, AC, BC / A, B, C covered by paths, e.g. ABC, AB, A; the input S yields SKY_ABC(S), which becomes the input for the next subspace on the path]
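The share-parent strategy can be illustrated by walking one path of the lattice from the top and feeding each skyline into the next, smaller subspace. This is only a sketch: it processes the path sequentially with a nested loop rather than simultaneously with divide and conquer as TDS does, it assumes the distinct value condition, and `skylines_on_path` and `skyline_of` are hypothetical names.

```python
# Dataset from the Skycube example slide; columns are A, B, C.
DATA = {
    "P1": (3, 3, 5),
    "P2": (5, 1, 1),
    "P3": (1, 4, 4),
    "P4": (4, 5, 2),
    "P5": (2, 2, 3),
}


def dominates(x, y, dims):
    return all(x[i] <= y[i] for i in dims) and any(x[i] < y[i] for i in dims)


def skyline_of(names, data, dims):
    """Skyline on dims, restricted to the points listed in names."""
    return {
        n for n in names
        if not any(dominates(data[m], data[n], dims) for m in names)
    }


def skylines_on_path(data, path):
    """path: subspaces from the largest down, each a subset of the previous."""
    results = {}
    current = set(data)
    for dims in path:
        # share-parent: only the parent's skyline points are examined.
        current = skyline_of(current, data, dims)
        results[dims] = current
    return results


path = [(0, 1, 2), (0, 1), (0,)]  # ABC -> AB -> A
for dims, result in skylines_on_path(DATA, path).items():
    print("".join("ABC"[i] for i in dims), sorted(result))
```

Under the distinct value condition every skyline point of a child subspace is also a skyline point of its parent, which is why restricting the input to the parent's result loses nothing.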
17
Outline
- Introduction
- Skycube Computation Techniques
- Experiments
- Summary
18
Experiment Setting
Algorithms (* our sharing strategies applied)
- BNLS: BNL-Skycube algorithm *
- SFSS: SFS-Skycube algorithm *
- DCS: DC-Skycube algorithm *
- BUS: Bottom-Up Skycube algorithm
- TDS: Top-Down Skycube algorithm
Dataset: correlated, independent, anti-correlated
Dimensionality: $d \in [4, 10]$
Cardinality: $n \in [100\text{k}, 500\text{k}]$
19
Effect of Dimensionality
[Figure: independent dataset; x-axis: Dimensionality (n = 500k)]
20
Effect of Dimensionality (cont.)
[Figure: two panels, correlated and anti-correlated datasets; x-axis: Dimensionality (n = 500k)]
21
Effect of Cardinality
[Figure: anti-correlated dataset; x-axis: Cardinality (x100K), d = 8]
22
Effect of Duplicate Values
[Figure: independent dataset, d = 8]
23
Outline
- Introduction
- Skycube Computation Techniques
- Experiments
- Summary
24
Summary
- A novel concept: the Skycube
- Skycube computation techniques
  - Bottom-Up Skycube algorithm: share-results, share-sorting
  - Top-Down Skycube algorithm: share-partition-and-merging, share-parent
- Future work
  - I/O-based techniques
  - multiple skyline queries
25
Q&A
Thank you.
26
Preliminaries
Existing Skyline Computation Algorithms
- nested-loop-based
  - Block-Nested-Loop (BNL) algorithm [BKS, ICDE 01]
  - Sort-Filter-Skyline (SFS) algorithm [CGG+, ICDE 03]
- divide-and-conquer-based
  - Divide-and-Conquer (DC) algorithm [BKS, ICDE 01]
- index-based
  - Bitmap, Index-Method [TEO, VLDB 01]
  - R-tree Index Based [KRR, VLDB 02; PTF+, SIGMOD 03]
27
Preliminaries: BNL and SFS Algorithms
- BNL algorithm
- SFS algorithm
  - entropy value (indicator of the dominance power)
  - pre-sort the dataset (e.g., {P5, P2, P3, P1, P4})

[Figure: P1 to P5 plotted on axes A and B]
[Table: candidate list trace with columns Current, Cand. List, and Results for the points P1 to P5; the recoverable list entries are P1; P1, P2; P1, P2, P3; and P2, P3, P5]
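The two nested-loop algorithms on this slide can be contrasted in a small in-memory sketch. It makes several assumptions: the coordinates come from the A and B columns of the example dataset, BNL's window is taken to fit in memory, and the SFS entropy is computed as the sum of ln(x + 1) over raw values, so the sort order it produces need not match the order shown on the slide. The names `bnl` and `sfs` are hypothetical.

```python
import math

# Example points; columns are A, B, C (only A and B are used below).
DATA = {
    "P1": (3, 3, 5),
    "P2": (5, 1, 1),
    "P3": (1, 4, 4),
    "P4": (4, 5, 2),
    "P5": (2, 2, 3),
}


def dominates(x, y, dims):
    return all(x[i] <= y[i] for i in dims) and any(x[i] < y[i] for i in dims)


def bnl(data, dims):
    """Block-nested-loop style: candidates may be evicted later."""
    window = []
    for name, p in data.items():
        if any(dominates(data[q], p, dims) for q in window):
            continue
        window = [q for q in window if not dominates(p, data[q], dims)]
        window.append(name)
    return window


def sfs(data, dims):
    """Sort-filter style: after presorting, candidates are never evicted."""
    def entropy(p):
        return sum(math.log(p[i] + 1) for i in dims)

    result = []
    for name in sorted(data, key=lambda n: entropy(data[n])):
        if not any(dominates(data[q], data[name], dims) for q in result):
            result.append(name)
    return result


print(bnl(DATA, (0, 1)))  # ['P2', 'P3', 'P5']
print(sfs(DATA, (0, 1)))  # ['P5', 'P3', 'P2']
```

In the BNL run P1 enters the window first and is evicted when P5 arrives; in the SFS run P5 is processed first, so P1 never becomes a candidate.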
28
Preliminaries: DC Algorithm
[Figure: P1 to P5 plotted on axes A and B. Divide Step: the median m_A splits the points into S1 and S2. Merge Step: the median m_B further splits them into S11, S12, S21, and S22.]
29
General Case
- Issue: $SKY_U(S) \subseteq SKY_V(S)$ does not necessarily hold
- Solution
  - share-results: re-examine $SKY_U(S)$ on $V$

Example: $SKY_B(S) = \{P_3, P_4, P_5\}$, $SKY_{AB}(S) = \{P_3\}$
[Figure: P1 to P5 plotted on axes A and B]
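The failure of the containment under duplicate values can be reproduced with a small dataset. The coordinates below are hypothetical: the slide gives only the resulting skylines, so the values were chosen purely so that P3, P4, and P5 tie on B and the two skylines match the sets on the slide. The re-examination shown is a plain dominance re-check on V, a stand-in for whatever bookkeeping the actual algorithm uses.

```python
# Hypothetical (A, B) coordinates; P3, P4, and P5 share the same B value.
DATA = {
    "P1": (4, 3),
    "P2": (5, 2),
    "P3": (1, 1),
    "P4": (3, 1),
    "P5": (2, 1),
}
A, B = 0, 1


def dominates(x, y, dims):
    return all(x[i] <= y[i] for i in dims) and any(x[i] < y[i] for i in dims)


def skyline(data, dims):
    return {
        n for n, p in data.items()
        if not any(dominates(q, p, dims) for q in data.values())
    }


def reexamine(candidates, data, dims):
    """Keep only the candidates that are still undominated on dims."""
    return {
        n for n in candidates
        if not any(dominates(q, data[n], dims) for q in data.values())
    }


sky_b = skyline(DATA, (B,))
sky_ab = skyline(DATA, (A, B))
print(sorted(sky_b))                         # ['P3', 'P4', 'P5']
print(sorted(sky_ab))                        # ['P3']
print(sky_b <= sky_ab)                       # False: containment fails
print(sorted(reexamine(sky_b, DATA, (A, B))))  # ['P3']
```

P4 and P5 are undominated on B alone because they tie with P3 there, but P3 beats both once A is taken into account.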
30
Motivation (cont.)
Other techniques
- Index method [VLDB 01]
- R-tree-based index [VLDB 02; SIGMOD 03]
- pre-computation (e.g., an index) is not reusable, so the pre-computation is repeated. Not efficient!
Goal: maximize shared computation!