Efficient Computation of the Skyline Cube Yidong Yuan School of Computer Science & Engineering The University of New South Wales & NICTA Sydney, Australia.

Efficient Computation of the Skyline Cube

Yidong Yuan
School of Computer Science & Engineering
The University of New South Wales & NICTA
Sydney, Australia

Joint Work: Xuemin Lin (UNSW), Qing Liu (UNSW), Wei Wang (UNSW), Jeffrey Xu Yu (CUHK), Qing Zhang (UNSW & CSIRO)

VLDB 2005, Yidong

Outline
-> Introduction
   Skycube Computation Techniques
   Experiments
   Summary

Skyline Query

A real estate example: the skyline returns the data points not dominated by others.

Properties and Values
        price (100K)   dist   age   …
  P1    3              3      5     …
  P2    5              1      1     …
  P3    1              4      4     …
  P4    4              5      2     …
  P5    2              2      3     …

Dominance: $(x_1, x_2, \dots, x_d)$ dominates $(y_1, y_2, \dots, y_d)$ iff $\forall i,\ x_i \le y_i$ and $\exists k,\ x_k < y_k$. For example, P5 dominates P1.

[Figure: two scatter plots of P1 to P5, "Skyline on price & dist" (axes: price, dist) and "Skyline on price & age" (axes: price, age)]
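
The dominance definition on this slide can be sketched in a few lines of Python. This is only an illustration on the five example properties: the names `POINTS`, `dominates`, and `skyline` are made up for the sketch, smaller values are assumed to be better on every dimension, and the quadratic scan is not one of the algorithms discussed in the talk.

```python
from typing import Dict, List, Sequence, Tuple

# Example properties from the slide: (price in 100K, dist, age). Smaller is better.
POINTS: Dict[str, Tuple[int, int, int]] = {
    "P1": (3, 3, 5),
    "P2": (5, 1, 1),
    "P3": (1, 4, 4),
    "P4": (4, 5, 2),
    "P5": (2, 2, 3),
}

def dominates(x: Sequence[int], y: Sequence[int], dims: Sequence[int]) -> bool:
    """x dominates y on dims: x is no worse everywhere and strictly better somewhere."""
    return all(x[i] <= y[i] for i in dims) and any(x[i] < y[i] for i in dims)

def skyline(points: Dict[str, Sequence[int]], dims: Sequence[int]) -> List[str]:
    """Names of the points that no other point dominates on dims (hypothetical helper)."""
    return sorted(
        name for name, p in points.items()
        if not any(dominates(q, p, dims) for q in points.values())
    )

print(skyline(POINTS, (0, 1)))  # skyline on price & dist
print(skyline(POINTS, (0, 2)))  # skyline on price & age
```

On the example data this yields P2, P3, P5 for price & dist and P2, P3, P4, P5 for price & age.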

Skyline Cube (Skycube)

A union of the skyline results of all the non-empty subsets of the d-dimensional set ($2^d - 1$ skylines): skyline on price & dist & age, skyline on price & dist, skyline on price & age, …

Dataset
        A   B   C
  P1    3   3   5
  P2    5   1   1
  P3    1   4   4
  P4    4   5   2
  P5    2   2   3

Skycube Example
  subspace   skyline
  ABC        {P2, P3, P4, P5}
  AB         {P2, P3, P5}
  AC         {P2, P3, P4, P5}
  BC         {P2}
  A          {P3}
  B          {P2}
  C          {P2}

Lattice Structure of a Skycube
          ABC
     AB   AC   BC
     A    B    C
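
As a rough illustration of the definition, the Skycube of the example dataset can be enumerated by computing one skyline per non-empty subspace. The sketch below is the naive baseline with no shared computation, not one of the algorithms proposed in the talk; `naive_skycube` and the other names are hypothetical.

```python
from itertools import combinations
from typing import Dict, List, Sequence, Tuple

# Example dataset from the slide, dimensions A, B, C. Smaller is better.
POINTS: Dict[str, Tuple[int, int, int]] = {
    "P1": (3, 3, 5),
    "P2": (5, 1, 1),
    "P3": (1, 4, 4),
    "P4": (4, 5, 2),
    "P5": (2, 2, 3),
}
DIM_NAMES = "ABC"

def dominates(x: Sequence[int], y: Sequence[int], dims: Sequence[int]) -> bool:
    """x dominates y on dims: no worse everywhere, strictly better somewhere."""
    return all(x[i] <= y[i] for i in dims) and any(x[i] < y[i] for i in dims)

def skyline(points: Dict[str, Sequence[int]], dims: Sequence[int]) -> List[str]:
    return sorted(
        name for name, p in points.items()
        if not any(dominates(q, p, dims) for q in points.values())
    )

def naive_skycube(points: Dict[str, Sequence[int]], d: int) -> Dict[str, List[str]]:
    """One independent skyline per non-empty subspace: 2^d - 1 skylines in total."""
    cube: Dict[str, List[str]] = {}
    for k in range(d, 0, -1):                      # top of the lattice first
        for dims in combinations(range(d), k):
            key = "".join(DIM_NAMES[i] for i in dims)
            cube[key] = skyline(points, dims)
    return cube

for subspace, sky in naive_skycube(POINTS, 3).items():
    print(subspace, sky)
```

Every subspace rescans the full dataset here, which is the redundancy the following slides set out to remove.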

Motivation

How to compute the Skycube efficiently?
- existing skyline techniques are applicable
- no sharing of computation => not efficient!

Motivation (cont.)

nested-loop-based alg.
- BNL [ICDE 01]: redundant comparison => not efficient!
- SFS [ICDE 03]: presort the dataset => keep the candidate list minimum; repeated sorting => not efficient!

  point   Candidate List   Comparison of Skyline on A   Comparison of Skyline on A and B
  P1      ∅                --
  P2      P1               P1(A) vs. P2(A)              P1(B) vs. P2(B)
  …

[Figure: scatter plot of P1 to P5 (axes: A, B)]

Motivation (cont.)

divide-and-conquer-based alg. (DC [ICDE 01])
- repeats the same divide/merge steps => not efficient!

[Figure: "Divide Step of Skyline on A and B" and "Divide Step of Skyline on A, B, and C"; in both, P1 to P5 are split on dimension A at m_A, m'_A, and m''_A]

Outline
   Introduction
-> Skycube Computation Techniques
     Bottom-Up Skycube Algorithm (BUS)
     Top-Down Skycube Algorithm (TDS)
   Experiments
   Summary

Property of Skycube

Distinct Value Condition
- no two data points have the same value on the same dimension
- $\mathrm{SKY}_U(S)$: skyline on sub-dimension set U
- $\mathrm{SKY}_U(S) \subseteq \mathrm{SKY}_V(S)$ if $U \subseteq V$

General Case
- keep track of the "bad guys"

Basic Idea (Bottom-Up Skycube algorithm, BUS)
- compute the Skycube in a level-wise and bottom-up manner
- each skyline is computed by a nested-loop-based algorithm

[Figure: Skycube lattice, ABC at the top; AB, AC, BC; A, B, C at the bottom]

Sharing Strategies

share-results: $\mathrm{SKY}_U(S) \subseteq \mathrm{SKY}_V(S)$
- reduce the size of the input
- reduce the # of dominance tests

share-sorting: sort the dataset on each dimension
- keep the candidate list minimum
- reduce the # of sorts from $2^d - 1$ to d
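
A minimal sketch of the share-results idea, assuming the distinct value condition holds (it does for the example dataset): the skylines of the child subspaces are copied into the parent's skyline without any dominance test, and only the remaining points are checked. The function `bottom_up_skycube` and its test counter are illustrative names, and share-sorting is left out of the sketch.

```python
from itertools import combinations
from typing import Dict, Sequence, Set, Tuple

POINTS: Dict[str, Tuple[int, int, int]] = {
    "P1": (3, 3, 5),
    "P2": (5, 1, 1),
    "P3": (1, 4, 4),
    "P4": (4, 5, 2),
    "P5": (2, 2, 3),
}

def dominates(x: Sequence[int], y: Sequence[int], dims: Sequence[int]) -> bool:
    return all(x[i] <= y[i] for i in dims) and any(x[i] < y[i] for i in dims)

def bottom_up_skycube(points: Dict[str, Sequence[int]], d: int):
    """Level-wise, bottom-up Skycube with share-results (distinct values assumed)."""
    cube: Dict[Tuple[int, ...], Set[str]] = {}
    tested = 0  # number of points that still needed a dominance check
    for k in range(1, d + 1):                       # level 1 first, then 2, ...
        for dims in combinations(range(d), k):
            # share-results: every child skyline is contained in this skyline
            seed: Set[str] = set()
            for child in combinations(dims, k - 1):
                if child:                           # skip the empty subspace
                    seed |= cube[child]
            result = set(seed)
            for name, p in points.items():
                if name in seed:
                    continue                        # already known to be in the skyline
                tested += 1
                if not any(dominates(q, p, dims) for q in points.values()):
                    result.add(name)
            cube[dims] = result
    return cube, tested

cube, tested = bottom_up_skycube(POINTS, 3)
print(sorted(cube[(0, 1, 2)]), tested)
```

Without the seeding step, all 5 points would be checked in each of the 7 subspaces; with it, the upper levels of the lattice check only the points not inherited from below.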

Filtering

Effective Dominance Test
- filter function: $f(p)$ = sum of p's coordinates
- no false negative: $f(p) \le f(q) \Rightarrow$ q does not dominate p
- maintain the candidate list in a non-decreasing order of filtering values (e.g. avl-tree)

Skyline on A and B
  Sort on B:    P2, P5, P1, P3, P4
  $f_{AB}(p)$:  …

  point   Candidate List   Comparison (without filter)         Comparison (with filter)
  P5      P2               P2(A) vs. P5(A), P2(B) vs. P5(B)    $f_{AB}(P_2)$ vs. $f_{AB}(P_5)$

[Figure: scatter plot of P1 to P5 (axes: A, B)]
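
The filter can be sketched as follows, under the distinct value condition and with the points visited in sorted order on one dimension of the subspace. A plain sorted list with `bisect` stands in for the avl-tree mentioned on the slide, and `skyline_with_filter` is a hypothetical name. Only candidates whose filter value is strictly smaller than $f(p)$ are compared against p, since the others cannot dominate it.

```python
import bisect
from typing import Dict, List, Sequence, Tuple

POINTS: Dict[str, Tuple[int, int, int]] = {
    "P1": (3, 3, 5),
    "P2": (5, 1, 1),
    "P3": (1, 4, 4),
    "P4": (4, 5, 2),
    "P5": (2, 2, 3),
}

def dominates(x: Sequence[int], y: Sequence[int], dims: Sequence[int]) -> bool:
    return all(x[i] <= y[i] for i in dims) and any(x[i] < y[i] for i in dims)

def f(p: Sequence[int], dims: Sequence[int]) -> int:
    """Filter value: sum of p's coordinates on dims."""
    return sum(p[i] for i in dims)

def skyline_with_filter(points: Dict[str, Sequence[int]],
                        dims: Sequence[int], sort_dim: int):
    """Skyline on dims, scanning points in increasing order of sort_dim (in dims)."""
    order = sorted(points, key=lambda n: points[n][sort_dim])
    cand: List[Tuple[int, str]] = []   # (filter value, name), non-decreasing in f
    tests = 0                          # full dominance tests actually performed
    for name in order:
        p = points[name]
        fp = f(p, dims)
        # Candidates with f(q) >= f(p) cannot dominate p, so stop at this index.
        cut = bisect.bisect_left(cand, (fp, ""))
        dominated = False
        for _, q_name in cand[:cut]:
            tests += 1
            if dominates(points[q_name], p, dims):
                dominated = True
                break
        if not dominated:
            # Later points are worse on sort_dim, so they never evict a candidate.
            bisect.insort(cand, (fp, name))
    return sorted(n for _, n in cand), tests

print(skyline_with_filter(POINTS, (0, 1), sort_dim=1))
```

For the skyline on A and B with the scan order P2, P5, P1, P3, P4, the incoming P5 is admitted without a single coordinate comparison against P2, which matches the with-filter column of the table.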

DC Algorithm

[Figure: three plots of P1 to P5 (axes: A, B). Divide Step: the points are split at m_A into S1 and S2. Merge Step: S1 and S2 are further split at m_B into S11, S12, S21, and S22]
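
A simplified divide-and-conquer skyline, assuming distinct values on the split dimension, might look like the sketch below. The divide step splits at the median of the first dimension as in the figure; the merge step here is a plain pairwise check rather than the recursive partitioning on m_B shown on the slide, so this is a stand-in for the merge, not the DC algorithm's own. `dc_skyline` is a hypothetical name.

```python
from typing import Dict, List, Sequence, Tuple

POINTS: Dict[str, Tuple[int, int, int]] = {
    "P1": (3, 3, 5),
    "P2": (5, 1, 1),
    "P3": (1, 4, 4),
    "P4": (4, 5, 2),
    "P5": (2, 2, 3),
}

def dominates(x: Sequence[int], y: Sequence[int], dims: Sequence[int]) -> bool:
    return all(x[i] <= y[i] for i in dims) and any(x[i] < y[i] for i in dims)

def dc_skyline(names: List[str], points: Dict[str, Sequence[int]],
               dims: Sequence[int]) -> List[str]:
    """Divide on the first dimension of dims, then merge the two partial skylines."""
    if len(names) <= 1:
        return list(names)
    split = dims[0]
    ordered = sorted(names, key=lambda n: points[n][split])
    mid = len(ordered) // 2                      # median position plays the role of m_A
    s1 = dc_skyline(ordered[:mid], points, dims)  # smaller values on the split dimension
    s2 = dc_skyline(ordered[mid:], points, dims)  # larger values on the split dimension
    # With distinct values, no point of S2 can dominate a point of S1,
    # so only S2's skyline has to be checked against S1's skyline.
    survivors = [n for n in s2
                 if not any(dominates(points[m], points[n], dims) for m in s1)]
    return s1 + survivors

print(sorted(dc_skyline(list(POINTS), POINTS, (0, 1))))
print(sorted(dc_skyline(list(POINTS), POINTS, (0, 1, 2))))
```

Both calls sort and split the same five points on A in exactly the same way, which hints at why the divide step is a natural thing to share between subspaces.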

Sharing Opportunities

share-partitioning

[Figure: "skyline on A and B" (axes: A, B) and "skyline on A, B, and C" (axes: A, BC); in both, P1 to P5 are split at the same m_A into S1 and S2, with further split points …, m_i, m_j, …]

Sharing Opportunities (cont.)

share-merging

[Figure: merge step of the "skyline on A and B" ({P3, P5} merged with {P1, P2} on B) next to the merge step of the "skyline on A, B, and C" ({P3, P5} merged with {P1, P2, P4} on BC). The latter merge step is decomposed: {P3, P5} with {{P1, P2}, {P4}} on BC, where the {P3, P5} and {P1, P2} part reuses the result on B ("above result") and extends it on C, and {P3, P5} with {P4} is merged on BC]

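TDS Algorithm

Basic Idea
- compute the skylines on a path simultaneously
- find a minimal set of paths
- share-parent: use the parent's skyline result as the input

[Figure: Skycube lattice (ABC; AB, AC, BC; A, B, C) with the path ABC, AB, A highlighted; S is reduced to $\mathrm{SKY}_{ABC}(S)$, which is the input for AB and then A]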
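
The share-parent idea can be illustrated on the path ABC, AB, A. The sketch assumes the distinct value condition, so each child skyline is contained in its parent's skyline and can be computed from that smaller input alone. It walks the path one subspace at a time rather than computing the skylines on a path simultaneously as the slide describes; `skyline_within` and `skylines_on_path` are hypothetical names.

```python
from typing import Dict, Iterable, List, Sequence, Set, Tuple

POINTS: Dict[str, Tuple[int, int, int]] = {
    "P1": (3, 3, 5),
    "P2": (5, 1, 1),
    "P3": (1, 4, 4),
    "P4": (4, 5, 2),
    "P5": (2, 2, 3),
}

def dominates(x: Sequence[int], y: Sequence[int], dims: Sequence[int]) -> bool:
    return all(x[i] <= y[i] for i in dims) and any(x[i] < y[i] for i in dims)

def skyline_within(names: Iterable[str], points: Dict[str, Sequence[int]],
                   dims: Sequence[int]) -> Set[str]:
    """Skyline on dims, restricted to the given input names."""
    pool = list(names)
    return {n for n in pool
            if not any(dominates(points[m], points[n], dims) for m in pool)}

def skylines_on_path(points: Dict[str, Sequence[int]],
                     path: List[Tuple[int, ...]]) -> List[Set[str]]:
    """path goes from a parent subspace down to its descendants, e.g. ABC, AB, A."""
    results: List[Set[str]] = []
    current: Iterable[str] = points.keys()       # the full dataset S feeds the top node
    for dims in path:
        sky = skyline_within(current, points, dims)
        results.append(sky)
        current = sky                            # share-parent: child reads parent's result
    return results

abc, ab, a = skylines_on_path(POINTS, [(0, 1, 2), (0, 1), (0,)])
print(sorted(abc), sorted(ab), sorted(a))
```

On the example data the input shrinks from 5 points for ABC to 4 for AB and 3 for A.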

Outline
   Introduction
   Skycube Computation Techniques
-> Experiments
   Summary

Experiment Setting

Algorithms (* our sharing strategies applied)
- BNLS*: BNL-Skycube algorithm
- SFSS*: SFS-Skycube algorithm
- DCS*: DC-Skycube algorithm
- BUS: Bottom-Up Skycube algorithm
- TDS: Top-Down Skycube algorithm

Dataset: correlated, independent, anti-correlated
Dimensionality: $d \in [4, 10]$
Cardinality: $n \in [100\text{k}, 500\text{k}]$

Effect of Dimensionality

[Figure: independent dataset; x-axis: Dimensionality (n = 500k)]

Effect of Dimensionality (cont.)

[Figure: two panels, correlated and anti-correlated datasets; x-axis: Dimensionality (n = 500k)]

Effect of Cardinality

[Figure: anti-correlated dataset; x-axis: Cardinality (x100K), d = 8]

Effect of Duplicate Values

[Figure: independent dataset (d = 8)]

Outline
   Introduction
   Skycube Computation Techniques
   Experiments
-> Summary

Summary

- A novel concept: Skycube
- Skycube computation techniques
  - Bottom-Up Skycube algorithm: share-results, share-sorting
  - Top-Down Skycube algorithm: share-partition-and-merging, share-parent
- Future work
  - I/O based techniques
  - multiple skyline queries

Q&A

Thank you.

Preliminaries

Existing Skyline Computation Algorithms
- nested-loop-based
  - Block-Nested-Loop (BNL) algorithm [BKS, ICDE 01]
  - Sort-Filter-Skyline (SFS) algorithm [CGG+, ICDE 03]
- divide-and-conquer-based
  - Divide-and-Conquer (DC) algorithm [BKS, ICDE 01]
- index-based
  - Bitmap, Index-Method [TEO, VLDB 01]
  - R-tree Index Based [KRR, VLDB 02; PTF+, SIGMOD 03]

Preliminaries: BNL and SFS Algorithms

BNL algorithm

  Current   Cand. List     Results
  P1        ∅              P1
  P2        P1             P1, P2
  P3        P1, P2         P1, P2, P3
  P4        P1, P2, P3     P1, P2, P3
  P5        P1, P2, P3     P2, P3, P5

SFS algorithm
- entropy value (indicator of the dominance power)
- pre-sort the dataset (e.g., {P5, P2, P3, P1, P4})

[Figure: scatter plot of P1 to P5 (axes: A, B)]
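
Both nested-loop algorithms can be sketched in memory as follows. The coordinates are the A and B values of the example dataset from the earlier slides, not read off this slide's figure. The BNL sketch ignores the block and temporary-file handling of the real algorithm, and the SFS sketch sorts by the sum of coordinates as a simple monotone stand-in for the entropy value named on the slide. The function names `bnl` and `sfs` are illustrative.

```python
from typing import Dict, List, Sequence, Tuple

POINTS: Dict[str, Tuple[int, int, int]] = {
    "P1": (3, 3, 5),
    "P2": (5, 1, 1),
    "P3": (1, 4, 4),
    "P4": (4, 5, 2),
    "P5": (2, 2, 3),
}

def dominates(x: Sequence[int], y: Sequence[int], dims: Sequence[int]) -> bool:
    return all(x[i] <= y[i] for i in dims) and any(x[i] < y[i] for i in dims)

def bnl(points: Dict[str, Sequence[int]], dims: Sequence[int],
        order: List[str]) -> List[str]:
    """In-memory nested loop: keep a candidate list, evicting dominated candidates."""
    cand: List[str] = []
    for name in order:
        p = points[name]
        if any(dominates(points[c], p, dims) for c in cand):
            continue                                   # current point is dominated
        cand = [c for c in cand if not dominates(p, points[c], dims)]
        cand.append(name)
    return cand

def sfs(points: Dict[str, Sequence[int]], dims: Sequence[int]) -> List[str]:
    """Presort by a monotone score so that admitted candidates are never evicted."""
    # Assumption: sum of coordinates replaces the entropy score used by SFS.
    order = sorted(points, key=lambda n: sum(points[n][i] for i in dims))
    result: List[str] = []
    for name in order:
        if not any(dominates(points[r], points[name], dims) for r in result):
            result.append(name)
    return result

print(bnl(POINTS, (0, 1), ["P1", "P2", "P3", "P4", "P5"]))
print(sfs(POINTS, (0, 1)))
```

In the BNL run, P1 enters the candidate list and is only evicted when P5 arrives, whereas in the presorted run no admitted candidate is ever removed.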

Preliminaries: DC Algorithm

[Figure: three plots of P1 to P5 (axes: A, B). Divide Step: the points are split at m_A into S1 and S2. Merge Step: S1 and S2 are further split at m_B into S11, S12, S21, and S22]

General Case

Issue: $\mathrm{SKY}_U(S) \subseteq \mathrm{SKY}_V(S)$ does not necessarily hold

Solution
- share-results: re-examine $\mathrm{SKY}_U(S)$ on V

Example: $\mathrm{SKY}_B(S) = \{P_3, P_4, P_5\}$, $\mathrm{SKY}_{AB}(S) = \{P_3\}$

[Figure: scatter plot of P1 to P5 (axes: A, B)]
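
The issue and the re-examination step can be reproduced on a small dataset with duplicate values. The coordinates below are invented for illustration (the figure's actual values are not given in the transcript) and are chosen so that P3, P4, and P5 tie on B, which yields the same two skylines as on the slide.

```python
from typing import Dict, Iterable, Sequence, Set, Tuple

# Hypothetical (A, B) coordinates with duplicate values on B.
POINTS: Dict[str, Tuple[int, int]] = {
    "P1": (2, 3),
    "P2": (5, 2),
    "P3": (1, 1),
    "P4": (4, 1),
    "P5": (3, 1),
}

def dominates(x: Sequence[int], y: Sequence[int], dims: Sequence[int]) -> bool:
    return all(x[i] <= y[i] for i in dims) and any(x[i] < y[i] for i in dims)

def skyline_within(names: Iterable[str], points: Dict[str, Sequence[int]],
                   dims: Sequence[int]) -> Set[str]:
    pool = list(names)
    return {n for n in pool
            if not any(dominates(points[m], points[n], dims) for m in pool)}

B, AB = (1,), (0, 1)
sky_b = skyline_within(POINTS, POINTS, B)      # ties on B all survive
sky_ab = skyline_within(POINTS, POINTS, AB)

# share-results in the general case: re-examine SKY_B(S) on AB and
# keep only the members that are still undominated there.
reusable = skyline_within(sky_b, POINTS, AB)

print(sorted(sky_b), sorted(sky_ab), sorted(reusable))
```

Here the skyline on B is not contained in the skyline on AB, but the re-examined subset is, so it can still seed the computation on AB.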

Motivation (cont.)

other techniques
- Index method [VLDB 01]
- R-tree based index [VLDB 02; SIGMOD 03]
- pre-computation (e.g. index) is not reusable => repeated pre-computation => not efficient!

Goal: maximize the sharing of computation!