Finding skyline on the fly HKU CS DB Seminar 21 July 2004 Speaker: Eric Lo.

Presentation transcript:

Finding skyline on the fly HKU CS DB Seminar 21 July 2004 Speaker: Eric Lo

Skyline
A new operator (like “ORDER BY”) in database systems.
The skyline is the set of data points that are not dominated by any other data point.

Example
Find some good places for us to hold the next DB Seminar.
Good → closer to HKU (Min)
Good → larger area (Max)
Return those homes that are not dominated, i.e. for which no other home is at least as good in ALL DIMENSIONS and strictly better in at least one.
Dataset (Table Homes):
Home      Distance from HKU   Area (m²)
K.K Loo   1 km                10
Ben       9 km                100
Ivy       5 km                2
Nikos     8 km                250
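The example can be checked with a few lines of Python. This is only a minimal sketch of the dominance test applied to the Homes table from the slide; the names HOMES, dominates and skyline are illustrative and do not come from the talk.

```python
# Homes table from the slide: (distance from HKU in km, area in m^2).
HOMES = {
    "K.K Loo": (1, 10),
    "Ben": (9, 100),
    "Ivy": (5, 2),
    "Nikos": (8, 250),
}


def dominates(a, b):
    """True if home a is no worse than b on both attributes and strictly
    better on at least one. Distance is minimized, area is maximized."""
    dist_a, area_a = a
    dist_b, area_b = b
    no_worse = dist_a <= dist_b and area_a >= area_b
    strictly_better = dist_a < dist_b or area_a > area_b
    return no_worse and strictly_better


def skyline(homes):
    """Names of all homes that no other home dominates, in sorted order."""
    return sorted(
        name
        for name, attrs in homes.items()
        if not any(dominates(other, attrs) for other in homes.values())
    )


print(skyline(HOMES))  # ['K.K Loo', 'Nikos']
```

Ben is dominated by Nikos (closer and larger) and Ivy is dominated by K.K Loo, so only K.K Loo and Nikos remain.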

Outline
Introduction to skyline queries
Non-progressive skylining on the Web
  Basic Distributed Skyline Algorithm (BDS)
Progressive skylining on the Web
Experimental results
Conclusion and future directions

Skylining on the Web
Each distributed site holds one attribute; the sites are reached over the Internet.
Attribute “Distance from HKU” is stored at HKU:
Home      Distance from HKU
K.K Loo   1 km
Ben       9 km
Ivy       5 km
Nikos     8 km
Attribute “Area (m²)” is stored at Purdue:
Home      Area (m²)
K.K Loo   10
Ben       100
Ivy       2
Nikos     250

Accessing interfaces
Interfaces of Web-accessible sites:
1. Sorted Access (SA): HKU → getNext() returns the rank-1 data tuple “K.K Loo”; HKU → getNext() → 2nd “Ivy”; HKU → getNext() → 3rd “Nikos”; ...
2. Random Access (RA): Purdue → getScore(“K.K Loo”) → 10 m²; HKU → getScore(“Nikos”) → 8 km
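The two interfaces can be mimicked locally. The class below is a hypothetical in-memory stand-in for one Web-accessible site, not an API from the talk; the method names get_next and get_score simply mirror getNext() and getScore() on the slide, and the descending flag is an assumption used to model a maximized attribute.

```python
class Source:
    """Hypothetical stand-in for one site that holds a single attribute."""

    def __init__(self, scores, descending=False):
        self._scores = dict(scores)
        # Best-first order: ascending for a minimized attribute,
        # descending for a maximized one.
        self._ranked = sorted(
            self._scores, key=self._scores.get, reverse=descending
        )
        self._cursor = 0

    def get_next(self):
        """Sorted access: next (object, score) pair in rank order, or None."""
        if self._cursor >= len(self._ranked):
            return None
        name = self._ranked[self._cursor]
        self._cursor += 1
        return name, self._scores[name]

    def get_score(self, name):
        """Random access: score of the named object."""
        return self._scores[name]


hku = Source({"K.K Loo": 1, "Ben": 9, "Ivy": 5, "Nikos": 8})
purdue = Source(
    {"K.K Loo": 10, "Ben": 100, "Ivy": 2, "Nikos": 250}, descending=True
)

print(hku.get_next())               # ('K.K Loo', 1)
print(hku.get_next())               # ('Ivy', 5)
print(purdue.get_score("K.K Loo"))  # 10
print(hku.get_score("Nikos"))       # 8
```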

Basic distributed skyline algorithm (EDBT 04)
Phase 1: find all skyline candidates.
Perform sorted access on each source one by one:
S1 → getNext(), S2 → getNext(), S3 → getNext(), S1 → getNext(), S2 → getNext(), ...
Stop when there is an object whose attribute values are all known.

Phase 1: f is the terminating object.

Phase 1 (15 sorted accesses)

Implication
f is the terminating object → objects that do not appear must be dominated by f.
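Phase 1 can be sketched as a round-robin loop over the sources. This is only an illustration under stated assumptions: each source is represented by its full best-first list of object names, every source ranks the same objects, and the stopping test runs after every single sorted access. The function name phase1 and the two order lists are illustrative; the orders are derived from the Homes table of the earlier example.

```python
def phase1(sorted_lists):
    """Round-robin sorted access over best-first lists, one list per source.

    Returns (terminating object, candidate set, number of sorted accesses).
    Assumes every list ranks the same set of objects.
    """
    seen = [set() for _ in sorted_lists]
    accesses = 0
    for rank in range(max(len(lst) for lst in sorted_lists)):
        for i, lst in enumerate(sorted_lists):
            if rank >= len(lst):
                continue
            obj = lst[rank]  # models source i -> getNext()
            seen[i].add(obj)
            accesses += 1
            # Stop once this object has been returned by every source.
            if all(obj in s for s in seen):
                return obj, set().union(*seen), accesses
    return None, set().union(*seen), accesses


# Best-first orders for the Homes example: distance ascending, area descending.
HKU_ORDER = ["K.K Loo", "Ivy", "Nikos", "Ben"]
PURDUE_ORDER = ["Nikos", "Ben", "K.K Loo", "Ivy"]

terminating, candidates, n_accesses = phase1([HKU_ORDER, PURDUE_ORDER])
print(terminating, sorted(candidates), n_accesses)
# Nikos ['Ben', 'Ivy', 'K.K Loo', 'Nikos'] 5
```

Any object that is not in the candidate set ranks below the terminating object in every source, which is why it must be dominated and can be ignored in phase 2. In this small example every home happens to become a candidate.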

Phase 2
Find the skyline from the candidates of phase 1.
During the sequential scan of the sources, data structures K1, K2, K3, ..., Kn are created, where n is the number of dimensions.
If source i → getNext() returns a data object d:
1. create an entry for d in Ki
2. update the lower_bound of source i

Phase 2: find the skyline from the candidates Ki
A lemma shows that objects can only be dominated by objects in the same set Ki.

Motivations
BDS returns skyline results in a batch.
In practice, it would be useful to return skyline results progressively, so that users can adjust their decisions right away.
Consider the “next DB seminar” skyline example:
The skyline for minimize “Distance from HKU”, maximize “Area” is returned first.
→ Getting from HKU to Nikos’s home requires a $50 bus ride!
→ Add the “travel-expense” attribute to the skyline query.

Progressive Distributed Skylining (PDS)
Goal: evaluate skyline queries progressively with minimal overhead.
Overhead:
  Network/data source accesses
  Computational time

Enabling progressiveness
To decide whether a data point belongs to the final skyline, we rely on the following lemma (assuming the data values are distinct):
If a data source Di returns data objects in a strictly monotonic order, an object O retrieved from Di can only be dominated by objects that were retrieved from Di before O.

If an object O is retrieved from a data source by sorted access, we only need to test whether O is dominated by any object that appears before O in the same source.
Two consequences:
1. We do not need to consider objects that appear in other data sources.
2. After the test, we can output O as a skyline object immediately → O must be a skyline object; we do not need to worry that objects appearing later would dominate O.

An R-tree approach
Build an R-tree Ri for each attribute/data source i involved in the skyline query.
For each object O retrieved from source i, we check whether any object in Ri dominates O.
If no such object exists, O is a skyline object (output it immediately).
If some object in Ri dominates O, O is not a skyline object (O is discarded immediately).

D3.getNext(), the 1st time
SA on D3 returns e.
e is a skyline object (no object is better than e on D3); e(7,4) is projected into R-tree R3.
Points shown in R3 (axes D1, D2): e(7,4)

D3.getNext(), the 2nd time
SA on D3 returns c.
Construct a query Q(origin, c) on R3.
Q returns no answer → c is a skyline object → insert c into R3.
Points shown in R3: e(7,4), c(2,5)

D3.getNext(), the 3rd time
SA on D3 returns j.
Construct a query Q(origin, j) on R3.
Q returns c as an answer → j is dominated by c → discard j.
Points shown: e(7,4), c(2,5), j(6,10)

D3.getNext(), the 4th time
SA on D3 returns f; construct a query Q(origin, f) on R3.
Q returns no answer → f is a skyline object.
Delete e after the insertion of f to make the R-tree more compact and efficient.
Points shown: e(7,4), c(2,5)
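The first three steps of this walkthrough can be replayed with a short sketch. It makes several assumptions that are not in the talk: a plain dictionary with a linear scan stands in for the R-tree R3, all attributes are minimized, and the box query Q(origin, O) is answered by checking whether a stored point is less than or equal to O on every projected dimension. The pruning step mirrors the "delete e after insertion of f" idea. The coordinates of f are not given on the slides, so f is left out.

```python
def dominated_in_projection(stored, point):
    """Box query Q(origin, point): does any stored point lie inside the box,
    i.e. is it <= point on every projected dimension?"""
    return any(
        all(s <= p for s, p in zip(q, point)) for q in stored.values()
    )


def process(stored, name, point):
    """Handle one object returned by sorted access on the source.

    Returns True if the object is a skyline object (and stores it),
    False if it is dominated and discarded.
    """
    if dominated_in_projection(stored, point):
        return False
    # Drop stored points that the new point covers in the projection:
    # the new point prunes everything they could prune.
    covered = [
        n for n, q in stored.items()
        if all(p <= s for p, s in zip(point, q))
    ]
    for n in covered:
        del stored[n]
    stored[name] = point
    return True


r3 = {}  # linear-scan stand-in for R-tree R3 (projection onto D1, D2)
for name, point in [("e", (7, 4)), ("c", (2, 5)), ("j", (6, 10))]:
    verdict = "skyline" if process(r3, name, point) else "discarded"
    print(name, point, verdict)
# e (7, 4) skyline
# c (2, 5) skyline
# j (6, 10) discarded
```

A real R-tree would answer the box query without scanning every stored point; the linear scan is used here only to keep the sketch self-contained.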

The R-tree approach
The R-tree is very small, since it stores only the skyline objects with the highest pruning power.
The containment query operation is very efficient.

A linear regression based heuristic
The R-tree approach enables progressiveness with better efficiency.
We use a linear regression based heuristic to minimize the number of source accesses during the evaluation process.

A rank based approach
1. We use linear regression to estimate the rank of objects along the process.
2. Assume the object with the lowest rank is the real terminating object and probe the sources accordingly (rather than round-robin).

Extensions
Evaluation of top-K skyline queries
Progress indicator (based on the estimated ranks)

Experimental results: number of source accesses

Experimental results: number of source accesses (random distribution, denormalized domain)

Experimental results: progressive behavior

Experimental results: progress indicator

Conclusion and future directions
Skyline queries on the Web
Return skyline points on the fly
Future work:
  Improve the usability of PDS by allowing users to trade off progressiveness against efficiency
  Compute skylines from real-time stream data
  Handle the case where only one data source supports sorted access and the rest support random access only

References
S. Börzsönyi, D. Kossmann, K. Stocker, The Skyline Operator, in ICDE
D. Kossmann, F. Ramsak, S. Rost, Shooting Stars in the Sky: An Online Algorithm for Skyline Queries, in VLDB
W.T. Balke, U. Guntzer, J.X. Zheng, Efficient Distributed Skylining for Web Information Systems, in EDBT 2004