Online Skyline Queries. Agenda Motivation: Top N vs. Skyline „Classic“ Algorithms –Block-Nested Loop Algorithm –Divide & Conquer Algorithm Online Algorithm.

Slides:



Advertisements
Similar presentations
Skyline Charuka Silva. Outline Charuka Silva, Skyline2  Motivation  Skyline Definition  Applications  Skyline Query  Similar Interesting Problem.
Advertisements

Choosing an Order for Joins
Nearest Neighbor Search in High Dimensions Seminar in Algorithms and Geometry Mica Arie-Nachimson and Daniel Glasner April 2009.
The Skyline Operator (Stephan Borzsonyi, Donald Kossmann, Konrad Stocker) Presenter: Shehnaaz Yusuf March 2005.
Implementation of relational operations
Query Execution, Concluded Zachary G. Ives University of Pennsylvania CIS 550 – Database & Information Systems November 18, 2003 Some slide content may.
Mining Distance-Based Outliers in Near Linear Time with Randomization and a Simple Pruning Rule Stephen D. Bay 1 and Mark Schwabacher 2 1 Institute for.
CMPT 225 Sorting Algorithms Algorithm Analysis: Big O Notation.
Database Management Systems, R. Ramakrishnan and Johannes Gehrke1 Evaluation of Relational Operations: Other Techniques Chapter 12, Part B.
ISAC 教育學術資安資訊分享與分析中心研發專案 The Skyline Operator Stephan B¨orzs¨onyi, Donald Kossmann, Konrad Stocker EDBT
Continuous Intersection Joins Over Moving Objects Rui Zhang University of Melbourne Dan Lin Purdue University Kotagiri Ramamohanarao University of Melbourne.
July 29HDMS'08 Caching Dynamic Skyline Queries D. Sacharidis 1, P. Bouros 1, T. Sellis 1,2 1 National Technical University of Athens 2 Institute for Management.
Stabbing the Sky: Efficient Skyline Computation over Sliding Windows COMP9314 Lecture Notes.
1 Relational Query Optimization Module 5, Lecture 2.
1 Shooting Stars in the Sky:An Online Algorithm for Skyline Queries 作者: Donald Kossmann Frank Ramsak Steffen Rost 報告:黃士維.
Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke1 Overview of Query Evaluation Chapter 12.
1 Evaluation of Relational Operations: Other Techniques Chapter 12, Part B.
1 External Sorting for Query Processing Yanlei Diao UMass Amherst Feb 27, 2007 Slides Courtesy of R. Ramakrishnan and J. Gehrke.
Catching the Best Views of Skyline: A Semantic Approach Based on Decisive Subspaces Jian Pei # Wen Jin # Martin Ester # Yufei Tao + # Simon Fraser University,
Database Management Systems, R. Ramakrishnan and J. Gehrke1 Query Evaluation Chapter 12: Overview.
Access Path Selection in a Relational Database Management System Selinger et al.
1 Overview of Query Evaluation Chapter Overview of Query Evaluation  Plan : Tree of R.A. ops, with choice of alg for each op.  Each operator typically.
Database Management 9. course. Execution of queries.
Maximal Vector Computation in Large Data Sets The 31st International Conference on Very Large Data Bases VLDB 2005 / VLDB Journal 2006, August Parke Godfrey,
©Silberschatz, Korth and Sudarshan13.1Database System Concepts Chapter 13: Query Processing Overview Measures of Query Cost Selection Operation Sorting.
Towards Robust Indexing for Ranked Queries Dong Xin, Chen Chen, Jiawei Han Department of Computer Science University of Illinois at Urbana-Champaign VLDB.
Primary Key, Cluster Key & Identity Loop, Hash & Merge Joins Joe Chang
12.1Database System Concepts - 6 th Edition Chapter 12: Query Processing Overview Measures of Query Cost Selection Operation Join Operation Sorting 、 Other.
Database Management Systems, R. Ramakrishnan and J. Gehrke 1 External Sorting Chapter 13.
Spatial Query Processing Spatial DBs do not have a set of operators that are considered to be basic elements in a query evaluation. Spatial DBs handle.
Introduction to Query Optimization, R. Ramakrishnan and J. Gehrke 1 Introduction to Query Optimization Chapter 13.
CS4432: Database Systems II Query Processing- Part 2.
Database Management Systems, R. Ramakrishnan and J. Gehrke1 Introduction to Query Optimization Chapter 13.
M.Kersten MonetDB, Cracking and recycling Martin Kersten CWI Amsterdam.
Computing & Information Sciences Kansas State University Monday, 03 Nov 2008CIS 560: Database System Concepts Lecture 27 of 42 Monday, 03 November 2008.
HKU CSIS DB Seminar Skyline Queries HKU CSIS DB Seminar 9 April 2003 Speaker: Eric Lo.
Scalability of Local Image Descriptors Björn Þór Jónsson Department of Computer Science Reykjavík University Joint work with: Laurent Amsaleg (IRISA-CNRS)
Prof. U V THETE Dept. of Computer Science YMA
Indexing Multidimensional Data
Spatial Data Management
Strategies for Spatial Joins
SIMILARITY SEARCH The Metric Space Approach
Abolfazl Asudeh Azade Nazi Nan Zhang Gautam DaS
Database Management System
Database Applications (15-415) DBMS Internals- Part VII Lecture 16, October 25, 2016 Mohammad Hammoud.
Sorting by Tammy Bailey
Query Processing in Databases Dr. M. Gavrilova
Introduction to Query Optimization
Evaluation of Relational Operations
Sameh Shohdy, Yu Su, and Gagan Agrawal
Query Processing and Optimization
Chapter 15 QUERY EXECUTION.
Evaluation of Relational Operations: Other Operations
Introduction to Database Systems
Database Management Systems (CS 564)
Lecture#12: External Sorting (R&G, Ch13)
Database Applications (15-415) DBMS Internals- Part VII Lecture 19, March 27, 2018 Mohammad Hammoud.
External Sorting The slides for this text are organized into chapters. This lecture covers Chapter 11. Chapter 1: Introduction to Database Systems Chapter.
Faloutsos/Pavlo C. Faloutsos – A. Pavlo Lecture#13: Query Evaluation
Selected Topics: External Sorting, Join Algorithms, …
Skyline query with R*-Tree: Branch and Bound Skyline (BBS) Algorithm
Implementation of Relational Operations
CS222P: Principles of Data Management Notes #13 Set operations, Aggregation, Query Plans Instructor: Chen Li.
Evaluation of Relational Operations: Other Techniques
Overview of Query Evaluation: JOINS
Wednesday, 5/8/2002 Hash table indexes, physical operators
Database Systems (資料庫系統)
Liang Jin (UC Irvine) Nick Koudas (AT&T Labs Research)
Evaluation of Relational Operations: Other Techniques
Algorithm Course Algorithms Lecture 3 Sorting Algorithm-1
Presentation transcript:

Online Skyline Queries

Agenda Motivation: Top N vs. Skyline „Classic“ Algorithms –Block-Nested Loop Algorithm –Divide & Conquer Algorithm Online Algorithm –Motivation –NN Algorithm Summary

Top N Queries Examples: –The five cheapest hotels? –How rich are the top 10 percent on an average? –Increase the salary of the ten best goalies! Top N in SQL (almost) not possible Extended SQL [Carey & Kossmann 1997] Algorithms and optimization techniques [Carey & Kossmann 1997, 1998]

Top N Queries Example: The five cheapest hotels SELECT * FROM Hotels ORDER BY price STOP AFTER 5; Almost all commercial DBMS support this (syntax varies; semantics for ties varies) What happens if you have several criteria?

Nearest Neighbor Search Cheap and close to the beach SELECT * FROM Hotels ORDER BY distance * x + price * y STOP AFTER 5; How to set x and y ?

Skyline Queries Hotels which are close to the beach and cheap. distance price x x x x x xx x x x x x x x x x x x Top 5 Skyline (Pareto Curve) x x x Convex Hull Literatur: Maximum Vector Problem. [Kung et al. 1975]

Syntax of Skyline Queries Additional SKYLINE OF clause [Börszönyi, Kossmann, Stocker 2001] Cheap & close to the beach SELECT * FROM Hotels WHERE city = ´Nassau´ SKYLINE OF distance MIN, price MIN;

Flight Reservation Book flight from Washington DC to San Jose SELECT * FROM Flights WHERE depDate < ´Nov-13´ SKYLINE OF price MIN, distance(27750, dept) MIN, distance(94000, arr) MIN, (`Nov-13` - depDate) MIN;

Visualisation (VR) Skyline of NY (visible buildings) SELECT * FROM Buildings WHERE city = `New York` SKYLINE OF h MAX, x DIFF, z MIN;

Location-based Services Cheap Italian restaurants that are close Query with current location as parameter SELECT * FROM Restaurants WHERE type = `Italian` SKYLINE OF price MIN, d(addr, ?) MIN;

Skyline and Standard SQL Skyline can be expressed as nested Queries SELECT * FROM Hotels h WHERE NOT EXISTS ( SELECT * FROM Hotels WHERE h.price >= price AND h.d >= d AND (h.price > price OR h.d > d)) Such queries are quite frequent in practice The response time is desastrous

Naive Algorithm Nested Loops Compare each point with all other points FOR i=1 TO N D = FALSE; j = 1; WHILE (NOT D) AND (j <= N) D = dominate(a[j], a[i]); j++; END WHILE IF (NOT D) output(a[i]); END FOR

Block Nested-Loops Algorithmn Problem of naive algorithm –N iterations through whole database (database often does not fit into main memory) –Each pair of points is compared twice Block Nested Loops Algorithm –Keep window of incomparable points –If window does not fit in memory, spill points to disk Assessment –N / window iterations through database –No double-comparisons

BNL Input Input: ABCDEFG Window size = 2 A y x D B C F E G StepWindowInputTempOutput 1,2ABCDEFG 3ACDEFG 4-7ACEFG 8 AC 9-11EGAC 12ACEG

Organisation of Window „Self-organizing List“ –Move „Hits“ to the beginning –Reduces CPU cost for comparisons (early stop) „Replacement“ –Maximize the „Volume“ of window –Additional CPU overhead for replacement policy –Less iterations because points in window are a better filter

Divide & Conquer Algorithm [Kung et al. 1975] Idea: –Partition the data into two sets –Apply algorithm recursively to both sets –Merge the results from both sets Magic is in „merge“ Best algorithm for „worst case“ O(n * (log n) (d-2) ) Poor in best case: O(n * (log n) (d-2) ) vs. O(n) Poor performance if DB does not fit in memory

Variants of D&C Algos M-way Partitioning –Partition into M sets (M > 2) –Extended merge algorithm –Optimized „Merge Tree“ –Dramatic reduction in CPU cost Early Skyline – Eliminate points „on-the-fly“ –Saves I/O and CPU costs –(in rare cases: additional CPU costs)

Performance Experiments Three data sets 1. Independent: Coordinaten of points are generated randomly using a uniform distribution 2. Correlated: Points which are good in one dimension are likely to be good in other dimensions, too. 3. Anti-correlated: Points which are good in one dimension are likely to be bad in other dimensions. DB with and 1 mio. points Vary size of memory, vary # of dimensions

Data Sets Independent (Skyline is large) Correlated (Skyline is small) Anti-correlated (Skyline is very large)

“Correlated” BNL is winner for “small” Skylines (best case) 8 dimensions, Skyline: 121 points

“Anti-correlated” Extended D&C is winner for large Skylines (worst case) 8 dimensions, Skyline: points

Summary (so far) Extend RDBMS with special Skyline algos BNL algorithm if Skyline is expected to be small Extended D&C in other cases However, algorithm are not interactive –Possibly, first results returned after 100 secs –User has no control

Online Algorithms: Requirements Immediately return the first results –Give response time guarantees for the first x points Progressive processing –More results, the longer user waits –Algorithm terminates with whole Skyline Fairness; User interaction –User gives hints while the results arrive Correct – never return wrong results (Flackern) Universal; easy to integrate into DB products

Online Skyline Algorithm [Kossmann, Ramsak, Rost 2002] Divide & Conquer Algorithmus –Search for Nearest Neighbor (e.g. with R* tree) –Partition Data space into Bounding Boxes –Search recursively for Nearest Neighbors in Bounding Boxes

Online Skyline Algorithm [Kossmann, Ramsak, Rost 2002] Divide & Conquer Algorithmus –Search for Nearest Neighbor (e.g. with R* tree) –Partition Data space into Bounding Boxes –Search recursively for Nearest Neighbors in Bounding Boxes Correctness - 2 Observations –Every Nearest Neighbor is in the Skyline –Every Nearest Neighbor in a Bounding Box is in the Skyline

NN Algorithm distance price x x xx x x x x x x x x x x x x x

NN Algorithm distance price x x xx x x x x x x x x x x x x x

NN Algorithm distance price x x xx x x x x x x x x x x x x x

NN Algorithm distance price x x xx x x x x x x x x x x x x x

NN Algorithm distance price x x xx x x x x x x x x x x x x x x x x x x

NN Algorithm distance price x x xx x x x x x x x x x x x x x x x x x x

Implementation NN Search with R* tree, UB tree,... –Bounding Boxes can be considered in index lookup –Additional predicates can also be considered –NN is efficient, off-the-shelf DB operation For d > 2, bounding boxes overlap –Duplicate elimination –Merging of Bounding Boxes –Propagation of NNs Algorithm can be used for location-based service –Parameterized search in R* tree

Experimentelle Bewertung M-way D&C NN (prop) NN (hybrid)

User Control distance price x x xx x x x x x x x x x x x x x

User Control distance price x x xx x x x x x x x x x x x x x User clicks into this are! „distance“ is more important than „price“

User Control distance price x x xx x x x x x x x x x x x x x

User Control distance price x x xx x x x x x x x x x x x x x

Related Work Maximum Vector Problem [Kung et al. 1975] –Only good if DB fits into main memory –Good in worst case, poor in good cases Progressive Skyline Computation [Tan et al. 2001] –Extended BNL Algorithm Konvexe Hülle [Yuval 1975] –Unknown how well this works for large DBs Mehrzieloptimierung (z.B.[Papadimitriou 2001]) –Approximate Pareto-Kurve Extended NN Algorithm [Papadias et al. 2003]

Summary Skyline has many applications (Decision Support, Visualisation, …) New „Batch“ and „Online“ Algorithms Special algorithms for Skyline + Join + Top N Future Work: Skyline-Ticker, high Update rates –Continuously display good restaurants in car –Continuously give good car offers