HKU CSIS DB Seminar
Skyline Queries
9 April 2003
Speaker: Eric Lo
Skyline
  A new operator in database systems.
  It filters out a set of interesting points from a potentially large set of data points.
  A data point is interesting if it is not dominated by any other point.
Example
  Find some good places for us to hold the next DB Seminar.
  Dataset (table Homes):
    Home      Distance from HKU   Area (m²)
    Kevin     1 km                10
    Ben       9 km                100
    Felix     5 km                2
    K.K Loo   8 km                250
  Good: close to HKU (minimize distance).
  Good: large area (maximize area).
  Return those homes that are not dominated: no other home is at least as good in ALL DIMENSIONS and strictly better in at least one.
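A minimal sketch of the dominance test on the Homes table, assuming the usual definition (at least as good in every dimension, strictly better in at least one). The names `Home`, `dominates`, and `skyline` are illustrative and not from the slides.

```python
from typing import NamedTuple


class Home(NamedTuple):
    name: str
    dist_km: float  # smaller is better
    area_m2: float  # larger is better


HOMES = [
    Home("Kevin", 1, 10),
    Home("Ben", 9, 100),
    Home("Felix", 5, 2),
    Home("K.K Loo", 8, 250),
]


def dominates(a: Home, b: Home) -> bool:
    """a dominates b: no worse in both dimensions, strictly better in one."""
    no_worse = a.dist_km <= b.dist_km and a.area_m2 >= b.area_m2
    better = a.dist_km < b.dist_km or a.area_m2 > b.area_m2
    return no_worse and better


def skyline(homes):
    """Quadratic reference implementation: keep homes nobody dominates."""
    return [h for h in homes if not any(dominates(o, h) for o in homes)]


print([h.name for h in skyline(HOMES)])  # ['Kevin', 'K.K Loo']
```

Felix is dominated by Kevin (closer and larger) and Ben by K.K Loo, so only Kevin and K.K Loo remain.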
Outline
  Introduction to Skyline Queries
  Skyline Operator in SQL
  Implementation Algorithms
  Progressive Algorithms
  Variations of Skyline Queries
  Experimental Results
  Conclusion
The Skyline Operator (ICDE 2001)
  S. Börzsönyi, D. Kossmann, K. Stocker
  1. Define the skyline operator in databases
  2. Extension of SQL for skyline
  3. Block-nested-loops algorithm
  4. Divide-and-conquer algorithm
Problem Definition
  Related to:
    the maximum vector problem
    the contour problem
    the convex hull of a data set
  Assume the whole dataset fits in memory.
SQL Extensions
  New clause:
    SKYLINE OF [DISTINCT] d1 [MIN | MAX], …, dm [MIN | MAX]
  Position in a query:
    SELECT … FROM … WHERE …
    GROUP BY … HAVING …
    SKYLINE OF [DISTINCT] d1 [MIN | MAX], …, dm [MIN | MAX]
    ORDER BY …
  d1, …, dm denote the dimensions that participate in the skyline.
  Example:
    SELECT * FROM HOMES WHERE CITY = 'HK'
    SKYLINE OF DIST MIN, AREA MAX;
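The SKYLINE OF clause is a proposed extension, so the example query cannot be run on an ordinary engine as written. One way to get the same answer in plain SQL is a NOT EXISTS subquery, sketched here with Python's built-in sqlite3 module. The `city` column values are an assumption, since the Homes table in the example does not list a city.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE homes (name TEXT, city TEXT, dist REAL, area REAL)")
conn.executemany(
    "INSERT INTO homes VALUES (?, ?, ?, ?)",
    [
        ("Kevin", "HK", 1, 10),      # city values are assumed for illustration
        ("Ben", "HK", 9, 100),
        ("Felix", "HK", 5, 2),
        ("K.K Loo", "HK", 8, 250),
    ],
)

# Emulates: SELECT * FROM HOMES WHERE CITY='HK' SKYLINE OF DIST MIN, AREA MAX
QUERY = """
SELECT h.name
FROM homes AS h
WHERE h.city = 'HK'
  AND NOT EXISTS (
        SELECT 1
        FROM homes AS o
        WHERE o.city = 'HK'
          AND o.dist <= h.dist AND o.area >= h.area    -- o is no worse
          AND (o.dist < h.dist OR o.area > h.area)     -- and strictly better
      )
ORDER BY h.dist
"""

rows = [r[0] for r in conn.execute(QUERY)]
print(rows)  # ['Kevin', 'K.K Loo']
```

The subquery amounts to a nested loop over the table, which is part of the motivation for dedicated skyline algorithms.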
Naïve Approach for Skyline
  A 1D skyline is equivalent to computing MIN or MAX in SQL.
  Naïve 2D skyline:
    Sort the data on the 2 dimensions.
    Compare every tuple with its predecessor.
  Sorting may need 2 or more passes if the data do not fit into memory; use existing external sorting techniques.
Naïve 2D
    Home    Distance from HKU   Area
    Kevin   1 km                10
    Felix   5 km                2
    KK      8 km                250
    Ben     9 km                100
  1. Sort by "Distance".
  2. Compare "Felix" with "Kevin" → eliminate "Felix".
  3. Compare "KK" with "Kevin" → incomparable → "KK" is part of the skyline.
  4. Compare "Ben" with "KK" → eliminate "Ben".
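The sort-and-compare steps can be sketched as follows. This is a simplified reading in which each tuple is compared with the most recent surviving skyline tuple (as in the trace, where KK is compared with Kevin rather than with the eliminated Felix). The function name `skyline_2d` is illustrative.

```python
def skyline_2d(homes):
    """homes: (name, distance, area) tuples; minimize distance, maximize area."""
    # Sort by distance ascending; break ties by larger area first.
    ordered = sorted(homes, key=lambda h: (h[1], -h[2]))
    result = []
    for home in ordered:
        # Every earlier tuple is at least as close, so a tuple survives
        # only if its area beats the last surviving tuple's area.
        if not result or home[2] > result[-1][2]:
            result.append(home)
    return result


HOMES = [("Kevin", 1, 10), ("Ben", 9, 100), ("Felix", 5, 2), ("KK", 8, 250)]
print([h[0] for h in skyline_2d(HOMES)])  # ['Kevin', 'KK']
```

Exact duplicates are reported once here, which corresponds to the DISTINCT behaviour rather than to the plain clause.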
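Naïve 2D does not work for more than 2 dimensions
  If the skyline involves more than 2 dimensions, sorting and comparing with the predecessor does not work.
    Home    Distance from HKU   Area   Rent
    Kevin   1 km                10     $9
    Felix   5 km                2      $5
    KK      8 km                250    $10
    Ben     9 km                100    $9
  2D (distance, area):
    Compare Felix with Kevin → eliminated
    Compare KK with Kevin → part of skyline
    Compare Ben with KK → eliminated
  3D (distance, area, rent):
    Compare Felix with Kevin → part of skyline
    Compare KK with Felix → part of skyline
    Compare Ben with KK → part of skyline
  No! Ben was never compared with Kevin. In 3D a tuple can be dominated by any earlier tuple, not only by its predecessor, so the predecessor comparison does not work.
  (With the values in this table, Kevin beats Ben on distance and ties on rent but has the smaller area, so Kevin dominates Ben only if area is left out of the comparison.)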
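A small sketch of why the predecessor check breaks in three dimensions. The three points are hypothetical (they are not the homes in the table) and all dimensions are minimized; `predecessor_filter` and `true_skyline` are illustrative names.

```python
def dominates(p, q):
    """p dominates q when p is <= q everywhere and < q somewhere."""
    return all(a <= b for a, b in zip(p, q)) and any(a < b for a, b in zip(p, q))


def predecessor_filter(points):
    """Naive method: sort, then compare each point only with the last kept one."""
    kept = []
    for p in sorted(points):
        if not kept or not dominates(kept[-1], p):
            kept.append(p)
    return kept


def true_skyline(points):
    """Reference answer: compare every point with every other point."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


POINTS = [(1, 1, 1), (2, 0, 5), (3, 1, 1)]  # hypothetical data
print(predecessor_filter(POINTS))  # [(1, 1, 1), (2, 0, 5), (3, 1, 1)]
print(true_skyline(POINTS))        # [(1, 1, 1), (2, 0, 5)]
```

The point (3, 1, 1) is dominated by (1, 1, 1), but its predecessor (2, 0, 5) is incomparable with it, so the naive filter wrongly keeps it.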
Block-nested-loops Algorithm
  A straightforward approach: compare each point p with every other point. If p is not dominated → it is part of the skyline.
  BNL scans the data file and keeps a list of candidate skyline points in main memory.
BNL (cont.)
  1. Insert the 1st data point into the list.
  2. For each subsequent point p:
     1. If p is dominated by any point in the list, it is discarded.
     2. If p dominates any point in the list, insert p into the list and remove all points dominated by p.
     3. If p is neither dominated by, nor dominates, any point in the list, insert it into the list as part of the skyline.
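The three cases above can be sketched for the in-memory situation, assuming the candidate list never fills up. All dimensions are minimized in this sketch, so the area of each home is negated; the function name `bnl` is illustrative.

```python
def dominates(p, q):
    """p dominates q when p is <= q everywhere and < q somewhere."""
    return all(a <= b for a, b in zip(p, q)) and any(a < b for a, b in zip(p, q))


def bnl(points):
    """Block-nested-loops with an unbounded candidate list."""
    window = []
    for p in points:
        # Case 1: p is dominated by a candidate, so discard it.
        if any(dominates(w, p) for w in window):
            continue
        # Case 2: drop every candidate that p dominates.
        window = [w for w in window if not dominates(p, w)]
        # Cases 2 and 3: p itself becomes a candidate.
        window.append(p)
    return window


# (distance, -area): negating area turns "maximize" into "minimize".
HOMES = {"Kevin": (1, -10), "Ben": (9, -100), "Felix": (5, -2), "K.K Loo": (8, -250)}
result = bnl(list(HOMES.values()))
print(result)  # [(1, -10), (8, -250)]
```

Ben is accepted as a candidate at first and is removed later when K.K Loo arrives, which illustrates case 2.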
BNL (cont.)
  The candidate list is self-organizing:
    Points that have dominated other points are moved to the top of the list.
    This reduces the number of comparisons.
  E.g. the self-organizing list holding the partial skyline may look like:
    Home    Distance from HKU   Area   Rent
    Kevin   1 km                249    $1
    K.K     8 km                250    $ …
    (other skyline points, which are weaker than Kevin in all but a few dimensions)
More on BNL
  Point 3 in BNL: if p is neither dominated by, nor dominates, any point in the list, it is inserted into the list as part of the skyline.
  If there is no more space in the list, write p to a temporary file on disk.
  Tuples in the temporary file are processed further in the next iteration of the algorithm.
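More on BNL (cont.)
  Example: input A B C D E F G H I J; the candidate list has room for 4 points.
    A: inserted as the first point.
    B, C, D: dominated by A → discarded.
    E: incomparable with A → inserted.
    F: incomparable with A, E → inserted.
    G: dominates F → replaces F.
    H: incomparable with A, E, G → inserted.
    I: incomparable with A, E, G, H, but the list is full → written to the temporary file.
    J: incomparable with A, E, G, H, but the list is full → written to the temporary file. J has not been compared with I.
  After the 1st iteration, A, E, G, H are output as skyline points. Then clear the list, treat I, J, … as a new data set, and perform BNL again.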
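A sketch of the multi-pass behaviour with a bounded candidate list. The slides give no coordinates for A to J, so the 2D points below are hypothetical, chosen only so that the trace matches the example (both dimensions minimized). Instead of the timestamp bookkeeping of the original algorithm, this simplified variant outputs a candidate at the end of a pass only if it entered the list before the first overflow, and re-queues the others. The name `bnl_bounded` is illustrative.

```python
def dominates(p, q):
    """p dominates q when p is <= q everywhere and < q somewhere."""
    return all(a <= b for a, b in zip(p, q)) and any(a < b for a, b in zip(p, q))


def bnl_bounded(points, window_size):
    """Multi-pass BNL; returns (skyline, number_of_passes)."""
    skyline, passes = [], 0
    pending = list(points)
    while pending:
        passes += 1
        window = []    # entries: (point, entered_after_first_overflow)
        overflow = []  # stands in for the temporary file on disk
        for p in pending:
            if any(dominates(w, p) for w, _ in window):
                continue  # p is dominated: discard
            window = [(w, late) for w, late in window if not dominates(p, w)]
            if len(window) < window_size:
                window.append((p, bool(overflow)))
            else:
                overflow.append(p)  # list is full: defer p to the next pass
        # Early entries have been compared with every deferred point.
        skyline.extend(w for w, late in window if not late)
        pending = overflow + [w for w, late in window if late]
    return skyline, passes


# Hypothetical coordinates reproducing the A..J trace.
PTS = {"A": (5, 5), "B": (6, 6), "C": (7, 5), "D": (5, 8), "E": (4, 6),
       "F": (7, 3), "G": (6, 2), "H": (3, 7), "I": (2, 8), "J": (8, 1)}
NAMES = {v: k for k, v in PTS.items()}
sky, passes = bnl_bounded(list(PTS.values()), window_size=4)
print([NAMES[p] for p in sky], passes)  # ['A', 'E', 'G', 'H', 'I', 'J'] 2
```

A, E, G, H come out of the first pass; I and J are only compared with each other in the second pass.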
Short summary on BNL
  Easy to implement.
  Works for any number of dimensions, without using an index or sorting.
  Relies on main memory → may need many iterations.
  Not adequate for on-line processing: it has to read the entire data file before it returns the first skyline point (not progressive).
Divide-and-Conquer Algorithm
  (Figure: data points plotted by price and distance.)
  1) Find the median of some dimension, say price: Price(med) = 0.3.
  2) Split the input into 2 partitions, P1 and P2, according to Price(med).
  3) Compute the skyline S1 of P1 (price < 0.3) and the skyline S2 of P2 (price >= 0.3) respectively by recursive partitioning. [Note: S1 is better than S2 on price.]
  4) Keep partitioning recursively until a partition contains very few (or 1) tuples.
  5) With only a few tuples, finding the skyline is very easy.
  6) Merge the skylines of the partitions by eliminating those tuples of S2 that are dominated by a tuple of S1. [Note: no tuple in S1 can be dominated by a tuple in S2, because all tuples in S1 are better than those in S2 on price; i.e. tuples in UPPER are never eliminated.]
Divide-and-Conquer Algorithm (example)
  (Figure: points S1 to S7 plotted by price and distance and split at Price(med) = 0.3.)
  Skylines of the two partitions: S1, S2 and S4, S5, S7.
  After merging: S1, S2, S7; i.e. tuples in UPPER are never eliminated.
Efficient Progressive Skyline Computation (VLDB 2001)
  K.L. Tan, P.K. Eng, B.C. Ooi
  Previous approaches require at least one pass over the dataset before they return the first interesting point.
  The paper proposes:
    1. Bitmap-based algorithm
    2. B+-tree-based algorithm
  Both can return an interesting point as soon as it is identified.
Progressive?
  Both the bitmap-based and the tree-based algorithms return the first skyline points very quickly.
  This may be useful if you are not willing to wait long for the first few interesting homes out of a large dataset.
  They also outperform BNL and D&C in overall response time.
Skyline by Bitmap
  Main idea: given a point p, suppose "something" can tell you that
    p is not dominated by any other point in the DB → p is a skyline point!
    p is dominated by some point in the DB → throw it away.
  Non-blocking! Skyline points can be returned immediately.
Bitmap
  All the information required to decide whether a point is in the skyline is encoded in bitmaps.
  A data point p = (p1, p2, …, pd), where d is the number of dimensions, is mapped to an m-bit vector, where m is the number of distinct values summed over all dimensions.
Bitmap
  (Figure: example points plotted by price and distance.)
  There are 7 distinct values on price and 4 on distance, so m = 7 + 4 = 11.
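Bitmap representation
  Distinct values on x: 10. Distinct values on y: 10. m = 20 → 20-bit vector.
  E.g. (4,8): 4 is the 4th smallest value on dimension x, so the bits from the 4th (counting from the right) up to the leftmost are set to 1; 8 is the 8th smallest on y, so the bits from the 8th (counting from the right) up to the leftmost are set to 1.
    Point     Bitmap representation
    (1,9)     (1111111111, 1100000000)
    (2,10)    (1111111110, 1000000000)
    (4,8)     (1111111000, 1110000000)
    (6,7)     (1111100000, 1111000000)
    (9,10)    (1100000000, 1000000000)
    (7,5)     (1111000000, 1111110000)
    (5,6)     (1111110000, 1111100000)
    (4,3)     (1111111000, 1111111100)
    (3,2)     (1111111100, 1111111110)
    (9,1)     (1100000000, 1111111111)
    (10,4)    (1000000000, 1111111000)
    (6,2)     (1111100000, 1111111110)
    (8,3)     (1110000000, 1111111100)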
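Bitmap representation
  Is (4,8) a skyline point? (min x, y)
  Points, in table order: (1,9) (2,10) (4,8) (6,7) (9,10) (7,5) (5,6) (4,3) (3,2) (9,1) (10,4) (6,2) (8,3)
  Create bit-strings Cx and Cy (not CY Ng!), one bit per point in table order:
    Cx takes the 4th bit from the right of every point's x-bitmap (1 where x <= 4).
    Cy takes the 8th bit from the right of every point's y-bitmap (1 where y <= 8).
    Cx      = 1110000110000
    Cy      = 0011011111111
    Cx & Cy = 0010000110000
  If Cx & Cy has more than one '1', the point is dominated by some point; here (4,8) is dominated by (4,3) and (3,2).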
Bitmap representation
  Is (3,2) a skyline point? (min x, y)
  Points, in table order: (1,9) (2,10) (4,8) (6,7) (9,10) (7,5) (5,6) (4,3) (3,2) (9,1) (10,4) (6,2) (8,3)
  Create bit-strings Cx (1 where x <= 3) and Cy (1 where y <= 2):
    Cx      = 1100000010000
    Cy      = 0000000011010
    Cx & Cy = 0000000010000
  If Cx & Cy has only one '1' (the point itself), the point is a skyline point.
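A sketch of the bit-string test on the 13 example points, following the simplified rule on these slides (a point is in the skyline when the AND of its bit-strings contains a single 1). It assumes there are no duplicate points. Python integers stand in for the bit slices, and the helper names are illustrative rather than taken from the paper.

```python
POINTS = [(1, 9), (2, 10), (4, 8), (6, 7), (9, 10), (7, 5), (5, 6),
          (4, 3), (3, 2), (9, 1), (10, 4), (6, 2), (8, 3)]


def build_slices(points):
    """slices[dim][v] has bit j set when points[j][dim] <= v (precomputed)."""
    slices = []
    for dim in range(len(points[0])):
        per_value = {}
        for value in {p[dim] for p in points}:
            mask = 0
            for j, q in enumerate(points):
                if q[dim] <= value:
                    mask |= 1 << j
            per_value[value] = mask
        slices.append(per_value)
    return slices


def candidate_mask(slices, p):
    """AND of the per-dimension bit-strings (Cx & Cy in two dimensions)."""
    mask = -1  # all ones
    for dim, value in enumerate(p):
        mask &= slices[dim][value]
    return mask


def as_bit_string(mask, n):
    """Leftmost character corresponds to the first point in the table."""
    return "".join("1" if (mask >> j) & 1 else "0" for j in range(n))


def is_skyline(slices, p):
    return bin(candidate_mask(slices, p)).count("1") == 1


SLICES = build_slices(POINTS)
print(as_bit_string(candidate_mask(SLICES, (4, 8)), len(POINTS)))  # 0010000110000
print([p for p in POINTS if is_skyline(SLICES, p)])  # [(1, 9), (3, 2), (9, 1)]
```

Each point is tested independently of the others, which is what allows skyline points to be reported as soon as they are checked.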
Short summary on Bitmap
  The bitmap representation of every point must be pre-computed.
  Checking each point requires retrieving all the bitmaps in order to build the juxtapositions (Cx and Cy).
  Storage is large if the domains of the attributes are large.
Some other progressive algorithms
  B+-tree index (also proposed by Tan, Eng, and Ooi):
    Organizes the points into d lists (d is the number of dimensions in the data).
    Builds a B+-tree on the lists for retrieving skyline points.
    Suffers from problems similar to those of the bitmap approach.
Some other progressive algorithms (cont.)
  NN (nearest-neighbor) algorithm (by Donald Kossmann again) [VLDB 02]
  (Figure: NN skyline.)
An Optimal and Progressive Algorithm for Skyline Queries (SIGMOD 2003)
  D. Papadias, Y. Tao, G. Fu, B. Seeger
  The paper proposes:
    1. An NN-based algorithm that is more efficient and I/O optimal
    2. Ranked skyline queries
    3. Constrained skyline queries
    4. Dynamic skyline queries
    5. K-dominating queries
Ranked Skyline
  A ranked skyline returns the K skyline points that have the minimum (or maximum) score according to a function f.
  In our example, f = 3*Dist + 7*Area; return the top K homes.
  Although a skyline already returns interesting points, we may want the most interesting ones according to our own preferences, especially when the data set is large (say, hotels) and the skyline is large as well.
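A brute-force sketch of a ranked skyline on the Homes table with the scoring function from the slide. Whether small or large scores are preferred is left as a parameter, since the slide allows both; `ranked_skyline` and `prefer_min` are illustrative names, and this is not the algorithm of the paper.

```python
HOMES = [("Kevin", 1, 10), ("Ben", 9, 100), ("Felix", 5, 2), ("K.K Loo", 8, 250)]


def dominates(a, b):
    """(name, dist, area): minimize distance, maximize area."""
    return a[1] <= b[1] and a[2] >= b[2] and (a[1] < b[1] or a[2] > b[2])


def score(home):
    """f = 3*Dist + 7*Area, as on the slide."""
    return 3 * home[1] + 7 * home[2]


def ranked_skyline(homes, f, k, prefer_min=True):
    """Compute the skyline first, then keep the K best points by score."""
    sky = [h for h in homes if not any(dominates(o, h) for o in homes)]
    return sorted(sky, key=f, reverse=not prefer_min)[:k]


print(ranked_skyline(HOMES, score, 1))                    # [('Kevin', 1, 10)]
print(ranked_skyline(HOMES, score, 1, prefer_min=False))  # [('K.K Loo', 8, 250)]
```

Kevin scores 3*1 + 7*10 = 73 and K.K Loo scores 3*8 + 7*250 = 1774, so the answer depends on the preferred direction.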
Constrained Skyline
  Returns the most interesting points within a specified region of the data space.
Dynamic Skyline
  Returns an updated skyline dynamically.
  E.g. ask for hotels with minimum distance and price (again?).
  The minimum distance now depends on my current location.
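A sketch of the idea that the distance dimension is computed from the query location at query time, so the skyline changes as the user moves. The hotels, coordinates, and prices are hypothetical, and Euclidean distance is an assumption.

```python
import math

# Hypothetical hotels: (name, x, y, price).
HOTELS = [("A", 0, 0, 100), ("B", 10, 0, 80), ("C", 5, 0, 120)]


def dynamic_skyline(hotels, here):
    """Skyline on (distance from `here`, price), both minimized."""
    scored = [(name, math.hypot(x - here[0], y - here[1]), price)
              for name, x, y, price in hotels]

    def dominates(a, b):
        return a[1] <= b[1] and a[2] <= b[2] and (a[1] < b[1] or a[2] < b[2])

    return [h[0] for h in scored if not any(dominates(o, h) for o in scored)]


print(dynamic_skyline(HOTELS, (0, 0)))  # ['A', 'B']
print(dynamic_skyline(HOTELS, (6, 0)))  # ['B', 'C']
```

From (0, 0), hotel C is farther and more expensive than A; from (6, 0), A is farther and more expensive than B.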
Enumerating Skyline
  Enumerating queries return, for each skyline point p, the number of points dominated by p.
  This is sometimes useful. Suppose skyline hotel C dominates 1000 hotels while another skyline hotel Y dominates only 1 hotel: C may be better than Y in many properties (e.g. price, distance), while Y is better than C in only 1 property, e.g. it has a PS2.
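A brute-force sketch of an enumerating query on the Homes table: for each skyline home, count the homes it dominates. The function name `enumerate_skyline` is illustrative.

```python
HOMES = [("Kevin", 1, 10), ("Ben", 9, 100), ("Felix", 5, 2), ("K.K Loo", 8, 250)]


def dominates(a, b):
    """(name, dist, area): minimize distance, maximize area."""
    return a[1] <= b[1] and a[2] >= b[2] and (a[1] < b[1] or a[2] > b[2])


def enumerate_skyline(homes):
    """Map each skyline home's name to the number of homes it dominates."""
    counts = {}
    for h in homes:
        if not any(dominates(o, h) for o in homes):
            counts[h[0]] = sum(dominates(h, o) for o in homes)
    return counts


print(enumerate_skyline(HOMES))  # {'Kevin': 1, 'K.K Loo': 1}
```

Kevin dominates only Felix and K.K Loo dominates only Ben, so both skyline homes have a count of 1 here.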
Experimental Evaluation
  Running time comparison of the progressive algorithms, without the NN approaches.
  (Figure legend: Index, Bitmap, D&C, BNL.)
Conclusion
  Introduction to skyline queries
  How to implement (support) the skyline operator in a DBMS
  Variations of skyline queries
  If the information is stored in different places, how can skyline queries be answered on a mobile device?
References
  1. S. Börzsönyi, D. Kossmann, K. Stocker. The Skyline Operator. ICDE 2001.
  2. K.L. Tan, P.K. Eng, B.C. Ooi. Efficient Progressive Skyline Computation. VLDB 2001.
  3. D. Kossmann, F. Ramsak, S. Rost. Shooting Stars in the Sky: An Online Algorithm for Skyline Queries. VLDB 2002.
  4. D. Papadias, Y. Tao, G. Fu, B. Seeger. An Optimal and Progressive Algorithm for Skyline Queries. SIGMOD 2003.