
Maximal Vector Computation in Large Data Sets The 31st International Conference on Very Large Data Bases VLDB 2005 / VLDB Journal 2006, August Parke Godfrey,


1 Maximal Vector Computation in Large Data Sets
The 31st International Conference on Very Large Data Bases, VLDB 2005 / VLDB Journal 2006, August
Parke Godfrey, Jarek Gryz (York University)
Ryan Shipley (The College of William and Mary)
Speaker: ZHANG Shiming (Simon)
Supervisor: Prof. David Cheung, Dr. Nikos Mamoulis

2 Department of Computer Sciences, The University of Hong Kong. The 31st International Conference on Very Large Data Bases (VLDB05/VLDBJ06 August). 2015-10-8
Outline
- Introduction
- Skyline vs. Maximal Vector Problem
- Goals & Accomplishments
- Design & Analysis Considerations
- Generic Algorithms & Analyses
- LESS Algorithm & Performance
- Conclusions
This presentation is based on this paper but is not limited to it.

3 What is skyline?
- Skyline Query
  - Given a set of d-dimensional data points, a skyline query finds the set of data points that are not dominated by any other point.
  - Adversarial skyline query: finds the set of data points that do not dominate any other point (not covered in any paper)
- Dominate Relationship
  - A data point p dominates another data point q if and only if p is better than or as good as q (under the given preference) on all dimensions, and p is strictly better than q on at least one dimension.
- Monotone Preference Function
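The dominance relationship and the skyline definition on this slide can be sketched in a few lines of Python. This is only an illustrative sketch: the names `dominates` and `naive_skyline` are made up for this example, and it assumes that larger values are better on every dimension (a MIN preference would flip the comparisons).

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def dominates(p: Point, q: Point) -> bool:
    """Return True if p dominates q (assumption: larger is better).

    p must be at least as good as q on every dimension and strictly
    better on at least one dimension.
    """
    at_least_as_good = all(a >= b for a, b in zip(p, q))
    strictly_better = any(a > b for a, b in zip(p, q))
    return at_least_as_good and strictly_better


def naive_skyline(points: Sequence[Point]) -> List[Point]:
    """Quadratic reference version: keep every point no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


if __name__ == "__main__":
    data = [(1, 5), (2, 4), (3, 3), (2, 2), (1, 1)]
    print(naive_skyline(data))
```

The quadratic version compares every pair of points, so it is only useful as a reference against which faster algorithms can be checked.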

4 What is skyline?
- SQL Extensions
  - Find the maximals over tuples in the database context w.r.t. the skyline criteria

    SELECT ... FROM ... WHERE ... GROUP BY ... HAVING ...
    SKYLINE OF [DISTINCT] d1 [MIN|MAX|DIFF], ..., dm [MIN|MAX|DIFF]
    ORDER BY ...

5 What is skyline?
- Skyline Examples
  - Interesting hotel: not too crowded, cheap hotel

Hotel Information (price, # of rooms)

Price  # of rooms  Name
70     20          Hotel 1
40                 Hotel 2
100    40          Hotel 3
70     50          Hotel 4
100    60          Hotel 5
10     70          Hotel 6
40     80          Hotel 7

[Figure: Skyline of hotels; axes: price, # of rooms]

6 What is skyline?
- Skyline Examples
  - Consider a Hotel table with columns name, address, dist (distance to the beach), stars (quality ranking), and price.

7 Maximal Vector Problem
- A classical, interesting problem studied since the 1960s
- To identify the maximals over a collection of vectors
- Tuples ≈ vectors (or points) in k-dimensional space
- Related to nearest neighbors and convex hull

8 Challenges of skyline query processing (not in this paper)
- Search efficiency
- Update efficiency
- Scalability to skyline query variants and various types of data
- High dimensionality and large data sets

9 Related Work (not in this paper)
- General Skyline Algorithms
  - BNL and D&C, Börzsönyi et al., ICDE’01
  - Bitmap and Index, Tan et al., VLDB’01
  - NN, Kossmann et al., VLDB’02
  - SFS, Chomicki et al., ICDE’03
  - BBS, Papadias et al., SIGMOD’03
  - LESS, Godfrey et al., VLDB’05
  - Static attributes vs. dynamic spatial attributes in SSQ; SSQ is a dynamic skyline query, M. Sharifzadeh et al., VLDB’06
  - Z Order Skyline, Ken et al., VLDB’07
  - BBRS-Reverse Skyline, Evangelos et al., VLDB’07
  - ……
- Nearest Neighbor Search
  - K-NN ……
- Computational Geometry
  - Voronoi Diagram
  - Delaunay Graph
  - Convex Hull
  - High-dimensional computational geometry
- Maximal Vector Problem
  - FLET (Fast Linear Expected-Time), J.L. Bentley et al., SODA 1990
- Index on Skyline
  - Bitmap, B-tree, R-tree, aR-tree
  - ….
- Spatial Skyline Query (SSQ): find the data points p_i that are not spatially dominated by any other point p_j with respect to the given query points {q}.

10 Variations of Skyline Queries (not in this paper)
- Constrained skyline (spatial skyline)
- Ranked Skyline
- Group-by Skyline
- Dynamic Skyline or Multi-source Skyline
- Enumerating Skyline/Top-K/K-Dominating Skyline
- K-Skyband Skyline
- Approximate Skyline
- Reverse Skyline
- Subspace Skyline
- SkyCube in subspace
- Probabilistic Skylines on Uncertain Data
- Privacy Skyline
- ……

11 Goals & Accomplishments

12 Design & Analysis Considerations
- Relational Performance Criteria
  - External
  - I/O conscious (too much data for main memory)
  - well behaved
  - compatible with a query optimizer
  - CPU computational load (asymptotic runtime analyses)
  - generic (focus on a generic maximal-vector algorithm)
  - no indexes, no pre-computed information
  - good properties
  - progressive, pipelineable, universal, etc.
  - at worst, linear run-time (O(n))

13 Design Choices
- divide-and-conquer (D&C) or scan-based
  - Can D&C be I/O conscious?
  - Can scan-based be efficient?
- to sort or not to sort
  - Is sorting useful?
  - Is sorting too inefficient? (Not linear...)
- comparison policy
  - Which vectors to compare next?
  - How to reduce the number of comparisons?
……

14 A Model for Average-Case Analysis
- Component Independence (CI)
- Uniform Independence (UI)

15 Expected Number of Maximals

16 Algorithms & Analyses
- Generic Algorithms

17 Algorithms & Analyses
- Generic Algorithms’ Performance

18 Algorithms & Analyses
- Divide-and-Conquer algorithms
  - No evidence that an efficient external version can be made
  - Although their asymptotic complexity in n is good, the curse of dimensionality is a problem for k
- Scan-based algorithms
  - Find global maximals early and eliminate non-maximals more quickly.

19 DD&C:D&C|+Sort

20 LD&C:D&C|-Sort

21 Block Nested Loops (BNL) Algorithm
- O(kn) average case under CI
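As a rough illustration of the scan-based idea behind BNL, the sketch below keeps a bounded window of incomparable points and defers points that do not fit to a later pass. It is a simplified in-memory version, not the external implementation analysed in the paper: the names `bnl_skyline` and `window_size` are hypothetical, larger values are assumed to be better, and window entries that cannot yet be confirmed are simply re-queued instead of being tracked with timestamps.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def dominates(p: Point, q: Point) -> bool:
    """p dominates q under the assumption that larger values are better."""
    return all(a >= b for a, b in zip(p, q)) and any(a > b for a, b in zip(p, q))


def bnl_skyline(points: Sequence[Point], window_size: int = 2) -> List[Point]:
    """Simplified multi-pass block-nested-loops skyline (in-memory sketch)."""
    if window_size < 1:
        raise ValueError("window_size must be at least 1")
    result: List[Point] = []
    stream = list(points)
    while stream:
        # Each window entry remembers whether it was inserted before the
        # first overflow of this pass; only those entries have been compared
        # against every point of the pass and can be output safely.
        window: List[Tuple[Point, bool]] = []
        overflow: List[Point] = []
        for p in stream:
            if any(dominates(w, p) for w, _ in window):
                continue  # p is dominated, discard it
            # Evict window entries that p dominates.
            window = [(w, early) for w, early in window if not dominates(p, w)]
            if len(window) < window_size:
                window.append((p, not overflow))
            else:
                overflow.append(p)  # window full, defer p to the next pass
        result.extend(w for w, early in window if early)
        # Unconfirmed window entries are re-queued behind the overflow.
        stream = overflow + [w for w, early in window if not early]
    return result


if __name__ == "__main__":
    data = [(1, 5), (2, 4), (3, 3), (2, 2), (1, 1), (4, 1), (0, 6)]
    print(bnl_skyline(data, window_size=2))
```

A larger `window_size` reduces the number of passes; with a window big enough to hold the whole skyline, a single pass suffices.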

22 Sort Filter Skyline (SFS) Algorithm
- Have a window (W) and stream (S), as with BNL.
- Sort S first (via an external sort routine): e.g.,
- Then, call improved BNL
- Any w in the window is guaranteed to be maximal (skyline).
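The steps on this slide can be sketched as follows. This is a minimal in-memory sketch under stated assumptions, not the external-sort version: `sfs_skyline` is a hypothetical name, larger values are assumed to be better, the window is unbounded, and the sort key (the plain sum of the attributes) is just one example of a monotone scoring function chosen for illustration. Any key works as long as a dominating point always sorts before the points it dominates.

```python
from typing import Callable, List, Sequence, Tuple

Point = Tuple[float, ...]


def dominates(p: Point, q: Point) -> bool:
    """p dominates q under the assumption that larger values are better."""
    return all(a >= b for a, b in zip(p, q)) and any(a > b for a, b in zip(p, q))


def sfs_skyline(points: Sequence[Point],
                score: Callable[[Point], float] = sum) -> List[Point]:
    """Sort by a monotone score, then filter in a single pass."""
    # After a descending sort on a monotone score, no point can be
    # dominated by a point that comes later in the stream.
    stream = sorted(points, key=score, reverse=True)
    window: List[Point] = []
    for p in stream:
        # p only needs to be checked against the window; once admitted it
        # is never evicted, so every window entry is a skyline point.
        if not any(dominates(w, p) for w in window):
            window.append(p)
    return window


if __name__ == "__main__":
    data = [(1, 5), (2, 4), (3, 3), (2, 2), (1, 1), (4, 1), (0, 6)]
    print(sfs_skyline(data))
```

Because window entries are never evicted, each one can be emitted as soon as it is admitted, which is what makes the sorted approach progressive.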

23 BNL vs SFS

24 BNL & SFS

25 The LESS Algorithm
- Combine the best aspects of the algorithms, mainly BNL & SFS.
- EF Win (Elimination-Filter): keeps records with the best entropy scores
- SF Win (Skyline-Filter): keeps the current skyline for further filtering
[Figure labels: block-sort pass, last merge pass]
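To make the interplay of the two windows concrete, here is a heavily simplified in-memory sketch. It is not the paper's external algorithm: the sort is an ordinary in-memory sort rather than block-sort and merge passes, `less_skyline` and `ef_size` are hypothetical names, larger values are assumed to be better, attribute values are assumed to be non-negative, and the entropy score is assumed to be the sum of ln(1 + x) over the attributes.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def dominates(p: Point, q: Point) -> bool:
    """p dominates q under the assumption that larger values are better."""
    return all(a >= b for a, b in zip(p, q)) and any(a > b for a, b in zip(p, q))


def entropy(p: Point) -> float:
    """Assumed monotone score; requires non-negative attribute values."""
    return sum(math.log(1.0 + x) for x in p)


def less_skyline(points: Sequence[Point], ef_size: int = 2) -> List[Point]:
    """Elimination filter, then sort, then skyline filter (in-memory sketch)."""
    ef: List[Point] = []         # elimination-filter window (copies)
    survivors: List[Point] = []  # records passed on to the sort
    for p in points:
        if any(dominates(e, p) for e in ef):
            continue  # eliminated early, never reaches the sort
        survivors.append(p)
        ef = [e for e in ef if not dominates(p, e)]
        if len(ef) < ef_size:
            ef.append(p)
        elif ef:
            # Keep only the records with the best entropy scores.
            worst = min(ef, key=entropy)
            if entropy(p) > entropy(worst):
                ef[ef.index(worst)] = p
    # Stand-in for the external sort: best entropy score first.
    survivors.sort(key=entropy, reverse=True)
    skyline: List[Point] = []    # skyline-filter window
    for p in survivors:
        if not any(dominates(s, p) for s in skyline):
            skyline.append(p)
    return skyline


if __name__ == "__main__":
    data = [(1, 5), (2, 4), (3, 3), (2, 2), (1, 1), (4, 1), (0, 6)]
    print(less_skyline(data, ef_size=2))
```

The elimination filter only discards records that are certainly dominated, so it never changes the result; it just shrinks the input that has to be sorted.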

26 LESS: Linear Average-Case
- Issues & Improvement

27 LESS: Performance
- n = 500,000
- EF window: 200 vectors
- SF window: 76 pages, 3,000 vectors
- Pentium III, 733 MHz
- RedHat Linux 7.3

28 Conclusions
- Future Work for Optimization of LESS


