Published by Herbert Cooper. Modified over 9 years ago.
1
Maximal Vector Computation in Large Data Sets
The 31st International Conference on Very Large Data Bases, VLDB 2005 / VLDB Journal 2006, August
Parke Godfrey, Jarek Gryz (York University)
Ryan Shipley (The College of William and Mary)
Speaker: ZHANG Shiming (Simon)
Supervisors: Prof. David Cheung, Dr. Nikos Mamoulis
2
Department of Computer Sciences, The University of Hong Kong
The 31st International Conference on Very Large Data Bases (VLDB05/VLDBJ06 August)
2015-10-8
Outline
- Introduction
- Skyline vs. Maximal Vector Problem
- Goals & Accomplishments
- Design & Analysis Considerations
- Generic Algorithms & Analyses
- LESS Algorithm & Performance
- Conclusions
This presentation is based on this paper but is not limited to it.
3
What is skyline?
Skyline Query
- Given a set of d-dimensional data points, a skyline query finds the set of data points that are not dominated by any other point.
- Adversarial skyline query: finds the set of data points that do not dominate any other point (not covered in any paper).
Dominance Relationship
- A data point p dominates another data point q if and only if p is better than or as good as q (under the given preference) on all dimensions and strictly better than q on at least one dimension.
Monotone Preference Function
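The dominance relationship defined above can be written as a small predicate. This is a minimal sketch, assuming that larger values are preferred on every dimension; the function name `dominates` is an illustrative choice, not something taken from the slides.

```python
from typing import Sequence


def dominates(p: Sequence[float], q: Sequence[float]) -> bool:
    """Return True if p dominates q (assumption: larger is better everywhere)."""
    if len(p) != len(q):
        raise ValueError("points must have the same number of dimensions")
    # p must be at least as good as q on all dimensions ...
    at_least_as_good = all(a >= b for a, b in zip(p, q))
    # ... and strictly better on at least one of them.
    strictly_better = any(a > b for a, b in zip(p, q))
    return at_least_as_good and strictly_better


print(dominates((3, 5), (3, 4)))  # True: equal on dim 1, better on dim 2
print(dominates((3, 5), (4, 1)))  # False: worse on dim 1
print(dominates((3, 5), (3, 5)))  # False: identical points do not dominate
```

Note that two identical points do not dominate each other, so both would remain in a skyline.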
4
What is skyline?
SQL Extensions
- Find the maximals over tuples in the database context w.r.t. skyline criteria:
  SELECT ... FROM ... WHERE ... GROUP BY ... HAVING ...
  SKYLINE OF [DISTINCT] d1 [MIN | MAX | DIFF], ..., dm [MIN | MAX | DIFF]
  ORDER BY ...
5
What is skyline?
Skyline Examples
Hotel information (price, # of rooms):

Price   # of rooms   Name
70      20           Hotel 1
40                   Hotel 2
100     40           Hotel 3
70      50           Hotel 4
100     60           Hotel 5
10      70           Hotel 6
40      80           Hotel 7

Skyline of hotels: the interesting hotels are the cheap, not too crowded ones.
Figure: interesting hotels, plotted by # of rooms and price.
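The hotel example can be checked with a naive pairwise comparison. This is a sketch under two assumptions: lower price and fewer rooms are both preferred (reading "not too crowded, cheap"), and Hotel 2 is left out because its row is incomplete in the transcript. The helper names `dominates_min` and `naive_skyline` are illustrative.

```python
# (price, # of rooms) per hotel; Hotel 2 is omitted because its row is incomplete.
hotels = {
    "Hotel 1": (70, 20),
    "Hotel 3": (100, 40),
    "Hotel 4": (70, 50),
    "Hotel 5": (100, 60),
    "Hotel 6": (10, 70),
    "Hotel 7": (40, 80),
}


def dominates_min(p, q):
    """p dominates q when smaller values are preferred on every dimension."""
    return all(a <= b for a, b in zip(p, q)) and any(a < b for a, b in zip(p, q))


def naive_skyline(rows):
    """Compare every hotel against every other one: O(n^2) dominance tests."""
    return sorted(
        name
        for name, point in rows.items()
        if not any(dominates_min(other, point)
                   for other_name, other in rows.items() if other_name != name)
    )


skyline_hotels = naive_skyline(hotels)
print(skyline_hotels)  # ['Hotel 1', 'Hotel 6']
```

With these six rows, Hotel 1 dominates Hotels 3, 4 and 5, and Hotel 6 dominates Hotel 7. The result could change once Hotel 2's values are known.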
6
What is skyline?
Skyline Examples
- Consider a Hotel table with columns name, address, dist (distance to the beach), stars (quality ranking), and price.
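For a table like this, the MIN and MAX directives of the SKYLINE OF clause decide which direction counts as "better" on each column. The sketch below mimics a clause such as `SKYLINE OF dist MIN, stars MAX, price MIN` in plain Python. The rows are hypothetical (the slide names the columns but gives no data), the DIFF directive is not handled, and the names `CRITERIA`, `dominates` and `skyline` are illustrative.

```python
# Hypothetical rows: the slide lists the columns only, not any data.
HOTELS = [
    {"name": "A", "dist": 0.5, "stars": 3, "price": 120},
    {"name": "B", "dist": 0.5, "stars": 4, "price": 120},
    {"name": "C", "dist": 2.0, "stars": 5, "price": 90},
    {"name": "D", "dist": 3.0, "stars": 2, "price": 150},
]

# Mirrors: SKYLINE OF dist MIN, stars MAX, price MIN (DIFF is not modelled here).
CRITERIA = {"dist": "MIN", "stars": "MAX", "price": "MIN"}


def dominates(p, q, criteria):
    """True if row p is at least as good as q everywhere and better somewhere."""
    better_somewhere = False
    for column, direction in criteria.items():
        a, b = p[column], q[column]
        if direction == "MIN":
            a, b = -a, -b  # flip the sign so that larger is always better
        if a < b:
            return False
        if a > b:
            better_somewhere = True
    return better_somewhere


def skyline(rows, criteria):
    """Keep the rows that no other row dominates."""
    return [p for p in rows if not any(dominates(q, p, criteria) for q in rows)]


result = [row["name"] for row in skyline(HOTELS, CRITERIA)]
print(result)  # ['B', 'C']
```

Hotel B beats A on stars at equal distance and price, and it beats D on all three columns. B and C are incomparable, so both stay in the skyline.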
7
Maximal Vector Problem
- A classical, interesting problem since the 1960s
- Goal: identify the maximals over a collection of vectors
- Tuples ≈ vectors (or points) in k-dimensional space
- Related to nearest neighbors and convex hull
8
Challenges of skyline query processing (not in this paper)
- Search efficiency
- Update efficiency
- Scalability to skyline query variants and various types of data
- High dimensionality and large data sets
9
Related Work (not in this paper)
General Skyline Algorithms
- BNL and D&C, Börzsönyi et al., ICDE’01
- Bitmap and Index, Tan et al., VLDB’01
- NN, Kossmann et al., VLDB’02
- SFS, Chomicki et al., ICDE’03
- BBS, Papadias et al., SIGMOD’03
- LESS, Godfrey et al., VLDB’05
- Static attributes vs. dynamic spatial attributes in SSQ
- SSQ is a dynamic skyline query, M. Sharifzadeh et al., VLDB’06
- Z Order Skyline, Lee et al., VLDB’07
- BBRS (Reverse Skyline), Dellis et al., VLDB’07
- ……
Nearest Neighbor Search
- K-NN
- ……
Computational Geometry
- Voronoi Diagram
- Delaunay Graph
- Convex Hull
- High-dimensional computational geometry
Maximal Vector Problem
- FLET (Fast Linear Expected-Time), J. L. Bentley et al., SODA 1990
Index on Skyline
- Bitmap, B-tree, R-tree, aR-tree ….
Spatial Skyline Query (SSQ): find the data points p_i that are not spatially dominated by any other point p_j with respect to the given query points {q}.
10
Variations of Skyline Queries (not in this paper)
- Constrained Skyline (Spatial Skyline)
- Ranked Skyline
- Group-by Skyline
- Dynamic Skyline or Multi-source Skyline
- Enumerating Skyline / Top-K / K-Dominating Skyline
- K-Skyband Skyline
- Approximate Skyline
- Reverse Skyline
- Subspace Skyline
- SkyCube in subspace
- Probabilistic Skylines on Uncertain Data
- Privacy Skyline
- ……
11
Goals & Accomplishments
12
Design & Analysis Considerations
Relational Performance Criteria
- External: I/O conscious (too much data for main memory)
- Well behaved: compatible with a query optimizer
- CPU: computational load (asymptotic runtime analyses)
- Generic (focus on generic maximal-vector algorithms): no indexes, no pre-computed information
- Good properties: progressive, pipelineable, universality, etc.
- At worst, linear run-time, O(n)
13
Design Choices
- Divide-and-conquer (D&C) or scan-based? Can D&C be I/O conscious? Can scan-based be efficient?
- To sort or not to sort? Is sorting useful? Is sorting too inefficient? (Not linear...)
- Comparison policy: Which vectors to compare next? How to reduce the number of comparisons?
- ……
14
A Model for Average-Case Analysis
- Component Independence (CI)
- Uniform Independence (UI)
15
Expected Number of Maximals
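The expected number of maximals can be explored empirically. The sketch below assumes the standard result that, for n points whose k dimensions are independent and free of duplicate values, the expected number of maximals is the generalized harmonic number H_(k-1,n), defined by H_(0,n) = 1 and H_(j,n) = sum over i = 1..n of H_(j-1,i) / i. It compares that value with a small simulation on uniformly random points. The function names, the seed, and the parameters n, k and trials are illustrative choices.

```python
import random


def dominates(p, q):
    """p dominates q, assuming larger values are preferred on every dimension."""
    return all(a >= b for a, b in zip(p, q)) and any(a > b for a, b in zip(p, q))


def count_maximals(points):
    """Naive O(n^2) count of the points that no other point dominates."""
    return sum(1 for p in points if not any(dominates(q, p) for q in points))


def expected_maximals(n, k):
    """Generalized harmonic number H_(k-1, n), built up one order at a time."""
    h = [1.0] * (n + 1)  # h[i] holds H_(0, i) = 1 for i >= 1 (index 0 unused)
    for _ in range(k - 1):
        nxt = [0.0] * (n + 1)
        for i in range(1, n + 1):
            nxt[i] = nxt[i - 1] + h[i] / i  # running sum of H_(j-1, i) / i
        h = nxt
    return h[n]


rng = random.Random(0)  # fixed seed so that repeated runs agree
n, k, trials = 100, 3, 20
average = sum(
    count_maximals([tuple(rng.random() for _ in range(k)) for _ in range(n)])
    for _ in range(trials)
) / trials
print(f"simulated average: {average:.2f}  H_(k-1,n): {expected_maximals(n, k):.2f}")
```

For k = 2 the formula reduces to the ordinary harmonic number 1 + 1/2 + ... + 1/n, so the skyline grows only logarithmically with n when the dimensions are independent.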
16
Algorithms & Analyses
Generic Algorithms
17
Algorithms & Analyses
Generic Algorithms’ Performance
18
Algorithms & Analyses
Divide-and-Conquer algorithms
- No evidence that an efficient external version can be made
- Although their asymptotic complexity in n is good, they suffer from the curse of dimensionality in k
Scan-based algorithms
- Find global maximals early and eliminate non-maximals more quickly.
19
DD&C:D&C|+Sort
20
LD&C:D&C|-Sort
21
Block Nested Loops (BNL) Algorithm
- O(kn) average case under CI
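The core loop of BNL can be sketched in a few lines. This is a simplified in-memory version, assuming larger values are preferred and that the window never overflows. The temp-file passes that the external algorithm uses when the window is full are left out, and the function names are illustrative.

```python
def dominates(p, q):
    """p dominates q, assuming larger values are preferred on every dimension."""
    return all(a >= b for a, b in zip(p, q)) and any(a > b for a, b in zip(p, q))


def bnl(stream):
    """Single-pass block-nested-loops filter with an unbounded window."""
    window = []
    for p in stream:
        # Discard p as soon as some window point dominates it.
        if any(dominates(w, p) for w in window):
            continue
        # Otherwise p evicts every window point that it dominates ...
        window = [w for w in window if not dominates(p, w)]
        # ... and joins the window as a candidate maximal.
        window.append(p)
    return window


points = [(1, 9), (5, 5), (2, 2), (9, 1), (4, 4), (5, 6)]
print(bnl(points))  # [(1, 9), (9, 1), (5, 6)]
```

In this run (5, 5) enters the window early and is evicted only when (5, 6) arrives. A window point is therefore only a candidate until the end of the scan.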
22
Sort Filter Skyline (SFS) Algorithm
- Have a window (W) and a stream (S), as with BNL.
- Sort S first (via an external sort routine).
- Then call the improved BNL.
- Any w in the window is guaranteed to be maximal (skyline).
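The effect of sorting first can be sketched as follows. The sketch assumes larger values are preferred, non-negative attribute values, and an entropy-style score (the sum of ln(v + 1) over the dimensions) as the sort key. Any strictly monotone scoring function would serve, and an in-memory `sorted` call stands in for the external sort routine. The function names are illustrative.

```python
import math


def dominates(p, q):
    """p dominates q, assuming larger values are preferred on every dimension."""
    return all(a >= b for a, b in zip(p, q)) and any(a > b for a, b in zip(p, q))


def entropy_score(p):
    """Strictly monotone score (assumes values >= 0): a dominator scores higher."""
    return sum(math.log(v + 1.0) for v in p)


def sfs(points):
    """Sort by descending score, then filter against a window of maximals."""
    ordered = sorted(points, key=entropy_score, reverse=True)
    window = []
    for p in ordered:
        # Every possible dominator of p sorts before p, so an undominated p
        # is a confirmed maximal and never has to be evicted later.
        if not any(dominates(w, p) for w in window):
            window.append(p)
    return window


points = [(1, 9), (5, 5), (2, 2), (9, 1), (4, 4), (5, 6)]
print(sorted(sfs(points)))  # [(1, 9), (5, 6), (9, 1)]
```

Unlike plain BNL, the loop never removes anything from the window, which is why every window entry is guaranteed to be maximal.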
23
BNL vs SFS
24
BNL & SFS
25
The LESS Algorithm
- Combines the best aspects of the algorithms, mainly BNL & SFS.
- EF window (Elimination-Filter): keeps the records with the best entropy scores; applied in the block-sort pass.
- SF window (Skyline-Filter): keeps the current skyline for further filtering; applied in the last merge pass.
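The two windows can be sketched together in memory. This is a rough illustration rather than the paper's implementation: it assumes larger values are preferred and non-negative attributes, uses sorted Python lists as stand-ins for the runs of an external sort, and applies a simple EF replacement policy (keep the `ef_size` best-scoring survivors) that is an assumption of this sketch. The names and the parameters `ef_size` and `block_size` are illustrative.

```python
import heapq
import math


def dominates(p, q):
    """p dominates q, assuming larger values are preferred on every dimension."""
    return all(a >= b for a, b in zip(p, q)) and any(a > b for a, b in zip(p, q))


def entropy_score(p):
    """Strictly monotone score; assumes non-negative attribute values."""
    return sum(math.log(v + 1.0) for v in p)


def less(points, ef_size=2, block_size=4):
    ef = []      # elimination-filter window: a few high-entropy records
    runs = []    # sorted runs produced by the block-sort pass
    block = []
    for p in points:
        # Block-sort pass: drop records dominated by an EF record early.
        if any(dominates(e, p) for e in ef):
            continue
        block.append(p)
        # Assumed EF policy: evict dominated entries, keep the best scores.
        ef = [e for e in ef if not dominates(p, e)] + [p]
        ef.sort(key=entropy_score, reverse=True)
        del ef[ef_size:]
        if len(block) == block_size:
            runs.append(sorted(block, key=entropy_score, reverse=True))
            block = []
    if block:
        runs.append(sorted(block, key=entropy_score, reverse=True))

    # Last merge pass: merge the runs and apply the skyline filter on the fly.
    sf = []      # skyline-filter window: the current skyline
    for p in heapq.merge(*runs, key=entropy_score, reverse=True):
        if not any(dominates(s, p) for s in sf):
            sf.append(p)
    return sf


points = [(1, 9), (5, 5), (2, 2), (9, 1), (4, 4), (5, 6)]
print(sorted(less(points)))  # [(1, 9), (5, 6), (9, 1)]
```

In this run the EF window removes (2, 2) and (4, 4) before they are ever sorted, and the skyline filter removes (5, 5) during the merge. Records dropped by the EF window are always dominated by a record that does reach the sort, so no maximal is lost.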
26
LESS: Linear Average-Case
Issues & Improvement
27
LESS: Performance
- n = 500,000
- EF window: 200 vectors
- SF window: 76 pages, 3,000 vectors
- Pentium III, 733 MHz
- RedHat Linux 7.3
28
Conclusions
Future Work for Optimization of LESS