Published by Bérengère Desmarais. Modified over 5 years ago.
1
On the interaction between multidimensional skylines and functional dependencies
Sofian Maabout, University of Bordeaux, CNRS. Joint work with Nicolas Hanusse, Patrick Kamnang Wanko, and Carlos Ordonez.
2
Skyline query
Id, Dist from INSA, price: a 100 50 b 90 200 c 280 d 40 e 240 55 f 245 285 h 95 300
O is in the skyline iff there is no other O’ better than O.
Skyline = {a, b, c, d}: the hotels not dominated by any other hotel.
Intuitively, skyline points represent the best tradeoff.
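A minimal sketch of the dominance test behind this definition, assuming smaller is better on every dimension. The hotel names and values are hypothetical (they are not the slide's table), and `dominates` and `skyline` are illustrative names.

```python
def dominates(p, q):
    """p dominates q: no worse on every dimension, strictly better on at least one."""
    return all(a <= b for a, b in zip(p, q)) and any(a < b for a, b in zip(p, q))

def skyline(points):
    """Return the names of the points that no other point dominates."""
    return {
        name
        for name, p in points.items()
        if not any(dominates(q, p) for q in points.values())
    }

# Hypothetical hotels: (distance, price), both to be minimized.
hotels = {"h1": (100, 50), "h2": (90, 200), "h3": (240, 55), "h4": (95, 300)}
best = skyline(hotels)
```

Here h3 is dominated by h1 and h4 by h2, while h1 and h2 are incomparable, so both stay in the skyline.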
3
Multidimensional skylines
Users are allowed to ask queries using any combination of dimensions.
CEO: best hotels = those offering a swimming pool and air conditioning.
Student: best hotels = the cheapest ones with free wifi.
Skycube = set of all possible skylines.
How to optimize all these multidimensional skylines?
Precompute ALL of them: full skycube.
Precompute a SUBSET of them: partial skycube.
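To make the size of a skycube concrete, here is a naive sketch that computes one skyline per non-empty subset of dimensions, i.e. 2^d − 1 skylines for d dimensions. The rows, dimension names and function names are hypothetical, and all dimensions are assumed to be minimized.

```python
from itertools import combinations

def dominates(t, u, dims):
    """t dominates u on dims: no worse everywhere, strictly better somewhere."""
    return all(t[d] <= u[d] for d in dims) and any(t[d] < u[d] for d in dims)

def skyline(rows, dims):
    return {i for i, t in rows.items()
            if not any(dominates(u, t, dims) for u in rows.values())}

def skycube(rows, all_dims):
    """Map every non-empty subspace (a tuple of dimensions) to its skyline."""
    cube = {}
    for k in range(1, len(all_dims) + 1):
        for dims in combinations(all_dims, k):
            cube[dims] = skyline(rows, dims)
    return cube

# Hypothetical data.
rows = {
    "t1": {"price": 50, "dist": 100, "noise": 3},
    "t2": {"price": 200, "dist": 90, "noise": 1},
    "t3": {"price": 55, "dist": 240, "noise": 2},
}
cube = skycube(rows, ("price", "dist", "noise"))
```

The exponential number of subspaces is what motivates materializing only part of the cube.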
4
This talk
How functional dependencies can help full and partial materialization of skycubes
5
Skyline Queries and Data Quality
Discarding low-quality records is one dimension of data cleaning.
Compare tuples w.r.t. their respective quality parameters.
Best tuples = those with the best tradeoff w.r.t. the quality parameters.
6
Skyline Queries and Data Quality
Id  Name     Phone  Zip    City      Salary
t1  Dupont   0123   69000  Bordeaux  [1500,2000]
t2  Paul            33000            [1500,1600]
t3  Dupond   4567          Lyon      [1500,2000]
t4  William  6789                    [2000,2000]
Zip → City, Phone → Name
7
Skyline Queries and Data Quality
Id  Name     Phone  Zip    City      Salary
t1  Dupont   0123   69000  Bordeaux  [1500,2000]
t2  Paul            33000            [1500,1600]
t3  Dupond   4567          Lyon      [1500,2000]
t4  William  6789                    [2000,2000]
t5  Mary     1357   75000  Paris     [2500,2800]
t1, t3 and t4 are involved in the Zip → City violation.
t1 and t2 are involved in the Phone → Name violation.
t1’s salary is less precise than t2’s.
8
Skyline Queries and Data Quality
Id  Name     Phone  Zip    City      Salary
t1  Dupont   0123   69000  Bordeaux  [1500,2000]
t2  Paul            33000            [1500,1600]
t3  Dupond   4567          Lyon      [1500,2000]
t4  William  6789                    [2000,2000]
t5  Mary     1357   75000  Paris     [2500,2800]

Id  #FD  Salary uncertainty
t1  2    500
t2  1    100
t3  1    500
t4  1    0
t5  0    300
Sky(#FDs, SU) = {t4, t5}
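The quality scores can be recomputed from the slides as a sketch. It assumes that #FD counts the dependencies a tuple is reported to violate, that salary uncertainty is the width of the salary interval, and that both are minimized. The dictionary layout and function names are illustrative.

```python
# Salary intervals as given in the table.
salary = {"t1": (1500, 2000), "t2": (1500, 1600), "t3": (1500, 2000),
          "t4": (2000, 2000), "t5": (2500, 2800)}
# Tuples reported as involved in each violation.
violations = {"Zip->City": {"t1", "t3", "t4"}, "Phone->Name": {"t1", "t2"}}

def quality(tid):
    """(number of violated FDs, salary interval width); lower is better."""
    n_fds = sum(tid in members for members in violations.values())
    lo, hi = salary[tid]
    return (n_fds, hi - lo)

def dominates(p, q):
    return all(a <= b for a, b in zip(p, q)) and any(a < b for a, b in zip(p, q))

scores = {tid: quality(tid) for tid in salary}
sky = {t for t, p in scores.items()
       if not any(dominates(q, p) for q in scores.values())}
```

Under these assumptions t4 (one violation, exact salary) dominates t1, t2 and t3, and t5 (no violation) is incomparable with t4, which agrees with Sky(#FDs, SU) = {t4, t5}.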
9
Skylines are not monotone
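A small hypothetical counterexample illustrates the claim: with the three tuples below (values to be minimized), neither Sky(A) ⊆ Sky(AB) nor Sky(AB) ⊆ Sky(A) holds. The data and function names are assumptions made for this sketch.

```python
def dominates(t, u, dims):
    return all(t[d] <= u[d] for d in dims) and any(t[d] < u[d] for d in dims)

def skyline(rows, dims):
    return {i for i, t in rows.items()
            if not any(dominates(u, t, dims) for u in rows.values())}

# Hypothetical tuples.
rows = {
    "t1": {"A": 1, "B": 2},
    "t2": {"A": 1, "B": 1},
    "t3": {"A": 2, "B": 0},
}
sky_a = skyline(rows, "A")    # t1 and t2 tie on the best A value
sky_ab = skyline(rows, "AB")  # t2 beats t1 on B; t3 enters thanks to B
```

Adding dimension B removes t1 from the skyline and adds t3, so the result neither grows nor shrinks monotonically.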
10
Functional dependencies & multidimensional skylines
A B BC A B A
Theorem: If X → Y then Sky(X) ⊆ Sky(XY)
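The theorem can be checked on a toy table. The sketch below uses hypothetical data in which A → B holds, and verifies the inclusion Sky(A) ⊆ Sky(AB); `fd_holds` is an illustrative helper, not something taken from the talk.

```python
def dominates(t, u, dims):
    return all(t[d] <= u[d] for d in dims) and any(t[d] < u[d] for d in dims)

def skyline(rows, dims):
    return {i for i, t in rows.items()
            if not any(dominates(u, t, dims) for u in rows.values())}

def fd_holds(rows, X, Y):
    """X -> Y holds iff tuples that agree on X also agree on Y."""
    seen = {}
    for t in rows.values():
        key = tuple(t[a] for a in X)
        val = tuple(t[a] for a in Y)
        if seen.setdefault(key, val) != val:
            return False
    return True

# Hypothetical table satisfying A -> B.
rows = {
    "t1": {"A": 1, "B": 5},
    "t2": {"A": 1, "B": 5},
    "t3": {"A": 2, "B": 1},
}
sky_a, sky_ab = skyline(rows, "A"), skyline(rows, "AB")
```

The intuition: a tuple in Sky(A) can only be beaten on AB by a tuple with the same A value, and the FD forces such tuples to share the same B value as well.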
11
Closed subspaces
X is closed iff X ↛ A for every A not in X
The minimal FDs satisfied by T are:
C is closed
AB is not closed
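A direct, data-driven reading of the definition can be sketched as follows. The table is hypothetical (it is not the table T of the slide), and `determines` and `is_closed` are illustrative names.

```python
ATTRS = "ABC"
# Hypothetical table in which A -> B is the only non-trivial minimal FD.
ROWS = [
    {"A": 1, "B": 1, "C": 1},
    {"A": 1, "B": 1, "C": 2},
    {"A": 2, "B": 1, "C": 1},
    {"A": 3, "B": 2, "C": 2},
]

def determines(rows, X, a):
    """True iff X -> a holds in rows."""
    seen = {}
    for r in rows:
        key = tuple(r[x] for x in sorted(X))
        if seen.setdefault(key, r[a]) != r[a]:
            return False
    return True

def is_closed(rows, X):
    """X is closed iff it determines no attribute outside X."""
    return all(not determines(rows, X, a) for a in ATTRS if a not in X)
```

For instance, `is_closed(ROWS, "A")` is false because A determines B, whereas `is_closed(ROWS, "AB")` is true.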
12
Example
Red: closed subspace
13
Skycube computation
For partial materialization, just stop here.
14
Skycube computation
Need for an efficient procedure
15
Mining Closed Subspaces
Intuitive idea: for every attribute A, find the maximal X s.t. X ↛ A.
Every such X is potentially closed.
The intersections of these sets are the closed subspaces.
We adapt N. Hanusse, S. Maabout: A parallel algorithm for computing borders. CIKM’11
16
Mining Closed Subspaces
Maximal subspaces not determining B
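The idea can be illustrated with a brute-force sketch: for each attribute, enumerate the maximal subspaces that do not determine it, then close that family under intersection. This exhaustive enumeration is only for illustration on a tiny hypothetical table; it is not the parallel border algorithm the talk adapts, and all names are assumptions.

```python
from itertools import combinations

ATTRS = "ABC"
ROWS = [  # hypothetical table
    {"A": 1, "B": 1, "C": 1},
    {"A": 1, "B": 1, "C": 2},
    {"A": 2, "B": 1, "C": 1},
    {"A": 3, "B": 2, "C": 2},
]

def determines(rows, X, a):
    seen = {}
    for r in rows:
        key = tuple(r[x] for x in sorted(X))
        if seen.setdefault(key, r[a]) != r[a]:
            return False
    return True

def subsets(attrs):
    attrs = sorted(attrs)
    for k in range(len(attrs) + 1):
        for c in combinations(attrs, k):
            yield frozenset(c)

def maximal_non_determining(rows, a):
    """Maximal X (not containing a) such that X does not determine a."""
    cands = [X for X in subsets(set(ATTRS) - {a}) if not determines(rows, X, a)]
    return {X for X in cands if not any(X < Y for Y in cands)}

def closed_from_maximal(rows):
    """Full attribute set plus all intersections of the maximal sets."""
    family = {frozenset(ATTRS)}
    for a in ATTRS:
        family |= maximal_non_determining(rows, a)
    changed = True
    while changed:
        changed = False
        for X, Y in combinations(list(family), 2):
            if X & Y not in family:
                family.add(X & Y)
                changed = True
    return family

closed = closed_from_maximal(ROWS)
```

On this table the maximal sets are BC (for A), C (for B) and AB (for C), and their intersections yield the same family as checking the definition of closedness on every subspace.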
17
Subspace Closure
Let X be a subspace. Let Closed = {Y | Y is closed}.
Then X+ = the smallest Y ∈ Closed s.t. X ⊆ Y.
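Given the family of closed subspaces, the closure is a simple lookup. The sketch below assumes a hypothetical family over attributes A, B, C (the one induced by the single FD A → B); `closure` is an illustrative name.

```python
# Hypothetical family of closed subspaces (closed under intersection).
CLOSED = [frozenset(s) for s in ("", "B", "C", "AB", "BC", "ABC")]

def closure(X, closed=CLOSED):
    """Smallest closed subspace containing X."""
    X = frozenset(X)
    return min((Y for Y in closed if X <= Y), key=len)
```

Because the family is closed under intersection, the smallest closed superset is unique, so picking the superset of minimum size is enough. Here `closure("A")` is AB and `closure("AC")` is ABC.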
18
Closed Subspaces
ABCD
BCD ABC ABD ACD
AB AC AD BC BD CD
A B C D
19
Experiments
Our approach versus other proposals for fully computing the skycube:
QGS & QGL: Lee et al. VLDBJ’14
BUS & TDS: Pei et al. TODS’06
Orion: Raïssi et al. VLDB’10
Our approach versus closed skycubes, a lossless compression technique: Raïssi et al. VLDB’10
Assess query evaluation time
20
Experiments: (1) compute all skylines Synthetic data sets
Independent Correlated Anti-correlated
21
Experiments: (1) Full Skycube Synthetic data sets
Speedup = execution time of algorithm X / execution time of our algorithm FMC
22
Experiments: (1) Full Skycube Real Data
23
Experiments: (2) query optimization 1000 random skyline queries
0.31% of the 2^20 queries are materialized. Answering 1K skyline queries takes 49 ms from the materialized skylines instead of 99.92 seconds from the underlying data. Speedup > 2000.
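The reported figures can be sanity-checked with a few lines of arithmetic. The variable names are illustrative; the numbers are the ones quoted on the slide.

```python
total_queries = 2 ** 20                  # subspaces over 20 dimensions
materialized = 0.0031 * total_queries    # 0.31% of them, roughly 3,250 skylines
speedup = 99.92 / 0.049                  # seconds from raw data / seconds from materialized skylines
```

The ratio comes out slightly above 2000, consistent with the stated speedup.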
24
Experiments: (3) comparison with closed skycubes
Identify equivalent skylines and store just one copy ⇒ compression of the whole set of skylines.
E.g., Sky(C), Sky(D) and Sky(CD) are equivalent.
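The compression idea can be sketched by grouping subspaces whose skylines are identical and keeping one copy per group. This is a naive illustration on hypothetical data, not the closed skycube algorithm of Raïssi et al.; all names are assumptions.

```python
from itertools import combinations

def dominates(t, u, dims):
    return all(t[d] <= u[d] for d in dims) and any(t[d] < u[d] for d in dims)

def skyline(rows, dims):
    return frozenset(i for i, t in rows.items()
                     if not any(dominates(u, t, dims) for u in rows.values()))

def group_equivalent(rows, all_dims):
    """Map each distinct skyline to the list of subspaces that produce it."""
    groups = {}
    for k in range(1, len(all_dims) + 1):
        for dims in combinations(all_dims, k):
            groups.setdefault(skyline(rows, dims), []).append(dims)
    return groups

# Hypothetical data in which Sky(C), Sky(D) and Sky(CD) coincide.
rows = {
    "t1": {"C": 1, "D": 1, "E": 3},
    "t2": {"C": 2, "D": 2, "E": 1},
    "t3": {"C": 3, "D": 2, "E": 2},
}
groups = group_equivalent(rows, "CDE")
```

Seven subspaces collapse to three stored skylines in this toy case.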
25
Experiments: (3) comparison with closed skycubes
n 20K, d=17; n 75K, d=10; n 100K, d=18
Number of materialized skylines (time to find and materialize them)
Synthetic correlated data, n=100K, d=20: MICS = 20 sec; Closed did not finish after 36 hours.
More details in N. Hanusse, S. Maabout, P. Kamnang Wanko, C. Ordonez: Skycube Materialization Using the Topmost Skyline or Functional Dependencies. TODS’16
26
Incomparability Dependencies
Definition: X ↬ Y iff t[X] = t’[X] ⇒ t[Y] and t’[Y] are incomparable
Theorem: Sky(X) satisfies X ↬ Y ⇔ Sky(X) ⊆ Sky(XY)
Property: X → Y ⇒ X ↬ Y
27
Incomparability Dependencies
FDs do not detect Sky(B) ⊆ Sky(AB), while Sky(B) satisfies B ↬ A.
IncoDs detect that Sky(B) ⊄ Sky(BC) because Sky(B) does not satisfy B ↬ C.
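The two situations can be reproduced on a hypothetical table. The sketch assumes that "incomparable" means that neither tuple dominates the other on Y (equal values count as incomparable) and that all attributes are minimized; `satisfies_incod` is an illustrative name.

```python
def dominates(t, u, dims):
    return all(t[d] <= u[d] for d in dims) and any(t[d] < u[d] for d in dims)

def skyline(rows, dims):
    return {i for i, t in rows.items()
            if not any(dominates(u, t, dims) for u in rows.values())}

def satisfies_incod(rows, X, Y):
    """X ~> Y: tuples equal on X never dominate one another on Y."""
    return not any(
        all(t[a] == u[a] for a in X) and dominates(t, u, Y)
        for t in rows.values() for u in rows.values()
    )

def fd_holds(rows, X, Y):
    return all(
        all(t[a] == u[a] for a in Y)
        for t in rows.values() for u in rows.values()
        if all(t[a] == u[a] for a in X)
    )

# Hypothetical table: B -> A fails (t3, t4), but only outside Sky(B).
rows = {
    "t1": {"A": 1, "B": 1, "C": 1},
    "t2": {"A": 1, "B": 1, "C": 2},
    "t3": {"A": 2, "B": 2, "C": 1},
    "t4": {"A": 3, "B": 2, "C": 1},
}
sky_b = {i: rows[i] for i in skyline(rows, "B")}
```

On this data the FD B → A fails on the whole table, yet Sky(B) satisfies B ↬ A and the inclusion in Sky(AB) holds; Sky(B) violates B ↬ C and the inclusion in Sky(BC) fails.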
28
Prioritized Skyline
Expression = Sky(AB & CD)
First compute Sky(AB).
If t[AB] = t’[AB] and t ∈ Sky(AB), then t and t’ are compared wrt C and D.
Kießling. Foundations of preferences in database systems. VLDB’02
Chomicki et al. Preference elicitation in prioritized skyline queries. VLDBJ’12
Ciaccia et al. Output-sensitive evaluation of prioritized skyline queries. SIGMOD’15
29
Prioritized Skyline
Id A B C D t1 1 3 t2 2 t3 4 t4 5
Sky(AB) = {t1, t2, t3, t4}
t1[AB] = t2[AB] and t1 dominates t2 wrt CD
Sky(AB & CD) = {t1, t3, t4}
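A naive evaluation of a prioritized skyline can be sketched level by level: compute the skyline on the first group of attributes, then let tuples that tie on all earlier groups compete on the next one. The attribute values below are hypothetical (the slide's table did not survive extraction); they are chosen so that the outcome matches the sets stated on the slide. All names are illustrative.

```python
def dominates(t, u, dims):
    return all(t[d] <= u[d] for d in dims) and any(t[d] < u[d] for d in dims)

def prioritized_skyline(rows, levels):
    """Evaluate Sky(X1 & X2 & ...) with levels = [X1, X2, ...]; values are minimized."""
    survivors = dict(rows)
    prefix = []  # attributes of the levels already processed
    for dims in levels:
        def beaten(t):
            # t loses to a survivor that ties on all earlier levels and dominates on dims.
            return any(
                all(u[a] == t[a] for a in prefix) and dominates(u, t, dims)
                for u in survivors.values()
            )
        survivors = {i: t for i, t in survivors.items() if not beaten(t)}
        prefix = prefix + list(dims)
    return set(survivors)

# Hypothetical values.
rows = {
    "t1": {"A": 1, "B": 3, "C": 1, "D": 1},
    "t2": {"A": 1, "B": 3, "C": 2, "D": 2},
    "t3": {"A": 2, "B": 2, "C": 4, "D": 1},
    "t4": {"A": 3, "B": 1, "C": 5, "D": 5},
    "t5": {"A": 3, "B": 3, "C": 0, "D": 0},
}
```

In this example t5 has the best C and D values but is eliminated at the first level, which shows that AB has priority over CD.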
30
Prioritized Skyline
Let the expression be X1 & … & Xi & … & Xm.
If X1…Xi-1 → X and X ⊆ Xi, then it can be rewritten as X1 & … & Xi\X & … & Xm.
Example: AB → C ⇒ Sky(AB & CD) = Sky(AB & D)
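The rewriting rule can be applied mechanically with the standard attribute-closure computation. In this sketch, levels are strings of single-letter attributes and FDs are (lhs, rhs) pairs; dropping a level that becomes empty is an assumption of the sketch, and `simplify` is an illustrative name.

```python
def attribute_closure(attrs, fds):
    """All attributes determined by attrs under the given FDs."""
    closure = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= closure and not set(rhs) <= closure:
                closure |= set(rhs)
                changed = True
    return closure

def simplify(levels, fds):
    """Remove from each level the attributes determined by the earlier levels."""
    result, prefix = [], set()
    for level in levels:
        determined = attribute_closure(prefix, fds)
        kept = [a for a in level if a not in determined]
        if kept:  # assumption: a level with no attribute left is dropped
            result.append("".join(kept))
        prefix |= set(level)
    return result

rewritten = simplify(["AB", "CD"], [("AB", "C")])
```

The rule is sound because tuples that tie on AB also tie on C when AB → C holds, so C can never break a tie at the second level.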
31
Conclusion
Functional dependencies are helpful for both full and partial skycube materialization.
Incomparability dependencies characterize skyline inclusions.
Semantic optimization of prioritized skylines with FDs.
32
Some open questions
Is it possible to come up with a Chase-like procedure for the semantic optimization of prioritized skylines?
What about order dependencies?
Incremental maintenance
Approximate skylines and approximate FDs:
t[A] is preferred to s[A] iff s[A] – t[A] exceeds a given threshold
X → Y iff ∀ t, s: t[X] ~ s[X] ⇒ t[Y] ~ s[Y]
33
Thanks Questions