1
Stabbing the Sky: Efficient Skyline Computation over Sliding Windows COMP9314 Lecture Notes
2
COMP9314, Xuemin Lin @ dbg.unsw.edu.au
Outline: Introduction; n-of-N Queries; (n_1, n_2)-of-N Queries; Performance Evaluation; Conclusions
3
Skyline
Skyline query:
– Input: a set of points in d-dimensional space.
– Output: the points not dominated by any other point.
(x_1, x_2, …, x_d) dominates (y_1, y_2, …, y_d) iff x_i <= y_i for all 1 <= i <= d and ∃ k such that x_k < y_k.
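The dominance test above can be sketched in a few lines of Python. This is a naive quadratic skyline, not one of the algorithms cited in these slides, and it assumes smaller values are better in every dimension. The names `dominates` and `skyline` and the sample points are illustrative only.

```python
from typing import List, Sequence

Point = Sequence[float]


def dominates(x: Point, y: Point) -> bool:
    """x dominates y: x_i <= y_i in every dimension and x_k < y_k in at least one."""
    return all(a <= b for a, b in zip(x, y)) and any(a < b for a, b in zip(x, y))


def skyline(points: List[Point]) -> List[Point]:
    """Naive O(n^2) skyline: keep the points that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Hypothetical 2-d data: (3, 3) and (2, 5) are dominated, the rest form the skyline.
sample = [(1, 4), (2, 2), (3, 3), (4, 1), (2, 5)]
result = skyline(sample)
```

A point never dominates itself under this definition, so comparing each point against the full list is safe.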
4
Applications
Multi-criteria decision making…
Stock trading example: what are the top deals?
5
Skyline Query Over Sliding Window
Stock trading example:
– Top deals of a stock in the last 5 mins? The last 4 mins? …
– Top deals of a stock in the last 10K deals? …
Queries:
– n-of-N model (n <= N): the skyline of the most recent n elements
– (n_1, n_2)-of-N model
– One-time queries
– Continuous queries
6
Challenges
Insertions & deletions (possibly at high speed).
On-line information:
– memory requirement
– processing speed
Existing techniques do not support n-of-N queries [Borzsonyi et al. (ICDE01), Tan et al. (VLDB01), Kossmann et al. (VLDB 02), Papadias et al. (SIGMOD03), Kapoor (SIAM J. comp00)]:
– they support computation over the whole dataset
– O(n log^{d-2} n) for d >= 4 and O(n log n) otherwise
7
Results
n-of-N:
– keep N' (N' <= N) elements, where N' = O(log^d N) if the data distribution on each dimension is independent.
– a novel encoding scheme, with O(N') space, leads to n-of-N query time O(log N' + s) instead of O(n log^{d-2} n), where s is the number of points returned.
– a new trigger-based technique for continuously processing an n-of-N query:
  trigger update time O(log s); result update time O(log δ), where δ is the change in the result.
(n_1, n_2)-of-N: similar results.
8
n-of-N Queries
A point e in P_N (the most recent N elements) is redundant iff
– e expires w.r.t. P_N, or
– ∃ e' such that e' dominates e and e' is younger than e.
Figure example: N = 6.
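The redundancy rule can be illustrated with a brute-force sketch that keeps only the non-redundant elements of the window. It assumes timestamps are the 1-based arrival positions. The function `non_redundant` and the 2-d coordinates are hypothetical: they were chosen for illustration and are not taken from the slide's figure.

```python
from typing import Dict, List, Sequence

Point = Sequence[float]


def dominates(x: Point, y: Point) -> bool:
    """x dominates y: no worse in every dimension, strictly better in one."""
    return all(a <= b for a, b in zip(x, y)) and any(a < b for a, b in zip(x, y))


def non_redundant(stream: List[Point], N: int) -> List[int]:
    """Timestamps of window elements that no younger window element dominates."""
    M = len(stream)                      # timestamp of the newest element
    first = max(1, M - N + 1)            # oldest timestamp still inside the window
    window: Dict[int, Point] = {t: stream[t - 1] for t in range(first, M + 1)}
    return [
        t for t, p in window.items()
        if not any(t2 > t and dominates(q, p) for t2, q in window.items())
    ]


# Hypothetical stream, timestamps 1..7. Element 1 has expired for N = 6,
# and element 2 = (4, 6) is dominated by the younger element 3 = (1, 5).
stream = [(5, 5), (4, 6), (1, 5), (3, 2), (4, 4), (6, 3), (2, 6)]
kept = non_redundant(stream, N=6)
```

This scan is quadratic in the window size; it only shows which elements survive, not how to maintain them incrementally.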
9
Optimality
Theorem: non-redundant points (R_N) vs. n-of-N skyline query result (Q_{n,N})
– no point of (P_N - R_N) appears in any Q_{n,N}
– Q_{n,N} must be a subset of R_N
– ∀ x ∈ R_N, ∃ n such that x ∈ Q_{n,N}
– |R_N| = O(log^d N) for "independent" distributions
Only R_N needs to be kept: it is the minimum set of elements to keep.
10
Querying R_N
Critical dominance: e → e', where e is the youngest element of R_N that dominates e'.
Dominance graph G_{R_N}: the elements of R_N together with the critical dominance relationships.
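A minimal sketch of building the dominance graph as a parent map, assuming R_N is given as a dictionary from timestamp to point. Every dominator of a non-redundant element is older than it, so the critical dominator is simply the largest dominating timestamp. The name `critical_parents` and the coordinates are hypothetical.

```python
from typing import Dict, Optional, Sequence

Point = Sequence[float]


def dominates(x: Point, y: Point) -> bool:
    """x dominates y: no worse in every dimension, strictly better in one."""
    return all(a <= b for a, b in zip(x, y)) and any(a < b for a, b in zip(x, y))


def critical_parents(rn: Dict[int, Point]) -> Dict[int, Optional[int]]:
    """Map each element of R_N to its youngest older dominator (None for a root)."""
    parent: Dict[int, Optional[int]] = {}
    for t, p in rn.items():
        older = [t2 for t2, q in rn.items() if t2 < t and dominates(q, p)]
        parent[t] = max(older) if older else None
    return parent


# Hypothetical non-redundant set with timestamps 3..7.
rn = {3: (1, 5), 4: (3, 2), 5: (4, 4), 6: (6, 3), 7: (2, 6)}
graph = critical_parents(rn)
```

Each element has at most one parent, so the graph is a forest whose roots are the elements with no dominator in R_N.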
11
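Querying R_N
e ∈ Q_{n,N} iff e lies among the most recent n elements (M - n + 1 <= t(e), where M is the timestamp of the newest element) and either
– e is a root in G_{R_N}, or
– e' → e in G_{R_N} and e' has expired: t(e') < M - n + 1 <= t(e)

n     R_N within last n   M-n+1   Q_{n,N}
n=5   {3,4,5,6,7}         3       {3,4}
n=4   {4,5,6,7}           4       {4,7}
n=3   {5,6,7}             5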
12
Querying R_N: Optimal Algorithm
To answer an n-of-N query, encode G_{R_N} using intervals: an edge e' → e becomes (t(e'), t(e)], and a root e becomes (0, t(e)].
e.g., (0,3] (0,4] (3,7] (4,5] (4,6]
Stab the intervals with M-n+1 (e.g., n = 6). For every returned interval (x, y], return the point whose timestamp is y.
Technique: use an interval tree index to achieve the optimal O(log |R_N| + s) query time.
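The stabbing step can be sketched on the five example intervals from this slide. A linear scan stands in for the interval tree here, so the sketch shows the logic of the query but does not achieve the O(log |R_N| + s) bound. The names `encode` and `stab` are illustrative, and the parent map is the one implied by those intervals with M = 7.

```python
from typing import Dict, List, Optional, Tuple

Interval = Tuple[int, int]  # (x, y) stands for the half-open interval (x, y]


def encode(parent: Dict[int, Optional[int]]) -> List[Interval]:
    """Turn each element t into (t(parent), t]; a root gets (0, t]."""
    return [(parent[t] if parent[t] is not None else 0, t) for t in sorted(parent)]


def stab(intervals: List[Interval], M: int, n: int) -> List[int]:
    """n-of-N query: timestamps y of all intervals (x, y] containing M - n + 1."""
    q = M - n + 1
    return [y for x, y in intervals if x < q <= y]  # linear scan, not an interval tree


# Parent map implied by the slide's intervals (0,3] (0,4] (3,7] (4,5] (4,6].
parent = {3: None, 4: None, 5: 4, 6: 4, 7: 3}
intervals = encode(parent)
answer_n5 = stab(intervals, M=7, n=5)
answer_n4 = stab(intervals, M=7, n=4)
```

Swapping the list for an interval tree would change only `stab`; the encoding itself stays the same.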
13
Maintaining R_N
When a new element e_new arrives:
If the oldest element e_old ∈ R_N expires, remove e_old and update R_N and G_{R_N} (interval tree).
Find D ⊆ R_N, the set of elements dominated by e_new, and update R_N and G_{R_N}.
– Depth-first search on an R-tree of R_N
Find e such that e → e_new (critical dominance) and update G_{R_N}.
– Best-first search on the R-tree of R_N
Example: e_new = 8, e_old = 3, D = {6}, e = 4.
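The three maintenance steps can be sketched as follows. Linear scans stand in for the R-tree searches, and a plain parent map stands in for the interval tree, so this shows the update logic rather than its cost. The function `arrive` and all coordinates are hypothetical; they were picked so that the run reproduces the slide's example (e_old = 3, D = {6}, critical dominator 4) under the assumption N = 5.

```python
from typing import Dict, Optional, Sequence

Point = Sequence[float]


def dominates(x: Point, y: Point) -> bool:
    """x dominates y: no worse in every dimension, strictly better in one."""
    return all(a <= b for a, b in zip(x, y)) and any(a < b for a, b in zip(x, y))


def arrive(rn: Dict[int, Point], parent: Dict[int, Optional[int]],
           M: int, N: int, p_new: Point) -> int:
    """Update R_N (rn) and the critical-dominance parent map in place; return new M."""
    M += 1
    # Step 1: drop elements that fell out of the window of the N most recent.
    for t in [t for t in rn if t <= M - N]:
        del rn[t], parent[t]
    # Step 2: drop D, the elements dominated by the new arrival.
    for t in [t for t, q in rn.items() if dominates(p_new, q)]:
        del rn[t], parent[t]
    # A child whose critical dominator expired becomes a root.
    for t in parent:
        if parent[t] is not None and parent[t] not in rn:
            parent[t] = None
    # Step 3: the critical dominator of e_new is its youngest remaining dominator.
    doms = [t for t, q in rn.items() if dominates(q, p_new)]
    rn[M] = p_new
    parent[M] = max(doms) if doms else None
    return M


rn = {3: (1, 5), 4: (3, 2), 5: (4, 4), 6: (6, 3), 7: (2, 6)}
parent = {3: None, 4: None, 5: 4, 6: 4, 7: 3}
M = arrive(rn, parent, M=7, N=5, p_new=(5, 3))
```

Children of a removed element of D need no repair: anything dominated by a member of D is also dominated by e_new, so it is removed in the same step.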
14
Continuous n-of-N Query
Trigger-based algorithm:
– Deletion: Q_{n,N} - {e_old}, and Q_{n,N} - D
– Insertion: Q_{n,N} ∪ {e_new}, unless e' → e_new (critical dominance) with t(e') >= M-n+1
Maintain a min-heap of Q_{n,N} for efficiency.
Example (N = 5): e_new = 8, e_old = 3, D = {6}, e' = 4.
M = 7: Q_{4,5} = {4,7}, Q_{5,5} = {3,4}
M = 8: Q_{4,5} = {5,7,8}, Q_{5,5} = {4,7}
15
(n_1, n_2)-of-N Query
More complicated than the n-of-N query: P_N needs to be kept!
(Old) critical dominance: t(a_e) = max { t(e') : e' dominates e & t(e') < t(e) }
Backward critical dominance: t(b_e) = min { t(e') : e' dominates e & t(e') > t(e) }
e ∈ Q_{(n_1,n_2),N} iff a_e < M-n_2+1 <= e <= M-n_1+1 < b_e (elements are identified with their timestamps)
CBC dominance graph: P_N & the two kinds of dominance relationships
Example, (2,4)-of-7: a_5 = 3, b_5 = 6, (M-n_2+1, M-n_1+1] = (4,6], result {4, 6}
16
Processing (n_1, n_2)-of-N Query
Encode the CBC dominance graph:
– e ↦ ((a_e, e], b_e)
– build an interval tree on (a_e, e] only
Stab the interval tree with M-n_2+1 and check e <= M-n_1+1 < b_e on the fly:
– O(log N + s*), sub-optimal
Example, (2,4)-of-7: (M-n_2+1, M-n_1+1] = (4,6]; intervals (1,6] (3,4] (3,5] ...; candidates {4, 5, 6}; result {4, 6}
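The two-phase query (stab, then filter) can be sketched as below. A list scan again replaces the interval tree. The sketch also assumes a_e = 0 when e has no older dominator and b_e = ∞ when it has no younger one, a convention the slides do not state. The names `cbc_encode` and `query` are illustrative, and the coordinates are hypothetical, chosen so that a_4 = 3, a_5 = 3, b_5 = 6 and a_6 = 1 as in the slide's example.

```python
import math
from typing import Dict, List, Sequence, Tuple

Point = Sequence[float]


def dominates(x: Point, y: Point) -> bool:
    """x dominates y: no worse in every dimension, strictly better in one."""
    return all(a <= b for a, b in zip(x, y)) and any(a < b for a, b in zip(x, y))


def cbc_encode(points: List[Point]) -> Dict[int, Tuple[float, float]]:
    """Map each timestamp e in P_N to (a_e, b_e); 0 and inf mean 'no dominator'."""
    stamped = list(enumerate(points, start=1))
    enc: Dict[int, Tuple[float, float]] = {}
    for t, p in stamped:
        older = [s for s, q in stamped if s < t and dominates(q, p)]
        younger = [s for s, q in stamped if s > t and dominates(q, p)]
        enc[t] = (max(older, default=0), min(younger, default=math.inf))
    return enc


def query(enc: Dict[int, Tuple[float, float]], M: int, n1: int, n2: int) -> List[int]:
    """(n1, n2)-of-N query: stab with M - n2 + 1, then filter on M - n1 + 1."""
    lo, hi = M - n2 + 1, M - n1 + 1
    candidates = [e for e, (a, _) in enc.items() if a < lo <= e]   # stabbing step
    return [e for e in candidates if e <= hi < enc[e][1]]          # on-the-fly check


# Hypothetical P_N with timestamps 1..7 (N = 7, M = 7).
points = [(3, 1), (6, 6), (2, 4), (2, 6), (5, 5), (4, 3), (7, 7)]
enc = cbc_encode(points)
answer = query(enc, M=7, n1=2, n2=4)
```

Element 5 survives the stabbing step but fails the filter because b_5 = 6 is not greater than M-n_1+1 = 6, which is why the candidate count s* can exceed the result size.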
17
More on (n_1, n_2)-of-N Query
Maintenance: similar to that of the n-of-N query, but
– always expire the oldest element in P_N, and maintain the interval tree and the R-tree on R_N
– implementation-wise: use two interval trees to index R_N and P_N - R_N, respectively
Continuous queries
– More complicated: a new skyline point might be neither a skyline point in the previous result, nor critically dominated by a skyline point in the previous result, nor a newly arrived point
– Basic idea: maintain additional candidate solutions (minimization) & triggers
– Details in the full paper
18
Experiment Setup
Hardware
– P4 2.8 GHz CPU, 1 GB memory
Datasets
– Correlated, independent, and anti-correlated
– d = 2 to 5, N = 10^6
Algorithms
– KLP, nN, mnN, cnN, n12N, mn12N
Metrics
– Processing time
– Streaming rate
19
n-of-N Query
Varying dimensionality
– M up to 2M, N = 1M, n drawn uniformly from [1K, 1M], #queries = 1000
20
n-of-N Query (cont'd)
Varying n for correlated, independent, and anti-correlated datasets
21
Maintenance Costs
2d and 5d datasets; measure average and max time; N = i * 10^5
22
Scalability
M (total number) = 2M, N = 1M, #queries = 2M
Figure panels: anti-correlated, independent
23
Continuous n-of-N Queries
2d & 5d datasets; N = 10K and 1M; 10 queries with n = i*(N/10)
Measures: cnN avg, cnN max, nN avg, nN max
24
(n_1, n_2)-of-N Queries
Varying dimensionality
– M up to 2M, N = 1M, #queries = 1000
– restricting n_2 - n_1 >= 500
Scalability
– M = 2M, N = 1M, #queries = 2M
25
Maintenance
2d and 5d datasets; measure average and max time; N = i * 10^5
26
Conclusions
Efficient algorithms for various sliding-window skyline queries
– Keep only the minimum number of points
– Encode and index those points
– Maintain all the data structures
The proposed solutions
– have theoretical guarantees on performance, and
– have demonstrated efficiency and scalability in the experiments
Future work
– Improve the current solution for (n_1, n_2)-of-N queries
– Approximate skyline queries
27
Q&A
Thank You!
28
References
[ICDE01] S. Borzsonyi, D. Kossmann, and K. Stocker. The skyline operator. ICDE, 2001.
[VLDB01] K. Tan, P. Eng, and B. Ooi. Efficient progressive skyline computation. VLDB, 2001.
[VLDB 02] D. Kossmann, F. Ramsak, and S. Rost. Shooting stars in the sky: An online algorithm for skyline queries. VLDB, 2002.
[SIGMOD03] D. Papadias, Y. Tao, G. Fu, and B. Seeger. An optimal progressive algorithm for skyline queries. SIGMOD, 2003.
[SIAM J. comp00] S. Kapoor. Dynamic maintenance of maxima of 2-d point sets. SIAM J. Comput., 2000.