Stabbing the Sky: Efficient Skyline Computation over Sliding Windows COMP9314 Lecture Notes.

Slides:



Advertisements
Similar presentations
Skyline Charuka Silva. Outline Charuka Silva, Skyline2  Motivation  Skyline Definition  Applications  Skyline Query  Similar Interesting Problem.
Advertisements

VLDB 2011 Pohang University of Science and Technology (POSTECH) Republic of Korea Jongwuk Lee, Seung-won Hwang VLDB 2011.
Counting Distinct Objects over Sliding Windows Presented by: Muhammad Aamir Cheema Joint work with Wenjie Zhang, Ying Zhang and Xuemin Lin University of.
Probabilistic Skyline Operator over Sliding Windows Wenjie Zhang University of New South Wales & NICTA, Australia Joint work: Xuemin Lin, Ying Zhang, Wei.
I/O-Algorithms Lars Arge Fall 2014 September 25, 2014.
TI: An Efficient Indexing Mechanism for Real-Time Search on Tweets Chun Chen 1, Feng Li 2, Beng Chin Ooi 2, and Sai Wu 2 1 Zhejiang University, 2 National.
Pete Bohman Adam Kunk.  Introduction  Related Work  System Overview  Indexing Scheme  Ranking  Evaluation  Conclusion.
Maintaining Sliding Widow Skylines on Data Streams.
School of Computer Science and Engineering Finding Top k Most Influential Spatial Facilities over Uncertain Objects Liming Zhan Ying Zhang Wenjie Zhang.
Progressive Computation of The Min-Dist Optimal-Location Query Donghui Zhang, Yang Du, Tian Xia, Yufei Tao* Northeastern University * Chinese University.
Continuous Intersection Joins Over Moving Objects Rui Zhang University of Melbourne Dan Lin Purdue University Kotagiri Ramamohanarao University of Melbourne.
Indexing the imprecise positions of moving objects Xiaofeng Ding and Yansheng Lu Department of Computer Science Huazhong University of Science & Technology.
July 29HDMS'08 Caching Dynamic Skyline Queries D. Sacharidis 1, P. Bouros 1, T. Sellis 1,2 1 National Technical University of Athens 2 Institute for Management.
A Generic Framework for Handling Uncertain Data with Local Correlations Xiang Lian and Lei Chen Department of Computer Science and Engineering The Hong.
Nearest Neighbor Search in Spatial and Spatiotemporal Databases
Subscription Subsumption Evaluation for Content-Based Publish/Subscribe Systems Hojjat Jafarpour, Bijit Hore, Sharad Mehrotra, and Nalini Venkatasubramanian.
1 Continuous k-dominant Skyline Query Processing Presented by Prasad Sriram Nilu Thakur.
Spatio-Temporal Databases. Introduction Spatiotemporal Databases: manage spatial data whose geometry changes over time Geometry: position and/or extent.
Probabilistic Skyline Operator over sliding Windows Wan Qian HKUST DB Group.
B + -Trees (Part 1) COMP171. Slide 2 Main and secondary memories  Secondary storage device is much, much slower than the main RAM  Pages and blocks.
Indexing Spatio-Temporal Data Warehouses Dimitris Papadias, Yufei Tao, Panos Kalnis, Jun Zhang Department of Computer Science Hong Kong University of Science.
Efficient Computation of the Skyline Cube Yidong Yuan School of Computer Science & Engineering The University of New South Wales & NICTA Sydney, Australia.
Bin Jiang, Jian Pei.  Problem Definition  An On-the-fly Method ◦ Interval Skyline Query Answering Algorithm ◦ Online Interval Skyline Query Algorithm.
Hashed Samples Selectivity Estimators for Set Similarity Selection Queries.
Improving Min/Max Aggregation over Spatial Objects Donghui Zhang, Vassilis J. Tsotras University of California, Riverside ACM GIS’01.
Catching the Best Views of Skyline: A Semantic Approach Based on Decisive Subspaces Jian Pei # Wen Jin # Martin Ester # Yufei Tao + # Simon Fraser University,
Mehdi Kargar Aijun An York University, Toronto, Canada Keyword Search in Graphs: Finding r-cliques.
Maximal Vector Computation in Large Data Sets The 31st International Conference on Very Large Data Bases VLDB 2005 / VLDB Journal 2006, August Parke Godfrey,
The X-Tree An Index Structure for High Dimensional Data Stefan Berchtold, Daniel A Keim, Hans Peter Kriegel Institute of Computer Science Munich, Germany.
Join Synopses for Approximate Query Answering Swarup Achrya Philip B. Gibbons Viswanath Poosala Sridhar Ramaswamy Presented by Bhushan Pachpande.
Towards Robust Indexing for Ranked Queries Dong Xin, Chen Chen, Jiawei Han Department of Computer Science University of Illinois at Urbana-Champaign VLDB.
Efficient Elastic Burst Detection in Data Streams Yunyue Zhu and Dennis Shasha Department of Computer Science Courant Institute of Mathematical Sciences.
Graph Indexing: A Frequent Structure- based Approach Alicia Cosenza November 26 th, 2007.
Efficient Computation of Reverse Skyline Queries VLDB 2007.
Computer Science and Engineering Efficiently Monitoring Top-k Pairs over Sliding Windows Presented By: Zhitao Shen 1 Joint work with Muhammad Aamir Cheema.
1 Approximating Quantiles over Sliding Windows Srimathi Harinarayanan CMPS 565.
Mehdi Kargar Aijun An York University, Toronto, Canada Keyword Search in Graphs: Finding r-cliques.
Efficient Processing of Top-k Spatial Preference Queries
Bin Yao (Slides made available by Feifei Li) R-tree: Indexing Structure for Data in Multi- dimensional Space.
QED: A Novel Quaternary Encoding to Completely Avoid Re-labeling in XML Updates Changqing Li,Tok Wang Ling.
Exact indexing of Dynamic Time Warping
Answering Top-k Queries Using Views Gautam Das (Univ. of Texas), Dimitrios Gunopulos (Univ. of California Riverside), Nick Koudas (Univ. of Toronto), Dimitris.
ACM SIGMOD International Conference on Management of Data, Beijing, June 14 th, Keyword Search on Relational Data Streams Alexander Markowetz Yin.
1 Online Computation and Continuous Maintaining of Quantile Summaries Tian Xia Database CCIS Northeastern University April 16, 2004.
Space-Efficient Online Computation of Quantile Summaries SIGMOD 01 Michael Greenwald & Sanjeev Khanna Presented by ellery.
Monitoring k-NN Queries over Moving Objects Xiaohui Yu University of Toronto Joint work with Ken Pu and Nick Koudas.
A FAIR ASSIGNMENT FOR MULTIPLE PREFERENCE QUERIES
D-skyline and T-skyline Methods for Similarity Search Query in Streaming Environment Ling Wang 1, Tie Hua Zhou 1, Kyung Ah Kim 2, Eun Jong Cha 2, and Keun.
Online Interval Skyline Queries on Time Series ICDE 2009.
Finding skyline on the fly HKU CS DB Seminar 21 July 2004 Speaker: Eric Lo.
Bin Jiang, Jian Pei ICDE 2009 Online Interval Skyline Queries on Time Series 1.
Reuse or Never Reuse the Deleted Labels in XML Query Processing Based on Labeling Schemes Changqing Li, Tok Wang Ling, Min Hu.
S CALABLE S KYLINE C OMPUTATION U SING O BJECT - BASED S PACE P ARTITIONING Shiming Zhang Nikos Mamoulis David W. Cheung sigmod
Pete Bohman Adam Kunk.  Introduction  Related Work  System Overview  Indexing Scheme  Ranking  Evaluation  Conclusion.
HKU CSIS DB Seminar Skyline Queries HKU CSIS DB Seminar 9 April 2003 Speaker: Eric Lo.
1 Spatial Query Processing using the R-tree Donghui Zhang CCIS, Northeastern University Feb 8, 2005.
Computer Science and Engineering Jianye Yang 1, Ying Zhang 2, Wenjie Zhang 1, Xuemin Lin 1 Influence based Cost Optimization on User Preference 1 The University.
Online Skyline Queries. Agenda Motivation: Top N vs. Skyline „Classic“ Algorithms –Block-Nested Loop Algorithm –Divide & Conquer Algorithm Online Algorithm.
Tian Xia and Donghui Zhang Northeastern University
Abolfazl Asudeh Azade Nazi Nan Zhang Gautam DaS
RE-Tree: An Efficient Index Structure for Regular Expressions
Query in Streaming Environment
Spatio-Temporal Databases
Probabilistic n-of-N Skyline Computation over Uncertain Data Streams
Presented by: Mahady Hasan Joint work with
Range Queries on Uncertain Data
Approximation and Load Shedding Sampling Methods
Relaxing Join and Selection Queries
Efficient Processing of Top-k Spatial Preference Queries
Donghui Zhang, Tian Xia Northeastern University
Presentation transcript:

Stabbing the Sky: Efficient Skyline Computation over Sliding Windows COMP9314 Lecture Notes

COMP9314Xuemin Outline Introduction n-of-N Queries (n 1, n 2 )-of-N Queries Performance Evaluation Conclusions

COMP9314Xuemin Skyline Skyline Query : Input: a set of points in d- dimensional space. Output: points not dominated by another point. (x 1, x 2, …, x d ) dominates (y 1, y 2, …, y d ) iff x i <=y i (1<=i<=d) & ∃ k, x k <y k.

COMP9314Xuemin Applications Multi-criteria decision making… Stock Trading Example: What are the top deals?

COMP9314Xuemin Skyline Query Over Sliding Window Stock Trading Example Top deals of a stock in the last 5 mins? last 4 mins, … Top deals of a stock in the last 10K deals? … Queries: n-of-N model (  n <= N): the most recent n elements (n 1, n 2 )-of-N model One-time queries Continuous queries

COMP9314Xuemin Challenges Insertions & deletions (possibly high speed). On-line information –memory requirement –processing speed Existing techniques do not support n-of-N: [Borzsonyi et al (ICDE01), Tan et al (VLDB01), Kossman et al (VLDB 02), Papadias et al (SIGMOD03), Kapoor (SIAM J. comp00)] –support the computation of whole dataset –O (n log d-2 n) for d >= 4 & O (n log n ) otherwise

COMP9314Xuemin Results n-of-N: keep N’ (N’  N) elements where N’ = O (log d N) if data distribution on each dimension is independent. a novel encoding scheme, with O (N’) space, leads to n- of-N query time O ( log N’ + s ) instead of O (n log d-2 n). a new trigger based technique for continuously processing an n-of-N query. –trigger update time: O ( log s). –result update time: O (logδ) where δ is a result change. (n 1, n 2 )-of-N: similar results.

COMP9314Xuemin n-of-N Queries e is redundant Point e in P N (the most recent N elements) iff –e expires w.r.t P N, or – ∃ e’ s.t. e’  e, and e’ is younger than e N = 6

COMP9314Xuemin Optimality Theorem: Non-redundant Points (R N ) vs. n-of-N Skyline Query Result (Q n,N ) –(P N – R N ) does not appear in any Q n,N –Q n,N must be a subset of R N –  x  R N   n, x  Q n,N –R N = O(log d-1 N) for “independent” distributions Only need to keep R N – the minimum number of elements to be kept.

COMP9314Xuemin Querying R N critical dominance: e  e’ where e is the youngest. dominance graph G R N : R N and the critical dominance relationships.

COMP9314Xuemin Querying R N e  Q n,N iff e is a root in G R N or e’  e in G R N & e’ has expired  t(e’) < M – n + 1 <= t(e) nRNRN M-n+1Q n,N n=5{3,4,5,6,7 }3{3,4} n=4{4,5,6,7 }4{4, 7} n=3{5,6,7}5

COMP9314Xuemin Querying R N : Optimal Algorithm To answer an n-of-N Query, encode the G R N using intervals: Stab the intervals by (M-n+1). For all returned intervals (x,y), return point whose timestamp is y Technique: Use an interval tree index to achieve optimal O(log|R N |+s) query time e.g., n=6 (0,3] (0,4] (3,7] (4,5] (4,6]

COMP9314Xuemin Maintaining R N new element e new arrives: If the oldest e old  R N expires, remove e old and update R N and G R N (interval tree). find D  R N dominated by e new, update R N and G R N –Depth-first search on a R-tree of R N find e  c e new, update G R N –Best-first search on the R-tree of R N e new = 8 e old = 3 D = {6} e = 4

COMP9314Xuemin Continuous n-of-N Query Trigger-based algorithm: Deletion: Q n,N – {e old }, and Q n,N – {D} Insertion: Q n,N  {e new } if  (  e’  c e new and t(e’ ) >= M-n+1) Maintain a min-heap of Q n,N for efficiency e new = 8 M = 8 e old = 3 D = {6} e’ = 4 M = 7 n = 4, N=5 Q 4,5 = {4,7} M = 8 n = 4, N=5 Q 4,5 = {5,7,8} Q 5,5 = {3,4}Q 5,5 = {4,7}

COMP9314Xuemin (n 1,n 2 )-of-N Query More complicated than n-of-N Query P N needs to be kept! (Old) critical dominance: t (a e ) = max { t (e’): e’  e & t (e’) < t (e) } backward critical dominance: t (b e ) = min { t (e’): e’  e & t (e’) > t (e)} e  Q (n1,n2),N iff a e < M-n 2 +1  e  M-n 1 +1 < b e CBC dominance graph: P N & the two kinds of dominance relationships (2,4)-of-7: {4, 6} a 5 = 3, b 5 = 6 (M-n 2 +1, M-n 1 +1] = (4,6]

COMP9314Xuemin Processing (n 1,n 2 )-of-N Query Encode the CBC dominance graph: –e  ((a e, e], b e ) –build an interval tree on (a e, e] only Stab using M-n 2 +1 against the interval tree and check e <= M-n 1 +1 < b e ) on-the-fly: –O(logN+s*), sub-optimal (2,4)-of-7: ??? (M-n 2 +1, M-n 1 +1] = (4,6] (1,6] (3,4] (3,5]... (2,4)-of-7: {4, 6} Candidates: {4, 5, 6}

COMP9314Xuemin More on (n 1,n 2 )-of-N Query Maintenance: Similar to that of n-of-N query, but –Always expires the oldest element in P N, and maintain the interval tree and the R-tree on R N. –Implementation-wise: Use two interval trees to index R N and P N - R N, respectively. Continuous queries –More complicated A new skyline point might not be a skyline in the previous result, nor critically dominated by a skyline point in the previous result nor a newly arrived point –Basic idea Maintain additional Candidate Solutions (minimization) & triggers –Details in the full paper

COMP9314Xuemin Experiment Setup Hardware –P4 2.8G CPU, 1G Memory Datasets –Correlated, independent, and anti- correlated –d = 2 to 5, N = 10 6 Algorithms –KLP, nN, mnN, cnN, n12N, mn12N Metrics –Processing time  Streaming rate

COMP9314Xuemin n-of-N Query Varying dimensionality –M up to 2M, N = 1M, n uniformly from [1K, 1M], #queries = 1000

COMP9314Xuemin n-of-N Query (cont’d) Varying n for correlated, independent, and anti-correlated datasets

COMP9314Xuemin Maintenance Costs 2d and 5d datasets, measure average and max time, N = i * 10 5

COMP9314Xuemin Scalability M (total number) = 2M, N = 1M, #queries = 2M anti-correlatedindependent

COMP9314Xuemin Continuous n-of-N Queries 2d & 5d datasets N = 10K and 1M 10 queries with n = i*(N/10) measures cnN avg, cnN max, nN avg, nN max

COMP9314Xuemin (n 1,n 2 )-of-N Queries Varying dimensionality –M up to 2M, N = 1M, #queries = 1000 –restricting n 2 – n 1 >= 500 Scalability – M = 2M, N = 1M, #queries = 2M

COMP9314Xuemin Maintenance 2d and 5d datasets measure average and max time N = i * 10 5

COMP9314Xuemin Conclusions Efficient algorithms for various sliding windows skyline queries –Keep only minimum number of points –Encode and index those points –Maintain all the data structures The proposed solutions –have theoretical guarantee on the performance, and –have demonstrated efficiency and scalability in the experiments Future work –Improve the current solution for (n 1,n 2 )-of-N queries –Approximate skyline queries

COMP9314Xuemin Q&A Thank You!

COMP9314Xuemin Reference [ICDE01] S. Borzsonyi, D. Kossmann, and K. Stocker. The skyline operator. ICDE, [VLDB01] K. Tan, P. Eng, and B. Ooi. Efficient progressive skyline computation. VLDB, [VLDB 02] D. Kossmann, F. Ramsak, and S. Rost. Shooting stars in the sky: An online algorithm for skyline queries. VLDB, [SIGMOD03] D. Papadias, Y. Tao, G. Fu, and B. Seeger. An optimal progressive alogrithm for skyline queries. SIGMOD, [SIAM J. comp00] S. Kapoor. Dynamic maintenance of maxima of 2- d point sets. SIAM J. Comput., 2000.