Computer Science and Engineering Efficiently Monitoring Top-k Pairs over Sliding Windows Presented By: Zhitao Shen 1 Joint work with Muhammad Aamir Cheema.

Presentation transcript:

Computer Science and Engineering
Efficiently Monitoring Top-k Pairs over Sliding Windows
Presented by: Zhitao Shen (1)
Joint work with Muhammad Aamir Cheema (1), Xuemin Lin (2, 1), Wenjie Zhang (1), Haixun Wang (3)
(1) The University of New South Wales, Australia
(2) East China Normal University
(3) Microsoft Research Asia

2 Introduction
Top-k pairs query: given a scoring function score() that computes the score of a pair of objects, return the k pairs of objects with the smallest scores.
Examples:
– k closest pairs queries
– k furthest pairs queries
Top-k pairs over sliding windows: given a data stream, return the top-k pairs among the most recent N objects.
Applications: wireless sensor networks, stock markets, traffic monitoring, and transaction monitoring.
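The query definition on this slide can be illustrated with a brute-force baseline. This is a minimal sketch, not the method of the talk: it re-scores every pair in the window on each query, and the helper name `top_k_pairs` and the sample values are assumptions made for illustration.

```python
from collections import deque
from itertools import combinations
import heapq


def top_k_pairs(window, score, k):
    """Return the k pairs with the smallest scores among the objects in `window`."""
    # Enumerates all N(N - 1) / 2 pairs, so this is only a reference baseline.
    return heapq.nsmallest(k, combinations(window, 2), key=lambda p: score(*p))


# A count-based sliding window holding the N = 4 most recent objects.
window = deque(maxlen=4)
for value in [10, 1, 7, 8, 3]:
    window.append(value)  # the oldest object (10) is evicted on the fifth append

# k closest pairs: the score is the distance itself.
closest = top_k_pairs(window, lambda a, b: abs(a - b), k=2)
# k furthest pairs: negate the distance so that "smallest score" means "furthest".
furthest = top_k_pairs(window, lambda a, b: -abs(a - b), k=2)

print(closest)   # [(7, 8), (1, 3)]
print(furthest)  # [(1, 8), (1, 7)]
```

Negating the distance shows how closest-pairs and furthest-pairs queries both fit the "smallest score" convention of the definition.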

3 Motivation
No existing work addresses general pairs queries over sliding windows.
Goal: support arbitrary scoring functions.
Example: fraud detection over transaction streams
–Query the transaction pairs that have a small time difference but whose locations are far apart.
    Select a.id, b.id
    from trans a, trans b
    where a.id <> b.id and a.account = b.account
    order by |a.time - b.time| - dist(a.loc, b.loc)
    limit k
    window [24 hours]
[Table: example transactions (time, location, amount), partly lost in extraction: ...:15:20, New York, $...; ...:18:10, L.A., $1000]
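The ordering expression in the query can be written as an ordinary scoring function. The sketch below is only an illustration under stated assumptions: the `Transaction` record, its field units, the planar coordinates, and the sample values are all hypothetical, and the slide does not say how time and distance are scaled against each other.

```python
from dataclasses import dataclass
from math import dist


@dataclass
class Transaction:
    id: int
    account: str
    time: float  # seconds since some reference point (assumed unit)
    loc: tuple   # planar (x, y) coordinates (assumed representation)


def fraud_score(a, b):
    """|a.time - b.time| - dist(a.loc, b.loc); a smaller score is more suspicious."""
    return abs(a.time - b.time) - dist(a.loc, b.loc)


def eligible(a, b):
    """Mirror the WHERE clause: two distinct transactions on the same account."""
    return a.id != b.id and a.account == b.account


# Hypothetical transactions: t1 and t2 share an account, are 170 s apart in time,
# and 5000 distance units apart in space.
t1 = Transaction(1, "acct-1", 0.0, (0.0, 0.0))
t2 = Transaction(2, "acct-1", 170.0, (3000.0, 4000.0))
t3 = Transaction(3, "acct-2", 175.0, (3000.0, 4000.0))

print(eligible(t1, t2), fraud_score(t1, t2))  # True -4830.0
print(eligible(t1, t3))                       # False
```

Because the score is an arbitrary function of the two objects, it cannot be handled by a closest-pairs or furthest-pairs algorithm alone, which is the motivation stated on the slide.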

4 Problem Definitions (Preliminaries)
Sliding windows
–A sliding window contains the most recent N objects of the data stream.
–The number of pairs is N(N – 1) / 2.
[Figure: a sliding window of size 5 over the objects o0 to o7, ordered from newer to older.]
Lower-bound runtime cost: O(N) for each new object
Lower-bound storage cost: O(N)
Age of an object and of a pair: the age of a pair depends on the older object.
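The counting argument and the pair-age rule can be checked with a few lines of code. This is a sketch under an assumption that the slide text does not state explicitly: object ages are taken to be 0, 1, 2, ... counted from the newest object, so a pair belongs to the window of the n most recent objects when its age is below n. The function names are hypothetical.

```python
def num_pairs(n):
    """Number of unordered pairs among n objects: n(n - 1) / 2."""
    return n * (n - 1) // 2


def pair_age(age_a, age_b):
    """A pair is as old as its older member, so it expires together with that member."""
    return max(age_a, age_b)


def pairs_in_window(ages, n):
    """Pairs (given by member ages) that lie inside the window of the n newest objects."""
    return [
        (a, b)
        for i, a in enumerate(ages)
        for b in ages[i + 1:]
        if pair_age(a, b) < n
    ]


ages = list(range(5))  # a window of N = 5 objects with assumed ages 0 (newest) to 4

print(num_pairs(5))                   # 10
print(len(pairs_in_window(ages, 5)))  # 10
print(pairs_in_window(ages, 3))       # [(0, 1), (0, 2), (1, 2)]
print(num_pairs(5) - num_pairs(4))    # 4, i.e. N - 1 new pairs per arriving object
```

The last line shows that each arriving object creates N – 1 new pairs, which is one way to read the O(N) per-object lower bound on the slide.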

5 Contributions
Unified framework
First to study top-k pairs queries over sliding windows.
Supports arbitrarily complex scoring functions.
Supports efficient queries for any window size n ≤ N and any k ≤ K.

                                          Lower bound   Expected cost for our algorithms
Storage requirement                       O(N)          O(N) + O(K log(N/K)) for each scoring function
Skyband maintenance cost for each object  O(N)          O(N (log(log N) + log K))
Answering top-k pairs                     O(k)          O(log(log n) + log K + k)

6 Preliminaries
Map all the pairs to an age–score space, e.g. p1 = (o0, o1) → (p1.age, p1.score) = (1, 3).
[Figure: the pairs formed by objects o0 to o4 plotted in the age–score space, with the top-2 pairs marked.]
The K-skyband [Papadias et al., TODS05] keeps the minimum set of candidate results.
p2 dominates p5 because p2.score < p5.score and p2 expires no earlier than p5 (p2 is no older than p5), so p2 is in every window that contains p5.
Task 1: how to efficiently maintain the K-skyband.
Task 2: how to use the K-skyband to efficiently obtain the top-k pairs for any sliding window n ≤ N.
Naive: O(N |SKB|) for checking all N – 1 pairs.
Expected size of the skyband: O(K log(N/K)).
Ours: O(N log |SKB|).
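A quadratic reference version makes the K-skyband concrete. This is a sketch, not the maintenance algorithm of the talk. It reads dominance as "smaller score and age no greater", meaning the dominating pair is in every window that contains the dominated one; that reading, the function names, and the sample points are assumptions made for illustration.

```python
def dominates(p, q):
    """p and q are (age, score) points; p dominates q if it scores lower and is no older."""
    return p[1] < q[1] and p[0] <= q[0]


def k_skyband(pairs, K):
    """Pairs dominated by fewer than K other pairs (quadratic reference version)."""
    return [q for q in pairs if sum(dominates(p, q) for p in pairs) < K]


def top_k_in_window(candidates, max_age, k):
    """Linear scan: keep pairs no older than max_age, then take the k smallest scores."""
    return sorted((p for p in candidates if p[0] <= max_age), key=lambda p: p[1])[:k]


# Hypothetical (age, score) points.
pairs = [(1, 3), (2, 1), (3, 2), (4, 5), (2, 4)]
skyband = k_skyband(pairs, K=2)

print(skyband)                                    # [(1, 3), (2, 1), (3, 2)]
print(top_k_in_window(skyband, max_age=3, k=2))   # [(2, 1), (3, 2)]
print(top_k_in_window(pairs, max_age=3, k=2))     # [(2, 1), (3, 2)], same answer
```

On this sample the skyband gives the same answer as the full set of pairs for every window size and every k up to K, which is why only the skyband needs to be kept.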

7 Efficient Skyband Maintenance
[Figure: the 2-skyband points in the age–score space together with the K-staircase points s1 and s2.]
Can we find a boundary between the skyband points and the non-skyband points? The K-staircase.
How can we efficiently compute the K-staircase and the K-skyband?
–Update the K-staircase and the K-skyband in O(|SKB| log K).
–Check whether each new pair is dominated by the K-skyband in O(log |SKB|) time using binary search.
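The binary-search dominance check can be sketched for the simplest case, K = 1. There the boundary is assumed to be the skyline itself; the general K-staircase construction is not reproduced here. The sketch assumes dominance means "smaller score and age no greater" and stores the skyline as two parallel lists sorted by age, so the scores are non-increasing. The function name and sample points are hypothetical.

```python
from bisect import bisect_right


def is_dominated(ages, scores, age, score):
    """Check a new (age, score) pair against a skyline in O(log n) time.

    `ages` is sorted ascending and `scores` is the matching non-increasing list,
    so the last skyline point with age <= `age` has the smallest score among
    all skyline points that are young enough to dominate the new pair.
    """
    i = bisect_right(ages, age) - 1  # index of the last point with age <= query age
    return i >= 0 and scores[i] < score


# Hypothetical skyline points (1, 3) and (2, 1), stored as parallel lists.
ages = [1, 2]
scores = [3, 1]

print(is_dominated(ages, scores, 3, 2))   # True: (2, 1) is younger and scores lower
print(is_dominated(ages, scores, 1, 2))   # False: only (1, 3) is young enough
print(is_dominated(ages, scores, 0, 10))  # False: no skyline point is that young
```

Only one comparison is needed after the binary search, because a single staircase step summarizes every skyline point to its left.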

8 Efficient Query Answering
[Figure: the 2-skyband points p1 to p8 in the age–score space, shown for window size = N and for any window size n < N, next to the priority search tree that indexes them.]
Can we do better for any sliding window size n < N?
Use a priority search tree to index the skyband points:
–Self-balancing tree
–Efficient 3-sided range queries

9 Efficient Query Answering
[Figure: the 2-skyband points p1 to p8 in the age–score space for any window size n < N, next to the priority search tree that indexes them.]
Our contribution: retrieve the top-k pairs in the 1-sided range.
An algorithm similar to post-order traversal costs O(log |SKB| + k).
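The idea of indexing skyband points in a priority search tree can be sketched as follows. This is not the traversal from the talk: it is a static tree (heap-ordered on score, split on age) queried with a simpler best-first search, so it does not claim the O(log |SKB| + k) bound. The class and function names, the `max_age` parameter, and the sample points are assumptions made for illustration.

```python
import heapq


class Node:
    def __init__(self, point, min_age, left, right):
        self.point = point      # (age, score) with the smallest score in this subtree
        self.min_age = min_age  # smallest age stored anywhere in this subtree
        self.left = left
        self.right = right


def build(points):
    """Build a static priority search tree from (age, score) points sorted by age."""
    if not points:
        return None
    i = min(range(len(points)), key=lambda j: points[j][1])  # heap order on score
    rest = points[:i] + points[i + 1:]
    mid = len(rest) // 2                                     # split the rest on age
    return Node(points[i], points[0][0], build(rest[:mid]), build(rest[mid:]))


def top_k(root, max_age, k):
    """Return the k smallest-score points with age <= max_age, in score order."""
    out, heap, tie = [], [], 0
    if root is not None and root.min_age <= max_age:
        heap.append((root.point[1], tie, root))
    while heap and len(out) < k:
        _, _, node = heapq.heappop(heap)
        if node.point[0] <= max_age:
            out.append(node.point)
        for child in (node.left, node.right):
            # Skip subtrees in which every point is too old for the window.
            if child is not None and child.min_age <= max_age:
                tie += 1
                heapq.heappush(heap, (child.point[1], tie, child))
    return out


# Hypothetical skyband points, sorted by age.
points = [(1, 9), (2, 8), (3, 6), (4, 5), (5, 3), (6, 4), (7, 1), (8, 2)]
tree = build(points)

print(top_k(tree, max_age=5, k=2))  # [(5, 3), (4, 5)]
print(top_k(tree, max_age=8, k=3))  # [(7, 1), (8, 2), (5, 3)]
```

Because every child scores no lower than its parent, popping nodes in score order yields the qualifying points in ascending score, and the search can stop as soon as k of them have been reported.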

10 What else is in the paper?
Efficient continuous queries on the skyband
–Continuously monitoring the top-k results for any fixed k (k ≤ K) and n (n ≤ N).
–Amortized O(k/n (log |SKB| + k)) time per update.
Optimization for monotonic scoring functions
–Handling the k-closest pairs and k-furthest pairs queries.
–Applying the Threshold Algorithm on sorted lists.
–Reducing the number of pairs considered for each new object from N to (d+1) · N^(d/(d+1)) · K^(1/(d+1)).
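A quick calculation shows how much the bound for monotonic scoring functions can save. This is only arithmetic on the expression from the slide; the function name is hypothetical, d is assumed to be the number of attributes, and the sample values of N, K, and d are chosen for illustration.

```python
def pairs_considered(n, k, d):
    """Evaluate (d + 1) * N^(d / (d + 1)) * K^(1 / (d + 1))."""
    return (d + 1) * n ** (d / (d + 1)) * k ** (1 / (d + 1))


# With N = 10,000, K = 10 and d = 2: N^(2/3) * K^(1/3) = (10^8 * 10)^(1/3) = 1000,
# so the expression evaluates to about 3 * 1000 = 3000 pairs instead of N = 10,000.
estimate = pairs_considered(10_000, 10, 2)
print(round(estimate))  # 3000
```

The saving shrinks as d grows, because the exponent d/(d+1) approaches 1.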

11 Experimental Settings
Real dataset
–Sensor data from the Intel research lab
–2.3 million records
Synthetic data
–Uniform, correlated, and anti-correlated distributions
–2 million objects
–Closest and furthest pairs under Manhattan distance

12 Experiments (Overall Cost on Real Data)
SCase: our algorithm, which uses the K-staircase to maintain the skyband.
Naïve: maintains kN pairs and sorts them by score.
LB: the lower-bound cost.
[Plots: overall cost when varying K and when varying N (in thousands).]

13 Experiments (Query Answering)
Linear: scans the skyband points to find the top-k pairs.
Snapshot: our snapshot query algorithm.
Continuous: our continuous query algorithm.
LB: an algorithm that obtains the top-k results in O(k) time.
[Plots: query cost when varying K and when varying |Q| (in thousands).]

14 Conclusion
First to study a broad class of top-k pairs queries over sliding windows.
We present efficient algorithms and show that their performance is reasonably close to the lower-bound cost.
We provide extensive experimental results on both real and synthetic data sets to show the efficiency and scalability of the proposed algorithms.

15 Question and Answer Thank You! Any Questions?

16 Related Work
Top-k query processing
–Fagin's Algorithm (FA), Threshold Algorithm (TA), No Random Access (NRA)
Top-k pairs query processing
–k-closest pairs queries
–k-furthest pairs queries
–Top-k pairs queries [Cheema et al., ICDE'11]
Data stream processing
–Top-k query processing over data streams [Mouratidis et al., SIGMOD'06]
–k-nearest neighbour queries [Böhm et al., ICDE'07]

17 Experiments (Skyband Maintenance Algorithm)
Basic: the maintenance algorithm without the K-staircase.
SCase: our algorithm, which uses the K-staircase to maintain the skyband.
TA: the optimized algorithm for monotonic scoring functions.
LB: the lower-bound cost.
[Plots: maintenance cost when varying the number of attributes and when varying K.]