The University of Hong Kong, Department of Computer Science. Continuous Monitoring of Top-k Queries over Sliding Windows. Authors: Kyriakos Mouratidis, Spiridon.


The University of Hong Kong, Department of Computer Science
Continuous Monitoring of Top-k Queries over Sliding Windows
Authors: Kyriakos Mouratidis, Spiridon Bakiras, Dimitris Papadias
Presenter: Kamiru

Outline
- Motivation
- Problem Setting
- Related Works
  - Top-k Queries
  - Skyband
- Solutions
  - Top-k Computation
  - Maintenance Module
  - Skyband Monitoring Algorithm
- Experimental Evaluation
- Conclusion
- Future Works

Motivation
- We first define the top-k query:
  - Given a dataset P and a preference function f, a top-k query retrieves the k tuples in P with the highest scores according to f.
- A real-life application: find the top 5 hotels under the preference function f(hotel) = -hotel.price + hotel.quality.
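The hotel query can be sketched in a few lines of Python. This is only an illustration of the definition: the hotel records and the helper name `top_k` are made up, and only the preference function comes from the slide.

```python
import heapq

# Hypothetical hotel records; the attribute values are illustrative only.
hotels = [
    {"name": "A", "price": 120, "quality": 90},
    {"name": "B", "price": 80, "quality": 70},
    {"name": "C", "price": 200, "quality": 95},
    {"name": "D", "price": 60, "quality": 40},
]

def preference(hotel):
    # f(hotel) = -hotel.price + hotel.quality, as on the slide
    return -hotel["price"] + hotel["quality"]

def top_k(points, f, k):
    """Return the k points with the highest score under f (one-shot, no streaming)."""
    return heapq.nlargest(k, points, key=f)

best = top_k(hotels, preference, 2)
```

With these made-up values the scores are A = -30, B = -10, C = -105 and D = -20, so `best` holds hotels B and D.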

Motivation
- Existing methods are not applicable to streaming environments.
- Internet traffic flow monitoring is one real-life streaming application.
  - Internet traffic arrives at a very high data rate.
  - Each tuple may include the source IP address, destination IP address, start time, end time, MTU, TTL, etc.

Motivation
- Such records are useful for:
  - traffic estimation
  - network security
  - troubleshooting
- For instance, monitoring in real time the top-k flows with the largest individual throughput helps the system guard against DDoS (Distributed Denial of Service) attacks.

Motivation
- The server [address missing] has a higher chance of being under a DDoS attack than [address missing] on this network.
[Table: NoPackets per destination IP; the values are not recoverable.]

Problem Setting
- A function $f$ is increasingly monotone on dimension $x_i$ if, for any pair of tuples (points) $p_1, p_2$ with $p_1.x_i \ge p_2.x_i$ and $p_1.x_j = p_2.x_j$ for all $j \ne i$, we have $\mathrm{score}(p_1) \ge \mathrm{score}(p_2)$, where $\mathrm{score}(p) = f(p.x_1, \ldots, p.x_n)$.
- Decreasing monotonicity is defined in the same way, with the inequality on the scores reversed ($\le$).

Problem Setting
- Note that a function may be increasingly monotone on some dimensions and decreasingly monotone on the remaining ones.
- For instance, $f(p) = p.x_1 - p.x_2$ is increasingly monotone on $x_1$ and decreasingly monotone on $x_2$.
[Figure: the $x_1$-$x_2$ plane with the line defined by $f = x_1 - x_2$ and two points a and b; f has a higher value on one side of the line and a lower value on the other.]
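The monotonicity of $f(p) = p.x_1 - p.x_2$ can be checked empirically on a small grid of sample values. The helper `is_monotone` is a hypothetical name, and a finite check like this can only refute monotonicity, not prove it.

```python
from itertools import product

def is_monotone(f, dim, n_dims, values, increasing=True):
    """Check monotonicity of f on dimension `dim` over a finite grid of sample values."""
    for p in product(values, repeat=n_dims):
        for v in values:
            if v < p[dim]:
                continue
            # q equals p on every dimension except `dim`, where q[dim] >= p[dim]
            q = p[:dim] + (v,) + p[dim + 1:]
            if increasing and f(q) < f(p):
                return False
            if not increasing and f(q) > f(p):
                return False
    return True

f = lambda p: p[0] - p[1]
values = [0.0, 0.25, 0.5, 0.75, 1.0]

inc_on_x1 = is_monotone(f, 0, 2, values, increasing=True)
dec_on_x2 = is_monotone(f, 1, 2, values, increasing=False)
inc_on_x2 = is_monotone(f, 1, 2, values, increasing=True)
```

Here `inc_on_x1` and `dec_on_x2` are true while `inc_on_x2` is false, matching the slide's statement.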

Problem Setting
- Problem definition: given a set of queries $Q$ and a set of points $P$, the top-k result $R_q$ of a query $q \in Q$ is the set $R_q \subseteq P$ with $|R_q| = k$ and $f(r_i) > f(r_j)$ for every $r_i \in R_q$ and $r_j \in P \setminus R_q$.
- At each timestamp:
  - insert the newly arrived objects $P_{ins}$
  - remove the expired objects $P_{del}$
  - output the top-k result of each query $q \in Q$ over the remaining points in $P$
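The per-timestamp loop can be written as a naive baseline that recomputes the result from the whole window at every timestamp. This is a sketch for orientation, not the method presented in these slides; it assumes a count-based window, and `monitor` and `window_size` are illustrative names.

```python
from collections import deque
import heapq

def monitor(stream, window_size, k, score):
    """Yield (timestamp, top-k) after applying each batch of arrivals.

    Naive baseline: the top-k is recomputed from the whole window every time.
    """
    window = deque()
    for t, arrivals in enumerate(stream):
        window.extend(arrivals)              # P_ins: newly arrived objects
        while len(window) > window_size:     # P_del: the oldest objects expire
            window.popleft()
        yield t, heapq.nlargest(k, window, key=score)

# Each inner list is the batch of tuples arriving at one timestamp.
stream = [[5, 1], [7], [2], [3]]
results = list(monitor(stream, window_size=3, k=2, score=lambda x: x))
```

The cost of this baseline grows with the window size at every timestamp, which is what the monitoring algorithms later in the deck try to avoid.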

Related Works – Top-k query computation
- Several existing methods compute top-k results in various scenarios.
- They focus on computing the top-k results from multiple data repositories.
- Fagin et al. introduce two efficient methods for processing ranked queries:
  - Threshold Algorithm (TA)
  - No Random Access algorithm (NRA)

TA and NRA
- Both methods perform sorted access in parallel to each of the m sorted lists $S_i$,
  - where m is the number of inputs (attributes) and the values of attribute i are stored in $S_i$.
- Every $S_i$ is scanned in descending order.

TA and NRA
- When an object o is seen in input $S_i$:
  - TA performs random accesses to the other lists to find the grade $x_j$ of o in every list $S_j$, then computes the value of f.
  - NRA does not access the other lists. Instead of computing the value of f, it only updates two bounds on the score of o.
- Both algorithms stop when the score of the k-th result is larger than the threshold T.

Example of TA and NRA
- Assume that we have 3 ranked inputs and 5 records (a to e) in our database. Find the top-1 result under the preference function f = SUM using TA and NRA.

Example of TA and NRA
Sorted inputs:
- $S_1$: c 0.9, d 0.8, b 0.6, e 0.3, a 0.1
- $S_2$: a 0.9, b 0.8, e 0.6, d 0.4, c 0.2
- $S_3$: c 0.9, a 0.8, b 0.6, d 0.6, e 0.5
TA:
- First loop
  - Get object c from $S_1$ and compute $f(c) = 0.9 + 0.2 + 0.9 = 2$. Update the result: $R = \{(c, 2)\}$. Threshold $T = 0.9 + \infty + \infty = \infty > R_k.\text{value}$, so continue.
  - Get object a from $S_2$ and compute $f(a) = 0.1 + 0.9 + 0.8 = 1.8$. Do not update the result, since $R_k.\text{value} > 1.8$. Threshold $T = 0.9 + 0.9 + \infty = \infty > R_k.\text{value}$, so continue.
  - Get object c from $S_3$; its score is already known, so f is not computed again. Threshold $T = 0.9 + 0.9 + 0.9 = 2.7 > R_k.\text{value}$, so continue.
- Second loop, ...
- Repeat until $T < R_k.\text{value}$.
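The TA walk-through can be replayed with a short Python sketch over the three lists from the slide. It makes two simplifying assumptions: the threshold is evaluated once per round of parallel sorted accesses rather than after every single access, and the stop test is "k-th score at least T". The function name `threshold_algorithm` is illustrative.

```python
import heapq

# The three sorted inputs from the slide (descending grade).
S1 = [("c", 0.9), ("d", 0.8), ("b", 0.6), ("e", 0.3), ("a", 0.1)]
S2 = [("a", 0.9), ("b", 0.8), ("e", 0.6), ("d", 0.4), ("c", 0.2)]
S3 = [("c", 0.9), ("a", 0.8), ("b", 0.6), ("d", 0.6), ("e", 0.5)]

def threshold_algorithm(lists, k):
    """TA with f = SUM; returns (top-k as (object, score) pairs, rounds used)."""
    grades = [dict(lst) for lst in lists]   # per-list lookup used for random access
    scores = {}                             # full score of every object seen so far
    top, rounds = [], 0
    for row in zip(*lists):                 # one sorted access per list, in parallel
        rounds += 1
        for obj, _ in row:
            if obj not in scores:           # random access to all lists for a new object
                scores[obj] = sum(g[obj] for g in grades)
        threshold = sum(grade for _, grade in row)   # T: sum of the last grades read
        top = heapq.nlargest(k, scores.items(), key=lambda kv: kv[1])
        if len(top) == k and top[-1][1] >= threshold:
            break
    return top, rounds

top1, rounds = threshold_algorithm([S1, S2, S3], k=1)
```

On this data the thresholds per round are 2.7, 2.4 and 1.8, so the scan stops after the third round. Objects b and c both sum to 2, so either one is a valid top-1 answer.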

Example of TA and NRA
- NRA maintains, for each object, an upper bound $r^{ub}$ and a lower bound $r^{lb}$ on its aggregate score.
- Initial setting, if the range of values is [0,1]:
  - $r^{lb} = \{0, 0, 0, 0, 0\}$, $r^{ub} = \{\infty, \infty, \infty, \infty, \infty\}$

Example of TA and NRA
Sorted inputs:
- $S_1$: c 0.9, d 0.8, b 0.6, e 0.3, a 0.1
- $S_2$: a 0.9, b 0.8, e 0.6, d 0.4, c 0.2
- $S_3$: c 0.9, a 0.8, b 0.6, d 0.6, e 0.5
NRA (bounds listed in the order a, b, c, d, e):
- Get object c (0.9), a (0.9), and c (0.9) from $S_1$, $S_2$, and $S_3$.
  - $r^{lb} = \{0.9, 0, 1.8, 0, 0\}$
    - Update the newly accessed objects,
    - e.g. $r_a^{lb} = 0.9 + r_a^{lb} = 0.9$
  - $r^{ub} = \{2.7, 0, 2.7, 0, 0\}$
    - Update the objects that have been seen so far,
    - e.g. $r_a^{ub} = 0.9 + 0.9 + 0.9 = 2.7$
  - $R = \{(c, 1.8)\}$
  - $t = \min\{r_x^{lb} : x \in R\} = 1.8$
  - $u = \max\{r_x^{ub} : x \notin R\} = 2.7$
  - If $t < u$, repeat; otherwise, stop.
- Get object d (0.8), b (0.8), and a (0.8) from $S_1$, $S_2$, and $S_3$ ...
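The NRA bookkeeping can be sketched as follows. The sketch assumes that $u$ ranges over objects outside $R$ and that an object never seen can still score at most the sum of the last grades read; the slide does not spell out either detail. The function name `nra_trace` is hypothetical.

```python
# The three sorted inputs from the slide (descending grade).
S1 = [("c", 0.9), ("d", 0.8), ("b", 0.6), ("e", 0.3), ("a", 0.1)]
S2 = [("a", 0.9), ("b", 0.8), ("e", 0.6), ("d", 0.4), ("c", 0.2)]
S3 = [("c", 0.9), ("a", 0.8), ("b", 0.6), ("d", 0.6), ("e", 0.5)]

def nra_trace(lists, k):
    """Run NRA with f = SUM and return one (R, t, u) entry per round."""
    m = len(lists)
    seen = {}                                  # object -> {list index: grade}
    trace = []
    for row in zip(*lists):                    # sorted access only, no random access
        last = [grade for _, grade in row]     # last grade read from each list
        for i, (obj, grade) in enumerate(row):
            seen.setdefault(obj, {})[i] = grade
        # Lower bound: sum of known grades. Upper bound: unknown grades replaced
        # by the last grade read from the corresponding list.
        lb = {o: sum(g.values()) for o, g in seen.items()}
        ub = {o: lb[o] + sum(last[i] for i in range(m) if i not in g)
              for o, g in seen.items()}
        R = sorted(lb, key=lb.get, reverse=True)[:k]
        t = min(lb[o] for o in R)
        # Best score any object outside R could still reach, including unseen ones.
        u = max([ub[o] for o in seen if o not in R] + [sum(last)])
        trace.append((R, t, u))
        if len(R) == k and t >= u:
            break
    return trace

trace = nra_trace([S1, S2, S3], k=1)
first_R, first_t, first_u = trace[0]
```

The first round reproduces the slide's values: $R = \{c\}$, $t = 1.8$ and $u = 2.7$. On this small dataset the bounds only close when the lists are exhausted.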

LARA
- Mamoulis proposed the LARA (Lattice-based Rank Aggregation) algorithm, which is an optimized NRA method.
- LARA separates the algorithm into two phases:
  - Growing phase: while $t = \min\{r_x^{lb} : x \in R\} < T$, it is impossible to attempt any pruning. T is the sum of the highest values still possible in each input; in the example above, T = 2.7 after the first loop.
  - Shrinking phase: if an object o was not seen in the growing phase, then o is not a result of the query. The $r^{ub}$ values are stored only in the lattice nodes instead of in the objects themselves, which avoids many updates to the objects seen so far.
[Figure: lattice over the subsets of {S1, S2, S3}.]

Conclusion of Top-k query computation
- NRA should perform better than TA in a conventional database, since it avoids many random accesses.
- LARA performs much better than NRA, as shown in its authors' experiments.

Related Works – Skyband
- The skyline consists of the points that are not dominated by any other point.
- A record $p_i$ is said to dominate another record $p_j$ if and only if $p_i$ is preferable to $p_j$ on every attribute.
- The skyline of a dataset contains all tuples that belong to the result of any top-1 query with a monotone function.
- The k-skyband contains the tuples that are dominated by at most k-1 other points.
[Figure: points p1 to p7, with the skyline and the 2-skyband marked.]
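A brute-force computation makes the two definitions concrete. The sketch assumes that larger values are preferable and uses the common form of dominance (no worse on every attribute, strictly better on at least one), which is slightly more permissive than the wording on the slide. The sample points are made up.

```python
def dominates(p, q):
    """p dominates q: no worse on every attribute and strictly better on at least one
    (assuming larger values are preferable)."""
    return all(a >= b for a, b in zip(p, q)) and any(a > b for a, b in zip(p, q))

def k_skyband(points, k):
    """Points dominated by at most k-1 others; k=1 gives the skyline. O(n^2) scan."""
    return [p for p in points
            if sum(dominates(q, p) for q in points) <= k - 1]

# Made-up two-dimensional points.
points = [(1, 5), (3, 4), (4, 1), (2, 2), (1, 1)]
skyline = k_skyband(points, 1)
two_skyband = k_skyband(points, 2)
```

Here (2, 2) is dominated only by (3, 4), so it joins the 2-skyband but not the skyline, while (1, 1) is dominated by all four other points and appears in neither.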

Related Works – Skyband
- The skyband is used to monitor the top-k results in score-time space.
- Assume that we want to monitor the top-2 results in the following example:
[Figure: two plots of score versus expiration time for points p1 to p5; set labels on the figure: {p1,p2}, {p1,p4}, {-}, {p1,p3}, {p4}.]

Top-k computation
- A grid-based index is used.
- For each cell c in the grid G, maxscore(c) is the maximum possible score of any point in c.
- For each query q:
  - Start: the algorithm starts from the cell c with the highest maxscore(c).
  - Termination: the search terminates when the cell c under consideration has maxscore(c) $\le R_k.\text{value}$.

Top-k computation
- An example shows how the top-k computation works.
- Assume two inputs ($x_1$ and $x_2$) and the function $f = x_1 + 2x_2$.
- The cell with the highest maxscore is $c_{4,4}$, with maxscore($c_{4,4}$) = f(P).
  - Scan $c_{4,4}$.
  - The next cell to scan is $c_{3,4}$, because maxscore(P') > maxscore(P'').
  - ...
  - Continue until maxscore(c) $\le R_k.\text{value}$.
[Figure: grid from $c_{1,1}$ to $c_{4,4}$ with corner points P, P', P'', P''', P'''' and data points p1, p2, p3.]
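The cell-by-cell search can be sketched as below. This is a simplification under stated assumptions: f is increasing on every dimension (so the upper corner of a cell bounds its scores), only non-empty cells are stored, and the cells are simply sorted by maxscore up front. The function names, the cell size and the sample points are all illustrative.

```python
import heapq
from collections import defaultdict

def build_grid(points, cell_size):
    """Map each point to the grid cell (tuple of integer indices) that contains it."""
    grid = defaultdict(list)
    for p in points:
        grid[tuple(int(x // cell_size) for x in p)].append(p)
    return grid

def maxscore(cell, cell_size, f):
    # For f increasing on every dimension, the upper corner of the cell bounds its scores.
    return f(tuple((i + 1) * cell_size for i in cell))

def grid_top_k(grid, cell_size, f, k):
    """Visit non-empty cells in descending maxscore; stop when maxscore <= k-th score."""
    result = []      # min-heap of (score, point); result[0] is the k-th best so far
    visited = []
    for cell in sorted(grid, key=lambda c: maxscore(c, cell_size, f), reverse=True):
        if len(result) == k and maxscore(cell, cell_size, f) <= result[0][0]:
            break
        visited.append(cell)
        for p in grid[cell]:
            if len(result) < k:
                heapq.heappush(result, (f(p), p))
            elif f(p) > result[0][0]:
                heapq.heapreplace(result, (f(p), p))
    return sorted(result, reverse=True), visited

f = lambda p: p[0] + 2 * p[1]
points = [(0.9, 0.9), (0.8, 0.3), (0.2, 0.8), (0.1, 0.1)]
top, visited = grid_top_k(build_grid(points, 0.25), 0.25, f, k=1)
```

With these points only the top-right cell is scanned: its best point scores about 2.7, which already exceeds the maxscore of 2.25 of the next cell.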

The maintenance module
- Given two sets: $P_{ins}$ and $P_{del}$
- For all $p \in P_{ins}$:
  - Insert p into the corresponding cell c.
  - For every query q that has visited c, insert p into q.R if $f(p) \ge q.R_k.\text{value}$.
- For all $p \in P_{del}$:
  - Delete p from the corresponding cell c.
  - For every query q that has visited c, if $p \in q.R$, mark q as affected.

The maintenance module
- For each affected query q:
  - Invoke Top-k Computation(q).
  - For every cell c that is not scanned by Top-k Computation(q), delete q from c.visitedquery.

Example of maintenance module
- q: $f = x_1 + 2x_2$, find the top-1 result
- Timestamp 1: $P_{ins} = \{p_3, p_4\}$, $P_{del} = \{p_1, p_2\}$
- Timestamp 2: $P_{ins} = \{p_5\}$, $P_{del} = \{p_3\}$
[Figure: grid with points p1 to p5.]
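The insert and delete rules of the maintenance module can be replayed on this example with a stripped-down sketch. It leaves out the grid and the per-cell query lists and replaces Top-k Computation(q) with a brute-force scan of the window, so it only illustrates when a re-computation is triggered. The class name `TopKMonitor` and the coordinates of p1 to p5 are invented for illustration.

```python
import heapq

class TopKMonitor:
    """Single-query sketch: inserts update the result in place,
    deleting a result point triggers a re-computation from scratch."""

    def __init__(self, f, k):
        self.f, self.k = f, k
        self.window = set()       # all valid points
        self.result = []          # (score, point) pairs, best first
        self.recomputations = 0

    def update(self, p_ins, p_del):
        affected = False
        for p in p_ins:
            self.window.add(p)
            s = self.f(p)
            if len(self.result) < self.k or s >= self.result[-1][0]:
                self.result.append((s, p))
                self.result.sort(reverse=True)
                del self.result[self.k:]
        for p in p_del:
            self.window.discard(p)
            if any(q == p for _, q in self.result):
                affected = True   # a current result point expired
        if affected:
            self.recomputations += 1
            self.result = heapq.nlargest(self.k, ((self.f(p), p) for p in self.window))
        return [p for _, p in self.result]

f = lambda p: p[0] + 2 * p[1]
# Invented coordinates for p1..p5; the slide's figure did not survive.
p1, p2, p3, p4, p5 = (0.2, 0.9), (0.5, 0.3), (0.6, 0.6), (0.3, 0.4), (0.4, 0.9)

monitor = TopKMonitor(f, k=1)
r0 = monitor.update([p1, p2], [])
r1 = monitor.update([p3, p4], [p1, p2])   # timestamp 1: p1 was the result, so recompute
r2 = monitor.update([p5], [p3])           # timestamp 2: p5 replaces p3 on insertion
```

With these coordinates the result goes from p1 to p3 to p5, and only timestamp 1 triggers a re-computation, because at timestamp 2 the expiring p3 has already been pushed out of the result by p5.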

Summary of the maintenance module
- Insertion does not invoke any top-k re-computation.
- Deletion costs more than insertion:
  - An affected query needs to run the top-k computation again.
  - The cells that are not scanned by the new top-k computation must be updated; in the worst case this touches every cell of the grid.

Skyband Monitoring Algorithm
- A previous slide demonstrated how the k-skyband can be used to monitor the results in score-time space.
- The dominance counter (DC) can be used to obtain the k-skyband.
- p.DC is the number of records with a higher score than p that expire after p.
[Figure: score versus expiration time for points p1 to p6 while monitoring a top-2 query.]

Skyband Monitoring Algorithm
- The dominance counters can be computed with a balanced tree (BT).
- The expiration time of every processed element of q.skyband is stored in a balanced tree BT sorted in descending order.
- Elements are inserted in descending score order.
- p.DC is simply the number of tuples that precede p in BT.
[Figure: score versus expiration time for points p1 to p5, and the balanced tree as the points are inserted.]
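The counting step can be sketched with a sorted list standing in for the balanced tree. The list is kept in ascending order, so "precedes p in BT" becomes "lies to the right of p's expiration time". The sketch assumes distinct scores; the function name and the sample tuples are made up. A Python list gives O(n) insertion, so a real balanced tree would be needed for the logarithmic cost the slide implies.

```python
import bisect

def dominance_counters(tuples):
    """tuples: (name, score, expiration time). Returns {name: DC}."""
    expirations = []     # kept sorted ascending; stands in for the balanced tree BT
    dc = {}
    for name, score, exp in sorted(tuples, key=lambda t: t[1], reverse=True):
        # Everything already inserted has a higher score; count those expiring later.
        dc[name] = len(expirations) - bisect.bisect_right(expirations, exp)
        bisect.insort(expirations, exp)
    return dc

# Made-up (name, score, expiration time) tuples.
data = [("p1", 9, 5), ("p2", 8, 2), ("p3", 7, 6), ("p4", 6, 4), ("p5", 5, 1)]
dc = dominance_counters(data)
two_skyband = [name for name, _, _ in data if dc[name] < 2]
```

For a top-2 query only the tuples with DC below 2 need to be kept, which here leaves p1, p2 and p3.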

Skyband Monitoring Algorithm
- Given two sets: $P_{ins}$ and $P_{del}$
- For all $p \in P_{ins}$:
  - Insert p into the corresponding cell c.
  - For every query q that has visited c, if $f(p) \ge q.R_k.\text{value}$:
    - Insert p into q.skyband and set p.DC = 0.
    - For each p' in q.skyband with $f(p') \le f(p)$:
      - Update p'.DC = p'.DC + 1.
      - If p'.DC = k, evict p' from q.skyband.

Skyband Monitoring Algorithm
- For all $p \in P_{del}$:
  - Delete p from the corresponding cell c.
  - For every query q that has visited c, if $p \in q.R$, delete p from q.skyband.
- For every query q whose skyband has changed:
  - If q.skyband has at least k points, set q.R = top-k(q.skyband).
  - Else, invoke Top-k Computation(q) and compute the dominance counters.
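The two halves of the skyband monitoring algorithm can be combined into a single-query sketch. Several things here are assumptions rather than the presented method: the grid is left out, the from-scratch step is a brute-force scan, points are assumed to expire in arrival order, and the admission threshold is held fixed at the k-th score found by the most recent from-scratch computation. The last choice is deliberate: a threshold that followed the current k-th score could skip a point that is needed after earlier points expire. The class name `SkybandMonitor` and the sample data are hypothetical.

```python
import heapq

class SkybandMonitor:
    def __init__(self, score, k):
        self.score, self.k = score, k
        self.window = []                  # valid points, oldest (first to expire) first
        self.skyband = {}                 # point -> [score, dominance counter]
        self.top_score = float("-inf")    # admission threshold (assumption, see lead-in)
        self.recomputations = 0

    def _from_scratch(self):
        self.recomputations += 1
        ranked = heapq.nlargest(self.k, enumerate(self.window),
                                key=lambda ip: self.score(ip[1]))
        self.skyband = {}
        for i, p in ranked:
            s = self.score(p)
            # A later position in the window means a later expiration time.
            dc = sum(1 for j, q in ranked if j > i and self.score(q) >= s)
            self.skyband[p] = [s, dc]
        full = len(ranked) == self.k
        self.top_score = min(v[0] for v in self.skyband.values()) if full else float("-inf")

    def update(self, p_ins, p_del):
        changed = False
        for p in p_ins:
            self.window.append(p)
            s = self.score(p)
            if s >= self.top_score:
                for q in list(self.skyband):
                    if self.skyband[q][0] <= s:      # p dominates q in score-time space
                        self.skyband[q][1] += 1
                        if self.skyband[q][1] >= self.k:
                            del self.skyband[q]
                self.skyband[p] = [s, 0]
                changed = True
        for p in p_del:
            self.window.remove(p)
            if self.skyband.pop(p, None) is not None:
                changed = True
        if changed and len(self.skyband) < self.k:
            self._from_scratch()
        best = heapq.nlargest(self.k, self.skyband.items(), key=lambda kv: kv[1][0])
        return [p for p, _ in best]

# Hypothetical one-dimensional stream: (name, score) pairs, expiring in arrival order.
A, B, C, D = ("A", 10), ("B", 9), ("C", 9.5), ("D", 9.2)
m = SkybandMonitor(score=lambda p: p[1], k=2)
r1 = m.update([A, B], [])
r2 = m.update([C], [])
r3 = m.update([D], [])      # B now has two dominators (C and D) and is evicted
r4 = m.update([], [A])      # A expires; the skyband still holds C and D
```

After A expires the answer {C, D} is read directly from the skyband, so no from-scratch computation is needed in this run.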

Experimental Evaluation
- The proposed methods are evaluated on streams of both independent (IND) and anti-correlated (ANT) datasets.
[Figure: sample IND (d=2) and ANT (d=2) data distributions.]

Experimental Evaluation
- Default experimental setting:
  - Data dimensionality (d): 4
  - Data cardinality (N): 1M
  - Arrival rate (r): 10K
  - Query cardinality (Q): 1K
  - Result cardinality (k): 20

Experimental Evaluation
[Figures: three slides of result charts; the plots did not survive extraction.]

Conclusions
- The top-k computation module processes the minimum number of cells.
- Two monitoring algorithms are proposed: TMA and SMA.
  - TMA re-computes the result from scratch.
  - SMA maintains a superset of the current answer in the form of a k-skyband.
- In the experimental evaluation, SMA outperforms the other proposed solutions.

Future works
- Non-monotone preference functions
- Support for queries of varying dimensionality
- Clustering the queries into a super query SQ and monitoring the results for the whole set of clustered queries

Thank you for your attention!
PS. I hope I can reach this page in time!
