Reverse Top-k Queries Akrivi Vlachou , Christos Doulkeridis , Yannis Kotidis #, Kjetil Nørvåg * *Norwegian University of Science and Technology (NTNU),

Slides:

Advertisements

Similar presentations

Answering Approximate Queries over Autonomous Web Databases Xiangfu Meng, Z. M. Ma, and Li Yan College of Information Science and Engineering, Northeastern.

Advertisements

Identifying the Most Influential Data Objects with Reverse Top-k Queries By Akrivi Vlachou 1, Christos Doulkeridis 1, Kjetil Nørvag 1 and Yannis Kotidis.

Indexing DNA Sequences Using q-Grams

Davide Mottin, Senjuti Basu Roy, Alice Marascu, Yannis Velegrakis, Themis Palpanas, Gautam Das A Probabilistic Optimization Framework for the Empty-Answer.

The A-tree: An Index Structure for High-dimensional Spaces Using Relative Approximation Yasushi Sakurai (NTT Cyber Space Laboratories) Masatoshi Yoshikawa.

Ranking Outliers Using Symmetric Neighborhood Relationship Wen Jin, Anthony K.H. Tung, Jiawei Han, and Wei Wang Advances in Knowledge Discovery and Data.

1 Top-K Algorithms: Concepts and Applications by Demetris Zeinalipour Visiting Lecturer Department of Computer Science University of Cyprus Department.

Computer Science and Engineering Inverted Linear Quadtree: Efﬁcient Top K Spatial Keyword Search Chengyuan Zhang 1,Ying Zhang 1,Wenjie Zhang 1, Xuemin.

VLDB 2011 Pohang University of Science and Technology (POSTECH) Republic of Korea Jongwuk Lee, Seung-won Hwang VLDB 2011.

Fast Algorithms For Hierarchical Range Histogram Constructions

PREFER: A System for the Efficient Execution of Multi-parametric Ranked Queries Vagelis Hristidis University of California, San Diego Nick Koudas AT&T.

Similarity Search on Bregman Divergence, Towards Non- Metric Indexing Zhenjie Zhang, Beng Chi Ooi, Srinivasan Parthasarathy, Anthony K. H. Tung.

School of Computer Science and Engineering Finding Top k Most Influential Spatial Facilities over Uncertain Objects Liming Zhan Ying Zhang Wenjie Zhang.

Reverse Furthest Neighbors in Spatial Databases Bin Yao, Feifei Li, Piyush Kumar Florida State University, USA.

Ming Hua, Jian Pei Simon Fraser UniversityPresented By: Mahashweta Das Wenjie Zhang, Xuemin LinUniversity of Texas at Arlington The University of New South.

Effectively Indexing Uncertain Moving Objects for Predictive Queries School of Computing National University of Singapore Department of Computer Science.

Using Trees to Depict a Forest Bin Liu, H. V. Jagadish EECS, University of Michigan, Ann Arbor Presented by Sergey Shepshelvich 1.

Quantile-Based KNN over Multi- Valued Objects Wenjie Zhang Xuemin Lin, Muhammad Aamir Cheema, Ying Zhang, Wei Wang The University of New South Wales, Australia.

Efficient Processing of Top-k Spatial Keyword Queries João B. Rocha-Junior, Orestis Gkorgkas, Simon Jonassen, and Kjetil Nørvåg 1 SSTD 2011.

Probabilistic Similarity Search for Uncertain Time Series Presented by CAO Chen 21 st Feb, 2011.

LSDS-IR’08, October 30, Peer-to-Peer Similarity Search over Widely Distributed Document Collections Christos Doulkeridis 1, Kjetil Nørvåg 2, Michalis.

Preference Analysis Joachim Giesen and Eva Schuberth May 24, 2006.

Reaching the Top-k of the Skyline: A efficient Indexed Algorithm for Top-k Skyline Queries Marlene Goncalves and María-Esther Vidal Universidad Simón Bolívar,

Evaluation of Top-k OLAP Queries Using Aggregate R-trees Nikos Mamoulis (HKU) Spiridon Bakiras (HKUST) Panos Kalnis (NUS)

Query Operations: Automatic Global Analysis. Motivation Methods of local analysis extract information from local set of documents retrieved to expand.

Hashed Samples Selectivity Estimators for Set Similarity Selection Queries.

Discovering Interesting Subsets Using Statistical Analysis Maitreya Natu and Girish K. Palshikar Tata Research Development and Design Centre (TRDDC) Pune,

SUBSKY: Efficient Computation of Skylines in Subspaces Authors: Yufei Tao, Xiaokui Xiao, and Jian Pei Conference: ICDE 2006 Presenter: Kamiru Superviosr:

1 Progressive Computation of Constrained Subspace Skyline Queries Evangelos Dellis 1 Akrivi Vlachou 1 Ilya Vladimirskiy 1 Bernhard Seeger 1 Yannis Theodoridis.

The X-Tree An Index Structure for High Dimensional Data Stefan Berchtold, Daniel A Keim, Hans Peter Kriegel Institute of Computer Science Munich, Germany.

Learning user preferences for 2CP-regression for a recommender system Alan Eckhardt, Peter Vojtáš Department of Software Engineering, Charles University.

Towards Robust Indexing for Ranked Queries Dong Xin, Chen Chen, Jiawei Han Department of Computer Science University of Illinois at Urbana-Champaign VLDB.

Top-k Similarity Join over Multi- valued Objects Wenjie Zhang Jing Xu, Xin Liang, Ying Zhang, Xuemin Lin The University of New South Wales, Australia.

PMLAB Finding Similar Image Quickly Using Object Shapes Heng Tao Shen Dept. of Computer Science National University of Singapore Presented by Chin-Yi Tsai.

Approximate XML Joins Huang-Chun Yu Li Xu. Introduction XML is widely used to integrate data from different sources. Perform join operation for XML documents:

A Quantitative Analysis and Performance Study For Similar- Search Methods In High- Dimensional Space Presented By Umang Shah Koushik.

Computer Science and Engineering Efficiently Monitoring Top-k Pairs over Sliding Windows Presented By: Zhitao Shen 1 Joint work with Muhammad Aamir Cheema.

Exploiting Context Analysis for Combining Multiple Entity Resolution Systems -Ramu Bandaru Zhaoqi Chen Dmitri V.kalashnikov Sharad Mehrotra.

Distributed Spatio-Temporal Similarity Search Demetrios Zeinalipour-Yazti University of Cyprus Song Lin

Efficient Processing of Top-k Spatial Preference Queries

On Computing Top-t Influential Spatial Sites Authors: T. Xia, D. Zhang, E. Kanoulas, Y.Du Northeastern University, USA Appeared in: VLDB 2005 Presenter:

9/2/2005VLDB 2005, Trondheim, Norway1 On Computing Top-t Most Influential Spatial Sites Tian Xia, Donghui Zhang, Evangelos Kanoulas, Yang Du Northeastern.

The university of Hong Kong Department of Computer Science Continuous Monitoring of Top-k Queries over Sliding Windows Authors: Kyriakos Mouratidis, Spiridon.

Intelligent Database Systems Lab N.Y.U.S.T. I. M. Supporting personalized ranking over categorical attributes Presenter : Lin, Shu-Han Authors : Gae-won.

All right reserved by Xuehua Shen 1 Optimal Aggregation Algorithms for Middleware Ronald Fagin, Amnon Lotem, Moni Naor (PODS01)

Supporting Top-k join Queries in Relational Databases Ihab F. Ilyas, Walid G. Aref, Ahmed K. Elmagarmid Presented by: Z. Joseph, CSE-UT Arlington.

A Smart Pre-Classifier to Reduce Power Consumption of TCAMs for Multi-dimensional Packet Classification Yadi Ma, Suman Banerjee University of Wisconsin-Madison.

Answering Top-k Queries Using Views Gautam Das (Univ. of Texas), Dimitrios Gunopulos (Univ. of California Riverside), Nick Koudas (Univ. of Toronto), Dimitris.

What does the Cloud mean for Data Management: Challenges and Opportunities Akrivi Vlachou Norwegian University of Science and Technology (NTNU), Trondheim,

Information Technology Selecting Representative Objects Considering Coverage and Diversity Shenlu Wang 1, Muhammad Aamir Cheema 2, Ying Zhang 3, Xuemin.

Efficient Computation of Combinatorial Skyline Queries Author: Yu-Chi Chung, I-Fang Su, and Chiang Lee Source: Information Systems, 38(2013), pp

Taxonomy Caching: A Scalable Low- Cost Mechanism for Indexing Remote Contents in Peer-to-Peer Systems Kjetil Nørvåg Norwegian University of Science and.

Information Technology Influence Computation in Spatial Dabases Muhammad Aamir Cheema Faculty of Information Technology Monash University, Australia

On Top-n Reverse Top-k Queries: Variants, Algorithms, and Applications 陳良弼 Arbee L.P. Chen National Chengchi University 9/21/2012 at NCHU.

Finding skyline on the fly HKU CS DB Seminar 21 July 2004 Speaker: Eric Lo.

Optimal Aggregation Algorithms for Middleware By Ronald Fagin, Amnon Lotem, and Moni Naor.

Answering Top-k Queries with Multi-Dimensional Selections: The Ranking Cube Approach Dong Xin, Jiawei Han, Hong Cheng, Xiaolei Li Department of Computer.

Answering Why-not Questions on Top-K Queries Andy He and Eric Lo The Hong Kong Polytechnic University.

03/02/20061 Evaluating Top-k Queries Over Web-Accessible Databases Amelie Marian Nicolas Bruno Luis Gravano Presented By: Archana and Muhammed.

Similarity Measurement and Detection of Video Sequences Chu-Hong HOI Supervisor: Prof. Michael R. LYU Marker: Prof. Yiu Sang MOON 25 April, 2003 Dept.

Rethinking Choices for Multi-dimensional Point Indexing You Jung Kim and Jignesh M. Patel University of Michigan.

Probabilistic Skylines on Uncertain Data (VLDB2007) Jian Pei et al Supervisor: Dr Benjamin Kao Presenter: For Date: 22 Feb 2008 ??: the possible world.

Computer Science and Engineering Jianye Yang 1, Ying Zhang 2, Wenjie Zhang 1, Xuemin Lin 1 Influence based Cost Optimization on User Preference 1 The University.

A Uniﬁed Framework for Efﬁciently Processing Ranking Related Queries

Abolfazl Asudeh Azade Nazi Nan Zhang Gautam DaS

Probabilistic Data Management

Similarity Search: A Matching Based Approach

The Skyline Query in Databases Which Objects are the Most Important?

Efficient Processing of Top-k Spatial Preference Queries

Donghui Zhang, Tian Xia Northeastern University

Presentation transcript:

Reverse Top-k Queries Akrivi Vlachou *, Christos Doulkeridis *, Yannis Kotidis #, Kjetil Nørvåg * *Norwegian University of Science and Technology (NTNU), Trondheim, Norway # Athens University of Economics and Business (AUEB), Greece

2 Outline Motivation & Preliminaries Monochromatic Reverse Top-k Queries Bichromatic Reverse Top-k Queries Threshold-based Algorithm Materialized Views Experimental Evaluation Conclusions & Future Work

3 Rank-aware Query Processing Huge amount of available data Users prefer to retrieve a limited set of k ranked data objects that best match their preferences (top-k queries)

4 Top-k Query Given a scoring function f(), retrieve the k object that best match the user preferences Linear scoring function f w (p) = Σ w[i]*p[i] Weight w[i]: relative importance of attribute i Definition TOP k (w): Given a weighting vector w and a positive integer k, find the k data points p with the minimum f(p) scores Query line of w at point p: defines the score of p Query space of w defined by point p: number of enclosed points determines the rank of p

5 From the perspective of manufacturers: it is important that a product is returned in the highest ranked positions for as many user preferences as possible estimate the impact of a product compared to their competitors products advertise a product to potential customers Reversing the Top-k Query sales representative customer Which customers would be interested?

6 Reverse top-k query: Given a potential product q and a positive integer k, which are the weighting vectors w for which q is in the top-k query result set? Two different versions Monochromatic: no knowledge of user preferences Bichromatic: a dataset with user preferences is given Reversing the Top-k Query sales representative customer

7 Car Database Example A database containing information about different cars Different users have different preferences Bob prefers a cheap car, and does not care much about the age the best choice (top-1) for Bob is the car p 1 with score 2.5 Tom prefers a newer car rather than a cheap car the best choice for Tom and Max is the car p 2

8 Car Database Example Query point q=p 2, k=1: Bichromatic reverse top-k: {(0.2,0.8), (0.5,0.5)} advertise product to Tom and Max Monochromatic reverse top-k: line segment w[price]=[1/7,5/6] estimate the impact of p 2 as 69% Query point q=p 3, k=1: empty result set for the bichromatic query

9 Outline Motivation & Preliminaries Monochromatic Reverse Top-k Queries Bichromatic Reverse Top-k Queries Threshold-based Algorithm Materialized Views Experimental Evaluation Conclusions & Future Work

10 Monochromatic Reverse Top-k Query mRTOP k (q): Given a point q, a positive number k and a dataset S, the result set of the monochromatic reverse top-k query is the locus for which there exists p in TOP k (w i ) such that f wi (q) ≤ f wi (p). The solution space W can be split into a finite set of non- adjacent partitions such that query point q has the same rank for all the weighting vectors. For the monochromatic case: we focus on the 2-d space Solution space 2 2 mRTOP 1 (q) 1

11 Geometric Interpretation d=2, k =1 If q belongs to the convex hull, then there exists exactly one partition in mRTOP 1 (q) Weighting vectors that are perpendicular to pq and qr define the line segment For weighting vectors with smaller and larger slopes than w 1, the relative order of p and q changes Monochromatic reverse top-k, k>1: The solution space may contain more than 1 partition

12 Geometric Interpretation, k >1 Monochromatic reverse top-k, k>1: The solution space may contain more than 1 partition Point q is in the top-2 result set for both w 1 and w 3, but not for w w1w1 w2w2 w3w3 w2w2

13 Example of Algorithm Points that dominate q are ranked higher for any w Points that are dominated by q are ranked lower for any w Incomparable points change their relative ranking once Vectors that are perpendicular to the line segments p i q define the partitions

14 Outline Motivation & Preliminaries Monochromatic Reverse Top-k Queries Bichromatic Reverse Top-k Queries Threshold-based Algorithm Materialized Views Experimental Evaluation Conclusions & Future Work

15 Bichromatic Reverse Top-k Query bRTOPk(q): Given a point q, a positive number k and two datasets S and W, where S represents data points and W is a dataset containing different weighting vectors, a weighting vector w i belongs to the result set, if and only if there exists p in TOP k (w i ) such that f wi (q) ≤ f wi (p) Naïve approach: for each weighting vector process the top-k query test if query point q is in the top-k list

16 Threshold-based Algorithm (RTA) Goal: reduce the number of top-k evaluations by discarding weighting vectors Threshold-based Algorithm (RTA): sort the weighting vectors based on pairwise similarity top-k queries defined by similar vectors, have similar result sets evaluate the first top-k query, calculate a threshold For each weighting vector possibly prune based on threshold refine threshold

17 Example of RTA Algorithm (k=2) Evaluate top-2 query for w 1 Set threshold based on w 2 f w2 (q) > threshold  discard w 2 Refine threshold for w 3 W=[ w 1, w 2, w 3 ] Buffer: p1, p2 w1w1 q p4p4 p1p1 p2p2 p3p3 p5p5 p6p6 p7p7 p8p8 p9p9 p 10 w2w2 w3w3

18 Materialized Views Threshold-based Algorithm (RTA) reduce the top-k evaluations by discarding some weighting vectors that are not in the reverse top-k result set process at least as many top-k evaluations as the cardinality of the result set Materialized Views find weighting vectors that belong definitely to the result without top-k evaluation

19 Materialized Views Grid-based space partitioning cell C i lower left corner C i L upper right corner C i U We store for each cell C i the results of reverse top-k queries for corners C i L and C i U w 1, w 2, w 3

20 Materialized Views Given a point q enclosed in cell C i all weighting vectors in RTOP k (C i U ) belong to the result set of q only weighting vectors in RTOP k (C i L ) - RTOP k (C i U ) have to be examined Materialized views can be generalized for arbitrary k<K values w 1, w 2, w 3 w 1, w 2, w 3, w 4

21 Outline Motivation & Preliminaries Monochromatic Reverse Top-k Queries Bichromatic Reverse Top-k Queries Threshold-based Algorithm Materialized Views Experimental Evaluation Conclusions & Future Work

22 Experimental Setup Comparison between Naïve and RTA (varying dimensionality, cardinality, data distribution – real data) Queries: uniform and k-skyband points Metrics: time I/Os number of top-k evaluations

23 RTA vs. Naïve RTA outperforms naive by 1 to 2 orders of magnitude as dimensionality increases, |RTOP k (q)| decreases leading to fewer top-k evaluations uniform distribution of S and uniform weights W |S|=10K, |W|=10K, top-k=10, skyband query points

24 Scalability of RTA Algorithm naive requires |W| top-k query evaluations |W|=5K, correlated dataset: RTA needs on 544 out of 5000 top-k evaluations (saves 89.12% of the cost) the average size of the result set is 459 various distributions (UN, AC, CO) of S and uniform weights W |S|=10K or |W|=10K, d=5, top-k=10, skyband query points

25 Performance of RTA on Real Data uniform and clustered weights W (|W|=10K) clustered weights lead to fewer top-k evaluations NBA consists of tuples, d=5 (number of points scored, rebounds, assists, steals and blocks) HOUSE consists of tuples, d=6 (income spent on gas, electricity, water, heating, insurance, and property tax)

26 Outline Motivation & Preliminaries Example of Reverse Top-k Queries Monochromatic Reverse Top-k Queries Bichromatic Reverse Top-k Queries Threshold-based Algorithm Materialized Views Experimental Evaluation Conclusions & Future Work

27 We introduced reverse top-k queries geometric interpretation of the solution space efficient algorithm for bichromatic reverse top-k query materialized reverse top-k views Future Work interpretation of solution space for higher dimensions (monochromatic reverse top-k) improve the performance of the bichromatic reverse top-k computation Conclusions and Future Work

28 Thank you! More information: Related work: Akrivi Vlachou, Christos Doulkeridis, Yannis Kotidis, Kjetil Nørvåg: "Reverse Top-k Queries" Akrivi Vlachou, Christos Doulkeridis, Kjetil Nørvåg, Yannis Kotidis: "Identifying the Most Influential Data Objects with Reverse Top-k Queries"