Presentation is loading. Please wait.

Presentation is loading. Please wait.

Reverse Top-k Queries Akrivi Vlachou *, Christos Doulkeridis *, Yannis Kotidis #, Kjetil Nørvåg * *Norwegian University of Science and Technology (NTNU),

Similar presentations


Presentation on theme: "Reverse Top-k Queries Akrivi Vlachou *, Christos Doulkeridis *, Yannis Kotidis #, Kjetil Nørvåg * *Norwegian University of Science and Technology (NTNU),"— Presentation transcript:

1 Reverse Top-k Queries Akrivi Vlachou *, Christos Doulkeridis *, Yannis Kotidis #, Kjetil Nørvåg * *Norwegian University of Science and Technology (NTNU), Trondheim, Norway # Athens University of Economics and Business (AUEB), Greece

2 2 Outline Motivation & Preliminaries Monochromatic Reverse Top-k Queries Bichromatic Reverse Top-k Queries Threshold-based Algorithm Materialized Views Experimental Evaluation Conclusions & Future Work

3 3 Rank-aware Query Processing Huge amount of available data Users prefer to retrieve a limited set of k ranked data objects that best match their preferences (top-k queries)

4 4 Top-k Query Given a scoring function f(), retrieve the k object that best match the user preferences Linear scoring function f w (p) = Σ w[i]*p[i] Weight w[i]: relative importance of attribute i Definition TOP k (w): Given a weighting vector w and a positive integer k, find the k data points p with the minimum f(p) scores Query line of w at point p: defines the score of p Query space of w defined by point p: number of enclosed points determines the rank of p

5 5 From the perspective of manufacturers: it is important that a product is returned in the highest ranked positions for as many user preferences as possible estimate the impact of a product compared to their competitors products advertise a product to potential customers Reversing the Top-k Query sales representative customer Which customers would be interested?

6 6 Reverse top-k query: Given a potential product q and a positive integer k, which are the weighting vectors w for which q is in the top-k query result set? Two different versions Monochromatic: no knowledge of user preferences Bichromatic: a dataset with user preferences is given Reversing the Top-k Query sales representative customer

7 7 Car Database Example A database containing information about different cars Different users have different preferences Bob prefers a cheap car, and does not care much about the age the best choice (top-1) for Bob is the car p 1 with score 2.5 Tom prefers a newer car rather than a cheap car the best choice for Tom and Max is the car p 2

8 8 Car Database Example Query point q=p 2, k=1: Bichromatic reverse top-k: {(0.2,0.8), (0.5,0.5)} advertise product to Tom and Max Monochromatic reverse top-k: line segment w[price]=[1/7,5/6] estimate the impact of p 2 as 69% Query point q=p 3, k=1: empty result set for the bichromatic query

9 9 Outline Motivation & Preliminaries Monochromatic Reverse Top-k Queries Bichromatic Reverse Top-k Queries Threshold-based Algorithm Materialized Views Experimental Evaluation Conclusions & Future Work

10 10 Monochromatic Reverse Top-k Query mRTOP k (q): Given a point q, a positive number k and a dataset S, the result set of the monochromatic reverse top-k query is the locus for which there exists p in TOP k (w i ) such that f wi (q) ≤ f wi (p). The solution space W can be split into a finite set of non- adjacent partitions such that query point q has the same rank for all the weighting vectors. For the monochromatic case: we focus on the 2-d space Solution space 2 2 mRTOP 1 (q) 1

11 11 Geometric Interpretation d=2, k =1 If q belongs to the convex hull, then there exists exactly one partition in mRTOP 1 (q) Weighting vectors that are perpendicular to pq and qr define the line segment For weighting vectors with smaller and larger slopes than w 1, the relative order of p and q changes Monochromatic reverse top-k, k>1: The solution space may contain more than 1 partition

12 12 Geometric Interpretation, k >1 Monochromatic reverse top-k, k>1: The solution space may contain more than 1 partition Point q is in the top-2 result set for both w 1 and w 3, but not for w 2 2 2 3 w1w1 w2w2 w3w3 w2w2

13 13 Example of Algorithm Points that dominate q are ranked higher for any w Points that are dominated by q are ranked lower for any w Incomparable points change their relative ranking once Vectors that are perpendicular to the line segments p i q define the partitions

14 14 Outline Motivation & Preliminaries Monochromatic Reverse Top-k Queries Bichromatic Reverse Top-k Queries Threshold-based Algorithm Materialized Views Experimental Evaluation Conclusions & Future Work

15 15 Bichromatic Reverse Top-k Query bRTOPk(q): Given a point q, a positive number k and two datasets S and W, where S represents data points and W is a dataset containing different weighting vectors, a weighting vector w i belongs to the result set, if and only if there exists p in TOP k (w i ) such that f wi (q) ≤ f wi (p) Naïve approach: for each weighting vector process the top-k query test if query point q is in the top-k list

16 16 Threshold-based Algorithm (RTA) Goal: reduce the number of top-k evaluations by discarding weighting vectors Threshold-based Algorithm (RTA): sort the weighting vectors based on pairwise similarity top-k queries defined by similar vectors, have similar result sets evaluate the first top-k query, calculate a threshold For each weighting vector possibly prune based on threshold refine threshold

17 17 Example of RTA Algorithm (k=2) Evaluate top-2 query for w 1 Set threshold based on w 2 f w2 (q) > threshold  discard w 2 Refine threshold for w 3 W=[ w 1, w 2, w 3 ] Buffer: p1, p2 w1w1 q p4p4 p1p1 p2p2 p3p3 p5p5 p6p6 p7p7 p8p8 p9p9 p 10 w2w2 w3w3

18 18 Materialized Views Threshold-based Algorithm (RTA) reduce the top-k evaluations by discarding some weighting vectors that are not in the reverse top-k result set process at least as many top-k evaluations as the cardinality of the result set Materialized Views find weighting vectors that belong definitely to the result without top-k evaluation

19 19 Materialized Views Grid-based space partitioning cell C i lower left corner C i L upper right corner C i U We store for each cell C i the results of reverse top-k queries for corners C i L and C i U w 1, w 2, w 3

20 20 Materialized Views Given a point q enclosed in cell C i all weighting vectors in RTOP k (C i U ) belong to the result set of q only weighting vectors in RTOP k (C i L ) - RTOP k (C i U ) have to be examined Materialized views can be generalized for arbitrary k<K values w 1, w 2, w 3 w 1, w 2, w 3, w 4

21 21 Outline Motivation & Preliminaries Monochromatic Reverse Top-k Queries Bichromatic Reverse Top-k Queries Threshold-based Algorithm Materialized Views Experimental Evaluation Conclusions & Future Work

22 22 Experimental Setup Comparison between Naïve and RTA (varying dimensionality, cardinality, data distribution – real data) Queries: uniform and k-skyband points Metrics: time I/Os number of top-k evaluations

23 23 RTA vs. Naïve RTA outperforms naive by 1 to 2 orders of magnitude as dimensionality increases, |RTOP k (q)| decreases leading to fewer top-k evaluations uniform distribution of S and uniform weights W |S|=10K, |W|=10K, top-k=10, skyband query points

24 24 Scalability of RTA Algorithm naive requires |W| top-k query evaluations |W|=5K, correlated dataset: RTA needs on 544 out of 5000 top-k evaluations (saves 89.12% of the cost) the average size of the result set is 459 various distributions (UN, AC, CO) of S and uniform weights W |S|=10K or |W|=10K, d=5, top-k=10, skyband query points

25 25 Performance of RTA on Real Data uniform and clustered weights W (|W|=10K) clustered weights lead to fewer top-k evaluations NBA consists of 17265 tuples, d=5 (number of points scored, rebounds, assists, steals and blocks) HOUSE consists of 127930 tuples, d=6 (income spent on gas, electricity, water, heating, insurance, and property tax)

26 26 Outline Motivation & Preliminaries Example of Reverse Top-k Queries Monochromatic Reverse Top-k Queries Bichromatic Reverse Top-k Queries Threshold-based Algorithm Materialized Views Experimental Evaluation Conclusions & Future Work

27 27 We introduced reverse top-k queries geometric interpretation of the solution space efficient algorithm for bichromatic reverse top-k query materialized reverse top-k views Future Work interpretation of solution space for higher dimensions (monochromatic reverse top-k) improve the performance of the bichromatic reverse top-k computation Conclusions and Future Work

28 28 Thank you! More information: http://www.idi.ntnu.no/~vlachou/ Related work: Akrivi Vlachou, Christos Doulkeridis, Yannis Kotidis, Kjetil Nørvåg: "Reverse Top-k Queries" Akrivi Vlachou, Christos Doulkeridis, Kjetil Nørvåg, Yannis Kotidis: "Identifying the Most Influential Data Objects with Reverse Top-k Queries"


Download ppt "Reverse Top-k Queries Akrivi Vlachou *, Christos Doulkeridis *, Yannis Kotidis #, Kjetil Nørvåg * *Norwegian University of Science and Technology (NTNU),"

Similar presentations


Ads by Google