On Top-n Reverse Top-k Queries: Variants, Algorithms, and Applications 陳良弼 Arbee L.P. Chen National Chengchi University 9/21/2012 at NCHU.

1 On Top-n Reverse Top-k Queries: Variants, Algorithms, and Applications 陳良弼 Arbee L.P. Chen National Chengchi University 9/21/2012 at NCHU

2 IEEE International Conference on Data Engineering (ICDE)
– A premier international conference on databases
– Inaugural conference held in Los Angeles in 1984
– Held in Taiwan in 1995

3 ICDE2012 Research Papers Distribution
System Aspects
– Privacy and Security 8%
– Storage Management and Performance 7%
– Entity resolution/Versioning 7%
– Query Processing 31%
    Top-k query 9%
    Distributed/parallel/map-reduce 8%
    Location-aware 5%
    Execution Plan 5%
    Graph indexing 4%

4 Text/Web/Keyword Search 19%
Stream/Trajectory/Sequence/Spatio-Temporal 10%
Social Media 7%
Uncertain Database 6%
Data Mining 5%

5 Efficient Dual-Resolution Layer Indexing for Top-k Queries, ICDE2012
[Figure: hotels H1 to H9]
Hotel   price   distance to the airport   service
H1      0.55    0.4                       0.5
H2      0.45    0.6                       0.4
…
H9      0.5     0.5                       0.1

6 [Figure: hotels H1 to H9 plotted as (price, distance to the airport) points]
Points: (0.6, 0.2), (0.55, 0.4), (0.45, 0.6), (0.3, 0.7), (0.55, 0.3), (0.3, 0.6), (0.2, 0.7), (0.7, 0.4), (0.5, 0.5)
Scores shown: 0.525, 0.5, 0.45, 0.475, 0.425, 0.4, 0.55, 0.5

7 [Figure: hotels H1, H4, H5, H6, H7 plotted as (price, distance to the airport) points]
Points: (0.6, 0.2), (0.55, 0.4), (0.55, 0.3), (0.3, 0.6), (0.2, 0.7)
Hotel list: H7, H6, H4, H5, H1
Scores shown: 0.45, 0.475, 0.425, 0.4

8 Answering Why-not Questions on Top-k Queries, ICDE2012
Top-k query over (Cleanliness, Deliciousness, Parking spaces)
Points p1 to p6: (95, 80, 40), (70, 20, 30), (50, 90, 60), (75, 70, 50), (85, 60, 60), (58, 20, 30)
Top-2 with weights (0.4, 0.5, 0.1)
Scores shown: 82, 41, 71, 70, 36.2, 69

9 Why-not question
Why is p5 not in my top-2 result?
– Does p5 not exist?
– Should I change my weights?
– Should I revise my query to look for top-5 hotels?
Points p1 to p6 over (Cleanliness, Deliciousness, Parking spaces): (95, 80, 40), (70, 20, 30), (50, 90, 60), (75, 70, 50), (85, 60, 60), (58, 20, 30)
Original scores shown: 82, 41, 71, 69, 70, 36.2
Top-2 with weights (0.5, 0.4, 0.1); scores shown: 83.5, 46, 67, 70.5, 40, 71.7
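The why-not scenario on slides 8 and 9 can be sketched in Python, assuming a plain weighted-sum scoring function and the attribute tuples as transcribed (so an individual computed score may differ from the score printed on the slide). The names `top_k` and `points` are illustrative and not from the talk.

```python
def top_k(points, weights, k):
    """Return the names of the k points with the highest weighted-sum score."""
    def score(name):
        return sum(w * a for w, a in zip(weights, points[name]))
    return sorted(points, key=score, reverse=True)[:k]

# (Cleanliness, Deliciousness, Parking spaces), as transcribed from the slide
points = {
    "p1": (95, 80, 40),
    "p2": (70, 20, 30),
    "p3": (50, 90, 60),
    "p4": (75, 70, 50),
    "p5": (85, 60, 60),
    "p6": (58, 20, 30),
}

original = top_k(points, (0.4, 0.5, 0.1), 2)    # p5 is not returned
reweighted = top_k(points, (0.5, 0.4, 0.1), 2)  # p5 enters the result
enlarged = top_k(points, (0.4, 0.5, 0.1), 5)    # a larger k also admits p5
print(original, reweighted, enlarged)
```

Both refinements mentioned on slide 9, changing the weights or enlarging k, bring p5 into the result in this sketch.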

10 The Min-dist Location Selection Query, ICDE2012
[Figure: points c1 to c8, f1, f2, p1, p2; label: nearest facility distance]
Minimize the nearest facility distance

11 [Figure: points c1 to c8, f1, f2, p1; label: nearest facility distance]

12 [Figure: points c1 to c8, f1, f2, p2]

13 Introduction: kNN (k-Nearest Neighbors) Queries
Assume k = 3
[Figure: query point q with neighbors a, b, c]
kNN(q) = {a, b, c}
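A brute-force Python sketch of the kNN query, assuming Euclidean distance and made-up coordinates (the slide gives none). The function name `knn` is illustrative.

```python
import math

def knn(q, points, k):
    """Return the names of the k points nearest to q (brute force)."""
    ranked = sorted(points, key=lambda name: math.dist(points[name], q))
    return set(ranked[:k])

# Hypothetical coordinates chosen so that a, b, c are the three nearest to q
points = {"a": (1, 0), "b": (0, 2), "c": (-3, 0), "d": (5, 5)}
result = knn((0, 0), points, 3)
print(result)
```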

14 Introduction: RkNN (Reverse k-Nearest Neighbors) Queries
Assume k = 3
[Figure: query point q with points a and d]
RkNN(q) = {a, …}
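A brute-force Python sketch of the reverse query, assuming Euclidean distance and hypothetical coordinates. A point p is taken to belong to RkNN(q) when fewer than k other data points lie strictly closer to p than q does; the name `rknn` is illustrative.

```python
import math

def rknn(q, points, k):
    """Return the points that count q among their k nearest neighbors."""
    result = set()
    for name, p in points.items():
        d_q = math.dist(p, q)
        # Other data points that are strictly closer to p than q is
        closer = sum(
            1 for other, x in points.items()
            if other != name and math.dist(p, x) < d_q
        )
        if closer < k:
            result.add(name)
    return result

q = (0, 0)
# Hypothetical layout: a is near q, while d, e, f, g form a distant cluster
points = {"a": (1, 0), "d": (4, 0), "e": (4.5, 0), "f": (5, 0), "g": (5.5, 0)}
reverse = rknn(q, points, 3)
print(reverse)
```

In this layout d is one of the 3 nearest neighbors of q, yet q is not one of the 3 nearest neighbors of d, which is why the reverse query is not symmetric to the kNN query.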

15 Introduction: BRkNN (Bi-chromatic Reverse k-Nearest Neighbors) Queries
Two types of data
Assume k = 3
[Figure: query point q with points a and d]
BRkNN(q) = {a, …}

16 Application I
[Figure legend: shop, customer]
Which location is the best?

17 Top-n Reverse kNN Queries
Given two types of data, G (goal) and C (condition), retrieve the n data points from G that have the largest BRkNN values
[Figure: goal points g1, g2, g3 among the condition points]
Example: n = 2, k = 2
BR2NN value of g1 = 4
BR2NN value of g2 = 9
BR2NN value of g3 = 5
BR2 Top-2 = {g2, g3}
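A brute-force Python sketch of the top-n reverse kNN query, assuming Euclidean distance. The coordinates are hypothetical toy data, not the layout of the slide's figure, and `brknn_value` and `top_n_brknn` are illustrative names.

```python
import math

def brknn_value(g, goals, conditions, k):
    """Count the condition points that have g among their k nearest goal points."""
    count = 0
    for c in conditions:
        nearest = sorted(goals, key=lambda name: math.dist(goals[name], c))[:k]
        if g in nearest:
            count += 1
    return count

def top_n_brknn(goals, conditions, k, n):
    """Return the n goal points with the largest BRkNN values, plus all values."""
    values = {g: brknn_value(g, goals, conditions, k) for g in goals}
    ranked = sorted(goals, key=lambda g: values[g], reverse=True)
    return ranked[:n], values

# Hypothetical toy data on a line
goals = {"g1": (0, 0), "g2": (5, 0), "g3": (10, 0)}
conditions = [(1, 0), (2, 0), (4, 0), (6, 0), (7, 0), (9, 0), (11, 0)]
best, values = top_n_brknn(goals, conditions, k=2, n=2)
print(best, values)
```

This exhaustive version costs one sort per (goal, condition) pair; the filter-refinement framework on the following slides exists to avoid that cost.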

18 Voronoi Diagram of G
[Figure legend: goal point (VD-node), condition point]

19 A Filter-Refinement Framework for Solving BRkNN Queries
Assume k = 2
– Lower-bound region of VDi (layer 0)
– Upper-bound region of VDi (layer 0 ~ layer (k-1))
[Figure: VDi with Layer 0 and Layer 1]

20 Filter phase
Assume k = 2
Construct bisectors layer by layer to reduce the region
[Figure: VDi]

21 Refinement Phase
Assume k = 2
For a data point p, check the VDs at layer 1 ~ layer 2 to determine whether VDi is one of the 2NN of p
[Figure: VDi and p]

22 Refinement Phase
Assume k = 2
VDi: (VD13, 1.2) (VD26, 1.4) (VD27, 1.7) (VD3, 1.7) (VD4, 1.8) (VD30, 2.1) (VD5, 2.5) (VD7, 4.8) …
[Figure: VDi, p, and VD30, with distance labels 0.9, 2.1, and > 1.2]
dist(p, VD30) > 1.2

23 Refinement Phase
Assume k = 2
VDi: (VD13, 1.2) (VD26, 1.4) (VD27, 1.7) (VD3, 1.7) (VD4, 1.8) (VD30, 2.1) (VD5, 2.5) (VD7, 4.8) …
[Figure: VDi, p, and VD30, with distance labels 0.9, 2.1, and > 1.2]
dist(VDi, VDj) > 2 × dist(VDi, p)
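The stopping rule on slide 23 follows from the triangle inequality: if dist(VDi, VDj) > 2 × dist(VDi, p), then dist(p, VDj) ≥ dist(VDi, VDj) − dist(VDi, p) > dist(VDi, p), so VDj cannot be closer to p than VDi. A minimal Python sketch of a refinement check built on that rule, assuming Euclidean distance and reading the 0.9 label as dist(VDi, p); the coordinates and the name `is_among_knn` are hypothetical.

```python
import math

def is_among_knn(p, gi, others, k):
    """Check whether goal point gi is one of the k nearest goal points of p.

    The other goal points are scanned in increasing distance from gi, and the
    scan stops at the first one farther than 2 * dist(gi, p) from gi.
    Returns (answer, number of goal points actually examined).
    """
    d = math.dist(gi, p)
    closer = 0
    examined = 0
    for gj in sorted(others, key=lambda g: math.dist(gi, g)):
        if math.dist(gi, gj) > 2 * d:
            break  # gj and every later goal point is farther from p than gi
        examined += 1
        if math.dist(p, gj) < d:
            closer += 1
            if closer >= k:
                return False, examined
    return True, examined

gi = (0, 0)
p = (0.9, 0)                                      # dist(gi, p) = 0.9
others = [(1.2, 0), (0, 1.4), (-2.1, 0), (0, -4.8)]  # distances 1.2, 1.4, 2.1, 4.8 from gi
answer, examined = is_among_knn(p, gi, others, k=2)
print(answer, examined)
```

With dist(gi, p) = 0.9 the threshold is 1.8, so the goal points at distance 2.1 and 4.8 are never compared against p.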

24 Application II: Maximum Coverage BRkNN Queries
Retrieve 2 points from dataset G
Assume k = 2

25 [Figure] BRkNN value = 9

26 [Figure] BRkNN value = 8

27 [Figure] total = 12

28 [Figure] total = 14

29 Maximum Coverage BRkNN Queries
Given:
– A set of goal points (G)
– A set of condition points (C)
– k: the k value of BRkNN
Goal:
– Find n points from G, g1, g2, …, gn, which maximize |∪ i=1~n BRkNN(gi, G, C)|
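A small Python sketch of the maximum coverage objective, assuming Euclidean distance and hypothetical toy coordinates. It enumerates every n-subset, which is only feasible for tiny inputs and is not the method of the talk; `brknn`, `coverage`, and `max_coverage` are illustrative names.

```python
import math
from itertools import combinations

def brknn(g, goals, conditions, k):
    """Condition points that have g among their k nearest goal points."""
    result = set()
    for name, c in conditions.items():
        nearest = sorted(goals, key=lambda x: math.dist(goals[x], c))[:k]
        if g in nearest:
            result.add(name)
    return result

def coverage(chosen, sets):
    """Union of the BRkNN sets of the chosen goal points."""
    return set().union(*(sets[g] for g in chosen))

def max_coverage(sets, n):
    """Exhaustively pick the n goal points whose BRkNN sets cover the most points."""
    return max(combinations(sets, n), key=lambda combo: len(coverage(combo, sets)))

# Hypothetical toy data on a line
goals = {"g1": (0, 0), "g2": (1, 0), "g3": (10, 0), "g4": (20, 0)}
conditions = {"c1": (0.2, 0), "c2": (0.7, 0), "c3": (0.4, 0), "c4": (9, 0),
              "c5": (12, 0), "c6": (18, 0), "c7": (0.1, 0)}
sets = {g: brknn(g, goals, conditions, k=2) for g in goals}

by_value = sorted(sets, key=lambda g: len(sets[g]), reverse=True)[:2]
best = max_coverage(sets, 2)
print(len(coverage(by_value, sets)), len(coverage(best, sets)))
```

Here the two goal points with the largest individual BRkNN values overlap heavily and cover 5 condition points, while the best pair covers all 7, mirroring the contrast between slides 27 and 28.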

30 Application III: Find n Most Favorite Products Based on Reverse Top-k Queries

31 Which are the most favorite packages?
Airlines
Airline   Fare   Food
a1        0.8    0.2
a2        0.6    0.4
a3        0.4    1
a4        0.4    0.8
a5        0.4    0.6
Hotels
Hotel   Location   Comfort   Cleanness
h1      0.4        0.6       0.4
h2      0.4        0.6       0.6
h3      0.4        0.8       0.2
h4      0.6        0.6       0.2
h5      0.6        0.8       0.4
h6      1          0.2       0.6
All candidate packages
Package    Fare   Food   Location   Comfort   Cleanness
(a1, h1)   0.8    0.2    0.4        0.6       0.4
(a1, h2)   0.8    0.2    0.4        0.6       0.6
(a1, h3)   0.8    0.2    0.4        0.8       0.2
…
(a5, h5)   0.4    0.6    0.6        0.8       0.4
(a5, h6)   0.4    0.6    1          0.2       0.6

32 Top-k Queries (Customer's View)
All candidate packages
Package    Fare   Food   Location   Comfort   Cleanness
(a1, h1)   0.8    0.2    0.4        0.6       0.4
(a1, h2)   0.8    0.2    0.4        0.6       0.6
(a1, h3)   0.8    0.2    0.4        0.8       0.2
…
(a5, h5)   0.4    0.6    0.6        0.8       0.4
(a5, h6)   0.4    0.6    1          0.2       0.6
Customer preferences
Customer   Fare   Food   Location   Comfort   Cleanness
c1         0      0.2    0.5        0.1       0.2
c2         0.1    0.3    0.1        0.3       0.2
c3         0.3    0      0.1        0.3       0.3
c4         0.3    0.1    0.2        0.3       0.1
c5         0      0.1    0.3        0         0.6
Scores for c1:
(a1, h1): 0.8 × 0 + 0.2 × 0.2 + 0.4 × 0.5 + 0.6 × 0.1 + 0.4 × 0.2 = 0.38
(a1, h2): 0.8 × 0 + 0.2 × 0.2 + 0.4 × 0.5 + 0.6 × 0.1 + 0.6 × 0.2 = 0.42
…
Scores for c2:
(a1, h1): 0.8 × 0.1 + 0.2 × 0.3 + 0.4 × 0.1 + 0.6 × 0.3 + 0.4 × 0.2 = 0.44
(a1, h2): 0.8 × 0.1 + 0.2 × 0.3 + 0.4 × 0.1 + 0.6 × 0.3 + 0.6 × 0.2 = 0.48
…
Top-2 favorites
c1   {(a3, h6), (a5, h6)}
c2   {(a3, h2), (a3, h5)}
c3   {(a1, h2), (a1, h5)}
c4   {(a1, h5), (a2, h5), (a3, h5)}
c5   {(a3, h6), (a4, h6)}
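The worked scores on this slide can be reproduced with a short Python sketch. It assumes the airline and hotel attribute values as read from the component tables of this talk, and it treats a package as the concatenation of one airline tuple and one hotel tuple; `packages` and `score` are illustrative names.

```python
from itertools import product

# (Fare, Food) per airline and (Location, Comfort, Cleanness) per hotel
airlines = {"a1": (0.8, 0.2), "a2": (0.6, 0.4), "a3": (0.4, 1.0),
            "a4": (0.4, 0.8), "a5": (0.4, 0.6)}
hotels = {"h1": (0.4, 0.6, 0.4), "h2": (0.4, 0.6, 0.6), "h3": (0.4, 0.8, 0.2),
          "h4": (0.6, 0.6, 0.2), "h5": (0.6, 0.8, 0.4), "h6": (1.0, 0.2, 0.6)}

# Every airline paired with every hotel forms one candidate package
packages = {(a, h): airlines[a] + hotels[h] for a, h in product(airlines, hotels)}

def score(weights, attrs):
    """Weighted sum of a package's attributes under one customer's preferences."""
    return sum(w * x for w, x in zip(weights, attrs))

c1 = (0, 0.2, 0.5, 0.1, 0.2)
c2 = (0.1, 0.3, 0.1, 0.3, 0.2)
scores = {
    ("c1", "a1", "h1"): round(score(c1, packages[("a1", "h1")]), 2),
    ("c1", "a1", "h2"): round(score(c1, packages[("a1", "h2")]), 2),
    ("c2", "a1", "h1"): round(score(c2, packages[("a1", "h1")]), 2),
    ("c2", "a1", "h2"): round(score(c2, packages[("a1", "h2")]), 2),
}
print(len(packages), scores)
```

A customer's top-k favorites are then the k packages with the highest score under that customer's weights.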

33 Reverse Top-k Queries (Travel Agency's View)
All candidate packages
Package    Fare   Food   Location   Comfort   Cleanness
(a1, h1)   0.8    0.2    0.4        0.6       0.4
(a1, h2)   0.8    0.2    0.4        0.6       0.6
(a1, h3)   0.8    0.2    0.4        0.8       0.2
…
(a5, h5)   0.4    0.6    0.6        0.8       0.4
(a5, h6)   0.4    0.6    1          0.2       0.6
Customer preferences
Customer   Fare   Food   Location   Comfort   Cleanness   Top-2 favorites
c1         0      0.2    0.5        0.1       0.2         {(a3, h6), (a5, h6)}
c2         0.1    0.3    0.1        0.3       0.2         {(a3, h2), (a3, h5)}
c3         0.3    0      0.1        0.3       0.3         {(a1, h2), (a1, h5)}
c4         0.3    0.1    0.2        0.3       0.1         {(a1, h5), (a2, h5), (a3, h5)}
c5         0      0.1    0.3        0         0.6         {(a3, h6), (a4, h6)}
Retrieve the customers whose top-2 favorites contain (a1, h2) → {c3}
The number of customers in the reverse top-k result of a product is a good estimate of how favored the product is in the market
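A minimal Python sketch of the reverse top-k lookup, assuming the top-2 favorites listed on the slide are already available as input. It simply inverts the customer-to-packages mapping; `reverse_top_k` is an illustrative name.

```python
from collections import defaultdict

# Top-2 favorites per customer, copied from the slide
favorites = {
    "c1": [("a3", "h6"), ("a5", "h6")],
    "c2": [("a3", "h2"), ("a3", "h5")],
    "c3": [("a1", "h2"), ("a1", "h5")],
    "c4": [("a1", "h5"), ("a2", "h5"), ("a3", "h5")],
    "c5": [("a3", "h6"), ("a4", "h6")],
}

def reverse_top_k(favorites):
    """Map each package to the set of customers whose favorites contain it."""
    rtop = defaultdict(set)
    for customer, packages in favorites.items():
        for package in packages:
            rtop[package].add(customer)
    return dict(rtop)

rtop = reverse_top_k(favorites)
print(rtop[("a1", "h2")])
```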

34 k (#packages considered by customers) = 2
n (#packages to be offered by the travel agency) = 2
All candidate packages
Package    Fare   Food   Location   Comfort   Cleanness
(a1, h1)   0.8    0.2    0.4        0.6       0.4
(a1, h2)   0.8    0.2    0.4        0.6       0.6
…
(a1, h5)   0.8    0.2    0.6        0.8       0.4
…
(a3, h6)   0.4    1      1          0.2       0.6
…
(a5, h6)   0.4    0.6    1          0.2       0.6
Customer preferences
Customer   Fare   Food   Location   Comfort   Cleanness   Top-2 favorites
c1         0      0.2    0.5        0.1       0.2         {(a3, h6), (a5, h6)}
c2         0.1    0.3    0.1        0.3       0.2         {(a3, h2), (a3, h5)}
c3         0.3    0      0.1        0.3       0.3         {(a1, h2), (a1, h5)}
c4         0.3    0.1    0.2        0.3       0.1         {(a1, h5), (a2, h5), (a3, h5)}
c5         0      0.1    0.3        0         0.6         {(a3, h6), (a4, h6)}
Reverse top-2 results
(a1, h2): {c3}
(a1, h5): {c3, c4}
(a2, h5): {c4}
(a3, h2): {c2}
(a3, h5): {c2, c4}
(a3, h6): {c1, c5}
(a4, h6): {c5}
(a5, h6): {c1}

35 Problem Definition of n-k MFP
Given a set of component tables T1, T2, …, and Tx, which form a set of candidate products P, a set of customers C with different preferences on the products, and two positive integers k and n
RTOPk(cp, P, C): the set of customers whose top-k favorites contain the candidate product cp
Retrieve the minimum subset P' of P such that |P'| ≤ n and the number of customers covered, |∪ cp∈P' RTOPk(cp, P, C)|, is maximized
Maximum coverage problem: NP-hard
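Because maximum coverage is NP-hard, a common heuristic is to pick greedily the product that adds the most uncovered customers. The Python sketch below applies that standard greedy heuristic to the reverse top-2 sets listed earlier in the talk; it is an illustration under that assumption, not necessarily the algorithm of the talk, and `greedy_mfp` is a made-up name.

```python
def greedy_mfp(rtop, n):
    """Pick up to n products, each time taking the largest gain in covered customers."""
    chosen, covered = [], set()
    for _ in range(n):
        best = max(rtop, key=lambda cp: len(rtop[cp] - covered))
        if not rtop[best] - covered:
            break  # no remaining product adds a new customer
        chosen.append(best)
        covered |= rtop[best]
    return chosen, covered

# Reverse top-2 sets as listed on the earlier slide (k = 2)
rtop = {
    ("a1", "h2"): {"c3"},
    ("a1", "h5"): {"c3", "c4"},
    ("a2", "h5"): {"c4"},
    ("a3", "h2"): {"c2"},
    ("a3", "h5"): {"c2", "c4"},
    ("a3", "h6"): {"c1", "c5"},
    ("a4", "h6"): {"c5"},
    ("a5", "h6"): {"c1"},
}

chosen, covered = greedy_mfp(rtop, n=2)
print(chosen, covered)
```

Ties are broken by listing order here, so (a1, h5) is taken before the equally sized (a3, h5) and (a3, h6).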

36 An object p is said to dominate another object q if and only if p is larger than or equal to q on all dimensions and p is larger than q on at least one dimension
Given a set of multi-dimensional objects, the skyline consists of the objects which are not dominated by any other object
[Figure: skyline in the (A1, A2) plane]
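The two definitions translate directly into a quadratic-time Python sketch. The sample data are the (Fare, Food) values of the airline table in this talk; `dominates` and `skyline` are illustrative names.

```python
def dominates(p, q):
    """p dominates q: at least as large on every dimension, larger on at least one."""
    return all(a >= b for a, b in zip(p, q)) and any(a > b for a, b in zip(p, q))

def skyline(objects):
    """Names of the objects that no other object dominates."""
    return {
        name for name, p in objects.items()
        if not any(dominates(q, p) for other, q in objects.items() if other != name)
    }

# (Fare, Food) per airline
airlines = {"a1": (0.8, 0.2), "a2": (0.6, 0.4), "a3": (0.4, 1.0),
            "a4": (0.4, 0.8), "a5": (0.4, 0.6)}
sky = skyline(airlines)
print(sky)
```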

37 Only the component tuples dominated by at most (k-1) other tuples in the same component table can be part of a top-k product for a customer c
Airlines
Airline   Fare   Food
…
a3        0.4    1
a4        0.4    0.8
a5        0.4    0.6
Hotels
Hotel   Location   Comfort   Cleanness
h1      0.4        0.6       0.4
…
Packages
Package    Fare   Food   Location   Comfort   Cleanness
(a3, h1)   0.4    1      0.4        0.6       0.4
(a4, h1)   0.4    0.8    0.4        0.6       0.4
(a5, h1)   0.4    0.6    0.4        0.6       0.4

38 Airlines
Airline   Fare   Food
a1 (0)    0.8    0.2
a2 (0)    0.6    0.4
a3 (0)    0.4    1
a4 (1)    0.4    0.8
a5 (2)    0.4    0.6
Hotels
Hotel    Location   Comfort   Cleanness
h1 (2)   0.4        0.6       0.4
h2 (0)   0.4        0.6       0.6
h3 (1)   0.4        0.8       0.2
h4 (1)   0.6        0.6       0.2
h5 (0)   0.6        0.8       0.4
h6 (0)   1          0.2       0.6
Remaining airlines
Airline   Fare   Food
a1 (0)    0.8    0.2
a2 (0)    0.6    0.4
a3 (0)    0.4    1
a4 (1)    0.4    0.8
Remaining hotels
Hotel    Location   Comfort   Cleanness
h2 (0)   0.4        0.6       0.6
h3 (1)   0.4        0.8       0.2
h4 (1)   0.6        0.6       0.2
h5 (0)   0.6        0.8       0.4
h6 (0)   1          0.2       0.6
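The pruning on slides 37 and 38 can be sketched in Python, reading the parenthesized numbers as the count of tuples that dominate each tuple and assuming k = 2. The hotel values are taken from the table above; `dominance_counts` and `prune` are illustrative names.

```python
def dominates(p, q):
    """p dominates q: at least as large on every dimension, larger on at least one."""
    return all(a >= b for a, b in zip(p, q)) and any(a > b for a, b in zip(p, q))

def dominance_counts(table):
    """For each tuple, count how many other tuples in the table dominate it."""
    return {
        name: sum(1 for other, q in table.items() if other != name and dominates(q, p))
        for name, p in table.items()
    }

def prune(table, k):
    """Keep only the tuples dominated by at most k - 1 other tuples."""
    counts = dominance_counts(table)
    return {name: attrs for name, attrs in table.items() if counts[name] <= k - 1}

# (Location, Comfort, Cleanness) per hotel
hotels = {"h1": (0.4, 0.6, 0.4), "h2": (0.4, 0.6, 0.6), "h3": (0.4, 0.8, 0.2),
          "h4": (0.6, 0.6, 0.2), "h5": (0.6, 0.8, 0.4), "h6": (1.0, 0.2, 0.6)}
counts = dominance_counts(hotels)
kept = prune(hotels, k=2)
print(counts, sorted(kept))
```

The intuition is that a tuple dominated by k other tuples always has at least k better alternatives for any non-negative weights, so a product built from it cannot be among a customer's k best.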

39 For any two candidate products cp1 and cp2 in P, if cp1 dominates cp2, RTOPk(cp2, P, C) ⊆ RTOPk(cp1, P, C)
For any candidate product cp in P, if cp ∉ Skyline(P), cp ∉ n-k MFP
The candidate products in the n-k MFP must be in Skyline(P)
[Figure: skyline in the (A1, A2) plane]

40 A candidate product cp is in Skyline(P) if and only if cp is in the set of candidate products generated from Skyline(T1), Skyline(T2), …, and Skyline(Tx) [VLDB'09]
Only the skyline tuples of each component table can be part of a candidate product in the n-k MFP
Airlines
Airline   Fare   Food
a1 (0)    0.8    0.2
a2 (0)    0.6    0.4
a3 (0)    0.4    1
a4 (1)    0.4    0.8
Hotels
Hotel    Location   Comfort   Cleanness
h2 (0)   0.4        0.6       0.6
h3 (1)   0.4        0.8       0.2
h4 (1)   0.6        0.6       0.2
h5 (0)   0.6        0.8       0.4
h6 (0)   1          0.2       0.6

41 Only the customers in RTOPk(cp, Skyline(P), C) can become members of RTOPk(cp, P, C)
RTOPk(cp, Skyline(P), C) is an upper bound of RTOPk(cp, P, C)
The upper bounds of the remaining candidate packages
Package    Upper bound
(a1, h2)   {c3}
(a1, h5)   {c3, c4}
(a1, h6)   {}
(a2, h2)   {}
(a2, h5)   {c4}
(a2, h6)   {c1, c5}
(a3, h2)   {c2}
(a3, h5)   {c2, c4}
(a3, h6)   {c1, c5}

42 Package    Upper bound
(a1, h2)   {c3}
(a1, h5)   {c3, c4}
(a2, h5)   {c4}
(a2, h6)   {c1, c5}
(a3, h2)   {c2}
(a3, h5)   {c2, c4}
(a3, h6)   {c1, c5}
The top-2 favorites of c3: {(a1, h5), (a1, h2)}
The top-2 favorites of c4: {(a1, h5), (a2, h5), (a3, h5)}
P': {(a1, h5)}

43 Package    Upper bound
(a2, h6)   {c1, c5}
(a3, h2)   {c2}
(a3, h5)   {c2}
(a3, h6)   {c1, c5}
The top-2 favorites of c1: {(a3, h6), (a4, h6)}
The top-2 favorites of c5: {(a3, h6), (a4, h6)}
P': {(a1, h5)} becomes {(a1, h5), (a3, h6)}

44 Application IV: Find Most Favorite Products by Top-k Reverse Skyline Queries
k = 1
[Figure: user preferences (u1, u2) and products plotted by Mileage and Year, with count labels 1, 1, 1, 1, 1, 1, 2]

45 Thank you for your attention!

