On Top-n Reverse Top-k Queries: Variants, Algorithms, and Applications 陳良弼 Arbee L.P. Chen National Chengchi University 9/21/2012 at NCHU.


IEEE International Conference on Data Engineering (ICDE): a premier international conference on databases. The inaugural conference was held in Los Angeles in 1984, and ICDE was held in Taiwan in 1995.

ICDE 2012 research papers by topic:
- System aspects: privacy and security 8%; storage management and performance 7%; entity resolution/versioning 7%
- Query processing 31%: top-k queries 9%; distributed/parallel/MapReduce 8%; location-aware 5%; execution plans 5%; graph indexing 4%
- Text/Web/keyword search 19%
- Stream/trajectory/sequence/spatio-temporal 10%
- Social media 7%
- Uncertain databases 6%
- Data mining 5%

Efficient Dual-Resolution Layer Indexing for Top-k Queries (ICDE 2012). Example: hotels H1 to H9, each described by price, distance to the airport, and service.

Hotels H1 to H9 as (price, distance to the airport): H1 (0.6, 0.2), H2 (0.55, 0.4), H3 (0.45, 0.6), H4 (0.3, 0.7), H5 (0.55, 0.3), H6 (0.3, 0.6), H7 (0.2, 0.7), H8 (0.7, 0.4), H9 (0.5, 0.5).
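A top-k query with a linear scoring function can be answered by scoring every hotel and keeping the k best. The sketch below is a plain full scan over the nine hotels above, assuming that smaller values are better for both price and distance; the weights in the usage line are hypothetical. It does not implement the dual-resolution layer index, whose purpose is precisely to avoid scoring every object.

```python
from typing import Dict, List, Sequence, Tuple

# Hotel data from the slide: (price, distance to the airport).
HOTELS: Dict[str, Tuple[float, float]] = {
    "H1": (0.6, 0.2), "H2": (0.55, 0.4), "H3": (0.45, 0.6),
    "H4": (0.3, 0.7), "H5": (0.55, 0.3), "H6": (0.3, 0.6),
    "H7": (0.2, 0.7), "H8": (0.7, 0.4), "H9": (0.5, 0.5),
}


def top_k(objects: Dict[str, Sequence[float]], weights: Sequence[float], k: int) -> List[str]:
    """Return the ids of the k objects with the smallest weighted sum.

    Assumption: smaller attribute values are better, so a smaller score ranks higher.
    Ties are broken by id so the result is deterministic.
    """
    def score(oid: str) -> float:
        return sum(w * x for w, x in zip(weights, objects[oid]))

    return sorted(objects, key=lambda oid: (score(oid), oid))[:k]


if __name__ == "__main__":
    # Hypothetical user who weighs price and distance equally.
    print(top_k(HOTELS, (0.5, 0.5), 2))
```

With equal weights the scores of H1 and H5 are 0.4 and 0.425, so the sketch prints `['H1', 'H5']`.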

A subset of the hotels, H1, H4, H5, H6, and H7, with (price, distance to the airport) values listed on the slide as (0.6, 0.2), (0.55, 0.4), (0.55, 0.3), (0.3, 0.6), (0.2, 0.7); the slide also shows the hotel order H7, H6, H4, H5, H1.

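Answering Why-not Questions on Top-k Queries (ICDE 2012). A top-k query over hotels p1 to p6 described by (cleanliness, deliciousness, parking spaces): p1 (95, 80, 40), p2 (70, 20, 30), p3 (50, 90, 60), p4 (75, 70, 50), p5 (85, 60, 60), p6 (58, 20, 30). The query asks for the top-2 with weights (0.4, 0.5, 0.1), which returns p1 (score 82) and p3 (score 71).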

Why-not question: why is p5 not in my top-2 result? Does p5 not exist? Should I change my weights? Should I revise my query to look for the top-5 hotels? For example, with weights (0.5, 0.4, 0.1) the top-2 result is p1 (83.5) and p5 (72.5).
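The why-not example can be re-checked by re-running the top-k query under the original and the modified weights. The sketch below assumes a weighted-sum score where higher is better; `rank_of` is a hypothetical helper that reports how far the missing object is from the top. It only re-evaluates the slide's numbers and does not implement the query-refinement algorithm of the ICDE 2012 paper.

```python
from typing import Dict, List, Sequence, Tuple

# Objects p1..p6 from the slide: (cleanliness, deliciousness, parking spaces).
OBJECTS: Dict[str, Tuple[int, int, int]] = {
    "p1": (95, 80, 40), "p2": (70, 20, 30), "p3": (50, 90, 60),
    "p4": (75, 70, 50), "p5": (85, 60, 60), "p6": (58, 20, 30),
}


def score(point: Sequence[float], weights: Sequence[float]) -> float:
    """Weighted sum; higher is better in this example."""
    return sum(w * x for w, x in zip(weights, point))


def top_k(objects: Dict[str, Sequence[float]], weights: Sequence[float], k: int) -> List[str]:
    """Ids of the k highest-scoring objects (ties broken by id)."""
    return sorted(objects, key=lambda o: (-score(objects[o], weights), o))[:k]


def rank_of(objects: Dict[str, Sequence[float]], weights: Sequence[float], target: str) -> int:
    """Hypothetical helper: 1 + number of objects scoring strictly higher than target."""
    s = score(objects[target], weights)
    return 1 + sum(1 for o in objects if score(objects[o], weights) > s)


if __name__ == "__main__":
    original = (0.4, 0.5, 0.1)
    print(top_k(OBJECTS, original, 2))         # original query
    print(rank_of(OBJECTS, original, "p5"))    # how far p5 is from the top
    print(top_k(OBJECTS, (0.5, 0.4, 0.1), 2))  # one candidate change of the weights
```

Under the original weights p5 scores 70, tied with p4 and behind p1 (82) and p3 (71), which is why enlarging k to 5 is one way to bring it into the result.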

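The Min-dist Location Selection Query (ICDE 2012). Figure: customers c1 to c8, existing facilities f1 and f2, and candidate locations p1 and p2. Each customer is associated with the distance to its nearest facility, and the goal is to place a new facility at the candidate location that minimizes this nearest-facility distance (averaged over all customers).

Figure: the same setting with p1 added as a facility, showing each customer's nearest-facility distance.

Figure: the same setting with p2 added as a facility.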
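A brute-force version of the min-dist query tries each candidate as a new facility and keeps the one with the smallest total nearest-facility distance (equivalent to the smallest average, since the set of customers is fixed). The coordinates below are hypothetical; only the names c1 to c8, f1, f2, p1, p2 follow the figure. The algorithms in the paper prune candidates instead of evaluating each one.

```python
import math
from typing import Dict, List, Tuple

Point = Tuple[float, float]

# Hypothetical coordinates; only the names follow the figure.
CUSTOMERS: List[Point] = [(1, 0), (2, 0), (9, 0), (0, 8), (1, 8), (0, 9), (5, 0), (11, 0)]  # c1..c8
FACILITIES: List[Point] = [(0, 0), (10, 0)]  # f1, f2
CANDIDATES: Dict[str, Point] = {"p1": (0, 8), "p2": (5, 1)}


def total_nn_dist(customers: List[Point], facilities: List[Point]) -> float:
    """Sum over customers of the distance to their nearest facility."""
    return sum(min(math.dist(c, f) for f in facilities) for c in customers)


def min_dist_location(customers: List[Point], facilities: List[Point],
                      candidates: Dict[str, Point]) -> str:
    """Brute force: add each candidate as a facility and keep the one with the smallest total."""
    return min(candidates,
               key=lambda name: total_nn_dist(customers, facilities + [candidates[name]]))


if __name__ == "__main__":
    for name, loc in CANDIDATES.items():
        print(name, total_nn_dist(CUSTOMERS, FACILITIES + [loc]))
    print(min_dist_location(CUSTOMERS, FACILITIES, CANDIDATES))
```

With these made-up points, p1 serves the three customers near (0, 8) and lowers the total to 12, so it is selected over p2.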

Introduction: kNN (k-Nearest Neighbors) queries. Assume k = 3: kNN(q) = {a, b, c}, the three points nearest to the query point q.

Introduction: RkNN (Reverse k-Nearest Neighbors) queries return the points that have q among their own k nearest neighbors. Assume k = 3: RkNN(q) = {a, …} (the figure shows points q, a, and d).
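The difference between kNN and RkNN can be seen with a brute-force sketch: kNN(q) looks outward from q, while RkNN(q) asks every other point whether q is among its own k nearest neighbors. The point set below is hypothetical, and the quadratic scan is only for illustration; practical RkNN algorithms use indexes and pruning.

```python
import math
from typing import Dict, List, Tuple

Point = Tuple[float, float]


def knn(q: Point, points: Dict[str, Point], k: int) -> List[str]:
    """Ids of the k points nearest to q (ties broken by id)."""
    return sorted(points, key=lambda pid: (math.dist(q, points[pid]), pid))[:k]


def rknn(q_id: str, points: Dict[str, Point], k: int) -> List[str]:
    """Monochromatic RkNN by brute force: points whose own kNN (excluding themselves) contains q."""
    result = []
    for pid, p in points.items():
        if pid == q_id:
            continue
        others = {oid: o for oid, o in points.items() if oid != pid}
        if q_id in knn(p, others, k):
            result.append(pid)
    return sorted(result)


# Hypothetical data set: q, a few nearby points, and two tight clusters.
POINTS: Dict[str, Point] = {
    "q": (0, 0), "a": (1, 0), "b": (0, 2), "c": (-3, 0),
    "g": (-4, 0), "h": (-3, 1), "i": (-4, 1),
    "d": (10, 0), "e": (11, 0), "f": (12, 0),
}

if __name__ == "__main__":
    others = {pid: p for pid, p in POINTS.items() if pid != "q"}
    print(knn(POINTS["q"], others, 3))  # kNN(q)
    print(rknn("q", POINTS, 3))         # RkNN(q)
```

Here c is one of q's three nearest neighbors, but c's own three nearest neighbors are g, h, and i, so c is not in RkNN(q): the relation is not symmetric.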

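Introduction: BRkNN (Bichromatic Reverse k-Nearest Neighbors) queries involve two types of data: BRkNN(q) returns the points of the other type that have q among their k nearest neighbors of q's type. Assume k = 3: BRkNN(q) = {a, …} (the figure shows points q, a, and d).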

Application I: shops and customers. Which location is the best?

Top-n Reverse kNN Queries. Given two types of data, G (goal) and C (condition), retrieve the n points of G with the largest BRkNN values, where the BRkNN value of a goal point is the number of condition points in its BRkNN result. Example (n = 2, k = 2): the BR2NN values of g1, g2, and g3 are 4, 9, and 5, so Top-2 = {g2, g3}.
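A direct way to answer a top-n BRkNN query is to find the k nearest goal points of every condition point, count how often each goal point appears, and keep the n largest counts. The sketch below does exactly that on hypothetical points; it stands in for, and is much slower than, the Voronoi-based filter-refinement framework described in the following slides.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple

Point = Tuple[float, float]


def brknn_values(goals: Dict[str, Point], conds: Dict[str, Point], k: int) -> Dict[str, int]:
    """BRkNN value of each goal: number of condition points having it among their k nearest goals."""
    counts = Counter({g: 0 for g in goals})
    for c in conds.values():
        nearest = sorted(goals, key=lambda g: (math.dist(c, goals[g]), g))[:k]
        counts.update(nearest)  # +1 for each of the k nearest goal points
    return dict(counts)


def top_n_brknn(goals: Dict[str, Point], conds: Dict[str, Point], k: int, n: int) -> List[str]:
    """The n goal points with the largest BRkNN values (ties broken by id)."""
    values = brknn_values(goals, conds, k)
    return sorted(goals, key=lambda g: (-values[g], g))[:n]


# Hypothetical goal and condition points on a line.
GOALS = {"g1": (0, 0), "g2": (10, 0), "g3": (20, 0)}
CONDS = {"x1": (1, 0), "x2": (9, 0), "x3": (11, 0), "x4": (19, 0), "x5": (21, 0)}

if __name__ == "__main__":
    print(brknn_values(GOALS, CONDS, 2))
    print(top_n_brknn(GOALS, CONDS, k=2, n=2))
```

With k = 2 every condition point contributes to two goal points; g2, in the middle, collects all five, and the top-2 result is g2 followed by g3.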

Voronoi diagram of G. Legend: goal points (VD-nodes) and condition points.

A Filter-Refinement Framework for Solving BRkNN Queries (assume k = 2). For VDi, the lower-bound region is layer 0, and the upper-bound region spans layer 0 to layer (k − 1). (Figure: layers 0 and 1 around VDi.)

Filter phase (assume k = 2): construct bisectors layer by layer around VDi to reduce the region.

Refinement phase (assume k = 2): for a data point p, check the VDs at layer 1 to layer 2 to determine whether VDi is one of the 2NNs of p.

Refinement phase (assume k = 2): VDi keeps the other VDs sorted by their distance from VDi: (VD13, 1.2), (VD26, 1.4), (VD27, 1.7), (VD3, 1.7), (VD4, 1.8), (VD30, 2.1), (VD5, 2.5), (VD7, 4.8). Figure annotations for point p: dist(p, VD30) > …; > 1.2.

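Refinement phase (assume k = 2), using the same sorted list for VDi. Pruning rule: once dist(VDi, VDj) > 2 × dist(VDi, p), VDj cannot be closer to p than VDi, because by the triangle inequality dist(p, VDj) ≥ dist(VDi, VDj) − dist(VDi, p) > dist(VDi, p). Since the list is sorted, all later entries can be skipped as well. (Figure annotations: > 1.2; VD30.)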
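The triangle-inequality stopping rule can be sketched as follows, assuming each goal point keeps a precomputed list of the other goal points sorted by distance (built here by the hypothetical helper `neighbor_list`). The scan counts goal points strictly closer to p than VDi and stops as soon as the rule guarantees that no remaining entry can be closer. The coordinates are hypothetical, ties are treated as "not closer", and the layered Voronoi filter step is not modeled.

```python
import math
from typing import Dict, List, Tuple

Point = Tuple[float, float]


def neighbor_list(vd_i: str, coords: Dict[str, Point]) -> List[Tuple[str, float]]:
    """Other goal points sorted by distance from vd_i (precomputed once per goal point)."""
    others = [(g, math.dist(coords[vd_i], xy)) for g, xy in coords.items() if g != vd_i]
    return sorted(others, key=lambda item: (item[1], item[0]))


def is_among_knn(p: Point, vd_i: str, neighbors: List[Tuple[str, float]],
                 coords: Dict[str, Point], k: int) -> bool:
    """True if fewer than k goal points are strictly closer to p than vd_i."""
    d_ip = math.dist(coords[vd_i], p)
    closer = 0
    for g, d_ij in neighbors:
        if d_ij > 2 * d_ip:
            # Triangle inequality: dist(p, g) >= d_ij - d_ip > d_ip. The list is
            # sorted, so every remaining goal point is also farther than vd_i.
            break
        if math.dist(coords[g], p) < d_ip:
            closer += 1
            if closer >= k:
                return False
    return True


# Hypothetical goal-point coordinates.
COORDS = {"g0": (0.0, 0.0), "g1": (1.0, 0.0), "g2": (0.0, 0.9), "g3": (3.0, 0.0)}

if __name__ == "__main__":
    nbrs = neighbor_list("g0", COORDS)
    print(is_among_knn((0.5, 0.0), "g0", nbrs, COORDS, k=1))
    print(is_among_knn((0.8, 0.0), "g0", nbrs, COORDS, k=1))
    print(is_among_knn((0.8, 0.0), "g0", nbrs, COORDS, k=2))
```

For p = (0.5, 0) the scan stops at g3 without computing dist(p, g3), because 3.0 > 2 × 0.5.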

Application II: Maximum Coverage BRkNN Queries. Retrieve 2 points from dataset G (assume k = 2).

Figure: a point with BRkNN value = 9.

Figure: a point with BRkNN value = 8.

Figure: total = 12.

Figure: total = 14.

Maximum Coverage BRkNN Queries. Given a set of goal points G, a set of condition points C, and k, the k value of BRkNN, find n points g1, g2, …, gn from G that maximize $\left|\bigcup_{i=1}^{n} \mathrm{BRkNN}(g_i, G, C)\right|$.
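Because overlapping BRkNN sets are counted only once, the best n points are not necessarily the n points with the largest individual BRkNN values. The sketch below computes every BRkNN set by brute force and then tries every n-subset of G, which is exact but only practical for small inputs. All points are hypothetical.

```python
import math
from itertools import combinations
from typing import Dict, List, Set, Tuple

Point = Tuple[float, float]


def brknn_sets(goals: Dict[str, Point], conds: Dict[str, Point], k: int) -> Dict[str, Set[str]]:
    """BRkNN(g, G, C) for every goal point g, computed by brute force."""
    sets: Dict[str, Set[str]] = {g: set() for g in goals}
    for cid, c in conds.items():
        for g in sorted(goals, key=lambda gid: (math.dist(c, goals[gid]), gid))[:k]:
            sets[g].add(cid)
    return sets


def max_coverage_brknn(goals: Dict[str, Point], conds: Dict[str, Point],
                       k: int, n: int) -> Tuple[List[str], int]:
    """Exact answer by trying every n-subset of G (exponential; small inputs only)."""
    sets = brknn_sets(goals, conds, k)

    def coverage(combo) -> int:
        return len(set().union(*(sets[g] for g in combo)))

    best = max(combinations(sorted(goals), n), key=coverage)
    return list(best), coverage(best)


# Hypothetical points on a line: two goal clusters and three groups of condition points.
GOALS = {"g1": (0, 0), "g2": (1, 0), "g3": (2, 0), "g4": (10, 0), "g5": (11, 0)}
CONDS = {"a1": (0.4, 0), "a2": (0.45, 0), "b1": (1.4, 0), "b2": (1.45, 0),
         "b3": (1.6, 0), "c1": (10.4, 0), "c2": (10.45, 0), "c3": (10.6, 0)}

if __name__ == "__main__":
    print(max_coverage_brknn(GOALS, CONDS, k=2, n=2))
```

Ranking by individual BRkNN value picks g2 (5) and then one of g3, g4, g5 (3 each); with g3 the pair covers only 5 condition points, whereas {g2, g4} covers 8, the same effect as the 12-versus-14 example in the slides.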

Application III: Finding the n Most Favorite Products Based on Reverse Top-k Queries

Example: an Airlines table (Airline, Fare, Food) with tuples a1 to a5, and a Hotels table (Hotel, Location, Comfort, Cleanness) with tuples h1 to h6. Every (airline, hotel) pair forms a candidate package with attributes (Fare, Food, Location, Comfort, Cleanness): (a1, h1), (a1, h2), (a1, h3), …, (a5, h5), (a5, h6). Which packages are the most favored?

Top-k Queries (Customer's View). Each customer c1 to c5 assigns weights to (Fare, Food, Location, Comfort, Cleanness) and scores every candidate package as a weighted sum, e.g., for c1: (a1, h1) = 0.38, (a1, h2) = 0.42, …; for c2: (a1, h1) = 0.44, (a1, h2) = 0.48, … The top-2 favorites are: c1: {(a3, h6), (a5, h6)}; c2: {(a3, h2), (a3, h5)}; c3: {(a1, h2), (a1, h5)}; c4: {(a1, h5), (a2, h5), (a3, h5)}; c5: {(a3, h6), (a4, h6)}.

Reverse Top-k Queries (Travel Agency's View). Using the same candidate packages and customer preferences, retrieve the customers whose top-2 favorites contain (a1, h2): the answer is {c3}. The number of customers in a product's reverse top-k result is a good estimate of how much the market favors the product.

With k (the number of packages each customer considers) = 2, the reverse top-2 results are: (a1, h2): {c3}; (a1, h5): {c3, c4}; (a2, h5): {c4}; (a3, h2): {c2}; (a3, h5): {c2, c4}; (a3, h6): {c1, c5}; (a4, h6): {c5}; (a5, h6): {c1}. The travel agency wants to offer n = 2 packages.
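When every customer's top-k list is already known, the reverse top-k result of each package is just the inverted index of those lists. The sketch below reproduces the slide's RTOP2 sets from the top-2 favorites listed above. Dedicated reverse top-k algorithms avoid computing every customer's top-k list, so this inversion is only an illustration of the definition.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

Package = Tuple[str, str]

# Top-2 favorites per customer, copied from the slide.
TOP2: Dict[str, List[Package]] = {
    "c1": [("a3", "h6"), ("a5", "h6")],
    "c2": [("a3", "h2"), ("a3", "h5")],
    "c3": [("a1", "h2"), ("a1", "h5")],
    "c4": [("a1", "h5"), ("a2", "h5"), ("a3", "h5")],
    "c5": [("a3", "h6"), ("a4", "h6")],
}


def reverse_top_k(top_k_lists: Dict[str, List[Package]]) -> Dict[Package, Set[str]]:
    """Invert customer -> top-k favorites into package -> customers (RTOP_k)."""
    rtop: Dict[Package, Set[str]] = defaultdict(set)
    for customer, favorites in top_k_lists.items():
        for pkg in favorites:
            rtop[pkg].add(customer)
    return dict(rtop)


if __name__ == "__main__":
    for pkg, customers in sorted(reverse_top_k(TOP2).items()):
        print(pkg, sorted(customers))
```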

Problem Definition of n-k MFP. Given a set of component tables T1, T2, …, Tx, which form the set of candidate products P, a set of customers C with different preferences on the products, and two positive integers k and n, let RTOPk(cp, P, C) be the set of customers whose top-k favorites contain the candidate product cp. Retrieve the minimum subset P′ of P such that |P′| ≤ n and $\left|\bigcup_{cp \in P'} \mathrm{RTOP}_k(cp, P, C)\right|$ is maximized. This is a maximum coverage problem, which is NP-hard.
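Since n-k MFP is a maximum coverage problem, a natural baseline is the standard greedy heuristic: repeatedly pick the package that adds the most not-yet-covered customers. This greedy rule is a well-known approximation for maximum coverage (ratio 1 − 1/e), swapped in here for illustration; it is not necessarily the exact algorithm of the talk. The input is the slide's RTOP2 table with n = 2.

```python
from typing import Dict, List, Set, Tuple

Package = Tuple[str, str]

# RTOP_2 sets from the slide: customers whose top-2 favorites contain each package.
RTOP2: Dict[Package, Set[str]] = {
    ("a1", "h2"): {"c3"}, ("a1", "h5"): {"c3", "c4"}, ("a2", "h5"): {"c4"},
    ("a3", "h2"): {"c2"}, ("a3", "h5"): {"c2", "c4"}, ("a3", "h6"): {"c1", "c5"},
    ("a4", "h6"): {"c5"}, ("a5", "h6"): {"c1"},
}


def greedy_mfp(rtop: Dict[Package, Set[str]], n: int) -> Tuple[List[Package], Set[str]]:
    """Greedy maximum coverage: repeatedly pick the package that adds the most new customers."""
    chosen: List[Package] = []
    covered: Set[str] = set()
    for _ in range(n):
        best, gain = None, 0
        for pkg in sorted(rtop):  # fixed order so ties are broken deterministically
            new = len(rtop[pkg] - covered)
            if new > gain:
                best, gain = pkg, new
        if best is None:  # no package adds a new customer; stop early to keep P' small
            break
        chosen.append(best)
        covered |= rtop[best]
    return chosen, covered


if __name__ == "__main__":
    print(greedy_mfp(RTOP2, 2))
```

On this input the greedy choice is (a1, h5) followed by (a3, h6), covering c1, c3, c4, and c5, which matches the P′ reached in the walkthrough slides.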

Skyline. An object p dominates another object q if and only if p is larger than or equal to q on all dimensions and larger than q on at least one dimension. Given a set of multi-dimensional objects, the skyline consists of the objects that are not dominated by any other object. (Figure: the skyline of a point set on dimensions A1 and A2.)
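The dominance and skyline definitions translate directly into a naive quadratic check, shown below on hypothetical two-dimensional objects where larger values are better. Efficient skyline algorithms (for example, sort-based or index-based ones) avoid comparing every pair; this sketch only mirrors the definitions.

```python
from typing import Dict, List, Sequence


def dominates(p: Sequence[float], q: Sequence[float]) -> bool:
    """p dominates q: p >= q on all dimensions and p > q on at least one (larger is better)."""
    return all(a >= b for a, b in zip(p, q)) and any(a > b for a, b in zip(p, q))


def skyline(objects: Dict[str, Sequence[float]]) -> List[str]:
    """Naive O(n^2) skyline: keep the objects not dominated by any other object."""
    return sorted(
        oid for oid, p in objects.items()
        if not any(dominates(q, p) for other, q in objects.items() if other != oid)
    )


# Hypothetical objects on dimensions (A1, A2).
OBJECTS = {"o1": (1, 5), "o2": (3, 3), "o3": (5, 1), "o4": (2, 2), "o5": (3, 3)}

if __name__ == "__main__":
    print(skyline(OBJECTS))
```

o4 is dominated by o2, while o2 and o5 have identical values and therefore do not dominate each other, so both stay in the skyline.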

Only the component tuples dominated by at most (k − 1) other tuples in the same component table can be part of a top-k product for a customer c. Intuitively, assuming monotone scoring functions, if a tuple is dominated by k or more tuples in its table, substituting each dominating tuple while keeping the other components yields at least k products that are at least as good for every customer. Example: from airlines a3, a4, a5 and hotel h1, the packages (a3, h1), (a4, h1), and (a5, h1).

Dominance counts (the number of tuples in the same table that dominate each tuple): Airlines a1 (0), a2 (0), a3 (0), a4 (1), a5 (2); Hotels h1 (2), h2 (0), h3 (1), h4 (1), h5 (0), h6 (0). With k = 2, the tuples dominated by two or more others, a5 and h1, are removed, leaving airlines a1 to a4 and hotels h2 to h6.
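The pruning step keeps only the tuples dominated by at most k − 1 others in their own table (often called the k-skyband). The attribute values on the slide did not survive, so the airline values below are hypothetical, chosen only so that the dominance counts equal those shown for a1 to a5; larger values are assumed to be better.

```python
from typing import Dict, List, Sequence


def dominates(p: Sequence[float], q: Sequence[float]) -> bool:
    """p dominates q: p >= q on all dimensions and p > q on at least one (larger is better)."""
    return all(a >= b for a, b in zip(p, q)) and any(a > b for a, b in zip(p, q))


def dominance_counts(table: Dict[str, Sequence[float]]) -> Dict[str, int]:
    """Number of tuples in the same table that dominate each tuple."""
    return {t: sum(dominates(table[o], table[t]) for o in table if o != t) for t in table}


def k_skyband(table: Dict[str, Sequence[float]], k: int) -> List[str]:
    """Tuples dominated by at most k - 1 others (the candidates kept by the pruning rule)."""
    counts = dominance_counts(table)
    return sorted(t for t, c in counts.items() if c <= k - 1)


# Hypothetical (fare, food) scores; larger is better.
AIRLINES = {"a1": (0.9, 0.2), "a2": (0.5, 0.6), "a3": (0.2, 0.9),
            "a4": (0.4, 0.5), "a5": (0.3, 0.4)}

if __name__ == "__main__":
    print(dominance_counts(AIRLINES))
    print(k_skyband(AIRLINES, k=2))
```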

For any two candidate products cp1 and cp2 in P, if cp1 dominates cp2, then RTOPk(cp2, P, C) ⊆ RTOPk(cp1, P, C). Hence, for any candidate product cp in P, if cp ∉ Skyline(P), then cp ∉ n-k MFP: the candidate products in the n-k MFP must be in Skyline(P). (Figure: skyline on dimensions A1 and A2.)

Let S be the set of candidate products generated from Skyline(T1), Skyline(T2), …, and Skyline(Tx). A candidate product cp ∈ Skyline(P) if and only if cp ∈ S [VLDB'09]. Therefore, only the skyline tuples of each component table can be part of a candidate product in the n-k MFP. In the example, these are airlines a1, a2, a3 and hotels h2, h5, h6 (dominance count 0).
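The claim that Skyline(P) equals the set of products built from the component skylines can be checked on a small example. The sketch below assumes a product's attributes are the concatenation of its components' attributes and that larger is better everywhere; the airline and hotel values are hypothetical, chosen so that the dominance counts match the slide. It is a sanity check under these assumptions, not a proof.

```python
from itertools import product
from typing import Dict, Sequence, Set, Tuple


def dominates(p: Sequence[float], q: Sequence[float]) -> bool:
    """p dominates q: p >= q on all dimensions and p > q on at least one (larger is better)."""
    return all(a >= b for a, b in zip(p, q)) and any(a > b for a, b in zip(p, q))


def skyline(table: Dict) -> Set:
    """Ids of the tuples not dominated by any other tuple in the table."""
    return {oid for oid, p in table.items()
            if not any(dominates(q, p) for other, q in table.items() if other != oid)}


def packages(t1: Dict[str, Sequence[float]],
             t2: Dict[str, Sequence[float]]) -> Dict[Tuple[str, str], Tuple[float, ...]]:
    """Candidate products: every (t1 tuple, t2 tuple) pair, attributes concatenated."""
    return {(a, h): tuple(t1[a]) + tuple(t2[h]) for a, h in product(t1, t2)}


# Hypothetical component tables; larger is better on every attribute.
AIRLINES = {"a1": (0.9, 0.2), "a2": (0.5, 0.6), "a3": (0.2, 0.9),
            "a4": (0.4, 0.5), "a5": (0.3, 0.4)}
HOTELS = {"h1": (0.4, 0.4), "h2": (0.9, 0.3), "h3": (0.5, 0.5),
          "h4": (0.2, 0.8), "h5": (0.6, 0.6), "h6": (0.3, 0.9)}

if __name__ == "__main__":
    sky_p = skyline(packages(AIRLINES, HOTELS))
    from_components = set(product(skyline(AIRLINES), skyline(HOTELS)))
    print(sky_p == from_components, sorted(sky_p))
```

Both sides yield the same nine packages, built from a1, a2, a3 and h2, h5, h6, matching the nine candidates whose upper bounds are listed next.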

Only the customers in RTOPk(cp, Skyline(P), C) can possibly be members of RTOPk(cp, P, C), because restricting the competitors to Skyline(P) can only improve the rank of cp; that is, RTOPk(cp, Skyline(P), C) is an upper bound of RTOPk(cp, P, C). Upper bounds of the remaining candidate packages: (a1, h2): {c3}; (a1, h5): {c3, c4}; (a1, h6): {}; (a2, h2): {}; (a2, h5): {c4}; (a2, h6): {c1, c5}; (a3, h2): {c2}; (a3, h5): {c2, c4}; (a3, h6): {c1, c5}.

Upper bounds of the non-empty packages: (a1, h2): {c3}; (a1, h5): {c3, c4}; (a2, h5): {c4}; (a2, h6): {c1, c5}; (a3, h2): {c2}; (a3, h5): {c2, c4}; (a3, h6): {c1, c5}. The top-2 favorites of c3 are {(a1, h5), (a1, h2)} and those of c4 are {(a1, h5), (a2, h5), (a3, h5)}; both contain (a1, h5). P′ = {(a1, h5)}.

Remaining upper bounds, excluding the already covered customers c3 and c4: (a2, h6): {c1, c5}; (a3, h2): {c2}; (a3, h5): {c2}; (a3, h6): {c1, c5}. The top-2 favorites of c1 are {(a3, h6), (a5, h6)} and those of c5 are {(a3, h6), (a4, h6)}; both contain (a3, h6). P′: {(a1, h5)} → {(a1, h5), (a3, h6)}.

Application IV: Find Most Favorite Products by Top-k Reverse Skyline Queries. Figure (k = 1): user preferences u1 and u2 and products plotted on the Year and Mileage dimensions.

Thank you for your attention!