Quantile-Based KNN over Multi- Valued Objects Wenjie Zhang Xuemin Lin, Muhammad Aamir Cheema, Ying Zhang, Wei Wang The University of New South Wales, Australia.

Slides:

Advertisements

Similar presentations

Lazy Updates: An Efficient Technique to Continuously Monitoring Reverse kNN Presented By: Ying Zhang Joint work with Muhammad Aamir Cheema, Xuemin Lin,

Advertisements

Multi-Guarded Safe Zone: An Effective Technique to Monitor Moving Circular Range Queries Presented By: Muhammad Aamir Cheema 1 Joint work with Ljiljana.

Weiren Yu 1, Jiajin Le 2, Xuemin Lin 1, Wenjie Zhang 1 On the Efficiency of Estimating Penetrating Rank on Large Graphs 1 University of New South Wales.

Minimum Clique Partition Problem with Constrained Weight for Interval Graphs Jianping Li Department of Mathematics Yunnan University Jointed by M.X. Chen.

Finding the Sites with Best Accessibilities to Amenities Qianlu Lin, Chuan Xiao, Muhammad Aamir Cheema and Wei Wang University of New South Wales, Australia.

Ranking Outliers Using Symmetric Neighborhood Relationship Wen Jin, Anthony K.H. Tung, Jiawei Han, and Wei Wang Advances in Knowledge Discovery and Data.

Computer Science and Engineering Inverted Linear Quadtree: Efﬁcient Top K Spatial Keyword Search Chengyuan Zhang 1,Ying Zhang 1,Wenjie Zhang 1, Xuemin.

Spatio-temporal Databases

Jianxin Li, Chengfei Liu, Rui Zhou Swinburne University of Technology, Australia Wei Wang University of New South Wales, Australia Top-k Keyword Search.

Counting Distinct Objects over Sliding Windows Presented by: Muhammad Aamir Cheema Joint work with Wenjie Zhang, Ying Zhang and Xuemin Lin University of.

Probabilistic Skyline Operator over Sliding Windows Wenjie Zhang University of New South Wales & NICTA, Australia Joint work: Xuemin Lin, Ying Zhang, Wei.

Best-Effort Top-k Query Processing Under Budgetary Constraints

Fast Algorithms For Hierarchical Range Histogram Constructions

School of Computer Science and Engineering Finding Top k Most Influential Spatial Facilities over Uncertain Objects Liming Zhan Ying Zhang Wenjie Zhang.

Click to edit Present’s Name SLICE: Reviving Regions-Based Pruning for Reverse k Nearest Neighbors Queries Shiyu Yang 1, Muhammad Aamir Cheema 2,1, Xuemin.

Retrieving k-Nearest Neighboring Trajectories by a Set of Point Locations Lu-An Tang, Yu Zheng, Xing Xie, Jing Yuan, Xiao Yu, Jiawei Han University of.

1 NNH: Improving Performance of Nearest- Neighbor Searches Using Histograms Liang Jin (UC Irvine) Nick Koudas (AT&T Labs Research) Chen Li (UC Irvine)

VLDB’2007 review Denis Mindolin. VLDB’07 program.

1 Efficient Subgraph Search over Large Uncertain Graphs Ye Yuan 1, Guoren Wang 1, Haixun Wang 2, Lei Chen 3 1. Northeastern University, China 2. Microsoft.

Ming Hua, Jian Pei Simon Fraser UniversityPresented By: Mahashweta Das Wenjie Zhang, Xuemin LinUniversity of Texas at Arlington The University of New South.

Effectively Indexing Uncertain Moving Objects for Predictive Queries School of Computing National University of Singapore Department of Computer Science.

Probabilistic Threshold Range Aggregate Query Processing over Uncertain Data Wenjie Zhang University of New South Wales & NICTA, Australia Joint work:

A Generic Framework for Handling Uncertain Data with Local Correlations Xiang Lian and Lei Chen Department of Computer Science and Engineering The Hong.

DIMENSIONALITY REDUCTION BY RANDOM PROJECTION AND LATENT SEMANTIC INDEXING Jessica Lin and Dimitrios Gunopulos Ângelo Cardoso IST/UTL December

Spatio-temporal Databases Time Parameterized Queries.

Efficient Processing of Top-k Spatial Keyword Queries João B. Rocha-Junior, Orestis Gkorgkas, Simon Jonassen, and Kjetil Nørvåg 1 SSTD 2011.

Reza Sherkat ICDE061 Reza Sherkat and Davood Rafiei Department of Computing Science University of Alberta Canada Efficiently Evaluating Order Preserving.

1 R-Trees for Spatial Indexing Yanlei Diao UMass Amherst Feb 27, 2007 Some Slide Content Courtesy of J.M. Hellerstein.

Spatial Queries Nearest Neighbor Queries.

A Unified Approach for Computing Top-k Pairs in Multidimensional Space Presented By: Muhammad Aamir Cheema 1 Joint work with Xuemin Lin 1, Haixun Wang.

KNN, LVQ, SOM. Instance Based Learning K-Nearest Neighbor Algorithm (LVQ) Learning Vector Quantization (SOM) Self Organizing Maps.

Probabilistic Skyline Operator over sliding Windows Wan Qian HKUST DB Group.

Presented by: Duong, Huu Kinh Luan March 14 th, 2011.

Spatial Indexing. Spatial Queries Given a collection of geometric objects (points, lines, polygons,...) organize them on disk, to answer point queries.

Computer Science and Engineering Loyalty-based Selection: Retrieving Objects That Persistently Satisfy Criteria Presented By: Zhitao Shen Joint work with.

Mehdi Kargar Aijun An York University, Toronto, Canada Keyword Search in Graphs: Finding r-cliques.

Ranking Queries on Uncertain Data: A Probabilistic Threshold Approach Wenjie Zhang, Xuemin Lin The University of New South Wales & NICTA Ming Hua,

Click to edit Present’s Name Xiaoyang Zhang 1, Jianbin Qin 1, Wei Wang 1, Yifang Sun 1, Jiaheng Lu 2 HmSearch: An Efficient Hamming Distance Query Processing.

© 2009 IBM Corporation 1 Improving Consolidation of Virtual Machines with Risk-aware Bandwidth Oversubscription in Compute Clouds Amir Epstein Joint work.

Towards Robust Indexing for Ranked Queries Dong Xin, Chen Chen, Jiawei Han Department of Computer Science University of Illinois at Urbana-Champaign VLDB.

Top-k Similarity Join over Multi- valued Objects Wenjie Zhang Jing Xu, Xin Liang, Ying Zhang, Xuemin Lin The University of New South Wales, Australia.

Computer Science and Engineering Efficiently Monitoring Top-k Pairs over Sliding Windows Presented By: Zhitao Shen 1 Joint work with Muhammad Aamir Cheema.

Influence Zone: Efficiently Processing Reverse k Nearest Neighbors Queries Presented By: Muhammad Aamir Cheema Joint work with Xuemin Lin, Wenjie Zhang,

Nearest Neighbor Queries Chris Buzzerd, Dave Boerner, and Kevin Stewart.

Efficient Processing of Top-k Spatial Preference Queries

Spatio-temporal Pattern Queries M. Hadjieleftheriou G. Kollios P. Bakalov V. J. Tsotras.

On Computing Top-t Influential Spatial Sites Authors: T. Xia, D. Zhang, E. Kanoulas, Y.Du Northeastern University, USA Appeared in: VLDB 2005 Presenter:

9/2/2005VLDB 2005, Trondheim, Norway1 On Computing Top-t Most Influential Spatial Sites Tian Xia, Donghui Zhang, Evangelos Kanoulas, Yang Du Northeastern.

Computer Science and Engineering TreeSpan Efficiently Computing Similarity All-Matching Gaoping Zhu #, Xuemin Lin #, Ke Zhu #, Wenjie Zhang #, Jeffrey.

Information Technology Selecting Representative Objects Considering Coverage and Diversity Shenlu Wang 1, Muhammad Aamir Cheema 2, Ying Zhang 3, Xuemin.

Information Technology (Some) Research Trends in Location-based Services Muhammad Aamir Cheema Faculty of Information Technology Monash University, Australia.

Privacy Preserving Outlier Detection using Locality Sensitive Hashing

1 Spatial Query Processing using the R-tree Donghui Zhang CCIS, Northeastern University Feb 8, 2005.

Probabilistic Skylines on Uncertain Data (VLDB2007) Jian Pei et al Supervisor: Dr Benjamin Kao Presenter: For Date: 22 Feb 2008 ??: the possible world.

Computer Science and Engineering Ranking Complex Objects in a Multi-dimensional Space Wenjie Zhang, Ying Zhang, Xuemin Lin The University of New South.

Computer Science and Engineering Jianye Yang 1, Ying Zhang 2, Wenjie Zhang 1, Xuemin Lin 1 Influence based Cost Optimization on User Preference 1 The University.

Click to edit Present’s Name AP-Tree: Efficiently Support Continuous Spatial-Keyword Queries Over Stream Xiang Wang 1*, Ying Zhang 2, Wenjie Zhang 1, Xuemin.

A Unified Algorithm for Continuous Monitoring of Spatial Queries

A Uniﬁed Framework for Efﬁciently Processing Ranking Related Queries

Abolfazl Asudeh Azade Nazi Nan Zhang Gautam DaS

Stochastic Skyline Operator

TT-Join: Efficient Set Containment Join

Probabilistic Data Management

Distributed Probabilistic Range-Aggregate Query on Uncertain Data

Probabilistic n-of-N Skyline Computation over Uncertain Data Streams

Similarity Search: A Matching Based Approach

Publishing in Top Venues

Uncertain Data Mobile Group 报告人：郝兴.

Efficient Processing of Top-k Spatial Preference Queries

Liang Jin (UC Irvine) Nick Koudas (AT&T Labs Research)

Presentation transcript:

Quantile-Based KNN over Multi- Valued Objects Wenjie Zhang Xuemin Lin, Muhammad Aamir Cheema, Ying Zhang, Wei Wang The University of New South Wales, Australia

KNN Query Find k objects closest to query q 2 q o1o1 o4o4 o2o2 o3o3 2-NN of q: o 2 and o 4

KNN Query Find k objects closest to query q 3 What if objects have multiple values ? q Player A Player B

Existing Models for Multi-instances Probabilistic KNN Models – Probability is computed by summing up probabilities of all valid possible worlds – Probable top-k NN: [Bekales, Soliman, Ilyas. VLDB 08] Probabilistic verifiers: [Cheng, Chen, Mokebel, Chow. ICDE 08] 4

Probabilistic KNN Model 5 q and have the same probability to be NN of q

Probabilistic KNN Model 6 q and have the same probability to be NN of q

Probabilistic KNN Model 7 q and have the same probability to be NN of q

Probabilistic KNN Model 8 ties Not very sensitive to relative distribution of objects instances Different semantics than multi-value objects !

Aggregate Distances Existing: mean, minimal, maximal, etc Generalization: quantile distance 9

Applications GIS Location based service Radio Frequency Identify Device … 10

Contributions Distance between two multi-value objects – Quantile distance – Quantile group-base distance KNN processing – Linear time algorithm for quantile distance – Pruning Techniques on object & instance level 11

Preliminaries Φ-quantile: – The first element in a sorted list with the cumulative weight not smaller than Φ, where Φ is a number in (0, 1]. 12 Sorted elements: Weights: quantile0.8-quantile Cumulative weight: 0.45

Preliminaries Multi-valued Object: – A set of weighted instances in d-dimensional space – The sum of the weights equals to 1 13

Quantile Distances Φ-quantile distance 14 Instance pairs: (q 2, u 1 ) (q 3, u 1 ) (q 3, u 2 ) (q 1, u 1 ) (q 2, u 2 ) (q 1, u 2 ) Weights: 1/8 1/8 1/8 1/4 1/8 1/ quantile0.5-quantile ½ ¼ ¼ ½ ½

Quantile Distances Φ-quantile group-base distance – Candidate group: a subset S’ of elements set Q x U such that the total weights of elements in is not smaller than Φ. – The minimal total weighted distance among all candidate groups, namely with the minimal 15

Quantile Distances Φ-quantile group-base distance 16 Instance pairs: (q 2, u 1 ) (q 3, u 1 ) (q 3, u 2 ) (q 1, u 1 ) (q 2, u 2 ) (q 1, u 2 ) Weights: 1/8 1/8 1/8 1/4 1/8 1/4 Φ = 0.5

Problem Definition Given a set of multi-valued objects, a multi- valued query object and Φ in (0, 1], retrieve K objects with smallest Φ-quantile distance and Φ-quantile group-base distance to the query object. 17

Framework Seeding – Select k objects and compute quantile distance --- k seeds are selected based on mean – Challenge: efficient computation of quantile distances Refinement: – Determine the final solution – Challenge: efficient pruning techniques 18

Φ-quantile Distance Multi-value objects U and V – Overall m=|U| * |V| instance pairs – Existing: Disk-based Quantile computation: O(mlogm) with pruning --- Yiu et al EDBT 06 – Our contribution: O(m) algorithm with pruning Basic Idea: index instances by R-tree; prune entry pairs at higher levels using bounding techniques 19

Φ-quantile Distance 20 U1U1 U2U2 0.5 V V2V2 0.6 (U 1, V 1 ) 0.2 (U 1, V 2 ) 0.3 (U 2, V 2 ) 0.3 (U 2, V 1 ) 0.2 Φ=0.6 Φ=0.6 – 0.2 lower bound upper bound

Φ-quantile KNN λ k : the largest distance of the K seeded objects Global R-tree: indexes the MBB of the multi-value objects set Local R-tree: indexes the instances of multi-value objects (query, object set) 21

Φ-quantile KNN Distance based pruning rule on Global R-tree 22 Q E from global R-tree ≥λ k

Φ-quantile KNN Weight based pruning rule on Global R-tree Entry E: Let Ω denote the set of entries with minimal distance ≤ λ k, if the total weight of entries in Ω is ≤ Φ, then every object in E could be pruned. 23 Q E from global R-tree Q1Q1 Q2Q2 minimal distance: mindist(Q 2, E) ≤ λ k weight(Q2) ≤Φ

Φ-quantile KNN Pruning on the Local R-tree of Q and U – Trim the local R-tree of both Q and U by λ k level by level. Discard entries in Q with minimal distance to U larger than λ k, similarly for U. – Denote the total weight of remaining entries as P Q and P U, respectively, if P Q * P U ≤ Φ, then U can be pruned. 24

Φ-quantile KNN Pruning on the Local R-tree of Q and U 25 Q U Q1Q1 Q2Q2 U1U1 U2U2 minimal distance: mindist(Q 1, U) ≥ λ k minimal distance: mindist(Q, U 2 ) ≥ λ k weight(Q 2 ) * weight(U 1 ) ≤Φ

Φ-quantile KNN Overall processing – 1. Pruning on Global R-tree based on distance – 2. Pruning on Global R-tree based on weights – 3. Pruning on Local R-tree – Φ-quantile distance computation 26

Φ-quantile Group-base Distance From |Q| * |U| instance pairs, select a set of pairs such that – the sum of weights is ≥ Φ – the weighted distance is minimized Reduced to Knapsack problem. NP-hard 27

Φ-quantile Group-base Distance Approximate algorithm with ratio 2 and time complexity O(m log m), where m is the total number of instance pairs. 28 possible solution: 1.72 (1, 0.28), (3, 0.48) sorted distance: 1 weight: possible solution: 1 (1, 0.28), (2, 0.12), (4, 0.12)

Φ-quantile Group-base KNN Prune a set of objects on Global R-tree based on distance 29 Q Φ * L ≥ λ k E L

Φ-quantile Group-base KNN Prune a set of objects on Global R-tree based on weights – Traverse the local R-tree of Q – Sort entry pairs according to lower bound distance at each level of Q 30

Φ = (1, 0.28)(2, 0.12)(3, 0.48)(4, 0.12) 0.5-quantile d1= 1 * * 0.12 d2= 3 * ( Φ – 0.28 – 0.12) If d1 + d2 ≥ λ k, then E could be removed. Q1Q1 Q2Q2 Q3Q3 Q4Q4 EQGlobal R-tree

Φ-quantile Group-base KNN Overall Processing – 1. Distance based pruning on Global R-tree – 2. Weight based pruning on Global R-tree – Φ-quantile Group-base distance calculation 32

Experiment Real Dataset – NBA player statistics – 339, 721 records of 1,313 players – 3 dimension Synthetic Dataset – Size: 10, 000 to 50, 000 – Instances: 400 to 2, 000 – Dimension: 2 to 5 – MBB length, spatial distribution, instance distribution, weight distribution … 33

Φ-quantile Distance Computation 34

Overall Performance Naive algorithms refer to techniques without pruning rules 35

Pruning Powers 36

Accuracy 37

Varying Parameters 38

Conclusions Distance between two multi-value objects based on quantile A linear algorithm for quantile distance computation with pruning Efficient KNN processing techniques 39

Thanks 40