Published by Wendy Murphy. Modified over 9 years ago.
1
Efficient Probabilistic Reverse Nearest Neighbor Query Processing on Uncertain Data
Thomas Bernecker, Tobias Emrich, Hans-Peter Kriegel, Matthias Renz, Stefan Zankl and Andreas Zuefle Ludwig-Maximilians-Universität München (LMU) Munich, Germany {bernecker, emrich, kriegel, renz,
2
Outline
- Background
  - Uncertain Data Model
  - Reverse k-nearest neighbour queries
  - Reverse k-nearest neighbour queries on uncertain objects
- Framework for Probabilistic RkNN Processing
  - Approximation
  - Spatial Filter
  - Probabilistic Filter
  - Verification
- Evaluation + Summary
3
Data model
- Objects are described by a multi-dimensional probability distribution
- Object independence assumption
- Queries are answered according to possible-worlds semantics
- Object PDFs can be spatially bounded
- Continuous or discrete representation
[Figure: user ratings for "Life of Brian" as the PDF of an uncertain object X over uncertain attribute a (Action) and uncertain attribute b (Humor)]
Notes: attributes can depend on each other; the mean is not a good representation.
4
$RkNN(q) = \{o \in DB \mid q \in kNN(o)\}$
[Figure: objects $o_1, \dots, o_7$ and query q, with $R1NN(q) = \{o_7\}$ and $R2NN(q) = \{o_7, o_5, o_4\}$]
What is it good for?
- Market segmentation (data mining)
- Outlier detection (data mining)
- Incremental algorithms (e.g. continuous nearest neighbour queries)
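For certain (exact) points, the definition above can be checked by brute force: o belongs to RkNN(q) exactly when fewer than k other database objects are closer to o than q is. The sketch below is an illustration under stated assumptions, not the authors' method: points are 2-D tuples in a dict keyed by hypothetical object names, distance ties are resolved in favour of q, and the cost is quadratic in the database size.

```python
from math import dist


def rknn(q, db, k):
    """Brute-force RkNN(q) = {o in DB | q in kNN(o)} for certain points.

    q:  query point, e.g. (x, y)
    db: dict mapping a (hypothetical) object name to its point
    An object o qualifies iff fewer than k *other* database objects are
    strictly closer to o than q is (ties are resolved in favour of q).
    """
    result = []
    for name, o in db.items():
        d_q = dist(o, q)
        closer = sum(1 for other, p in db.items()
                     if other != name and dist(o, p) < d_q)
        if closer < k:
            result.append(name)
    return result


# Hypothetical example data (not the points from the slide's figure).
q = (0.0, 0.0)
db = {"a": (1.0, 0.0), "b": (3.0, 0.0), "c": (10.0, 0.0)}
print(rknn(q, db, 1))  # ['a']
print(rknn(q, db, 2))  # ['a', 'b']
```

Index-based RkNN algorithms avoid this quadratic scan; the brute-force version is mainly useful as a reference when testing them.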
5
"Is O' an R1NN of Q?"
[Figure: uncertain objects O1, O2, O' and query Q]
Note: the query object may be uncertain as well!
6
"Is O' an R1NN of Q?" => In some possible worlds it is.
[Figure: one possible world of O1, O2, O' and Q]
7
"Is O' an R1NN of Q?" => In other possible worlds it is not.
[Figure: another possible world of O1, O2, O' and Q]
8
Definition of Probabilistic RkNN
$PRkNN(Q, \tau) = \{O \in DB \mid P(O \in RkNN(Q)) \geq \tau\} = \{O \in DB \mid P(Q \in kNN(O)) \geq \tau\}$
[Figure: uncertain objects O1, O2, O' and query Q]
E.g. $P(Q \in 1NN(O')) = 21/24$, hence $O' \in PR1NN(Q, 0.5)$.
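Under possible-worlds semantics and the object-independence assumption, P(Q ∈ kNN(O')) for discrete objects is the total probability of all worlds in which fewer than k other objects are closer to O' than Q. The sketch below enumerates every world, which is exponential in the number of objects and therefore only a reference for tiny inputs. The data layout (a dict from hypothetical object names to lists of (point, probability) instances, with Q stored as an object itself) and all coordinates are assumptions for illustration.

```python
from itertools import product
from math import dist, prod


def prob_rknn(q_name, o_name, db, k):
    """Exact P(Q in kNN(O')) by enumerating every possible world.

    db maps an object name to a list of (point, probability) instances.
    Objects are independent, so a world's probability is the product of the
    probabilities of the instances it picks. Exponential in len(db).
    """
    names = list(db)
    total = 0.0
    for choice in product(*(db[n] for n in names)):
        world = {n: pt for n, (pt, _) in zip(names, choice)}
        p_world = prod(p for _, p in choice)
        o, q = world[o_name], world[q_name]
        d_q = dist(o, q)
        closer = sum(1 for n in names
                     if n not in (o_name, q_name) and dist(o, world[n]) < d_q)
        if closer < k:  # Q is among the k nearest neighbours of O' in this world
            total += p_world
    return total


def prknn(q_name, db, k, tau):
    """PRkNN(Q, tau) = {O in DB | P(Q in kNN(O)) >= tau}."""
    return [n for n in db if n != q_name and prob_rknn(q_name, n, db, k) >= tau]


# Hypothetical 2-D instances; Q is certain here, but need not be.
db = {
    "Q":  [((0.0, 0.0), 1.0)],
    "O'": [((1.0, 0.0), 0.5), ((4.0, 0.0), 0.5)],
    "O1": [((3.0, 0.0), 0.5), ((9.0, 0.0), 0.5)],
}
print(prob_rknn("Q", "O'", db, 1))  # 0.75
print(prknn("Q", db, 1, 0.5))       # ["O'"]
```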
9
Framework for PRkNN query processing
- Approximation (indexing): simplification of spatial-probabilistic keys
- Spatial filter: filter objects according to simple spatial keys
- Probabilistic filter: derive lower/upper bounds of the qualification probability (by means of simple spatial-probabilistic keys); filter objects according to these lower/upper probability bounds
- Verification: computation of the exact probability (very expensive) or Monte-Carlo sampling (many samples required)
Modularization allows different algorithms to be compared.
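The stages above can be read as a classic filter-refinement loop. The skeleton below is a hedged sketch of that control flow only: the three stage functions are hypothetical callables supplied by the caller, and the paper's actual stage implementations are not reproduced.

```python
from typing import Callable, Iterable


def prknn_filter_refine(candidates: Iterable[str],
                        spatially_pruned: Callable[[str], bool],
                        prob_bounds: Callable[[str], tuple[float, float]],
                        exact_prob: Callable[[str], float],
                        tau: float) -> list[str]:
    """Filter-refinement skeleton: spatial filter -> probabilistic filter -> verification."""
    result = []
    for o in candidates:
        if spatially_pruned(o):        # spatial filter: cheap MBR-based test
            continue
        lower, upper = prob_bounds(o)  # probabilistic filter: bounds on P(Q in kNN(o))
        if upper < tau:                # can never reach tau -> drop
            continue
        if lower >= tau:               # guaranteed to reach tau -> report without verification
            result.append(o)
        elif exact_prob(o) >= tau:     # verification: expensive exact or sampled probability
            result.append(o)
    return result


# Hypothetical stand-ins for the three stages.
pruned = {"A"}
bounds = {"B": (0.6, 0.9), "C": (0.1, 0.3), "D": (0.2, 0.8)}
exact = {"D": 0.55}
print(prknn_filter_refine(["A", "B", "C", "D"], pruned.__contains__,
                          bounds.__getitem__, exact.__getitem__, tau=0.5))  # ['B', 'D']
```

Only "D" reaches the verification step; the other candidates are decided by the cheaper filters, which is the point of the modular design.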
10
R*-tree for indexing objects (global index)
[Figure: R*-tree over the uncertain objects, with query Q]
11
AR*-tree for indexing instances (local index)
[Figure: AR*-tree over the instances of one object; each entry is annotated with the aggregated probability of the instances below it, from 1.0 at the root down to the individual instance probabilities]
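The idea of the local index is that each entry stores, besides its MBR, the total probability of the instances below it, so a filter can reason about a whole group of instances at once. The sketch below assumes 2-D instances and builds the aggregates bottom-up for a hand-chosen grouping; it does not implement R*-tree insertion or split heuristics, and the class and function names are hypothetical.

```python
from dataclasses import dataclass, field

Point = tuple[float, float]


@dataclass
class ARNode:
    """Entry of a probability-aggregating tree over one object's instances."""
    lo: Point                 # lower-left corner of the MBR
    hi: Point                 # upper-right corner of the MBR
    prob: float               # total probability of all instances below
    children: list["ARNode"] = field(default_factory=list)


def leaf(point: Point, prob: float) -> ARNode:
    """A single instance: degenerate MBR carrying its own probability."""
    return ARNode(point, point, prob)


def aggregate(children: list[ARNode]) -> ARNode:
    """Inner entry: MBR enclosing all children, probability = sum of theirs."""
    lo = (min(c.lo[0] for c in children), min(c.lo[1] for c in children))
    hi = (max(c.hi[0] for c in children), max(c.hi[1] for c in children))
    return ARNode(lo, hi, sum(c.prob for c in children), list(children))


# Hypothetical object with four instances, grouped by hand into two entries.
left = aggregate([leaf((1.0, 1.0), 0.25), leaf((2.0, 3.0), 0.25)])
right = aggregate([leaf((5.0, 5.0), 0.125), leaf((6.0, 4.0), 0.375)])
root = aggregate([left, right])
print(root.lo, root.hi, root.prob)  # (1.0, 1.0) (6.0, 5.0) 1.0
```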
12
Pruning based on rectangular approximations only [1].
Task: find k objects $O \in DB \setminus \{O'\}$ which are closer to O' than Q is. If k such objects exist, O' cannot be an RkNN of Q and is pruned.
[Figure: MBRs of O and Q; the space is divided into regions depending on where O' lies:]
- For any O' in this region, O is closer than Q.
- For any O' in this region, O is not closer than Q.
- For any O' intersecting this region, Q may possibly be closer than O.
[1] Tobias Emrich, Hans-Peter Kriegel, Peer Kröger, Matthias Renz, Andreas Züfle: Boosting Spatial Pruning: On Optimal Pruning of MBRs. SIGMOD Conference 2010: 39-50.
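The figure's regions come from the optimal MBR-based criterion of [1]. As a simpler stand-in, the sketch below uses a sufficient but weaker test: if MaxDist(O', O) < MinDist(O', Q), every possible position of O is closer to every possible position of O' than any position of Q, so O counts towards the k objects. Rectangles as ((xmin, ymin), (xmax, ymax)) tuples, the example coordinates and all function names are assumptions.

```python
from math import hypot

Rect = tuple[tuple[float, float], tuple[float, float]]  # ((xmin, ymin), (xmax, ymax))


def min_dist(a: Rect, b: Rect) -> float:
    """Smallest Euclidean distance between any point of a and any point of b."""
    dx = max(b[0][0] - a[1][0], a[0][0] - b[1][0], 0.0)
    dy = max(b[0][1] - a[1][1], a[0][1] - b[1][1], 0.0)
    return hypot(dx, dy)


def max_dist(a: Rect, b: Rect) -> float:
    """Largest Euclidean distance between any point of a and any point of b."""
    dx = max(a[1][0] - b[0][0], b[1][0] - a[0][0])
    dy = max(a[1][1] - b[0][1], b[1][1] - a[0][1])
    return hypot(dx, dy)


def dominates(o: Rect, q: Rect, o_prime: Rect) -> bool:
    """True if O is certainly closer to O' than Q, whatever the exact positions."""
    return max_dist(o_prime, o) < min_dist(o_prime, q)


def spatially_pruned(o_prime: Rect, q: Rect, others: list[Rect], k: int) -> bool:
    """O' cannot be an RkNN of Q if at least k other objects dominate Q w.r.t. O'."""
    return sum(dominates(o, q, o_prime) for o in others) >= k


# Hypothetical MBRs.
o_prime = ((0.0, 0.0), (1.0, 1.0))
q = ((10.0, 10.0), (11.0, 11.0))
o1 = ((1.0, 2.0), (2.0, 3.0))
o2 = ((0.0, -3.0), (1.0, -2.0))
o3 = ((20.0, 20.0), (21.0, 21.0))
print(spatially_pruned(o_prime, q, [o1, o2, o3], 2))  # True
```

This test prunes fewer objects than the optimal criterion in [1], but it never prunes a true result.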
13
What is the probability that O is closer to O' than Q?
[Figure: uncertain objects O, Q, O' and region B]
14
What is the probability that O is closer to O' than Q?
"O is closer to O' than Q with at least x% probability"
[Figure: uncertain objects O, Q and O']
15
What is the probability that O is closer to O' than Q?
"O is closer to O' than Q with at most x% probability"
[Figure: uncertain objects O, Q and O']
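One way to obtain such "at least" / "at most" statements is to split O into partitions with known probability mass (for example, entries of its AR*-tree) and test each partition against O' and Q with distance bounds. The sketch below assumes 1-D intervals for brevity and treats O' and Q only through their bounding intervals; it is a simplified illustration with hypothetical names and data, and the paper's derivation (which also has to handle the dependency between objects through the shared O' and Q) is not reproduced here.

```python
Interval = tuple[float, float]  # (lo, hi); 1-D for brevity


def min_dist(a: Interval, b: Interval) -> float:
    """Smallest distance between any point of a and any point of b."""
    return max(b[0] - a[1], a[0] - b[1], 0.0)


def max_dist(a: Interval, b: Interval) -> float:
    """Largest distance between any point of a and any point of b."""
    return max(a[1] - b[0], b[1] - a[0])


def closer_bounds(o_parts, q: Interval, o_prime: Interval):
    """Lower/upper bound on P(O is closer to O' than Q).

    o_parts: list of (interval, probability) partitions of O, e.g. AR*-tree
    entries; Q and O' are only known by their bounding intervals.
    """
    lower = upper = 0.0
    for part, p in o_parts:
        if max_dist(o_prime, part) < min_dist(o_prime, q):
            lower += p  # every instance in this partition is closer than Q
            upper += p
        elif min_dist(o_prime, part) <= max_dist(o_prime, q):
            upper += p  # undecided: some instances may be closer than Q
        # else: every instance is farther than Q, so p counts for neither bound
    return lower, upper


o_parts = [((1.5, 2.0), 0.25), ((3.0, 5.0), 0.25), ((8.0, 9.0), 0.5)]
print(closer_bounds(o_parts, q=(5.0, 6.0), o_prime=(0.0, 1.0)))  # (0.25, 0.5)
```

Read as a statement: "O is closer to O' than Q with at least 25% and at most 50% probability". Finer partitions (deeper AR*-tree levels) tighten the bounds at a higher cost.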
16
How many objects $O \in DB$ are closer to O' than Q?
Exemplary statements:
- O1 is closer to O' than Q with at least 20% and at most 50% probability.
- O2 is closer to O' than Q with at least 60% and at most 80% probability.
Correctly deriving these bounds is not trivial (see paper).
Consider the following uncertain generating function:
- x-term: probability of the object being closer to O' than Q
- z-term: probability of the object being further from O' than Q
- y-term: uncertainty
$(0.2x + 0.3y + 0.5z) \cdot (0.6x + 0.2y + 0.2z)$
Expansion yields $0.12x^2 + 0.34xz + 0.1z^2 + 0.22xy + 0.16yz + 0.06y^2$.
Note: when splitting, certain rules must be observed: exactly one term per object.
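The expansion can be reproduced mechanically by multiplying one trinomial per object. In the sketch below, a partial polynomial is a dict whose key (a, b) stands for the term x^a y^b z^(n-a-b); each object is given as a (lower, upper) bound pair, from which x = lower, y = upper - lower and z = 1 - upper. The function name is hypothetical.

```python
from collections import defaultdict


def ugf_product(bounds):
    """Expand prod_i (x_i*x + y_i*y + z_i*z) for a list of (lower, upper) bounds.

    For object i: x_i = lower (certainly closer), y_i = upper - lower
    (undecided) and z_i = 1 - upper (certainly not closer). Key (a, b) of the
    result stands for the term x^a * y^b * z^(n - a - b).
    """
    poly = {(0, 0): 1.0}
    for lower, upper in bounds:
        x, y, z = lower, upper - lower, 1.0 - upper
        nxt = defaultdict(float)
        for (a, b), c in poly.items():
            nxt[(a + 1, b)] += c * x  # object is closer to O' than Q
            nxt[(a, b + 1)] += c * y  # undecided
            nxt[(a, b)] += c * z      # object is further from O' than Q
        poly = dict(nxt)
    return poly


# Bounds from the slide: O1 in [20%, 50%], O2 in [60%, 80%].
poly = ugf_product([(0.2, 0.5), (0.6, 0.8)])
for (a, b), c in sorted(poly.items(), reverse=True):
    print(f"x^{a} y^{b} z^{2 - a - b}: {c:.2f}")
```

The printed coefficients match the slide (0.12 for x², 0.22 for xy, 0.34 for xz, 0.06 for y², 0.16 for yz, 0.10 for z²). Because terms are keyed only by (a, b), the number of terms grows quadratically in the number of objects rather than exponentially like the possible worlds.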
17
$0.12x^2 + 0.34xz + 0.1z^2 + 0.22xy + 0.16yz + 0.06y^2$
[Figure: bar chart of the probability (axis 20% to 80%) over the number of objects $O \in DB$ that are closer to O' than Q]
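Each term of the expansion brackets the number of objects closer to O' than Q: in x^a y^b, a objects are certainly closer and b are undecided. Summing the terms accordingly gives a lower and an upper bound per count, which is one way to read the bar chart; the slide's exact plotting convention is not recoverable from the text, so treat this reading and the function name as assumptions.

```python
# Expansion from the slide: key (a, b) stands for x^a * y^b * z^(2-a-b).
poly = {(2, 0): 0.12, (1, 0): 0.34, (0, 0): 0.10,
        (1, 1): 0.22, (0, 1): 0.16, (0, 2): 0.06}


def count_bounds(poly, c):
    """Bounds on P(exactly c objects are closer to O' than Q).

    In a term x^a y^b, a objects are certainly closer and b are undecided,
    so the true count lies somewhere in [a, a + b].
    """
    lower = sum(p for (a, b), p in poly.items() if b == 0 and a == c)
    upper = sum(p for (a, b), p in poly.items() if a <= c <= a + b)
    return lower, upper


for c in range(3):
    lo, hi = count_bounds(poly, c)
    print(f"{c} closer objects: between {lo:.2f} and {hi:.2f}")
```

For the slide's expansion this gives bounds of 0.10 to 0.32 for zero closer objects, 0.34 to 0.78 for one and 0.12 to 0.40 for two.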
23
[Figure: two charts over the number of objects $O \in DB$ that are closer to O' than Q: the exact number (probability axis up to 80%) and the maximum number as a cdf (probability axis up to 100%)]
Example PRkNN queries:
- PR1NN(Q, 50%): O' is not part of the result.
- PR2NN(Q, 40%): O' is part of the result.
- PR2NN(Q, 80%): O' has to be investigated further.
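The example decisions follow from the expansion: Q ∈ kNN(O') requires fewer than k objects to be closer to O' than Q, so a term x^a y^b certainly qualifies when a + b < k and possibly qualifies when a < k. The sketch assumes that O1 and O2 are the only objects that might be closer to O' than Q; the function names are hypothetical.

```python
# Expansion from the slide: key (a, b) stands for x^a * y^b * z^(2-a-b).
poly = {(2, 0): 0.12, (1, 0): 0.34, (0, 0): 0.10,
        (1, 1): 0.22, (0, 1): 0.16, (0, 2): 0.06}


def rknn_bounds(poly, k):
    """Lower/upper bound on P(Q in kNN(O')) = P(fewer than k objects closer)."""
    lower = sum(p for (a, b), p in poly.items() if a + b < k)  # holds even if every undecided object is closer
    upper = sum(p for (a, b), p in poly.items() if a < k)      # possible if no undecided object is closer
    return lower, upper


def classify(poly, k, tau):
    """Probabilistic filter decision for O' with respect to PRkNN(Q, tau)."""
    lower, upper = rknn_bounds(poly, k)
    if upper < tau:
        return "not part of the result"
    if lower >= tau:
        return "part of the result"
    return "verify"


for k, tau in [(1, 0.5), (2, 0.4), (2, 0.8)]:
    print(f"PR{k}NN(Q, {tau:.0%}): {classify(poly, k, tau)}")
```

For k = 1 the upper bound is 0.32 < 0.5; for k = 2 the bounds are 0.60 and 0.88, which decides τ = 40% but leaves τ = 80% to verification, matching the three examples.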
27
Options for Verification
- Consideration of all possible worlds (exponential)
- Adapting probabilistic nearest-neighbour ranking [2] on the instance level of objects (polynomial)
- Monte-Carlo based (linear in the number of samples)
[2] Jian Li, Barna Saha, Amol Deshpande: A Unified Approach to Ranking in Probabilistic Databases. PVLDB 2(1), 2009.
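The Monte-Carlo option can be sketched by drawing one instance per object independently and counting the worlds in which fewer than k objects are closer to O' than Q. The data layout (a dict from hypothetical object names to (point, probability) instances), the sample count and the seed are assumptions; the standard error of the estimate shrinks with the square root of the number of samples.

```python
import random
from math import dist


def mc_prob_rknn(q_name, o_name, db, k, samples=10_000, seed=0):
    """Monte-Carlo estimate of P(Q in kNN(O')).

    db maps an object name to a list of (point, probability) instances. Each
    sample draws one instance per object independently (object-independence
    assumption) and checks whether fewer than k objects are closer to O' than Q.
    """
    rng = random.Random(seed)
    instances = {n: tuple(zip(*insts)) for n, insts in db.items()}  # name -> (points, probs)
    hits = 0
    for _ in range(samples):
        world = {n: rng.choices(pts, weights=probs)[0]
                 for n, (pts, probs) in instances.items()}
        o, q = world[o_name], world[q_name]
        d_q = dist(o, q)
        closer = sum(1 for n, p in world.items()
                     if n not in (o_name, q_name) and dist(o, p) < d_q)
        hits += closer < k
    return hits / samples


# Hypothetical 2-D instances; the exact value for this data is 0.75.
db = {
    "Q":  [((0.0, 0.0), 1.0)],
    "O'": [((1.0, 0.0), 0.5), ((4.0, 0.0), 0.5)],
    "O1": [((3.0, 0.0), 0.5), ((9.0, 0.0), 0.5)],
}
print(mc_prob_rknn("Q", "O'", db, 1))
```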
28
Evaluation: Spatial Filter
29
Evaluation: Probabilistic Filter
30
Evaluation: Comparison to other algorithms
31
Summary
- Framework for PRkNN query processing
- Derivation of probabilistic pruning bounds for single objects
- Accumulation of these bounds using uncertain generating functions
- Cost model for choosing the optimal tree depth
- Comparison to existing algorithms for PRNN processing
32
Thanks! Questions?
33
Dependency on k
34
Problem of dependency
[Figure: O', Q, O1 and O2]