1
Database Group@CSE
k-Nearest Neighbors in Uncertain Graphs
Lin Yincheng, 2011-02-28
VLDB10
2
Outline: Background; Motivation; Problem Definition; Query Answering Approach; Experimental Results
3
Background
k-Nearest Neighbors; Uncertain Graphs
[Figure: example graph with edge weights 15, 5, 5, 5]
Find the 2 nearest neighbors of vertex B.
4
Motivation
Distance   Path            Probability
5          B-D             0.3
20         B-A-D, B-C-D    0.25648
∞          No path         0.44352
[Figure: example graph with edge labels 15(0.2), 15(0.6), 5(0.7), 5(0.3), 5(0.4); caption: most-probable-path-distance]
Define meaningful distance functions that are more useful for identifying true neighbors.
Introduce a novel pruning algorithm to process kNN queries in uncertain graphs.
5
Problem Definition
Assumption: edges are independent of one another.
Probabilistic graph model G(V, E, P, W): V and E denote the sets of nodes and edges, respectively; P assigns a probability to each edge; W assigns a weight to each edge.
k-NN Query
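Under the edge-independence assumption, the probability of one possible world is the product of p(e) over the edges it keeps and 1 - p(e) over the edges it drops. The sketch below illustrates this on a hypothetical three-edge graph; the graph, the dictionary layout, and the function names are assumptions made for illustration and do not come from the slides.

```python
from itertools import product

# Hypothetical toy graph (not from the slides): edge -> (probability, weight).
edges = {
    ("s", "a"): (0.5, 1.0),
    ("a", "t"): (0.8, 2.0),
    ("s", "t"): (0.1, 5.0),
}

def world_probability(edges, present):
    """Probability of the possible world that keeps exactly the edges in `present`."""
    prob = 1.0
    for edge, (p, _weight) in edges.items():
        # Independent edges: multiply p for kept edges, (1 - p) for dropped ones.
        prob *= p if edge in present else (1.0 - p)
    return prob

def all_worlds(edges):
    """Yield (present_edges, probability) for each of the 2^|E| possible worlds."""
    names = list(edges)
    for mask in product([False, True], repeat=len(names)):
        present = frozenset(e for e, keep in zip(names, mask) if keep)
        yield present, world_probability(edges, present)
```

Enumerating all worlds is only feasible for tiny graphs, since the number of worlds doubles with every edge.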
6
Distances
Median-Distance(s, t)
Majority-Distance(s, t)
Expected-Reliable-Distance(s, t)
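The formulas for these three distances are not reproduced in this transcript. As a hedged sketch, suppose p(d) is the probability, over possible worlds, that the shortest-path distance from s to t equals d (with d = ∞ when t is unreachable). The code assumes the following readings, which may differ in detail from the slides: the median distance is the smallest d at which the cumulative probability reaches 1/2, the majority distance is the most probable d, and the expected-reliable distance is the expected d conditioned on t being reachable. The function names are illustrative.

```python
import math

def median_distance(dist):
    """Smallest distance whose cumulative probability reaches 1/2 (assumed reading)."""
    cumulative = 0.0
    for d in sorted(dist):
        cumulative += dist[d]
        if cumulative >= 0.5:
            return d
    return math.inf

def majority_distance(dist):
    """Most probable distance value; ties go to the smaller distance."""
    return min(dist, key=lambda d: (-dist[d], d))

def expected_reliable_distance(dist):
    """Expected distance, conditioned on t being reachable from s."""
    reachable = sum(p for d, p in dist.items() if d != math.inf)
    if reachable == 0.0:
        return math.inf
    return sum(d * p for d, p in dist.items() if d != math.inf) / reachable

# Distance distribution for the pair (B, D) as listed on the Motivation slide.
p_bd = {5: 0.3, 20: 0.25648, math.inf: 0.44352}

print(median_distance(p_bd))             # 20
print(majority_distance(p_bd))           # inf
print(expected_reliable_distance(p_bd))  # roughly 11.91
```

On this distribution the three functions disagree, which shows how much the choice of distance function can change who counts as a near neighbor.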
7
Challenges
To compute median-distance and majority-distance, we need their distributions over all possible worlds.
Computing expected-reliable-distance has been proved to be #P-hard.
8
Sampling
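The body of this slide is not preserved. A minimal Monte Carlo sketch of the general idea: draw r possible worlds by keeping each edge independently with its probability, run Dijkstra in each sampled world, and use the empirical frequencies as an estimate of the s-t distance distribution. The undirected-graph assumption, the dictionary layout, and all function names are assumptions made for illustration, not details taken from the slides.

```python
import heapq
import math
import random
from collections import Counter

def sample_world(edges, rng):
    """Keep each undirected edge independently with its probability."""
    adj = {}
    for (u, v), (p, w) in edges.items():
        if rng.random() < p:
            adj.setdefault(u, []).append((v, w))
            adj.setdefault(v, []).append((u, w))
    return adj

def dijkstra(adj, source):
    """Shortest-path distances from `source` in one sampled (deterministic) world."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, math.inf):
            continue  # stale heap entry
        for v, w in adj.get(u, []):
            nd = d + w
            if nd < dist.get(v, math.inf):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist

def estimate_distribution(edges, s, t, r, seed=0):
    """Empirical distribution of the s-t distance over r sampled worlds."""
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(r):
        d = dijkstra(sample_world(edges, rng), s).get(t, math.inf)
        counts[d] += 1
    return {d: c / r for d, c in counts.items()}
```

For example, `estimate_distribution({("s", "t"): (0.3, 5.0)}, "s", "t", r=20000)` should put a frequency close to 0.3 on distance 5.0 and the rest on `math.inf`.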
9
Sample Size for Median-D
10
Sample Size for E-R-D
11
Qualitative Analysis
Classification experiment
Testing data: two classes, each a set of triplets [triplet forms not preserved].
Classifier: tries to identify the true neighbors.
Measure: [not preserved]
Data sets: protein-protein interaction network; DBLP co-authorship network
12
Results
13
Observation (Median-D)
Consider a new probability distribution [definition not preserved].
The following lemma can then be obtained, where D is a distance value: [lemma not preserved].
14
Core Pruning Scheme
Query transformation:
$d_{D,M}(s, t_1) < d_{D,M}(s, t_2) \Rightarrow d_M(s, t_1) < d_M(s, t_2)$
$d_M(s, t_1) \ge d_M(s, t_2) \Rightarrow d_{D,M}(s, t_1) \ge d_{D,M}(s, t_2)$
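One way to see why such a transformation can hold, sketched under stated assumptions: suppose d_{D,M} is the median of a distribution truncated at D, meaning all probability mass at distances of D or more is moved onto the single value D. That definition is an assumption here, because the slide's own formula is not preserved. With the median taken as the smallest distance whose cumulative probability reaches 1/2, the truncated median equals min(d_M, D), so ordering by d_M can never be reversed by truncation. The two example distributions are hypothetical.

```python
import math

def median(dist):
    """Smallest distance whose cumulative probability reaches 1/2 (assumed reading)."""
    cumulative = 0.0
    for d in sorted(dist):
        cumulative += dist[d]
        if cumulative >= 0.5:
            return d
    return math.inf

def truncate(dist, D):
    """Move all probability mass at distances >= D onto the single value D."""
    out = {d: p for d, p in dist.items() if d < D}
    tail = sum(p for d, p in dist.items() if d >= D)
    if tail > 0.0:
        out[D] = tail
    return out

# Hypothetical distance distributions for two candidate nodes t1 and t2.
p_t1 = {2: 0.6, 7: 0.4}
p_t2 = {4: 0.3, 9: 0.3, math.inf: 0.4}

D = 5
m1, m2 = median(truncate(p_t1, D)), median(truncate(p_t2, D))
print(m1, m2)  # 2 5
```

Here the truncated medians already separate t1 from t2 at D = 5, so distances beyond 5 never need to be explored to rank the two nodes.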
15
Median-D kNN Query Answering
16
Majority-D kNN Query Answering
A distance d is guaranteed to be the exact majority distance when Pr(d) >= 1 - P, where P denotes the sum of the visited nodes' probabilities.
A node that has entered the kNN set may still be replaced at a later step by another node with a smaller majority distance.
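The stopping condition can be sketched as follows, under one assumed reading: P is the total probability mass of the distance values observed so far for a node, so any distance not yet observed can have probability at most 1 - P. Once the most frequent observed distance reaches that bound, no later observation can overtake it. The function name and the partial-distribution layout are illustrative assumptions.

```python
def confirmed_majority(partial):
    """Return the majority distance if it is already certain, else None.

    `partial` maps distance -> probability for the distances seen so far;
    its total mass may be less than 1.
    """
    if not partial:
        return None
    seen_mass = sum(partial.values())      # P in the slide's condition
    best = max(partial, key=partial.get)   # most probable distance so far
    # Any distance not seen yet has probability at most 1 - P.
    if partial[best] >= 1.0 - seen_mass:
        return best
    return None

print(confirmed_majority({3: 0.2}))           # None
print(confirmed_majority({3: 0.2, 4: 0.45}))  # 4
```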
17
Experimental Results
Dataset overview
Convergence of D-F, using the distance from a sample of 500 possible worlds as the ground truth
18
Efficiency of k-NN Pruning
[Figure: fraction of visited nodes (pruning efficiency) as a function of k]
[Figure: pruning efficiency as a function of sample size]
19
Quality of Results
[Figure: pruning efficiency as a function of edge probability]
[Figure: Median-D stability as a function of the number of possible worlds]