Database k-Nearest Neighbors in Uncertain Graphs Lin Yincheng 2011-02-28 VLDB10.


1 Database Group@CSE k-Nearest Neighbors in Uncertain Graphs Lin Yincheng 2011-02-28 VLDB10

2 Outline: Background; Motivation; Problem Definition; Query Answering Approach; Experimental Results

3 Background: k-nearest neighbors; uncertain graphs. [Figure: example uncertain graph with edge weights 15, 5, 5, 5.] Example query: find the 2-nearest neighbors of vertex B.

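4 Motivation. [Figure: example uncertain graph; edges labeled weight(probability): 15(0.2), 15(0.6), 5(0.7), 5(0.3), 5(0.4); annotated with the most-probable-path distance.] Distribution of the shortest-path distance from B to D over all possible worlds:
Distance | Path(s) | Probability
5 | B-D | 0.3
20 | B-A-D, B-C-D | 0.25648
∞ | no path | 0.44352
Goals: define meaningful distance functions that are more useful for identifying true neighbors, and introduce a novel pruning algorithm to process k-NN queries in uncertain graphs.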

5 Problem Definition. Assumption: edges exist independently of one another. Probabilistic graph model G = (V, E, P, W): V and E denote the sets of nodes and edges, respectively; P assigns each edge its probability of existence; W assigns each edge a weight. k-NN query: given a query node s and a distance function d, return the k nodes t with the smallest d(s, t).
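Under the edge-independence assumption, a possible world (a subgraph that keeps some of the edges in E) has probability equal to the product of p_e over the kept edges and 1 − p_e over the dropped ones. A minimal Python sketch, assuming edges are stored in a dict from (u, v) pairs to probabilities; the name `world_probability` and the three-edge graph are hypothetical, chosen only for illustration:

```python
from itertools import product

def world_probability(edge_probs, present):
    """Probability of one possible world, assuming independent edges.

    edge_probs: dict mapping an edge (u, v) to its existence probability.
    present: set of edges that exist in this world; all others are absent.
    """
    prob = 1.0
    for edge, p in edge_probs.items():
        prob *= p if edge in present else 1.0 - p
    return prob

# Hypothetical three-edge graph; probabilities are illustrative only.
edge_probs = {("B", "D"): 0.3, ("B", "A"): 0.6, ("A", "D"): 0.7}

# Enumerate all 2^|E| worlds: their probabilities must sum to 1.
total = 0.0
for mask in product([False, True], repeat=len(edge_probs)):
    present = {e for e, keep in zip(edge_probs, mask) if keep}
    total += world_probability(edge_probs, present)
print(round(total, 10))  # 1.0
```

The weights W play no role here; they only matter once distances are computed inside each world.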

6 Distances: Median-Distance(s, t); Majority-Distance(s, t); Expected-Reliable-Distance(s, t).
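The following Python sketch computes the three distances from a distance distribution p(d), given as a dict with `math.inf` standing for "no path". The definitions are assumptions made for illustration: the median is taken as the smallest D whose cumulative probability reaches 1/2, the majority distance as the most probable value (∞ included), and the expected-reliable distance as the expected distance conditioned on s and t being connected. The example input is the B-to-D distribution from the Motivation slide:

```python
import math

INF = math.inf

def median_distance(dist):
    """Smallest D whose cumulative probability reaches 1/2 (assumed convention)."""
    cumulative = 0.0
    for d in sorted(dist):  # math.inf sorts last
        cumulative += dist[d]
        if cumulative >= 0.5:
            return d
    return INF

def majority_distance(dist):
    """Most probable distance; 'no path' (infinity) counts as a value."""
    return max(dist, key=dist.get)

def expected_reliable_distance(dist):
    """Expected distance, conditioned on s and t being connected."""
    p_connected = sum(p for d, p in dist.items() if d != INF)
    if p_connected == 0:
        return INF
    return sum(d * p for d, p in dist.items() if d != INF) / p_connected

# Distance distribution from B to D (Motivation slide).
b_to_d = {5: 0.3, 20: 0.25648, INF: 0.44352}
print(median_distance(b_to_d))                        # 20
print(majority_distance(b_to_d))                      # inf
print(round(expected_reliable_distance(b_to_d), 3))   # 11.913
```

On this example the three values differ (20, ∞, and about 11.9), so the choice of distance function changes which nodes count as near.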

7 Challenges. Computing the median distance and the majority distance requires the full shortest-path distance distribution over all possible worlds. Computing the expected-reliable distance has been proved to be #P-hard.
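To see why exact answers are expensive, a brute-force Python sketch can enumerate all 2^|E| possible worlds and run Dijkstra in each one; the work doubles with every added edge. The edge format, the function names, and the three-edge example graph are hypothetical:

```python
import heapq
import math
from collections import defaultdict
from itertools import product

def shortest_distance(adj, s, t):
    """Dijkstra in one deterministic world; math.inf if t is unreachable."""
    best = {s: 0}
    heap = [(0, s)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > best[u]:
            continue  # stale heap entry
        if u == t:
            return d
        for v, w in adj[u]:
            if d + w < best.get(v, math.inf):
                best[v] = d + w
                heapq.heappush(heap, (d + w, v))
    return math.inf

def exact_distance_distribution(edges, s, t):
    """Exact p(d) for d(s, t) by enumerating all 2^|E| worlds (exponential).

    edges: dict mapping an undirected edge (u, v) to (weight, probability).
    """
    dist = defaultdict(float)
    items = list(edges.items())
    for mask in product([False, True], repeat=len(items)):
        adj = defaultdict(list)
        prob = 1.0
        for ((u, v), (w, p)), keep in zip(items, mask):
            if keep:
                adj[u].append((v, w))
                adj[v].append((u, w))
                prob *= p
            else:
                prob *= 1.0 - p
        dist[shortest_distance(adj, s, t)] += prob
    return dict(dist)

# Hypothetical graph: (weight, probability) per edge.
edges = {("B", "D"): (5, 0.3), ("B", "A"): (15, 0.6), ("A", "D"): (5, 0.7)}
print(exact_distance_distribution(edges, "B", "D"))
```

Here p(5) = 0.3, p(20) = 0.7 · 0.6 · 0.7 = 0.294, and p(∞) = 0.406; with a few dozen edges the same loop is already infeasible.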

8 Sampling
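Sampling replaces enumeration: draw r possible worlds, compute the distance in each, and use the empirical distribution in place of the exact one. A minimal Python sketch with a seeded generator; r, the edge format, the function names, and the example graph (whose true p(5) is 0.3, p(20) is 0.294, and p(∞) is 0.406) are assumptions for illustration:

```python
import heapq
import math
import random
from collections import Counter, defaultdict

def sample_world(edges, rng):
    """Keep each undirected edge independently with its probability."""
    adj = defaultdict(list)
    for (u, v), (w, p) in edges.items():
        if rng.random() < p:
            adj[u].append((v, w))
            adj[v].append((u, w))
    return adj

def dijkstra_all(adj, s):
    """Distances from s to every reachable node in one sampled world."""
    best = {s: 0}
    heap = [(0, s)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > best[u]:
            continue  # stale heap entry
        for v, w in adj[u]:
            if d + w < best.get(v, math.inf):
                best[v] = d + w
                heapq.heappush(heap, (d + w, v))
    return best

def estimate_distribution(edges, s, t, r, seed=0):
    """Empirical distribution of d(s, t) over r sampled possible worlds."""
    rng = random.Random(seed)
    counts = Counter(
        dijkstra_all(sample_world(edges, rng), s).get(t, math.inf)
        for _ in range(r)
    )
    return {d: c / r for d, c in counts.items()}

# Hypothetical graph: (weight, probability) per edge.
edges = {("B", "D"): (5, 0.3), ("B", "A"): (15, 0.6), ("A", "D"): (5, 0.7)}
est = estimate_distribution(edges, "B", "D", r=20000, seed=1)
print(est)
```

Each sample costs one Dijkstra run, so the total cost is linear in r instead of exponential in |E|.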

9 Sample Size for Median-D
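As a generic stand-in, not necessarily the bound on this slide, the two-sided Hoeffding inequality gives a sample count r for which the fraction of sampled worlds with d(s, t) ≤ D (the quantity a sampled median is read from) stays within ±ε of the true cumulative probability with probability at least 1 − δ. The function name and parameters are illustrative:

```python
import math

def hoeffding_samples(eps, delta):
    """Sample count r with Pr[|p_hat - p| >= eps] <= 2 exp(-2 r eps^2) <= delta.

    p is a fixed probability such as Pr[d(s, t) <= D]; p_hat is its
    fraction among r independently sampled possible worlds.
    """
    return math.ceil(math.log(2 / delta) / (2 * eps ** 2))

print(hoeffding_samples(0.1, 0.05))   # 185
print(hoeffding_samples(0.05, 0.05))  # 738
```

Halving ε roughly quadruples r, and the count does not depend on the size of the graph.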

10 Sample Size for E-R-D
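A similar generic sketch for the expected-reliable distance, again an assumption rather than the slide's result: if every finite distance lies in [0, d_max], Hoeffding bounds the error of the mean taken over the sampled worlds in which s and t are connected. Because only those worlds count, the expected number of worlds to draw grows by a factor of 1 / Pr[connected]. Here `d_max` and `p_connected` are hypothetical inputs; in practice the connection probability would itself have to be estimated:

```python
import math

def er_distance_samples(d_max, eps, delta, p_connected):
    """Sample-size sketch for the expected-reliable distance.

    Hoeffding for a mean of values in [0, d_max]:
    Pr[|mean_hat - mean| >= eps] <= 2 exp(-2 r eps^2 / d_max^2).
    Returns (connected samples needed, expected total worlds to draw).
    """
    connected_needed = math.ceil(d_max ** 2 * math.log(2 / delta) / (2 * eps ** 2))
    return connected_needed, math.ceil(connected_needed / p_connected)

print(er_distance_samples(20, 2, 0.05, 0.5))  # (185, 370)
```

When s and t are rarely connected, most sampled worlds contribute nothing, which is what drives the total up.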

11 Qualitative Analysis. Classification experiment. Testing data: two classes, one a set of triplets of the form … and the other a set of triplets of the form …; a classifier tries to identify the true neighbors. Measure: … Data sets: protein-protein interaction network; DBLP co-authorship network.

12 Results

13 Observation (Median-D): consider a new probability distribution parameterized by a distance value D. The lemma below can then be derived.

14 Core Pruning Scheme. Query transformation: $d_M(s, t)$ is replaced by $d_{D,M}(s, t)$. Cases: $d_M(s, t_1) < d_M(s, t_2)$; and $d_M(s, t_1) \ge d_M(s, t_2) \Rightarrow d_{D,M}(s, t_1) \ge d_{D,M}(s, t_2)$.
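One reading of the transformation, assumed here because the slide gives only the notation: $d_{D,M}$ is the median of a truncated distribution that keeps p(d) for d ≤ D and moves all remaining mass to ∞. Under that assumption, $d_{D,M}(s, t)$ equals $d_M(s, t)$ when $d_M(s, t) \le D$ and is ∞ otherwise, so exploring distances only up to D suffices and the ordering on the slide is preserved. A Python check on two small distributions (the first is the B-to-D distribution from the Motivation slide, the second is made up):

```python
import math

INF = math.inf

def median_distance(dist):
    """Smallest distance whose cumulative probability reaches 1/2."""
    cumulative = 0.0
    for d in sorted(dist):
        cumulative += dist[d]
        if cumulative >= 0.5:
            return d
    return INF

def truncate(dist, D):
    """Keep p(d) for d <= D; move the mass of every larger distance to infinity."""
    out = {d: p for d, p in dist.items() if d <= D}
    out[INF] = out.get(INF, 0.0) + sum(p for d, p in dist.items() if d > D)
    return out

def truncated_median(dist, D):
    """Assumed d_{D,M}: median of the D-truncated distribution."""
    return median_distance(truncate(dist, D))

t1 = {5: 0.3, 20: 0.25648, INF: 0.44352}  # B to D, median 20
t2 = {5: 0.6, INF: 0.4}                    # hypothetical node, median 5
for D in (5, 10, 20, 30):
    # d_M(t1) >= d_M(t2), so the truncated medians keep that order.
    assert truncated_median(t1, D) >= truncated_median(t2, D)
    print(D, truncated_median(t1, D), truncated_median(t2, D))
```

For D = 5 and D = 10, t1's truncated median is ∞ because its true median (20) lies beyond D; from D = 20 on it equals 20.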

15 Median-D kNN Query Answering
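A Python sketch of one way to answer a Median-D k-NN query over r sampled worlds, assumed here rather than taken from the slide: run Dijkstra in all worlds at once from a single heap, so distances are popped in globally nondecreasing order. A node's sampled median (the smallest D at which at least half of the worlds reach it) is fixed the moment it is settled in ⌈r/2⌉ worlds. Nodes therefore enter the answer in median order, and the search stops after k of them without settling anything beyond the k-th median distance. The world format (node → list of (neighbor, weight)) and the name `median_knn` are hypothetical:

```python
import heapq
import math
from collections import defaultdict

def median_knn(worlds, source, k):
    """Return up to k (node, sampled median distance) pairs, nearest first."""
    r = len(worlds)
    need = math.ceil(r / 2)  # empirical CDF >= 1/2  <=>  count >= r/2
    heap = [(0, w, source) for w in range(r)]
    heapq.heapify(heap)
    settled = [set() for _ in range(r)]
    reached = defaultdict(int)  # number of worlds in which a node is settled
    result = []
    while heap and len(result) < k:
        d, w, u = heapq.heappop(heap)
        if u in settled[w]:
            continue  # already settled in this world at a smaller distance
        settled[w].add(u)
        if u != source:
            reached[u] += 1
            if reached[u] == need:
                result.append((u, d))  # d is u's sampled median distance
        for v, weight in worlds[w].get(u, ()):
            if v not in settled[w]:
                heapq.heappush(heap, (d + weight, w, v))
    return result

# Three hypothetical sampled worlds (undirected edges listed both ways).
w0 = {"B": [("D", 5)], "D": [("B", 5)]}
w1 = {"B": [("A", 15)], "A": [("B", 15), ("D", 5)], "D": [("A", 5)]}
w2 = {}
print(median_knn([w0, w1, w2], "B", 2))  # [('D', 20)]
```

Node A is reached in only one of the three worlds, so its sampled median is ∞ and it never enters the answer.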

16 Majority-D kNN Query Answering. A distance d is certified as the exact majority distance once Pr(d) ≥ 1 − P, where P denotes the sum of the probabilities visited so far. A node that enters the k-NN set may still be replaced at a later step by another node with a smaller majority distance.
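The stopping rule on this slide can be sketched in Python for a single node: scan its distance distribution in increasing distance order, track P as the probability mass seen so far, and stop as soon as the most probable distance seen has probability at least 1 − P, since no unseen distance can then exceed it. The input format and the name `certify_majority` are hypothetical; ∞ is treated as the last value of the scan:

```python
import math

def certify_majority(stream):
    """Return (majority distance, its probability, mass P scanned) once certain.

    stream: (distance, probability) pairs in increasing distance order,
    with math.inf (no path) last.
    """
    seen = 0.0
    best_d, best_p = None, -1.0
    for d, p in stream:
        seen += p
        if p > best_p:  # on ties, keep the smaller distance
            best_d, best_p = d, p
        if best_p >= 1.0 - seen:  # unseen mass cannot exceed best_p
            return best_d, best_p, seen
    return best_d, best_p, seen

# Stops after the first distance, since 0.6 >= 1 - 0.6.
print(certify_majority([(1, 0.6), (2, 0.3), (3, 0.1)]))  # (1, 0.6, 0.6)
# B to D (Motivation slide): certain only once infinity is reached.
print(certify_majority([(5, 0.3), (20, 0.25648), (math.inf, 0.44352)]))
```

In a k-NN search this check would run per node as exploration proceeds; because a later node can still have a smaller majority distance, the k-NN set is not final until the search ends.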

17 Experimental Results. [Table: dataset overview.] [Figure: convergence of the distance functions (D-F); the distances computed from a sample of 500 possible worlds serve as the ground truth.]

18 Efficiency of k-NN Pruning. [Figure: fraction of visited nodes (pruning efficiency) as a function of k.] [Figure: pruning efficiency as a function of sample size.]

19 Quality of Results. [Figure: pruning efficiency as a function of edge probability.] [Figure: Median-D stability as a function of the number of possible worlds.]

