1
Distance-Constraint Reachability Computation in Uncertain Graphs
Ruoming Jin, Lin Liu (Kent State University); Bolin Ding (UIUC); Haixun Wang (MSRA)
2
Why Uncertain Graphs?
Uncertainty is ubiquitous! Graph/network data are increasingly important: social networks, biological networks, traffic/transportation networks, peer-to-peer networks.
–Protein-Protein Interaction Networks: false-positive rate > 45%
–Social Networks: probabilistic trust/influence models
The probabilistic perspective has been attracting more and more attention recently.
3
Uncertain Graph Model
–Each edge has an existence probability; edges are independent.
–Possible worlds: 2^#Edges deterministic graphs.
–The weight of a possible world is the product of the probabilities of its present edges and the complements of its absent edges. Example for world G2:
Pr(G2) = 0.5 × 0.2 × 0.6 × 0.7 × (1−0.5)(1−0.3)(1−0.1)(1−0.4)(1−0.9) = 0.0007938
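As a minimal sketch of the model (the edge names and the particular present/absent split are my own illustration, chosen to match the slide's numbers), the weight of one possible world under edge independence is:

```python
from math import prod

# Hypothetical probabilities for a 9-edge uncertain graph.
EDGE_PROBS = {"e1": 0.5, "e2": 0.5, "e3": 0.3, "e4": 0.2, "e5": 0.6,
              "e6": 0.7, "e7": 0.1, "e8": 0.4, "e9": 0.9}

def world_probability(edge_probs, present):
    """Pr(world) = product of p(e) for present edges and 1 - p(e) for absent ones."""
    return prod(p if e in present else 1.0 - p for e, p in edge_probs.items())

# One possible world: keep e1, e4, e5, e6 and drop the rest.
w = world_probability(EDGE_PROBS, {"e1", "e4", "e5", "e6"})
print(round(w, 7))  # 0.0007938
```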
4
Distance-Constraint Reachability (DCR) Problem
Given a distance constraint d and two vertices, a source s and a target t: what is the probability that s can reach t within distance d?
A generalization of the two-terminal network reliability problem, which has no distance constraint.
5
Important Applications
Peer-to-Peer (P2P) Networks
–Communication happens only when node distance is limited.
Social Networks
–Trust/influence can be propagated only through a small number of hops.
Traffic Networks
–Travel distance (travel time) queries: what is the probability that we can reach the airport within one hour?
6
Example: Exact Computation (d = 2)
First step: enumerate all possible worlds (2^9 for a 9-edge graph).
Second step: check each world for distance-constraint connectivity and sum the weights of the connected ones:
R = Σ_i I(G_i) · Pr(G_i), where I(G_i) = 1 if s reaches t within d in G_i, and 0 otherwise, e.g. 1·Pr(G1) + 0·Pr(G2) + 1·Pr(G3) + 1·Pr(G4) + …
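A brute-force version of this two-step computation can be sketched as follows (the toy triangle graph and the function names are my own, not the paper's):

```python
from collections import deque
from itertools import product
from math import prod

def reaches_within(edge_list, s, t, d):
    """BFS over undirected unit-weight edges: is dist(s, t) <= d?"""
    adj = {}
    for u, v in edge_list:
        adj.setdefault(u, []).append(v)
        adj.setdefault(v, []).append(u)
    dist = {s: 0}
    queue = deque([s])
    while queue:
        u = queue.popleft()
        if u == t:
            return True
        if dist[u] == d:
            continue  # do not expand past the distance budget
        for w in adj.get(u, []):
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return False

def exact_dcr(edges, s, t, d):
    """Enumerate all 2^|E| possible worlds; sum the weights of the worlds
    in which s reaches t within d."""
    total = 0.0
    for bits in product([0, 1], repeat=len(edges)):
        weight = prod(p if b else 1 - p for (_, _, p), b in zip(edges, bits))
        present = [(u, v) for (u, v, _), b in zip(edges, bits) if b]
        if reaches_within(present, s, t, d):
            total += weight
    return total

# Toy triangle: the direct edge (0,2) or the two-hop path 0-1-2.
triangle = [(0, 1, 0.5), (1, 2, 0.5), (0, 2, 0.5)]
print(exact_dcr(triangle, 0, 2, 2))  # 0.5 + 0.5 * 0.5 * 0.5 = 0.625
```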
7
Approximating Distance-Constraint Reachability Computation
Hardness
–Two-terminal network reliability is #P-complete.
–DCR is a generalization, hence at least as hard.
Our goal is to approximate through sampling, with
–an unbiased estimator,
–minimal variance,
–low computational cost.
8
Start from the most intuitive estimators, right?
9
Direct Sampling Approach
Sampling process
–Sample n graphs.
–In each sampled graph, include every edge independently according to its edge probability.
10
Direct Sampling Approach (Cont'd)
Estimator: the fraction of sampled graphs in which s reaches t within d, i.e. the average of the indicator function I_i, where I_i = 1 if s reaches t within d in the i-th sampled graph, and I_i = 0 otherwise.
The estimator is unbiased, with variance R(1−R)/n for true reachability R.
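A minimal sketch of this estimator (my own code on a toy triangle graph, not the authors' implementation):

```python
import random
from collections import deque

def reaches_within(edge_list, s, t, d):
    """BFS: is dist(s, t) <= d over the given unit-weight edges?"""
    adj = {}
    for u, v in edge_list:
        adj.setdefault(u, []).append(v)
        adj.setdefault(v, []).append(u)
    dist = {s: 0}
    queue = deque([s])
    while queue:
        u = queue.popleft()
        if u == t:
            return True
        if dist[u] == d:
            continue
        for w in adj.get(u, []):
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return False

def direct_sampling_dcr(edges, s, t, d, n=5000, seed=42):
    """Average of the reachability indicator over n independently sampled worlds."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n):
        present = [(u, v) for u, v, p in edges if rng.random() < p]
        hits += reaches_within(present, s, t, d)  # I_i in {0, 1}
    return hits / n

triangle = [(0, 1, 0.5), (1, 2, 0.5), (0, 2, 0.5)]
est = direct_sampling_dcr(triangle, 0, 2, 2)  # exact answer is 0.625
```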
11
Path-Based Approach
Generate the path set
–Enumerate all paths from s to t with length ≤ d.
–Enumeration method: e.g., DFS.
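The DFS enumeration step can be sketched as follows (a simple-path enumerator of my own; the slides do not give the exact procedure):

```python
def paths_within_d(edges, s, t, d):
    """DFS that enumerates every simple s-t path using at most d edges."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, []).append(v)
        adj.setdefault(v, []).append(u)
    found = []

    def dfs(u, path):
        if u == t:
            found.append(tuple(path))
            return
        if len(path) - 1 == d:  # the path already uses d edges
            return
        for w in adj.get(u, []):
            if w not in path:  # keep the path simple
                path.append(w)
                dfs(w, path)
                path.pop()

    dfs(s, [s])
    return found

# Triangle, d = 2: exactly two qualifying paths.
print(sorted(paths_within_d([(0, 1), (1, 2), (0, 2)], 0, 2, 2)))
# [(0, 1, 2), (0, 2)]
```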
12
Path-Based Approach (Cont'd)
Given the path set, the reachability is the probability that at least one path is fully present. It can be
–computed exactly by the inclusion-exclusion principle, or
–approximated by the Monte Carlo algorithm of R. M. Karp and M. G. Luby.
The resulting estimator is unbiased.
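The inclusion-exclusion computation over the path set can be sketched as below (my own code; note it is exponential in the number of paths, which is why the Karp-Luby approximation matters):

```python
from itertools import combinations
from math import prod

def union_probability(paths, edge_probs):
    """Inclusion-exclusion over the path set:
    Pr(at least one path fully present)
      = sum over nonempty subsets S of (-1)^(|S|+1) * prod of p(e) for e in union(S)."""
    total = 0.0
    for r in range(1, len(paths) + 1):
        for subset in combinations(paths, r):
            union_edges = set().union(*subset)
            term = prod(edge_probs[e] for e in union_edges)
            total += term if r % 2 == 1 else -term
    return total

# Triangle, d = 2: path {(0,2)} and path {(0,1), (1,2)}, all probabilities 0.5.
probs = {(0, 1): 0.5, (1, 2): 0.5, (0, 2): 0.5}
paths = [frozenset([(0, 2)]), frozenset([(0, 1), (1, 2)])]
print(union_probability(paths, probs))  # 0.5 + 0.25 - 0.125 = 0.625
```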
13
Can we do better?
14
Divide-and-Conquer Methodology
Example: build an enumeration tree by conditioning on one edge at a time — each node branches into the worlds where the edge is present and the worlds where it is absent:
+(s,a) / −(s,a), then +(a,t) / −(a,t), then +(s,b) / −(s,b), …
15
Divide and Conquer (Cont'd)
The root holds all possible worlds; each split partitions a node into the graphs having edge e1 and the graphs not having e1, until a leaf is decided: either s can reach t, or s cannot reach t.
To summarize:
1. The number of leaf nodes is smaller than 2^|E|.
2. Each possible world exists in exactly one leaf node.
3. The reachability is the sum of the weights of the blue (reachable) leaf nodes.
4. The leaf nodes form a convenient sample space.
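The enumeration-tree computation can be sketched recursively: a node becomes a blue leaf once the edges fixed as present already connect s and t within d, and a red leaf once even all remaining undecided edges cannot. (The pruning tests and names below are my own reading of the scheme, not the paper's code.)

```python
from collections import deque

def reaches_within(edge_list, s, t, d):
    """BFS: is dist(s, t) <= d over the given unit-weight edges?"""
    adj = {}
    for u, v in edge_list:
        adj.setdefault(u, []).append(v)
        adj.setdefault(v, []).append(u)
    dist = {s: 0}
    queue = deque([s])
    while queue:
        u = queue.popleft()
        if u == t:
            return True
        if dist[u] == d:
            continue
        for w in adj.get(u, []):
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return False

def dcr_divide_conquer(undecided, included, s, t, d):
    """R(node) = p(e) * R(node with e present) + (1 - p(e)) * R(node with e absent),
    stopping as soon as the node is a decided (blue or red) leaf."""
    if reaches_within(included, s, t, d):
        return 1.0  # blue leaf: every extension keeps s-t connected within d
    if not reaches_within(included + [(u, v) for u, v, _ in undecided], s, t, d):
        return 0.0  # red leaf: no extension can connect s and t within d
    (u, v, p), rest = undecided[0], undecided[1:]
    return (p * dcr_divide_conquer(rest, included + [(u, v)], s, t, d)
            + (1 - p) * dcr_divide_conquer(rest, included, s, t, d))

triangle = [(0, 1, 0.5), (1, 2, 0.5), (0, 2, 0.5)]
print(dcr_divide_conquer(triangle, [], 0, 2, 2))  # 0.625, matching full enumeration
```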
16
How Do We Sample?
Unequal-probability sampling
–Hansen-Hurwitz (HH) estimator
–Horvitz-Thompson (HT) estimator
Sample unit: a leaf node of the enumeration tree, reached by starting from the root and tossing coins along the way.
–Pr_i: the sample-unit weight — the sum of the probabilities of the possible worlds in the node.
–q_i: the sampling probability, determined by the coin probabilities along the way.
17
Hansen-Hurwitz (HH) Estimator
Each sampled leaf i has a weight Pr_i (the leaf-node weight, e.g. Pr_i = p(e1)·p(e2)·(1−p(e3))·…, following the branches taken) and a sampling probability q_i (the product of the coin probabilities p(e1) : 1−p(e1), p(e2) : 1−p(e2), … along the path). A leaf contributes 1 if it is blue (reachable) and 0 if it is red.
The estimator averages the leaf contributions over the sample size n; it is unbiased, and its variance is minimized by setting q_i = Pr_i.
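A hedged sketch of leaf sampling on the enumeration tree (my own code; tossing each edge's own coin makes q_i equal the leaf weight Pr_i, so the HH estimate reduces to the fraction of blue leaves):

```python
import random
from collections import deque

def reaches_within(edge_list, s, t, d):
    """BFS: is dist(s, t) <= d over the given unit-weight edges?"""
    adj = {}
    for u, v in edge_list:
        adj.setdefault(u, []).append(v)
        adj.setdefault(v, []).append(u)
    dist = {s: 0}
    queue = deque([s])
    while queue:
        u = queue.popleft()
        if u == t:
            return True
        if dist[u] == d:
            continue
        for w in adj.get(u, []):
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return False

def sample_leaf(edges, s, t, d, rng):
    """Descend the enumeration tree by tossing each edge's coin until the
    leaf is decided; return 1 for a blue leaf, 0 for a red one."""
    included, undecided = [], list(edges)
    while True:
        if reaches_within(included, s, t, d):
            return 1  # blue leaf
        if not reaches_within(included + [(u, v) for u, v, _ in undecided], s, t, d):
            return 0  # red leaf
        u, v, p = undecided.pop(0)
        if rng.random() < p:
            included.append((u, v))

def hh_estimate(edges, s, t, d, n=5000, seed=1):
    """With q_i = Pr_i, the HH estimator is the fraction of blue leaves."""
    rng = random.Random(seed)
    return sum(sample_leaf(edges, s, t, d, rng) for _ in range(n)) / n
```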
18
Horvitz-Thompson (HT) Estimator
The estimator is based on the number of unique sample units.
–It is unbiased.
–To minimize the variance, we again set Pr_i = q_i.
–It has smaller variance than the HH estimator.
19
Can we further reduce the variance and computational cost?
20
Recursive Estimator
Sampling the entire space n times is split into sampling the first subspace n1 times and the second subspace n2 times, with n1 + n2 = n.
1. Unbiased.
2. Variance: it depends on the subspace reachabilities τ1 and τ2, so we cannot minimize it without knowing them. Then what can we do?
21
Sample Allocation
We guess: what if we allocate
–n1 = n·p(e)
–n2 = n·(1−p(e))?
We find that the variance is reduced, for both the HH estimator and the HT estimator.
22
Sample Allocation (Cont'd)
Sampling time is reduced as well: directly allocate samples down the tree (n1 = n·p(e1), n2 = n·(1−p(e1)), n3 = n1·p(e2), n4 = n1·(1−p(e2)), …) and toss coins only when the sample size becomes small.
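The recursive estimator with proportional allocation can be sketched as follows (my own code; the rounding and the single-sample fallback are assumptions about how small budgets are handled, not details taken from the paper):

```python
import random
from collections import deque

def reaches_within(edge_list, s, t, d):
    """BFS: is dist(s, t) <= d over the given unit-weight edges?"""
    adj = {}
    for u, v in edge_list:
        adj.setdefault(u, []).append(v)
        adj.setdefault(v, []).append(u)
    dist = {s: 0}
    queue = deque([s])
    while queue:
        u = queue.popleft()
        if u == t:
            return True
        if dist[u] == d:
            continue
        for w in adj.get(u, []):
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return False

def recursive_estimate(undecided, included, s, t, d, n, rng):
    """Split the budget n into n1 ~ n*p(e) and n2 ~ n*(1-p(e));
    once the budget is down to one sample, fall back to tossing the coin."""
    if reaches_within(included, s, t, d):
        return 1.0  # blue leaf: decided reachable
    if not reaches_within(included + [(u, v) for u, v, _ in undecided], s, t, d):
        return 0.0  # red leaf: decided unreachable
    (u, v, p), rest = undecided[0], undecided[1:]
    if n <= 1:
        # tiny budget: toss the coin for edge e, as in direct sampling
        if rng.random() < p:
            return recursive_estimate(rest, included + [(u, v)], s, t, d, 1, rng)
        return recursive_estimate(rest, included, s, t, d, 1, rng)
    n1 = min(max(round(n * p), 1), n - 1)  # keep both branches non-empty
    n2 = n - n1
    left = recursive_estimate(rest, included + [(u, v)], s, t, d, n1, rng)
    right = recursive_estimate(rest, included, s, t, d, n2, rng)
    return p * left + (1 - p) * right

triangle = [(0, 1, 0.5), (1, 2, 0.5), (0, 2, 0.5)]
est = recursive_estimate(triangle, [], 0, 2, 2, 1000, random.Random(7))
```

On this toy graph the pruning decides every leaf before the budget runs out, so the estimate coincides with the exact reachability 0.625.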
23
Experimental Setup
–Goals: relative error, variance, computational time
–System specification: 2.0GHz dual-core AMD Opteron CPU, 4.0GB RAM, Linux
24
Experimental Results: Synthetic Datasets
–Erdős–Rényi random graphs
–Vertex count: 5000; edge density: 10; sample size: 1000
–Queries categorized by extracted-subgraph size (#edges); 1000 queries per category
25
Experimental Results: Real Datasets
–DBLP: 226,000 vertices, 1,400,000 edges
–Yeast PPIN: 5,499 vertices, 63,796 edges
–Fly PPIN: 7,518 vertices, 51,660 edges
–Extracted subgraph sizes: 20 ~ 50 edges
26
Conclusions
–We propose a novel s-t distance-constraint reachability problem in uncertain graphs.
–An efficient exact computation algorithm is developed based on a divide-and-conquer scheme.
–Compared with two classic reachability estimators, we study two unequal-probability sampling estimators: the Hansen-Hurwitz (HH) estimator and the Horvitz-Thompson (HT) estimator.
–Based on the enumeration-tree framework, two recursive estimators, Recursive HH and Recursive HT, are constructed to reduce estimation variance and running time.
–Experiments demonstrate the accuracy and efficiency of our estimators.
27
Thank you ! Questions?