Published by Emery Randall, modified over 9 years ago
1
Link Reconstruction from Partial Information
Gong Xiaofeng, Li Kun & C. H. Lai, TSL@NUS
2
General situations where problems may arise
Observed network: an N×N adjacency matrix A filled with 0s and 1s.
Scenarios:
A) No side information: statistical analysis, clustering, modeling, processes, etc.
B) Some links are uncertain (positions known): the link reconstruction problem, based on a model or a similarity measure.
C) Some 1s have been set to 0s (positions unknown): a variant of the link reconstruction problem, possibly related to link prediction.
D) The network is subject to change: a kind of prediction problem (link prediction, node prediction, network evolution, etc.).
3
B.1 The network reconstruction problem
[Figure: example network on nodes 1 to 5; the unknown links are drawn as dashed arrows.]
Infer the values (0 or 1) of the dashed arrows. These are unknown links, which may be corrupted, missing, or impossible to measure at the time.
Presumptions:
o The network has structure.
o The unknown links are fairly sampled.
o The number of unknown links is small.
4
B.2 Procedure for reconstructing links
Available information -> fitted probabilistic model P (N×N) -> connection probability p(i,j) of each unknown link (i,j) -> choose a threshold pt on the connection probability -> set (i,j) to 1 if p(i,j) > pt, and to 0 otherwise.
[Diagram: flowchart with the stages observed network, model function, parameters (optimization), connection probability, threshold, and reconstruction or prediction, grouped into modeling and prediction.]
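The last two steps of this procedure can be sketched in Python as below. This is a minimal sketch, assuming the fitted probabilities are already available as a NumPy array `prob` and that unknown entries of the observed matrix are marked with -1 (a hypothetical encoding chosen for illustration); fitting the model itself is not shown, and the function name `reconstruct_by_threshold` is likewise hypothetical.

```python
import numpy as np


def reconstruct_by_threshold(observed, prob, p_t):
    """Fill unknown entries of an adjacency matrix by thresholding p(i, j).

    observed : N x N int array, 0/1 for known entries, -1 for unknown ones
               (the -1 marker is a hypothetical encoding for this sketch)
    prob     : N x N array of fitted connection probabilities p(i, j)
    p_t      : threshold; an unknown (i, j) becomes 1 iff p(i, j) > p_t
    """
    filled = observed.copy()
    unknown = observed < 0
    filled[unknown] = (prob[unknown] > p_t).astype(observed.dtype)
    return filled


# Toy 3-node example with made-up probabilities.
observed = np.array([[0, 1, -1],
                     [1, 0, -1],
                     [-1, -1, 0]])
prob = np.array([[0.0, 0.9, 0.2],
                 [0.9, 0.0, 0.6],
                 [0.2, 0.6, 0.0]])
print(reconstruct_by_threshold(observed, prob, p_t=0.5))
```

Known entries are copied unchanged; only the entries marked unknown are decided by the threshold.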
5
B.3 Reformulation as a signal detection problem
Observed network -> three types of signals: 0, 1 and ?.
Fitted model -> connection probabilities P0 (of 0-links) and P1 (of 1-links).
Signals to be classified (?) -> connection probabilities P?.
Problem: given a connection probability P?, determine the type of signal (0 or 1).
Assumption (under a given model): the unknown links do not significantly affect the reliability of the fitted model (P0 and P1), i.e., the connection probability P? of any unknown link can be regarded as sampled from either P0 or P1.
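Under this assumption, the connection probabilities of the unknown links follow a two-component mixture. Writing f0(p) and f1(p) for the densities of P0 and P1, and π1 for the (unknown) fraction of unknown links that are truly 1-links (π0, π1 are notation introduced here, not taken from the slides):

```latex
f_{?}(p) = \pi_0 \, f_0(p) + \pi_1 \, f_1(p), \qquad \pi_0 + \pi_1 = 1
```

Classifying a single P? then amounts to deciding which of the two components it was drawn from.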
6
B.4 An equivalent hypothesis testing problem
Can we find an optimal detection scheme, e.g., by the Neyman-Pearson criterion?
Observation (data): connection probability p.
Hypotheses: H0: 0-link; H1: 1-link.
Data space E, divided into acceptance regions R0 and R1.
Decision D: D0 (accept H0) or D1 (accept H1).
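The slide names the Neyman-Pearson criterion only as an example. A minimal sketch of an empirical Neyman-Pearson threshold, assuming the decision is a threshold on p (accept H1 iff p > pt, which is the Neyman-Pearson test when the likelihood ratio f1(p)/f0(p) increases with p) and that the false-positive rate is estimated from the connection probabilities of links known to be 0. The function name `neyman_pearson_threshold` and the level `alpha` are illustrative, not from the slides.

```python
import numpy as np


def neyman_pearson_threshold(p0_samples, alpha):
    """Smallest threshold p_t whose empirical false-positive rate is <= alpha.

    p0_samples : connection probabilities of links known to be 0 (H0 samples)
    alpha      : maximum tolerated false-positive rate (size of the test)
    Decision rule assumed here: accept H1 (1-link) iff p > p_t.
    """
    p0 = np.sort(np.asarray(p0_samples, dtype=float))
    for t in p0:  # candidate thresholds, in increasing order
        if np.mean(p0 > t) <= alpha:
            # The lowest admissible threshold gives the highest detection rate.
            return float(t)
    # Unreachable for non-empty input: at t = max(p0) the empirical FPR is 0.
    return float(p0[-1])


print(neyman_pearson_threshold([0.01, 0.02, 0.05, 0.1, 0.3], alpha=0.2))
```

Lowering pt can only increase the detection rate TPR, so among thresholds meeting the FPR constraint the smallest one is preferred.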
7
B.5 Measuring reconstruction performance
Contingency table (or confusion matrix):
                        actual value
                        p                     n
prediction     p'       True Positive (TP)    False Positive (FP)    P'
outcome        n'       False Negative (FN)   True Negative (TN)     N'
               total    P                     N
Statistics defined:
Sensitivity or True Positive Rate (TPR): TPR = TP/P = TP/(TP+FN)
False Positive Rate (FPR): FPR = FP/N = FP/(FP+TN)
Accuracy (ACC): ACC = (TP+TN)/(P+N)
True Negative Rate or Specificity (SPC): SPC = TN/N = 1 - FPR
Positive Predictive Value (PPV): PPV = TP/(TP+FP)
Receiver Operating Characteristic (ROC): TPR vs. FPR
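The statistics above can be computed directly from actual and predicted link values. A minimal Python sketch (function names are hypothetical); a zero denominator returns NaN instead of raising:

```python
def _ratio(num, den):
    # Return NaN instead of raising when a denominator is zero (e.g. no predicted 1s).
    return num / den if den else float("nan")


def confusion_metrics(actual, predicted):
    """Contingency-table statistics for binary labels (1 = link, 0 = no link)."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == 1 and p == 1)
    fp = sum(1 for a, p in pairs if a == 0 and p == 1)
    fn = sum(1 for a, p in pairs if a == 1 and p == 0)
    tn = sum(1 for a, p in pairs if a == 0 and p == 0)
    P, N = tp + fn, fp + tn  # actual positives / actual negatives
    return {
        "TP": tp, "FP": fp, "FN": fn, "TN": tn,
        "TPR": _ratio(tp, P),
        "FPR": _ratio(fp, N),
        "ACC": _ratio(tp + tn, P + N),
        "SPC": _ratio(tn, N),
        "PPV": _ratio(tp, tp + fp),
    }


m = confusion_metrics(actual=[1, 1, 1, 0, 0, 0, 0, 0],
                      predicted=[1, 1, 0, 1, 0, 0, 0, 0])
print(m)
```

Evaluating TPR and FPR at a series of thresholds and plotting the pairs (FPR, TPR) gives the ROC curve.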
8
B.6 Relation to performance measures
[Figure: densities f0(p) and f1(p) plotted against connection probability, with threshold pt; the regions under the curves are labeled R1 to R4.]
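For the rule "set (i,j) to 1 iff p > pt", the rates defined in B.5 follow directly from the two densities. This is a standard relation, given here as a sketch without attempting to match the figure's region labels R1 to R4; it assumes f0 and f1 are normalized densities on [0, 1] with distribution functions F0 and F1:

```latex
\mathrm{TPR}(p_t) = \int_{p_t}^{1} f_1(p)\,dp = 1 - F_1(p_t),
\qquad
\mathrm{FPR}(p_t) = \int_{p_t}^{1} f_0(p)\,dp = 1 - F_0(p_t)
```

Sweeping pt from 1 down to 0 traces the ROC curve (FPR, TPR).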
9
B.7 MAP criterion
For the reconstruction problem, we choose the criterion that maximizes the a posteriori probability (MAP) of the two hypotheses.
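Written out with Bayes' rule, and with π0 and π1 denoting the prior probabilities that an unknown link is a 0-link or a 1-link (notation introduced here, not from the slides), the MAP rule accepts H1 when:

```latex
\Pr(H_1 \mid p) > \Pr(H_0 \mid p)
\iff \pi_1 \, f_1(p) > \pi_0 \, f_0(p)
\iff \frac{f_1(p)}{f_0(p)} > \frac{\pi_0}{\pi_1}
```

The decision depends on p only through the likelihood ratio f1(p)/f0(p); if that ratio increases with p, the rule reduces to a single threshold pt.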
10
A.1 Probabilistic model of structured networks
11
A.2 Estimate model parameters (MLE)
12
B.8 Example network
13
B.9 Density function of connection probabilities
14
B.10 The MAP detector minimizes the average error
The density function is usually jagged and difficult to work with, so the distribution function is preferred. Consider the minimum average error (cost).
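With equal error costs, the average error of the rule "1 iff p > pt" is π0[1 - F0(pt)] + π1 F1(pt), which uses only distribution functions. A minimal sketch, assuming F0 and F1 are the empirical distribution functions of the connection probabilities of known 0-links and 1-links (per the assumption in B.3), and that the prior fraction π1 of 1-links among the unknowns is given (in practice it would have to be estimated). The function name is hypothetical, and candidate thresholds are restricted to the sample values.

```python
import numpy as np


def min_error_threshold(p0_samples, p1_samples, pi1):
    """Threshold p_t minimizing pi0*(1 - F0(p_t)) + pi1*F1(p_t), empirical CDFs.

    Rule: predict a 1-link iff p > p_t.
    pi1 is the assumed prior probability that an unknown link is a 1-link.
    Returns (p_t, minimum average error).
    """
    p0 = np.sort(np.asarray(p0_samples, dtype=float))
    p1 = np.sort(np.asarray(p1_samples, dtype=float))
    pi0 = 1.0 - pi1
    candidates = np.unique(np.concatenate([p0, p1]))
    # Empirical CDF F(t) = fraction of samples <= t, via binary search.
    F0 = np.searchsorted(p0, candidates, side="right") / len(p0)
    F1 = np.searchsorted(p1, candidates, side="right") / len(p1)
    error = pi0 * (1.0 - F0) + pi1 * F1  # false positives + false negatives
    best = int(np.argmin(error))
    return float(candidates[best]), float(error[best])


print(min_error_threshold([0.01, 0.02, 0.05, 0.2, 0.3], [0.1, 0.4, 0.6], pi1=0.5))
```

Evaluating cumulative counts at the sample points sidesteps estimating the jagged density altogether.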
15
B.11 Distribution of connection probabilities
16
B.12 Generalizability of the algorithm
Do the unknown links approximately follow the same distribution? Possible reasons for the unfavorable burst at the tail: sources of model error.
17
B.13 Robustness of the algorithm
Is it sensitive to the number of unknown links?
18
B.14 Comparison of operating points
19
B.15 Reconstruction results
P       N        ACC (%)   TP/P (%)   TN/N (%)   TP/(TP+FP) (%)
201     5293     98.13     80.60      98.79      71.68
222     5272     98.13     80.63      98.86      74.90
192     5302     98.11     75.52      98.92      71.78
224     5270     98.25     80.80      98.99      77.35
235     5259     98.13     75.32      99.14      79.73
217     5277     98.38     78.34      99.20      80.19
204     5290     98.31     77.45      99.11      77.07
192     5302     98.25     71.88      99.21      76.67
231     5263     98.16     77.06      99.09      78.76
217     5277     97.93     71.89      99.00      74.64
213.5   5280.5   98.18     76.95      99.03      76.28   (mean)
USAir network, 10% missed. P and N are the numbers of actual 1-links and 0-links among the unknown entries; the last row is the mean of the ten rows above.
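The table reports rates only. As a quick consistency check (a sketch, not part of the original analysis), the integer counts of the first row can be recovered from P, N, TP/P and TN/N, and ACC and TP/(TP+FP) recomputed from them; the helper name is hypothetical.

```python
def counts_from_rates(P, N, tpr_pct, tnr_pct):
    """Recover integer confusion counts from P, N and the reported TP/P, TN/N (%)."""
    tp = round(P * tpr_pct / 100)
    tn = round(N * tnr_pct / 100)
    return tp, P - tp, tn, N - tn  # TP, FN, TN, FP


# First row of the table: P = 201, N = 5293, TP/P = 80.60 %, TN/N = 98.79 %.
tp, fn, tn, fp = counts_from_rates(201, 5293, 80.60, 98.79)
acc = 100 * (tp + tn) / (201 + 5293)
ppv = 100 * tp / (tp + fp)
print(tp, fn, tn, fp, round(acc, 2), round(ppv, 2))  # 162 39 5229 64 98.13 71.68
```

The recomputed ACC and TP/(TP+FP) match the reported 98.13 % and 71.68 %.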
20
C.1 A variant of the link reconstruction problem
Observed network -> two types of signals: 0 and 1.
[Figure: example network on nodes 1 to 5.]
Some 0s were originally 1s but have been set to 0s. Their positions are unknown; their number may be known or unknown.
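To try methods on this variant, one needs test data in which some 1s have been silently turned into 0s. A minimal sketch, assuming an undirected network stored as a symmetric 0/1 NumPy array and uniformly random hiding (in the spirit of the "fairly sampled" presumption of B.1); the function name `hide_links` is hypothetical.

```python
import numpy as np


def hide_links(adj, fraction, seed=0):
    """Set a random fraction of the links of a symmetric 0/1 matrix to 0.

    Returns the corrupted copy and the hidden pairs (i, j) with i < j.
    """
    rng = np.random.default_rng(seed)
    i_idx, j_idx = np.triu_indices_from(adj, k=1)
    links = [(i, j) for i, j in zip(i_idx, j_idx) if adj[i, j] == 1]
    m = int(round(fraction * len(links)))  # number of links to hide
    chosen = rng.choice(len(links), size=m, replace=False)
    corrupted = adj.copy()
    hidden = []
    for k in chosen:
        i, j = links[k]
        corrupted[i, j] = corrupted[j, i] = 0  # keep the matrix symmetric
        hidden.append((int(i), int(j)))
    return corrupted, hidden


# Ring of 5 nodes (5 links); hide 40 % of them.
ring = np.zeros((5, 5), dtype=int)
for k in range(5):
    ring[k, (k + 1) % 5] = ring[(k + 1) % 5, k] = 1
corrupted, hidden = hide_links(ring, fraction=0.4)
print(hidden)
```

The hidden pairs serve as the ground truth when scoring a reconstruction.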
21
C.2 Procedure for the variant problem
Available information -> fitted probabilistic model P (N×N) -> connection probability p(i,j) of each 0-link (i,j), then:
(a) Number M of hidden links unknown -> choose a threshold pt on the connection probability -> set (i,j) to 1 if p(i,j) > pt, and to 0 otherwise.
(b) Number M of hidden links known -> scoring: rank the connection probabilities of the candidate links (all 0-links) -> set the M links with the highest scores to 1.
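Case (b) can be sketched as follows. This is a minimal sketch, assuming an undirected network stored as a symmetric 0/1 NumPy array and a symmetric score matrix (for instance the fitted p(i,j), or any similarity score); the function name `reconstruct_top_m` is hypothetical, and ties are broken by index order.

```python
import numpy as np


def reconstruct_top_m(adj, scores, m):
    """Set the m highest-scoring 0-links (i < j) of a symmetric 0/1 matrix to 1."""
    n = adj.shape[0]
    i_idx, j_idx = np.triu_indices(n, k=1)
    zero = adj[i_idx, j_idx] == 0  # candidate links: all observed 0-links
    cand_i, cand_j = i_idx[zero], j_idx[zero]
    order = np.argsort(-scores[cand_i, cand_j], kind="stable")[:m]
    restored = adj.copy()
    restored[cand_i[order], cand_j[order]] = 1
    restored[cand_j[order], cand_i[order]] = 1  # keep the matrix symmetric
    return restored


# Path 0-1-2-3 with made-up scores for its three 0-links.
adj = np.array([[0, 1, 0, 0],
                [1, 0, 1, 0],
                [0, 1, 0, 1],
                [0, 0, 1, 0]])
scores = np.zeros((4, 4))
scores[0, 2] = scores[2, 0] = 0.7
scores[0, 3] = scores[3, 0] = 0.1
scores[1, 3] = scores[3, 1] = 0.5
print(reconstruct_top_m(adj, scores, m=1))
```

Ranking needs no threshold, which is why it is usable only when M is known.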
22
C.3 Algorithm based on common neighbors
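The slide gives no formula. The standard common-neighbor index scores a pair by the number of neighbors it shares, s(i,j) = |Γ(i) ∩ Γ(j)| = (A²)(i,j); that the authors used exactly this index is an assumption. A minimal sketch for an undirected network stored as a symmetric 0/1 matrix (function name hypothetical):

```python
import numpy as np


def common_neighbor_scores(adj):
    """Common-neighbor index s_ij = |G(i) & G(j)| = (A @ A)_ij for symmetric 0/1 A."""
    a = np.asarray(adj, dtype=int)
    s = a @ a
    np.fill_diagonal(s, 0)  # the diagonal holds node degrees, not pair scores
    return s


# Path 0-1-2-3: pairs (0, 2) and (1, 3) share one neighbor, (0, 3) shares none.
path = np.array([[0, 1, 0, 0],
                 [1, 0, 1, 0],
                 [0, 1, 0, 1],
                 [0, 0, 1, 0]])
print(common_neighbor_scores(path))
```

These scores can be ranked over all 0-links exactly as in C.2 (b).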
23
C.4 Comparison between the two methods
[Figure panels: probability density functions; distribution functions.]
24
C.5 Generalizability and robustness of algorithms
25
C.6 Reconstruction performance by ranking
26
D.1 The link prediction problem
The procedure is identical to that of the variant link reconstruction problem.
Example: econophysics co-authorship network (N = 506, m = 519, nL = 379).
27
D.2 Factors affecting prediction performance
Problem of generalizability:
a) the size of the training set, or the time span of the prediction;
b) a time-changing growth mechanism.
28
D.3 Effects of training set size
Assuming the new links are known, we examine the variant problem above: the training data set cannot faithfully capture the underlying distribution, either because it is too small or because the growth rule is time dependent.
29
Conclusions
The problem of network reconstruction has been studied in detail. Within a more general framework, the problem can be reformulated as a hypothesis testing problem. This gives deeper insight into the problem and enables us to relate the reconstruction performance of various methods to quantities at a more fundamental level.
30
THANK YOU