Modeling, sampling, generating Networks with MRV


1 Modeling, sampling, generating Networks with MRV
UMass Team

2 Agenda
sampling and estimation of network degree distributions - Don
effect of MRV on network characteristics - Bo
generative models for MRV - Shan

3 Characterizing Joint Degree Distribution of Directed Networks
Fabricio Murai, Don Towsley
UMass Amherst

4 Outline
RW-based estimation
sensitivity of accuracy to
  hidden incoming edges
  visible incoming edges
sensitivity of accuracy to
  heavy tails
  reciprocity
  correlation

5 Directed networks: hidden edges
outgoing edges visible, incoming edges hidden
random walk (RW) based estimation (INFOCOM'12)
  during the walk, construct an undirected graph consistent with the walk
  follow an outgoing edge on first visit
  walk the new undirected graph on revisits
  use the known solution for undirected RW to characterize the network
can estimate the outgoing degree distribution
impossibility result for the joint in/outdegree distribution
  lack of statistical information without sampling most of the graph
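The walk described on this slide can be sketched as follows. This is a simplified reading of the bullets, not the published INFOCOM'12 algorithm: the function name `walk_hidden_in_edges`, the dictionary input format, and the rule of skipping edges that point to already visited nodes (used here so that a visited node's undirected degree stays fixed) are all assumptions made for illustration.

```python
import random
from collections import defaultdict


def walk_hidden_in_edges(out_neighbors, start, n_steps, rng):
    """Walk a digraph whose incoming edges are hidden.

    out_neighbors: dict mapping node -> list of successors (the only
    edges a crawler can see). Returns (trace, und), where trace holds
    (node, outdegree, undirected degree) for every step and und is the
    undirected graph built while walking.
    """
    und = defaultdict(set)
    visited = set()
    trace = []
    node = start
    for _ in range(n_steps):
        if node not in visited:
            visited.add(node)
            for w in out_neighbors.get(node, []):
                # Assumption: skip edges into already visited nodes so a
                # visited node's undirected degree never changes later.
                if w != node and w not in visited:
                    und[node].add(w)
                    und[w].add(node)
        if not und[node]:
            break  # only possible at a start node with no usable out-edge
        trace.append((node, len(out_neighbors.get(node, [])), len(und[node])))
        # Revisits (and first visits) move along the undirected graph.
        node = rng.choice(sorted(und[node]))
    return trace, und
```

The undirected degree recorded in each trace entry is the quantity a degree-based reweighting step would divide by when estimating the outdegree distribution.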

6 RW-based: Visible Edges
transform digraph to undirected graph
collect samples using RW: $s_1, s_2, \ldots, s_n$, with $s_k = (i_k, o_k)$
estimate $\varphi_{i,j} = \frac{1}{n} \sum_{k=1}^{n} \frac{h_{ij}(s_k)}{\pi(s_k)}$, $i,j = 0,1,\ldots$
$h_{ij}(s_k) = 1$ if $i_k = i$ and $o_k = j$, and $0$ otherwise
$\pi(s_k) = C \times \deg(s_k)$
$\deg(s_k)$ is the degree in the new undirected graph; $C$ is chosen to make $\pi$ a distribution
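A minimal sketch of this estimator, under stated assumptions: the edge list contains distinct directed pairs, the undirected version is connected and not bipartite, and the estimate is self-normalized (each sample is weighted by 1/deg and the weights are divided by their sum), a common variant swapped in here because it does not require knowing $C$. The function names are hypothetical.

```python
import random
from collections import defaultdict


def rw_joint_degree_estimate(edges, n_samples, start, rng):
    """Estimate the joint (indegree, outdegree) distribution with a random
    walk on the undirected version of a digraph whose edges are all visible.

    edges: iterable of distinct directed pairs (u, v).
    Returns a dict mapping (indegree, outdegree) -> estimated fraction of nodes.
    """
    adj = defaultdict(set)
    indeg = defaultdict(int)
    outdeg = defaultdict(int)
    for u, v in edges:
        outdeg[u] += 1
        indeg[v] += 1
        if u != v:
            adj[u].add(v)  # ignore direction: reciprocal pairs merge
            adj[v].add(u)
    neighbors = {u: sorted(vs) for u, vs in adj.items()}

    weighted = defaultdict(float)
    total = 0.0
    node = start
    for _ in range(n_samples):
        # The walk visits a node with probability proportional to its
        # undirected degree, so each sample is weighted by 1 / degree.
        w = 1.0 / len(neighbors[node])
        weighted[(indeg[node], outdeg[node])] += w
        total += w
        node = rng.choice(neighbors[node])
    return {pair: value / total for pair, value in weighted.items()}
```

Dividing by the sum of the weights makes the constant $C$ cancel, so the result is directly a distribution over (indegree, outdegree) pairs.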

7 RW-based degree distribution estimation
performance on real datasets
vs. uniform vertex sampling
vs. DURW for marginal outdegree distribution
effect of heavy tails?
effect of reciprocity, correlation?

8 Real Datasets
in/out degrees appear to be heavy-tailed
Note to Don: the numbers here are not perfectly accurate, since they are computed on the giant connected component. I'll try to get the right numbers.

9 Behavior of RW on real datasets
YouTube, sampling budget 10% of graph size (B = 0.1)
[Figure: log(PMF) and log(NRMSE) over indegree and outdegree]
Simulation results. NRMSE behaves as expected. Two forces are at work: high probability mass causes upsampling, giving small error for low total degree; high total degree causes upsampling, so errors decrease with total indegree. Middle left area: small PMF, small total degree.
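The slides do not define NRMSE. The sketch below assumes the usual definition, the root mean squared error of the estimate over repeated runs divided by the true value; the function name and its arguments are illustrative.

```python
from math import sqrt


def nrmse(estimates, truth):
    """Normalized root mean squared error of repeated estimates of one value.

    estimates: estimates of the same quantity from independent runs.
    truth: the true value, which must be positive.
    """
    if truth <= 0:
        raise ValueError("truth must be positive")
    mse = sum((e - truth) ** 2 for e in estimates) / len(estimates)
    return sqrt(mse) / truth
```

For a joint degree distribution, this would be evaluated once per (indegree, outdegree) cell, which is what a log(NRMSE) heat map displays.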

10 RW vs. Vertex Sampling
YouTube, log(NRMSE) with B=0.1
[Figure: log(NRMSE) of vertex sampling over indegree and outdegree]
Simulation results, B=0.1. Assuming a jump cost of 10, vertex sampling gets only 1/10 as many samples. Vertex sampling performs as well as RW for very low total degree and worse everywhere else.

11 Web-Google, B=0.1
[Figure: log(PMF) and log(NRMSE) over indegree and outdegree]
Here the in-degree spans one order of magnitude more (web pages can't have too many out-links).

12 Wiki-Talk, B=0.1
[Figure: log(PMF) and log(NRMSE) over indegree and outdegree]
Here the out-degree spans one order of magnitude more.

13 Out-degree distribution estimation: DURW vs. RW
YouTube: high reciprocity (0.79), high correlation (0.95)
[Figure: NRMSE vs. outdegree]
RW based on all edges provides (slightly) lower errors
Misusing indegree information?
High correlation -> both methods exhibit the same trends
High reciprocity -> DURW "sees" the graph as RW does

14 Other datasets
Web-Google: correlation 0.13, reciprocity 0.31
WikiTalk: correlation 0.47, reciprocity 0.14
[Figures: NRMSE vs. outdegree for each dataset]
WikiTalk: medium correlation -> trends are different in the middle; low reciprocity -> NRMSE is one order of magnitude smaller for RW
Web-Google: low correlation -> nodes with large out-degree have small in-degree and end up downsampled with RW, thus larger error; medium reciprocity -> NRMSEs have about the same order of magnitude

15 Sensitivity of RW method to graph structure
reciprocity
correlation
Methodology:
  approximate RW sampling by independent edge sampling
  focus on joint distributions with Pareto(3) marginals

16 Approximating RW by edge sampling
$f_{i,j,r} = P(\mathrm{in}=i, \mathrm{out}=j, \mathrm{recip}=r)$
edge sampling: select a random edge, then select a random endpoint
probability of sampling a node with indegree $i$, outdegree $j$, reciprocal degree $r$: (expression not preserved in the transcript)
produces an unbiased estimate
experiments suggest the above expression is sufficiently accurate
[Figure: RW sampling vs. edge sampling, over indegree and outdegree]
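The sampling probability itself did not survive in the transcript. The expression below is one plausible reconstruction, not the slide's own formula. It assumes that each reciprocated pair collapses into a single undirected edge, so a node with indegree $i$, outdegree $j$ and $r$ reciprocated edges has undirected degree $i + j - r$, and that picking a random edge and then a random endpoint selects a node with probability proportional to that degree. The symbol $q_{i,j,r}$ is introduced here for illustration.

```latex
\[
  q_{i,j,r}
  \;=\;
  \frac{(i + j - r)\, f_{i,j,r}}
       {\sum_{i',j',r'} (i' + j' - r')\, f_{i',j',r'}}
\]
```

If reciprocated pairs were instead kept as two separate undirected edges, the weight $i + j - r$ would become $i + j$.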

17 Controlling reciprocity
Goal: given degree distribution $\theta_{ij}$, generate $\theta_{ijr} = P(\mathrm{in}=i, \mathrm{out}=j, \mathrm{recip}=r)$ with an arbitrary amount of reciprocity
Model: given in-degree $i$ and out-degree $j$, the number of reciprocated edges is $r \sim \mathrm{Binom}(\min(i,j), p)$
$p$ is a tuning parameter: min reciprocity ($p=0$), max reciprocity ($p=1$)
the resulting distribution may not be graphical
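The reciprocity model can be written out directly from the binomial pmf. The sketch below assumes the joint distribution is given as a dictionary over (indegree, outdegree) pairs; the function name and that representation are illustrative.

```python
from math import comb


def joint_with_reciprocity(theta, p):
    """Extend theta[(i, j)] to theta_r[(i, j, r)] with r ~ Binom(min(i, j), p).

    theta: dict mapping (indegree, outdegree) -> probability.
    p: reciprocity tuning parameter in [0, 1].
    Entries with zero probability are omitted from the result.
    """
    theta_r = {}
    for (i, j), mass in theta.items():
        m = min(i, j)  # at most min(i, j) edges can be reciprocated
        for r in range(m + 1):
            prob = mass * comb(m, r) * p**r * (1 - p) ** (m - r)
            if prob > 0:
                theta_r[(i, j, r)] = prob
    return theta_r
```

Summing the result over $r$ recovers the original $\theta_{ij}$, so the model changes only how many edges are reciprocated and leaves the in/outdegree distribution untouched.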

18 Controlling correlation
Goal: given marginal distribution $g_i$, $i = 0,1,\ldots$, generate $\theta_{ij}$ such that $\theta_{i*} = \theta_{*i} = g_i$, with arbitrary nonnegative Pearson correlation
Model: mixture of
  perfect correlation: $f_1(i,j) = g_i$ if $i = j$, and $0$ otherwise
  no correlation: $f_2(i,j) = g_i \times g_j$
$\theta_{ij} = \alpha f_1(i,j) + (1-\alpha) f_2(i,j)$
$\alpha$ is a tuning parameter
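A small sketch of the mixture model, useful for checking that both marginals stay equal to $g$ and that the Pearson correlation of the resulting joint distribution equals $\alpha$. It assumes a marginal truncated to a finite support and stored as a list; the function names are hypothetical.

```python
from math import sqrt


def mixture_joint(g, alpha):
    """theta[i][j] = alpha * f1(i, j) + (1 - alpha) * f2(i, j).

    g: marginal pmf over degrees 0..len(g)-1 (finite support, sums to 1).
    f1 puts mass g[i] on the diagonal; f2 is the product g[i] * g[j].
    """
    n = len(g)
    return [
        [alpha * (g[i] if i == j else 0.0) + (1 - alpha) * g[i] * g[j]
         for j in range(n)]
        for i in range(n)
    ]


def pearson(theta):
    """Pearson correlation between the two coordinates of a joint pmf."""
    n = len(theta)
    row = [sum(theta[i]) for i in range(n)]
    col = [sum(theta[i][j] for i in range(n)) for j in range(n)]
    mean_i = sum(i * row[i] for i in range(n))
    mean_j = sum(j * col[j] for j in range(n))
    var_i = sum((i - mean_i) ** 2 * row[i] for i in range(n))
    var_j = sum((j - mean_j) ** 2 * col[j] for j in range(n))
    cov = sum((i - mean_i) * (j - mean_j) * theta[i][j]
              for i in range(n) for j in range(n))
    return cov / sqrt(var_i * var_j)
```

The correlation equals $\alpha$ because the diagonal component contributes covariance $\alpha \, \mathrm{Var}(g)$ and the independent component contributes none.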

19 Pareto: effect of reciprocity
Q: effect of reciprocity on RW estimation efficiency
$\theta_{ij}^{p}$: in/outdegree distribution under the $p$-reciprocity model
focus on $NRMSE(\theta_{ij}^{0}) / NRMSE(\theta_{ij}^{1})$
reciprocity hurts estimation when degrees are correlated
diagonal estimation helped when no correlation
[Figure: log(NRMSE(recip(0))/NRMSE(recip(1))) over indegree and outdegree; panels: correlation, $\alpha=0.99$, and no correlation, $\alpha=0$; regions marked "recip(0) better" and "recip(1) better"]
When reciprocity is maximum, the proportion of samples along the diagonal does not change. Hence, the log-ratio is zero.

20 Pareto: effect of correlation
Q: how does correlation affect efficiency of RW-based estimation
$\theta_{ij}^{\alpha}$: in/outdegree distribution under the $\alpha$-correlation model
focus on $NRMSE(\theta_{ij}^{0}) / NRMSE(\theta_{ij}^{0.99})$
increased correlation -> increased accuracy on the diagonal (indegree = outdegree)
[Figure: log(NRMSE(corr(0))/NRMSE(corr(.99))) over indegree and outdegree]
When reciprocity is maximum, the proportion of samples along the diagonal does not change. Hence, the log-ratio is zero.

21 Roadmap
rewiring algorithms to generate arbitrary reciprocity, correlation in real networks
more flexible correlation model (input covariance matrix, possibly different marginals)
evaluation on networks
other network structural properties
capturing MRV characteristic in all of above

