Published by Joel George. Modified over 8 years ago.
1
Coarse-Grained Topology Estimation via Graph Sampling
Maciej Kurant (1), Minas Gjoka (2), Yan Wang (2), Zack W. Almquist (2), Carter T. Butts (2), Athina Markopoulou (2)
(1) ETHZ, (2) UC Irvine
17 Aug 2012, WOSN
2
Coarse-grained topology
Nodes belong to different categories (A, B).
Example categories: countries, universities, workplaces, religion, age, music genres, …
www.facebook.com/notes/facebook-data-team/mapping-global-friendship-ties/ (19 March 2012)
3
Coarse-grained topology
Nodes belong to different categories (A, B).
Number of edges between A and B?
Not normalized by the size of categories!
4
Coarse-grained topology
Nodes belong to different categories (A, B).
Probability that a random node in A is a neighbor of a random node in B:
$w(A,B) = \frac{|E_{A,B}|}{|A| \cdot |B|}$
Numerator: all existing edges between A and B. Denominator: all possible edges between A and B.
A, B: all nodes labeled 'A' and 'B', respectively.
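The definition on this slide (existing edges between A and B, divided by possible edges between A and B) can be sketched in a few lines of Python. The toy graph, the node IDs, and the function name `edge_weight` are made up for illustration; they are not from the talk.

```python
from itertools import product

def edge_weight(edges, A, B):
    """Fraction of possible A-B node pairs that are connected by an edge.

    Assumes A and B are disjoint sets of nodes and the graph is undirected.
    """
    edge_set = {frozenset(e) for e in edges}
    existing = sum(1 for a, b in product(A, B) if frozenset((a, b)) in edge_set)
    possible = len(A) * len(B)
    return existing / possible

# Hypothetical toy graph: nodes 1-3 are labeled 'A', nodes 4-5 are labeled 'B'.
edges = [(1, 4), (2, 4), (2, 5), (1, 2), (4, 5)]
A = {1, 2, 3}
B = {4, 5}

w_ab = edge_weight(edges, A, B)  # 3 existing A-B edges out of 6 possible
```

Here `w_ab` is 0.5: edges inside a category, such as (1, 2) and (4, 5), do not count toward the A-B weight.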
5
Facebook: 800+M users, 150 friends each (on average), 8 bytes (64 bits) per user ID
Name, school / workplace, city or country (before 2010), list of friends
The raw connectivity data, with no attributes: 800M x 150 x 8 B = 960 GB
To get this data, one would have to download 200 TB of HTML data!
This is neither feasible nor practical.
Solution: sampling!
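The back-of-the-envelope size estimate on this slide can be checked directly. The variable names are illustrative, and the sketch assumes decimal gigabytes (10^9 bytes), which is what makes the slide's figure come out to 960 GB.

```python
# Inputs quoted on the slide.
users = 800_000_000        # 800+M users (lower bound)
avg_friends = 150          # average friends per user
bytes_per_id = 8           # 64-bit user ID

# One stored ID per friend-list entry, no attributes.
raw_bytes = users * avg_friends * bytes_per_id
raw_gb = raw_bytes / 10**9  # decimal gigabytes (assumption)
```

With these inputs `raw_gb` is 960.0.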
6
Coarse-grained topology from a sample
UIS: Uniform Independence Sample
RW: Random Walk sample
[Figure: for each sampling method, the sampled nodes in A and B and the resulting estimate]
7
Coarse-grained topology from a sample
UIS: Uniform Independence Sample
RW: Random Walk sample; sampling probability w(v) is proportional to node degree
[Figure: for each sampling method, the sampled nodes in A and B and the resulting estimate]
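The statement that a random walk samples nodes with probability proportional to degree can be illustrated with a small simulation. This is a minimal sketch on a made-up four-node graph (connected and non-bipartite, so a simple random walk converges); the graph, seed, and step count are assumptions chosen for illustration.

```python
import random
from collections import Counter

def random_walk(adj, start, steps, rng):
    """Simple random walk: at each step move to a uniformly chosen neighbor."""
    node = start
    visits = []
    for _ in range(steps):
        node = rng.choice(adj[node])
        visits.append(node)
    return visits

# Hypothetical toy graph: a triangle 0-1-2 plus a pendant node 3 attached to 0.
adj = {0: [1, 2, 3], 1: [0, 2], 2: [0, 1], 3: [0]}

rng = random.Random(0)  # fixed seed so the run is repeatable
visits = random_walk(adj, start=0, steps=200_000, rng=rng)
counts = Counter(visits)

total_degree = sum(len(neighbors) for neighbors in adj.values())
empirical = {v: counts[v] / len(visits) for v in adj}
expected = {v: len(adj[v]) / total_degree for v in adj}  # degree / (2 * number of edges)
```

After enough steps, `empirical[v]` should be close to `expected[v]` for every node, so high-degree nodes are over-represented relative to a uniform sample.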
8
Estimating category size |A| (UIS and RW)
N: number of nodes in the graph
A, B: all nodes labeled 'A' and 'B', respectively
S: all sampled nodes
S_A, S_B: nodes sampled in A and B, respectively
w(v): sampling weight of node v (under RW, equal to the degree of v)
9
Estimating category size |A| (UIS and RW)
This correction is essential!
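The slide's formulas did not survive in this transcript, so the following is only a sketch under stated assumptions: the UIS estimator scales the sampled fraction |S_A| / |S| by N, and the RW estimator reweights every sampled node by 1 / w(v) (a Hansen-Hurwitz style correction). The function names and the toy data are hypothetical.

```python
def size_uis(sample, category, label, N):
    """Assumed UIS form: N * |S_A| / |S|."""
    s_a = sum(1 for v in sample if label[v] == category)
    return N * s_a / len(sample)

def size_rw(sample, category, label, weight, N):
    """Assumed RW form: N * sum_{v in S_A} 1/w(v) / sum_{v in S} 1/w(v)."""
    num = sum(1.0 / weight[v] for v in sample if label[v] == category)
    den = sum(1.0 / weight[v] for v in sample)
    return N * num / den

# Hypothetical toy graph: 4 nodes, two per category, with degrees 3, 2, 2, 1.
label = {0: "A", 1: "A", 2: "B", 3: "B"}
degree = {0: 3, 1: 2, 2: 2, 3: 1}
N = 4

# Idealized RW sample: every node appears exactly as often as its degree.
rw_sample = [0, 0, 0, 1, 1, 2, 2, 3]

uncorrected = size_uis(rw_sample, "A", label, N)        # treats the RW sample as uniform
corrected = size_rw(rw_sample, "A", label, degree, N)   # reweights by 1 / degree
```

On this idealized sample the uncorrected value is 2.5, while the reweighted value recovers the true size of 2, which illustrates why a degree correction matters for random walk samples.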
10
Estimating edge weights w(A,B) (induced), UIS and RW
True value: all existing edges between A and B / all possible edges between A and B
Estimate: all observed edges between A and B / all edges we could have observed between A and B
11
Estimating edge weights w(A,B) (induced), UIS and RW
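Under induced sampling only edges between sampled nodes are observed. The sketch below assumes the UIS estimate is the ratio of observed A-B edges to sampled A-B pairs, and that the RW estimate weights each sampled pair (a, b) by 1 / (w(a) w(b)). These forms are assumptions consistent with the notation on the slides, not a transcription of the slide formulas; the function names and toy graph are hypothetical.

```python
def induced_uis(sample, edge_set, label, cat_a, cat_b):
    """Assumed UIS form: observed A-B edges / A-B pairs that could have been observed."""
    s_a = [v for v in sample if label[v] == cat_a]
    s_b = [v for v in sample if label[v] == cat_b]
    observed = sum(1 for a in s_a for b in s_b if frozenset((a, b)) in edge_set)
    return observed / (len(s_a) * len(s_b))

def induced_rw(sample, edge_set, label, weight, cat_a, cat_b):
    """Assumed RW form: the same ratio, with each pair weighted by 1 / (w(a) * w(b))."""
    s_a = [v for v in sample if label[v] == cat_a]
    s_b = [v for v in sample if label[v] == cat_b]
    num = den = 0.0
    for a in s_a:
        for b in s_b:
            pair_weight = 1.0 / (weight[a] * weight[b])
            den += pair_weight
            if frozenset((a, b)) in edge_set:
                num += pair_weight
    return num / den

# Hypothetical toy graph: triangle 0-1-2 plus pendant node 3 attached to 0.
edge_set = {frozenset(e) for e in [(0, 1), (0, 2), (0, 3), (1, 2)]}
label = {0: "A", 1: "A", 2: "B", 3: "B"}
degree = {0: 3, 1: 2, 2: 2, 3: 1}

uniform_sample = [0, 1, 2, 3]
rw_sample = [0, 0, 0, 1, 1, 2, 2, 3]  # idealized: each node appears degree-many times

w_uis = induced_uis(uniform_sample, edge_set, label, "A", "B")
w_rw = induced_rw(rw_sample, edge_set, label, degree, "A", "B")
```

The true weight in this toy graph is 3 / 4, since three of the four possible A-B pairs are connected. Both calls return that value here; both functions divide by zero if the sample contains no node from one of the categories, which a fuller implementation would need to handle.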
12
Estimating edge weights w(A,B) (star sampling), UIS and RW
E_{a,B}: all edges between node a and nodes in B
13
Estimating edge weights w(A,B) (star sampling), UIS and RW
14
Estimating edge weights w(A,B) (star sampling), UIS and RW
15
Estimating edge weights w(A,B) (star sampling), UIS and RW
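Under star sampling, each sampled node reveals all of its neighbors and their categories, so |E_{a,B}| is known for every sampled a. The following is a minimal sketch of one plausible estimator built from that quantity: it sums the A-to-B edge counts seen from sampled nodes on both sides and divides by the number of pairs those nodes could have formed, with optional 1 / w(v) reweighting for RW. The exact form, the function name, and the use of known (or separately estimated) category sizes `size_a` and `size_b` are assumptions, not the slide's formulas.

```python
def star_edge_weight(sample, adj, label, cat_a, cat_b, size_a, size_b, weight=None):
    """Assumed star-sampling estimate of w(A,B); weight=None means uniform (UIS)."""
    def w(v):
        return 1.0 if weight is None else float(weight[v])

    def edges_into(v, category):
        # |E_{v,category}|: neighbors of v that carry the given label.
        return sum(1 for u in adj[v] if label[u] == category)

    num = den = 0.0
    for v in sample:
        if label[v] == cat_a:
            num += edges_into(v, cat_b) / w(v)
            den += size_b / w(v)   # v could have had an edge to every node in B
        elif label[v] == cat_b:
            num += edges_into(v, cat_a) / w(v)
            den += size_a / w(v)   # v could have had an edge to every node in A
    return num / den

# Hypothetical toy graph: triangle 0-1-2 plus pendant node 3 attached to 0.
adj = {0: [1, 2, 3], 1: [0, 2], 2: [0, 1], 3: [0]}
label = {0: "A", 1: "A", 2: "B", 3: "B"}
degree = {v: len(neighbors) for v, neighbors in adj.items()}

star_uis = star_edge_weight([0, 1, 2, 3], adj, label, "A", "B", size_a=2, size_b=2)
star_rw = star_edge_weight([0, 0, 0, 1, 1, 2, 2, 3], adj, label, "A", "B",
                           size_a=2, size_b=2, weight=degree)
```

On this toy graph both calls return the true weight of 3 / 4. In practice the category sizes would usually be unknown and would be replaced by estimates from the same sample.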
16
Estimators
[Summary table; columns: UIS, RW; rows: category size, edge weight (induced), edge weight (star)]
We prove the consistency of all these estimators.
17
Performance evaluation
18
Fully known graph: Facebook Texas
[Plot; x-axis: sample size |S|]
19
Online graph: Facebook online
[Plot; x-axis: sample size |S|]
[swrw10] M. Kurant, M. Gjoka, C. T. Butts and A. Markopoulou, "Walking on a Graph with a Magnifying Glass", SIGMETRICS 2011.
20
geosocialmap.com
21
geosocialmap.com
24
Public and private colleges in the USA
geosocialmap.com
25
The world according to Facebook
geosocialmap.com
26
Strong clusters among Middle Eastern countries
Labeled countries: Egypt, Saudi Arabia, United Arab Emirates, Lebanon, Jordan, Israel
27
Summary
Original (unknown) topology; UIS and RW samples; coarse-grained topology (A, B)
Consistent estimators under induced and star sampling
geosocialmap.com
More info: http://odysseas.calit2.uci.edu/osn
Kiitos! (Thank you!)