Coarse-Grained Topology Estimation via Graph Sampling
Maciej Kurant (1), Minas Gjoka (2), Yan Wang (2), Zack W. Almquist (2), Carter T. Butts (2), Athina Markopoulou (2)
(1) ETHZ, (2) UC Irvine
17 Aug 2012, WOSN
Coarse-grained topology: nodes belong to different categories (e.g., A and B).
Example categories: countries, universities, workplaces, religion, age, music genres, … (19 March 2012)
Coarse-grained topology: nodes belong to different categories (e.g., A and B). How should the link between A and B be weighted?
Number of edges between A and B? Not normalized by the size of the categories!
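Coarse-grained topology: nodes belong to different categories. The weight of the link between A and B is the probability that a random node in A is a neighbor of a random node in B:
$$w(A,B) = \frac{|E_{A,B}|}{|A| \cdot |B|}$$
A, B - all nodes labeled by ‘A’ and ‘B’, respectively
|E_{A,B}| - number of all existing edges between A and B
|A| \cdot |B| - number of all possible edges between A and B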
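As an illustration of this definition, the following minimal sketch computes w(A,B) exactly on a tiny, fully known graph. The toy graph, the `category` dictionary, and the function name `edge_weight` are hypothetical and exist only for this example.

```python
from itertools import product

def edge_weight(edges, category, a_label, b_label):
    """w(A,B) = |E_{A,B}| / (|A| * |B|) for two distinct categories."""
    A = {v for v, c in category.items() if c == a_label}
    B = {v for v, c in category.items() if c == b_label}
    # Undirected edges are stored as frozensets so (u, v) and (v, u) coincide.
    edge_set = {frozenset(e) for e in edges}
    existing = sum(1 for a, b in product(A, B) if frozenset((a, b)) in edge_set)
    return existing / (len(A) * len(B))

# Hypothetical toy data: two nodes in A, three nodes in B.
category = {1: "A", 2: "A", 3: "B", 4: "B", 5: "B"}
edges = [(1, 3), (1, 4), (2, 5), (1, 2), (3, 4)]
w_ab = edge_weight(edges, category, "A", "B")
print(w_ab)  # 3 existing A-B edges out of 2 * 3 possible ones
```

Edges inside a category, such as (1, 2) and (3, 4), do not count toward w(A,B).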
Facebook: 800+M users, 150 friends each (on average), 8 bytes (64 bits) per user ID.
The raw connectivity data, with no attributes: 800M x 150 x 8 B = 960 GB.
Available per user: name, school / workplace, city or country (before 2010), list of friends.
To get this data, one would have to download 200 TB of HTML data!
This is neither feasible nor practical. Solution: sampling!
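The size estimate above can be checked with a few lines of arithmetic. The variable names are illustrative, and 1 GB is taken as 10^9 bytes, which is the convention the 960 GB figure implies.

```python
users = 800_000_000   # 800+M users (lower bound from the slide)
avg_friends = 150     # average number of friends per user
bytes_per_id = 8      # one 64-bit user ID

# Each friend-list entry costs one user ID.
raw_bytes = users * avg_friends * bytes_per_id
raw_gb = raw_bytes / 10**9
print(f"{raw_gb:.0f} GB")
```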
Coarse-grained topology from a sample
UIS - Uniform Independence Sample
RW - Random Walk sample; sampling probability w(v) proportional to node degree
In both cases, the coarse-grained topology between A and B is estimated from the sampled nodes.
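To make the difference between the two designs concrete, here is a minimal sketch of both samplers on a toy graph. The adjacency lists, function names, and seed are assumptions for illustration; the talk does not specify an implementation. The loop prints, for each node, its empirical frequency under UIS and under RW next to degree / (sum of degrees), which is the long-run visiting frequency of a simple random walk on a connected, non-bipartite graph.

```python
import random
from collections import Counter

def uniform_independence_sample(nodes, n, rng):
    """UIS: draw n nodes uniformly at random, independently (with replacement)."""
    nodes = list(nodes)
    return [rng.choice(nodes) for _ in range(n)]

def random_walk_sample(adj, start, n, rng):
    """RW: collect n nodes by repeatedly stepping to a uniformly chosen neighbor."""
    sample, v = [], start
    for _ in range(n):
        sample.append(v)
        v = rng.choice(adj[v])
    return sample

# Hypothetical undirected toy graph as adjacency lists.
adj = {0: [1, 2, 3], 1: [0, 2], 2: [0, 1, 3], 3: [0, 2]}
rng = random.Random(0)
uis = uniform_independence_sample(adj, 10_000, rng)
rw = random_walk_sample(adj, 0, 10_000, rng)

uis_counts, rw_counts = Counter(uis), Counter(rw)
total_degree = sum(len(nbrs) for nbrs in adj.values())
for v in adj:
    # Columns: node, UIS frequency, RW frequency, degree share.
    print(v, uis_counts[v] / len(uis), rw_counts[v] / len(rw), len(adj[v]) / total_degree)
```

UIS frequencies should hover around 1/4 for every node, while RW frequencies should track the degree share, which is why RW samples need reweighting before they are used for estimation.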
Estimating category size |A| (UIS and RW)
N - number of nodes in the graph
A, B - all nodes labeled by ‘A’ and ‘B’, respectively
S - all sampled nodes
S_A, S_B - nodes sampled in A and B, respectively
w(v) - sampling weight of node v (under RW, equal to the degree of v)
Under RW, the category size estimate must correct for the sampling weights w(v). This correction is essential!
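The estimator formulas themselves did not survive in this transcript. The sketch below uses the standard forms that the notation suggests, and they should be read as an assumption rather than as the slide's exact expressions: N * |S_A| / |S| under UIS, and N * (sum over S_A of 1/w(v)) / (sum over S of 1/w(v)) under RW. The function names and the star-graph example are hypothetical.

```python
def estimate_size_uis(sample, category, label, N):
    """UIS estimate of |A|: N * |S_A| / |S|."""
    s_a = sum(1 for v in sample if category[v] == label)
    return N * s_a / len(sample)

def estimate_size_rw(sample, category, label, N, weight):
    """RW estimate of |A|: each sampled node is down-weighted by 1 / w(v)."""
    num = sum(1 / weight[v] for v in sample if category[v] == label)
    den = sum(1 / weight[v] for v in sample)
    return N * num / den

# Hypothetical star graph: hub 0 (category B, degree 4), leaves 1..4 (category A, degree 1).
category = {0: "B", 1: "A", 2: "A", 3: "A", 4: "A"}
degree = {0: 4, 1: 1, 2: 1, 3: 1, 4: 1}
N = 5

# Idealized RW sample: every node appears in proportion to its degree.
rw_sample = [1, 2, 3, 4, 0, 0, 0, 0]

naive = estimate_size_uis(rw_sample, category, "A", N)             # ignores the degree bias
corrected = estimate_size_rw(rw_sample, category, "A", N, degree)  # reweights by 1 / degree
print(naive, corrected)
```

On this idealized sample the uncorrected estimate undercounts the four low-degree nodes in A, while the reweighted estimate recovers the true size.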
Estimating edge weights w(A,B) (induced subgraph sampling), UIS and RW
True weight: all existing edges between A and B, divided by all possible edges between A and B.
Estimate: all observed edges between A and B, divided by all edges we could have observed between A and B.
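A minimal sketch of the induced-subgraph idea follows: only edges between pairs of sampled nodes are observed, so the estimate is the ratio of observed A-B edges to A-B pairs that could have been observed. The RW variant shown here weights every pair by 1 / (w(a) * w(b)); that form is an assumption based on standard reweighting, since the slide's formula is not preserved. The function name and toy data are hypothetical.

```python
def induced_weight(sample, category, edge_set, a_label, b_label, weight=None):
    """Induced-subgraph estimate of w(A,B); weight=None means UIS (all weights 1)."""
    s_a = [v for v in sample if category[v] == a_label]
    s_b = [v for v in sample if category[v] == b_label]
    num = den = 0.0
    for a in s_a:
        for b in s_b:
            inv = 1.0 if weight is None else 1.0 / (weight[a] * weight[b])
            den += inv                          # a pair we could have observed
            if frozenset((a, b)) in edge_set:
                num += inv                      # a pair that is an observed edge
    return num / den if den else float("nan")

# Hypothetical toy graph; the true w(A,B) is 3 / (2 * 3) = 0.5.
category = {1: "A", 2: "A", 3: "B", 4: "B", 5: "B"}
edge_set = {frozenset(e) for e in [(1, 3), (1, 4), (2, 5), (1, 2), (3, 4)]}
degree = {1: 3, 2: 2, 3: 2, 4: 2, 5: 1}

uis_sample = [1, 2, 3, 4, 5]                   # every node once
rw_sample = [1, 1, 1, 2, 2, 3, 3, 4, 4, 5]     # every node in proportion to its degree
w_uis = induced_weight(uis_sample, category, edge_set, "A", "B")
w_rw = induced_weight(rw_sample, category, edge_set, "A", "B", degree)
print(w_uis, w_rw)
```

Both idealized samples recover the true value 0.5, the second one up to floating-point rounding. The function returns NaN when the sample contains no A-B pair at all, which is the main weakness of induced sampling for small categories.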
Estimating edge weights w(A,B) (star sampling), UIS and RW
E_{a,B} - all edges between node a and nodes in B
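Under star sampling, each sampled node a reveals all of its neighbors and their categories, and hence |E_{a,B}|. The slide's formulas are not preserved, so the sketch below shows one natural estimator built from that quantity, stated here as an assumption: the weighted number of edges seen from sampled nodes in A toward B (and from B toward A), divided by the weighted number of such edges that could exist. The category sizes `size_a` and `size_b` are passed in and may be true or estimated values; all names and data are hypothetical.

```python
def star_weight(sample, category, adj, a_label, b_label, size_a, size_b, weight=None):
    """Star-sampling estimate of w(A,B); weight=None means UIS (all weights 1)."""
    def inv(v):
        return 1.0 if weight is None else 1.0 / weight[v]

    def edges_into(v, label):
        # |E_{v,label}|: edges from v to nodes carrying the given label.
        return sum(1 for u in adj[v] if category[u] == label)

    num = den = 0.0
    for v in sample:
        if category[v] == a_label:
            num += edges_into(v, b_label) * inv(v)
            den += size_b * inv(v)   # v could link to every node in B
        elif category[v] == b_label:
            num += edges_into(v, a_label) * inv(v)
            den += size_a * inv(v)   # v could link to every node in A
    return num / den if den else float("nan")

# Hypothetical toy graph; the true w(A,B) is 0.5.
category = {1: "A", 2: "A", 3: "B", 4: "B", 5: "B"}
adj = {1: [2, 3, 4], 2: [1, 5], 3: [1, 4], 4: [1, 3], 5: [2]}
degree = {v: len(nbrs) for v, nbrs in adj.items()}

w_star_uis = star_weight([1, 2, 3, 4, 5], category, adj, "A", "B", 2, 3)
w_star_rw = star_weight([1, 1, 1, 2, 2, 3, 3, 4, 4, 5], category, adj, "A", "B", 2, 3, degree)
print(w_star_uis, w_star_rw)
```

Unlike the induced estimate, a single sampled node already contributes information here, because its whole neighborhood is observed.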
Estimators (summary): category size and edge weight (induced, star), each under UIS and RW.
We prove the consistency of all these estimators.
Performance evaluation
Facebook: Texas (fully known graph). [Plot; x-axis: sample size |S|]
Facebook online (online graph). [Plot; x-axis: sample size |S|]
[swrw10] M. Kurant, M. Gjoka, C. T. Butts and A. Markopoulou, “Walking on a Graph with a Magnifying Glass”, SIGMETRICS 2011.
geosocialmap.com
Public and private colleges in the USA
The world according to Facebook
Strong clusters among Middle Eastern countries: Egypt, Saudi Arabia, United Arab Emirates, Lebanon, Jordan, Israel
Summary
Original (unknown) topology, sampled via UIS or RW, yields an estimate of the coarse-grained topology between categories A and B.
Consistent estimators under induced and star sampling.
More info: geosocialmap.com
Kiitos (thank you)!