1 Sampling Massive Online Graphs Challenges, Techniques, and Applications to Facebook Maciej Kurant (UC Irvine) Joint work with: Minas Gjoka (UC Irvine),

Slides:

Advertisements

Similar presentations

Autonomic Scaling of Cloud Computing Resources

Advertisements

Benchmarking traversal operations over graph databases Marek Ciglan 1, Alex Averbuch 2 and Ladialav Hluchý 1 1 Institute of Informatics, Slovak Academy.

LEARNING INFLUENCE PROBABILITIES IN SOCIAL NETWORKS Amit Goyal Francesco Bonchi Laks V. S. Lakshmanan University of British Columbia Yahoo! Research University.

Markov Chain Monte Carlo Convergence Diagnostics: A Comparative Review By Mary Kathryn Cowles and Bradley P. Carlin Presented by Yuting Qi 12/01/2006.

Analysis and Modeling of Social Networks Foudalis Ilias.

1 Advancing Supercomputer Performance Through Interconnection Topology Synthesis Yi Zhu, Michael Taylor, Scott B. Baden and Chung-Kuan Cheng Department.

1 2.5K-Graphs: from Sampling to Generation Minas Gjoka, Maciej Kurant ‡, Athina Markopoulou UC Irvine, ETZH ‡

Practical Recommendations on Crawling Online Social Networks

Construction of Simple Graphs with a Target Joint Degree Matrix and Beyond Minas Gjoka, Balint Tillman, Athina Markopoulou University of California, Irvine.

Autocorrelation and Linkage Cause Bias in Evaluation of Relational Learners David Jensen and Jennifer Neville.

Structural Inference of Hierarchies in Networks BY Yu Shuzhi 27, Mar 2014.

Introduction to Sampling based inference and MCMC Ata Kaban School of Computer Science The University of Birmingham.

BAYESIAN INFERENCE Sampling techniques

1 Walking on a Graph with a Magnifying Glass Stratified Sampling via Weighted Random Walks Maciej Kurant Minas Gjoka, Carter T. Butts, Athina Markopoulou.

Random Walk on Graph t=0 Random Walk Start from a given node at time 0

Mining and Searching Massive Graphs (Networks)

1 Data Persistence in Large-scale Sensor Networks with Decentralized Fountain Codes Yunfeng Lin, Ben Liang, Baochun Li INFOCOM 2007.

Maciej Kurant (EPFL / UCI) Joint work with: Athina Markopoulou (UCI),

Sampling from Large Graphs. Motivation Our purpose is to analyze and model social networks –An online social network graph is composed of millions of.

CS246 Search Engine Bias. Junghoo "John" Cho (UCLA Computer Science)2 Motivation “If you are not indexed by Google, you do not exist on the Web” --- news.com.

Geographic Gossip: Efficient Aggregations for Sensor Networks Author: Alex Dimakis, Anand Sarwate, Martin Wainwright University: UC Berkeley Venue: IPSN.

Influence and Correlation in Social Networks Aris Anagnostopoulos Ravi Kumar Mohammad Mahdian.

Measurement and Analysis of Online Social Networks By Alan Mislove, Massimiliano Marcon, Krishna P. Gummadi, Peter Druschel, Bobby Bhattacharjee Attacked.

Introduction to Boosting Aristotelis Tsirigos SCLT seminar - NYU Computer Science.

Searching in Unstructured Networks Joining Theory with P-P2P.

Minas Gjoka, UC IrvineWalking in Facebook 1 Walking in Facebook: A Case Study of Unbiased Sampling of OSNs Minas Gjoka, Maciej Kurant ‡, Carter Butts,

1 Uniform Sampling from the Web via Random Walks Ziv Bar-Yossef Alexander Berg Steve Chien Jittat Fakcharoenphol Dror Weitz University of California at.

Correctness of Gossip-Based Membership under Message Loss Maxim Gurevich, Idit Keidar Technion.

Models of Influence in Online Social Networks

Multigraph Sampling of Online Social Networks Minas Gjoka, Carter Butts, Maciej Kurant, Athina Markopoulou 1Multigraph sampling.

1 Link-Trace Sampling for Social Networks: Advances and Applications Maciej Kurant (UC Irvine) Join work with: Minas Gjoka (UC Irvine), Athina Markopoulou.

Social Networking and On-Line Communities: Classification and Research Trends Maria Ioannidou, Eugenia Raptotasiou, Ioannis Anagnostopoulos.

University of California at Santa Barbara Christo Wilson, Bryce Boe, Alessandra Sala, Krishna P. N. Puttaswamy, and Ben Zhao.

+ Offline Optimal Ads Allocation in SNS Advertising Hui Miao, Peixin Gao.

Seasonal Decomposition of Cell Phone Activity Series and Urban Dynamics Blerim Cici, Minas Gjoka, Athina Markopoulou, Carter T. Butts 1.

Popularity versus Similarity in Growing Networks Fragiskos Papadopoulos Cyprus University of Technology M. Kitsak, M. Á. Serrano, M. Boguñá, and Dmitri.

Using Transactional Information to Predict Link Strength in Online Social Networks Indika Kahanda and Jennifer Neville Purdue University.

Data Analysis in YouTube. Introduction Social network + a video sharing media – Potential environment to propagate an influence. Friendship network and.

WALKING IN FACEBOOK: A CASE STUDY OF UNBIASED SAMPLING OF OSNS junction.

Understanding Cross-site Linking in Online Social Networks Yang Chen 1, Chenfan Zhuang 2, Qiang Cao 1, Pan Hui 3 1 Duke University 2 Tsinghua University.

Poking Facebook: Characterization of OSN Applications Minas Gjoka, Michael Sirivianos, Athina Markopoulou, Xiaowei Yang University of California, Irvine.

Network Characterization via Random Walks B. Ribeiro, D. Towsley UMass-Amherst.

Bruno Ribeiro CS69000-DM1 Topics in Data Mining. Bruno Ribeiro  Reviews of next week’s papers due Friday 5pm (Sunday 11:59pm submission closes) ◦ Assignment.

Thang N. Dinh, Dung T. Nguyen, My T. Thai Dept. of Computer & Information Science & Engineering University of Florida, Gainesville, FL Hypertext-2012,

A Graph-based Friend Recommendation System Using Genetic Algorithm

Understanding Crowds’ Migration on the Web Yong Wang Komal Pal Aleksandar Kuzmanovic Northwestern University

How far removed are you? Scalable Privacy-Preserving Estimation of Social Path Length with Social PaL Marcin Nagy joint work with Thanh Bui, Emiliano De.

Major objective of this course is: Design and analysis of modern algorithms Different variants Accuracy Efficiency Comparing efficiencies Motivation thinking.

Challenges and Opportunities Posed by Power Laws in Network Analysis Bruno Ribeiro UMass Amherst MURI REVIEW MEETING Berkeley, 26 th Oct 2011.

Stratified K-means Clustering Over A Deep Web Data Source Tantan Liu, Gagan Agrawal Dept. of Computer Science & Engineering Ohio State University Aug.

Bruno Ribeiro Don Towsley University of Massachusetts Amherst IMC 2010 Melbourne, Australia.

Sampling Techniques for Large, Dynamic Graphs Daniel Stutzbach – University of Oregon Reza Rejaie – University of Oregon Nick Duffield – AT&T Labs—Research.

Most of contents are provided by the website Graph Essentials TJTSD66: Advanced Topics in Social Media.

Network Community Behavior to Infer Human Activities.

Community-enhanced De-anonymization of Online Social Networks Shirin Nilizadeh, Apu Kapadia, Yong-Yeol Ahn Indiana University Bloomington CCS 2014.

Stefanos Antaris A Socio-Aware Decentralized Topology Construction Protocol Stefanos Antaris *, Despina Stasi *, Mikael Högqvist † George Pallis *, Marios.

Minas Gjoka, Emily Smith, Carter T. Butts

Classification Ensemble Methods 1

Speaker : Yu-Hui Chen Authors : Dinuka A. Soysa, Denis Guangyin Chen, Oscar C. Au, and Amine Bermak From : 2013 IEEE Symposium on Computational Intelligence.

CS Statistical Machine learning Lecture 25 Yuan (Alan) Qi Purdue CS Nov

Kevin Stevenson AST 4762/5765. What is MCMC?  Random sampling algorithm  Estimates model parameters and their uncertainty  Only samples regions of.

1 Link Privacy in Social Networks Aleksandra Korolova, Rajeev Motwani, Shubha U. Nabar CIKM’08 Advisor: Dr. Koh, JiaLing Speaker: Li, HueiJyun Date: 2009/3/30.

1 Coarse-Grained Topology Estimation via Graph Sampling Maciej Kurant 1 Minas Gjoka 2 Yan Wang 2 Zack W. Almquist 2 Carter T. Butts 2 Athina Markopoulou.

GRAPH AND LINK MINING 1. Graphs - Basics 2 Undirected Graphs Undirected Graph: The edges are undirected pairs – they can be traversed in any direction.

Empirical analysis of Chinese airport network as a complex weighted network Methodology Section Presented by Di Li.

ISP and Egress Path Selection for Multihomed Networks

Dieudo Mulamba November 2017

Effective Social Network Quarantine with Minimal Isolation Costs

Objective of This Course

GANG: Detecting Fraudulent Users in OSNs

Presentation transcript:

1 Sampling Massive Online Graphs Challenges, Techniques, and Applications to Facebook Maciej Kurant (UC Irvine) Joint work with: Minas Gjoka (UC Irvine), Athina Markopoulou (UC Irvine), Carter T. Butts (UC Irvine), Patrick Thiran (EPFL). 14 Nov, 2011, KTH

Why study Online Social Networks (OSNs)? Engineering Search engine accuracy Better spam filters Efficient data centers New apps/Third party services Offload 3G operators … Social Media Predict the spread and importance of information Social filters … Social Sciences Great source of data for studying the structure of the society, online behavior, … Marketing Influential users Recommendations Ad placement … Large scale data mining understand user communication patterns, community structure “human sensors” Privacy …. 2

3 OSNs cover 50% of world’s Internet users > 1 b illion users October million 200 million 66 million 50 million 34 million Active users

Facebook: 800+M users 150 friends each (on average) 8 bytes (64 bits) per user ID The raw connectivity data, with no attributes: 800 x 150 x 8B = 960 GB This is neither feasible nor practical. Solution: Sampling! To get this data, one would have to download: 200 TB of HTML data! 4

Sampling 5 Node attributes Topology Graph size Evolution in time Random node selection … Objective:

Sampling 6 Node attributes Topology Graph size Evolution in time Random node selection … Objective: Nodes What:

Sampling 7 Node attributes Topology Graph size Evolution in time Random node selection … Objective: Nodes Edges What:

Sampling Node attributes Topology Graph size Evolution in time Random node selection … Objective: Nodes Edges Subgraphs What:

Sampling Node attributes Topology Graph size Evolution in time Random node selection … Objective: Nodes Edges Subgraphs What: Directly Often not possible How:

Sampling Node attributes Topology Graph size Evolution in time Random node selection … Objective: Nodes Edges Subgraphs What: Directly Often not possible Exploration How:

OSNs P2P, distributed systems WWW “Offline” social network Nodes Edges Subgraphs What: Directly Often not possible Exploration How: Sampling Node attributes Topology Graph size Evolution in time Random node selection … Objective:

Random Walks in graph sampling: WWW [Henzinger et at. 2000, Baykan et al. 2009] P2P [Gkantsidis et al. 2004, Stutzbach et al. 2006, Rasti et al. 2009] OSN [Rasti et al. 2008, Krishnamurthy et al, 2008] “Offline” social networks [Salganik et al. 2004, Volz et al. 2008] Random Walks mixing improvements: Random jumps [Henzinger et al. 2000, Avrachenkov, et al. 2010] Fastest Mixing Markov Chain [Boyd et al. 2004] Multiple dependent walks [Ribeiro et al. 2010] BFS and other traversals in graph sampling: Najork et al. 2001, Achlioptas et al. 2005, Leskovec et al. 2006, Mislove et al. 2007, Cha 2007, Ahn et al. 2007, Wilson et al. 2009, Viswanath 2009, Ye et al. 2010, Gile and Handcock 2011 Measurement/Characterization studies of OSNs: Cyworld, Orkut, Myspace, Flickr, Youtube [Mislove et al. 2007, …] Facebook [Krishnamurthy et al. ’08, Wilson et al. 2009, …] Independence sampling: Hansen-Hurwitz estimator [Hansen and Hurwitz 1943] Stratified sampling [Neyman 1934] Related work

Outline Introduction Sampling with replacements (Random Walks): MHRW vs RWRW Multigraph Sampling Stratified Weighted Random Walk (S-WRW) Sampling without replacements (Traversals): The bias of BFS (and of DFS/RDS/…) Estimation from a sample Conclusion and Future Directions

Outline Introduction Sampling with replacements (Random Walks): MHRW vs RWRW Multigraph Sampling Stratified Weighted Random Walk (S-WRW) Sampling without replacements (Traversals): The bias of BFS (and of DFS/RDS/…) Estimation from a sample Conclusion and Future Directions

q k - observed node degree distribution p k - real node degree distribution Random Walk in Facebook 15 degree of node v Pr( sampling v) ~ k v

16 Metropolis-Hastings Random Walk (MHRW): DAAC… … C C D D M M J J N N A A B B I I E E K K F F L L H H G G How to get an unbiased sample? S = asymptotically uniform

17 Metropolis-Hastings Random Walk (MHRW): DAAC… … C C D D M M J J N N A A B B I I E E K K F F L L H H G G 17 Re-Weighted Random Walk (RWRW): Collect a classic (biased) RW sample… Now apply the Hansen-Hurwitz estimator: How to get an unbiased sample? S = asymptotically uniform

18 Metropolis-Hastings Random Walk (MHRW):Re-Weighted Random Walk (RWRW): Facebook results Also corrects for the bias of all other metrics: Not corrected:

19 MHRW or RWRW ? ~3.0 19

20 RWRW is better than MHRW RWRW requires 1.5 to 7 times fewer samples to achieve the same Intuition? However: Pathological counter-examples exist. MHRW is easier to use (it does not require reweighting) MHRW or RWRW ? [1] Minas Gjoka, Maciej Kurant, Carter T. Butts and Athina Markopoulou, “Walking in Facebook: A Case Study of Unbiased Sampling of OSNs”, INFOCOM 2010.

Online Convergence Diagnostics Acceptable convergence between 500 and 3000 iterations (depending on property of interest ) Inferences assume that samples are drawn from stationary distribution No ground truth available in practice MCMC literature, online diagnostics [1] Minas Gjoka, Maciej Kurant, Carter T. Butts and Athina Markopoulou, “Walking in Facebook: A Case Study of Unbiased Sampling of OSNs”, INFOCOM 2010.

Outline Introduction Sampling with replacements (Random Walks): MHRW vs RWRW Multigraph Sampling Stratified Weighted Random Walk (S-WRW) Sampling without replacements (Traversals): The bias of BFS (and of DFS/RDS/…) Estimation from a sample Conclusion and Future Directions

C C D D M M J J N N A A B B I I E E K K F F L L H H G G Friends C C D D M M J J N N A A B B I I E E K K F F L L H H G G Events C C D D M M J J N N A A B B I I E E K K F F L L H H G G Groups E.g., in LastFM

C C D D M M J J N N A A B B I I E E K K F F L L H H G G Friends C C D D M M J J N N A A B B I I E E K K F F L L H H G G Events C C D D M M J J N N A A B B I I E E K K F F L L H H G G Groups E.g., in LastFM

J J C C D D M M N N A A B B I I E E G * = Friends + Events + Groups ( G * is a multigraph ) F F L L H H G G K K 25 Multigraph sampling [2] Minas Gjoka, Carter T. Butts, Maciej Kurant, Athina Markopoulou, “Multigraph Sampling of Online Social Networks”, JSAC Efficient implementation (saves bandwidth): 1) Select relation graph G i with probability deg(H,G i ) / deg(H, G * ) 2) Within G i choose an edge uniformly at random, i.e., with probability 1/deg(H, G i ). Applied to LastFM: - better coverage of previously isolated nodes - better estimates of distributions and means

Outline Introduction Sampling with replacements (Random Walks): MHRW vs RWRW Multigraph Sampling Stratified Weighted Random Walk (S-WRW) Sampling without replacements (Traversals): The bias of BFS (and of DFS/RDS/…) Estimation from a sample Conclusion and Future Directions

Not all nodes are equal irrelevant important (equally) important Node categories: e.g. China e.g., Sweden Stratification under Weighted Independence Sampler (WIS) (node size is proportional to its sampling probability) 27

Not all nodes are equal But graph exploration techniques have to follow the links! Trade-off between ideal (WIS) sampling weights fast convergence Enforcing WIS weights may lead to slow (or no) convergence 28 Assumption: On sampling a node, we learn the categories of its neighbors. irrelevant important (equally) important Node categories: Stratification under Weighted Independence Sampler (WIS) (node size is proportional to its sampling probability) Fastest Mixing Markov Chain [Boyd et al. 2004]

Measurement objective E.g., compare the size of red and green categories. 29

Measurement objective Category weights optimal under WIS Stratified sampling theory + Information collected by pilot RW E.g., compare the size of red and green categories. 30

Problem 2: “Black holes” Measurement objective Category weights optimal under WIS Modified category weights Problem 1: Poor or no connectivity Solution: Small weight>0 for irrelevant categories. f* -the fraction of time we plan to spend in irrelevant nodes (e.g., 1%) Solution: Limit the weight of tiny relevant categories. Γ - maximal factor by which we can increase edge weights (e.g., 100 times) E.g., compare the size of red and green categories.

Measurement objective Category weights optimal under WIS Modified category weights Limit the weight of tiny categories (to avoid “black holes”) Allocate small weight to irrelevant node categories Controlled by two intuitive and robust parameters E.g., compare the size of red and green categories. 32

Measurement objective Category weights optimal under WIS Modified category weights Edge weights in G E.g., compare the size of red and green categories. 20 = vol(green), from pilot RW Target edge weights: 22 = 4 =

Measurement objective Category weights optimal under WIS Modified category weights Edge weights in G Resolve conflicts: arithmetic mean, geometric mean, max, … E.g., compare the size of red and green categories. 20 = vol(green), from pilot RW Target edge weights: 22 = 4 =

Measurement objective Category weights optimal under WIS Modified category weights Edge weights in G WRW sample E.g., compare the size of red and green categories.

Measurement objective Category weights optimal under WIS Modified category weights Edge weights in G WRW sample Final result Hansen-Hurwitz estimator E.g., compare the size of red and green categories.

Stratified Weighted Random Walk (S-WRW) Measurement objective Category weights optimal under WIS Modified category weights Edge weights in G WRW sample Final result E.g., compare the size of red and green categories.

Colleges in Facebook versions of S-WRW Random Walk (RW) Samples in colleges: 86% of S-WRW, 9% of RW. This is because S-WRW avoids irrelevant categories. The difference is larger (100x) for small colleges. This is due to S-WRW’s stratification. RW discovered 5’325 colleges. S-WRW: 8’815 (not shown) [3] Maciej Kurant, Minas Gjoka, Carter T. Butts and Athina Markopoulou, “Walking on a Graph with a Magnifying Glass”, SIGMETRICS RW required times more samples than S-WRW to achieve the same accuracy.

Sampling with replacements: Summary RWRW is times more efficient than MHRW counter-examples exists Multigraph Sampling walking on multiple relations improves efficiency Stratified Weighted Random Walk oversamples relevant regions, undersamples irrelevant regions fold gains in sampling costs Online Convergence Diagnostics 39 [1] Minas Gjoka, Maciej Kurant, Carter T. Butts and Athina Markopoulou, “Walking in Facebook: A Case Study of Unbiased Sampling of OSNs”, INFOCOM [2] Minas Gjoka, Carter T. Butts, Maciej Kurant, Athina Markopoulou, “Multigraph Sampling of Online Social Networks”, JSAC [3] Maciej Kurant, Minas Gjoka, Carter T. Butts and Athina Markopoulou, “Walking on a Graph with a Magnifying Glass”, SIGMETRICS 2011.

Outline Introduction Sampling with replacements (Random Walks): MHRW vs RWRW Multigraph Sampling Stratified Weighted Random Walk (S-WRW) Sampling without replacements (Traversals): The bias of BFS (and of DFS/RDS/…) Estimation from a sample Conclusion and Future Directions

41 Sampling without replacements (Traversals)

42 Sampling without replacements (Traversals)

43 Sampling without replacements (Traversals)

44 Sampling without replacements (Traversals) Examples: BFS (Breadth-First Search) DFS (Depth-First Search) Forest Fire RDS (Respondent-Driven Sampling) Snowball sampling … Why sample with BFS? BFS is a well known textbook technique BFS sample is a nice looking graph It is used in practice [Ahn et al. 2007, Mislove et al. 2007, Wilson et al. 2009]

45 BFS in Facebook pkpk qkqk BFS (Breadth First Search) with f=0.5% of nodes sampled (338 for RW) This bias has been empirically observed in the past [Najork et al. 2001]. Our goals: Formally analyze the bias of BFS (challenging due to dependencies) Correct for this bias. (no new sampling method proposed)

46 - real average node degree - real average squared node degree. Goal: Analyze the bias of BFS Graph traversals on RG(p k ): ? BFS q k ( f ) = ? true average node degree

47 Graph model RG(p k ) Random graph RG(p k ) with a given node degree distribution p k (sequence) Can be generated by configuration model Example: ‘stubs’

48 Approach 1: Brute force Remedy: “The Principle of Deferred Decisions” So we can generate the graph ‘on the fly’, while exploring it! Generate all possible graphs, and...No way!

49 Approach 2: The Principle of Deferred Decisions This does not scale! (because of dependencies between stubs) v u ? * we assumed that the generated graph is connected

1 2 1 time t 01 Originally proposed in: J. H. Kim, “Poisson cloning model for random graphs,” International Congress of Mathematicians (ICM), 2006 (preprint in 2004). Developped in: D. Achlioptas, A. Clauset, D. Kempe, and C. Moore, “On the bias of traceroute sampling: or, power-law degree distributions in regular graphs,” in STOC, (both in a different context) Approach 2b: Breaking the stub dependencies V2V v4v4 v3v v1v1

time t 01 Originally proposed in: J. H. Kim, “Poisson cloning model for random graphs,” International Congress of Mathematicians (ICM), 2006 (preprint in 2004). Developped in: D. Achlioptas, A. Clauset, D. Kempe, and C. Moore, “On the bias of traceroute sampling: or, power-law degree distributions in regular graphs,” in STOC, (both in a different context) Approach 2b: Breaking the stub dependencies V2V v4v v1v1 v3v3

f number of nodes of degree k

f Approach 2b: Breaking the stub dependencies number of nodes of degree k

54 Graph traversals on RG(p k ): MHRW, RWRW Main results true average node degree [4] Maciej Kurant, Athina Markopoulou, Patrick Thiran, “On the Bias of BFS”, JSAC Python code available at:

55 Graph traversals on RG(p k ): MHRW, RWRW Main results RDS true average node degree [4] Maciej Kurant, Athina Markopoulou, Patrick Thiran, “On the Bias of BFS”, JSAC Python code available at:

Main results 56 Graph traversals on RG(p k ): For small sample size (for f→0), BFS has the same bias as RW. This bias monotonically decreases with f. We found analytically the shape of this curve. MHRW, RWRW For large sample size (for f→1), BFS becomes unbiased. RDS 56 true average node degree Under RG(pk), all traversals are subject to exactly the same bias. [4] Maciej Kurant, Athina Markopoulou, Patrick Thiran, “On the Bias of BFS”, JSAC Python code available at:

57 What if the graph is not random? [4] Maciej Kurant, Athina Markopoulou, Patrick Thiran, “On the Bias of BFS”, JSAC Python code available at: expected, sampled true, corrected

Sampling without replacements: Summary 58 [4] Maciej Kurant, Athina Markopoulou, Patrick Thiran, “On the Bias of BFS”, JSAC Python code available at: Graph traversals on RG(p k ): MHRW, RWRW A difficult problem Dependencies between samples We computed analytically the bias of BFS in RG(p k ) Initial bias as of RW Same bias for all traversals (BFS, DFS, RDS,…) under RG(p k ) A bias correction procedure Works well for real-life graphs If possible, prefer methods with replacements.

Outline Introduction Sampling with replacements (Random Walks): MHRW vs RWRW Multigraph Sampling Stratified Weighted Random Walk (S-WRW) Sampling without replacements (Traversals): The bias of BFS (and of DFS/RDS/…) Estimation from a sample Conclusion and Future Directions

1) Local properties Node properties: Community membership information Privacy settings Names … Local topology properties: Node degree distribution Assortativity Clustering coefficient … 60

61 Example: Privacy Awareness in Facebook’09 1) Local properties Privacy Awareness - fraction of users that change the default privacy settings. PA =

2) Estimating the graph size Counts repeated nodes – “Reversed Birthday Paradox” Work in progress 62

Probability that a random node in A is a neighbor of a random node in B 63 From a randomly sampled set of nodes we infer a valid topology! 3) Coarse-grained topology A B [5] M. Kurant, M. Gjoka, Y. Wang, Z. W. Almquist, C. T. Butts, A. Markopoulou, “Coarse-Grained Topology Estimation”, arXiv: (estimator)

geosocialmap.com 64 [5] M. Kurant, M. Gjoka, Y. Wang, Z. W. Almquist, C. T. Butts, A. Markopoulou, “Coarse-Grained Topology Estimation”, arXiv:

Public and private colleges in the USA geosocialmap.com 65

geosocialmap.com The world according to Facebook 66

67 Egypt Saudi Arabia United Arab Emirates Lebanon Jordan Israel Strong clusters among middle-eastern countries

Summary 68

C C D D M M J J N N A A B B I I E E K K F F L L H H G G C C D D M M J J N N A A B B I I E E K K F F L L H H G G C C D D M M J J N N A A B B I I E E K K F F L L H H G G J J C C D D M M N N A A B B I I E E F F L L G G K K H H Multigraph sampling [2]Stratified WRW [3] Random Walks (with replacements) RWRW > MHRW [1] Convergence Diagnostics References [1] M. Gjoka, M. Kurant, C. T. Butts and A. Markopoulou, “Walking in Facebook: A Case Study of Unbiased Sampling of OSNs”, INFOCOM [2] M. Gjoka, C. T. Butts, M. Kurant and A. Markopoulou, “Multigraph Sampling of Online Social Networks”, JSAC 2011 [3] M. Kurant, M. Gjoka, C. T. Butts and A. Markopoulou, “Walking on a Graph with a Magnifying Glass”, SIGMETRICS [4] M. Kurant, A. Markopoulou and P. Thiran, “On the bias of BFS (Breadth First Search)”, JSAC, [5] M. Kurant, M. Gjoka, Y. Wang, Z. W. Almquist, C. T. Butts, A. Markopoulou, “Coarse-Grained Topology Estimation”, arXiv: [6] Datasets available from :

Stratified WRW [3] Graph traversals on RG(p k ): MHRW, RWRW [4] Traversals (no replacements) 70 J J C C D D M M N N A A B B I I E E F F L L G G K K H H Multigraph sampling [2] References [1] M. Gjoka, M. Kurant, C. T. Butts and A. Markopoulou, “Walking in Facebook: A Case Study of Unbiased Sampling of OSNs”, INFOCOM [2] M. Gjoka, C. T. Butts, M. Kurant and A. Markopoulou, “Multigraph Sampling of Online Social Networks”, JSAC 2011 [3] M. Kurant, M. Gjoka, C. T. Butts and A. Markopoulou, “Walking on a Graph with a Magnifying Glass”, SIGMETRICS [4] M. Kurant, A. Markopoulou and P. Thiran, “On the bias of BFS (Breadth First Search)”, JSAC, [5] M. Kurant, M. Gjoka, Y. Wang, Z. W. Almquist, C. T. Butts, A. Markopoulou, “Coarse-Grained Topology Estimation”, arXiv: [6] Datasets available from : Random Walks (with replacements) RWRW > MHRW [1] Convergence Diagnostics

Stratified WRW [3] Graph traversals on RG(p k ): MHRW, RWRW A B [4] Coarse-grained topologies [5] Traversals (no replacements) References [1] M. Gjoka, M. Kurant, C. T. Butts and A. Markopoulou, “Walking in Facebook: A Case Study of Unbiased Sampling of OSNs”, INFOCOM [2] M. Gjoka, C. T. Butts, M. Kurant and A. Markopoulou, “Multigraph Sampling of Online Social Networks”, JSAC 2011 [3] M. Kurant, M. Gjoka, C. T. Butts and A. Markopoulou, “Walking on a Graph with a Magnifying Glass”, SIGMETRICS [4] M. Kurant, A. Markopoulou and P. Thiran, “On the bias of BFS (Breadth First Search)”, JSAC, [5] M. Kurant, M. Gjoka, Y. Wang, Z. W. Almquist, C. T. Butts, A. Markopoulou, “Coarse-Grained Topology Estimation”, arXiv: [6] Datasets available from : J J C C D D M M N N A A B B I I E E F F L L G G K K H H Multigraph sampling [2] Thank you mkurant.com Random Walks (with replacements) RWRW > MHRW [1] Convergence Diagnostics