2.5K-Graphs: from Sampling to Generation
Minas Gjoka, Maciej Kurant ‡, Athina Markopoulou
UC Irvine, ETHZ ‡


Motivation
Generate synthetic topologies that resemble real ones
– development and testing of algorithms, protocols
– anonymization
Obtaining the full topology is not always possible
– data owner not willing to release it
– exhaustive measurement impractical due to size

Our methodology
[Pipeline figure: Real Graph (unknown) → crawl using UIS, WIS, RW → Node Samples → estimate → Estimated Graph Properties → generate → Synthetic Graph]

Which graph properties?
dk-series framework [Mahadevan et al., Sigcomm ’06]
– 0K specifies the average node degree
– 1K specifies the node degree distribution
– 2K specifies the joint node degree distribution (JDD)
– 3K specifies the distribution of subgraphs of 3 nodes
– ...
– nK specifies the entire graph
Accuracy vs. practicality: 2.5K
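As a minimal sketch of the two lowest levels of the dk-series, the snippet below computes 0K (average degree) and 1K (degree distribution) for a small graph. The adjacency-dict representation, the function name `degree_summaries`, and the toy graph are illustrative assumptions, not part of the slides.

```python
from collections import Counter

def degree_summaries(adj):
    """Return (0K, 1K): the average degree and a {degree: node count} map.

    `adj` maps each node to the set of its neighbors (undirected, simple graph).
    """
    degrees = [len(nbrs) for nbrs in adj.values()]
    avg_degree = sum(degrees) / len(degrees)
    degree_dist = dict(Counter(degrees))
    return avg_degree, degree_dist

# Hypothetical toy graph: a triangle 0-1-2 with a pendant node 3 attached to 2.
toy = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
avg_k, dist = degree_summaries(toy)
```

Each higher level of the series constrains the graph further, which is why 0K and 1K can be read off the degree list alone while 2K and above need edge-level information.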

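2.5K-Graphs
Joint Node Degree Distribution (2K)
Degree-dependent Average Clustering Coefficient Distribution:
$c_u = \frac{\text{all triangles using node } u}{\text{all possible triangles}}$, $c(k) = \frac{1}{|V_k|} \sum_{u \in V_k} c_u$, where $V_k$ is the set of all nodes of degree $k$
[Figure: example graph with nodes 1a, 1b, 2a, 3a, 3b, 4a, 4b, where $c_{3a} = 2/3$]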
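The two properties that define a 2.5K-graph can be computed directly when the full graph is known. The sketch below is one way to do it, assuming the usual reading of the JDD as the number of edges between degree-k and degree-l nodes, and assuming that nodes of degree below 2 get clustering 0. The function names and the toy graph are hypothetical.

```python
from collections import defaultdict
from itertools import combinations

def jdd(adj):
    """JDD[(k, l)] with k <= l: number of edges joining a degree-k and a degree-l node."""
    counts = defaultdict(int)
    for u, nbrs in adj.items():
        for v in nbrs:
            if u < v:  # count each undirected edge once (node labels must be comparable)
                k, l = sorted((len(adj[u]), len(adj[v])))
                counts[(k, l)] += 1
    return dict(counts)

def local_clustering(adj, u):
    """Triangles using u divided by all possible triangles at u."""
    k = len(adj[u])
    if k < 2:
        return 0.0  # assumption: no triangle is possible, so use 0
    triangles = sum(1 for a, b in combinations(adj[u], 2) if b in adj[a])
    return triangles / (k * (k - 1) / 2)

def clustering_by_degree(adj):
    """c(k): average local clustering over all nodes of degree k."""
    total, count = defaultdict(float), defaultdict(int)
    for u in adj:
        k = len(adj[u])
        total[k] += local_clustering(adj, u)
        count[k] += 1
    return {k: total[k] / count[k] for k in total}

# Hypothetical toy graph: a 4-cycle 0-1-3-2-0 plus the chord 1-2.
toy = {0: {1, 2}, 1: {0, 2, 3}, 2: {0, 1, 3}, 3: {1, 2}}
toy_jdd = jdd(toy)
toy_ck = clustering_by_degree(toy)
```

In this toy graph each degree-3 node takes part in 2 of its 3 possible triangles, the same ratio as the slide's example value for node 3a, although the graph itself is not the one drawn on the slide.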

Related Work
Network Modeling
– Exponential Random Graph Models [Handcock et al., ’10]
– Kronecker graphs [Kim & Leskovec, ’11]
– dk-Series [Mahadevan et al., Sigcomm ’06]
Graph Generation
– Simple 1K graphs [Molloy & Reed ’95]
– 1K + [Bansal et al. ’09]
– 1K + Triangle Sequence [Newman ’09]
– 1K + [Serrano & Boguna, ’05]
– 2K [Krioukov et al. ’06, Stanton & Pinar ’11]
– 3K [Krioukov et al. ’06]

[Pipeline figure: Real Graph (unknown) → crawl using UIS, WIS, RW → Node Samples → estimate → Estimated c(k) & JDD → generate → Synthetic Graph]

Estimation: Node Samples
– UIS: Uniform Independence Sample
– WIS: Weighted Independence Sample
  sampling probability w(v) proportional to node degree
– RW: Random Walk Sample
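The three sampling designs can be sketched on an in-memory graph as follows. This is only an illustration: in a real crawl the graph is unknown and nodes are obtained through the network's interface. The function names, the use of sampling with replacement, and the toy graph are assumptions.

```python
import random

def uis(adj, n, rng):
    """Uniform Independence Sample: n nodes drawn uniformly, with replacement."""
    nodes = list(adj)
    return [rng.choice(nodes) for _ in range(n)]

def wis(adj, n, rng, weight=None):
    """Weighted Independence Sample: node v is drawn with probability proportional to w(v).

    By default w(v) is the degree of v.
    """
    nodes = list(adj)
    weights = [len(adj[v]) if weight is None else weight(v) for v in nodes]
    return rng.choices(nodes, weights=weights, k=n)

def random_walk(adj, n, rng, start=None):
    """Simple random walk: each step moves to a uniformly chosen neighbor.

    Assumes every node has at least one neighbor.
    """
    current = rng.choice(list(adj)) if start is None else start
    walk = [current]
    while len(walk) < n:
        current = rng.choice(sorted(adj[current]))  # sorted only for reproducibility
        walk.append(current)
    return walk

# Hypothetical toy graph and a seeded generator for repeatable runs.
toy = {0: {1, 2}, 1: {0, 2, 3}, 2: {0, 1, 3}, 3: {1, 2}}
rng = random.Random(0)
u_sample = uis(toy, 5, rng)
w_sample = wis(toy, 5, rng)
walk = random_walk(toy, 5, rng, start=0)
```

On a connected, non-bipartite undirected graph a simple random walk visits nodes with long-run frequency proportional to their degree, which is why a walk can be treated like a degree-weighted sample once it has mixed.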

Estimation: c(k)
Clustering of node a: (all triangles using node a) / (all possible triangles)
[Estimator formulas for UIS and WIS. Quantities used: all nodes of degree k; neighbors of node a; # shared partners for nodes (a,b); all sampled nodes of degree k; number of samples for node b; sampling weight of node a (when RW)]
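The estimator formulas themselves are not legible in this transcript, so the following is not the authors' estimator. It is a generic reweighting sketch, assuming that the neighbors of each sampled node, and the edges among those neighbors, can be observed: each sampled node contributes its local clustering divided by its sampling weight, normalized per degree class. All names are hypothetical.

```python
from collections import defaultdict
from itertools import combinations

def local_clustering(adj, u):
    """Triangles using u divided by all possible triangles at u (0 if degree < 2)."""
    k = len(adj[u])
    if k < 2:
        return 0.0
    triangles = sum(1 for a, b in combinations(adj[u], 2) if b in adj[a])
    return triangles / (k * (k - 1) / 2)

def estimate_ck(adj, sample, weight=None):
    """Estimate c(k) from a node sample using a weighted ratio estimator.

    weight(v) is the sampling weight of v; None means uniform sampling (UIS).
    Repeated nodes in `sample` are counted once per occurrence.
    """
    num, den = defaultdict(float), defaultdict(float)
    for v in sample:
        w = 1.0 if weight is None else weight(v)
        k = len(adj[v])
        num[k] += local_clustering(adj, v) / w
        den[k] += 1.0 / w
    return {k: num[k] / den[k] for k in num}

# Hypothetical toy graph and sample (node 1 was drawn twice).
toy = {0: {1, 2}, 1: {0, 2, 3}, 2: {0, 1, 3}, 3: {1, 2}}
sample = [0, 1, 1, 3]
est_uniform = estimate_ck(toy, sample)
est_weighted = estimate_ck(toy, sample, weight=lambda v: len(toy[v]))
```

When the weight depends only on degree, all nodes within one degree class share the same weight, so the weights cancel and the two estimates coincide.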

Estimation: JDD
[Estimator formulas for UIS and WIS. Quantities used: all nodes of degree k; all edges; all sampled nodes of degree k; sampling weight of node a (when RW)]

Estimation: Random Walks
[Figure panels: RW; Traversed Edges; Induced Edges with safety margin M=0 (= WIS); Induced Edges with safety margin M=1]
Hybrid estimator combines Traversed Edges + Induced Edges with margin M
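The Traversed Edges idea can be illustrated with a short sketch. In its stationary regime, a simple random walk on a connected undirected graph crosses every edge equally often, so the share of walk steps with degree pair (k, l) estimates JDD(k, l) divided by the number of edges. The code below covers only this normalized estimate; the induced-edges and hybrid estimators from the slide are not reproduced, and the function name is an assumption.

```python
from collections import Counter

def jdd_from_traversed_edges(adj, walk):
    """Fraction of traversed edges per degree pair (k, l) with k <= l.

    `walk` is the sequence of visited nodes; consecutive nodes form a traversed edge.
    Degrees are read from `adj`, assumed observable for every visited node.
    """
    counts = Counter()
    for u, v in zip(walk, walk[1:]):
        k, l = sorted((len(adj[u]), len(adj[v])))
        counts[(k, l)] += 1
    total = sum(counts.values())
    return {pair: c / total for pair, c in counts.items()}

# Hypothetical toy graph and a short hand-written walk on it.
toy = {0: {1, 2}, 1: {0, 2, 3}, 2: {0, 1, 3}, 3: {1, 2}}
walk = [0, 1, 2, 3, 1]
share = jdd_from_traversed_edges(toy, walk)
```

Turning these shares into absolute edge counts would additionally need an estimate of the total number of edges, which this sketch leaves out.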


[Pipeline figure: Real Graph (unknown) → crawl using UIS, WIS, RW → Node Samples → estimate → Target JDD, Target c(k) → generate → 2.5K Graph. The generate stage has two steps: 2K Construction (simple graph, exact JDD, many triangles), then Smart Double Edge Swaps]

2K Construction (1)
[Figure: target 2K table indexed by degree k, with the number of nodes of each degree]

2K Construction (2)
[Figure: 1K derived from 2K; nodes v with degrees k_v: 1a, 1b, 2a, 3a, 3b, 4a, 4b]

2K Construction (1)
[Figure: nodes 1a, 1b, 2a, 3a, 3b, 4a, 4b, each node v shown with its degree k_v and a value r_v]

2K Construction (2)
[Figure: the same nodes with candidate node pairs (3b,4b), (1a,4a), (1a,2a), (1b,2a), (1b,4b), (2a,4b), (2a,3b), (1b,3a), (2a,4a), (1a,3b), (1b,3b), (4a,4b), (3b,4a), (1a,4b), (2a,3a), (3a,3b), (3a,4b), (3a,4a), (1a,1b), (1b,4a), (1a,3a)]

2K Construction
[Figure: nodes 1a, 1b, 2a, 3a, 3b, 4a, 4b; example with k=3, k=4 and degree pair (4,3)]
Construction stuck.
– Iterate over all unsaturated degree pairs
– Iterate over all unsaturated node pairs for the degree pair (4,3)
  Move 1: Add edge (4b, 3a)
  Move 2: Add edge (4b, 3*) or (3a, 4*)
  Move 3: Add edge (4*, 3*)
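Before any edges are placed, a 2K construction needs to know how many nodes of each degree to create. That count follows from the target JDD alone: every edge between a degree-k and a degree-l node uses one stub at each end, and a degree-k node owns exactly k stubs. A minimal sketch, with a hypothetical function name and target JDD:

```python
from collections import defaultdict

def node_counts_from_jdd(jdd):
    """Derive {degree: number of nodes} (1K) from a JDD {(k, l): edge count} (2K).

    Each pair (k, l) should appear once, e.g. with k <= l.
    """
    stubs = defaultdict(int)
    for (k, l), edges in jdd.items():
        stubs[k] += edges
        stubs[l] += edges  # for k == l this correctly adds two stubs per edge
    counts = {}
    for k, s in stubs.items():
        if s % k != 0:
            raise ValueError(f"JDD is not realizable: {s} stubs for degree {k}")
        counts[k] = s // k
    return counts

# Hypothetical target: 4 edges between degree-2 and degree-3 nodes, 1 edge between degree-3 nodes.
target_jdd = {(2, 3): 4, (3, 3): 1}
counts = node_counts_from_jdd(target_jdd)
```

The divisibility check is a necessary condition only; a JDD that passes it may still fail to be realizable as a simple graph.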


“Smart” double edge swaps
2K-preserving, c(k)-targeting
Double edge swap [Mahadevan et al., Sigcomm ’06]: edges with degree pairs (k1,k2) and (k1,k3) are rewired, leaving JDD(k1,k2) and JDD(k1,k3) unchanged
[Figure: c(k) vs. degree k, showing the Target curve and the Current curve at the end of 2K; swaps either destroy triangles or create triangles]
How we select edges for double edge swaps
– Triangle creation » select edges with low number of shared partners
– Triangle destruction » select random edges
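A JDD-preserving double edge swap can be sketched as follows. Edges (a,b) and (c,d) are rewired to (a,d) and (c,b) only when a and c have the same degree, so the multiset of degree pairs over all edges stays the same. The "smart" edge selection from the slide (preferring edges with few shared partners) and the comparison against the target c(k) are not implemented here; names and the toy graph are assumptions.

```python
from itertools import combinations

def jdd(adj):
    """JDD as {(k, l): edge count} with k <= l."""
    counts = {}
    for u in adj:
        for v in adj[u]:
            if u < v:  # count each undirected edge once
                pair = tuple(sorted((len(adj[u]), len(adj[v]))))
                counts[pair] = counts.get(pair, 0) + 1
    return counts

def count_triangles(adj):
    """Total number of triangles (each one is seen from its three corners)."""
    corners = sum(1 for u in adj for a, b in combinations(adj[u], 2) if b in adj[a])
    return corners // 3

def jdd_preserving_swap(adj, a, b, c, d):
    """Rewire (a,b),(c,d) to (a,d),(c,b) in place; return True if the swap was applied."""
    if b not in adj[a] or d not in adj[c]:
        return False  # both edges must exist
    if len(adj[a]) != len(adj[c]):
        return False  # equal degrees at a and c keep the JDD unchanged
    if len({a, b, c, d}) != 4 or d in adj[a] or b in adj[c]:
        return False  # keep the graph simple: no self-loops, no multi-edges
    adj[a].remove(b); adj[b].remove(a)
    adj[c].remove(d); adj[d].remove(c)
    adj[a].add(d); adj[d].add(a)
    adj[c].add(b); adj[b].add(c)
    return True

# Hypothetical toy graph: a 6-cycle, which has no triangles.
ring = {i: {(i - 1) % 6, (i + 1) % 6} for i in range(6)}
jdd_before, tri_before = jdd(ring), count_triangles(ring)
applied = jdd_preserving_swap(ring, 0, 1, 3, 4)
jdd_after, tri_after = jdd(ring), count_triangles(ring)
```

In this example one swap turns the 6-cycle into two triangles while every node keeps degree 2, so clustering changes and the JDD does not. A c(k)-targeting generator would accept or reject each such swap depending on whether it moves the current c(k) toward the target.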

[Pipeline figure: Real Graph (unknown) → crawl using UIS, WIS, RW → Node Samples → estimate → Estimated c(k) & JDD → generate → 2.5K Graph → Evaluation]

Evaluation
– Number of Nodes
– Number of Edges
– Average Degree
– Average Clustering Coefficient
– Total Number of Triangles

Evaluation: Efficiency of Estimators (RW)
[Figure: error metric for each estimator on Facebook New Orleans]
Hybrid combines the advantages of Induced Edges (IE) and Traversed Edges (TE)

Evaluation: Speed of 2.5K Graph Generation
[Figure: our 2.5K vs. prior 2.5K]
Higher gain for graphs with more triangles

Evaluation: From Sampling to Generation
[Figure: Facebook New Orleans]

Conclusion
Complete and practical methodology to generate synthetic graphs from node samples
– novel estimators that measure c(k) and JDD
– novel and fast 2.5K generator
Python implementation –
[Pipeline figure: Real Graph (unknown) → crawl using UIS, WIS, RW → Node Samples → estimate → c(k) & JDD → generate → Synthetic Graph]