Construction of Simple Graphs with a Target Joint Degree Matrix and Beyond Minas Gjoka, Balint Tillman, Athina Markopoulou University of California, Irvine.

Presentation transcript:

Construction of Simple Graphs with a Target Joint Degree Matrix and Beyond Minas Gjoka, Balint Tillman, Athina Markopoulou University of California, Irvine

Graphs: social networks, protein interactions, World Wide Web, Autonomous Systems, DNS

Motivation (Social Networks)
• Measurements/sampling of OSNs: [INFOCOM 2010], [SIGMETRICS 2011], 3x [JSAC 2011], [WOSN 2012], …
  - ~3500 researchers have requested our Facebook datasets
• Generate synthetic graphs that resemble real social networks
  - to use in simulations, for anonymization
• Q1: resemble in terms of what?
• Q2: generate how?

dK-Series
• dK-series framework [Mahadevan et al., SIGCOMM '06]: "A set of graph properties that describe and constrain random graphs, using degree correlations, in successively finer detail"

dK-Series
• dK-series framework [Mahadevan et al., SIGCOMM '06]
  - 0K specifies the average node degree

dK-Series
• dK-series framework [Mahadevan et al., SIGCOMM '06]
  - 0K specifies the average node degree
  - 1K specifies the node degree sequence
[Figure: example graph with its 1K table, degree k versus count D(k)]

dK-Series
• dK-series framework [Mahadevan et al., SIGCOMM '06]
  - 0K specifies the average node degree
  - 1K specifies the node degree sequence
  - 2K specifies the joint node degree matrix (JDM)
[Figure: example graph with its 2K table, indexed by degree pairs (k,l)]

dK-Series
• dK-series framework [Mahadevan et al., SIGCOMM '06]
  - 0K specifies the average node degree
  - 1K specifies the node degree sequence
  - 2K specifies the joint node degree matrix (JDM)
  - 3K specifies the number of induced subgraphs of 3 nodes (nodes are labeled by their degree k)
[Figure: example graph with its 3K tables, counting #Wedges and #Triangles per degree triple (k,l,m)]

dK-Series
• dK-series framework [Mahadevan et al., SIGCOMM '06]
  - 0K specifies the average node degree
  - 1K specifies the node degree sequence
  - 2K specifies the joint node degree matrix (JDM)
  - 3K specifies the number of induced subgraphs of 3 nodes
  - …
  - nK specifies the entire graph
• Nice properties
  - Inclusion
  - Convergence
• Tradeoff: accuracy vs. complexity (OSNs: "2K+")
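To make the first three levels of the series concrete, here is a minimal sketch that computes 0K, 1K and 2K summaries from an edge list. The function name `dk_summaries` and the edge-list input format are assumptions made for illustration; they do not come from the slides. Isolated nodes are ignored because they never appear in an edge list.

```python
from collections import Counter


def dk_summaries(edges):
    """Return (0K, 1K, 2K) summaries of a simple undirected graph given as an edge list."""
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1

    # 0K: average node degree
    avg_degree = 2 * len(edges) / len(degree)

    # 1K: D(k) = number of nodes with degree k
    degree_dist = dict(Counter(degree.values()))

    # 2K: JDM(k, l) = number of edges joining a degree-k node and a degree-l node
    jdm = Counter()
    for u, v in edges:
        k, l = sorted((degree[u], degree[v]))
        jdm[(k, l)] += 1
    return avg_degree, degree_dist, dict(jdm)


# Hypothetical example: a triangle a-b-c with a pendant node d attached to c.
avg, dist, jdm = dk_summaries([("a", "b"), ("b", "c"), ("c", "a"), ("c", "d")])
print(avg, dist, jdm)
```

Each level refines the previous one: the 2K table determines the 1K counts, which in turn determine the 0K average.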

Related Work
• Graph construction approaches
  - Stochastic: reproduces the dK-distribution in expectation.
  - Configuration ("pseudograph"): reproduces the dK-distribution exactly. Deterministic algorithms up to d=2; MCMC for d>=2.
• 1K construction
  - Configuration: 1K multigraphs [Molloy '95]
  - 1K+ [Bansal '09, Newman '09, Serrano & Boguna '05, …]
• 2K construction
  - Configuration model for 2K multigraphs [Mahadevan '06]
  - Balanced Degree Invariant: simple graphs [Amanatidis '08], [Stanton '12]
• 2K+ construction
  - 2K preserving, 3K targeting using edge rewiring [Mahadevan '06]
  - 2.5K heuristic: JDM + degree-dependent clustering coefficient [Gjoka '13]

2K Construction: Configuration Model
[Figure: nodes 2a, 2b, 3a, 3b, 4a with free stubs, next to the current JDM and the target JDM, both indexed by (k,l)]

2K Construction: Configuration Model
[Figure: nodes 2a, 2b, 3a, 3b, 4a with used and free stubs, next to the current JDM and the target JDM, both indexed by (k,l)]
Edges added: (2a,3a), (2b,4a), (2b,3a), (3b,4a), (2a,2b), (3a,3b)
Construction stuck! 2/8 (25%) of the edges cannot be added.

2K Construction: Balanced Degree Invariant
[Figure: three panels with nodes 3a, 3b (k=3) and 4a, 4b (l=4), showing used and free stubs]
Construction constrained! JDM(3,4) < JDM_target(3,4): JDM(3,4) = 1, JDM_target(3,4) = 2.

Our Contributions
• New 2K Construction Algorithm
  - can produce any simple graph with the exact JDM_target in O(|E| d_max)
  - Main benefit: no constraints in constructed graphs
• 2K+ Framework: JDM_target + Additional Properties
  - 2K + Node Attributes (exactly)
  - 2K + Avg Clustering (approx)
  - Main benefit: orders of magnitude faster than 2K+MCMC

2K Construction
• Input: Joint Degree Matrix JDM_target (must be graphical)
• Goal: construct a simple graph with exactly JDM_target
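The slides require the input to be graphical without spelling out a test. The sketch below assumes the commonly cited realizability conditions for joint degree matrices: the node count of every degree class is an integer, and no entry asks for more edges than there are node pairs. The function name `is_graphical_jdm` and the dictionary encoding (keys `(k, l)` with `k <= l`, each edge counted once) are illustrative assumptions.

```python
from collections import defaultdict


def is_graphical_jdm(jdm):
    """Check whether a JDM, given as {(k, l): edge count} with k <= l, is realizable."""
    # Count stubs per degree class; an edge inside a class uses two of its stubs.
    stubs = defaultdict(int)
    for (k, l), m in jdm.items():
        if k > l or k < 1 or m < 0:
            return False
        stubs[k] += m
        stubs[l] += m

    # Condition 1: the number of degree-k nodes, stubs[k] / k, must be an integer.
    if any(s % k for k, s in stubs.items()):
        return False
    n = {k: s // k for k, s in stubs.items()}

    # Conditions 2 and 3: an entry cannot exceed the number of available node pairs.
    for (k, l), m in jdm.items():
        max_edges = n[k] * n[l] if k != l else n[k] * (n[k] - 1) // 2
        if m > max_edges:
            return False
    return True


print(is_graphical_jdm({(2, 2): 1, (2, 3): 2, (1, 3): 1}))  # triangle plus pendant node
print(is_graphical_jdm({(2, 2): 2}))  # two degree-2 nodes cannot share two edges
```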

2K Construction
[Figure: nodes 1a, 1b, 2a, 3a, 3b, 4a, 4b with free stubs, and a table of JDM/JDM_target entries, all still at 0]
Initialize:
  1K: create nodes and stubs
  JDM(k,l) = 0 for all k,l
Pick (k,l) degree pair, in any order
  While JDM(k,l) < JDM_target(k,l)
    Pick (x,y), any pair of disconnected nodes with degrees k and l
    …
    add edge between (x,y)

2K Construction
[Figure: nodes 1a, 1b, 2a, 3a, 3b, 4a, 4b, and a table of JDM/JDM_target entries after the first edge is added]
Initialize:
  1K: create nodes and stubs
  JDM(k,l) = 0 for all k,l
Pick (k,l) degree pair, in any order
  While JDM(k,l) < JDM_target(k,l)
    Pick (x,y), any pair of disconnected nodes with degrees k and l
    …
    add edge between (x,y)
    JDM(k,l)++

2K Construction
[Figure: nodes 1a, 1b, 2a, 3a, 3b, 4a, 4b, and a table of JDM/JDM_target entries after the first edge is added]
Initialize:
  1K: create nodes and stubs
  JDM(k,l) = 0 for all k,l
Pick (k,l) degree pair, in any order
  While JDM(k,l) < JDM_target(k,l)
    Pick (x,y), any pair of disconnected nodes with degrees k and l
    if x does not have free stubs: neighbor switch for x
    if y does not have free stubs: neighbor switch for y
    add edge between (x,y)
    JDM(k,l)++
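The loop above can be sketched in Python as follows. This is an illustrative reading of the slide pseudocode, not the authors' implementation: the names `build_2k_graph` and `jdm_of`, the `(degree, index)` node labels, and the rule for choosing the switch partner are assumptions. It picks node pairs by brute-force search, so it does not attain the O(|E| d_max) running time stated in the slides.

```python
from collections import defaultdict
from itertools import product


def build_2k_graph(target):
    """Build a simple graph with JDM equal to target ({(k, l): count}, k <= l, graphical)."""
    # 1K step: the target fixes how many nodes of each degree exist.
    stubs = defaultdict(int)
    for (k, l), m in target.items():
        stubs[k] += m
        stubs[l] += m  # a (k, k) entry therefore contributes 2*m stubs
    nodes = {k: [(k, i) for i in range(s // k)] for k, s in stubs.items()}
    adj = {v: set() for group in nodes.values() for v in group}

    def free(v):
        return v[0] - len(adj[v])

    def neighbor_switch(v, other):
        """Free one stub of v by handing one of its neighbours to a same-degree node w."""
        for w in nodes[v[0]]:
            # w needs a free stub; if w is the other endpoint it must keep one for the new edge.
            if w == v or free(w) < (2 if w == other else 1):
                continue
            t = next(t for t in adj[v] if t != w and t not in adj[w])
            adj[v].remove(t)
            adj[t].remove(v)
            adj[w].add(t)
            adj[t].add(w)
            return
        raise ValueError("no node with a free stub: is the target graphical?")

    for (k, l), m in target.items():  # degree pairs, in any order
        for _ in range(m):  # while JDM(k,l) < JDM_target(k,l)
            x, y = next((a, b) for a, b in product(nodes[k], nodes[l])
                        if a != b and b not in adj[a])
            if free(x) == 0:
                neighbor_switch(x, y)
            if free(y) == 0:
                neighbor_switch(y, x)
            adj[x].add(y)
            adj[y].add(x)
    return adj


def jdm_of(adj):
    """Recompute the JDM of an adjacency dict, counting every edge once."""
    jdm = defaultdict(int)
    for v, nbrs in adj.items():
        for w in nbrs:
            if v < w:
                jdm[tuple(sorted((len(adj[v]), len(adj[w]))))] += 1
    return dict(jdm)


# Hypothetical target: a path 1-2-2-1; the second (1,2) edge triggers a neighbor switch.
graph = build_2k_graph({(1, 2): 2, (2, 2): 1})
print(jdm_of(graph))
```

A neighbor switch moves an edge between two nodes of the same degree, so it leaves every JDM entry unchanged while freeing the stub that the new edge needs.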

Case 1: x and y both have free stubs
Setting: JDM(k,l) < JDM_target(k,l); node x has degree k, node y has degree l (figure: k=3, l=4)
• Add edge between x and y

Case 2: x has free stubs but y does not
Setting: JDM(k,l) < JDM_target(k,l); node x has degree k, node y has degree l (figure: k=3, l=4)
• Neighbor switch between y and b using t
• Add edge between x and y

Case 3: neither x nor y has free stubs
Setting: JDM(k,l) < JDM_target(k,l); node x has degree k, node y has degree l (figure: k=3, l=4)
• Neighbor switch between y and b1 using t1
• Neighbor switch between x and b2 using t2
• Add edge between x and y
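A single neighbor switch, as used in Cases 2 and 3, can be shown on a tiny graph. The sketch assumes that y and b have the same target degree, that y has no free stubs and that b has at least one; the helper name `neighbor_switch` and the adjacency-dict representation are illustrative choices, not taken from the slides.

```python
def neighbor_switch(adj, y, b):
    """Move one neighbour t of y over to b, freeing a stub of y. Returns t."""
    # t must be a neighbour of y that b could still connect to.
    t = next(t for t in adj[y] if t != b and t not in adj[b])
    adj[y].remove(t)
    adj[t].remove(y)
    adj[b].add(t)
    adj[t].add(b)
    return t


# Hypothetical setting: y and b both have target degree 2.
# y is full (neighbours t and u), while b has one free stub (neighbour u only).
adj = {"y": {"t", "u"}, "b": {"u"}, "t": {"y"}, "u": {"y", "b"}}
moved = neighbor_switch(adj, "y", "b")
print(moved, adj)
```

Because y and b share the same degree, the edge (y,t) and its replacement (b,t) fall in the same JDM cell, so the current JDM is unchanged by the switch.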

Properties of 2K Algorithm
• Terminates with exact JDM_target in O(|E| d_max)
  - It adds 1 edge at a time, while staying below JDM_target
• It can produce ALL graphs with the JDM_target
• Output graph depends on the order of adding edges


Flexibility of 2K Algorithm
• Family of algorithms: add one edge at a time, while staying below JDM_target
  - any order of degree pairs (k,l)
  - any order of node pairs (x,y), even before completing a degree pair
  - can start with an empty or partially built graph
• 2K+: can target additional properties fast
  - Previously known: the space of graphs with JDM_target is connected, but MCMC mixing is slow
  - Property 1: clustering
  - Property 2: attribute correlation

Extension 1: Target JDM + Clustering
Intuition: by controlling the order in which we add edges, we can control clustering.
[Figure: a JDM indexed by (k,l) and three graphs that realize it, with 0 triangles, 1 triangle and 2 triangles]

Extension 1: Target JDM + Clustering
[INFOCOM 2013]: add edges in increasing distance -> high clustering
• Place nodes randomly on a circle, consider node pairs' distance
[Figure: nodes 2a, 2b, 2c, 2d, 3a, 3b placed on a circle, a JDM indexed by (k,l), and resulting graphs with different numbers of triangles]
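One possible reading of the circle heuristic is sketched below: assign each node a random position on a circle of circumference 1 and list all node pairs by increasing circular distance. The function name `pairs_by_circle_distance`, the unit circumference and the seeded random generator are assumptions for illustration; the slides give no further detail.

```python
import random
from itertools import combinations


def pairs_by_circle_distance(nodes, seed=0):
    """Return (ordered node pairs, positions): pairs sorted by distance along a unit circle."""
    rng = random.Random(seed)
    pos = {v: rng.random() for v in nodes}  # position along the circumference

    def circle_distance(pair):
        d = abs(pos[pair[0]] - pos[pair[1]])
        return min(d, 1 - d)  # go around the shorter way

    return sorted(combinations(nodes, 2), key=circle_distance), pos


pairs, pos = pairs_by_circle_distance(["2a", "2b", "2c", "2d", "3a", "3b"])
print(pairs[:3])  # the closest pairs are tried first
```

Trying close pairs first tends to connect nodes that already share nearby neighbors, which is the intuition the slide gives for obtaining high clustering.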

"Sortedness" of node pairs' list controls clustering
[INFOCOM 2015]: control order of node pairs -> control clustering
Example: JDM_target of the Facebook Caltech network
• Consider many orders of node pairs -> create graphs with JDM_target -> compute avg clustering c
[Figure: nodes 2a, 2b, 2c, 2d, 3a, 3b]

Extension 1: Target JDM + Clustering
2K+ Avg Clustering
Input: target JDM, avg clustering coefficient c
Stage 1:
  E' = list of node pairs s.t. sortedness(E') ≈ S(c)
  FOR each candidate node pair (v,w) in E':
    IF both nodes v and w have free stubs and the corresponding JDM(k,l) < JDM_target(k,l):
      add edge (v,w)
Stage 2:
  IF not all |E| edges have been added:
    add remaining edges using 2K_Simple
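Stage 1 can be sketched as a single pass over an already ordered candidate list. How E' is ordered to reach a given sortedness is not specified here, so the sketch takes the list as input; the function name `stage1`, the `degree` dictionary of target degrees and the returned count of missing edges are illustrative assumptions. Stage 2 (the 2K_Simple fallback) is not implemented.

```python
from collections import defaultdict


def stage1(candidate_pairs, degree, target):
    """Add candidate edges greedily while respecting free stubs and the target JDM.

    candidate_pairs: ordered list of node pairs (the list E' in the slides)
    degree: target degree of every node
    target: {(k, l): edge count} with k <= l
    Returns the partial adjacency dict and the number of edges left for Stage 2.
    """
    adj = {v: set() for v in degree}
    jdm = defaultdict(int)
    for v, w in candidate_pairs:
        key = tuple(sorted((degree[v], degree[w])))
        has_free_stubs = len(adj[v]) < degree[v] and len(adj[w]) < degree[w]
        below_target = jdm[key] < target.get(key, 0)
        # Skip self-pairs and existing edges so the graph stays simple.
        if v != w and w not in adj[v] and has_free_stubs and below_target:
            adj[v].add(w)
            adj[w].add(v)
            jdm[key] += 1
    missing = sum(target.values()) - sum(jdm.values())
    return adj, missing


# Hypothetical input: a triangle a-b-c with a pendant node d attached to c.
degree = {"a": 2, "b": 2, "c": 3, "d": 1}
target = {(2, 2): 1, (2, 3): 2, (1, 3): 1}
order = [("a", "b"), ("a", "c"), ("a", "d"), ("b", "c"), ("c", "d")]
adj, missing = stage1(order, degree, target)
print(adj, missing)
```

In this example the pair (a,d) is skipped, both because a has no free stub left and because the target has no (1,2) entry; the remaining candidates complete the graph, so nothing is left for Stage 2.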

Real world examples: target JDM + avg clustering
[Figure: panels for Average Clustering Coefficient, Average Node Shortest Path Length, Average Node Closeness]

Real world examples: target JDM + avg clustering
2K+MCMC did not finish after several days.

Extension 2: Node Attributes
JAM: Joint Attribute Matrix (or Attribute Mixing Matrix)
[Figure: the JDM, indexed by degree pairs (k,l), next to the JAM]

Extension 2: Node Attributes Mixing
JAM: Joint Attribute Matrix (or Attribute Mixing Matrix)
[Figure: the JDM, indexed by degree pairs (k,l), next to the JAM]

Extension 2: Degree+Attribute Mixing
Joint Degree and Attribute Matrix (JDAM)
[Figure: the JDM and the JAM combined into the JDAM]

Extension 2: target JDAM
Joint Degree and Attribute Matrix (JDAM)
The 2K Algorithm also works for a target JDAM.
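As an illustration of what a JDAM records, the sketch below counts edges by the (degree, attribute) class of their two endpoints. The function name `jdam`, the single categorical attribute per node and the sorted-key encoding are assumptions; the slides do not define the matrix layout.

```python
from collections import Counter


def jdam(edges, attr):
    """Count edges by the (degree, attribute) classes of their endpoints."""
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1

    counts = Counter()
    for u, v in edges:
        # Sort the two classes so that each undirected edge maps to a single key.
        key = tuple(sorted([(degree[u], attr[u]), (degree[v], attr[v])]))
        counts[key] += 1
    return dict(counts)


# Hypothetical example: triangle a-b-c plus pendant d, with a colour attribute per node.
edges = [("a", "b"), ("b", "c"), ("c", "a"), ("c", "d")]
attr = {"a": "red", "b": "blue", "c": "red", "d": "blue"}
print(jdam(edges, attr))
```

Under this reading, each (degree, attribute) class plays the role that a degree class plays in the JDM, which is one way to understand why the same construction can be reused with a JDAM target.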

Real world examples: graphs with node attributes
[Figure: panels for Average Clustering Coefficient, Average Node Shortest Path Length, Average Node Closeness]

Real world examples: small graphs with node attributes
Simulation takes ~1 day to target 2K and c = 0.24 with MCMC (using double edge swaps).

Construction of 2K+ Graphs
• New 2K Construction Algorithm
  - can produce any simple graph with exact JDM_target in O(|E| d_max)
• 2K+ Framework: JDM_target + Additional Properties
  - Extension 1: 2K (exactly) + Avg Clustering (approx)
  - Extension 2: 2K (exactly) + Node Attributes (exactly)
• Future directions
  - Construction: target attributes + structure (towards 3K)

Questions?