Construction of Simple Graphs with a Target Joint Degree Matrix and Beyond
Minas Gjoka, Balint Tillman, Athina Markopoulou
University of California, Irvine
Graphs: Social Networks, Protein interactions, World Wide Web, Autonomous Systems, DNS
Motivation (Social Networks)
Measurements/sampling of OSNs: [INFOCOM 2010], [SIGMETRICS 2011], 3x [JSAC 2011], [WOSN 2012], …
~3500 researchers have requested our Facebook datasets.
Generate synthetic graphs that resemble real social networks, for use in simulations and for anonymization.
Q1: resemble in terms of what?
Q2: generate how?
dK-Series
dK-series framework [Mahadevan et al., SIGCOMM '06]: "A set of graph properties that describe and constrain random graphs, using degree correlations, in successively finer detail."
dK-Series
dK-series framework [Mahadevan et al., SIGCOMM '06]
0K specifies the average node degree
1K specifies the node degree sequence
2K specifies the joint node degree matrix (JDM)
3K specifies the number of induced subgraphs of 3 nodes
  o nodes are labeled by their degree k
[Figure: example graph with its 3K tables, #Wedges and #Triangles per degree triple (k,l,m); entries are shown for (1,3,2) and (2,2,3)]
dK-Series
dK-series framework [Mahadevan et al., SIGCOMM '06]
0K specifies the average node degree
1K specifies the node degree sequence
2K specifies the joint node degree matrix (JDM)
3K specifies the number of induced subgraphs of 3 nodes
…
nK specifies the entire graph
Nice properties: inclusion, convergence
Tradeoff: accuracy vs. complexity
OSNs: "2K+"
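To make the first three levels of the series concrete, the sketch below computes 0K, 1K, and 2K from an edge list. The function name `dk_summary` and the 4-node example graph are illustrative assumptions, not taken from the slides; nodes that appear in no edge are ignored.

```python
from collections import Counter

def dk_summary(edges):
    """Return (0K, 1K, 2K) statistics of a simple undirected graph given as an edge list."""
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    avg_degree = sum(degree.values()) / len(degree)   # 0K: average node degree
    degree_hist = Counter(degree.values())            # 1K: number of nodes of each degree
    jdm = Counter()                                   # 2K: edges per degree pair (k, l), k <= l
    for u, v in edges:
        k, l = sorted((degree[u], degree[v]))
        jdm[(k, l)] += 1
    return avg_degree, dict(degree_hist), dict(jdm)

# Hypothetical example: a triangle (a, b, c) with a pendant node d attached to a.
edges = [("a", "b"), ("a", "c"), ("b", "c"), ("a", "d")]
avg_degree, degree_hist, jdm = dk_summary(edges)
```

Each level refines the previous one: the JDM determines the degree histogram, which in turn determines the average degree.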
Related Work
Graph construction approaches:
  Stochastic: reproduces the dK-distribution in expectation.
  Configuration ("pseudograph"): reproduces the dK-distribution exactly.
  o Deterministic algorithms up to d=2. MCMC for d>=2.
1K Construction
  Configuration: 1K multigraphs [Molloy '95]
  1K+: [Bansal '09, Newman '09, Serrano & Boguna '05, …]
2K Construction
  Configuration model for 2K multigraphs [Mahadevan '06]
  Balanced Degree Invariant: simple graphs [Amanatidis '08], [Stanton '12]
2K+ Construction
  2K preserving, 3K targeting using edge rewiring: [Mahadevan '06]
  2.5K heuristic: JDM + degree-dependent clustering coefficient: [Gjoka '13]
2K Construction: Configuration Model
[Figure: nodes 2a, 2b, 3a, 3b, 4a with used and free stubs, shown next to the current JDM and the target JDM (indexed by k, l)]
Edges added: (2a,3a), (2b,4a), (2b,3a), (3b,4a), (2a,2b), (3a,3b)
Construction stuck! 2/8 (25%) of the edges cannot be added.
2K Construction: Balanced Degree Invariant
[Figure: nodes 3a, 3b, 4a, 4b (k=3, l=4) with used and free stubs, shown in three construction steps]
Construction constrained! JDM(3, 4) < JDM_target(3, 4): JDM(3, 4) = 1, JDM_target(3, 4) = 2
Our Contributions
New 2K Construction Algorithm
  Can produce any simple graph with the exact JDM target in O(|E| d_max)
  Main benefit: no constraints in constructed graphs
2K+ Framework: JDM target + Additional Properties
  2K + Node Attributes (exactly)
  2K + Avg Clustering (approx)
  Main benefit: orders of magnitude faster than 2K+MCMC
2K Construction
Input: Joint Degree Matrix JDM_target (must be graphical)
Goal: construct a simple graph with exactly JDM_target
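The slide requires a graphical target but does not spell out the test. A minimal sketch follows, assuming the conditions usually given in the JDM literature: every degree class has an integer number of nodes, and no pair of classes needs more edges than it has distinct node pairs. The function name `is_graphical_jdm` and the dictionary layout are assumptions made for illustration.

```python
from math import comb

def is_graphical_jdm(jdm):
    """Check whether some simple graph could realize jdm.

    jdm maps (k, l) with k <= l to the number of edges between
    degree-k and degree-l nodes.
    """
    # Count edge endpoints per degree class; a (k, k) edge contributes two.
    endpoints = {}
    for (k, l), count in jdm.items():
        if count < 0 or not 1 <= k <= l:
            return False
        endpoints[k] = endpoints.get(k, 0) + count
        endpoints[l] = endpoints.get(l, 0) + count
    # The number of degree-k nodes, endpoints / k, must be an integer.
    if any(total % k for k, total in endpoints.items()):
        return False
    size = {k: total // k for k, total in endpoints.items()}
    # No more edges between two classes than there are distinct node pairs.
    for (k, l), count in jdm.items():
        limit = comb(size[k], 2) if k == l else size[k] * size[l]
        if count > limit:
            return False
    return True
```

For example, `{(1, 3): 1, (2, 2): 1, (2, 3): 2}` passes, while `{(2, 2): 1}` fails because a single degree-2 node cannot carry a (2,2) edge.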
2K Construction
[Figure: running example with nodes 1a, 1b, 2a, 3a, 3b, 4a, 4b; each matrix cell shows JDM/JDM_target, i.e. the current count over the target count]
Initialize:
  1K: create nodes and stubs
  JDM(k, l) = 0 for all k, l
For each degree pair (k, l), in any order:
  While JDM(k, l) < JDM_target(k, l):
    Pick (x, y), any pair of disconnected nodes with degrees k and l
    If x does not have free stubs: neighbor switch for x
    If y does not have free stubs: neighbor switch for y
    Add edge between (x, y)
    JDM(k, l)++
Case 1: x and y both have free stubs
Given: JDM(k, l) < JDM_target(k, l); node x has degree k; node y has degree l (in the figure, k=3 and l=4)
Add edge between x and y
Case 2: x has free stubs but y does not
Given: JDM(k, l) < JDM_target(k, l); node x has degree k; node y has degree l (in the figure, k=3 and l=4)
Neighbor switch between y and b using t
Add edge between x and y
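The slide names the neighbor switch without defining it. The sketch below assumes the following reading: b has the same target degree as y and still has a free stub, and t is a neighbor of y that is neither b nor adjacent to b; the edge (y, t) is replaced by (b, t). Because y and b share a degree class, the JDM is unchanged while y gains a free stub. The function name and the adjacency-set layout are illustrative.

```python
def neighbor_switch(adj, y, b):
    """Free one stub at y by handing one of y's neighbors to b.

    adj maps each node to the set of its current neighbors. b is assumed
    to have the same target degree as y and fewer neighbors than y.
    """
    # t must be a neighbor of y that is neither b itself nor already adjacent to b.
    t = next(n for n in sorted(adj[y]) if n != b and n not in adj[b])
    adj[y].remove(t)
    adj[t].remove(y)
    adj[b].add(t)
    adj[t].add(b)
    return t

# Hypothetical example: y and b both have target degree 2; y is full, b has two free stubs.
adj = {"y": {"t", "u"}, "b": set(), "t": {"y"}, "u": {"y"}}
moved = neighbor_switch(adj, "y", "b")
```

Such a t exists whenever y has strictly more neighbors than b, since y's neighbors (other than b) cannot all be neighbors of b.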
Case 3: neither x nor y has free stubs
Given: JDM(k, l) < JDM_target(k, l); node x has degree k; node y has degree l (in the figure, k=3 and l=4)
Neighbor switch between y and b1 using t1
Neighbor switch between x and b2 using t2
Add edge between x and y
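The three cases can be combined into one loop. The following is a minimal sketch, not the authors' implementation: it assumes a graphical target given as a dictionary keyed by (k, l) with k <= l, interprets the neighbor switch as moving one edge from a full node to a same-degree node with a free stub, and searches for disconnected pairs by brute force, so it does not attain the O(|E| d_max) bound. All names (`build_2k_graph`, `ensure_free_stub`, `keep`) are hypothetical.

```python
from itertools import combinations, product

def build_2k_graph(jdm_target):
    """Build a simple graph whose joint degree matrix equals jdm_target.

    Returns (adj, degree): adjacency sets and the target degree of each node.
    """
    # 1K initialization: |V_k| = (edge endpoints in class k) / k.
    endpoints = {}
    for (k, l), count in jdm_target.items():
        endpoints[k] = endpoints.get(k, 0) + count
        endpoints[l] = endpoints.get(l, 0) + count
    nodes_by_degree, degree, adj = {}, {}, {}
    for k, total in sorted(endpoints.items()):
        nodes_by_degree[k] = [(k, i) for i in range(total // k)]
        for v in nodes_by_degree[k]:
            degree[v], adj[v] = k, set()

    def free(v):
        return degree[v] - len(adj[v])

    def ensure_free_stub(v, keep):
        """Neighbor switch for v if it is full; never take the last free stub of `keep`."""
        if free(v) > 0:
            return
        b = next(u for u in nodes_by_degree[degree[v]]
                 if u != v and free(u) > (1 if u == keep else 0))
        t = next(n for n in adj[v] if n != b and n not in adj[b])
        adj[v].remove(t)
        adj[t].remove(v)
        adj[b].add(t)
        adj[t].add(b)

    for (k, l), target in jdm_target.items():
        for _ in range(target):
            if k == l:
                pairs = combinations(nodes_by_degree[k], 2)
            else:
                pairs = product(nodes_by_degree[k], nodes_by_degree[l])
            # Any disconnected pair with degrees k and l will do.
            x, y = next((x, y) for x, y in pairs if y not in adj[x])
            ensure_free_stub(x, keep=y)   # Cases 2 and 3
            ensure_free_stub(y, keep=x)
            adj[x].add(y)                 # Case 1 once both have free stubs
            adj[y].add(x)
    return adj, degree

adj, degree = build_2k_graph({(1, 2): 2, (2, 2): 1})
```

The `keep` argument matters only when k = l: it stops a switch from consuming the single free stub that the other endpoint needs for the new edge.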
Properties of 2K Algorithm
Terminates with the exact JDM target in O(|E| d_max)
It adds 1 edge at a time, while staying below the JDM target
It can produce ALL graphs with the JDM target
Output graph depends on the order of adding edges
Flexibility of 2K Algorithm
Family of algorithms: add one edge at a time, while staying below the JDM target
  any order of degree pairs (k,l)
  any order of node pairs (x,y), even before completing a degree pair
Can start with an empty or partially built graph
2K+: can target additional properties fast
  Previously known: the space of graphs with the JDM target is connected, but MCMC mixing is slow
  Property 1: clustering
  Property 2: attribute correlation
Extension 1: Target JDM + Clustering
Intuition: by controlling the order in which we add edges, we can control clustering.
[Figure: a target JDM (k, l) and three graphs that realize it, with 0, 1, and 2 triangles]
Extension 1: Target JDM + Clustering
[INFOCOM 2013]: place nodes randomly on a circle and consider the node pairs' distance; adding edges in increasing distance gives high clustering.
[Figure: nodes 2a, 2b, 2c, 2d, 3a, 3b placed on a circle, next to the JDM (k, l) and the resulting graphs with their triangle counts]
"Sortedness" of the node pairs' list controls clustering
[INFOCOM 2015]: controlling the order of node pairs controls clustering
Example: JDM target of the Facebook Caltech network
  Consider many orders of node pairs
  Create graphs with the JDM target
  Compute avg clustering c
[Figure: nodes 2a, 2b, 2c, 2d, 3a, 3b on a circle]
Extension 1: Target JDM + Clustering
2K + Avg Clustering
Input: target JDM, avg clustering coefficient c
Stage 1
  E' = list of node pairs s.t. sortedness(E') ≈ S(c)
  FOR each candidate node pair (v, w) in E':
    IF both nodes v and w have free stubs and the corresponding JDM(k, l) < JDM_target(k, l):
      add edge (v, w)
Stage 2
  IF not all |E| edges have been added:
    add remaining edges using 2K_Simple
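Stage 1 can be sketched as a greedy pass over an ordered list of node pairs. The slides do not define sortedness or the mapping S(c), so the sketch below substitutes a hypothetical knob, `shuffle_fraction`: node pairs are sorted by circular distance and then that fraction of list positions is randomly permuted. Stage 2 (completing the graph with the 2K algorithm) is not shown. All names are illustrative.

```python
import random
from itertools import combinations

def stage1_circle_order(degree, jdm_target, shuffle_fraction=0.0, seed=0):
    """Greedy Stage 1: add node pairs in (partially shuffled) circular-distance order.

    degree maps node -> target degree; jdm_target maps (k, l), k <= l, to an edge count.
    Returns (adj, jdm) for the partially built graph.
    """
    rng = random.Random(seed)
    nodes = list(degree)
    rng.shuffle(nodes)                          # random positions on a circle
    pos = {v: i for i, v in enumerate(nodes)}
    n = len(nodes)

    def circ_dist(pair):
        d = abs(pos[pair[0]] - pos[pair[1]])
        return min(d, n - d)

    pairs = sorted(combinations(nodes, 2), key=circ_dist)
    # Hypothetical stand-in for "sortedness": permute a random subset of list positions.
    idx = rng.sample(range(len(pairs)), int(shuffle_fraction * len(pairs)))
    moved = [pairs[i] for i in idx]
    rng.shuffle(moved)
    for i, p in zip(idx, moved):
        pairs[i] = p

    adj = {v: set() for v in nodes}
    jdm = {}
    for v, w in pairs:
        key = tuple(sorted((degree[v], degree[w])))
        if (len(adj[v]) < degree[v] and len(adj[w]) < degree[w]
                and jdm.get(key, 0) < jdm_target.get(key, 0)):
            adj[v].add(w)
            adj[w].add(v)
            jdm[key] = jdm.get(key, 0) + 1
    return adj, jdm

# Hypothetical example: 7 nodes of degree 4, fully sorted order.
degree = {v: 4 for v in range(7)}
adj, jdm = stage1_circle_order(degree, {(4, 4): 14})
```

With a fully sorted list, each node connects to its nearest neighbors on the circle, which is the high-clustering end of the range; a larger `shuffle_fraction` moves the result toward a random order.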
Real world examples: target JDM + avg clustering
[Figure panels: Average Clustering Coefficient, Average Node Shortest Path Length, Average Node Closeness]
Real world examples: target JDM + avg clustering
2K+MCMC did not finish after several days.
Extension 2: Node Attributes Mixing
[Figure: the JDM (k, l) shown next to the JAM]
JAM: Joint Attribute Matrix (or Attribute Mixing Matrix)
Extension 2: Degree+Attribute Mixing
[Figure: the JDM (k, l) and the JAM combined into one matrix]
JDAM: Joint Degree and Attribute Matrix
Extension 2: target JDAM
Joint Degree and Attribute Matrix (JDAM)
The 2K algorithm also works for a target JDAM.
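One way to read the JDAM is as a JDM whose classes are (degree, attribute) pairs instead of degrees alone. The sketch below computes such a matrix from an edge list under that assumption; the function name, the attribute values, and the example graph are illustrative and not taken from the slides.

```python
from collections import Counter

def joint_degree_attribute_matrix(edges, attribute):
    """Count edges between classes labeled (degree, attribute).

    edges is a list of undirected edges; attribute maps node -> attribute value.
    Keys of the result are sorted pairs of (degree, attribute) labels.
    """
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    jdam = Counter()
    for u, v in edges:
        labels = sorted(((degree[u], attribute[u]), (degree[v], attribute[v])))
        jdam[tuple(labels)] += 1
    return dict(jdam)

# Hypothetical example: a triangle (a, b, c) with a pendant node d, two attribute values.
edges = [("a", "b"), ("a", "c"), ("b", "c"), ("a", "d")]
attribute = {"a": "red", "b": "red", "c": "blue", "d": "blue"}
jdam = joint_degree_attribute_matrix(edges, attribute)
```

Summing JDAM entries over attributes recovers the JDM, and summing over degrees recovers the JAM, so hitting a JDAM target exactly also hits both marginals exactly.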
Real world examples: graphs with node attributes
[Figure panels: Average Clustering Coefficient, Average Node Shortest Path Length, Average Node Closeness]
Real world examples: small graphs with node attributes
Simulation takes ~1 day to target 2K and c = 0.24 with MCMC (using double edge swaps).
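For comparison, a single MCMC move of the kind mentioned above might look like the sketch below. The slides do not specify the exact chain, so this assumes one common 2K-preserving variant: edges (a, b) and (c, d) are rewired to (a, d) and (c, b) only when b and d have equal degree, which leaves every JDM entry unchanged. The function name is hypothetical.

```python
def jdm_preserving_swap(adj, a, b, c, d):
    """Try to rewire edges (a, b), (c, d) into (a, d), (c, b); return True on success.

    adj maps each node to the set of its neighbors in a complete simple graph,
    so len(adj[v]) is the degree of v.
    """
    if len({a, b, c, d}) < 4:
        return False                      # would create a self-loop or a no-op
    if b not in adj[a] or d not in adj[c]:
        return False                      # both edges must exist
    if len(adj[b]) != len(adj[d]):
        return False                      # unequal degrees would change the JDM
    if d in adj[a] or b in adj[c]:
        return False                      # would create a multi-edge
    adj[a].remove(b); adj[b].remove(a)
    adj[c].remove(d); adj[d].remove(c)
    adj[a].add(d); adj[d].add(a)
    adj[c].add(b); adj[b].add(c)
    return True

# Hypothetical example: two disjoint edges between degree-1 nodes.
adj = {"a": {"b"}, "b": {"a"}, "c": {"d"}, "d": {"c"}}
swapped = jdm_preserving_swap(adj, "a", "b", "c", "d")
```

A chain built from such moves changes two edges per step and must also steer toward the clustering target, which is consistent with the long running times reported on the slide.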
Construction of 2K+ Graphs
New 2K Construction Algorithm
  Can produce any simple graph with the exact JDM target in O(|E| d_max)
2K+ Framework: JDM target + Additional Properties
  Extension 1: 2K (exactly) + Avg Clustering (approx)
  Extension 2: 2K (exactly) + Node Attributes (exactly)
Future directions
  Construction: target attributes + structure (towards 3K)
Questions?