Multivariate Heavy Tails and Structural Properties of Complex Networks

Slides:

Advertisements

Similar presentations

1 Dynamics of Real-world Networks Jure Leskovec Machine Learning Department Carnegie Mellon University

Advertisements

Power Laws By Cameron Megaw 3/11/2013. What is a Power Law?

Analysis and Modeling of Social Networks Foudalis Ilias.

Week 5 - Models of Complex Networks I Dr. Anthony Bonato Ryerson University AM8002 Fall 2014.

Maximizing the Spread of Influence through a Social Network

The multi-layered organization of information in living systems

Kronecker Graphs: An Approach to Modeling Networks Jure Leskovec, Deepayan Chakrabarti, Jon Kleinberg, Christos Faloutsos, Zoubin Ghahramani Presented.

VL Netzwerke, WS 2007/08 Edda Klipp 1 Max Planck Institute Molecular Genetics Humboldt University Berlin Theoretical Biophysics Networks in Metabolism.

Hierarchy in networks Peter Náther, Mária Markošová, Boris Rudolf Vyjde : Physica A, dec

Topology Generation Suat Mercan. 2 Outline Motivation Topology Characterization Levels of Topology Modeling Techniques Types of Topology Generators.

Network Statistics Gesine Reinert. Yeast protein interactions.

Statistics & Modeling By Yan Gao. Terms of measured data Terms used in describing data –For example: “mean of a dataset” –An objectively measurable quantity.

CS Lecture 6 Generative Graph Models Part II.

Analysis of the Internet Topology Michalis Faloutsos, U.C. Riverside (PI) Christos Faloutsos, CMU (sub- contract, co-PI) DARPA NMS, no

Network analysis and applications Sushmita Roy BMI/CS 576 Dec 2 nd, 2014.

On Distinguishing between Internet Power Law B Bu and Towsley Infocom 2002 Presented by.

(Social) Networks Analysis III Prof. Dr. Daning Hu Department of Informatics University of Zurich Oct 16th, 2012.

Lecture 6 - Models of Complex Networks II Dr. Anthony Bonato Ryerson University AM8002 Fall 2014.

Topic 13 Network Models Credits: C. Faloutsos and J. Leskovec Tutorial

Exploring the dynamics of social networks Aleksandar Tomašević University of Novi Sad, Faculty of Philosophy, Department of Sociology

Clustering of protein networks: Graph theory and terminology Scale-free architecture Modularity Robustness Reading: Barabasi and Oltvai 2004, Milo et al.

Weighted networks: analysis, modeling A. Barrat, LPT, Université Paris-Sud, France M. Barthélemy (CEA, France) R. Pastor-Satorras (Barcelona, Spain) A.

Jure Leskovec Computer Science Department Cornell University / Stanford University Joint work with: Jon Kleinberg (Cornell), Christos.

Emergence of Scaling and Assortative Mixing by Altruism Li Ping The Hong Kong PolyU

Soon-Hyung Yook, Sungmin Lee, Yup Kim Kyung Hee University NSPCS 08 Unified centrality measure of complex networks: a dynamical approach to a topological.

KPS 2007 (April 19, 2007) On spectral density of scale-free networks Doochul Kim (Department of Physics and Astronomy, Seoul National University) Collaborators:

How Do “Real” Networks Look?

RTM: Laws and a Recursive Generator for Weighted Time-Evolving Graphs Leman Akoglu, Mary McGlohon, Christos Faloutsos Carnegie Mellon University School.

Properties of Growing Networks Geoff Rodgers School of Information Systems, Computing and Mathematics.

Hierarchical Organization in Complex Networks by Ravasz and Barabasi İlhan Kaya Boğaziçi University.

Algorithms and Computational Biology Lab, Department of Computer Science and & Information Engineering, National Taiwan University, Taiwan Network Biology.

Cmpe 588- Modeling of Internet Emergence of Scale-Free Network with Chaotic Units Pulin Gong, Cees van Leeuwen by Oya Ünlü Instructor: Haluk Bingöl.

Network (graph) Models

Random Walk for Similarity Testing in Complex Networks

Structures of Networks

Shan Lu, Jieqi Kang, Weibo Gong, Don Towsley UMASS Amherst

Multivariate Heavy Tails and Structural Properties of Networks

Groups of vertices and Core-periphery structure

Gyan Ranjan University of Minnesota, MN

Topics In Social Computing (67810)

by Hyunwoo Park and Kichun Lee Knowledge-Based Systems 60 (2014) 58–72

Statistical Data Analysis

Empirical analysis of Chinese airport network as a complex weighted network Methodology Section Presented by Di Li.

Random walks on complex networks

CJT 765: Structural Equation Modeling

Greedy Algorithm for Community Detection

Modeling, sampling, generating Networks with MRV

How Do “Real” Networks Look?

Section 8.6: Clustering Coefficients

Centralities (2) Ralucca Gera,

Degree and Eigenvector Centrality

Section 8.6 of Newman’s book: Clustering Coefficients

How Do “Real” Networks Look?

How Do “Real” Networks Look?

Postdoc, School of Information, University of Arizona

Lecture 13 Network evolution

How Do “Real” Networks Look?

Graph and Tensor Mining for fun and profit

Clustering Coefficients

Modelling and Searching Networks Lecture 3 – ILT model

Ilan Ben-Bassat Omri Weinstein

Statistical Data Analysis

Properties and applications of spectra for networks

Lecture 21 Network evolution

Modelling and Searching Networks Lecture 2 – Complex Networks

Modelling and Searching Networks Lecture 6 – PA models

Shan Lu, Jieqi Kang, Weibo Gong, Don Towsley UMASS Amherst

Network Models Michael Goodrich Some slides adapted from:

Discrete Mathematics and its Applications Lecture 6 – PA models

MGS 3100 Business Analysis Regression Feb 18, 2016

Presentation transcript:

Multivariate Heavy Tails and Structural Properties of Complex Networks MURI Oct 8 Annual Review Multivariate Heavy Tails and Structural Properties of Complex Networks Zhi-Li Zhang & Golshan Golnari {zhzhang, golnari}@cs.umn.edu Dept. of Computer Science & Eng., University of Minnesota Reporting Collaborative Research with Sid, Don, Bo, et al

Outline Motivation: Central Questions & Overview Structural Properties of Complex Networks Multivariate Analysis and Multivariate Heavy Tails Network Data Analysis: Search for MRVs? Existing studies focus on univariate heavy tails in one or more metrics (i.e., “power laws” in marginal distributions only) Developing “visual” analysis tool for uncovering MRVs MRVs better characterization for complex & diverse structures Generative & Analytical Models for Growing Networks with Multivariate Heavy Tails (MRVs) General recursive recipes for constructing networks w/ MRVs (Stochastic) Kronecker graphs as a specific (but powerful) family of mathematical models for networks with MRVs

Central Questions Networks are “by nature” multivariate entities formed by n nodes and m edges connected in various manners edges may be directed, and/or have weights, signs, … Objective: bring a “multivariate analysis” perspective to study structural properties of complex networks In particular, understand roles and impact of multivariate heavy tails What are (possible) manifestations of multivariate “heavy tail” phenomena in complex networks? beyond (univariate or multivariate) “heavy-tailed” degree distributions And what can they tell us about the structural properties of complex networks?

Complex Networks from the Real World Network datasets arising from many real-world social, natural or engineered systems manifest complex and diverse structures! Cannot be uniquely or sufficiently characterized by a single metric e.g., (power law) degree distribution:

Complex Networks from the Real World Network datasets arising from many real-world social, natural or engineered systems manifest complex and diverse structures! Cannot be uniquely or sufficiently characterized by a single metric e.g., (power law) degree distribution:

Complex Networks from the Real World Network datasets arising from many real-world social, natural or engineered systems manifest complex and diverse structures! Cannot be uniquely or sufficiently characterized by a single metric e.g., (power law) degree distribution: Nor is using multiple metrics independently sufficient to capture such complex and diverse network structures and their formations

Complex Networks from the Real World Network datasets arising from many real-world social, natural or engineered systems manifest complex and diverse structures! Cannot be uniquely or sufficiently characterized by a single metric Nor is using multiple metrics independently sufficient to capture such complex and diverse network structures and their formations These metrics or “characterizations” (“laws”) include “scale-free” or power law degree distribution directed networks: in/out-degree distributions, reciprocal edges & reciprocity small network diameter (“small world”) & skewed distri. of hop-counts power-law eigenvalue spectrum & skewed “network value” distri. (eigenvector components assoc. w/ largest eigenvalue of adj. matrix) node triads or clustering coefficients (network transitivity & communities)

Complex Networks from the Real World Network datasets arising from many real-world social, natural or engineered systems manifest complex and diverse structures! Cannot be uniquely or sufficiently characterized by a single metric Nor is using multiple metrics independently sufficient to capture such complex and diverse network structures and their formations These metrics or “characterizations” (“laws”) include “scale-free” or power law degree distribution directed networks: in/out-degree distributions, reciprocal edges & reciprocity small network diameter (“small world”) & skewed distri. of hop-counts power-law eigenvalue spectrum & skewed “network value” distri. (eigenvector components assoc. w/ largest eigenvalue of adj. matrix) node triads or clustering coefficients (network transitivity & communities) core-periphery, or hierarchical (“self-similar”) structures Evolution of networks: densification power law: shrinking or stabilizing network diameter, & constant avg. clustering coefficient

Characterizing Structural Properties of Complex Networks & (Generative) Models? Existing preferential attachment (PA) types of generative models inadequate in capturing complex, diverse real network structures produce “power law” degree distribution, but avg. clustering coefficient decays fast as size N grows  mostly components are “tree-like”, not enough # of “triads” many “real” networks have average clustering coefficients independent of N Several mostly deterministic, a few stochastic constructive methods or “models” have been proposed in the literature, trying to match two or more metrics of real complex networks, e.g., degree & clustering coeff. but still focus only on marginal distribution of each metrics! can’t distinguish complex network structures with the same marginals We believe multivariate heavy-tail (MRV) analysis provides a better tool to characterize & distinguish diverse dependence structures in complex networks!

Quest for Multivariate “Heavy Tails” in Complex Networks: Two-Pronged Approach Two intertwined threads of research: Empirical network analysis of real-world networks & simulated datasets generated from (generative) network models Develop (visual) analysis tools that are amenable for network data analysis a new “visual” bivariate heavy tail analysis tool using a sequence of log-log plots (similar to univariate log-log plot) Search and devise metrics of interest, compute & plot them to look for (univariate & multivariate) “heavy tails” in network data Compare real networks w/ simulated data for structural similarity analysis Constructive, analytical methods to develop “generative” models for growing networks with multivariate heavy tails explicit construction of growing networks with “interesting” properties analytical models & proofs for existence of (multivariate) “heavy-tails” Q: structural properties (e.g., “hierarchical structure”) thus constructed “essential” or “universal” for networks with multivariate “heavy tails”?

Multivariate Heavy Tails in Complex Networks: Empirical Analysis MRV (of joint metrics distributions) as a distinctive network characterization of complex net structures e.g., degree-degree, degree-clustering coefficient inverse Visualization tool for detecting MRV in networks similar to log-log plot for univariate heavy tails, with a sequence of log-log plots in various scales for bivariate distributions

Multivariate Heavy-Tail (MRV) Definition Hard to say MRV…

Multivariate Heavy-Tail (MRV) The math for the visualization Bivariate case (sequential form):

Multivariate Heavy-Tail (MRV) The visualization method 

Multivariate Heavy Tail (MRV) The visualization method

Hierarchical Generative Models Empirical Analysis One intuition: Networks are generated hierarchically modules/communities recursively combine/connect to each other to form the larger communities But this “combining” process could happen in many different ways  resulting in different generative models

Examples The modules combine by connecting the central nodes to peripheral nodes (low-degree nodes) of other modules. Or the modules combine by connecting/merging the central nodes (high-degree nodes) to central nodes of other modules. Ravasz and Barabasi’s model Dorogovtsev et al.’s model

Marginals alone are not Helpful… Marginals alone cannot distinguish the distinctive network structures generated from these two different generative models. Ravasz and Barabasi’s model Their correlation/dependenices give us much more information Dorogovtsev et al.’s model

MRVs are more Informative! MRVs (of different joint metrics distributions) are able to distinguish these networks generated from different generative models. Ravasz and Barabasi’s model MRV gives us an indication of dependency structures.. Dorogovtsev et al.’s model

“Self-Similar” Networks? Many generative models have been proposed to capture “self-similar” structures of networks. These models are completely different in their structure and the way they are generated: Ravasz and Barabasi: based on the relation between degree and clustering coefficient: Dorogovtsev et al.: power-law degree dist. With exponent: Chen et al.: zero clustering coefficient Zhongzhi Zhang et al.: based on the relation between the exponent of degree dist. and strength dist.:

“Self-similarity” But what is “self-similarity”? Is it a good characterization for networks? How can it distinguish networks with different generative models from each other? Among the real networks, Internet (at AS level) and protein interaction network are two popular ones known to have “self-similar” structure. But is “self-similarity” a sufficient characterization to describe both networks with? How their differences can be captured?

Multivariate Heavy Tails Distinctive characterization for real networks MRV provides better characterization of real network structures

Multivariate Heavy Tails a distinctive characterization for networks Based on the MRV results, we suggest: Ravasz’s model as a good generative model for PIN dataset (MRV of deg-CCI) Dorogovtsev’s model to be a better fit for Internet AS dataset (MRV of deg-CCI and deg-deg) We speculate that networks with the same MRVs tend to have similar structures and perhaps undergo similar formation process. Ravasz’s model Dorogovtsev et al.’s model By studying different MRVs (using joint distributions of different metrics), we are able to better reveal and distinguish diverse and complex network structures!

Quest for Multivariate “Heavy Tails” in Complex Networks: Two-Pronged Approach Two intertwined threads of research: Empirical network analysis of real-world networks & simulated datasets generated from (generative) network models Develop (visual) analysis tools that are amenable for network data analysis a new “visual” bivariate heavy tail analysis tool using a sequence of log-log plots (similar to univariate log-log plot) Search and devise metrics of interest, compute & plot them to look for (univariate & multivariate) “heavy tails” in network data Compare real networks w/ simulated data for structural similarity analysis Constructive, analytical methods to develop “generative” models for growing networks with multivariate heavy tails explicit construction of growing networks with “interesting” properties analytical models & proofs for existence of (multivariate) “heavy-tails” Q: structural properties (e.g., “hierarchical structure”) thus constructed “essential” or “universal” for networks with multivariate “heavy tails”?

Growing Networks w/ Multivariate Heavy Tails A General (Deterministic) Recipe Quick recap of the approach & research started last year Basic idea: instead of adding one node/edge at a time, growing at network level, via a recursive construction starting with a base graph G1 properties of G1 & construction process  whether/what MRVs emerge Basic Recursive Recipe for “Self-Similar” Growing Networks Step 1 (t=1): start with a base network G1 Step t (t=2, …): take c replicas of the network, Gt-1, constructed at step t-1, and “glue” them through a sequence of basic operations: node identification: select one node from two replicas, identify them edge addition: add an edge between two replicas Depending on metrics of interest, we can generate “self-similar” growing networks with certain interesting structural properties e.g., growing networks with “heavy-tailed” distance & betweenness metrics, “heavy-tailed” degree & clustering coefficient distr., …

“Toy” Examples: Sierpinksi Gaskets & Variants Sierpinksi gaskets as a family of growing networks start with a 3-node triangle (a triad) self-similar structure: from t to t+1: replicate itself three times, and “merge” at two of the three corner nodes O1 O2 O3 t =3 O1 O2 O3 t =2 O1 O2 O3 t =1 O1 O2 O3 g1 g2 g3 Properties of Sierpinski Gasket of order t can be derived analytically thru recursion ….. …… Sierpinksi gaskets of order t

Variants of Sierpinksi Gasket Network Models From the basic Sierpinksi gasket network construction, we can also construct growing networks with heavy-tail degrees/cluster coeff.’s start with a 3-node triangle, but at step t (t=2, …), introduce edge additions step t: take three replicas of the network at step t-1, “merge” at two of the three corner nodes; then add edges from o1 to all nodes on the bottom line t =2 O1 O2 O3 O1 O2 O3 t =1 t =3 O1 O2 O3 t =2 O1 O2 O3 MRV in joint degree, CCI and distance metrics O1 O2 O3 t =1 t =3 O1 O2 O3 ……

Key Observations/Insights: Whence come (Multivariate) “Heavy Tails”? We need to ensure the following properties hold: Emergence of (univariate & multivariate) “heavy tails” Justification: “Characterization of Multivariate Heavy-tailed Distribution Families via Copula” by Chengguo Weng & Yi Zhang

Generalized Deterministic (Probabilistic) Recipes for Growing Networks w/ (Multivariate) Heavy Tails Generalized Recursive Recipe for “Hierarchical” Growing Networks: node identification: select one node from two replicas, identify them edge addition: add an edge between two replicas Metrics of interest determine how node identification and edge addition operations should be performed operations e.g., using “preferential attachment”  power law degree distr. step t from step t-1: conditionally independent; we can derive system “evolutional” equations for metrics of interest recursively -- can also be applied to generate directed or signed growing networks

Quest for Generative Graph Models for Complex Networks w/ MRVs The general (deterministic) recursive recipe for growing networks: Advantages: flexible and full control of construction process to obtain desired MRVs, useful for generating synthetic network datasets for numerical analysis analytical results straightforward through recursion Disadvantages: hard to handle once we go from deterministic to probabilistic versions analytical results specific to each explicit construction, hard to use to match against real-world network datasets Current ongoing research: (stochastic) Kronecker product networks A specific (& mathematically well-defined) framework for recursive construction of families of growing network models based on Kronecker product of (adjacency) matrices  tensor algebra deterministic Kronecker product graphs has a long history (since 60s) stochastic Kronecker product graphs: proposed by Leskovec et al ‘05 claimed to match (average & marginal) statistics of many real networks

(Deterministic) Kronecker Graphs Definition Recall: Kronecker (tensor) product of matrices:

Stochastic Kronecker Graphs Lesskovec et al: focus on Kronecker power graphs esp. with a 2-node base graph (thus a 2x2 base matrix with four parameters) developed ML method for parameter estimation fitted to a number of real world network datasets And compared (average & marginal) statistics of real networks with generated Kronecker power graphs Mathematically proved that (evolution of) Kronecker power graphs satisfy “edge densification” laws: have bounded (effective) network diameters

Kronecker Graphs and MRVs? Kronecker power graphs with 2x2 base graph can help explain & distinguish core-periphery and hierarchical (“community”) structures with power-law degree distribution with different b & c values, degree-degree MRV may also emerge core-periphery: with jerry-fish or onion-like structures

Kronecker Graphs and MRVs? Kronecker power graphs with 2x2 base graph can help explain & distinguish core-periphery and hierarchical (“community”) structures with power-law degree distribution with different b & c values, degree-degree MRV may also emerge a d c b

Kronecker Graphs and MRVs? Kronecker power graphs with 2x2 base graph can help explain & distinguish core-periphery and hierarchical (“community”) structures with power-law degree distribution with different b & c values, degree-degree MRV may also emerge a d c b

Kronecker Graphs and MRVs? Kronecker power graphs with 2x2 base graph can help explain & distinguish core-periphery and hierarchical (“community”) structures with power-law degree distribution with different b & c values, degree-degree MRV may also emerge a d c b

Kronecker Graphs and MRVs? Kronecker power graphs with 2x2 base graph can help explain & distinguish core-periphery and hierarchical (“community”) structures with power-law degree distribution with different b & c values, degree-degree MRV may also emerge but we have difficulty in obtaining power-law degree distribution! base graph with communities

Knonecker Graphs & MRVs Ongoing and Future Work Investigating Kronecker product graphs of the form: matrix A: replicate the base graph (and kth-step graph) and add additional edges to inter-connect the replicas => mirror our earlier general recipe both B and A determine what MRVs may emerge Tuning the parameters, able to “regenerate” previously mentioned models e.g., connection between Ravasz & Barabasi’s model & base star shape graph We are studying what kind of base graphs with what properties result in different MRVs  provide insight to & interpretation of structure formation Examples: star-shape base graph results in deg-CCI, and triangle-shape base graph results in deg-deg MRV Attempt to analytically establish stochastic Kronecker network models with MRVs, and quantify the conditions under which they emerge introduce node identification via “quotient” (or projection) operator?

Summary: Quest for Multivariate “Heavy Tails” in Complex Networks: Two-Pronged Approach Two intertwined threads of research: Empirical network analysis of real-world networks & simulated datasets generated from (generative) network models Develop (visual) analysis tools that are amenable for network data analysis a new “visual” bivariate heavy tail analysis tool using a sequence of log-log plots (similar to univariate log-log plot) Search and devise metrics of interest, compute & plot them to look for (univariate & multivariate) “heavy tails” in network data Compare real networks w/ simulated data for structural similarity analysis Constructive, analytical methods to develop “generative” models for growing networks with multivariate heavy tails explicit construction of growing networks with “interesting” properties analytical models & proofs for existence of (multivariate) “heavy-tails” Q: structural properties (e.g., “hierarchical structure”) thus constructed “essential” or “universal” for networks with multivariate “heavy tails”?

Thank You! Questions?