Information Theory of Complex Networks: on evolution and architectural constraints
  Ricard V. Solè and Sergi Valverde
  Prepared by Amaç Herdağdelen
Introduction
  Complex systems as complex networks of interactions
    Metabolic networks, software class diagrams, electronic circuits
  Describing complex networks by quantitative measures:
    Degree distribution (exponential, power law, “normal”)
    Statistical properties (average degree, clustering, diameter)
Problem
  The space of possible networks is much more complex
  Average statistics fail to capture all essential features and provide little insight
  Additional measures are needed to analyze and classify complex networks
Possible Measures
  Heterogeneity: How heterogeneous are the nodes (based on degree)?
  Randomness: Is there an underlying order?
  Modularity: Is there a hierarchical organization?
Zoo of Complex Networks
Notation
  G = (V, E): classical graph representation
  k(i): degree of node i
  P(k): degree distribution (as a probability, summing to 1)
  q(k): “remaining degree” distribution. Choose a random edge and follow it to one of its end nodes; q(k) is the probability that this node has degree k + 1, i.e. k edges other than the chosen one.
  <k>: average degree
Notation
  q(k) = [(k + 1) * P(k + 1)] / <k>
  [Table: worked example of P(k) (out) and q(k) (out) for k = 1..8. Surviving entries: 0.5 = 8/16, 0.89 = 8/9, 0.11 = 1/9]
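The formula q(k) = [(k + 1) * P(k + 1)] / <k> can be checked on a small degree sequence. The sketch below is illustrative only: the function names are hypothetical, and the star graph (one hub of degree 8, eight leaves of degree 1) is an assumed example, chosen because it yields the same fractions 8/9, 1/9 and 8/16 that appear in the example above. The graph actually used on the slide is not recoverable.

```python
from collections import Counter
from fractions import Fraction

def degree_distribution(degrees):
    """P(k): fraction of nodes that have degree k."""
    n = len(degrees)
    return {k: Fraction(c, n) for k, c in Counter(degrees).items()}

def remaining_degree_distribution(degrees):
    """q(k) = (k + 1) * P(k + 1) / <k>, computed from a degree sequence."""
    p = degree_distribution(degrees)
    mean_k = sum(k * pk for k, pk in p.items())  # <k>
    # A node of degree k contributes to remaining degree k - 1.
    return {k - 1: k * pk / mean_k for k, pk in p.items() if k > 0}

# Assumed example: star graph with one hub (degree 8) and 8 leaves (degree 1).
star = [8] + [1] * 8
p = degree_distribution(star)            # P(1) = 8/9, P(8) = 1/9
q = remaining_degree_distribution(star)  # q(0) = 8/16, q(7) = 8/16
```

Exact fractions are used so that the values can be compared directly with hand calculations.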
Degree vs. Remaining Degree
  [Figure: classical degree vs. remaining degree. Panel labels: Random Graph; Evenly distributed degrees]
An Example Measure
  Assortative Mixing (AM)
    High degree nodes tend to link to high degree nodes
    Found in social networks
  Disassortative Mixing (DM)
    The reverse: high degree nodes tend to link to low degree nodes
    Found in biological networks
An Example Measure
  qc(i,j): the probability that a randomly chosen edge connects two nodes with “remaining degrees” i and j
  With no assortativeness (no AM/DM): qc(i,j) = q(i) * q(j) (the two degrees are independent)
  Assortativeness measure r:
    Related to the value E(qc(i,j)) – E(q(i) * q(j))
    Normalized such that -1 <= r <= 1
    -1 means highly DM, +1 means highly AM
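One common way to obtain such a normalized measure is the Pearson correlation between the remaining degrees at the two ends of an edge. The sketch below assumes that definition (the slide only says r is "related to" the difference above). The function name, the edge-list input format and the convention of returning 0.0 when all remaining degrees are equal are assumptions made for illustration.

```python
from collections import Counter

def assortativity(edges):
    """Pearson correlation of remaining degrees across undirected edges."""
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    # Count every edge in both directions so the measure is symmetric.
    xs, ys = [], []
    for u, v in edges:
        a, b = degree[u] - 1, degree[v] - 1
        xs += [a, b]
        ys += [b, a]
    n = len(xs)
    mean = sum(xs) / n  # xs and ys share the same mean by symmetry
    cov = sum((x - mean) * (y - mean) for x, y in zip(xs, ys)) / n
    var = sum((x - mean) ** 2 for x in xs) / n
    if var == 0:
        return 0.0  # correlation undefined; 0.0 is an assumed convention
    return cov / var

# Star graph: the hub links only to leaves, so mixing is fully disassortative.
star_edges = [(0, i) for i in range(1, 9)]
r_star = assortativity(star_edges)

# A triangle plus a separate single edge: every edge joins equal degrees.
mixed_edges = [(0, 1), (1, 2), (2, 0), (3, 4)]
r_mixed = assortativity(mixed_edges)
```

The star gives r = -1 and the second graph gives r = +1, the two extremes of the scale.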
An Example Measure
  Given an edge between nodes a and b with q(a) = i, what is q(b)?
    High AM (r > 0): with high probability ~i
    High DM (r < 0): with high probability different from i (either higher or lower)
    No AM/DM (r = 0): no conclusion can be drawn
Entropy and Information
  Entropy is defined in several domains. The relevant ones are:
    Thermodynamic Entropy (Clausius): a measure of the amount of energy in a physical system which cannot be used to do work
    Statistical Entropy (Boltzmann): a measure of how ordered a system is
    Information Entropy (Shannon): a measure of how random a signal or random event is
Information
  The information of a message is a measure of the decrease of uncertainty at the receiver:
    Before the message: Sender (M = 5), Receiver M = ? (1? 2? .. 100?)
    Message: [M = 5]
    After the message: Sender (M = 5), Receiver M = 5!
Information Entropy
  The more uncertainty, the more information
  Let x be the result of a coin toss (x = H or x = T)
    Unbiased coin (P(H)=½, P(T)=½): x carries 1 bit of information (knowing x gives 1 bit of information)
    Biased coin (P(H)=0.9, P(T)=0.1): x does not contain that much information (the decrease of uncertainty at the receiver is low; compare it with the possible values for M in the previous example)
  The more uncertain (random) a message is to the outsider, the more information it carries!
Information Entropy
  Information ~ Uncertainty, and information entropy is a measure of the randomness of an event
  Entropy = information carried by an event
    High entropy corresponds to more informative, random events
    Low entropy corresponds to less informative, ordered events
  Consider Turkish. A Turkish text of 10 letters does not contain “10 letters of information” (try your favorite compression algorithm on a Turkish text; for English it is found that a letter carries about 1.5 bits of information)
Information Entropy
  Formally: H(x) = - Σ p(i) * log p(i), summed over i = 1..n
    H(x): entropy of an event x (e.g. a message)
    i = [1..n]: all possible outcomes for x
    p(i): probability that the i-th outcome will occur
  The more random the event (the closer the probabilities are to equal), the higher the entropy
  Highest possible entropy = log(n)
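A minimal sketch of this calculation, assuming the standard Shannon form H = -Σ p(i) log2 p(i) with base-2 logarithms so that the result is in bits. The function and variable names are illustrative. It reproduces the two coins from the earlier slide and the maximum log(n) for a uniform distribution.

```python
from math import log2

def entropy(probs):
    """Shannon entropy in bits; outcomes with probability 0 contribute nothing."""
    return -sum(p * log2(p) for p in probs if p > 0)

fair_coin = entropy([0.5, 0.5])    # exactly 1 bit
biased_coin = entropy([0.9, 0.1])  # about 0.469 bits
uniform_10 = entropy([0.1] * 10)   # log2(10), about 3.3219 bits
```

The biased coin carries less than half a bit per toss, which is the quantitative version of "x does not contain that much information".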
Information Entropy
  [Figure: entropy H(X) vs. Pr(X = 1) for a Bernoulli trial (X = {0,1})]
  The highest value, H(X) = 1 = log2(2), is reached at Pr(X = 1) = ½
Example Entropy Calculations
  [Figure: four example distributions with H = 3.3219, H = 3.1036, H = 2.7251, H = 1.2764]
So What? Any questions so far?
So What?
  Apply information theory, with entropy as a measure of the “orderedness” of a graph
  Remember assortativeness? It is a measure of correlation, but it only works when there is a linear relation between the two variables (the remaining degrees i and j in qc(i,j))
  Mutual information between two variables is a more general measure which also captures non-linear relations
    When I know about X, how much do I know about Y?
Measures
  (Network Entropy) Heterogeneity of the degrees of nodes: H(q)
  (Noise) Entropy of the probability distribution of observing a node with remaining degree k, given that the node at the other end of the chosen edge has k’ leaving edges: H(q|q’)
  (Information Transfer) Mutual information between the degrees of two neighboring nodes: H(q) - H(q|q’)
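The three measures can be estimated from an edge list by building the joint distribution qc of remaining degrees at the two ends of an edge. The sketch below is one possible implementation, not necessarily the authors' procedure. It assumes an undirected graph, counts each edge in both directions, and uses the identities H(q|q') = H(q, q') - H(q) and information transfer = H(q) - H(q|q'). All names are illustrative.

```python
from collections import Counter
from math import log2

def entropy(probs):
    """Shannon entropy in bits of an iterable of probabilities."""
    return -sum(p * log2(p) for p in probs if p > 0)

def network_measures(edges):
    """Return (network entropy H(q), noise H(q|q'), information transfer)."""
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    # Joint counts of remaining degrees, each edge taken in both directions.
    joint = Counter()
    for u, v in edges:
        a, b = degree[u] - 1, degree[v] - 1
        joint[(a, b)] += 1
        joint[(b, a)] += 1
    total = 2 * len(edges)
    # Marginal q(k) obtained by summing the joint distribution over one end.
    marginal = Counter()
    for (a, _), count in joint.items():
        marginal[a] += count
    h = entropy(c / total for c in marginal.values())
    h_joint = entropy(c / total for c in joint.values())
    noise = h_joint - h      # H(q|q')
    transfer = h - noise     # mutual information between neighbor degrees
    return h, noise, transfer

# Star graph: knowing one end of an edge fully determines the other end.
star_edges = [(0, i) for i in range(1, 9)]
h, noise, transfer = network_measures(star_edges)
```

For the star graph the noise is 0 and the information transfer equals the full network entropy of 1 bit, the opposite extreme from the uncorrelated case where transfer is 0.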
Results
  [Figure: noise versus network entropy. The line consists of the points where information transfer is 0 (H(q) = H(q|q’))]
Results
  Low information transfer means that knowing the degree of a node does not tell us much about the degrees of its neighbors: small assortativeness
  It looks like many (if not all) complex networks are heterogeneous (high entropy) and have low degree correlations
  Are degree correlations irrelevant? Or are they non-existent for some reason?
Results
  Maybe there is a selective pressure that favors networks with a heterogeneous distribution and low assortativeness once a complexity limit is reached
  A Monte Carlo search by simulated annealing is performed; the evidence it provides suggests this is NOT the case
Monte Carlo Search
  The search is done in the multi-dimensional space of all networks with N nodes and E edges
  2-dimensional parameter space for networks: H (entropy) and Hc (noise)
  For every random sample, find the corresponding point (H, Hc)
  Perform a Monte Carlo search to minimize a potential function ε(Ω), the error of a candidate graph Ω
  From the error ε(Ω) we can calculate a likelihood for Ω. This value measures how likely it is to reach Ω from a given random point.
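A rough sketch of such a search is given below. The potential function itself is not reproduced in the text above, so the Euclidean distance between a candidate's (H, Hc) and a target point is an assumption, as are the single-edge rewiring move, the geometric cooling schedule and every function and parameter name. It illustrates simulated annealing over graphs with a fixed number of nodes and edges, not the authors' exact procedure, and it requires fewer edges than node pairs so that a free pair always exists.

```python
import math
import random
from collections import Counter

def _entropy(probs):
    return -sum(p * math.log2(p) for p in probs if p > 0)

def measures(edges):
    """Return (H, Hc): remaining-degree entropy and noise H(q|q')."""
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    joint = Counter()
    for u, v in edges:
        a, b = degree[u] - 1, degree[v] - 1
        joint[(a, b)] += 1
        joint[(b, a)] += 1
    total = 2 * len(edges)
    marginal = Counter()
    for (a, _), count in joint.items():
        marginal[a] += count
    h = _entropy(c / total for c in marginal.values())
    h_joint = _entropy(c / total for c in joint.values())
    return h, h_joint - h

def error(edges, target):
    """Assumed potential: Euclidean distance to the target point (H, Hc)."""
    h, hc = measures(edges)
    return math.hypot(h - target[0], hc - target[1])

def rewire(edges, n_nodes, rng):
    """Replace one randomly chosen edge with a currently unused node pair."""
    edges = list(edges)
    present = {frozenset(e) for e in edges}
    while True:
        u, v = rng.sample(range(n_nodes), 2)
        if frozenset((u, v)) not in present:
            break
    edges[rng.randrange(len(edges))] = (u, v)
    return edges

def anneal(n_nodes, n_edges, target, steps=2000, t0=0.5, cooling=0.995, seed=0):
    """Simulated annealing from a random graph; returns (best_edges, best_error)."""
    rng = random.Random(seed)
    all_pairs = [(u, v) for u in range(n_nodes) for v in range(u + 1, n_nodes)]
    edges = rng.sample(all_pairs, n_edges)
    err = error(edges, target)
    best_edges, best_err = edges, err
    temp = t0
    for _ in range(steps):
        cand = rewire(edges, n_nodes, rng)
        cand_err = error(cand, target)
        # Metropolis rule: always accept improvements, sometimes accept worse moves.
        if cand_err <= err or rng.random() < math.exp((err - cand_err) / temp):
            edges, err = cand, cand_err
            if err < best_err:
                best_edges, best_err = edges, err
        temp *= cooling
    return best_edges, best_err
```

Calling `anneal(12, 15, target=(2.0, 1.5))` returns the closest graph found and its residual error. A large residual error for a given target suggests that this point of the (H, Hc) plane is hard to reach by the search.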
Results
  The candidate graphs that occupy the same region as the observed real graphs appeared as the most likely graphs
  Note that for a very large portion of the theoretically possible network space it is almost impossible to obtain graphs located in that area
  The area with high likelihood is where the scale-free networks reside
Discussion
  The authors claim that the observed lack of degree correlation and high heterogeneity is not a result of adaptation or parameter selection but a result of higher-level limitations (!) on network architectures
  Without assuming a particular network growth model, they showed that only a very specific domain of all possible networks is attainable by an optimization algorithm; outside this domain it is not feasible to find graphs that satisfy the complexity constraints
  These results and this formulation might be a step towards explaining why such different networks, operating and evolving under such different conditions, have many common properties
Thank you for your attention ?
"In many different domains with different constraints, the systems usually end up with networks that fall in the specific area we found in the Monte Carlo simulations. This is not only because of some evolutionary (or other) constraints which favor the networks in that area but also because most of the networks actually reside in that area. That is, even if there were a constraint in the system that favored networks whose entropy/noise values fall outside of the mentioned area, the system would be unsuccessful most of the time in its search/evolution/development for such a network (just as our Monte Carlo search was)."