Statistical physics of complex networks

Statistical physics of complex networks Sergei Maslov Brookhaven National Laboratory

Short history: complex systems before and after networks. Statistical physics of complex systems was active in the 80s-90s (following the chaos boom of the 70s): fractals (Mandelbrot and many others); Self-Organized Criticality (Per Bak and co-authors), sandpiles → granular systems; complex == multiple time and length scales (e.g. avalanches) → cult of power laws; cellular automata (mostly in real space + time). Examples: earthquakes, disordered moving interfaces, (co-)evolution of species, agent-based modeling (“ants”). By the end of the 90s: breakup of the community and specialization into biology, economics and finance, the Internet, and social sciences.

Networks in complex systems. A large number of components interact with each other. All components and/or interactions are different from each other (unlike in traditional physics, where $10^{23}$ electrons are all the same!). Paradigms: $10^{4}$ types of proteins in an organism, $10^{6}$ routers in the Internet, $10^{9}$ web pages in the WWW, $10^{11}$ neurons in a human brain. The simplest property, who interacts with whom, can be visualized as a network. Complex networks are just a backbone for complex dynamical processes.

Why study the topology of complex networks? Lots of easily available data: that’s where the state-of-the-art information is (at least in biology). Large networks may contain information about basic design principles and/or the evolutionary history of the complex system. This is similar to paleontology: learning about an animal from its backbone.

Inside single cells

A small part of a metabolic network: the citric acid cycle

Metabolic pathway chart by ExPASy

Protein binding networks. Baker’s yeast S. cerevisiae (only nuclear proteins shown); nematode worm C. elegans.

Transcription regulatory networks. Single-celled eukaryote: S. cerevisiae. Bacterium: E. coli.

Layers of cellular networks: GENOME (protein-gene interactions), PROTEOME (protein-protein interactions), METABOLISM (biochemical reactions). Slide after Réka Albert.

Between cells in a multi-cellular organism

Sea urchin embryonic development (endomesoderm up to 30 hours) by Davidson’s lab

C. elegans neurons

Between organisms

Freshwater food web by Neo Martinez and Richard Williams

Sexual contacts: M. E. J. Newman, The structure and function of complex networks, SIAM Review 45, 167-256 (2003).

Social

High school dating: data drawn from Peter S. Bearman, James Moody, and Katherine Stovel, visualized by Mark Newman.

Network of actors co-starring in movies

Networks of scientists’ co-authorship of papers

Webpages connected by hyperlinks on the AT&T website circa 1996, visualized by Mark Newman. Citation networks are similar to the WWW but time-ordered.

Technological

Internet as measured by Hal Burch and Bill Cheswick's Internet Mapping Project.

transportation networks: airlines

transportation networks: railway maps (Tokyo rail map)

Lecture 1: General introduction to networks. Node degrees, their distribution, and correlations; simple models: preferential attachment and the Simon model; growth model for protein families; percolation transition on networks; clustering coefficient.
Lectures 2-3: Biomolecular (mostly protein) networks. Regulatory and signaling networks: how many regulators? Bureaucratic collapse; network motifs in directed (e.g. regulatory) networks. Protein binding networks: broad degree distributions in protein binding networks and possible explanations: evolutionary (duplication-divergence), biophysical (stickiness), functional. Beyond degree distributions, how is it all wired together: correlations in degrees; randomization of networks; Law of Mass Action and propagation of perturbations.
Lecture 4: Technological and information networks. Diffusion and modules in the Internet, WWW, and scientific citations; predicting opinions of customers on products (e.g. movies) using knowledge networks.

Degree (or connectivity) of a node: the number of neighbors. Examples shown: degree K=2 and degree K=4.

Directed networks have in- and out-degrees. Example shown: in-degree $K_{in}=2$, out-degree $K_{out}=5$.

Degree distributions in random and real networks

Degree distribution in a random network: Poisson distribution. Randomly throw E edges among N nodes: Solomonoff, Rapoport, Bull. Math. Biophysics (1951); Erdős-Rényi (1960). Degree distribution: binomial → Poisson, with $K \sim \langle K\rangle$ and no hubs (fast decay of N(K)).
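The "randomly throw E edges among N nodes" recipe can be tried out directly. The sketch below is only an illustration: the function names, the network size (2000 nodes, 4000 edges) and the choice to discard repeated edges are assumptions, not part of the lecture. It compares the resulting histogram N(K) with a Poisson distribution of the same mean.

```python
import random
from collections import Counter
from math import exp, factorial

def random_graph_degrees(n_nodes, n_edges, seed=0):
    """Throw n_edges distinct edges at random among n_nodes nodes; return the degree list."""
    rng = random.Random(seed)
    edges = set()
    while len(edges) < n_edges:
        a, b = rng.sample(range(n_nodes), 2)   # two distinct endpoints, so no self-loops
        edges.add((min(a, b), max(a, b)))      # the set silently drops repeated edges
    degree = [0] * n_nodes
    for a, b in edges:
        degree[a] += 1
        degree[b] += 1
    return degree

def poisson(k, mean):
    """Poisson probability of observing degree k for a given mean degree."""
    return exp(-mean) * mean ** k / factorial(k)

degrees = random_graph_degrees(n_nodes=2000, n_edges=4000)
mean_k = sum(degrees) / len(degrees)           # equals 2E/N
histogram = Counter(degrees)
for k in range(11):
    # columns: degree, observed N(K), Poisson expectation
    print(k, histogram.get(k, 0), round(len(degrees) * poisson(k, mean_k), 1))
```

The largest degree stays within a few standard deviations of the mean, which is the "no hubs" statement in numerical form.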

Degree distribution in a real protein binding network. The histogram N(K) is broad: most nodes have low degree ~1, few nodes have high degree ~100. It can be approximately fitted with the functional form $N(K) \sim K^{-\gamma}$ with $\gamma \approx 2.5$.

Many real-world networks have broad degree distributions.
network                 exponent γ
film actors             2.3
telephone call graph    2.1
email networks          1.5/2.0
sexual contacts         3.2
WWW                     2.3/2.7
internet                2.5
peer-to-peer
metabolic network       2.2
protein interactions    2.4

Basic BA model: a very simple algorithm to implement. Start with an initial set of m0 fully connected nodes, e.g. m0 = 3 (nodes 1, 2, 3). Now add new vertices one by one, each one with exactly m edges. Each new edge connects to an existing vertex in proportion to the number of edges that vertex already has → preferential attachment. It is easiest if you keep track of edge endpoints in one large array and select an element from this array at random: the probability of selecting any one vertex will be proportional to the number of times it appears in the array, which corresponds to its degree. Example array: 1 1 2 2 2 3 3 4 5 6 6 7 8 ….

Generating BA graphs, cont’d. Initial array for nodes 1, 2, 3: 1 1 2 2 3 3. To start, each vertex has an equal number of edges (2), so the probability of choosing any vertex is 1/3. We add a new vertex, and it will have m edges; here take m=2. Draw 2 random elements from the array; suppose they are 2 and 3. The array becomes 1 1 2 2 2 3 3 3 4 4, and the probabilities of selecting 1, 2, 3, or 4 are now 1/5, 3/10, 3/10, 1/5. Add a new vertex, draw vertices for it to connect to from the array, etc. For example, after vertex 5 connects to 3 and 4 the array is 1 1 2 2 2 3 3 3 3 4 4 4 5 5.
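The endpoint-array trick described above can be sketched in a few lines. This is a minimal illustration, not the original authors' code: the function name, the 0-based node labels, and the decision to redraw until the m targets are distinct (so that no multiple edges arise) are assumptions.

```python
import random
from collections import Counter

def barabasi_albert(n_nodes, m=2, m0=3, seed=0):
    """Grow a graph by preferential attachment using one array of edge endpoints.

    Nodes are labeled 0..n_nodes-1 (the slides label them from 1). Requires m <= m0.
    """
    rng = random.Random(seed)
    # initial set of m0 fully connected nodes
    edges = [(i, j) for i in range(m0) for j in range(i + 1, m0)]
    endpoints = [v for e in edges for v in e]   # each node appears once per edge it has
    for new in range(m0, n_nodes):
        targets = set()
        while len(targets) < m:                 # assumption: redraw to keep targets distinct
            targets.add(rng.choice(endpoints))  # chosen with probability proportional to degree
        for t in targets:
            edges.append((new, t))
            endpoints.extend((new, t))
    return edges, endpoints

edges, endpoints = barabasi_albert(n_nodes=1000)
degree = Counter(endpoints)                     # degree = number of appearances in the array
print(degree.most_common(5))                    # the oldest nodes tend to become hubs
```

Drawing the targets before appending the new node to the array means a new vertex never attaches to itself.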

The tale of linear vs exponential growth. Linear growth: the Barabási-Albert model with $\gamma=3$ is a version of Simon’s word usage model, for which $\gamma=2+\alpha$: $dn_k/dt = (k-1)\,n_{k-1}/(t+\alpha t) - k\,n_k/(t+\alpha t)$. Exponential growth: protein duplication-deletion model, for which $\gamma=2+\alpha/(r_{dup}-r_{del})$: $dn_k/dt = r_{dup}(k-1)\,n_{k-1} - (r_{dup}+r_{del})\,k\,n_k + r_{del}(k+1)\,n_{k+1}$; $N_F=\sum_k n_k$ also grows exponentially: $dN_F/dt = \alpha N_G = \alpha\sum_k k\,n_k$.

Preferential attachment with fitness: Bianconi-Barabási (2001). The attractiveness of node $i$ to new edges is given by $f_i k_i/\sum_r f_r k_r$. For uniform $\rho(f)$: $P(k) \sim k^{-(1+C^*)}/\ln k$, where $C^*=1.255$. Generally $C$ depends on $\rho(f)$. Some $\rho(f)$ result in “Bose-Einstein condensation”, in which super-hubs emerge.

Percolation transition in networks

Why should we care? Percolation is the most important property of a network: it quantifies how broken up a network is. Below the percolation threshold: many small components. At the percolation threshold: scale-free distribution of component sizes, $P(S) \sim S^{-2.5}$. Above the percolation threshold: a giant connected component and a few small ones. It determines the propagation of perturbations which affect neighbors with probability p (e.g. infections).

Naïve (and wrong) argument. An average node has $\langle K\rangle$ first neighbors, $\langle K\rangle\langle K-1\rangle$ second neighbors, $\langle K\rangle\langle K-1\rangle^2$ third neighbors. We neglect overlap between e.g. second and first neighbors: in random networks this is a small effect, ~1/N. If $\langle K-1\rangle \ge 1$, a single node is connected to a finite fraction of all nodes in the network.

Where is it wrong? The probability of arriving at a node with K neighbors is proportional to K. All averages have to be modified: $\langle F(K)\rangle \to \langle F(K)\,K\rangle/\langle K\rangle$. The right answer: if $\langle K(K-1)\rangle/\langle K\rangle \ge 1$, a perturbation would spread. In directed networks the condition is $\langle K_{in}K_{out}\rangle/\langle K_{in}\rangle \ge 1$. Correlations between degrees of neighbors and an abnormally large number of triangles (clustering) would affect the answer.

How many clusters? If $\langle K(K-1)\rangle/\langle K\rangle \ll 1$ there are only small clusters. If $\langle K(K-1)\rangle/\langle K\rangle \approx 1$ cluster sizes S have a scale-free distribution: $P(S) \sim S^{-2.5}$. If $\langle K(K-1)\rangle/\langle K\rangle \gg 1$ there is one “giant” cluster and a few small ones. A perturbation which affects neighbors with probability p propagates if $p\,\langle K(K-1)\rangle/\langle K\rangle \ge 1$. For scale-free networks $P(K) \sim K^{-\gamma}$ with $\gamma<3$, $\langle K^2\rangle=\infty$ → a perturbation always spreads in a large enough network.
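The difference between the naïve criterion and the corrected one is easy to see numerically. The sketch below assumes the network is described only by its list of node degrees (no degree correlations, no clustering); the function names and the star-like example are illustrative choices.

```python
def branching_ratio(degrees):
    """<K(K-1)>/<K>: mean number of further neighbors reached by following a random edge."""
    n = len(degrees)
    mean_k = sum(degrees) / n
    mean_k_k_minus_1 = sum(k * (k - 1) for k in degrees) / n
    return mean_k_k_minus_1 / mean_k

def perturbation_spreads(degrees, p=1.0):
    """Spreading criterion p <K(K-1)>/<K> >= 1 for transmission probability p."""
    return p * branching_ratio(degrees) >= 1.0

# One hub with 100 neighbors plus 100 nodes of degree 1.
star_like = [100] + [1] * 100
naive = sum(star_like) / len(star_like) - 1     # <K-1>, the naive estimate
print(round(naive, 3))                          # below 1: the naive argument predicts no spreading
print(branching_ratio(star_like))               # far above 1: the hub dominates the edge-weighted average
print(perturbation_spreads(star_like, p=0.01))  # False: a small enough p still stops the spread
```

Because each node is weighted by its degree, a few hubs are enough to push the ratio above the threshold.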

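Diameter and mean cluster size are determined by $\kappa=\langle k(k-1)\rangle/\langle k\rangle$. Mean diameter L: $1+\langle k\rangle+\langle k\rangle\kappa+\dots+\langle k\rangle\kappa^{L-1}=N \Rightarrow L \approx \log(N/\langle k\rangle)/\log\kappa+1$. Mean cluster size below $p_c$: $\langle S\rangle=1+\langle k\rangle/(1-\kappa)$.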
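To get a feel for the numbers, the two estimates can be evaluated directly. This sketch assumes the shorthand kappa = <k(k-1)>/<k> and keeps only the dominant term of the neighbor count, so L ≈ log(N/<k>)/log(kappa) + 1; the function names and the example values are illustrative assumptions.

```python
from math import log

def mean_diameter(n_nodes, mean_k, kappa):
    """L ~ log(N/<k>)/log(kappa) + 1, valid above the percolation threshold (kappa > 1)."""
    if kappa <= 1:
        raise ValueError("the estimate needs kappa > 1")
    return log(n_nodes / mean_k) / log(kappa) + 1

def mean_cluster_size(mean_k, kappa):
    """<S> = 1 + <k>/(1 - kappa), valid below the percolation threshold (kappa < 1)."""
    if kappa >= 1:
        raise ValueError("the estimate needs kappa < 1")
    return 1 + mean_k / (1 - kappa)

# A million nodes with <k> = kappa = 10: the shells fill the network in about six steps.
print(mean_diameter(1e6, 10, 10))
# A sparse Poisson network, where kappa equals <k>: clusters stay small.
print(mean_cluster_size(0.5, 0.5))
```

The logarithmic dependence on N is the small-world effect; the divergence of the cluster size as kappa approaches 1 marks the percolation threshold.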

Amplification ratios. A(dir): 1.08 (E. coli), 0.58 (yeast). A(undir): 10.5 (E. coli), 13.4 (yeast). A(PPI): ? (E. coli), 26.3 (yeast).

Clustering coefficient C: $C = 3N_\triangle/\sum_k n_k\,k(k-1)/2$, where $N_\triangle$ is the number of triangles and $n_k$ is the number of nodes of degree k. It could be defined for individual nodes or as a function of k: $C(k) = 3N_\triangle(k)/[n_k\,k(k-1)/2]$. C=1 could not be realized if k is heterogeneous. C needs to be compared to its value in randomized networks with the same degree sequence.
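The global clustering coefficient, three times the number of triangles divided by the number of connected triples, can be computed straight from an edge list. The sketch below assumes a simple undirected graph (no self-loops or multiple edges); the function name is illustrative.

```python
from itertools import combinations

def clustering_coefficient(edges):
    """C = 3 * (number of triangles) / sum over nodes of k(k-1)/2."""
    neighbors = {}
    for a, b in edges:
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)
    # number of neighbor pairs around each node: sum_k n_k k(k-1)/2
    triples = sum(len(nb) * (len(nb) - 1) // 2 for nb in neighbors.values())
    # a neighbor pair that is itself linked closes a triangle; every triangle
    # is found once at each of its three corners, so this sum is 3 * N_triangles
    closed = sum(1 for nb in neighbors.values()
                 for u, v in combinations(nb, 2) if v in neighbors[u])
    return closed / triples if triples else 0.0

print(clustering_coefficient([(0, 1), (1, 2), (0, 2)]))          # triangle
print(clustering_coefficient([(0, 1), (1, 2), (0, 2), (2, 3)]))  # triangle with a pendant node
```

As the slide notes, the value is only meaningful relative to a randomized network with the same degree sequence.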

End lecture 1

Lecture 2

Protein networks

Places to learn molecular biology Molecular Biology of the Cell. Fourth Edition. Bruce Alberts, Alexander Johnson, Julian Lewis, Martin Raff, Keith Roberts, Peter Walter. Garland Science. 2002. DNA from the beginning. http://www.dnaftb.org/ Online Biology Book. http://gened.emc.maricopa.edu/bio/bio181/BIOBK/BioBookTOC.html Kimball’s Biology Pages. http://www.ultranet.com/~jkimball/BiologyPages/ Gene expression. http://vlib.org/Science/Cell_Biology/gene_expression.shtml Human Genome Project. http://www.ornl.gov/hgmis/ Microarrays. http://www.gene-chips.com/ From Prof. Michael Hallett (McGill) online lectures

Protein networks. Nodes: proteins. Edges: interactions between proteins. Metabolic (enzymes sharing common metabolites are connected). Physical (binding interactions). Regulatory and signaling (transcriptional regulation, protein modifications). Co-expression networks from microarray data (connect genes with similar expression (abundance) patterns under many conditions). Genetic interactions, e.g. synthetic lethal protein pairs (removal of either one of the two proteins doesn’t kill the cell, but removal of both proteins does). Etc, etc, etc.

Sources of data on protein networks. Genome-wide experiments: binding (two-hybrid (Y2H) and mass-spec (MS) high-throughput techniques); transcriptional regulation (ChIP-on-chip, or ChIP-then-SAGE); expression and disruption networks (microarrays); lethality of genes, including synthetic lethals (gene knockout in yeast; RNAi in worm and fly). Many small or intermediate-scale experiments. All stored in public databases: BIOGRID, DIP, BIND, YPD (no longer public), SGD, Flybase, Ecocyc, etc.

Pathway → network paradigm shift

Inhibition of apoptosis; MAPK signaling. Images from ResNet3.0 by Ariadne Genomics.

Transcription regulatory networks

Transcription factors bind DNA

Activators and repressors. Depending on the position of the binding site (operator) with respect to the RNA-polymerase binding site (promoter), transcription factors can either activate or repress the production of mRNA from a given gene (transcription) and thus affect the abundance of a protein product.

Transcription regulatory networks. Single-celled eukaryote: S. cerevisiae, 3:1 ratio. Bacterium: E. coli, 3:2 ratio.

Sea urchin embryonic development (endomesoderm up to 30 hours) by Davidson’s lab

How many transcriptional regulators are out there?

Fraction of transcriptional regulators in bacteria from Stover et al., Nature (2000)

Figure from Erik van Nimwegen, TIG 2003

Complexity of regulation grows with complexity of organism. $N_R\langle K_{out}\rangle = N\langle K_{in}\rangle$ = number of edges, so $N_R/N = \langle K_{in}\rangle/\langle K_{out}\rangle$ increases with N: $\langle K_{in}\rangle$ grows with N. In bacteria $N_R \sim N^2$ (Stover et al., 2000). In eukaryotes $N_R \sim N^{1.3}$ (van Nimwegen, 2002). Networks in more complex organisms are more interconnected than in simpler ones.

Complexity is manifested in the $K_{in}$ distribution: E. coli vs H. sapiens.

Table from Erik van Nimwegen, TIG 2003

Toolbox model. $N_{TF}=AN^2 \Rightarrow dN_{TF}=2AN\,dN \Rightarrow dN/dN_{TF}=1/(2AN)$. In small genomes there are ~100 genes per TF; in large ones only 4! A toolbox (e.g. metabolic network) grows linearly with N. To handle a new condition ($N_{TF} \to N_{TF}+1$) one needs fewer and fewer new tools. S. Maslov, S. Krishna, K. Sneppen, in preparation.
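The arithmetic behind "100 genes per TF in small genomes, only 4 in large ones" follows from differentiating N_TF = A N², which gives dN/dN_TF = 1/(2AN). In the sketch below the prefactor A and the two genome sizes are illustrative assumptions, chosen only so that the small-genome value comes out at 100; they are not taken from the lecture.

```python
def genes_per_new_tf(n_genes, a):
    """dN/dN_TF = 1/(2*A*N) for the quadratic scaling N_TF = A * N**2."""
    return 1.0 / (2.0 * a * n_genes)

A = 1.25e-5                  # assumed prefactor, tuned so that N = 400 gives 100 genes per new TF
for n_genes in (400, 10_000):
    n_tf = A * n_genes ** 2  # total number of transcription factors at this genome size
    print(n_genes, round(n_tf), round(genes_per_new_tf(n_genes, A), 1))
```

A 25-fold increase in genome size cuts the number of new genes per added regulator by the same factor, from about 100 to about 4.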

How is it all connected? (beyond degree distribution)

What is unusual about the topology of a given network? Look for the number of occurrences of a certain topological pattern and compare with a randomized network. What patterns to look for? The number of edges connecting nodes with given degrees (degree-degree correlations). Motifs: small subgraphs of 3-4 nodes (in undirected networks, clustering or triangles). Overrepresentation: nature needs them for some function. Underrepresentation: they are detrimental and nature avoids them.

How to construct a proper random network?

Randomization of a network: given complex network → random network

Stub reconnection algorithm. Break every edge into two halves (“stubs”). Randomly reconnect the stubs. Watch for multiple edges! For example, in the AS-Internet the two largest hubs would end up being connected by 50 edges (sic!). The algorithm is not adaptable to conserve other low-level topological properties of the network.
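A minimal sketch of stub reconnection, together with a counter for the defects the slide warns about. The function names are illustrative, and the choice to simply report self-loops and multiple edges (rather than repair them) is an assumption made to keep the example short.

```python
import random
from collections import Counter

def stub_reconnect(edges, seed=0):
    """Break every edge into two stubs, shuffle the stubs, and pair them up again."""
    rng = random.Random(seed)
    stubs = [v for e in edges for v in e]
    rng.shuffle(stubs)
    return list(zip(stubs[0::2], stubs[1::2]))  # consecutive stubs form the new edges

def count_defects(edges):
    """Return (number of self-loops, number of surplus parallel edges)."""
    self_loops = sum(1 for a, b in edges if a == b)
    multiplicity = Counter(frozenset(e) for e in edges if e[0] != e[1])
    surplus = sum(c - 1 for c in multiplicity.values())
    return self_loops, surplus

# Two hubs (0 and 1) that each link to the same 20 low-degree nodes.
original = [(hub, leaf) for hub in (0, 1) for leaf in range(2, 22)]
randomized = stub_reconnect(original)
print(count_defects(randomized))                # degrees are kept, but defects typically appear
```

Every node keeps its degree, but nothing prevents two stubs of the same hub, or several stub pairs between two hubs, from being joined.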

Local rewiring algorithm (R. Kannan, P. Tetali, and S. Vempala, Random Structures and Algorithms (1999); SM, K. Sneppen, Science (2002)). Randomly select and rewire two edges. Repeat many times.
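One way to sketch the local rewiring step for an undirected simple graph: pick two edges A-B and C-D, swap partners to A-D and C-B, and skip the move if it would create a self-loop or a multiple edge. The function name, the random orientation of the second edge, and the cap on attempts are assumptions of this sketch rather than details given in the lecture.

```python
import random

def local_rewire(edges, n_swaps, seed=0):
    """Degree-preserving randomization by repeated partner swaps between two edges."""
    rng = random.Random(seed)
    edges = [tuple(e) for e in edges]
    present = {frozenset(e) for e in edges}     # fast lookup of existing links
    done = attempts = 0
    while done < n_swaps and attempts < 100 * n_swaps:
        attempts += 1
        i, j = rng.sample(range(len(edges)), 2)
        a, b = edges[i]
        c, d = edges[j]
        if rng.random() < 0.5:                  # consider both ways of pairing the four ends
            c, d = d, c
        if a == d or c == b:                    # would create a self-loop
            continue
        if frozenset((a, d)) in present or frozenset((c, b)) in present:
            continue                            # would create a multiple edge
        present -= {frozenset((a, b)), frozenset((c, d))}
        present |= {frozenset((a, d)), frozenset((c, b))}
        edges[i], edges[j] = (a, d), (c, b)
        done += 1
    return edges

ring = [(i, (i + 1) % 20) for i in range(20)]
rewired = local_rewire(ring, n_swaps=50)
print(sorted(rewired)[:5])
```

Each accepted swap leaves every node's degree unchanged, so the degree sequence is conserved exactly while multiple edges never appear.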

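Metropolis rewiring algorithm (SM, K. Sneppen: cond-mat preprint (2002), Physica A (2004)). A rewiring step takes the network from “energy” $E$ to “energy” $E+\Delta E$. Randomly select two edges. Calculate the change $\Delta E$ in the “energy function” $E=(N_{actual}-N_{desired})^2/N_{desired}$. Rewire with probability $p=\exp(-\Delta E/T)$ (with certainty when $\Delta E \le 0$).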
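A sketch of the Metropolis variant, under stated assumptions: the conserved quantity N is taken to be the number of triangles (any other count function could be passed in), the energy is recomputed from scratch after every trial move, and all names are illustrative. It is meant for small graphs only.

```python
import random
from math import exp

def count_triangles(edges):
    """Number of triangles in a simple undirected graph given as an edge list."""
    nb = {}
    for a, b in edges:
        nb.setdefault(a, set()).add(b)
        nb.setdefault(b, set()).add(a)
    # every triangle is seen once from each of its three edges
    return sum(len(nb[a] & nb[b]) for a, b in edges) // 3

def metropolis_rewire(edges, count, n_desired, temperature, n_steps, seed=0):
    """Degree-preserving rewiring biased toward count(edges) == n_desired (n_desired > 0)."""
    rng = random.Random(seed)
    edges = [tuple(e) for e in edges]

    def energy(es):
        return (count(es) - n_desired) ** 2 / n_desired

    e_now = energy(edges)
    for _ in range(n_steps):
        i, j = rng.sample(range(len(edges)), 2)
        (a, b), (c, d) = edges[i], edges[j]
        present = {frozenset(e) for e in edges}
        if a == d or c == b or frozenset((a, d)) in present or frozenset((c, b)) in present:
            continue                            # skip moves creating self-loops or multiple edges
        trial = list(edges)
        trial[i], trial[j] = (a, d), (c, b)
        e_trial = energy(trial)
        delta = e_trial - e_now
        # Metropolis rule: always accept downhill moves, uphill with probability exp(-dE/T)
        if delta <= 0 or rng.random() < exp(-delta / temperature):
            edges, e_now = trial, e_trial
    return edges

ring = [(i, (i + 1) % 12) for i in range(12)]
result = metropolis_rewire(ring, count_triangles, n_desired=2,
                           temperature=0.1, n_steps=2000)
print(count_triangles(ring), count_triangles(result))
```

A high temperature reproduces plain local rewiring; lowering it pins the chosen low-level property near its desired value.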

Degree-degree correlations

Central vs peripheral network architecture. Panels: central (hierarchical), peripheral (anti-hierarchical), random. A. Trusina, P. Minnhagen, SM, K. Sneppen, Phys. Rev. Lett. 92, 17870, (2004)

What is the case for the protein interaction network? SM, K. Sneppen, Science 296, 910 (2002)

Correlation profile. Count N(k0,k1): the number of links between nodes with connectivities k0 and k1. Compare it to Nr(k0,k1): the same property in a random network. Qualitative features are very noise-tolerant with respect to both false positives and false negatives.

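Correlation profile of the protein interaction network. $R(k_0,k_1)=N(k_0,k_1)/N_r(k_0,k_1)$; $Z(k_0,k_1)=(N(k_0,k_1)-N_r(k_0,k_1))/\Delta N_r(k_0,k_1)$, where $\Delta N_r$ is the standard deviation of $N_r$ over randomized networks. A similar profile is seen in the yeast regulatory network.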
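The ratio R and the Z score of the correlation profile can be sketched as follows. Assumptions: the network is undirected, the randomized networks are supplied by the caller as edge lists with the same degree sequence, the spread of Nr is taken as the population standard deviation over that ensemble, and only degree pairs present in the real network are reported. All names are illustrative.

```python
from collections import Counter
from statistics import mean, pstdev

def degree_pair_counts(edges):
    """N(k0, k1): number of links joining nodes of degrees k0 and k1 (with k0 <= k1)."""
    degree = Counter(v for e in edges for v in e)
    return Counter(tuple(sorted((degree[a], degree[b]))) for a, b in edges)

def correlation_profile(real_edges, randomized_edge_lists):
    """Map each degree pair of the real network to (R, Z) relative to the randomized ensemble."""
    real = degree_pair_counts(real_edges)
    randomized = [degree_pair_counts(es) for es in randomized_edge_lists]
    profile = {}
    for pair, n in real.items():
        samples = [r.get(pair, 0) for r in randomized]
        n_r, delta_n_r = mean(samples), pstdev(samples)
        ratio = n / n_r if n_r > 0 else float("inf")
        z = (n - n_r) / delta_n_r if delta_n_r > 0 else float("nan")
        profile[pair] = (ratio, z)
    return profile

# Toy example: a 6-node path versus two hand-made networks with the same degree sequence.
path = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5)]
alternative = [(0, 5), (1, 2), (2, 3), (3, 4), (4, 1)]
print(correlation_profile(path, [path, alternative]))
```

In practice the ensemble would contain many networks produced by a degree-preserving randomization, not two hand-written ones.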

Hubs may act within a module, or connect modules. Party hub: simultaneous interactions tend to be within the same module. Date hub: sequential interactions connect different modules. Han et al., Nature 430, 88 (2004)

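Correlation profile of the yeast regulatory network. $R(k_{out},k_{in})=N(k_{out},k_{in})/N_r(k_{out},k_{in})$; $Z(k_{out},k_{in})=(N(k_{out},k_{in})-N_r(k_{out},k_{in}))/\Delta N_r(k_{out},k_{in})$, where $\Delta N_r$ is the standard deviation of $N_r$ over randomized networks.

Some scale-free networks may appear similar. In both networks the degree distribution is scale-free: $P(k) \sim k^{-\gamma}$ with $\gamma \approx 2.2$-$2.5$.

But: correlation profiles give them unique identities. Panels: protein interactions; Internet.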

Small network motifs (Uri Alon and his group)

All 3 node motifs

Motifs can overlap in the network. Figure labels: motif to be found; graph; motif matches in the target graph. http://mavisto.ipk-gatersleben.de/frequency_concepts.html

Detection of important network motifs. Technique: construct many random graphs with the same number of nodes and degree distribution; count the number of motifs in those graphs; calculate the Z score, which measures how unlikely it is that the same or a larger number of motifs than in the real-world network could have occurred in a random one. Software available: http://www.weizmann.ac.il/mcb/UriAlon/

What the Z score means. $z_x=(x-\mu_x)/\sigma_x$, where $x$ is the number of times the motif appeared in the real network, $\mu_x$ is the mean number of times the motif appeared in the random graphs, and $\sigma_x$ is the standard deviation of the number of times the motif appeared in the random graphs. The probability of observing a Z score of 2 or larger is 0.02275. In the context of motifs: Z > 0, the motif occurs more often than in random graphs; Z < 0, the motif occurs less often than in random graphs; |Z| > 1.65, only a 5% chance of random occurrence (one-sided).
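The Z score and the tail probabilities quoted above can be checked with the standard library. Assumptions in this sketch: the motif counts in the random graphs are treated as approximately normal, the sample standard deviation is used, and the function names and toy counts are illustrative.

```python
from statistics import NormalDist, mean, stdev

def motif_z_score(real_count, random_counts):
    """Z = (x - mean) / standard deviation, using motif counts from the randomized graphs."""
    return (real_count - mean(random_counts)) / stdev(random_counts)

def upper_tail_probability(z):
    """Probability that a standard normal variable is at least z."""
    return 1.0 - NormalDist().cdf(z)

z = motif_z_score(real_count=14, random_counts=[8, 10, 12])
print(z)                                        # two standard deviations above the random mean
print(round(upper_tail_probability(2.0), 5))    # the 0.02275 quoted for Z = 2
print(round(upper_tail_probability(1.65), 3))   # about 5% for Z > 1.65
```

With only a handful of random graphs the estimated standard deviation is itself noisy, so a large ensemble is needed before Z values are trusted.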

Examples of network motifs (3 nodes). Feed-forward loop on nodes X, Y, Z (X regulates Y, and both X and Y regulate Z). Found in many transcriptional regulatory networks. Variants: coherent and incoherent.

Possible functional role of a coherent feed-forward loop. Noise filtering: short pulses in the input do not result in turning on Z. To function, it needs a time delay (about 0.5 hrs for bacterial transcription).
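Counting feed-forward loops in a directed edge list takes only a few lines. The sketch assumes the standard definition of the motif (edges X→Y, Y→Z and X→Z), ignores the signs that distinguish coherent from incoherent loops, and counts matches even when the three nodes carry extra edges among themselves; the function name is illustrative.

```python
def count_feed_forward_loops(directed_edges):
    """Count ordered triples (X, Y, Z) with edges X->Y, Y->Z and X->Z."""
    out = {}
    for a, b in directed_edges:
        if a != b:                              # self-regulation is not part of the motif
            out.setdefault(a, set()).add(b)
    count = 0
    for x, targets in out.items():
        for y in targets:
            for z in out.get(y, ()):
                if z != x and z in targets:     # X also regulates Z directly
                    count += 1
    return count

print(count_feed_forward_loops([("X", "Y"), ("Y", "Z"), ("X", "Z")]))  # a single loop
print(count_feed_forward_loops([("X", "Y"), ("Y", "Z"), ("Z", "X")]))  # a 3-cycle has none
```

Running the same count on randomized versions of the network gives the reference values needed for a Z score.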

All 4 node subgraphs (computational expense increases with the size of the graph!)

Higher-order motifs. 4-node motifs contain some 3-node motifs, so one needs to be careful when calculating over-representation. Alon & co-authors use our Metropolis algorithm to generate networks with a given number of low-level motifs.

Table 1 from R Milo, S Shen-Orr, S Itzkovitz, N Kashtan, D Chklovskii & U Alon, Network Motifs: Simple Building Blocks of Complex Networks Science, 298:824-827 (2002)

Examples of network motifs (4 nodes). Figure labels: Y, Z. Parallel paths are overrepresented in neural networks and food webs.

Finding classes of graphs based on their motif “profiles”

THE END