Why bacteria run Linux while eukaryotes run Windows? Sergei Maslov Brookhaven National Laboratory New York.

Presentation transcript:

Why bacteria run Linux while eukaryotes run Windows? Sergei Maslov Brookhaven National Laboratory New York

Physical vs. Biological Laws
Physical laws are often discovered by finding a simple common explanation for very different phenomena.
Newton's law: apples fall to the ground; planets revolve around the Sun.
Discovery of biological laws is slowed down by our having a cookie-cutter explanation in terms of natural selection:

Drawing from Facebook group: Trust me, I'm a "Biologist"

Genes encoded in bacterial genomes ~ Packages installed on Linux computers

Complex systems have many components:
Genes (Bacteria)
Software packages (Linux OS)
Components do not work alone: they need to be assembled to work.
In individual systems only a subset of components is installed:
Genome (Bacteria): collection of genes
Computer (Linux OS): collection of software packages
Components have vastly different frequencies of installation.

IKEA kits have many components (image: Justin Pollard)

They need to be assembled to work (image: Justin Pollard)

Different frequencies of use: common vs. rare

What determines the frequency of installation/use of a gene/package?
Popularity (AKA preferential attachment): Frequency ~ self-amplifying popularity. Relevant for social systems: WWW links, Facebook friendships, scientific citations.
Functional role: Frequency ~ breadth or importance of the functional role. Relevant for biological and technological systems where selection adjusts undeserved popularity.

Empirical data on component frequencies
Bacterial genomes (eggnog.embl.de): 500 sequenced prokaryotic genomes, 44,000 orthologous gene families.
Linux packages (popcon.ubuntu.com): 200,000 Linux packages installed on 2,000,000 individual computers.
Binary tables: a component is either present or not in a given system.
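The binary presence/absence tables described above translate directly into component frequencies. The following is a minimal sketch, assuming each system is stored as a set of component names; the function name `component_frequencies` and the toy data are hypothetical and are not taken from eggNOG or popcon.

```python
from collections import Counter


def component_frequencies(systems):
    """Return the fraction of systems in which each component is present.

    `systems` maps a system name (genome or computer) to the set of
    components (gene families or packages) that it contains.
    """
    counts = Counter()
    for components in systems.values():
        counts.update(set(components))  # presence/absence only, no copy numbers
    n_systems = len(systems)
    return {name: count / n_systems for name, count in counts.items()}


# Toy data, made up purely for illustration.
toy_genomes = {
    "genome_A": {"ribosomal_protein", "enzyme_1", "enzyme_2"},
    "genome_B": {"ribosomal_protein", "enzyme_1"},
    "genome_C": {"ribosomal_protein", "orphan_gene"},
    "genome_D": {"ribosomal_protein", "enzyme_1"},
}
frequencies = component_frequencies(toy_genomes)
for name, f in sorted(frequencies.items(), key=lambda item: -item[1]):
    print(f"{name}: f = {f:.2f}")
```

A histogram of these values over all components is what the frequency distribution P(f) summarizes.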

Frequency distributions: $P(f) \sim f^{-1.5}$, except for the top $\sqrt{N}$ “universal” components with $f \sim 1$.
Plot labels: Cloud, Shell, Core, ORFans.
TY Pang, S. Maslov, PNAS (2013)

How to quantify functional importance?
We want to check whether Frequency ~ Importance.
Usefulness = Importance ~ the component is needed for proper functioning of other components.
Dependency network: A → B means A depends on B for its function.
Formalized for Linux software packages.
For metabolic enzymes, given by upstream-downstream positions in pathways.
Frequency ~ dependency degree, $K_{dep}$.
$K_{dep}$ = the total number of components that directly or indirectly depend on the selected one.
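The definition of the dependency degree can be made concrete with a small graph traversal. This is a sketch under stated assumptions: the dictionary layout, the function name `dependency_degrees`, and the toy package names are hypothetical, and the talk does not say how the authors actually computed this quantity.

```python
def dependency_degrees(depends_on):
    """Compute K_dep for every component.

    `depends_on[a]` is the set of components that `a` directly depends on
    (an edge a -> b in the slide's notation). K_dep of b is the number of
    distinct components that reach b by following one or more edges.
    """
    components = set(depends_on)
    for targets in depends_on.values():
        components.update(targets)

    k_dep = {name: 0 for name in components}
    for start in components:
        # Depth-first search over everything `start` needs, directly or not.
        seen = set()
        stack = list(depends_on.get(start, ()))
        while stack:
            node = stack.pop()
            if node in seen or node == start:
                continue
            seen.add(node)
            stack.extend(depends_on.get(node, ()))
        # `start` is one more dependent of every component it reached.
        for node in seen:
            k_dep[node] += 1
    return k_dep


# Toy dependency network with made-up package names.
toy_packages = {
    "app": {"lib_gui", "lib_net"},
    "lib_gui": {"libc"},
    "lib_net": {"libc"},
    "tool": {"libc"},
}
k_dep = dependency_degrees(toy_packages)
print(k_dep)
```

In the toy network `libc` is needed by every other package, directly or through a library, so it has the largest dependency degree.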

Figure: TY Pang, S. Maslov, PNAS (2013)

Frequency is positively correlated with functional importance.
Correlation coefficient ~0.4 for both Linux and genes.
Could be improved by using weighted dependency degree.
TY Pang, S. Maslov, PNAS (2013)

Warm-up: tree-like metabolic network
Figure labels: $K_{dep} = 5$; $K_{dep} = 15$; TCA cycle.
TY Pang, S. Maslov, PNAS (2013)

Dependency degree distribution on a critical branching tree
$P(K) \sim K^{-1.5}$ for a critical branching tree.
Paradox: $K_{\max}^{-0.5} \sim 1/N \Rightarrow K_{\max} = N^{2} > N$.
Answer: parent tree size imposes a cutoff: there will be $\sqrt{N}$ “core” nodes with $K_{\max} = N$, present in almost all systems (ribosomal genes or core metabolic enzymes).
Need a new model: in a tree $D = 1$, while in real systems $D \sim 2 > 1$.
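The paradox and its resolution can be spelled out as a short extreme-value estimate. This is one reading of the slide, assuming that $N$ is the total number of components, that $P_{\ge}(K)$ (notation introduced here) is the fraction of components with dependency degree at least $K$, and that the largest value in a sample of $N$ sits where about one component is expected.

```latex
\begin{align*}
P(K) \sim K^{-1.5}
  \;&\Rightarrow\; P_{\ge}(K) = \int_{K}^{\infty} P(K')\,dK' \sim K^{-0.5},\\
N \cdot P_{\ge}(K_{\max}) \sim 1
  \;&\Rightarrow\; K_{\max}^{-0.5} \sim \frac{1}{N}
  \;\Rightarrow\; K_{\max} \sim N^{2} > N,\\
\text{cutoff at } K_{\max} = N:\quad
N \cdot P_{\ge}(N) &\sim N \cdot N^{-0.5} = \sqrt{N}.
\end{align*}
```

The last line counts the components that the unbounded power law would place at or beyond $K = N$; with the cutoff in place they pile up at $K_{\max} = N$, which gives the $\sqrt{N}$ core nodes.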

Bottom-down model of dependency network evolution
Components are added gradually over evolutionary time.
A new component directly depends on D previously existing components selected randomly.
Versions:
D is drawn from some distribution → same as above.
Recent components are preferentially selected → citations.
There is a fixed probability to connect to any previously existing component → food webs.
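The base version of this growth rule (fixed D, uniform random choice among earlier components) is easy to simulate. The sketch below is an illustration under those assumptions only; the function names `grow_dependency_network` and `count_dependents` are hypothetical, and the other versions listed on the slide are not implemented.

```python
import random


def grow_dependency_network(n_components, d=2, seed=0):
    """Grow a network in which each new component depends on up to `d`
    earlier components chosen uniformly at random.

    Returns a list where entry i is the set of direct dependencies of
    component i; components are numbered in order of arrival.
    """
    rng = random.Random(seed)
    depends_on = [set()]  # component 0 has nothing to depend on
    for new in range(1, n_components):
        k = min(d, new)  # the earliest components have fewer candidates
        depends_on.append(set(rng.sample(range(new), k)))
    return depends_on


def count_dependents(depends_on):
    """Return K_dep[i]: how many components depend on i, directly or not."""
    k_dep = [0] * len(depends_on)
    all_deps = []  # all_deps[i] = direct and indirect dependencies of i
    # Dependencies always point to earlier components, so one forward
    # pass is enough to build every transitive closure.
    for direct in depends_on:
        closure = set(direct)
        for j in direct:
            closure |= all_deps[j]
        all_deps.append(closure)
        for j in closure:
            k_dep[j] += 1
    return k_dep


network = grow_dependency_network(300, d=2, seed=1)
k_dep = count_dependents(network)
print("largest K_dep values:", sorted(k_dep, reverse=True)[:5])
```

Sorting `k_dep` in decreasing order and plotting value against rank on log-log axes gives a Zipf plot of the simulated dependency degrees.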

$p(t,T)$: probability that a component added at time $T$ directly or indirectly depends on one added at time $t$.

$K_{dep}$ and $K_{out}$ degree distributions

$K_{dep}$ decreases with layer number
Panels: Linux; Model with $D = 2$.
TY Pang, S. Maslov, PNAS (2013)

Zipf plot for $K_{dep}$ distributions
Panels: Metabolic enzymes vs. Model; Linux vs. Model.
TY Pang, S. Maslov, PNAS (2013)

Frequency distributions: $P(f) \sim f^{-1.5}$, except for the top $\sqrt{N}$ “universal” components with $f \sim 1$.
Plot labels: Shell, Core, ORFans, Cloud.
TY Pang, S. Maslov, PNAS (2013)

What experiments does $P(f)$ help to interpret?

Pan-genome of E. coli strains
M. Touchon et al., PLoS Genetics (2009)

Metagenomes
The Human Microbiome Project Consortium, Nature (2012)

Pan-genome scaling

Pan-genome of all bacteria
Slope = -0.4; prediction of the toolbox model: -0.5.
P. Lapierre, JP Gogarten, TIG (2009)
(# of genes in pan-genome) ~ (# of sequenced genomes)$^{0.5}$ ⇒ (# of new genes added to pan-genome) ~ (# of sequenced genomes)$^{-0.5}$
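The two scaling relations on this slide are linked by differentiation. As a sketch, write $G(n)$ for the pan-genome size after $n$ sequenced genomes (a symbol introduced here for illustration, not used in the talk) and treat $n$ as continuous:

```latex
\begin{align*}
G(n) &\sim n^{0.5},\\
\frac{dG}{dn} &\sim 0.5\, n^{-0.5} \;\propto\; n^{-0.5}.
\end{align*}
```

The number of new genes contributed by the $n$-th genome is approximately $dG/dn$, so an exponent of 0.5 for pan-genome size corresponds to a slope of -0.5 for new genes per genome on a log-log plot.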

Bacterial genome evolution happens in cooperation with phages

Comparative genomics of E. coli implicates phages for BitTorrent
Figure annotations: K-12 to B comparison; 1 kb: gene length; phage capacity: 20 kb; other strains up to 40 kb.

Phage-Bacteria Infection Network: data from Flores et al. (2011), experiments by Moebus and Nattkemper (1981).
WWW: from AT&T website circa 1996, visualized by Mark Newman.

Why eukaryotes run Windows?
Dependency network = reuse of components:
Bacteria do not keep redundant genes after HGT.
Linux developers rely on previous efforts.
Pros: smaller genomes, open source, economies of scale.
Cons: less specialized, potentially unstable, “dependency hell”.
Eukaryotes are like Windows or Mac OS X:
Keep redundant components.
Proprietary software.

Figure adapted from S. Maslov, TY Pang, K. Sneppen, S. Krishna, PNAS (2009)
Axes: # of genes; # of pathways (or their regulators).

Software packages for Linux
$N_{\text{selected packages}} \sim N_{\text{installed packages}}^{1.7}$

Collaborators: Tin Yau Pang, Stony Brook University
Support: Office of Biological and Environmental Research

Thank you!