Published by Nickolas Eaton. Modified over 9 years ago.
1
Why bacteria run Linux while eukaryotes run Windows?
Sergei Maslov, Brookhaven National Laboratory, New York
2
Physical vs. Biological Laws
Physical laws are often discovered by finding a simple common explanation for very different phenomena.
Newton's law: apples fall to the ground; planets revolve around the Sun.
Discovery of biological laws is slowed down by our having a cookie-cutter explanation in terms of natural selection:
3
Drawing from the Facebook group Trust me, I'm a "Biologist"
4
Genes encoded in bacterial genomes ~ Packages installed on Linux computers
5
Complex systems have many components
Components: genes (bacteria), software packages (Linux OS).
Components do not work alone: they need to be assembled to work.
In an individual system only a subset of components is installed: a genome (bacteria) is a collection of genes; a computer (Linux OS) is a collection of software packages.
Components have vastly different frequencies of installation.
6
IKEA kits have many components
[Image: Justin Pollard, http://www.designboom.com]
7
They need to be assembled to work
[Image: Justin Pollard, http://www.designboom.com]
8
Different frequencies of use: common vs. rare
9
What determines the frequency of installation/use of a gene/package?
Popularity (AKA preferential attachment): frequency ~ self-amplifying popularity. Relevant for social systems: WWW links, Facebook friendships, scientific citations.
Functional role: frequency ~ breadth or importance of the functional role. Relevant for biological and technological systems, where selection adjusts undeserved popularity.
10
Empirical data on component frequencies
Bacterial genomes (eggnog.embl.de): 500 sequenced prokaryotic genomes, 44,000 orthologous gene families.
Linux packages (popcon.ubuntu.com): 200,000 Linux packages installed on 2,000,000 individual computers.
Binary tables: each component is either present or absent in a given system.
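The binary presence/absence tables above can be turned into component frequencies by counting, for each component, the fraction of systems that contain it. The following is a minimal sketch under stated assumptions: the function name `component_frequencies` and the toy genome and gene names are hypothetical, not taken from the eggNOG or popcon data.

```python
from typing import Dict, Set


def component_frequencies(systems: Dict[str, Set[str]]) -> Dict[str, float]:
    """Return, for each component, the fraction of systems that contain it."""
    n_systems = len(systems)
    counts: Dict[str, int] = {}
    for components in systems.values():
        for component in components:
            counts[component] = counts.get(component, 0) + 1
    return {component: k / n_systems for component, k in counts.items()}


# Hypothetical toy table: each system (genome or computer) is a set of components.
toy_systems = {
    "genome_A": {"core_gene", "shell_gene", "orfan_1"},
    "genome_B": {"core_gene", "shell_gene"},
    "genome_C": {"core_gene"},
    "genome_D": {"core_gene", "orfan_2"},
}

freqs = component_frequencies(toy_systems)
for name, f in sorted(freqs.items(), key=lambda item: -item[1]):
    print(f"{name}: f = {f:.2f}")
```

Storing each system as a set of component names is one sparse way to represent a binary table; a dense 0/1 matrix would work equally well for small data.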
11
Frequency distributions
$P(f) \sim f^{-1.5}$, except for the top $\sqrt{N}$ "universal" components with $f \sim 1$.
[Figure labels: Core, Shell, Cloud, ORFans. TY Pang, S. Maslov, PNAS (2013)]
12
How to quantify functional importance?
We want to check whether frequency ~ importance.
Usefulness = importance ~ the component is needed for proper functioning of other components.
Dependency network: A → B means A depends on B for its function. It is formalized for Linux software packages; for metabolic enzymes it is given by upstream-downstream positions in pathways.
Frequency ~ dependency degree $K_{dep}$, where $K_{dep}$ is the total number of components that directly or indirectly depend on the selected one.
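The dependency degree defined on this slide can be computed by reversing the dependency edges and counting everything reachable from a component. This is a minimal sketch, assuming the count excludes the component itself (the slides do not say either way); the function name `dependency_degree` and the toy package names are hypothetical.

```python
from collections import deque
from typing import Dict, Iterable, Set


def dependency_degree(depends_on: Dict[str, Iterable[str]]) -> Dict[str, int]:
    """K_dep: number of components that directly or indirectly depend on each one.

    depends_on[a] lists the components that a directly depends on.
    Assumption: a component is not counted among its own dependents.
    """
    # Reverse the edges: dependents[b] holds the components that directly depend on b.
    dependents: Dict[str, Set[str]] = {}
    for a, deps in depends_on.items():
        dependents.setdefault(a, set())
        for b in deps:
            dependents.setdefault(b, set()).add(a)

    k_dep: Dict[str, int] = {}
    for node in dependents:
        seen: Set[str] = set()
        queue = deque([node])
        # Breadth-first search over reversed edges collects all indirect dependents.
        while queue:
            current = queue.popleft()
            for nxt in dependents[current]:
                if nxt != node and nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
        k_dep[node] = len(seen)
    return k_dep


# Hypothetical toy network: "app" depends on "libgui" and "libc", and so on.
toy_network = {
    "app": ["libgui", "libc"],
    "libgui": ["libc"],
    "tool": ["libc"],
    "libc": [],
}

k_dep = dependency_degree(toy_network)
print(k_dep)
```

A search per node is quadratic in the worst case, which is fine for a toy example; a network with hundreds of thousands of packages would likely need a more careful traversal.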
13
[Figure from TY Pang, S. Maslov, PNAS (2013)]
14
Frequency is positively correlated with functional importance
Correlation coefficient ~0.4 for both Linux packages and genes.
Could be improved by using a weighted dependency degree.
TY Pang, S. Maslov, PNAS (2013)
15
Warm-up: tree-like metabolic network
[Figure labels: TCA cycle; $K_{dep} = 5$; $K_{dep} = 15$. TY Pang, S. Maslov, PNAS (2013)]
16
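Dependency degree distribution on a critical branching tree
$P(K) \sim K^{-1.5}$ for a critical branching tree, so the fraction of nodes with degree above $K$ falls off as $K^{-0.5}$.
Paradox: the largest of $N$ values satisfies $K_{max}^{-0.5} \sim 1/N$, giving $K_{max} = N^2 > N$, more than the number of components.
Answer: parent tree size imposes a cutoff: there will be $\sqrt{N}$ "core" nodes with $K_{max} = N$, present in almost all systems (ribosomal genes or core metabolic enzymes).
Need a new model: in a tree $D = 1$, while in real systems $D \sim 2 > 1$.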
17
Bottom-down model of dependency network evolution
Components are added gradually over evolutionary time.
A new component directly depends on $D$ previously existing components selected randomly.
Versions:
$D$ is drawn from some distribution: same as above.
Recent components are preferentially selected: as in citations.
There is a fixed probability to connect to any previously existing component: as in food webs.
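The basic version of this model can be simulated directly. This is a minimal sketch, assuming the $D$ direct dependencies are distinct and chosen uniformly at random from earlier components, and that the first few components take as many dependencies as exist; the function name `simulate_dependency_network` and the parameter values are illustrative choices, not taken from the paper.

```python
import random
from typing import List, Set


def simulate_dependency_network(n: int, d: int, seed: int = 0) -> List[int]:
    """Grow a dependency network and return K_dep for every component.

    Component `new` directly depends on up to `d` distinct earlier components
    chosen uniformly at random (an assumption of this sketch).
    """
    rng = random.Random(seed)
    ancestors: List[Set[int]] = []  # ancestors[i]: everything i depends on, directly or indirectly
    k_dep = [0] * n
    for new in range(n):
        direct = rng.sample(range(new), min(d, new))
        all_deps: Set[int] = set(direct)
        for b in direct:
            all_deps |= ancestors[b]
        ancestors.append(all_deps)
        # Every direct or indirect dependency gains one dependent.
        for b in all_deps:
            k_dep[b] += 1
    return k_dep


n_components = 500
k_dep = simulate_dependency_network(n=n_components, d=2, seed=1)
k_dep_tree = simulate_dependency_network(n=n_components, d=1, seed=1)
print("largest K_dep values (D=2):", sorted(k_dep, reverse=True)[:5])
```

Storing full ancestor sets costs memory quadratic in the number of components in the worst case, so this form is only suitable for small networks. The other versions listed on the slide would change only how `direct` is drawn.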
18
$p(t,T)$: probability that a component added at time $T$ directly or indirectly depends on one added at time $t$
19
20
$K_{dep}$ and $K_{out}$ degree distributions
21
$K_{dep}$ decreases with layer number
[Figure panels: Linux; Model with $D = 2$. TY Pang, S. Maslov, PNAS (2013)]
22
Zipf plot for $K_{dep}$ distributions
[Figure panels: Metabolic enzymes vs. model; Linux vs. model. TY Pang, S. Maslov, PNAS (2013)]
23
Frequency distributions
$P(f) \sim f^{-1.5}$, except for the top $\sqrt{N}$ "universal" components with $f \sim 1$.
[Figure labels: Core, Shell, Cloud, ORFans. TY Pang, S. Maslov, PNAS (2013)]
24
What experiments does $P(f)$ help to interpret?
25
Pan-genome of E. coli strains
M Touchon et al., PLoS Genetics (2009)
26
Metagenomes
The Human Microbiome Project Consortium, Nature (2012)
27
Pan-genome scaling
28
Pan-genome of all bacteria
Slope = -0.4, compared with the prediction of the toolbox model (-0.5). P. Lapierre, JP Gogarten, TIG (2009).
$(\#\text{ of genes in pan-genome}) \sim (\#\text{ of sequenced genomes})^{0.5}$
$(\#\text{ of new genes added to pan-genome}) \sim (\#\text{ of sequenced genomes})^{-0.5}$
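One way to see how a frequency distribution $P(f) \sim f^{-1.5}$ relates to the 0.5 exponent quoted here is a direct numerical check. This is a sketch, not the toolbox model itself: it assumes each gene is present in each genome independently with probability $f$, and the lower cutoff `f_min`, the number of gene families, and the function names are arbitrary illustrative choices.

```python
import math
from typing import List


def sample_frequencies(n_genes: int, f_min: float) -> List[float]:
    """Evenly spaced quantiles of P(f) ~ f^-1.5 on [f_min, 1] (inverse-CDF method)."""
    a = f_min ** -0.5
    freqs = []
    for i in range(n_genes):
        u = (i + 0.5) / n_genes
        freqs.append((a - u * (a - 1.0)) ** -2)
    return freqs


def expected_pan_genome(freqs: List[float], n_genomes: int) -> float:
    """Expected number of gene families seen at least once in n_genomes genomes.

    Assumption: a gene of frequency f is present in each genome
    independently with probability f.
    """
    return sum(1.0 - (1.0 - f) ** n_genomes for f in freqs)


freqs = sample_frequencies(n_genes=200_000, f_min=1e-8)
n_small, n_large = 100, 10_000
g_small = expected_pan_genome(freqs, n_small)
g_large = expected_pan_genome(freqs, n_large)

# Log-log slope of pan-genome size versus number of genomes.
slope = math.log(g_large / g_small) / math.log(n_large / n_small)
print(f"estimated pan-genome exponent: {slope:.2f}")
```

Under these assumptions the estimated exponent comes out close to 0.5, and the rate at which new genes appear is its derivative, which scales with an exponent close to -0.5. Cutoffs at both ends of the frequency range shift the estimate slightly.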
29
Bacterial genome evolution happens in cooperation with phages
30
Comparative genomics of E. coli implicates phages for BitTorrent
[Figure labels: K-12 to B comparison; other strains up to 40 kb; phage capacity: 20 kb; gene length: 1 kb]
31
Phage-Bacteria Infection Network
Data from Flores et al. (2011); experiments by Moebus and Nattkemper (1981).
WWW: from the AT&T website circa 1996, visualized by Mark Newman.
32
Why eukaryotes run Windows?
Dependency network = reuse of components.
Bacteria do not keep redundant genes after HGT; Linux developers rely on previous efforts.
Pros: smaller genomes, open source, economies of scale.
Cons: less specialized, potentially unstable, "dependency hell".
Eukaryotes are like Windows or Mac OS X: they keep redundant components (proprietary software).
33
[Figure axes: # of genes; # of pathways (or their regulators). Adapted from S. Maslov, TY Pang, K. Sneppen, S. Krishna, PNAS (2009)]
34
Software packages for Linux
$N_{\text{selected packages}} \sim N_{\text{installed packages}}^{1.7}$
35
Collaborators: Tin Yau Pang, Stony Brook University
Support: Office of Biological and Environmental Research
36
Thank you!