Published by Hubert Stokes. Modified over 9 years ago.
2
"Nothing in (computational) biology makes sense except in the light of evolution" (after Theodosius Dobzhansky, 1970)
Power laws, scale-free networks, the structure of the Protein Universe and genome evolution
4
The Protein Universe
Total number of potential protein sequences: ~$20^{200}$
Total number of existing protein sequences: $10^{10}$ to $10^{11}$
GenBank 2002: ~$10^{6}$
What is the distribution of these sequences in the sequence and structure spaces?
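To put these numbers on a common scale, the estimate of potential sequences can be converted to a power of ten. This is only a rough worked calculation; it assumes, as the figure $20^{200}$ implies, 20 amino acids and a typical length of 200 residues.

```latex
\[
20^{200} = 10^{200 \log_{10} 20} \approx 10^{200 \times 1.301} \approx 10^{260},
\qquad
\frac{10^{11}}{10^{260}} = 10^{-249}.
\]
```

Under these assumptions, even the upper estimate of existing sequences covers a vanishingly small fraction (about $10^{-249}$) of sequence space.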
11
The distribution of folds by the number of families in the protein structural database (PDB).
12
There are many folds with 1-3 families but only a few folds with numerous families. Altogether, there might be as many as 5,000-10,000 folds, but >90% of the families belong to <1,000 common folds. Mapping the Protein Universe is feasible!
13
Size distributions of domain families in two genomes, Thermotoga maritima and C. elegans (double-logarithmic plot)
14
The size distributions of folds and families are approximated by a power law: $f(i) \sim i^{-k}$ ($k \approx 1$ to $3$). Power laws describe distributions of a number of quantities in biological and other contexts, e.g., the node degrees (number of connections) in metabolic and protein interaction networks, the Internet and social networks, citations of scientific papers, population of cities, personal wealth… Networks described by power laws are known as scale-free: they look the same at different scales. The existence of a small number of highly connected nodes (hubs) in scale-free networks determines their small-world properties and error tolerance.
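A power law $f(i) \sim i^{-k}$ is a straight line with slope $-k$ on a double-logarithmic plot, so a rough estimate of $k$ can be read off a least-squares fit in log-log space. The sketch below illustrates this on synthetic counts; the function name `fit_power_law_exponent` and the data are hypothetical and not taken from the presentation, and log-log regression is only a crude estimator for real, noisy family-size data.

```python
import math


def fit_power_law_exponent(sizes, counts):
    """Estimate k in f(i) ~ i**(-k) from the slope of log(count) vs log(size)."""
    xs = [math.log(s) for s in sizes]
    ys = [math.log(c) for c in counts]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    # The power-law exponent is the negated log-log slope.
    return -slope


# Synthetic, noise-free example: counts generated with k = 2 (illustrative only).
sizes = list(range(1, 51))
counts = [1000.0 * i ** -2.0 for i in sizes]
k_est = fit_power_law_exponent(sizes, counts)
print(round(k_est, 3))
```

With noise-free input the fit recovers the exponent used to generate the data; with real counts, the sparse tail of large families usually dominates the uncertainty.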
22
Scale-free networks evolve through preferential attachment: the rich get richer or the fit get fitter
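The "rich get richer" mechanism can be sketched with a minimal growth simulation: each new node attaches to one existing node chosen with probability proportional to that node's current degree. This is an illustrative toy model, not the simulation behind the slides; the function name, the single-link-per-node choice and the network size are assumptions, and the "fit get fitter" variant (node-specific fitness) is not modeled here.

```python
import random
from collections import Counter


def preferential_attachment(n_nodes, seed=0):
    """Grow a network node by node with degree-proportional attachment.

    Returns a list where entry j is the final degree of node j.
    """
    rng = random.Random(seed)
    # Start from a single edge between nodes 0 and 1.
    degree = [1, 1]
    # Each node appears in `endpoints` once per edge end, so a uniform draw
    # from this list is a degree-proportional draw over nodes.
    endpoints = [0, 1]
    for new_node in range(2, n_nodes):
        target = rng.choice(endpoints)
        degree[target] += 1
        degree.append(1)
        endpoints.extend([target, new_node])
    return degree


degrees = preferential_attachment(20000)
histogram = Counter(degrees)  # degree -> number of nodes with that degree
max_degree = max(degrees)
print(histogram[1], histogram[2], histogram[4], max_degree)
```

Most nodes end up with one or two links while a few early nodes become hubs, which is the qualitative signature of a heavy-tailed degree distribution.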
23
Domain accretion in the evolution of orthologous sets of eukaryotic genes
[Figure: domain architectures (domains C1, C2C2, Zk, C3, Ub, Br) of orthologs in yeasts, C. elegans, A. thaliana and D. melanogaster]
24
The distribution of proteins by the number of domains is exponential (if repeats are not counted)! However, we get a power law if repeats are included.
25
Domain connectivity network
26
The node degree distribution of the domain connectivity graph is roughly approximated by a power law
27
Evolution of protein domain families in genomes can be described by simple models which involve domain birth, death and innovation (“invention”) as elementary events
29
BDIM (Birth, Death and Innovation Model): elementary events
Birth
Death
Innovation
30
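BDIM: the layout of the model
[Diagram: size classes 1, 2, 3, …, i-1, i, i+1, …, N-1, N, linked by birth transitions (class i to i+1) and death transitions (class i to i-1), with innovation feeding class 1]
$i$: domain family size class
$f_i$: number of families in size class $i$
$N$: maximum number of domains in a family
$\lambda_i$: per-family birth rate
$\delta_i$: per-family death rate
$\nu$: innovation rate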
31
BDIM: the basic equations
Rate of change for size class $i$ ($1 < i < N$):
$df_i(t)/dt = \lambda_{i-1} f_{i-1} - \lambda_i f_i - \delta_i f_i + \delta_{i+1} f_{i+1}$
(gain: birth in class $i-1$; loss: birth in class $i$; loss: death in class $i$; gain: death in class $i+1$)
Class 1: $df_1(t)/dt = \nu - \lambda_1 f_1 - \delta_1 f_1 + \delta_2 f_2$ (innovation $\nu$ instead of "class 0" birth)
Class $N$: $df_N(t)/dt = \lambda_{N-1} f_{N-1} - \delta_N f_N$ (no birth into and death from class $N+1$)
$F(t) = \sum_{i=1}^{N} f_i(t)$ is the total number of families
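The basic equations can be written down directly as code, which also makes it easy to check a candidate steady state. The sketch below is a minimal illustration, not the authors' implementation: the function names, the example rates and the innovation value are hypothetical, and the stationary recursion (innovation balanced by death in class 1, birth in class i-1 balanced by death in class i) is derived here by setting every derivative to zero rather than quoted from the slides.

```python
def bdim_rhs(f, birth, death, innovation):
    """Right-hand side of the BDIM equations.

    f[i-1] is the number of families in size class i (i = 1..N);
    birth[i-1] and death[i-1] are the per-family rates for class i.
    """
    n = len(f)
    dfdt = []
    for idx in range(n):
        # Gain: innovation into class 1, otherwise birth in class i-1.
        gain_from_below = innovation if idx == 0 else birth[idx - 1] * f[idx - 1]
        # Loss by birth in class i; class N has no birth into class N+1.
        loss_by_birth = birth[idx] * f[idx] if idx < n - 1 else 0.0
        # Loss by death in class i.
        loss_by_death = death[idx] * f[idx]
        # Gain by death in class i+1; there is nothing above class N.
        gain_from_above = death[idx + 1] * f[idx + 1] if idx < n - 1 else 0.0
        dfdt.append(gain_from_below - loss_by_birth - loss_by_death + gain_from_above)
    return dfdt


def bdim_equilibrium(birth, death, innovation):
    """Stationary class sizes: f_1 = innovation / death_1,
    f_i = f_(i-1) * birth_(i-1) / death_i."""
    f = [innovation / death[0]]
    for idx in range(1, len(birth)):
        f.append(f[-1] * birth[idx - 1] / death[idx])
    return f


# Hypothetical example rates for N = 30 size classes (illustrative only).
N = 30
birth = [1.0 * (i + 2.0) for i in range(1, N + 1)]
death = [1.0 * (i + 3.0) for i in range(1, N + 1)]
innovation = 5.0

f_eq = bdim_equilibrium(birth, death, innovation)
residual = max(abs(x) for x in bdim_rhs(f_eq, birth, death, innovation))
print(residual)
```

A residual at the level of floating-point rounding indicates that the recursion does give a fixed point of the equations for these example rates.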
32
Power approximation vs. power asymptote under the linear BDIM
[Plot: linear BDIM distribution with its power asymptote ($k = a-b-1$) and a power-law approximation]
33
Linear BDIM: Size Does Matter?
Per-domain birth rate: $\lambda_i / i = \lambda (1 + a/i)$
Per-domain death rate: $\delta_i / i = \delta (1 + b/i)$
[Plot: per-domain rates $\lambda_i/i$ and $\delta_i/i$ versus family size $i$]
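The difference between a power approximation and a power asymptote can be illustrated numerically. The sketch below assumes linear per-family rates, birth proportional to (i + a) and death proportional to (i + b), with equal rate constants (`rate_ratio = 1.0`); these assumptions, the parameter values and the function names are illustrative choices, not values from the presentation. Under them, the steady-state ratio of neighboring classes is f_i / f_(i-1) = (i - 1 + a) / (i + b), and the log-log slope should approach a - b - 1, the exponent quoted for the asymptote, only at large family sizes.

```python
import math


def log_equilibrium_linear_bdim(n_classes, a, b, rate_ratio=1.0):
    """Return log f_i (up to an additive constant) for i = 1..n_classes,
    assuming birth_i = lam * (i + a), death_i = dlt * (i + b) and
    rate_ratio = lam / dlt."""
    log_f = [0.0]
    for i in range(2, n_classes + 1):
        # Steady state: f_i / f_(i-1) = birth_(i-1) / death_i.
        step = rate_ratio * (i - 1 + a) / (i + b)
        log_f.append(log_f[-1] + math.log(step))
    return log_f


def local_slope(log_f, i_low, i_high):
    """Log-log slope of the distribution between two family sizes."""
    return (log_f[i_high - 1] - log_f[i_low - 1]) / (math.log(i_high) - math.log(i_low))


a, b = 2.0, 3.0  # hypothetical parameter values
log_f = log_equilibrium_linear_bdim(20000, a, b)
slope_small = local_slope(log_f, 1, 4)         # small families
slope_large = local_slope(log_f, 5000, 20000)  # large families
print(slope_small, slope_large, a - b - 1)
```

For these example values the slope over small families is noticeably shallower than a - b - 1, while the slope over large families is close to it, so a single straight-line fit over the whole range is an approximation rather than the asymptote. With `rate_ratio` different from 1 the distribution acquires an exponential factor and the power asymptote no longer applies.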
34
Conclusions
I. The world, including biology, is full of power law distributions and scale-free networks
II. The emergence of these seems to be explained by relatively simple evolutionary models
35
“There are two kinds of science: physics and stamp collection” (attributed to Ernest Rutherford)
Genomics today. Tomorrow??