Rates of Nucleotide Substitution
Dan Graur
r = Rate of substitution per site per year
K = Number of substitutions per site between two sequences
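A minimal sketch of how r and K relate. It assumes the standard relation r = K / (2T), where T is the time since the two sequences diverged; the slide text itself does not spell this formula out, and the input values below are hypothetical.

```python
def rate_from_divergence(k_per_site, divergence_time_years):
    """Rate of substitution per site per year, assuming r = K / (2T).

    The factor 2 reflects that substitutions accumulate independently
    along both lineages since the common ancestor (an assumption of
    this sketch, not a statement taken from the slides).
    """
    if divergence_time_years <= 0:
        raise ValueError("divergence time must be positive")
    return k_per_site / (2.0 * divergence_time_years)


# Hypothetical example: K = 0.1 substitutions per site, T = 10 million years.
example_rate = rate_from_divergence(0.1, 10e6)
print(f"r = {example_rate:.2e} substitutions/site/year")
```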
Mean Rate of Nucleotide Substitution in Mammalian Nuclear Genomes
Less than … substitutions/site/year
Evolution is a very slow process at the molecular level. Not much happens in evolution.
Substitution Rates in Protein-Coding Regions
The rate of synonymous substitution is much larger than the nonsynonymous rate.
Synonymous substitutions are more frequent than nonsynonymous ones.
Mean nonsynonymous rate = $0.75 \times 10^{-9}$ substitutions per site per year
Mean synonymous rate = $3.65 \times 10^{-9}$ substitutions per site per year
Coefficient of variation of nonsynonymous rate = 95%
Coefficient of variation of synonymous rate = 31%
The synonymous substitution rate is about 5 times higher than the nonsynonymous substitution rate.
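The "5 times higher" figure can be checked directly from the two means. The sketch below also converts each coefficient of variation into the standard deviation it implies, assuming the usual definition CV = standard deviation / mean.

```python
# Mean rates from the slide, in substitutions per site per year.
mean_nonsyn = 0.75e-9
mean_syn = 3.65e-9

# Coefficients of variation from the slide, as fractions.
cv_nonsyn = 0.95
cv_syn = 0.31

# Ratio of synonymous to nonsynonymous mean rates.
syn_to_nonsyn = mean_syn / mean_nonsyn

# Standard deviations implied by CV = sd / mean (assumed definition).
sd_nonsyn = cv_nonsyn * mean_nonsyn
sd_syn = cv_syn * mean_syn

print(f"synonymous / nonsynonymous = {syn_to_nonsyn:.2f}")
print(f"implied sd (nonsynonymous) = {sd_nonsyn:.3e}")
print(f"implied sd (synonymous)    = {sd_syn:.3e}")
```

The relative spread is about three times larger for nonsynonymous rates, which is consistent with nonsynonymous rates varying much more from gene to gene.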
The distribution of $K_A/K_S$ ratios in >13,000 orthologous protein-coding genes from human and chimpanzee
Human and yeast ubiquitin genes: 58 nucleotide differences, 3 amino acid differences.
In a comparison of human and yeast ubiquitin genes, the inferred number of synonymous substitutions per synonymous site is ~6 (almost certainly indicative of saturation). The inferred number of nonsynonymous substitutions per nonsynonymous site is … . Thus, synonymous substitutions have accumulated at least 200 times faster than nonsynonymous substitutions.
Substitution Rates in Noncoding Regions
Divergence between cow and goat - and -globin genes and between cow and goat -globin pseudogenes
______________________________________________
Region                        K
______________________________________________
5’ Flanking region            5.3 ± 1.2
5’ Untranslated region        4.0
Fourfold degenerate sites     8.6 ± 2.5
Introns                       8.1 ± 0.7
3’ Untranslated region        8.8 ± 2.2
3’ Flanking region            8.0 ± 1.5
Pseudogenes                   9.1 ± 0.9
______________________________________________
Coding regions evolve slower than noncoding regions.
Evolutionary Rate Profiles
Alignment of preproinsulin (Xenopus and Bos)
Xenopus  MALWMQCLP-LVLVLLFSTPNTEALANQHL
Bos      MALWTRLRPLLALLALWPPPPARAFVNQHL
         **** : * *.*: *:..* :. *:****
Xenopus  CGSHLVEALYLVCGDRGFFYYPKIKRDIEQ
Bos      CGSHLVEALYLVCGERGFFYTPKARREVEG
         ***************:***** ** :*::*
Xenopus  AQVNGPQDNELDG-MQFQPQEYQKMKRGIV
Bos      PQVG---ALELAGGPGAGGLEGPPQKRGIV
         .**. ** * * *****
Xenopus  EQCCHSTCSLFQLENYCN
Bos      EQCCASVCSLYQLENYCN
         **** *.***:*******
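One simple way to turn this alignment into a rate profile is to compute the fraction of identical columns in consecutive windows: regions with lower identity have changed more. The sketch below is an illustration only. The 30-column window matches the way the alignment is printed and is an arbitrary choice, and raw identity is a crude stand-in for a properly estimated substitution rate.

```python
# Aligned preproinsulin sequences copied from the alignment ("-" marks a gap).
xenopus = (
    "MALWMQCLP-LVLVLLFSTPNTEALANQHL"
    "CGSHLVEALYLVCGDRGFFYYPKIKRDIEQ"
    "AQVNGPQDNELDG-MQFQPQEYQKMKRGIV"
    "EQCCHSTCSLFQLENYCN"
)
bos = (
    "MALWTRLRPLLALLALWPPPPARAFVNQHL"
    "CGSHLVEALYLVCGERGFFYTPKARREVEG"
    "PQVG---ALELAGGPGAGGLEGPPQKRGIV"
    "EQCCASVCSLYQLENYCN"
)


def identity_profile(seq_a, seq_b, window=30):
    """Return (first_column, last_column, fraction_identical) per window.

    Columns containing a gap are counted as non-identical.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    profile = []
    for start in range(0, len(seq_a), window):
        columns = list(zip(seq_a[start:start + window],
                           seq_b[start:start + window]))
        identical = sum(1 for a, b in columns if a == b and a != "-")
        profile.append((start + 1, start + len(columns),
                        identical / len(columns)))
    return profile


profile = identity_profile(xenopus, bos)
for first, last, fraction in profile:
    print(f"columns {first:3d}-{last:3d}: {fraction:.0%} identical")
```

The windows differ markedly in identity, which is the pattern an evolutionary rate profile is meant to expose.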
Functional regions evolve slower than nonfunctional regions.
Rates of amino acid replacement in different proteins
Fibrinogen to Fibrin
Fibrinogen consists of 6 chains: 2 α, 2 β, 2 γ.
Fibrinopeptides are very negatively charged.
Fibrinopeptides A are cleaved first (to allow polymerization of fibrins).
Fibrinopeptides B are cleaved second (to enhance crosslinking).
Important proteins evolve slower than unimportant ones.
Can we explain the different rates of substitution by the selectionist model?
1. Mutations can be either deleterious or advantageous.
2. If the fraction of advantageous mutations is large, the rate of evolution will be high. If the fraction of advantageous mutations is small, the rate of evolution will be low.
3. A mutation occurring at a functional site has a higher probability of being advantageous than a mutation occurring at a nonfunctional site.
Expectation: Important entities should evolve faster than less important ones.
Can we explain the different rates of substitution by the neutralist model?
1. Mutations can be either deleterious or neutral.
2. If the fraction of deleterious mutations is large, the rate of evolution will be low. If the fraction of deleterious mutations is small, the rate of evolution will be high.
3. A mutation occurring at a functional site has a higher probability of being deleterious than a mutation occurring at a nonfunctional site.
Expectation: Important entities should evolve slower than less important ones.
Kimura’s First Law of Molecular Evolution
Functional entities evolve slower than entities devoid of function.
Functional constraint = Degree of intolerance towards mutations at a genomic location. The functional constraint defines the range of alternative residues that are acceptable at a site without negatively affecting the fitness of the organism.
For neutral mutations: $K = v$ (rate of substitution = mutation rate)
Kimura’s model of functional constraint
Suppose that a fraction, $f_0$, of all mutations are selectively neutral and the rest ($1 - f_0$) are deleterious. Advantageous mutations are assumed to occur only very rarely, such that their relative frequency is effectively zero. If we denote by $v_T$ the total mutation rate per unit time, then the rate of neutral mutation, $v_0$, is $v_0 = v_T f_0$.
According to the neutral theory, the rate of substitution is $K = v_0$. Hence, $K = v_T f_0$. The highest substitution rate is expected in sequences that do not have any function, such that all mutations are neutral ($f_0 = 1$).
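A small numerical sketch of this model may help. It simply evaluates the product of the total mutation rate and the neutral fraction for several values of the neutral fraction; the mutation rate and the fractions used here are hypothetical illustration values, not estimates from the slides.

```python
def neutral_substitution_rate(total_mutation_rate, neutral_fraction):
    """Substitution rate K under Kimura's model: K = v_T * f_0."""
    if not 0.0 <= neutral_fraction <= 1.0:
        raise ValueError("neutral_fraction must lie between 0 and 1")
    return total_mutation_rate * neutral_fraction


# Hypothetical total mutation rate per site per year.
v_total = 5e-9

# Hypothetical neutral fractions for sequences under different constraint.
scenarios = {
    "no function (all mutations neutral)": 1.0,
    "weakly constrained": 0.5,
    "strongly constrained": 0.05,
}

for label, f0 in scenarios.items():
    k = neutral_substitution_rate(v_total, f0)
    print(f"{label:38s} f0 = {f0:4.2f}  K = {k:.2e}")
```

Because the neutral fraction cannot exceed 1, the substitution rate can never exceed the total mutation rate, which is why functionless sequences set the upper limit.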
An evolutionary experiment: Spalax ehrenbergi
αA-crystallin
In Spalax, αA-crystallin lost its functional role more than 25 million years ago, when the mole rat became subterranean and presumably lost use of its eyes. The αA-crystallin of Spalax evolves 20 times faster than the αA-crystallins in other rodents, such as rats, mice, hamsters, gerbils, and squirrels.
Additional Facts: (1) The αA-crystallin of Spalax possesses all the prerequisites for normal function and expression, including the proper signals for alternative splicing. (2) The αA-crystallin of Spalax evolves slower than pseudogenes.
Explanation 1: The αA-crystallin gene may not have lost all of its vision-related functions, such as photoperiod perception and adaptation to seasonal changes. Contradicting evidence: The atrophied eye of Spalax does not respond to light.
Explanation 2: The blind mole rat lost its vision more recently than 25 million years ago. The rate of nonsynonymous substitution after nonfunctionalization has been underestimated. Contradicting evidence: The αA-crystallin gene is still an intact gene as far as the essential molecular structures for its expression are concerned.
Explanation 3: The αA-crystallin gene product serves another function (unrelated to that of the eye). αA-crystallin is a multifunctional protein.
Supporting evidence:
1. αA-crystallin has been found in other tissues.
2. αA-crystallin also functions as a chaperone that binds denaturing proteins and prevents their aggregation.
3. The regions within αA-crystallin responsible for chaperone activity are conserved in the mole rat.
4. The protein has viable secondary and quaternary structures as well as normal thermostability.
Genetic nonfunctionalization or partial nonfunctionalization accelerates evolution. Most evolutionary “action” occurs after death.
The Concept of Functional Constraint
The intensity of purifying selection is determined by the degree of intolerance characteristic of a site or a genomic region towards mutations. The functional or selective constraint defines the range of alternative nucleotides that is acceptable at a site without negatively affecting the function or structure of the gene or the gene product. DNA regions in which a mutation is likely to affect function have a more stringent functional constraint than regions devoid of function.
The stronger the functional constraints on a macromolecule are, the slower its rate of substitution will be.
Functional density (Zuckerkandl 1976)
The functional density, F, of a gene is defined as $n_s/N$, where $n_s$ is the number of sites committed to specific functions and $N$ is the total number of sites. F, therefore, is the proportion of amino acids that are subject to stringent functional constraints.
The higher the functional density, the lower the rate of substitution is expected to be. Thus, a protein in which the active sites constitute only 1% of its sequence will be less constrained, and therefore will evolve more quickly than a protein that devotes 50% of its sequence to performing specific biochemical or physiological tasks.
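The definition can be written as a one-line function. The two proteins below are hypothetical and merely reproduce the 1% and 50% cases mentioned in the text; the 400-residue length is an arbitrary assumption.

```python
def functional_density(committed_sites, total_sites):
    """Functional density F = n_s / N (Zuckerkandl 1976)."""
    if total_sites <= 0:
        raise ValueError("total_sites must be positive")
    if not 0 <= committed_sites <= total_sites:
        raise ValueError("committed_sites must lie between 0 and total_sites")
    return committed_sites / total_sites


# Two hypothetical 400-residue proteins.
low_density = functional_density(4, 400)     # active sites are 1% of the sequence
high_density = functional_density(200, 400)  # half the sequence is committed

print(f"F = {low_density:.2f}  -> expected to evolve faster")
print(f"F = {high_density:.2f}  -> expected to evolve more slowly")
```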
According to the neutral theory of evolution, the rate of substitution (as inferred from between-species comparisons) should positively correlate with the degree of genetic polymorphism (as inferred from comparisons among individuals within one species). An interesting corollary of this hypothesis is that we should observe very little or no variation at the population level at evolutionarily conserved positions. The variation observed at conserved positions should be mostly deleterious (i.e., associated with disease).
Substitution rates and disease: The case of Gaucher disease
Gaucher disease is an autosomal recessive lysosomal storage disorder due to deficient activity of an enzyme called acid β-glucosidase. There are many subtypes of Gaucher disease, with fitness effects ranging from a slight reduction in fitness to perinatal lethality, in which death occurs between 154 days of gestation and seven days after birth.
We aligned the amino acid sequences of acid β-glucosidase from nine placental mammals (human, chimpanzee, Sumatran orangutan, bovine, pig, dog, horse, rat, and mouse). The length of the alignment (excluding one gap due to a codon deletion in the ancestor of mouse and rat) was 496 amino acids, of which 387 (78%) were identical in all nine species and 109 (22%) were variable.
Thirty-six single amino-acid replacements (at 34 amino-acid positions) resulting in Gaucher disease are described in the literature. Perinatal lethal mutations are shown in red.
All 36 deleterious mutations occur at completely conserved sites (below asterisks). The expectation under a random model is that only 36 × 0.78 ≈ 28 mutations should occur at completely conserved sites. This statistically significant non-random association between disease and evolutionary conservation (p = ) indicates that invariable sites are conserved because they evolve under extremely stringent functional constraints and cannot tolerate change.
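The random-model expectation can be reproduced with a few lines of arithmetic. The sketch below makes a simplifying assumption of its own: each of the 36 replacements is treated as an independent draw that lands on a conserved site with probability 387/496, even though the 36 replacements occupy only 34 positions. It is not the test used by the authors and is not meant to reproduce their p value.

```python
from math import comb

n_replacements = 36   # disease-causing replacements reported in the text
n_conserved = 387     # completely conserved alignment positions
n_positions = 496     # total alignment length

p_conserved = n_conserved / n_positions


def binomial_pmf(successes, trials, p):
    """Probability of exactly `successes` hits in `trials` independent draws."""
    return comb(trials, successes) * p**successes * (1 - p)**(trials - successes)


# Expected number of replacements at conserved sites under random placement.
expected_at_conserved = n_replacements * p_conserved

# Probability that all 36 land on conserved sites purely by chance.
prob_all_conserved = binomial_pmf(n_replacements, n_replacements, p_conserved)

print(f"expected at conserved sites: {expected_at_conserved:.1f}")
print(f"P(all {n_replacements} at conserved sites): {prob_all_conserved:.2e}")
```

Under this simplified model, observing all 36 replacements at conserved sites is very unlikely to be a chance outcome.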
Q: What determines functional constraint? A: Many factors. Q: Example? A: Interactions.
A network (or graph) is an abstract representation of a set of objects, where some objects are connected to one another. The objects are represented by vertices (or nodes), and the links that connect the vertices are called edges (or branches).
Edges can be polarized to indicate directionality and type of interaction (e.g., activation, inhibition). Edges can also be quantified to denote extent of effect.
Protein-protein interaction networks
(a) A simple example of a protein-protein interaction network consisting of five proteins (A-E), represented by the nodes, each of which interacts with at least one other protein. There are five interactions, denoted by the links.
In biological networks, three variables are usually studied:
(b) degree centrality or connectedness = the number of interactions for a protein.
(c) betweenness centrality = the number of times that a node appears on the shortest paths between all pairs of other nodes.
(d) closeness centrality = a measure based on the mean number of links connecting a protein to all other proteins in the network; the fewer links needed on average, the higher the closeness.
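The three measures can be computed for a small network with a breadth-first search. The five-protein network below is hypothetical: it has five proteins and five interactions like the example described here, but its wiring is an assumption, since the original figure is not reproduced. The sketch assumes a connected, undirected network and reports the mean distance to all other nodes, where a smaller mean distance corresponds to higher closeness.

```python
from collections import deque
from itertools import combinations

# Hypothetical wiring: five proteins (A-E), five interactions.
edges = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "D"), ("D", "E")]

graph = {}
for u, v in edges:
    graph.setdefault(u, set()).add(v)
    graph.setdefault(v, set()).add(u)


def shortest_paths(source):
    """Breadth-first search: distance and number of shortest paths to each node."""
    dist = {source: 0}
    n_paths = {source: 1}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in graph[node]:
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                n_paths[neighbor] = 0
                queue.append(neighbor)
            if dist[neighbor] == dist[node] + 1:
                n_paths[neighbor] += n_paths[node]
    return dist, n_paths


paths = {node: shortest_paths(node) for node in graph}

# Degree centrality (connectedness): number of interaction partners.
degree = {node: len(graph[node]) for node in graph}

# Mean number of links to every other protein (low value = high closeness).
mean_distance = {
    node: sum(d for other, d in paths[node][0].items() if other != node)
    / (len(graph) - 1)
    for node in graph
}

# Betweenness: share of shortest paths between other pairs that pass through a node.
betweenness = {node: 0.0 for node in graph}
for s, t in combinations(sorted(graph), 2):
    for v in graph:
        if v in (s, t):
            continue
        if paths[s][0][v] + paths[v][0][t] == paths[s][0][t]:
            betweenness[v] += paths[s][1][v] * paths[v][1][t] / paths[s][1][t]

for node in sorted(graph):
    print(node, degree[node], betweenness[node], mean_distance[node])
```

In this hypothetical wiring, protein C has the most partners, lies on the most shortest paths, and is on average the fewest links away from the other proteins, so it would be predicted to evolve most slowly.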
Proteins with high connectedness evolve slowly. Proteins with low connectedness evolve fast. Proteins with high betweenness evolve slowly. Proteins with low betweenness evolve fast. Proteins with high closeness evolve slowly. Proteins with low closeness evolve fast.
Why do the rates of synonymous substitution vary from gene to gene?
(1) The variation represents stochastic fluctuations.
(2) The variation is due to deterministic factors on top of stochastic fluctuations.
(2.1) Variation in the rate of mutation among different regions of the genome.
(2.2) Selection operating on synonymous mutations.
Fact: There is a positive correlation between synonymous and nonsynonymous substitution rates in a gene.
Explanations:
(1) The rate of mutation varies along the genome and among genes (and hence some genes will have both high synonymous and nonsynonymous rates of substitution).
(2) The extent of selection at synonymous sites is affected by the nucleotide composition at adjacent nonsynonymous positions.
(3) Both (1) and (2).
In the absence of positive Darwinian selection, the universal observation is that important sequences tend to evolve slower than less important ones. The opposite, however, is not always true. That is, conserved regions in the genome may not always be important. Defining “importance” is not a trivial undertaking.
Hurst and Smith (1999) tested the relationship between rate of substitution and dispensability (a proxy for importance). Approximately two thirds of all knockouts of individual mouse genes give rise to viable fertile mice. These genes have been termed “non-essential,” in contrast to “essential” genes, the knockouts of which result in death or infertility. It is predicted that non-essential genes will be subject to lesser intensities of purifying selection, and should therefore evolve faster than essential genes.
In a comparison of 74 non-essential genes with 64 essential ones, the rate of substitution was found not to correlate with the severity of the knockout phenotype. To account for differences in function, Hurst and Smith (1999) restricted their analysis exclusively to neuron-specific genes, which have significantly lower rates of substitution than other genes. They could find no difference in the rate of substitution between 16 essential neuron-specific genes and 18 non-essential ones.
The functional role (if any) of ~98% of mammalian genomes remains undetermined. Nóbrega et al. (2004) deleted ~2 Mb-long sequences from the mouse genome: a 1,817,000-bp region mapping to mouse chromosome 3 and a 983,000-bp region mapping to chromosome 19. (Orthologous regions of about the same size are present on human chromosomes 1 and 10, respectively.) Viable mice homozygous for the deletions were generated and were indistinguishable from wild-type littermates with regard to morphology, reproductive fitness, growth, longevity, and general homeostasis. Further analysis of the expression of multiple genes bracketing the deletions revealed only minor expression differences between homozygous-deletion mice and wild-type mice.
The two deleted segments harbor 1,243 non-coding sequences conserved between humans and rodents (more than 100 base pairs, 70% identity). Yet, the deletion of so many sequences that have been conserved for such a long period of time (mouse-human divergence ≈ 100 million years) resulted in no reduction in fitness.
Conclusion I: There is potentially ‘disposable DNA’ in the genomes of mammals.
Conclusion II: Sequence conservation may not necessarily indicate constraint.
Ahituv et al. (2007) removed from the mouse genome four ultraconserved elements, that is, sequences of 200 base pairs or longer that are 100% identical among human, mouse, and rat.
Remarkably, lines of mice homozygous for the four deletions were viable and fertile, and failed to reveal any developmental or phenotypic abnormalities.
These results indicate that extreme sequence conservation may not necessarily reflect extreme evolutionary constraint. There must be forces other than selection that promote sequence conservation.