Dan Graur: Rates of Nucleotide Substitution

Presentation transcript:

1 Dan Graur Rates of Nucleotide Substitution

2 r = Rate of substitution per site per year. K = Number of substitutions per site.
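The two quantities on this slide are conventionally linked by r = K / (2T), where T is the time since the two sequences diverged; the factor of 2 accounts for substitutions accumulating along both lineages. That relation is the standard textbook one and is not spelled out in the transcript, and the input values below are hypothetical. A minimal sketch:

```python
def substitution_rate(k_per_site: float, divergence_time_years: float) -> float:
    """Rate of substitution per site per year, r = K / (2T).

    k_per_site: number of substitutions per site between two sequences (K).
    divergence_time_years: time since the two sequences diverged (T).
    """
    if divergence_time_years <= 0:
        raise ValueError("divergence time must be positive")
    # Substitutions accumulate independently on both lineages, hence 2T.
    return k_per_site / (2.0 * divergence_time_years)


# Hypothetical example: K = 0.5 substitutions per site, T = 80 million years.
r = substitution_rate(0.5, 80e6)
print(f"r = {r:.3e} substitutions per site per year")
```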

3 Mean Rate of Nucleotide Substitution in Mammalian Nuclear Genomes: less than [value lost in extraction] substitutions/site/year. Evolution is a very slow process at the molecular level. Not much happens in evolution.

4 Substitution Rates in Protein-Coding Regions. The rate of synonymous substitution is much larger than the nonsynonymous rate.

5

6 A lot A little

7 Synonymous substitutions are more frequent than nonsynonymous ones.

8 Mean nonsynonymous rate = 0.75 × 10⁻⁹ substitutions per site per year
Mean synonymous rate = 3.65 × 10⁻⁹ substitutions per site per year
Coefficient of variation of nonsynonymous rate = 95%
Coefficient of variation of synonymous rate = 31%
The synonymous substitution rate is about 5 times higher than the nonsynonymous substitution rate.
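The "5 times higher" figure follows directly from the two means on this slide. The sketch below redoes that arithmetic and also converts each coefficient of variation into a standard deviation (SD = CV × mean); the variable names are illustrative only.

```python
# Values taken from the slide (substitutions per site per year).
mean_nonsyn = 0.75e-9
mean_syn = 3.65e-9
cv_nonsyn = 0.95  # coefficient of variation, 95%
cv_syn = 0.31     # coefficient of variation, 31%

# Ratio of synonymous to nonsynonymous mean rates.
ratio = mean_syn / mean_nonsyn

# Standard deviation implied by each coefficient of variation.
sd_nonsyn = cv_nonsyn * mean_nonsyn
sd_syn = cv_syn * mean_syn

print(f"synonymous / nonsynonymous = {ratio:.2f}")
print(f"SD nonsynonymous = {sd_nonsyn:.3e}, SD synonymous = {sd_syn:.3e}")
```

The nonsynonymous rate is both lower and relatively more variable among genes than the synonymous rate.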

9 The distribution of K_A/K_S ratios in more than 13,000 orthologous protein-coding genes from human and chimpanzee

10 Human and yeast ubiquitin genes: 58 nucleotide differences, 3 amino acid differences. In a comparison of human and yeast ubiquitin genes, the inferred number of synonymous substitutions per synonymous site is ~6 (almost certainly indicative of saturation). The inferred number of nonsynonymous substitutions per nonsynonymous site is [value lost in extraction]. Thus, synonymous substitutions have accumulated at least 200 times faster than nonsynonymous substitutions.

11 Ratio

12 Substitution Rates in Noncoding Regions

13

14 Divergence between cow and goat β- and γ-globin genes and between cow and goat β-globin pseudogenes
______________________________________________
Region                        K
______________________________________________
5’ Flanking region            5.3 ± 1.2
5’ Untranslated region        4.0 ± …
Fourfold degenerate sites     8.6 ± 2.5
Introns                       8.1 ± 0.7
3’ Untranslated region        8.8 ± 2.2
3’ Flanking region            8.0 ± 1.5
Pseudogenes                   9.1 ± 0.9
______________________________________________

15

16 Coding regions evolve slower than noncoding regions.

17 Evolutionary Rate Profiles

18 Alignment of preproinsulin

Xenopus MALWMQCLP-LVLVLLFSTPNTEALANQHL
Bos     MALWTRLRPLLALLALWPPPPARAFVNQHL
        **** :  * *.*: *:..* :. *:****

Xenopus CGSHLVEALYLVCGDRGFFYYPKIKRDIEQ
Bos     CGSHLVEALYLVCGERGFFYTPKARREVEG
        ***************:***** ** :*::*

Xenopus AQVNGPQDNELDG-MQFQPQEYQKMKRGIV
Bos     PQVG---ALELAGGPGAGGLEGPPQKRGIV
        .**.     ** *       *    *****

Xenopus EQCCHSTCSLFQLENYCN
Bos     EQCCASVCSLYQLENYCN
        **** *.***:*******

19

20 Functional regions evolve slower than nonfunctional regions.

21

22

23 Rates of amino acid replacement in different proteins

24 Fibrinogen to Fibrin
Fibrinogen consists of 6 chains: 2 α, 2 β, 2 γ.
Fibrinopeptides are very negatively charged.
Fibrinopeptides A are cleaved first (to allow polymerization of fibrins).
Fibrinopeptides B are cleaved second (to enhance crosslinking).

25

26 Important proteins evolve slower than unimportant ones.

27

29 Can we explain the different rates of substitution by the selectionist model?
1. Mutations can be either deleterious or advantageous.
2. If the fraction of advantageous mutations is large, the rate of evolution will be high. If the fraction of advantageous mutations is small, the rate of evolution will be low.
3. A mutation occurring at a functional site has a higher probability of being advantageous than a mutation occurring at a nonfunctional site.
Expectation: Important entities should evolve faster than less important ones.

31 Can we explain the different rates of substitution by the neutralist model?
1. Mutations can be either deleterious or neutral.
2. If the fraction of deleterious mutations is large, the rate of evolution will be low. If the fraction of deleterious mutations is small, the rate of evolution will be high.
3. A mutation occurring at a functional site has a higher probability of being deleterious than a mutation occurring at a nonfunctional site.
Expectation: Important entities should evolve slower than less important ones.

33 Kimura’s First Law of Molecular Evolution

34 Functional entities evolve slower than entities devoid of function.

35 Functional constraint = Degree of intolerance towards mutations at a genomic location. The functional constraint defines the range of alternative residues that are acceptable at a site without affecting negatively the fitness of the organism.

36 For neutral mutations: $K = v$ (rate of substitution = mutation rate).

37 Kimura’s model of functional constraint. Suppose that a fraction, $f_0$, of all mutations are selectively neutral and the rest ($1 - f_0$) are deleterious. Advantageous mutations are assumed to occur only very rarely, such that their relative frequency is effectively zero. If we denote by $v_T$ the total mutation rate per unit time, then the rate of neutral mutation, $v_0$, is $v_0 = v_T f_0$.

38 According to the neutral theory, the rate of substitution is $K = v_0$. Hence, $K = v_T f_0$. The highest substitution rate is expected in sequences that do not have any function, such that all mutations are neutral ($f_0 = 1$).
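Kimura's relation can be illustrated numerically: the substitution rate is the total mutation rate scaled by the neutral fraction. The sketch below assumes a hypothetical total mutation rate and a few hypothetical neutral fractions; none of these numbers come from the slides.

```python
def neutral_substitution_rate(v_total: float, f_neutral: float) -> float:
    """Substitution rate under Kimura's model, K = v_T * f_0."""
    if not 0.0 <= f_neutral <= 1.0:
        raise ValueError("f_neutral must lie between 0 and 1")
    return v_total * f_neutral


# Hypothetical total mutation rate (mutations per site per year).
V_TOTAL = 5e-9

# Hypothetical neutral fractions for three kinds of sequence.
examples = {
    "sequence with no function (all mutations neutral)": 1.0,
    "weakly constrained sequence": 0.5,
    "strongly constrained sequence": 0.05,
}

for label, f0 in examples.items():
    k = neutral_substitution_rate(V_TOTAL, f0)
    print(f"{label}: f0 = {f0:.2f}, K = {k:.2e}")
```

With the mutation rate held fixed, K is largest when f_0 = 1, which is the model's prediction that functionless sequences evolve fastest.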

39

40 An evolutionary experiment: Spalax ehrenbergi

41 αA-crystallin

42 In Spalax, αA-crystallin lost its functional role more than 25 million years ago, when the mole rat became subterranean and presumably lost use of its eyes. The αA-crystallin of Spalax evolves 20 times faster than the αA-crystallins of other rodents, such as rats, mice, hamsters, gerbils, and squirrels.

43 Additional Facts: (1) The αA-crystallin of Spalax possesses all the prerequisites for normal function and expression, including the proper signals for alternative splicing. (2) The αA-crystallin of Spalax evolves slower than pseudogenes.

44 Explanation 1: The αA-crystallin gene may not have lost all of its vision-related functions, such as photoperiod perception and adaptation to seasonal changes. Contradicting evidence: The atrophied eye of Spalax does not respond to light.

45 Explanation 2: The blind mole rat lost its vision more recently than 25 million years ago. The rate of nonsynonymous substitution after nonfunctionalization has been underestimated. Contradicting evidence: The αA-crystallin gene is still an intact gene as far as the essential molecular structures for its expression are concerned.

46 Explanation 3: The αA-crystallin gene product serves another function (unrelated to that of the eye). αA-crystallin is a multifunctional protein. Supporting evidence:
1. αA-crystallin has been found in other tissues.
2. αA-crystallin also functions as a molecular chaperone that binds denaturing proteins and prevents their aggregation.
3. The regions within αA-crystallin responsible for chaperone activity are conserved in the mole rat.
4. The protein has viable secondary and quaternary structures as well as normal thermostability.

47 Genetic nonfunctionalization or partial nonfunctionalization accelerates evolution. Most evolutionary “action” occurs after death.

The Concept of Functional Constraint. The intensity of purifying selection is determined by the degree of intolerance characteristic of a site or a genomic region towards mutations. The functional or selective constraint defines the range of alternative nucleotides that is acceptable at a site without negatively affecting the function or structure of the gene or the gene product. DNA regions in which a mutation is likely to affect function have a more stringent functional constraint than regions devoid of function.

The stronger the functional constraints on a macromolecule are, the slower its rate of substitution will be.

Functional density (Zuckerkandl 1976). The functional density, F, of a gene is defined as n_s/N, where n_s is the number of sites committed to specific functions and N is the total number of sites. F, therefore, is the proportion of amino acids that are subject to stringent functional constraints.

Functional density (Zuckerkandl 1976) The higher the functional density, the lower the rate of substitution is expected to be. Thus, a protein in which the active sites constitute only 1% of its sequence will be less constrained, and therefore will evolve more quickly than a protein that devotes 50% of its sequence to performing specific biochemical or physiological tasks.
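The 1% versus 50% comparison can be made concrete with the definition F = n_s/N. The sketch below assumes two hypothetical proteins of 400 residues each; the protein length and site counts are illustrative and do not come from the slides.

```python
def functional_density(n_constrained: int, n_total: int) -> float:
    """Functional density F = n_s / N (Zuckerkandl 1976)."""
    if n_total <= 0 or not 0 <= n_constrained <= n_total:
        raise ValueError("need 0 <= n_constrained <= n_total and n_total > 0")
    return n_constrained / n_total


# Hypothetical proteins, both 400 residues long.
f_low = functional_density(4, 400)     # active sites make up 1% of the sequence
f_high = functional_density(200, 400)  # 50% of the sequence is committed to function

print(f"F (low-density protein)  = {f_low:.2f}")
print(f"F (high-density protein) = {f_high:.2f}")
# The protein with the higher F is expected to evolve more slowly.
```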

According to the neutral theory of evolution, the rate of substitution (as inferred from between-species comparisons) should positively correlate with the degree of genetic polymorphism (as inferred from comparisons among individuals within one species). An interesting corollary of this hypothesis is that we should observe very little or no variation at the population level at evolutionarily conserved positions. The variation observed at conserved positions should be mostly deleterious (i.e., associated with disease).

Substitution rates and disease: The case of Gaucher disease. Gaucher disease is an autosomal recessive lysosomal storage disorder due to deficient activity of an enzyme called acid β-glucosidase. There are many subtypes of Gaucher disease, with fitness effects ranging from a slight reduction in fitness to perinatal lethality, in which death occurs between 154 days of gestation and seven days after birth.

We aligned the amino acid sequences of acid β-glucosidase from nine placental mammals (human, chimpanzee, Sumatran orangutan, bovine, pig, dog, horse, rat, and mouse). The length of the alignment (excluding one gap due to a codon deletion in the ancestor of mouse and rat) was 496 amino acids, of which 387 (78%) were identical in all nine species and 109 (22%) were variable.

Thirty-six single amino-acid replacements (at 34 amino-acid positions) resulting in Gaucher disease are described in the literature. Perinatal lethal mutations are shown in red.

All 36 deleterious mutations occur at completely conserved sites (below asterisks). The expectation under a random model is that only 36 × 0.78 = 28 mutations should occur at completely conserved sites. This statistically significant non-random association between disease and evolutionary conservation (p = ) indicates that invariable sites are conserved because they evolve under extremely stringent functional constraints and cannot tolerate change.
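The expected count quoted above can be reproduced from the alignment figures. The sketch below also estimates how unlikely it is for all 36 mutations to fall on conserved sites, assuming each mutation lands independently on a conserved site with probability 387/496. That independence model is an assumption made here for illustration; it is not necessarily the test behind the slide's p-value, which is not preserved in the transcript.

```python
# Figures from the slides.
N_SITES = 496      # alignment length in amino acids
N_CONSERVED = 387  # sites identical in all nine species
N_MUTATIONS = 36   # disease-causing single amino-acid replacements

# Probability that a randomly placed mutation hits a conserved site.
p_conserved = N_CONSERVED / N_SITES

# Expected number of mutations at conserved sites under random placement.
expected_conserved = N_MUTATIONS * p_conserved

# Assumed model: independent placement of each mutation.
# Probability that all 36 mutations land on conserved sites.
p_all_conserved = p_conserved ** N_MUTATIONS

print(f"fraction of conserved sites = {p_conserved:.2f}")
print(f"expected mutations at conserved sites = {expected_conserved:.1f}")
print(f"P(all {N_MUTATIONS} at conserved sites) = {p_all_conserved:.1e}")
```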

Q: What determines functional constraint? A: Many factors. Q: Example? A: Interactions.

A network (or graph) is an abstract representation of a set of objects, where some objects are connected to one another. The objects are represented by vertices (or nodes), and the links that connect the vertices are called edges (or branches).

Edges can be polarized to indicate directionality and type of interaction (e.g., activation, inhibition). Edges can also be quantified to denote the extent of the effect.

Protein-protein interaction networks
(a) A simple example of a protein-protein interaction network consisting of five proteins (A-E), represented by the nodes, each of which interacts with at least one other protein. There are five interactions, denoted by the links.
In biological networks, three variables are usually studied:
(b) degree centrality or connectedness = the number of interactions for a protein.
(c) betweenness centrality = the number of times that a node appears on the shortest path between all pairs of nodes.
(d) closeness centrality = the mean number of links connecting a protein to all other proteins in the network.
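The three measures can be computed for a small example with breadth-first search. The slide's figure is not preserved in the transcript, so the edge list below is a hypothetical network of five proteins and five interactions, not the one shown on the slide. The third measure is reported as the mean shortest-path distance, following the slide's wording; many network libraries report its reciprocal as closeness.

```python
from collections import deque
from itertools import combinations

# Hypothetical network: five proteins (A-E), five interactions.
EDGES = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "D"), ("D", "E")]


def build_graph(edges):
    """Build an undirected adjacency map from a list of edges."""
    graph = {}
    for u, v in edges:
        graph.setdefault(u, set()).add(v)
        graph.setdefault(v, set()).add(u)
    return graph


def shortest_paths_from(graph, source):
    """BFS returning distances and shortest-path counts from source."""
    dist = {source: 0}
    count = {source: 1}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in graph[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                count[w] = 0
                queue.append(w)
            if dist[w] == dist[u] + 1:
                count[w] += count[u]
    return dist, count


def centralities(graph):
    """Degree, betweenness, and mean distance for a connected graph."""
    info = {v: shortest_paths_from(graph, v) for v in graph}
    degree = {v: len(graph[v]) for v in graph}
    mean_distance = {
        v: sum(info[v][0][w] for w in graph if w != v) / (len(graph) - 1)
        for v in graph
    }
    betweenness = {v: 0.0 for v in graph}
    for s, t in combinations(graph, 2):
        d_st, n_st = info[s][0][t], info[s][1][t]
        for v in graph:
            if v in (s, t):
                continue
            # v lies on a shortest s-t path if the distances add up.
            if info[s][0][v] + info[v][0][t] == d_st:
                betweenness[v] += info[s][1][v] * info[v][1][t] / n_st
    return degree, betweenness, mean_distance


graph = build_graph(EDGES)
degree, betweenness, mean_distance = centralities(graph)
for protein in sorted(graph):
    print(protein, degree[protein], betweenness[protein], mean_distance[protein])
```

In this hypothetical network, protein C has the most interactions, lies on the most shortest paths, and has the smallest mean distance to the other proteins.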

Proteins with high connectedness evolve slowly. Proteins with low connectedness evolve fast. Proteins with high betweenness evolve slowly. Proteins with low betweenness evolve fast. Proteins with high closeness evolve slowly. Proteins with low closeness evolve fast.

Why do the rates of synonymous substitution vary from gene to gene? (1) The variation represents stochastic fluctuations. (2) The variation is due to deterministic factors on top of stochastic fluctuations. (2.1) Variation in the rate of mutation among different regions of the genome. (2.2) Selection operating on synonymous mutations.

Fact: There is a positive correlation between synonymous and nonsynonymous substitution rates in a gene. Explanations: (1) The rate of mutation varies along the genome and among genes (and hence some genes will have both high synonymous and high nonsynonymous rates of substitution). (2) The extent of selection at synonymous sites is affected by the nucleotide composition at adjacent nonsynonymous positions. (3) Both (1) and (2).

In the absence of positive Darwinian selection, the universal observation is that important sequences tend to evolve slower than less important ones. The opposite, however, is not always true. That is, conserved regions in the genome may not always be important. Defining “importance” is not a trivial undertaking.

Hurst and Smith (1999) tested the relationship between rate of substitution and dispensability (a proxy for importance). Approximately two thirds of all knockouts of individual mouse genes give rise to viable fertile mice. These genes have been termed “non-essential,” in contrast to “essential” genes, the knockouts of which result in death or infertility. It is predicted that non-essential genes will be subject to lesser intensities of purifying selection, and should therefore evolve faster than essential genes.

In a comparison of 74 non-essential genes with 64 essential ones, the rate of substitution was found not to correlate with the severity of the knockout phenotype. To account for differences in function, Hurst and Smith (1999) restricted their analysis exclusively to neuron-specific genes, which have significantly lower rates of substitution than other genes. They could find no difference in the rate of substitution between 16 essential neuron-specific genes and 18 non-essential ones.

The functional role (if any) of ~98% of mammalian genomes remains undetermined. Nóbrega et al. (2004) deleted two megabase-scale sequences from the mouse genome: a 1,817,000-bp region mapping to mouse chromosome 3 and a 983,000-bp region mapping to chromosome 19. (Orthologous regions of about the same size are present on human chromosomes 1 and 10, respectively.) Viable mice homozygous for the deletions were generated and were indistinguishable from wild-type littermates with regard to morphology, reproductive fitness, growth, longevity, and general homeostasis. Further analysis of the expression of multiple genes bracketing the deletions revealed only minor expression differences between homozygous-deletion mice and wild-type mice.

The two deleted segments harbor 1,243 non-coding sequences conserved between humans and rodents (more than 100 base pairs, 70% identity). Yet, the deletion of so many sequences that have been conserved for such a long period of time (mouse-human divergence ≈ 100 million years) resulted in no reduction in fitness. Conclusion I: There is potentially ‘disposable DNA’ in the genomes of mammals. Conclusion II: Sequence conservation may not necessarily indicate constraint.

Ahituv et al. (2007) removed from the mouse genome four ultraconserved elements, that is, sequences of 200 base pairs or longer that are 100% identical among human, mouse, and rat.

Remarkably, lines of mice homozygous for the four deletions were viable and fertile, and failed to reveal any developmental or phenotypic abnormalities.

These results indicate that extreme sequence conservation may not necessarily reflect extreme evolutionary constraint. There must be forces other than selection that promote sequence conservation.