Models of Molecular Evolution II Level 3 Molecular Evolution and Bioinformatics Jim Provan Page and Holmes: Sections 7.3 – 7.4.


Isochore structure of vertebrate genomes
Why do patterns of base composition (the frequencies of the four bases and of the codons used to specify amino acids) differ between genomes?
Mean G + C content in bacteria ranges from 25% to 75%, but there is little intragenome variation.
Genomes of vertebrates show a much greater intragenome range of G + C values:
  Caused by continuous sections (> 300 kb), each of which has a uniform G + C content (isochores)
  G + C content of isochores also varies between species
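A minimal sketch of how regional G + C content could be profiled along a sequence, assuming fixed-size, non-overlapping windows. The function names, the window size and the toy sequence are illustrative assumptions; real isochore analyses use windows of hundreds of kilobases.

```python
def gc_content(seq):
    """Fraction of G and C among the unambiguous A/C/G/T bases of seq."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    total = gc + seq.count("A") + seq.count("T")
    return gc / total if total else 0.0


def windowed_gc(seq, window, step):
    """G + C content of successive windows of length `window`, moved by `step`."""
    return [gc_content(seq[i:i + window])
            for i in range(0, len(seq) - window + 1, step)]


# Toy sequence (invented): an AT-rich block followed by a GC-rich block.
toy = "ATTATAATAT" * 3 + "GCGGCCGCGC" * 3
profile = windowed_gc(toy, window=10, step=10)
print(profile)  # [0.0, 0.0, 0.0, 1.0, 1.0, 1.0]
```

A flat profile would correspond to the low intragenome variation described for bacteria; long runs of windows at distinct G + C levels would correspond to isochore-like structure.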

Properties of vertebrate isochores
G + C rich isochores:
  Correlate with reverse Giemsa (R) bands
  Early replicating
  High density of genes
  SINEs present
  CpG islands in genes
  High G + C content at third codon position
  High frequency of retroviral sequences
  High frequency of chiasmata
A + T rich isochores:
  Correlate with Giemsa (G) bands
  Late replicating
  Low density of genes (only tissue specific)
  LINEs present
  No CpG islands
  High A + T content at third codon position
  Low frequency of retroviral sequences
  Low frequency of chiasmata

Theories on the existence of isochores
The selectionist hypothesis of Bernardi et al. suggests that the GC-rich isochores predominantly found in warm-blooded vertebrates are an adaptation to higher body temperature:
  The extra hydrogen bond in a G-C pair may lessen the possibility of thermal damage to DNA
  Desert plants also have higher GC contents
  Evidence for independent occurrence of isochores, since birds and mammals do not share an immediate ancestor
  However, some thermophilic bacteria are AT-rich

Theories on the existence of isochores
The neutralist explanation for the existence of isochores is that they simply reflect variation in the process of mutation across the genome.
Studies on argininosuccinate synthetase processed pseudogenes from anthropoid primates:
  Pseudogenes were derived from the same functional ancestral gene but then inserted into different parts of the genome
  Despite their common ancestry, they now differ in base composition
  Because pseudogenes are not subject to selection, differences in base composition must have been due to regional variation in mutation patterns

Why should mutation patterns vary across genomes?
The replication hypothesis suggests that genes which replicate earlier in the cell cycle are more GC-rich than those which replicate later:
  Believed to be because the G and C precursor pools of dNTPs are larger at this time, so errors are more likely to incorporate G or C
The repair hypothesis is based on the assumption that the efficiency of DNA repair varies across the genome:
  May be an outcome of transcriptionally active areas being repaired more efficiently
  CpG islands are maintained by a special repair system: efficiency of DNA replication may be dependent on location

Why should mutation patterns vary across genomes?
The recombination hypothesis claims that the isochore structure of vertebrate genomes is the outcome of differences in the pattern and frequency of recombination:
  Low GC localities will be associated with regions of reduced recombination:
    Genes with low rates of recombination have low GC values
    The large, non-recombining region of the Y-chromosome has a low GC composition
  The fact that recombination plays such a large part in the structuring of eukaryote genomes makes this an attractive hypothesis
Although the relative contributions of these hypotheses are still unclear, the neutralist interpretation seems more likely.

Codon usage
[Figure: usage of the six arginine codons (CGA, CGC, CGG, CGU, AGA, AGG) and the six leucine codons (CUA, CUC, CUG, CUU, UUA, UUG) in E. coli and human]

What determines codon usage?
Degeneracy of the genetic code:
  The null hypothesis is that all codons for a particular amino acid are used with equal frequency
  Refuted when nucleotide sequences became available for a wide range of organisms
Selectionist argument:
  Highly expressed genes show the most codon bias because they require more translational efficiency: coevolution of tRNAs and codons
  Also supports the neutralist prediction of a relationship between functional constraint and substitution rate
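The equal-frequency null hypothesis can be checked with a short sketch that tallies how often each synonymous codon is used within one amino acid family. The codon families come from the standard genetic code; the function name and the toy gene are invented for illustration, and no statistical test is applied.

```python
from collections import Counter

# Six-fold degenerate codon families from the standard genetic code (RNA alphabet).
FAMILIES = {
    "Arg": ["CGA", "CGC", "CGG", "CGU", "AGA", "AGG"],
    "Leu": ["CUA", "CUC", "CUG", "CUU", "UUA", "UUG"],
}


def codon_fractions(coding_seq, family):
    """Share of each codon in `family` among all family codons in the sequence."""
    rna = coding_seq.upper().replace("T", "U")
    usable = len(rna) - len(rna) % 3  # ignore a trailing partial codon
    codons = [rna[i:i + 3] for i in range(0, usable, 3)]
    counts = Counter(c for c in codons if c in family)
    total = sum(counts.values())
    return {c: (counts[c] / total if total else 0.0) for c in family}


# Toy in-frame sequence (invented): strongly biased towards CUG for leucine.
toy_gene = "CUGCUGCUGCGUCUGUUACGUCUG"
leu = codon_fractions(toy_gene, FAMILIES["Leu"])
null_share = 1 / len(FAMILIES["Leu"])  # equal usage under the null hypothesis
for codon, share in leu.items():
    print(f"{codon}: observed {share:.2f}, null {null_share:.2f}")
```

Under the null hypothesis every leucine codon would sit near 0.17; a single codon taking most of the usage is the kind of bias described for highly expressed genes.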

Gene expression and codon bias
Highly expressed genes:
  Strong selection for translational efficiency
  Restricted tRNAs used
  Strong codon bias
  Low rate of synonymous substitution (few neutral mutations)
Lowly expressed genes:
  Weak selection for translational efficiency
  More tRNAs used
  Weak codon bias
  High rate of synonymous substitution (many neutral mutations)

The molecular clock
The idea of a molecular clock is central to the neutralist theory, since it demonstrates the constancy of the underlying neutral mutation rate:
  Previous example: globin
Does not imply that all genes and proteins evolve at the same rate:
  Great variation between proteins (fibrinopeptides vs. histones)
  Variation in rate among genes and proteins is compatible with the neutral theory if the underlying cause is changes in selective constraint
The key question concerning the validity of a molecular clock is whether rates of substitution are constant within genes across evolutionary time.

Neutral theory and the molecular clock
The rate of nucleotide substitution (fixation) at any site per year, $k$, in a diploid population of size $2N$ is equal to the number of new mutations (neutral, deleterious or advantageous) arising per year, $2N\mu$, multiplied by their probability of fixation, $u$:
  $k = 2N\mu u$
where $\mu$ is the mutation rate per site per year.
For a neutral mutation, the probability of fixation is the reciprocal of the population size:
  $u = 1/2N$
So the substitution rate for a neutral mutation is:
  $k = (2N\mu)(1/2N)$

Neutral theory and the molecular clock (continued)
The parameters for population size ($2N$) cancel out, leaving:
  $k = \mu$
One of the most important formulae in molecular evolution: it means that the rate of substitution of neutral mutations depends only on the underlying mutation rate and is independent of other factors such as population size.
Also holds for mutants with a very weak selective advantage, e.g. $s < 1/(2N_e)$

Substitution of selectively advantageous mutations
The probability of fixation is roughly twice the selection coefficient:
  $u = 2sN_e/N$
Substituting this into the original equation, we get:
  $k = 4N_e s\mu$
In this case, the substitution rate for an advantageous mutation also depends on population size and the magnitude of the selective advantage.
For natural selection to produce a molecular clock, it is necessary for $N_e$, $s$ and $\mu$ (a combination of ecological, mutational and selective events) to be the same across evolutionary time: highly unlikely!
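The contrast between the neutral and the advantageous case can be made concrete with a small numerical sketch. It writes mu for the per-site mutation rate, N for the census size and Ne for the effective size; the function names and the parameter values are invented for illustration.

```python
def neutral_rate(N, mu):
    """k = (2N * mu) * (1 / 2N): new mutations per year times fixation probability."""
    new_mutations = 2 * N * mu
    p_fix = 1 / (2 * N)
    return new_mutations * p_fix


def advantageous_rate(N, Ne, s, mu):
    """k = (2N * mu) * (2 * s * Ne / N), which simplifies to 4 * Ne * s * mu."""
    new_mutations = 2 * N * mu
    p_fix = 2 * s * Ne / N
    return new_mutations * p_fix


mu = 1e-9  # assumed mutation rate per site per year
for N in (1_000, 1_000_000):
    # For simplicity the effective size is set equal to the census size here.
    print(N, neutral_rate(N, mu), advantageous_rate(N, Ne=N, s=0.01, mu=mu))
```

The neutral rate stays at mu whatever the population size, while the advantageous rate scales with Ne and s, which is why selection is not expected to produce a steady clock.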

Constancy of the molecular clock
Neutral theory predicted a molecular clock, and the first protein sequence data appeared to confirm this: this led Kimura to cite it as the best evidence for neutrality.
As more comparative sequence data became available, particularly from mammals, examples of rate variation began to appear.
Debate arose concerning the constancy of the molecular clock.

Testing the molecular clock
Dispersion index $R(t)$: tests whether there is more rate variation between lineages than expected under a Poisson process:
  If the data fit a Poisson process, the variance in the number of substitutions between lineages should be no greater than the mean number
  If the data fit a Poisson process then $R(t) = 1.0$; if not, then $R(t) > 1.0$ and the clock is said to be overdispersed
  A star phylogeny should be used, since any phylogenetic structure will complicate the calculations (e.g. placental mammals)
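A minimal sketch of the dispersion index as a variance-to-mean ratio, assuming substitution counts for lineages that radiate from a star phylogeny. Using the sample variance is an assumption of this sketch, the counts are invented, and published estimators add corrections that are not shown here.

```python
from statistics import mean, variance


def dispersion_index(substitution_counts):
    """Variance-to-mean ratio of substitution counts across lineages."""
    return variance(substitution_counts) / mean(substitution_counts)


# Invented counts for five lineages of a star phylogeny.
even_lineages = [10, 12, 8, 11, 9]
uneven_lineages = [2, 18, 10, 25, 5]

r_even = dispersion_index(even_lineages)
r_uneven = dispersion_index(uneven_lineages)
print(f"even lineages:   R(t) = {r_even:.2f}")
print(f"uneven lineages: R(t) = {r_uneven:.2f}")
```

Values well above 1.0, as in the second set of counts, are what the text calls an overdispersed clock.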

Testing the molecular clock
Mammalian protein data presented a serious problem for neutralists.
Problems most likely due to inaccuracies in phylogenies:
  The "outlier" in the data was the guinea pig
  The guinea pig is much more divergent than previously thought
[Table: R(t) for mammalian proteins (haemoglobin (two chains), myoglobin, cytochrome c, ribonuclease, crystallin); columns: Protein, Species (n), Amino acids, R(t)]

The relative rate test
The relative rate test compares the numbers of substitutions in two closely related taxa, each measured against a third, more distantly related outgroup.
If A and B have evolved according to a molecular clock, both should be equidistant from C:
  $d_{AC} = d_{BC}$
A and B must be closest relatives and C must not be too far removed.
[Figure: tree of taxa A, B and C, with node X]

The relative rate test
Taxa shown in the figure: Old World monkey, human, New World monkey
Synonymous sites in nine nuclear genes (3520 bp): $d_{12} = 6.7$, $d_{13} - d_{23} = 2.3 \pm 0.6$
Globin pseudogene (1827 bp): $d_{12} = 7.9$, $d_{13} - d_{23} = 1.5 \pm 0.4$
Three introns (3376 bp): $d_{12} = 6.9$, $d_{13} - d_{23} = 1.0 \pm 0.5$
Two flanking regions (936 bp): $d_{12} = 7.9$, $d_{13} - d_{23} = 3.1 \pm$
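One way to read the differences above is to divide each by its quoted uncertainty. This sketch assumes that the ± figures are standard errors and that a normal approximation with a 1.96 cutoff is appropriate; neither is stated on the slide. The flanking-region entry is left out because its uncertainty is missing.

```python
def relative_rate_z(diff, se):
    """Difference d13 - d23 expressed in units of its standard error."""
    return diff / se


# (d13 - d23, assumed standard error) as quoted in the text.
quoted = {
    "synonymous sites, nine nuclear genes": (2.3, 0.6),
    "globin pseudogene": (1.5, 0.4),
    "three introns": (1.0, 0.5),
}

z_scores = {label: relative_rate_z(d, se) for label, (d, se) in quoted.items()}
for label, z in z_scores.items():
    print(f"{label}: z = {z:.2f}, exceeds 1.96: {abs(z) > 1.96}")
```

Under a strict clock the difference would be expected to lie close to zero, so large ratios point to unequal rates in the two ingroup lineages.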

Lineage effects and the molecular clock
Substitution rate varies with the underlying neutral mutation rate: $k = \mu$
Three ways for rates to vary between species:
  Differences in generation time
  Differences in metabolic rate
  Differences in efficiency of DNA repair
These are known as lineage effects: neutralists believe that lineage effects alone can account for all variation in the molecular clock.
Selectionists believe that genes also show rate variation due to other, selection-driven factors (residue effects).

Generation time and the molecular clock
[Figure: generation time and the molecular clock, drawn against a time axis]
At the molecular level, generation time ($g$) can be defined as the time it takes for germ-line DNA to replicate, i.e. from one gamete to the next.
Since most mutations occur at this point, the rate of substitution under neutral theory is a function of both mutation rate and generation time:
  $k = \mu/g$
The general conclusion from molecular data is that the clock is generation time dependent at silent sites and in non-coding DNA:
  Silent rates in orang-utan, gorilla and chimp are 1.3-, 2.2- and 1.2-fold faster than in humans, which matches differences in generation times
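A small worked example of the generation-time relationship, assuming the mutation rate is counted per site per generation and generation time is measured in years. The function name and all numbers are invented for illustration.

```python
def rate_per_year(mu_per_generation, generation_time_years):
    """Neutral substitution rate per site per year: k = mu / g."""
    return mu_per_generation / generation_time_years


mu = 2e-8  # assumed mutations per site per generation, identical in both species
short_lived = rate_per_year(mu, generation_time_years=5)
long_lived = rate_per_year(mu, generation_time_years=20)
fold = short_lived / long_lived
print(f"short generation: {short_lived:.1e} per site per year")
print(f"long generation:  {long_lived:.1e} per site per year")
print(f"fold difference:  {fold:.1f}")
```

With the same per-generation mutation rate, a fourfold shorter generation time gives a fourfold faster clock per year, which is the kind of scaling described for silent sites.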

The metabolic rate hypothesis
In sharks, the rate of silent change is five- to sevenfold lower than in primates and ungulates, which have similar generation times:
  Led to the hypothesis that differences in metabolic rate are a better explanation for differences in mutation rates than differences in generation time (metabolic rate hypothesis)
  States that organisms with high metabolic rates have higher levels of DNA synthesis
Two pieces of mitochondrial DNA evidence support this:
  Small-bodied animals, which have higher metabolic rates, tend to have higher mutation rates
  Warm-blooded animals also have higher mutation rates than cold-blooded animals

Relationship between body mass and sequence evolution
[Figure: % sequence divergence per Myr plotted against body mass (kg) for rodents, geese, dogs, primates, horses, bears, whales, newts, frogs, tortoises, salmon, sea turtles and sharks]

DNA repair and mutation
[Diagram: DNA is subject to direct damage and replication errors; after repair, lesions are either correctly repaired or incorrectly repaired, and incorrect repair leads to mutation]

DNA repair and mutation
Repair mechanisms are extremely complex and there are many repair pathways.
There is some evidence supporting the hypothesis that DNA repair influences mutation rate:
  Evidence that highly transcribed genes are more efficiently repaired
  Base composition and substitution rates at silent sites in mammalian genes tend to be gene-specific rather than species-specific: this suggests that homologous genes are transcribed and repaired in a similar manner
Conversely, closely related species such as hominoid primates, which share very similar repair mechanisms, can exhibit greatly differing substitution rates.