Molecular Clocks, Base Substitutions, & Phylogenetic Distances.


Definition: A mutation is either the exchange of one nucleotide for another within a DNA sequence or an insertion or deletion (indel) event. In effect, it is a mistake in the replication or repair of DNA. Mutations are divided into three categories:
1. Deleterious: disadvantageous to the survival of the organism.
2. Advantageous: contribute to the continued survival of the organism.
3. Neutral: for example, a change in the third nucleotide of a codon for valine.
Advantageous changes are in the minority. Also, some changes can greatly affect an organism.

A deceptively simple, important equation: $r = \frac{K}{2T}$, where:
r = the rate at which substitutions occur
K = the number of substitutions two sequences have undergone since they last shared a common ancestor, expressed in substitutions per site
T = the divergence time
The factor of 2 appears because both lineages have been accumulating substitutions since they diverged. Unfortunately, none of these variables is known directly. T can be estimated from fossil evidence, if it exists. K can be approximated by sequence comparison.
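As a small illustration of the rate equation, here is a sketch that assumes the standard form r = K / (2T). The function name `substitution_rate` and the numbers used are hypothetical, chosen only to show the arithmetic.

```python
def substitution_rate(k: float, t_years: float) -> float:
    """Rate r in substitutions per site per year, assuming r = K / (2T).

    k       -- substitutions per site separating the two sequences (K)
    t_years -- time since the sequences last shared a common ancestor (T)
    """
    if t_years <= 0:
        raise ValueError("divergence time must be positive")
    return k / (2.0 * t_years)


# Hypothetical numbers, for illustration only: K = 0.10 substitutions per
# site and T = 50 million years.
r = substitution_rate(0.10, 50e6)
print(f"r = {r:.2e} substitutions per site per year")
```

Doubling T while holding K fixed halves the estimated rate, which is why disagreements about divergence times translate directly into disagreements about rates.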

Different portions of genes accumulate changes at widely varying rates:

Sites within codons experience different substitution rates. Substitutions accumulate most rapidly at four-fold degenerate sites, those where replacing the nucleotide with any of the other three does not change the amino acid, e.g. the third site of a glycine codon. They accumulate less rapidly at two-fold degenerate sites, those where two of the four nucleotides specify one amino acid and the other two specify another, e.g. the third site of the codons for aspartic acid and glutamic acid. They are least frequent at nondegenerate sites, those where any change always alters the amino acid, e.g. almost any of the middle sites in Table 1.1 on p. 11 of K&R.
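The three kinds of site can be checked mechanically against the standard genetic code. The sketch below is illustrative only; the helper name `degeneracy` is an assumption, and the code table is written out in the usual T, C, A, G ordering.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ..., GGG.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def degeneracy(codon: str, position: int) -> int:
    """Count the nucleotides (1 to 4) that, placed at `position` (0, 1 or 2),
    encode the same amino acid as `codon` itself."""
    original = CODON_TABLE[codon]
    count = 0
    for base in BASES:
        variant = codon[:position] + base + codon[position + 1:]
        if CODON_TABLE[variant] == original:
            count += 1
    return count


print(degeneracy("GGT", 2))  # third site of a glycine codon
print(degeneracy("GAT", 2))  # third site of an aspartic acid codon
print(degeneracy("GAT", 1))  # middle site of the same codon
```

A result of 4 marks a four-fold degenerate site, 2 a two-fold degenerate site, and 1 a nondegenerate site. A value of 3 also occurs, at the third site of the isoleucine codons.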

Natural selection makes it difficult to assess mutation rates because it tends to eliminate deleterious mutations. Substitutions are mutations that have been filtered through selection. We consider two types of substitutions:
Synonymous: those that do not result in a change of the amino acid.
Nonsynonymous: those that result in a change of the amino acid.
Synonymous changes are less affected by selection and thus are more reflective of the true mutation rate than nonsynonymous changes.
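To make the distinction concrete, the following sketch compares two aligned, in-frame coding sequences codon by codon. It is a simplified illustration, not a published estimator of synonymous and nonsynonymous rates: codons that differ at more than one site are only tallied as ambiguous. The function name and the toy sequences are hypothetical.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ..., GGG.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_differences(seq_a: str, seq_b: str) -> dict:
    """Tally codon differences between two aligned coding sequences."""
    if len(seq_a) != len(seq_b) or len(seq_a) % 3 != 0:
        raise ValueError("sequences must be aligned and a multiple of 3 long")
    counts = {"synonymous": 0, "nonsynonymous": 0, "ambiguous": 0}
    for i in range(0, len(seq_a), 3):
        codon_a, codon_b = seq_a[i:i + 3], seq_b[i:i + 3]
        n_diff = sum(x != y for x, y in zip(codon_a, codon_b))
        if n_diff == 0:
            continue
        if n_diff > 1:
            # The order of the individual changes is unknown, so skip.
            counts["ambiguous"] += 1
        elif CODON_TABLE[codon_a] == CODON_TABLE[codon_b]:
            counts["synonymous"] += 1
        else:
            counts["nonsynonymous"] += 1
    return counts


# Hypothetical toy sequences: GTT -> GTC (Val -> Val), GAT -> GAA (Asp -> Glu).
result = classify_differences("GTTGATGGA", "GTCGAAGGA")
print(result)
```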

Table of synonymous and non synonymous substitution rates for various genes in four mammalian species. See Table 3.3 on page 64 of K&R for identification of the genes.

Because of differences in the selective constraints on various substitutions in individual proteins, differences in amino acid replacement rates between nuclear genes can be quite striking. On the other hand, rates of molecular evolution for loci with similar functional constraints can be quite uniform over long periods of evolutionary time. This observation led Zuckerkandl and Pauling in the 1960s to suggest that within homologous proteins the substitution rates were so constant that they were like the ticking of a Molecular Clock. While the clock may run at different rates for different proteins, the number of differences between two homologous proteins correlated well with the time since speciation caused them to diverge.

This hypothesis is controversial. Classical evolutionists maintain that the erratic tempo of morphological evolution is inconsistent with a steady rate of molecular change. Furthermore, disagreements regarding divergence times have called into question the uniformity of evolutionary rates promised by a “molecular clock.” See as one example the article on the time of divergence of the human and the chimp. One of the hypotheses there is that humans, because of their longer life span, have a ‘slower’ molecular clock. On the other hand, these varying rates can be explained in several different ways, and much useful information has been obtained from sequence comparison.

For the moment we will proceed with the assumption of a molecular clock for highly conserved sequences. However, we are not yet out of the woods. For sequences with relatively few substitutions, a simple count will provide a reasonable approximation of K. On the other hand, simple counting in sequences with many differences may cause a significant underestimation of the actual number of substitutions. Why? Because a site can change more than once, and a later change can even restore the original nucleotide, so several substitutions may show up as one difference or as none. Jukes and Cantor in 1969 developed the first, and simplest, model of nucleotide substitution that accounts for the underestimate of simple counting and gives a more accurate estimate of the number of substitutions since two sequences last shared a common ancestor. In 1980 Kimura developed a more sophisticated model that took into account different rates for transitions and transversions.
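A small simulation can illustrate why counting differences underestimates the number of substitutions. The sketch below is an assumption-laden toy: it compares an ancestor with a single descendant rather than two descendants, places substitutions at uniformly random sites, and uses a fixed seed so the run is repeatable. The function name is hypothetical.

```python
import random


def simulate_substitutions(n_sites: int, n_events: int, seed: int = 0) -> int:
    """Apply `n_events` random substitutions to a random sequence and return
    the number of sites that end up differing from the starting sequence."""
    rng = random.Random(seed)
    ancestor = [rng.choice("ACGT") for _ in range(n_sites)]
    descendant = list(ancestor)
    for _ in range(n_events):
        site = rng.randrange(n_sites)
        # Replace the current nucleotide with one of the other three.
        others = [b for b in "ACGT" if b != descendant[site]]
        descendant[site] = rng.choice(others)
    return sum(a != d for a, d in zip(ancestor, descendant))


n_sites = 1000
for n_events in (50, 500, 2000):
    observed = simulate_substitutions(n_sites, n_events)
    print(f"{n_events} substitutions -> {observed} observed differences")
```

The observed count can never exceed the number of substitutions, and the gap widens as substitutions pile up on sites that have already changed.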

To begin, we will investigate the ramifications of the Jukes-Cantor model. This model assumes that a certain proportion of any of the given nucleotides will change during any one evolutionary time period and that each of them is equally likely to change to any of the other three nucleotides. This assumption leads to a table of one-step probabilities in which every nucleotide remains unchanged with probability $1 - \alpha$ and changes to each particular other nucleotide with probability $\alpha/3$, where α = the proportion of a particular nucleotide that changes during any one evolutionary time period.
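The table can be written as a 4 x 4 matrix and applied step by step. The sketch below assumes the usual discrete-time form of the model, with $1 - \alpha$ on the diagonal and $\alpha/3$ in every other entry; the value of α and the function names are illustrative assumptions.

```python
def jukes_cantor_matrix(alpha: float) -> list:
    """One-step matrix: 1 - alpha on the diagonal, alpha / 3 elsewhere."""
    return [[1 - alpha if i == j else alpha / 3 for j in range(4)]
            for i in range(4)]


def step(distribution: list, matrix: list) -> list:
    """Advance a probability distribution over four nucleotides one step."""
    return [sum(distribution[i] * matrix[i][j] for i in range(4))
            for j in range(4)]


alpha = 0.001  # illustrative value
matrix = jukes_cantor_matrix(alpha)
state = [1.0, 0.0, 0.0, 0.0]  # the site starts as the first nucleotide
for t in range(1, 5001):
    state = step(state, matrix)
    if t in (1, 100, 1000, 5000):
        print(f"t = {t:5d}: probability the site differs = {1 - state[0]:.4f}")
```

After one step the probability of a difference is exactly α, but over many steps it levels off just below 3/4 instead of growing without bound.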

Reiterating the formula for p, the expected proportion of sites that differ after t time steps, implied by the Jukes-Cantor model: $p = \frac{3}{4} - \frac{3}{4}\left(1 - \frac{4}{3}\alpha\right)^{t}$. We can solve for the elapsed time, t, based on α and p: $t = \frac{\ln\left(1 - \frac{4}{3}p\right)}{\ln\left(1 - \frac{4}{3}\alpha\right)}$. p can be approximated by the proportion of sites at which the two sequences are observed to differ. However, that still leaves us with one equation in two unknowns, α and t. This is not good! Or is it? If we look at the product αt and think about its meaning for a minute, we see that this product is the mutation rate times the number of time steps, or the expected number of substitutions per site during the elapsed time. This includes even those that do not appear in the count of differences, i.e. the “hidden substitutions” (such as those that eventually resulted in a position once again being occupied by its original nucleotide). We define a new variable d = αt, which is called the Jukes-Cantor distance. Notice that this distance is proportional to t.
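The relationship between p, α and t can be checked numerically. This sketch assumes the discrete-time Jukes-Cantor result $p = \frac{3}{4} - \frac{3}{4}(1 - \frac{4}{3}\alpha)^t$ and its inverse; the function names and the chosen values of α and t are illustrative.

```python
import math


def fraction_different(alpha: float, t: float) -> float:
    """Expected proportion of differing sites after t steps (assumed form)."""
    return 0.75 - 0.75 * (1 - 4 * alpha / 3) ** t


def elapsed_time(alpha: float, p: float) -> float:
    """Invert the formula above to recover t from alpha and p."""
    if not 0 <= p < 0.75:
        raise ValueError("p must lie in [0, 0.75)")
    return math.log(1 - 4 * p / 3) / math.log(1 - 4 * alpha / 3)


alpha, t = 0.001, 300  # illustrative values
p = fraction_different(alpha, t)
print(f"p = {p:.4f}, alpha * t = {alpha * t:.4f}")
print(f"recovered t = {elapsed_time(alpha, p):.1f}")
```

The product αt comes out larger than p, and the gap between them is the share of substitutions hidden from a simple count.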

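We are almost where we want to be. We make one last observation: if x is small, $\ln(1 - x) \approx -x$. For example, $\ln(0.999) \approx -0.0010005$. Thus, since α is very small, we have $\ln\left(1 - \frac{4}{3}\alpha\right) \approx -\frac{4}{3}\alpha$, and so $t \approx -\frac{3}{4\alpha}\ln\left(1 - \frac{4}{3}p\right)$. This approximation allows us to solve for d, the Jukes-Cantor distance. Multiplying both sides by α, $d = \alpha t \approx -\frac{3}{4}\ln\left(1 - \frac{4}{3}p\right)$. Thus, given two sequences, $S_0$ and $S_1$, the Jukes-Cantor distance between them can be estimated from p, the proportion of sites at which they differ.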

We conclude with an example. Consider two sequences with 40 sites:
$S_0$: AGCTTCCGATCCGCTATAATCGTTAGTTGTTACACCTCTG
$S_1$: AGCTTCTGATACGCTATAATCGTGAGTTGTTACATCTCCG
The sequences differ at five sites. Thus p = 5/40 = 1/8 = 0.125, and $d \approx -\frac{3}{4}\ln\left(1 - \frac{4}{3}(0.125)\right) = -\frac{3}{4}\ln\frac{5}{6} \approx 0.1367$. This is the expected number of substitutions per site (about 13.7%), i.e. $0.1367 \times 40 \approx 5.5$ is the expected number of substitutions based on the observed differences between the two sequences.
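The example can be reproduced in a few lines. This is a minimal sketch assuming the Jukes-Cantor distance in the form $d \approx -\frac{3}{4}\ln(1 - \frac{4}{3}p)$; the function name is hypothetical, and the two sequences are the ones given above.

```python
import math

S0 = "AGCTTCCGATCCGCTATAATCGTTAGTTGTTACACCTCTG"
S1 = "AGCTTCTGATACGCTATAATCGTGAGTTGTTACATCTCCG"


def jukes_cantor_distance(seq_a: str, seq_b: str) -> tuple:
    """Return (p, d): the proportion of differing sites and the
    Jukes-Cantor distance estimated from it."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    differences = sum(a != b for a, b in zip(seq_a, seq_b))
    p = differences / len(seq_a)
    d = -0.75 * math.log(1 - 4 * p / 3)
    return p, d


p, d = jukes_cantor_distance(S0, S1)
print(f"p = {p:.3f}")
print(f"d = {d:.4f} substitutions per site")
print(f"expected substitutions over {len(S0)} sites = {d * len(S0):.1f}")
```

The estimate of about 5.5 substitutions exceeds the five observed differences, with the excess attributed to hidden substitutions.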