Evolutionary Change in Sequences
  Lecture 2 (though L1 slightly too long ... a few slides included in beginning of L2)
  Changes over long evolutionary times
    e.g., mouse and human diverged 100 mya (million years ago)
    Diverged = last shared a common ancestor
  Study how nucleotide sequences change over time
  Require (mathematical) models
    Mostly based on simple probability
  Species have many of the ‘same’ genes, but with slightly different sequences
    e.g., compare mouse myoglobin to human myoglobin
Jukes and Cantor one-parameter model
  Simplest model of DNA sequence evolution
  All substitutions occur with equal probability
  Only one parameter: α, the rate of change of one nucleotide to another
  3α = rate of change of a nucleotide to any other
  [Figure: substitution diagram between purines (A, G) and pyrimidines (C, T)]
  After one unit of time:
    Probability that A stays as A = 1 - 3α
    Probability that A changes to T = α
Compare number of nucleotide differences between sequences
  Fewer differences = shorter time
  Only four possible states (A, C, T, G)
  Under this simple model, all sequences will retain 25% identity at equilibrium
  After some time can no longer estimate relationships
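
The drift toward 25% identity can be checked numerically. The sketch below is only an illustration under the one-parameter model: it writes the single substitution rate as `alpha` (an assumed name), treats time as discrete units, and tracks the probability that a site still shows its original nucleotide. The function name `jc_prob_same` is hypothetical.

```python
def jc_prob_same(alpha, steps):
    """Probability that a site shows its original nucleotide after each time unit.

    alpha is the per-unit-time probability of changing to one specific other
    nucleotide (assumed symbol for the single Jukes-Cantor parameter).
    """
    p_same = 1.0  # at time 0 the site is certainly unchanged
    history = [p_same]
    for _ in range(steps):
        # Either the site was the original base and did not change (1 - 3*alpha),
        # or it was one of the other three bases and changed back (alpha).
        p_same = (1 - 3 * alpha) * p_same + alpha * (1 - p_same)
        history.append(p_same)
    return history


probs = jc_prob_same(alpha=0.01, steps=1000)
print(probs[1])   # after one unit of time: 1 - 3*alpha
print(probs[-1])  # after a long time: approaches 0.25
```

Whatever value of `alpha` is chosen (between 0 and 0.25), the probability settles at 0.25, which is why very old divergences cannot be told apart by counting differences alone.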
Kimura’s two-parameter model
  Two kinds of nucleotide
    Purine: A, G
    Pyrimidine: C, T
  Model says that a purine is more likely to be replaced with another purine than by a pyrimidine
    Transitions: replace like with like
    Transversions: replace purine with pyrimidine, or vice versa
  Supported by observation: transitions are more common than transversions
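
The purine/pyrimidine rule above can be turned into a small classifier. This is a minimal sketch, assuming two aligned sequences of equal length that contain only A, C, G and T; the function names are hypothetical.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def classify_substitution(a, b):
    """Return 'none', 'transition' or 'transversion' for a pair of bases."""
    a, b = a.upper(), b.upper()
    if a not in PURINES | PYRIMIDINES or b not in PURINES | PYRIMIDINES:
        raise ValueError("only A, C, G and T are supported in this sketch")
    if a == b:
        return "none"
    # Like replaced with like (purine-purine or pyrimidine-pyrimidine).
    same_class = {a, b} <= PURINES or {a, b} <= PYRIMIDINES
    return "transition" if same_class else "transversion"


def count_ts_tv(seq1, seq2):
    """Count transitions and transversions between two aligned sequences."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must have the same length")
    counts = {"transition": 0, "transversion": 0}
    for a, b in zip(seq1, seq2):
        kind = classify_substitution(a, b)
        if kind != "none":
            counts[kind] += 1
    return counts


print(count_ts_tv("AGCTAGCT", "GGTTACCA"))
# {'transition': 2, 'transversion': 2}
```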
[Figure: substitution diagram between purines (A, G) and pyrimidines (C, T)]
  Transition (A ↔ G, C ↔ T) = rate of α
  Transversion (purine ↔ pyrimidine) = rate of β
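
One way to make the two rates concrete is to write out the one-time-unit substitution probabilities as a table. The sketch below assumes the usual convention for this model: `alpha` is the transition rate and `beta` is the rate of each of the two possible transversions, so a base stays unchanged with probability 1 - alpha - 2*beta. The function name `k2p_matrix` is hypothetical.

```python
BASES = "AGCT"
PURINES = {"A", "G"}


def k2p_matrix(alpha, beta):
    """One-time-unit substitution probabilities under a two-parameter model.

    alpha: probability of the single possible transition (assumed symbol)
    beta:  probability of each of the two possible transversions (assumed symbol)
    """
    matrix = {}
    for x in BASES:
        row = {}
        for y in BASES:
            if x == y:
                row[y] = 1 - alpha - 2 * beta  # no change
            elif (x in PURINES) == (y in PURINES):
                row[y] = alpha                 # transition: like replaced with like
            else:
                row[y] = beta                  # transversion
        matrix[x] = row
    return matrix


m = k2p_matrix(alpha=0.02, beta=0.005)
print(m["A"])  # probabilities of A becoming A, G, C, T
```

Setting `alpha` equal to `beta` gives back the one-parameter model, where every substitution is equally likely.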
Number of substitutions between two DNA sequences
  For two sequences of length N, count the number of differences n
  n/N = proportion of sites that differ (percent identity = 100 × (1 - n/N))
  BUT, there is a chance that the same position of the sequence was changed more than once
    e.g., observe A at position 10 in one sequence, and T in the other
    This could be a single change (A → T) or more than one (A → C → T)
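
Counting differences is simple to do in code. The sketch below assumes two already-aligned sequences of equal length with no gaps; the function name `p_distance` is hypothetical. It returns the count n and the proportion of differing sites n/N.

```python
def p_distance(seq1, seq2):
    """Return (n, n/N): number and proportion of differing sites."""
    if len(seq1) != len(seq2) or len(seq1) == 0:
        raise ValueError("sequences must be non-empty and of equal length")
    n = sum(1 for a, b in zip(seq1, seq2) if a != b)
    return n, n / len(seq1)


n, p = p_distance("ACGTACGTAC", "ACGTTCGTAA")
print(n, p)                      # 2 differences out of 10 sites
print(100 * (1 - p), "% identity")
```

This count says nothing about how many substitution events actually happened at each differing site, which is the problem the next slides address.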
Multiple Hits
  [Figure: sequences diverging over time; 8 substitution events (arrows), only 6 differences (dots)]
Multiple Hits
  When degree of divergence is high:
    Observe n differences, but these are the result of >> n changes
    By simply counting the differences one can greatly underestimate the amount of divergence of the sequences
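
A standard way to correct for multiple hits under the one-parameter model is the Jukes-Cantor distance, d = -(3/4) ln(1 - (4/3) p), where p is the observed proportion of differing sites. The formula is not given on the slide; it is included here as a hedged illustration of how much raw counting can underestimate divergence. The function name `jc_distance` is hypothetical.

```python
import math


def jc_distance(p):
    """Estimated substitutions per site from the observed proportion of differences.

    Uses the Jukes-Cantor correction; undefined once p reaches 0.75, the
    level of difference expected between unrelated sequences.
    """
    if not 0 <= p < 0.75:
        raise ValueError("p must satisfy 0 <= p < 0.75")
    return -0.75 * math.log(1 - (4.0 / 3.0) * p)


for p in (0.05, 0.2, 0.4, 0.6):
    print(p, round(jc_distance(p), 3))
```

For small p the corrected distance is close to p, but as p grows the estimate rises much faster than the observed differences, matching the ">> n changes" point above.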
[Figure: number of differences plotted against time; expected (molecular clock) versus observed]
Violations of assumptions of models
  Rate of substitution is not always the same for all sites
  Some sites are not independent
    Interacting sites may require complementary mutations (e.g., in a hairpin structure, in protein 3D structure)
CpG dinucleotides
  Example of particular sites with a high mutation rate
  CpG = C followed by G in the 5’ – 3’ direction
  This dinucleotide is particularly susceptible to methylation (addition of a CH3 group) of the C
  The methylated C is easily deaminated to T (thymine)
  Results in a TG mismatch
  May be corrected to CG, or to TA
    5’ ...T G... 3’
    3’ ...G C... 5’
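
CpG sites are easy to locate in a sequence. The sketch below finds every C followed by G on the given strand (read 5’ to 3’) and computes an observed/expected CpG ratio, a commonly used summary defined as (number of CpG × N) / (number of C × number of G). The ratio is not part of the slide and the function names are hypothetical.

```python
def cpg_positions(seq):
    """0-based positions of each C that is immediately followed by G."""
    seq = seq.upper()
    return [i for i in range(len(seq) - 1) if seq[i:i + 2] == "CG"]


def cpg_obs_exp(seq):
    """Observed/expected CpG ratio: (#CpG * N) / (#C * #G)."""
    seq = seq.upper()
    c_count, g_count = seq.count("C"), seq.count("G")
    if c_count == 0 or g_count == 0:
        return 0.0
    return len(cpg_positions(seq)) * len(seq) / (c_count * g_count)


seq = "ACGTTCGACCGG"
print(cpg_positions(seq))  # [1, 5, 9]
print(cpg_obs_exp(seq))    # 2.25
```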
Molecular Coevolution (Inter-protein)
Molecular Coevolution (Intra-protein)
  Amino acids within proteins do not evolve in isolation; rather, they form part of complex intra-protein evolutionary units
Coevolution leads to phylogenetic mirroring
Coevolution results from coadaptation
Models of coevolution
  Models of molecular covariation help us understand how proteins evolve and identify residues that may be functionally or structurally linked
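
As a rough illustration of what a covariation measure looks like, the sketch below computes mutual information between two columns of a multiple sequence alignment. Mutual information is only one commonly used measure and is an assumption here, not necessarily the model the lecture has in mind; the function name and the toy columns are hypothetical.

```python
import math
from collections import Counter


def mutual_information(col_x, col_y):
    """Mutual information (in bits) between two alignment columns.

    col_x and col_y hold the residue seen at each column, one per sequence,
    in the same sequence order.
    """
    if len(col_x) != len(col_y) or len(col_x) == 0:
        raise ValueError("columns must be non-empty and of equal length")
    n = len(col_x)
    freq_x, freq_y = Counter(col_x), Counter(col_y)
    freq_xy = Counter(zip(col_x, col_y))
    mi = 0.0
    for (x, y), count in freq_xy.items():
        p_joint = count / n
        p_indep = (freq_x[x] / n) * (freq_y[y] / n)
        mi += p_joint * math.log2(p_joint / p_indep)
    return mi


print(mutual_information("AAGG", "TTCC"))  # columns vary together
print(mutual_information("AAGG", "TCTC"))  # columns vary independently
```

High values flag pairs of positions that change together across sequences, which makes them candidates for residues that are functionally or structurally linked. Shared ancestry alone can also produce such signals, so a score like this is a starting point and not proof of coevolution.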