Published by Charity Griffith. Modified over 9 years ago.
1
Transition Bias and Substitution Models
Xuhua Xia
xxia@uottawa.ca
http://dambe.bio.uottawa.ca
2
What is transition bias?
Transitions and Transversions
Transition: the substitution of a purine for a purine or a pyrimidine for a pyrimidine. Symbolized by s.
Transversion: the substitution of a purine for a pyrimidine or vice versa. Symbolized by v.
Purines: A, G. Pyrimidines: C, T.
Each nucleotide can change into one other nucleotide by a transition and into two others by a transversion, so if all changes were equally likely the expected s/v ratio would be 1/2.
Transition bias refers to the degree to which the s/v ratio deviates from the expected 1/2. The observed s/v ratio is almost always much larger than 1/2.
[Diagram: transitions and transversions among A, G, C, T]
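The definitions above can be sketched in a few lines of Python. This is only an illustration: the function names are made up for this example and are not from the slides. It classifies a change as s or v and counts how many of the 12 possible changes fall in each class, which is where the expected ratio of 1/2 comes from.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}

def classify(x, y):
    """Return 's' for a transition, 'v' for a transversion, None if x == y."""
    if x == y:
        return None
    # A transition stays within the purines or within the pyrimidines.
    same_class = {x, y} <= PURINES or {x, y} <= PYRIMIDINES
    return "s" if same_class else "v"

def count_possible_changes():
    """Count the 12 possible nucleotide changes by type."""
    counts = {"s": 0, "v": 0}
    for x in "ACGT":
        for y in "ACGT":
            kind = classify(x, y)
            if kind is not None:
                counts[kind] += 1
    return counts

counts = count_possible_changes()
print(counts, counts["s"] / counts["v"])
```

Four of the twelve possible changes are transitions and eight are transversions, so an unbiased process gives s/v = 1/2.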
3
Transition Bias is Ubiquitous. Why?
For both invertebrate and vertebrate genes.
What causes transition bias?
– Mutation bias
– Selection bias
Selection bias in fixation probability
– Protein-coding genes
– RNA genes
Mutation bias
4
Mitochondrial Genetic Code
Synonymous and nonsynonymous substitutions
Degeneracy:
– Non-degenerate
– Two-fold degenerate
– Four-fold degenerate
Transitions are synonymous and transversions are nonsynonymous at two-fold degenerate sites.
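The statement about two-fold degenerate sites can be checked directly against a code table. The sketch below assumes the vertebrate mitochondrial code (NCBI translation table 2) is the one meant, looks only at third codon positions, and counts the stop codons like any other codon; the helper names are hypothetical.

```python
from collections import Counter

BASES = "TCAG"
# Vertebrate mitochondrial code (assumed: NCBI translation table 2),
# codons ordered TTT, TTC, TTA, TTG, TCT, ... with '*' for stop.
AMINO_ACIDS = (
    "FFLLSSSSYY**CCWW"
    "LLLLPPPPHHQQRRRR"
    "IIMMTTTTNNKKSS**"
    "VVVVAAAADDEEGGGG"
)
CODE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
TRANSITION_PARTNER = {"A": "G", "G": "A", "C": "T", "T": "C"}

def third_site_fold(codon):
    """Number of third-position bases (including the current one) giving the same meaning."""
    return sum(CODE[codon[:2] + b] == CODE[codon] for b in BASES)

def two_fold_rule_holds():
    """At two-fold third positions: synonymous if and only if the change is a transition."""
    for codon, aa in CODE.items():
        if third_site_fold(codon) != 2:
            continue
        for b in BASES:
            if b == codon[2]:
                continue
            synonymous = CODE[codon[:2] + b] == aa
            is_transition = TRANSITION_PARTNER[codon[2]] == b
            if synonymous != is_transition:
                return False
    return True

fold_counts = Counter(third_site_fold(c) for c in CODE)
print(dict(fold_counts), two_fold_rule_holds())
```

Under this code table every third position is either two-fold or four-fold degenerate, which makes the transition/transversion split at two-fold sites especially clean.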
5
RNA secondary structure

Seq1: CACGA
      |||||
      GUGCU
Seq2: CAUGA
      |||||
      GUGCU

Seq1: CACGA
      |||||
      GUGCU
Seq2: CGCGA
      |||||
      GUGCU

In both examples a transition in Seq2 (C to U in the first, A to G in the second) replaces a Watson-Crick pair with a G/U pair.
A G/U pair, although not as strong as an A/U or C/G pair, generally does not disrupt RNA secondary structure (and occurs frequently in RNA secondary structure).
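A small enumeration makes the point about stems concrete. The sketch below is an illustration only, with hypothetical names: it treats A/U, G/C and G/U as acceptable pairs, starts from each Watson-Crick pair, mutates one base, and counts how often pairing survives for transitions and for transversions.

```python
# Pairs treated as compatible with a stem: Watson-Crick plus the G/U wobble pair.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}
WATSON_CRICK = [("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")]
PURINES = {"A", "G"}

def is_transition(x, y):
    """True if x and y differ but are both purines or both pyrimidines."""
    return x != y and ((x in PURINES) == (y in PURINES))

def pairing_survival():
    """Mutate the first base of every Watson-Crick pair; tally kept pairs by change type."""
    kept = {"s": 0, "v": 0}
    total = {"s": 0, "v": 0}
    for left, right in WATSON_CRICK:
        for new in "ACGU":
            if new == left:
                continue
            kind = "s" if is_transition(left, new) else "v"
            total[kind] += 1
            if (new, right) in PAIRS:
                kept[kind] += 1
    return kept, total

kept, total = pairing_survival()
print("kept:", kept, "out of:", total)
```

Because both orientations of each pair are listed, mutating only the first base covers changes on either strand. Half of the transitions keep a pair (through G/U), while no transversion does.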
6
Causes of transition bias
"I often say that when you can measure what you are speaking about, and express it in numbers, you know something about it; but when you cannot measure it, when you cannot express it in numbers, your knowledge is of a meagre and unsatisfactory kind; it may be the beginning of knowledge, but you have scarcely in your thoughts advanced to the state of Science, whatever the matter may be."
Lord Kelvin: Popular Lectures and Addresses, vol. 1, "Electrical Units of Measurement", 1883-05-03
7
At Four-fold Degenerate Sites
At four-fold degenerate sites, all nucleotide substitutions are synonymous and subject to roughly the same selection pressure (similar fixation probabilities).
Glycine codons: GGA GGC GGG GGT (the third position is a four-fold degenerate site)

S1 aa   Gly Asn Lys Gly Asp Lys Ala Ala Pro Ala Cys ...
Fold    4   2   2       2   2   4   4   4       2
S1      GGA AAU AAA GGA GAC AAA GCC GCC CCU GCG UGU ...
S2      GGG AAC AAA GAA GAU AAG GCC GCU CCA GGG UGG ...
Change  s                           s   v
S2 aa               Glu                     Gly Trp
8
At Nondegenerate Sites
At nondegenerate sites, all nucleotide substitutions are nonsynonymous and subject to roughly the same selection pressure (similar fixation probabilities).
Glycine codons: GGA GGC GGG GGT (the first and second positions are nondegenerate sites)

S1 aa   Gly Asn Lys Gly Asp Lys Ala Ala Pro Ala Cys ...
S1      GGA AAU AAA GGA GAC AAA GCC GCC CCU GCG UGU ...
S2      GGG AAC AAA GAA GAU AAG GCC GCU CCA GGG UGG ...
Change              s                       v
S2 aa               Glu                     Gly Trp
9
At Two-fold Degenerate Sites
At two-fold degenerate sites, all transitional substitutions are synonymous, and all transversional substitutions are nonsynonymous.
Example of a two-fold degenerate third position: GAA Glu, GAG Glu, GAC Asp, GAT Asp
A transition is about 40 times as likely to become fixed as a transversion.

S1 aa   Gly Asn Lys Gly Asp Lys Ala Ala Pro Ala Cys ...
Fold    4   2   2       2   2   4   4   4       2
S1      GGA AAU AAA GGA GAC AAA GCC GCC CCU GCG UGU ...
S2      GGG AAC AAA GAA GAU AAG GCC GCU CCA GGG UGG ...
Change      s           s   s                   v
S2 aa               Glu                     Gly Trp
10
Methylation and deamination
[Diagram: a methyltransferase transfers a methyl group (H3C-) from a donor to an acceptor]
11
Methylation and DNA Repair in E. coli
DNA alphabet: ACGT
RNA alphabet: ACGU
DNA duplication and Watson-Crick pairing rule: A-T, C-G

3’--CTAG----CTAGGTAT----C-----C--CTAG-----------5’
    ||||    ||||||||    ?     ?  ||||
5’--GATC----GATCCATA----U-----T--GATC-----... 3’

[Figure labels: H3C (methyl groups); mutS, mutH, mutL]
Spacing of GATC: consequences of being too far.
12
Methylation-Modification System
Brevibacterium albidum
Methylated recognition site (C* = methylated C):
TGGC*CA
AC*CGGT
Restriction enzyme cut site:
----TGG|CCA---
----ACC|GGT---
[Figure labels: dsDNA phage, bacterial genome, bacterial membrane, restriction enzyme, methylase, transcription and translation]
13
CpG-Specific DNA Methylation
Mammalian DNA methyltransferase 1 (DNMT1):
– NLS-containing domain
– RFD, the replication foci-directing domain
– ZnD, the Zn-binding domain
– PBD, the polybromo domain
– CatD, the catalytic domain
[Figure: domain map of DNMT1 (NLS, RFD, ZnD, PBD, CatD) with residue coordinates running from 1 to 1620; CpG and mCpG labels]
Fatemi, M., A. Hermann, S. Pradhan and A. Jeltsch, 2001 J Mol Biol 309: 1189-99.
14
CpG-Specific DNA Methylation

5’ATGCGA-------CCGA--------ACGGC--TAA 3’
  ||||||       ||||        |||||
3’TACGCT-------GGCT--------TGCCG--ATT 5’
  Fully        Hemi-       Unmethylated
  methylated   methylated

[Figure labels: H3C (methyl groups) on the methylated cytosines]
Note: 5’CG3’ = CpG
15
Methylation and Gene Regulation
Proteins with a methyl-CpG binding domain (MBD):
– MBD1, MBD2, and MBD3
– MeCP2
Deacetylase: an enzyme that removes an acetyl group.
Histone deacetylases deacetylate lysyl residues in histones (the half-life of an acetyl group is ~10 min).
Acetylation removes a positive charge on the lysine ε-amino group and promotes nucleosome melting (and gene expression). Deacetylation tends to decrease or turn off gene expression.
[Figure: MBD bound at mCpG together with histone deacetylase; condensed DNA with repressed transcription. Additional label: lysine demethylation]
Wade, P. A., and A. P. Wolffe, 2001 Nat Struct Biol 8: 575-7.
16
Methylation and Mutation
Cytosine is converted to thymine by methylation followed by spontaneous deamination.
[Diagram: chemical structures of cytosine (NH2 group), its methylated form (H3C group) and thymine]
17
Vertebrate mitochondrion
18
Spontaneous deamination
19
Transversion can erase transitions
Transitions can erase transitions, and transversions can erase transversions. However, a transversion can erase many transitions occurring before it, and subsequent transitions cannot erase the transversion:
AACGCTTGACG
AACGCTTAACG
AACGCTTGACG
AACGCTTCACG
AACGCTTTACG
Although a transition could also erase 2n transversions occurring before it, this is rare because transversions are generally much rarer than transitions:
AACGCTTGACG
AACGCTTTACG
AACGCTTAACG
AACGCTTGACG
Transitions tend to be missed in counting much more frequently than transversions.
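The two sequence series on this slide can be traced programmatically. The sketch below, with hypothetical helper names, finds the variable site, lists the actual changes along each series, and compares them with what would be observed if only the first and last sequences were available.

```python
PURINES = {"A", "G"}

def classify(x, y):
    """Return 's' (transition), 'v' (transversion) or None (no difference)."""
    if x == y:
        return None
    return "s" if (x in PURINES) == (y in PURINES) else "v"

# The two series of successive states shown on the slide.
SERIES_1 = ["AACGCTTGACG", "AACGCTTAACG", "AACGCTTGACG", "AACGCTTCACG", "AACGCTTTACG"]
SERIES_2 = ["AACGCTTGACG", "AACGCTTTACG", "AACGCTTAACG", "AACGCTTGACG"]

def site_history(series):
    """Return (states at the variable site, actual changes, observed first-vs-last difference)."""
    # The variable site is the first column where the first two sequences differ.
    site = next(i for i, (a, b) in enumerate(zip(series[0], series[1])) if a != b)
    states = [seq[site] for seq in series]
    actual = [classify(a, b) for a, b in zip(states, states[1:])]
    observed = classify(states[0], states[-1])
    return states, actual, observed

for series in (SERIES_1, SERIES_2):
    states, actual, observed = site_history(series)
    print("".join(states), actual, "observed:", observed)
```

In the first series three transitions and one transversion are seen as a single transversion; in the second, two transversions and a transition leave no observable difference at all.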
20
Summary
Selection: Transitions are tolerated more than transversions by natural selection because
– they are more likely to be synonymous in protein-coding sequences than transversions
– they are less likely to disrupt RNA secondary structure than transversions.
Mutation: Transitional mutations occur more frequently than transversional mutations because
– misincorporation during DNA replication occurs more frequently between two purines or between two pyrimidines than between a purine and a pyrimidine
– a purine is more likely to mutate chemically to another purine than to a pyrimidine (e.g., through spontaneous deamination). The same holds for pyrimidines.
Bias in counting: Transitions tend to be missed in counting much more frequently than transversions (which necessitates the substitution models).
21
Nucleotide Substitutions
[Figure, from WHL: sequences ACACTCGGATTAGGCT, ATACTCAGGTTAAGCT, ACAATCCGGTTAAGCT and AGACTCGGATTAGGCT, with intermediate states T, C, C. Labels: observed sequences; substitution types single, multiple, coincidental, parallel, convergent, back]
Actual number of changes during the evolution of the two daughter sequences: 12.
Observed number of differences between the two daughter sequences: 3.
Correcting for multiple substitutions aims to estimate the true number of changes, i.e., 12.
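As a rough illustration of what such a correction looks like, the sketch below applies the standard JC69 distance formula, d = -(3/4) ln(1 - 4p/3), to two of the sequences on this slide. Two assumptions are made here: that ATACTCAGGTTAAGCT and ACAATCCGGTTAAGCT are the two daughter sequences (they are the pair that differs at exactly 3 sites), and that JC69 is an acceptable model for the illustration. The function names are hypothetical.

```python
import math

def p_distance(seq1, seq2):
    """Proportion of sites that differ between two aligned sequences."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    differences = sum(a != b for a, b in zip(seq1, seq2))
    return differences / len(seq1)

def jc69_distance(p):
    """JC69-corrected number of substitutions per site for an observed proportion p."""
    if p >= 0.75:
        raise ValueError("JC69 distance is undefined for p >= 0.75")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)

# Assumed to be the two daughter sequences from the slide (they differ at 3 of 16 sites).
DAUGHTER_1 = "ATACTCAGGTTAAGCT"
DAUGHTER_2 = "ACAATCCGGTTAAGCT"

p = p_distance(DAUGHTER_1, DAUGHTER_2)
d = jc69_distance(p)
print(f"p = {p:.4f}, JC69 d = {d:.4f}, estimated changes = {d * len(DAUGHTER_1):.2f}")
```

The corrected value is always at least as large as the raw proportion of differences. With only 16 sites the estimate is very noisy, so it should not be expected to recover the 12 actual changes of this toy example.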
22
Substitution models and phylogenetics
A substitution model describes the evolutionary process so as to correct for multiple hits.
A phylogenetic reconstruction method implicitly or explicitly assumes a substitution model.
A phylogenetic method assuming a wrong substitution model will typically produce wrong trees.
An alignment made with an inappropriate substitution score matrix will typically be inaccurate (e.g., strong transition bias among the sequences but a substitution score matrix without a strong penalty against transversions).
23
General rate matrix (rows: from; columns: to):

     A     G     C     T
A    .     a1    a2    a3
G    a7    .     a4    a5
C    a8    a9    .     a6
T    a10   a11   a12   .

Time-reversible rate matrix:

     A        G        C        T
A    .        a1 π_G   a2 π_C   a3 π_T
G    a1 π_A   .        a4 π_C   a5 π_T
C    a2 π_A   a4 π_G   .        a6 π_T
T    a3 π_A   a5 π_G   a6 π_C   .

The diagonal of a transition probability matrix is subject to the constraint that each row sums up to 1.

JC69: π_i = 0.25; a_i = c
F81/TN84: π_A, π_C, π_G, π_T; a_i = c
K80: π_i = 0.25; a1 = a6 = a7 = a12 = α; a2 = a3 = a4 = a5 = a8 = a9 = a10 = a11 = β
HKY85: π_A, π_C, π_G, π_T; a1 = a6 = a7 = a12 = α; a2 = a3 = a4 = a5 = a8 = a9 = a10 = a11 = β
TN93: π_A, π_C, π_G, π_T; a1 = a7 = α_1; a6 = a12 = α_2; a2 = a3 = a4 = a5 = a8 = a9 = a10 = a11 = β
GTR
Unrestricted: no equilibrium π_i
24
The TN93 model as an example
– frequency parameters
– rate ratio parameters
In addition to the illustrated assumptions, it also assumes that the frequency and rate ratio parameters do not change over time, i.e., the substitution process is stationary.
[Diagram: substitution paths among A, G, C, T]
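A minimal sketch of how the two categories of parameters combine into a TN93 rate matrix is given below. It assumes the usual parameterization in which the rate from x to y is a rate parameter times the frequency of y, with one parameter for A/G transitions, one for C/T transitions and one for transversions. The parameter names (alpha1, alpha2, beta), the numeric values and the crude matrix-exponential approximation are choices made for this illustration, not taken from the slides.

```python
ORDER = "AGCT"

def tn93_rate_matrix(freqs, alpha1, alpha2, beta):
    """Build a 4x4 TN93 rate matrix Q (rows: from, columns: to) in the order A, G, C, T."""
    q = [[0.0] * 4 for _ in range(4)]
    for i, x in enumerate(ORDER):
        for j, y in enumerate(ORDER):
            if i == j:
                continue
            if {x, y} == {"A", "G"}:
                rate = alpha1  # purine transitions
            elif {x, y} == {"C", "T"}:
                rate = alpha2  # pyrimidine transitions
            else:
                rate = beta    # transversions
            q[i][j] = rate * freqs[y]
        q[i][i] = -sum(q[i])   # diagonal makes each row of Q sum to 0
    return q

def matmul(a, b):
    """Multiply two 4x4 matrices given as lists of lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)] for i in range(4)]

def transition_probabilities(q, t, squarings=20):
    """Approximate P(t) = exp(Qt) as (I + Qt/2^k)^(2^k) by repeated squaring."""
    scale = t / 2 ** squarings
    p = [[(1.0 if i == j else 0.0) + q[i][j] * scale for j in range(4)] for i in range(4)]
    for _ in range(squarings):
        p = matmul(p, p)
    return p

# Hypothetical parameter values, for illustration only.
freqs = {"A": 0.3, "G": 0.2, "C": 0.2, "T": 0.3}
Q = tn93_rate_matrix(freqs, alpha1=4.0, alpha2=6.0, beta=1.0)
P = transition_probabilities(Q, t=0.1)
for base, row in zip(ORDER, P):
    print(base, [round(value, 4) for value in row], "row sum:", round(sum(row), 6))
```

Setting alpha1 = alpha2 gives HKY85, adding equal frequencies gives K80, and making all three rates equal as well gives JC69. Q is not rescaled here to one expected substitution per unit time, so t is in arbitrary units.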
25
Substitution Models
There are three types of substitution models in molecular evolution:
– Nucleotide-based
– Amino acid-based
– Codon-based
Substitution models are characterized by two categories of parameters: the frequency parameters and the rate ratio parameters. Different models differ in their assumptions concerning these two categories of parameters.
Substitution models, substitution score matrix and sequence alignment.