Codon models. Synonymous substitution: CGT (R) -> CGC (R). Nonsynonymous substitution: GAC (D) -> GCC (A).
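The distinction on this slide can be sketched in a few lines of Python. This is a minimal illustration, assuming the standard genetic code; the helper name `classify_substitution` is hypothetical and not part of any codon-model software.

```python
from itertools import product

# Standard genetic code, with codons enumerated in TCAG order
# (third position varies fastest). "*" marks stop codons.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_substitution(codon_from, codon_to):
    """Label a codon change as synonymous or nonsynonymous (hypothetical helper)."""
    if codon_from == codon_to:
        return "identical"
    if CODON_TABLE[codon_from] == CODON_TABLE[codon_to]:
        return "synonymous"
    return "nonsynonymous"


# The two examples from the slide.
print(classify_substitution("CGT", "CGC"))  # R -> R
print(classify_substitution("GAC", "GCC"))  # D -> A
```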
Naïve assumption: no selection against synonymous substitutions. [Figure: selection; rate of synonymous substitutions vs. sequence position]
Synonymous purifying selection (conservation): protein folding, splicing regulatory elements, mRNA structure, overlapping genes, codon bias. [Figure: codon alignments of Species 1-3 along sequence position: T A (ACT GCC / ACG GCT / ACA GCA), L T S I (CTT ACA AGC ATC), G R (GGG CGT / GGT CGG / GGA CGA)]
How should we model synonymous selection?
Testing for synonymous selection (likelihood ratio test). H0: free from synonymous selection -> constant Ks. H1: under synonymous selection -> variable Ks.
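As a rough sketch of the likelihood ratio test named on this slide: the statistic is twice the difference in maximized log-likelihoods between H1 and H0. The code below assumes, purely for illustration, that H1 adds one free parameter and that the statistic is compared against a chi-square distribution with 1 degree of freedom; the slides do not state the degrees of freedom, and the log-likelihood values are made up.

```python
import math


def lrt_pvalue(lnl_null, lnl_alt):
    """Likelihood ratio test of H1 (variable Ks) against H0 (constant Ks).

    Assumption (not from the slides): one extra free parameter under H1,
    so the statistic is compared to a chi-square with 1 degree of freedom.
    """
    statistic = max(0.0, 2.0 * (lnl_alt - lnl_null))
    # Survival function of a chi-square with 1 d.f.: P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(statistic / 2.0))
    return statistic, p_value


# Hypothetical maximized log-likelihoods for one gene.
statistic, p_value = lrt_pvalue(lnl_null=-1234.5, lnl_alt=-1230.0)
print(statistic, p_value)
```

With thousands of genes tested, the per-gene p-values would normally also need a multiple-testing correction before calling a gene significant.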
Research objective: quantify and characterize the magnitude and role of synonymous purifying selection
Comparative sequence data: S. cerevisiae, S. paradoxus, S. mikatae, S. bayanus, S. castellii. > 20 million years; 70%-90% coding DNA sequence identity.
Comparative sequence data: 5,135 datasets of multiple sequence alignments + phylogenies (5,182 of ~6,000 S. cerevisiae genes). Obtained from Wapinski et al., Nature 2007. [Figure: example sequence alignment]
Under synonymous selection: 54.4% (2,794). Not under synonymous selection: 45.6% (2,341).
Not under synonymous selection: 45.6% (2,341). Under synonymous selection, split by significance: 42% (2,154) and 12.4% (640).
Synonymous selection underlies codon bias. Different organisms prefer specific codons over others that encode the same amino acid. Arginine (R) codon usage in S. cerevisiae: AGA 48%, AGG 21%, CGA 7%, CGC 6%, CGG 4%, CGU 14%.
Codon bias maintains translational efficiency: translation speed, translation accuracy
Codon adaptation index (CAI) quantifies codon bias. Sharp and Li, Nucleic Acids Res, 1987.
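A toy sketch of how CAI works: each codon gets a relative adaptiveness (its usage divided by the usage of the most frequent synonymous codon), and the CAI of a gene is the geometric mean of these weights over its codons. The example below is restricted to arginine and reuses the S. cerevisiae usage percentages from the earlier slide as weights; this is an assumption made for illustration, since a real CAI calculation derives weights for all amino acids from a reference set of highly expressed genes. The function names are hypothetical.

```python
import math

# Arginine codon usage (%) in S. cerevisiae, taken from the codon-bias slide.
ARG_USAGE = {"AGA": 48, "AGG": 21, "CGA": 7, "CGC": 6, "CGG": 4, "CGU": 14}


def relative_adaptiveness(usage):
    """Weight of each codon relative to the most used synonymous codon."""
    best = max(usage.values())
    return {codon: count / best for codon, count in usage.items()}


def cai(codons, weights):
    """Geometric mean of the codon weights (computed in log space)."""
    logs = [math.log(weights[codon]) for codon in codons]
    return math.exp(sum(logs) / len(logs))


weights = relative_adaptiveness(ARG_USAGE)
print(cai(["AGA", "AGA", "AGA"], weights))  # only the preferred codon
print(cai(["CGG", "CGC", "CGA"], weights))  # only rare codons
```

A gene that uses only the preferred codon scores 1; heavier use of rare codons pushes the score toward 0.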
Genes under synonymous selection are codon biased
Codons under synonymous selection are biased
GAT CAA AAT TTT GCT TCA TCT GGT
GAT CAA AAT TTT GCG TCG TCC GGA
GAT CAA AAT TTT GCA TCT TCC GGC
GAT CAA ACT TTT GCG TCC TCA GGC
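To make the alignment on this slide easier to read, the sketch below labels each codon column as fully conserved, varying only synonymously, or varying nonsynonymously. It assumes the standard genetic code and is only a descriptive tally; it is not the statistical model used to infer synonymous selection. The function name `column_summary` is hypothetical.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# The four aligned sequences from the slide.
ALIGNMENT = [
    "GAT CAA AAT TTT GCT TCA TCT GGT",
    "GAT CAA AAT TTT GCG TCG TCC GGA",
    "GAT CAA AAT TTT GCA TCT TCC GGC",
    "GAT CAA ACT TTT GCG TCC TCA GGC",
]


def column_summary(alignment):
    """Classify every codon column of a codon-spaced alignment."""
    rows = [row.split() for row in alignment]
    summary = []
    for column in zip(*rows):
        codons = set(column)
        residues = {CODON_TABLE[codon] for codon in codons}
        if len(codons) == 1:
            summary.append("conserved")
        elif len(residues) == 1:
            summary.append("synonymous variation")
        else:
            summary.append("nonsynonymous variation")
    return summary


for position, label in enumerate(column_summary(ALIGNMENT), start=1):
    print(position, label)
```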
Synonymous selection underlies codon bias
Codon bias (synonymous selection) derives from protein structure: translation speed, translation accuracy
Codon bias at the protein 3D structure: S. cerevisiae mitochondrial NADP(+)-dependent isocitrate dehydrogenase (PDB: 2QFY)
S. cerevisiae mitochondrial NADP(+)-dependent isocitrate dehydrogenase (PDB: 2QFY): codon bias core > codon bias surface
S. cerevisiae mitochondrial NADP(+)-dependent isocitrate dehydrogenase (PDB: 2QFY): codon bias interface > codon bias surface
MDR1 is a member of the ABC transporter family. These transporters pump drugs out of the cell using ATP, which changes the conformation of the protein. They have been shown to confer multidrug resistance in various cancers.
C3435T is a synonymous SNP that was reported to be a risk factor for several diseases, such as Parkinson's disease, colon cancer, and renal epithelial tumor. Possible explanations:
1. Change in mRNA level
2. Change in splicing
3. Linkage disequilibrium with other causative SNPs
4. Something else
FACS analysis. Purple: cells transfected with an empty vector. All other colors: cells transfected with a vector containing MDR1 (various haplotypes). MDR1 pumps the drug (Bodipy) out of the cells.
All other colors: cells transfected with a vector containing MDR1 (various haplotypes). The inhibitor works differently on the various haplotypes.
Trypsin works differently on the various haplotypes
They showed that synonymous substitutions did not change protein levels but rather the protein structure, as shown by a differential response to specific antibodies. This is important for linking SNPs to diseases.
Conservation of Ks in pol. Mayrose et al., Bioinformatics/ISMB (2007).
Conservation of Ks in pol (zoom in). [Figure labels: DNA flap, cPPT, CTS, ?]
cPPT = Central Polypurine Tract. This region serves as a primer for the reverse transcriptase in the synthesis of the plus-strand DNA.
CTS = Central Termination Sequence. The CTS is involved in the nuclear import of the HIV-1 genome.
????: in pol, one region is of unknown function.
Kudla et al. showed that levels of GFP, a protein whose gene can easily be inserted into a host genome and whose levels can then be easily quantified, are strongly affected by the secondary structure of the 5' end of the mRNA.
[Figure: stable mRNA vs. non-stable mRNA] Non-stable mRNA secondary structure at the 5' end -> higher GFP level.
Mechanism: stable secondary structures at the 5' end of the mRNA obstruct ribosome binding to the mRNA and result in lower protein levels.
Based on that, we hypothesized that the 5' end of the mRNA should show signals of strong synonymous selection. This is exactly what we found in our yeast data … In addition, we found that codon bias is reduced in this region, so as to allow non-stable mRNA structures.