Towards realistic codon models: among-site variability and dependency of synonymous and nonsynonymous rates. Itay Mayrose, Adi Doron-Faigenboim, Eran Bacharach & Tal Pupko. Travel expenses supported by the Biosapiens project.
Models of sequence evolution. Describe how characters (nucleotides, amino acids, codons) change during evolution. Applications: alignment; phylogeny; inference of selection forces.
Codon Models. Combine information from both DNA and protein levels. Substitution matrix over codons (rows and columns AAA, AAC, …, CCC): each entry is the probability of changing from codon i to codon j (one sample entry, 0.09, survives in row AAA). The 64 codons:
AAA AAC ACA ACC CAA CAC CCA CCC
AAG AAU ACG ACU CAG CAU CCG CCU
AGA AGC AUA AUC CGA CGC CUA CUC
AGG AGU AUG AUU CGG CGU CUG CUU
GAA GAC GCA GCC UAA UAC UCA UCC
GAG GAU GCG GCU UAG UAU UCG UCU
GGA GGC GUA GUC UGA UGC UUA UUC
GGG GGU GUG GUU UGG UGU UUG UUU
Substitutions are either synonymous (silent) or non-synonymous (amino-acid altering). Selection regimes: purifying evolution, neutral evolution, positive Darwinian evolution.
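One common way to write such a codon model, in the spirit of Goldman & Yang (1994) and Nielsen & Yang (1998), is through an instantaneous rate matrix Q. The parameterization below is a sketch and not necessarily the one used in this talk; κ (transition/transversion ratio), ω (the Ka/Ks ratio) and π_j (equilibrium frequency of codon j) are symbols introduced here for illustration.

```latex
q_{ij} =
\begin{cases}
0 & \text{if } i \text{ and } j \text{ differ at more than one position} \\
\pi_j & \text{synonymous transversion} \\
\kappa \pi_j & \text{synonymous transition} \\
\omega \pi_j & \text{non-synonymous transversion} \\
\omega \kappa \pi_j & \text{non-synonymous transition}
\end{cases}
\qquad
P(t) = e^{Qt}
```

Under this sketch, the entry P_ij(t) is the probability of changing from codon i to codon j along a branch of length t.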
Detecting selection pressure. Purifying selection: non-synonymous << synonymous substitutions (example: histones). S1: AAG ACT GCC GGG CGT ATT (K T A G R I). S2: AAA ACA GCA GGA CGA ATC (K T A G R I). Synonymous = 6, non-synonymous = 0.
Detecting selection pressure. Neutral evolution: non-synonymous = synonymous substitutions. S1: AAG ACT GCC GGG CGT ATT (K T A G R I). S2: AAA ACA GAC GGA CAT ATG (K T D G H M). Synonymous = 3, non-synonymous = 3.
Detecting selection pressure. Positive (Darwinian) selection: non-synonymous >> synonymous substitutions (example: host-pathogen arms race). S1: AAG ACT GCC GGG CGT ATT (K T A G R I). S2: AAT ATT GAC GAG CAT ATG (N I D E H M). Synonymous = 0, non-synonymous = 6.
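The three toy comparisons above can be reproduced by translating each codon pair and checking whether the amino acid changes. The following is a minimal sketch, assuming the standard genetic code; the function and variable names are illustrative, and counting observed codon differences this way is only a crude stand-in for a model-based Ka/Ks estimate.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order ('*' marks a stop codon).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def count_substitutions(seq1, seq2):
    """Count (synonymous, non-synonymous) codon differences between two
    aligned, space-separated codon sequences."""
    syn = nonsyn = 0
    for c1, c2 in zip(seq1.split(), seq2.split()):
        if c1 == c2:
            continue  # identical codons carry no substitution
        if CODE[c1] == CODE[c2]:
            syn += 1      # codon changed, amino acid kept
        else:
            nonsyn += 1   # codon changed, amino acid changed
    return syn, nonsyn


s1 = "AAG ACT GCC GGG CGT ATT"
print(count_substitutions(s1, "AAA ACA GCA GGA CGA ATC"))  # (6, 0) purifying
print(count_substitutions(s1, "AAA ACA GAC GGA CAT ATG"))  # (3, 3) neutral
print(count_substitutions(s1, "AAT ATT GAC GAG CAT ATG"))  # (0, 6) positive
```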
The Ka/Ks ratio. Ka: non-synonymous substitution rate. Ks: synonymous substitution rate. Assume: Ks = neutral rate of evolution. Purifying selection: Ka/Ks < 1. Neutral evolution: Ka/Ks = 1. Positive selection: Ka/Ks > 1.
Existing codon models (model name: KaV-KsC). Assume: Ka varies over sites; Ks is the same for all sites and reflects the neutral rate of evolution. Goldman & Yang (1994); Muse & Gaut (1994); Nielsen & Yang (1998); Wong, Sainudiin & Nielsen (2006); Doron-Faigenboim & Pupko (2007). Is Ks constant?
Existing codon models. Assume: Ka varies over sites; Ks is the same for all sites and reflects the neutral rate of evolution. Is Ks constant? Hellmann et al. (2003): approximately 39% of synonymous sites in primates are subject to purifying selection.
Selection against silent substitutions. Possible causes: RNA stability; exonic splicing regulatory sequences; RNA editing; overlapping genes; codon bias and GC content; translational efficiency; protein folding. Example alignment: Human GAG GCT GCC GGG CGT ATT; Mouse GGC ACT GCC GGG CGT ATT; Dog GGG ACT GCC GGG CGT ATT. Reviewed in Chamary, Parmley, and Hurst, Nature Reviews Genetics (2006).
Evolutionary models for Ks conservation. Model name: KaV-KsV. Pond & Muse: both Ka and Ks can vary (two independent gamma distributions). Pond and Muse, Mol Biol Evol (2005), “Site-to-site variation of synonymous substitution rates”.
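To make the "two independent gamma distributions" idea concrete, the sketch below draws a Ka and a Ks value for every site from separate gamma distributions. It is an illustration only: the shape parameters, the mean-1 scaling (scale = 1/shape) and all names are assumptions made here, not details taken from Pond & Muse or from this talk.

```python
import random


def draw_site_rates(n_sites, alpha_ka, alpha_ks, seed=0):
    """Draw per-site Ka and Ks values from two independent gamma
    distributions, each scaled to have mean 1 (scale = 1 / shape)."""
    rng = random.Random(seed)
    ka = [rng.gammavariate(alpha_ka, 1.0 / alpha_ka) for _ in range(n_sites)]
    ks = [rng.gammavariate(alpha_ks, 1.0 / alpha_ks) for _ in range(n_sites)]
    return ka, ks


# Hypothetical shapes: a small shape means strong rate variation among sites.
ka, ks = draw_site_rates(n_sites=10, alpha_ka=0.5, alpha_ks=2.0)
for site, (a, s) in enumerate(zip(ka, ks), start=1):
    print(f"site {site}: Ka={a:.2f} Ks={s:.2f}")
```

Because the two draws are independent at every site, neighbouring sites share no information in this sketch.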
Evolutionary models for Ks conservation. The KaV-KsV model assumes that each position evolves independently. But: selection is often regional, and site-specific Ka and Ks estimates are very erratic. [Table: true vs. estimated Ka, Ks and Ka/Ks; only the value 1.0 in the "True" row survives.] Our solution: incorporate site dependencies.
Modeling dependencies among sites (model name: KaD-KsD). Ka at position n depends on Ka at position n-1, and Ks at position n depends on Ks at position n-1. Two HMM chains: the hidden states are Ka and Ks; the observations are the aligned codons (GGG GGG GAA CTT CTA CTG TCA TCC TAC GCC GCG GCC ATC ATC ATC).
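The dependency between neighbouring sites can be pictured as a Markov chain over rate categories, one chain for Ka and one for Ks. The sketch below only simulates the hidden layer; it does not compute likelihoods. The rate categories, the "stay" probability rho and all function names are hypothetical choices made for illustration, not the parameterization used in the talk.

```python
import random


def transition_matrix(n_states, rho):
    """With probability rho keep the current rate category; otherwise
    pick a category uniformly at random (possibly the same one)."""
    off = (1.0 - rho) / n_states
    return [[off + (rho if i == j else 0.0) for j in range(n_states)]
            for i in range(n_states)]


def simulate_chain(rates, rho, n_sites, rng):
    """Simulate a path of rates along the sequence, where the category at
    site n depends only on the category at site n-1."""
    matrix = transition_matrix(len(rates), rho)
    state = rng.randrange(len(rates))
    path = [rates[state]]
    for _ in range(n_sites - 1):
        state = rng.choices(range(len(rates)), weights=matrix[state])[0]
        path.append(rates[state])
    return path


rng = random.Random(1)
ka_path = simulate_chain([0.1, 1.0, 3.0], rho=0.9, n_sites=15, rng=rng)  # Ka chain
ks_path = simulate_chain([0.2, 1.0, 2.0], rho=0.9, n_sites=15, rng=rng)  # Ks chain
print(ka_path)
print(ks_path)
```

With rho close to 1 the simulated rates form long runs, which mirrors the observation that selection is often regional.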
Comparing the models. Models tested: KaV-KsC (variable nonsynonymous, constant synonymous); KaV-KsV (variable nonsynonymous, variable synonymous); KaD-KsD (dependent nonsynonymous, dependent synonymous).
Comparing the models. For each of the 9 coding genes of HIV-1: multiple sequence alignment; phylogenetic tree; parameter optimization; model comparison (LRT).
HIV-1 data. [Table: log-likelihood difference from KaV-KsC for KaV-KsV and KaD-KsD, per HIV-1 gene (env, gag, nef, pol, rev, tat, vif, vpr, vpu); values not preserved.] A log-likelihood difference of 5 is significant (p < 0.01). HIV-1 genes exhibit a strong pattern of rate dependency. Accounting for Ks variability is strongly justified for all HIV-1 genes.
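The significance threshold quoted above can be checked with a likelihood ratio test: twice the log-likelihood difference is compared with a chi-square distribution. The sketch below assumes nested models and the usual chi-square approximation; the slides do not state the degrees of freedom, so both df = 1 and df = 2 are shown, and the function name is illustrative.

```python
import math


def lrt_pvalue(delta_lnl, df):
    """P-value of a likelihood ratio test with statistic 2 * delta_lnl,
    using closed-form chi-square tail probabilities for df = 1 or 2."""
    stat = 2.0 * delta_lnl
    if df == 1:
        # Chi-square with 1 df is the square of a standard normal.
        return math.erfc(math.sqrt(stat / 2.0))
    if df == 2:
        # Chi-square with 2 df is exponential with mean 2.
        return math.exp(-stat / 2.0)
    raise ValueError("closed form implemented only for df = 1 or 2")


for df in (1, 2):
    print(df, lrt_pvalue(5.0, df) < 0.01)  # True for both df values
```

Under these assumptions, a log-likelihood difference of 5 gives p < 0.01 with either one or two extra parameters.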
Inferring sites under positive selection. [Figure residue: KaV-KsC 491; KaV-KsV 295; KaD-KsD (remaining values not preserved).] KaD-KsD is: 1. the most conservative; 2. the model with the highest overlap with the other models.
Inferring sites under positive selection. [Figure: true positive rate vs. false positive rate for KaD-KsD, KaV-KsV and KaV-KsC.]
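A curve of true positive rate against false positive rate is built from points like the one computed below, one per decision threshold. This is a generic sketch: the per-site scores and the known truth labels are made-up illustrative values, and the slides do not say how the curves in the figure were actually produced.

```python
def roc_point(scores, truth, threshold):
    """Return (false positive rate, true positive rate) when every site
    with score >= threshold is called positively selected."""
    tp = sum(1 for s, t in zip(scores, truth) if s >= threshold and t)
    fp = sum(1 for s, t in zip(scores, truth) if s >= threshold and not t)
    positives = sum(1 for t in truth if t)
    negatives = len(truth) - positives
    return fp / negatives, tp / positives


# Hypothetical per-site scores and the (simulated) truth for six sites.
scores = [0.99, 0.95, 0.80, 0.60, 0.30, 0.10]
truth = [True, True, False, True, False, False]

for threshold in (0.9, 0.5):
    fpr, tpr = roc_point(scores, truth, threshold)
    print(f"threshold={threshold}: FPR={fpr:.2f} TPR={tpr:.2f}")
```

Lowering the threshold moves a method up and to the right along its curve; a more conservative method sits further to the left at a given threshold.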
Identifying cis regulatory elements. 21 stretches in HIV-1 are under significant Ks selection; 17 matched to known functional regions.
Region       Function
Pol          DNA flap + cPPT + CTS
Pol          Overlap Vif
Vif          Overlap Vpr
Nef 88-99    3’ PPT
Tat 41-51    Overlap Rev
Env          Overlap Tat & Rev
Pol 7-31     ?
Vif 1-21     Overlap pol
…
Conservation of Ks in pol
Conservation of Ks in pol (zoom in). [Figure labels: DNA flap, cPPT, CTS, ?]
pol-vif overlap. vif and pol overlap, but in different reading frames. These regions exhibit a substantial reduction of Ks.
pol-vif overlap. Site 12 of vif has very high Ks. Why? Site 999 in pol is under strong positive selection (Ka/Ks = 11.4).
Selection at overlapping regions. 21 stretches in HIV-1 are under significant Ks selection.
Region       Function
Pol          DNA flap + cPPT + CTS
Pol          Overlap Vif
Vif          Overlap Vpr
Nef 88-99    3’ PPT
Tat 41-51    Overlap Rev
Env          Overlap Tat & Rev
Pol 7-31     ?
Vif 1-21     Overlap Pol
…
Selection at overlapping regions. Overlapped regions exhibit significant Ks conservation (p-value < 10^-6). But: significant Ka variability.
Next… Analyze specific Ks stretches in detail. Study Ks selection in other viruses. Examine the extent of Ks selection across different lineages. What is the meaning of the Ka/Ks > 1 criterion? How should positive selection be defined? Thank you.