
Towards realistic codon models: among site variability and dependency of synonymous and nonsynonymous rates Itay Mayrose Adi Doron-Faigenboim Eran Bacharach.


1 Towards realistic codon models: among site variability and dependency of synonymous and nonsynonymous rates
Itay Mayrose, Adi Doron-Faigenboim, Eran Bacharach & Tal Pupko
Travel expenses supported by the Biosapiens project

2 Models of sequence evolution
Describe how characters (nucleotides, amino acids, codons) change over evolutionary time. Used in:
- Alignment
- Phylogeny
- Inference of selection forces

4 Codon models
Combine information from both the DNA and the protein levels.
The model gives the probability of changing from codon i to codon j.
[Figure: a codon-by-codon matrix (rows and columns AAA, AAC, …, CCC; e.g., an entry of 0.09 in row AAA) and a grid of the 64 codons, with changes marked as synonymous (silent) or nonsynonymous (amino-acid altering).]
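The slide mentions "the probability of changing from codon i to codon j". In continuous-time codon models such as Goldman & Yang (1994), these probabilities are obtained from an instantaneous rate matrix Q as P(t) = exp(Qt). A minimal sketch of a simplified GY94-style Q follows, assuming the standard genetic code, equal codon frequencies by default, a transition/transversion ratio `kappa` and a single nonsynonymous/synonymous ratio `omega` (the Ka/Ks of the later slides). The names `codon_rate_matrix`, `kappa` and `omega` are illustrative; Q is not normalized to a mean rate of 1 and the matrix exponential is not computed.

```python
from itertools import product

# Standard genetic code: codons enumerated in TCAG order, '*' marks stop codons.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}
SENSE = [c for c, aa in CODE.items() if aa != "*"]  # the 61 sense codons
PURINES = {"A", "G"}


def is_transition(x, y):
    """True for A<->G or C<->T changes (assumes x != y)."""
    return (x in PURINES) == (y in PURINES)


def codon_rate_matrix(kappa, omega, freqs=None):
    """Sparse GY94-style rate matrix: Q[i][j] is nonzero only for single-nucleotide changes."""
    if freqs is None:
        freqs = {c: 1.0 / len(SENSE) for c in SENSE}  # assumption: equal codon frequencies
    Q = {}
    for i in SENSE:
        row = {}
        for j in SENSE:
            diffs = [(a, b) for a, b in zip(i, j) if a != b]
            if len(diffs) != 1:
                continue  # identical codons or multi-nucleotide changes: rate 0
            rate = freqs[j]
            if is_transition(*diffs[0]):
                rate *= kappa
            if CODE[i] != CODE[j]:
                rate *= omega  # nonsynonymous changes are scaled by Ka/Ks
            row[j] = rate
        row[i] = -sum(row.values())  # each row of a rate matrix sums to zero
        Q[i] = row
    return Q


Q = codon_rate_matrix(kappa=2.0, omega=0.5)
print(Q["AAA"]["AAG"], Q["AAA"]["AAC"])  # synonymous transition vs nonsynonymous transversion
```

Only single-nucleotide changes get a nonzero rate, and every nonsynonymous change is multiplied by `omega`, so `omega` < 1 slows amino-acid-altering changes relative to silent ones.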

5 Codon models
Combine information from both the DNA and the protein levels.
[Figure: grid of the 64 codons, with changes marked as synonymous (silent) or nonsynonymous (amino-acid altering).]
- Purifying evolution
- Neutral evolution
- Positive Darwinian evolution

6 Detecting selection pressure
Purifying selection: nonsynonymous << synonymous substitutions (e.g., histones)
S1: AAG ACT GCC GGG CGT ATT  (K T A G R I)
S2: AAA ACA GCA GGA CGA ATC  (K T A G R I)
Synonymous = 6, nonsynonymous = 0

7 Detecting selection pressure
Neutral evolution: nonsynonymous = synonymous substitutions
S1: AAG ACT GCC GGG CGT ATT  (K T A G R I)
S2: AAA ACA GAC GGA CAT ATG  (K T D G H M)
Synonymous = 3, nonsynonymous = 3

8 Detecting selection pressure
Positive (Darwinian) selection: nonsynonymous >> synonymous substitutions (e.g., host-pathogen arms race)
S1: AAG ACT GCC GGG CGT ATT  (K T A G R I)
S2: AAT ATT GAC GAG CAT ATG  (N I D E H M)
Synonymous = 0, nonsynonymous = 6

9 The Ka/Ks ratio
Ka = nonsynonymous substitution rate; Ks = synonymous substitution rate.
Assumption: Ks equals the neutral rate of evolution.
- Purifying selection: Ka/Ks < 1
- Neutral evolution: Ka/Ks = 1
- Positive selection: Ka/Ks > 1
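The counts on slides 6-8 and the rules above can be reproduced with a toy counter. This is a sketch under loud simplifications: it counts whole codons that differ (codons differing at several positions are not split), and it reads the ratio of raw counts, whereas real Ka and Ks are rates per nonsynonymous and per synonymous site, corrected for multiple substitutions and usually estimated with a codon model. The function names are hypothetical.

```python
from itertools import product

# Standard genetic code (TCAG ordering, '*' = stop).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def count_differences(s1, s2):
    """Count differing codons as synonymous or nonsynonymous (toy count, not Ka/Ks)."""
    syn = nonsyn = 0
    for c1, c2 in zip(s1.split(), s2.split()):
        if c1 == c2:
            continue
        if CODE[c1] == CODE[c2]:
            syn += 1
        else:
            nonsyn += 1
    return syn, nonsyn


def classify(syn, nonsyn):
    """Toy reading of the ratio of raw counts, mirroring the Ka/Ks rules on the slide."""
    if syn == 0:
        return "positive" if nonsyn > 0 else "no change"
    ratio = nonsyn / syn
    if ratio < 1:
        return "purifying"
    if ratio == 1:
        return "neutral"
    return "positive"


s1 = "AAG ACT GCC GGG CGT ATT"
for s2 in ("AAA ACA GCA GGA CGA ATC", "AAA ACA GAC GGA CAT ATG", "AAT ATT GAC GAG CAT ATG"):
    syn, nonsyn = count_differences(s1, s2)
    print(syn, nonsyn, classify(syn, nonsyn))
```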

11 Existing codon models
Assume:
- Ka varies over sites
- Ks is the same for all sites and reflects the neutral rate of evolution
Goldman & Yang (1994); Muse & Gaut (1994); Nielsen & Yang (1998); Wong, Sainudiin & Nielsen (2006); Doron-Faigenboim & Pupko (2007)
Model name: KaV-KsC

12 Existing codon models
Ks constant?

13 Existing codon models
Hellmann et al. (2003): approximately 39% of synonymous sites in primates are subject to purifying selection.
Ks constant?

14 Selection against silent substitutions
- RNA stability
- Exonic splicing regulatory sequences
- RNA editing
- Overlapping genes
- Codon bias and GC content
- Translational efficiency
- Protein folding
Human  GAG GCT GCC GGG CGT ATT
Mouse  GGC ACT GCC GGG CGT ATT
Dog    GGG ACT GCC GGG CGT ATT
Reviewed in Chamary, Parmley and Hurst, Nature Reviews Genetics (2006).

15 Evolutionary models for Ks conservation
Model name: KaV-KsV
Pond & Muse: both Ka and Ks can vary (two independent gamma distributions).
Pond and Muse, Mol Biol Evol (2005), “Site-to-site variation of synonymous substitution rates”
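The KaV-KsV assumption on this slide, per-site Ka and Ks drawn from two independent gamma distributions, can be illustrated by simulation. This sketch draws mean-1 rate multipliers with Python's `random.gammavariate`; the shape values are arbitrary assumptions, and likelihood implementations typically integrate over a few discrete rate categories rather than sampling, so the code only shows the generative assumption. Under KaV-KsC, Ks would instead be fixed at 1 for every site. `sample_site_rates` is a hypothetical helper.

```python
import random
import statistics


def sample_site_rates(n_sites, alpha_ka, alpha_ks, seed=0):
    """Draw independent per-site (Ka, Ks) multipliers, each gamma-distributed with mean 1."""
    rng = random.Random(seed)
    # gammavariate(alpha, beta) has mean alpha * beta, so beta = 1 / alpha gives mean 1
    # and variance 1 / alpha (a smaller shape means more variation among sites).
    return [
        (rng.gammavariate(alpha_ka, 1.0 / alpha_ka), rng.gammavariate(alpha_ks, 1.0 / alpha_ks))
        for _ in range(n_sites)
    ]


rates = sample_site_rates(100_000, alpha_ka=0.5, alpha_ks=2.0)
ka = [r[0] for r in rates]
ks = [r[1] for r in rates]
print(statistics.fmean(ka), statistics.fmean(ks))  # both should be close to 1
```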

17 Evolutionary models for Ks conservation
The KaV-KsV model assumes that each position evolves independently. But:
- Selection is often regional
- Site-specific estimates of Ka and Ks are very erratic, for example:
  True:      Ka = 1.0, Ks = 1.0, Ka/Ks = 1.0
  Estimated: Ka = 1.2, Ks = 0.8, Ka/Ks = 1.5

18 Evolutionary models for Ks conservation
Our solution: incorporate dependencies among sites.

20 Modeling dependencies among sites
Ka at position n depends on Ka at position n-1, and Ks at position n depends on Ks at position n-1: two hidden Markov model (HMM) chains, one for Ka and one for Ks.
[Figure: hidden states (Ka and Ks values per codon position, e.g., 0.7, 1.0, 0.8, 1.3) above the observations (codons of an alignment, e.g., GGG, GAA, CTT, ATC).]
Model name: KaD-KsD
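In a two-chain model like KaD-KsD, the likelihood of a gene sums over all paths of hidden rate categories, which the standard HMM forward algorithm does in time linear in the number of positions. The sketch below assumes the per-column likelihoods for every (Ka category, Ks category) pair are already computed (for example by pruning on the phylogenetic tree, not shown) and that the two chains move independently, so the joint transition probability is the product of the two chains' transitions. Category counts, transition matrices and likelihood values are made up for illustration; this is not the authors' implementation.

```python
import math


def forward_loglik(emis, pi_a, pi_s, T_a, T_s):
    """Scaled forward algorithm over joint hidden states (Ka category a, Ks category s).

    emis[n][a][s]: likelihood of alignment column n given Ka category a and Ks category s
    pi_a, pi_s:    initial (assumed stationary) distributions of the two chains
    T_a, T_s:      transition matrices; T_a[a1][a2] = P(category a2 at n | a1 at n-1)
    """
    na, ns = len(pi_a), len(pi_s)
    loglik = 0.0
    alpha = None
    for n, e in enumerate(emis):
        if n == 0:
            alpha = [[pi_a[a] * pi_s[s] * e[a][s] for s in range(ns)] for a in range(na)]
        else:
            prev = alpha
            # The chains move independently, so the joint transition is a product.
            alpha = [
                [
                    e[a2][s2] * sum(prev[a1][s1] * T_a[a1][a2] * T_s[s1][s2]
                                    for a1 in range(na) for s1 in range(ns))
                    for s2 in range(ns)
                ]
                for a2 in range(na)
            ]
        scale = sum(map(sum, alpha))  # rescale each step to avoid underflow on long genes
        loglik += math.log(scale)
        alpha = [[x / scale for x in row] for row in alpha]
    return loglik


pi_a, pi_s = [0.5, 0.5], [0.6, 0.4]
emis = [  # hypothetical per-column likelihoods, indexed [column][Ka category][Ks category]
    [[0.9, 0.1], [0.2, 0.05]],
    [[0.8, 0.3], [0.1, 0.4]],
    [[0.7, 0.2], [0.3, 0.6]],
]
sticky_a = [[0.9, 0.1], [0.1, 0.9]]  # neighboring sites tend to share a Ka category
sticky_s = [[0.8, 0.2], [0.3, 0.7]]  # stationary distribution is pi_s = [0.6, 0.4]
print(forward_loglik(emis, pi_a, pi_s, sticky_a, sticky_s))
```

Setting every row of a transition matrix to its stationary distribution removes the dependency between neighboring sites; the forward likelihood then reduces to an independent mixture over categories at each site, which is the KaV-KsV-style case.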

21 Comparing the models
Models tested:
- KaV-KsC: variable nonsynonymous, constant synonymous rates
- KaV-KsV: variable nonsynonymous, variable synonymous rates
- KaD-KsD: dependent nonsynonymous, dependent synonymous rates

22 Comparing the models
For each of the 9 coding genes of HIV-1:
multiple sequence alignment + phylogenetic tree → parameter optimization → model comparison (likelihood ratio test, LRT)

23 HIV-1 data
Log-likelihood difference from KaV-KsC:

HIV-1 gene   KaV-KsV   KaD-KsD
env              914      1080
gag              362       409
nef              339       380
pol             1346      1565
rev              228       248
tat              214       228
vif              239       279
vpr              130       154
vpu              188       197

A difference of 5 log-likelihood units is significant (p < 0.01).
- HIV-1 genes exhibit a strong pattern of rate dependency.
- Accounting for Ks variability is strongly justified for all HIV-1 genes.
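The significance threshold can be checked with a likelihood ratio test: the statistic is twice the log-likelihood difference, compared with a chi-square distribution whose degrees of freedom equal the number of extra parameters in the richer model. The slides do not give those degrees of freedom, so this standard-library-only sketch evaluates a difference of 5 log-likelihood units (statistic 10) for several df values; for 1 or 2 extra parameters the p-value is below 0.01, and more extra parameters give a larger p-value for the same difference. `chi2_sf` and `lrt_pvalue` are hypothetical helper names.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(X >= x) for a chi-square variable with df degrees of freedom."""
    y = x / 2.0
    if df % 2 == 0:
        # Even df: closed-form sum exp(-y) * sum_{k < df/2} y^k / k!
        term = total = 1.0
        for k in range(1, df // 2):
            term *= y / k
            total += term
        return math.exp(-y) * total
    # Odd df: start from df = 1 (erfc) and add the incomplete-gamma recurrence terms.
    total = math.erfc(math.sqrt(y))
    for k in range(1, (df - 1) // 2 + 1):
        total += math.exp(-y) * y ** (k - 0.5) / math.gamma(k + 0.5)
    return total


def lrt_pvalue(loglik_diff, df):
    """LRT p-value: the test statistic is twice the log-likelihood difference."""
    return chi2_sf(2.0 * loglik_diff, df)


for df in (1, 2, 3):
    print(df, lrt_pvalue(5.0, df))
```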

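24 Inferring sites under positive selection
Sites inferred by each model: KaV-KsC 491, KaV-KsV 295, KaD-KsD 206.
[Venn diagram regions: KaV-KsC only 310; KaV-KsV only 66; KaD-KsD only 13; KaV-KsC and KaV-KsV only 41; KaV-KsC and KaD-KsD only 5; KaV-KsV and KaD-KsD only 53; all three models 135.]
KaD-KsD is:
1. The most conservative
2. The model with the highest overlap with the other models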

25 Inferring sites under positive selection
[Figure: ROC curves (true positive rate vs. false positive rate) for KaD-KsD, KaV-KsV and KaV-KsC.]
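A plot of true positive rate against false positive rate is an ROC curve: sites are ranked by a score, a threshold is lowered step by step, and each threshold yields one (FPR, TPR) point. This requires knowing which sites are truly under positive selection, which in practice usually means simulated data; the slides do not state how the curves were produced. In the sketch, the score (for example a posterior probability that Ka/Ks > 1) and the labels are assumptions, and the function names are hypothetical.

```python
def roc_points(scores, labels):
    """ROC curve as (false positive rate, true positive rate) pairs, threshold lowered step by step.

    scores: per-site evidence for positive selection (higher means more confident)
    labels: True for sites truly under positive selection (known, e.g., in simulated data)
    """
    n_pos = sum(1 for y in labels if y)
    n_neg = len(labels) - n_pos
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    tp = fp = 0
    points = [(0.0, 0.0)]
    i = 0
    while i < len(order):
        threshold = scores[order[i]]
        # Sites with tied scores cross the threshold together.
        while i < len(order) and scores[order[i]] == threshold:
            if labels[order[i]]:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / n_neg, tp / n_pos))
    return points


def auc(points):
    """Area under the ROC curve by the trapezoid rule."""
    return sum((x2 - x1) * (y1 + y2) / 2 for (x1, y1), (x2, y2) in zip(points, points[1:]))


pts = roc_points([0.9, 0.8, 0.7, 0.6], [True, False, True, False])
print(pts, auc(pts))
```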

26 Identifying cis-regulatory elements
21 stretches in HIV-1 are under significant Ks selection; 17 of them match known functional regions.

Region          Function
Pol 898-947     DNA flap + cPPT + CTS
Pol 986-1003    Overlap with Vif
Vif 173-186     Overlap with Vpr
Nef 88-99       3’ PPT
Tat 41-51       Overlap with Rev
Env 728-744     Overlap with Tat & Rev
Pol 7-31        ?
Vif 1-21        Overlap with Pol
…
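The slides do not describe how the 21 stretches were called. As a purely hypothetical illustration, one simple rule is to threshold per-codon Ks estimates (for example posterior means under KaD-KsD) and keep maximal runs of at least a minimum length. The cutoff, minimum length and the name `low_ks_stretches` are assumptions, and a real analysis would attach a significance test to each stretch.

```python
def low_ks_stretches(ks, cutoff, min_len):
    """Maximal runs of consecutive codon positions with Ks below `cutoff` (hypothetical rule).

    Returns 1-based inclusive (start, end) pairs, in the same style as
    entries such as "Pol 898-947".
    """
    stretches = []
    start = None
    for i, value in enumerate(list(ks) + [float("inf")]):  # sentinel closes a final run
        if value < cutoff:
            if start is None:
                start = i  # a new low-Ks run begins
        elif start is not None:
            if i - start >= min_len:
                stretches.append((start + 1, i))
            start = None
    return stretches


print(low_ks_stretches([1.0, 0.1, 0.2, 0.1, 1.5, 0.05, 0.9, 0.1, 0.1], cutoff=0.5, min_len=2))
```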

27 Conservation of Ks in pol

28 Conservation of Ks in pol (zoom-in)
[Figure labels: DNA flap, cPPT, CTS, ?]

29 Conservation of Ks in pol (zoom-in)
[Figure labels: cPPT, CTS, DNA flap]

30 pol-vif overlap
vif and pol overlap, but in different reading frames. These regions exhibit a substantial reduction in Ks.

32 pol-vif overlap
Site 12 of vif has very high Ks. Why?
- Site 999 in pol, which overlaps site 12 of vif, is under strong positive selection (Ka/Ks = 11.4): amino-acid changes favored in pol can be synonymous in the vif reading frame.
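Because vif and pol are read in different frames, one nucleotide can sit at the third codon position of one gene and at the first or second position of the other, so the same change can be nonsynonymous in one frame and synonymous in the other. That is one way amino-acid-changing substitutions favored at pol site 999 could appear as high Ks at vif site 12. The sketch uses a toy sequence, not HIV-1 data; `effect_in_frame` is a hypothetical helper using the standard genetic code.

```python
from itertools import product

# Standard genetic code (TCAG ordering, '*' = stop).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def effect_in_frame(seq, pos, new_base, offset):
    """Classify a point mutation at 0-based `pos` in the reading frame starting at `offset`.

    Assumes pos >= offset and that the affected codon lies entirely within `seq`.
    """
    codon_start = offset + 3 * ((pos - offset) // 3)
    old = seq[codon_start:codon_start + 3]
    k = pos - codon_start  # position of the mutation within the codon (0, 1 or 2)
    new = old[:k] + new_base + old[k + 1:]
    return "synonymous" if CODE[old] == CODE[new] else "nonsynonymous"


seq = "GCTGCAG"  # toy sequence read in two overlapping frames
print(effect_in_frame(seq, 3, "C", offset=0))  # frame 0: GCA (Ala) -> CCA (Pro)
print(effect_in_frame(seq, 3, "C", offset=1))  # frame 1: CTG (Leu) -> CTC (Leu)
```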

33 Selection at overlapping regions
21 stretches in HIV-1 are under significant Ks selection (table as in slide 26).

35 Selection at overlapping regions
Overlapping regions exhibit significant Ks conservation (p-value < 10^-6), but significant Ka variability.

37 Next…
- Analyze specific Ks stretches in detail
- Study Ks selection in other viruses
- Examine the extent of Ks selection across different lineages
- What is the meaning of the Ka/Ks > 1 criterion? How should positive selection be defined?
Thank you

