Towards realistic codon models: among site variability and dependency of synonymous and nonsynonymous rates Itay Mayrose Adi Doron-Faigenboim Eran Bacharach.


Towards realistic codon models: among-site variability and dependency of synonymous and nonsynonymous rates
Itay Mayrose, Adi Doron-Faigenboim, Eran Bacharach & Tal Pupko
Travel expenses supported by the Biosapiens project

Models of sequence evolution
Describe how characters (nucleotides, amino acids, codons) change during evolution. Used for:
- Alignment
- Phylogeny
- Inference of selection forces

Codon Models
Combine information from both DNA and protein levels.

[Figure: codon substitution matrix with rows and columns AAA, AAC, …, CCC. Each entry is the probability of changing from codon i to codon j; the one surviving example entry, 0.09, is in the AAA row.]

The 64 codons:
AAA AAC ACA ACC CAA CAC CCA CCC
AAG AAU ACG ACU CAG CAU CCG CCU
AGA AGC AUA AUC CGA CGC CUA CUC
AGG AGU AUG AUU CGG CGU CUG CUU
GAA GAC GCA GCC UAA UAC UCA UCC
GAG GAU GCG GCU UAG UAU UCG UCU
GGA GGC GUA GUC UGA UGC UUA UUC
GGG GGU GUG GUU UGG UGU UUG UUU

Substitutions between codons are either synonymous (silent) or non-synonymous (amino-acid altering). Modes of evolution:
- Purifying evolution
- Neutral evolution
- Positive Darwinian evolution
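To make the idea of a codon-to-codon substitution matrix concrete, here is a minimal sketch of an instantaneous rate matrix over the 61 sense codons. The parameterization is an assumption, not taken from the slides: it follows the widely used simplified form in which only single-nucleotide changes have non-zero rate, transitions are scaled by a factor kappa, and amino-acid-altering changes are scaled by omega (the Ka/Ks ratio). The names `build_q`, `kappa`, `omega` and `freq` are illustrative, and the code uses the DNA alphabet (T rather than U).

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}
SENSE = [c for c in CODE if CODE[c] != "*"]  # the 61 non-stop codons
TRANSITIONS = {("A", "G"), ("G", "A"), ("C", "T"), ("T", "C")}


def rate(ci, cj, kappa, omega, freq):
    """Rate of change from codon ci to codon cj (assumed parameterization)."""
    diffs = [(a, b) for a, b in zip(ci, cj) if a != b]
    if len(diffs) != 1:
        return 0.0  # identical codons or multi-nucleotide changes get rate 0
    r = freq[cj]
    if diffs[0] in TRANSITIONS:
        r *= kappa  # transition (A<->G, C<->T) rather than transversion
    if CODE[ci] != CODE[cj]:
        r *= omega  # non-synonymous change
    return r


def build_q(kappa=2.0, omega=0.5, freq=None):
    """Return Q as a nested dict; each row sums to zero. Not rate-normalized."""
    if freq is None:
        freq = {c: 1.0 / len(SENSE) for c in SENSE}  # assumed equal frequencies
    q = {ci: {cj: rate(ci, cj, kappa, omega, freq) for cj in SENSE if cj != ci}
         for ci in SENSE}
    for ci in SENSE:
        q[ci][ci] = -sum(q[ci].values())
    return q
```

Substitution probabilities over a branch of length t would then be obtained from the matrix exponential of Q·t, which this sketch leaves out.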

Detecting selection pressure
Purifying selection: Non-synonymous << Synonymous substitutions (example: histones)
S1  AAG ACT GCC GGG CGT ATT
S2  AAA ACA GCA GGA CGA ATC
S1  K   T   A   G   R   I
S2  K   T   A   G   R   I
Synonymous = 6, Non-synonymous = 0

Detecting selection pressure
Neutral evolution: Non-synonymous = Synonymous substitutions
S1  AAG ACT GCC GGG CGT ATT
S2  AAA ACA GAC GGA CAT ATG
S1  K   T   A   G   R   I
S2  K   T   D   G   H   M
Synonymous = 3, Non-synonymous = 3

Detecting selection pressure
Positive (Darwinian) selection: Non-synonymous >> Synonymous substitutions (example: host-pathogen arms race)
S1  AAG ACT GCC GGG CGT ATT
S2  AAT ATT GAC GAG CAT ATG
S1  K   T   A   G   R   I
S2  N   I   D   E   H   M
Synonymous = 0, Non-synonymous = 6
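The counts on the three "Detecting selection pressure" slides can be reproduced with a short script. This is only a naive sketch: it classifies each differing codon pair as synonymous or non-synonymous by comparing the encoded amino acids under the standard genetic code, and it does not correct for multiple hits or for the number of synonymous and non-synonymous sites as real Ka/Ks estimators do. The function name `count_codon_changes` is illustrative.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def count_codon_changes(seq1, seq2):
    """Count differing codons between two aligned, space-separated codon strings.

    Returns (synonymous, non_synonymous).
    """
    codons1, codons2 = seq1.split(), seq2.split()
    if len(codons1) != len(codons2):
        raise ValueError("sequences must contain the same number of codons")
    syn = nonsyn = 0
    for c1, c2 in zip(codons1, codons2):
        if c1 == c2:
            continue  # identical codons are not substitutions
        if CODE[c1] == CODE[c2]:
            syn += 1  # same amino acid: silent change
        else:
            nonsyn += 1  # amino acid altered
    return syn, nonsyn


s1 = "AAG ACT GCC GGG CGT ATT"
print(count_codon_changes(s1, "AAA ACA GCA GGA CGA ATC"))  # (6, 0) purifying
print(count_codon_changes(s1, "AAA ACA GAC GGA CAT ATG"))  # (3, 3) neutral
print(count_codon_changes(s1, "AAT ATT GAC GAG CAT ATG"))  # (0, 6) positive
```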

The Ka/Ks ratio
Ka = non-synonymous substitution rate
Ks = synonymous substitution rate
Assume: Ks = neutral rate of evolution
- Purifying selection: Ka/Ks < 1
- Neutral evolution: Ka/Ks = 1
- Positive selection: Ka/Ks > 1

Existing codon models
Assume:
- Ka varies over sites
- Ks is the same for all sites and reflects the neutral rate of evolution
References: Goldman & Yang (1994); Muse & Gaut (1994); Nielsen & Yang (1998); Wong, Sainudiin & Nielsen (2006); Doron-Faigenboim & Pupko (2007)
Model name: KaV-KsC
But is Ks constant?

Existing codon models: is Ks constant?
Hellmann et al. (2003): approximately 39% of synonymous sites in primates are subject to purifying selection. This challenges the assumption that Ks is the same for all sites and reflects the neutral rate of evolution.

Selection against silent substitutions
- RNA stability
- Exonic splicing regulatory sequences
- RNA editing
- Overlapping genes
- Codon bias and GC content
- Translational efficiency
- Protein folding
Example alignment:
Human  GAG GCT GCC GGG CGT ATT
Mouse  GGC ACT GCC GGG CGT ATT
Dog    GGG ACT GCC GGG CGT ATT
Reviewed in Chamary, Parmley, and Hurst, Nature Reviews Genetics (2006)

Evolutionary models for Ks conservation
Pond & Muse: both Ka and Ks can vary across sites (two independent gamma distributions).
Model name: KaV-KsV
Reference: Pond and Muse, Mol Biol Evol (2005), "Site-to-site variation of synonymous substitution rates"
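As an illustration of "two independent gamma distributions", the sketch below draws a Ka and a Ks value for every site from two separate gamma distributions, each scaled to have mean 1. The shape values and the function name `draw_site_rates` are hypothetical; the slides do not give the parameter values, and likelihood implementations typically work with discretized rate categories rather than continuous draws like these.

```python
import random


def draw_site_rates(n_sites, alpha_ka, alpha_ks, seed=0):
    """Draw independent per-site Ka and Ks values from mean-1 gamma distributions.

    alpha_ka and alpha_ks are the gamma shape parameters (hypothetical values);
    a scale of 1/alpha makes the mean of each distribution equal to 1.
    """
    rng = random.Random(seed)
    ka = [rng.gammavariate(alpha_ka, 1.0 / alpha_ka) for _ in range(n_sites)]
    ks = [rng.gammavariate(alpha_ks, 1.0 / alpha_ks) for _ in range(n_sites)]
    return ka, ks


ka, ks = draw_site_rates(n_sites=10, alpha_ka=0.5, alpha_ks=2.0)
for site, (a, s) in enumerate(zip(ka, ks), start=1):
    print(f"site {site}: Ka={a:.3f} Ks={s:.3f} Ka/Ks={a / s:.3f}")
```

A smaller shape parameter gives more rate heterogeneity across sites; because the two lists are drawn independently, knowing Ka at a site says nothing about Ks at that site or about the rates at neighbouring sites.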

Evolutionary models for Ks conservation
The KaV-KsV model assumes that each position evolves independently.
But:
- Selection is often regional
- Site-specific Ka and Ks are very erratic
[Table: true versus estimated Ka, Ks and Ka/Ks; only one value, 1.0 in the "True" row, survives in the transcript.]
Our solution: incorporate site dependencies

Modeling dependencies among sites
- Ka at position n depends on the Ka at position n-1
- Ks at position n depends on the Ks at position n-1
Two HMM chains: the hidden states are the Ka and Ks values at each position, and the observations are the aligned codons.
[Figure: hidden Ka and Ks chains above the observed codons, in transcript order: GGG GGG GAA CTT CTA CTG TCA TCC TAC GCC GCG GCC ATC ATC ATC]
Model name: KaD-KsD
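The dependency between neighbouring positions can be sketched as two Markov chains over discrete rate categories, one for Ka and one for Ks. Everything specific here is an assumption made for illustration: the rate categories, the single `stay_prob` parameter, and the rule that a site which does not keep its neighbour's category redraws one uniformly. The slides do not specify the actual transition structure of the KaD-KsD model.

```python
import random


def simulate_rate_chain(n_sites, rates, stay_prob, rng):
    """Simulate a first-order Markov chain of rate categories along a sequence.

    With probability stay_prob a site keeps the previous site's category;
    otherwise a category is redrawn uniformly (and may repeat by chance).
    """
    state = rng.randrange(len(rates))
    chain = []
    for _ in range(n_sites):
        chain.append(rates[state])
        if rng.random() >= stay_prob:
            state = rng.randrange(len(rates))
    return chain


def simulate_ka_ks(n_sites, ka_rates, ks_rates, stay_prob=0.9, seed=0):
    """Simulate the two hidden chains (Ka and Ks) independently of each other."""
    rng = random.Random(seed)
    ka = simulate_rate_chain(n_sites, ka_rates, stay_prob, rng)
    ks = simulate_rate_chain(n_sites, ks_rates, stay_prob, rng)
    return ka, ks


ka, ks = simulate_ka_ks(30, ka_rates=[0.1, 1.0, 3.0], ks_rates=[0.2, 1.0, 2.0])
print("Ka:", ka)
print("Ks:", ks)
```

With a high `stay_prob` the simulated rates form stretches of similar values, which is the kind of regional signal a site-independent model cannot represent.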

Comparing the models
Models tested:
- KaV-KsC: variable nonsynonymous, constant synonymous
- KaV-KsV: variable nonsynonymous, variable synonymous
- KaD-KsD: dependent nonsynonymous, dependent synonymous

Comparing the models
For each of the 9 coding genes of HIV-1:
- Multiple sequence alignment
- Phylogenetic tree
- Parameter optimization
- Model comparison (LRT)

HIV-1 data
[Table: log-likelihood difference from KaV-KsC for the KaV-KsV and KaD-KsD models, one row per HIV-1 gene (env, gag, nef, pol, rev, tat, vif, vpr, vpu); the values did not survive in the transcript.]
A difference of 5 log-likelihood units is significant (p < 0.01).
- HIV-1 genes exhibit a strong pattern of rate dependency
- Accounting for Ks variability is strongly justified for all HIV-1 genes
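The significance threshold can be checked with a small likelihood ratio test calculation. This sketch assumes nested models and the usual chi-square approximation for the statistic 2·ΔlnL; the slides do not state the degrees of freedom, so both 1 and 2 are shown as assumptions. The function name `lrt_p_value` is illustrative, and only these two cases are implemented because they have simple closed forms.

```python
import math


def lrt_p_value(delta_lnl, df):
    """P-value of a likelihood ratio test under a chi-square approximation.

    delta_lnl is the log-likelihood gain of the richer model; df is the number
    of extra free parameters (only 1 and 2 are supported in this sketch).
    """
    stat = 2.0 * delta_lnl
    if stat <= 0:
        return 1.0
    if df == 1:
        return math.erfc(math.sqrt(stat / 2.0))  # chi-square survival, 1 df
    if df == 2:
        return math.exp(-stat / 2.0)  # chi-square survival, 2 df
    raise ValueError("this sketch only supports df = 1 or df = 2")


for df in (1, 2):
    print(f"delta lnL = 5, df = {df}: p = {lrt_p_value(5.0, df):.4f}")
```

Under either assumption a gain of 5 log-likelihood units gives p < 0.01, consistent with the threshold quoted on the slide.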

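Inferring sites under positive selection
[Figure: sites inferred to be under positive selection by each model; surviving labels, in transcript order: KaV-KsC, 491, KaV-KsV, 295, KaD-KsD]
KaD-KsD is:
1. The most conservative
2. The model with the highest overlap with the other models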

Inferring sites under positive selection
[Figure: true positive rate versus false positive rate for the KaD-KsD, KaV-KsV and KaV-KsC models]
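A plot of true positive rate against false positive rate is built from points like the one computed below. This is a generic sketch, assuming a setting (such as simulation) where the truly selected sites are known; the per-site scores, the threshold and the name `roc_point` are illustrative and are not taken from the talk.

```python
def roc_point(scores, is_positive, threshold):
    """Return (false_positive_rate, true_positive_rate) at a score threshold.

    scores: per-site support for positive selection (higher = stronger).
    is_positive: per-site booleans giving the known truth.
    A site is called positive when its score is >= threshold.
    """
    tp = fp = fn = tn = 0
    for score, truth in zip(scores, is_positive):
        predicted = score >= threshold
        if predicted and truth:
            tp += 1
        elif predicted and not truth:
            fp += 1
        elif truth:
            fn += 1
        else:
            tn += 1
    tpr = tp / (tp + fn) if tp + fn else 0.0
    fpr = fp / (fp + tn) if fp + tn else 0.0
    return fpr, tpr


scores = [0.9, 0.8, 0.4, 0.2]
truth = [True, False, True, False]
print(roc_point(scores, truth, threshold=0.5))  # (0.5, 0.5)
```

Sweeping the threshold from high to low traces the full curve for one model; a curve closer to the top-left corner indicates better discrimination.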

Identifying cis regulatory elements
21 stretches in HIV-1 are under significant Ks selection; 17 matched to known functional regions.

Region                      Function
Pol (range not preserved)   DNA flap + cPPT + CTS
Pol (range not preserved)   Overlap Vif
Vif (range not preserved)   Overlap Vpr
Nef 88-99                   3' PPT
Tat 41-51                   Overlap Rev
Env (range not preserved)   Overlap Tat & Rev
Pol 7-31                    ?
Vif 1-21                    Overlap pol
…

Conservation of Ks in pol

Conservation of Ks in pol (zoom in)
[Figure: Ks along pol with the DNA flap, cPPT and CTS regions marked; one further region is labeled "?"]

pol-vif overlap
[Figure: Ks along the pol and vif genes]
- vif and pol overlap, but in different reading frames
- These regions exhibit a substantial reduction of Ks

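pol-vif overlap
[Figure: Ks along pol and vif, with site 999 of pol and site 12 of vif marked]
Site 12 of vif has very high Ks. Why?
- Site 999 in pol is under strong positive selection (Ka/Ks = 11.4)
- Because the two genes are read in different frames, a nucleotide change can be silent in one gene while altering the amino acid in the other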
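The effect of overlapping reading frames can be shown with a toy example. The six-nucleotide sequence below is hypothetical and is not taken from HIV-1; it only demonstrates that the same nucleotide change can be synonymous in one frame and non-synonymous in another. The helper `translate` is illustrative and uses the standard genetic code.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(seq, frame):
    """Translate the complete codons of seq, starting at offset frame (0, 1 or 2)."""
    return "".join(CODE[seq[i:i + 3]] for i in range(frame, len(seq) - 2, 3))


original = "GCCGAT"  # hypothetical sequence
mutant = "GCGGAT"    # third nucleotide changed from C to G

# Frame 0 reads GCC|GAT -> GCG|GAT: alanine is unchanged (synonymous).
print(translate(original, 0), translate(mutant, 0))  # AD AD
# Frame 2 reads CGA -> GGA: arginine becomes glycine (non-synonymous).
print(translate(original, 2), translate(mutant, 2))  # R G
```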

Selection at overlapping regions
21 stretches in HIV-1 are under significant Ks selection (same region/function table as on the "Identifying cis regulatory elements" slide).
- Overlapping regions exhibit significant Ks conservation (p-value < 10^-6)
- But: significant Ka variability

Next…
- Analyze specific Ks stretches in detail
- Study Ks selection in other viruses
- Examine the extent of Ks selection across different lineages
- What is the meaning of the Ka/Ks > 1 criterion? How should positive selection be defined?

Thank you